Commit 68159ecc authored by Andreas Schärtl's avatar Andreas Schärtl
report: small review before my vaccation

parent 2793be39
Organizational data extracted from formal libraries has the potential
to be usable in the design of a universal search engine for
mathematical knowledge. However, it is not enough to just extract
formal knowledge into a unified format, it is also necessary for this
information to be readily available for querying. \emph{ulo-storage}
aims to lay out the groundwork to do just that. In this project, we
collected various pieces of exported data into a centralized and
efficient store, made that store available as a publicly available
endpoint and then evaluated different ways of querying that store. In
......
We previously described Collecter and Importer as two distinct
components. The Collecter pulls RDF data from various sources as an
input and outputs a stream of standardized RDF data while the Importer
takes such a stream of RDF data and then dumps it to some sort of
persistent storage. In the implementation for \emph{ulo-storage},
both Collecter and Importer ended up being one piece of monolithic
software. This does not need to be the case but proved convenient
because (1)~combining Collecter and Importer forgoes the need for an
additional IPC~mechanism and (2)~neither Collecter nor Importer is a
terribly large piece of software in itself.
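The combined design can be illustrated with a short sketch. All names
and signatures here are hypothetical and only stand in for the actual
\emph{ulo-storage} code: because Collecter and Importer live in one
process, the Collecter's output stream feeds the Importer directly,
with no IPC mechanism in between.

```python
# Illustrative sketch only: Collecter and Importer as two functions in
# one process. Names and signatures are hypothetical, not the actual
# ulo-storage API.
from typing import Iterable, Iterator


def collect(sources: dict) -> Iterator[bytes]:
    """Collecter: normalize inputs into a stream of RDF payloads."""
    for _name, payload in sorted(sources.items()):
        yield payload


def import_stream(stream: Iterable[bytes], store: dict) -> int:
    """Importer: persist each payload; returns the number imported."""
    count = 0
    for payload in stream:
        store[count] = payload  # stand-in for a triple store upload
        count += 1
    return count


sources = {"a.rdf": b"<rdf/>", "b.rdf": b"<rdf/>"}
store = {}
imported = import_stream(collect(sources), store)
```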
Our implementation supports two sources for RDF files, namely Git
repositories and the local file system. The file system Collecter
......
RDF~XML~files~\cite{rdfxml} while the Git Collecter first clones a Git
repository and then passes the checked out working copy to the file
system Collecter. Because it is not uncommon for RDF files to be
compressed, our Collecter supports on the fly extraction of
gzip~\cite{gzip} and xz~\cite{xz} formats which can greatly reduce the
required disk space in the collection step.
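On-the-fly decompression of this kind can be sketched as follows. The
helper name and the extension-based dispatch are our illustration,
assuming the compression format is recognized from the file name, as
is common for \texttt{.gz} and \texttt{.xz} suffixes.

```python
# Sketch of on-the-fly extraction by file extension, in the spirit of
# the Collecter's gzip/xz support. The helper name is hypothetical.
import gzip
import io
import lzma


def open_rdf(path: str, raw: bytes) -> io.BytesIO:
    """Return a file-like object containing the decompressed RDF data."""
    if path.endswith(".gz"):
        return io.BytesIO(gzip.decompress(raw))
    if path.endswith(".xz"):
        return io.BytesIO(lzma.decompress(raw))
    return io.BytesIO(raw)  # uncompressed RDF passes through as-is


restored = open_rdf("export.rdf.gz", gzip.compress(b"<rdf:RDF/>")).read()
```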
During development of the Collecter, we found that existing exports
......
which were not discovered previously. In particular, both Isabelle and
Coq export contained URIs which do not fit the official syntax
specification~\cite{rfc3986} as they contained illegal
characters. Previous work~\cite{ulo} that processed Coq and Isabelle
exports used database software such as Virtuoso Open Source which does
not properly check URIs according to spec; in consequence, these faults
were only discovered now. To tackle these problems, we introduced on
the fly correction steps during collection that take the broken RDF
files, fix the mentioned problems related to URIs (by escaping illegal
characters) and then continue processing. Of course this is only a
work-around. Related bug reports were filed in the respective export
projects to ensure that in the future this extra step is not
necessary.
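A correction step of this kind can be sketched with the standard
library. This is only an illustration of the approach, not the exact
fix applied in \emph{ulo-storage}: characters outside the sets allowed
by RFC~3986 are percent-encoded while the reserved delimiters that
structure the URI are left alone.

```python
# Sketch: escape characters that RFC 3986 forbids in a URI while
# keeping its reserved delimiters intact. Illustrative only; the safe
# set below is our assumption, not the exact one used by ulo-storage.
from urllib.parse import quote


def fix_uri(uri: str) -> str:
    """Percent-encode illegal characters, preserving URI structure."""
    # safe= lists RFC 3986 gen-delims, sub-delims and unreserved marks;
    # '%' is kept so existing escapes are not double-encoded.
    return quote(uri, safe=":/?#[]@!$&'()*+,;=%~-._")


fixed = fix_uri("https://example.org/thm<main>")
```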
......
import itself is straightforward; our software only needs to upload
the RDF file stream as-is to an HTTP endpoint provided by our GraphDB
instance.
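The upload itself amounts to a single HTTP request against the
repository's statements endpoint. The sketch below only constructs the
request without sending it; the base URL, repository name and endpoint
path follow the usual GraphDB/RDF4J convention but are assumptions
here, not taken from the \emph{ulo-storage} configuration.

```python
# Sketch: build the HTTP request that streams an RDF/XML payload to a
# GraphDB repository's statements endpoint. URL and repository name
# are hypothetical; the request is constructed but never sent.
from urllib.request import Request


def build_import_request(base: str, repo: str, rdf_xml: bytes) -> Request:
    """Prepare a POST of RDF/XML data to the repository endpoint."""
    return Request(
        url=f"{base}/repositories/{repo}/statements",
        data=rdf_xml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


req = build_import_request("http://localhost:7200", "ulo", b"<rdf:RDF/>")
```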
\emph{({TODO}: Write down a small comparison of different database
types, triplet stores and implementations. Honestly the main
advantage of GraphDB is that it's easy to set up and import to;
maybe I'll also write an Importer for another DB to show that the
choice of database is not that important.)}
\subsubsection{Scheduling and Version Management}
Collecter and Importer were implemented as library code that can be
called in various front ends. For this project, we provide both a
called from various front ends. For this project, we provide both a
command line interface as well as a graphical web front end. While the
command line interface is only useful for manually starting single
jobs, the web interface allows scheduling of jobs. In particular, it
......
interface. While the applications themselves are admittedly not very
useful, they can give us insight into future development of
the upper level ontology. These applications and queries are the focus
of Section~\ref{sec:applications}. A summary of encountered problems
and suggestions for next steps concludes this report in
Section~\ref{sec:conclusion}.
......
\newpage
\input{introduction.tex}
\newpage
\input{implementation.tex}
\newpage
\input{applications.tex}
\newpage
\input{conclusion.tex}
\newpage
......