report: review intro

3ac4cf5b · Andreas Schärtl · c7f44916 · 3ac4cf5b
Commit 3ac4cf5b authored 4 years ago by Andreas Schärtl
--- a/doc/report/intro.tex
+++ b/doc/report/intro.tex
@@ -21,10 +21,10 @@ Tetrapodal search aims to provide a unified search engine that indexes
 each of the four different subsets of mathematical knowledge.  Because
 all four kinds of knowledge are inherently different in their
 structure, tetrapodal search proposes that each kind of mathematical
-knowledge should be made available in a storage backend that fits
+knowledge should be made available in a storage backend that fits the
-exactly with the kind of data it is providing. With all four areas
+kind of data it is providing. With all four areas available for
-available for querying, tetrapodal search intends to then combine
+querying, tetrapodal search intends to then combine the four indexes
-the four indexes into a single query interface.
+into a single query interface.
 \subsection{Focus on Organizational Knowledge}
@@ -38,7 +38,7 @@ A previously proposed way to structure such organizational data is the
 OWL~ontology~\cite{uloonto} and as such all organization information
 is stored as RDF~triplets with a unified schema of
 ULO~predicates~\cite{owl}.  Some effort has been made to export
-existing databases of formal mathematical knowledge to {ULO}. In
+existing databases of formal mathematical knowledge to {ULO}, in
 particular, there exist exports from Isabelle and Coq
 libraries~\cite{uloisabelle, ulocoq}. The resulting data set is
 already quite large, the Isabelle export alone containing more than
@@ -53,72 +53,79 @@ accessible for querying and analysis. We collected RDF files spread
 over different Git repositories, imported them into a database and
 then experimented with APIs for accessing that data set.
-The main contribution of this project is twofold. First, (1) we built
+The main contribution of \emph{ulo-storage} is twofold. First, (1)~we
-up various infrastructure components that can make up building blocks
+built up various infrastructure components that can make up building
-in a larger tetrapodal search system. Second, (2)~we ran sample
+blocks in a larger tetrapodal search system. Second, (2)~we ran sample
 prototype applications and queries on top of this interface. While the
-applications themselves are admittedly not very interesting, they can give
+applications themselves are admittedly not very interesting, they can
-us insight about future development of~{ULO}.
+give us insight about future development of the upper level ontology.
 \subsection{Components Implemented for \emph{ulo-storage}}\label{sec:components}
 With RDF files exported and available for download as Git repositories
 on MathHub, we have the goal of making the underlying data available
 for use in applications.  It makes sense to first identify the various
-components that might be involved in such a system. Figure~\ref{fig:components}
+components that might be involved in such a system.
-illustrates all components and their relationships.
+Figure~\ref{fig:components} illustrates all components and their
+relationships.
 \begin{figure}[]\begin{center}
    \includegraphics{figs/components}
    \caption{Components involved in the \emph{ulo-storage} system.}\label{fig:components}
 \end{center}\end{figure}
-\begin{description}
+\begin{itemize}
-\item[Collecter] ULO triplets are present in various locations, be it Git
+\item ULO triplets are present in various locations, be it Git
-  repositories, available on web servers or on local disk.
+  repositories, on web servers or the local disk.  It is the job of a
-  It is the job of a Collecter to assemble these {RDF}~files and
+  \emph{Collecter} to assemble these {RDF}~files and forward them for further
-  forward them for further processing. This may involve cloning a Git
+  processing. This may involve cloning a Git repository or crawling
-  repository or crawling the file system.
+  the file system.
-\item[Importer] With streams of ULO files assembled by the Collecter, this
+  \item With streams of ULO files assembled by the Collecter, this
-  data then gets passed to an Importer. An Importer uploads
+  data then gets passed to an \emph{Importer}. An Importer uploads
-  received RDF~streams into some kind of permanent storage. For
+  RDF~streams into some kind of permanent storage. For
  use in this project, the GraphDB~\cite{graphdb} triplet store was
  a natural fit.
-  In this project, both Collecter and Importer ended up being one piece
+  For this project, both Collecter and Importer ended up being one
-  of software, but this does not have to be the case.
+  piece of monolithic software, but this does not have to be the case.
-\item[Endpoint] Finally, with all triplets stored in a database, an
+\item Finally, with all triplets stored in a database, an
-  Endpoint is where applications access the underlying
+  \emph{Endpoint} is where applications access the underlying
  knowledge base. This does not necessarily need to be any custom
  software, rather the programming interface of the underlying
-  database itself could be understood as an endpoint of its
+  database itself could be understood as an endpoint of its own.
-  own.
  Regardless, some thought can be put into designing an Endpoint as a
  layer that lives between application and database that is more
-  convenient to use than the one provided by the database.
+  convenient to use than the one provided by the database. It comes
-\end{description}
+  down to the programming interface we wish to provide to a developer
+  using this system.
+\end{itemize}
 \subsection{An Additional Harvester Component}
-These are the components realized for \emph{ulo-storage}. However,
+These are the three components realized for
-additionally to these components, one could think of a
+\emph{ulo-storage}. However, additionally to these components, one
-\emph{Harvester} component.  We assumed that the ULO triplets are
+could think of a \emph{Harvester} component.  Above we assumed that
-already available in RDF~format.  This is not necessarily true.  It
+the ULO triplets are already available in RDF~format.  This is not
-might be desirable to automate the export from third party formats to
+necessarily true.  It might be desirable to automate the export from
-ULO and we think this should be the job of a Harvester component.  It
+third party formats to ULO and we think this should be the job of a
-fetches mathematical knowledge from some remote source and then
+Harvester component.  It fetches mathematical knowledge from some
-provides a volatile stream of ULO data to the Collecter, which then
+remote source and then provides a volatile stream of ULO data to the
-passes it to the Importer and so on. The big advantage of such an
+Collecter, which then passes it to the Importer and so on. The big
-approach would be that exports from third party libraries can always
+advantage of such an approach would be that exports from third party
-be up to date and do not have to be initiated manually.
+libraries can always be up to date and do not have to be initiated
+manually. Another advantage of this hypothetical component is that
+running exports through the Harvester involves the whole import chain
+of Collecter and Importer which involves syntax~checking for the
+exported RDF data. Bugs in exporters that produce faulty XML would be
+found earlier in development.
 We did not implement a Harvester for \emph{ulo-storage} but we suggest
 that it is an idea to keep in mind. The components we did implement
 (Collecter, Importer and Endpoint) provide us with an easy and
 automated way of making RDF files ready for use with applications.  In
 this introduction we only wanted to give the reader a general
-understanding in the infrastructure that makes up \emph{ulo-storage};
+understanding in the infrastructure that makes up \emph{ulo-storage},
 the following sections will explain each component in more detail.