Commit 68159ecc authored by Andreas Schärtl's avatar Andreas Schärtl
report: small review before my vaccation

parent 2793be39
Organizational data extracted from formal libraries has the potential
to be usable in the design of a universal search engine for
mathematical knowledge. However, it is not enough to just extract
formal knowledge into a unified format, it is also necessary for this
information to be readily available for querying. \emph{ulo-storage}
aims to lay out the groundwork to do just that. In this project, we
collected various pieces of exported data into a centralized and
efficient store, made that store available as a publicly available
endpoint and then evaluated different ways of querying that store. In
......
We previously described Collecter and Importer as two distinct
components. The Collecter pulls RDF data from various sources as an
input and outputs a stream of standardized RDF data while the Importer
takes such a stream of RDF data and then dumps it to some sort of
persistent storage. In the implementation for \emph{ulo-storage},
both Collecter and Importer ended up being one piece of monolithic
software. This does not need to be the case but proved convenient
because (1)~combining Collecter and Importer forgoes the need for an
additional IPC~mechanism and (2)~neither Collecter nor Importer is a
terribly large piece of software in itself.
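The combined design can be illustrated with a short sketch. All names
and signatures here are hypothetical and only stand in for the actual
\emph{ulo-storage} code: because Collecter and Importer live in one
process, the Collecter's output stream feeds the Importer directly,
with no IPC mechanism in between.

```python
# Illustrative sketch only: Collecter and Importer as two functions in
# one process. Names and signatures are hypothetical, not the actual
# ulo-storage API.
from typing import Iterable, Iterator


def collect(sources: dict) -> Iterator[bytes]:
    """Collecter: normalize inputs into a stream of RDF payloads."""
    for _name, payload in sorted(sources.items()):
        yield payload


def import_stream(stream: Iterable[bytes], store: dict) -> int:
    """Importer: persist each payload; returns the number imported."""
    count = 0
    for payload in stream:
        store[count] = payload  # stand-in for a triple store upload
        count += 1
    return count


sources = {"a.rdf": b"<rdf/>", "b.rdf": b"<rdf/>"}
store = {}
imported = import_stream(collect(sources), store)
```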
Our implementation supports two sources for RDF files, namely Git
repositories and the local file system. The file system Collecter
......
RDF~XML~files~\cite{rdfxml} while the Git Collecter first clones a Git
repository and then passes the checked out working copy to the file
system Collecter. Because it is not uncommon for RDF files to be
compressed, our Collecter supports on the fly extraction of
gzip~\cite{gzip} and xz~\cite{xz} formats which can greatly reduce the
required disk space in the collection step.
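On-the-fly decompression of this kind can be sketched as follows. The
helper name and the extension-based dispatch are our illustration,
assuming the compression format is recognized from the file name, as
is common for \texttt{.gz} and \texttt{.xz} suffixes.

```python
# Sketch of on-the-fly extraction by file extension, in the spirit of
# the Collecter's gzip/xz support. The helper name is hypothetical.
import gzip
import io
import lzma


def open_rdf(path: str, raw: bytes) -> io.BytesIO:
    """Return a file-like object containing the decompressed RDF data."""
    if path.endswith(".gz"):
        return io.BytesIO(gzip.decompress(raw))
    if path.endswith(".xz"):
        return io.BytesIO(lzma.decompress(raw))
    return io.BytesIO(raw)  # uncompressed RDF passes through as-is


restored = open_rdf("export.rdf.gz", gzip.compress(b"<rdf:RDF/>")).read()
```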
During development of the Collecter, we found that existing exports
......
which were not discovered previously. In particular, both Isabelle and
Coq export contained URIs which do not fit the official syntax
specification~\cite{rfc3986} as they contained illegal
characters. Previous work~\cite{ulo} that processed Coq and Isabelle
exports used database software such as Virtuoso Open Source which does
not properly check URIs according to spec; in consequence, these faults
were only discovered now. To tackle these problems, we introduced on
the fly correction steps during collection that take the broken RDF
files, fix the mentioned problems related to URIs (by escaping illegal
characters) and then continue processing. Of course this is only a
work-around. Related bug reports were filed in the respective export
projects to ensure that in the future this extra step is not
necessary.
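A correction step of this kind can be sketched with the standard
library. This is only an illustration of the approach, not the exact
fix applied in \emph{ulo-storage}: characters outside the sets allowed
by RFC~3986 are percent-encoded while the reserved delimiters that
structure the URI are left alone.

```python
# Sketch: escape characters that RFC 3986 forbids in a URI while
# keeping its reserved delimiters intact. Illustrative only; the safe
# set below is our assumption, not the exact one used by ulo-storage.
from urllib.parse import quote


def fix_uri(uri: str) -> str:
    """Percent-encode illegal characters, preserving URI structure."""
    # safe= lists RFC 3986 gen-delims, sub-delims and unreserved marks;
    # '%' is kept so existing escapes are not double-encoded.
    return quote(uri, safe=":/?#[]@!$&'()*+,;=%~-._")


fixed = fix_uri("https://example.org/thm<main>")
```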
......
import itself is straightforward; our software only needs to upload
the RDF file stream as-is to an HTTP endpoint provided by our GraphDB
instance.
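The upload itself amounts to a single HTTP request against the
repository's statements endpoint. The sketch below only constructs the
request without sending it; the base URL, repository name and endpoint
path follow the usual GraphDB/RDF4J convention but are assumptions
here, not taken from the \emph{ulo-storage} configuration.

```python
# Sketch: build the HTTP request that streams an RDF/XML payload to a
# GraphDB repository's statements endpoint. URL and repository name
# are hypothetical; the request is constructed but never sent.
from urllib.request import Request


def build_import_request(base: str, repo: str, rdf_xml: bytes) -> Request:
    """Prepare a POST of RDF/XML data to the repository endpoint."""
    return Request(
        url=f"{base}/repositories/{repo}/statements",
        data=rdf_xml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


req = build_import_request("http://localhost:7200", "ulo", b"<rdf:RDF/>")
```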
\emph{({TODO}: Write down a small comparison of different database
types, triplet stores and implementations. Honestly the main
advantage of GraphDB is that it's easy to set up and import to;
maybe I'll also write an Importer for another DB to show that the
choice of database is not that important.)}
\subsubsection{Scheduling and Version Management}
Collecter and Importer were implemented as library code that can be
called in various front ends. For this project, we provide both a
called from various front ends. For this project, we provide both a
command line interface as well as a graphical web front end. While the
command line interface is only useful for manually starting single
jobs, the web interface allows scheduling of jobs. In particular, it
......
interface. While the applications themselves are admittedly not very
useful, they can give us insight into future development of
the upper level ontology. These applications and queries are the focus
of Section~\ref{sec:applications}. A summary of encountered problems
and suggestions for next steps concludes this report in
Section~\ref{sec:conclusion}.
......
\newpage
\input{introduction.tex}
\newpage
\input{implementation.tex}
\newpage
\input{applications.tex}
\newpage
\input{conclusion.tex}
\newpage
......