Commit c44d4786 authored by Andreas Schärtl

report: rework structure

The contribution of this project is twofold, so we have two big
chapters, one for each contribution: (1) Implementation
and (2) Applications.
parent 5df1e0aa
\section{Collecter and Importer}\label{sec:collecter}
\emph{TODO}
\section{Conclusion and Future Work}\label{sec:conclusion}
\section{Conclusions and Next Steps}\label{sec:conclusion}
\subsection{An Additional Harvester Component} % copied from introduction.tex
These are the three components realized for
\emph{ulo-storage}. In addition to these components, however, one
could think of a \emph{Harvester} component. Above we assumed that
the ULO triplets are already available in RDF~format. This is not
necessarily true. It might be desirable to automate the export from
third-party formats to ULO and we think this should be the job of a
Harvester component. It fetches mathematical knowledge from some
remote source and then provides a volatile stream of ULO data to the
Collecter, which then passes it to the Importer and so on. The big
advantage of such an approach would be that exports from third-party
libraries can always be up to date and do not have to be initiated
manually. Another advantage of this hypothetical component is that
running exports through the Harvester involves the whole import chain
of Collecter and Importer, which includes syntax~checking of the
exported RDF data. Bugs in exporters that produce faulty XML would be
found earlier in development.
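As a rough illustration of the kind of syntax~check mentioned above, the following Python sketch tests whether an export is well-formed XML with an \texttt{rdf:RDF} root element. It is not part of \emph{ulo-storage}; the function name \texttt{check\_rdf\_xml} is hypothetical, and the sketch assumes that exports are serialized as RDF/XML wrapped in an \texttt{rdf:RDF} root. It does not validate the RDF itself.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace of the RDF vocabulary as used in RDF/XML documents.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def check_rdf_xml(data: bytes) -> Optional[str]:
    """Return None if data looks like RDF/XML, otherwise an error message.

    Hypothetical helper: it only checks XML well-formedness and the
    root element (assumed to be rdf:RDF), not the RDF content.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        return f"not well-formed XML: {err}"
    if root.tag != f"{{{RDF_NS}}}RDF":
        return f"unexpected root element: {root.tag}"
    return None
```

A check of this kind, run on every export passing through the import chain, would report a broken exporter as soon as it emits a malformed file.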
\section{Endpoints}\label{sec:endpoints}
\section{Implementation}\label{sec:implementation}
\subsection{Components Implemented for \emph{ulo-storage}}\label{sec:components}
With RDF files exported and available for download as Git repositories
on MathHub, we have the goal of making the underlying data available
for use in applications. It makes sense to first identify the various
components that might be involved in such a system.
Figure~\ref{fig:components} illustrates all components and their
relationships.
\begin{figure}[]\begin{center}
\includegraphics{figs/components}
\caption{Components involved in the \emph{ulo-storage} system.}\label{fig:components}
\end{center}\end{figure}
\begin{itemize}
\item ULO triplets are present in various locations, be it in Git
repositories, on web servers or on the local disk. It is the job of a
\emph{Collecter} to assemble these {RDF}~files and forward them for further
processing. This may involve cloning a Git repository or crawling
the file system.
\item The streams of ULO files assembled by the Collecter are then
passed to an \emph{Importer}. An Importer uploads
RDF~streams into some kind of permanent storage. For
use in this project, the GraphDB~\cite{graphdb} triplet store was
a natural fit.
For this project, Collecter and Importer ended up as a single
monolithic piece of software, but this does not have to be the case.
\item Finally, with all triplets stored in a database, an
\emph{Endpoint} is where applications access the underlying
knowledge base. This does not necessarily need to be custom
software; rather, the programming interface of the underlying
database itself could be understood as an endpoint of its own.
Regardless, some thought can be put into designing an Endpoint as a
layer between application and database that is more
convenient to use than the interface provided by the database. It comes
down to the programming interface we wish to provide to a developer
using this system.
\end{itemize}
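The division of labor between Collecter and Importer described above can be sketched in a few lines of Python. This is a minimal illustration and not the actual \emph{ulo-storage} code: the names \texttt{collect\_local} and \texttt{import\_files} are hypothetical, only the local file system case is covered, and \texttt{upload} is a stand-in for whatever call pushes data into the triplet store.

```python
import os
from typing import Callable, Iterable, Iterator


def collect_local(root: str, suffix: str = ".rdf") -> Iterator[str]:
    """Collecter (sketch): crawl a directory tree and yield RDF file paths."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def import_files(paths: Iterable[str], upload: Callable[[bytes], None]) -> int:
    """Importer (sketch): pass the content of each file to `upload`.

    `upload` is an assumed placeholder for the storage backend call.
    Returns the number of files handed over.
    """
    count = 0
    for path in paths:
        with open(path, "rb") as f:
            upload(f.read())
        count += 1
    return count
```

Because the Collecter only yields paths and the Importer only consumes them, the two parts could live in one program, as they do in this project, or be split into separate tools.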
Collecter, Importer and Endpoint provide us with an easy and automated
way of making RDF files ready for use with applications. In this
introduction we only wanted to give the reader a general understanding
of the infrastructure that makes up \emph{ulo-storage}; the following
sections explain each component in more detail.
\subsection{Endpoints}\label{sec:endpoints}
With ULO triplets imported into the GraphDB triplet store by Collecter
and Importer, we now have all data available necessary for querying.
......
\section{Introduction}\label{sec:introduction}
\section{Introduction to the \emph{ulo-storage} Project}\label{sec:introduction}
To tackle the vast array of mathematical
publications, various ways of \emph{computerizing} mathematical
......@@ -26,8 +26,6 @@ kind of data it is providing. With all four areas available for
querying, tetrapodal search intends to then combine the four indexes
into a single query interface.
\subsection{Focus on Organizational Knowledge}
Currently, research is focused on providing schemas, storage backends
and indexes for the four different kinds of mathematical
knowledge. The focus of \emph{ulo-storage} is the area of
......@@ -59,73 +57,3 @@ blocks in a larger tetrapodal search system. Second, (2)~we ran sample
prototype applications and queries on top of this interface. While the
applications themselves are admittedly not very interesting, they can
give us insight into the future development of the upper level ontology.
\subsection{Components Implemented for \emph{ulo-storage}}\label{sec:components}
With RDF files exported and available for download as Git repositories
on MathHub, we have the goal of making the underlying data available
for use in applications. It makes sense to first identify the various
components that might be involved in such a system.
Figure~\ref{fig:components} illustrates all components and their
relationships.
\begin{figure}[]\begin{center}
\includegraphics{figs/components}
\caption{Components involved in the \emph{ulo-storage} system.}\label{fig:components}
\end{center}\end{figure}
\begin{itemize}
\item ULO triplets are present in various locations, be it in Git
repositories, on web servers or on the local disk. It is the job of a
\emph{Collecter} to assemble these {RDF}~files and forward them for further
processing. This may involve cloning a Git repository or crawling
the file system.
\item The streams of ULO files assembled by the Collecter are then
passed to an \emph{Importer}. An Importer uploads
RDF~streams into some kind of permanent storage. For
use in this project, the GraphDB~\cite{graphdb} triplet store was
a natural fit.
For this project, Collecter and Importer ended up as a single
monolithic piece of software, but this does not have to be the case.
\item Finally, with all triplets stored in a database, an
\emph{Endpoint} is where applications access the underlying
knowledge base. This does not necessarily need to be custom
software; rather, the programming interface of the underlying
database itself could be understood as an endpoint of its own.
Regardless, some thought can be put into designing an Endpoint as a
layer between application and database that is more
convenient to use than the interface provided by the database. It comes
down to the programming interface we wish to provide to a developer
using this system.
\end{itemize}
\subsection{An Additional Harvester Component}
These are the three components realized for
\emph{ulo-storage}. In addition to these components, however, one
could think of a \emph{Harvester} component. Above we assumed that
the ULO triplets are already available in RDF~format. This is not
necessarily true. It might be desirable to automate the export from
third-party formats to ULO and we think this should be the job of a
Harvester component. It fetches mathematical knowledge from some
remote source and then provides a volatile stream of ULO data to the
Collecter, which then passes it to the Importer and so on. The big
advantage of such an approach would be that exports from third-party
libraries can always be up to date and do not have to be initiated
manually. Another advantage of this hypothetical component is that
running exports through the Harvester involves the whole import chain
of Collecter and Importer, which includes syntax~checking of the
exported RDF data. Bugs in exporters that produce faulty XML would be
found earlier in development.
We did not implement a Harvester for \emph{ulo-storage}, but we suggest
that it is an idea to keep in mind. The components we did implement
(Collecter, Importer and Endpoint) provide us with an easy and
automated way of making RDF files ready for use with applications. In
this introduction we only wanted to give the reader a general
understanding of the infrastructure that makes up \emph{ulo-storage};
the following sections explain each component in more detail.
......@@ -50,11 +50,9 @@
\tableofcontents
\newpage
\input{intro.tex}
\input{introduction.tex}
\newpage
\input{collecter.tex}
\newpage
\input{endpoints.tex}
\input{implementation.tex}
\newpage
\input{applications.tex}
\newpage
......