\section{Components}\label{sec:components}
With ULO/RDF files in place and available for download on
MathHub~\cite{uloisabelle, ulocoq} as Git repositories, our aim is to
make the underlying data available for use in applications. It makes
sense to first identify the various components that might be involved
in such a system. They are illustrated in
figure~\ref{fig:components}.
\begin{figure}\begin{center}
\includegraphics{figs/components}
\caption{Components involved in the \emph{ulo-storage} system.}\label{fig:components}
\end{center}\end{figure}
\begin{itemize}
\item ULO/RDF data is present in various locations, be it in Git
repositories, on web servers or on the local disk. Regardless of
where this ULO/RDF data is stored, it is the job of a
\emph{Collecter} to assemble these {ULO/RDF}~files and forward them
for further processing. This may involve cloning a Git repository or
crawling a file system; a short sketch of such a crawl is given
after this list.
\item With streams of ULO/RDF files assembled by the Collecter, this
data then gets passed to an \emph{Importer}. The Importer uploads
the received RDF~streams into some kind of permanent storage. For
this project, the GraphDB~\cite{graphdb} triplet store was a natural
fit; a sketch of such an upload is given after this list.
In practice, both Collecter and Importer end up being one piece of
software, but this does not have to be the case.
\item Finally, with all triplets stored in a database, an
\emph{Endpoint} is where applications access the underlying
knowledge base. This does not necessarily have to be a dedicated
piece of software; rather, the programming API of the database could
be understood as an Endpoint of its own. However, some thought can
be put into designing an Endpoint that is more convenient to use
than the one provided by the database; a sample query against such
an Endpoint is sketched after this list.
\end{itemize}
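To make the Collecter's job more concrete, the following Python
sketch crawls a local directory tree and yields all files that look
like ULO/RDF exports. The root directory and the \texttt{.rdf} file
extension are assumptions made for illustration; the same idea
applies to a working copy obtained by cloning a Git repository.
\begin{verbatim}
import os

def collect_rdf_files(root):
    """Walk a directory tree and yield paths of ULO/RDF files.

    Assumes exports are stored as RDF/XML files with an .rdf
    suffix; adjust the check for other serializations.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".rdf"):
                yield os.path.join(dirpath, name)

# Example: forward every collected file for further processing.
# for path in collect_rdf_files("/data/ulo-exports"):
#     import_rdf_file(path)
\end{verbatim}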
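An Importer can then hand the collected files over to GraphDB via
HTTP. The sketch below assumes a GraphDB instance listening on
\texttt{localhost:7200} with a repository named \texttt{ulo} and uses
the RDF4J-style REST interface that GraphDB exposes; server URL,
repository name and serialization format are illustrative assumptions
rather than fixed choices.
\begin{verbatim}
import requests

GRAPHDB = "http://localhost:7200"  # assumed GraphDB server URL
REPOSITORY = "ulo"                 # assumed repository name

def import_rdf_file(path):
    """Upload one RDF/XML file into the GraphDB repository.

    POSTs the file to the repository's /statements resource,
    which adds the contained triplets to the store.
    """
    with open(path, "rb") as handle:
        response = requests.post(
            GRAPHDB + "/repositories/" + REPOSITORY + "/statements",
            data=handle,
            headers={"Content-Type": "application/rdf+xml"},
        )
    response.raise_for_status()
\end{verbatim}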
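Finally, applications can talk to such an Endpoint using SPARQL. The
following sketch sends a simple counting query over the standard
SPARQL protocol; the endpoint URL is again an assumption, and a
dedicated SPARQL client library could be used instead of plain HTTP
requests.
\begin{verbatim}
import requests

# Assumed SPARQL query endpoint of the GraphDB repository.
ENDPOINT = "http://localhost:7200/repositories/ulo"

def count_triplets():
    """Ask the Endpoint how many triplets the knowledge base holds."""
    query = "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }"
    response = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    response.raise_for_status()
    bindings = response.json()["results"]["bindings"]
    return int(bindings[0]["count"]["value"])

print(count_triplets())
\end{verbatim}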
In addition to these components, one could also think of a
\emph{Harvester} component. So far we assumed that the ULO triplets
are already available in RDF~format. Indeed, for this project this is
the case, as we work with triplets already exported from the Isabelle
and Coq libraries. However, this does not need to be the case.
It might be desirable to automate the export from third-party formats
to ULO/RDF, and this is exactly what a Harvester would do. It fetches
mathematical knowledge from some remote source and then provides a
volatile stream of ULO/RDF data to the Collecter, which then passes it
on to the Importer and so on. The big advantage of such an approach
would be that exports from third-party libraries would always be up to
date and would not have to be initiated manually. However, due to time
constraints, this idea was not pursued further in this project.
Each component (Collecter, Importer, Endpoint) will be discussed in
more detail later. This section is only meant to give the reader a
general understanding of the developed infrastructure and its
topology.