introduction.tex

\section{Introduction to the \emph{ulo-storage} Project}\label{sec:introduction}

To tackle the vast array of mathematical
publications, various ways of \emph{computerizing} mathematical
knowledge have been experimented with. As it is already difficult for
human mathematicians to keep even a subset of all mathematical
knowledge in their mind, a hope is that computerization will yield
great improvement to mathematical (and really any formal) research by
making the results of all collected publications readily available and
easy to search~\cite{onebrain}.

One research topic in this field is the idea of a \emph{tetrapodal
search} that combines four distinct areas of mathematical knowledge.
These four kinds being (1)~the actual formulae as \emph{symbolic
knowledge}, (2)~examples and concrete objects as \emph{concrete knowledge},
(3)~names and comments as \emph{narrative knowledge} and finally
(4)~identifiers, references and their relationships, referred to as
\emph{organizational knowledge}~\cite{tetra}.

Tetrapodal search aims to provide a unified search engine that indexes
each of the four different subsets of mathematical knowledge.  Because
all four kinds of knowledge are inherently different in their
structure, tetrapodal search proposes that each kind of mathematical
knowledge should be made available in a storage backend that fits the
kind of data it is providing. With all four areas available for
querying, tetrapodal search intends to then combine the four indexes
into a single query interface.

Currently, research is focused on providing schemas, storage backends
and indexes for the four different kinds of mathematical
knowledge. The focus of \emph{ulo-storage} is the area of
organizational knowledge.

A previously proposed way to structure such organizational data is the
\emph{upper level ontology} (ULO)~\cite{ulo}. ULO takes the form of an
OWL~ontology~\cite{uloonto} and as such all organization information
is stored as RDF~triplets with a unified schema of
ULO~predicates~\cite{owl}.  Some effort has been made to export
existing databases of formal mathematical knowledge to {ULO}, in
particular, there exist exports from Isabelle and Coq
libraries~\cite{uloisabelle, ulocoq}. The resulting data set is
already quite large, the Isabelle export alone containing more than
200~million triplets.

Existing exports from Isabelle and Coq result in single or multiple
RDF~files. This is a convenient format for exchange and easily
versioned using Git. However, considering the vast number of triplets,
it is impossible to query easily and efficiently in this state. This
is what \emph{ulo-storage} is focused on: Making ULO data sets
accessible for querying and analysis. We collected RDF files spread
over different Git repositories, imported them into a database and
then experimented with APIs for accessing that data set.

The main contribution of \emph{ulo-storage} is twofold. First, (1)~we
built up various infrastructure components for making organizational
knowledge queryable.  These components can make up building blocks of
a larger tetrapodal search system. Design and implementation are
discussed in Section~\ref{sec:implementation}.  Second, (2)~we ran
sample prototype applications and queries on top of this
interface. While the applications themselves are admittedly not very
useful in itself, they can give us insight about future development of
the upper level ontology. These applications and queries are the focus
of Section~\ref{sec:applications}. A summary of encountered problems
and suggestions for next step concludes this report in
Section~\ref{sec:conclusion}.