introduction.tex

\section{Introduction}\label{sec:introduction}

To tackle the vast array of mathematical publications, various ways of
\emph{computerizing} mathematical knowledge have been experimented
with. Already it is difficult for human mathematicians to keep even a
subset of all mathematical knowledge in their mind, a problem referred
to as the ``one brain barrier'' in literature~\cite{onebrain}.  A hope
is that computerization will yield great improvement to formal
research by making the results of all collected publications readily
available and easy to search.

One topic of research in this field is the idea of a \emph{tetrapodal
  search} that combines four distinct areas of mathematical knowledge.
These four kinds being (1)~the actual formulae as \emph{symbolic
  knowledge}, (2)~examples and concrete objects as \emph{concrete
  knowledge}, (3)~prose, names and comments as \emph{narrative knowledge} and
finally (4)~identifiers, references and their relationships, referred
to as \emph{organizational knowledge}~\cite{tetra}.

Tetrapodal search aims to provide a unified search engine that indexes
each of these four different subsets of mathematical knowledge.
Because all four kinds of knowledge are inherently unique in their
structure, tetrapodal search proposes that each subset of mathematical
knowledge should be made available in a storage backend that fits the
kind of data it is providing. With all four areas available for
querying, tetrapodal search intends to then combine the four indexes
into a single query interface. Currently, research aims at providing
schemas, storage backends and indexes for the four different kinds of
mathematical knowledge. In this context, the focus of
\emph{ulo-storage} is the area of organizational knowledge.

A previously proposed way to structure organizational knowledge is the
\emph{upper level ontology} (ULO)~\cite{ulo}. ULO takes the form of an
OWL~ontology~\cite{uloonto} and as such all organization knowledge is
stored as RDF~triplets with a unified schema of
ULO~predicates~\cite{owl}.  Some effort has been made to export
existing collections of formal mathematical knowledge to {ULO}. In
particular, exports from Isabelle and Coq-based libraries are
available~\cite{afpexport, uloisabelle, ulocoq}. The resulting data
set is already quite large, the Isabelle export alone containing more
than 200~million triplets.

Existing exports from Isabelle and Coq result in sets of XML~files
that contain RDF~triplets. This is a convenient format for exchange
and easily tracked using version control systems such as
Git~\cite{gitpaper} as employed by MathHub~\cite{mh}. However,
considering the vast number of triplets, it is impossible to query
easily and efficiently in this state. This is what \emph{ulo-storage}
is focused on, that is making ULO data sets accessible for querying and
analysis. We collected RDF files spread over different Git
repositories, imported them into a database and then experimented with
APIs for accessing that data set.

The contribution of \emph{ulo-storage} is twofold. First, (1)~we built
up various infrastructure components that make organizational
knowledge easy to query.  These components can make up building blocks
of a larger tetrapodal search system. Their design and implementation
are discussed in Section~\ref{sec:implementation}.  Second, (2)~we ran
prototype applications and queries on top of this
infrastructure. While the applications are not useful in itself, they
can give us insight about future development of the upper level
ontology and related schemas. These applications and queries are the
focus of Section~\ref{sec:applications}. A summary of encountered
problems and suggestions for next steps concludes this report in
Section~\ref{sec:conclusion}.