intro.tex

\section{Introduction}\label{sec:introduction}

To tackle the vast array of mathematical publications, various ways of
\emph{computerizing} mathematical knowledge have been researched and
developed. As it is already difficult for human mathematicians to keep
even a subset of all mathematical knowledge in their mind, a hope is
that computerization will yield great improvement to mathematical (and
really any) research by making the results of all collected research
readily available and easy to search~\cite{onebrain}.

One research topic in this field is the idea of a \emph{tetrapodal
search} that combines four distinct areas of mathematical knowledge
and data. These four kinds being (1)~the actual formulae as \emph{symbolic
knowledge}, (2)~examples and concrete objects as \emph{concrete knowledge},
(3)~names and comments as \emph{narrative knowledge} and finally
(4)~identifiers, references and their relationships, referred to as
\emph{organizational knowledge}~\cite{tetra}.

Tetrapodal search aims to provide a unified search engine that indexes
each of the four different subsets of mathematical knowledge.  Because
all four kinds of knowledge are inherently different in their
structure, tetrapodal search proposes that each kind of mathematical
knowledge should be made available in a storage and index backend that
fits exactly with the kind of data it is providing. With all four
areas available for querying, tetrapodal search then intends to then
combine the four indexes into a single query interface.

As it stands, tetrapodal search is not really close to usable
prototypes Currently, research is focused on providing storage
backends and indexes for the four different kinds of mathematical
knowledge. The focus of \emph{ulo-storage} is to achieve this for
organizational knowledge.

A previously proposed way to structure such organizational data is the
\emph{upper level ontology} (ULO/RDF)~\cite{ulo}. ULO/RDF takes the
form of an OWL~ontology and as such all organization information is
stored as RDF~triplets with a unified schema of ULO~predicates.  Some
effort has been made to export existing databases of formal
mathematical knowledge to {ULO/RDF}. In particular, there exist
exports from Isabelle and Coq libraries~\cite{uloisabelle,
ulocoq}. The resulting data set is already quite large, the Isabelle
export alone containing more than 200~million triplets.

Existing exports from Isabelle and Coq result in a single or multiple
RDF~files. This is a convenient form for exchange and easily versioned
using Git. However, considering the vast number of triplets, it is
impossible to query easily and efficiently as it is. Ultimately this
is what \emph{ulo-storage} work is focused on: Making ULO/RDF
accessible for querying and analysis.

The remainder of this report is split up in $n$~sections. First, in
the following section~\ref{sec:components} we take a look at the
various components involved in this project. Each major component is
then explained in a dedicated section; we will first collect data,
store it in a database, make it available for querying and then try
out some applications previously suggested in literature. Finally,
section~\ref{sec:conclusion} sums up the project and suggests some
next steps to take.