\section{Introduction to the \emph{ulo-storage} Project}\label{sec:introduction}
    
    To cope with the vast array of mathematical
    publications, researchers have experimented with various ways of
    \emph{computerizing} mathematical knowledge. As it is already difficult for
    human mathematicians to keep even a subset of all mathematical
    knowledge in mind, the hope is that computerization will yield
    great improvements to mathematical (and really any formal) research by
    making the results of all collected publications readily available and
    easy to search~\cite{onebrain}.
    
    One research topic in this field is the idea of a \emph{tetrapodal
    search} that combines four distinct areas of mathematical knowledge.
    These four kinds are (1)~the actual formulae as \emph{symbolic
    knowledge}, (2)~examples and concrete objects as \emph{concrete knowledge},
    (3)~names and comments as \emph{narrative knowledge}, and finally
    (4)~identifiers, references and their relationships, referred to as
    \emph{organizational knowledge}~\cite{tetra}.
    
    Tetrapodal search aims to provide a unified search engine that indexes
    each of the four different subsets of mathematical knowledge.  Because
    all four kinds of knowledge are inherently different in their
    structure, tetrapodal search proposes that each kind of mathematical
    knowledge should be made available in a storage backend that fits the
    kind of data it is providing. With all four areas available for
    querying, tetrapodal search intends to then combine the four indexes
    into a single query interface.
    
    Currently, research is focused on providing schemas, storage backends
    and indexes for the four different kinds of mathematical
    knowledge. The focus of \emph{ulo-storage} is the area of
    organizational knowledge.
    
    A previously proposed way to structure such organizational data is the
    \emph{upper level ontology} (ULO)~\cite{ulo}. ULO takes the form of an
    OWL~ontology~\cite{uloonto}, and as such all organizational information
    is stored as RDF~triplets with a unified schema of
    ULO~predicates~\cite{owl}.  Some effort has been made to export
    existing databases of formal mathematical knowledge to {ULO}; in
    particular, there exist exports from the Isabelle and Coq
    libraries~\cite{uloisabelle, ulocoq}. The resulting data set is
    already quite large, the Isabelle export alone containing more than
    200~million triplets.
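
    As a minimal illustration of this data model (a sketch with
    hypothetical identifiers, not an excerpt from the actual exports),
    each RDF~triplet is a statement $(s, p, o)$ that relates a
    subject~$s$ to an object~$o$ by means of a predicate~$p$. Assuming a
    library object \texttt{some\_lemma} that depends on another object
    \texttt{some\_definition}, and an illustrative predicate
    \texttt{uses}, the dependency could be recorded as
    \[
      (\texttt{some\_lemma},\ \texttt{uses},\ \texttt{some\_definition}).
    \]
    A data set is then simply a set of such statements.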
    
    Existing exports from Isabelle and Coq consist of one or more
    RDF~files. This format is convenient for exchange and easily
    versioned using Git. However, considering the vast number of triplets,
    the data cannot be queried easily or efficiently in this form. This
    is what \emph{ulo-storage} focuses on: making ULO data sets
    accessible for querying and analysis. We collected RDF files spread
    over different Git repositories, imported them into a database and
    then experimented with APIs for accessing that data set.
    
    The main contribution of \emph{ulo-storage} is twofold. First, (1)~we
    built up various infrastructure components for making organizational
    knowledge queryable.  These components can make up building blocks of
    a larger tetrapodal search system. Design and implementation are
    discussed in Section~\ref{sec:implementation}.  Second, (2)~we ran
    sample prototype applications and queries on top of this
    interface. While the applications are admittedly not very
    useful in themselves, they can give us insight into the future development of
    the upper level ontology. These applications and queries are the focus
    of Section~\ref{sec:applications}. A summary of encountered problems
    and suggestions for next steps concludes this report in
    Section~\ref{sec:conclusion}.