Skip to content
Snippets Groups Projects
intro.tex 6.27 KiB
Newer Older
  • Learn to ignore specific revisions
  • \section{Introduction}\label{sec:introduction}
    
    Andreas Schärtl's avatar
    Andreas Schärtl committed
    
    
    To tackle the vast array of mathematical
    publications, various ways of \emph{computerizing} mathematical
    knowledge have been experimented with. As it is already difficult for
    human mathematicians to keep even a subset of all mathematical
    knowledge in their mind, a hope is that computerization will yield
    great improvement to mathematical (and really any formal) research by
    making the results of all collected publications readily available and
    easy to search~\cite{onebrain}.
    
    Andreas Schärtl's avatar
    Andreas Schärtl committed
    
    One research topic in this field is the idea of a \emph{tetrapodal
    
    search} that combines four distinct areas of mathematical knowledge.
    These four kinds being (1)~the actual formulae as \emph{symbolic
    
    Andreas Schärtl's avatar
    Andreas Schärtl committed
    knowledge}, (2)~examples and concrete objects as \emph{concrete knowledge},
    (3)~names and comments as \emph{narrative knowledge} and finally
    (4)~identifiers, references and their relationships, referred to as
    \emph{organizational knowledge}~\cite{tetra}.
    
    
    Tetrapodal search aims to provide a unified search engine that indexes
    each of the four different subsets of mathematical knowledge.  Because
    all four kinds of knowledge are inherently different in their
    structure, tetrapodal search proposes that each kind of mathematical
    
    Andreas Schärtl's avatar
    Andreas Schärtl committed
    knowledge should be made available in a storage backend that fits
    exactly with the kind of data it is providing. With all four areas
    available for querying, tetrapodal search intends to then combine
    the four indexes into a single query interface.
    
    \subsection{Focus on Organizational Knowledge}
    
    Currently, research is focused on providing schemas, storage backends
    
    Andreas Schärtl's avatar
    Andreas Schärtl committed
    and indexes for the four different kinds of mathematical
    
    knowledge. The focus of \emph{ulo-storage} is the area of
    
    organizational knowledge.
    
    A previously proposed way to structure such organizational data is the
    
    Andreas Schärtl's avatar
    Andreas Schärtl committed
    \emph{upper level ontology} (ULO)~\cite{ulo}. ULO takes the form of an
    
    Andreas Schärtl's avatar
    Andreas Schärtl committed
    OWL~ontology~\cite{uloonto} and as such all organization information
    is stored as RDF~triplets with a unified schema of
    ULO~predicates~\cite{owl}.  Some effort has been made to export
    existing databases of formal mathematical knowledge to {ULO}. In
    particular, there exist exports from Isabelle and Coq
    libraries~\cite{uloisabelle, ulocoq}. The resulting data set is
    already quite large, the Isabelle export alone containing more than
    200~million triplets.
    
    Existing exports from Isabelle and Coq result in single or multiple
    RDF~files. This is a convenient format for exchange and easily
    versioned using Git. However, considering the vast number of triplets,
    it is impossible to query easily and efficiently in this state. This
    is what \emph{ulo-storage} is focused on: Making ULO data sets
    accessible for querying and analysis. We collected RDF files spread
    over different Git repositories, imported them into a database and
    then experimented with APIs for accessing that data set.
    
    Andreas Schärtl's avatar
    Andreas Schärtl committed
    The main contribution of this project is twofold. First, (1) we built
    up various infrastructure components that can make up building blocks
    in a larger tetrapodal search system. Second, (2)~we ran sample
    prototype applications and queries on top of this interface. While the
    
    applications themselves are admittedly not very interesting, they can give
    us insight about future development of~{ULO}.
    
    \subsection{Components Implemented for \emph{ulo-storage}}\label{sec:components}
    
    With RDF files exported and available for download as Git repositories
    on MathHub, we have the goal of making the underlying data available
    for use in applications.  It makes sense to first identify the various
    components that might be involved in such a system. Figure~\ref{fig:components}
    illustrates all components and their relationships.
    
    \begin{figure}[]\begin{center}
        \includegraphics{figs/components}
        \caption{Components involved in the \emph{ulo-storage} system.}\label{fig:components}
    \end{center}\end{figure}
    
    \begin{description}
    \item[Collecter] ULO triplets are present in various locations, be it Git
      repositories, available on web servers or on local disk.
      It is the job of a Collecter to assemble these {RDF}~files and
      forward them for further processing. This may involve cloning a Git
      repository or crawling the file system.
    
    \item[Importer] With streams of ULO files assembled by the Collecter, this
      data then gets passed to an Importer. An Importer uploads
      received RDF~streams into some kind of permanent storage. For
      use in this project, the GraphDB~\cite{graphdb} triplet store was
      a natural fit.
    
      In this project, both Collecter and Importer ended up being one piece
      of software, but this does not have to be the case.
    
    \item[Endpoint] Finally, with all triplets stored in a database, an
      Endpoint is where applications access the underlying
      knowledge base. This does not necessarily need to be any custom
      software, rather the programming interface of the underlying
      database itself could be understood as an endpoint of its
      own.
    
      Regardless, some thought can be put into designing an Endpoint as a
      layer that lives between application and database that is more
      convenient to use than the one provided by the database.
    \end{description}
    
    \subsection{An Additional Harvester Component}
    
    These are the components realized for \emph{ulo-storage}. However,
    additionally to these components, one could think of a
    \emph{Harvester} component.  We assumed that the ULO triplets are
    already available in RDF~format.  This is not necessarily true.  It
    might be desirable to automate the export from third party formats to
    ULO and we think this should be the job of a Harvester component.  It
    fetches mathematical knowledge from some remote source and then
    provides a volatile stream of ULO data to the Collecter, which then
    passes it to the Importer and so on. The big advantage of such an
    approach would be that exports from third party libraries can always
    be up to date and do not have to be initiated manually.
    
    We did not implement a Harvester for \emph{ulo-storage} but we suggest
    that it is an idea to keep in mind. The components we did implement
    (Collecter, Importer and Endpoint) provide us with an easy and
    automated way of making RDF files ready for use with applications.  In
    this introduction we only wanted to give the reader a general
    understanding in the infrastructure that makes up \emph{ulo-storage};
    the following sections will explain each component in more detail.