 \section{Applications}\label{sec:applications}

With programming endpoints in place, we can now query the data set containing
both Isabelle and Coq exports stored in {GraphDB}. We experimented with
various queries and applications:

\begin{itemize}
    \item Exploring which ULO predicates are actually used in the
      existing Coq and Isabelle exports.  We find that more than two
      thirds of existing ULO predicates were not taken advantage of
      (Section~\ref{sec:expl}).

    \item We investigated queries that could be used to extend the
      system into a larger tetrapodal search system. While some
      organizational queries have obvious canonical solutions, others
      raise questions about how organizational knowledge should be
      structured (Section~\ref{sec:tetraq}).

    \item We also experimented with various other more general queries
      for organizational data recommended in literature
      (Section~\ref{sec:miscq}).

    \item Finally, we built a small web front end that visualizes
      the ULO data set (Section~\ref{sec:webq}).
\end{itemize}

\noindent Each application will now be discussed in a dedicated section.

\subsection{Exploring Existing Data Sets}\label{sec:expl}

For our first application, we looked at which ULO predicates are
actually used by the respective data sets. With more than 250~million
triplets in the store, we hoped that this would give us some insight
into the kind of knowledge we are dealing with.

Implementing a query for this job is not very difficult. In SPARQL,
this can be achieved with the \texttt{COUNT} aggregate; the full query
is given verbatim in Figure~\ref{fig:preds-query}.  This yields a
list of all used predicates with \texttt{?count} being the number of
occurrences (Figure~\ref{fig:preds-result}). Looking at the results,
we find that both the Isabelle and the Coq data sets only use subsets
of the predicates provided by the ULO ontology. The full results are
listed in Appendix~\ref{sec:used}. What stands out is that each
export uses less than a third of the available predicates.
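
\noindent A minimal sketch of such a counting query is shown below. It
assumes standard SPARQL~1.1 aggregates; the variable names
\texttt{?s}, \texttt{?predicate} and \texttt{?o} are illustrative
choices, and the query actually used is the one in
Figure~\ref{fig:preds-query}.

\begin{verbatim}
# Count how often each predicate occurs across all triplets.
SELECT ?predicate (COUNT(?predicate) AS ?count)
WHERE {
    ?s ?predicate ?o .
}
GROUP BY ?predicate
ORDER BY DESC(?count)
\end{verbatim}

\noindent Run against the whole store, this sketch does not
distinguish between exports; restricting it to a single export would
need an additional graph pattern or filter that depends on how the
exports are laid out in the store.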

We also see that the Isabelle and Coq exports use different
predicates.  For example, the Isabelle export contains organizational meta
information such as information about paragraphs and sections in the
source document while the Coq export only tells us about the filename
of the Coq source. That is not particularly problematic as long as we
can trace a given object back to the original source.  Regardless, our
results do show that both exports have their own particularities, and
with more and more third-party libraries exported to ULO, one has to
assume that this heterogeneity will only grow. In particular, we want
to point to the large number of predicates which remain unused in both
the Isabelle and Coq exports. A user formulating queries for ULO might
be oblivious to the fact that a given predicate is only supported by
some of the exports.

While not a problem for \emph{ulo-storage} per se, we do expect this
to be a challenge when building a tetrapodal search
system. Recommended ways around this ``missing fields'' problem in
database literature include the clever use of default values or
inference of missing values~\cite{kdisc, aidisc}, neither of which
feels particularly applicable to an ULO data set.

\input{applications-preds.tex}

\subsection{Querying for Tetrapodal Search}\label{sec:tetraq}