Select Git revision
applications.tex
applications.tex 18.72 KiB
\section{Applications}\label{sec:applications}
With programming endpoints in place, we can now query the data set
containing both Isabelle and Coq exports stored in {GraphDB}. We
experimented with the following applications that talk to a GraphDB
Endpoint, our second contribution.
\begin{itemize}
\item Exploring which ULO predicates are actually used in the
existing Coq and Isabelle exports. We find that more than two
thirds of existing ULO predicates were not taken advantage of
(Section~\ref{sec:expl}).
\item Providing an interactive interface for browsing the data
set. Our implementation is limited to listing basic information
about contributors and their work (Section~\ref{sec:exp}).
\item We investigated queries that could be used to extend the
system into a larger tetrapodal search system. While some
organizational queries have obvious canonical solutions others
introduce questions on how organizational knowledge should be
structured (Section~\ref{sec:tetraq}).
\end{itemize}
\noindent These applications will now be discussed in the following
sections.
\subsection{Exploring Existing Data Sets}\label{sec:expl}
For our first application, we looked at which ULO predicates are
actually used by the respective data sets. With more than 250~million
triplets in the store, we hoped that this would give us some insight
into the kind of knowledge we are dealing with.
Implementing a query for this job is not very difficult. In SPARQL,
this can be achieved with the \texttt{COUNT} aggregate, the full query
is given in verbatim in Figure~\ref{fig:preds-query}. Our query
yields a list of all used predicates together with the number of
occurrences (Figure~\ref{fig:preds-result}). Looking at the results,
we find that both the Isabelle and the Coq data sets only use subsets
of the predicates provided by~{ULO}. The full results are listed in
Appendix~\ref{sec:used}. In both cases, what stands out is that either
export uses less than a third of all available ULO~predicates.
We also see that the Isabelle and Coq exports use different
predicates. For example, the Isabelle export contains organizational
meta information such as information about paragraphs and sections in
the source document while the Coq export only tells us the filename of
the original Coq source. Both exports have their particularities and
with more and more third party libraries exported to ULO one has to
assume that this heterogeneity will only grow. In particular we want
to point to the large number of predicates which remain unused in both
Isabelle and Coq exports. A user formulating queries for ULO might be
oblivious to the fact that only subsets of exports support given
predicates.
\input{applications-preds.tex}
We expect the difference between Coq and Isabelle exports to be caused
by the difference in source material. It is only natural that
different third party libraries expressed in different languages with
different capabilities will result in different
ULO~predicates. Regardless, we want to hint at the possibility that
this could also be an omission in the exporter code that originally
generated the RDF~triplets. This shows the importance of writing good
exporters. Exporters translating existing libraries to ULO~triplets
must lose as little information as possible to ensure good results in
a larger tetrapodal search system.
Our first application gave us an initial impression of the structure