Commit 2127bd46 authored by Andreas Schärtl

report: review stats query

parent b27054b8
 \section{Applications}\label{sec:applications}
-With endpoints in place, we can now query the ULO/RDF data set. This
-section describes some experiments with the \emph{ulo-endpoint}
-Endpoint {API}. In particular, we query the storage backend for some
-statistics, implement some queries suggested for tetrapodal search
-
-\subsection{Exploring Existing Data Sets}
-
-As previously stated, there already exist exports to ULO for both
-Isabelle and Coq libraries~\cite{uloisabelle, ulocoq}. As a very first
-application, we simply look at what ULO predicates are actually used
-by the respective data sets. Implementing such a query is not very
-difficult. In SPARQL, this can be achieved with the \texttt{COUNT}
-aggregate.
+With programming endpoints in place, we can now query the data set containing
+both Isabelle and Coq exports stored in {GraphDB}. We experimented with
+various queries and applications:
+
+\begin{itemize}
+\item Exploring which ULO predicates are actually used and which
+remain unused (Section~\ref{sec:expl}).
+
+\item We ran some queries that were suggested as building blocks
+of a larger tetrapodal search system (Section~\ref{sec:tetraq}).
+
+\item We also experimented with various other more general queries
+for organizational data recommended in literature
+(Section~\ref{sec:miscq}).
+
+\item Finally we built a small web front end that visualizes
+the ULO data set (Section~\ref{sec:webq}).
+\end{itemize}
+
+For each example query or application, we try to describe how to
+implement it, what results we observed and if possible we conclude
+with some recommendations for future development of {ULO}.
+
+\subsection{Exploring Existing Data Sets}\label{sec:expl}
+
+As a very first application, we simply looked at what ULO predicates
+are actually used by the respective data sets. With more than
+250~million triplets in the store, we hoped that this would give us
+some insight into the kind of knowledge we are dealing with.
+Implementing a query for this job is not very difficult. In SPARQL,
+this can be achieved with the \texttt{COUNT} aggregate.
 \begin{lstlisting}
 PREFIX ulo: <https://mathhub.info/ulo#>
@@ -27,7 +46,7 @@ This yields a list of all used predicates with \texttt{?count} being
 the number of occurrences. Looking at the results, we find that both
 the Isabelle and the Coq data sets only use subsets of the predicates
 provided by the ULO ontology. The results are listed in
-figure~\ref{fig:used}. In both cases, the exports use less than a
+Figure~\ref{fig:used}. In both cases, the exports use less than a
 third of the available predicates.

 \input{applications-ulo-table.tex}
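A minimal sketch of such a counting query, assuming we only care about
predicates from the ULO namespace, could look as follows. The variable
names and the \texttt{FILTER} clause are illustrative assumptions and
not necessarily what was run to obtain the results above.

\begin{lstlisting}
PREFIX ulo: <https://mathhub.info/ulo#>

# Count how often each predicate occurs across all stored triples.
SELECT ?predicate (COUNT(?predicate) AS ?count)
WHERE {
    ?s ?predicate ?o .
    # Assumption: keep only predicates from the ULO namespace.
    FILTER(STRSTARTS(STR(?predicate), STR(ulo:)))
}
GROUP BY ?predicate
ORDER BY DESC(?count)
\end{lstlisting}

Grouping by \texttt{?predicate} is what turns \texttt{COUNT} into a
per-predicate tally. Predicates that never occur produce no row at all,
so unused predicates have to be found by comparing the result against
the ontology.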
@@ -37,19 +56,23 @@ predicates. For example, the Isabelle contains organizational meta
 information such as information about paragraphs and sections in the
 source document while the Coq export only tells us about the filename
 of the Coq source. That is not particularly problematic as long as we
-can trace a given object back to the original Isabelle/Coq source.
-
-However, our results do show that both exports have their own
-particularities and with more and more third party libraries exported
-to ULO one can assume that this heterogeneity only grows. In particular
-we want to point to the large number of predicates which remain unused
-in both Isabelle and Coq exports. A user formulating queries for ULO
-might be oblivious to the fact that only subsets of exports support
-given predicates. While not a problem for \emph{ulo-storage} per se,
-we expect this to be a major challenge when building a system of
-tetrapodal search.
-
-\subsection{Querying for Tetrapodal Search}
+can trace a given object back to the original source. Regardless, our
+results do show that both exports have their own particularities and
+with more and more third party libraries exported to ULO one has to
+assume that this heterogeneity will only grow. In particular we want
+to point to the large number of predicates which remain unused in both
+Isabelle and Coq exports. A user formulating queries for ULO might be
+oblivious to the fact that only subsets of exports support given
+predicates.
+
+While not a problem for \emph{ulo-storage} per se, we do expect this
+to be a challenge when building a tetrapodal search
+system. Recommended ways around this ``missing fields'' problem in
+database literature include the clever use of default values or
+inference of missing values~\cite{kdisc, aidisc}, neither of which
+feels particularly applicable to a ULO data set.
+
+\subsection{Querying for Tetrapodal Search}\label{sec:tetraq}

 \emph{ulo-storage} was started with the goal of making organizational
 knowledge available for tetrapodal search. We will first take a look
@@ -259,6 +282,8 @@ proof of concept implementations.
 handled by the database access should be quick.
 \end{itemize}

-\subsection{Other Queries}
+\subsection{Organizational Queries}\label{sec:miscq}

 \emph{{TODO}: SPARQL Queries references in ULO paper}
+
+\subsection{Experience with Building a Web Frontend}\label{sec:webq}
@@ -154,3 +154,20 @@
 year={2013},
 publisher={" O'Reilly Media, Inc."}
 }
+
+@article{kdisc,
+title={Knowledge discovery in databases: An overview},
+author={Frawley, William J and Piatetsky-Shapiro, Gregory and Matheus, Christopher J},
+journal={AI magazine},
+volume={13},
+number={3},
+pages={57--57},
+year={1992}
+}
+
+@inproceedings{aidisc,
+title={Discovering Missing Values in Semi-Structured Databases.},
+author={Yi, Xing and Allan, James and Lavrenko, Victor},
+booktitle={RIAO},
+year={2007}
+}