\section{Applications}\label{sec:applications} With programming endpoints in place, we can now query the data set containing both Isabelle and Coq exports stored in {GraphDB}. We experimented with various queries and applications: \begin{itemize} \item Exploring which ULO predicates are actually used and which remain unused (Section~\ref{sec:expl}). \item We ran some queries that were suggested as building blocks of a larger tetrapodal search system (Section~\ref{sec:tetraq}). \item We also experimented with various other more general queries for organizational data recommended in literature (Section~\ref{sec:miscq}). \item Finally we built a small web front end that takes visualizes the ULO data set (Section~\ref{sec:webq}). \end{itemize} For each example query or application, we try to describe how to implement it, what results we observed and if possible we conclude with some recommendations for future development of {ULO}. \subsection{Exploring Existing Data Sets}\label{sec:expl} As a very first application, we simply looked at what ULO predicates are actually used by the respective data sets. With more than 250~million triplets in the store, we hoped that this would give us some insight into the kind of knowledge we are dealing with. Implementing a query for this job is not very difficult. In SPARQL, this can be achieved with the \texttt{COUNT} aggregate. \begin{lstlisting} PREFIX ulo: <https://mathhub.info/ulo#> SELECT ?predicate (COUNT(?predicate) as ?count) WHERE { ?s ?predicate ?o . } GROUP BY ?predicate ORDER BY DESC(?count) \end{lstlisting} This yields a list of all used predicates with \texttt{?count} being the number of occurrences. Looking at the results, we find that both the Isabelle and the Coq data sets only use subsets of the predicates provided by the ULO ontology. The results are listed in Figure~\ref{fig:used}. In both cases, the exports use less than a third of the available predicates. \input{applications-ulo-table.tex} We also see that the Isabelle and Coq exports use different predicates. For example, the Isabelle contains organizational meta information such as information about paragraphs and sections in the source document while the Coq export only tells us about the filename of the Coq source. That is not particularly problematic as long as we can trace a given object back to the original source. Regardless, our results do show that both exports have their own particularities and with more and more third party libraries exported to ULO one has to assume that this heterogeneity will only grow. In particular we want to point to the large number of predicates which remain unused in both Isabelle and Coq exports. A user formulating queries for ULO might be oblivious to the fact that only subsets of exports support given predicates. While not a problem for \emph{ulo-storage} per se, we do expect this to be a challenge when building a tetrapodal search system. Recommended ways around this ``missing fields'' problem in database literature include the clever use of default values or inference of missing values~\cite{kdisc, aidisc}, neither of which feels particularly applicable to an ULO data set. \subsection{Querying for Tetrapodal Search}\label{sec:tetraq} Various queries for a tetrapodal search system were previously suggested in literature~\cite{tetra}. We investigated how some of them can be realized with ULO data sets and where other data sources are required. Where possible, we evaluate proof of concept implementations. \begin{itemize} \item \textbf{$\mathcal{Q}_1$ ``Find theorems with non-elementary proofs.''} Elementary proofs are those that are considered easy and obvious~\cite{elempro}. In consequence,~$\mathcal{Q}_1$ has to search for all proofs which are not trivial. Of course, just like the distinction between ``theorem'' and ``corollary'' is arbitrary, so is any judgment about whether a proof is easy or not. \textbf{Organizational Aspect} A first working hypothesis might be to assume that easy proofs are short. In that case, the size, that is the number of bytes to store the proof, is our first indicator of proof complexity. {ULO} offers the \texttt{ulo:external-size} predicate which will allow us to sort by file size. Maybe small file size also leads to quick check times in proof assistants and automatic theorem provers. With this assumption in mind we could use the \texttt{ulo:check-time} predicate. Correlating proof complexity with file size allows us to define one indicator of proof complexity based on organizational knowledge alone. \textbf{Other Aspects} A tetrapodal search system should probably also take symbolic knowledge into account. Based on some kind of measure of formula complexity, different proofs could be rated. Similarly, with narrative knowledge available to us, we could count the number of words, references and so on to rate the narrative complexity of a proof. Combining symbolic knowledge, narrative knowledge and organizational knowledge allows us to find proofs which are probably straight forward. \textbf{Implementation} Implementing a naive version of the organizational aspect can be as simple as querying for all theorems justified by proofs, ordered by size (or check time). \begin{lstlisting} PREFIX ulo: <https://mathhub.info/ulo#> SELECT ?theorem ?proof ?size WHERE { ?theorem ulo:theorem ?i . ?proof ulo:justifies ?theorem . ?proof ulo:proof ?j . ?proof ulo:external-size ?size . } ORDER BY DESC(?size) \end{lstlisting} Maybe we want to go one step further and calculate a rating that assigns each proof some numeric score of complexity based on a number of properties. We can achieve this in SPARQL as recent versions support arithmetic as part of the SPARQL specification. \begin{lstlisting} PREFIX ulo: <https://mathhub.info/ulo#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> SELECT ?theorem ?proof ?size ( xsd:float(?size) + xsd:float(?checktime) as ?rating ) WHERE { ?theorem ulo:theorem ?i . ?proof ulo:justifies ?theorem . ?proof ulo:proof ?j . ?proof ulo:external-size ?size . ?proof ulo:check-time ?checktime . } ORDER BY DESC(?rating) \end{lstlisting} Finding a reasonable rating is its own topic of research, but we see that as long as it is based on basic arithmetic, it will be possible to formulate in a SPARQL query. \item \textbf{$\mathcal{Q}_2$ ``Find algorithms that solve $NP$-complete graph problems.''} Here we want the tetrapodal search system to return a listing of algorithms that solve (graph) problems with a given property (runtime complexity). We need to consider where each of these three components might be stored. \textbf{Symbolic and Concrete Aspects} First, let us consider algorithms. Algorithms can be formulated as computer code which can be understood as symbolic knowledge (code represented as a syntax tree) or as concrete knowledge (code as text files)~\cites[pp. 8--9]{tetra}{virt}. Either way, we will not be able to query these indices for what problem a given algorithm is solving, nor is it possible to infer properties as complex as $NP$-completeness automatically. Meta data of this kind needs to be stored in a separate index for organizational knowledge, it being the only fit. \textbf{Organizational Aspect} If we wish to look up properties about algorithms from organizational knowledge, we first have to think about how to represent this information. We propose to approaches, one using the existing ULO ontology and one that recommends extending ULO to nativly support the concept of algorithms. Both approaches have their distinct trade-offs. \textbf{Representing Algorithms} As a first approach, we can try to represent algorithms and problems in terms of existing ULO predicates. As ULO does have a concept of \texttt{theorem} and \texttt{proof}, it might be tempting to exploit the Curry-Howard correspondence and represent algorithms understood as programs in terms of proofs. But that does not really capture the canonical understanding of algorithms; algorithms are not actually programs, rather there are programs that implement algorithms. Even if we do work around this problem, it is not clear how to represent problems (e.g.\ the traveling salesman problem or the sorting problem) in terms of theorems (propositions, types) that get implemented by a proof (algorithm, program). As algorithms make up an important part of certain areas of research, it might be reasonable to introduce native level support for algorithms in ULO or separately in another ontology. An argument for adding support directly to ULO is that ULO aims to be universal and as such should not be without algorithms. An argument for a separate ontology is that what we understand as ULO data sets (Isabelle, Coq exports) already contain triplets from other ontologies (e.g.\ Dublin Core meta data~\cite{dcreport, dcowl}) and keeping concepts separate is not entirely unattractive in itself. \item \textbf{$\mathcal{Q}_3$ ``All areas of math that {Nicolas G.\ de Bruijn} has worked in and his main contributions.''} This query is asking by works of a given author~$A$. It also asks for their main contributions, e.g.\ what paragraphs or code~$A$ has authored. \textbf{Organizational Aspect} ULO has no concept of authors, contributors, dates and so on. Rather, the idea is to take advantage of the Dublin Core project which provides an ontology for such metadata~\cite{dcreport, dcowl}. For example, Dublin Core provides us with the \texttt{dcterms:creator} and \texttt{dcterms:contributor} predicates. Servicing~$\mathcal{Q}_3$ requires us to look for creator~$A$ and then list all associated \texttt{dcterms:title}s that they have worked on. Of course this requires above authorship predicates to actually be in use. With the Isabelle and Coq exports this was hardly the case; running some experiments we found less than 15 unique contributors and creators, raising suspicion that meat data is missing in the original library files. Regardless, in theory ULO allows us to query for objects ordered by authors. Query $\mathcal{Q}_3$ is also asking for the main contributions of~$A$, that is those works that~$A$ authored that are the most important. Importance is a quality measure, simply sorting the result by number of references might be a good start. \textbf{Implementation} A search for contributions by a given author can easily be formulated in {SPARQL}. \begin{lstlisting} PREFIX ulo: <https://mathhub.info/ulo#> PREFIX dcterms: <http://purl.org/dc/terms/> SELECT ?work WHERE { ?work dcterms:creator|dcterms:contributor "John Smith" . } GROUP BY ?work \end{lstlisting} To get the main contributions, we rate each individual \texttt{?work} by its number of \texttt{ulo:uses} references. Extending the {SPARQL} query above, we can query the database for a ordered list of works, starting with the one that has the most references. \begin{lstlisting} PREFIX ulo: <https://mathhub.info/ulo#> PREFIX dcterms: <http://purl.org/dc/terms/> SELECT ?work (COUNT(?user) as ?refcount) WHERE { ?work dcterms:creator|dcterms:contributor "John Smith" . ?user ulo:uses ?work . } GROUP BY ?work ORDER BY DESC(?refcount) \end{lstlisting} We can formulate~$\mathcal{Q}_3$ with just one SPARQL query. Because everything is handled by the database, access should be quick. \end{itemize} Experimenting with $\mathcal{Q}_1$ to $\mathcal{Q}_3$ provided us with some insight into ULO and existing ULO exports. $\mathcal{Q}_1$ shows that while there is no formal definition for ``elementary proof'', ULO allows us to query for heuristics and calculate a difficulty score for proofs and their associated theorems. Query~$\mathcal{Q}_2$ illustrates the difficulty in finding universal schemas. It remains an open question whether ULO should include algorithms as a first class citizen, as a concept based around existing ULO predicates or whether it is a better idea to design a dedicated ontology and potentially data store entirely. Finally, while we were able to formulate a SPARQL query that should take care of most of~$\mathcal{Q}_3$ we found that the existing data sets contain very little information about authorship. This underlines the observations made in Section~\label{sec:expl}, developers writing applications that query ULO storage need to be aware of the fact that existing exports have ``holes''. \subsection{Organizational Queries}\label{sec:miscq} Other organizational queries for organizational data sets have previously been proposed~\cite{adl}. Here we implement some of them. \subsection{Experience with Building a Web Frontend}\label{sec:webq}