Skip to content
Snippets Groups Projects
applications.tex 8.31 KiB
Newer Older
  • Learn to ignore specific revisions
  • \section{Applications}\label{sec:applications}
    
    
    With endpoints in place, we can now query the ULO/RDF
    data set. Depending on the kind of application, different interfaces
    and approaches to querying the database might make sense.
    
    
    \subsection{Querying for Tetrapodal Search}
    
    
    \emph{ulo-storage} was started with the goal of making organizational
    knowledge available for tetrapodal search. We will first take a look
    
    at how ULO/RDF performs at this task. Conveniently, various queries
    
    for a tetrapodal search system were suggested in~\cite{tetra}; we will
    
    investigate how some of them can be realized with ULO/RDF data sets
    and where other data sources are required. Where possible, we evaluate
    proof of concept implementations.
    
    \begin{itemize}
        \item \textbf{$\mathcal{Q}_{1}$ ``Find theorems with non-elementary
        proofs.''}  Elementary proofs are those that are considered easy and
        obvious. In consequence,~$\mathcal{Q}_{1}$ asks for all proofs
        which are more difficult and not trivial. Of course, just like the
        distinction between ``theorem'' and ``corollary'' is arbitrary, so
        is any judgment about whether a proof is elementary or not.
    
        A first working hypothesis might be to assume that easy proofs are
        short. In that case, the size, that is the number of bytes to
        store the proof, is our first indicator of proof
        complexity. ULO/RDF offers the \texttt{ulo:external-size}
        predicate which will allow us to sort by file size. Maybe small
        file size also leads to quick check times in proof assistants and
        automatic theorem provers. With this assumption in mind we could
        use the \texttt{ulo:check-time} predicate. Correlating proof
        complexity with file size allows us to answer this query with
        organizational data based on {ULO/RDF}.
    
        A tetrapodal search system should probably also take symbolic
        knowledge into account. Based on some kind of measure of formula
        complexity, different proofs could be rated. Similarly, with
        narrative knowledge available to us, we could count the number of
        words, references and so on to rate the narrative complexity of a
        proof. This shows that combining symbolic knowledge, narrative
        knowledge and organizational knowledge could allow us to
        service~$\mathcal{Q}_{1}$ in a heuristic fashion.
    
        \item \textbf{$\mathcal{Q}_{2}$ ``Find algorithms that solve
        $NP$-complete graph problems.''} Here we want the tetrapodal search
        system to return a listing of algorithms that solve (graph)
        problems with a given property (runtime complexity). We need
        to consider where each of these three components might be stored.
    
        First, let us consider algorithms. Algorithms can be formulated as
        computer code which should be accessible from storage for symbolic
        knowledge. But this encodes just the algorithm itself, not what
        the algorithm is for and nor is it possible to quickly evaluate
        properties such as $NP$~completeness for querying. They need to be
        stored in a separate index.
    
        It seems that what an algorithm implements and with which
        properties is a case of organizational knowledge. With ULO, we
        could search for an algorithm identified by a \texttt{ulo:name},
        but that leaves much to be desired. Maybe what is missing in
        ULO/RDF is a way to define ``problems~$\phi$'', e.g.\ the
        traveling salesman problem, and a way of expressing that a given
        object ``computes~$\phi$''. With problem~$\phi$ we could associate
        theorems such as ``$\phi$ is in $NP$''.
    
        At this point it appears that our ULO/RDF data sets can not help
        us with answering query~$\mathcal{Q}_{2}$. It remains a topic
        for discussion whether support for algorithms should be added
        to the upper level ontology or whether such concepts should be
        built on top of the existing predicates.
    
        \item \textbf{$\mathcal{Q}_{3}$ ``Find integer sequences whose generating
        function is a rational polynomial in $\sin(x)$ that has a Maple
        implementation not not affected by the bug in module~$x$.''} We
        see that this query is about finding specific instances, integer
        sequences, with some property.
    
       This is a case where information would be split between many
       sources.  The name of the sequences is part of the organizational
       data set, the generating function is part of symbolic knowledge,
       the Maple implementation could be part of concrete knowledge.
    
       Handling this query would probably start by filtering for all
       integer sequences. It is not clear how this should be achieved with
       ULO/RDF as it contains no unified concept of a sequence.  It might
       be possible to take advantage of \texttt{aligned-with} or some
       similar concept to find all such sequences. If this succeeds, an
       ULO index can provide the first step in servicing this query.
    
       As for the next steps, finding concrete algorithms and in
       particular looking inside of them is not organizational data and
       other indices will need to be queried. That said, it is an open
       question whether ULO should contain more information (e.g.\ about
       properties of the generating function) or whether such information
       can be deduced from symbolic knowledge.
    
       \item \textbf{$\mathcal{Q}_{4}$ ``CAS implementation of Gröbner bases that
       conform to a definition in AFP.''} Gröbner Bases are a field of
       study in mathematics particular attractive for use in computer
       algebra systems (CAS)~\cite{groebner}. This query is asking for
       concrete implementations of Gröbner Bases that match the definition
       in the Archive of Formal Proofs (AFP).
    
       We do have ULO/RDF exports for the AFP~\cite{uloisabelle}. Stated
       like this, we can probably assume that $\mathcal{Q}_{4}$ is a query
       for a very specific definition, identified by an ULO {URI}. No smart
       queries necessary. What is missing is the set of implementations,
       that is symbolic knowledge about actual implementations, and a way
       of matching this symbolic knowledge with the definition in
       {AFP}. While surely an interesting problem, it is not a task for
       organizational knowledge.
    
       \item \textbf{$\mathcal{Q}_{5}$ ``All areas of math that {Nicolas G.\
       de Bruijn} has worked in and his main contributions.''}  This query
       is asking by works of a given author~$A$.  It also ask for their
       main contributions, e.g.\ what paragraphs or code~$A$ has authored.
    
    
       \textbf{Organizational Aspect} ULO has no concept of authors,
       contributors dates and so on. Rather, the idea is to take advantage
       of the Dublin Core project which provides an ontology for such
    
       metadata~\cite{dcreport, dcowl}. For example, Dublin Core provides
       us with the \texttt{dcterms:creator} predicate. Servicing this
       query would mean looking for the creator~$A$ and then listing all
       associated \texttt{dcterms:title} that~$A$ has worked on. For a
       first working version, the exports managed by \emph{ulo-storage}
       are enough to service this query.
    
       As~$\mathcal{Q}_{5}$ is also asking for the main contributions
       of~$A$, that is those works that~$A$ authored that are the most
       important. Importance is a quality measure, simply sorting the
       result by number of references might be a good start. Again, this
       is something that should serviceable with just organizational
       knowledge.
    
    
       \textbf{Implementation} Search for contributions by a given author
       can easily be formulated in {SPARQL}.
       \begin{lstlisting}
        PREFIX ulo: <https://mathhub.info/ulo#>
        PREFIX dcterms: <http://purl.org/dc/terms/>
    
        SELECT ?work
        WHERE {
            ?work dcterms:contributor "John Smith" .
        }
        GROUP BY ?work
       \end{lstlisting}
       To get all main contributions, we rate each
       individual \texttt{?work} by its number of \texttt{ulo:uses}
       references. Extending the {SPARQL} query, we can query the database
       for a ordered list of works, starting with the one that has the
       most references.
       \begin{lstlisting}
        PREFIX ulo: <https://mathhub.info/ulo#>
        PREFIX dcterms: <http://purl.org/dc/terms/>
    
        SELECT ?work (COUNT(?user) as ?refcount)
        WHERE {
            ?work dcterms:contributor "John Smith" .
            ?user ulo:uses ?work .
        }
        GROUP BY ?work
        ORDER BY DESC(?refcount)
       \end{lstlisting}
       We see that we can formulate the idea behind~$\mathcal{Q}_{5}$ with
       one not very complicated SPARQL query. Because here everything is
       handled by the database access should be quick.
    
    \end{itemize}