report: write about Q1 to Q5

They are actually Q{1,2,3,7} from the original tetrapodal search paper. It seems they are those queries that relate to organizational data.

report: write about Q1 to Q5
25ed50df · Andreas Schärtl · 243f1296 · 25ed50df · 25ed50df
Commit 25ed50df authored 4 years ago by Andreas Schärtl
--- a/doc/report/applications.tex
+++ b/doc/report/applications.tex
@@ -9,71 +9,128 @@ and approaches to querying the database might make sense.

 \emph{ulo-storage} was started with the goal of making organizational
 knowledge available for tetrapodal search. We will first take a look
-at how ULO/RDF performs at this task. Conviniently, various queries
+at how ULO/RDF performs at this task. Conveniently, various queries
 for a tetrapodal search system were suggested in~\cite{tetra}; we will
-investigate how well each of the suggested queries~$\mathcal{Q}_{1}$
-to~$\mathcal{Q}_{13}$ can be realized with ULO/RDF datasets. Where
-possible, we evaluate proof of concept implementations.
-
-\subsubsection*{$\mathcal{Q}_{1}$ Find theorems with non-elementary proofs.}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{2}$ Find algorithms that solve $NP$-complete graph problems.}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{3}$ Find integer sequences whose generating function is a rational
-  polynomial in $\sin(x)$ that has a Maple implementation not affected
-  by the bug in module~$x$.}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{4}$ $CAS$ implementation of Groebner bases that conform to a
-  definition in AFP.}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{5}$ Find all group representations that are good for~$X$ (say a
-  software engineer working on something and doesn't know group
-  theory), maybe ``computing with in/finite groups''.}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{6}$ Math software systems that implement algorithms from MSC48CXX
-  (or that compute a particular thing).}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{7}$ All areas of math that {Nicolas G.\ de Bruijn} has worked in and
-  his main contributions.}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{8}$ All the researchers that have worked on problem~$X$ (where~$X$
-  does not have a good name, maybe connected to ``Go'').}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{9}$ Areas of mathematics that immediate descendants of~$X$ worked
-  on.}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{10}$ All graphs whose order is larger than the publication record of
-  its ``inventor'' (name patron).}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{11}$ Integer sequences that grow sub-exponentially.}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{12}$ Published integer sequences not listed in the OEIS.}
-
-Here be dragons
-
-\subsubsection*{$\mathcal{Q}_{13}$ Find all polynomials whose list of coefficients occurs as a
-  subsequence of a specific OEIS sequence.}
-
-Here be dragons
+investigate how some of them can be realized with ULO/RDF data sets
+and where other data sources are required. Where possible, we evaluate
+proof of concept implementations.
+
+\begin{itemize}
+    \item \textbf{$\mathcal{Q}_{1}$ ``Find theorems with non-elementary
+    proofs.''}  Elementary proofs are those that are considered easy and
+    obvious. In consequence,~$\mathcal{Q}_{1}$ asks for all proofs
+    which are more difficult and not trivial. Of course, just like the
+    distinction between ``theorem'' and ``corollary'' is arbitrary, so
+    is any judgment about whether a proof is elementary or not.
+
+    A first working hypothesis might be to assume that easy proofs are
+    short. In that case, the size, that is the number of bytes to
+    store the proof, is our first indicator of proof
+    complexity. ULO/RDF offers the \texttt{ulo:external-size}
+    predicate which will allow us to sort by file size. Maybe small
+    file size also leads to quick check times in proof assistants and
+    automatic theorem provers. With this assumption in mind we could
+    use the \texttt{ulo:check-time} predicate. Correlating proof
+    complexity with file size allows us to answer this query with
+    organizational data based on {ULO/RDF}.
+
+    A tetrapodal search system should probably also take symbolic
+    knowledge into account. Based on some kind of measure of formula
+    complexity, different proofs could be rated. Similarly, with
+    narrative knowledge available to us, we could count the number of
+    words, references and so on to rate the narrative complexity of a
+    proof. This shows that combining symbolic knowledge, narrative
+    knowledge and organizational knowledge could allow us to
+    service~$\mathcal{Q}_{1}$ in a heuristic fashion.
+
+    \item \textbf{$\mathcal{Q}_{2}$ ``Find algorithms that solve
+    $NP$-complete graph problems.''} Here we want the tetrapodal search
+    system to return a listing of algorithms that solve (graph)
+    problems with a given property (runtime complexity). We need
+    to consider where each of these three components might be stored.
+
+    First, let us consider algorithms. Algorithms can be formulated as
+    computer code which should be accessible from storage for symbolic
+    knowledge. But this encodes just the algorithm itself, not what
+    the algorithm is for and nor is it possible to quickly evaluate
+    properties such as $NP$~completeness for querying. They need to be
+    stored in a separate index.
+
+    It seems that what an algorithm implements and with which
+    properties is a case of organizational knowledge. With ULO, we
+    could search for an algorithm identified by a \texttt{ulo:name},
+    but that leaves much to be desired. Maybe what is missing in
+    ULO/RDF is a way to define ``problems~$\phi$'', e.g.\ the
+    traveling salesman problem, and a way of expressing that a given
+    object ``computes~$\phi$''. With problem~$\phi$ we could associate
+    theorems such as ``$\phi$ is in $NP$''.
+
+    At this point it appears that our ULO/RDF data sets can not help
+    us with answering query~$\mathcal{Q}_{2}$. It remains a topic
+    for discussion whether support for algorithms should be added
+    to the upper level ontology or whether such concepts should be
+    built on top of the existing predicates.
+
+    \item \textbf{$\mathcal{Q}_{3}$ ``Find integer sequences whose generating
+    function is a rational polynomial in $\sin(x)$ that has a Maple
+    implementation not not affected by the bug in module~$x$.''} We
+    see that this query is about finding specific instances, integer
+    sequences, with some property.
+
+   This is a case where information would be split between many
+   sources.  The name of the sequences is part of the organizational
+   data set, the generating function is part of symbolic knowledge,
+   the Maple implementation could be part of concrete knowledge.
+
+   Handling this query would probably start by filtering for all
+   integer sequences. It is not clear how this should be achieved with
+   ULO/RDF as it contains no unified concept of a sequence.  It might
+   be possible to take advantage of \texttt{aligned-with} or some
+   similar concept to find all such sequences. If this succeeds, an
+   ULO index can provide the first step in servicing this query.
+
+   As for the next steps, finding concrete algorithms and in
+   particular looking inside of them is not organizational data and
+   other indices will need to be queried. That said, it is an open
+   question whether ULO should contain more information (e.g.\ about
+   properties of the generating function) or whether such information
+   can be deduced from symbolic knowledge.
+
+   \item \textbf{$\mathcal{Q}_{4}$ ``CAS implementation of Gröbner bases that
+   conform to a definition in AFP.''} Gröbner Bases are a field of
+   study in mathematics particular attractive for use in computer
+   algebra systems (CAS)~\cite{groebner}. This query is asking for
+   concrete implementations of Gröbner Bases that match the definition
+   in the Archive of Formal Proofs (AFP).
+
+   We do have ULO/RDF exports for the AFP~\cite{uloisabelle}. Stated
+   like this, we can probably assume that $\mathcal{Q}_{4}$ is a query
+   for a very specific definition, identified by an ULO {URI}. No smart
+   queries necessary. What is missing is the set of implementations,
+   that is symbolic knowledge about actual implementations, and a way
+   of matching this symbolic knowledge with the definition in
+   {AFP}. While surely an interesting problem, it is not a task for
+   organizational knowledge.
+
+   \item \textbf{$\mathcal{Q}_{5}$ ``All areas of math that {Nicolas G.\
+   de Bruijn} has worked in and his main contributions.''}  This query
+   is asking by works of a given author~$A$.  It also ask for their
+   main contributions, e.g.\ what paragraphs or code~$A$ has authored.
+
+   ULO has no concept of authors, contributors dates and so
+   on. Rather, the idea is to take advantage of the Dublin Core
+   project which provides an ontology for such
+   metadata~\cite{dcreport, dcowl}. For example, Dublin Core provides
+   us with the \texttt{dcterms:creator} predicate. Servicing this
+   query would mean looking for the creator~$A$ and then listing all
+   associated \texttt{dcterms:title} that~$A$ has worked on. For a
+   first working version, the exports managed by \emph{ulo-storage}
+   are enough to service this query.
+
+   As~$\mathcal{Q}_{5}$ is also asking for the main contributions
+   of~$A$, that is those works that~$A$ authored that are the most
+   important. Importance is a quality measure, simply sorting the
+   result by number of references might be a good start. Again, this
+   is something that should serviceable with just organizational
+   knowledge.
+\end{itemize}
--- a/doc/report/references.bib
+++ b/doc/report/references.bib
@@ -95,3 +95,24 @@
    chapter={4},
    publisher={MIT press}
 }
+
+@article{groebner,
+    title={Gr{\"o}bner bases algorithm},
+    author={Ajwa, Iyad A and Liu, Zhuojun and Wang, Paul S},
+    journal={Technical Reports of the Institute for Computational Mathematics, ICM-199502-00 (versi{\'o} del 2003) Kent State University},
+    year={1995}
+}
+
+@techreport{dcreport,
+    title={The Dublin core metadata element set},
+    author={Kunze, John and Baker, Thomas},
+    year={2007},
+    institution={RFC 5013, August}
+}
+
+@online{dcowl,
+    title={DCMI Metadata expressed in RDF Schema Language},
+    organization = {Dublin Core Metadata Initiative},
+    urldate = {2020-06-30},
+    url = {https://www.dublincore.org/schemas/rdfs/},
+}
\ No newline at end of file