diff --git a/doc/report/applications.tex b/doc/report/applications.tex index e4097b82b34da820d21b78c32b3e4fc7f20104e4..5288776710ca0b8bee100eefb1a7e7b3c5b60b73 100644 --- a/doc/report/applications.tex +++ b/doc/report/applications.tex @@ -9,71 +9,128 @@ and approaches to querying the database might make sense. \emph{ulo-storage} was started with the goal of making organizational knowledge available for tetrapodal search. We will first take a look -at how ULO/RDF performs at this task. Conviniently, various queries +at how ULO/RDF performs at this task. Conveniently, various queries for a tetrapodal search system were suggested in~\cite{tetra}; we will -investigate how well each of the suggested queries~$\mathcal{Q}_{1}$ -to~$\mathcal{Q}_{13}$ can be realized with ULO/RDF datasets. Where -possible, we evaluate proof of concept implementations. - -\subsubsection*{$\mathcal{Q}_{1}$ Find theorems with non-elementary proofs.} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{2}$ Find algorithms that solve $NP$-complete graph problems.} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{3}$ Find integer sequences whose generating function is a rational - polynomial in $\sin(x)$ that has a Maple implementation not affected - by the bug in module~$x$.} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{4}$ $CAS$ implementation of Groebner bases that conform to a - definition in AFP.} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{5}$ Find all group representations that are good for~$X$ (say a - software engineer working on something and doesn't know group - theory), maybe ``computing with in/finite groups''.} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{6}$ Math software systems that implement algorithms from MSC48CXX - (or that compute a particular thing).} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{7}$ All areas of math that {Nicolas G.\ de Bruijn} has worked in and - his main contributions.} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{8}$ All the researchers that have worked on problem~$X$ (where~$X$ - does not have a good name, maybe connected to ``Go'').} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{9}$ Areas of mathematics that immediate descendants of~$X$ worked - on.} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{10}$ All graphs whose order is larger than the publication record of - its ``inventor'' (name patron).} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{11}$ Integer sequences that grow sub-exponentially.} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{12}$ Published integer sequences not listed in the OEIS.} - -Here be dragons - -\subsubsection*{$\mathcal{Q}_{13}$ Find all polynomials whose list of coefficients occurs as a - subsequence of a specific OEIS sequence.} - -Here be dragons +investigate how some of them can be realized with ULO/RDF data sets +and where other data sources are required. Where possible, we evaluate +proof of concept implementations. + +\begin{itemize} + \item \textbf{$\mathcal{Q}_{1}$ ``Find theorems with non-elementary + proofs.''} Elementary proofs are those that are considered easy and + obvious. In consequence,~$\mathcal{Q}_{1}$ asks for all proofs + which are more difficult and not trivial. Of course, just like the + distinction between ``theorem'' and ``corollary'' is arbitrary, so + is any judgment about whether a proof is elementary or not. + + A first working hypothesis might be to assume that easy proofs are + short. In that case, the size, that is the number of bytes to + store the proof, is our first indicator of proof + complexity. ULO/RDF offers the \texttt{ulo:external-size} + predicate which will allow us to sort by file size. Maybe small + file size also leads to quick check times in proof assistants and + automatic theorem provers. With this assumption in mind we could + use the \texttt{ulo:check-time} predicate. Correlating proof + complexity with file size allows us to answer this query with + organizational data based on {ULO/RDF}. + + A tetrapodal search system should probably also take symbolic + knowledge into account. Based on some kind of measure of formula + complexity, different proofs could be rated. Similarly, with + narrative knowledge available to us, we could count the number of + words, references and so on to rate the narrative complexity of a + proof. This shows that combining symbolic knowledge, narrative + knowledge and organizational knowledge could allow us to + service~$\mathcal{Q}_{1}$ in a heuristic fashion. + + \item \textbf{$\mathcal{Q}_{2}$ ``Find algorithms that solve + $NP$-complete graph problems.''} Here we want the tetrapodal search + system to return a listing of algorithms that solve (graph) + problems with a given property (runtime complexity). We need + to consider where each of these three components might be stored. + + First, let us consider algorithms. Algorithms can be formulated as + computer code which should be accessible from storage for symbolic + knowledge. But this encodes just the algorithm itself, not what + the algorithm is for and nor is it possible to quickly evaluate + properties such as $NP$~completeness for querying. They need to be + stored in a separate index. + + It seems that what an algorithm implements and with which + properties is a case of organizational knowledge. With ULO, we + could search for an algorithm identified by a \texttt{ulo:name}, + but that leaves much to be desired. Maybe what is missing in + ULO/RDF is a way to define ``problems~$\phi$'', e.g.\ the + traveling salesman problem, and a way of expressing that a given + object ``computes~$\phi$''. With problem~$\phi$ we could associate + theorems such as ``$\phi$ is in $NP$''. + + At this point it appears that our ULO/RDF data sets can not help + us with answering query~$\mathcal{Q}_{2}$. It remains a topic + for discussion whether support for algorithms should be added + to the upper level ontology or whether such concepts should be + built on top of the existing predicates. + + \item \textbf{$\mathcal{Q}_{3}$ ``Find integer sequences whose generating + function is a rational polynomial in $\sin(x)$ that has a Maple + implementation not not affected by the bug in module~$x$.''} We + see that this query is about finding specific instances, integer + sequences, with some property. + + This is a case where information would be split between many + sources. The name of the sequences is part of the organizational + data set, the generating function is part of symbolic knowledge, + the Maple implementation could be part of concrete knowledge. + + Handling this query would probably start by filtering for all + integer sequences. It is not clear how this should be achieved with + ULO/RDF as it contains no unified concept of a sequence. It might + be possible to take advantage of \texttt{aligned-with} or some + similar concept to find all such sequences. If this succeeds, an + ULO index can provide the first step in servicing this query. + + As for the next steps, finding concrete algorithms and in + particular looking inside of them is not organizational data and + other indices will need to be queried. That said, it is an open + question whether ULO should contain more information (e.g.\ about + properties of the generating function) or whether such information + can be deduced from symbolic knowledge. + + \item \textbf{$\mathcal{Q}_{4}$ ``CAS implementation of Gröbner bases that + conform to a definition in AFP.''} Gröbner Bases are a field of + study in mathematics particular attractive for use in computer + algebra systems (CAS)~\cite{groebner}. This query is asking for + concrete implementations of Gröbner Bases that match the definition + in the Archive of Formal Proofs (AFP). + + We do have ULO/RDF exports for the AFP~\cite{uloisabelle}. Stated + like this, we can probably assume that $\mathcal{Q}_{4}$ is a query + for a very specific definition, identified by an ULO {URI}. No smart + queries necessary. What is missing is the set of implementations, + that is symbolic knowledge about actual implementations, and a way + of matching this symbolic knowledge with the definition in + {AFP}. While surely an interesting problem, it is not a task for + organizational knowledge. + + \item \textbf{$\mathcal{Q}_{5}$ ``All areas of math that {Nicolas G.\ + de Bruijn} has worked in and his main contributions.''} This query + is asking by works of a given author~$A$. It also ask for their + main contributions, e.g.\ what paragraphs or code~$A$ has authored. + + ULO has no concept of authors, contributors dates and so + on. Rather, the idea is to take advantage of the Dublin Core + project which provides an ontology for such + metadata~\cite{dcreport, dcowl}. For example, Dublin Core provides + us with the \texttt{dcterms:creator} predicate. Servicing this + query would mean looking for the creator~$A$ and then listing all + associated \texttt{dcterms:title} that~$A$ has worked on. For a + first working version, the exports managed by \emph{ulo-storage} + are enough to service this query. + + As~$\mathcal{Q}_{5}$ is also asking for the main contributions + of~$A$, that is those works that~$A$ authored that are the most + important. Importance is a quality measure, simply sorting the + result by number of references might be a good start. Again, this + is something that should serviceable with just organizational + knowledge. +\end{itemize} diff --git a/doc/report/references.bib b/doc/report/references.bib index 7c96598326aa453347ae41f6a6924993cd14d330..74d92ba94d4db24aca50342ae6720416d7146dab 100644 --- a/doc/report/references.bib +++ b/doc/report/references.bib @@ -95,3 +95,24 @@ chapter={4}, publisher={MIT press} } + +@article{groebner, + title={Gr{\"o}bner bases algorithm}, + author={Ajwa, Iyad A and Liu, Zhuojun and Wang, Paul S}, + journal={Technical Reports of the Institute for Computational Mathematics, ICM-199502-00 (versi{\'o} del 2003) Kent State University}, + year={1995} +} + +@techreport{dcreport, + title={The Dublin core metadata element set}, + author={Kunze, John and Baker, Thomas}, + year={2007}, + institution={RFC 5013, August} +} + +@online{dcowl, + title={DCMI Metadata expressed in RDF Schema Language}, + organization = {Dublin Core Metadata Initiative}, + urldate = {2020-06-30}, + url = {https://www.dublincore.org/schemas/rdfs/}, +} \ No newline at end of file