Skip to content
Snippets Groups Projects
Commit 25ed50df authored by Andreas Schärtl's avatar Andreas Schärtl
Browse files

report: write about Q1 to Q5

They are actually Q{1,2,3,7} from the original tetrapodal search paper.
It seems they are those queries that relate to organizational data.
parent 243f1296
No related branches found
No related tags found
No related merge requests found
...@@ -9,71 +9,128 @@ and approaches to querying the database might make sense. ...@@ -9,71 +9,128 @@ and approaches to querying the database might make sense.
\emph{ulo-storage} was started with the goal of making organizational \emph{ulo-storage} was started with the goal of making organizational
knowledge available for tetrapodal search. We will first take a look knowledge available for tetrapodal search. We will first take a look
at how ULO/RDF performs at this task. Conviniently, various queries at how ULO/RDF performs at this task. Conveniently, various queries
for a tetrapodal search system were suggested in~\cite{tetra}; we will for a tetrapodal search system were suggested in~\cite{tetra}; we will
investigate how well each of the suggested queries~$\mathcal{Q}_{1}$ investigate how some of them can be realized with ULO/RDF data sets
to~$\mathcal{Q}_{13}$ can be realized with ULO/RDF datasets. Where and where other data sources are required. Where possible, we evaluate
possible, we evaluate proof of concept implementations. proof of concept implementations.
\subsubsection*{$\mathcal{Q}_{1}$ Find theorems with non-elementary proofs.} \begin{itemize}
\item \textbf{$\mathcal{Q}_{1}$ ``Find theorems with non-elementary
Here be dragons proofs.''} Elementary proofs are those that are considered easy and
obvious. In consequence,~$\mathcal{Q}_{1}$ asks for all proofs
\subsubsection*{$\mathcal{Q}_{2}$ Find algorithms that solve $NP$-complete graph problems.} which are more difficult and not trivial. Of course, just like the
distinction between ``theorem'' and ``corollary'' is arbitrary, so
Here be dragons is any judgment about whether a proof is elementary or not.
\subsubsection*{$\mathcal{Q}_{3}$ Find integer sequences whose generating function is a rational A first working hypothesis might be to assume that easy proofs are
polynomial in $\sin(x)$ that has a Maple implementation not affected short. In that case, the size, that is the number of bytes to
by the bug in module~$x$.} store the proof, is our first indicator of proof
complexity. ULO/RDF offers the \texttt{ulo:external-size}
Here be dragons predicate which will allow us to sort by file size. Maybe small
file size also leads to quick check times in proof assistants and
\subsubsection*{$\mathcal{Q}_{4}$ $CAS$ implementation of Groebner bases that conform to a automatic theorem provers. With this assumption in mind we could
definition in AFP.} use the \texttt{ulo:check-time} predicate. Correlating proof
complexity with file size allows us to answer this query with
Here be dragons organizational data based on {ULO/RDF}.
\subsubsection*{$\mathcal{Q}_{5}$ Find all group representations that are good for~$X$ (say a A tetrapodal search system should probably also take symbolic
software engineer working on something and doesn't know group knowledge into account. Based on some kind of measure of formula
theory), maybe ``computing with in/finite groups''.} complexity, different proofs could be rated. Similarly, with
narrative knowledge available to us, we could count the number of
Here be dragons words, references and so on to rate the narrative complexity of a
proof. This shows that combining symbolic knowledge, narrative
\subsubsection*{$\mathcal{Q}_{6}$ Math software systems that implement algorithms from MSC48CXX knowledge and organizational knowledge could allow us to
(or that compute a particular thing).} service~$\mathcal{Q}_{1}$ in a heuristic fashion.
Here be dragons \item \textbf{$\mathcal{Q}_{2}$ ``Find algorithms that solve
$NP$-complete graph problems.''} Here we want the tetrapodal search
\subsubsection*{$\mathcal{Q}_{7}$ All areas of math that {Nicolas G.\ de Bruijn} has worked in and system to return a listing of algorithms that solve (graph)
his main contributions.} problems with a given property (runtime complexity). We need
to consider where each of these three components might be stored.
Here be dragons
First, let us consider algorithms. Algorithms can be formulated as
\subsubsection*{$\mathcal{Q}_{8}$ All the researchers that have worked on problem~$X$ (where~$X$ computer code which should be accessible from storage for symbolic
does not have a good name, maybe connected to ``Go'').} knowledge. But this encodes just the algorithm itself, not what
the algorithm is for and nor is it possible to quickly evaluate
Here be dragons properties such as $NP$~completeness for querying. They need to be
stored in a separate index.
\subsubsection*{$\mathcal{Q}_{9}$ Areas of mathematics that immediate descendants of~$X$ worked
on.} It seems that what an algorithm implements and with which
properties is a case of organizational knowledge. With ULO, we
Here be dragons could search for an algorithm identified by a \texttt{ulo:name},
but that leaves much to be desired. Maybe what is missing in
\subsubsection*{$\mathcal{Q}_{10}$ All graphs whose order is larger than the publication record of ULO/RDF is a way to define ``problems~$\phi$'', e.g.\ the
its ``inventor'' (name patron).} traveling salesman problem, and a way of expressing that a given
object ``computes~$\phi$''. With problem~$\phi$ we could associate
Here be dragons theorems such as ``$\phi$ is in $NP$''.
\subsubsection*{$\mathcal{Q}_{11}$ Integer sequences that grow sub-exponentially.} At this point it appears that our ULO/RDF data sets can not help
us with answering query~$\mathcal{Q}_{2}$. It remains a topic
Here be dragons for discussion whether support for algorithms should be added
to the upper level ontology or whether such concepts should be
\subsubsection*{$\mathcal{Q}_{12}$ Published integer sequences not listed in the OEIS.} built on top of the existing predicates.
Here be dragons \item \textbf{$\mathcal{Q}_{3}$ ``Find integer sequences whose generating
function is a rational polynomial in $\sin(x)$ that has a Maple
\subsubsection*{$\mathcal{Q}_{13}$ Find all polynomials whose list of coefficients occurs as a implementation not not affected by the bug in module~$x$.''} We
subsequence of a specific OEIS sequence.} see that this query is about finding specific instances, integer
sequences, with some property.
Here be dragons
This is a case where information would be split between many
sources. The name of the sequences is part of the organizational
data set, the generating function is part of symbolic knowledge,
the Maple implementation could be part of concrete knowledge.
Handling this query would probably start by filtering for all
integer sequences. It is not clear how this should be achieved with
ULO/RDF as it contains no unified concept of a sequence. It might
be possible to take advantage of \texttt{aligned-with} or some
similar concept to find all such sequences. If this succeeds, an
ULO index can provide the first step in servicing this query.
As for the next steps, finding concrete algorithms and in
particular looking inside of them is not organizational data and
other indices will need to be queried. That said, it is an open
question whether ULO should contain more information (e.g.\ about
properties of the generating function) or whether such information
can be deduced from symbolic knowledge.
\item \textbf{$\mathcal{Q}_{4}$ ``CAS implementation of Gröbner bases that
conform to a definition in AFP.''} Gröbner Bases are a field of
study in mathematics particular attractive for use in computer
algebra systems (CAS)~\cite{groebner}. This query is asking for
concrete implementations of Gröbner Bases that match the definition
in the Archive of Formal Proofs (AFP).
We do have ULO/RDF exports for the AFP~\cite{uloisabelle}. Stated
like this, we can probably assume that $\mathcal{Q}_{4}$ is a query
for a very specific definition, identified by an ULO {URI}. No smart
queries necessary. What is missing is the set of implementations,
that is symbolic knowledge about actual implementations, and a way
of matching this symbolic knowledge with the definition in
{AFP}. While surely an interesting problem, it is not a task for
organizational knowledge.
\item \textbf{$\mathcal{Q}_{5}$ ``All areas of math that {Nicolas G.\
de Bruijn} has worked in and his main contributions.''} This query
is asking by works of a given author~$A$. It also ask for their
main contributions, e.g.\ what paragraphs or code~$A$ has authored.
ULO has no concept of authors, contributors dates and so
on. Rather, the idea is to take advantage of the Dublin Core
project which provides an ontology for such
metadata~\cite{dcreport, dcowl}. For example, Dublin Core provides
us with the \texttt{dcterms:creator} predicate. Servicing this
query would mean looking for the creator~$A$ and then listing all
associated \texttt{dcterms:title} that~$A$ has worked on. For a
first working version, the exports managed by \emph{ulo-storage}
are enough to service this query.
As~$\mathcal{Q}_{5}$ is also asking for the main contributions
of~$A$, that is those works that~$A$ authored that are the most
important. Importance is a quality measure, simply sorting the
result by number of references might be a good start. Again, this
is something that should serviceable with just organizational
knowledge.
\end{itemize}
...@@ -95,3 +95,24 @@ ...@@ -95,3 +95,24 @@
chapter={4}, chapter={4},
publisher={MIT press} publisher={MIT press}
} }
@article{groebner,
title={Gr{\"o}bner bases algorithm},
author={Ajwa, Iyad A and Liu, Zhuojun and Wang, Paul S},
journal={Technical Reports of the Institute for Computational Mathematics, ICM-199502-00 (versi{\'o} del 2003) Kent State University},
year={1995}
}
@techreport{dcreport,
title={The Dublin core metadata element set},
author={Kunze, John and Baker, Thomas},
year={2007},
institution={RFC 5013, August}
}
@online{dcowl,
title={DCMI Metadata expressed in RDF Schema Language},
organization = {Dublin Core Metadata Initiative},
urldate = {2020-06-30},
url = {https://www.dublincore.org/schemas/rdfs/},
}
\ No newline at end of file
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment