Commit 12539924 authored by Andreas Schärtl

report: applications: rework section on queries Q_i to not use \itemize

Too many indents for my taste.
parent 098f2497
@@ -18,7 +18,8 @@ endpoint.
structured (Section~\ref{sec:tetraq}).
\end{itemize}

\noindent These applications will now be discussed in dedicated
sections.

\subsection{Exploring Existing Data Sets}\label{sec:expl}
@@ -80,87 +81,92 @@ can be realized with ULO data sets and where other data sources are
required. Where possible, we evaluate proof of concept
implementations.
\subsubsection{Elementary Proofs and Computed Scores}

Our first query~$\mathcal{Q}_1$ illustrates how we can compute
arithmetic scores for some nodes in our knowledge
graph. Query~$\mathcal{Q}_1$ asks us to ``[f]ind theorems with
non-elementary proofs.'' Elementary proofs are those that are
considered easy and obvious~\cite{elempro}. In
consequence,~$\mathcal{Q}_1$ has to search for all proofs which are
not trivial. Of course, just like any distinction between ``theorem''
and ``corollary'' is going to be somewhat arbitrary, so is any
judgment about whether a proof is easy or not.

Existing research on proof difficulty is either very broad or specific
to one problem. For example, some experiments showed that students and
prospective school teachers have problems with notation, term
rewriting and required prerequisites~\cite{proofund, proofteach}, none
of which seems applicable for grading individual proofs for
difficulty. On the other hand, there is research on rating proofs for
individual subsets of problems, e.g.\ on the satisfiability of a given
CNF formula. A particular example is focused on heuristics for how
long a SAT solver will take to find a solution for a
given~{CNF}. Here, solutions that take long are considered
harder~\cite{proofsat}.
\noindent\textbf{Organizational Aspect} A first working hypothesis
might be to assume that elementary proofs are short. In that case, the
size, that is, the number of bytes needed to store the proof, is our
first indicator of proof complexity. This is by no means perfect, as
even identical proofs can be represented in different ways that might
have vastly different sizes in bytes. It might be tempting to imagine
a unique normal form for each proof, but finding such a normal form
might very well be impossible. As it is very difficult to find a
generalized definition of proof difficulty, we will accept proof size
as a first working hypothesis.
{ULO} offers the \texttt{ulo:external-size} predicate which will allow
us to sort by file size. Maybe elementary proofs also lead to quick
check times in proof assistants and automatic theorem provers. With
this assumption in mind we could use the \texttt{ulo:check-time}
predicate. Taken together, file size and check time allow us to define
a first indicator of proof complexity based on organizational
knowledge alone.
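
As a first sketch, the following query orders all proofs by their
\texttt{ulo:external-size}. We assume here that \texttt{ulo:justifies}
connects each proof to the statement it justifies, in line with the
prose above; the exact shape of the triples depends on the respective
export.

\begin{lstlisting}
PREFIX ulo: <https://mathhub.info/ulo#>

# Order all proofs by their file size, largest first.
# Assumes ulo:justifies points from proof to statement.
SELECT ?theorem ?proof ?size
WHERE {
    ?proof ulo:justifies ?theorem .
    ?proof ulo:external-size ?size .
}
ORDER BY DESC(?size)
\end{lstlisting}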
\noindent\textbf{Other Aspects} A tetrapodal search system should
probably also take symbolic knowledge into account. Based on some kind
of measure of formula complexity, different proofs could be
rated. Similarly, with narrative knowledge available to us, we could
count the number of words, references and so on to rate the narrative
complexity of a proof. Combining symbolic knowledge, narrative
knowledge and organizational knowledge allows us to find proofs which
are probably straightforward.
\input{applications-q1.tex}
\noindent\textbf{Implementation} Implementing a naive version of the
organizational aspect can be as simple as querying for all theorems
justified by proofs, ordered by size (or check time).
Figure~\ref{fig:q1short} illustrates how this can be achieved with a
SPARQL query. Maybe we want to go one step further and calculate a
rating that assigns each proof some numeric score of complexity based
on a number of properties. We can achieve this in SPARQL as recent
versions support arithmetic as part of the SPARQL specification;
Figure~\ref{fig:q1long} shows an example. Finding a reasonable rating
is its own topic of research, but we see that as long as it is based
on standard arithmetic, it will be possible to formulate as a SPARQL
query.
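
To give an impression of such an arithmetic rating, the sketch below
combines size and check time into a single score. The weighting is
arbitrary and purely illustrative, and we assume both values are
available as plain numeric literals.

\begin{lstlisting}
PREFIX ulo: <https://mathhub.info/ulo#>

# Rate each proof by a weighted sum of size and check time.
# The weights are arbitrary; both values are assumed numeric.
SELECT ?theorem ?proof ((?size + 100 * ?checktime) AS ?score)
WHERE {
    ?proof ulo:justifies ?theorem .
    ?proof ulo:external-size ?size .
    ?proof ulo:check-time ?checktime .
}
ORDER BY DESC(?score)
\end{lstlisting}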
The queries in Figure~\ref{fig:q1full} return a list of all theorems
and associated proofs; naturally, this list is bound to be very
long. A suggested way to solve this problem is to introduce some kind
of cutoff value for our complexity score. Another potential solution
is to only list the first~$n$ results, something a user interface
would do anyway. Either way, this is not so much an issue for the
organizational storage engine and more one that a tetrapodal search
aggregator has to account for.
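
Both mitigations are easy to express in SPARQL. The sketch below,
again with a placeholder rating and an entirely hypothetical cutoff
value, keeps only proofs below a threshold and returns at most ten
results.

\begin{lstlisting}
PREFIX ulo: <https://mathhub.info/ulo#>

# Keep only proofs below a cutoff and return the first ten.
SELECT ?theorem ?proof ?score
WHERE {
    ?proof ulo:justifies ?theorem .
    ?proof ulo:external-size ?size .
    BIND(?size AS ?score)      # placeholder rating
    FILTER(?score < 10000)     # hypothetical cutoff value
}
ORDER BY ASC(?score)
LIMIT 10
\end{lstlisting}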
\subsubsection{Categorizing Algorithms and Algorithmic Problems}

The second query~$\mathcal{Q}_2$ we decided to focus on wants to
``[f]ind algorithms that solve $NP$-complete graph problems.'' Here we
want the tetrapodal search system to return a listing of algorithms
that solve (graph) problems with a given property (runtime
complexity). We need to consider where each of these three components
might be stored.

\noindent\textbf{Symbolic and Concrete Aspects} First, let us consider
algorithms. Algorithms can be formulated as computer code which
can be understood as symbolic knowledge (code represented as a
syntax tree) or as concrete knowledge (code as text
@@ -171,7 +177,7 @@ implementations.
stored in a separate index for organizational knowledge, it being
the only fit.
\noindent\textbf{Organizational Aspect} If we wish to look up properties
about algorithms from organizational knowledge, we first have to
think about how to represent this information. We propose two
approaches, one using the existing ULO ontology and one that
@@ -202,12 +208,15 @@ implementations.
dcowl}) and keeping concepts separate is not entirely
unattractive in itself.
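
To give an impression of how such a lookup could read once a suitable
vocabulary exists, the sketch below uses two made-up predicates,
\texttt{ulo:solves} and \texttt{ulo:has-complexity}, that are not part
of the existing ULO ontology; they stand in for whichever modeling is
ultimately chosen.

\begin{lstlisting}
PREFIX ulo: <https://mathhub.info/ulo#>

# ulo:solves and ulo:has-complexity are hypothetical
# predicates, not part of the existing ULO vocabulary.
SELECT ?algorithm ?problem
WHERE {
    ?algorithm ulo:solves ?problem .
    ?problem ulo:has-complexity "NP-complete" .
}
\end{lstlisting}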
\subsubsection{Contributors and Number of References}

The final query~$\mathcal{Q}_3$ from literature~\cite{tetra} we wish
to look at wants to know ``[a]ll areas of math that {Nicolas G.\ de
Bruijn} has worked in and his main contributions.'' This query is
asking for works of a given author~$A$. It also asks for their main
contributions, e.g.\ what paragraphs or code~$A$ has authored.
\noindent\textbf{Organizational Aspect} ULO has no concept of authors,
contributors, dates and so on. Rather, the idea is to take
advantage of the Dublin Core project which provides an ontology
for such metadata~\cite{dcreport, dcowl}. For example, Dublin Core
@@ -227,7 +236,7 @@ implementations.
important. Importance is a quality measure; simply sorting the
result by number of references might be a good start.
\noindent\textbf{Implementation} A search for contributions by a given author
can easily be formulated in {SPARQL}.

\begin{lstlisting}
PREFIX ulo: <https://mathhub.info/ulo#>
@@ -259,7 +268,6 @@ implementations.
We can formulate~$\mathcal{Q}_3$ with just one SPARQL
query. Because everything is handled by the database, access
should be about as quick as we can hope it to be.
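
As a minimal sketch, such a lookup could read as follows. We assume
here that contributor metadata is exposed through Dublin Core's
\texttt{dcterms:creator} and that authors can be matched by name; both
assumptions depend on the export at hand.

\begin{lstlisting}
PREFIX ulo: <https://mathhub.info/ulo#>
PREFIX dcterms: <http://purl.org/dc/terms/>

# Find all works attributed to a given author by name.
# Assumes dcterms:creator metadata is present in the export.
SELECT ?work
WHERE {
    ?work dcterms:creator ?author .
    FILTER(CONTAINS(STR(?author), "Bruijn"))
}
\end{lstlisting}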
Experimenting with $\mathcal{Q}_1$ to $\mathcal{Q}_3$ provided us with
some insight into ULO and existing ULO exports. $\mathcal{Q}_1$ shows