supervision / schaertl_andreas · Commits · 12539924

Commit 12539924, authored 4 years ago by Andreas Schärtl

report: applications: rework section on queries Q_i to not use \itemize

Too many indents for my taste.
parent 098f2497

Changes: 1 changed file — doc/report/applications.tex (+166 additions, −158 deletions)
@@ -18,7 +18,8 @@ endpoint.
   structured (Section~\ref{sec:tetraq}).
 \end{itemize}

-\noindent Both applications will now be discussed in a dedicated section.
+\noindent These applications will now be discussed in dedicated
+sections.

 \subsection{Exploring Existing Data Sets}\label{sec:expl}
@@ -80,87 +81,92 @@ can be realized with ULO data sets and where other data sources are
 required. Where possible, we evaluate proof of concept
 implementations.

-\begin{itemize}
-\item \textbf{$\mathcal{Q}_1$ ``Find theorems with non-elementary
-  proofs.''} Elementary proofs are those that are considered easy
-  and obvious~\cite{elempro}. In consequence,~$\mathcal{Q}_1$ has
-  to search for all proofs which are not trivial. Of course, just
-  like any distinction between ``theorem'' and ``corollary'' is
-  going to be somewhat arbitrary, so is any judgment about whether
-  a proof is easy or not.
-
-  Existing research on proof difficulty is either very broad or
-  specific to one problem. For example, some experiments showed that
-  students and prospective school teachers have problems with
-  notation, term rewriting and required
-  prerequisites~\cite{proofund, proofteach}, none of which seems
-  applicable for grading individual proofs for difficulty. On the
-  other hand, there is research on rating proofs for individual
-  subsets of problems, e.g.\ on the satisfiability of a given CNF
-  formula. A particular example is focused on heuristics for how
+\subsubsection{Elementary Proofs and Computed Scores}
+
+Our first query~$\mathcal{Q}_1$ illustrates how we can compute
+arithmetic scores for some nodes in our knowledge
+graph. Query~$\mathcal{Q}_1$ asks us to ``[f]ind theorems with
+non-elementary proofs.'' Elementary proofs are those that are
+considered easy and obvious~\cite{elempro}. In
+consequence,~$\mathcal{Q}_1$ has to search for all proofs which are
+not trivial. Of course, just like any distinction between ``theorem''
+and ``corollary'' is going to be somewhat arbitrary, so is any
+judgment about whether a proof is easy or not.
+
+Existing research on proof difficulty is either very broad or specific
+to one problem. For example, some experiments showed that students and
+prospective school teachers have problems with notation, term
+rewriting and required prerequisites~\cite{proofund, proofteach}, none
+of which seems applicable for grading individual proofs for
+difficulty. On the other hand, there is research on rating proofs for
+individual subsets of problems, e.g.\ on the satisfiability of a given
+CNF formula. A particular example is focused on heuristics for how
 long a SAT solver will take to find a solution for a
 given~{CNF}. Here, solutions that take long are considered
 harder~\cite{proofsat}.
-\textbf{Organizational Aspect} A first working hypothesis
-might be to assume that elementary proofs are short. In that case, the
+\noindent\textbf{Organizational Aspect} A first working hypothesis
+might be to assume that elementary proofs are short. In that case, the
 size, that is the number of bytes to store the proof, is our first
 indicator of proof complexity. This is by no means perfect, as even
 identical proofs can be represented in different ways that might have
 vastly different size in bytes. It might be tempting to imagine a
 unique normal form for each proof, but finding such a normal form
 might very well be impossible. As it is very difficult to find a
 generalized definition of proof difficulty, we will accept proof size
 as a first working hypothesis. {ULO} offers the
 \texttt{ulo:external-size} predicate which will allow us to sort by
 file size. Maybe proof complexity also leads to quick check times in
 proof assistants and automatic theorem provers. With this assumption
 in mind we could use the \texttt{ulo:check-time}
 predicate. Correlating proof complexity with file size allows us to
 define one indicator of proof complexity based on organizational
 knowledge alone.

-\textbf{Other Aspects} A tetrapodal search system should probably
+\noindent\textbf{Other Aspects} A tetrapodal search system should
+probably
 also take symbolic knowledge into account. Based on some kind of
 measure of formula complexity, different proofs could be
 rated. Similarly, with narrative knowledge available to us, we could
 count the number of words, references and so on to rate the narrative
 complexity of a proof. Combining symbolic knowledge, narrative
 knowledge and organizational knowledge allows us to find proofs which
 are probably straightforward.
 \input{applications-q1.tex}

-\textbf{Implementation} Implementing a naive version of the
+\noindent\textbf{Implementation} Implementing a naive version of the
 organizational aspect can be as simple as querying for all theorems
 justified by proofs, ordered by size (or check time).
 Figure~\ref{fig:q1short} illustrates how this can be achieved with a
 SPARQL query. Maybe we want to go one step further and calculate a
 rating that assigns each proof some numeric score of complexity based
 on a number of properties. We can achieve this in SPARQL as recent
 versions support arithmetic as part of the SPARQL specification;
 Figure~\ref{fig:q1long} shows an example. Finding a reasonable rating
 is its own topic of research, but we see that as long as it is based
 on standard arithmetic, it will be possible to formulate in a SPARQL
 query.

 The queries in Figure~\ref{fig:q1full} return a list of all theorems
 and associated proofs; naturally this list is bound to be very long. A
 suggested way to solve this problem is to introduce some kind of
 cutoff value for our complexity score. Another potential solution is
 to only list the first~$n$ results, something a user interface would
 do anyway. Either way, this is not so much an issue for the
 organizational storage engine and more one that a tetrapodal search
 aggregator has to account for.

-\item \textbf{$\mathcal{Q}_2$ ``Find algorithms that solve
-  $NP$-complete graph problems.''} Here we want the tetrapodal search
-  system to return a listing of algorithms that solve (graph)
-  problems with a given property (runtime complexity). We need
-  to consider where each of these three components might be stored.
-
-\textbf{Symbolic and Concrete Aspects} First, let us consider
+\subsubsection{Categorizing Algorithms and Algorithmic Problems}
+
+The second query~$\mathcal{Q}_2$ we decided to focus on wants to
+``[f]ind algorithms that solve $NP$-complete graph problems.'' Here we
+want the tetrapodal search system to return a listing of algorithms
+that solve (graph) problems with a given property (runtime
+complexity). We need to consider where each of these three components
+might be stored.
+
+\noindent\textbf{Symbolic and Concrete Aspects} First, let us consider
 algorithms. Algorithms can be formulated as computer code which
 can be understood as symbolic knowledge (code represented as a
 syntax tree) or as concrete knowledge (code as text
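The figures referenced above (fig:q1short, fig:q1long) are not shown in this diff. As a hedged illustration of the Implementation paragraph, a query of the short kind might look roughly like the following sketch; \texttt{ulo:external-size} and \texttt{ulo:check-time} are named in the text, while \texttt{ulo:justifies} as the proof-to-theorem link is an assumption about the schema:

```
PREFIX ulo: <https://mathhub.info/ulo#>

# Theorems with their justifying proofs, largest proof first.
# ulo:justifies is an assumed predicate; ulo:external-size is
# the size predicate named in the report.
SELECT ?theorem ?proof ?size
WHERE {
  ?proof ulo:justifies ?theorem ;
         ulo:external-size ?size .
}
ORDER BY DESC(?size)
LIMIT 10

# A fig:q1long-style numeric score could combine several properties
# using SPARQL 1.1 arithmetic in the SELECT clause:
SELECT ?theorem ((?size + ?checktime) AS ?score)
WHERE {
  ?proof ulo:justifies ?theorem ;
         ulo:external-size ?size ;
         ulo:check-time ?checktime .
}
ORDER BY DESC(?score)
```

The weighting of the score expression is arbitrary here; the point is only that such ratings stay within standard SPARQL arithmetic.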
@@ -171,7 +177,7 @@ implementations.
 stored in a separate index for organizational knowledge, it being
 the only fit.

-\textbf{Organizational Aspect} If we wish to look up properties
+\noindent\textbf{Organizational Aspect} If we wish to look up properties
 about algorithms from organizational knowledge, we first have to
 think about how to represent this information. We propose two
 approaches, one using the existing ULO ontology and one that
@@ -202,12 +208,15 @@ implementations.
 dcowl}) and keeping concepts separate is not entirely
 unattractive in itself.

-\item \textbf{$\mathcal{Q}_3$ ``All areas of math that {Nicolas G.\
-  de Bruijn} has worked in and his main contributions.''} This query
-  is asking about works of a given author~$A$. It also asks for their
-  main contributions, e.g.\ what paragraphs or code~$A$ has authored.
+\subsubsection{Contributors and Number of References}
+
+The final query~$\mathcal{Q}_3$ from literature~\cite{tetra} we wish
+to look at wants to know ``[a]ll areas of math that {Nicolas G.\ de
+Bruijn} has worked in and his main contributions.'' This query is
+asking about works of a given author~$A$. It also asks for their main
+contributions, e.g.\ what paragraphs or code~$A$ has authored.

-\textbf{Organizational Aspect} ULO has no concept of authors,
+\noindent\textbf{Organizational Aspect} ULO has no concept of authors,
 contributors, dates and so on. Rather, the idea is to take
 advantage of the Dublin Core project which provides an ontology
 for such metadata~\cite{dcreport, dcowl}. For example, Dublin Core
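The hunk is cut off above, but the Dublin Core approach it describes can be sketched in Turtle. The subject URI is invented for illustration; \texttt{dcterms:creator} is a real Dublin Core term:

```
@prefix ulo: <https://mathhub.info/ulo#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical fragment: a knowledge-graph node annotated with
# Dublin Core authorship metadata alongside its ULO properties.
<https://example.org/fragment/automath-17>
    dcterms:creator "Nicolas G. de Bruijn" .
```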
@@ -227,7 +236,7 @@ implementations.
 important. Importance is a quality measure; simply sorting the
 result by number of references might be a good start.

-\textbf{Implementation} A search for contributions by a given author
+\noindent\textbf{Implementation} A search for contributions by a given author
 can easily be formulated in {SPARQL}.

 \begin{lstlisting}
     PREFIX ulo: <https://mathhub.info/ulo#>
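The lstlisting above is truncated by the collapsed diff. As a hedged sketch of what such an author search, sorted by number of references as suggested above, might look like: \texttt{dcterms:creator} is standard Dublin Core, while \texttt{ulo:uses} as the cross-reference predicate is an assumption about the export:

```
PREFIX ulo: <https://mathhub.info/ulo#>
PREFIX dcterms: <http://purl.org/dc/terms/>

# All works credited to de Bruijn, ranked by incoming references.
# ulo:uses is an assumed predicate for cross-references.
SELECT ?work (COUNT(?ref) AS ?references)
WHERE {
  ?work dcterms:creator "Nicolas G. de Bruijn" .
  OPTIONAL { ?ref ulo:uses ?work . }
}
GROUP BY ?work
ORDER BY DESC(?references)
```

The OPTIONAL clause keeps works with zero incoming references in the result, which matches the text's framing of reference count as a quality measure rather than a filter.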
@@ -259,7 +268,6 @@ implementations.
 We can formulate~$\mathcal{Q}_3$ with just one SPARQL
 query. Because everything is handled by the database, access
 should be about as quick as we can hope it to be.
-\end{itemize}

 Experimenting with $\mathcal{Q}_1$ to $\mathcal{Q}_3$ provided us with
 some insight into ULO and existing ULO exports. $\mathcal{Q}_1$ shows