Commit 06cebdff authored by Andreas Schärtl

report: introduce ULO section

It's not very good. But it is a requirement that was put to me last
minute. So. That's it.
parent 12028569
@@ -237,7 +237,7 @@ about~$\mathcal{Q}_1$ we found out that the infrastructure provided by
\emph{ulo-storage} allows us to compute arbitrary arithmetic scores,
something that will surely be useful for many applications.
\subsubsection{Categorizing Algorithms and Algorithmic Problems}
\subsubsection{Categorizing Algorithms and Algorithmic Problems}\label{sec:alg}
The second query~$\mathcal{Q}_2$ we decided to focus on wants to
``\emph{[f]ind algorithms that solve $NP$-complete graph
......
@@ -10,46 +10,6 @@ In particular, it is possible to formulate queries about meta
information such as authorship and contribution. We also managed to
resolve the interlinks between proofs and theorems.
To sum up this report, we want to recap three problems we encountered
when working on \emph{ulo-storage}. They have the potential to result
in future projects that further improve the availability of
organizational knowledge.
The first problem we encountered while working on \emph{ulo-storage}
was that of malformed RDF~exports. Various exports contained invalid
URIs and incorrect namespaces. As a workaround, we provided on-the-fly
correction, but this does not solve the underlying problem. Perhaps a
long-term solution is to fully automate the export from third party
libraries (e.g.\ Coq, Isabelle) to ULO triplets in a database,
eliminating the intermediate step of XML files on disk. During import
into the database, the data would be thoroughly checked and mistakes
reported right away. Bugs in exporters that produce faulty XML would
thus be found earlier in development.
The second problem we want to recap is that of versioning triplets in
the GraphDB triple store. While adding new triplets to an existing
GraphDB store is not a problem, updating existing triplets is
difficult. \emph{ulo-storage} circumvents this problem by re-creating
the GraphDB data set at regular intervals. Indeed it might be
difficult to find an efficient alternative: tagging each triplet with
a version number doubles the number of triplets that need to be
stored. Re-creating the index and perhaps splitting up the knowledge
base into smaller, easier-to-update sub-repositories looks like the
most promising approach for now. What is missing is an efficient way
of servicing queries across multiple database instances.
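One building block for such cross-instance queries could be
SPARQL~1.1 federation, where a \texttt{SERVICE} clause forwards part
of a query to a second endpoint. The following Python sketch using
the \texttt{SPARQLWrapper} library illustrates the idea; the endpoint
URLs and the \texttt{ulo:theorem} class are placeholders, and whether
federation is efficient enough at this scale remains exactly the open
question raised above.
\begin{verbatim}
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint of one sub-repository.
local = SPARQLWrapper("http://localhost:7200/repositories/ulo-coq")

# The SERVICE clause forwards part of the query to a second
# (equally hypothetical) GraphDB instance.
local.setQuery("""
    PREFIX ulo: <https://mathhub.info/ulo#>
    SELECT ?x WHERE {
        ?x a ulo:theorem .
        SERVICE <http://isabelle.example.org/repositories/ulo> {
            ?x a ulo:theorem .
        }
    }
""")
local.setReturnFormat(JSON)
print(local.query().convert())
\end{verbatim}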
Finally, perhaps the most interesting but also most challenging
question is the discussion about how to develop ULO, the universal
ontology for organizational knowledge. On one hand, ULO boasts a
total of almost 80~predicates, yet only a third of them are actually
used by existing exports. A developer writing queries that take
advantage of the full ULO~vocabulary might be surprised when no data
comes back. On the other hand, we found it difficult to model
algorithms in terms of ULO predicates and argued that it might be
necessary to further extend the upper level ontology. It will be
necessary to look at the properties of future exports to make a
well-founded decision.
Despite many remaining open questions, \emph{ulo-storage} provides the
necessary infrastructure for importing ULO triplets into an efficient
storage engine. A necessary building block for a larger tetrapodal
......
@@ -500,3 +500,25 @@
year={2004},
school={University of Waterloo}
}
@book{dragon,
title = {Compilers: Principles, Techniques, and Tools},
author = {Aho, Alfred V. and Lam, Monica S. and Sethi, Ravi and Ullman, Jeffrey D.},
edition = {2},
publisher = {Addison-Wesley},
year = {2007}
}
@online{rval,
title = {Validation Service},
organization = {W3C},
date = {2006},
urldate = {2020-11-10},
url = {https://www.w3.org/RDF/Validator/},
}
@online{rcmd,
title = {Fedict/rdfvalidator},
organization = {Federal Public Service Policy and Support DG Digital Transformation},
urldate = {2020-11-10},
url = {https://github.com/Fedict/rdfvalidator},
}
\ No newline at end of file
@@ -61,6 +61,7 @@
\input{introduction.tex}
\input{implementation.tex}
\input{applications.tex}
\input{towards.tex}
\input{conclusion.tex}
\newpage
......
\FloatBarrier{}
\section{Towards Manageable Ontologies}
Before finishing up this report with a general conclusion, we first
want to dedicate a section to our thoughts on the upper level
ontology, primarily because they offer potential for interesting
future work. We hope that these insights can aid in the development
of improved versions of~{ULO}.
\subsection{Automatic Processing}
ULO and tetrapodal search are most certainly in their infancy, and as
such it is easy to discard concerns about scalability as premature.
However, time-intensive problems with malformed RDF~files force us to
emphasize the need for more rigor when exporting to~ULO and other
ontologies.
At the very least, syntax errors and invalid predicates need to be
avoided. Even mid-sized systems require automation, and automation is
not possible when each export needs to be fixed in its own particular
way. To enforce this, we recommend either (1)~automating the export
process from formal library into GraphDB store entirely or, failing
that, (2)~the use of automated tests with online or offline
validators~\cite{rval, rcmd}.
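As a minimal sketch of what such an automated test could look like,
the following Python script uses the \texttt{rdflib} library to parse
an export and flag predicates outside the ULO namespace. The file
name and namespace URI are placeholders; the exact namespace depends
on the ULO version in use.
\begin{verbatim}
import sys
from rdflib import Graph

# Placeholder: adjust to the namespace of the ULO version in use.
ULO_NAMESPACE = "https://mathhub.info/ulo#"

graph = Graph()
try:
    # parse() raises an exception on malformed RDF/XML.
    graph.parse("export.rdf", format="xml")
except Exception as error:
    sys.exit(f"export is not well-formed RDF/XML: {error}")

# Flag triples whose predicate is not part of the ULO vocabulary.
for subj, pred, obj in graph:
    if not str(pred).startswith(ULO_NAMESPACE):
        print(f"suspicious predicate: {pred}")
\end{verbatim}
Run as part of a continuous integration job, such a check would catch
malformed exports before they ever reach the triple store.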
\subsection{The Challenge of Universality}
Remember that ULO aims to be a universal format for formulating
organizational mathematical knowledge. We took this for granted, but
maybe it is time to reflect on what an outstandingly difficult task
this actually is. With ULO, we are aiming for nothing less than a
universal schema on top of all collected (organizational) mathematical
knowledge. A grand task.
The current version of ULO already yields worthwhile results when
formal libraries are exported to ULO~triplets. Especially when it
comes to metadata, querying such data sets proved to be easy. But an
ontology such as~ULO can only be a real game changer when it is truly
universal, that is, when it is easy to formulate any kind of
organizational knowledge in the form of a ULO data set.
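To make concrete what such a metadata query looks like, here is a
hedged sketch in Python using the \texttt{SPARQLWrapper} library
against a GraphDB endpoint. The endpoint URL and the
\texttt{ulo:creator} predicate are assumptions for illustration; the
actual authorship predicate depends on the ULO version in use.
\begin{verbatim}
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder: GraphDB exposes repositories under /repositories/.
endpoint = SPARQLWrapper("http://localhost:7200/repositories/ulo")

# ulo:creator is a hypothetical name for the authorship predicate.
endpoint.setQuery("""
    PREFIX ulo: <https://mathhub.info/ulo#>
    SELECT ?work ?author WHERE {
        ?work ulo:creator ?author .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["work"]["value"], row["author"]["value"])
\end{verbatim}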
As such, it should not be terribly surprising that ULO forgoes the
requirement of being absolutely correct. For example, what
a \texttt{ulo:theorem} actually represents can differ depending on
where the mathematical knowledge was originally extracted from. While
at first this might feel a bit unsatisfying, it is important to
realize that the strength of ULO~data sets lies in search and
discovery. Particularities of meaning will eventually need to be
resolved by more concrete and specific systems.
While that is not the hardest pill to swallow, it would be preferable
to maintain organizational knowledge in a format that is both (1)~as
correct as possible and (2)~easy to generalize and search. Future
development of the upper level ontology first needs to be very clear
on where it wants to be on this spectrum between accuracy and
generalizability. We believe that ULO is best positioned as an
ontology that is more general, at the cost of accuracy.
\subsection{A Layered Knowledge Architecture}
\begin{figure}\begin{center}
\includegraphics[width=0.5\textwidth]{figs/layers.pdf}
\caption{The idea behind layered ontologies is that instead of
translating directly from formal library to ULO~triplets, we
would translate into intermediate ontologies specific to a given
domain. These individual ontologies could then be compiled to~ULO
in an additional step.}\label{fig:love}
\end{center}\end{figure}
While it is true that ULO as one concrete ontology will need to
converge on a specific point on that spectrum, namely one that favors
generalizability over accuracy, this does not mean that we need to
give up on accuracy as a whole. We believe that we can have both. We
can have our cake and eat it too.
Current exports investigated in this report take some library of
formal knowledge and convert that library directly into ULO triplets.
Perhaps a better approach would be to use a \emph{layered
architecture} instead. In this layered architecture, we would first
convert a given third party library into triplets defined by an
intermediate ontology. These triplets could then be compiled to
ULO~triplets for search. This approach is not unlike the intermediate
code used in compiler construction~\cite[p.~357]{dragon}. While lower
layers preserve more detail, higher layers are more general and
easier to search.
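As a hedged sketch of what such a compilation step could look like,
the following Python fragment uses \texttt{rdflib} to map triples of
a hypothetical domain-specific ontology onto ULO predicates. All
namespaces, file names and the predicate mapping are invented for
illustration.
\begin{verbatim}
from rdflib import Graph, Namespace

# Both namespaces are hypothetical placeholders.
DOMAIN = Namespace("https://example.org/coq-ontology#")
ULO = Namespace("https://mathhub.info/ulo#")

# Illustrative mapping from detailed domain predicates to more
# general ULO predicates; detail is lost, searchability is gained.
PREDICATE_MAP = {
    DOMAIN.lemma: ULO.theorem,
    DOMAIN.corollary: ULO.theorem,
    DOMAIN.fixpointDefinition: ULO.definition,
}

source = Graph()
source.parse("coq-export.ttl", format="turtle")

target = Graph()
for subj, pred, obj in source:
    # A real compiler would handle the remaining predicates too.
    if pred in PREDICATE_MAP:
        target.add((subj, PREDICATE_MAP[pred], obj))

target.serialize("ulo-export.ttl", format="turtle")
\end{verbatim}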
A valid criticism of this proposed approach is that we could just as
well understand the base library as an ontology of its own, in which
case the only difference is the file format. While formal libraries
are formulated in some domain-specific formal language, when we talk
about ontologies, our canonical understanding is that of
OWL~ontologies, that is, RDF~predicates with which knowledge is
formulated. A first retort to this must be that RDF is easier to
index using triple store databases such as GraphDB and that it will
probably be easier to architect a system around a unified
format~(RDF) rather than a zoo of formats and languages. But a final
judgment requires further investigation.