Commit 06cebdff authored by Andreas Schärtl

report: introduce ULO section

It's not very good. But it is a requirement that was put to me last
minute. So. That's it.
parent 12028569
@@ -237,7 +237,7 @@ about~$\mathcal{Q}_1$ we found out that the infrastructure provided by
\emph{ulo-storage} allows us to compute arbitrary arithmetic scores,
something that will surely be useful for many applications.
\subsubsection{Categorizing Algorithms and Algorithmic Problems}
\subsubsection{Categorizing Algorithms and Algorithmic Problems}\label{sec:alg}
The second query~$\mathcal{Q}_2$ we decided to focus on wants to
``\emph{[f]ind algorithms that solve $NP$-complete graph
......
@@ -10,46 +10,6 @@ In particular, it is possible to formulate queries about meta
information such as authorship and contribution. We also managed to
resolve the interlinks between proofs and theorems.
To sum up this report, we want to recap three problems we encountered
when working on \emph{ulo-storage}. They have the potential to result
in future projects that further improve the availability of
organizational knowledge.
The first problem we encountered while working on \emph{ulo-storage}
was that of malformed RDF~exports. Various exports contained invalid
URIs and incorrect namespaces. As a workaround, we provided on-the-fly
correction, but this does not solve the underlying problem. Perhaps a
long-term solution is to fully automate the export from third party
libraries (e.g.\ Coq, Isabelle) to ULO triplets in a database,
eliminating the intermediate step of XML files on disk. During import
into the database, the data would be thoroughly checked and mistakes
reported right away. Bugs in exporters that produce faulty XML would
thus be found earlier in development.
The second problem we want to recap is that of versioning triplets in
the GraphDB triple store. While adding new triplets to an existing
GraphDB store is not a problem, updating existing triplets is
difficult. \emph{ulo-storage} circumvents this problem by re-creating
the GraphDB data set at regular intervals. Indeed it might be
difficult to find an efficient alternative: tagging each triplet with
a version number doubles the number of triplets that need to be
stored. Re-creating the index and perhaps splitting up the knowledge
base into smaller, easier-to-update sub-repositories looks like the
most promising approach for now. What is missing is an efficient way
of servicing queries across multiple database instances.
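One building block for such cross-instance queries could be
SPARQL~1.1 federation, where a \texttt{SERVICE} clause forwards part
of a query to a second endpoint. The following Python sketch using
the \texttt{SPARQLWrapper} library illustrates the idea; the endpoint
URLs and the \texttt{ulo:theorem} class are placeholders, and whether
federation is efficient enough at this scale remains exactly the open
question raised above.
\begin{verbatim}
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint of one sub-repository.
local = SPARQLWrapper("http://localhost:7200/repositories/ulo-coq")

# The SERVICE clause forwards part of the query to a second
# (equally hypothetical) GraphDB instance.
local.setQuery("""
    PREFIX ulo: <https://mathhub.info/ulo#>
    SELECT ?x WHERE {
        ?x a ulo:theorem .
        SERVICE <http://isabelle.example.org/repositories/ulo> {
            ?x a ulo:theorem .
        }
    }
""")
local.setReturnFormat(JSON)
print(local.query().convert())
\end{verbatim}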
Finally, perhaps the most interesting but also most challenging
question is the discussion about how to develop ULO, the universal
ontology for organizational knowledge. On one hand, ULO boasts a
total of almost 80~predicates, yet only a third of them are actually
used by existing exports. A developer writing queries that take
advantage of the full ULO~vocabulary might be surprised when no data
comes back. On the other hand, we found it difficult to model
algorithms in terms of ULO predicates and argued that it might be
necessary to further extend the upper level ontology. It will be
necessary to look at the properties of future exports to make a
well-founded decision.
Despite many remaining open questions, \emph{ulo-storage} provides the
necessary infrastructure for importing ULO triplets into an efficient
storage engine. A necessary building block for a larger tetrapodal
......
@@ -500,3 +500,25 @@
year={2004},
school={University of Waterloo}
}
@book{dragon,
title = {Compilers: Principles, Techniques, and Tools},
author = {Aho, Alfred V. and Lam, Monica S. and Sethi, Ravi and Ullman, Jeffrey D.},
edition = {2},
publisher = {Addison-Wesley},
year = {2007}
}
@online{rval,
title = {Validation Service},
organization = {W3C},
date = {2006},
urldate = {2020-11-10},
url = {https://www.w3.org/RDF/Validator/},
}
@online{rcmd,
title = {Fedict/rdfvalidator},
organization = {Federal Public Service Policy and Support DG Digital Transformation},
urldate = {2020-11-10},
url = {https://github.com/Fedict/rdfvalidator},
}
\ No newline at end of file
@@ -61,6 +61,7 @@
\input{introduction.tex}
\input{implementation.tex}
\input{applications.tex}
\input{towards.tex}
\input{conclusion.tex}
\newpage
......
\FloatBarrier{}
\section{Towards Manageable Ontologies}
Before finishing up this report with a general conclusion, we first
want to dedicate a section to our thoughts on the upper level
ontology, primarily because they offer potential for interesting
future work. We hope that these insights can aid in the development
of improved versions of~{ULO}.
\subsection{Automatic Processing}
ULO and tetrapodal search are most certainly in their infancy, and as
such it is easy to discard concerns about scalability as premature.
However, time-intensive problems with malformed RDF~files force us to
emphasize the need for more rigor when exporting to~ULO and other
ontologies.
At the very least, syntax errors and invalid predicates need to be
avoided. Even mid-sized systems require automation, and automation is
not possible when each export needs to be fixed in its own particular
way. To enforce this, we recommend either (1)~automating the export
process from formal library into GraphDB store entirely or, failing
that, (2)~the use of automated tests with online or offline
validators~\cite{rval, rcmd}.
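As a minimal sketch of what such an automated test could look like,
the following Python script uses the \texttt{rdflib} library to parse
an export and flag predicates outside the ULO namespace. The file
name and namespace URI are placeholders; the exact namespace depends
on the ULO version in use.
\begin{verbatim}
import sys
from rdflib import Graph

# Placeholder: adjust to the namespace of the ULO version in use.
ULO_NAMESPACE = "https://mathhub.info/ulo#"

graph = Graph()
try:
    # parse() raises an exception on malformed RDF/XML.
    graph.parse("export.rdf", format="xml")
except Exception as error:
    sys.exit(f"export is not well-formed RDF/XML: {error}")

# Flag triples whose predicate is not part of the ULO vocabulary.
for subj, pred, obj in graph:
    if not str(pred).startswith(ULO_NAMESPACE):
        print(f"suspicious predicate: {pred}")
\end{verbatim}
Run as part of a continuous integration job, such a check would catch
malformed exports before they ever reach the triple store.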
\subsection{The Challenge of Universality}
Remember that ULO aims to be a universal format for formulating
organizational mathematical knowledge. We took this for granted, but
maybe it is time to reflect on what an outstandingly difficult task
this actually is. With ULO, we are aiming for nothing less than a
universal schema on top of all collected (organizational) mathematical
knowledge. A grand task.
The current version of ULO already yields worthwhile results when
formal libraries are exported to ULO~triplets. Especially when it
comes to metadata, querying such data sets proved to be easy. But an
ontology such as~ULO can only be a real game changer when it is truly
universal, that is, when it is easy to formulate any kind of
organizational knowledge in the form of a ULO data set.
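To make concrete what such a metadata query looks like, here is a
hedged sketch in Python using the \texttt{SPARQLWrapper} library
against a GraphDB endpoint. The endpoint URL and the
\texttt{ulo:creator} predicate are assumptions for illustration; the
actual authorship predicate depends on the ULO version in use.
\begin{verbatim}
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder: GraphDB exposes repositories under /repositories/.
endpoint = SPARQLWrapper("http://localhost:7200/repositories/ulo")

# ulo:creator is a hypothetical name for the authorship predicate.
endpoint.setQuery("""
    PREFIX ulo: <https://mathhub.info/ulo#>
    SELECT ?work ?author WHERE {
        ?work ulo:creator ?author .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["work"]["value"], row["author"]["value"])
\end{verbatim}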
As such, it should not be terribly surprising that ULO forgoes the
requirement of being absolutely correct. For example, what
a \texttt{ulo:theorem} actually represents can differ depending on
where the mathematical knowledge was originally extracted from. While
at first this might feel a bit unsatisfying, it is important to
realize that the strength of ULO~data sets lies in search and
discovery. Particularities of meaning will eventually need to be
resolved by more concrete and specific systems.
While that is not the hardest pill to swallow, it would be preferable
to maintain organizational knowledge in a format that is both (1)~as
correct as possible and (2)~easy to generalize and search. Future
development of the upper level ontology first needs to be very clear
on where it wants to be on this spectrum between accuracy and
generalizability. We believe that ULO is best positioned as an
ontology that is more general, at the cost of accuracy.
\subsection{A Layered Knowledge Architecture}
\begin{figure}\begin{center}
\includegraphics[width=0.5\textwidth]{figs/layers.pdf}
\caption{The idea behind layered ontologies is that instead of
translating directly from formal library to ULO~triplets, we
would translate into intermediate ontologies specific to a given
domain. These individual ontologies could then be compiled to~ULO
in an additional step.}\label{fig:love}
\end{center}\end{figure}
While it is true that ULO as one concrete ontology will need to
converge on a specific point on that spectrum, namely one that favors
generalizability over accuracy, this does not mean that we need to
give up on accuracy as a whole. We believe that we can have both. We
can have our cake and eat it too.
Current exports investigated in this report take some library of
formal knowledge and convert that library directly into ULO triplets.
Perhaps a better approach would be to use a \emph{layered
architecture} instead. In this layered architecture, we would first
convert a given third party library into triplets defined by an
intermediate ontology. These triplets could then be compiled to
ULO~triplets for search. This approach is not unlike the intermediate
code used in compiler construction~\cite[p.~357]{dragon}. While lower
layers preserve more detail, higher layers are more general and
easier to search.
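As a hedged sketch of what such a compilation step could look like,
the following Python fragment uses \texttt{rdflib} to map triples of
a hypothetical domain-specific ontology onto ULO predicates. All
namespaces, file names and the predicate mapping are invented for
illustration.
\begin{verbatim}
from rdflib import Graph, Namespace

# Both namespaces are hypothetical placeholders.
DOMAIN = Namespace("https://example.org/coq-ontology#")
ULO = Namespace("https://mathhub.info/ulo#")

# Illustrative mapping from detailed domain predicates to more
# general ULO predicates; detail is lost, searchability is gained.
PREDICATE_MAP = {
    DOMAIN.lemma: ULO.theorem,
    DOMAIN.corollary: ULO.theorem,
    DOMAIN.fixpointDefinition: ULO.definition,
}

source = Graph()
source.parse("coq-export.ttl", format="turtle")

target = Graph()
for subj, pred, obj in source:
    # A real compiler would handle the remaining predicates too.
    if pred in PREDICATE_MAP:
        target.add((subj, PREDICATE_MAP[pred], obj))

target.serialize("ulo-export.ttl", format="turtle")
\end{verbatim}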
A valid criticism of this proposed approach is that we could just as
well understand the base library as an ontology of its own, in which
case the only difference is the file format. While formal libraries
are formulated in some domain-specific formal language, when we talk
about ontologies, our canonical understanding is that of
OWL~ontologies, that is, RDF~predicates with which knowledge is
formulated. A first retort to this must be that RDF is easier to
index using triple store databases such as GraphDB and that it will
probably be easier to architect a system around a unified
format~(RDF) rather than a zoo of formats and languages. But a final
judgment requires further investigation.