report: towards: closing thoughts

b3bf3c18 · Andreas Schärtl · 9faa7fc5 · b3bf3c18
Commit b3bf3c18 authored 4 years ago by Andreas Schärtl
--- a/doc/report/towards.tex
+++ b/doc/report/towards.tex
 \FloatBarrier{}
 \section{Towards Manageable Ontologies}
-Before finishing up this report with a general conclusion, we first
+Before finishing up this report with a general conclusion, we want to
-want to dedicate a section on our thoughts on the upper level
+dedicate a section to our thoughts on the upper level ontology and
-ontology. Primarily they offer the potential for interesting future
+ontology design in general. Primarily, we see potential for
-work.  We hope that these insights can aid in the development of
+interesting future work.
-improved versions of~{ULO}.
 \subsection{Automatic Processing}
 ULO and tetrapodal search are most certainly in their infancy.  As
-such it is easy to discard concerns about scalability as premature.
+such it is easy to dismiss concerns about scalability as premature.
-However, time intensive problems with malformed RDF~files force us to
+Regardless, we encourage research on ULO and tetrapodal search to keep
-emphasize the need for more rigor when exporting to~ULO and other
+the greater picture in mind. We believe that a greater tetrapodal
-ontologies.
+search system can only succeed if the process of export and indexing
+is completely automated. Automation, we believe, comes down to two
-At the very least, syntax errors and invalid predicates need to be
+things, (1)~automation of the export infrastructure and (2)~enabling
-avoided. Even mid sized systems require automation and automation is
+automation through machine readability.
-not possible when each export needs to be fixed in its one particular
-way. To enforce this, we recommend either (1)~automating the export
+First of all, we believe that the export of third-party libraries into
-process from formal library into GraphDB store entirely or at the
+Endpoint storage needs to be fully automated. We believe this is the
-very least (2)~the use of automated tests with online or offline
+case for two major reasons. First, syntax errors and invalid
-validators~\cite{rval, rcmd}.
+predicates need to be avoided. It is unreasonable to expect a systems
+administrator to fix each ULO~export in its own particular way. At the
+very least, automated validators~\cite{rval, rcmd} should be used to
+check the validity of ULO~exports.
+The second problem is one of ontology design. The goal of RDF and
+related technologies was to have universal machine readable knowledge
+available for querying. As such it is necessary to ensure that
+the ULO exports we create are machine readable. Here we want to remind
+the reader of the previously discussed \texttt{ulo:sourceref} dilemma
+(Section~\ref{sec:exp}). It required special domain knowledge about
+the specific export for us to resolve a source reference to actual
+source code. A machine readable approach would be to instead decide on
+a fixed format for fields such as \texttt{ulo:sourceref}.
+Infrastructure that runs without the need of outside intervention and
+a machine readable knowledge base can lay the groundwork for a
+greater tetrapodal search system.
 \subsection{The Challenge of Universality}
-Remember that ULO aims to be a universal format for formulating
+We should remember that ULO aims to be a universal format for
-organizational mathematical knowledge. We took this for granted, but
+formulating organizational mathematical knowledge.  Maybe it is time
-maybe it is time to reflect on what an outstandingly difficult task
+to reflect on what an outstandingly grand task this actually is. With
-this actually is. With ULO, we are aiming for nothing less than an
+ULO, we are aiming for nothing less than a universal schema on top of
-universal schema on top of all collected (organizational) mathematical
+all collected (organizational) mathematical knowledge.
-knowledge. A grand task.
 The current version of ULO already yields worthwhile results when
 formal libraries are exported to ULO~triplets. Especially when it
@@ -44,7 +59,7 @@ requirement of being absolutely correct. For example, what
 a \texttt{ulo:theorem} actually represents can differ depending on
 where the mathematical knowledge was originally extracted from. While
 at first this might feel a bit unsatisfying, it is important to
-realize that the strength of ULO~data sets is search and
+realize that the strength of ULO~data sets must be search and
 discovery. Particularities about meaning will eventually need to be
 resolved by more concrete and specific systems.
@@ -54,7 +69,9 @@ correct as possible and (2)~easy to generalize and search. Future
 development of the upper level ontology first needs to be very clear
 on where it wants to be on this spectrum between accuracy and
 generalizability.  We believe that ULO is best positioned as an
-ontology that is more general, at the cost of accuracy.
+ontology that is more general, at the cost of accuracy. It can serve
+as a generalized way of indexing vast amounts of formal knowledge,
+making it easy to discover and connect.
 \subsection{A Layered Knowledge Architecture}
@@ -90,8 +107,13 @@ In practice, the only difference is the file format. While formal
 libraries are formulated in some domain specific formal language, when
 we talk about ontologies, our canonical understanding is that of
 OWL~ontologies, that is RDF~predicates with which knowledge is
-formulated. A first retort to this must be that RDF is easier to index
+formulated.
-using triple store databases such as GraphDB and that it will probably
-be easier to architect a system based around a unified format~(RDF)
+A first retort to this must be that RDF is easier to index using
+triple store databases such as GraphDB and that it will probably be
+easier to architect a system based around a unified format~(RDF)
 rather than a zoo of formats and languages. But a final judgment
-requires further investigation.
+requires further investigation. Either way, we believe it to be
+worthwhile to take the accuracy-generalizability spectrum into
+account and investigate how this spectrum can be serviced with
+different layers of ontologies.