Commit 2c60b424 authored by Andreas Schärtl's avatar Andreas Schärtl
report: towards: review

\section{Towards Manageable Ontologies}
Before finishing up this report with a general conclusion, we want to
first dedicate a section to thoughts on the upper level ontology and
ontology design in general. The contribution of this section is
primarily potential for future work; at this point, the ideas
formulated here lack concrete implementations.
\subsection{Automatic Processing}
Let us first look at some system level concerns. The upper level
ontology and tetrapodal search are certainly in their infancy. As
such it is easy to dismiss concerns about scalability as premature.
Regardless, we encourage research on ULO and tetrapodal search to keep
the greater picture in mind. We believe that a greater tetrapodal
search system can only succeed if the process of export and indexing
is completely automated. Automation, we believe, comes down to two
things: (1)~automation of the export infrastructure and (2)~enabling
automation through machine readability.

\emph{Automation of Exports.} First of all, we believe that the
export of third party libraries into Endpoint storage needs to be
fully automated. We believe this is the case for two major reasons.
First, syntax errors and invalid predicates need to be avoided; at the
very least, automated validators~\cite{rval, rcmd} should be used to
check the validity of ULO~exports. Second, it is unreasonable to
expect a systems administrator to fix each ULO~export in its own
particular way.
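The kind of fully automated check meant here can be sketched in a few
lines. This is only an illustration of the idea behind the cited
validators, not a replacement for them; the ULO namespace URI and the
predicate allowlist below are hypothetical stand-ins:

```python
import re

# Illustration only: the cited validators (rval, rcmd) are the real
# tools for this job. Namespace and predicate list are hypothetical.
ULO = "https://mathhub.info/ulo#"
KNOWN_PREDICATES = {ULO + p for p in ("sourceref", "name", "theorem")}

# One N-Triples-style statement: <subject> <predicate> <object|"literal"> .
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(<[^>]+>|"[^"]*")\s*\.$')

def check_export(lines):
    """Return (line_number, error) pairs; an empty list means the
    export passed both the syntax and the predicate check."""
    errors = []
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE.match(line)
        if m is None:
            errors.append((n, "syntax error"))
        elif m.group(2) not in KNOWN_PREDICATES:
            errors.append((n, "unknown predicate " + m.group(2)))
    return errors
```

Running such a check as part of the export pipeline would reject
malformed exports before they ever reach Endpoint storage, instead of
leaving a systems administrator to debug each one by hand.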

\emph{Enabling Automation Through Machine Readability.} The second
problem is one of normalization. The goal of RDF and related
technologies was to make universal machine readable knowledge
available for querying. As such it is necessary to ensure that the
ULO exports we create are machine readable, that is, easy for programs
to interpret. We want to remind the reader of the previously
discussed \texttt{ulo:sourceref} dilemma (Section~\ref{sec:exp}). It
required special domain knowledge about the specific export for us to
resolve a source reference to actual source code. A machine readable
approach would instead decide on a fixed format for fields such
as \texttt{ulo:sourceref}. This would make it easy for application
implementors to take full advantage of any ULO knowledge base.
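To make this concrete, suppose a fixed format for
\texttt{ulo:sourceref} had been agreed upon, say
\texttt{<file-URI>\#L<start>-L<end>}. The format is entirely made up
for this sketch, but it shows the payoff: one parser, written once,
resolves source references from every export:

```python
import re
from typing import NamedTuple, Optional

class SourceRef(NamedTuple):
    file_uri: str
    start_line: int
    end_line: int

# Hypothetical fixed format for ulo:sourceref: <file-URI>#L<start>-L<end>
SOURCEREF = re.compile(r"^(?P<uri>[^#]+)#L(?P<start>\d+)-L(?P<end>\d+)$")

def parse_sourceref(ref: str) -> Optional[SourceRef]:
    """Resolve a source reference without export-specific knowledge;
    returns None for references that ignore the fixed format."""
    m = SOURCEREF.match(ref)
    if m is None:
        return None
    return SourceRef(m.group("uri"), int(m.group("start")), int(m.group("end")))
```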

Infrastructure that runs without the need for outside intervention and
a machine readable knowledge base can lay the groundwork for a
greater tetrapodal search system.
\subsection{The Challenge of Universality}
While system level concerns must not be discarded, we believe they are
a small problem compared to the challenge of ontology design as a
whole. Remember that ULO aims to be a universal language for
formulating organizational mathematical knowledge. This is an
outstandingly grand task: ULO aims at nothing less than a universal
schema on top of all collected (organizational) mathematical
knowledge.
The current version of ULO already yields worthwhile results when
formal libraries are exported to ULO~triplets. Especially when it
comes to metadata, querying such data sets proved to be easy. But an
ontology such as~ULO can only be a real game changer when it is truly
universal, that is, when it is easy to formulate any kind of
organizational knowledge in the form of a ULO data set.
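The ease of metadata queries mentioned above can be illustrated with a
small sketch. The triples and predicates below are invented; an actual
deployment would run SPARQL against a triple store, but the pattern
matching is the same in spirit:

```python
# Invented ULO-style triples; a real export is queried with SPARQL
# against a triple store rather than with Python lists.
TRIPLES = [
    ("ex:thm1", "rdf:type", "ulo:theorem"),
    ("ex:thm1", "ulo:name", "binomial_theorem"),
    ("ex:def1", "rdf:type", "ulo:definition"),
    ("ex:def1", "ulo:name", "binomial_coefficient"),
]

def query(triples, s=None, p=None, o=None):
    """Match (s, p, o) patterns; None acts as a wildcard, much like a
    variable in a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Names of all theorems": join the type pattern with the name pattern.
theorems = {s for s, _, _ in query(TRIPLES, p="rdf:type", o="ulo:theorem")}
theorem_names = [o for s, _, o in query(TRIPLES, p="ulo:name") if s in theorems]
```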
As such it should not be terribly surprising that ULO forgoes the
requirement of being absolutely correct. For example, what

While that is not the hardest pill to swallow, it would be preferable
to maintain organizational knowledge in a format that is both (1)~as
correct as possible and (2)~easy to generalize and search. Future
development of the upper level ontology first needs to be very clear
on where it wants to position itself on this spectrum between accuracy
and generalizability.

As an upper level ontology, we believe that ULO is best positioned to
favor generality at the cost of accuracy. It can serve as a
generalized way of indexing vast amounts of formal knowledge, making
it easy to discover and connect.
\subsection{A Layered Knowledge Architecture}

both. We can have our cake and eat it too.
Current exports investigated in this report take the approach of
taking some library of formal knowledge and then converting that
library directly into ULO triplets. Perhaps a better approach would be
to use a \emph{layered architecture} instead. The idea is sketched out
in Figure~\ref{fig:love}. In this layered architecture, we would first
convert a given third party library into triplets defined by an
intermediate ontology. These triplets could then be compiled to
ULO~triplets for search. It is an approach not unlike intermediate
byte codes used in compiler construction~\cite[pp.~357]{dragon}. While
lower layers preserve more detail, higher levels are more general and
easier to search.
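The compilation step of such a layered architecture can be sketched as
a predicate mapping. The intermediate \texttt{coq:} vocabulary below
is invented for illustration; the point is that several precise
lower-layer notions deliberately collapse into one coarser ULO notion,
trading detail for searchability:

```python
# Hypothetical mapping from a detailed intermediate ontology down to
# ULO; several precise notions collapse into one general one.
INTERMEDIATE_TO_ULO = {
    "coq:Lemma": "ulo:theorem",
    "coq:Theorem": "ulo:theorem",
    "coq:Definition": "ulo:definition",
}

def compile_to_ulo(triples):
    """Compile intermediate-layer triples to the ULO layer. Triples
    the upper ontology cannot express are dropped here; the lower
    layer still preserves them in full detail."""
    out = []
    for s, p, o in triples:
        if p == "rdf:type" and o in INTERMEDIATE_TO_ULO:
            out.append((s, "rdf:type", INTERMEDIATE_TO_ULO[o]))
    return out
```

As with intermediate byte code, the lossy step is intentional: queries
against the upper layer stay simple precisely because the fine
distinctions of each source library have been compiled away.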

A valid criticism of this would be that we can understand the base
library as an ontology of its own. In practice, the only difference
is the file format. While formal libraries are formulated in some
domain specific formal language, when we talk about ontologies, our
understanding is that of OWL~ontologies, that is, RDF~predicates with
which knowledge is formulated. But RDF is easier to index using triple
store databases such as GraphDB, and it should be easier to architect
a search system based around a unified format~(RDF) rather than a zoo
of formats and languages.

But a final judgment requires further investigation. Either way, we
find it necessary to take the accuracy-generalizability spectrum into
account and investigate how this spectrum can be served by different
layers of ontologies.