% Commit f2e58d63, authored by Andreas Schärtl (parent 09c4f6dc)
% report: conclusion: open questions
\section{Conclusions and Next Steps}\label{sec:conclusion}
\subsection{An Additional Harvester Component} % copied from introduction.tex
Using the software stack explained in
Section~\ref{sec:implementation}, we were able to take existing
RDF exports and import them into a GraphDB database. This made it
possible to create the applications and examples of
Section~\ref{sec:applications}. We showed that organizational
knowledge formulated as ULO triplets can already provide some
insight: in particular, it is possible to formulate queries for meta
information such as authorship and contribution and to resolve the
interlinks between proofs and theorems. These examples also showed
that existing ULO~exports take advantage of only a subset of
ULO~predicates, something to keep in mind for future exports and, in
particular, something developers of applications built on top of ULO
have to be aware of.
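
To make such queries concrete, the following sketch models the two
lookups mentioned above over plain Python tuples standing in for RDF
triplets. The predicate IRIs (a \texttt{justifies} link between proof
and theorem, a Dublin Core \texttt{creator} predicate) and all data
are illustrative assumptions for this sketch, not a definitive
rendering of the actual export vocabulary.

```python
# Toy in-memory triple store; tuples stand in for RDF triplets.
# Namespaces and data below are illustrative, not taken from real exports.
ULO = "http://example.org/ulo#"   # placeholder namespace
DCT = "http://purl.org/dc/terms/"

triples = [
    ("ex:proof1",   ULO + "justifies", "ex:theorem1"),
    ("ex:theorem1", DCT + "creator",   "Jane Doe"),
    ("ex:proof1",   DCT + "creator",   "John Roe"),
]

def query(store, s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [(ts, tp, to) for (ts, tp, to) in store
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Resolve the interlink: which theorem does ex:proof1 justify,
# and who authored that theorem?
(_, _, theorem), = query(triples, s="ex:proof1", p=ULO + "justifies")
(_, _, author),  = query(triples, s=theorem, p=DCT + "creator")
print(theorem, author)  # → ex:theorem1 Jane Doe
```

In a real deployment the same pattern would be expressed as a SPARQL
query against the GraphDB endpoint; the wildcard-matching semantics
are the same.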

These are the three components realized for
\emph{ulo-storage}. In addition to them, one could think of a fourth,
a \emph{Harvester} component. Above we assumed that the ULO triplets
are already available in RDF~format. This is not necessarily the
case. It might be desirable to automate the export from third party
formats to ULO, and we think this should be the job of a Harvester
component. It would fetch mathematical knowledge from some remote
source and then provide a volatile stream of ULO data to the
Collecter, which passes it on to the Importer and so on. The big
advantage of such an approach is that exports from third party
libraries would always be up to date and would not have to be
initiated manually. Another advantage of this hypothetical component
is that running exports through the Harvester exercises the whole
import chain of Collecter and Importer, which includes
syntax~checking of the exported RDF data. Bugs in exporters that
produce faulty XML would thus be found earlier in development.
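
The chain of hypothetical Harvester, Collecter, and Importer can be
sketched as a pipeline of generators. The component names follow the
report; the function bodies and the crude IRI check are illustrative
assumptions, not the \emph{ulo-storage} implementation.

```python
# Sketch of the hypothetical Harvester -> Collecter -> Importer chain.
# Function bodies are illustrative stand-ins for the real components.
import re

def harvest(remote_exports):
    """Hypothetical Harvester: stream ULO triples from remote sources."""
    for export in remote_exports:
        yield from export  # in reality: fetch, convert to ULO, stream

def collect(triple_stream):
    """Collecter stage: basic sanity check before import."""
    iri = re.compile(r"^\w+:")  # crude check, stands in for real RDF parsing
    for s, p, o in triple_stream:
        if not (iri.match(s) and iri.match(p)):
            raise ValueError(f"malformed triple: {(s, p, o)}")
        yield (s, p, o)

def import_into_store(triple_stream, store):
    """Importer stage: load checked triples into the database."""
    store.extend(triple_stream)
    return store

# One hypothetical remote export with a single triple.
exports = [[("ex:a", "ulo:defines", "ex:b")]]
store = import_into_store(collect(harvest(exports)), [])
print(len(store))  # → 1
```

Because every triple passes through \texttt{collect} before it reaches
the store, a faulty export fails loudly at import time rather than
silently producing an incomplete data set.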

\subsection{Open Questions}

For this conclusion, we want to recap four different problems we
encountered while working on \emph{ulo-storage}.

The first problem was that of malformed RDF~exports: various exports
contained invalid URIs and wrong namespaces. As a workaround, we
provided on-the-fly correction, but of course this does not solve the
problem itself. A proposed long term solution is to automate the
export from third party libraries (e.g.\ Coq, Isabelle) to ULO
triplets in a database, eliminating the intermediate step of XML files
on disk. During import into the database, the data is thoroughly
checked and mistakes are reported right away, so bugs in exporters
that produce faulty XML would be caught early in development.
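
The kind of on-the-fly correction described above can be sketched as
follows. The concrete rewrite rule and the percent-escaping are
illustrative assumptions for this sketch; they are not the exact fixes
applied by \emph{ulo-storage}.

```python
# Minimal sketch of on-the-fly URI repair: rewrite a known-bad
# namespace and percent-escape characters that make a URI invalid.
# The rewrite table below is a hypothetical example.
from urllib.parse import quote

BAD_TO_GOOD_NS = {
    "http://example.org/wrong#": "http://example.org/ulo#",
}

def repair_uri(uri: str) -> str:
    # Step 1: replace wrong namespaces with the intended ones.
    for bad, good in BAD_TO_GOOD_NS.items():
        if uri.startswith(bad):
            uri = good + uri[len(bad):]
    # Step 2: percent-escape invalid characters (e.g. spaces) in the
    # fragment, leaving the namespace part untouched.
    head, sep, tail = uri.rpartition("#")
    return head + sep + quote(tail) if sep else quote(uri, safe=":/")

print(repair_uri("http://example.org/wrong#some term"))
# → http://example.org/ulo#some%20term
```

The drawback, as noted above, is that such rules only paper over the
symptoms; the faulty exporter itself remains unfixed.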

The second problem is that of versioning triplets in the GraphDB
store. While adding new triplets to an existing GraphDB store is not a
problem, updating existing triplets is difficult. \emph{ulo-storage}
solves this by simply re-creating the GraphDB data set at regular
intervals. This does work, but raises questions about future
scalability. That said, it might be difficult to find an efficient
alternative. Tagging each triplet with a version number doubles the
number of triplets that need to be stored and will undoubtedly make
imports into the database more costly. Perhaps re-creating the index
is in fact the best solution.
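
To illustrate the versioning alternative discussed above, the
following sketch tags every stored statement with a version number and
answers queries from the newest version only. This is a hypothetical
design for illustration, not part of \emph{ulo-storage}; note that
each logical triplet now carries an extra version tag, which is the
storage overhead the text refers to.

```python
# Hypothetical versioned store: every statement carries a version
# number, and queries return the object from the newest version.
versioned = [
    # (subject, predicate, object, version)
    ("ex:thm1", "dcterms:creator", "Jane Doe",    1),
    ("ex:thm1", "dcterms:creator", "Jane A. Doe", 2),  # later correction
]

def latest(store, s, p):
    """Return the object of (s, p) in the newest version, if any."""
    matches = [(v, o) for (ts, tp, o, v) in store if ts == s and tp == p]
    return max(matches)[1] if matches else None

print(latest(versioned, "ex:thm1", "dcterms:creator"))  # → Jane A. Doe
```

Every update adds rather than replaces a statement, so the store grows
with each correction, whereas periodic re-creation keeps only the
current state.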

The third problem is that of missing predicates in existing ULO
exports. The upper level ontology boasts a total of almost
80~predicates, yet only a third of them are actually used by the Coq
and Isabelle exports. A developer writing queries that take advantage
of the full ULO~vocabulary might be surprised that no data comes
back. This illustrates the difficulty of designing an ontology that is
both concise and expressive. While it is all well and good to
recommend that writers of exports use the full set of predicates,
using the full set might simply not make sense for a given third party
library. We think it is a bit too early to argue for the removal of
particular predicates; rather, it might be better to look at future
export projects and then evaluate which predicates are used and which
are not.
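
The predicate-usage evaluation suggested above amounts to a simple
counting pass over an export. In the sketch below, both the
three-predicate vocabulary excerpt and the sample export are made-up
placeholders; a real run would load the full ULO vocabulary and an
actual export.

```python
# Count how often each predicate occurs in an export and report
# which vocabulary predicates go unused. Data below is illustrative.
from collections import Counter

ULO_VOCABULARY = {"ulo:justifies", "ulo:defines", "ulo:uses"}  # excerpt only

export = [
    ("ex:p1", "ulo:justifies", "ex:t1"),
    ("ex:d1", "ulo:defines",   "ex:c1"),
    ("ex:p2", "ulo:justifies", "ex:t2"),
]

used = Counter(p for (_, p, _) in export)
unused = ULO_VOCABULARY - used.keys()
print(sorted(used.items()))  # usage count per predicate
print(sorted(unused))        # → ['ulo:uses']
```

Run over several future export projects, such counts would give an
empirical basis for deciding which predicates to keep, document more
prominently, or eventually retire.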

Finally, we encountered problems regarding how knowledge should be
represented at all. Our example showed that a concept such as an
algorithm might be representable using existing logic concepts. This
is surely tempting for the theorist, but it might not be the most
practical choice. The question here is what ULO is supposed to be. Is
it supposed to be a kind of \emph{machine language} for representing
knowledge? If that is the case, it may well be reasonable to represent
algorithms and other advanced concepts in terms of basic logic. This,
however, would require, we conjecture, a language on top of ULO that
makes this machine language representation available as a high level
language understood by the majority of users. If, on the other hand,
ULO~already is that high level language, it is not unreasonable to
extend the ontology with the concept of algorithms and so on.