diff --git a/doc/report/conclusion.tex b/doc/report/conclusion.tex
index d7e190f50ac935090583ec542a7ffca0e6b35ccc..5762874223f6d96676ccbcfbbc5ba44ff4708ef8 100644
--- a/doc/report/conclusion.tex
+++ b/doc/report/conclusion.tex
@@ -1,48 +1,50 @@
-\section{Conclusions and Next Steps}\label{sec:conclusion}
+\section{Conclusion}\label{sec:conclusion}

-Using the software stack explained in Section~\ref{sec:implementation}
-we were able to take existing RDF exports and import them into a
-GraphDB database. This made it possible to create the applications and
-examples of Section~\ref{sec:applications}. We showed that
-organizational knowledge formulated as ULO triplets can already give
-some insights, in particular it is possible to formulate queries for
-meta information such as authorship and contribution and resolve the
-interlinks between proofs and theorems. These examples also showed
-that existing ULO~exports only take advantage of a subset of
-ULO~predicates, something to keep in mind for future exports and in
-particular something developers of applications built on top of ULO
-have to be aware of.
+Using the \emph{ulo-storage} software stack introduced in
+Section~\ref{sec:implementation}, we were able to take existing RDF
+exports and import them into a GraphDB database. This made it possible
+to experiment with the applications and examples of
+Section~\ref{sec:applications}. We showed that organizational
+knowledge formulated as ULO triplets can already give some insights;
+in particular, it is possible to formulate queries for meta information
+such as authorship and contribution and to resolve the interlinks
+between proofs and theorems. On the other hand, our examples also
+showed that existing ULO~exports use only a subset of ULO~predicates,
+something to keep in mind as more ULO exports are developed.

-For this conclusion, we want to recap four different problems we
-encountered when working on \emph{ulo-storage}. The first problem was
-that of malformed RDF~exports. Various exports contained invalid URIs
-and wrong namespaces. As a work around we provided on the fly
-correction but of course this does not solve the problem in itself. A
-proposed long term solution to this is to automate the export from
-third party libraries (e.g.\ Coq, Isabelle) to ULO triplets in a
-database, eliminating the step of XML files on disk. During import
-into the database, the imported data is thoroughly checked and mistakes
-are reported right away. Bugs in exporters that produce faulty XML
-would be found earlier in development.
+\subsection{Potential for Future Work}

-The second problem is that of versioning triplets in the GraphDB triple
-store. While adding new triplets to an existing GraphDB store is not
-a problem, updating existing triplets is difficult. \emph{ulo-storage}
-solves this by simply re-creating the GraphDB data set in regular
-intervals. This does work, but introduces questions regarding
-scalability in the future. That said, it might be difficult to find an
-efficient alternative. Tagging each triplet with some version number
-doubles the number of triplets that need to be stored and will
-undoubtedly makes imports in the database more costly. Maybe re-creating
-the index is in fact the best solution.
+Finally, we want to recap three different problems we encountered when
+working on \emph{ulo-storage}. The first problem was that of malformed
+RDF~exports. Various exports contained invalid URIs and wrong
+namespaces. As a workaround we provided on-the-fly correction, but of
+course this does not solve the underlying problem. Perhaps a long-term
+solution is to fully automate the export from third party libraries
+(e.g.\ Coq, Isabelle) to ULO triplets in a database, eliminating the
+intermediate step of XML files on disk. During import into the
+database, the imported data would be thoroughly checked and mistakes
+reported right away. Bugs in exporters that produce faulty XML would
+be found earlier in development.
+
+The second problem is that of versioning triplets in the GraphDB
+triple store. While adding new triplets to an existing GraphDB store
+is not a problem, updating existing triplets is difficult.
+\emph{ulo-storage} circumvents this problem by re-creating the GraphDB
+data set at regular intervals. This works, but raises questions about
+scalability; then again, an efficient alternative might be hard to
+find. Tagging each triplet with a version number would double the
+number of stored triplets and undoubtedly make imports into the
+database more costly. Re-creating the index and maybe splitting up
+the knowledge base into smaller, easier-to-update sub-repositories
+looks like the most promising approach for now.

 The third problem is that of missing predicates in existing ULO
-exports. The upper level ontology boats a total of almost
+exports. The upper level ontology boasts a total of almost
 80~predicates, yet only a third of them are actually used by Coq and
 Isabelle exports. A developer writing queries that take advantage of
-the full ULO~vocabulary might be surprised that not data is coming
+the full ULO~vocabulary might be surprised that no data is coming
 back. This shows the difficulty of designing an ontology that is both
-concise and expressive. While it is all good and well to recommend
+concise and expressive. But while it is all well and good to recommend
 writers of exports to use the full set of predicates, it might simply
 not make sense to use the full set for a given third party library.
 We think that it is a bit too early to argue for the removal of
@@ -50,24 +52,7 @@ particular predicates, rather it might be better to look at future
 export projects and then evaluate which predicates are used and which
 are not.

-Finally we encountered problems in regards to how data should be
-represented at all. Our example showed that a concept such as an
-algorithm might be representable using existing logic concepts. This
-is surely very tempting for the theorist, but it might not necessarily
-be the most practical. The question here is what ULO is supposed to
-be. Is it supposed to be a kind of \emph{machine language} for
-representing language? If that is the case, it very well might be
-reasonable to represent algorithms and other advanced concepts in
-terms of basic logic. This however, we conjecture, a language on top
-of ULO to make this machine language representation available in terms
-of a high level language understood by the majority of users. If on
-the other hand, ULO~already is that high level language, it is not
-unreasonable to extend the ontology with the concept of algorithms and
-so on.
-
-Despite these four problems, that is broken URIs in exports, the
-challenge of versioning data sets, missing predicates and developments
-of the upper level ontology, \emph{ulo-storage} provides the necessary
-infrastructure for importing ULO triplets into an efficient storage
-engine. A necessary building block for a larger tetrapodal search
-system.
+Despite these open questions, \emph{ulo-storage} provides the
+infrastructure needed for importing ULO triplets into an efficient
+storage engine, a necessary building block for a larger tetrapodal
+search system.