Commit 8a4f10e6 authored by Andreas Schärtl

report: review conclusion

It is not very good :'D
parent a282ac4f

\section{Conclusion}\label{sec:conclusion}

Using the \emph{ulo-storage} software stack introduced in
Section~\ref{sec:implementation} we were able to take existing RDF
exports and import them into a GraphDB database. This made it possible
to experiment with the applications and examples of
Section~\ref{sec:applications}. We showed that organizational
knowledge formulated as ULO triplets can already give some insights;
in particular, it is possible to formulate queries for meta
information such as authorship and contribution and to resolve the
interlinks between proofs and theorems. On the other hand, our
examples also showed that existing ULO~exports only take advantage of
a subset of ULO~predicates, something to keep in mind as more
ULO~exports are developed.
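
To give an impression of what such metadata queries can look like in
practice, the following sketch asks a GraphDB repository for theorems
and their authors. It is an illustration only: the endpoint URL, the
repository name \texttt{ulo}, the namespace URI and the use of
\texttt{dcterms:creator} for authorship are assumptions and may differ
from what a concrete export actually provides.

\begin{verbatim}
# Illustrative sketch: query a GraphDB repository for theorems and
# their authors. Endpoint, repository name, namespace URI and the
# predicates used here are assumptions, not necessarily what a given
# ULO export provides.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:7200/repositories/ulo"  # assumed repository

QUERY = """
PREFIX ulo: <https://mathhub.info/ulo#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?thm ?author WHERE {
    ?thm a ulo:theorem ;
         dcterms:creator ?author .
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["thm"]["value"], row["author"]["value"])
\end{verbatim}

Resolving the interlink between a proof and the theorem it justifies
follows the same pattern, only with a different predicate in the graph
pattern.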

For this conclusion, we want to recap four different problems we
encountered when working on \emph{ulo-storage}. The first problem was
that of malformed RDF~exports. Various exports contained invalid URIs
and wrong namespaces. As a workaround we provided on the fly
correction, but of course this does not solve the underlying
problem. Perhaps a long term solution is to fully automate the export
from third party libraries (e.g.\ Coq, Isabelle) to ULO triplets in a
database, eliminating the intermediate step of XML files on disk.
During import into the database, the data could then be thoroughly
checked and mistakes reported right away; bugs in exporters that
produce faulty XML would be found earlier in development.
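
To make the idea of on the fly correction a little more concrete, the
following sketch rewrites a faulty namespace and escapes blanks in
URIs before an RDF/XML export is handed to the importer. This is not
the actual \emph{ulo-storage} implementation; the file names, the
broken namespace and the patterns are purely hypothetical.

\begin{verbatim}
# Illustrative sketch only: rewrite a known-bad namespace and escape
# blanks inside rdf:about/rdf:resource attributes before import. The
# concrete patterns are hypothetical; real exports contained different
# mistakes.
import re

BAD_NAMESPACE = "http://example.org/wrong#"    # assumed faulty prefix
GOOD_NAMESPACE = "https://mathhub.info/ulo#"   # assumed correct prefix

def fix_export(raw: str) -> str:
    fixed = raw.replace(BAD_NAMESPACE, GOOD_NAMESPACE)

    def encode(match: re.Match) -> str:
        attr, uri = match.group(1), match.group(2)
        return f'{attr}="{uri.replace(" ", "%20")}"'

    # Percent-encode blanks, a frequent source of invalid URIs.
    return re.sub(r'(rdf:about|rdf:resource)="([^"]*)"', encode, fixed)

with open("export.rdf", encoding="utf-8") as src:
    cleaned = fix_export(src.read())
with open("export.fixed.rdf", "w", encoding="utf-8") as dst:
    dst.write(cleaned)
\end{verbatim}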

The second problem is that of versioning triplets in the GraphDB
triple store. While adding new triplets to an existing GraphDB store
is not a problem, updating existing triplets is difficult.
\emph{ulo-storage} circumvents this problem by re-creating the GraphDB
data set at regular intervals. Indeed, it might be difficult to find
an efficient alternative: tagging each triplet with some version
number doubles the number of triplets that need to be stored and will
undoubtedly make imports into the database more costly. Re-creating
the index, and perhaps splitting up the knowledge base into smaller,
easier to update sub-repositories, looks like the most promising
approach for now.
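
Such a periodic re-import can be scripted in a few lines, for example
against GraphDB's RDF4J-compatible REST interface. The following is
only a sketch; host, repository name and file layout are placeholders.

\begin{verbatim}
# Sketch of the drop-and-re-create strategy: clear the repository,
# then upload the current set of RDF exports again. Assumes an
# RDF4J-compatible /statements endpoint; host, repository name and
# file layout are placeholders.
import glob
import requests

STATEMENTS = "http://localhost:7200/repositories/ulo/statements"

# Remove all existing triplets from the repository.
requests.delete(STATEMENTS).raise_for_status()

# Upload every export anew.
for path in glob.glob("exports/*.rdf"):
    with open(path, "rb") as fh:
        resp = requests.post(
            STATEMENTS,
            data=fh,
            headers={"Content-Type": "application/rdf+xml"},
        )
    resp.raise_for_status()
\end{verbatim}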

The third problem is that of missing predicates in existing ULO
exports. The upper level ontology boasts a total of almost
80~predicates, yet only a third of them are actually used by the Coq
and Isabelle exports. A developer writing queries that take advantage
of the full ULO~vocabulary might be surprised that no data is coming
back. This shows the difficulty of designing an ontology that is both
concise and expressive. But while it is all well and good to recommend
that writers of exports use the full set of predicates, it might
simply not make sense to use the full set for a given third party
library. We think that it is a bit too early to argue for the removal
of particular predicates; rather, it might be better to look at future
export projects and then evaluate which predicates are used and which
are not.
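
Such an evaluation does not require much tooling; counting how often
each ULO~predicate actually occurs in the triple store already gives a
usable picture. The following sketch does exactly that, where the
endpoint and the namespace URI are again assumptions.

\begin{verbatim}
# Sketch: count how often each ULO predicate occurs in the store, to
# see which parts of the vocabulary the exports actually use.
# Endpoint and namespace URI are assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:7200/repositories/ulo")
sparql.setQuery("""
    SELECT ?p (COUNT(*) AS ?uses) WHERE {
        ?s ?p ?o .
        FILTER(STRSTARTS(STR(?p), "https://mathhub.info/ulo#"))
    }
    GROUP BY ?p
    ORDER BY DESC(?uses)
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["p"]["value"], row["uses"]["value"])
\end{verbatim}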

The fourth and final problem concerns how data should be represented
at all. Our example showed that a concept such as an algorithm might
be representable using existing logic concepts. This is surely very
tempting for the theorist, but it might not necessarily be the most
practical approach. The question here is what ULO is supposed to
be. Is it supposed to be a kind of \emph{machine language} for
representing knowledge? If that is the case, it very well might be
reasonable to represent algorithms and other advanced concepts in
terms of basic logic. This, however, would require, we conjecture, a
language on top of ULO that makes this machine language representation
available as a high level language understood by the majority of
users. If, on the other hand, ULO~already is that high level language,
it is not unreasonable to extend the ontology with the concept of
algorithms and so on.

Despite these four problems, that is broken URIs in exports, the
challenge of versioning data sets, missing predicates and open
questions about the design of the upper level ontology itself,
\emph{ulo-storage} provides the necessary infrastructure for importing
ULO triplets into an efficient storage engine, a necessary building
block for a larger tetrapodal search system.