diff --git a/doc/report/applications.tex b/doc/report/applications.tex index 5f17c4ff9f555213d6bf9c08fa7e4d3fc6acbec9..fc6536204adc9121924a2be0ca21d31403b20401 100644 --- a/doc/report/applications.tex +++ b/doc/report/applications.tex @@ -237,7 +237,7 @@ about~$\mathcal{Q}_1$ we found out that the infrastructure provided by \emph{ulo-storage} allows us to compute arbitrary arithmetic scores, something that will surely be useful for many applications. -\subsubsection{Categorizing Algorithms and Algorithmic Problems} +\subsubsection{Categorizing Algorithms and Algorithmic Problems}\label{sec:alg} The second query~$\mathcal{Q}_2$ we decided to focus on wants to ``\emph{[f]ind algorithms that solve $NP$-complete graph diff --git a/doc/report/conclusion.tex b/doc/report/conclusion.tex index 7c8f19626ea87db36a25d0f8a772a03d92940af5..b798e97d52f64f47a1254938c83dd1bcf0f93a78 100644 --- a/doc/report/conclusion.tex +++ b/doc/report/conclusion.tex @@ -10,46 +10,6 @@ In particular, it is possible to formulate queries about meta information such as authorship and contribution. We also manged to resolve the interlinks between proofs and theorems. -To sum up this report, we want to recap three problems we encountered -when working on \emph{ulo-storage}. They have the potential to result -in future projects that further improve the availability of -organizational knowledge. - -The first problem we encountered while working on \emph{ulo-storage} -was that of malformed RDF~exports. Various exports contained invalid -URIs and incorrect namespaces. As a work around, we provided on the -fly correction. But this does not solve the problem in itself. Perhaps -a long term solution for this problem is to fully automate the export -from third party libraries (e.g.\ Coq, Isabelle) to ULO triplets in a -database, eliminating the step of XML files on disk. During import -into the database, the imported data is thoroughly checked and -mistakes are reported right away. Bugs in exporters that produce -faulty XML would be found earlier in development. - -The second problem we want to recap is that of versioning triplets in -the GraphDB triple store. While adding new triplets to an existing -GraphDB store is not a problem, updating existing triplets is -difficult. \emph{ulo-storage} circumvents this problem by re-creating -the GraphDB data set in regular intervals. Indeed it might be -difficult to find an efficient alternative. Tagging each triplet with -some version number doubles the number of triplets that need to be -stored. Re-creating the index and maybe splitting up the knowledge -base into smaller easier to update sub-repositories looks like the -most promising approach for now. What is missing is an efficient way -of servicing queries across multiple database instances. - -Finally, perhaps the most interesting but also most challenging -question is the discussion about how to develop ULO, the universal -ontology for organizational knowledge. On one hand, ULO boasts a -total of almost 80~predicates, yet only a third of them are actually -used by existing exports. A developer writing queries that take -advantage of the full ULO~vocabulary might be surprised that not data -is coming back. On the other hand, we found it difficult to model -algorithms in terms of ULO predicates and argued that it might be -necessary to further extend the upper level ontology. It will be -necessary to look at the properties of future exports to make a more -well-funded decision. - Despite many remaining open questions, \emph{ulo-storage} provides the necessary infrastructure for importing ULO triplets into an efficient storage engine. A necessary building block for a larger tetrapodal diff --git a/doc/report/references.bib b/doc/report/references.bib index 7c4e6b822a28a61e3c0b019bd58454075938e487..6ff03feea81ed7205df0b933eaf65a37708bb5b9 100644 --- a/doc/report/references.bib +++ b/doc/report/references.bib @@ -500,3 +500,25 @@ year={2004}, school={University of Waterloo} } + +@book{dragon, + title={Compilers: Principles, Techniques, And Tools: Second Edition}, + author={Aho, Alfred V and Sethi, Ravi and Ullman, Jeffrey D}, + journal={Addison Wesley}, + year={2007} +} + +@online{rval, + title = {Validation Service}, + organization = {W3C}, + date = {2006}, + urldate = {2020-11-10}, + url = {https://www.w3.org/RDF/Validator/}, +} + +@online{rcmd, + title = {Fedict/rdfvalidator}, + organization = {Federal Public Service Policy and Support DG Digital Transformation}, + urldate = {2020-11-10}, + url = {https://github.com/Fedict/rdfvalidator}, +} \ No newline at end of file diff --git a/doc/report/report.tex b/doc/report/report.tex index 0128957a166ef75ee27fb586897490ccd9fd8de9..e57047e94d537ffa0f2946a22234a31bc835aa74 100644 --- a/doc/report/report.tex +++ b/doc/report/report.tex @@ -61,6 +61,7 @@ \input{introduction.tex} \input{implementation.tex} \input{applications.tex} +\input{towards.tex} \input{conclusion.tex} \newpage diff --git a/doc/report/towards.tex b/doc/report/towards.tex new file mode 100644 index 0000000000000000000000000000000000000000..cd5bec4da908de65f4b637f4e99028546c52ee21 --- /dev/null +++ b/doc/report/towards.tex @@ -0,0 +1,97 @@ +\FloatBarrier{} +\section{Towards Manageable Ontologies} + +Before finishing up this report with a general conclusion, we first +want to dedicate a section on our thoughts on the upper level +ontology. Primarily they offer the potential for interesting future +work. We hope that these insights can aid in the development of +improved versions of~{ULO}. + +\subsection{Automatic Processing} + +ULO and tetrapodal search are most certainly in their infancy. As +such it is easy to discard concerns about scalability as premature. +However, time intensive problems with malformed RDF~files force us to +emphasize the need for more rigor when exporting to~ULO and other +ontologies. + +At the very least, syntax errors and invalid predicates need to be +avoided. Even mid sized systems require automation and automation is +not possible when each export needs to be fixed in its one particular +way. To enforce this, we recommend either (1)~automating the export +process from formal library intro GraphDB store entirely or at the +very least (2)~the use of automated tests with online or offline +validators~\cite{rval, rcmd}. + +\subsection{The Challenge of Universality} + +Remember that ULO aims to be a universal format for formulating +organizational mathematical knowledge. We took this for granted, but +maybe it is time to reflect on what an outstandingly difficult task +this actually is. With ULO, we are aiming for nothing less than an +universal schema on top of all collected (organizational) mathematical +knowledge. A grand task. + +The current version of ULO already yields worthwhile results when +formal libraries are exported to ULO~triplets. Especially when it +comes to meta data, querying such data sets proved to be easy. But an +ontology such as~ULO can only be a real game changer when it is truly +universal, that is, when it is easy to formulate any kind of +organizational knowledge in form of an ULO data set. + +As such it should not be terribly surprising that ULO forgoes the +requirement of being absolutely correct. For example, what +a \texttt{ulo:theorem} actually represent can differ depending on +where the mathematical knowledge was originally extracted from. While +at first this might feel a bit unsatisfying, it is important to +realize that the strength of ULO~data sets is search and +discovery. Particularities about meaning will eventually need to be +resolved by more concrete and specific systems. + +While that is not the hardest pill to swallow, it would be preferable +to maintain organizational knowledge in a format that is both (1)~as +correct as possible and (2)~easy to generalize and search. Future +development of the upper level ontology first needs to be very clear +on where it wants to be on this spectrum between accuracy and +generalizability. We believe that ULO is best positioned as an +ontology that is more general, at the cost of accuracy. + +\subsection{A Layered Knowledge Architecture} + +\begin{figure}[]\begin{center} + \includegraphics[width=0.5\textwidth]{figs/layers.pdf} + \caption{The idea behind layered ontologies is that instead of + translating directly from formal library to ULO~triplets, we + would translate into intermediate ontologies specific to a given + domain. These individual ontologies could then be compiled to~ULO + in an additional step.}\label{fig:love} +\end{center}\end{figure} + +While it is true that ULO as one concrete ontology will need to +converge on a specific point on that spectrum, namely one of +generalizability in favor of accuracy, this does not mean that we need +to give up on accuracy as a whole. We believe that we can have +both. We can have our cake and eat it it too. + +Current exports investigated in this report take the approach of +taking some library of formal knowledge and then converting that +library directly into ULO triplets. Perhaps a better approach would be +to use a \emph{layered architecture} instead. In this layered +architecture, we would first convert a given third party library into +triplets defined by an intermediate ontology. These triplets could +then be compiled to ULO~triplets for search. It is an approach not +unlike intermediate byte codes used in compiler +construction~\cite[pp.~357]{dragon}. While lower layers preserve more +detail, higher levels are more general and easier to search. + +A valid criticism of this proposed approach for future experiments is +that we could understand the base library as an ontology of its own. +In practice, the only difference is the file format. While formal +libraries are formulated in some domain specific formal language, when +we talk about ontologies, our canonical understanding is that of +OWL~ontologies, that is RDF~predicates with which knowledge is +formulated. A first retort to this must be that RDF is easier to index +using triple store databases such as GraphDB and that it will probably +e easier to architecture a system based around a unified format~(RDF) +rather than a zoo of formats and languages. But a final judgment +requires further investigation.