diff --git a/doc/report/towards.tex b/doc/report/towards.tex
index cd5bec4da908de65f4b637f4e99028546c52ee21..e8b0980ba269ec97e0fb351c80fa16e8fa012b02 100644
--- a/doc/report/towards.tex
+++ b/doc/report/towards.tex
@@ -1,40 +1,55 @@
 \FloatBarrier{}
 \section{Towards Manageable Ontologies}
 
-Before finishing up this report with a general conclusion, we first
-want to dedicate a section on our thoughts on the upper level
-ontology. Primarily they offer the potential for interesting future
-work. We hope that these insights can aid in the development of
-improved versions of~{ULO}.
+Before finishing up this report with a general conclusion, we want to
+dedicate a section to our thoughts on the upper level ontology and
+ontology design in general. Primarily we offer the potential for
+interesting future work.
 
 \subsection{Automatic Processing}
 
 ULO and tetrapodal search are most certainly in their infancy. As
-such it is easy to discard concerns about scalability as premature.
-However, time intensive problems with malformed RDF~files force us to
-emphasize the need for more rigor when exporting to~ULO and other
-ontologies.
-
-At the very least, syntax errors and invalid predicates need to be
-avoided. Even mid sized systems require automation and automation is
-not possible when each export needs to be fixed in its one particular
-way. To enforce this, we recommend either (1)~automating the export
-process from formal library intro GraphDB store entirely or at the
-very least (2)~the use of automated tests with online or offline
-validators~\cite{rval, rcmd}.
+such it is easy to dismiss concerns about scalability as premature.
+Regardless, we encourage research on ULO and tetrapodal search to keep
+the greater picture in mind. We believe that a greater tetrapodal
+search system can only succeed if the process of export and indexing
+is completely automated. Automation, we believe, comes down to two
+things: (1)~automation of the export infrastructure and (2)~enabling
+automation through machine readability.
+
+First of all, we believe that the export of third party libraries into
+Endpoint storage needs to be fully automated. We believe this is the
+case for two major reasons. First of all, syntax errors and invalid
+predicates need to be avoided. It is unreasonable to expect a systems
+administrator to fix each ULO~export in its own particular way. At the
+very least, automated validators~\cite{rval, rcmd} should be used to
+check the validity of ULO~exports.
+
+The second problem is one of ontology design. The goal of RDF and
+related technologies was to have universal machine readable knowledge
+available for querying. As such it is necessary to make efforts that
+the ULO exports we create are machine readable. Here we want to remind
+the reader of the previously discussed \texttt{ulo:sourceref} dilemma
+(Section~\ref{sec:exp}). It required special domain knowledge about
+the specific export for us to resolve a source reference to actual
+source code. A machine readable approach would be to instead decide on
+a fixed format for fields such as \texttt{ulo:sourceref}.
+
+Infrastructure that runs without the need of outside intervention and
+a machine readable knowledge base can lay out the groundwork for a
+greater tetrapodal search system.
 
 \subsection{The Challenge of Universality}
 
-Remember that ULO aims to be a universal format for formulating
-organizational mathematical knowledge. We took this for granted, but
-maybe it is time to reflect on what an outstandingly difficult task
-this actually is. With ULO, we are aiming for nothing less than an
-universal schema on top of all collected (organizational) mathematical
-knowledge. A grand task.
+We should remember that ULO aims to be a universal format for
+formulating organizational mathematical knowledge. Maybe it is time
+to reflect on what an outstandingly grand task this actually is. With
+ULO, we are aiming for nothing less than a universal schema on top of
+all collected (organizational) mathematical knowledge.
 
 The current version of ULO already yields worthwhile results when
 formal libraries are exported to ULO~triplets. Especially when it
-comes to meta data, querying such data sets proved to be easy. But an
+comes to metadata, querying such data sets proved to be easy. But an
 ontology such as~ULO can only be a real game changer when it is truly
 universal, that is, when it is easy to formulate any kind of
 organizational knowledge in form of an ULO data set.
@@ -44,7 +59,7 @@ requirement of being absolutely correct. For example, what a
 \texttt{ulo:theorem} actually represent can differ depending on where
 the mathematical knowledge was originally extracted from. While at
 first this might feel a bit unsatisfying, it is important to
-realize that the strength of ULO~data sets is search and
+realize that the strength of ULO~data sets must be search and
 discovery. Particularities about meaning will eventually need to be
 resolved by more concrete and specific systems.
 
@@ -54,7 +69,9 @@ correct as possible and (2)~easy to generalize and search. Future
 development of the upper level ontology first needs to be very clear
 on where it wants to be on this spectrum between accuracy and
 generalizability. We believe that ULO is best positioned as an
-ontology that is more general, at the cost of accuracy.
+ontology that is more general, at the cost of accuracy. It can serve
+as a generalized way of indexing vast amounts of formal knowledge,
+making it easy to discover and connect.
 
 \subsection{A Layered Knowledge Architecture}
 
@@ -90,8 +107,13 @@ In practice, the only difference is the file format. While formal
 libraries are formulated in some domain specific formal language,
 when we talk about ontologies, our canonical understanding is that of
 OWL~ontologies, that is RDF~predicates with which knowledge is
-formulated. A first retort to this must be that RDF is easier to index
-using triple store databases such as GraphDB and that it will probably
-e easier to architecture a system based around a unified format~(RDF)
+formulated.
+
+A first retort to this must be that RDF is easier to index using
+triple store databases such as GraphDB and that it will probably be
+easier to architecture a system based around a unified format~(RDF)
 rather than a zoo of formats and languages. But a final judgment
-requires further investigation.
+requires further investigation. Either way, we believe it to be
+worthwhile to take the accuracy-generalizability spectrum into
+account and investigate how this spectrum can be serviced with
+different layers of ontologies.