Commit a6fb8743 authored by Andreas Schärtl

report: final paper-based review

parent 73da559d
......@@ -6,8 +6,8 @@ necessary for this information to be available for querying.
\emph{ulo-storage} aims to lay out the groundwork for this task.
We collected various pieces of exported organizational knowledge into
a centralized and efficient store, made that store available on the
network and then evaluated different ways of querying that store. In
addition, implementations of some exemplary queries on top of our
created infrastructure resulted in insights on how unified schemas
for organizational mathematical knowledge could be extended.
a centralized and efficient store, made the resulting knowledge graph
available on the network and then evaluated different ways of querying
that store. In addition, implementing some exemplary queries on
top of this infrastructure resulted in insights into how unified
schemas for organizational mathematical knowledge could be extended.
......@@ -36,7 +36,7 @@
\caption{An adapted SPARQL~query based on~\ref{fig:q2a}. It
lists all works authored by ``John Smith'', ranked by number
of references. The idea is works that were referenced more
of references. The idea is that works that were referenced more
often are more important.}\label{fig:q2b}
\end{subfigure}
\vspace{8mm}
......
......@@ -5,14 +5,14 @@
\begin{subfigure}[]{0.475\textwidth}
\centering
\shadowbox{\includegraphics[width=\textwidth]{figs/explore1.png}}
\caption{Listing of all contributors in the data set. Clicking
\caption{Listing all contributors in the data set. Clicking
on a given author shows their individual contributions.}
\end{subfigure}
\hfill
\begin{subfigure}[]{0.475\textwidth}
\centering
\shadowbox{\includegraphics[width=\textwidth]{figs/explore2.png}}
\caption{Listing works a given contributor worked on.\\}
\caption{Listing works a given contributor has worked on.\\}
\end{subfigure}
\vskip\baselineskip{}
\begin{subfigure}[]{0.475\textwidth}
......
......@@ -17,13 +17,12 @@ Endpoint, our second contribution.
\item We investigated queries that could be used to extend the
system into a larger tetrapodal search system. While some
organizational queries have obvious canonical solutions others
organizational queries have obvious canonical solutions, others
raise questions about how organizational knowledge should be
structured (Section~\ref{sec:tetraq}).
\end{itemize}
\noindent These applications will now be discussed in the following
sections.
\noindent We will now discuss each application in more detail.
\subsection{Exploring Existing Data Sets}\label{sec:expl}
......@@ -47,7 +46,7 @@ predicates. For example, the Isabelle export contains organizational
meta information such as details about paragraphs and sections in
the source document, while the Coq export only tells us the filename of
the original Coq source. Both exports have their particularities and
with more and more third party libraries exported to ULO one has to
with more and more third party libraries exported to ULO, one has to
assume that this heterogeneity will only grow. In particular, we want
to point out the large number of predicates which remain unused in both
Isabelle and Coq exports. A user formulating queries for ULO might be
......@@ -77,41 +76,52 @@ increases.
\subsection{Interactive Exploration}\label{sec:exp}
Our second application wants to illustrate how to browse an ULO data
set interactively. For this purpose, we developed a web front end that
allows users to browse contributions by author. For this application,
we used the RDF4J Endpoint, the application itself is implemented in
Java. Figure~\ref{fig:appss} shows four screenshots of the current
version. Once the user has selected a given contribution, we list
some basic information about the selected object, such as type
(e.g.\ lemma or corollary) and name as well as references to other
objects in the knowledge base.
The second application we want to discuss illustrates how to browse an
ULO data set interactively. Here, we developed a web front end that
allows users to browse contributions to the knowledge base sorted by
author. In this particular case, we used the RDF4J Endpoint; the
application itself is implemented in Java. Figure~\ref{fig:appss}
shows four screenshots of the current version. Once the user has
selected a given contribution, we list some basic information about
the selected object, such as its type (e.g.\ lemma or corollary) and name,
as well as references to other objects in the knowledge base.
\input{applications-screenshots.tex}
This application is interesting because similar approaches could be
This experiment is interesting because similar approaches could be
applied when designing IDE~integration for working with organizational
knowledge. Given some object in a document identified by a URI, we
can look at how that object is connected to other objects in the
knowledge graph.
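A minimal sketch of such a lookup is the following SPARQL query, which
lists every triplet in which a given object appears;
\texttt{<object-uri>} is a placeholder for the URI in question.
\begin{lstlisting}
# <object-uri> is a placeholder; prefix declarations omitted.
SELECT ?s ?p ?o WHERE {
  { <object-uri> ?p ?o }   # connections where the object is the subject
  UNION
  { ?s ?p <object-uri> }   # connections where the object is the object
}
\end{lstlisting}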
We can also connect the dots back to the original source. In
particular, we took a look at the \texttt{ulo:sourceref} property
defined for many objects in the knowledge base.
\texttt{ulo:sourceref} is ``the URI of the physical location (e.g.,
file/URI, line, column) of the source code that introduced the
subject''~\cite{uloonto} and in use in the Isabelle
export~\cite{uloisabelle}. Making the connection from ULO~object to
original Isabelle document is convenient for the user, but required
some extra work from us as application implementors. The reason for
this is that the format of the \texttt{ulo:sourceref} property is not
well-defined, rather it is up to implementors of library exporters how
to format source references. The consequence is that for each export,
developers of front ends will need to write custom code for finding
the original source. Maybe it would be a reasonable requirement for
Our front end can also connect the dots back from ULO~object to
original source code. In particular, we took a look at the
\texttt{ulo:sourceref} predicate defined for many objects in the
knowledge base. \texttt{ulo:sourceref} is supposed to contain ``the
URI of the physical location (e.g., file/URI, line, column) of the
source code that introduced the subject''~\cite{uloonto} and is in use
in the Isabelle export~\cite{uloisabelle}. But while making the
connection from exported ULO~object to
original Isabelle document is convenient for the user, this feature
required some extra work from us as application implementors. The
reason for this is that the format of the \texttt{ulo:sourceref}
property does not appear to be well-defined; rather, it is up to
implementors of library exporters how to format these source
references. For example, in the Isabelle export, we have to translate
source references such as
\begin{verbatim}
https://isabelle.in.tum.de/source/HOL/HOL-Algebra/
HOL-Algebra.Group.theory#17829.576.8:17836.576.15
\end{verbatim}
to the original source
\begin{verbatim}
https://isabelle.in.tum.de/dist/library/HOL/HOL-Algebra/Group.html
\end{verbatim}
The consequence is that for each export, developers of
front ends will need to write custom code for finding the original
source. Maybe it would be a reasonable requirement for
\texttt{ulo:sourceref} to have a well-defined format, ideally an
actually reachable URI on the open web, if that is applicable for a given
library.
actually reachable URI on the open web, if that is applicable for a
given library.
While translating from \texttt{ulo:sourceref} to original URI
introduced some extra work, implementing this application was easy and
......@@ -128,11 +138,11 @@ proof of concept implementations and evaluate their applicability.
\subsubsection{Elementary Proofs and Computed Scores}
We start with query~$\mathcal{Q}_1$ which illustrates how one can
We start with query~$\mathcal{Q}_1$ which illustrates how we can
compute arithmetic scores for objects in our knowledge graph.
Query~$\mathcal{Q}_1$ asks us to ``\emph{[f]ind theorems with
non-elementary proofs}.'' Elementary proofs can be understood as
those proof that are considered easy and
non-elementary proofs}''~\cite{tetra}. Elementary proofs can be
understood as those proofs that are considered easy and
obvious~\cite{elempro,elemtest}. In consequence,~$\mathcal{Q}_1$ has
to search for all proofs which are not trivial. Of course, just like
any distinction between ``theorem'' and ``corollary'' is going to be
......@@ -149,13 +159,10 @@ prospective school teachers had problems with notation and term
rewriting and were missing required prerequisites~\cite{proofund,
proofteach}, none of which seems applicable for grading individual
formal proofs for difficulty. On the other hand, there is very
specific research on rating proofs for some given subsets of problems,
e.g.\ on the satisfiability of a given CNF formula. A particular
example is focused on heuristics for how long a SAT solver will take
to find a solution for a given~{CNF}. Here, solutions that take long
are considered harder~\cite{proofsat}.
specific research on rating proofs for concrete problems such as the
satisfiability of a given CNF formula~\cite{proofsat}.
\noindent\emph{Organizational Aspect.} A first working hypothesis
\emph{Organizational Aspect.} A first working hypothesis
might be to assume that elementary proofs are short. In that case, the
size, that is, the number of bytes needed to store the proof, is our first
indicator of proof complexity. This is by no means perfect, as even
......@@ -170,22 +177,22 @@ working hypothesis.
us to sort by file size. Maybe proof complexity is also reflected in the
check times of proof assistants and automatic theorem provers. With
this assumption in mind we could use the \texttt{ulo:check-time}
predicate. Correlating proof complexity with file size allows us to
define one indicator of proof complexity based on organizational
knowledge alone.
predicate. Correlating proof complexity with file size or check time
allows us to define one indicator of proof complexity based on
organizational knowledge alone.
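As an illustration of this idea, the following SPARQL sketch orders
proofs by their check time. It assumes that a predicate such as
\texttt{ulo:justifies} links a proof to the statement it establishes;
the exact predicate may differ between exports.
\begin{lstlisting}
# ulo:justifies is assumed as the link from proof to theorem;
# prefix declarations are omitted.
SELECT ?theorem ?proof ?time WHERE {
  ?proof ulo:justifies ?theorem .
  ?proof ulo:check-time ?time .
}
ORDER BY DESC(?time)
\end{lstlisting}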
\noindent\emph{Other Aspects.} A tetrapodal search system should
\emph{Other Aspects.} A tetrapodal search system should
probably also take symbolic knowledge into account. Based on some kind
of measure of formula complexity, different proofs could be
rated. Similarly, with narrative knowledge available to us, we could
count the number of words, citations and so on to rate the narrative
complexity of a proof. Combining symbolic knowledge, narrative
knowledge and organizational knowledge should allow us to find proofs
which are probably easier than others.
which are easier than others.
\input{applications-q1.tex}
\noindent\emph{Implementation.} Implementing a naive version of the
\emph{Implementation.} Implementing a naive version of the
organizational aspect can be as simple as querying for all theorems
justified by proofs, ordered by size (or check time).
Figure~\ref{fig:q1short} illustrates how this can be achieved with a
......@@ -231,13 +238,13 @@ something that will surely be useful for many applications.
\subsubsection{Categorizing Algorithms and Algorithmic Problems}
The second query~$\mathcal{Q}_2$ we decided to focus on wants to
``\emph{[f]ind algorithms that solve $NP$-complete graph problems.}''
Here we want the tetrapodal search system to return a listing of
algorithms that solve (graph) problems with a given property (runtime
complexity). We need to consider where each of these three components
might be stored.
``\emph{[f]ind algorithms that solve $NP$-complete graph
problems}''~\cite{tetra}. Here we want the tetrapodal search system
to return a listing of algorithms that solve (graph) problems with a
given property (runtime complexity). We need to consider where and how
each of these components might be found.
\noindent\emph{Symbolic and Concrete Aspects.} First, let us consider
\emph{Symbolic and Concrete Aspects.} First, let us consider
algorithms. Algorithms can be formulated as computer code which can be
understood as symbolic knowledge (code represented as a syntax tree)
or as concrete knowledge (code as text
......@@ -248,25 +255,20 @@ automatically in the general case~\cite{np}. Metadata of this kind
needs to be stored in a separate index for organizational knowledge,
which is the only fitting place for it.
\noindent\emph{Organizational Aspect.} If we wish to look up properties
about algorithms from organizational knowledge, we first have to
think about how to represent this information. We propose two
approaches, one using the existing ULO ontology and one that
recommends extending ULO to nativly support the concept of
algorithms. Both approaches have their distinct trade-offs.
As a first approach, we can try to represent algorithms and
problems in terms of existing ULO predicates. As ULO does have a
concept of \texttt{theorem} and \texttt{proof}, it might be
tempting to exploit the Curry-Howard correspondence and represent
algorithms understood as programs in terms of proofs. But that
does not really capture the canonical understanding of algorithms;
algorithms are not actually programs, rather there are programs
that implement algorithms. Even if we do work around this problem,
it is not clear how to represent problems (e.g.\ the traveling
salesman problem or the sorting problem) in terms of theorems
(propositions, types) that get implemented by a proof (algorithm,
program).
\emph{Organizational Aspect.} If we wish to look up properties about
algorithms from organizational knowledge, we first have to think about
how to represent this information. As a first approach, we can try to
represent algorithms and problems in terms of existing ULO
predicates. As ULO does have a concept of \texttt{ulo:theorem} and
\texttt{ulo:proof}, it might be tempting to exploit these predicates
and represent algorithms understood as programs in terms of proofs.
But that does not really capture the canonical understanding of
algorithms. Algorithms are not actually programs; rather, there are
programs that implement algorithms. Even if we do work around this
problem, it is not clear how to represent problems (e.g.\ the
traveling salesman problem or the sorting problem) in terms of
theorems (propositions, types) that get implemented by a proof
(algorithm, program).
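To make this more tangible, the following sketch shows the kind of
query one would like to write if ULO offered dedicated vocabulary for
algorithms; the names \texttt{ulo:algorithm}, \texttt{ulo:solves} and
\texttt{ulo:complexity-class} are invented for illustration and are
not part of the current ontology.
\begin{lstlisting}
# All ulo: names below are hypothetical and only sketch a
# possible extension of the ontology.
SELECT ?algo WHERE {
  ?algo a ulo:algorithm .
  ?algo ulo:solves ?problem .
  ?problem ulo:complexity-class "NP-complete" .
}
\end{lstlisting}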
As algorithms make up an important part of certain areas of research,
it might be reasonable to introduce native level support for
......@@ -291,16 +293,16 @@ solve algorithmic problems.
\subsubsection{Contributors and Number of References}
Finally, query~$\mathcal{Q}_3$ from literature wants to
know ``\emph{[a]ll areas of math that {Nicolas G.\ de Bruijn} has
worked in and his main contributions}''~\cite{tetra}. $\mathcal{Q}_3$~is asking
for works of a given author~$A$. It also asks for their main
Finally, query~$\mathcal{Q}_3$ from the literature wants to know
``\emph{[a]ll areas of math that {Nicolas G.\ de Bruijn} has worked in
and his main contributions}''~\cite{tetra}. $\mathcal{Q}_3$~is
asking for works of a given author~$A$. It also asks for their main
contributions, for example which particularly interesting paragraphs
or code~$A$ has authored. We picked this particular query as it
is asking for metadata, something that should be easily serviced by
or code~$A$ has authored. We picked this particular query as it is
asking for metadata, something that should be easily serviced by
organizational knowledge.
\noindent\emph{Organizational Aspect.} ULO has no concept of authors,
\emph{Organizational Aspect.} ULO has no concept of authors,
contributors, dates and so on. Rather, the idea is to take advantage
of the Dublin Core project which provides an ontology for such
metadata~\cite{dcreport, dcowl}. For example, Dublin Core provides us
......@@ -316,19 +318,18 @@ objects ordered by authors.
\input{applications-q3.tex}
\noindent\emph{Implementation.} A search for contributions by a given
author can easily be formulated in {SPARQL}~(Figure~\ref{fig:q2a}).
Query $\mathcal{Q}_3$ is also asking for the main contributions
of~$A$, that is those works that~$A$ authored that are the most
important. Sorting the result by number of references might be a good
start. To get the main contributions, we rate each individual work by
its number of \texttt{ulo:uses} references. Extending the previous
{SPARQL} query, we can ask the database for an ordered list of works,
starting with the one that has the most
references~(Figure~\ref{fig:q2b}). We see that one can
formulate~$\mathcal{Q}_3$ with just one SPARQL query. Because
everything is handled by the database, access should be about as quick
as we can hope it to be.
\emph{Implementation.} A search for contributions by a given author
can easily be formulated in {SPARQL}~(Figure~\ref{fig:q2a}). Query
$\mathcal{Q}_3$ is also asking for the main contributions of~$A$, that
is those works that~$A$ authored that are the most important. Sorting
the result by number of references can be a good start. To get the
main contributions, we rate each individual work by its number of
\texttt{ulo:uses} references. Extending the previous {SPARQL} query,
we can ask the database for an ordered list of works, starting with
the one that has the most references~(Figure~\ref{fig:q2b}). We see
that one can formulate~$\mathcal{Q}_3$ with just one SPARQL
query. Because everything is handled by the database, access should be
about as quick as we can hope it to be.
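Condensed to its core, such a query could look like the following
sketch; the use of \texttt{dc:creator} and the modelling of author
names as plain literals are assumptions that depend on the export at
hand.
\begin{lstlisting}
# dc:creator and the literal author name are assumptions;
# prefix declarations are omitted.
SELECT ?work (COUNT(?user) AS ?refs) WHERE {
  ?work dc:creator "Nicolas G. de Bruijn" .
  ?user ulo:uses ?work .
}
GROUP BY ?work
ORDER BY DESC(?refs)
\end{lstlisting}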
While the sparse data set available to us only returned a handful of
results, we see that queries like~$\mathcal{Q}_3$ are easily serviced
......@@ -341,15 +342,16 @@ approaches~\cite{citcart}.
Experimenting with $\mathcal{Q}_1$ to $\mathcal{Q}_3$ provided us with
some insight into ULO and existing ULO exports. $\mathcal{Q}_1$ shows
that while there is no formal definition for ``elementary proof'', ULO
allows us to query for heuristics and calculate arbitrary arithmetic scores for
objects of organizational knowledge. Query~$\mathcal{Q}_2$ illustrates
the difficulty in finding universal schemas. It remains an open question
whether ULO should include algorithms as a first class citizen, as a
concept based around existing ULO predicates or whether it is a better
idea to design a dedicated ontology and potentially data store entirely.
Finally, while we were able to formulate a SPARQL query that should
take care of most of~$\mathcal{Q}_3$ we found that the existing data sets
contain very little information about authorship. This underlines
the observations made previously in Section~\ref{sec:expl} and should
be on the mind of anyone writing exporters that output ULO~triplets.
that while there is no universal definition for ``elementary proof'',
ULO allows us to query for heuristics and calculate arbitrary
arithmetic scores for objects of organizational
knowledge. Query~$\mathcal{Q}_2$ illustrates the difficulty in finding
universal schemas. It remains an open question whether ULO should
include algorithms as a first-class citizen, as a concept based around
existing ULO predicates, or whether it is a better idea to design an
entirely separate ontology and data store. Finally, while we were
able to formulate a SPARQL query that should take care of most
of~$\mathcal{Q}_3$, we found that the existing data sets contain very
little information about authorship. This underlines the observations
made previously in Section~\ref{sec:expl} and should be on the mind of
anyone writing exporters that output ULO~triplets.
......@@ -5,50 +5,50 @@ Section~\ref{sec:implementation} we were able to take existing RDF
exports and import them into a GraphDB database. This made it possible
to experiment with the applications and examples of
Section~\ref{sec:applications}. We showed that organizational
knowledge formulated as ULO triplets can already give some insights,
in particular it is possible to formulate queries for meta information
such as authorship and contribution and resolve the interlinks between
proofs and theorems. On the other hand, our examples also showed that existing
ULO~exports only take advantage of a subset of ULO~predicates,
something to watch as more ULO exports are being worked on.
knowledge formulated as ULO triplets can already give some insights.
In particular, it is possible to formulate queries about meta
information such as authorship and contribution. We also managed to
resolve the interlinks between proofs and theorems.
\subsection{Potential for Future Work}
To sum up this report, we want to recap three problems we encountered
when working on \emph{ulo-storage}. They have the potential to result
in future projects that further improve the availability of
organizational knowledge.
Finally, we want to recap three different problems we encountered when
working on \emph{ulo-storage}. The first problem was that of malformed
RDF~exports. Various exports contained invalid URIs and wrong
namespaces. As a work around we provided on the fly correction but of
course this does not solve the problem in itself. Perhaps a long term
solution for this problem is to fully automate the export from third
party libraries (e.g.\ Coq, Isabelle) to ULO triplets in a database,
eliminating the step of XML files on disk. During import into the
database, the imported data is thoroughly checked and mistakes are
reported right away. Bugs in exporters that produce faulty XML would
be found earlier in development.
The first problem we encountered while working on \emph{ulo-storage}
was that of malformed RDF~exports. Various exports contained invalid
URIs and incorrect namespaces. As a workaround, we provided on-the-fly
correction. But this does not solve the underlying problem. Perhaps
a long term solution for this problem is to fully automate the export
from third party libraries (e.g.\ Coq, Isabelle) to ULO triplets in a
database, eliminating the step of XML files on disk. During import
into the database, the imported data could then be thoroughly checked
and mistakes reported right away. Bugs in exporters that produce
faulty XML would be found earlier in development.
The second problem is that of versioning triplets in the GraphDB
triple store. While adding new triplets to an existing GraphDB store
is not a problem, updating existing triplets is
The second problem we want to recap is that of versioning triplets in
the GraphDB triple store. While adding new triplets to an existing
GraphDB store is not a problem, updating existing triplets is
difficult. \emph{ulo-storage} circumvents this problem by re-creating
the GraphDB data set at regular intervals. Indeed, it might be
difficult to find an efficient alternative. Tagging each triplet with
some version number doubles the number of triplets that need to be
stored. Re-creating the index and maybe splitting up the knowledge
base into smaller, easier-to-update sub-repositories looks like the
most promising approach for now.
most promising approach for now. What is missing is an efficient way
of servicing queries across multiple database instances.
The third problem is that of missing predicates in existing ULO
exports. The upper level ontology boasts a total of almost
80~predicates, yet only a third of them are actually used by Coq and
Isabelle exports. A developer writing queries that take advantage of
the full ULO~vocabulary might be surprised that not data is coming
back. This shows the difficulty of designing an ontology that is both
concise and expressive. But while it is all good and well to recommend
writers of exports to use the full set of predicates, it might simply
not make sense to use the full set for a given third party library. We
think that it is too early to argue for the removal of particular
predicates, rather it is better to look at future export projects and
then evaluate which predicates are in use and which are not.
Finally, perhaps the most interesting but also most challenging
question is how to further develop ULO, the upper level
ontology for organizational knowledge. On the one hand, ULO boasts a
total of almost 80~predicates, yet only a third of them are actually
used by existing Coq and Isabelle exports. A developer writing queries
that take advantage of the full ULO~vocabulary might be surprised that
no data is coming back. On the other hand, we found it difficult to
model algorithms in terms of ULO predicates and argued that it might
be necessary to further extend the upper level ontology. It will be
necessary to look at the properties of future exports to make a more
well-founded decision.
Despite many open questions, \emph{ulo-storage} provides the necessary
infrastructure for importing ULO triplets into an efficient storage
......
......@@ -61,10 +61,10 @@ is no pressing need to force them into separate processes.
Our implementation supports two sources for RDF files, namely Git
repositories and the local file system. The file system Collector
crawls a given directory on the local machine and looks for
RDF~XMl~files~\cite{rdfxml} while the Git Collector first clones a Git
RDF~XML~files~\cite{rdfxml} while the Git Collector first clones a Git
repository and then passes the checked out working copy to the file
system Collector. Because we found that is not uncommon for RDF files to be
compressed, our Collector supports on the fly extraction of
system Collector. Because we found that it is not uncommon for RDF files
to be compressed, our implementation supports on-the-fly extraction of
gzip~\cite{gzip} and xz~\cite{xz} formats which can greatly reduce the
required disk space in the collection step.
......@@ -75,7 +75,7 @@ Coq exports contained URIs which does not fit the official syntax
specification~\cite{rfc3986} as they contained illegal
characters. Previous work~\cite{ulo} that processed Coq and Isabelle
exports used database software such as Virtuoso Open
Source~\cite{wikivirtuoso} which do not properly check URIs according
Source~\cite{wikivirtuoso} which does not properly check URIs according
to the specification; in consequence, these faults were only discovered
now. To tackle these problems, we introduced on-the-fly correction steps
during collection that escape the URIs in question and then continue
......@@ -164,18 +164,19 @@ Another approach is to regularly re-create the full data
set~$\mathcal{D}$ from scratch, say every seven days. This circumvents
the problems related to updating existing data sets, but also means
that changes in a given library~$\mathcal{L}_i$ take some time to propagate
to~$\mathcal{D}$. Building on this idea, an advanced version of this
approach could forgo the requirement for one single database
storage~$\mathcal{D}$ entirely. Instead of maintaining just one global
database state~$\mathcal{D}$, we suggest experimenting with dedicated
database instances~$\mathcal{D}_i$ for each given
to~$\mathcal{D}$. Continuing this train of thought, an advanced
version of this approach could forgo the requirement for one single
database storage~$\mathcal{D}$ entirely. Instead of maintaining just
one global database state~$\mathcal{D}$, we suggest experimenting with
dedicated database instances~$\mathcal{D}_i$ for each given
library~$\mathcal{L}_i$. The advantage here is that re-creating a
given database representation~$\mathcal{D}_i$ is fast as
exports~$\mathcal{E}_i$ are comparably small. The disadvantage is that
we still want to query the whole data set~$\mathcal{D} = \mathcal{D}_1
\cup \mathcal{D}_2 \cup \cdots \cup \mathcal{D}_n$. This does require the
development of some cross-database query mechanism, functionality GraphDB
currently only offers limited support for~\cite{graphdbnested}.
\cup \mathcal{D}_2 \cup \cdots \cup \mathcal{D}_n$. This does require
the development of some cross-database query mechanism, functionality
for which existing systems currently offer only limited
support~\cite{graphdbnested}.
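One candidate for such a cross-database mechanism is SPARQL~1.1
federation, where a query delegates parts of its pattern to remote
endpoints through the \texttt{SERVICE} keyword; the endpoint URLs in
the following sketch are placeholders.
\begin{lstlisting}
# Union of the results of two per-library endpoints;
# both endpoint URLs are placeholders.
SELECT ?s ?p ?o WHERE {
  { SERVICE <http://host-1/repositories/isabelle> { ?s ?p ?o } }
  UNION
  { SERVICE <http://host-2/repositories/coq> { ?s ?p ?o } }
}
\end{lstlisting}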
In summary, we see that versioning is a potential challenge for a
greater tetrapodal search system. While not a pressing issue for
......@@ -188,7 +189,7 @@ Endpoint. Recall that an Endpoint provides the programming interface
for applications that wish to query our collection of organizational
knowledge. In practice, the choice of Endpoint programming interface
is determined by the choice of database system as the Endpoint is
provided directly by the database.
provided directly by the database system.
In our project, organizational knowledge is formulated as
RDF~triplets. The canonical choice for us is to use a triple store,
......@@ -206,7 +207,7 @@ OWL~Reasoning~\cite{owlspec, graphdbreason}. In particular, this
means that GraphDB offers support for transitive queries as described
in previous work on~ULO~\cite{ulo}. A transitive query is one that,
given a relation~$R$, asks for the transitive closure~$S$
of~$R$~\cite{tc} (Figure~\ref{fig:tc}).
(Figure~\ref{fig:tc}) of~$R$.
\input{implementation-transitive-closure.tex}
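For illustration, a transitive query can also be expressed in plain
SPARQL~1.1 through property paths; the sketch below follows one or
more \texttt{ulo:uses} steps from a given object, with
\texttt{<object-uri>} as a placeholder.
\begin{lstlisting}
# The + path operator follows ulo:uses one or more times,
# i.e. it computes the transitive closure of the relation.
SELECT ?x WHERE { <object-uri> ulo:uses+ ?x }
\end{lstlisting}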
......@@ -235,14 +236,14 @@ includes not just syntax and semantics of the language itself, but
also a standardized REST interface~\cite{rest} for querying database
servers.
SPARQL was inspired by SQL and as such the \texttt{SELECT}
The SPARQL syntax was inspired by SQL and as such \texttt{SELECT}
\texttt{WHERE} queries should be familiar to many software developers.
A simple query that returns all triplets in the store looks like
\begin{lstlisting}
SELECT * WHERE { ?s ?p ?o }
\end{lstlisting}
where \texttt{?s}, \texttt{?p} and \texttt{?o} are query
variables. The result of any query are valid substitutions for the
variables. The result of any query is a set of valid substitutions for all
query variables. In this particular case, the database would return a
table of all triplets in the store, listing subject~\texttt{?s},
predicate~\texttt{?p} and object~\texttt{?o}.
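Fixing parts of the triple pattern makes a query more selective; for
example, the following sketch returns only those pairs of objects
connected by the \texttt{ulo:uses} predicate.
\begin{lstlisting}
SELECT ?s ?o WHERE { ?s ulo:uses ?o }
\end{lstlisting}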
......@@ -300,7 +301,7 @@ containers, that is lightweight virtual machines with a fixed
environment for running a given application~\cite[pp. 22]{dockerbook}.
Docker Compose then is a way of combining individual Docker containers
to run a full tech stack of application, database server and so
on~\cite[pp. 42]{dockerbook}. All configuration of the overarching a
on~\cite[pp. 42]{dockerbook}. All configuration of the overarching
setup is stored in a Docker Compose file that describes the software
stack.
......@@ -313,11 +314,12 @@ code for Collector and Importer is available in the
deployment files, that is, the Docker Compose configuration and
additional Dockerfiles, are stored in a separate repository~\cite{dockerfilerepo}.
This concludes our discussion of the implementation developed for the
\emph{ulo-storage} project. We designed a system based around (1)~a
Collector which collects RDF triplets from third party sources, (2)~an
Importer which imports these triplets into a GraphDB database and
(3)~looked at different ways of querying a GraphDB Endpoint. All of
this is easy to deploy using a single Docker Compose file. With this
stack ready for use, we will continue with a look at some interesting
applications and queries built on top of this infrastructure.
With this, we conclude our discussion of the implementation developed
for the \emph{ulo-storage} project. We designed a system based around
(1)~a Collector which collects RDF triplets from third party sources
and (2)~an Importer which imports these triplets into a GraphDB
database; we then (3)~looked at different ways of querying a GraphDB
Endpoint. All
of this is easy to deploy using a single Docker Compose file. With
this stack ready for use, we will now continue with a look at some
interesting applications and queries built on top of this
infrastructure.
......@@ -13,14 +13,14 @@ One topic of research in this field is the idea of a \emph{tetrapodal
search} that combines four distinct areas of mathematical knowledge.
These four kinds are (1)~the actual formulae as \emph{symbolic
knowledge}, (2)~examples and concrete objects as \emph{concrete
knowledge}, (3)~names and comments as \emph{narrative knowledge} and
knowledge}, (3)~prose, names and comments as \emph{narrative knowledge} and
finally (4)~identifiers, references and their relationships, referred
to as \emph{organizational knowledge}~\cite{tetra}.
Tetrapodal search aims to provide a unified search engine that indexes
each of these four different subsets of mathematical knowledge.
Because all four kinds of knowledge are inherently unique in their
structure, tetrapodal search proposes that each kind of mathematical
structure, tetrapodal search proposes that each subset of mathematical
knowledge should be made available in a storage backend that fits the
kind of data it is providing. With all four areas available for
querying, tetrapodal search intends to then combine the four indexes
......@@ -31,8 +31,8 @@ mathematical knowledge. In this context, the focus of
A previously proposed way to structure organizational knowledge is the
\emph{upper level ontology} (ULO)~\cite{ulo}. ULO takes the form of an
OWL~ontology~\cite{uloonto} and as such all organization information
is stored as RDF~triplets with a unified schema of
OWL~ontology~\cite{uloonto} and as such all organizational knowledge is
stored as RDF~triplets with a unified schema of
ULO~predicates~\cite{owl}. Some effort has been made to export
existing collections of formal mathematical knowledge to {ULO}. In
particular, exports from Isabelle and Coq-based libraries are
......@@ -51,8 +51,8 @@ analysis. We collected RDF files spread over different Git
repositories, imported them into a database and then experimented with
APIs for accessing that data set.
The main contribution of \emph{ulo-storage} is twofold. First, (1)~we
built up various infrastructure components that make organizational
The contribution of \emph{ulo-storage} is twofold. First, (1)~we built
up various infrastructure components that make organizational
knowledge easy to query. These components can serve as building blocks
of a larger tetrapodal search system. Their design and implementation
are discussed in Section~\ref{sec:implementation}. Second, (2)~we ran
......