Skip to content
GitLab
Explore
Sign in
Primary navigation
Search or go to…
Project
S
schaertl_andreas
Manage
Activity
Members
Labels
Plan
Issues
Issue boards
Milestones
Wiki
Code
Merge requests
Repository
Branches
Commits
Tags
Repository graph
Compare revisions
Snippets
Build
Pipelines
Jobs
Pipeline schedules
Artifacts
Deploy
Releases
Container Registry
Model registry
Operate
Environments
Monitor
Incidents
Analyze
Value stream analytics
Contributor analytics
CI/CD analytics
Repository analytics
Model experiments
Help
Help
Support
GitLab documentation
Compare GitLab plans
Community forum
Contribute to GitLab
Provide feedback
Terms and privacy
Keyboard shortcuts
?
Snippets
Groups
Projects
Show more breadcrumbs
supervision
schaertl_andreas
Commits
3ac4cf5b
Commit
3ac4cf5b
authored
4 years ago
by
Andreas Schärtl
Browse files
Options
Downloads
Patches
Plain Diff
report: review intro
parent
c7f44916
No related branches found
Branches containing commit
No related tags found
No related merge requests found
Changes
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
doc/report/intro.tex
+48
-41
48 additions, 41 deletions
doc/report/intro.tex
with
48 additions
and
41 deletions
doc/report/intro.tex
+
48
−
41
View file @
3ac4cf5b
...
@@ -21,10 +21,10 @@ Tetrapodal search aims to provide a unified search engine that indexes
...
@@ -21,10 +21,10 @@ Tetrapodal search aims to provide a unified search engine that indexes
each of the four different subsets of mathematical knowledge. Because
each of the four different subsets of mathematical knowledge. Because
all four kinds of knowledge are inherently different in their
all four kinds of knowledge are inherently different in their
structure, tetrapodal search proposes that each kind of mathematical
structure, tetrapodal search proposes that each kind of mathematical
knowledge should be made available in a storage backend that fits
knowledge should be made available in a storage backend that fits
the
exactly with the
kind of data it is providing. With all four areas
kind of data it is providing. With all four areas
available for
available for
querying, tetrapodal search intends to then combine
querying, tetrapodal search intends to then combine
the four indexes
the four indexes
into a single query interface.
into a single query interface.
\subsection
{
Focus on Organizational Knowledge
}
\subsection
{
Focus on Organizational Knowledge
}
...
@@ -38,7 +38,7 @@ A previously proposed way to structure such organizational data is the
...
@@ -38,7 +38,7 @@ A previously proposed way to structure such organizational data is the
OWL~ontology~
\cite
{
uloonto
}
and as such all organization information
OWL~ontology~
\cite
{
uloonto
}
and as such all organization information
is stored as RDF~triplets with a unified schema of
is stored as RDF~triplets with a unified schema of
ULO~predicates~
\cite
{
owl
}
. Some effort has been made to export
ULO~predicates~
\cite
{
owl
}
. Some effort has been made to export
existing databases of formal mathematical knowledge to
{
ULO
}
. I
n
existing databases of formal mathematical knowledge to
{
ULO
}
, i
n
particular, there exist exports from Isabelle and Coq
particular, there exist exports from Isabelle and Coq
libraries~
\cite
{
uloisabelle, ulocoq
}
. The resulting data set is
libraries~
\cite
{
uloisabelle, ulocoq
}
. The resulting data set is
already quite large, the Isabelle export alone containing more than
already quite large, the Isabelle export alone containing more than
...
@@ -53,72 +53,79 @@ accessible for querying and analysis. We collected RDF files spread
...
@@ -53,72 +53,79 @@ accessible for querying and analysis. We collected RDF files spread
over different Git repositories, imported them into a database and
over different Git repositories, imported them into a database and
then experimented with APIs for accessing that data set.
then experimented with APIs for accessing that data set.
The main contribution of
this project
is twofold. First, (1)
we
built
The main contribution of
\emph
{
ulo-storage
}
is twofold. First, (1)
~
we
up various infrastructure components that can make up building
blocks
built
up various infrastructure components that can make up building
in a larger tetrapodal search system. Second, (2)~we ran sample
blocks
in a larger tetrapodal search system. Second, (2)~we ran sample
prototype applications and queries on top of this interface. While the
prototype applications and queries on top of this interface. While the
applications themselves are admittedly not very interesting, they can
give
applications themselves are admittedly not very interesting, they can
us insight about future development of
~
{
ULO
}
.
give
us insight about future development of
the upper level ontology
.
\subsection
{
Components Implemented for
\emph
{
ulo-storage
}}
\label
{
sec:components
}
\subsection
{
Components Implemented for
\emph
{
ulo-storage
}}
\label
{
sec:components
}
With RDF files exported and available for download as Git repositories
With RDF files exported and available for download as Git repositories
on MathHub, we have the goal of making the underlying data available
on MathHub, we have the goal of making the underlying data available
for use in applications. It makes sense to first identify the various
for use in applications. It makes sense to first identify the various
components that might be involved in such a system. Figure~
\ref
{
fig:components
}
components that might be involved in such a system.
illustrates all components and their relationships.
Figure~
\ref
{
fig:components
}
illustrates all components and their
relationships.
\begin{figure}
[]
\begin{center}
\begin{figure}
[]
\begin{center}
\includegraphics
{
figs/components
}
\includegraphics
{
figs/components
}
\caption
{
Components involved in the
\emph
{
ulo-storage
}
system.
}
\label
{
fig:components
}
\caption
{
Components involved in the
\emph
{
ulo-storage
}
system.
}
\label
{
fig:components
}
\end{center}\end{figure}
\end{center}\end{figure}
\begin{
description
}
\begin{
itemize
}
\item
[Collecter]
ULO triplets are present in various locations, be it Git
\item
ULO triplets are present in various locations, be it Git
repositories,
available
on web servers or
on
local disk.
repositories, on web servers or
the
local disk.
It is the job of a
It is the job of a
Collecter to assemble these
{
RDF
}
~files and
\emph
{
Collecter
}
to assemble these
{
RDF
}
~files and
forward them for further
forward them for further
processing. This may involve cloning a Git
processing. This may involve cloning a Git
repository or crawling
repository or crawling
the file system.
the file system.
\item
[Importer]
With streams of ULO files assembled by the Collecter, this
\item
With streams of ULO files assembled by the Collecter, this
data then gets passed to an Importer. An Importer uploads
data then gets passed to an
\emph
{
Importer
}
. An Importer uploads
received
RDF~streams into some kind of permanent storage. For
RDF~streams into some kind of permanent storage. For
use in this project, the GraphDB~
\cite
{
graphdb
}
triplet store was
use in this project, the GraphDB~
\cite
{
graphdb
}
triplet store was
a natural fit.
a natural fit.
In
this project, both Collecter and Importer ended up being one
piece
For
this project, both Collecter and Importer ended up being one
of
software, but this does not have to be the case.
piece of monolithic
software, but this does not have to be the case.
\item
[Endpoint]
Finally, with all triplets stored in a database, an
\item
Finally, with all triplets stored in a database, an
Endpoint is where applications access the underlying
\emph
{
Endpoint
}
is where applications access the underlying
knowledge base. This does not necessarily need to be any custom
knowledge base. This does not necessarily need to be any custom
software, rather the programming interface of the underlying
software, rather the programming interface of the underlying
database itself could be understood as an endpoint of its
database itself could be understood as an endpoint of its own.
own.
Regardless, some thought can be put into designing an Endpoint as a
Regardless, some thought can be put into designing an Endpoint as a
layer that lives between application and database that is more
layer that lives between application and database that is more
convenient to use than the one provided by the database.
convenient to use than the one provided by the database. It comes
\end{description}
down to the programming interface we wish to provide to a developer
using this system.
\end{itemize}
\subsection
{
An Additional Harvester Component
}
\subsection
{
An Additional Harvester Component
}
These are the components realized for
\emph
{
ulo-storage
}
. However,
These are the three components realized for
additionally to these components, one could think of a
\emph
{
ulo-storage
}
. However, additionally to these components, one
\emph
{
Harvester
}
component. We assumed that the ULO triplets are
could think of a
\emph
{
Harvester
}
component. Above we assumed that
already available in RDF~format. This is not necessarily true. It
the ULO triplets are already available in RDF~format. This is not
might be desirable to automate the export from third party formats to
necessarily true. It might be desirable to automate the export from
ULO and we think this should be the job of a Harvester component. It
third party formats to ULO and we think this should be the job of a
fetches mathematical knowledge from some remote source and then
Harvester component. It fetches mathematical knowledge from some
provides a volatile stream of ULO data to the Collecter, which then
remote source and then provides a volatile stream of ULO data to the
passes it to the Importer and so on. The big advantage of such an
Collecter, which then passes it to the Importer and so on. The big
approach would be that exports from third party libraries can always
advantage of such an approach would be that exports from third party
be up to date and do not have to be initiated manually.
libraries can always be up to date and do not have to be initiated
manually. Another advantage of this hypothetical component is that
running exports through the Harvester involves the whole import chain
of Collecter and Importer which involves syntax~checking for the
exported RDF data. Bugs in exporters that produce faulty XML would be
found earlier in development.
We did not implement a Harvester for
\emph
{
ulo-storage
}
but we suggest
We did not implement a Harvester for
\emph
{
ulo-storage
}
but we suggest
that it is an idea to keep in mind. The components we did implement
that it is an idea to keep in mind. The components we did implement
(Collecter, Importer and Endpoint) provide us with an easy and
(Collecter, Importer and Endpoint) provide us with an easy and
automated way of making RDF files ready for use with applications. In
automated way of making RDF files ready for use with applications. In
this introduction we only wanted to give the reader a general
this introduction we only wanted to give the reader a general
understanding in the infrastructure that makes up
\emph
{
ulo-storage
}
;
understanding in the infrastructure that makes up
\emph
{
ulo-storage
}
,
the following sections will explain each component in more detail.
the following sections will explain each component in more detail.
This diff is collapsed.
Click to expand it.
Preview
0%
Loading
Try again
or
attach a new file
.
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Save comment
Cancel
Please
register
or
sign in
to comment