Commit a40c5d96
Authored Oct 9, 2020 by Andreas Schärtl
Parent: 97199bf4

report: review impl
1 changed file: doc/report/implementation.tex (27 additions, 25 deletions)
--- a/doc/report/implementation.tex
+++ b/doc/report/implementation.tex
@@ -61,10 +61,10 @@ is no pressing need to force them into separate processes.
 
 Our implementation supports two sources for RDF files, namely Git
 repositories and the local file system. The file system Collector
 crawls a given directory on the local machine and looks for
-RDF~XMl~files~\cite{rdfxml} while the Git Collector first clones a Git
+RDF~XML~files~\cite{rdfxml} while the Git Collector first clones a Git
 repository and then passes the checked out working copy to the file
-system Collector. Because we found that is not uncommon for RDF files
-to be compressed, our Collector supports on the fly extraction of
+system Collector. Because we found that is not uncommon for RDF files
+to be compressed, our implementation supports on the fly extraction of
 gzip~\cite{gzip} and xz~\cite{xz} formats which can greatly reduce the
 required disk space in the collection step.
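As a side note on the hunk above: the on-the-fly extraction it describes could look roughly like the following Python sketch. This is illustrative only, not the ulo-storage Collector's actual code; the helper name and example path are made up.

# Sketch of on-the-fly extraction of compressed RDF files (hypothetical
# helper, not taken from the actual Collector implementation).
import gzip
import lzma

def open_rdf(path):
    """Return a binary file object, transparently decompressing
    gzip (.gz) and xz (.xz) archives."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".xz"):
        return lzma.open(path, "rb")
    return open(path, "rb")

# Hypothetical usage with a made-up export file name:
with open_rdf("exports/coq.rdf.xz") as handle:
    rdf_xml = handle.read()

Reading through a wrapper like this avoids ever writing the decompressed RDF/XML to disk, which is where the disk-space saving in the collection step comes from.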
@@ -75,7 +75,7 @@ Coq exports contained URIs which does not fit the official syntax
 specification~\cite{rfc3986} as they contained illegal
 characters. Previous work~\cite{ulo} that processed Coq and Isabelle
 exports used database software such as Virtuoso Open
-Source~\cite{wikivirtuoso} which do not properly check URIs according
+Source~\cite{wikivirtuoso} which does not properly check URIs according
 to spec; in consequence these faults were only discovered now. To
 tackle these problems, we introduced on the fly correction steps
 during collection that escape the URIs in question and then continue
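A rough sketch of the escaping step mentioned in this hunk, assuming plain RFC 3986 percent-encoding is sufficient (the actual correction logic may differ):

# Illustrative percent-encoding of illegal characters in a URI;
# not the project's actual on-the-fly correction step.
from urllib.parse import quote

def escape_uri(uri):
    # Leave legal URI syntax characters alone and percent-encode the
    # rest (for example spaces appearing in Coq identifiers).
    return quote(uri, safe=":/?#[]@!$&'()*+,;=%-._~")

print(escape_uri("https://example.org/coq/lemma about nat"))
# -> https://example.org/coq/lemma%20about%20nat

The example URI is made up; the point is only that escaping can happen during collection without touching the upstream exports.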
@@ -164,18 +164,19 @@ Another approach is to regularly re-create the full data
 set~$\mathcal{D}$ from scratch, say every seven days. This circumvents
 the problems related to updating existing data sets, but also means
 that changes in a given library~$\mathcal{L}_i$ take some to propagate
-to~$\mathcal{D}$. Building on this idea, an advanced version of this
-approach could forgo the requirement for one single database
-storage~$\mathcal{D}$ entirely. Instead of maintaining just one global
-database state~$\mathcal{D}$, we suggest experimenting with dedicated
-database instances~$\mathcal{D}_i$ for each given
+to~$\mathcal{D}$. Continuing this train of thought, an advanced
+version of this approach could forgo the requirement for one single
+database storage~$\mathcal{D}$ entirely. Instead of maintaining just
+one global database state~$\mathcal{D}$, we suggest experimenting with
+dedicated database instances~$\mathcal{D}_i$ for each given
 library~$\mathcal{L}_i$. The advantage here is that re-creating a
 given database representation~$\mathcal{D}_i$ is fast as
 exports~$\mathcal{E}_i$ are comparably small. The disadvantage is that
 we still want to query the whole data set~$\mathcal{D} = \mathcal{D}_1
-\cup \mathcal{D}_2 \cup \cdots \cup \mathcal{D}_n$. This does require the
-development of some cross-database query mechanism, functionality GraphDB
-currently only offers limited support for~\cite{graphdbnested}.
+\cup \mathcal{D}_2 \cup \cdots \cup \mathcal{D}_n$. This does require
+the development of some cross-database query mechanism, functionality
+existing systems currently only offer limited support
+for~\cite{graphdbnested}.
 
 In summary, we see that versioning is a potential challenge for a
 greater tetrapodal search system. While not a pressing issue for
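One conceivable cross-database query mechanism for the per-library instances D_i discussed above is SPARQL 1.1 federation, where a single query pulls triplets from several endpoints via SERVICE clauses. The snippet below is a hypothetical Python sketch; the endpoint URLs and repository names are invented, and, as the hunk itself notes, existing systems offer only limited support for this kind of query.

# Hypothetical federated query over two per-library repositories D_1, D_2.
# Endpoint URLs and repository names are placeholders, not part of the
# actual deployment.
federated_query = """
SELECT ?s ?p ?o WHERE {
  { SERVICE <http://localhost:7200/repositories/coq>      { ?s ?p ?o } }
  UNION
  { SERVICE <http://localhost:7200/repositories/isabelle> { ?s ?p ?o } }
}
"""
# This string would be submitted to a SPARQL endpoint over HTTP just like
# any other query (see the SPARQL protocol example further down).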
@@ -188,7 +189,7 @@ Endpoint. Recall that an Endpoint provides the programming interface
 for applications that wish to query our collection of organizational
 knowledge. In practice, the choice of Endpoint programming interface
 is determined by the choice of database system as the Endpoint is
-provided directly by the database.
+provided directly by the database system.
 
 In our project, organizational knowledge is formulated as
 RDF~triplets. The canonical choice for us is to use a triple store,
@@ -206,7 +207,7 @@ OWL~Reasoning~\cite{owlspec, graphdbreason}. In particular, this
 means that GraphDB offers support for transitive queries as described
 in previous work on~ULO~\cite{ulo}. A transitive query is one that,
-given a relation~$R$, asks for the transitive closure~$S$ of~$R$~\cite{tc} (Figure~\ref{fig:tc}).
+given a relation~$R$, asks for the transitive closure~$S$~\cite{tc} (Figure~\ref{fig:tc}) of~$R$.
 
 \input{implementation-transitive-closure.tex}
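As an aside, SPARQL 1.1 property paths give one way to phrase such a transitive query directly. The sketch below is hypothetical: the prefix and predicate are placeholders, not necessarily actual ULO vocabulary.

# Sketch of a transitive query: "+" asks for one or more applications of
# the predicate, i.e. matches along the transitive closure of that relation.
# Prefix and predicate are made-up placeholders.
transitive_query = """
PREFIX ulo: <http://example.org/ulo#>
SELECT ?dependency WHERE {
  ?theory ulo:uses+ ?dependency .
}
"""

Whether such a query is answered via property paths or via GraphDB's reasoning support is an implementation detail of the endpoint.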
@@ -235,14 +236,14 @@ includes not just syntax and semantics of the language itself, but
 also a standardized REST interface~\cite{rest} for querying database
 servers.
 
-SPARQL was inspired by SQL and as such the \texttt{SELECT}
+The SPARQL syntax was inspired by SQL and as such the \texttt{SELECT}
 \texttt{WHERE} syntax should be familiar to many software developers.
 A simple query that returns all triplets in the store looks like
 \begin{lstlisting}
     SELECT * WHERE { ?s ?p ?o }
 \end{lstlisting}
 where \texttt{?s}, \texttt{?p} and \texttt{?o} are query
-variables. The result of any query are valid substitutions for the
+variables. The result of any query are valid substitutions for all
 query variables. In this particular case, the database would return a
 table of all triplets in the store sorted by subject~\texttt{?o},
 predicate~\texttt{?p} and object~\texttt{?o}.
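To make the idea of substitutions concrete, the same query can be submitted over the standardized HTTP interface and the bindings read from the JSON response. The following Python sketch assumes a local GraphDB repository at a made-up URL; it is not part of the project code.

# Sketch: run the example query over the SPARQL protocol and print the
# substitutions found for ?s, ?p and ?o. Repository URL is a placeholder.
import requests

response = requests.post(
    "http://localhost:7200/repositories/ulo-storage",
    data={"query": "SELECT * WHERE { ?s ?p ?o }"},
    headers={"Accept": "application/sparql-results+json"},
)
for row in response.json()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])

Each element of bindings is one valid substitution for the query variables, which is exactly the table of triplets the hunk describes.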
@@ -300,7 +301,7 @@ containers, that is lightweight virtual machines with a fixed
 environment for running a given application~\cite[pp. 22]{dockerbook}.
 Docker Compose then is a way of combining individual Docker containers
 to run a full tech stack of application, database server and so
-on~\cite[pp. 42]{dockerbook}. All configuration of the overarching a
+on~\cite[pp. 42]{dockerbook}. All configuration of the overarching
 setup is stored in a Docker Compose file that describes the software
 stack.
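For orientation, a Docker Compose file describing such a stack might look like the sketch below. Everything in it is invented for illustration (service names, image, build path, port); the project's real deployment files live in a separate repository, as the next hunk notes.

# Hypothetical docker-compose.yml sketch, not the actual deployment file.
version: "3"
services:
  graphdb:
    image: ontotext/graphdb        # database server acting as Endpoint
    ports:
      - "7200:7200"
  collector-importer:
    build: ./collector-importer    # Collector and Importer services
    depends_on:
      - graphdb

A single "docker compose up" would then bring up the database together with the collection and import jobs.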
@@ -313,11 +314,12 @@ code for Collector and Importer is available in the
 deployment files, that is Docker Compose configuration and additional
 Dockerfiles are stored in a separate repository~\cite{dockerfilerepo}.
 
-This concludes our discussion of the implementation developed for the
-\emph{ulo-storage} project. We designed a system based around (1)~a
-Collector which collects RDF triplets from third party sources, (2)~an
-Importer which imports these triplets into a GraphDB database and
-(3)~looked at different ways of querying a GraphDB Endpoint. All of
-this is easy to deploy using a single Docker Compose file. With this
-stack ready for use, we will continue with a look at some interesting
-applications and queries built on top of this infrastructure.
+With this, we conclude our discussion of the implementation developed
+for the \emph{ulo-storage} project. We designed a system based around
+(1)~a Collector which collects RDF triplets from third party sources,
+(2)~an Importer which imports these triplets into a GraphDB database
+and (3)~looked at different ways of querying a GraphDB Endpoint. All
+of this is easy to deploy using a single Docker Compose file. With
+this stack ready for use, we will now continue with a look at some
+interesting applications and queries built on top of this
+infrastructure.