Commit 491113a5 authored by Andreas Schärtl

write down ideas from talk

But the way it is written isn't very good. I wish we had
good GPT-based lectors already.
parent ab4a53e7
@@ -122,20 +122,43 @@ representation~$\mathcal{D}$. We see that data flows
\end{align*}
which means that if records in~$\mathcal{L}$ change, this will
probably result in different triplets~$\mathcal{E}$ which in turn
results in a need to update~$\mathcal{D}$. Finding an efficient
implementation for this problem is not trivial. As it stands,
\emph{ulo-storage} only knows about what is in~$\mathcal{E}$. While
it should be possible to find out the difference between a new version
of~$\mathcal{E}$ and the current version of~$\mathcal{D}$ and compute
the changes necessary to be applied to~$\mathcal{D}$, the large number
of triplets makes this appear infeasible. While this is not exactly a
burning issue for \emph{ulo-storage} itself, it is a problem an
implementor of a greater tetrapodal search system will encounter. We
suggest two possible approaches to solving this problem.

One approach is to annotate each triplet in~$\mathcal{D}$ with
versioning information about which particular~$\mathcal{E}$ it was
derived from. During an import from~$\mathcal{E}$ into~$\mathcal{D}$,
we could (1)~first remove all triplets in~$\mathcal{D}$ that were
derived from a previous version of~$\mathcal{E}$ and (2)~then re-import
all triplets from the current version of~$\mathcal{E}$. Annotating
triplets with versioning information is an approach that should work,
but introduces~$\mathcal{O}(n)$ additional triplets in~$\mathcal{D}$
where $n$~is the number of triplets in~$\mathcal{E}$. This effectively
doubles the required database storage space, which is not a very
satisfying solution.
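
The two import steps described above can be illustrated with a minimal in-memory sketch. It is not the \emph{ulo-storage} implementation: the class \texttt{VersionedStore}, its method names and the export names used below are hypothetical, and a real deployment would keep the annotations in the triplet store itself instead of in a Python dictionary.

```python
from dataclasses import dataclass, field

Triplet = tuple[str, str, str]


@dataclass
class VersionedStore:
    """Toy stand-in for the database D (hypothetical, for illustration only)."""

    # For every triplet, remember which export it came from and at which version.
    annotations: dict[Triplet, dict[str, int]] = field(default_factory=dict)

    def reimport(self, export: str, version: int, triplets: set[Triplet]) -> None:
        # (1) Remove all triplets that were derived from a previous version
        #     of this export; triplets also provided by other exports stay.
        for triplet in list(self.annotations):
            sources = self.annotations[triplet]
            sources.pop(export, None)
            if not sources:
                del self.annotations[triplet]
        # (2) Re-import all triplets from the current version of the export.
        for triplet in triplets:
            self.annotations.setdefault(triplet, {})[export] = version

    def triplets(self) -> set[Triplet]:
        return set(self.annotations)
```

The sketch stores one annotation per triplet and export, which mirrors the additional storage cost discussed above.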

Another approach is to regularly re-create the full data
set~$\mathcal{D}$ from scratch, say every seven days. This circumvents
all problems related to updating existing data sets, but it does mean
additional computation requirements. It also means that changes
in~$\mathcal{L}$ take some time to propagate to~$\mathcal{D}$. An
advanced version of this approach could forgo the requirement of only
one single database storage~$\mathcal{D}$. Instead of only running one
database instance, we could decide to run dedicated database servers
for each export~$\mathcal{E}$. The advantage here is that re-creating
a database representation~$\mathcal{D}$ is fast. The disadvantage is
that we still want to query the whole data set. This requires the
development of some cross-repository query mechanism, something for
which GraphDB currently offers only limited
support~\cite{graphdbnested}.
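
The idea of one store per export can be sketched as follows. This is a simplified illustration under our own assumptions, not a GraphDB feature: \texttt{PerExportStores} and its methods are hypothetical names, the stores are plain Python sets, and the query merely takes the union of single-triplet matches. Queries that join triplets across different exports would need a more capable cross-repository mechanism.

```python
from typing import Callable, Iterable

Triplet = tuple[str, str, str]


class PerExportStores:
    """One independent triplet store per export (hypothetical sketch)."""

    def __init__(self) -> None:
        self.stores: dict[str, set[Triplet]] = {}

    def recreate(self, export: str, triplets: Iterable[Triplet]) -> None:
        # Re-creating touches only the store that belongs to this export;
        # all other stores remain untouched and available.
        self.stores[export] = set(triplets)

    def query(self, matches: Callable[[Triplet], bool]) -> set[Triplet]:
        # Fan the query out to every store and take the union of the results.
        result: set[Triplet] = set()
        for store in self.stores.values():
            result.update(t for t in store if matches(t))
        return result
```

Re-creating the store of a single export replaces only its own triplets, while a query still sees the data of every export.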
\subsection{Endpoints}\label{sec:endpoints}
......
@@ -350,3 +350,11 @@
year={2017},
publisher={Packt Publishing Ltd}
}
@online{graphdbnested,
title = {Nested Repositories},
organization = {Ontotext},
date = {2020},
urldate = {2020-09-23},
url = {http://graphdb.ontotext.com/documentation/standard/nested-repositories.html},
}
\ No newline at end of file