write down ideas from talk

But the way it is written isn't very good. I wish we had good GPT-based lectors already:

Commit 491113a5 authored Sep 23, 2020 by Andreas Schärtl
--- a/doc/report/implementation.tex
+++ b/doc/report/implementation.tex
@@ -122,20 +122,43 @@ representation~$\mathcal{D}$. We see that data flows
 \end{align*}
 which means that if records in~$\mathcal{L}$ change, this will
 probably result in different triplets~$\mathcal{E}$ which in turn
-results in a need to update~$\mathcal{D}$. This is non-trivial.  As it
+results in a need to update~$\mathcal{D}$. Finding an efficient
-stands, \emph{ulo-storage} only knows about what is in~$\mathcal{E}$.
+implementation for this problem is not trivial.  As it stands,
-While it should be possible to find out the difference between a new
+\emph{ulo-storage} only knows about what is in~$\mathcal{E}$.  While
-version of~$\mathcal{E}$ and the current version of~$\mathcal{D}$ and
+it should be possible to find out the difference between a new version
-compute the changes necessary to be applied to~$\mathcal{D}$, the big
+of~$\mathcal{E}$ and the current version of~$\mathcal{D}$ and compute
-number of triplets makes this appear unfeasible. So far, our only
+the changes that need to be applied to~$\mathcal{D}$, the large number
-suggestion to solve the problem of changing third party libraries is
+of triplets makes this appear unfeasible.  While this is not exactly a
-to regularly re-create the full data set~$\mathcal{D}$ from scratch,
+burning issue for \emph{ulo-storage} itself, it is a problem an
-say every seven days. This circumvents all problems related to
+implementor of a greater tetrapodal search system will encounter. We
-updating existing data sets, but it does mean additional computation
+suggest two possible approaches to solving this problem.
-requirements. It also means that changes in~$\mathcal{L}$ take some
-to propagate to~$\mathcal{D}$.  If the number of triplets raises
+One approach is to annotate each triplet in~$\mathcal{D}$ with
-by orders of magnitude, this approach will eventually not be scalable
+versioning information about which particular~$\mathcal{E}$ it was
-anymore.
+derived from.  During an import from~$\mathcal{E}$ into~$\mathcal{D}$,
+we could (1)~first remove all triplets in~$\mathcal{D}$ that were
+derived from a previous version of~$\mathcal{E}$ and (2)~then re-import
+all triplets from the current version of~$\mathcal{E}$. Annotating
+triplets with versioning information is an approach that should work,
+but introduces~$\mathcal{O}(n)$ additional triplets in~$\mathcal{D}$
+where $n$~is the number of triplets in~$\mathcal{E}$. This
+effectively doubles the required database storage space, which is not
+a very satisfying solution.
+Another approach is to regularly re-create the full data
+set~$\mathcal{D}$ from scratch, say every seven days. This circumvents
+all problems related to updating existing data sets, but it does mean
+additional computation requirements. It also means that changes
+in~$\mathcal{L}$ take some time to propagate to~$\mathcal{D}$. An advanced
+version of this approach could forgo the requirement of only one
+single database storage~$\mathcal{D}$. Instead of only running one
+database instance, we could decide to run dedicated database servers
+for each export~$\mathcal{E}$. The advantage here is that re-creating
+a database representation~$\mathcal{D}$ is fast. The disadvantage is
+that we still want to query the whole data set. This requires the
+development of some cross-repository query mechanism, something for
+which GraphDB currently offers only limited
+support~\cite{graphdbnested}.
 \subsection{Endpoints}\label{sec:endpoints}
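
The first approach in the added text (annotate each triplet in $\mathcal{D}$ with the export version it came from, then remove and re-import) might look roughly like the following. This is only a minimal in-memory sketch, not how \emph{ulo-storage} or GraphDB work: the class `VersionedStore`, its methods, and the export names are hypothetical, and a plain Python set stands in for the database.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
# One stored row: the triple plus its annotation (export name, export version).
Row = Tuple[Triple, str, str]


class VersionedStore:
    """Hypothetical stand-in for D in which every triple is annotated."""

    def __init__(self) -> None:
        self._rows: Set[Row] = set()

    def import_export(self, export: str, version: str,
                      triples: Iterable[Triple]) -> None:
        # Step (1): remove all triples derived from a previous version
        # of this export.
        self._rows = {row for row in self._rows if row[1] != export}
        # Step (2): re-import all triples of the current version, each
        # carrying its annotation. This is the O(n) storage overhead.
        self._rows |= {(t, export, version) for t in triples}

    def triples(self) -> Set[Triple]:
        # The data set without annotations, as a query would see it.
        return {row[0] for row in self._rows}

    def annotation_count(self) -> int:
        # One annotation per stored row.
        return len(self._rows)


store = VersionedStore()
store.import_export("lib-a", "v1", [("a", "p", "b"), ("a", "p", "c")])
store.import_export("lib-b", "v1", [("x", "p", "y")])
# A new version of lib-a no longer contains ("a", "p", "c").
store.import_export("lib-a", "v2", [("a", "p", "b")])
```

After the second import of `lib-a`, the stale triple is gone while the triples of `lib-b` are untouched. In an RDF store the same idea would probably be expressed with named graphs or reification rather than tuples; that mapping is an assumption and not something the commit specifies.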

--- a/doc/report/references.bib
+++ b/doc/report/references.bib
@@ -350,3 +350,11 @@
    year={2017},
    publisher={Packt Publishing Ltd}
 }
+@online{graphdbnested,
+    title = {Nested Repositories},
+    organization = {Ontotext},
+    date = {2020},
+    urldate = {2020-09-23},
+    url = {http://graphdb.ontotext.com/documentation/standard/nested-repositories.html},
+}
\ No newline at end of file
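
The "advanced version" of the second approach (one database per export $\mathcal{E}$, plus some cross-repository query mechanism) can be illustrated in the same toy style. Again, this is a sketch under stated assumptions: `ShardedStore`, `rebuild`, and `query` are hypothetical names, the shards are in-memory sets rather than database servers, and the fan-out shown here is not GraphDB's nested repository feature.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class ShardedStore:
    """Hypothetical setup with one small store per export."""

    def __init__(self) -> None:
        self._shards: Dict[str, Set[Triple]] = {}

    def rebuild(self, export: str, triples: Iterable[Triple]) -> None:
        # Re-creating a shard only touches the triples of this one export,
        # which is why it is fast compared to rebuilding everything.
        self._shards[export] = set(triples)

    def query(self, matches: Callable[[Triple], bool]) -> Set[Triple]:
        # Naive cross-repository query: ask every shard and merge the answers.
        result: Set[Triple] = set()
        for shard in self._shards.values():
            result |= {t for t in shard if matches(t)}
        return result


shards = ShardedStore()
shards.rebuild("lib-a", [("a", "uses", "b"), ("a", "defines", "c")])
shards.rebuild("lib-b", [("x", "uses", "c")])
# Only lib-a is re-created; lib-b is left alone.
shards.rebuild("lib-a", [("a", "uses", "b")])
uses = shards.query(lambda t: t[1] == "uses")
```

This fan-out is enough for queries that match single triples. Queries that join triples stored in different shards need intermediate results to be combined across stores, which is the harder part the text refers to.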