Commit 1f5ccf24 authored by Andreas Schärtl

report: review implementation

parent fe251517
doc/report/figs/screenshot0.png

85.3 KiB

doc/report/figs/screenshot1.png

133 KiB

\begin{figure}
\centering
\begin{subfigure}[b]{0.8\textwidth}
\centering
\shadowbox{\includegraphics[width=\textwidth]{figs/screenshot0.png}}
\caption{Scheduling a new import job in the web interface.}
\end{subfigure}
\vspace{5mm}
\begin{subfigure}[b]{0.8\textwidth}
\centering
\shadowbox{\includegraphics[width=\textwidth]{figs/screenshot1.png}}
\caption{Reviewing a previous import job. This particular job
failed, which illustrates how our web interface presents errors
and logs.}
\end{subfigure}
\caption{The web interface for controlling the Collector and
    Importer components. While Collector and Importer can also be
    used as library code or with a command line utility, managing
    jobs in the web interface is probably the most convenient
    option.}\label{fig:ss}
\end{figure}
...@@ -6,14 +6,15 @@
\caption{We can think of this tree as visualizing a relation~$R$ where
  $(X, Y)~\in~R$ iff there is an edge from~$X$ to~$Y$.}
\end{subfigure}
\hspace{5mm}
\begin{subfigure}[b]{0.4\textwidth}
\centering
\includegraphics[width=\textwidth]{figs/tree-transitive.pdf}
\caption{Transitive closure~$S$ of relation~$R$. In addition to
  every tuple from~$R$ (solid edges), $S$~contains additional
  transitive edges (dotted edges).}
\end{subfigure}
\caption{Illustrating the idea behind transitive closures. The
  transitive closure~$S$ of a relation~$R$ is defined as the
  ``minimal transitive relation that contains~$R$''~\cite{tc}.}\label{fig:tc}
\end{figure*}
...@@ -12,9 +12,9 @@ describes some details of the actual implementation for this project.
With RDF files exported and available for download as Git
repositories~\cite{uloisabelle, ulocoq}, we have the goal of making
the underlying data available for use in applications. Let us first
take a high-level look at the three components we identified for this
task. Figure~\ref{fig:components} illustrates how data flows through
the different components.
\begin{figure}[]\begin{center}
\includegraphics[width=0.9\textwidth]{figs/components.png}
...@@ -37,12 +37,12 @@ the flow of data.
\emph{Endpoint} is where applications access the underlying
knowledge base. This does not necessarily have to be custom
software; rather, the programming interface of the underlying
database server itself can be understood as an endpoint of its own.
\end{itemize}
Collector, Importer and Endpoint provide us with an automated way of
making RDF files available for use within applications. We will now
take a look at the actual implementation created for
\emph{ulo-storage}, beginning with the implementation of Collector and
Importer.
...@@ -50,21 +50,20 @@ Importer.
We previously described Collector and Importer as two distinct
components. The Collector pulls RDF data from various sources as an
input and outputs a stream of standardized RDF data. The Importer
takes such a stream of RDF data and dumps it to some sort of
persistent storage. In our implementation, both Collector and
Importer ended up as one piece of monolithic software. This does not
need to be the case, but it proved convenient because combining
Collector and Importer forgoes the need for an additional
IPC~mechanism between the two components.
Our implementation supports two sources for RDF files, namely Git
repositories and the local file system. The file system Collector
crawls a given directory on the local machine and looks for
RDF~XML~files~\cite{rdfxml} while the Git Collector first clones a Git
repository and then passes the checked-out working copy to the file
system Collector. Because we found that it is not uncommon for RDF
files to be compressed, our Collector supports on-the-fly extraction
of the gzip~\cite{gzip} and xz~\cite{xz} formats, which can greatly
reduce the required disk space in the collection step.
...@@ -72,7 +71,7 @@ required disk space in the collection step.
During development of the Collector, we found that existing exports
from third-party mathematical libraries contain RDF syntax errors
which had not been discovered previously. In particular, both Isabelle
and Coq exports contained URIs which do not conform to the official
syntax specification~\cite{rfc3986} as they contain illegal
characters. Previous work~\cite{ulo} that processed Coq and Isabelle
exports used database software such as Virtuoso Open
...@@ -95,17 +94,24 @@ into the database. As such the import itself is straight-forward, our
software only needs to upload the RDF file stream as-is to an HTTP
endpoint provided by our GraphDB instance.
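Conceptually, this import step corresponds to a SPARQL~1.1 Update
\texttt{LOAD} operation. The listing below is only an illustrative
sketch: it assumes a file URL that the database server can reach and a
made-up graph IRI, whereas our implementation simply streams the data
to the generic HTTP endpoint.
\begin{lstlisting}
# Illustrative sketch only: file URL and graph IRI are placeholders.
LOAD <file:///exports/isabelle.rdf.xml>
  INTO GRAPH <http://example.org/graphs/isabelle>
\end{lstlisting}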
To review, our combination of Collector and Importer fetches XML~files
from Git repositories, applies on-the-fly decompression and syntax
fixes, and then imports the collected RDF~triplets into persistent
database storage.
\subsubsection{Scheduling}
Collector and Importer were implemented as library code that can be
called from various front ends. For this project, we provide both a
command line interface and a graphical web front end. While the
command line interface is only useful for manually starting single
jobs, import jobs can be started manually or scheduled automatically
in the web interface (Figure~\ref{fig:ss}), which also keeps a history
of previous jobs and their logs.
\input{implementation-screenshots.tex}
\subsubsection{Version Management}
Automated job control that regularly imports data from the same
sources leads us to the problem of versioning. In our current design,
...@@ -122,16 +128,17 @@ representation~$\mathcal{D}$. We see that data flows
from $n$~individual libraries~$\mathcal{L}_i$ into a single
database storage~$\mathcal{D}$ that is used for querying.
However, we must not ignore that mathematical knowledge is
ever-changing, not static. When a given library~$\mathcal{L}^{t}_i$ at
revision~$t$ gets updated to a new version~$\mathcal{L}^{t+1}_i$, this
change will eventually propagate to the associated export and result
in a new set of RDF triplets~$\mathcal{E}^{t+1}_i$. Our global
database state~$\mathcal{D}$ needs to be updated to match the changes
between~$\mathcal{E}^{t}_i$ and $\mathcal{E}^{t+1}_i$.

Finding an efficient implementation for this problem is not trivial.
While it should be possible to compute the difference between two
exports~$\mathcal{E}^{t}_i$ and $\mathcal{E}^{t+1}_i$ and infer the
changes that need to be applied to~$\mathcal{D}$, the large number of
triplets makes this appear unfeasible. As this is a problem an
implementer of a greater tetrapodal search system will encounter, we
...@@ -142,7 +149,7 @@ versioning information about which particular
export~$\mathcal{E}^{t}_i$ it was derived from. During an import
from~$\mathcal{E}^{s}_i$ into~$\mathcal{D}$, we could (1)~first remove all
triplets in~$\mathcal{D}$ that were derived from the previous
version~$\mathcal{E}^{s-1}_i$ and (2)~then re-import all triplets from the current
version~$\mathcal{E}^{s}_i$. Annotating triplets with versioning
information is an approach that should work, but it does
introduce~$\mathcal{O}(n)$ additional triplets in~$\mathcal{D}$ where
...@@ -152,29 +159,33 @@ solution.
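One conceivable way to realize the remove-and-re-import steps
(1)~and~(2) without annotating every single triplet would be to keep
all triplets derived from a given export in a dedicated named graph.
This is only a sketch of an alternative we did not implement; graph
IRIs and file URLs are placeholders.
\begin{lstlisting}
# Sketch only: assumes one named graph per export.
DROP SILENT GRAPH <http://example.org/exports/coq> ;
LOAD <file:///exports/coq.rdf.xml>
  INTO GRAPH <http://example.org/exports/coq>
\end{lstlisting}
Grouping triplets by named graph would avoid the per-triplet
annotation overhead, but it shifts the versioning problem to graph
management in the underlying store.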
Another approach is to regularly re-create the full data
set~$\mathcal{D}$ from scratch, say every seven days. This circumvents
the problems related to updating existing data sets, but also means
that changes in a given library~$\mathcal{L}_i$ take some time to
propagate to~$\mathcal{D}$. Building on top of this idea, an advanced
version of this approach could forgo the requirement of a single
database storage~$\mathcal{D}$ entirely. Instead of maintaining only
one global database state~$\mathcal{D}$, we suggest the use of
dedicated database instances~$\mathcal{D}_i$ for each given
library~$\mathcal{L}_i$. The advantage here is that re-creating a
given database representation~$\mathcal{D}_i$ is fast as the
exports~$\mathcal{E}_i$ are comparatively small. The disadvantage is
that we still want to query the whole data set~$\mathcal{D} = \mathcal{D}_1
\cup \mathcal{D}_2 \cup \cdots \cup \mathcal{D}_n$. This requires the
development of some cross-repository query mechanism, something
GraphDB currently offers only limited support
for~\cite{graphdbnested}.
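To give an idea of what such cross-repository querying could look
like, SPARQL~1.1 federation lets a single query address several
endpoints through the \texttt{SERVICE} keyword. The repository URLs
below are hypothetical, and whether such a federated query performs
acceptably on hundreds of millions of triplets remains to be
evaluated.
\begin{lstlisting}
# Hypothetical repository URLs; union over two per-library stores.
SELECT ?s ?p ?o WHERE {
  { SERVICE <http://db.example.org/repositories/isabelle> { ?s ?p ?o } }
  UNION
  { SERVICE <http://db.example.org/repositories/coq> { ?s ?p ?o } }
}
\end{lstlisting}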
In summary, we see that versioning is a potential challenge for a
greater tetrapodal search system. While it is not a pressing issue for
\emph{ulo-storage} right now, we consider it a topic for future
research.
\subsection{Endpoint}\label{sec:endpoints}
Finally, we need to discuss how \emph{ulo-storage} realizes the
Endpoint. Recall that the Endpoint provides the programming interface
for applications that wish to query our collection of organizational
knowledge. In practice, the choice of Endpoint programming interface
is determined by the choice of underlying database system.
In our project, organizational knowledge is formulated as
RDF~triplets. The canonical choice for us is to use a triple store,
...@@ -185,45 +196,45 @@ at~\cite{graphdbfree}.
\subsubsection{Transitive Queries}
A notable advantage of GraphDB compared to other systems, such as
Virtuoso Open Source~\cite{wikivirtuoso, ulo}, is that GraphDB supports
recent versions of the SPARQL query language~\cite{graphdbsparql} and
OWL~reasoning~\cite{owlspec, graphdbreason}. In particular, this
means that GraphDB offers support for transitive queries as described
in previous work~\cite{ulo}. A transitive query is one that, given a
relation~$R$, asks for the transitive closure~$S$ of~$R$~\cite{tc}
(Figure~\ref{fig:tc}).
\input{implementation-transitive-closure.tex}
In fact, GraphDB supports two approaches for realizing transitive
queries. On the one hand, GraphDB supports the
\texttt{owl:TransitiveProperty}~\cite[Section 4.4.1]{owlspec} property
that defines a given predicate~$P$ to be transitive. With $P$~marked
in this way, querying the knowledge base is equivalent to querying the
transitive closure of~$P$. This, of course, requires transitivity to
be hard-coded into the knowledge base. On the other hand, if we only
wish to query the transitive closure for a given query, we can take
advantage of property paths~\cite{paths}, which allow us to indicate
that a given predicate~$P$ is to be understood as transitive when
querying. The transitive closure is then evaluated only at query
time. Either way, GraphDB supports transitive queries without the
awkward workarounds necessary in other systems~\cite{ulo}.
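As a small sketch of the second approach, the \texttt{+}~operator of a
property path asks for one or more applications of a predicate, that
is, its transitive closure evaluated at query time. The prefix and
predicate below are illustrative placeholders rather than actual
ULO~vocabulary.
\begin{lstlisting}
PREFIX ex: <http://example.org/>
# all ?y reachable from ?x via one or more ex:dependsOn edges
SELECT ?x ?y WHERE { ?x ex:dependsOn+ ?y }
\end{lstlisting}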
\subsubsection{SPARQL Endpoint}
There are multiple approaches to querying the GraphDB triple store,
one based around the standardized SPARQL query language and the other
on the RDF4J Java library. Both approaches have unique advantages.

Let us first take a look at {SPARQL}, which is a standardized query
language for RDF triplet data~\cite{sparql}. The specification
includes not just syntax and semantics of the language itself, but
also a standardized REST interface~\cite{rest} for querying database
servers.

SPARQL was inspired by SQL and as such the \texttt{SELECT}
\texttt{WHERE} syntax should be familiar to many software developers.
A simple query that returns all triplets in the store looks like
\begin{lstlisting}
SELECT * WHERE { ?s ?p ?o }
\end{lstlisting}
...@@ -233,50 +244,49 @@ query variables. In this particular case, the database would return a
table of all triplets in the store sorted by subject~\texttt{?s},
predicate~\texttt{?p} and object~\texttt{?o}.
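Aggregation works much as one would expect from SQL. As a brief
sketch, the following query counts how often each predicate occurs in
the store, which gives a quick overview of which ULO~predicates a
given export actually uses.
\begin{lstlisting}
SELECT ?p (COUNT(*) AS ?n)
WHERE  { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n)
\end{lstlisting}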
Probably the biggest advantage is that SPARQL is ubiquitous. As it is
the de facto standard for querying triple stores, many implementations
(client and server) as well as extensive documentation are
available~\cite{sparqlbook, sparqlimpls, gosparql}.
\subsubsection{RDF4J Endpoint}
SPARQL is one way of accessing a triple store database. Another
approach is RDF4J, a Java API for interacting with RDF graphs,
implemented based on a superset of the {SPARQL} REST
interface~\cite{rdf4j}. GraphDB is one of the database servers that
supports RDF4J; in fact, it is the recommended way of interacting with
GraphDB repositories~\cite{graphdbapi}.

Instead of formulating textual queries, RDF4J allows developers to
query a knowledge base by calling Java library methods. The previous
query that asks for all triplets in the store looks like
\begin{lstlisting}
connection.getStatements(null, null, null);
\end{lstlisting}
in RDF4J. \texttt{getStatements(s, p, o)} returns all triplets that
have matching subject~\texttt{s}, predicate~\texttt{p} and
object~\texttt{o}. Any argument that is \texttt{null} can be replaced
by any value; that is, it is a query variable to be filled by the
call to \texttt{getStatements}.
Using RDF4J does introduce a dependency on the JVM and its
languages. But in practice, we found RDF4J to be quite convenient,
especially for simple queries, as it allows us to formulate everything
in a single programming language rather than mixing a programming
language with awkward query strings. We also found it quite helpful to
generate Java classes from OWL~ontologies that expose all definitions
of the ontology as easily accessible constants~\cite{rdf4jgen}. This
provides us with powerful IDE auto-completion features during the
development of ULO applications.
Summarizing the last two sections, we see that both SPARQL and RDF4J
have unique advantages. While SPARQL is an official W3C standard and
implemented by more database systems, RDF4J can be more convenient
when dealing with JVM-based projects. For \emph{ulo-storage}, we
experimented with both interfaces and chose whichever seemed more
convenient at the moment. We recommend that implementors do the same.
\subsection{Deployment and Availability}
...@@ -291,17 +301,16 @@ environment for running a given application~\cite[pp. 22]{dockerbook}.
Docker Compose, then, is a way of combining individual Docker
containers to run a full stack of application, database server and so
on~\cite[pp. 42]{dockerbook}. All configuration of such a setup is
stored in a Docker Compose file that describes the software stack.
For \emph{ulo-storage}, we provide a single Docker Compose file which
starts three containers, namely (1)~the Collector/Importer web
interface, (2)~a GraphDB instance which provides us with the required
Endpoint and (3)~some test applications that use that Endpoint. All
code for Collector and Importer is available in the
\texttt{ulo-storage-collect} Git repository~\cite{gorepo}. Additional
deployment files, that is, the Docker Compose file and additional
Dockerfiles, are stored in a separate repository~\cite{dockerfilerepo}.
This concludes our discussion of the implementation developed for the
\emph{ulo-storage} project. We designed a system based around (1)~a
......
...@@ -2,12 +2,12 @@
To tackle the vast array of mathematical publications, various ways of
\emph{computerizing} mathematical knowledge have been experimented
with. It is already difficult for human mathematicians to keep even a
subset of all mathematical knowledge in mind, a problem referred to in
the literature as the ``one brain barrier''~\cite{onebrain}. A hope is
that computerization will yield great improvements to formal research
by making the results of all collected publications readily available
and easy to search.
One research topic in this field is the idea of a \emph{tetrapodal
search} that combines four distinct areas of mathematical knowledge.
...@@ -24,30 +24,32 @@ structure, tetrapodal search proposes that each kind of mathematical
knowledge should be made available in a storage backend that fits the
kind of data it is providing. With all four areas available for
querying, tetrapodal search intends to then combine the four indexes
into a single query interface. Currently, research aims at providing
schemas, storage backends and indexes for the four different kinds of
mathematical knowledge. In this context, the focus of
\emph{ulo-storage} is the area of organizational knowledge.
A previously proposed way to structure organizational knowledge is the
\emph{upper level ontology} (ULO)~\cite{ulo}. ULO takes the form of an
OWL~ontology~\cite{uloonto} and as such all organizational information
is stored as RDF~triplets with a unified schema of
ULO~predicates~\cite{owl}. Some effort has been made to export
existing collections of formal mathematical knowledge to {ULO}. In
particular, exports from Isabelle and Coq-based libraries are
available~\cite{uloisabelle, ulocoq}. The resulting data set is
already quite large, with the Isabelle export alone containing more
than 200~million triplets.
Existing exports from Isabelle and Coq result in a set of XML~files
that contain RDF~triplets. This is a convenient format for exchange
and easily tracked using version control systems such as
Git~\cite{gitpaper} as employed by MathHub~\cite{mh}. However,
considering the vast number of triplets, the data is impossible to
query easily and efficiently in this state. This is what
\emph{ulo-storage} is focused on: making ULO data sets accessible for
querying and analysis. We collected RDF files spread over different
Git repositories, imported them into a database and then experimented
with APIs for accessing that data set.
The main contribution of \emph{ulo-storage} is twofold. First, (1)~we
built up various infrastructure components for making organizational
......
...@@ -401,3 +401,22 @@
    urldate = {2020-09-27},
    url = {https://www.w3.org/2009/sparql/wiki/Feature:PropertyPaths},
}
@article{gitpaper,
  title = {GIT--A stupid content tracker},
  author = {Hamano, Junio C.},
  journal = {Proceedings of the Linux Symposium},
  volume = {1},
  pages = {385--394},
  year = {2006},
}

@incollection{mh,
  title = {System description: MathHub.info},
  author = {Iancu, Mihnea and Jucovschi, Constantin and Kohlhase, Michael and Wiesing, Tom},
  booktitle = {Intelligent Computer Mathematics},
  pages = {431--434},
  year = {2014},
  publisher = {Springer},
}
...@@ -3,6 +3,7 @@
\usepackage{amsmath}
\usepackage{booktabs}
\usepackage{caption}
\usepackage{fancybox, graphicx}
\usepackage{float}
\usepackage{geometry}
\usepackage{graphicx}
......