With ULO triplets imported into the GraphDB triplet store by Collecter
and Importer, all data necessary for querying is now available.
As discussed before, applications query the store through an
Endpoint that exposes some kind of {API}. The interesting question
here is not so much the implementation of the endpoint itself;
rather, it is the choice of API that can make or break such a project.
There are two approaches to querying the GraphDB triplet store,
one based on the standardized SPARQL query language and the other
on the RDF4J Java library, which is implemented by various vendors. Both
approaches have unique advantages.
\subsection{Available Application Interfaces}
\begin{itemize}
\item SPARQL is a standardized query language for RDF triplet
data~\cite{sparql}. The specification covers not just the syntax and
semantics of the language itself, but also a standardized REST
interface for querying databases. Various implementations of
this standard, e.g.~\cite{gosparql}, are available, so using
SPARQL has the advantage of making us independent of a specific
programming language or environment.
SPARQL is inspired by SQL and as such the \texttt{SELECT}
\texttt{WHERE} syntax should be familiar to many software
developers. A simple query that returns all triplets in the
store looks like
\begin{verbatim}
SELECT * WHERE { ?s ?p ?o }
\end{verbatim}
where \texttt{?s}, \texttt{?p} and \texttt{?o} are query
variables. The result of a query is the set of valid substitutions for the
query variables. In this case, the database would return a table
of all triplets in the store with one column each for subject~\texttt{?s},
predicate~\texttt{?p} and object~\texttt{?o}.
Of course, queries might return a lot of data. Importing just
the Isabelle exports into GraphDB results in more than 200
million triplets. For practical applications it will be
necessary to limit the number of results or to use pagination
techniques~\cite{sparqlpagination}.
\item RDF4J is a Java API for interacting with triplet stores,
implemented based on a superset of
{SPARQL}~\cite{rdf4j}. GraphDB supports RDF4J; in fact, it is the
recommended way of interacting with GraphDB
repositories~\cite{graphdbapi}. Instead of formulating textual
queries, RDF4J allows developers to query a repository by
calling Java API methods. The above query that returns all triplets
in the store looks like
\begin{verbatim}
connection.getStatements(null, null, null);
\end{verbatim}
in RDF4J. The method \texttt{getStatements(s, p, o)} returns all
triplets that have matching subject~\texttt{s},
predicate~\texttt{p} and object~\texttt{o}. If any of these
arguments is \texttt{null}, that position matches any value, i.e.\ it
acts as a query variable that is filled in by the call to
\texttt{getStatements}.
Using RDF4J does introduce a dependency on the JVM family of
languages, but also offers some conveniences. For example, we
can generate Java classes that contain all URIs in an OWL
ontology as constants~\cite{rdf4jgen}. In combination with IDE
support, we found this to be very convenient when writing
applications that interface with ULO data sets.
\end{itemize}
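To make the SPARQL side more concrete, the following is a minimal sketch
of how an application might page through results and address the
standardized HTTP interface without any dependency beyond the Java
standard library (Java~11 or later). It is an illustration under stated
assumptions, not the implementation used in this project: the class
\texttt{SparqlPaging}, its method names and the endpoint URL
\texttt{http://localhost:7200/repositories/ulo} are hypothetical.
Pagination is expressed with the \texttt{LIMIT} and \texttt{OFFSET}
solution modifiers; an \texttt{ORDER BY} clause is added because the
order of results is otherwise unspecified and pages could overlap.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Sketch only: paginated SPARQL queries for a SPARQL 1.1 Protocol endpoint. */
final class SparqlPaging {

    /**
     * Appends ORDER BY, LIMIT and OFFSET to a SELECT query so that the
     * given zero-based page of pageSize rows is requested.
     * Assumption: selectQuery carries no solution modifiers of its own.
     */
    static String pageQuery(String selectQuery, String orderBy,
                            int pageSize, int page) {
        if (pageSize <= 0 || page < 0) {
            throw new IllegalArgumentException(
                    "pageSize must be > 0 and page must be >= 0");
        }
        // Use long arithmetic so large page numbers do not overflow.
        long offset = (long) pageSize * page;
        return selectQuery + " ORDER BY " + orderBy
                + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    /**
     * Builds, but does not send, a POST request that carries the query
     * as a URL-encoded form parameter named "query" and asks for the
     * result table in the SPARQL JSON results format.
     */
    static HttpRequest buildRequest(URI endpoint, String query) {
        String form = "query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/sparql-results+json")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }
}
```

A caller would pass the resulting request to
\texttt{java.net.http.HttpClient} and increase \texttt{page} until a
response contains fewer than \texttt{pageSize} rows. Ordering a result
set of several hundred million triplets can itself be costly, so
restricting the query pattern before paginating is likely preferable in
practice.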
\subsection{Comparison}