With ULO/RDF triplets imported into the GraphDB triplet store, we have
all data available for querying. There are two approaches to
querying this database: one based on the standardized SPARQL query
language and the other on the RDF4J Java library, which is supported by
various vendors. We briefly look at both approaches, as each has its
own advantages.
\subsection{SPARQL}
SPARQL is a standardized query language for RDF triplet
data~\cite{sparql}. The specification covers not just the syntax and
semantics of the language itself, but also a standardized REST
interface for querying databases. Various implementations of this
standard, e.g.~\cite{gosparql}, are available, so using SPARQL has the
advantage of making us independent of a specific programming language
or environment.
SPARQL is inspired by SQL and as such the \texttt{SELECT} \texttt{WHERE}
syntax should be familiar to many software developers. A simple query
that returns all triplets in the store looks like
\begin{verbatim}
SELECT * WHERE { ?s ?p ?o }
\end{verbatim}
where \texttt{?s}, \texttt{?p} and \texttt{?o} are query
variables. The result of a query is the set of valid substitutions for
the query variables. In this case, the database would return a table of
all triplets in the store, with one column each for
subject~\texttt{?s}, predicate~\texttt{?p} and object~\texttt{?o}.
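
To illustrate the standardized interface mentioned above, the following
is a minimal Java sketch of how the query could be wrapped into an HTTP
request, assuming Java~11 or newer. The endpoint
\texttt{http://localhost:7200/repositories/ulo} and the class and method
names are assumptions made for this example only; the actual address
depends on the installation. The sketch only constructs the request and
does not send it.
\begin{verbatim}
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class SparqlRequestSketch {
    // Assumption: local GraphDB instance, repository named "ulo".
    static final String ENDPOINT =
        "http://localhost:7200/repositories/ulo";

    // Builds a GET request that carries the query in the "query"
    // parameter. Nothing is sent over the network here.
    static HttpRequest buildQueryRequest(String endpoint,
                                         String sparql) {
        String encoded =
            URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder()
            .uri(URI.create(endpoint + "?query=" + encoded))
            // Ask for the standardized JSON result format.
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildQueryRequest(
            ENDPOINT, "SELECT * WHERE { ?s ?p ?o }");
        System.out.println(request.method() + " " + request.uri());
    }
}
\end{verbatim}
Such a request could then be passed to any HTTP client; since only the
query string and the result format are involved, the same request can
be issued from other programming languages as well.
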
Of course, queries might return a lot of data. Importing just the
Isabelle exports into GraphDB results in more than 200 million
triplets. For practical applications, it will be necessary to limit
the number of results or use pagination
techniques~\cite{sparqlpagination}.
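
One possible way to paginate is to combine the \texttt{LIMIT} and
\texttt{OFFSET} modifiers of SPARQL with a fixed ordering. The
following Java sketch generates the query text for a given page; the
class and method names are hypothetical and chosen for this example
only. Note that sorting a very large result set may itself be costly,
so this is a starting point under stated assumptions rather than a
tuned solution.
\begin{verbatim}
final class SparqlPaginationSketch {
    // Returns the query for one page of triplets. Pages are counted
    // from zero. ORDER BY is added so that consecutive pages refer
    // to one consistent ordering of the results.
    static String pageQuery(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException(
                "page must be >= 0 and pageSize must be > 0");
        }
        // Use long arithmetic to avoid overflow on large stores.
        long offset = (long) page * pageSize;
        return "SELECT * WHERE { ?s ?p ?o } "
            + "ORDER BY ?s ?p ?o "
            + "LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Print the queries for the first three pages.
        for (int page = 0; page < 3; page++) {
            System.out.println(pageQuery(page, 100));
        }
    }
}
\end{verbatim}
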
\subsection{RDF4J}
RDF4J is a Java API for interacting with triplet stores, implemented
based on a superset of {SPARQL}~\cite{rdf4j}. GraphDB supports RDF4J;
in fact, it is the recommended way of interacting with GraphDB
repositories~\cite{graphdbapi}. Instead of formulating textual
queries, RDF4J allows developers to query a repository by calling Java
API methods. The above query that returns all triplets in the store
looks like
\begin{verbatim}
connection.getStatements(null, null, null);
\end{verbatim}
in RDF4J. The method \texttt{getStatements(s, p, o)} returns all
triplets that have matching subject~\texttt{s}, predicate~\texttt{p}
and object~\texttt{o}. If any of these arguments is \texttt{null}, it
matches any value, i.e.\ it acts as a query variable that is to be
filled by the call to \texttt{getStatements}.
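
The matching rule described above can be illustrated with a small toy
model in plain Java. This is not the RDF4J implementation and does not
use RDF4J types; the class \texttt{WildcardMatchSketch}, its string
based triplets and the \texttt{ex:} names are made up for this sketch.
It only demonstrates how a \texttt{null} argument behaves as a
wildcard.
\begin{verbatim}
import java.util.List;
import java.util.stream.Collectors;

final class WildcardMatchSketch {
    // Toy triplet made of plain strings.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) {
            this.s = s; this.p = p; this.o = o;
        }
    }

    // A null pattern matches any value; otherwise the value
    // has to be equal to the pattern.
    static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // Returns all triplets in the store that match the pattern.
    static List<Triple> getStatements(List<Triple> store,
            String s, String p, String o) {
        return store.stream()
            .filter(t -> matches(s, t.s)
                      && matches(p, t.p)
                      && matches(o, t.o))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Triple> store = List.of(
            new Triple("ex:a", "ex:uses", "ex:b"),
            new Triple("ex:a", "ex:name", "ex:c"),
            new Triple("ex:b", "ex:uses", "ex:c"));
        // Fix only the predicate, leave subject and object open.
        int n = getStatements(store, null, "ex:uses", null).size();
        System.out.println(n + " triplets with predicate ex:uses");
    }
}
\end{verbatim}
Fixing one or more arguments in this way corresponds to replacing the
respective query variable in the SPARQL query with a concrete value.
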
Using RDF4J does introduce a dependency on the JVM family of
languages, but also offers some conveniences. For example, we can
generate Java classes that contain all URIs in an OWL ontology as
constants~\cite{rdf4jgen}. In combination with IDE support, we found
this to be very convenient when writing applications that interface
with ULO data sets.
\subsection{Comparison}
\emph{TODO}