Endpoints
=========
With ULO/RDF triplets imported into a database, in our case GraphDB, we
have all data available for querying. There are multiple approaches
to querying such triplet stores.
SPARQL
------
SPARQL [1] is a standardized query language for RDF triplet data. The W3C
specifications cover not just the syntax and semantics of the language itself,
but also a standardized HTTP interface for querying databases.
Various implementations of this standard, e.g. [2], are available, so
using SPARQL has the advantage of making us independent of a specific
programming language or environment.
SPARQL is inspired by SQL. A simple query that returns all triplets
in the store looks like `SELECT * WHERE { ?s ?p ?o }`,
where `s`, `p` and `o` are query variables. The result of a query is the
set of valid substitutions for the query variables. In this case, the
database would return a table of all triplets in the store, with one
column each for subject `s`, predicate `p` and object `o`.
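To make the standardized interface more concrete, the following is a minimal sketch of how such a query could be packaged as an HTTP request using only the Java standard library. It follows the "query via POST with URL-encoded parameters" form of the SPARQL protocol. The endpoint URL and repository name `example` are assumptions for illustration, as are the class and method names; the request is only built here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class SparqlRequestSketch {
    // Encodes the query as an application/x-www-form-urlencoded body.
    static String formBody(String query) {
        return "query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Builds (but does not send) a POST request carrying the query.
    static HttpRequest buildQueryRequest(String endpoint, String query) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                // Ask for results in the SPARQL JSON results format.
                .header("Accept", "application/sparql-results+json")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(query)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed endpoint; adjust host, port and repository to your setup.
        String endpoint = "http://localhost:7200/repositories/example";
        String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

        HttpRequest request = buildQueryRequest(endpoint, query);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formBody(query));
    }
}
```

Sending the request would then be a matter of passing it to `java.net.http.HttpClient`. Because only HTTP and a text query are involved, the same exchange can be reproduced from any other language.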
Of course, queries might return a lot of data: importing just the Isabelle
exports [3] into GraphDB results in more than 200 million triplets. Large
result sets like this are handled with pagination techniques [4].
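One common pagination technique is to fetch the result in fixed-size pages using the SPARQL `LIMIT` and `OFFSET` modifiers. The sketch below only generates the query text for a given page; the class name, method name and page size are hypothetical. An `ORDER BY` clause is included because without it the order of results is not guaranteed, so consecutive pages could overlap or skip rows.

```java
class SparqlPagingSketch {
    // Returns the query text for one page of the "all triplets" query.
    // pageIndex is zero-based; pageSize is the number of rows per page.
    static String pageQuery(int pageSize, long pageIndex) {
        if (pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and pageIndex >= 0");
        }
        long offset = pageIndex * pageSize;
        return "SELECT * WHERE { ?s ?p ?o } ORDER BY ?s ?p ?o"
                + " LIMIT " + pageSize
                + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Print the queries for the first three pages of 1000 rows each.
        for (long page = 0; page < 3; page++) {
            System.out.println(pageQuery(1000, page));
        }
    }
}
```

Sorting a very large store on every page request can be expensive, so whether this approach is practical for hundreds of millions of triplets would need to be measured against the actual database.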
RDF4J
-----
RDF4J [5] is a Java API for interacting with triplet stores, built on
a superset of SPARQL. GraphDB supports RDF4J; in fact, it is the
recommended way of interacting with GraphDB repositories [6].
Instead of formulating textual queries, RDF4J allows developers to query
a repository by calling Java API methods. The query above, which returns
all triplets in the store, looks like

    connection.getStatements(null, null, null);

in RDF4J. The method `getStatements(s, p, o)` returns all triplets that
match the given subject `s`, predicate `p` and object `o`. If any of
these arguments is `null`, it matches any value, i.e. it acts as a query
variable that is filled in by the call to `getStatements`.
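The `null`-as-wildcard behaviour can be illustrated with a small model in plain Java. This is not RDF4J code and does not use its types: the `Triple` class, the `matchStatements` method and the `ex:` identifiers are hypothetical stand-ins, and the store is just an in-memory list. It is only meant as a sketch of the matching semantics described above.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardMatchSketch {
    // Hypothetical stand-in for a stored statement.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    // A null pattern matches any value; otherwise values must be equal.
    static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // Returns all triplets whose subject, predicate and object match.
    static List<Triple> matchStatements(List<Triple> store, String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : store) {
            if (matches(s, t.s) && matches(p, t.p) && matches(o, t.o)) {
                result.add(t);
            }
        }
        return result;
    }

    // Made-up example data.
    static List<Triple> exampleStore() {
        return List.of(
                new Triple("ex:thm1", "ex:type", "ex:Theorem"),
                new Triple("ex:thm2", "ex:type", "ex:Theorem"),
                new Triple("ex:thm1", "ex:uses", "ex:def1"));
    }

    public static void main(String[] args) {
        List<Triple> store = exampleStore();
        // All three arguments null: every triplet matches.
        matchStatements(store, null, null, null).forEach(System.out::println);
        // Fixed subject, wildcard predicate and object.
        matchStatements(store, "ex:thm1", null, null).forEach(System.out::println);
    }
}
```

Fixing more arguments narrows the result, which corresponds to replacing query variables with constants in the equivalent SPARQL query.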
Comparing SPARQL to RDF4J
-------------------------
While plain SPARQL offers independence from the JVM, RDF4J is a
convenient interface that makes it easy to write applications, with
problems like pagination already taken care of.
The interesting question for the task of building a ULO/RDF
endpoint is which of the two works better for this particular application.
References
----------
[1] https://www.w3.org/TR/rdf-sparql-query/
[2] https://godoc.org/github.com/knakk/sparql
[3] https://gl.mathhub.info/Isabelle
[4] https://stackoverflow.com/questions/27488403/paginating-sparql-results
[5] https://rdf4j.org/
[6] http://graphdb.ontotext.com/documentation/free/using-graphdb-with-the-rdf4j-api.html