    % endpoints.tex
    \section{Endpoints}\label{sec:endpoints}
    
    % Author (per revision history): Andreas Schärtl
    With ULO/RDF triplets imported into the GraphDB triplet store, all
    data is available for querying. There are two approaches to querying
    this database: one is based on the standardized SPARQL query
    language, the other on the RDF4J Java API, which is supported by
    various vendors. We briefly look at both approaches, as each has its
    own advantages.
    
    SPARQL is a standardized query language for RDF triplet
    data~\cite{sparql}. The specification covers not just the syntax and
    semantics of the language itself, but also a standardized REST
    interface for querying databases. Various implementations of this
    standard, e.g.~\cite{gosparql}, are available, so using SPARQL makes
    us independent of any specific programming language or environment.
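
    As a minimal sketch of what using this interface can look like, the
    following Java program sends a query with a plain HTTP \texttt{GET}
    request, passing the query text in the \texttt{query} parameter as
    described by the SPARQL protocol. The endpoint URL, including host,
    port and the repository name \texttt{ulo}, is an assumption for
    illustration and has to be adapted to the actual GraphDB installation.
    The class and method names are likewise made up for this example, and
    the HTTP client used here requires Java~11 or newer.

    \begin{verbatim}
    import java.io.IOException;
    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class SparqlHttpExample {
        // Assumed endpoint: host, port and repository name are placeholders.
        static final String ENDPOINT = "http://localhost:7200/repositories/ulo";

        // Build the GET URI; the query text goes into the "query" parameter.
        static URI queryUri(String endpoint, String sparql) {
            // URLEncoder emits "+" for spaces; use "%20" to stay unambiguous.
            String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8)
                .replace("+", "%20");
            return URI.create(endpoint + "?query=" + encoded);
        }

        public static void main(String[] args) throws InterruptedException {
            // LIMIT keeps the response small for this demonstration.
            URI uri = queryUri(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10");
            HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
            try {
                HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println(response.body());
            } catch (IOException e) {
                System.err.println("Endpoint not reachable: " + e.getMessage());
            }
        }
    }
    \end{verbatim}

    Because the request is ordinary HTTP, the same query could be sent
    from any other language with an HTTP client.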
    
    SPARQL is inspired by SQL, so its \texttt{SELECT} \texttt{WHERE}
    syntax should be familiar to many software developers. A simple query
    that returns all triplets in the store looks like
    
    \begin{verbatim}
        SELECT * WHERE { ?s ?p ?o }
    \end{verbatim}
    
    where \texttt{?s}, \texttt{?p} and \texttt{?o} are query
    variables. The result of a query is the set of valid substitutions
    for its query variables. In this case, the database would return a
    table of all triplets in the store, with one column each for
    subject~\texttt{?s}, predicate~\texttt{?p} and object~\texttt{?o}.
    
    
    Of course, queries might return a lot of data. Importing just the
    Isabelle exports into GraphDB results in more than 200 million
    triplets. For practical applications it will be necessary to limit
    the number of results or to use pagination
    techniques~\cite{sparqlpagination}.
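
    One common way to paginate, though not necessarily the technique
    described in the cited reference, is to combine the SPARQL solution
    modifiers \texttt{ORDER BY}, \texttt{LIMIT} and \texttt{OFFSET}. The
    following Java sketch builds the query text for a given page. The
    class name, the method name and the page size of 1000 are assumptions
    made for this example.

    \begin{verbatim}
    public class SparqlPagination {
        // Append ORDER BY, LIMIT and OFFSET to a base SELECT query.
        // pageIndex is zero-based; a fixed ordering keeps pages stable.
        static String page(String baseQuery, String orderBy,
                           int pageSize, int pageIndex) {
            long offset = (long) pageSize * pageIndex;
            return baseQuery
                + " ORDER BY " + orderBy
                + " LIMIT " + pageSize
                + " OFFSET " + offset;
        }

        public static void main(String[] args) {
            String base = "SELECT * WHERE { ?s ?p ?o }";
            // Print the query text for the first three pages.
            for (int i = 0; i < 3; i++) {
                System.out.println(page(base, "?s ?p ?o", 1000, i));
            }
        }
    }
    \end{verbatim}

    Without \texttt{ORDER BY}, the order of results is not guaranteed, so
    consecutive pages could overlap or skip entries. On a store with
    hundreds of millions of triplets, sorting may itself be expensive,
    which is a trade-off to keep in mind.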
    
    \subsection{RDF4J}
    
    
    RDF4J is a Java API for interacting with triplet stores, built on a
    superset of {SPARQL}~\cite{rdf4j}. GraphDB supports RDF4J; in fact,
    it is the recommended way of interacting with GraphDB
    repositories~\cite{graphdbapi}. Instead of formulating textual
    queries, RDF4J allows developers to query a repository by calling Java
    API methods. The above query that returns all triplets in the store
    looks like
    
    \begin{verbatim}
        connection.getStatements(null, null, null);
    \end{verbatim}
    in RDF4J. The method \texttt{getStatements(s, p, o)} returns all
    triplets that match subject~\texttt{s}, predicate~\texttt{p} and
    object~\texttt{o}. If any of these arguments is \texttt{null}, it
    matches any value, i.e.\ it acts as a query variable that is filled
    in by the call to \texttt{getStatements}.
    
    
    Using RDF4J does introduce a dependency on the JVM family of
    languages, but also offers some conveniences. For example, we can
    generate Java classes that contain all URIs in an OWL ontology as
    constants~\cite{rdf4jgen}. In combination with IDE support, we found
    this to be very convenient when writing applications that interface
    with ULO data sets.
    
    \subsection{Comparison}
    
    \emph{TODO}