    \section{Endpoints}\label{sec:endpoints}
    
    With ULO triplets imported into the GraphDB triplet store by Collecter
    and Importer, all data necessary for querying is now available.
    As discussed before, querying from applications happens through an
    Endpoint that exposes some kind of {API}. The interesting question
    here is probably not so much the implementation of the endpoint itself
    but rather the choice of API, which can make or break such a project.
    
    There are two approaches to querying the GraphDB triplet store,
    one based on the standardized SPARQL query language and the other
    on the RDF4J Java library, which is implemented by various vendors.
    Each approach has its own advantages.
    
    \subsection{Available Application Interfaces}
    
    \begin{itemize}
        \item SPARQL is a standardized query language for RDF triplet
          data~\cite{sparql}. The specification includes not just the
          syntax and semantics of the language itself, but also a
          standardized REST interface for querying databases.  Various
          implementations of this standard, e.g.~\cite{gosparql}, are
          available, so using SPARQL has the advantage of making us
          independent of a specific programming language or environment.
    
          SPARQL is inspired by SQL and as such the \texttt{SELECT}
          \texttt{WHERE} syntax should be familiar to many software
          developers.  A simple query that returns all triplets in the
          store looks like
          \begin{verbatim}
              SELECT * WHERE { ?s ?p ?o }
          \end{verbatim}
          where \texttt{?s}, \texttt{?p} and \texttt{?o} are query
          variables. The result of a query is the set of valid
          substitutions for the query variables. In this case, the
          database would return a table of all triplets in the store with
          one column each for subject~\texttt{?s}, predicate~\texttt{?p}
          and object~\texttt{?o}.
    
          Of course, queries might return a lot of data. Importing just
          the Isabelle exports into GraphDB results in more than 200
          million triplets. For practical applications it will be
          necessary to limit the number of results or use pagination
          techniques~\cite{sparqlpagination}.
    
        \item RDF4J is a Java API for interacting with triplet stores,
          implemented based on a superset of
          {SPARQL}~\cite{rdf4j}. GraphDB supports RDF4J; in fact, it is the
          recommended way of interacting with GraphDB
          repositories~\cite{graphdbapi}. Instead of formulating textual
          queries, RDF4J allows developers to query a repository by
          calling Java API methods. The query above, which returns all
          triplets in the store, looks like
          \begin{verbatim}
              connection.getStatements(null, null, null);
          \end{verbatim}
          in RDF4J. The method \texttt{getStatements(s, p, o)} returns all
          triplets that have matching subject~\texttt{s},
          predicate~\texttt{p} and object~\texttt{o}. If any of these
          arguments is \texttt{null}, it matches any value, i.e.\ it acts
          as a query variable that is to be filled by the call to
          \texttt{getStatements}.
    
          Using RDF4J does introduce a dependency on the JVM family of
          languages, but also offers some conveniences. For example, we
          can generate Java classes that contain all URIs in an OWL
          ontology as constants~\cite{rdf4jgen}. In combination with IDE
          support, we found this to be very convenient when writing
          applications that interface with ULO data sets.
    
    \end{itemize}
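
    To make the remark on pagination more concrete, the following is a
    minimal sketch of how an application might request the triplets of
    the store page by page over the standardized SPARQL interface. It
    uses only the Java standard library. The class name
    \texttt{PagedQuery}, the endpoint URL and the page size are
    assumptions made for illustration and are not part of the actual
    implementation. The \texttt{ORDER BY} clause is added because
    \texttt{LIMIT} and \texttt{OFFSET} only yield stable pages when the
    result order is fixed; on very large stores such sorting may be
    expensive.
    \begin{verbatim}
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    class PagedQuery {
        // Hypothetical endpoint URL, adjust to the repository at hand.
        static final String ENDPOINT =
            "http://localhost:7200/repositories/ulo";

        // Build a SPARQL query that returns one page of triplets.
        // Pages are counted from zero.
        static String page(int pageSize, int pageIndex) {
            long offset = (long) pageSize * pageIndex;
            return "SELECT * WHERE { ?s ?p ?o } ORDER BY ?s ?p ?o"
                + " LIMIT " + pageSize
                + " OFFSET " + offset;
        }

        // For GET requests, the SPARQL protocol expects the query
        // text URL-encoded in the "query" parameter.
        static String requestUrl(String query) {
            return ENDPOINT + "?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            // Print the request URL for the third page of 1000 rows.
            System.out.println(requestUrl(page(1000, 2)));
        }
    }
    \end{verbatim}
    An application would issue one HTTP request per page and stop as
    soon as a page contains fewer rows than the requested page size.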
    
    \subsection{Comparison}
    
    \emph{TODO}