Endpoints
=========

With ULO/RDF triples imported into a database, in our case GraphDB, all
data is available for querying. There are multiple approaches to
querying such triple stores.

SPARQL
------

SPARQL [1] is a standardized query language for RDF data. Besides the
syntax and semantics of the query language itself, the SPARQL standard
also specifies an HTTP-based (REST-style) protocol for sending queries
to a database.

Client libraries for this standard are available for many languages,
e.g. [2] for Go, so using SPARQL keeps us independent of a specific
programming language or environment.

SPARQL is inspired by SQL. A simple query that returns all triples in
the store looks like this:

    SELECT *
    WHERE { ?s ?p ?o }

Here `?s`, `?p` and `?o` are query variables. The result of a query is
the set of valid substitutions for the query variables. In this case,
the database returns a table of all triples in the store, with one
column each for subject `?s`, predicate `?p` and object `?o`. Without
an `ORDER BY` clause, the order of the rows is unspecified.

Of course, queries might return a lot of data: importing just the
Isabelle exports [3] into GraphDB results in more than 200M triples.
This is addressed with pagination [4]: the `LIMIT` and `OFFSET`
modifiers select one page of results at a time, and an `ORDER BY`
clause makes the row order, and thus the page boundaries, predictable.

References
----------

[1] https://www.w3.org/TR/rdf-sparql-query/

[2] https://godoc.org/github.com/knakk/sparql

[3] https://gl.mathhub.info/Isabelle

[4] https://stackoverflow.com/questions/27488403/paginating-sparql-results
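The sketch below shows how SPARQL pagination and the HTTP query interface fit together. It is a minimal Go example that uses only the standard library, not the client library [2], and it is not the project's actual code.

It rests on a few assumptions:

- The GraphDB repository is reachable at the hypothetical endpoint `http://localhost:7200/repositories/ulo`.
- The helper names `pagedQuery`, `newQueryRequest` and `fetchAll` are illustrative.
- Each request follows the SPARQL Protocol "query via GET" form: the query text goes into the `query` URL parameter, and JSON results are requested with `Accept: application/sparql-results+json`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// pagedQuery appends ORDER BY, LIMIT and OFFSET to a SELECT query so that
// it returns page `page` (0-based) of `pageSize` rows. The base query must
// not already contain its own ORDER BY/LIMIT/OFFSET. orderBy (e.g. "?s")
// is an illustrative parameter; ordering makes page boundaries predictable.
func pagedQuery(base, orderBy string, pageSize, page int) string {
	return fmt.Sprintf("%s\nORDER BY %s\nLIMIT %d\nOFFSET %d",
		base, orderBy, pageSize, page*pageSize)
}

// newQueryRequest builds a SPARQL Protocol GET request: the query is
// URL-encoded into the `query` parameter and JSON results are requested.
func newQueryRequest(endpoint, query string) (*http.Request, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return nil, err
	}
	params := u.Query()
	params.Set("query", query)
	u.RawQuery = params.Encode()
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/sparql-results+json")
	return req, nil
}

// fetchAll requests successive pages until one returns fewer than pageSize
// rows and reports the total row count. fetchPage is a hypothetical
// callback that sends one query and returns how many rows it received.
func fetchAll(base, orderBy string, pageSize int, fetchPage func(query string) (int, error)) (int, error) {
	if pageSize <= 0 {
		return 0, fmt.Errorf("pageSize must be positive, got %d", pageSize)
	}
	total := 0
	for page := 0; ; page++ {
		n, err := fetchPage(pagedQuery(base, orderBy, pageSize, page))
		if err != nil {
			return total, err
		}
		total += n
		if n < pageSize {
			return total, nil // a short (or empty) page means we are done
		}
	}
}

func main() {
	// Hypothetical GraphDB repository endpoint; adjust host and repository name.
	endpoint := "http://localhost:7200/repositories/ulo"
	query := pagedQuery("SELECT * WHERE { ?s ?p ?o }", "?s ?p ?o", 1000, 2)
	req, err := newQueryRequest(endpoint, query)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

`fetchAll` takes the actual round trip as a callback, so the paging logic can be exercised without a running database. In a real setup, the callback could send the request built by `newQueryRequest` with `http.DefaultClient.Do` and count the bindings in the JSON response.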