4. Querying RDFox¶
RDFox supports most SPARQL 1.1 Query Language features, as well as the most commonly needed parts of SPARQL 1.1 Update. It also implements a few proprietary built-in functions that are not part of SPARQL 1.1.
In this section, we describe in detail the support in RDFox for the SPARQL 1.1 standard specification, and we describe the built-in functions that are specific to RDFox.
Finally, we also describe the functionality implemented in RDFox for monitoring query execution.
4.1. SPARQL 1.1 Support¶
The SPARQL 1.1 specification provides a suite of languages and protocols for querying and manipulating RDF graph data.
4.1.1. Query Language¶
The core of the specification is the SPARQL 1.1 Query Language, which specifies the syntax and semantics of allowed queries. The SPARQL 1.1 query language extends the previous version of SPARQL with a number of important features for applications, including nested subqueries, aggregation, negation, creating values by expressions, named graphs, and property paths.
RDFox provides full support of the SPARQL 1.1 query language with the only exception of property paths, the BNODE function, and the non-normative DESCRIBE query form. The functionality provided by property paths is, however, already covered to a large extent by Datalog rules (as will be described in Section 5.4). The intended semantics of BNODE and DESCRIBE is, in our opinion, not sufficiently specified in the standard, which is why those features have not yet been implemented in RDFox.
4.1.1.1. Answer Formats¶
Results of SELECT queries in SPARQL 1.1 are often represented in tabular form in applications. In order for query results to be easily exchanged in a machine-readable format, the SPARQL 1.1 specification describes four common exchange formats in three different documents: XML, JSON, CSV, and TSV.
All of these formats are fully supported in RDFox (see Section 8.9.2 for further details).
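As a rough illustration of how a client might consume one of these formats, the following sketch reads a document in the SPARQL 1.1 Query Results JSON Format using only the Python standard library. The sample document, the example.org IRIs, and the helper name rows_from_json are made up for this illustration and are not part of RDFox.

```python
import json

# Hypothetical result document in the SPARQL 1.1 Query Results JSON Format.
SAMPLE = """
{
  "head": { "vars": ["p", "n"] },
  "results": {
    "bindings": [
      { "p": { "type": "uri", "value": "https://example.org/meg" },
        "n": { "type": "literal", "value": "Meg" } },
      { "p": { "type": "uri", "value": "https://example.org/chris" } }
    ]
  }
}
"""


def rows_from_json(text):
    """Return (variables, rows); an unbound variable is reported as None."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable that is unbound in a solution is simply absent from it.
        rows.append(tuple(binding[v]["value"] if v in binding else None
                          for v in variables))
    return variables, rows


variables, rows = rows_from_json(SAMPLE)
for row in rows:
    print(dict(zip(variables, row)))
```

The JSON format keeps the kind of each term (IRI, literal, blank node) alongside its value, which this sketch discards for brevity.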
4.1.1.2. Deviations from the Standard¶
The implementation of SPARQL in RDFox deviates slightly from the SPARQL 1.1 specification in that the variables introduced in the SELECT clause can be used in the HAVING clause. The following example illustrates that point.
Example: Differences in the implementation of HAVING
In the following query, variable ?a is computed to contain the area of a rectangle. However, this is done in the SELECT clause, so, according to the SPARQL 1.1 specification, it should happen after the HAVING clause is evaluated. Thus, when the condition ?a < 10 in the HAVING clause is evaluated, variable ?a should be unbound.
SELECT ?r ?w ?h (?w * ?h AS ?a) WHERE { ?r :width ?w . ?r :height ?h } HAVING (?a < 10)
Because of this order of evaluation, queries with the HAVING clause often contain repetition; for example, the above query must be written as follows.
SELECT ?r ?w ?h (?w * ?h AS ?a) WHERE { ?r :width ?w . ?r :height ?h } HAVING (?w * ?h < 10)
In order to eliminate such repetition, RDFox evaluates the HAVING clause after computing the variable bindings in the SELECT clause, which is often convenient. On queries that either do not contain the HAVING clause or where the HAVING clause does not reference variables introduced in the SELECT clause, RDFox follows the SPARQL 1.1 standard closely.
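The difference between the two evaluation orders can be seen in a toy model. The sketch below is not how RDFox is implemented; it only mimics, on made-up rectangle data, what happens to the condition ?a < 10 when it is checked before or after ?a is computed. The function names and the data are assumptions for illustration.

```python
# Hypothetical rectangles: (resource, width, height).
RECTANGLES = [(":r1", 2, 3), (":r2", 5, 4), (":r3", 1, 9)]


def standard_order(rows):
    """HAVING is checked before the SELECT expression, so ?a is unbound."""
    answers = []
    for r, w, h in rows:
        a = None  # ?a has not been computed yet
        # Comparing an unbound variable raises an error in SPARQL, and a
        # solution whose condition errors is dropped.
        if a is not None and a < 10:
            answers.append((r, w, h, w * h))
    return answers


def rdfox_order(rows):
    """The SELECT expression is computed first, then HAVING is checked."""
    answers = []
    for r, w, h in rows:
        a = w * h  # (?w * ?h AS ?a)
        if a < 10:  # HAVING (?a < 10)
            answers.append((r, w, h, a))
    return answers


print(standard_order(RECTANGLES))
print(rdfox_order(RECTANGLES))
```

Under the first order no rectangle survives the condition, whereas under the second order the rectangles with an area below 10 are returned.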
4.1.2. Update Language¶
SPARQL 1.1 provides an update language for RDF graphs. In particular, the update language allows users to insert triples into a store, delete triples from a store, load an RDF graph into a store, clear an RDF graph in a store, create a new RDF graph in a store, drop an RDF graph from a store, copy, move, or add the content of one RDF graph to another, and perform a group of update operations as a single action.
RDFox supports INSERT and DELETE operations, which can be used to add triples to or remove triples from the Graph Store based on bindings for a query pattern specified in a WHERE clause.
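The way bindings drive an update can be sketched with a set of triples. This is a simplified model and not RDFox code: the bindings of the WHERE pattern are computed once against the store as it was before the update, the instantiated DELETE template is removed, and the instantiated INSERT template is then added. The helper names and the property :worksFor are hypothetical.

```python
def delete_insert_where(store, where, delete_template, insert_template):
    """Apply one DELETE/INSERT ... WHERE style update to a set of triples."""
    # Bindings are computed once, before any triple is changed.
    bindings = list(where(store))
    for binding in bindings:
        store.difference_update(delete_template(binding))
    for binding in bindings:
        store.update(insert_template(binding))
    return store


def employs_pattern(store):
    """Model of the pattern: WHERE { ?x :employs ?y }"""
    for s, p, o in sorted(store):
        if p == ":employs":
            yield {"x": s, "y": o}


store = {(":lois", ":employs", ":peter"), (":peter", ":forename", "Peter")}
delete_insert_where(
    store,
    employs_pattern,
    lambda b: [(b["x"], ":employs", b["y"])],    # DELETE { ?x :employs ?y }
    lambda b: [(b["y"], ":worksFor", b["x"])],   # INSERT { ?y :worksFor ?x }
)
print(sorted(store))
```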
4.2. RDFox Proprietary Extensions¶
RDFox implements a number of extensions to the SPARQL 1.1 standard, which have proved useful in a number of practical use cases. This section specifies each of these extensions and illustrates their use by means of examples.
4.2.1. Built-in Functions¶
RDFox provides a number of built-in functions which extend the function definitions in the SPARQL 1.1 query language.
4.2.1.1. The SKOLEM Function¶
The SKOLEM function creates a new resource based on the resources obtained from the evaluation of a given list of expressions. This function is useful, for instance, when trying to represent a k-ary relation in RDF.
Syntax:
SKOLEM(exp1, ..., expk)
Description: The function creates a new resource based on the resources evaluated from exp1 through expk, the first of which must be of data type http://www.w3.org/2001/XMLSchema#string.
Example
Consider the data in our Getting Started guide. Let us issue the following INSERT statements after loading the data.
INSERT { ?x :employs ?y } WHERE { ?x :forename "Lois" . ?y :forename "Peter" }
INSERT {
    ?employment :hasEmployer ?employer ;
                :hasEmployee ?employee .
}
WHERE {
    ?employer :employs ?employee .
    BIND(SKOLEM("employment", ?employer, ?employee) as ?employment)
}
In this example, the first INSERT statement adds to the data store the triple :lois :employs :peter. The function SKOLEM is used in the second INSERT statement to create a new “employment” resource for each triple with the predicate :employs. The new resource is then used in the subject position of two new triples connecting it to its employer and employee resources. In particular, the query
SELECT ?x ?y WHERE { ?x :hasEmployer ?y }
produces a single answer indicating that the newly created employment resource has :lois as employer. It is now possible to add the start and end dates of each employment as direct relations of the new resource, whereas it would have been impossible to attach that information to the original binary :employs relation.
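The idea behind this pattern can be mimicked outside RDFox. The sketch below assumes that SKOLEM behaves like a function in the mathematical sense, so that equal arguments always denote the same resource; that assumption, the stand-in skolem helper, and the :brian data are illustrative only and say nothing about how RDFox names the resources it creates.

```python
def skolem(*args):
    """Stand-in for SKOLEM: the argument tuple itself identifies the resource."""
    return ("skolem",) + args


# Hypothetical (employer, employee) pairs taken from :employs triples.
employs = [(":lois", ":peter"), (":lois", ":brian")]

triples = set()
for employer, employee in employs:
    # One employment resource per (employer, employee) pair.
    employment = skolem("employment", employer, employee)
    triples.add((employment, ":hasEmployer", employer))
    triples.add((employment, ":hasEmployee", employee))

for triple in sorted(triples):
    print(triple)
```

Because the stand-in depends only on its arguments, running the loop a second time adds no further triples to the set.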
4.2.1.2. The DATETIME Function¶
Syntax:
DATETIME(year, month, day, hour, minute, second)
DATETIME(year, month, day, hour, minute, second, offset)
Description: The function takes as input integer values for year, month, day, hour, minute, second, and optionally timezone offset, and transforms these into a resource of type xsd:dateTime.
Example: Consider the data in our previous example on the use of SKOLEM. Let us issue the following INSERT statement, which adds start and end dates to the employment instance in our previous example, where :lois was the employer.
INSERT {
    ?employment :hasStartDate ?sd .
    ?employment :hasEndDate ?ed
}
WHERE {
    ?employment :hasEmployer :lois .
    BIND(DATETIME(2020,4,5,9,0,0) as ?sd) .
    BIND(DATETIME(2022,4,4,9,0,0) as ?ed)
}
This statement attaches a start date of 9am April 5th 2020 and an end date of 9am April 4th 2022 to the employment, where the dates will be represented using xsd:dateTime.
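For readers unfamiliar with xsd:dateTime, the sketch below shows the lexical form that corresponds to the six integer components used above. It uses Python's standard datetime module rather than RDFox, the helper name xsd_datetime is hypothetical, and the optional offset argument is left out because its unit is not stated in this section.

```python
from datetime import datetime


def xsd_datetime(year, month, day, hour, minute, second):
    """Return the xsd:dateTime lexical form for the given components."""
    # datetime validates the components, so an impossible date such as
    # 30 February raises ValueError.
    return datetime(year, month, day, hour, minute, second).isoformat()


start = xsd_datetime(2020, 4, 5, 9, 0, 0)
end = xsd_datetime(2022, 4, 4, 9, 0, 0)
print(start, end)
```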
4.2.1.3. TIME_ON_TIMELINE Function¶
Syntax:
TIME_ON_TIMELINE(date)
Description: The function takes as input a value of one of the date/time data types, and it returns a decimal number representing its time on timeline.
Example: Consider the previous example, where we added a start date to an employment. The following INSERT statement converts all employment start dates to a decimal number using the time on timeline specification. This decimal number is attached to the employment using a new property :hasStartDateOnTimeline.
INSERT {
    ?employment :hasStartDateOnTimeline ?dot
}
WHERE {
    ?employment :hasStartDate ?date .
    BIND(TIME_ON_TIMELINE(?date) as ?dot)
}
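To give a feel for the numbers involved, the sketch below computes a time on timeline in Python. It assumes the definition from XML Schema 1.1 Datatypes, in which the result is the number of seconds since 0001-01-01T00:00:00 in the proleptic Gregorian calendar and a value without a time zone is treated as UTC. Whether RDFox returns exactly these values is not confirmed here, and the helper name is hypothetical.

```python
from datetime import datetime
from decimal import Decimal


def time_on_timeline(dt):
    """Seconds from 0001-01-01T00:00:00 to dt (assumed XSD 1.1 definition)."""
    # toordinal() counts days in the proleptic Gregorian calendar, with
    # 0001-01-01 being day 1.
    days = dt.toordinal() - 1
    seconds = days * 86400 + dt.hour * 3600 + dt.minute * 60 + dt.second
    return Decimal(seconds) + Decimal(dt.microsecond) / 1_000_000


start = time_on_timeline(datetime(2020, 4, 5, 9, 0, 0))
end = time_on_timeline(datetime(2022, 4, 4, 9, 0, 0))
print(start, end, end - start)
```

A single number per date makes it straightforward to compare dates or compute durations with ordinary arithmetic.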
4.2.1.4. MINFN Function¶
Syntax:
MINFN(exp1, ..., expk)
Description: The function evaluates the expressions exp1 through expk and returns the smallest value.
Example: The following query returns, for each employment, the earlier of its start and end dates.
SELECT ?employment ?md
WHERE {
    ?employment :hasStartDate ?sd .
    ?employment :hasEndDate ?ed .
    BIND(MINFN(?sd, ?ed) as ?md)
}
4.2.1.5. MAXFN Function¶
Syntax:
MAXFN(exp1, ..., expk)
Description: The function evaluates the expressions exp1 through expk and returns the largest value.
Example: The following query returns, for each employment, the later of its start and end dates.
SELECT ?employment ?md
WHERE {
    ?employment :hasStartDate ?sd .
    ?employment :hasEndDate ?ed .
    BIND(MAXFN(?sd, ?ed) as ?md)
}
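MINFN and MAXFN compare the values of several expressions within one solution, which is different from the standard SPARQL aggregates MIN and MAX that range over all solutions in a group. The Python sketch below illustrates that distinction on made-up employment data; it is an analogy only and does not call RDFox.

```python
from datetime import datetime

# Hypothetical employments: resource -> (start date, end date).
EMPLOYMENTS = {
    ":employment1": (datetime(2020, 4, 5, 9), datetime(2022, 4, 4, 9)),
    ":employment2": (datetime(2021, 1, 1, 9), datetime(2021, 6, 30, 17)),
}

# Per-solution functions: one result per employment (like MINFN and MAXFN).
per_row = {e: (min(sd, ed), max(sd, ed)) for e, (sd, ed) in EMPLOYMENTS.items()}

# Aggregate: one result for the whole group (like the MIN aggregate).
earliest_start = min(sd for sd, _ in EMPLOYMENTS.values())

print(per_row)
print(earliest_start)
```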
4.3. Monitoring Query Execution¶
RDFox implements functionality for monitoring the execution of queries. In particular, users can gain access to query plans generated by the RDFox query optimizer as well as to useful statistics about the execution of such plans.
Suppose that we initialize a data store with the example data in our Getting Started guide. The following shell command provides access to the query plans produced by RDFox:
set query.explain true
Now, let’s issue the following SPARQL query against the store
SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :forename ?n. ?p :hasParent ?z }
which returns the following answers:
:meg "Meg" .
:stewie "Stewie" .
:chris "Chris" .
:meg "Meg" .
The shell now also displays the query plan that was actually executed.
QUERY ?p ?n QueryIterator
    PROJECT ?n ?p { --> ?n ?p }
        CONJUNCTION { --> ?n ?p ?z } NestedIndexLoopJoinIterator
            [?p, :hasParent, ?z] { --> ?p ?z } TripleTableIterator
            [?p, rdf:type, :Person] { ?p ?z --> ?p ?z } TripleTableIterator
            [?p, :forename, ?n] { ?p ?z --> ?n ?p ?z } TripleTableIterator
The query plan is executed top-down in a depth-first manner, and we can think of solution variable bindings as being generated one at a time. It is useful to walk through the execution of the plan in more detail for a given solution binding.
When we first “visit” the PROJECT block, we haven’t obtained any variable bindings yet (hence the empty space to the left of the “-->” symbol); in contrast, by the time we have finished executing the subplan underneath, we will have obtained a binding of variables ?n and ?p and hence an answer to the query (as reflected on the right-hand side of the “-->” symbol). Similarly, when we first visit the CONJUNCTION block, which performs the join of the query, we have an empty binding and, by the time we return from it, we will have a binding for ?n, ?p and ?z. The join is also performed top-down. First, we obtain a binding for ?p and ?z by matching the triple pattern [?p, :hasParent, ?z]. We then consider the second triple pattern [?p, rdf:type, :Person], which checks the existing binding of ?p without binding any new variable, and finally the third triple pattern [?p, :forename, ?n], which extends the binding by also providing a value for variable ?n.
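The walk-through above can be mimicked with Python generators, which also produce bindings one at a time in a depth-first manner. This is a sketch under simplifying assumptions, not RDFox's implementation: the data is a small made-up subset in the spirit of the Getting Started guide, the helper names are hypothetical, and each pattern is matched by scanning all triples where a real nested index loop join would use an index lookup.

```python
# Hypothetical triples (subject, predicate, object).
TRIPLES = {
    (":meg", "rdf:type", ":Person"), (":meg", ":forename", "Meg"),
    (":meg", ":hasParent", ":lois"), (":meg", ":hasParent", ":peter"),
    (":chris", "rdf:type", ":Person"), (":chris", ":forename", "Chris"),
    (":chris", ":hasParent", ":peter"),
    (":lois", "rdf:type", ":Person"), (":lois", ":forename", "Lois"),
}


def match(pattern, binding):
    """Yield every extension of binding that matches one triple pattern."""
    for triple in sorted(TRIPLES):
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def conjunction(patterns, binding):
    """Join the patterns top-down, extending one binding at a time."""
    if not patterns:
        yield binding
        return
    for extended in match(patterns[0], binding):
        yield from conjunction(patterns[1:], extended)


# Same pattern order as in the plan above.
PLAN = [("?p", ":hasParent", "?z"),
        ("?p", "rdf:type", ":Person"),
        ("?p", ":forename", "?n")]

# PROJECT ?n ?p: keep only the projected variables of each binding.
answers = [(b["?p"], b["?n"]) for b in conjunction(PLAN, {})]
print(answers)
```

In this toy data :meg has two parents, so she appears twice in the answers, once for each binding of ?z.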
Let us consider a slightly more complex query, which uses the OPTIONAL operator in SPARQL.
SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :hasParent ?z . OPTIONAL { ?p :forename ?n } }
RDFox will execute the following query plan:
QUERY ?p ?n QueryIterator
    PROJECT ?n ?p { --> ?p | ?n }
        OPTIONAL { --> ?p ?z | ?n } OptionalIterator
            CONJUNCTION { --> ?p ?z } NestedIndexLoopJoinIterator
                [?p, :hasParent, ?z] { --> ?p ?z } TripleTableIterator
                [?p, rdf:type, :Person] { ?p ?z --> ?p ?z } TripleTableIterator
            FILTER true
            [?p, :forename, ?n] { ?p ?z --> ?n ?p ?z } TripleTableIterator
The important difference to notice in this plan is the use of the “|” symbol. The variables on the left-hand side of “|” are always bound by the corresponding block, whereas those on the right-hand side may or may not be bound.
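The distinction between always-bound and possibly-bound variables can be seen in a small Python model of the OPTIONAL block. The data and the helper name are made up, and the model is simplified in that each person has at most one forename; it is not a description of RDFox's OptionalIterator.

```python
# Hypothetical results of the required part: bindings for (?p, ?z).
PARENTS = [(":meg", ":lois"), (":stewie", ":lois")]

# Hypothetical :forename triples; :stewie has none in this toy data.
FORENAMES = {":meg": "Meg"}


def optional_join(required, forenames):
    """Extend each required binding with ?n when a forename exists."""
    for p, z in required:
        binding = {"?p": p, "?z": z}       # always bound: left of "|"
        if p in forenames:
            binding["?n"] = forenames[p]   # possibly bound: right of "|"
        # The binding is produced whether or not the optional part matched.
        yield binding


answers = [(b["?p"], b.get("?n")) for b in optional_join(PARENTS, FORENAMES)]
print(answers)
```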