4. Querying RDFox¶
RDFox supports most SPARQL 1.1 Query Language features, as well as the most commonly needed parts of SPARQL 1.1 Update. It also implements a few proprietary built-in functions that are not part of SPARQL 1.1.
In this section, we describe in detail the support in RDFox for the SPARQL 1.1 standard specification, and we describe the built-in functions that are specific to RDFox. Finally, we describe the functionality implemented in RDFox for monitoring query execution.
4.1. SPARQL 1.1 Support¶
The SPARQL 1.1 specification provides a suite of languages and protocols for querying and manipulating RDF graph data.
4.1.1. Query Language¶
The core of the specification is the SPARQL 1.1 Query Language, which specifies the syntax and semantics of allowed queries. The SPARQL 1.1 query language extends the previous version of SPARQL with a number of features that are important for applications, including nested subqueries, aggregation, negation, creating values by expressions, named graphs, and property paths.
RDFox provides full support for the SPARQL 1.1 query language, with the sole exceptions of property paths, the BNODE function, and the non-normative DESCRIBE query form. The functionality provided by property paths is, however, already covered to a large extent by Datalog rules (as will be described in Section 5.4). The intended semantics of BNODE and DESCRIBE is, in our opinion, not sufficiently specified in the standard, which is why those features have not yet been implemented in RDFox.
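For instance, the reachability expressed by a transitive property path such as :hasParent+ can be recovered with recursive rules. The sketch below uses the rule syntax described in Section 5.4; the property :hasAncestor is introduced here purely for illustration.

[?x, :hasAncestor, ?y] :- [?x, :hasParent, ?y] .
[?x, :hasAncestor, ?z] :- [?x, :hasParent, ?y], [?y, :hasAncestor, ?z] .

Once these rules have been imported, the query SELECT ?x ?y WHERE { ?x :hasAncestor ?y } returns every pair of resources connected by a chain of one or more :hasParent triples.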
4.1.2. Query Answer Formats¶
Results of SELECT queries in SPARQL 1.1 are often represented in tabular form in applications. In order for query results to be easily exchanged in a machine-readable format, the SPARQL 1.1 specification describes four common exchange formats in three different documents: XML, JSON, CSV, and TSV. All of these formats are fully supported in RDFox (see Section 8.9.2 for further details).
4.1.3. Update Language¶
SPARQL 1.1 provides an update language for RDF graphs. In particular, the update language allows users to insert triples into a store, delete triples from a store, load an RDF graph into a store, clear an RDF graph in a store, create a new RDF graph in a store, drop an RDF graph from a store, copy (move, or add) the content of one store to another, and perform a group of update operations as a single action.
RDFox supports INSERT and DELETE operations, which can be used to add triples to or remove triples from the Graph Store based on bindings for a query pattern specified in a WHERE clause.
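For example, the following update deletes one set of triples and inserts another in a single operation, based on the bindings of its WHERE clause (the :age property used here is purely illustrative and not part of the Getting Started data):

DELETE { ?p :age ?old }
INSERT { ?p :age ?new }
WHERE {
?p :age ?old .
BIND(?old + 1 AS ?new)
}

The DELETE and INSERT templates are instantiated once for each binding of ?p and ?old, so each matching resource has its :age value incremented by one.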
4.2. RDFox Proprietary Extensions¶
RDFox implements a number of extensions to the SPARQL 1.1 standard that have proved useful in practical use cases. This section specifies each of these extensions and illustrates their use by means of examples.
4.2.1. Built-in Functions¶
RDFox provides a number of built-in functions which extend the function definitions in the SPARQL 1.1 query language.
4.2.1.1. The SKOLEM Function¶
The SKOLEM function creates a new resource based on the resources obtained from the evaluation of a given list of expressions. This function is useful, for instance, when trying to represent a k-ary relation in RDF.
Syntax:
SKOLEM(exp1, ..., expk)
Description: The function creates a new resource based on the resources evaluated from exp1 through expk, the first of which must be of data type http://www.w3.org/2001/XMLSchema#string.
Example: Consider the data in our Getting Started guide. Let us issue the following INSERT statements after loading the data.
INSERT { ?x :employs ?y } WHERE { ?x :forename "Lois" . ?y :forename "Peter" }
INSERT {
?employment :hasEmployer ?employer ;
:hasEmployee ?employee .
}
WHERE {
?employer :employs ?employee .
BIND(SKOLEM("employment", ?employer, ?employee) as ?employment)
}
In this example, the first INSERT statement adds the triple :lois :employs :peter to the data store. The SKOLEM function is used in the second INSERT statement to create a new “employment” resource for each triple with the predicate :employs. The new resource is then used in the subject position of two new triples connecting it to its employer and employee resources. In particular, the query
SELECT ?x ?y WHERE { ?x :hasEmployer ?y }
produces a single answer indicating that the newly created employment resource has :lois as employer. It would now be possible to add the start and end dates of each employment as direct relations of the new resource, where it would have been impossible to hold that information in relation to the original binary :employs relation.
4.2.1.2. The CONSTRAINT_VIOLATION Function¶
The CONSTRAINT_VIOLATION function is similar to the SKOLEM function in that it also creates a new resource based on a given list of expressions. Unlike the SKOLEM function, however, CONSTRAINT_VIOLATION does not require the first expression to evaluate to any particular type, and it produces shorter, more readable identifiers for use as constraint violation names. The cost of this improved readability is a greater chance of collision between distinct expressions, so it is advisable to reserve the CONSTRAINT_VIOLATION function for its intended purpose and to use SKOLEM in all other scenarios. See Section 7.2 for more information on Datalog constraints, including an example of how to use the CONSTRAINT_VIOLATION function.
Syntax:
CONSTRAINT_VIOLATION(exp1, ..., expk)
Description: The function creates a new resource based on the resources evaluated from exp1 through expk.
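As a minimal sketch of the function in a plain SPARQL query (the class :Employee and the property :hasEmployee are illustrative and not part of the Getting Started data), the following creates one violation resource for each employee that no employment refers to:

SELECT ?violation ?employee
WHERE {
?employee rdf:type :Employee .
FILTER NOT EXISTS { ?x :hasEmployee ?employee }
BIND(CONSTRAINT_VIOLATION(?employee) AS ?violation)
}

In practice, however, the function is intended to be used within Datalog constraints, as described in Section 7.2.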
4.2.1.3. The DATETIME Function¶
Syntax:
DATETIME(year, month, day, hour, minute, second)
DATETIME(year, month, day, hour, minute, second, offset)
Description: The function takes as input integer values for year, month, day, hour, minute, second, and optionally timezone offset, and transforms these into a resource of type xsd:dateTime.
Example: Consider the data in our previous example on the use of SKOLEM. Let us issue the following INSERT statement, which adds start and end dates to the employment instance in our previous example, where :lois was the employer.
INSERT {
?employment :hasStartDate ?sd .
?employment :hasEndDate ?ed
}
WHERE {
?employment :hasEmployer :lois .
BIND(DATETIME(2020,4,5,9,0,0) as ?sd) .
BIND(DATETIME(2022,4,4,9,0,0) as ?ed)
}
This statement attaches a start date of 9am April 5th 2020 and an end date of 9am April 4th 2022 to the employment, where the dates will be represented using xsd:dateTime.
4.2.1.4. TIME_ON_TIMELINE Function¶
Syntax:
TIME_ON_TIMELINE(date)
Description: The function takes as input a value of one of the date/time data types, and it returns a decimal number representing its time on the timeline.
Example: Consider the previous example, where we added a start date to an employment. The following INSERT statement converts all employment start dates to a decimal number using the time on the timeline specification. This decimal number is attached to the employment using a new property :hasStartDateOnTimeline.
INSERT {
?employment :hasStartDateOnTimeline ?dot
}
WHERE {
?employment :hasStartDate ?date .
BIND(TIME_ON_TIMELINE(?date) as ?dot)
}
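Because TIME_ON_TIMELINE returns a plain decimal, date/time values can be compared or subtracted directly. As a sketch, reusing the :hasStartDate and :hasEndDate properties from the earlier examples, the following query computes the duration of each employment as the difference of the two decimals (in seconds, per the XML Schema timeline specification):

SELECT ?employment ?duration
WHERE {
?employment :hasStartDate ?sd .
?employment :hasEndDate ?ed .
BIND(TIME_ON_TIMELINE(?ed) - TIME_ON_TIMELINE(?sd) AS ?duration)
}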
4.2.1.5. MINFN Function¶
Syntax:
MINFN(exp1, ..., expk)
Description: The function evaluates the expressions exp1 through expk and returns the smallest value.
Example: The following query returns, for each employment, the minimum between its start and end dates.
SELECT ?employment ?md
WHERE {
?employment :hasStartDate ?sd .
?employment :hasEndDate ?ed .
BIND(MINFN(?sd, ?ed) as ?md)
}
4.2.1.6. MAXFN Function¶
Syntax:
MAXFN(exp1, ..., expk)
Description: The function evaluates the expressions exp1 through expk and returns the largest value.
Example: The following query returns, for each employment, the maximum between its start and end dates.
SELECT ?employment ?md
WHERE {
?employment :hasStartDate ?sd .
?employment :hasEndDate ?ed .
BIND(MAXFN(?sd, ?ed) as ?md)
}
4.2.2. Querying Tuple Tables¶
RDFox organizes information in a data store using tuple tables, as described in more detail in How Information is Structured in RDFox. Briefly, tuple tables include named graphs, but can also represent data stored in external data sources. RDFox provides proprietary TT expressions to access data stored in tuple tables, as shown in the following example.
Example: Assume that a binary tuple table :EmployeeName is mounted from an external data source (e.g., a database), and that it contains pairs that relate employee IDs to their names. The following query retrieves all pairs whose ID is contained in the :Manager class.
SELECT ?id ?name
WHERE {
?id rdf:type :Manager .
TT :EmployeeName { ?id ?name }
}
In the above example, TT :EmployeeName { ?id ?name } retrieves all pairs of IDs and names stored in the :EmployeeName tuple table. Since the tuple table is binary, only two terms (i.e., ?id and ?name) are allowed to occur inside the TT expression. This is analogous to named graphs, where GRAPH :G { ?X ?Y ?Z } accesses all triples in a named graph :G. The differences to GRAPH expressions can be summarised as follows.
- The number of terms inside TT must match the arity (i.e., the number of positions) of the tuple table.
- Each TT expression represents exactly one reference to a tuple table. For example, to retrieve pairs of employee IDs with the same name, one can use TT :EmployeeName { ?id1 ?name } . TT :EmployeeName { ?id2 ?name } (whereas TT :EmployeeName { ?id1 ?name . ?id2 ?name } is syntactically invalid).
- Variables cannot be used in place of tuple table names. For example, TT ?T { ?id ?name } is syntactically invalid.
4.3. Monitoring Query Execution¶
RDFox implements functionality for monitoring the execution of queries. In particular, users can gain access to query plans generated by the RDFox query optimizer as well as to useful statistics about the execution of such plans.
Suppose that we initialize a data store with the example data in our Getting Started guide. The following shell command provides access to the query plans produced by RDFox:
set query.explain true
Now, let’s issue the following SPARQL query against the store
SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :forename ?n. ?p :hasParent ?z }
which returns the following answers:
:meg "Meg" .
:stewie "Stewie" .
:chris "Chris" .
:meg "Meg" .
The shell now also displays the query plan that was actually executed:
QUERY ?p ?n QueryIterator
PROJECT ?n ?p { --> ?n ?p }
CONJUNCTION { --> ?n ?p ?z } NestedIndexLoopJoinIterator
[?p, :hasParent, ?z] { --> ?p ?z } TripleTableIterator
[?p, rdf:type, :Person] { ?p ?z --> ?p ?z } TripleTableIterator
[?p, :forename, ?n] { ?p ?z --> ?n ?p ?z } TripleTableIterator
The query plan is executed top-down in a depth-first manner, and we can think of solution variable bindings as being generated one at a time. It is useful to go through the execution of the plan in more detail for a given solution binding.
When we first “visit” the PROJECT block, we have not yet obtained any variable bindings (hence the empty space on the left of the “-->” symbol); in contrast, by the time we have finished executing the subplan underneath, we will have obtained a binding of variables ?n and ?p and hence an answer to the query (as reflected on the right-hand side of the “-->” symbol). Similarly, when we first visit the CONJUNCTION block, which performs the join of the query, we have an empty binding and, by the time we return from it, we will have a binding for ?n, ?p and ?z. The join is also performed top-down. First, we obtain a binding for ?p and ?z by matching the triple pattern [?p, :hasParent, ?z]. We then consider the second triple pattern [?p, rdf:type, :Person] and finally the third triple pattern [?p, :forename, ?n], which extends the binding by providing also a value for variable ?n.
Let us consider a slightly more complex query, which uses the OPTIONAL operator in SPARQL.
SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :hasParent ?z . OPTIONAL { ?p :forename ?n } }
RDFox will execute the following query plan:
QUERY ?p ?n QueryIterator
PROJECT ?n ?p { --> ?p | ?n }
OPTIONAL { --> ?p ?z | ?n } OptionalIterator
CONJUNCTION { --> ?p ?z } NestedIndexLoopJoinIterator
[?p, :hasParent, ?z] { --> ?p ?z } TripleTableIterator
[?p, rdf:type, :Person] { ?p ?z --> ?p ?z } TripleTableIterator
FILTER true
[?p, :forename, ?n] { ?p ?z --> ?n ?p ?z } TripleTableIterator
The important difference to notice in this plan is the use of the “|” symbol. The variables on the left-hand side of “|” are always bound by the corresponding block, whereas those indicated on the right-hand side may or may not be returned.