5. Querying RDFox
RDFox supports most SPARQL 1.1 Query Language features, and it fully supports the SPARQL 1.1 Update Language. It also implements a few proprietary built-in functions that are not part of SPARQL 1.1.
In this section, we describe in detail the support in RDFox for the SPARQL 1.1 standard specification, and the built-in functions that are specific to RDFox.
Finally, we also describe the functionality implemented in RDFox for monitoring query execution.
5.1. SPARQL 1.1 Support
The SPARQL 1.1 specification provides a suite of languages and protocols for querying and manipulating RDF graph data.
5.1.1. Query Language
The core of the specification is the SPARQL 1.1 Query Language, which specifies the syntax and semantics of allowed queries. The SPARQL 1.1 query language extends the previous version of SPARQL with a number of important features for applications, including nested subqueries, aggregation, negation, creating values by expressions, named graphs, and property paths.
RDFox provides full support for the SPARQL 1.1 query language with the exception of the variant of the BNODE function that takes a single string argument and the non-normative DESCRIBE query form. The intended semantics of BNODE and DESCRIBE is, in our opinion, not sufficiently specified in the standard, which is why those features have not yet been implemented in RDFox.
Although RDFox supports SPARQL 1.1 property paths, in many situations it might be beneficial to use reasoning instead, as explained further in Section 6.5.3.
5.1.2. Query Answer Formats
Results of SELECT queries in SPARQL 1.1 are often represented in tabular form in applications. In order for query results to be easily exchanged in a machine-readable format, the SPARQL 1.1 specification describes four common exchange formats in three different documents: XML, JSON, CSV, and TSV. All of these formats are fully supported in RDFox (see Section 14.3.2 for further details).
5.1.3. Update Language
RDFox fully supports the SPARQL 1.1 update language. In particular, this language allows users to insert triples into a store, delete triples from a store, load an RDF graph into a store, clear an RDF graph in a store, create a new RDF graph in a store, drop an RDF graph from a store, copy, move, or add the content of one RDF graph to another, and perform a group of update operations as a single action.
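The graph-management operations ADD, COPY, and MOVE differ only in what they do to the source and target graphs. The following minimal Python sketch models a graph store as a dictionary from graph names to sets of triples and follows the SPARQL 1.1 Update definitions of these operations. It is an illustration, not RDFox's implementation. It ignores the SILENT keyword and the special treatment of the default graph, and the function and graph names are hypothetical.

```python
# A toy graph store: graph name -> set of (subject, predicate, object) triples.
GraphStore = dict

def add_graph(store: GraphStore, source: str, target: str) -> None:
    """ADD: insert all triples of source into target, keeping target's own triples."""
    if source == target:
        return  # the operation has no effect when source and target coincide
    store.setdefault(target, set()).update(store.get(source, set()))

def copy_graph(store: GraphStore, source: str, target: str) -> None:
    """COPY: replace the content of target with a copy of source."""
    if source == target:
        return
    store[target] = set(store.get(source, set()))

def move_graph(store: GraphStore, source: str, target: str) -> None:
    """MOVE: like COPY, but the source graph is dropped afterwards."""
    if source == target:
        return
    copy_graph(store, source, target)
    store.pop(source, None)
```

For example, starting from `{":g1": {t1}, ":g2": {t2}}`, ADD from `:g1` to `:g2` leaves `:g2` holding both triples. COPY leaves it holding only `t1`. MOVE does the same as COPY and additionally removes `:g1`.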
5.2. Built-In Functions
RDFox supports a wide range of built-in operators and functions that can be used during query answering and reasoning. Concretely, RDFox supports all SPARQL 1.1 Functions and Operators, with the exception of the variant of the BNODE function that takes a single string argument, whose semantics is unclear. In addition to that, RDFox also supports most of the XPath and XQuery Functions and Operators. Finally, RDFox also provides a number of proprietary functions that are useful in practice.
There is a large overlap between the functions and operators defined in SPARQL 1.1 and XPath and XQuery. As a result, most functions supported by RDFox can be accessed using their short names, as specified in SPARQL 1.1, as well as their IRI names, as specified in XPath and XQuery. RDFox also provides a short name for many of the XPath and XQuery functions that have no SPARQL equivalent.
RDFox often provides additional overloads for the functions and operators from the SPARQL and the XPath and XQuery specifications, which improves usability. For example, one can extract the year of an xsd:gYearMonth value, and one can compare any two xsd:duration values according to the partial order on durations. All such extensions are documented where the respective functions and operators are introduced.
A full list of the RDFox built-in functions and operators is given in the following sections. Functions will be presented with all their available names. If present, short names will precede the IRI names. If a function name is part of the SPARQL 1.1 specification or the XPath and XQuery specification, the name will be given as a link pointing to the respective part of the specification. We will assume the following prefix definitions.
All RDFox functions and operators can be used in SPARQL queries. Furthermore, all RDFox functions and operators can also be used in rules, with the exception of NOW, RAND, UUID, and STRUUID, whose values are not determined by their arguments.
5.2.1. Operators
The RDFox operators are listed in the following table and discussed in detail next.
The Boolean operators in RDFox are the logical not (!), logical and (&&), and logical or (||), which behave as defined in SPARQL 1.1.
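In SPARQL 1.1, an operand of these operators can also be an evaluation error, so the operators follow a three-valued logic. The Python sketch below encodes the truth tables given in the SPARQL 1.1 specification. It assumes that the operands have already been converted to their effective Boolean values. The `ERROR` sentinel and the function names are our own stand-ins, not RDFox names.

```python
# Sentinel standing in for a SPARQL evaluation error (e.g. an unbound variable).
ERROR = object()

def sparql_not(a):
    """Logical not (!): an error stays an error."""
    return ERROR if a is ERROR else not a

def sparql_or(a, b):
    """Logical or (||): true wins over an error; otherwise an error propagates."""
    if a is True or b is True:
        return True
    if a is ERROR or b is ERROR:
        return ERROR
    return False

def sparql_and(a, b):
    """Logical and (&&): false wins over an error; otherwise an error propagates."""
    if a is False or b is False:
        return False
    if a is ERROR or b is ERROR:
        return ERROR
    return True
```

In particular, `true || error` is true and `false && error` is false. However, `false || error` and `true && error` are both errors.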
The comparison operators in RDFox are <, <=, =, !=, >=, and >. These operators have been significantly extended when compared to the respective operators in SPARQL 1.1 and in XPath/XQuery. When compared to SPARQL, the comparison operators in RDFox have additional overloads for all the date/time and duration datatypes. When compared to XPath/XQuery, which also defines such overloads, RDFox has the following differences.
- Operators = and != can be used to compare IRIs, blank nodes, and literals regardless of their type. For example, it is possible to compare an IRI with a blank node, or to compare two literals of different datatypes.
- In RDFox, xsd:duration values are compared according to the partial order defined in the XML Schema specification. In contrast, in XPath/XQuery, the operators <, <=, >=, and > are only defined for the subtypes xsd:dayTimeDuration and xsd:yearMonthDuration.
- Similarly, in RDFox, date and time values are compared according to the partial order on dates and times defined in XML Schema. In contrast, XPath/XQuery imposes various restrictions on the allowed comparisons.
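XML Schema 1.1 represents a duration as a pair of a number of months and a number of seconds. It orders durations by adding them to four fixed reference dateTimes: one duration is smaller than another only if it is smaller at all four points. The Python sketch below implements that definition to show why some durations are incomparable. For example, P1M is neither smaller nor larger than P30D. It is a model of the XML Schema definition under these assumptions, not RDFox code.

```python
from datetime import datetime, timedelta

# The four reference dateTimes used by XML Schema 1.1 to order durations.
REFERENCE_POINTS = [
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
]

def add_to_reference(point, months, seconds):
    """Add a (months, seconds) duration to a reference point lying on day 1 of a month."""
    year, month_index = divmod(point.year * 12 + point.month - 1 + months, 12)
    # All reference points fall on day 1, so no day-of-month clamping is needed.
    return point.replace(year=year, month=month_index + 1) + timedelta(seconds=seconds)

def compare_durations(d1, d2):
    """Return '<', '=', '>' or None if the (months, seconds) durations are incomparable."""
    if d1 == d2:
        return "="
    outcomes = set()
    for point in REFERENCE_POINTS:
        a = add_to_reference(point, *d1)
        b = add_to_reference(point, *d2)
        outcomes.add("<" if a < b else ">" if a > b else "=")
    # A strict order holds only if the same strict outcome is seen at all four points.
    if outcomes == {"<"} or outcomes == {">"}:
        return outcomes.pop()
    return None
```

With `DAY = 86400`, `compare_durations((1, 0), (0, 27 * DAY))` gives `">"`, because every month has at least 28 days. In contrast, `compare_durations((1, 0), (0, 30 * DAY))` gives `None`.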
The mathematical operators in RDFox are the unary + and - operators, and the binary addition +, subtraction -, multiplication *, division /, integer division idiv, and modulo mod operators.
SPARQL 1.1 defines only a subset of the above operators and their overloads in comparison to XPath/XQuery (e.g., idiv and mod are not in SPARQL 1.1).
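In XPath/XQuery, idiv truncates the quotient toward zero, and the result of mod takes the sign of the dividend. Python's `//` and `%` operators behave differently because they round toward negative infinity. The sketch below follows the XPath definitions for integers. We have not checked RDFox's behavior on every edge case, so treat it as an illustration of the XPath semantics only.

```python
def idiv(a: int, b: int) -> int:
    """Integer division truncating toward zero, as in XPath's op:numeric-integer-divide."""
    if b == 0:
        raise ZeroDivisionError("integer division by zero")
    quotient = abs(a) // abs(b)
    # The result is negative exactly when the operands have different signs.
    return quotient if (a < 0) == (b < 0) else -quotient

def mod(a: int, b: int) -> int:
    """Remainder whose sign follows the dividend: a - (a idiv b) * b."""
    return a - idiv(a, b) * b
```

For example, `-7 idiv 2` is -3 and `-7 mod 2` is -1, whereas in Python `-7 // 2` is -4 and `-7 % 2` is 1.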
RDFox extends the XPath/XQuery behavior of these operators as outlined next.
- The unary + and - operators have been extended to the datatype xsd:duration.
- The binary subtraction operator - has been extended to all compatible date and time datatypes. The result of such an operation is an xsd:duration.
- The binary addition and subtraction operators have been extended so that durations can be added to and subtracted from values of any of the date and time datatypes.
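The following Python sketch illustrates this arithmetic. It uses the XML Schema approach of adding durations to dateTimes: the month part is added first, and the day is pinned to the length of the resulting month. The seconds part is added afterwards. Durations are modelled as (months, seconds) pairs. The function names are hypothetical, and RDFox's exact handling of corner cases is not described in this section.

```python
import calendar
from datetime import datetime, timedelta

def add_duration(value, months, seconds):
    """Add a (months, seconds) duration to a dateTime, pinning the day to the target month."""
    year, month_index = divmod(value.year * 12 + value.month - 1 + months, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    shifted = value.replace(year=year, month=month, day=min(value.day, last_day))
    return shifted + timedelta(seconds=seconds)

def subtract_duration(value, months, seconds):
    """Subtract a duration by adding its negation (unary minus on both components)."""
    return add_duration(value, -months, -seconds)

def subtract_datetimes(later, earlier):
    """Difference of two dateTimes as a (months, seconds) duration with zero months."""
    return 0, (later - earlier).total_seconds()
```

For instance, adding P1M to 2023-01-31 yields 2023-02-28, and in the leap year 2024 it yields 2024-02-29.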
5.2.2. Functions on Terms
The following table lists the RDF functions on terms. Most of the functions behave as specified in SPARQL 1.1. One difference is the addition of the two Boolean functions isInteger and isDecimal. The function isInteger returns true if the argument has one of the integer datatypes, while the isDecimal function returns true if its argument is of type xsd:decimal.
Another change concerns the functions IRI and URI, which take an optional second argument that specifies the base against which the first argument is resolved. If the base is not provided, the default base is used. Furthermore, the BOUND function has been extended to operate on arbitrary expressions. The function returns "true"^^xsd:boolean when the input expression can be successfully evaluated. In particular, when the expression is a variable, its evaluation succeeds when the variable is bound.
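The two extensions can be modelled in Python as follows. IRI resolution against a base is sketched with `urllib.parse.urljoin`, which implements the standard relative-reference resolution. The extended BOUND is modelled as "evaluation raises no error". `DEFAULT_BASE` is a hypothetical placeholder for RDFox's default base. Expressions are modelled as Python callables over a binding dictionary, which is purely an assumption of this sketch.

```python
from urllib.parse import urljoin

DEFAULT_BASE = "http://example.org/"  # hypothetical default base, for illustration only

def resolve_iri(reference, base=None):
    """Resolve a (possibly relative) IRI reference against base, or the default base."""
    return urljoin(base if base is not None else DEFAULT_BASE, reference)

def bound(expression, binding):
    """Extended BOUND: true if and only if the expression evaluates without an error."""
    try:
        expression(binding)
        return True
    except (KeyError, TypeError, ValueError, ZeroDivisionError):
        return False
```

For example, resolving `../x` against `http://example.com/a/b/c` yields `http://example.com/a/x`. An expression that divides by zero is reported as not bound.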
5.2.3. Constructor Functions
RDFox has a number of constructor functions that allow users to create a value of a particular type. The following table lists the supported XML Schema constructor functions, which include those defined in SPARQL 1.1 as well as further ones defined in XPath/XQuery.
xsd:anyURI
xsd:boolean
xsd:date
xsd:dateTime
xsd:dateTimeStamp
xsd:dayTimeDuration
xsd:decimal
xsd:double
xsd:duration
xsd:float
xsd:gDay
xsd:gMonth
xsd:gMonthDay
xsd:gYear
xsd:gYearMonth
xsd:integer
xsd:string
xsd:time
xsd:yearMonthDuration
RDFox additionally provides the following constructor functions for the date/time datatypes. The offset parameter in all functions is optional except in the case of DATE_TIME_STAMP.
DURATION ( year, month, day, hours, minutes, seconds ) | Constructs an xsd:duration value.
YEAR_MONTH_DURATION ( year, month ) | Constructs an xsd:yearMonthDuration value.
DAY_TIME_DURATION ( day, hours, minutes, seconds ) | Constructs an xsd:dayTimeDuration value.
DATE_TIME ( year, month, day, hour, minute, second, offset ) | Constructs an xsd:dateTime value.
DATE_TIME_STAMP ( year, month, day, hour, minute, second, offset ) | Constructs an xsd:dateTimeStamp value.
TIME ( hour, minute, second, offset ) | Constructs an xsd:time value.
DATE ( year, month, day, offset ) | Constructs an xsd:date value.
G_DAY ( day, offset ) | Constructs an xsd:gDay value.
G_MONTH ( month, offset ) | Constructs an xsd:gMonth value.
G_MONTH_DAY ( month, day, offset ) | Constructs an xsd:gMonthDay value.
G_YEAR_MONTH ( year, month, offset ) | Constructs an xsd:gYearMonth value.
G_YEAR ( year, offset ) | Constructs an xsd:gYear value.
5.2.4. IRI and String Functions
The IRI and string functions in SPARQL 1.1 and XPath/XQuery almost fully overlap, and RDFox provides full support for them, with the exception of fn:contains, for which only the two-argument variant is supported.
ESCAPE_HTML_URI ( fn:escape-html-uri )
IRI_TO_URI ( fn:iri-to-uri )
CONTAINS ( fn:contains )
STRENDS ( fn:ends-with )
LCASE ( fn:lower-case )
SUBSTR ( fn:substring )
UCASE ( fn:upper-case )
REGEX ( fn:matches )
REPLACE ( fn:replace )
5.2.5. Hash Functions
The RDFox hash functions are the ones specified in SPARQL 1.1, namely MD5, SHA1, SHA256, SHA384, and SHA512.
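SPARQL 1.1 defines these functions as returning the checksum of the UTF-8 encoding of a string, written as a hex digit string. The following Python sketch computes the same values with the standard `hashlib` module. This is useful, for example, for checking query results outside the store. The function name `sparql_hash` is hypothetical.

```python
import hashlib

# Map from the SPARQL function name to the hashlib algorithm name.
ALGORITHMS = {"MD5": "md5", "SHA1": "sha1", "SHA256": "sha256", "SHA384": "sha384", "SHA512": "sha512"}

def sparql_hash(function: str, text: str) -> str:
    """Lowercase hex digest of the UTF-8 encoding of text, as MD5/SHA* return in SPARQL."""
    return hashlib.new(ALGORITHMS[function], text.encode("utf-8")).hexdigest()
```

For example, `sparql_hash("MD5", "abc")` returns `900150983cd24fb0d6963f7d28e17f72`.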
5.2.6. Mathematical Functions
RDFox supports all mathematical functions from SPARQL 1.1 and most of the mathematical functions in XPath/XQuery. RDFox also provides additional functions that are useful in practice. The nonstandard functions that need explanation are MAXFN and MINFN, which take any number of arguments and return the maximum and the minimum value, respectively.
MINFN
MAXFN
CEIL ( fn:ceiling )
PI ( math:pi )
POW ( math:pow )
SQRT ( math:sqrt )
LOG ( math:log )
LOG10 ( math:log10 )
LOG2
EXP ( math:exp )
EXP10 ( math:exp10 )
EXP2
SIN ( math:sin )
COS ( math:cos )
ASIN ( math:asin )
ACOS ( math:acos )
TAN ( math:tan )
ATAN ( math:atan )
ATAN2 ( math:atan2 )
SINH
COSH
TANH
ASINH
ACOSH
ATANH
5.2.7. Date and Time Functions
This section describes the RDFox functions on dates, times and durations. RDFox supports all SPARQL 1.1 date time functions and most XPath/XQuery date time functions. In RDFox, many of these functions have been extended to apply to all date/time values.
NOW | Returns an xsd:dateTime value representing the current moment in time.
YEAR | Returns the year component of a date/time value. Extends fn:year-from-dateTime and fn:year-from-date to all date/time datatypes with a valid year component.
MONTH | Returns the month component of a date/time value. Extends fn:month-from-dateTime and fn:month-from-date to date/time datatypes with a valid month component.
DAY | Returns the day component of a date/time value. Extends fn:day-from-dateTime and fn:day-from-date to date/time datatypes with a valid day component.
YEARS | Returns the number of years in a duration: fn:years-from-duration.
MONTHS | Returns the number of months in a duration: fn:months-from-duration.
DAYS | Returns the number of days in a duration: fn:days-from-duration.
HOURS | Returns the hours component of a date/time value. Extends fn:hours-from-dateTime, fn:hours-from-time, and fn:hours-from-duration to date/time datatypes with a valid hours component.
MINUTES | Returns the minutes component of a date/time value. Extends fn:minutes-from-dateTime, fn:minutes-from-time, and fn:minutes-from-duration to date/time datatypes with a valid minutes component.
SECONDS | Returns the seconds component of a date/time value. Extends fn:seconds-from-dateTime, fn:seconds-from-time, and fn:seconds-from-duration to date/time datatypes with a valid seconds component.
TIMEZONE | Returns the timezone component of a date/time value. Extends fn:timezone-from-dateTime, fn:timezone-from-date, and fn:timezone-from-time to date/time datatypes with a valid timezone component.
TZ | Returns the timezone component of a date/time value as a simple literal. Extended to date/time values with a valid timezone component.
DURATION_MONTHS | Returns the number of months in the internal representation of a duration.
DURATION_SECONDS | Returns the number of seconds in the internal representation of a duration.
TIME_ON_TIMELINE | Returns the decimal number representing a date/time value on the timeline. Note that this is the number of seconds elapsed since 0001-01-01T00:00:00, in accordance with the W3C XML Schema Definition.
TO_TIMEZONE | Adjusts the time zone of a date/time value. An extension of fn:adjust-dateTime-to-timezone, fn:adjust-date-to-timezone, and fn:adjust-time-to-timezone to all date/time datatypes. The function takes two arguments: a mandatory date/time value and an optional timezone value. If the timezone value is provided, the function returns a date/time value in the specified timezone that is equivalent to the first argument. If the timezone is not provided, the function returns the local value of its first argument with no timezone.
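The arithmetic behind TIME_ON_TIMELINE, DURATION_MONTHS/DURATION_SECONDS, and TO_TIMEZONE can be sketched in Python. The sketch makes three assumptions:
- The internal representation of a duration is the (months, seconds) pair used by XML Schema 1.1.
- Values with a timezone are normalized to UTC before being placed on the timeline.
- TO_TIMEZONE follows fn:adjust-dateTime-to-timezone.

The function names are hypothetical and the code is not RDFox's implementation.

```python
from datetime import datetime, timedelta, timezone

TIMELINE_ORIGIN = datetime(1, 1, 1)  # 0001-01-01T00:00:00, proleptic Gregorian calendar

def time_on_timeline(value: datetime) -> float:
    """Seconds elapsed since 0001-01-01T00:00:00 (values with a timezone are taken in UTC)."""
    if value.tzinfo is not None:
        value = value.astimezone(timezone.utc).replace(tzinfo=None)
    return (value - TIMELINE_ORIGIN).total_seconds()

def duration_components(years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Split a duration into (months, seconds), the values DURATION_MONTHS/SECONDS are assumed to return."""
    total_months = years * 12 + months
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_months, total_seconds

def to_timezone(value: datetime, offset=None) -> datetime:
    """Model of TO_TIMEZONE: convert to offset, or drop the timezone if no offset is given."""
    if offset is None:
        return value.replace(tzinfo=None)  # keep the local value, remove the timezone
    target = timezone(offset)
    if value.tzinfo is None:
        return value.replace(tzinfo=target)  # a value without timezone gets the timezone attached
    return value.astimezone(target)  # same instant, expressed with the new offset
```

For instance, 1970-01-01T00:00:00 lies 62135596800 seconds after the origin, and P1Y2M3DT4H5M6S splits into 14 months and 273906 seconds.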
5.3. Aggregate Functions
RDFox supports all aggregate functions of SPARQL 1.1, plus a custom MUL aggregate function that is analogous to SUM but multiplies (rather than adds) its arguments.
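The relationship between SUM and MUL can be illustrated with a small Python model of grouped aggregation. By analogy with SUM, we assume that a query of the form `SELECT ?g (MUL(?v) AS ?product) WHERE { ... } GROUP BY ?g` computes the "mul" column below. That syntax and the row data are assumptions for illustration. Note also that `math.prod` returns 1 for an empty group, and this section does not state what MUL returns in that case.

```python
import math
from collections import defaultdict

def group_aggregate(rows, group_key, value_key):
    """Group rows by group_key and compute SUM and a MUL-style product of value_key per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        key: {"sum": sum(values), "mul": math.prod(values)}
        for key, values in groups.items()
    }

# Hypothetical solution rows, keyed by variable name.
ROWS = [{"?g": ":a", "?v": 2}, {"?g": ":a", "?v": 3}, {"?g": ":b", "?v": 5}]
```

Here group `:a` has sum 5 and product 6, while group `:b`, with a single value, has sum and product 5.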
5.4. Querying Tuple Tables¶
RDFox organizes information in a data store using tuple tables, as described in
more detail in Section 4. Briefly, tuple tables include named graphs,
but can also represent data stored in external data sources. RDFox provides two
ways of referring to tuple tables in queries: one uses the proprietary operator
TT
and the other uses the reserved IRI rdfox:TT
.
Querying Tuple Tables Using TT Expressions
RDFox provides a proprietary TT operator to access data stored in tuple tables, as shown in the following example.
Example: Assume that a binary tuple table :EmployeeName is mounted from an external data source (e.g., a database), and that it contains pairs that relate employee IDs to their names. The following query retrieves all pairs whose ID is contained in the :Manager class.
SELECT ?id ?name
WHERE {
    ?id rdf:type :Manager .
    TT :EmployeeName { ?id ?name }
}
In the above example, TT :EmployeeName { ?id ?name } retrieves all pairs of IDs and names stored in the :EmployeeName tuple table. Since the tuple table is binary, only two terms (i.e., ?id and ?name) are allowed to occur inside the TT expression. This is analogous to named graphs, where GRAPH :G { ?X ?Y ?Z } accesses all triples in a named graph :G. The differences from GRAPH expressions can be summarized as follows.
- The number of terms inside TT must match the arity (i.e., the number of positions) of the tuple table.
- Each TT expression represents exactly one reference to a tuple table. For example, to retrieve pairs of employee IDs with the same name, one can use TT :EmployeeName { ?id1 ?name } . TT :EmployeeName { ?id2 ?name } (whereas TT :EmployeeName { ?id1 ?name . ?id2 ?name } is syntactically invalid).
- Variables cannot be used in place of tuple table names. For example, TT ?T { ?id ?name } is syntactically invalid.
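A TT expression can be thought of as matching a fixed-length pattern against every tuple of one table, so a self-join needs two separate references. The following Python sketch models a tuple table as a list of tuples and evaluates the same-name query from the list above as two references. The table contents and function names are hypothetical. The sketch illustrates the semantics only, not how RDFox accesses external data sources.

```python
def match_tt(table, arity, pattern, binding):
    """Evaluate one TT reference: match pattern against every tuple of the table."""
    if len(pattern) != arity:
        raise ValueError(f"pattern has {len(pattern)} terms, table arity is {arity}")
    results = []
    for row in table:
        extended = dict(binding)
        for term, value in zip(pattern, row):
            if term.startswith("?"):
                if extended.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant does not match this position
        else:
            results.append(extended)
    return results

# Hypothetical content of the binary tuple table :EmployeeName.
EMPLOYEE_NAME = [(":e1", "Alice"), (":e2", "Bob"), (":e3", "Alice")]

def same_name_pairs(table):
    """TT :EmployeeName { ?id1 ?name } . TT :EmployeeName { ?id2 ?name }"""
    pairs = set()
    for b1 in match_tt(table, 2, ["?id1", "?name"], {}):
        for b2 in match_tt(table, 2, ["?id2", "?name"], b1):
            pairs.add((b2["?id1"], b2["?id2"]))
    return pairs
```

As with the SPARQL query, the result also contains the trivial pairs in which an ID is paired with itself.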
Querying Tuple Tables Using rdfox:TT
The use of the proprietary operator TT is a syntactic extension of the SPARQL language and may result in queries being rejected by third-party libraries. To address this, RDFox provides an additional method for accessing tuple tables, which stays within the syntactic rules of the SPARQL language. This method specifies tuple table atoms using the reserved IRI rdfox:TT and RDF Collections, as shown in the following example.
Example: Consider again the tuple table :EmployeeName. We can retrieve all pairs whose ID is contained in the :Manager class using the following query.
SELECT ?id ?name
WHERE {
    ?id rdf:type :Manager .
    rdfox:TT :EmployeeName (?id ?name)
}
The reserved IRI rdfox:TT links the tuple table name :EmployeeName with the tuple table arguments encoded as an RDF Collection (i.e., (?id ?name)). The query can also be given in its equivalent expanded form.
SELECT ?id ?name
WHERE {
    ?id rdf:type :Manager .
    rdfox:TT :EmployeeName _:b0 .
    _:b0 rdf:first ?id ;
         rdf:rest _:b1 .
    _:b1 rdf:first ?name ;
         rdf:rest rdf:nil .
}
In its expanded form, a tuple table atom is encoded using multiple triple patterns and, as a result, its position in the query body may be ambiguous. To avoid ambiguity, RDFox determines the position of the tuple table atom using the position of the triple pattern with subject rdfox:TT. In the above example, the position of the tuple table atom is determined by the position of the triple pattern rdfox:TT :EmployeeName _:b0 .
The following restrictions apply to the use of rdfox:TT.
- The object of the triple pattern with subject rdfox:TT should be a well-formed RDF collection.
- The RDF collection that encodes the tuple table arguments should be of size equal to the arity of the tuple table.
- Variables cannot be used in place of tuple table names. For example, the use of rdfox:TT ?T (?id ?name) is invalid.
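The expansion of `rdfox:TT :EmployeeName (?id ?name)` into rdf:first/rdf:rest triples, and the checks listed above, can be sketched in Python as follows. Triples are modelled as string 3-tuples. For simplicity, the decoder handles a single tuple table atom. The function names are hypothetical, and RDFox's own parser is not shown here.

```python
NIL = "rdf:nil"

def encode_tt(table, args, fresh_labels):
    """Expand rdfox:TT table (args...) into triple patterns encoding an RDF collection."""
    nodes = [next(fresh_labels) for _ in args]
    triples = [("rdfox:TT", table, nodes[0] if nodes else NIL)]
    for i, (node, arg) in enumerate(zip(nodes, args)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else NIL
        triples.append((node, "rdf:first", arg))
        triples.append((node, "rdf:rest", rest))
    return triples

def decode_tt(triples, arity):
    """Recover (table, args) from the triples and enforce the restrictions on rdfox:TT."""
    heads = [t for t in triples if t[0] == "rdfox:TT"]
    if len(heads) != 1:
        raise ValueError("expected exactly one triple pattern with subject rdfox:TT")
    _, table, node = heads[0]
    if table.startswith("?"):
        raise ValueError("variables cannot be used as tuple table names")
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    args, visited = [], set()
    while node != NIL:
        # Every list node needs rdf:first and rdf:rest, and the list must not be cyclic.
        if node in visited or node not in first or node not in rest:
            raise ValueError("object of rdfox:TT is not a well-formed RDF collection")
        visited.add(node)
        args.append(first[node])
        node = rest[node]
    if len(args) != arity:
        raise ValueError(f"collection has {len(args)} elements, table arity is {arity}")
    return table, args
```

Encoding `(?id ?name)` with the labels `_:b0` and `_:b1` reproduces the five triple patterns of the expanded query above.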
5.5. Distinguishing Explicit and Derived Facts in Queries
As described in Section 6, RDFox supports reasoning, by which it can use Datalog rules to derive additional facts from the facts explicitly given in the input. Sometimes, it can be useful to distinguish in the query whether a fact has been explicitly given in the input, or whether it has been derived by a rule. To facilitate this, RDFox provides a proprietary extension of the SPARQL 1.1 syntax. The following example demonstrates this.
Example: Assume that the following triples are loaded into the data store.
:brian rdf:type :LivingThing .
:peter rdf:type :Person .
If the data store contains the rule :LivingThing[?X] :- :Person[?X] ., then the following triple is going to be derived.
:peter rdf:type :LivingThing .
The following query retrieves all facts of the form x rdf:type :LivingThing, together with a Boolean flag which is set to true if the fact is explicitly given in the input.
SELECT ?id ?e
WHERE {
    ?id rdf:type :LivingThing EXPLICIT ?e
}
On our example data, this query returns one answer where ?id is mapped to :brian and ?e is mapped to true, and another answer where ?id is mapped to :peter and ?e is mapped to false.
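The meaning of the EXPLICIT flag in this example can be modelled in Python. The model keeps the explicit facts separately from the materialized facts and reports, for each answer, whether the matched fact is among the explicit ones. For this data, a single application of the rule suffices, whereas RDFox materializes rules to a fixpoint. The code is a sketch of the semantics, not of RDFox's bookkeeping.

```python
# The two explicit facts from the example above.
EXPLICIT_FACTS = {
    (":brian", "rdf:type", ":LivingThing"),
    (":peter", "rdf:type", ":Person"),
}

def materialize(explicit):
    """Apply the rule :LivingThing[?X] :- :Person[?X] . once (enough for this data)."""
    derived = {(s, "rdf:type", ":LivingThing")
               for s, p, o in explicit if p == "rdf:type" and o == ":Person"}
    return explicit | derived

def query_explicit(explicit, all_facts, cls):
    """Answers of  ?id rdf:type cls EXPLICIT ?e  as (id, e) pairs."""
    return {(s, (s, p, o) in explicit)
            for s, p, o in all_facts if p == "rdf:type" and o == cls}
```

Evaluated on the materialized facts, the query yields `(":brian", True)` and `(":peter", False)`, matching the answers described above.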
The TT pattern provides an analogous extension. Moreover, variables bound by EXPLICIT can be used in the rest of the query like any other variable; for example, they can be used in other triple patterns to facilitate joins.
Example: The query for employees from Section 5.4 can be modified as follows to compute the join between managers and their names, while requiring that both facts are explicit or that both are derived.
SELECT ?id ?name
WHERE {
    ?id rdf:type :Manager EXPLICIT ?e .
    TT :EmployeeName { ?id ?name EXPLICIT ?e }
}
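Because ?e occurs in both patterns, the query joins on the explicitness flag as well as on ?id. The following Python sketch shows the effect on hypothetical facts, each tagged with its flag. A manager fact and a name tuple are combined only if both are explicit or both are derived.

```python
# Hypothetical facts tagged with their EXPLICIT flag (True = explicit, False = derived).
MANAGERS = {(":e1", True), (":e2", False), (":e3", True), (":e4", False)}
EMPLOYEE_NAMES = {(":e1", "Alice", True), (":e2", "Bob", True),
                  (":e3", "Carol", False), (":e4", "Dan", False)}

def managers_with_names(managers, names):
    """Join on both ?id and ?e, as the shared variables in the query require."""
    return {(m_id, name)
            for m_id, m_e in managers
            for n_id, name, n_e in names
            if m_id == n_id and m_e == n_e}
```

Here `:e1` (both explicit) and `:e4` (both derived) are returned, while `:e2` and `:e3` are filtered out because their flags differ.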
5.6. Monitoring Query Evaluation
RDFox provides different ways of analyzing query evaluation. In particular, users can gain access to query plans generated by the RDFox query optimizer as well as to useful statistics about the execution of such plans.
Suppose that we initialize a data store with the example data from our Getting Started guide. Furthermore, consider the SPARQL query
SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :forename ?n. ?p :hasParent ?z }
which returns the following four answers:
:meg "Meg" .
:stewie "Stewie" .
:chris "Chris" .
:meg "Meg" .
Next, we discuss two ways of analyzing how RDFox evaluates the above query, using query plans and query profiling, respectively.
5.6.1. Query Plans
Query plans provide insights into the order in which RDFox evaluates the different parts of the query. They are determined by the query optimizer in RDFox based on the shape of the query and the statistical properties of the data present in the store at the time of planning. To see the query plan that RDFox uses to evaluate a given query, one needs to set the shell variable query.explain to true, as shown next.
set query.explain true
The next time we evaluate the above query, the shell will also display the following query plan.
QUERY ?p ?n QueryIterator
    PROJECT ?n ?p { --> ?n ?p }
        CONJUNCTION { --> ?n ?p ?z } NestedIndexLoopJoinIterator
            [?p, :hasParent, ?z] { --> ?p ?z } TripleTableIterator
            [?p, rdf:type, :Person] { ?p ?z --> ?p ?z } TripleTableIterator
            [?p, :forename, ?n] { ?p ?z --> ?n ?p ?z } TripleTableIterator
The query plan is executed top-down in a depth-first manner, and we can think of solution variable bindings as being generated one at a time. It is useful to walk through the execution of the plan for a single solution binding in more detail.
When we first “visit” the PROJECT block, we have not obtained any variable bindings yet (hence the empty space to the left of the “-->” symbol); in contrast, by the time we have finished executing the subplan underneath, we will have obtained a binding of variables ?n and ?p and hence an answer to the query (as reflected on the right-hand side of the “-->” symbol).
Similarly, when we first visit the CONJUNCTION block, which performs the join of the query, we have an empty binding and, by the time we return from it, we will have a binding for ?n, ?p, and ?z. The join is also performed top-down. First, we obtain a binding for ?p and ?z by matching the triple pattern [?p, :hasParent, ?z]. We then consider the second triple pattern [?p, rdf:type, :Person], which, since ?p is already bound, only checks the current binding and binds no new variables (as its annotation { ?p ?z --> ?p ?z } shows). Finally, the third triple pattern [?p, :forename, ?n] extends the binding by also providing a value for variable ?n.
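The depth-first, one-binding-at-a-time execution of this plan can be mimicked with a Python generator. Each loop level below corresponds to one triple pattern of the CONJUNCTION, in plan order. The data are hypothetical and were chosen only to reproduce the four answers listed earlier; they are not the actual Getting Started data set.

```python
# Hypothetical data chosen to reproduce the four answers shown above.
HAS_PARENT = [(":meg", ":lois"), (":stewie", ":lois"), (":chris", ":peter"), (":meg", ":peter")]
PERSONS = {":peter", ":lois", ":meg", ":chris", ":stewie"}
FORENAMES = {":peter": ["Peter"], ":lois": ["Lois"], ":meg": ["Meg"],
             ":chris": ["Chris"], ":stewie": ["Stewie"]}

def evaluate_plan():
    """Depth-first evaluation of the plan above, yielding one projected answer at a time."""
    for p, z in HAS_PARENT:                      # [?p, :hasParent, ?z]    { --> ?p ?z }
        binding = {"?p": p, "?z": z}
        if binding["?p"] not in PERSONS:         # [?p, rdf:type, :Person] { ?p ?z --> ?p ?z }
            continue                             # ?p is bound, so this only checks the binding
        for n in FORENAMES.get(p, []):           # [?p, :forename, ?n]     { ?p ?z --> ?n ?p ?z }
            binding["?n"] = n
            yield binding["?p"], binding["?n"]   # PROJECT ?n ?p
```

Because `evaluate_plan` is a generator, the first answer is available before the remaining triples have been scanned. This mirrors the idea of generating bindings one at a time.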
Let us consider a slightly more complex query, which uses the OPTIONAL operator in SPARQL.
SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :hasParent ?z . OPTIONAL { ?p :forename ?n } }
RDFox will execute the following query plan:
QUERY ?p ?n QueryIterator
    PROJECT ?n ?p { --> ?p | ?n }
        OPTIONAL { --> ?p ?z | ?n } OptionalIterator
            CONJUNCTION { --> ?p ?z } NestedIndexLoopJoinIterator
                [?p, :hasParent, ?z] { --> ?p ?z } TripleTableIterator
                [?p, rdf:type, :Person] { ?p ?z --> ?p ?z } TripleTableIterator
            FILTER true
                [?p, :forename, ?n] { ?p ?z --> ?n ?p ?z } TripleTableIterator
The important difference to notice in this plan is the use of the “|” symbol. The variables on the left-hand side of “|” are always bound by the corresponding block, whereas those on the right-hand side may or may not be bound.
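The behavior behind the “|” notation is that of a left join. Each binding of the mandatory part is extended by the optional part where possible. Otherwise it is kept with the optional variables left unbound. The following Python sketch shows this on hypothetical bindings. It ignores the filter condition (here FILTER true) and the compatibility checks that SPARQL's left join also performs.

```python
def optional(left_bindings, lookup):
    """Left join: extend each binding if possible, otherwise keep it with ?n unbound."""
    for binding in left_bindings:
        extensions = lookup(binding)
        if not extensions:
            yield binding  # ?p and ?z bound, ?n unbound
        for extension in extensions:
            yield {**binding, **extension}

# Hypothetical data: only :meg has a forename.
FORENAMES = {":meg": ["Meg"]}
LEFT = [{"?p": ":meg", "?z": ":lois"}, {"?p": ":x", "?z": ":lois"}]

def forename_lookup(binding):
    """Optional part [?p, :forename, ?n] evaluated for an already bound ?p."""
    return [{"?n": n} for n in FORENAMES.get(binding["?p"], [])]
```

Both input bindings survive: the first gains a value for ?n, and the second keeps ?n unbound. In the plan, this appears as `?p ?z | ?n`.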
5.6.2. Query Profiling
In addition to the order of plan nodes given by a query plan, users can also obtain runtime information about the number of operations performed on each plan node. This information can be useful when debugging performance issues in query evaluation.
To obtain such runtime information, one needs to enable the query profiler using the following command.
set query.monitor profile
When evaluating our original query
SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :forename ?n. ?p :hasParent ?z }
with query profiler enabled, RDFox will print the following output.
== QUERY EVALUATION STATISTICS ==
Statistics after 0 second(s)
0 / 0 QUERY ?p ?n QueryIterator
0 / 0     PROJECT ?n ?p { --> ?n ?p }
1 / 4         CONJUNCTION { --> ?n ?p ?z } NestedIndexLoopJoinIterator
1 / 4             [?p, :hasParent, ?z] { --> ?p ?z } TripleTableIterator
4 / 4             [?p, rdf:type, :Person] { ?p ?z --> ?p ?z } TripleTableIterator
4 / 4             [?p, :forename, ?n] { ?p ?z --> ?n ?p ?z } TripleTableIterator
=================================
The output is an enriched version of the query plan with information about the number of times the iterator of each plan node was accessed, i.e., was opened and advanced. For example, the iterator for [?p, :hasParent, ?z] was opened once (successfully) and advanced four times (three times successfully and once unsuccessfully), signified as 1 / 4. For each of the four successful operations on the iterator of [?p, :hasParent, ?z], the iterator for [?p, rdf:type, :Person] is opened once (each time successfully) and advanced once (each time unsuccessfully), signified as 4 / 4. The statistics for the other plan nodes are determined in the same way, with the exception of the top-level plan nodes (here QUERY and PROJECT, shown as 0 / 0). The statistics for those plan nodes are unavailable due to their special handling, but they can be inferred from the number of query results: the iterators of these plan nodes are opened once and advanced as many times as the number of query results.
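The counting convention described above can be reproduced with a small Python model of a nested-loop join. In the model, an "open" positions a scan on its first match, and every later request counts as an "advance", including the final one that finds no further match. The scan class and the data are hypothetical. The data were chosen so that the counts match the profiler output above; they are not the Getting Started data.

```python
# Hypothetical data reproducing the counts shown above.
HAS_PARENT = [(":meg", ":lois"), (":stewie", ":lois"), (":chris", ":peter"), (":meg", ":peter")]
PERSONS = {":peter", ":lois", ":meg", ":chris", ":stewie"}
FORENAMES = {":meg": ["Meg"], ":chris": ["Chris"], ":stewie": ["Stewie"]}

class CountingScan:
    """One triple-pattern scan that counts how often it is opened and advanced."""

    def __init__(self, lookup):
        self.lookup = lookup  # binding -> list of extended bindings
        self.opens = 0
        self.advances = 0

    def run(self, binding):
        matches = self.lookup(binding)
        self.opens += 1  # open: try to position on the first match
        if not matches:
            return
        yield matches[0]
        for match in matches[1:]:
            self.advances += 1  # successful advance
            yield match
        self.advances += 1  # final advance that finds no further match

def join(scans, binding, depth=0):
    """Nested-loop join: each scan runs once per binding produced by the scans before it."""
    if depth == len(scans):
        yield binding
        return
    for extended in scans[depth].run(binding):
        yield from join(scans, extended, depth + 1)

has_parent = CountingScan(lambda b: [{**b, "?p": p, "?z": z} for p, z in HAS_PARENT])
is_person = CountingScan(lambda b: [b] if b["?p"] in PERSONS else [])
forename = CountingScan(lambda b: [{**b, "?n": n} for n in FORENAMES.get(b["?p"], [])])
answers = list(join([has_parent, is_person, forename], {}))
```

After evaluation, the counters are 1 / 4 for `has_parent` and 4 / 4 for both `is_person` and `forename`, as in the profiler output.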
By default, the query profiler prints statistics only once, at the end of query evaluation. Alternatively, it can be configured to print statistics at a given frequency using the shell variable log-frequency. This can be useful when fixing problematic queries with long evaluation times.