Warning: This document is for an old version of RDFox.

4. Querying RDFox

RDFox supports most SPARQL 1.1 Query Language features, as well as the most commonly needed parts of SPARQL 1.1 Update. It also implements a few proprietary built-in functions that are not part of SPARQL 1.1.

In this section, we describe in detail the support in RDFox for the SPARQL 1.1 standard specification, and we describe the built-in functions that are specific to RDFox.

Finally, we also describe the functionality implemented in RDFox for monitoring query execution.

4.1. SPARQL 1.1 Support

The SPARQL 1.1 specification provides a suite of languages and protocols for querying and manipulating RDF graph data.

4.1.1. Query Language

The core of the specification is the SPARQL 1.1 Query Language, which specifies the syntax and semantics of allowed queries. The SPARQL 1.1 query language extends the previous version of SPARQL with a number of important features for applications, including nested subqueries, aggregation, negation, creating values by expressions, named graphs, and property paths.

RDFox fully supports the SPARQL 1.1 query language, with the exception of property paths, the BNODE function, and the non-normative DESCRIBE query form. The functionality provided by property paths is, however, already covered to a large extent by Datalog rules (as will be described in Section 5.4). The intended semantics of BNODE and DESCRIBE is, in our opinion, not sufficiently specified in the standard, which is why those features have not yet been implemented in RDFox.

4.1.1.1. Answer Formats

Results of SELECT queries in SPARQL 1.1 are often represented in tabular form in applications. So that query results can be easily exchanged in a machine-readable format, the SPARQL 1.1 specification describes four common exchange formats in three different documents: XML, JSON, CSV, and TSV.

All of these formats are fully supported in RDFox (see Section 8.9.2 for further details).

4.1.1.2. Deviations from the Standard

The implementation of SPARQL in RDFox deviates slightly from the SPARQL 1.1 specification in that the variables introduced in the SELECT clause can be used in the HAVING clause. The following example illustrates that point.

Example: Differences in the implementation of HAVING

In the following query, variable ?a is computed to contain the area of a rectangle. However, because ?a is computed in the SELECT clause, the SPARQL 1.1 specification requires it to be computed only after the HAVING clause has been evaluated. Thus, when condition ?a < 10 in the HAVING clause is evaluated, variable ?a should be unbound.

SELECT ?r ?w ?h (?w * ?h AS ?a) WHERE { ?r :width ?w . ?r :height ?h } HAVING (?a < 10)

Because of this order of evaluation, queries with the HAVING clause often contain repetition; for example, the above query must be written as follows.

SELECT ?r ?w ?h (?w * ?h AS ?a) WHERE { ?r :width ?w . ?r :height ?h } HAVING (?w * ?h < 10)

In order to eliminate such repetition, RDFox evaluates the HAVING clause after computing the variable bindings in the SELECT clause, which is often convenient. On queries that either do not contain the HAVING clause or where the HAVING clause does not reference variables introduced in the SELECT clause, RDFox follows the SPARQL 1.1 standard closely.
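The two evaluation orders can be contrasted with a small Python sketch. It is only an illustration of the ordering described above, not RDFox code: the rectangle data and the helper names `project` and `having` are hypothetical, and an unbound ?a is modeled simply as a missing dictionary key that makes the condition fail.

```python
# Toy solutions of the WHERE clause: one row per rectangle (hypothetical data).
rows = [
    {"r": "rect1", "w": 2, "h": 3},
    {"r": "rect2", "w": 5, "h": 4},
]

def project(row):
    """Apply the SELECT expression (?w * ?h AS ?a)."""
    return {**row, "a": row["w"] * row["h"]}

def having(row):
    """HAVING (?a < 10); an unbound ?a makes the condition fail."""
    return "a" in row and row["a"] < 10

# SPARQL 1.1 order: HAVING is evaluated first, while ?a is still unbound.
standard = [project(row) for row in rows if having(row)]

# RDFox order: SELECT expressions are computed first, then HAVING.
rdfox = [row for row in map(project, rows) if having(row)]

print(standard)  # [] because no row has ?a bound when HAVING runs
print(rdfox)     # only rect1, whose area is 6
```

Under the standard order the filter never sees a value for ?a, which is why the condition has to be repeated as ?w * ?h < 10 in portable queries.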

4.1.2. Update Language

SPARQL 1.1 provides an update language for RDF graphs. In particular, the update language allows users to insert triples into a store, delete triples from a store, load an RDF graph into a store, clear an RDF graph in a store, create a new RDF graph in a store, drop an RDF graph from a store, copy (or move, or add) the content of one RDF graph to another, and perform a group of update operations as a single action.

RDFox supports INSERT and DELETE operations, which add triples to or remove triples from the Graph Store based on bindings for a query pattern specified in a WHERE clause.
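The idea of a pattern-driven update can be sketched in Python as follows. This is a minimal model under stated assumptions, not the RDFox implementation: the store is a plain set of triples, the WHERE clause is a single triple pattern, and the names `match`, `solutions`, `insert_where` and `delete_where` as well as the property :employedBy are hypothetical.

```python
def match(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A repeated variable must match the same value each time.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def solutions(store, pattern):
    """Evaluate the WHERE pattern against a snapshot of the store."""
    found = []
    for triple in store:
        binding = match(pattern, triple)
        if binding is not None:
            found.append(binding)
    return found

def instantiate(template, binding):
    """Replace the variables of a template triple with their bound values."""
    return tuple(binding.get(term, term) for term in template)

def insert_where(store, template, pattern):
    for binding in solutions(store, pattern):
        store.add(instantiate(template, binding))

def delete_where(store, template, pattern):
    for binding in solutions(store, pattern):
        store.discard(instantiate(template, binding))

store = {(":lois", ":employs", ":peter")}
insert_where(store, ("?y", ":employedBy", "?x"), ("?x", ":employs", "?y"))
delete_where(store, ("?x", ":employs", "?y"), ("?x", ":employs", "?y"))
print(store)
```

In both functions the pattern is evaluated in full before the store is modified, so the set of bindings does not depend on the triples being added or removed.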

4.2. RDFox Proprietary Extensions

RDFox implements a number of extensions to the SPARQL 1.1 standard, which have proved useful in a number of practical use cases. This section specifies each of these extensions and illustrates their use by means of examples.

4.2.1. Built-in Functions

RDFox provides a number of built-in functions which extend the function definitions in the SPARQL 1.1 query language.

4.2.1.1. The SKOLEM Function

The SKOLEM function creates a new resource based on the resources obtained from the evaluation of a given list of expressions. This function is useful, for instance, when trying to represent a k-ary relation in RDF.

Syntax:

SKOLEM(exp1, ..., expk)

Description: The function creates a new resource based on the resources evaluated from exp1 through expk, the first of which must be of data type http://www.w3.org/2001/XMLSchema#string.

Example

Consider the data in our Getting Started guide. Let us issue the following INSERT statements after loading the data.

INSERT { ?x :employs ?y } WHERE { ?x :forename "Lois" . ?y :forename "Peter" }
INSERT {
    ?employment :hasEmployer ?employer ;
                :hasEmployee ?employee .
}
WHERE {
    ?employer :employs ?employee .
    BIND(SKOLEM("employment", ?employer, ?employee) AS ?employment)
}

In this example, the first INSERT statement adds to the data store the triple :lois :employs :peter. The function SKOLEM is used in the second INSERT statement to create a new “employment” resource for each triple with the predicate :employs. The new resource is then used in the subject position of two new triples connecting it to its employer and employee resources. In particular, the query

SELECT ?x ?y WHERE { ?x :hasEmployer ?y }

produces a single answer indicating that the newly created employment resource has :lois as employer. It would now be possible to add the start and end dates of each employment as direct relations of the new resource, where it would have been impossible to hold that information in relation to the original binary :employs relations.
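The reification pattern used in this example can be sketched in Python. The sketch assumes, as the name of the function suggests, that equal arguments always yield the same resource; how RDFox actually represents the created resource is not described here, so the tuple returned by the hypothetical `skolem` helper is purely a stand-in.

```python
def skolem(name, *args):
    """Stand-in for SKOLEM: a resource identified by its name and arguments."""
    if not isinstance(name, str):
        raise TypeError("the first argument must be a string")
    return ("skolem", name) + args

# Binary :employs facts to be turned into employment resources.
employs = [(":lois", ":peter")]

triples = set()
for employer, employee in employs:
    employment = skolem("employment", employer, employee)
    triples.add((employment, ":hasEmployer", employer))
    triples.add((employment, ":hasEmployee", employee))

# Further facts, such as dates, can now be attached to the same resource.
same = skolem("employment", ":lois", ":peter")
triples.add((same, ":hasNote", "illustrative extra fact"))
print(len(triples))
```

Because the resource is a function of its arguments, a later statement can refer to the same employment again without having to look it up first.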

4.2.1.2. The DATETIME Function

Syntax:

DATETIME(year, month, day, hour, minute, second)
DATETIME(year, month, day, hour, minute, second, offset)

Description: The function takes as input integer values for year, month, day, hour, minute, second, and optionally timezone offset, and transforms these into a resource of type xsd:dateTime.

Example: Consider the data in our previous example on the use of SKOLEM. Let us issue the following INSERT statement, which adds start and end dates to the employment instance in our previous example, where :lois was the employer.

INSERT {
    ?employment :hasStartDate ?sd .
    ?employment :hasEndDate ?ed
}
WHERE {
    ?employment :hasEmployer :lois .
    BIND(DATETIME(2020,4,5,9,0,0) AS ?sd) .
    BIND(DATETIME(2022,4,4,9,0,0) AS ?ed)
}

This statement attaches a start date of 9am April 5th 2020 and an end date of 9am April 4th 2022 to the employment, where the dates will be represented using xsd:dateTime.
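For readers who want to see the values involved, the following Python sketch builds the same two timestamps with the standard library and prints them in the lexical form used by xsd:dateTime. The helper name `make_datetime` is hypothetical, and the seven-argument variant with a timezone offset is left out because the unit of the offset is not specified in this section.

```python
from datetime import datetime

def make_datetime(year, month, day, hour, minute, second):
    """Build a timestamp and render it in xsd:dateTime lexical form."""
    return datetime(year, month, day, hour, minute, second).isoformat()

start = make_datetime(2020, 4, 5, 9, 0, 0)
end = make_datetime(2022, 4, 4, 9, 0, 0)
print(start)  # 2020-04-05T09:00:00
print(end)    # 2022-04-04T09:00:00
```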

4.2.1.3. TIME_ON_TIMELINE Function

Syntax:

TIME_ON_TIMELINE(date)

Description: The function takes as input a value of one of the date/time data types, and it returns a decimal number representing its time on timeline.

Example: Consider the previous example, where we added a start date to an employment. The following INSERT statement converts all employment start dates to a decimal number using the time on timeline specification. This decimal number is attached to the employment using a new property :hasStartDateOnTimeline.

INSERT {
?employment :hasStartDateOnTimeline ?dot
}
WHERE {
?employment :hasStartDate ?date .
BIND(TIME_ON_TIMELINE(?date) as ?dot)
}
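As a rough illustration of what such a number looks like, the Python sketch below counts the seconds between 0001-01-01T00:00:00 and a given timestamp. It assumes that "time on timeline" follows the definition in XML Schema 1.1, and it handles only values without a timezone; the name `time_on_timeline` is reused here for a hypothetical helper, and the result is an integer rather than a decimal.

```python
from datetime import datetime

# Assumed origin of the timeline: the start of year 1 in the proleptic
# Gregorian calendar, which is also the earliest date Python can represent.
ORIGIN = datetime(1, 1, 1)

def time_on_timeline(value):
    """Seconds elapsed between the origin and a timezone-less datetime."""
    delta = value - ORIGIN
    return delta.days * 86400 + delta.seconds

start = datetime(2020, 4, 5, 9, 0, 0)
end = datetime(2022, 4, 4, 9, 0, 0)
print(time_on_timeline(start))
print(time_on_timeline(end) - time_on_timeline(start))  # duration in seconds
```

Mapping dates to numbers in this way makes it possible to compare them and to compute durations with ordinary arithmetic.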

4.2.1.4. MINFN Function

Syntax:

MINFN(exp1, ..., expk)

Description: The function evaluates the expressions exp1 through expk and returns the smallest value.

Example: The following query returns, for each employment, the earlier of its start and end dates.

SELECT ?employment ?md
WHERE {
?employment :hasStartDate ?sd .
?employment :hasEndDate ?ed .
BIND(MINFN(?sd, ?ed) as ?md)
}

4.2.1.5. MAXFN Function

Syntax:

MAXFN(exp1, ..., expk)

Description: The function evaluates the expressions exp1 through expk and returns the largest value.

Example: The following query returns, for each employment, the later of its start and end dates.

SELECT ?employment ?md
WHERE {
?employment :hasStartDate ?sd .
?employment :hasEndDate ?ed .
BIND(MAXFN(?sd, ?ed) as ?md)
}
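MINFN and MAXFN should not be confused with the standard SPARQL aggregates MIN and MAX. The Python sketch below illustrates the difference on hypothetical toy solutions: MINFN and MAXFN compare several expressions within one solution, whereas an aggregate compares one expression across many solutions.

```python
from datetime import datetime

# Toy query solutions (hypothetical data): one row per employment.
solutions = [
    {"employment": "e1", "sd": datetime(2020, 4, 5, 9), "ed": datetime(2022, 4, 4, 9)},
    {"employment": "e2", "sd": datetime(2019, 1, 1, 9), "ed": datetime(2019, 6, 30, 17)},
]

# MINFN / MAXFN: several expressions, evaluated within ONE solution.
per_row = [
    {
        "employment": s["employment"],
        "min": min(s["sd"], s["ed"]),
        "max": max(s["sd"], s["ed"]),
    }
    for s in solutions
]

# MIN aggregate: ONE expression, evaluated across ALL solutions of a group.
earliest_start = min(s["sd"] for s in solutions)

for row in per_row:
    print(row["employment"], row["min"], row["max"])
print(earliest_start)
```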

4.3. Monitoring Query Execution

RDFox implements functionality for monitoring the execution of queries. In particular, users can gain access to query plans generated by the RDFox query optimizer as well as to useful statistics about the execution of such plans.

Suppose that we initialize a data store with the example data in our Getting Started guide. The following shell command provides access to the query plans produced by RDFox:

set query.explain true

Now, let’s issue the following SPARQL query against the store

SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :forename ?n. ?p :hasParent ?z }

which returns the following answers:

:meg "Meg" .
:stewie "Stewie" .
:chris "Chris" .
:meg "Meg" .

The shell now also displays the query plan that was actually executed.

QUERY ?p ?n                                                            QueryIterator
    PROJECT ?n ?p                      {          -->    ?n ?p }
        CONJUNCTION                    {          -->    ?n ?p ?z }    NestedIndexLoopJoinIterator
            [?p, :hasParent, ?z]       {          -->    ?p ?z }       TripleTableIterator
            [?p, rdf:type, :Person]    { ?p ?z    -->    ?p ?z }       TripleTableIterator
            [?p, :forename, ?n]        { ?p ?z    -->    ?n ?p ?z }    TripleTableIterator

The query plan is executed top-down in a depth-first manner, and we can think of solution variable bindings as being generated one at a time. It is useful to walk through the execution of the plan for a single solution binding in more detail.

When we first “visit” the PROJECT block, we haven’t obtained any variable bindings yet (hence the empty space on the left of the “-->” symbol); in contrast, by the time we have finished executing the subplan underneath, we will have obtained a binding of variables ?n and ?p and hence an answer to the query (as reflected on the right-hand side of the “-->” symbol). Similarly, when we first visit the CONJUNCTION block, which performs the join of the query, we have an empty binding and, by the time we return from it, we will have a binding for ?n, ?p and ?z. The join is also performed top-down. First, we obtain a binding for ?p and ?z by matching the triple pattern [?p, :hasParent, ?z]. We then check the second triple pattern [?p, rdf:type, :Person] against that binding, and finally match the third triple pattern [?p, :forename, ?n], which extends the binding with a value for variable ?n.
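The top-down, one-binding-at-a-time execution described above can be sketched in Python with generators. This is a simplified model rather than RDFox's actual iterators: the triples are toy data assumed to resemble the Getting Started example, a linear scan over a list stands in for an index lookup, and the names `scan` and `join` are hypothetical.

```python
# Toy triples (assumed data, not the actual Getting Started file).
TRIPLES = [
    (":meg", ":hasParent", ":lois"),
    (":meg", ":hasParent", ":peter"),
    (":stewie", ":hasParent", ":lois"),
    (":chris", ":hasParent", ":peter"),
    (":meg", "rdf:type", ":Person"),
    (":stewie", "rdf:type", ":Person"),
    (":chris", "rdf:type", ":Person"),
    (":lois", "rdf:type", ":Person"),
    (":meg", ":forename", '"Meg"'),
    (":stewie", ":forename", '"Stewie"'),
    (":chris", ":forename", '"Chris"'),
    (":lois", ":forename", '"Lois"'),
]

def scan(pattern, binding):
    """Yield each extension of `binding` by a triple that matches `pattern`."""
    for triple in TRIPLES:
        extended = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Already-bound variables must agree with the triple.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield extended

def join(patterns, binding=None):
    """Depth-first nested loop join that produces one solution at a time."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for extended in scan(patterns[0], binding):
        yield from join(patterns[1:], extended)

# Triple patterns in the order chosen by the plan.
PLAN = [
    ("?p", ":hasParent", "?z"),
    ("?p", "rdf:type", ":Person"),
    ("?p", ":forename", "?n"),
]

# PROJECT ?p ?n: drop ?z from each solution.
answers = [(b["?p"], b["?n"]) for b in join(PLAN)]
for answer in answers:
    print(*answer)
```

Each binding produced by the first pattern is pushed all the way down before the next one is considered, which is why :meg, who has two parents in this data, appears twice in the answers.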

Let us consider a slightly more complex query, which uses the OPTIONAL operator in SPARQL.

SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :hasParent ?z . OPTIONAL { ?p :forename ?n } }

RDFox will execute the following query plan:

QUERY ?p ?n                                                                  QueryIterator
    PROJECT ?n ?p                          {          -->    ?p | ?n }
        OPTIONAL                           {          -->    ?p ?z | ?n }    OptionalIterator
            CONJUNCTION                    {          -->    ?p ?z }         NestedIndexLoopJoinIterator
                [?p, :hasParent, ?z]       {          -->    ?p ?z }         TripleTableIterator
                [?p, rdf:type, :Person]    { ?p ?z    -->    ?p ?z }         TripleTableIterator
            FILTER true
                [?p, :forename, ?n]        { ?p ?z    -->    ?n ?p ?z }      TripleTableIterator

The important difference to notice in this plan is the use of the “|” symbol. The variables on the left-hand side of “|” are always bound by the corresponding block, whereas those on the right-hand side may or may not be bound.
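The distinction between variables that are always bound and variables that may be bound can be illustrated with a small Python sketch of a left join. It is a simplified model, not the OptionalIterator itself: the solutions of the mandatory part and the forename lookup table are hypothetical toy data, including the resource :someone, which has no forename.

```python
# Solutions of the mandatory CONJUNCTION block: ?p and ?z are always bound.
mandatory = [
    {"?p": ":meg", "?z": ":lois"},
    {"?p": ":someone", "?z": ":peter"},  # hypothetical person without a forename
]

# Matches available for the optional pattern [?p, :forename, ?n].
forenames = {":meg": ['"Meg"']}

def optional(left_solutions):
    """Left join: keep every mandatory solution, extending it when possible."""
    for solution in left_solutions:
        matches = forenames.get(solution["?p"], [])
        if matches:
            for name in matches:
                yield {**solution, "?n": name}
        else:
            yield dict(solution)  # ?n stays unbound

results = list(optional(mandatory))
for row in results:
    print(row["?p"], row.get("?n", "(unbound)"))
```

Every result contains ?p and ?z, which correspond to the variables on the left of “|”, while ?n is present only when the optional pattern has a match.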