4. Querying RDFox

RDFox supports most SPARQL 1.1 Query Language features, as well as the most commonly needed parts of SPARQL 1.1 Update. It also implements a few proprietary built-in functions that are not part of SPARQL 1.1.

In this section, we describe in detail the support in RDFox for the SPARQL 1.1 standard specification and the built-in functions that are specific to RDFox.

Finally, we also describe the functionality implemented in RDFox for monitoring query execution.

4.1. SPARQL 1.1 Support

The SPARQL 1.1 specification provides a suite of languages and protocols for querying and manipulating RDF graph data.

4.1.1. Query Language

The core of the specification is the SPARQL 1.1 Query Language, which specifies the syntax and semantics of allowed queries. The SPARQL 1.1 query language extends the previous version of SPARQL with a number of features that are important for applications, including nested subqueries, aggregation, negation, creating values by expressions, named graphs, and property paths.

RDFox fully supports the SPARQL 1.1 query language, with the exception of property paths, the BNODE function, and the non-normative DESCRIBE query form. The functionality provided by property paths is, however, already covered to a large extent by Datalog rules (as described in Section 5.4). The intended semantics of BNODE and DESCRIBE are, in our opinion, not sufficiently specified in the standard, which is why those features have not yet been implemented in RDFox.

4.1.2. Query Answer Formats

Results of SELECT queries in SPARQL 1.1 are often represented in tabular form in applications. So that query results can be exchanged easily in a machine-readable form, the SPARQL 1.1 specification describes four common exchange formats in three different documents: XML, JSON, CSV, and TSV. All of these formats are fully supported in RDFox (see Section 8.9.2 for further details).
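
As an illustration of how a client might consume one of these formats, the following minimal Python sketch turns a document in the SPARQL 1.1 Query Results JSON Format into rows. The sample document is hand-written (its IRIs and values are assumptions, not actual RDFox output), and the helper name rows_from_sparql_json is hypothetical.

```python
import json

# A small, hand-written document in the SPARQL 1.1 Query Results JSON Format.
# The IRIs and values are illustrative only, not actual RDFox output.
SAMPLE = """
{
  "head": { "vars": ["p", "n"] },
  "results": {
    "bindings": [
      { "p": { "type": "uri", "value": "http://example.com/meg" },
        "n": { "type": "literal", "value": "Meg" } },
      { "p": { "type": "uri", "value": "http://example.com/anon" } }
    ]
  }
}
"""

def rows_from_sparql_json(text):
    """Return the variable names and one tuple per solution.

    A variable that is unbound in a solution is simply absent from that
    binding object, so it is reported here as None.
    """
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = [
        tuple(binding[v]["value"] if v in binding else None for v in variables)
        for binding in doc["results"]["bindings"]
    ]
    return variables, rows

variables, rows = rows_from_sparql_json(SAMPLE)
print(variables)
for row in rows:
    print(row)
```

The second solution shows why a consumer should not assume that every variable is present in every binding object.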

4.1.3. Update Language

SPARQL 1.1 provides an update language for RDF graphs. In particular, the update language allows users to insert triples into a store, delete triples from a store, load an RDF graph into a store, clear an RDF graph in a store, create a new RDF graph in a store, drop an RDF graph from a store, copy, move, or add the content of one RDF graph to another, and perform a group of update operations as a single action.

RDFox supports the INSERT and DELETE operations, which add triples to or remove triples from the Graph Store based on bindings for a query pattern specified in a WHERE clause.

4.2. RDFox Proprietary Extensions

RDFox implements a number of extensions to the SPARQL 1.1 standard, which have proved useful in a number of practical use cases. This section specifies each of these extensions and illustrates their use by means of examples.

4.2.1. Built-in Functions

RDFox provides a number of built-in functions which extend the function definitions in the SPARQL 1.1 query language.

4.2.1.1. The SKOLEM Function

The SKOLEM function creates a new resource based on the resources obtained from the evaluation of a given list of expressions. This function is useful, for instance, when trying to represent a k-ary relation in RDF.

Syntax:

SKOLEM(exp1, ..., expk)

Description: The function creates a new resource based on the resources evaluated from exp1 through expk, the first of which must be of data type http://www.w3.org/2001/XMLSchema#string.

Example

Consider the data in our Getting Started guide. Let us issue the following INSERT statements after loading the data.

INSERT { ?x :employs ?y } WHERE { ?x :forename "Lois" . ?y :forename "Peter" }
INSERT {
    ?employment :hasEmployer ?employer ;
                :hasEmployee ?employee .
}
WHERE {
    ?employer :employs ?employee .
    BIND(SKOLEM("employment", ?employer, ?employee) AS ?employment)
}

In this example, the first INSERT statement adds the triple :lois :employs :peter to the data store. The function SKOLEM is used in the second INSERT statement to create a new “employment” resource for each triple with the predicate :employs. The new resource is then used in the subject position of two new triples connecting it to its employer and employee resources. In particular, the query

SELECT ?x ?y WHERE { ?x :hasEmployer ?y }

produces a single answer indicating that the newly created employment resource has :lois as employer. It is now possible to attach the start and end dates of each employment directly to the new resource, whereas this information could not have been attached to the original binary :employs relation.
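
The idea behind the second INSERT statement can be sketched in Python. This is only a model: the tuple used to represent the new resource is an assumption made for illustration and is not how RDFox encodes such resources, and the function names skolem and reify_employments are hypothetical. The sketch assumes that equal argument lists always yield the same resource.

```python
def skolem(*args):
    """Stand-in for SKOLEM: an identifier determined by the argument list.

    Assumption: as in the description above, the first argument must be a
    string. The tuple representation is purely illustrative.
    """
    if not args or not isinstance(args[0], str):
        raise TypeError("the first argument must be a string")
    return ("skolem",) + args

def reify_employments(triples):
    """Mimic the second INSERT: one employment resource per :employs triple."""
    new = set()
    for subject, predicate, obj in triples:
        if predicate == ":employs":
            employment = skolem("employment", subject, obj)
            new.add((employment, ":hasEmployer", subject))
            new.add((employment, ":hasEmployee", obj))
    return triples | new

store = {(":lois", ":employs", ":peter")}
once = reify_employments(store)
twice = reify_employments(once)
print(len(once), once == twice)
```

Under this model, applying the update a second time adds nothing, because the same employer and employee always map to the same employment resource.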

4.2.1.2. The CONSTRAINT_VIOLATION Function

The CONSTRAINT_VIOLATION function is similar to the SKOLEM function in that it also creates a new resource based on a given list of expressions. Unlike the SKOLEM function, however, CONSTRAINT_VIOLATION does not require the first expression to evaluate to any particular type, and it produces shorter, more readable identifiers for use as constraint violation names. The cost of this improved readability is a greater chance of collision between distinct expressions, so it is advisable to reserve the CONSTRAINT_VIOLATION function for its intended purpose and to use SKOLEM in all other scenarios.

See Section 7.2 for more information on Datalog constraints including an example of how to use the CONSTRAINT_VIOLATION function.

Syntax:

CONSTRAINT_VIOLATION(exp1, ..., expk)

Description: The function creates a new resource based on the resources evaluated from exp1 through expk.

4.2.1.3. The DATETIME Function

Syntax:

DATETIME(year, month, day, hour, minute, second)
DATETIME(year, month, day, hour, minute, second, offset)

Description: The function takes as input integer values for year, month, day, hour, minute, second, and optionally timezone offset, and transforms these into a resource of type xsd:dateTime.

Example: Consider the data in our previous example on the use of SKOLEM. Let us issue the following INSERT statement, which adds start and end dates to the employment instance in our previous example, where :lois was the employer.

INSERT {
    ?employment :hasStartDate ?sd .
    ?employment :hasEndDate ?ed
}
WHERE {
    ?employment :hasEmployer :lois .
    BIND(DATETIME(2020,4,5,9,0,0) AS ?sd) .
    BIND(DATETIME(2022,4,4,9,0,0) AS ?ed)
}

This statement attaches a start date of 9am April 5th 2020 and an end date of 9am April 4th 2022 to the employment, where the dates will be represented using xsd:dateTime.
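
For readers who want to see what the resulting values look like, the following Python sketch builds the standard xsd:dateTime lexical form from the same six components. It only illustrates the lexical form for values without a time zone; the function name xsd_datetime_lexical is hypothetical, and the optional offset argument is deliberately left out because its unit is not stated above.

```python
from datetime import datetime

def xsd_datetime_lexical(year, month, day, hour, minute, second):
    """Return the xsd:dateTime lexical form (no time zone) for the components.

    Constructing a datetime also validates the components, so an impossible
    date such as 30 February raises ValueError.
    """
    return datetime(year, month, day, hour, minute, second).isoformat()

start = xsd_datetime_lexical(2020, 4, 5, 9, 0, 0)
end = xsd_datetime_lexical(2022, 4, 4, 9, 0, 0)
print(start)
print(end)
```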

4.2.1.4. TIME_ON_TIMELINE Function

Syntax:

TIME_ON_TIMELINE(date)

Description: The function takes as input a value of one of the date/time data types, and it returns a decimal number representing its time on timeline.

Example: Consider the previous example, where we added a start date to an employment. The following INSERT statement converts all employment start dates to a decimal number using the time on timeline specification. This decimal number is attached to the employment using a new property :hasStartDateOnTimeline.

INSERT {
    ?employment :hasStartDateOnTimeline ?dot
}
WHERE {
    ?employment :hasStartDate ?date .
    BIND(TIME_ON_TIMELINE(?date) AS ?dot)
}
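
To give a feel for the numbers involved, the sketch below computes a time-on-timeline value in Python. It assumes the XML Schema 1.1 definition, under which 0001-01-01T00:00:00 maps to 0 and each later instant maps to the number of seconds elapsed since then in the proleptic Gregorian calendar. Whether RDFox returns exactly these values is an assumption, and the sketch covers only xsd:dateTime values without a time zone.

```python
from datetime import datetime
from decimal import Decimal

# Assumption: the timeline origin is 0001-01-01T00:00:00, as in XML Schema 1.1.
ORIGIN = datetime(1, 1, 1)

def time_on_timeline(value):
    """Seconds from the origin to a time-zone-less datetime, as a Decimal."""
    delta = value - ORIGIN
    return (Decimal(delta.days) * 86400
            + delta.seconds
            + Decimal(delta.microseconds) / 1000000)

start = datetime(2020, 4, 5, 9, 0, 0)
print(time_on_timeline(start))
```

Because the result is a plain number, start dates converted in this way can be compared or subtracted with ordinary arithmetic.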

4.2.1.5. MINFN Function

Syntax:

MINFN(exp1, ..., expk)

Description: The function evaluates the expressions exp1 through expk and returns the smallest value.

Example: The following query returns, for each employment, the earlier of its start and end dates.

SELECT ?employment ?md
WHERE {
    ?employment :hasStartDate ?sd .
    ?employment :hasEndDate ?ed .
    BIND(MINFN(?sd, ?ed) AS ?md)
}

4.2.1.6. MAXFN Function

Syntax:

MAXFN(exp1, ..., expk)

Description: The function evaluates the expressions exp1 through expk and returns the largest value.

Example: The following query returns, for each employment, the later of its start and end dates.

SELECT ?employment ?md
WHERE {
    ?employment :hasStartDate ?sd .
    ?employment :hasEndDate ?ed .
    BIND(MAXFN(?sd, ?ed) AS ?md)
}
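
MINFN and MAXFN compare their arguments within a single solution, which is different from the standard SPARQL MIN and MAX aggregates, which compare one expression across many solutions. The following Python sketch contrasts the two; the employment names and dates are made up for illustration.

```python
from datetime import datetime

# One (employment, start, end) row per solution; all values are illustrative.
solutions = [
    ("employment1", datetime(2020, 4, 5, 9), datetime(2022, 4, 4, 9)),
    ("employment2", datetime(2021, 1, 1, 9), datetime(2023, 6, 1, 9)),
]

# MINFN/MAXFN style: compare the arguments inside each individual solution.
per_solution = [(e, min(sd, ed), max(sd, ed)) for e, sd, ed in solutions]

# MIN aggregate style: compare a single expression across all solutions.
earliest_start = min(sd for _, sd, _ in solutions)

for row in per_solution:
    print(row)
print(earliest_start)
```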

4.2.2. Querying Tuple Tables

RDFox organizes information in a data store using tuple tables, as described in more detail in How Information is Structured in RDFox. Briefly, tuple tables include named graphs, but can also represent data stored in external data sources. RDFox provides proprietary TT expressions to access data stored in tuple tables, as shown in the following example.

Example: Assume that a binary tuple table :EmployeeName is mounted from an external data source (e.g., a database), and that it contains pairs that relate employee IDs to their names. The following query retrieves all pairs whose ID is contained in the :Manager class.

SELECT ?id ?name
WHERE {
    ?id rdf:type :Manager .
    TT :EmployeeName { ?id ?name }
}

In the above example, TT :EmployeeName { ?id ?name } retrieves all pairs of IDs and names stored in the :EmployeeName tuple table. Since the tuple table is binary, exactly two terms (i.e., ?id and ?name) must occur inside the TT expression. This is analogous to named graphs, where GRAPH :G { ?X ?Y ?Z } accesses all triples in a named graph :G. The differences from GRAPH expressions can be summarised as follows.

  • The number of terms inside TT must match the arity (i.e., the number of positions) of the tuple table.

  • Each TT expression represents exactly one reference to a tuple table. For example, to retrieve pairs of employee IDs with the same name, one can use TT :EmployeeName { ?id1 ?name } . TT :EmployeeName { ?id2 ?name } (whereas TT :EmployeeName { ?id1 ?name . ?id2 ?name } is syntactically invalid).

  • Variables cannot be used in place of tuple table names. For example, TT ?T { ?id ?name } is syntactically invalid.
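
The rules above can be modelled with a short Python sketch. It is not RDFox code: the rows of the table, the manager set, and the helper match_tt are all hypothetical, and the sketch only mirrors the arity check and the fact that each TT expression is a single reference to the table.

```python
# Hypothetical contents of the binary :EmployeeName tuple table.
EMPLOYEE_NAME = [(":e1", "Ann"), (":e2", "Bob"), (":e3", "Ann")]
# Hypothetical members of the :Manager class.
MANAGERS = [":e1", ":e3"]

def match_tt(rows, arity, terms, binding):
    """Yield extensions of `binding` for ONE reference to a tuple table.

    Terms starting with "?" are variables; anything else is a constant.
    """
    if len(terms) != arity:
        raise ValueError(f"expected {arity} terms, got {len(terms)}")
    for row in rows:
        extended = dict(binding)
        for term, value in zip(terms, row):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield extended

# ?id rdf:type :Manager . TT :EmployeeName { ?id ?name }
manager_names = [
    (b["?id"], b["?name"])
    for m in MANAGERS
    for b in match_tt(EMPLOYEE_NAME, 2, ("?id", "?name"), {"?id": m})
]

# TT :EmployeeName { ?id1 ?name } . TT :EmployeeName { ?id2 ?name }
same_name = [
    (b2["?id1"], b2["?id2"])
    for b1 in match_tt(EMPLOYEE_NAME, 2, ("?id1", "?name"), {})
    for b2 in match_tt(EMPLOYEE_NAME, 2, ("?id2", "?name"), b1)
]

print(manager_names)
print(same_name)
```

The same-name query needs two calls to match_tt, just as the SPARQL version needs two separate TT expressions; each pair also appears with itself because nothing requires ?id1 and ?id2 to differ.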

4.3. Monitoring Query Execution

RDFox implements functionality for monitoring the execution of queries. In particular, users can gain access to query plans generated by the RDFox query optimizer as well as to useful statistics about the execution of such plans.

Suppose that we initialize a data store with the example data in our Getting Started guide. The following shell command provides access to the query plans produced by RDFox:

set query.explain true

Now, let’s issue the following SPARQL query against the store

SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :forename ?n. ?p :hasParent ?z }

which returns the following answers (:meg appears twice because ?z, which is not part of the answer, has two matches for :meg):

:meg "Meg" .
:stewie "Stewie" .
:chris "Chris" .
:meg "Meg" .

The shell now also displays the query plan that was actually executed.

QUERY ?p ?n                                                            QueryIterator
    PROJECT ?n ?p                      {          -->    ?n ?p }
        CONJUNCTION                    {          -->    ?n ?p ?z }    NestedIndexLoopJoinIterator
            [?p, :hasParent, ?z]       {          -->    ?p ?z }       TripleTableIterator
            [?p, rdf:type, :Person]    { ?p ?z    -->    ?p ?z }       TripleTableIterator
            [?p, :forename, ?n]        { ?p ?z    -->    ?n ?p ?z }    TripleTableIterator

The query plan is executed top-down in a depth-first manner, and we can think of solution variable bindings as being generated one at a time. It is useful to walk through the execution of the plan for a single solution binding in more detail.

When we first “visit” the PROJECT block, we have not yet obtained any variable bindings (hence the empty space on the left of the “-->” symbol); in contrast, by the time we have finished executing the subplan underneath, we will have obtained a binding of variables ?n and ?p and hence an answer to the query (as reflected on the right-hand side of the “-->” symbol). Similarly, when we first visit the CONJUNCTION block, which performs the join of the query, we have an empty binding and, by the time we return from it, we will have a binding for ?n, ?p and ?z. The join is also performed top-down. First, we obtain a binding for ?p and ?z by matching the triple pattern [?p, :hasParent, ?z]. We then check the second triple pattern [?p, rdf:type, :Person] against that binding, and finally match the third triple pattern [?p, :forename, ?n], which extends the binding with a value for variable ?n.
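
The walkthrough above can be mimicked with a small Python generator. This is a sketch of the general nested index loop join idea, not of RDFox internals: the triples are hand-written in the spirit of the Getting Started data (their contents and order are assumptions), and the two dictionaries merely stand in for index lookups.

```python
from collections import defaultdict

# Hand-written triples in the spirit of the Getting Started data; the exact
# contents and their order are assumptions made for this sketch.
TRIPLES = [
    (":meg", ":hasParent", ":lois"),
    (":stewie", ":hasParent", ":lois"),
    (":chris", ":hasParent", ":peter"),
    (":meg", ":hasParent", ":peter"),
    (":meg", "rdf:type", ":Person"),
    (":stewie", "rdf:type", ":Person"),
    (":chris", "rdf:type", ":Person"),
    (":meg", ":forename", "Meg"),
    (":stewie", ":forename", "Stewie"),
    (":chris", ":forename", "Chris"),
]

# Two simple indexes standing in for index lookups on the triple table.
by_predicate = defaultdict(list)          # predicate -> [(subject, object)]
by_subject_predicate = defaultdict(list)  # (subject, predicate) -> [object]
for s, p, o in TRIPLES:
    by_predicate[p].append((s, o))
    by_subject_predicate[(s, p)].append(o)

def answers():
    """Yield (?p, ?n) one binding at a time, visiting patterns in plan order."""
    # [?p, :hasParent, ?z]: starts from an empty binding, binds ?p and ?z.
    for person, _parent in by_predicate.get(":hasParent", []):
        # [?p, rdf:type, :Person]: binds nothing new, only checks ?p.
        if ":Person" in by_subject_predicate.get((person, "rdf:type"), []):
            # [?p, :forename, ?n]: extends the binding with ?n.
            for name in by_subject_predicate.get((person, ":forename"), []):
                # PROJECT keeps ?p and ?n and drops ?z.
                yield (person, name)

for row in answers():
    print(row)
```

Each pass through the outer loop corresponds to one match of the first triple pattern, so :meg is produced once per parent.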

Let us consider a slightly more complex query, which uses the OPTIONAL operator in SPARQL.

SELECT ?p ?n WHERE { ?p rdf:type :Person . ?p :hasParent ?z . OPTIONAL { ?p :forename ?n } }

RDFox will execute the following query plan:

QUERY ?p ?n                                                                  QueryIterator
    PROJECT ?n ?p                          {          -->    ?p | ?n }
        OPTIONAL                           {          -->    ?p ?z | ?n }    OptionalIterator
            CONJUNCTION                    {          -->    ?p ?z }         NestedIndexLoopJoinIterator
                [?p, :hasParent, ?z]       {          -->    ?p ?z }         TripleTableIterator
                [?p, rdf:type, :Person]    { ?p ?z    -->    ?p ?z }         TripleTableIterator
            FILTER true
                [?p, :forename, ?n]        { ?p ?z    -->    ?n ?p ?z }      TripleTableIterator

The important difference to notice in this plan is the use of the “|” symbol. The variables on the left-hand side of “|” are always bound by the corresponding block, whereas those on the right-hand side may or may not be bound.
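
The meaning of “always bound” versus “possibly bound” can be illustrated with a Python sketch of the query above. The triples are hand-written, and :anon is a hypothetical person who has a parent but no :forename triple, added only to show an unbound ?n; None stands for “unbound”.

```python
# Hand-written triples; :anon is a hypothetical resource with no :forename.
TRIPLES = [
    (":meg", "rdf:type", ":Person"),
    (":meg", ":hasParent", ":lois"),
    (":meg", ":forename", "Meg"),
    (":anon", "rdf:type", ":Person"),
    (":anon", ":hasParent", ":peter"),
]

def answers(triples):
    """Yield (?p, ?n); ?n is None when the OPTIONAL part has no match."""
    for person, predicate, _parent in triples:
        # Mandatory part: ?p :hasParent ?z . ?p rdf:type :Person
        if predicate != ":hasParent":
            continue
        if (person, "rdf:type", ":Person") not in triples:
            continue
        # Optional part: ?p :forename ?n
        names = [o for s, p, o in triples if s == person and p == ":forename"]
        if names:
            for name in names:
                yield (person, name)
        else:
            # The solution is kept even though ?n could not be bound.
            yield (person, None)

rows = list(answers(TRIPLES))
for row in rows:
    print(row)
```

Here ?p is bound in every row, matching the left-hand side of “|”, while ?n is bound only when a :forename triple exists.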