Warning: This document is for an old version of RDFox Docs. The latest version is 4.0.

5. Reasoning in RDFox

Reasoning in RDF is the ability to calculate the set of triples that logically follow from an RDF graph and a set of rules. Such logical consequences are materialized in RDFox as new triples in the graph.

The use of rules can significantly simplify the management of RDF data as well as provide a more complete set of answers to user queries. Consider, for instance, a graph containing the following triples:

:oxford :locatedIn :oxfordshire .
:oxfordshire :locatedIn :england .

The relation :locatedIn is intuitively transitive: from the fact that Oxford is located in Oxfordshire and Oxfordshire is located in England, we can deduce that Oxford is located in England. The triple :oxford :locatedIn :england is, however, missing from the graph. As a consequence, SPARQL queries asking for all English cities will not return :oxford as an answer.

We could, of course, add the missing triple by hand to the graph, in which case :oxford would now be returned as an answer to our previous query. Doing so, however, has a number of important disadvantages. First, there can be millions of such missing triples and each of them would need to be manually added, which is cumbersome and error-prone; for instance, if we add to the graph the triple :england :locatedIn :uk, then the following additional triples should also be added:

:oxford :locatedIn :uk .
:oxfordshire :locatedIn :uk .

More importantly, by manually adding missing triples we are not capturing the transitive nature of the relation, which establishes a causal link between different triples. Indeed, triple :oxford :locatedIn :england holds because triples :oxford :locatedIn :oxfordshire and :oxfordshire :locatedIn :england are part of the data. Assume that we later find out that :oxford is not located in :oxfordshire, but rather in the state of Mississippi in the US, and we delete from the graph the triple :oxford :locatedIn :oxfordshire as a result. Then, the triples :oxford :locatedIn :england and :oxford :locatedIn :uk should also be retracted as they are no longer justified. Such situations are very hard to handle manually.

As we will see next, we can use a rule to faithfully represent the transitive nature of the relation and handle all of the aforementioned challenges in an efficient and elegant way.

5.1. Rule Languages

A rule language for RDF determines which syntactic expressions are valid rules, and also provides well-defined meaning to each rule. In particular, given an arbitrary set of syntactically valid rules and an arbitrary RDF graph, the set of new triples that follow from the application of the rules to the graph must be unambiguously defined.

5.1.1. Datalog

Rule languages have been in use since the 1980s in the fields of data management and artificial intelligence. The basic rule language is called Datalog. It is a very well understood language, which constitutes the core of a plethora of subsequent rule formalisms equipped with a wide range of extensions. In this section, we describe Datalog in the context of RDF.

A Datalog rule can be seen as an IF THEN statement. In particular, the following is a Datalog rule which faithfully represents the transitive nature of the relation :locatedIn.

[?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .

The IF part of the rule is also called the body or antecedent; the THEN part of the rule is called the head or the consequent. The head is written first and is separated from the body by the symbol :-. Both body and head consist of a conjunction of conditions, where conjuncts are comma-separated and where each conjunct is a triple in which variables may occur. Each conjunct in the body or the head is called an atom. In our example, the body consists of atoms [?x, :locatedIn, ?y] and [?y, :locatedIn, ?z], whereas the head consists of the single atom [?x, :locatedIn, ?z].

Each rule conveys the idea that, from certain combinations of triples in the input RDF graph, we can logically deduce that some other triples must also be part of the graph. In particular, variables in the rule range over all possible nodes in the RDF graph (RDF literals, URIs, blank nodes); whenever these variables are assigned values that make the rule body a subset of the graph, we take the values of those variables, propagate them to the head of the rule, and deduce that the resulting triples must also be part of the graph.

In our example, a particular rule application binds variable ?x to :oxford, variable ?y to :oxfordshire and variable ?z to :england, which then implies that the triple :oxford :locatedIn :england, obtained by replacing ?x with :oxford and ?z with :england in the head of the rule, holds as a logical consequence. A different rule application would bind ?x to :oxfordshire, ?y to :england, and ?z to :uk; as a result, the triple :oxfordshire :locatedIn :uk can also be derived as a logical consequence.

An alternative way to understand the meaning of a single Datalog rule application to an RDF graph is to look at it as the execution of an INSERT statement in SPARQL, which adds a set of triples to the graph. In particular, the statement

INSERT { ?x :locatedIn ?z } WHERE { ?x :locatedIn ?y. ?y :locatedIn ?z }

corresponding to our example rule leads to the insertion of triples

:oxford :locatedIn :england .
:oxfordshire :locatedIn :uk .

There is, however, a fundamental difference that makes rules more powerful than simple INSERT statements in SPARQL, namely that rules are applied recursively. Indeed, after we have derived that Oxford is located in England, we can apply the rule again by matching ?x to :oxford, ?y to :england, and ?z to :uk, to derive :oxford :locatedIn :uk, a triple that is not obtained as a result of the INSERT statement above.

In this way, the logical consequences of a set of Datalog rules on an input graph are captured by the recursive application of the rules until no new information can be added to the graph. It is important to notice that the set of new triples obtained is completely independent from the order in which rule applications are performed as well as of the order in which different elements of rule bodies are given. In particular, the following two rules are equivalent:

[?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
[?x, :locatedIn, ?z] :- [?y, :locatedIn, ?z], [?x, :locatedIn, ?y] .
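
The difference between a single INSERT and recursive rule application can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not of how RDFox evaluates rules; the function names and the encoding of triples as Python tuples are assumptions made for this sketch.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples.

def apply_rule_once(graph):
    """One round of the transitivity rule, comparable to a single SPARQL INSERT."""
    located = [(s, o) for (s, p, o) in graph if p == ":locatedIn"]
    return {(x, ":locatedIn", z)
            for (x, y1) in located
            for (y2, z) in located
            if y1 == y2}

def materialize(graph):
    """Apply the rule repeatedly until no new triples are derived (a fixpoint)."""
    result = set(graph)
    while True:
        new = apply_rule_once(result) - result
        if not new:
            return result
        result |= new

graph = {
    (":oxford", ":locatedIn", ":oxfordshire"),
    (":oxfordshire", ":locatedIn", ":england"),
    (":england", ":locatedIn", ":uk"),
}

one_round = apply_rule_once(graph)
closure = materialize(graph)
print((":oxford", ":locatedIn", ":uk") in one_round)  # False
print((":oxford", ":locatedIn", ":uk") in closure)    # True
```

Because the loop stops only when a round adds nothing new, the result does not depend on the order in which matches are found.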

5.1.2. Extensions of Datalog

A wide range of extensions of Datalog have been proposed and studied in the literature. In this subsection we describe the extensions of Datalog implemented in RDFox as well as the restrictions on them that have been put in place in order to ensure that the resulting language is semantically well-defined. Later on in this section we will provide many more examples of rules equipped with these extended features.

5.1.2.1. Negation-as-failure

Negation-as-failure allows us to make deductions based on information that is not present in the graph. For instance, using negation-as-failure we can write a rule saying that someone who works for a company but is not an employee of the company is an external contractor.

[?x, :contractorFor, ?y] :- [?x, :worksFor, ?y], NOT [?x, :employeeOf, ?y] .

Here, NOT represents a negation of a body atom.

Let us consider the logical consequences of this rule when applied to the graph

:mary :worksFor :acme .
:mary :employeeOf :acme .
:bob :worksFor :acme .

On the one hand, we have that :mary works for :acme, and hence we can satisfy the first atom in the body by assigning :mary to ?x and :acme to ?y; however, :mary is also an employee of :acme, and hence the second condition is not satisfied, which means that we cannot derive that :mary is a contractor. On the other hand, we also have that :bob works for :acme and hence once again we can satisfy the first atom in the body, this time by assigning :bob to ?x and :acme to ?y; but now, we do not have a triple in the graph stating that :bob is an employee of :acme and hence we can satisfy the second condition in the body and derive the triple :bob :contractorFor :acme.

Indeed, the query

SELECT ?x ?y WHERE { ?x :contractorFor ?y }

yields the expected result

:bob :acme .

Note that negation typically means “absence of information”; indeed, we do not know for sure whether :bob is not an employee of :acme; we only know that this information is not available in the graph (neither explicitly, nor as a consequence of other rule applications).

Negation-as-failure is intrinsically non-monotonic. In logic, this means that new information may invalidate previous deductions. For instance, suppose that :bob becomes an employee of :acme and, to reflect this, we add to our data graph the triple :bob :employeeOf :acme. Then, we can no longer infer that :bob is a contractor for :acme and the previous query will now return an empty answer. In contrast, rules in plain Datalog are monotonic: adding new triples to the graph cannot invalidate any consequences that we may have previously drawn; for instance, adding the triple :england :locatedIn :uk to the example in our previous section cannot invalidate a previous inference such as :oxford :locatedIn :england.
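
The non-monotonic behaviour can be reproduced with a small Python sketch. It hard-codes the single contractor rule and treats NOT as "the triple is absent from the graph"; the function name `contractors` is an assumption of this sketch and not part of RDFox.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples.

def contractors(graph):
    """Derive :contractorFor for everyone who works for ?y without an :employeeOf triple."""
    return {(x, ":contractorFor", y)
            for (x, p, y) in graph
            if p == ":worksFor" and (x, ":employeeOf", y) not in graph}

graph = {
    (":mary", ":worksFor", ":acme"),
    (":mary", ":employeeOf", ":acme"),
    (":bob", ":worksFor", ":acme"),
}

before = contractors(graph)
# Adding a fact removes a previously derived consequence.
after = contractors(graph | {(":bob", ":employeeOf", ":acme")})
print(before)  # {(':bob', ':contractorFor', ':acme')}
print(after)   # set()
```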

5.1.2.2. Aggregation

Aggregation is an important feature in query languages such as SQL or SPARQL. It allows one to compute numeric values (such as minimums, maximums, sums, counts or averages) on groups of solutions satisfying certain conditions (e.g., compute an average salary over the group of people working in the accounting department).

In RDFox, it is possible to define relations based on the result of aggregate calculations. For instance, consider the following data.

:bob :worksFor :accounting .
:bob :salary "50000"^^xsd:integer .
:mary :worksFor :hr .
:mary :salary "47000"^^xsd:integer .
:jen :worksFor :accounting .
:jen :salary "60000"^^xsd:integer .
:accounting rdf:type :Department .
:hr rdf:type :Department .

We can write an RDFox rule that computes the average salary of each department and stores the result in a newly introduced relation:

[?d, :deptAvgSalary, ?z] :-
    [?d, rdf:type, :Department],
    AGGREGATE(
        [?x, :worksFor, ?d],
        [?x, :salary, ?s]
        ON ?d
        BIND AVG(?s) AS ?z) .

Here, each group consists of a department with salaried employees, and for each group the rule computes an average of the salaries involved. In particular, suppose that we satisfy the first atom in the body by assigning value :accounting to variable ?d; then, we can satisfy the aggregate atom by grouping all employees working for :accounting (i.e., :bob and :jen), computing their average salary (55k), and assigning the resulting value to variable ?z; as a result, we can propagate the assignment of ?d to :accounting and of ?z to 55,000 to the head and derive the triple

:accounting :deptAvgSalary "55000.0"^^xsd:decimal .

The query

SELECT ?d ?s WHERE { ?d rdf:type :Department . ?d :deptAvgSalary ?s }

then returns the expected answers

:accounting 55000.0 .
:hr 47000.0 .

Similarly to negation, aggregation is also a non-monotonic extension of Datalog. In particular, if we were to add a new employee to the accounting department with a salary of 52k, then we would need to withdraw our previous inference that the average accounting salary equals 55k and adjust the average accordingly.
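
The grouping performed by the aggregate atom can be sketched as follows. This is a plain-Python approximation under stated assumptions: salaries are Python integers, each person has a single salary, and the extra employee `:tom` is a hypothetical name introduced only to show the average changing.

```python
from collections import defaultdict

# Illustrative sketch only: triples are (subject, predicate, object) tuples.

def dept_avg_salary(graph):
    """Group salaried employees by department and derive one average per group."""
    departments = {s for (s, p, o) in graph if p == "rdf:type" and o == ":Department"}
    salary = {s: o for (s, p, o) in graph if p == ":salary"}  # assumes one salary each
    groups = defaultdict(list)
    for (s, p, o) in graph:
        if p == ":worksFor" and o in departments and s in salary:
            groups[o].append(salary[s])
    return {(d, ":deptAvgSalary", sum(v) / len(v)) for d, v in groups.items()}

graph = {
    (":bob", ":worksFor", ":accounting"), (":bob", ":salary", 50000),
    (":mary", ":worksFor", ":hr"), (":mary", ":salary", 47000),
    (":jen", ":worksFor", ":accounting"), (":jen", ":salary", 60000),
    (":accounting", "rdf:type", ":Department"),
    (":hr", "rdf:type", ":Department"),
}

averages = dept_avg_salary(graph)
# A new accounting employee (hypothetical :tom) replaces the old average.
updated = dept_avg_salary(graph | {(":tom", ":worksFor", ":accounting"),
                                   (":tom", ":salary", 52000)})
print(sorted(averages))
print(sorted(updated))
```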

5.1.2.3. Built-in Functions

Datalog can be extended with a wide range of built-in functions. These include the functions defined in the SPARQL specification (e.g., arithmetic operations, string concatenation, and so on), as well as function symbols in first-order predicate logic via the special function SKOLEM.

Let us start by introducing an example using the SKOLEM function, which can be used to capture function symbols in first-order predicate logic. Function symbols can be used to create objects that must exist in the world, but whose identity is unknown to us. As we will see later on, this is useful for representing relations of arity higher than two as well as for data integration and data restructuring.

Consider the following rules, where the second one uses the SKOLEM function:

[?y, rdf:type, :Person] :-
   [?x, :marriedTo, ?y],
   [?x, rdf:type, :Person] .

[?x, :hasMother, ?y] :-
   [?x, rdf:type, :Person],
   BIND(SKOLEM("motherOf", ?x) AS ?y) .

The first rule is a simple Datalog rule stating that everyone married to a person is also a person. The second rule generates, for every person, a new object in the graph representing the person’s mother. To understand the meaning of the second rule, consider its application to a triple :mary rdf:type :Person. Here, we can bind ?x to :mary because :mary is a person; now, the application of SKOLEM to :mary generates a new object which represents the mother of :mary, and this new object is assigned as the value of variable ?y and propagated to the head of the rule. As a result, we derive a triple relating :mary to her mother via the :hasMother relation.

Let us now reconsider a variant of our “family” example data from the Getting Started guide, which contains the following triples in Turtle format:

:peter :forename "Peter" ;
    a :Person ;
    :marriedTo :lois ;
    :gender "male" .

:lois :forename "Lois" ;
    :gender "female" .

:brian :forename "Brian" . # Brian is a dog

And let us import our previous two rules. The following query asking for people having a mother

SELECT ?x WHERE { ?x rdf:type :Person. ?x :hasMother ?y }

returns :lois and :peter as answers. Indeed, :peter is a :Person according to the data, and hence by the second rule above he must have a mother. In turn, :lois is married to :peter, and hence by the first rule she must be a :Person, and by the second rule :lois must also have a mother.

Let us consider another example of a built-in function, namely string concatenation. The following rule computes the full name of a person as the concatenation of their first name and their family name.

[?x, :fullName, ?n] :-
    [?x, :firstName, ?y],
    [?x, :lastName, ?z],
    BIND(CONCAT(?y, ?z) AS ?n) .

Consider the application of this rule to the graph consisting of the following triples:

:peter :firstName "Peter" .
:peter :lastName "Griffin" .

Then, the query

SELECT ?x ?y WHERE { ?x :fullName ?y }

would return the expected answer

:peter "PeterGriffin" .

An important consequence of introducing built-in functions is that rules are now capable of deriving triples mentioning new objects which did not occur in the input data (such as the mothers of Peter and Lois in our first example and “PeterGriffin” in our second example). This is not possible using plain Datalog rules, where the application of a rule may generate new triples, but these triples can only mention objects that were present in the input data.

If users are not careful, they may write rules using built-in functions that generate infinitely many new constants and hence there may be infinitely many triples that logically follow from the rules and a (finite) input graph.

For instance, consider our previous example rules

[?y, rdf:type, :Person] :-
   [?x, :marriedTo, ?y],
   [?x, rdf:type, :Person] .

[?x, :hasMother, ?y] :-
   [?x, rdf:type, :Person],
   BIND(SKOLEM("motherOf", ?x) AS ?y) .

Suppose that we add another rule saying that the mother of a person must also be a person:

[?y, rdf:type, :Person] :- [?x, :hasMother, ?y] .

If we apply these rules to the input graph consisting of

:peter rdf:type :Person .

we will derive an infinite “chain” of triples, where the first one relates :peter with his mother, the second one relates :peter’s mother to his grandmother, and so on.

In such cases, RDFox will run out of resources trying to compute infinitely many new triples and will therefore not terminate. This is not due to a limitation of RDFox as a system, but rather to the well-known fact that Datalog becomes undecidable once extended with built-in functions that can introduce arbitrarily many fresh objects.
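
The runaway derivation can be observed in a bounded Python sketch. It models only the :hasMother rule and the "mothers are persons" rule, represents a Skolem term as a tuple, and stops after a fixed number of rounds; `max_rounds` and the function names are assumptions of this sketch and do not correspond to RDFox settings.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples.

def skolem(name, arg):
    """Model a Skolem term as a tuple; equal arguments always give the same term."""
    return (name, arg)

def materialize(graph, max_rounds):
    """Apply both rules round by round; report whether a fixpoint was reached."""
    result = set(graph)
    for _ in range(max_rounds):
        new = set()
        for (s, p, o) in result:
            if p == "rdf:type" and o == ":Person":
                new.add((s, ":hasMother", skolem("motherOf", s)))
            if p == ":hasMother":
                new.add((o, "rdf:type", ":Person"))
        if new <= result:
            return result, True    # nothing new: fixpoint reached
        result |= new
    return result, False           # gave up: rules were still deriving triples

start = {(":peter", "rdf:type", ":Person")}
triples_10, finished_10 = materialize(start, max_rounds=10)
triples_20, finished_20 = materialize(start, max_rounds=20)
print(len(triples_10), finished_10)  # 11 False
print(len(triples_20), finished_20)  # 21 False
```

However large the bound, every round still produces a fresh mother or a fresh :Person triple, so the loop never reaches a fixpoint on its own.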

5.1.2.4. Equality

Equality is a special binary predicate that can be used to identify different resources as representing the same real-world object. The equality predicate is referred to as owl:sameAs in the standard W3C languages for the Semantic Web. In addition to equality, W3C standard languages also define an inequality predicate, which is referred to as owl:differentFrom.

By default, two resources with different names are not assumed to be actually different. For instance, resources called :marie_curie and :marie_sklodowsca may refer to the same object in the world (the renowned scientist Marie Curie). In logic terms we typically say that by default we are not making the unique name assumption (UNA). In some applications, however, it makes sense to make such an assumption, and the effect of making the UNA is that we will have implicit owl:differentFrom statements between all pairs of resources mentioned in the data.

In RDFox we can enable the use of equality by initializing a data store accordingly. For instance, the shell command

init seq equality noUNA

initializes a data store with equality reasoning turned on and without the UNA.

Extensions of Datalog with equality allow for the equality and inequality predicates to appear in rules and data. For instance, consider the following triples, where the second triple represents the fact that the URIs :marie_curie and :marie_sklodowsca refer to the same person.

:marie_curie rdf:type :Scientist .
:marie_curie owl:sameAs :marie_sklodowsca .

A query asking RDFox for all scientists

SELECT ?x WHERE { ?x rdf:type :Scientist }

will return both :marie_curie and :marie_sklodowsca as a result.

Equality and inequality can also be used in rules. For instance, the following rule establishes that a person can only have one biological mother

[?y, owl:sameAs, ?z] :- [?x, :hasMother, ?y], [?x, :hasMother, ?z] .

The application of this rule to the graph

:irene_curie :hasMother :marie_curie .
:irene_curie :hasMother :marie_sklodowsca .

identifies :marie_curie and :marie_sklodowsca as the same person.

The joint use of equality and inequality can lead to logical contradictions. For instance, the application of the previous rule to a graph consisting of the following triples would lead to a contradiction:

:irene_curie :hasMother :marie_curie .
:irene_curie :hasMother :eve_curie .
:marie_curie owl:differentFrom :eve_curie .

Indeed, the application of the rule derives :marie_curie owl:sameAs :eve_curie, which is in contradiction with the data triple :marie_curie owl:differentFrom :eve_curie. Such contradictions can be identified in RDFox by querying for the instances of the special owl:Nothing predicate, which is also borrowed from the W3C standard OWL. The query

SELECT ?x WHERE { ?x rdf:type owl:Nothing }

returns :marie_curie and :eve_curie as answers. This can be interpreted by the user as: “resources :marie_curie and :eve_curie are involved in a logical contradiction”.
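
The clash between derived equalities and stated inequalities can be sketched in Python. This is a deliberately narrow illustration: it handles only the single "one mother" rule and does not implement the transitivity of owl:sameAs or any other part of equality reasoning; the function names are assumptions of this sketch.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples.

def same_as_pairs(graph):
    """Apply the 'one biological mother' rule: two mothers of one child are equal."""
    mothers = {}
    for (s, p, o) in graph:
        if p == ":hasMother":
            mothers.setdefault(s, set()).add(o)
    return {frozenset((y, z))
            for ms in mothers.values()
            for y in ms for z in ms if y != z}

def contradictions(graph):
    """Resources that are derived equal but asserted to be different."""
    different = {frozenset((s, o)) for (s, p, o) in graph if p == "owl:differentFrom"}
    clashing = same_as_pairs(graph) & different
    return {resource for pair in clashing for resource in pair}

graph = {
    (":irene_curie", ":hasMother", ":marie_curie"),
    (":irene_curie", ":hasMother", ":eve_curie"),
    (":marie_curie", "owl:differentFrom", ":eve_curie"),
}

clash = contradictions(graph)
print(sorted(clash))  # [':eve_curie', ':marie_curie']
```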

5.1.2.5. Named Graphs and N-ary Relations

In all our previous examples, all atoms in rules are evaluated against the default RDF graph. RDFox also supports named graphs, which can be created either implicitly, by importing an RDF dataset encoded as TriG or N-Quads, or explicitly, as shown in the following example that creates the named graph :Payroll.

tupletable add :Payroll type triples

Named graphs can also be used in the body and the head of rules, and hence it is possible to derive new triples as the result of rule application and add them to graphs other than the default graph. Rules can refer only to named graphs already created using one of the ways described above.

For instance, consider the following rule:

:Payroll(?id, :monthlyPayment, ?m) :-
    [?id, rdf:type, :Employee],
    :HR(?id, :yearlySalary, ?s),
    BIND(?s / 12 AS ?m) .

This rule joins information from the default graph and the named graph called :HR, and it inserts consequences into the named graph called :Payroll. Specifically, the first body atom of the rule identifies IDs of employees in the default RDF graph. The second body atom is a general atom: it is evaluated in the named graph called :HR, and it matches triples that connect IDs with their yearly salaries. The head of the rule contains a general atom that refers to the named graph called :Payroll, and it derives triples that connect IDs of employees with their respective monthly payments. In particular, given as data

:HR(:a, :yearlySalary, "55000"^^xsd:integer) .
:a rdf:type :Employee .

the rule will compute the monthly payment for employee :a. Then, the query

SELECT ?s ?p ?o WHERE { GRAPH :Payroll{ ?s ?p ?o } }

will correctly return the monthly payment for employee :a

:a :monthlyPayment 4583.333333333333333 .

In addition to referring to graphs other than the default graph, RDFox can also directly represent external data as tuples of arbitrary arity (not just triples) using the same syntax as named graphs. Atoms representing such data, however, are only allowed to be used in the body of rules. Details on how to access external data from RDFox are given in Section 6.6.

5.2. Materialization-based Reasoning

The main computational problem solved by RDFox is that of answering a SPARQL 1.1 query with respect to an RDF graph and a set of rules.

To solve this problem, RDFox uses materialization-based reasoning to precompute and store all triples that logically follow from the input graph and rules in a query-independent way. Both the process of extending the input graph with such newly derived triples and its final output are commonly called materialization. After such preprocessing, queries can be answered directly over the materialization, which is usually very efficient since the rules do not need to be considered any further. Materializations can be large, but they can usually be stored and handled on modern hardware as the available memory is continually increasing.

The main challenge of this approach to query answering is that, whenever data triples and/or rules are added and/or deleted, the “old” materialization must be replaced with the “new” materialization that contains all triples that follow from the updated input. In this setting, deletion of triples is restricted to those that are explicit in the input graph and hence one does not consider deletion of derived triples—a complex problem known in the literature as belief revision or view update.

For instance, given as input the RDF graph

:oxford :locatedIn :oxfordshire .
:oxfordshire :locatedIn :england .
:england :locatedIn :uk .

and the familiar rule

[?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .

RDFox will compute the corresponding materialization, which consists of triples

:oxford :locatedIn :oxfordshire .
:oxford :locatedIn :england .
:oxford :locatedIn :uk .
:oxfordshire :locatedIn :england .
:oxfordshire :locatedIn :uk .
:england :locatedIn :uk .

RDFox will now handle each SPARQL 1.1 query issued against the input graph and rule by simply evaluating the query directly over the materialization, thus avoiding the expensive reasoning at query time.

An update could delete a triple explicitly given in the input graph such as the triple :oxfordshire :locatedIn :england, in which case the new materialization consists only of triples

:oxford :locatedIn :oxfordshire .
:england :locatedIn :uk .

since the rule is no longer applicable after deletion. In contrast, deleting a derived triple such as :oxford :locatedIn :uk is not allowed since this triple was not part of the original input.
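
The distinction between explicit and derived triples can be sketched with a small Python class. It recomputes the materialization from scratch after every update, which is an assumption made for brevity and not how RDFox maintains materializations; the class name `Store` and its methods are hypothetical.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples.

class Store:
    """Keeps explicit triples apart from the triples derived by the transitivity rule."""

    def __init__(self, explicit):
        self.explicit = set(explicit)

    def materialization(self):
        """Recompute all consequences of the explicit triples from scratch."""
        result = set(self.explicit)
        while True:
            pairs = [(s, o) for (s, p, o) in result if p == ":locatedIn"]
            new = {(x, ":locatedIn", z)
                   for (x, y1) in pairs for (y2, z) in pairs if y1 == y2} - result
            if not new:
                return result
            result |= new

    def delete(self, triple):
        """Only explicitly asserted triples may be deleted."""
        if triple not in self.explicit:
            raise ValueError("only explicitly asserted triples can be deleted")
        self.explicit.remove(triple)

store = Store({
    (":oxford", ":locatedIn", ":oxfordshire"),
    (":oxfordshire", ":locatedIn", ":england"),
    (":england", ":locatedIn", ":uk"),
})
size_before = len(store.materialization())
store.delete((":oxfordshire", ":locatedIn", ":england"))
size_after = len(store.materialization())
print(size_before, size_after)  # 6 2
```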

RDFox implements sophisticated algorithms for both efficiently computing materializations and maintaining them under addition/deletion updates that may affect both the data and the rules. All these algorithms were developed after years of research at Oxford and have been extensively documented in the scientific literature.

5.3. Restrictions on Rule Sets

The rule language of RDFox imposes certain restrictions on the structure of rule sets. These restrictions ensure that the materialization of a set of rules and an RDF graph is well-defined and unique.

In particular, the semantics (i.e., the logical meaning) of rule sets involving negation-as-failure and/or aggregation is not straightforward, and numerous proposals exist in the scientific literature. There is, however, a general consensus for rule sets in which the use of negation-as-failure and aggregation is stratified. Informally, stratification conditions ensure that there are no cyclic dependencies in the rule set involving negation or aggregation.

Several variants of stratification have been proposed, where some of them capture a wider range of rule sets than others; they all, however, provide similar guarantees. We next describe the stratification conditions adopted in RDFox by means of examples. For this, let us consider the following rules mentioning negation-as-failure:

[?x, :contractorFor, ?y] :-
   [?x, :worksFor, ?y],
   NOT [?x, :employeeOf, ?y] .

[?x, :employeeOf, :acme] :- [?x, :worksFor, :acme] .

The first rule says that people working for a company who are not employees of that company act as contractors. The rule establishes two dependencies. The first dependency tells us that the presence of a triple having :worksFor in the middle position may contribute to triggering the derivation of a triple having :contractorFor in the middle position. In turn, the second dependency tells us that the absence of a triple having :employeeOf in the middle position may also contribute to the derivation of a triple having :contractorFor in the middle position.

The second rule tells us that everyone working for :acme is an employee of :acme. This rule establishes one dependency, namely the presence of a triple having :worksFor in the middle position and :acme in the rightmost position may trigger the derivation of a triple having :employeeOf in the middle position and :acme in the rightmost position.

We can keep track of such dependencies by means of a dependency graph. The nodes of the graph are obtained by replacing variables in individual triple patterns occurring in the rules with the special symbol ANY, which intuitively indicates that the position of the triple where it occurs can adopt any constant value, and leaving constants as they are. In particular, our example rules yield a graph having the following five vertices v1—v5:

v1:  ANY :contractorFor ANY
v2:  ANY :worksFor      ANY
v3:  ANY :employeeOf    ANY
v4:  ANY :worksFor      :acme
v5:  ANY :employeeOf    :acme

The (directed) edges of the graph lead from vertices corresponding to body atoms to vertices corresponding to head atoms and can be either “regular” or “special”. Special edges witness the presence of a dependency involving aggregation or negation-as-failure; in our case, we will have a single special edge (v3, v1). In turn, each dependency that is not via negation-as-failure/aggregation generates a regular edge; in our case, we will have regular edges (v2,v1) and (v4, v5). Finally, the graph will also contain bidirectional regular edges between nodes that unify in the sense of first-order logic: since [?x, :employeeOf, ?y] and [?x, :employeeOf, :acme] unify, we will have regular edges (v3,v5) and (v5, v3); similarly, we will also have regular edges (v2,v4) and (v4,v2).

Our two example rules are stratified and hence are accepted by RDFox; this is because there is no cycle in the dependency graph involving a special edge (indeed, all cycles involve regular edges only).

Now suppose that we add the following rule:

[?x, :employeeOf, ?y] :-
   [?x, :worksFor, ?y],
   NOT [?x, :contractorFor, ?y] .

which says that people working for a company who are not contractors for the company must be employees of the company. The addition of this rule does not change the set of nodes in the dependency graph; however, it adds two more edges: a regular edge (v2, v3) and a special edge (v1, v3). As a result, we now have a cycle involving a special edge (from v3 to v1 and back to v3), so the rule set is no longer stratified and will be rejected by RDFox.
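
The cycle test on the dependency graph can be sketched in Python using the vertices v1 to v5 and the edges listed above. A special edge (u, v) lies on a cycle exactly when u is reachable from v, which is what the sketch checks. This illustrates the stated condition only and is not RDFox's implementation; the function names are assumptions.

```python
# Illustrative sketch only: edges are (source, target) pairs of vertex names.

def reachable(edges, start):
    """All vertices reachable from start by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for (u, v) in edges:
            if u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_stratified(regular, special):
    """True if no cycle of the dependency graph contains a special edge."""
    edges = regular | special
    return all(u not in reachable(edges, v) for (u, v) in special)

regular = {("v2", "v1"), ("v4", "v5"),
           ("v3", "v5"), ("v5", "v3"),   # the two :employeeOf patterns unify
           ("v2", "v4"), ("v4", "v2")}   # the two :worksFor patterns unify
special = {("v3", "v1")}                 # NOT [?x, :employeeOf, ?y]

two_rules_ok = is_stratified(regular, special)
# The third rule adds a regular edge (v2, v3) and a special edge (v1, v3).
three_rules_ok = is_stratified(regular | {("v2", "v3")}, special | {("v1", "v3")})
print(two_rules_ok, three_rules_ok)  # True False
```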

Due to stratification conditions, the use of the special equality relation owl:sameAs in rules precludes the use of aggregation or negation-as-failure. Consider the following rule set, where the second rule tells us that a person cannot be an employee of two different companies:

[?x, :contractorFor, ?y] :-
   [?x, :worksFor, ?y],
   NOT [?x, :employeeOf, ?y] .

[?y, owl:sameAs, ?z] :-
   [?x, :employeeOf, ?y],
   [?x, :employeeOf, ?z] .

This rule set will be rejected by RDFox as the rule set mentions both NOT and owl:sameAs. Informally, this is because equality can affect every single relation, which precludes stratification in most cases.

In addition to stratification conditions, RDFox also imposes certain restrictions on the structure of rules, which make sure that each rule can be evaluated by binding the variables in the body of the rule to a data graph. To see an example where things go wrong, consider the rule:

[?x, :worksFor, ?y] :- [?y, rdf:type, :Department] .

The rule cannot be evaluated by first matching the body to the data graph and then propagating the variable bindings to the head; indeed, matching the rule body to an RDF graph will always leave variable ?x of the rule unbound, and hence the triple that must be added as a result of applying the rule to the data is undefined. As a result, this rule will be rejected by RDFox.

Binding restrictions in RDFox are rather involved given that the underpinning rule language is rich and there are many subtle corner cases. However, rules accepted by the parser can always be unambiguously evaluated.
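
The simplest of these restrictions, that every head variable must be bound by the body, can be sketched in Python. The check below covers only rules made of plain atoms and ignores negation, BIND, and aggregates, so it is far weaker than the checks RDFox performs; the function name is an assumption of this sketch.

```python
# Illustrative sketch only: an atom is a tuple of terms; variables start with "?".

def unbound_head_variables(head, body):
    """Return the head variables that no body atom can bind."""
    def variables(atoms):
        return {t for atom in atoms for t in atom if t.startswith("?")}
    return variables(head) - variables(body)

# [?x, :worksFor, ?y] :- [?y, rdf:type, :Department] .
bad = unbound_head_variables(
    head=[("?x", ":worksFor", "?y")],
    body=[("?y", "rdf:type", ":Department")],
)
# [?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
good = unbound_head_variables(
    head=[("?x", ":locatedIn", "?z")],
    body=[("?x", ":locatedIn", "?y"), ("?y", ":locatedIn", "?z")],
)
print(bad)   # {'?x'}
print(good)  # set()
```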

5.4. The Rule Language of RDFox

This section formally specifies the syntax of rules in RDFox. As already mentioned, the rule language supported by RDFox extends Datalog with stratified negation, stratified aggregation, built-in functions, and more, so as to provide additional data analysis capabilities.

A rule has the form

H1, …, Hj :- L1, …, Lk .

where the formula to the left of the :- operator is the rule head and the formula to the right is the rule body. Informally, a rule says “if L1, …, and Lk all hold, then H1, …, and Hj hold as well”. Each Hi with 1 ≤ i ≤ j is an atom, and each Li with 1 ≤ i ≤ k is a literal. A literal is an atom, a negation, a bind literal, a filter literal, or an aggregate literal.

5.4.1. Atom

An atom is either a default graph RDF atom or a general atom. General atoms can be used to access data in named graphs and mounted data sources.

5.4.1.1. Default Graph RDF Atom

A default graph RDF atom has the form [t1, t2, t3] where each ti is a term, which is either an RDF resource or a variable. To distinguish between these two kinds of terms, RDFox requires variables to start with the ? symbol. Also note that when t2 is an IRI, atom [t1,t2,t3] can be written alternatively as t2[t1,t3]; moreover, when t2 is the special IRI “rdf:type” and t3 is also an IRI, atom [t1,t2,t3] can be written alternatively as t3[t1].

Example A simple rule with default graph RDF atoms only

a1:Person[?x] :- a1:teacherOf[?x, ?y] .

As we discussed earlier, this is equivalent to:

[?x, rdf:type, a1:Person] :- [?x, a1:teacherOf, ?y] .

The above rule has only one atom in the rule body and one atom in the rule head. Informally, the rule says that if x is a teacher of y, then x must be a person. Both the body and the head are matched in the default RDF graph.

5.4.1.2. General Atom

A general atom has the form A(t1, …, tn) with n ≥ 1 where A is an IRI denoting the name of a tuple table and t1, …, tn are terms. Each named RDF graph is represented in RDFox as a tuple table; thus, general atoms can be used to refer to data in named graphs.

Example A rule with both RDF and general atoms

[?id, fg:firstName, ?fn],
[?id, fg:lastName, ?ln] :-
   fg:Person(?id, ?fn, ?ln) .

The general atom in the rule body refers to a tuple table containing three columns. The same rule can be written alternatively as the following.

fg:firstName[?id, ?fn],
fg:lastName[?id, ?ln] :-
   fg:Person(?id, ?fn, ?ln) .

5.4.2. Negation

Negation is useful when the user wants to require that certain conditions are not satisfied. A negation has one of the following forms, where k ≥ 2, j ≥ 1, B1, …, Bk are atoms, and ?V1, …, ?Vj are variables.

NOT B1
NOT(B1, …, Bk)
NOT EXIST ?V1, …, ?Vj IN B1
NOT EXIST ?V1, …, ?Vj IN (B1, …, Bk)
NOT EXISTS ?V1, …, ?Vj IN B1
NOT EXISTS ?V1, …, ?Vj IN (B1, …, Bk)

Note RDFox will reject rules that use negation in all equality modes other than off (see Equality).

Example Using negation of the first form

a1:stranger[?x, ?y] :-
   a1:Person[?x],
   a1:Person[?y],
   NOT a1:friend[?x, ?y] .

Example Using negation of the last form

a1:basic[?x] :-
   a1:component[?x],
   NOT EXISTS ?y IN (
      a1:component[?y],
      a1:subcomponent[?y, ?x]
   ) .

Informally, the rule says that if x is a component and it does not have any subcomponents, then x is a basic component.

5.4.3. Bind Literal

A bind literal evaluates an expression and assigns the value of the expression to a variable, or compares the value of the expression with a term. A bind literal is of the following form, where exp is an expression and t is a term not appearing in exp. An expression can be constructed from terms, operators, and functions. The operators and functions supported here are the same as those supported in RDFox SPARQL queries; refer to Section 4 for a detailed comparison between SPARQL 1.1 functions and the ones implemented in RDFox.

BIND(exp AS t)

An important difference with SPARQL 1.1 is that, for each bind literal in a rule, every variable used in exp must be bound either by a body atom, or by another bind literal in the rule.

Example Using bind literals

:cTemp[?x, ?z] :- :fTemp[?x, ?y], BIND((?y - 32) / 1.8 AS ?z) .

The bind literal in the above rule converts Fahrenheit degrees to Celsius degrees.
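To make the arithmetic concrete, the following Python sketch (not RDFox code) mimics what the rule does for each matching fact. The function name apply_bind_rule and the sample readings are assumptions made for illustration.

```python
def apply_bind_rule(f_temp):
    """For each (x, y) Fahrenheit fact, derive the Celsius fact (x, (y - 32) / 1.8)."""
    return {(x, (y - 32) / 1.8) for (x, y) in f_temp}


# Hypothetical readings in degrees Fahrenheit.
f_temp = {("sensor1", 212.0), ("sensor2", 32.0)}
c_temp = dict(apply_bind_rule(f_temp))

for sensor in sorted(c_temp):
    print(sensor, c_temp[sensor])
```

As in the rule, the variable ?y must already be bound by the body atom before the expression can be evaluated; in the sketch this corresponds to iterating over the existing facts.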

5.4.4. Filter Literal

Rule evaluation can be seen as the process of finding satisfying assignments for variables appearing in the rule. A filter literal is of the following form, and it restricts satisfying assignments of variables to those for which the expression exp evaluates to true. Thus, when the user writes a filter literal, the expression is expected to provide truth values.

FILTER(exp)

As with bind literals, every variable used in exp must be bound either by a body atom or by a bind literal.

Example Using filter literals

:PosNum[?x] :- :Num[?x], FILTER(?x > 0) .

The rule says that a number is positive if it is larger than zero.

5.4.5. Aggregate Literal

An aggregate literal applies an aggregate function to groups of values to produce one value for each group. An aggregate literal has the form

AGGREGATE(B1, …, Bk ON ?X1, …, ?Xj BIND f1(exp1) AS t1 … BIND fn(expn) AS tn)

where k ≥ 1, j ≥ 0, n ≥ 1, and

  • B1, …, Bk are atoms,

  • ?X1, …, ?Xj are variables appearing in B1, …, Bk,

  • exp1, …, expn are expressions constructed using variables from B1, …, Bk,

  • f1, …, fn are aggregate functions, and

  • t1, …, tn are constants or variables that do not appear in B1, …, Bk.

Sometimes the user might be interested in computing an aggregate value from a set of distinct values. In this case, the keyword “distinct” can be used in front of an expression expi.

Note RDFox will reject rules that use aggregation in all equality modes other than off (see Equality).

Example Using aggregate literals

:minTemp[?x, ?z] :-
   :City[?x],
   AGGREGATE(
      :temp[?x, ?y]
      ON ?x
      BIND MIN(?y) AS ?z) .

Informally, the above rule computes a minimum temperature for each city.
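The grouping behaviour can be illustrated with a small Python sketch (not RDFox code). The function name and the sample temperatures are hypothetical, and the sketch assumes that a city without any readings yields no fact because no group exists for it.

```python
from collections import defaultdict


def min_temp_per_city(cities, temps):
    """Group temperature facts by city (the ON variable) and keep the minimum per group."""
    groups = defaultdict(list)
    for city, value in temps:
        groups[city].append(value)
    # Only resources that are cities and have at least one reading produce a result.
    return {city: min(values) for city, values in groups.items() if city in cities}


# Hypothetical data: "cambridge" has no readings, "lab" is not a city.
cities = {"oxford", "london", "cambridge"}
temps = [("oxford", 7), ("oxford", 3), ("london", 5), ("lab", -20)]
min_temp = min_temp_per_city(cities, temps)
print(sorted(min_temp.items()))
```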

Example Using the keyword distinct

:familyFriendCnt[?x, ?cnt] :-
    :Family[?x],
    AGGREGATE(
        :hasMember[?x, ?y],
        :hasFriend[?y, ?z]
        ON ?x
        BIND COUNT(DISTINCT ?z) AS ?cnt) .

This rule counts the number of different friends for each family; a person is considered a friend of a family if they are a friend of a member of the family.

5.5. Common Uses of Rules in Practice

This section describes common uses of rules and reasoning in practical applications. This section will be especially useful for practitioners who are seeking to understand how the reasoning capabilities provided by RDFox can enhance graph data management.

5.5.1. Computing the Transitive Closure of a Relation

In many other situations, we may have a relation that is not transitive, but we are interested in defining a different relation that “transitively closes” it. Consider a social network where users follow other users. The graph may be represented by the following triples.

:alice :follows :bob .
:bob :follows :charlie .
:diana :follows :alice .

A common task in social networks is to use existing connections to suggest new ones. For example, since Alice follows Bob and Bob follows Charlie, the system may suggest that Alice follow Charlie as well. Likewise, the system may suggest that Diana follow Bob; but then, if Diana follows Bob, she may also want to follow Charlie. We would like to construct an enhanced social network that contains the actual follows relations plus all the suggested additional links. The links in such an enhanced social network represent the transitive closure of the original follows relation, which relates any pair of people who are connected by a path in the network. The transitive closure of the follows relation can be computed using RDFox by defining the following two rules:

[?x, :followsClosure, ?y] :- [?x, :follows, ?y] .

[?x, :followsClosure, ?z] :-
   [?x, :follows, ?y],
   [?y, :followsClosure, ?z] .

The first rule “copies” the contents of the direct follows relation to the new relation. The second rule implements the closure by saying that if a person p1 directly follows p2 and p2 (directly or indirectly) follows person p3, then p1 (indirectly) follows p3.

If we now issue the SPARQL query

SELECT ?x ?y WHERE { ?x :followsClosure ?y }

we obtain the expected results.

:diana :charlie .
:alice :charlie .
:diana :bob .
:alice :bob .
:bob :charlie .
:diana :alice .

Finally, we may also be interested in computing the suggested links that were not already part of the original follows relation. This can be achieved, for instance, by issuing the SPARQL query

SELECT ?x ?y
WHERE {
   ?x :followsClosure ?y .
   FILTER NOT EXISTS { ?x :follows ?y }
}

The results are the expected ones.

:diana :charlie .
:alice :charlie .
:diana :bob .

5.5.2. Composing Relations

An important practical use of knowledge graphs is to power Open Query Answering (Open QA) applications, where the user would pose a question in natural language, which is then automatically answered against the graph. Open QA systems often struggle to interpret questions that involve several “hops” in the graph. For instance, consider the graph consisting of the triples given next.

:douglas_adams :bornIn :uk .
:uk rdf:type :Country .

A user may ask the Open QA system for the country of birth of Douglas Adams. To obtain this information, the system would need to construct a query involving two hops in the graph. In particular, the SPARQL query

SELECT ?c
WHERE {
  :douglas_adams :bornIn  ?c .
  ?c rdf:type :Country .
}

would return :uk as the answer.

The results of the Open QA system would be greatly enhanced if the desired information had been available in just a single hop. RDFox rules can be used to provide a clean solution in this situation. In particular, we can use rules to define a new :countryOfBirth relation that provides a “shortcut” for directly accessing the desired information.

[?x, :countryOfBirth, ?y] :- [?x, :bornIn, ?y], [?y, rdf:type, :Country] .

The rule says that, if a person p is born in a place c, and that place is a country, then c is the country of birth of p. As a result, RDFox would derive that the country of birth of Douglas Adams is the UK. The Open QA system would now only need to construct the following simpler query, which involves a single hop in the graph, to obtain the desired information.

SELECT ?x ?y WHERE { ?x :countryOfBirth ?y }
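The effect of the shortcut rule can be sketched in Python as a join between the :bornIn edges and the set of countries. This is an illustration only, with triples modelled as plain tuples and a hypothetical function name.

```python
def country_of_birth(triples):
    """Derive a :countryOfBirth edge for each :bornIn edge whose target is a :Country."""
    countries = {s for (s, p, o) in triples if p == "rdf:type" and o == ":Country"}
    return {
        (s, ":countryOfBirth", o)
        for (s, p, o) in triples
        if p == ":bornIn" and o in countries
    }


graph = {
    (":douglas_adams", ":bornIn", ":uk"),
    (":uk", "rdf:type", ":Country"),
}
derived = country_of_birth(graph)
print(derived)
```

Note that the class name in the rule body must match the data exactly (:Country); otherwise the join finds no countries and nothing is derived.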

5.5.3. Representing SPARQL 1.1 Property Paths

As already mentioned, RDFox does not currently support SPARQL 1.1 property paths. It is, however, possible to encode property paths as rules.

Informally, a property path searches through the RDF graph for a sequence of IRIs that form a path conforming to a regular expression. For instance, the following query in our familiar social network example

SELECT ?x WHERE { ?x :follows+ :bob }

returns the set of people that follow :bob directly or indirectly in the network. In this case, the property path (?x :follows+ :bob) represents a path of arbitrary length from any node to :bob via the :follows relation, where the “+” symbol is the familiar one in regular expressions indicating “one or more occurrences”.

Property paths representing paths of arbitrary length are closely related to computing the transitive closure of a relation. In particular, the following rules would compute the set of “Bob followers” as those who follow :bob directly or indirectly.

[?x, rdf:type, :BobFollower] :- [?x, :follows, :bob] .

[?x, rdf:type, :BobFollower] :-
   [?x, :follows, ?y],
   [?y, rdf:type, :BobFollower] .

The simple query

SELECT ?x WHERE { ?x rdf:type :BobFollower }

gives us the same answers as the original query using property paths.
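The two rules amount to a backward search from :bob along :follows edges. The Python sketch below illustrates this with a breadth-first traversal; it is an illustration of the semantics, not of RDFox's evaluation strategy, and the function name is hypothetical.

```python
from collections import deque


def followers_of(target, follows):
    """Return everyone who reaches `target` through one or more follows edges."""
    # Index followers by the person they follow so that edges can be walked backwards.
    followed_by = {}
    for x, y in follows:
        followed_by.setdefault(y, set()).add(x)

    found, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for follower in followed_by.get(node, ()):
            if follower not in found:
                found.add(follower)
                queue.append(follower)
    return found


follows = {("alice", "bob"), ("bob", "charlie"), ("diana", "alice")}
bob_followers = followers_of("bob", follows)
print(sorted(bob_followers))
```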

5.5.4. Defining a Query as a View

When querying a knowledge graph, we may be interested in materializing the result of a SPARQL query as a new relation in the graph. This can be the case, for instance, if the query is interesting in its own right, can be used to define new relations, or can simplify the formulation of additional queries.

We can use an RDFox rule for this purpose, where the SPARQL query that we want to materialize in the graph is represented in the body of the rule and the answer as a new relation in the head.

For instance, consider again the previous example of a social network, where we were interested in suggesting new followers (recall the Transitive Closure usage pattern). Recall that we used a query

SELECT ?x ?y
WHERE {
    ?x :followsClosure ?y
    FILTER NOT EXISTS { ?x :follows ?y }
}

to obtain the suggested links that were not already part of the original follows relation. We may be interested in storing this query as a separate relation in the graph. For this, we could rewrite the query as a rule defining a new :suggestFollows relation:

[?x, :suggestFollows, ?y] :- [?x, :followsClosure, ?y], NOT [?x, :follows, ?y] .

The body of the rule represents the WHERE clause in the query. The filter expression in the query is captured by the negated atom. Then, the simple query

SELECT ?x ?y WHERE { ?x :suggestFollows ?y }

will give us the expected answers

:diana :charlie .
:alice :charlie .
:diana :bob .

It is worth pointing out that only a subset of SPARQL 1.1 queries can be transformed into an RDFox rule in the way described. In particular, all queries involving basic graph patterns, filter expressions, negation (NOT EXISTS, MINUS) and aggregation can be represented. In contrast, SPARQL queries with more than two answer variables, or using OPTIONAL or UNION in the WHERE clause cannot be represented as rules.

5.5.5. Performing Calculations and Aggregating Data

RDFox rules can be used to perform computations over the data in a knowledge graph and store the results in a different relation. For instance, consider a graph with the following triples, specifying the height of different people in cm.

:alice :height "165"^^xsd:integer .
:bob :height "180"^^xsd:integer .
:diana :height "168"^^xsd:integer .
:emma :height "165"^^xsd:integer .

We want to compute their height in feet and record it in the graph by adding suitable triples over a new relation. For this, we can import the following RDFox rule.

[?x, :heightInFeet, ?y] :- [?x, :height, ?h], BIND(?h * 0.0328 AS ?y) .

The BIND construct evaluates an expression and assigns the value of the expression to a variable.

We can now query the graph for the newly introduced relation to obtain the list of people and their height in both centimeters and feet.

SELECT ?x ?m ?f
WHERE {
    ?x :height ?m .
    ?x :heightInFeet ?f .
}

This query returns the expected answers.

:emma 165 5.412 .
:diana 168 5.5104 .
:bob 180 5.904 .
:alice 165 5.412 .

Rules can also be used to compute aggregated values (e.g., sums, counts, and averages) over the graph and store the results in a new relation. Consider, for instance, the following social network graph.

:alice :follows :bob .
:bob :follows :charlie .
:diana :follows :alice .
:charlie :follows :alice .
:emma :follows :bob .
:alice rdf:type :Person .
:bob rdf:type :Person .
:charlie rdf:type :Person .
:diana rdf:type :Person .
:emma rdf:type :Person .

The graph also contains information about people’s hobbies, as represented by the following triples.

:alice :likes :tennis .
:bob :likes :music .
:diana :likes :swimming .
:charlie :likes :football .
:emma :likes :reading .
:tennis rdf:type :Sport .
:swimming rdf:type :Sport .
:football rdf:type :Sport .

We would like to count, for each person, the number of followers who enjoy practicing a sport. RDFox provides aggregation constructs which enable these kinds of computations.

[?y, :sportyFollowerCnt, ?cnt] :-
    [?y, rdf:type, :Person],
    AGGREGATE(
        [?x, :follows, ?y],
        [?x, :likes, ?w],
        [?w, rdf:type, :Sport]
        ON ?y
        BIND COUNT(DISTINCT ?x) AS ?cnt) .

In particular, the rule states that, if p1 is a person, then we count all distinct people who follow p1 and who like some sport, and store the resulting count in the new :sportyFollowerCnt relation.

By issuing the following SPARQL query

SELECT ?x ?cnt WHERE { ?x :sportyFollowerCnt ?cnt }

we obtain that Bob has one sporty follower (Alice), whereas Alice has two sporty followers (Diana and Charlie).

:bob 1 .
:alice 2 .

This type of computation is compatible with the computation of the transitive closure of a relation. For instance, we may be interested in counting the number of (direct or indirect) followers who are sporty. For this, we can use RDFox rules to compute the transitive closure of the follows relation:

[?x, :followsClosure, ?y] :- [?x, :follows, ?y] .
[?x, :followsClosure, ?z] :- [?x, :follows, ?y], [?y, :followsClosure, ?z] .

And use the following rule to compute the desired count.

[?y, :sportyFollowerClosureCnt, ?cnt] :-
    [?y, rdf:type, :Person],
    AGGREGATE(
        [?x, :followsClosure, ?y],
        [?x, :likes, ?w],
        [?w, rdf:type, :Sport]
        ON ?y
        BIND COUNT(DISTINCT ?x) AS ?cnt) .

The following SPARQL query

SELECT ?x ?cnt WHERE { ?x :sportyFollowerClosureCnt ?cnt }

then provides the following results.

:charlie 3 .
:bob 3 .
:alice 3 .

We observe that the count for Charlie does not seem quite right. Charlie is followed directly only by Bob (who is not sporty); however, Bob is followed by Alice (a sporty person) and Alice is followed by Diana (another sporty person). Naturally, we would have expected a count of 2; however, Charlie also follows Alice and hence he transitively follows himself, which explains the count of 3. To prevent this situation, we can modify the second rule implementing transitive closure to eliminate self-loops as follows:

[?x, :followsClosure, ?z] :-
   [?x, :follows, ?y],
   [?y, :followsClosure, ?z],
   FILTER(?x != ?z) .

Now, the previous query yields the expected results.

:charlie 2 .
:bob 3 .
:alice 2 .
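Both sets of counts can be reproduced with the following Python sketch, which computes the closure with and without the self-loop filter and then counts distinct sporty followers per person. It is a naive illustration with hypothetical function names, not a description of how RDFox evaluates the rules.

```python
def follows_closure(follows, drop_self_loops=False):
    """Naive fixpoint for the closure rules; optionally mimic FILTER(?x != ?z)."""
    result = set(follows)
    changed = True
    while changed:
        changed = False
        for (x, y) in follows:
            for (y2, z) in list(result):
                if y != y2 or (x, z) in result:
                    continue
                if drop_self_loops and x == z:
                    continue  # the filter rejects pairs relating a person to themselves
                result.add((x, z))
                changed = True
    return result


def sporty_follower_counts(closure, people, likes, sports):
    """Count distinct sporty followers per person; people without any get no entry."""
    sporty = {person for (person, hobby) in likes if hobby in sports}
    groups = {}
    for (x, y) in closure:
        if y in people and x in sporty:
            groups.setdefault(y, set()).add(x)
    return {y: len(xs) for y, xs in groups.items()}


follows = {("alice", "bob"), ("bob", "charlie"), ("diana", "alice"),
           ("charlie", "alice"), ("emma", "bob")}
people = {"alice", "bob", "charlie", "diana", "emma"}
likes = {("alice", "tennis"), ("bob", "music"), ("diana", "swimming"),
         ("charlie", "football"), ("emma", "reading")}
sports = {"tennis", "swimming", "football"}

with_loops = sporty_follower_counts(follows_closure(follows), people, likes, sports)
without_loops = sporty_follower_counts(
    follows_closure(follows, drop_self_loops=True), people, likes, sports)
print(sorted(with_loops.items()))
print(sorted(without_loops.items()))
```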

5.5.6. Arranging Concepts and Relations in a Hierarchical Structure

A common use of ontologies is to arrange concepts (called classes in OWL 2) and relations (called properties in OWL 2) in a subsumption hierarchy. For instance, we may want to say that dogs and cats are mammals and that mammals are animals. Such subsumption relationships can be easily represented using RDFox rules.

[?x, rdf:type, :Mammal] :- [?x, rdf:type, :Dog] .

[?x, rdf:type, :Mammal] :- [?x, rdf:type, :Cat] .

[?x, rdf:type, :Animal] :- [?x, rdf:type, :Mammal] .

Suppose that we have a graph with the following triples:

:max rdf:type :Dog .
:coco rdf:type :Cat .
:teddy rdf:type :Mammal .

Then, RDFox will deduce that Max and Coco are both mammals and therefore also animals, and also that Teddy is an animal. In particular, the query

SELECT ?x WHERE { ?x rdf:type :Animal }

yields the expected results

:max .
:teddy .
:coco .

It is also often the case that concepts are “assigned” certain properties. For instance, mammals have children which are also mammals. This is known as a range restriction in ontology jargon, and can be represented using the following RDFox rule.

[?y, rdf:type, :Mammal] :- [?x, rdf:type, :Mammal], [?x, :hasChild, ?y] .

Suppose that we now extend the graph with the following triples.

:max :hasChild :betsy .
:coco :hasChild :minnie .

RDFox will derive automatically that both Betsy and Minnie are also mammals (and therefore also animals). Indeed, the query

SELECT ?x WHERE { ?x rdf:type :Mammal }

will yield the expected results.

:max .
:betsy .
:minnie .
:teddy .
:coco .

In many applications, it is also useful to represent subsumption relations between the edges in a knowledge graph, to specify that one relation is more specific than the other. For instance, we may want to say that the :hasDaughter relation is more specific than the :hasChild relation. This can be represented using the following RDFox rule.

[?x, :hasChild, ?y] :- [?x, :hasDaughter, ?y] .

If we now add the following triple to the graph

:betsy :hasDaughter :luna .

RDFox can infer that Luna is the child of Betsy and therefore she is also a mammal, and an animal. Indeed, the previous query listing all mammals will now also include :luna as an answer.
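The interplay between the class hierarchy, the range restriction, and the sub-property rule can be traced with the following Python sketch. It hard-codes the rules of this section in a simple fixpoint loop; the function name and the data encoding are assumptions made for illustration.

```python
SUBCLASS_OF = {"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal"}


def materialize(types, has_child, has_daughter):
    """Apply the subsumption, range, and sub-property rules until nothing new is derived."""
    types = set(types)
    has_child = set(has_child) | set(has_daughter)  # every daughter is a child
    changed = True
    while changed:
        changed = False
        # Class subsumption: an instance of a class is an instance of its superclass.
        new = {(x, SUBCLASS_OF[c]) for (x, c) in types if c in SUBCLASS_OF}
        # Range restriction: children of mammals are mammals.
        mammals = {x for (x, c) in types if c == "Mammal"}
        new |= {(y, "Mammal") for (x, y) in has_child if x in mammals}
        if not new <= types:
            types |= new
            changed = True
    return types, has_child


types, has_child = materialize(
    types={("max", "Dog"), ("coco", "Cat"), ("teddy", "Mammal")},
    has_child={("max", "betsy"), ("coco", "minnie")},
    has_daughter={("betsy", "luna")},
)
mammals = {x for (x, c) in types if c == "Mammal"}
animals = {x for (x, c) in types if c == "Animal"}
print(sorted(mammals))
```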

5.5.7. Detecting Cyclic Relations

A common task in knowledge graphs is to identify cyclic relationships. For instance, partonomy relations are typically acyclic (e.g., if an engine is part of a car we would not expect the car also to be part of the engine!). In these cases, cycle detection may be needed to detect errors in the graph and thus provide data validation.

A simple case of this pattern is when the relation we are checking for cyclicity is naturally transitive. Such is the case, for instance, of the partOf relation. Consider the following graph:

:a :partOf :b .
:b :partOf :c .
:c :partOf :a .

The graph contains a cyclic path :a -> :b -> :c -> :a via the :partOf relation. The relationship is naturally transitive and hence we can use the corresponding pattern to define it as such.

[?x, :partOf, ?z] :- [?x, :partOf, ?y], [?y, :partOf, ?z] .

The following SPARQL query now retrieves all pairs of elements such that the first is part of the second (directly or indirectly).

SELECT ?x ?y WHERE { ?x :partOf ?y }

It returns the following results.

:a :a .
:c :c .
:b :b .
:a :c .
:b :a .
:c :b .
:c :a .
:b :c .
:a :b .

Cyclicity manifests itself by the presence of self-loops (e.g., :a is derived to be a part of itself). Hence, it is possible to detect that the partOf relation is cyclic by issuing the following SPARQL query.

ASK { ?x :partOf ?x }

The result is true, since the partonomy relation does have a self-loop.

Alternatively, we could have defined the following additional rule.

[:partOf, rdf:type, :CyclicRelation] :- [?x, :partOf, ?x] .

This rule tells us that if any object is determined to be a part of itself, then the partonomy relation is cyclic.

We can now issue the following SPARQL query, which retrieves the list of cyclic relations in the graph, which in this case consists of the relation :partOf.

SELECT ?x WHERE { ?x rdf:type :CyclicRelation }
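The same check can be sketched in Python: close the relation under transitivity and look for a pair that relates an element to itself. The function names are hypothetical and the code is meant only to illustrate the pattern.

```python
def transitive_closure(edges):
    """Close a relation under the transitivity rule by repeated joining."""
    closure = set(edges)
    while True:
        derived = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if derived <= closure:
            return closure
        closure |= derived


def is_cyclic(edges):
    """A relation is cyclic if its transitive closure contains a self-loop."""
    return any(x == y for (x, y) in transitive_closure(edges))


part_of = {("a", "b"), ("b", "c"), ("c", "a")}
print(len(transitive_closure(part_of)), is_cyclic(part_of))
```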

5.5.8. Defining Attributes and Relationships as Mandatory

In knowledge graphs, data is typically incomplete.

For instance, suppose that the data in a knowledge graph has been obtained from a variety of sources. The graph has different types of information about people, such as their name, job title and so on. We notice that some people in the graph have a date of birth, whereas others do not. Because of the nature of our application, we would like to have the date of birth of each person represented in the graph, and would like to find out which people are missing this information; that is, we would like to make the presence of a date of birth value mandatory for every person in the graph. In relational databases this is typically solved by declaring an integrity constraint.

Consider the following graph.

:alice :dob "11/01/1987"^^xsd:string .
:alice rdf:type :Person .
:bob   :dob "23/07/1980"^^xsd:string .
:bob rdf:type :Person .
:diana :height "168"^^xsd:integer .
:diana rdf:type :Person .
:emma  :dob "10/02/1965"^^xsd:string .
:emma rdf:type :Person .
:max rdf:type :Dog .

We can use the following rule to record absence of a date of birth for people.

[?x, rdf:type, owl:Nothing] :-
   [?x, rdf:type, :Person],
   NOT EXISTS ?y IN ([?x, :dob, ?y]) .

The rule says that if a person p lacks a date of birth, then p incurs a constraint violation. The constraint violation is recorded by making person p an instance of the special owl:Nothing unary relation, which is also present in the OWL 2 standard.

The following SPARQL query then correctly reports that Diana violates the constraint (whereas Max does not because he is a dog).

SELECT ?x WHERE { ?x rdf:type owl:Nothing }

This type of computation combines well with type inheritance. For instance, suppose that we add the following triple:

:charlie rdf:type :Student .

And the following rule stating that every student is a person

[?x, rdf:type, :Person] :- [?x, rdf:type, :Student] .

Then, the previous query will give as results

:charlie .
:diana .

Indeed, since Charlie is a student, he is also a person; furthermore, Charlie lacks date of birth information.
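The check amounts to a set difference between all persons (including inferred ones) and the resources that have a date of birth. The Python sketch below illustrates this; it handles a single level of subclassing, which is all this example needs, and its function name is hypothetical.

```python
def missing_dob(types, dob, subclass_of):
    """Return the persons (asserted or inferred via one subclass step) lacking a :dob value."""
    persons = {
        x for (x, c) in types
        if c == "Person" or subclass_of.get(c) == "Person"
    }
    with_dob = {x for (x, _) in dob}
    return persons - with_dob


types = {("alice", "Person"), ("bob", "Person"), ("diana", "Person"),
         ("emma", "Person"), ("max", "Dog"), ("charlie", "Student")}
dob = {("alice", "11/01/1987"), ("bob", "23/07/1980"), ("emma", "10/02/1965")}

violations = missing_dob(types, dob, subclass_of={"Student": "Person"})
print(sorted(violations))
```

As with the rules, the class names must match the data exactly: if the subclass mapping referred to "student" instead of "Student", Charlie would not be recognised as a person and no violation would be reported for him.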

The meaning of the special class owl:Nothing is different in RDFox and the OWL 2 standard. If one can derive from an OWL 2 ontology that an object is an instance of owl:Nothing, then the ontology is inconsistent and querying the ontology becomes logically meaningless. Thus, the OWL 2 standard would require users to modify the data and/or ontology to fix the inconsistency prior to attempting to issue queries. Furthermore, it is worth noting that in OWL 2 it is not possible to write statements that check for “absence of information”; this is due to the monotonicity properties of OWL 2 as a fragment of first-order logic.

In contrast, in RDFox, deriving an instance of owl:Nothing does not lead to a logical inconsistency and the answers to queries remain perfectly meaningful. In the pattern we have described, querying for owl:Nothing simply provides users with the list of all nodes in the graph for which mandatory information is missing. As a result, the user is warned rather than prevented from carrying out a task such as issuing a query. For instance, if we were to ask a query to RDFox such as the following

SELECT ?x WHERE { ?x rdf:type :Person }

we would still obtain the expected results (see below) despite the fact that there are constraint violations in the data.

:alice .
:charlie .
:emma .
:diana .
:bob .

This behavior is also different from relational databases, where the system would typically reject updates that lead to a constraint violation. As already mentioned, RDFox continues to operate normally and would accept any updates although constraints are being violated. Of course, users are encouraged to query the system in order to detect and rectify such violations.

5.5.9. Expressing Defaults and Exceptions

Rules can be used to write default statements (that is, statements that normally hold in the absence of additional information). This is especially useful to represent exceptions to rules, which is important, for instance, in legal domains.

Consider the following graph saying that Tweety is a bird.

:tweety rdf:type :Bird .

Birds typically fly; that is, in the absence of additional information, the fact that Tweety is a bird constitutes sufficient evidence to believe that Tweety flies. There may, however, be exceptions. For instance, penguins are non-flying birds, and hence if we were to find out that Tweety is a penguin, then we would need to withdraw our default assumption that Tweety flies.

RDFox rules can be used to model this type of default reasoning. In particular, consider a rule saying that birds fly unless they are penguins.

[?x, rdf:type, :FlyingAnimal] :-
   [?x, rdf:type, :Bird],
   NOT [?x, rdf:type, :Penguin] .

We can now issue a SPARQL query asking for the list of flying animals

SELECT ?x WHERE { ?x rdf:type :FlyingAnimal }

and obtain :tweety as an answer.

Suppose now that we were to extend the graph with the following triple

:tweety rdf:type :Penguin .

Then, the same query would now give us an empty set of answers since, in the light of the new evidence, we can no longer conclude that Tweety flies.
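The non-monotonic behaviour (adding a fact removes a conclusion) can be illustrated in Python by recomputing the conclusion after the graph changes. This is a sketch with a hypothetical function name; RDFox itself maintains the materialization as the data is updated.

```python
def flying_animals(types):
    """Birds fly by default unless they are known to be penguins."""
    birds = {x for (x, c) in types if c == "Bird"}
    penguins = {x for (x, c) in types if c == "Penguin"}
    return birds - penguins


facts = {("tweety", "Bird")}
before = flying_animals(facts)

facts.add(("tweety", "Penguin"))  # new evidence arrives
after = flying_animals(facts)

print(before, after)
```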

5.5.10. Restructuring Data

Rules can be used to transform the structure of the data in a knowledge graph (e.g., by adding properties to a relationship).

Consider the following knowledge graph representing employees and their employer.

:alice :worksFor :oxford_university .
:bob :worksFor :acme .
:charlie :worksFor :oxford_university .
:charlie :worksFor :acme .

Suppose that we now want to expand the graph by adding further information about the employment, such as the salary and the start date. This information is relative to each specific employment of an employee; for instance, Charlie will have a different salary and start date for his employment with Oxford University and his employment with Acme.

We can use RDFox rules to automatically restructure the data in the graph to account for the new information.

[?z, rdf:type, :Employment],
[?z, :Employee, ?x],
[?z, :employer, ?y] :-
    [?x, :worksFor, ?y],
    BIND(SKOLEM("Employment", ?x, ?y) AS ?z) .

For each edge connecting a person x with their employer y, the rule creates a new employment instance z as an RDF blank node, and relates it to employee x and employer y. The name of the generated instance starts with the underscore character indicating that it is a blank node, followed by the string “Employment” and unique identifiers for the corresponding employee x and employer y.

The query

SELECT ?x ?y ?z WHERE { ?x ?y ?z . ?x rdf:type :Employment }

gives us the new triples generated by the application of the previous rule

_:Employment_116_200 :employer :acme .
_:Employment_116_200 :Employee :bob .
_:Employment_116_200 rdf:type :Employment .
_:Employment_113_199 :employer :oxford_university .
_:Employment_113_199 :Employee :alice .
_:Employment_113_199 rdf:type :Employment .
_:Employment_156_199 :employer :oxford_university .
_:Employment_156_199 :Employee :charlie .
_:Employment_156_199 rdf:type :Employment .
_:Employment_156_200 :employer :acme .
_:Employment_156_200 :Employee :charlie .
_:Employment_156_200 rdf:type :Employment .

It is important to notice that the generated Skolem IDs, such as those shown in the results above, cannot be considered stable across runs (or RDFox versions) since they are generated based on the dictionary IDs of the arguments. Further data relative to an employment, such as the associated salary and start date, should therefore not be inserted directly as triples, but rather via rules such as the following ones:

[?z, :salary, "60000"^^xsd:integer] :- BIND(SKOLEM("Employment", :alice, :oxford_university) AS ?z) .
[?z, :salary, "55000"^^xsd:integer] :- BIND(SKOLEM("Employment", :charlie, :oxford_university) AS ?z) .
[?z, :salary, "40000"^^xsd:integer] :- BIND(SKOLEM("Employment", :charlie, :acme) AS ?z) .
[?z, :salary, "45000"^^xsd:integer] :- BIND(SKOLEM("Employment", :bob, :acme) AS ?z) .

Note that each of these rules uses the SKOLEM construct in the antecedent to make sure that they match correctly to the generated triples listed above.

To check that the salary data has been inserted correctly, we can issue the query

SELECT ?x (SUM(?y) AS ?income)
WHERE {
   ?e :Employee ?x .
   ?e :salary ?y
}
GROUP BY ?x

which gives us the total yearly income for each person by summing up the salary of each of their employments, giving the expected results.

:alice 60000 .
:charlie 95000 .
:bob 45000 .
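The reification pattern can be sketched in Python by using a tuple of the arguments as the identifier of each employment node. The tuple plays the role of the SKOLEM expression: it is determined entirely by its arguments, so the salary data can refer to the same node without knowing any generated ID. All names below are illustrative assumptions.

```python
def reify(works_for):
    """Create one employment node per (employee, employer) edge, keyed by its arguments."""
    triples = set()
    for employee, employer in works_for:
        node = ("Employment", employee, employer)  # stands in for the Skolem term
        triples.add((node, "rdf:type", "Employment"))
        triples.add((node, "Employee", employee))
        triples.add((node, "employer", employer))
    return triples


works_for = {("alice", "oxford_university"), ("bob", "acme"),
             ("charlie", "oxford_university"), ("charlie", "acme")}
triples = reify(works_for)

# Salaries are attached by rebuilding the same key, mirroring the salary rules above.
salaries = {
    ("Employment", "alice", "oxford_university"): 60000,
    ("Employment", "charlie", "oxford_university"): 55000,
    ("Employment", "charlie", "acme"): 40000,
    ("Employment", "bob", "acme"): 45000,
}

# Total income per person, as in the GROUP BY query.
income = {}
for node, prop, value in triples:
    if prop == "Employee" and node in salaries:
        income[value] = income.get(value, 0) + salaries[node]

print(sorted(income.items()))
```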

Data restructuring via reification has multiple applications. In particular, RDF can directly represent only binary relations, and hence the representation of higher-arity relations is only possible through reification. Reification is also needed if we want to qualify or annotate edges in a graph (e.g., by adding weights, dates, or other relevant properties).

5.5.11. Representing Ordered Relations

Many relations naturally imply some sort of order, and in such cases we are often interested in finding the first and last elements of such orders. For instance, consider the managerial structure of a company.

:alice :manages :bob .
:bob :manages :jeremy .
:bob :manages :emma .
:emma :manages :david .
:jeremy :manages :monica .

We would like to recognize which individuals in the company are “top level managers”. We can use a rule to define a top level manager as a person who manages someone and is not managed by anyone else.

[?x, rdf:type, :TopLevelManager] :-
   [?x, :manages, ?y],
   NOT EXISTS ?z IN ([?z, :manages, ?x]) .

The query

SELECT ?x WHERE { ?x rdf:type :TopLevelManager }

asking for the list of top level managers gives us :alice as the answer. We can now use a rule to define “junior employees” as those who have a manager but who themselves do not manage anyone else.

[?x, rdf:type, :JuniorEmployee] :-
   [?y, :manages, ?x],
   NOT EXISTS ?z IN ([?x, :manages, ?z]) .

The query

SELECT ?x WHERE { ?x rdf:type :JuniorEmployee }

gives us :monica and :david as answers.
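Both rules reduce to comparing the set of people who manage someone with the set of people who are managed by someone, as the following Python sketch illustrates. The function name is hypothetical.

```python
def top_and_junior(manages):
    """Return (top level managers, junior employees) for a set of (manager, report) pairs."""
    managers = {x for (x, _) in manages}
    managed = {y for (_, y) in manages}
    # Top level: manages someone and is managed by no one.
    # Junior: is managed by someone and manages no one.
    return managers - managed, managed - managers


manages = {("alice", "bob"), ("bob", "jeremy"), ("bob", "emma"),
           ("emma", "david"), ("jeremy", "monica")}
top, junior = top_and_junior(manages)
print(sorted(top), sorted(junior))
```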

Prominent examples of ordered relations where we may be interested in finding the top and bottom elements are partonomies (part-whole relations) and is-a hierarchies.

5.5.12. Representing Equality Cliques

When integrating data from multiple sources using a knowledge graph, it is usually the case that objects from different sources are identified to be the same. In this setting, we want to be able to answer complex queries that span across the different sources, and to easily identify the source where the information came from. Additionally, we may not want to use the equality predicate owl:sameAs to identify the objects since our rule set may contain rules involving aggregation and/or negation-as-failure which cannot be used in conjunction with equality.

For instance, assume that we are integrating sources s1, s2, and s3 containing information about music artists and records. Assume that we have determined (e.g., using entity resolution techniques or exploiting explicit links between the sources) that “John Doe” in s1 is the same as “J. H. Doe” in s2 and “The Blues King” in s3. We can represent these correspondences using a binary relation ost:same which we define as reflexive, symmetric, and transitive using RDFox rules as given next.

s1:john_doe rdf:type s1:Artist .
s2:john_H_doe rdf:type s2:Performer .
s3:blues_king rdf:type s3:Musician .
s1:john_doe ost:same s2:john_H_doe .
s1:john_doe ost:same s3:blues_king .
s2:john_H_doe ost:same s3:blues_king .

[?x, ost:same, ?x] :- [?x, ost:same, ?y] .

[?y, ost:same, ?x] :- [?x, ost:same, ?y] .

[?x, ost:same, ?z] :- [?x, ost:same, ?y], [?y, ost:same, ?z] .

In this way, the aforementioned objects form a clique in the integrated graph. Indeed, the query

SELECT ?x ?y WHERE { ?x ost:same ?y }

returns the answer

s3:blues_king s2:john_H_doe .
s2:john_H_doe s3:blues_king .
s2:john_H_doe s2:john_H_doe .
s3:blues_king s3:blues_king .
s2:john_H_doe s1:john_doe .
s3:blues_king s1:john_doe .
s1:john_doe s1:john_doe .
s1:john_doe s3:blues_king .
s1:john_doe s2:john_H_doe .

In order to be able to query across artists from different sources, we want to define a unique representative for the elements in the clique. A plausible strategy is to first select the smallest individual according to some pre-defined total order (the order itself is irrelevant, and we can choose for example the order on IRIs provided by RDFox). To select the smallest object we introduce the following rules.

[?x, ost:comesBefore, ?y] :- [?x, ost:same, ?y], FILTER (?x < ?y)  .

[?y, rdf:type, ost:NotSmallestInClique] :- [?x, ost:comesBefore, ?y] .

[?x, rdf:type, ost:SmallestInClique] :-
    [?x, ost:comesBefore, ?y],
    NOT [?x, rdf:type, ost:NotSmallestInClique] .

The first rule generates an order amongst the elements of the clique. The second rule says that if ?x comes before ?y then ?y is not the smallest element. The third rule finally identifies the smallest element in the clique. The following query

SELECT ?x ?y WHERE { ?x ost:comesBefore ?y }

reveals the generated order

s2:john_H_doe s3:blues_king .
s1:john_doe s2:john_H_doe .
s1:john_doe s3:blues_king .

where s1:john_doe is correctly identified as the smallest element by the following query.

SELECT ?x WHERE { ?x rdf:type ost:SmallestInClique }
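The three rules can be mirrored step by step in Python. This is a sketch under one explicit assumption: Python's string comparison stands in for the order on IRIs provided by RDFox, which need not coincide with it in general. The negation in the third rule becomes a simple "not in" test once the set of non-smallest elements is complete.

```python
members = ["s1:john_doe", "s2:john_H_doe", "s3:blues_king"]

# The clique produced by the ost:same rules: every ordered pair of members.
same = {(x, y) for x in members for y in members}

# [?x, ost:comesBefore, ?y] :- [?x, ost:same, ?y], FILTER(?x < ?y) .
# Assumption: Python string order replaces the RDFox order on IRIs.
comes_before = {(x, y) for (x, y) in same if x < y}

# [?y, rdf:type, ost:NotSmallestInClique] :- [?x, ost:comesBefore, ?y] .
not_smallest = {y for (_, y) in comes_before}

# The rule with NOT: evaluated only after not_smallest has been fully computed.
smallest = {x for (x, _) in comes_before if x not in not_smallest}

print(sorted(comes_before))
print(smallest)
```

With this order the sketch yields the same three ost:comesBefore pairs as above and selects s1:john_doe as the smallest element.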

Now that we have identified an element of the clique we can create a representative of the clique using a Skolem constant, as given next.

[?z, rdf:type, ost:Artist],
[?z, ost:represents, ?x] :-
    [?x, rdf:type, ost:SmallestInClique],
    BIND(SKOLEM("OSTArtist", ?x) AS ?z) .

[?x, ost:represents, ?z] :-
   [?x, ost:represents, ?y],
   [?y, ost:comesBefore, ?z] .

The first rule creates the Skolem constant and states that it represents the smallest element. The second rule states that the Skolem constant also represents every other element in the clique.

The query

SELECT ?z ?x WHERE { ?z ost:represents ?x }

yields the expected result:

_:OSTArtist_2136 s2:john_H_doe .
_:OSTArtist_2136 s3:blues_king .
_:OSTArtist_2136 s1:john_doe .
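The propagation of the representative along ost:comesBefore can be sketched in Python as follows. The function skolem is a hypothetical stand-in for the built-in SKOLEM function: it only guarantees that equal arguments give equal labels, and the labels it produces do not look like the blank node shown above.

```python
def skolem(name, *args):
    # Hypothetical stand-in for SKOLEM: deterministic label built from the arguments.
    return "_:" + name + "_" + "_".join(args)


comes_before = {
    ("s1:john_doe", "s2:john_H_doe"),
    ("s1:john_doe", "s3:blues_king"),
    ("s2:john_H_doe", "s3:blues_king"),
}
smallest = {"s1:john_doe"}

# First rule: the Skolem constant represents the smallest element.
represents = {(skolem("OSTArtist", x), x) for x in smallest}

# Second rule, applied to a fixpoint: follow ost:comesBefore from represented elements.
changed = True
while changed:
    derived = {
        (r, z)
        for (r, y) in represents
        for (y2, z) in comes_before
        if y == y2
    }
    changed = not derived <= represents
    represents |= derived

for pair in sorted(represents):
    print(*pair)
```

All three objects end up with the same representative, so a query can join on ost:represents instead of enumerating the clique.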

It is possible to achieve the same results by using an optimized set of rules that generates fewer triples. In particular, this optimized representation avoids axiomatizing the ost:same property as reflexive and symmetric. Let’s reconsider the data.

s1:john_doe rdf:type s1:Artist .
s2:john_H_doe rdf:type s2:Performer .
s3:blues_king rdf:type s3:Musician .
s1:john_doe ost:same s2:john_H_doe .
s1:john_doe ost:same s3:blues_king .
s2:john_H_doe ost:same s3:blues_king .

We now define the ost:comesBefore relation directly using the following rules:

[?x, ost:comesBefore, ?y] :-
   [?x, ost:same, ?y],
   FILTER(?x > ?y) .

[?x, ost:comesBefore, ?y] :-
   [?y, ost:same, ?x],
   FILTER(?x > ?y) .

[?x, ost:comesBefore, ?z] :-
   [?x, ost:comesBefore, ?y],
   [?y, ost:comesBefore, ?z],
   FILTER(?x > ?y) .

[?x, ost:comesBefore, ?y] :-
   [?z, ost:comesBefore, ?x],
   [?z, ost:comesBefore, ?y],
   FILTER(?x > ?y) .

The query

SELECT ?x ?y WHERE { ?x ost:comesBefore ?y }

reveals a generated order. Once we have the order, we proceed as before.

5.5.13. Populating a Knowledge Graph from a Data Source

Rules can be used to bring information from an external data source into a knowledge graph.

Data feeding a knowledge graph often stems from different types of external data sources, such as relational databases. We can use RDFox rules to specify how each record in the external data source corresponds to a set of nodes and edges in the graph. RDFox allows us to load the information in an external data source by means of a two-stage process. The first step is to attach a data source and assign it to a relation. For instance, consider the following data about the employees of ACME corporation in a CSV file named “employee.csv”.

emp_id  emp_name  job_name   hire_date   salary
68319   KAYLING   PRESIDENT              200,000
66928   BLAZE     MANAGER    2017-05-01  90,000
67453   JONES     ASSISTANT  2018-05-03  35,000

We will attach the table to an RDFox relation with 5 arguments, one per column in the table. Before doing so, we register the file as an RDFox data source using the following command.

dsource add delimitedFile "EmployeeDS" \
  file "$(dir.root)csv/employee.csv" \
  header true

The net result is that employee.csv is added as an RDFox data source, which we called EmployeeDS. Here, file specifies the path to the file, and header indicates whether the file contains a header row. At this point, we can check whether the data has been attached correctly and whether the RDFox data source is in place by running the command

dsource show EmployeeDS

to obtain the expected information:

Data source type name: delimitedFile
Data source name:      EmployeeDS
Parameters:            file   = employee.csv
                       header = true
------------------------------------------------------------
Table name:            employee.csv
Column 1:              emp_id     xsd:integer
Column 2:              emp_name   xsd:string
Column 3:              job_name   xsd:string
Column 4:              hire_date  xsd:string
Column 5:              salary     xsd:integer
------------------------------------------------------------

The next step attaches the RDFox data source to an employee relation in RDFox.

dsource attach :employee "EmployeeDS"  \
   "columns"     5                       \
    "1"          "https://oxfordsemantic.tech/RDFox/tutorial/{1}_{2}" \
    "1.datatype" "iri"                   \
    "2"          "{emp_name}"            \
    "2.datatype" "string"                \
    "3"          "{job_name}"            \
    "3.datatype" "string"                \
    "4"          "{hire_date}"           \
    "4.datatype" "string"                \
    "4.if-empty" "absent"                \
    "5"          "{salary}"              \
    "5.datatype" "integer"               \
    "5.if-empty" "absent"

The IRI of the new relation will be :employee, where “:” is the default prefix defined beforehand as “https://oxfordsemantic.tech/RDFox/tutorial/”. The :employee data relation will contain 5 arguments. The first argument provides an identifier for each employee as a composition of the prefix’s IRI, the employee ID (first column in the data source) and the employee name (second column). The remaining arguments are obtained from the column of the corresponding name in the data source. Since not every employee may have a hiring date or a known salary, the conditions “if-empty” indicate that the corresponding argument in the RDFox relation will be left empty.

Once the relation has been created in RDFox, it can be queried and used in the antecedent of rules. As a first step, we can query the RDFox relation using SPARQL to check whether the data has been imported correctly in the relation. The SPARQL query

SELECT ?x ?y ?z ?u ?w WHERE { :employee(?x, ?y, ?z, ?u, ?w) }

will return the following answers:

:68319_KAYLING "KAYLING" "PRESIDENT" UNDEF 200000 .
:66928_BLAZE "BLAZE" "MANAGER" "2017-05-01" 90000 .
:67453_JONES "JONES" "ASSISTANT" "2018-05-03" 35000 .

As we can see, the UNDEF entry indicates that the hiring date of the first employee is missing. Now that we have the RDFox relation correctly in place, the next step is to turn the data in the relation into a graph. For this we can use the following rule, where the RDFox relation forms the antecedent and the edges generated from it are described in the consequent of the rule:

[?x, rdf:type, :Employee],
[?x, :worksFor, :acme],
[?x, :hasName, ?y],
[?x, :hasJob, ?z],
[?x, :hiredOnDate, ?u],
[?x, :salary, ?w] :-
    :employee(?x, ?y, ?z, ?u, ?w) .
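To make the mapping from records to edges concrete, the following Python sketch mimics both stages on an in-memory copy of the table. It rests on several assumptions that are not taken from RDFox: the file is comma-separated with salaries written without a thousands separator, an empty cell is modelled as None, and a consequent atom whose variable has no value is assumed to produce no triple. The function names employee_relation and to_triples are illustrative only.

```python
import csv
import io

# Assumption: comma-separated content, salaries without thousands separators.
CSV_TEXT = """emp_id,emp_name,job_name,hire_date,salary
68319,KAYLING,PRESIDENT,,200000
66928,BLAZE,MANAGER,2017-05-01,90000
67453,JONES,ASSISTANT,2018-05-03,35000
"""

PREFIX = "https://oxfordsemantic.tech/RDFox/tutorial/"


def employee_relation(text):
    """Yield 5-tuples shaped like the :employee relation; empty cells become None."""
    for row in csv.DictReader(io.StringIO(text)):
        iri = f"{PREFIX}{row['emp_id']}_{row['emp_name']}"  # template {1}_{2}
        hire_date = row["hire_date"] or None
        salary = int(row["salary"]) if row["salary"] else None
        yield (iri, row["emp_name"], row["job_name"], hire_date, salary)


def to_triples(relation):
    """Mirror the rule: one candidate triple per consequent atom."""
    triples = set()
    for (x, y, z, u, w) in relation:
        candidates = [
            (x, "rdf:type", ":Employee"),
            (x, ":worksFor", ":acme"),
            (x, ":hasName", y),
            (x, ":hasJob", z),
            (x, ":hiredOnDate", u),
            (x, ":salary", w),
        ]
        # Assumption: an atom with a missing value is simply skipped.
        triples.update(t for t in candidates if t[2] is not None)
    return triples


triples = to_triples(employee_relation(CSV_TEXT))
print(len(triples))
```

Under these assumptions each complete record contributes six triples, and the record without a hiring date contributes five.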

The materialization of the rule generates a graph from the data in the relation. The new relations in the graph can be used in other rules to define additional concepts and relations. For instance, we can add rules stating that every employee is a person and that every person with a salary higher than £50,000 pays tax at the higher rate.

[?x, rdf:type, :Person ] :- [?x, rdf:type, :Employee ] .

[?x, :taxRate, :higher-rate] :- [?x, rdf:type, :Person],  [?x, :salary, ?y], FILTER(?y > 50000) .

Now we can query the graph to obtain, for instance, the list of high income tax payers.

SELECT ?x WHERE { ?x :taxRate :higher-rate }

The query returns the expected results:

:68319_KAYLING .
:66928_BLAZE .

Data can be imported from different data sources and merged together in the graph. For instance, if we had a different employee table (e.g., for a different department) in another CSV, we could attach to it a new RDFox data source and exploit a rule akin to the one before to further populate the binary relations in the graph, as well as to create new ones.

5.6. OWL 2 Support in RDFox

This section describes the support in RDFox for OWL 2—the W3C standard language for representing ontologies.

5.6.1. OWL 2 Ontologies

An OWL 2 ontology is a formal description of a domain of interest. OWL 2 defines three different syntactic categories.

The first syntactic category consists of entities, such as classes, properties and individuals, which are identified by an IRI. Classes represent sets of objects in the world; for instance, a class :Person can be used to represent the set of all people. Properties represent binary relations, and OWL 2 distinguishes between two different types of properties: data properties describe relationships between objects and literal values (e.g., the data property :age can be used to represent a person’s age), whereas object properties describe relationships between two objects (e.g., an object property :locatedIn can be used to relate places to their locations). Finally, individuals in OWL 2 are used to refer to concrete objects in the world; for instance, the individual :oxford can be used to refer to the city of Oxford.

The second syntactic category consists of expressions, which can be used to describe complex classes and relations constructed in terms of simpler ones. For instance, the expression ObjectUnionOf(:Cat :Dog) represents the set of animals that are either cats or dogs.

The third syntactic category consists of axioms, which are statements about entities and expressions that are asserted to be true in the domain described. For instance, the OWL 2 axiom SubClassOf(:Scientist :Person) states that every scientist is a person by defining the class :Scientist to be a subclass of the class :Person.

The main component of an OWL 2 ontology is a set of axioms. Ontologies can also import other ontologies and contain annotations.

OWL 2 ontologies can be written using different syntaxes. RDFox can currently load ontologies written in the functional syntax as well as ontologies written in the Turtle syntax.

5.6.2. OWL 2 Ontologies vs. RDFox Rules

OWL 2 and the rule language of RDFox are languages for knowledge representation with well-understood formal semantics.

Both languages share a common core. That is, certain types of rules can be equivalently rewritten as OWL 2 axioms and vice-versa. For instance, the following axiom and rule both express that every scientist is also a person.

SubClassOf(:Scientist :Person)

[?x, rdf:type, :Person] :- [?x, rdf:type, :Scientist] .

In particular, the OWL 2 specification describes the OWL 2 RL profile—a subset of the OWL 2 language that is amenable to implementation via rule-based technologies.

There are, however, many other aspects where OWL 2 and the rule language of RDFox differ, and there are many constructs in OWL 2 that cannot be translated as RDFox rules and vice-versa. For instance, OWL 2 can represent disjunctive knowledge, i.e., we can write an OWL 2 axiom saying that every student is either an undergraduate student, an MSc student, or a doctoral student:

SubClassOf(:Student ObjectUnionOf(:UndergraduateSt :MscSt :DoctoralSt) )

RDFox rules, however, do not support disjunction. There are also many kinds of rules in RDFox that cannot be expressed using OWL 2 axioms; these include, for instance, rules involving features such as aggregation, negation-as-failure or certain built-in functions; furthermore, there are also plain Datalog rules that do not have a correspondence in OWL 2.

5.6.3. Loading OWL 2 Ontologies in RDFox

RDFox is able to load, store and manipulate three kinds of syntactic elements: triples, rules, and OWL 2 axioms. These are kept in separate “bags” in the system and can be added or deleted individually. For instance, consider the following text file “ontology.txt” containing an ontology written in the functional syntax of OWL 2:

Prefix(:=<http://www.example.com/ontology1#>)
Ontology( <http://www.example.com/ontology1>
      SubClassOf( :Child :Person )
      SubClassOf( :Person ObjectUnionOf(:Child :Adult) )
)

The ontology contains two axioms. The first axiom tells us that every child is also a person, whereas the second axiom states that every person is either a child or an adult. The first axiom can be faithfully translated into RDFox rules, whereas the second one cannot. RDFox provides a full API for OWL 2 and can parse, store and manage all kinds of OWL 2 axioms in functional syntax. As a result, it will correctly load both axioms, but will issue a warning indicating that the second axiom has no correspondence into rules.

To load the ontology in RDFox, we can initialize a data store (see the Getting Started guide) and import the file in the usual way.

import ontology.txt

The ontology axioms are now loaded in the data store and kept internally in the “axioms bag”.

We can now import a Turtle file containing the following triples:

:jen rdf:type :Child .
:jen :hasParent :mary .

These triples will be kept internally in the “triples bag”.

Finally, we can import the following RDFox rule saying that the parent of a child is a person.

[?y, rdf:type, :Person] :- [?x, :hasParent, ?y], [?x, rdf:type, :Child] .

This rule is kept internally in RDFox in the separate “rules bag”.

Now, we are in a position to perform reasoning. For this we can issue a SPARQL query asking for the list of all people:

SELECT ?x WHERE { ?x rdf:type :Person }

To answer the query, RDFox will translate OWL 2 axioms into rules and will consider together all data triples, all RDFox rules added by the user, plus all rules stemming from the translation of OWL 2 axioms. In particular, the following rules and facts contribute to answering the query, where the first rule comes from the translation of the first ontology axiom as a rule (the second axiom in the ontology is ignored):

:jen rdf:type :Child .
:jen :hasParent :mary .

[?x, rdf:type, :Person] :- [?x, rdf:type, :Child] .

[?y, rdf:type, :Person] :- [?x, :hasParent, ?y], [?x, rdf:type, :Child] .

As a result, RDFox will return as answers both :jen and :mary. Indeed, :jen is a child and hence also a person by the first rule; in turn, :mary is the parent of :jen and hence also a person by the second rule.
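The derivation can be replayed with a naive forward-chaining loop. The Python sketch below is a minimal illustration of materialization over triple patterns, assuming that variables are strings starting with "?"; it is far simpler than the reasoning algorithms used in RDFox, and the names match and materialize are chosen for this example.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def materialize(facts, rules):
    """Apply all rules repeatedly until no new triple is derived."""
    facts = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            bindings = [{}]
            for atom in body:
                bindings = [
                    b2
                    for b in bindings
                    for f in facts
                    if (b2 := match(atom, f, b)) is not None
                ]
            for b in bindings:
                derived.add(tuple(b.get(term, term) for term in head))
        if derived <= facts:
            return facts
        facts |= derived


facts = {(":jen", "rdf:type", ":Child"), (":jen", ":hasParent", ":mary")}
rules = [
    # Translation of SubClassOf( :Child :Person )
    (("?x", "rdf:type", ":Person"), [("?x", "rdf:type", ":Child")]),
    # User rule: the parent of a child is a person
    (("?y", "rdf:type", ":Person"),
     [("?x", ":hasParent", "?y"), ("?x", "rdf:type", ":Child")]),
]

result = materialize(facts, rules)
people = sorted(s for (s, p, o) in result if p == "rdf:type" and o == ":Person")
print(people)
```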

The translation of OWL 2 axioms into rules for the purpose of reasoning is performed on a best-effort basis. In particular, sometimes RDFox may not be able to translate the whole of a given axiom, but may still be able to translate a part of it. For instance, suppose that we add to our data store the following axiom saying that every person is a human and also either an adult or a child:

SubClassOf(:Person ObjectIntersectionOf(:Human ObjectUnionOf(:Child :Adult)))

RDFox will load the axiom correctly, but will again issue a warning due to the use of disjunction in the axiom. Suppose that we now issue the query

SELECT ?x WHERE { ?x rdf:type :Human }

RDFox will correctly return both :jen and :mary as answers. Indeed, as already explained, RDFox can deduce that both :jen and :mary are persons. Now, although the last axiom we imported cannot be fully translated into rules, RDFox will still be able to partly translate it into the following rule:

[?x, rdf:type, :Human] :- [?x, rdf:type, :Person] .

from which we can deduce that :jen and :mary are also humans.
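The idea of partial translation can be sketched as follows. This Python fragment is not the translation procedure implemented in RDFox; it only illustrates, for subclass axioms with an atomic subclass, how a conjunction in the superclass can be split into one rule per conjunct while a disjunction is set aside. The tuple encoding of class expressions ("and", "or") is an assumption made for the example.

```python
def translate(sub, sup):
    """Return (rules, skipped) for SubClassOf(sub, sup) with an atomic subclass.

    rules is a list of (head_class, body_class) pairs; skipped collects the
    parts of the superclass expression that have no rule counterpart.
    """
    if isinstance(sup, str):
        return [(sup, sub)], []
    op, *args = sup
    if op == "and":
        rules, skipped = [], []
        for arg in args:
            r, s = translate(sub, arg)
            rules += r
            skipped += s
        return rules, skipped
    return [], [sup]  # e.g. "or": disjunction cannot be expressed as a rule


def show(rule):
    head, body = rule
    return f"[?x, rdf:type, :{head}] :- [?x, rdf:type, :{body}] ."


# SubClassOf(:Person ObjectIntersectionOf(:Human ObjectUnionOf(:Child :Adult)))
rules, skipped = translate("Person", ("and", "Human", ("or", "Child", "Adult")))
for rule in rules:
    print(show(rule))
print("not translated:", skipped)
```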

OWL 2 ontologies can also be loaded from a Turtle file, following the standard representation of OWL 2 ontologies as triples. In order to load an ontology from a Turtle file, we need to initialize a store with special parameters. Using the command line, we can initialize such a store as follows:

init par-complex-nn owl-in-rdf-support relaxed

This command creates a store in which parsing of OWL as triples is enabled. As a result, RDFox will identify OWL 2 axioms that were encoded as RDF triples and will translate those axioms into rules as described earlier. Suppose that we import into the store a Turtle file containing the following triples:

:Child rdfs:subClassOf :Person .
:Person rdfs:subClassOf :Human .
:jen rdf:type :Child .

The first two triples correspond to the serialization into triples of the following axioms in functional syntax:

SubClassOf( :Child :Person )
SubClassOf( :Person :Human )

As a result of parsing, all triples will be stored in the “triples bag” of RDFox, whereas the first two triples will also be added as axioms.

Now, assume that we issue a query asking for the list of all humans:

SELECT ?x WHERE { ?x rdf:type :Human }

Then, RDFox will correctly return :jen as the answer. Internally, RDFox will transform the OWL axioms into rules

[?x, rdf:type, :Person] :- [?x, rdf:type, :Child] .

[?x, rdf:type, :Human] :- [?x, rdf:type, :Person] .

and compute the corresponding materialization.

5.6.4. Subsumption Reasoning

OWL 2 reasoners implement a wide range of reasoning services, which are not limited to query answering. In particular, OWL reasoners can solve the subsumption problem: given a class, they would compute all its inferred superclasses.

For example, given

SubClassOf( :Child :Person )
SubClassOf( :Person :Human )

an OWL 2 reasoner would be able to infer

SubClassOf( :Child :Human )

as a consequence: from the fact that every child is a person and every person is a human, it follows that every child is also a human.

RDFox is a materialization-based query answering system, and it has not been designed for solving problems such as class subsumption. RDFox, however, is still able to detect some such subsumption relations should this be required in an application.

One way to achieve this is to reduce subsumption to query answering. In particular, to check whether it is true that every child is a human, we can introduce a fresh object in the data store, which we make an instance of :Child. That is, we can import the following triple, where :a_child is a fresh IRI.

:a_child rdf:type :Child .

Then, we would test whether :a_child is inferred to be also a human by issuing the query

ASK { :a_child rdf:type :Human }

which would return true.

Another way of testing subsumption is to import the ontology as a set of triples:

:Child rdfs:subClassOf :Person .
:Person rdfs:subClassOf :Human .

When the triples are parsed and eventually translated into rules for reasoning, RDFox will also add a number of internal rules that partially encode the semantics of the RDFS and OWL vocabularies; in particular, it will add rules representing the relation rdfs:subClassOf as transitive and reflexive, and also saying that every class is a subclass of owl:Thing:

[?x, rdfs:subClassOf, ?x],
[?x, rdfs:subClassOf, owl:Thing] :-
   [?x, rdf:type, owl:Class] .

[?y, rdfs:subClassOf, owl:Thing] :- [?x, rdfs:subClassOf, ?y] .

[?x, rdfs:subClassOf, ?z] :- [?x, rdfs:subClassOf, ?y], [?y, rdfs:subClassOf, ?z] .

As a result, the following SPARQL query

SELECT ?x WHERE { :Child rdfs:subClassOf ?x }

will correctly return all superclasses of :Child as

:Person .
:Human .
owl:Thing .
:Child .
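The query answer can be checked with a small closure computation over rdfs:subClassOf. The Python sketch below applies the subclass rules that fire on triples of the form [?x, rdfs:subClassOf, ?y], together with transitivity, until a fixpoint is reached; it covers only these two rules and is meant as an illustration, not as a description of RDFox internals.

```python
# Ontology triples: (subclass, superclass).
sub = {(":Child", ":Person"), (":Person", ":Human")}

while True:
    new = set()
    for (x, y) in sub:
        # [owl:Nothing, rdfs:subClassOf, ?x], [?x, rdfs:subClassOf, ?x],
        # [?y, rdfs:subClassOf, ?y], [?y, rdfs:subClassOf, owl:Thing]
        #     :- [?x, rdfs:subClassOf, ?y] .
        new |= {("owl:Nothing", x), (x, x), (y, y), (y, "owl:Thing")}
    # [?x, rdfs:subClassOf, ?z] :- [?x, rdfs:subClassOf, ?y], [?y, rdfs:subClassOf, ?z] .
    new |= {(x, z) for (x, y) in sub for (y2, z) in sub if y == y2}
    if new <= sub:
        break
    sub |= new

superclasses = {y for (x, y) in sub if x == ":Child"}
print(sorted(superclasses))
```

The set computed for :Child contains exactly the four classes returned by the query.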

The complete set of internal rules added by RDFox when loading an OWL ontology as a set of triples is given below:

[owl:Nothing, rdfs:subClassOf, ?x],
[?x, rdfs:subClassOf, ?x],
[?x, rdfs:subClassOf, owl:Thing] :-
    [?x, rdf:type, owl:Class] .

[owl:Nothing, rdfs:subClassOf, ?x],
[?x, rdfs:subClassOf, ?x],
[?y, rdfs:subClassOf, ?y],
[?y, rdfs:subClassOf, owl:Thing] :-
    [?x, rdfs:subClassOf, ?y] .

[?x, rdfs:subClassOf, ?z] :-
   [?x, rdfs:subClassOf, ?y],
   [?y, rdfs:subClassOf, ?z] .

[?x, rdfs:subClassOf, ?y],
[?y, rdfs:subClassOf, ?x] :-
   [?x, owl:equivalentClass, ?y] .

[?x, owl:equivalentClass, ?y] :-
   [?x, rdfs:subClassOf, ?y],
   [?y, rdfs:subClassOf, ?x] .

[owl:bottomDataProperty, rdfs:subPropertyOf, ?x],
[?x, rdfs:subPropertyOf, ?x],
[?x, rdfs:subPropertyOf, owl:topDataProperty] :-
    [?x, rdf:type, owl:DatatypeProperty] .

[owl:bottomObjectProperty, rdfs:subPropertyOf, ?x],
[?x, rdfs:subPropertyOf, ?x],
[?x, rdfs:subPropertyOf, owl:topObjectProperty] :-
    [?x, rdf:type, owl:ObjectProperty] .

[?x, rdfs:subPropertyOf, ?x],
[?y, rdfs:subPropertyOf, ?y] :-
   [?x, rdfs:subPropertyOf, ?y] .

[?x, rdfs:subPropertyOf, ?z] :-
   [?x, rdfs:subPropertyOf, ?y],
   [?y, rdfs:subPropertyOf, ?z] .

[?x, rdfs:subPropertyOf, ?y],
[?y, rdfs:subPropertyOf, ?x] :-
   [?x, owl:equivalentProperty, ?y] .

[?x, owl:equivalentProperty, ?y] :-
   [?x, rdfs:subPropertyOf, ?y],
   [?y, rdfs:subPropertyOf, ?x] .

[?p, rdfs:domain, ?b] :-
   [?p, rdfs:domain, ?a],
   [?a, rdfs:subClassOf, ?b] .

[?p, rdfs:domain, ?a] :-
   [?q, rdfs:domain, ?a],
   [?p, rdfs:subPropertyOf, ?q] .

[?p, rdfs:range, ?b] :-
   [?p, rdfs:range, ?a],
   [?a, rdfs:subClassOf, ?b] .

[?p, rdfs:range, ?a] :-
   [?q, rdfs:range, ?a],
   [?p, rdfs:subPropertyOf, ?q] .

5.6.5. Current Limitations

The following details should be taken into account by users of RDFox who rely on OWL 2 ontologies in their applications:

  • RDFox currently does not support ontology importation. That is, if we load ontology O, which in turn imports O1 and O2, only the contents of O will be loaded (and not those of O1 and O2).

  • RDFox also does not support associating axioms to a given ontology. In particular, if we load two different ontology files, all the axioms in both ontologies will be added to the same bag of axioms in the system.

5.7. Explaining Reasoning Results

RDFox can display a proof of how a given triple has been derived. Such proofs can be very useful for explaining reasoning results to users as well as for understanding the reasoning process.

Consider a data store containing the triple

:kiki rdf:type :Cat .

and the following rules:

[?x, rdf:type, :Mammal] :- [?x, rdf:type, :Cat] .

[?x, rdf:type, :Animal] :- [?x, rdf:type, :Mammal] .

As a result of reasoning, RDFox will derive the following new triples:

:kiki rdf:type :Mammal .
:kiki rdf:type :Animal .

Suppose that we want to understand how triple :kiki rdf:type :Animal has been derived. A way to do this in RDFox is to use the explain command in the shell as follows:

explain :Animal[:kiki]

RDFox will explicate the reasoning process by displaying the following proof of the requested fact:

:Animal[:kiki]
    :Animal[?x] :- :Mammal[?x] .  | { ?x -> :kiki }
        :Mammal[:kiki]
            :Mammal[?x] :- :Cat[?x] .  | { ?x -> :kiki }
                :Cat[:kiki]  EDB

We can read the proof bottom-up. Starting from fact :Cat[:kiki] in the data, we apply rule :Mammal[?x] :- :Cat[?x] by matching variable ?x to :kiki and derive the fact :Mammal[:kiki]. The application of rule :Animal[?x] :- :Mammal[?x] to fact :Mammal[:kiki] where ?x is matched to :kiki yields the desired result.

Typically, there will be several different proofs for a given fact. To see this, suppose that we add to our data store the triples

:kiki :eats :luxury_pet_treat .
:luxury_pet_treat rdf:type :PetFood .

and the rule

[?x, rdf:type, :Animal] :- [?x, :eats, ?y], [?y, rdf:type, :PetFood] .

Then, in addition to the previous one, the following is also a proof that :kiki is an animal:

:Animal[:kiki]
    :Animal[?x] :- :eats[?x,?y], :PetFood[?y] .  | { ?x -> :kiki, ?y -> :luxury_pet_treat }
        :eats[:kiki,:luxury_pet_treat]  EDB
        :PetFood[:luxury_pet_treat]  EDB

Indeed, we can match rule :Animal[?x] :- :eats[?x, ?y], :PetFood[?y] to the data facts :eats[:kiki, :luxury_pet_treat] and :PetFood[:luxury_pet_treat] by matching variable ?x to :kiki and variable ?y to :luxury_pet_treat to derive :Animal[:kiki].

If we run the explain command again

explain :Animal[:kiki]

RDFox will display both proofs.

:Animal[:kiki]
    :Animal[?x] :- :Mammal[?x] .  | { ?x -> :kiki }
        :Mammal[:kiki]
            :Mammal[?x] :- :Cat[?x] .  | { ?x -> :kiki }
                :Cat[:kiki]  EDB
    :Animal[?x] :- :eats[?x,?y], :PetFood[?y] .  | { ?x -> :kiki, ?y -> :luxury_pet_treat }
        :eats[:kiki,:luxury_pet_treat]  EDB
        :PetFood[:luxury_pet_treat]  EDB

Since the number of possible different proofs for a given fact may be very large, we may be content with just obtaining a single one. We can use the explain command to obtain a shortest proof as follows:

explain shortest :Animal[:kiki]

which will return the following proof

:Animal[:kiki]
    :Animal[?x] :- :eats[?x,?y], :PetFood[?y] .  | { ?x -> :kiki, ?y -> :luxury_pet_treat }
        :eats[:kiki,:luxury_pet_treat]  EDB
        :PetFood[:luxury_pet_treat]  EDB

Indeed, this is the shortest proof as it involves a single rule application, whereas the alternative proof involves two rule applications.
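The comparison between the two proofs can be made explicit by counting rule applications. The Python sketch below works on hand-written ground instances of the three rules and measures a proof by the number of rule applications in its proof tree. Both the measure and the function name shortest_proof_sizes are assumptions made for illustration; the sketch does not describe how the explain command selects a proof.

```python
# Explicitly given facts.
EDB = {
    ":Cat[:kiki]",
    ":eats[:kiki,:luxury_pet_treat]",
    ":PetFood[:luxury_pet_treat]",
}

# Ground rule instances: (derived fact, facts used in the body).
RULES = [
    (":Mammal[:kiki]", [":Cat[:kiki]"]),
    (":Animal[:kiki]", [":Mammal[:kiki]"]),
    (":Animal[:kiki]",
     [":eats[:kiki,:luxury_pet_treat]", ":PetFood[:luxury_pet_treat]"]),
]


def shortest_proof_sizes(edb, rules):
    """Smallest number of rule applications needed to prove each fact."""
    cost = {fact: 0 for fact in edb}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if all(b in cost for b in body):
                c = 1 + sum(cost[b] for b in body)
                if c < cost.get(head, float("inf")):
                    cost[head] = c
                    changed = True
    return cost


sizes = shortest_proof_sizes(EDB, RULES)
print(sizes[":Animal[:kiki]"])
```

The proof through :eats and :PetFood needs one rule application, whereas the proof through :Mammal needs two, so the former has the smaller size.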

When using the explain command, it is important to understand that rules in RDFox can come from different sources:

  • User rules such as the ones in our previous example are rules introduced directly by the user.

  • User axioms are OWL 2 axioms imported by the user, which are internally translated into rules.

  • Special rules are rules that have no direct connection with the information provided by the user and are internally added by RDFox. Examples of special rules include the rules for subsumption reasoning provided at the end of the previous section and the rules obtained by axiomatizing equality as a transitive, reflexive and symmetric relation.

Consider, for example, a data store into which we import the triple :kiki rdf:type :Cat . together with the following OWL 2 axioms in functional syntax:

SubClassOf( :Cat :Mammal )
SubClassOf( :Mammal :Animal )

If we now run the explain command

explain :Animal[:kiki]

we obtain the same proof as before:

:Animal[:kiki]
    :Animal[?X] :- :Mammal[?X] .  | { ?X -> :kiki }
        :Mammal[:kiki]
            :Mammal[?X] :- :Cat[?X] .  | { ?X -> :kiki }
                :Cat[:kiki]  EDB

It is important to note, however, that the explicitly given OWL 2 axioms are not displayed in the proof, but rather the rules that are obtained from them internally.

5.8. Monitoring Reasoning in RDFox

This section gives an overview of the functionality implemented in RDFox for monitoring the progress of reasoning.

Let us start by creating a new data store in which reasoning will be performed in a single-threaded fashion:

dstore create default par-complex-nn
threads 1

To enable monitoring of reasoning we use the following shell commands, where the second one establishes the frequency at which information is provided in the console.

set reason.monitor progress
set log-frequency 1

We can now import rules and data which, in our case, will come from the well-known LUBM benchmark.

We first import the rules:

import LUBM_L.dlog

RDFox will then import the rules and display relevant information about the rule importation process:

Adding data in file './LUBM_L.dlog'.
[1]: START './LUBM_L.dlog'
[1]: FINISHED './LUBM_L.dlog'
    Time since import start:         1 ms
    Time since start of this import: 1 ms
    Facts processed  in this import: 0
    Number of finished imports:      1
    Total facts processed so far:    0
Import operation took 0.4 s.
Processed 98 rules, of which 98 were updated.

In particular, we can see that 98 rules were imported in total and that rule importation took 0.4s.

We can now ask RDFox to print detailed information about the imported rules. For instance, the following command will provide statistics about the rule set and then will print each rule in a given order:

info rulestats print-rules by-body-size

RDFox will first provide some statistics about the rule set

================================ RULES STATISTICS ================================
Component    Body size    Nonrecursive rules    Recursive rules    Total rules
        0            2                     0                  1              1
        1            1                    19                  2             21
        1            2                     1                  0              1
        2            1                    19                  2             21
        3            1                    13                  0             13
        4            1                    28                  8             36
        4            3                     0                  5              5
----------------------------------------------------------------------------------
Total:                                    80                 18             98
----------------------------------------------------------------------------------

RDFox organizes rules by components, which gives us an idea of how information flows during reasoning. To give some intuition as to what a component is, consider the following simple set of rules:

[?x, rdf:type, :B] :- [?x, rdf:type, :A] .

[?x, rdf:type, :C] :- [?x, rdf:type, :B] .

[?x, rdf:type, :D] :- [?x, rdf:type, :B] .

[?x, rdf:type, :A] :- [?x, rdf:type, :D] .

We can see that :B depends on :A since to derive facts about :B we need to first obtain facts about :A. Similarly, :C and :D both depend on :B. Finally, :A depends on :D, and hence the first, third and fourth rules are involved in a cycle of dependencies. As a result, the flow of information during rule application can be seen in two stages: first, we need to derive all facts about :A, :B and :D using the first, third and fourth rules. Then, we can derive all facts about :C using the second rule. To reflect this, RDFox will organize these rules into two components: the first component will contain the first, third and fourth rules which together are considered recursive (they are involved in a cycle of dependencies), whereas the second rule will go in its own component and will be identified as non-recursive.
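The grouping described above corresponds to computing the strongly connected components of the dependency graph between predicates. The following Python sketch reproduces it for the four example rules. It is a simplified illustration written for this small example (plain reachability rather than an efficient algorithm), and it is not claimed to be the procedure RDFox uses; the function names reachable and components are chosen for this example.

```python
# Rules as (head predicate, body predicates), in the order given above.
RULES = [(":B", [":A"]), (":C", [":B"]), (":D", [":B"]), (":A", [":D"])]


def reachable(start, edges):
    """Predicates reachable from start via one or more dependency edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def components(rules):
    edges = {}  # body predicate -> head predicates that depend on it
    for head, body in rules:
        for b in body:
            edges.setdefault(b, set()).add(head)
    preds = {h for h, _ in rules} | {b for _, body in rules for b in body}
    reach = {p: reachable(p, edges) for p in preds}

    def scc(p):
        return frozenset({p} | {q for q in reach[p] if p in reach[q]})

    def ancestors(p):
        return {q for q in preds if p in reach[q]} | {p}

    # Components with fewer ancestors are evaluated first.
    ordered = sorted(
        {scc(h) for h, _ in rules},
        key=lambda c: (len(ancestors(next(iter(c)))), sorted(c)),
    )
    result = []
    for comp in ordered:
        in_comp = [r for r in rules if r[0] in comp]
        # A rule is recursive if its head can reach one of its body predicates.
        recursive = [r for r in in_comp if any(b in reach[r[0]] for b in r[1])]
        nonrecursive = [r for r in in_comp if r not in recursive]
        result.append((sorted(comp), recursive, nonrecursive))
    return result


for index, (preds, rec, nonrec) in enumerate(components(RULES)):
    print(index, preds, len(rec), len(nonrec))
```

For the example rules, the sketch places the first, third and fourth rules in component 0 as recursive rules and the second rule in component 1 as a non-recursive rule.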

In the table above, we can see the same kind of information for the more complex LUBM rules. The rules are arranged in 5 components (0..4); for each component and body size, the table shows the number of non-recursive rules, the number of rules involved in dependency cycles (recursive rules), and the total number of rules.

RDFox then will print the rules component by component on the console and within each component it will arrange the rules sorted by number of atoms in their bodies. In our simple example about :A, :B, :C and :D, the information printed will look as follows:

-- COMPONENT:          0
-- NONRECURSIVE RULES: 0
-- RECURSIVE RULES:    3
**********************************************************************************
** BODY SIZE:          1
** RECURSIVE RULES:    3

:B[?x] :- :A[?x] .
:D[?x] :- :B[?x] .
:A[?x] :- :D[?x] .
----------------------------------------------------------------------------------
-- COMPONENT:          1
-- NONRECURSIVE RULES: 1
-- RECURSIVE RULES:    0
**********************************************************************************
** BODY SIZE:          1
** NONRECURSIVE RULES: 1

:C[?x] :- :B[?x] .
==================================================================================

Now that we have imported the rules, we can import also the data:

import LUBM-large.ttl

At this point, RDFox will load the data (without performing any reasoning yet) and will provide information about the progress of loading. We can see an excerpt of such information below:

> import LUBM-large.ttl
Adding data in file './LUBM-large.ttl'.
[1]: START './LUBM-large.ttl'
[1]: PROGRESS './LUBM-large.ttl'
    Time since start of import:      1001 ms
    Time since start of this import: 1001 ms
    Facts processed in this import:  418000
[1]: PROGRESS './LUBM-large.ttl'
    Time since start of import:      2001 ms
    Time since start of this import: 2001 ms
    Facts processed in this import:  795000
[1]: PROGRESS './LUBM-large.ttl'
    Time since start of import:      3002 ms
    Time since start of this import: 3002 ms
    Facts processed in this import:  1164000
...

[1]: FINISHED './LUBM-large.ttl'
    Time since import start:         13143 ms
    Time since start of this import: 13143 ms
    Facts processed  in this import: 5000000
    Number of finished imports:      1
    Total facts processed so far:    5000000
Import operation took 17.8 s.
Processed 5000000 facts, of which 5000000 were updated.

In particular, RDFox reports progress roughly once per second, each time giving the cumulative number of facts processed so far. We can also see that, in the end, 5,000,000 data triples were imported and that the import took 17.8 s in total.
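The progress figures in the log are cumulative, so the import rate for each interval has to be derived from consecutive reports. The Python sketch below does this for the three PROGRESS entries in the excerpt above; the function name interval_rates is hypothetical and the numbers are copied by hand from the log rather than parsed from RDFox output.

```python
# Cumulative (elapsed_ms, facts_processed) pairs copied from the PROGRESS lines above.
progress = [(1001, 418000), (2001, 795000), (3002, 1164000)]

def interval_rates(samples):
    """Return facts per second for each interval between consecutive samples."""
    rates = []
    prev_ms, prev_facts = 0, 0  # the import starts at 0 ms with 0 facts
    for ms, facts in samples:
        seconds = (ms - prev_ms) / 1000
        rates.append((facts - prev_facts) / seconds)
        prev_ms, prev_facts = ms, facts
    return rates

for rate in interval_rates(progress):
    print(f"{rate:,.0f} facts/s")
```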

We can now compute the materialization of the LUBM rules and facts in the store using the mat command:

mat

RDFox will display information about the number of facts generated:

Materializing rules incrementally.
Rules will be processed by strata.
Maximum depth of backward chaining is unbounded.
Materialization time:      0 s.
------------------------------------------------------------------------------------------------
Table            |  Facts                   |  EDB                     |  IDB
------------------------------------------------------------------------------------------------
internal:triple  |  6,826,914 -> 6,826,914  |  5,000,000 -> 5,000,000  |  6,826,913 -> 6,826,913
------------------------------------------------------------------------------------------------

The column labeled EDB tells us the number of facts that were explicitly given in the data file. In turn, the column labeled IDB indicates the total number of facts in the store after materialization; in our case, this means that the system has derived a total of over 1.8 million new facts through rule application (6,826,913 - 5,000,000 = 1,826,913). The Table column indicates the name of each tuple table in the store. In this case, we just have the default triple table, but in other cases we may also have other tuple tables, such as those obtained from named graphs. Each tuple table will have its own numbers of explicit and derived facts. Finally, the column labeled Facts indicates the total number of memory slots that were reserved by different threads during reasoning; this number can actually be larger than the total number of facts in the system, as some of these slots may not have been used to store a fact.
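The relationship between the three columns can be checked with a few lines of arithmetic. The Python sketch below uses the values from the table above; the variable names are chosen for this example only, and reading the difference between the Facts and IDB columns as slots that hold no fact follows the description given in this section.

```python
# Values copied from the internal:triple row of the table above.
edb_facts = 5_000_000      # EDB column: facts explicitly given in the data file
idb_facts = 6_826_913      # IDB column: all facts in the store after materialization
reserved_slots = 6_826_914 # Facts column: memory slots reserved during reasoning

# Facts derived by rule application.
derived = idb_facts - edb_facts

# Reserved slots beyond the number of facts in the store.
extra_slots = reserved_slots - idb_facts

print(f"derived facts: {derived:,}")
print(f"extra slots:   {extra_slots:,}")
```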

5.9. Querying the Explicitly Given Data

After reasoning, RDFox will by default answer all SPARQL queries with respect to the obtained materialization. For instance, suppose that we have a data store containing the fact :a rdf:type :A and the following rules:

[?x, rdf:type, :B] :- [?x, rdf:type, :A] .

[?x, rdf:type, :C] :- [?x, rdf:type, :B] .

[?x, rdf:type, :D] :- [?x, rdf:type, :B] .

[?x, rdf:type, :A] :- [?x, rdf:type, :D] .

The materialization will contain the following facts, where three of them have been derived and only fact :a rdf:type :A was originally in the data:

:a rdf:type :A .
:a rdf:type :B .
:a rdf:type :C .
:a rdf:type :D .

If we issue the query

SELECT ?x WHERE { ?x rdf:type :D }

we will obtain :a as a result.

In RDFox it is possible to query only the explicit data even after materialization has been performed. For this, we can use the shell command

set query.domain EDB

If we then issue the previous query again, we will obtain an empty answer, because the fact :a rdf:type :D was derived by the rules and is not part of the explicit data.
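The difference between querying the materialization and querying only the explicit data can be reproduced with a small Python sketch. It applies the four rules of this section to the single explicit fact until nothing new is derived, and then evaluates the query for instances of :D over both sets of facts. This is a naive fixpoint loop written for illustration; the function names materialize and instances_of are hypothetical, and the code does not represent RDFox's reasoning algorithm or its API.

```python
# Illustrative sketch only; RDFox's reasoning algorithm is not shown here.
RDF_TYPE = "rdf:type"

# Explicit facts (the EDB).
explicit = {(":a", RDF_TYPE, ":A")}

# Each pair (head_class, body_class) stands for the rule
# [?x, rdf:type, head_class] :- [?x, rdf:type, body_class] .
rules = [(":B", ":A"), (":C", ":B"), (":D", ":B"), (":A", ":D")]

def materialize(facts, rules):
    """Apply the rules repeatedly until no new fact is derived."""
    result = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for (s, p, o) in list(result):
                if p == RDF_TYPE and o == body:
                    new_fact = (s, RDF_TYPE, head)
                    if new_fact not in result:
                        result.add(new_fact)
                        changed = True
    return result

def instances_of(cls, facts):
    """Answer SELECT ?x WHERE { ?x rdf:type cls } over the given facts."""
    return sorted(s for (s, p, o) in facts if p == RDF_TYPE and o == cls)

materialized = materialize(explicit, rules)

print(instances_of(":D", materialized))  # query over the materialization
print(instances_of(":D", explicit))      # query over the explicit facts only
```

The first query returns :a, while the second returns no results, mirroring the behavior described above for the default query domain and for the EDB query domain.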