6. Reasoning in RDFox
Reasoning in RDF is the ability to calculate the set of triples that logically follow from an RDF graph and a set of rules. Such logical consequences are materialized in RDFox as new triples in the graph.
The use of rules can significantly simplify the management of RDF data as well as provide a more complete set of answers to user queries. Consider, for instance, a graph containing the following triples:
:oxford :locatedIn :oxfordshire .
:oxfordshire :locatedIn :england .
The relation :locatedIn is intuitively transitive: from the fact that Oxford is located in Oxfordshire and Oxfordshire is located in England, we can deduce that Oxford is located in England. The triple :oxford :locatedIn :england is, however, missing from the graph. As a consequence, SPARQL queries asking for all English cities will not return :oxford as an answer.
We could, of course, add the missing triple to the graph by hand, in which case :oxford would be returned as an answer to our previous query. Doing so, however, has a number of important disadvantages. First, there can be millions of such missing triples and each of them would need to be added manually, which is cumbersome and error-prone; for instance, if we add the triple :england :locatedIn :uk to the graph, then the following additional triples should also be added:
:oxford :locatedIn :uk .
:oxfordshire :locatedIn :uk .
More importantly, by manually adding missing triples we are not capturing the transitive nature of the relation, which establishes a causal link between different triples. Indeed, the triple :oxford :locatedIn :england holds because the triples :oxford :locatedIn :oxfordshire and :oxfordshire :locatedIn :england are part of the data. Assume that we later find out that :oxford is not located in :oxfordshire, but rather in the state of Mississippi in the US, and we delete the triple :oxford :locatedIn :oxfordshire from the graph as a result. Then, the triples :oxford :locatedIn :england and :oxford :locatedIn :uk should also be retracted, as they are no longer justified. Such situations are very hard to handle manually.
As we will see next, we can use a rule to faithfully represent the transitive nature of the relation and handle all of the aforementioned challenges in an efficient and elegant way.
6.1. Rule Languages
A rule language for RDF determines which syntactic expressions are valid rules, and also provides well-defined meaning to each rule. In particular, given an arbitrary set of syntactically valid rules and an arbitrary RDF graph, the set of new triples that follow from the application of the rules to the graph must be unambiguously defined.
6.1.1. Datalog
Rule languages have been in use since the 1980s in the fields of data management and artificial intelligence. The basic rule language is called Datalog. It is a very well understood language, which constitutes the core of a plethora of subsequent rule formalisms equipped with a wide range of extensions. In this section, we describe Datalog in the context of RDF.
A Datalog rule can be seen as an IF … THEN statement. In particular, the following is a Datalog rule which faithfully represents the transitive nature of the relation :locatedIn.
[?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
The IF part of the rule is also called the body or antecedent; the THEN part of the rule is called the head or consequent. The head is written first and is separated from the body by the symbol :-. Both body and head consist of a conjunction of conditions, where conjuncts are comma-separated and where each conjunct is a triple in which variables may occur. Each conjunct in the body or the head is called an atom. In our example, the body consists of the atoms [?x, :locatedIn, ?y] and [?y, :locatedIn, ?z], whereas the head consists of the single atom [?x, :locatedIn, ?z].
Each rule conveys the idea that, from certain combinations of triples in the input RDF graph, we can logically deduce that some other triples must also be part of the graph. In particular, variables in the rule range over all possible nodes in the RDF graph (RDF literals, URIs, blank nodes); whenever these variables are assigned values that make the rule body a subset of the graph, we propagate those values to the head of the rule and deduce that the resulting triples must also be part of the graph.
In our example, a particular rule application binds variable ?x to :oxford, variable ?y to :oxfordshire and variable ?z to :england, which then implies that the triple :oxford :locatedIn :england, obtained by replacing ?x with :oxford and ?z with :england in the head of the rule, holds as a logical consequence. A different rule application would bind ?x to :oxfordshire, ?y to :england, and ?z to :uk; as a result, the triple :oxfordshire :locatedIn :uk can also be derived as a logical consequence.
An alternative way to understand the meaning of a single Datalog rule
application to an RDF graph is to look at it as the execution of an INSERT
statement in SPARQL, which adds a set of triples to the graph. In particular,
the statement
INSERT { ?x :locatedIn ?z } WHERE { ?x :locatedIn ?y. ?y :locatedIn ?z }
corresponding to our example rule leads to the insertion of triples
:oxford :locatedIn :england .
:oxfordshire :locatedIn :uk .
There is, however, a fundamental difference that makes rules more powerful than simple INSERT statements in SPARQL, namely that rules are applied recursively. Indeed, after we have derived that Oxford is located in England, we can apply the rule again by matching ?x to :oxford, ?y to :england, and ?z to :uk, to derive :oxford :locatedIn :uk, a triple that is not obtained as a result of the INSERT statement above.
In this way, the logical consequences of a set of Datalog rules on an input graph are captured by the recursive application of the rules until no new information can be added to the graph. It is important to notice that the set of new triples obtained is completely independent of the order in which rule applications are performed, as well as of the order in which the atoms of a rule body are written. In particular, the following two rules are equivalent:
[?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
[?x, :locatedIn, ?z] :- [?y, :locatedIn, ?z], [?x, :locatedIn, ?y] .
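The recursive application described above can be illustrated with a minimal Python sketch. It is not how RDFox evaluates rules; it simply re-applies the transitivity rule to a set of triples until nothing new is derived. The function name materialize and the tuple encoding of triples are assumptions made for this illustration only.

```python
def materialize(triples):
    """Naively apply the :locatedIn transitivity rule until a fixpoint is reached."""
    facts = set(triples)
    while True:
        # One round of rule application: join the two body atoms on ?y.
        derived = {
            (x, ":locatedIn", z)
            for (x, p1, y) in facts if p1 == ":locatedIn"
            for (y2, p2, z) in facts if p2 == ":locatedIn" and y2 == y
        }
        new = derived - facts
        if not new:
            return facts  # no new information can be added
        facts |= new


graph = [
    (":oxford", ":locatedIn", ":oxfordshire"),
    (":oxfordshire", ":locatedIn", ":england"),
    (":england", ":locatedIn", ":uk"),
]
closure = materialize(graph)
# The result does not depend on the order in which the input is processed.
closure_reversed = materialize(list(reversed(graph)))
```

The first round derives :oxford :locatedIn :england and :oxfordshire :locatedIn :uk, and only the second round derives :oxford :locatedIn :uk, which is exactly the triple that a single INSERT statement misses.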
6.1.2. Extensions of Datalog
A wide range of extensions of Datalog have been proposed and studied in the literature. In this subsection we describe the extensions of Datalog implemented in RDFox as well as the restrictions on them that have been put in place in order to ensure that the resulting language is semantically well-defined. Later on in this section we will provide many more examples of rules equipped with these extended features.
6.1.2.1. Negation-as-failure
Negation-as-failure allows us to make deductions based on information that is not present in the graph. For instance, using negation-as-failure we can write a rule saying that someone who works for a company but is not an employee of the company is an external contractor.
[?x, :contractorFor, ?y] :- [?x, :worksFor, ?y], NOT [?x, :employeeOf, ?y] .
Here, NOT
represents a negation of a body atom.
Let us consider the logical consequences of this rule when applied to the graph
:mary :worksFor :acme .
:mary :employeeOf :acme .
:bob :worksFor :acme .
On the one hand, we have that :mary works for :acme, and hence we can satisfy the first atom in the body by assigning :mary to ?x and :acme to ?y; however, :mary is also an employee of :acme, and hence the second condition is not satisfied, which means that we cannot derive that :mary is a contractor. On the other hand, we also have that :bob works for :acme, and hence once again we can satisfy the first atom in the body, this time by assigning :bob to ?x and :acme to ?y; but now, we do not have a triple in the graph stating that :bob is an employee of :acme, and hence we can satisfy the second condition in the body and derive the triple :bob :contractorFor :acme.
Indeed, the query
SELECT ?x ?y WHERE { ?x :contractorFor ?y }
yields the expected result
:bob :acme .
Note that negation typically means “absence of information”; indeed, we do not
know for sure whether :bob
is not an employee of :acme
; we only know
that this information is not available in the graph (neither explicitly, nor as
a consequence of other rule applications).
Negation-as-failure is intrinsically non-monotonic. In logic, this means that new information may invalidate previous deductions. For instance, suppose that :bob becomes an employee of :acme and, to reflect this, we add the triple :bob :employeeOf :acme to our data graph. Then, we can no longer infer that :bob is a contractor for :acme, and the previous query will now return an empty answer. In contrast, rules in plain Datalog are monotonic: adding new triples to the graph cannot invalidate any consequences that we may have previously drawn; for instance, adding the triple :england :locatedIn :uk to the example in our previous section cannot invalidate a previous inference such as :oxford :locatedIn :england.
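The non-monotonic behavior of the contractor rule can be reproduced with a small Python sketch. This is only an illustration of the semantics under the assumption that triples are encoded as tuples; the function name contractors is hypothetical and the code is unrelated to RDFox internals.

```python
def contractors(graph):
    """Evaluate: [?x, :contractorFor, ?y] :- [?x, :worksFor, ?y], NOT [?x, :employeeOf, ?y] ."""
    return {
        (x, ":contractorFor", y)
        for (x, p, y) in graph
        # NOT succeeds when the corresponding triple is absent from the graph.
        if p == ":worksFor" and (x, ":employeeOf", y) not in graph
    }


graph = {
    (":mary", ":worksFor", ":acme"),
    (":mary", ":employeeOf", ":acme"),
    (":bob", ":worksFor", ":acme"),
}
before = contractors(graph)
# Adding a triple removes a previously derived consequence.
after = contractors(graph | {(":bob", ":employeeOf", ":acme")})
```

Here before contains only the triple for :bob, while after is empty: adding information invalidated an earlier deduction, which can never happen with plain Datalog rules.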
6.1.2.2. Aggregation
Aggregation is an important feature in query languages such as SQL or SPARQL. It allows one to compute numeric values (such as minimums, maximums, sums, counts or averages) on groups of solutions satisfying certain conditions (e.g., compute an average salary over the group of people working in the accounting department).
In RDFox, it is possible to define relations based on the result of aggregate calculations. For instance, consider the following data.
:bob :worksFor :accounting .
:bob :salary "50000"^^xsd:integer .
:mary :worksFor :hr .
:mary :salary "47000"^^xsd:integer .
:jen :worksFor :accounting .
:jen :salary "60000"^^xsd:integer .
:accounting rdf:type :Department .
:hr rdf:type :Department .
We can write an RDFox rule that computes the average salary of each department, and store the result in a newly introduced relation:
[?d, :deptAvgSalary, ?z] :-
[?d, rdf:type, :Department],
AGGREGATE(
[?x, :worksFor, ?d],
[?x, :salary, ?s]
ON ?d
BIND AVG(?s) AS ?z) .
Here, each group consists of a department with salaried employees, and for each group the rule computes an average of the salaries involved. In particular, suppose that we satisfy the first atom in the body by assigning value :accounting to variable ?d; then, we can satisfy the aggregate atom by grouping all employees working for :accounting (i.e., :bob and :jen), computing their average salary (55,000), and assigning the resulting value to variable ?z; as a result, we can propagate the assignment of ?d to :accounting and of ?z to 55,000 to the head and derive the triple
:accounting :deptAvgSalary "55000"^^xsd:integer .
The query
SELECT ?d ?s WHERE { ?d rdf:type :Department . ?d :deptAvgSalary ?s }
then returns the expected answers
:accounting 55000.0 .
:hr 47000.0 .
Like negation, aggregation is a non-monotonic extension of Datalog. In particular, if we were to add a new employee to the accounting department with a salary of 52,000, then we would need to withdraw our previous inference that the average accounting salary equals 55,000 and adjust the average accordingly.
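The grouping performed by the aggregate atom can be sketched in Python as follows. The sketch assumes triples are encoded as tuples with plain integers as salaries, and the name dept_avg_salary is hypothetical; it mirrors the meaning of the rule rather than the way RDFox evaluates it.

```python
from collections import defaultdict


def dept_avg_salary(graph):
    """Group salaries by department (ON ?d) and average them (AVG(?s) AS ?z)."""
    departments = {s for (s, p, o) in graph if p == "rdf:type" and o == ":Department"}
    salaries = {s: o for (s, p, o) in graph if p == ":salary"}
    groups = defaultdict(list)
    for (x, p, d) in graph:
        if p == ":worksFor" and d in departments and x in salaries:
            groups[d].append(salaries[x])
    return {(d, ":deptAvgSalary", sum(v) / len(v)) for d, v in groups.items()}


graph = {
    (":bob", ":worksFor", ":accounting"), (":bob", ":salary", 50000),
    (":mary", ":worksFor", ":hr"), (":mary", ":salary", 47000),
    (":jen", ":worksFor", ":accounting"), (":jen", ":salary", 60000),
    (":accounting", "rdf:type", ":Department"),
    (":hr", "rdf:type", ":Department"),
}
averages = dept_avg_salary(graph)
# A hypothetical new employee :ann changes the accounting average.
updated = dept_avg_salary(
    graph | {(":ann", ":worksFor", ":accounting"), (":ann", ":salary", 52000)}
)
```

With the extra employee, the accounting average becomes 54,000, so the earlier value of 55,000 no longer holds. For simplicity the sketch assumes one salary per employee.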
6.1.2.3. Built-in Functions
Datalog can be extended with the use of functions in rule bodies. In particular, one can use all functions described in Section 5.2, with the exception of NOW, RAND, UUID and STRUUID, whose behavior is non-deterministic.
We demonstrate the use of functions in rule bodies using the string concatenation function CONCAT. The following rule computes the full name of a person as the concatenation of their first name and their family name.
[?x, :fullName, ?n] :-
[?x, :firstName, ?y],
[?x, :lastName, ?z],
BIND(CONCAT(?y, " ", ?z) AS ?n) .
Consider the application of this rule to the graph consisting of the following triples:
:peter :firstName "Peter" .
:peter :lastName "Griffin" .
Then, the query
SELECT ?x ?y WHERE { ?x :fullName ?y }
would return the expected answer
:peter "Peter Griffin" .
An important consequence of introducing built-in functions is that rules are now capable of deriving triples mentioning new objects which did not occur in the input data (such as “Peter Griffin” in the above example). This is not possible using plain Datalog rules, where the application of a rule may generate new triples, but these triples can only mention objects that were present in the input data.
If users are not careful, they may write rules using built-in functions that generate infinitely many new constants, and hence there may be infinitely many triples that logically follow from the rules and a (finite) input graph.
For instance, consider the following rule, which creates longer names from shorter names.
[?person, :hasName, ?longerName] :-
[?person, :hasName, ?name],
BIND(CONCAT("Longer name: ", ?name) AS ?longerName) .
If we apply this rule to the input graph consisting of
:peter :hasName "Peter"
we will derive an infinite “chain” of triples
:peter :hasName "Peter"
:peter :hasName "Longer name: Peter"
:peter :hasName "Longer name: Longer name: Peter"
:peter :hasName "Longer name: Longer name: Longer name: Peter"
:peter :hasName "Longer name: Longer name: Longer name: Longer name: Peter"
...
In such cases, RDFox will run out of resources trying to compute infinitely many new triples and will therefore not terminate. This is not due to a limitation of RDFox as a system, but rather to the well-known fact that reasoning in Datalog becomes undecidable once the language is extended with built-in functions that can introduce arbitrarily many fresh objects.
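To see why no fixpoint exists for this rule, consider the following Python sketch, which applies the rule round by round. The max_rounds cap is an assumption added purely so that the illustration terminates; it does not correspond to any RDFox setting, and the function name longer_names is hypothetical.

```python
def longer_names(graph, max_rounds):
    """Apply the 'Longer name' rule for at most max_rounds rounds."""
    facts = set(graph)
    for _ in range(max_rounds):
        new = {
            (person, ":hasName", "Longer name: " + name)
            for (person, predicate, name) in facts
            if predicate == ":hasName"
        } - facts
        if not new:
            break  # never reached for this rule: every round creates a fresh literal
        facts |= new
    return facts


start = {(":peter", ":hasName", "Peter")}
after_four_rounds = longer_names(start, 4)
after_ten_rounds = longer_names(start, 10)
```

Each round adds exactly one new triple, so the number of triples grows with the number of rounds allowed and the loop would never stop on its own.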
6.1.2.4. Equality
Equality is a special binary predicate that can be used to identify different resources as representing the same real-world object. The equality predicate is referred to as owl:sameAs in the standard W3C languages for the Semantic Web. In addition to equality, W3C standard languages also define an inequality predicate, which is referred to as owl:differentFrom.
By default, two resources with different names are not assumed to be actually different. For instance, resources called :marie_curie and :marie_sklodowsca may refer to the same object in the world (the renowned scientist Marie Curie). In logic terms we typically say that by default we are not making the unique name assumption (UNA). In some applications, however, it makes sense to make such an assumption, and the effect of making the UNA is that we will have implicit owl:differentFrom statements between all pairs of resources mentioned in the data.
In RDFox we can enable the use of equality by initializing a store accordingly. For instance, the following shell command initializes a data store with equality reasoning turned on and without the UNA:
init seq equality noUNA
Extensions of Datalog with equality allow for the equality and inequality predicates to appear in rules and data. For instance, consider the following triples, where the second triple represents the fact that the URIs :marie_curie and :marie_sklodowsca refer to the same person.
:marie_curie rdf:type :Scientist .
:marie_curie owl:sameAs :marie_sklodowsca .
A query asking RDFox for all scientists
SELECT ?x WHERE { ?x rdf:type :Scientist }
will return both :marie_curie and :marie_sklodowsca as a result.
Equality and inequality can also be used in rules. For instance, the following rule establishes that a person can only have one biological mother:
[?y, owl:sameAs, ?z] :- [?x, :hasMother, ?y], [?x, :hasMother, ?z] .
The application of this rule to the graph
:irene_curie :hasMother :marie_curie .
:irene_curie :hasMother :marie_sklodowsca .
identifies :marie_curie and :marie_sklodowsca as the same person.
The joint use of equality and inequality can lead to logical contradictions. For instance, the application of the previous rule to a graph consisting of the following triples would lead to a contradiction:
:irene_curie :hasMother :marie_curie .
:irene_curie :hasMother :eve_curie .
:marie_curie owl:differentFrom :eve_curie .
Indeed, the application of the rule derives :marie_curie owl:sameAs :eve_curie, which is in contradiction with the data triple :marie_curie owl:differentFrom :eve_curie. Such contradictions can be identified in RDFox by querying for the instances of the special class owl:Nothing, which is also borrowed from the W3C standard OWL. The query
SELECT ?x WHERE { ?x rdf:type owl:Nothing }
returns :marie_curie and :eve_curie as answers. This can be interpreted by the user as: “resources :marie_curie and :eve_curie are involved in a logical contradiction”.
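The interplay between the mother rule, owl:sameAs and owl:differentFrom can be sketched with a simple union-find structure in Python. This is an illustrative model only: the helper names find and reason are hypothetical, triples are assumed to be tuples, and the sketch says nothing about how RDFox represents equal resources internally.

```python
def find(parent, resource):
    """Return the representative of the equality class of a resource."""
    parent.setdefault(resource, resource)
    while parent[resource] != resource:
        resource = parent[resource]
    return resource


def reason(graph):
    """Merge equal resources and report resources involved in a contradiction."""
    parent = {}

    def union(a, b):
        ra, rb = find(parent, a), find(parent, b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    for (s, p, o) in graph:
        if p == "owl:sameAs":
            union(s, o)
    mothers = [(s, o) for (s, p, o) in graph if p == ":hasMother"]
    changed = True
    while changed:  # repeat, since merging resources may enable further merges
        changed = False
        for (x1, y) in mothers:
            for (x2, z) in mothers:
                # [?y, owl:sameAs, ?z] :- [?x, :hasMother, ?y], [?x, :hasMother, ?z] .
                if find(parent, x1) == find(parent, x2) and union(y, z):
                    changed = True
    clashes = {
        r
        for (s, p, o) in graph
        if p == "owl:differentFrom" and find(parent, s) == find(parent, o)
        for r in (s, o)
    }
    return parent, clashes


consistent = {
    (":irene_curie", ":hasMother", ":marie_curie"),
    (":irene_curie", ":hasMother", ":marie_sklodowsca"),
}
parent_ok, clashes_ok = reason(consistent)

inconsistent = {
    (":irene_curie", ":hasMother", ":marie_curie"),
    (":irene_curie", ":hasMother", ":eve_curie"),
    (":marie_curie", "owl:differentFrom", ":eve_curie"),
}
_, clashes_bad = reason(inconsistent)
```

In the first graph the two names end up in the same equality class and no clash is reported; in the second, :marie_curie and :eve_curie are merged despite being declared different, so both are reported, mirroring the answers to the owl:Nothing query.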
6.1.2.5. Named Graphs and N-ary Relations
In all our previous examples, all atoms in rules are evaluated against the default RDF graph. RDFox also supports named graphs, which can be created either implicitly, by importing an RDF dataset encoded as TriG or N-Quads, or explicitly, as shown in the following example that creates the named graph :Payroll.
tupletable create :Payroll type triples
Named graphs can also be used in the body and the head of rules, and hence it is possible to derive new triples as the result of rule application and add them to graphs other than the default graph. Rules can refer only to named graphs already created using one of the ways described above.
For instance, consider the following rule:
:Payroll(?id, :monthlyPayment, ?m) :-
[?id, rdf:type, :Employee],
:HR(?id, :yearlySalary, ?s),
BIND(?s / 12 AS ?m) .
This rule joins information from the default graph and the named graph called :HR, and it inserts consequences into the named graph called :Payroll. Specifically, the first body atom of the rule identifies IDs of employees in the default RDF graph. The second body atom is a general atom: it is evaluated in the named graph called :HR, and it matches triples that connect IDs with their yearly salaries. The head of the rule contains a general atom that refers to the named graph called :Payroll, and it derives triples that connect IDs of employees with their respective monthly payments. In particular, given as data
:HR(:a, :yearlySalary, "55000"^^xsd:integer) .
:a rdf:type :Employee .
the rule will compute the monthly payment for employee :a. Then, the query
SELECT ?s ?p ?o WHERE { GRAPH :Payroll{ ?s ?p ?o } }
will correctly return the monthly payment for employee :a:
:a :monthlyPayment 4583.333333333333333 .
In addition to referring to graphs other than the default graph, RDFox can also directly represent external data as tuples of arbitrary arity (not just triples) using the same syntax as named graphs. Atoms representing such data, however, are only allowed to be used in the body of rules. Details on how to access external data from RDFox are given in Section 10.
6.2. Materialization-Based Reasoning
The main computational problem solved by RDFox is that of answering a SPARQL 1.1 query with respect to an RDF graph and a set of rules.
To solve this problem, RDFox uses materialization-based reasoning to precompute and store all triples that logically follow from the input graph and rules in a query-independent way. Both the process of extending the input graph with such newly derived triples and its final output are commonly called materialization. After such preprocessing, queries can be answered directly over the materialization, which is usually very efficient since the rules do not need to be considered any further. Materializations can be large, but they can usually be stored and handled on modern hardware as the available memory is continually increasing.
The main challenge of this approach to query answering is that, whenever data triples and/or rules are added and/or deleted, the “old” materialization must be replaced with the “new” materialization that contains all triples that follow from the updated input. In this setting, deletion of triples is restricted to those that are explicit in the input graph and hence one does not consider deletion of derived triples—a complex problem known in the literature as belief revision or view update.
For instance, given as input the RDF graph
:oxford :locatedIn :oxfordshire .
:oxfordshire :locatedIn :england .
:england :locatedIn :uk .
and the familiar rule
[?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
RDFox will compute the corresponding materialization, which consists of triples
:oxford :locatedIn :oxfordshire .
:oxford :locatedIn :england .
:oxford :locatedIn :uk .
:oxfordshire :locatedIn :england .
:oxfordshire :locatedIn :uk .
:england :locatedIn :uk .
RDFox will now handle each SPARQL 1.1 query issued against the input graph and rule by simply evaluating the query directly over the materialization, thus avoiding the expensive reasoning at query time.
An update could delete a triple explicitly given in the input graph such as the triple :oxfordshire :locatedIn :england, in which case the new materialization consists only of triples
:oxford :locatedIn :oxfordshire .
:england :locatedIn :uk .
since the rule is no longer applicable after the deletion. In contrast, deleting a derived triple such as :oxford :locatedIn :uk is not allowed, since this triple was not part of the original input.
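The distinction between explicit and derived triples can be made concrete with the following Python sketch. It recomputes the materialization from scratch after every update, whereas RDFox maintains it incrementally; the Store class is hypothetical, and raising an error when a derived triple is deleted is an assumption of this sketch rather than documented RDFox behavior.

```python
LOCATED_IN = ":locatedIn"


class Store:
    """Keeps explicit triples and recomputes the materialization on demand."""

    def __init__(self, explicit):
        self.explicit = set(explicit)

    def materialization(self):
        facts = set(self.explicit)
        while True:
            new = {
                (x, LOCATED_IN, z)
                for (x, p1, y) in facts if p1 == LOCATED_IN
                for (y2, p2, z) in facts if p2 == LOCATED_IN and y2 == y
            } - facts
            if not new:
                return facts
            facts |= new

    def delete(self, triple):
        # Only triples that were given explicitly may be deleted.
        if triple not in self.explicit:
            raise ValueError("cannot delete a triple that is not explicit")
        self.explicit.remove(triple)


store = Store([
    (":oxford", LOCATED_IN, ":oxfordshire"),
    (":oxfordshire", LOCATED_IN, ":england"),
    (":england", LOCATED_IN, ":uk"),
])
full = store.materialization()
store.delete((":oxfordshire", LOCATED_IN, ":england"))
remaining = store.materialization()
```

After the deletion, remaining contains only the two explicit triples that are left, because none of the previously derived triples is justified any longer.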
RDFox implements sophisticated algorithms for both efficiently computing materializations and maintaining them under addition/deletion updates that may affect both the data and the rules. All these algorithms were developed after years of research at Oxford and have been extensively documented in the scientific literature.
6.3. Restrictions on Rule Sets
The rule language of RDFox imposes certain restrictions on the structure of rule sets. These restrictions ensure that the materialization of a set of rules and an RDF graph is well-defined and unique.
In particular, the semantics (i.e., the logical meaning) of rule sets involving negation-as-failure and/or aggregation is not straightforward, and numerous proposals exist in the scientific literature. There is, however, a general consensus for rule sets in which the use of negation-as-failure and aggregation is stratified. Informally, stratification conditions ensure that there are no cyclic dependencies in the rule set involving negation or aggregation.
Several variants of stratification have been proposed, where some of them capture a wider range of rule sets than others; they all, however, provide similar guarantees. We next describe the stratification conditions adopted in RDFox by means of examples. For this, let us consider the following rules mentioning negation-as-failure:
[?x, :contractorFor, ?y] :-
[?x, :worksFor, ?y],
NOT [?x, :employeeOf, ?y] .
[?x, :employeeOf, :acme] :- [?x, :worksFor, :acme] .
The first rule says that people working for a company who are not employees of that company act as contractors. The rule establishes two dependencies. The first dependency tells us that the presence of a triple having :worksFor in the middle position may contribute to triggering the derivation of a triple having :contractorFor in the middle position. In turn, the second dependency tells us that the absence of a triple having :employeeOf in the middle position may also contribute to the derivation of a triple having :contractorFor in the middle position.
The second rule tells us that everyone working for :acme is an employee of :acme. This rule establishes one dependency, namely that the presence of a triple having :worksFor in the middle position and :acme in the rightmost position may trigger the derivation of a triple having :employeeOf in the middle position and :acme in the rightmost position.
We can keep track of such dependencies by means of a dependency graph. The nodes of the graph are obtained by replacing variables in individual triple patterns occurring in the rules with the special symbol ANY, which intuitively indicates that the position of the triple where it occurs can adopt any constant value, and leaving constants as they are. In particular, our example rules yield a graph having the following five vertices v1 to v5:
v1: ANY :contractorFor ANY
v2: ANY :worksFor ANY
v3: ANY :employeeOf ANY
v4: ANY :worksFor :acme
v5: ANY :employeeOf :acme
The (directed) edges of the graph lead from vertices corresponding to body atoms to vertices corresponding to head atoms and can be either “regular” or “special”. Special edges witness the presence of a dependency involving aggregation or negation-as-failure; in our case, we will have a single special edge (v3, v1). In turn, each dependency that is not via negation-as-failure or aggregation generates a regular edge; in our case, we will have regular edges (v2, v1) and (v4, v5). Finally, the graph will also contain bidirectional regular edges between nodes that unify in the sense of first-order logic: since [?x, :employeeOf, ?y] and [?x, :employeeOf, :acme] unify, we will have regular edges (v3, v5) and (v5, v3); similarly, we will also have regular edges (v2, v4) and (v4, v2).
Our two example rules are stratified and hence are accepted by RDFox; this is because there is no cycle in the dependency graph involving a special edge (indeed, all cycles involve regular edges only).
Now suppose that we add the following rule:
[?x, :employeeOf, ?y] :-
[?x, :worksFor, ?y],
NOT [?x, :contractorFor, ?y] .
which says that people working for a company who are not contractors for the company must be employees of the company. The addition of this rule does not change the set of nodes in the dependency graph; however, it adds two more edges: a regular edge (v2, v3) and a special edge (v1, v3). As a result, we now have a cycle involving a special edge, so the rule set is no longer stratified and will be rejected by RDFox.
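The stratification condition on the dependency graph can be checked mechanically. The following Python sketch encodes the vertices v1 to v5 and the edges listed above, and reports a rule set as stratified when no special edge lies on a cycle. The function names are hypothetical, and the check is a simplified reading of the condition described here, not the algorithm used by RDFox.

```python
def reachable(edges, start):
    """Return all vertices reachable from start by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for (a, b) in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_stratified(regular, special):
    """A special edge (u, v) lies on a cycle exactly when u is reachable from v."""
    edges = regular | special
    return all(u not in reachable(edges, v) for (u, v) in special)


# Edges of the dependency graph for the first two example rules.
regular = {
    ("v2", "v1"), ("v4", "v5"),
    ("v3", "v5"), ("v5", "v3"),  # v3 and v5 unify
    ("v2", "v4"), ("v4", "v2"),  # v2 and v4 unify
}
special = {("v3", "v1")}
stratified_before = is_stratified(regular, special)

# The third rule adds a regular edge (v2, v3) and a special edge (v1, v3).
stratified_after = is_stratified(regular | {("v2", "v3")}, special | {("v1", "v3")})
```

For the first two rules no edge leaves v1, so the special edge (v3, v1) cannot be part of a cycle. Once (v1, v3) is added, v1 and v3 reach each other and the check fails.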
Due to stratification conditions, the use of the special equality relation owl:sameAs in rules precludes the use of aggregation or negation-as-failure. Consider the following rule set, where the second rule tells us that a person cannot be an employee of two different companies:
[?x, :contractorFor, ?y] :-
[?x, :worksFor, ?y],
NOT [?x, :employeeOf, ?y] .
[?y, owl:sameAs, ?z] :-
[?x, :employeeOf, ?y],
[?x, :employeeOf, ?z] .
This rule set will be rejected by RDFox as the rule set mentions both NOT and owl:sameAs. Informally, this is because equality can affect every single relation, which precludes stratification in most cases.
In addition to stratification conditions, RDFox also requires certain restrictions to the structure of rules which make sure that each rule can be evaluated by binding the variables in the body of the rule to a data graph. To see an example where things go wrong consider the rule:
[?x, :worksFor, ?y] :- [?y, rdf:type, :Department] .
The rule cannot be evaluated by first matching the body to the data graph and then propagating the variable bindings to the head; indeed, matching the rule body to an RDF graph will always leave variable ?x of the rule unbound, and hence the triple that must be added as a result of applying the rule to the data is undefined. As a result, this rule will be rejected by RDFox.
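The simplest form of this binding restriction, namely that every head variable must occur in the body, can be sketched in Python as follows. The sketch only handles rules whose bodies consist of plain triple atoms, which is an assumption; the actual conditions checked by RDFox are more involved. The function name unbound_head_variables is hypothetical.

```python
def unbound_head_variables(head_atoms, body_atoms):
    """Return the head variables that no body atom can bind."""
    def variables(atoms):
        return {term for atom in atoms for term in atom if term.startswith("?")}

    return variables(head_atoms) - variables(body_atoms)


# [?x, :worksFor, ?y] :- [?y, rdf:type, :Department] .
rejected = unbound_head_variables(
    [("?x", ":worksFor", "?y")],
    [("?y", "rdf:type", ":Department")],
)

# [?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
accepted = unbound_head_variables(
    [("?x", ":locatedIn", "?z")],
    [("?x", ":locatedIn", "?y"), ("?y", ":locatedIn", "?z")],
)
```

A non-empty result, as for the first rule, means that the head triple cannot be constructed from the body bindings.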
Binding restrictions in RDFox are rather involved given that the underpinning rule language is rich and there are many subtle corner cases. However, rules accepted by the parser can always be unambiguously evaluated.
6.4. The Rule Language of RDFox
This section formally describes the rule language of RDFox. As already mentioned, the rule language supported by RDFox extends Datalog with stratified negation, stratified aggregation, built-in functions, and more, so as to provide additional data analysis capabilities.
A rule has the following form, where the formula to the left of the :- operator is the rule head and the formula to the right is the rule body. Each Hi, with 1 ≤ i ≤ j, is a tuple table atom, and each Li, with 1 ≤ i ≤ k, is a body formula. A body formula can be either an atom, a negation or an aggregate, where an atom is a tuple table atom, a filter atom, or a bind atom. Currently, the only tuple table atoms allowed in rule heads are default graph atoms and named graph atoms. A complete grammar of the rule language is given in Section 6.4.6.
H1 , …, Hj :- L1 , …, Lk .
Informally, a rule says that “if L1, …, and Lk all hold for some bindings of the rule’s variables to RDF resources, then H1, …, and Hj also hold for the same bindings.” Rule evaluation is the process of finding all variable bindings for which the rule body holds. For every such variable binding, the triples obtained from the rule head by replacing the variables according to their binding are added to the current store. A rule can be evaluated only if each variable in its head can be bound by its body. The ordering of the head atoms and/or the body formulas does not affect the meaning of a rule.
Successful variable bindings are computed by consecutively evaluating the body formulas in the rule. The evaluation of a body formula may succeed or fail for the current variable bindings. If successfully evaluated, some body formulas can bind previously unbound variables. These include tuple table atoms, bind atoms, and aggregates. Some body formulas may require that certain variables are bound before they can be evaluated. These include bind atoms, filter atoms, and general atoms backed by certain built-in tables and external sources. Finally, body formulas may have local variables that are not visible to the rest of the rule. These include the negation and the aggregate body formulas. Any mention of a local variable in a different context is treated as a different variable.
In the following sections we are going to describe the different constituents of a rule in more detail. The word term denotes either an RDF resource or a SPARQL variable.
6.4.1. Tuple Table Atom
A tuple table atom can be either a default graph atom, a named graph atom or a general atom. Default and named graph atoms are used to refer to RDF data in the current store, while general atoms can refer to built-in tuple tables or data source tuple tables. All three types of tuple table atoms provide bindings for the variables that occur in them.
6.4.1.1. Default Graph Atom
A default graph atom has the form [t1, t2, t3], where ti are terms. If t2 is an IRI, then the atom [t1, t2, t3] can be written alternatively as t2[t1, t3]. Furthermore, when t2 is the special IRI rdf:type and t3 is also an IRI, atom [t1, t2, t3] can be written alternatively as t3[t1]. Default graph atoms are evaluated against the triples in the default graph of an RDF dataset, with t1, t2 and t3 matching the subject, predicate and object of each triple.
Example A simple rule with default graph atoms only
[?x, rdf:type, :Person] :- [?x, :teacherOf, ?y] .
As we discussed earlier, this is equivalent to:
:Person[?x] :- :teacherOf[?x, ?y] .
The above rule asserts that for every triple (t, :teacherOf, s) in the default graph, a triple (t, rdf:type, :Person) has to be added to the default graph as well.
6.4.1.2. Named Graph Atom
A named graph atom has the form A(t1, t2, t3), where A is an IRI, and t1, t2 and t3 are terms. Named graph atoms are evaluated against the triples in the named graph A of an RDF dataset, with t1, t2 and t3 matching the subject, predicate and object of each triple.
Example A rule referring to a named graph :Personnel in an RDF dataset.
:Personnel(?person, rdf:type, :Person),
:Personnel(?address, rdf:type, :Address) :-
:Personnel(?person, :hasAddress, ?address) .
The above rule asserts that for every RDF triple (person, :hasAddress, address) in the named graph :Personnel, the two RDF triples (person, rdf:type, :Person) and (address, rdf:type, :Address) have to be added to the named graph :Personnel.
6.4.1.3. General Atom¶
A general atom has the form A(t1, …, tn)
, where n ≥ 1
, A
is an IRI
denoting the name of a tuple table in the current store, and t1, …, tn
are
terms. Named graph atoms are a special case of general atoms, in which A
is
the name of a graph, n = 3
and the columns represent subject, predicate and
object. General tuple table atoms can also refer to other types of tuple
tables, such as built-in tuple tables and data source tuple tables.
Example A rule with a general atom
:hasFirstName[?person, ?firstName],
:hasLastName[?person, ?lastName],
:hasAddress[?person, ?address] :-
:Person(?person, ?firstName, ?lastName, ?address) .
The rule asserts that, for every tuple (person, firstName, lastName, address) in the table :Person, the following RDF triples should be added to the default graph of the current RDF dataset.
(person, :hasFirstName, firstName)
(person, :hasLastName, lastName)
(person, :hasAddress, address)
6.4.2. Bind Atom¶
A bind atom has the form BIND(exp AS v)
, where exp
is an expression
and v
is a variable not occurring in exp
. Expressions are constructed
from variables, RDF resources, and the operators and functions supported by
RDFox (see Section 5) with the exception of functions whose values are
not determined by values of their arguments (e.g. NOW()
and RAND()
). A
bind atom can be evaluated after all variables in its expression have been
bound by other body formulas. The value of the expression is then assigned to
v
, if v
has not been previously bound. Otherwise, the value of the
expression is compared with that of the variable, and the evaluation of the
bind atom succeeds, if the two values agree. A bind atom provides a binding for
its target variable v
.
Note
Unlike SPARQL 1.1, a bind atom in a rule can be evaluated only if all variables in its expression are bound by other body formulas in the rule.
Example Using bind atom
:cTemperature[?x, ?z] :- :fTemperature[?x, ?y], BIND ((?y - 32) / 1.8 AS ?z) .
The bind atom in the above rule converts Fahrenheit degrees to Celsius
degrees. Note that the tuple table atom :fTemperature[?x, ?y]
will be
evaluated before the bind atom, since that is the only way ?y
can be
bound. This is true even if the above rule is rewritten as follows.
:cTemperature[?x, ?z] :- BIND ((?y - 32) / 1.8 AS ?z), :fTemperature[?x, ?y] .
6.4.3. Filter Atom¶
A filter atom has the form FILTER(exp)
, where exp
is an expression. A
filter atom can be evaluated only after all variables in its expression have
been previously bound. The evaluation of a filter atom succeeds if the exp
evaluates to true
for the current variable bindings. Filter atoms provide
no variable bindings.
Note
Unlike SPARQL 1.1, a filter atom in a rule can be evaluated only if all variables in its expression are bound by other body formulas in the rule.
Example Using filter atoms
:PositiveNumber[?x] :- :Number[?x], FILTER(?x > 0) .
The rule says that a number is positive if it is larger than zero. Since the order of the body formulas does not matter, the rule can be equivalently written as follows.
:PositiveNumber[?x] :- FILTER(?x > 0), :Number[?x] .
6.4.4. Negation¶
A negation has one of the following forms, where k,j ≥ 1
, B1, …, Bk
are
atoms, and ?V1, …, ?Vj
are variables local to the body formula. A negation
formula can be evaluated only after all its non-local variables have been bound.
The evaluation of a negation succeeds for the given variable bindings, if there
are no bindings for ?V1, …, ?Vj
to resources that make B1, …, Bk
true.
Each variable in B1, …, Bk
has to be either bound by another body formula in
the rule or has to appear in the variable list after EXIST
/EXISTS
.
Negation body formulas provide no variable bindings.
NOT B1
NOT(B1, …, Bk)
NOT EXIST ?V1, …, ?Vj IN B1
NOT EXIST ?V1, …, ?Vj IN (B1, …, Bk)
NOT EXISTS ?V1, …, ?Vj IN B1
NOT EXISTS ?V1, …, ?Vj IN (B1, …, Bk)
Note
Negation introduces a new variable scope; that is, all variables ?V1, …,
?Vj
listed after EXIST/EXISTS
are local to the scope of the negation
body formula, and any occurrence of such variables elsewhere in the rule
will be treated by RDFox as a different variable (an example is provided
below).
Note
RDFox will reject rules that use negation in all equality
modes other
than off
(see Equality).
Example Using negation with NOT
:hasOptionalComponent[?x, ?y] :-
:hasComponent[?x, ?y],
NOT :hasMandatoryComponent[?x, ?y] .
Example Using negation with NOT EXISTS
:BasicComponent[?x] :-
:Component[?x],
NOT EXISTS ?y IN (
:Component[?y],
:hasComponent[?x, ?y]
) .
The rule defines as basic all components that have no subcomponents.
Example Variable used in different scopes
:TopComponent[?x] :-
:hasComponent[?x, ?y],
NOT EXISTS ?y IN (
:hasComponent[?y, ?x]
) .
We can see that variable ?y
is used both inside and outside the scope
of EXISTS
. As already mentioned, such occurrences will be treated by
RDFox as referring to different variables; in particular, we can obtain an
equivalent rule by replacing all occurrences of ?y
outside the scope of
EXISTS
with a new variable ?z
.
:TopComponent[?x] :-
:hasComponent[?x, ?z],
NOT EXISTS ?y IN (
:hasComponent[?y, ?x]
) .
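The effect of a local variable under NOT EXISTS can be illustrated outside RDFox with a small Python sketch. This is not how RDFox evaluates rules; the sample component data and the helper name basic_components are assumptions made purely for illustration. The function mirrors the :BasicComponent rule shown earlier in this section.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
triples = {
    ("car", "rdf:type", "Component"),
    ("engine", "rdf:type", "Component"),
    ("piston", "rdf:type", "Component"),
    ("car", "hasComponent", "engine"),
    ("engine", "hasComponent", "piston"),
}


def basic_components(triples):
    """Emulate: :BasicComponent[?x] :- :Component[?x],
    NOT EXISTS ?y IN (:Component[?y], :hasComponent[?x, ?y]) ."""
    components = {s for (s, p, o) in triples
                  if p == "rdf:type" and o == "Component"}
    result = set()
    for x in components:
        # ?y is local to the negation: we only test whether some ?y exists.
        has_subcomponent = any((x, "hasComponent", y) in triples
                               for y in components)
        if not has_subcomponent:
            result.add(x)
    return result


basic = basic_components(triples)
print(basic)
```

In this sample only the piston has no subcomponent, so it is the only basic component. The variable y never escapes the negation, which is why negation formulas provide no bindings.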
6.4.5. Aggregate¶
Aggregates are used to compute expressions over sets of values using
aggregate functions like COUNT
, MIN
, and SUM
. An aggregate has
the following form
AGGREGATE(B1, …, Bk ON ?X1, …, ?Xj BIND f1(exp1) AS ?V1 … BIND fn(expn) AS ?Vn)
where k ≥ 1, j ≥ 0, n ≥ 0, and
B1, …, Bk are atoms,
?X1, …, ?Xj are group variables that appear in B1, …, Bk,
exp1, …, expn are expressions over the variables in B1, …, Bk, optionally prefixed by the keyword DISTINCT,
f1, …, fn are aggregate functions, and
?V1, …, ?Vn are variables that do not appear in B1, …, Bk.
The evaluation of an aggregate will find all variable bindings that make B1,
…, Bk
true. These bindings are grouped by the values of the group variables,
and the aggregate bind expressions are evaluated for each group. Non-group
variables that appear in B1, …, Bk
are local to the aggregate. Aggregates
provide bindings for the group variables (i.e. ?X1, …, ?Xj
) and the target
variables of their aggregate binds (i.e. ?V1, …, ?Vn
).
Note
An aggregate atom introduces a new variable scope. In particular, all
variables in an aggregate occurring in atoms B1, …, Bk
that are
not listed as group variables are local to the aggregate; any occurrence of
such local variables outside the atom will be treated by RDFox as a
different variable.
Note
RDFox will reject rules that use aggregation in all equality
modes
other than off
(see Equality).
Example Compute the minimum and maximum ages for the members in a family
:minAge[?family, ?minAge],
:maxAge[?family, ?maxAge] :-
:Family[?family],
AGGREGATE (
:hasMember[?family, ?member],
:hasAge[?member, ?age]
ON ?family
BIND MIN(?age) AS ?minAge
BIND MAX(?age) AS ?maxAge
) .
The above rule computes the minimum and maximum age of the members of each
family. Variables ?family
, ?minAge
and ?maxAge
are global to
the rule, whereas the variables ?member
and ?age
are local to the
aggregate, as they are variables that occur in the aggregate atoms and are
not grouped.
Example Compute the adult-to-child ratio in each family
:hasAdultToChildRatio[?family, ?ratio] :-
:Family[?family],
AGGREGATE (
:hasMember[?family, ?member],
:hasAge[?member, ?age],
FILTER(?age >= 18)
ON ?family
BIND COUNT(?member) AS ?numberOfAdults
),
AGGREGATE (
:hasMember[?family, ?member],
:hasAge[?member, ?age],
FILTER(?age < 18)
ON ?family
BIND COUNT(?member) AS ?numberOfChildren
),
BIND(?numberOfAdults / ?numberOfChildren AS ?ratio) .
The above rule computes the adult-to-child ratio in each family. Note that
the variable ?member
occurs locally in each of the aggregate atoms,
since it is not a group variable. As a result, the two occurrences can be
thought of as referring to different variables, and the above rule would be
equivalent to a rule where the two occurrences of ?member
are replaced
by ?member1
and ?member2
, respectively.
Example Define close families
:hasCloseFamily[?family1, ?family2] :-
AGGREGATE (
:Family[?family1],
:hasMember[?family1, ?member1],
:Family[?family2],
:hasMember[?family2, ?member2],
:hasFriend[?member1, ?member2]
ON ?family1 ?family2
BIND COUNT(*) AS ?numberOfFriendships
),
FILTER (?numberOfFriendships > 3) .
The above rule defines as close those pairs of families that share more than three friendships between their members.
Example Using the keyword DISTINCT
:hasFamilyFriendsCount[?x, ?cnt] :-
:Family[?x],
AGGREGATE(
:hasMember[?x, ?y],
:hasFriend[?y, ?z]
ON ?x
BIND COUNT(DISTINCT ?z) AS ?cnt) .
This rule counts the number of distinct family friends; a person is considered a family friend, if they are a friend of a family member.
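The grouping behaviour of aggregates can be sketched in Python as follows. This only emulates the minimum and maximum age example from this section; the family data, the dictionary has_age (which assumes one age per member) and the function name min_max_age are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data.
families = {"smiths", "jones", "empty"}
has_member = [("smiths", "ann"), ("smiths", "ben"), ("jones", "cat")]
has_age = {"ann": 41, "ben": 9, "cat": 30}


def min_max_age(families, has_member, has_age):
    """Emulate AGGREGATE(... ON ?family BIND MIN(?age) ... BIND MAX(?age) ...)."""
    groups = defaultdict(list)
    # Find all bindings of the aggregate atoms and group them by ?family.
    for family, member in has_member:
        if member in has_age:
            groups[family].append(has_age[member])
    # Join with :Family[?family]; ?member and ?age stay local to the aggregate.
    return {family: (min(ages), max(ages))
            for family, ages in groups.items() if family in families}


result = min_max_age(families, has_member, has_age)
print(result)
```

In this sketch a family without any matching member forms no group, so it contributes no result at all rather than an empty minimum or maximum.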
6.4.6. Grammar¶
This section presents the RDFox rule language grammar.
Ruleset := ( PrefixDecl | Rule | Fact )*
Rule := RuleHead ':-' RuleBody '.'
Fact := TupleTableAtom '.'
RuleHead := TupleTableAtom ( ',' TupleTableAtom )*
RuleBody := ( BodyFormula ( ',' BodyFormula )* )?
BodyFormula := Atom | Negation | Aggregate
Atom := TupleTableAtom | FilterAtom | BindAtom
TupleTableAtom := DefaultGraphAtom | NamedGraphAtom | GeneralAtom
DefaultGraphAtom := DefaultGraphTripleAtom | DefaultGraphPropertyAtom | DefaultGraphClassAtom
DefaultGraphTripleAtom := '[' Term ',' Term ',' Term ']'
DefaultGraphPropertyAtom := iri '[' Term ',' Term ']'
DefaultGraphClassAtom := iri '[' Term ']'
NamedGraphAtom := GraphName '(' Term ',' Term ',' Term ')'
GeneralAtom := TupleTableName '(' Term ( ',' Term )* ')'
GraphName := iri
TupleTableName := iri
Term := var | iri | bnode | RDFLiteral | NumericLiteral | BooleanLiteral
FilterAtom := 'FILTER' '(' Expression ')'
BindAtom := 'BIND' '(' Expression 'AS' var ')'
Expression := any valid SPARQL expression with no EXISTS and NOT EXISTS subexpressions, and with built-in functions restricted to RDFox built-in functions, but excluding NOW, RAND, UUID, STRUUID and aggregate functions
Negation := 'NOT' ExistsClause? ( Atom | '(' AtomList ')' )
AtomList := Atom ( ',' Atom )*
ExistsClause := ( 'EXIST' | 'EXISTS' ) var ( ',' var )* 'IN'
Aggregate := 'AGGREGATE' '(' AtomList OnClause? AggregateBind* ')'
OnClause := 'ON' var+
AggregateBind := 'BIND' ( AggregateExpression | CountExpression ) 'AS' var
AggregateExpression := AggregateFunction '(' ( 'DISTINCT' )? Expression ')'
AggregateFunction := ( 'COUNT' | 'SUM' | 'AVG' | 'MIN' | 'MAX' )
CountExpression := 'COUNT' '(' ( 'DISTINCT' )? '*' ')'
6.5. Common Uses of Rules in Practice¶
This section describes common uses of rules and reasoning in practical applications. This section will be especially useful for practitioners who are seeking to understand how the reasoning capabilities provided by RDFox can enhance graph data management.
6.5.1. Computing the Transitive Closure of a Relation¶
In many situations, we may have a relation that is not transitive, but we are interested in defining a different relation that “transitively closes” it. Consider a social network where users follow other users. The graph may be represented by the following triples.
:alice :follows :bob .
:bob :follows :charlie .
:diana :follows :alice .
A common task in social networks is to use existing connections to suggest new ones. For example, since Alice follows Bob and Bob follows Charlie, the system may suggest that Alice follow Charlie as well. Likewise, the system may suggest that Diana follow Bob; but then, if Diana follows Bob, she may also want to follow Charlie. We would like to construct an enhanced social network that contains the actual follows relations plus all the suggested additional links. The links in such enhanced social network represent the transitive closure of the original follows relation, which relates any pair of people who are connected by a path in the network. The transitive closure of the follows relation can be computed using RDFox by defining the following two rules:
[?x, :followsClosure, ?y] :- [?x, :follows, ?y] .
[?x, :followsClosure, ?z] :-
[?x, :follows, ?y],
[?y, :followsClosure, ?z] .
The first rule “copies” the contents of the direct follows relation to the new relation. The second rule implements the closure by saying that if a person p1 directly follows p2 and p2 (directly or indirectly) follows person p3, then p1 (indirectly) follows p3.
If we now issue the SPARQL query
SELECT ?x ?y WHERE { ?x :followsClosure ?y }
we obtain the expected results.
:diana :charlie .
:alice :charlie .
:diana :bob .
:alice :bob .
:bob :charlie .
:diana :alice .
Finally, we may also be interested in computing the suggested links that were not already part of the original follows relation. This can be achieved, for instance, by issuing the SPARQL query
SELECT ?x ?y
WHERE {
?x :followsClosure ?y .
FILTER NOT EXISTS { ?x :follows ?y }
}
The results are the expected ones.
:diana :charlie .
:alice :charlie .
:diana :bob .
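The two closure rules can be emulated with a naive fixpoint loop in Python. This is only a sketch of the rule semantics, not of the algorithm RDFox uses internally; the function name transitive_closure and the plain string identifiers are assumptions.

```python
# The :follows triples from the example, as (subject, object) pairs.
follows = {("alice", "bob"), ("bob", "charlie"), ("diana", "alice")}


def transitive_closure(edges):
    """Apply the two :followsClosure rules until nothing new is derived."""
    closure = set(edges)  # first rule: copy the direct relation
    changed = True
    while changed:
        changed = False
        # second rule: follows(x, y) and followsClosure(y, z) give followsClosure(x, z)
        for (x, y) in edges:
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


follows_closure = transitive_closure(follows)
# Suggested links: closure pairs that are not direct :follows edges.
suggested = follows_closure - follows

print(sorted(follows_closure))
print(sorted(suggested))
```

The sketch derives the same six closure pairs and the same three suggested links as the two SPARQL queries in this section.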
6.5.2. Composing Relations¶
An important practical use of knowledge graphs is to power Open Query Answering (Open QA) applications, where the user would pose a question in natural language, which is then automatically answered against the graph. Open QA systems often struggle to interpret questions that involve several “hops” in the graph. For instance, consider the graph consisting of the triples given next.
:douglas_adams :bornIn :uk .
:uk rdf:type :Country .
A user may ask the Open QA system for the country of birth of Douglas Adams. To obtain this information, the system would need to construct a query involving two hops in the graph. In particular, the SPARQL query
SELECT ?c
WHERE {
:douglas_adams :bornIn ?c .
?c rdf:type :Country .
}
would return :uk as answer.
The results of the open QA system would be greatly enhanced if the desired information had been available in just a single hop. RDFox rules can be used to provide a clean solution in this situation. In particular, we can use rules to define a new :countryOfBirth relation that provides a “shortcut” for directly accessing the desired information.
[?x, :countryOfBirth, ?y] :- [?x, :bornIn, ?y], [?y, rdf:type, :Country] .
The rule says that, if a person p is born in a place c, and that place is a country, then c is the country of birth of p. As a result, RDFox would derive that the country of birth of Douglas Adams is the UK. The Open QA system would now only need to construct the following simpler query, which involves a single hop in the graph, to obtain the desired information.
SELECT ?x ?y WHERE { ?x :countryOfBirth ?y }
6.5.3. Representing SPARQL 1.1 Property Paths¶
As already mentioned, RDFox does not currently support SPARQL 1.1 property paths. It is, however, possible to encode property paths as rules.
Informally, a property path searches through the RDF graph for a sequence of IRIs that form a path conforming to a regular expression. For instance, the following query in our familiar social network example
SELECT ?x WHERE { ?x :follows+ :bob }
returns the set of people that follow :bob directly or indirectly in the
network. In this case, the property path (?x :follows+ :bob)
represents a
path of arbitrary length from any node to :bob via the :follows relation, where
the +
symbol is the familiar one in regular expressions indicating “one or
more occurrences”.
Property paths representing paths of arbitrary length are closely related to computing the transitive closure of a relation. In particular, the following rules would compute the set of “Bob followers” as those who follow :bob directly or indirectly.
[?x, rdf:type, :BobFollower] :- [?x, :follows, :bob] .
[?x, rdf:type, :BobFollower] :-
[?x, :follows, ?y],
[?y, rdf:type, :BobFollower] .
The simple query
SELECT ?x WHERE { ?x rdf:type :BobFollower }
gives us the same answers as the original query using property paths.
6.5.4. Defining a Query as a View¶
When querying a knowledge graph, we may be interested in materializing the result of a SPARQL query as a new relation in the graph. This can be the case, for instance, if the query is interesting in its own right, can be used to define new relations, or simplifies the formulation of additional queries.
We can use an RDFox rule for this purpose, where the SPARQL query that we want to materialize in the graph is represented in the body of the rule and the answer as a new relation in the head.
For instance, consider again the previous example of a social network, where we were interested in suggesting new followers (recall the Transitive Closure usage pattern). Recall that we used a query
SELECT ?x ?y
WHERE {
?x :followsClosure ?y
FILTER NOT EXISTS { ?x :follows ?y }
}
to obtain the suggested links that were not already part of the original follows relation. We may be interested in storing this query as a separate relation in the graph. For this, we could rewrite the query as a rule defining a new :suggestFollows relation:
[?x, :suggestFollows, ?y] :- [?x, :followsClosure, ?y], NOT [?x, :follows, ?y] .
The body of the rule represents the WHERE
clause in the query. The filter
expression in the query is captured by the negated atom. Then, the simple query
SELECT ?x ?y WHERE { ?x :suggestFollows ?y }
will give us the expected answers
:diana :charlie .
:alice :charlie .
:diana :bob .
It is worth pointing out that only a subset of SPARQL 1.1 queries can be
transformed into an RDFox rule in the way described. In particular, all queries
involving basic graph patterns, filter expressions, negation (NOT EXISTS
,
MINUS
) and aggregation can be represented. In contrast, SPARQL queries with
more than two answer variables, or using OPTIONAL
or UNION
in the
WHERE
clause cannot be represented as rules.
6.5.5. Performing Calculations and Aggregating Data¶
RDFox rules can be used to perform computations over the data in a knowledge graph and store the results in a different relation. For instance, consider a graph with the following triples, specifying the height of different people in cm.
:alice :height "165"^^xsd:integer .
:bob :height "180"^^xsd:integer .
:diana :height "168"^^xsd:integer .
:emma :height "165"^^xsd:integer .
We would like to compute their height in feet, and record it in the graph by adding suitable triples over a new relation. For this, we can import the following RDFox rule.
[?x, :heightInFeet, ?y] :- [?x, :height, ?h], BIND(?h*0.0328 AS ?y) .
The BIND
construct evaluates an expression and assigns the value of the
expression to a variable.
We can now query the graph for the newly introduced relation to obtain the list of people and their height in both centimeters and feet.
SELECT ?x ?m ?f
WHERE {
?x :height ?m .
?x :heightInFeet ?f .
}
and obtain the expected answers
:emma 165 5.412 .
:diana 168 5.5104 .
:bob 180 5.904 .
:alice 165 5.412 .
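The arithmetic performed by the BIND expression can be checked with a short Python sketch. It uses Decimal so that the products match the listed answers exactly; the dictionary heights_cm and the constant name FEET_PER_CM are illustrative assumptions, and 0.0328 is simply the approximate factor used in the rule.

```python
from decimal import Decimal

# Heights in centimetres, taken from the example triples.
heights_cm = {"alice": 165, "bob": 180, "diana": 168, "emma": 165}

# Approximate conversion factor used in the rule (feet per centimetre).
FEET_PER_CM = Decimal("0.0328")

# Emulate BIND(?h * 0.0328 AS ?y) for every :height triple.
height_in_feet = {person: Decimal(cm) * FEET_PER_CM
                  for person, cm in heights_cm.items()}

for person, feet in sorted(height_in_feet.items()):
    print(person, heights_cm[person], feet)
```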
Rules can also be used to compute aggregated values (e.g., sums, counts, averages) over the graph and store the results in a new relation. Consider, for instance, the following graph describing who follows whom in a social network.
:alice :follows :bob .
:bob :follows :charlie .
:diana :follows :alice .
:charlie :follows :alice .
:emma :follows :bob .
:alice rdf:type :Person .
:bob rdf:type :Person .
:charlie rdf:type :Person .
:diana rdf:type :Person .
:emma rdf:type :Person .
The graph also contains information about people’s hobbies, as represented by the following triples.
:alice :likes :tennis .
:bob :likes :music .
:diana :likes :swimming .
:charlie :likes :football .
:emma :likes :reading .
:tennis rdf:type :Sport .
:swimming rdf:type :Sport .
:football rdf:type :Sport .
We would like to count, for each person, the number of followers who enjoy practicing a sport. RDFox provides aggregation constructs which enable these kinds of computations.
[?y, :sportyFollowerCnt, ?cnt] :-
[?y, rdf:type, :Person],
AGGREGATE(
[?x, :follows, ?y],
[?x, :likes, ?w],
[?w, rdf:type, :Sport]
ON ?y
BIND COUNT(DISTINCT ?x) AS ?cnt) .
In particular, the rule states that, if p1 is a person, then count all distinct
people who follow p1 and who like some sport, and
store the result in the new :sportyFollowerCnt
relation.
By issuing the following SPARQL query
SELECT ?x ?cnt WHERE { ?x :sportyFollowerCnt ?cnt }
we obtain that Bob has one sporty follower (Alice), whereas Alice has two sporty followers (Diana and Charlie).
:bob 1 .
:alice 2 .
This type of computation is compatible with the computation of the transitive closure of a relation. For instance, we may be interested in counting the number of (direct or indirect) followers who are sporty. For this, we can use RDFox rules to compute the transitive closure of the follows relation:
[?x, :followsClosure, ?y] :- [?x, :follows, ?y] .
[?x, :followsClosure, ?z] :- [?x, :follows, ?y], [?y, :followsClosure, ?z] .
We can then use the following rule to compute the desired count.
[?y, :sportyFollowerClosureCnt, ?cnt] :-
[?y, rdf:type, :Person],
AGGREGATE(
[?x, :followsClosure, ?y],
[?x, :likes, ?w],
[?w, rdf:type, :Sport]
ON ?y
BIND COUNT(DISTINCT ?x) AS ?cnt) .
The following SPARQL query
SELECT ?x ?cnt WHERE { ?x :sportyFollowerClosureCnt ?cnt }
then provides the following results.
:charlie 3 .
:bob 3 .
:alice 3 .
We observe that the count for Charlie does not seem quite right. Charlie is followed directly only by Bob (who is not sporty); however, Bob is followed by Alice (a sporty person) and Alice is followed by Diana (another sporty person). Naturally, we would have expected a count of 2; however, Charlie also follows Alice and hence he transitively follows himself, which explains the count of 3. If we want to prevent this situation, we can modify the second rule implementing transitive closure to eliminate self-loops as follows:
[?x, :followsClosure, ?z] :-
[?x, :follows, ?y],
[?y, :followsClosure, ?z],
FILTER(?x != ?z) .
The previous query now yields the expected results
:charlie 2 .
:bob 3 .
:alice 2 .
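The three sets of counts reported in this section can be reproduced with the following Python sketch. It emulates the aggregate rule over the direct relation, over the plain transitive closure, and over the closure with the self-loop filter. The function names closure and sporty_follower_count are assumptions, and the fixpoint loop is a naive illustration rather than the RDFox evaluation strategy.

```python
# Example data from this section.
follows = {("alice", "bob"), ("bob", "charlie"), ("diana", "alice"),
           ("charlie", "alice"), ("emma", "bob")}
people = {"alice", "bob", "charlie", "diana", "emma"}
likes = {("alice", "tennis"), ("bob", "music"), ("diana", "swimming"),
         ("charlie", "football"), ("emma", "reading")}
sports = {"tennis", "swimming", "football"}


def closure(edges, drop_self_loops=False):
    """Transitive closure; optionally emulate FILTER(?x != ?z) in the second rule."""
    result = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, y) in edges:
            for (y2, z) in list(result):
                if y != y2 or (drop_self_loops and x == z):
                    continue
                if (x, z) not in result:
                    result.add((x, z))
                    changed = True
    return result


def sporty_follower_count(relation):
    """Emulate ON ?y BIND COUNT(DISTINCT ?x): people without a match get no count."""
    sporty = {x for (x, w) in likes if w in sports}
    counts = {}
    for person in people:
        followers = {x for (x, y) in relation if y == person and x in sporty}
        if followers:
            counts[person] = len(followers)
    return counts


direct_counts = sporty_follower_count(follows)
closure_counts = sporty_follower_count(closure(follows))
filtered_counts = sporty_follower_count(closure(follows, drop_self_loops=True))
print(direct_counts, closure_counts, filtered_counts)
```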
6.5.6. Arranging Concepts and Relations in a Hierarchical Structure¶
A common use of ontologies is to arrange concepts (called classes in OWL 2) and relations (called properties in OWL 2) in a subsumption hierarchy. For instance, we may want to say that dogs and cats are mammals and that mammals are animals. Such subsumption relationships can be easily represented using RDFox rules.
[?x, rdf:type, :Mammal] :- [?x, rdf:type, :Dog] .
[?x, rdf:type, :Mammal] :- [?x, rdf:type, :Cat] .
[?x, rdf:type, :Animal] :- [?x, rdf:type, :Mammal] .
Suppose that we have a graph with the following triples:
:max rdf:type :Dog .
:coco rdf:type :Cat .
:teddy rdf:type :Mammal .
Then, RDFox will deduce that Max and Coco are both mammals and therefore also animals, and also that Teddy is an animal. In particular, the query
SELECT ?x WHERE { ?x rdf:type :Animal }
yields the expected results
:max .
:teddy .
:coco .
It is also often the case that concepts are “assigned” certain properties. For instance, mammals have children which are also mammals. This is known as a range restriction in the ontology jargon, and can be represented using the following RDFox rule
[?y, rdf:type, :Mammal] :- [?x, rdf:type, :Mammal], [?x, :hasChild, ?y] .
If we now extend the graph with the following triples.
:max :hasChild :betsy .
:coco :hasChild :minnie .
RDFox will derive automatically that both Betsy and Minnie are also mammals (and therefore also animals). Indeed, the query
SELECT ?x WHERE { ?x rdf:type :Mammal }
will yield the expected results.
:max .
:betsy .
:minnie .
:teddy .
:coco .
In many applications, it is also useful to represent subsumption relations
between the edges in a knowledge graph, to specify that one relation is more
specific than the other. For instance, we may want to say that the
:hasDaughter
relation is more specific than the :hasChild
relation.
This can be represented using the following RDFox rule.
[?x, :hasChild, ?y] :- [?x, :hasDaughter, ?y] .
If we now add the following triple to the graph
:betsy :hasDaughter :luna .
RDFox can infer that Luna is the child of Betsy and therefore she is also a
mammal, and an animal. Indeed, the previous query listing all mammals will now
also include :luna
as an answer.
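How the class and property rules of this section interact can be sketched with a simple forward-chaining loop in Python. The SUBCLASS table and the function materialize are hypothetical helpers that hard-code the five rules above; the sketch only illustrates the derivations and is not RDFox's materialization algorithm.

```python
# All triples introduced in this section.
triples = {
    ("max", "rdf:type", "Dog"), ("coco", "rdf:type", "Cat"),
    ("teddy", "rdf:type", "Mammal"),
    ("max", "hasChild", "betsy"), ("coco", "hasChild", "minnie"),
    ("betsy", "hasDaughter", "luna"),
}

# Class subsumption rules: Dog and Cat are Mammals, Mammals are Animals.
SUBCLASS = {"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal"}


def materialize(triples):
    """Apply the rules repeatedly until no new triple is derived."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p == "rdf:type" and o in SUBCLASS:
                new.add((s, "rdf:type", SUBCLASS[o]))
            if p == "hasDaughter":  # :hasDaughter is more specific than :hasChild
                new.add((s, "hasChild", o))
            if p == "hasChild" and (s, "rdf:type", "Mammal") in facts:
                new.add((o, "rdf:type", "Mammal"))  # children of mammals are mammals
        if new <= facts:
            return facts
        facts |= new


facts = materialize(triples)
mammals = {s for (s, p, o) in facts if p == "rdf:type" and o == "Mammal"}
animals = {s for (s, p, o) in facts if p == "rdf:type" and o == "Animal"}
print(sorted(mammals))
```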
6.5.7. Detecting Cyclic Relations¶
A common task in knowledge graphs is to identify cyclic relationships. For instance, partonomy relations are typically acyclic (e.g., if an engine is part of a car we would not expect the car also to be part of the engine!). In these cases, cycle detection may be needed to detect errors in the graph and thus provide data validation.
A simple case of this pattern is when the relation we are checking for cyclicity is naturally transitive. Such is the case, for instance, of the partOf relation. Consider the following graph:
:a :partOf :b .
:b :partOf :c .
:c :partOf :a .
The graph contains a cyclic path :a -> :b -> :c -> :a via the :partOf
relation. The relationship is naturally transitive and hence we can use the
corresponding pattern to define it as such.
[?x, :partOf, ?z] :- [?x, :partOf, ?y], [?y, :partOf, ?z] .
The following SPARQL query now gives us which elements are part of others (directly or indirectly)
SELECT ?x ?y WHERE { ?x :partOf ?y }
which gives us the following results
:a :a .
:c :c .
:b :b .
:a :c .
:b :a .
:c :b .
:c :a .
:b :c .
:a :b .
Cyclicity manifests itself by the presence of selfloops (e.g., :a
is
derived to be a part of itself). Hence, it is possible to detect that the
:partOf relation is cyclic by issuing the following SPARQL query.
ASK { ?x :partOf ?x }
The result is true, since the partonomy relation does have a self-loop.
Alternatively, we could have defined the following additional rule.
[:partOf, rdf:type, :CyclicRelation] :- [?x, :partOf, ?x] .
This rule tells us that if any object is determined to be a part of itself, then the partonomy relation is cyclic.
We can now issue the following SPARQL query, which retrieves the list of cyclic relations in the graph, which in this case consists of the relation :partOf.
SELECT ?x WHERE { ?x rdf:type :CyclicRelation }
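The cycle check can be sketched in Python as a transitive closure followed by a search for self-loops. The function name transitive and the variable is_cyclic are assumptions; the loop is a naive illustration of the transitivity rule, not the RDFox implementation.

```python
# The :partOf triples from the example, as (part, whole) pairs.
part_of = {("a", "b"), ("b", "c"), ("c", "a")}


def transitive(edges):
    """Close the relation under: partOf(x, y) and partOf(y, z) give partOf(x, z)."""
    closure = set(edges)
    while True:
        new = {(x, z)
               for (x, y) in closure
               for (y2, z) in closure
               if y == y2} - closure
        if not new:
            return closure
        closure |= new


closure = transitive(part_of)
# Emulates ASK { ?x :partOf ?x }: any self-loop reveals a cycle.
is_cyclic = any(x == y for (x, y) in closure)
print(len(closure), is_cyclic)
```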
6.5.8. Defining Attributes and Relationships as Mandatory¶
In knowledge graphs, data is typically incomplete.
For instance, suppose that the data in a knowledge graph has been obtained from a variety of sources. The graph has different types of information about people, such as their name, job title and so on. We notice that some people in the graph have a date of birth, whereas others do not. Because of the nature of our application, we would like to have the date of birth of each person represented in the graph, and would like to find out which people are missing this information; that is, we would like to make the presence of a date of birth value mandatory for every person in the graph. In relational databases this is typically solved by declaring an integrity constraint.
Consider the following graph.
:alice :dob "11/01/1987"^^xsd:string .
:alice rdf:type :Person .
:bob :dob "23/07/1980"^^xsd:string .
:bob rdf:type :Person .
:diana :height "168"^^xsd:integer .
:diana rdf:type :Person .
:emma :dob "10/02/1965"^^xsd:string .
:emma rdf:type :Person .
:max rdf:type :Dog .
We can use the following rule to record absence of a date of birth for people.
[?x, rdf:type, owl:Nothing] :-
[?x, rdf:type, :Person],
NOT EXISTS ?y IN ([?x, :dob, ?y]) .
The rule says that if a person p lacks a date of birth d, then p incurs a constraint violation. The constraint violation is recorded by making person p an instance of the special owl:Nothing unary relation, which is also present in the OWL 2 standard.
The following SPARQL query then correctly reports that Diana violates the constraint (whereas Max does not because he is a dog).
SELECT ?x WHERE { ?x rdf:type owl:Nothing }
This type of computation combines well with type inheritance. For instance, suppose that we add the following triple:
:charlie rdf:type :Student .
And the following rule stating that every student is a person
[?x, rdf:type, :Person] :- [?x, rdf:type, :Student] .
Then, the previous query will give as results
:charlie .
:diana .
Indeed, since Charlie is a student, he is also a person; furthermore, Charlie lacks date of birth information.
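The mandatory-attribute check, including the inherited :Person type for students, can be sketched in Python as a set difference. The function name violations is hypothetical, and the sketch hard-codes the student-to-person rule instead of running a general rule engine.

```python
# Example graph from this section, including the added :Student triple.
triples = {
    ("alice", "dob", "11/01/1987"), ("alice", "rdf:type", "Person"),
    ("bob", "dob", "23/07/1980"), ("bob", "rdf:type", "Person"),
    ("diana", "height", 168), ("diana", "rdf:type", "Person"),
    ("emma", "dob", "10/02/1965"), ("emma", "rdf:type", "Person"),
    ("max", "rdf:type", "Dog"),
    ("charlie", "rdf:type", "Student"),
}


def violations(triples):
    """People (including students) for whom no :dob triple exists."""
    # Every student is a person; this is derived before the negation is checked.
    persons = {s for (s, p, o) in triples
               if p == "rdf:type" and o in ("Person", "Student")}
    with_dob = {s for (s, p, o) in triples if p == "dob"}
    # NOT EXISTS ?y IN ([?x, :dob, ?y])
    return persons - with_dob


missing_dob = violations(triples)
print(sorted(missing_dob))
```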
The meaning of the special class owl:Nothing
is different in RDFox and the
OWL 2 standard. If one can derive from an OWL 2 ontology that an object is
an instance of owl:Nothing, then the ontology is inconsistent and querying the
ontology becomes logically meaningless. Thus, the OWL 2 standard would require
users to modify the data and/or ontology to fix the inconsistency prior to
attempting to issue queries. Furthermore, it is worth noting that in OWL 2 it
is not possible to write statements that check for “absence of information”;
this is due to the monotonicity properties of OWL 2 as a fragment of
first-order logic.
In contrast, in RDFox, deriving an instance of owl:Nothing does not lead to a logical inconsistency and the answers to queries remain perfectly meaningful. In the pattern we have described, querying for owl:Nothing simply provides users with the list of all nodes in the graph for which mandatory information is missing. As a result, the user is warned rather than prevented from carrying out a task such as issuing a query. For instance, if we were to ask a query to RDFox such as the following
SELECT ?x WHERE { ?x rdf:type :Person }
We would still obtain the expected results (see below) despite the fact that there are constraint violations in the data.
:alice .
:charlie .
:emma .
:diana .
:bob .
This behavior is also different from relational databases, where the system would typically reject updates that lead to a constraint violation. As already mentioned, RDFox continues to operate normally and would accept any updates although constraints are being violated. Of course, users are encouraged to query the system in order to detect and rectify such violations.
6.5.9. Expressing Defaults and Exceptions¶
Rules can be used to write default statements (that is, statements that normally hold in the absence of additional information). This is especially useful to represent exceptions to rules, which is important, for instance, in legal domains.
Consider the following graph saying that Tweety is a bird.
:tweety rdf:type :Bird .
Birds typically fly; that is, in the absence of additional information, the fact that Tweety is a bird constitutes sufficient evidence to believe that Tweety flies. There may, however, be exceptions. For instance, penguins are birds that cannot fly, and hence if we were to find out that Tweety is a penguin, then we would need to withdraw our default assumption that Tweety flies.
RDFox rules can be used to model this type of default reasoning. In particular, consider a rule saying that birds fly unless they are penguins.
[?x, rdf:type, :FlyingAnimal] :-
[?x, rdf:type, :Bird],
NOT [?x, rdf:type, :Penguin] .
We can now issue a SPARQL query asking for the list of flying animals
SELECT ?x WHERE { ?x rdf:type :FlyingAnimal }
and obtain :tweety as an answer.
Suppose now that we were to extend the graph with the following triple
:tweety rdf:type :Penguin .
Then, the same query would now give us an empty set of answers since, in the light of the new evidence, we can no longer conclude that Tweety flies.
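The non-monotonic behaviour of the default rule can be sketched in Python: adding a fact removes a previously derived conclusion. The function name flying_animals is an assumption, and the sketch simply recomputes the rule from scratch after the update.

```python
def flying_animals(triples):
    """Emulate: FlyingAnimal(x) :- Bird(x), NOT Penguin(x)."""
    birds = {s for (s, p, o) in triples if p == "rdf:type" and o == "Bird"}
    penguins = {s for (s, p, o) in triples if p == "rdf:type" and o == "Penguin"}
    return birds - penguins


graph = {("tweety", "rdf:type", "Bird")}
before = flying_animals(graph)   # Tweety flies by default

graph.add(("tweety", "rdf:type", "Penguin"))
after = flying_animals(graph)    # the default conclusion is withdrawn

print(before, after)
```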
6.5.10. Restructuring Data¶
Rules can be used to transform the structure of the data in a knowledge graph (e.g., by adding properties to a relationship).
Consider the following knowledge graph representing employees and their employer.
:alice :worksFor :oxford_university .
:bob :worksFor :acme .
:charlie :worksFor :oxford_university .
:charlie :worksFor :acme .
Suppose that we now want to expand the graph by adding further information about the employment, such as the salary and the start date. This information is relative to each specific employment of an employee; for instance, Charlie will have a different salary and start date for his employment with Oxford University and his employment with Acme.
We can use RDFox rules to automatically restructure the data in the graph to
account for the new information. To this end, we use the built-in tuple table
rdfox:SKOLEM
, which allows us to associate a tuple of elements with a
unique blank node.
[?employment, rdf:type, :Employment],
[?employment, :hasEmployee, ?employee],
[?employment, :hasEmployer, ?employer] :-
[?employee, :worksFor, ?employer],
rdfox:SKOLEM("Employment", ?employee, ?employer, ?employment) .
For each :worksFor edge connecting a person ?employee with their employer ?employer, the rule creates a blank node ?employment, which is unique for the tuple ("Employment", ?employee, ?employer). Furthermore, the rule relates the blank node ?employment to the entities ?employee and ?employer using the :hasEmployee and :hasEmployer properties, respectively.
The query
SELECT ?employment ?property ?object WHERE {
?employment ?property ?object .
?employment rdf:type :Employment
}
gives us the new triples generated by the application of the previous rule
_:Employment_116_200 :hasEmployer :acme .
_:Employment_116_200 :hasEmployee :bob .
_:Employment_116_200 rdf:type :Employment .
_:Employment_113_199 :hasEmployer :oxford_university .
_:Employment_113_199 :hasEmployee :alice .
_:Employment_113_199 rdf:type :Employment .
_:Employment_156_199 :hasEmployer :oxford_university .
_:Employment_156_199 :hasEmployee :charlie .
_:Employment_156_199 rdf:type :Employment .
_:Employment_156_200 :hasEmployer :acme .
_:Employment_156_200 :hasEmployee :charlie .
_:Employment_156_200 rdf:type :Employment .
New data relative to an employment, such as associated salary and start date, can be inserted using rules like the following ones.
[?z, :hasSalary, "60000"^^xsd:integer] :- rdfox:SKOLEM("Employment", :alice, :oxford_university, ?z) .
[?z, :hasSalary, "55000"^^xsd:integer] :- rdfox:SKOLEM("Employment", :charlie, :oxford_university, ?z) .
[?z, :hasSalary, "40000"^^xsd:integer] :- rdfox:SKOLEM("Employment", :charlie, :acme, ?z) .
[?z, :hasSalary, "45000"^^xsd:integer] :- rdfox:SKOLEM("Employment", :bob, :acme, ?z) .
Note that each of these rules uses the rdfox:SKOLEM built-in tuple table in the rule body, so that ?z is bound to the same blank node that was generated for the corresponding employment in the triples listed above.
To check that the salary data has been inserted correctly, we can issue the query
SELECT ?x (SUM(?y) AS ?income)
WHERE {
?e :hasEmployee ?x .
?e :hasSalary ?y
}
GROUP BY ?x
which gives us the total yearly income for each person by summing up the salary of each of their employments, giving the expected results.
:alice 60000 .
:charlie 95000 .
:bob 45000 .
Data restructuring via reification has multiple applications. In particular, RDF can directly represent only binary relations, and hence relations of higher arity can be represented only through reification. Reification is also needed if we want to qualify or annotate edges in a graph (e.g., by adding weights, dates, or other relevant properties).
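The reification pattern of this section can be sketched in Python to make the role of the Skolem function explicit. The helper skolem below is a hypothetical stand-in for rdfox:SKOLEM: it only guarantees that equal arguments give equal node labels, and the labels it produces are not the ones RDFox generates.

```python
from collections import defaultdict


def skolem(label, *args):
    """Hypothetical stand-in for rdfox:SKOLEM: equal arguments, equal node."""
    return "_:" + label + "_" + "_".join(a.lstrip(":") for a in args)


works_for = [
    (":alice", ":oxford_university"),
    (":bob", ":acme"),
    (":charlie", ":oxford_university"),
    (":charlie", ":acme"),
]

# Reify every :worksFor edge as an :Employment node.
triples = set()
for employee, employer in works_for:
    node = skolem("Employment", employee, employer)
    triples.add((node, "rdf:type", ":Employment"))
    triples.add((node, ":hasEmployee", employee))
    triples.add((node, ":hasEmployer", employer))

# Attach salaries by recomputing the same Skolem node from the same tuple.
salaries = {
    (":alice", ":oxford_university"): 60000,
    (":charlie", ":oxford_university"): 55000,
    (":charlie", ":acme"): 40000,
    (":bob", ":acme"): 45000,
}
for (employee, employer), amount in salaries.items():
    triples.add((skolem("Employment", employee, employer), ":hasSalary", amount))


def income_per_person(triples):
    """Sum the salaries of all employments of each employee."""
    employee_of = {s: o for s, p, o in triples if p == ":hasEmployee"}
    income = defaultdict(int)
    for s, p, o in triples:
        if p == ":hasSalary":
            income[employee_of[s]] += o
    return dict(income)


print(income_per_person(triples))
```

Because the node is a function of the tuple, the salary facts never need to know the generated identifier in advance; they only need the employee and the employer.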
6.5.11. Representing Ordered Relations¶
Many relations naturally imply some sort of order, and in such cases we are often interested in finding the first and last elements of such orders. For instance, consider the managerial structure of a company.
:alice :manages :bob .
:bob :manages :jeremy .
:bob :manages :emma .
:emma :manages :david .
:jeremy :manages :monica .
We would like to recognize which individuals in the company are “top-level managers”. We can use a rule to define a top-level manager as a person who manages someone and is not managed by anyone else.
[?x, rdf:type, :TopLevelManager] :-
[?x, :manages, ?y],
NOT EXISTS ?z IN ([?z, :manages, ?x]) .
The query
SELECT ?x WHERE { ?x rdf:type :TopLevelManager }
asking for the list of top-level managers gives us :alice as the answer. We can now use a rule to define “junior employees” as those who have a manager but who do not themselves manage anyone.
[?x, rdf:type, :JuniorEmployee] :-
[?y, :manages, ?x],
NOT EXISTS ?z IN ([?x, :manages, ?z]) .
The query
SELECT ?x WHERE { ?x rdf:type :JuniorEmployee }
gives us :monica and :david as answers.
Prominent examples of ordered relations where we may be interested in finding the top and bottom elements are partonomies (part-whole relations) and is-a hierarchies.
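The two rules of this section amount to set differences over the :manages relation, which the following Python sketch makes explicit. The function names are hypothetical, and the code only illustrates the logic; it does not use RDFox.

```python
manages = {
    (":alice", ":bob"),
    (":bob", ":jeremy"),
    (":bob", ":emma"),
    (":emma", ":david"),
    (":jeremy", ":monica"),
}


def top_level_managers(manages):
    """People who manage someone and are managed by no one."""
    managers = {m for m, _ in manages}
    managed = {e for _, e in manages}
    return managers - managed


def junior_employees(manages):
    """People who have a manager and manage no one."""
    managers = {m for m, _ in manages}
    managed = {e for _, e in manages}
    return managed - managers


print(sorted(top_level_managers(manages)))
print(sorted(junior_employees(manages)))
```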
6.5.12. Representing Equality Cliques¶
When integrating data from multiple sources using a knowledge graph, it is usually the case that objects from different sources are identified to be the same. In this setting, we want to be able to answer complex queries that span the different sources, and to easily identify the source where the information came from. Additionally, we may not want to use the equality predicate owl:sameAs to identify the objects, since our rule set may contain rules involving aggregation and/or negation-as-failure, which cannot be used in conjunction with equality.
For instance, assume that we are integrating sources s1, s2, and s3 containing information about music artists and records. Assume that we have determined (e.g., using entity resolution techniques or exploiting explicit links between the sources) that “John Doe” in s1 is the same as “J. H. Doe” in s2 and “The Blues King” in s3. We can represent these correspondences using a binary relation ost:same which we define as reflexive, symmetric, and transitive using RDFox rules as given next.
s1:john_doe rdf:type s1:Artist .
s2:john_H_doe rdf:type s2:Performer .
s3:blues_king rdf:type s3:Musician .
s1:john_doe ost:same s2:john_H_doe .
s1:john_doe ost:same s3:blues_king .
s2:john_H_doe ost:same s3:blues_king .
[?x, ost:same, ?x] :- [?x, ost:same, ?y] .
[?y, ost:same, ?x] :- [?x, ost:same, ?y] .
[?x, ost:same, ?z] :- [?x, ost:same, ?y], [?y, ost:same, ?z] .
In this way, the aforementioned objects form a clique in the integrated graph. Indeed, the query
SELECT ?x ?y WHERE { ?x ost:same ?y }
returns the answer
s3:blues_king s2:john_H_doe .
s2:john_H_doe s3:blues_king .
s2:john_H_doe s2:john_H_doe .
s3:blues_king s3:blues_king .
s2:john_H_doe s1:john_doe .
s3:blues_king s1:john_doe .
s1:john_doe s1:john_doe .
s1:john_doe s3:blues_king .
s1:john_doe s2:john_H_doe .
In order to be able to query across artists from different sources, we want to define a unique representative for the elements in the clique. A plausible strategy is to first select the smallest individual according to some predefined total order (the order itself is irrelevant, and we can choose for example the order on IRIs provided by RDFox). To select the smallest object we introduce the following rules.
[?x, ost:comesBefore, ?y] :- [?x, ost:same, ?y], FILTER (?x < ?y) .
[?y, rdf:type, ost:NotSmallestInClique] :- [?x, ost:comesBefore, ?y] .
[?x, rdf:type, ost:SmallestInClique] :-
[?x, ost:comesBefore, ?y],
NOT [?x, rdf:type, ost:NotSmallestInClique] .
The first rule generates an order amongst the elements of the clique. The second rule says that if ?x comes before ?y then ?y is not the smallest element. The third rule finally identifies the smallest element in the clique. The following query
SELECT ?x ?y WHERE { ?x ost:comesBefore ?y }
reveals the generated order
s2:john_H_doe s3:blues_king .
s1:john_doe s2:john_H_doe .
s1:john_doe s3:blues_king .
where s1:john_doe is correctly identified as the smallest element by the query.
SELECT ?x WHERE { ?x rdf:type ost:SmallestInClique }
Now that we have identified an element of the clique, we can create a representative of the clique using the built-in tuple table rdfox:SKOLEM, as given next.
[?z, rdf:type, ost:Artist],
[?z, ost:represents, ?x] :-
[?x, rdf:type, ost:SmallestInClique],
rdfox:SKOLEM("OSTArtist", ?x, ?z) .
[?x, ost:represents, ?z] :-
[?x, ost:represents, ?y],
[?y, ost:comesBefore, ?z] .
The first rule creates a blank node that is an ost:Artist and that represents the smallest element in the clique. The second rule ensures that the new blank node also represents every other element in the clique.
The query
SELECT ?z ?x WHERE { ?z ost:represents ?x }
yields the expected result.
_:OSTArtist_2136 s2:john_H_doe .
_:OSTArtist_2136 s3:blues_king .
_:OSTArtist_2136 s1:john_doe .
It is possible to achieve the same results by using an optimized set of rules that generates fewer triples. In particular, this optimized representation avoids axiomatizing the ost:same property as reflexive and symmetric. Let’s reconsider the data.
s1:john_doe rdf:type s1:Artist .
s2:john_H_doe rdf:type s2:Performer .
s3:blues_king rdf:type s3:Musician .
s1:john_doe ost:same s2:john_H_doe .
s1:john_doe ost:same s3:blues_king .
s2:john_H_doe ost:same s3:blues_king .
We now redefine directly the ost:comesBefore relation using the following rules
[?x, ost:comesBefore, ?y] :-
[?x, ost:same, ?y],
FILTER(?x > ?y) .
[?x, ost:comesBefore, ?y] :-
[?y, ost:same, ?x],
FILTER(?x > ?y) .
[?x, ost:comesBefore, ?z] :-
[?x, ost:comesBefore, ?y],
[?y, ost:comesBefore, ?z],
FILTER(?x > ?y) .
[?x, ost:comesBefore, ?y] :-
[?z, ost:comesBefore, ?x],
[?z, ost:comesBefore, ?y],
FILTER(?x > ?y) .
The query
SELECT ?x ?y WHERE { ?x ost:comesBefore ?y }
reveals a generated order. Once we have the order, we proceed as before.
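The idea of picking one representative per clique can also be illustrated in Python. Note that the sketch below swaps in a different technique, a union-find structure, rather than mirroring the Datalog rules; it assumes plain string comparison as the total order, and clique_representatives is a hypothetical helper name.

```python
def clique_representatives(same_pairs):
    """Map every node to the smallest member of its clique (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller root, so that a root is always the minimum
            # of its component under the chosen order.
            smaller, larger = sorted((root_a, root_b))
            parent[larger] = smaller

    return {node: find(node) for node in list(parent)}


same_pairs = [
    ("s1:john_doe", "s2:john_H_doe"),
    ("s1:john_doe", "s3:blues_king"),
    ("s2:john_H_doe", "s3:blues_king"),
]
print(clique_representatives(same_pairs))
```

As in the rule-based version, the input pairs do not need to be closed under reflexivity or symmetry; the representative is determined by the order alone.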
6.5.13. Populating a Knowledge Graph from a Data Source¶
Rules can be used to bring information from an external data source into a knowledge graph.
Data feeding a knowledge graph often stems from different types of external data sources, such as relational databases. We can use RDFox rules to specify how each record in the external data source corresponds to a set of nodes and edges in the graph. RDFox allows us to load the information in an external data source by means of a two-stage process. The first step is to register the data source. For instance, consider the following data about the employees of ACME corporation in a CSV file named employee.csv.
emp_id,emp_name,job_name,hire_date,salary
68319,KAYLING,PRESIDENT,,200000
66928,BLAZE,MANAGER,20170501,90000
67453,JONES,ASSISTANT,20180503,35000
We want to create a data source tuple table with IRI :employee that gives access to the file and has 5 arguments, one per column in the table. This can be achieved using the following commands.
dsource register delimitedFile "EmployeeDS" \
file "$(dir.root)csv/employee.csv" \
header true
The net result is that the employee.csv file is registered as an RDFox data source. We called the data source EmployeeDS. Here, file specifies the path to the file, and header indicates whether the file contains a header row.
At this point, we can check whether the data source has been registered
successfully by running the command
dsource show EmployeeDS
to obtain the expected information
Data source type name: delimitedFile
Data source name: EmployeeDS
Parameters: file = employee.csv
header = true

Table name: employee.csv
Column 1: emp_id xsd:integer
Column 2: emp_name xsd:string
Column 3: job_name xsd:string
Column 4: hire_date xsd:string
Column 5: salary xsd:integer

The next step creates the data source tuple table.
tupletable create :employee \
"dataSourceName" "EmployeeDS" \
"columns" 5 \
"1" "https://oxfordsemantic.tech/RDFox/tutorial/{1}_{2}" \
"1.datatype" "iri" \
"2" "{emp_name}" \
"2.datatype" "string" \
"3" "{job_name}" \
"3.datatype" "string" \
"4" "{hire_date}" \
"4.datatype" "string" \
"4.if-empty" "absent" \
"5" "{salary}" \
"5.datatype" "integer" \
"5.if-empty" "absent"
The IRI of the new relation will be :employee, where : is the default prefix defined beforehand as https://oxfordsemantic.tech/RDFox/tutorial/.
The :employee data relation will contain 5 arguments. The first argument provides an identifier for each employee as a composition of the prefix’s IRI, the employee ID (first column in the data source) and the employee name (second column). The remaining arguments are obtained from the column of the corresponding name in the data source. Since not every employee may have a hiring date or a known salary, the “if-empty” settings indicate that the corresponding argument in the RDFox relation will be left empty.
Once the relation has been created in RDFox, it can be queried in SPARQL and used in rule bodies. To query it in SPARQL, we use an RDFox extension to SPARQL which uses the TT syntax, where TT stands for tuple table. The SPARQL query:
SELECT ?x ?y ?z ?u ?w WHERE { TT :employee{ ?x ?y ?z ?u ?w } }
will return the following answers:
:68319_KAYLING "KAYLING" "PRESIDENT" UNDEF 200000 .
:66928_BLAZE "BLAZE" "MANAGER" "01/05/2017" 90000 .
:67453_JONES "JONES" "ASSISTANT" "03/05/2018" 35000 .
As we can see, the UNDEF entry indicates that the value of the hiring date for the first employee is missing. Now that we have the RDFox relation correctly in place, the next step is to turn the data in the relation into a graph. For this we can use the following rule, where the RDFox relation forms the antecedent and the edges generated from it are described in the consequent of the rule:
[?x, rdf:type, :Employee],
[?x, :worksFor, :acme],
[?x, :hasName, ?y],
[?x, :hasJob, ?z],
[?x, :hiredOnDate, ?u],
[?x, :salary, ?w] :-
:employee(?x, ?y, ?z, ?u, ?w) .
The materialization of the rule generates a graph from the data in the relation. The new relations in the graph can be used in other rules to define additional concepts and relations. For instance, we can add rules stating that every employee is a person and that every person with a salary higher than £50,000 pays tax at the higher rate.
[?x, rdf:type, :Person ] :- [?x, rdf:type, :Employee ] .
[?x, :taxRate, :higherrate] :- [?x, rdf:type, :Person], [?x, :salary, ?y], FILTER(?y > 50000) .
Now we can query the graph to obtain, for instance, the list of high income tax payers.
SELECT ?x WHERE { ?x :taxRate :higherrate }
And obtain the expected results.
:68319_KAYLING .
:66928_BLAZE .
Data can be imported from different data sources and merged together in the graph. For instance, if we had a different employee table (e.g., for a different department) in another CSV, we could register it as a new RDFox data source and exploit a rule akin to the one before to further populate the binary relations in the graph, as well as to create new ones.
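To make the mapping from records to graph edges concrete, the following Python sketch performs the same transformation on the sample CSV data using only the standard library. It does not use RDFox; the function names are hypothetical, and skipping the triple for an empty column is an assumption made for the sketch, not a statement about how RDFox treats absent values.

```python
import csv
import io

CSV_TEXT = """emp_id,emp_name,job_name,hire_date,salary
68319,KAYLING,PRESIDENT,,200000
66928,BLAZE,MANAGER,20170501,90000
67453,JONES,ASSISTANT,20180503,35000
"""


def employee_triples(text):
    """Turn each CSV record into a set of edges about one employee node."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f":{row['emp_id']}_{row['emp_name']}"
        triples.append((subject, "rdf:type", ":Employee"))
        triples.append((subject, ":worksFor", ":acme"))
        triples.append((subject, ":hasName", row["emp_name"]))
        triples.append((subject, ":hasJob", row["job_name"]))
        if row["hire_date"]:  # assumption: empty values produce no edge
            triples.append((subject, ":hiredOnDate", row["hire_date"]))
        if row["salary"]:
            triples.append((subject, ":salary", int(row["salary"])))
    return triples


def higher_rate_payers(triples, threshold=50000):
    """Employees whose salary exceeds the threshold."""
    return sorted(s for s, p, o in triples if p == ":salary" and o > threshold)


print(higher_rate_payers(employee_triples(CSV_TEXT)))
```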
6.6. OWL 2 Support in RDFox¶
This section describes the support in RDFox for OWL 2—the W3C standard language for representing ontologies.
6.6.1. OWL 2 Ontologies¶
An OWL 2 ontology is a formal description of a domain of interest. OWL 2 defines three different syntactic categories.
The first syntactic category consists of entities, such as classes, properties and individuals, which are identified by an IRI. Classes represent sets of objects in the world; for instance, a class :Person can be used to represent the set of all people. Properties represent binary relations, and OWL 2 distinguishes between two different types of properties: data properties describe relationships between objects and literal values (e.g., the data property :age can be used to represent a person’s age), whereas object properties describe relationships between two objects (e.g., an object property :locatedIn can be used to relate places to their locations). Finally, individuals in OWL 2 are used to refer to concrete objects in the world; for instance, the individual :oxford can be used to refer to the city of Oxford.
The second syntactic category consists of expressions, which can be used to describe complex classes and relations constructed in terms of simpler ones. For instance, the expression ObjectUnionOf( :Cat :Dog) represents the set of animals that are either cats or dogs.
The third syntactic category consists of axioms, which are statements about entities and expressions that are asserted to be true in the domain described. For instance, the OWL 2 axiom SubClassOf(:scientist :Person) states that every scientist is a person by defining the class :scientist to be a subclass of the class :Person.
The main component of an OWL 2 ontology is a set of axioms. Ontologies can also import other ontologies and contain annotations.
OWL 2 ontologies can be written using different syntaxes. RDFox can currently load ontologies written in the functional syntax as well as ontologies written in the turtle syntax.
6.6.2. OWL 2 Ontologies vs. RDFox Rules¶
OWL 2 and the rule language of RDFox are languages for knowledge representation with well-understood formal semantics.
Both languages share a common core. That is, certain types of rules can be equivalently rewritten as OWL 2 axioms and vice versa. For instance, the following axiom and rule both express that every scientist is also a person.
SubClassOf(:Scientist :Person)
[?x, rdf:type, :Person] :- [?x, rdf:type, :Scientist] .
In particular, the OWL 2 specification describes the OWL 2 RL profile, a subset of the OWL 2 language that is amenable to implementation via rule-based technologies.
There are, however, many other aspects where OWL 2 and the rule language of RDFox differ, and there are many constructs in OWL 2 that cannot be translated as RDFox rules and vice versa. For instance, OWL 2 can represent disjunctive knowledge, i.e., we can write an OWL 2 axiom saying that every student is either an undergraduate student, a graduate student, or a doctoral student:
SubClassOf(:Student ObjectUnionOf(:UndergraduateSt :MscSt :DoctoralSt) )
RDFox rules, however, do not support disjunction. There are also many kinds of rules in RDFox that cannot be expressed using OWL 2 axioms; these include, for instance, rules involving features such as aggregation, negation-as-failure or certain built-in functions; furthermore, there are also plain Datalog rules that do not have a correspondence in OWL 2.
6.6.3. Loading OWL 2 Ontologies in RDFox¶
RDFox is able to load, store and manipulate three kinds of syntactic elements: triples, rules, and OWL 2 axioms. These are kept in separate “bags” in the system and can be added or deleted individually. For instance, consider the following text file “ontology.txt” containing an ontology written in the functional syntax of OWL 2:
Prefix(:=<http://www.example.com/ontology1#>)
Ontology( <http://www.example.com/ontology1>
SubClassOf( :Child :Person )
SubClassOf( :Person ObjectUnionOf(:Child :Adult) )
)
The ontology contains two axioms. The first axiom tells us that every child is also a person, whereas the second axiom states that every person is either a child or an adult. The first axiom can be faithfully translated into RDFox rules, whereas the second one cannot. RDFox provides a full API for OWL 2 and can parse, store and manage all kinds of OWL 2 axioms in functional syntax. As a result, it will correctly load both axioms, but will issue a warning indicating that the second axiom has no correspondence into rules.
To load the ontology in RDFox, we can initialize a data store (see the Getting Started guide) and import the file in the usual way.
import ontology.txt
The ontology axioms are now loaded in the data store and kept internally in the “axioms bag”.
We can now import a turtle file containing the following triples:
:jen rdf:type :Child .
:jen :hasParent :mary .
These triples will be kept internally in the “triples bag”.
Finally, we can import the following RDFox rule saying that the parent of a child is a person.
[?y, rdf:type, :Person] :- [?x, :hasParent, ?y], [?x, rdf:type, :Child] .
This rule is kept internally in RDFox in the separate “rules bag”.
Now, we are in a position to perform reasoning. For this we can issue a SPARQL query asking for the list of all people:
SELECT ?x WHERE { ?x rdf:type :Person }
To answer the query, RDFox will translate OWL 2 axioms into rules and will consider together all data triples, all RDFox rules added by the user, plus all rules stemming from the translation of OWL 2 axioms. In particular, the following rules and facts contribute to answering the query, where the first rule comes from the translation of the first ontology axiom as a rule (the second axiom in the ontology is ignored):
:jen rdf:type :Child .
:jen :hasParent :mary .
[?x, rdf:type, :Person] :- [?x, rdf:type, :Child] .
[?y, rdf:type, :Person] :- [?x, :hasParent, ?y], [?x, rdf:type, :Child] .
As a result, RDFox will return as answers both :jen and :mary. Indeed, :jen is a child and hence also a person by the first rule; in turn, :mary is the parent of :jen and hence also a person by the second rule.
The translation of OWL 2 axioms into rules for the purpose of reasoning is performed on a best-effort basis. In particular, sometimes RDFox may not be able to translate the whole of a given axiom, but may still be able to translate a part of it. For instance, suppose that we add to our data store the following axiom saying that every person is a human and also either an adult or a child:
SubClassOf(:Person ObjectIntersectionOf(:Human ObjectUnionOf(:Child :Adult)))
RDFox will load the axiom correctly, but will again issue a warning due to the use of disjunction in the axiom. Suppose that we now issue the query
SELECT ?x WHERE { ?x rdf:type :Human }
RDFox will correctly return both :jen and :mary as answers. Indeed, as already explained, RDFox can deduce that both :jen and :mary are persons. Now, although the last axiom we imported cannot be fully translated into rules, RDFox will still be able to partly translate it into the following rule:
[?x, rdf:type, :Human] :- [?x, rdf:type, :Person] .
from which we can deduce that :jen and :mary are also humans.
OWL 2 ontologies can also be loaded from a turtle file, following the standard representation of OWL 2 ontologies as triples. In order to load an ontology from a turtle file, we need to initialize a store with special parameters. Using the command line, we can initialize such a store as follows:
init par-complex-nn owl-in-rdf-support relaxed
This command creates a store in which parsing of OWL as triples is enabled. As a result, RDFox will identify OWL 2 axioms that were encoded as RDF triples and will translate those axioms into rules as described earlier. Suppose that we import into the store a turtle file containing the following triples:
:Child rdfs:subClassOf :Person .
:Person rdfs:subClassOf :Human .
:jen rdf:type :Child .
The first two triples correspond to the serialization into triples of the following axioms in functional syntax:
SubClassOf( :Child :Person )
SubClassOf( :Person :Human )
As a result of parsing, all triples will be stored in the “triples bag” of RDFox, whereas the first two triples will also be added as axioms.
Now, assume that we issue a query asking for the list of all humans:
SELECT ?x WHERE { ?x rdf:type :Human }
Then, RDFox will correctly return :jen as the answer. Internally, RDFox will transform the OWL axioms into rules
[?x, rdf:type, :Person] :- [?x, rdf:type, :Child] .
[?x, rdf:type, :Human] :- [?x, rdf:type, :Person] .
and compute the corresponding materialization.
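What "computing the materialization" means for subclass rules of this kind can be sketched with a naive fixpoint loop in Python. This is an illustration of the idea only: the function materialize is hypothetical and the loop is far simpler than the reasoning algorithms used by RDFox.

```python
def materialize(facts, subclass_axioms):
    """Apply 'x is a Sup if x is a Sub' rules until nothing new is derived.

    facts: set of (individual, class) pairs.
    subclass_axioms: list of (sub, sup) pairs, one per SubClassOf axiom.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for individual, cls in list(derived):
            for sub, sup in subclass_axioms:
                if cls == sub and (individual, sup) not in derived:
                    derived.add((individual, sup))
                    changed = True
    return derived


facts = {(":jen", ":Child")}
axioms = [(":Child", ":Person"), (":Person", ":Human")]

closure = materialize(facts, axioms)
humans = sorted(x for x, c in closure if c == ":Human")
print(humans)
```

The query for humans is then answered by a simple lookup in the materialized set, with no further reasoning at query time.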
6.6.4. Subsumption Reasoning¶
OWL 2 reasoners implement a wide range of reasoning services, which are not limited to query answering. In particular, OWL reasoners can solve the subsumption problem: given a class, they would compute all its inferred superclasses.
For example, given
SubClassOf( :Child :Person )
SubClassOf( :Person :Human )
an OWL 2 reasoner would be able to infer
SubClassOf( :Child :Human )
as a consequence, since from the fact that every child is a person and every person is a human, it follows that every child is also a human.
RDFox is a materialization-based query answering system, and it has not been designed for solving problems such as class subsumption. RDFox, however, is still able to detect some such subsumption relations should this be required in an application.
One way to achieve this is to reduce subsumption to query answering. In particular, to check whether it is true that every child is a human, we can introduce a fresh object in the data store, which we make an instance of :Child. That is, we can import the following triple, where :a_child is a fresh URI.
:a_child rdf:type :Child .
Then, we would test whether :a_child is inferred to be also a human by issuing the query
ASK { :a_child rdf:type :Human }
which would return true.
Another way of testing subsumption is to import the ontology as a set of triples:
:Child rdfs:subClassOf :Person .
:Person rdfs:subClassOf :Human .
When the triples are parsed and eventually translated into rules for reasoning, RDFox will also add a number of internal rules that partially encode the semantics of the RDFS and OWL vocabularies; in particular, it will add rules representing the relation rdfs:subClassOf as transitive and reflexive, and also saying that every class is a subclass of owl:Thing. As a result, the following SPARQL query
SELECT ?x WHERE { :Child rdfs:subClassOf ?x }
will correctly return all superclasses of :Child as
:Person .
:Human .
owl:Thing .
:Child .
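The second approach treats rdfs:subClassOf as a reflexive and transitive relation with owl:Thing on top. A minimal Python sketch of that closure, using a hypothetical superclasses helper and no RDFox functionality, is the following.

```python
def superclasses(cls, sub_class_of):
    """All superclasses of cls under a reflexive, transitive subClassOf.

    sub_class_of: iterable of (sub, sup) pairs.
    owl:Thing is added because every class is a subclass of it.
    """
    result = {cls, "owl:Thing"}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in sub_class_of:
            if sub == current and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result


ontology = [(":Child", ":Person"), (":Person", ":Human")]
print(sorted(superclasses(":Child", ontology)))
```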
6.6.5. Current Limitations¶
The following details should be taken into account by users of RDFox who rely on OWL 2 ontologies in their applications:
RDFox currently does not support ontology importation. That is, if we load ontology O, which in turn imports O1 and O2, only the contents of O will be loaded (and not those of O1 and O2).
RDFox also does not support associating axioms to a given ontology. In particular, if we load two different ontology files, all the axioms in both ontologies will be added to the same bag of axioms in the system.
6.7. SWRL Support in RDFox¶
This section describes the support in RDFox for SWRL—a format for representing rules on the Semantic Web.
6.7.1. SWRL Rules¶
The SWRL specification extends the set of OWL axioms to include also Datalog rules. It thus enables rules to be combined with an OWL ontology. SWRL rules can be written using different syntaxes. RDFox can currently load SWRL rules written in the functional syntax as well as rules written in the turtle syntax. SWRL rules can be easily expressed as RDFox rules, with the only exception of rules containing certain built-ins which do not have a direct correspondence to SPARQL 1.1 built-in functions.
6.7.2. Loading SWRL Rules in RDFox¶
RDFox treats SWRL as an extension of OWL 2 and hence SWRL rules are loaded and managed in exactly the same way as OWL 2 axioms. For instance, consider the following text file swrlrules.txt containing an ontology written in the functional syntax of SWRL:
Prefix(:=<http://www.example.com/ontology1#>)
Ontology( <http://www.example.com/ontology1>
Implies(Antecedent(:Student(I-variable(:x1))) Consequent(:Person(I-variable(:x1))))
)
The ontology consists of a rule stating that every student is a person.
To load the ontology in RDFox, we can initialize a data store (see the Getting Started guide) and import the file in the usual way.
import swrlrules.txt
The SWRL rule is now loaded in the data store and kept internally in the “axioms bag”.
We can next import a turtle file containing the triple:
:jen rdf:type :Student .
Now, we are in a position to perform reasoning. For this we can issue a SPARQL query asking for the list of all people:
SELECT ?x WHERE { ?x rdf:type :Person }
To answer the query, RDFox will translate SWRL into RDFox rules, and will return :jen as a result.
SWRL can also be loaded from a turtle file, following the relevant syntax in the SWRL specification. For this, RDFox follows exactly the same approach as with OWL 2 axioms expressed as triples.
6.7.3. Negation-As-Failure in SWRL¶
By default, SWRL rules that feature ObjectComplementOf are rejected by RDFox, since negation in RDFox is interpreted under the closed-world assumption, while negation in SWRL is interpreted under the open-world assumption.
This behavior can be overridden by initializing a store with the option swrl-negation-as-failure set to on, as described in Section 8.3.11.
Example: Consider the following SWRL rule with a suitably defined default prefix:
Implies( Antecedent( :A(I-variable(:x)) ObjectComplementOf(:B)(I-variable(:x)) ) Consequent( :C(I-variable(:x)) ) )
If a store is initialized with the option swrl-negation-as-failure on, RDFox will convert the above SWRL rule to the following RDFox rule:
:C[?x] :- :A[?x], NOT :B[?x] .
The feature is limited to class expressions of the form ObjectComplementOf(C), where C is a class name. The usage of ObjectComplementOf in complex class expressions or in the consequent of a SWRL rule is not supported.
This feature can be combined with the owl-in-rdf-support option when importing SWRL rules encoded as RDF.
6.7.4. Current Limitations¶
SWRL comes with a large number of built-in functions, but unfortunately only a subset of them maps directly to SPARQL 1.1 built-in functions, which are those natively supported in RDFox.
The list of built-in functions not supported is as follows:
swrlb:roundHalfToEven, swrlb:normalizeSpace, swrlb:translate, swrlb:anyURI, swrlb:tokenize, swrlb:yearMonthDuration, swrlb:dayTimeDuration, swrlb:dateTime, swrlb:date, swrlb:time, swrlb:addYearMonthDurations, swrlb:subtractYearMonthDurations, swrlb:multiplyYearMonthDuration, swrlb:divideYearMonthDurations, swrlb:addDayTimeDurations, swrlb:subtractDayTimeDurations, swrlb:multiplyDayTimeDurations, swrlb:divideDayTimeDuration, swrlb:subtractDates, swrlb:subtractTimes, swrlb:addYearMonthDurationToDateTime, swrlb:addDayTimeDurationToDateTime, swrlb:subtractYearMonthDurationFromDateTime, swrlb:subtractDayTimeDurationFromDateTime, swrlb:addYearMonthDurationToDate, swrlb:addDayTimeDurationToDate, swrlb:subtractYearMonthDurationFromDate, swrlb:subtractDayTimeDurationFromDate, swrlb:addDayTimeDurationToTime, swrlb:subtractDayTimeDurationFromTime, swrlb:subtractDateTimesYieldingYearMonthDuration, swrlb:subtractDateTimesYieldingDayTimeDuration, swrlb:listConcat, swrlb:listIntersection, swrlb:listSubtraction, swrlb:member, swrlb:length, swrlb:first, swrlb:rest, swrlb:sublist, and swrlb:empty.
6.8. Explaining Reasoning Results¶
RDFox can display a proof of how a given triple has been derived. Such proofs can be very useful for explaining reasoning results to users as well as for understanding the reasoning process.
Consider a data store containing the triple
:kiki rdf:type :Cat .
and the following rules:
[?x, rdf:type, :Mammal] :- [?x, rdf:type, :Cat] .
[?x, rdf:type, :Animal] :- [?x, rdf:type, :Mammal] .
As a result of reasoning, RDFox will derive the following new triples:
:kiki rdf:type :Mammal .
:kiki rdf:type :Animal .
Suppose that we want to understand how triple :kiki rdf:type :Animal has been derived. A way to do this in RDFox is to use the explain command in the shell as follows:
explain :Animal[:kiki]
RDFox will explicate the reasoning process by displaying the following proof of the requested fact:
:Animal[:kiki]
    :Animal[?x] :- :Mammal[?x] .    { ?x --> :kiki }
        :Mammal[:kiki]
            :Mammal[?x] :- :Cat[?x] .    { ?x --> :kiki }
                :Cat[:kiki] EDB
We can read the proof bottom-up. Starting from fact :Cat[:kiki] in the data, we apply rule :Mammal[?x] :- :Cat[?x] by matching variable ?x to :kiki and derive the fact :Mammal[:kiki]. The application of rule :Animal[?x] :- :Mammal[?x] to fact :Mammal[:kiki], where ?x is matched to :kiki, yields the desired result.
Typically, there will be several different proofs for a given fact. To see this, suppose that we add to our data store the triples
:kiki :eats :luxury_pet_treat .
:luxury_pet_treat rdf:type :PetFood .
and the rule
[?x, rdf:type, :Animal] :- [?x, :eats, ?y], [?y, rdf:type, :PetFood] .
Then, in addition to the previous one, the following is also a proof that :kiki is an animal:
:Animal[:kiki]
    :Animal[?x] :- :eats[?x,?y], :PetFood[?y] .    { ?x --> :kiki, ?y --> :luxury_pet_treat }
        :eats[:kiki,:luxury_pet_treat] EDB
        :PetFood[:luxury_pet_treat] EDB
Indeed, we can match rule :Animal[?x] :- :eats[?x, ?y], :PetFood[?y] to the data facts :eats[:kiki, :luxury_pet_treat] and :PetFood[:luxury_pet_treat] by matching variable ?x to :kiki and variable ?y to :luxury_pet_treat to derive :Animal[:kiki].
If we run again the explanation command
explain :Animal[:kiki]
RDFox will display both proofs.
:Animal[:kiki]
    :Animal[?x] :- :Mammal[?x] .    { ?x --> :kiki }
        :Mammal[:kiki]
            :Mammal[?x] :- :Cat[?x] .    { ?x --> :kiki }
                :Cat[:kiki] EDB
    :Animal[?x] :- :eats[?x,?y], :PetFood[?y] .    { ?x --> :kiki, ?y --> :luxury_pet_treat }
        :eats[:kiki,:luxury_pet_treat] EDB
        :PetFood[:luxury_pet_treat] EDB
Since the number of possible different proofs for a given fact may be very large, we may be content with just obtaining a single one. We can use the explain command to obtain a shortest proof as follows:
explain shortest :Animal[:kiki]
which will return the following proof
:Animal[:kiki]
    :Animal[?x] :- :eats[?x,?y], :PetFood[?y] .    { ?x --> :kiki, ?y --> :luxury_pet_treat }
        :eats[:kiki,:luxury_pet_treat] EDB
        :PetFood[:luxury_pet_treat] EDB
Indeed, this is the shortest proof as it involves a single rule application, whereas the alternative proof involves two rule applications.
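The comparison between the two proofs can be reproduced with a small Python sketch that counts rule applications. It assumes, as the paragraph above suggests, that the length of a proof is the number of rule applications it contains; the function shortest_proof_sizes and the string encoding of facts are hypothetical and unrelated to how RDFox computes explanations.

```python
import math


def shortest_proof_sizes(edb, ground_rules):
    """Minimum number of rule applications needed to derive each fact.

    edb: set of explicitly given facts (cost 0).
    ground_rules: list of (head, [body facts]) rule instances.
    """
    cost = {fact: 0 for fact in edb}
    changed = True
    while changed:
        changed = False
        for head, body in ground_rules:
            if all(b in cost for b in body):
                candidate = 1 + sum(cost[b] for b in body)
                if candidate < cost.get(head, math.inf):
                    cost[head] = candidate
                    changed = True
    return cost


edb = {"Cat(kiki)", "eats(kiki,treat)", "PetFood(treat)"}
ground_rules = [
    ("Mammal(kiki)", ["Cat(kiki)"]),
    ("Animal(kiki)", ["Mammal(kiki)"]),
    ("Animal(kiki)", ["eats(kiki,treat)", "PetFood(treat)"]),
]

sizes = shortest_proof_sizes(edb, ground_rules)
print(sizes["Animal(kiki)"])
```

The derivation through :Mammal costs two rule applications, while the one through :eats and :PetFood costs one, so the latter is kept.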
When using the explanation command, it is important to understand that rules in RDFox can come from different sources:
User rules, such as the ones in our previous example, are rules introduced directly by the user.
User axioms are OWL 2 axioms imported by the user, which are internally translated into rules.
Special rules are rules that have no direct connection with the information provided by the user and are internally added by RDFox. Examples include the rules for subsumption reasoning provided at the end of the previous section and the rules obtained by axiomatizing equality as a transitive, reflexive and symmetric relation.
Consider, for example, a data store where we import the following triple:
:kiki rdf:type :Cat .
and also the following OWL 2 axioms in functional syntax:
SubClassOf( :Cat :Mammal )
SubClassOf( :Mammal :Animal )
If we now run the explain command
explain :Animal[:kiki]
we obtain the same proof as before:
:Animal[:kiki]
    :Animal[?X] :- :Mammal[?X] .  { ?X > :kiki }
        :Mammal[:kiki]
            :Mammal[?X] :- :Cat[?X] .  { ?X > :kiki }
                :Cat[:kiki] EDB
It is important to note, however, that the explicitly given OWL 2 axioms are not displayed in the proof, but rather the rules that are obtained from them internally.
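As a rough illustration of the correspondence between the two axioms and the rules shown in the proof, the Python sketch below renders a SubClassOf axiom between two named classes as a rule string. The function name subclass_axiom_to_rule is hypothetical, and the sketch covers only this simplest case; the translation RDFox actually performs handles a much larger fragment of OWL 2 and is not reproduced here.

```python
def subclass_axiom_to_rule(sub_class, super_class):
    """Render SubClassOf(sub_class super_class) as a rule string.

    Only named classes are handled; this is an illustrative assumption,
    not the translation implemented by RDFox.
    """
    return f"{super_class}[?X] :- {sub_class}[?X] ."


# The two axioms from the example, as (subclass, superclass) pairs.
axioms = [(":Cat", ":Mammal"), (":Mammal", ":Animal")]
rules = [subclass_axiom_to_rule(sub, sup) for sub, sup in axioms]
for rule in rules:
    print(rule)
```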
6.9. Monitoring Reasoning in RDFox
This section gives an overview of the functionality implemented in RDFox for monitoring the progress of reasoning.
Let us start by creating a new data store in which reasoning will be performed in a single-threaded fashion:
dstore create default par-complex-nn
threads 1
To enable monitoring of reasoning we use the following shell commands, where the second one establishes the frequency at which information is provided in the console.
set reason.monitor progress
set log-frequency 1
We can now import rules and data which, in our case, will come from the well-known LUBM benchmark.
We first import the rules:
import LUBM_L.dlog
RDFox will then import the rules and display relevant information about the rule importation process:
Adding data in file './LUBM_L.dlog'.
[1]: START './LUBM_L.dlog'
[1]: FINISHED './LUBM_L.dlog'
Time since import start: 1 ms
Time since start of this import: 1 ms
Facts processed in this import: 0
Number of finished imports: 1
Total facts processed so far: 0
Import operation took 0.4 s.
Processed 98 rules, of which 98 were updated.
In particular, we can see that 98 rules were imported in total and that rule importation took 0.4s.
We can now ask RDFox to print detailed information about the imported rules. For instance, the following command will provide statistics about the rule set and then will print each rule in a given order:
info rulestats print-rules by-body-size
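RDFox will first provide some statistics about the rule set:
================================ RULES STATISTICS ================================
Component | Body size | Nonrecursive rules | Recursive rules | Total rules
----------+-----------+--------------------+-----------------+------------
        0 |         2 |                  0 |               1 |           1
        1 |         1 |                 19 |               2 |          21
        1 |         2 |                  1 |               0 |           1
        2 |         1 |                 19 |               2 |          21
        3 |         1 |                 13 |               0 |          13
        4 |         1 |                 28 |               8 |          36
        4 |         3 |                  0 |               5 |           5
----------+-----------+--------------------+-----------------+------------
Total:    |           |                 80 |              18 |          98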
RDFox organizes rules by components, which gives us an idea of how information flows during reasoning. To give some intuition as to what a component is, consider the following simple set of rules:
[?x, rdf:type, :B] :- [?x, rdf:type, :A] .
[?x, rdf:type, :C] :- [?x, rdf:type, :B] .
[?x, rdf:type, :D] :- [?x, rdf:type, :B] .
[?x, rdf:type, :A] :- [?x, rdf:type, :D] .
We can see that :B depends on :A since to derive facts about :B we need to first obtain facts about :A. Similarly, :C and :D both depend on :B. Finally, :A depends on :D, and hence the first, third and fourth rules are involved in a cycle of dependencies. As a result, the flow of information during rule application can be seen in two stages: first, we need to derive all facts about :A, :B and :D using the first, third and fourth rules. Then, we can derive all facts about :C using the second rule. To reflect this, RDFox will organize these rules into two components: the first component will contain the first, third and fourth rules, which together are considered recursive (they are involved in a cycle of dependencies), whereas the second rule will go in its own component and will be identified as nonrecursive.
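The grouping described above can be sketched in Python by building a dependency graph between predicates and collecting the predicates that can reach each other. This is a minimal illustration under our own assumptions: rules are encoded as (head predicate, body predicates) pairs, the names reachable and rule_components are hypothetical, and RDFox's actual component computation may differ.

```python
# The four example rules as (head predicate, list of body predicates).
RULES = [("B", ["A"]), ("C", ["B"]), ("D", ["B"]), ("A", ["D"])]


def reachable(graph, start):
    """Return all predicates reachable from start along body -> head edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def rule_components(rules):
    """Group rules by the dependency cycle (if any) of their head predicate.

    Returns a list of components in dependency order; each component is a
    list of (head, body, is_recursive) triples.
    """
    graph, predicates = {}, set()
    for head, body in rules:
        predicates.add(head)
        for pred in body:
            predicates.add(pred)
            graph.setdefault(pred, set()).add(head)
    reach = {p: reachable(graph, p) for p in predicates}

    def cycle_of(p):
        # Predicates that depend on p and on which p depends, plus p itself.
        return frozenset({p} | {q for q in reach[p] if p in reach[q]})

    groups = {}
    for head, body in rules:
        comp = cycle_of(head)
        recursive = any(pred in comp for pred in body)
        groups.setdefault(comp, []).append((head, body, recursive))

    def upstream_size(comp):
        # Components that others depend on have fewer upstream predicates.
        member = next(iter(comp))
        return len(comp | {p for p in predicates if member in reach[p]})

    return [groups[comp] for comp in sorted(groups, key=upstream_size)]


components = rule_components(RULES)
for index, comp in enumerate(components):
    print("component", index, comp)
```

On the four example rules, the sketch places the rules for :B, :D and :A in the first component and marks them as recursive, and places the rule for :C alone in the second component as nonrecursive.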
The table above gives the same kind of information for the more complex LUBM rules. The rules are arranged in five components (0 to 4), and for each component the table shows the number of rules involved in dependency cycles (recursive rules), the number of nonrecursive rules, the total number of rules, and their body sizes.
RDFox will then print the rules component by component on the console and, within each component, it will sort the rules by the number of atoms in their bodies. In our simple example about :A, :B, :C and :D, the information printed will look as follows:
 COMPONENT: 0
 NONRECURSIVE RULES: 0
 RECURSIVE RULES: 3
**********************************************************************************
** BODY SIZE: 1
** RECURSIVE RULES: 3
:B[?x] :- :A[?x] .
:D[?x] :- :B[?x] .
:A[?x] :- :D[?x] .

 COMPONENT: 1
 NONRECURSIVE RULES: 1
 RECURSIVE RULES: 0
**********************************************************************************
** BODY SIZE: 1
** NONRECURSIVE RULES: 1
:C[?x] :- :B[?x] .
==================================================================================
Now that we have imported the rules, we can also import the data:
import LUBMlarge.ttl
At this point, RDFox will load the data (without performing any reasoning yet) and will provide information about the progress of loading. We can see an excerpt of such information below:
> import LUBMlarge.ttl
Adding data in file './LUBMlarge.ttl'.
[1]: START './LUBMlarge.ttl'
[1]: PROGRESS './LUBMlarge.ttl'
Time since start of import: 1001 ms
Time since start of this import: 1001 ms
Facts processed in this import: 418000
[1]: PROGRESS './LUBMlarge.ttl'
Time since start of import: 2001 ms
Time since start of this import: 2001 ms
Facts processed in this import: 795000
[1]: PROGRESS './LUBMlarge.ttl'
Time since start of import: 3002 ms
Time since start of this import: 3002 ms
Facts processed in this import: 1164000
...
[1]: FINISHED './LUBMlarge.ttl'
Time since import start: 13143 ms
Time since start of this import: 13143 ms
Facts processed in this import: 5000000
Number of finished imports: 1
Total facts processed so far: 5000000
Import operation took 17.8 s.
Processed 5000000 facts, of which 5000000 were updated.
In particular, we can see how many data facts have been imported each second. We can also see that, in the end, 5,000,000 data triples were imported and that the import took 17.8s in total.
We can now compute the materialization of the LUBM rules and facts in the store using the mat command:
mat
RDFox will display information about the number of facts generated:
Materializing rules incrementally.
Rules will be processed by strata.
Maximum depth of backward chaining is unbounded.
Materialization time: 0 s.

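Table           | Facts                 | EDB                   | IDB
----------------+-----------------------+-----------------------+----------------------
internal:triple | 6,826,914 > 6,826,914 | 5,000,000 > 5,000,000 | 6,826,913 > 6,826,913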

The column labeled EDB tells us the number of facts that were explicitly given in the data file. In turn, the column labeled IDB indicates the total number of facts in the store after materialization; in our case, this means that the system has derived over 1.8 million new facts through rule application. The Table column gives the name of each tuple table in the store. Here we just have the default triple table, but in other cases we may also have other tuple tables, such as those obtained from named graphs, each with its own numbers of explicit and derived facts. Finally, the column labeled Facts indicates the total number of memory slots that were reserved by different threads during reasoning; this number can be larger than the total number of facts in the system, as some of these slots may not have been used to store a fact.
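The figures quoted above follow directly from the three columns. The short Python sketch below spells out the arithmetic; the variable names are ours, and the numbers are copied from the output shown in this section.

```python
# Values after materialization, taken from the mat output above.
reserved_slots = 6_826_914   # "Facts" column
explicit_facts = 5_000_000   # "EDB" column
total_facts = 6_826_913      # "IDB" column

# Facts added by rule application.
derived_facts = total_facts - explicit_facts
# Slots that were reserved but do not hold a fact.
unused_slots = reserved_slots - total_facts

print(f"{derived_facts:,} derived facts, {unused_slots:,} unused slot(s)")
```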
6.10. Querying the Explicitly given Data
After reasoning, RDFox will by default answer all SPARQL queries with respect to the obtained materialization. For instance, suppose that we have a data store with fact :a rdf:type :A and the following rules:
[?x, rdf:type, :B] :- [?x, rdf:type, :A] .
[?x, rdf:type, :C] :- [?x, rdf:type, :B] .
[?x, rdf:type, :D] :- [?x, rdf:type, :B] .
[?x, rdf:type, :A] :- [?x, rdf:type, :D] .
The materialization will contain the following facts, where three of them have been derived and only fact :a rdf:type :A was originally in the data:
:a rdf:type :A .
:a rdf:type :B .
:a rdf:type :C .
:a rdf:type :D .
If we issue the query
SELECT ?x WHERE { ?x rdf:type :D }
we will obtain :a as a result.
In RDFox, it is possible to query only the explicit data even after materialization has been performed. For this, we can use the shell command:
set query.fact-domain EDB
If we then issue the previous query again we will obtain the empty answer as a result.
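The difference between the two answers can be reproduced with a small Python sketch that materializes the four rules over the single explicit fact and then evaluates the query against either the full materialization or the explicit facts only. The encoding of facts as (individual, class) pairs and the function names materialize and instances_of are assumptions made for illustration; the sketch is not RDFox's reasoning or query engine.

```python
# ":a rdf:type :A" encoded as an (individual, class) pair.
EXPLICIT = {("a", "A")}
# Each rule is (head class, body class), mirroring the four rules above.
RULES = [("B", "A"), ("C", "B"), ("D", "B"), ("A", "D")]


def materialize(explicit, rules):
    """Apply the rules until no new facts can be derived."""
    facts = set(explicit)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for individual, cls in list(facts):
                if cls == body and (individual, head) not in facts:
                    facts.add((individual, head))
                    changed = True
    return facts


def instances_of(cls, facts):
    """Answer the query SELECT ?x WHERE { ?x rdf:type cls } over facts."""
    return sorted(individual for individual, c in facts if c == cls)


all_facts = materialize(EXPLICIT, RULES)
print(instances_of("D", all_facts))   # query over the materialization
print(instances_of("D", EXPLICIT))    # query over the explicit facts only
```

Over the materialization the query returns the individual a, whereas over the explicit facts it returns nothing, since :a rdf:type :D is a derived fact.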