5. Reasoning in RDFox
Reasoning in RDF is the ability to calculate the set of triples that logically follow from an RDF graph and a set of rules. Such logical consequences are materialized in RDFox as new triples in the graph.
The use of rules can significantly simplify the management of RDF data as well as provide a more complete set of answers to user queries. Consider, for instance, a graph containing the following triples:
:oxford :locatedIn :oxfordshire .
:oxfordshire :locatedIn :england .
The relation :locatedIn is intuitively transitive: from the fact that Oxford is located in Oxfordshire and Oxfordshire is located in England, we can deduce that Oxford is located in England. The triple :oxford :locatedIn :england is, however, missing from the graph. As a consequence, SPARQL queries asking for all English cities will not return :oxford as an answer.
We could, of course, add the missing triple to the graph by hand, in which case :oxford would be returned as an answer to our previous query. Doing so, however, has a number of important disadvantages. First, there can be millions of such missing triples and each of them would need to be added manually, which is cumbersome and error-prone; for instance, if we add the triple :england :locatedIn :uk to the graph, then the following additional triples should also be added:
:oxford :locatedIn :uk .
:oxfordshire :locatedIn :uk .
More importantly, by manually adding missing triples we are not capturing the transitive nature of the relation, which establishes a causal link between different triples. Indeed, the triple :oxford :locatedIn :england holds because the triples :oxford :locatedIn :oxfordshire and :oxfordshire :locatedIn :england are part of the data. Assume that we later find out that :oxford is not located in :oxfordshire, but rather in the state of Mississippi in the US, and we delete the triple :oxford :locatedIn :oxfordshire from the graph as a result. Then, the triples :oxford :locatedIn :england and :oxford :locatedIn :uk should also be retracted as they are no longer justified. Such situations are very hard to handle manually.
As we will see next, we can use a rule to faithfully represent the transitive nature of the relation and handle all of the aforementioned challenges in an efficient and elegant way.
5.1. Rule Languages
A rule language for RDF determines which syntactic expressions are valid rules, and also provides well-defined meaning to each rule. In particular, given an arbitrary set of syntactically valid rules and an arbitrary RDF graph, the set of new triples that follow from the application of the rules to the graph must be unambiguously defined.
5.1.1. Datalog
Rule languages have been in use since the 1980s in the fields of data management and artificial intelligence. The basic rule language is called Datalog. It is a very well understood language, which constitutes the core of a plethora of subsequent rule formalisms equipped with a wide range of extensions. In this section, we describe Datalog in the context of RDF.
A Datalog rule can be seen as an IF … THEN statement. In particular, the following is a Datalog rule which faithfully represents the transitive nature of the relation :locatedIn.
[?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
The IF part of the rule is also called the body or antecedent; the THEN part of the rule is called the head or the consequent. The head is written first and is separated from the body by the symbol :-. Both body and head consist of a conjunction of conditions, where conjuncts are comma-separated and where each conjunct is a triple in which variables may occur. Each conjunct in the body or the head is called an atom. In our example, the body consists of atoms [?x, :locatedIn, ?y] and [?y, :locatedIn, ?z], whereas the head consists of the single atom [?x, :locatedIn, ?z].
Each rule conveys the idea that, from certain combinations of triples in the input RDF graph, we can logically deduce that some other triples must also be part of the graph. In particular, variables in the rule range over all possible nodes in the RDF graph (RDF literals, URIs, blank nodes); whenever these variables are assigned values that make the rule body a subset of the graph, we propagate those values to the head of the rule and deduce that the resulting triples must also be part of the graph.
In our example, a particular rule application binds variable ?x to :oxford, variable ?y to :oxfordshire and variable ?z to :england, which then implies that the triple :oxford :locatedIn :england, obtained by replacing ?x with :oxford and ?z with :england in the head of the rule, holds as a logical consequence. A different rule application would bind ?x to :oxfordshire, ?y to :england, and ?z to :uk; as a result, the triple :oxfordshire :locatedIn :uk can also be derived as a logical consequence.
An alternative way to understand the meaning of a single Datalog rule application to an RDF graph is to look at it as the execution of an INSERT statement in SPARQL, which adds a set of triples to the graph. In particular, the statement
INSERT { ?x :locatedIn ?z } WHERE { ?x :locatedIn ?y. ?y :locatedIn ?z }
corresponding to our example rule leads to the insertion of triples
:oxford :locatedIn :england .
:oxfordshire :locatedIn :uk .
There is, however, a fundamental difference that makes rules more powerful than simple INSERT statements in SPARQL, namely that rules are applied recursively. Indeed, after we have derived that Oxford is located in England, we can apply the rule again by matching ?x to :oxford, ?y to :england, and ?z to :uk, to derive :oxford :locatedIn :uk, a triple that is not obtained as a result of the INSERT statement above.
In this way, the logical consequences of a set of Datalog rules on an input graph are captured by the recursive application of the rules until no new information can be added to the graph. It is important to notice that the set of new triples obtained is completely independent from the order in which rule applications are performed as well as of the order in which different elements of rule bodies are given. In particular, the following two rules are equivalent:
[?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
[?x, :locatedIn, ?z] :- [?y, :locatedIn, ?z], [?x, :locatedIn, ?y] .
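The recursive application described above can be illustrated with a small Python sketch. It is only a naive model of the semantics, not a description of how RDFox evaluates rules: triples are plain tuples, and the function names materialize and transitive_located_in are hypothetical helpers introduced for this illustration.

```python
def transitive_located_in(triples):
    # Models: [?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
    return {
        (x, ":locatedIn", z)
        for (x, p1, y) in triples if p1 == ":locatedIn"
        for (y2, p2, z) in triples if p2 == ":locatedIn" and y2 == y
    }


def materialize(facts, rules):
    """Apply every rule repeatedly until no new triple appears (a fixpoint)."""
    derived = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(derived) - derived
        if not new:
            return derived
        derived |= new


facts = {
    (":oxford", ":locatedIn", ":oxfordshire"),
    (":oxfordshire", ":locatedIn", ":england"),
    (":england", ":locatedIn", ":uk"),
}

# A single pass behaves like the SPARQL INSERT statement: no recursion.
one_pass = facts | transitive_located_in(facts)

# Repeating the pass until nothing changes gives the full set of consequences.
closure = materialize(facts, [transitive_located_in])
```

In this sketch, one_pass lacks the triple relating :oxford to :uk, whereas closure contains it, which mirrors the difference between a single INSERT and recursive rule application.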
5.1.2. Extensions of Datalog
A wide range of extensions of Datalog have been proposed and studied in the literature. In this subsection we describe the extensions of Datalog implemented in RDFox as well as the restrictions on them that have been put in place in order to ensure that the resulting language is semantically well-defined. Later on in this section we will provide many more examples of rules equipped with these extended features.
5.1.2.1. Negation-as-failure
Negation-as-failure allows us to make deductions based on information that is not present in the graph. For instance, using negation-as-failure we can write a rule saying that someone who works for a company but is not an employee of the company is an external contractor.
[?x, :contractorFor, ?y] :- [?x, :worksFor, ?y], NOT [?x, :employeeOf, ?y] .
Here, NOT represents a negation of a body atom.
Let us consider the logical consequences of this rule when applied to the graph
:mary :worksFor :acme .
:mary :employeeOf :acme .
:bob :worksFor :acme .
On the one hand, we have that :mary works for :acme, and hence we can satisfy the first atom in the body by assigning :mary to ?x and :acme to ?y; however, :mary is also an employee of :acme, and hence the second condition is not satisfied, which means that we cannot derive that :mary is a contractor. On the other hand, we also have that :bob works for :acme and hence once again we can satisfy the first atom in the body, this time by assigning :bob to ?x and :acme to ?y; but now, we do not have a triple in the graph stating that :bob is an employee of :acme and hence we can satisfy the second condition in the body and derive the triple :bob :contractorFor :acme.
Indeed, the query
SELECT ?x ?y WHERE { ?x :contractorFor ?y }
yields the expected result
:bob :acme .
Note that negation typically means “absence of information”; indeed, we do not know for sure whether :bob is not an employee of :acme; we only know that this information is not available in the graph (neither explicitly, nor as a consequence of other rule applications).
Negation-as-failure is intrinsically non-monotonic. In logic, this means that new information may invalidate previous deductions. For instance, suppose that :bob becomes an employee of :acme and, to reflect this, we add to our data graph the triple :bob :employeeOf :acme. Then, we can no longer infer that :bob is a contractor for :acme and the previous query will now return an empty answer. In contrast, rules in plain Datalog are monotonic: adding new triples to the graph cannot invalidate any consequences that we may have previously drawn; for instance, adding the triple :england :locatedIn :uk to the example in our previous section cannot invalidate a previous inference such as :oxford :locatedIn :england.
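The non-monotonic behaviour of negation-as-failure can be reproduced with a few lines of Python. This is a minimal sketch of the semantics under the assumption that triples are modelled as tuples; the function name contractors is hypothetical and is not part of RDFox.

```python
def contractors(triples):
    # Models:
    # [?x, :contractorFor, ?y] :- [?x, :worksFor, ?y], NOT [?x, :employeeOf, ?y] .
    return {
        (x, ":contractorFor", y)
        for (x, p, y) in triples
        if p == ":worksFor" and (x, ":employeeOf", y) not in triples
    }


graph = {
    (":mary", ":worksFor", ":acme"),
    (":mary", ":employeeOf", ":acme"),
    (":bob", ":worksFor", ":acme"),
}

# :bob has no :employeeOf triple, so he is derived to be a contractor.
before = contractors(graph)

# Adding information removes a previously valid conclusion.
after = contractors(graph | {(":bob", ":employeeOf", ":acme")})
```

Here before holds the single triple for :bob, while after is empty, even though the second graph is a superset of the first.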
5.1.2.2. Aggregation
Aggregation is an important feature in query languages such as SQL or SPARQL. It allows one to compute numeric values (such as minimums, maximums, sums, counts or averages) on groups of solutions satisfying certain conditions (e.g., compute an average salary over the group of people working in the accounting department).
In RDFox, it is possible to define relations based on the result of aggregate calculations. For instance, consider the following data.
:bob :worksFor :accounting .
:bob :salary "50000"^^xsd:integer .
:mary :worksFor :hr .
:mary :salary "47000"^^xsd:integer .
:jen :worksFor :accounting .
:jen :salary "60000"^^xsd:integer .
:accounting rdf:type :Department .
:hr rdf:type :Department .
We can write an RDFox rule that computes the average salary of each department, and store the result in a newly introduced relation:
[?d, :deptAvgSalary, ?z] :-
[?d, rdf:type, :Department],
AGGREGATE(
[?x, :worksFor, ?d],
[?x, :salary, ?s]
ON ?d
BIND AVG(?s) AS ?z) .
Here, each group consists of a department with salaried employees, and for each group the rule computes an average of the salaries involved. In particular, suppose that we satisfy the first atom in the body by assigning value :accounting to variable ?d; then, we can satisfy the aggregate atom by grouping all employees working for :accounting (i.e., :bob and :jen), computing their average salary (55,000) and assigning the resulting value to variable ?z; as a result, we can propagate the assignment of ?d to :accounting and of ?z to 55,000 to the head and derive the triple
:accounting :deptAvgSalary "55000"^^xsd:decimal .
The query
SELECT ?d ?s WHERE { ?d rdf:type :Department . ?d :deptAvgSalary ?s }
then returns the expected answers
:accounting 55000.0 .
:hr 47000.0 .
Similarly to negation, aggregation is also a non-monotonic extension of Datalog. In particular, if we were to add a new employee to the accounting department with a salary of 52k, then we would need to withdraw our previous inference that the average accounting salary equals 55k and adjust the average accordingly.
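The grouping performed by the aggregate atom, and the way a new fact changes its result, can be sketched in Python as follows. The sketch assumes for simplicity that each person works for exactly one department and uses Python floats rather than RDF numeric datatypes; dept_avg_salary is a hypothetical helper name.

```python
from collections import defaultdict

works_for = {":bob": ":accounting", ":mary": ":hr", ":jen": ":accounting"}
salary = {":bob": 50000, ":mary": 47000, ":jen": 60000}
departments = {":accounting", ":hr"}


def dept_avg_salary(works_for, salary, departments):
    """Group salaries by department (ON ?d) and average each group (AVG(?s))."""
    groups = defaultdict(list)
    for person, dept in works_for.items():
        if dept in departments and person in salary:
            groups[dept].append(salary[person])
    return {dept: sum(values) / len(values) for dept, values in groups.items()}


averages = dept_avg_salary(works_for, salary, departments)

# A new accounting employee earning 52,000 forces the old average to be withdrawn.
updated = dept_avg_salary(
    {**works_for, ":tom": ":accounting"},
    {**salary, ":tom": 52000},
    departments,
)
```

With the original data the accounting average is 55,000; after adding the hypothetical employee :tom it becomes 54,000, so the earlier conclusion no longer holds.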
5.1.2.3. Built-in Functions
Datalog can be extended with a wide range of built-in functions. These include the functions defined in the SPARQL specification (e.g., arithmetic operations, string concatenation, and so on), as well as function symbols in first-order predicate logic via the special function SKOLEM.
Let us start by introducing an example using the SKOLEM function, which can be used to capture function symbols in first-order predicate logic. Function symbols can be used to create objects that must exist in the world, but whose identity is unknown to us. As we will see later on, this is useful for representing relations of arity higher than two as well as for data integration and data restructuring.
Consider the following rules, where the second one uses the SKOLEM function:
[?y, rdf:type, :Person] :-
[?x, :marriedTo, ?y],
[?x, rdf:type, :Person] .
[?x, :hasMother, ?y] :-
[?x, rdf:type, :Person],
BIND(SKOLEM("motherOf", ?x) AS ?y) .
The first rule is a simple Datalog rule stating that everyone married to a person is also a person. The second rule generates, for every person, a new object in the graph representing the person’s mother. To understand the meaning of the second rule, consider its application to a triple :mary rdf:type :Person. Here, we can bind ?x to :mary because :mary is a person; now, the application of SKOLEM to :mary generates a new object which represents the mother of :mary, and this new object is assigned as the value of variable ?y and propagated to the head of the rule. As a result, we derive a triple relating :mary to her mother via the :hasMother relation.
Let us now consider a variant of our “family” example data from the Getting Started guide, which contains the following triples in Turtle format:
:peter :forename "Peter" ;
a :Person ;
:marriedTo :lois ;
:gender "male" .
:lois :forename "Lois" ;
:gender "female" .
:brian :forename "Brian" . # Brian is a dog
And let us import our previous two rules. The following query asking for people having a mother
SELECT ?x WHERE { ?x rdf:type :Person. ?x :hasMother ?y }
returns :lois and :peter as answers. Indeed, :peter is a :Person according to the data, and hence by the second rule above he must have a mother. In turn, :peter is married to :lois, and hence by the first rule she must be a :Person, and by the second rule :lois must also have a mother.
Let us consider another example of a built-in function, namely string concatenation. The following rule computes the full name of a person as the concatenation of their first name and their family name.
[?x, :fullName, ?n] :-
[?x, :firstName, ?y],
[?x, :lastName, ?z],
BIND(CONCAT(?y, ?z) AS ?n) .
Consider the application of this rule to the graph consisting of the following triples:
:peter :firstName "Peter" .
:peter :lastName "Griffin" .
Then, the query
SELECT ?x ?y WHERE { ?x :fullName ?y }
would return the expected answer
:peter "PeterGriffin" .
An important consequence of introducing built-in functions is that rules are now capable of deriving triples mentioning new objects which did not occur in the input data (such as the mothers of Peter and Lois in our first example and “PeterGriffin” in our second example). This is not possible using plain Datalog rules, where the application of a rule may generate new triples, but these triples can only mention objects that were present in the input data.
If users are not careful, they may write rules using built-in functions that generate infinitely many new constants and hence there may be infinitely many triples that logically follow from the rules and a (finite) input graph.
For instance, consider our previous example rules
[?y, rdf:type, :Person] :-
[?x, :marriedTo, ?y],
[?x, rdf:type, :Person] .
[?x, :hasMother, ?y] :-
[?x, rdf:type, :Person],
BIND(SKOLEM("motherOf", ?x) AS ?y) .
Suppose that we add another rule saying that the mother of a person must also be a person:
[?y, rdf:type, :Person] :- [?x, :hasMother, ?y] .
If we apply these rules to the input graph consisting of
:peter rdf:type :Person .
we will derive an infinite “chain” of triples, where the first one relates :peter with his mother, the second one relates :peter’s mother to his grandmother, and so on.
In such cases, RDFox will run out of resources trying to compute infinitely many new triples and will therefore not terminate. This is not due to a limitation of RDFox as a system, but rather to the well-known fact that Datalog becomes undecidable once extended with built-in functions that can introduce arbitrarily many fresh objects.
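To see why the chain never ends, the following Python sketch unrolls the first few steps. It assumes that SKOLEM behaves like a first-order function symbol, so that equal arguments always yield the same object and different arguments yield different objects; the string encoding of the generated objects and the helper names skolem and mother_chain are purely illustrative and say nothing about how RDFox represents such objects internally.

```python
def skolem(name, arg):
    # Illustrative encoding only: the new object is named after its arguments.
    return f"{name}({arg})"


def mother_chain(person, max_steps):
    """Unroll the :hasMother derivations, stopping after max_steps."""
    chain = []
    current = person
    for _ in range(max_steps):
        mother = skolem("motherOf", current)
        chain.append((current, ":hasMother", mother))
        # The extra rule makes the mother a :Person, so the rule fires again.
        current = mother
    return chain


chain = mother_chain(":peter", 3)
```

Each step produces an object that has not been seen before, so without the artificial max_steps bound used here the loop would continue forever.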
5.1.2.4. Equality
Equality is a special binary predicate that can be used to identify different resources as representing the same real-world object. The equality predicate is referred to as owl:sameAs in the standard W3C languages for the Semantic Web. In addition to equality, W3C standard languages also define an inequality predicate, which is referred to as owl:differentFrom.
By default, two resources with different names are not assumed to be actually different. For instance, resources called :marie_curie and :marie_sklodowsca may refer to the same object in the world (the renowned scientist Marie Curie). In logic terms we typically say that by default we are not making the unique name assumption (UNA). In some applications, however, it makes sense to make such an assumption, and the effect of making the UNA is that we will have implicit owl:differentFrom statements between all pairs of resources mentioned in the data.
In RDFox we can enable the use of equality by initializing a data store accordingly. For instance, the shell command
init seq equality noUNA
initializes a data store with equality reasoning and no UNA.
Extensions of Datalog with equality allow for the equality and inequality predicates to appear in rules and data. For instance, consider the following triples, where the second triple represents the fact that the URIs :marie_curie and :marie_sklodowsca refer to the same person.
:marie_curie rdf:type :Scientist .
:marie_curie owl:sameAs :marie_sklodowsca .
A query asking RDFox for all scientists
SELECT ?x WHERE { ?x rdf:type :Scientist }
will return both :marie_curie and :marie_sklodowsca as a result.
Equality and inequality can also be used in rules. For instance, the following rule establishes that a person can only have one biological mother
[?y, owl:sameAs, ?z] :- [?x, :hasMother, ?y], [?x, :hasMother, ?z] .
The application of this rule to the graph
:irene_curie :hasMother :marie_curie .
:irene_curie :hasMother :marie_sklodowsca .
identifies :marie_curie and :marie_sklodowsca as the same person.
The joint use of equality and inequality can lead to logical contradictions. For instance, the application of the previous rule to a graph consisting of the following triples would lead to a contradiction:
:irene_curie :hasMother :marie_curie .
:irene_curie :hasMother :eve_curie .
:marie_curie owl:differentFrom :eve_curie .
Indeed, the application of the rule derives :marie_curie owl:sameAs :eve_curie, which is in contradiction with the data triple :marie_curie owl:differentFrom :eve_curie. Such contradictions can be identified in RDFox by querying for the instances of the special owl:Nothing predicate, which is also borrowed from the W3C standard OWL. The query
SELECT ?x WHERE { ?x rdf:type owl:Nothing }
returns :marie_curie and :eve_curie as answers. This can be interpreted by the user as: “resources :marie_curie and :eve_curie are involved in a logical contradiction”.
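The interplay between owl:sameAs and owl:differentFrom in this example can be modelled with a simple union-find structure in Python. This is a sketch of the logical idea only, under the assumption that equal resources are merged into one equivalence class; the helper names one_mother_rule, find and contradictions are hypothetical and do not reflect RDFox internals.

```python
def one_mother_rule(has_mother):
    # Models: [?y, owl:sameAs, ?z] :- [?x, :hasMother, ?y], [?x, :hasMother, ?z] .
    # Trivial pairs with y == z are skipped for brevity.
    return {
        (y, z)
        for (x, y) in has_mother
        for (x2, z) in has_mother
        if x == x2 and y != z
    }


def find(parent, x):
    """Return the representative of the equivalence class containing x."""
    parent.setdefault(x, x)
    while parent[x] != x:
        x = parent[x]
    return x


def contradictions(same_as, different_from):
    """Resources that are both equal to and different from each other."""
    parent = {}
    for a, b in same_as:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    clash = set()
    for a, b in different_from:
        if find(parent, a) == find(parent, b):
            clash |= {a, b}
    return clash


has_mother = {(":irene_curie", ":marie_curie"), (":irene_curie", ":eve_curie")}
different_from = {(":marie_curie", ":eve_curie")}
clash = contradictions(one_mother_rule(has_mother), different_from)
```

In this model clash contains :marie_curie and :eve_curie, matching the answers to the owl:Nothing query above.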
5.1.2.5. Named Graphs and N-ary Relations
In all our previous examples, all atoms in rules are evaluated against the default RDF graph. RDFox also supports named graphs, which can be created either implicitly, by importing an RDF dataset encoded as TriG or N-Quads, or explicitly, as shown in the following example that creates the named graph :Payroll.
tupletable add :Payroll type triples
Named graphs can also be used in the body and the head of rules, and hence it is possible to derive new triples as the result of rule application and add them to graphs other than the default graph. Rules can refer only to named graphs already created using one of the ways described above.
For instance, consider the following rule:
:Payroll(?id, :monthlyPayment, ?m) :-
[?id, rdf:type, :Employee],
:HR(?id, :yearlySalary, ?s),
BIND(?s / 12 AS ?m) .
This rule joins information from the default graph and the named graph called :HR, and it inserts consequences into the named graph called :Payroll. Specifically, the first body atom of the rule identifies IDs of employees in the default RDF graph. The second body atom is a general atom: it is evaluated in the named graph called :HR, and it matches triples that connect IDs with their yearly salaries. The head of the rule contains a general atom that refers to the named graph called :Payroll, and it derives triples that connect IDs of employees with their respective monthly payments. In particular, given as data
:HR(:a, :yearlySalary, "55000"^^xsd:integer) .
:a rdf:type :Employee .
the rule will compute the monthly payment for employee :a. Then, the query
SELECT ?s ?p ?o WHERE { GRAPH :Payroll{ ?s ?p ?o } }
will correctly return the monthly payment for employee :a
:a :monthlyPayment 4583.333333333333333 .
In addition to referring to graphs other than the default graph, RDFox can also directly represent external data as tuples of arbitrary arity (not just triples) using the same syntax as named graphs. Atoms representing such data, however, are only allowed to be used in the body of rules. Details on how to access external data from RDFox are given in Section 6.6.
5.2. Materialization-based Reasoning
The main computational problem solved by RDFox is that of answering a SPARQL 1.1 query with respect to an RDF graph and a set of rules.
To solve this problem, RDFox uses materialization-based reasoning to precompute and store all triples that logically follow from the input graph and rules in a query-independent way. Both the process of extending the input graph with such newly derived triples and its final output are commonly called materialization. After such preprocessing, queries can be answered directly over the materialization, which is usually very efficient since the rules do not need to be considered any further. Materializations can be large, but they can usually be stored and handled on modern hardware as the available memory is continually increasing.
The main challenge of this approach to query answering is that, whenever data triples and/or rules are added and/or deleted, the “old” materialization must be replaced with the “new” materialization that contains all triples that follow from the updated input. In this setting, deletion of triples is restricted to those that are explicit in the input graph and hence one does not consider deletion of derived triples—a complex problem known in the literature as belief revision or view update.
For instance, given as input the RDF graph
:oxford :locatedIn :oxfordshire .
:oxfordshire :locatedIn :england .
:england :locatedIn :uk .
and the familiar rule
[?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
RDFox will compute the corresponding materialization, which consists of triples
:oxford :locatedIn :oxfordshire .
:oxford :locatedIn :england .
:oxford :locatedIn :uk .
:oxfordshire :locatedIn :england .
:oxfordshire :locatedIn :uk .
:england :locatedIn :uk .
RDFox will now handle each SPARQL 1.1 query issued against the input graph and rule by simply evaluating the query directly over the materialization, thus avoiding the expensive reasoning at query time.
An update could delete a triple explicitly given in the input graph such as the triple :oxfordshire :locatedIn :england, in which case the new materialization consists only of triples
:oxford :locatedIn :oxfordshire .
:england :locatedIn :uk .
since the rule is no longer applicable after deletion. In contrast, deleting a derived triple such as :oxford :locatedIn :uk is not allowed since this triple was not part of the original input.
RDFox implements sophisticated algorithms for both efficiently computing materializations and maintaining them under addition/deletion updates that may affect both the data and the rules. All these algorithms were developed after years of research at Oxford and have been extensively documented in the scientific literature.
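The effect of the deletion described above can be checked with a deliberately naive Python sketch that recomputes the materialization from the explicit triples after every update. This is not the incremental approach used by RDFox; it only shows what the "new" materialization must contain. Pairs stand for :locatedIn triples, and closure and delete_explicit are hypothetical helper names. Raising an error for non-explicit triples is a modelling choice of this sketch.

```python
def closure(explicit):
    """Transitive closure of :locatedIn pairs, recomputed from scratch."""
    derived = set(explicit)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(derived):
            for (y2, z) in list(derived):
                if y == y2 and (x, z) not in derived:
                    derived.add((x, z))
                    changed = True
    return derived


def delete_explicit(explicit, fact):
    """Only triples given explicitly in the input may be deleted."""
    if fact not in explicit:
        raise ValueError("only explicitly given triples can be deleted")
    return explicit - {fact}


explicit = {
    (":oxford", ":oxfordshire"),
    (":oxfordshire", ":england"),
    (":england", ":uk"),
}
old_materialization = closure(explicit)

new_explicit = delete_explicit(explicit, (":oxfordshire", ":england"))
new_materialization = closure(new_explicit)
```

After the deletion only the two remaining explicit pairs survive, because every derived pair depended on the deleted one.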
5.3. Restrictions on Rule Sets
The rule language of RDFox imposes certain restrictions on the structure of rule sets. These restrictions ensure that the materialization of a set of rules and an RDF graph is well-defined and unique.
In particular, the semantics (i.e., the logical meaning) of rule sets involving negation-as-failure and/or aggregation is not straightforward, and numerous proposals exist in the scientific literature. There is, however, a general consensus for rule sets in which the use of negation-as-failure and aggregation is stratified. Informally, stratification conditions ensure that there are no cyclic dependencies in the rule set involving negation or aggregation.
Several variants of stratification have been proposed, where some of them capture a wider range of rule sets than others; they all, however, provide similar guarantees. We next describe the stratification conditions adopted in RDFox by means of examples. For this, let us consider the following rules mentioning negation-as-failure:
[?x, :contractorFor, ?y] :-
[?x, :worksFor, ?y],
NOT [?x, :employeeOf, ?y] .
[?x, :employeeOf, :acme] :- [?x, :worksFor, :acme] .
The first rule says that people working for a company who are not employees of that company act as contractors. The rule establishes two dependencies. The first dependency tells us that the presence of a triple having :worksFor in the middle position may contribute to triggering the derivation of a triple having :contractorFor in the middle position. In turn, the second dependency tells us that the absence of a triple having :employeeOf in the middle position may also contribute to the derivation of a triple having :contractorFor in the middle position.
The second rule tells us that everyone working for :acme is an employee of :acme. This rule establishes one dependency, namely the presence of a triple having :worksFor in the middle position and :acme in the rightmost position may trigger the derivation of a triple having :employeeOf in the middle position and :acme in the rightmost position.
We can keep track of such dependencies by means of a dependency graph. The nodes of the graph are obtained by replacing variables in individual triple patterns occurring in the rules with the special symbol ANY, which intuitively indicates that the position of the triple where it occurs can adopt any constant value, and leaving constants as they are. In particular, our example rules yield a graph having the following five vertices v1 to v5:
v1: ANY :contractorFor ANY
v2: ANY :worksFor ANY
v3: ANY :employeeOf ANY
v4: ANY :worksFor :acme
v5: ANY :employeeOf :acme
The (directed) edges of the graph lead from vertices corresponding to body atoms to vertices corresponding to head atoms and can be either “regular” or “special”. Special edges witness the presence of a dependency involving aggregation or negation-as-failure; in our case, we will have a single special edge (v3, v1). In turn, each dependency that is not via negation-as-failure/aggregation generates a regular edge; in our case, we will have regular edges (v2, v1) and (v4, v5). Finally, the graph will also contain bidirectional regular edges between nodes that unify in the sense of first-order logic: since [?x, :employeeOf, ?y] and [?x, :employeeOf, :acme] unify, we will have regular edges (v3, v5) and (v5, v3); similarly, we will also have regular edges (v2, v4) and (v4, v2).
Our two example rules are stratified and hence are accepted by RDFox; this is because there is no cycle in the dependency graph involving a special edge (indeed, all cycles involve regular edges only).
Now suppose that we add the following rule:
[?x, :employeeOf, ?y] :-
[?x, :worksFor, ?y],
NOT [?x, :contractorFor, ?y] .
which says that people working for a company who are not contractors for the company must be employees of the company. The addition of this rule does not change the set of nodes in the dependency graph; however, it adds two more edges: a regular edge (v2, v3) and a special edge (v1, v3). As a result, we now have a cycle involving a special edge (from v1 to v3 and back to v1), so the rule set is no longer stratified and will be rejected by RDFox.
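The stratification test on the example dependency graph can be sketched in Python. The check below relies on the observation that a special edge (u, v) lies on a cycle exactly when u is reachable from v. It is an illustration of the condition as described in this section, using hypothetical helper names reachable and is_stratified, and it is not claimed to be the procedure RDFox implements.

```python
def reachable(edges, start):
    """All vertices reachable from start by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for (u, v) in edges:
            if u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_stratified(regular, special):
    """True if no cycle in the dependency graph contains a special edge."""
    all_edges = regular | special
    return all(u not in reachable(all_edges, v) for (u, v) in special)


# Edges produced by the first two example rules.
regular = {
    ("v2", "v1"), ("v4", "v5"),
    ("v3", "v5"), ("v5", "v3"),  # the two :employeeOf patterns unify
    ("v2", "v4"), ("v4", "v2"),  # the two :worksFor patterns unify
}
special = {("v3", "v1")}
first_two_rules_ok = is_stratified(regular, special)

# Edges added by the third rule.
all_rules_ok = is_stratified(regular | {("v2", "v3")}, special | {("v1", "v3")})
```

For the first two rules the check succeeds, whereas with the third rule the special edges (v3, v1) and (v1, v3) form a cycle and the check fails.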
Due to stratification conditions, the use of the special equality relation owl:sameAs in rules precludes the use of aggregation or negation-as-failure. Consider the following rule set, where the second rule tells us that a person cannot be an employee of two different companies:
[?x, :contractorFor, ?y] :-
[?x, :worksFor, ?y],
NOT [?x, :employeeOf, ?y] .
[?y, owl:sameAs, ?z] :-
[?x, :employeeOf, ?y],
[?x, :employeeOf, ?z] .
This rule set will be rejected by RDFox as the rule set mentions both NOT and owl:sameAs. Informally, this is because equality can affect every single relation, which precludes stratification in most cases.
In addition to stratification conditions, RDFox also imposes certain restrictions on the structure of rules, which make sure that each rule can be evaluated by binding the variables in the body of the rule to a data graph. To see an example where things go wrong, consider the rule:
[?x, :worksFor, ?y] :- [?y, rdf:type, :Department] .
The rule cannot be evaluated by first matching the body to the data graph and then propagating the variable bindings to the head; indeed, matching the rule body to an RDF graph will always leave variable ?x of the rule unbound, and hence the triple that must be added as a result of applying the rule to the data is undefined. As a result, this rule will be rejected by RDFox.
Binding restrictions in RDFox are rather involved given that the underpinning rule language is rich and there are many subtle corner cases. However, rules accepted by the parser can always be unambiguously evaluated.
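The simplest of these binding problems, a head variable that no body atom binds, can be detected with a short Python check. This sketch covers only plain atoms written as tuples and ignores negation, aggregation and bind literals, so it is far weaker than the actual conditions enforced by RDFox; unbound_head_variables is a hypothetical helper name.

```python
def unbound_head_variables(head_atoms, body_atoms):
    """Head variables that do not occur in any (positive) body atom."""
    def variables(atoms):
        return {term for atom in atoms for term in atom if term.startswith("?")}

    return variables(head_atoms) - variables(body_atoms)


# [?x, :worksFor, ?y] :- [?y, rdf:type, :Department] .
bad_rule = unbound_head_variables(
    [("?x", ":worksFor", "?y")],
    [("?y", "rdf:type", ":Department")],
)

# [?x, :locatedIn, ?z] :- [?x, :locatedIn, ?y], [?y, :locatedIn, ?z] .
good_rule = unbound_head_variables(
    [("?x", ":locatedIn", "?z")],
    [("?x", ":locatedIn", "?y"), ("?y", ":locatedIn", "?z")],
)
```

For the problematic rule the check reports ?x, while for the transitivity rule it reports no unbound variables.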
5.4. The Rule Language of RDFox
This section formally specifies the syntax of rules in RDFox. As already mentioned, the rule language supported by RDFox extends Datalog with stratified negation, stratified aggregation, built-in functions, and more, so as to provide additional data analysis capabilities.
A rule has the form
H1, …, Hj :- L1, …, Lk .
where the formula to the left of the :- operator is the rule head and the formula to the right is the rule body. Informally, a rule says “if L1, …, and Lk all hold, then H1, …, and Hj hold as well”. Each Hi with 1 ≤ i ≤ j is an atom, and each Li with 1 ≤ i ≤ k is a literal. A literal is an atom, a negation, a bind literal, a filter literal, or an aggregate literal.
5.4.1. Atom
An atom is either a default graph RDF atom or a general atom. General atoms can be used to access data in named graphs and mounted data sources.
5.4.1.1. Default Graph RDF Atom
A default graph RDF atom has the form [t1, t2, t3] where each ti is a term, which is either an RDF resource or a variable. To distinguish between these two kinds of terms, RDFox requires variables to start with the ? symbol. Also note that when t2 is an IRI, atom [t1, t2, t3] can be written alternatively as t2[t1, t3]; moreover, when t2 is the special IRI rdf:type and t3 is also an IRI, atom [t1, t2, t3] can be written alternatively as t3[t1].
Example A simple rule with default graph RDF atoms only
a1:Person[?x] :- a1:teacherOf[?x, ?y] .
As we discussed earlier, this is equivalent to:
[?x, rdf:type, a1:Person] :- [?x, a1:teacherOf, ?y] .
The above rule has only one atom in the rule body and one atom in the rule head. Informally, the rule says that if x is a teacher of y, then x must be a person. Both the body and the head are matched in the default RDF graph.
5.4.1.2. General Atom
A general atom has the form A(t1, …, tn) with n ≥ 1, where A is an IRI denoting the name of a tuple table and t1, …, tn are terms. Each named RDF graph is represented in RDFox as a tuple table; thus, general atoms can be used to refer to data in named graphs.
Example A rule with both RDF and general atoms
[?id, fg:firstName, ?fn],
[?id, fg:lastName, ?ln] :-
fg:Person(?id, ?fn, ?ln) .
The general atom in the rule body refers to a tuple table containing three columns. The same rule can be written alternatively as follows.
fg:firstName[?id, ?fn],
fg:lastName[?id, ?ln] :-
fg:Person(?id, ?fn, ?ln) .
5.4.2. Negation
Negation is useful when the user wants to require that certain conditions are not satisfied. A negation has one of the following forms, where k ≥ 2, j ≥ 1, B1, …, Bk are atoms, and ?V1, …, ?Vj are variables.
NOT B1
NOT(B1, …, Bk)
NOT EXIST ?V1, …, ?Vj IN B1
NOT EXIST ?V1, …, ?Vj IN (B1, …, Bk)
NOT EXISTS ?V1, …, ?Vj IN B1
NOT EXISTS ?V1, …, ?Vj IN (B1, …, Bk)
Note RDFox will reject rules that use negation in all equality
modes other than off
(see Equality).
Example Using negation of the first form
a1:stranger[?x, ?y] :-
a1:Person[?x],
a1:Person[?y],
NOT a1:friend[?x, ?y] .
Example Using negation of the last form
a1:basic[?x] :-
a1:component[?x],
NOT EXISTS ?y IN (
a1:component[?y],
a1:subcomponent[?y, ?x]
) .
Informally, the rule says that if X is a component and it does not have any subcomponents, then X is a basic component.
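The effect of the NOT EXISTS rule above can be illustrated with a small Python sketch. It is only a reading aid for the semantics, not a description of how RDFox evaluates rules; the sample data and the function name basic_components are hypothetical.

```python
# Hypothetical sample data: c2 is a subcomponent of c1.
components = {"c1", "c2", "c3"}          # a1:component[?x]
subcomponent = {("c2", "c1")}            # a1:subcomponent[?y, ?x]

def basic_components(components, subcomponent):
    """Return every component ?x for which no component ?y satisfies
    a1:subcomponent[?y, ?x], mirroring NOT EXISTS ?y IN (...)."""
    return {
        x
        for x in components
        if not any(y in components and parent == x for (y, parent) in subcomponent)
    }

basic = basic_components(components, subcomponent)
print(sorted(basic))
```

Here c1 is excluded because it has the subcomponent c2, whereas c2 and c3 are reported as basic components.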
5.4.3. Bind Literal¶
A bind literal evaluates an expression and assigns the value of the
expression to a variable, or compares the value of the expression with a
term. A bind literal is of the following form, where exp
is an
expression and t
is a term not appearing in exp
. An expression
can be constructed from terms, operators, and functions. The operators
and functions supported here are the same as those supported in RDFox
SPARQL queries; refer to Section 4 for a detailed comparison
between SPARQL 1.1 functions and the ones implemented in RDFox.
BIND(exp AS t)
An important difference with SPARQL 1.1 is that, for each bind literal in
a rule, every variable used in exp
must be bound either by a body atom,
or by another bind literal in the rule.
Example Using bind literals
cTemp[?x, ?z] :- fTemp[?x, ?y], BIND ((?y - 32) / 1.8 AS ?z) .
The bind literal in the above rule converts Fahrenheit degrees to Celsius degrees.
5.4.4. Filter Literal¶
Rule evaluation can be seen as the process of finding satisfying
assignments for variables appearing in the rule. A filter literal is of
the following form, and it restricts satisfying assignments of variables
to those for which the expression exp
evaluates to true. Thus, when
the user writes a filter literal, the expression is expected to provide
truth values.
FILTER(exp)
As with bind literals, every variable used in exp
must be bound
either by a body atom or by a bind literal.
Example Using filter literals
:PosNum[?x] :- :Num[?x], FILTER(?x > 0) .
The rule says that a number is positive if it is larger than zero.
5.4.5. Aggregate Literal¶
An aggregate literal applies an aggregate function to groups of values to produce one value for each group. An aggregate literal has the form
AGGREGATE(B1, …, Bk ON ?X1, …, ?Xj BIND f1(exp1) AS t1 … BIND fn(expn) AS tn)
where k ≥ 1, j ≥ 0, n ≥ 1, and
B1, …, Bk are atoms,
?X1, …, ?Xj are variables appearing in B1, …, Bk,
exp1, …, expn are expressions constructed using variables from B1, …, Bk,
f1, …, fn are aggregate functions, and
t1, …, tn are constants or variables that do not appear in B1, …, Bk.
Sometimes the user might be interested in computing an aggregate value from a set of distinct values. In this case, the keyword DISTINCT can be used in front of an expression expi.
Note RDFox will reject rules that use aggregation in all
equality
modes other than off
(see Equality).
Example Using aggregate literals
:minTemp[?x, ?z] :-
:City[?x],
AGGREGATE(
:temp[?x, ?y]
ON ?x
BIND MIN(?y) AS ?z) .
Informally, the above rule computes a minimum temperature for each city.
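As a rough illustration of what the aggregate literal computes, the following Python sketch groups readings by city and keeps the minimum of each group. The data and the name min_temp are hypothetical, and the sketch says nothing about how RDFox implements aggregation.

```python
from collections import defaultdict

# Hypothetical sample data (not taken from the RDFox documentation).
cities = {"oxford", "london"}                    # :City[?x]
readings = [("oxford", 12), ("oxford", 7),       # :temp[?x, ?y]
            ("london", 9), ("paris", 3)]

def min_temp(cities, readings):
    """Group readings by city (ON ?x) and keep the minimum of each group
    (BIND MIN(?y) AS ?z)."""
    groups = defaultdict(list)
    for city, value in readings:
        groups[city].append(value)
    # The :City[?x] body atom restricts the result to known cities.
    return {city: min(values) for city, values in groups.items() if city in cities}

result = min_temp(cities, readings)
print(result)
```

The reading for paris is dropped because paris is not listed as a city in this sample, just as the :City[?x] atom would exclude it in the rule.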
Example Using the keyword distinct
:familyFriendCnt[?x, ?cnt] :-
:Family[?x],
AGGREGATE(
:hasMember[?x, ?y],
:hasFriend[?y, ?z]
ON ?x
BIND COUNT(DISTINCT ?z) AS ?cnt) .
This rule counts the number of distinct friends of each family; a person is considered a friend of a family if they are a friend of a member of that family.
5.5. Common Uses of Rules in Practice¶
This section describes common uses of rules and reasoning in practical applications. This section will be especially useful for practitioners who are seeking to understand how the reasoning capabilities provided by RDFox can enhance graph data management.
5.5.1. Computing the Transitive Closure of a Relation¶
In many situations, we may have a relation that is not itself transitive, but we are interested in defining a different relation that “transitively closes” it. Consider a social network where users follow other users. The graph may be represented by the following triples.
:alice :follows :bob .
:bob :follows :charlie .
:diana :follows :alice .
A common task in social networks is to use existing connections to suggest new ones. For example, since Alice follows Bob and Bob follows Charlie, the system may suggest that Alice follow Charlie as well. Likewise, the system may suggest that Diana follow Bob; but then, if Diana follows Bob, she may also want to follow Charlie. We would like to construct an enhanced social network that contains the actual follows relations plus all the suggested additional links. The links in such an enhanced social network represent the transitive closure of the original follows relation, which relates any pair of people who are connected by a path in the network. The transitive closure of the follows relation can be computed using RDFox by defining the following two rules:
[?x, :followsClosure, ?y] :- [?x, :follows, ?y] .
[?x, :followsClosure, ?z] :-
[?x, :follows, ?y],
[?y, :followsClosure, ?z] .
The first rule “copies” the contents of the direct follows relation to the new relation. The second rule implements the closure by saying that if a person p1 directly follows p2 and p2 (directly or indirectly) follows person p3, then p1 (indirectly) follows p3.
If we now issue the SPARQL query
SELECT ?x ?y WHERE { ?x :followsClosure ?y }
we obtain the expected results.
:diana :charlie .
:alice :charlie .
:diana :bob .
:alice :bob .
:bob :charlie .
:diana :alice .
Finally, we may also be interested in computing the suggested links that were not already part of the original follows relation. This can be achieved, for instance, by issuing the SPARQL query
SELECT ?x ?y
WHERE {
?x :followsClosure ?y .
FILTER NOT EXISTS { ?x :follows ?y }
}
The results are the expected ones.
:diana :charlie .
:alice :charlie .
:diana :bob .
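The two closure rules and the follow-up query in this section can be mimicked with a naive fixpoint loop in Python. This is a minimal sketch for checking the listed answers by hand; the function name transitive_closure is hypothetical, and RDFox's own materialization algorithm is not shown here.

```python
follows = {("alice", "bob"), ("bob", "charlie"), ("diana", "alice")}

def transitive_closure(edges):
    """Apply the two rules until no new pair can be derived."""
    closure = set(edges)  # first rule: copy the direct edges
    while True:
        # second rule: follows(x, y) and closure(y, z) imply closure(x, z)
        derived = {(x, z) for (x, y) in edges for (y2, z) in closure if y == y2}
        if derived <= closure:
            return closure
        closure |= derived

follows_closure = transitive_closure(follows)

# Counterpart of FILTER NOT EXISTS { ?x :follows ?y }: keep only new links.
suggested = follows_closure - follows

print(sorted(follows_closure))
print(sorted(suggested))
```

The set difference on the last step plays the role of the FILTER NOT EXISTS clause: it keeps the derived pairs that were not already direct follows links.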
5.5.2. Composing Relations¶
An important practical use of knowledge graphs is to power Open Query Answering (Open QA) applications, where the user would pose a question in natural language, which is then automatically answered against the graph. Open QA systems often struggle to interpret questions that involve several “hops” in the graph. For instance, consider the graph consisting of the triples given next.
:douglas_adams :bornIn :uk .
:uk rdf:type :Country .
A user may ask the Open QA system for the country of birth of Douglas Adams. To obtain this information, the system would need to construct a query involving two hops in the graph. In particular, the SPARQL query
SELECT ?c
WHERE {
:douglas_adams :bornIn ?c .
?c rdf:type :Country .
}
would return :uk as answer.
The results of the open QA system would be greatly enhanced if the desired information had been available in just a single hop. RDFox rules can be used to provide a clean solution in this situation. In particular, we can use rules to define a new :countryOfBirth relation that provides a “shortcut” for directly accessing the desired information.
[?x, :countryOfBirth, ?y] :- [?x, :bornIn, ?y], [?y, rdf:type, :Country] .
The rule says that, if a person p is born in a place c, and that place is a country, then c is the country of birth of p. As a result, RDFox would derive that the country of birth of Douglas Adams is the UK. The Open QA system would now only need to construct the following simpler query, which involves a single hop in the graph, to obtain the desired information.
SELECT ?x ?y WHERE { ?x :countryOfBirth ?y }
5.5.3. Representing SPARQL 1.1 Property Paths¶
As already mentioned, RDFox does not currently support SPARQL 1.1 property paths. It is, however, possible to encode property paths as rules.
Informally, a property path searches through the RDF graph for a sequence of IRIs that form a path conforming to a regular expression. For instance, the following query in our familiar social network example
SELECT ?x WHERE { ?x :follows+ :bob }
returns the set of people that follow :bob directly or indirectly in the
network. In this case, the property path (?x :follows+ :bob)
represents a path of arbitrary length from any node to :bob via the
:follows relation, where the “+” symbol is the familiar one in regular
expressions indicating “one or more occurrences”.
Property paths representing paths of arbitrary length are closely related to computing the transitive closure of a relation. In particular, the following rules would compute the set of “Bob followers” as those who follow :bob directly or indirectly.
[?x, rdf:type, :BobFollower] :- [?x, :follows, :bob] .
[?x, rdf:type, :BobFollower] :-
[?x, :follows, ?y],
[?y, rdf:type, :BobFollower] .
The simple query
SELECT ?x WHERE { ?x rdf:type :BobFollower }
gives us the same answers as the original query using property paths.
5.5.4. Defining a Query as a View¶
When querying a knowledge graph, we may be interested in materializing the result of a SPARQL query as a new relation in the graph. This can be the case, for instance, if the query is interesting in its own right, can be used to define new relations, or simplifies the formulation of additional queries.
We can use an RDFox rule for this purpose, where the SPARQL query that we want to materialize in the graph is represented in the body of the rule and the answer as a new relation in the head.
For instance, consider again the previous example of a social network, where we were interested in suggesting new followers (recall the Transitive Closure usage pattern). Recall that we used a query
SELECT ?x ?y
WHERE {
?x :followsClosure ?y
FILTER NOT EXISTS { ?x :follows ?y }
}
to obtain the suggested links that were not already part of the original follows relation. We may be interested in storing this query as a separate relation in the graph. For this, we could rewrite the query as a rule defining a new :suggestFollows relation:
[?x, :suggestFollows, ?y] :- [?x, :followsClosure, ?y], NOT [?x, :follows, ?y] .
The body of the rule represents the where
clause in the query. The
filter expression in the query is captured by the negated atom. Then,
the simple query
SELECT ?x ?y WHERE { ?x :suggestFollows ?y }
will give us the expected answers
:diana :charlie .
:alice :charlie .
:diana :bob .
It is worth pointing out that only a subset of SPARQL 1.1 queries can be transformed into an RDFox rule in the way described. In particular, all queries involving basic graph patterns, filter expressions, negation (NOT EXISTS, MINUS), and aggregation can be represented. In contrast, SPARQL queries with more than two answer variables, or using OPTIONAL or UNION in the WHERE clause, cannot be represented as rules.
5.5.5. Performing Calculations and Aggregating Data¶
RDFox rules can be used to perform computations over the data in a knowledge graph and store the results in a different relation. For instance, consider a graph with the following triples, specifying the height of different people in cm.
:alice :height "165"^^xsd:integer .
:bob :height "180"^^xsd:integer .
:diana :height "168"^^xsd:integer .
:emma :height "165"^^xsd:integer .
We would want to compute their height in feet, and record it in the graph by adding suitable triples over a new relation. For this, we can import the following RDFox rule.
[?x, :heightInFeet, ?y] :- [?x, :height, ?h], BIND(?h*0.0328 AS ?y) .
The BIND
construct evaluates an expression and assigns the value of
the expression to a variable.
We can now query the graph for the newly introduced relation to obtain the list of people and their height in both centimeters and feet.
SELECT ?x ?m ?f
WHERE {
?x :height ?m .
?x :heightInFeet ?f .
}
and obtain the expected answers
:emma 165 5.412 .
:diana 168 5.5104 .
:bob 180 5.904 .
:alice 165 5.412 .
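The arithmetic behind these answers can be reproduced with a few lines of Python. This is only a worked check of the multiplication in the BIND expression; the variable names are hypothetical, and the rounding to four decimal places is there solely to hide binary floating-point noise.

```python
CM_TO_FEET = 0.0328  # the factor used in the rule above

heights_cm = {"alice": 165, "bob": 180, "diana": 168, "emma": 165}

# Counterpart of BIND(?h*0.0328 AS ?y), applied to every :height triple.
height_in_feet = {
    person: round(cm * CM_TO_FEET, 4) for person, cm in heights_cm.items()
}

for person, feet in sorted(height_in_feet.items()):
    print(person, heights_cm[person], feet)
```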
Rules can also be used to compute aggregated values (e.g., sums, counts, and averages) over the graph and store the results in a new relation. Consider, for instance, the following social network graph.
:alice :follows :bob .
:bob :follows :charlie .
:diana :follows :alice .
:charlie :follows :alice .
:emma :follows :bob .
:alice rdf:type :Person .
:bob rdf:type :Person .
:charlie rdf:type :Person .
:diana rdf:type :Person .
:emma rdf:type :Person .
The graph also contains information about people’s hobbies, as represented by the following triples.
:alice :likes :tennis .
:bob :likes :music .
:diana :likes :swimming .
:charlie :likes :football .
:emma :likes :reading .
:tennis rdf:type :Sport .
:swimming rdf:type :Sport .
:football rdf:type :Sport .
We would like to count, for each person, the number of followers who enjoy practicing a sport. RDFox provides aggregation constructs which enable these kinds of computations.
[?y, :sportyFollowerCnt, ?cnt] :-
[?y, rdf:type, :Person],
AGGREGATE(
[?x, :follows, ?y],
[?x, :likes, ?w],
[?w, rdf:type, :Sport]
ON ?y
BIND COUNT(DISTINCT ?x) AS ?cnt) .
In particular, the rule states that, if p1 is a person, then count all distinct people who follow p1 and who like some sport, and store the resulting count in the new :sportyFollowerCnt relation.
By issuing the following SPARQL query
SELECT ?x ?cnt WHERE { ?x :sportyFollowerCnt ?cnt }
we obtain that Bob has one sporty follower (Alice), whereas Alice has two sporty followers (Diana and Charlie).
:bob 1 .
:alice 2 .
This type of computation is compatible with the computation of the transitive closure of a relation. For instance, we may be interested in counting the number of (direct or indirect) followers who are sporty. For this, we can use RDFox rules to compute the transitive closure of the follows relation:
[?x, :followsClosure, ?y] :- [?x, :follows, ?y] .
[?x, :followsClosure, ?z] :- [?x, :follows, ?y], [?y, :followsClosure, ?z] .
And use the following rule to compute the desired count.
[?y, :sportyFollowerClosureCnt, ?cnt] :-
[?y, rdf:type, :Person],
AGGREGATE(
[?x, :followsClosure, ?y],
[?x, :likes, ?w],
[?w, rdf:type, :Sport]
ON ?y
BIND COUNT(DISTINCT ?x) AS ?cnt) .
The following SPARQL query
SELECT ?x ?cnt WHERE { ?x :sportyFollowerClosureCnt ?cnt }
Then provides the following results.
:charlie 3 .
:bob 3 .
:alice 3 .
We observe that the count for Charlie does not seem quite right. Charlie is followed directly only by Bob (who is not sporty); however, Bob is followed by Alice (a sporty person) and Alice is followed by Diana (another sporty person). Naturally, we would have expected a count of 2; however, Charlie also follows Alice and hence transitively follows himself, which explains the count of 3. To prevent this situation, we can modify the second rule implementing transitive closure to eliminate self-loops as follows:
[?x, :followsClosure, ?z] :-
[?x, :follows, ?y],
[?y, :followsClosure, ?z],
FILTER(?x != ?z) .
Now, the previous query yields the expected results
:charlie 2 .
:bob 3 .
:alice 2 .
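The counts reported in this section can be cross-checked with a small Python sketch. It is an illustration only: the helper names closure and sporty_follower_count are hypothetical, and the naive fixpoint loop stands in for RDFox's materialization rather than reproducing it. People with no sporty followers are omitted, mirroring the result lists above.

```python
follows = {("alice", "bob"), ("bob", "charlie"), ("diana", "alice"),
           ("charlie", "alice"), ("emma", "bob")}
people = {"alice", "bob", "charlie", "diana", "emma"}
likes = {("alice", "tennis"), ("bob", "music"), ("diana", "swimming"),
         ("charlie", "football"), ("emma", "reading")}
sports = {"tennis", "swimming", "football"}

def closure(edges, drop_self_loops=False):
    """Transitive closure of edges; optionally apply FILTER(?x != ?z)."""
    result = set(edges)
    while True:
        derived = {
            (x, z)
            for (x, y) in edges
            for (y2, z) in result
            if y == y2 and not (drop_self_loops and x == z)
        }
        if derived <= result:
            return result
        result |= derived

def sporty_follower_count(edges):
    """COUNT(DISTINCT ?x) of sporty followers, grouped by the followed person."""
    sporty = {person for (person, hobby) in likes if hobby in sports}
    counts = {}
    for person in people:
        followers = {x for (x, y) in edges if y == person and x in sporty}
        if followers:
            counts[person] = len(followers)
    return counts

direct = sporty_follower_count(follows)
with_loops = sporty_follower_count(closure(follows))
without_loops = sporty_follower_count(closure(follows, drop_self_loops=True))
print(direct, with_loops, without_loops)
```

Comparing with_loops and without_loops makes the effect of the self-loop filter visible: only the counts for people who lie on a cycle and are themselves sporty change.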
5.5.6. Arranging Concepts and Relations in a Hierarchical Structure¶
A common use of ontologies is to arrange concepts (called classes in OWL 2) and relations (called properties in OWL 2) in a subsumption hierarchy. For instance, we may want to say that dogs and cats are mammals and that mammals are animals. Such subsumption relationships can be easily represented using RDFox rules.
[?x, rdf:type, :Mammal] :- [?x, rdf:type, :Dog] .
[?x, rdf:type, :Mammal] :- [?x, rdf:type, :Cat] .
[?x, rdf:type, :Animal] :- [?x, rdf:type, :Mammal] .
Suppose that we have a graph with the following triples:
:max rdf:type :Dog .
:coco rdf:type :Cat .
:teddy rdf:type :Mammal .
Then, RDFox will deduce that Max and Coco are both mammals and therefore also animals, and also that Teddy is an animal. In particular, the query
SELECT ?x WHERE { ?x rdf:type :Animal }
yields the expected results
:max .
:teddy .
:coco .
It is also often the case that concepts are “assigned” certain properties. For instance, mammals have children which are also mammals. This is known as a range restriction in the ontology jargon, and can be represented using the following RDFox rule
[?y, rdf:type, :Mammal] :- [?x, rdf:type, :Mammal],[?x, :hasChild, ?y] .
Suppose that we now extend the graph with the following triples.
:max :hasChild :betsy .
:coco :hasChild :minnie .
RDFox will derive automatically that both Betsy and Minnie are also mammals (and therefore also animals). Indeed, the query
SELECT ?x WHERE { ?x rdf:type :Mammal }
will yield the expected results.
:max .
:betsy .
:minnie .
:teddy .
:coco .
In many applications, it is also useful to represent subsumption
relations between the edges in a knowledge graph, to specify that one
relation is more specific than the other. For instance, we may want to
say that the :hasDaughter
relation is more specific than the
:hasChild
relation. This can be represented using the following
RDFox rule.
[?x, :hasChild, ?y] :- [?x, :hasDaughter, ?y] .
If we now add the following triple to the graph
:betsy :hasDaughter :luna .
RDFox can infer that Luna is the child of Betsy and therefore she is
also a mammal, and an animal. Indeed, the previous query listing all
mammals will now also include :luna
as an answer.
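The interplay of the class hierarchy, the range restriction, and the property hierarchy in this section can be traced with a short Python sketch. It hard-codes the five rules of this section in a naive fixpoint loop; the names SUBCLASS, SUBPROPERTY, and materialize are hypothetical, prefixes are dropped for brevity, and the loop is not meant to reflect how RDFox reasons internally.

```python
triples = {
    ("max", "rdf:type", "Dog"), ("coco", "rdf:type", "Cat"),
    ("teddy", "rdf:type", "Mammal"),
    ("max", "hasChild", "betsy"), ("coco", "hasChild", "minnie"),
    ("betsy", "hasDaughter", "luna"),
}

SUBCLASS = {"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal"}
SUBPROPERTY = {"hasDaughter": "hasChild"}

def materialize(triples):
    """Apply the rules of this section until no new triple is derived."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p == "rdf:type" and o in SUBCLASS:
                new.add((s, "rdf:type", SUBCLASS[o]))      # class hierarchy
            if p in SUBPROPERTY:
                new.add((s, SUBPROPERTY[p], o))            # property hierarchy
            if p == "hasChild" and (s, "rdf:type", "Mammal") in facts:
                new.add((o, "rdf:type", "Mammal"))         # range restriction
        if new <= facts:
            return facts
        facts |= new

facts = materialize(triples)
mammals = {s for (s, p, o) in facts if p == "rdf:type" and o == "Mammal"}
animals = {s for (s, p, o) in facts if p == "rdf:type" and o == "Animal"}
print(sorted(mammals))
```

Luna is only reached after several rounds: the property rule first derives that Luna is a child of Betsy, and the range restriction can fire once Betsy is known to be a mammal.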
5.5.7. Detecting Cyclic Relations¶
A common task in knowledge graphs is to identify cyclic relationships. For instance, partonomy relations are typically acyclic (e.g., if an engine is part of a car we would not expect the car also to be part of the engine!). In these cases, cycle detection may be needed to detect errors in the graph and thus provide data validation.
A simple case of this pattern is when the relation we are checking for cyclicity is naturally transitive. Such is the case, for instance, of the partOf relation. Consider the following graph:
:a :partOf :b .
:b :partOf :c .
:c :partOf :a .
The graph contains a cyclic path :a -> :b -> :c -> :a via the :partOf relation. The relationship is naturally transitive and hence we can use the corresponding pattern to define it as such.
[?x, :partOf, ?z] :- [?x, :partOf, ?y], [?y, :partOf, ?z] .
The following SPARQL query now gives us which elements are part of others (directly or indirectly)
SELECT ?x ?y WHERE { ?x :partOf ?y }
Which gives us the following results
:a :a .
:c :c .
:b :b .
:a :c .
:b :a .
:c :b .
:c :a .
:b :c .
:a :b .
Cyclicity manifests itself by the presence of self-loops (e.g., :a is derived to be a part of itself). Hence, it is possible to detect that the partOf relation is cyclic by issuing the following SPARQL query.
ASK { ?x :partOf ?x }
The result is true, since the partonomy relation does have a self-loop.
Alternatively, we could have defined the following additional rule.
[:partOf, rdf:type, :CyclicRelation] :- [?x, :partOf, ?x] .
Which tells us that if any object is determined to be a part of itself, then the partonomy relation is cyclic.
We can now issue the following SPARQL query, which retrieves the list of cyclic relations in the graph, which in this case consists of the relation :partOf.
SELECT ?x WHERE { ?x rdf:type :CyclicRelation }
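The cycle check described in this section amounts to closing the relation under transitivity and looking for a pair of the form (x, x). A minimal Python sketch of that idea follows; make_transitive is a hypothetical helper name, and the loop is a naive stand-in for rule materialization.

```python
part_of = {("a", "b"), ("b", "c"), ("c", "a")}

def make_transitive(relation):
    """Close the relation under partOf(x, y), partOf(y, z) => partOf(x, z)."""
    facts = set(relation)
    while True:
        derived = {(x, z) for (x, y) in facts for (y2, z) in facts if y == y2}
        if derived <= facts:
            return facts
        facts |= derived

closed = make_transitive(part_of)

# Counterpart of ASK { ?x :partOf ?x }.
is_cyclic = any(x == y for (x, y) in closed)
print(len(closed), is_cyclic)
```

Removing any one of the three input pairs breaks the cycle, in which case no self-loop is derived and the check returns False.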
5.5.8. Defining Attributes and Relationships as Mandatory¶
In knowledge graphs, data is typically incomplete.
For instance, suppose that the data in a knowledge graph has been obtained from a variety of sources. The graph has different types of information about people, such as their name, job title and so on. We notice that some people in the graph have a date of birth, whereas others do not. Because of the nature of our application, we would like to have the date of birth of each person represented in the graph, and would like to find out which people are missing this information; that is, we would like to make the presence of a date of birth value mandatory for every person in the graph. In relational databases this is typically solved by declaring an integrity constraint.
Consider the following graph.
:alice :dob "11/01/1987"^^xsd:string .
:alice rdf:type :Person .
:bob :dob "23/07/1980"^^xsd:string .
:bob rdf:type :Person .
:diana :height "168"^^xsd:integer .
:diana rdf:type :Person .
:emma :dob "10/02/1965"^^xsd:string .
:emma rdf:type :Person .
:max rdf:type :Dog .
We can use the following rule to record absence of a date of birth for people.
[?x, rdf:type, owl:Nothing] :-
[?x, rdf:type, :Person],
NOT EXISTS ?y IN ([?x, :dob, ?y]) .
The rule says that if a person p lacks a date of birth, then p incurs a constraint violation. The constraint violation is recorded by making person p an instance of the special owl:Nothing unary relation, which is also present in the OWL 2 standard.
The following SPARQL query then correctly reports that Diana violates the constraint (whereas Max does not because he is a dog).
SELECT ?x WHERE { ?x rdf:type owl:Nothing }
This type of computation combines well with type inheritance. For instance, suppose that we add the following triple:
:charlie rdf:type :Student .
And the following rule stating that every student is a person
[?x, rdf:type, :Person] :- [?x, rdf:type, :Student] .
Then, the previous query will give as results
:charlie .
:diana .
Indeed, since Charlie is a student, he is also a person; furthermore, Charlie lacks date of birth information.
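The constraint check, including its interaction with the student rule, can be pictured as a set computation. The following Python sketch is only an illustration with hypothetical names (missing_dob, violations); it does not model owl:Nothing itself, just the set of individuals the rule would flag.

```python
# Data from the example graph; Max the dog is deliberately not a person.
dob = {"alice": "11/01/1987", "bob": "23/07/1980", "emma": "10/02/1965"}
persons = {"alice", "bob", "diana", "emma"}
students = {"charlie"}

def missing_dob(persons, students, dob):
    """Return the people that the mandatory-attribute rule would flag."""
    all_persons = persons | students  # rule: every student is a person
    # Counterpart of NOT EXISTS ?y IN ([?x, :dob, ?y]).
    return {p for p in all_persons if p not in dob}

violations = missing_dob(persons, students, dob)
print(sorted(violations))
```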
The meaning of the special class owl:Nothing is different in RDFox and the OWL 2 standard. If one can derive from an OWL 2 ontology that an object is an instance of owl:Nothing, then the ontology is inconsistent and querying the ontology becomes logically meaningless. Thus, the OWL 2 standard would require users to modify the data and/or ontology to fix the inconsistency prior to attempting to issue queries. Furthermore, it is worth noting that in OWL 2 it is not possible to write statements that check for “absence of information”; this is due to the monotonicity properties of OWL 2 as a fragment of first-order logic.
In contrast, in RDFox, deriving an instance of owl:Nothing does not lead to a logical inconsistency and the answers to queries remain perfectly meaningful. In the pattern we have described, querying for owl:Nothing simply provides users with the list of all nodes in the graph for which mandatory information is missing. As a result, the user is warned rather than prevented from carrying out a task such as issuing a query. For instance, if we were to ask a query to RDFox such as the following
SELECT ?x WHERE { ?x rdf:type :Person }
We would still obtain the expected results (see below) despite the fact that there are constraint violations in the data.
:alice .
:charlie .
:emma .
:diana .
:bob .
This behavior is also different from relational databases, where the system would typically reject updates that lead to a constraint violation. As already mentioned, RDFox continues to operate normally and would accept any updates although constraints are being violated. Of course, users are encouraged to query the system in order to detect and rectify such violations.
5.5.9. Expressing Defaults and Exceptions¶
Rules can be used to write default statements (that is, statements that normally hold in the absence of additional information). This is especially useful to represent exceptions to rules, which is important, for instance, in legal domains.
Consider the following graph saying that Tweety is a bird.
:tweety rdf:type :Bird .
Birds typically fly; that is, in the absence of additional information, the fact that Tweety is a bird constitutes sufficient evidence to believe that Tweety flies. There may, however, be exceptions. For instance, penguins are non-flying birds, and hence if we were to find out that Tweety is a penguin, then we would need to withdraw our default assumption that Tweety flies.
RDFox rules can be used to model this type of default reasoning. In particular, consider a rule saying that birds fly unless they are penguins.
[?x, rdf:type, :FlyingAnimal] :-
[?x, rdf:type, :Bird],
NOT [?x, rdf:type, :Penguin] .
We can now issue a SPARQL query asking for the list of flying animals
SELECT ?x WHERE { ?x rdf:type :FlyingAnimal }
and obtain :tweety as an answer.
Suppose now that we were to extend the graph with the following triple
:tweety rdf:type :Penguin .
Then, the same query would now give us an empty set of answers since, in the light of the new evidence, we can no longer conclude that Tweety flies.
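The non-monotonic behavior described above, where adding a fact removes a conclusion, can be seen in a tiny Python sketch. The function name flying_animals is hypothetical and the sketch simply re-evaluates the rule over the two versions of the graph.

```python
def flying_animals(birds, penguins):
    """Birds fly unless they are known to be penguins (NOT [?x, rdf:type, :Penguin])."""
    return {x for x in birds if x not in penguins}

# Graph containing only ":tweety rdf:type :Bird ."
before = flying_animals(birds={"tweety"}, penguins=set())

# Graph after adding ":tweety rdf:type :Penguin ."
after = flying_animals(birds={"tweety"}, penguins={"tweety"})

print(before, after)
```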
5.5.10. Restructuring Data¶
Rules can be used to transform the structure of the data in a knowledge graph (e.g., by adding properties to a relationship).
Consider the following knowledge graph representing employees and their employer.
:alice :worksFor :oxford_university .
:bob :worksFor :acme .
:charlie :worksFor :oxford_university .
:charlie :worksFor :acme .
Suppose that we now want to expand the graph by adding further information about the employment, such as the salary and the start date. This information is relative to each specific employment of an employee; for instance, Charlie will have a different salary and start date for his employment with Oxford University and his employment with Acme.
We can use RDFox rules to automatically restructure the data in the graph to account for the new information.
[?z, rdf:type, :Employment],
[?z, :Employee, ?x],
[?z, :employer, ?y] :-
[?x, :worksFor, ?y],
BIND(SKOLEM("Employment", ?x, ?y) AS ?z) .
The query
SELECT ?x ?y ?z WHERE { ?x ?y ?z . ?x rdf:type :Employment }
gives us the new triples generated by the application of the previous rule
_:Employment_116_200 :employer :acme .
_:Employment_116_200 :Employee :bob .
_:Employment_116_200 rdf:type :Employment .
_:Employment_113_199 :employer :oxford_university .
_:Employment_113_199 :Employee :alice .
_:Employment_113_199 rdf:type :Employment .
_:Employment_156_199 :employer :oxford_university .
_:Employment_156_199 :Employee :charlie .
_:Employment_156_199 rdf:type :Employment .
_:Employment_156_200 :employer :acme .
_:Employment_156_200 :Employee :charlie .
_:Employment_156_200 rdf:type :Employment .
It is important to notice that the generated SKOLEM IDs, such as _:Employment_116_200, cannot be considered stable across runs (or RDFox versions) since they are generated based on the dictionary IDs of the arguments. Further data relative to an employment, such as the associated salary and start date, should therefore not be inserted directly as triples, but rather via rules such as the following ones:
[?z, :salary, "60000"^^xsd:integer] :- BIND(SKOLEM("Employment", :alice, :oxford_university) AS ?z) .
[?z, :salary, "55000"^^xsd:integer] :- BIND(SKOLEM("Employment", :charlie, :oxford_university) AS ?z) .
[?z, :salary, "40000"^^xsd:integer] :- BIND(SKOLEM("Employment", :charlie, :acme) AS ?z) .
[?z, :salary, "45000"^^xsd:integer] :- BIND(SKOLEM("Employment", :bob, :acme) AS ?z) .
Note that each of these rules uses the SKOLEM
construct in the
antecedent to make sure that they match correctly to the generated
triples listed above.
To check that the salary data has been inserted correctly, we can issue the query
SELECT ?x (SUM(?y) AS ?income)
WHERE {
?e :Employee ?x .
?e :salary ?y
}
GROUP BY ?x
which gives us the total yearly income for each person by summing up the salary of each of their employments, giving the expected results.
:alice 60000 .
:charlie 95000 .
:bob 45000 .
Data restructuring via reification has multiple applications. In particular, RDF can only represent directly binary relations and hence the representation of higher arity relations is only possible through reification. Reification is also needed if we want to qualify or annotate edges in a graph (e.g., by adding weights, or dates, or other relevant properties).
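The reification pattern of this section can be sketched in Python by using a tuple of the arguments as the identifier of each employment. The skolem function below is a hypothetical stand-in for the SKOLEM construct: it only shares the property that equal arguments always yield the same identifier, and it does not reproduce the blank node names that RDFox generates.

```python
from collections import defaultdict

works_for = [("alice", "oxford_university"), ("bob", "acme"),
             ("charlie", "oxford_university"), ("charlie", "acme")]

def skolem(name, *args):
    """Hypothetical stand-in for SKOLEM: a deterministic key built from the arguments."""
    return (name, *args)

# One employment node per :worksFor triple, as produced by the restructuring rule.
employment = {
    skolem("Employment", person, org): {"employee": person, "employer": org}
    for (person, org) in works_for
}

# Salaries are attached by recomputing the same key, not by copying an ID.
salaries = {
    skolem("Employment", "alice", "oxford_university"): 60000,
    skolem("Employment", "charlie", "oxford_university"): 55000,
    skolem("Employment", "charlie", "acme"): 40000,
    skolem("Employment", "bob", "acme"): 45000,
}

# Counterpart of the SUM(?y) ... GROUP BY ?x query.
income = defaultdict(int)
for key, salary in salaries.items():
    income[employment[key]["employee"]] += salary

print(dict(income))
```

Recomputing the key from its arguments is the point of the design: nothing in the salary data depends on what the generated identifier looks like.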
5.5.11. Representing Ordered Relations¶
Many relations naturally imply some sort of order, and in such cases we are often interested in finding the first and last elements of such orders. For instance, consider the managerial structure of a company.
:alice :manages :bob .
:bob :manages :jeremy .
:bob :manages :emma .
:emma :manages :david .
:jeremy :manages :monica .
We would like to recognize which individuals in the company are “top level managers”. We can use a rule to define a top level manager as a person who manages someone and is not managed by anyone else.
[?x, rdf:type, :TopLevelManager] :-
[?x, :manages, ?y],
NOT EXISTS ?z IN ([?z, :manages, ?x]) .
The query
SELECT ?x WHERE { ?x rdf:type :TopLevelManager }
asking for the list of top level managers gives us :alice as the answer. We can now use a rule to define “junior employees” as those who have a manager but who themselves do not manage anyone else.
[?x, rdf:type, :JuniorEmployee] :-
[?y, :manages, ?x],
NOT EXISTS ?z IN ([?x, :manages, ?z]) .
The query
SELECT ?x WHERE { ?x rdf:type :JuniorEmployee }
gives us :monica and :david as answers.
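Both rules in this section reduce to comparing the set of people who manage someone with the set of people who are managed by someone. A minimal Python sketch of that reading follows; the variable names are hypothetical.

```python
manages = {("alice", "bob"), ("bob", "jeremy"), ("bob", "emma"),
           ("emma", "david"), ("jeremy", "monica")}

managers = {x for (x, _) in manages}  # people who manage someone
managed = {y for (_, y) in manages}   # people who have a manager

# Manages someone and NOT EXISTS ?z IN ([?z, :manages, ?x]).
top_level_managers = managers - managed

# Has a manager and NOT EXISTS ?z IN ([?x, :manages, ?z]).
junior_employees = managed - managers

print(sorted(top_level_managers), sorted(junior_employees))
```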
Prominent examples of ordered relations where we may be interested in finding the top and bottom elements are partonomies (part-whole relations) and is-a hierarchies.
5.5.12. Representing Equality Cliques¶
When integrating data from multiple sources using a knowledge graph, it
is usually the case that objects from different sources are identified
to be the same. In this setting, we want to be able to answer complex
queries that span across the different sources, and to easily identify
the source where the information came from. Additionally, we may not
want to use the equality predicate owl:sameAs
to identify the
objects since our rule set may contain rules involving aggregation
and/or negation-as-failure which cannot be used in conjunction with
equality.
For instance, assume that we are integrating sources s1, s2, and s3 containing information about music artists and records. Assume that we have determined (e.g., using entity resolution techniques or exploiting explicit links between the sources) that “John Doe” in s1 is the same as “J. H. Doe” in s2 and “The Blues King” in s3. We can represent these correspondences using a binary relation ost:same which we define as reflexive, symmetric, and transitive using RDFox rules as given next.
s1:john_doe rdf:type s1:Artist .
s2:john_H_doe rdf:type s2:Performer .
s3:blues_king rdf:type s3:Musician .
s1:john_doe ost:same s2:john_H_doe .
s1:john_doe ost:same s3:blues_king .
s2:john_H_doe ost:same s3:blues_king .
[?x, ost:same, ?x] :- [?x, ost:same, ?y] .
[?y, ost:same, ?x] :- [?x, ost:same, ?y] .
[?x, ost:same, ?z] :- [?x, ost:same, ?y], [?y, ost:same, ?z] .
In this way, the aforementioned objects form a clique in the integrated graph. Indeed, the query
SELECT ?x ?y WHERE { ?x ost:same ?y }
returns the answer
s3:blues_king s2:john_H_doe .
s2:john_H_doe s3:blues_king .
s2:john_H_doe s2:john_H_doe .
s3:blues_king s3:blues_king .
s2:john_H_doe s1:john_doe .
s3:blues_king s1:john_doe .
s1:john_doe s1:john_doe .
s1:john_doe s3:blues_king .
s1:john_doe s2:john_H_doe .
In order to be able to query across artists from different sources, we want to define a unique representative for the elements in the clique. A plausible strategy is to first select the smallest individual according to some pre-defined total order (the order itself is irrelevant, and we can choose for example the order on IRIs provided by RDFox). To select the smallest object we introduce the following rules.
[?x, ost:comesBefore, ?y] :- [?x, ost:same, ?y], FILTER (?x < ?y) .
[?y, rdf:type, ost:NotSmallestInClique] :- [?x, ost:comesBefore, ?y] .
[?x, rdf:type, ost:SmallestInClique] :-
[?x, ost:comesBefore, ?y],
NOT [?x, rdf:type, ost:NotSmallestInClique] .
The first rule generates an order amongst the elements of the clique. The second rule says that if ?x comes before ?y then ?y is not the smallest element. The third rule finally identifies the smallest element in the clique. The following query
SELECT ?x ?y WHERE { ?x ost:comesBefore ?y }
reveals the generated order
s2:john_H_doe s3:blues_king .
s1:john_doe s2:john_H_doe .
s1:john_doe s3:blues_king .
where s1:john_doe is correctly identified as the smallest element by the query.
SELECT ?x WHERE { ?x rdf:type ost:SmallestInClique }
Now that we have identified an element of the clique we can create a representative of the clique using a Skolem constant, as given next.
[?z, rdf:type, ost:Artist],
[?z, ost:represents, ?x] :-
[?x, rdf:type, ost:SmallestInClique],
BIND(SKOLEM("OSTArtist", ?x) AS ?z) .
[?x, ost:represents, ?z] :-
[?x, ost:represents, ?y],
[?y, ost:comesBefore, ?z] .
The first rule creates the Skolem constant and states that it represents the smallest element. The second rule states that the Skolem constant also represents every other element in the clique.
The query
SELECT ?z ?x WHERE { ?z ost:represents ?x }
yields the expected result.
_:OSTArtist_2136 s2:john_H_doe .
_:OSTArtist_2136 s3:blues_king .
_:OSTArtist_2136 s1:john_doe .
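To see how the representative spreads from the smallest element to the rest of the clique, the two rules can be simulated as follows. This is a sketch under assumptions: the skolem helper is a hypothetical stand-in for the SKOLEM built-in (it only guarantees that equal arguments give the same node), and the name it produces is not the one RDFox would generate.

```python
comes_before = {
    ("s1:john_doe", "s2:john_H_doe"),
    ("s1:john_doe", "s3:blues_king"),
    ("s2:john_H_doe", "s3:blues_king"),
}
smallest = "s1:john_doe"

def skolem(label, *args):
    """Hypothetical stand-in for SKOLEM: same arguments, same node."""
    return "_:" + label + "_" + "_".join(args)

# First rule: the new node represents the smallest element.
rep = skolem("OSTArtist", smallest)
represents = {(rep, smallest)}

# Second rule: follow ost:comesBefore until nothing new is derived.
changed = True
while changed:
    changed = False
    for r, y in list(represents):
        for y2, z in comes_before:
            if y == y2 and (r, z) not in represents:
                represents.add((r, z))
                changed = True

print(sorted(x for _, x in represents))
```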
It is possible to achieve the same results by using an optimized set of rules that generates fewer triples. In particular, this optimized representation avoids axiomatizing the ost:same property as reflexive and symmetric. Let’s reconsider the data.
s1:john_doe rdf:type s1:Artist .
s2:john_H_doe rdf:type s2:Performer .
s3:blues_king rdf:type s3:Musician .
s1:john_doe ost:same s2:john_H_doe .
s1:john_doe ost:same s3:blues_king .
s2:john_H_doe ost:same s3:blues_king .
We now redefine directly the ost:comesBefore relation using the following rules
[?x, ost:comesBefore, ?y] :-
[?x, ost:same, ?y],
FILTER(?x > ?y) .
[?x, ost:comesBefore, ?y] :-
[?y, ost:same, ?x],
FILTER(?x > ?y) .
[?x, ost:comesBefore, ?z] :-
[?x, ost:comesBefore, ?y],
[?y, ost:comesBefore, ?z],
FILTER(?x > ?y) .
[?x, ost:comesBefore, ?y] :-
[?z, ost:comesBefore, ?x],
[?z, ost:comesBefore, ?y],
FILTER(?x > ?y) .
The query
SELECT ?x ?y WHERE { ?x ost:comesBefore ?y }
reveals a generated order. Once we have the order, we proceed as before.
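The saving can be checked with a small simulation of the four rules, written exactly as given above (including the direction of the FILTER). As before, Python string comparison is assumed to stand in for the RDFox order on IRIs, and the function name is hypothetical.

```python
same = {
    ("s1:john_doe", "s2:john_H_doe"),
    ("s1:john_doe", "s3:blues_king"),
    ("s2:john_H_doe", "s3:blues_king"),
}

def derive_order(same):
    """Naive fixpoint over the four ost:comesBefore rules."""
    cb = set()
    while True:
        new = set()
        # Rule 1: [?x, cb, ?y] :- [?x, same, ?y], FILTER(?x > ?y)
        new |= {(x, y) for x, y in same if x > y}
        # Rule 2: [?x, cb, ?y] :- [?y, same, ?x], FILTER(?x > ?y)
        new |= {(x, y) for y, x in same if x > y}
        # Rule 3: [?x, cb, ?z] :- [?x, cb, ?y], [?y, cb, ?z], FILTER(?x > ?y)
        new |= {(x, z) for x, y in cb for y2, z in cb if y == y2 and x > y}
        # Rule 4: [?x, cb, ?y] :- [?z, cb, ?x], [?z, cb, ?y], FILTER(?x > ?y)
        new |= {(x, y) for z, x in cb for z2, y in cb if z == z2 and x > y}
        if new <= cb:
            return cb
        cb |= new

order = derive_order(same)
print(len(order))
```

In this sketch only three ost:comesBefore triples are derived on top of the three explicit ost:same triples, whereas the first approach also materializes the nine ost:same triples of the clique.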
5.5.13. Populating a Knowledge Graph from a Data Source¶
Rules can be used to bring information from an external data source into a knowledge graph.
Data feeding a knowledge graph often stems from different types of external data sources, such as relational databases. We can use RDFox rules to specify how each record in the external data source corresponds to a set of nodes and edges in the graph. RDFox allows us to load the information in an external data source by means of a two-stage process. The first step is to attach a data source and assign it to a relation. For instance, consider the following data about the employees of ACME corporation in a CSV file named “employee.csv”.
emp_id,emp_name,job_name,hire_date,salary
68319,KAYLING,PRESIDENT,,200000
66928,BLAZE,MANAGER,2017-05-01,90000
67453,JONES,ASSISTANT,2018-05-03,35000
We want to expose the table as an RDFox relation with 5 arguments, one per column in the table. We start by registering the file as a data source using the following command.
dsource add delimitedFile "EmployeeDS" \
file "$(dir.root)csv/employee.csv" \
header true
The net result is that employee.csv is added as an RDFox data source, which we called EmployeeDS. Here, file specifies the path to the file, and header indicates whether the file contains a header row. At this point, we can check whether the data has been attached correctly and whether the RDFox data source is in place by running the command
dsource show EmployeeDS
to obtain the expected information
Data source type name: delimitedFile
Data source name: EmployeeDS
Parameters: file = employee.csv
header = true
------------------------------------------------------------
Table name: employee.csv
Column 1: emp_id xsd:integer
Column 2: emp_name xsd:string
Column 3: job_name xsd:string
Column 4: hire_date xsd:string
Column 5: salary xsd:integer
------------------------------------------------------------
The next step attaches the RDFox data source to an employee relation in RDFox.
dsource attach :employee "EmployeeDS" \
"columns" 5 \
"1" "https://oxfordsemantic.tech/RDFox/tutorial/{1}_{2}" \
"1.datatype" "iri" \
"2" "{emp_name}" \
"2.datatype" "string" \
"3" "{job_name}" \
"3.datatype" "string" \
"4" "{hire_date}" \
"4.datatype" "string" \
"4.if-empty" "absent" \
"5" "{salary}" \
"5.datatype" "integer" \
"5.if-empty" "absent"
The IRI of the new relation will be :employee, where “:” is the default
prefix defined beforehand as
“https://oxfordsemantic.tech/RDFox/tutorial/”. The :employee
data
relation will contain 5 arguments. The first argument provides an
identifier for each employee as a composition of the prefix’s IRI, the
employee ID (first column in the data source) and the employee name
(second column). The remaining arguments are obtained from the column of
the corresponding name in the data source. Since not every employee may
have a hiring date or a known salary, the conditions “if-empty” indicate
that the corresponding argument in the RDFox relation will be left
empty.
Once the relation has been created in RDFox, it can be queried in SPARQL and used
in rule bodies. To query it in SPARQL, we use an RDFox extension to SPARQL which uses the TT
syntax, where TT
stands for tuple table. The SPARQL query:
SELECT ?x ?y ?z ?u ?w WHERE { TT :employee{ ?x ?y ?z ?u ?w } }
will return the following answers:
:68319_KAYLING "KAYLING" "PRESIDENT" UNDEF 200000 .
:66928_BLAZE "BLAZE" "MANAGER" "2017-05-01" 90000 .
:67453_JONES "JONES" "ASSISTANT" "2018-05-03" 35000 .
As we can see, the UNDEF entry indicates that the hiring date of the first employee is missing. Now that we have the RDFox relation in place, the next step is to turn the data in the relation into a graph. For this we can use the following rule, where the RDFox relation forms the antecedent and the edges generated from it are described in the consequent:
[?x, rdf:type, :Employee],
[?x, :worksFor, :acme],
[?x, :hasName, ?y],
[?x, :hasJob, ?z],
[?x, :hiredOnDate, ?u],
[?x, :salary, ?w] :-
:employee(?x, ?y, ?z, ?u, ?w) .
The materialization of the rule generates a graph from the data in the relation. The new relations in the graph can be used in other rules to define additional concepts and relations. For instance, we can add the rules stating that every employee is a person and every person with a salary higher than £50,000 pays tax at a higher-rate.
[?x, rdf:type, :Person ] :- [?x, rdf:type, :Employee ] .
[?x, :taxRate, :higher-rate] :- [?x, rdf:type, :Person], [?x, :salary, ?y], FILTER(?y > 50000) .
Now we can query the graph to obtain, for instance, the list of high income tax payers.
SELECT ?x WHERE { ?x :taxRate :higher-rate }
And obtain the expected results.
:68319_KAYLING .
:66928_BLAZE .
Data can be imported from different data sources and merged together in the graph. For instance, if we had a different employee table (e.g., for a different department) in another CSV, we could attach to it a new RDFox data source and exploit a rule akin to the one before to further populate the binary relations in the graph, as well as to create new ones.
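The mapping from records to graph edges described in this section can be sketched in plain Python, independently of RDFox. The function names are hypothetical, and the sketch assumes that an absent value simply produces no triple for the corresponding property.

```python
import csv
import io

CSV_TEXT = """emp_id,emp_name,job_name,hire_date,salary
68319,KAYLING,PRESIDENT,,200000
66928,BLAZE,MANAGER,2017-05-01,90000
67453,JONES,ASSISTANT,2018-05-03,35000
"""
PREFIX = "https://oxfordsemantic.tech/RDFox/tutorial/"

def rows_to_triples(text):
    """Mimic the attach template ({1}_{2}) and the graph-building rule."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        x = f"{PREFIX}{row['emp_id']}_{row['emp_name']}"
        triples.add((x, "rdf:type", ":Employee"))
        triples.add((x, ":worksFor", ":acme"))
        triples.add((x, ":hasName", row["emp_name"]))
        triples.add((x, ":hasJob", row["job_name"]))
        if row["hire_date"]:  # assumed: an empty cell yields no triple
            triples.add((x, ":hiredOnDate", row["hire_date"]))
        if row["salary"]:
            triples.add((x, ":salary", int(row["salary"])))
    return triples

def higher_rate(triples):
    """Employees are persons; persons earning over 50000 pay the higher rate."""
    people = {s for s, p, o in triples if p == "rdf:type" and o == ":Employee"}
    return {s for s, p, o in triples
            if p == ":salary" and s in people and o > 50000}

print(sorted(higher_rate(rows_to_triples(CSV_TEXT))))
```

A second CSV file could be passed through the same function and the resulting sets merged, which mirrors the idea of populating one graph from several sources.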
5.6. OWL 2 Support in RDFox¶
This section describes the support in RDFox for OWL 2—the W3C standard language for representing ontologies.
5.6.1. OWL 2 Ontologies¶
An OWL 2 ontology is a formal description of a domain of interest. OWL 2 defines three different syntactic categories.
The first syntactic category is entities, such as classes,
properties and individuals, which are identified by an IRI.
Classes represent sets of objects in the world; for instance, a class
:Person
can be used to represent the set of all people. Properties
represent binary relations, and OWL 2 distinguishes between two
different types of properties: data properties describe relationships
between objects and literal values (e.g., the data property :age
can
be used to represent a person’s age), whereas object properties
describe relationships between two objects (e.g., an object property
:locatedIn
can be used to relate places to their locations).
Finally, individuals in OWL 2 are used to refer to concrete objects in
the world; for instance, the individual :oxford
can be used to refer
to the city of Oxford.
The second syntactic category is expressions, which can be used to
describe complex classes and relations constructed in terms of simpler
ones. For instance the expression ObjectUnionOf( :Cat :Dog)
represents the set of animals that are either cats or dogs.
The third syntactic category is axioms, which are statements about entities and expressions that are asserted to be true in the domain described. For instance, the OWL 2 axiom SubClassOf(:Scientist :Person) states that every scientist is a person by defining the class :Scientist to be a subclass of the class :Person.
The main component of an OWL 2 ontology is a set of axioms. Ontologies can also import other ontologies and contain annotations.
OWL 2 ontologies can be written using different syntaxes. RDFox can currently load ontologies written in the functional syntax as well as ontologies written in the turtle syntax.
5.6.2. OWL 2 Ontologies vs. RDFox Rules¶
OWL 2 and the rule language of RDFox are languages for knowledge representation with well-understood formal semantics.
Both languages share a common core. That is, certain types of rules can be equivalently rewritten as OWL 2 axioms and vice-versa. For instance, the following axiom and rule both express that every scientist is also a person.
SubClassOf(:Scientist :Person)
[?x, rdf:type, :Person] :- [?x, rdf:type, :Scientist] .
In particular, the OWL 2 specification describes the OWL 2 RL profile—a subset of the OWL 2 language that is amenable to implementation via rule-based technologies.
There are, however, many other aspects where OWL 2 and the rule language of RDFox differ, and there are many constructs in OWL 2 that cannot be translated as RDFox rules and vice-versa. For instance, OWL 2 can represent disjunctive knowledge, i.e., we can write an OWL 2 axiom saying that every student is either an undergraduate student, a graduate student, or a doctoral student:
SubClassOf(:Student ObjectUnionOf(:UndergraduateSt :MscSt :DoctoralSt) )
RDFox rules, however, do not support disjunction. There are also many kinds of rules in RDFox that cannot be expressed using OWL 2 axioms; these include, for instance, rules involving features such as aggregation, negation-as-failure or certain built-in functions; furthermore, there are also plain Datalog rules that do not have a correspondence in OWL 2.
5.6.3. Loading OWL 2 Ontologies in RDFox¶
RDFox is able to load, store and manipulate three kinds of syntactic elements: triples, rules, and OWL 2 axioms. These are kept in separate “bags” in the system and can be added or deleted individually. For instance, consider the following text file “ontology.txt” containing an ontology written in the functional syntax of OWL 2:
Prefix(:=<http://www.example.com/ontology1#>)
Ontology( <http://www.example.com/ontology1>
SubClassOf( :Child :Person )
SubClassOf( :Person ObjectUnionOf(:Child :Adult) )
)
The ontology contains two axioms. The first axiom tells us that every child is also a person, whereas the second axiom states that every person is either a child or an adult. The first axiom can be faithfully translated into RDFox rules, whereas the second one cannot. RDFox provides a full API for OWL 2 and can parse, store and manage all kinds of OWL 2 axioms in functional syntax. As a result, it will correctly load both axioms, but will issue a warning indicating that the second axiom has no correspondence into rules.
To load the ontology in RDFox, we can initialize a data store (see the Getting Started guide) and import the file in the usual way.
import ontology.txt
The ontology axioms are now loaded in the data store and kept internally in the “axioms bag”.
We can now import a turtle file containing the following triples:
:jen rdf:type :Child .
:jen :hasParent :mary .
These triples will be kept internally in the “triples bag”.
Finally, we can import the following RDFox rule saying that the parent of a child is a person.
[?y, rdf:type, :Person] :- [?x, :hasParent, ?y], [?x, rdf:type, :Child] .
This rule is kept internally in RDFox in the separate “rules bag”.
Now, we are in a position to perform reasoning. For this we can issue a SPARQL query asking for the list of all people:
SELECT ?x WHERE { ?x rdf:type :Person }
To answer the query, RDFox will translate OWL 2 axioms into rules and will consider together all data triples, all RDFox rules added by the user, plus all rules stemming from the translation of OWL 2 axioms. In particular, the following rules and facts contribute to answering the query, where the first rule comes from the translation of the first ontology axiom as a rule (the second axiom in the ontology is ignored):
:jen rdf:type :Child .
:jen :hasParent :mary .
[?x, rdf:type, :Person] :- [?x, rdf:type, :Child] .
[?y, rdf:type, :Person] :- [?x, :hasParent, ?y], [?x, rdf:type, :Child] .
As a result, RDFox will return as answers both :jen
and :mary
.
Indeed, :jen
is a child and hence also a person by the first rule;
in turn, :mary
is the parent of :jen
and hence also a person by
the second rule.
The translation of OWL 2 axioms into rules for the purpose of reasoning is performed on a best-effort basis. In particular, sometimes RDFox may not be able to translate the whole of a given axiom, but may still be able to translate a part of it. For instance, suppose that we add to our data store the following axiom saying that every person is a human and also either an adult or a child:
SubClassOf(:Person ObjectIntersectionOf(:Human ObjectUnionOf(:Child :Adult)))
RDFox will load the axiom correctly, but will again issue a warning due to the use of disjunction in the axiom. Suppose that we now issue the query
SELECT ?x WHERE { ?x rdf:type :Human }
RDFox will correctly return both :jen
and :mary
as answers.
Indeed, as already explained, RDFox can deduce that both :jen
and
:mary
are persons. Now, although the last axiom we imported cannot
be fully translated into rules, RDFox will still be able to partly
translate it into the following rule:
[?x, rdf:type, :Human] :- [?x, rdf:type, :Person] .
from which we can deduce that :jen
and :mary
are also humans.
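The combined effect of the translated axioms and the user rule can be illustrated with a naive forward-chaining loop. The Python sketch below is not how RDFox is implemented; it only encodes the translatable parts of the axioms (the disjunctive part is dropped, as described above) together with the rule about parents, and the function name is hypothetical.

```python
facts = {
    (":jen", "rdf:type", ":Child"),
    (":jen", ":hasParent", ":mary"),
}
# Rules obtained from SubClassOf axioms; disjunctions are not representable.
subclass_rules = [(":Child", ":Person"), (":Person", ":Human")]

def materialize(facts, subclass_rules):
    """Apply all rules repeatedly until no new triple is derived."""
    facts = set(facts)
    while True:
        new = set()
        for sub, sup in subclass_rules:
            new |= {(s, "rdf:type", sup) for s, p, o in facts
                    if p == "rdf:type" and o == sub}
        # [?y, rdf:type, :Person] :- [?x, :hasParent, ?y], [?x, rdf:type, :Child] .
        children = {s for s, p, o in facts if p == "rdf:type" and o == ":Child"}
        new |= {(o, "rdf:type", ":Person") for s, p, o in facts
                if p == ":hasParent" and s in children}
        if new <= facts:
            return facts
        facts |= new

closure = materialize(facts, subclass_rules)
humans = {s for s, p, o in closure if p == "rdf:type" and o == ":Human"}
print(sorted(humans))
```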
OWL 2 ontologies can also be loaded from a turtle file, following the standard representation of OWL 2 ontologies as triples. In order to load an ontology from a turtle file, we need to initialize a store with special parameters. Using the command line, we can initialize such a store as follows:
init par-complex-nn owl-in-rdf-support relaxed
This command creates a store in which parsing of OWL as triples is enabled. As a result, RDFox will identify OWL 2 axioms that were encoded as RDF triples and will translate those axioms into rules as described earlier. Suppose that we import into the store a turtle file containing the following triples:
:Child rdfs:subClassOf :Person .
:Person rdfs:subClassOf :Human .
:jen rdf:type :Child .
The first two triples correspond to the serialization into triples of the following axioms in functional syntax:
SubClassOf( :Child :Person )
SubClassOf( :Person :Human )
As a result of parsing, all triples will be stored in the “triples bag” of RDFox, whereas the first two triples will also be added as axioms.
Now, assume that we issue a query asking for the list of all humans:
SELECT ?x WHERE { ?x rdf:type :Human }
Then, RDFox will correctly return :jen
as the answer. Internally,
RDFox will transform the OWL axioms into rules
[?x, rdf:type, :Person] :- [?x, rdf:type, :Child] .
[?x, rdf:type, :Human] :- [?x, rdf:type, :Person] .
and compute the corresponding materialization.
5.6.4. Subsumption Reasoning¶
OWL 2 reasoners implement a wide range of reasoning services, which are not limited to query answering. In particular, OWL reasoners can solve the subsumption problem: given a class, they would compute all its inferred superclasses.
For example, given
SubClassOf( :Child :Person )
SubClassOf( :Person :Human )
an OWL 2 reasoner would be able to infer
SubClassOf( :Child :Human )
as a consequence, since from the fact that every child is a person and every person is a human, it follows that every child is also a human.
RDFox is a materialization-based query answering system, and it has not been designed for solving problems such as class subsumption. RDFox, however, is still able to detect some such subsumption relations should this be required in an application.
One way to achieve this is to reduce subsumption to query answering. In
particular, to check whether it is true that every child is a human, we
can introduce a fresh object in the data store, which we make an
instance of :Child
. That is, we can import the following triple,
where :a_child
is a fresh URI.
:a_child rdf:type :Child .
Then, we would test whether :a_child
is inferred to be also a human
by issuing the query
ASK { :a_child rdf:type :Human }
which would return true.
Another way of testing subsumption is to import the ontology as a set of triples:
:Child rdfs:subClassOf :Person .
:Person rdfs:subClassOf :Human .
When the triples are parsed and eventually translated into rules for
reasoning, RDFox will also add a number of internal rules that partially
encode the semantics of the RDFS and OWL vocabularies; in particular, it
will add rules representing the relation rdfs:subClassOf
as
transitive and reflexive, and also saying that every class is a subclass
of owl:Thing
. As a result, the following SPARQL query
SELECT ?x WHERE { :Child rdfs:subClassOf ?x }
will correctly return all superclasses of :Child
as
:Person .
:Human .
owl:Thing .
:Child .
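The answer above is what one obtains by closing rdfs:subClassOf under reflexivity and transitivity and placing every class below owl:Thing. A minimal Python sketch of that closure, with a hypothetical function name, is:

```python
axioms = {(":Child", ":Person"), (":Person", ":Human")}

def superclasses(cls, axioms):
    """Superclasses of cls under reflexivity, transitivity and owl:Thing."""
    classes = {c for pair in axioms for c in pair} | {"owl:Thing"}
    closure = set(axioms)
    closure |= {(c, c) for c in classes}            # reflexivity
    closure |= {(c, "owl:Thing") for c in classes}  # top class
    changed = True
    while changed:                                  # transitivity
        changed = False
        for a, b in list(closure):
            for b2, c in list(closure):
                if b == b2 and (a, c) not in closure:
                    closure.add((a, c))
                    changed = True
    return {sup for sub, sup in closure if sub == cls}

print(sorted(superclasses(":Child", axioms)))
```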
5.6.5. Current Limitations¶
The following details should be taken into account by users of RDFox who rely on OWL 2 ontologies in their applications:
RDFox currently does not support ontology importation. That is, if we load ontology O, which in turn imports O1 and O2, only the contents of O will be loaded (and not those of O1 and O2).
RDFox also does not support associating axioms to a given ontology. In particular, if we load two different ontology files, all the axioms in both ontologies will be added to the same bag of axioms in the system.
5.7. SWRL Support in RDFox¶
This section describes the support in RDFox for SWRL—a format for representing rules on the Semantic Web.
5.7.1. SWRL Rules¶
The SWRL specification extends the set of OWL axioms to include also Datalog rules. It thus enables rules to be combined with an OWL ontology. SWRL rules can be written using different syntaxes. RDFox can currently load SWRL rules written in the functional syntax as well as rules written in the turtle syntax. SWRL rules can be easily expressed as RDFox rules, with the only exception of rules containing certain built-ins which do not have a direct correspondence to SPARQL 1.1 built-in functions.
5.7.2. Loading SWRL Rules in RDFox¶
RDFox treats SWRL as an extension of OWL 2 and hence SWRL rules are loaded and managed in exactly the same way as OWL 2 axioms. For instance, consider the following text file “swrl-rules.txt” containing an ontology written in the functional syntax of SWRL:
Prefix(:=<http://www.example.com/ontology1#>)
Ontology( <http://www.example.com/ontology1>
Implies(Antecedent(:Student(I-variable(:x1))) Consequent(:Person(I-variable(:x1))))
)
The ontology consists of a rule stating that every student is a person.
To load the ontology in RDFox, we can initialize a data store (see the Getting Started guide) and import the file in the usual way.
import swrl-rules.txt
The SWRL rule is now loaded in the data store and kept internally in the “axioms bag”.
We can next import a turtle file containing the triple:
:jen rdf:type :Student .
Now, we are in a position to perform reasoning. For this we can issue a SPARQL query asking for the list of all people:
SELECT ?x WHERE { ?x rdf:type :Person }
To answer the query, RDFox will translate SWRL into RDFox rules, and will return :jen
as a result.
SWRL can also be loaded from a turtle file, following the relevant syntax in the SWRL specification. For this, RDFox follows exactly the same approach as with OWL 2 axioms expressed as triples.
5.7.3. Negation-As-Failure in SWRL¶
By default SWRL rules that feature ObjectComplementOf
are rejected by RDFox,
since negation in RDFox is interpreted under the closed-world assumption, while
negation in SWRL is interpreted under the open-world assumption.
This behavior can be overridden by initializing a store with the option
swrl-negation-as-failure
set to on
, as described in Section 6.2.2.11.
Example: Consider the following SWRL rule, with a suitably defined default prefix:
Implies ( Antecedent ( :A(I-variable(:x)) ObjectComplementOf(:B)(I-variable(:x)) ) Consequent(:C(I-variable(:x))) )
If a store is initialized with the option swrl-negation-as-failure on, RDFox will convert the above SWRL rule to the following RDFox rule:
:C[?x] :- :A[?x], NOT :B[?x] .
The feature is limited to class expressions of the form ObjectComplementOf(C)
, where
C
is a class name. The usage of ObjectComplementOf
in complex class expressions or
in the consequent of a SWRL rule is not supported.
This feature can be combined with the owl-in-rdf-support
option when importing
SWRL rules encoded as RDF.
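The closed-world reading of the converted rule can be made concrete with a short Python sketch. The individuals :i1 and :i2 are hypothetical and serve only to show that an individual not known to be in :B is treated as not being in :B.

```python
def apply_naf_rule(a_members, b_members):
    """:C[?x] :- :A[?x], NOT :B[?x] .  evaluated over known facts only."""
    return {x for x in a_members if x not in b_members}

A = {":i1", ":i2"}   # hypothetical instances of :A
B = {":i2"}          # hypothetical instances of :B

C = apply_naf_rule(A, B)
print(C)
```

Under the open-world semantics of SWRL, the absence of a fact stating that :i1 belongs to :B would not be enough to conclude that it belongs to the complement of :B, which is why the conversion is opt-in.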
5.7.4. Current Limitations¶
SWRL comes with a large number of built-in functions, but unfortunately only a subset of them maps directly to SPARQL 1.1 built-in functions, which are those natively supported in RDFox.
The list of built-in functions not supported is as follows: swrlb:roundHalfToEven, swrlb:normalizeSpace, swrlb:translate, swrlb:anyURI, swrlb:tokenize, swrlb:yearMonthDuration, swrlb:dayTimeDuration, swrlb:dateTime, swrlb:date, swrlb:time, swrlb:addYearMonthDurations, swrlb:subtractYearMonthDurations, swrlb:multiplyYearMonthDuration, swrlb:divideYearMonthDurations, swrlb:addDayTimeDurations, swrlb:subtractDayTimeDurations, swrlb:multiplyDayTimeDurations, swrlb:divideDayTimeDuration, swrlb:subtractDates, swrlb:subtractTimes, swrlb:addYearMonthDurationToDateTime, swrlb:addDayTimeDurationToDateTime, swrlb:subtractYearMonthDurationFromDateTime, swrlb:subtractDayTimeDurationFromDateTime, swrlb:addYearMonthDurationToDate, swrlb:addDayTimeDurationToDate, swrlb:subtractYearMonthDurationFromDate, swrlb:subtractDayTimeDurationFromDate, swrlb:addDayTimeDurationToTime, swrlb:subtractDayTimeDurationFromTime, swrlb:subtractDateTimesYieldingYearMonthDuration, swrlb:subtractDateTimesYieldingDayTimeDuration, swrlb:listConcat, swrlb:listIntersection, swrlb:listSubtraction, swrlb:member, swrlb:length, swrlb:first, swrlb:rest, swrlb:sublist, and swrlb:empty.
5.8. Explaining Reasoning Results¶
RDFox can display a proof of how a given triple has been derived. Such proofs can be very useful for explaining reasoning results to users as well as for understanding the reasoning process.
Consider a data store containing the triple
:kiki rdf:type :Cat .
and the following rules:
[?x, rdf:type, :Mammal] :- [?x, rdf:type, :Cat] .
[?x, rdf:type, :Animal] :- [?x, rdf:type, :Mammal] .
As a result of reasoning, RDFox will derive the following new triples:
:kiki rdf:type :Mammal .
:kiki rdf:type :Animal .
Suppose that we want to understand how triple :kiki rdf:type :Animal
has been derived. A way to do this in RDFox is to use the explain
command in the shell as follows:
explain :Animal[:kiki]
RDFox will explicate the reasoning process by displaying the following proof of the requested fact:
:Animal[:kiki]
:Animal[?x] :- :Mammal[?x] . | { ?x -> :kiki }
:Mammal[:kiki]
:Mammal[?x] :- :Cat[?x] . | { ?x -> :kiki }
:Cat[:kiki] EDB
We can read the proof bottom-up. Starting from fact :Cat[:kiki]
in
the data, we apply rule :Mammal[?x] :- :Cat[?x]
by matching variable
?x
to :kiki
and derive the fact :Mammal[:kiki]
. The
application of rule :Animal[?x] :- :Mammal[?x]
to fact
:Mammal[:kiki]
where ?x
is matched to :kiki
yields the
desired result.
Typically, there will be several different proofs for a given fact. To see this, suppose that we add to our data store the triples
:kiki :eats :luxury_pet_treat .
:luxury_pet_treat rdf:type :PetFood .
and the rule
[?x, rdf:type, :Animal] :- [?x, :eats, ?y], [?y, rdf:type, :PetFood] .
Then, in addition to the previous one, the following is also a proof
that :kiki
is an animal:
:Animal[:kiki]
:Animal[?x] :- :eats[?x,?y], :PetFood[?y] . | { ?x -> :kiki, ?y -> :luxury_pet_treat }
:eats[:kiki,:luxury_pet_treat] EDB
:PetFood[:luxury_pet_treat] EDB
Indeed, we can match rule :Animal[?x] :- :eats[?x, ?y], :PetFood[?y]
to the data facts :eats[:kiki, :luxury_pet_treat]
and
:PetFood[:luxury_pet_treat]
by matching variable ?x
to
:kiki
and variable ?y
to :luxury_pet_treat
to derive
:Animal[:kiki]
.
If we run again the explanation command
explain :Animal[:kiki]
RDFox will display both proofs.
:Animal[:kiki]
:Animal[?x] :- :Mammal[?x] . | { ?x -> :kiki }
:Mammal[:kiki]
:Mammal[?x] :- :Cat[?x] . | { ?x -> :kiki }
:Cat[:kiki] EDB
:Animal[?x] :- :eats[?x,?y], :PetFood[?y] . | { ?x -> :kiki, ?y -> :luxury_pet_treat }
:eats[:kiki,:luxury_pet_treat] EDB
:PetFood[:luxury_pet_treat] EDB
Since the number of possible different proofs for a given fact may be very large, we may be content with just obtaining a single one. We can use the explain command to obtain a shortest proof as follows:
explain shortest :Animal[:kiki]
which will return the following proof
:Animal[:kiki]
:Animal[?x] :- :eats[?x,?y], :PetFood[?y] . | { ?x -> :kiki, ?y -> :luxury_pet_treat }
:eats[:kiki,:luxury_pet_treat] EDB
:PetFood[:luxury_pet_treat] EDB
Indeed, this is the shortest proof as it involves a single rule application, whereas the alternative proof involves two rule applications.
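The comparison between the two proofs can be reproduced with a small search over ground rule instances. The Python sketch below is not the algorithm used by the explain command; it assumes that proof length is measured as the number of rule applications, and all names in it are hypothetical.

```python
EDB = {
    ":Cat[:kiki]",
    ":eats[:kiki,:luxury_pet_treat]",
    ":PetFood[:luxury_pet_treat]",
}
# Ground rule instances: (derived fact, body facts).
INSTANCES = [
    (":Mammal[:kiki]", [":Cat[:kiki]"]),
    (":Animal[:kiki]", [":Mammal[:kiki]"]),
    (":Animal[:kiki]", [":eats[:kiki,:luxury_pet_treat]",
                        ":PetFood[:luxury_pet_treat]"]),
]

def shortest_proof(fact, seen=frozenset()):
    """Return (rule applications, proof tree), or None if unprovable."""
    if fact in EDB:
        return 0, (fact, "EDB")
    if fact in seen:  # do not loop through cyclic derivations
        return None
    best = None
    for head, body in INSTANCES:
        if head != fact:
            continue
        subproofs = [shortest_proof(b, seen | {fact}) for b in body]
        if any(p is None for p in subproofs):
            continue
        cost = 1 + sum(c for c, _ in subproofs)
        if best is None or cost < best[0]:
            best = (cost, (fact, [t for _, t in subproofs]))
    return best

cost, tree = shortest_proof(":Animal[:kiki]")
print(cost)
print(tree)
```

The proof through :Mammal[:kiki] needs two rule applications and the proof through :eats and :PetFood needs one, so the sketch selects the latter.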
When using the explanation command, it is important to understand that rules in RDFox can come from different sources:
User rules such as the ones in our previous example are rules introduced directly by the user.
User axioms are OWL 2 axioms imported by the user, which are internally translated into rules.
Special rules are rules that have no direct connection with the information provided by the user and are internally added by RDFox. An example of special rules are the rules for subsumption reasoning provided at the end of the previous section, and another example are the rules obtained by axiomatizing equality as a transitive, reflexive and symmetric relation.
Consider for example a data store where we import the following triple:
:kiki rdf:type :Cat .
and also the following OWL 2 axioms in
functional syntax
SubClassOf( :Cat :Mammal )
SubClassOf( :Mammal :Animal )
If we now run the explain command
explain :Animal[:kiki]
we obtain the same proof as before:
:Animal[:kiki]
:Animal[?X] :- :Mammal[?X] . | { ?X -> :kiki }
:Mammal[:kiki]
:Mammal[?X] :- :Cat[?X] . | { ?X -> :kiki }
:Cat[:kiki] EDB
It is important to note, however, that the explicitly given OWL 2 axioms are not displayed in the proof, but rather the rules that are obtained from them internally.
5.9. Monitoring Reasoning in RDFox¶
This section gives an overview of the functionality implemented in RDFox for monitoring the progress of reasoning.
Let us start by creating a new data store in which reasoning will be performed in a single-threaded fashion:
dstore create default par-complex-nn
threads 1
To enable monitoring of reasoning we use the following shell commands, where the second one establishes the frequency at which information is provided in the console.
set reason.monitor progress
set log-frequency 1
We can now import rules and data which, in our case, will come from the well-known LUBM benchmark.
We first import the rules:
import LUBM_L.dlog
RDFox will then import the rules and display relevant information about the rule importation process:
Adding data in file './LUBM_L.dlog'.
[1]: START './LUBM_L.dlog'
[1]: FINISHED './LUBM_L.dlog'
Time since import start: 1 ms
Time since start of this import: 1 ms
Facts processed in this import: 0
Number of finished imports: 1
Total facts processed so far: 0
Import operation took 0.4 s.
Processed 98 rules, of which 98 were updated.
In particular, we can see that 98 rules were imported in total and that rule importation took 0.4s.
We can now ask RDFox to print detailed information about the imported rules. For instance, the following command will provide statistics about the rule set and then will print each rule in a given order:
info rulestats print-rules by-body-size
RDFox will first provide some statistics about the rule set
================================ RULES STATISTICS ================================
Component Body size Nonrecursive rules Recursive rules Total rules
0 2 0 1 1
1 1 19 2 21
1 2 1 0 1
2 1 19 2 21
3 1 13 0 13
4 1 28 8 36
4 3 0 5 5
----------------------------------------------------------------------------------
Total: 80 18 98
----------------------------------------------------------------------------------
RDFox organizes rules by components, which gives us an idea of how information flows during reasoning. To give some intuition as to what a component is, consider the following simple set of rules:
[?x, rdf:type, :B] :- [?x, rdf:type, :A] .
[?x, rdf:type, :C] :- [?x, rdf:type, :B] .
[?x, rdf:type, :D] :- [?x, rdf:type, :B] .
[?x, rdf:type, :A] :- [?x, rdf:type, :D] .
We can see that :B
depends on :A
since to derive facts about
:B
we need to first obtain facts about :A
. Similarly, :C
and
:D
both depend on :B
. Finally, :A
depends on :D
, and
hence the first, third and fourth rules are involved in a cycle of
dependencies. As a result, the flow of information during rule
application can be seen in two stages: first, we need to derive all
facts about :A
, :B
and :D
using the first, third and fourth
rules. Then, we can derive all facts about :C
using the second rule.
To reflect this, RDFox will organize these rules into two components:
the first component will contain the first, third and fourth rules which
together are considered recursive (they are involved in a cycle of
dependencies), whereas the second rule will go in its own component and
will be identified as non-recursive.
In the table above, we can see the same kind of information concerning the more complex LUBM rules. We can see that rules are arranged in 5 components (0..4), we can see the number of rules involved in dependency cycles (recursive rules) in each component, as well as the total number of rules and their maximal body size.
RDFox then will print the rules component by component on the console
and within each component it will arrange the rules sorted by number of
atoms in their bodies. In our simple example about :A
, :B
,
:C
and :D
, the information printed will look as follows:
-- COMPONENT: 0
-- NONRECURSIVE RULES: 0
-- RECURSIVE RULES: 3
**********************************************************************************
** BODY SIZE: 1
** RECURSIVE RULES: 3
:B[?x] :- :A[?x] .
:D[?x] :- :B[?x] .
:A[?x] :- :D[?x] .
----------------------------------------------------------------------------------
-- COMPONENT: 1
-- NONRECURSIVE RULES: 1
-- RECURSIVE RULES: 0
**********************************************************************************
** BODY SIZE: 1
** NONRECURSIVE RULES: 1
:C[?x] :- :B[?x] .
==================================================================================
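The grouping shown above can be approximated by computing which predicates depend on each other cyclically. The following Python sketch is an illustration only: it treats two predicates as belonging to the same component when each is reachable from the other in the dependency graph, and it calls a rule recursive when some body predicate lies in the component of its head. Both definitions are assumptions about how the statistics are obtained.

```python
# Each rule is (head predicate, body predicates).
rules = [(":B", [":A"]), (":C", [":B"]), (":D", [":B"]), (":A", [":D"])]

def reachable(start, edges):
    """Predicates that depend, directly or indirectly, on start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def analyse(rules):
    """Map each rule to (component of its head, is_recursive)."""
    edges = {}
    for head, body in rules:
        for b in body:
            edges.setdefault(b, set()).add(head)
    preds = {h for h, _ in rules} | {b for _, body in rules for b in body}
    reach = {p: reachable(p, edges) for p in preds}

    def component(p):
        return frozenset({p} | {q for q in reach[p] if p in reach[q]})

    result = {}
    for head, body in rules:
        comp = component(head)
        result[(head, tuple(body))] = (comp, any(b in comp for b in body))
    return result

for rule, (comp, recursive) in analyse(rules).items():
    print(rule, sorted(comp), "recursive" if recursive else "nonrecursive")
```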
Now that we have imported the rules, we can import also the data:
import LUBM-large.ttl
At this point, RDFox will load the data (without performing any reasoning yet) and will provide information about the progress of loading. We can see an excerpt of such information below:
> import LUBM-large.ttl
Adding data in file './LUBM-large.ttl'.
[1]: START './LUBM-large.ttl'
[1]: PROGRESS './LUBM-large.ttl'
Time since start of import: 1001 ms
Time since start of this import: 1001 ms
Facts processed in this import: 418000
[1]: PROGRESS './LUBM-large.ttl'
Time since start of import: 2001 ms
Time since start of this import: 2001 ms
Facts processed in this import: 795000
[1]: PROGRESS './LUBM-large.ttl'
Time since start of import: 3002 ms
Time since start of this import: 3002 ms
Facts processed in this import: 1164000
...
[1]: FINISHED './LUBM-large.ttl'
Time since import start: 13143 ms
Time since start of this import: 13143 ms
Facts processed in this import: 5000000
Number of finished imports: 1
Total facts processed so far: 5000000
Import operation took 17.8 s.
Processed 5000000 facts, of which 5000000 were updated.
In particular, we can see how many data facts have been imported each second. We can also see that, in the end, 5,000,000 data triples were imported and that the import took 17.8s in total.
We can now compute the materialization of the LUBM rules and facts in
the store using the mat command:
mat
RDFox will display information about the number of facts generated:
Materializing rules incrementally.
Rules will be processed by strata.
Maximum depth of backward chaining is unbounded.
Materialization time: 0 s.
------------------------------------------------------------------------------------------------
Table | Facts | EDB | IDB
------------------------------------------------------------------------------------------------
internal:triple | 6,826,914 -> 6,826,914 | 5,000,000 -> 5,000,000 | 6,826,913 -> 6,826,913
------------------------------------------------------------------------------------------------
The column labeled EDB tells us the number of facts that were explicitly given in the data file. In turn, the column labeled IDB indicates the total number of facts in the store after materialization; in our case, this means that the system has derived a total of over 1.8 million new facts through rule application. The Table column gives the name of each tuple table in the store. In this case, we have just the default triple table, but in other cases there may be other tuple tables, such as those obtained from named graphs. Each tuple table will have its own numbers of explicit and derived facts. Finally, the column labeled Facts indicates the total number of memory slots that were reserved by different threads during reasoning; this number can be larger than the total number of facts in the system, as some of these slots may not have been used to store a fact.
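The number of derived facts follows from the table by subtracting the EDB count from the IDB count. The sketch below does this for the row shown above; it assumes that the value to the right of each arrow is the count after the operation, and the names ROW and after_counts are hypothetical.

```python
# The single data row of the table printed by the mat command above.
ROW = "internal:triple | 6,826,914 -> 6,826,914 | 5,000,000 -> 5,000,000 | 6,826,913 -> 6,826,913"

def after_counts(row):
    """Return the right-hand value of the Facts, EDB and IDB columns as integers."""
    cells = [c.strip() for c in row.split("|")[1:]]  # skip the table name
    return [int(c.split("->")[1].replace(",", "")) for c in cells]

facts, edb, idb = after_counts(ROW)
derived = idb - edb  # facts obtained through rule application
print(f"{derived:,} derived facts")
```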
5.10. Querying the Explicitly Given Data
After reasoning, RDFox will by default answer all SPARQL queries with
respect to the obtained materialization. For instance, suppose that we
have a data store with fact :a rdf:type :A and the following rules:
[?x, rdf:type, :B] :- [?x, rdf:type, :A] .
[?x, rdf:type, :C] :- [?x, rdf:type, :B] .
[?x, rdf:type, :D] :- [?x, rdf:type, :B] .
[?x, rdf:type, :A] :- [?x, rdf:type, :D] .
The materialization will contain the following facts. Only the fact
:a rdf:type :A was originally in the data; the other three have been
derived by the rules:
:a rdf:type :A .
:a rdf:type :B .
:a rdf:type :C .
:a rdf:type :D .
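The way these four facts arise can be illustrated with a naive forward-chaining loop. The Python below is a minimal sketch, not a description of how RDFox computes materializations: it handles only rules of the simple form used in this example (one rdf:type atom in the head and one in the body), and the names RULES, EXPLICIT and materialize are hypothetical.

```python
# Rules from the example, written as (head class, body class).
RULES = [(":B", ":A"), (":C", ":B"), (":D", ":B"), (":A", ":D")]

# The only explicitly given fact.
EXPLICIT = {(":a", "rdf:type", ":A")}

def materialize(explicit, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    facts = set(explicit)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for s, p, o in list(facts):
                if p == "rdf:type" and o == body:
                    derived = (s, "rdf:type", head)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts

for fact in sorted(materialize(EXPLICIT, RULES)):
    print(" ".join(fact), ".")
```

The loop terminates even though the rules are cyclic, because a fact is added only if it is not already present.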
If we issue a query
SELECT ?x WHERE { ?x rdf:type :D }
we will obtain :a as a result.
In RDFox it is possible to query only the explicit data even after materialization has been performed. For this, we can use the shell command
set query.domain EDB
If we then issue the previous query again, we will obtain an empty answer, since the fact :a rdf:type :D was derived by the rules and is not part of the explicit data.
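The difference between the two query domains can be pictured as evaluating the same query over two different sets of facts. The following Python sketch models this for the example; it is an illustration only, the function instances_of and its explicit_only flag are hypothetical, and nothing here reflects how RDFox stores or filters facts internally.

```python
# Explicit fact and the facts derived from it by the example rules.
EXPLICIT = {(":a", "rdf:type", ":A")}
DERIVED = {(":a", "rdf:type", c) for c in (":B", ":C", ":D")}

def instances_of(cls, explicit_only=False):
    """Answer SELECT ?x WHERE { ?x rdf:type cls } over the chosen set of facts."""
    facts = EXPLICIT if explicit_only else EXPLICIT | DERIVED
    return sorted(s for s, p, o in facts if p == "rdf:type" and o == cls)

print(instances_of(":D"))                      # over the full materialization
print(instances_of(":D", explicit_only=True))  # over the explicit data only
print(instances_of(":A", explicit_only=True))  # :A was given explicitly
```

Over the explicit data only, the query for :D has no answers, whereas the query for :A still returns :a because that fact was part of the input.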