9. Managing Tuple Tables
As explained in Section 4, a data store uses tuple tables as containers for facts, that is, triples and other kinds of data that RDFox should process. Each tuple table is identified by a name that is unique within a data store. Moreover, each tuple table has a minimal and a maximal arity, which are numbers determining the smallest and the largest number of RDF resources in a fact stored in the tuple table. In most cases, the minimal and maximal arity are the same, in which case they are simply called the arity.
9.1. Types of Tuple Tables
RDFox supports three kinds of tuple tables.
In-memory tuple tables are the most commonly used kind of tuple table and, as the name suggests, store facts in RAM. RDFox uses in-memory tuple tables of arity three to store triples of the default graph and the named graphs of RDF. In particular, an in-memory tuple table called http://oxfordsemantic.tech/RDFox#DefaultTriples is created automatically when a fresh data store is created, and RDFox will create additional in-memory tuple tables for each named graph it encounters. RDFox provides ways to add and delete facts in in-memory tuple tables.
Data source tuple tables provide a ‘virtual view’ over data in non-RDF data sources, such as CSV files, relational databases, or a full-text Solr index. Such tuple tables must be created explicitly by the user, and doing so requires specifying how the external data is to be transformed into a format compatible with RDF. The facts in data source tuple tables are ‘virtual’ in the sense that they are constructed automatically by RDFox based on the data in the data source; that is, there is no way to add or delete such facts directly. Finally, data source tuple tables can be of arbitrary arity; that is, such tuple tables are not limited to containing just triples. Data source tuple tables and the process of importing external data are described in detail in Section 10.
Built-in tuple tables contain some well-known facts that can be useful in various applications of RDFox. The facts in such tuple tables cannot be modified by users; rather, they are produced on the fly by RDFox as needed. They are described in more detail in Section 9.5.
9.2. Fact Domains
Each fact in a tuple table is associated with one or more fact domains.
The EDB fact domain contains facts that were imported explicitly by the user. The name EDB is an abbreviation of Extensional Database.
The IDB fact domain contains facts that were derived using rules. The name IDB is an abbreviation of Intensional Database. This fact domain is used as the default in all operations that take a fact domain as argument.
The IDBrep fact domain contains the representative facts of the IDB domain. This fact domain differs from IDB only in data stores for which equality reasoning (i.e., reasoning with owl:sameAs) is turned on.
The IDBrepNoEDB fact domain contains facts of the IDB domain that are not in the EDB domain, which are essentially facts that were derived during reasoning and were not present in the input.
A fact can belong to more than one domain. For example, facts added to
the store are stored into the EDB
domain, and during reasoning they
are transferred into the IDB
domain.
Only the EDB
fact domain can be directly affected by users. That is, all
explicitly added facts are added to the EDB
domain, and only those facts
can be deleted. It is not possible to manually delete derived facts since the
meaning of such deletions is unclear.
Many RDFox operations accept a fact domain as an argument. For example, SPARQL
query evaluation takes a fact domain as an argument, which determines what
subset of the facts the query should be evaluated over. Thus, if a query is
evaluated with respect to the EDB
domain, it will ‘see’ only the facts that
were explicitly added to a data store, and it will ignore the facts that
were derived by reasoning.
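The relationship between the fact domains can be pictured with a small set-based model. The following Python sketch is purely illustrative and is not RDFox code: the function names and the single hard-coded rule (anyone who :worksFor a company is an :Employee) are assumptions made for this example, and equality reasoning is ignored, so IDBrep would coincide with IDB here.

```python
# Toy model of fact domains using plain Python sets (not the RDFox API).

def materialise(edb):
    """Apply one hypothetical rule: ?P rdf:type :Employee :- ?P :worksFor ?C."""
    idb = set(edb)  # explicitly added facts are carried over into the IDB domain
    for subject, predicate, _ in edb:
        if predicate == ":worksFor":
            idb.add((subject, "rdf:type", ":Employee"))
    return idb

def query(domain, predicate):
    """Return all facts of the given domain that use the given predicate."""
    return sorted(fact for fact in domain if fact[1] == predicate)

edb = {
    (":Peter", ":worksFor", ":Company1"),
    (":Paul", ":worksFor", ":Company1"),
}
idb = materialise(edb)
derived_only = idb - edb  # plays the role of IDBrepNoEDB in this simplified setting

print(query(edb, "rdf:type"))  # the EDB domain holds no derived facts
print(query(idb, "rdf:type"))  # the IDB domain also holds the derived facts
```

Evaluating the same query against `edb` and `idb` gives different answers, which mirrors how the fact domain argument restricts what a query can ‘see’.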
9.3. Managing and Using Tuple Tables
RDFox provides ways to create and delete tuple tables: this can be
accomplished in the shell using the tupletable
command (see
Section 15.2.2.40), and the relevant APIs are described in
Section 13.6. When creating a tuple table, one must specify a
list of key-value parameters that determine what kind of tuple table is to be
created. The parameters for data source tuple tables depend on the type of data
source and are described in detail in Section 10. Moreover, the
parameters for in-memory and built-in tuple tables are described in
Section 9.4 and Section 9.5,
respectively.
RDFox provides ways to add and delete facts to in-memory tuple tables: this can
be accomplished in the shell using the import
command (see
Section 15.2.2.20), and the relevant APIs are described in
Section 13.4.5.
Facts in a tuple table can be accessed during querying and reasoning. In
queries, tuple tables corresponding to the default graph and the named graphs
can be accessed using standard SPARQL syntax for triple patterns and the GRAPH
operator: a triple pattern outside a GRAPH operator will access the
http://oxfordsemantic.tech/RDFox#DefaultTriples tuple table, and a triple
pattern inside a GRAPH :G operator will access the in-memory tuple table with
name :G. To access tuple tables of other types, RDFox extends the SPARQL syntax
with the TT operator, which is described in Section 5.3. Note that the default
graph and the named graphs can also be accessed using the TT operator.
Moreover, tuple tables can be accessed in rules using the general atom syntax
described in Section 6.4.1.3. Since only in-memory tuple tables can be modified
by users, any atom occurring in the head of a rule is allowed to mention only
an in-memory tuple table.
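The mapping from query syntax to tuple table names described above can be summarised in a few lines. The following Python sketch only restates that mapping; the function `tuple_table_for` and its parameters are hypothetical names introduced for illustration and do not correspond to any RDFox API.

```python
# Illustrative restatement of which tuple table a query pattern reads from.
DEFAULT_TRIPLES = "http://oxfordsemantic.tech/RDFox#DefaultTriples"

def tuple_table_for(graph=None, tt_name=None):
    """Return the name of the tuple table accessed by a pattern.

    graph   -- the name given to an enclosing GRAPH operator, or None
    tt_name -- the name given to a TT operator, or None
    """
    if tt_name is not None:
        return tt_name       # TT names the tuple table directly
    if graph is not None:
        return graph         # GRAPH :G reads the in-memory tuple table named :G
    return DEFAULT_TRIPLES   # a plain triple pattern reads the default graph

print(tuple_table_for())
print(tuple_table_for(graph=":G"))
print(tuple_table_for(tt_name="rdfox:SKOLEM"))
```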
9.4. In-Memory Tuple Tables
RDFox uses in-memory tuple tables to store facts imported by the users. At
present, RDFox supports only in-memory tuple tables of arity three, thus allowing the
system to store only triples. An in-memory tuple table called
http://oxfordsemantic.tech/RDFox#DefaultTriples
is created automatically
when a fresh data store is created. Moreover, in-memory tuple tables can be
created in the following three ways.
When instructed to import data containing triples in graphs other than the default one, RDFox will automatically create a tuple table for each named graph it encounters.
The SPARQL 1.1 Update command CREATE GRAPH creates an in-memory tuple table for each named graph.
In-memory tuple tables can be created using tuple table management APIs. The main benefit of this over the above two methods is the ability to specify additional parameters, as described in the following table.
| Parameter | Default value | Description |
|---|---|---|
| | | Specifies that the tuple table will be used to store triples, that is, the tuple table backs the default or a named graph. This parameter must be specified when creating an in-memory tuple table. |
| | (as in the data store) | Specifies the maximum number of triples that the new tuple table will be able to hold. The main purpose of this parameter is to reduce the amount of address space that the tuple table will use. The default is the value of the data store parameter with the same name. |
| | (as in the data store) | Provides a hint as to how many facts the system should expect to store initially in the tuple table. When importing large data sets, setting this parameter to be roughly equal to the number of facts to be imported can significantly improve the speed of importation. |
9.5. Built-In Tuple Tables
Built-in tuple tables are similar to built-in functions; however, whereas a
built-in function returns just one value for a given number of arguments, a
built-in tuple table can relate sets of values. Thus, facts in built-in tuple
tables are not stored explicitly; rather, they are produced on the fly as query
and/or rule evaluation progresses. Other than this internal detail, built-in
tuple tables are used in queries and rules just like any other tuple table:
they are referenced in queries using the proprietary TT
operator (see
Section 5.3), and they are referenced in rules using
general atoms (see Section 6.4.1.3). Built-in tuple tables are the only
ones for which the minimal and the maximal arity are not necessarily the same.
Each built-in tuple table is identified by a well-known name, which cannot be
changed. The names of all built-in tuple tables start with
http://oxfordsemantic.tech/RDFox#, which is abbreviated in the rest of this
section as rdfox:. For example, the rdfox:SKOLEM built-in tuple table
is always available under that name. When a data store is created, all built-in
tuple tables supported by RDFox will be created automatically. It is very
unlikely that users will ever need to delete built-in tuple tables;
nevertheless, for the sake of consistency, RDFox allows such tuple tables to be
deleted just like any other tuple table. In case a built-in tuple table is
deleted, it can be recreated using standard methods, by simply specifying the
tuple table name without any parameters. (Please note that, as a consequence of
this, it is not possible to create an in-memory or a data source tuple table
with a name that is reserved for a built-in tuple table.)
9.5.1. rdfox:SKOLEM
The rdfox:SKOLEM
tuple table can have arity from one onwards. Moreover, in
each fact in this tuple table, the last resource of the fact is a blank node
that is uniquely determined by all remaining arguments. This can be useful in
queries and/or rules that need to create new objects. This is explained using
the following example.
Example: Let us assume we are dealing with a dataset where each person
is associated with zero or more companies using the :worksFor
relationship. For example, our dataset could contain the following triples.
:Peter :worksFor :Company1 .
:Peter :worksFor :Company2 .
:Paul :worksFor :Company1 .
Now assume that we wish to attach additional information to each individual
employment. For example, we might want to say that the employment of
:Peter
in :Company1
started on a specific date. To be able to
capture such data, we will ‘convert’ each :worksFor
link to a separate
instance of the :Employment
class; then, we can attach arbitrary
information to such instances. This presents us with a key challenge: for
each combination of a person and company, we need to ‘invent’ a fresh
object that is uniquely determined by the person and company.
This problem is solved using the rdfox:SKOLEM
built-in tuple table. In
particular, we can restructure the data using the following rule.
:Employment[?E], :employee[?E,?P], :inCompany[?E,?C] :- :worksFor[?P,?C], rdfox:SKOLEM("Employment",?P,?C,?E) .
The above rule can be understood as follows. Body atom :worksFor[?P,?C]
selects all combinations of a person and a company that the person works
for. Moreover, atom rdfox:SKOLEM("Employment",?P,?C,?E)
contains all
facts where the value of ?E
is uniquely determined by the fixed string
"Employment"
, the value of ?P
, and the value of ?C
. Thus, for
each combination of ?P
and ?C
, the built-in tuple table will
produce a unique value of ?E
, which is then used in the rule head to
derive new triples.
How a value of ?E
is computed from the other arguments is not under
application control: each value is a blank node whose name is guaranteed to
be unique. However, what matters is that the value of ?E
is always the
same whenever the values of all other arguments are the same. Thus, we can
use the following rule to specify the start time of Peter’s employment in
Company 1.
:startDate[?E,"2020-02-03"^^xsd:date] :- rdfox:SKOLEM("Employment",:Peter,:Company1,?E) .
After evaluating these rules, the following triples will be added to the
data store. We use blank node names such as _:new_1 for clarity: the
actual names of new blank nodes will be much longer in practice.
_:new_1 rdf:type :Employment .
_:new_1 :employee :Peter .
_:new_1 :inCompany :Company1 .
_:new_1 :startDate "2020-02-03"^^xsd:date .
_:new_2 rdf:type :Employment .
_:new_2 :employee :Peter .
_:new_2 :inCompany :Company2 .
_:new_3 rdf:type :Employment .
_:new_3 :employee :Paul .
_:new_3 :inCompany :Company1 .
When creating fresh objects using the rdfox:SKOLEM built-in tuple table, it
is good practice to incorporate the object type into the arguments. The above
example achieved this by passing a fixed string "Employment" as the first
argument of rdfox:SKOLEM. This allows us to create another, distinct blank
node for each combination of a person and a company by simply varying the
first argument of rdfox:SKOLEM.
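The key property of rdfox:SKOLEM, namely that equal arguments always yield the same blank node and different arguments yield different ones, can be imitated outside RDFox. The Python sketch below replays the restructuring rule above over the three :worksFor triples. The `skolem` function and its hash-based naming scheme are assumptions made for illustration only: RDFox does not document how it names blank nodes, and a truncated hash makes uniqueness merely very likely rather than guaranteed.

```python
import hashlib

def skolem(*args):
    """Return a blank node label determined solely by the arguments (hypothetical scheme)."""
    # Length-prefix each argument so that ("ab", "c") and ("a", "bc") encode differently.
    encoded = "".join(f"{len(arg)}:{arg}" for arg in args)
    return "_:" + hashlib.sha256(encoded.encode("utf-8")).hexdigest()[:16]

works_for = [(":Peter", ":Company1"), (":Peter", ":Company2"), (":Paul", ":Company1")]

triples = set()
for person, company in works_for:
    employment = skolem("Employment", person, company)  # plays the role of ?E
    triples.add((employment, "rdf:type", ":Employment"))
    triples.add((employment, ":employee", person))
    triples.add((employment, ":inCompany", company))

# The second rule reaches the same node simply by repeating the arguments.
triples.add((skolem("Employment", ":Peter", ":Company1"), ":startDate", "2020-02-03"))

for triple in sorted(triples):
    print(*triple)
```

Changing only the first argument, for example to a hypothetical "Contract", yields a different node for the same person and company, which is why including the object type among the arguments keeps different kinds of generated objects apart.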
Atoms involving the rdfox:SKOLEM built-in tuple table must satisfy certain
binding restrictions in rules and queries. Essentially, it must be possible
to evaluate a query/rule so that, once an rdfox:SKOLEM atom is reached,
either the value of the last argument or the values of all but the last
argument are known. This is explained using the following example.
Example: The following query cannot be evaluated by RDFox — that is, the system will respond with a query planning error.
SELECT ?P ?C ?E WHERE { TT rdfox:SKOLEM { "Employment" ?P ?C ?E } }
This query essentially says “return all ?P
, ?C
, and ?E
where
the value of ?E
is uniquely defined by "Employment"
, ?P
, and
?C
”. The problem with this is that the values of ?P
and ?C
have
not been restricted in any way, so the query should, in principle, return
infinitely many answers.
To evaluate the query, one must provide the values of ?P and ?C, or the
value of ?E, either explicitly as arguments or implicitly by binding the
arguments in other parts of the query. Thus, both of the following queries
can be successfully evaluated.
SELECT ?E WHERE { TT rdfox:SKOLEM { "Employment" :Paul :Company2 ?E } }
SELECT ?T ?C ?P WHERE { TT rdfox:SKOLEM { ?T ?C ?P _:new_1 } }
The latter query aims to unpack _:new_1
into the values of ?T
,
?C
, and ?P
for which _:new_1
is the uniquely generated fresh
blank node. Note that such ?T
, ?C
, and ?P
may or may not exist,
depending on the algorithm RDFox uses to generate blank nodes. The
following is a more realistic example of blank node ‘unpacking’.
SELECT ?T ?C ?P WHERE { ?E rdf:type :Employment . TT rdfox:SKOLEM { ?T ?C ?P ?E } }
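The binding restriction can be stated as a simple check over which arguments are already known when the atom is reached. The following Python sketch encodes only the condition given in the text; the function name `skolem_atom_is_evaluable` is hypothetical, and the actual query planner in RDFox may apply further considerations that are not modelled here.

```python
def skolem_atom_is_evaluable(bound):
    """Check the binding restriction for one rdfox:SKOLEM atom.

    bound -- one boolean per argument, True if the value of that argument
             (a constant or an already bound variable) is known on reaching the atom
    """
    if not bound:
        return False  # the tuple table has arity of at least one
    last_known = bound[-1]
    inputs_known = all(bound[:-1])  # vacuously True for an atom of arity one
    return last_known or inputs_known

# "Employment" ?P ?C ?E with nothing else bound: rejected
print(skolem_atom_is_evaluable([True, False, False, False]))
# "Employment" :Paul :Company2 ?E: the last argument can be computed
print(skolem_atom_is_evaluable([True, True, True, False]))
# ?T ?C ?P with the last argument known: the node can be 'unpacked'
print(skolem_atom_is_evaluable([False, False, False, True]))
```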