11. Transactions

Operations in a database are normally grouped into a transaction, which is a unit of work that must be executed atomically and in apparent isolation from other transactions.

Each transaction in RDFox operates on a single data store — that is, transactions cannot span several data stores within a single server or data stores in different servers.

A transaction can be rolled back (i.e., aborted without changing the data store) or committed.

Transactions in RDFox satisfy the well-known ACID properties:

Atomicity: If an operation inside a transaction starts changing the store but then fails in the middle, the transaction will be rolled back and hence an operation in a transaction cannot be partially executed.
Consistency: A transaction can only bring the store from a consistent state to another consistent state. In RDFox, consistency means that 1) every implicit fact that logically follows from the given rules and explicit facts has been materialized and 2) each constraint defined on the data store content is satisfied (see Section 11.5).
Isolation: Transactions appear to be executed as if no other transaction were executing at the same time.
Durability: The effect of a committed transaction is never lost; in particular, once a transaction has been committed RDFox ensures that the state of the relevant data store will be persisted on disk. Durability is configurable in RDFox: see Section 13 for details.

11.1. Types of Transactions

Transactions in RDFox can be read/write or read-only. A data store can be updated only by a read/write transaction. Changes made by a read/write transaction are immediately visible within that transaction, including, if the user so chooses, any new facts derived by reasoning.

Example Read/write transaction

Consider again our usual family example. Let us first initialize a store in the shell and load the data as we did in the Getting Started guide:

dstore create family
active family
import data.ttl
set output out

We can now group in a read/write transaction a write operation consisting of a rule importation and a read operation consisting of a query as follows:

begin
import ! [?p, :hasChild, ?c] :- [?c, :hasParent, ?p] .
select ?p ?c where { ?p :hasChild ?c }
commit

Reasoning will happen after the rule is imported and hence the query results will reflect the changes in the materialization. In particular, we will obtain the following query results:

:lois :meg .
:peter :meg .
:peter :chris .
:lois :stewie .

If an operation in a transaction starts changing the store but fails in the middle, the transaction will be rolled back. For instance, in the previous transaction an error could happen in the middle of the import operation when the instruction has started changing the store but has not finished (thus leaving the database in an inconsistent state). In this case, the transaction will be rolled back.

In contrast, if an operation throws an exception without having changed the store, there is no rollback: the failing operation failed in its entirety, so the transaction can simply continue. For instance, consider the following read/write transaction, in which the second rule being imported contains syntax errors.

begin
import ! [?p, :hasDescendant, ?c] :- [?c, :hasParent, ?p] .
import ! [?x, :marriedTo ?y] - [?y, :marriedTo, ?x] .
commit

The first rule will be imported into the store and the importation of the second rule will fail. The transaction, however, will commit since the second importation operation has failed before it has actually made any changes to the store. Indeed, if we now run the query

select ?x ?y where {?x :hasDescendant ?y}

we will obtain four answers, showing that the first rule has taken effect.
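The distinction between an operation that fails part-way through a change and one that fails before changing anything can be pictured with a small toy model. The following Python sketch is not RDFox code and uses no RDFox API: ToyTransaction and the three import functions are hypothetical names introduced for illustration, and detecting a change by comparing snapshots is an assumption made only to keep the model short.

```python
class ToyTransaction:
    """Hypothetical model of a read/write transaction over a set of facts."""

    def __init__(self, store):
        self.store = store           # committed facts (a Python set)
        self.working = set(store)    # facts as seen inside the transaction
        self.rolled_back = False

    def run(self, operation):
        """Run one operation; roll back only if it fails after changing facts."""
        if self.rolled_back:
            raise RuntimeError("transaction has been rolled back")
        before = set(self.working)
        try:
            operation(self.working)
        except Exception:
            if self.working != before:
                # Failed part-way through a change: undo the whole transaction.
                self.working = set(self.store)
                self.rolled_back = True
            # Otherwise the operation failed in its entirety and we carry on.

    def commit(self):
        if self.rolled_back:
            raise RuntimeError("transaction has been rolled back")
        self.store.clear()
        self.store.update(self.working)


def import_rule(facts):
    facts.add("rule: hasDescendant")           # succeeds


def import_bad_rule(facts):
    raise ValueError("cannot parse rule")      # fails before changing anything


def import_interrupted(facts):
    facts.add("half-imported fact")            # starts changing the facts...
    raise OSError("failure in the middle")     # ...and then fails


store = set()
tx = ToyTransaction(store)
tx.run(import_rule)
tx.run(import_bad_rule)      # no rollback: nothing was changed by this operation
tx.commit()                  # the first rule is committed

tx2 = ToyTransaction(store)
tx2.run(import_interrupted)  # rolled back: the operation failed part-way through
```

In this model, the first transaction mirrors the shell example above: the failed second import leaves the transaction usable, so the commit still succeeds.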

Read-only transactions are only allowed to query a data store and cannot update its contents in any way. Their use is demonstrated in the following example.

Example Read-only transactions

Building on the previous example, we can write in the shell a transaction consisting of two queries over the store.

begin read
select ?p ?n where { ?p rdf:type :Person . ?p :forename ?n }
select ?x ?y where { ?x :marriedTo ?y }
commit

We will obtain the results of both queries. Attempting to update the store in a read-only transaction, as shown below, will immediately lead to an error in RDFox indicating that read-only transactions do not support updates.

begin read
import ! [?p, :hasChild, ?c] :- [?p, :hasDescendant, ?c] .
commit

11.2. Concurrent Execution of Transactions

At each point in time the following transactions can be active in a data store:

a single read/write transaction; or
multiple read-only transactions.

As a result of this model, the common issues associated with concurrent execution of transactions in databases (e.g., “dirty reads”) cannot occur in RDFox. In particular, RDFox achieves the serializability isolation level without the need to implement any mechanism (such as locking) to prevent concurrency anomalies.
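The admission rule above (either a single read/write transaction or any number of read-only transactions) can be pictured with a small toy model. The following Python sketch is not RDFox code and uses no RDFox API: ToyDataStore and its methods are hypothetical names introduced for illustration. It also assumes that a conflicting request is simply rejected; this section does not say whether RDFox rejects such a request or makes it wait.

```python
class ToyDataStore:
    """Hypothetical model: one read/write transaction OR many read-only ones."""

    def __init__(self):
        self.active_readers = 0
        self.writer_active = False

    def can_begin(self, read_only):
        """Return True if a transaction of the given kind may start now."""
        if self.writer_active:
            return False                  # a writer excludes everything else
        if read_only:
            return True                   # readers may share the store
        return self.active_readers == 0   # a writer needs the store to itself

    def begin(self, read_only):
        if not self.can_begin(read_only):
            # Assumption: reject the request (a real system might wait instead).
            raise RuntimeError("a conflicting transaction is active")
        if read_only:
            self.active_readers += 1
        else:
            self.writer_active = True

    def end(self, read_only):
        """Called on commit or rollback."""
        if read_only:
            self.active_readers -= 1
        else:
            self.writer_active = False


store = ToyDataStore()
store.begin(read_only=True)
store.begin(read_only=True)                        # two readers can coexist
writer_allowed_with_readers = store.can_begin(read_only=False)
store.end(read_only=True)
store.end(read_only=True)
store.begin(read_only=False)                       # now the writer can start
reader_allowed_with_writer = store.can_begin(read_only=True)
```

Because a writer never overlaps with any other transaction in this model, no transaction can observe another transaction's uncommitted changes.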

11.3. Explicit and Implicit Transactions

Every data store operation takes place in a transaction. In the examples above, transactions are explicitly opened using the begin shell command and committed with the commit shell command. Having to run two additional commands for each operation would be inefficient, so RDFox supports implicit transactions for use in cases where the user wishes to perform only a single operation.

Implicit transactions work as follows. If, when an operation begins, no transaction has been explicitly opened on the active connection, RDFox will open an implicit transaction to support the operation. If the operation in question only needs to read data, the implicit transaction will be a read-only transaction; otherwise it will be a read/write transaction.

Once the operation has completed successfully, the implicit transaction is then closed, either by rolling it back (in the case of read operations) or by committing it (in the case of write operations). If, instead, the operation results in an error, implicit transactions are always rolled back.
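The life cycle of an implicit transaction described above can be summarized as a short sketch. The following Python function is a toy model rather than RDFox code: run_with_implicit_transaction, its parameters, and the log list are hypothetical names used only to record which transaction events would take place.

```python
def run_with_implicit_transaction(operation, is_write, log):
    """Toy model: wrap a single operation in an implicit transaction."""
    kind = "read/write" if is_write else "read-only"
    log.append(f"begin {kind}")
    try:
        result = operation()
    except Exception:
        # An operation that results in an error always ends in a rollback.
        log.append("rollback")
        raise
    # Successful writes are committed; successful reads are rolled back,
    # since there is nothing to persist.
    log.append("commit" if is_write else "rollback")
    return result


def failing_import():
    raise ValueError("import failed")


query_log, import_log, error_log = [], [], []
run_with_implicit_transaction(lambda: "answers", is_write=False, log=query_log)
run_with_implicit_transaction(lambda: "imported", is_write=True, log=import_log)
try:
    run_with_implicit_transaction(failing_import, is_write=True, log=error_log)
except ValueError:
    pass

# query_log  == ["begin read-only", "rollback"]
# import_log == ["begin read/write", "commit"]
# error_log  == ["begin read/write", "rollback"]
```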

11.4. Recoverable and Non-recoverable Errors Within Read/Write Transactions

Errors occurring within a read/write transaction are classified as either recoverable or unrecoverable. As the name suggests, unrecoverable errors put the transaction into an error state, which means that it cannot be committed and must instead be rolled back.

11.5. Constraining Data Store Content

Transactions in which the default RDF graph contains at least one instance of the class <https://rdfox.com/vocabulary#ConstraintViolation> cannot be committed. Since RDFox runs incremental materialization prior to committing each read/write transaction, rules which derive an instance of the above class act as constraints on a data store’s content.

When an attempt to commit a transaction fails due to constraint violations, the resulting error message will include up to ten properties of up to ten of the violations to aid diagnosis of the problem.

The following examples use the prefix rdfox: to represent <https://rdfox.com/vocabulary#> and : to represent <http://example.com/>.

Example Mandatory property constraint

The following rule prevents instances of class foaf:Person from being added to the default graph unless they have at least one foaf:mbox property.

[?person, a, rdfox:ConstraintViolation] :-
    [?person, a, foaf:Person],
    NOT EXIST ?mbox IN [?person, foaf:mbox, ?mbox] .

With this rule loaded, attempting to import the following triples will fail with the message shown underneath.

:alice a foaf:Person; foaf:name "Alice" .

The transaction could not be committed because it would have introduced
the following constraint violation:

<http://example.com/alice> <http://xmlns.com/foaf/0.1/name> "Alice";
    rdf:type <http://xmlns.com/foaf/0.1/Person> .

Although it is possible to make existing resources members of the constraint violation class, as in the example above, more informative failure messages can be obtained by ensuring that each separate violation has a unique resource to represent it. The built-in tuple-table SKOLEM can be used to generate blank nodes for this purpose.

Once each violation has its own resource, it is safe to add further atoms to the rule head to associate with the violation any additional information that will help the reader of the error message understand what is wrong.

Example Improved mandatory property constraint

As in the previous example, the following rule prevents insertion of foaf:Person instances with no foaf:mbox property but this time using SKOLEM.

PREFIX rdfox: <https://rdfox.com/vocabulary#>
[?v, a, rdfox:ConstraintViolation],
[?v, :mboxMissingFrom, ?person],
[?v, :constraintDescription, "Every foaf:Person must have at least one foaf:mbox property."] :-
    [?person, a, foaf:Person],
    NOT EXIST ?mbox IN [?person, foaf:mbox, ?mbox],
    SKOLEM("MissingMbox", ?person, ?v) .

The rule head classifies the blank node bound to ?v via the SKOLEM tuple table as a constraint violation and gives it additional properties identifying the deficient foaf:Person node and describing the constraint it violates in natural language. With this rule loaded, the failure message for importing the same data as in the previous example is:

The transaction could not be committed because it would have introduced
the following constraint violation:

_:__05TWlzc2luZ01ib3gA_02aHR0cDovL2V4YW1wbGUuY29tL2FsaWNlAA-- <http://example.com/constraintDescription> "Every foaf:Person must have at least one foaf:mbox property.";
  <http://example.com/mboxMissingFrom> <http://example.com/alice> .

The presence of constraint violations at the moment a transaction commit is attempted results in a recoverable error. That is, if the transaction has been explicitly opened, it is possible to attempt to fix the data so that it no longer violates the constraint, and to then retry the commit.

11.6. Commit Procedure (EXPERIMENTAL)

It is often useful for the data added explicitly to a data store to be automatically expanded in some way. In the vast majority of cases, the best way to achieve this in RDFox is through reasoning. There are, however, a few situations where reasoning is not suitable and, for these cases, RDFox provides a second mechanism for automatically deriving additional information from each transaction’s data: the commit procedure.

A commit procedure is a user-specified SPARQL update that is evaluated as part of committing each and every read/write transaction on a data store. As with rules, a commit procedure can match data in the store and introduce additional facts derived from the matched data. Unlike rules, facts added by a commit procedure are added to the explicit rather than the derived fact domain and won’t be retracted if the supporting facts are later deleted. This is useful in the audit logging example given below.

Some other differences between rules and commit procedures are:

commit procedures may use built-in functions (such as NOW(), ROLE() and RAND()) and tuple tables (such as SHACL) that are not supported in Datalog rules,
commit procedures can delete as well as add facts whereas rules can only add facts,
commit procedures are only evaluated at commit time whereas rules may be evaluated during a read/write transaction if a query is received,
after the initial materialization, rules are evaluated using incremental reasoning whereas commit procedures are always evaluated over the whole stored data set.

A commit procedure can be set on a data store using the commitproc shell command (see Section 15.2.8) or using an API call (see Section 16.6). As with any SPARQL update, a data store’s commit procedure can be formed from multiple steps using the semi-colon separator.

11.6.1. Outline of Transaction Commit Process

For each read/write transaction that is committed, whether explicit or implicit, RDFox performs the following steps:

1. for each step in the commit procedure:
   1. ensure materialization is up-to-date,
   2. evaluate the step, deleting and inserting facts as necessary,
2. ensure materialization is up-to-date,
3. for each step in the commit procedure:
   1. re-evaluate the step, counting how many fresh deletions or insertions take place,
      1. if there were any fresh deletions or insertions, throw an error,
4. query the default graph for triples matching ?v a rdfox:ConstraintViolation,
   1. if the query returned one or more matches, throw an error,
5. persist the transaction according to the persistence setting for the data store.

Step 3 guards against instabilities that can arise from the combination of a set of Datalog rules and a commit procedure. In this context, instability refers to the possibility for reasoning to derive new facts that match the commit procedure’s WHERE clause leading to insertion of additional new facts which create yet more rule applications and resulting facts, and so on. Without the convergence check implemented in step 3, this could lead to a situation where the evaluation of a commit procedure from one transaction would create latent materialization work for the next transaction. In that case, simply opening and immediately committing a transaction would lead to new facts appearing in the data store even though no new facts were added by the user. Note that the error thrown in this situation (step 3.1.1) is unrecoverable.
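The commit process and its convergence check can be illustrated with a toy model. The following Python sketch is not RDFox code: facts are plain tuples, rules and commit-procedure steps are ordinary Python functions, materialization is a naive fixpoint rather than incremental reasoning, and every name (materialize, apply_step, commit, log_actions, logger_is_action) is hypothetical. Persistence (step 5) is left out.

```python
def materialize(explicit, rules):
    """Return the explicit facts plus everything the rules derive from them."""
    facts = set(explicit)
    while True:
        new = set()
        for rule in rules:
            new |= rule(facts)
        if new <= facts:          # nothing new was derived: fixpoint reached
            return facts
        facts |= new


def apply_step(step, explicit, rules):
    """Evaluate one step over up-to-date facts; return the number of fresh changes."""
    to_delete, to_insert = step(materialize(explicit, rules))
    fresh_deletions = to_delete & explicit
    fresh_insertions = to_insert - explicit
    explicit -= fresh_deletions   # commit-procedure changes go to explicit facts
    explicit |= fresh_insertions
    return len(fresh_deletions) + len(fresh_insertions)


def commit(explicit, rules, steps):
    for step in steps:            # step 1: evaluate the commit procedure
        apply_step(step, explicit, rules)
    for step in steps:            # steps 2 and 3: re-evaluate as a convergence check
        if apply_step(step, explicit, rules) > 0:
            raise RuntimeError("commit procedure did not converge")
    facts = materialize(explicit, rules)
    violations = [f for f in facts if f[1:] == ("a", "rdfox:ConstraintViolation")]
    if violations:                # step 4: constraint check
        raise ValueError("constraint violations present")
    return facts


def log_actions(facts):
    """Commit-procedure step: log every :Action that has no log entry yet."""
    actions = {s for (s, p, o) in facts if (p, o) == ("a", ":Action")}
    logged = {s for (s, p, o) in facts if p == ":actionTakenBy"}
    return set(), {(a, ":actionTakenBy", "guest") for a in actions - logged}


def logger_is_action(facts):
    """A rule that interacts badly with log_actions: every logger is an :Action."""
    return {(o, "a", ":Action") for (s, p, o) in facts if p == ":actionTakenBy"}


stable = commit({(":a1", "a", ":Action")}, [], [log_actions])

try:
    commit({(":a1", "a", ":Action")}, [logger_is_action], [log_actions])
    converged = True
except RuntimeError:
    converged = False             # the re-evaluation inserted a fresh fact
```

In the second call, the rule turns the newly logged role into a further :Action, so re-evaluating the step produces a fresh insertion, and the commit is rejected. This is the kind of rule and commit-procedure interaction the convergence check is meant to catch.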

Warning

RDFox’s mechanism for detecting instability between rules and commit procedures relies on the presence of data that triggers the divergent behaviour. This means that it is possible to add an intrinsically unstable combination of rules and commit procedure while there is no data present. The problem would then go undetected until such a time as the right data pattern is inserted to trigger the convergence check. To avoid encountering these problems in production settings, it is vital that the developers of the rules and commit procedure carefully analyze them to ensure that they do not interact in the way described above.

11.6.2. Performance Considerations

As already mentioned, with the exception of the initial materialization step, a data store’s rules are re-evaluated in each commit using RDFox’s efficient incremental reasoning algorithms. This stops the cost of maintaining the correct set of derived facts from growing as the total number of rule body matches grows. Instead the cost remains proportional to the number of new matches due to the changes in the current transaction.

By contrast, SPARQL updates and, by extension, data store commit procedures, are always evaluated over the entire data set. Used naively, a commit procedure could therefore take longer and longer to evaluate as the data set grows. This bad scaling behaviour can, however, be avoided by using Datalog rules to find the parts of the data where the commit procedure should apply. This is demonstrated in the following example.

Example: Audit logging with commit procedures.

In this example we will use a commit procedure to establish audit logging for some class of entities within a data store. To begin with, we will use a naive commit procedure and then adapt it to show how to avoid performance problems.

We begin by defining a class, :Action, on which we want to perform audit logging — that is to track who inserted each instance into the data store and when. We assume that access control is configured such that the users we are tracking have read and write access to the default graph in the data store but no access to any named graphs or the data store’s rules or commit procedure. Given this we can achieve what we want by setting the following SPARQL update as our commit procedure:

INSERT {
   GRAPH :actionLog {
      ?action :actionTakenBy ?role ;
         :actionTakenAt ?now .
   }
}
WHERE {
   ?action a :Action .
   FILTER(NOT EXISTS {
      GRAPH :actionLog {
         ?action :actionTakenBy ?anyRole .
      }
   })
   BIND(ROLE() AS ?role)
   BIND(NOW() AS ?now)
}

This finds each instance of the :Action class that doesn’t have an :actionTakenBy property in the :actionLog graph, and inserts both :actionTakenBy and :actionTakenAt properties using the values returned by the ROLE() and NOW() built-in functions respectively. Assuming that this is saved to file commit-procedure.rq, we can set it as the active data store’s commit procedure with:

commitproc set commit-procedure.rq

We can test that this has worked by inserting a triple matching ?a a :Action into the data store and then querying the :actionLog named graph. This is demonstrated in the following excerpt from an RDFox shell session:

> import ! :anAction a :Action .
Adding data.
Import operation took 0.016 s.
Processed 1 fact, of which 1 was updated.
> set output out
output = "out"
> SELECT * WHERE { GRAPH :actionLog { ?S ?P ?O } }
@prefix : <http://rdfox.com/examples/commitprocedure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:anAction :actionTakenAt "2023-06-05T17:58:27.991+01:00"^^xsd:dateTime .
:anAction :actionTakenBy "guest" .
Number of query answers:        2
Total number of query answers:  2
Total statement evaluation time: 0 s

This achieves our functional aims but we must also consider performance. Evaluation of the WHERE clause of the commit procedure above will iterate over every historical instance of the :Action class on each commit, even if no new instances have been inserted. Initially, when there are few instances, the cost of this will be negligible but over time this may become problematic.

We can solve this scaling challenge using rules and incremental reasoning. Instead of using FILTER(NOT EXISTS { ... }) in SPARQL to find newly-added actions, we install the following Datalog rule to add each of them to the class, :NewAction:

[?a, a, :NewAction] :-
   [?a, a, :Action],
   NOT EXISTS ?anyRole IN (
      [?a, :actionTakenBy, ?anyRole] :actionLog
   ) .

Because rules are evaluated incrementally, the cost of identifying the new actions in each transaction will now be proportional to the number of new instances rather than the total number of instances in the data store. The commit procedure can then be simplified to iterate over just the instances of the incrementally evaluated :NewAction class:

INSERT {
   GRAPH :actionLog {
      ?action :actionTakenBy ?role ;
         :actionTakenAt ?now .
   }
}
WHERE {
   ?action a :NewAction .
   BIND(ROLE() AS ?role)
   BIND(NOW() AS ?now)
}
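The scaling difference between the two versions of the commit procedure can be made concrete with a simple counting model. The following Python sketch does not measure RDFox: simulate is a hypothetical function that counts how many :Action instances each approach would have to visit, under the assumption that every commit adds the same number of new actions.

```python
def simulate(commits, actions_per_commit):
    """Count the instances visited by the naive and the rule-assisted approach."""
    logged = set()
    naive_visits = 0
    incremental_visits = 0
    next_id = 0
    for _ in range(commits):
        new_actions = {f"action{next_id + i}" for i in range(actions_per_commit)}
        next_id += actions_per_commit
        # Naive procedure: the WHERE clause visits every historical :Action.
        naive_visits += len(logged | new_actions)
        # Rule-assisted procedure: only the :NewAction instances are visited.
        incremental_visits += len(new_actions)
        logged |= new_actions
    return naive_visits, incremental_visits


naive, incremental = simulate(commits=4, actions_per_commit=10)
# naive == 10 + 20 + 30 + 40 == 100, incremental == 4 * 10 == 40
```

In this model the naive total grows quadratically with the number of commits, whereas the rule-assisted total grows linearly.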

Note that this example is only for demonstration purposes and does not constitute a production-ready audit logging feature. For advice on achieving a more robust setup, please contact Oxford Semantic Technologies for support.