13. Persistence

RDFox is a main-memory data store. This means that a full copy of the data loaded into the system is always held in main memory to support query answering and other operations. It is, however, possible to configure RDFox to incrementally save various types of information to persistent storage so that it may be restored in a subsequent or concurrent session.

There are three reasons to enable persistence. The first is when RDFox is being used as the System of Record for the data it contains. In this case, persisting to disk is essential in order to achieve acceptable levels of data durability. The second reason is to improve restart performance. Using persistence achieves this because it is generally faster for RDFox to reload data stored in its native format than it is to import it from other formats, or to copy it from another data store. The third reason is so that changes to the data store can be continuously replicated amongst several RDFox instances. Only the file-sequence persistence type described in Section 13.2.2 supports this last goal.

When persistence is enabled, care must be taken to ensure that sufficient disk space is available for RDFox to persist new data, in addition to ensuring that sufficient memory is available.

RDFox’s persistence is compliant with the ACID properties described in Section 11.

Note

Persistence does not increase the capacity of an RDFox data store beyond what can be stored in memory.

13.1. Key Concepts

This section introduces some of the key concepts of RDFox’s persistence model.

13.1.1. Version Numbers

Each component that supports persistence (server, role manager, and data store) has a version number that is saved and restored when persistence is enabled.

The server’s version number is incremented for every change to the server’s data store catalog. Similarly, the role manager’s version number is incremented for every change to one of the server’s roles. These version numbers are visible in the output of the serverinfo shell command and equivalent API calls (to see the role manager version number, the extended info must be requested).

Data store version numbers are incremented as part of each successfully committed transaction that changes the content of the data store. They are also incremented by two special operations that may be invoked by the user: rematerialization and compaction. Rematerialization causes RDFox to drop all state held for the purposes of incremental reasoning and perform a full, from-scratch materialization. Compaction is described in Section 13.1.3. Data store version numbers are visible in the output of the info shell command and equivalent API calls.

13.1.2. Snapshots and Deltas

For data store persistence, versions may be persisted as either snapshots or deltas. A snapshot contains a full copy of the content whereas a delta contains only the incremental changes since the previous version. The time and disk space needed to create a snapshot are proportional to the size of the full content whereas the time and disk space needed to create a delta are proportional to the size of the change since the previous version. For role and server persistence, each version is saved as a complete snapshot.

When reloading a data store, RDFox must begin at a snapshot but can then apply any arbitrary sequence of deltas and snapshots that follows. To ensure that there is always a snapshot at which restoration can begin, the first versions of a data store, up to and including the first that contains at least one fact, are always persisted as snapshots. Thereafter, new versions are persisted as deltas, with the exception of rematerialization and compaction operations which create new snapshot versions.
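The restoration rule above can be sketched as follows. This is an illustrative model only, not RDFox's actual on-disk format: versions are represented as an in-order list of snapshots (full fact sets) and deltas (additions and removals).

```python
# Minimal sketch of restoring a store from the most recent snapshot,
# then replaying the versions that follow it in order.

def restore(versions):
    """versions: list of (kind, payload) in version order, where kind is
    'snapshot' (payload = full set of facts) or 'delta'
    (payload = (added, removed))."""
    # Restoration must begin at a snapshot, so find the latest one.
    start = max(i for i, (kind, _) in enumerate(versions) if kind == "snapshot")
    facts = set(versions[start][1])
    # Replay every delta (or later snapshot) after it, in order.
    for kind, payload in versions[start + 1:]:
        if kind == "snapshot":
            facts = set(payload)
        else:
            added, removed = payload
            facts |= added
            facts -= removed
    return facts

history = [
    ("snapshot", {"a", "b"}),        # v1: first snapshot
    ("delta", ({"c"}, set())),       # v2: add c
    ("delta", (set(), {"a"})),       # v3: remove a
]
print(sorted(restore(history)))      # ['b', 'c']
```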

It is possible to observe the version number of the most recent snapshot for a data store in the output of the info shell command or equivalent API calls.

13.1.3. Compaction

Compaction, which can be triggered by the compact shell command or equivalent API calls, is a data store operation that attempts to free memory and disk space.

In regular operation, deleting a fact from a tuple table only marks the entry as deleted and does not reclaim the associated memory. This makes deletion fast; however, in use cases with a high turnover of facts, the amount of memory associated with deleted facts may become significant over time. Compaction reorganizes the non-deleted facts in the store into the minimal amount of memory so that surplus memory can be freed.
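The mark-as-deleted behaviour and its interaction with compaction can be illustrated with a toy model (this is not RDFox's internal data structure):

```python
# Illustrative sketch: deletion only marks a slot as deleted, so the table
# keeps its full length; compaction rewrites the live entries into the
# minimal amount of space so that the surplus can be freed.

class TupleTable:
    def __init__(self):
        self.slots = []          # [fact, deleted?] pairs

    def add(self, fact):
        self.slots.append([fact, False])

    def delete(self, fact):
        for slot in self.slots:
            if slot[0] == fact and not slot[1]:
                slot[1] = True   # fast: mark only, memory not reclaimed
                return

    def compact(self):
        # Rewrite live facts contiguously; deleted slots disappear.
        self.slots = [slot for slot in self.slots if not slot[1]]

t = TupleTable()
for f in ("a", "b", "c"):
    t.add(f)
t.delete("b")
print(len(t.slots))   # 3: the deleted slot still occupies space
t.compact()
print(len(t.slots))   # 2
```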

When persistence is enabled, high fact turnover can also cause storage efficiency to degrade over time. This is because, in contrast to snapshots which only use disk space to store the non-deleted content of the store, deltas contain copies of facts, rules and axioms that have been deleted. After many transactions, the total disk space occupied by content that has been deleted may become significant. The need to replay long sequences of deltas when restoring also causes restart performance to deteriorate as the number of deltas since the last snapshot grows. Compaction creates a new snapshot on disk to improve restart performance, and it can also remove earlier snapshots and deltas to improve storage efficiency. For file persistence, this happens automatically; in contrast, for file-sequence persistence, an additional argument must be supplied to specify that redundant files should be deleted. See Section 13.2.2 for an explanation of why this is the case.

Compacting a data store may take a long time to complete: the time will be roughly proportional to the aggregate size of the data store’s tuple tables. During this time, the data store cannot be queried or modified. When persistence is enabled, compaction increases disk usage at least temporarily due to the new snapshot, so it is important to ensure that the disk has sufficient capacity to accommodate this. Furthermore, in cases where a server directory is shared by more than one RDFox instance, it is also important to ensure that all instances are in sync before performing compaction with redundant file removal. For these reasons, compaction is not run automatically but must be triggered explicitly by the user. It is recommended that operators of systems with high fact turnover perform compaction as a periodic maintenance task.

13.2. Configuration

In order to use any form of persistence, an RDFox server must be configured with a server directory from which data and settings will be loaded and to which data will be saved. Users can choose to persist roles only, or both roles and data stores. Persistence of roles is controlled by the persist-roles server parameter while persistence of data stores is controlled by the persist-ds server parameter. Each of these options can be set to off (no persistence), file, or file-sequence. The file and file-sequence options are described in the following sections. When both roles and data stores are persisted, the same persistence type must be used for both. The allowed combinations of these two parameters are given in the following table.

persist-roles server parameter value    Allowed persist-ds server parameter values
off                                     off
file                                    off, file
file-sequence                           off, file-sequence

Once a server directory has been populated using a given type of persistence, it can only be used by servers configured to use that type of persistence. The easiest way to ensure this is to specify the desired persistence options for the directory in a server parameters file.

The persist-ds server parameter controls whether data store persistence is available at the RDFox server. When this parameter is set to off, it is not possible to persist data stores. When it is set to one of the two other accepted values, newly created data stores will be persisted using the selected persistence type unless the persist-ds data store parameter is explicitly set to off, in which case the data store will be created in memory only.

13.2.1. file persistence

The file persistence option stores the persisted content (whether it is the data store catalog, roles database, or the content of a data store) in a single file. For all types of content, the file begins with the most recent snapshot. In the case of data store content, the file may then contain a sequence of deltas made by the transactions since the last snapshot version.

A server directory containing this persistence type can only be used by one RDFox server at a time. RDFox ensures that this is the case by seeking an exclusive lock on the directory when the server instance is created and exiting if the lock cannot be obtained.

When using this persistence type, backups of the server directory must be coordinated with data store write operations to avoid backing up incomplete deltas. If RDFox encounters an incomplete delta while restoring from a file, it will be unable to make further progress (accept new write operations) until the user runs a compact operation. At that point, any data in the partial delta will be deleted from the disk. To avoid this situation, backups of the server directory should only be taken during a maintenance window in which no read-write transactions are performed.

When data stores persisted with this file type are compacted, the single persistence file containing the previous snapshot and subsequent deltas is wholly replaced. This means that the disk space occupied by these old persistence records is automatically freed and no additional arguments are needed.

13.2.1.1. Filesystem requirements

In order to exclusively lock the server directory, file persistence uses the flock system call with the LOCK_EX flag on Linux, and the CreateFileW system call with dwShareMode set to 0 on Windows. In both cases, the underlying file system must faithfully and correctly support the locking semantics of those calls.

13.2.2. file-sequence persistence

The file-sequence persistence option stores the persisted content (whether it is the data store catalog, roles database, or the content of a data store) in a sequence of files with one file per version. The path of each file is determined by the relevant version number (server, role manager or data store).

Unlike with file persistence, a server directory using file-sequence persistence may be shared by several RDFox servers at once. So long as the underlying file system meets the criteria described in Section 13.2.2.1, any modification successfully made via any of these instances will eventually be replicated to the other instances. This provides the basis for deploying RDFox in a leaderless, high availability (HA) configuration. Please refer to Section 13.2.2.2 for information about replication lag and Section 21 for details of how to set up HA deployments.

Also unlike the file persistence type, file-sequence server directories can be backed up during write operations, removing the need for maintenance windows in order to collect backups. This is because the files that form the file sequence never contain partial deltas.

When data stores persisted with this file type are compacted, old snapshots and deltas are not automatically deleted from the disk. This is because deleting the files that contain these records could cause other RDFox instances restoring the file sequence to diverge from the committed version history. For example, if an instance has not yet restored content from the file corresponding to version v, where v is less than the current data store version, and the file is deleted, that instance will restore versions up to v-1 and then become available to accept writes. Since the path reserved for version v is empty, a write via this instance will succeed, creating a divergence in the version history that could lead to data loss.

So long as all of the instances sharing a server directory have restored the globally highest version, the above problem cannot happen. This could be guaranteed by shutting down all but one of the instances or by blocking all inbound write requests and waiting until instances report the same data store version. To allow disk space to be freed when such conditions have been established, compact supports an optional argument indicating that redundant files should be deleted.
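The precondition for safely removing redundant files reduces to a simple check, sketched here (how instances report their versions to an operator is deployment-specific and not shown):

```python
# Sketch of the safety condition before compacting with redundant-file
# removal: every instance sharing the server directory must have restored
# the globally highest version, i.e. all reported versions must agree.

def safe_to_remove_redundant_files(instance_versions):
    """instance_versions: data store version reported by each instance."""
    return len(set(instance_versions)) == 1

print(safe_to_remove_redundant_files([17, 17, 17]))  # True
print(safe_to_remove_redundant_files([17, 16, 17]))  # False: one instance lags
```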

Note

In order to use the file-sequence persistence option, the license key provided at startup must explicitly support the feature.

Note

The file-sequence persistence option is EXPERIMENTAL on Windows and macOS.

13.2.2.1. Filesystem requirements

In order to make it safe for any of the RDFox instances sharing a server directory to persist new transactions, each instance must add files to file sequences in a way that will fail if the version they are trying to create has already been created by another instance. This requires that the file system containing the server directory supports an atomic move or link operation that returns a distinct error code when the target path is already occupied.
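The commit rule can be sketched with POSIX semantics, where a hard link creation fails atomically if the target path already exists. The file naming scheme below is hypothetical, not RDFox's actual layout:

```python
# Sketch of the atomic commit: an instance publishes version v by linking
# its prepared file to the path reserved for v, and treats "target already
# exists" (FileExistsError / EEXIST) as losing the race to another instance.
import os
import tempfile

def try_commit(directory, version, payload):
    staged = os.path.join(directory, f"staging-{os.getpid()}-{version}")
    with open(staged, "w") as f:
        f.write(payload)
    target = os.path.join(directory, f"{version:020d}")  # hypothetical naming
    try:
        os.link(staged, target)   # atomic: fails if target already exists
        return True
    except FileExistsError:
        return False              # another instance committed version v first
    finally:
        os.unlink(staged)

d = tempfile.mkdtemp()
print(try_commit(d, 2, "delta from instance A"))  # True
print(try_commit(d, 2, "delta from instance B"))  # False: contention detected
```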

The Network File System (NFS) protocol meets the above requirement through the LINK operation (documented in Section 18.9 of RFC 5661 <https://datatracker.ietf.org/doc/html/rfc5661#section-18.9>). Oxford Semantic Technologies has successfully tested the file-sequence persistence option under sustained write contention on the following managed NFS services:

In each case, testing was performed by provisioning an instance of the file system and mounting it to three Linux hosts in separate availability zones using the mounting procedure recommended by the service provider. An instance of RDFox was then started in shell mode on each host with the server-directory parameter pointing to the same directory on the NFS file system. Using one of the instances, a data store was created with the following command:

dstore create default

Next, the following commands were run on each host:

set output out
rwtest 500 5000

The rwtest shell command has been designed specifically to detect replication errors and is described fully in Section 15.2.40. When invoked as shown, the test waits between 500 ms and 5,000 ms between iterations, so it attempts to commit a transaction every 2.75 s on average. Running the command on three instances simultaneously results in frequent write contention events.

After 72 hours, the rwtest command was interrupted by issuing Ctrl-C in the terminal on each host. This produces a final report that shows the total number of transactions successfully committed by each instance. The sum of these numbers was found to match the data store version minus 1 (the initial data store version) as expected. If more than one of the instances had concluded that it had successfully created any given version, the sum of these numbers would be higher. If in any iteration of the test loop on any of the three instances the content of the data store differed from the expected content, which is known for each data store version, the test would have stopped with an error.

The above procedure constitutes a minimum test for qualifying file systems (and the associated configuration options) for production use in scenarios where write contention may occur. Users planning deployments of RDFox that use the file-sequence persistence option are advised to conduct their own testing using this procedure. The degree of write contention can be varied in the test by changing the numeric parameters to the command which represent the minimum and maximum duration in milliseconds between iterations of the test loop.

It is worth noting that the atomic operation described above is only required in situations where there is a risk of write contention and that a broader range of file systems may be safe to use under the constraint that write contention will not occur. This can be achieved by ensuring (externally) that all writes will be committed via the same nominated instance. Some approaches to this are reviewed in Section 21.3. To qualify file systems for use in such setups, the rwtest command can be invoked with the read-only argument on all but one of the hosts.

Note

On UNIX operating systems, RDFox uses the link system call as the atomic commit operation, while on Windows MoveFileW is used. The EEXIST (UNIX) and ERROR_ALREADY_EXISTS (Windows) error codes are interpreted to mean that the commit has failed because another instance successfully committed a change first.

13.2.2.2. Replication Performance

In order for a change to be replicated between instances, the instance writing the change must successfully commit it to the path reserved for the new version number, and the other instances must then discover it and apply the changes to their own in-memory copies of the data. The time taken for this process is called replication lag.

In all, there are three mechanisms by which a receiving instance can discover new version files. The first is through polling. The poll interval is controlled by the file-system-poll-interval server parameter which has a default of 60 s. A separate poll interval is established for each persisted component. This means that, in the case of a server with three persisted data stores, the file system will be polled five times in each interval: once for new server versions, once for new role manager versions, and once for each of the three data stores. It is generally desirable to keep polling intervals long to minimise load on the file system so, while this mechanism helps bound worst-case replication lag, it is unsuitable for achieving low average-case replication lag.
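A single polling pass can be sketched as follows. The receiver simply checks whether the path reserved for its next version number exists and applies each new version file in order; the file naming scheme is hypothetical:

```python
# Sketch of poll-based discovery: on each poll, apply every version file
# that has appeared since the locally restored version, then stop at the
# first missing version number.
import os
import tempfile

def poll_once(directory, local_version):
    applied = []
    while True:
        nxt = os.path.join(directory, f"{local_version + 1:020d}")
        if not os.path.exists(nxt):
            break                        # caught up; sleep until next poll
        with open(nxt) as f:
            applied.append(f.read())     # apply the delta/snapshot content
        local_version += 1
    return local_version, applied

d = tempfile.mkdtemp()
for v, content in [(1, "snapshot"), (2, "delta-a"), (3, "delta-b")]:
    with open(os.path.join(d, f"{v:020d}"), "w") as f:
        f.write(content)

version, changes = poll_once(d, 0)
print(version, changes)   # 3 ['snapshot', 'delta-a', 'delta-b']
```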

The second mechanism by which new version files are discovered is when a commit fails because the local version has fallen behind the highest persisted version. In this case, the instance will apply the new version and any others that have been created since as soon as the failed transaction has been rolled back. This mechanism is provided to reduce the time taken for an instance to catch up with the latest changes after a commit fails due to write contention. It will not, in most cases, be useful for achieving low average-case replication lag.

The third mechanism for new version files to be discovered is by notification over UDP. This mechanism gives the lowest possible average-case replication lag. To activate this mechanism, the notifications-address server parameter must be set to a host name and UDP port number separated by a plus (+) symbol. An instance configured this way will register itself for notifications by writing the given notifications address to the server directory itself. For the mechanism to work, the host name must be resolvable by the other instances that share the server directory and UDP packets must be able to flow freely from the other instances to the specified port. With this mechanism in place, it should be possible to achieve sub-second replication lag for instances within the same data center assuming that changes are small.
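The notification path can be illustrated with plain UDP sockets. The datagram payload shown is hypothetical; the point is only that a small unreliable message prompts the receiver to check the file sequence immediately rather than waiting for the next poll:

```python
# Sketch of UDP notification: after committing, a writer sends a datagram
# to each registered notifications address; the receiver uses it as a cue
# to look for new version files right away.
import socket

# Receiver binds the address it registered in the server directory.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(5)
addr = receiver.getsockname()

# Writer notifies that a new data store version exists.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"default 42", addr)   # hypothetical "store version" payload

data, _ = receiver.recvfrom(1024)
store, version = data.decode().split()
print(store, version)   # default 42
receiver.close()
sender.close()
```

Because UDP delivery is not guaranteed, a real deployment still relies on the polling interval as a backstop, which is consistent with the three-mechanism design described above.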