13. Persistence¶
RDFox is a main-memory data store. This means that a full copy of the data loaded into the system is always held in main memory to support query answering and other operations. It is, however, possible to configure RDFox to incrementally save various types of information to persistent storage so that it may be restored in a subsequent or concurrent session.
There are three reasons to enable persistence. The first is when RDFox is being
used as the System of Record for the data it contains. In
this case, persisting to disk is essential in order to achieve acceptable
levels of data durability. The second reason is to improve restart performance.
Using persistence achieves this because it is generally faster for RDFox to
reload data stored in its native format than it is to import it from other
formats, or to copy it from another data store. The third reason is so that
changes to the data store can be continuously replicated amongst several RDFox
instances. Only the file-sequence
persistence type described in
Section 13.2.2 supports this last goal.
When persistence is enabled, care must be taken to ensure that sufficient disk space is available for RDFox to persist new data, in addition to ensuring that there is sufficient memory.
RDFox’s persistence is compliant with the ACID properties described in Section 11.
Note
Persistence does not increase the capacity of an RDFox data store beyond what can be stored in memory.
13.1. Key Concepts¶
This section introduces some of the key concepts of RDFox’s persistence model.
13.1.1. Version Numbers¶
Each component that supports persistence (server, role manager, and data store) has a version number that is saved and restored when persistence is enabled.
The server's version number is incremented for every change to the server's data store catalog. Similarly, the role manager's version number is incremented for every change to one of the server's roles. These version numbers are visible in the output of the serverinfo shell command and equivalent API calls (to see the role manager version number, the extended info must be requested).
Data store version numbers are incremented as part of each successfully committed transaction that changes the content of the data store. They are also
incremented by two special operations that may be invoked by the user:
rematerialization and compaction. Rematerialization causes RDFox to drop
all state held for the purposes of incremental reasoning and perform a full,
from-scratch materialization. Compaction is described in Section 13.1.3.
Data store version numbers are visible in the output of the info
shell
command and equivalent API calls.
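For example, the following shell commands can be used to inspect these version numbers (the # lines are annotations; the exact argument for requesting the extended server information is an assumption and should be checked against the serverinfo command reference):

# Show server information, including the server version number.
serverinfo

# Request the extended information, which includes the role manager version number.
serverinfo extended

# Show information about the active data store, including its version number.
info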
13.1.2. Snapshots and Deltas¶
For data store persistence, versions may be persisted as either snapshots or deltas. Snapshots contain a full copy of the content, whereas a delta contains only the incremental changes since the previous version. The time and disk space needed to create a snapshot are proportional to the size of the full content, whereas the time and disk space needed to create a delta are proportional to the size of the change since the previous version. For role and server persistence, each version is saved as a complete snapshot.
When reloading a data store, RDFox must begin at a snapshot but can then apply any arbitrary sequence of deltas and snapshots that follows. To ensure that there is always a snapshot at which restoration can begin, the first versions of a data store, up to and including the first that contains at least one fact, are always persisted as snapshots. Thereafter, new versions are persisted as deltas, with the exception of rematerialization and compaction operations which create new snapshot versions.
It is possible to observe the version number of the most recent snapshot for a
data store in the output of the info
shell command or equivalent API calls.
13.1.3. Compaction¶
Compaction, which can be triggered by the compact
shell command or
equivalent API calls, is a data store operation that attempts to free memory
and disk space.
In regular operation, deleting a fact from a tuple table only marks the entry as deleted and does not reclaim the associated memory. This makes deletion fast; however, in use cases with a high turnover of facts, the amount of memory associated with deleted facts may become significant over time. Compaction reorganizes the non-deleted facts in the store into the minimal amount of memory so that surplus memory can be freed.
When persistence is enabled, high fact turnover can also cause storage
efficiency to degrade over time. This is because, in contrast to snapshots
which only use disk space to store the non-deleted content of the store, deltas
contain copies of facts, rules and axioms that have been deleted. After many
transactions, the total disk space occupied by content that has been deleted
may become significant. The need to replay long sequences of deltas when
restoring also causes restart performance to deteriorate as the number of
deltas since the last snapshot grows. Compaction creates a new snapshot on disk
to improve restart performance, and it can also remove earlier snapshots and
deltas to improve storage efficiency. For file
persistence, this happens
automatically; in contrast, for file-sequence
persistence, an additional
argument must be supplied to specify that redundant files should be deleted.
See Section 13.2.2 for an explanation of why this is the
case.
Compacting a data store may take a long time to complete: the time will be roughly proportional to the aggregate size of the data store’s tuple tables. During this time, the data store cannot be queried or modified. When persistence is enabled, compaction increases disk usage at least temporarily due to the new snapshot, so it is important to ensure that the disk has sufficient capacity to accommodate this. Furthermore, in cases where a server directory is shared by more than one RDFox instance, it is also important to ensure that all instances are in sync before performing compaction with redundant file removal. For these reasons, compaction is not run automatically but must be triggered explicitly by the user. It is recommended that operators of systems with high fact turnover perform compaction as a periodic maintenance task.
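As a sketch of such a maintenance task, the following shell commands compact the active data store and then display its information so that the new snapshot version can be confirmed (the # lines are annotations):

# Reorganize the non-deleted facts into the minimal amount of memory and,
# when persistence is enabled, write a new snapshot to disk.
compact

# Confirm that the data store version and the latest snapshot version have advanced.
info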
13.2. Configuration¶
In order to use any form of persistence, an RDFox server must be configured with a server directory from which data and settings will be loaded and to which data will be saved. Users can choose to persist roles only, or both roles and data stores. Persistence of roles is controlled by the persist-roles server parameter while persistence of data stores is controlled by the persist-ds server parameter. Each of these options can be set to off (no persistence), file, or file-sequence. The file and file-sequence options are described in the following sections. When both roles and data stores are persisted, the same persistence type must be used for both. The allowed combinations of these two parameters are given in the following table.
| persist-roles | Allowed values of persist-ds |
|---|---|
| off | off |
| file | off, file |
| file-sequence | off, file-sequence |
Once a server directory has been populated using a given type of persistence, it can only be used by servers configured to use that type of persistence. The easiest way to ensure this is to specify the desired persistence options for the directory in a server parameters file.
The server parameter persist-ds controls whether data store persistence is enabled at the RDFox server level. When this parameter is set to off, it is not possible to persist data stores. When it is set to one of the two other accepted values, newly created data stores will be persisted using the selected persistence type unless the persist-ds data store parameter is explicitly set to off, in which case the data store will be created in memory only.
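For illustration, the command below sketches how a server might be started with file persistence for both roles and data stores; the directory path is hypothetical, and the exact command-line syntax for supplying server parameters should be checked against the relevant section of this documentation:

RDFox -server-directory /var/lib/rdfox -persist-roles file -persist-ds file shell

Once the shell has started, data stores could then be created as follows (the data store names are hypothetical, and the # lines are annotations):

# This data store inherits the server's persistence type, so it is persisted using file persistence.
dstore create orders

# This data store overrides the persist-ds data store parameter, so it is kept in memory only.
dstore create scratch persist-ds off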
13.2.1. file persistence¶
The file
persistence option stores the persisted content (whether it is the
data store catalog, roles database, or the content of a data store) in a single file.
For all types of content, the file begins with the most recent snapshot. In the
case of data store content, the file may then contain a sequence of deltas made
by the transactions since the last snapshot version.
A server directory containing this persistence type can only be used by one RDFox server at a time. RDFox ensures that this is the case by seeking an exclusive lock on the directory when the server instance is created and exiting if the lock cannot be obtained.
When using this persistence type, backups of the server directory must be coordinated with data store write operations to avoid backing up incomplete deltas. If RDFox encounters an incomplete delta while restoring from a file, it will be unable to make further progress (accept new write operations) until the user runs a compact operation. At that point, any data in the partial delta will be deleted from the disk. To avoid this situation, backups of the server directory should only be taken during a maintenance window in which no read-write transactions are performed.
When data stores persisted with this persistence type are compacted, the single persistence file containing the previous snapshot and subsequent deltas is replaced wholesale. This means that the disk space occupied by these old persistence records is automatically freed and no additional arguments are needed.
13.2.1.1. Filesystem requirements¶
In order to exclusively lock the server directory, file persistence uses the flock system call with the LOCK_EX flag on Linux, and the CreateFileW system call with dwShareMode set to 0 on Windows. In both cases, the underlying file system must faithfully and correctly support the locking semantics of those calls.
13.2.2. file-sequence persistence¶
The file-sequence
persistence option stores the persisted content (whether
it is the data store catalog, roles database, or the content of a data store)
in a sequence of files with one file per version. The path of each file is
determined by the relevant version number (server, role manager or data store).
Unlike with file persistence, a server directory using file-sequence persistence may be shared by several RDFox servers at once. So long as the underlying file system meets the criteria described in Section 13.2.2.1, any modification successfully made via any of these instances will eventually be replicated to the other instances. This provides the basis for deploying RDFox in a leaderless, high availability (HA) configuration. Please refer to Section 13.2.2.2 for information about replication lag and Section 21 for details of how to set up HA deployments.
Also unlike the file
persistence type, file-sequence
server directories
can be backed up during write operations, removing the need for maintenance
windows in order to collect backups. This is because the files that form the
file sequence never contain partial deltas.
When data stores persisted with this persistence type are compacted, old snapshots and deltas are not automatically deleted from the disk. This is because deleting the files that contain these records could cause other RDFox instances restoring the file sequence to diverge from the committed version history. For example, if an instance has not yet restored content from the file corresponding to version v, where v is less than the current data store version, and the file is deleted, that instance will restore versions up to v-1 and then become available to accept writes. Since the path reserved for version v is empty, a write via this instance will succeed, creating a divergence in the version history that could lead to data loss.
So long as all of the instances sharing a server directory have restored the globally highest version, the above problem cannot happen. This could be guaranteed by shutting down all but one of the instances or by blocking all inbound write requests and waiting until instances report the same data store version. To allow disk space to be freed when such conditions have been established, compact supports an optional argument indicating that redundant files should be deleted.
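The following sketch illustrates the kind of procedure this implies for a deployment in which several instances share a server directory, assuming that all inbound write traffic has already been blocked externally (the # lines are annotations; the name of the additional argument is not shown and should be taken from the compact command reference):

# On every instance: confirm that the same, highest data store version is reported.
info

# On one instance only, once all instances report the same version: run compaction,
# appending the additional argument that requests deletion of redundant files.
compact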
Note
In order to use the file-sequence
persistence option, the license key
provided at startup must explicitly support the feature.
Note
The file-sequence
persistence option is EXPERIMENTAL
on Windows and
macOS.
13.2.2.1. Filesystem requirements¶
In order to make it safe for any of the RDFox instances sharing a server directory to persist new transactions, each instance must add files to file sequences in a way that will fail if the version they are trying to create has already been created by another instance. This requires that the file system containing the server directory supports an atomic move or link operation that returns a distinct error code when the target path is already occupied.
The Network File System (NFS) protocol meets the above requirement through the LINK operation (documented in Section 18.9 of RFC 5661 <https://datatracker.ietf.org/doc/html/rfc5661#section-18.9>). Oxford Semantic Technologies has successfully tested the file-sequence persistence option under sustained write contention on the following managed NFS services:
In each case, testing was performed by provisioning an instance of the file
system and mounting it to three Linux hosts in separate availability zones
using the mounting procedure recommended by the service provider. An instance
of RDFox was then started in shell
mode on each host with the
server-directory
parameter pointing to the same directory on the NFS file
system. Using one of the instances, a data store was created with the following
command:
dstore create default
Next, the following commands were run on each host:
set output out
rwtest 500 5000
The rwtest
shell command has been designed specifically to detect
replication errors and is described fully in Section 15.2.40. When invoked as
shown, the test attempts to commit a transaction every 2.75 s on average.
Running the command on three instances simultaneously results in frequent write
contention events.
After 72 hours, the rwtest
command was interrupted by issuing Ctrl-C
in
the terminal on each host. This produces a final report that shows the total
number of transactions successfully committed by each instance. The sum of
these numbers was found to match the data store version minus 1 (the initial
data store version) as expected. If more than one of the instances had
concluded that it had successfully created any given version, the sum of these
numbers would be higher. If in any iteration of the test loop on any of the
three instances the content of the data store differed from the expected
content, which is known for each data store version, the test would have
stopped with an error.
The above procedure constitutes a minimum test for qualifying file systems (and
the associated configuration options) for production use in scenarios where
write contention may occur. Users planning deployments of RDFox that use the
file-sequence
persistence option are advised to conduct their own testing
using this procedure. The degree of write contention can be varied in the test
by changing the numeric parameters to the command which represent the minimum
and maximum duration in milliseconds between iterations of the test loop.
It is worth noting that the atomic operation described above is only required
in situations where there is a risk of write contention and that a broader
range of file systems may be safe to use under the constraint that write
contention will not occur. This can be achieved by ensuring (externally) that
all writes will be committed via the same nominated instance. Some approaches
to this are reviewed in Section 21.3. To
qualify file systems for use in such setups, the rwtest
command can be
invoked with the read-only
argument on all but one of the hosts.
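For example, the nominated write instance and the remaining read-only instances could be exercised as follows; combining the read-only argument with the numeric parameters in this way is an assumption that should be checked against the rwtest command reference:

# On the single nominated write instance:
rwtest 500 5000

# On every other instance sharing the server directory:
rwtest 500 5000 read-only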
Note
On UNIX operating systems, RDFox uses the link
system call as the
atomic commit operation, while on Windows MoveFileW
is used. The
EEXIST (UNIX) and ERROR_ALREADY_EXISTS (Windows) error codes are interpreted to mean that the commit has failed because another instance successfully committed a change first.
13.2.2.2. Replication Performance¶
In order for a change to be replicated between instances, the instance writing the change must successfully commit it to the path reserved for the new version number, and the other instances must then discover it and apply the changes to their own in-memory copies of the data. The time taken for this process is called replication lag.
In all, there are three mechanisms by which a receiving instance can discover
new version files. The first is through polling. The poll interval is
controlled by the file-system-poll-interval
server parameter which has a
default of 60 s. A separate poll interval is established for each persisted
component. This means that, in the case of a server with three persisted data
stores, the file system will be polled five times in each interval: once for
new server versions, once for new role manager versions, and once for each of
the three data stores. It is generally desirable to keep polling intervals long
to minimise load on the file system so, while this mechanism helps bound
worst-case replication lag, it is unsuitable for achieving low average-case
replication lag.
The second mechanism by which new version files are discovered is when a commit fails because the local version has fallen behind the highest persisted version. In this case, the instance will apply the new version and any others that have been created since as soon as the failed transaction has been rolled back. This mechanism is provided to reduce the time taken for an instance to catch up with the latest changes after a commit fails due to write contention. It will not, in most cases, be useful for achieving low average-case replication lag.
The third mechanism for new version files to be discovered is by notification over UDP. This mechanism gives the lowest possible average-case replication lag. To activate it, the notifications-address server parameter must be set to a host name and UDP port number separated by a plus (+) symbol. An instance configured this way registers itself for notifications by writing the given notifications address to the server directory itself. For the mechanism to work, the host name must be resolvable by the other instances that share the server directory, and UDP packets must be able to flow freely from the other instances to the specified port. With this mechanism in place, it should be possible to achieve sub-second replication lag for instances within the same data center, assuming that changes are small.
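As a sketch, an instance could be configured to receive notifications as follows; the host name and port number are hypothetical, and the exact command-line syntax for supplying server parameters should be checked against the relevant section of this documentation:

RDFox -server-directory /var/lib/rdfox -persist-roles file-sequence -persist-ds file-sequence -notifications-address rdfox-node-1.internal.example.com+5731 daemon

With this setting, the instance registers rdfox-node-1.internal.example.com+5731 in the server directory; the host name must therefore be resolvable by the other instances sharing that directory, and UDP traffic to port 5731 must be allowed to reach this host.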