13. Persistence
RDFox® is a main-memory system, which means that a full copy of the data loaded into the system is always held in main memory to support query answering and other operations. It is, however, possible to configure RDFox to incrementally save various types of information to persistent storage so that it may be restored in a subsequent or concurrent session.
There are three reasons to enable persistence. The first is when RDFox is being
used as the System of Record for the data it contains. In
this case, persisting to disk is essential in order to achieve acceptable
levels of data durability. The second reason is to improve restart performance.
Using persistence achieves this because it is generally faster for RDFox to
reload data stored in its native format than it is to import it from other
formats, or to copy it from another data store. The third reason is so that
changes to the data store can be continuously replicated amongst several RDFox
instances. Only the file-sequence
persistence type described in
Section 13.2.1.2 supports this last goal.
Note
Persistence does not increase the capacity of an RDFox data store beyond what can be stored in memory.
RDFox’s persistence is compliant with the ACID properties described in Section 11.
The persisted state of an RDFox instance consists of the following types of information:
information about all data stores in the system (also known as the server catalog),
information about the roles used for access control, and
the data loaded into each data store.
Each version of the server catalog is uniquely identified by a server version number, and each distinct state of a data store is uniquely identified by a data store version number (see Section 4). In addition, each distinct list of roles used for access control is identified by a unique role manager version number.
13.1. The Server Directory
In order to use persistence, an RDFox server must be configured with a server
directory which will contain all of the server’s persisted content and
settings. See the documentation of the server-directory
parameter in
Section 4.3.
Care must be taken to ensure that the disk containing the server directory has sufficient space to accommodate the persisted data. The amount of space required will depend on the size of the data store(s) and the frequency of changes to the data. The disk should be continuously monitored to ensure that it has sufficient free space to accommodate any new data that will be added. Additional disk space is required temporarily when persistent data stores are compacted and when a server directory is upgraded.
To ensure smooth operation, a persistent server should have uninterrupted read and write access to its server directory. Specifically, the server must have complete freedom to create, delete, or move files in the server directory. If other processes access files within a server directory concurrently with an RDFox server, the server’s operations may fail or behave unexpectedly. It is therefore advisable to add the server directory to the exclusion list of any program that could interfere with its file operations. Such programs include, but are not limited to, anti-virus and anti-malware programs, security applications, and file indexing services.
The remaining sub-sections of the current section describe key points in the lifecycle of a server directory.
13.1.1. Initialization
Before starting a persistent server, its server directory must be initialized for the desired persistence option. The initialization step creates the server directory, if it does not already exist, and creates the necessary files to store the server catalog and roles database. Initializing a server directory also captures any server parameters specified at the time of initialization into the server parameters file to serve as defaults for future server sessions. When initializing a server directory using the RDFox executable, parameters for the endpoint are also captured and written to the endpoint parameters file (see Section 19.2.1).
13.1.2. Upgrade
RDFox can upgrade the contents of a server directory that has been populated by an earlier version so that it can be used with the current version. Upgrade is supported from at least the immediately preceding persistence version, and possibly from earlier versions.
The upgrade procedure uses two auxiliary directories, the new and saved directories, both of which are within the same folder as the existing server directory. First, the upgraded server directory is created as the new directory. This step works by creating, within the new directory, modified copies of any files in the existing server directory that need to be changed to conform to the new persistence format, and then hard-linking or copying any files that do not need to be modified to give a complete, upgraded directory. Next, the existing server directory is moved to the saved directory path and then the new server directory is moved to the original server directory path. Finally, the saved directory is deleted. This process ensures that the server directory is always in a recoverable state, even if the upgrade process is interrupted.
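The directory swap described above can be illustrated with a short Python sketch. It is not RDFox's implementation: the ".new" and ".saved" suffixes, the rewrite hook, and the assumption that the server directory contains only regular files (no subdirectories) are all hypothetical.

```python
import os
import shutil


def upgrade_directory(server_dir: str, rewrite) -> None:
    """Sketch of an upgrade that builds a new directory and swaps it in.

    rewrite(name, src, dst) is a hypothetical hook: it writes a modified copy
    of src to dst and returns True, or returns False if the file is unchanged.
    Assumes server_dir contains only regular files (no subdirectories).
    """
    server_dir = os.path.abspath(server_dir)
    parent, base = os.path.split(server_dir)
    new_dir = os.path.join(parent, base + ".new")      # hypothetical name
    saved_dir = os.path.join(parent, base + ".saved")  # hypothetical name

    # 1. Build the complete upgraded directory beside the original.
    os.mkdir(new_dir)
    for name in os.listdir(server_dir):
        src = os.path.join(server_dir, name)
        dst = os.path.join(new_dir, name)
        if not rewrite(name, src, dst):
            try:
                os.link(src, dst)        # unchanged file: share its storage
            except OSError:
                shutil.copy2(src, dst)   # hard links unavailable: full copy

    # 2. Swap the directories with two renames on the same volume.
    os.rename(server_dir, saved_dir)
    os.rename(new_dir, server_dir)

    # 3. Only now discard the pre-upgrade content.
    shutil.rmtree(saved_dir)
```

If the sketch stops during step 1, the original directory is untouched. If it stops between the two renames, the original content is in the saved directory and the complete upgraded content is in the new directory, so either can be moved back into place.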
Warning
While the above procedure has been designed for safety, operators should always take a backup of the server directory immediately before upgrading as a precaution. If you are unable to start the RDFox server after the upgrade using either the old or the new version, restore the backup, return to using the old version, and contact Oxford Semantic Technologies for assistance.
The procedure described above results in the following requirements which the operator must satisfy in order to successfully perform an upgrade:
No RDFox server, or other process, can be using the directory to be upgraded during any part of the upgrade process.
The server directory must be on the same volume as its parent directory so that the auxiliary directory paths will be on the same volume as the existing directory. This is necessary to ensure that the two move operations succeed.
The process performing the upgrade must be able to write to the parent directory of the server directory in order to create the new directory and move the existing and new directories.
There should be enough disk space on the volume to hold a complete copy of the server directory. This is a worst-case requirement as hard-linking of files that do not need to change may substantially reduce the additional disk space that is required. Operators should consider compacting all data stores using the older RDFox version to reduce the disk space used before upgrade begins.
Upgrade can be invoked using the upgrade mode of the RDFox executable (see
Section 18.4) or by using the upgrade(...) method of the RDFoxServer class in
the Java API.
13.1.3. Backup and Restore
RDFox does not expose any specific functionality for backing up and restoring persistent servers, but backups can be taken by recursively copying the complete server directory to a new location. A server directory can be restored by recursively copying a backed-up server directory to a new location and specifying the new location as the server directory for a new RDFox server instance.
With the file-sequence persistence option, backups can be taken at any
time, even while replicas are running. With the file persistence option,
however, backups should only be taken after ensuring that no writes to the
server directory will take place (for example, by shutting down the server).
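Because backup and restore are plain recursive copies, they can be scripted with standard tools. The Python sketch below is one hedged way to do it; the timestamped "server-dir-" folder naming is hypothetical, and the restored location is meant to be passed to a new RDFox server through the server-directory parameter.

```python
import os
import shutil
import time


def backup_server_directory(server_dir: str, backup_root: str) -> str:
    """Recursively copy the complete server directory to a new location.

    With file persistence, call this only when no writes can occur
    (for example, after shutting the server down).
    """
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = os.path.join(backup_root, "server-dir-" + stamp)  # hypothetical naming
    shutil.copytree(server_dir, target)  # raises if target already exists
    return target


def restore_server_directory(backup_dir: str, new_server_dir: str) -> str:
    """Copy a backup to a fresh location to use as a new server directory."""
    shutil.copytree(backup_dir, new_server_dir)
    return new_server_dir
```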
13.1.4. Decommissioning
To decommission a persistent server, simply stop all running replicas and delete the server directory.
13.2. Configuration
This section describes RDFox’s configuration options related to persistence.
13.2.1. Persistence Options
The persistence option to be used for a particular server is controlled by the
persistence server parameter, which can be set to off (no persistence), file,
or file-sequence. The file and file-sequence options are described in the
following sections. Even when persistence is enabled at the server level, it
may be disabled at the data store level by setting the persistence data store
parameter to off.
13.2.1.1. file persistence
The file
persistence option stores the persisted content (the data store
catalog, the list of available roles, and the content of a data store) in a
single file. The data store catalog and the list of available roles are always
saved as a snapshot. The content of a data store is saved as a snapshot
followed by zero or more deltas. When a data store is compacted, the saved data
is replaced by a fresh snapshot, which is extended by deltas as further
transactions are committed on the data store. The process of compacting a data
store first saves the current snapshot into a new file, and then it atomically
replaces the old file with the new file. Consequently, compacting a data store
eventually frees the storage occupied by any deltas written after the snapshot,
but it may temporarily use additional disk space in order to hold both the old
and the new files.
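The compaction step follows the familiar "write a new file, then atomically replace the old one" pattern. The Python sketch below shows that pattern in isolation. It is POSIX-only (it syncs the containing directory), the ".compacting" temporary name is hypothetical, and RDFox's real file format is not modelled.

```python
import os


def compact(store_path: str, snapshot: bytes) -> None:
    """Write a fresh snapshot to a new file, then atomically swap it in.

    Until os.replace() runs, the old file (snapshot plus deltas) is intact,
    so a crash leaves either the old or the new content, never a mix.
    Both files exist on disk meanwhile: the temporary extra space.
    """
    tmp_path = store_path + ".compacting"  # hypothetical temporary name
    with open(tmp_path, "wb") as f:
        f.write(snapshot)
        f.flush()
        os.fsync(f.fileno())           # make the new file durable first
    os.replace(tmp_path, store_path)   # atomic rename over the old file
    # Sync the containing directory so the rename itself is durable (POSIX).
    dir_fd = os.open(os.path.dirname(os.path.abspath(store_path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```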
A server directory containing this persistence type can only be used by one RDFox server at a time. RDFox ensures that this is the case by seeking an exclusive lock on the directory when the server instance is created and exiting if the lock cannot be obtained.
RDFox will update the persisted data in a way that is in most cases resilient to RDFox crashing or the system losing power during the saving process. Specifically, if saving of a delta is interrupted for any reason, RDFox will undo any changes made to the data the next time the RDFox process is restarted; in this way, RDFox provides ACID guarantees for transaction updates.
While RDFox guarantees the consistency of persisted data in the event of RDFox crashes and power failures, RDFox is not immune to external damage to persisted files. RDFox will attempt to detect such corruption as follows.
When data is encrypted, the encryption algorithm itself offers protection against corruption: decrypting a damaged file will produce data that has a very high chance of being detected by RDFox as invalid.
When data is not encrypted, RDFox will use the CRC64 checksum algorithm to detect data corruption.
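To show how a checksum detects damage, here is a bitwise CRC64 in Python. The documentation does not say which CRC64 variant RDFox uses. This sketch assumes CRC-64/XZ (the ECMA-182 polynomial in reflected form, with an all-ones initial value and final XOR) purely for illustration.

```python
def crc64_xz(data: bytes) -> int:
    """Bitwise CRC-64/XZ (assumed variant, for illustration only)."""
    crc = 0xFFFFFFFFFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xC96C5795D7870F42  # reflected ECMA-182 polynomial
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFFFFFFFFFF


def verify_block(data: bytes, stored_checksum: int) -> bool:
    """Return False if a block no longer matches the checksum saved with it."""
    return crc64_xz(data) == stored_checksum
```

A CRC detects every single-bit error, so flipping any one bit of a block makes verify_block return False for the checksum computed before the damage.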
RDFox will refuse to start if corruption is detected in any part of the persisted data. In such cases, the only possible course of action is to restore a recent state of the RDFox database from backup. Consequently, it is highly recommended to create periodic backups of the entire server directory. It is safe to create copies of the server directory even if an RDFox instance is running, provided that the RDFox instance does not write any data during the backup period. If RDFox tries to save a transaction or change the server catalog while a backup is in progress, the backed up data may be invalid (i.e., it cannot be used in future to restore the state of an RDFox server). Consequently, backups of the server directory should only be taken during a maintenance window in which no read/write transactions are performed.
13.2.1.1.1. System Requirements
In order to exclusively lock the server directory, file persistence uses
the flock system call with the LOCK_EX flag on Linux, and the CreateFileW
function with dwShareMode set to 0 on Windows. In both cases, the underlying
file system must faithfully and correctly support the locking semantics of
those calls.
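The Linux behaviour can be reproduced with Python's fcntl.flock wrapper around the same flock call. This is a minimal Linux-only sketch, not RDFox's code. LOCK_NB is an addition here: it makes a second attempt fail immediately instead of waiting, which mirrors RDFox exiting when the lock cannot be obtained.

```python
import fcntl
import os


def lock_server_directory(path: str) -> int:
    """Take an exclusive, non-blocking flock() on a directory (Linux).

    Returns the open descriptor; the lock is held until it is closed.
    Raises BlockingIOError if another open file description holds the lock.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise
    return fd
```

A file system that silently ignores flock would let the second call succeed, which is why the underlying file system must honour these semantics.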
Correctness of file
persistence relies on the following important
system-level considerations.
To guard against sudden power failure, RDFox writes data in multiples of the disk sector size. However, determining the sector size programmatically typically requires administrative privileges on modern operating systems. Consequently, RDFox relies on users to configure the sector size correctly. RDFox will function correctly as long as its configured sector size is a multiple of the disk’s actual sector size; however, using a sector size that is larger than strictly necessary may waste a very small amount of storage per transaction. Most disks on the market today use sectors of 512 or 4096 bytes, so RDFox uses a sector size of 4096 bytes by default, as this ensures correctness on commonly used hardware. If RDFox is used on a disk with a different sector size, the correct sector size must be set explicitly using the persistence.disk-sector-size server parameter.
RDFox relies on system calls that ensure that the data is persisted on disk (FlushFileBuffers on Windows, fcntl with the F_BARRIERFSYNC option on macOS, and fdatasync on Linux). It is well documented that certain disks and disk drivers will “lie” to the operating system; for example, some disks will report that the data has been fully persisted even if the data has not yet been flushed from the disk controller’s cache. Modern operating systems do not provide a way of detecting such situations, and so RDFox has no choice but to “believe” the operating system. If RDFox is used with a disk that “lies” about persistence, data can be lost in case of unexpected power failure or kernel crash. Please check with your disk’s manufacturer whether their product is safe to use in a transactional application.
On macOS, RDFox uses fcntl with the F_BARRIERFSYNC option to synchronize data with external storage. This system call is well known not to offer hard persistence guarantees, and in fact it has been observed in practice that data can be kept in disk buffers for a few seconds after the system call is issued. The F_FULLFSYNC option offers stronger persistence guarantees, but is known to cause considerable slowdown and can introduce considerable wear and tear on Apple’s SSDs; moreover, even that system call does not completely guarantee that no data is lost in case of power failure. Please refer to Apple’s documentation about these system calls and their recommendation to use F_BARRIERFSYNC. Consequently, persisted data is not 100% safe from power failure on macOS. However, in our experience, Mac computers are rarely used to run production-grade databases, and moreover Mac laptops (which are the most common form of Mac computers) are equipped with a battery that considerably reduces the chances of sudden power failure. Thus, relaxing consistency in order to improve performance and reduce wear and tear is acceptable in typical usage scenarios of RDFox on macOS. Please contact Oxford Semantic Technologies if you plan to use RDFox in production on macOS.
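The sector-size rule can be made concrete with a little arithmetic. The Python sketch below is a simplified model that assumes each write is padded up to whole sectors. RDFox's actual on-disk layout is not described here.

```python
def padded_write_size(length: int, sector_size: int = 4096) -> int:
    """Round a write of `length` bytes up to a whole number of sectors."""
    if sector_size <= 0:
        raise ValueError("sector size must be positive")
    return -(-length // sector_size) * sector_size  # ceiling division


def is_safe_sector_size(configured: int, actual: int) -> bool:
    """A configured size is safe if it is a multiple of the disk's real sector size."""
    return configured % actual == 0
```

Under this model, a 1,000-byte write occupies 1,024 bytes with a 512-byte sector size but 4,096 bytes with the default, which is the small per-transaction waste mentioned above. Configuring 4096 on a 512-byte-sector disk is safe (4096 is a multiple of 512), whereas configuring 512 on a 4096-byte-sector disk is not.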
13.2.1.2. file-sequence persistence
The file-sequence
persistence option stores the persisted content (whether
it is the data store catalog, roles database, or the content of a data store)
in a sequence of files with one file per version. The path of each file is
determined by the relevant version number (server, role manager or data store).
Unlike with file
persistence, a server directory using file-sequence
persistence may be shared by several RDFox servers at once. So long as the
underlying file system meets the criteria described in
Section 13.2.1.2.1, any modification successfully
made via any of these instances will be replicated to the other instances
eventually. This provides the basis for deploying RDFox in a leaderless, high
availability (HA) configuration. Please refer to
Section 13.2.1.2.2 for information about replication lag and
Section 21 for details of how to set up HA deployments.
Also unlike the file
persistence type, file-sequence
server directories
can be backed up during write operations, removing the need for maintenance
windows in order to collect backups. This is because the files that form the
file sequence never contain partial deltas.
When data stores persisted with this persistence type are compacted, old snapshots and deltas are replaced with small files, known as tombstones, indicating that the original file content has been deleted from the disk.
Note
In order to use the file-sequence
persistence option, the license key
provided at startup must explicitly support the feature.
Note
The file-sequence
persistence option is EXPERIMENTAL
on Windows and
macOS.
13.2.1.2.1. System Requirements
In order to make it safe for any of the RDFox instances sharing a server directory to persist new transactions, each instance must add files to file sequences in a way that will fail if the version they are trying to create has already been created by another instance. This requires that the file system containing the server directory supports an atomic move or link operation that returns a distinct error code when the target path is already occupied.
The Network File System (NFS) protocol meets the above requirement through the
LINK operation (documented in Section 18.9 of RFC 5661,
<https://datatracker.ietf.org/doc/html/rfc5661#section-18.9>). Oxford Semantic
Technologies has successfully tested the file-sequence
persistence option
under sustained write contention on the following managed NFS services:
In each case, testing was performed by provisioning an instance of the file
system and mounting it to three Linux hosts in separate availability zones
using the mounting procedure recommended by the service provider. An instance
of RDFox was then started in shell
mode on each host with the
server-directory
parameter pointing to the same directory on the NFS file
system. Using one of the instances, a data store was created with the following
command:
dstore create default
Next, the following commands were run on each host:
set output out
rwtest 500 5000
The rwtest
shell command has been designed specifically to detect
replication errors and is described fully in Section 15.2.41. When invoked as
shown, the test attempts to commit a transaction every 2.75 s on average.
Running the command on three instances simultaneously results in frequent write
contention events.
After 72 hours, the rwtest
command was interrupted by issuing Ctrl-C
in
the terminal on each host. This produces a final report that shows the total
number of transactions successfully committed by each instance. The sum of
these numbers was found to equal the final data store version minus 1 (the
initial data store version), as expected. If more than one of the instances had
concluded that it had successfully created any given version, the sum of these
numbers would be higher. If in any iteration of the test loop on any of the
three instances the content of the data store differed from the expected
content, which is known for each data store version, the test would have
stopped with an error.
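The two numeric checks in this test can be written out in Python. Two assumptions apply here. First, the pause between iterations is drawn uniformly between the two rwtest arguments, which is consistent with the 2.75 s average quoted for rwtest 500 5000 but is not stated in the documentation. Second, the example commit counts in the usage note are invented for illustration.

```python
def mean_interval_ms(min_ms: float, max_ms: float) -> float:
    """Mean pause if each wait is drawn uniformly from [min_ms, max_ms] (assumed)."""
    return (min_ms + max_ms) / 2


def commit_counts_consistent(committed_per_instance, final_version, initial_version=1):
    """True if the per-instance totals account exactly for every new version.

    If two instances had both believed they created the same version, the
    totals would add up to more than final_version - initial_version.
    """
    return sum(committed_per_instance) == final_version - initial_version
```

For example, hypothetical totals of 410, 395 and 402 commits are consistent with a final data store version of 1208, because 410 + 395 + 402 = 1207 = 1208 - 1.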
The above procedure constitutes a minimum test for qualifying file systems (and
the associated configuration options) for production use in scenarios where
write contention may occur. Users planning deployments of RDFox that use the
file-sequence
persistence option are advised to conduct their own testing
using this procedure. The degree of write contention can be varied in the test
by changing the numeric parameters of the command, which represent the minimum
and maximum durations in milliseconds between iterations of the test loop.
It is worth noting that the atomic operation described above is only required
in situations where there is a risk of write contention and that a broader
range of file systems may be safe to use under the constraint that write
contention will not occur. This can be achieved by ensuring (externally) that
all writes will be committed via the same nominated instance. Some approaches
to this are reviewed in Section 21.3. To
qualify file systems for use in such setups, the rwtest
command can be
invoked with the read-only
argument on all but one of the hosts.
Note
On UNIX operating systems, RDFox uses the link system call as the
atomic commit operation, while on Windows MoveFileW is used. The
EEXIST (UNIX) and ERROR_ALREADY_EXISTS (Windows) error codes are
interpreted to mean that the commit has failed because another instance
successfully committed a change first.
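A UNIX-only Python sketch of a link-based commit follows. Python's os.link raises FileExistsError when link fails with EEXIST, so at most one writer can claim a given version. The zero-padded version file names and the staging file are hypothetical, and the Windows MoveFileW path is not shown.

```python
import os


def version_path(sequence_dir: str, version: int) -> str:
    # Hypothetical layout: one file per version, named by its version number.
    return os.path.join(sequence_dir, f"{version:020d}")


def try_commit(sequence_dir: str, version: int, payload: bytes) -> bool:
    """Try to claim `version` in a shared file sequence.

    The payload is written to a private staging file first, so the version
    path never holds a partial file. link() then fails with EEXIST if another
    instance already created that version.
    """
    staging = os.path.join(sequence_dir, f".staging-{os.getpid()}-{version}")  # hypothetical
    with open(staging, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    try:
        os.link(staging, version_path(sequence_dir, version))
        return True
    except FileExistsError:   # EEXIST: another writer committed this version first
        return False
    finally:
        os.unlink(staging)    # the staging name is no longer needed
```

An instance that receives False would then need to catch up with the newer version before retrying, as described in the next section.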
13.2.1.2.2. Replication Performance
In order for a change to be replicated between instances, the instance writing the change must successfully commit it to the path reserved for the new version number, and the other instances must then discover it and apply the changes to their own in-memory copies of the data. The time taken for this process is called replication lag.
In all, there are three mechanisms by which a receiving instance can discover
new version files. The first is through polling. The poll interval is
controlled by the persistence.file-system-poll-interval
server parameter
which has a default of 60 s. A separate poll interval is established for each
persisted component. This means that, in the case of a server with three
persisted data stores, the file system will be polled five times in each
interval: once for new server versions, once for new role manager versions, and
once for each of the three data stores. It is generally desirable to keep
polling intervals long to minimise load on the file system so, while this
mechanism helps bound worst-case replication lag, it is unsuitable for
achieving low average-case replication lag.
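The polling load follows directly from the rule above: one poll for server versions, one for role manager versions, and one per persisted data store in each interval. The Python helper below simply encodes that rule, using the default 60 s interval.

```python
def polls_per_interval(num_persisted_data_stores: int) -> int:
    # One poll for server versions, one for role manager versions,
    # and one for each persisted data store.
    return 2 + num_persisted_data_stores


def polls_per_hour(num_persisted_data_stores: int, interval_s: float = 60.0) -> float:
    return polls_per_interval(num_persisted_data_stores) * 3600 / interval_s
```

With three persisted data stores, this gives 5 polls per interval, or 300 polls per hour at the default interval. Raising the interval to 600 s cuts this to 30 polls per hour, at the cost of a longer worst-case replication lag.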
The second mechanism by which new version files are discovered is when a commit fails because the local version has fallen behind the highest persisted version. In this case, the instance will apply the new version and any others that have been created since as soon as the failed transaction has been rolled back. This mechanism is provided to reduce the time taken for an instance to catch up with the latest changes after a commit fails due to write contention. It will not, in most cases, be useful for achieving low average-case replication lag.
The third mechanism for new version files to be discovered is by notification
over UDP. This mechanism gives the lowest possible average-case replication
lag. To activate this mechanism, the persistence.notifications-address
server parameter must be set to a host name and UDP port number separated by a
plus (+) symbol. An instance configured this way will register itself for
notifications by writing the given notifications address to the server
directory itself. For the mechanism to work, the host name must be resolvable
by the other instances that share the server directory and UDP packets must be
able to flow freely from the other instances to the specified port. With this
mechanism in place, it should be possible to achieve sub-second replication lag
for instances within the same data center assuming that changes are small.
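Before setting persistence.notifications-address, it can help to check the value's shape and whether its host name resolves. The Python sketch below does both. It cannot confirm that UDP packets actually reach the port, which depends on network and firewall configuration. The host name in the test is hypothetical.

```python
import socket
from typing import Tuple


def parse_notifications_address(value: str) -> Tuple[str, int]:
    """Split a 'host+port' value into a host name and a UDP port number."""
    host, sep, port_text = value.rpartition("+")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected 'host+port', got {value!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"UDP port out of range: {port}")
    return host, port


def check_resolvable(value: str) -> None:
    """Raise socket.gaierror if the host cannot be resolved from this machine."""
    host, port = parse_notifications_address(value)
    socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)
```

Because every instance sharing the server directory must resolve the host name, the check should be run on each of those hosts, not only on the instance being configured.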
13.2.2. Encryption
When using any form of persistence, an RDFox server can be configured to encrypt
and decrypt the data stored in the server directory by supplying a
base64-encoded key via the persistence.encryption.key server parameter. By
default, the AES-256-CBC cipher, which requires a 256-bit AES key, is used,
but this can be changed by setting the persistence.encryption.algorithm
parameter. See the full documentation of the above parameters in
Section 4.3 for more details.
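A suitable value for the default cipher is 32 random bytes (256 bits), base64-encoded. One way to generate such a value with the Python standard library is shown below. How the key is then stored and supplied to the server is left to the deployment.

```python
import base64
import secrets


def new_encryption_key() -> str:
    """Generate a random 256-bit key and return it base64-encoded."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")
```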