2. RDFox Features and Requirements
2.1. RDFox Features
RDFox® provides the following main functionality:
RDFox can import RDF triples, rules, and OWL 2 and SWRL axioms either programmatically or from files of certain formats (see Section 8.2 for details). RDF data can be validated using the SHACL constraint language. Additionally, RDFox can access information from external data sources, such as CSV files, relational databases, or Apache Solr (see Section 7).
Triples, rules and axioms can be exported into a number of different formats (see Section 8.3 for details). Furthermore, the contents of the system can be incrementally saved into a binary file, which can later be loaded to restore the system’s state.
RDFox can answer SPARQL 1.1 queries (see Section 9) and provides functionality for monitoring query answering and accessing query plans.
RDFox supports materialization-based reasoning, in which all triples that logically follow from the facts and rules in the system are materialized as new triples (see Section 10). Materializations can be updated incrementally, which means that reasoning does not need to be performed from scratch when the information in the system changes. Furthermore, the results of reasoning can be explained, which means that RDFox is able to return a proof for any new fact added to the store through materialization.
RDFox supports ACID transactional updates (see Section 11 for further details on transactions).
Individual information elements in the system can be assigned different access permissions for different users (see Section 12 for further details on access control).
2.2. Software Archive
RDFox is distributed as an archive containing the following files and directories:
- RDFox (macOS/Linux) or RDFox.exe (Windows): a stand-alone executable that can be used to run RDFox on the command line.
- lib: a directory containing the following libraries:
  - JRDFox.jar: the Java API to the RDFox engine.
  - libRDFox.dylib (macOS), libRDFox.so (Linux), or libRDFox.dll (Windows): a dynamic/shared library that implements the C and the Java Native APIs of RDFox.
  - libRDFox.lib (Windows only): the import library needed for linking libRDFox.dll on Windows.
  - libRDFox-static.a (macOS and Linux) or libRDFox-static.lib (Windows): a static library that implements the C API of RDFox.
- include: a directory containing include files providing access to the C and C++ APIs.
- examples: a directory containing demonstration programs that show how to call RDFox as a library:
  - C: a directory containing a C source file demonstrating how to call RDFox via the C and C++ APIs. The directory also contains the scripts compile-shared-and-run.sh and compile-static-and-run.sh for macOS and Linux, and compile-shared-and-run.bat and compile-static-and-run.bat for Windows, which can be used to build and run the demo. On macOS and Linux, the scripts assume that a C17-compliant version of gcc is available on the path. On Windows, the scripts assume that vcvars64.bat has been executed in the shell prior to execution.
  - C++: a directory containing a C++ source file demonstrating how to use RDFox via the C++ API. The directory also contains the scripts compile-shared-and-run.sh and compile-static-and-run.sh for macOS and Linux, and compile-shared-and-run.bat and compile-static-and-run.bat for Windows, which can be used to build and run the demo. On macOS and Linux, the scripts assume that a version of g++ supporting C++11 is available on the path. On Windows, the scripts assume that vcvars64.bat has been executed in the shell prior to execution.
  - Java: a directory containing source code for a program demonstrating how to call RDFox via the Java API. The examples/Java/build.xml Apache Ant script can be used to compile and run the program.
2.3. Interfaces
Users and developers can interact with RDFox through the following interfaces:
- CLI
RDFox comes with a built-in shell that can be used to interact with and control the RDFox Server. The shell can be launched together with an RDFox Server instance using the shell or sandbox modes of the executable. Alternatively, the remote executable mode can be used to connect to and use the shell interface of a remote RDFox Server. See Section 15 for details.
- RESTful API
When RDFox’s endpoint is running, clients can interact with the associated RDFox server via a RESTful API. For details of the RESTful API, see Section 16. For details of how to configure and start the endpoint, see Section 19.
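As an illustrative sketch only, a running local endpoint could be queried with curl as shown below. The port 12110 is assumed here to be the endpoint's default, and the /datastores path is an assumption of this sketch; see Section 16 for the authoritative resource layout.

```shell
# Sketch: assumes an RDFox endpoint is already running at localhost:12110.
# The /datastores resource (listing the server's data stores) is assumed;
# consult Section 16 for the actual RESTful API.
curl -i http://localhost:12110/datastores
```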
- Java API
RDFox can be embedded into Java applications and called via the Java API described in Section 16 and Section 17. To use JRDFox in your project, simply add JRDFox.jar to your classpath, and make sure that the path to the dynamic library is correctly specified when starting your program using the following JVM option:

-Djava.library.path=<path to the dynamic library>
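For example, assuming the archive's lib directory was unpacked to /opt/rdfox/lib (a hypothetical path) and a hypothetical application class MyApp, a launch command could be assembled as follows:

```shell
# Hypothetical install location; adjust to wherever the archive was unpacked.
RDFOX_LIB_DIR=/opt/rdfox/lib

# The dynamic library (libRDFox.so/.dylib/.dll) sits next to JRDFox.jar, so
# the same directory serves both the classpath and java.library.path.
LAUNCH_CMD="java -Djava.library.path=$RDFOX_LIB_DIR -cp $RDFOX_LIB_DIR/JRDFox.jar:. MyApp"
echo "$LAUNCH_CMD"
```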
- C/C++ APIs
RDFox can be dynamically loaded and called through a C API. A C++ wrapper of this API is also provided.
- GUI
As well as serving the REST API, the RDFox endpoint serves the RDFox Console, a browser-based user interface supporting basic querying and visualization of data store content. When the endpoint is running, the Console can be loaded by visiting http[s]://<hostname>:<port>/console/, where <hostname> and <port> are the host name and port number at which the endpoint can be reached.
2.4. System Requirements
2.4.1. Software
2.4.1.1. Operating Systems
RDFox supports the following operating system versions:
- Windows
Windows 10 or higher
- Mac
macOS 10.14 or higher
- Linux
Ubuntu 18.04 or higher
Amazon Linux 2 or higher
Additionally, RDFox can be run using Docker. See Section 22 for details.
2.4.1.2. Third-party Software
Some RDFox features depend on dynamic-link libraries (DLLs) from the third-party software packages listed below. In each case, the DLL or DLLs are loaded on demand the first time the dependent functionality is accessed within a session. This means that RDFox can be deployed without these packages if the dependent functionality is not needed.
- OpenSSL
Used to implement TLS for RDFox's HTTP client and server code, as well as for persistence and session encryption. The search paths used to locate the DLLs from this package when the endpoint is starting can be specified via the RDFOX_LIBCRYPTO_PATH and RDFOX_LIBSSL_PATH environment variables. If the environment variables are not set, the default values shown in the following table are used.

| Platform | libcrypto search path | libssl search path |
|----------|-----------------------|--------------------|
| Windows  | libcrypto-3-x64.dll   | libssl-3-x64.dll   |
| macOS    | libcrypto.3.dylib     | libssl.3.dylib     |
| Linux    | libcrypto.so          | libssl.so          |

The resolved libraries must have version v3.0.0 or higher.
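For instance, on Linux, OpenSSL libraries installed under a non-default prefix (the path below is hypothetical) could be selected before starting the endpoint:

```shell
# Hypothetical OpenSSL 3 installation prefix; adjust for your system.
export RDFOX_LIBCRYPTO_PATH=/opt/openssl3/lib/libcrypto.so
export RDFOX_LIBSSL_PATH=/opt/openssl3/lib/libssl.so
# The variables are read when the endpoint is starting, so start the
# endpoint from this shell afterwards.
echo "$RDFOX_LIBCRYPTO_PATH"
```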
- libpq
Used to access PostgreSQL data sources. The search path used to locate libpq when registering one of these data sources can be specified via the RDFOX_LIBPQ_PATH environment variable. If the environment variable is not set, the default value shown in the following table is used.

| Platform | libpq search path |
|----------|-------------------|
| Windows  | libpq.dll         |
| macOS    | libpq.dylib       |
| Linux    | libpq.so          |

The resolved library should be of a version that matches that of the PostgreSQL server being connected to. The current release was built and tested with both library and server from PostgreSQL 14; however, it will also work with a wider range of versions, both higher and lower. Please test your configuration and contact OST support as needed.
- iODBC or unixODBC
Used to access external data sources via ODBC. The search path used to locate the DLL that will manage drivers for accessing the ODBC source can be specified via the RDFOX_ODBC_DRIVER_MANAGER_PATH environment variable. If the environment variable is not set, RDFox will attempt to use the default search paths shown in the following table to load unixODBC and, if that fails and the platform is not Windows, iODBC.

| Platform | unixODBC search path | iODBC search path |
|----------|----------------------|-------------------|
| Windows  | odbc32.dll           | (not supported)   |
| macOS    | libodbc.dylib        | libiodbc.dylib    |
| Linux    | libodbc.so           | libiodbc.so       |

Although iODBC can be used, unixODBC is recommended. The current release was built and tested with unixODBC v2.3; however, it will work with a wider range of versions, both higher and lower. Please test your configuration and contact OST support as needed.
- libsqlite3
Used to access SQLite data sources. The search path used to locate the SQLite library can be specified via the RDFOX_LIBSQLITE_PATH environment variable. If the environment variable is not set, RDFox will attempt to use the default search paths shown in the following table.

| Platform | libsqlite3 search path |
|----------|------------------------|
| Windows  | libsqlite3.dll         |
| macOS    | libsqlite3.dylib       |
| Linux    | libsqlite3.so          |

The resolved library should be of a version that matches that of the SQLite file being connected to. The current release was built and tested with SQLite v3.49.1; however, it will also work with a wider range of versions, both higher and lower. Please test your configuration and contact OST support as needed.
- Lucene
Used to access Lucene data sources. The search paths used to locate the Lucene libraries when registering a Lucene data source can be specified via the jvm.options server parameter or the RDFOX_JVM_OPTIONS environment variable. The JVM options should include a -Djava.class.path entry listing the required Lucene libraries. The classpath separator is : (on Windows, ;). Any other JVM options can be specified as needed using the | separator. If a Lucene data source is used with JRDFox, the required Lucene libraries must be included in the classpath of the JVM running JRDFox. The Lucene libraries required on -Djava.class.path are shown in the following table with explanations of their purpose.

| Lucene library search path           | Description |
|--------------------------------------|-------------|
| lucene-core-<version>.jar            | (Required) Lucene core library. |
| lucene-queryparser-<version>.jar     | (Required) Query parsers and parsing framework. |
| lucene-backward-codecs-<version>.jar | (Optional) Provides access to Lucene indexes created with older major versions of Lucene. |

The resolved libraries must have version v9.6.0 or higher.
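As a sketch, the environment-variable form might look as follows on Linux. The jar locations are hypothetical, and the extra -Xmx1g option is included only to illustrate the | separator between distinct JVM options:

```shell
# Hypothetical jar locations; adjust to your Lucene installation. Note the
# ':' classpath separator (';' on Windows) and the '|' separator between
# distinct JVM options.
LUCENE_DIR=/opt/lucene
export RDFOX_JVM_OPTIONS="-Djava.class.path=$LUCENE_DIR/lucene-core-9.6.0.jar:$LUCENE_DIR/lucene-queryparser-9.6.0.jar|-Xmx1g"
echo "$RDFOX_JVM_OPTIONS"
```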
- libjvm
Used to access Lucene data sources with the RDFox executable. The search path used to locate the JVM library can be specified via the RDFOX_LIBJVM_PATH environment variable. This is required to embed the Java Virtual Machine (JVM) within RDFox. If the environment variable is not set, RDFox will attempt to use the default search paths shown in the following table.

| Platform | libjvm search path |
|----------|--------------------|
| Windows  | jvm.dll            |
| macOS    | libjvm.dylib       |
| Linux    | libjvm.so          |

The resolved library should match the version of the Java Runtime Environment (JRE) in use. This release was built and tested with Java 17, but it is also compatible with Java 11 or higher. Please test your configuration and contact OST support if needed.
For a list of other third-party components used within RDFox, see Acknowledgments.
2.4.1.3. License Key
Creating an RDFox Server requires a time-limited license key issued by Oxford Semantic Technologies. At server creation time, RDFox will search the following locations, in the order shown, for the license key:
1. the value of the license.content server parameter, if set
2. [RDFox executable only] the value of the RDFOX_LICENSE_CONTENT environment variable, if set
3. the content of the file specified via the license.file server parameter, if set
4. [RDFox executable only] the content of the file specified via the RDFOX_LICENSE_FILE environment variable, if set
5. [RDFox executable only] the content of the file RDFox.lic in the directory containing the running executable, if the file exists
6. the content of the file at the default value for the license.file server parameter, if the file exists
If a candidate key is found in one location, the remaining locations will not be
checked even if the candidate turns out to be invalid or expired. See
Section 4.3 for details of how to specify server parameters such
as license.content and license.file.
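For example, a license key file kept in a non-default location (the path below is hypothetical) can be supplied to the RDFox executable through the environment variable described above:

```shell
# Hypothetical path to the license key file; adjust as appropriate.
# RDFOX_LICENSE_FILE is consulted only by the RDFox executable, and only
# if neither license.content nor RDFOX_LICENSE_CONTENT is set.
export RDFOX_LICENSE_FILE=/etc/rdfox/RDFox.lic
echo "$RDFOX_LICENSE_FILE"
```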
2.4.2. Hardware
This section describes the hardware requirements for running RDFox.
2.4.2.1. Memory
RDFox is a main-memory data store and as such its performance is heavily dependent on access to a suitable amount of memory. The amount of memory required for a given application can be broken down into the following two components.
Fact storage cost is the amount of memory required to store the facts (triples or quads) including both those imported explicitly and those added by materialization. This component depends on the number of facts and other characteristics of the data set such as how many unique resources it contains and the size of those resources.
Operating memory cost is the amount of memory required for operations such as querying, reasoning, compaction, and assorted other activities. This component is proportional to the fact storage cost but the exact proportion varies considerably with the characteristics of the workload.
Fact storage costs typically vary between 45 and 85 bytes per fact. One should provision an additional 10-100% of this for operating memory costs. The following workload characteristics will usually increase the operating memory costs:
- high numbers of queries evaluated concurrently with updates,
- queries that return large result sets with the ORDER BY or DISTINCT keywords, and
- using large and/or complicated sets of Datalog rules.
A special case with high operating memory costs is that of HA replicas with the
highly-available setting for the persistence.snapshot-restore-mode
server parameter (see Section 4.3). These should reserve
at least 100% of the fact storage cost as additional memory, as this is
required to restore the data store from a snapshot.
While the above figures can be useful to estimate the memory requirements, an application should always be tested thoroughly to determine the actual memory requirements.
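These figures can be combined into a back-of-the-envelope provisioning estimate. The sketch below uses 65 bytes per fact and a 50% operating allowance, which are illustrative mid-range values rather than measurements:

```shell
# Illustrative estimate for one billion facts (explicit plus materialized).
NUM_FACTS=1000000000
FACT_STORAGE=$((NUM_FACTS * 65))   # mid-range of 45-85 bytes per fact
OPERATING=$((FACT_STORAGE / 2))    # mid-range of the 10-100% allowance
TOTAL=$((FACT_STORAGE + OPERATING))
echo "Provision at least $((TOTAL / 1024 / 1024 / 1024)) GiB"  # → 90 GiB
```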
In general, it is recommended to ensure that the complete memory requirements
of an application are met without relying on memory paging because RDFox’s
performance will degrade significantly if the OS swaps the RDFox process’s
memory pages in and out of disk. It can, however, be useful to enable a
suitably sized swap file to avoid RDFox processes being killed by the operating
system during compaction if memory requirements increase suddenly but
temporarily. This is relevant for all servers that perform compaction, as well
as HA replicas that restore new snapshots and use the highly-available
setting for persistence.snapshot-restore-mode.
2.4.2.2. Disk Space
When RDFox is configured for Persistence, data will also be saved to disk, ready to be loaded in subsequent sessions. The underlying file system must satisfy the system requirements documented in Section 13.2.1 for the chosen persistence option, and have 40-60 bytes of disk space per triple. This includes enough space to store the data itself and some working free space that is needed for operations such as compaction and upgrade.
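A rough disk budget can be sketched in the same back-of-the-envelope fashion; the 50 bytes per triple used below is an illustrative mid-range value:

```shell
# Illustrative persistence disk budget for one billion triples, including
# working free space for operations such as compaction and upgrade.
NUM_TRIPLES=1000000000
DISK_BYTES=$((NUM_TRIPLES * 50))   # mid-range of 40-60 bytes per triple
echo "Budget roughly $((DISK_BYTES / 1024 / 1024 / 1024)) GiB of disk"  # → 46 GiB
```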
Recovering from low-memory or low-disk-space conditions can be complex, so it is vital to monitor these metrics and take action before they become exhausted. Regular compaction of data stores can help minimize a server’s memory and disk space usage.