Warning: This document is for an old version of RDFox. The latest version is 7.2.

5. Data Stores

As explained in Section 4, a data store encapsulates a unit of logically related information. Many applications store all of their related data in one data store (although some applications may use more than one). It is important to keep in mind that a query or rule can operate on only one data store; thus, all information that should be queried or reasoned over together must be loaded into a single data store.

As explained in Section 4, a data store serves as a container for other kinds of objects:

  • tuple tables are data store components that store facts (see Section 6);

  • data sources can be registered with a data store to access external, non-RDF data (see Section 7);

  • OWL axioms and Datalog rules are used to specify rules of inference that are to be applied to the data loaded into the data store (see Section 10);

  • statistics modules summarize the data loaded into a data store in a way that helps query planning.

The behavior of a data store can be customized using various parameters, which are listed in Section 5.4.

5.1. Operations on Data Stores

The following list summarizes the operations on data stores available in the shell or via one of the available APIs.

  • A data store can be created on a server. To create a data store, one must specify the data store name and zero or more parameters expressed as key-value pairs.

  • A data store can be deleted from the server. RDFox allows a data store to be deleted only if there are no active connections to it.

  • A data store can be saved to and subsequently loaded from a binary file. The file obtained in this way contains all data store content; thus, when a data store is loaded from a file, it is restored to exactly the same state as before saving. RDFox supports the following binary formats.

    • The standard format stores the data in a way that is more resilient to changes in the RDFox implementation. This format should be used in most cases.

    • The raw format stores the data in exactly the same way as the data is stored in RAM. This format allows one to reconstruct the state of a data store exactly and is therefore useful when reporting bugs, but it is more likely to change between RDFox releases.

5.2. Organization of RDF Data

The following sections explain how RDF data is organized within a data store, with details on how the data is stored, imported, and queried.

5.2.1. Storage

By default, two in-memory tuple tables are created automatically when a data store is created. The first, DefaultTriples, has arity three and is designed to hold triples from an RDF dataset’s default graph. The second, Quads, has arity four and is designed to hold triples from all of the RDF dataset’s named graphs. If the default-graph-name parameter (see Section 5.4.2) is specified at data store creation time, the DefaultTriples table is not created.
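The creation rule above can be summarized in a few lines. The following is a minimal sketch, not RDFox code: the function name initial_tuple_tables is hypothetical, and it only models which tables exist (name and arity) for a newly created data store.

```python
from typing import Optional


def initial_tuple_tables(default_graph_name: Optional[str] = None) -> dict[str, int]:
    """Return the in-memory tuple tables (name -> arity) of a new data store.

    Hypothetical helper for illustration only; not part of any RDFox API.
    """
    # Named graph triples always go to the arity-four Quads table.
    tables = {"Quads": 4}
    if default_graph_name is None:
        # DefaultTriples (arity three) is created only when
        # default-graph-name is not specified at creation time.
        tables["DefaultTriples"] = 3
    return tables
```

For example, initial_tuple_tables() yields both tables, whereas initial_tuple_tables("http://example.com/g") yields only Quads.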

5.2.2. Importation and DELETE/INSERT Clauses

By default, when performing import operations or evaluating SPARQL DELETE or INSERT clauses, RDFox maps default graph triples to entries in the DefaultTriples table and named graph triples (also known as quads) to entries in the Quads table. While named graph triples are always mapped to the Quads table, it is possible to override the default mapping for default graph triples so that they too are mapped to entries in the Quads table. The exact mechanisms for doing this are described next.

For import operations, the default mapping for default graph triples can be overridden by setting the target default graph parameter (see Section 8.2.1) of the operation. In this case, default graph triples are written to the Quads table with the specified graph name. When the target default graph parameter is not provided for the specific operation, but the data store’s default-graph-name parameter is set (see Section 5.4.2), the data store parameter is used as the graph name instead.
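The precedence for default graph triples during import (operation parameter first, then the data store parameter, then DefaultTriples) can be sketched as follows. This is an illustrative model under stated assumptions: the function name is hypothetical, and the (subject, predicate, object, graph) column order used for quads is a choice made for this sketch only.

```python
from typing import Optional


def target_for_default_triple(
    triple: tuple[str, str, str],
    import_target_graph: Optional[str] = None,
    store_default_graph_name: Optional[str] = None,
) -> tuple[str, tuple[str, ...]]:
    """Decide where a default graph triple from an import is stored.

    Returns (table name, stored tuple). Hypothetical model, not RDFox code.
    """
    # The per-operation target default graph parameter takes precedence ...
    graph = import_target_graph
    # ... otherwise fall back to the data store's default-graph-name.
    if graph is None:
        graph = store_default_graph_name
    if graph is None:
        return "DefaultTriples", triple
    # With a graph name, the triple is stored as a quad (S, P, O, G assumed).
    return "Quads", (*triple, graph)
```

Calling it with both parameters set shows that the import parameter wins over the data store parameter.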

For SPARQL DELETE and INSERT clauses, the default mapping can be overridden by wrapping the triple patterns in the standard SPARQL syntax for named graphs (GRAPH <iri> { ... }). When this syntax is not used and the default-graph-name parameter is set, its value is used as the name of the graph that receives all default graph triples.

When importing rules, default graph atoms are interpreted as references to the DefaultTriples table unless the target default graph parameter is set for the import or the default-graph-name parameter is set for the data store. In the latter two cases, default graph atoms are interpreted as references to the named graph with the specified name.

When importing OWL, all axioms are applied to the DefaultTriples table unless the target default graph parameter is set for the import or the default-graph-name parameter is set for the data store. In the latter two cases, axioms are instead applied to the named graph with the specified name.

5.2.3. Mapping Triples and Quads to RDF Datasets for SPARQL WHERE/ASK

Before evaluating SPARQL WHERE clauses or ASK queries, it is necessary to determine the RDF dataset for the query. An RDF dataset comprises exactly one default graph plus zero or more named graphs. The default RDF dataset of an RDFox data store has the triples of the DefaultTriples table as its default graph and all named graphs stored in the Quads table as its named graphs.

If the DefaultTriples table is not present when a query begins, the default RDF dataset instead has the multiset of all triples from the Quads table as its default graph and, as above, all named graphs stored in the Quads table as its named graphs.
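The two cases above can be modeled with plain Python collections. This is a minimal sketch, assuming quads are (subject, predicate, object, graph) tuples; the function name is hypothetical, and overrides via FROM/FROM NAMED are not modeled. The default graph is returned as a Counter so that the multiset behavior (a triple occurring in several named graphs is counted several times) is visible.

```python
from collections import Counter
from typing import Optional

Triple = tuple[str, str, str]
Quad = tuple[str, str, str, str]  # (subject, predicate, object, graph) in this sketch


def default_dataset(default_triples: Optional[list[Triple]], quads: list[Quad]):
    """Return (default graph as a multiset, named graphs) for a query.

    default_triples=None models a data store without a DefaultTriples table.
    Illustrative only; not RDFox code.
    """
    named: dict[str, set[Triple]] = {}
    for s, p, o, g in quads:
        named.setdefault(g, set()).add((s, p, o))
    if default_triples is not None:
        default = Counter(default_triples)
    else:
        # Multiset of all triples from the Quads table, graph names dropped.
        default = Counter((s, p, o) for s, p, o, _ in quads)
    return default, named
```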

Regardless of whether the DefaultTriples table exists or not, the default behaviors described above can always be overridden using either FROM and FROM NAMED clauses in the query itself or, in the case of the SPARQL protocol, the default-graph-uri and named-graph-uri request parameters.

5.3. Supported Data Types for Literals

As well as IRIs and blank nodes, RDFox data stores can store literal values of the following datatypes:

Datatype

http://www.w3.org/2000/01/rdf-schema#Literal

http://www.w3.org/2001/XMLSchema#anyURI

http://www.w3.org/2001/XMLSchema#string

http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral

http://www.w3.org/2001/XMLSchema#boolean

http://www.w3.org/2001/XMLSchema#dateTime

http://www.w3.org/2001/XMLSchema#dateTimeStamp

http://www.w3.org/2001/XMLSchema#time

http://www.w3.org/2001/XMLSchema#date

http://www.w3.org/2001/XMLSchema#gYearMonth

http://www.w3.org/2001/XMLSchema#gYear

http://www.w3.org/2001/XMLSchema#gMonthDay

http://www.w3.org/2001/XMLSchema#gDay

http://www.w3.org/2001/XMLSchema#gMonth

http://www.w3.org/2001/XMLSchema#duration

http://www.w3.org/2001/XMLSchema#yearMonthDuration

http://www.w3.org/2001/XMLSchema#dayTimeDuration

http://www.w3.org/2001/XMLSchema#double

http://www.w3.org/2001/XMLSchema#float

http://www.w3.org/2001/XMLSchema#decimal

http://www.w3.org/2001/XMLSchema#integer

http://www.w3.org/2001/XMLSchema#nonNegativeInteger

http://www.w3.org/2001/XMLSchema#nonPositiveInteger

http://www.w3.org/2001/XMLSchema#negativeInteger

http://www.w3.org/2001/XMLSchema#positiveInteger

http://www.w3.org/2001/XMLSchema#long

http://www.w3.org/2001/XMLSchema#int

http://www.w3.org/2001/XMLSchema#short

http://www.w3.org/2001/XMLSchema#byte

http://www.w3.org/2001/XMLSchema#unsignedLong

http://www.w3.org/2001/XMLSchema#unsignedInt

http://www.w3.org/2001/XMLSchema#unsignedShort

http://www.w3.org/2001/XMLSchema#unsignedByte

5.4. Data Store Parameters

The behavior of a data store is determined by a number of options encoded as key-value pairs. The options specified at data store creation time cannot be subsequently changed.
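Because parameters are fixed at creation time, they behave like an immutable record. The following is a hypothetical sketch (the class is not an RDFox type) that collects the defaults stated in the subsections of Section 5.4 and uses a frozen dataclass to reject later changes. Python attribute names replace the hyphens and dots of the real parameter keys, and persist-ds is omitted because its default is inherited from the server.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class DataStoreParameters:
    """Hypothetical immutable view of a data store's creation parameters."""
    auto_update_statistics: str = "balanced"
    default_graph_name: Optional[str] = None  # unset by default
    equality: str = "off"
    import_rename_user_blank_nodes: bool = False
    invalid_literal_policy: str = "error"
    quad_table_type: str = "quad-table-lg"
    swrl_negation_as_failure: str = "off"
    store_type: str = "parallel-nn"  # the "type" parameter


params = DataStoreParameters(equality="noUNA")
try:
    params.equality = "off"  # frozen: changing a parameter afterwards fails
except FrozenInstanceError:
    pass
```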

5.4.1. auto-update-statistics

The auto-update-statistics option governs how RDFox manages statistics about the data loaded into the system. RDFox uses these statistics during query planning in order to identify an efficient plan, so query performance may be suboptimal if the statistics are not up to date. The allowed values are as follows.

  • off: Statistics are never updated automatically, but they can be updated manually using the stats update command or via one of the available APIs.

  • balanced: The cost of updating the statistics is balanced against the possibility of using outdated statistics. This is the default.

  • eager: Statistics are updated after each operation that has the potential to invalidate the statistics (e.g., importing data).

5.4.2. default-graph-name (EXPERIMENTAL)

The default-graph-name option can be set to any IRI to activate RDFox’s alternative handling of default graph data. See Section 5.2 for a full description. By default, this parameter is unset.

5.4.3. equality

The equality option determines how RDFox deals with the semantics of equality, which is encoded using the owl:sameAs property. This option has the following values.

  • off: There is no special handling of equality and the owl:sameAs property is treated as just another property. This is the default if the equality option is not specified.

  • noUNA: The owl:sameAs property is treated as equality, and the Unique Name Assumption is not used — that is, deriving an equality between two IRIs does not result in a contradiction. This is the treatment of equality in OWL 2 DL.

  • UNA: The owl:sameAs property is treated as equality, but interpreted under the UNA; that is, deriving an equality between two distinct IRIs results in a contradiction, and only equalities between an IRI and a blank node, or between two blank nodes, are allowed. Thus, if a triple of the form <IRI₁, owl:sameAs, IRI₂> is derived, RDFox detects a clash and derives <IRI₁, rdf:type, owl:Nothing> and <IRI₂, rdf:type, owl:Nothing>.

  • chase: The owl:sameAs property is treated as equality with UNA, and furthermore no reflexivity axioms are derived. A data store initialized with this option does not support incremental reasoning. This option is intended to simulate the “chase” procedure commonly used in database research.
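The clash rule of the UNA mode can be illustrated with a small model. This is a sketch under stated assumptions: the Term class and the una_clash function are hypothetical, and only the IRI-versus-IRI case described in the UNA item is modeled (contradictions between distinct literals are not).

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_NOTHING = "http://www.w3.org/2002/07/owl#Nothing"


@dataclass(frozen=True)
class Term:
    kind: str  # "iri", "bnode", or "literal" in this sketch
    value: str


def una_clash(a: Term, b: Term) -> list[tuple[Term, Term, Term]]:
    """Triples a UNA-mode store would derive for a derived `a owl:sameAs b`.

    Illustrative model only: an equality between two distinct IRIs is a
    clash, so both IRIs are typed owl:Nothing.
    """
    if a == b:
        return []  # reflexive equality is not a clash
    if a.kind == "iri" and b.kind == "iri":
        rdf_type, nothing = Term("iri", RDF_TYPE), Term("iri", OWL_NOTHING)
        return [(a, rdf_type, nothing), (b, rdf_type, nothing)]
    return []  # IRI/blank node and blank node/blank node equalities are allowed
```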

In all equality modes (i.e., all modes other than off), distinct RDF literals (e.g., strings, numbers, dates) are assumed to refer to distinct objects, and so deriving an equality between distinct literals results in a contradiction.

Note: RDFox rejects rules that use negation-as-failure or aggregation in all equality modes other than off.

5.4.4. import.rename-user-blank-nodes

If the import.rename-user-blank-nodes option is set to true, then user-defined blank nodes imported from distinct files are renamed apart during import; hence, importing several files merges their graphs as prescribed by the RDF specification. The renaming process cannot be controlled, which can be problematic in some applications. The default value of this option is therefore false, which ensures that the data is imported ‘as is’. Regardless of this option, autogenerated blank nodes (i.e., blank nodes obtained by expanding [] or (...) in Turtle files) are always renamed apart.
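The effect of renaming apart can be shown with a toy model. This is a minimal sketch, not RDFox's renaming scheme: blank nodes are assumed to be strings starting with "_:", and the "_:fN_" prefix is a hypothetical naming choice. Within one file, equal labels still denote the same node; across files, they become distinct.

```python
def rename_blank_nodes_apart(
    files: list[list[tuple[str, str, str]]],
) -> list[tuple[str, ...]]:
    """Combine triples from several files, keeping their blank nodes distinct.

    Hypothetical illustration; RDFox offers no control over how it renames.
    """
    combined = []
    for file_no, triples in enumerate(files):
        def rename(term: str) -> str:
            # Prefix blank node labels with the file number; keep other terms.
            return f"_:f{file_no}_{term[2:]}" if term.startswith("_:") else term
        # extend() consumes the generator here, so file_no is the current one.
        combined.extend(tuple(rename(t) for t in triple) for triple in triples)
    return combined
```

With the option set to false, by contrast, a label such as _:b used in two files would denote the same blank node after import.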

5.4.5. init-resource-capacity

The value of the init-resource-capacity option is an integer that is used as a hint to the data store specifying the number of resources that the store will contain. This hint is used to initialize certain data structures to the sizes that ensure faster importation of data. The actual number of resources that a data store can contain is not limited by this option: RDFox will resize the data structures as needed if this hint is exceeded.

5.4.6. init-tuple-capacity

The value of the init-tuple-capacity option is an integer that is used as a hint to the data store specifying the number of tuples that the store will contain. This hint is used to initialize certain data structures to the sizes that ensure faster importation of data. The actual number of tuples that a data store can contain is not limited by this option: RDFox will resize the data structures as needed if this hint is exceeded.

5.4.7. invalid-literal-policy

The invalid-literal-policy option governs how RDFox handles invalid literals in imported data, queries, and updates.

  • error: Invalid literals in the input are treated as errors, and so input containing such literals cannot be processed. This is the default.

  • as-string: During import, invalid literals are converted to string literals, and a warning is emitted alerting the user to the fact that the value was converted. In queries and updates, invalid literals are converted to strings, but no warning is emitted.

  • as-string-silent: Invalid literals are converted to string literals in the import data, queries, and updates, but without emitting a warning.
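The three policies can be contrasted on a single imported literal. This is a sketch under stated assumptions: the function is hypothetical, only xsd:integer is validated (with a simplified lexical check that ignores whitespace normalization), and the query/update behavior is not modeled.

```python
import re
import warnings

XSD = "http://www.w3.org/2001/XMLSchema#"
_INTEGER = re.compile(r"[+-]?[0-9]+")  # simplified xsd:integer lexical form


def import_literal(lexical: str, datatype: str, policy: str = "error") -> tuple[str, str]:
    """Apply an invalid-literal-policy value to one imported literal.

    Hypothetical model: returns (lexical form, datatype IRI) to be stored.
    """
    if datatype != XSD + "integer" or _INTEGER.fullmatch(lexical):
        return lexical, datatype  # valid, or not checked by this sketch
    if policy == "error":
        raise ValueError(f'invalid literal "{lexical}"^^<{datatype}>')
    if policy == "as-string":
        warnings.warn(f'"{lexical}" was converted to xsd:string')
    elif policy != "as-string-silent":
        raise ValueError(f"unknown policy {policy!r}")
    return lexical, XSD + "string"
```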

5.4.8. max-data-pool-size

The value of the max-data-pool-size option is an integer that determines the maximum number of bytes that RDFox can use to store resource values (e.g., IRIs and strings). Specifying this option can significantly reduce the amount of virtual memory that RDFox uses per data store.

5.4.9. max-resource-capacity

The value of the max-resource-capacity option is an integer that determines the maximum number of resources that can be stored in the data store. Specifying this option can significantly reduce the amount of virtual memory that RDFox uses per data store.

5.4.10. max-tuple-capacity

The value of the max-tuple-capacity option is an integer that determines the maximum number of tuples that can be stored by the in-memory tuple tables of a data store. Specifying this option can significantly reduce the amount of virtual memory that RDFox uses per data store.

5.4.11. persist-ds

The persist-ds option controls how RDFox persists data contained in a data store. The option can be set to:

  • off. The content of the data store will reside in memory only and will be discarded when RDFox exits.

  • file. The content of the data store will be automatically and incrementally saved to a file within the server directory. This option can be selected only if the server parameter persist-ds is also set to file.

  • file-sequence. The content of the data store will be automatically and incrementally saved to a sequence of files within the server directory. This option can be selected only if the server parameter persist-ds is also set to file-sequence.

If the persist-ds option is not specified for a data store then it will use the value of the persist-ds option specified for the server. Please refer to Section 13.2 for more information on how to configure persistence in RDFox.
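The inheritance and compatibility rules for persist-ds can be sketched as a small resolver. This is a hypothetical helper, not RDFox code; in particular, the assumption that a data store may always choose off, whatever the server setting, is not stated above and is labeled as such in the code.

```python
from typing import Optional

_ALLOWED = {"off", "file", "file-sequence"}


def effective_persist_ds(server_value: str, store_value: Optional[str] = None) -> str:
    """Resolve a data store's persist-ds value against the server's setting.

    Hypothetical model of Section 5.4.11.
    """
    if store_value is None:
        return server_value  # unspecified: inherit the server's value
    if store_value not in _ALLOWED:
        raise ValueError(f"unknown persist-ds value {store_value!r}")
    # Assumption of this sketch: "off" is accepted whatever the server uses.
    if store_value != "off" and store_value != server_value:
        raise ValueError(
            f"persist-ds={store_value} requires server persist-ds={store_value}"
        )
    return store_value
```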

5.4.12. quad-table-type

The quad-table-type parameter determines the type of the Quads table, which is used to store named graph triples. The available options are quad-table-lg (the default) and quad-table-sg. Tuple table type quad-table-lg uses indexing suited to the typical use cases where named graphs contain a non-trivial number of facts. In contrast, quad-table-sg uses indexing suited to the rare cases where each graph consists of very few triples.

5.4.13. swrl-negation-as-failure

The swrl-negation-as-failure option determines how RDFox treats ObjectComplementOf class expressions in SWRL rules.

  • off. SWRL rules are interpreted under the open-world assumption and SWRL rules featuring ObjectComplementOf are rejected. This is the default value.

  • on. SWRL rules are interpreted under the closed-world assumption, as described in Section 10.7.3.

5.4.14. type

The type option determines the storage scheme used by the data store. The value determines the maximum capacity of a data store (i.e., the maximum number of resources and/or facts), its memory footprint, the speed with which it can answer certain types of queries, and whether a data store can be used concurrently. The following data store types are currently supported:

  • parallel-nn (default)

  • parallel-nw

  • parallel-ww

In suffixes nn, nw, and ww, the first character determines whether the system uses 32-bit (n for narrow) or 64-bit (w for wide) unsigned integers for representing resource IDs, and the second character determines whether the system uses 32-bit (n) or 64-bit (w) unsigned integers for representing triple IDs. Thus, an nw store can contain at most 4 × 10⁹ resources and at most 1.8 × 10¹⁹ triples.
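The capacity figures follow from counting distinct unsigned integers of each width: 2³² ≈ 4.3 × 10⁹ and 2⁶⁴ ≈ 1.8 × 10¹⁹. The sketch below (hypothetical helper names) derives these upper bounds from a type's suffix; it computes only the size of the ID space, so any IDs RDFox might reserve internally are not accounted for.

```python
SUPPORTED_TYPES = {"parallel-nn", "parallel-nw", "parallel-ww"}
BITS = {"n": 32, "w": 64}  # narrow = 32-bit, wide = 64-bit unsigned IDs


def id_limits(store_type: str) -> tuple[int, int]:
    """Return (resource ID space size, triple ID space size) for a store type."""
    if store_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type {store_type!r}")
    # The two suffix characters give the resource and triple ID widths.
    resource_width, triple_width = store_type.rsplit("-", 1)[1]
    return 2 ** BITS[resource_width], 2 ** BITS[triple_width]


for t in sorted(SUPPORTED_TYPES):
    resources, triples = id_limits(t)
    print(f"{t}: {resources:.1e} resources, {triples:.1e} triples")
# parallel-nn: 4.3e+09 resources, 4.3e+09 triples
# parallel-nw: 4.3e+09 resources, 1.8e+19 triples
# parallel-ww: 1.8e+19 resources, 1.8e+19 triples
```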