Datalog Queries

Introduction

XTDB is a schemaless document database that provides you with a comprehensive means of traversing and querying across all of your documents. XTDB automatically indexes the top-level fields in all documents, supporting efficient ad-hoc joins and retrievals. XTDB is both immutable and bitemporal, which are both required characteristics for a safe schemaless database.

XTDB is also a graph database. The central characteristic of a graph database is that it can support arbitrary-depth graph queries (recursive traversals) very efficiently by default, without any need for schema-level optimisations. Graph queries are possible through the XTDB dialect of the Datalog query language.

EDN Datalog

Extensible Data Notation (edn) is a simple data format that is used to describe XTDB Datalog queries. To understand how edn works, read the brief description at http://edn-format.org.

To learn about Datalog queries and get familiar with the concepts, read the Learn XTDB Datalog Today tutorial.

Basic Structure

A query in XTDB is performed by calling xt/q on an XTDB database snapshot with a quoted map and, optionally, additional arguments.

(xt/q
 (xt/db node) (1)
 '{:find [p1] (2)
   :where [[p1 :name n]
           [p1 :last-name n]
           [p1 :name name]]
   :in [name]}
 "Ivan") (3)
1 Database value. Usually the snapshot view comes from calling xt/db on an XTDB node
2 Query map or vector (ex. [:find …​etc…​ ], as found in other Datalog databases)
3 Argument(s) supplied to the :in relations

The query map accepts the following keys:

Table 1. Query Keys

Key        Type    Purpose
:find      Vector  Specify values to be returned
:where     Vector  Restrict the results of the query
:in        Vector  Specify external arguments
:order-by  Vector  Control the result order
:limit     Int     Specify how many results to return
:offset    Int     Specify how many results to discard
:rules     Vector  Define powerful statements for use in :where clauses
:timeout   Int     Specify maximum query run time in ms

Find

The find clause of a query specifies which values are to be returned. Each result tuple is returned as a vector.

Logic Variable

You can specify logic variables within your query without first declaring them elsewhere. Logic variables are commonly prefixed with a ? (question mark) as a stylistic convention, but this is not required: the ? is simply part of the variable’s identifier. For example, foo and ?foo are both valid names, but they are not equal and will be treated as independent logic variables.

The following will return all last names bound to the logic variable n.

(xt/q
 (xt/db node)
 '{:find [n]
   :where [[p :last-name n]]})

Expressions

You can use basic Clojure-like expressions within a :find parameter.

{:find [product-name
        (* net-price (+ 1.0 tax-rate))
        (if (> stock 0)
          "in stock"
          "out of stock")]
 :where [[p :product-name product-name]
         [p :net-price net-price]
         [p :tax-rate tax-rate]
         [p :stock-available stock]]}
  • let, and other macros are unsupported - you can create and call fully-qualified Clojure functions for more advanced use cases.

  • Function calls are guarded by :fn-allow-list, if it was specified in the node configuration.

Aggregates

You can specify an aggregate function to apply to at most one logic variable.

Table 2. Built-in Aggregate Functions

Usage                                               Description
(sum <expr>)                                        Accumulates as a single value via the Clojure + function
(min <expr>), (max <expr>)                          Return a single value via the Clojure compare function which may operate on many types (integers, strings, collections etc.)
(count <expr>)                                      Return a single count of all values including any duplicates (note: this always performs a scan - no statistics or other stateful materializations are used)
(avg <expr>)                                        Return a single value equivalent to sum / count
(median <expr>), (variance <expr>), (stddev <expr>) Return a single value corresponding to the statistical definition
(rand N <expr>)                                     Return a vector of exactly N values, where some values may be duplicates if N is larger than the range
(sample N <expr>)                                   Return a vector of at-most N distinct values
(distinct <expr>)                                   Return a set of distinct values

  • Aggregates cannot be nested within another - e.g. (sum (count ?x)) is disallowed.

  • Results are implicitly grouped by all logic variables referred to outside of aggregations (even if the same variables are also referred to within aggregations). For example, the results of the following query are grouped by ?a and ?b:

    {:find [?a (/ ?b (sum (+ ?b ?c)))], ...}

Example:

(xt/q (xt/db node)
      '{:find [(sum ?heads)
               (min ?heads)
               (max ?heads)
               (count ?heads)
               (count-distinct ?heads)]
        :in [[[?monster ?heads]]]}
      [["Cerberus" 3]
       ["Medusa" 1]
       ["Cyclops" 1]
       ["Chimera" 1]])

;; =>
#{[6 1 3 4 2]}

Custom Aggregates

Custom (user-defined) aggregates are supported by adding a new method (via Clojure defmethod) for xtdb.query/aggregate. This method takes a single (ignored) parameter and returns a multi-arity function which accepts zero parameters, an accumulator, or an accumulator and a single entity. For example:

(defmethod xtdb.query/aggregate 'sort-reverse [_]
  (fn
    ([] [])
    ([acc] (vec (reverse (sort acc))))
    ([acc x] (conj acc x))))
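
To see what each arity contributes, the same function can be exercised directly in plain Clojure, without a node. This is only a sketch: it assumes the query engine calls the zero-arity version to create the accumulator, the two-arity version once per value, and the one-arity version to finish, much as transduce drives a reducing function. The names sort-reverse-rf and sorted-heads are hypothetical.

```clojure
;; The same multi-arity function as in the defmethod body above,
;; bound to a (hypothetical) name so that it can be called directly.
(def sort-reverse-rf
  (fn
    ([] [])                               ; init: empty accumulator
    ([acc] (vec (reverse (sort acc))))    ; completion: sort descending
    ([acc x] (conj acc x))))              ; step: collect each value

;; Assumed call sequence: init, one step per value, then completion.
(def sorted-heads
  (sort-reverse-rf
   (reduce sort-reverse-rf (sort-reverse-rf) [1 3 1 1])))

(prn sorted-heads) ; prints [3 1 1 1]
```

Within a query, the aggregate would then be referenced by the symbol it was registered under, e.g. (sort-reverse ?heads) in the :find vector.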

Pull

XTDB queries support a pull syntax, allowing you to decouple specifying which entities you want from what data you’d like about those entities in your queries. XTDB’s support is based on the excellent EDN Query Language (EQL) library.

To specify what data you’d like about each entity, include a (pull ?logic-var projection-spec) entry in the :find clause of your query:

;; with just 'query':
(xt/q
 (xt/db node)
 '{:find [?uid ?name ?profession]
   :where [[?user :user/id ?uid]
           [?user :user/name ?name]
           [?user :user/profession ?profession]]})
#{[1 "Ivan" :doctor] [2 "Sergei" :lawyer], [3 "Petr" :doctor]}

;; using `pull`:
(xt/q
 (xt/db node)
 '{:find [(pull ?user [:user/name :user/profession])]
   :where [[?user :user/id ?uid]]})
#{[{:user/name "Ivan" :user/profession :doctor}]
  [{:user/name "Sergei" :user/profession :lawyer}]
  [{:user/name "Petr" :user/profession :doctor}]}

You can quickly grab the whole document by specifying * in the projection spec:

(xt/q
 (xt/db node)
 '{:find [(pull ?user [*])]
   :where [[?user :user/id 1]]})
#{[{:xt/id :ivan :user/id 1, :user/name "Ivan", :user/profession :doctor}]}

If you have the entity id(s) in hand, you can call pull or pull-many directly:

;; using `pull`:
(xt/pull
 (xt/db node)
 [:user/name :user/profession]
 :ivan)
{:user/name "Ivan", :user/profession :doctor}

;; using `pull-many`:
(xt/pull-many
 (xt/db node)
 [:user/name :user/profession]
 [:ivan :sergei])
[{:user/name "Ivan", :user/profession :doctor},
 {:user/name "Sergei", :user/profession :lawyer}]

We can navigate to other entities (and hence build up nested results) using joins. Joins are specified in {} braces in the projection-spec - each one maps one join key to its nested spec:

;; with just 'query':
(xt/q
 (xt/db node)
 '{:find [?uid ?name ?profession-name]
   :where [[?user :user/id ?uid]
           [?user :user/name ?name]
           [?user :user/profession ?profession]
           [?profession :profession/name ?profession-name]]})
#{[1 "Ivan" "Doctor"] [2 "Sergei" "Lawyer"] [3 "Petr" "Doctor"]}

;; using `pull`:
(xt/q
 (xt/db node)
 '{:find [(pull ?user [:user/name {:user/profession [:profession/name]}])]
   :where [[?user :user/id ?uid]]})
#{[{:user/name "Ivan" :user/profession {:profession/name "Doctor"}}]
  [{:user/name "Sergei" :user/profession {:profession/name "Lawyer"}}]
  [{:user/name "Petr" :user/profession {:profession/name "Doctor"}}]}

We can also navigate in the reverse direction, looking for entities that refer to this one, by prepending _ to the attribute name:

(xt/q
 (xt/db node)
 '{:find [(pull ?profession [:profession/name {:user/_profession [:user/id :user/name]}])]
   :where [[?profession :profession/name]]})
#{[{:profession/name "Doctor"
    :user/_profession [{:user/id 1 :user/name "Ivan"},
                       {:user/id 3 :user/name "Petr"}]}]
  [{:profession/name "Lawyer"
    :user/_profession [{:user/id 2 :user/name "Sergei"}]}]}

Attribute parameters

XTDB pull syntax supports a handful of custom EQL parameters, specified by wrapping the :attribute key in a pair: (:attribute {:param :value, …​}).

  • :as - to rename attributes in the result, wrap the attribute in (:source-attribute {:as :output-name}):

    {:find [(pull ?profession [:profession/name
                               {(:user/_profession {:as :users}) [:user/id :user/name]}])]
     :where [[?profession :profession/name]]}
    
    ;; => [{:profession/name "Doctor",
    ;;      :users [{:user/id 1, :user/name "Ivan"},
    ;;              {:user/id 3, :user/name "Petr"}]},
    ;;     {:profession/name "Lawyer",
    ;;      :users [{:user/id 2, :user/name "Sergei"}]}]
  • :limit - limit the number of values returned under the given property/join: (:attribute {:limit 5})

  • :default - specify a default value if the matched document doesn’t contain the given attribute: (:attribute {:default "default"})

  • :into - specify the collection to pour the results into: (:attribute {:into #{}})

    {:find [(pull ?profession [:profession/name
                               {(:user/_profession {:as :users, :into #{}})
                               [:user/id :user/name]}])]
     :where [[?profession :profession/name]]}
    
    ;; => [{:profession/name "Doctor",
    ;;      :users #{{:user/id 1, :user/name "Ivan"},
    ;;               {:user/id 3, :user/name "Petr"}}},
    ;;     {:profession/name "Lawyer",
    ;;      :users #{{:user/id 2, :user/name "Sergei"}}}]
  • :cardinality (reverse joins) - by default, reverse joins put their values in a collection - for many-to-one/one-to-one reverse joins, specify {:cardinality :one} to return a single value.

For full details on what’s supported in the projection-spec, see the EQL specification.

Returning maps

To return maps rather than tuples, supply the map keys under :keys for keywords, :syms for symbols, or :strs for strings:

(xt/q
 (xt/db node)
 '{:find [?name ?profession-name]
   :keys [name profession]
   :where [[?user :user/id 1]
           [?user :user/name ?name]
           [?user :user/profession ?profession]
           [?profession :profession/name ?profession-name]]})
#{{:name "Ivan", :profession "Doctor"}}

Where

The :where section of a query limits the combinations of possible results by satisfying all clauses and rules in the supplied vector against the database (and any :in relations).

Table 3. Valid Clauses

Name                   Description
triple clause          Restrict using EAV indexes
predicate              Restrict with any predicate
range predicate        Restrict with any of < <= >= > =
unification predicate  Unify two distinct logic variables with != or ==
not rule               Negate a list of clauses
not-join rule          Not rule with its own scope
or rule                Restrict on at least one matching clause
or-join rule           Or with its own scope
defined rule           Restrict with a user-defined rule

Clause Inputs

Clauses may refer to combinations of literal values and logic variables as inputs. Identical logic variables used across multiple clauses unify automatically (with the exception of or-join or not-join scoping).

A literal set containing literal values (e.g. #{"val-1" "val-2"}) will be interpreted as an input relation of distinct values, and can be used in place of any literal input (e.g. a set of entity IDs in the first position of a triple clause).
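
As a minimal sketch of a literal set used in the value position of a triple clause, the following starts a throwaway in-memory node, stores three hypothetical documents, and matches entities whose :name is any member of the set. The documents and the var name ivan-or-petr are assumptions made for illustration.

```clojure
(require '[xtdb.api :as xt])

(def ivan-or-petr
  (with-open [node (xt/start-node {})]   ; in-memory node, for illustration only
    (xt/await-tx node
                 (xt/submit-tx node [[::xt/put {:xt/id :ivan :name "Ivan"}]
                                     [::xt/put {:xt/id :petr :name "Petr"}]
                                     [::xt/put {:xt/id :sergei :name "Sergei"}]]))
    (xt/q (xt/db node)
          '{:find [e]
            ;; the set stands in for a single literal value
            :where [[e :name #{"Ivan" "Petr"}]]})))

;; ivan-or-petr is expected to be #{[:ivan] [:petr]}
```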

Triple

A triple clause is a vector of (1) a literal entity ID or a logic variable, (2) a hard-coded attribute keyword (top-level key in a document), and (3) optionally, a value which can be a literal or a logic variable.

It restricts results by matching EAV facts:

(xt/q
 (xt/db node)
 '{:find [p]
   :where [[p :name]]}) (1)

(xt/q
 (xt/db node)
 '{:find [p]
   :where [[p :name "Ivan"]]}) (2)

(xt/q
 (xt/db node)
 '{:find [p]
   :where [[q :name n]
           [p :last-name n]]}) (3)
1 This matches all entities, p, which have a :name field.
2 This matches all entities, p, which have a :name of "Ivan".
3 This matches all entities, p, which have a :last-name that matches the :name of some entity, q.

Note that the keyword attribute must always be specified and therefore you cannot use a logic variable in the attribute position. You can however enumerate combinations of known attributes using rules, or execute independent queries efficiently using openDB (open-db). A list of all known attributes is available via the attributeStats (attribute-stats) API.
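
A minimal sketch of listing the known attributes, assuming the Clojure attribute-stats function is called on a node and returns a map keyed by attribute. The in-memory node, the sample document and the var name known-attributes are hypothetical.

```clojure
(require '[xtdb.api :as xt])

(def known-attributes
  (with-open [node (xt/start-node {})]   ; in-memory node, for illustration only
    (xt/await-tx node
                 (xt/submit-tx node [[::xt/put {:xt/id :ivan :name "Ivan" :age 12}]]))
    ;; the keys of the stats map are the attributes seen so far;
    ;; realise them as a set before the node is closed
    (set (keys (xt/attribute-stats node)))))
```

The resulting set can then be used to generate one triple clause (or one rule body) per attribute of interest.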

Where fine-grained control over query join ordering is desired, and in particular where the e should always be resolved before the v, the built-in get-attr lookup function may be a useful substitute for a triple clause.

Predicates

Any fully qualified Clojure function that returns a boolean can be used as a "filter" predicate clause.

Predicate clauses must be placed in a clause, i.e. with a surrounding vector.

(xt/q
 (xt/db node)
 '{:find [p]
   :where [[p :age age]
           [(odd? age)]]})

This matches all entities, p, which have an odd :age.

Subqueries

You can nest a subquery with a :where clause to bind the result for further use in the query.

Binding results as a scalar

(xt/q
 (xt/db node)
 '{:find [x]
   :where [[(q {:find [y]
                :where [[(identity 2) x]
                        [(+ x 2) y]]})
            x]]})

In the above query, we perform a subquery doing some arithmetic operations and returning the result - and bind the resulting relation as a scalar.

Result set:

#{[[[4]]]}

Binding results as a tuple

(xt/q
 (xt/db node)
 '{:find [x]
   :where [[(q {:find [y]
                :where [[(identity 2) x]
                        [(+ x 2) y]]})
            [[x]]]]})

Similar to the previous query, except we bind the resulting relation as a tuple.

Result set:

#{[4]}

In this example, we bind the results of a subquery and use them to return another result.

(xt/q
 (xt/db node)
 '{:find [x y z]
   :where [[(q {:find [x y]
                :where [[(identity 2) x]
                        [(+ x 2) y]]})
            [[x y]]]
           [(* x y) z]]})

Result set:

#{[2 4 8]}

Any fully qualified Clojure function can also be used to return relation bindings in this way, by returning a list, set or vector.

Range Predicate

A range predicate is a vector containing a list of a range operator and then two logic variables or literals.

Allowed range operators are <, <=, >=, >, and =.

(xt/q
 (xt/db node)
 '{:find [p] (1)
   :where [[p :age a]
           [(> a 18)]]})

(xt/q
 (xt/db node)
 '{:find [p] (2)
   :where [[p :age a]
           [q :age b]
           [(> a b)]]})

(xt/q
 (xt/db node)
 '{:find [p] (3)
   :where [[p :age a]
           [(> 18 a)]]})
1 Finds any entity, p, with an :age which is greater than 18
2 Finds any entity, p, with an :age which is greater than the :age of some entity, q
3 Finds any entity, p, for which 18 is greater than :age of p

Unification Predicate

Use a unification predicate, either == or !=, to constrain two independent logic variables. Literals (and sets of literals) can also be used in place of one of the logic variables.

;; Find all pairs of people with the same age:

[[p :age a]
 [p2 :age a2]
 [(== a a2)]]

;; ...is approximately equivalent to...

[[p :age a]
 [p2 :age a]]

;; Find all pairs of people with different ages:

[[p :age a]
 [p2 :age a2]
 [(!= a a2)]]

;; ...is approximately equivalent to...

[[p :age a]
 [p2 :age a2]
 (not [(= a a2)])]

Not

The not clause rejects a graph if all the clauses within it are true.

[{:xt/id :petr-ivanov :name "Petr" :last-name "Ivanov"} (1)
 {:xt/id :ivan-ivanov :name "Ivan" :last-name "Ivanov"}
 {:xt/id :ivan-petrov :name "Ivan" :last-name "Petrov"}
 {:xt/id :petr-petrov :name "Petr" :last-name "Petrov"}]

(xt/q
 (xt/db node)
 '{:find [e]
   :where [[e :xt/id]
           (not [e :last-name "Ivanov"] (2)
                [e :name "Ivan"])]})

#{[:petr-ivanov] [:petr-petrov] [:ivan-petrov]} (3)
1 Data
2 Query
3 Result

This will match any document which does not have a :name of "Ivan" and a :last-name of "Ivanov".

Not Join

The not-join rule allows you to restrict the possibilities for logic variables by asserting that there does not exist a match for a given sequence of clauses.

You declare which logic variables from outside the not-join scope are to be used in the join.

Any other logic variables within the not-join are scoped only for the join.

[{:xt/id :ivan :name "Ivan" :last-name "Ivanov"} (1)
 {:xt/id :petr :name "Petr" :last-name "Petrov"}
 {:xt/id :sergei :name "Sergei" :last-name "Sergei"}]

(xt/q
 (xt/db node)
 '{:find [e]
   :where [[e :xt/id]
           (not-join [e] (2)
                     [e :last-name n] (3)
                     [e :name n])]})

#{[:ivan] [:petr]} (4)
1 Data
2 Declaration of which logic variables need to unify with the rest of the query
3 Clauses
4 Result

This will match any entity, e, which has different values for the :name and :last-name fields.

Importantly, the logic variable n is unbound outside the not-join clause.

Or

An or clause is satisfied if any of its legs are satisfied.

[{:xt/id :ivan-ivanov-1 :name "Ivan" :last-name "Ivanov" :sex :male} (1)
 {:xt/id :ivan-ivanov-2 :name "Ivan" :last-name "Ivanov" :sex :male}
 {:xt/id :ivan-ivanovtov-1 :name "Ivan" :last-name "Ivannotov" :sex :male}
 {:xt/id :ivanova :name "Ivanova" :last-name "Ivanov" :sex :female}
 {:xt/id :bob :name "Bob" :last-name "Controlguy"}]

(xt/q
 (xt/db node)
 '{:find [e] (2)
   :where [[e :name name]
           [e :name "Ivan"]
           (or [e :last-name "Ivanov"]
               [e :last-name "Ivannotov"])]})

#{[:ivan-ivanov-1] [:ivan-ivanov-2] [:ivan-ivanovtov-1]} (3)
1 Data
2 Query
3 Result

This will match any entity, e, which has a :name of "Ivan" and a :last-name of either "Ivanov" or "Ivannotov".

When within an or rule, you can use and to group clauses into a single leg (which must all be true).

(xt/q
 (xt/db node)
 '{:find [name]
   :where [[e :name name]
           (or [e :sex :female]
               (and [e :sex :male]
                    [e :name "Ivan"]))]})

Whenever the query engine complains that each leg in the or or or-join clause requires the "same logic variables", you can add a no-op predicate clause like [(any? e)] within and clauses for each of the missing variables in the various legs. You need to add such no-op predicates until each leg contains the same set of variables.

(xt/q
 (xt/db node)
 '{:find [name]
   :where [[e :name name]
           (or (and [e :sex :female]
                    [(= name "Ivanova")])
               (and [e :sex :male]
                    [(any? name)]))]})

Hypothetically, XTDB could automatically detect these cases and insert no-op predicates on the user’s behalf, but that would be a deviation from the essential semantics of Datalog. Note that clojure.core/any? is a function that always returns true.

Or Join

The or-join clause is satisfied if any of its legs are satisfied.

You declare which logic variables from outside the or-join scope are to be used in the join.

Any other logic variables within the or-join are scoped only for the join.

[{:xt/id :ivan :name "Ivan" :age 12} (1)
 {:xt/id :petr :name "Petr" :age 15}
 {:xt/id :sergei :name "Sergei" :age 19}]

(xt/q
 (xt/db node)
 '{:find [p]
   :where [[p :xt/id]
           (or-join [p] (2)
                    (and [p :age a] (3)
                         [(>= a 18)])
                    [p :name "Ivan"])]})

#{[:ivan] [:sergei]} (4)
1 Data
2 Declaration of which logic variables need to unify with the rest of the query
3 Clauses
4 Result

This will match any document, p, which has an :age greater than or equal to 18 or has a :name of "Ivan".

Importantly, the logic variable a is unbound outside the or-join clauses.

Rules

In

XTDB queries can take a set of additional arguments, binding them to variables under the :in key within the query.

:in supports various kinds of binding.

Scalar binding

(xt/q
 (xt/db node)
 '{:find [e]
   :in [first-name]
   :where [[e :name first-name]]}
 "Ivan")

In the above query, we parameterize the first-name symbol, and pass in "Ivan" as our input, binding "Ivan" to first-name in the query.

Result Set:

#{[:ivan]}

Collection binding

(xt/q
 (xt/db node)
 '{:find [e]
   :in [[first-name ...]]
   :where [[e :name first-name]]}
 ["Ivan" "Petr"])

This query shows binding to a collection of inputs - in this case, binding first-name to all of the different values in a collection of first-names.

Result Set:

#{[:ivan] [:petr]}

Tuple binding

(xt/q
 (xt/db node)
 '{:find [e]
   :in [[first-name last-name]]
   :where [[e :name first-name]
           [e :last-name last-name]]}
 ["Ivan" "Ivanov"])

In this query we are binding a set of variables to a single value each, passing in a collection as our input. In this case, we are passing a collection with a first-name followed by a last-name.

Result Set:

#{[:ivan]}

Relation binding

(xt/q
 (xt/db node)
 '{:find [e]
   :in [[[first-name last-name]]]
   :where [[e :name first-name]
           [e :last-name last-name]]}
 [["Petr" "Petrov"]
  ["Smith" "Smith"]])

Here we see how we can extend the parameterisation to match using multiple fields at once by passing and destructuring a relation containing multiple tuples.

Result Set:

#{[:petr] [:smith]}

Ordering and Pagination

A Datalog query naturally returns a result set of tuples. However, the tuples can also be consumed as a sequence, so an implicit order is always available. Ordinarily this implicit order is undefined (i.e. not meaningful), because the join order and result order are unlikely to correlate.

The :order-by option is available for use in the query map to explicitly control the result order.

(xt/q
 (xt/db node)
 '{:find [time device-id temperature humidity]
   :where [[c :condition/time time]
           [c :condition/device-id device-id]
           [c :condition/temperature temperature]
           [c :condition/humidity humidity]]
   :order-by [[time :desc] [device-id :asc]]})

Use of :order-by will require that results are fully-realised by the query engine, however this happens transparently and it will automatically spill to disk when sorting large numbers of results.

Basic :offset and :limit options are supported. However, typical pagination use-cases will need a more comprehensive approach, because :offset will naively scroll through the initial result set each time.

(xt/q
 (xt/db node)
 '{:find [time device-id temperature humidity]
   :where [[c :condition/time time]
           [c :condition/device-id device-id]
           [c :condition/temperature temperature]
           [c :condition/humidity humidity]]
   :order-by [[device-id :asc]]
   :limit 10
   :offset 90})

Ordered results are returned as bags, not sets, so you may want to deduplicate consecutive identical result tuples (e.g. using clojure.core/dedupe or similar).
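
A small sketch of that deduplication step in plain Clojure. The vector below is hand-written and merely stands in for the ordered results of a query, so the device names and readings are hypothetical.

```clojure
;; Hypothetical ordered results: consecutive duplicates can occur
;; because ordered results are a bag rather than a set.
(def ordered-results
  [["device-1" 20.5]
   ["device-1" 20.5]
   ["device-2" 19.0]])

;; dedupe drops only consecutive duplicates, which is sufficient
;; when identical tuples are adjacent after sorting.
(def unique-results (vec (dedupe ordered-results)))

(prn unique-results) ; prints [["device-1" 20.5] ["device-2" 19.0]]
```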

:limit may be used in isolation, without :order-by, and will also return a bag of results that can contain duplicates. This will process exactly the required number of results from the underlying streaming query to satisfy the limit, which can be useful for minimizing unnecessary processing.

To use :order-by with an aggregate, simply restate the aggregate element exactly as it is written in your :find vector.
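
For instance, a query that orders devices by their average temperature might look like the following sketch. The attributes are borrowed from the examples above; the query itself and the name avg-temperature-query are hypothetical.

```clojure
;; Hypothetical query: average temperature per device, hottest first.
(def avg-temperature-query
  '{:find [device-id (avg temperature)]
    :where [[c :condition/device-id device-id]
            [c :condition/temperature temperature]]
    ;; the aggregate is restated exactly as it appears in :find
    :order-by [[(avg temperature) :desc]]})
```

It would be run in the usual way, e.g. (xt/q (xt/db node) avg-temperature-query).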

More powerful ordering and pagination features may be provided in the future. Feel free to open an issue or get in touch to discuss your ordering requirements, e.g. see #1514

Rules

Rules are defined by a rule head and then clauses as you would find in a :where statement.

They can be used as a shorthand for when you would otherwise be repeating the same restrictions in your :where statement.

(xt/q
 (xt/db node)
 '{:find [p]
   :where [(adult? p)] (1)
   :rules [[(adult? p) (2)
            [p :age a] (3)
            [(>= a 18)]]]})
1 Rule usage clause (i.e. invocation)
2 Rule head (i.e. signature)
3 Rule body containing one or more clauses

The above defines the rule named adult?, which checks that the supplied entity has an :age which is >= 18.

Multiple rule bodies may be defined for a single rule name (i.e. using matching rule heads) which works in a similar fashion to an or-join.
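
As a sketch of a rule with two bodies, the following mirrors the or-join example: the hypothetical rule adult-or-ivan? is satisfied by either body. The in-memory node, the documents and the var name adults-or-ivans are assumptions made for illustration.

```clojure
(require '[xtdb.api :as xt])

(def adults-or-ivans
  (with-open [node (xt/start-node {})]   ; in-memory node, for illustration only
    (xt/await-tx node
                 (xt/submit-tx node [[::xt/put {:xt/id :ivan :name "Ivan" :age 12}]
                                     [::xt/put {:xt/id :petr :name "Petr" :age 15}]
                                     [::xt/put {:xt/id :sergei :name "Sergei" :age 19}]]))
    (xt/q (xt/db node)
          '{:find [p]
            :where [(adult-or-ivan? p)]
            :rules [;; first body: anyone aged 18 or over
                    [(adult-or-ivan? p)
                     [p :age a]
                     [(>= a 18)]]
                    ;; second body, same head: anyone named "Ivan"
                    [(adult-or-ivan? p)
                     [p :name "Ivan"]]]})))

;; adults-or-ivans is expected to be #{[:ivan] [:sergei]}
```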

The clauses within Rules can also be further Rule invocation clauses. This allows for the recursive traversal of entities and more.

(xt/q
 (xt/db node)
 '{:find [?e2]
   :in [?e1]
   :where [(follow ?e1 ?e2)]
   :rules [[(follow ?e1 ?e2)
            [?e1 :follow ?e2]]
           [(follow ?e1 ?e2)
            [?e1 :follow ?t]
            (follow ?t ?e2)]]}
 :ivan)

This example finds all entities that the entity with ID :ivan is connected to via :follow, even if the connection is via intermediaries.

Bound arguments

To improve the performance of a rule you can specify that certain arguments in the rule head must be "bound" logic variables (i.e. there must be known values for each argument at the point of evaluation) by enclosing them in a vector in the first argument position. Any remaining arguments will be treated as regular "free" logic variables.

As an analogy, bound variables are input arguments to a function, and free variables are the destructured return values from that function.

Changes are only necessary in the rule head(s) - no changes are required in the body or the usage clauses. Rule heads must always match.

For example, the following query and rule set will work and return the correct results.

(xt/q
 (xt/db node)
 '{:find [child-name]
   :in [parent]
   :where [[parent :xt/id]
           (child-of parent child)
           [child :name child-name]]
   :rules [[(child-of p c)
            [p :child c]]
           [(child-of p c)
            [p :child c1]
            (child-of c1 c)]]}
 parent-id)

However, specifying that the p variable should be bound before the rule can be evaluated will improve the evaluation time by many orders of magnitude for large data sets.

(xt/q
 (xt/db node)
 '{:find [child-name]
   :in [parent]
   :where [[parent :xt/id]
           (child-of parent child)
           [child :name child-name]]
   :rules [[(child-of [p] c)
            [p :child c]]
           [(child-of [p] c)
            [p :child c1]
            (child-of c1 c)]]}
 parent-id)

Timeout

:timeout sets the maximum run time of the query (in milliseconds).

If the query has not completed by this time, a java.util.concurrent.TimeoutException is thrown.
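
A minimal sketch of handling the timeout from calling code. The helper name all-ids-within and the ::timed-out sentinel are hypothetical; the :timeout key and the exception type are as described above.

```clojure
(require '[xtdb.api :as xt])
(import 'java.util.concurrent.TimeoutException)

;; Hypothetical helper: run a query with a time budget and return
;; ::timed-out instead of propagating the exception.
(defn all-ids-within [db timeout-ms]
  (try
    (xt/q db {:find '[e]
              :where '[[e :xt/id]]
              :timeout timeout-ms})
    (catch TimeoutException _
      ::timed-out)))
```

For example, (all-ids-within (xt/db node) 5000) allows the query up to five seconds.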

Valid Time travel

When performing a query, xt/q is called on a database snapshot.

To query based on a different Valid Time, create this snapshot by specifying the desired Valid Time when we call db on the node.

(xt/submit-tx
 node
 [[::xt/put
   {:xt/id :malcolm :name "Malcolm" :last-name "Sparks"}
   #inst "1986-10-22"]])

(xt/submit-tx
 node
 [[::xt/put
   {:xt/id :malcolm :name "Malcolma" :last-name "Sparks"}
   #inst "1986-10-24"]])

Here, we have put different documents in XTDB with different Valid Times.

(def q
  '{:find [e]
    :where [[e :name "Malcolma"]
            [e :last-name "Sparks"]]})

Here, we have defined a query, q, to find all entities with a :name of "Malcolma" and a :last-name of "Sparks".

We can run the query at different Valid Times as follows:

(xt/q (xt/db node #inst "1986-10-23") q)

(xt/q (xt/db node) q)

The first query will return an empty result set (#{}) because there isn’t a document with the :name "Malcolma" valid at #inst "1986-10-23".

The second query will return #{[:malcolm]} because the document with :name "Malcolma" is valid at the current time. This will be the case so long as there are no newer versions (in the valid time axis) of the document that affect the current valid time version.

Joins

Query: "Join across entities on a single attribute"

Given the following documents in the database:

[{:xt/id :ivan :name "Ivan"}
 {:xt/id :petr :name "Petr"}
 {:xt/id :sergei :name "Sergei"}
 {:xt/id :denis-a :name "Denis"}
 {:xt/id :denis-b :name "Denis"}]

We can run a query to return a set of tuples that satisfy the join on the attribute :name.

(xt/q
 (xt/db node)
 '{:find [p1 p2]
   :where [[p1 :name n]
           [p2 :name n]]})

Result Set:

#{[:ivan :ivan]
  [:petr :petr]
  [:sergei :sergei]
  [:denis-a :denis-a]
  [:denis-b :denis-b]
  [:denis-a :denis-b]
  [:denis-b :denis-a]}

Note that every person joins once, plus 2 more matches.

Query: "Join with two attributes, including a multi-valued attribute"

Given the following documents in the database:

[{:xt/id :ivan :name "Ivan" :last-name "Ivanov"}
 {:xt/id :petr :name "Petr" :follows #{"Ivanov"}}]

We can run a query to return the set of entities whose :follows values include the :last-name of an entity with the :name value of "Ivan":

(xt/q
 (xt/db node)
 '{:find [e2]
   :where [[e :last-name l]
           [e2 :follows l]
           [e :name "Ivan"]]})

Result Set:

#{[:petr]}

Note that because XTDB is schemaless there is no need to have elsewhere declared that the :follows attribute may take a value of edn type set.

Streaming Queries

Query results can also be streamed, which is particularly useful for advanced scenarios where complete results may not fit into memory. xtdb.api/open-q returns a Closeable sequence. Note that results are returned as bags, not sets, so you may wish to deduplicate consecutive identical result tuples (e.g. using clojure.core/dedupe or similar).

We recommend using with-open to ensure that the sequence is closed properly.

Ensure that the sequence is eagerly consumed (as much of it as you need) within the with-open block. Attempting to consume the sequence from outside the block (either explicitly, or by accidentally returning a lazy sequence from the with-open block) will result in undefined behaviour (e.g. a JVM segfault crash when using RocksDB). Lazy sequences can be realized (and the problem thereby avoided) using doall/dorun/doseq or similar.

(with-open [res (xt/open-q (xt/db node)
                           '{:find [p1]
                             :where [[p1 :name n]
                                     [p1 :last-name n]
                                     [p1 :name "Smith"]]})]
  (doseq [tuple (iterator-seq res)]
    (prn tuple)))

History API

Full Entity History

XTDB allows you to retrieve all versions of a given entity:

(xt/submit-tx
  node
  [[::xt/put
    {:xt/id :ids.persons/Jeff
     :person/name "Jeff"
     :person/wealth 100}
    #inst "2018-05-18T09:20:27.966"]
   [::xt/put
    {:xt/id :ids.persons/Jeff
     :person/name "Jeff"
     :person/wealth 1000}
    #inst "2015-05-18T09:20:27.966"]])

; yields
{::xt/tx-id 1555314836178,
 ::xt/tx-time #inst "2019-04-15T07:53:56.178-00:00"}

; Returning the history in descending order
; To return in ascending order, use :asc in place of :desc
(xt/entity-history (xt/db node) :ids.persons/Jeff :desc)

; yields
[{::xt/tx-time #inst "2019-04-15T07:53:55.817-00:00",
  ::xt/tx-id 1555314835817,
  ::xt/valid-time #inst "2018-05-18T09:20:27.966-00:00",
  ::xt/content-hash ; sha1 hash of document contents
  "6ca48d3bf05a16cd8d30e6b466f76d5cc281b561"}
 {::xt/tx-time #inst "2019-04-15T07:53:56.178-00:00",
  ::xt/tx-id 1555314836178,
  ::xt/valid-time #inst "2015-05-18T09:20:27.966-00:00",
  ::xt/content-hash "a95f149636e0a10a78452298e2135791c0203529"}]

Retrieving previous documents

When retrieving the previous versions of an entity, you have the option to additionally return the documents associated with those versions (by using :with-docs? in the additional options map).

(xt/entity-history (xt/db node) :ids.persons/Jeff :desc {:with-docs? true})

; yields
[{::xt/tx-time #inst "2019-04-15T07:53:55.817-00:00",
  ::xt/tx-id 1555314835817,
  ::xt/valid-time #inst "2018-05-18T09:20:27.966-00:00",
  ::xt/content-hash
  "6ca48d3bf05a16cd8d30e6b466f76d5cc281b561"
  ::xt/doc
  {:xt/id :ids.persons/Jeff
   :person/name "Jeff"
   :person/wealth 100}}
 {::xt/tx-time #inst "2019-04-15T07:53:56.178-00:00",
  ::xt/tx-id 1555314836178,
  ::xt/valid-time #inst "2015-05-18T09:20:27.966-00:00",
  ::xt/content-hash "a95f149636e0a10a78452298e2135791c0203529"
  ::xt/doc
  {:xt/id :ids.persons/Jeff
   :person/name "Jeff"
   :person/wealth 1000}}]

Document History Range

Retrievable entity versions can be bounded by four time coordinates:

  • valid-time-start

  • tx-time-start

  • valid-time-end

  • tx-time-end

All coordinates are inclusive. All coordinates can be null. The specified sort-order direction will be relative to the actual order of the start/end times provided.

The returned order is always sorted by valid-time first, and then by tx-time as the tie-break (e.g. when :with-corrections? true is specified).

; Pass the additional 'opts' map with the start/end bounds.
; As results are returned in :asc order, the start coordinates in the map are the earlier ones.
; When returning the history range in :desc order, pass the later coordinates as the start coordinates.
(xt/entity-history
 (xt/db node)
 :ids.persons/Jeff
 :asc
 {:start-valid-time #inst "2015-05-18T09:20:27.966"
  :start-tx {::xt/tx-time #inst "2015-05-18T09:20:27.966"}
  :end-valid-time #inst "2020-05-18T09:20:27.966"
  :end-tx {::xt/tx-time #inst "2020-05-18T09:20:27.966"}})

; yields
[{::xt/tx-time #inst "2019-04-15T07:53:56.178-00:00",
  ::xt/tx-id 1555314836178,
  ::xt/valid-time #inst "2015-05-18T09:20:27.966-00:00",
  ::xt/content-hash
  "a95f149636e0a10a78452298e2135791c0203529"}
 {::xt/tx-time #inst "2019-04-15T07:53:55.817-00:00",
  ::xt/tx-id 1555314835817,
  ::xt/valid-time #inst "2018-05-18T09:20:27.966-00:00",
  ::xt/content-hash "6ca48d3bf05a16cd8d30e6b466f76d5cc281b561"}]
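
The ordering rule described above (valid-time first, tx-time as the tie-break) can be sketched on plain data. This is only a client-side illustration, not how XTDB sorts internally. It assumes that xt aliases the xtdb.api namespace, so that ::xt/valid-time reads as :xtdb.api/valid-time; sort-history-asc is a hypothetical helper name.

```clojure
;; The two history entries from the example output, written as plain maps.
;; Assumption: ::xt/... keywords expand to :xtdb.api/... keywords.
(def history
  [{:xtdb.api/tx-time    #inst "2019-04-15T07:53:55.817-00:00"
    :xtdb.api/tx-id      1555314835817
    :xtdb.api/valid-time #inst "2018-05-18T09:20:27.966-00:00"}
   {:xtdb.api/tx-time    #inst "2019-04-15T07:53:56.178-00:00"
    :xtdb.api/tx-id      1555314836178
    :xtdb.api/valid-time #inst "2015-05-18T09:20:27.966-00:00"}])

;; Hypothetical helper: orders by valid-time, then by tx-time when the
;; valid-times are equal.
(defn sort-history-asc [entries]
  (sort-by (juxt :xtdb.api/valid-time :xtdb.api/tx-time) entries))

(map :xtdb.api/tx-id (sort-history-asc history))
;; => (1555314836178 1555314835817)
```

The entry with the 2015 valid-time comes first even though it has the later tx-time, because tx-time is only consulted when two entries share a valid-time.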

Lazy Iteration

See the open-entity-history (Clojure) and openEntityHistory (Java) APIs.

History predicates

The predicates get-start-valid-time and get-end-valid-time can also be used to access the history of an entity inside a query. They return the start valid-time and end valid-time of an entity with respect to the snapshot view of the given db.

(xt/submit-tx node [[::xt/put
                     {:xt/id 1 :hello :world}
                     #inst "1900-08-29T15:05:31.530-00:00"
                     #inst "2100-08-29T15:05:31.530-00:00"]])

(xt/q (xt/db node)
      '{:find [e start-time end-time]
        :where [[e :hello :world]
                [(get-start-valid-time e) start-time]
                [(get-end-valid-time e) end-time]]})

; yields
#{[1
   #inst "1900-08-29T15:05:31.530-00:00"
   #inst "2100-08-29T15:05:31.530-00:00"]}

Clojure Tips

Quoting

Logic variables used in queries must always be quoted in the :find and :where clauses, which in the most minimal case could look like the following:

(xt/q db
  {:find ['?e]
   :where [['?e :event/employee-code '?code]]})

However it is often convenient to quote entire clauses or even the entire query map rather than each individual use of every logic variable, for instance:

(xt/q db
  '{:find [?e]
    :where [[?e :event/employee-code ?code]]})

Note that use of Clojure’s syntax quoting may cause confusion and is therefore not recommended. This is because Datalog built-ins like range predicates, unification predicates and rules (or, not etc.) will be mistakenly coerced into corresponding Clojure core functions (clojure.core/or, clojure.core/== etc.) and produce invalid queries.
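
The difference can be seen at the REPL without running a query. The sketch below builds the same hypothetical query map twice, once with a plain quote and once with a syntax quote, and inspects the resulting symbols; the query itself is only illustrative data.

```clojure
;; Plain quote: every symbol is left exactly as written.
(def quoted-query
  '{:find [e]
    :where [(or [e :name "Ivan"]
                [e :name "Petr"])]})

;; Syntax quote: symbols are namespace-qualified when the form is read.
(def syntax-quoted-query
  `{:find [e]
    :where [(or [e :name "Ivan"]
                [e :name "Petr"])]})

(-> quoted-query :where first first)
;; => or

(-> syntax-quoted-query :where first first)
;; => clojure.core/or

;; Logic variables are affected too: e is qualified with the current namespace.
(-> syntax-quoted-query :find first namespace some?)
;; => true
```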

Maps and Vectors in data

Say you have a document like so and you want to add it to an XTDB db:

{:xt/id :me
 :list ["carrots" "peas" "shampoo"]
 :pockets {:left ["lint" "change"]
           :right ["phone"]}}

XTDB decomposes top-level sets and vectors into a collection of triples (which are fully indexed), so the query engine is able to see all elements at the base level. As a result, the query engine does not need to traverse nested structures or run any other kind of search that would slow the query down. The same principle should be applied to maps: instead of nesting them as :pockets {:left thing :right thing}, put the keys under a namespace and structure the data as :pockets/left thing :pockets/right thing, so that all of the data sits at the base level. Like so:

(xt/submit-tx
  node
  [[::xt/put
    {:xt/id :me
     :list ["carrots" "peas" "shampoo"]
     :pockets/left ["lint" "change"]
     :pockets/right ["phone"]}]
   [::xt/put
    {:xt/id :you
     :list ["carrots" "tomatoes" "wig"]
     :pockets/left ["wallet" "watch"]
     :pockets/right ["spectacles"]}]])
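
The restructuring described above can also be done programmatically before a document is submitted. The following is a minimal sketch, not part of the XTDB API: flatten-top-level-maps is a hypothetical helper that assumes keyword keys, handles a single level of nesting, and uses only the name of the outer key as the new namespace.

```clojure
;; Hypothetical helper: lifts the entries of each top-level map value into
;; namespaced keys on the document itself, e.g. :pockets {:left x}
;; becomes :pockets/left x. Non-map values are copied unchanged.
(defn flatten-top-level-maps [doc]
  (reduce-kv
   (fn [acc k v]
     (if (and (keyword? k) (map? v))
       (reduce-kv (fn [a inner-k inner-v]
                    (assoc a (keyword (name k) (name inner-k)) inner-v))
                  acc
                  v)
       (assoc acc k v)))
   {}
   doc))

(flatten-top-level-maps
 {:xt/id :me
  :list ["carrots" "peas" "shampoo"]
  :pockets {:left ["lint" "change"]
            :right ["phone"]}})
;; => {:xt/id :me
;;     :list ["carrots" "peas" "shampoo"]
;;     :pockets/left ["lint" "change"]
;;     :pockets/right ["phone"]}
```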

To query inside these vectors the code would be:

(xt/q (xt/db node) '{:find [e l]
                     :where [[e :list l]]
                     :in [l]}
                   "carrots")
;; => #{[:you "carrots"] [:me "carrots"]}

(xt/q (xt/db node) '{:find [e p]
                     :where [[e :pockets/left p]]
                     :in [p]}
                   "watch")
;; => #{[:you "watch"]}

Note that l and p are each returned as a single element because XTDB decomposes the vector.

DataScript Differences

This list is not necessarily exhaustive and is based on the partial re-usage of DataScript’s query test suite within XTDB’s query tests.

XTDB does not support:

  • vars in the attribute position, such as [e ?a "Ivan"] or [e _ "Ivan"]

XTDB does not yet support:

  • ground (you can alternatively use identity)

  • get-else (see get-attr which returns a relation instead)

  • get-some

  • missing? (however, instead of [(missing? $ ?e :height)] you can use (not-join [?e] [?e :height]))

  • reverse attribute syntax in triple clauses (i.e. [?child :example/_child ?parent]), however reverse attribute joins (like :example/_child) are supported in pull

Custom Functions and Advanced Examples

Many advanced query requirements can be addressed using custom predicate function calls, since you can reference any function that is loaded (and available on the classpath) by using its fully qualified name in a Datalog clause, e.g. [(clojure.core/+ x y) z]. These custom functions can be passed the db query context using the $ argument convention for sophisticated nesting of queries and other lookups. Destructuring the results of a function as a relation is also supported, similarly to :in bindings. The suite of functions in the clojure.core namespace may be referred to without the fully qualified prefix, including clojure.core/eval; however, an allowlist feature is available to help maintain security (particularly relevant when the HTTP server is being used).

These functions can be used, for example, to access data that is deeply nested within a given top-level value.

[{:xt/id :foo
  :bar {:baz {:qux {:qaz 123}}}}] (1)

(xt/q
 (xt/db node)
 '{:find [v]
   :where [[:foo :bar bar-val]
           [(get-in bar-val [:baz :qux :qaz]) v]]}) (2)

(xt/q
 (xt/db node)
 '{:find [v]
   :in [k1 k2]
   :where [[(vector k1 k2 :qaz) ks] (3)
           [:foo :bar bar-val]
           [(get-in bar-val ks) v]]}
 :baz :qux)

#{[123]} (4)
1 Data
2 Usage of clojure.core/get-in function with literal vector of keys
3 Usage of clojure.core/vector function to construct intermediate vector value
4 Result
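
Each function clause above binds the return value of an ordinary Clojure call, so the same expressions can be checked at the REPL without a database. A minimal sketch, using the value stored under :bar in the data above:

```clojure
;; The value of :bar from the example document.
(def bar-val {:baz {:qux {:qaz 123}}})

;; Corresponds to the clause [(get-in bar-val [:baz :qux :qaz]) v]
(get-in bar-val [:baz :qux :qaz])
;; => 123

;; Corresponds to [(vector k1 k2 :qaz) ks] followed by [(get-in bar-val ks) v],
;; with k1 and k2 supplied as :baz and :qux.
(let [ks (vector :baz :qux :qaz)]
  (get-in bar-val ks))
;; => 123
```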

Be aware during development that query compilation internalizes function definitions and this means that subsequent re-definitions of your custom functions may not be reflected until the query is modified and therefore re-compiled. During development, you can introduce a dummy predicate clause like [(any? 1)] (which always evaluates to true and has no effect on the query) and increment the integer when needed to bypass the effects of query caching.

The full range of XTDB APIs is available within custom functions:

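;; These helpers are assumed to be defined in a namespace named dev,
;; which is why the query below refers to them as dev/...
(defn get-vt-created [db eid]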
  (with-open [h (xt/open-entity-history db eid :asc)]
    (-> (iterator-seq h)
        first
        ::xt/valid-time)))

(defn get-vt-last-updated [db eid]
  (with-open [h (xt/open-entity-history db eid :desc)]
    (-> (iterator-seq h)
        first
        ::xt/valid-time)))

(xt/q (xt/db my-node)
      '{:find [e vt-created vt-last-updated]
        :limit 1
        :where [[e :xt/id]
                [(dev/get-vt-created $ e) vt-created]
                [(dev/get-vt-last-updated $ e) vt-last-updated]]})

;;=> [[-219 #inst "2021-10-15T12:01:06.289-00:00" #inst "2021-10-15T12:15:11.020-00:00"]]

Many examples of advanced queries are showcased across the query tests and the suite of benchmarks maintained in the bench sub-project.