Checkpointing

XTDB nodes can save checkpoints of their query indices on a regular basis, so that new nodes can start to service queries faster.

XTDB nodes that join a cluster have to obtain a local set of query indices before they can service queries. These can be built by replaying the transaction log from the beginning, although this may be slow for clusters with a lot of history. Checkpointing allows the nodes in a cluster to share checkpoints into a central 'checkpoint store', so that nodes joining a cluster can retrieve a recent checkpoint of the query indices, rather than replaying the whole history.

The checkpoint store is a pluggable module - there are a number of officially supported implementations:

XTDB nodes in a cluster don’t explicitly communicate regarding which one is responsible for creating a checkpoint - instead, they check at random intervals to see whether any other node has recently created a checkpoint, and create one if necessary. The desired frequency of checkpoints can be set using approx-frequency.

The default lifecycle of checkpoints is unbounded - XTDB will not attempt to clean up old checkpoints. Users will typically want to define a retention policy that best fits their requirements and create means of implementing that policy that is external to XTDB.

Setting up

You can enable checkpoints on your index-store by adding a :checkpointer dependency to the underlying KV store:

  • JSON

  • Clojure

  • EDN

{
  "xtdb/index-store": {
    "kv-store": {
      "xtdb/module": "xtdb.rocksdb/->kv-store",
      ...
      "checkpointer": {
        "xtdb/module": "xtdb.checkpoint/->checkpointer",
        "store": {
          "xtdb/module": "xtdb.checkpoint/->filesystem-checkpoint-store",
          "path": "/path/to/cp-store"
        },
        "approx-frequency": "PT6H"
      }
    }
  },
  ...
}
{:xtdb/index-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                               ...
                               :checkpointer {:xtdb/module 'xtdb.checkpoint/->checkpointer
                                              :store {:xtdb/module 'xtdb.checkpoint/->filesystem-checkpoint-store
                                                      :path "/path/to/cp-store"}
                                              :approx-frequency (Duration/ofHours 6)}}}
 ...}
{:xtdb/index-store {:kv-store {:xtdb/module xtdb.rocksdb/->kv-store
                               ...
                               :checkpointer {:xtdb/module xtdb.checkpoint/->checkpointer
                                              :store {:xtdb/module xtdb.checkpoint/->filesystem-checkpoint-store
                                                      :path "/path/to/cp-store"}
                                              :approx-frequency "PT6H"}}}
 ...}

Checkpointer parameters

  • approx-frequency (required, Duration): approximate frequency for the cluster to save checkpoints

  • store: (required, CheckpointStore): see the individual store for more details.

  • checkpoint-dir (string/File/Path): temporary directory to store checkpoints in before they’re uploaded

  • keep-dir-between-checkpoints? (boolean, default true): whether to keep the temporary checkpoint directory between checkpoints

  • keep-dir-on-close? (boolean, default false): whether to keep the temporary checkpoint directory when the node shuts down

FileSystem Checkpoint Store parameters

  • path (required, string/File/Path/URI): path to store checkpoints.

Note about using FileSystem Checkpoint Store

Most Cloud providers offer some high-performance block storage facility (e.g. AWS EBS or Google Cloud Storage). These volumes usually can be mounted on compute platforms (e.g. EC2) or containers (e.g. K8S). Often the Cloud provider also offers tooling to quickly snapshot those volumes (see EBS snapshots). A possible checkpoint strategy for XTDB nodes running in a cloud environment with such a facility, would consist in:

  • having [:checkpointer :store :path] pointing to a filesystem mounted on such a volume

  • having the joining XTDB node [:checkpointer :store :path] point to a filesystem mounted an a snapshot from the main node (above).