AWS S3

You can use AWS’s Simple Storage Service (S3) as XTDB’s 'document store'.

Project Dependency

To use S3 within XTDB, first add xtdb-s3 as a project dependency:

deps.edn:

com.xtdb/xtdb-s3 {:mvn/version "1.23.3"}

pom.xml:

<dependency>
    <groupId>com.xtdb</groupId>
    <artifactId>xtdb-s3</artifactId>
    <version>1.23.3</version>
</dependency>

Using S3 as a document store

Replace the implementation of the document store with xtdb.s3/->document-store:

JSON:

{
  "xtdb/document-store": {
    "xtdb/module": "xtdb.s3/->document-store",
    "bucket": "your-bucket",
    ...
  }
}

Clojure:

{:xtdb/document-store {:xtdb/module 'xtdb.s3/->document-store
                       :bucket "your-bucket"
                       ...}}

EDN:

{:xtdb/document-store {:xtdb/module xtdb.s3/->document-store
                       :bucket "your-bucket"
                       ...}}

Parameters

  • configurator (S3Configurator)

  • bucket (string, required)

  • prefix (string): S3 key prefix

  • cache-size (int): size of in-memory document cache (number of entries, not bytes)
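For illustration, a Clojure configuration that also sets the optional prefix and cache-size parameters might look like the following. The prefix and cache size are arbitrary placeholder values chosen for this sketch. They are not recommended defaults.

```clojure
{:xtdb/document-store {:xtdb/module 'xtdb.s3/->document-store
                       :bucket "your-bucket"
                       ;; hypothetical key prefix: documents are stored under keys starting with it
                       :prefix "xtdb/documents/"
                       ;; hypothetical size: number of cached documents, not bytes
                       :cache-size 131072}}
```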

Using S3 as a checkpoint store

S3 can be used as a query index checkpoint store. XTDB does not garbage-collect old checkpoints, so we recommend setting a lifecycle policy on your bucket to remove older checkpoints.
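As a minimal sketch of such a lifecycle policy, the Clojure code below uses the AWS SDK for Java v2 model classes to expire objects under a checkpoint prefix after a fixed number of days. xtdb-s3 already builds on this SDK through S3AsyncClient. The namespace, rule id, prefix and retention period are all hypothetical. PutBucketLifecycleConfiguration replaces the bucket's entire existing lifecycle configuration, so pass all of your rules in the same call.

```clojure
(ns example.checkpoint-lifecycle ;; hypothetical namespace
  (:import (software.amazon.awssdk.services.s3 S3AsyncClient)
           (software.amazon.awssdk.services.s3.model BucketLifecycleConfiguration
                                                     ExpirationStatus
                                                     LifecycleExpiration
                                                     LifecycleRule
                                                     LifecycleRuleFilter
                                                     PutBucketLifecycleConfigurationRequest)))

(defn checkpoint-expiry-rule
  "Builds a lifecycle rule that expires every object whose key starts with
  `prefix` once it is `days` days old."
  ^LifecycleRule [^String prefix days]
  (-> (LifecycleRule/builder)
      (.id "expire-old-xtdb-checkpoints") ;; hypothetical rule id
      (.filter (-> (LifecycleRuleFilter/builder) (.prefix prefix) (.build)))
      (.status ExpirationStatus/ENABLED)
      ;; LifecycleExpiration.Builder#days takes an Integer, hence the int cast
      (.expiration (-> (LifecycleExpiration/builder) (.days (int days)) (.build)))
      (.build)))

(defn put-lifecycle-rules!
  "Replaces the bucket's whole lifecycle configuration with `rules` and
  blocks until S3 acknowledges the request."
  [^S3AsyncClient client ^String bucket rules]
  (let [config (-> (BucketLifecycleConfiguration/builder)
                   (.rules ^java.util.Collection (vec rules))
                   (.build))
        request (-> (PutBucketLifecycleConfigurationRequest/builder)
                    (.bucket bucket)
                    (.lifecycleConfiguration config)
                    (.build))]
    ;; the async client returns a CompletableFuture; .join waits for the result
    (.join (.putBucketLifecycleConfiguration client request))))
```

Expiry is based on each object's age, not on whether a newer checkpoint exists. Keep the retention period comfortably longer than your checkpoint frequency so that the most recent checkpoint is never removed.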

Option 1: Using the legacy S3 checkpoint module

;; under :xtdb/index-store -> :kv-store -> :checkpointer
;; see the Checkpointing guide for other parameters
{:checkpointer {...
                :store {:xtdb/module 'xtdb.s3.checkpoint/->cp-store
                        :configurator ...
                        :bucket "..."
                        :prefix "..."
                        ...}}}

Parameters

Configuring S3 requests

Unfortunately, this is currently only accessible from Clojure; we plan to expose it outside of Clojure soon.

While the above is sufficient to get xtdb-s3 working out of the box, there are many things you may want to configure: how credentials are obtained, object properties, how documents are serialised, and so on. We expose these via the xtdb.s3.S3Configurator interface; you can supply an instance using the following in your node configuration.

Through this interface, you can supply an S3AsyncClient for xtdb-s3 to use, adapt the PutObjectRequest/GetObjectRequest as required, and choose the serialisation format. By default, we obtain credentials through the standard AWS credentials provider chain and store documents using Nippy.

{:xtdb/document-store {:xtdb/module 'xtdb.s3/->document-store
                       :configurator (fn [_]
                                       (reify S3Configurator
                                         ...))
                       ...}}
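As a sketch of one way to fill in the reify above, the following configurator overrides only makeClient. It pins the client to a single region and resolves credentials through the SDK's DefaultCredentialsProvider. The namespace, function name and region are hypothetical. The sketch assumes the other S3Configurator methods keep their default behaviour when left out of the reify.

```clojure
(ns example.s3-configurator ;; hypothetical namespace
  (:import (xtdb.s3 S3Configurator)
           (software.amazon.awssdk.auth.credentials DefaultCredentialsProvider)
           (software.amazon.awssdk.regions Region)
           (software.amazon.awssdk.services.s3 S3AsyncClient)))

(defn ->regional-configurator
  "Hypothetical configurator: builds an S3AsyncClient pinned to eu-west-2
  that resolves credentials via the SDK's default provider chain."
  [_]
  (reify S3Configurator
    (makeClient [_]
      (-> (S3AsyncClient/builder)
          (.region Region/EU_WEST_2)
          (.credentialsProvider (DefaultCredentialsProvider/create))
          (.build)))))

;; usage: pass the function as the document store's :configurator
(def node-config
  {:xtdb/document-store {:xtdb/module 'xtdb.s3/->document-store
                         :bucket "your-bucket"
                         :configurator ->regional-configurator}})
```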

Option 2: Using the new checkpoint-transfer-manager module

{...
 :checkpointer {:xtdb/module `xtdb.checkpoint/->checkpointer
                :store {:xtdb/module `xtdb.s3.checkpoint-transfer-manager/->cp-store
                        :bucket "checkpoint-bucket"
                        :prefix "checkpoint-dir"
                        ;; s3-configurator is defined in the example below
                        :configurator `s3-configurator}
                :approx-frequency (java.time.Duration/ofSeconds 3600)}
 ...}

Although the AWS S3 Transfer Manager works with the regular S3AsyncClient, we recommend using the CRT-based S3 client to get the Transfer Manager's full benefit. For example, add the following dependencies and build a CRT-based client in your S3Configurator:

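deps.edn:

{:deps
 {...
  software.amazon.awssdk/s3-transfer-manager {:mvn/version "2.19.21"}
  software.amazon.awssdk.crt/aws-crt {:mvn/version "0.21.1"}
  ...}
 ...}

Clojure:

(import '(xtdb.s3 S3Configurator)
        '(software.amazon.awssdk.auth.credentials ProfileCredentialsProvider)
        '(software.amazon.awssdk.services.s3 S3AsyncClient))

(defn- s3-configurator [_]
  (reify S3Configurator
    (makeClient [_]
      ;; build a CRT-based S3AsyncClient using credentials from the "dev-profile" AWS profile
      (-> (S3AsyncClient/crtBuilder)
          (.credentialsProvider (ProfileCredentialsProvider/create "dev-profile"))
          (.targetThroughputInGbps 20.0)
          ;; 8 MiB: S3 multipart upload parts must be at least 5 MiB (except the last part)
          (.minimumPartSizeInBytes (* 8 1024 1024))
          (.build)))))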

When using the CRT client, the S3 Transfer Manager performs multipart transfers, so we recommend configuring an AbortIncompleteMultipartUpload lifecycle rule on your bucket to clean up uploads that never complete.
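A minimal sketch of such a rule, in Clojure with the AWS SDK for Java v2 model classes, follows. It builds a lifecycle rule that aborts multipart uploads still incomplete a given number of days after they were started, across the whole bucket. The namespace, rule id and number of days are hypothetical. The rule still has to be added to the bucket's lifecycle configuration, for example with PutBucketLifecycleConfiguration. That call replaces any existing rules, so combine this rule with any checkpoint-expiry rule you already have.

```clojure
(ns example.multipart-lifecycle ;; hypothetical namespace
  (:import (software.amazon.awssdk.services.s3.model AbortIncompleteMultipartUpload
                                                     ExpirationStatus
                                                     LifecycleRule
                                                     LifecycleRuleFilter)))

(defn abort-incomplete-uploads-rule
  "Builds a lifecycle rule that aborts multipart uploads which are still
  incomplete `days` days after they were initiated, for every key in the bucket."
  ^LifecycleRule [days]
  (-> (LifecycleRule/builder)
      (.id "abort-incomplete-multipart-uploads") ;; hypothetical rule id
      ;; an empty prefix filter applies the rule to the whole bucket
      (.filter (-> (LifecycleRuleFilter/builder) (.prefix "") (.build)))
      (.status ExpirationStatus/ENABLED)
      ;; daysAfterInitiation takes an Integer, hence the int cast
      (.abortIncompleteMultipartUpload
       (-> (AbortIncompleteMultipartUpload/builder)
           (.daysAfterInitiation (int days))
           (.build)))
      (.build)))
```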