AWS S3

You can use AWS’s Simple Storage Service (S3) as either:

  • XTDB’s 'document store'

  • An implementation of the XTDB query index checkpoint store.

Authentication

Authentication for both the document store and the checkpoint store is handled via the S3 API, which in turn uses the Default AWS Credential Provider Chain. See the AWS documentation for the various ways to supply credentials so that the module can perform its operations.

Whichever authentication method you use, ensure that the credentials have at least the permissions to:

  • Read objects from the bucket

  • Write objects to the bucket

If using the checkpoint store, you will need the following permissions in addition to the above:

  • List objects in the bucket

  • Delete objects from the bucket
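
As a rough illustration, the permissions above map onto the standard S3 IAM actions `s3:GetObject`, `s3:PutObject`, `s3:ListBucket` and `s3:DeleteObject`. The policy below is a minimal sketch rather than an official XTDB policy: the bucket name `your-bucket` and the `Sid` values are placeholders, and your setup may need further statements (for example, for KMS-encrypted buckets).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "XtdbReadWriteDeleteObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-bucket/*"
    },
    {
      "Sid": "XtdbListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::your-bucket"
    }
  ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN itself, while the object-level actions apply to `your-bucket/*`. If you only use the document store, the `s3:DeleteObject` action and the second statement can be dropped.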

Project Dependency

In order to use S3 within XTDB, you must first add S3 as a project dependency:

  • deps.edn

com.xtdb/xtdb-s3 {:mvn/version "1.24.1"}

  • pom.xml

<dependency>
    <groupId>com.xtdb</groupId>
    <artifactId>xtdb-s3</artifactId>
    <version>1.24.1</version>
</dependency>

Using S3 as a document store

Replace the implementation of the document store with xtdb.s3/->document-store.

  • JSON

{
  "xtdb/document-store": {
    "xtdb/module": "xtdb.s3/->document-store",
    "bucket": "your-bucket",
    ...
  }
}

  • Clojure

{:xtdb/document-store {:xtdb/module 'xtdb.s3/->document-store
                       :bucket "your-bucket"
                       ...}}

  • EDN

{:xtdb/document-store {:xtdb/module xtdb.s3/->document-store
                       :bucket "your-bucket"
                       ...}}

Parameters

  • configurator (S3Configurator)

  • bucket (string, required)

  • prefix (string): S3 key prefix

  • cache-size (int): size of in-memory document cache (number of entries, not bytes)
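
For illustration, a JSON document store configuration that sets the optional parameters might look like the sketch below. The `prefix` and `cache-size` values are hypothetical placeholders, not recommended settings. The `configurator` is omitted because it can currently only be supplied from Clojure.

```json
{
  "xtdb/document-store": {
    "xtdb/module": "xtdb.s3/->document-store",
    "bucket": "your-bucket",
    "prefix": "xtdb-docs",
    "cache-size": 10000
  }
}
```

Here `cache-size` is a count of cached documents, so the memory used depends on how large your documents are.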

Using S3 as a checkpoint store

S3 can be used as a query index checkpoint store. XTDB does not garbage-collect checkpoints, so we recommend setting a lifecycle policy on your bucket to remove older checkpoints.
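
As a sketch of such a lifecycle policy, the S3 lifecycle configuration below expires objects under a checkpoint prefix after a fixed number of days. The rule ID, the `checkpoint-dir/` prefix and the 30-day retention are assumptions for illustration. Pick a retention period comfortably longer than your checkpoint frequency so that a recent checkpoint always remains.

```json
{
  "Rules": [
    {
      "ID": "expire-old-xtdb-checkpoints",
      "Status": "Enabled",
      "Filter": { "Prefix": "checkpoint-dir/" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```

A document in this shape can be applied with the AWS CLI's `aws s3api put-bucket-lifecycle-configuration` command, or the equivalent rule can be created in the S3 console.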

Option 1: Using the S3 checkpoint module

;; under :xtdb/index-store -> :kv-store -> :checkpointer
;; see the Checkpointing guide for other parameters
{:checkpointer {...
                :store {:xtdb/module 'xtdb.s3.checkpoint/->cp-store
                        :configurator ...
                        :bucket "..."
                        :prefix "..."
                 ...}}
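
To show where the fragment above sits in a full node configuration, here is a hedged JSON sketch. It assumes a RocksDB index store (`xtdb.rocksdb/->kv-store` with a `db-dir`), the standard `xtdb.checkpoint/->checkpointer` module, and an ISO-8601 duration string for `approx-frequency`. The paths, bucket, prefix and frequency are placeholders, so check the Checkpointing guide for the authoritative parameter list.

```json
{
  "xtdb/index-store": {
    "kv-store": {
      "xtdb/module": "xtdb.rocksdb/->kv-store",
      "db-dir": "/var/lib/xtdb/indexes",
      "checkpointer": {
        "xtdb/module": "xtdb.checkpoint/->checkpointer",
        "store": {
          "xtdb/module": "xtdb.s3.checkpoint/->cp-store",
          "bucket": "checkpoint-bucket",
          "prefix": "checkpoint-dir"
        },
        "approx-frequency": "PT6H"
      }
    }
  }
}
```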

Parameters

  • configurator (S3Configurator)

  • bucket (string, required)

  • prefix (string): S3 key prefix

Configuring S3 requests

Unfortunately, this is currently only accessible from Clojure - we plan to expose it outside of Clojure soon.

While the above is sufficient to get xtdb-s3 working out of the box, S3 has many configuration options: how to get credentials, object properties, serialisation of the documents, etc. We expose these via the xtdb.s3.S3Configurator interface - you can supply an instance in your node configuration, as shown in the example below.

Through this interface, you can supply an S3AsyncClient for xtdb-s3 to use, adapt the PutObjectRequest/GetObjectRequest as required, and choose the serialisation format. By default, we get credentials through the usual AWS credentials provider, and store documents using Nippy.

  • Clojure

{:xtdb/document-store {:xtdb/module 'xtdb.s3/->document-store
                       :configurator (fn [_]
                                       (reify S3Configurator
                                         ...))
                       ...}}

Option 2: Using the newer checkpoint-transfer-manager module

  • Clojure

{...
 :checkpointer
 {:xtdb/module `xtdb.checkpoint/->checkpointer

  :store {:xtdb/module `xtdb.s3.checkpoint-transfer-manager/->cp-store
          :bucket "checkpoint-bucket"
          :prefix "checkpoint-dir"
          :configurator `s3-configurator} ;; see below for an example

  :approx-frequency (java.time.Duration/ofSeconds 3600)}
 ...}

Although AWS Transfer Manager works with the regular S3AsyncClient, we recommend using the new CRT-based S3 client to gain its full benefit. For example:

  • EDN

{:deps {...
        software.amazon.awssdk/s3-transfer-manager {:mvn/version "2.19.21"}
        software.amazon.awssdk.crt/aws-crt {:mvn/version "0.21.1"}
        ...}
 ...}

  • Clojure

(import '(software.amazon.awssdk.auth.credentials ProfileCredentialsProvider)
        '(software.amazon.awssdk.services.s3 S3AsyncClient)
        '(xtdb.s3 S3Configurator))

(defn- s3-configurator [_]
  (reify S3Configurator
    ;; Build the CRT-based async client that xtdb-s3 will use
    (makeClient [_]
      (-> (S3AsyncClient/crtBuilder)
          ;; read credentials from the named profile
          (.credentialsProvider (ProfileCredentialsProvider/create "dev-profile"))
          (.targetThroughputInGbps 20.0)
          (.minimumPartSizeInBytes (* 8 1024))
          (.build)))))

When using the CRT client, S3 Transfer Manager uses multipart transfers, so we recommend configuring an AbortIncompleteMultipartUpload lifecycle rule on your bucket to clean up parts left behind by interrupted uploads.
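
A minimal sketch of such a rule in S3 lifecycle configuration form is shown below. The rule ID and the 7-day threshold are assumptions for illustration, and the empty prefix applies the rule to the whole bucket.

```json
{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart-uploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
```

A bucket has a single lifecycle configuration, and applying a new one replaces it. If you also expire old checkpoints with a lifecycle rule, put both rules in the same `Rules` array.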