AWS S3
You can use AWS’s Simple Storage Service (S3) as XTDB’s 'document store'.
Using S3 as a document store
Replace the implementation of the document store with xtdb.s3/->document-store:
{
  "xtdb/document-store": {
    "xtdb/module": "xtdb.s3/->document-store",
    "bucket": "your-bucket",
    ...
  }
}
;; Clojure
{:xtdb/document-store {:xtdb/module 'xtdb.s3/->document-store
                       :bucket "your-bucket"
                       ...}}
;; EDN
{:xtdb/document-store {:xtdb/module xtdb.s3/->document-store
                       :bucket "your-bucket"
                       ...}}
Parameters
- configurator (S3Configurator)
- bucket (string, required)
- prefix (string): S3 key prefix
- cache-size (int): size of in-memory document cache (number of entries, not bytes)
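As a minimal sketch, a document store configuration that sets the optional parameters might look like the following. The bucket name, prefix, and cache size are placeholder values chosen for illustration, not recommendations, and the trailing slash on the prefix is an assumption about how you want keys laid out.

```clojure
;; Placeholder values throughout; adjust to your own bucket layout.
{:xtdb/document-store {:xtdb/module 'xtdb.s3/->document-store
                       :bucket "your-bucket"
                       ;; S3 key prefix for stored documents (hypothetical value)
                       :prefix "xtdb-docs/"
                       ;; number of cached documents, not bytes (hypothetical value)
                       :cache-size 10000}}
```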
Using S3 as a checkpoint store
S3 can be used as a query index checkpoint store. XTDB does not garbage-collect checkpoints, so we recommend you set a lifecycle policy on your bucket to remove older checkpoints.
Option 1: Using the legacy S3 checkpoint module
;; under :xtdb/index-store -> :kv-store -> :checkpointer
;; see the Checkpointing guide for other parameters
{:checkpointer {...
                :store {:xtdb/module 'xtdb.s3.checkpoint/->cp-store
                        :configurator ...
                        :bucket "..."
                        :prefix "..."
                        ...}}}
Parameters
- configurator (S3Configurator)
- bucket (string, required)
- prefix (string): S3 key prefix
- transfer-manager? (boolean, optional, default false): use the AWS S3 Transfer Manager
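To show where this fragment sits in a full node configuration, here is a hedged sketch of the nesting described in the comments above. The RocksDB KV store module, its :db-dir path, the bucket and prefix names, and the six-hour frequency are all assumptions made for illustration; see the Checkpointing guide for the authoritative list of checkpointer parameters.

```clojure
;; Sketch only: the KV store choice, paths, names and frequency are assumed.
{:xtdb/index-store
 {:kv-store
  {:xtdb/module 'xtdb.rocksdb/->kv-store          ; assumed KV store
   :db-dir (java.io.File. "/var/lib/xtdb/indexes") ; hypothetical path
   :checkpointer
   {:xtdb/module 'xtdb.checkpoint/->checkpointer
    :store {:xtdb/module 'xtdb.s3.checkpoint/->cp-store
            :bucket "your-checkpoint-bucket"       ; hypothetical bucket
            :prefix "checkpoints"                  ; hypothetical prefix
            :transfer-manager? true}               ; opt in; defaults to false
    :approx-frequency (java.time.Duration/ofHours 6)}}}}
```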
Configuring S3 requests
Note: this is currently only accessible from Clojure. We plan to expose it outside of Clojure soon.
While the above is sufficient to get xtdb-s3 working out of the box, S3 has many more configuration options: how to get credentials, object properties, serialisation of the documents, etc.
We expose these via the xtdb.s3.S3Configurator interface. You can supply an instance through the configurator parameter in your node configuration.
Through this interface, you can supply an S3AsyncClient for xtdb-s3 to use, adapt the PutObjectRequest/GetObjectRequest as required, and choose the serialisation format.
By default, we get credentials through the usual AWS credentials provider, and store documents using Nippy.
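As a minimal sketch of supplying a configurator, the example below overrides only makeClient to pin the S3 client to one region and leaves every other behaviour at its default. The region and bucket name are hypothetical, and passing the configurator as an inline function is an assumption; a symbol pointing at a one-argument function that returns the S3Configurator can be used in the same position.

```clojure
(import '(xtdb.s3 S3Configurator)
        '(software.amazon.awssdk.regions Region)
        '(software.amazon.awssdk.services.s3 S3AsyncClient))

;; Hypothetical node options: only the S3 client construction is customised.
(def node-opts
  {:xtdb/document-store
   {:xtdb/module 'xtdb.s3/->document-store
    :bucket "your-bucket"
    :configurator (fn [_]
                    (reify S3Configurator
                      ;; Methods not overridden here keep their defaults.
                      (makeClient [_]
                        (-> (S3AsyncClient/builder)
                            (.region Region/EU_WEST_1) ; assumed region
                            (.build)))))}})
```

Credentials still come from the usual AWS credentials provider in this sketch, because the builder is not given an explicit one.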
Option 2: Using the new checkpoint-transfer-manager module
{
 ...
 :checkpointer
 {:xtdb/module `xtdb.checkpoint/->checkpointer
  :store {:xtdb/module `xtdb.s3.checkpoint-transfer-manager/->cp-store
          :bucket "checkpoint-bucket"
          :prefix "checkpoint-dir"
          :configurator `s3-configurator} ;; see below for an example
  :approx-frequency (java.time.Duration/ofSeconds 3600)}
 ...
}
Although AWS Transfer Manager works fine with the regular S3AsyncClient, it is recommended to use the new CRT-based S3 client in order to gain its full benefit. For example:
{:deps
 {
  ...
  software.amazon.awssdk/s3-transfer-manager {:mvn/version "2.19.21"}
  software.amazon.awssdk.crt/aws-crt {:mvn/version "0.21.1"}
  ...
 }
 ...
}
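(import '(xtdb.s3 S3Configurator)
        '(software.amazon.awssdk.auth.credentials ProfileCredentialsProvider)
        '(software.amazon.awssdk.services.s3 S3AsyncClient))

(defn- s3-configurator [_]
  (reify S3Configurator
    (makeClient [_]
      ;; Build a CRT-based async client for the Transfer Manager to use.
      (-> (S3AsyncClient/crtBuilder)
          (.credentialsProvider (ProfileCredentialsProvider/create "dev-profile"))
          (.targetThroughputInGbps 20.0)
          ;; 8 MiB parts: S3 multipart uploads need parts of at least 5 MiB
          (.minimumPartSizeInBytes (* 8 1024 1024))
          (.build)))))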
When using the CRT Client, S3 Transfer Manager uses multipart transfers: it is recommended that you configure the AbortIncompleteMultipartUpload policy on your bucket.