Bitemporality
Overview
XTDB is optimised for efficient and globally consistent point-in-time queries using a pair of transaction-time and valid-time timestamps.
Ad-hoc systems for bitemporal recordkeeping typically rely on explicitly tracking either valid-from and valid-to timestamps or range types directly within relations. The bitemporal document model that XTDB provides is simple to reason about and is universal across the entire database, so you do not need to decide upfront which historical information is worth storing in special "bitemporal tables".
One or more documents may be inserted into XTDB via a put transaction at a specific valid-time, defaulting to the transaction time (i.e. now), and each document remains valid until explicitly updated with a new version via put or deleted via delete.
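As a rough illustration of put and delete at explicit valid times, the sketch below assumes XTDB 1.x with xtdb-core on the classpath, the xtdb.api namespace aliased as xt, and an in-memory node started from an empty configuration map. The :ticket-1 document and its dates are made up for the example.

```clojure
(require '[xtdb.api :as xt])

(def history
  ;; Assumption: an empty config map starts an in-memory node (XTDB 1.x).
  (with-open [node (xt/start-node {})]
    ;; First version, valid from 2020-01-01.
    (xt/submit-tx node [[::xt/put {:xt/id :ticket-1 :status :open}
                         #inst "2020-01-01"]])
    ;; New version of the same entity, valid from 2020-02-01.
    (xt/submit-tx node [[::xt/put {:xt/id :ticket-1 :status :closed}
                         #inst "2020-02-01"]])
    ;; The entity stops being valid from 2020-03-01.
    (xt/submit-tx node [[::xt/delete :ticket-1 #inst "2020-03-01"]])
    ;; Block until the node has indexed everything submitted so far.
    (xt/sync node)
    ;; Read the entity back at three different valid times.
    (mapv #(xt/entity (xt/db node {::xt/valid-time %}) :ticket-1)
          [#inst "2020-01-15" #inst "2020-02-15" #inst "2020-03-15"])))
```

Under these assumptions, history should hold the open version, then the closed version, then nil for the period after the delete.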
Why?
The rationale for bitemporality is also explained in this blog post.
A baseline notion of time that is always available is transaction-time: the point at which data is transacted into the database.
Bitemporality is the addition of another time-axis: valid-time.
Time | Purpose |
---|---|
transaction-time | Used for audit purposes and technical requirements such as event sourcing. |
valid-time | Used for querying data across time and for historical analysis. |
transaction-time represents the point at which data arrives into the database. This gives us an audit trail and we can see what the state of the database was at a particular point in time. You cannot write a new transaction with a transaction-time that is in the past.
valid-time is an arbitrary time that can originate from an upstream system, or by default is set to transaction-time. Valid time is what users will typically use for query purposes.
In XTDB, when transaction-time isn’t specified, it is set to now. When writing data, if no specific valid-time is available, valid-time and transaction-time take the same value.
Valid Time
In situations where your database is not the ultimate owner of the data—where corrections to data can flow in from various sources and at various times—use of transaction-time is inappropriate for historical queries.
Imagine you have a financial trading system and you want to perform calculations based on the official 'end of day', that occurs each day at 17:00 hours. Does all the data arrive into your database at exactly 17:00? Or does the data arrive from an upstream source where we have to allow for data to arrive out of order, and where some might always arrive after 17:00?
This can often be the case with high throughput systems where there are clusters of processing nodes, enriching the data before it gets to our store.
In this example, we want our queries to include the straggling bits of data for our calculation purposes, and this is where valid-time comes in. When data arrives into our database, it can come with an arbitrary time-stamp that we can use for querying purposes.
We can tolerate data arriving out of order, as we’re not completely dependent on transaction-time.
Transaction Time
For audit reasons, we might wish to know with certainty the value of a given entity-attribute at a given tx-instant. In this case, we want to exclude the possibility of the valid past being amended, so we need a pre-correction view of the data, relying on tx-instant.
To achieve this you can use as-of using ts (valid-time) and tx-ts (transaction-time).
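A pre-correction view might be obtained as sketched below. This assumes XTDB 1.x, where xt/db accepts a basis map containing ::xt/valid-time and ::xt/tx, and where ::xt/tx can be the transaction map returned by xt/submit-tx. The :account-1 document and its balances are hypothetical.

```clojure
(require '[xtdb.api :as xt])

(def views
  (with-open [node (xt/start-node {})] ; assumption: in-memory node
    (let [;; Original (mistaken) record, valid from 2020-01-01.
          tx-1 (xt/submit-tx node [[::xt/put {:xt/id :account-1 :balance 100}
                                    #inst "2020-01-01"]])
          _    (xt/await-tx node tx-1)
          ;; Later correction covering the same valid-time period.
          tx-2 (xt/submit-tx node [[::xt/put {:xt/id :account-1 :balance 90}
                                    #inst "2020-01-01"]])
          _    (xt/await-tx node tx-2)
          vt   #inst "2020-06-01"]
      ;; Same valid time, two different transaction-time perspectives.
      {:pre-correction  (xt/entity (xt/db node {::xt/valid-time vt ::xt/tx tx-1})
                                   :account-1)
       :post-correction (xt/entity (xt/db node {::xt/valid-time vt ::xt/tx tx-2})
                                   :account-1)})))
```

If the basis behaves as assumed, :pre-correction should still show the balance of 100 that an auditor would have seen before the amendment, while :post-correction shows 90.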
Domain Time
Valid time is valuable for tracking a consistent view of the entire state of the database. However, unless you explicitly include a timestamp or other temporal component within your documents, you cannot currently use this information about valid time inside your Datalog queries.
Domain time or "user-defined" time is simply the storing of any additional time-related information within your documents, for instance valid-time, duration or timestamps relating to additional temporal life-cycles (e.g. decision, receipt, notification, availability).
Queries that use domain times do not automatically benefit from any kind of native indexes to support efficient execution; however, XTDB encourages you to build additional layers of functionality to do so. See decorators for examples.
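As a minimal sketch of querying on a domain time, the example below stores an :arrival-time attribute inside each document and filters on it in Datalog. It assumes XTDB 1.x and that the built-in range predicates can compare #inst values; the documents and the cutoff argument are illustrative only.

```clojure
(require '[xtdb.api :as xt])

(def late-arrivals
  (with-open [node (xt/start-node {})] ; assumption: in-memory node
    ;; No valid time is supplied, so these puts default to "now".
    (xt/submit-tx node
                  [[::xt/put {:xt/id :p2 :entry-pt :SFO
                              :arrival-time #inst "2018-12-31"}]
                   [::xt/put {:xt/id :p4 :entry-pt :NY
                              :arrival-time #inst "2019-01-02"}]])
    (xt/sync node)
    ;; Filter on the domain time stored in the document, not on valid time.
    (xt/q (xt/db node)
          '{:find [p]
            :in [cutoff]
            :where [[p :arrival-time t]
                    [(> t cutoff)]]}
          #inst "2019-01-01")))
```

Here the filter is evaluated against the attribute value itself, so only the entity whose :arrival-time falls after the cutoff should be returned.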
Known Uses
Recording bitemporal information with your data is essential when dealing with lag, corrections, and efficient auditability:
- Lag is found wherever there is risk of non-trivial delay until an event can be recorded. This is common between systems that communicate over unreliable networks.
- Corrections are needed as errors are uncovered and as facts are reconciled.
- Ad-hoc auditing is an otherwise intensive and slow process requiring significant operational complexity.
With XTDB you retain visibility of all historical changes whilst compensating for lag, making corrections, and performing audit queries. By default, deleting data only erases visibility of that data from the current perspective. You may of course still evict data completely as the legal status of information changes.
These capabilities are known to be useful for:
- Event Sourcing (e.g. retroactive and scheduled events and event-driven computing on evolving graphs)
- Ingesting out-of-order temporal data from upstream timestamping systems
- Maintaining a slowly changing dimension for decision support applications
- Recovering from accidental data changes and application errors (e.g. billing systems)
- Auditing all data changes and performing data forensics when necessary
- Responding to new compliance regulations and audit requirements
- Avoiding the need to set up additional databases for historical data and improving end-to-end data governance
- Building historical models that factor in all historical data (e.g. insurance calculations)
- Accounting and financial calculations (e.g. payroll systems)
- Development, simulation and testing
- Live migrations from legacy systems using ad-hoc batches of backfilled temporal data
- Scheduling and previewing future states (e.g. publishing and content management)
- Reconciling temporal data across eventually consistent systems
Applied industry-specific examples include:
- Legal Documentation – maintain visibility of all critical dates relating to legal documents, including what laws were known to be applicable at the time, and any subsequent laws that may be relevant and applied retrospectively
- Insurance Coverage – assess the level of coverage for a beneficiary across the lifecycle of care and legislation changes
- Reconstruction of Trades – readily comply with evolving financial regulations
- Adverse Events in Healthcare – accurately record a patient’s records over time and mitigate human error
- Intelligence Gathering – build an accurate model of currently known information to aid predictions and understanding of motives across time
- Criminal Investigations – efficiently organise analysis and evidence whilst enabling a simple retracing of investigative efforts
Example Queries
Crime Investigations
This example is based on an academic paper.
During a criminal investigation it is critical to be able to refine a temporal understanding of past events as new evidence is brought to light, errors in documentation are accounted for, and speculation is corroborated. The paper referenced above gives the following query example:
The paper then lists a sequence of entry and departure events at various United States border checkpoints. We as the investigator will step through this sequence to monitor a set of suspects. These events will arrive in an undetermined chronological order based on how and when each checkpoint is able to manually relay the information.
Day 0
Assuming Day 0 for the investigation period is #inst "2018-12-31", the initial documents are ingested using the Day 0 valid time:
{:xt/id :p2
:entry-pt :SFO
:arrival-time #inst "2018-12-31"
:departure-time :na}
#inst "2018-12-31"
{:xt/id :p3
:entry-pt :LA
:arrival-time #inst "2018-12-31"
:departure-time :na}
#inst "2018-12-31"
The first document shows that Person 2 was recorded entering via :SFO and the second document shows that Person 3 was recorded entering via :LA.
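Each listing in this walkthrough pairs a document with the valid time at which it should be ingested. One way to turn such pairs into put operations is sketched below in plain Clojure. The events->puts helper is hypothetical, and :xtdb.api/put is the fully qualified keyword that ::xt/put resolves to when xtdb.api is aliased as xt.

```clojure
;; Day 0 events as [document valid-time] pairs, copied from the listing above.
(def day-0-events
  [[{:xt/id :p2
     :entry-pt :SFO
     :arrival-time #inst "2018-12-31"
     :departure-time :na}
    #inst "2018-12-31"]
   [{:xt/id :p3
     :entry-pt :LA
     :arrival-time #inst "2018-12-31"
     :departure-time :na}
    #inst "2018-12-31"]])

(defn events->puts
  "Hypothetical helper: turn [document valid-time] pairs into put operations."
  [events]
  (mapv (fn [[doc valid-time]] [:xtdb.api/put doc valid-time]) events))

(def day-0-tx (events->puts day-0-events))

;; With a running node, this vector could then be submitted with
;; (xtdb.api/submit-tx node day-0-tx).
```

The same helper would apply unchanged to the listings for the later days, since only the documents and valid times differ.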
Day 1
No new recorded events arrive on Day 1 (#inst "2019-01-01"), so there are no documents available to ingest.
Day 2
A single event arrives on Day 2 showing Person 4 arriving at :NY:
{:xt/id :p4
:entry-pt :NY
:arrival-time #inst "2019-01-02"
:departure-time :na}
#inst "2019-01-02"
Day 3
Next, we learn on Day 3 that Person 4 departed from :NY, which is represented as an update to the existing document using the Day 3 valid time:
{:xt/id :p4
:entry-pt :NY
:arrival-time #inst "2019-01-02"
:departure-time #inst "2019-01-03"}
#inst "2019-01-03"
Day 4
On Day 4 we begin to receive events relating to the previous days of the investigation.
First we receive an event showing that Person 1 entered :NY on Day 0, which we must ingest using the Day 0 valid time #inst "2018-12-31":
{:xt/id :p1
:entry-pt :NY
:arrival-time #inst "2018-12-31"
:departure-time :na}
#inst "2018-12-31"
We then receive an event showing that Person 1 departed from :NY on Day 3, so again we ingest this document using the corresponding Day 3 valid time:
{:xt/id :p1
:entry-pt :NY
:arrival-time #inst "2018-12-31"
:departure-time #inst "2019-01-03"}
#inst "2019-01-03"
Finally, we receive two events relating to Day 4, which can be ingested using the current valid time:
{:xt/id :p1
:entry-pt :LA
:arrival-time #inst "2019-01-04"
:departure-time :na}
#inst "2019-01-04"
{:xt/id :p3
:entry-pt :LA
:arrival-time #inst "2018-12-31"
:departure-time #inst "2019-01-04"}
#inst "2019-01-04"
Day 5
On Day 5 there is an event showing that Person 2, having arrived on Day 0 (which we already knew), departed from :SFO on Day 5.
{:xt/id :p2
:entry-pt :SFO
:arrival-time #inst "2018-12-31"
:departure-time #inst "2019-01-05"}
#inst "2019-01-05"
Day 6
No new recorded events arrive on Day 6 (#inst "2019-01-06"), so there are no documents available to ingest.
Day 7
On Day 7 two documents arrive. The first document corrects the previous assertion that Person 3 departed on Day 4, which was misrecorded due to human error. The second document shows that Person 3 has only just departed on Day 7, which is how the previous error was noticed.
{:xt/id :p3
:entry-pt :LA
:arrival-time #inst "2018-12-31"
:departure-time :na}
#inst "2019-01-04"
{:xt/id :p3
:entry-pt :LA
:arrival-time #inst "2018-12-31"
:departure-time #inst "2019-01-07"}
#inst "2019-01-07"
Day 8
Two documents have been received relating to new arrivals on Day 8. Note that Person 3 has arrived back in the country again.
{:xt/id :p3
:entry-pt :SFO
:arrival-time #inst "2019-01-08"
:departure-time :na}
#inst "2019-01-08"
{:xt/id :p4
:entry-pt :LA
:arrival-time #inst "2019-01-08"
:departure-time :na}
#inst "2019-01-08"
Day 9
On Day 9 we learn that Person 3 also departed on Day 8.
{:xt/id :p3
:entry-pt :SFO
:arrival-time #inst "2019-01-08"
:departure-time #inst "2019-01-08"}
#inst "2019-01-09"
Day 10
A single document arrives showing that Person 5 entered at :LA earlier that day.
{:xt/id :p5
:entry-pt :LA
:arrival-time #inst "2019-01-10"
:departure-time :na}
#inst "2019-01-10"
Day 11
Similarly to the previous day, a single document arrives showing that Person 7 entered at :NY earlier that day.
{:xt/id :p7
:entry-pt :NY
:arrival-time #inst "2019-01-11"
:departure-time :na}
#inst "2019-01-11"
Day 12
Finally, on Day 12 we learn that Person 6 entered at :NY that same day.
{:xt/id :p6
:entry-pt :NY
:arrival-time #inst "2019-01-12"
:departure-time :na}
#inst "2019-01-12"
Question Time
Let’s review the question we need to answer to aid our investigations:
We are able to easily express this as a query in XTDB:
(xt/q
 (xt/db node
        {::xt/valid-time #inst "2019-01-02" ; `as at` valid time
         ::xt/tx-time #inst "2019-01-03"})  ; `as of` transaction time
 '{:find [p entry-pt arrival-time departure-time]
   :where [[p :entry-pt entry-pt]
           [p :arrival-time arrival-time]
           [p :departure-time departure-time]]})
The answer given by XTDB is a simple set of the three relevant people along with the details of their last entry and confirmation that none of them were known to have yet departed at this point:
#{[:p2 :SFO #inst "2018-12-31" :na]
[:p3 :LA #inst "2018-12-31" :na]
[:p4 :NY #inst "2019-01-02" :na]}
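The result can be cross-checked without a running node using a small pure-Clojure model of bitemporal lookup. This is a simplified sketch, not how XTDB indexes data: it ignores deletes and evictions, and it assumes each event's transaction time is the day on which it was received. The names log, on-or-before? and snapshot are hypothetical, and only the events received up to Day 4 are included.

```clojure
;; Each entry: the document, its valid time, and the (assumed) transaction
;; time, i.e. the day on which the event reached the database.
(def log
  [{:doc {:xt/id :p2 :entry-pt :SFO
          :arrival-time #inst "2018-12-31" :departure-time :na}
    :valid-time #inst "2018-12-31" :tx-time #inst "2018-12-31"}
   {:doc {:xt/id :p3 :entry-pt :LA
          :arrival-time #inst "2018-12-31" :departure-time :na}
    :valid-time #inst "2018-12-31" :tx-time #inst "2018-12-31"}
   {:doc {:xt/id :p4 :entry-pt :NY
          :arrival-time #inst "2019-01-02" :departure-time :na}
    :valid-time #inst "2019-01-02" :tx-time #inst "2019-01-02"}
   {:doc {:xt/id :p4 :entry-pt :NY
          :arrival-time #inst "2019-01-02"
          :departure-time #inst "2019-01-03"}
    :valid-time #inst "2019-01-03" :tx-time #inst "2019-01-03"}
   ;; Received on Day 4 but valid from Day 0.
   {:doc {:xt/id :p1 :entry-pt :NY
          :arrival-time #inst "2018-12-31" :departure-time :na}
    :valid-time #inst "2018-12-31" :tx-time #inst "2019-01-04"}])

(defn on-or-before? [a b]
  (not (pos? (compare a b))))

(defn snapshot
  "Latest known version of every entity as at valid time `vt`, using only
  entries transacted on or before `tt`."
  [log vt tt]
  (->> log
       (filter #(and (on-or-before? (:valid-time %) vt)
                     (on-or-before? (:tx-time %) tt)))
       (group-by (comp :xt/id :doc))
       vals
       ;; Per entity, keep the entry with the latest valid time, breaking
       ;; ties by the latest transaction time.
       (map #(:doc (last (sort-by (juxt :valid-time :tx-time) %))))
       (map (juxt :xt/id :entry-pt :arrival-time :departure-time))
       set))

;; As at Day 2 (valid time), as of Day 3 (transaction time).
(def answer (snapshot log #inst "2019-01-02" #inst "2019-01-03"))
```

In this model, answer contains the same three tuples as the result above. Person 4's departure is excluded because its valid time falls after Day 2, and Person 1 is absent because that event was not transacted until Day 4; moving the transaction time to Day 4 adds Person 1 to the set.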
Related Concepts
Retroactive Data Structures
At a theoretical level XTDB has similar properties to retroactive data structures, which are data structures that support "efficient modifications to a sequence of operations that have been performed on the structure […] modifications can take the form of retroactive insertion, deletion or updating of an operation that was performed at some time in the past".
XTDB’s bitemporal indexes are partially persistent due to the immutability of transaction time. This allows you to query any previous version, but only update the latest version. The efficient representation of valid time in the indexes makes XTDB "fully retroactive", which is analogous to partial persistence in the temporal dimension, and enables globally-consistent reads.
XTDB does not natively implement "non-oblivious retroactivity" (i.e. persisted queries and cascading corrections), although this is an important area of investigation for event sourcing applications, temporal constraints, and reactive bitemporal queries.
In summary, the XTDB indexes as a whole could be described as a "partially persistent and fully retroactive data structure".