Datomic is a proprietary database management system. It is an operational DBMS, meaning it supports updates in real time. Instead of overwriting values in named attributes, Datomic records all facts as immutable data over time, which sets it apart because previous states can be accessed at any time. Datomic is also a distributed DBMS that provides horizontal read scalability.
Another feature of Datomic is that it empowers the application server by running queries there, whereas in many other client-server DBMSs the database server runs the queries.
In addition, Datomic leverages existing storage services such as Cassandra, SQL databases, and DynamoDB, which provides more flexibility.
There are two Datomic products: Datomic Cloud and Datomic On-Prem. Datomic Cloud is built for AWS integration, while Datomic On-Prem (On-Premise) can be deployed on any infrastructure and storage service.
In early March 2012, the Relevance team around Rich Hickey (Relevance later merged with Metadata to form Cognitect) released Datomic, which they had been working on since 2010. Their motivation was to move a substantial portion of the power attributed to database servers into application servers, so that programmers would have more power when programming with data inside the application logic.
Datomic Cloud was released in early 2018, built on Amazon's components.
Since the release of Datomic Cloud, the original Datomic has been referred to as Datomic On-Prem (On-Premise) to distinguish it from the new product.
The company building Datomic (Cognitect) was acquired by Nubank in 2020.
Index trees contain "segments," arrays of records that are serialized and then compressed with zip.
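The serialize-then-compress step can be illustrated with a minimal sketch. The serialization format is not described here, so JSON is used purely as a stand-in, and Python's zlib (the DEFLATE compression used in zip archives) stands in for the compression step. The record layout and the function names are hypothetical.

```python
import json
import zlib

def pack_segment(records):
    """Serialize a list of records to JSON bytes, then compress them."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)

def unpack_segment(blob):
    """Decompress a segment and decode it back into a list of records."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical records: [entity, attribute, value, transaction, added].
records = [
    [42, ":user/name", "Ada", 1000, True],
    [42, ":user/email", "ada@example.com", 1000, True],
]

blob = pack_segment(records)       # opaque bytes, as a storage service would hold them
restored = unpack_segment(blob)    # the original records, recovered on read
```

Because a segment is an opaque block of bytes, any store that can hold a value under a key can hold a segment.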
Multi-version Concurrency Control (MVCC)
Datomic keeps the entire history of transactions, which allows for multi-version concurrency control.
Datomic stores immutable facts over time as datoms. A datom is a 5-tuple: entity, attribute, value, transaction, and operation (whether the fact is added or retracted).
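To make the idea concrete, here is a minimal sketch of datoms as immutable tuples, assuming the five components are entity, attribute, value, transaction, and an added/retracted flag. The `Datom` class, the `as_of` helper, and the sample data are illustrative names, not Datomic's API. Since nothing is overwritten, the state at any past transaction can be recomputed from the log.

```python
from typing import Any, NamedTuple

class Datom(NamedTuple):
    e: int        # entity id
    a: str        # attribute
    v: Any        # value
    tx: int       # transaction that recorded this fact
    added: bool   # True = assertion, False = retraction

# Changing an email is recorded as a retraction plus an assertion;
# the old fact stays in the log.
log = [
    Datom(42, ":user/email", "ada@old.example", 1000, True),
    Datom(42, ":user/email", "ada@old.example", 1001, False),
    Datom(42, ":user/email", "ada@new.example", 1001, True),
]

def as_of(log, tx):
    """Return the set of (e, a, v) facts that hold after transaction tx."""
    facts = set()
    # Within one transaction, apply retractions (False) before assertions (True).
    for d in sorted(log, key=lambda d: (d.tx, d.added)):
        if d.tx > tx:
            break
        if d.added:
            facts.add((d.e, d.a, d.v))
        else:
            facts.discard((d.e, d.a, d.v))
    return facts

before = as_of(log, 1000)   # state when only the old email was known
after = as_of(log, 1001)    # state after the email was changed
```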
Although Datomic doesn't require a table schema that specifies attribute columns in advance, it does require the properties of individual attributes to be specified. This is called a universal schema.
Data in Datomic is stored in "distributed storage services," clusters of machines in which each machine independently stores a subset (shard) of the data. There can be redundancy across shards. Datomic uses a key-value store as its data model, and a consistent hash function maps the key (entity ID) to the location, i.e. the machine, where the corresponding tuple is stored.
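The key-to-machine mapping can be illustrated with a generic consistent hashing sketch. This is a textbook hash ring, not the implementation of Datomic or of any particular storage service. The node names, the number of virtual points per node, and the use of MD5 are all assumptions made for the example.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each node owns `replicas` virtual points to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, entity_id) -> str:
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(str(entity_id)))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for(42)   # the same entity ID always maps to the same node

# Adding a machine only moves keys onto the new machine; all others stay put.
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in range(1000) if bigger.node_for(k) != ring.node_for(k)]
```

The property that makes the hash "consistent" is visible in `moved`: every key in it now belongs to `node-d`, and no key was shuffled between the existing nodes.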
Foreign keys can be defined using the :db.type/ref attribute type, but no foreign key constraints are enforced on them automatically. Users need to write their own database functions to impose those constraints.
Datomic indexes are covering indexes. In other words, instead of storing references to data, the index contains the data itself, so Datomic reads data directly from the index. The index trees are shallow, with at most 3 levels: root, directory, and segment (leaf).
Datomic maintains four index trees with different sort orders so that different kinds of queries can be served efficiently. As mentioned in the data model, Datomic stores immutable facts as 5-tuples, and four of the components are used for indexing: entity (E), attribute (A), value (V), and transaction (T).
The four index trees are sorted by EAVT, AEVT, AVET, and VAET order respectively.
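The effect of the four sort orders can be shown with a small sketch that keeps the same datoms in four sorted lists and answers lookups by prefix scan. Sorted Python lists stand in for the index trees, and the sample data and helper names are hypothetical. Because each index entry is the full datom, a lookup never has to follow a pointer to fetch the data, which is what makes the index covering.

```python
import bisect
from typing import NamedTuple

class Datom(NamedTuple):
    e: int   # entity
    a: str   # attribute
    v: str   # value (strings only, to keep the sketch simple)
    t: int   # transaction

datoms = [
    Datom(1, ":user/name", "Ada", 100),
    Datom(2, ":user/name", "Grace", 101),
    Datom(1, ":user/city", "London", 102),
    Datom(2, ":user/city", "New York", 103),
]

ORDERS = {"EAVT": "eavt", "AEVT": "aevt", "AVET": "avet", "VAET": "vaet"}

def build_index(datoms, order):
    """Return the datoms as key tuples sorted in the given component order."""
    return sorted(tuple(getattr(d, c) for c in order) for d in datoms)

indexes = {name: build_index(datoms, order) for name, order in ORDERS.items()}

def prefix_scan(index, prefix):
    """Return all keys in a sorted index that start with the given prefix."""
    start = bisect.bisect_left(index, prefix)
    out = []
    for key in index[start:]:
        if key[:len(prefix)] != prefix:
            break
        out.append(key)
    return out

# "Everything about entity 1" is a contiguous range in EAVT.
about_entity_1 = prefix_scan(indexes["EAVT"], (1,))
# "Who is named Grace?" is a contiguous range in AVET.
named_grace = prefix_scan(indexes["AVET"], (":user/name", "Grace"))
```

Each query shape picks the index whose leading components it already knows, so the answer is one binary search followed by a sequential read.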
There is only one process responsible for writing transactions, so transactions are always serializable.
Datalog expressions are compiled by the Clojure compiler, which produces Java bytecode that is then typically JIT-compiled by the JVM.
Datomic uses Datalog as its query language. Datalog is set-oriented rather than record-oriented: instead of processing one tuple at a time, it retrieves a set at a time.
Datomic's query interface is an extension of Datalog. The main difference is that Datalog systems usually have a single global fact database and set of rules, whereas a Datomic query can take multiple databases and rule sets as inputs.
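A toy evaluator helps show what "set-oriented" and "multiple databases" mean in practice. The sketch below is not Datomic's query engine or syntax: it assumes facts are plain (entity, attribute, value) tuples, variables are strings starting with "?", and each where clause names the database it matches against. Rules are left out, and all function names are hypothetical.

```python
def is_var(term):
    """Variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, binding):
    """Unify one (e, a, v) pattern with one fact; return the extended binding or None."""
    new = dict(binding)
    for term, value in zip(pattern, fact):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None      # variable already bound to something else
        elif term != value:
            return None          # constant does not match
    return new

def query(find, where, dbs):
    """Evaluate clauses of the form (db_name, e, a, v) and return a set of tuples."""
    bindings = [{}]
    for db_name, *pattern in where:
        extended = []
        for binding in bindings:
            for fact in dbs[db_name]:
                new = match(pattern, fact, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return {tuple(b[var] for var in find) for b in bindings}

# Two independent fact databases passed to one query.
users = {(1, ":user/name", "Ada"), (2, ":user/name", "Grace")}
orders = {(10, ":order/user", 1), (11, ":order/user", 1), (12, ":order/user", 2)}

result = query(
    find=["?name", "?order"],
    where=[
        ("users", "?u", ":user/name", "?name"),
        ("orders", "?order", ":order/user", "?u"),
    ],
    dbs={"users": users, "orders": orders},
)
```

The shared variable `?u` joins the two databases, and the answer comes back as a whole set rather than through a row-by-row cursor.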
In storage services, data is stored on disk as segments, each of which is an array of datoms. As an application server reads data from storage services, it builds index trees locally in memory. This allows application servers to run queries against in-memory data.
Datomic treats storage as a service: it provides ways to access the underlying storage but not the storage itself. The storage service can be swapped by changing the connection string.
Datomic stores data in storage services as sorted chunks of datoms.
User-defined functions, called transaction functions in Datomic, can be invoked during transaction processing.
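As an illustration, the sketch below models a transaction function that enforces a foreign-key style check, the kind of constraint that reference attributes do not enforce by themselves. It is a simplified model, not Datomic's API: the database is a frozen set of (entity, attribute, value) tuples, and `transact`, `ensure_ref_exists`, and `TxAborted` are hypothetical names.

```python
class TxAborted(Exception):
    """Raised by a transaction function to reject the whole transaction."""

def ensure_ref_exists(db, entity, attribute, target):
    """Hypothetical transaction function: emit the reference only if target exists."""
    if not any(e == target for (e, _a, _v) in db):
        raise TxAborted(f"entity {target} does not exist")
    return [(entity, attribute, target)]

def transact(db, tx_data):
    """Apply tx_data atomically: return a new database value, or raise and change nothing."""
    new_facts = []
    for item in tx_data:
        if callable(item[0]):
            # A function call: run it against the current database value
            # and splice the facts it returns into the transaction.
            fn, *args = item
            new_facts.extend(fn(db, *args))
        else:
            new_facts.append(tuple(item))
    return db | set(new_facts)

db0 = frozenset({(1, ":user/name", "Ada")})

# Succeeds: user 1 exists, so the order may point at it.
db1 = transact(db0, [
    (10, ":order/total", 99),
    (ensure_ref_exists, 10, ":order/user", 1),
])

# Fails: user 999 does not exist, so nothing from this transaction is applied.
try:
    db2 = transact(db1, [(ensure_ref_exists, 11, ":order/user", 999)])
except TxAborted:
    db2 = db1
```

Running the check inside the transaction matters because the function sees the database value the write will actually be applied to, not a possibly stale copy read earlier.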
In a Datomic-based system, an application server, together with the partial data stored on it, is called a peer. Queries run in peers. A peer can read from storage services directly through the peer library, but it can only request writes through the transactor, a single designated process in charge of writes. The transactor adds new datoms to storage services using ACID transactions and propagates the writes, since redundancy is allowed.
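The single-writer arrangement can be sketched with one worker thread that drains a queue of write requests. This is a simplified model under stated assumptions, not Datomic's transactor: the `Transactor` class, its in-memory log, and the peer threads are hypothetical, and storage and propagation are omitted.

```python
import queue
import threading

class Transactor:
    """Stand-in for the single writer: one thread applies all transactions in order."""

    def __init__(self):
        self.log = []                      # committed transactions, in commit order
        self._inbox = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            datoms, done = self._inbox.get()
            tx_id = len(self.log) + 1      # safe: only this thread touches the log
            self.log.append((tx_id, tuple(datoms)))
            done.put(tx_id)

    def transact(self, datoms):
        """Called by peers: submit a write request and wait for its transaction id."""
        done = queue.Queue(maxsize=1)
        self._inbox.put((datoms, done))
        return done.get()

transactor = Transactor()

def peer_writes(peer_name):
    for i in range(50):
        transactor.transact([(peer_name, ":counter/value", i)])

# Four peers submit writes concurrently; the transactor serializes them.
peers = [threading.Thread(target=peer_writes, args=(f"peer-{n}",)) for n in range(4)]
for p in peers:
    p.start()
for p in peers:
    p.join()

tx_ids = [tx_id for tx_id, _ in transactor.log]
```

Because exactly one thread assigns transaction ids and appends to the log, the result is a single total order of transactions without any locking on the write path.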
Since datoms stored in storage services are immutable, peers cache extensively so that they can query data locally. This allows programmers to access query results as simple data structures.
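Why immutability makes caching simple can be shown with a short sketch. The `Storage` and `Peer` classes and the segment ids are hypothetical stand-ins. The point is that a cached segment never needs to be invalidated, because the segment stored under a given id never changes.

```python
class Storage:
    """Stand-in for a storage service: a write-once map of segment id to datoms."""

    def __init__(self):
        self._segments = {}
        self.reads = 0                     # counts round trips to storage

    def put(self, seg_id, datoms):
        if seg_id in self._segments:
            raise ValueError("segments are never overwritten")
        self._segments[seg_id] = tuple(datoms)

    def get(self, seg_id):
        self.reads += 1
        return self._segments[seg_id]

class Peer:
    """Stand-in for a peer: reads segments from storage and keeps them in memory."""

    def __init__(self, storage):
        self._storage = storage
        self._cache = {}

    def segment(self, seg_id):
        # No invalidation logic is needed: a segment id always maps to the same data.
        if seg_id not in self._cache:
            self._cache[seg_id] = self._storage.get(seg_id)
        return self._cache[seg_id]

storage = Storage()
storage.put("seg-1", [(1, ":user/name", "Ada", 100, True)])

peer = Peer(storage)
first = peer.segment("seg-1")    # fetched from storage
second = peer.segment("seg-1")   # served from the peer's local cache
```

Query results are then ordinary in-memory tuples, which the application can pass around like any other data structure.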