FoundationDB is a distributed non-relational database that supports ACID transactions and OLTP workloads. FoundationDB decouples its data storage technology from its data model. All data is stored in an ordered key-value structure and can be remapped to custom data models or indexes through user-written layers. FoundationDB doesn’t have a separate query language; it only exposes an API to access data.
FoundationDB is known for rigorous and thorough testing of its fault tolerance. The developers built their own [deterministic testing](https://www.youtube.com/watch?v=4fFDFbi3toc) framework while developing the system to verify that the implementation behaves correctly. The simulation models real-life scenarios, such as combinations of transaction executions with network failures, database configuration changes, and human operator errors.
FoundationDB is built to handle high-load transaction processing with high performance and strong (ACID) guarantees.
FoundationDB was originally built in 2009 by three co-founders: Dave Rosenthal, Dave Scherer, and Nick Lavezzo. The founders previously worked at Visual Sciences, an analytics company that is now a subsidiary of Adobe. The FoundationDB company acquired Akiban in 2013. Apple acquired FoundationDB in 2015 and open-sourced it under the Apache License 2.0 in 2018.
Multi-version Concurrency Control (MVCC) Optimistic Concurrency Control (OCC)
FoundationDB uses Optimistic Concurrency Control (OCC) for writes and Multi-version Concurrency Control (MVCC) for reads. The DBMS only maintains conflict information for transactions over a five-second window, so it doesn't support long-running read/write transactions. Conflicting transactions fail at commit, and the client is responsible for retrying them.
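The commit-time conflict check and the client-side retry can be illustrated with a small, self-contained Python sketch. It is a toy model, not the FoundationDB client API: `ToyStore`, `ToyTransaction`, `ConflictError`, and `run` are hypothetical names, the store lives in one process, and the five-second window is left out.

```python
class ConflictError(Exception):
    """Raised when a key the transaction read was overwritten before commit."""


class ToyStore:
    """Hypothetical in-memory multi-version store (not the FoundationDB API)."""

    def __init__(self):
        self.version = 0    # latest committed version
        self.history = {}   # key -> list of (commit_version, value), oldest first

    def begin(self):
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, store):
        self.store = store
        self.read_version = store.version  # snapshot used for all reads (MVCC)
        self.read_keys = set()
        self.writes = {}                   # buffered until commit (OCC)

    def get(self, key):
        self.read_keys.add(key)
        if key in self.writes:
            return self.writes[key]
        # Newest value that is visible at this transaction's read version.
        for version, value in reversed(self.store.history.get(key, [])):
            if version <= self.read_version:
                return value
        return None

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Fail if any key we read was committed after our read version.
        for key in self.read_keys:
            versions = self.store.history.get(key)
            if versions and versions[-1][0] > self.read_version:
                raise ConflictError(key)
        self.store.version += 1
        for key, value in self.writes.items():
            self.store.history.setdefault(key, []).append((self.store.version, value))


def run(store, body, max_attempts=5):
    """Client-side retry loop: rerun the whole transaction body on conflict."""
    for _ in range(max_attempts):
        tr = store.begin()
        result = body(tr)
        try:
            tr.commit()
            return result
        except ConflictError:
            continue
    raise ConflictError("gave up after %d attempts" % max_attempts)


def increment(tr):
    tr.set("counter", (tr.get("counter") or 0) + 1)


store = ToyStore()
t1, t2 = store.begin(), store.begin()   # two transactions with the same snapshot
increment(t1)
increment(t2)
t1.commit()                             # first committer wins
try:
    t2.commit()
    conflicted = False
except ConflictError:
    conflicted = True                   # t2 read "counter" before t1 overwrote it

run(store, increment)                   # the retry starts from a fresh snapshot
```

Because writes are buffered until commit, a failed transaction leaves no partial state behind, which is what makes a blind retry safe in this sketch.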
FoundationDB exposes a single data model: an ordered key-value store. Both keys and values are byte strings. To support a richer data model or an index, users can write their own layer that maps it onto the key-value model.
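As a rough illustration of what such a layer does, the Python sketch below maps user records and a secondary index onto ordered byte-string keys. Everything in it is an assumption made for the example: `OrderedKV` is a toy stand-in for the store, and `pack`, `UserLayer`, and the key layout are hypothetical rather than FoundationDB's own encoding.

```python
import bisect


class OrderedKV:
    """Toy ordered key-value store with byte-string keys and values."""

    def __init__(self):
        self._keys = []     # kept sorted so range reads are cheap
        self._values = {}

    def set(self, key: bytes, value: bytes):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_range(self, begin: bytes, end: bytes):
        """Return (key, value) pairs with begin <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def pack(*parts: str) -> bytes:
    # Hypothetical encoding: fields joined by a zero byte.
    # Assumes no field contains "\x00".
    return b"\x00".join(p.encode("utf-8") for p in parts)


class UserLayer:
    """Hypothetical layer: user records plus an index on city."""

    def __init__(self, kv):
        self.kv = kv

    def put(self, user_id: str, city: str):
        # Primary record and index entry; a real layer would write both in
        # one transaction and remove stale index entries on update.
        self.kv.set(pack("user", user_id), city.encode("utf-8"))
        self.kv.set(pack("by_city", city, user_id), b"")

    def users_in(self, city: str):
        prefix = pack("by_city", city) + b"\x00"
        # UTF-8 never contains the byte 0xff, so this bounds the prefix.
        rows = self.kv.get_range(prefix, prefix + b"\xff")
        return [key[len(prefix):].decode("utf-8") for key, _ in rows]


layer = UserLayer(OrderedKV())
layer.put("u1", "Paris")
layer.put("u2", "Lima")
layer.put("u3", "Paris")
paris_users = layer.users_in("Paris")
```

The index query is a single range read, which works only because keys are ordered: all entries for one city share a prefix and therefore sit next to each other.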
FoundationDB uses optimistic concurrency control to provide serializable isolation. This is possible because all modifications to the key-value store are made through transactions.
The only way to model and query the data is to write a layer. Users interact with the data only through FoundationDB's API, which is available in Python, Ruby, Java, Go, and C.
The DBMS supported a SQL layer in 2014, but that layer is no longer actively supported or maintained.
FoundationDB has two storage options, ssd and memory. In memory mode, all data to be read must reside in memory, and all writes are written to disk, with the number of copies determined by the redundancy mode. The default DBMS configuration is memory, and the maximum size of data in memory is 1 GB.
FoundationDB recommends the ext4 filesystem, which supports kernel asynchronous I/O.
FoundationDB stores data in a key-value model.
In ssd mode, FoundationDB stores data in B-tree data structures optimized for SSDs. The DBMS gives data deletion lower priority than normal database operations, so there is a delay between deleting data and recovering the free storage space.
In memory mode, the DBMS reconstructs its in-memory data structure from the logs stored on disk when a process starts up.
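The idea of rebuilding in-memory state from an on-disk log can be sketched in a few lines of Python. The log format here (one JSON object per line with `op`, `key`, and `value` fields) and the helper names `append_op` and `rebuild` are assumptions made for the example; they do not describe FoundationDB's actual on-disk format.

```python
import json
import os
import tempfile


def append_op(log_path, op, key, value=None):
    """Append one operation to the log and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"op": op, "key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())


def rebuild(log_path):
    """Replay the log in order to reconstruct the in-memory key-value state."""
    state = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            if record["op"] == "set":
                state[record["key"]] = record["value"]
            elif record["op"] == "clear":
                state.pop(record["key"], None)
    return state


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "ops.log")
    append_op(log_path, "set", "a", "1")
    append_op(log_path, "set", "b", "2")
    append_op(log_path, "clear", "a")
    recovered = rebuild(log_path)   # simulates a process start-up
```

Replaying operations in their original order is what makes the result deterministic: the later `clear` wins over the earlier `set` for the same key.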
As of December 2018, FoundationDB is experimenting with a new storage engine called Redwood, which uses a multi-version B-tree.
FoundationDB uses a shared-nothing architecture. Whenever the DBMS writes data, the data is split into pieces and distributed across different nodes.
FoundationDB has several components to handle scalability:
Coordinators: communicate and store a small amount of data for fault tolerance. Coordinators are not involved in transactions.
Cluster Controller: the entry point for all processes in the cluster; it is elected by the coordinators.
Master: coordinates proxies, transaction logs, and resolvers. The Master also runs the data distribution algorithm and the ratekeeper.
Proxies: track storage servers, provide read versions, and commit transactions.
Transaction Logs: receive commits from the proxy, write and fsync data to the append-only logs on disk, and respond to the proxy. Once the data is written to disk, storage servers pop the data from the log.
Resolvers: hold the last 5 seconds of committed transactions to detect conflicting transactions. The resolvers make sure that each transaction’s reads are valid according to MVCC.
Storage Servers: store data based on the range of keys assigned. Storage servers keep the freshest data (< 5 seconds old) in memory, and the rest of the data is located on disk.
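Assigning key ranges to storage servers can be pictured as a lookup in a sorted list of range boundaries. The Python sketch below is only an illustration under assumed names: `ShardMap`, the boundary keys, and the server labels are hypothetical, and replication and range movement are ignored.

```python
import bisect


class ShardMap:
    """Hypothetical map from contiguous key ranges to storage servers."""

    def __init__(self, boundaries, servers):
        # boundaries[i] is the first key owned by servers[i + 1];
        # servers[0] owns every key below boundaries[0].
        if len(servers) != len(boundaries) + 1:
            raise ValueError("need exactly one more server than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        self.boundaries = list(boundaries)
        self.servers = list(servers)

    def server_for(self, key: bytes) -> str:
        # bisect_right counts the boundaries that are <= key,
        # which is the index of the range containing the key.
        return self.servers[bisect.bisect_right(self.boundaries, key)]


shards = ShardMap([b"g", b"p"], ["ss-0", "ss-1", "ss-2"])
routed = {key: shards.server_for(key) for key in (b"apple", b"g", b"melon", b"zebra")}
```

Because ranges are contiguous in key order, a range read touches only the servers whose ranges overlap it, rather than every node.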
Committing a transaction is a sequence of steps carried out by different components:
Master: Provides a commit version to Proxies.
Resolvers: Check whether the current transaction has a conflict with previously committed transactions.
Proxies: Send the valid commits to the transaction logs and wait until the transaction logs have logged the transaction.
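The three steps above can be sketched as a single-process Python model. This is a simplification under stated assumptions: the class and method names are hypothetical, every role runs in the same process, the log "fsync" is simulated by an in-memory append, and the resolver keeps its history forever instead of only the last 5 seconds.

```python
class Master:
    """Hands out monotonically increasing commit versions."""

    def __init__(self):
        self._version = 0

    def next_commit_version(self):
        self._version += 1
        return self._version


class Resolver:
    """Remembers the last committed write version per key to detect conflicts."""

    def __init__(self):
        self.last_write = {}

    def resolve(self, read_version, read_keys, write_keys, commit_version):
        # Conflict: a key we read was written after our read version.
        if any(self.last_write.get(k, 0) > read_version for k in read_keys):
            return False
        for key in write_keys:
            self.last_write[key] = commit_version
        return True


class TransactionLog:
    """Append-only log; the append stands in for write plus fsync."""

    def __init__(self):
        self.entries = []

    def append(self, commit_version, writes):
        self.entries.append((commit_version, dict(writes)))
        return True   # acknowledgement sent back to the proxy


class Proxy:
    def __init__(self, master, resolver, logs):
        self.master = master
        self.resolver = resolver
        self.logs = logs

    def commit(self, read_version, read_keys, writes):
        """Return the commit version, or None if the transaction is rejected."""
        commit_version = self.master.next_commit_version()            # step 1
        if not self.resolver.resolve(read_version, read_keys,
                                     list(writes), commit_version):   # step 2
            return None
        acks = [log.append(commit_version, writes) for log in self.logs]  # step 3
        return commit_version if all(acks) else None


logs = [TransactionLog(), TransactionLog()]
proxy = Proxy(Master(), Resolver(), logs)
first = proxy.commit(0, {"a"}, {"a": "1"})   # accepted
stale = proxy.commit(0, {"a"}, {"a": "2"})   # rejected: "a" changed after version 0
retry = proxy.commit(first, {"a"}, {"a": "3"})   # accepted with a fresh read version
```

The rejected transaction never reaches the logs, so only validated commits become durable in this model.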