ForestDB is a single-node key-value storage engine Couchbase server. It addresses the performance issue on current B+ tree and LSM-tree indexing on variable length strings. ForestDB uses an in-memory log structured write buffer index and an on-disk HB+-trie as indexing. HB+-trie splits keys into chunks and stored in normal B+ tree nodes to guide the traffic. The write buffer index stores the disk locations of records not in HB+-trie and the log-structure of write buffer index reduces disk I/O.
ForestDB came from a project in ACM SIGMOD 2011 programming contest called "A Durable Main-Memory Index Using Flash", of which the author is Jung-Sang Ahn from Korea Advanced Institute of Science and Technology. Then Jung-Sang Ahn worked with engineers from Couchbase and turned that project into a research paper published in 2015. It is now under the development of Couchbase's Caching and Storage team to replace the previous Couchstore storage engine and plans to conduct optimizations for SSD, support reduce feature and allow users to pause/resume compaction.
For every commit operations, the system checks the log size. If the size exceeds some threshold, the system will aggregate those logs and write them to the persistent storage (disk) in batch. ForestDB uses
fsync() to flush logs to persistent storage similar to shadow paging, so it is a non-blocking checkpointing.
ForestDB allows users to enable the compression on both keys and values. It uses Snappy library to compress data.
ForestDB supports Multi-version Concurrency Control and uses append-only manner to store records. Each record has a sequence number representing the version information. When a transaction starts, it gets a maximum sequence number showing what version of records is visible to it.
Key/value store establishes a mapping from one entity to another. In ForestDB, the maximum size of key is 64KB and that of value is 4GB.
There are two kinds of indexes in ForestDB: HB+-trie and Write Buffer(WB) Index. HB+-trie is a variant of trie. The whole structure of the index is trie and each node in the index is a B+ tree. Each B+ tree node in HB+-trie has its own nodes. The leaves in each B+tree node point to either other B+tree nodes (sub tree in HB+-trie) or records on disk. All keys in HB+-trie are split to chunks. The chunk size can be configured by users and the default value is 8 bytes. Each chunk of the key can be used to search B+tree nodes on a single level of HB+-trie. There are two indexes for ForestDB records: ID index and Seq index: ID index stores document IDs as keys and Seq index stores 64-bit sequential numbers of documents as keys. Write Buffer (WB) Index is an in-memory index, which is similar to C0 tree in LSM-tree. It stores disk locations of records that are already on disk but have not been updated to HB+-trie. It uses a log structure and is flushed to HB+-trie when the size exceeds the threshold. It can reduce disk I/Os by updating HB+-trie in batch.
Currently, the ForestDB only supports two isolation levels:
READ COMMITTED and
ForestDB stores all mutations in append-only manner. For each insert and update, ForestDB directly appends the new key/value pair at the end of a file. For each delete, ForestDB marks the key/value pair as deleted. It uses periodical compaction to conduct limit the log size. For each compaction, all records (key/value pairs) that are committed will be written to a new file. Once the compaction completes, ForestDB replaces the old file with the new file.
delete operations with a key/value pair and the metadata of keys. The maximum size of key and its metadata are both 64KB while that of value is 4GB. ForestDB supports retrieving metadata and the value by their corresponding key and the sequence number. It also supports retrieving metadata and the value of a record directly by its byte offset on disk.
ForestDB is a disk-oriented storage engine. All records (values) are stored on the disk in files. All updates on records (values) are appended at the end of the corresponding file. The HB+-trie is also stored on disk.
In ForestDB, a file on disk contains multiple blocks. Within each block, ForestDB uses NSM to organize records. Each record stores the value and the metadata of the key. Users can enable block alignment in ForestDB, which does not allow records to be stored across blocks. If the record size is larger than the size of a single block, ForestDB will compare the number of blocks required if the record were stored at the end of last record against if the record were stored at the beginning of the next block. ForestDB will choose the way requiring fewer blocks.
All records (values) are stored in log-structured on disk. All updates are appended at the end of files on the disk. Similarly, all nodes of the HB+-trie index are also stored on disk. When there are updates on the HB+-trie index, those updates are also appended at the end of the file as new nodes in the HB+-trie. After all updates on the index are appended, a new block storing pointers to the new nodes is appended at the end of the file.
ForestDB is a single-node storage engine.