RethinkDB

RethinkDB is an open-source, scalable JSON database. It was built from scratch in C++ and is intended for the real-time web. Its most significant advantage is that it can continuously push updated query results to applications.

Another advantage of RethinkDB is its flexible query language, ReQL, which can do nearly anything SQL can do, including table joins and aggregate functions. It can even mix queries with JavaScript expressions and map-reduce.

History

RethinkDB was founded in 2009. Its first open-source release, version 1.2, came out in Nov. 2012. It had been developed for 5 years by a team of database experts before the first release. The first release covered the JSON data model, immediate consistency support, Hadoop-style map/reduce, sharding, multi-datacenter replication, and failover. The very first version of RethinkDB was an SSD-optimized storage engine for MySQL; the team later switched to building a document DBMS similar to MongoDB.

In Jun. 2013, a release introduced many new ReQL features, such as basic access control, regular expression matching, new array operations, random sampling, and better error handling. Because ReQL is central to RethinkDB, this release was a significant improvement.

In Apr. 2015, RethinkDB released version 2.0.0, its first production-ready release. In Aug. 2015, it added automatic failover using a Raft-based protocol. In Nov. 2015, it introduced atomic changefeeds, which first include the existing values from the database in the changefeed result and then atomically transition to streaming updates.

In Oct. 2016, the RethinkDB company shut down because it could not build a sustainable business. In Feb. 2017, the Cloud Native Computing Foundation purchased the source code, and in July 2017 the community released a new version.

Storage Organization

Log-structured

The data is stored in a log-structured storage engine built specifically for RethinkDB and inspired by the architecture of BTRFS.

Indexes

B+Tree

RethinkDB indexes data by primary key. If the user does not specify a primary key, a random unique ID is generated automatically. RethinkDB places each document into the appropriate shard based on its primary key and indexes it within that shard using a B-Tree.
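The sketch below illustrates primary-key placement under stated assumptions: the key field is named id, shards own contiguous ranges of the key space delimited by hypothetical split points, a missing key is filled with a random UUID, and a sorted key list stands in for the per-shard B-Tree. It is an illustration of the idea, not RethinkDB's implementation.

```python
import bisect
import uuid

# Hypothetical split points dividing the primary-key space into three ranges:
# shard 0: key < "h", shard 1: "h" <= key < "p", shard 2: key >= "p".
SPLIT_POINTS = ["h", "p"]


class Shard:
    """One shard; a sorted key list stands in for the per-shard B-Tree."""

    def __init__(self):
        self.keys = []  # primary keys kept in sorted order
        self.docs = {}  # primary key -> document

    def insert(self, doc):
        key = doc["id"]
        if key not in self.docs:
            bisect.insort(self.keys, key)  # ordered insert, like a B-Tree leaf
        self.docs[key] = doc


def shard_for(key):
    """Pick the shard whose key range contains the primary key."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def insert(shards, doc):
    # Generate a random unique primary key when the user did not supply one.
    if "id" not in doc:
        doc = {**doc, "id": str(uuid.uuid4())}
    shards[shard_for(doc["id"])].insert(doc)
    return doc["id"]


shards = [Shard() for _ in range(len(SPLIT_POINTS) + 1)]
insert(shards, {"id": "alice", "age": 30})
insert(shards, {"id": "zoe", "age": 25})
generated = insert(shards, {"age": 41})  # no id given, so one is generated
```

Range-based placement keeps neighbouring keys on the same shard, so ordered scans by primary key can stay within one shard.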

RethinkDB supports both secondary and compound indexes.
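To make the distinction concrete, here is a minimal in-memory sketch: a secondary index maps one field's value to primary keys, and a compound index maps a tuple of several field values to primary keys. The function name build_index and the sample data are hypothetical; RethinkDB keeps its indexes in B-Trees, not Python dictionaries.

```python
from collections import defaultdict

# Hypothetical documents keyed by primary key.
docs = {
    1: {"id": 1, "last": "Smith", "first": "Ann", "city": "Austin"},
    2: {"id": 2, "last": "Smith", "first": "Bob", "city": "Boston"},
    3: {"id": 3, "last": "Jones", "first": "Ann", "city": "Austin"},
}


def build_index(documents, *fields):
    """Map index values to primary keys.

    One field gives a simple secondary index; several fields give a
    compound index whose key is the tuple of those field values.
    """
    index = defaultdict(list)
    for pk, doc in documents.items():
        if len(fields) == 1:
            key = doc[fields[0]]
        else:
            key = tuple(doc[f] for f in fields)
        index[key].append(pk)
    return index


by_city = build_index(docs, "city")           # secondary index
by_name = build_index(docs, "last", "first")  # compound index

austin = [docs[pk] for pk in by_city["Austin"]]
ann_smith = [docs[pk] for pk in by_name[("Smith", "Ann")]]
```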

Storage Architecture

Disk-oriented

The data is stored in a log-structured storage engine built specifically for RethinkDB and inspired by the architecture of BTRFS, which is a file system based on the copy-on-write (COW) principle.
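As a rough illustration of the log-structured, copy-on-write idea, the sketch below never overwrites a record: every update appends a new version to the log and moves a pointer to the newest offset. The LogStore class is hypothetical and ignores garbage collection of stale versions, which any real log-structured engine needs.

```python
import json


class LogStore:
    """Append-only store: updates never overwrite data in place."""

    def __init__(self):
        self.log = []     # every record version ever written, in order
        self.latest = {}  # key -> offset of the newest version in the log

    def put(self, key, value):
        # Copy-on-write: append a new version instead of modifying the old one.
        self.log.append(json.dumps({"key": key, "value": value}))
        self.latest[key] = len(self.log) - 1

    def get(self, key):
        offset = self.latest.get(key)
        if offset is None:
            return None
        return json.loads(self.log[offset])["value"]


store = LogStore()
store.put("user:1", {"name": "Ann"})
store.put("user:1", {"name": "Ann", "age": 30})  # old version stays in the log
```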

The storage engine works together with a custom B-Tree-aware caching engine, which allows data files much larger than the available memory.
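The sketch below shows the general principle of serving a large on-disk file through a small in-memory block cache. It uses plain least-recently-used (LRU) eviction as a stand-in; RethinkDB's cache is B-Tree-aware, and its actual eviction policy is not described here. BlockCache and read_block are hypothetical names.

```python
from collections import OrderedDict


class BlockCache:
    """Keep at most `capacity` blocks in memory; evict the least recently used."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # callback that loads a block from "disk"
        self.blocks = OrderedDict()   # block id -> data, least recent first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.disk_reads += 1
        data = self.read_block(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used block
        return data


# The "disk" holds 1,000 blocks, but only 2 fit in the cache at once.
disk = {i: f"block-{i}" for i in range(1000)}
cache = BlockCache(capacity=2, read_block=disk.__getitem__)
cache.get(0)  # miss
cache.get(1)  # miss
cache.get(0)  # hit: block 0 (think of it as the B-Tree root) becomes most recent
cache.get(2)  # miss: evicts block 1, the least recently used
```

Frequently touched blocks, such as the upper levels of a B-Tree, naturally stay resident under this kind of policy.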

System Architecture

Shared-Nothing

In RethinkDB, each shard of data has a single authoritative primary replica, and the shard's other replicas hold the same data as that primary. Reads and writes for a given shard are directed to the shard's primary.
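A minimal sketch of primary-based routing, assuming synchronous copying to secondaries for simplicity (the real replication mechanism and acknowledgement rules are not modelled). The Replica and ShardGroup classes and the server names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class ShardGroup:
    """One shard: a single authoritative primary plus secondaries that mirror it."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries

    def write(self, key, value):
        # Writes always go to the primary first...
        self.primary.data[key] = value
        # ...which then copies the change to every secondary (assumed synchronous).
        for replica in self.secondaries:
            replica.data[key] = value

    def read(self, key):
        # Reads are served by the primary as well.
        return self.primary.data.get(key)


shard = ShardGroup(Replica("server-a"), [Replica("server-b"), Replica("server-c")])
shard.write("doc-1", {"v": 1})
```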

Stored Procedures

Not Supported

Joins

Index Nested Loop Join

In RethinkDB, joins are automatically distributed: join commands are sent to the appropriate nodes, and the combined data is then returned to the user.

Joins can use primary keys or secondary indexes to match documents.
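An index nested loop join walks the outer table and, for each row, probes an index on the inner table instead of scanning it. The single-process sketch below shows that technique; it does not model distribution across nodes. The output pairs use left/right fields, similar in spirit to the results of ReQL's eq_join, but the function and data here are hypothetical.

```python
from collections import defaultdict

# Hypothetical tables, each a list of documents.
posts = [
    {"id": 1, "author_id": "a1", "title": "Hello"},
    {"id": 2, "author_id": "a2", "title": "Changefeeds"},
    {"id": 3, "author_id": "a1", "title": "Sharding"},
]
authors = [
    {"id": "a1", "name": "Ann"},
    {"id": "a2", "name": "Bob"},
]

# Index on the inner table's join key (its primary key here).
author_index = defaultdict(list)
for author in authors:
    author_index[author["id"]].append(author)


def index_nested_loop_join(outer, field, inner_index):
    """For each outer row, probe the inner index instead of scanning the inner table."""
    for left in outer:
        for right in inner_index.get(left[field], []):
            yield {"left": left, "right": right}


joined = list(index_nested_loop_join(posts, "author_id", author_index))
```

Each probe costs one index lookup, so the join avoids the quadratic cost of comparing every pair of rows.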

Data Model

Document / XML

RethinkDB stores JSON documents using a binary on-disk serialization. The data types supported by RethinkDB are number (double-precision floating-point), string, boolean, array, object, and null.
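The following sketch checks whether a Python value uses only the six types listed above, and shows one consequence of storing numbers as double-precision floats: integers above 2**53 cannot all be represented exactly. The helper is_rethinkdb_json is hypothetical and does not reflect how any RethinkDB driver validates documents.

```python
def is_rethinkdb_json(value):
    """True if value uses only number, string, boolean, array, object, or null."""
    if value is None or isinstance(value, (bool, str)):
        return True
    if isinstance(value, (int, float)):
        return True  # numbers are stored as double-precision floats
    if isinstance(value, list):
        return all(is_rethinkdb_json(v) for v in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_rethinkdb_json(v) for k, v in value.items())
    return False  # e.g. sets, tuples, arbitrary objects


doc = {
    "name": "Ann",
    "active": True,
    "tags": ["a", "b"],
    "address": {"city": "Austin"},
    "score": 9.5,
    "manager": None,
}

# Integers beyond 2**53 lose precision when converted to a double.
big = 2**53 + 1
stored = float(big)
```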

Logging

Physical Logging

The data is stored in a log-structured storage engine built specifically for RethinkDB and inspired by the architecture of BTRFS. The log is implicitly integrated into the storage engine.

Replicating data across replicas does not require log shipping; instead, RethinkDB replication is based on B-Tree diff algorithms.
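To illustrate diff-based synchronization in contrast to log shipping, the sketch below computes the upserts and deletes that bring a replica in line with the primary and then applies them. It compares flat key/value maps; a tree-based diff would typically prune unchanged subtrees, which this sketch does not model. The functions diff and apply_diff are hypothetical.

```python
def diff(primary, replica):
    """Compute the changes needed to make `replica` match `primary`."""
    upserts = {k: v for k, v in primary.items() if k not in replica or replica[k] != v}
    deletes = [k for k in replica if k not in primary]
    return upserts, deletes


def apply_diff(replica, upserts, deletes):
    """Bring the replica up to date using only the computed differences."""
    replica.update(upserts)
    for key in deletes:
        del replica[key]


primary = {"a": 1, "b": 2, "c": 3}
replica = {"a": 1, "b": 0, "d": 4}  # stale and missing data
upserts, deletes = diff(primary, replica)
apply_diff(replica, upserts, deletes)
```

Only differing entries travel between servers, so no history of individual operations needs to be kept.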

Concurrency Control

Multi-version Concurrency Control (MVCC)

RethinkDB implements block-level multiversion concurrency control. When a write arrives while a read is in progress, RethinkDB takes a snapshot of the B-Tree for each relevant shard and maintains different versions of the blocks, so that reads and writes can execute concurrently.
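The minimal sketch below shows block-level versioning: a reader pins a snapshot version, a concurrent writer installs a new version of a block, and the reader still sees the block as of its snapshot. The VersionedBlock and Store classes and the block name are hypothetical, and old versions are never cleaned up here.

```python
class VersionedBlock:
    """A block that keeps older versions around for in-flight readers."""

    def __init__(self, data):
        self.versions = [(0, data)]  # (version number, contents), oldest first

    def read(self, snapshot):
        # Return the newest version that is not newer than the reader's snapshot.
        for version, data in reversed(self.versions):
            if version <= snapshot:
                return data
        raise KeyError(snapshot)

    def write(self, version, data):
        self.versions.append((version, data))


class Store:
    def __init__(self, initial):
        self.version = 0
        self.blocks = {bid: VersionedBlock(data) for bid, data in initial.items()}

    def snapshot(self):
        """A reader pins the current version number."""
        return self.version

    def write(self, block_id, data):
        """A writer installs a new block version without touching old ones."""
        self.version += 1
        self.blocks[block_id].write(self.version, data)

    def read(self, block_id, snapshot):
        return self.blocks[block_id].read(snapshot)


store = Store({"leaf-7": ["alice", "bob"]})
reader_snapshot = store.snapshot()                # a long-running read starts
store.write("leaf-7", ["alice", "bob", "carol"])  # a concurrent write arrives
old_view = store.read("leaf-7", reader_snapshot)
new_view = store.read("leaf-7", store.snapshot())
```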

When multiple writes target documents that are close to each other in the B-Tree, RethinkDB takes exclusive block-level locks. In most cases this does not cause performance problems, because the top levels of the B-Tree are cached along with frequently used blocks.
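The sketch below illustrates why nearby writes serialize: two documents that live in the same leaf block share one exclusive lock, so concurrent writers to that block take turns and no update is lost. The Block class, document names, and thread counts are hypothetical; Python threads stand in for concurrent write operations.

```python
import threading


class Block:
    """A leaf block holding several neighbouring documents."""

    def __init__(self, keys):
        self.lock = threading.Lock()  # exclusive lock taken by writers
        self.counters = {k: 0 for k in keys}


# Two neighbouring documents that live in the same leaf block.
leaf = Block(["doc-41", "doc-42"])


def increment(block, key, times):
    for _ in range(times):
        with block.lock:  # writers to the same block are serialized
            block.counters[key] += 1


threads = [
    threading.Thread(target=increment, args=(leaf, key, 10_000))
    for key in ("doc-41", "doc-42", "doc-41")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```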

Query Interface

Custom API

RethinkDB provides a unified, chainable query language. A query starts with a table and incrementally chains transformation operations onto the end. The language supports CRUD operations, aggregations including map-reduce and group-map-reduce, joins, full sub-queries, and changefeeds.
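ReQL queries read as a chain such as r.table(...).filter(...).map(...). The sketch below imitates that chaining style in plain Python: each method returns a new, longer query, and nothing is evaluated until run is called. The Query class and table function are hypothetical stand-ins, not the ReQL driver.

```python
class Query:
    """Hypothetical chainable query: each method returns a new, longer query."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = steps

    def filter(self, predicate):
        return Query(self.source, self.steps + (("filter", predicate),))

    def map(self, fn):
        return Query(self.source, self.steps + (("map", fn),))

    def run(self):
        rows = list(self.source)
        for kind, fn in self.steps:
            if kind == "filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return rows


def table(rows):
    """Start a query from a table (here, just a list of documents)."""
    return Query(rows)


users = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 17}, {"name": "Cy", "age": 45}]
adult_names = table(users).filter(lambda u: u["age"] >= 18).map(lambda u: u["name"]).run()
```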

Changefeeds allow clients to receive changes to a table, or to the results of a specific query, as they happen. Nearly any ReQL query can become a changefeed. When a start point is specified, the changefeed stream begins with the current contents of the monitored table.
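The in-memory sketch below models a changefeed: subscribers receive change events with old_val and new_val fields as writes happen, and can optionally receive the table's current contents first. The field names follow ReQL's change documents and include_initial mirrors an option of ReQL's changes command, but the Table class itself is hypothetical and simplified.

```python
class Table:
    def __init__(self, docs):
        self.docs = {d["id"]: d for d in docs}
        self.subscribers = []

    def changes(self, callback, include_initial=False):
        # Optionally start the stream with the current table contents.
        if include_initial:
            for doc in self.docs.values():
                callback({"new_val": doc})
        self.subscribers.append(callback)

    def upsert(self, doc):
        old = self.docs.get(doc["id"])  # None for a brand-new document
        self.docs[doc["id"]] = doc
        for notify in self.subscribers:
            notify({"old_val": old, "new_val": doc})

    def delete(self, key):
        old = self.docs.pop(key)
        for notify in self.subscribers:
            notify({"old_val": old, "new_val": None})


table = Table([{"id": 1, "score": 10}])
events = []
table.changes(events.append, include_initial=True)
table.upsert({"id": 1, "score": 11})  # update
table.upsert({"id": 2, "score": 5})   # insert
table.delete(1)                       # delete
```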

Query Execution

Vectorized Model

ReQL queries are constructed by making function calls in the host language (JavaScript, Python, Ruby, or Java). A query executes entirely on the database server when the user calls the run command and passes it an active database connection. Queries are executed lazily: RethinkDB does only as much work as needed to read the requested data.
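Python generators give a compact way to see what lazy execution means. In the sketch below, building the query only wires generators together; rows are read only when the result is consumed, and a limit stops reading after the rows it needs. The scan and limit helpers are hypothetical and merely mimic the idea on the client side.

```python
stats = {"rows_read": 0}


def scan(table):
    """Lazily yield rows from a table, counting each row actually read."""
    for row in table:
        stats["rows_read"] += 1
        yield row


def limit(rows, n):
    """Yield at most n rows, then stop pulling from upstream."""
    if n <= 0:
        return
    for count, row in enumerate(rows, start=1):
        yield row
        if count == n:
            return


table = range(1_000_000)  # stand-in for a large table
# Building the query only wires generators together; no rows are read yet.
query = limit((row * 2 for row in scan(table)), 3)
before_run = stats["rows_read"]
result = list(query)  # "run": pull only as many rows as the query needs
```

Even though the table has a million rows, only three are read.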

All queries are automatically parallelized on the RethinkDB server. Complicated queries can also be broken into stages, with each stage executed in parallel, and the partial results are then combined into a complete result.
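A minimal sketch of staged parallel execution, assuming a two-stage map-reduce: the first stage runs on every shard concurrently and produces partial aggregates, and the second stage merges them. A thread pool stands in for separate servers, and the shard data and stage functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards of a table, each holding some documents.
shards = [
    [{"city": "Austin", "sales": 10}, {"city": "Boston", "sales": 5}],
    [{"city": "Austin", "sales": 7}],
    [{"city": "Boston", "sales": 1}, {"city": "Chicago", "sales": 4}],
]


def map_stage(shard):
    """Stage 1, run on every shard in parallel: partial sums per city."""
    partial = {}
    for doc in shard:
        partial[doc["city"]] = partial.get(doc["city"], 0) + doc["sales"]
    return partial


def reduce_stage(partials):
    """Stage 2: combine the per-shard partial results into one answer."""
    total = {}
    for partial in partials:
        for city, amount in partial.items():
            total[city] = total.get(city, 0) + amount
    return total


with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_stage, shards))
result = reduce_stage(partials)
```

Because summation is associative, the per-shard partial sums can be computed independently and merged in any order.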

Website

http://www.rethinkdb.com/

Source Code

https://github.com/rethinkdb/rethinkdb

Tech Docs

https://www.rethinkdb.com/docs/

Developer

RethinkDB

Country of Origin

US

Start Year

2009

Acquired By

Cloud Native Computing Foundation

Project Type

Open Source

Written in

Bash, C++, Java, JavaScript, Python

Supported languages

C#, C++, Clojure, Dart, Delphi, Elixir, Erlang, Go, Haskell, Java, JavaScript, Lua, Nim, Perl, PHP, Python, R, Ruby, Rust, Swift

Operating Systems

BSD, Linux, OS X, Windows

Licenses

Apache v2

Wikipedia

https://en.wikipedia.org/wiki/RethinkDB