CrateDB

CrateDB is an open-source, distributed, shared-nothing SQL database system developed by Crate.io, built on top of a variety of open source projects. Some of these projects include Facebook’s Presto SQL parser and the Apache Lucene search library.

The system's target workload is machine-generated IoT data, particularly for operational analytics applications.

History

CradeDB started as a standalone project by Jodok Batlogg (who previously contributed to Open Source Initiative Vorarlberg), Christian Lutz, and Bern Dorn. The group, who ran a consulting business before that helped companies use tools for their data needs, turned that knowledge into a product. The team won Judge’s Choice at GigaOm Structure Launchpad competition in June 2014 and TechCrunch Discord Europe in October 2014.

The first version was released in September 2016 The second and enterprise versions released in May 2017.

Concurrency Control

Optimistic Concurrency Control (OCC)

Optimistic concurrency control is implemented using rows’ sequence number (_seq_no) and primary term (_primary_term). Initially each row’s sequence number is 0, and is incremented with every INSERT/UPDATE/DELETE to its shard (partition of the table). The primary term is incremented when a shard becomes primary. When updating or deleting, the query must be done with the correct sequence number and primary term, otherwise no effect will take place. CrateDB does not support transactions.

Isolation Levels

Read Uncommitted Read Committed Serializable Repeatable Read

CrateDB supports four isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The isolation level is specified in the begin statement as a transaction_mode.

Joins

Nested Loop Join Hash Join

CrateDB supports two join algorithms: nested loop joins and block hash joins. By default the system uses nested loop joins. Block hash joins can only be applied on inner joins where the condition meets the following criteria: it contains at least one equal operator and no or operators and every argument of an equal operator can only reference fields from one relation. The hash join algorithm can be enabled or disabled explicitly.

The system supports cross joins, inner joins, and left/right/full outer joins, and its performance is limited when joining two or more tables (resulting in poor execution plans).

Logging

Logical Logging

Each shard has its own write ahead log (called the translog), which gets flushed to the index storage of Lucene when the log is full. The translog stores records of the operations on nodes. Operations store metadata about the query that was executed (such as job_id and used_bytes). The job that corresponds to the given job id stores the actual query statement that was run.

Additionally, CrateDB supports application logging (with Log4j) and JVM garbage collection logging. Apache’s Log4J provides logging functionalities for Java. Crate DB uses JVM garbage collection logging to maintain garbage collection times.

Query Execution

Materialized Model

Given an operation tree, CrateDB’s Job Execution Service will execute the given plan. Each operation will emit the entire (intermediate) result to other nodes, which will continue the query execution with the given result.

Query Interface

SQL HTTP / REST Command-line / Shell

CrateDB can be queried with regular SQL. One way to query a table is through the CrateDB admin UI, in which the user can construct and view the results of a query. Another way is through CrateDB Shell (called Crash), which is their custom command-line shell. An HTTP endpoint is also provided for submitting queries; using HTTPie users can query using some endpoint, to which CrateDB will respond with JSON. There are also numerous third party client tools that will work with CrateDB. Since the system supports PostgreSQL wire protocol, most third party tools for PostgreSQL will also be compatible with CrateDB.

Storage Architecture

Disk-oriented

CrateDB stores data on disk, which is retrieved and stored depending on the execution plan. The data is partitioned and stored across many nodes.

Storage Model

Hybrid

CrateDB stores given data both in row and column store formats. Column store is enabled by default for primitive types (such as integers or booleans) and cannot be turned off. It is supported for a limited length of text data as well, but can be disabled. It is not supported for other types, specifically compound or geographic types.

Rows of a table are stored as a semi-structured document that can be nested in objects, and operation on these documents are atomic.

Storage Organization

Sorted Files

In CrateDB, the tables are sharded (partitioned) and divided amongst the nodes, and the shard is stored as a Lucene index, which is further broken down to stored as files under a directory of a node. Data is appended to files, never removed, making replication and recovery easier. When writing to a file, the primary node is looked up and the new data is added to the file. The operation is repeated on replicas.

Stored Procedures

Not Supported

CrateDB does not support stored procedures.

System Architecture

Shared-Nothing

CrateDB is a shared nothing distributed system. Every Crate Node has the same four components, making them all equal in terms of functionality. They can all process SQL statements, execute queries, interact with the cluster, and store data. Therefore, any node can receive, process, and execute queries.

A Crate cluster is defined as two different nodes representing the same database instance but running on different hosts. The cluster state includes metadata including global settings, discovered nodes, schemas, and status/location of shards. A single node in the cluster is elected the “Metadata Primary”, and is the only node allowed to change the state at runtime. Discovering a node refers to finding, adding, and removing nodes. A node will ping potential hosts, and after receiving a response including the Metadata Primary, the new node can send a join request to that cluster specifically.

All nodes in a Crate cluster can communicate with any other node in the cluster, via byte serialized POJOs (plain old Java objects). This full mesh topology improves reliability and messages to be sent along the shortest possible path, but is limited in growth.

Views

Virtual Views

CrateDB supports creating, querying, and dropping views. The view is not materialized, so the query associated with this view is rerun every time the view is used. The enterprise version allows different users to have privileges, so to query a view the user must have DQL privileges on the view. The user who created the view automatically has DQL privileges on all the relations in the view.

CrateDB Logo
Website

https://crate.io/

Source Code

https://github.com/crate/crate

Tech Docs

https://crate.io/docs/

Twitter

@cratedb

Developer

Crate.io

Country of Origin

AT

Start Year

2014

Project Type

Commercial, Open Source

Written in

Java

Derived From

Elasticsearch, PrestoDB

Embeds / Uses

Lucene

Compatible With

PostgreSQL

Operating Systems

All OS with Java VM

Licenses

Apache v2

Wikipedia

https://en.wikipedia.org/wiki/CrateDB