ClickHouse

Revision #7 from 11/05/2018 9:38 p.m.

ClickHouse is an open-source column-oriented OLAP DBMS that is reported to outperform other DBMSs (Vertica, Hive, MySQL) on similar OLAP workloads. It is known for its linear scalability, hardware efficiency, fault tolerance, rich features, simplicity, and high reliability.

History

ClickHouse was developed by Yandex, a Russian company, for use by multiple projects within Yandex. Yandex needed a fast DBMS for analyzing large amounts of data, a need that its existing solutions could not meet. It therefore began developing its own column-oriented DBMS that can handle analytical data at internet scale. The prototype of ClickHouse appeared in 2009, and it was released as open source in 2016.

Storage Model

Decomposition Storage Model (Columnar)

ClickHouse is a column-oriented DBMS and it stores data by columns.
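
As a rough illustration of what storing data by columns means, the following Python sketch converts a few row records into per-column lists and then answers an aggregate query by reading a single column. The table contents and the helper name `to_columns` are made up for this example, and the sketch says nothing about ClickHouse's actual on-disk format.

```python
# Row-oriented layout: each record keeps all of its fields together.
# (Hypothetical sample data, not taken from ClickHouse.)
rows = [
    {"user_id": 1, "url": "/home", "duration_ms": 120},
    {"user_id": 2, "url": "/search", "duration_ms": 340},
    {"user_id": 1, "url": "/cart", "duration_ms": 95},
]


def to_columns(records):
    """Convert a list of row dicts into a dict mapping column name to a list of values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one column are stored contiguously.
columns = to_columns(rows)

# An aggregate over one column only has to read that column,
# not the user_id or url values of every row.
total_duration = sum(columns["duration_ms"])
print(total_duration)
```

Analytical queries typically touch a few columns out of many, which is why this layout tends to suit OLAP workloads.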

Joins

Hash Join

ClickHouse only supports hash joins, which are performed by loading the right-hand side of the join into an in-memory hash table. Hash joins are fast, but they require enough memory to hold the right-hand table.
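
The build-and-probe idea can be sketched in a few lines of Python. This is a generic inner hash join under simplifying assumptions (rows are dicts, the join key has the same name on both sides), not ClickHouse's implementation; the function name `hash_join` and the sample tables are hypothetical.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Inner-join two lists of row dicts on `key` using an in-memory hash table."""
    # Build phase: the whole right side is loaded into a hash table,
    # which is why the right side must fit in memory.
    table = defaultdict(list)
    for right in right_rows:
        table[right[key]].append(right)

    # Probe phase: stream over the left side and look up matches.
    joined = []
    for left in left_rows:
        for right in table.get(left[key], []):
            joined.append({**left, **right})
    return joined


# Hypothetical sample data.
visits = [
    {"user_id": 1, "url": "/home"},
    {"user_id": 2, "url": "/search"},
    {"user_id": 3, "url": "/cart"},
]
users = [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
]

result = hash_join(visits, users, "user_id")
print(result)
```

Because only the right side is materialized in the hash table, placing the smaller table on the right keeps memory use down.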

Data Model

Column Family / Wide-Column

ClickHouse not only stores data by columns, but also processes data by columns.

Concurrency Control

Not Supported

ClickHouse doesn't support transactions.

System Architecture

Shared-Nothing

A ClickHouse system is a cluster of shards. It uses asynchronous multi-master replication, and there is no single point of contention across the system.

Logging

Physical Logging

ClickHouse replicates its data on multiple nodes and monitors data synchronicity on replicas. It recovers after failures by syncing data from other replica nodes.

Query Compilation

Code Generation

ClickHouse supports runtime code generation. Code is generated on the fly for each kind of query, removing indirection and dynamic dispatch. Runtime code generation pays off most when it fuses many operations together and fully utilizes the CPU's execution units.
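
To make the idea of fusing operations concrete, here is a small Python analogy: source code for one specialized function is generated from an expression string and compiled at runtime, so the whole expression is evaluated in a single loop rather than one pass per operator. This is only an illustration of the general technique. ClickHouse generates native code, not Python, and the names `generate_fused_kernel`, `price`, `quantity`, and `tax` are invented for this sketch.

```python
def generate_fused_kernel(expression, column_names):
    """Generate and compile a function that evaluates `expression` over columns in one loop.

    `expression` is a Python expression over the names in `column_names`.
    It is passed to exec(), so it must come from a trusted source.
    """
    params = ", ".join(f"{name}_col" for name in column_names)
    # The trailing comma keeps tuple unpacking valid for a single column.
    targets = ", ".join(column_names) + ","
    source = (
        f"def kernel({params}):\n"
        f"    out = []\n"
        f"    for {targets} in zip({params}):\n"
        f"        out.append({expression})\n"
        f"    return out\n"
    )
    namespace = {}
    exec(source, namespace)  # compile the generated source at runtime
    return namespace["kernel"], source


kernel, source = generate_fused_kernel(
    "price * quantity + tax", ["price", "quantity", "tax"]
)
print(source)
print(kernel([2, 3], [10, 10], [1, 2]))
```

The generated function contains the multiplication and the addition inline, with no per-operator dispatch inside the loop.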

Stored Procedures

Not Supported

Currently, stored procedures and UDFs are listed as open issues in ClickHouse.

Query Interface

Custom API, SQL

ClickHouse provides two types of parsers: a full SQL parser and a data format parser. It uses the SQL parser for all types of queries and the data format parser only for INSERT queries. Beyond the query language, it provides multiple user interfaces, including an HTTP interface, a JDBC driver, a TCP interface, and a command-line client.

Indexes

Log-Structured Merge Tree

ClickHouse supports primary key indexes through a mechanism called a sparse index. In the MergeTree engine, the data in each part is sorted lexicographically by primary key. ClickHouse then records a mark every index_granularity rows. These marks serve as a sparse index, which allows efficient range queries.
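
The sparse index can be sketched as follows, assuming a single sorted key column held in a Python list. The function names `build_sparse_index` and `candidate_row_range` are hypothetical, and the real mark selection in MergeTree is more involved. The sketch only shows how one mark per index_granularity rows narrows a range query to a few granules.

```python
from bisect import bisect_left, bisect_right


def build_sparse_index(sorted_keys, index_granularity):
    """Keep the key of every index_granularity-th row as a mark."""
    return [
        sorted_keys[i] for i in range(0, len(sorted_keys), index_granularity)
    ]


def candidate_row_range(marks, index_granularity, total_rows, low, high):
    """Return row offsets [start, end) that may hold keys in [low, high]."""
    # The granule just before the first mark >= low may still end with `low`.
    first_granule = max(bisect_left(marks, low) - 1, 0)
    # Granules whose mark is > high cannot contain matching keys.
    end_granule = bisect_right(marks, high)
    if end_granule == 0:
        return 0, 0
    start = first_granule * index_granularity
    end = min(end_granule * index_granularity, total_rows)
    return start, end


# Hypothetical data: 100 rows with sorted keys 0..99, one mark per 10 rows.
keys = list(range(100))
marks = build_sparse_index(keys, 10)

start, end = candidate_row_range(marks, 10, len(keys), 25, 42)
# Only rows in [start, end) are scanned and filtered exactly.
matched = [k for k in keys[start:end] if 25 <= k <= 42]
print(start, end, len(matched))
```

The index stays small because it holds one entry per granule instead of one per row, at the cost of scanning a few extra rows at the edges of the range.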

Views

Virtual Views, Materialized Views

ClickHouse supports both virtual views and materialized views. A materialized view stores the data produced by its corresponding SELECT query. The SELECT query can contain DISTINCT, GROUP BY, ORDER BY, LIMIT, etc.

Query Execution

Vectorized Model

Storage Architecture

Disk-oriented, In-Memory

ClickHouse has multiple types of table engines. The table engine determines where the data is stored, the level of concurrency, whether indexes are supported, and some other properties. The table engines that store data on disk include TinyLog and Log. The Memory engine stores data in memory and is mainly used for temporary tables with external query data. Data in a Memory engine table disappears when the server is restarted.

Revision #7 | Updated 11/05/2018 9:38 p.m.


Website

https://clickhouse.yandex/

Source Code

https://github.com/yandex/ClickHouse

Developer

Yandex

Country of Origin

Start Year

2016