ClickHouse


ClickHouse is an open-source column-oriented OLAP DBMS that reportedly outperforms other DBMSs (Vertica, Hive, MySQL) on similar OLAP workloads. It is known for its linear scalability, hardware efficiency, fault tolerance, rich features, simplicity and high reliability.

History

ClickHouse is developed by Yandex, a Russian company, and was designed for use by multiple projects within Yandex. Yandex needed a DBMS to analyze large amounts of data, so it began developing its own column-oriented DBMS. The prototype of ClickHouse appeared in 2009, and it was released as open source in 2016.

Logging

Physical Logging

ClickHouse replicates its data on multiple nodes and monitors data synchronicity on replicas. It recovers after failures by syncing data from other replica nodes.

Query Interface

Custom API SQL

ClickHouse provides two types of parsers: a full SQL parser and a data format parser. It uses the SQL parser for all types of queries and the data format parser only for INSERT queries. Beyond the query language, it provides multiple user interfaces, including an HTTP interface, a JDBC driver, a TCP interface, and a command-line client.
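As a rough illustration of the HTTP interface, the sketch below only builds a request URL that carries the SQL text in a `query` parameter; it does not contact a server. The host name, the port 8123 (assumed here as the usual HTTP default), and the helper name `build_query_url` are assumptions made for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(sql, host="localhost", port=8123):
    """Return a URL that passes `sql` in the `query` parameter.

    Hypothetical helper: host and port are assumed values, not taken
    from the text above.
    """
    return f"http://{host}:{port}/?{urlencode({'query': sql})}"


url = build_query_url("SELECT 1")

# Decoding the URL again recovers the original SQL text.
decoded_sql = parse_qs(urlsplit(url).query)["query"][0]
```

Sending the request (for example with `urllib.request.urlopen(url)`) would require a running server, so it is left out of the sketch.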

System Architecture

Shared-Nothing

A ClickHouse system is a cluster of shards. It uses asynchronous multi-master replication, and there is no single point of contention across the system.
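The idea of a cluster of independent shards can be sketched conceptually as follows. This is not how ClickHouse itself routes data: the CRC32-based routing rule and the names `shard_for` and `distribute` are assumptions chosen only to show rows being split across nodes that share nothing.

```python
import zlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a deterministic hash (assumed rule)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def distribute(rows, num_shards, key_column):
    """Split rows into per-shard lists; each list stands for one node's data."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_column], num_shards)].append(row)
    return shards


rows = [{"user_id": i, "hits": i * 2} for i in range(20)]
shards = distribute(rows, num_shards=3, key_column="user_id")
```

Because each row lands on exactly one shard, a query can be run on every shard independently and the partial results combined afterwards.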

Views

Virtual Views Materialized Views

ClickHouse supports both virtual views and materialized views. A materialized view stores the data produced by its corresponding SELECT query. The SELECT query can contain DISTINCT, GROUP BY, ORDER BY, LIMIT, etc.

Query Compilation

Code Generation

ClickHouse supports runtime code generation. The code is generated on the fly for every kind of query, removing all indirection and dynamic dispatch. Runtime code generation pays off most when it fuses many operations together and makes full use of the CPU's execution units.
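The fusing effect described above can be illustrated with a toy sketch. Python's built-in `compile` and `exec` stand in for a real code generation backend here, and the function name `compile_fused` is hypothetical; the example shows only the idea of generating one specialized function per expression, not ClickHouse's implementation.

```python
def compile_fused(expr, columns):
    """Generate and compile a function that evaluates `expr` in one fused loop.

    `expr` must be a trusted expression over the given column names,
    since it is passed to exec().
    """
    args = ", ".join(columns)
    # The generated loop unpacks one value per column and applies the whole
    # expression at once, instead of running one pass per operator.
    source = (
        f"def fused({args}):\n"
        f"    return [{expr} for ({args},) in zip({args})]\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["fused"]


fused = compile_fused("a * b + 1", ["a", "b"])
result = fused([1, 2, 3], [10, 20, 30])
```

An interpreter-style engine would first materialize `a * b` and then add 1 in a second pass; the generated function does both steps per row without an intermediate column.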

Stored Procedures

Not Supported

Currently, stored procedures and UDFs are listed as open issues in ClickHouse.

Storage Model

Decomposition Storage Model (Columnar)

ClickHouse is a column-oriented DBMS and it stores data by columns.
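A minimal sketch of what storing data by columns means, using plain Python lists in place of on-disk column files. The sample rows and the helper name `to_columns` are made up for illustration.

```python
def to_columns(rows):
    """Convert a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [
    {"user_id": 1, "country": "RU", "duration": 30},
    {"user_id": 2, "country": "DE", "duration": 45},
    {"user_id": 3, "country": "RU", "duration": 25},
]
columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
total_duration = sum(columns["duration"])
```

In the row layout, computing the same total would require touching every field of every row, which is why analytical queries over a few columns favor the columnar layout.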

Data Model

Relational

ClickHouse uses the relational database model.

Query Execution

Vectorized Model

Storage Architecture

Disk-oriented In-Memory

ClickHouse has multiple types of table engines. The type of the table engine determines where the data is stored, the level of concurrency, whether indexes are supported, and some other properties. The table engines that store data on disk include TinyLog and Log. The Memory engine stores data in memory and is mainly used for temporary tables with external query data. The data in a Memory engine table disappears when the server is restarted.

Joins

Hash Join

ClickHouse only supports hash join, which is done by placing the right part of the data in a hash table in memory. Hash join is fast but requires enough memory to hold that table.
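The build-and-probe pattern described above can be sketched as a generic in-memory inner hash join. This is a textbook version under simple assumptions (rows as dicts, a single join key), not ClickHouse's own code, and the name `hash_join` is illustrative.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of row dicts on `key`."""
    # Build phase: the right side is loaded entirely into a hash table,
    # which is why it must fit in memory.
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)

    # Probe phase: the left side is streamed and looked up row by row.
    joined = []
    for l in left:
        for r in table.get(l[key], ()):
            joined.append({**l, **r})
    return joined


visits = [{"user_id": 1, "url": "/a"}, {"user_id": 2, "url": "/b"}, {"user_id": 4, "url": "/c"}]
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
joined = hash_join(visits, users, "user_id")
```

Since only the right side is held in the hash table, placing the smaller input on the right keeps memory use down.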

Concurrency Control

Not Supported

ClickHouse does not support multi-statement transactions.

Indexes

Log-Structured Merge Tree

ClickHouse supports primary key indexes. The index mechanism is called a sparse index. In the MergeTree engine, the data in each part is sorted lexicographically by primary key. ClickHouse then places a mark every index_granularity rows. These marks serve as a sparse index, which allows efficient range queries.
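The mark-based lookup can be sketched as follows, assuming a single sorted list of integer keys in place of a data part. The function names `build_marks` and `rows_to_scan` are hypothetical, and the range returned is deliberately conservative: it may include a granule that turns out to hold no matching rows.

```python
from bisect import bisect_left, bisect_right


def build_marks(sorted_keys, index_granularity):
    """Keep the key of every index_granularity-th row as a mark."""
    return sorted_keys[::index_granularity]


def rows_to_scan(marks, index_granularity, total_rows, lo, hi):
    """Return a [start, stop) row range that covers all keys in [lo, hi]."""
    # Start at the granule before the first mark >= lo, because that
    # granule's tail may still contain keys equal to lo.
    first_granule = max(bisect_left(marks, lo) - 1, 0)
    # Granules whose mark is greater than hi cannot contain matches.
    end_granule = bisect_right(marks, hi)
    start = first_granule * index_granularity
    stop = min(end_granule * index_granularity, total_rows)
    return start, stop


keys = list(range(10))                 # sorted primary key values
marks = build_marks(keys, 3)           # one mark per 3 rows
start, stop = rows_to_scan(marks, 3, len(keys), lo=4, hi=7)
matches = [k for k in keys[start:stop] if 4 <= k <= 7]
```

Only the marks are searched, so the index stays small even for very large parts; the cost is that whole granules are scanned and filtered rather than individual rows being located directly.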

Revision #9 | Updated 07/21/2019 9:50 a.m.