Cubrick is a distributed multidimensional in-memory DBMS developed for internal use at Facebook. It is designed for low-latency, real-time OLAP analysis over large datasets. It was built from scratch to support only the features required by its real-time analysis use cases.
String fields in Cubrick are dictionary encoded, for both dimensions (i.e., indices) and metrics (i.e., values). Internally, Cubrick processes string fields using their encoded integers, and only converts them back when returning the results to the users.
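As an illustration of the dictionary encoding described above, here is a minimal Python sketch. The `Dictionary` class and its method names are hypothetical and are not Cubrick's actual API; it only shows the general idea of replacing each distinct string with a dense integer id and translating back when results are returned.

```python
class Dictionary:
    """Hypothetical helper: maps strings to dense integer ids and back."""

    def __init__(self):
        self._ids = {}      # string -> id
        self._strings = []  # id -> string

    def encode(self, value: str) -> int:
        # Assign the next free id the first time a string is seen.
        if value not in self._ids:
            self._ids[value] = len(self._strings)
            self._strings.append(value)
        return self._ids[value]

    def decode(self, code: int) -> str:
        # Used only when results are handed back to the user.
        return self._strings[code]


region = Dictionary()
encoded = [region.encode(s) for s in ["US", "BR", "US", "CA"]]
decoded = [region.decode(c) for c in encoded]
```

With this sketch, all internal processing can operate on the integers in `encoded`, and `decode` is applied only at the end.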
Cubrick also uses BESS (Bit-Encoded Sparse Structure) encoding to compress the multidimensional index of each cell (i.e., the group of metrics that share the same combination of dimension values).
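The general idea behind a bit-encoded index can be sketched as follows. This is an assumption-laden illustration, not Cubrick's actual bit layout: it assumes each dimension's value is stored as an offset within a known range size, and that the offsets are packed side by side using just enough bits per dimension. The function names `bits_needed`, `bess_pack`, and `bess_unpack` are hypothetical.

```python
def bits_needed(range_size: int) -> int:
    # Bits required to represent offsets 0 .. range_size - 1 (at least 1).
    return max(1, (range_size - 1).bit_length())


def bess_pack(offsets, range_sizes):
    """Pack one offset per dimension into a single integer."""
    packed = 0
    for offset, size in zip(offsets, range_sizes):
        if not 0 <= offset < size:
            raise ValueError("offset outside the dimension's range")
        packed = (packed << bits_needed(size)) | offset
    return packed


def bess_unpack(packed, range_sizes):
    """Recover the per-dimension offsets from a packed integer."""
    offsets = []
    # The last dimension occupies the lowest bits, so unpack in reverse.
    for size in reversed(range_sizes):
        width = bits_needed(size)
        offsets.append(packed & ((1 << width) - 1))
        packed >>= width
    return offsets[::-1]


range_sizes = [4, 8, 2]          # assumed range size per dimension
packed = bess_pack([3, 5, 1], range_sizes)
restored = bess_unpack(packed, range_sizes)
```

Here the three offsets need 2, 3, and 1 bits respectively, so the whole index fits in 6 bits instead of three full integers.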
Cubrick stores data in bricks (i.e., partitions) in a column-oriented way. In each brick, each column has a dynamic vector to store the metrics or the BESS encoded indices. Cells in a brick are unordered, and the ingested cells are only appended to the end of the brick.
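A brick's append-only, column-oriented layout might look roughly like the sketch below. The `Brick` class is hypothetical, and Python lists stand in for the dynamic vectors mentioned above; the real system's memory layout is not described in this text.

```python
class Brick:
    """Hypothetical append-only, column-oriented partition."""

    def __init__(self, metric_names):
        self.bess = []                                   # one encoded index per cell
        self.metrics = {name: [] for name in metric_names}  # one vector per metric

    def append(self, bess_index, values):
        # Cells are never reordered or updated in place; they are only
        # appended to the end of every column.
        self.bess.append(bess_index)
        for name, column in self.metrics.items():
            column.append(values[name])

    def __len__(self):
        return len(self.bess)

    def sum(self, metric):
        # A scan touches only the column it needs.
        return sum(self.metrics[metric])


brick = Brick(["likes", "comments"])
brick.append(59, {"likes": 10, "comments": 2})
brick.append(12, {"likes": 5, "comments": 7})
total_likes = brick.sum("likes")
```

Because each metric lives in its own vector, an aggregation over one metric does not need to read the others.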
Cubrick uses Granular Partitioning as its main indexing approach to organize the bricks (i.e., partitions) in a cube (i.e., table). Multidimensional indices are converted to partition ids by a conversion function, which maps each predefined multidimensional range to an integer. The mapping from partition ids to storage nodes is maintained by consistent hashing.
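The two steps above, converting coordinates to a partition id and then locating a node, can be sketched as follows. Everything here is an assumption made for illustration: the row-major conversion function, the per-dimension `range_sizes` and `cardinalities`, the `HashRing` class, and the choice of SHA-256 with virtual nodes are not taken from Cubrick itself.

```python
import bisect
import hashlib


def partition_id(coords, range_sizes, cardinalities):
    """Map encoded dimension values to the id of the range (brick) containing them."""
    pid = 0
    for coord, size, card in zip(coords, range_sizes, cardinalities):
        ranges_in_dim = -(-card // size)           # ceiling division
        pid = pid * ranges_in_dim + coord // size  # row-major combination
    return pid


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes, replicas=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, pid: int) -> str:
        # First ring position clockwise from the partition's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(str(pid))) % len(self._ring)
        return self._ring[idx][1]


# Two dimensions with cardinalities 8 and 4, split into ranges of 4 and 2.
pid = partition_id([5, 3], range_sizes=[4, 2], cardinalities=[8, 4])
ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for(pid)
```

In this sketch every cell whose coordinates fall in the same multidimensional range gets the same partition id, and adding or removing a node moves only the partitions adjacent to that node's positions on the ring.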
Records are partitioned by a predefined conversion function and stored on the nodes determined by consistent hashing (row-oriented partitioning). Within each partition, records are stored in a column-oriented way (column-oriented storage).