PosDB is a distributed disk-based column-store that leverages late materialization idea to deal with complex analytical workloads. Posdb stands for "Positional database", which refers to its unique indexing technique. It uses the positional indexing technique to efficiently store and retrieve data. PosDB is a columnar DBMS with SQL support designed for analytic workloads. Unlike classic columnar systems, PosDB features fluent data position handling, which produces two types of intermediate representations that can be used during query processing. This leads to unconventional query plans which significantly improve query performance by conserving disk bandwidth and enabling sophisticated query processing techniques. It also features other natural benefits of column-stores, such as efficient column-level compression.
Development of PosDB started in 2017 as a research project. The goals were to explore the perspectives of late materialization in a distributed environment and to study its impact on the processing of complex queries containing a large number of joins, subqueries and nontrivial aggregation (e.g., window functions).
According to George Chernishev, the main developer of PosDB, currently, PosDB focuses only on read-only workloads with bulk loading of data. Therefore, there are no checkpoints for transactions now. As for reliability, they have some homebrew mechanisms for handling network failures, but it is rather simple.
It has implemented a generalized I/O-RAM compression scheme in which the compression algorithm is a parameter that can be changed. They now support both light-weight and heavy-weight algorithms including PFOR, VByte, SIMD-FastPFOR128, SIMD-BinaryPacking128 and Brotli (default configuration). The pages are decompressed as soon as they arrive from disk.
The system target read-only workloads.
Data model is relational with tables stored on disk as separate columns.
PosDB does not support isolation since currently the developers focus only on read-only workloads with bulk loading of data.
Nested Loop Join Hash Join Sort-Merge Join Broadcast Join Shuffle Join Semi Join
PosDB supports both local and distributed joins with arbitrary partitioning and replication schemas. The system supports hash, nested-loop, and sort-merge algorithms. Also, experimental branches support some exotic joins like band-join.
PosDB only supports equi-join and includes NestedLoopJoin, MergeJoin, and HashJoin operators.
PosDB can store data on several machines, as well as process each query using several threads and a set of machines (inter- and intra-query parallelism).
PosDB currently doesn't support query compilation.
PosDB utilizes a blocked version of the Volcano query execution model and introduces two phases of query execution: before and after the so-called materialization point. Each of these phases has different operators that work with different kinds of data: positions (row ids) and tuples correspondingly.
Positional operators work with columnar data and are specifically designed for massive filtering, filtering joins and network communication. Lightweight intermediates and good cache locality are important here.
Using the Volcano execution model, PosDB represents each query by a tree of operators that support the iterator interface. Other than iterators, PosDB also supports vectorized processing. PosDB supports parallel operators including SyncReader. PosDB has two operators for supporting intraquery parallelism: Asynchronizer and UnionAll. Asynchronizer is a unary operator that only executes the child operator in a thread while UnionAll is a polyadic operator that executes all child operators and then merges the results.
On the other hand, tuple-based operators target aggregation, which needs multiple operations for each wide row. The first (lowest) of these operators is a materialization point: positional data is transformed into tuples. The transformation is sometimes coupled with grouping and window functions to reduce the amount of materialized data.
PosDB supports inter- and intra- query parallelism. To implement the latter PosDB uses two special operators. Asynchronizer allows it to execute a single operator tree in a separate thread and UnionAll is used to collect data from several subtrees that are executed in their own threads.
PosDB is a natively distributed DBMS in terms of both data and query execution. Each table may be fragmented and replicated across multiple nodes. A number of table-level fragmentation strategies are supported: round-robin, hash and range partitioning. Distributed query execution allows PosDB to run a query on multiple nodes, with each node processing an arbitrary part of the query plan. Both positional and tuple operators can be executed on arbitrary nodes, regardless of where their children reside. Also, several operators support internal distribution embedded into their core algorithms, like distributed join and aggregation.
PosDB supports a subset of SQL which consists of selection, projection, join, ordering, aggregation, and window functions (a naive rule-based optimizer is run). It does not support nested queries, updates, deletion, and data definition language (DDL).
It also provides a C++ interface that allows manual query plan construction with exact details of used algorithms and distributed schema.
Since PosDB is a disk-based system all data resides on disk. Disk subsystem efficiency is provided via columnar storage and a buffer manager. The former reduces disk load and helps in homogenizing processed data. A buffer manager allows important data to stay in memory for longer, including the case of several concurrently running queries.
Decomposition Storage Model (Columnar)
PosDB is a column-oriented DBMS.
Indexed Sequential Access Method (ISAM)
The data is stored on disk and the system has a storage manager which manages the data and provides access to them. Each replica is stored in a separate file without paging and the buffer manager can help with efficient query execution.
PosDB supports materialized views and utilize the late materialization strategy. It also implements other optimizations including multi-columns and ultra-late materialization. It can support a wide range of queries including window functions.