Kdb+ is a column-based relational time series database developed by Kx Systems. It is designed for financial workloads: it stores time series data and scales up or out as data volume grows.
In 1998, Kx Systems released kdb. Kx Systems then released kdb+ as the 64-bit version in 2003. Kdb+ is programmed and queried through the q language. It is built to process large volumes of time series data in areas including finance and IoT.
Kdb+ also supports ODBC/JDBC query interfaces.
Deterministic Concurrency Control
Kdb+ uses partition-based timestamp ordering. Each transaction is assigned a timestamp when it begins, and on each partition transactions are executed in timestamp order.
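The ordering rule can be illustrated with a minimal sketch. This is not kdb+ code; the `PartitionedScheduler` class, its method names, and the partition key are hypothetical, and the sketch only assumes that timestamps come from a single increasing counter.

```python
import itertools
from collections import defaultdict


class PartitionedScheduler:
    """Toy scheduler (hypothetical): one queue per partition, run in timestamp order."""

    def __init__(self):
        self._clock = itertools.count(1)   # source of increasing timestamps
        self._queues = defaultdict(list)   # partition -> [(timestamp, transaction)]

    def submit(self, partition, txn):
        ts = next(self._clock)             # timestamp assigned when the transaction begins
        self._queues[partition].append((ts, txn))
        return ts

    def run(self, partition, state):
        # Execute the partition's transactions one at a time, oldest timestamp first.
        pending = sorted(self._queues.pop(partition, []), key=lambda pair: pair[0])
        for _, txn in pending:
            txn(state)
        return state


sched = PartitionedScheduler()
sched.submit("2024.01.02", lambda s: s.append("a"))
sched.submit("2024.01.02", lambda s: s.append("b"))
result = sched.run("2024.01.02", [])
```

Because each partition runs its transactions serially in a fixed order, the outcome is the same on every replay, which is what makes the scheme deterministic.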
Kdb+ uses physical logging with a write-ahead log (WAL). The in-memory event engine writes new data to a log file to ensure durability.
Kdb+ supports SQL-standard joins. It also supports as-of joins and window joins.
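A minimal sketch of the write-ahead idea, assuming a JSON-lines log file: each row is forced to the log before it becomes visible in memory, and the log is replayed on restart. The `LoggedTable` class and the file layout are hypothetical and do not reflect the kdb+ log format.

```python
import json
import os
import tempfile


class LoggedTable:
    """Toy in-memory table (hypothetical) backed by an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.rows = []
        self._replay()                               # recover state from an existing log
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                self.rows.append(json.loads(line))

    def insert(self, row):
        # Write and sync the log record first, then apply it in memory.
        self._log.write(json.dumps(row) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.rows.append(row)

    def close(self):
        self._log.close()


path = os.path.join(tempfile.mkdtemp(), "trades.log")
table = LoggedTable(path)
table.insert({"sym": "AAPL", "price": 101.5})
table.close()

recovered = LoggedTable(path)                        # simulates a restart
recovered_rows = recovered.rows
recovered.close()
```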
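An as-of join matches each row with the most recent row of another table at or before its time. The sketch below shows the idea in plain Python; the `asof_join` function and the tuple layout are assumptions made for illustration, not the q implementation.

```python
from bisect import bisect_right


def asof_join(trades, quotes):
    """Attach to each trade the latest quote for the same symbol at or before the trade time.

    trades: list of (time, sym, price); quotes: list of (time, sym, bid).
    """
    by_sym = {}
    for t, sym, bid in sorted(quotes):               # sort quotes by time
        times, bids = by_sym.setdefault(sym, ([], []))
        times.append(t)
        bids.append(bid)

    joined = []
    for t, sym, price in trades:
        times, bids = by_sym.get(sym, ([], []))
        i = bisect_right(times, t) - 1               # last quote with time <= t
        joined.append((t, sym, price, bids[i] if i >= 0 else None))
    return joined


quotes = [(1, "A", 10.0), (3, "A", 10.5), (2, "B", 20.0)]
trades = [(2, "A", 10.2), (4, "A", 10.6), (1, "B", 20.1)]
joined = asof_join(trades, quotes)
```

The trade for `B` at time 1 has no earlier quote, so the sketch fills its bid with `None`.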
Decomposition Storage Model (Columnar)
Kdb+ uses DSM for both in-memory and on-disk storage.
Kdb+ supports only the SERIALIZABLE isolation level. This is achieved through partition-based deterministic concurrency control: transactions are assigned timestamps and execute in timestamp order on each partition.
Kdb+ queries are written in q, a vector-based language. Each function or operation in the query plan manipulates array/vector data.
Kdb+ lets users write and store UDFs in q, in addition to the built-in functions.
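To make the column-at-a-time style concrete, here is a small sketch in plain Python in which a table is a dictionary of column lists and every step consumes and produces whole columns. The `where` and `take` helpers and the sample `trade` table are hypothetical; they only loosely mirror how vector operations compose.

```python
# A table stored column by column: one list per column.
trade = {
    "sym": ["A", "B", "A", "B"],
    "price": [10.0, 20.0, 11.0, 21.0],
    "size": [100, 200, 300, 400],
}


def where(mask):
    """Return the positions at which a boolean column is true."""
    return [i for i, keep in enumerate(mask) if keep]


def take(column, idx):
    """Gather the values of a column at the given positions."""
    return [column[i] for i in idx]


# Each step below works on an entire column rather than on one row at a time.
mask = [s > 150 for s in trade["size"]]
idx = where(mask)
notional = [p * s for p, s in zip(take(trade["price"], idx), take(trade["size"], idx))]
```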
Kdb+ supports both primary and secondary indexes.
Kdb+ has both in-memory and on-disk storage. New data is held in memory and older data is flushed to disk. The flush is controlled by the event engine, which by default writes in-memory data to disk once a day. The rationale is to keep each day's new data in memory so that it can be queried quickly.
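The two storage tiers and the daily flush can be sketched as follows. The `TwoTierStore` class is hypothetical, and a dictionary keyed by date stands in for the on-disk partitions so that the example stays self-contained.

```python
from collections import defaultdict
from datetime import date


class TwoTierStore:
    """Toy store (hypothetical): today's rows in memory, older rows in date partitions."""

    def __init__(self):
        self.memory = []                    # rows received today
        self.disk = defaultdict(list)       # stand-in for on-disk date partitions

    def insert(self, row):
        self.memory.append(row)

    def end_of_day(self, day):
        # Move everything held in memory into that day's partition.
        self.disk[day].extend(self.memory)
        self.memory = []

    def select(self, day=None):
        # No date means "today", which is served from memory.
        if day is None:
            return list(self.memory)
        return list(self.disk.get(day, []))


store = TwoTierStore()
store.insert({"sym": "A", "price": 1.0})
store.end_of_day(date(2024, 1, 2))
store.insert({"sym": "B", "price": 2.0})
```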
Kdb+ supports referential integrity.
Kdb+ uses the relational model. A major challenge in applying the relational model to a time series database is handling large data sets. Kdb+ addresses this with on-disk compression, which fits more data on a single machine, and data partitioning, which distributes data across machines.
Kdb+ uses a Lambda architecture on each node. It has the following properties:
Data in current use is stored in memory, while historical data is stored on disk.
New data comes in from streaming sources.
The event engine distributes data to downstream subscribers, including the real-time database engine and the streaming query engine.
The real-time database writes its contents to the on-disk historical database once a day for analytic use, under the control of the event engine.
Kdb+ supports on-disk compression with the following algorithms:
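The fan-out from the event engine to its subscribers might look like the sketch below. The `EventEngine` class and the two subscribers (a list standing in for the real-time database and a running volume total standing in for a streaming query) are hypothetical.

```python
class EventEngine:
    """Toy publisher (hypothetical): forwards each incoming row to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, row):
        for callback in self.subscribers:
            callback(row)


rdb = []                                   # stand-in for the real-time database
running_total = {"volume": 0}              # stand-in for a streaming query result


def update_total(row):
    running_total["volume"] += row["size"]


engine = EventEngine()
engine.subscribe(rdb.append)
engine.subscribe(update_total)

engine.publish({"sym": "A", "size": 100})
engine.publish({"sym": "B", "size": 250})
```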
kdb+ algorithm: default compression algorithm
gzip: supports different compression levels; a higher compression ratio costs more computation time
Google Snappy: faster, but with a lower compression ratio than the previous two algorithms
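The level trade-off can be explored with Python's standard `zlib` module, which implements the same DEFLATE algorithm that gzip uses. This is only an illustration of compression levels; the sample data is made up, and nothing here goes through kdb+ itself.

```python
import zlib

# Repetitive, made-up tick data compresses well, much like real time series columns.
raw = b"2024.01.02D09:30:00,AAPL,101.5,100\n" * 10000


def compress_at(level):
    """Compress the sample data at a DEFLATE level from 1 (fastest) to 9 (smallest)."""
    return zlib.compress(raw, level)


fast = compress_at(1)
small = compress_at(9)
print(len(raw), len(fast), len(small))
```

Timing both calls on representative data is the practical way to choose a level, since the ratio and the cost both depend on the data.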
First Derivatives plc