HStreamDB

Streaming

HStreamDB is an open source distributed streaming database designed for accessing, storing, and processing real-time streaming data from sources such as IoT devices. Every record added to the database is appended to an immutable object called a stream, and a database can hold multiple streams at once. HStreamDB aims to provide low-latency access to analyses of the most current data in streams, which it achieves by incrementally updating in-memory materialized views in real time as streaming data is ingested. HStreamDB also lets multiple client consumers read from the same stream through stream subscriptions, which deliver records to each client once they have been ingested by the DBMS. HStreamDB accepts SQL queries with extensions for streams, and it was built from scratch in Haskell.

History

HStreamDB is built by EMQ, a company providing open source IoT data infrastructure. It was first open sourced in 2021 and is under active development by the Haskell Team from EMQ.

HStreamDB was developed to apply a data-driven model to processing stream data efficiently inside a database. In contrast to the command-driven model of most databases, which analyze data only when a client sends a request, HStreamDB’s goal was to analyze data in real time as it is ingested and to deliver the results with low latency.

Compression

Naïve (Record-Level)

Compression is used to reduce network bandwidth utilization when transferring data to and from the database. Compression and decompression are performed entirely by the client, and the compressed data is stored as-is in the database. HStreamDB supports both the gzip and zstd compression algorithms.
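The client-side round trip described above can be illustrated with a minimal sketch. It uses only Python's standard gzip and json modules and does not call any HStreamDB client; the helper names compress_record and decompress_record, and the JSON encoding of the payload, are assumptions made for illustration.

```python
import gzip
import json


def compress_record(payload: dict) -> bytes:
    """Serialize a record to JSON and gzip it before it leaves the client."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decompress_record(blob: bytes) -> dict:
    """Reverse compress_record on the consuming client."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical sensor payload with repetitive content, which compresses well.
record = {"device": "sensor-1", "readings": [21.5] * 200}
blob = compress_record(record)          # what would travel over the network
restored = decompress_record(blob)      # what the consumer reconstructs
```

Because the server never inspects the compressed bytes in this model, the producer and the consumer have to agree on the algorithm out of band or through record metadata.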

Concurrency Control

Not Supported

Since records can only be appended to streams, and records can be written out of order, concurrency control is not necessary for HStreamDB.

Data Model

Relational

HStreamDB models data as records that are written to streams. Every record has a unique identifier, and the data in a record is either an HRecord or a Raw Record. An HRecord can be thought of as a traditional database tuple with support for nested maps and arrays. HRecords can be queried using SQL. A Raw Record contains arbitrary binary data that the database neither interprets nor queries. Raw Records are intended to be consumed through subscriptions.
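The distinction between the two record kinds can be sketched as follows. This is a toy in-memory model, not HStreamDB's actual types: the class names, the integer identifiers, and the queryable_rows helper are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass(frozen=True)
class HRecord:
    """Structured record: named fields that may hold nested maps and arrays."""
    fields: Dict[str, Any]


@dataclass(frozen=True)
class RawRecord:
    """Opaque record: bytes the database stores but never interprets."""
    payload: bytes


@dataclass(frozen=True)
class Record:
    record_id: int                      # unique within the toy stream
    body: Union[HRecord, RawRecord]


class ToyStream:
    """Append-only sequence of records (hypothetical stand-in for a stream)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: List[Record] = []

    def append(self, body: Union[HRecord, RawRecord]) -> Record:
        # Records are only ever added at the end; nothing is updated in place.
        record = Record(record_id=len(self._records), body=body)
        self._records.append(record)
        return record

    def queryable_rows(self) -> List[Dict[str, Any]]:
        # Only HRecords expose fields that a SQL-style query could read.
        return [r.body.fields for r in self._records if isinstance(r.body, HRecord)]


stream = ToyStream("sensors")
first = stream.append(HRecord({"device": "d1", "tags": ["a", "b"], "loc": {"x": 1}}))
second = stream.append(RawRecord(b"\x00\x01"))
rows = stream.queryable_rows()
```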

Foreign Keys

Not Supported

Indexes

Not Supported

Isolation Levels

Not Supported

Joins

Nested Loop Join

HStreamDB supports nested loop joins between two streams or between two materialized views. Joins between a stream and a materialized view are also supported.
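As a reminder of what a nested loop join does, here is a minimal sketch over two finite lists of rows. It is a generic textbook version, not HStreamDB's implementation: a real stream join works on unbounded input and would need windowing or buffering, which this example leaves out. The function name and the sample rows are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def nested_loop_join(
    left: List[Row],
    right: List[Row],
    predicate: Callable[[Row, Row], bool],
) -> List[Row]:
    """Compare every left row with every right row and keep matching pairs."""
    joined: List[Row] = []
    for l_row in left:              # outer loop
        for r_row in right:         # inner loop, rescanned per outer row
            if predicate(l_row, r_row):
                # On shared column names the right-hand value wins.
                joined.append({**l_row, **r_row})
    return joined


readings = [
    {"device": "d1", "temp": 21},
    {"device": "d2", "temp": 35},
    {"device": "d3", "temp": 19},
]
devices = [
    {"device": "d1", "room": "lab"},
    {"device": "d2", "room": "roof"},
]
result = nested_loop_join(readings, devices, lambda l, r: l["device"] == r["device"])
```

The cost is proportional to the product of the two input sizes, which is why the inputs to such a join are usually kept small or bounded.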

Query Compilation

Not Supported

Query Execution

Tuple-at-a-Time Model

Query Interface

Custom API SQL

HStreamDB can be accessed through either SQL or its custom API. Its SQL dialect is a subset of SQL-92 with extensions to support stream operations. Queries can be executed from a command line interface or from the Java, Go, and Python clients. HStreamDB’s custom API is implemented in its clients and can be used to insert and consume data.

Since HStreamDB is a streaming database, it handles queries differently from a typical database. Queries are treated as running tasks that fetch data from streams and produce results continuously as the streams are updated. HStreamDB also supports subscriptions, where multiple consumers can read data in real-time from a single stream as records are added by producers.
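The idea of a query as a running task, alongside plain subscriptions, can be sketched with a small in-process model. Nothing here uses the real HStreamDB API: ToyStream, subscribe, and continuous_filter are hypothetical names, and delivery is a synchronous callback rather than a network push.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


class ToyStream:
    """In-memory stand-in for a stream that pushes new records to listeners."""

    def __init__(self) -> None:
        self._log: List[Record] = []
        self._listeners: List[Callable[[Record], None]] = []

    def subscribe(self, listener: Callable[[Record], None]) -> None:
        self._listeners.append(listener)

    def append(self, record: Record) -> None:
        self._log.append(record)
        for listener in self._listeners:    # every subscriber sees every record
            listener(record)


def continuous_filter(
    source: ToyStream,
    predicate: Callable[[Record], bool],
    sink: List[Record],
) -> None:
    """Register a long-lived 'query' that emits a result per matching record."""

    def on_record(record: Record) -> None:
        if predicate(record):
            sink.append(record)

    source.subscribe(on_record)


stream = ToyStream()
alerts: List[Record] = []            # output of the continuous query
seen_by_consumer: List[Record] = []  # a second, independent subscriber

continuous_filter(stream, lambda r: r["temperature"] > 30, alerts)
stream.subscribe(seen_by_consumer.append)

for temperature in (25, 31, 40):     # a producer appending records
    stream.append({"temperature": temperature})
```

The query never "finishes": its result set grows as the producer keeps appending, which is the behavioral difference from a one-shot query in a conventional database.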

Storage Architecture

Disk-oriented

HStreamDB is a disk-oriented database that uses the RocksDB storage engine. Persisting data on disk allows HStreamDB to support large-scale data streams.

Storage Model

Custom

HStreamDB does not implement its own storage layer, and instead relies on RocksDB as a key-value store. All of its data is eventually processed and stored in the key-value file format implemented in RocksDB.

Stored Procedures

Not Supported

Views

Materialized Views

HStreamDB supports incrementally updated materialized views. As data is added to streams, views are updated in real time. This makes querying views fast, since they always contain the latest result. Views differ from streams in that they are stored only in memory.
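Incremental maintenance can be illustrated with a view that keeps a per-device average. The sketch below is an assumption-laden toy, not HStreamDB's engine: the class name AvgTemperatureView and the field names are hypothetical. The point is that each new record adjusts a small amount of in-memory state instead of triggering a rescan of the stream.

```python
from collections import defaultdict
from typing import Dict


class AvgTemperatureView:
    """In-memory view of the average temperature per device."""

    def __init__(self) -> None:
        self._count: Dict[str, int] = defaultdict(int)
        self._total: Dict[str, float] = defaultdict(float)

    def on_record(self, record: dict) -> None:
        # Constant work per record: update running count and sum only.
        device = record["device"]
        self._count[device] += 1
        self._total[device] += record["temperature"]

    def read(self) -> Dict[str, float]:
        # Reading is cheap because the aggregate state is already up to date.
        return {d: self._total[d] / self._count[d] for d in self._count}


view = AvgTemperatureView()
for rec in (
    {"device": "a", "temperature": 20.0},
    {"device": "b", "temperature": 10.0},
    {"device": "a", "temperature": 30.0},
):
    view.on_record(rec)

snapshot = view.read()
```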

Website

https://hstream.io/

Source Code

https://github.com/hstreamdb/hstream

Tech Docs

https://docs.hstream.io/

Twitter

@HStreamDB

Developer

EMQ Technologies Co., Ltd.

Country of Origin

CN

Start Year

2020

Project Type

Commercial, Open Source

Written in

Haskell

Supported languages

Go, Java, Python

Embeds / Uses

RocksDB

Operating Systems

Linux

Licenses

BSD
