HyperDex

HyperDex is a distributed, searchable, and consistent key-value store. It is mainly developed by Robert Escriva in Cornell University. HyperDex provides serializability consistency and tolerates a threshold of server failures. Another highlight is a novel search primitive that supports query on secondary attributes. The main techniques HyperDex uses are hyperspace hashing and value-dependent chaining. HyperDex has a commercial extension called HyperDex Warp, which supports ACID transactions involving multiple objects. If not specified, "HyperDex" refers to the basic version without warp.

History

The development of HyperDex started in 2011. In the same year, it was publicly available and authors provided a report to show its performance. In 2013, HyperDex 1.0 is officially released with a fault-tolerant coordinator. In 2014, HyperDex supported ACID transactions involving multiple objects with the Warp add-on. However, both development and commercial support has stopped.

Checkpoints

Consistent Blocking

HyperDex supports blocking, consistent checkpoints. While taking a backup, all servers will coordinate to capture a consistent snapshot. Service will pause for less than one second, waiting for all pending operations complete. Moreover, servers are replicated in each region to provide fault tolerance.

Concurrency Control

Optimistic Concurrency Control (OCC) Timestamp Ordering

Since HyperDex employs a novel storage model, the concurrency control scheme is also different from typical methods. Moreover, HyperDex and HyperDex Warp use different types of concurrency control. HyperDex uses timestamp ordering. There is a centralized coordinator that assigns a version number to every update. All servers apply updates in the same order according to the version number. HyperDex Warp uses a new form of OCC called acyclic transaction, which provides serializability on top of a distributed data store.

Data Model

Key/Value

HyperDex is a key-value store. Each object stored in it complies with an application-provided schema, consisting of a key and zero or more secondary attributes.

Foreign Keys

Not Supported

HyperDex is a NoSQL DBMS that does not support foreign keys.

Indexes

Hash Table

HyperDex does not use any traditional index. Instead, it applies a new hashing technique called hyperspace hashing. All attributes are mapped to dimensions in a multi-dimensional hyperspace and attribute values are hashed independently. Objects reside at the coordinate specified by the hashes. In order to support range search, attribute order is preserved with hashing. In sum, it is easy and efficient to insert, search, retrieve objects with this hyperspace hashing mapping.

Isolation Levels

Serializable

HyperDex provides serializability for operations. All clients connected to the system are guaranteed to see operations in the same order. Specifically, a search will observe any update happen before it and return the latest version of data. To provide serializability consistency, HyperDex adopts a technique called value-dependent chaining. The system arranges all replicas of an object into a value-dependent chain, whose items are determined by hashing the object into subspaces (see Storage Model). Each update flows through the chain and remains pending until the tail sends an acknowledgment back. Out-of-order updates are avoided with the global version number. Therefore, it ensures that all operations are strictly ordered and reliably committed at all replicas before clients receiving responses.

Joins

Not Supported

HyperDex does not support joins. Since different tables are mapped to different hyperspaces, records from different tables cannot be combined in HyperDex.

Logging

Not Supported

HyperDex does not support logging. The storage layer HyperDisk is itself log-structured, so there is no need for extra logging. To provide fault tolerance, it employs replicas to store one piece of data.

Query Execution

Materialized Model

Client first maps a query into hyperspace and determines which servers’ regions intersect the resulting hyperplane. It then sends the request to those servers and collects matching results if necessary.

Query Interface

Custom API

HyperDex provides an imperative API. It supports two kinds of queries: 1) basic query to insert, retrieve, update and delete objects identified by keys. 2) search query to retrieve the objects whose attributes match user-specified range (any subset of attributes). HyperDex contains full client bindings for C, C++, and Python, and partial bindings for Java, Node.JS and Ruby.

Storage Architecture

Disk-oriented

HyperDex does not explicitly state the storage architecture. However, the paper does imply disk-oriented architecture. HyperDex has embeddable storage layer called HyperDisk, which recursively apply hyperspace hashing technique in storage. A region mapped to a server is further partitioned into non-overlapping subregions and objects in each sub-region is stored in a file on disk.

Storage Model

Custom

As mentioned in index, HyperDex applies hyperspace hashing to store data. It regards tables as multi-dimensional Euclidean spaces and each dimension corresponds to one attribute. An object is mapped to a position in the hyperspace by hashing every attribute value to a coordinate on the corresponding axis. Each server is assigned a region in the same hyperspace and is responsible to store the data fall within its region. Specifically, the hyperspace is divided into hyperrectangular regions and a fault-tolerance coordinator is responsible for mapping servers to regions. Since tables have different sets of attributes, HyperDex maintains multiple independent hyperspaces. Moreover, to address the problem that the hyperspace expands exponentially as the number of attributes increases. HyperDex partitions tables with many attributes into multiple smaller subspaces of fewer dimensions. Hyperspace hashing does not distinguish between the key and other attributes. However, to support efficient key-based operation, HyperDex has a one-dimension subspace dedicated to the key.

Storage Organization

Log-structured

As mentioned in Storage Architecture, records are stored in files on disk. Files are log-structured. Updates such as insertion are recorded on the tail and search operation linearly scan the whole file.

System Architecture

Shared-Nothing

HyperDex supports distributed shared-nothing architecture. HyperDex contains three components: the coordinator, clients, and storage servers. The coordinator is responsible for maintaining state for the system, such as region-server mapping, server failures, the latest version number. Though it is logically centralized, it is implemented as a replicated machine. Clients send requests directly to corresponding servers. Servers are deployed on cluster and each server stores a piece of data. All messages are sent through TCP.

Revision #8 | Updated 06/26/2022 11:08 p.m.

HyperDex

History

Checkpoints

Concurrency Control

Data Model

Foreign Keys

Indexes

Isolation Levels

Joins

Logging

Query Execution

Query Interface

Storage Architecture

Storage Model

Storage Organization

System Architecture

People Also Viewed

Website

Source Code

Tech Docs

Twitter

Developer

Country of Origin

Start Year

End Year

Project Type

Written in

Supported languages

Operating Systems

Licenses

People Also Viewed