Geode

Apache Geode is an open-source, in-memory distributed database designed for transactional applications that demand low latency and high concurrency.

Isolation Levels

Read Uncommitted, Repeatable Read

Geode supports repeatable read isolation. By default, transactions are isolated at the thread level, so other threads cannot see the changes made by a transaction's thread until the commit begins. Once the commit begins, however, the transaction's partial results become visible in the cache to any thread that accesses the data being changed, which can lead to dirty reads.

Users can prevent dirty reads and obtain a stricter isolation model by setting the JVM system property "-Dgemfire.detectReadConflicts=true".

Storage Architecture

In-Memory

Geode follows the in-memory data grid (IMDG) model, but it also has a disk store module for data overflow and persistence. With a disk store, users can overflow data to disk when memory usage becomes too high, or persist data to disk as a backup copy.

Query Interface

Custom API

Geode supports Object Query Language (OQL) for querying region data. Although OQL and SQL share many syntactic similarities, they differ significantly. For example, OQL doesn't support aggregation functions, it supports querying complex object graphs, and by default an OQL query runs against the values of a region rather than its keys.

Geode also supports query index hints, which ask the query engine to use the specified index, by adding a hint such as "<HINT 'indexName'>" at the beginning of the select query.
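As a small illustration of where the hint goes, the sketch below builds a hinted OQL string with plain string handling. The helper name `withIndexHint`, the index name `ownerIdx`, and the `/Portfolios` region are hypothetical and exist only for this example; the helper does not talk to a Geode cluster.

```java
/** Hypothetical helper for this example only; it just builds the query text. */
final class OqlHintSketch {
    private OqlHintSketch() {}

    /** Prefixes a select query with an index hint naming the index to prefer. */
    static String withIndexHint(String indexName, String selectQuery) {
        return "<HINT '" + indexName + "'> " + selectQuery;
    }
}
```

For instance, `OqlHintSketch.withIndexHint("ownerIdx", "SELECT * FROM /Portfolios p WHERE p.owner = 'XYZ'")` yields a query string that starts with the hint and could then be passed to Geode's query service.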

Foreign Keys

Not Supported

Although Geode doesn't support foreign keys, it provides a related feature called data colocation, which stores related entries from different data regions that share the same ID on a single member. For example, a Geode system might contain a customer records region and a customer orders region that are related to each other through the customer.

By using colocation, users can keep all record and order information for a customer in the cache of a single member, so operations that concern only this customer are served by that member.
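The idea behind colocation can be sketched in plain Java: if both regions derive their routing from the customer ID, a customer and that customer's orders always land on the same member. This is a simplified model, not Geode's implementation; the class names, the bucket count, and the bucket-to-member mapping are assumptions made for the illustration.

```java
import java.util.Objects;

/** Simplified routing model; the names and the mapping scheme are assumptions. */
final class ColocationSketch {
    static final int BUCKETS = 113; // assumed bucket count for this sketch

    /** Key of the orders region; it carries the customer ID used for routing. */
    static final class OrderKey {
        final String customerId;
        final int orderNumber;

        OrderKey(String customerId, int orderNumber) {
            this.customerId = customerId;
            this.orderNumber = orderNumber;
        }
    }

    private ColocationSketch() {}

    /** Maps a routing object to a bucket; floorMod keeps the result non-negative. */
    static int bucketFor(Object routingObject) {
        return Math.floorMod(Objects.hashCode(routingObject), BUCKETS);
    }

    /** Customers are routed by their own ID. */
    static int memberForCustomer(String customerId, int memberCount) {
        return bucketFor(customerId) % memberCount;
    }

    /** Orders are routed by the customer ID only, never by the order number. */
    static int memberForOrder(OrderKey key, int memberCount) {
        return bucketFor(key.customerId) % memberCount;
    }
}
```

Because `memberForOrder` ignores the order number, every order for a given customer resolves to the member that also holds the customer record.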

System Architecture

Shared-Nothing

Geode uses a shared-nothing architecture to achieve low latency and high concurrency.

Compression

Bit Packing / Mostly Encoding

Geode provides a Snappy compressor by default, and users can supply a custom compressor on a per-region basis. When compression is enabled for a region, all of its values are compressed in memory and are decompressed when they are read from the cache.
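The compress-on-write, decompress-on-read behavior can be sketched with the JDK alone. The `CompressorSketch` interface below is a local stand-in with a compress/decompress pair, and the JDK's Deflate codec stands in for Snappy because Snappy is not part of the standard library; all class names here are assumptions for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Local stand-in for a pluggable compressor; not Geode's own interface. */
interface CompressorSketch {
    byte[] compress(byte[] input);
    byte[] decompress(byte[] input);
}

/** Deflate-based compressor used here in place of Snappy. */
final class DeflateCompressor implements CompressorSketch {
    @Override
    public byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    @Override
    public byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated compressed value");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed value", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}

/** Region model that keeps only compressed bytes in memory. */
final class CompressedRegionSketch {
    private final Map<String, byte[]> store = new HashMap<>();
    private final CompressorSketch compressor;

    CompressedRegionSketch(CompressorSketch compressor) {
        this.compressor = compressor;
    }

    /** Values are compressed before they are stored. */
    void put(String key, String value) {
        store.put(key, compressor.compress(value.getBytes(StandardCharsets.UTF_8)));
    }

    /** Values are decompressed each time they are read. */
    String get(String key) {
        byte[] compressed = store.get(key);
        return compressed == null
            ? null
            : new String(compressor.decompress(compressed), StandardCharsets.UTF_8);
    }

    /** Number of bytes actually held in memory for a key, or -1 if absent. */
    int storedSize(String key) {
        byte[] compressed = store.get(key);
        return compressed == null ? -1 : compressed.length;
    }
}
```

The trade-off this models is the usual one: less memory per value in exchange for CPU work on every read and write.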

Data Model

Key/Value

In a Geode distributed system, a cache is an abstraction that describes a node of the in-memory data storage. Each cache contains regions, in which data is stored as key-value pairs. Caches are therefore analogous to databases, and regions to tables, in a relational database.
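The cache/region/entry hierarchy can be pictured as a map of named maps. The sketch below is a deliberately minimal model of that nesting in plain Java; `CacheSketch` and its methods are hypothetical names and do not mirror Geode's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Minimal model: a cache holds named regions, a region holds key-value pairs. */
final class CacheSketch {
    private final Map<String, Map<String, Object>> regions = new ConcurrentHashMap<>();

    /** Creates the region if it does not exist yet and returns it. */
    Map<String, Object> createRegion(String name) {
        return regions.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
    }

    /** Returns the named region, or null if it was never created. */
    Map<String, Object> getRegion(String name) {
        return regions.get(name);
    }
}
```

In the relational analogy, `CacheSketch` plays the role of the database and each map returned by `createRegion` plays the role of a table.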

Concurrency Control

Timestamp Ordering

For a partitioned region, all updates to a specific key are routed to the primary Geode member for that key, which applies them serially.

For a replicated region, any member that hosts the region can update a key and propagate the update to the other members. Consistency is maintained by storing, in each entry of the region, a version stamp and the ID of the Geode member that last updated the entry. The version stamp is incremented each time the entry is updated. When a member or client receives an update message, it compares the version stamp in the message with the one in its local cache. If the message's version stamp is larger, the member or client applies the update and records the new version stamp information. If the two version stamps are equal, the update from the member with the higher member ID is kept. Concurrency control for non-replicated regions and client caches works in a similar way.
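The comparison rule for incoming updates can be written down as a short sketch. It models only the rule described above (a higher version stamp wins, and a tie goes to the higher member ID); the class names and the use of plain integers for member IDs are assumptions, and Geode's internal data structures are more involved.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified versioned entry; not one of Geode's internal classes. */
final class VersionedEntry {
    final String value;
    final long version;  // incremented on every update to the entry
    final int memberId;  // member that last updated the entry

    VersionedEntry(String value, long version, int memberId) {
        this.value = value;
        this.version = version;
        this.memberId = memberId;
    }
}

/** One member's local copy of a replicated region. */
final class ReplicaSketch {
    private final Map<String, VersionedEntry> entries = new HashMap<>();

    /** Applies an incoming update only if it wins the comparison; returns true if applied. */
    boolean receive(String key, VersionedEntry incoming) {
        VersionedEntry local = entries.get(key);
        boolean wins = local == null
            || incoming.version > local.version
            || (incoming.version == local.version && incoming.memberId > local.memberId);
        if (wins) {
            entries.put(key, incoming);
        }
        return wins;
    }

    /** Returns the current value for a key, or null if there is none. */
    String get(String key) {
        VersionedEntry entry = entries.get(key);
        return entry == null ? null : entry.value;
    }
}
```

Because every member applies the same deterministic rule, two members that receive the same set of updates in different orders end up with the same entry.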