Geode

Apache Geode is an open-source, in-memory distributed data grid that scales out horizontally across nodes.

History

Geode was first released commercially in 2002 by GemStone Systems under the name GemFire, and its early deployments were in Wall Street trading platforms. VMware acquired GemStone Systems in 2010. In 2013, Pivotal spun out of VMware and took GemFire with it. In 2015, Pivotal submitted GemFire to the Apache Incubator under the name Geode, and the project graduated to a top-level Apache project in 2016.

Data Model

Key/Value

In a Geode distributed system, a cache is the abstraction that describes a node's in-memory storage for data. Each cache contains regions, in which data is stored as key-value pairs. A cache is therefore loosely analogous to a database, and a region to a table in a relational database.
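The cache/region/entry hierarchy described above can be pictured with a small toy model. This is a sketch in plain Python, not the Geode API: the class names `Cache` and `Region`, the method names, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Region:
    """A named collection of key-value pairs (roughly a table)."""
    name: str
    entries: Dict[Any, Any] = field(default_factory=dict)

    def put(self, key, value):
        self.entries[key] = value

    def get(self, key, default=None):
        return self.entries.get(key, default)


@dataclass
class Cache:
    """One node's in-memory store holding several regions (roughly a database)."""
    regions: Dict[str, Region] = field(default_factory=dict)

    def create_region(self, name: str) -> Region:
        # Return the existing region if the name is already taken,
        # so region names stay unique within a cache.
        return self.regions.setdefault(name, Region(name))


# Hypothetical usage: one cache, one region, one entry.
cache = Cache()
customers = cache.create_region("customers")
customers.put(42, {"name": "Ada"})
```

A lookup then goes cache, then region, then key, much as a relational lookup goes database, then table, then primary key.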

Query Interface

Custom API

Geode supports the Object Query Language (OQL) for querying region data. Although OQL and SQL share much of their syntax, they differ in several ways: OQL doesn't support aggregation functions, it can query complex object graphs and object attributes, and by default it queries the values of a region rather than the keys.

The syntax of an OQL query is: [IMPORT package] SELECT [DISTINCT] projectionList FROM collection1, [collection2, …] [WHERE clause] [ORDER BY order_criteria [desc]]
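To make the template concrete, here is a minimal sketch that assembles a query string clause by clause in the order the template gives (the optional IMPORT clause is left out). The helper `build_oql`, its parameters, and the region path `/customers` with alias `c` are hypothetical. The helper only builds text and does not talk to a Geode cluster.

```python
from typing import Optional, Sequence


def build_oql(projection: str,
              collections: Sequence[str],
              where: Optional[str] = None,
              order_by: Optional[str] = None,
              distinct: bool = False,
              descending: bool = False) -> str:
    """Assemble a query string following the SELECT template above."""
    if not collections:
        raise ValueError("at least one collection is required")
    parts = ["SELECT"]
    if distinct:
        parts.append("DISTINCT")
    parts.append(projection)
    parts.append("FROM " + ", ".join(collections))
    if where:
        parts.append("WHERE " + where)
    if order_by:
        parts.append("ORDER BY " + order_by + (" desc" if descending else ""))
    return " ".join(parts)


# Hypothetical region and attribute names.
query = build_oql("c.name", ["/customers c"], where="c.age > 30",
                  order_by="c.name", distinct=True)
```

Concatenating raw strings like this is acceptable for illustration only. Values that come from users should not be spliced into the WHERE clause directly.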

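Geode also supports query index hints, which direct the query engine to filter on the specified index. Here is an example, where indexName stands for the name of an existing index: <HINT 'indexName'> SELECT projectionList FROM collection1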

Isolation Levels

Read Uncommitted, Repeatable Read

Geode supports repeatable reads, but its default allows for dirty reads. [TBD]

Storage Architecture

In-Memory

Geode follows the in-memory data grid (IMDG) model, but it also has a disk store module that handles data overflow and persistence. With a disk store, users can move data to disk when memory usage becomes too high, or persist data to disk as a backup copy.
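The overflow behavior can be illustrated with a toy region that holds a bounded number of entries in memory and moves the least recently used ones to a stand-in "disk". This is only a sketch of the idea: the class `OverflowRegion`, the LRU policy, and the use of a plain dictionary in place of real files are all assumptions, not Geode's actual eviction or disk store implementation.

```python
from collections import OrderedDict


class OverflowRegion:
    """Keeps at most `capacity` entries in memory; older ones move to `disk`."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.memory = OrderedDict()  # least recently used entry first
        self.disk = {}               # stand-in for a disk store

    def put(self, key, value):
        self.disk.pop(key, None)     # a new value supersedes any overflowed copy
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._overflow()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.disk:
            # Fault the entry back into memory on access.
            value = self.disk.pop(key)
            self.memory[key] = value
            self._overflow()
            return value
        raise KeyError(key)

    def _overflow(self):
        # Move least recently used entries to disk until memory fits.
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self.disk[old_key] = old_value


region = OverflowRegion(capacity=2)
for k in ("a", "b", "c"):
    region.put(k, k.upper())
```

After the three puts, "a" has overflowed to the stand-in disk while "b" and "c" remain in memory. Reading "a" brings it back and pushes out the next least recently used entry.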

Foreign Keys

Not Supported

Although Geode doesn't support foreign keys, it provides a related feature called data colocation, which stores related entries from different regions that share the same ID on a single member. For example, suppose a Geode system contains a customer records region and a customer orders region, related to each other through the customer. With colocation, all records and orders for a given customer are kept in the cache of a single member, and every operation that concerns only that customer is served by that member.
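The customer and orders example can be sketched as a routing rule: both regions pick the hosting member from the customer ID alone, so a customer's record and orders always land together. The member names, the `member_for` function, and the CRC32-based hashing are assumptions chosen for illustration. They do not reflect how Geode assigns buckets internally.

```python
import zlib

MEMBERS = ["member-0", "member-1", "member-2"]  # hypothetical cluster members


def member_for(customer_id: str) -> str:
    """Route by customer ID only, so related entries share a member."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return MEMBERS[zlib.crc32(customer_id.encode("utf-8")) % len(MEMBERS)]


# storage[member][region][key] -> value
storage = {m: {"customers": {}, "orders": {}} for m in MEMBERS}


def put_customer(customer_id, record):
    storage[member_for(customer_id)]["customers"][customer_id] = record


def put_order(order_id, customer_id, order):
    # The order is placed by its customer's ID, not by its own order ID.
    storage[member_for(customer_id)]["orders"][(customer_id, order_id)] = order


put_customer("cust-7", {"name": "Ada"})
put_order("ord-1", "cust-7", {"total": 30})
put_order("ord-2", "cust-7", {"total": 12})
home = member_for("cust-7")
```

Because everything for "cust-7" lives on `home`, an operation that touches only this customer never needs data from another member.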

Concurrency Control

Timestamp Ordering

Any modification to a cache is replicated and checked for consistency before success is acknowledged. If modifications are concurrent, the conflicting changes are resolved according to their timestamps. [TBD]
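One simple way to picture timestamp-based resolution is a last-writer-wins rule: a replica applies an incoming update only if its timestamp is newer than the one already stored. The sketch below assumes that rule. The `Replica` and `Versioned` names are hypothetical, and rejecting an update with an equal timestamp is an arbitrary tie-breaking choice made for this example rather than documented Geode behavior.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Versioned:
    value: Any
    timestamp: float


class Replica:
    def __init__(self):
        self.entries: Dict[Any, Versioned] = {}

    def apply(self, key, value, timestamp) -> bool:
        """Apply an update unless an equally new or newer one is stored."""
        current = self.entries.get(key)
        if current is not None and current.timestamp >= timestamp:
            return False  # stale update: keep the stored value
        self.entries[key] = Versioned(value, timestamp)
        return True


replica = Replica()
replica.apply("k", "written-second", timestamp=2.0)
# An older concurrent write arrives late and is discarded.
accepted = replica.apply("k", "written-first", timestamp=1.0)
```

Whatever order the two updates arrive in, every replica that follows this rule ends up holding the value with the larger timestamp.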