GrapheekDB is a lightweight graph database with support for multiple back-end storage managers. It represents only directed graphs and is persistent when the chosen storage backend is a key/value store.
GrapheekDB was developed in 2014 by Raphaël Braud, a freelance developer from France. It was built for a recommendation system that extracts and tokenizes the contents of documents and recommends similar documents based on user queries. A graph database was chosen over a relational database to improve performance by avoiding multiple joins on tables of several million rows. It was built for the specific purpose of recommending documents and has a Python-like API (close to Django and Gremlin).
The Naive Page Rank compression algorithm is listed as one of the to-do items in the source code but is not yet supported.
Two-Phase Locking (Deadlock Prevention)
GrapheekDB supports a pessimistic, lock-based concurrency protocol. Transactions are only allowed to take exclusive locks on data items. Following a graph-based implementation, a transaction T is only allowed to explicitly lock a data item Q if the parent of Q is currently locked by T. Like Two-Phase Locking, the protocol produces conflict-serializable schedules. It is also deadlock-free, but it is susceptible to cascading rollbacks.
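The parent-lock rule described above can be sketched in a few lines of Python. This is a minimal illustration, not GrapheekDB's code: the class `TreeLockManager` and its methods are hypothetical names, and two further rules are assumed from the textbook tree protocol rather than taken from the source (a transaction's first lock may be on any item, and an item that was unlocked may not be locked again by the same transaction).

```python
class TreeLockManager:
    """Exclusive locks over a tree of data items (illustrative sketch only)."""

    def __init__(self, parent):
        self.parent = parent      # item -> parent item, None for the root
        self.owner = {}           # item -> transaction holding its exclusive lock
        self.started = set()      # transactions that have taken their first lock
        self.released = {}        # transaction -> items it has already unlocked

    def lock(self, txn, item):
        """Return True if granted, False if txn must wait; raise on a rule violation."""
        # Assumed rule: an item unlocked by txn may not be relocked by txn.
        if item in self.released.get(txn, ()):
            raise PermissionError(f"{txn} may not relock {item}")
        # Rule from the text: after the first lock, the parent must be held by txn.
        if txn in self.started:
            parent = self.parent.get(item)
            if parent is None or self.owner.get(parent) != txn:
                raise PermissionError(f"{txn} does not hold the parent of {item}")
        holder = self.owner.get(item)
        if holder is not None and holder != txn:
            return False          # exclusive lock held elsewhere, caller waits
        self.owner[item] = txn
        self.started.add(txn)
        return True

    def unlock(self, txn, item):
        """Release a lock; unlike Two-Phase Locking this may happen at any time."""
        if self.owner.get(item) != txn:
            raise PermissionError(f"{txn} does not hold {item}")
        del self.owner[item]
        self.released.setdefault(txn, set()).add(item)


# A chain A -> B -> C: T1 walks down the tree, releasing A early.
tree = {"A": None, "B": "A", "C": "B"}
locks = TreeLockManager(tree)
locks.lock("T1", "A")
locks.lock("T1", "B")
locks.unlock("T1", "A")   # early release is allowed
locks.lock("T1", "C")     # allowed because T1 still holds B, the parent of C
```

Because every transaction acquires locks in root-to-leaf order, no cycle of waiting transactions can form. The early release of A is also what makes cascading rollbacks possible: another transaction may read what T1 wrote under A before T1 finishes.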
The DBMS is a multi-model document store. Presently it can act as either a graph store or a key/value store (KVS). It supports several KVS backends, such as Kyoto Cabinet and Symas LMDB. If a KVS backend is used, the stored objects become persistent. There are no strict assertions on data modelling.
A graph database is index-free in the sense that each element holds direct pointers to its adjacent elements (a property known as index-free adjacency), so GrapheekDB does not need an index to move from a node or edge to its neighbours. However, the latest version of the DBMS does support node and edge indices for lookups on sparse graphs. The current version only supports "exact match indices" and performs a Depth-First-Search (DFS) in order to match indices. Storing the indices adds storage overhead and slows down writes in the DBMS.
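The trade-off of an exact-match index can be illustrated with a small dictionary-based sketch. It does not reproduce how GrapheekDB stores or searches its indices; `ExactMatchIndex`, `scan` and the sample data are hypothetical and only show why lookups get cheaper while every write has extra work to do.

```python
from collections import defaultdict


class ExactMatchIndex:
    """Maps one property value to the ids of the nodes carrying it (sketch)."""

    def __init__(self, field):
        self.field = field
        self.entries = defaultdict(set)   # property value -> set of node ids

    def add(self, node_id, data):
        # Extra work on every write: the index must be kept in sync.
        if self.field in data:
            self.entries[data[self.field]].add(node_id)

    def remove(self, node_id, data):
        if self.field in data:
            self.entries[data[self.field]].discard(node_id)

    def lookup(self, value):
        # Exact match only: no range, prefix or substring queries.
        return set(self.entries.get(value, ()))


def scan(nodes, field, value):
    """Unindexed lookup: visits every node."""
    return {node_id for node_id, data in nodes.items() if data.get(field) == value}


nodes = {
    1: {"kind": "document", "lang": "fr"},
    2: {"kind": "document", "lang": "en"},
    3: {"kind": "user"},
}
lang_index = ExactMatchIndex("lang")
for node_id, data in nodes.items():
    lang_index.add(node_id, data)

french = lang_index.lookup("fr")   # same result as scan(nodes, "lang", "fr")
```

The index holds a second copy of every indexed value, which is the storage overhead, and `add` or `remove` must run on each write, which is the write slowdown.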
The DBMS was built with serializable execution in mind. This avoids loading the entire data set into memory every time the recommendation algorithm is run to produce the list of documents for a user query.
A graph database does not need join operations, which are expensive, because relationships are stored directly as edges and followed by traversal.
Almost every query in the DBMS, including collections and aggregations, is implemented via Python iterators referred to as "entity iterators". The term "entity" refers to the property of the objects in the database used to generate recommendations. For example, an object "book" is an entity if the DBMS is recommending a list of books to read based on a user's query for a book.
The query interface is close to Gremlin and the Django frontend. The DBMS has methods for lookups on graphs that resemble Django lookups, and methods for path traversals over inner and outer vertices and edges that resemble Gremlin traversal methods. The DBMS also has aliasing and collecting methods, as well as aggregation methods such as count and sum, which are implemented using Python entity iterators.
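To make the iterator-based design concrete, the sketch below builds a tiny chainable iterator with Django-style keyword lookups, a Gremlin-style outgoing traversal, and `count` and `sum` aggregations. All names here (`Graph`, `NodeIterator`, `V`, `out`, `matches`) and the sample data are hypothetical and chosen for illustration; they are not claimed to match GrapheekDB's actual API.

```python
class Graph:
    def __init__(self):
        self.nodes = {}   # node id -> dict of properties
        self.edges = {}   # node id -> list of target node ids (directed)

    def add_node(self, node_id, **data):
        self.nodes[node_id] = data
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, target):
        self.edges[source].append(target)

    def V(self, **lookups):
        """Start a query over all nodes, optionally filtered by lookups."""
        return NodeIterator(self, iter(self.nodes)).filter(**lookups)


def matches(data, key, expected):
    """Evaluate one Django-style lookup such as title__contains='graph'."""
    field, _, op = key.partition("__")
    if field not in data:
        return False
    value = data[field]
    if op in ("", "exact"):
        return value == expected
    if op == "contains":
        return expected in value
    if op == "gt":
        return value > expected
    raise ValueError(f"unsupported lookup: {op}")


class NodeIterator:
    """Lazy, chainable iterator over node ids; nothing runs until consumed."""

    def __init__(self, graph, ids):
        self.graph = graph
        self.ids = ids

    def __iter__(self):
        return iter(self.ids)

    def filter(self, **lookups):
        nodes = self.graph.nodes
        kept = (i for i in self.ids
                if all(matches(nodes[i], k, v) for k, v in lookups.items()))
        return NodeIterator(self.graph, kept)

    def out(self):
        """Follow outgoing edges to the adjacent nodes."""
        targets = (t for i in self.ids for t in self.graph.edges[i])
        return NodeIterator(self.graph, targets)

    def count(self):
        return sum(1 for _ in self.ids)

    def sum(self, field):
        return sum(self.graph.nodes[i][field] for i in self.ids)


g = Graph()
g.add_node("u1", kind="user", name="reader")
g.add_node("d1", kind="document", title="graph databases", words=1200)
g.add_node("d2", kind="document", title="relational joins", words=800)
g.add_edge("u1", "d1")
g.add_edge("u1", "d2")

liked_graph_docs = g.V(kind="user").out().filter(title__contains="graph").count()
total_words = g.V(kind="document").sum("words")
```

Each step wraps a generator around the previous one, so a chain such as `V(...).out().filter(...)` only touches the nodes it needs when `count` or `sum` finally consumes it, and the traversal through `out` replaces what would be a join in a relational schema.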
The DBMS stores the graph in memory.
GrapheekDB is a multi-model document store. The nodes and edges can have related data, but this is not enforced. The database is schemaless.
The database uses a client-server model and communicates over TCP on port 5555. The database lacks an authentication mechanism between the client and the server. It can be used as a pure in-memory database but is targeted to be used with persistent backends such as Kyoto Cabinet or LMDB.