ArangoDB is a multi-model, mostly-memory database. It supports key-value, document, and graph stores, all using the JSON data format. ArangoDB writes all data to persistent storage to provide durability. However, to use ArangoDB efficiently, the frequently accessed pages (the working set) should fit into main memory. Unlike most NoSQL databases, ArangoDB also supports join operations and lets users choose between multi-collection transactions for ACID properties and standard single-document transactions for better performance.
The motivation behind ArangoDB is to combine the most common uses of NoSQL databases. Other NoSQL databases that also use the JSON data format, such as MongoDB for documents and Neo4j for graphs, support only a single data model. ArangoDB combines these use cases into an "all-in-one" database so that users do not need a second database for a different type of data. ArangoDB has been ready for production use since version 1.0, released in spring 2012. It was originally named AvocadoDB; the name was changed to ArangoDB in May 2012 to avoid legal issues.
ArangoDB generally has no views. However, ArangoSearch, a search engine designed specifically for document search, adopts an idea similar to materialized views in SQL databases. By using pre-processed document information, it can reduce the complexity of execution plans and allows fuzzy search.
The default MMFiles engine supports serializable isolation. In each transaction, users must declare in advance the collections they need to access, and all of these collections are locked at the beginning of the transaction to prevent others from modifying them at the same time. Within these collections, it is guaranteed that there are no reads of uncommitted changes, no unrepeatable reads, and no phantoms. The other storage option, the RocksDB engine, only disallows write-write conflicts. Therefore, when two transactions read and write the same set of collections at the same time, it is possible to read uncommitted changes.
Two-Phase Locking (Deadlock Detection)
The user needs to specify which collections a transaction reads or writes. ArangoDB first acquires all the locks in lexicographical order of the collection names at the beginning of each transaction, and releases them in reverse order after the transaction finishes. If a deadlock occurs, ArangoDB automatically aborts one of the transactions, rolls back its changes, and throws an error to the client.
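The acquire-in-order, release-in-reverse discipline described above can be illustrated with a minimal Python sketch. This is not ArangoDB's implementation: the `collection_locks` registry, the `lock_collections` helper, and the collection names are hypothetical, and deadlock detection and rollback are not modeled.

```python
import threading
from contextlib import contextmanager

# Hypothetical registry with one lock per collection name (illustrative only).
collection_locks = {
    name: threading.Lock() for name in ("accounts", "orders", "users")
}

@contextmanager
def lock_collections(names, log=None):
    """Acquire locks in lexicographical order, release them in reverse order."""
    ordered = sorted(set(names))
    acquired = []
    try:
        for name in ordered:
            collection_locks[name].acquire()
            acquired.append(name)
            if log is not None:
                log.append(("acquire", name))
        yield ordered
    finally:
        # Release only what was actually acquired, newest first.
        for name in reversed(acquired):
            collection_locks[name].release()
            if log is not None:
                log.append(("release", name))

events = []
with lock_collections(["users", "accounts"], log=events):
    pass  # the transaction body would run here
```

In this sketch the order in which the caller lists the collections does not matter, because the helper sorts the names before locking.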
Each query first goes to a query optimizer, which generates one or more possible plans according to the current data model and estimates the cost of each. Only the plan with the lowest cost is returned. The output plan is executed in a pipelined manner on execution nodes. Each node receives a job from its parent, then divides it and distributes it to its child nodes. All results from the children are then aggregated and returned to the current node's parent.
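As a rough illustration of cost-based plan selection followed by pipelined execution, the Python sketch below picks the cheapest of two candidate plans and streams rows through its stages using generators. The `Plan` class, the stage functions, and the cost numbers are all invented for the example; they do not reflect ArangoDB's actual optimizer or cost model.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List

@dataclass
class Plan:
    name: str
    estimated_cost: float  # made-up cost estimate for illustration
    stages: List[Callable[[Iterator[dict]], Iterator[dict]]]

def choose_plan(plans):
    """Return the candidate plan with the lowest estimated cost."""
    return min(plans, key=lambda p: p.estimated_cost)

def execute(plan, rows):
    """Chain the stages so each one pulls rows lazily from the previous one."""
    stream = iter(rows)
    for stage in plan.stages:
        stream = stage(stream)
    return list(stream)

def keep_adults(rows):
    for row in rows:
        if row["age"] >= 18:
            yield row

def project_name(rows):
    for row in rows:
        yield {"name": row["name"]}

plans = [
    Plan("full scan then filter", 120.0, [keep_adults, project_name]),
    Plan("index lookup", 15.0, [keep_adults, project_name]),
]
best = choose_plan(plans)
result = execute(best, [{"name": "ann", "age": 30}, {"name": "bob", "age": 12}])
```

Because each stage is a generator, a row flows through the whole pipeline before the next row is read, rather than each stage materializing its full output first.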
ArangoDB allows users to define their own User Defined Functions (UDFs). Users can also use the Foxx microservice framework to build their own logic into a microservice that runs inside the database and can access the data it needs. This provides the same functionality as stored procedures.
By default, the index of a key-value or document store is a hash index on its primary key. Users can also specify other indexes, including skip list, fulltext, persistent, and geo-spatial indexes. The graph store adopts a different strategy: it uses a hybrid index that combines a hash index with a doubly linked list to handle graph operations more efficiently.
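The text does not spell out how the hybrid graph index is laid out, so the following Python sketch shows only one plausible reading: a hash table keyed by source vertex, where each entry points to a doubly linked list of that vertex's outgoing edges. The `EdgeNode` and `EdgeIndex` names are hypothetical, and this is not ArangoDB's actual data structure.

```python
class EdgeNode:
    """One edge, linked to the other edges that share the same source vertex."""
    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.prev = None
        self.next = None

class EdgeIndex:
    def __init__(self):
        self.heads = {}  # hash index: source vertex -> first edge node

    def add(self, source, target):
        """Insert a new edge at the front of the source vertex's list."""
        node = EdgeNode(source, target)
        head = self.heads.get(source)
        if head is not None:
            node.next = head
            head.prev = node
        self.heads[source] = node
        return node

    def remove(self, node):
        """Unlink an edge in constant time using its prev/next pointers."""
        if node.prev is not None:
            node.prev.next = node.next
        elif node.next is not None:
            self.heads[node.source] = node.next
        else:
            del self.heads[node.source]
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None

    def targets(self, source):
        """List the targets of all edges leaving the given vertex."""
        out = []
        node = self.heads.get(source)
        while node is not None:
            out.append(node.target)
            node = node.next
        return out

index = EdgeIndex()
e1 = index.add("a", "b")
e2 = index.add("a", "c")
e3 = index.add("a", "d")
index.remove(e2)
```

Under this reading, the hash lookup finds a vertex's edges without scanning the collection, and the double links let an edge be removed without walking the list.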
Key/Value, Document / XML, Graph
In ArangoDB, a document collection always has a primary key. Therefore, without any secondary index, it behaves just like a key-value store. More generally, a collection can have multiple attributes and multiple secondary indexes, in which case it behaves like a common document store. By default, the sharding key is the same as the primary key. This helps partition similar data to the same shard so that queries can be processed efficiently and the system scales closer to linearly. Besides key-value and document stores, ArangoDB also supports a graph store, with operations including traversal (e.g., breadth-first search, depth-first search) and shortest path, among others.
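To make the graph operations mentioned above concrete, here is a minimal Python sketch of a breadth-first search that returns a shortest path (fewest edges) in an unweighted graph. The adjacency dictionary and the `shortest_path` function are illustrative assumptions; they do not show how ArangoDB stores graphs or runs traversals.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of vertices or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph.get(vertex, []):
            if neighbor in parents:
                continue
            parents[neighbor] = vertex
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Hypothetical directed graph as an adjacency dictionary.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
path = shortest_path(graph, "a", "e")
```

Breadth-first search visits vertices in order of distance from the start, which is why the first time the goal is reached the path found uses the fewest edges.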
ArangoDB is a mostly-memory database, which means the working set needs to fit into main memory for it to perform well. The whole dataset is stored on disk to avoid data loss. There are two storage engines available. The default one is MMFiles, which is based on memory-mapped files. The other option is RocksDB.
https://github.com/arangodb/arangodb
ArangoDB GmbH
2011
AvocadoDB