RavenDB is a fully transactional, ACID-compliant NoSQL database developed by Hibernating Rhinos. RavenDB is designed to be deployed and operated as a distributed cluster, combining both on-premise and cloud deployments. RavenDB also provides auto-indexing capabilities alongside full-text search features to speed up queries. RavenDB provides a range of clients out of the box, storage-layer encryption, HTTP network support, and the Raven Query Language (with similarities to SQL) for database queries.
RavenDB was originally started primarily as an independent project by Ayende Rahien and was originally named Rhino Divan DB before eventually being renamed RavenDB. According to Ayende, no document database had been built for the .NET ecosystem, with most adding support as an afterthought. After reading the CouchDB source, Ayende designed RavenDB with a focus on providing a REST-based document store that solves two separate problems: (1) eliminating places of friction when using CouchDB and (2) building a document database for the .NET ecosystem.
Documents are stored compressed; however, the specific compression algorithm used is not discussed. RavenDB 3.x used naive compression, in which the entire document is stored in compressed form. In RavenDB 4.0 and onwards, individual fields within each document are selectively compressed, allowing the engine to work with certain parts of the document without first decompressing the whole document.
Optimistic Concurrency Control (OCC)
By default, RavenDB does not employ a concurrency control scheme, instead resolving concurrent changes to the same document using the "Last Writer Wins" conflict resolution strategy.
RavenDB allows fine-grained control over which requests are executed with optimistic concurrency enabled: (1) at the DocumentStore level (all sessions and requests); (2) on a per-session basis; and (3) on a per-request basis. RavenDB's optimistic concurrency scheme does not check for conflicts on data that is only read by the transaction. Instead, the scheme only checks for conflicting writes and ensures ordering of transactions with regard to write order.
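The write-only check described above can be illustrated with a toy in-memory store. This is a minimal sketch, not RavenDB's client API: `ToyStore`, `ConcurrencyError`, and the integer version counter are hypothetical stand-ins for whatever version marker the server keeps per document. Passing no expected version models the default "Last Writer Wins" behavior.

```python
class ConcurrencyError(Exception):
    """Raised when a write is based on a stale version of a document."""


class ToyStore:
    """Hypothetical in-memory document store; not RavenDB's API."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, data)

    def load(self, doc_id):
        # Reads are never validated later; the version is only handed back.
        return self._docs.get(doc_id, (0, None))

    def store(self, doc_id, data, expected_version=None):
        current_version, _ = self._docs.get(doc_id, (0, None))
        # expected_version=None models the default "last writer wins" mode.
        if expected_version is not None and expected_version != current_version:
            raise ConcurrencyError(
                f"{doc_id}: expected v{expected_version}, found v{current_version}"
            )
        self._docs[doc_id] = (current_version + 1, data)
        return current_version + 1


store = ToyStore()
store.store("users/1", {"name": "Ann"})           # version becomes 1
version, _ = store.load("users/1")                # two writers both read v1
store.store("users/1", {"name": "Bob"}, version)  # first writer succeeds, v2

try:
    store.store("users/1", {"name": "Cid"}, version)  # stale: still expects v1
    conflict_detected = False
except ConcurrencyError:
    conflict_detected = True

# Without an expected version the write always succeeds (last writer wins).
final_version = store.store("users/1", {"name": "Dee"})
```

Only the write path compares versions; `load` never registers anything to validate, which mirrors the statement that read-only data is not checked for conflicts.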
RavenDB is primarily a document-based database, with the strict requirement that documents be formatted as JSON. However, RavenDB can also be used as a key/value store through the Compare Exchange interface, which performs cluster-wide interlocked distributed operations for updating key/value pairs.
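The interlocked (compare-and-swap) behavior of such key/value pairs can be sketched with a single-process model. Everything here is an assumption made for illustration: `ToyCompareExchange`, its `put` signature, and the convention that index 0 means "create only if absent" are hypothetical, and a real cluster would agree on each update through consensus rather than a local dictionary.

```python
class ToyCompareExchange:
    """Hypothetical single-process model of compare-and-swap key/value pairs."""

    def __init__(self):
        self._items = {}  # key -> (index, value)
        self._next_index = 1

    def get(self, key):
        # Returns (index, value); index 0 means the key does not exist.
        return self._items.get(key, (0, None))

    def put(self, key, value, index):
        # The write succeeds only if the caller's index matches the stored one.
        current_index, current_value = self.get(key)
        if index != current_index:
            return False, current_index, current_value
        new_index = self._next_index
        self._next_index += 1
        self._items[key] = (new_index, value)
        return True, new_index, value


cx = ToyCompareExchange()
# Reserve a unique e-mail address: index 0 means "create only if absent" here.
first = cx.put("emails/ann@example.com", "users/1", 0)
second = cx.put("emails/ann@example.com", "users/2", 0)  # loses the race
```

The second caller gets back the winning value and its index, so it can decide whether to retry against the current state or give up.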
RavenDB does not have foreign keys or constraints resembling foreign keys. In RavenDB, a document's field that refers to another document is simply a string representing the other document's ID. RavenDB provides a client API "Includes" for loading a document and all its referenced documents together in a single request.
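The idea of loading a document together with the documents it references can be sketched as follows. This is a toy model over a plain dictionary, not the actual Includes API; `load_with_includes` and the sample document IDs are hypothetical.

```python
def load_with_includes(docs, doc_id, include_paths):
    """Return the document plus every document its include_paths point at.

    docs is a plain dict standing in for the database; this is a
    hypothetical helper, not RavenDB's Includes API.
    """
    root = docs[doc_id]
    included = {}
    for path in include_paths:
        ref_id = root.get(path)
        # A reference is just a string ID; nothing enforces that it resolves.
        if isinstance(ref_id, str) and ref_id in docs:
            included[ref_id] = docs[ref_id]
    return root, included


docs = {
    "orders/1": {"company": "companies/7", "employee": "employees/3", "total": 42},
    "companies/7": {"name": "Acme"},
    # "employees/3" is deliberately missing: no foreign-key constraint exists.
}

order, included = load_with_includes(docs, "orders/1", ["company", "employee"])
```

The dangling `employees/3` reference is simply skipped, which illustrates that a reference field is an ordinary string with no constraint behind it.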
B+Tree Inverted Index (Full Text)
RavenDB indexes are either B+trees with variable-size keys and values or fixed-size B+trees where keys are Int64 and the value size is fixed at creation time. The indexes are described as part of RavenDB’s custom storage engine, Voron. As far as the documentation states, a table is composed of raw data sections and indexes, which under the hood are regular or fixed-size B+trees. In addition, RavenDB couples Lucene's indexing engine (which provides inverted indexes) with an additional Voron layer that provides ACID guarantees. Lucene's indexes and indexing engine are leveraged during query execution when performing full-text search or handling Lucene-syntax clauses.
Serializable Snapshot Isolation
RavenDB supports two different types of transactions. A cluster-wide transaction (i.e., a distributed transaction) requires consensus amongst the voting nodes of the cluster before committing. A single-node transaction only requires the transaction to succeed against that particular node, with the changes replicated to other nodes at a later time.
RavenDB supports two different isolation levels, depending on whether the transaction is cluster-wide or not. Generally, for a single-node transaction, all the operations are executed at the snapshot isolation level, with all document state retrieved as it was at the beginning of the HTTP request containing the requested operations. Concurrent cluster-wide transactions are executed at the serializable isolation level (appearing as if each transaction ran in sequential order); however, if cluster-wide and single-node transactions run concurrently, the cluster-wide transaction takes precedence over the modifications of the non-cluster transaction.
RavenDB does not support joins out of the box. Although it supports the Includes operation, Includes is not actually a join: the operation simply fetches the related document. Joins can nevertheless be emulated, either by grouping related data together during the indexing process or by using RavenDB’s projection support, which was added in RavenDB 4.x.
RavenDB logs transaction data to a Write Ahead Journal (equivalent to WAL). RavenDB employs physical logging where all modifications made by a transaction are written to the journal file using unbuffered I/O with write-through. RavenDB also provides optional encryption to the logged data for security purposes.
RavenDB does not utilize code generation of any form. Upon close inspection of the query execution pipeline from RavenDB’s raw source code, it is evident that RavenDB executes queries in an iterative manner by picking the correct index, creating an index if necessary, and then doing a lookup into the correct index.
RavenDB’s query execution is purely index-based: either the execution engine finds an index that can answer the query or it creates one that can. As such, all relevant documents are pulled into memory at the time the relevant information is retrieved from the index. This also allows server-side projections (transformers) to operate on all the visible document data, mutating it as necessary.
As a performance optimization, RavenDB also supports streaming, whereby each result is written to the network stream as soon as it is found during the index lookup. In an extended client-server interaction, this resembles a document-at-a-time processing model in which individual documents are pushed from the server to the client for processing as they become available, rather than after the entire result collection has been assembled.
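The difference between streaming and materializing a full result set can be sketched with a generator. This is a toy model: `stream_query`, the list-of-pairs index, and the sample documents are hypothetical and do not reflect RavenDB's wire protocol.

```python
def stream_query(index_entries, documents, predicate):
    """Yield matching documents one at a time instead of building a full list.

    index_entries: iterable of (doc_id, indexed_value) pairs (a toy index)
    documents: dict of doc_id -> document
    """
    for doc_id, value in index_entries:
        if predicate(value):
            # Each hit is handed to the consumer as soon as it is found.
            yield documents[doc_id]


documents = {f"users/{i}": {"id": f"users/{i}", "age": 20 + i} for i in range(5)}
age_index = [(doc_id, doc["age"]) for doc_id, doc in documents.items()]

results = stream_query(age_index, documents, lambda age: age >= 22)
first = next(results)      # available before the rest of the index is scanned
remaining = list(results)  # the consumer drains the rest at its own pace
```

Because the generator is lazy, the consumer can start processing the first document while later index entries have not been examined yet.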
RavenDB’s query language is RQL, a language that resembles SQL but is designed specifically for RavenDB. As RavenDB uses REST over HTTP for communication between client and server nodes, queries are transmitted using HTTP calls, which can exploit HTTP cache semantics for document loading (i.e., leveraging the ETag HTTP header to detect whether any documents have changed, which can reduce the amount of server-side code execution and network traffic).
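The ETag-based caching mentioned above follows the standard HTTP conditional-request pattern, which can be sketched without any network code. The handler below is a toy: `handle_get` is a hypothetical name, and deriving the ETag from a hash of the document body is an assumption of this sketch rather than a description of how the server computes it.

```python
import hashlib
import json


def handle_get(document, if_none_match=None):
    """Toy server handler: return (status, etag, body) for a document request.

    Deriving the ETag from a content hash is an assumption of this sketch.
    """
    body = json.dumps(document, sort_keys=True)
    etag = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if if_none_match == etag:
        return 304, etag, None  # Not Modified: no body is sent
    return 200, etag, body


doc = {"id": "users/1", "name": "Ann"}
status1, etag1, body1 = handle_get(doc)                   # first fetch
status2, _, body2 = handle_get(doc, if_none_match=etag1)  # cached copy is valid
doc["name"] = "Bob"
status3, etag3, _ = handle_get(doc, if_none_match=etag1)  # document changed
```

The second request costs only a header comparison and an empty response, which is where the savings in server-side work and network traffic come from.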
The query interface allows the user to specify use of a specific static index (an index created explicitly by the user) or allow the query optimizer to search for the best index. If no index could be found, the system will create and persist a new index to answer the query, also taking into consideration previous queries against that data collection. After the new index has been created, any redundant indexes (indexes which will never be picked again by the optimizer due to the new index) will be dropped from the system.
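The select-or-create behavior described above can be sketched with a toy optimizer in which an index is modeled as nothing more than a set of field names. The class name, the merging policy, and the redundancy test are all assumptions made for illustration; the real optimizer is considerably more involved.

```python
class ToyOptimizer:
    """Hypothetical index chooser; an index is modeled as a set of field names."""

    def __init__(self):
        self.indexes = {}  # collection -> list of frozenset(fields)

    def index_for(self, collection, fields):
        wanted = frozenset(fields)
        existing = self.indexes.setdefault(collection, [])
        # 1. Reuse any index that already covers every queried field.
        for index in existing:
            if wanted <= index:
                return index
        # 2. Otherwise create one covering the new and the previous queries.
        merged = wanted.union(*existing)
        # 3. Indexes that the new one supersedes are dropped.
        existing[:] = [i for i in existing if not i <= merged]
        existing.append(merged)
        return merged


opt = ToyOptimizer()
a = opt.index_for("Users", ["name"])  # no index yet: one is created
b = opt.index_for("Users", ["age"])   # creates a wider index, drops the old one
c = opt.index_for("Users", ["name"])  # answered by the wider index
```

After the second query only the merged index remains, since the narrower one would never be picked again.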
RavenDB’s storage engine, Voron, stores all data files on disk, including all raw data blocks, data block indexes, and extra indexes. Voron uses memory-mapped files to bring disk-resident data into main memory.
Indexes and raw data are stored in separate .voron files, which are individually mapped into main memory. Raw data blocks are operated on in terms of 8 KB pages (with various optimizations employed for performance such as prefetching) that support random I/O operations. RavenDB stores documents larger than 8KB using consecutive 8KB pages that are logically treated by the rest of the system as a single page. RavenDB also ensures that attachments, which are binary additions to the normal JSON data, carry the same ACID semantics as regular transactional documents do and are guaranteed to be stored on the same storage as the containing documents.
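As a worked example of the page arithmetic, the number of consecutive 8 KB pages a document occupies is a ceiling division of its size by the page size. The helper below is a sketch that ignores any per-page or per-document header overhead, which is an assumption; `pages_needed` is a hypothetical name.

```python
PAGE_SIZE = 8 * 1024  # 8 KB, the page size stated in the text


def pages_needed(document_bytes):
    """Number of consecutive 8 KB pages required to hold a document.

    Ignores any per-page or per-document header overhead (an assumption).
    """
    return max(1, -(-document_bytes // PAGE_SIZE))  # ceiling division


small = pages_needed(100)     # fits in a single page
exact = pages_needed(8192)    # exactly one page
spill = pages_needed(8193)    # one byte over: two pages
large = pages_needed(20000)   # three consecutive pages
```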
JSON documents are stored in data sections on disk by RavenDB's Voron storage manager. Small documents are grouped and stored together in sections that range from 2MB to 8MB in size. Large documents are placed independently, with the section size rounded up to the nearest page size. Under the hood and without any user control, Voron maintains "Voron indexes" (which are B+trees) that allow mapping from given document identifiers to the relevant data sections on disk.
RavenDB does not explicitly state the distributed architecture the system is built for. However, the documentation does imply shared-nothing, given the notion of distinct nodes and replication between nodes over the network. In a clustered environment, each RavenDB node is a full-fledged node that uses Rachis (based on Raft) for consensus, with master-to-master replication and no support for sharding. As such, RavenDB is probably intended to be used as a shared-nothing architecture, as indicated by the primary developer.
Virtual Views Materialized Views
RavenDB supports materialized views. A common technique would be creating a RavenDB index that can transform various documents into another document (the view) which can then be quickly queried at a later time. The view is automatically kept in sync with the underlying data by the indexing process that runs asynchronously in the background whenever data is added or changed.
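The relationship between source documents and an asynchronously maintained view can be sketched with a toy aggregate. This is not how RavenDB indexes are defined; `ToyMaterializedView`, the per-company totals, and the explicit `run_indexing` step standing in for the background indexing process are all hypothetical.

```python
class ToyMaterializedView:
    """Hypothetical per-company order totals, rebuilt by a background step."""

    def __init__(self):
        self.orders = {}   # source documents
        self.totals = {}   # the persisted "view"
        self.stale = False

    def put(self, order_id, order):
        self.orders[order_id] = order
        self.stale = True  # the view lags until indexing runs

    def run_indexing(self):
        # Stand-in for the asynchronous indexing process.
        totals = {}
        for order in self.orders.values():
            company = order["company"]
            totals[company] = totals.get(company, 0) + order["total"]
        self.totals = totals
        self.stale = False


view = ToyMaterializedView()
view.put("orders/1", {"company": "companies/7", "total": 10})
view.put("orders/2", {"company": "companies/7", "total": 5})
before = dict(view.totals)  # still empty: indexing has not run yet
view.run_indexing()
after = dict(view.totals)
```

Queries against `totals` are cheap because the aggregation has already been done, at the cost of the view briefly lagging behind writes.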
RavenDB also supports the idea of virtual views. A virtual view is created through a server-side projection, a function that the client must be aware of when making the initial query request. Since the server-side projection transformers create these new documents (the view) on demand, the result can be thought of as a virtual view.
As a single view object can essentially be treated as a document, RavenDB supports writing and persisting view objects. In addition, RavenDB provides the Changes API (similar to triggers) to assist with updating stale view documents automatically when their underlying documents change.