JaguarDB

Nearest Neighbor Search

JaguarDB is a distributed SQL DBMS that stores data as a flat array of fixed-length records. Each record has a key, which may be composite, and a value, which may span multiple columns.

History

JaguarDB was rebranded as a vector DBMS designed for LLMs in 2023.

Checkpoints

Blocking

All data can be backed up to a remote storage server. Backups run periodically at a user-specified frequency. Each node can also back up data locally by taking a snapshot of the database at a user-defined frequency.

In case of a node crash, users should always keep spare Jaguar servers available. A spare server should have an empty data directory so that, when another node crashes, it can receive data from replicas and start functioning as a regular Jaguar server.

In case of temporary network disconnections, the Jaguar servers automatically re-sync the data once connections are restored.

Concurrency Control

Optimistic Concurrency Control (OCC)

According to the CAP theorem, Jaguar is an AP system: when a network partition occurs, it provides availability rather than consistency. It offers "eventual consistency".

Multiple users can read and write in parallel. On a single node, the critical region is locked.

Data Model

Key/Value Array / Matrix

Jaguar is a key-value store. Each table is a "Sorted Elastic Array" (SEA, covered in more detail later). The array contains keys and pointers to fixed-length records.

The supported data types include standard types such as int, float, and string, as well as range, file, and spatial types. A spatial value can be a parameterized geometric shape, e.g. Circle(center=Point(1,3), radius=2).

Foreign Keys

Supported

Join operations can be performed on any column (including key column) in any table.

Indexes

B+Tree

It supports creating BTree indexes on both key and non-key columns. All key columns are automatically kept sorted in the "Sorted Elastic Array", which makes range queries easy.

Joins

Hash Join Sort-Merge Join

Inner join operations are supported. Join operations can be performed on any column in any table. The code base suggests that both hash-table and sort-merge join algorithms are implemented and used.
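For readers unfamiliar with the two join strategies named above, the following is a minimal, generic sketch of each. It is not taken from the JaguarDB code base; the function names and the tuple-based row layout are assumptions made for illustration.

```python
from collections import defaultdict


def hash_join(left, right, left_col, right_col):
    """Inner join: build a hash table on `right`, then probe it with `left`."""
    buckets = defaultdict(list)
    for row in right:
        buckets[row[right_col]].append(row)
    return [l + r for l in left for r in buckets.get(l[left_col], [])]


def sort_merge_join(left, right, left_col, right_col):
    """Inner join: sort both inputs on the join column, then merge them."""
    left = sorted(left, key=lambda row: row[left_col])
    right = sorted(right, key=lambda row: row[right_col])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_col], right[j][right_col]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][right_col] == lk:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(left) and left[i][left_col] == lk:
                out.extend(left[i] + r for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Hypothetical example rows: (user_id, item) and (user_id, name).
orders = [(1, "pen"), (2, "ink"), (1, "cap")]
users = [(1, "ann"), (3, "bob")]
joined_by_hash = hash_join(orders, users, 0, 0)
joined_by_merge = sort_merge_join(orders, users, 0, 0)
```

Both functions return the same set of joined rows; the hash join needs memory for the build side, while the sort-merge join benefits when inputs are already sorted on the join column.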

Logging

Physical Logging

Jaguar servers log client commands and table management history to disk.

Query Interface

Custom API SQL Command-line / Shell

JaguarDB provides a set of built-in helper functions that can be used in SQL commands. Standard SQL commands are also supported, including create/drop table and index, load table, insert/delete record, select, join, update, group by, and aggregation.

JaguarDB also supports schema changes: when a table is created, 30% extra space is allocated so that users can add new columns later, provided the extra space is large enough to hold them. Otherwise, the table is dropped and recreated with the new columns.

Jaguar provides client libraries including JDBC and has APIs for Python and PHP.

It also supports querying spatial data attributes (e.g. selecting all circles whose x-coordinate is greater than 5). The API includes built-in functions such as Distance(), which computes the distance between two arbitrary geometric shapes.
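The schema-change rule can be illustrated with a small sketch. It assumes that the 30% reserve is measured against the record length in bytes, which the description does not state; the function names and return labels are hypothetical.

```python
def reserved_bytes(record_bytes, reserve_percent=30):
    """Extra space set aside at table creation (assumed: a share of record length)."""
    return record_bytes * reserve_percent // 100


def plan_add_columns(record_bytes, new_column_bytes):
    """Decide how new columns would be added under the rule described above."""
    if sum(new_column_bytes) <= reserved_bytes(record_bytes):
        return "extend-in-place"    # new columns fit into the reserved space
    return "drop-and-recreate"      # table must be rebuilt with the new columns


small_change = plan_add_columns(100, [8, 16])
large_change = plan_add_columns(100, [8, 16, 8])
```

With a 100-byte record, 30 bytes are reserved under this assumption, so adding 24 bytes of columns fits in place while adding 32 bytes does not.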

Storage Architecture

Disk-oriented

All data is stored on disk. Memory is only used for caching and computation.

Storage Format

Custom

Storage Model

Custom

Each table is a giant array containing fixed-length records. Each record has a key which may be a composite key, and a value which may include multiple columns.

The array is called "Sorted Elastic Array (SEA)". It maintains an important invariant: it must be at least 30% sparse, meaning at least 30% of the array space is unoccupied. As elements are added and the sparse ratio is no longer above 30%, the array is resized. During resizing, a new, longer array is created and all existing elements are copied into it, with spacing left between adjacent elements.
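The behaviour described above can be sketched in memory as follows. This is an illustration of the idea only, not JaguarDB's implementation: the class name, the doubling growth factor, the even redistribution of keys, the linear scan for the insert position, and the choice to resize just before an insert would break the invariant are all assumptions.

```python
class SortedElasticArray:
    """Toy sorted array with gaps; keeps at least 30% of slots empty."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.count = 0

    def keys(self):
        return [k for k in self.slots if k is not None]

    def _resize(self, new_capacity):
        # Copy keys into a longer array, spreading them out evenly.
        keys = self.keys()
        self.slots = [None] * new_capacity
        for i, key in enumerate(keys):
            self.slots[i * new_capacity // len(keys)] = key

    def insert(self, key):
        # Grow first if this insert would leave fewer than 30% of slots empty
        # (integer arithmetic avoids floating-point edge cases).
        if 10 * (self.count + 1) > 7 * len(self.slots):
            self._resize(2 * len(self.slots))
        slots = self.slots
        # First occupied slot holding a key >= the new key (linear scan for brevity).
        pos = next((i for i, k in enumerate(slots)
                    if k is not None and k >= key), len(slots))
        if pos > 0 and slots[pos - 1] is None:
            slots[pos - 1] = key                       # a gap is already in place
        else:
            gap = next((g for g in range(pos, len(slots))
                        if slots[g] is None), None)
            if gap is not None:
                slots[pos + 1:gap + 1] = slots[pos:gap]  # shift right into the gap
                slots[pos] = key
            else:
                gap = max(g for g in range(pos) if slots[g] is None)
                slots[gap:pos - 1] = slots[gap + 1:pos]  # shift left into the gap
                slots[pos - 1] = key
        self.count += 1


sea = SortedElasticArray()
for k in [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]:
    sea.insert(k)
```

Because gaps are spread between keys, most inserts only shift a few neighbours rather than the whole tail of the array, which is the usual motivation for keeping a sorted array sparse.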

Storage Organization

Sorted Files

For each table, all the keys are stored in one big "Sorted Elastic Array". The array is split into multiple blocks for storage on disk, and a block meta table maintains pointers to each block's starting index in the SEA. When the SEA is resized, the meta table is updated as well.
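A block meta table of this kind can be sketched as a sorted list of starting indexes. The fixed block size and the function names below are assumptions for illustration; the actual on-disk layout is not described here.

```python
from bisect import bisect_right


def build_block_meta(array_length, block_size):
    """Starting SEA index of each block, assuming fixed-size blocks."""
    return list(range(0, array_length, block_size))


def locate_block(meta, slot_index):
    """Map an SEA slot index to (block number, offset inside that block)."""
    block_no = bisect_right(meta, slot_index) - 1
    return block_no, slot_index - meta[block_no]


meta = build_block_meta(array_length=10, block_size=4)   # [0, 4, 8]
hit = locate_block(meta, 5)                               # block 1, offset 1

# After the SEA is resized, the meta table is rebuilt for the new length.
meta_after_resize = build_block_meta(array_length=20, block_size=4)
```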

Stored Procedures

Supported

When the Jaguar server cluster scales out, it does not move data around as other NoSQL databases do. New servers need only minimal configuration to join the cluster and can then start processing reads and writes from clients.

JaguarDB also supports importing and syncing tables from other databases such as MySQL or Oracle. The only requirement is that the other database maintains a change-log table and triggers that automatically record every change users make to the data. The Jaguar server monitors the change-log table and automatically synchronizes its own data whenever the other database changes.
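The change-log approach can be sketched as a loop that applies any log entries newer than the last one seen. The entry fields (`id`, `op`, `key`, `value`), the dictionary standing in for the Jaguar-side table, and the function name are all hypothetical; the real change-log schema is not described here.

```python
def apply_change_log(replica, change_log, last_applied_id):
    """Apply unseen change-log entries in order; return the new high-water mark."""
    for entry in sorted(change_log, key=lambda e: e["id"]):
        if entry["id"] <= last_applied_id:
            continue                                   # already synchronized
        if entry["op"] == "delete":
            replica.pop(entry["key"], None)
        else:                                          # "insert" or "update"
            replica[entry["key"]] = entry["value"]
        last_applied_id = entry["id"]
    return last_applied_id


# Hypothetical log rows as a source-side trigger might record them.
log = [
    {"id": 1, "op": "insert", "key": "a", "value": 1},
    {"id": 2, "op": "update", "key": "a", "value": 2},
    {"id": 3, "op": "insert", "key": "b", "value": 5},
    {"id": 4, "op": "delete", "key": "b", "value": None},
]
replica = {}
last_id = apply_change_log(replica, log, last_applied_id=0)
```

Tracking the last applied entry makes repeated polling idempotent: calling the function again with the same log and the returned id changes nothing.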

System Architecture

Shared-Nothing

Jaguar uses a flat master-master architecture. The Jaguar server cluster contains many Jaguar servers (which may even be in separate data centers), and users can have multiple clients. Each client can connect to an arbitrary number of servers without restriction. The Jaguar servers sync data updates among themselves in real time, on a best-effort basis, to maintain one-copy semantics. Storage capacity scales almost linearly with the number of Jaguar servers in use.

Each client maintains connections to multiple Jaguar servers at the same time. When the client wants to update a record, it computes the hash value of the record's key and sends the request to the server responsible for that hash value. As a result, different servers manage different data records and can process different user requests in parallel.
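Client-side routing of this kind can be sketched as below. The SHA-256 hash, the modulo placement, and the server names are assumptions for illustration; the actual hash function and partitioning scheme are not specified here, and this sketch does not model how the cluster avoids moving data when servers are added.

```python
import hashlib


def server_for_key(key, servers):
    """Pick the server responsible for `key` from a fixed server list."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


# Hypothetical server addresses.
servers = ["jaguar-1:8888", "jaguar-2:8888", "jaguar-3:8888"]
target = server_for_key("user:42", servers)
```

Every client that uses the same function and server list sends a given key to the same server, so no central coordinator is needed to route requests.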
