Blazegraph

Revision #10, updated 12/12/2018 5:23 p.m.

Blazegraph is an open-source graph database system written in Java. It can be deployed on a standalone server or as a highly available (HA) replication cluster, and ACID properties are fully supported in both cases. Blazegraph uses multi-version concurrency control (MVCC). As a graph database, it is well suited to storing and querying linked data. Blazegraph uses RDF and RDR as its data model.

History

Blazegraph, formerly known as Bigdata, was released in August of 2016; its predecessor, Bigdata, was released in February of 2015. SYSTAP, LLC announced Blazegraph as its flagship product, and the system was still supported as of December 2018. It was developed to work with web-scale semantic graphs, and its key-range partitioned B+ tree indexing was influenced by the architecture of Google's BigTable system. SYSTAP, LLC received funding from DARPA to develop Blazegraph's GPU-acceleration feature. Blazegraph has been used in commercial applications and has been resold by various OEM companies.

Query Compilation

Stored Procedure Compilation

Stored procedure compilation is supported in Blazegraph through the SPARQL extensions.

Checkpoints

Consistent

Blazegraph supports consistent checkpoints. Blazegraph is capable of providing a consistent database state given a user-specified commit point.

Isolation Levels

Snapshot Isolation

Blazegraph uses snapshot isolation. Read-only transactions return a fully consistent view of the database state as of the user-specified commit point. Read-write transactions buffer writes on isolated indices. They commit only if the write set has been validated.
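The behavior described above can be illustrated with a minimal Python sketch. It is a toy model, not Blazegraph's implementation: the class names `VersionedStore` and `Transaction`, the dictionary-based snapshots, and the per-key conflict rule are all assumptions made for illustration.

```python
class VersionedStore:
    """Toy multi-version store with numbered commit points (hypothetical)."""

    def __init__(self):
        # One snapshot per commit point; commit point 0 is the empty store.
        self._snapshots = [{}]
        # For each key, the commit point that last modified it.
        self._last_modified = {}

    @property
    def last_commit(self):
        return len(self._snapshots) - 1

    def read_view(self, commit_point):
        """Read-only transaction: a consistent view as of a commit point."""
        return dict(self._snapshots[commit_point])

    def begin(self):
        """Start a read-write transaction on the latest commit point."""
        return Transaction(self, self.last_commit)


class Transaction:
    def __init__(self, store, start_commit):
        self.store = store
        self.start_commit = start_commit
        self.writes = {}  # isolated write buffer, invisible to other readers

    def get(self, key):
        # A transaction sees its own writes, then its starting snapshot.
        if key in self.writes:
            return self.writes[key]
        return self.store._snapshots[self.start_commit].get(key)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        """Validate the write set; publish a new commit point on success."""
        store = self.store
        for key in self.writes:
            # Conflict: the key was committed by someone else after we began.
            if store._last_modified.get(key, 0) > self.start_commit:
                return False
        snapshot = dict(store._snapshots[store.last_commit])
        snapshot.update(self.writes)
        store._snapshots.append(snapshot)
        for key in self.writes:
            store._last_modified[key] = store.last_commit
        return True
```

In this sketch, two transactions that start from the same commit point and write the same key cannot both commit: the second one fails validation, while every earlier commit point stays readable through `read_view`.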

Concurrency Control

Optimistic Concurrency Control (OCC)

Blazegraph supports transactions. It combines multi-version concurrency control (MVCC) with optimistic concurrency control (OCC).

Hardware Acceleration

GPU

The enterprise version of Blazegraph supports drop-in GPU acceleration. The GPU-accelerated Blazegraph supports graph queries that are 200-300x faster than without the hardware acceleration. The GPU acceleration works by exploiting the superior main memory bandwidth of GPUs.

Query Execution

Vectorized Model

Blazegraph uses a vectorized query execution model that supports concurrency at the operator level and query plan level.
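As a rough illustration of the vectorized idea, the Python sketch below passes chunks of rows between operators instead of one row at a time. The operator names (`chunked`, `filter_op`, `project_op`) and the chunk size are hypothetical, and the sketch runs sequentially; it does not model the operator-level or plan-level concurrency mentioned above.

```python
from itertools import islice


def chunked(rows, size):
    """Split an iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def filter_op(chunks, predicate):
    """Filter operator: processes one whole chunk per step."""
    for chunk in chunks:
        kept = [row for row in chunk if predicate(row)]
        if kept:
            yield kept


def project_op(chunks, columns):
    """Projection operator: keeps only the named columns of each row."""
    for chunk in chunks:
        yield [{c: row[c] for c in columns} for row in chunk]
```

Operators compose as generators, for example `project_op(filter_op(chunked(rows, 4), pred), ["y"])`, so each operator amortizes its per-call overhead across a whole chunk.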

Logging

Not Supported

Compression

Dictionary Encoding

Blazegraph uses dictionary encoding as the compression method for values stored in graph nodes and edges.
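Dictionary encoding in general can be sketched as follows. This is a generic illustration in Python, not Blazegraph's actual encoding: the `TermDictionary` class, the sequential integer identifiers, and the `ex:` terms used with it are assumptions.

```python
class TermDictionary:
    """Maps each distinct term to a small integer id and back (hypothetical)."""

    def __init__(self):
        self._id_of = {}    # term -> id
        self._term_of = []  # id -> term

    def encode(self, term):
        # Assign the next free id the first time a term is seen.
        if term not in self._id_of:
            self._id_of[term] = len(self._term_of)
            self._term_of.append(term)
        return self._id_of[term]

    def decode(self, term_id):
        return self._term_of[term_id]


def encode_triples(triples, dictionary):
    """Replace every term in every triple with its dictionary id."""
    return [tuple(dictionary.encode(t) for t in triple) for triple in triples]
```

Because long values such as IRIs tend to repeat across many triples, storing each one once and referring to it by a compact identifier reduces the space needed for the triples themselves.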

Stored Procedures

Supported

Blazegraph supports stored procedures through SPARQL extensions. These allow for more complex logic to be applied to the database. A stored query can be used by invoking an instance of the associated stored query class.

Foreign Keys

Supported

Blazegraph supports the use of foreign keys, as well as joins on foreign keys.

Storage Organization

Copy-on-Write / Shadow Paging

Blazegraph makes use of copy-on-write mechanisms with its use of B+ trees. This contributes to the system's ability to decide which operations are isolatable by transactions.
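The copy-on-write idea can be shown with a small Python sketch. For brevity it uses an immutable binary search tree rather than a B+ tree, which is a deliberate simplification; the `Node`, `insert`, and `lookup` names are hypothetical.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable tree node; an update never changes an existing node."""
    key: bytes
    value: bytes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, value):
    """Return the root of a new tree version; the old version is untouched."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        # Copy only the nodes on the path to the change.
        return node._replace(left=insert(node.left, key, value))
    if key > node.key:
        return node._replace(right=insert(node.right, key, value))
    return node._replace(value=value)


def lookup(node, key):
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None
```

Each insert produces a new root while sharing every untouched subtree with the previous version, so a reader holding an old root continues to see the old state.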

Indexes

B+Tree

Blazegraph uses B+ trees in its architecture. Keys and values are both implemented as byte arrays. Blazegraph provides an interface for both single-machine and scale-out B+ trees. Use of this interface requires the user to manage concurrency control.
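To show why byte-array keys work well in an ordered index, the Python sketch below encodes a triple of integer identifiers as a big-endian byte string and scans a key range. A sorted list with `bisect` stands in for the B+ tree, and the `spo_key` layout (subject, predicate, object as 64-bit integers) and the `ByteIndex` class are assumptions, not Blazegraph's key format or API.

```python
import bisect
import struct


def spo_key(s, p, o):
    """Big-endian unsigned encoding: byte order matches numeric order."""
    return struct.pack(">QQQ", s, p, o)


class ByteIndex:
    """Ordered map from byte-string keys to byte-string values (hypothetical)."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def range_scan(self, from_key, to_key):
        """Return all entries with from_key <= key < to_key, in key order."""
        lo = bisect.bisect_left(self._keys, from_key)
        hi = bisect.bisect_left(self._keys, to_key)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

With this layout, all entries for subject 1 can be read with one range scan from `struct.pack(">Q", 1)` to `struct.pack(">Q", 2)`, because every key that starts with the subject prefix sorts between those two bounds.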

Data Model

Graph Triplestore / RDF

Blazegraph functions as a triplestore (RDF) and graph database. As a graph database, Blazegraph uses a graph structure of nodes and edges to represent data. Blazegraph also supports the triplestore (RDF) data model, which can be viewed as a specialized form of graph database that is optimized for storing and retrieving triples. The advantage of RDF is that it provides a standardized data model that supports data merging between differing schemas.
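The triple model and the merging property can be sketched in a few lines of Python. Here a graph is simply a set of (subject, predicate, object) tuples; the `match` helper and the `ex:` identifiers are hypothetical and exist only for this example.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two graphs from different sources that share identifiers.
graph_a = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
}
graph_b = {
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:knows", "ex:bob"),
}

# Merging is a set union: no schema alignment step is needed.
merged = graph_a | graph_b
```

Because every statement has the same three-part shape, data from sources with different schemas can be combined by taking the union of their triples, and duplicate statements collapse automatically.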

System Architecture

Shared-Nothing

Blazegraph uses a shared-nothing system architecture. In the highly available (HA) deployment mode, operation continues after a failure as long as a quorum of nodes remains available.
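A quorum check of this kind can be sketched as below. The sketch assumes a simple-majority rule over the configured number of replicas; the source does not state the exact rule Blazegraph applies, and the function name `has_quorum` is hypothetical.

```python
def has_quorum(replication_factor, nodes_up):
    """True if a strict majority of the configured replicas is available.

    Assumes a simple-majority quorum; this is an illustrative rule only.
    """
    return nodes_up >= replication_factor // 2 + 1
```

Under this assumption, a three-node cluster keeps operating with one node down but not with two.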

Query Interface

SPARQL

Blazegraph's query interface follows the SPARQL standards. SPARQL semantics define join evaluation sequentially, but Blazegraph may reorder the joins within a join group to reduce query time.
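One common way to reorder joins is to run the most selective pattern first. The Python sketch below orders triple patterns by how many triples each one matches. It is a generic heuristic shown for illustration, not Blazegraph's optimizer; the helper names and the convention that variables start with `?` are assumptions of the sketch.

```python
def is_var(term):
    """Variables are written as strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def estimate_cardinality(pattern, triples):
    """Count triples that match the constant positions of a pattern."""
    return sum(
        1 for t in triples
        if all(is_var(p) or p == v for p, v in zip(pattern, t))
    )


def reorder(patterns, triples):
    """Order patterns from most selective to least selective."""
    return sorted(patterns, key=lambda pat: estimate_cardinality(pat, triples))
```

Starting with the pattern that matches the fewest triples keeps intermediate results small, so later joins have fewer solutions to extend.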

Storage Architecture

Hybrid

Blazegraph supports both in-memory storage and disk-oriented storage. However, its operations are optimized for faster disk speed rather than for greater memory capacity.

Joins

Hash Join, Index Nested Loop Join

Blazegraph supports both nested index joins (referred to as pipelined joins) and hash joins. Nested index joins are considered to be "zero investment" joins when used for an RDF (triplestore) database. Hash joins are built dynamically during query evaluation. Blazegraph supports hash join operators that run on the JVM heap, as well as hash join operators that run on the native process heap. The former are more appropriate for lower volumes of data, while the latter are more appropriate for higher volumes.
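The difference between the two join strategies can be sketched in Python. Solutions are modeled as dictionaries of variable bindings and the index as a plain dictionary keyed on the join variable; both function names and this data layout are assumptions made for illustration and do not reflect Blazegraph's operators.

```python
from collections import defaultdict


def index_nested_loop_join(left_solutions, index, join_var):
    """Probe an index that already exists: no build step is required."""
    out = []
    for left in left_solutions:
        for right in index.get(left[join_var], []):
            out.append({**left, **right})
    return out


def hash_join(left_solutions, right_solutions, join_var):
    """Build a hash table on one input during evaluation, then probe it."""
    table = defaultdict(list)
    for right in right_solutions:
        table[right[join_var]].append(right)
    out = []
    for left in left_solutions:
        for right in table.get(left[join_var], []):
            out.append({**left, **right})
    return out
```

The nested index join reuses an index that is already maintained, which is one way to read the "zero investment" description above, whereas the hash join pays an up-front cost to build its table before it can probe.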

Storage Model

N-ary Storage Model (Row/Record)

Website

https://www.blazegraph.com/

Source Code

https://github.com/blazegraph/database

Developer

SYSTAP, LLC.

Country of Origin

Start Year

2006

Former Name

Bigdata