Sqrrl is a graph database management system (DBMS). It is a secure, adaptable NoSQL system built on Apache Accumulo. Founded by former employees of the National Security Agency (NSA), Sqrrl focuses on cybersecurity.
The main product from Sqrrl is Sqrrl Enterprise, which is a solution for enterprise users on top of Accumulo that can handle massive structured and/or unstructured data sets. Sqrrl is aimed towards users who want to analyze data for security purposes. A Sqrrl whitepaper describes Sqrrl Enterprise as a threat hunting tool that uses large amounts of data with linked data analysis to aid users going through the 'hunting loop'. In particular, Sqrrl manages data and can display it to users raw or in the form of visualizations for analytics, finding threat patterns, or for further investigation.
Sqrrl has native Hadoop integration and supports multi-structured data.
Sqrrl development began in 2011. As a company, it was founded by Ely Kahn and Oren J. Falkowitz, who both worked in cybersecurity before creating Sqrrl. Sqrrl was built on Apache Accumulo. Its customers came from both the private and public sectors.
In January 2018, Sqrrl was acquired by Amazon Web Services (AWS). Existing users of Sqrrl were able to continue using the software without any changes. In late November 2018, Sqrrl co-founder Ely Kahn (then a Security Strategist at AWS) commented on the release of AWS Security Hub, which includes visualizations and security remediation recommendations. However, no statements were made explicitly about Sqrrl's integration with AWS.
Sqrrl Enterprise offers users a proprietary query language called SqrrlQL that is integrated with the cell-level security concept. Users are able to execute SQL-like queries (key-value), full-text queries, or graph searches.
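To illustrate what cell-level security means for query results, here is a minimal sketch in Python. It is not SqrrlQL and not Sqrrl's implementation. The `Cell` type, the `visible_cells` function, and the simplified rule that a label is a set of tokens a reader must all hold are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    required: frozenset  # hypothetical label: tokens a reader must all hold


def visible_cells(cells, authorizations):
    """Return only the cells whose label is satisfied by the reader's tokens."""
    auths = frozenset(authorizations)
    return [c for c in cells if c.required <= auths]


cells = [
    Cell("host1", "ip", "10.0.0.5", frozenset()),
    Cell("host1", "owner", "alice", frozenset({"hr"})),
    Cell("host1", "alert", "beaconing", frozenset({"soc", "secret"})),
]

# Two readers query the same row but see different cells.
analyst_view = [c.column for c in visible_cells(cells, {"soc", "secret"})]
hr_view = [c.column for c in visible_cells(cells, {"hr"})]
```

The point of the sketch is that filtering happens per cell rather than per table or per row, so one query can return different results depending on who runs it.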
Multi-version Concurrency Control (MVCC)
The details of how Sqrrl implements concurrency control are not publicly known. Sqrrl supports atomic updates on data entities. It also depends on ZooKeeper, which provides locks for coordinating concurrent access.
The exact kind of logging Sqrrl uses is not known, but based on the Accumulo documentation, it likely saves each write to its log files.
Sqrrl is considered to be a wide column store, which is a kind of NoSQL storage model.
Within Accumulo, which Sqrrl is built on, tables are partitioned into tablets. Tablets are split on row boundaries, so all of the columns and values for any particular row are in one tablet.
Based on the Accumulo implementation, isolation can be activated. When it is, a user reading the database sees either none of the updates while they are being applied to a row, or all of the updates after they have been applied to the row. However, at this level of isolation, a user may see different values for a row on two different reads, and phantom rows may be added that only become visible after the reading transaction is over.
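The row-level behavior described above can be sketched as follows. This is a toy model in Python, not Accumulo or Sqrrl code. The `RowStore` class and its method names are hypothetical, and a single lock stands in for whatever mechanism the real system uses.

```python
import threading


class RowStore:
    """Toy store in which each row is replaced atomically as a whole."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def apply_mutation(self, row, updates):
        # Build the new row version privately, then publish it in one step,
        # so no reader can observe a half-applied mutation.
        with self._lock:
            new_version = dict(self._rows.get(row, {}))
            new_version.update(updates)
            self._rows[row] = new_version

    def read_row(self, row):
        # Readers get a copy of one complete version of the row.
        with self._lock:
            return dict(self._rows.get(row, {}))


store = RowStore()
store.apply_mutation("user1", {"status": "ok", "score": "1"})
first_read = store.read_row("user1")
store.apply_mutation("user1", {"status": "flagged", "score": "9"})
second_read = store.read_row("user1")
```

Each read returns a complete row version, never a mix of old and new columns. The two reads still differ from each other, which is the non-repeatable read that this isolation level permits.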
Based on the Accumulo database management system it is built on, Sqrrl uses both disk storage, through Hadoop, and an in-memory store of sorted key/value pairs, with ZooKeeper locks for coordination.
Based on the Accumulo implementation, checkpoints are organized with the Hadoop Distributed File System (HDFS) in order to create a system of redundant files. The replication framework creates two instances of Accumulo that are consistent with each other.
As a NoSQL DBMS, Accumulo does not have built-in foreign keys. The specific implementation in Sqrrl is not publicly known.
After data is ingested, the Sqrrl Enterprise product details each piece at a cell level. Data can be a key/value pair or a field in a JSON document. Sqrrl then uses secondary indexing techniques to store the data in Apache Accumulo in its native form. This flat architecture allows Sqrrl to handle data of different structures.
In 2016, a presentation by Sqrrl stated that the version of Accumulo used by Sqrrl Enterprise stores sorted key-value pairs. Sqrrl then combines the data into graph storage form, using a linked data model.
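One way to picture a graph layered on sorted key-value pairs is the Python sketch below. Sqrrl's actual key layout is not public, so the `edge_key` layout (source vertex, edge label, target vertex), the `SortedEdgeStore` class, and the sample data are all assumptions for illustration.

```python
import bisect


def edge_key(source, label, target):
    # Hypothetical key layout: the source vertex sorts first, so all edges
    # leaving one vertex are stored next to each other.
    return (source, label, target)


class SortedEdgeStore:
    def __init__(self):
        self._keys = []  # always kept in sorted order

    def add_edge(self, source, label, target):
        key = edge_key(source, label, target)
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def neighbors(self, source):
        # A one-element tuple sorts before every full key with that source,
        # so this finds the start of the vertex's contiguous key range.
        start = bisect.bisect_left(self._keys, (source,))
        result = []
        for key in self._keys[start:]:
            if key[0] != source:
                break
            result.append((key[1], key[2]))
        return result


graph = SortedEdgeStore()
graph.add_edge("10.0.0.5", "connected_to", "evil.example")
graph.add_edge("alice", "logged_into", "10.0.0.5")
graph.add_edge("10.0.0.5", "resolved", "dns1")
out_edges = graph.neighbors("10.0.0.5")
```

Because the keys are sorted, finding the neighbors of a vertex is a short range scan rather than a search over every edge.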
Sqrrl is considered a graph DBMS because it both represents data to users in a graphical format and is schema-free. Custom iterators for Accumulo developed by Sqrrl allow users to complete graph analysis on the data.
Sqrrl Enterprise ingests data using its native Hadoop integration. The data is encrypted and labeled, then indexed and stored in Sqrrl's version of Apache Accumulo within the Hortonworks Data Platform (HDP). Together, Sqrrl and HDP store multi-structured data in its native format. This allows users of Sqrrl to make queries of many different kinds. Sqrrl is considered a distributed database.
Sqrrl converts SQL queries into Accumulo iterators.
Delta Encoding, Bit Packing / Mostly Encoding
Based on the implementation of Accumulo, Sqrrl likely uses several types of compression. GZip and LZO are two compression algorithms used by Accumulo. In addition, Accumulo uses delta encoding to avoid resending information that is the same as previously sent information. Delta encoding is also used when storing sequential information in the table rows.
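The idea of omitting key parts that repeat from the previous entry can be sketched in Python as follows. This is a simplified illustration, not Accumulo's on-disk format. The function names are hypothetical, keys are assumed to be (row, column) pairs of strings, and `None` is used as the "same as previous" marker.

```python
def encode_relative(keys):
    """Encode each (row, column) key relative to the key before it."""
    encoded = []
    previous = (None, None)
    for row, column in keys:
        encoded.append((
            None if row == previous[0] else row,        # None = unchanged
            None if column == previous[1] else column,  # None = unchanged
        ))
        previous = (row, column)
    return encoded


def decode_relative(encoded):
    """Rebuild the full keys by carrying forward the omitted parts."""
    keys = []
    previous = (None, None)
    for row, column in encoded:
        current = (
            previous[0] if row is None else row,
            previous[1] if column is None else column,
        )
        keys.append(current)
        previous = current
    return keys


keys = [("host1", "ip"), ("host1", "owner"), ("host2", "owner")]
encoded = encode_relative(keys)
decoded = decode_relative(encoded)
```

The scheme pays off on sorted data, where neighboring keys tend to share a row or column, so many fields collapse to the marker.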
Indexed Sequential Access Method (ISAM)
Accumulo, which Sqrrl is built on top of, uses a file format called the Relative Key File (RFile), which is a type of Indexed Sequential Access Method (ISAM). These files hold data from tablets, which in Accumulo are partitions of tables.
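The general ISAM idea, sorted data blocks plus a small index that points into them, can be sketched in Python as below. This is not the Relative Key File format itself. The `IndexedSortedFile` class, the block size, and the choice to index each block by its first key are assumptions for illustration.

```python
import bisect


class IndexedSortedFile:
    """Toy ISAM layout: sorted blocks plus an index of each block's first key."""

    def __init__(self, sorted_items, block_size=2):
        # sorted_items must already be sorted by key.
        self.blocks = [
            sorted_items[i:i + block_size]
            for i in range(0, len(sorted_items), block_size)
        ]
        self.index = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Binary-search the index for the only block that could hold the key,
        # then scan that block sequentially.
        i = bisect.bisect_right(self.index, key) - 1
        if i < 0:
            return None
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None


items = [("a", 1), ("c", 2), ("e", 3), ("g", 4), ("i", 5)]
isam = IndexedSortedFile(items)
found = isam.get("e")
missing = isam.get("b")
```

A lookup reads the small index and a single block instead of the whole file, which is the property that makes this layout suitable for large sorted tables.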
Sqrrl Data, Inc.