Splunk

Splunk is a database system designed for extracting structure from, and analyzing, machine-generated data. It ingests data from other databases, web servers, networks, sensors, and similar sources, then offers services to analyze that data and produce dashboards, graphs, reports, alerts, and other visualizations. All of this data is captured in a searchable repository and served through a web interface called Splunk Web.

Splunk is a horizontal application: a large and diverse set of users across an organization, with different knowledge bases, use it for IT operations monitoring, security, and business analytics. The Splunk environment can also be extended by installing or developing an app. An app runs on the Splunk platform and adds specific functionality, typically bundling inputs, lookups, and reports that display information about the data. Over 90 of the Fortune 100 companies use Splunk.

History

Splunk was founded by Erik Swan, Michael Baum, and Rob Das in 2002. Before founding Splunk, all three founders had worked on large-scale search infrastructures and were unhappy with the tools available at the time for analyzing log files. Early customers of Splunk described debugging their environments as ‘digging through caves’ and ‘crawling through the muck to find the problems’, which inspired the founders to name the company after spelunking, the word for the exploration of caves.

Splunk raised a $5 million Series A round led by August Capital in 2004 and reached profitability by 2009. The company went public on NASDAQ in 2012 under the ticker symbol SPLK at a price of $17 a share. In August 2019, Splunk acquired SignalFx, a cloud monitoring platform for infrastructure, microservices, and applications, for $1.1 billion.

Data Model

Hierarchical

Splunk is a NoSQL database management system. Its data model is a hierarchical search-time mapping of data. The knowledge managers on a Splunk instance design the data model.

Joins

Not Supported

Splunk supports inner (default), outer, and left joins using the join command. This works best when the subsearch returns fewer than 50,000 rows.

It can also join a search result set with itself using the selfjoin command.
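The difference between an inner and a left join, and the effect of a cap on subsearch rows, can be sketched in plain Python. This is only an illustration of the semantics described above, not Splunk's implementation: the function name `join_events`, the `max_sub_rows` parameter, and the choice to keep the subsearch value when field names clash are all assumptions made for the example.

```python
from typing import Dict, List

Event = Dict[str, str]


def join_events(main: List[Event], sub: List[Event], field: str,
                join_type: str = "inner",
                max_sub_rows: int = 50_000) -> List[Event]:
    """Join main-search events with subsearch rows on a shared field."""
    if join_type not in ("inner", "left"):
        raise ValueError("join_type must be 'inner' or 'left'")

    # Rows beyond the cap are ignored (hypothetical parameter for this sketch).
    capped = sub[:max_sub_rows]

    # Keep the first subsearch row seen for each key value (a simplification).
    lookup: Dict[str, Event] = {}
    for row in capped:
        if field in row:
            lookup.setdefault(row[field], row)

    joined: List[Event] = []
    for event in main:
        match = lookup.get(event.get(field))
        if match is not None:
            # On a field-name clash the subsearch value is kept here.
            joined.append({**event, **match})
        elif join_type == "left":
            # A left join keeps main events that have no match.
            joined.append(dict(event))
    return joined
```

With an inner join, unmatched main events are dropped; with a left join they pass through unchanged.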

Concurrency Control

Not Supported

Splunk supports concurrent searches but limits their number in order to preserve performance. It also lets you configure how the maximum number of concurrent searches is divided between scheduled and summarization searches, based on your usage.

Splunk also supports concurrent users. Each search a user runs occupies exactly one CPU core on each indexer for the duration of the search. By default, a single search on Splunk cannot use multiple cores.

Checkpoints

Not Supported

Splunk supports the notion of checkpoints. While data is being read and indexed, a checkpoint can be created to mark data as having been read or indexed.
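The general idea of checkpointing an input can be sketched as follows. This is a minimal, generic illustration that stores a byte offset in a small JSON file; the function name `read_new_lines` and the checkpoint file layout are assumptions for the example and do not reflect how Splunk stores its own checkpoints.

```python
import json
import os
from typing import List


def read_new_lines(log_path: str, checkpoint_path: str) -> List[str]:
    """Return lines added to log_path since the last recorded checkpoint."""
    # Load the previously saved byte offset, or start from the beginning.
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as fh:
            offset = json.load(fh).get("offset", 0)

    # Read everything after the checkpointed offset.
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
        new_offset = fh.tell()

    # Persist the new offset so the same data is not read twice.
    with open(checkpoint_path, "w", encoding="utf-8") as fh:
        json.dump({"offset": new_offset}, fh)

    return data.decode("utf-8").splitlines()
```

Calling the function twice in a row returns the new lines the first time and an empty list the second time, because the checkpoint already covers them.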

Indexes

B+Tree

Splunk adds all incoming data to indexes after processing it. It indexes data by breaking it into events based on timestamps. The events are then passed through the indexing pipeline, where additional steps are taken: breaking the events into segments so that indexing and searching can be done efficiently, building the data structures for the indexes, and writing the events out to disk.
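The pipeline steps above (event breaking on timestamps, segmentation, and building an index structure) can be illustrated with a toy example. This is a sketch under stated assumptions, not Splunk's actual pipeline or on-disk format: the timestamp pattern, the tokenization rule, and the use of an in-memory inverted index mapping each segment to event numbers are all choices made for the illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Assumed timestamp format for the sketch: "YYYY-MM-DD HH:MM:SS" at line start.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")
# Assumed segmentation rule: runs of lowercase letters, digits, or underscores.
SEGMENT = re.compile(r"[a-z0-9_]+")


def break_into_events(raw: str) -> List[str]:
    """Start a new event at each line that begins with a timestamp."""
    events: List[str] = []
    for line in raw.splitlines():
        if TIMESTAMP.match(line) or not events:
            events.append(line)
        else:
            # Lines without a timestamp continue the previous event.
            events[-1] += "\n" + line
    return events


def build_index(events: List[str]) -> Dict[str, Set[int]]:
    """Map each segment to the set of event numbers that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for event_id, event in enumerate(events):
        for segment in SEGMENT.findall(event.lower()):
            index[segment].add(event_id)
    return dict(index)
```

Looking up a segment such as `disk` in the resulting dictionary returns the event numbers to fetch, which is what makes term searches cheaper than scanning every event.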

Splunk supports events indexes and metrics indexes. Events indexes are the default index type; they impose minimal structure and can accommodate any type of data. Metrics indexes are highly structured and designed to handle high-volume, low-latency demands. They offer better performance and use less space than events indexes.

Logging

Not Supported

Scripts in Splunk can send logging data to splunkd.log for tracking and troubleshooting by writing to stderr. Five log levels are supported:

  • DEBUG
  • INFO
  • WARN
  • ERROR (default)
  • FATAL
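A script-side helper for this kind of logging might look like the sketch below. It assumes that the log level is given as a prefix at the start of each line written to stderr; the helper name `log` and its `stream` parameter are hypothetical and exist only to make the example testable.

```python
import sys
from typing import Optional, TextIO

LOG_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR", "FATAL")


def log(level: str, message: str, stream: Optional[TextIO] = None) -> None:
    """Write one log line, prefixed with its level, to stderr by default."""
    if level not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {level}")
    out = sys.stderr if stream is None else stream
    # Assumed line format for this sketch: "<LEVEL> <message>".
    out.write(f"{level} {message}\n")
    out.flush()


if __name__ == "__main__":
    log("INFO", "connecting to the endpoint")
    log("ERROR", "connection refused")
```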

Compression

Naïve (Record-Level)

Splunk compresses raw data to approximately half its original size. For indexes, Splunk supports gzip (default), lz4, and zstd compression and can handle buckets compressed with different algorithms side by side.
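To get a feel for how well log-like text compresses, the gzip case can be tried with Python's standard library. This is only an illustration: the helper names are hypothetical, the sample events are made up, and the ratio achieved on real data depends heavily on its content.

```python
import gzip
from typing import List


def compress_events(events: List[str]) -> bytes:
    """Gzip-compress a batch of raw events joined by newlines."""
    return gzip.compress("\n".join(events).encode("utf-8"))


def compression_ratio(events: List[str]) -> float:
    """Return compressed size divided by raw size (lower is better)."""
    raw = "\n".join(events).encode("utf-8")
    if not raw:
        raise ValueError("no data to compress")
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    # Made-up, repetitive sample events for the demonstration.
    sample = [f"2024-01-01 10:00:{i % 60:02d} INFO request id={i} status=200"
              for i in range(1000)]
    print(f"compressed/raw = {compression_ratio(sample):.2f}")
```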

Storage Model

Custom

Splunk stores data in a flat file format. All data in Splunk is stored in an index and in Hot, Warm, and Cold buckets depending on the size and age of the data. It supports both clustered and non-clustered indexes.

Foreign Keys

Supported

Splunk supports referential integrity.

Isolation Levels

Not Supported

In Splunk, workload management provides resource isolation between search and ingest processes. This lets users allocate resources to search pools without affecting ingest processes.

Query Execution

Vectorized Model

Splunk uses MapReduce to speed up searches.
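The MapReduce idea can be sketched as a map step that runs separately over each indexer's events and a reduce step that merges the partial results. This is a conceptual illustration only; the function names, the `status` field, and the in-process lists standing in for indexers are assumptions for the example.

```python
from collections import Counter
from functools import reduce
from typing import Dict, List

Event = Dict[str, str]


def map_count_by_status(events: List[Event]) -> Counter:
    """Map step: count events per status value on a single indexer."""
    return Counter(e["status"] for e in events if "status" in e)


def reduce_counts(partials: List[Counter]) -> Counter:
    """Reduce step: merge the partial counts from all indexers."""
    return reduce(lambda left, right: left + right, partials, Counter())


if __name__ == "__main__":
    # Two lists stand in for the events held by two indexers.
    indexer_a = [{"status": "200"}, {"status": "500"}]
    indexer_b = [{"status": "200"}, {"host": "web-1"}]
    partials = [map_count_by_status(indexer_a), map_count_by_status(indexer_b)]
    print(reduce_counts(partials))
```

Because each map step touches only local data, the work can proceed in parallel and only the small partial counts need to be combined.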

Views

Not Supported

Splunk's Web Framework includes a library of views (such as Chart, Table, SplunkMap, and Timeline), which are UI widgets for displaying data in particular ways.

Website

http://www.splunk.com/

Tech Docs

https://docs.splunk.com/Documentation/Splunk

Developer

Splunk

Country of Origin

US

Start Year

2002

Project Type

Commercial

Written in

C++

Supported languages

C#, Java, JavaScript, PHP, Python, Ruby

Licenses

Proprietary

Wikipedia

https://en.wikipedia.org/wiki/Splunk