HBase is an open source, distributed, non-relational, scalable big data store that runs on top of Hadoop Distributed Filesystem. Hbase is suitable for storing large quantities of data, but it lacks many of the features that relational database management systems usually have, such as column types, secondary indexes, advanced query languages, etc. HBase stores the data in rows and columns. A row is referenced by a row key, and columns are grouped into "column families". HBase is written in Java, and is supported by Apache Software Foundation.
HBase was initially a project by the company Powerset, a San Francisco-based search and natural language company. Microsoft acquired Powerset in 2008.
HBase guarantees ACID semantics per-row. HBase uses a form of Multiversion Concurrency Control (MVCC) to avoid row locks for read operations. Write operations still need to acquire row locks.
An HBase table consists of rows and columns. Rows are referenced by row keys which are raw byte arrays and are sorted by row key. The sort is byte-ordered. Each row contains columns. A column's content is also an uninterpreted array of bytes. All columns belong to a column family.
For each table, HBase only provide an B+Tree like index on row keys. HBase does not natively support secondary indexes. Users can use filters for querying on non-rowkey columns. There are some techniques to create another table which can be used as a secondary index.
HBase only provide "read committed" isolation level. Users can downgrade the isolation level to "read uncommitted" by modifying the source code.
HBase's write-ahead-log is named HLog.
HBase does not provide native support for SQL. Unlike RDBMS, HBase has four primary operations: Get, Put, Scan and Delete. It also has some DDL operations, e.g., Create. HBase provides a shell which users can fire queries from. Users can specify table name, column names and apply filters in their query. HBase also offers Java Client API and Thrift/REST API. Some third-party drivers are also available for other programming languages.
HBase is schema-less column-oriented datastore.
Stored procedures are not directly supported in HBase. But users can use coprocessors to resemble store procedures.
HBase is organized as a cluster of HBase nodes that complies with the master-slave architecture. There are two types of nodes: a master node, and one or more slave nodes called RegionServers. RegionServers serve data for reads and writes. An HBase Region is a subset of an HBase table that has a continuous range of sorted rowkeys. Region assignment and DDL operations are handled by the HBase Master. HBase is built on top of Hadoop File System, thereby making it shared-disk. The Hadoop DataNodes store the data that RegionServers are managing. The HDFS Zookeeper maintains the server status in the cluster.