Cloud BigTable

Cloud BigTable is a distributed storage system used in Google. BigTable is designed mainly for scalability.

Google has now provided BigTable as its cloud NoSQL database service.

History

BigTable was among the early attempts Google made to manage big data. Jeffrey Dean and Sanjay Ghemawat were involved in it. It is one of the three components Google built for managing big data (the other two are Google File System and MapReduce).

These three components focus on different aspects of big data: Google File System is a reliable distributed file system that the other two build upon; MapReduce is a distributed data processing framework; BigTable is a distributed storage system.

Concurrency Control

Not Supported

BigTable only supports transactions on a single row[1]. It does not support transactions spanning multiple rows

Data Model

Column Family / Wide-Column

BigTable does not support relational data model. Instead, it provides users the ability to create column families in a table.

Each table usually contains a small number of column families, which should be rarely changed (because the change of them involves metadata change). Inside each column family, there can be unlimited number of columns. Users can freely add or delete columns in a column family. Deleting of an entire column family is also supported.

BigTable does not have any type information associated with a given column. It only treats data as strings of bytes.

Indexes

Not Supported

Logging

Physical Logging

BigTable uses physical logging. For performance consideration, all tablets on a tablet server write logs to the same log file[1].

Query Compilation

Not Supported

Query Interface

Custom API

BigTable provides clients with the following APIs: 1. Look Up (Read a Single Row) 2. Scan (Read a subset of rows) 3. Write 4. Delete 5. Customized Scripts (written in Sawzall language)

Storage Architecture

Disk-oriented

BigTable assumes an underlying reliable distributed file system (here is Google File System). The tablets are stored in Google File System, which is a disk-oriented file system. The most recently written records are stored in memtable, which is in memory. However, most of the data is stored on disk.

Storage Model

Custom

In BigTable, a table is split into multiple tablets, each of which is a subset of consecutive rows[1]. A tablet is a unit of data distribution and load balancing. Different tablets of a table may be assigned to different tablet servers. A tablet is stored in the form of a log-structured merge tree[2] (which they call memtable and SSTable).

Furthermore, BigTable allows clients to create locality group[3]. A locality group is a subset of columns in a table. BigTable will create a separate SSTable for each locality group, which will improve read performance of this locality group.

Stored Procedures

Not Supported

Derivative Systems

Embeddings

People Also Viewed

Cloud BigTable Logo
Website

https://cloud.google.com/bigtable/

Developer

Google

Country of Origin

US

Start Year

2005

Former Name

BigTable

Project Type

Commercial

Supported languages

C++

Operating Systems

Linux

Wikipedia

https://en.wikipedia.org/wiki/Bigtable

Derivative Systems

Embeddings

People Also Viewed