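BigTable is a distributed storage system developed and used at Google. It is designed primarily for scalability, that is, to spread very large data sets across thousands of commodity servers.
Google now also offers BigTable publicly as Cloud BigTable, its managed NoSQL database service.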
BigTable was one of Google's early systems for managing big data; Jeffrey Dean and Sanjay Ghemawat were among its designers. It is one of the three components Google built for managing big data (the other two are Google File System and MapReduce).
These three components focus on different aspects of big data: Google File System is a reliable distributed file system that the other two build upon; MapReduce is a distributed data processing framework; BigTable is a distributed storage system.
BigTable supports transactions only within a single row, for example an atomic read-modify-write on one row key. It does not support transactions that span multiple rows.
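A minimal sketch of what "single-row transactions only" means, assuming a hypothetical in-memory `RowAtomicTable` class (not BigTable's real client API): each row has its own lock, so a read-modify-write on one row is atomic, while nothing coordinates updates across rows.

```python
import threading
from collections import defaultdict


class RowAtomicTable:
    """Hypothetical in-memory table: atomicity is guaranteed per row only."""

    def __init__(self):
        self._rows = defaultdict(dict)                  # row key -> {column: bytes}
        self._row_locks = defaultdict(threading.Lock)  # one lock per row key
        self._guard = threading.Lock()                  # protects creation of row locks

    def _lock_for(self, row):
        with self._guard:
            return self._row_locks[row]

    def read_modify_write(self, row, fn):
        """Apply fn to a copy of one row and store the result, atomically."""
        with self._lock_for(row):
            updated = fn(dict(self._rows[row]))
            self._rows[row] = updated
            return updated

    def read(self, row):
        with self._lock_for(row):
            return dict(self._rows[row])


def increment(cells):
    """Read a counter cell, add one, and write it back as bytes."""
    count = int(cells.get("stats:count", b"0").decode())
    cells["stats:count"] = str(count + 1).encode()
    return cells


table = RowAtomicTable()
workers = [
    threading.Thread(target=lambda: [table.read_modify_write(b"row1", increment) for _ in range(1000)])
    for _ in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(table.read(b"row1")["stats:count"])  # b'4000'
```

Updating `row1` and `row2` "together" in this sketch would be two independent atomic steps with no shared lock, which mirrors the single-row limit described above.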
BigTable does not support the relational data model. Instead, it lets users group the columns of a table into column families.
Each table usually contains a small number of column families, which should rarely change, because adding or removing a column family changes the table's metadata. Within a column family there can be an unbounded number of columns, and users can freely add or delete columns in it. Deleting an entire column family is also supported.
BigTable associates no type information with a column; it treats all data as uninterpreted strings of bytes.
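The data model described above can be sketched as a sparse map from (row key, column key) to raw bytes. This is a simplified illustration: the `SparseTable` class and its methods are hypothetical, and only the "family:qualifier" column-key convention follows the original BigTable paper. Column families must be declared before use, new columns need no schema change, and values are never typed.

```python
class SparseTable:
    """Hypothetical, simplified model: (row key, "family:qualifier") -> raw bytes."""

    def __init__(self, column_families):
        # Column families are table metadata: declared up front, rarely changed.
        self.families = set(column_families)
        # Sparse storage: only cells that were actually written exist.
        self.cells = {}

    def add_family(self, family):
        self.families.add(family)

    def delete_family(self, family):
        # Dropping a family removes every column (cell) that belongs to it.
        self.families.discard(family)
        self.cells = {k: v for k, v in self.cells.items() if _family(k[1]) != family}

    def put(self, row, column, value):
        if _family(column) not in self.families:
            raise KeyError(f"unknown column family in {column!r}")
        if not isinstance(value, bytes):
            # No per-column types: the table stores uninterpreted bytes only.
            raise TypeError("values must be bytes")
        self.cells[(row, column)] = value  # a new column needs no schema change

    def delete(self, row, column):
        self.cells.pop((row, column), None)

    def row(self, row):
        return {col: val for (r, col), val in self.cells.items() if r == row}


def _family(column):
    # Column keys are written "family:qualifier"; the family is the part before ":".
    return column.partition(":")[0]


t = SparseTable(["anchor", "contents"])
t.put(b"com.cnn.www", "anchor:cnnsi.com", b"CNN")
t.put(b"com.cnn.www", "anchor:my.look.ca", b"CNN.com")
t.put(b"com.cnn.www", "contents:", b"<html>")
t.delete_family("anchor")
print(t.row(b"com.cnn.www"))  # {'contents:': b'<html>'}
```

The row and column names are illustrative only; any application that needs typed values encodes and decodes them itself.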
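BigTable uses physical logging. For performance reasons, all tablets on a tablet server append their log records to a single shared commit log file. Keeping a separate log per tablet would mean writing a very large number of files concurrently in Google File System, causing many disk seeks.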
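The cost of a shared log is that recovery must separate each tablet's records from everyone else's. The original BigTable paper describes sorting the commit log entries by key for this; the sketch below simplifies that to sorting by (tablet, sequence number). `LogEntry`, `SharedCommitLog`, and `entries_by_tablet` are hypothetical names, and mutations are plain strings purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int        # log sequence number, increasing across the whole server
    tablet: str     # tablet the mutation belongs to
    row: bytes
    mutation: str


class SharedCommitLog:
    """Hypothetical sketch: one append-only log per tablet server, shared by all its tablets."""

    def __init__(self):
        self.entries = []
        self._next_seq = 0

    def append(self, tablet, row, mutation):
        # Every tablet appends to the same sequence, so writes stay sequential.
        entry = LogEntry(self._next_seq, tablet, row, mutation)
        self._next_seq += 1
        self.entries.append(entry)
        return entry


def entries_by_tablet(log):
    """Recovery helper: group entries per tablet, each group in log order."""
    grouped = defaultdict(list)
    for entry in sorted(log.entries, key=lambda e: (e.tablet, e.seq)):
        grouped[entry.tablet].append(entry)
    return dict(grouped)


log = SharedCommitLog()
log.append("tablet-A", b"r1", "set contents:=x")
log.append("tablet-B", b"r9", "set contents:=y")
log.append("tablet-A", b"r2", "delete anchor:z")
for tablet, entries in entries_by_tablet(log).items():
    print(tablet, [e.seq for e in entries])
# tablet-A [0, 2]
# tablet-B [1]
```

After the sort, a server recovering `tablet-A` can replay one contiguous, ordered run instead of scanning the entire interleaved log.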
BigTable provides clients with the following APIs:
1. Look Up (read a single row)
2. Scan (read a subset of rows)
3. Write
4. Delete
5. Customized Scripts (written in the Sawzall language)
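The first four operations can be illustrated over a single sorted table. This is a hypothetical `ToyClient` class, not the real BigTable or Cloud BigTable client library, and it leaves out Sawzall scripts. Because rows are kept sorted by key, a scan is just a contiguous slice between two row keys.

```python
import bisect


class ToyClient:
    """Hypothetical client-side view of Look Up, Scan, Write and Delete."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> {column: bytes}

    def write(self, row, column, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def delete(self, row, column=None):
        """Delete one column of a row, or the whole row if column is None."""
        if row not in self._rows:
            return
        if column is not None:
            self._rows[row].pop(column, None)
        if column is None or not self._rows[row]:
            del self._rows[row]
            self._keys.pop(bisect.bisect_left(self._keys, row))

    def lookup(self, row):
        """Look Up: read a single row."""
        return dict(self._rows.get(row, {}))

    def scan(self, start, end):
        """Scan: yield rows with start <= key < end, in row-key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


c = ToyClient()
c.write(b"com.b", "contents:", b"B")
c.write(b"com.a", "contents:", b"A")
c.write(b"com.c", "contents:", b"C")
c.write(b"org.x", "contents:", b"X")
print([k for k, _ in c.scan(b"com.a", b"com.c")])  # [b'com.a', b'com.b']
c.delete(b"com.b")
print(c.lookup(b"com.b"))  # {}
```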
BigTable assumes an underlying reliable distributed file system, which at Google is Google File System. Tablets are persisted in Google File System, a disk-based file system. The most recently written records are kept in an in-memory buffer called the memtable, but most of the data resides on disk.
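The split between the in-memory memtable and on-disk data can be sketched as follows. `ToyTablet` and `memtable_limit` are hypothetical; Python tuples stand in for the immutable sorted files that would live in Google File System, and the commit log is omitted. Writes go to the memtable; when it fills up it is frozen into a sorted, immutable run, and reads check the memtable first and then the runs from newest to oldest.

```python
import bisect


class ToyTablet:
    """Hypothetical sketch: recent writes in a memtable, older data in immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}          # in memory: the most recently written records
        self.sstables = []          # stand-ins for on-disk files, newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into an immutable run sorted by key.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):   # the newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


t = ToyTablet(memtable_limit=2)
t.write(b"a", b"1")
t.write(b"b", b"2")   # reaching the limit triggers a flush
t.write(b"a", b"3")   # newer value stays in the memtable
print(t.read(b"a"), t.read(b"b"), len(t.sstables))  # b'3' b'2' 1
```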
In BigTable, a table is split into multiple tablets, each holding a contiguous range of rows (rows are kept sorted by row key). The tablet is the unit of data distribution and load balancing: different tablets of the same table may be assigned to different tablet servers. Each tablet is stored as a log-structured merge tree, consisting of an in-memory memtable and immutable on-disk files called SSTables.
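Because each tablet covers a contiguous row range, finding the tablet (and thus the server) for a row is a binary search over the tablets' boundary keys. The `TabletMap` class, its split points, and the server names below are hypothetical; the sketch only shows range partitioning and whole-tablet reassignment, not how BigTable actually stores tablet locations.

```python
import bisect


class TabletMap:
    """Hypothetical sketch: a table split into tablets by contiguous row ranges."""

    def __init__(self, split_points, servers):
        # split_points[i] is the first row key of tablet i + 1;
        # tablet 0 holds every row below split_points[0].
        self.split_points = sorted(split_points)
        if len(servers) != len(self.split_points) + 1:
            raise ValueError("need exactly one server assignment per tablet")
        # servers[i] is the tablet server currently assigned tablet i.
        self.servers = list(servers)

    def tablet_for(self, row):
        return bisect.bisect_right(self.split_points, row)

    def server_for(self, row):
        return self.servers[self.tablet_for(row)]

    def reassign(self, tablet, server):
        # Load balancing moves whole tablets between servers, never single rows.
        self.servers[tablet] = server


m = TabletMap([b"g", b"p"], ["ts1", "ts2", "ts3"])
print(m.tablet_for(b"apple"), m.tablet_for(b"g"), m.server_for(b"m"))  # 0 1 ts2
m.reassign(2, "ts1")
print(m.server_for(b"zebra"))  # ts1
```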
Furthermore, BigTable allows clients to group multiple column families into a locality group. BigTable generates a separate SSTable for each locality group, so a read that needs only one locality group does not have to read data belonging to the others, which improves read performance for that group.
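A minimal sketch of how locality groups partition a tablet's data, assuming a hypothetical client-supplied mapping from column family to group name. Each group gets its own sorted run (a stand-in for an SSTable), so a reader interested only in the small metadata families never touches the large page contents.

```python
from collections import defaultdict


def split_by_locality_group(cells, family_to_group):
    """Partition one tablet's cells into one sorted run per locality group.

    cells: {(row, "family:qualifier"): bytes}
    family_to_group: hypothetical mapping of column family -> locality group name
    """
    runs = defaultdict(list)
    for (row, column), value in cells.items():
        family = column.partition(":")[0]
        runs[family_to_group[family]].append(((row, column), value))
    # Each run is sorted by (row, column), like an SSTable.
    return {group: sorted(entries) for group, entries in runs.items()}


cells = {
    (b"com.cnn.www", "contents:"): b"<html>",
    (b"com.cnn.www", "anchor:cnnsi.com"): b"CNN",
    (b"com.cnn.www", "language:"): b"EN",
}
groups = {"contents": "page", "anchor": "small", "language": "small"}
runs = split_by_locality_group(cells, groups)
print(sorted(runs), len(runs["small"]))  # ['page', 'small'] 2
```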