
Bigtable: A Distributed Storage System for Structured Data Review


DESCRIPTION

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.


Bigtable: A Distributed Storage System for Structured Data
Paper Review

By Syed Jibranuddin
UB #(50026775)

Bigtable is a distributed storage system built on top of GFS and the Chubby lock service for storing large-scale structured data at Google. It provides a flexible, high-performance solution for applications with very different constraints and requirements. Bigtable is a sparse, distributed, persistent, multidimensional sorted map, indexed by a row key, a column key, and a timestamp. In many ways Bigtable resembles a database, but it does not support a full relational model. Successive values can be written to the same <Row, Column> cell; the versions are distinguished by their timestamps. In addition, every column belongs to a column family, which provides another layer of organization for the data. The paper explains the data organization and the implementation of Bigtable, presents performance measurements, and then describes its real-world use in Google products such as Google Earth and Google Analytics.

Tables are partitioned into tablets, which are the basic unit for load balancing, fault tolerance, and other operations. When a client accesses Bigtable for the first time, it first contacts Chubby to obtain the location of the root tablet, then uses the location information in the root tablet to find the appropriate metadata tablet. That metadata tablet holds the locations of the actual data tablets on the tablet servers.

The master is responsible for load balancing and failure recovery across the whole system. Each tablet server acquires a lock in Chubby, on a file in a dedicated servers directory, and the lock is tied to a lease that the server must keep refreshing. The master can simply monitor that directory to get a full view of the live tablet servers. When a tablet server fails, the master reassigns its tablets to other tablet servers, which recover the tablets' state from the data persisted in GFS.

Key Features:

● Bigtable's architecture consists of a client library, one master server (for assigning tablets to tablet servers, load balancing, and garbage collection), and many tablet servers.

● Bigtable indexes each value by a three-part key: (row, column, timestamp).

● It uses GFS for persistent storage of data.

● Bigtable uses the Chubby distributed lock service for coordination: ensuring a single active master, tracking live tablet servers, and storing the location of the root tablet.

● A single tablet server handles all reads and writes for a given tablet.

● Bigtable treats data as uninterpreted strings.

● Clients rarely communicate with the master, because they do not rely on it for tablet location information.

Pros

● Bigtable chooses a very simple data model and gives users control over the detailed layout and format of their data according to their own needs.

● The data model supports a wide range of applications, e.g., Google Earth, web crawls, Percolator, and so on.

● Interaction with the master is reduced because client requests go directly to the tablet servers.

● Tablet servers can be dynamically added to or removed from a cluster to accommodate changing workloads.

● Bigtable provides good data locality: rows are kept sorted by row key, so related rows can be stored close together.

● It is highly scalable, to thousands of machines.
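To make the data model in this review concrete, here is a minimal in-memory sketch of a sorted map indexed by row key, column key, and timestamp. It is only an illustration under stated assumptions: the class `TinyTable`, its method names, and the sample row and column names are hypothetical, and nothing here reflects how Bigtable is actually implemented or exposed to clients.

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy model of a (row, column, timestamp) -> value sorted map.

    Hypothetical illustration only; not Bigtable's API or implementation.
    """

    def __init__(self):
        self._keys = []   # sorted list of (row, column) pairs
        self._cells = {}  # (row, column) -> [(timestamp, value), ...], newest first

    def put(self, row, column, timestamp, value):
        key = (row, column)
        if key not in self._cells:
            insort(self._keys, key)  # keep keys in sorted order
            self._cells[key] = []
        versions = self._cells[key]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        for ts, value in self._cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan_rows(self, start_row, end_row):
        """Yield (row, column, newest value) for start_row <= row < end_row."""
        i = bisect_left(self._keys, (start_row, ""))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column = self._keys[i]
            yield row, column, self._cells[(row, column)][0][1]
            i += 1


# Sample data; the row and column names are made up for illustration.
table = TinyTable()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("org.example", "contents:", 1, "<html>ex")

latest = table.get("com.cnn.www", "contents:")
older = table.get("com.cnn.www", "contents:", timestamp=4)
com_rows = list(table.scan_rows("com.", "com/"))
```

Because keys are kept sorted, the range scan touches only the contiguous run of rows starting with `com.`, which is the locality property listed above. Multiple versions of a cell coexist and are selected by timestamp.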

Cons

● Bigtable is not a relational database.

● Bigtable does not support general transactions across row keys.

● Only one master node is responsible for tablet assignment, detecting tablet server expiration, load balancing, garbage collection, and column family creation, so it can become a bottleneck for these operations.
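The master's role in detecting expired tablet servers and reassigning their tablets can be sketched roughly as follows. This is a simplified model, not the real protocol: `ToyLockDirectory` and `ToyMaster` are hypothetical names, the lease is modeled as a plain expiry time, and the "least loaded server" placement rule is an assumption made for illustration. Recovery of tablet state from GFS is not modeled at all.

```python
class ToyLockDirectory:
    """Stand-in for a lock directory: one leased lock per tablet server.

    Hypothetical illustration; it does not model Chubby's actual interface.
    """

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self._expires_at = {}  # server name -> lease expiry time

    def refresh(self, server, now):
        # A server acquires or renews its lock by extending the lease.
        self._expires_at[server] = now + self.lease_seconds

    def live_servers(self, now):
        return {s for s, t in self._expires_at.items() if t > now}


class ToyMaster:
    """Single master that assigns tablets and reassigns them after failures."""

    def __init__(self, locks):
        self.locks = locks
        self.assignments = {}  # tablet -> server

    def _least_loaded(self, servers):
        # Assumed placement policy: pick the live server with the fewest tablets.
        load = {s: 0 for s in servers}
        for server in self.assignments.values():
            if server in load:
                load[server] += 1
        return min(sorted(load), key=load.get)

    def assign(self, tablet, now):
        live = self.locks.live_servers(now)
        if not live:
            raise RuntimeError("no live tablet servers")
        self.assignments[tablet] = self._least_loaded(live)
        return self.assignments[tablet]

    def sweep(self, now):
        """Reassign every tablet whose server no longer holds a live lock."""
        live = self.locks.live_servers(now)
        orphaned = sorted(t for t, s in self.assignments.items() if s not in live)
        for tablet in orphaned:
            del self.assignments[tablet]
        return {tablet: self.assign(tablet, now) for tablet in orphaned}


locks = ToyLockDirectory(lease_seconds=10)
locks.refresh("ts1", now=0)
locks.refresh("ts2", now=0)

master = ToyMaster(locks)
for name in ("t1", "t2", "t3"):
    master.assign(name, now=0)

locks.refresh("ts2", now=5)   # only ts2 renews its lease
moved = master.sweep(now=12)  # the lease held by ts1 has expired by now
```

Every assignment decision in this sketch passes through the single `ToyMaster` object, which is the potential bottleneck the last point above raises.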

Discussion