CSC590 Selected Topics

Bigtable: A Distributed Storage System for Structured Data

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach

Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

by Haifa Alyahya 432920323

Outline

• Introduction

• Data Model

• APIs

• Building Blocks

• Implementation

• Refinements

• Performance

• Real Applications

• Conclusion

Discussion

Introduction

• Bigtable (Bt) is a distributed storage system for managing structured data that is designed to scale to a very large size.

• Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance.

• Bigtable is designed to reliably scale to petabytes of data and thousands of machines.

• Bigtable has achieved several goals:

– Wide applicability.

– Scalability.

– High performance.

– High availability.

Motivation

• Scale problem

– Lots of data

– Millions of machines

– Different projects/applications

– Hundreds of millions of users

• Storage for (semi-)structured data

• No commercial system was big enough

– Couldn’t afford it if there was one

• Low-level storage optimizations help performance significantly

– Much harder to do when running on top of a database layer

Data Model

• A sparse, distributed, persistent multi-dimensional sorted map

(row, column, timestamp) -> cell contents
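The mapping above can be sketched as a small in-memory class. This is only an illustration of the data model, not Bigtable's implementation: the class name `ToyTable` and its methods are hypothetical, and the ordering of columns and timestamps within a row is an assumption made for the example.

```python
from typing import Dict, List, Optional, Tuple

# A key is (row, "family:qualifier", timestamp). Hypothetical type alias.
Key = Tuple[str, str, int]


class ToyTable:
    """Illustrative (row, column, timestamp) -> cell contents map."""

    def __init__(self) -> None:
        # Sparse: only cells that were actually written take up space.
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str, timestamp: int) -> Optional[bytes]:
        return self._cells.get((row, column, timestamp))

    def sorted_keys(self) -> List[Key]:
        # Rows in lexicographic order, then columns, then newest version first
        # (the column and timestamp ordering is an assumption of this sketch).
        return sorted(self._cells, key=lambda k: (k[0], k[1], -k[2]))


table = ToyTable()
table.put("com.cnn.www", "contents:", 5, b"<html>...")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
for key in table.sorted_keys():
    print(key)
```

A plain dictionary keeps the sketch short; a real store would keep the keys sorted on disk rather than sorting on every read.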

Data Model

• Rows

– Arbitrary string

– Access to data in a row is atomic

– Ordered lexicographically

Data Model

• Column

– Two-level name structure:

• family:qualifier

– Column Family is the unit of access control

Data Model

• Timestamps

– Store different versions of data in a cell

– Lookup options

• Return most recent K values

• Return all values
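The two lookup options can be sketched for a single cell as follows. The class `VersionedCell` and its method names are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Optional, Tuple


class VersionedCell:
    """Illustrative cell that keeps one value per timestamp."""

    def __init__(self) -> None:
        self._versions: Dict[int, bytes] = {}

    def write(self, timestamp: int, value: bytes) -> None:
        self._versions[timestamp] = value

    def read(self, k: Optional[int] = None) -> List[Tuple[int, bytes]]:
        # Newest version first; k=None means "return all values".
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first if k is None else newest_first[:k]


cell = VersionedCell()
cell.write(1, b"old")
cell.write(2, b"newer")
cell.write(3, b"newest")
print(cell.read(k=2))  # the two most recent versions
print(cell.read())     # every stored version
```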

Data Model

• The row range for a table is dynamically partitioned

• Each row range is called a tablet

• Tablet is the unit for distribution and load balancing
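Because rows are sorted, finding the tablet for a row reduces to a binary search over the tablet boundaries. The sketch below assumes half-open row ranges and made-up server names; `TabletMap` is a hypothetical helper, not part of Bigtable.

```python
import bisect
from typing import List


class TabletMap:
    """Illustrative mapping from a row key to the tablet holding it."""

    def __init__(self, split_points: List[str], servers: List[str]) -> None:
        # Assumption: tablet i holds rows in [split_points[i-1], split_points[i]),
        # with the first and last tablets open-ended.
        if len(servers) != len(split_points) + 1:
            raise ValueError("need exactly one server per tablet")
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = split_points
        self.servers = servers

    def tablet_index(self, row: str) -> int:
        # Binary search over the sorted boundaries.
        return bisect.bisect_right(self.split_points, row)

    def server_for(self, row: str) -> str:
        return self.servers[self.tablet_index(row)]


tablets = TabletMap(["g", "p"], ["server-a", "server-b", "server-c"])
print(tablets.server_for("melon"))
```

Splitting a tablet for load balancing would then amount to inserting one more split point and assigning one more server.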

APIs

• Metadata operations

– Create/delete tables, column families, change metadata

• Writes

– Set(): write cells in a row

– DeleteCells(): delete cells in a row

– DeleteRow(): delete all cells in a row

• Reads

– Scanner: read arbitrary cells in a Bigtable

• Each row read is atomic

• Can restrict returned rows to a particular range

• Can ask for just data from 1 row, all rows, etc.

• Can ask for all columns, just certain column families, or specific columns
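The write and read operations listed above can be mimicked with a small in-memory stand-in. This is not the Bigtable client library: `ToyBigtable`, its method names, and the `scan` parameters are hypothetical and only echo the operations named on the slide. Timestamps are left out to keep the sketch short.

```python
from typing import Dict, Iterator, Optional, Tuple


class ToyBigtable:
    """Illustrative in-memory stand-in for the write and scan operations."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, bytes]] = {}

    def set(self, row: str, column: str, value: bytes) -> None:
        # Write one cell in a row.
        self._rows.setdefault(row, {})[column] = value

    def delete_cells(self, row: str, column: str) -> None:
        # Delete one cell; drop the row once it has no cells left.
        cells = self._rows.get(row, {})
        cells.pop(column, None)
        if not cells:
            self._rows.pop(row, None)

    def delete_row(self, row: str) -> None:
        # Delete all cells in a row.
        self._rows.pop(row, None)

    def scan(
        self,
        start_row: str = "",
        end_row: Optional[str] = None,
        family: Optional[str] = None,
    ) -> Iterator[Tuple[str, str, bytes]]:
        # Yield cells in row order, restricted to [start_row, end_row)
        # and, optionally, to a single column family.
        for row in sorted(self._rows):
            if row < start_row:
                continue
            if end_row is not None and row >= end_row:
                break
            for column in sorted(self._rows[row]):
                if family is None or column.split(":", 1)[0] == family:
                    yield row, column, self._rows[row][column]


bt = ToyBigtable()
bt.set("com.cnn.www", "anchor:cnnsi.com", b"CNN")
bt.set("com.cnn.www", "contents:", b"<html>...")
for cell in bt.scan(family="anchor"):
    print(cell)
```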

APIs

Building Blocks

• Google File System (GFS)

– Stores persistent data (SSTable file format)

• Scheduler

– Schedules jobs onto machines

• Chubby

– Lock service: distributed lock manager

– Master election, location bootstrapping

• MapReduce (optional)

– Data processing

– Read/write Bigtable data

Chubby

• {lock/file/name} service

• Coarse-grained locks

• Each client has a session with Chubby.

– A client's session expires if the client is unable to renew its session lease within the lease expiration time.

• 5 replicas, need a majority vote to be active

• Also an OSDI ’06 paper
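The session lease and the majority rule can be sketched as follows. This is a simplified model, not Chubby's protocol: the `Session` class, the `has_quorum` function, and the explicit `now` arguments (used instead of a real clock so the behavior is deterministic) are all assumptions made for the example.

```python
class Session:
    """Illustrative client session guarded by a renewable lease."""

    def __init__(self, lease_seconds: float, now: float) -> None:
        self.lease_seconds = lease_seconds
        self.expires_at = now + lease_seconds

    def is_expired(self, now: float) -> bool:
        return now >= self.expires_at

    def renew(self, now: float) -> bool:
        # A lease can only be extended while it is still valid.
        if self.is_expired(now):
            return False
        self.expires_at = now + self.lease_seconds
        return True


def has_quorum(live_replicas: int, total_replicas: int = 5) -> bool:
    # The service is active only while a strict majority of replicas is up.
    return live_replicas > total_replicas // 2


session = Session(lease_seconds=10, now=0)
print(session.renew(now=5))    # renewed in time
print(session.renew(now=20))   # too late, the session has expired
print(has_quorum(3), has_quorum(2))
```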

Implementation

• The Bigtable implementation has three major components:

– A library that is linked into every client

– One master server

– Many tablet servers

Tablet Location Management

Refinements

• Locality groups:

– Clients can group multiple column families together into a locality group.

• Compression:

– Uses Bentley and McIlroy's scheme and a fast compression algorithm.

• Caching for read performance:

– Uses a Scan Cache and a Block Cache.

• Bloom filters:

– Reduce the number of disk accesses.
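A Bloom filter reduces accesses because it can answer "definitely not present" without touching storage. The sketch below is a generic Bloom filter, not Bigtable's implementation; the bit-array size, the number of hashes, and the use of SHA-256 to derive positions are assumptions chosen for brevity.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means the key was never added; True means "possibly added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bloom = BloomFilter()
bloom.add("com.cnn.www/contents:")
if bloom.might_contain("com.cnn.www/contents:"):
    print("key may be present, so read the file")
```

A reader would consult the filter first and skip the file entirely whenever `might_contain` returns False.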

Performance Evaluation

Real Applications

• Google Analytics

– http://analytics.google.com

• Google Earth

– http://earth.google.com

• Personalized search

– www.google.com/psearch

Conclusions

• Users like…

– the performance and high availability provided by the Bigtable implementation

– that they can scale the capacity of their clusters by simply adding more machines to the system as their resource demands change over time

• There are significant advantages to building a custom storage solution

• Challenges…

– User adoption and acceptance of a new interface

– Implementation issues
