

Google Bigtable: A Distributed Storage System for Structured Data

Hadi Salimi, Distributed Systems Laboratory,

School of Computer Engineering, Iran University of Science and Technology,

[email protected]


Introduction

• Bigtable is a distributed storage system for managing structured data.

• Scales to petabytes of data across thousands of machines.

• Developed at Google and in production use there since 2005; used by more than 60 Google products.


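Data Model

• (row: string, column: string, timestamp: int64) => string

• Row keys, column keys, and values are arbitrary strings.

• Every read or write of data under a single row key is atomic, regardless of the number of different columns being read or written in that row.

• Columns are added dynamically.

• Timestamps distinguish different versions of the same cell.
  – Assigned either by Bigtable (real time in microseconds) or explicitly by the client application.
  – Older versions are garbage-collected according to per-column-family settings (e.g., keep only the last n versions, or only sufficiently recent ones).

• Example: Web map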
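To make the (row, column, timestamp) map concrete, here is a minimal in-memory sketch in Python. It is not Bigtable's implementation: `SparseTable`, `put`, `get`, and the `max_versions` garbage-collection rule (keep only the newest N versions of each cell) are hypothetical names and an assumed policy chosen for illustration.

```python
from __future__ import annotations

from collections import defaultdict


class SparseTable:
    """Hypothetical toy model of the map (row, column, timestamp) -> value."""

    def __init__(self, max_versions: int = 3) -> None:
        # Each (row, column) cell holds (timestamp, value) pairs, newest first.
        self.cells: dict[tuple[str, str], list[tuple[int, str]]] = defaultdict(list)
        # Assumed GC policy: keep only the newest `max_versions` versions per cell.
        self.max_versions = max_versions

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        versions = self.cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # garbage-collect older versions

    def get(self, row: str, column: str, timestamp: int | None = None) -> str | None:
        """Return the newest value at or before `timestamp` (the newest overall if None)."""
        for ts, value in self.cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = SparseTable(max_versions=2)
for ts in (3, 5, 6):
    table.put("com.cnn.www", "contents:", ts, f"<html>... v{ts}")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")

print(table.get("com.cnn.www", "contents:"))     # newest version (t6)
print(table.get("com.cnn.www", "contents:", 5))  # version as of t5
print(table.get("com.cnn.www", "contents:", 3))  # None: t3 was garbage-collected
```

Keeping versions newest first means a read for "the latest value" stops at the first entry.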


Tablets

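• Rows are sorted lexicographically by row key.

• Ranges of consecutive row keys are grouped together as “tablets”.
  – This provides data locality.
  – Example: the rows com.google.maps/index.html and com.google.maps/foo.html are likely to be in the same tablet (URLs are stored with the hostname reversed, so pages from the same domain sort next to each other).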
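A small sketch of why reversed-hostname row keys give locality, assuming Python and two hypothetical helpers: `row_key_for_url` reverses the hostname, and `group_into_tablets` sorts the keys and cuts them into contiguous ranges. Cutting by a fixed row count is a simplification made here; real tablets are split by size.

```python
from __future__ import annotations


def row_key_for_url(url: str) -> str:
    """Build a row key by reversing the hostname, e.g. maps.google.com -> com.google.maps."""
    host, _, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + "/" + path


def group_into_tablets(row_keys: list[str], rows_per_tablet: int) -> list[list[str]]:
    """Sort keys lexicographically, then cut the sorted run into contiguous ranges."""
    ordered = sorted(row_keys)
    return [ordered[i:i + rows_per_tablet] for i in range(0, len(ordered), rows_per_tablet)]


urls = [
    "maps.google.com/index.html",
    "www.cnn.com/index.html",
    "maps.google.com/foo.html",
    "news.bbc.co.uk/index.html",
]
tablets = group_into_tablets([row_key_for_url(u) for u in urls], rows_per_tablet=3)
for n, tablet in enumerate(tablets):
    print(n, tablet)
```

With `rows_per_tablet=3`, the two maps.google.com pages end up adjacent in sorted order and land in the same (first) tablet.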


Column Families

• Column keys are grouped into sets called “column families”.

• Column keys use the syntax family:qualifier.

• Access control and disk/memory accounting are performed at the column-family level.

• Example: “anchor:cnnsi.com”
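Column keys can be parsed and grouped by family as below. This is a sketch under assumptions: `parse_column_key` splits at the first colon (so the qualifier may be empty or contain further colons), and `group_by_family` is a hypothetical helper that shows the family acting as the unit for access control and accounting.

```python
from __future__ import annotations

from collections import defaultdict


def parse_column_key(column_key: str) -> tuple[str, str]:
    """Split 'family:qualifier' at the first colon; the qualifier may be empty."""
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError(f"expected family:qualifier, got {column_key!r}")
    return family, qualifier


def group_by_family(column_keys: list[str]) -> dict[str, list[str]]:
    """Group qualifiers under their family (the unit for access control and accounting)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key in column_keys:
        family, qualifier = parse_column_key(key)
        groups[family].append(qualifier)
    return dict(groups)


print(parse_column_key("anchor:cnnsi.com"))
print(group_by_family(["anchor:cnnsi.com", "anchor:my.look.ca", "contents:"]))
```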


API

• Data design
  – Creating/deleting tables and column families.
  – Changing cluster, table, and column family metadata, such as access control rights.

• Client interactions
  – Write/delete values.
  – Read values.
  – Scan row ranges.
  – Single-row transactions (e.g., an atomic read/modify/write sequence on data under one row key).

• MapReduce integration
  – MapReduce jobs can read from Bigtable and write to Bigtable.
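The client interactions above can be illustrated with a toy in-process table. `Table`, `apply`, `read`, `scan`, and `read_modify_write` are hypothetical names, not the real Bigtable client library. Here atomicity comes from a Python lock and only covers callers in one process, whereas Bigtable provides per-row atomicity for all clients.

```python
from __future__ import annotations

import threading
from bisect import bisect_left, insort
from collections.abc import Callable


class Table:
    """Hypothetical toy table: row key -> {column key: value} (versions omitted)."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, str]] = {}
        self._sorted_keys: list[str] = []  # row keys kept sorted for range scans
        self._lock = threading.RLock()     # re-entrant so read_modify_write can call apply

    def apply(self, row: str, sets: dict[str, str], deletes: tuple[str, ...] = ()) -> None:
        """Apply all sets and deletes to one row as a single atomic step."""
        with self._lock:
            updated = dict(self._rows.get(row, {}))  # build the new row on a copy
            updated.update(sets)
            for column in deletes:
                updated.pop(column, None)
            if row not in self._rows:
                insort(self._sorted_keys, row)
            self._rows[row] = updated  # readers see all of the changes or none

    def read(self, row: str) -> dict[str, str]:
        with self._lock:
            return dict(self._rows.get(row, {}))

    def scan(self, start: str, end: str) -> list[tuple[str, dict[str, str]]]:
        """Return rows with start <= key < end, in key order."""
        with self._lock:
            i = bisect_left(self._sorted_keys, start)
            result = []
            while i < len(self._sorted_keys) and self._sorted_keys[i] < end:
                key = self._sorted_keys[i]
                result.append((key, dict(self._rows[key])))
                i += 1
            return result

    def read_modify_write(self, row: str, column: str,
                          update: Callable[[str | None], str]) -> str:
        """Single-row transaction: read a cell, compute a new value, write it back."""
        with self._lock:
            new_value = update(self._rows.get(row, {}).get(column))
            self.apply(row, {column: new_value})
            return new_value


t = Table()
t.apply("com.cnn.www", {"anchor:www.c-span.org": "CNN", "anchor:www.abc.com": "ABC"})
t.apply("com.cnn.www", {"anchor:www.c-span.org": "CNN"}, deletes=("anchor:www.abc.com",))
t.apply("com.bbc.www", {"contents:": "<html>..."})
hits = t.read_modify_write("com.cnn.www", "stats:hits", lambda v: str(int(v or "0") + 1))
print(t.read("com.cnn.www"))
print([key for key, _ in t.scan("com.a", "com.c")])
```

The `stats:hits` column is a hypothetical example of a counter updated with a read/modify/write transaction.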


Building Blocks

• SSTable file: the file format used to store data.
  – Maps keys to values.
  – Ordered, which gives data locality for efficient reads and writes.
  – Immutable: reads need no concurrency control, but deleted data must be garbage-collected.
  – Stored in the Google File System (GFS), and can optionally be mapped into memory.
  – GFS replicates the data for redundancy.

• Chubby: distributed lock service.
  – Stores the location of the root tablet, schema information, and access control lists.
  – Used for synchronization (e.g., ensuring there is at most one active master) and to discover tablet servers and detect their failure.
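The SSTable properties listed above (an ordered, immutable key/value map) can be sketched in a few lines of Python. The class below is hypothetical and keeps everything in memory; a real SSTable is a file in GFS divided into blocks with a block index, which this sketch omits. Because the keys are sorted, lookups and range scans can use binary search.

```python
from __future__ import annotations

from bisect import bisect_left
from collections.abc import Iterator


class SSTable:
    """Hypothetical in-memory stand-in for an SSTable: an immutable, sorted key->value map."""

    def __init__(self, items: dict[str, str]) -> None:
        # Sort once at build time; tuples are never modified afterwards.
        pairs = sorted(items.items())
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key: str) -> str | None:
        # Binary search works because the keys are sorted.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: str, end: str) -> Iterator[tuple[str, str]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._values[i]
            i += 1


sst = SSTable({"b": "2", "a": "1", "c": "3"})
print(sst.get("b"), sst.get("z"))
print(list(sst.scan("a", "c")))
```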


Implementation

Three components:

1. Client library

2. Master server (exactly one active master)
  • Assigns tablets to tablet servers.
  • Detects the addition and expiration of tablet servers.
  • Balances tablet-server load.
  • Garbage-collects files in GFS.
  • Handles schema changes such as table and column family creation.

3. Tablet servers (many; added and removed dynamically)
  • Handle read and write requests for the tablets they have loaded.
  • Split tablets that have grown too large; by default each tablet is roughly 100-200 MB.
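Tablet splitting can be sketched as a function that cuts a tablet in two once it passes a size threshold. The 200 MB threshold (the upper end of the range above), the name `maybe_split`, and the choice to cut at the row boundary nearest the size midpoint are assumptions made for illustration.

```python
from __future__ import annotations

MB = 1024 * 1024


def maybe_split(rows: list[tuple[str, int]], threshold: int) -> list[list[tuple[str, int]]]:
    """Split a tablet's sorted (row_key, size_bytes) list in two once it exceeds `threshold`.

    The cut falls on a row boundary near the size midpoint; each half keeps at least one row.
    """
    total = sum(size for _, size in rows)
    if total <= threshold or len(rows) < 2:
        return [rows]
    running = 0
    for i, (_, size) in enumerate(rows):
        running += size
        if running >= total / 2:
            cut = min(i + 1, len(rows) - 1)
            return [rows[:cut], rows[cut:]]
    return [rows]  # not reached: running ends at total, which is >= total / 2


rows = [("a", 60 * MB), ("b", 60 * MB), ("c", 60 * MB), ("d", 60 * MB)]
left, right = maybe_split(rows, threshold=200 * MB)
print([k for k, _ in left], [k for k, _ in right])  # ['a', 'b'] ['c', 'd']
```

Splitting only at row boundaries keeps each row inside a single tablet, which is what makes single-row atomicity cheap to provide.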


Tablet Location

• How does a client know which tablet server to route a request to?

• Three-level hierarchy:
  – A file in Chubby stores the location of the root tablet.
  – The root tablet stores the locations of the Metadata tablets.
  – The Metadata table stores the locations of the user tablets:
      Row key: [tablet’s table ID] + [end row]
      Value: location of the tablet (the tablet server serving it)

• The client library caches tablet locations.
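At each level of the hierarchy, the lookup amounts to finding the first tablet whose end row is at or after the requested row key. The Python sketch below shows one such level. `MetadataIndex`, the `LAST` sentinel for the unbounded final tablet, and the server names are hypothetical, and the walk from the Chubby file through the root tablet is not modeled.

```python
from __future__ import annotations

from bisect import bisect_left

LAST = "\U0010ffff"  # hypothetical sentinel end row that sorts after ordinary row keys


class MetadataIndex:
    """Hypothetical toy Metadata table: (table_id, end_row) -> tablet server location."""

    def __init__(self, entries: dict[tuple[str, str], str]) -> None:
        self._keys = sorted(entries)
        self._locations = [entries[k] for k in self._keys]

    def locate(self, table_id: str, row_key: str) -> str:
        # Tuples compare table ID first, then end row; the owning tablet is the
        # first entry of the same table whose end row is >= row_key.
        i = bisect_left(self._keys, (table_id, row_key))
        if i == len(self._keys) or self._keys[i][0] != table_id:
            raise KeyError(f"no tablet for {table_id}/{row_key}")
        return self._locations[i]


meta = MetadataIndex({
    ("webtable", "com.cnn.www"): "tabletserver-1",
    ("webtable", LAST): "tabletserver-2",
})
print(meta.locate("webtable", "com.bbc.www"))                 # tabletserver-1
print(meta.locate("webtable", "com.google.maps/index.html"))  # tabletserver-2
```

Keying on the end row (rather than the start row) means a single `bisect_left` finds the owning tablet, and end rows are inclusive.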


Tablet Assignment

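• The master keeps track of tablet assignments and of which tablet servers are live.

• Chubby
  – Each tablet server creates, and acquires an exclusive lock on, a uniquely named file in Chubby.
  – A tablet server stops serving its tablets if it loses its lock.
  – The master periodically checks each tablet server. If a server has lost its lock or cannot be reached, the master tries to acquire the lock on that server's file; if it succeeds, it deletes the file and un-assigns the server's tablets.
  – A master failure does not change the assignment of tablets to tablet servers.

• Master restart
  – The new master acquires the master lock in Chubby, finds the live tablet servers through their files, asks each one for its current assignments, and scans the Metadata table to find unassigned tablets.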
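The lock-based failure detection can be simulated with an in-memory stand-in for Chubby. Everything here is hypothetical (`FakeLockService`, the `/bigtable/servers/` path, `master_check`); real Chubby is a replicated service with sessions and leases, which this sketch reduces to a dictionary of lock holders.

```python
from __future__ import annotations


class FakeLockService:
    """Hypothetical in-memory stand-in for Chubby: file path -> current lock holder."""

    def __init__(self) -> None:
        self.holders: dict[str, str] = {}

    def try_acquire(self, path: str, owner: str) -> bool:
        holder = self.holders.get(path)
        if holder is not None and holder != owner:
            return False
        self.holders[path] = owner
        return True

    def release(self, path: str, owner: str) -> None:
        """Models a lost lock, e.g. an expired session."""
        if self.holders.get(path) == owner:
            del self.holders[path]

    def delete(self, path: str) -> None:
        self.holders.pop(path, None)


def server_file(server: str) -> str:
    return f"/bigtable/servers/{server}"  # hypothetical directory layout


def can_serve(locks: FakeLockService, server: str) -> bool:
    """A tablet server keeps serving only while it still holds its own lock."""
    return locks.holders.get(server_file(server)) == server


def master_check(locks: FakeLockService, server: str,
                 assignments: dict[str, str], unassigned: set[str]) -> None:
    """If the master can grab a server's lock, treat that server as dead."""
    path = server_file(server)
    if locks.try_acquire(path, "master"):
        locks.delete(path)  # the server can never serve again under this file
        for tablet, owner in list(assignments.items()):
            if owner == server:
                del assignments[tablet]
                unassigned.add(tablet)


locks = FakeLockService()
for s in ("ts1", "ts2"):
    locks.try_acquire(server_file(s), s)
assignments = {"tablet-a": "ts1", "tablet-b": "ts2"}
unassigned: set[str] = set()

master_check(locks, "ts1", assignments, unassigned)  # ts1 still holds its lock: no change
locks.release(server_file("ts1"), "ts1")             # ts1 loses its lock
master_check(locks, "ts1", assignments, unassigned)  # tablet-a becomes unassigned
print(assignments, unassigned, can_serve(locks, "ts1"))
```

The design choice illustrated is that the master only declares a server dead after it can take that server's lock itself, so a server that is merely slow to answer the master cannot be double-assigned while it still holds its lock.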


Tablet Serving

Write

1. Check that the request is well-formed.

2. Check authorization against the list of permitted writers in a Chubby file.

3. Write to the “tablet log” (a commit log used to “redo” writes after a failure).

4. Write to the memtable (in RAM).

5. Separately, “compaction” moves memtable data into SSTables and truncates the tablet log.

Read

1. Check well-formedness of request.

2. Check authorization in Chubby file.

3. Merge memtable and SSTables to find data.

4. Return data.
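A minimal sketch of the write and read paths for a single tablet, assuming Python dictionaries as stand-ins for the tablet log, the memtable, and the SSTables. Authorization checks are omitted, and `TabletServerSketch` and the `TOMBSTONE` deletion marker are hypothetical names. The key ordering rule is that a write is logged before it touches the memtable, so replaying the log can rebuild the memtable after a crash.

```python
from __future__ import annotations

TOMBSTONE = object()  # hypothetical deletion marker stored in place of a value


class TabletServerSketch:
    """Hypothetical single-tablet write/read path (authorization checks omitted)."""

    def __init__(self, sstables: list[dict[str, object]] | None = None) -> None:
        self.log: list[tuple[str, object]] = []  # stand-in for the tablet log in GFS
        self.memtable: dict[str, object] = {}    # recent writes, held in RAM
        self.sstables = sstables if sstables is not None else []  # newest first

    def write(self, key: str, value: str) -> None:
        if not key:
            raise ValueError("malformed request: empty key")  # well-formedness check
        self.log.append((key, value))  # 1. log first so the write can be redone
        self.memtable[key] = value     # 2. then apply it to the memtable

    def delete(self, key: str) -> None:
        self.log.append((key, TOMBSTONE))
        self.memtable[key] = TOMBSTONE

    def read(self, key: str) -> str | None:
        # Merged view: the memtable is newest, then SSTables from newest to oldest.
        for source in (self.memtable, *self.sstables):
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None

    def recover(self) -> None:
        """Rebuild the memtable after a crash by replaying the tablet log."""
        self.memtable = {}
        for key, value in self.log:
            self.memtable[key] = value


server = TabletServerSketch(sstables=[{"a": "old-a", "b": "old-b"}])
server.write("a", "new-a")
server.delete("b")
print(server.read("a"), server.read("b"))  # new-a None
server.memtable = {}  # simulate losing RAM in a crash
server.recover()
print(server.read("a"), server.read("b"))  # new-a None
```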


Compaction

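“Compaction” is used to bound the size of the memtable, the size of the tablet log, and the number of SSTable files.

1. Minor Compaction. When the memtable reaches a size threshold, it is frozen and written out as a new SSTable, and the tablet log is truncated.

2. Merging Compaction. Merges a few SSTables and the memtable into a single new SSTable.

3. Major Compaction. A merging compaction that rewrites all SSTables into exactly one SSTable containing no deleted data.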
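The three kinds of compaction can be sketched on a toy model: dictionaries for the memtable and SSTables (newest SSTable first) plus a hypothetical `TOMBSTONE` marker for deletions. `TabletStore` and `merge` are illustrative names, not Bigtable's code. The difference the sketch highlights is that a merging compaction must keep deletion markers, because older SSTables may still hold the deleted value, while a major compaction sees all the data and can drop them.

```python
from __future__ import annotations

TOMBSTONE = object()  # hypothetical deletion marker


def merge(sources: list[dict[str, object]], drop_tombstones: bool) -> dict[str, object]:
    """Merge newest-first sources into one sorted table; the newest entry per key wins."""
    merged: dict[str, object] = {}
    for source in reversed(sources):  # oldest first, so newer sources overwrite
        merged.update(source)
    if drop_tombstones:
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return dict(sorted(merged.items()))


class TabletStore:
    """Hypothetical memtable + SSTables + log for one tablet; SSTables are newest first."""

    def __init__(self) -> None:
        self.memtable: dict[str, object] = {}
        self.sstables: list[dict[str, object]] = []
        self.log: list[tuple[str, object]] = []

    def write(self, key: str, value: object) -> None:
        self.log.append((key, value))
        self.memtable[key] = value

    def delete(self, key: str) -> None:
        self.write(key, TOMBSTONE)

    def minor_compaction(self) -> None:
        """Freeze the memtable into a new SSTable; its log entries are no longer needed."""
        self.sstables.insert(0, dict(sorted(self.memtable.items())))
        self.memtable, self.log = {}, []

    def merging_compaction(self, how_many: int) -> None:
        """Fold the memtable and the newest `how_many` SSTables into one; keep tombstones."""
        merged = merge([self.memtable, *self.sstables[:how_many]], drop_tombstones=False)
        self.sstables = [merged, *self.sstables[how_many:]]
        self.memtable, self.log = {}, []

    def major_compaction(self) -> None:
        """Rewrite everything into exactly one SSTable with no deletion markers."""
        self.sstables = [merge([self.memtable, *self.sstables], drop_tombstones=True)]
        self.memtable, self.log = {}, []


store = TabletStore()
store.write("a", "1")
store.minor_compaction()           # SSTables: [{a: 1}]
store.write("b", "2")
store.minor_compaction()           # SSTables: [{b: 2}, {a: 1}]
store.delete("a")
store.merging_compaction(1)        # SSTables: [{a: TOMBSTONE, b: 2}, {a: 1}]
print(len(store.sstables), store.sstables[0]["a"] is TOMBSTONE)  # 2 True
store.major_compaction()
print(store.sstables)              # [{'b': '2'}]
```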


Refinements

• Locality groups
  – Clients can group multiple column families into a locality group. A separate SSTable is generated for each locality group, so reads that touch only some groups are more efficient.

• Compression
  – Clients can choose whether to compress data at the locality-group level.

• Two-level caching in tablet servers
  – Scan cache: key/value pairs.
  – Block cache: SSTable blocks read from GFS.

• Bloom filters
  – Efficiently check whether an SSTable might contain data for a given row/column pair.

• Commit log implementation
  – Each tablet server has a single commit log (not one per tablet).
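A Bloom filter answers “definitely not here” or “maybe here”, which lets a tablet server skip reading SSTables that cannot contain a row/column pair. The sketch below is a generic Bloom filter, not Bigtable's implementation; the bit-array size, the number of hashes, the SHA-256-based hashing, and the `cell_key` encoding are assumptions.

```python
from __future__ import annotations

import hashlib
from collections.abc import Iterator


class BloomFilter:
    """Minimal Bloom filter: may return false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def cell_key(row: str, column: str) -> str:
    return f"{row}\x00{column}"  # hypothetical encoding of a row/column pair


sstable_filter = BloomFilter()
sstable_filter.add(cell_key("com.cnn.www", "anchor:cnnsi.com"))

if sstable_filter.might_contain(cell_key("com.cnn.www", "anchor:cnnsi.com")):
    print("may be present: read the SSTable")
```

One filter per SSTable keeps the check local: a negative answer avoids a disk (GFS) read entirely, while a false positive only costs the read that would have happened anyway.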


Performance Evaluation

• Random reads are the slowest operation: each read must fetch an SSTable block from GFS.

• Writes are faster than reads: the commit log is append-only, whereas reads must merge the SSTables and the memtable.

• Scans reduce the number of read operations: a single RPC returns many values.


Performance Evaluation: Scaling

• Aggregate throughput does not scale linearly, but it scales reasonably well up to 250 tablet servers.

• Random reads scale worst: the block transfers saturate the network.


Conclusions

• Satisfies the goals of high availability, high performance, and massively scalable data storage.

• The API has been used successfully by more than 60 Google products.

• Additional features in progress:
  – Secondary indexes.
  – Cross-data-center replication.
  – Deployment as a hosted service.

• Advantages of custom development:
  – Significant flexibility from designing its own data model.
  – Bottlenecks and inefficiencies can be removed as they arise.


Bigtable Family Tree


Non-relational DBs (HBase, Cassandra, MongoDB, etc.)
• Column-oriented data model.
• Multi-level storage (commit log, RAM table, SSTable).
• Tablet management (assignment, splitting, recovery, GC, Bloom filters).

Google technologies and their open-source equivalents
• GFS => Hadoop Distributed File System (HDFS)
• Chubby => Apache ZooKeeper
• MapReduce => Hadoop MapReduce


Any questions?
