Authors: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson Hsieh, Deborah Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert Gruber. Bigtable: A Distributed Storage System for Structured Data. Presented by: Arif Bin Hossain, Dept. of Computer Science, UTSA.

Bigtable : A Distributed Storage System for Structured Data


Page 1: Bigtable : A Distributed Storage System for Structured Data

Authors: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson Hsieh, Deborah Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert Gruber

Bigtable: A Distributed Storage System for Structured Data

Presented by: Arif Bin Hossain, Dept. of Computer Science, UTSA

Page 2: Bigtable : A Distributed Storage System for Structured Data

Motivation

  • Large-scale structured data
      • URLs: contents, links, anchors, page rank
      • User data: preference settings, recent queries, search results
      • Geographic locations: physical entities, roads, satellite images

  • Large sets of structured MATLAB data
      • EEG, EMG, eye motion
      • Fields are not uniform among datasets
      • Data types are not uniform among datasets

Page 3: Bigtable : A Distributed Storage System for Structured Data

Why not Relational Database?

  • Scale is too large for most commercial databases
  • Even if it weren't, the cost would be very high
  • Low-level storage optimizations help performance significantly
  • It is hard to map semi-structured data to a relational database
  • Non-uniform fields make it difficult to insert and query data

Page 4: Bigtable : A Distributed Storage System for Structured Data

Bigtable

BigTable is a distributed storage system for managing structured data.

  • Designed to scale to a very large size
  • Used for many Google projects
      • Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance

Efficient scans over all or interesting subsets of data

Efficient joins of large one-to-one and one-to-many datasets

Page 5: Bigtable : A Distributed Storage System for Structured Data

Bigtable

  • Used for a variety of demanding workloads
      • Throughput-oriented batch processing
      • Latency-sensitive data serving
  • Data is indexed using row and column names
  • Treats data as uninterpreted strings
  • Clients can control the locality of their data
  • Dynamic control over whether data is served out of memory or from disk

Page 6: Bigtable : A Distributed Storage System for Structured Data

Building Blocks

  • Google File System (GFS)
      • Large-scale distributed file system
      • Maintains multiple replicas of the data
      • Consists of a master and chunk servers
  • Chunk servers
      • Store the data files
      • Each data file is broken into fixed-size chunks
      • Each chunk is replicated, three times by default
  • Master
      • Stores the metadata associated with the chunks

Page 7: Bigtable : A Distributed Storage System for Structured Data

Building Blocks

  • Chubby lock service
      • Has five active replicas
      • Provides a namespace that consists of directories and files
      • Each file can be used as a lock
      • Each Chubby client maintains a session with the Chubby service
      • When a client's session expires, the client loses any locks and open handles

Page 8: Bigtable : A Distributed Storage System for Structured Data

Building Blocks

  • SSTable
      • Immutable file format used internally to store data files
      • Sorted key-value pairs of arbitrary byte strings
      • Contains a sequence of blocks
      • A block index is used to locate blocks
      • The index is loaded into memory when the SSTable is opened
      • A lookup can then be performed with a single disk access

[Figure: an SSTable drawn as a block index plus a sequence of 64K blocks]
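The lookup path described on this slide can be sketched as follows. This is a minimal in-memory illustration, not Bigtable's actual file format: the class name `SSTableSketch`, the use of `String` keys instead of byte strings, and blocks cut by entry count rather than by 64K of bytes are all assumptions made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for an SSTable: sorted blocks plus an in-memory block index. */
final class SSTableSketch {
    // Each block holds a sorted run of key/value pairs (a real block lives on disk).
    private final List<TreeMap<String, String>> blocks = new ArrayList<>();
    // Block index: first key of each block -> position of that block.
    private final TreeMap<String, Integer> index = new TreeMap<>();
    // Counts simulated block reads so the "single access" property is visible.
    int blockReads = 0;

    /** Builds the table from sorted data, starting a new block every blockSize entries. */
    SSTableSketch(TreeMap<String, String> sorted, int blockSize) {
        TreeMap<String, String> current = null;
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            if (current == null || current.size() == blockSize) {
                current = new TreeMap<>();
                index.put(e.getKey(), blocks.size());
                blocks.add(current);
            }
            current.put(e.getKey(), e.getValue());
        }
    }

    /** Returns the value for key, or null; touches at most one block. */
    String get(String key) {
        // Search the in-memory index for the last block whose first key is <= key.
        Map.Entry<String, Integer> slot = index.floorEntry(key);
        if (slot == null) {
            return null; // key sorts before the first block, no read needed
        }
        blockReads++; // the one simulated disk access
        return blocks.get(slot.getValue()).get(key);
    }

    public static void main(String[] args) {
        TreeMap<String, String> data = new TreeMap<>();
        for (char c = 'a'; c <= 'i'; c++) {
            data.put(String.valueOf(c), "value-" + c);
        }
        SSTableSketch table = new SSTableSketch(data, 3);
        System.out.println(table.get("e") + " after " + table.blockReads + " block read(s)");
    }
}
```

Because the index is searched entirely in memory, the only "disk" work per lookup is reading the one block that could contain the key.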

Page 9: Bigtable : A Distributed Storage System for Structured Data

Basic Data Model

A table is a sparse, distributed, persistent multidimensional sorted map

Data is organized into three dimensions: (row: string, column: string, time: int64) -> string

Each cell is referenced by a row key, column key and timestamp
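One way to picture this map is as three nested sorted maps. The sketch below is illustrative only: `SparseTableSketch` and its methods are hypothetical names, values are `String` rather than uninterpreted bytes, and nothing here is distributed or persistent.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of the (row, column, timestamp) -> value map; not Bigtable's real API. */
final class SparseTableSketch {
    // row key -> column key -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    /** Stores one version of a cell; rows and columns appear on first write. */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell was never written. */
    String getLatest(String row, String column) {
        TreeMap<Long, String> versions = versionsOf(row, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Returns the version written at exactly this timestamp, or null. */
    String get(String row, String column, long timestamp) {
        TreeMap<Long, String> versions = versionsOf(row, column);
        return versions == null ? null : versions.get(timestamp);
    }

    /** Row keys come back in lexicographic order because the outer map is sorted. */
    Set<String> rowKeys() {
        return rows.keySet();
    }

    private TreeMap<Long, String> versionsOf(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        SparseTableSketch webtable = new SparseTableSketch();
        webtable.put("com.cnn.www", "contents:", 3L, "<html>...v3");
        webtable.put("com.cnn.www", "contents:", 6L, "<html>...v6");
        webtable.put("com.cnn.www", "anchor:cnnsi.com", 9L, "CNN");
        System.out.println(webtable.getLatest("com.cnn.www", "contents:"));
        System.out.println(webtable.rowKeys());
    }
}
```

The map is sparse in the sense that a row only stores the columns that were actually written to it.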

Page 10: Bigtable : A Distributed Storage System for Structured Data

Basic Data Model

(row, column, timestamp) -> cell contents

Example: webtable

Page 11: Bigtable : A Distributed Storage System for Structured Data

Data Model: Row

  • The row name is an arbitrary string
      • Access to data in a row is atomic
      • Row creation is implicit upon storing data
      • Transactions are supported within a single row
  • Rows are ordered lexicographically by row key
      • Rows that are close together lexicographically usually reside on one or a small number of machines
  • Rows are grouped together to form the unit of load balancing

Page 12: Bigtable : A Distributed Storage System for Structured Data

Data Model: Column

  • Columns have a two-level name structure: family:qualifier
      • Example: "anchor:cnnsi.com"
  • Column keys are grouped into sets called column families
      • Unit of access control
      • All data stored in a column family is usually of the same type
      • The qualifier gives an additional level of indexing, if desired
  • Main idea: limited families, unbounded columns

Page 13: Bigtable : A Distributed Storage System for Structured Data

Data Model: Timestamp

  • Used to store different versions of data in a cell
      • New writes default to the current time
      • Can also be set explicitly by clients
  • Lookup examples
      • "Return most recent K values"
      • "Return all values in timestamp range (or all values)"
  • Column families can be marked with garbage-collection settings
      • "Only retain most recent K values in a cell"
      • "Keep values until they are older than K seconds"
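The two retention settings quoted above can be sketched as simple filters over one cell's versions. This is an illustration under stated assumptions: `VersionGcSketch` and its method names are hypothetical, timestamps are treated as whole seconds, and the versions of a cell are held in a `TreeMap` sorted by ascending timestamp.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy versions of the two per-column-family retention rules. */
final class VersionGcSketch {
    /** "Only retain most recent K values in a cell." */
    static TreeMap<Long, String> retainNewest(TreeMap<Long, String> versions, int k) {
        TreeMap<Long, String> kept = new TreeMap<>();
        // Walk from the newest timestamp down and stop once k versions are kept.
        for (Map.Entry<Long, String> e : versions.descendingMap().entrySet()) {
            if (kept.size() == k) {
                break;
            }
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    /** "Keep values until they are older than K seconds." */
    static TreeMap<Long, String> retainYoungerThan(
            TreeMap<Long, String> versions, long nowSeconds, long k) {
        // Keep every version whose timestamp is at most k seconds before now.
        return new TreeMap<>(versions.tailMap(nowSeconds - k, true));
    }

    public static void main(String[] args) {
        TreeMap<Long, String> versions = new TreeMap<>();
        versions.put(100L, "oldest");
        versions.put(200L, "middle");
        versions.put(300L, "newest");
        System.out.println(retainNewest(versions, 2));
        System.out.println(retainYoungerThan(versions, 310L, 60L));
    }
}
```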

Page 14: Bigtable : A Distributed Storage System for Structured Data

Tablets

  • Rows with consecutive keys are grouped into tablets
      • Unit of load balancing
  • Reads of short row ranges are efficient and require communication with only a small number of machines
  • Clients can exploit this property to get good locality by choosing their row keys carefully
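A well-known example of choosing row keys for locality is the Bigtable paper's Webtable, which stores pages under reversed hostnames so that pages from the same domain sort next to each other. The sketch below illustrates that idea only; `RowKeySketch` and its methods are hypothetical helpers, and the URL handling is deliberately simplistic (no scheme or port parsing).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Builds row keys whose sort order groups pages of the same domain together. */
final class RowKeySketch {
    /** Turns "maps.google.com/index.html" into "com.google.maps/index.html". */
    static String rowKeyFor(String url) {
        int slash = url.indexOf('/');
        String host = slash < 0 ? url : url.substring(0, slash);
        String path = slash < 0 ? "" : url.substring(slash);
        List<String> labels = Arrays.asList(host.split("\\."));
        Collections.reverse(labels); // most significant label first
        return String.join(".", labels) + path;
    }

    /** Returns the row keys in the lexicographic order a sorted table would store them. */
    static List<String> sortedRowKeys(String... urls) {
        TreeSet<String> keys = new TreeSet<>();
        for (String url : urls) {
            keys.add(rowKeyFor(url));
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        System.out.println(sortedRowKeys(
            "www.google.com", "www.cnn.com", "maps.google.com/index.html"));
    }
}
```

With these keys, a scan over one domain touches a contiguous row range and therefore only a small number of tablets.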

Page 15: Bigtable : A Distributed Storage System for Structured Data

Tablets (cont.)

Contains some range of rows, essentially a set of SSTables

[Figure: a tablet containing two SSTables, each drawn as a block index plus a sequence of 64K blocks]

Page 16: Bigtable : A Distributed Storage System for Structured Data

Implementation

  • Three major components
      • A library linked into every client
      • A single master server
          • Assigns tablets to tablet servers
          • Detects the addition and expiration of tablet servers
          • Balances tablet-server load
          • Garbage-collects files in GFS
      • Many tablet servers
          • Each manages a set of tablets
          • Handles read and write requests to its tablets
          • Splits tablets that have grown too large

Page 17: Bigtable : A Distributed Storage System for Structured Data

Implementation (cont.)

  • Clients communicate directly with tablet servers for reads and writes
  • Each table consists of a set of tablets
      • Initially, each table has just one tablet
      • Tablets are automatically split as the table grows
  • Row size can be arbitrary (hundreds of GB)

Page 18: Bigtable : A Distributed Storage System for Structured Data

Locating Tablets

  • How do clients find the right machine?
      • They need to find the tablet whose row range covers the target row
  • Three-level hierarchy
      • Level 1: a Chubby file containing the location of the root tablet
      • Level 2: the root tablet contains the locations of the METADATA tablets
      • Level 3: each METADATA tablet contains the locations of user tablets
  • The location of a tablet is stored under a row key that encodes the table identifier and the tablet's end row
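The three-level lookup can be sketched with sorted maps keyed by end row, where the first entry at or after the target key identifies the covering tablet. Everything named here is an assumption for illustration: the class `TabletLocatorSketch`, the `tableId + "," + endRow` key encoding, the `LAST_ROW` sentinel, and the server names. Real clients also cache locations, which this sketch omits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy three-level tablet lookup: Chubby file -> root tablet -> METADATA tablet -> server. */
final class TabletLocatorSketch {
    static final String LAST_ROW = "\uffff"; // sentinel end row for a table's last tablet

    // Level 1: stand-in for the Chubby file that names the root tablet.
    private final String rootTabletName;
    // Every "tablet" here is a sorted map from encoded end-row key to a location string.
    private final Map<String, TreeMap<String, String>> tablets = new HashMap<>();

    TabletLocatorSketch(String rootTabletName) {
        this.rootTabletName = rootTabletName;
    }

    void addEntry(String tabletName, String endRowKey, String location) {
        tablets.computeIfAbsent(tabletName, n -> new TreeMap<>()).put(endRowKey, location);
    }

    /** The first entry whose end-row key is >= key covers that key. */
    private String covering(String tabletName, String key) {
        TreeMap<String, String> tablet = tablets.get(tabletName);
        Map.Entry<String, String> entry = tablet == null ? null : tablet.ceilingEntry(key);
        if (entry == null) {
            throw new IllegalArgumentException("no tablet covers " + key);
        }
        return entry.getValue();
    }

    /** Resolves the tablet server responsible for one row of one table. */
    String locate(String tableId, String row) {
        String key = tableId + "," + row;                   // hypothetical key encoding
        String metadataTablet = covering(rootTabletName, key); // level 2
        return covering(metadataTablet, key);                  // level 3
    }

    public static void main(String[] args) {
        TabletLocatorSketch locator = new TabletLocatorSketch("root");
        locator.addEntry("root", "users,m", "meta-0");
        locator.addEntry("root", "users," + LAST_ROW, "meta-1");
        locator.addEntry("meta-0", "users,f", "server-a");
        locator.addEntry("meta-0", "users,m", "server-b");
        locator.addEntry("meta-1", "users," + LAST_ROW, "server-c");
        System.out.println(locator.locate("users", "kim"));
    }
}
```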

Page 19: Bigtable : A Distributed Storage System for Structured Data

Locating Tablets

Page 20: Bigtable : A Distributed Storage System for Structured Data

Assigning Tablets

Each tablet is assigned to one tablet server at a time.

  • The master server keeps track of
      • The set of live tablet servers
      • The current assignment of tablets to tablet servers
      • Unassigned tablets

  • When a tablet is unassigned, the master assigns it to a tablet server with sufficient room

Page 21: Bigtable : A Distributed Storage System for Structured Data

Assigning Tablets

  • Tablet server startup
      • The server creates, and acquires an exclusive lock on, a uniquely named file in a specific Chubby directory
      • The master monitors this directory to discover tablet servers
  • A tablet server stops serving its tablets if it loses its exclusive lock
      • It tries to reacquire the lock on its file as long as the file still exists
      • If the file no longer exists, the tablet server will never be able to serve again

Page 22: Bigtable : A Distributed Storage System for Structured Data

Assigning Tablets

  • Master server startup
      • Grabs a unique master lock in Chubby
      • Scans the tablet server directory in Chubby
      • Communicates with every live tablet server
      • Scans the METADATA table to learn the set of tablets
  • The master is responsible for detecting when a tablet server is no longer serving its tablets and for reassigning those tablets as soon as possible
      • It periodically asks each tablet server for the status of its lock
      • If there is no reply, the master tries to acquire the server's lock itself
      • If the master succeeds in acquiring the lock, the tablet server is either dead or having network trouble

Page 23: Bigtable : A Distributed Storage System for Structured Data

Tablet Serving

  • Updates are committed to a commit log that stores redo records
  • Recently committed updates are stored in memory in a sorted buffer called the memtable
      • The memtable maintains the updates on a row-by-row basis
  • Older updates are stored in a sequence of immutable SSTables
  • To recover a tablet
      • The tablet server reads the tablet's metadata from the METADATA table
      • The metadata contains the list of SSTables and a set of redo points
      • The server reads the indices of the SSTables into memory
      • It reconstructs the memtable by applying all of the updates committed since the redo points

Page 24: Bigtable : A Distributed Storage System for Structured Data

Tablet Serving

  • Write operation
      • The server checks that the request is well-formed
      • It checks that the sender is authorized
      • The mutation is written to the commit log
      • After the commit, its contents are inserted into the memtable
  • Read operation
      • Similar checks for well-formedness and authorization
      • Executed on a merged view of the sequence of SSTables and the memtable
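The write, read, and recovery paths from the last two slides can be sketched together. This is a simplified single-threaded model, not the tablet server's real code: `TabletServingSketch` is a hypothetical class, the commit log is an in-memory list, SSTables are plain sorted maps, and the authorization check is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy tablet: commit log + memtable + immutable SSTables. */
final class TabletServingSketch {
    // Append-only redo log; each record is {key, value}.
    private final List<String[]> commitLog = new ArrayList<>();
    // Sorted in-memory buffer of recently committed updates.
    private TreeMap<String, String> memtable = new TreeMap<>();
    // Immutable SSTables, oldest first.
    private final List<TreeMap<String, String>> sstables = new ArrayList<>();

    void write(String key, String value) {
        if (key == null || key.isEmpty() || value == null) { // well-formedness check
            throw new IllegalArgumentException("malformed mutation");
        }
        commitLog.add(new String[] {key, value}); // 1. commit to the log
        memtable.put(key, value);                 // 2. then apply to the memtable
    }

    /** Reads from a merged view: memtable first, then SSTables from newest to oldest. */
    String read(String key) {
        if (memtable.containsKey(key)) {
            return memtable.get(key);
        }
        for (int i = sstables.size() - 1; i >= 0; i--) {
            String value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Registers an existing SSTable (used here only to set up examples). */
    void loadSSTable(TreeMap<String, String> sstable) {
        sstables.add(sstable);
    }

    /** Simulates a crash: the memtable is lost, then rebuilt from the log after redoPoint. */
    void crashAndRecover(int redoPoint) {
        memtable = new TreeMap<>();
        for (String[] record : commitLog.subList(redoPoint, commitLog.size())) {
            memtable.put(record[0], record[1]);
        }
    }

    public static void main(String[] args) {
        TabletServingSketch tablet = new TabletServingSketch();
        tablet.write("row1", "hello");
        tablet.crashAndRecover(0);
        System.out.println(tablet.read("row1"));
    }
}
```

Writing to the log before touching the memtable is what makes the replay in `crashAndRecover` sufficient to restore every committed update.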

Page 25: Bigtable : A Distributed Storage System for Structured Data

Compaction: Minor

  • As write operations execute, the size of the memtable increases
  • When the memtable reaches a threshold
      • The memtable is frozen and a new memtable is created
      • The frozen memtable is converted to an SSTable
      • The SSTable is written to the file system
  • Goals
      • Reduce memory usage of the tablet server
      • Reduce the amount of data to read from the commit log during recovery
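A minor compaction can be sketched as a threshold check on every write. The class `MinorCompactionSketch` is hypothetical, the threshold counts entries rather than bytes, and "writing to the file system" is modeled by appending the frozen memtable to a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy minor compaction: freeze a full memtable and turn it into an SSTable. */
final class MinorCompactionSketch {
    private final int threshold; // max memtable entries (an assumption; real systems count bytes)
    private final List<String[]> commitLog = new ArrayList<>();
    private TreeMap<String, String> memtable = new TreeMap<>();
    final List<TreeMap<String, String>> sstables = new ArrayList<>();
    // Index of the first log record that is not yet covered by an SSTable.
    int redoPoint = 0;

    MinorCompactionSketch(int threshold) {
        this.threshold = threshold;
    }

    void write(String key, String value) {
        commitLog.add(new String[] {key, value});
        memtable.put(key, value);
        if (memtable.size() >= threshold) {
            minorCompaction();
        }
    }

    private void minorCompaction() {
        TreeMap<String, String> frozen = memtable; // freeze the current memtable
        memtable = new TreeMap<>();                // new writes go to a fresh one
        sstables.add(frozen);                      // stand-in for writing an SSTable
        redoPoint = commitLog.size();              // older log records are no longer needed
    }

    int memtableSize() {
        return memtable.size();
    }

    /** Number of log records a recovery would have to replay right now. */
    int recordsToReplay() {
        return commitLog.size() - redoPoint;
    }

    public static void main(String[] args) {
        MinorCompactionSketch tablet = new MinorCompactionSketch(2);
        tablet.write("a", "1");
        tablet.write("b", "2");
        tablet.write("c", "3");
        System.out.println(tablet.sstables.size() + " SSTable(s), "
            + tablet.recordsToReplay() + " record(s) to replay");
    }
}
```

Both goals from the slide show up directly: the memtable is emptied, and `redoPoint` moves forward so recovery replays less of the log.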

Page 26: Bigtable : A Distributed Storage System for Structured Data

Compaction

  • Problem: too many SSTables
      • Read operations might need to merge updates from a number of SSTables
  • Merging compaction
      • Reads the contents of a few SSTables and the memtable
      • Writes out a new SSTable
  • A merging compaction that rewrites all SSTables into exactly one SSTable is a major compaction
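Merging and major compactions can be sketched as a merge of sorted maps in which newer values win. The names below (`MergingCompactionSketch`, `compactNewest`, `majorCompaction`, `table`) are hypothetical, the memtable is simply treated as the newest table in the list, and deletion markers are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy merging compaction over tables ordered from oldest to newest. */
final class MergingCompactionSketch {
    /** Merges tables into one; values from later (newer) tables overwrite earlier ones. */
    static TreeMap<String, String> merge(List<TreeMap<String, String>> oldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> table : oldestFirst) {
            merged.putAll(table);
        }
        return merged;
    }

    /** Merging compaction: replaces the newest count tables with a single merged table. */
    static List<TreeMap<String, String>> compactNewest(
            List<TreeMap<String, String>> tables, int count) {
        int split = tables.size() - count;
        List<TreeMap<String, String>> result = new ArrayList<>(tables.subList(0, split));
        result.add(merge(tables.subList(split, tables.size())));
        return result;
    }

    /** Major compaction: rewrites every table into exactly one. */
    static List<TreeMap<String, String>> majorCompaction(List<TreeMap<String, String>> tables) {
        return compactNewest(tables, tables.size());
    }

    /** Convenience builder: table("k1", "v1", "k2", "v2"). */
    static TreeMap<String, String> table(String... keyValues) {
        TreeMap<String, String> t = new TreeMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            t.put(keyValues[i], keyValues[i + 1]);
        }
        return t;
    }

    public static void main(String[] args) {
        List<TreeMap<String, String>> tables = new ArrayList<>();
        tables.add(table("a", "old", "b", "1"));
        tables.add(table("a", "mid"));
        tables.add(table("a", "new", "c", "3"));
        System.out.println(compactNewest(tables, 2));
        System.out.println(majorCompaction(tables));
    }
}
```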

Page 27: Bigtable : A Distributed Storage System for Structured Data

Locality Groups

  • Each column family is assigned to a client-defined locality group
  • A separate SSTable is created for each locality group during compaction
  • This increases read efficiency, since columns that are grouped together are usually accessed together
  • Used to organize the underlying storage representation for performance
      • Scans over one locality group are O(bytes_in_locality_group), not O(bytes_in_table)
      • Data in a locality group can be explicitly memory-mapped
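The effect of locality groups on scan cost can be sketched by keeping one sorted store per group. This is only an illustration: `LocalityGroupSketch`, the `"default"` fallback group, and the `row + "/" + column` storage key are assumptions, and the family and group names in `main` are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy locality groups: each group of column families gets its own sorted store. */
final class LocalityGroupSketch {
    // column family -> locality group chosen by the client
    private final Map<String, String> groupOfFamily = new HashMap<>();
    // locality group -> sorted cells ("row/column" -> value), one store per group
    private final Map<String, TreeMap<String, String>> stores = new HashMap<>();

    void assign(String family, String group) {
        groupOfFamily.put(family, group);
    }

    void put(String row, String column, String value) {
        int colon = column.indexOf(':');
        String family = colon < 0 ? column : column.substring(0, colon);
        String group = groupOfFamily.getOrDefault(family, "default");
        stores.computeIfAbsent(group, g -> new TreeMap<>()).put(row + "/" + column, value);
    }

    /** Scans one group only, so the work is proportional to that group's size. */
    TreeMap<String, String> scan(String group) {
        TreeMap<String, String> store = stores.get(group);
        return store == null ? new TreeMap<String, String>() : store;
    }

    public static void main(String[] args) {
        LocalityGroupSketch table = new LocalityGroupSketch();
        table.assign("language", "metadata");
        table.assign("checksum", "metadata");
        table.assign("contents", "content");
        table.put("com.cnn.www", "language:", "EN");
        table.put("com.cnn.www", "checksum:", "abc123");
        table.put("com.cnn.www", "contents:", "<html>...");
        // Reading page metadata never touches the (much larger) page contents.
        System.out.println(table.scan("metadata"));
    }
}
```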

Page 28: Bigtable : A Distributed Storage System for Structured Data

Refinements

  • Compression
      • Clients can control SSTable compression for a locality group
  • Caching
      • Scan cache: a higher-level cache that caches key-value pairs returned by the SSTable interface
      • Block cache: a lower-level cache that caches SSTable blocks read from the file system
  • Bloom filters
      • Allow a reader to ask whether an SSTable might contain any data for a given row/column pair
      • Reduce disk accesses while reading SSTables
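A Bloom filter answers "might this SSTable contain the key?" with no false negatives and occasional false positives. The sketch below is a generic textbook-style filter, not Bigtable's implementation: `BloomFilterSketch` is a hypothetical class, the bit positions come from a simple double-hashing scheme built on `String.hashCode()`, and the sizes in `main` are arbitrary.

```java
import java.util.BitSet;

/** Minimal Bloom filter over string keys such as "row/column". */
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Derives the i-th bit position from two base hashes (double hashing, an assumed scheme). */
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    /** False means the key is definitely absent; true means it may be present. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("com.cnn.www/anchor:cnnsi.com");
        // A reader would skip the SSTable entirely whenever this returns false.
        System.out.println(filter.mightContain("com.cnn.www/anchor:cnnsi.com"));
    }
}
```

The disk saving comes from the negative case: when `mightContain` returns false, the SSTable does not need to be read at all.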

Page 29: Bigtable : A Distributed Storage System for Structured Data

Example: Cassandra

  • Initially developed by Facebook for inbox search
  • Built on the Bigtable data model
  • Provides a structured key-value store
  • Keys map to multiple values, which are grouped into column families
  • Used by [Figure: logos of organizations using Cassandra]

Page 30: Bigtable : A Distributed Storage System for Structured Data

Cassandra

  • A table in Cassandra is a distributed multidimensional map indexed by a key
  • The row key in a table is a string with no size restrictions
  • Usually a four-dimensional map
      • Keyspace -> Column Family
      • Column Family -> Column Family Row
      • Column Family Row -> Columns
      • Column -> Data value

Page 31: Bigtable : A Distributed Storage System for Structured Data

Cassandra: Column

Column{name: "emailAddress", value: "[email protected]", timestamp: 123456789 }

Page 32: Bigtable : A Distributed Storage System for Structured Data

Cassandra: SuperColumn

SuperColumn {
    name: "homeAddress",
    value: {
        street: {name: "street", value: "1234 x street", timestamp: 123456789},
        city: {name: "city", value: "san francisco", timestamp: 123456789},
        zip: {name: "zip", value: "94107", timestamp: 123456789}
    }
}

Page 33: Bigtable : A Distributed Storage System for Structured Data

Cassandra: ColumnFamily

Column Family

UserProfile = {
    ahossain: {
        username: "ahossain",
        email: "[email protected]",
        phone: "(210) 123-4567"
    },
    jdoe: {
        username: "jdoe",
        email: "[email protected]",
        phone: "(210) 765-4321",
        age: "66",
        gender: "male"
    }
}

Page 34: Bigtable : A Distributed Storage System for Structured Data

Example: Pelops (Write)

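// Assumes the Pelops client classes (Pelops, Cluster, Mutator, Bytes) and
// Cassandra's Thrift ConsistencyLevel are imported.
String pool = "pool";
String keyspace = "mykeyspace";
String colFamily = "users";
String rowKey = "abc123";

// Register a connection pool for a Cassandra node listening on port 9160.
Cluster cluster = new Cluster("localhost", 9160);
Pelops.addPool(pool, cluster, keyspace);

// Batch two column writes for one row, then send them.
Mutator mutator = Pelops.createMutator(pool);
mutator.writeColumns(
    colFamily, rowKey,
    mutator.newColumnList(
        mutator.newColumn("name", "Dan"),
        mutator.newColumn("age", Bytes.fromInt(33))
    )
);
mutator.execute(ConsistencyLevel.ONE);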

Page 35: Bigtable : A Distributed Storage System for Structured Data

Example: Pelops (Read)

// Read the row back through a selector on the same pool.
Selector selector = Pelops.createSelector(pool);
List<Column> columns = selector.getColumnsFromRow(colFamily, rowKey, false, ConsistencyLevel.ONE);

// Decode the stored bytes as a string and as an int.
System.out.println("Name: " + Selector.getColumnStringValue(columns, "name"));
System.out.println("Age: " + Selector.getColumnValue(columns, "age").toInt());

Page 36: Bigtable : A Distributed Storage System for Structured Data

Thank you

Questions?