Bigtable: A Distributed Storage System for Structured Data

student: Nenu Anda-Roxana
prof.: Ciprian Dobre


Problem … data (lots of it)

● data size

● latency requirements

Solution … Bigtable

● what is it?

● works for Google

● data model

Data Model

● sparse, distributed, persistent multidimensional sorted map

● (row:string, column:string, time:int64) → string

Example
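
A minimal sketch of this map in Python, assuming an in-memory dict in place of distributed storage. The row key `com.cnn.www` and the `contents:` and `anchor:cnnsi.com` columns follow the Webtable example in the Bigtable paper; the class `TinyTable` and its methods are hypothetical names used only for illustration, not Bigtable's client API.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (row, column, timestamp)


class TinyTable:
    """Toy (row, column, time) -> string map; not the real Bigtable API."""

    def __init__(self) -> None:
        self._cells: Dict[Key, str] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[str]:
        """Return the value at `timestamp`, or the newest version if omitted."""
        if timestamp is not None:
            return self._cells.get((row, column, timestamp))
        versions = [t for (r, c, t) in self._cells if r == row and c == column]
        if not versions:
            return None
        return self._cells[(row, column, max(versions))]

    def scan(self):
        """Yield cells sorted by row, then column, then newest timestamp first."""
        for key in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            yield key, self._cells[key]


table = TinyTable()
table.put("com.cnn.www", "contents:", 3, "<html>v3</html>")
table.put("com.cnn.www", "contents:", 5, "<html>v5</html>")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
```

The map is sparse because only cells that were written occupy space; a row may use any subset of columns.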

Rows

● lexicographic order by row key

● row range - unit of distribution

● reads of short row ranges are efficient and typically require communication with only a small number of machines

● clients can choose row keys so that their data accesses get good locality
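
One way to picture the locality point is the paper's Webtable convention of reversing hostname components, so that pages from the same domain become neighbouring rows. The sketch below assumes a plain sorted Python list in place of tablets; `row_key` and `scan_range` are hypothetical helper names.

```python
import bisect


def row_key(url: str) -> str:
    """Reverse the hostname components so pages of one domain sort together."""
    host, _, path = url.partition("/")
    reversed_host = ".".join(reversed(host.split(".")))
    return reversed_host + ("/" + path if path else "")


def scan_range(sorted_keys, start: str, end: str):
    """Return keys k with start <= k < end from an already sorted list."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]


urls = [
    "maps.google.com/index.html",
    "www.cnn.com",
    "mail.google.com/inbox",
    "www.google.com",
]
keys = sorted(row_key(u) for u in urls)

# "/" is the character right after "." in ASCII, so this half-open range
# covers every key that starts with "com.google.".
google_rows = scan_range(keys, "com.google.", "com.google/")
```

Because the matching rows are contiguous in sort order, a scan like this touches one short row range instead of rows scattered across the table.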

Column Families

● column keys grouped into sets called column families

● family:qualifier

● ex.

○ language - language ID

○ anchor - anchor:referring_site
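
The family:qualifier naming can be sketched as a small parser. This is an illustration only: `split_column_key` and `group_by_family` are hypothetical helpers, and the exact key `language:` (empty qualifier) is an assumption, since the slides only say that the language family holds a language ID.

```python
from collections import defaultdict


def split_column_key(column_key: str):
    """Split 'family:qualifier' at the first colon; the qualifier may be empty."""
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError(f"expected 'family:qualifier', got {column_key!r}")
    return family, qualifier


def group_by_family(column_keys):
    """Group qualifiers under their column family, keeping input order."""
    families = defaultdict(list)
    for key in column_keys:
        family, qualifier = split_column_key(key)
        families[family].append(qualifier)
    return dict(families)


columns = ["language:", "anchor:cnnsi.com", "anchor:my.look.ca"]
grouped = group_by_family(columns)
```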

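Timestamps

● multiple versions of the same data

● versions are stored in decreasing timestamp order, so the most recent version is read first

● garbage collection: two per-column-family settings, keep only the last n versions or keep only new-enough versions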

● ex.
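
The two garbage-collection settings could be sketched as follows. This assumes the versions of one cell are given as (timestamp, value) pairs; the function names and the sample numbers are made up for illustration.

```python
def keep_last_n(versions, n):
    """versions: list of (timestamp, value); keep the n newest, newest first."""
    return sorted(versions, key=lambda tv: tv[0], reverse=True)[:n]


def keep_newer_than(versions, now, max_age):
    """Keep versions written within max_age of now, newest first."""
    fresh = [tv for tv in versions if now - tv[0] <= max_age]
    return sorted(fresh, key=lambda tv: tv[0], reverse=True)


versions = [(100, "a"), (300, "c"), (200, "b"), (400, "d")]
last_three = keep_last_n(versions, 3)
recent = keep_newer_than(versions, now=450, max_age=200)
```

Both helpers return the surviving versions in decreasing timestamp order, matching the order in which versions are stored.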

Building Blocks

Bigtable is built on other pieces of Google infrastructure:

● GFS to store log and data files

● Google SSTable file format to store Bigtable data

● Chubby : distributed lock service

Implementation

● a library linked into every client

● one master server

● many tablet servers

Master server

● assigning tablets to tablet servers

● detecting the addition and expiration of tablet servers

● garbage collection of files in GFS

● schema changes - table and column family creations

Tablet server

● manages a set of tablets

● handles read and write requests to the tablets it has loaded

● splits tablets that have grown too large

Client data

● does not move through the master

● clients communicate directly with tablet servers for reads and writes

Tablet Assignment (I)

● Each tablet is assigned to one tablet server at a time.

● When a tablet is unassigned, and a tablet server with sufficient room for the tablet is available, the master assigns the tablet by sending a tablet load request to the tablet server.

● The master is responsible for detecting when a tablet server is no longer serving its tablets, and for reassigning those tablets as soon as possible.

● Uses Chubby to keep track of tablet servers
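
A rough sketch of the assignment step, under stated assumptions: "sufficient room" is modelled as a hypothetical per-server tablet limit (`capacity`), and ties are resolved by picking the least-loaded server, which is a choice made here for illustration and not something the slides specify. Adding the tablet to a server's set stands in for the tablet load request.

```python
def assign_unassigned(unassigned, servers, capacity):
    """Assign each unassigned tablet to one server that still has room.

    servers: dict of server name -> set of tablets it currently serves.
    capacity: hypothetical per-server tablet limit ("sufficient room").
    Returns the tablets that could not be placed.
    """
    still_unassigned = []
    for tablet in unassigned:
        candidates = [s for s in servers if len(servers[s]) < capacity]
        if not candidates:
            still_unassigned.append(tablet)
            continue
        # Least-loaded server first; the name breaks ties deterministically.
        target = min(candidates, key=lambda s: (len(servers[s]), s))
        servers[target].add(tablet)  # stands in for the tablet load request
    return still_unassigned


servers = {"ts1": {"t1"}, "ts2": set()}
leftover = assign_unassigned(["t2", "t3", "t4"], servers, capacity=2)
overflow = assign_unassigned(["t5"], servers, capacity=2)
```

Each tablet ends up in exactly one server's set, which mirrors the rule that a tablet is assigned to one tablet server at a time.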

Tablet Assignment (II)

● When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory.

● The master monitors this directory to discover tablet servers.

● A tablet server can lose its lock, e.g. due to a network partition that caused the server to lose its Chubby session.

● A tablet server will attempt to reacquire an exclusive lock on its file as long as the file still exists. If the file no longer exists, then the tablet server will never be able to serve again, so it kills itself.

● When a tablet server terminates, it attempts to release its lock so that the master will reassign its tablets more quickly.
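
The tablet-server side of this protocol might look like the sketch below. `FakeLockService` is an in-memory stand-in written for this example and is not Chubby's API; the `servers/<name>` file path and all method names are hypothetical.

```python
class FakeLockService:
    """In-memory stand-in for a Chubby directory: file name -> lock holder."""

    def __init__(self):
        self.files = {}  # file name -> holder name, or None if unlocked

    def create_and_lock(self, name, holder):
        if name in self.files:
            raise FileExistsError(name)
        self.files[name] = holder

    def try_lock(self, name, holder):
        """Acquire the lock if the file exists and is unlocked (or already ours)."""
        if name not in self.files:
            return False
        if self.files[name] in (None, holder):
            self.files[name] = holder
            return True
        return False

    def expire_session(self, holder):
        """Simulate a lost session: every lock held by `holder` is released."""
        for name, owner in self.files.items():
            if owner == holder:
                self.files[name] = None


class TabletServer:
    def __init__(self, name, locks):
        self.name, self.locks = name, locks
        self.file = f"servers/{name}"  # hypothetical file naming
        self.alive = True
        locks.create_and_lock(self.file, name)

    def on_lock_lost(self):
        """Reacquire the lock if the file still exists; otherwise shut down."""
        if self.file not in self.locks.files:
            self.alive = False  # file deleted: can never serve again
        else:
            self.locks.try_lock(self.file, self.name)
        return self.alive


locks = FakeLockService()
ts = TabletServer("ts1", locks)
locks.expire_session("ts1")       # e.g. a network partition
recovered = ts.on_lock_lost()     # file still exists, so the lock is retaken
del locks.files["servers/ts1"]    # e.g. the master deleted the server file
stopped = not ts.on_lock_lost()
```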

Tablet Assignment (III)

● To detect when a tablet server is no longer serving its tablets, the master periodically asks each tablet server for the status of its lock.

● If a tablet server reports that it has lost its lock, or if the master cannot reach the server after several attempts, the master attempts to acquire an exclusive lock on the server's file.

● If the master is able to acquire the lock, then Chubby is live and the tablet server is either dead or having trouble reaching Chubby, so the master ensures that the tablet server can never serve again by deleting its server file.

● Once the server's file has been deleted, the master moves all the tablets that were previously assigned to that server into the set of unassigned tablets.
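
The master's side of the check can be sketched in the same spirit. A plain dict of file name to lock holder stands in for the Chubby directory; the `Master` class, the `servers/<name>` naming and the `check_server` parameters are assumptions made for this illustration.

```python
class Master:
    def __init__(self, lock_files, assignments):
        self.lock_files = lock_files    # server file name -> lock holder or None
        self.assignments = assignments  # server name -> set of tablets
        self.unassigned = set()

    def check_server(self, server, reports_lock_held, reachable=True):
        """Handle one status poll; return True if the server was removed."""
        if reachable and reports_lock_held:
            return False                # healthy, nothing to do
        name = f"servers/{server}"      # hypothetical file naming
        if name not in self.lock_files or self.lock_files[name] is not None:
            # File already gone, or the lock is still held: cannot acquire it.
            return False
        self.lock_files[name] = "master"  # master acquires the exclusive lock
        del self.lock_files[name]         # server can never serve again
        # Tablets of the removed server go back to the unassigned set.
        self.unassigned |= self.assignments.pop(server, set())
        return True


lock_files = {"servers/ts1": "ts1", "servers/ts2": None}  # ts2 lost its lock
master = Master(lock_files, {"ts1": {"t1"}, "ts2": {"t2", "t3"}})
kept = master.check_server("ts1", reports_lock_held=True)
removed = master.check_server("ts2", reports_lock_held=False)
```

Deleting the file before reassigning tablets is what prevents the old server from coming back and serving a tablet that has since been handed to another server.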

Why Bigtable?

Other projects: Oracle's Real Application Cluster, IBM's DB2 Parallel Edition

Difference: both are relational databases, providing a complete relational model with transactions; Bigtable does not

● The key-value pair model provided by distributed B-trees or distributed hash tables is too limiting.

● Key-value pairs are a useful building block, but they should not be the only building block one provides to developers.

● Bigtable: it is still simple enough that it lends itself to a very efficient flat-file representation, and it is transparent enough (via locality groups) to allow users to tune important behaviors of the system.