Google - Bigtable

Bigtable: A Distributed Storage System for Structured Data

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Google, Inc.

Index
Introduction
Data Model
API
Building Blocks
Implementation
Refinements
Real Applications
Conclusions

Introduction
1. Motivation
2. What is a Bigtable?
3. Why not a DBMS?

Introduction : Motivation
Lots of structured data at Google
◦Web pages, geographic information, user data, mail
Millions of machines
Different projects/applications

Introduction : Why not a DBMS?
A DBMS provides more than Google needs
Google required a database with wide scalability, wide applicability, high performance, and high availability
Low-level storage optimizations help performance significantly
Cost would be very high
◦Most DBMSs require very expensive infrastructure

Introduction : What is a Bigtable?
Bigtable is a distributed storage system for managing structured data
Achieved several goals
◦wide applicability, scalability, high performance
Scalable
◦Terabytes of in-memory data
◦Petabytes of disk-based data
◦Millions of reads/writes per second, efficient scans
Self-managing
◦Servers can be added/removed dynamically
◦Servers adjust to load imbalance

Data Model
1. Row
2. Column families
3. Timestamps

Data Model : Row
The row keys in a table are arbitrary strings
Data is maintained in lexicographic order by row key
Each row range is called a “tablet”, which is the unit of distribution and load balancing
Rows within a tablet are sorted by row key
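The row-range idea can be sketched in a few lines of Python. This is a minimal illustration, not Bigtable code: the tablet boundaries, the row keys, and the helper `tablet_index` are all hypothetical, and the sketch assumes each tablet is identified by the last row key it covers.

```python
from bisect import bisect_left

# Hypothetical tablet boundaries: tablet i covers row keys up to and
# including TABLET_END_ROWS[i]; one extra open-ended tablet (index 3)
# holds everything after the last end row.
TABLET_END_ROWS = ["com.cnn.www", "com.google.www", "org.wikipedia.en"]

def tablet_index(row_key: str) -> int:
    """Return the index of the tablet whose row range contains row_key."""
    # Python compares strings lexicographically (by code point), so
    # bisect_left finds the first end row that is >= row_key.
    return bisect_left(TABLET_END_ROWS, row_key)

# Rows are kept in sorted order, so neighbouring keys land in the same
# or adjacent tablets.
rows = sorted(["org.apache.hbase", "com.cnn.www", "com.example.www", "net.example"])
assignment = {row: tablet_index(row) for row in rows}
```

Because keys are sorted, a scan over a short row range usually touches only one or two tablets, which is why the choice of row key matters for locality.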

Data Model : Column Families
Column keys are grouped into sets called “column families”
Basic unit of access control
A column key is named using the syntax “family:qualifier”
Access control and disk/memory accounting are performed at the column-family level

Data Model : Timestamps
Each cell in a Bigtable can contain multiple versions of the same data
Versions are sorted in descending timestamp order (most recent first)
Timestamps are 64-bit integers
◦real time in microseconds, or assigned by the client application

Data Model : Example
[Figure: example table labelling columns, column families, and timestamps]
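Taken together, rows, column keys, and timestamps behave like a sparse map from (row, column, timestamp) to a value. The sketch below is a toy in-memory stand-in under that assumption; the class `ToyTable`, its methods, and the sample row and column names are hypothetical and are not part of any Bigtable API.

```python
from collections import defaultdict

class ToyTable:
    """In-memory stand-in for the (row, column, timestamp) -> value map."""

    def __init__(self):
        # row key -> column key ("family:qualifier") -> [(timestamp, value)]
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        # Keep versions in descending timestamp order, newest first.
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row: str, column: str, max_versions: int = 1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        return self._rows.get(row, {}).get(column, [])[:max_versions]

table = ToyTable()
table.put("com.cnn.www", "contents:", 3, b"<html>v3")
table.put("com.cnn.www", "contents:", 5, b"<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
```

A cell that was never written simply has no entry, which is what makes the map sparse.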

API
The Bigtable API provides functions to
◦Create/delete tables and column families
◦Change table and column family metadata
◦Look up values from individual rows
◦Iterate over a subset of the data
Supports single-row transactions
Can be used with MapReduce (HBase)

API : Example
Uses a Scanner to iterate over all anchors in a particular row
Table *T = OpenOrDie("/bigtable/web/webtable");
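The scan described on this slide can be approximated with a toy Python generator. This is only a sketch of the idea of iterating over one column family in one row; `scan_family`, the `Row` type, and the sample data are hypothetical and do not mirror the real Scanner interface.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical contents of one row: column key -> versions (newest first).
Row = Dict[str, List[Tuple[int, str]]]

def scan_family(row: Row, family: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (column, timestamp, value) for every version in one column family."""
    prefix = family + ":"
    for column in sorted(row):              # visit columns in key order
        if column.startswith(prefix):
            for timestamp, value in row[column]:
                yield column, timestamp, value

row = {
    "anchor:cnnsi.com": [(9, "CNN")],
    "anchor:my.look.ca": [(8, "CNN.com")],
    "contents:": [(6, "<html>...")],
}
anchors = list(scan_family(row, "anchor"))
```

Only the two `anchor:` columns are returned; the `contents:` column belongs to a different family and is skipped.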

Building Blocks
Uses the distributed Google File System (GFS) to store log and data files
A Bigtable cluster typically operates in a shared pool of machines
Depends on a cluster management system

The Google SSTable file format is used internally to store Bigtable data

Relies on a highly-available and persistent distributed lock service called Chubby

Building Blocks : GFS & SSTable & Chubby
Google File System:
◦Google File System grew out of an earlier Google effort, “BigFiles”
◦Selected for its high data throughput

Building Blocks : GFS & SSTable & Chubby
SSTable:
◦Provides a persistent, ordered map from keys to values
◦Contains a sequence of blocks, located through a block index
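An ordered map stored as blocks plus an index can be sketched as follows. This is a simplified in-memory model, not the SSTable file format: `ToySSTable`, the tiny block size, and the choice to index each block by its last key are assumptions made for illustration.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

class ToySSTable:
    """Immutable sorted key/value pairs split into fixed-size blocks."""

    def __init__(self, items: List[Tuple[str, str]], block_size: int = 2):
        ordered = sorted(items)
        self.blocks = [ordered[i:i + block_size]
                       for i in range(0, len(ordered), block_size)]
        # Block index: the last key of each block, kept in memory.
        self.index = [block[-1][0] for block in self.blocks]

    def get(self, key: str) -> Optional[str]:
        # Binary-search the index for the only block that could hold key.
        i = bisect_left(self.index, key)
        if i == len(self.blocks):
            return None
        for k, v in self.blocks[i]:         # stands in for one block read
            if k == key:
                return v
        return None

sst = ToySSTable([("d", "4"), ("a", "1"), ("c", "3"), ("b", "2"), ("e", "5")])
```

With the index held in memory, a lookup needs to read at most one block, which is the property the design aims for.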

Building Blocks : GFS & SSTable & Chubby
Chubby:
◦ensure that there is at most one active master at any time

◦store the bootstrap location of Bigtable data

◦discover tablet servers and finalize tablet server deaths

◦store Bigtable schema information (the column family information for each table)

Implementation
1. Tablet Location
2. Tablet Assignment
3. Tablet Serving

Implementation
Three major components
◦A library that is linked into every client
◦One master server
◦Many tablet servers

Implementation : Tablet Location
Uses a three-level hierarchy, analogous to that of a B+-tree, to store tablet location information (at most three levels)

The first level is a file stored in Chubby that contains the location of the root tablet

Implementation : Tablet Location
Root tablet
◦First tablet in the METADATA table
◦Never split, to ensure that the tablet location hierarchy has no more than three levels
METADATA tablet
◦Stores the location of a tablet under a row key that is an encoding of the tablet’s table identifier and its end row
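The three-level lookup can be sketched as a chain of sorted-key searches. Everything named here is hypothetical: the server names, the `meta_key` encoding (the slides do not give the real one), and the `LAST` sentinel used as the end row of a table's final tablet.

```python
from bisect import bisect_left

def meta_key(table_id: str, end_row: str) -> str:
    # Assumed encoding of "table identifier + end row" for illustration.
    return f"{table_id}\x00{end_row}"

LAST = "\uffff"  # assumed sentinel end row for a table's final tablet

# Level 1: a Chubby file naming the server that holds the root tablet.
CHUBBY_FILE = {"root_tablet_location": "ts-root"}

# Levels 2 and 3: each server's tablet as sorted (end key, location) pairs.
SERVERS = {
    "ts-root": [(meta_key("webtable", "m"), "ts-meta-1"),
                (meta_key("webtable", LAST), "ts-meta-2")],
    "ts-meta-1": [(meta_key("webtable", "f"), "ts-7"),
                  (meta_key("webtable", "m"), "ts-3")],
    "ts-meta-2": [(meta_key("webtable", LAST), "ts-9")],
}

def _first_at_or_after(entries, key):
    # The tablet that covers key is the first one whose end key is >= key.
    keys = [k for k, _ in entries]
    return entries[bisect_left(keys, key)][1]

def locate(table_id: str, row_key: str) -> str:
    """Walk Chubby file -> root tablet -> METADATA tablet -> tablet server."""
    key = meta_key(table_id, row_key)
    root_tablet = SERVERS[CHUBBY_FILE["root_tablet_location"]]
    metadata_tablet = SERVERS[_first_at_or_after(root_tablet, key)]
    return _first_at_or_after(metadata_tablet, key)
```

For example, `locate("webtable", "com.cnn.www")` walks the root tablet to `ts-meta-1` and then resolves to `ts-7`. The toy raises `IndexError` for a table it does not know about; a real client would cache locations and handle misses.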

Implementation : Tablet Assignment
Master server
◦assigns tablets to tablet servers
◦detects the presence or absence (expiration) of tablet servers
◦balances tablet-server load
◦handles schema changes such as table and column family creations
Tablet server
◦manages a set of tablets (ten to a thousand tablets per tablet server)
◦handles read/write requests to the tablets
◦splits tablets that have grown too large

Implementation : Tablet Serving
Updates are committed to a commit log that stores redo records
Recently committed updates are stored in memory in a memtable
Older updates are stored in a sequence of SSTables
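The write and read paths on this slide can be modelled with a small toy class. This is a sketch under stated assumptions: `ToyTablet`, the `memtable_limit` flush threshold, and the use of plain dicts for the memtable and SSTables are illustrative choices, not the actual implementation.

```python
class ToyTablet:
    """Write path: commit log, then memtable. Read path: memtable + SSTables."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []    # redo records, append-only
        self.memtable = {}      # recently committed updates
        self.sstables = []      # older updates, newest SSTable first
        self.memtable_limit = memtable_limit  # assumed flush threshold

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # log first so it can be replayed
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Freeze the memtable into an immutable, sorted SSTable stand-in.
            self.sstables.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key: str):
        # Check the newest data first: memtable, then SSTables newest to oldest.
        for layer in [self.memtable, *self.sstables]:
            if key in layer:
                return layer[key]
        return None

tablet = ToyTablet()
tablet.write("a", "1")
tablet.write("b", "2")   # reaches the limit and triggers a flush
tablet.write("a", "3")   # newer value shadows the one in the SSTable
```

Reads see a merged view, so the newest write for a key wins even when an older value still sits in an SSTable.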

Refinements
1. Locality groups
2. Compression
3. Caching for read performance
4. Bloom filters
5. Commit-log implementation

Refinements
Locality groups
◦Clients can group multiple column families together into a locality group
Compression
◦Small portions of an SSTable can be read without decompressing the entire file
◦Encodes at 100-200 MB/s
◦Decodes at 400-1000 MB/s
◦10-to-1 reduction in space

Refinements
Caching for read performance
◦Tablet servers use two levels of caching: the Scan Cache and the Block Cache
Bloom filters
◦Clients can specify that Bloom filters should be created for the SSTables in a particular locality group
Commit-log implementation
◦Mutations for different tablets are co-mingled in the same physical log file
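A Bloom filter lets a reader skip an SSTable that cannot contain a given key. The sketch below is a generic Bloom filter, not Bigtable's own; the bit-array size, the number of hashes, and the use of SHA-256 to derive bit positions are assumptions chosen for brevity.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means the key is definitely absent, so the SSTable read
        # can be skipped; True means the SSTable must still be checked.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

bloom = BloomFilter()
for key in ["com.cnn.www/anchor:cnnsi.com", "com.cnn.www/contents:"]:
    bloom.add(key)
```

A key that was added always reports `True`; a key that was never added usually reports `False`, with a small chance of a false positive that costs one unnecessary disk read.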

Real Applications
1. Google Analytics
2. Personalized Search

Real Applications
Google Analytics
◦Uses two tables: the raw click table (~200 TB) and the summary table (~20 TB)
◦Uses MapReduce
Personalized Search
◦Stores each user's history
◦Uses MapReduce

Conclusions
Bigtable clusters have been in production use at Google since April 2005
Provides high performance and high availability
Found significant advantages to building a custom storage solution at Google
Apache HBase is based on Bigtable

Thank you!
