PNUTS: YAHOO!’S HOSTED DATA SERVING PLATFORM

FENGLI ZHANG

CONTENT

• REQUIREMENTS OF WEB APPLICATIONS

• PNUTS ARCHITECTURE

• EXPERIMENT

• CONCLUSION

REQUIREMENTS OF WEB APPLICATIONS

• SCALABILITY: ARCHITECTURAL SCALABILITY, SCALE LINEARLY

• GEOGRAPHIC SCOPE: DATA REPLICAS ON MULTIPLE CONTINENTS

• HIGH AVAILABILITY: DESPITE FAILURES, APPS WILL STILL BE ABLE TO READ

• RELAXED CONSISTENCY GUARANTEES: TOLERATE STALE OR REORDERED DATA

WHAT IS PNUTS?

• PNUTS: A MASSIVELY PARALLEL AND GEOGRAPHICALLY DISTRIBUTED DATABASE SYSTEM FOR YAHOO!’S WEB APPLICATIONS.

• PNUTS PROVIDES:

DATA STORAGE ORGANIZED AS HASHED OR ORDERED TABLES

LOW LATENCY FOR LARGE NUMBERS OF CONCURRENT REQUESTS INCLUDING UPDATES AND QUERIES

NOVEL PER-RECORD CONSISTENCY GUARANTEES

DETAILED ARCHITECTURE

TABLET SPLITTING

DATA STORAGE AND RETRIEVAL

• DATA STORAGE ORGANIZED AS HASHED OR ORDERED TABLES

• IN ORDER TO DETERMINE WHICH STORAGE UNIT IS RESPONSIBLE FOR A GIVEN RECORD TO BE READ OR WRITTEN BY THE CLIENT, WE MUST FIRST DETERMINE WHICH TABLET CONTAINS THE RECORD, AND THEN DETERMINE WHICH STORAGE UNIT HAS THAT TABLET.

• BOTH OF THESE FUNCTIONS ARE CARRIED OUT BY THE ROUTER.
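The two-step lookup above can be sketched as follows. This is a minimal illustration, not PNUTS code: the `Router` class, the `hash_point` helper, the choice of SHA-256, the hash space of 1000, and the storage unit names are all assumptions made for the example. Each tablet is described by the lower bound of the key interval it covers, and a binary search finds the tablet before a second lookup finds its storage unit.

```python
import bisect
import hashlib


def hash_point(key, space=1000):
    """Map a key to an integer in [0, space); the hash function is an assumption."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % space


class Router:
    """Hypothetical router: tablet lookup first, then storage-unit lookup."""

    def __init__(self, lower_bounds, storage_units, hashed):
        if len(lower_bounds) != len(storage_units):
            raise ValueError("one storage unit is required per tablet")
        if list(lower_bounds) != sorted(lower_bounds):
            raise ValueError("tablet lower bounds must be sorted")
        self.lower_bounds = list(lower_bounds)    # lower_bounds[i] starts tablet i
        self.storage_units = list(storage_units)  # storage_units[i] holds tablet i
        self.hashed = hashed

    def tablet_for(self, key):
        # Step 1: find the tablet whose interval contains the key
        # (or the key's hash value, for a hashed table).
        point = hash_point(key) if self.hashed else key
        index = bisect.bisect_right(self.lower_bounds, point) - 1
        if index < 0:
            raise KeyError(f"no tablet covers {key!r}")
        return index

    def storage_unit_for(self, key):
        # Step 2: find the storage unit that currently has that tablet.
        return self.storage_units[self.tablet_for(key)]


# Ordered table: intervals over the keys themselves.
ordered = Router(["", "banana", "grape", "peach"],
                 ["su1", "su3", "su2", "su1"], hashed=False)
# Hashed table: intervals over the hash space [0, 1000).
hashed = Router([0, 250, 500, 750],
                ["su1", "su2", "su3", "su4"], hashed=True)

print(ordered.storage_unit_for("cherry"))
print(hashed.storage_unit_for("cherry"))
```

The same interval search serves both table types. Only the point being searched for differs: the key itself for an ordered table, or its hash for a hashed table.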

DATA STORAGE AND RETRIEVAL

• ROUTERS CONTAIN ONLY A CACHED COPY OF THE INTERVAL MAPPING

• THE MAPPING IS OWNED BY THE TABLET CONTROLLER

• THE TABLET CONTROLLER DETERMINES WHEN TO MOVE A TABLET BETWEEN STORAGE UNITS AND WHEN A LARGE TABLET MUST BE SPLIT

• ROUTERS PERIODICALLY POLL THE TABLET CONTROLLER TO GET ANY CHANGES TO THE MAPPING
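The relationship between the tablet controller and the routers might look like the sketch below. It assumes a version counter on the mapping so a router can tell whether anything changed. The version counter, the class names, and the method names are all hypothetical and are not taken from PNUTS itself.

```python
class TabletController:
    """Owns the authoritative tablet-to-storage-unit mapping (illustrative)."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)
        self.version = 1  # assumed change counter, bumped on every change

    def move_tablet(self, tablet, storage_unit):
        self.mapping[tablet] = storage_unit
        self.version += 1

    def changes_since(self, version):
        # Return a fresh copy of the mapping only if the caller is out of date.
        if version == self.version:
            return self.version, None
        return self.version, dict(self.mapping)


class CachingRouter:
    """Holds only a cached copy of the mapping and refreshes it by polling."""

    def __init__(self, controller):
        self.controller = controller
        self.version = 0
        self.cache = {}
        self.poll()

    def poll(self):
        version, mapping = self.controller.changes_since(self.version)
        if mapping is not None:
            self.cache, self.version = mapping, version

    def route(self, tablet):
        return self.cache[tablet]


controller = TabletController({"t1": "su1", "t2": "su2"})
router = CachingRouter(controller)

controller.move_tablet("t1", "su3")
stale = router.route("t1")   # the cache has not been refreshed yet
router.poll()
fresh = router.route("t1")   # the next poll picks up the move
print(stale, fresh)
```

Between polls the router can hold a stale mapping, so a real system needs some way to handle a request sent to a storage unit that no longer has the tablet. That handling is outside this sketch.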

QUERY PROCESSING

• ACCESSING DATA

MULTI-RECORD REQUEST

UPDATES

ASYNCHRONOUS REPLICATION AND CONSISTENCY

• EXAMPLE OF EVENTUAL CONSISTENCY

• A USER WISHES TO DO A SEQUENCE OF 2 UPDATES TO HIS RECORD:

U1: REMOVE HIS MOTHER FROM THE LIST OF PEOPLE WHO CAN VIEW HIS PHOTOS

U2: POST SPRING-BREAK PHOTOS

A USER IS ABLE TO READ A STATE OF THE RECORD THAT NEVER SHOULD HAVE EXISTED: THE PHOTOS HAVE BEEN POSTED BUT THE CHANGE IN ACCESS CONTROL HAS NOT TAKEN PLACE.
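The anomaly can be reproduced with a small simulation. This is only a sketch under simple assumptions: a record is a plain dictionary, the two updates are plain functions, and a replica is modeled as applying the updates in whatever order they arrive. None of the names below come from PNUTS.

```python
def new_record():
    return {"viewers": {"mother", "friend"}, "photos": []}


def remove_mother(record):
    # U1: remove the mother from the list of people who can view photos.
    record["viewers"].discard("mother")


def post_photos(record):
    # U2: post the spring-break photos.
    record["photos"].append("spring-break")


def mother_can_see_photos(record):
    return "mother" in record["viewers"] and "spring-break" in record["photos"]


def observe(updates):
    """Apply updates in the given order; report the anomaly check after each one."""
    record = new_record()
    observations = []
    for update in updates:
        update(record)
        observations.append(mother_can_see_photos(record))
    return record, observations


# A replica that receives the updates in the order the user issued them.
in_order_state, in_order_seen = observe([remove_mother, post_photos])
# A replica that receives the same updates reordered.
reordered_state, reordered_seen = observe([post_photos, remove_mother])

print(in_order_seen)
print(reordered_seen)
print(in_order_state == reordered_state)
```

Both replicas converge to the same final state, but the reordered replica passes through an intermediate state in which the mother can see the photos.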

RECORD TIMELINE CONSISTENCY

• RECORD-LEVEL MASTERING:

ONE OF THE REPLICAS IS DESIGNATED AS THE MASTER, INDEPENDENTLY FOR EACH RECORD,

AND ALL UPDATES TO THAT RECORD ARE FORWARDED TO THE MASTER.

THE REPLICA RECEIVING THE MAJORITY OF WRITE REQUESTS FOR A PARTICULAR RECORD BECOMES THE MASTER FOR THAT RECORD

• PER-RECORD TIMELINE CONSISTENCY

ALL REPLICAS OF A GIVEN RECORD APPLY ALL UPDATES TO THE RECORD IN THE SAME ORDER.

THE RECORD CARRIES A SEQUENCE NUMBER THAT IS INCREMENTED ON EVERY WRITE
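A minimal sketch of how a per-record master and a sequence number can give every replica the same update order. The class names, the region names, and the buffering of early arrivals are assumptions made for illustration. The slides do not say how replicas enforce the order, only that they all apply updates in the same order.

```python
class Replica:
    def __init__(self, region):
        self.region = region
        self.seq = 0        # sequence number of the last applied write
        self.value = None
        self.history = []   # values in the order this replica applied them
        self.pending = {}   # updates that arrived before their predecessors

    def apply(self, seq, value):
        if seq <= self.seq:
            return  # already applied, ignore the duplicate
        self.pending[seq] = value
        # Apply every buffered update that is next in sequence.
        while self.seq + 1 in self.pending:
            self.seq += 1
            self.value = self.pending.pop(self.seq)
            self.history.append(self.value)


class MasteredRecord:
    def __init__(self, regions, master_region):
        self.replicas = {region: Replica(region) for region in regions}
        self.master_region = master_region
        self.next_seq = 0
        self.log = []  # (seq, value) pairs in the order the master committed them

    def write(self, origin_region, value):
        # Every write goes to the master, which assigns the next sequence number.
        forwarded = origin_region != self.master_region
        self.next_seq += 1
        self.replicas[self.master_region].apply(self.next_seq, value)
        self.log.append((self.next_seq, value))
        return self.next_seq, forwarded

    def replicate(self, region, delivery_order=None):
        # Deliver committed updates to one replica, possibly out of order.
        indexes = range(len(self.log)) if delivery_order is None else delivery_order
        for index in indexes:
            self.replicas[region].apply(*self.log[index])


record = MasteredRecord(["west", "east"], master_region="west")
first = record.write("west", "v1")   # originates at the master
second = record.write("east", "v2")  # forwarded to the master
record.replicate("east", delivery_order=[1, 0])  # updates arrive reversed

print(record.replicas["west"].history)
print(record.replicas["east"].history)
```

Even though the east replica receives the updates in reverse, it applies them in sequence-number order, so its history matches the master's.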

PER-RECORD TIMELINE CONSISTENCY

• WE (CURRENTLY) KEEP ONLY ONE VERSION OF A RECORD AT EACH REPLICA.

RECORD TIMELINE CONSISTENCY

TRANSACTIONS:

• ALICE CHANGES STATUS FROM “SLEEPING” TO “AWAKE”

• ALICE CHANGES LOCATION FROM “HOME” TO “WORK”

• TIMELINE CONSISTENCY COMES AT A PRICE

• WRITES NOT ORIGINATING IN THE RECORD’S MASTER REGION ARE FORWARDED TO THE MASTER AND HAVE LONGER LATENCY

• WHEN THE MASTER REGION IS DOWN, THE RECORD IS UNAVAILABLE FOR WRITES

EXPERIMENTAL SETUP

• THREE PNUTS REGIONS

• 2 WEST COAST, 1 EAST COAST

• 5 STORAGE UNITS, 2 MESSAGE BROKERS, 1 ROUTER

• WEST: DUAL 2.8 GHZ XEON, 4GB RAM, 6 DISK RAID 5 ARRAY

• EAST: QUAD 2.13 GHZ XEON, 4GB RAM, 1 SATA DISK

• WORKLOAD

• 1200-3600 REQUESTS/SECOND

• 0-50% WRITES

• 80% LOCALITY

INSERT

• INSERTS REQUIRED:

• 75.6 MS PER INSERT IN WEST 1 (TABLET MASTER)

• 131.5 MS PER INSERT INTO THE NON-MASTER WEST 2

• 315.5 MS PER INSERT INTO THE NON-MASTER EAST

SCALABILITY

[FIGURE: AVERAGE LATENCY (MS, AXIS 0-160) VS. NUMBER OF STORAGE UNITS (1-6), FOR HASH TABLE AND ORDERED TABLE]

CONCLUSION AND ONGOING WORK

• PNUTS IS AN INTERESTING RESEARCH PRODUCT

• RESEARCH: CONSISTENCY, PERFORMANCE, FAULT TOLERANCE, RICH FUNCTIONALITY

• PRODUCT: MAKE IT WORK, KEEP IT (RELATIVELY) SIMPLE, LEARN FROM EXPERIENCE AND REAL APPLICATIONS

• ONGOING WORK

• INDEXES AND MATERIALIZED VIEWS

• BUNDLED UPDATES

• BATCH QUERY PROCESSING

THANK YOU!

QUESTIONS?