REQUIREMENTS OF WEB APPLICATIONS
• SCALABILITY: ARCHITECTURAL SCALABILITY, SCALE LINEARLY
• GEOGRAPHIC SCOPE: DATA REPLICAS ON MULTIPLE CONTINENTS
• HIGH AVAILABILITY: DESPITE FAILURES, APPS WILL STILL BE ABLE TO READ
• RELAXED CONSISTENCY GUARANTEES: TOLERATE STALE OR REORDERED DATA
WHAT IS PNUTS?
• PNUTS: A MASSIVELY PARALLEL AND GEOGRAPHICALLY DISTRIBUTED DATABASE SYSTEM FOR YAHOO!’S WEB APPLICATIONS.
• PNUTS PROVIDES:
DATA STORAGE ORGANIZED AS HASHED OR ORDERED TABLES
LOW LATENCY FOR LARGE NUMBERS OF CONCURRENT REQUESTS INCLUDING UPDATES AND QUERIES
NOVEL PER-RECORD CONSISTENCY GUARANTEES
DATA STORAGE AND RETRIEVAL
• DATA STORAGE ORGANIZED AS HASHED OR ORDERED TABLES
• IN ORDER TO DETERMINE WHICH STORAGE UNIT IS RESPONSIBLE FOR A GIVEN RECORD TO BE READ OR WRITTEN BY THE CLIENT, WE MUST FIRST DETERMINE WHICH TABLET CONTAINS THE RECORD, AND THEN DETERMINE WHICH STORAGE UNIT HAS THAT TABLET.
• BOTH OF THESE FUNCTIONS ARE CARRIED OUT BY THE ROUTER.
• ROUTERS CONTAIN ONLY A CACHED COPY OF THE INTERVAL MAPPING
• THE MAPPING IS OWNED BY THE TABLET CONTROLLER
• THE TABLET CONTROLLER DETERMINES WHEN TO MOVE A TABLET BETWEEN STORAGE UNITS AND WHEN A LARGE TABLET MUST BE SPLIT
• ROUTERS PERIODICALLY POLL THE TABLET CONTROLLER TO GET ANY CHANGES TO THE MAPPING
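The two-step lookup described above (key to tablet, then tablet to storage unit) can be sketched as a binary search over a cached interval mapping. This is a minimal illustration, not the PNUTS implementation: the `Router` class, its field names, the use of MD5 for hashed tables, and the `refresh` method standing in for polling the tablet controller are all assumptions made for the example.

```python
import bisect
import hashlib


class Router:
    """Hypothetical router holding a cached copy of the interval mapping."""

    def __init__(self, boundaries, tablets, tablet_to_unit, hashed=False):
        # boundaries[i] is the inclusive lower bound of tablets[i]; sorted ascending.
        self.boundaries = list(boundaries)
        self.tablets = list(tablets)
        self.tablet_to_unit = dict(tablet_to_unit)
        self.hashed = hashed

    def _position(self, key):
        # Ordered tables use the key itself; hashed tables use a hash of the key.
        # MD5 truncated to 32 bits is an assumption chosen only for illustration.
        if self.hashed:
            digest = hashlib.md5(key.encode("utf-8")).hexdigest()
            return int(digest[:8], 16)
        return key

    def lookup(self, key):
        # Step 1: find the tablet whose interval contains the key.
        idx = bisect.bisect_right(self.boundaries, self._position(key)) - 1
        tablet = self.tablets[idx]
        # Step 2: find the storage unit that currently holds that tablet.
        return tablet, self.tablet_to_unit[tablet]

    def refresh(self, boundaries, tablets, tablet_to_unit):
        # Stand-in for polling the tablet controller, which owns the mapping.
        self.__init__(boundaries, tablets, tablet_to_unit, self.hashed)


ordered = Router(["", "h", "q"], ["t0", "t1", "t2"],
                 {"t0": "su1", "t1": "su2", "t2": "su1"})
print(ordered.lookup("kiwi"))  # key falls in the interval starting at "h"
```

After the tablet controller moves or splits a tablet, calling `refresh` with the new mapping is enough for subsequent lookups to route correctly; the router itself holds no authoritative state.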
ASYNCHRONOUS REPLICATION AND CONSISTENCY
• EXAMPLE OF EVENTUAL CONSISTENCY
• A USER WISHES TO DO A SEQUENCE OF 2 UPDATES TO HIS RECORD:
U1: REMOVE HIS MOTHER FROM THE LIST OF PEOPLE WHO CAN VIEW HIS PHOTOS
U2: POST SPRING-BREAK PHOTOS
IF SOME REPLICA APPLIES U2 BEFORE U1, A USER IS ABLE TO READ A STATE OF THE RECORD THAT NEVER SHOULD HAVE EXISTED: THE PHOTOS HAVE BEEN POSTED BUT THE CHANGE IN ACCESS CONTROL HAS NOT TAKEN PLACE.
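The anomaly can be reproduced with a small simulation. The record layout (`viewers`, `photos`), the update encoding, and the helper names below are hypothetical; the sketch only shows that a replica applying the two updates in the opposite order passes through a state in which the mother can see the photos, even though both orders end in the same final state.

```python
def apply_update(record, update):
    """Return a new record with one update applied (hypothetical encoding)."""
    new = {"viewers": set(record["viewers"]), "photos": list(record["photos"])}
    kind, value = update
    if kind == "remove_viewer":
        new["viewers"].discard(value)
    elif kind == "post_photo":
        new["photos"].append(value)
    return new


def states(record, updates):
    """List every intermediate state a reader of this replica could observe."""
    seen = []
    for update in updates:
        record = apply_update(record, update)
        seen.append(record)
    return seen


def mother_can_see_photos(record):
    return "mother" in record["viewers"] and bool(record["photos"])


initial = {"viewers": {"mother", "friend"}, "photos": []}
u1 = ("remove_viewer", "mother")
u2 = ("post_photo", "spring_break.jpg")

in_order = states(initial, [u1, u2])    # a replica applying U1 then U2
reordered = states(initial, [u2, u1])   # a replica applying U2 then U1

print(any(mother_can_see_photos(s) for s in in_order))
print(any(mother_can_see_photos(s) for s in reordered))
```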
RECORD TIMELINE CONSISTENCY
• RECORD-LEVEL MASTERING:
ONE OF THE REPLICAS IS DESIGNATED AS THE MASTER, INDEPENDENTLY FOR EACH RECORD,
AND ALL UPDATES TO THAT RECORD ARE FORWARDED TO THE MASTER.
THE REPLICA RECEIVING THE MAJORITY OF WRITE REQUESTS FOR A PARTICULAR RECORD BECOMES THE MASTER FOR THAT RECORD
• PER-RECORD TIMELINE CONSISTENCY
ALL REPLICAS OF A GIVEN RECORD APPLY ALL UPDATES TO THE RECORD IN THE SAME ORDER.
THE RECORD CARRIES A SEQUENCE NUMBER THAT IS INCREMENTED ON EVERY WRITE
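A minimal sketch of how per-record sequence numbers can enforce a single update order at every replica. The `MasterReplica` and `Replica` classes are hypothetical, and buffering out-of-order messages until the gap is filled is an assumption made for the example; the slides do not describe how the real system delivers updates.

```python
class MasterReplica:
    """Hypothetical record master: stamps each write with the next sequence number."""

    def __init__(self):
        self.seq = 0

    def write(self, update):
        self.seq += 1
        return (self.seq, update)


class Replica:
    """Hypothetical replica: applies updates strictly in sequence-number order."""

    def __init__(self):
        self.value = {}
        self.applied_seq = 0
        self.pending = {}  # updates that arrived ahead of their turn

    def receive(self, message):
        seq, update = message
        if seq <= self.applied_seq:
            return  # duplicate of an update that was already applied
        self.pending[seq] = update
        # Apply every update whose predecessors have all been applied.
        while self.applied_seq + 1 in self.pending:
            self.applied_seq += 1
            self.value.update(self.pending.pop(self.applied_seq))


master = MasterReplica()
m1 = master.write({"viewers": ["friend"]})            # U1: mother removed
m2 = master.write({"photos": ["spring_break.jpg"]})   # U2: photos posted

replica = Replica()
replica.receive(m2)   # arrives early: held back, nothing becomes visible
replica.receive(m1)   # gap filled: U1 then U2 are applied in order
print(replica.applied_seq, replica.value)
```

Because U2 is held until U1 has been applied, no reader of this replica can observe the photos before the access-control change.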
RECORD TIMELINE CONSISTENCY
TRANSACTIONS:
• ALICE CHANGES STATUS FROM “SLEEPING” TO “AWAKE”
• ALICE CHANGES LOCATION FROM “HOME” TO “WORK”
• TIMELINE CONSISTENCY COMES AT A PRICE
• WRITES NOT ORIGINATING IN THE RECORD'S MASTER REGION ARE FORWARDED TO THE MASTER AND HAVE LONGER LATENCY
• WHEN THE MASTER REGION IS DOWN, THE RECORD IS UNAVAILABLE FOR WRITES
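Both costs can be illustrated with a toy model of write routing. Everything here is an assumption for the sake of the example: the `Cluster` class, the region names, and the latency figures are hypothetical and are not measurements from the system.

```python
class MasterUnavailable(Exception):
    """Raised when a record's master region cannot accept writes."""


class Cluster:
    """Hypothetical model of per-record mastering across regions."""

    def __init__(self, local_ms, cross_region_ms):
        self.local_ms = local_ms                # cost of a write at the master
        self.cross_region_ms = cross_region_ms  # one-way delay per region pair
        self.master_of = {}                     # record key -> master region
        self.down = set()                       # regions currently unavailable

    def write_latency(self, key, origin):
        master = self.master_of[key]
        if master in self.down:
            raise MasterUnavailable(f"master region {master!r} is down")
        if origin == master:
            return self.local_ms
        # Forward the write to the master and wait for the acknowledgement.
        hop = self.cross_region_ms[frozenset((origin, master))]
        return self.local_ms + 2 * hop


cluster = Cluster(local_ms=10, cross_region_ms={frozenset(("west", "east")): 40})
cluster.master_of["alice"] = "west"

print(cluster.write_latency("alice", "west"))  # write at the master region
print(cluster.write_latency("alice", "east"))  # write forwarded across regions
```

Marking the master region as down (`cluster.down.add("west")`) makes every write to that record raise `MasterUnavailable`, regardless of where it originates.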
EXPERIMENTAL SETUP
• THREE PNUTS REGIONS
• 2 WEST COAST, 1 EAST COAST
• 5 STORAGE UNITS, 2 MESSAGE BROKERS, 1 ROUTER
• WEST: DUAL 2.8 GHZ XEON, 4 GB RAM, 6-DISK RAID 5 ARRAY
• EAST: QUAD 2.13 GHZ XEON, 4 GB RAM, 1 SATA DISK
• WORKLOAD
• 1200-3600 REQUESTS/SECOND
• 0-50% WRITES
• 80% LOCALITY
INSERT
• INSERTS REQUIRED:
• 75.6 MS PER INSERT INTO WEST 1 (TABLET MASTER),
• 131.5 MS PER INSERT INTO THE NON-MASTER WEST 2, AND
• 315.5 MS PER INSERT INTO THE NON-MASTER EAST.
SCALABILITY
[FIGURE: AVERAGE LATENCY (MS) VS. NUMBER OF STORAGE UNITS (1 TO 6), FOR HASH TABLE AND ORDERED TABLE]
CONCLUSION AND ONGOING WORK
• PNUTS IS AN INTERESTING RESEARCH PRODUCT
• RESEARCH: CONSISTENCY, PERFORMANCE, FAULT TOLERANCE, RICH FUNCTIONALITY
• PRODUCT: MAKE IT WORK, KEEP IT (RELATIVELY) SIMPLE, LEARN FROM EXPERIENCE AND REAL APPLICATIONS
• ONGOING WORK
• INDEXES AND MATERIALIZED VIEWS
• BUNDLED UPDATES
• BATCH QUERY PROCESSING