Scalable Internet Services

Cluster Lessons and Architecture Design for Scalable Services

BR01, TACC, Porcupine, SEDA and Capriccio

Ben Y. Zhao [email protected]

Outline

• Overview of cluster services
• Lessons from giant-scale services (BR01)
• SEDA (staged event-driven architecture)
• Capriccio

Scalable Servers

• Clustered services
  • natural platform for large web services
  • search engines, DB servers, transactional servers
• Key benefits
  • low cost of computing: COTS vs. SMP
  • incremental scalability
  • load-balance traffic/requests across servers
• Extension from the single-server model
  • reliable/fast communication, but partitioned data

Goals

• Failure transparency
  • hot-swapping components without loss of availability
  • homogeneous functionality and/or replication
• Load balancing
  • partition data / requests for maximum service rate
  • need to colocate requests with their associated data
• Scalability
  • aggregate performance should scale with the number of servers
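One way to picture the partitioning goal is the minimal C sketch below. It assumes each request carries a string key (a user or mailbox name, say) and that data is partitioned by hashing that key. The names fnv1a and pick_server are hypothetical and not from the talk; FNV-1a is only a convenient stand-in hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a hash of a NUL-terminated key (stand-in hash, chosen for brevity). */
static uint32_t fnv1a(const char *key) {
    uint32_t h = 2166136261u;
    for (; *key; key++) {
        h ^= (uint8_t)*key;
        h *= 16777619u;
    }
    return h;
}

/* Map a key to one of n_servers partitions. The same key always maps to
 * the same server, so a request lands where its data lives. */
size_t pick_server(const char *key, size_t n_servers) {
    assert(n_servers > 0);
    return fnv1a(key) % n_servers;
}
```

Plain modulo hashing remaps most keys when the server count changes, so a real service would likely need a directory or consistent hashing on top; that is outside this sketch.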

Two Different Models

• Read-mostly data
  • web servers, DB servers, search engines (query)
  • replicate across servers + (RR DNS / redirector)

[Figure: clients reach the servers across an IP network (WAN) through round-robin DNS]

Two Different Models …

• Read-write model
  • mail servers, e-commerce sites, hosted services
  • small(er) replication factor for stronger consistency

[Figure: clients reach the servers across an IP network (WAN) through a load redirector]

Key Architecture Challenges

• Providing high availability
  • availability across component failures
• Handling flash crowds / peak load
  • need support for massive concurrency
• Other challenges
  • upgradability: maintaining availability and minimal cost during upgrades in S/W, H/W, functionality
  • error diagnosis: fast isolation of failures / performance degradation

Nuggets

• Definitions
  • uptime = (MTBF – MTTR) / MTBF
  • yield = queries completed / queries offered
  • harvest = data available / complete data
• MTTR
  • at least as important as MTBF
  • much easier to tune and quantify
• DQ principle
  • data per query × queries per second → constant
  • physical bottlenecks limit overall throughput
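The definitions above translate directly into arithmetic. The sketch below is only an illustration: the function names are made up here, and queries_per_sec_at simply rearranges the DQ product to show that, for a fixed DQ value, touching more data per query leaves room for fewer queries per second.

```c
#include <assert.h>

/* uptime = (MTBF - MTTR) / MTBF, both in the same time unit. */
double uptime(double mtbf, double mttr) {
    assert(mtbf > 0.0);
    return (mtbf - mttr) / mtbf;
}

/* yield = queries completed / queries offered. */
double yield_fraction(double completed, double offered) {
    assert(offered > 0.0);
    return completed / offered;
}

/* harvest = data available / complete data. */
double harvest(double available, double complete) {
    assert(complete > 0.0);
    return available / complete;
}

/* DQ value: data per query times queries per second. */
double dq_value(double data_per_query, double queries_per_sec) {
    return data_per_query * queries_per_sec;
}

/* With DQ treated as a constant, the sustainable query rate for a given
 * amount of data per query. */
double queries_per_sec_at(double dq, double data_per_query) {
    assert(data_per_query > 0.0);
    return dq / data_per_query;
}
```

For example, uptime(8.0, 2.0) gives 0.75, and a system with a DQ value of 200 can serve 50 queries per second at 4 units of data per query.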

Staged Event-driven Architecture

• SEDA (SOSP ’01)

Break…

• Come back in 5 mins
• more on threads vs. events…

Tapestry Software Architecture

[Figure: Tapestry software stack. Labels: applications; application programming interface; core router; Dynamic Tap.; distance map; Patchwork; network; SEDA event-driven framework; Java Virtual Machine]

Impact of Correlated Events

• web / application servers
  • independent requests
  • maximize individual throughput
• correlated requests: A + B + C → D
  • e.g. online continuous queries, sensor aggregation, p2p control layer, streaming data mining

[Figure: events A, B, C arriving from the network at an event handler]

Capriccio

• User-level light-weight threads (SOSP ’03)
• Argument
  • threads are the natural programming model
  • current problems are the result of implementation, not a fundamental flaw
• Approach
  • aim for massive scalability
  • compiler assistance
  • linked stacks, blocking graph scheduling

The Price of Concurrency

• Why is concurrency hard?
  • Race conditions
  • Code complexity
  • Scalability (no O(n) operations)
  • Scheduling & resource sensitivity
  • Inevitable overload
• Performance vs. Programmability
  • No good solution

[Figure: ease of programming vs. performance, locating Threads, Events, and the Ideal]

The Answer: Better Threads

• Goals
  • Simple programming model
  • Good tools & infrastructure
    • Languages, compilers, debuggers, etc.
  • Good performance
• Claims
  • Threads are preferable to events
  • User-level threads are key

“But Events Are Better!”

• Recent arguments for events
  • Lower runtime overhead
  • Better live state management
  • Inexpensive synchronization
  • More flexible control flow
  • Better scheduling and locality
• All true but…
  • Lauer & Needham duality argument
  • Criticisms of specific threads packages
  • No inherent problem with threads!

Criticism: Runtime Overhead

• Criticism: Threads don’t perform well for high concurrency
• Response
  • Avoid O(n) operations
  • Minimize context switch overhead
• Simple scalability test
  • Slightly modified GNU Pth
  • Thread-per-task vs. single thread
  • Same performance!

[Figure: requests/second (20,000 to 110,000) vs. concurrent tasks (1 to 1e+06, log scale) for the event-based server and the threaded server]

Criticism: Synchronization

• Criticism: Thread synchronization is heavyweight
• Response
  • Cooperative multitasking works for threads, too!
  • Also presents same problems
    • Starvation & fairness
    • Multiprocessors
    • Unexpected blocking (page faults, etc.)
  • Both regimes need help
    • Compiler / language support for concurrency
    • Better OS primitives

Criticism: Scheduling

• Criticism: Thread schedulers are too generic
  • Can’t use application-specific information
• Response
  • 2D scheduling: task & program location
    • Threads schedule based on task only
    • Events schedule by location (e.g. SEDA)
      • Allows batching
      • Allows prediction for SRCT (shortest remaining completion time)
  • Threads can use 2D, too!
    • Runtime system tracks current location
    • Call graph allows prediction

[Figure: task vs. program location, showing Threads and Events]

The Proof’s in the Pudding

• User-level threads package
  • Subset of pthreads
  • Intercept blocking system calls
  • No O(n) operations
  • Support > 100K threads
  • 5000 lines of C code
• Simple web server: Knot
  • 700 lines of C code
• Similar performance
  • Linear increase, then steady
  • Drop-off due to poll() overhead

[Figure: bandwidth (Mbits/second, 0 to 900) vs. concurrent clients (1 to 16384) for KnotC (Favor Connections), KnotA (Favor Accept), and Haboob]

Arguments For Threads

• More natural programming model
  • Control flow is more apparent
  • Exception handling is easier
  • State management is automatic
• Better fit with current tools & hardware
  • Better existing infrastructure

Why Threads: Control Flow

• Events obscure control flow
  • For programmers and tools

Threads:

    thread_main(int sock) {
        struct session s;
        accept_conn(sock, &s);
        read_request(&s);
        pin_cache(&s);
        write_response(&s);
        unpin(&s);
    }

    pin_cache(struct session *s) {
        pin(s);
        if( !in_cache(s) )
            read_file(s);
    }

Events:

    AcceptHandler(event e) {
        struct session *s = new_session(e);
        RequestHandler.enqueue(s);
    }

    RequestHandler(struct session *s) {
        …; CacheHandler.enqueue(s);
    }

    CacheHandler(struct session *s) {
        pin(s);
        if( !in_cache(s) ) ReadFileHandler.enqueue(s);
        else ResponseHandler.enqueue(s);
    }

    . . .

    ExitHandler(struct session *s) {
        …; unpin(s); free_session(s);
    }

[Figure: web server control-flow graph with nodes AcceptConn., ReadRequest, PinCache, ReadFile, WriteResponse, Exit]

Why Threads: Exceptions

• Exceptions complicate control flow
  • Harder to understand program flow
  • Cause bugs in cleanup code

Threads:

    thread_main(int sock) {
        struct session s;
        accept_conn(sock, &s);
        if( !read_request(&s) )
            return;
        pin_cache(&s);
        write_response(&s);
        unpin(&s);
    }

    pin_cache(struct session *s) {
        pin(s);
        if( !in_cache(s) )
            read_file(s);
    }

Events:

    CacheHandler(struct session *s) {
        pin(s);
        if( !in_cache(s) ) ReadFileHandler.enqueue(s);
        else ResponseHandler.enqueue(s);
    }

    RequestHandler(struct session *s) {
        …; if( error ) return; CacheHandler.enqueue(s);
    }

    . . .

    ExitHandler(struct session *s) {
        …; unpin(s); free_session(s);
    }

    AcceptHandler(event e) {
        struct session *s = new_session(e);
        RequestHandler.enqueue(s);
    }

[Figure: web server control-flow graph with nodes AcceptConn., ReadRequest, PinCache, ReadFile, WriteResponse, Exit]

Why Threads: State Management

• Events require manual state management
  • Hard to know when to free
  • Use GC or risk bugs

[Code: the same Threads and Events listings as on the “Why Threads: Exceptions” slide]

[Figure: web server control-flow graph with nodes AcceptConn., ReadRequest, PinCache, ReadFile, WriteResponse, Exit]

Why Threads: Existing Infrastructure

• Lots of infrastructure for threads
  • Debuggers
  • Languages & compilers
• Consequences
  • More amenable to analysis
  • Less effort to get working systems

Building Better Threads

• Goals
  • Simplify the programming model
    • Thread per concurrent activity
    • Scalability (100K+ threads)
  • Support existing APIs and tools
  • Automate application-specific customization
• Mechanisms
  • User-level threads
  • Plumbing: avoid O(n) operations
  • Compile-time analysis
  • Run-time analysis

Case for User-Level Threads

• Decouple programming model and OS
  • Kernel threads
    • Abstract hardware
    • Expose device concurrency
  • User-level threads
    • Provide clean programming model
    • Expose logical concurrency
• Benefits of user-level threads
  • Control over concurrency model!
  • Independent innovation
  • Enables static analysis
  • Enables application-specific tuning
• Similar argument to the design of overlay networks

[Figure: layer diagram with labels App, User, Threads, OS]

Capriccio Internals

• Cooperative user-level threads
  • Fast context switches
  • Lightweight synchronization
• Kernel mechanisms
  • Asynchronous I/O (Linux)
• Efficiency
  • Avoid O(n) operations
  • Fast, flexible scheduling
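To make "cooperative user-level threads" concrete, here is a minimal sketch that switches between two tasks entirely in user space using the ucontext calls available in glibc on Linux. It is an illustration only, not Capriccio's implementation: task_a, task_b, run_log and run_demo are hypothetical names, and each task yields explicitly with swapcontext rather than at intercepted blocking calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)
#define LOG_MAX 16

static ucontext_t main_ctx, ctx_a, ctx_b;
static char stack_a[STACK_SIZE], stack_b[STACK_SIZE];

char run_log[LOG_MAX];   /* records the order in which tasks ran */
static size_t run_len;

static void record(char c) {
    if (run_len < LOG_MAX - 1)
        run_log[run_len++] = c;
}

static void task_a(void) {
    record('a');
    swapcontext(&ctx_a, &ctx_b);   /* cooperative yield to task b */
    record('a');
}                                  /* on return, uc_link resumes task b */

static void task_b(void) {
    record('b');
    swapcontext(&ctx_b, &ctx_a);   /* cooperative yield back to task a */
    record('b');
}                                  /* on return, uc_link resumes main */

/* Give a task its own stack and name the context to resume when it ends. */
static void init_task(ucontext_t *ctx, char *stack, ucontext_t *next,
                      void (*fn)(void)) {
    getcontext(ctx);
    ctx->uc_stack.ss_sp = stack;
    ctx->uc_stack.ss_size = STACK_SIZE;
    ctx->uc_link = next;
    makecontext(ctx, fn, 0);
}

/* Run both tasks to completion; control returns here once task b ends. */
void run_demo(void) {
    memset(run_log, 0, sizeof run_log);
    run_len = 0;
    init_task(&ctx_a, stack_a, &ctx_b, task_a);
    init_task(&ctx_b, stack_b, &main_ctx, task_b);
    swapcontext(&main_ctx, &ctx_a);
}
```

After run_demo() the log shows the two tasks interleaving at their yield points. No kernel thread is created, and nothing runs concurrently, which is why synchronization between such threads can stay lightweight on a single processor.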

Safety: Linked Stacks

• The problem: fixed stacks
  • Overflow vs. wasted space
  • LinuxThreads: 2MB/stack
  • Limits thread numbers
• The solution: linked stacks
  • Allocate space as needed
  • Compiler analysis
    • Add runtime checkpoints
    • Guarantee enough space until next check

[Figure: fixed stacks (wasted space or overflow) vs. a linked stack]

Linked Stacks: Algorithm

• Parameters
  • MaxPath
  • MinChunk
• Steps
  • Break cycles
  • Trace back
  • Checkpoints limit MaxPath length
• Special cases
  • Function pointers
  • External calls
  • Use large stack

[Figure: call graph with node labels 5, 4, 2, 6, 3, 3, 2, 3; MaxPath = 8]
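The checkpoint-placement idea can be sketched in a few lines of C. This is a simplified reading of the steps on the slide, not the compiler pass from the paper. It assumes cycles are already broken, that functions are numbered so every caller has a lower index than its callees, and that no single frame exceeds MaxPath. The struct layout and the name place_checkpoints are hypothetical, and MinChunk is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

struct call_graph {
    size_t n;                              /* number of functions */
    int frame[MAX_NODES];                  /* stack frame size per function */
    int calls[MAX_NODES][MAX_NODES];       /* calls[u][v] != 0: u calls v */
    int checkpoint[MAX_NODES][MAX_NODES];  /* output: checkpoint on u -> v */
};

/* Walk the graph from the leaves back toward the roots. need[v] is the most
 * stack that can be consumed starting at v before the next checkpoint. If
 * calling v from u would push that past max_path, put a checkpoint on the
 * edge so the callee starts counting again from zero. Returns the number
 * of checkpoints placed. */
size_t place_checkpoints(struct call_graph *g, int max_path) {
    int need[MAX_NODES];
    size_t placed = 0;

    assert(g->n <= MAX_NODES);
    for (size_t u = g->n; u-- > 0; ) {
        int deepest = 0;
        for (size_t v = u + 1; v < g->n; v++) {
            if (!g->calls[u][v])
                continue;
            if (g->frame[u] + need[v] > max_path) {
                g->checkpoint[u][v] = 1;   /* path too long: check here */
                placed++;
            } else if (need[v] > deepest) {
                deepest = need[v];
            }
        }
        need[u] = g->frame[u] + deepest;
    }
    return placed;
}
```

With a three-function chain of frame sizes 5, 4 and 2 and MaxPath = 8, the last two frames fit together (6), but adding the first would need 11, so the sketch places a single checkpoint on the first call edge.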

Special Cases

• Function pointers
  • categorize function pointers by number and type of arguments
  • “guess” which function will/can be called
• External functions
  • users annotate trusted stack bounds on libraries
  • or (re)use a small number of large stack chunks
• Result
  • use/reuse stack chunks much like VM
  • can efficiently share stack chunks
  • memory-touch benchmark: factor of 3 reduction in paging cost

Scheduling: Blocking Graph

• Lessons from event systems
  • Break app into stages
  • Schedule based on stage priorities
  • Allows SRCT scheduling, finding bottlenecks, etc.
• Capriccio does this for threads
  • Deduce stage with stack traces at blocking points
  • Prioritize based on runtime information

[Figure: web server blocking graph with nodes Accept, Read, Open, Read, Write, Close, Close]

Resource-Aware Scheduling

• Track resources used along BG edges
  • Memory, file descriptors, CPU
  • Predict future from the past
• Algorithm
  • Increase use when underutilized
  • Decrease use near saturation
• Advantages
  • Operate near the knee w/o thrashing
  • Automatic admission control

[Figure: web server blocking graph with nodes Accept, Read, Open, Read, Write, Close, Close]
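The "increase when underutilized, decrease near saturation" rule can be sketched as a small admission controller. The additive-increase / multiplicative-decrease policy, the struct fields, and the name adjust_limit are assumptions made for this illustration; the slides do not say which policy Capriccio actually uses.

```c
#include <assert.h>

struct admission {
    int limit;      /* current cap on concurrently admitted tasks */
    int min_limit;  /* never drop below this */
    int max_limit;  /* never grow beyond this */
};

/* Adjust the cap from one utilization sample in [0, 1] for some resource
 * (memory, file descriptors, CPU). Near saturation the cap is halved;
 * otherwise it grows by one. Returns the new cap. */
int adjust_limit(struct admission *a, double utilization, double high_water) {
    assert(a->min_limit >= 1 && a->min_limit <= a->max_limit);
    if (utilization >= high_water)
        a->limit /= 2;   /* back off quickly before thrashing sets in */
    else
        a->limit += 1;   /* probe gently for spare capacity */
    if (a->limit < a->min_limit)
        a->limit = a->min_limit;
    if (a->limit > a->max_limit)
        a->limit = a->max_limit;
    return a->limit;
}
```

Starting from a cap of 10, a sample at 50% utilization raises it to 11, and a following sample at 95% (with a 90% high-water mark) drops it to 5. Holding back new work while the cap is low is what gives the admission-control effect.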

Pitfalls

• What is the maximum amount of a resource?
  • depends on workload
  • e.g. disk thrashing depends on sequential or random seeks
  • use early signs of thrashing to indicate maximum capacity
• Detecting thrashing
  • can only estimate using “productivity/overhead”
  • productivity from guessing (threads created, files opened/closed)

Thread Performance

Time of thread operations (microseconds)

                         Capriccio  Capriccio-notrace  LinuxThreads  NPTL
Thread creation          21.5       21.5               37.5          17.7
Context switch           0.56       0.24               0.71          0.65
Uncontested mutex lock   0.04       0.04               0.14          0.15

• Slightly slower thread creation
• Faster context switches
  • Even with stack traces!
• Much faster mutexes

Runtime Overhead

• Tested Apache 2.0.44
• Stack linking
  • 78% slowdown for null call
  • 3-4% overall
• Resource statistics
  • 2% (on all the time)
  • 0.1% (with sampling)
• Stack traces
  • 8% overhead

Microbenchmark: Producer / Consumer

Web Server Performance

Example of “Great Systems Paper”

• observe higher-level issue
  • threads vs. event programming abstraction
• use previous work (duality) to identify problem
  • why are threads not as efficient as events?
• good systems design
  • call graph analysis for linked stacks
  • resource-aware scheduling
• good execution
  • full solid implementation
  • analysis leading to full understanding of detailed issues
• cross-area approach (help from PL research)

Acknowledgements

• Many slides “borrowed” from the respective talks / papers:
  • Capriccio (Rob von Behren)
  • SEDA (Matt Welsh)
  • Brewer01: “Lessons…”