Capriccio: Scalable Threads for Internet Services

Rob von Behren, Jeremy Condit, Feng Zhou, George Necula and Eric Brewer
University of California at Berkeley
{jrvb, jcondit, zf, necula, brewer}@cs.berkeley.edu
http://capriccio.cs.berkeley.edu


The Stage

Highly concurrent applications

Internet servers and frameworks: Flash, Ninja, SEDA

Transaction processing databases

Workload: high performance, unpredictable load spikes. Operate “near the knee” and avoid thrashing!

[Figure: performance vs. load (concurrent tasks), showing the ideal curve, the peak where some resource is at its max, and overload where some resource is thrashing]


The Price of Concurrency

What makes concurrency hard?

Race conditions; code complexity; scalability (no O(n) operations); scheduling and resource sensitivity; inevitable overload

Performance vs. Programmability

No current system solves both. There must be a better way!

[Figure: ease of programming vs. performance, placing threads, events, and the ideal]


The Answer: Better Threads

Goals: a simple programming model; good tools and infrastructure (languages, compilers, debuggers, etc.); good performance

Claims: threads are preferable to events; user-level threads are key


“But Events Are Better!”

Recent arguments for events: lower runtime overhead; better live state management; inexpensive synchronization; more flexible control flow; better scheduling and locality

All true, but: the Lauer & Needham duality argument; the criticisms target specific thread packages; there is no inherent problem with threads!

Thread implementations can be improved


Threading Criticism: Runtime Overhead

Criticism: threads don’t perform well for high concurrency

Response: avoid O(n) operations; minimize context-switch overhead

Simple scalability test: a slightly modified GNU Pth, thread-per-task vs. a single thread. Same performance!

[Figure: requests/second vs. concurrent tasks (1 to 1e+06, log scale) for an event-based server and a threaded server]
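Much of “avoid O(n) operations” comes down to data-structure choice in the thread package. As a minimal sketch (the names struct thread, rq_push and rq_pop are hypothetical, not taken from Capriccio or GNU Pth), a scheduler's ready queue can be an intrusive FIFO so that enqueueing and dequeueing cost O(1) no matter how many threads exist:

```c
#include <stddef.h>

/* Hypothetical thread control block; only the queue link matters here. */
struct thread {
    int id;
    struct thread *next;   /* intrusive link: no allocation per enqueue */
};

/* FIFO ready queue with head and tail pointers: every operation is O(1). */
struct runqueue {
    struct thread *head;
    struct thread *tail;
    size_t len;
};

static void rq_init(struct runqueue *q) {
    q->head = q->tail = NULL;
    q->len = 0;
}

/* Appends t at the tail in constant time. */
static void rq_push(struct runqueue *q, struct thread *t) {
    t->next = NULL;
    if (q->tail)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
    q->len++;
}

/* Removes and returns the oldest runnable thread, or NULL if empty. */
static struct thread *rq_pop(struct runqueue *q) {
    struct thread *t = q->head;
    if (!t)
        return NULL;
    q->head = t->next;
    if (!q->head)
        q->tail = NULL;
    q->len--;
    t->next = NULL;
    return t;
}
```

Keeping the link inside the thread structure avoids a per-operation allocation, which matters when the queue churns for every context switch.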


Threading Criticism: Synchronization

Criticism: thread synchronization is heavyweight

Response: cooperative multitasking works for threads, too! It also presents the same problems:

starvation and fairness; multiprocessors; unexpected blocking (page faults, etc.)

Both regimes need help: compiler/language support for concurrency and better OS primitives.
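Why cooperative threads make locking cheap: when only one user-level thread runs at a time and it cannot be preempted, nothing can run between a lock's “test” and its “set”, so no atomic instruction is needed. The sketch below assumes exactly that setting (one CPU, no preemption); the names coop_lock and coop_unlock are hypothetical, and the multiprocessor problem listed above is precisely where this assumption breaks.

```c
#include <stdbool.h>
#include <stddef.h>

struct coop_thread {
    int id;
    struct coop_thread *next;   /* link in a mutex wait queue */
};

struct coop_mutex {
    struct coop_thread *owner;
    struct coop_thread *wait_head, *wait_tail;
};

/* Returns true if self acquired the lock. Otherwise self is queued and, in a
   real runtime, would then yield to the scheduler. Plain loads and stores are
   enough because no other user-level thread runs until self yields. */
static bool coop_lock(struct coop_mutex *m, struct coop_thread *self) {
    if (m->owner == NULL) {
        m->owner = self;
        return true;
    }
    self->next = NULL;
    if (m->wait_tail)
        m->wait_tail->next = self;
    else
        m->wait_head = self;
    m->wait_tail = self;
    return false;
}

/* Hands the lock to the first waiter (FIFO order, which avoids starvation)
   and returns it so the scheduler can make it runnable; NULL if none. */
static struct coop_thread *coop_unlock(struct coop_mutex *m) {
    struct coop_thread *next = m->wait_head;
    if (next) {
        m->wait_head = next->next;
        if (!m->wait_head)
            m->wait_tail = NULL;
        next->next = NULL;
    }
    m->owner = next;
    return next;
}
```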


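Threading Criticism: Scheduling

[Figure: 2D scheduling plane with axes Task and Program Location; threads schedule along the task axis, events along the program-location axis]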

Criticism: thread schedulers are too generic and can’t use application-specific information

Response: 2D scheduling by task and program location. Threads schedule based on task only; events schedule by location (e.g., SEDA), which allows batching and allows prediction for SRCT.

Threads can use 2D scheduling, too! The runtime system tracks each thread’s current location, and the call graph allows prediction.
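A minimal sketch of what scheduling by both task and location could look like for threads, assuming the runtime already tags each blocked thread with a small integer “stage” for its program location (NSTAGES, s2d_add and s2d_next_batch are hypothetical names). The scheduler picks the highest-priority location and runs a batch of threads that are all at that location, which is the batching benefit the slide attributes to event systems:

```c
#include <stddef.h>

#define NSTAGES 4   /* hypothetical number of distinct program locations */

struct task {
    int id;
    int stage;            /* program location where this thread last blocked */
    struct task *next;
};

struct sched2d {
    struct task *head[NSTAGES], *tail[NSTAGES];   /* one FIFO per location */
    int priority[NSTAGES];   /* e.g., higher for stages near completion */
};

/* Queues t under its current location. */
static void s2d_add(struct sched2d *s, struct task *t) {
    int k = t->stage;
    t->next = NULL;
    if (s->tail[k])
        s->tail[k]->next = t;
    else
        s->head[k] = t;
    s->tail[k] = t;
}

/* Picks the highest-priority non-empty location, then dequeues up to
   max_batch of its tasks into out[]. Returns how many were dequeued.
   Cost is O(NSTAGES + batch), independent of the total thread count. */
static size_t s2d_next_batch(struct sched2d *s, struct task **out, size_t max_batch) {
    int best = -1;
    for (int k = 0; k < NSTAGES; k++)
        if (s->head[k] && (best < 0 || s->priority[k] > s->priority[best]))
            best = k;
    if (best < 0)
        return 0;
    size_t n = 0;
    while (n < max_batch && s->head[best]) {
        struct task *t = s->head[best];
        s->head[best] = t->next;
        if (!s->head[best])
            s->tail[best] = NULL;
        out[n++] = t;
    }
    return n;
}
```

How the priorities are set is left open here; the blocking-graph statistics described later are one source for them.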


The Proof’s in the Pudding

User-level threads package: a subset of pthreads; intercepts blocking system calls; no O(n) operations; supports > 100K threads; 5000 lines of C code

Simple web server: Knot, 700 lines of C code

Similar performance to Haboob: a linear increase, then steady; the drop-off is due to poll() overhead.

[Figure: bandwidth (Mbits/second) vs. concurrent clients (1 to 16384, log scale) for KnotC (favor connections), KnotA (favor accept), and Haboob]
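Intercepting blocking system calls means a thread that would block in the kernel instead yields to other user-level threads. The sketch below is a hypothetical wrapper, not Capriccio's code: it puts the descriptor in non-blocking mode, and when read() reports EAGAIN it calls a yield hook (in a real runtime, that hook would park the thread until poll/epoll or asynchronous I/O reports readiness). The demo hook simulates “another thread ran and wrote data”.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical scheduler hook: run other threads until fd may be ready. */
typedef void (*yield_fn)(int fd);

/* read() that never blocks the whole process. Note: it leaves O_NONBLOCK set
   on fd; a real runtime would set it once when the descriptor is created. */
static ssize_t coop_read(int fd, void *buf, size_t len, yield_fn yield) {
    int flags = fcntl(fd, F_GETFL);
    if (flags >= 0 && !(flags & O_NONBLOCK))
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;                 /* real error: report it */
        yield(fd);                     /* would block: let others run */
    }
}

/* Demo only: stands in for "other threads ran", here one writer fills the pipe. */
static int demo_writer_fd = -1;
static int demo_yields;

static void demo_yield(int fd) {
    (void)fd;
    demo_yields++;
    if (demo_writer_fd >= 0) {
        ssize_t w = write(demo_writer_fd, "hi", 2);
        (void)w;
    }
}
```

With an empty pipe, the first read() fails with EAGAIN, the hook runs once, and the retry returns the two bytes.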


Arguments for Threads

More natural programming model: control flow is more apparent; exception handling is easier; state management is automatic

Better fit with current tools and hardware: better existing infrastructure


Arguments for Threads: Control Flow

Events obscure control flow, for programmers and for tools.

Threads:

    thread_main(int sock) {
        struct session s;
        accept_conn(sock, &s);
        read_request(&s);
        pin_cache(&s);
        write_response(&s);
        unpin(&s);
    }

    pin_cache(struct session *s) {
        pin(s);
        if (!in_cache(s))
            read_file(s);
    }

Events:

    AcceptHandler(event e) {
        struct session *s = new_session(e);
        RequestHandler.enqueue(s);
    }
    RequestHandler(struct session *s) {
        …;
        CacheHandler.enqueue(s);
    }
    CacheHandler(struct session *s) {
        pin(s);
        if (!in_cache(s))
            ReadFileHandler.enqueue(s);
        else
            ResponseHandler.enqueue(s);
    }
    . . .
    ExitHandler(struct session *s) {
        …;
        unpin(s);
        free_session(s);
    }

[Figure: web server stage graph with stages AcceptConn., ReadRequest, PinCache, ReadFile, WriteResponse, and Exit]
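To make the contrast concrete, here is a small runnable C sketch of the event-style pipeline. The handler and queue names are hypothetical, ReadFile is omitted, and each handler only records its initial. Following one request means reading five handler bodies plus the queue, whereas thread_main states the same sequence in one function.

```c
#include <stdlib.h>
#include <string.h>

/* Per-request state must live on the heap: no single stack frame spans the
   whole request once it is split across handlers. */
struct session {
    char trace[16];   /* which handlers ran, in order (illustration only) */
};

typedef void (*handler_fn)(struct session *);
struct event { handler_fn fn; struct session *s; };

/* Hypothetical fixed-size FIFO of pending events (no overflow check; enough
   for a handful of sessions). */
#define EVQ_MAX 64
static struct event evq[EVQ_MAX];
static size_t evq_head, evq_tail;

static void evq_enqueue(handler_fn fn, struct session *s) {
    evq[evq_tail++ % EVQ_MAX] = (struct event){ fn, s };
}

static char last_trace[16];

static void exit_handler(struct session *s) {
    strcat(s->trace, "X");
    strcpy(last_trace, s->trace);
    free(s);                                   /* cleanup is manual */
}
static void response_handler(struct session *s) { strcat(s->trace, "W"); evq_enqueue(exit_handler, s); }
static void cache_handler(struct session *s)    { strcat(s->trace, "C"); evq_enqueue(response_handler, s); }
static void request_handler(struct session *s)  { strcat(s->trace, "R"); evq_enqueue(cache_handler, s); }

static void accept_handler(void) {
    struct session *s = calloc(1, sizeof *s);
    if (!s)
        abort();
    strcat(s->trace, "A");
    evq_enqueue(request_handler, s);
}

/* The event loop: the request's control flow lives in the queue. */
static void run_event_loop(void) {
    while (evq_head != evq_tail) {
        struct event e = evq[evq_head++ % EVQ_MAX];
        e.fn(e.s);
    }
}
```

Two accepted sessions interleave in the queue, yet each still passes through A, R, C, W, X in order.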


Arguments for Threads: Exceptions

Exceptions complicate control flow: program flow is harder to understand, and they cause bugs in cleanup code.

[Figure: web server stage graph with stages AcceptConn., ReadRequest, PinCache, ReadFile, WriteResponse, and Exit]

Threads:

    thread_main(int sock) {
        struct session s;
        accept_conn(sock, &s);
        if (!read_request(&s))
            return;
        pin_cache(&s);
        write_response(&s);
        unpin(&s);
    }

    pin_cache(struct session *s) {
        pin(s);
        if (!in_cache(s))
            read_file(s);
    }

Events:

    CacheHandler(struct session *s) {
        pin(s);
        if (!in_cache(s))
            ReadFileHandler.enqueue(s);
        else
            ResponseHandler.enqueue(s);
    }
    RequestHandler(struct session *s) {
        …;
        if (error)
            return;
        CacheHandler.enqueue(s);
    }
    . . .
    ExitHandler(struct session *s) {
        …;
        unpin(s);
        free_session(s);
    }
    AcceptHandler(event e) {
        struct session *s = new_session(e);
        RequestHandler.enqueue(s);
    }
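The early `return` in RequestHandler is the cleanup bug: the session never reaches ExitHandler, so it is never freed. In thread_main, the stack-allocated session is reclaimed automatically by the same early return. A minimal runnable sketch of both event-side variants (all names hypothetical, with the rest of the pipeline collapsed into one call) that counts live sessions:

```c
#include <stdbool.h>
#include <stdlib.h>

struct session { bool bad_request; };

static int live_sessions;   /* sessions allocated but not yet freed */

static struct session *new_session(bool bad_request) {
    struct session *s = malloc(sizeof *s);
    if (!s)
        abort();
    s->bad_request = bad_request;
    live_sessions++;
    return s;
}

static void free_session(struct session *s) {
    free(s);
    live_sessions--;
}

static void exit_handler(struct session *s) { free_session(s); }

/* Stands in for CacheHandler ... ResponseHandler; eventually reaches exit. */
static void cache_handler(struct session *s) { exit_handler(s); }

/* Mirrors "if (error) return;": the session never reaches ExitHandler. */
static void request_handler_buggy(struct session *s) {
    if (s->bad_request)
        return;                 /* leak: nobody frees s */
    cache_handler(s);
}

/* Every error path has to be routed to the cleanup handler by hand. */
static void request_handler_fixed(struct session *s) {
    if (s->bad_request) {
        exit_handler(s);
        return;
    }
    cache_handler(s);
}
```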


Arguments for Threads: State Management


Events require manual state management: it is hard to know when to free session state, so you either use GC or risk bugs.


Arguments for Threads: Existing Infrastructure

Lots of infrastructure for threads: debuggers, languages and compilers

Consequences: more amenable to analysis; less effort to get working systems


Building Better Threads

Goals: simplify the programming model (a thread per concurrent activity); scalability (100K+ threads); support existing APIs and tools; automate application-specific customization

Mechanisms: user-level threads; plumbing (avoid O(n) operations); compile-time analysis; run-time analysis


The Case for User-Level Threads

Decouple the programming model from the OS:

Kernel threads: abstract hardware; expose device concurrency

User-level threads: provide a clean programming model; expose logical concurrency

Benefits of user-level threads: control over the concurrency model! Independent innovation; enables static analysis; enables application-specific tuning

[Figure: layering of App, Threads, and OS, with the thread package at user level]


Capriccio Internals

Cooperative user-level threads: fast context switches; lightweight synchronization

Kernel mechanisms: asynchronous I/O (Linux)

Efficiency: avoid O(n) operations; fast, flexible scheduling
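The core of cooperative user-level threading is a context switch that saves and restores registers entirely in user space. As a sketch only, the example below uses the ucontext functions available in glibc on Linux (getcontext, makecontext, swapcontext); Capriccio's own switch code is not shown in these slides and is likely different and faster. Two threads each log their letter and yield three times under a round-robin scheduler loop; uthread_yield, run_threads and the 64 KB stack size are hypothetical choices.

```c
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define NTHREADS 2
#define STACK_SIZE (64 * 1024)   /* hypothetical fixed stack per thread */

static ucontext_t main_ctx, thread_ctx[NTHREADS];
static int current;               /* index of the running user-level thread */
static int finished[NTHREADS];
static char log_buf[32];          /* records the interleaving */
static size_t log_len;

/* Cooperative yield: save this thread's registers, resume the scheduler. */
static void uthread_yield(void) {
    swapcontext(&thread_ctx[current], &main_ctx);
}

static void worker(void) {
    for (int i = 0; i < 3; i++) {
        log_buf[log_len++] = (char)('a' + current);
        uthread_yield();
    }
    finished[current] = 1;        /* returning resumes uc_link (main_ctx) */
}

/* Round-robin scheduler running on the original kernel thread. */
static void run_threads(void) {
    for (int i = 0; i < NTHREADS; i++) {
        if (getcontext(&thread_ctx[i]) != 0)
            abort();
        thread_ctx[i].uc_stack.ss_sp = malloc(STACK_SIZE);
        if (!thread_ctx[i].uc_stack.ss_sp)
            abort();
        thread_ctx[i].uc_stack.ss_size = STACK_SIZE;
        thread_ctx[i].uc_link = &main_ctx;
        makecontext(&thread_ctx[i], worker, 0);
    }
    int live = NTHREADS;
    while (live > 0) {
        live = 0;
        for (current = 0; current < NTHREADS; current++) {
            if (finished[current])
                continue;
            swapcontext(&main_ctx, &thread_ctx[current]);
            if (!finished[current])
                live++;
        }
    }
    for (int i = 0; i < NTHREADS; i++)
        free(thread_ctx[i].uc_stack.ss_sp);
}
```

Running run_threads() once leaves "ababab" in log_buf: each yield switches back to the scheduler, which resumes the next thread. Note that each thread here gets a fixed stack, which is exactly the trade-off the next slide addresses.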


Safety: Linked Stacks

The problem: fixed stacks. Overflow vs. wasted space; limits the number of threads.

The solution: linked stacks. Allocate space as needed. Compiler analysis adds runtime checkpoints that guarantee enough space until the next check.

[Figure: fixed stacks, which either overflow or waste space, vs. a linked stack]
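A minimal sketch of what a compiler-inserted checkpoint might do at run time, assuming the analysis has already computed `need`, the worst-case stack the code can use before the next checkpoint. The chunk layout and the names stack_checkpoint and stack_unlink are hypothetical, and the actual stack-pointer switch is left out (it needs assembly); the sketch only shows the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header of one stack chunk in a linked stack. */
struct chunk {
    struct chunk *prev;    /* chunk to return to when this one is popped */
    size_t size;           /* usable bytes in this chunk */
    size_t used;           /* bytes consumed by frames so far */
};

/* If the current chunk cannot hold `need` bytes, link a fresh chunk of at
   least max(need, min_chunk) bytes. A real runtime would then move the stack
   pointer into the returned chunk before running the callee. */
static struct chunk *stack_checkpoint(struct chunk *cur, size_t need, size_t min_chunk) {
    assert(cur->used <= cur->size);
    if (cur->size - cur->used >= need)
        return cur;                         /* enough room: keep going */
    size_t size = need > min_chunk ? need : min_chunk;
    struct chunk *c = malloc(sizeof *c + size);
    if (!c)
        abort();
    c->prev = cur;
    c->size = size;
    c->used = 0;
    return c;
}

/* Called when the function that triggered the link returns. */
static struct chunk *stack_unlink(struct chunk *c) {
    struct chunk *prev = c->prev;
    free(c);
    return prev;
}
```

Rounding small requests up to a minimum chunk size (MinChunk on the next slide) keeps checkpoints that fall just short of space from allocating many tiny chunks.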


Linked Stacks: Algorithm

[Figure: example call graph whose nodes are labeled with stack-frame sizes 5, 4, 2, 6, 3, 3, 2, 3, with MaxPath = 8]

Parameters: MaxPath, MinChunk

Steps: break cycles; trace back

Special cases: function pointers; external calls (use a large stack)
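One way to see how MaxPath bounds the space between checks is a greedy placement pass over a call graph whose cycles have already been broken. This is a sketch in the spirit of the slide's parameters and steps, not necessarily Capriccio's exact algorithm. It assumes callees have larger node indices than callers and that no single frame exceeds MaxPath (larger frames fall under the special cases). need[i] is the worst-case stack from entering i until the next checkpoint; a callee becomes a checkpoint whenever including it would push its caller above MaxPath, so every need[i] stays at or below MaxPath.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 16
#define MAXC 4

/* Hypothetical acyclic call graph; callees have larger indices than callers. */
struct callgraph {
    int n;
    int frame[MAXN];          /* stack-frame size of each function */
    int nchild[MAXN];
    int child[MAXN][MAXC];    /* callees of each function */
};

/* checkpoint[] must start all false. Fills need[] and marks checkpoints. */
static void place_checkpoints(const struct callgraph *g, int max_path,
                              bool checkpoint[], int need[]) {
    for (int i = g->n - 1; i >= 0; i--) {          /* callees before callers */
        assert(g->frame[i] <= max_path);
        int deepest = 0;
        for (int k = 0; k < g->nchild[i]; k++) {
            int c = g->child[i][k];
            if (checkpoint[c])
                continue;                           /* c starts its own segment */
            if (g->frame[i] + need[c] > max_path)
                checkpoint[c] = true;               /* check on entry to c */
            else if (need[c] > deepest)
                deepest = need[c];
        }
        need[i] = g->frame[i] + deepest;
    }
    checkpoint[0] = true;    /* the thread's entry point always checks */
}
```

For a hypothetical six-function graph with frames {5, 4, 2, 6, 3, 3}, edges 0→1, 0→2, 1→3, 2→4, 4→5 and MaxPath = 8, functions 0 through 3 get checkpoints while the chain 2→4→5 fits exactly into one 8-unit segment.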


Scheduling: The Blocking Graph

Lessons from event systems: break the app into stages and schedule based on stage priorities. This allows SRCT scheduling, finding bottlenecks, etc.

Capriccio does this for threads: deduce the stage with stack traces at blocking points, and prioritize based on runtime information.

[Figure: web server blocking graph with nodes Accept, Read, Open, Read, Write, Close, Close]
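A sketch of how stack traces at blocking points could be turned into blocking-graph nodes and edges. It assumes the runtime hands over the trace as an array of return addresses (on glibc, backtrace() from execinfo.h is one way to obtain them); the hashing (FNV-1a), the fixed table, and the names bg_node and bg_block are hypothetical choices. Two traces with the same 64-bit hash are treated as the same node.

```c
#include <stddef.h>
#include <stdint.h>

#define BG_SLOTS 64   /* hypothetical fixed-size node table, no resizing */

/* FNV-1a hash over the bytes of every address in the trace. */
static uint64_t trace_hash(const uintptr_t *pcs, size_t depth) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < depth; i++)
        for (size_t b = 0; b < sizeof pcs[i]; b++) {
            h ^= (uint8_t)(pcs[i] >> (8 * b));
            h *= 1099511628211ULL;
        }
    return h;
}

struct bg {
    uint64_t key[BG_SLOTS];
    int used[BG_SLOTS];
    unsigned edges[BG_SLOTS][BG_SLOTS];   /* counts of prev -> next blocking points */
};

/* Node index for this trace (created if new), or -1 if the table is full. */
static int bg_node(struct bg *g, const uintptr_t *pcs, size_t depth) {
    uint64_t h = trace_hash(pcs, depth);
    for (size_t i = 0; i < BG_SLOTS; i++) {
        size_t slot = (size_t)((h + i) % BG_SLOTS);   /* linear probing */
        if (!g->used[slot]) {
            g->used[slot] = 1;
            g->key[slot] = h;
            return (int)slot;
        }
        if (g->key[slot] == h)
            return (int)slot;
    }
    return -1;
}

/* Called when a thread blocks: find its node and count the edge from the
   thread's previous blocking point (prev < 0 for its first block). */
static int bg_block(struct bg *g, int prev, const uintptr_t *pcs, size_t depth) {
    int node = bg_node(g, pcs, depth);
    if (prev >= 0 && node >= 0)
        g->edges[prev][node]++;
    return node;
}
```

Per-edge statistics such as elapsed time or resources consumed could be kept alongside the edge counts in the same way.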


Resource-Aware Scheduling

Track the resources used along blocking-graph (BG) edges: memory, file descriptors, CPU. Predict the future from the past.

Algorithm: increase use when underutilized; decrease use near saturation

Advantages: operate near the knee without thrashing; automatic admission control

[Figure: web server blocking graph with nodes Accept, Read, Open, Read, Write, Close, Close]
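A minimal sketch of the “increase use when underutilized, decrease near saturation” rule for a single resource. It assumes each blocking-graph node keeps a moving average of how much of the resource threads consume (positive) or release (negative) when they leave that node; the thresholds, the smoothing factor, and the names update_delta and pick_node are hypothetical, and Capriccio's actual policy may weigh things differently. Favouring consumers such as Accept when the resource is idle, and releasers such as Close when it is nearly exhausted, is what yields admission control automatically.

```c
#include <stddef.h>

/* Per-node statistics for one resource (e.g., bytes of memory). */
struct node_stat {
    double avg_delta;    /* moving average of resource change on leaving node */
    int runnable;        /* threads currently waiting to run at this node */
};

/* Exponentially weighted moving average; alpha in (0, 1]. */
static void update_delta(struct node_stat *n, double observed, double alpha) {
    n->avg_delta = alpha * observed + (1.0 - alpha) * n->avg_delta;
}

/* Near saturation (utilization >= high), favour the node expected to release
   the most; when underutilized (<= low), favour the one expected to consume
   the most. In between, take the first runnable node. Returns -1 if none. */
static int pick_node(const struct node_stat *nodes, size_t n,
                     double utilization, double low, double high) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].runnable == 0)
            continue;
        if (best < 0) {
            best = (int)i;
            continue;
        }
        double d = nodes[i].avg_delta, bd = nodes[best].avg_delta;
        if ((utilization >= high && d < bd) || (utilization <= low && d > bd))
            best = (int)i;
    }
    return best;
}
```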


Thread Performance

Time of thread operations (microseconds)

                          Capriccio   Capriccio-notrace   LinuxThreads   NPTL
Thread creation              21.5           21.5              37.5        17.7
Context switch                0.56           0.24              0.71        0.65
Uncontested mutex lock        0.04           0.04              0.14        0.15

Thread creation is slightly slower than NPTL (though faster than LinuxThreads). Context switches are faster than both kernel-thread packages, even with stack traces. Uncontested mutexes are much faster.


Runtime Overhead

Tested on Apache 2.0.44.

Stack linking: 78% slowdown for a null call; 3-4% overall

Resource statistics: 2% (on all the time); 0.1% (with sampling)

Stack traces: 8% overhead


Web Server Performance


The Future: Compiler-Runtime Integration

Insight: automate the things event programmers do by hand, plus additional analysis for other things

Specific targets: live state management; synchronization; static blocking graph

Improve performance and decrease complexity


Conclusions

Threads > Events: equivalent performance; reduced complexity

Capriccio simplifies concurrency: scalable and high performance; control over the concurrency model (stack safety, resource-aware scheduling; enables compiler support and invariants)

Themes: user-level threads are key; compiler-runtime integration is very promising

[Figure: ease of programming vs. performance, placing threads, events, and Capriccio]


Apache Blocking Graph


Microbenchmark: Buffer Cache


Microbenchmark: Disk I/O


Microbenchmark: Producer / Consumer


Microbenchmark: Pipe Test


Threads vs. Events: The Duality Argument

General assumption: follow “good practices”

Observations: major concepts are analogous; program structure is similar; performance should be similar, given good implementations!

Threads                        Events
Monitors                       Event handler & queue
Exported functions             Events accepted
Call/return and fork/join      Send message / await reply
Wait on condition variable     Wait for new messages

[Figure: web server stage graph with stages AcceptConn., ReadRequest, PinCache, ReadFile, WriteResponse, and Exit]


Threads vs. Events: Can Threads Outperform Events?

Function pointers and dynamic dispatch: limit compiler optimizations; hurt branch prediction and I-cache locality

More context switches with events? Example: Haboob does 6x more than Knot, a natural result of queues.

More investigation needed!


Threading Criticism: Live State Management

Criticism: stacks are bad for live state

Response: fix with compiler help

Stack overflow vs. wasted space: dynamically link stack frames

Retaining dead state: static lifetime analysis; plan the arrangement of the stack; put some data on the heap; pop the stack before tail calls

Encouraging inefficiency: warn about inefficiency

[Figure: thread state (stack) holding live, dead, and unused regions vs. event state (heap) holding only live state]


Threading Criticism: Control Flow

Criticism: threads have restricted control flow

Response: programmers use simple patterns: call/return, parallel calls, pipelines

Complicated patterns are unnatural: hard to understand and likely to cause bugs