High Performance Erlang


Erlang and Scalability

Jan Henry Nystrom
henry@erlang-consulting.com

Percona Performance 2009

Percona Performance Conference © 2009, Erlang Training and Consulting

Introduction

• Scalability Killers
• Design Decisions – Language and Yours
• Thinking Scalable/Parallel
• Code for the correct case
• Rules of Thumb
• Scalability in the small: SMP


Scalability Killers

• Synchronization
• Resource contention


Design Decisions

No sharing

• Processes
• Encapsulation
• No implicit synchronization


Design Decisions

No implicit synchronization

• Spawn always succeeds
• Sending always succeeds
• Random access message buffer
• Fire and forget unless you need the synchronization
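The contrast between fire and forget and an explicit synchronous call can be sketched as follows. This is a minimal illustration, not code from the talk: the module name `fire_forget` and the message shapes `{event, _}`, `{request, Ref, From, Msg}` and `{reply, Ref, Answer}` are assumptions made for the example. The selective `receive` on `Ref` is what the "random access message buffer" makes possible.

```erlang
-module(fire_forget).
-export([notify/2, call/3]).

%% Fire and forget: send the message and carry on.
%% Sending to a pid does not fail, even if the process has already terminated.
notify(Pid, Event) ->
    Pid ! {event, Event},
    ok.

%% Synchronous call, for the cases where the answer is really needed.
%% The message shapes are assumptions for this sketch.
call(Pid, Msg, TimeoutMs) ->
    Ref = make_ref(),
    Pid ! {request, Ref, self(), Msg},
    receive
        %% Selective receive: only the reply tagged with our Ref is picked,
        %% wherever it sits in the mailbox.
        {reply, Ref, Answer} -> {ok, Answer}
    after TimeoutMs ->
        {error, timeout}
    end.
```

Only `call/3` makes the sender wait, so the synchronization is visible in the code exactly where it was chosen.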


Design Decisions

Concurrency oriented programming

• Concurrency support is an integral part of the language
• Distribution support
• Sets the focus firmly on the concurrent tasks
• Code for the correct case
• Clear Code

Clarity is King!

I would rather try to get clear code correct than correct code clear

Thinking Scalable/Parallel

List length: Obviously Linear

But not when you have n processors?

List length: O(logN) with sufficient processors

[Figure: tree of partial counts; four leaves of 1 combine into 2 and 2, which combine into 4]
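The tree-shaped count can be sketched in Erlang as below. This is an illustration under stated assumptions, not the talk's code: the module name `par_len` is made up, and the input is assumed to be already partitioned into chunks (a list of sublists), since splitting a plain linked list is itself a linear walk. Each level hands one half to a new process, so with enough processors the depth of the combine tree grows with the logarithm of the number of chunks.

```erlang
-module(par_len).
-export([len/1]).

%% len(Chunks): total length of a list that is assumed to be already
%% partitioned into chunks (a list of sublists).
len([]) -> 0;
len([Chunk]) -> length(Chunk);
len(Chunks) ->
    {Left, Right} = lists:split(length(Chunks) div 2, Chunks),
    Parent = self(),
    Ref = make_ref(),
    %% Count the left half in a new process ...
    spawn(fun() -> Parent ! {Ref, len(Left)} end),
    %% ... while this process counts the right half.
    RightLen = len(Right),
    receive
        {Ref, LeftLen} -> LeftLen + RightLen
    end.
```

For example, `par_len:len([[a, b], [c], [d, e, f]])` counts the three chunks in separate branches and adds the partial results on the way back up.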

Thinking Scalable/Parallel

In the Erlang setting

• Do not introduce unneeded synchronization
• Remember processes are cheap
• Do not introduce unneeded synchronization
• A terminated process is all garbage
• Do not introduce unneeded synchronization
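One way to use cheap processes and the "terminated process is all garbage" observation is to run a piece of scratch work in a short-lived process, so that its heap is reclaimed in one go when it exits. The sketch below is an assumption-laden illustration: the module name `scratch` and the function `in_worker/1` are hypothetical.

```erlang
-module(scratch).
-export([in_worker/1]).

%% Run Fun in a short-lived worker process and return its result.
%% When the worker terminates, its whole heap is reclaimed at once.
in_worker(Fun) ->
    Parent = self(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {self(), Fun()} end),
    receive
        {Pid, Result} ->
            %% Drop the monitor and any 'DOWN' message already delivered.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker died before it could send a result.
            {error, Reason}
    end.
```

The caller waits here because it needs the result; the monitor is the only synchronization involved.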

Code for the Correct Case

[Diagram: three requests, each with its own timer; for every request: set timer, request, answer, release timer/check]

[Diagram: three requests under a single timer; set timer once, request, request, request, answer]
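The single-timer variant can be sketched as follows: one timer guards the whole batch of requests, and the code path is written for the case where all answers arrive. This is a sketch under assumptions, not the talk's code: the module name `correct_case`, the function `call_all/3` and the message shapes `{request, Ref, From, Request}` and `{answer, Ref, Result}` are invented for the example.

```erlang
-module(correct_case).
-export([call_all/3]).

%% Send Request to every pid in Servers under ONE shared timer.
%% Servers are assumed to answer with {answer, Ref, Result}.
call_all(Servers, Request, TimeoutMs) ->
    Ref = make_ref(),
    Timer = erlang:send_after(TimeoutMs, self(), {timeout, Ref}),
    [S ! {request, Ref, self(), Request} || S <- Servers],
    Result = collect(Ref, length(Servers), []),
    erlang:cancel_timer(Timer),
    %% The timer may have fired just before it was cancelled; drop the message.
    receive {timeout, Ref} -> ok after 0 -> ok end,
    Result.

%% Answers are collected in arrival order, not in request order.
collect(_Ref, 0, Acc) ->
    {ok, lists:reverse(Acc)};
collect(Ref, N, Acc) ->
    receive
        {answer, Ref, R} -> collect(Ref, N - 1, [R | Acc]);
        {timeout, Ref}   -> {error, timeout}
    end.
```

After a timeout, late answers can still arrive in the caller's mailbox; a real system would have to decide how to discard them.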


Rules of Thumb

• Rule 1 - All independent tasks should be processes
• Rule 2 - Do not invent concurrency that is not there!

[Diagram: f(), g() and h() drawn as three separate stages, contrasted with four independent evaluations of h(g(f()))]
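A possible reading of the two rules in code: each independent input gets its own process (Rule 1), while the dependent steps stay a plain sequential call chain inside that process (Rule 2). The sketch is illustrative only; the module name `rules`, the function `run_all/1` and the bodies of `f/1`, `g/1` and `h/1` are hypothetical, and unlike the slide's `f()` they take an argument so that each task has its own input.

```erlang
-module(rules).
-export([run_all/1]).

%% Hypothetical stages; each one needs the result of the previous one,
%% so there is no concurrency to be had between them.
f(X) -> X + 1.
g(X) -> X * 2.
h(X) -> X - 3.

%% Rule 1: one process per independent input.
%% Rule 2: inside each process, h(g(f(In))) is an ordinary sequential call.
run_all(Inputs) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, h(g(f(In)))} end),
                Ref
            end || In <- Inputs],
    %% Collect the results in input order.
    [receive {Ref, Out} -> Out end || Ref <- Refs].
```

Splitting `f`, `g` and `h` into three processes would add message passing and waiting without adding any parallel work for a single input.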


Scalability in the small: SMP

Erlang SMP ”Credo”

SMP should be transparent to the programmer in much the same way as Erlang Distribution

• You shouldn’t have to think about it ...but sometimes you must
• Use SMP mainly for stuff that you’d make concurrent anyway
• Erlang uses concurrency as a structuring principle
• Model for the natural concurrency in your problem


Scalability in the small: SMP

• Erlang on multicore
• SMP prototype ‘97, first OTP release May ‘06.
• Mid-2006 benchmark mimicking call handling (axdmark) on the (experimental) SMP emulator. Observed speedup/core: 0.95
• First Ericsson product (TGC) released on SMP Erlang in Q2 2007.

[Figure: ”Big bang” benchmark on Sunfire T2000; x-axis: simultaneous processes; curves: 16 schedulers and 1 scheduler]


Scalability in the small: SMP

Case Study: Telephony Gateway Controller

• Mediates between legacy telephony and multimedia networks.
• Hugely complex state machines
• + massive concurrency.
• Developed in Erlang.
• Multicore version shipped to customer Q2 2007.
• Porting from 1-core PPC to 2-core Intel took < 1 man-year (including testing).

[Diagram labels: AXE, TGC, GW, GW, GW]

Scalability in the small: SMP

Case Study: Telephony Gateway Controller

All values in call/sec, as multiples of X.

Traffic scenario        IS/GCP         IS/GEP dual core,   IS/GEP dual core,    AXD CPB5   AXD CPB6
                        1 slot/board   one core running,   two cores running,
                                       2 slots/board       2 slots/board
POTS-POTS / AGW         X              2.3X (a)            4.3X (b)             0.4X       2.1X
ISUP-ISUP / Inter MGW   3.6X           7.7X (a)            13X (b)              1.55X      7.6X
ISUP-ISUP / Intra MGW   5.5X           -                   26X                  3.17X      14X

(a) One core used. (b) OTP R11_3 beta + patches.


Scalability in the small: SMP

[Figure: Speedup on 4 Hyper Threaded Pentium4; x-axis: # Schedulers; y-axis: Speedup]

# Schedulers   1      2      3      4      5      6      7      8
Speedup        1.00   1.92   2.05   2.73   3.11   3.63   3.79   3.96

• Chatty
• 1000 processes created
• Each process randomly sends req/receives ack from all other processes

Scalability in the small: SMP

Erlang VM

[Diagram: non-SMP VM; one scheduler, one run queue]

[Diagram: current SMP VM, OTP R11/R12; schedulers #1, #2 ... #N on one run queue]

[Diagram: new SMP VM, OTP R13, released 21st April; schedulers #1, #2 ... #N, one run queue per scheduler, with migration logic between the run queues]
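How many schedulers a running VM actually has can be checked from Erlang itself. The small sketch below uses `erlang:system_info/1`; the module name `sched_info` and the function `report/0` are made up for the example.

```erlang
-module(sched_info).
-export([report/0]).

%% Report how many scheduler threads the VM was started with and how many
%% of them are currently online.
report() ->
    [{schedulers, erlang:system_info(schedulers)},
     {schedulers_online, erlang:system_info(schedulers_online)}].
```

The number of scheduler threads can be set when the emulator is started, for example `erl +S 4`, which is one way to repeat a measurement with different scheduler counts.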

Scalability in the small: SMP

• Speedup of ”Big Bang” on a Tilera Tile64 chip (R13A)
• 1000 processes, all talking to each other

[Figure: speedup curves for multiple run queues and single run queue]

Memory allocation locks dominate...

Speedup: ca. 0.43 * N @ 32 cores

Scalability in the small: SMP

Shift in Bottlenecks

• All scalable Erlang systems were stress tested for CPU usage and network usage
• With SMP hardware we must stress test for memory usage
• In the typical SMP system, the bottleneck has shifted from the CPU to the memory

Scalability in the small: SMP

Death by a thousand cuts

• Many requests that generate short spikes in memory usage
• Limit or serialize those requests
• More on this in a forthcoming paper by CTO Ulf Wiger

    loop(State) ->
        receive
            {request, typeA, Data} ->
                Data1 = allocate_lots_of_memory(Data),
                a_server ! {request, typeA, self()},
                receive
                    {answer, …
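One possible way to "limit or serialize" such requests is a small gatekeeper process that hands out a fixed number of permits, so that only that many memory-hungry requests run at once. This is a sketch under assumptions and not the approach of the paper mentioned above: the module name `limiter` and the functions `start/1` and `with_permit/2` are hypothetical.

```erlang
-module(limiter).
-export([start/1, with_permit/2]).

%% Start a limiter process that hands out at most Max permits at a time.
%% Max = 1 serializes the guarded work completely.
start(Max) when is_integer(Max), Max > 0 ->
    spawn(fun() -> loop(Max, queue:new()) end).

%% Run Fun while holding a permit; the permit is handed back even if Fun
%% raises an exception.
with_permit(Limiter, Fun) ->
    Limiter ! {acquire, self()},
    receive {granted, Limiter} -> ok end,
    try Fun() after Limiter ! release end.

loop(Free, Waiting) ->
    receive
        {acquire, Pid} when Free > 0 ->
            Pid ! {granted, self()},
            loop(Free - 1, Waiting);
        {acquire, Pid} ->
            %% No permit free: park the caller until someone releases.
            loop(Free, queue:in(Pid, Waiting));
        release ->
            case queue:out(Waiting) of
                {{value, Pid}, Rest} ->
                    %% Pass the permit straight to the next waiter.
                    Pid ! {granted, self()},
                    loop(Free, Rest);
                {empty, _} ->
                    loop(Free + 1, Waiting)
            end
    end.
```

A request handler would wrap only the allocation-heavy step in `limiter:with_permit/2`. The sketch does not monitor permit holders, so a holder that is killed while holding a permit leaks it; a production version would need to handle that.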


Questions

???
