FP in industry - Erlang

Urban Boquist – Ericsson AB
<Urban.Boquist@ericsson.com>

Rev PA1 2009-10-12

Outline

• Who Am I
• Mobile Telecommunications Networks
• Packet Core Network – GPRS & SGSN
• Use of Erlang in SGSN
• SGSN Design Principles for Erlang:
  – concurrency & distribution
  – fault tolerance
  – multicore
  – overload protection
  – runtime code replacement
• Examples

Who Am I?

• Chalmers (D-linjen)
• Chalmers (PhD, Compilation & Optimization of Haskell)
• Carlstedt Research & Technology (consultant)
• QEP (own startup, consultant)
• Ericsson AB, Lindholmen
• ...

GSM – GPRS

Services in telecommunications networks:

CS – circuit switched
  ● voice
  ● SMS

PS – packet switched
  ● everything that is “IP”
  ● www
  ● email
  ● MMS

• GPRS – General Packet Radio Service

GPRS

[Figure: Radio Network and Packet Core Network]

“3G” – UMTS / WCDMA

• Different Radio Network
• Packet Core Network (almost) the same as in GPRS
• Ericsson SGSN is “dual access”
• Much higher (end user) speeds:
  – Up to 384 Kbps for 3G (WCDMA)
  – Up to 14.4 Mbps for HSDPA (later up to 42 Mbit – Evolved HSPA)
• Voice / video calls are still CS!
• Streaming audio / video is PS (TV == MBMS)
• Future: voice / video in PS
• “Voice-over-IP”


3GPP

• Standards define everything.
• Interoperability is vital!
• “Tens of thousands” pages of standard text needed to build an SGSN.
• See www.3gpp.org.

SGSN – Basic Services

Control Signalling
• authentication
• admission control
• quality of service
• mobility
• roaming
• ...

Payload transport
● user traffic
● charging

SGSN Node

Capacity
• ~ 50 k subscribers, 2000
• ~ 100 k subscribers, 2002
• ~ 500 k subscribers, 2004
• ~ 1 M subscribers, 2005
• ~ 2 M subscribers, 2008

SGSN Architecture

[Figure: Control Plane (soft real time) with CP, CP, ..., CP, CP; Switch; Payload Plane (hard real time) with PP, PP, ..., PP, PP, connecting MS and Internet]

SGSN Hardware

• ≈ 20-30 Control Processors (boards):
  – UltraSPARC or PowerPC cpus
  – 2 GB memory
  – Solaris/Linux + Erlang / C / C++
• ≈ 20-30 Payload Processors (boards):
  – 1-3 PowerPC cpus
  – Special hardware (FPGAs) for encryption
  – Physical devices: frame relay, atm, ...
  – VxWorks + C / C++
• Backplane: 1 Gbit ethernet

Current release: ≈ 2.000.000 Simultaneously Attached Users (phones)

Traffic Control in SGSN

• Control Processors (Solaris / Sparc or Linux / PowerPC)
• Most control signalling handled by Erlang code
• One “Erlang” running on each CP
• Distributed Erlang system with 20-40 nodes
• Mobile Phones are distributed over CP:s

Control Signalling

• attach (phone is turned on)
• israu (routing area update, mobility in radio network)
• activation (initiate payload traffic)
• etc. [thousands of signals]

Telecom standards are HUGE (see www.3gpp.org)!

We need a high level language – concentrate on GPRS, not on programming details!

Erlang/OTP

• Invented at Ericsson Computer Science Lab in the 1980s.
• Intended for large scale reliable telecom systems.
• Erlang is: functional language + built-in support for concurrency.
• OTP (Open Telecom Platform) = Erlang + lots of libraries.

Erlang vs. Haskell

• Erlang can do most things Haskell can (pattern matching, higher order functions, list comprehensions, ...)
• BUT – where Haskell is “beautiful”, Erlang is “ugly”!
• Erlang is strict (like ML, expressions are evaluated immediately, not when they are needed)
• Erlang has no real type system (like LISP, everything compiles but may crash at runtime)

Why Erlang?

• Good things in Erlang:
  – built-in concurrency (processes and message passing)
  – built-in distribution
  – built-in fault-tolerance
  – support for runtime code replacement
• This is exactly what is needed to build a robust Control Plane in a telecom system!
• Control Plane Software is not time critical (Erlang)
• User Plane (payload) is time critical (VxWorks + C)

Fault Tolerance

• SGSN must never be out-of-service! (99.999%)
• Hardware fault tolerance
  – Faulty boards are automatically taken out of service
  – Mobile phones redistributed
• Software fault tolerance
  – SW error triggered by one phone should not affect others!
  – Serious error in “system SW” should affect at most the phones handled by that board

Think: how can such requirements be realized?

Example: the SW handling one phone goes crazy and overwrites all the memory with garbage.

SGSN Architecture – Control Plane

[Figure: CP CP CP]

• On each CP ≈ 100 processes providing “system services”
  – “static workers”
• On each CP ≈ 50.000 processes each handling one phone
  – “dynamic workers”

Dynamic workers

• System principle: one Erlang process handles all signalling with a single mobile phone
• A worker encodes a number of state machines: receive a signal – do some computation – send a reply signal
• Payload plane translates a “signal” from the mobile phone into an Erlang message and sends it to the correct dynamic worker, and vice versa
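The one-process-per-phone principle can be sketched roughly as follows. This is a minimal illustration, not the SGSN code: the module name, the states (`detached`, `attached`, `active`) and the reply signals are all invented for the example.

```erlang
-module(dyn_worker).
-export([start/0, signal/2, loop/1]).

%% Hypothetical sketch: one process per phone. States and signal
%% names are assumptions made for illustration only.
start() ->
    spawn(?MODULE, loop, [detached]).

%% Send a signal to the worker and wait for its reply signal.
signal(Worker, Signal) ->
    Ref = make_ref(),
    Worker ! {signal, self(), Ref, Signal},
    receive
        {reply, Ref, Reply} -> Reply
    after 1000 ->
        timeout
    end.

%% Receive a signal, do some computation, send a reply signal.
loop(State) ->
    receive
        {signal, From, Ref, Signal} ->
            {Reply, NewState} = handle(State, Signal),
            From ! {reply, Ref, Reply},
            loop(NewState)
    end.

%% The state machine. A signal that is not legal in the current
%% state matches no clause, so the worker crashes.
handle(detached, attach)   -> {attach_accept, attached};
handle(attached, activate) -> {activate_accept, active};
handle(active, deactivate) -> {deactivate_accept, attached};
handle(_State, detach)     -> {detach_accept, detached}.
```

An unexpected signal (for example `activate` while `detached`) crashes only this one process; other workers are unaffected.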

Dynamic workers cont.

• A process crash should never affect other mobiles (Erlang guarantees memory protection)
• SW errors in SGSN lead to a short service outage for the phone; the dynamic worker will be restarted after the crash
• Same for SW errors in MS, e.g., failure to follow standards will crash the dynamic worker (offensive programming)

Supervision

[Figure: Supervisor above Worker1, Worker2, Worker3]

• Crash of worker is noticed by supervisor
• Supervisor triggers “recovery action”
• Either the crashed worker is restarted
  or
• All workers are killed and restarted
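In standard OTP the two recovery actions correspond to the `one_for_one` and `one_for_all` supervisor strategies. The sketch below assumes plain OTP `supervisor` (the SGSN's own framework may differ); the module name, the worker ids, and the restart intensity values are made up.

```erlang
-module(phone_sup).
-behaviour(supervisor).
-export([start_link/1, init/1, worker_start_link/1, worker_loop/0]).

%% Strategy is one_for_one (restart only the crashed worker)
%% or one_for_all (kill and restart all workers).
start_link(Strategy) ->
    supervisor:start_link(?MODULE, Strategy).

init(Strategy) ->
    %% Assumed limits: at most 5 restarts within 10 seconds.
    SupFlags = #{strategy => Strategy, intensity => 5, period => 10},
    Children = [#{id => Id,
                  start => {?MODULE, worker_start_link, [Id]},
                  restart => permanent}
                || Id <- [worker1, worker2, worker3]],
    {ok, {SupFlags, Children}}.

%% Called by the supervisor, so the worker is linked to it.
worker_start_link(_Id) ->
    {ok, spawn_link(?MODULE, worker_loop, [])}.

%% Placeholder worker that just waits for a stop message.
worker_loop() ->
    receive stop -> ok end.
```

With `one_for_one`, killing one worker leaves the other two pids unchanged; with `one_for_all`, all three are replaced.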

Recovery principles

• Recovery action after SW crash is “restart”
• Many restart levels (escalation):
  – very very small restart
  – very small restart
  – small restart
  – medium restart
  – large restart
  – SGSN restart
• Lowest restart level affects only one mobile phone
• Highest level affects all phones
• Try low level first, if it does not help, escalate to next level
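The escalation rule (try the lowest level first, move up one level when it does not help) can be written as a small pure function. This is only a sketch: the level atoms mirror the list above, and `TryRestart` is a hypothetical callback standing in for whatever actually performs a restart.

```erlang
-module(escalation).
-export([levels/0, next/1, recover/1]).

%% Restart levels from smallest to largest (names are illustrative).
levels() ->
    [very_very_small, very_small, small, medium, large, sgsn].

%% Next level above Level, or none at the top.
next(Level) -> next(Level, levels()).

next(Level, [Level, Next | _]) -> {ok, Next};
next(Level, [Level])           -> none;
next(Level, [_ | Rest])        -> next(Level, Rest).

%% TryRestart is an assumed fun(Level) -> ok | failed.
%% Start at the lowest level and escalate until one succeeds.
recover(TryRestart) ->
    recover(TryRestart, hd(levels())).

recover(TryRestart, Level) ->
    case TryRestart(Level) of
        ok ->
            {recovered, Level};
        failed ->
            case next(Level) of
                {ok, Next} -> recover(TryRestart, Next);
                none       -> unrecoverable
            end
    end.
```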

Recovery principles cont.

• Orthogonal to “restart” is “takeover” – service of existing mobile phones is “taken over” by another board after HW failure – ideally the phone should not notice
• Method: separate “control” from “data” – all data related to one phone is replicated to one other board
• Efficiency? Cannot replicate every time data changes – select “good points” to do replication (transaction concept)

Processes - Generic Servers

• Most processes are “server like”; receive message – do some computation – send reply
• SGSN extends OTP gen_server behaviour:
  – message passing via cast, no reply
  – message passing via call (≈ cast + synchronization + return value)

Example Erlang message passing

sender:
    ...
    Pid ! Msg,
    ...

receiver:
    ...
    receive
        Msg ->
            <action>
    end,
    ...

Example cont. - gen_server

sender:
    ...
    Ret = gen_server:call(Pid, Msg),
    ...

receiver:
    handle_call(Msg) ->
        case Msg of
            {add, N} ->
                {reply, N + 1};
            ...
        end.
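The slide abbreviates the callback; in standard OTP `handle_call` takes three arguments (request, caller, state) and returns `{reply, Reply, NewState}`. A complete module using plain `gen_server` might look like the sketch below. The module name `adder` and the `note`/`notes` operations are invented to show cast next to call.

```erlang
-module(adder).
-behaviour(gen_server).
-export([start_link/0, add/2, note/2, notes/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% call: synchronous, the caller gets a return value.
add(Pid, N)  -> gen_server:call(Pid, {add, N}).
notes(Pid)   -> gen_server:call(Pid, notes).

%% cast: asynchronous, no reply.
note(Pid, Term) -> gen_server:cast(Pid, {note, Term}).

%% The state is the list of notes received so far.
init([]) ->
    {ok, []}.

handle_call({add, N}, _From, Notes) ->
    {reply, N + 1, Notes};
handle_call(notes, _From, Notes) ->
    {reply, lists:reverse(Notes), Notes}.

handle_cast({note, Term}, Notes) ->
    {noreply, [Term | Notes]}.
```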

Improved gen_server

gen_server2:
    handle_call({M,F,A}) ->
        apply(M,F,A).

sender:
    Ms = gen_server2:call(Pid,{mobility,attach,[Id]}),
    Ret = gen_server2:call(Pid,{session,activate,[Ms]}),

receiver (file mobility.erl):
    attach(Id) ->
        <do something>.

receiver (file session.erl):
    activate(Ms) ->
        <do something more>.
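`gen_server2` is the SGSN's own extension and its real interface is not shown on the slide. The idea of dispatching `{M,F,A}` requests can be approximated on top of plain `gen_server` as below. To keep the sketch in a single file, the target functions live in the server module itself rather than in separate `mobility` and `session` modules, and their return values are invented.

```erlang
-module(mfa_server).
-behaviour(gen_server).
-export([start_link/0, call/2, attach/1, activate/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% The request itself names the function to run in the server.
call(Pid, {M, F, A}) ->
    gen_server:call(Pid, {M, F, A}).

init([]) ->
    {ok, nostate}.

%% Generic dispatch: apply the requested function, reply with its result.
handle_call({M, F, A}, _From, State) ->
    {reply, apply(M, F, A), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Stand-ins for mobility:attach/1 and session:activate/1.
attach(Id)         -> {ms, Id}.
activate({ms, Id}) -> {session, Id}.
```

The receiver code is then written as ordinary functions, with no message decoding in each module.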

Example – robust message passing

• Problem: implement “cast” with guaranteed delivery even if receiver crashes before message is handled
• How?
• Implement cast as: send message + write into persistent storage
• In receiver: after processing, remove message from storage
• In startup of receiver (after crash): check for stored messages
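The three steps above could be sketched as follows. Note the assumptions: a public ETS table owned by another process stands in for persistent storage (it survives a receiver crash but not a node restart), the registered name `robust_receiver` is invented, and `Handler` is a hypothetical callback that processes one message.

```erlang
-module(robust_cast).
-export([init_store/0, cast/1, start_receiver/1, receiver/1]).

-define(STORE, robust_cast_store).
-define(NAME, robust_receiver).

%% Create the store. ETS is only a stand-in for persistent storage;
%% the table is owned by the caller, not by the receiver.
init_store() ->
    ets:new(?STORE, [named_table, public, ordered_set]).

%% cast = write into storage + send message.
cast(Msg) ->
    Key = erlang:unique_integer([monotonic, positive]),
    true = ets:insert(?STORE, {Key, Msg}),
    case whereis(?NAME) of
        undefined -> ok;                 % receiver down: replayed at startup
        Pid       -> Pid ! {robust, Key, Msg}, ok
    end.

start_receiver(Handler) ->
    spawn(?MODULE, receiver, [Handler]).

%% At startup (also after a crash): handle stored messages first.
receiver(Handler) ->
    true = register(?NAME, self()),
    lists:foreach(fun({Key, Msg}) -> process(Handler, Key, Msg) end,
                  ets:tab2list(?STORE)),
    loop(Handler).

loop(Handler) ->
    receive
        {robust, Key, Msg} ->
            %% Skip messages already handled during replay.
            case ets:member(?STORE, Key) of
                true  -> process(Handler, Key, Msg);
                false -> ok
            end,
            loop(Handler)
    end.

%% Remove the message from storage only after processing it.
process(Handler, Key, Msg) ->
    Handler(Msg),
    ets:delete(?STORE, Key).
```

If `Handler` crashes, the message stays in the store and is retried when the receiver is started again; the handler therefore has to tolerate seeing a message more than once.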

Erlang – Concurrency

• “Normal” synchronization primitives, like semaphores or monitors, do not look the same in Erlang. Instead everything is done with processes and message passing.
• Mutual exclusion – use a single process to handle the resource. Clients call the process to get access.
• Critical sections – allow only one process to execute the section
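A minimal sketch of mutual exclusion through a resource-owning process: clients never touch the resource directly but send a fun that the owner runs, one request at a time. The module name and the `{Result, NewResource}` convention are assumptions for the example.

```erlang
-module(resource_owner).
-export([start/1, with_resource/2, loop/1]).

%% The owner process is the only one that ever holds the resource.
start(Resource) ->
    spawn(?MODULE, loop, [Resource]).

%% Run Fun(Resource) inside the owner. Fun must return
%% {Result, NewResource}; Result is sent back to the client.
with_resource(Owner, Fun) ->
    Ref = make_ref(),
    Owner ! {access, self(), Ref, Fun},
    receive
        {Ref, Result} -> Result
    end.

%% Requests are taken from the mailbox one by one, so the
%% critical section is never executed concurrently.
loop(Resource) ->
    receive
        {access, From, Ref, Fun} ->
            {Result, NewResource} = Fun(Resource),
            From ! {Ref, Result},
            loop(NewResource)
    end.
```

For example, fifty clients incrementing a counter held by the owner always end at fifty, with no locks involved.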

Erlang - Concurrency cont.

• Atomic operations:
  – ets:update_counter()
  – mnesia:transaction()
• “home made” using a transaction handler process (TP)
  – client starts transaction, message to TP
  – client does some “work”
  – client ends transaction, message to TP
  – TP commits “work”
  – “failure” when transaction is started but not ended makes TP revert to state before the start
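One possible shape of such a “home made” transaction handler is sketched below. It is not the SGSN implementation: here the TP holds a single value, detects a client that dies mid-transaction with a process monitor, and all function names are invented.

```erlang
-module(tp).
-export([start/1, begin_tx/1, write/2, commit/1, read/1, loop/1]).

start(Value) ->
    spawn(?MODULE, loop, [Value]).

begin_tx(TP) -> call(TP, begin_tx).
write(TP, V) -> call(TP, {write, V}).
commit(TP)   -> call(TP, commit).
read(TP)     -> call(TP, read).

call(TP, Req) ->
    Ref = make_ref(),
    TP ! {Req, self(), Ref},
    receive {Ref, Reply} -> Reply end.

%% Idle: no transaction open.
loop(Committed) ->
    receive
        {begin_tx, Client, Ref} ->
            Mon = erlang:monitor(process, Client),
            Client ! {Ref, ok},
            in_tx(Committed, Committed, Client, Mon);
        {read, Client, Ref} ->
            Client ! {Ref, Committed},
            loop(Committed)
    end.

%% A transaction is open: only its owner is served. Requests from
%% other clients wait in the mailbox until the transaction ends.
in_tx(Committed, Pending, Client, Mon) ->
    receive
        {{write, V}, Client, Ref} ->
            Client ! {Ref, ok},
            in_tx(Committed, V, Client, Mon);
        {commit, Client, Ref} ->
            erlang:demonitor(Mon, [flush]),
            Client ! {Ref, ok},
            loop(Pending);                 % TP commits the work
        {'DOWN', Mon, process, Client, _Reason} ->
            loop(Committed)                % started but not ended: revert
    end.
```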

Erlang - Distribution

• General rule in SGSN: avoid remote communication or synchronization if possible
• Design algorithms that work independently on each board
  – fault tolerance
  – load balancing
• Avoid relying on global resources
• Data handling:
  – keep as much locally as possible (typically traffic data associated with mobile phones)
  – some data must be distributed / shared, use mnesia or manual
  – many different variants of persistency, redundancy, replication

Example – intra-SGSN routing

• Problem – an incoming signal from a phone is received in the Payload Plane, to which CP should it be routed?
• Old solution: a global resource was used to keep mappings between different “identities” that were linked to the phone and the corresponding CP
• New solution: construct identities in a clever way, encode CP somewhere in Id
• For Ids that are outside SGSN control, send signal to a random CP (rare) or broadcast to all CPs (very rare)
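The slide does not say how the CP is encoded in the Id. One simple possibility, shown purely as an assumption, is to reserve the low bits of an integer Id for the CP number; the 6-bit width, the tagged tuples and all function names are invented here.

```erlang
-module(id_routing).
-export([make_id/2, cp_of/1, route/2]).

%% Assumption: the low 6 bits of an Id hold the CP number (up to 64 CPs).
-define(CP_BITS, 6).

%% Build an Id on CP number Cp from a per-CP sequence number.
make_id(Cp, Seq) when Cp >= 0, Cp < (1 bsl ?CP_BITS), Seq >= 0 ->
    (Seq bsl ?CP_BITS) bor Cp.

%% Recover the CP from the Id alone, with no global lookup table.
cp_of(Id) ->
    Id band ((1 bsl ?CP_BITS) - 1).

%% Routing decision for a signal, given the number of CPs.
route({local, Id}, _NumCps) ->
    {cp, cp_of(Id)};                          % normal case
route({external, _Id}, NumCps) ->
    {cp, rand:uniform(NumCps) - 1};           % rare: any CP will do
route(unknown, NumCps) ->
    {broadcast, lists:seq(0, NumCps - 1)}.    % very rare: ask everyone
```

Because the CP can be computed from the Id, the Payload Plane needs no shared mapping, which follows the rule of avoiding global resources.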

Multicore

• Erlang in theory gives you multicore support “for free”
• The BEAM (Erlang virtual machine) will automatically schedule Erlang processes onto all available cores.
• However – linear speedup is not guaranteed. It is up to the application code to offer enough possibilities for “parallelism”. Very easy to get resource bottlenecks.
• Profiling in a multicore environment is hard!
• In SGSN – dual core gave roughly 20% without tuning.

Runtime code replacement

• Fact: SW is never bug free!
• Must be able to install error corrections into already delivered systems without disturbing operation
• Erlang can load a new version of a module in a running system
• Be careful! Code loading requires co-operation from the running SW and great care from the SW designer

Ex: since “live data” survives in tables/storage, the new version of the code must be able to handle data in both new and old format.
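The point about old and new data formats can be illustrated with a small sketch. Both record layouts are invented: the “old” code stored `{ms, Imsi}` and the “new” code stores `{ms, 2, Imsi, Apn}`. After the new module version is loaded, it may still meet old tuples in tables, so every accessor goes through `upgrade/1`.

```erlang
-module(ms_data).
-export([new/2, upgrade/1, imsi/1, apn/1]).

%% Hypothetical formats:
%%   old: {ms, Imsi}
%%   new: {ms, 2, Imsi, Apn}
new(Imsi, Apn) ->
    {ms, 2, Imsi, Apn}.

%% Convert data written by the old code; leave new data untouched.
upgrade({ms, Imsi}) ->
    {ms, 2, Imsi, undefined};
upgrade({ms, 2, _Imsi, _Apn} = New) ->
    New.

%% Accessors accept both formats.
imsi(Data) ->
    {ms, 2, Imsi, _Apn} = upgrade(Data),
    Imsi.

apn(Data) ->
    {ms, 2, _Imsi, Apn} = upgrade(Data),
    Apn.
```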

Bugs in Erlang

• Bugs in Erlang / OTP are as common as bugs in SGSN
• How do we protect SGSN against Erlang failures?
• Base: same methods as for SGSN code; recovery by restarts and escalation
• Addition: if restarts local to one Erlang node repeatedly fail to resolve an error condition, then kill that Erlang node
• Using Erlang in a robust way in a distributed system where hardware may suddenly fail is a very hard problem!

Overload Protection

• If CPU load or memory usage goes too high SGSN will not accept new connections from mobile phones
• The SGSN must never stop “responding” because of overload, better to skip service for some phones
• Realized in message passing; if OLP hits, messages are discarded (silently dropped or a denial reply generated)
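An admission decision of this kind might be sketched as a pure function over the current load. Everything concrete here is an assumption: the thresholds, the load map, the signal names, and the choice of which signals get a denial reply and which are silently dropped.

```erlang
-module(olp).
-export([admit/2]).

%% Assumed thresholds, in percent.
-define(CPU_LIMIT, 80).
-define(MEM_LIMIT, 90).

%% Decide what to do with an incoming signal given the current load,
%% passed as a map such as #{cpu => 42, mem => 55}.
admit(Signal, #{cpu := Cpu, mem := Mem}) ->
    Overloaded = Cpu > ?CPU_LIMIT orelse Mem > ?MEM_LIMIT,
    case {Overloaded, Signal} of
        {false, _} ->
            accept;
        {true, {attach, _Phone}} ->
            {reject, overload};      % new connection: denial reply
        {true, {periodic_update, _Phone}} ->
            drop;                    % low priority: silently dropped
        {true, _} ->
            accept                   % keep serving already attached phones
    end.
```

Rejecting early, before any per-phone work is done, is what keeps the node responsive for the phones it still serves.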

What about “functional programming”?

• Designers implementing the GPRS standards should not need to bother with programming details.
• Framework code offers lots of “abstractions” to help out.
• Almost like a “domain specific language”.
• To realize this, functional programming is very good!
• But to summarize: FP is a great help – but not vital. Or?

Haskell?

• Could we use Haskell instead of Erlang?
• Not trivial – need to do some fundamental re-design of the system:
  – “one process per mobile phone” – need to implement our own scheduler?
  – “memory protection between processes” – need to separate “data” related to phone #1 from data related to phone #2
  – “recovery from software faults” – how do we crash and restart without losing all data?

Haskell cont.

• Redesign cont.:
  – “concurrency” – sending messages between boards
  – “runtime code replacement” – need to replace broken software without losing the data about the phones
  – “efficiency & memory usage”?
• Reflection: consider Erlang vs. Haskell vs. C++. Which two are the most similar?

Conclusions

Pros:
• Erlang works very well for GPRS traffic control handling
• High level language – concentrate on important parts
• Has the right primitives; fault tolerance, distribution, ...

Cons:
• Erlang/OTP not a main stream language
  – Poor programming environments (debugging, modelling, etc)
  – Single implementation maintained by too few people, lots of bugs
• Hard to find good Erlang programmers (?)
• High level language – easy to create a real mess in just a few lines of code...
