FP in industry - Erlang
Urban Boquist – Ericsson AB
<Urban.Boquist@ericsson.com>
Rev PA1 2009-10-12

Outline
• Who Am I?
• Mobile Telecommunications Networks
• Packet Core Network – GPRS & SGSN
• Use of Erlang in SGSN
• SGSN Design Principles for Erlang:
  – concurrency & distribution
  – fault tolerance
  – multicore
  – overload protection
  – runtime code replacement
• Examples

Who Am I?
• Chalmers (D-linjen)
• Chalmers (PhD, Compilation & Optimization of Haskell)
• Carlstedt Research & Technology (consultant)
• QEP (own startup, consultant)
• Ericsson AB, Lindholmen
• ...

GSM – GPRS
Services in telecommunications networks:
• CS – circuit switched
  ● voice
  ● SMS
• PS – packet switched
  ● everything that is “IP”
  ● www
  ● email
  ● MMS
• GPRS – General Packet Radio Service

“3G” – UMTS / WCDMA
• Different Radio Network
• Packet Core Network (almost) the same as in GPRS
• Ericsson SGSN is “dual access”
• Much higher (end user) speeds:
  – Up to 384 Kbps for 3G (WCDMA)
  – Up to 14.4 Mbps for HSDPA (later up to 42 Mbps – Evolved HSPA)
• Voice / video calls are still CS!
• Streaming audio / video is PS (TV == MBMS)
• Future: voice / video in PS
• “Voice-over-IP”

3GPP
• Standards define everything.
• Interoperability is vital!
• ”Tens of thousands” pages of standard text needed to build an SGSN.
• See www.3gpp.org.

SGSN – Basic Services
Control Signalling
• authentication
• admission control
• quality of service
• mobility
• roaming
• ...
Payload transport
● user traffic
● charging

SGSN Node
Capacity
• ~ 50 k subscribers, 2000
• ~ 100 k subscribers, 2002
• ~ 500 k subscribers, 2004
• ~ 1 M subscribers, 2005
• ~ 2 M subscribers, 2008

SGSN Architecture
[Figure: a Control Plane (soft real time) made up of a row of CPs and a Payload Plane (hard real time) made up of a row of PPs, connected through a Switch. The MS side and the Internet side both attach at the Payload Plane.]

SGSN Hardware
• ≈ 20-30 Control Processors (boards):
  – UltraSPARC or PowerPC cpus
  – 2 GB memory
  – Solaris/Linux + Erlang / C / C++
• ≈ 20-30 Payload Processors (boards):
  – 1-3 PowerPC cpus
  – Special hardware (FPGAs) for encryption
  – Physical devices: frame relay, atm, ...
  – VxWorks + C / C++
• Backplane: 1 Gbit ethernet
Current release: ≈ 2,000,000 Simultaneously Attached Users (phones)

Traffic Control in SGSN
• Control Processors (Solaris / Sparc or Linux / PowerPC)
• Most control signalling handled by Erlang code
• One “Erlang” running on each CP
• Distributed Erlang system with 20-40 nodes
• Mobile Phones are distributed over CP:s

Control Signalling
• attach (phone is turned on)
• israu (routing area update, mobility in radio network)
• activation (initiate payload traffic)
• etc. [thousands of signals]
Telecom standards are HUGE (see www.3gpp.org)!
We need a high level language – concentrate on GPRS, not on programming details!

Erlang/OTP
• Invented at Ericsson Computer Science Lab in the 1980s.
• Intended for large scale reliable telecom systems.
• Erlang is: functional language + built-in support for concurrency.
• OTP (Open Telecom Platform) = Erlang + lots of libraries.

Erlang vs. Haskell
• Erlang can do most things Haskell can (pattern matching, higher order functions, list comprehensions, ...)
• BUT – where Haskell is ”beautiful”, Erlang is ”ugly”!
• Erlang is strict (like ML, expressions evaluated immediately, not when they are needed)
• Erlang has no real type system (like LISP, everything compiles but may crash at runtime)

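The features listed above can be illustrated with a minimal sketch. The module name `fp_demo` and its function names are made up for this example and do not come from the SGSN code.

```erlang
-module(fp_demo).
-export([len/1, squares_of_evens/1, apply_twice/2]).

%% Pattern matching on the structure of a list.
len([]) -> 0;
len([_ | Tail]) -> 1 + len(Tail).

%% List comprehension with a filter.
squares_of_evens(List) ->
    [X * X || X <- List, X rem 2 =:= 0].

%% Higher order function: F is passed in as an argument.
apply_twice(F, X) -> F(F(X)).
```

A call such as `fp_demo:len(not_a_list)` compiles without complaint and fails only when it runs, because no clause matches the argument. That is the "compiles but may crash at runtime" behaviour the slide refers to.
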
Why Erlang?
• Good things in Erlang:
  – built-in concurrency (processes and message passing)
  – built-in distribution
  – built-in fault-tolerance
  – support for runtime code replacement
• This is exactly what is needed to build a robust Control Plane in a telecom system!
• Control Plane Software is not time critical (Erlang)
• User Plane (payload) is time critical (VxWorks + C)

Fault Tolerance
• SGSN must never be out-of-service! (99.999%)
• Hardware fault tolerance
  – Faulty boards are automatically taken out of service
  – Mobile phones redistributed
• Software fault tolerance
  – SW error triggered by one phone should not affect others!
  – Serious error in “system SW” should affect at most the phones handled by that board
Think: how can such requirements be realized?
Example: the SW handling one phone goes crazy and overwrites all the memory with garbage.

SGSN Architecture – Control Plane
[Figure: a row of CPs]
• On each CP ≈ 100 processes providing “system services”
  – “static workers”
• On each CP ≈ 50,000 processes each handling one phone
  – “dynamic workers”

Dynamic workers
• System principle: one Erlang process handles all signalling with a single mobile phone
• A worker encodes a number of state machines: receive a signal – do some computation – send a reply signal
• Payload plane translates a “signal” from the mobile phone into an Erlang message and sends it to the correct dynamic worker, and vice versa

Dynamic workers cont.
• A process crash should never affect other mobiles (Erlang guarantees memory protection)
• SW errors in SGSN lead to a short service outage for the phone, dynamic worker will be restarted after the crash
• Same for SW errors in MS, e.g., failure to follow standards will crash dynamic worker (offensive programming)

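A minimal sketch of a dynamic worker written in this "offensive" style follows. The module name `dyn_worker`, the two states and the signal names are assumptions made for illustration; real GPRS signals are far richer.

```erlang
-module(dyn_worker).
-export([start/0]).

%% One process per phone; each function below is one state.
start() ->
    spawn(fun detached/0).

detached() ->
    receive Signal ->
        %% Offensive programming: only attach is legal in this state.
        %% Any other signal fails the match and crashes this worker only.
        {attach, From} = Signal,
        From ! attach_accept,
        attached()
    end.

attached() ->
    receive Signal ->
        case Signal of
            {activate, From} -> From ! activate_accept, attached();
            {detach, From}   -> From ! detach_accept, detached()
        end
    end.
```

The worker contains no error-handling code for unexpected signals. A signal that violates the expected sequence ends the process, and recovery is left to whoever supervises it.
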
Supervision
[Figure: a Supervisor process above Worker1, Worker2 and Worker3]
• Crash of worker is noticed by supervisor
• Supervisor triggers “recovery action”
• Either the crashed worker is restarted
or
• All workers are killed and restarted

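The two recovery actions correspond to the `one_for_one` and `one_for_all` restart strategies of the standard OTP `supervisor` behaviour. The sketch below assumes plain OTP rather than the SGSN framework; the module name `worker_sup` and the trivial worker are made up for the example.

```erlang
-module(worker_sup).
-behaviour(supervisor).
-export([start_link/1, init/1, start_worker/0]).

%% Strategy is one_for_one (restart only the crashed worker)
%% or one_for_all (kill and restart all workers).
start_link(Strategy) ->
    supervisor:start_link(?MODULE, Strategy).

init(Strategy) ->
    SupFlags = #{strategy => Strategy, intensity => 3, period => 10},
    Children = [#{id => Id, start => {?MODULE, start_worker, []}}
                || Id <- [worker1, worker2, worker3]],
    {ok, {SupFlags, Children}}.

%% Placeholder worker: waits until it is told to stop.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

With `intensity => 3` and `period => 10`, the supervisor itself gives up after more than three restarts within ten seconds, which hands the problem to the next level up.
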
Recovery principles
• Recovery action after SW crash is “restart”
• Many restart levels (escalation):
  – very very small restart
  – very small restart
  – small restart
  – medium restart
  – large restart
  – SGSN restart
• Lowest restart level affects only one mobile phone
• Highest level affects all phones
• Try low level first, if it does not help, escalate to next level

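The escalation rule can be sketched as a simple loop over the restart levels. The module name `escalate`, the level atoms and the `TryFun` callback are assumptions for illustration; the slides do not show how the SGSN implements this.

```erlang
-module(escalate).
-export([recover/1, recover/2]).

%% Restart levels from the slide, smallest first.
levels() ->
    [very_very_small, very_small, small, medium, large, sgsn].

recover(TryFun) ->
    recover(TryFun, levels()).

%% Try the lowest level first; escalate only if it did not help.
recover(_TryFun, []) ->
    {error, unrecovered};
recover(TryFun, [Level | Rest]) ->
    case TryFun(Level) of
        ok    -> {ok, Level};
        error -> recover(TryFun, Rest)
    end.
```

Here `TryFun(Level)` stands for "perform a restart at this level and report whether the error condition went away".
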
Recovery principles cont.
• Orthogonal to ”restart” is ”takeover” – service of existing mobile phones is ”taken over” by other board after HW failure – ideally phone should not notice
• Method: separate ”control” from ”data” – all data related to one phone is replicated to one other board
• Efficiency? Cannot replicate every time data changes – select ”good points” to do replication (transaction concept)

Processes - Generic Servers
• Most processes are “server like”; receive message – do some computation – send reply
• SGSN extends OTP gen_server behaviour:
  – message passing via cast, no reply
  – message passing via call (≈ cast + synchronization + return value)

Example Erlang message passing

sender:
    ...
    Pid ! Msg,
    ...

receiver:
    ...
    receive
        Msg ->
            <action>
    end,
    ...

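A complete, compilable version of the fragment might look as follows. The module name `msg_demo` and the `ping`/`pong` messages are made up for this sketch.

```erlang
-module(msg_demo).
-export([start/0, ping/1]).

%% Receiver: wait for a message, act on it, then wait again.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {ping, From} ->
            From ! pong,          % the <action>: reply to the sender
            loop();
        stop ->
            ok
    end.

%% Sender: "!" is asynchronous, so wait explicitly for the reply.
ping(Pid) ->
    Pid ! {ping, self()},
    receive
        pong -> pong
    after 1000 ->
        timeout
    end.
```
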
Example cont. - gen_server

sender:
    ...
    Ret = gen_server:call(Pid, Msg),
    ...

receiver:
    handle_call(Msg) ->
        case Msg of
            {add, N} ->
                {reply, N + 1};
            ...
        end.

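The slide shows a simplified callback. In standard OTP the callback is `handle_call/3`, which also receives the caller and the server state and returns `{reply, Reply, NewState}`. A runnable sketch under that assumption, with a made-up module name `add_server`:

```erlang
-module(add_server).
-behaviour(gen_server).
-export([start_link/0, add/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Sender side: synchronous call, blocks until the reply arrives.
add(Pid, N) ->
    gen_server:call(Pid, {add, N}).

init([]) ->
    {ok, no_state}.

%% Receiver side: request, caller identity and server state.
handle_call({add, N}, _From, State) ->
    {reply, N + 1, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```
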
Improved gen_server

gen_server2:
    handle_call({M,F,A}) ->
        apply(M,F,A).

sender:
    Ms = gen_server2:call(Pid,{mobility,attach,[Id]}),
    Ret = gen_server2:call(Pid,{session,activate,[Ms]}),

receiver (file mobility.erl):
    attach(Id) ->
        <do something>.

receiver (file session.erl):
    activate(Ms) ->
        <do something more>.

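The `gen_server2` module is internal to the SGSN and is not shown in the slides. The same module/function/arguments dispatch can be approximated on top of the standard `gen_server`, as in this sketch. The module name `mfa_server` and the bodies of `attach/1` and `activate/1` are placeholders; in the slide those handlers live in `mobility.erl` and `session.erl`.

```erlang
-module(mfa_server).
-behaviour(gen_server).
-export([start_link/0, call/2, attach/1, activate/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

call(Pid, {M, F, A}) ->
    gen_server:call(Pid, {M, F, A}).

init([]) ->
    {ok, no_state}.

%% Generic dispatch: the request names the module, function and
%% arguments, so new operations need no change in the server.
handle_call({M, F, A}, _From, State)
  when is_atom(M), is_atom(F), is_list(A) ->
    {reply, apply(M, F, A), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Placeholder handlers, kept in this module to stay self-contained.
attach(Id) -> {ms, Id}.
activate({ms, Id}) -> {session_active, Id}.
```

The handler code runs inside the server process, so a crash in a handler takes down that one server and nothing else.
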
Example – robust message passing
• Problem: implement ”cast” with guaranteed delivery even if receiver crashes before message is handled
• How?
• Implement cast as: send message + write into persistent storage
• In receiver: after processing, remove message from storage
• In startup of receiver (after crash): check for stored messages

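The three steps can be sketched as below. As a loud assumption, a public ETS table owned by the caller stands in for the persistent storage; it survives a receiver crash but not a node restart, so a real system would need a disk-backed or replicated store. The module name `robust_cast` and all function names are made up.

```erlang
-module(robust_cast).
-export([init_store/0, cast/2, start/1, pending/0]).

%% Stand-in for persistent storage (assumption): a named public table.
init_store() ->
    ets:new(pending_msgs, [named_table, public, ordered_set]).

%% cast = write into storage + send the message.
cast(Pid, Msg) ->
    Key = erlang:unique_integer([monotonic]),
    ets:insert(pending_msgs, {Key, Msg}),
    Pid ! {robust, Key, Msg},
    ok.

%% Receiver startup: replay whatever is still stored, then loop.
start(Handler) ->
    spawn(fun() ->
        lists:foreach(fun({K, M}) -> handle(K, M, Handler) end,
                      ets:tab2list(pending_msgs)),
        loop(Handler)
    end).

loop(Handler) ->
    receive
        {robust, Key, Msg} ->
            handle(Key, Msg, Handler),
            loop(Handler)
    end.

%% Process the message, then remove it from storage.
handle(Key, Msg, Handler) ->
    case ets:member(pending_msgs, Key) of
        true  -> Handler(Msg), ets:delete(pending_msgs, Key);
        false -> ok                  % already handled during replay
    end.

pending() ->
    [M || {_K, M} <- ets:tab2list(pending_msgs)].
```

Because the message is removed only after processing, a crash in between leads to the message being handled again after restart. The handler therefore has to tolerate duplicates.
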
Erlang – Concurrency
• “Normal” synchronization primitives, like semaphores or monitors, do not look the same in Erlang. Instead everything is done with processes and message passing.
• Mutual exclusion – use a single process to handle resource. Clients call process to get access.
• Critical sections – allow only one process to execute section

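A minimal sketch of mutual exclusion through a single owning process follows. The resource here is just a counter, and the module name `counter_res` is made up for the example.

```erlang
-module(counter_res).
-export([start/0, increment/1, value/1]).

%% The process owns the resource; nobody else can touch N.
start() ->
    spawn(fun() -> loop(0) end).

loop(N) ->
    receive
        {increment, From, Ref} -> From ! {Ref, N + 1}, loop(N + 1);
        {value, From, Ref}     -> From ! {Ref, N},     loop(N)
    end.

increment(Pid) -> request(Pid, increment).
value(Pid)     -> request(Pid, value).

%% Clients ask the owner and wait for the tagged reply.
request(Pid, Op) ->
    Ref = make_ref(),
    Pid ! {Op, self(), Ref},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        timeout
    end.
```

Requests are handled one at a time from the owner's mailbox, so concurrent clients cannot interleave inside an update and no lock is needed.
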
Erlang - Concurrency cont.
• Atomic operations:
  – ets:update_counter()
  – mnesia:transaction()
• “home made” using a transaction handler process (TP)
  – client starts transaction, message to TP
  – client does some “work”
  – client ends transaction, message to TP
  – TP commits “work”
  – “failure” when transaction is started but not ended makes TP revert to state before the start

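One way such a transaction handler process could look is sketched below. The module name `tp_demo`, the message formats and the use of a process monitor to detect a client that never ends its transaction are all assumptions; the slides do not show the real TP.

```erlang
-module(tp_demo).
-export([start/1, begin_tx/1, write/2, commit/1, read/1]).

start(Initial) ->
    spawn(fun() -> idle(Initial) end).

%% No transaction open: State is the committed state.
idle(State) ->
    receive
        {begin_tx, Client, Ref} ->
            Mon = erlang:monitor(process, Client),
            Client ! {Ref, ok},
            in_tx(State, State, Client, Mon);
        {read, From, Ref} ->
            From ! {Ref, State},
            idle(State)
    end.

%% Transaction open: Work is the uncommitted copy.
in_tx(Committed, Work, Client, Mon) ->
    receive
        {write, Client, New} ->
            in_tx(Committed, New, Client, Mon);
        {commit, Client, Ref} ->
            erlang:demonitor(Mon, [flush]),
            Client ! {Ref, ok},
            idle(Work);
        {'DOWN', Mon, process, Client, _Reason} ->
            idle(Committed)          % client died mid-transaction: revert
    end.

begin_tx(TP) -> call(TP, begin_tx).
commit(TP)   -> call(TP, commit).
read(TP)     -> call(TP, read).
write(TP, New) -> TP ! {write, self(), New}, ok.

call(TP, Op) ->
    Ref = make_ref(),
    TP ! {Op, self(), Ref},
    receive {Ref, Reply} -> Reply after 1000 -> timeout end.
```

While a transaction is open, requests from other processes stay queued in the TP mailbox, which also serializes transactions.
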
Erlang - Distribution
• General rule in SGSN: avoid remote communication or synchronization if possible
• Design algorithms that work independently on each board
  – fault tolerance
  – load balancing
• Avoid relying on global resources
• Data handling:
  – keep as much locally as possible (typically traffic data associated with mobile phones)
  – some data must be distributed / shared, use mnesia or manual
  – many different variants of persistency, redundancy, replication

Example – intra-SGSN routing
• Problem – an incoming signal from a phone is received in the Payload Plane, to which CP should it be routed?
• Old solution: a global resource was used to keep mappings between different “identities” that were linked to the phone and the corresponding CP
• New solution: construct identities in a clever way, encode CP somewhere in Id
• For Ids that are outside SGSN control, send signal to a random CP (rare) or broadcast to all CPs (very rare)

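The "encode CP somewhere in Id" idea can be sketched with Erlang's bit syntax. The layout used here (8 bits of CP number followed by 24 bits of per-CP sequence number) and the module name `id_route` are purely hypothetical; the slides do not give the real identity format.

```erlang
-module(id_route).
-export([make_id/2, cp_of/1]).

%% Hypothetical layout: <<Cp:8, Seq:24>> packed into a 32-bit integer.
make_id(Cp, Seq) when Cp >= 0, Cp < 256, Seq >= 0, Seq < 16#1000000 ->
    <<Id:32>> = <<Cp:8, Seq:24>>,
    Id.

%% Routing needs no lookup table: the CP is read straight out of the Id.
cp_of(Id) ->
    <<Cp:8, _Seq:24>> = <<Id:32>>,
    Cp.
```

The routing decision then depends only on the identity carried in the signal, so no global mapping has to be kept consistent across boards.
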
Multicore
• Erlang in theory gives you multicore support ”for free”
• The BEAM (Erlang virtual machine) will automatically schedule Erlang processes onto all available cores.
• However – linear speedup is not guaranteed. It is up to the application code to offer enough possibilities for ”parallelism”. Very easy to get resource bottlenecks.
• Profiling in a multicore environment is hard!
• In SGSN – dual core gave roughly 20% without tuning.

Runtime code replacement
• Fact: SW is never bug free!
• Must be able to install error corrections into already delivered systems without disturbing operation
• Erlang can load a new version of a module in a running system
• Be careful! Code loading requires co-operation from the running SW and great care from the SW designer
Ex: since “live data” survives in tables/storage, the new version of the code must be able to handle data in both new and old format.

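The old/new format problem can be illustrated with a small sketch. The record layouts are hypothetical: the old release is assumed to store `{ms, Id, State}` and the new release to add a `Qos` field. The module name `ms_data` is also made up.

```erlang
-module(ms_data).
-export([upgrade/1, qos/1]).

%% Accept both formats; convert old data lazily when it is touched.
upgrade({ms, Id, State}) ->
    {ms, Id, State, undefined};      % old format: fill in a default
upgrade({ms, _Id, _State, _Qos} = New) ->
    New.                             % already in the new format

%% Every accessor in the new code goes through upgrade/1 first.
qos(Record) ->
    {ms, _Id, _State, Qos} = upgrade(Record),
    Qos.
```

Data written before the code switch stays readable after it, and no stop-the-world conversion of all stored records is needed.
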
Bugs in Erlang
• Bugs in Erlang / OTP are as common as bugs in SGSN
• How do we protect SGSN against Erlang failures?
• Base: same methods as for SGSN code; recovery by restarts and escalation
• Addition: if restarts local to one Erlang node repeatedly fail to resolve an error condition, then kill that Erlang node
• Using Erlang in a robust way in a distributed system where hardware may suddenly fail is a very hard problem!

Overload Protection
• If CPU load or memory usage goes too high SGSN will not accept new connections from mobile phones
• The SGSN must never stop responding because of overload, better to skip service for some phones
• Realized in message passing; if OLP hits, messages are discarded (silently dropped or a denial reply generated)

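An admission check of this kind can be sketched as follows. The slide names CPU load and memory usage as the triggers; this sketch instead uses the length of a worker's message queue as the load signal, which is an assumption chosen because it is easy to measure. The module name `olp_demo` and the `Limit` parameter are made up.

```erlang
-module(olp_demo).
-export([admit/2]).

%% Hypothetical policy: deny new work when the target process already
%% has more than Limit messages waiting in its mailbox.
admit(Pid, Limit) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} when Len > Limit ->
            {deny, overload};        % caller drops or sends a denial reply
        {message_queue_len, _Len} ->
            accept;
        undefined ->
            {deny, noproc}           % process no longer exists
    end.
```

The sender decides before sending, so an overloaded receiver is not burdened further by messages it would have to discard itself.
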
What about ”functional programming”?
• Designers implementing the GPRS standards should not need to bother with programming details.
• Framework code offers lots of ”abstractions” to help out.
• Almost like a ”domain specific language”.
• To realize this, functional programming is very good!
• But to summarize: FP is a great help – but not vital. Or?

Haskell?
• Could we use Haskell instead of Erlang?
• Not trivial – need to do some fundamental re-design of the system:
  – ”one process per mobile phone” – need to implement our own scheduler?
  – ”memory protection between processes” – need to separate ”data” related to phone #1 from data related to phone #2
  – ”recovery from software faults” – how do we crash and restart without losing all data?

Haskell cont.
• Redesign cont.:
  – ”concurrency” – sending messages between boards
  – ”runtime code replacement” – need to replace broken software without losing the data about the phones
  – ”efficiency & memory usage”?
• Reflection: consider Erlang vs. Haskell vs. C++. Which two are the most similar?

Conclusions
Pros:
• Erlang works very well for GPRS traffic control handling
• High level language – concentrate on important parts
• Has the right primitives; fault tolerance, distribution, ...
Cons:
• Erlang/OTP not a mainstream language
  – Poor programming environments (debugging, modelling, etc)
  – Single implementation maintained by too few people, lots of bugs
• Hard to find good Erlang programmers (?)
• High level language – easy to create a real mess in just a few lines of code...