81
1 18-847F: Special Topics in Computer Systems Foundations of Cloud and Machine Learning Infrastructure

18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

1

18-847F:SpecialTopicsinComputerSystems

FoundationsofCloudandMachineLearning

Infrastructure

Page 2: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

2

Lecture2:OverviewandKeyConcepts

FoundationsofCloudandMachineLearning

Infrastructure

Page 3: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

GraduateSeminarClass

3

(Almost)nolectures

Readingresearchpapers

Studentpresentations

ClassDiscussions

FinalResearchProject(NoExams!)

Page 4: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TODO

4

o  Sign-upforpresentation

o  Formgroupsforclassprojects

o  Startthinkingaboutprojects

Page 5: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TopicsCovered

5

CloudCompu)ng DistributedStorage

MachineLearning

Modelreplica

PARAMETERSERVERw’=w–αΔw

Modelreplica

Modelreplica

w Δw

a b a+b

Page 6: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

HistoryandOverview

6

CloudCompu)ng DistributedStorage

a b a+b

Page 7: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

HistoryandOverview

7

CloudCompu)ngo  MapReduce,Spark

o  SchedulinginParallelComputing

o  StragglerReplication

o  TaskReplicationinQueueingSystems

Page 8: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Whatisthecloud?

8

Acollectionofserversthatcanfunctionasasinglecomputing

node,andcanbeaccessedfrommultipledevices

Page 9: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

1960’s:TheMainframeEra

9

o  Large,expensivemachineso  Onlyoneperuniversity/institution

IBM704(1964)

Page 10: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

1970’s:Virtualization

10

o  IBMreleasedaVMOSthatallowedmultipleuserstosharethemainframecomputer

IBM704(1964)

Page 11: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

1980’s-1990’s:InternetandPCs

11

o  PCsbecomeaffordableo  Internetconnectivitywentonimproving

o  VirtualPrivateNetworks(VPNs)

o  GridComputing:ConnectcheapPCsviatheInternet

o  Onthetheoryside,queueingtheory,traditionallyusedinoperationsmanagementrebounded

Page 12: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

AShortTutorialonQueueingTheory

12

Page 13: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

13

Reference Textbooks

Page 14: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

DesignQuestion1Whatifthearrivalratedoubles?

14

Servicerateμ=5Job Arrival

Rate λ = 3

MeanResponseTimeT=Wai)ng)meinQueue+ServiceTime

Q: If λ doubles, do you need a server of 2x rate to achieve the same E[T]?

Page 15: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

DesignQuestion2Manyslow,oronefastserver?

15

3μλ

μ

μ

μ

Choose first idle server

λ

Q: Which of the two systems gives lower E[T]?

Page 16: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

DesignQuestion3Howtoassignjobstoservers

16

Q: Which policy works the best?

μ

μ

μ

λ

Random, Round-robin, Shortest Queue, SITA, LWL

Page 17: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

QueueingTerminology

17

ServicerateμArrival Rate λ

MeanServiceTime E[S]=1/μ

MeanWaitingTime E[W]

MeanResponseTime E[T]=E[W]+E[S]

Mean#CustomersinQueue E[N]

ServerUtilization ρ=λ/μ

Page 18: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Little’sLaw

18

Theorem:ForanyergodicopensystemwehaveE[N]=λE[T]

Some Variants E[Nw] = λ E[W] ρ = λ E[S]

Very general and hence powerful law •  Any # of servers, scheduling policy, queue size limit

Page 19: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Little’sLaw:Quiz

19

A professor takes 2 new students in even-numbered years, and 1 new student in odd-numbered years. If avg. graduation time = 6 yrs, how many students will the professor have on average?

Page 20: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Little’sLaw:Answer

20

A professor takes 2 new students in even-numbered years, and 1 new student in odd-numbered years. If avg. graduation time = 6 yrs, how many students will the professor have on average?

E[N]=λE[T] =1.5*6 =9

Page 21: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Kendall’sNotation

21

ServicerateμArrival Rate λ

A/S/nARRIVALDIST.M:PoissonEk:ErlangG:General

SERVICEDIST.M:Exponen)alD:Determinis)c

G:General

Numberofservers

Page 22: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

M/M/1Queue

22

WANTTOFIND

1.  MeanResponseTimeE[T]

2.  MeanWaitingTimeE[W]

ServicerateμArrival Rate λ

Page 23: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

M/M/1:MarkovModel

23

0 2

λ

µ

1 4 3

λ

µ

λ

µ

λ

µ

where

Page 24: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

M/M/1:MeanResponseTime

24

0 2

λ

µ

1 4 3

λ

µ

λ

µ

λ

µ

Page 25: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Quiz:DesignQuestion1Whatifthearrivalratedoubles?

25

Servicerateμ=5Job Arrival

Rate λ = 3

MeanResponseTimeT=Wai)ng)meinQueue+ServiceTime

Q: If λ doubles, do you need a server of 2x rate to achieve the same E[T]?

A: Service rate 6+2 = 8 is sufficient

Page 26: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

M/M/nQueue

26

μ

μ

μ

Choose first empty queue

λ

WANTTOFIND

1.  MeanResponseTimeE[T]

2.  MeanWaitingTimeE[W]

Page 27: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

M/M/nQueue

27

0 2

λ

µ

1 n

λ

λ

λ

λ

where

Erlang-CFormula

PQ =1X

i=n

⇡i

= ⇡0nn

n!

1X

i=n

⇢i

=nn⇡0

n!(1� ⇢)

⇡0 =

"n�1X

i=0

(n⇢)i

i!+

(n⇢)n

n!(1� ⇢)

#�1

⇢ =�

Used in call centers to determine number of agents required

Page 28: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

M/M/nQueue

28

E[Nw] =1X

i=n

⇡i(i� n)

= ⇡0

1X

i=n

⇢inn

n!(i� n)

= PQ⇢

1� ⇢

E[W ] =E[Nw]

�= PQ

�(1� ⇢)

E[T ] = PQ⇢

�(1� ⇢)+

1

µ

Page 29: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Quiz:Comparisonof3systems

29

μ

μ

μ

Choose first empty server

λnμλ

μ

μ

μ

λ/n

λ/n

λ/n

M/M/1 M/M/n

FDM

Page 30: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Quiz:Comparisonof3systems

30

μ

μ

μ

Choose first empty server

λnμλ

μ

μ

μ

λ/n

λ/n

λ/n

M/M/1 M/M/n

FDM

E[T ]M/M/1 =1

nµ� �

E[T ]FDM =n

nµ� �FDMisn)messlowerthanM/M/1

Page 31: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Quiz:Comparisonof3systems

31

μ

μ

μ

Choose first empty server

λnμλ

M/M/1 M/M/n

E[T ]M/M/n = PQ⇢

�(1� ⇢)+

1

µE[T ]M/M/1 =

�(1� ⇢)

E[T ]M/M/n

E[T ]M/M/1= PQ + n(1� ⇢)

M/M/nisn)messlowerwhenρà0

M/M/nandM/M/1arealmostequalwhenρà1

Page 32: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

M/G/1QueuePollaczek-KhinchineFormula

32

ServicedistFX Arrival Rate λ

CannotuseMarkovchain

analysis

E[T ] = E[X] +E[X2]

2(1� �E[X])

Page 33: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

ProofofPKformula

33

Resid

ualservice)me

Time

X1 X2

E[Tw] = E[Nw] · E[X] + E[R]

= �E[Tw] · E[X] +E[X2]

2

=E[X2]

2(1� �E[X])

Page 34: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

M/G/nQueue

34

ServicedistFX

CannotuseMarkovchain

analysis

Choose first empty queue

λ

ServicedistFX

ServicedistFX

E[T ] ⇡ E[X] +E[X2]

2E[X]· E[WM/M/n]

Page 35: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

1990’s:SchedulinginParallelComputing

o  Bin-Packingo  Needjobsizeestimates

35

J2

J3

J1

J4Processors

Time

Forreferencesseesurvey[Weinberg2008]

Page 36: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

1990’s:SchedulinginParallelComputing

o  Bin-Packingo  Needjobsizeestimates

o  ProcessorSharing,i.e.switchingb/wthreadsfordifferentjobso  Needprocessorspeedestimates

o  Load-balancing:Workstealing,Power-of-choiceo  Needqueuelengthestimates

36

Page 37: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

1990’s:InternetandPCs

37

o  PCsbecomeaffordableo  Internetconnectivitywentonimproving

o  VirtualPrivateNetworks(VPNs)o  GridComputing:ConnectcheapPCsviatheInternet

o  ManyInternetCompaniesboughttheirownserversandmanagedthemprivately

o  ButthentheDotcombubbleburst..

Page 38: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

2000’s:TheCloudComputingEra

38

o  Theideaofaflexible,low-cost,scalable,sharedcomputingenvironmentdeveloped

o  Computingbecomeautility,likeelectricity

Page 39: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

KEYISSUE:Jobsizes,serverspeeds&queuelengthsareunpredictableREASON:Large-scaleresourcesharingàVariabilityinservice

•  Virtualization,serveroutagesetc.•  Normandnotanexception[Dean-Barroso2013]

2000’s:TheCloudComputingEra

39

Page 40: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TheTaleofTails

40

TailatScale:99%ilelatencycanbemuchhigherthanaverage

Page 41: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TheTaleofTails

41

TailatScale:99%ilelatencymuchhigherthanaverage

Page 42: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TaleofTails:Quiz

42

Aserverfinishesataskin1secwithprobability0.9,and10secwithprobability0.1o  Whatistheexpectedtaskexecutiontime?

o  If100tasksareruninparallelof100servers,whatistheexpectedtimetocompleteallofthem.

Page 43: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TaleofTails:Quiz

43

Aserverfinishesataskin1secwithprobability0.9,and10secwithprobability0.1o  Whatistheexpectedtaskexecutiontime?

1*0.9+10*0.1=1.9o  If100tasksareruninparallelof100servers,whatis

theexpectedtimetocompleteallofthem.

Page 44: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TaleofTails:Quiz

44

Aserverfinishesataskin1secwithprobability0.9,and10secwithprobability0.1o  Whatistheexpectedtaskexecutiontime?

1*0.9+10*0.1=1.9o  If100tasksareruninparallelof100servers,whatis

theexpectedtimetocompleteallofthem.1*0.9100+10*(1-0.9100)~10

Page 45: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

StragglerReplication

45

Task1

Task2

Task3

Task4

PROBLEM:SlowesttasksbecomeabottleneckSOLUTION:Replicatethestragglersandwaitforonecopy

Task4

Eg.MapReduce,ApacheSparklaunch1replica,keeporiginal

copy

PARAMETERSp:Frac.oftasksreplicatedr:#additionalreplicasc:kill/keeporiginaltask

Page 46: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

StragglerReplicationAnalysis[Wang-GJ-WornellSIGMETRICS2014,15]

46

METRICSE[T]=TimetofinishalltasksE[C]=Totalserverruntimepertask Yistheresidual

service)mealeraddingreplicas

PARAMETERSp:Frac.oftasksreplicatedr:#additionalreplicasc:kill/keeporiginaltask

E[X(1�p)n:n] = x1�p = F

�1X (1� p)

CentralValueTheorem

ExtremeValueTheorem

DifferentbehaviorforExponen)al,LightorHeavytailedY

n -> ∞ n -> ∞

Page 47: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

SimulationsusingGoogleClusterDataLatency-CostTrade-off

47

Increasingfrac)onoftasksreplicated

Expe

cted

Laten

cyE[T]

ExpectedCostE[C]

Carefulchoiceofreplica)onstrategycan

bebeoerthanthedefaultinMapReduce

Page 48: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TaskReplicationinQueueingSystems

48

Page 49: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TaskReplicationinCloudComputing

IDEA:Assigntasktomultipleserversandwaitforearliestcopy

COST

o  Additionalcomputingtimeatservers

49

Waitfortheearliestcopyto

finish,andcanceltherest

Task

Page 50: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

TaskReplicationinCloudComputing

IDEA:Assigntasktomultipleserversandwaitforearliestcopy

COST

o  Additionalcomputingtimeatservers

o  Increasedqueuingdelayforothertasks

50

Waitfortheearliestcopyto

finish,andcanceltherest

Page 51: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Analogy:SupermarketQueues

51

Page 52: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

SupermarketQueues

52

Page 53: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

SupermarketQueues

Getafriendtojointheotherqueue!

53Whatifeveryoneinthesupermarketusesthisstrategy?

Page 54: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

DesignQuestions

o  Howmanyreplicastolaunch?

o  Whichqueuestojoin?

o  Whentoissueandcancelthereplicas?

54

1

2

n

Page 55: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

SurprisingInsight

Incertainregimes,replicationcouldmakethewholesystemfaster,andcheaper!

55

VS

Effectiveservicerate>Sumofindividualservers

Page 56: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

CloudSpotMarkets

56

o  Sparecapacityincloudcomputing

Morning Alernoon Evening Night

Usage

Needthismuchcapacity

Whattodowithsparecapacity?

Capacity

Page 57: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

CloudSpotMarkets

57

o  Sellitonthespotmarketforalowerprice!

Morning Alernoon Evening Night

SpotInstancePrice

OnDemandprice

Price

Page 58: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

BiddingforSpotInstances

58

o  Sellitonthespotmarketforalowerprice

Morning Alernoon Evening Night

SpotInstancePrice

OnDemandprice

Bid

Pre-emptedatthis)me

Page 59: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Sept27GuestLecture:Prof.CarleeJoe-Wong

59

o  Biddingandpricingstrategiesforspotmarkets

Morning Alernoon Evening Night

SpotInstancePrice

OnDemandprice

Bid

Pre-emptedatthis)me

Page 60: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

HistoryandOverview

60

CloudCompu)ng DistributedStorage

a b a+b

Page 61: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

HistoryandOverview

61

DistributedStorageo  RAIDsystems

o  Codingforlocality/repair

o  Systemsimplementationofcodes

o  Reducinglatencyincontent

download

a b a+b

Page 62: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

o  Contentisreplicatedonthecloudforreliability

o  Cansupportmoreuserssimultaneouslyo  Replicatedusedfor“hot”data,i.e.morefrequentaccessed

ReplicatedStorage

62

Any1outof3copiesissufficient

Page 63: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

o  Withan(n,k)MDScode,anykoutofnchunksaresufficiento  Facebook,Google,Microsoftuse(14,10)or(7,4)codeso  Currentlyusedforcolddata,increasingforhotdata

ErasureCodedStorage

63

Anyk=2outofn=3aresufficient

Page 64: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

RAID:RedundantArrayofIndependentDisks(1987)

64

o  LevelsRAID0,RAID1,…:designfordifferentgoalssuch

asreliability,availability,capacityetc.

o  Oneoftheinventors,GarthGibsonishereatCMU!

Page 65: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

CodingTheory

65

o  Forreliablecommunicationinpresenceofnoise

o  BellLabswasoneoftheleadersin1950’s

o  Keyfigures:ClaudeShannonandRichardHamming

Page 66: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

SimplestCodes

66

o  RepetitionCodeo  0à000:Rate:1/3

o  Ifreceive0??wecanrecoverfrom2erasures

o  (3,2)code:Databits:a,bParitybit:(aXORb)

o  Example:011,110:Rate2/3

o  Ifwereceive0?1or?10wecancorrectthefailedbit

o  2bitsymbols:(01)?(11)

Page 67: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

HammingCodes

67

o  (7,4)HammingCode:4databits,3paritybits

o  Parity

p1 = d1 � d2 � d4

Page 68: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

HammingCodes:Quiz

68

o  Whatistherateofthecode?

o  Correctthe2erasureso  (d1,d2,d3,d4,p1,p2,p3)=(0,?,1,?,1,0,0)

Page 69: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

HammingCodes:Answer

69

o  Whatistherateofthecode?R=4/7

o  Correctthe2erasureso  (d1,d2,d3,d4,p1,p2,p3)=(0,0,1,1,1,0,0)

Page 70: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

(n,k)Reed-SolomonCodes:1960

70

o  Data:d1,d2,d3,…dk

o  Polynomial:d1+d2x+d3x2+…dkxk-1

o  Paritybits:Evaluateatn-kpoints:

x=1: d1+d2+d3+d4

x=2: d1+2d2+4d3+8d4

x=3: …. x=4: …. x=n: …

o  Cansolveforthecoefficientsfromanykcodedsymbols

Page 71: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Example:(4,2)Reed-SolomonCode

71

o  Data:d1,d2àPolynomial:d1+d2x+d3x2+…dkxk-1

o  Cansolveforthecoefficientsfromanykcodedsymbolso  Microsoftuses(7,4)codeo  Facebookuses(14,10)code

d1 d2 d1+d2 d1+2d2

Page 72: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

o  RepairingfailednodesishardwithReed-SolomonCodes..

o  Ifwelose1node:

o  Needtocontactkothernodes

o  Needtodownloadktimesthelostdata

LocalityandRepairIssues

72

Page 73: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Solution:LocallyRepairableCodes

73

o  Codesdesignedtominimize:o  RepairBandwidtho  Numberofnodescontacted

Page 74: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

GuestLecture:Prof.RashmiVinayak

74

o  SystemsImplementationof‘piggybacking’erasurecodesinApacheHadoop

Page 75: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

o  Withan(n,k)MDScode,anykoutofnchunksaresufficiento  Facebook,Google,Microsoftuse(14,10)or(7,4)codeso  Currentlyusedforcolddata,increasingforhotdata

Q:Howmanyuserscanweserve,andhowfast?

ErasureCodedStorage

75

Anyk=2outofn=3aresufficient

Page 76: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

The(n,k)fork-joinmodel[GJ-Liu-Soljanin2012,14]

o  Requestallnchunks,waitforanyktobedownloadedo  EachchunktakesservicetimeX~FX

λ

k = 1: Replicated Case k = n: Fork-join system actively studied in 90’s 76

Wait for any 2 out of 3 chunks

Downloadrequests

(3,2) fork-join

Page 77: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

CodedComputingandML

77

Ax

o  Sofar:Codingforstorageo  Codescanalsospeedupcomputingandmachinelearning!

o  Example:Matrix-VectorMultiplication

Page 78: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

CodedComputingandML

78

x

o  Sofar:codingforstorageo  Codescanalsospeedupcomputingandmachinelearning!

o  Example:Matrix-VectorMultiplication

A1

A2

Page 79: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

CodedComputingandML

79

o  Sofar:codingforstorageo  Codescanalsospeedupcomputingandmachinelearning!

o  Example:Matrix-VectorMultiplication

xA1

xA1

x

A1+A2

Needonly2outof3tofinish

Page 80: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

GuestLecture:SanghamitraDutta

80

Short-dotcodes

Page 81: 18-847F: Special Topics in Computer Systems Foundations of ... · 2 Lecture 2: Overview and Key Concepts ... o On the theory side, queueing theory, traditionally used in operations

Second-halfoftheClass:MachineLearning

81

CloudCompu)ng DistributedStorage

MachineLearning

Modelreplica

PARAMETERSERVERw’=w–αΔw

Modelreplica

Modelreplica

w Δw

a b a+b