Globally Synchronized Time via Datacenter Networks
Ki Suh Lee, Cornell University
Joint work with Han Wang, Vishal Shrivastav and Hakim Weatherspoon
Synchronized Clocks
• Fundamental for network and distributed systems
– OWD (one-way delay), Monitoring, Coordination, Snapshots, Updates, …
• Goal: Minimized and bounded precision with scalability
– Minimized and bounded precision: hundreds of nanoseconds
– Scalability: Entire datacenter
Clock Synchronization Protocol
• Offset: Time difference between two clocks
• Precision: The worst case of offset
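[Figure: message exchange between Client and Time server, with timestamps $t_0$, $t_1$, $t_2$, $t_3$]
• $\mathrm{RTT} = (t_3 - t_0) - (t_2 - t_1)$
• $\mathrm{Offset} = \frac{t_1 + t_2}{2} - \frac{t_0 + t_3}{2}$
• $\mathrm{Offset} = (t_1 - t_0) - \mathrm{RTT}/2$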
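The arithmetic of this exchange can be sketched as follows. This is a minimal illustration, assuming the client sends at t0, the time server receives at t1 and replies at t2, and the client receives at t3; the function name and the sample numbers are made up for the example.

```python
def rtt_and_offset(t0, t1, t2, t3):
    """Return (rtt, offset) from the four timestamps of one exchange.

    Assumed meaning: t0 = client send, t1 = server receive,
    t2 = server send, t3 = client receive.
    """
    rtt = (t3 - t0) - (t2 - t1)
    offset = (t1 - t0) - rtt / 2  # same value as (t1 + t2) / 2 - (t0 + t3) / 2
    return rtt, offset


# Hypothetical exchange: server clock 5 units ahead of the client,
# 10 units of delay in each direction, 2 units of server processing.
rtt, offset = rtt_and_offset(t0=0, t1=15, t2=17, t3=22)
print(rtt, offset)
```

The offset comes out exact here only because the two directions have the same delay; any asymmetry or timestamping error in the measured RTT shows up directly in the offset.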
Current time protocols do NOT provide bounded precision, due to uncertainty in the measured RTT!
Challenge: RTT is not accurate
• Errors from
– Oscillator skew
– Inaccurate timestamping
– Network stack
– Network jitter
• PTP
– Hardware timestamping
– PTP-enabled switches
– Filtering/Smoothing
Challenge: Scalability
• Re-synchronization period vs. network overhead
• Limited number of clients
Synchronization Protocols
Protocol  Precision  Scalability  Overhead  Extra Hardware
NTP       us         Good         Moderate  None
PTP       sub-us     Good         Moderate  PTP-enabled devices
GPS       ns         Bad          None      Timing signal receivers, cables
Solution: Use the PHY to synchronize clocks
• Protocol in the PHY
– Each physical link is already synchronized!
– No protocol stack overhead
– No network overhead
– Scalable: peer-to-peer and decentralized
[Figure: network stack with layers Application, Transport, Network, Data Link, Physical]
DTP: Datacenter Time Protocol
• Highly scalable with bounded precision!
– ~25 ns (4 clock ticks) between peers
– ~150 ns for a datacenter with six hops
– No network traffic
– Internal clock synchronization
• End-to-end: ~200 ns precision!
Outline
• Introduction
• Design
• Evaluation
• Discussion
• Conclusion
DTP: Datacenter Time Protocol
• 10G background
– Continuous /I/s when there is no packet
– At least 12 /I/s between two Ethernet frames
– 1 control block (/E/, 66 bits) = 8 /I/s
– At least 1 /E/ between any two frames
– The PHY runs at 156.25 MHz (period: 6.4 ns)
[Figure: Packet i, Packet i+1, Packet i+2 separated by /E/ control blocks]
DTP: Datacenter Time Protocol
• DTP overwrites /E/ to send protocol messages
– Frequent messaging
– No overhead to Ethernet (L2)
• /E/ block carrying a DTP message (66 bits)
– 2-bit sync header
– 8-bit block type
– 3-bit DTP message type
– 53-bit DTP payload
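The field widths above add up to one 66-bit block. As a rough sketch of how such a block could be packed and unpacked, assuming the fields are laid out in the order listed with the sync header in the most significant bits (the slide does not state bit ordering, and the field values below are placeholders for illustration):

```python
# (name, width in bits), most significant field first (assumed ordering).
FIELDS = (("sync", 2), ("block_type", 8), ("msg_type", 3), ("payload", 53))


def pack_block(sync, block_type, msg_type, payload):
    """Pack the four fields into one 66-bit integer."""
    word = 0
    for (name, width), value in zip(FIELDS, (sync, block_type, msg_type, payload)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_block(word):
    """Inverse of pack_block: return (sync, block_type, msg_type, payload)."""
    values = []
    for _, width in reversed(FIELDS):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(values))


# Placeholder field values, chosen only to exercise the round trip.
block = pack_block(sync=0b10, block_type=0x1E, msg_type=2, payload=123456789)
print(unpack_block(block))
```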
10GbE Network Stack
8/26/16 SoNIC NSDI 2013
[Figure: 10GbE network stack. Application data gains an L3 header, an L2 header, and the Ethernet framing (Preamble, Eth Hdr, CRC, Gap) on the way down through the Transport, Network, and Data Link layers. The Physical layer consists of the 64/66b PCS (Encode, Scrambler, Gearbox on transmit; Block sync, Descrambler, Decode on receive), the PMA, and the PMD. The PCS turns the frame into /S/ /D/ /D/ /D/ /D/ /T/ /E/ blocks of 64 bits plus a 2-bit sync header, with idle characters (/I/) between frames. Other labels: 16 bit, 10.3125 Gigabits.]
DTP
[Figure: DTP sublayer inside the Physical layer's 64/66b PCS, with DTP Rx, DTP Tx, and DTP Control blocks, the local counter, the delay value, and a Synchronization FIFO between the Local Clock and Remote Clock domains]
• local counter: 106-bit clock
– Frequently, synchronize low 53 bits
– Occasionally, synchronize high 53 bits
• delay: one-way delay to peer
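Splitting the 106-bit local counter into two 53-bit halves, each small enough for one 53-bit DTP payload, might look like the sketch below. The helper names are hypothetical and the slides do not describe how the two halves are recombined.

```python
HALF_BITS = 53
HALF_MASK = (1 << HALF_BITS) - 1


def split_counter(counter):
    """Return (high, low) 53-bit halves of a 106-bit counter."""
    if not 0 <= counter < (1 << (2 * HALF_BITS)):
        raise ValueError("counter does not fit in 106 bits")
    return counter >> HALF_BITS, counter & HALF_MASK


def join_counter(high, low):
    """Rebuild the 106-bit counter from its two halves."""
    return (high << HALF_BITS) | low


counter = (7 << HALF_BITS) | 42  # arbitrary example value
high, low = split_counter(counter)
print(high, low)
```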
DTP
• Runs in two phases between two peers
– Init Phase: Measuring OWD
– Beacon Phase: Re-synchronization
DTP: Init Phase
• $\mathrm{delay} = (t_3 - t_0 - \alpha)/2$
– $\alpha = 3$: Ensures delay is always less than the actual delay
• Introduces 2 clock tick errors
– Due to oscillator skew, timing, and the Sync FIFO
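The Init-phase delay measurement can be sketched as below. It assumes that the two timestamps are the initiator's own counter values when it sends the Init message and when it receives the reply, and that the result is truncated to whole clock ticks; both points are assumptions, as is the function name.

```python
ALPHA = 3  # correction term from the slide


def measure_delay(t_sent, t_acked, alpha=ALPHA):
    """Estimate the one-way delay to the peer, in clock ticks.

    t_sent and t_acked are readings of the same local counter, so the
    peer's clock does not enter the calculation. Floor division keeps
    the result an integer number of ticks (an assumption).
    """
    return (t_acked - t_sent - alpha) // 2


# Hypothetical readings: the reply arrives 43 ticks after the request.
delay = measure_delay(t_sent=1000, t_acked=1043)
print(delay)
```

Subtracting alpha before halving biases the estimate downward, which matches the stated goal of never overestimating the actual delay.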
DTP: Beacon Phase
• local = max(local, remote + delay)
• Frequent messages
– Every 1.2 us (200 clock ticks) with MTU packets
– Every 7.2 us (1200 clock ticks) with Jumbo packets
• Introduces 2 clock tick errors
– Total 4 clock tick errors
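The Beacon update rule is small enough to state directly in code. The function name and the sample values are illustrative.

```python
def on_beacon(local, remote, delay):
    """Apply the Beacon rule: local = max(local, remote + delay)."""
    return max(local, remote + delay)


# The peer is ahead once the link delay is added, so the local counter jumps forward.
ahead = on_beacon(local=1000, remote=990, delay=20)

# The peer is behind, so the local counter is left unchanged.
behind = on_beacon(local=1000, remote=970, delay=20)

print(ahead, behind)
```

Because the rule only ever takes a maximum, a counter never moves backwards; the slower clock of a pair catches up to the faster one.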
DTP Switch
• global = max(local counters)
• Propagates global via Beacon messages
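To see how the maximum spreads hop by hop, here is a deliberately simplified sketch: clocks are frozen (no ticking), link delays are zero, and every link exchanges one Beacon per round. The node names and the chain of links are hypothetical; only the max rule comes from the slides.

```python
def propagate(counters, links, rounds):
    """Run `rounds` Beacon exchanges over `links` and return the new counters.

    counters: dict mapping node name to counter value.
    links: list of (a, b) node pairs that exchange Beacons.
    Each round uses a snapshot, so a value moves at most one hop per round.
    """
    counters = dict(counters)
    for _ in range(rounds):
        snapshot = dict(counters)
        for a, b in links:
            counters[a] = max(counters[a], snapshot[b])
            counters[b] = max(counters[b], snapshot[a])
    return counters


# Hypothetical four-hop path; only S4 starts ahead of the others.
start = {"S4": 105, "S1": 100, "S0": 100, "S2": 100, "S7": 100}
path = [("S4", "S1"), ("S1", "S0"), ("S0", "S2"), ("S2", "S7")]

after_three = propagate(start, path, rounds=3)
after_four = propagate(start, path, rounds=4)
print(after_three["S7"], after_four["S7"])
```

The largest counter needs one round per hop to reach the far end, which is why the network-wide behavior depends on the diameter in hops.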
DTP Daemon
• End-to-end precision
• Access the DTP counter via PCIe
• Estimate DTP time using the invariant TSC counter
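One way a daemon could estimate the DTP counter between PCIe reads is linear extrapolation from two (TSC, DTP) samples. The slides do not describe the estimator, so this is only a sketch under that assumption; the function name and sample values (roughly a 3 GHz TSC against a 156.25 MHz DTP clock) are made up.

```python
def make_estimator(tsc_a, dtp_a, tsc_b, dtp_b):
    """Build an estimator of the DTP counter from two (TSC, DTP) samples.

    Assumes both counters advance at constant rates between samples.
    """
    rate = (dtp_b - dtp_a) / (tsc_b - tsc_a)  # DTP ticks per TSC cycle

    def estimate(tsc_now):
        # Extrapolate from the most recent sample.
        return dtp_b + round((tsc_now - tsc_b) * rate)

    return estimate


estimate = make_estimator(tsc_a=1_000_000, dtp_a=50_000,
                          tsc_b=4_000_000, dtp_b=206_250)
print(estimate(4_000_192))
```

Reading the TSC is much cheaper than a PCIe access, so an application can call the estimator often and refresh the samples only occasionally.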
DTP Property
• Bounded precision in hardware
– Bounded by 4T (= 25.6 ns; T = 6.4 ns is the oscillator tick)
– Network precision bounded by 4TD, where D is the network diameter in hops
• Requires NIC and switch modifications
– PTP also requires PTP-enabled devices
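The two bounds are simple products, worked out below for the 10GbE tick and a six-hop diameter. The function names are illustrative.

```python
from fractions import Fraction

T_NS = Fraction(32, 5)  # oscillator tick: 6.4 ns


def peer_bound_ns(tick_ns=T_NS):
    """Precision bound between two directly connected peers: 4T."""
    return 4 * tick_ns


def network_bound_ns(diameter_hops, tick_ns=T_NS):
    """Precision bound across a network of diameter D hops: 4TD."""
    return 4 * tick_ns * diameter_hops


print(float(peer_bound_ns()), "ns between peers")
print(float(network_bound_ns(6)), "ns across six hops")
```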
DTP vs PTP
                 PTP                                              DTP
Oscillator Skew
Timestamping     HW timestamping                                  PHY timestamping
Network Stack    Not involved                                     Not involved
Network Jitter   Transparent Clock, Boundary Clock                No jitter
Precision        Unbounded; tens to hundreds of ns (when idle)    Bounded
DTP: Topics discussed in paper
• Handling failure
• Different standards: 1GbE, 25GbE, 40GbE, 100GbE, etc.
• External synchronization (i.e., synchronizing to true time)
• Incremental deployment
Handling failure
• Bit errors
– Ignores bit errors in MSBs
– Appends a checksum for the low LSBs
• Faulty devices
– When there are too many jumps outside the bound
Different Standards
Data Rate  Encoding  Data Width  Frequency    Period   Δ
1 GbE      8b/10b    8 bit       125 MHz      8 ns     25
10 GbE     64b/66b   32 bit      156.25 MHz   6.4 ns   20
40 GbE     64b/66b   64 bit      625 MHz      1.6 ns   5
100 GbE    64b/66b   64 bit      1562.5 MHz   0.64 ns  2
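The slide does not define Δ, but the numbers are consistent with one reading: Δ is each standard's clock period expressed in a common time unit, so that counters driven at different speeds advance at the same rate. The sketch below checks that reading by deriving the common unit as the greatest common divisor of the periods; the interpretation is an assumption, the figures are from the table.

```python
from functools import reduce
from math import gcd

# Clock periods from the table, in picoseconds to keep the arithmetic exact.
PERIOD_PS = {"1 GbE": 8000, "10 GbE": 6400, "40 GbE": 1600, "100 GbE": 640}
DELTA = {"1 GbE": 25, "10 GbE": 20, "40 GbE": 5, "100 GbE": 2}

# Largest time unit that divides every period.
unit_ps = reduce(gcd, PERIOD_PS.values())

# Period of each standard counted in that unit.
derived = {name: period // unit_ps for name, period in PERIOD_PS.items()}

print(unit_ps, derived)
```

Under this reading the common unit is 0.32 ns, and the derived per-tick increments reproduce the Δ column exactly.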
External Synchronization
• A master server
– Connected to a reference time
– Broadcasts the mapping between the DTP counter and wall time
• Client servers
– Interpolate time using DTP counters
Incremental Deployment
• Updates per rack
– DTP-enabled switch
– DTP-enabled NICs
– One server acting as a master for wall time
• Synchronizing racks
– DTP-enabled switch
– DTP beacon-join message for synchronizing DTP counters
– Select a new master
Evaluation
• DTP prototype
– Terasic DE5 board with Altera Stratix V
– Using Bluespec and the Connectal framework
Evaluation: DTP Topology
[Figure: topology of nodes drawn in three rows (S4 to S11; S1 to S3; S0). Legend: DTP NIC. Caption: Measured offsets between peers.]
Evaluation: Logger
• Offset between peers: $t_2 - t_1 - \mathrm{OWD}$
• Offset between SW and HW: $t_1 - t_0$
[Figure: two peers, each with a DTP Daemon; the timestamps $t_0$, $t_1$, $t_2$ are logged]
• Offset = difference between the DTP counters of two nodes
Evaluation: PTP Topology
[Figure: the same S0 to S11 topology. Legend: Time server, PTP Switch, PTP NIC.]
PTP: Idle Network (No traffic)
• Tens to hundreds of nanoseconds precision
[Plot: Offset (nanoseconds, -600 to 600) vs. Time (min)]
PTP: Medium Loaded (4 Gbps)
• Tens of microseconds precision
[Plot: Offset (microseconds, -50 to 50) vs. Time (min)]
PTP: Heavily Loaded (9 Gbps)
• Tens to hundreds of microseconds precision
[Plot: Offset (microseconds, -150 to 150) vs. Time (min)]
DTP: Heavily Loaded
• Always within 25.6 ns (4 clock ticks) between peers
[Plot: Offset (nanoseconds, -32 to 32; clock ticks, -5 to 5) vs. Time (min) for the pairs S1-S4, S1-S5, S1-S0, S2-S7, S2-S8, S2-S0, S3-S10, S3-S11, S3-S0]
DTP Daemon (after smoothing)
• Usually can access the counter with 25.6 ns precision
[Plot: Offset (clock ticks, -20 to 20; nanoseconds, -128 to 128) vs. Time (min)]
Next Steps
• Integration with OSNT (Open Source Network Tester)
– NetFPGA SUME board with Xilinx Virtex-7
Some Related Work
• Synchronous Ethernet (SyncE)
– Synchronizes the frequency of clocks
– DTP and PTP synchronize the time of clocks
• White Rabbit: PTP + SyncE
– Sub-nanosecond precision
– 1GbE only so far
• Commercial PTP + SyncE
– Tens to hundreds of nanoseconds
Conclusion
• DTP provides bounded precision and scalability
– Bounded precision: 4 clock ticks (25.6 ns) between peers
– Scalability: 153.6 ns for a datacenter with six hops
– Free: No network traffic
– Applications: Usually within 25.6 ns (without bounds)
– End-to-end: 153.6 + 25.6 × 2 = 204.8 ns, roughly 200 ns!
Questions?
• http://github.com/hanw/sonic-lite
• http://sonic.cs.cornell.edu
• Email: [email protected]
• Come to the poster session tomorrow!