MED: The Monitor-Emulator-Debugger for Software-Defined Networks
Quanquan Zhi and Wei Xu
Institute for Interdisciplinary Information Sciences
Tsinghua University
Software-Defined Networks (SDN): promises and challenges
• SDN will simplify future network design and operation
• Bugs are common
─ Controller
─ Switch software
─ Race conditions
• Network Ops -> Systems DevOps
─ Command line -> programs
─ Lack of tools
─ Fast, repeatable
Monitor-Emulator-Debugger: A debug / testing tool for SDN DevOps
• A software debugger
─ Fast, repeatable, automated tools
─ Addresses concurrency bugs
• Tightly coupled with physical network
─ Automatic physical network sync
MED architecture overview
[Figure: MED architecture in three parts (Monitor, Emulator, Debugger). Labels: Apps and Controller exchanging control messages with the real SDN; MED Agent (Monitor); MED (Emulator) with a virtual SDN of OVS instances; data packets; Debugger Controller hosting the Packet Tracer, Loop and Reachability Checker, Table Checker, and Race Conditions Detector.]
The monitor
• Snapshot (initialization)
─ Physical network topology (LLDP)
─ Initial forwarding table states
• Capture SDN state changes over time
─ OpenFlow messages to/from the SDN controller
─ E.g. packet-in, packet-out, rule installation/removal, and port up/down events
• Sample data packets
─ Essential for replay/testing
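The monitor's output can be pictured as one initial snapshot plus a time-ordered log of control events. The following is a minimal Python sketch of such a log. The names `Snapshot`, `Event`, and `MonitorLog` and the record layout are assumptions for illustration; none of them appear in the slides.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Initial state captured once (hypothetical layout)."""
    links: frozenset      # e.g. {("s1", 2, "s2", 1)}, learned via LLDP
    flow_tables: dict     # switch id -> list of rule strings


@dataclass(order=True)
class Event:
    """One captured OpenFlow message; ordered by timestamp only."""
    timestamp: float
    switch: str = field(compare=False)
    kind: str = field(compare=False)   # "packet_in", "flow_mod_add", "port_down", ...
    payload: str = field(compare=False, default="")


class MonitorLog:
    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.events = []               # kept sorted by timestamp

    def record(self, event):
        insort(self.events, event)

    def until(self, t):
        """Events with timestamp <= t, in time order."""
        return [e for e in self.events if e.timestamp <= t]
```

Keeping the log sorted by timestamp means a later replay step only needs to walk a prefix of it.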
The emulator: key ideas
• The key challenge
─ Emulating a black-box controller from the physical SDN
• Solution
─ Replay all captured OpenFlow messages => set the emulator to a given time
• Question: In what order?
[Figure: emulator message flow. Labels: Apps and Controller, control messages, state messages, real SDN, Emulator Controller, replayed messages, virtual SDN (OVS instances), Debugger Controller with its apps, inject messages.]
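The replay idea can be sketched as follows: start from the snapshot and apply captured rule changes in timestamp order until the target time. This is a simplified illustration, not MED's implementation. The function name `set_to_time` and the tuple layout `(timestamp, switch, op, rule)` are assumptions, and the sketch sidesteps the ordering question by trusting capture timestamps.

```python
def set_to_time(initial_tables, events, t):
    """Rebuild per-switch flow tables as of time t by replaying captured events.

    initial_tables: dict switch -> set of rule strings (the snapshot)
    events: iterable of (timestamp, switch, op, rule), op in {"add", "remove"}
    """
    # Copy the snapshot so the caller's data is never mutated.
    tables = {sw: set(rules) for sw, rules in initial_tables.items()}
    for ts, sw, op, rule in sorted(events, key=lambda e: e[0]):
        if ts > t:
            break
        table = tables.setdefault(sw, set())
        if op == "add":
            table.add(rule)
        elif op == "remove":
            table.discard(rule)
    return tables
```

Because the snapshot is copied on every call, the same captured log can be replayed to many different times, which is what makes "time travel" repeatable.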
The emulator: operation
• Online operation
─ Tracking mode
• Offline operation
─ “Time Travel”
[Figure: emulator operations. After initial setup, Set_to_current enters the tracking state (online); Set_to_stable reaches a specified state and Set_to_nondeterministic(t) reaches State 1, State 2, …, State N by replay (offline).]
The emulator: offline operations
• Set to a stable state at any time
• Emulate all possible orderings for concurrent events
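One way to picture "all possible orderings" is to enumerate every combination of how far each switch has progressed through its own in-flight messages. The sketch below assumes that each switch applies its messages in send order while switches progress independently. That assumption, the function name `nondeterministic_states`, and the data layout are illustrative only and are not taken from the slides.

```python
from itertools import product


def nondeterministic_states(stable_tables, in_flight):
    """Yield every table state reachable when in-flight messages are partly applied.

    stable_tables: dict switch -> iterable of rules already in place
    in_flight: dict switch -> list of ("add" | "remove", rule) in send order
    """
    switches = sorted(in_flight)
    # For each switch, any prefix (0..n messages) may already be applied.
    prefix_choices = [range(len(in_flight[sw]) + 1) for sw in switches]
    for prefix_lengths in product(*prefix_choices):
        state = {sw: set(rules) for sw, rules in stable_tables.items()}
        for sw, n in zip(switches, prefix_lengths):
            table = state.setdefault(sw, set())
            for op, rule in in_flight[sw][:n]:
                if op == "add":
                    table.add(rule)
                else:
                    table.discard(rule)
        yield state
```

The number of states is the product of (messages + 1) over all switches, so it grows quickly; each state is independent of the others and can be checked separately.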
The debugger
• A controller that injects messages into the replayed message stream
• “Apps” built on top of the emulator
─ Set to a specific time
─ An external controller interface
• Example debugger apps
─ Packet tracer
─ Loop and reachability checker
─ Forwarding table checker
─ Race conditions detector
[Figure: Emulator Controller, replayed messages, and the virtual SDN (OVS instances).]
Example debugger app 1: Packet Tracer (PT)
[Figure: Packet Tracer message exchange. Labels: Debugger Controller with PT; Replay: Packet_Out; Packet_In; Flow_Status_Request; Flow_status_reply; "Packet matches normal entry"; "Packet matches TO_CONTROLLER".]
Outputs:
1. A packet’s entire path through the network
2. Which forwarding rule is used on each hop
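The two outputs can be illustrated with a table-walking sketch. Note that this is a swapped-in simplification: the slides describe tracing by exchanging OpenFlow messages with the emulated switches, whereas the code below looks rules up directly in in-memory tables. The function name `trace_packet`, the rule format `(match_in_port, out_port)`, and the `links` map are assumptions.

```python
def trace_packet(tables, links, switch, in_port, max_hops=32):
    """Follow a packet hop by hop; return [(switch, in_port, rule_index, out_port)].

    tables: dict switch -> list of (match_in_port, out_port); first match wins
    links: dict (switch, out_port) -> (next_switch, next_in_port)
    Tracing stops on a table miss, on an out port with no link (an edge port),
    or after max_hops hops.
    """
    path = []
    for _ in range(max_hops):
        hit = next(
            ((i, out) for i, (match, out) in enumerate(tables.get(switch, []))
             if match == in_port),
            None,
        )
        if hit is None:
            path.append((switch, in_port, None, None))  # table miss
            break
        rule_index, out_port = hit
        path.append((switch, in_port, rule_index, out_port))
        nxt = links.get((switch, out_port))
        if nxt is None:
            break                                       # packet left the network
        switch, in_port = nxt
    return path
```

Each tuple in the result names the hop and the index of the rule that matched there, which corresponds to outputs 1 and 2.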
Example debugger app 2: Loop and Reachability Checker (LRC)
[Figure: Debugger Controller hosting PT and LRC.]
Asserts:
• The packet forwarding has no loop
-- AND --
• The packet reaches the destination
• Works online or offline
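Both asserts can be evaluated on a single traced path. The sketch below assumes the path is available as a `next_hop` map from one `(switch, in_port)` state to the next; that representation and the function name are hypothetical.

```python
def check_loop_and_reachability(next_hop, start, destination):
    """Return (has_loop, reaches_destination) for a single packet.

    next_hop: dict mapping a (switch, in_port) state to the next state,
              with no entry where the packet is dropped.
    """
    seen = set()
    state = start
    while state is not None and state not in seen:
        if state == destination:
            return (False, True)
        seen.add(state)
        state = next_hop.get(state)
    # Leaving the loop with a non-None state means a state was revisited.
    return (state is not None, False)
```

A packet passes the check only when the result is `(False, True)`: no loop, and the destination is reached.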
Example debugger app 3: Race Condition Detector (RCD)
Asserts:
• In ANY possible concurrent state, there is no loop or black hole
[Figure: offline operation. After initial setup, Set_to_nondeterministic(t) produces State 1, State 2, …, State N.]
• Expensive? Can trivially run in parallel with multiple emulators
[Figure: Debugger Controller hosting PT, LRC, and RCD.]
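Because each candidate state is checked independently, the work parallelizes naturally. The sketch below uses a thread pool as a stand-in for multiple emulator instances, and models a state as a simple `node -> next node` map. Both choices, and the names `has_loop_or_blackhole` and `detect_race`, are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def has_loop_or_blackhole(state, start, destination):
    """state: dict node -> next node (no entry means the packet is dropped)."""
    seen, node = set(), start
    while node != destination:
        if node in seen or node not in state:
            return True          # revisited a node (loop) or dead end (black hole)
        seen.add(node)
        node = state[node]
    return False


def detect_race(candidate_states, start, destination, workers=4):
    """Return the candidate states that violate the no-loop / no-black-hole assert."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(
            lambda s: has_loop_or_blackhole(s, start, destination),
            candidate_states,
        ))
    return [s for s, bad in zip(candidate_states, flags) if bad]
```

An empty result means the assert holds in every enumerated state; any returned state is a concrete interleaving in which the packet loops or is dropped.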
Example debugger app 4: Table Checker (TC)
Asserts:
• The forwarding tables on physical switches are the same as those in the emulator
[Figure: the Table Checker sits between the flow table (forwarding rules) of an OpenFlow switch in the SDN and the flow table (forwarding rules) of an OVS in the emulator; labels also include "Install rules" and the Debugger Controller hosting PT, LRC, RCD, and TC.]
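The table check amounts to a per-switch set comparison. A minimal sketch, assuming rules are represented as comparable strings and that rule order within a table does not matter (priorities are ignored here); the function name `diff_tables` is hypothetical.

```python
def diff_tables(physical, emulated):
    """Compare per-switch rule sets; return {switch: (missing, unexpected)}.

    missing: rules present in the emulator but not on the physical switch
    unexpected: rules present on the physical switch but not in the emulator
    Only switches with a difference appear in the result.
    """
    report = {}
    for sw in sorted(set(physical) | set(emulated)):
        phys = set(physical.get(sw, ()))
        emu = set(emulated.get(sw, ()))
        if phys != emu:
            report[sw] = (sorted(emu - phys), sorted(phys - emu))
    return report
```

An empty report means the assert holds; a non-empty one points at the switch and the exact rules that differ.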
Evaluation
• Performance
─ Emulator initialization
─ Packet Tracing (PT) performance
• Case studies
─ Bugs in physical switch software
─ Race condition analysis
Experiment setup
• 20-switch network, typical DCN topology
─ Pica8 P-3298
─ 30,000 OpenFlow rules in total (~1,500 rules per switch)
Initial setup performance
Step                                                     Time
Discover physical topo + set up emulator topo            4.9 sec
Dump all flow tables from switches                       0.54 sec
Install all flow table entries to emulator (30K rules)   12.2 sec
State changed during the setup? Redo until done.
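The "redo until done" rule can be sketched as a retry loop that accepts a dump only if nothing changed while it was taken. The callables `read_version` and `dump_state` are hypothetical hooks, and the `max_attempts` bound is an added safeguard; the slide itself implies retrying without a limit.

```python
def consistent_snapshot(read_version, dump_state, max_attempts=10):
    """Retry a dump until no state change happened while it was taken.

    read_version: callable returning a counter that increases on every state change
    dump_state: callable returning the dumped topology and flow tables
    """
    for _ in range(max_attempts):
        before = read_version()
        state = dump_state()
        if read_version() == before:
            return state         # nothing changed during the dump
    raise RuntimeError("network kept changing during setup")
```

In the slides' terms, a change counter could be derived from the captured OpenFlow messages: if any message arrived during the dump, the dump is repeated.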
Packet Tracing (PT) performance
• Random routing
• Performance of tracing paths with different lengths
# hops            2       4       6       8       10
% of test data    10.6%   13.2%   57.9%   16.2%   2.1%
Time taken (ms)   0.626   1.536   2.828   3.532   5.001
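Two derived figures help read the table: the mean tracing time weighted by the share of test data, and the time per hop. The short calculation below uses only the numbers in the table; the variable names are illustrative.

```python
hops = [2, 4, 6, 8, 10]
share = [0.106, 0.132, 0.579, 0.162, 0.021]     # fraction of test data
time_ms = [0.626, 1.536, 2.828, 3.532, 5.001]   # time taken per trace

# Mean tracing time, weighted by how often each path length occurs.
mean_ms = sum(s * t for s, t in zip(share, time_ms))

# Time per hop for each path length.
per_hop_ms = [t / h for t, h in zip(time_ms, hops)]

print(f"weighted mean: {mean_ms:.2f} ms")
print("per hop (ms):", [f"{x:.2f}" for x in per_hop_ms])
```

The weighted mean comes to about 2.58 ms per trace, and the per-hop cost stays between roughly 0.3 and 0.5 ms across path lengths.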
Real-world bug in switch software
Pica8 switch flow table:
MED OVS flow table:
Bug in PicOS-OVS 2.3
“A GRE port is injecting ARP request packets back to the same port. The expected results is to forward all packets except the GRE port.”
http://www.pica8.com/document/v2.3/html/release-notes-for-picos-2.3
Non-deterministic states in the network due to concurrent messages
[Figure: controller diagram.]
• Which switch processed the message first?
─ Sometimes we do not know
─ Can be OK, but can mean problems
Race condition example
r: in_port=1 -> Port2
r: in_port=1 -> Port3
r: in_port=3 -> Port1
Should we enforce the ordering?
Are we enforcing them correctly?
[1] Xin Jin, Hongqiang Harry Liu, Rohan Gandhi, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Jennifer Rexford, Roger Wattenhofer. Dynamic Scheduling of Network Updates. SIGCOMM, 2014.
[Figure: example topology with nodes A, B, and C.]
Race condition detector example (cont’d)
Conclusion
• A step toward bringing software testing / debugging tools to SDN
• Fast, reproducible
• Single-step tracing with packets
• Debugging concurrency problems
• Emulates the physical network
• Evaluation on an SDN with 20 switches
Wei Xu <[email protected]>
Backup slides
MED functions
MED: a useful tool to debug problems in SDN
• Create an emulator that can be set to the network state at any given point in time
• Trace the forwarding paths and the flow table entries used along the path, for each individual data packet
• Capture and find the cause of common SDN problems: loops, reachability failures, and race conditions
Performance: inserting rules