Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
Concurrent Self-Adjusting Distributed Tree Networks
Bruna Peres Stefan Schmid Chen AvinOlga Goussevskaia
• New technologies allow communication networks to be increasingly flexible and reconfigurable
• Traditional networks designs are still optimized toward static metrics
Motivation
Concurrent Self-Adjusting Distributed Tree Networks 2DISC 2017
ProjecToR
Concurrent Self-Adjusting Distributed Tree Networks 3DISC 2017
• ProjecToR: Agile Reconfigurable Data Center Interconnect. Ghobadi et al., SIGCOMM'16
• Self-adjusting networks ↔ self-adjusting data structures
• Splay Trees
•Self-Adjusting Data Structures
Concurrent Self-Adjusting Distributed Tree Networks 4DISC 2017
request c
x
f t
z
vc h
g
r
j
n
x
f
t
z
v
c
h
grj
n
x
f
t
z
v
c
h
grj
n
SplayNets
Concurrent Self-Adjusting Distributed Tree Networks 5
x
ft
z
vc h
g
r
j
n
x
ft
z
vc h
g
r
j
n
request h
request h
S. Schmid, et al., SplayNet: Towards Locally Self-Adjusting NetworksIEEE/ACM Transactions on Networking, 2016.
DISC 2017
SplayNets
Concurrent Self-Adjusting Distributed Tree Networks 6
• Distributed tree network
• Improves the communication cost between two nodes in a self-adjusting manner
• Nodes communicating more frequently become topologically closer to each other over time
• Lowest common ancestor LCA(u,v): locality is preservedv
u
root
LCA(u,v)
DISC 2017
Our Contributions
Concurrent Self-Adjusting Distributed Tree Networks 7
• While SplayNets are inherently intended to distributed
applications, so far, only sequential algorithms are known to
maintain SplayNets
• We present DiSplayNets, the first distributed and concurrent
implementation of SplayNets
DISC 2017
• Network model:
• Binary tree 𝑇 comprised of a set of 𝑛 communication nodes
• Sequence of communication requests 𝜎 = (𝜎1, 𝜎2, … , 𝜎𝑚):
• 𝜎𝑖 = 𝑠, 𝑑
• 𝑡𝑏 𝜎𝑖 and 𝑡𝑒 𝜎𝑖
• Given 𝜎𝑖 𝑠, 𝑑 , s and d rotate in parallel towards the LCA(s,d)
• LCA might change over time
Concurrent Self-Adjusting Distributed Tree Networks 8
Model
DISC 2017
• State machine executed by each node in parallel
Concurrent Self-Adjusting Distributed Tree Networks 9
DiSplayNet
DISC 2017
Climbing
CommunicatingPassive
Waiting
x
ft
z
vc h
g
r
j
n
request h
• State machine executed by each node in parallel
Concurrent Self-Adjusting Distributed Tree Networks 10
DiSplayNet
DISC 2017
Climbing
CommunicatingPassive
Waiting
x
t z
v
f
c h
g j
n
• State machine executed by each node in parallel
Concurrent Self-Adjusting Distributed Tree Networks 11
DiSplayNet
DISC 2017
Climbing
CommunicatingPassive
Waiting
r
x
f
t z
v
c
h
g j
n
Local reconfigurations
Concurrent Self-Adjusting Distributed Tree Networks 12DISC 2017
u
v
w
v
u
wzig-zig
v
uv
u
zig
w z
v
u
w
v w
u
zig-zag
z zz
T1
T1
T1
T1
T1T1T2 T2
T2
T2
T2
T2
T3
T3
T3
T3 T3
T3
T4
T4
T4
T4
w
Local reconfigurations
Concurrent Self-Adjusting Distributed Tree Networks 13DISC 2017
u
v
w
v
u
wzig-zig
zz
T1
T1
T2
T2T3
T3
T4
T4
β(u)
• In order to ensure deadlock and starvation freedom, concurrent operations are performed according to a priority
𝑡𝑏 𝜎𝑖(𝑠𝑖 , 𝑑𝑖) < 𝑡𝑏 𝜎𝑗(𝑠𝑗 , 𝑑𝑗)
Concurrent Self-Adjusting Distributed Tree Networks 14
DiSplayNet
DISC 2017
• The algorithm is executed in rounds
Concurrent Self-Adjusting Distributed Tree Networks 15
Algorithm
DISC 2017
𝒔𝒊
𝒅𝒋
𝑡𝑏 𝜎𝑖(𝑠𝑖 , 𝑑𝑖) < 𝑡𝑏 𝜎𝑗(𝑠𝑗 , 𝑑𝑗)
β(𝒔𝒊)
β(𝒅𝒋)Phase 1Phase 2
Phase 2
• The algorithm is executed in rounds
Concurrent Self-Adjusting Distributed Tree Networks 16
Algorithm
DISC 2017
𝒔𝒊
𝒅𝒋
𝑡𝑏 𝜎𝑖(𝑠𝑖 , 𝑑𝑖) < 𝑡𝑏 𝜎𝑗(𝑠𝑗 , 𝑑𝑗)
ack
ack
Phase 3
• The algorithm is executed in rounds
Concurrent Self-Adjusting Distributed Tree Networks 17
Algorithm
DISC 2017
𝒔𝒊
𝒅𝒋
𝑡𝑏 𝜎𝑖(𝑠𝑖 , 𝑑𝑖) < 𝑡𝑏 𝜎𝑗(𝑠𝑗 , 𝑑𝑗)
ack
Phase 4
• The algorithm is executed in rounds
Concurrent Self-Adjusting Distributed Tree Networks 18
Algorithm
DISC 2017
𝒔𝒊
𝒅𝒋
𝑡𝑏 𝜎𝑖(𝑠𝑖 , 𝑑𝑖) < 𝑡𝑏 𝜎𝑗(𝑠𝑗 , 𝑑𝑗)
Phase 5
• The algorithm is executed in rounds
Concurrent Self-Adjusting Distributed Tree Networks 19
Algorithm
DISC 2017
𝒔𝒊
𝒅𝒋
𝑡𝑏 𝜎𝑖(𝑠𝑖 , 𝑑𝑖) < 𝑡𝑏 𝜎𝑗(𝑠𝑗 , 𝑑𝑗)
• Self-adjust to the communication pattern in a fully-decentralized manner
• Starvation free
• Deadlock free
DiSplayNets
Concurrent Self-Adjusting Distributed Tree Networks 20DISC 2017
• Analyze the efficiency
• Work cost: 𝑊 𝐷𝑖𝑆𝑝𝑙𝑎𝑦𝑁𝑒𝑡, 𝑇0, 𝜎 = 𝜎𝑖 𝜖 𝜎𝑤(𝜎𝑖)
• Time cost:
• Request delay: 𝑡𝑑 𝜎𝑖 = 𝑡𝑒 𝜎𝑖 - 𝑡𝑏 𝜎𝑖
• Makespan: 𝑇 𝑇0,𝜎 = max𝜎𝑖 𝜖 𝜎
𝑡𝑒 𝜎𝑖 − min𝜎𝑖 𝜖 𝜎
𝑡𝑏 𝜎𝑖
Future Work
Concurrent Self-Adjusting Distributed Tree Networks 21DISC 2017
Progress Matrix
Concurrent Self-Adjusting Distributed Tree Networks 22
𝒕𝒊 𝒕𝒊+𝟏 𝒕𝒊+𝟐 𝒕𝒊+𝟑 𝒕𝒊+𝟒
DISC 2017
Progress Matrix
Concurrent Self-Adjusting Distributed Tree Networks 23
Makespan
Work
DISC 2017
• Our simulations show first promising results
• ProjecToR data
• 128 node randomly selected from 2 production clusters (running a mix of workloads, including MapReduce-type jobs, index builders, and database and storage systems)
• 1000 requests
• Poisson process
Future Work
Concurrent Self-Adjusting Distributed Tree Networks 24DISC 2017
• Our simulations show first promising results
• Individual work CDF
Future Work
Concurrent Self-Adjusting Distributed Tree Networks 25DISC 2017
• Our simulations show first promising results
• Total work
Future Work
Concurrent Self-Adjusting Distributed Tree Networks 26DISC 2017
• Our simulations show first promising results
• Request delay
Future Work
Concurrent Self-Adjusting Distributed Tree Networks 27DISC 2017
• Our simulations show first promising results
• Makespan
Future Work
Concurrent Self-Adjusting Distributed Tree Networks 28DISC 2017
Thank you
Bruna Peres
Concurrent Self-Adjusting Distributed Tree NetworksDISC 2017