DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention
Bo He¹, Jingyu Wang¹, Qi Qi¹, Haifeng Sun¹, Zirui Zhuang¹, Cong Liu², Jianxin Liao¹
¹Beijing University of Posts & Telecommunications  ²China Mobile Research Institute


Page 1: DeepHop on Edge: Hop-by-hop Routing by Distributed

DeepHop on Edge: Hop-by-hop Routing by Distributed Learning with Semantic Attention

Bo He¹, Jingyu Wang¹, Qi Qi¹, Haifeng Sun¹, Zirui Zhuang¹, Cong Liu², Jianxin Liao¹

¹Beijing University of Posts & Telecommunications  ²China Mobile Research Institute

Page 2: DeepHop on Edge: Hop-by-hop Routing by Distributed

State Key Laboratory of Networking and Switching Technology

Outline

• Routing in Edge Networks
• Challenges
• Our DeepHop Framework
• Experimental results
• Conclusions

Page 3: DeepHop on Edge: Hop-by-hop Routing by Distributed


Outline


• Routing in Edge Networks

Page 4: DeepHop on Edge: Hop-by-hop Routing by Distributed


Routing in Edge Networks


[Figure: two edge nodes v1 and v2 connected by a link (v1, v2) over WiFi/LTE/Ethernet; each node maintains a packet queue, and each link is characterized by delay, bandwidth, and packet loss rate]

Page 5: DeepHop on Edge: Hop-by-hop Routing by Distributed


Outline


• Routing in Edge Networks
• Challenges

Page 6: DeepHop on Edge: Hop-by-hop Routing by Distributed


Challenges


It is hard for traditional full-path routing to handle unexpected traffic stress on distributed edge nodes

Traffic fluctuations cause some elements of the network state to vary greatly, so heuristic centralized routing can hardly capture the dynamic transitions of the whole network state

Network state elements differ in their significance for routing, but their implications are hard to distinguish and to exploit properly in the traffic forwarding process

Page 7: DeepHop on Edge: Hop-by-hop Routing by Distributed


Outline


• Routing in Edge Networks
• Challenges
• Our DeepHop Framework

Page 8: DeepHop on Edge: Hop-by-hop Routing by Distributed


Challenges


It is hard for traditional full-path routing to handle unexpected traffic stress on distributed edge nodes

Traffic fluctuations cause some elements of the network state to vary greatly, so heuristic centralized routing can hardly capture the global dynamic network state transitions

Network state elements differ in their significance for routing, but their implications are hard to distinguish and to exploit properly in the traffic forwarding process

Hop-by-hop approach

Multi-Agent DRL method

Self-attention mechanism

The MAPOKTR algorithm

Contributions of DeepHop Framework

Page 9: DeepHop on Edge: Hop-by-hop Routing by Distributed


Our DeepHop Framework


Hop-by-hop routing mechanism of DeepHop

Objective 1: minimizing the ratio of discarded packets

Objective 2: minimizing the average slowdown of each traffic packet
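As a hedged illustration of the two objectives, the metrics can be computed from per-packet records roughly as follows; the record fields `discarded`, `actual_delay`, and `ideal_delay` are hypothetical names, and slowdown is assumed here to be actual delivery time divided by ideal delivery time:

```python
def task_unfinished_ratio(packets):
    """Objective 1: fraction of packets discarded before reaching their destination."""
    discarded = sum(1 for p in packets if p["discarded"])
    return discarded / len(packets)

def average_slowdown(packets):
    """Objective 2: mean slowdown (actual delay / ideal delay) over delivered packets."""
    delivered = [p for p in packets if not p["discarded"]]
    return sum(p["actual_delay"] / p["ideal_delay"] for p in delivered) / len(delivered)

packets = [
    {"discarded": False, "actual_delay": 12.0, "ideal_delay": 10.0},
    {"discarded": False, "actual_delay": 30.0, "ideal_delay": 10.0},
    {"discarded": True,  "actual_delay": None, "ideal_delay": 10.0},
]
tur = task_unfinished_ratio(packets)   # 1/3 of packets were discarded
slow = average_slowdown(packets)       # (1.2 + 3.0) / 2 = 2.1
```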

Page 10: DeepHop on Edge: Hop-by-hop Routing by Distributed


Our DeepHop Framework


Multi-Agent Deep Reinforcement Learning (MADRL) model

State Space:

Action Space:

Reward Function:
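The slide gives the state, action, and reward as formulas that are not reproduced in this text. As a hedged illustration only, a per-node agent interface might be structured as follows; the class name `HopAgent`, the state encoding, and the reward weights are assumptions, not the paper's definitions:

```python
import random

class HopAgent:
    """Illustrative per-node DRL agent for hop-by-hop routing (hypothetical)."""

    def __init__(self, node_id, neighbors):
        self.node_id = node_id
        self.neighbors = neighbors  # action space: candidate next-hop nodes

    def observe(self, packet, link_stats):
        # State: destination node index, local link measurements, packet priority
        return (packet["dst"],
                tuple(link_stats[n] for n in self.neighbors),
                packet["priority"])

    def act(self, state):
        # Placeholder: a trained policy network would choose the next hop here
        return random.choice(self.neighbors)

def hop_reward(delivered, hop_delay, drop_penalty=10.0):
    # Assumed shape only: penalize per-hop delay, penalize drops heavily
    return -hop_delay - (0.0 if delivered else drop_penalty)
```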

Page 11: DeepHop on Edge: Hop-by-hop Routing by Distributed


Our DeepHop Framework


The multi-agent deep reinforcement learning algorithm MAPOKTR: Multi-Agent Policy Optimization using Kronecker-Factored Trust Region

Loss Function:

the ratio of probability distributions:

the point probability distance:
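The loss formulas themselves are not reproduced in this text, so the sketch below is only a guess at the general shape: a PPO-style clipped surrogate built from the ratio of probability distributions, plus an assumed penalty on the point probability distance between new and old policies. The constants `eps` and `beta` are illustrative, and this is not the paper's exact MAPOKTR loss (nor does it include the Kronecker-factored trust-region machinery):

```python
import numpy as np

def surrogate_loss(pi_new, pi_old, advantages, eps=0.2, beta=1.0):
    """Hypothetical surrogate loss: clipped policy-ratio term minus a
    point-probability-distance penalty (signs chosen so lower is better)."""
    ratio = pi_new / pi_old                         # ratio of probability distributions
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    ppo_term = np.minimum(ratio * advantages, clipped * advantages)
    point_dist = np.abs(pi_new - pi_old)            # point probability distance
    return float(-(ppo_term - beta * point_dist).mean())

# toy check: probabilities of the taken actions under the new/old policies
pi_new = np.array([0.50, 0.30])
pi_old = np.array([0.40, 0.35])
advantages = np.array([1.0, -1.0])
loss = surrogate_loss(pi_new, pi_old, advantages)
```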

Page 12: DeepHop on Edge: Hop-by-hop Routing by Distributed


Our DeepHop Framework


The sub-layer structure of DRL agents

State Space:

O1 = O^T_{τ,j}: the destination node index

O2 = O^N_{τ}: the network state (links and nodes)

O3 = O^P_{τ,j}: the transmission priority
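The sub-layer idea can be sketched numerically: each observation part is first embedded by its own small layer, and the embeddings are concatenated before the shared policy/value layers. All layer sizes, weights, and the `dense` helper below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # One fully-connected layer with ReLU activation
    return np.maximum(x @ w + b, 0.0)

# Per-part sub-layer weights (shapes are assumptions)
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # destination one-hot -> 8
w2, b2 = rng.normal(size=(6, 8)), np.zeros(8)   # link/node state -> 8
w3, b3 = rng.normal(size=(2, 8)), np.zeros(8)   # priority one-hot -> 8

o1 = np.eye(4)[2]             # O1: destination node index (one-hot)
o2 = rng.uniform(size=6)      # O2: network state (links and nodes)
o3 = np.array([1.0, 0.0])     # O3: transmission priority (one-hot)

# Each part gets its own sub-layer; the results are concatenated
h = np.concatenate([dense(o1, w1, b1), dense(o2, w2, b2), dense(o3, w3, b3)])
# `h` (length 24) would feed the shared layers of the agent
```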

Page 13: DeepHop on Edge: Hop-by-hop Routing by Distributed


Our DeepHop Framework


The Semantic Attention Mechanism (SAM)
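The slide does not spell out the SAM equations in this text, so the following is only a minimal scaled dot-product self-attention sketch showing how attention can weight state-element embeddings by their significance for routing; the dimensions and weight matrices are illustrative:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a set of element embeddings."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax over elements: each row weights all elements for one query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 8))        # e.g. embeddings of the O1, O2, O3 parts
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, wq, wk, wv)   # attended embeddings, shape (3, 8)
```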

Page 14: DeepHop on Edge: Hop-by-hop Routing by Distributed


Outline


• Routing in Edge Networks
• Challenges
• Our DeepHop Framework
• Experimental results

Page 15: DeepHop on Edge: Hop-by-hop Routing by Distributed


Experimental results


The experimental environments

[Figure: the two network topologies, Net1 and Net2]

the platform: a Dell desktop (CPU: Intel i7-8700 3.2GHz; Memory: 32GB DDR4 2666MHz; OS: 64-bit Ubuntu 16.04 LTS)

the network topology graphs: two real network topologies, Net1 (12 nodes) and Net2 (20 nodes)

the traffic data: an open real traffic dataset including 8 classes of traffic

Page 16: DeepHop on Edge: Hop-by-hop Routing by Distributed


Experimental results


Performance of hop-by-hop approach

the MADRL-based DeepHop handles routing tasks under all degrees of congestion in edge networks better than the heuristic CRF protocol

DeepHop performs more stably than CRF at lower degrees of congestion, according to the results on Net1

Task Unfinished Ratio (TUR): the ratio of packets discarded before reaching their destination nodes

Speed of injection: leads to different degrees of congestion

Coexistent Routing and Flooding (CRF): a state-of-the-art heuristic routing protocol for edge networks

Page 17: DeepHop on Edge: Hop-by-hop Routing by Distributed


Experimental results


Performance of Different MADRL algorithms

MAACKTR algorithm: Multi-Agent Actor Critic using Kronecker-Factored Trust Region

MADDPG algorithm: Multi-Agent Deep Deterministic Policy Gradient

[Figures: results at injection speeds of 3000, 4000, 5000, and 6000 in Net1 and in Net2]

Page 18: DeepHop on Edge: Hop-by-hop Routing by Distributed


Experimental results


Performance of different neural network structures in DRL agents

the structure of separate sub-layers makes the neural network understand the semantics effectively

the attention mechanism accelerates the convergence of policies

MADRL-SAM: neural networks with sub-layers and a self-attention mechanism

MADRL-SM: neural networks with sub-layers

MADRL: neural networks with only fully-connected layers

[Figures: results at injection speeds of 3000, 4000, 5000, and 6000 in Net1]

Page 19: DeepHop on Edge: Hop-by-hop Routing by Distributed


Experimental results


Performance of different reward functions

all elements of the reward function are shown to have a positive effect when the agents learn to handle the routing tasks

[Figures: results at injection speeds of 3000, 4000, 5000, and 6000 in Net1]

Reward:

Reward1:

Reward2:

Reward3:

Page 20: DeepHop on Edge: Hop-by-hop Routing by Distributed


Experimental results


Performance of DeepHop in highly-dynamic networks

DeepHop can cope with time-varying network congestion

To generate time-varying network congestion, the number of injected packets is randomly selected from 3,000 to 6,000 per second in the experiments

DeepHop has good robustness and is suitable for real edge network environments

Page 21: DeepHop on Edge: Hop-by-hop Routing by Distributed


Outline


• Routing in Edge Networks
• Challenges
• Our DeepHop Framework
• Experimental results
• Conclusions

Page 22: DeepHop on Edge: Hop-by-hop Routing by Distributed


Conclusions


DeepHop utilizes multi-agent deep reinforcement learning to determine hop-by-hop routes for traffic packets and deploys the agents on edge nodes based on MEC technology.

To learn the semantics of complicated state elements, DeepHop designs a self-attention mechanism that helps the DRL agents learn better policies faster.

DeepHop adopts a novel MADRL algorithm, MAPOKTR, which introduces a point probability distance into its surrogate loss function to keep the policy monotonically improving.

In future work, we will further explore communication between the agents on different nodes and address the retraining of the MADRL model when the topology of the edge network changes significantly.

Page 23: DeepHop on Edge: Hop-by-hop Routing by Distributed

Thanks!