20
Demystifying Deep Learning in Networking Ying Zheng, Ziyu Liu, Xinyu You, Yuedong Xu Junchen Jiang 1

Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Demystifying Deep Learning in Networking

Ying Zheng, Ziyu Liu, Xinyu You, Yuedong Xu Junchen Jiang

1

Page 2: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Why Deep Neural Nets in Networking?

2

Application Example gains

Cloud job scheduling 32.4% less job slowdown

Adaptive-bitrate streaming 12-25% better QoE

Routing 23-50% less congestion

Cellular traffic scheduling 14.7% higher utilization

Great promises of DNNs: Two-digit improvements in multiple applications

Many more papers are coming!

Page 3: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

So what’s wrong with DNNs in networking?

3

DNNSystem State Action

DNN is a “black box”

Problems• Hard to confer causality• Hard to trust• Agnostic to domain knowledge

White-box logic

System State ActionTraditional

Deep learning

Page 4: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Black-Box Problem #1: Hard to confer causality

4

DNN5, 1, 3, 10, … 1

Sequence of incoming jobs (length)

Picked job (length)

DNN9, 2, 7, 15, … 2

DNN20, 11, 13, 14Wait

(not 11!)

So probably DNN is looking for shortest job

But …

Correlation ≠ Causality

Page 5: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Problem #2: Hard to trust

5

0

5

10

15

20

0 10 20 30 40 50Avg

. jo

b c

om

ple

tio

n t

ime

(sec

)

Experiment

Rule-based logic (shortest job-first)

DNN-based logic

“The rule is suboptimal, but we know how it works”

Bet

ter

The DNN is on avg better, but we don’t know when/why it will fail

System engineers may favor certainty over performance

Page 6: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Problem #3: Agnostic to domain-specific knowledge

6

Empirical requirement:Resource utilization should be below 80%

??

Hard to imbue a domain-specific rule in a black box.

DNNJob sequence Scheduling

Page 7: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Roadmap to white-box DNNs

•How is a decision made?

•When will it fail?

•Can domain-specific knowledge be integrated?

7

Page 8: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Outline

•Why try to interpret deep learning in networking

•An attempt to explain DNN in networking: A case study

•Examples of worst-case performance of DNN

• Improving performance by domain-specific knowledge

8

Page 9: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

First attempt: use saliency map to understand DNNs

• Calculate gradient vector

𝑤𝑖=𝜕𝑦𝑖

𝜕𝑥

• x: input vector

• y: output vectorthe 𝑗𝑡ℎ element of 𝑤𝑖 shows how much a small change on the 𝑗𝑡ℎ

feature of x will change how likely 𝑖𝑡ℎ action is picked.

9

Page 10: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

DeepRM Input/Output: A Case Study

• Input: Remaining resource, Resource requirements• Output: Act probability over possible actions

10

Inp

ut state

Inp

ut state

Deep Neural Network(DNN)

Actio

n sp

ace A

ction

space

Page 11: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

CPU

MEM

Job 1 Job2 Job3

• Request time of jobs(outstanding district)

What features make DeepRM a decision?

• Saliency map

11

• CPU/RAM occupancy has small impact on the output

Page 12: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

• High-level feathers have connections to intermediate layer and affect final output

Relationship between intermediate output and decision

12

Hid

de

n N

euro

n

Activatio

n V

alue

Job Length

0 4 8 12 16 20

121086420

Page 13: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Outline

13

•Why try to interpret deep learning in networking

•Reverse-engineer DNN in networking: A case study

•Examples of worst-case performance of DNN

• Improving performance by domain-specific knowledge

Page 14: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

When DNN fails in a different workload than what it's trained on

14

Comparisonobjective

Testing distribution

DNNRule-basedalgorithm

Average job slowdown

Training distribution

3.73 4.513.25 4.873.93 5.373.03 4.71

Other distribution

10.57 9.6911.44 10.2312.02 10.7511.48 10.22

Page 15: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

When DNN fails *even* in the same workload to what it's trained on

15

Comparisonobjective

Test data DeepRM Rule-basedalgorithm

Average jobslowdown

Normalsequence

1.84 2.01

Adversarialsequence

1.80 1.46

Page 16: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Outline

16

•Why try to interpret deep learning in networking

•Reverse-engineer DNN in networking: A case study

•Examples of worst-case performance of DNN

• Improving performance by domain-specific knowledge

Page 17: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Domain-specific knowledge might provide a cure

•Better transferability

•Better robustness

•Better training efficiency

17

Page 18: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

A standard way to incorporate domain knowledge

18

Rule

Loss Function

DeepRMTeacher Student

Projection

Back-propagation

Page 19: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Some takeaways, a lot more need to be done

• DNNs in networking achieve superior performance

• DNNs’ limitations compel us to try to at least understand it.

• Our preliminary findings:

a) One can explain to some extent how networking DNNs work

b) DNNs may fail: fits only distribution, vulnerable to noises

19

Page 20: Demystifying Deep Learning in NetworkingWhy Deep Neural Nets in Networking? 2 Application Example gains Cloud job scheduling 32.4% less job slowdown Adaptive-bitrate streaming 12-25%

Thanks for your listening!

20