Revolutionary Voice Enhancement in Real-Time Communications with GPU

Davit Baghdasaryan, CEO, 2Hz
Arto Minasyan, CTO, 2Hz




Mute Background Noises

Voice Quality with Deep Learning

• Mute Background Noise
• Mute Everyone Except Me
• Remove Room Echo
• High Resolution Voice Everywhere

Real-Time Noise Suppression with Deep Learning

Traditional Noise Cancellation

- Requires 2-4 mics
- Runs on edge device
- Cancels only limited noises
- Outbound only

Deep Learning-Powered Noise Cancellation

[Diagram: Background Noises and Clean Human Speech are used to train the krispNet Deep Neural Network]

- No dependency on mics
- Bi-directional
- Cancels all noise types
- Runs everywhere: on device and in the cloud
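The slide does not say how the noise and clean-speech recordings are combined for training. A common approach for this kind of model, shown here only as a minimal sketch under that assumption, is to mix each clean clip with a noise clip at a chosen signal-to-noise ratio and use the clean clip as the training target. The function names, the 16 kHz sample rate, and the synthetic signals are all illustrative and not taken from the talk.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise RMS ratio equals `snr_db`, then add it."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return list(clean)  # silent noise clip: nothing to mix in
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20.0))
    gain = target_noise_rms / noise_rms
    return [c + gain * n for c, n in zip(clean, noise)]


def make_training_pair(clean, noise, snr_db):
    """Return (noisy_input, clean_target) for supervised training."""
    return mix_at_snr(clean, noise, snr_db), list(clean)


# Synthetic stand-ins for real recordings (assumption: 16 kHz mono audio).
rng = random.Random(0)
sample_rate = 16000
clean = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(sample_rate)]

noisy, target = make_training_pair(clean, noise, snr_db=5.0)

# Check the mix: the residual (noisy minus clean) is the scaled noise.
residual = [x - c for x, c in zip(noisy, target)]
achieved_snr_db = 20 * math.log10(rms(target) / rms(residual))
```

Varying `snr_db` and the noise clip per example is one way a fixed pool of speakers and noises could be expanded into many hours of training material.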

How to Measure Voice Quality?

Industry Standards

- Academia: PESQ, subjective tests
- Industry: 3QUEST (Speech MOS, Noise MOS, Global MOS)
- Skype Audio Test and 3GPP TS 26.131 specifications
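MOS (mean opinion score) is, at its simplest, the average of listener ratings on a 1-to-5 scale; objective tools such as PESQ and 3QUEST aim to predict scores of that kind without a listening panel. The sketch below illustrates only the subjective averaging step, with a rough confidence interval. The ratings are made up for illustration, and this is not an implementation of PESQ or 3QUEST.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for 1-5 listener ratings."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        return mos, 0.0
    # Normal approximation; a small panel would call for a t-distribution.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight listeners for the same processed clip.
speech_ratings = [4, 5, 4, 4, 3, 5, 4, 4]  # how natural the speech sounds
noise_ratings = [5, 4, 5, 4, 4, 5, 5, 4]   # how unintrusive the residual noise is

speech_mos, speech_ci = mean_opinion_score(speech_ratings)
noise_mos, noise_ci = mean_opinion_score(noise_ratings)
```

Reporting speech and noise scores separately mirrors the Speech MOS / Noise MOS split listed above: a suppressor can remove noise well while still degrading the voice, and a single number would hide that.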

Audio Lab

krisp.ai


Seamlessly Integrates in Conferencing Apps

Supports any Microphone or Headset

krisp.ai

Best Product in Audio/Voice 2018

Training and Inference

Training Process

Training Data

- 2K distinct speakers: gender- and age-diverse distribution
- >10K distinct noises: babble, construction, traffic, cafeteria, office, etc.
- 2000+ hours

Training on GPUs

- All in Python
- Distributed TensorFlow
- Multiple in-house NVIDIA GTX 1080 Ti GPUs; training takes a full week
- p2.16xlarge in AWS: 16x NVIDIA K80

Inference

- Supports NVIDIA, Intel and ARM platforms
- All in C/C++, sometimes ASM
- Smaller network (5x boost with some quality penalty)
- TensorRT gives a ~2x boost

Moving to the Cloud

Server-side Noise Cancellation

Latency Constraints

- 200 ms end-to-end latency
- Codecs and other DSP: 10-80 ms
- Network: varies
- DNN algorithmic: 15 ms
- DNN compute: < 5 ms
- DNN total: < 20 ms
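The figures on this slide can be read as a simple additive budget. The sketch below works through that arithmetic, assuming the components add up serially (the slide does not state this explicitly) and taking the DNN compute time at its 5 ms upper bound. The constant and function names are illustrative.

```python
# Figures from the slide, in milliseconds.
END_TO_END_BUDGET_MS = 200.0
DNN_ALGORITHMIC_MS = 15.0       # delay inherent to the model's frame/lookahead design
DNN_COMPUTE_MAX_MS = 5.0        # upper bound on per-frame compute time
CODEC_DSP_RANGE_MS = (10.0, 80.0)


def dnn_latency_ms(algorithmic_ms=DNN_ALGORITHMIC_MS, compute_ms=DNN_COMPUTE_MAX_MS):
    """Total delay added by the DNN stage (algorithmic plus compute)."""
    return algorithmic_ms + compute_ms


def network_headroom_ms(codec_dsp_ms, budget_ms=END_TO_END_BUDGET_MS):
    """Budget left for the network after codecs/DSP and the DNN (assumes serial stages)."""
    low, high = CODEC_DSP_RANGE_MS
    if not low <= codec_dsp_ms <= high:
        raise ValueError("codec/DSP latency outside the 10-80 ms range on the slide")
    return budget_ms - codec_dsp_ms - dnn_latency_ms()


best_case = network_headroom_ms(10.0)   # lightest codec/DSP path
worst_case = network_headroom_ms(80.0)  # heaviest codec/DSP path
```

Under these assumptions the DNN uses at most 20 ms of the 200 ms budget, leaving between 100 ms and 170 ms for the network depending on the codec and DSP path.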

How do you scale to 100K+ concurrent streams with such latency constraints?

Ex. Discord processes 2.5M concurrent audio streams

CPU Servers vs. GPU Servers

GPU servers: 10x-20x less costly

Scalability with Batching
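Batching here presumably means grouping the current audio frame from many concurrent streams into one model call, so the GPU processes them together instead of one by one. The sketch below illustrates that idea in plain Python. `FrameBatcher` and `placeholder_model` are hypothetical names, the model is a stand-in that only halves samples, and the batch size of 64 is arbitrary; none of this is the speakers' actual server design.

```python
def placeholder_model(batch):
    """Stand-in for a batched DNN call: halves every sample (NOT a real denoiser)."""
    return [[0.5 * s for s in frame] for frame in batch]


class FrameBatcher:
    """Groups one pending frame per stream into batches of bounded size."""

    def __init__(self, model, max_batch_size):
        self.model = model
        self.max_batch_size = max_batch_size
        self.pending = {}      # stream_id -> latest frame, in arrival order
        self.model_calls = 0   # how many batched calls were issued

    def submit(self, stream_id, frame):
        """Queue the next frame of a stream for the upcoming batch."""
        self.pending[stream_id] = frame

    def flush(self):
        """Run the model on all pending frames and return {stream_id: output}."""
        stream_ids = list(self.pending)
        results = {}
        for start in range(0, len(stream_ids), self.max_batch_size):
            chunk_ids = stream_ids[start:start + self.max_batch_size]
            batch = [self.pending[sid] for sid in chunk_ids]
            outputs = self.model(batch)  # one call serves the whole chunk
            self.model_calls += 1
            results.update(zip(chunk_ids, outputs))
        self.pending.clear()
        return results


# 150 concurrent streams each submit one 4-sample frame for this tick.
batcher = FrameBatcher(placeholder_model, max_batch_size=64)
for stream_id in range(150):
    batcher.submit(stream_id, [float(stream_id)] * 4)
outputs = batcher.flush()
```

In this example 150 frames are served by 3 model calls rather than 150. In a real deployment the flush interval would have to stay inside the per-frame compute budget, since waiting to fill a batch adds latency.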

Ultimate Quality

[Diagram: Audio Frame → Remove Noise, Remove Room Echo, Expand Voice HD (5 ms for the three stages) → Ultimate Quality Audio Frame]

Maximum Quality and Scale with NVIDIA Tensor Cores

TensorRT is pretty awesome

[Chart: TensorFlow Batching vs. TensorRT Batching on P100, V100, K80 and T4; axis from 0 to 3000, unit not stated]

T4 and V100 are both awesome

[Chart: FP32 vs. FP16 on P100, V100 and T4; axis from 0 to 5000, unit not stated]

Key Takeaways

1. Voice Quality Enhancement is moving to the Cloud
2. For large-scale deployments we need GPUs
3. T4 and V100 GPUs are the most efficient for this

Thank You!

Booth #247