Expanding Accessibility – Audio Signal Processing
Ivan Tashev
Partner Software Architect
Audio and Acoustics Research Group
MSR Labs – Redmond
Agenda
• Audio Understanding
  • Bumblebee project
• Spatial Audio
  • Cities Unlocked project
• HoloLens device
  • Research platform for applications helping visually and hearing impaired people
7/6/2016 UW CSE - MSR Summer Institute: Audio Signal Processing
Collaborators
• Audio and Acoustics Research Group in MSR Labs, Redmond
• Interns: Piotr Bilinski, Archontis Politis, Nilesh Madhu, Jinkyu Lee, Kun Han, Keith Godin, Hoang Do, many others
• The exceptional engineering teams in HoloLens, Kinect, and Windows we had the honor to work with
• Hannes Gamper, Microsoft Research
• David Johnston, Microsoft Research
• Ivan Tashev, Microsoft Research
• Mark R. P. Thomas, Dolby Laboratories
• Jens Ahrens, Chalmers University, Sweden
Audio Understanding
Bumblebee project
Extracting non-verbal cues
• The meaning of the spoken words carries less than 50% of human-to-human communication
• Audio understanding
  • Speaker identification and verification
  • Gender and age detection
  • Emotion detection
  • Audio environment recognition
  • Audio event detection
• Core application
  • Smarter and better human-machine interfaces (HMIs)
General architecture
• Framing and feature extraction
  • MFCCs, pitch, log-energy, zero-crossing rate (ZCR), …
  • Up to 960 features in some cases
• Classifier
  • Frame/segment level: DNN, GMM
  • Utterance level: SVM, ELM, HMM
  • LSTM RNNs for end-to-end classification
[Figure: emotion recognition pipeline. Segment-level feature extraction → DNN → utterance-level features → utterance-level classifier → emotion]
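The framing and feature-extraction step can be illustrated with a minimal sketch. It computes only two of the listed features (log-energy and ZCR) per frame; the 25 ms frame length, the 10 ms hop, and all function names are assumptions chosen for illustration, not the settings used in the project.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame, floor=1e-12):
    """Natural log of the mean squared amplitude, floored to avoid log(0)."""
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(max(energy, floor))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def extract_features(samples, frame_len=400, hop=160):
    """Return one [log-energy, ZCR] vector per frame (assumed 16 kHz input)."""
    return [[log_energy(f), zero_crossing_rate(f)]
            for f in frame_signal(samples, frame_len, hop)]

# Hypothetical input: one second of a 100 Hz sine sampled at 16 kHz.
fs = 16000
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
features = extract_features(tone)
```

Each per-frame vector would then feed a frame- or segment-level classifier; a real system would append MFCCs, pitch, and further descriptors to the same vector.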
Project Bumblebee
• Mobile phone application
• Visualizes the sound
  • Level and frequency content
• Recognizes audio objects
  • Fire or CO2 alarm
  • Doorbell
  • Phone ring
  • Baby crying
• Social involvement
  • Sound can be sent for recognition to a support group
  • Then added to the dataset of recognizable sounds
• Started as a hackathon project in 2015
  • Work continues this summer with 6 interns
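As a rough illustration of what "level and frequency content" could mean for one block of audio, the sketch below computes an RMS level in dB relative to full scale and the dominant frequency from a direct DFT. The function names, block size, and sample rate are assumptions; a production app would use an FFT library rather than this O(N²) transform.

```python
import cmath
import math

def level_dbfs(samples, floor=1e-12):
    """RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(max(rms, floor))

def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..N/2, computed directly (slow but dependency-free)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2 + 1)]

def dominant_frequency(samples, fs):
    """Frequency in Hz of the strongest DFT bin."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * fs / len(samples)

# Hypothetical input: 256 samples of a 1 kHz tone at half scale, 8 kHz sample rate.
fs = 8000
block = [0.5 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
level = level_dbfs(block)
peak_hz = dominant_frequency(block, fs)
```

Repeating this per block over time gives the data for a level meter and a spectrogram-style display.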
Spatial Audio
Cities Unlocked project
Binaural recording and reproduction
• Theatrophone, 1881
• Binaural recordings, mid-1950s
• Problems:
  • Fixed audio scene
  • HRTF mismatch
[Figure: Neumann KU 100 dummy-head microphone]
HRTF and personalization
• Head-related transfer functions (HRTFs) describe the acoustic path from the sound source to the ear entrances
  • Contain all interaural and spectral localization cues
  • Are a function of sound direction
  • Can be considered distance-independent for radii > 1 m
• Head and torso geometry affects wave propagation
  • Anthropometric features are individual
  • Hence HRTFs are individual
  • Spatial hearing is individual!
• Using machine learning approaches for HRTF personalization
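One interaural cue, the interaural time difference (ITD), can be approximated with the textbook Woodworth spherical-head formula, ITD = (a/c)(θ + sin θ). The sketch below uses it only to show how a single anthropometric parameter (head radius) changes a localization cue; it is a far-field simplification, not the machine-learning personalization method mentioned above, and the default radius of 8.75 cm is an assumed average.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air

def woodworth_itd(azimuth_deg, head_radius_m=0.0875):
    """ITD in seconds for a far-field source at 0..90 degrees azimuth.

    Uses the Woodworth spherical-head approximation
    ITD = (a / c) * (theta + sin(theta)).
    """
    theta = math.radians(azimuth_deg)
    if not 0.0 <= theta <= math.pi / 2:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    return head_radius_m / SPEED_OF_SOUND * (theta + math.sin(theta))

# Same direction, two hypothetical head sizes: the cue differs per listener.
itd_small = woodworth_itd(90, head_radius_m=0.080)
itd_large = woodworth_itd(90, head_radius_m=0.095)
```

The difference between the two values is one simple reason why a generic HRTF set can sound wrong to a particular listener.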
Cities Unlocked project
• Joint project with Guide Dogs UK, Microsoft UK, and MSR Labs in Cambridge and Redmond
• Headset device with IMU + smartphone
• Knot of problems
  • Detection and tracking of markers
  • 3D audio representation and rendering
  • UI aspects for visually impaired
• November 2014 – first phase
  • Trial deployment, 5 people
• November 2015 – second phase
  • Deployed to 50 people
• November 2016 – third phase
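To render a marker as a spatial sound, the renderer needs the marker's direction relative to where the listener's head is pointing. A minimal sketch of that step follows, assuming the IMU supplies a compass heading in degrees and the phone supplies the bearing to the marker; both inputs and the function name are hypothetical, and elevation and distance are ignored.

```python
def relative_azimuth(head_heading_deg, marker_bearing_deg):
    """Marker direction relative to the head, wrapped to [-180, 180) degrees.

    Positive values mean the marker is to the listener's right,
    negative values to the left (compass convention, clockwise positive).
    """
    return (marker_bearing_deg - head_heading_deg + 180.0) % 360.0 - 180.0

# Listener faces 350 degrees; the marker lies at a bearing of 10 degrees.
azimuth = relative_azimuth(350.0, 10.0)
```

The wrap-around handling matters near north, where naive subtraction would report a marker 20 degrees to the right as 340 degrees to the left.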
HoloLens device
A platform for applications helping visually and hearing impaired people
HoloLens – released March 2016
• Wearable AR device:
  • Heads-up display
  • Spatial audio system
  • Windows 10 computer
  • Set of DSPs underneath
• Sensors:
  • RGB camera
  • 4 microphones
  • Depth camera
  • Head orientation and position tracking
Usage scenarios
• Gaming
• Entertainment
• Productivity
• Science
• Design and art
• Education
HoloLens for research on enabling scenarios
• Autonomous wearable device
• Packed with sensors
• Gaze, gesture, voice (GGV) – HMI input modalities
• HUD and spatial audio – HMI output modalities
• Substantial computing power, Wi-Fi connected
• Attractive device for conducting research and designing UI and other functionality
Finally …
Questions?