
Page 1: The Next-Gen Technologies Driving Immersion

The next-gen technologies driving immersion

Qualcomm Technologies, Inc. | February 2017

Page 2: The Next-Gen Technologies Driving Immersion

Technology improvements continue to drive more immersive experiences, especially for VR and AR

1. High Dynamic Range (HDR) will enhance the visual quality on all our screens
2. Scene-based audio is a new paradigm for 3D audio
3. Natural UIs like voice, gestures, and eye tracking are making interactions more intuitive

Page 3: The Next-Gen Technologies Driving Immersion

Immersion enhances our experiences

The next-gen technologies driving immersion

Page 4: The Next-Gen Technologies Driving Immersion

Immersive experiences
• Draw you in…
• Take you to another place…
• Keep you present in the moment…

The experiences worth having, remembering, and reliving

Page 5: The Next-Gen Technologies Driving Immersion


Immersion enhances everyday experiences

Experiences become more realistic, engaging, and satisfying

Spanning devices at home, work, and throughout life

Life-like video conferencing

Smooth, interactive, cognitive user interfaces

Augmented reality experiences

Virtual reality experiences

Realistic gaming experiences

Theater-quality moviesand live sports

Page 6: The Next-Gen Technologies Driving Immersion

Achieving full immersion
By simultaneously focusing on three key pillars:

• Visual quality
• Sound quality
• Intuitive interactions

Page 7: The Next-Gen Technologies Driving Immersion

The next-generation technologies driving immersion
Achieving full immersion at low power to enable a comfortable, sleek form factor

• Visual quality: high dynamic range (HDR), with increased contrast, expanded color gamut, and increased color depth
• Sound quality: scene-based audio, providing 3D audio and positional audio through higher order ambisonics
• Intuitive interactions: natural user interfaces that are adaptive and multi-modal, like voice, gestures, and eye tracking

Page 8: The Next-Gen Technologies Driving Immersion

HDR for enhanced visual quality

Increased brightness and contrast, expanded color gamut, and increased color depth

Page 9: The Next-Gen Technologies Driving Immersion

HDR images and videos are visually stunning
Much more realistic and immersive

[Side-by-side comparison images: HDR ON vs. HDR OFF]

Page 10: The Next-Gen Technologies Driving Immersion

HDR will enhance the visual quality on all our screens
Bringing our experiences closer to full immersion

Visuals so vibrant that they are eventually indistinguishable from the real world

Page 11: The Next-Gen Technologies Driving Immersion

Achieving realistic HDR is challenging
Real-life brightness has a wide dynamic range that is hard to capture and replicate

Real-life
• Sun: ~10^9 nits¹
• Sunlit scene: ~10^5 nits
• Starlight: ~10^-3 nits
• Dynamic range: ~10^12:1

Human vision
• Eye's dynamic range: ~10^4:1 (static), ~10^6:1 (dynamic)
• Eyes are sensitive to relative luminance

¹ Nit is the unit of luminance, also known as candela per square meter (cd/m²). Candela is the unit of luminous intensity.

Camera and display technologies

• Camera sensors can’t capture the full dynamic range

• Display panels can’t replicate the full dynamic range
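The luminance figures above can be sanity-checked with a little arithmetic. A quick sketch, using the values from this slide and the 0.05-to-1,000-nit display target mentioned later in the deck (the "stops" conversion is standard photography math, not from the deck itself):

```python
import math

# Approximate luminance values from the slide, in nits (cd/m^2)
sun = 1e9
sunlit_scene = 1e5
starlight = 1e-3

# Real-life dynamic range: brightest source to darkest scene
dynamic_range = sun / starlight          # ~10^12:1, matching the slide
stops = math.log2(dynamic_range)         # ~39.9 photographic stops

# A display covering 0.05 to 1,000 nits spans far less
display_range = 1000 / 0.05              # ~20,000:1
display_stops = math.log2(display_range) # ~14.3 stops

print(f"real life: ~{stops:.1f} stops, display: ~{display_stops:.1f} stops")
```

The gap of roughly 25 stops between the real world and even a good HDR panel is why tone mapping (covered on the heterogeneous-computing slide) is unavoidable.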

Page 12: The Next-Gen Technologies Driving Immersion

Three technology vectors are essential for HDR
Making every pixel count

• Contrast and brightness: brighter whites and darker blacks, closer to the brightness of real life
• Color gamut: the subset of visible colors that can be accurately captured and reproduced
• Color depth: the number of gradations in color that can be captured and displayed

Page 13: The Next-Gen Technologies Driving Immersion

HDR10 is the next step towards true-to-life visuals
A requirement for ULTRA HD PREMIUM certification

HDR10 content spec:
• Contrast and brightness: EOTF up to 10,000 nits
• Color gamut: BT.2020 support
• Color depth: 10-bit per channel (over a billion colors)
• Codec: HEVC Main 10 profile

ULTRA HD PREMIUM display spec:
• Display from 0.05 to 1,000 nits or 0.0005 to 540 nits
• Min. 90% DCI-P3 color reproduction

EOTF is the electro-optical transfer function.
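The EOTF used by HDR10 is the SMPTE ST 2084 Perceptual Quantizer (PQ), which maps a normalized 10-bit code value to absolute luminance up to 10,000 nits. A minimal sketch of the PQ curve (constants as defined in ST 2084):

```python
# SMPTE ST 2084 (PQ) EOTF: non-linear signal E' in [0, 1] -> luminance in nits.
# HDR10 pairs this curve with 10-bit code values, which give
# 2**30 (about 1.07 billion) representable colors across three channels.

M1 = 2610 / 16384        # 0.1593017578125
M2 = 2523 / 4096 * 128   # 78.84375
C1 = 3424 / 4096         # 0.8359375
C2 = 2413 / 4096 * 32    # 18.8515625
C3 = 2392 / 4096 * 32    # 18.6875

def pq_eotf(signal: float) -> float:
    """Map a normalized PQ signal value to absolute luminance in nits."""
    e = signal ** (1 / M2)
    return 10000.0 * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1 / M1)

# A 10-bit code value n in [0, 1023] is normalized as signal = n / 1023
print(round(pq_eotf(1.0)))  # -> 10000 (full-scale signal hits the 10,000-nit peak)
```

Note how steeply the curve rises: most of the code values are spent on the darker end, where the eye is most sensitive to relative luminance, which is exactly the property the earlier slide on human vision motivates.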

Page 14: The Next-Gen Technologies Driving Immersion

The time is right for HDR10
Technologies and ecosystem are now aligning

Ecosystem drivers:
• Device availability
• Software support
• Content creation and deployment

Technology advancements:
• Multimedia technologies
• Display and camera technologies
• Power and thermal efficiency

Page 15: The Next-Gen Technologies Driving Immersion

Qualcomm® Snapdragon™ 835 processor is ready for ULTRA HD PREMIUM certification
Enjoy vibrant HDR10 content, such as movies and TV shows from Netflix, Amazon, and others, on a variety of screens

Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.

ULTRA HD PREMIUM certification is a device-level certification. Each Snapdragon device must be certified.

Page 16: The Next-Gen Technologies Driving Immersion

A history of multimedia technology leadership

• 2013: Snapdragon 800, first with 4K (H.264) capture and playback
• Snapdragon 805, first with 4K playback with HEVC (H.265)
• Snapdragon 810, first with 4K capture and playback with HEVC
• Snapdragon 820, first with 4K playback @ 60 fps
• 2017: Snapdragon 835, first with HDR10, ULTRA HD PREMIUM-ready

Page 17: The Next-Gen Technologies Driving Immersion

A heterogeneous computing approach is needed for HDR
Efficient processing by running the appropriate task on the appropriate engine

Adreno 540 GPU visual processing:
• HEVC Main 10 video profile support with metadata processing
• Accurate color gamut and tone mapping
• Efficient rendering of HDR effects for games with DX12 and Vulkan
• Precise blending of mixed-tone (HDR and SDR) layers
• Native 10-bit color BT.2020 support over HDMI, DP, and DSI displays

Qualcomm Spectra 180 ISP:
• 14-bit processing pipeline to support the latest camera sensors
• Video and snapshot HDR processing with local tone mapping

Hexagon 682 DSP and Kryo 280 CPU:
• Multicore CPU: camera, video, and graphics application processing
• DSP + HVX for accelerated multimedia post-processing

[Snapdragon 835 block diagram, not to scale: Snapdragon X16 LTE modem, Adreno 540 Graphics Processing Unit (GPU), Wi-Fi, Hexagon 682 DSP with HVX, Qualcomm Spectra 180, Qualcomm All-Ways Aware, Qualcomm Aqstic, Kryo 280 CPU, Qualcomm IZat Location, Qualcomm Haven, Display Processing Unit (DPU), Video Processing Unit (VPU)]

Snapdragon, Qualcomm Adreno, Qualcomm Hexagon, Qualcomm All-Ways Aware, Qualcomm Spectra, Qualcomm Aqstic, Qualcomm Kryo, Qualcomm IZat, and Qualcomm Haven are products of Qualcomm Technologies, Inc.

Page 18: The Next-Gen Technologies Driving Immersion

Scene-based audio for enhanced sound quality

3D audio and positional audio through Higher Order Ambisonics (HOA)

Page 19: The Next-Gen Technologies Driving Immersion

True-to-life sound is critical to immersive experiences

The sounds and visuals must match; our hearing perceives the depth, direction, and magnitude of sound sources

• Sound sources are all around us
• Sound waves merge and reflect
• There is a distinct sound pressure value at every point in the 3D scene

Page 22: The Next-Gen Technologies Driving Immersion

Scene-based audio captures the entire 3D audio scene

Higher Order Ambisonics (HOA) coefficients are the key

• Spherical harmonic-based transforms convert the 3D sound pressure field into a compact and comprehensive representation: the HOA coefficients
• The HOA format is well suited to compression; spatial encoding compresses the HOA coefficients
• Once calculated, the HOA coefficients are decoupled from the capture and playback devices
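A key property of this representation is that its size depends only on the ambisonic order, not on how many sources are in the scene. An order-N representation carries one signal per spherical harmonic up to order N, i.e. (N+1)² signals; this formula is standard ambisonics, not specific to this deck. A quick sketch:

```python
def hoa_channels(order: int) -> int:
    """Number of HOA coefficient signals for a given ambisonic order.

    One signal per spherical harmonic up to that order: (N+1)^2.
    """
    return (order + 1) ** 2

for order in range(1, 5):
    print(f"order {order}: {hoa_channels(order)} coefficient signals")
# order 1 -> 4 signals (W, X, Y, Z); order 4 -> 25 signals,
# consistent with the 32-mic spherical array for 4th order
# and first-order capture on smartphones mentioned later in the deck
```

Contrast this with object-based audio on the next slide, where complexity grows with the number of objects.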

Page 23: The Next-Gen Technologies Driving Immersion

Object-based audio for the 3D audio scene
Faces issues with scaling and requires post-processing on capture

• Audio is associated with each object in the scene
• The audio of each object and its corresponding position needs to be determined through post-processing
• The complexity and bandwidth requirements increase with the number of objects in the scene
• As a result, typical usage is a combination of object- and channel-based audio

Page 24: The Next-Gen Technologies Driving Immersion

Channel-based audio for the 3D audio scene
A legacy format with a number of issues

• Mics are placed subjectively, in different positions depending on the audio engineer
• In post-processing, the sound mix is subjectively created and may bear no resemblance to the original audio scene
• A variety of formats need to be created, transmitted, and stored, such as 2.0, 5.1, 7.1.4, and 22.2
• Playback does not adjust for an incorrect speaker layout

Page 25: The Next-Gen Technologies Driving Immersion

Scene-based audio is a new paradigm for 3D audio
Providing key benefits and solving the major challenges of existing audio formats

Efficient:
• Reduced bandwidth and file size
• Rendering complexity is independent of scene complexity
• A single format
• Scalable layering
• Power efficient: high quality per MIPS

High fidelity:
• Higher order ambisonics
• The perfect representation of the 3D audio scene
• High resolution and an increased sweet spot

Comprehensive:
• Simple, real-time capture
• Flexible rendering
• Seamless integration into audio workflows and applications
• Advanced effects for interactivity

MIPS = millions of instructions per second

Page 26: The Next-Gen Technologies Driving Immersion

Simple real-time capture and flexible rendering
HOA coefficients are decoupled from the capture and playback

Simple real-time capture:
• Spatially separated microphones are required
• Ideally, a spherical mic array with 32 mics for 4th-order HOA coefficients
• Spot mics can be added
• A smartphone with 3 mics offers 1st-order HOA coefficients
• Captures the entire 3D audio scene
• Generates a single, compact file
• Great for live content (sports, user-generated content, etc.) and post-production (movies, etc.)

Flexible rendering:
• Audio is rendered at the playback location based on the number and location of the speakers
• Recreates the best possible reproduction of the original sound scene
• Supports any channel format: 2.0, 5.1, 7.1.4, 22.2, binaural, etc.
• Uniform experience across devices and playback locations (theater, home, mobile devices, etc.)

Page 27: The Next-Gen Technologies Driving Immersion

3D positional audio is essential for VR and AR
Accurate 3D surround sound based on your head's position relative to various sound sources

• Sound arrives at each ear at the accurate time and with the correct intensity
• The HRTF (head-related transfer function) takes into account typical human facial and body characteristics, like the location, shape, and size of the ears, and is a function of frequency and three spatial variables
• Sound at the ears needs to be adjusted dynamically and appropriately as your head and the sound sources move; this is the heart of the VR and AR experience
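One reason HOA pairs so well with head tracking is that rotating the entire sound field is just a small linear transform on the coefficients, regardless of how many sources the scene contains. A minimal sketch for first-order ambisonics (B-format W, X, Y, Z): a yaw rotation leaves W and Z untouched and mixes only the horizontal components. The sign convention here is illustrative; real renderers follow a specific channel ordering and normalization such as ACN/SN3D.

```python
import math

def rotate_foa_yaw(w, x, y, z, yaw_rad):
    """Rotate a first-order ambisonic frame about the vertical axis.

    W (omnidirectional) and Z (vertical) are unchanged by yaw; the
    horizontal components X (front) and Y (left) mix via a 2D rotation.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return w, c * x - s * y, s * x + c * y, z

# A source dead ahead (energy only in W and X), with the scene
# rotated 90 degrees: energy moves from X into Y (to the side).
w, x, y, z = rotate_foa_yaw(1.0, 1.0, 0.0, 0.0, math.pi / 2)
```

Higher orders work the same way with larger per-order rotation matrices, which is what makes head-pose updates computationally cheap, as the next slide's "efficient" bullet notes.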

Page 28: The Next-Gen Technologies Driving Immersion

Scene-based audio is an ideal solution for VR and AR
A natural fit for capturing and playing back 3D positional audio

Capture
• High fidelity: captures the entire 3D sound scene in high quality; video and audio captured on the same device
• Real-time and simple: works on a variety of devices (action camera, smartphone, etc.); no post-production required, but scene-based effects are easy to apply; great for live events like sports and user-generated content; compact file

Playback
• Immersive: high-fidelity 3D surround sound adjusts based on head pose; 3-DOF and 6-DOF support; a natural way to guide a user's attention
• Efficient: accurate manipulation of the sound field; HOA coefficients are computationally efficient to rotate, stretch, or compress the audio scene

Sounds so accurate that they are true to life

Page 29: The Next-Gen Technologies Driving Immersion

Scene-based audio adoption is accelerating
The entire ecosystem needs to align

Advanced demonstrations:
• End-to-end workflow solutions
• Broadcast (TV)
• VR
• Immersive audio

Standards adoption:
• MPEG-H 3D Audio
• ATSC 3.0
• DVB is considering MPEG-H 3D Audio
• Device interoperability: DisplayPort, HDMI, etc.

Real deployments:
• YouTube is using first-order ambisonics for spatial audio
• The 2018 Winter Olympics in South Korea is using MPEG-H 3D Audio
• Various mics available for purchase

MPEG = Moving Picture Experts Group. ATSC = Advanced Television Systems Committee.

Learn more about our contribution to scene-based audio: https://www.qualcomm.com/scene-based-audio

Page 30: The Next-Gen Technologies Driving Immersion

Intuitive interactions

Adaptive, multi-modal user interfaces are the future

Page 31: The Next-Gen Technologies Driving Immersion

Adaptive, multimodal user interfaces
Speech recognition, eye tracking, and gesture recognition are becoming essential
Natural user interfaces for intuitive interactions

• Speech recognition: uses natural language processing
• Motion and gesture recognition: uses computer vision, motion sensors, or touch
• Face recognition: uses computer vision to recognize facial expressions
• Eye tracking: uses computer vision to measure the point of gaze
• Personalized interfaces: learn and know user preferences based on machine learning
• Bringing life to objects: efficient user interfaces for IoT

Page 32: The Next-Gen Technologies Driving Immersion

Voice is a natural way to interact with devices
A hands-free interface is necessary in certain situations

Designed to be:
• Intuitive
• Conversational
• Convenient
• Productive
• Personalized

Underlying technology:
• Voice activation
• Noise filtering, suppression, and cancellation
• Speech recognition
• Natural language processing
• Voice recognition / biometrics
• Deep learning

Page 33: The Next-Gen Technologies Driving Immersion

Eye tracking naturally detects our point of interest
Providing valuable information for interacting with our devices

Natural user interface:
• Gaze tracking and estimation to navigate within next-gen applications
• Fast and secure authentication through iris scan
• Applicable to VR HMDs, AR glasses, and smartphones

Improved visuals:
• Gaze tracking and estimation will be an input to new visual and auditory rendering techniques
• Foveated rendering of graphics and video enables a more immersive visual user experience
• Eye tracking, when combined with machine learning, will also personalize VR and AR experiences

Dynamic calibration:
• Each human face has a different inter-pupillary distance (IPD)
• HMDs can also move around on the face during use
• CV techniques will be used to dynamically and accurately account for IPD

Requirements for a robust solution:
• Tracking camera
• Eye tracker
• Gaze estimation
• Latency reduction
• System optimization
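Foveated rendering, mentioned above, spends full shading effort only where the gaze lands and drops resolution with distance from it. A toy sketch of the idea; the region radii and shading rates below are made-up illustration values, not parameters from this deck:

```python
import math

def shading_rate(px, py, gaze_x, gaze_y):
    """Pick a shading rate for a pixel based on distance from the gaze point.

    Returns the fraction of full shading resolution to use. The pixel
    thresholds and rates are illustrative only; a real system would work
    in visual angle (eccentricity) and tune regions per display and optics.
    """
    dist = math.hypot(px - gaze_x, py - gaze_y)
    if dist < 200:      # foveal region: full quality
        return 1.0
    elif dist < 500:    # near periphery: half resolution
        return 0.5
    else:               # far periphery: quarter resolution
        return 0.25

print(shading_rate(100, 100, 120, 110))   # near the gaze point -> 1.0
print(shading_rate(1800, 900, 120, 110))  # far periphery -> 0.25
```

This is also why the latency-reduction requirement above matters: if the gaze estimate lags, the low-resolution periphery lands where the user is actually looking.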

Page 34: The Next-Gen Technologies Driving Immersion

Gesture recognition for natural hand interactions
Interact with the UI like you would in the real world

Benefits:
• Intuitive interaction with a device without the need for accessories: grab, select, type, etc.
• A reconstructed hand with accurate movements increases the level of immersion for VR
• Increased productivity by using gestures where appropriate and having a predictive UI

Key technologies:
• Wide field-of-view camera
• Computer vision
• Machine learning

Pipeline:
• Detect: identify hands
• Track: follow key points on hands and fingers as they move
• Recognize: understand the meaning of the hand and finger gestures, even when occluded
• Act: take appropriate action based on the current and predicted gesture
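The detect/track/recognize/act loop above can be sketched as a simple per-frame pipeline. Every name here is a hypothetical placeholder, not a real SDK; a real system would run CV and ML models at each stage:

```python
# Hypothetical sketch of the four-stage gesture pipeline described above.

def detect(frame):
    """Detect: identify hand regions in the input frame (stubbed)."""
    return frame.get("hands", [])

def track(hands, history):
    """Track: follow key points across frames by carrying state forward."""
    history.append(hands)
    return history[-1]

def recognize(tracked):
    """Recognize: map tracked key points to a gesture label (stubbed)."""
    return "grab" if tracked else None

def act(gesture):
    """Act: trigger the UI action for the recognized gesture."""
    return {"grab": "pick_up_object"}.get(gesture, "idle")

history = []
frame = {"hands": [{"keypoints": [(0.4, 0.5)]}]}  # toy input frame
gesture = recognize(track(detect(frame), history))
print(act(gesture))  # -> pick_up_object
```

Keeping the stages separate like this is what lets the "act" step use both current and predicted gestures, as the slide notes.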

Page 35: The Next-Gen Technologies Driving Immersion

QTI is uniquely positioned to support superior immersive experiences

Custom designed SoCs and investments in the core immersive technologies

Page 36: The Next-Gen Technologies Driving Immersion

QTI is uniquely positioned to support immersive experiences
Providing efficient, comprehensive solutions within device constraints: development time, sleek form factor, power and thermal efficiency, and cost

Visual quality:
• Consistent, accurate color
• HDR video, photos, and playback
• High resolution and frame rate

Sound quality:
• Positional audio
• Noise removal
• True-to-life audio processing

Intuitive interactions:
• Multimodal natural UIs
• Intelligent, contextual interactions
• Responsive and smooth UIs

Immersive experiences, via Snapdragon™ solutions:
• Efficient heterogeneous computing architecture
• Custom-designed processing engines
• Comprehensive solutions across tiers

Commercialization, via ecosystem enablement:
• Snapdragon development platforms
• Ecosystem collaboration
• App developer tools

Page 37: The Next-Gen Technologies Driving Immersion

For more information, visit us at: www.qualcomm.com & www.qualcomm.com/blog

Nothing in these materials is an offer to sell any of the components or devices referenced herein.

©2016 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.

Qualcomm, Snapdragon, Adreno and Hexagon are trademarks of Qualcomm Incorporated, registered in the United States and other countries. Qualcomm Spectra, Kryo, Qualcomm Haven, IZat, Qualcomm All-Ways Aware and Qualcomm Aqstic are trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.

References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.

Thank you

Page 38: The Next-Gen Technologies Driving Immersion

Resources

• Websites
◦ Immersive experiences: https://www.qualcomm.com/Immersive
◦ Virtual reality: https://www.qualcomm.com/VR
◦ Augmented reality: https://www.qualcomm.com/AR
◦ Developers: https://developer.qualcomm.com
◦ Newsletter signup: http://www.qualcomm.com/mobile-computing-newsletter

• Presentations
◦ Immersive experiences: https://www.qualcomm.com/documents/immersive-experiences-presentation
◦ Virtual reality: https://www.qualcomm.com/documents/making-immersive-virtual-reality-possible-mobile
◦ Augmented reality: https://www.qualcomm.com/documents/mobile-future-augmented-reality

• Papers
◦ Virtual reality: https://www.qualcomm.com/documents/whitepaper-making-immersive-virtual-reality-possible-mobile
◦ Immersive experiences: https://www.qualcomm.com/documents/whitepaper-driving-new-era-immersive-experiences-qualcomm

• Videos
◦ Immersive experiences video: https://www.qualcomm.com/videos/immersive-experiences
◦ Immersive experiences webinar: https://www.qualcomm.com/videos/webinar-new-era-immersive-experiences-whats-next