WEBRTC WG Joint Meetings W3C TPAC 2020

W3C TPAC 2020 Joint Meetings
WEBRTC WG
October 13, 15, 16, 2020
8 AM - 9 AM Pacific Time

Chairs: Bernard Aboba, Harald Alvestrand, Jan-Ivar Bruaroey

Welcome!
● Welcome to the joint meetings of the W3C WebRTC WG at TPAC 2020!


Agenda for Joint Meeting Week
https://www.w3.org/2011/04/webrtc/wiki/TPAC_2020
https://www.w3.org/2020/10/TPAC/group-schedule.html

Tuesday, October 13, 2020 (15:00 - 16:00 UTC, 8 AM - 9 AM Pacific)
Joint Meeting of APA and WEBRTC WG
Zoom link: https://us02web.zoom.us/j/88497697633?pwd=WmJxRGxzRmlRUFNFVml1TTg0K2dDZz09

Thursday, October 15, 2020 (15:00 - 16:00 UTC, 8 AM - 9 AM Pacific)
Joint Meeting of PING and WEBRTC WG
Zoom link: https://us02web.zoom.us/j/83080493198?pwd=MXlKY1Fidzd4MG5EU2ZNdlRHdks4Zz09

Friday, October 16, 2020 (15:00 - 16:00 UTC, 8 AM - 9 AM Pacific)
Joint Meeting of MEIG and WEBRTC WG
Zoom link: https://us02web.zoom.us/j/88027802654?pwd=UDBnTUJ5SzI4S1VXNFRjMnNJUkxjdz09


W3C WG IPR Policy
● This group abides by the W3C Patent Policy: https://www.w3.org/Consortium/Patent-Policy/
● Only people and companies listed at https://www.w3.org/2004/01/pp-impl/47318/status are allowed to make substantive contributions to the WebRTC specs


Joint Meeting
APA and WEBRTC WG

October 13, 2020
8 AM - 9 AM Pacific Time




About this Meeting
● Meeting info:
○ https://www.w3.org/2011/04/webrtc/wiki/TPAC_2020
● Link to Slides has been published on WG wiki
● Scribe? IRC http://irc.w3.org/ Channel: #apa
● The meeting is being recorded.


Welcome!
● Welcome to the joint meeting of the W3C APA and WebRTC WGs at TPAC 2020!
● During this meeting, we will discuss issues relating to accessibility in realtime communications.


Agenda for Joint APA/WEBRTC WG Meeting

● WEBRTC WG Charter and Deliverables (Chairs + Dom, 10 minutes)
● W3C Machine Learning Workshop (Bernard + Dom, 5 minutes)
○ https://www.w3.org/2020/06/machine-learning-workshop/
● IETF accessibility initiatives (Bernard + Lorenzo, 15 minutes)
○ Real-Time Text (RTT) and WebRTC Data Channel
○ Human Language Negotiation
○ WebRTC Interoperability profile for the Video Relay Service
● RTC Accessibility User Requirements: https://w3c.github.io/apa/raur/ (Joshue, 30 minutes)


WebRTC WG Charter
● WEBRTC WG recently re-chartered through September 2022: https://w3c.github.io/webrtc-charter/webrtc-charter.html
● Out of scope:
○ The definition of the network protocols used to establish the connections between peers is out of scope for this group; in general, it is expected that protocol considerations will be handled in the IETF.
○ The definition of any new codecs for audio and video is out of scope.
● In scope:
○ API functions to explore device capabilities, e.g. camera, microphone, speakers,
○ API functions to capture media from local devices (e.g. camera and microphone, but also output devices such as a screen),
○ API functions for encoding and other processing of those media streams,
○ API functions for accessing the data in these media streams,
○ API functions for decoding and processing (including echo canceling, stream synchronization and a number of other functions) of those streams at the incoming end,
○ Delivery to the user of those media streams via local screens and audio output devices (partially covered with HTML5).


WebRTC WG Deliverables
● WebRTC 1.0 API: https://w3c.github.io/webrtc-pc/
● WebRTC-Stats: https://w3c.github.io/webrtc-stats/
● WebRTC-NV Use Cases: https://w3c.github.io/webrtc-nv-use-cases/
● WebRTC Extensions: https://w3c.github.io/webrtc-extensions/
● WebRTC-ICE: https://github.com/w3c/webrtc-ice
● WebRTC SVC: https://github.com/w3c/webrtc-svc
● Insertable Streams: https://github.com/w3c/webrtc-insertable-streams
● WebRTC Priority: https://w3c.github.io/webrtc-priority/
● WebRTC-DSCP: https://w3c.github.io/webrtc-dscp-exp/


WebRTC WG Deliverables (Capture)
● Media Capture Automation: https://w3c.github.io/mediacapture-automation/
● Media Capture and Streams: https://w3c.github.io/mediacapture-main/
● Media Capture Image: https://w3c.github.io/mediacapture-image/
● Audio Output: https://w3c.github.io/mediacapture-output/
● Screen Capture: https://w3c.github.io/mediacapture-screen-share/
● Media Recording: https://w3c.github.io/mediacapture-record/
● Content-Hints: https://w3c.github.io/mst-content-hint/

Machine Learning and Accessibility
● Machine Learning is increasingly used to address accessibility concerns. Examples:
○ Speech transcription.
○ Language translation.
○ Image recognition.
○ Image to text or speech.
○ Emotion analysis (from audio or video).
● W3C Machine Learning Workshop: https://www.w3.org/2020/06/machine-learning-workshop/
● There is work underway to enable machine learning algorithms to access raw media in an efficient way, including:
○ VideoTrackReader API (WebCodecs)
○ Insertable Streams proposal for raw media (WEBRTC WG)


Accessibility Work within the IETF ART Area
● T.140 over WebRTC Data Channel (MMUSIC): draft-holmberg-mmusic-t140-usage-data-channel (https://tools.ietf.org/html/draft-holmberg-mmusic-t140-usage-data-channel)
○ Enables Real-Time Text (RTT) to be sent and received over the WebRTC data channel using the WebRTC 1.0 API.
○ Compatible with RFC 8373 language negotiation.
● Language negotiation (SLIM): RFC 8373 (https://tools.ietf.org/html/rfc8373), draft-ietf-slim-use-cases (https://tools.ietf.org/html/draft-ietf-slim-use-cases)
○ Enables SDP negotiation of spoken, written and signed languages between parties.
○ Supports audio (spoken languages), video (signed and captioned languages), text (written languages).
● Interoperability profile of the Video Relay Service (RUM): https://tools.ietf.org/html/draft-ietf-rum-rue
○ Interoperability Profile for Relay User Equipment, referencing RTCWEB documents, including JSEP, Overview, RTP Usage, Security Architecture, Transports, RFC 7742 (Video requirements) and RFC 7874 (Audio requirements).
○ Open source implementation available.
○ History & Background: https://datatracker.ietf.org/meeting/105/materials/slides-105-rum-rum-history-background-00

T.140/WebRTC Gateway Specification
draft-ietf-mmusic-t140-usage-data-channel
● Enables Real-Time Text (RFC 4103) to be sent over the WebRTC data channel.
● Uses reliable, ordered transport.
● Compatible with existing implementations of the W3C WebRTC API.
● Compatible with negotiation of human language (RFC 8373).
● Requires a gateway between WebRTC data channel and RTT endpoints.
● Implementation in Janus (Lorenzo Miniero):
○ PR: https://github.com/meetecho/janus-gateway/pull/1898
○ Article: https://www.meetecho.com/blog/realtime-text-sip-and-webrtc/

T.140/WebRTC Gateway Implementation Experience
● Integrated in Janus SIP plugin as an experimental feature
○ Negotiates m=text on the SIP side
○ Negotiates m=application on the WebRTC side
● Both plain T.140 and T.140 over RED supported on the SIP side
○ In both cases, always T.140 on data channels
● Initial integration for T.140 data channel negotiation properties
○ dcmap line offered with hardcoded values
■ No support for any dcsa attributes, though
○ Subprotocol and label used for delivery on data channels
● Patch includes simple integration in demo UI as well
○ Basic UI to send local and display remote real-time text

T.140/WebRTC Gateway Implementation Experience
● A few limitations in the current effort
○ Only tested with TIPcon1 (very old open source Java application)
○ dcmap values are currently hardcoded, and ID is ignored
■ Note: may be hard to enforce in browsers in general?
○ No buffering currently performed on the send side
■ Neither in the Janus plugin (when receiving data from DC)
■ … nor in the browser/UI (when sending data on DC)
○ Behaviour in presence of packet loss not tested properly yet
■ RTCP support still missing for the SIP/RTT SSRC
● Ready for experimentation, though!
○ Would love to see this effort move forward
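The limitations above note that no send-side buffering is done yet. As a minimal sketch of what such buffering could look like in a browser UI, the hypothetical `RttSendBuffer` class below coalesces keystrokes into one data channel message per interval instead of one message per character. The 300 ms default interval is an illustrative assumption, not a value taken from the draft.

```javascript
// Hypothetical send-side buffer for real-time text over a data channel.
// Keystrokes accumulate in `pending` and are sent as one message when the
// interval elapses (or when flush() is called explicitly).
class RttSendBuffer {
  constructor(send, intervalMs = 300) { // 300 ms is an assumed value
    this.send = send;           // e.g. (text) => channel.send(text)
    this.intervalMs = intervalMs;
    this.pending = "";
    this.timer = null;
  }

  type(text) {
    this.pending += text;
    // Start a flush timer on the first keystroke of a new batch.
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  flush() {
    clearTimeout(this.timer);
    this.timer = null;
    if (this.pending) {
      this.send(this.pending);
      this.pending = "";
    }
  }
}
```

In a browser, `send` could wrap `channel.send` for an `RTCDataChannel` created with `pc.createDataChannel(label, {ordered: true})`, which keeps the reliable, ordered transport the draft calls for; the label and subprotocol values would come from the negotiation described above.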


Negotiating Human Language (RFC 8373)
● Supports negotiation of the language used to send and receive for each media component.
○ Enables user language preferences to be described in the Session Description Protocol (SDP).
○ “Language preferences” apply to all media, covering signed (e.g. ASE), spoken and written languages (RTT).
○ Language negotiation handled via signaling outside the WebRTC API.
■ SDP Language negotiation attributes not passed in JSEP.
● Examples:
○ Negotiation of sending and/or receiving American Sign Language within a video stream.
○ An Offer indicating a preference to write Spanish text and receive Spanish language audio, with English as a second choice for both modalities.
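Because the SDP language attributes are not passed through JSEP, an application that handles its own signaling would need to read them itself. The sketch below assumes the RFC 8373 media-level attributes `a=hlang-send` and `a=hlang-recv` each carry a space-separated list of language tags in preference order. The helper `parseHumanLanguages` is hypothetical, and the sample offer encodes the second example above (write Spanish text, receive Spanish audio, English as second choice).

```javascript
// Hypothetical helper: extract RFC 8373 hlang-send / hlang-recv values per
// media section. Session-level lines (before the first "m=") are ignored.
function parseHumanLanguages(sdp) {
  const sections = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // "m=audio 49250 RTP/AVP 0" -> media type "audio"
      sections.push({ media: line.slice(2).split(" ")[0], send: [], recv: [] });
      continue;
    }
    const current = sections[sections.length - 1];
    if (!current) continue;
    const match = /^a=hlang-(send|recv):(.+)$/.exec(line);
    if (match) current[match[1]] = match[2].trim().split(/\s+/);
  }
  return sections;
}

// Offer from the slide: receive Spanish audio, write Spanish text,
// English as the second choice for both.
const offer = [
  "v=0",
  "o=- 0 0 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 49250 RTP/AVP 0",
  "a=hlang-recv:es en",
  "m=text 45020 RTP/AVP 103",
  "a=hlang-send:es en",
].join("\r\n");
```

`parseHumanLanguages(offer)` yields one entry per media section, with the preference lists preserved in order, which is what a gateway or router would act on.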


Operational Model for RFC 8373
● Language preferences negotiated for each media can be used to:
○ Route the call (e.g. to a Spanish speaker or ASL interpreter).
○ Configure machine learning gateway services. Examples:
■ Text to speech or speech to text
■ Translation from one language to another
● Media usage is up to the participants. Example:
○ Participants negotiate sending/receiving ASE on video, sending/receiving English via text, as well as receiving English audio.
○ Whether the participants sign or text or speak is up to them (and can change dynamically). Example:
■ Participants can try signing, encounter video quality issues, then agree to switch to text.
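The first use above (routing) can be illustrated with a hypothetical sketch that picks the first negotiated language, in the caller's preference order, for which an agent or interpreter is available. The pool structure, agent ids, and availability flag are invented for illustration. `ase` is the language tag for American Sign Language.

```javascript
// Hypothetical routing sketch: walk the caller's preferences in order and
// return the first available agent that covers that language.
function routeByLanguage(preferences, pool) {
  for (const tag of preferences) {
    const agent = pool.find((a) => a.available && a.languages.includes(tag));
    if (agent) return { language: tag, agent: agent.id };
  }
  return null; // no human match: a deployment might use a translation gateway
}

// Illustrative pool (invented data).
const pool = [
  { id: "asl-interpreter-1", languages: ["ase", "en"], available: false },
  { id: "agent-es-2", languages: ["es"], available: true },
];
```

Here `routeByLanguage(["ase", "es"], pool)` falls through to Spanish because the ASL interpreter is busy. A `null` result is where the second use above, configuring a machine learning gateway, could apply.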

RTC Accessibility User Requirements (RAUR)
Updated draft overview - Oct 2020

There is now an updated FPWD available:

http://raw.githack.com/w3c/apa/AccessibleRTC/raur/index.html




New User Needs: Changes to RAUR

# Window anchoring and pinning
User Need 1: A deaf or hard of hearing user needs to anchor or pin certain windows in an RTC application so both a sign language interpreter and the person speaking (whose speech is being interpreted) are simultaneously visible.



# Window anchoring and pinning
REQ 1a: Provide the ability to anchor or pin specific windows so the user can associate the sign language interpreter with the correct speaker.
REQ 1b: Allow the use of flexible pinning of captions or other related content alternatives, including pinning to second screen devices.
REQ 1c: Ensure the source of any captions, transcriptions or other alternatives is clear to the user, even when second screen devices are used.



# Pause capture of 'on record' captioning in RTC
User Need 2: A deaf or hard of hearing user may need captioning of content in a meeting or presentation to be private.


# Pause capture of 'on record' captioning in RTC
REQ 2a: Ensure there is a host-operable toggle in the captioning service (whether human or automated) that facilitates going on and off record for the preserved transcript, but continues to provide captions meanwhile for 'off record' conversations.
REQ 2b: Ensure the toggle for saving recordings also applies to the saving of captions. There should be a mechanism by which both audio and captions can be paused or stopped, and both can be simultaneously restored for recording.
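REQ 2a and 2b can be read as one toggle that drives two things, the recording and the preserved transcript, while live captions continue. The minimal sketch below assumes the recorder exposes `pause()`/`resume()`, as `MediaRecorder` does, and that `display` is a hypothetical callback that shows captions to participants. The class itself is illustrative and not part of RAUR.

```javascript
// Hypothetical class sketching REQ 2a/2b: a single host-operated toggle.
class OnRecordToggle {
  constructor(recorder, display) {
    this.recorder = recorder; // anything with pause()/resume(), e.g. a MediaRecorder
    this.display = display;   // hypothetical callback that shows a caption live
    this.onRecord = true;
    this.transcript = [];     // the preserved ("on record") captions
  }

  setOnRecord(onRecord) {
    if (onRecord === this.onRecord) return;
    this.onRecord = onRecord;
    // Audio recording and caption saving change state together (REQ 2b).
    if (onRecord) this.recorder.resume();
    else this.recorder.pause();
  }

  caption(text) {
    this.display(text); // captions continue while off record (REQ 2a)
    if (this.onRecord) this.transcript.push(text);
  }
}
```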



# Accessibility user preferences and profiles
User Need 3: A user may need to change device or environment and have their accessibility user preferences preserved.



# Accessibility user preferences and profiles
REQ 3a: Ensure user profiles and accessibility preferences in RTC applications are mobile and can move with the user as they change device or environment.


New Requirements:
# Emergency calls and RTT

REQ 11b: Avoid the problem of unsent emergency messages. A user may not be aware when they have not successfully sent an emergency message. For example, RTT avoids this problem due to instantaneous data transfer, but this may be an issue for other messaging platforms.

REQ 12b: Provide support for other languages and translations. For example, VRS calls may be made between ASL (American Sign Language) users and hearing persons speaking either English or Spanish. Signing itself also varies, as with Irish Sign Language (ISL, more closely related to French Sign Language) and British Sign Language (BSL), and a user may need to stream both or pin them.


Other changes

# Note on the relationship between RTC and XR Accessibility User Needs.
# Note on work on personalisation semantics and CSS media queries.
# Moved User Need 19 ('A deaf user watching a signed broadcast needs a high-quality frame rate to maintain legibility and clarity in order to understand what is being signed') to the 'Quality of service issues' section.

# Added note on ITU definition of Total Conversation services.

REQ 10a: Ensure support for multiple simultaneous streams


Conclusions

● Review and feedback requested from WebRTC group to APA.


Thank you

Special thanks to:

WG Participants, Editors & Chairs


Joint Meeting
PING and WEBRTC WG






About this Meeting
● Meeting info:
○ https://www.w3.org/2011/04/webrtc/wiki/TPAC_2020
● Link to Slides has been published on WG wiki
● Scribe? IRC http://irc.w3.org/ Channel: #webrtc
● Do we want to record this meeting?



Welcome!
● Welcome to the joint meeting of the W3C PING and WebRTC WGs at TPAC 2020!
● During this meeting, we hope to make progress on WebRTC privacy issues.


Agenda for Joint PING/WEBRTC WG Meeting

1. State of privacy in Media Capture and Streams (Jan-Ivar)
2. Open privacy issues (Jan-Ivar)
3. State of privacy in Audio Output Devices API (Jan-Ivar)
4. Media Capture Extensions: In-Browser Cam/Mic Picker (Jan-Ivar)
5. State of privacy in WebRTC-PC / Stats / SVC (Jan-Ivar)


State of privacy in Media Capture and Streams
Thanks to PING for reviewing these APIs!

await navigator.mediaDevices.enumerateDevices() // device enumeration
await navigator.mediaDevices.getUserMedia({video: true, audio: true}) // camera/mic access
navigator.mediaDevices.ondevicechange = func // detect device addition/removal

12 issues were filed (4 open, 8 closed). 7 PRs were merged from review.

What follows is an overview of the state of these APIs, before diving into open issues.


https://github.com/w3c/mediacapture-main/issues?q=is%3Aissue+label%3Aprivacy-tracker

https://github.com/w3c/mediacapture-main/pulls?q=label%3Aprivacy-pr+

await navigator.mediaDevices.enumerateDevices()

Drive-by web in iframe (without allow="camera" or allow="microphone"):
[]

Drive-by web with focus: 2 bits - whether user has 0 cameras or 0 microphones*
[{kind: "videoinput"}, {kind: "audioinput"}] // all other members are "". length is max 2.

A site with persisted permission to use camera or microphone: same (2 bits)*
[{kind: "videoinput"}, {kind: "audioinput"}] // all other members are "". length is max 2.

Camera and/or microphone captured in the current document: full list w/deviceIds & labels
[{kind: "videoinput", deviceId: [origin-specific id], label: "FaceTime HD Camera", groupId: [rotated id]}, {kind: "audioinput", deviceId: [origin-specific id], label: "MacBook Pro Microphone", groupId: [rotated id]}, ...more] // lets site implement device selection

* Implemented in Safari. In development in Chrome & Firefox. Principle: “Trackers fear device light”.
We’ve deprecated the enumerate-first web strategy in favor of device-first. Some breakage expected.
https://w3c.github.io/mediacapture-main/#access-control-model

State of device enumeration
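The exposure levels above can be summarized from the site's point of view. In this minimal sketch, the hypothetical helper reads an `enumerateDevices()` result and reports the two bits available before capture. It also reports whether labels are visible, which per the list above only happens while the document is capturing or has just captured.

```javascript
// Hypothetical helper: what a site can learn from an enumerateDevices() result.
function summarizeDevices(devices) {
  return {
    hasCamera: devices.some((d) => d.kind === "videoinput"),
    hasMicrophone: devices.some((d) => d.kind === "audioinput"),
    // Before capture every non-kind member is "", so no label shows up.
    labelsExposed: devices.some((d) => d.label !== ""),
  };
}
```

In a page this would be called as `summarizeDevices(await navigator.mediaDevices.enumerateDevices())`. In an iframe without `allow="camera"` or `allow="microphone"` the list is empty, so every field is `false`.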


State of devicechange event

navigator.mediaDevices.ondevicechange = func

Drive-by web in iframe (without allow="camera" or allow="microphone"): never fired

Drive-by web with focus: fired on 0→1 and 1→0 transitions in number of cameras, ditto mics*

A site with persisted permission to use camera or microphone: same (0→1 and 1→0)*

Camera and/or microphone captured in the current document: fired on any change to full list

* Implemented in Safari. In development in Chrome & Firefox.

TL;DR: event fires on changes observable at the different enumerateDevices exposure levels.
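At the drive-by and persisted-permission levels, a `devicechange` handler can only observe a kind of device appearing or disappearing. This minimal sketch uses a hypothetical `kindTransitions` helper to compare two `enumerateDevices()` results. The browser wiring is shown in comments because it needs a real page.

```javascript
// Hypothetical helper: report 0->1 and 1->0 transitions per device kind.
function kindTransitions(before, after) {
  const has = (list, kind) => list.some((d) => d.kind === kind);
  const changes = [];
  for (const kind of ["videoinput", "audioinput"]) {
    if (!has(before, kind) && has(after, kind)) changes.push(`${kind} added`);
    if (has(before, kind) && !has(after, kind)) changes.push(`${kind} removed`);
  }
  return changes;
}

// Browser usage (not run here):
// let last = await navigator.mediaDevices.enumerateDevices();
// navigator.mediaDevices.ondevicechange = async () => {
//   const now = await navigator.mediaDevices.enumerateDevices();
//   console.log(kindTransitions(last, now));
//   last = now;
// };
```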

State of camera and microphone capture (permission)

await navigator.mediaDevices.getUserMedia({video: true, audio: true})

Drive-by web in iframe (without allow="camera" or allow="microphone"): NotAllowedError

Drive-by web with focus (top-level or iframe): Permission (& prompt) tied to top-level origin per standard permissions-policy rules

A site with focus & persisted permission to use cam(s) and mic(s): Permitted (no prompt)

In focus & camera and microphone capturing in the current document: Permitted (no prompt)

TL;DR: unchanged
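A site might handle the outcomes above as in the following sketch. `NotAllowedError` and `NotFoundError` are the standard `DOMException` names that `getUserMedia` rejects with. The function name, the result shape, and passing `mediaDevices` in as a parameter (so the logic can run outside a browser) are assumptions of this example.

```javascript
// Hypothetical wrapper: map getUserMedia outcomes to app-level results.
// `mediaDevices` is normally navigator.mediaDevices.
async function requestCamAndMic(mediaDevices) {
  try {
    const stream = await mediaDevices.getUserMedia({ video: true, audio: true });
    return { ok: true, stream };
  } catch (e) {
    switch (e.name) {
      case "NotAllowedError": // user denied, or iframe lacks allow="camera; microphone"
        return { ok: false, reason: "not-allowed" };
      case "NotFoundError": // no camera or no microphone available
        return { ok: false, reason: "no-device" };
      default:
        throw e; // anything else is unexpected here
    }
  }
}
```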

State of camera and microphone deviceId

Recap of intended access model: Good sites remember which camera the user is using (ditto mic):

const video = {deviceId: localStorage.cameraId}; // from last visit
const stream = await navigator.mediaDevices.getUserMedia({video});
const [track] = stream.getVideoTracks();
localStorage.cameraId = track.getSettings().deviceId; // in case new camera

In iframes: “the decision of whether or not the identifier is the same across documents, MUST follow the User Agent's partitioning rules for storage (such as localStorage)”

● No longer a UUID
● No longer in enumerateDevices() except during capture.
● Best practice: low entropy (not yet implemented)

https://html.spec.whatwg.org/multipage/webstorage.html#dom-localstorage

Recap: sites may triage undesired devices ahead of prompt using required constraints (min,exact):

const video = {height: {min: 1080}}; // 1080p or higher only
try {
  await navigator.mediaDevices.getUserMedia({video});
} catch (e) {
  if (e.name == "OverconstrainedError") {
    // No prompt. No 1080p or higher camera.
  }
}

But if there is a 1080p camera, this would prompt the user or turn the camera on (for at least 3 seconds)!
This is thought to be a sufficient deterrent. Tracking libraries won’t risk device light or prompt.

To be conservative, we’ve made required constraints opt-in (for other specs like imageCapture):

“The allowed required constraints for device selection contains the following constraint names: width, height, aspectRatio, frameRate, facingMode, resizeMode, sampleRate, sampleSize, echoCancellation, autoGainControl, noiseSuppression, latency, channelCount, deviceId, groupId.”

State of camera and microphone constraints probing
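The quoted rule can be expressed as a check on a constraints object. This minimal sketch assumes, following the spec's terminology, that a constraint counts as required when it uses `min`, `max` or `exact`. The helper name is hypothetical, and it only reports offending names; it does not model what a user agent does with them.

```javascript
// Constraint names allowed as required constraints for device selection,
// copied from the quoted spec text.
const ALLOWED_REQUIRED = new Set([
  "width", "height", "aspectRatio", "frameRate", "facingMode", "resizeMode",
  "sampleRate", "sampleSize", "echoCancellation", "autoGainControl",
  "noiseSuppression", "latency", "channelCount", "deviceId", "groupId",
]);

// Hypothetical helper: list required constraints that are not on the list.
function disallowedRequiredConstraints(trackConstraints) {
  const isRequired = (value) =>
    typeof value === "object" && value !== null &&
    ["min", "max", "exact"].some((key) => key in value);
  return Object.entries(trackConstraints)
    .filter(([name, value]) => name !== "advanced" && isRequired(value))
    .map(([name]) => name)
    .filter((name) => !ALLOWED_REQUIRED.has(name));
}
```

For example, `zoom` (from Media Capture Image) is not on the list, so `{zoom: {exact: 2}}` would be reported. A bare value such as `width: 1280` is an ideal, not a required, constraint.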


● #640 - Only reveal labels of devices user has given permission to
● #645 - enumerateDevices should only provide device info permission for granted device types
● #646 - Should enumerateDevices by default return an empty list?
● #672 - Deprecate inputDeviceInfo.getCapabilities() for better privacy

Open Privacy Issues


https://github.com/w3c/mediacapture-main/issues/640




Labels are bad for web compat and privacy, but will take time to get rid of.

Exposure significantly reduced now that a document must be capturing or have actively captured just now to see labels (persistent permission no longer sufficient).

Labels of non-granted devices are needed during capture to support sites implementing ⚙ device pickers in browsers that don’t grant all devices at once.

Long-term solution: in-browser picker for camera & mic (extension spec)

Short-term sub-issues:
1. Labels may contain private information. Encourage sanitization.
2. Clarify label is for display purposes; don’t rely on == model/manufacturer.

Propose: Close issue after short-term solved, and revisit with in-browser picker extension spec.

#640 - Only reveal labels of devices user has given permission to
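Short-term sub-issue 1 asks for label sanitization. One possible heuristic is to strip a leading possessive owner name, as in "Jan's AirPods". This is a hypothetical sketch, not taken from any spec or browser. In keeping with sub-issue 2, it treats the label purely as a display string.

```javascript
// Hypothetical sanitization heuristic: drop a leading "<name>'s " prefix
// (straight or curly apostrophe). Other labels are returned unchanged.
function sanitizeLabel(label) {
  return label.replace(/^[^\s'’]+['’]s\s+/u, "");
}
```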



Current spec allows enumeration of cams AND mics upon capturing either a camera OR microphone.

In theory it seems logical to further restrict this to allow
● enumerating cameras only if document is capturing/has captured camera
● enumerating microphones only if document is capturing/has captured mics

OTOH: Is successfully obtaining camera or microphone from the user perhaps sufficient to build a device picker for both?

Permission escalation example: Site X allows users to join web conferences with only microphone permission. Users expect to see camera choices in the site’s ⚙ options panel. Restriction may complicate ⚙ UX, so site X demands camera on entry instead.

Consensus: Restrict devices by granted types. This is what Chrome is implementing so breakage risk is probably low.

#645 - enumerateDevices should only give info for granted types
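Below is a simplified model of the consensus, not the spec's algorithm. During capture, full entries are exposed only for granted kinds, while a non-granted kind falls back to the pre-capture shape (one entry, all other members empty). The helper and its `grantedKinds` set are hypothetical. Speakers are left out because the audio output slides cover them.

```javascript
// Hypothetical model: restrict enumeration to granted device kinds.
function filterByGrantedKinds(devices, grantedKinds) {
  const placeholders = new Set();
  const result = [];
  for (const d of devices) {
    if (d.kind !== "videoinput" && d.kind !== "audioinput") continue; // speakers: separate rules
    if (grantedKinds.has(d.kind)) {
      result.push(d); // full entry: deviceId, label, groupId
    } else if (!placeholders.has(d.kind)) {
      placeholders.add(d.kind); // one bit: "at least one device of this kind"
      result.push({ kind: d.kind, deviceId: "", label: "", groupId: "" });
    }
  }
  return result;
}
```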



Not web compatible to return an empty list. Booleans enable cam/mic UX display.

Our thinking: allow user agents to fake camera and/or microphone if missing.

Side-effects (inherent from loss of information, regardless of approach):
1. Camera/mic related UX (buttons) always visible on sites that today hide them for users without camera and/or mic. Site only learns of absence when getUserMedia fails with NotFoundError. (Mild)

2. devicechange event will never fire when users plug in their first camera or mic. Site cannot take action on these events. (workflow issue?)

Consensus: Return booleans for cam/mic. The spec already allows user agents to fake devices (Safari has an option to expose fake devices). Proposal: add a note about this.

#646 - Should enumerateDevices by default return an empty list?



This API helps sites enforce their constraints while building their pickers:

for (const device of await navigator.mediaDevices.enumerateDevices()) {
  if (device.kind != "videoinput") continue; // only cameras report height
  if (device.getCapabilities().height.max < 1080) continue;
  options.push({name: device.label});
}

● But it lets site enumerate capabilities (min/max ranges, enums) of all devices.
● Only available during capture
● One implementation

Long term: A constraints-based in-browser picker would obsolete this need.
Side-effect of losing: User would be able to pick device violating site constraints.
No consensus. Feature at risk (1 implementation). Revisit w/in-browser picker

#672 - Deprecate inputDeviceInfo.getCapabilities(); better privacy



State of privacy in Audio Output Devices API
Early spec followed an enumerate-first model with similar problems.

await navigator.mediaDevices.enumerateDevices() // enumerate speakers
await audioElement.setSinkId(deviceId) // select speaker device

Implemented only in Chrome/ium behind microphone permission, which limits applications to web conferencing today (asking for mic to use speakers is escalation).

We got rid of this 3rd bit in enumerateDevices (before capture):
[{kind: "videoinput"}, {kind: "audioinput"}, {kind: "audiooutput"}]

Latest API is an in-browser picker API (no implementation yet):
await navigator.mediaDevices.selectAudioOutput() // in-browser picker


Latest API:

const id = await navigator.mediaDevices.selectAudioOutput(); // picker
await audioElement.setSinkId(id); // redirect audio from default speakers

● Single id exposed in session in enumerateDevices() only after user picks.
● Works without microphone permission; redirect audio from any source.
● Off in iframes by default. Needs allow="speaker-selection"
● Firefox plans to implement soon. Thanks to Safari for driving design!

State of speaker selection (in-browser picker)


Sites still need a way to remember device to not prompt every time (if user permits), but must call selectAudioOutput again to validate the id:

const deviceId = localStorage.speakerId; // from last visit
const id = await navigator.mediaDevices.selectAudioOutput({deviceId});
await audioElement.setSinkId(id);
localStorage.speakerId = id; // store id for next time (might be new)

If the stored id is accepted, the picker is skipped. But the user agent may show the picker at times (e.g. if the speaker device is no longer available), deterring trackers.

The id only appears in enumerateDevices if call succeeds.

State of speaker selection (choice persistence)


Some devices are both microphone and speakers (e.g. headsets, laptops), detectable by shared groupId in enumerateDevices().

Such speakers get exposed with mics during (& immediately after) mic capture:
[{kind: "videoinput", ... }, {kind: "audioinput", deviceId: [origin-specific id], label: "AirPods", groupId: "17"}, {kind: "audiooutput", deviceId: [origin-specific id], label: "AirPods", groupId: "17"}, ...more] // lets site do headset detection

...but not before (no need):
[{kind: "videoinput"}, {kind: "audioinput"}] // all other members are "". length is max 2.

This is to allow headset detection & full duplex audio (I/O on same device).

The spec is narrower than Chrome, which at the moment exposes all speaker devices on its (global) microphone permission. That behavior passed the old spec, but not the new one.

Implicit microphone permission (headset detection)
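Headset detection from the shared `groupId` can be sketched as follows. The helper is hypothetical, and it assumes the during-capture exposure level shown above. Before capture `groupId` is empty, so nothing matches.

```javascript
// Hypothetical helper: groupIds shared by an audioinput and an audiooutput,
// i.e. likely one physical device usable for full duplex audio.
function findHeadsetGroups(devices) {
  const inputGroups = new Set(
    devices.filter((d) => d.kind === "audioinput" && d.groupId).map((d) => d.groupId));
  const headsets = devices
    .filter((d) => d.kind === "audiooutput" && inputGroups.has(d.groupId))
    .map((d) => d.groupId);
  return [...new Set(headsets)];
}
```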


Extension spec: In-Browser Device Picker
Long term we want to get away from in-content device selection

PING wants a privacy-by-default in-browser device picker:
1. site asks for category (or categories) of device
2. browser prompts user for one, many or all devices
3. site gains access to only the device + label, of hardware the user selects.


https://github.com/w3c/mediacapture-main/issues/640#issuecomment-549540203

In-Browser Cam/Mic Picker
Why not selectCamera() and selectMicrophone()? It’s complicated:

● Web apps want constraints on camera selection (e.g. resolution)
● Web apps want some discovery (emerging use cases, streamers use 2 cams, WebVR)
● Users want sites to remember their configuration(s) and not pick device every time
● User agents differentiate in permission models (persistent on/off vs one-shot, innovation)
● What would the migration path be?
○ getUserMedia (unlike setSinkId) is already implemented in all browsers
○ Which sites will upgrade to prefer a less powerful / less established API?

Current Goal:
1. Get rid of labels & capabilities of non-captured devices (consensus)
2. Prevent user agents from granting permission to all cameras/mics (no consensus)
3. Limit capabilities exposure of in-use cam/mic (“availability API”) (no consensus)

Incremental instead of new API

In-Browser Camera/Mic Picker Model (goals)
[Figure: three API models, ordered from most powerful to least powerful along the migration path]
● Device enumeration API (most powerful): getUserMedia() + enumerateDevices(). During capture: all labels, all deviceIds, all capabilities. Implemented in all browsers.
● TAG/PING-definition picker-style API: no “selectCamera()”, no “selectMicrophone()”. During capture: select deviceIds, select capabilities. No consensus on cam/mic; wouldn’t get rid of getUserMedia or labels. May revisit.
● Label-less device picker-style API (least powerful): getUserMedia()++ and enumerateDevices()--. During capture: all deviceIds, select capabilities. No implementation.

getUserMedia already has a picker in Firefox (tied to permission), letting the user, instead of the user agent, choose within the app’s constraints when there is more than one choice.

[Screenshot: Meet.com]

← Apps could have just called getUserMedia() again to get a different camera, but web compat prevents showing a prompt then, because lazy sites expect the same result (no prompt).

Incremental API?

Solution: Migrate to new getUserMedia semantics over time:

< await navigator.mediaDevices.getUserMedia({video: true, semantics: "browser-chooses"});
> await navigator.mediaDevices.getUserMedia({video: true, semantics: "user-chooses"});

New semantics mandate a picker if app constraints don’t narrow down selection to 1 device per kind (where user agent normally would choose). Orthogonal to permission.

Migration strategy:
1. Browsers implement pickers for "user-chooses" where the user agent chooses today.
2. Allow sites time to replace in-content pickers in their ⚙ panel with browser pickers.
3. Remove all labels from enumerateDevices(). Deprecate device.getCapabilities().
4. (Maybe) flip default

Criticism / feature (for users w/multiple cams/mics): Flipping the default would mean they see a picker even initially, instead of the browser picking the OS default device for them. On sites without device selection, they’d be prompted every time (an improvement over getting the wrong device).

Incremental API (getUserMedia++)

2023: No more labels!
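Under the proposed semantics, a picker is mandated when the app's constraints don't narrow the choice to one device per kind. That rule can be modeled in a few lines. This is a hypothetical sketch: `semantics` is only a proposal, the `satisfies` callback stands in for the browser's own constraint matching, and the `maxHeight` field in the usage data is invented.

```javascript
// Hypothetical model of the "user-chooses" rule: more than one candidate
// device of the requested kind means the browser shows a picker.
function needsPicker(devices, kind, satisfies) {
  const candidates = devices.filter((d) => d.kind === kind && satisfies(d));
  return candidates.length > 1;
}

// Illustrative data: two cameras, one of which meets a 1080p requirement.
const cams = [
  { kind: "videoinput", maxHeight: 720 },
  { kind: "videoinput", maxHeight: 1080 },
  { kind: "audioinput" },
];
```

With a 1080p requirement only one camera qualifies, so no picker is needed. Without constraints both cameras qualify, and the user rather than the user agent would choose.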

In-Browser Camera/Mic Picker Model (goals)
[Figure: the remaining API models along the migration path, least powerful last]
● TAG/PING-definition picker-style API: no “selectCamera()”, no “selectMicrophone()”. During capture: select deviceIds, select capabilities. No consensus on cam/mic; wouldn’t get rid of getUserMedia or labels. May revisit.
● Label-less device picker-style API (least powerful): getUserMedia()++ and enumerateDevices()--. During capture: all deviceIds, select capabilities. Someday implemented in all browsers.

State of privacy in WebRTC-PC / Stats / SVC
Thanks to PING for reviewing these APIs!

RTCRtpSender.getCapabilities("audio");
RTCRtpSender.getCapabilities("video");
RTCRtpReceiver.getCapabilities("audio");
RTCRtpReceiver.getCapabilities("video");

const pc = new RTCPeerConnection();
const offer = await pc.createOffer();
const stats = await pc.getStats();

3 issues were filed (open):
● #2460 - getCapabilities seems to leak hardware capabilities w/o permission
● #22 - getCapabilities seems to leak hardware capabilities w/o permission
● #550 - Stats API should require additional permission


https://github.com/w3c/webrtc-pc/issues/2460

https://github.com/w3c/webrtc-svc/issues/22

https://github.com/w3c/webrtc-stats/issues/550

A site can learn about the visitor’s underlying hardware capabilities w/o a permission prompt or some other positive, affirmative action by the visitor.

Most of the same information is available in the SDP offer from pc.createOffer(), which inherently needs to be signaled by JS to form a peer-to-peer connection, as described in JSEP (IETF).

Use cases:
● Data channels
● Receive media
● Send media other than cam/mic/screen, e.g. canvas/elem.captureStream()

#2460/#22 - getCapabilities leaks hardware capabil w/o permission
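To make the SDP argument concrete, here is a minimal sketch that lists the codec names carried in an offer's `a=rtpmap` lines. Comparing them with the `mimeType` values from `RTCRtpSender.getCapabilities("video").codecs` would be expected to show largely the same codecs. The helper name is hypothetical and the sample SDP in the check is abbreviated.

```javascript
// Hypothetical helper: collect codec names from a=rtpmap lines, e.g.
// "a=rtpmap:96 VP8/90000" -> "VP8". Duplicates are dropped, order is kept.
function codecsInSdp(sdp) {
  const names = new Set();
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=rtpmap:\d+ ([^/]+)\//.exec(line);
    if (match) names.add(match[1]);
  }
  return [...names];
}

// Browser usage (not run here):
// const pc = new RTCPeerConnection();
// pc.addTransceiver("video");
// const { sdp } = await pc.createOffer();
// console.log(codecsInSdp(sdp));
// console.log(RTCRtpSender.getCapabilities("video").codecs.map((c) => c.mimeType));
```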


https://tools.ietf.org/html/draft-ietf-rtcweb-jsep-25#section-4.1.6




Conclusion from Graphics Hardware Fingerprinting document linked in issue:
● “Information relating to graphics hardware capabilities provided by [WEBRTC], [WebRTC-Stats], [WebRTC-SVC] ... may also be inferred from other sources such as Web-GPU, Web-GL and performance API.”
● “...graphics hardware fingerprinting concerns are not WebRTC-specific. ...consider adding a permission relating to “whether the page is permitted to know what graphics hardware the user is running” (outside WebRTC)”

Proposed resolution is to include a note relating to implementation issues with hardware capabilities.

#2460/#22 - getCapabilities leaks hardware capabil w/o permission

https://w3c.github.io/webrtc-pc/webrtc.html#dom-rtcrtpsender-getcapabilities

https://w3c.github.io/webrtc-pc/webrtc.html#dom-rtcrtpsender-getcapabilities

https://docs.google.com/document/d/1XWutvNNXu3iXA-6ZpYpwY7KRwh87_6s97WBNIOs5-Kg/



#550 - Stats API should require additional permission (https://github.com/w3c/webrtc-stats/issues/550)

Two privacy harms / risks reported:
1. Leaking communication / plain text
   ○ A useful consideration for isolated streams (WebRTC-identity: https://w3c.github.io/webrtc-identity/identity.html#isolated-media-streams), but for regular streams, the web page already has access to the audio and text content.
2. Hardware fingerprinting (decoderImplementation & codec)
   ○ Similar to #2460 (covered in previous slide)
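The fingerprinting surface in #550 can be made concrete by reading the relevant fields out of a stats report. A sketch, assuming a map-like RTCStatsReport as returned by pc.getStats() (a plain Map stands in for it in testing); `decoderSummary` is a hypothetical helper, and browsers may omit or restrict fields such as decoderImplementation.

```javascript
// Summarize fingerprinting-relevant fields from a map-like RTCStatsReport:
// each value is a stats object with `id` and `type`.
function decoderSummary(report) {
  const summary = [];
  for (const stat of report.values()) {
    if (stat.type !== "inbound-rtp") continue;
    // codecId points at a separate "codec" stats object in the same report.
    const codec = stat.codecId ? report.get(stat.codecId) : undefined;
    summary.push({
      kind: stat.kind,
      decoderImplementation: stat.decoderImplementation ?? null,
      mimeType: codec ? codec.mimeType : null,
    });
  }
  return summary;
}
```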

Conclusions

● Conclusions and next steps here


Thank you!

Special thanks to:

PING, WG Participants, Editors & Chairs, Youenn Fablet

Bat by OpenClipart-Vectors from Pixabay
Crowd from PNG EGG

https://pixabay.com/vectors/bat-black-dracula-wings-spread-151366/

https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=151366

https://www.pngegg.com/en/png-bfmvw

Joint Meeting: MEIG and WEBRTC WG

MEIG and WebRTC WGs at TPAC 2020!
● During these meetings, we hope to make progress on the future of the capture and output specifications


About this Meeting
● Meeting info:
  ○ https://www.w3.org/2011/04/webrtc/wiki/TPAC_2020
● Link to Slides has been published on WG wiki
● Scribe? IRC http://irc.w3.org/ Channel: #me (http://irc.w3.org/?channels=me)
● The meeting is being recorded.

Agenda for Joint MEIG/WEBRTC WG Meeting

● WebRTC WG Charter and Deliverables
● Status of Capture and Output deliverables
● Machine Learning
● New work
  ○ WebCodecs (WICG)
  ○ Insertable Streams for Raw Media (WEBRTC WG)

WebRTC WG Charter
● WEBRTC WG recently re-chartered through September 2022: https://w3c.github.io/webrtc-charter/webrtc-charter.html
● Out of scope:
  ○ The definition of the network protocols used to establish the connections between peers is out of scope for this group; in general, it is expected that protocols considerations will be handled in the IETF.
  ○ The definition of any new codecs for audio and video is out of scope.
● In scope:
  ○ API functions to explore device capabilities, e.g. camera, microphone, speakers,
  ○ API functions to capture media from local devices (e.g. camera and microphone, but also output devices such as a screen),
  ○ API functions for encoding and other processing of those media streams,
  ○ API functions for accessing the data in these media streams,
  ○ API functions for decoding and processing (including echo canceling, stream synchronization and a number of other functions) of those streams at the incoming end,
  ○ Delivery to the user of those media streams via local screens and audio output devices (partially covered with HTML5).

WebRTC WG Deliverables: Networking
● WebRTC 1.0 API: https://w3c.github.io/webrtc-pc/
● WebRTC-Stats: https://w3c.github.io/webrtc-stats/
● WebRTC-NV Use Cases: https://w3c.github.io/webrtc-nv-use-cases/
● WebRTC Extensions: https://w3c.github.io/webrtc-extensions/
● WebRTC-ICE: https://github.com/w3c/webrtc-ice
● WebRTC SVC: https://github.com/w3c/webrtc-svc
● Insertable Streams: https://github.com/w3c/webrtc-insertable-streams
● WebRTC Priority: https://w3c.github.io/webrtc-priority/
● WebRTC-DSCP: https://w3c.github.io/webrtc-dscp-exp/

WebRTC WG Deliverables: Media Capture & Output
● Media Capture and Streams (recycled at CR): https://w3c.github.io/mediacapture-main/
● Media Capture Automation (for testing; just adopted): https://w3c.github.io/mediacapture-automation/
● Audio output devices API (CR): https://w3c.github.io/mediacapture-output/
● MediaCapture from DOM: https://w3c.github.io/mediacapture-fromelement/
● Screen Capture
● Media Capture Image: https://w3c.github.io/mediacapture-image/
● Media Recording: https://w3c.github.io/mediacapture-record/
● Content-Hints: https://w3c.github.io/mst-content-hint/
● Media Capture Depth: https://w3c.github.io/mediacapture-depth/
● Media Capture Extensions: https://github.com/w3c/mediacapture-extensions

State of Capture and Output Deliverables
● Most (all?) specifications have been implemented by at least one browser, several in multiple browsers.
● Several specifications have gone to CR (or are being recycled at CR).
● Other specifications have remained Working Drafts for many months or even years without advancing.
  ○ Is there enough energy to get to CR (let alone PR)?
  ○ WPT test coverage varies.
● Privacy is an ongoing concern.
  ○ “Browser picker” model for media capture under development (Jan-Ivar).
● Is there a “hidden pool” of individuals whom we could motivate to help with these specifications?
  ○ Or is this another example of the “Keebler Elf Theory of Software”?

MediaStream model: sources and sinks

[Diagram: MediaStreamTrack at the center. Sources: getUserMedia() (camera 📷, microphone 🎙), getDisplayMedia() (screen 🖥), canvas.captureStream(), element.captureStream(), WebAudio .createMediaStreamDestination(), RTCPeerConnection via RTCRtpReceiver.track. Sinks: ▶ media element (.srcObject, .setSinkId to 🎧), MediaRecorder (new MediaRecorder(stream).start()), WebAudio .createMediaStreamTrackSource(track), ImageCapture (new ImageCapture(track).takePhoto()), networking via RTCPeerConnection .addTrack() / RTCRtpSender .replaceTrack().]
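The source/sink model in the diagram maps onto a few lines of code: one MediaStream feeds a media element through srcObject and an RTCPeerConnection through addTrack(), and replaceTrack() swaps what an existing sender transmits. A minimal sketch with the element and connection injected so it can be exercised with stubs; `attachToSinks` and `switchTrack` are hypothetical names.

```javascript
// Wire one source (a MediaStream, e.g. from getUserMedia) to two sinks.
function attachToSinks(stream, { videoElement, peerConnection }) {
  // Sink 1: local playback through the element's srcObject.
  videoElement.srcObject = stream;
  // Sink 2: the network; addTrack() returns one RTCRtpSender per track.
  return stream.getTracks().map((track) => peerConnection.addTrack(track, stream));
}

// Swap the track an existing sender transmits (e.g. camera -> screen)
// without renegotiation. Returns the promise from replaceTrack().
function switchTrack(sender, newTrack) {
  return sender.replaceTrack(newTrack);
}
```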

State of Capture and Output Deliverables

mediacapture-main (https://github.com/w3c/mediacapture-main): getUserMedia, enumerateDevices, MediaStreamTrack
Camera & microphone. Final CR. 28 issues, mostly minor. ✅✅✅✅ All browsers. Work to reduce fingerprinting with an in-browser device picker moved to mediacapture-extensions.

mediacapture-output (https://github.com/w3c/mediacapture-output): setSinkId / selectAudioOutput
Speaker selection. CR from 2017. Work picked up in 2020 with a more private, picker-based model (selectAudioOutput) that also supports non-mic audio sources. 13 issues. ✅✅ Chromium (old API), ☐ Firefox in development (new API), ☐ Safari interest (new API).

mediacapture-fromelement (https://w3c.github.io/mediacapture-fromelement): canvas/element.captureStream()
WD from 2017. 22 minor issues. Mature. No recent activity. canvas.captureStream() ✅✅✅✅ All browsers. element.captureStream() ✅✅✅ most browsers, ❌ Safari.
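For mediacapture-output, the picker-based flow pairs selectAudioOutput() with setSinkId(). Since selectAudioOutput() was still being designed at the time, this sketch follows the shape it later took in the draft (a promise for a MediaDeviceInfo) and should be read as an assumption. `chooseSpeaker` is a hypothetical helper, with `mediaDevices` and the media element injected.

```javascript
// Let the browser's own picker choose a speaker, then route an element's
// audio to it. `mediaDevices` is navigator.mediaDevices in a page.
async function chooseSpeaker(mediaDevices, audioElement) {
  if (typeof mediaDevices.selectAudioOutput !== "function") {
    // Engines with only the old API need ids from enumerateDevices().
    throw new Error("selectAudioOutput() not supported");
  }
  const device = await mediaDevices.selectAudioOutput(); // MediaDeviceInfo
  await audioElement.setSinkId(device.deviceId);
  return device.deviceId;
}
```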

State of Capture and Output Deliverables

mediacapture-screen-share: getDisplayMedia
WD from 2019. Mature. ✅✅✅✅ All browsers. 20 issues, mostly minor. Recent 2020 interest in same-origin DOM capture (document.captureStream()) and privacy.

mediacapture-image (https://github.com/w3c/mediacapture-image): takePhoto, more camera constraints (brightness, whiteBalance, etc.)
WD from 2017. ✅✅ Chromium. ❌ Firefox. Work picked up in 2020 with pan, tilt & zoom constraints behind an extra permission. 16 issues. ☐ PTZ interest from Safari?

mediacapture-record (https://github.com/w3c/mediacapture-record): MediaRecorder
WD from 2017. Low activity. ✅✅✅ most browsers, ☐ Safari Tech Preview (pref?). 33 open issues, mostly around config/codec interop, adding/removing tracks, and seekable recordings.

mediacapture-extensions: new in-browser camera/mic picker, miscellaneous
Extensions repo for Rec updates to mediacapture-main (https://github.com/w3c/mediacapture-main).
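Many of the open MediaRecorder issues concern codec and container interop, so portable code usually checks MediaRecorder.isTypeSupported() before choosing a mimeType. The basic record-for-N-milliseconds pattern is sketched below. `recordFor` is a hypothetical helper, the recorder constructor is injected (window.MediaRecorder in a page), and "video/webm" is only an example default that not every browser supports.

```javascript
// Record `stream` for `ms` milliseconds and resolve with a single Blob.
function recordFor(stream, ms, MediaRecorderCtor, mimeType = "video/webm") {
  return new Promise((resolve, reject) => {
    const recorder = new MediaRecorderCtor(stream, { mimeType });
    const chunks = [];
    recorder.ondataavailable = (event) => {
      // Skip empty chunks, which some implementations emit.
      if (event.data && event.data.size > 0) chunks.push(event.data);
    };
    recorder.onerror = (event) => reject(event.error ?? event);
    // All data has been delivered by the time "stop" fires.
    recorder.onstop = () => resolve(new Blob(chunks, { type: mimeType }));
    recorder.start();
    setTimeout(() => recorder.stop(), ms);
  });
}
```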

Recent interest in new use cases and features

1. Raw media access
   Performant access to uncoded bytes of video tracks (MediaStreamTrack).
   ○ Use cases: filters, virtual backgrounds, machine learning
2. Capture HTML rendering
   Safely capture a same-origin-isolated document into a video track.
   ○ Use cases:
     ■ Let a page stream itself into a web conference (e.g. “Projecting Google Slides into a conference”)
     ■ Record a web conference
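As a concrete picture of what “access to uncoded bytes” enables, here is the simplest kind of filter: a grayscale pass over an RGBA buffer laid out like ImageData.data (4 bytes per pixel). A sketch only: running loops like this on every frame in JavaScript is exactly where the performance concerns behind these proposals come from. The BT.601 luma weights are one common choice, not something the slides specify, and `grayscaleInPlace` is a hypothetical name.

```javascript
// Convert an RGBA pixel buffer (as in ImageData.data) to grayscale in place
// using ITU-R BT.601 luma weights. Alpha is left untouched.
function grayscaleInPlace(rgba) {
  for (let i = 0; i < rgba.length; i += 4) {
    const luma = Math.round(0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]);
    rgba[i] = rgba[i + 1] = rgba[i + 2] = luma;
  }
  return rgba;
}
```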

Machine Learning
● Machine Learning is increasingly important in Media and Entertainment scenarios:
  ○ Background replacement (e.g. blurring, images, etc.)
  ○ Constructed environments (e.g. “together mode”, AR/VR, etc.)
  ○ Accessibility (transcription, translation, etc.)
● W3C Machine Learning Workshop: https://www.w3.org/2020/06/machine-learning-workshop/
● Efficient access to raw media is a prerequisite. Proposals include:
  ○ VideoTrackReader API (WebCodecs)
  ○ Insertable Streams for Raw Media (WEBRTC WG)


WebCodecs
● Currently being incubated in WICG: https://wicg.github.io/web-codecs/
● Provides access to raw media (via the VideoTrackReader interface)
● Provides low-level access for encoding and decoding of audio and video.
● Still in the early stages. Known issues/limitations:
  ○ M86 limitations (https://docs.google.com/document/d/10S-p3Ob5snRMjBqpBf5oWn6eYij1vos7cujHoOCCCAw/)
  ○ Decoupled from WHATWG Streams.
  ○ No support for advanced video (simulcast, SVC).
  ○ No support for content protection.
  ○ Performance optimizations (HW encode/decode, GPU, etc.)
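The encode half of WebCodecs can be sketched as below. The API was in flux at the time (VideoTrackReader was later dropped from the proposal), so this follows the later VideoEncoder shape: an init object with output and error callbacks, then configure(). `createVp8Encoder` is a hypothetical helper, the constructor is injected (globalThis.VideoEncoder where supported), and the bitrate and frame rate are arbitrary example values.

```javascript
// Create and configure a WebCodecs-style VP8 encoder.
function createVp8Encoder(VideoEncoderCtor, onChunk, { width = 640, height = 480 } = {}) {
  const encoder = new VideoEncoderCtor({
    // Receives each EncodedVideoChunk plus optional metadata.
    output: (chunk, metadata) => onChunk(chunk, metadata),
    error: (e) => console.error("VideoEncoder error:", e),
  });
  encoder.configure({ codec: "vp8", width, height, bitrate: 1_000_000, framerate: 30 });
  // Frames are then fed with encoder.encode(frame) and drained with encoder.flush().
  return encoder;
}
```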

Insertable Streams for Raw Media
● Open up the MediaStreamTrack
● Keep it fast
● Keep it simple

[Diagram: “RTC media flow in WebRTC 1.0”: within the web application, capture (getUserMedia) → pre-processing → encode → network adaptation → transport (PeerConnection) → network → transport → decode → display, with loss/delay adaptation feedback]

[Diagram: “Open up the MediaStreamTrack”: JavaScript inserted into the media and feedback paths (Breakout Box stage 1 -> 2)]

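Stage Two Track Processor

// detectFace() and drawMoustache() stand for app-provided helpers (hypothetical).
function addMoustache(videoFrame) {
  const facePosition = detectFace(videoFrame.data);
  return drawMoustache(videoFrame.data, facePosition);
}

// ProcessingMediaStreamTrack is the API proposed in this deck.
const processingTrack = new ProcessingMediaStreamTrack(videoTrack);
const transformer = new TransformStream({
  transform(videoFrame, controller) {
    videoFrame.modifyData(addMoustache(videoFrame));
    controller.enqueue(videoFrame); // pass the modified frame downstream
  }
});
processingTrack.readable
  .pipeThrough(transformer)
  .pipeTo(processingTrack.writable);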
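ProcessingMediaStreamTrack is a proposal, so the track-processor snippet cannot run today, but the streams plumbing it relies on can. Below, a ReadableStream of fake “frames” (plain objects with a numeric `data` array, standing in for video frames) is piped through a TransformStream into a WritableStream, the same readable.pipeThrough(transformer).pipeTo(writable) shape. A sketch: the frame objects, the brighten step, and the helper names are all illustrative.

```javascript
// Runs in browsers and Node 18+, where the WHATWG stream classes are globals.
function frameSource(frames) {
  return new ReadableStream({
    start(controller) {
      for (const frame of frames) controller.enqueue(frame);
      controller.close();
    },
  });
}

// Per-frame processing step: brighten every sample by `delta`, capped at 255.
function brightenTransform(delta) {
  return new TransformStream({
    transform(frame, controller) {
      controller.enqueue({ ...frame, data: frame.data.map((v) => Math.min(255, v + delta)) });
    },
  });
}

// Collect processed frames, standing in for a track's writable side.
function collectingSink(out) {
  return new WritableStream({ write(frame) { out.push(frame); } });
}

async function runPipeline(frames, delta) {
  const out = [];
  await frameSource(frames).pipeThrough(brightenTransform(delta)).pipeTo(collectingSink(out));
  return out;
}
```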

Break Apart the MediaStreamTrack

[Diagram: media and feedback paths exposed to JavaScript]
Breakout Box stage 3 - allows for generating and consuming tracks directly

Status and Next Steps
● Experimental implementation will be landing in Chrome 88
● Start of specification available
  ○ https://github.com/alvestrand/mediacapture-insertable-streams

Capture HTML rendering

Today (getDisplayMedia):
● Web surfaces may be captured by screen-sharing only if users pick them.
● Sharing them carries significant risks not understood by users (malicious sites may mount active attacks on the web’s same-origin policy).
● Sites are prevented from influencing users to choose these, to deter attacks.
● Behind elevated permission (browsers are supposed to warn of risks).

Ironically, sharing native apps is safer. This is unfortunate, since we’d like to promote web over native.

The UX flow is prohibitive for the “record this meeting” and “Present Google Doc” use cases.

https://blog.mozilla.org/webrtc/share-browser-windows-entire-screen-sites-trust/

https://en.wikipedia.org/wiki/Same-origin_policy

https://support.mozilla.org/en-US/kb/screenshare-safety
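For reference, the status-quo getDisplayMedia flow looks like this: the page requests a display surface and the browser's chooser decides what is shared. A sketch with `mediaDevices` and the peer connection injected; `shareScreen` is a hypothetical helper, and the contentHint value comes from the mst-content-hint draft.

```javascript
// Share a user-chosen screen, window or tab over an RTCPeerConnection.
async function shareScreen(mediaDevices, peerConnection) {
  // The browser shows its own chooser; the page cannot preselect a surface.
  const stream = await mediaDevices.getDisplayMedia({ video: true });
  const [track] = stream.getVideoTracks();
  // Hint that the content is text-heavy so encoders favor sharpness.
  if ("contentHint" in track) track.contentHint = "detail";
  const sender = peerConnection.addTrack(track, stream);
  // The track ends when the user stops sharing from the browser UI.
  track.addEventListener("ended", () => peerConnection.removeTrack(sender));
  return sender;
}
```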


Better integration: What if web pages stream themselves into a conference?

The page could use existing tech (RTCPeerConnection) to join an ongoing meeting and stream itself there, if it could capture itself.
● The document needs only to capture itself.
● To be secure, the document must be origin-isolated as a matter of policy.
● CORS only allows opt-in, which isn’t strong enough, since rendering a document from another origin is different from reading it.
● A new policy is needed, e.g. Cross-Origin-Embedder-Policy: disallow
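Today's closest building block is cross-origin isolation via response headers. It is not the stricter policy the slide asks for: with require-corp, cross-origin resources can still be embedded if they opt in (via CORP or CORS), which is the opt-in model the slide calls insufficient, and “Cross-Origin-Embedder-Policy: disallow” is not a defined value. A minimal Node sketch of serving a page with the existing headers; the helper name and page content are illustrative.

```javascript
const http = require("node:http");

// Existing headers that make a document cross-origin isolated
// (self.crossOriginIsolated === true in the page).
const ISOLATION_HEADERS = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

function applyIsolationHeaders(res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) res.setHeader(name, value);
}

const server = http.createServer((req, res) => {
  applyIsolationHeaders(res);
  res.setHeader("Content-Type", "text/html");
  res.end("<!doctype html><title>isolated</title>");
});
// server.listen(8080) to try it locally.
```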


More secure, but still needs permission: rendering may contain private info
○ link purpling (browser history)
○ form autofill (address, credit card info)
○ extensions (e.g. LastPass)
○ file input element sometimes contains private info

Active attacks could harvest information quickly & covertly (CSS color shading).

These risks are hard to explain to users in a prompt.


HTML → Video is a powerful paradigm: remote browsing, streaming web apps.

It seems a lower-level API than screen-sharing in use cases, behavior, challenges & potential.

API suggestions:
● document.captureStream(), or even
● canvas.drawImage(document) if we leave out audio (since we already have canvas.captureStream())

(The latter would put it out of scope for WebRTC.)
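Because document.captureStream() is only a suggestion here, any code would have to feature-detect it. A sketch of that pattern with a fallback to the existing canvas.captureStream(); `captureSelf` is a hypothetical helper, and the fallback assumes the page has some other way to draw itself into the canvas, since canvas.drawImage(document) does not exist.

```javascript
// Prefer the (hypothetical) document.captureStream(); fall back to a canvas
// the page renders into itself, captured at `frameRate` fps.
function captureSelf(doc, canvas, frameRate = 30) {
  if (typeof doc.captureStream === "function") {
    return { source: "document", stream: doc.captureStream() };
  }
  if (canvas && typeof canvas.captureStream === "function") {
    return { source: "canvas", stream: canvas.captureStream(frameRate) };
  }
  throw new Error("No self-capture mechanism available");
}
```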

Status and Next Steps
● Early idea stage
● Is it a good idea / safe? Do we want this?
● Is WebRTC the right WG for this?
● What’s the right audience?
● Interested?

Open Calls for Review Feedback

● <Carine to fill in these slides>


Conclusions

● Conclusions and next steps here


Thank you

Special thanks to:

WG Participants, Editors & Chairs

