Recording Meetings with the CMU Meeting Recorder Architecture
Satanjeev Banerjee, et al.
School of Computer Science, Carnegie Mellon University


Page 1: slides

Recording Meetings with the CMU Meeting Recorder Architecture

Satanjeev Banerjee, et al.

School of Computer Science

Carnegie Mellon University

Page 2: slides


Goals

End goal: Build conversational agents that "understand" meetings
  E.g.: Identify action items

…and that make contributions to meetings
  E.g.: Confirm details of action items

Part of Project CALO: Cognitive Agent that Learns and Organizes

First goal: Create a corpus of human meetings
  Capture data that we expect agents to use
  E.g.: Speech, video, whiteboard markings, etc.

Page 3: slides


Desirable Properties of the Recorder

Need to record meetings anywhere
  Emphasis on instrumenting the user, not the room

Assume low network bandwidth
  Should still be able to record in the extreme situation where there is no network access!

Should be easy to add new data streams
  "Easy" = low time to incorporate a new stream

Should be able to support major OSes

Page 4: slides


The Recorder Architecture

The information stream is discretized into events
  Either a sequence of events, e.g. utterances
  Or one long event, e.g. video data

Each event is given start/end time stamps
  These coincide for instantaneous events, e.g. a keystroke

Events are stored on local disks
  Laptops, shuttle PCs, etc.

Events are (slowly) uploaded to a central server when there is network access
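A minimal sketch of this event model in Python. The field names and types are our own illustration, not the recorder's actual code:

```python
import time
from dataclasses import dataclass

@dataclass
class Event:
    """One recorded event with server-synchronized start/end time stamps."""
    modality: str          # e.g. "speech", "video", "keystroke"
    start: float           # seconds since the epoch, in server time
    end: float             # equals `start` for instantaneous events
    payload_path: str = "" # file on the local disk holding the event data

    @property
    def instantaneous(self) -> bool:
        return self.start == self.end

# A keystroke is instantaneous: its start and end stamps coincide.
now = time.time()
keystroke = Event("keystroke", start=now, end=now)

# An utterance spans an interval and points at a local .wav file.
utterance = Event("speech", start=now, end=now + 2.3, payload_path="u1.wav")
```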

Page 5: slides


Event Identification and Logging

Each recorded event has the following identifying information associated with it:
  Start and stop time stamps
  Name of the meeting and the user
  Modality (speech, video, hand-writing, etc.)

After recording an event, its identification information is sent to a logging server
  The server creates a list of all the events in a meeting
  Good for book-keeping (but not essential)
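The identifying information amounts to a small record per event. A sketch of such a record and its serialization; the actual wire format the recorder sends to the logging server is not specified in the slides, so the layout below is illustrative only:

```python
from dataclasses import dataclass

@dataclass
class EventRecord:
    """Identifying information sent to the logging server after each event."""
    meeting: str
    user: str
    modality: str
    start: str  # time stamps as strings, in whatever format the server expects
    stop: str

    def to_log_line(self) -> str:
        # One key:value line per event; this layout is our own guess.
        return (f"meeting:{self.meeting} user:{self.user} "
                f"modality:{self.modality} start:{self.start} stop:{self.stop}")

rec = EventRecord("OTTER", "arudnicky", "speech",
                  "20030917::18:27.600", "20030917::18:35.357")
line = rec.to_log_line()
```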

Page 6: slides


Architecture of Meeting Recorder

[Diagram: three participants record locally, each synchronizing its clock with a central time server; per-participant event streams flow to a master server, from which the meeting can be browsed.]

An example event data block from the diagram:

{DATA_BLOCK
 session: OTTER
 user: arudnicky
 datatype: SPEECH
 file: \\spot\data\u1.raw
 Start: 20030917::18:27.600
 End: 20030917::18:35.357}

Page 7: slides


Synchronizing the Time Stamps

All event time stamps must be synchronized

We use the Simple Network Time Protocol
  Query a central NTP server for the time
  Use the reply and the round-trip time to estimate the time difference between the local machine and the server
  Use this to create server-time time stamps

Rough experiments reveal 10 ms variance
  Caveat: experiments were done on a high-speed network

What if there is *no* network access?
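The offset estimate behind this can be sketched in a few lines. This is the standard NTP offset formula (which assumes symmetric network delay), not the recorder's actual code:

```python
def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Estimate the local clock's offset from one SNTP-style exchange.

    t0: local time the request was sent
    t1: server time the request arrived
    t2: server time the reply was sent
    t3: local time the reply arrived
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0

def to_server_time(local: float, offset: float) -> float:
    # Apply the estimated offset to produce a server-time time stamp.
    return local + offset

# Local clock 100 s behind the server, 5 s one-way delay each direction:
off = clock_offset(t0=0.0, t1=105.0, t2=106.0, t3=11.0)
```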

Page 8: slides


Aggregating the Data

Upon network access availability, data is transferred from all sites to a central location
  Current recording sites: CMU and Stanford

We implemented a cross-platform version of the MS Background Intelligent Transfer Service
  Uploads files in a transparent background process
  Throttles bandwidth use as the user's activity goes up
  Pauses if the network connection is lost
  Resumes once network access is restored
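The pause-and-resume behavior boils down to remembering how many bytes the server has acknowledged and seeking past them on the next attempt. A toy sketch, not the actual transfer service (the `send` callable and its True/False contract are our own invention):

```python
import os
import tempfile

def resume_upload(path, bytes_already_sent, send, chunk_size=64 * 1024):
    """Resume an interrupted upload from the last acknowledged byte.

    `send` transmits one chunk and returns True on success, False if
    the network is gone (in which case the caller retries later).
    """
    sent = bytes_already_sent
    with open(path, "rb") as f:
        f.seek(sent)                      # skip what the server already has
        while chunk := f.read(chunk_size):
            if not send(chunk):
                return sent               # pause: nothing lost, just stop
            sent += len(chunk)
    return sent

# Demo: a send() that drops the connection on its second chunk.
received = bytearray()
state = {"calls": 0}

def flaky_send(chunk: bytes) -> bool:
    state["calls"] += 1
    if state["calls"] == 2:
        return False                      # simulate losing the network
    received.extend(chunk)
    return True

blob = b"x" * 200_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(blob)

sent = resume_upload(tmp.name, 0, flaky_send)     # pauses mid-file
sent = resume_upload(tmp.name, sent, flaky_send)  # resumes and finishes
os.unlink(tmp.name)
```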

Page 9: slides


Data Collection Process (proposed)

[Diagram: independent cross-site collection at each site feeds, via background data transmission, into a central MEETING DATABASE; the database supports preparation (transcription, annotation), research (learning, analysis), and integration into CALO.]

Page 10: slides


Capturing Close-Talking Speech

We implemented Meeting Recorder Cross Platform (MRCP) to record speech and notes
  Speech is recorded using head-mounted mics
  An 11.025 kHz sampling rate is used for portability
  End-pointing is done using the CMU Sphinx 3 ASR

Each end-pointed utterance is an event
  The utterance is recorded to local disk (wav format)
  Time stamps are generated using Simple NTP
  The utterance's identifying information is sent to the logging server, and the utterance is queued for upload
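End-pointing segments the audio stream into utterance events. A toy energy-threshold illustration of the idea; the recorder actually uses the CMU Sphinx 3 end-pointer, which is far more robust than this:

```python
def endpoint(frames, threshold=500.0, min_silence_frames=3):
    """Segment a stream of audio frames into utterances by energy.

    Each frame is a list of samples.  An utterance ends once
    `min_silence_frames` consecutive frames fall below `threshold`
    mean energy; shorter pauses stay inside the utterance.
    """
    utterances, current, silence = [], [], 0
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            current.append(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= min_silence_frames:
                utterances.append(current)   # close this utterance event
                current, silence = [], 0
            else:
                current.append(frame)        # keep short pauses inside it
    if current:
        utterances.append(current)
    return utterances

loud = [1000.0] * 10   # high-energy frame (speech)
quiet = [0.0] * 10     # silent frame
utts = endpoint([loud] * 5 + [quiet] * 4 + [loud] * 3)
```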

Page 11: slides


Capturing Typed Notes

Users type notes in the client's note-taking area

"Snapshots" of the notes are taken at each carriage return
  Each snapshot is an event
  Each snapshot is saved to disk, time-stamped, logged, and queued for upload

[Demonstration of MRCP]
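The snapshot rule above is simple to state in code: every carriage return yields the full text of the notes so far as one event. A sketch of that rule, not MRCP's code:

```python
def note_snapshots(keystrokes: str):
    """Yield a snapshot of the note-taking area at each carriage return."""
    buffer = []
    for ch in keystrokes:
        buffer.append(ch)
        if ch == "\n":
            yield "".join(buffer)  # one snapshot event per carriage return

snaps = list(note_snapshots("buy mics\nbook room\n"))
```

Each snapshot contains all text typed so far, so later snapshots supersede earlier ones when the notes are reconstructed.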

Page 12: slides


More Details about MRCP

Implemented using cross-platform libraries:
  wxWidgets for GUI, file access, networking
  PortAudio for audio

Currently compiles on the Windows, Macintosh OS X, and Linux operating systems
  The Windows version is distributed to other Project CALO sites
  The Macintosh and Linux versions are in beta testing
  A WinCE version is in development

Page 13: slides


Capturing Whiteboard Pen Strokes

We use Mimio to capture whiteboard pen strokes
  "Strokes" consist of all the x-y coordinates between pen-down and pen-up
  Each stroke is an event: it is recorded, time-stamped, logged, and queued for upload
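Grouping raw pen events into stroke events can be sketched as follows. The (kind, x, y) event shape is our own illustration, not Mimio's actual API:

```python
def group_strokes(pen_events):
    """Group a sequence of pen events into strokes.

    Events are (kind, x, y) tuples with kind in {"down", "move", "up"}.
    A stroke is every coordinate from pen-down through pen-up.
    """
    strokes, current = [], None
    for kind, x, y in pen_events:
        if kind == "down":
            current = [(x, y)]
        elif current is not None:
            current.append((x, y))
            if kind == "up":
                strokes.append(current)  # one stroke event, ready to log
                current = None
    return strokes

events = [("down", 0, 0), ("move", 1, 1), ("up", 2, 1),
          ("down", 5, 5), ("up", 5, 6)]
strokes = group_strokes(events)
```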

Page 14: slides


Capturing PowerPoint Slide Information

We use MS's PowerPoint API to capture slide-change timing information and slide contents
  Events = slide changes
  Event data = the content of the new slide
  Content is all the text and all the "shapes" on the slide

Events are instantaneous
  Start and stop time stamps coincide

Events are processed as before

Page 15: slides


Capturing Panoramic Video

We capture panoramic video using the 4-camera CAMEO device
  Developed by the Physical Awareness group at CMU
  Video is recorded in MPEG-4 format
  One long event is produced and uploaded

Page 16: slides


Current Status of Data Collection

Recorded meetings vary widely in size…
  From 2- to 10-person meetings
…in meeting type
  Scheduling meetings, presentations, brainstorms
…and in content
  Speech group, dialog group, and physical awareness group meetings

We currently have a total of more than 11,000 utterances (including cross-talk)

Page 17: slides


Using the Data: Some Initial Research

Question: Can we detect the state of a meeting, and the roles of the participants, from simple speech data?

We introduced a taxonomy of meeting states and participant roles:

  Meeting State    Participant Roles
  Presentation     Presenter, Observer
  Briefing         Information producer/consumer
  Discussion       Participator, Observer

Page 18: slides


Detection Methods and Initial Results

Used Anvil to hand-annotate 45 minutes of meeting video with states and roles

Trained a decision tree classifier on 30 minutes of data
  Input features: # speakers, lengths of utterances, pauses, and interruptions within a short history of the meeting

Initial results: about 50% detection accuracy on a separate 15 minutes of test data
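Features of this kind can be computed over a sliding window of recent utterances. A sketch that loosely mirrors the feature set named above (number of speakers, utterance lengths, pauses); the exact definitions and window length are our guesses, not the paper's:

```python
def window_features(utterances, t, history=60.0):
    """Compute simple features over a short history ending at time `t`.

    `utterances` are (speaker, start, end) tuples in seconds.
    """
    recent = [(s, a, b) for s, a, b in utterances
              if a >= t - history and b <= t]
    if not recent:
        return {"n_speakers": 0, "mean_len": 0.0, "total_pause": history}
    recent.sort(key=lambda u: u[1])
    speakers = {s for s, _, _ in recent}
    mean_len = sum(b - a for _, a, b in recent) / len(recent)
    # Pause = gap between consecutive utterances (clipped at 0 for overlap).
    total_pause = sum(max(0.0, recent[i + 1][1] - recent[i][2])
                      for i in range(len(recent) - 1))
    return {"n_speakers": len(speakers),
            "mean_len": mean_len,
            "total_pause": total_pause}

utts = [("P1", 0.0, 10.0), ("P2", 12.0, 20.0), ("P1", 20.0, 30.0)]
feats = window_features(utts, t=30.0)
```

Feature vectors like `feats` would then be fed to the decision tree classifier, one per annotated window.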

Page 19: slides

Questions?

Thanks to DARPA grant NBCH-D-02-0010