Writing and Speech Recognition

DESCRIPTION

Writing Recognition, Digital Ink, Speech Recognition

Page 1: Writing and Speech Recognition

1

Speech, Ink, and Slides: The Interaction of Content Channels

Richard Anderson, Crystal Hoyer, Craig Prince, Jonathan Su, Fred Videon, Steve Wolfman

Repeat Intro of Self

Mention (In Audience):
- Richard
- Jonathan

Page 2: Writing and Speech Recognition

2

Background

Content channels are the various sources of information in a given context (e.g., audio, slides, digital ink, video)

Our focus is on the use of digital ink in the classroom setting

We want to capture/playback/analyze these channels intelligently

Page 3: Writing and Speech Recognition

3

Why do we want to analyze content channels?

We want to make it easier to interact with electronic materials

• Better search and navigation of presentations
• Accessibility for the hearing/learning/visually impaired
• Generating text transcripts
• Recognizing high level behaviors

Conversion to: Braille/Screen Reader

Page 4: Writing and Speech Recognition

4

Distance Learning Classes

Page 5: Writing and Speech Recognition

5

Classroom Presenter

General tool for giving presentations on the Tablet PC

Many similar systems – our findings applicable to all such systems

• Enables writing directly on the slides
• Tablet PC enables high-quality digital ink
• Used in over 100 courses so far
• Allows us to collect real usage data

Page 6: Writing and Speech Recognition

6

Questions We Wanted to Explore

High Level Question: What is the potential for automatic analysis of archived content?

Other Questions:

• How well can digital ink be recognized by itself?
• How closely are different content channels tied together?
  - Speech and Ink?
  - Ink and Slide Content?
• Can we identify high level behaviors by analyzing the content channels?

Page 7: Writing and Speech Recognition

7

Research Methodology

1. We wanted to understand what real presentation data is like

2. We collected several hundred hours of recorded lectures from distance learning classes

3. Analyzed the data in various ways to help answer our guiding questions.

• Note: All examples given here are from real presentations!

Page 8: Writing and Speech Recognition

8

Outline

• Motivation
• Handwriting Recognition
• Joint Writing and Speech Recognition
• Attentional Mark Identification
• Activity Inference: Recognizing Corrections

Page 9: Writing and Speech Recognition

9

Handwriting Recognition

Classroom lectures on Tablet PC offer interesting challenges for handwriting recognition

• Somewhat Awkward
  - Small Surface to Write On
  - Bad Angle to the Tablet PC
• Hastily Written
  - Concentrating on Speaking
  - Excited / Nervous

Page 10: Writing and Speech Recognition

10

Recognition Examples

The Good:

The Bad:

The Ugly:

Mark: Success/Failure

Page 11: Writing and Speech Recognition

11

Recognition Procedure

• Studied isolated words/phrases written on slides
• Removed all non-textual ink
• Fed through the Microsoft Handwriting Recognizer
• No training done!

Page 12: Writing and Speech Recognition

12

Handwriting Recog. Results

Lecturer  Exact      Alternate  Close    None
Prof. A   16 (88%)   1 (6%)     0 (0%)   1 (6%)
Prof. B   146 (59%)  26 (10%)   6 (2%)   71 (29%)
Prof. C   18 (42%)   5 (11%)    1 (3%)   19 (44%)
Prof. D   262 (61%)  45 (11%)   9 (2%)   111 (26%)
Prof. E   408 (79%)  46 (9%)    2 (<1%)  58 (11%)
Total     850 (68%)  123 (10%)  18 (1%)  260 (21%)

Mention That These Results Are Surprisingly Good!

Each Row Represents a Different Lecturer
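
The Total row can be re-derived from the per-lecturer counts. The sketch below is only a cross-check of the table's arithmetic, not the tooling used in the study; the names `COUNTS`, `column_totals`, and `shares` are illustrative, and percentages are rounded to whole numbers, which is an assumption about how the slide rounded its totals.

```python
# Per-lecturer counts copied from the results table on this slide.
# Column order: Exact, Alternate, Close, None.
CATEGORIES = ("Exact", "Alternate", "Close", "None")
COUNTS = {
    "Prof. A": (16, 1, 0, 1),
    "Prof. B": (146, 26, 6, 71),
    "Prof. C": (18, 5, 1, 19),
    "Prof. D": (262, 45, 9, 111),
    "Prof. E": (408, 46, 2, 58),
}


def column_totals(counts):
    """Sum each outcome category across all lecturers."""
    return tuple(
        sum(row[i] for row in counts.values()) for i in range(len(CATEGORIES))
    )


def shares(row):
    """Return each category's share of the row as a whole-number percentage."""
    total = sum(row)
    return tuple(round(100 * n / total) for n in row)


totals = column_totals(COUNTS)
total_shares = shares(totals)
for name, n, pct in zip(CATEGORIES, totals, total_shares):
    print(f"{name}: {n} ({pct}%)")
```

Across all five lecturers this covers 1251 written words/phrases, of which 850 were recognized exactly.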

Page 13: Writing and Speech Recognition

13

Outline

• Motivation
• Handwriting Recognition
• Joint Writing and Speech Recognition
• Attentional Mark Identification
• Activity Inference: Recognizing Corrections

Look at Potential

Page 14: Writing and Speech Recognition

14

Joint Writing and Speech Recognition

Co-expression of ink and speech

Is digital ink spoken as it is written?

- Yes, but how often?
- How “closely” to the written text?

Can speech be used to disambiguate handwriting?

Can handwriting be used to disambiguate speech? (incl. deictic references)

In Time/Accuracy, Wanted Empirical Evidence

Page 15: Writing and Speech Recognition

15

Examples

Difficult for Speech and Ink Recognition

Difficult Written Abbreviations

Speech/Ink Used to Disambiguate Ink/Speech

DigiMon

Java 2 Enterprise Edition

Eswaran, Gray, Lorie, Traiger

corn flakes

Page 16: Writing and Speech Recognition

16

Experiment

• Examined instances of isolated word writing
• Selected word writing episodes at random but uniformly from the various instructors
• Generated transcripts manually from the audio
• Checked whether the instructor spoke the exact word written
• Measured the time between the written and spoken word
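
A minimal sketch of how the timing measurement might be bucketed into the Simul / 0-2s / > 2s categories used in the results. The slides do not define "simultaneous"; this sketch assumes it means the spoken word overlaps the writing interval. The `Episode` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One word-writing episode; all times are seconds from lecture start."""
    write_start: float
    write_end: float
    spoken_start: float  # when the instructor began saying the written word
    spoken_end: float


def gap_seconds(ep: Episode) -> float:
    """Time between writing and speaking; 0.0 if the two intervals overlap."""
    if ep.spoken_end < ep.write_start:   # spoken before writing began
        return ep.write_start - ep.spoken_end
    if ep.spoken_start > ep.write_end:   # spoken after writing ended
        return ep.spoken_start - ep.write_end
    return 0.0


def bucket(ep: Episode) -> str:
    """Map an episode to a timing category (assumed overlap-based definition)."""
    gap = gap_seconds(ep)
    if gap == 0.0:
        return "Simul"
    return "0-2s" if gap <= 2.0 else "> 2s"
```

For example, a word written from 10 s to 12 s and spoken from 13 s to 13.5 s has a 1 s gap and falls in the 0-2s bucket.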

Page 17: Writing and Speech Recognition

17

Speech/Text Co-occurrence Results

Exact Approx None Simul 0-2s > 2s

A 1 (100%) 0 (0%) 0 (0%) 1 (100%) 0 (0%) 0 (0%)

B 9 (75%) 3 (25%) 0 (0%) 12 (100%) 0 (0%) 0 (0%)

C 9 (82%) 2 (18%) 0 (0%) 10 (91%) 1 (9%) 0 (0%)

D 12 (86%) 2 (14%) 0 (0%) 10 (71%) 4 (29%) 0 (0%)

E 9 (56%) 7 (44%) 0 (0%) 7 (44%) 4 (25%) 5 (31%)

Total 40 (74%) 14 (26%) 0 (0%) 40 (74%) 9 (17%) 5 (9%)

Each Row Represents a Different Lecturer

Page 18: Writing and Speech Recognition

18

Outline

• Motivation
• Handwriting Recognition
• Joint Writing and Speech Recognition
• Attentional Mark Identification
• Activity Inference: Recognizing Corrections

Page 19: Writing and Speech Recognition

19

Attentional Mark Identification

• Attentional Marks are…
• First step is to identify a stroke as a mark
• Tying Attentional Marks to slide content is important
• Attentional Ink provides a concrete link between speech and slide content!

Page 20: Writing and Speech Recognition

20

Example

Page 21: Writing and Speech Recognition

21

Method

Segmentation

• Few strokes
• Close spatial and temporal proximity

Mark Recognition

• Created hand tuned classifiers for: Circles, Lines, Bullets/Ticks

Matched with slide content
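
The segmentation and recognition steps above can be sketched roughly as follows. This is not the classifier from the study: the `Stroke` type, the rules, and every threshold (`max_gap_s`, `max_dist`, `closed_ratio`, `flat_ratio`, `small`) are hypothetical values chosen only to illustrate what a hand-tuned approach might look like.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Stroke:
    points: list[tuple[float, float]]  # (x, y) in slide coordinates
    start: float                       # seconds from lecture start
    end: float


def center(s: Stroke) -> tuple[float, float]:
    """Mean position of the stroke's sample points."""
    xs = [p[0] for p in s.points]
    ys = [p[1] for p in s.points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def segment(strokes, max_gap_s=1.0, max_dist=50.0):
    """Group time-ordered strokes that are close in both time and space."""
    groups = []
    for s in sorted(strokes, key=lambda st: st.start):
        if groups:
            prev = groups[-1][-1]
            (px, py), (cx, cy) = center(prev), center(s)
            near_in_time = s.start - prev.end <= max_gap_s
            near_in_space = hypot(cx - px, cy - py) <= max_dist
            if near_in_time and near_in_space:
                groups[-1].append(s)
                continue
        groups.append([s])
    return groups


def classify(s: Stroke, closed_ratio=0.2, flat_ratio=0.2, small=15.0) -> str:
    """Toy hand-tuned rules returning 'circle', 'line', 'bullet', or 'other'."""
    xs = [p[0] for p in s.points]
    ys = [p[1] for p in s.points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    size = max(w, h)
    if size <= small:
        return "bullet"   # tiny stroke: bullet or tick
    (x0, y0), (x1, y1) = s.points[0], s.points[-1]
    if hypot(x1 - x0, y1 - y0) <= closed_ratio * size:
        return "circle"   # endpoints nearly meet: closed loop
    if h <= flat_ratio * w:
        return "line"     # wide and flat: underline-like
    return "other"
```

Real ink is noisier than these rules allow for, so thresholds like these would need tuning against hand-labeled marks.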

Page 22: Writing and Speech Recognition

22

Experiment

1. Identified and Classified Attention Marks by Hand

• Two different people per slide
• Identified type of mark as well as slide content mark referred to

2. Identified Attention Marks Automatically

3. Compared Resulting Identification

Page 23: Writing and Speech Recognition

23

Content Matching Issues

Hard to determine exactly what content a mark refers to

Not just a recognition Issue, but also related to HOW people draw

Page 24: Writing and Speech Recognition

24

Content Matching Cont.

Granularity of content parsing can be an issue
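
To illustrate why granularity matters, here is a small sketch that matches a mark to the slide content it overlaps most. The overlap rule, the `Box` type, and the example coordinates are all assumptions made for illustration; the slides do not specify how matching was done.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned bounding box in slide coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes, or 0.0 if they do not overlap."""
    w = min(a.right, b.right) - max(a.left, b.left)
    h = min(a.bottom, b.bottom) - max(a.top, b.top)
    return w * h if w > 0 and h > 0 else 0.0


def best_match(mark: Box, content: dict[str, Box]) -> Optional[str]:
    """Return the label of the content box overlapping the mark most, if any."""
    best_label, best_area = None, 0.0
    for label, box in content.items():
        area = overlap_area(mark, box)
        if area > best_area:
            best_label, best_area = label, area
    return best_label


# The same circle matched against word-level and line-level parses.
words = {"Joint": Box(0, 0, 40, 20), "Writing": Box(50, 0, 110, 20)}
lines = {"Joint Writing and Speech": Box(0, 0, 200, 20)}
circle = Box(45, -5, 115, 25)
print(best_match(circle, words))  # word-level parse
print(best_match(circle, lines))  # line-level parse
```

With a word-level parse the circle resolves to a single word; with a line-level parse the same circle can only resolve to the whole line.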

Page 25: Writing and Speech Recognition

25

Attentional Ink Recognition Accuracy

Mark type   Exact      Exact to Punctuation  Close     Non-Match  Total
Circles     70 (66%)   13 (12%)              6 (6%)    17 (16%)   106
Underlines  207 (61%)  22 (6%)               44 (13%)  66 (20%)   339
Bullets     52 (60%)   0 (0%)                0 (0%)    35 (40%)   87
Total       329 (62%)  35 (7%)               50 (9%)   118 (22%)  532

Page 26: Writing and Speech Recognition

26

Outline

• Motivation
• Handwriting Recognition
• Joint Writing and Speech Recognition
• Attentional Mark Identification
• Activity Inference: Recognizing Corrections

Page 27: Writing and Speech Recognition

27

Recognizing Corrections

Why? Want to answer the broad question:

- “Can we recognize patterns of activity by analyzing the ink and speech channels?”

• Useful for Presenters
• Occurs frequently (about 1-3 per lecture)
• But Non-trivial

Our vision allows false positives

Page 28: Writing and Speech Recognition

28

Recognizing Corrections

Identified Six Types of Corrections

Looked through large # of lectures, wide range of marks

Page 29: Writing and Speech Recognition

29

Example Results

No Table Because:

1. Not a robust experiment
2. Proof of Concept

Page 30: Writing and Speech Recognition

30

Wrap-up

We wanted to understand the nature of real data to direct our focus when building tools for automatic analysis

Our studies provided the necessary understanding to accomplish this

Page 31: Writing and Speech Recognition

31

Wrap-up (Cont.)

Specific Results:

• Basic handwriting recognition is surprisingly good
• Very strong co-occurrence of written and spoken words
• We were able to identify attentional marks and the content associated with them

Activity Recognition: There are certain high-level activities that we can identify

ALL OPEN for Refinement

Page 32: Writing and Speech Recognition

32

Questions?

[email protected]

[email protected]

Classroom Presenter Website: http://www.cs.washington.edu/education/dl/presenter/