Page 1: Generating a time shrunk lecture video by event detection

Generating a time shrunk lecture video by event detection

Presented by: Mona Ragheb

Yara Ali

Supervised by: Dr. Aliaa Youssif

Page 2

Agenda

1. Introduction
2. Generating lecture video using virtual camerawork
3. Event detection steps
4. Evaluation
5. Results
6. Conclusion
7. References

Page 3

Introduction

E-learning has become a popular method used in higher education.

However, video recording by a cameraman and video editing take a long time and cost a great deal.

To solve this problem, a system has been developed to generate a dynamic lecture video using virtual camerawork from the high-resolution images recorded by an HDV (high-definition video) camcorder.

Page 4

How does the system work?

The system generates a lecture video:

1. Using virtual camerawork based on the shooting techniques of broadcast cameramen

2. Cropping from the high-resolution image to track a region of interest (ROI), such as the instructor

3. Generating a time shrunk video using event detection

Camera motion analysis is used to detect scene changes.

Page 5

Shooting techniques

People invariably make the same sets of mistakes when they first start shooting video:

1. Trees or telephone poles sticking out of the back of someone's head

2. Interview subjects who are just darkened blurs because there was bright light in the background

3. Boring shots of buildings with no action

Page 6

Event Detection

Two kinds of events were detected:

1. A speech period

2. A chalkboard writing period

Page 7

Kinds of event detection

1. Speech period:

It is detected by voice activity detection with LPC cepstrum features, and each frame is classified into speech or non-speech using the Mahalanobis distance.
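The classification step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class models (mean and inverse covariance per class) are hypothetical and would in practice be estimated from labeled LPC cepstrum vectors.

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    # Distance of feature vector x from a class model (mean, inverse covariance).
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def classify_frame(features, speech_model, nonspeech_model):
    # Assign the frame to whichever class model is nearer.
    d_speech = mahalanobis(features, *speech_model)
    d_nonspeech = mahalanobis(features, *nonspeech_model)
    return "speech" if d_speech < d_nonspeech else "non-speech"
```

With identity covariance this reduces to nearest-mean (Euclidean) classification; a full covariance matrix lets the distance account for correlations between cepstrum coefficients.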

Page 8

LPC

Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.

It is one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides extremely accurate estimates of speech parameters.

A spectral envelope is a curve in the frequency-amplitude plane, derived from a Fourier magnitude spectrum. It describes one point in time (one window, to be precise).

Page 9

How does LPC work?

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz.

The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.

Synthesis reverses the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech.
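The analysis step can be sketched as below: a minimal autocorrelation-method LPC using the Levinson-Durbin recursion. This is an illustrative sketch, not the paper's code; real LPC front ends also window each frame and pre-emphasize the signal.

```python
def lpc(signal, order):
    """Fit an all-pole model x[t] ~ -a[1]x[t-1] - ... - a[p]x[t-p]
    (autocorrelation method, solved by the Levinson-Durbin recursion)."""
    n = len(signal)
    # biased autocorrelation r[0..order]
    r = [sum(signal[t] * signal[t + k] for t in range(n - k))
         for k in range(order + 1)]
    a, err = [1.0], r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # reflection coefficient
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k                 # prediction error shrinks each step
    return a, err
```

For a decaying exponential x[t] = 0.9**t, a first-order fit recovers the generating pole: a[1] comes out close to -0.9.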

Page 10

Formants

Page 11

Kinds of event detection (Cont.)

2. Chalkboard writing period:

It is detected by using a graph cuts technique to segment a precise region of interest, such as the instructor.

By deleting content-free periods, i.e., periods without speech or writing events, and fast-forwarding the writing periods, the method can generate a time shrunk lecture video automatically.

Page 12

Generating lecture video using virtual camerawork

An HDV camcorder is located at the back of the classroom to videotape images with high resolution (1,400 × 810 pixels), which contain the whole area of the chalkboard, so that students can read the handwritten characters on the chalkboard.

Problem: it is impossible to display the high resolution image on the small screen of a general notebook PC.

Page 13

Generating lecture video using virtual camerawork (Cont.)

Page 14

Solution?

1. The system detects a moving object by temporal differencing

2. The timing for virtual camerawork is detected using bilateral filtering and zero crossing.

Page 15

Bilateral Filter

The bilateral filter was introduced by Tomasi et al. as a non-iterative means of smoothing images while retaining edge detail.

It involves a weighted convolution in which the weight for each pixel depends not only on its distance from the center pixel, but also on its relative intensity.
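The weighting scheme just described can be sketched as a brute-force bilateral filter on a 2D grayscale array. This is a minimal illustration with illustrative parameter values; production implementations use optimized approximations rather than this nested loop.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Brute-force bilateral filter: each output pixel is a weighted mean of
    its neighbours, with weights combining spatial distance (sigma_s) and
    intensity difference (sigma_r), so strong edges are preserved."""
    img = img.astype(float)
    h, w = img.shape
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = img[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            w_spatial = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            w_range = np.exp(-((patch - img[y, x]) ** 2) / (2 * sigma_r ** 2))
            weights = w_spatial * w_range
            out[y, x] = np.sum(weights * patch) / np.sum(weights)
    return out
```

On a sharp step edge (e.g. a 0/100 intensity boundary), the range term makes cross-edge weights negligible, so the edge survives where a plain Gaussian blur would smear it.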

Page 16

Bilateral Filter (Cont.)

(a) and (b) show the potential of bilateral filtering for the removal of texture. The picture "simplification" illustrated by figure 2(b) can be useful for data reduction without loss of overall shape features in applications such as image transmission, picture editing and manipulation, and image description for retrieval.

Page 17

Generating lecture video using virtual camerawork (Cont.)

If the ROI has a large movement, this period of the video is classified as panning; if the ROI has no motion but voice activity, it is classified as zooming.

Panning is used to show motion and speed. It is a technique that requires practice since it has to be done in one smooth, continuous motion.
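The classification rule above amounts to a simple decision on two signals per period. A minimal sketch (the threshold and the "static" fallback label are illustrative assumptions, not values from the paper):

```python
def classify_period(motion_magnitude, has_voice, motion_threshold=0.5):
    """Pick a virtual-camerawork mode for a video period.
    motion_threshold is an illustrative value, not from the source."""
    if motion_magnitude > motion_threshold:
        return "panning"   # large ROI movement: follow the instructor
    if has_voice:
        return "zooming"   # stationary but speaking: zoom in on the ROI
    return "static"        # hypothetical fallback for neither condition
```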

Page 18

Event detection steps

1. Voice activity detection

2. Chalkboard writing detection
   2-1- Object detection and segmentation
   2-2- Generation of current chalkboard image
   2-3- Chalkboard writing detection

3. Generating a time shrunk video

Page 19

1- Voice activity detection

Page 20

1- Voice activity detection (Cont.)

Whenever you do a finite Fourier transform, you are implicitly applying it to an infinitely repeating signal. So, for instance, if the start and end of your finite sample do not match, that will look just like a discontinuity in the signal and show up as lots of high-frequency nonsense in the Fourier transform, which you do not really want.

Page 21

1- Voice activity detection (Cont.)

If your sample happens to be a beautiful sinusoid but an integer number of periods does not fit exactly into the finite sample, your FT will show appreciable energy in all sorts of places nowhere near the real frequency. You do not want any of that.

Windowing the data makes sure that the ends match up while keeping everything reasonably smooth; this greatly reduces the sort of "spectral leakage" described above.
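The leakage-reduction effect of windowing can be demonstrated numerically: take a sinusoid with a non-integer number of cycles per window and compare how much spectral energy ends up far from the peak with and without a Hann window. (The signal frequency and the leakage measure here are illustrative choices.)

```python
import numpy as np

n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 10.37 * t / n)   # 10.37 cycles: does not fit the window

spec_rect = np.abs(np.fft.rfft(x))                 # no window (rectangular)
spec_hann = np.abs(np.fft.rfft(x * np.hanning(n))) # Hann window

def leakage(spec, peak_halfwidth=3):
    """Fraction of spectral energy outside a small band around the peak bin."""
    p = int(np.argmax(spec))
    e = spec ** 2
    near = e[max(0, p - peak_halfwidth):p + peak_halfwidth + 1].sum()
    return 1.0 - near / e.sum()
```

The Hann-windowed spectrum concentrates almost all energy within a few bins of the true frequency, while the unwindowed spectrum spreads noticeable energy across the whole band.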

Page 22

2- Chalkboard writing detection
2-1- Object detection and segmentation

Extracting a precise object region is needed for detecting periods of writing characters on the chalkboard.

Temporal differencing (object detection) is robust to lighting changes.

However, temporal differencing cannot extract all the foreground pixels of moving objects, so another technique (graph cuts) is used to support it.
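Temporal differencing itself is just a thresholded frame-to-frame difference. A minimal sketch (the threshold value is illustrative):

```python
import numpy as np

def temporal_difference_mask(prev_frame, frame, threshold=15):
    """Mark pixels whose intensity changed between consecutive frames.
    Insensitive to slow, global lighting change, but it misses the interior
    of slowly moving, uniformly coloured objects, which is why a graph-cut
    segmentation is used to recover the full object region."""
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    return diff > threshold
```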

Page 23

2- Chalkboard writing detection
2-1- Object detection and segmentation

Page 24

2- Chalkboard writing detection
2-1- Object detection and segmentation (Cont.)

Page 25

Image Segmentation

Image segmentation is an important problem in computer vision and medical image analysis.

The objective of image segmentation is to provide a visually meaningful partition of the image domain. Although it is usually an easy task for a human to separate the background and the different objects in a given image, it is difficult to automate.
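The graph cuts idea used for segmenting the instructor can be illustrated on a toy 1D "image": each pixel is linked to a source (foreground) and sink (background) terminal by costs measuring how well it fits each model, neighbouring pixels are linked by a smoothness cost, and the minimum cut yields the labeling. This is a minimal sketch with a generic Edmonds-Karp max-flow and made-up cost terms, not the paper's formulation; real systems use fast specialized solvers.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the source side of the min cut."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q and parent[sink] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[sink] == -1:
            break
        # bottleneck capacity along the path
        v, bottleneck = sink, float("inf")
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        # augment
        v = sink
        while v != source:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
    # nodes still reachable from the source in the residual graph
    reachable, q = {source}, deque([source])
    while q:
        u = q.popleft()
        for v in range(n):
            if v not in reachable and capacity[u][v] - flow[u][v] > 0:
                reachable.add(v)
                q.append(v)
    return reachable

def segment(pixels, fg_mean, bg_mean, smoothness=2):
    """Label each pixel foreground (True) or background (False) by a min cut."""
    n = len(pixels)
    source, sink = n, n + 1                  # terminal nodes
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for i, p in enumerate(pixels):
        cap[source][i] = abs(p - bg_mean)    # cost of labeling the pixel background
        cap[i][sink] = abs(p - fg_mean)      # cost of labeling the pixel foreground
        if i + 1 < n:                        # smoothness between neighbours
            cap[i][i + 1] = cap[i + 1][i] = smoothness
    cut = max_flow(cap, source, sink)
    return [i in cut for i in range(n)]
```

The smoothness term is what distinguishes this from per-pixel thresholding: cutting between unlike neighbours costs extra, so the result prefers coherent regions.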

Page 26

2- Chalkboard writing detection
2-2- Generation of current chalkboard image

Page 27

2- Chalkboard writing detection
2-3- Chalkboard writing detection

Page 28

2-3- Chalkboard writing detection (Cont.)

Page 29

3- Generating a time shrunk video

Page 30

3- Generating a time shrunk video

Page 31

Evaluation

We videotaped 3 lectures (each video is 90 min long) with an HDV camcorder.

In this evaluation, we use "recall" and "precision" to determine the effectiveness of the detection results.

Page 32

Precision vs. Recall

• Precision (also called positive predictive value) is the fraction of retrieved frames that are correct.

• Recall (also known as sensitivity) is the fraction of all relevant frames that are retrieved.

In this figure, the relevant items are to the left of the straight line, while the retrieved items are within the oval. The red regions represent errors: on the left are the relevant items that were not retrieved (false negatives), and on the right are the retrieved items that are not relevant (false positives).
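Both measures follow directly from the counts in the figure (the example counts below are illustrative, not the evaluation's actual numbers):

```python
def precision_recall(tp, fp, fn):
    """precision = TP / (TP + FP): fraction of retrieved frames that are relevant.
    recall = TP / (TP + FN): fraction of relevant frames that were retrieved."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall
```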

Page 33

Results

There are 200 false positive frames in a 90 min video because of the students’ voices.

Page 34

Results

Page 35

Results

Page 36

Conclusion

This paper presents a novel approach for generating a time shrunk lecture video using event detection.

Our method detects speech periods by voice activity detection and chalk board writing periods by a combination of object detection and segmentation techniques.

By deleting the content-free periods and fast-forwarding the chalkboard writing periods, our method can generate a time shrunk lecture video automatically.

The resulting generated video is about 20%∼30% shorter than the original video in time. This is almost the same as the results of manual editing by a human operator.

Page 37

References

1. http://www.vision.cs.chubu.ac.jp/04/pdf/e-learning08.pdf

2. http://research.cs.tamu.edu/prism/lectures/sp/l9.pdf

3. http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Linear_predictive_coding.html

4. http://www.ee.columbia.edu/~dpwe/e4896/lectures/E4896-L06.pdf

Page 38

ANY QUESTIONS?

Page 39

THANK YOU !