
MediaEval 2016 - UNIFESP Predicting Media Interestingness Task


UNIFESP at MediaEval 2016: Predicting Media Interestingness Task

Jurandy Almeida
GIBIS Lab, Institute of Science and Technology, Federal University of São Paulo – UNIFESP

[email protected]

Introduction

• Developed for the MediaEval 2016 Predicting Media Interestingness Task, and for its video subtask only.

• The goal is to automatically select the video segments that a common viewer would find most interesting.

• The focus is on features derived from audio-visual content or associated textual information.

Proposed Approach

It relies on combining learning-to-rank algorithms and exploiting visual information:

1. A simple histogram of motion patterns is used for processing visual information.

2. A majority voting scheme is used for combining machine-learned rankers and predicting the interestingness of videos.

Visual Features

• Low-Level & Mid-Level Features: Not used

• An algorithm is applied to encode visual properties from video segments.

– “Comparison of Video Sequences with Histograms of Motion Patterns” [1].

• It relies on three steps:

1. partial decoding;

2. feature extraction;

3. signature generation.
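
The feature extraction and signature generation steps can be illustrated with a much simplified sketch. It assumes that partial decoding has already produced one grid of DC coefficients per I-frame, and it encodes only four comparisons (two temporal, two spatial) per position. The function name `motion_pattern_histogram`, the 4-bit code, and the choice of neighbours are assumptions made for illustration; the actual encoding is the one defined in [1].

```python
from collections import Counter


def motion_pattern_histogram(frames):
    """Build a normalized histogram of simplified motion patterns.

    frames: list of 2D lists, one grid of DC coefficients per I-frame
    (assumed to come from a partial decoding step not shown here).
    """
    hist = Counter()
    # Slide over every triple (previous, current, next) of consecutive frames.
    for t in range(1, len(frames) - 1):
        prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
        rows, cols = len(cur), len(cur[0])
        for y in range(rows - 1):
            for x in range(cols - 1):
                c = cur[y][x]
                # Hypothetical 4-bit pattern: each bit is one ordinal comparison.
                bits = (
                    prev[y][x] > c,     # temporal: same position, previous frame
                    nxt[y][x] > c,      # temporal: same position, next frame
                    cur[y][x + 1] > c,  # spatial: right neighbour
                    cur[y + 1][x] > c,  # spatial: lower neighbour
                )
                code = sum(int(b) << i for i, b in enumerate(bits))
                hist[code] += 1
    total = sum(hist.values())
    # Signature generation: normalize counts into a distribution.
    return {code: n / total for code, n in hist.items()} if total else {}
```

The resulting dictionary maps each observed pattern code to its relative frequency and could serve as the feature vector of a video segment in a sketch of this kind.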

[Figure: Histograms of Motion Patterns (HMP). Stages: 1: Partial Decoding; 2: Feature Extraction; 3: Signature Generation. Labels: Video Frames, I-frames, Macroblock, Pixel Block, DC coefficient, Time Series of Macroblocks (Previous, Current, Next), Temporal, Spatial, Motion Pattern (e.g., 0101100110010011), Histogram Distribution.]

Learning to Rank Strategies

• Ranking SVM [5]: Use the traditional SVM classifier to learn a ranking function.

• RankNet [2]: Probability distribution metrics as cost functions to be optimized.

• RankBoost [4]: Regression error on weighted distributions of pairwise rankings.

• ListNet [3]: Extension of RankNet that uses a ranked list instead of pairwise rankings.

• Majority Voting [6]: The label with the most votes is selected as the label for a given instance.
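
The majority voting rule in the last item can be sketched as follows. This is a minimal illustration that assumes each ranker has already produced one label per video segment (for example, 1 for interesting and 0 for not interesting). The function name `majority_vote` and the tie-breaking rule (ties go to the smallest label) are assumptions, since the poster does not state how ties are resolved.

```python
from collections import Counter


def majority_vote(labels_per_ranker):
    """Fuse per-ranker labels by majority voting.

    labels_per_ranker: list of N lists, one list per ranker, each holding
    one label per video segment in the same order.
    """
    n_items = len(labels_per_ranker[0])
    fused = []
    for i in range(n_items):
        # Count the votes that segment i received from all rankers.
        votes = Counter(ranker[i] for ranker in labels_per_ranker)
        top = max(votes.values())
        winners = [label for label, count in votes.items() if count == top]
        # Assumed tie-break: pick the smallest label among the winners.
        fused.append(min(winners))
    return fused
```

For instance, with four rankers voting on three segments, `majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]])` returns `[1, 0, 0]`; the second segment is a 2-2 tie resolved by the assumed rule.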

[Figure: Combining rankings. The input is passed to rankers R1, R2, ..., RN; their outputs O1, O2, ..., ON are combined into a single output ô.]

Experimental Protocol

• 4-fold cross validation

• Development data

– 5,054 videos from 52 movie trailers

• Test data

– 2,342 videos from 26 movie trailers

• Mean Average Precision (MAP)
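
As a reminder of how the evaluation measure works, the sketch below computes average precision for one trailer and averages it over trailers. It uses the standard textbook definition and assumes binary ground-truth labels listed in ranked order; the function names and the input layout are hypothetical, and the official evaluation tool may differ in details.

```python
def average_precision(labels_in_ranked_order):
    """Average precision of one ranked list of binary relevance labels."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels_in_ranked_order, start=1):
        if label:
            hits += 1
            # Precision at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(trailers):
    """Mean of per-trailer average precision.

    trailers: dict mapping a trailer id to its labels in ranked order.
    """
    aps = [average_precision(labels) for labels in trailers.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Averaging per trailer, rather than over all segments at once, means each trailer contributes equally regardless of how many segments it contains.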

Configurations of Runs

Run  Learning-to-Rank Strategy
1    Ranking SVM
2    RankNet
3    RankBoost
4    ListNet
5    Majority Voting

Experimental Results

Results obtained on the development data.

[Figure: MAP (%) of Ranking SVM, RankNet, RankBoost, ListNet, and Majority Voting on the development data; axis from 10 to 20.]

Results of the official submitted runs.

Strategy         MAP (%)
Ranking SVM      18.15
RankNet          16.17
RankBoost        16.17
ListNet          16.56
Majority Voting  14.35

AP per movie trailer achieved in each run.

[Figure: Average Precision (%) per movie trailer (video-52 to video-77) for Ranking SVM, RankNet, RankBoost, ListNet, and Majority Voting; axis from 0 to 70.]

The learning-to-rank algorithms provide complementary information that can be combined by fusion techniques aiming at producing better results.

Remarks

• The proposed approach has explored only visual properties. Different learning-to-rank strategies were considered, including a fusion of all of them.

• Results demonstrate that the proposed approach is promising. Combining learning-to-rank algorithms can contribute to better results.

Future Works

Future work includes investigating a smarter strategy for combining learning-to-rank algorithms and considering other information sources to include more features semantically related to visual content.

Acknowledgements

This research was supported by Brazilian agencies FAPESP, CAPES, and CNPq.

References

[1] J. Almeida, N. J. Leite, and R. S. Torres. Comparison of video sequences with Histograms of Motion Patterns. In ICIP, pages 3673–3676, 2011.

[2] C. J. C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. N. Hullender. Learning to rank using gradient descent. In ICML, pages 89–96, 2005.

[3] Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In ICML, pages 129–136, 2007.

[4] Y. Freund, R. D. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.

[5] T. Joachims. Training linear SVMs in linear time. In ACM SIGKDD, pages 217–226, 2006.

[6] L. Lam and C. Y. Suen. Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans. Systems, Man, and Cybernetics, Part A, 27(5):553–568, 1997.
