Report on Action Recognition using Graph cut
Action Recognition using Graph-Cut (I)
J.Iveel
2014-9-24
Intro
• Goal: recognize human actions in video using a Graph-Cut approach.
• The algorithm has two stages:
– Pre-GraphCut: convert the input video segment S into a graph representation Gs(V,E).
– Post-GraphCut: given Gs(V,E) and the optimal action-category labels Lv for its nodes V (the output of Graph-Cut), select the set of sub-graphs where the action(s) of interest may have occurred.
Some Notation
• “Video segment”, S, refers to a set of local feature points extracted at locations X, each described by a descriptor D:
• “Confidence score” refers to the likelihood of class label l given an observation o:
Pre-GraphCut
• Converting a video segment into a graph representation requires:
(1) Breaking the whole video segment S down into spatio-temporal grid cells. Each grid volume is a node Vi, connected to its neighbors by edges Ei in the graph Gs.
(2) Assigning a confidence score to each node Vi.
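Step (1) can be sketched as follows. This is a minimal illustration, not the implementation behind these slides: the function name `build_grid_graph`, the `cell_size` parameter, the choice to create nodes only for occupied cells, and the axis-aligned neighborhood are all assumptions.

```python
from collections import defaultdict

def build_grid_graph(points, cell_size):
    """Group (x, y, t) feature points into grid cells and link adjacent cells.

    points: iterable of (x, y, t) tuples.
    cell_size: (sx, sy, st), a hypothetical cell extent along each axis.
    Returns (nodes, edges): nodes maps a cell index to the indices of the
    points inside it; edges is a set of pairs of adjacent occupied cells.
    """
    nodes = defaultdict(list)
    for idx, (x, y, t) in enumerate(points):
        cell = (int(x // cell_size[0]),
                int(y // cell_size[1]),
                int(t // cell_size[2]))
        nodes[cell].append(idx)

    edges = set()
    for (i, j, k) in nodes:
        # Look only in the positive direction so each edge is added once.
        for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            neighbor = (i + di, j + dj, k + dk)
            if neighbor in nodes:
                edges.add(((i, j, k), neighbor))
    return dict(nodes), edges

# Toy input: four feature points, cells of 10 x 10 pixels x 5 frames.
points = [(3, 4, 0), (5, 2, 1), (12, 3, 0), (4, 14, 9)]
nodes, edges = build_grid_graph(points, cell_size=(10, 10, 5))
```

In this toy input the first two points share a cell, the third falls in the spatially adjacent cell, and the fourth is isolated, so the graph has three nodes and one edge.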
Node Confidence-Score
• The most challenging problem is (2), assigning a confidence score to each node:
– A node is simply the set of feature points within its grid volume:
– Therefore, node confidence can be defined by an unknown function, g, over the feature points inside the node.
Node Confidence-Score
• The naïve approach is to find a confidence score for each feature point inside the node and accumulate these scores to get the node score:
Then, let us find the feature confidence score, i.e., the likelihood of class l given local feature fj.
Feature Confidence-Score (1)
• Target is to measure:
Feature Confidence-Score (2)
• Constructed a BOV histogram for each test video segment, with centroids C:
• Trained a binary linear SVM to produce a weight vector for class label l:
Feature Confidence-Score (3)
• Given a feature point from a test segment, its confidence score is computed by:
(1) Hard Assignment:
(2) N-Soft Assignment:
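The original formulas for the two assignments are not preserved here, so the sketch below shows only one plausible reading. It assumes each centroid carries one linear-SVM weight for class l, that hard assignment returns the weight of the nearest centroid, and that N-soft assignment averages the weights of the N nearest centroids with a Gaussian kernel. The kernel, `sigma`, and the function names are illustrative assumptions.

```python
import numpy as np

def hard_score(feature, centroids, weights):
    """Weight of the single nearest centroid (hard assignment)."""
    dists = np.linalg.norm(centroids - feature, axis=1)
    return float(weights[np.argmin(dists)])

def soft_score(feature, centroids, weights, n=2, sigma=1.0):
    """Kernel-weighted average weight of the n nearest centroids.

    The Gaussian kernel and sigma are assumptions for illustration.
    """
    dists = np.linalg.norm(centroids - feature, axis=1)
    nearest = np.argsort(dists)[:n]
    kernel = np.exp(-dists[nearest] ** 2 / (2.0 * sigma ** 2))
    return float(np.dot(kernel, weights[nearest]) / kernel.sum())

centroids = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
weights = np.array([0.8, -0.2, -1.0])  # made-up per-centroid SVM weights
feature = np.array([0.1, 0.0])

hard = hard_score(feature, centroids, weights)
soft = soft_score(feature, centroids, weights, n=2)
```

With `n=1` the soft score reduces to the hard score; larger `n` pulls the score toward the weights of the other nearby centroids.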
Experiment: Feature Confidence (1)
• Hard-Assignment case:
Experiment: Feature Confidence (2)
• N-Soft Assignment case:
Node Cost-Value (1)
• The Graph-Cut framework minimizes the total penalty/cost of single nodes and of neighboring nodes, given a node label configuration L:
• The node cost is inversely related to the likelihood or confidence score:
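For reference, the total cost that graph-cut methods minimize is commonly written as a pairwise MRF energy. The symbols D_i (node cost), V_ij (neighborhood cost), and the balancing weight λ are generic notation assumed here and may differ from the notation on the original slide.

```latex
E(L) = \sum_{i \in V} D_i(l_i) \;+\; \lambda \sum_{(i,j) \in E} V_{ij}(l_i, l_j)
```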
Node Cost-Score (2)
• Assuming the node confidence score is the sum of its feature point scores (using hard assignment):
• Considered the following inverse relationships to derive the node cost score:
(1) NLog (Negative Log-Likelihood)
(2) Norm (Negative Normalized Confidence Score)
(3) Naive (Negative Raw Confidence Score)
Method 1: NLog
• Probabilistic interpretation: Platt [1] showed that an SVM confidence score can be interpreted probabilistically by fitting a parametric sigmoid to it:
• Negative log-likelihood: in an MRF (Graph-Cut), cost values are often associated with the negative log of the measurement noise. Similarly, once the confidence values are translated into probabilities, the negative log is applied to derive the cost score:
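In Platt's formulation the sigmoid has two fitted parameters, A and B. Writing s for the node confidence score (a symbol assumed here), the probability and the resulting negative-log cost take the form:

```latex
P(l \mid s) = \frac{1}{1 + \exp(A s + B)}, \qquad
\mathrm{cost}(s) = -\log P(l \mid s) = \log\bigl(1 + \exp(A s + B)\bigr)
```

A is typically negative, so a higher confidence score gives a higher probability and a lower cost.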
Method 2: Norm
• The confidence score is scaled to the range [0, 1]. The cost value is then the negative of the scaled value:
Method 3: Naive
• The cost value is directly associated with the negative of the raw confidence score:
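The three cost definitions (NLog, Norm, Naive) can be compared side by side with a small sketch. The default values `A=-1.0` and `B=0.0`, the min-max scaling used for Norm, and the function names are assumptions for illustration; the slides do not state the actual parameter values or scaling.

```python
import numpy as np

def cost_nlog(scores, A=-1.0, B=0.0):
    """NLog: negative log of a Platt-style sigmoid probability."""
    # -log(1 / (1 + exp(A*s + B))) == log(1 + exp(A*s + B)), computed stably.
    return np.logaddexp(0.0, A * scores + B)

def cost_norm(scores):
    """Norm: negative of the scores after min-max scaling to [0, 1]."""
    span = scores.max() - scores.min()
    if span == 0:
        # All scores equal: scaling is undefined, so return zero cost.
        return np.zeros_like(scores, dtype=float)
    return -(scores - scores.min()) / span

def cost_naive(scores):
    """Naive: negative of the raw confidence scores."""
    return -scores

node_scores = np.array([-1.5, 0.0, 2.0])  # made-up node confidence scores
nlog = cost_nlog(node_scores)
norm = cost_norm(node_scores)
naive = cost_naive(node_scores)
```

All three costs decrease as the confidence score increases. They differ in scale and curvature, which changes how the node costs balance against the neighborhood cost in the Graph-Cut energy.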
Experiment: Node Cost Score (1)
• With default parameters, the Naive approach surprisingly outperforms the other two methods. The worst performance is observed with the Norm method.
• The NLog approach performed worse than I expected. The reason may lie in the tuning parameters, A and B, of the sigmoid equation:
• In particular, the parameter A controls the slope. Let's inspect this parameter's effect on performance.
Experiment: Node Cost Score (2)
• NLog approach: Sigmoid parameter A's effect on the performance
Experiment: Node Cost Score (3)
Num  Method                       Avg. Recognition
1    NLog (optimized parameter)   96.8 %
2    Norm                         95.8 %
3    Naive                        93.5 %
Conclusion
• Future work will explore:
– Alternative construction of the video graph:
● Instead of a predefined grid, use super-voxels to choose node regions.
– Single-feature confidence score:
● Instead of BOF, use the VLAD descriptor to obtain a more discriminative feature representation.
Conclusion
• These slides explored two main questions, both related to the construction of the video graph G, proposed a few methods for each, and evaluated them on the KTH dataset:
– (i) Assigning a confidence score at the feature level
● Soft assignment
● Hard assignment
– (ii) Assigning a confidence score at the node level
● NLog (negative log-likelihood)
● Norm
● Naive