Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data


Robust View-Invariant Representation for Classification

and Retrieval of Image and Video Data

Xu Chen
University of Illinois at Chicago
Electrical and Computer Engineering
March 1, 2010

Outline

Background and Motivation

Related Work

Problem Statement

Expected Contributions:

Null Space Invariants

Tensor Null Space

Localized Null Space

Non-linear Kernel Space Invariants

Bilinear Invariants

Background

Within the last several years, object motion trajectory-based recognition has gained significant interest in diverse application areas including:

sign language gesture recognition, Global Positioning System (GPS) tracking, car navigation systems (CNS), animal mobility experiments, sports video trajectory analysis, and automatic video surveillance.

Motivation

Accurate activity classification and recognition across multiple views is an extremely challenging task.

Object trajectories captured from different viewpoints lead to completely different representations, which can be approximately modeled by affine transformations.

To obtain a view-independent representation, the trajectory data is represented in an affine-invariant feature space.

Related Work

[Stiller, IJCV, 1994]: mathematical formulation of NSI

[Bashir et al., ACM Multimedia, 2006]: curvature scale space (CSS) and centroid distance function (CDF) representations; only works with small camera motions

[Chellappa et al., TIP, 2006]: PCNSA for activity recognition

[Huang et al., TIP, 2008]: correlation tensor analysis

[Chang et al., PAMI, 2008]: kernel methods with multilevel temporal alignment; not view invariant

Problem Statement and Approach

Development of efficient view-invariant representation, indexing/retrieval, and classification techniques for motion-based events.

The null space in a particular basis is invariant under arbitrary affine transformations.

Demonstration of enormous potential in computer vision, especially in motion-event and activity recognition and retrieval.

Null Space Invariants

Let p_i = (x_i, y_i) be a single 2-D point, i = 0, 1, ..., n-1. The motion trajectory can be represented by its n 2-D points in a 3 by n matrix M of homogeneous coordinates (rows x, y, and 1).

The null space H of M satisfies MH = 0,

where each column q of H is an n by 1 vector with Mq = 0, and H is the matrix spanned by n-3 linearly independent basis vectors, with size n by (n-3).
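The invariance claim above can be checked numerically. The following is a minimal sketch (not the authors' implementation), assuming M stacks homogeneous coordinates [x; y; 1] as described: the null space is computed from the SVD, and an arbitrary affine view change leaves the spanned subspace (hence its projector) unchanged.

```python
import numpy as np

def null_space_invariant(traj):
    """Null space H of the 3 x n homogeneous trajectory matrix M.

    traj: (n, 2) array of 2-D points. Returns H of shape (n, n-3)
    with M @ H = 0 (orthonormal basis from the SVD).
    """
    n = traj.shape[0]
    M = np.vstack([traj.T, np.ones(n)])      # 3 x n, rows x, y, 1
    _, _, Vt = np.linalg.svd(M)
    return Vt[3:].T                          # n x (n-3)

rng = np.random.default_rng(0)
traj = rng.normal(size=(25, 2))              # toy trajectory, n = 25

# Arbitrary affine view change: p' = A p + b
A, b = rng.normal(size=(2, 2)), rng.normal(size=2)
traj2 = traj @ A.T + b

H1 = null_space_invariant(traj)
H2 = null_space_invariant(traj2)

# Invariance: the transformed matrix still annihilates the original
# null space, so the spanned subspace (its projector) is identical.
M2 = np.vstack([traj2.T, np.ones(25)])
assert np.allclose(M2 @ H1, 0)
assert np.allclose(H1 @ H1.T, H2 @ H2.T)
```

Note that the SVD basis itself is only unique up to rotation within the subspace, so the comparison is made on the projector H H^T rather than on H directly.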

Null Space Invariants (NSI)

Typically, each element in H is given by:

Null Space based Classification/Retrieval Algorithm

1. Normalize the lengths of the trajectories: take the 2-D FFT, select the N largest coefficients, and then take the 2-D IFFT.

2. Compute the NSI for the normalized raw data and vectorize it: the n by (n-3) NSI H is converted into an n(n-3) by 1 vector.

3. Apply Principal Component Null Space Analysis (PCNSA) on the vectorized NSI.

Various other classification and retrieval algorithms could also be applied to the NSI.
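The pipeline above can be sketched as follows. This is an illustrative stand-in, not the authors' code: the slides normalize length via 2-D FFT coefficient selection, while this sketch uses simple linear-interpolation resampling, which has the same effect of mapping every trajectory to a common length before the NSI is computed and vectorized.

```python
import numpy as np

def resample(traj, N=25):
    """Resample a trajectory to N points by linear interpolation.
    (Stand-in for the slides' Fourier-domain length normalization.)"""
    t_old = np.linspace(0, 1, len(traj))
    t_new = np.linspace(0, 1, N)
    return np.column_stack(
        [np.interp(t_new, t_old, traj[:, d]) for d in range(2)])

def nsi_vector(traj, N=25):
    """Normalize, compute the N x (N-3) NSI, and flatten to N(N-3) x 1."""
    p = resample(traj, N)
    M = np.vstack([p.T, np.ones(N)])         # 3 x N homogeneous matrix
    _, _, Vt = np.linalg.svd(M)
    H = Vt[3:].T                             # N x (N-3) null-space basis
    return H.reshape(-1)                     # vectorized NSI, length N(N-3)

rng = np.random.default_rng(1)
raw = np.cumsum(rng.normal(size=(60, 2)), axis=0)   # toy 60-sample trajectory
feat = nsi_vector(raw)                               # length 25 * 22 = 550
```

The resulting fixed-length vector is what a classifier such as PCNSA then operates on.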

Normalization Example to 25 samples

Details of PCNSA

1. Obtain the PCA space: evaluate the total covariance matrix, then apply PCA to the total covariance matrix to find W(PCA), whose columns are the L leading eigenvectors.

2. Project the data vectors, class means, and class covariance matrices into the corresponding data vectors, class means, and class covariance matrices in the PCA space.

3. Obtain the ANS: find the approximate null space for each class i by choosing the eigenvectors corresponding to the M(i) smallest eigenvalues.

Details of PCNSA

4. Obtain valid classification directions in the ANS: any direction e(i) satisfying the validity condition is called a valid direction and is used to build the valid ANS, W(NSA, i).

5. Classification: PCNSA finds the distance from a query trajectory X to each class: d(X, i) = ||W(NSA, i)(X - m(i))||, where m(i) is the mean of class i. We choose the class with the smallest distance for the classification of X.

6. Retrieval: we compute the distance from the query trajectory Y to any other trajectory X(i) by d(X, i) = ||W(NSA, i)(X(i) - Y)||.
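Steps 1 through 5 can be sketched compactly. This is a hypothetical minimal implementation for illustration only: step 4's validity test is omitted (the slide's condition is not reproduced here), and the function and parameter names are invented for this sketch.

```python
import numpy as np

def pcnsa_fit(X, y, L=10, M=3):
    """Minimal PCNSA sketch. X: (num_samples, dim) features, y: labels.
    Step 4 (validity filtering of ANS directions) is omitted."""
    mu = X.mean(0)
    Xc = X - mu
    # 1. PCA on the total covariance matrix: keep the L leading eigenvectors
    evals, evecs = np.linalg.eigh(np.cov(Xc.T))
    W_pca = evecs[:, np.argsort(evals)[::-1][:L]]
    Z = Xc @ W_pca                           # 2. project into PCA space
    model = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        m = Zc.mean(0)
        ev, U = np.linalg.eigh(np.cov(Zc.T))
        W_nsa = U[:, np.argsort(ev)[:M]]     # 3. M smallest-eigenvalue dirs
        model[c] = (m, W_nsa)
    return W_pca, mu, model

def pcnsa_classify(x, W_pca, mu, model):
    z = (x - mu) @ W_pca
    # 5. distance to each class along its approximate null space
    d = {c: np.linalg.norm(W_nsa.T @ (z - m))
         for c, (m, W_nsa) in model.items()}
    return min(d, key=d.get)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 20)), rng.normal(3, 1, (40, 20))])
y = np.repeat([0, 1], 40)
W_pca, mu, model = pcnsa_fit(X, y, L=10, M=3)
label = pcnsa_classify(X[0], W_pca, mu, model)
```

The retrieval distance of step 6 follows the same pattern, replacing the class mean m(i) with the stored trajectory features.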

Classification Performance

We plot the classification accuracy versus the number of classes, with 20 trajectories in each class (up to 40 classes).

Classification Performance

We plot the classification accuracy versus the number of trajectories per class (up to 40 trajectories in each class).

Retrieval Performance

Precision and recall curves (figure).

To further demonstrate the view-invariance of our system, we populate the CAVIAR dataset with 5 rotated versions of each trajectory in the class, rotating the trajectories by -60, -30, 0, 30, and 60 degrees.

Compared: applying PCNSA to the NSI vs. applying PCA directly to the NSI.

Visual illustration of retrieval results with 20 classes of motion trajectories from the CAVIAR dataset, for the motion events "chase" and "shopping and leave", for fixed cameras from unknown views (query and top-2 retrievals).

Applications of NSI in image retrieval

Face recognition: extract SIFT keypoints as feature points.

The raw data matrix is not necessarily of size 3 by n.

Image retrieval results

Perturbation Analysis

The ratio of the output error (the error on the null space) to the input error (the error on the raw data) is:

Z: the noise matrix on the raw data.

SNR

The ratio of the energy of the signal for the NSI to the energy of the noise on the NSI.

Optimal Sampling

Given the perturbation analysis, we design an optimal sampling strategy.

Uniform sampling and Poisson sampling are utilized.

Arbitrary trajectories in x and y directions:

x=f(t), y=g(t)

Expanding the trajectory in a Maclaurin series.

Optimal Sampling

Property 2: The rate parameter λ = O(N) should be chosen for Poisson sampling to guarantee the convergence of the error ratio, where N is the total number of samples.
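One plausible reading of this sampling scheme (an assumption, not spelled out on the slides) is independent thinning: each of the N trajectory samples is kept with probability λ, which corresponds to a Poisson process of rate λN over the unit parameter interval and hence satisfies Property 2's λ = O(N) requirement. A sketch:

```python
import numpy as np

def poisson_sample(traj, lam=0.8, rng=None):
    """Poisson sampling read as independent thinning: each of the N
    samples is kept with probability lam. Equivalent in expectation to
    a point process of rate lam * N over the trajectory, i.e. a rate
    parameter that is O(N) as required by Property 2."""
    rng = rng if rng is not None else np.random.default_rng(0)
    keep = rng.random(len(traj)) < lam
    return traj[keep]

N = 100
t = np.linspace(0, 1, N)
traj = np.column_stack([t, np.sin(6 * t)])   # toy trajectory
sampled = poisson_sample(traj, lam=0.8)      # roughly 0.8 * N samples survive
```

The value lam=0.8 mirrors the λ = 0.8 used in the experiments reported later in the deck.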


In our framework, the density corresponds to the average number of samples per unit length.

Arbitrary Moving Cameras and Segmented NSI

Fixed cameras from unknown views: all the feature points undergo the same global affine transformation.

Arbitrary moving cameras further compound the classification and retrieval problem: the feature points can undergo different affine transformations.

Computing the null space of segmented trajectories yields higher accuracy: the orientations and translations of adjacent points are very close, so they have more similar null-space representations locally.

Overlapping segmentation and non-overlapping segmentation. (Assumption)
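The segmented variant can be sketched as follows, assuming (as an illustration, not the authors' code) that each segment's local NSI is computed exactly like the global one, just on a short window:

```python
import numpy as np

def segmented_nsi(traj, seg_len=5, overlap=True):
    """Local null spaces on (optionally overlapping) segments.
    Each length-K segment gives a K x (K-3) local NSI; nearby points
    undergo nearly the same local transformation, so local null
    spaces are more stable under camera motion than one global one."""
    step = 1 if overlap else seg_len
    local_ns = []
    for start in range(0, len(traj) - seg_len + 1, step):
        p = traj[start:start + seg_len]
        M = np.vstack([p.T, np.ones(seg_len)])   # 3 x K homogeneous block
        _, _, Vt = np.linalg.svd(M)
        local_ns.append(Vt[3:].T)                # K x (K-3) local basis
    return local_ns

traj = np.cumsum(np.random.default_rng(4).normal(size=(30, 2)), axis=0)
local_ns = segmented_nsi(traj, seg_len=5, overlap=True)
```

With overlap, a 30-sample trajectory and segments of length 5 yield 26 local 5 x 2 null-space blocks; without overlap, 6 disjoint blocks.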

Retrieval example for the event "entering the shop": query with rank-1 and rank-2 results, comparing the global NSI against overlapping segmentation (segments of length 5, first 16 NSI).

Optimal sampling: the same trajectory has different representations due to camera motion.

Example of the trajectory "all" and its affine versions with and without Poisson sampling (λ = 0.8). The null-space representations with Poisson sampling (right) are more similar than those without sampling (left): Poisson sampling greatly attenuates the noise effects.

Comparison

Classification accuracy and retrieval time (in seconds) on the ASL dataset: 20 classes with 40 trajectories in each class.

Tensor Null Space (TNSI)

Fundamental mathematical framework for tensor NSI

View-invariant classification and retrieval of multiple motion trajectories.

Definition of Tensor Null Space

Conditions for rotational invariance:

Applying an affine transformation T(m) on the mth unfolding of the multi-dimensional data M: if the resulting tensor null space Q is invariant in the mth dimension, it is referred to as mode-m invariant.

M(1), M(2), M(3) are the unfoldings of the three-dimensional tensor along its different dimensions: M(1) is I1 by I2·I3, M(2) is I2 by I1·I3, and M(3) is I3 by I1·I2.
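The unfoldings can be written in a few lines. A small sketch (column-ordering conventions for unfoldings vary in the literature; only the shapes stated above are asserted here):

```python
import numpy as np

def unfold(T, mode):
    """Mode-m unfolding of a 3-way tensor: mode 0 gives an
    I1 x (I2*I3) matrix, mode 1 gives I2 x (I1*I3), mode 2 gives
    I3 x (I1*I2). Modes are 0-based; the slides' M(1) is unfold(T, 0)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

T = np.arange(2 * 3 * 4.0).reshape(2, 3, 4)   # I1=2, I2=3, I3=4
assert unfold(T, 0).shape == (2, 12)          # M(1): I1 x I2*I3
assert unfold(T, 1).shape == (3, 8)           # M(2): I2 x I1*I3
assert unfold(T, 2).shape == (4, 6)           # M(3): I3 x I1*I2
```

The mode-m tensor null space is then obtained from the null space of the corresponding unfolding, just as in the matrix case.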

Definition of Tensor Null Space

Conditions for translation invariance for tensor null space:

Due to the invariance of rotation,

Motion Event Tensor

We align each trajectory as two rows in a matrix according to x and y coordinates, and the number of rows of the matrix is set to twice the number of objects in the motion event under analysis.

P: the length of normalized trajectories

J: twice the number of trajectories

K: Number of video samples

Simulation results for TNSI

The accuracy of the proposed classification system versus the number of classes, with 20 tensors in each class. Simulation results show that our system preserves its efficiency even for a higher number of classes (J = 3, three trajectories in each clip; P = 18, the length of the trajectories; K = 20 video clips; unfolding in K).

Accuracy values versus increase in the number of tensors within a class. There are 20 classes in the system.

Localized Null Space

Consider view-invariant video classification and retrieval with partial queries over a dynamic video database.

Efficient updating and downdating procedures for the representation of dynamic video databases.

Localized Null Space is one of the ways to solve the problem.

Localized Null Space (LNS)

Localized Null Space relies on different key points in different segments.

Localized Null Space

Structure of Localized Null Space

Illustration of the structure of the traditional Null Space and the proposed Localized Null Space: the traditional null space is an N by (N-3) matrix whose basis vectors each have 3 non-zero elements; the proposed localized null space splits into blocks, with K-3 non-zero basis vectors for W1 (over a segment of length K) and N-K-3 for W2 (over the remaining N-K samples), and zero elements elsewhere.

Splitting of Raw Data Space

Deterministic splitting: the length of the feature vector and the key points are known to the users; LNS provides a perfect solution.

Random splitting: the length of the feature vector and the key points are not available to the users; the splitting and key points must be estimated.

Optimal Splitting

where D is the distortion for random splitting, given by

and P(L) is the distribution of segments of length L, and K is the optimal segmentation length. Solving the minimization problem, we obtain

Optimal Key Point Selection within Each Segment

where C is the probability that all the key points lie in the given range.

Benefits of LNS

The localized null space can be viewed as consisting of multiple subspaces and can therefore be dynamically split for the retrieval of partial queries.

Localized Null Space can be used to merge multiple Null Spaces into an integrated Null Space.

Localized Null Space has the same complexity as the traditional null space.

LNS Example

Visual illustration of the facial image B and part of the rotated image A with identical localized null-space representations.

Non-linear Kernel Space Invariants (NKSI)

Invariance to non-linear transformations.

Relies on Taylor expansions to approximate the non-linear transformations with linear transformations.

Application: Standard Perspective Transformation

Non-linear Kernel Space Invariants (NKSI)

When k=2

Non-linear Kernel Space Invariants (NKSI)

Standard Perspective Transformation

Unequal multiple trajectory representation

Bilinear Invariants

AXB = 0, where A and B are raw data matrices and X is the invariant basis.

When A and B are subject to different linear transformations from the left and right sides respectively, X remains invariant.
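The solution set of AXB = 0 can be computed via the vec identity vec(AXB) = (B^T ⊗ A) vec(X), i.e. as the null space of a Kronecker product. A sketch (illustrative, not the authors' implementation), which also checks the invariance claim: if T acts on A from the left and S on B from the right, TAXBS = T(AXB)S = 0 still holds.

```python
import numpy as np

def bilinear_null_basis(A, B, tol=1e-10):
    """Orthonormal basis of {vec(X) : A X B = 0}, using
    vec(A X B) = (B^T kron A) vec(X) with column-major vec."""
    K = np.kron(B.T, A)                      # maps vec(X) to vec(A X B)
    _, s, Vt = np.linalg.svd(K)
    rank = int((s > tol).sum())
    return Vt[rank:].T                       # each column is one vec(X)

rng = np.random.default_rng(3)
A = rng.normal(size=(2, 4))                  # raw data matrix A (p x q)
B = rng.normal(size=(5, 3))                  # raw data matrix B (r x s)

V = bilinear_null_basis(A, B)
X = V[:, 0].reshape(4, 5, order="F")         # one solution, q x r
assert np.allclose(A @ X @ B, 0)

# Different invertible transforms on A (left) and B (right)
# leave the solution space unchanged: T A X B S = 0 still holds.
T, S = rng.normal(size=(2, 2)), rng.normal(size=(3, 3))
assert np.allclose((T @ A) @ X @ (B @ S), 0)
```

Because A and B may have unrelated sizes, this formulation accommodates trajectories of unequal lengths, matching the retrieval application named below.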

Bilinear Invariants

Retrieval of unequal multiple trajectories

Conclusion

Null Space: an effective and robust tool for classification and retrieval of motion events.

Segmentation of the null space can further improve performance for arbitrarily moving cameras.

Tensor Null Space: higher-order data.

Conclusion

Localized Null Space: dynamic updating of the database; partial queries, splitting and merging of null spaces.

Non-linear Kernel Space Invariants: invariance to non-linear transformations.

Bilinear Invariants: suitable for different lengths of features and different dimensions of raw data.

Publications

Journal Papers:

1. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Localization and trajectory estimation of mobile object using minimum samples," IEEE Transactions on Vehicular Technology (TVT), volume 8, issue 9, 2009, pp. 4439-4446.

2. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Null Space Invariants: Part I: View Invariant Motion Trajectory Analysis and Image Classification and Retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), revised, 2009.

3. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Null Space Invariants: Part II: Localized Null Space Representation for Dynamic Image and Video Databases," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), submitted, 2009.

Publications

Conference Papers:

1. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Localization and trajectory estimation of mobile object with a single sensor," IEEE Statistical Signal Processing Workshop (SSP'07), Madison, Wisconsin, 2007.

2. Eser Ustunel, Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Null space representation for view-invariant motion trajectory classification-recognition and indexing-retrieval," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'08), Las Vegas, Nevada, 2008.

3. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Robust closed-form localization of mobile targets using a single sensor based on a non-linear measurement model," IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC'08), Recife, Pernambuco, Brazil, 2008.

4. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Robust null space representation and sampling for view invariant motion trajectory analysis," IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'08), Anchorage, Alaska, 2008.

5. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Robust multi-dimensional null space representation for image retrieval and classification," IEEE International Conference on Image Processing (ICIP'08), San Diego, California, 2008.

6. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "View-Invariant Tensor Null Space Representation for Multiple Motion Trajectory Retrieval and Classification," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'09) (invited paper), Taipei, Taiwan, 2009.

Publications

Conference Papers:

7. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Localized Null-Space Representation for Dynamic Updating and Downdating in Image and Video Databases," IEEE International Conference on Image Processing (ICIP'09), Cairo, Egypt, 2009.

8. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Null space representation for view-invariant motion trajectory classification-recognition and indexing-retrieval," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'10), Dallas, Texas, 2010.

Thanks!

Questions?
