Video Fingerprinting and Applications: A Review

Jian Lu, Vobile, Inc.

Media Forensics & Security Conference EI’09, San Jose, CA

From Research to Applications

1999 1999 2008

What Is Video Fingerprinting?

•  A video fingerprint is a unique identifier extracted from video content.
   –  Video fingerprints are often just strings of bits representing some “signatures” of the video content, and are usually not of fixed length.
   –  Video fingerprinting refers to the process of extracting fingerprints from the video content.
   –  Compared to watermarking, fingerprinting does not add to or alter the video content.
   –  Also known as “robust hashing”, “perceptual hashing”, or “content-based copy detection (CBCD)” in the research literature.

Human vs. Video Fingerprint

Human Fingerprint | Video Fingerprint
Uniquely identify human | Uniquely identify video
Physical form | Digital form
Pictorial | Time-based binary

Identification by Fingerprint

Video identification

Human identification

Video Fingerprinting Algorithms

Desired Properties

•  Robust
   –  Largely invariant for the same content under various types of processing, conversion, and manipulation.
•  Discriminating
   –  Distinctly different for different content.
•  Compact
   –  Low data rate.
•  Low complexity
   –  Fast fingerprint generation and matching.

Types of Video Signatures

Granularity, by signature type:

•  Spatial signatures: whole frame; blocks or other types of subdivision; points of interest
•  Temporal signatures: group of frames; down-sampled frames; key frames; every frame
•  Color signatures: bins of histograms
•  Transform-domain signatures: 3D transforms on GOP; frame transforms

Variants of Spatial Signatures

•  Block-based
   –  Quantized mean block intensity
   –  Luminance block patterns ✪
      •  Ordinal ranking of average block intensity
   –  Differential luminance block patterns ✪
      •  Centroid of gradient orientations
      •  Dominant edge orientation
•  Points-of-interest
   –  Corner features (Harris points)
   –  Scale-space features
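The ordinal-ranking variant of the block-based signature can be sketched as follows. This is a minimal illustration, not the method of any particular system: the frame format (a list of rows of luminance values), the 3x3 default grid, and the function names `block_means` and `ordinal_signature` are all assumptions made for the example.

```python
from typing import List, Sequence

# Assumed frame format: rows of integer luminance values.
Frame = Sequence[Sequence[int]]


def block_means(frame: Frame, rows: int, cols: int) -> List[float]:
    """Average luminance of each block in a rows x cols grid, in raster order.

    Assumes the frame has at least `rows` lines and `cols` columns,
    so that no block is empty.
    """
    height, width = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            total = sum(frame[y][x] for y in range(y0, y1) for x in range(x0, x1))
            means.append(total / ((y1 - y0) * (x1 - x0)))
    return means


def ordinal_signature(frame: Frame, rows: int = 3, cols: int = 3) -> List[int]:
    """Rank of each block's mean intensity (0 = darkest).

    Ties are broken by block index so the result is deterministic.
    """
    means = block_means(frame, rows, cols)
    order = sorted(range(len(means)), key=lambda i: (means[i], i))
    ranks = [0] * len(means)
    for rank, block_index in enumerate(order):
        ranks[block_index] = rank
    return ranks
```

Because only the rank order of the block means is kept, any monotonic change in brightness or contrast leaves the signature unchanged, which is one reason ordinal measures are attractive for robustness.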

An Example of Spatial Signature

Variants of Temporal Signatures

•  Temporal luminance patterns
   –  Ordinal ranking of average frame or block intensity in a group of frames
•  Temporal differential luminance patterns ✪
   –  Sum of absolute pixel or block differences, quantized and thresholded
   –  Block motion vectors: histogram of quantized directions
•  Shot duration sequence
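A temporal differential luminance pattern based on the sum of absolute pixel differences might look like the sketch below. It assumes the simplest possible quantization (one bit per frame pair, set when the mean absolute difference exceeds a threshold); the function names and the threshold value are illustrative, not taken from the source.

```python
from typing import List, Sequence

# Assumed frame format: rows of integer luminance values.
Frame = Sequence[Sequence[int]]


def frame_sad(a: Frame, b: Frame) -> int:
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def temporal_diff_bits(frames: Sequence[Frame], threshold: float) -> List[int]:
    """One bit per consecutive frame pair.

    The bit is 1 when the mean absolute difference per pixel exceeds
    `threshold` (an assumed 1-bit quantization), else 0.
    """
    bits = []
    for prev, cur in zip(frames, frames[1:]):
        pixels = len(cur) * len(cur[0])
        bits.append(1 if frame_sad(prev, cur) / pixels > threshold else 0)
    return bits
```

A sequence of F frames yields F-1 bits, so the resulting bit string is time-based and grows with the length of the video.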

Color Signatures

•  Histogram-based
   –  Level-quantized histogram, e.g., (32, 16, 16) for Y, U, V, followed by magnitude quantization on each bin ✪
   –  Level-quantized histogram, followed by ordinal ranking of histogram bins by magnitude
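The second histogram variant (level quantization followed by ordinal ranking of bins) can be illustrated as below. The sketch assumes one separate histogram per channel with 32, 16, and 16 levels for Y, U, and V, and 8-bit samples; the slide does not say whether the histograms are separate or joint, so that choice, like the function names, is an assumption.

```python
from typing import List, Sequence, Tuple


def channel_histogram(values: Sequence[int], levels: int, max_value: int = 255) -> List[int]:
    """Histogram of samples quantized uniformly into `levels` bins."""
    hist = [0] * levels
    for v in values:
        hist[v * levels // (max_value + 1)] += 1
    return hist


def rank_bins(hist: Sequence[int]) -> List[int]:
    """Ordinal rank of each bin by magnitude (0 = most populated).

    Ties are broken by bin index so the result is deterministic.
    """
    order = sorted(range(len(hist)), key=lambda i: (-hist[i], i))
    ranks = [0] * len(hist)
    for rank, bin_index in enumerate(order):
        ranks[bin_index] = rank
    return ranks


def color_signature(
    y: Sequence[int],
    u: Sequence[int],
    v: Sequence[int],
    levels: Tuple[int, int, int] = (32, 16, 16),
) -> List[List[int]]:
    """Ranked per-channel histograms for one frame (Y, U, V sample lists)."""
    return [rank_bins(channel_histogram(ch, n)) for ch, n in zip((y, u, v), levels)]
```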

Transform-Domain Signatures

•  Affine transformation resilient
   –  Polar Fourier transform
   –  Radon transform ✪
   –  Singular Value Decomposition
•  Energy compaction
   –  3D DCT
   –  3D Wavelet transform

Which One to Use?

•  Spatial signatures, particularly block-based, are the overall category winner, and most widely used.

•  Temporal and color signatures are less robust, but can be used along with spatial signatures to enhance discriminability.

•  Transform-domain signatures are computationally expensive and not widely used in practice.

•  The weakness of block-based spatial signatures is their lack of resilience against excessive geometric distortion, e.g., rotation and cropping.

Challenges of Geometric Distortions

Original

Rotation by 10 degrees Rotation + Cropping

Fingerprinting Performance

•  Video fingerprint using block-based spatial signatures
   –  Data size: a few hundred bits per frame, or <10 Kbps
   –  Speed: 1/10 of playback time (10x real time) or faster for standard-definition video
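The relation between the per-frame size and the data rate is simple arithmetic, sketched here. The specific inputs in the usage note (300 bits per frame, 30 frames per second, a 90-minute video) are illustrative assumptions, not figures from the source.

```python
def fingerprint_bitrate_kbps(bits_per_frame: int, frames_per_second: float) -> float:
    """Fingerprint data rate in Kbps for a given per-frame size and frame rate."""
    return bits_per_frame * frames_per_second / 1000.0


def extraction_minutes(video_minutes: float, speedup: float = 10.0) -> float:
    """Time to fingerprint a video when extraction runs `speedup` times real time."""
    return video_minutes / speedup
```

For example, 300 bits per frame at 30 frames per second gives 9 Kbps, which is consistent with the "<10 Kbps" figure, and a 90-minute video processed at 10x real time takes 9 minutes.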

Fingerprint Matching and Search

Similarity Measures

•  Distance-based ✪
   –  L1 (Manhattan) or L2 (Euclidean) distance
      •  For non-binary signatures
      •  Weights can be assigned when multiple signatures are used
   –  Hamming distance
      •  For binary signatures
•  Probability-based
   –  Probabilistic models for common distortion vectors
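The distance-based measures are short enough to write out. The sketch below assumes binary signatures are packed into Python integers and non-binary signatures are numeric sequences; `weighted_distance` is a hypothetical helper showing one way weights could combine the distances of several signature types.

```python
import math
from typing import Sequence


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary signatures packed as ints."""
    return bin(a ^ b).count("1")


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Manhattan distance between two non-binary signatures."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def l2_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two non-binary signatures."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def weighted_distance(distances: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-signature distances (e.g., spatial, temporal, color)."""
    return sum(w * d for w, d in zip(weights, distances))
```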

Complexity of Fingerprint Search

•  Exhaustive search has linear complexity, or O(K*N)
   –  N is the size of the reference fingerprint DB, in minutes or hours.
   –  K is the length of the query video.
   –  N can be further decomposed into M*L
      •  M is the number of reference video fingerprints in the DB
      •  L is the average length of the video fingerprints in the DB
•  The curse is on N or M, the DB size.
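The O(K*N) cost is easiest to see in a brute-force matcher that slides the query over every offset of every reference fingerprint. This is a sketch under assumed representations: each fingerprint is a list of per-frame binary signatures packed as ints, the references live in a dict keyed by a hypothetical content ID, and Hamming distance is the similarity measure.

```python
from typing import Dict, List, Optional, Tuple


def exhaustive_search(
    query: List[int], references: Dict[str, List[int]]
) -> Tuple[Optional[str], int, float]:
    """Return (reference id, frame offset, distance) of the best alignment.

    Every offset in every reference is compared against all K query frames,
    so the work is proportional to K times the total reference length N.
    """
    best: Tuple[Optional[str], int, float] = (None, -1, float("inf"))
    k = len(query)
    for ref_id, ref in references.items():
        for offset in range(len(ref) - k + 1):
            window = ref[offset:offset + k]
            dist = sum(bin(q ^ r).count("1") for q, r in zip(query, window))
            if dist < best[2]:
                best = (ref_id, offset, dist)
    return best
```

Doubling the number of reference videos M doubles the number of windows visited, which is why the database size, rather than the query length, dominates the cost.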

Strategies for Fast Search

Strategies | Fingerprint Search | Motion Vector Search
Reduce search space ✪ | LSH |
Greedy search | Sequential alignment | Hierarchical search
Early exit | Hamming distance > T | SAD > T
Approximation in distance calculation | Frame down-sampling | Block down-sampling
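Two of the fingerprint-search strategies, early exit on Hamming distance > T and frame down-sampling, can be combined in a single distance routine. The sketch assumes per-frame binary signatures packed as ints; the function name, the `step` parameter, and returning `None` for a rejected candidate are illustrative choices.

```python
from typing import Optional, Sequence


def distance_with_early_exit(
    query: Sequence[int],
    candidate: Sequence[int],
    threshold: int,
    step: int = 1,
) -> Optional[int]:
    """Accumulated Hamming distance, or None once it exceeds `threshold`.

    `step` > 1 compares only every step-th frame (frame down-sampling),
    trading accuracy for speed.
    """
    total = 0
    for q, c in zip(query[::step], candidate[::step]):
        total += bin(q ^ c).count("1")
        if total > threshold:
            return None  # early exit: this candidate cannot match
    return total
```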

Locality Sensitive Hashing (LSH)

•  Consider the ε-NNS (approximate nearest neighbor search) problem:
   –  For a query point q, find an approximate point p such that d(q,p) < (1+ε) d(q,P), where d(q,P) is the distance from q to its nearest neighbor in the point set P
   –  LSH guarantees that p can be found, with high probability, in O(N^{1/(1+ε)})
•  Geometric reasoning:
   –  Close points in space are likely to be close after hashing (e.g., a projection onto a lower dimensional space)
   –  By using multiple hash functions, the probability of close points falling close is increased
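One concrete LSH family for binary fingerprints is bit sampling: each hash function reads a fixed random subset of bit positions, which is a projection onto a lower dimensional space. The sketch below is an illustration of that idea, not the scheme of any specific system; the class name, the parameter defaults, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict
from typing import List, Optional, Tuple


class BitSamplingLSH:
    """Bit-sampling LSH index for fixed-width binary fingerprints packed as ints."""

    def __init__(self, num_bits: int, bits_per_key: int = 8,
                 num_tables: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each table projects a fingerprint onto its own random bit positions.
        self.positions: List[List[int]] = [
            rng.sample(range(num_bits), bits_per_key) for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    @staticmethod
    def _key(fingerprint: int, positions: List[int]) -> Tuple[int, ...]:
        return tuple((fingerprint >> p) & 1 for p in positions)

    def add(self, item_id: str, fingerprint: int) -> None:
        for positions, table in zip(self.positions, self.tables):
            table[self._key(fingerprint, positions)].append((item_id, fingerprint))

    def query(self, fingerprint: int) -> Optional[Tuple[str, int]]:
        """Closest (item_id, Hamming distance) among bucket collisions, or None."""
        best: Optional[Tuple[str, int]] = None
        seen = set()
        for positions, table in zip(self.positions, self.tables):
            for item_id, stored in table.get(self._key(fingerprint, positions), []):
                if item_id in seen:
                    continue
                seen.add(item_id)
                dist = bin(fingerprint ^ stored).count("1")
                if best is None or dist < best[1]:
                    best = (item_id, dist)
        return best
```

Only fingerprints that collide with the query in at least one table are compared exactly, so the search space shrinks; using several tables raises the chance that a near-duplicate collides somewhere. A near match can still be missed, which is the "with high probability" part of the guarantee.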

Other Approximation Techniques

•  Multi-resolution coarse-to-fine search
   –  Fine-level search can be terminated (early exit) if the coarse-level search is far off.
   –  Rank candidates by coarse-level search scores and take only the top N candidates for fine-level search.
•  Adaptive hashing (“learning to hash”)
   –  Hashing is non-deterministic; the system is trained to adapt to the identification task and data.
   –  A substantial reduction in search space.

Applications

UGC & P2P – copyright concerns?

•  UGC traffic in 07/2007 (Source: comScore, November 30, 2007)
   –  70 million people viewed 2.5 billion videos on YouTube.com (39.4% of total UGC audience)
   –  38 million people viewed 360 million videos on MySpace.com (22.6% of total UGC audience)
•  P2P traffic in 2007 (Source: iPoque, November 28, 2007)
   –  On average 50-60% of total Internet traffic: 49% in the Middle East; 83% in Eastern Europe.
   –  BitTorrent 66.7%, eDonkey 28.6% of total P2P traffic

P2P UGC

Video Content Registration

•  A reference video fingerprint database is pre-populated.
•  Two types of information are stored with video fingerprint data in the reference database
   –  Metadata, e.g., title, owner, release date
   –  Business rules, e.g., allow, filter, or advertise, possibly based on certain conditions
      •  MovieLabs’ Content Recognition Rules (CRR) is an industry standard interface for expressing and exchanging rules.
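A reference database entry pairing fingerprint data with metadata and a business rule could be modeled as below. This is a hypothetical in-memory structure for illustration only: the class and field names are assumptions, rule conditions are omitted, and it does not follow the CRR format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The three example rules named on the slide.
ALLOWED_RULES = {"allow", "filter", "advertise"}


@dataclass
class ReferenceEntry:
    fingerprint: List[int]                                   # per-frame signatures
    metadata: Dict[str, str] = field(default_factory=dict)   # title, owner, ...
    rule: str = "filter"                                     # action on a match

    def __post_init__(self) -> None:
        if self.rule not in ALLOWED_RULES:
            raise ValueError(f"unknown business rule: {self.rule}")


class ReferenceDatabase:
    """Hypothetical registry mapping a content ID to its reference entry."""

    def __init__(self) -> None:
        self._entries: Dict[str, ReferenceEntry] = {}

    def register(self, content_id: str, entry: ReferenceEntry) -> None:
        self._entries[content_id] = entry

    def lookup(self, content_id: str) -> Optional[ReferenceEntry]:
        return self._entries.get(content_id)
```

After a query is matched to a content ID, the stored rule tells the host what to do with the upload, and the metadata identifies what was matched.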

Video Content Filtering

Video Content Tracking

Example: Video Content Tracking

Tracking Olympic Video Distribution

Other Applications

•  Broadcast monitoring
   –  Audit TV program and commercial airings
•  Contextual ads (monetization)
   –  Pair ads with identified content, like Google AdSense
•  Video asset management
   –  Content-based IDs identify linkage between edits and sources
•  Content-based video search
   –  Query by video clip

Summary

•  Research in video fingerprinting began a decade ago; it has since developed into a technology and been adopted by the industry.

•  Different types of signatures are used to form a video fingerprint, including spatial, temporal, color, and transform-domain signatures.

•  Spatial signatures are the overall winner judged by multiple criteria, and are widely adopted as primary signatures; temporal and color signatures can be used as secondary signatures to enhance discriminability.

•  Brute-force, exhaustive fingerprint search is an O(K*N) problem.

•  Fast approximate algorithms make fingerprint search tractable and scalable for practical applications.

•  Current applications focus on copyright enforcement; other applications under development and experimentation include contextual advertising, asset management, and content-based video search.
