This presentation reviews developments in video fingerprinting technology over the past decade and its applications in content identification.
Video Fingerprinting and Applications: A Review
Jian Lu, Vobile, Inc.
Media Forensics & Security Conference EI’09, San Jose, CA
From Research to Applications
[Figure: timeline, 1999 to 2008]
What Is Video Fingerprinting?
• A video fingerprint is a unique identifier extracted from video content.
  – Video fingerprints are often just strings of bits representing “signatures” of the video content, and are usually of variable length.
  – Video fingerprinting refers to the process of extracting fingerprints from the video content.
  – Compared with watermarking, fingerprinting does not add to or alter the video content.
  – Also known as “robust hashing”, “perceptual hashing”, or “content-based copy detection (CBCD)” in the research literature.
Human vs. Video Fingerprint
Human Fingerprint | Video Fingerprint
Uniquely identifies a person | Uniquely identifies a video
Physical form | Digital form
Pictorial | Time-based binary
Identification by Fingerprint
[Figure: human identification vs. video identification]
Video Fingerprinting Algorithms
Desired Properties
• Robust
  – Largely invariant for the same content under various types of processing, conversion, and manipulation.
• Discriminating
  – Distinctly different for different content.
• Compact
  – Low data rate.
• Low complexity
  – Fast fingerprint generation and matching.
Types of Video Signatures
• Spatial signatures
• Temporal signatures
• Color signatures
• Transform-domain signatures
Granularity (figure): whole frame; blocks or other types of subdivision; down-sampled frames; points of interest; key frames; every frame; group of frames; bins of histograms; frame transforms; 3D transforms on a GOP
Variants of Spatial Signatures
• Block-based
  – Quantized mean block intensity
  – Luminance block patterns ✪
    • Ordinal ranking of average block intensity
  – Differential luminance block patterns ✪
  – Centroid of gradient orientations
  – Dominant edge orientation
• Points of interest
  – Corner features (Harris points)
  – Scale-space features
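The ordinal-ranking variant of block-based luminance patterns can be sketched as follows. This is a minimal illustration, not the method used in the presentation: it assumes a frame given as a 2-D list of 8-bit luminance values, a hypothetical 2x2 block grid (`rows`, `cols`), and frame dimensions divisible by the grid. Because only the rank order of the block means is kept, a uniform brightness or contrast change leaves the signature unchanged.

```python
def block_means(frame, rows=2, cols=2):
    """Average intensity of each block in a rows x cols grid (row-major order).

    Assumes `frame` is a 2-D list whose height and width are divisible
    by `rows` and `cols` (hypothetical grid size for illustration).
    """
    h, w = len(frame), len(frame[0])
    bh, bw = h // rows, w // cols
    means = []
    for r in range(rows):
        for c in range(cols):
            total = sum(frame[y][x]
                        for y in range(r * bh, (r + 1) * bh)
                        for x in range(c * bw, (c + 1) * bw))
            means.append(total / (bh * bw))
    return means


def ordinal_signature(frame, rows=2, cols=2):
    """Rank of each block's mean intensity (0 = darkest block)."""
    means = block_means(frame, rows, cols)
    # Block indices sorted from darkest to brightest; stable sort breaks ties by index.
    order = sorted(range(len(means)), key=lambda i: means[i])
    ranks = [0] * len(means)
    for rank, block in enumerate(order):
        ranks[block] = rank
    return ranks
```

For example, with the frame `[[10, 10, 200, 200], [10, 10, 200, 200], [50, 50, 120, 120], [50, 50, 120, 120]]` the block means are 10, 200, 50, and 120, so the signature is `[0, 3, 1, 2]`; brightening every pixel with `int(v * 1.1) + 5` yields the same signature.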
An Example of a Spatial Signature
Variants of Temporal Signatures
• Temporal luminance patterns
  – Ordinal ranking of average frame or block intensity in a group of frames
• Temporal differential luminance patterns ✪
  – Sum of absolute pixel or block differences, quantized and thresholded
  – Block motion vectors: histogram of quantized directions
• Shot duration sequence
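A temporal differential luminance signature built from quantized, thresholded frame differences might look like the sketch below. It is an illustration under assumptions: frames are equal-sized 2-D lists of luminance values, differences are taken at pixel level (block means could be used the same way), and the `thresholds` list is a hypothetical tuning parameter.

```python
from bisect import bisect_right


def frame_sad(a, b):
    """Sum of absolute pixel differences between two equal-sized frames."""
    return sum(abs(pa - pb)
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def temporal_signature(frames, thresholds):
    """Quantize the SAD of each consecutive frame pair into a level.

    `thresholds` must be sorted ascending. The level is the number of
    thresholds at or below the SAD, so a single threshold yields one bit
    per transition (0 = little change, 1 = large change).
    """
    return [bisect_right(thresholds, frame_sad(prev, cur))
            for prev, cur in zip(frames, frames[1:])]
```

With one threshold the output is a bit string; adding thresholds trades a larger fingerprint for finer temporal detail.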
Color Signatures
• Histogram-based
  – Level-quantized histogram, e.g., (32, 16, 16) levels for Y, U, V, followed by magnitude quantization of each bin ✪
  – Level-quantized histogram, followed by ordinal ranking of histogram bins by magnitude
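The two histogram-based variants can be sketched as follows. This assumes 8-bit (Y, U, V) pixel tuples and separate per-channel histograms using the (32, 16, 16) level counts from the slide (a joint 3-D histogram is another possible reading); the 4-step magnitude quantizer is a hypothetical choice.

```python
LEVELS = (32, 16, 16)  # levels for Y, U, V, as in the slide's example


def yuv_histograms(pixels, levels=LEVELS):
    """Per-channel histograms of 8-bit (Y, U, V) pixels, normalized to sum to 1.

    Assumes `pixels` is a non-empty list of (y, u, v) tuples in 0..255.
    """
    hists = [[0.0] * n for n in levels]
    for pixel in pixels:
        for ch, (value, n) in enumerate(zip(pixel, levels)):
            hists[ch][value * n // 256] += 1  # level quantization
    total = len(pixels)
    return [[count / total for count in h] for h in hists]


def magnitude_quantize(hist, steps=4):
    """Variant 1: map each normalized bin value in [0, 1] to a level 0..steps-1."""
    return [min(int(v * steps), steps - 1) for v in hist]


def ordinal_bins(hist):
    """Variant 2: bin indices ranked from largest to smallest magnitude."""
    return sorted(range(len(hist)), key=lambda i: -hist[i])
```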
Transform-Domain Signatures
• Affine-transformation resilient
  – Polar Fourier transform
  – Radon transform ✪
  – Singular value decomposition
• Energy compaction
  – 3D DCT
  – 3D wavelet transform
Which One to Use?
• Spatial signatures, particularly block-based ones, are the overall winner and the most widely used.
• Temporal and color signatures are less robust, but can be used along with spatial signatures to enhance discriminability.
• Transform-domain signatures are computationally expensive and not widely used in practice.
• The weakness of block-based spatial signatures is their lack of resilience against excessive geometric distortion, e.g., rotation and cropping.
Challenges of Geometric Distortions
[Figure: original frame; rotation by 10 degrees; rotation + cropping]
Fingerprinting Performance
• Video fingerprints using block-based spatial signatures
  – Data size: a few hundred bits per frame, or under 10 kbps
  – Speed: 1/10 of playback time (10x real time) or faster for standard-definition video
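A quick back-of-the-envelope check shows how the per-frame size translates into the quoted data rate and storage. The figures of 300 bits per frame and 30 frames per second are assumptions chosen to illustrate "a few hundred bits per frame" for standard-definition video, not measurements from the presentation.

```python
BITS_PER_FRAME = 300      # assumption: "a few hundred bits per frame"
FRAMES_PER_SECOND = 30    # assumption: NTSC-like standard-definition frame rate
SPEEDUP = 10              # "1/10 playback time (10x RT)"

bitrate_bps = BITS_PER_FRAME * FRAMES_PER_SECOND       # fingerprint data rate
bytes_per_hour = bitrate_bps * 3600 // 8               # storage per hour of video
minutes_to_fingerprint_one_hour = 60 / SPEEDUP         # extraction time at 10x RT

print(bitrate_bps, bytes_per_hour, minutes_to_fingerprint_one_hour)
# 9000 4050000 6.0
```

At that rate, a hypothetical 1,000-hour reference library would need roughly 4 GB of fingerprint storage.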
Fingerprint Matching and Search
Similarity Measures
• Distance-based ✪
  – L1 (Manhattan) or L2 (Euclidean) distance
    • For non-binary signatures
    • Weights can be assigned when multiple signatures are used
  – Hamming distance
    • For binary signatures
• Probability-based
  – Probabilistic models for common distortion vectors
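The distance-based measures above can be written in a few lines. This is a minimal sketch: non-binary signatures are assumed to be equal-length numeric sequences, binary fingerprints are assumed to be packed into Python integers, and `weighted_distance` with its weights is a hypothetical way of combining several signature types.

```python
import math


def l1(a, b):
    """Manhattan distance between two equal-length numeric signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance between two equal-length numeric signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_distance(dists, weights):
    """Combine per-signature distances (e.g., spatial, temporal, color)."""
    return sum(w * d for w, d in zip(weights, dists))


def hamming(a, b):
    """Number of differing bits between two fingerprints packed into ints."""
    return bin(a ^ b).count("1")
```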
Complexity of Fingerprint Search
• Exhaustive search has complexity O(K*N), i.e., linear in the database size.
  – N is the size of the reference fingerprint DB, measured in minutes or hours of video.
  – K is the length of the query video.
  – N can be further decomposed into M*L:
    • M is the number of reference video fingerprints in the DB.
    • L is the average length of the video fingerprints in the DB.
• The curse is on N (or M), the DB size.
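The exhaustive search can be sketched as a sliding-window comparison, which makes the O(K*N) cost concrete: every offset of every reference video costs up to K frame comparisons. The sketch assumes per-frame binary fingerprints packed into integers and a hypothetical in-memory `reference_db` dictionary; the mean number of bit errors per frame serves as the match score.

```python
def hamming(a, b):
    """Number of differing bits between two per-frame fingerprints."""
    return bin(a ^ b).count("1")


def exhaustive_search(query, reference_db):
    """Slide the query over every offset of every reference video.

    query:        list of K per-frame fingerprints (ints)
    reference_db: dict mapping a video id to its list of per-frame fingerprints
    Returns (video_id, offset, mean_bit_errors) of the best alignment, or None.
    """
    k = len(query)
    if k == 0:
        return None
    best = None
    for video_id, ref in reference_db.items():
        # About K comparisons per offset, so roughly K*N work over the whole DB.
        for offset in range(len(ref) - k + 1):
            errors = sum(hamming(q, r) for q, r in zip(query, ref[offset:offset + k]))
            score = errors / k
            if best is None or score < best[2]:
                best = (video_id, offset, score)
    return best
```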
Strategies for Fast Search
Strategy | Fingerprint search | Motion vector search
Reduce search space ✪ | LSH | –
Greedy search | Sequential alignment | Hierarchical search
Early exit | Hamming distance > T | SAD > T
Approximation in distance calculation | Frame down-sampling | Block down-sampling
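The "Hamming distance > T" early exit can be sketched as below: distance accumulation for a candidate alignment stops as soon as the running total exceeds the threshold `T`, so clearly wrong candidates are rejected after a few frames. The function name and the integer-packed per-frame fingerprints are assumptions for illustration.

```python
def hamming_with_early_exit(query, candidate, threshold):
    """Accumulate per-frame Hamming distances, stopping once the total exceeds threshold.

    Returns the total distance, or None if the candidate was rejected early.
    """
    total = 0
    for q, c in zip(query, candidate):
        total += bin(q ^ c).count("1")
        if total > threshold:
            return None  # early exit: this alignment cannot be a match
    return total
```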
Locality Sensitive Hashing (LSH)
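• Consider the ε-approximate nearest-neighbor search (ε-NNS) problem:
  – For a query point q, find a point p such that $d(q,p) \le (1+\varepsilon)\, d(q,P)$, where $d(q,P)$ is the distance from q to its nearest neighbor in the reference set P.
  – LSH guarantees that such a p can be found, with high probability, in $O(N^{1/(1+\varepsilon)})$ time (for Hamming space).
• Geometric reasoning:
  – Points that are close in space are likely to remain close after hashing (e.g., a projection onto a lower-dimensional space).
  – Using multiple hash functions increases the probability that close points collide under at least one of them.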
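For binary fingerprints under Hamming distance, a classic LSH family is bit sampling: each hash function looks at a random subset of bit positions, so fingerprints with few differing bits are likely to agree on all sampled positions in at least one table. The class below is a minimal sketch of that scheme, not a tuned index; `num_bits`, `bits_per_key`, `num_tables`, and the seed are hypothetical parameters, and each returned candidate would still need to be verified with an exact Hamming distance.

```python
import random
from collections import defaultdict


class BitSamplingLSH:
    """Bit-sampling LSH for binary fingerprints (ints) under Hamming distance."""

    def __init__(self, num_bits, bits_per_key, num_tables, seed=0):
        rng = random.Random(seed)
        # Each table's hash function samples its own set of bit positions.
        self.samples = [rng.sample(range(num_bits), bits_per_key)
                        for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    @staticmethod
    def _key(fingerprint, positions):
        return tuple((fingerprint >> p) & 1 for p in positions)

    def insert(self, fingerprint, label):
        for positions, table in zip(self.samples, self.tables):
            table[self._key(fingerprint, positions)].append(label)

    def candidates(self, fingerprint):
        """Labels that collide with the query in at least one table."""
        found = set()
        for positions, table in zip(self.samples, self.tables):
            found.update(table.get(self._key(fingerprint, positions), []))
        return found
```

Increasing `bits_per_key` makes buckets more selective (fewer false candidates), while increasing `num_tables` raises the chance that a near-duplicate collides with its reference in at least one table.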
Other Approximation Techniques
• Multi-resolution coarse-to-fine search
  – Fine-level search can be terminated (early exit) if the coarse-level match is far off.
  – Rank candidates by coarse-level search scores and pass only the top-ranked candidates to fine-level search.
• Adaptive hashing (“learning to hash”)
  – Hash functions are not fixed in advance; the system is trained to adapt them to the identification task and data.
  – Yields a substantial reduction in search space.
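A coarse-to-fine search can be sketched by using a frame-subsampled comparison as the coarse level and a full comparison as the fine level. The `top_n` and `coarse_step` parameters and the in-memory `reference_db` are assumptions for illustration; frame down-sampling is only one of several possible coarse representations.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")


def best_alignment(query, ref, step=1):
    """(offset, mean bit errors) of the best alignment of `query` in `ref`.

    Only every `step`-th query frame is compared; step > 1 is the coarse level.
    Returns None if `ref` is shorter than `query`.
    """
    k = len(query)
    idx = range(0, k, step)
    best = None
    for offset in range(len(ref) - k + 1):
        score = sum(hamming(query[i], ref[offset + i]) for i in idx) / len(idx)
        if best is None or score < best[1]:
            best = (offset, score)
    return best


def coarse_to_fine(query, reference_db, top_n=2, coarse_step=4):
    if not query:
        return None
    # Coarse level: rank every reference video using a frame-subsampled query.
    coarse = []
    for vid, ref in reference_db.items():
        hit = best_alignment(query, ref, coarse_step)
        if hit is not None:
            coarse.append((hit[1], vid))
    coarse.sort()
    # Fine level: full-resolution search only on the top-ranked candidates.
    results = []
    for _, vid in coarse[:top_n]:
        offset, score = best_alignment(query, reference_db[vid])
        results.append((score, vid, offset))
    return min(results) if results else None
```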
Applications
UGC & P2P – copyright concerns?
• UGC traffic in July 2007 (source: comScore, November 30, 2007)
  – 70 million people viewed 2.5 billion videos on YouTube.com (39.4% of the total UGC audience).
  – 38 million people viewed 360 million videos on MySpace.com (22.6% of the total UGC audience).
• P2P traffic in 2007 (source: iPoque, November 28, 2007)
  – On average 50-60% of total Internet traffic, with regional figures of 49% in the Middle East and 83% in Eastern Europe.
  – BitTorrent 66.7% and eDonkey 28.6% of total P2P traffic.
[Figure: P2P and UGC]
Video Content Registration
• A reference video fingerprint database is pre-populated.
• Two types of information are stored with the video fingerprint data in the reference database:
  – Metadata, e.g., title, owner, release date
  – Business rules, e.g., allow, filter, or advertise, possibly based on certain conditions
• MovieLabs’ Content Recognition Rules (CRR) is an industry-standard interface for expressing and exchanging rules.
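A reference database entry carrying both kinds of information might be modeled as in the sketch below. The field names, the rule strings, and the condition-matching logic are hypothetical illustrations; they do not follow the MovieLabs CRR format, whose schema is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceEntry:
    """One registered title in a hypothetical reference fingerprint DB."""
    fingerprints: list            # per-frame fingerprints
    title: str                    # metadata
    owner: str
    release_date: str
    rule: str = "allow"           # hypothetical business rule: "allow", "filter", or "advertise"
    conditions: dict = field(default_factory=dict)  # hypothetical key/value requirements


def action_for_match(entry, context):
    """Return the action to take when an upload matches `entry`.

    Assumption for illustration: if any condition (e.g., a region) is not
    met by the upload's `context`, the upload is simply allowed.
    """
    for key, required in entry.conditions.items():
        if context.get(key) != required:
            return "allow"
    return entry.rule
```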
Video Content Filtering
Video Content Tracking
Example: Video Content Tracking
Tracking Olympic Video Distribution
Other Applications
• Broadcast monitoring
  – Audit TV program and commercial airings
• Contextual ads (monetization)
  – Pair ads with identified content, similar to Google AdSense
• Video asset management
  – Content-based IDs link edits to their sources
• Content-based video search
  – Query by video clip
Summary
• Research in video fingerprinting began a decade ago; it has since developed into a technology adopted by industry.
• Different types of signatures are used to form a video fingerprint, including spatial, temporal, color, and transform-domain signatures.
• Spatial signatures are the overall winner judged by multiple criteria and are widely adopted as primary signatures; temporal and color signatures can serve as secondary signatures to enhance discriminability.
• Brute-force, exhaustive fingerprint search is an O(K*N) problem.
• Fast approximate algorithms make fingerprint search tractable and scalable for practical applications.
• Current applications focus on copyright enforcement; other applications being developed and tested include contextual advertising, asset management, and content-based video search.