Measuring and Correcting Lip Sync
Progress on a Standard At Last!
Points of Discussion
What is a media fingerprint?
Fingerprint generation schemes
Lip Sync Detection
Compare the Content
Fingerprint Standardization
What is a Media Fingerprint?
A form of identification for a piece of audio and video media
Can be used to identify or recognize a specific piece of media later in time or downstream in a system
Key Attributes of a Media Fingerprint
Does not alter the media itself
Compact, ideally much, much smaller than the media itself
Robust and Resistant: Able to survive normal processing of the media
Efficient: Economical to generate as well as compare or search in a database
Fundamental Difference Between Fingerprints and Watermarks
Watermarks ALTER sources by putting some form of mark or identifier in the media in a hard-to-remove way.
Fingerprints extract properties from a source, then store or transport them separately.
The watermark is usually not visible or audible to the human eye or ear.
Fingerprints vs. Watermarks
Quality or Property                     Fingerprints                                 Watermarks
Content Remains Unchanged               Yes                                          No
Survive Process Manipulation            Yes – Depends on Algo                        Less – Depends on Algo
Unique                                  Yes                                          Not Necessarily
Can Be Removed (hacked)                 No, not embedded                             In some cases yes
Used Retrospectively (after the fact)   Yes                                          No
How Transported                         Typically Stored / Transported Separately    Transported in the Signal
Proprietary                             Yes***                                       Yes
Basic Fingerprinting Concept
Input: baseband audio and video program signal (V, a1, a2 … an). Program payload: 156 Mbytes/sec for a 1080i60 signal with 6 PCM audio channels.
Processing: the video feeds the Change Based Video Fingerprint Algorithm; each audio channel feeds the Audio FP Algo.
Output: audio and video fingerprint stream multiplex with optional timestamp, ~700 bytes/sec (13 bytes per field/frame for video with 6 channels of audio).
A Sample Video Fingerprint Generation Algorithm
Consider two consecutive fields or frames in a video program: Field or Frame X and Field or Frame X+1.
Sample N points in each of the consecutive fields or frames: A(1) … A(N) in Field or Frame X and B(1) … B(N) in Field or Frame X+1.
Sample points are not necessarily uniformly distributed, as long as they are distributed the same way in both images.
A Sample Video Fingerprint Generation Algorithm (2)
Compare the same samples in the two images: A(1) … A(N) in Field or Frame X against B(1) … B(N) in Field or Frame X+1.
Samples very different? No = 0, Yes = 1. If the samples are very different, indicate a 1 in that sample position; if not, indicate a 0.
Repeat for every pair of samples in the images.
Result for samples 1 to N, e.g.: 1 0 0 0 1 1 0 0 1 1
A Sample Video Fingerprint Generation Algorithm (3)
For Field or Frame X and Field or Frame X+1, the per-sample results (samples 1 to N) are, e.g.: 1 0 0 0 1 1 0 0 1 1
Change Index (0 to N): count of all the 1's (very different samples).
Normalize to 1 byte. Normalized Change Index: 0 - 255.
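The sampling, comparison, counting and normalization steps on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say what "very different" means numerically, so the `threshold` value, the use of 8-bit sample values, and the function name `change_index` are all hypothetical.

```python
from typing import Sequence


def change_index(frame_a: Sequence[int], frame_b: Sequence[int],
                 threshold: int = 32) -> int:
    """Return a 1-byte (0-255) change index for two consecutive fields or frames.

    frame_a holds A(1)..A(N) and frame_b holds B(1)..B(N), sampled at the
    same positions in both images. `threshold` is an assumed cut-off for
    "very different"; the slides do not give a value.
    """
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("both frames need the same non-zero number of samples")
    # 1 where the two samples are "very different", 0 otherwise.
    bits = [1 if abs(a - b) > threshold else 0
            for a, b in zip(frame_a, frame_b)]
    count = sum(bits)  # change index in the range 0..N
    # Normalize to one byte using integer arithmetic only.
    return (count * 255) // len(bits)


# Ten sample points; five of them change strongly between the two images.
field_x = [10, 200, 30, 40, 50, 60, 70, 80, 90, 100]
field_x1 = [110, 200, 30, 40, 150, 160, 70, 80, 190, 200]
print(change_index(field_x, field_x1))
```

Integer-only arithmetic is used here because the deck stresses that the scheme needs no floating point.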
Video Fingerprint Generation Algorithm
Consider a series of consecutive fields or frames in a video program over time (t): Field or Frame X, X+1, … N, N+1.
A fingerprint is extracted from each consecutive field or frame.
A 1-byte change index is calculated between consecutive fields or frames: Fingerprint Compare of X and X+1 gives the X change index; Fingerprint Compare of N and N+1 gives the N change index.
Matching Two Fingerprints
Video Program A passes through the Video System and emerges as Video Program A'.
Each program feeds a Delta Based Video Fingerprint Algorithm, producing streams of change index (1 byte per field or frame).
Fingerprint Comparison outputs: Program to Program Delay (16.6 ms resolution) and Video Matching Factor (%).
Simple Matching Process
Fingerprint Comparison takes the two change index streams (1 byte per field or frame, values 0 to 255 over time t), one delayed relative to the other (DLY).
A simple convolution engine is used to look for matching patterns in the two fingerprint streams.
Matching patterns are identified. Patterns don't need to match perfectly; the more they match, the higher the Match Factor.
If the Match Factor exceeds a threshold, the delay between the two patterns is established and reported.
Outputs: Match Factor (%), Program to Program Delay.
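One way to picture the matching step is a sliding comparison of the two change index streams. The sketch below is not the convolution engine described on the slide; it swaps in a sum of absolute differences at each candidate delay, which is an assumption chosen because it needs only integer arithmetic. The names `best_delay`, `max_lag`, `min_overlap` and the 90% reporting threshold are hypothetical.

```python
from typing import Sequence, Tuple


def best_delay(ref: Sequence[int], test: Sequence[int],
               max_lag: int, min_overlap: int = 8) -> Tuple[int, int]:
    """Return (delay_in_fields, match_factor_percent) for two change index streams.

    A positive delay means `test` lags behind `ref`.
    """
    best_lag, best_match = 0, -1
    for lag in range(-max_lag, max_lag + 1):
        # Pair ref[i] with test[i + lag] wherever both indices are valid.
        pairs = [(ref[i], test[i + lag])
                 for i in range(len(ref)) if 0 <= i + lag < len(test)]
        if len(pairs) < min_overlap:
            continue
        total_diff = sum(abs(r - t) for r, t in pairs)
        # 100 when the patterns are identical, lower as they diverge.
        match = 100 - (100 * total_diff) // (255 * len(pairs))
        if match > best_match:
            best_lag, best_match = lag, match
    if best_match < 0:
        raise ValueError("streams too short for the requested lag range")
    return best_lag, best_match


reference = [0, 10, 200, 30, 90, 250, 5, 60, 180, 20, 140, 70, 220, 15, 100, 45]
delayed = [7, 7, 7] + reference  # same pattern, three fields later
lag, match = best_delay(reference, delayed, max_lag=5)
if match >= 90:  # hypothetical reporting threshold
    print(f"delay = {lag} fields, match factor = {match}%")
```

At 60 fields per second, one step of delay corresponds to roughly 16.6 ms, which is the resolution quoted for the video comparison.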
Characteristics Of This Fingerprint Scheme
Delta Based Video Fingerprint Algorithm
• Based on changes in image (levels and motion)
• Very simple to generate fingerprints
  – Requires very little h/w and s/w resources
  – Can be implemented on monitoring DA grade card with negligible impact on cost
• Very simple to compare two fingerprints
  – Can be done in real time using simple h/w or a few microprocessor instructions (no floating point)
  – Hundreds of signals and multiple points per signal can be handled by one average PC
• Fingerprint stream is very compact
  – 1 Byte per image or 60 Bytes per second for 60Hz video
  – 0.0004% of baseband HD signal data rate
  – 0.02 % of HD signal compressed to 20 Mbps
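The compactness figures can be checked with a little arithmetic. The sketch below assumes that "baseband HD signal data rate" means the 156 Mbytes/sec program payload quoted on the Basic Fingerprinting Concept slide, and it evaluates both the 60 bytes/s video-only stream and the ~700 bytes/s audio and video multiplex. The function name `percent_of` is hypothetical.

```python
def percent_of(fingerprint_bytes_per_s: float, signal_bytes_per_s: float) -> float:
    """Fingerprint rate as a percentage of the signal rate."""
    return 100.0 * fingerprint_bytes_per_s / signal_bytes_per_s


BASEBAND_HD = 156e6        # bytes/s, 1080i60 program payload quoted on the slides
COMPRESSED_HD = 20e6 / 8   # bytes/s, HD signal compressed to 20 Mbps
VIDEO_FP = 60              # bytes/s, 1 byte per image at 60 Hz
MULTIPLEX_FP = 700         # bytes/s, approximate audio + video multiplex

for label, rate in (("video only", VIDEO_FP), ("A/V multiplex", MULTIPLEX_FP)):
    print(f"{label}: {percent_of(rate, BASEBAND_HD):.5f}% of baseband, "
          f"{percent_of(rate, COMPRESSED_HD):.4f}% of 20 Mbps")
```

Under these assumptions the video-only stream comes out near 0.00004% of baseband and 0.002% of 20 Mbps, while the ~700 bytes/s multiplex comes out near 0.0004% and 0.03%. The percentages on the slide may therefore refer to the full multiplex rather than to the video stream alone; that reading is an inference, not something the slides state.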
Characteristics Of This Fingerprint Scheme (2)
Delta Based Video Fingerprint Algorithm
• Insensitive to:
  – Normal video level / color changes (ex.: PROCAMP)
  – Video scaling (up / down conversion) and resolution
  – Video compression (DCT or Wavelet-J2K)
• Weakened but not disabled by:
  – Aspect ratio change including adding bars on side or top/bottom
  – Adding of graphics occupying less than 40% of image
• Disabled by:
  – Frame rate conversion (e.g. 50 to 60 Hz)
  – Prolonged periods of freeze
Audio Fingerprint Algorithm
One channel of digital audio program (48 kHz sampling) feeds the Variation Compare Fingerprint Engine.
The result is a 115 bytes/s (923 b/s) variation-based audio fingerprint.
Processing blocks applied to the one channel of digital audio program (48 kHz sampling): Absolute Value, Extract Envelope, Extract Mean.
Audio Fingerprint Algorithm (2)
For every sample point, compare the envelope value to the mean value (Sample Envelope Greater than Mean? No = 0, Yes = 1).
If Envelope > Mean, indicate a 1, e.g. 1 0 0 0 1 1 0 0 1 1.
Result is a 48 Kbps signature based on envelope-to-mean variations.
Sub-sample to reduce bit rate: 48K b/s down to 923 b/s (115 bytes/sec), giving the audio fingerprint, e.g. 1 0 0 1 1 0 1.
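The envelope-to-mean steps can be sketched as follows. The slides name the blocks (absolute value, envelope, mean, compare, sub-sample) but not how each is computed, so this sketch assumes trailing moving averages for both the envelope and the mean. The window lengths, the function names and the integer sample format are hypothetical. The sub-sampling factor of 52 is chosen because 48000 / 52 is about 923 bits per second.

```python
from typing import List, Sequence


def moving_average(values: Sequence[int], window: int) -> List[int]:
    """Trailing integer moving average over up to `window` samples."""
    out, running = [], 0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running // min(i + 1, window))
    return out


def audio_fingerprint(samples: Sequence[int], envelope_window: int = 48,
                      mean_window: int = 4800, subsample: int = 52) -> List[int]:
    """Return one fingerprint bit per `subsample` input samples."""
    rectified = [abs(s) for s in samples]                   # Absolute Value
    envelope = moving_average(rectified, envelope_window)   # short-term level
    mean = moving_average(rectified, mean_window)           # long-term level
    # 1 where Envelope > Mean, else 0: one bit per input sample (48 Kbps).
    bits = [1 if e > m else 0 for e, m in zip(envelope, mean)]
    return bits[::subsample]                                # about 923 b/s


# Silence, a loud burst, then silence again.
samples = [0] * 1000 + [1000, -1000] * 500 + [0] * 1000
fingerprint = audio_fingerprint(samples)
print(fingerprint)
```

In this toy input the fingerprint bits are 0 during silence and 1 during the burst, which is the kind of variation the comparison stage then looks for.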
Matching Two Audio Fingerprints
Audio Program passes through the Audio System and emerges as Audio Program'.
Each program feeds an Envelope to Mean Audio Fingerprint Algorithm, producing 115 bytes/s audio fingerprint streams.
Fingerprint Comparison outputs: Program to Program Delay (1 ms resolution) and Content Match Factor (%).
Simple Matching Process
Audio Fingerprint Comparison takes the two 1 Kbps audio fingerprint streams (Program A and Program B, bit values 0 or 1 over time t), one delayed relative to the other (DLY).
A simple convolution engine is used to look for matching patterns in the two fingerprint streams.
Matching patterns are identified. Fingerprint streams don't need to match perfectly; the more they match, the higher the Match Factor.
The delay between the two patterns is established and reported; delay resolution is 1 ms.
Outputs: Match Factor (%), Program to Program Delay.
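For one-bit audio fingerprints the same sliding comparison reduces to counting agreeing bits at each candidate delay. As with the video case, this is a stand-in for the convolution engine named on the slide, not a description of it; `match_bits`, `lag_to_ms`, `max_lag` and `min_overlap` are hypothetical names, and the 923 b/s rate is taken from the fingerprint generation slides.

```python
from typing import Sequence, Tuple


def match_bits(ref: Sequence[int], test: Sequence[int],
               max_lag: int, min_overlap: int = 16) -> Tuple[int, int]:
    """Return (delay_in_bits, match_factor_percent); positive means `test` lags."""
    best_lag, best_match = 0, -1
    for lag in range(-max_lag, max_lag + 1):
        agree = overlap = 0
        for i, bit in enumerate(ref):
            j = i + lag
            if 0 <= j < len(test):
                overlap += 1
                agree += 1 if bit == test[j] else 0
        if overlap < min_overlap:
            continue
        match = (100 * agree) // overlap
        if match > best_match:
            best_lag, best_match = lag, match
    if best_match < 0:
        raise ValueError("streams too short for the requested lag range")
    return best_lag, best_match


def lag_to_ms(lag_bits: int, bit_rate: int = 923) -> float:
    """Convert a delay in fingerprint bits to milliseconds."""
    return 1000.0 * lag_bits / bit_rate


program_a = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0,
             0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0]
program_b = [0, 0, 0, 0] + program_a  # same pattern, four bits later
lag_bits, match_factor = match_bits(program_a, program_b, max_lag=6)
print(lag_bits, match_factor, round(lag_to_ms(lag_bits), 2))
```

One fingerprint bit at 923 b/s lasts about 1.08 ms, which is consistent with the 1 ms delay resolution quoted above.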
Characteristics Of Audio Fingerprint Scheme
Envelope to Mean Audio Fingerprint Algorithm
• Based on variations in audio envelope
• Very simple to generate fingerprints
  – Requires very little h/w and s/w resources
  – Implementable on monitoring DA grade card with negligible impact on cost
• Very simple to compare two fingerprints
  – Can be done in real time using simple h/w or a few microprocessor instructions (no floating point)
  – Hundreds of signals and multiple points per signal can be handled by one average PC
• Fingerprint stream is very compact (115 bytes/s) and high resolution (1 ms)
  – 0.08 % of baseband audio payload
Characteristics Of Audio Fingerprint Scheme (2)
Envelope to Mean Audio Fingerprint Algorithm
• Insensitive to common processing
  – Bit rate reduction (aka compression)
  – Level shifts including dynamic compression
  – Sample rate conversion
• Disabled by:
  – Mixing in new content (ex.: voice over)
  – Content with no frequency changes (ex.: 1 kHz tone)
• UpMix and DownMix cases are handled by the use of additional signatures
Media Fingerprinting High Level Concept
Audio and video programs pass through the TV production and delivery system.
A non-intrusive device analyses video + audio and generates a low bit rate signature (Audio+Video Fingerprint Algorithm).
The signature is either stored (Fingerprint Database) or time-stamped and streamed into a standard IT network.
Downstream, a second device generates a fingerprint of the audio and video.
The two signatures are compared to establish whether the content is the same and what the relative delay is (Content same? Audio to video delay same?).
Media Fingerprinting High Level Concept (2)
Audio and video programs pass through the TV production and delivery system, with an Audio+Video Fingerprint Algorithm applied to the live signal.
The fingerprint algorithm can also be applied to file-based content sitting on a server (Content In Server, Audio+Video Off-Line Fingerprint Algorithm), reached over the LAN / WAN.
Fingerprints are stored on a server (Fingerprint Database) and can later be searched for a content match (What content is this?).
Traditional Content Comparison Scenario
[Diagram: Primary Server and Backup Server feed a Router; the Primary Path and the Backup (Secondary) Path each pass through Branding, CC/VBI Inserter, Nielsen Encoder and Audio Process; a 2x1 Change-over feeds the Distribution Encoder and Up Link; DTH return and Cable return come back through the Return IRD. A Multi Viewer displays server playback (primary and backup), branded outputs (primary, backup and final) and returns.]
Operators typically monitor multiple points in the playout chain to confirm that content is valid and the same across the entire distribution path.
• Channels are monitored by an operator looking at the monitor wall
• Operator watches and sometimes listens for operational errors such as wrong content, missing graphics, lip sync
• As channel count grows or channel complexity increases, so must operator count
Fingerprint Based Comparison Scenario
[Diagram: the same playout chain (Primary Server, Backup Server, Router, Branding Main and Branding Backup, CC/VBI Inserter, Nielsen Encoder, Audio Process, 2x1 Change-over, Distribution Encoder, Up Link, Return IRD, DTH return, Cable return) with the Multi Viewer points (server playback, branded outputs, returns) also feeding a Fingerprint Comparison Engine.]
Fingerprint Comparison Engine: compare all points, exception notification.
Fewer operators, yet potentially better comparison.
Content Comparison FP Requirements
• Fingerprints must be robust to survive multiple conversion, encoding, decoding
• Fingerprints must be “focusable” to detect specific problems such as missing graphics
• Fingerprint generation must be simple so that it can be incorporated inside simple devices (e.g. routers, DAs, IRDs)
• Multi-vendor support, so not all the devices have to be from the same vendor
Lip Sync Errors Can Happen In Many Places in Broadcast Chain
[Diagram elements: Mobile studio, Special event venue, Network Facility, Local Station and Service Provider (Cable, DTH, IPTV), linked by Encode + Tx and Rx + Decode stages, with IRD, Server, Branding + Proc, Facility Infrastructure, Return IRD, Local Process and Off Air IRD along the chain.]
Fingerprint Comparison Provides Delay Measurement
A program (Video + N Ch Audio) passes through the System. Fingerprint Generation runs on both sides of the System, each producing fingerprints for Video, Audio Ch 1 … Audio Ch N.
Fingerprint Comparison: Video FP Comparison, Ch 1 Audio FP Comparison, Ch 2 Audio FP Comparison, … Ch N Audio FP Comparison.
Audio to Video Delay Calculation outputs: Overall Program Delay, and Video to Audio Lip Sync Error (per channel).
Fingerprint Comparison Provides Delay Measurement (2)
Compare delays: the video delay (Vid DLY) between PGM A and PGM B, and the Ch1 audio delay (Aud DLY) between PGM A and PGM B.
If Video Delay ≠ Audio Delay, we have lip sync errors.
Can establish lip sync error as small as 1 ms.
Can establish lip sync error across any number of audio channels.
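Once the per-stream delays are known, the lip sync calculation is a subtraction per audio channel. A minimal sketch follows, assuming delays are expressed in milliseconds and that a positive error means the audio arrives later than the video. The sign convention, the function names and the tolerance value are assumptions, not taken from the slides.

```python
from typing import Dict, List


def lip_sync_errors(video_delay_ms: float,
                    audio_delays_ms: Dict[str, float]) -> Dict[str, float]:
    """Per-channel lip sync error: audio delay minus video delay."""
    return {channel: delay - video_delay_ms
            for channel, delay in audio_delays_ms.items()}


def channels_out_of_sync(errors: Dict[str, float],
                         tolerance_ms: float) -> List[str]:
    """Channels whose error magnitude exceeds an assumed alarm tolerance."""
    return [channel for channel, err in errors.items()
            if abs(err) > tolerance_ms]


# Program-to-program delays reported by the fingerprint comparison.
errors = lip_sync_errors(500.0, {"ch1": 500.0, "ch2": 545.0})
print(errors)
print(channels_out_of_sync(errors, tolerance_ms=20.0))
```

Here channel 1 travels with the video while channel 2 arrives 45 ms late, so only channel 2 would raise an alarm.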
2 Point Monitoring: Output + Return
[Diagram: Primary Path and Secondary Path, each with Branding, CC/VBI encoder and A/V Proc, into a 2x1, then Distrib Enc and Up Link, with a Return IRD on the return feed.]
With only two points to compare, a simple h/w module based solution is well suited.
Lip Sync Probe: FP Gen at each of the two points, then FP Comp. Outputs: Overall Pgm Delay, Video to Audio Delay for Each Audio Ch., Alarm for Excess.
2 Point Lip Sync Monitoring: Out vs Cable Return
[Diagram: Primary Path and Secondary Path (A/V Proc) into a 2x1, then Distrib Enc and Up Link; Service Provider (Cable, DTH) with Rx + Decode and Encode + Tx; DTH return and Cable return through the Return IRD.]
What if the problem is at the cable or satellite distributor?
Lip Sync Probe: FP Gen at each of the two points, then FP Comp. Outputs: Overall Pgm Delay, Video to Audio Delay for Each Audio Ch., Alarm for Excess.
What if I want to monitor more than 2 points in my system and establish where the lip sync error is introduced?
Multi-Point Fingerprint Correlation and Lip-sync Detection
End to End Monitoring Including Multiple Points
[Diagram: IRD, Broadcast Facility (Server, Branding + Proc, Facility Infrastructure, Encode + Tx), Service Provider (Decode, Re-encode), DTH return and Cable return through the Return IRD. A Generate FP block sits at each monitored point, and all fingerprints travel over a Standard Ethernet Network.]
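With fingerprints from several points, locating where a lip sync error enters the chain can be as simple as walking the points in signal order and looking for the first jump. The sketch below assumes each point's lip sync error has already been measured against the first point. The point names are borrowed from the diagrams, while the error values, the tolerance and the function name are purely illustrative.

```python
from typing import List, Optional, Tuple


def first_faulty_point(points: List[Tuple[str, float]],
                       tolerance_ms: float = 20.0) -> Optional[str]:
    """Return the first point where lip sync error jumps by more than the tolerance.

    `points` lists (name, lip_sync_error_ms) in signal-flow order, with
    errors measured relative to the first monitored point.
    """
    previous = 0.0
    for name, error_ms in points:
        if abs(error_ms - previous) > tolerance_ms:
            return name
        previous = error_ms
    return None


chain = [("Server", 0.0), ("Branding + Proc", 2.0),
         ("Encode + Tx", 47.0), ("Return IRD", 48.0)]
print(first_faulty_point(chain))
```

Comparing each point with its predecessor, rather than with the source, keeps a single upstream fault from being blamed on every device after it.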
Multi-Point Fingerprint Correlation and Lip-sync Detection (2)
Ideal scenario is to have fingerprint generation built into key devices in the chain.
[Diagram: the same chain (IRD, Broadcast Facility with Server, Branding + Proc, Facility Infrastructure and Encode + Tx, Service Provider with Decode and Re-encode, DTH return and Cable return through the Return IRD), with FP generation inside each device and fingerprints carried over a Standard Ethernet Network.]
What about cases where signals are distributed across multiple sites?
Central NOC or Monitoring Point: Multi-Point Fingerprint Correlation and Lip-sync Detection
Remote fingerprints are streamed via a standard IP network (IP WAN).
[Diagram: FP generation points spread across the Mobile studio, Special event venue, Network Facility (IRD, Server, Branding + Proc, Facility Infrastructure, Encode + Tx) and Local Station (Return IRD, Local Process, Encode + Tx, Rx + Decode, Off Air IRD), all reporting to the central point over the IP WAN.]
Local Lip Sync Management
Automatic Correction Of Lip Sync Errors: Theoretically Possible But Not Advised
Real World Examples
Audio/Video Fingerprint for Subtle Error Detection
[Diagram: Playout Center (Main Playout, Backup Playout, HCO, XVP, MPEG Encoder) and Cable/Satellite/IPTV Headend (Mod, RX, Encode / Transcode, Ad Insert), monitored with Kaleido-X, Kaleido-IP, iControl PM and HLP probes; Off-Air Return Monitoring and remote cable / IPTV monitoring over the Return Monitoring WAN.]
1. Lip Sync Error Detection
2. Multi-Point Signal Continuity
3. Language Swap (Spanish / English)
Playout Monitoring at Astro, Kuala Lumpur
[Diagram: Astro Channel Playout with Main Playout and Backup Playout (iTX System), NV Playout Router, NV Tx Router, XVP, LGK, HCO, MPEG Encoder and IRD, monitored by HCP Probe, HLP Probe, HMP, Kaleido-X and iControl PM.]
Lip Sync Error Detection
Miranda Audio/Video Fingerprints for Subtle Signal Error Detection
Playout Monitoring at Cognacq-Jay Images, Paris
Use of fingerprints at Cognacq-Jay Images
[Diagram: two identical chains (Main and backup), each with TX Main FOB, Omneon Server, Anyware Branding/Subtitling, Médiamétrie Watermarking, XVP-3901 and HCO-1822 (2x1).]
iControl Fingerprint Correlation reports: Video Matching Factor (%), Audio ch1 Matching Factor (%), Audio ch2 Matching Factor (%).
Key Requirements for Lip Sync Application
• Fingerprints must be robust to survive multiple conversion, encoding, decoding
• Fingerprint must be compact to be streamed over network
• Fingerprint must include timestamp to allow transport over non-deterministic networks
• Fingerprint generation must be simple so that it can be incorporated on simple devices (e.g. routers, DAs, IRDs)
• Multi-vendor support
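The timestamp requirement can be illustrated with a small sketch: packets that arrive late or out of order over an IP network are put back on the generating device's timeline before any comparison. The packet layout, the field names and the use of a frame counter as the timestamp are all hypothetical; the slides do not define a format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FingerprintPacket:
    """Hypothetical packet: one video change index tagged at generation time."""
    timestamp: int      # assumed to be a frame counter at the generating device
    change_index: int   # 0-255


def rebuild_stream(packets: Iterable[FingerprintPacket],
                   start: int, length: int) -> List[Optional[int]]:
    """Place packets on the original timeline; missing entries stay None."""
    stream: List[Optional[int]] = [None] * length
    for packet in packets:
        slot = packet.timestamp - start
        if 0 <= slot < length:
            stream[slot] = packet.change_index
    return stream


# Packets arrive out of order and one (timestamp 1) is lost in transit.
received = [FingerprintPacket(2, 30), FingerprintPacket(0, 10),
            FingerprintPacket(3, 40)]
print(rebuild_stream(received, start=0, length=4))
```

Because the position in the rebuilt stream depends only on the timestamp, network jitter changes when a packet arrives but not where it lands, so the measured program delay is not distorted by the transport.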
The Need for an Interoperable Fingerprint Standard
is similar to the need for an interoperable compression standard…
Examples In Use Already
Rights Management
Lip Sync, Automated Playout Monitoring
Lip Sync
Content Verification, Ad Insertion Verification
Enable Content Verification, Rights Management
Content Verification, Rights Management
Content Verification, Rights Management
Standardize What?
What would need to be standardized?
• Audio Fingerprint Algorithm
• Video Fingerprint Algorithm
• Audio/Video Combination
Files and Streams
[Diagram: Audio and Video Programs through the Audio+Video Fingerprint Algorithm, and Content In Server through the Audio+Video Off-Line Fingerprint Algorithm, both feeding the Fingerprint Database.]
Fingerprinting Nirvana
[Diagram: the broadcast chain (Mobile studio, Special event venue, Network Facility, Local Station, Service Provider (Cable, DTH, IPTV), with Encode + Tx, Rx + Decode, IRD, Server, Branding + Proc, Facility Infrastructure, Return IRD, Local Process and Off Air IRD) built from Vendor A, Vendor B, Vendor C and Vendor D equipment.]
Applications running on top: Vendor A Lip Sync Application, Vendor B Content Compare Application.
What is being done to promote a standard?
• ATSC Specialist Group on Video and Audio Coding (TSG/S6)
• SMPTE TV and Broadband Technical Committee 24TB – ad-hoc group for Lip Sync
Update on standard progress
Conclusion
• Fingerprinting differs from watermarking.
• Fingerprinting has many applications.
• Standardization is essential for fingerprinting to reach its full potential.
Questions? Comments?
Observations?
Measuring and Correcting Lip Sync
Progress on a Standard At Last!
Sara Kudrle, Miranda Technologies