Parallel Video Processing
Neal Wadhwa
December 10, 2012
Parallel Video Processing
- Processing operates on uncompressed video.
- Uncompressed video uses huge amounts of space.
- 1080p at 30 FPS is roughly one gigabyte per second.
- Many algorithms are easy to parallelize because the processing is independent in space or time.
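The data-rate figure depends on how each sample is stored, which the slide does not state. As a rough check, the sketch below computes the raw rate for 1080p at 30 FPS under three assumed storage formats (three color channels; 8-bit integers, 32-bit floats, 64-bit floats). The function name and the choice of formats are illustrative assumptions, not taken from the talk.

```python
def raw_video_rate(width, height, fps, channels=3, bytes_per_sample=1):
    """Return the uncompressed data rate in bytes per second."""
    return width * height * channels * bytes_per_sample * fps

# 1080p at 30 FPS under three assumed sample formats.
rate_uint8 = raw_video_rate(1920, 1080, 30, bytes_per_sample=1)
rate_float32 = raw_video_rate(1920, 1080, 30, bytes_per_sample=4)
rate_float64 = raw_video_rate(1920, 1080, 30, bytes_per_sample=8)

for name, rate in [("uint8", rate_uint8),
                   ("float32", rate_float32),
                   ("float64", rate_float64)]:
    print(f"{name}: {rate / 1e9:.2f} GB/s")
```

Under these assumptions the rate is about 0.19 GB/s for 8-bit samples and between about 0.75 and 1.5 GB/s for floating-point samples, so a figure near one gigabyte per second is consistent with processing in floating point.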
Example: Motion Magnification
- DSP-based method to magnify subtle motions.
- Here are some cool examples of motion magnification.
- Switch to video.
- The FFT-based algorithm lends itself to parallelization.
- Try to parallelize it and see how far we can get.
Outline of algorithm - three stages
1. Spatially decompose each frame.
2. Temporally process each pixel in each decomposition level.
3. Reconstruct each frame.
- Every stage is easy to parallelize individually.
- The serial algorithm takes several hours on high-resolution videos.
Outline of algorithm - Spatial Decomposition
- Say you have frames $F_1, \ldots, F_n$.
- Decompose each frame into different spatial bands,
$F_i \to (D_{i,1}, \ldots, D_{i,k})$,
which uses $k$ times as much space as the original frame.
- Decomposition is performed by taking the FFT, multiplying by filters, and applying the IFFT:
$D_{i,j} = \mathcal{F}^{-1}\{T_j \times \mathcal{F}\{F_i\}\}$
- The transform is invertible.
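The decomposition formula can be illustrated on a one-dimensional "frame". This is a minimal sketch, not the filters from the talk: it assumes two hypothetical filters (a raised-cosine low-pass and its complement) that sum to one at every frequency, in which case summing the bands recovers the frame. Pyramid decompositions used in practice may instead apply the filters a second time at reconstruction. A naive DFT from the standard library stands in for an FFT library to keep the example dependency-free.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(X):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def decompose(frame, filters):
    """Return one band per filter: D_j = IDFT(T_j * DFT(frame))."""
    spectrum = dft(frame)
    return [idft([t * s for t, s in zip(T, spectrum)]) for T in filters]

def reconstruct(bands):
    """Sum the bands sample by sample; exact when the filters sum to one."""
    return [sum(samples).real for samples in zip(*bands)]

# Hypothetical filters: a smooth low-pass and its complement (they sum to 1).
n = 16
low = [0.5 * (1 + math.cos(2 * math.pi * f / n)) for f in range(n)]
high = [1.0 - v for v in low]

frame = [math.sin(0.4 * t) + 0.3 * math.cos(2.5 * t) for t in range(n)]
bands = decompose(frame, [low, high])
recovered = reconstruct(bands)
max_error = max(abs(a - b) for a, b in zip(frame, recovered))
print(f"bands: {len(bands)}, max reconstruction error: {max_error:.2e}")
```

Each band has as many samples as the frame, which is where the factor of $k$ in storage comes from.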
Outline of algorithm - Spatial Decomposition
- Create a decomposition for every frame.
[Figure: a video volume with axes Space (x), Space (y), and Time (t), and its decomposition into levels and orientations over the same axes.]
Outline of Algorithm
- For every pixel in every level, the values over time contain the motion signal.
Outline of Algorithm
- Bandpass from 100 Hz to 120 Hz.
- Add the bandpassed signal to the original signal.
[Figure: two plots of signal value (-3 to 3) against Time (t) (0 to 300); panel titles: "Bandpass 100-120 Hz" and "Magnified".]
- Amplifies only the selected frequency band.
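The temporal step can be sketched for a single pixel's time series. The sampling rate (600 Hz), the amplification factor `alpha`, the ideal (brick-wall) bandpass, and the test signal are all assumptions made for illustration; the slides specify only the 100 to 120 Hz band. A naive DFT again stands in for an FFT library.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def bandpass(signal, fs, f_lo, f_hi):
    """Ideal bandpass: zero every DFT bin whose frequency lies outside [f_lo, f_hi]."""
    n = len(signal)
    kept = []
    for k, value in enumerate(dft(signal)):
        freq = min(k, n - k) * fs / n  # magnitude of the bin frequency in Hz
        kept.append(value if f_lo <= freq <= f_hi else 0.0)
    return [v.real for v in idft(kept)]

def magnify(signal, fs, f_lo, f_hi, alpha):
    """Add alpha times the bandpassed signal back to the original signal."""
    band = bandpass(signal, fs, f_lo, f_hi)
    return [s + alpha * b for s, b in zip(signal, band)]

def amplitude_at(signal, fs, freq):
    """Amplitude of the sinusoidal component at `freq` (assumed to fall on a DFT bin)."""
    n = len(signal)
    k = round(freq * n / fs)
    coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal))
    return 2 * abs(coeff) / n

fs, n = 600.0, 300  # assumed sampling rate and number of frames
signal = [2.0 * math.sin(2 * math.pi * 10 * t / fs)
          + 0.1 * math.sin(2 * math.pi * 110 * t / fs) for t in range(n)]
magnified = magnify(signal, fs, 100.0, 120.0, alpha=10.0)

amp_10 = amplitude_at(magnified, fs, 10.0)    # outside the band
amp_110 = amplitude_at(magnified, fs, 110.0)  # inside the band
print(f"10 Hz amplitude: {amp_10:.3f}, 110 Hz amplitude: {amp_110:.3f}")
```

In this toy signal the 110 Hz component grows from 0.1 to 1.1 (a factor of 1 + alpha), while the 10 Hz component keeps its amplitude of 2.0.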
Easy to Parallelize
- Parallelize spatial decomposition over frames.
- Parallelize temporal filtering over pixels.
- The difficulty lies in how to distribute the data across cores.
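The two axes of parallelism, and the reordering between them, can be sketched as follows. The per-frame and per-pixel functions are hypothetical stand-ins for the real spatial and temporal work, and a thread pool is used only to show how the work divides; in CPython, real speedups would need processes or native code that releases the interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor

def process_frame(frame):
    """Stand-in for per-frame spatial work (hypothetical): scale every pixel."""
    return [2 * p for p in frame]

def process_pixel_series(series):
    """Stand-in for per-pixel temporal work (hypothetical): remove the temporal mean."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]

# video[t][p]: 4 frames of 6 pixels each, stored frame by frame.
video = [[t * 10 + p for p in range(6)] for t in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Stage 1: frames are independent, so map over the time axis.
    stage1 = list(pool.map(process_frame, video))
    # Reorder from frame-major to pixel-major. This transpose is the data
    # movement that becomes expensive when frames live on different cores.
    by_pixel = [list(series) for series in zip(*stage1)]
    # Stage 2: pixel time series are independent, so map over the pixel axis.
    stage2 = list(pool.map(process_pixel_series, by_pixel))

print(stage2[0])
```

Both stages are a plain parallel map; the only step that touches every sample across workers is the transpose in between.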
Matlab vs. Julia
- Relatively easy to port the code to Julia.
- Compare serial performance of Matlab vs. Julia at different image sizes.
[Figure: Comparison of Serial Matlab and Julia Code. Log-log plot of running time (10^1 to 10^5) against frame size in pixels (10^3 to 10^7); series: Matlab, Julia.]
- Julia is slightly slower, but comparable.
- Not surprising, since the main processing occurs in FFTs (in libfftw).
- Uses 400 GB at the largest problem size, 1600x1600x300.
Matlab Parfor
- Parfor gives a factor-of-two improvement when used with 12 cores.
- Parfor is applied both to the per-frame processing and to the temporal processing.
[Figure: Comparison of Serial Matlab and Matlab with parfor. Log-log plot of running time (10^0 to 10^5) against frame size in pixels (10^3 to 10^7); series: Serial Matlab, parfor Matlab.]
- Only 2x improvement.
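One way to put the 2x figure in perspective is to ask what parallel fraction Amdahl's law would imply for it. This is an interpretation added here, not an analysis from the talk: the "parallel fraction" below is an effective value that lumps together serial code, communication, and any other overhead.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def implied_parallel_fraction(speedup, cores):
    """Invert Amdahl's law: the parallel fraction that yields `speedup` on `cores`."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)

p = implied_parallel_fraction(speedup=2.0, cores=12)
print(f"effective parallel fraction: {p:.3f}")
print(f"check: speedup on 12 cores = {amdahl_speedup(p, 12):.2f}")
```

Under this model, a 2x speedup on 12 cores corresponds to an effective parallel fraction of 6/11, or about 55%, even though each stage on its own is easy to parallelize.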
Julia spawnat vs. Matlab parfor
- Parallelize the spatial decomposition and reconstruction in Julia.
- Faster than serial Julia for large problem sizes.
[Figure: plot titled "Comparison of Serial Matlab and Matlab with parfor". Log-log plot of running time (10^0 to 10^4) against frame size in pixels (10^3 to 10^7); series: parfor Matlab, Serial Julia, Parallelize Spatial Decomposition in Julia.]
- The spatial decomposition is extremely fast, but reordering the data for temporal filtering is very slow.
- In the serial code, temporal processing uses 14% of the time.
- In the parallel code, temporal processing uses 50% of the time.
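A rough sense of why the reordering is costly comes from counting how many samples must change worker when the layout switches from "each worker owns a block of frames" to "each worker owns a block of pixels". The contiguous block partitioning and the problem sizes below are assumptions for illustration, not the layout used in the talk.

```python
def owner(index, total, workers):
    """Worker that owns `index` when `total` items are split into contiguous blocks."""
    block = -(-total // workers)  # ceiling division
    return index // block

def moved_fraction(n_frames, n_pixels, workers):
    """Fraction of samples that change worker when switching from a
    frame-partitioned layout to a pixel-partitioned layout."""
    moved = 0
    for t in range(n_frames):
        for p in range(n_pixels):
            if owner(t, n_frames, workers) != owner(p, n_pixels, workers):
                moved += 1
    return moved / (n_frames * n_pixels)

fraction = moved_fraction(n_frames=24, n_pixels=48, workers=12)
print(f"fraction of samples that must move: {fraction:.3f}")
```

With 12 workers and evenly divided blocks, 11 of every 12 samples end up on a different worker, so the reorder is close to an all-to-all exchange of the whole decomposition.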
Change processing to use a 3-tap primal-domain (time-domain) temporal filter
- Makes temporal processing more local to avoid communication overhead.
- Store temporally close pixels on the same processors.
[Figure: plot titled "Comparison of Serial Matlab and Matlab with parfor". Log-log plot of running time (10^0 to 10^4) against frame size in pixels (10^3 to 10^7); series: parfor Matlab, Serial Julia, 3-Tap Filter instead of FFT.]
- 2.5x faster than Matlab, 5x faster than serial Julia.
- Matlab parfor fails to capitalize on this.
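The locality argument can be made concrete: a 3-tap filter needs only the previous and next frame, so a worker holding a contiguous block of frames needs just a one-frame halo from each neighbor instead of the full time series. The sketch below filters one pixel's time series both globally and in chunks with a halo and checks that the results agree. The tap values and the edge handling (replicating the end samples) are placeholders chosen for illustration, not the filter from the talk.

```python
import math

def three_tap(series, taps=(-0.25, 0.5, -0.25)):
    """Apply y[t] = a*x[t-1] + b*x[t] + c*x[t+1], replicating the end samples."""
    a, b, c = taps
    n = len(series)
    out = []
    for t in range(n):
        prev = series[max(t - 1, 0)]
        nxt = series[min(t + 1, n - 1)]
        out.append(a * prev + b * series[t] + c * nxt)
    return out

def three_tap_chunked(series, n_chunks, taps=(-0.25, 0.5, -0.25)):
    """Same filter, but each chunk sees only its own samples plus a one-sample halo."""
    n = len(series)
    size = -(-n // n_chunks)  # ceiling division
    out = []
    for start in range(0, n, size):
        stop = min(start + size, n)
        lo, hi = max(start - 1, 0), min(stop + 1, n)  # extend by the halo
        local = three_tap(series[lo:hi], taps)
        # Drop the halo outputs and keep only this chunk's own samples.
        out.extend(local[start - lo:start - lo + (stop - start)])
    return out

series = [math.sin(0.3 * t) for t in range(20)]
full = three_tap(series)
chunked = three_tap_chunked(series, n_chunks=4)
print("chunked result matches full result:", chunked == full)
```

Each chunk could run on a separate worker, and the only data exchanged between neighbors is one sample per pixel at each chunk boundary.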