Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... ·...

Preview:

Citation preview

Parallel Video Processing

Neal Wadhwa

December 10, 2012

Parallel Video Processing

I Processing is on uncompressed video

I Uncompressed videos uses huge amounts of space.

I 1080p at 30 FPS is one gigabyte per second

I Lots of algorithms are easy to parallelize due to independenceof processing in space or time.

Parallel Video Processing

I Processing is on uncompressed video

I Uncompressed videos uses huge amounts of space.

I 1080p at 30 FPS is one gigabyte per second

I Lots of algorithms are easy to parallelize due to independenceof processing in space or time.

Parallel Video Processing

I Processing is on uncompressed video

I Uncompressed videos uses huge amounts of space.

I 1080p at 30 FPS is one gigabyte per second

I Lots of algorithms are easy to parallelize due to independenceof processing in space or time.

Parallel Video Processing

I Processing is on uncompressed video

I Uncompressed videos uses huge amounts of space.

I 1080p at 30 FPS is one gigabyte per second

I Lots of algorithms are easy to parallelize due to independenceof processing in space or time.

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Outline of algorithm - Spatial Decomposition

I Say you have frames F1, . . . ,Fn

I Decompose each frame into different spatial bands

Fi → (Di ,1, . . . ,Di ,k)

uses k times as much space as the original frame.

I Decomposition is performing by FFT, multiplying by filtersand applying IFFT

Di ,j = F−1{Tj ×F{Fi}}

I Transform is invertible.

Outline of algorithm - Spatial Decomposition

I Say you have frames F1, . . . ,FnI Decompose each frame into different spatial bands

Fi → (Di ,1, . . . ,Di ,k)

uses k times as much space as the original frame.

I Decomposition is performing by FFT, multiplying by filtersand applying IFFT

Di ,j = F−1{Tj ×F{Fi}}

I Transform is invertible.

Outline of algorithm - Spatial Decomposition

I Say you have frames F1, . . . ,FnI Decompose each frame into different spatial bands

Fi → (Di ,1, . . . ,Di ,k)

uses k times as much space as the original frame.

I Decomposition is performing by FFT, multiplying by filtersand applying IFFT

Di ,j = F−1{Tj ×F{Fi}}

I Transform is invertible.

Outline of algorithm - Spatial Decomposition

I Say you have frames F1, . . . ,FnI Decompose each frame into different spatial bands

Fi → (Di ,1, . . . ,Di ,k)

uses k times as much space as the original frame.

I Decomposition is performing by FFT, multiplying by filtersand applying IFFT

Di ,j = F−1{Tj ×F{Fi}}

I Transform is invertible.

Outline of algorithm - Spatial Decomposition

I Create decompositionfor every frame

x

y

Space (x)

Spa

ce (y)

Time (t)

Levels

Orientations

Space (x)

Spa

ce(y

)

Time (t)

Outline of algorithm - Spatial Decomposition

I Create decompositionfor every frame

x

y

Space (x)

Spa

ce (y)

Time (t)

Levels

Orientations

Space (x)

Spa

ce(y

)

Time (t)

Outline of algorithm - Spatial Decomposition

I Create decompositionfor every frame

x

y

Space (x)

Spa

ce (y)

Time (t)

Levels

Orientations

Space (x)

Spa

ce(y

)

Time (t)

Outline of Algorithm

I For every pixel in every level, values contain motion signal

Outline of Algorithm

I Bandpass from 100 Hz to 120Hz

I Add bandpassed signal to original signal

0 50 100 150 200 250 300

−3

−2

−1

0

1

2

3

Time (t)0 50 100 150 200 250 300

−3

−2

−1

0

1

2

3

Time (t)

Bandpass 100-120 Hz Magnified)

I Amplifies only selected frequency

Outline of Algorithm

I Bandpass from 100 Hz to 120Hz

I Add bandpassed signal to original signal

0 50 100 150 200 250 300

−3

−2

−1

0

1

2

3

Time (t)0 50 100 150 200 250 300

−3

−2

−1

0

1

2

3

Time (t)

Bandpass 100-120 Hz Magnified)

I Amplifies only selected frequency

Easy to Parallelize

I Parallelize spatial decomposition over frames

I Parallelize temporal filtering over pixels.

I Difficulty lies in how to store data over cores.

Easy to Parallelize

I Parallelize spatial decomposition over frames

I Parallelize temporal filtering over pixels.

I Difficulty lies in how to store data over cores.

Easy to Parallelize

I Parallelize spatial decomposition over frames

I Parallelize temporal filtering over pixels.

I Difficulty lies in how to store data over cores.

Matlab vs. JuliaI Relatively easy to port code to Julia

I Compare serial performance of matlab vs. julia at differentimage sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.I Not surprising since main processing occurs in ffts (in libfftw).I Uses 400 GB at largest problem size, 1600x1600x300.

Matlab vs. JuliaI Relatively easy to port code to JuliaI Compare serial performance of matlab vs. julia at different

image sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.I Not surprising since main processing occurs in ffts (in libfftw).I Uses 400 GB at largest problem size, 1600x1600x300.

Matlab vs. JuliaI Relatively easy to port code to JuliaI Compare serial performance of matlab vs. julia at different

image sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.

I Not surprising since main processing occurs in ffts (in libfftw).I Uses 400 GB at largest problem size, 1600x1600x300.

Matlab vs. JuliaI Relatively easy to port code to JuliaI Compare serial performance of matlab vs. julia at different

image sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.I Not surprising since main processing occurs in ffts (in libfftw).

I Uses 400 GB at largest problem size, 1600x1600x300.

Matlab vs. JuliaI Relatively easy to port code to JuliaI Compare serial performance of matlab vs. julia at different

image sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.I Not surprising since main processing occurs in ffts (in libfftw).I Uses 400 GB at largest problem size, 1600x1600x300.

Matlab Parfor

I Parfor gives factor of two improvement when used with 12cores.

I Parfor processing on frames and on temporal processing

103

104

105

106

107

100

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

Serial Matlabparfor Matlab

I Only 2x improvement

Matlab Parfor

I Parfor gives factor of two improvement when used with 12cores.

I Parfor processing on frames and on temporal processing

103

104

105

106

107

100

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

Serial Matlabparfor Matlab

I Only 2x improvement

Matlab Parfor

I Parfor gives factor of two improvement when used with 12cores.

I Parfor processing on frames and on temporal processing

103

104

105

106

107

100

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

Serial Matlabparfor Matlab

I Only 2x improvement

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in Julia

I Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.I In parallel code, temporal processing uses 50% of time.

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in JuliaI Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.I In parallel code, temporal processing uses 50% of time.

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in JuliaI Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.I In parallel code, temporal processing uses 50% of time.

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in JuliaI Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.

I In parallel code, temporal processing uses 50% of time.

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in JuliaI Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.I In parallel code, temporal processing uses 50% of time.

Change processing to use 3-tap primal domain temporalfilter

I Makes temporal processing more local to avoidcommunication overhead.

I Store temporally close pixels on same processors

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial Julia3Tap Filter instead of FFT

I 2.5x faster than Matlab, 5x faster than serial JuliaI Matlab parfor fails to capitalize on this

Change processing to use 3-tap primal domain temporalfilter

I Makes temporal processing more local to avoidcommunication overhead.

I Store temporally close pixels on same processors

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial Julia3Tap Filter instead of FFT

I 2.5x faster than Matlab, 5x faster than serial JuliaI Matlab parfor fails to capitalize on this

Change processing to use 3-tap primal domain temporalfilter

I Makes temporal processing more local to avoidcommunication overhead.

I Store temporally close pixels on same processors

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial Julia3Tap Filter instead of FFT

I 2.5x faster than Matlab, 5x faster than serial Julia

I Matlab parfor fails to capitalize on this

Change processing to use 3-tap primal domain temporalfilter

I Makes temporal processing more local to avoidcommunication overhead.

I Store temporally close pixels on same processors

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial Julia3Tap Filter instead of FFT

I 2.5x faster than Matlab, 5x faster than serial JuliaI Matlab parfor fails to capitalize on this

Recommended