45
Parallel Video Processing Neal Wadhwa December 10, 2012

Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Parallel Video Processing

Neal Wadhwa

December 10, 2012

Page 2: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Parallel Video Processing

I Processing is on uncompressed video

I Uncompressed videos uses huge amounts of space.

I 1080p at 30 FPS is one gigabyte per second

I Lots of algorithms are easy to parallelize due to independenceof processing in space or time.

Page 3: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Parallel Video Processing

I Processing is on uncompressed video

I Uncompressed videos uses huge amounts of space.

I 1080p at 30 FPS is one gigabyte per second

I Lots of algorithms are easy to parallelize due to independenceof processing in space or time.

Page 4: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Parallel Video Processing

I Processing is on uncompressed video

I Uncompressed videos uses huge amounts of space.

I 1080p at 30 FPS is one gigabyte per second

I Lots of algorithms are easy to parallelize due to independenceof processing in space or time.

Page 5: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Parallel Video Processing

I Processing is on uncompressed video

I Uncompressed videos uses huge amounts of space.

I 1080p at 30 FPS is one gigabyte per second

I Lots of algorithms are easy to parallelize due to independenceof processing in space or time.

Page 6: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Page 7: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Page 8: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Page 9: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Page 10: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Example: Motion Magnification

I DSP based method to magnify subtle motions

I Here are some cool examples of motion magnification.

I Switch to video.

I FFT-based algorithm lends itself to being parallelized.

I Try to parallelize and see how far we can get

Page 11: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Page 12: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Page 13: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Page 14: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Page 15: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - three stages

I 1. Spatially decompose each frame.

I 2. Temporally process each pixel in each decomposition level

I 3. Reconstruct each frame

I Every stage is easy to parallelize individually

I Serial algorithm takes several hours on high resolution videos.

Page 16: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - Spatial Decomposition

I Say you have frames F1, . . . ,Fn

I Decompose each frame into different spatial bands

Fi → (Di ,1, . . . ,Di ,k)

uses k times as much space as the original frame.

I Decomposition is performing by FFT, multiplying by filtersand applying IFFT

Di ,j = F−1{Tj ×F{Fi}}

I Transform is invertible.

Page 17: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - Spatial Decomposition

I Say you have frames F1, . . . ,FnI Decompose each frame into different spatial bands

Fi → (Di ,1, . . . ,Di ,k)

uses k times as much space as the original frame.

I Decomposition is performing by FFT, multiplying by filtersand applying IFFT

Di ,j = F−1{Tj ×F{Fi}}

I Transform is invertible.

Page 18: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - Spatial Decomposition

I Say you have frames F1, . . . ,FnI Decompose each frame into different spatial bands

Fi → (Di ,1, . . . ,Di ,k)

uses k times as much space as the original frame.

I Decomposition is performing by FFT, multiplying by filtersand applying IFFT

Di ,j = F−1{Tj ×F{Fi}}

I Transform is invertible.

Page 19: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - Spatial Decomposition

I Say you have frames F1, . . . ,FnI Decompose each frame into different spatial bands

Fi → (Di ,1, . . . ,Di ,k)

uses k times as much space as the original frame.

I Decomposition is performing by FFT, multiplying by filtersand applying IFFT

Di ,j = F−1{Tj ×F{Fi}}

I Transform is invertible.

Page 20: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - Spatial Decomposition

I Create decompositionfor every frame

x

y

Space (x)

Spa

ce (y)

Time (t)

Levels

Orientations

Space (x)

Spa

ce(y

)

Time (t)

Page 21: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - Spatial Decomposition

I Create decompositionfor every frame

x

y

Space (x)

Spa

ce (y)

Time (t)

Levels

Orientations

Space (x)

Spa

ce(y

)

Time (t)

Page 22: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of algorithm - Spatial Decomposition

I Create decompositionfor every frame

x

y

Space (x)

Spa

ce (y)

Time (t)

Levels

Orientations

Space (x)

Spa

ce(y

)

Time (t)

Page 23: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of Algorithm

I For every pixel in every level, values contain motion signal

Page 24: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of Algorithm

I Bandpass from 100 Hz to 120Hz

I Add bandpassed signal to original signal

0 50 100 150 200 250 300

−3

−2

−1

0

1

2

3

Time (t)0 50 100 150 200 250 300

−3

−2

−1

0

1

2

3

Time (t)

Bandpass 100-120 Hz Magnified)

I Amplifies only selected frequency

Page 25: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Outline of Algorithm

I Bandpass from 100 Hz to 120Hz

I Add bandpassed signal to original signal

0 50 100 150 200 250 300

−3

−2

−1

0

1

2

3

Time (t)0 50 100 150 200 250 300

−3

−2

−1

0

1

2

3

Time (t)

Bandpass 100-120 Hz Magnified)

I Amplifies only selected frequency

Page 26: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Easy to Parallelize

I Parallelize spatial decomposition over frames

I Parallelize temporal filtering over pixels.

I Difficulty lies in how to store data over cores.

Page 27: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Easy to Parallelize

I Parallelize spatial decomposition over frames

I Parallelize temporal filtering over pixels.

I Difficulty lies in how to store data over cores.

Page 28: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Easy to Parallelize

I Parallelize spatial decomposition over frames

I Parallelize temporal filtering over pixels.

I Difficulty lies in how to store data over cores.

Page 29: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Matlab vs. JuliaI Relatively easy to port code to Julia

I Compare serial performance of matlab vs. julia at differentimage sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.I Not surprising since main processing occurs in ffts (in libfftw).I Uses 400 GB at largest problem size, 1600x1600x300.

Page 30: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Matlab vs. JuliaI Relatively easy to port code to JuliaI Compare serial performance of matlab vs. julia at different

image sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.I Not surprising since main processing occurs in ffts (in libfftw).I Uses 400 GB at largest problem size, 1600x1600x300.

Page 31: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Matlab vs. JuliaI Relatively easy to port code to JuliaI Compare serial performance of matlab vs. julia at different

image sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.

I Not surprising since main processing occurs in ffts (in libfftw).I Uses 400 GB at largest problem size, 1600x1600x300.

Page 32: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Matlab vs. JuliaI Relatively easy to port code to JuliaI Compare serial performance of matlab vs. julia at different

image sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.I Not surprising since main processing occurs in ffts (in libfftw).

I Uses 400 GB at largest problem size, 1600x1600x300.

Page 33: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Matlab vs. JuliaI Relatively easy to port code to JuliaI Compare serial performance of matlab vs. julia at different

image sizes.

103

104

105

106

107

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Julia Code

MatlabJulia

I Julia is slightly slower, but comparable.I Not surprising since main processing occurs in ffts (in libfftw).I Uses 400 GB at largest problem size, 1600x1600x300.

Page 34: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Matlab Parfor

I Parfor gives factor of two improvement when used with 12cores.

I Parfor processing on frames and on temporal processing

103

104

105

106

107

100

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

Serial Matlabparfor Matlab

I Only 2x improvement

Page 35: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Matlab Parfor

I Parfor gives factor of two improvement when used with 12cores.

I Parfor processing on frames and on temporal processing

103

104

105

106

107

100

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

Serial Matlabparfor Matlab

I Only 2x improvement

Page 36: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Matlab Parfor

I Parfor gives factor of two improvement when used with 12cores.

I Parfor processing on frames and on temporal processing

103

104

105

106

107

100

101

102

103

104

105

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

Serial Matlabparfor Matlab

I Only 2x improvement

Page 37: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in Julia

I Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.I In parallel code, temporal processing uses 50% of time.

Page 38: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in JuliaI Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.I In parallel code, temporal processing uses 50% of time.

Page 39: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in JuliaI Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.I In parallel code, temporal processing uses 50% of time.

Page 40: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in JuliaI Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.

I In parallel code, temporal processing uses 50% of time.

Page 41: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Julia spawnat vs. Matlab parforI Parllelize the spatial decomposition and reconstruction in JuliaI Faster than serial Julia for large problem size.

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial JuliaParallelize Spatial Decomposition in Julia

I The spatial decomposition is extremely fast, but reordereddata for temporal filtering is very, very slow.

I In serial code, temporal processing uses 14% of time.I In parallel code, temporal processing uses 50% of time.

Page 42: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Change processing to use 3-tap primal domain temporalfilter

I Makes temporal processing more local to avoidcommunication overhead.

I Store temporally close pixels on same processors

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial Julia3Tap Filter instead of FFT

I 2.5x faster than Matlab, 5x faster than serial JuliaI Matlab parfor fails to capitalize on this

Page 43: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Change processing to use 3-tap primal domain temporalfilter

I Makes temporal processing more local to avoidcommunication overhead.

I Store temporally close pixels on same processors

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial Julia3Tap Filter instead of FFT

I 2.5x faster than Matlab, 5x faster than serial JuliaI Matlab parfor fails to capitalize on this

Page 44: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Change processing to use 3-tap primal domain temporalfilter

I Makes temporal processing more local to avoidcommunication overhead.

I Store temporally close pixels on same processors

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial Julia3Tap Filter instead of FFT

I 2.5x faster than Matlab, 5x faster than serial Julia

I Matlab parfor fails to capitalize on this

Page 45: Parallel Video Processingcourses.csail.mit.edu/18.337/2012/projects/nealwadhwa_18... · 2015-08-21 · Parallel Video Processing I Processing is on uncompressed video I Uncompressed

Change processing to use 3-tap primal domain temporalfilter

I Makes temporal processing more local to avoidcommunication overhead.

I Store temporally close pixels on same processors

103

104

105

106

107

100

101

102

103

104

Frame Size in Pixels

RunningTim

e

Comparison of Serial Matlab and Matlab with parfor

parfor MatlabSerial Julia3Tap Filter instead of FFT

I 2.5x faster than Matlab, 5x faster than serial JuliaI Matlab parfor fails to capitalize on this