CS 376b Introduction to Computer Vision
03 / 31 / 2008
Instructor: Michael Eckmann
Michael Eckmann - Skidmore College - CS 376b - Spring 2008
Today’s Topics
• Comments/Questions
• Motion
– background subtraction scheme
– definitions
Motion
• Detect/describe motion (of an object or objects, or of the scene as a whole ...) from an image sequence, where each frame is separated by some time t (e.g. a video at 30 fps has t = 1/30 sec.).
• There are several cases that lend themselves to different approaches
– a non-moving camera imaging a static scene and we are to detect one moving object
– a non-moving camera imaging a static scene and we are to detect multiple moving objects
– a moving camera imaging a static scene
– a moving camera and moving objects
Motion
• a non-moving camera imaging a static scene and we are to detect one moving object: we can use background subtraction.
• This is from figure 9.1 in Shapiro and Stockman (credit due to S.-W. Chen)
Motion
• let's consider an overall algorithm for the real scene just shown
– when one frame is subtracted from another, will the result be black (0)?
  • most likely not, due to noise, flickering of fluorescent lights, etc.
  • so we'd probably need to choose a threshold above which the difference is deemed significant
– is it possible that this threshold causes pixels on the moving object to go undetected? can it also let through insignificant (noise) pixels?
  • yes to both, so we can run connected components afterwards and remove all the small regions, which are considered to be due to noise
– after connected components, we might still have holes to fill, so what could we do?
  • close (morphological closing: dilation followed by erosion)
Motion
• algorithm 9.1 from Shapiro and Stockman
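The steps discussed for the background subtraction scheme (subtract, threshold, connected components, remove small regions, close) can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of algorithm 9.1: images are plain lists of lists of gray values, the names `detect_changes`, `dilate`, `erode`, `threshold`, and `min_area` are hypothetical, and 4-connectivity with a 3x3 structuring element is an arbitrary choice.

```python
from collections import deque


def _morph(mask, combine, outside):
    """Apply a 3x3 min/max filter; pixels outside the image count as `outside`."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [mask[r + dr][c + dc]
                    if 0 <= r + dr < rows and 0 <= c + dc < cols else outside
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = combine(vals)
    return out


def dilate(mask):
    return _morph(mask, max, 0)


def erode(mask):
    return _morph(mask, min, 1)


def detect_changes(background, frame, threshold=30, min_area=4):
    """Return a 0/1 mask of significant change between two gray images."""
    rows, cols = len(frame), len(frame[0])
    # 1. Absolute difference, thresholded so that small noise is ignored.
    mask = [[1 if abs(frame[r][c] - background[r][c]) > threshold else 0
             for c in range(cols)] for r in range(rows)]
    # 2. Connected components (4-connectivity); erase regions below min_area.
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_area:
                for y, x in region:
                    mask[y][x] = 0
    # 3. Closing (dilate, then erode) to fill small holes in what remains.
    return erode(dilate(mask))
```

In practice one would likely use an image library for each step; plain lists are used here only so the sketch runs without dependencies.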
Motion
• What if you were concerned with just determining whether the scene changed, not where?
– any ideas?
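One possible answer, which is not stated on the slide, is to skip localization entirely and just count how many pixels changed significantly. The sketch below assumes gray images stored as lists of lists; `scene_changed`, `pixel_threshold`, and `fraction_threshold` are hypothetical names and the default values are arbitrary.

```python
def scene_changed(prev, curr, pixel_threshold=30, fraction_threshold=0.05):
    """Report whether enough pixels differ significantly between two frames."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    # A few isolated changed pixels are treated as noise, not a scene change.
    return changed / total > fraction_threshold
```

Other global measures (for example, comparing intensity histograms of the two frames) would fit the same question.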
Motion
• The motion field is defined as: a 2d array of 2d vectors representing the motion of 3d scene points. Each vector can be drawn from the image of a scene point at time t to the image of that same point at time t+dt.
• The focus of expansion (FOE) is defined as: the point in the image plane from which motion field vectors diverge. (This is usually the point toward which the camera is moving.)
• The focus of contraction (FOC) is defined as: the point in the image plane toward which motion field vectors converge. (This is usually the point from which the camera is moving away.)
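To make the FOE/FOC definitions concrete, here is a small sketch that estimates the point from which a set of motion vectors diverge (or toward which they converge). It assumes each vector lies on a line through that point and solves for the least-squares intersection of those lines; the function name `estimate_focus` and the (x, y) tuple representation are assumptions made for illustration, not something taken from the text.

```python
def estimate_focus(points, vectors):
    """Least-squares intersection of the lines through each point along its vector.

    points  -- list of (x, y) image positions
    vectors -- list of (u, v) motion vectors at those positions
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), (u, v) in zip(points, vectors):
        # The focus e satisfies n . e = n . p, where n is normal to (u, v).
        nx, ny = -v, u
        rhs = nx * x + ny * y
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * rhs
        b2 += ny * rhs
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("motion vectors are (nearly) parallel; no unique focus")
    # Solve the 2x2 normal equations with Cramer's rule.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

The same computation applies to both cases; whether the result is an FOE or an FOC depends on whether the vectors point away from it or toward it.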
Motion
figure 9.2 from Shapiro and Stockman
Motion
• The text describes a video game that detects the motion field in a video of someone using their hands/arms to either run or jump.
Motion
• To compute motion field vectors we'll need to detect and locate interest points with high accuracy.
Motion
• To correspond points from one image to the next:
– take the neighborhood of an interesting point and look in some small window of the next frame (assuming motion is limited to some speed/distance) and find the best matching neighborhood.
• How might you compare neighborhoods?
– Can use cross-correlation of the neighborhoods; the largest value is the best match.
– The smallest SSD (sum of squared differences) can also be used as a match value (L1 or L2 distance works too).
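The matching step can be sketched with SSD as the comparison measure. This is a minimal illustration, assuming gray images stored as lists of lists and an interest point far enough from the border that its neighborhood fits; `ssd`, `best_match`, `half` (neighborhood half-width), and `search` (maximum displacement tried) are hypothetical names.

```python
def ssd(img_a, ra, ca, img_b, rb, cb, half):
    """Sum of squared differences between two (2*half+1)-square neighborhoods."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            diff = img_a[ra + dr][ca + dc] - img_b[rb + dr][cb + dc]
            total += diff * diff
    return total


def best_match(frame1, frame2, r, c, half=1, search=2):
    """Return the displacement (dr, dc) whose neighborhood in frame2 best
    matches the neighborhood of (r, c) in frame1 (smallest SSD)."""
    rows, cols = len(frame2), len(frame2[0])
    best = None
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r2, c2 = r + dr, c + dc
            # Skip candidates whose neighborhood would leave the image.
            if half <= r2 < rows - half and half <= c2 < cols - half:
                score = ssd(frame1, r, c, frame2, r2, c2, half)
                if best is None or score < best[0]:
                    best = (score, dr, dc)
    return best[1], best[2]
```

Swapping in cross-correlation would mean keeping the largest score instead of the smallest; in practice a normalized form is usually preferred, since raw cross-correlation favors bright regions.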
Motion
• To compute motion field vectors we can detect
– high interest points (have high energy in many directions)
– edges are problematic here (only one direction)
– corners (two directions)
– anything that can be located accurately in a later image
– centroids of moving regions after segmentation could be tracked as well.
• discussion on the board why corners are more accurately located than edges.
Motion
• The text describes an interest operator to detect high interest points
– the smallest variance in the vertical, horizontal, and 2 diagonal directions in a neighborhood must be above some threshold
– how accurately can such a point be located?
Motion
• from algorithm 9.2 in Shapiro and Stockman
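The interest operator described above (smallest of the variances along the horizontal, vertical, and two diagonal directions, compared against a threshold) might look roughly like this. It is a sketch in the spirit of that description rather than a copy of algorithm 9.2; the names `variance`, `interest`, `interest_points`, the half-width `w`, and the default threshold are assumptions.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def interest(image, r, c, w=2):
    """Smallest directional variance through pixel (r, c) over 2*w+1 samples."""
    offsets = range(-w, w + 1)
    horizontal = [image[r][c + k] for k in offsets]
    vertical = [image[r + k][c] for k in offsets]
    diagonal = [image[r + k][c + k] for k in offsets]
    anti_diagonal = [image[r + k][c - k] for k in offsets]
    lines = (horizontal, vertical, diagonal, anti_diagonal)
    return min(variance(line) for line in lines)


def interest_points(image, w=2, threshold=100.0):
    """All pixels (away from the border) whose interest exceeds the threshold."""
    rows, cols = len(image), len(image[0])
    return [(r, c)
            for r in range(w, rows - w)
            for c in range(w, cols - w)
            if interest(image, r, c, w) > threshold]
```

Taking the minimum is what rejects edges: along a straight edge the variance in the edge's own direction is near zero, so the point scores low, whereas at a corner every direction shows variation.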
Motion
• Let's consider some places where this kind of detection and matching scheme might break down.
– What assumptions need to be true to make the scheme work?
Motion
• MPEG
– to compress video (an image sequence)
– replaces a 16x16 image block with a motion vector describing the motion of that block
  • so in a later frame, a 16x16 block is represented as a vector
– only the vector is stored if the blocks are identical (or very close)
  • if they differ by too much, encode the difference
• These 16x16 blocks and motion vectors are computed between, say, a frame f_i and f_{i+3}
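The block-to-vector idea can be illustrated with a toy encoder for a single block. This is a heavily simplified sketch and not the MPEG standard: it assumes gray frames stored as lists of lists, uses the sum of absolute differences as the match measure, and stores the raw pixel difference when the match is not close enough. The names `sad`, `encode_block`, `search`, and `max_error`, and the convention that the vector points from the current block to its source in the reference frame, are all assumptions.

```python
def sad(ref, cur, rr, rc, cr, cc, size):
    """Sum of absolute differences between a reference and a current block."""
    return sum(abs(cur[cr + i][cc + j] - ref[rr + i][rc + j])
               for i in range(size) for j in range(size))


def encode_block(ref, cur, top, left, size=16, search=4, max_error=0):
    """Encode the block of `cur` at (top, left) relative to frame `ref`.

    Returns {"vector": (dr, dc)} when the best-matching reference block is
    close enough, otherwise also includes the per-pixel "residual".
    """
    rows, cols = len(ref), len(ref[0])
    best = None
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, rc = top + dr, left + dc
            # Only consider reference blocks that lie fully inside the frame.
            if 0 <= rr <= rows - size and 0 <= rc <= cols - size:
                err = sad(ref, cur, rr, rc, top, left, size)
                if best is None or err < best[0]:
                    best = (err, dr, dc)
    err, dr, dc = best
    if err <= max_error:
        return {"vector": (dr, dc)}
    residual = [[cur[top + i][left + j] - ref[top + dr + i][left + dc + j]
                 for j in range(size)] for i in range(size)]
    return {"vector": (dr, dc), "residual": residual}
```

A decoder would copy the reference block indicated by the vector and, when a residual is present, add it back. Real encoders do considerably more than this.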
Motion
• from figure 9.7 Shapiro and Stockman