Introduction to Computer Vision

Sebastian van Delden

USC Upstate

[email protected]

What is Computer Vision?

The goal of Computer Vision (or Machine Vision) is:

to make useful decisions about real physical objects and scenes based on sensed images.

to construct scene descriptions from images.

Issues

Sensing: How do sensors obtain images from the world?

Encoded Information: How do images yield information for understanding the 3D world, including the geometry, texture, motion, and identity of objects in it?

Representations: How to represent objects, their parts, properties, and relationships, in a computer?

Algorithms: How to process image information and construct descriptions of the objects and the world?

Digital Images

A 2D image is a projection of a scene from a specific viewpoint; many 3D features are captured, but some are not.

A digital image is a 2D arrangement of pixels (picture elements) with a fixed number of rows and columns.

In grayscale images, each pixel is a single gray value, usually in the range [0…255], where 0 is black and 255 is white.

In color images, each pixel has color information associated with it.

RGB color scheme – quantities of Red, Green and Blue.

True color uses 1 byte for Red, 1 for Green, and 1 for Blue.

For many computer vision applications, color is not needed.
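A minimal sketch of a true-color pixel stored as 3 bytes, and of collapsing color to a single gray value when color is not needed. The conversion weights are an assumption: the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are one common choice, and a plain average is also used. The function names are hypothetical.

```python
# Sketch: 24-bit "true color" pixels and a grayscale conversion.
# ASSUMPTION: BT.601 luma weights; other weightings exist.

def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three 0..255 channel bytes into one 24-bit integer."""
    for v in (r, g, b):
        if not 0 <= v <= 255:
            raise ValueError("channel values must be in 0..255")
    return (r << 16) | (g << 8) | b

def unpack_rgb(p: int) -> tuple[int, int, int]:
    """Recover the (R, G, B) bytes from a packed 24-bit pixel."""
    return (p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF

def to_gray(r: int, g: int, b: int) -> int:
    """Weighted sum of the channels, rounded and kept within 0..255."""
    return min(255, round(0.299 * r + 0.587 * g + 0.114 * b))

def image_to_gray(rgb_image):
    """Convert a 2D list of (R, G, B) tuples into a 2D list of gray values."""
    return [[to_gray(*px) for px in row] for row in rgb_image]
```

With 1 byte per channel, a 640 x 480 true color image needs 640 × 480 × 3 = 921,600 bytes, while the same image in grayscale needs 307,200 bytes.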

Digital Images cont…

Example digital image with an 8 x 8 block of pixels from the left eye.

Numerous Applications

Image Database Querying
Aerial Images and GIS
Medical Imaging
Processing Scanned Text
Understanding a Scene of Parts
Inspection applications
Automated navigation
Etc., etc., etc.…

Understanding a Scene of Parts

A robot needs to classify (or inspect) parts and act accordingly.

Combining Multiple Images

Images with a constant background are subtracted to detect change over time.

Pixel differences at the boundary reveal the moving object and its shape.
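A minimal sketch of change detection by subtracting two frames taken against a constant background. It assumes grayscale frames stored as equal-sized 2D lists of ints and a hand-picked difference threshold; `difference_mask` and `diff_threshold` are hypothetical names.

```python
# Sketch: mark pixels whose intensity changed between two frames.
# ASSUMPTION: diff_threshold is a hypothetical, hand-tuned value.

def difference_mask(frame_a, frame_b, diff_threshold=30):
    """Return a binary mask: 1 where |a - b| > diff_threshold, else 0."""
    if len(frame_a) != len(frame_b) or any(
        len(ra) != len(rb) for ra, rb in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have the same size")
    return [
        [1 if abs(a - b) > diff_threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]
```

For example, if a bright object (200) on a dark background (10) shifts one pixel to the right, `difference_mask([[10, 200, 200, 10, 10]], [[10, 10, 200, 200, 10]])` flags only its trailing and leading edges: `[[0, 1, 0, 1, 0]]`. The interior of the object is unchanged, so only the boundary shows up.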

Combining Multiple Images

Images can also be added to blend them together.
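A minimal sketch of blending two grayscale images by weighted addition. A plain sum a + b would exceed 255 for bright pixels, so this sketch assumes a weight `alpha` (hypothetical name) for image A and 1 − alpha for image B, and clips the result to 0..255.

```python
# Sketch: weighted addition (blending) of two equal-sized grayscale images.
# ASSUMPTION: alpha in [0, 1] is the share of image A in the result.

def blend(img_a, img_b, alpha=0.5):
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [
        [min(255, max(0, round(alpha * a + (1 - alpha) * b))) for a, b in zip(ra, rb)]
        for ra, rb in zip(img_a, img_b)
    ]
```

With alpha = 0.5, each output pixel is the average of the two inputs; alpha = 1.0 returns image A unchanged.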

Reality Check

Success is hard won. Some potential issues:

Lighting fluctuations or inadequacies
Object positioning and occlusion
Background noise or other unimportant image features

Good news:

Industrial robotics usually provides a very controlled environment.

Imaging and Image Representation



Imaging Devices: CCD Camera

Charge-coupled Device (CCD): instead of chemical film that reacts to light (like 35mm film), tiny solid-state cells convert light energy into electrical charge.

Problems with Digital Images

Blooming

It is difficult to insulate adjacent sensing elements.

Charge often leaks from hot cells to neighbors, making bright regions larger.


Clipping or Wraparound

The dark grid intersections at left were actually the brightest points of the scene.

In A/D conversion, these bright values overflowed and wrapped around to low (dark) values.
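A minimal sketch contrasting clipping (saturation) with wraparound when a reading is stored in 8 bits (0..255). The raw values here are hypothetical sensor readings that may exceed 255.

```python
# Sketch: two ways an out-of-range reading can end up in an 8-bit pixel.

def clip_to_8bit(raw: int) -> int:
    """Saturate: anything above 255 is stored as 255 and still looks bright."""
    return max(0, min(255, raw))

def wrap_to_8bit(raw: int) -> int:
    """Overflow: keep only the low 8 bits, so 256 becomes 0 and looks dark."""
    return raw & 0xFF
```

A very bright reading of 300 is stored as 255 with clipping but as 300 − 256 = 44 with wraparound, which is how the brightest parts of a scene can turn dark in the image.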


Lens distortion distorts the image.

“Barrel distortion” of a rectangular grid is common for cheap lenses (around $50).

Precision lenses can cost $1000 or more.

Zoom lenses often show severe distortion.
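A minimal sketch of barrel distortion using a one-parameter radial model, x_d = x (1 + k1 r²), where r is the distance from the image center. This model and the coefficient `k1` are assumptions for illustration (real calibrations usually fit more terms); coordinates are assumed normalized and centered on the optical axis. With k1 < 0, points far from the center are pulled inward more than points near it.

```python
# Sketch: one-term radial distortion; k1 < 0 produces barrel distortion.
# ASSUMPTION: normalized coordinates with (0, 0) at the image center.

def distort(x: float, y: float, k1: float) -> tuple[float, float]:
    r2 = x * x + y * y       # squared distance from the image center
    scale = 1.0 + k1 * r2    # radial scale factor (< 1 when k1 < 0)
    return x * scale, y * scale
```

With k1 = −0.1, the grid corner (1, 1) maps to (0.8, 0.8) while the edge midpoint (1, 0) maps only to (0.9, 0), so the straight grid edge x = 1 bulges outward in the middle: the "barrel" shape.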


CCD Variations

Imperfections in CCD sensors can cause different readings for the same light intensity.

Chromatic Distortion

Different light wavelengths are bent differently by the lens.

Quantization Effects

Mixing and rounding problems.

Spatial Quantization Problems

Both pixel size and relative position make a difference.

A mixed pixel represents a mixture of intensity values in the real scene.

Small features can be lost or blended.
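A minimal 1D sketch of spatial quantization. It assumes each sensor pixel reports the average of the fine scene samples it covers (area averaging), with `factor` samples per pixel; `sample_scene` and `line_scene` are hypothetical helpers.

```python
# Sketch: area-averaging a fine 1D "scene" into coarse pixels.
# ASSUMPTION: each pixel averages `factor` consecutive scene samples.

def sample_scene(scene, factor):
    if len(scene) % factor != 0:
        raise ValueError("scene length must be a multiple of factor")
    return [sum(scene[i:i + factor]) // factor for i in range(0, len(scene), factor)]

def line_scene(length, start, width, bright=255, dark=0):
    """Hypothetical scene: a bright line `width` samples wide starting at `start`."""
    return [bright if start <= i < start + width else dark for i in range(length)]
```

With 4 samples per pixel, a 4-sample-wide line aligned with a pixel gives `[0, 255, 0, 0]`, but the same line shifted by 2 samples gives two mixed pixels, `[0, 127, 127, 0]` (relative position matters). A 1-sample-wide line gives `[0, 63, 0, 0]`: the small feature is blended into a dim gray.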

Portable Gray Map Image (PGM)

(Figure: annotated PGM file.) P2 means ASCII gray; lines starting with # are comments; W = 16, H = 8; 192 is the maximum intensity. A PGM file can be made with a text editor.
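A minimal sketch of producing and parsing the ASCII (P2) PGM text described above: magic number, optional # comments, width and height, maximum intensity, then the pixel values row by row. The function names and the default comment are hypothetical; writing the returned string to a file gives a PGM that a text editor can also open.

```python
# Sketch: ASCII PGM ("P2") as text. Pixels are a 2D list of ints in 0..maxval.

def format_pgm_p2(pixels, maxval=255, comment="example image"):
    height, width = len(pixels), len(pixels[0])
    lines = ["P2", f"# {comment}", f"{width} {height}", str(maxval)]
    lines += [" ".join(str(v) for v in row) for row in pixels]
    return "\n".join(lines) + "\n"

def parse_pgm_p2(text):
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())  # drop comments
    if not tokens or tokens[0] != "P2":
        raise ValueError("not an ASCII PGM (P2) file")
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    values = [int(t) for t in tokens[4:4 + width * height]]
    if len(values) != width * height:
        raise ValueError("text ends before all pixels were read")
    pixels = [values[r * width:(r + 1) * width] for r in range(height)]
    return width, height, maxval, pixels
```

For an image like the one in the figure, `format_pgm_p2(pixels, maxval=192)` with a 16-column, 8-row `pixels` list would emit the header lines `P2`, a comment, `16 8`, and `192`.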

Compression

Lossless – a decompression method exists that precisely recovers the original image.

Lossy – information is lost during compression and cannot be recovered during decompression (JPG). GIF's LZW compression is itself lossless, but reducing an image to GIF's 8-bit palette can discard color information.

GIF (Graphics Interchange Format) – only 8 bits are used for color; can contain transparency and animation.

JPG (Joint Photographic Experts Group) – for high-quality images; considers the human visual system and uses the discrete cosine transform and Huffman coding.
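A minimal sketch contrasting lossless and lossy round trips on raw 8-bit pixel bytes. It uses Python's standard `zlib` (DEFLATE) as a stand-in lossless codec, and a hypothetical lossy step that keeps only the top `keep_bits` bits of each pixel before compressing. Neither is how GIF or JPG actually work; they only illustrate the definitions.

```python
import zlib

# Sketch: lossless vs. lossy round trips on raw pixel bytes.

def lossless_roundtrip(pixels: bytes) -> bytes:
    """Compress with DEFLATE and decompress: the output equals the input."""
    return zlib.decompress(zlib.compress(pixels))

def lossy_roundtrip(pixels: bytes, keep_bits: int = 4) -> bytes:
    """Hypothetical lossy scheme: quantize to the top `keep_bits` bits, then
    compress losslessly. The dropped low bits can never be recovered."""
    if not 0 <= keep_bits <= 8:
        raise ValueError("keep_bits must be in 0..8")
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    quantized = bytes(p & mask for p in pixels)
    return zlib.decompress(zlib.compress(quantized))
```

With 4 bits kept, a pixel value of 0x37 (55) comes back as 0x30 (48), while the lossless round trip always returns the original bytes.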

Binary Image Analysis



Pixels and Neighborhoods

A binary image consists of only two intensities – 0 and 1 (or 0 and 255).

A binary image B can be obtained from a grayscale image I through an operation that selects a subset of image pixels as foreground pixels, the pixels of interest in an image. Everything else is considered background.

(Figure: example binary image shown as a grid of 0 and 1 pixel values.)
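The neighborhoods in this section's title are the 4-neighborhood (up, down, left, right) and the 8-neighborhood (those plus the four diagonals), which connected component labeling also relies on. A minimal sketch, assuming row/column indexing and skipping neighbors that fall outside the image; `neighbors` is a hypothetical helper.

```python
# Sketch: 4- and 8-neighbors of pixel (r, c) in a rows x cols image.

N4_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8_OFFSETS = N4_OFFSETS + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def neighbors(r, c, rows, cols, connectivity=4):
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    offsets = N4_OFFSETS if connectivity == 4 else N8_OFFSETS
    return [
        (r + dr, c + dc)
        for dr, dc in offsets
        if 0 <= r + dr < rows and 0 <= c + dc < cols  # stay inside the image
    ]
```

An interior pixel has 4 or 8 neighbors; a corner pixel has only 2 or 3.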

Thresholding and Segmentation

Gray level thresholding is the simplest segmentation process.

Many objects or image regions are characterized by constant reflectivity or light absorption of their surface.

Thresholding is computationally inexpensive and fast.

Thresholding can easily be done in real time using specialized hardware

Thresholding

(Figure: cherry image.) The background is black, the healthy cherry is bright, and the bruise is medium dark.

This histogram shows the two cherry regions (the black background has been removed).

(Figure: histogram of pixel counts versus gray-tone values.)
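A minimal sketch of computing a gray-level histogram like the one above: one pixel count per gray value. It assumes an 8-bit image as a 2D list of ints; the `ignore_background` option is a hypothetical way to mimic removing the black background, assuming background pixels are exactly 0.

```python
# Sketch: gray-level histogram of an 8-bit image.

def histogram(image, levels=256, ignore_background=False):
    counts = [0] * levels
    for row in image:
        for v in row:
            counts[v] += 1
    if ignore_background:
        counts[0] = 0  # ASSUMPTION: background pixels have value 0
    return counts
```

Two well-separated peaks in the counts (healthy vs. bruised cherry) suggest that a threshold placed between them can separate the regions.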

Thresholding Example

(Figure: original image and thresholded image, threshold 95.)

Thresholding Example

(Figure: over-segmentation with threshold 225 and under-segmentation with threshold 25.)

Algorithm

Thresholding algorithm: search all the pixels f(i,j) of the image f. An image element g(i,j) of the segmented image is an object pixel if f(i,j) >= T, and a background pixel otherwise.
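The thresholding rule above translates directly into code. A minimal sketch, assuming f is a 2D list of gray values and T is given; object pixels become 1 and background pixels 0.

```python
# Sketch: g(i, j) = 1 (object) if f(i, j) >= T, else 0 (background).

def threshold(f, T):
    return [[1 if v >= T else 0 for v in row] for row in f]
```

For example, `threshold([[10, 95, 200]], 95)` gives `[[0, 1, 1]]`: a pixel exactly equal to T counts as an object pixel.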

Correct threshold selection is crucial for successful threshold segmentation

Threshold selection can be interactive or can be the result of some threshold detection method
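As one example of a threshold detection method (not necessarily the one intended here), here is a minimal sketch of iterative intermeans ("isodata") selection. It starts at the global mean, splits pixels into two classes, moves T to the midpoint of the class means, and repeats until T stops changing. The tolerance and iteration cap are assumptions.

```python
# Sketch: iterative intermeans (isodata) threshold selection.
# ASSUMPTIONS: tol and max_iter are hand-picked stopping criteria.

def iterative_threshold(image, tol=0.5, max_iter=100):
    values = [v for row in image for v in row]
    T = sum(values) / len(values)           # start at the global mean
    for _ in range(max_iter):
        low = [v for v in values if v < T]   # background class
        high = [v for v in values if v >= T] # object class (f >= T)
        if not low or not high:              # only one class: nothing to split
            break
        new_T = (sum(low) / len(low) + sum(high) / len(high)) / 2
        converged = abs(new_T - T) < tol
        T = new_T
        if converged:
            break
    return T
```

For pixels split evenly between 10 and 200, the method settles at T = 105, halfway between the two gray levels.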

Region Properties

Once a binary image has been processed, we can obtain properties of the regions in the processed image.

Some of those properties are:

Area and centroid
Measures of circularity and elongation

Area and Centroid
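For a region R, the area A is the number of pixels in R, and the centroid is the mean row and column of those pixels: r̄ = (1/A) Σ r and c̄ = (1/A) Σ c, summing over all (r, c) in R. A minimal sketch, assuming a binary (or labeled) image as a 2D list and that the region is the set of pixels equal to `label` (a hypothetical parameter).

```python
# Sketch: area and centroid (row, column) of the pixels equal to `label`.

def area_and_centroid(B, label=1):
    area, sum_r, sum_c = 0, 0, 0
    for r, row in enumerate(B):
        for c, v in enumerate(row):
            if v == label:
                area += 1
                sum_r += r
                sum_c += c
    if area == 0:
        raise ValueError("region is empty")
    return area, sum_r / area, sum_c / area
```

For the 2 x 2 block of 1-pixels in `[[0, 1, 1], [0, 1, 1], [0, 0, 0]]`, the area is 4 and the centroid is (0.5, 1.5). The centroid need not fall on a pixel center.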

Connected Components

Pixels belong to the same component if they are connected by a chain of foreground pixels in which each pixel is a neighbor (in the 4- or 8-neighborhood) of the next.

Definition: A connected component labeling of binary image B is a labeled image LB in which the value of each pixel is the label of its connected component.

Recursive Labeling Algorithm

Given a binary image B:

Negate the image (change all 1-pixels to –1).

Search for a –1 pixel, give it a new label, then find its (4- or 8-) neighboring pixels with –1 and assign them the same label.

Apply this recursively to each newly labeled pixel until the whole component carries the label; then increment the label and search for the next –1 pixel…
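The recursive labeling steps can be sketched in Python as follows. This is a minimal illustration, assuming B is a 2D list of 0/1 values; `label_components` is a hypothetical name. Because the search is recursive, its depth grows with component size, so very large regions would need an explicit stack instead.

```python
# Sketch: recursive connected component labeling of a binary image B.

def label_components(B, connectivity=4):
    rows, cols = len(B), len(B[0])
    # Negate: unlabeled foreground pixels become -1, background stays 0.
    LB = [[-1 if v == 1 else 0 for v in row] for row in B]
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    def search(r, c, label):
        LB[r][c] = label
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and LB[nr][nc] == -1:
                search(nr, nc, label)  # recurse into unlabeled neighbors

    label = 0
    for r in range(rows):
        for c in range(cols):
            if LB[r][c] == -1:   # start of a new component
                label += 1
                search(r, c, label)
    return LB
```

The choice of neighborhood matters: an X-shaped pattern of five pixels forms five components under 4-connectivity but a single component under 8-connectivity.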

Robot/Camera Calibration



Vision Input Sensor

A camera can be used to provide visual input data to the robot. Objects in the input images must be represented in the robot world coordinate system so that the robot can manipulate them.

Two approaches: Visual Servoing and Calibration.

Visual Servoing

A target image is compared against the current image

An error vector is generated that indicates how the robot should be moved in order to minimize the error between the current and target images.

The machine is moved incrementally.

PRO: No need to calculate a mm/pixel ratio.

Very nice!

Visual Servoing Example: the camera is mounted to the end-effector, and the end-effector must move so that the circular piece is in the center of the image.
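A minimal simulation of the servoing loop in this example, assuming a simple proportional rule: the error vector is the circle's pixel position minus the image center, and each incremental move is `gain` times that error. The simulated camera shifts the feature by an unknown `pixels_per_mm` per millimeter of motion, a ratio the controller never uses. All names and numbers are hypothetical.

```python
# Sketch: proportional visual servoing toward the image center (simulated).
# ASSUMPTIONS: the sign of the motion is known, the scale is not, and the
# feature moves opposite to the camera in the image.

def servo_to_center(feature_px, target_px, pixels_per_mm, gain=0.5, tol=0.5, max_steps=100):
    x, y = feature_px
    tx, ty = target_px
    moves = []
    for _ in range(max_steps):
        ex, ey = x - tx, y - ty                 # error vector in pixels
        if (ex * ex + ey * ey) ** 0.5 < tol:    # close enough to the target
            break
        dx, dy = gain * ex, gain * ey           # incremental move command (mm)
        moves.append((dx, dy))
        # Simulated camera response; the controller never reads pixels_per_mm.
        x -= pixels_per_mm * dx
        y -= pixels_per_mm * dy
    return (x, y), moves
```

In this model, each step multiplies the error by (1 − gain × pixels_per_mm), so the loop converges whenever 0 < gain × pixels_per_mm < 2. The mm/pixel ratio only needs to be roughly bounded, not calculated. Starting at (400, 300) with the center at (320, 240) and pixels_per_mm = 1.5, the loop stops after 4 moves.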

Camera Calibration

Common in industry.

CON: Usually must be done manually and can become uncalibrated over time.

Steps:

Calculate the mm/pixel ratio.

Train a reference frame that can be seen in the input image.

Example Setup: Frame {f} has been trained in the robot work area. Parts can be assigned coordinates in this area. Consider Z to be fixed. Note: depth cannot be recovered this way…
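A minimal sketch of the two calibration steps for this setup, under stated assumptions. The mm/pixel ratio comes from an object of known length measured in the image. A pixel (u, v) is then expressed in frame {f}, given the pixel location of the frame's origin and the image angle of its x-axis, with Z held fixed because depth is not recovered. It assumes the camera looks straight down and the frame's axes have the same handedness as the image axes. All function and parameter names are hypothetical.

```python
import math

# Sketch: pixel coordinates -> coordinates in a trained frame {f} (Z fixed).

def mm_per_pixel(known_length_mm, measured_length_px):
    if measured_length_px <= 0:
        raise ValueError("measured length must be positive")
    return known_length_mm / measured_length_px

def pixel_to_frame(u, v, origin_px, x_axis_angle_rad, mm_per_px, z_fixed_mm=0.0):
    du, dv = u - origin_px[0], v - origin_px[1]   # offset from frame origin (pixels)
    cos_t, sin_t = math.cos(x_axis_angle_rad), math.sin(x_axis_angle_rad)
    # Rotate the offset by -theta so it is measured along the frame's axes.
    x_px = cos_t * du + sin_t * dv
    y_px = -sin_t * du + cos_t * dv
    return x_px * mm_per_px, y_px * mm_per_px, z_fixed_mm
```

For example, if a 50 mm edge spans 100 pixels, the ratio is 0.5 mm/pixel. A part seen 20 pixels along the frame's x-axis from its origin then lies at (10, 0) mm in {f}, at whatever fixed Z was chosen.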

Same example as before, but this is what the camera is seeing:
