Computational Theories & Low-level Pixels To Percepts A. Efros, CMU, Spring 2009


Four Stages of Visual Perception

© Stephen E. Palmer, 2002

Image-Based Processing

Surface-Based Processing

Object-Based Processing

Category-Based Processing

Light

Vision

Audition

STM

LTM

Motor

Sound

Light

Movement

Odor (etc.)

Ceramic cup on a table

David Marr, 1982



The Retinal Image

An Image (blowup) Receptor Output



Image-based Representation

Primal Sketch (Marr)

An Image

(Line Drawing)

Retinal Image

Image-based processes: edges, lines, blobs, etc.

We likely throw away a lot

line drawings are universal



Surface-based Representation

Primal Sketch --> 2.5-D Sketch

Surface-based processes: stereo, shading, motion, etc.

Single Surface (Koenderink’s trick)





Figure/Ground Organization

A contour belongs to one of the two (but not both) abutting regions.

Figure (face)

Ground (shapeless)

Figure (goblet)

Ground (shapeless)

Important for the perception of shape


Figure-Ground Organization

Properties of figures vs. grounds

Figure        Ground
Thing-like    Not thing-like
Closer        Farther
Shaped        Extends behind


Principles of figure-ground organization:

Surroundedness

Surrounded region --> Figure
Surrounding region --> Ground



Size

Smaller region --> Figure
Larger region --> Ground

Orientation

Horizontal/vertical region --> Figure
Oblique region --> Ground

Contrast

Higher contrast region --> Figure
Lower contrast region --> Ground

Symmetry

Symmetrical region --> Figure
Asymmetrical region --> Ground

Convexity

More convex region --> Figure
Less convex region --> Ground

Parallelism

More parallel region --> Figure
Less parallel region --> Ground

Lower region

Lower region --> Figure
Upper region --> Ground

Meaningfulness

More meaningful region --> Figure
Less meaningful region --> Ground


Relation to Depth Factors


Figure-ground organization as edge assignment: To which side does the edge belong?

To the closer side. This fact connects figure-ground organization with depth perception.

Depth cues can also be figure-ground factors, and figure-ground factors can be depth cues.



Occlusion

Occluding region --> Figure
Occluded region --> Ground

Cast Shadows

Shadowing region --> Figure
Shadowed region --> Ground

Shading

Shaded region --> Figure
Nonshaded region --> Ground

Line Labeling

> : contour direction
+ : convex edge
- : concave edge

Possible junctions (constraints)

Constraint Propagation

[Clowes 1971, Huffman 1971; Waltz 1972; Malik 1986]
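The constraint-propagation step can be sketched in a few lines. This is a minimal illustration in the spirit of Waltz filtering, not the algorithm from the cited papers. The junction candidates below are a made-up toy catalog, the names `waltz_filter`, `candidates` and `edge_ends` are hypothetical, and only the `+` and `-` labels are used (the directed `>` label is left out to keep the sketch short).

```python
def waltz_filter(candidates, edge_ends):
    """Prune junction labelings that no neighbouring junction can agree with.

    candidates: junction -> list of {edge: label} dicts (hypothetical catalog)
    edge_ends:  edge -> (junction, junction), the two junctions the edge joins
    """
    cands = {j: list(labs) for j, labs in candidates.items()}
    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for j in list(cands):
            kept = []
            for lab in cands[j]:
                consistent = True
                for edge, label in lab.items():
                    a, b = edge_ends[edge]
                    other = b if a == j else a
                    # The junction at the other end must offer the same label.
                    if not any(o[edge] == label for o in cands[other]):
                        consistent = False
                        break
                if consistent:
                    kept.append(lab)
            if len(kept) != len(cands[j]):
                cands[j] = kept
                changed = True
    return cands


# Toy scene (an assumption): junctions A - B - C joined by edges "ab" and "bc".
edge_ends = {"ab": ("A", "B"), "bc": ("B", "C")}
candidates = {
    "A": [{"ab": "+"}],
    "B": [{"ab": "+", "bc": "-"}, {"ab": "-", "bc": "+"}],
    "C": [{"bc": "-"}, {"bc": "+"}],
}
result = waltz_filter(candidates, edge_ends)
print(result)
```

In this toy scene, A only allows a convex `ab`, which rules out B's second labeling, which in turn rules out C's `+` labeling. One local constraint propagates along the chain.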




Object-based Representation

Object-based processes: grouping, parsing, completion, etc.

2.5-D Sketch --> Volumetric Sketch

Geons (Biederman '87)



Category-based Representation

Category-based processes: pattern recognition, spatial description

Object-based Representation

Volumetric Sketch --> Basic-level Category

Category: cup

Color: light-gray

Size: 6”

Location: table

We likely throw away a lot

line drawings are universal

However, things are not so simple…

● Problems with feed-forward model of processing…

Junctions in Real Images

Are Junctions local evidence?

J McDermott, 2004


Early vs. Late Grouping

Is grouping an early or late process?



Object-Based Processing


Light ? ? ? ?



Before or after stereoscopic depth?

(Rock & Brosgole, 1964)




Before or after lightness constancy?

(Rock, Nijhawan, Palmer & Tudor, 1992)

Reflectance Matched

Luminance Matched

Translucent Plastic Strip

Reflectance Matched

Luminance-Ratio Matched

Opaque Paper Strip



Before or after visual completion?

(Palmer, Neff & Beck, 1996)




Before or after illusory contours?

(Palmer & Nelson, 2000)

?




Conclusion: Grouping can occur “late”

Question: Can grouping also occur “early”?

(Palmer & Brooks, in preparation)




Grouping affects shape constancy

(Palmer & Brooks, in preparation)

Ambiguous

Flat oval

Circle in depth




Proximity effects

Biased toward oval

Biased toward circle




Color similarity effects

Biased toward oval Biased toward circle




Common fate effects

Biased toward oval Biased toward circle




Conclusion: Grouping occurs both “early” and “late” -- possibly everywhere!



Object-Based Processing


Light

Grouping Grouping Grouping Grouping


two-tone images

hair (not shadow!)

inferred external contours

“attached shadow” contour

“cast shadow” contour

Finding 3D structure in two-tone images requires distinguishing cast shadows, attached shadows, and areas of low reflectivity

The images do not contain this information a priori (at low level)

Cavanagh's argument

A Classical View of Vision

Low-level: pixels, features, edges, etc.

Mid-level: Grouping / Segmentation, Figure/Ground Organization

High-level: Object and Scene Recognition

A Contemporary View of Vision

Low-level: pixels, features, edges, etc.

Mid-level: Figure/Ground Organization, Grouping / Segmentation

High-level: Object and Scene Recognition

But where do we draw this line?

Question #1: What (if anything) should be done at the “Low-Level”?

N.B. I have already told you everything that is known. From now on, there aren’t any answers. Only questions…

Who cares? Why not just use pixels?

Pixel differences vs. Perceptual differences

Eye is not a photometer!

"Every light is a shade, compared to the higher lights, till you come to the sun; and every shade is a light, compared to the deeper shades, till you come to the night."

— John Ruskin, 1879

Cornsweet Illusion

Campbell-Robson contrast sensitivity curve

Sine wave

Metamers

Question #1: What (if anything) should be done at the “Low-Level”?

i.e. What input stimulus should we be invariant to?

Invariant to:

• Brightness / Color changes?

small brightness / color changes

low-frequency changes

But one can be too invariant

Invariant to:

• Edge contrast / reversal?

I shouldn’t care what background I am on!

but be careful of exaggerating noise

Representation choices

Raw Pixels

Gradients:

Gradient Magnitude:

Thresholded gradients (edge + sign):

Thresholded gradient mag. (edges):
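The trade-offs between these representations can be checked on a toy image. The sketch below is an illustration under stated assumptions, not the slide's implementation: it uses only a horizontal forward difference as the "gradient", a hand-picked threshold of 20, and hypothetical helper names (`gradient_x`, `magnitude`, `threshold_signed`, `threshold_mag`).

```python
def gradient_x(img):
    """Horizontal forward difference; each output row is one pixel narrower."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]


def magnitude(grad):
    """Gradient magnitude (absolute value, since only one direction is used)."""
    return [[abs(v) for v in row] for row in grad]


def threshold_signed(grad, t):
    """Edge + sign: +1 / -1 where the gradient is strong, 0 elsewhere."""
    return [[1 if v > t else -1 if v < -t else 0 for v in row] for row in grad]


def threshold_mag(grad, t):
    """Edges only: 1 where the gradient magnitude exceeds t, 0 elsewhere."""
    return [[1 if abs(v) > t else 0 for v in row] for row in grad]


# Toy image (an assumption): a dark-to-bright step, with slight noise in row 2.
img = [[10, 10, 200, 200],
       [10, 12, 200, 198]]
brighter = [[v + 50 for v in row] for row in img]   # additive brightness change
inverted = [[255 - v for v in row] for row in img]  # contrast reversal

g = gradient_x(img)
print(g)                                        # raw gradients keep the noise
print(gradient_x(brighter) == g)                # unchanged by the brightness shift
print(magnitude(gradient_x(inverted)) == magnitude(g))  # unchanged by reversal
print(threshold_signed(g, 20), threshold_signed(gradient_x(inverted), 20))
print(threshold_mag(g, 20))                     # same edges, noise suppressed
```

Raw pixels change under every manipulation. Gradients ignore the additive shift, magnitudes also ignore contrast reversal, and thresholding drops the small noisy responses at the cost of discarding weak real edges.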

Spatial invariance

• Rotation, Translation, Scale

• Yes, but not too much…

• In brain: complex cells – partial invariance

• In Comp. Vision: histogram-binning methods (SIFT, GIST, Shape Context, etc.) or, equivalently, blurring (e.g. Geometric Blur) -- will discuss later
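A rough sketch of how spatial binning buys partial invariance: counting edge pixels in coarse cells gives the same descriptor when an edge moves a little, but not when it moves a lot. This is a simplified stand-in, not SIFT or GIST themselves (those also bin by orientation). The cell size and the helpers `cell_histogram` and `shift_right` are assumptions made for illustration.

```python
def cell_histogram(edge_map, cell):
    """Count edge pixels in each cell x cell block (coarse spatial bins).

    Assumes the map height and width are multiples of `cell`.
    """
    h, w = len(edge_map), len(edge_map[0])
    return [[sum(edge_map[y][x]
                 for y in range(cy, cy + cell)
                 for x in range(cx, cx + cell))
             for cx in range(0, w, cell)]
            for cy in range(0, h, cell)]


def shift_right(edge_map, k):
    """Translate the map k pixels to the right, padding with zeros."""
    return [[0] * k + row[:len(row) - k] for row in edge_map]


# Toy 4 x 8 edge map (an assumption): one vertical edge in column 1.
edge_map = [[0, 1, 0, 0, 0, 0, 0, 0] for _ in range(4)]

base = cell_histogram(edge_map, 4)
small = cell_histogram(shift_right(edge_map, 1), 4)   # edge stays in its cell
large = cell_histogram(shift_right(edge_map, 3), 4)   # edge crosses into next cell
print(base, small, large)
```

The descriptor is only invariant while the edge stays inside its cell. A one-pixel shift across a cell boundary would also change it, which is one reason practical descriptors use soft or overlapping bins.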

Many lives of a boundary

Often, context-dependent…

input, Canny, human

Maybe low-level is never enough?

1/f amplitude spectra for natural images

(Field 1987)

There are statistical regularities in the natural world, and image statistics reflect that. (Burton & Moorhead 1987; Field 1987; Tolhurst et al. 1992)

Why 1/f?

Scale invariance

Edges have 1/f structure

Object distribution in real world (Ruderman 1997; Lee & Mumford 1999)
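The link between a 1/f amplitude spectrum and scale invariance can be made explicit with a short calculation. It assumes an idealized isotropic 2-D spectrum with exponent exactly 1; measured exponents for real images are only approximately 1. The symbols C (a constant), S (power spectrum), E (band energy) and f_0 are notation introduced here, not taken from the slides.

```latex
A(f) \propto \frac{1}{f}
\quad\Longrightarrow\quad
S(f) = |A(f)|^2 = \frac{C}{f^2}

E(f_0, 2f_0) = \int_{f_0}^{2f_0} S(f)\, 2\pi f \, df
             = 2\pi C \int_{f_0}^{2f_0} \frac{df}{f}
             = 2\pi C \ln 2
```

The energy in any octave band is the same regardless of f_0, so zooming in or out (rescaling f) leaves the distribution of energy across octaves unchanged.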

(Image source: smokiesguidebook.com; slide content: Simoncelli & Olshausen 2001)

A closer look at amplitude spectra

(Torralba & Oliva 2003)

Do natural image statistics matter?

Sensory coding might exploit statistical regularities of our world according to various criteria:

Representational efficiency: decorrelate input responses, make them independent, sparse, information-theoretic metrics, etc.

Metabolic efficiency: spike efficiency, minimal wiring.

Learning efficiency: sparseness, invariance, overcompleteness, etc.

Lots and lots of work; see reviews Graham & Field (2007), Simoncelli & Olshausen (2001)
