Dagstuhl Oct09

Object Recognition Through Reasoning About Functionality: A Survey of Related Work and Open Problems

Louise Stark, University of the Pacific, Stockton, California
Melanie Sutton, University of West Florida, Pensacola, Florida



Function-Based Research

Dr. Louise Stark
University of the Pacific
Stockton, CA

University of the Pacific

University of West Florida

Pre/post-hurricane season…

Seminar Goals

• This seminar brings together scientists from disciplines such as computer science, neuroscience, robotics, developmental psychology, and cognitive science

Seminar Goals

• Hope to further the knowledge of:
  • how the perception of form relates to object function
  • how intention and task knowledge (and hence function) aid in the recognition of relevant objects

Overview

• Recognition based on functionality
• Overview of GRUFF approach
• Functionality in Related Disciplines
• Open Problem Areas

Function-based Approaches

Cognitive Psychology/Human Perception
• Representations of object categories
• Human-robot interaction strategies
• Wayfinding

Artificial Intelligence
• Formal representations of knowledge
• Machine learning techniques to automate reasoning

Computer Vision
• Document/aerial image analysis
• Interpreting human motion
• Object recognition/categorization

Robotics
• Mapping of indoor environments
• Object detection
• Navigation/interaction plans
• Formalisms for autonomous robot control

Computer Vision?

• Deriving meaningful descriptions of the environment from images
• Descriptions needed for:
  • Recognition
  • Manipulation
  • Reasoning about objects

Generic Object Recognition

• Minsky (1991)
  • Argued for the necessity of representing knowledge about functionality
  • “… rarely use a representation in an intentional vacuum, but we always have goals…”
  • “… we must classify things… according to what they can be used for.”

Motivation

[Figure: Parameterized Model; Structural Model]

Could these be recognized?

GRUFF

chair (cher) n. - a piece of furniture for one person to sit on

Generic Recognition Using Form and Function

What is the goal?

Develop alternative approaches to generic object recognition & manipulation
- concentrate on man-made objects (artifacts)

Human artifacts – existence or non-existence of properties can be deduced by analyzing the shape of an object

For any particular object category – there is some set of functional properties shared by ALL objects in that category.

Approach to the Problem

• Derive the format of my function-based representation
• Confirm feasibility of approach: test domain, perfect input, planar face models
• Expand the domains
• Test real data
• Interact to confirm functionality
• Exploit contextual information

Knowledge in GRUFF is of three types:

A category hierarchy which specifies superordinate / basic / subordinate categories
(furniture / chair / arm chair)

Functional properties that define each category
(provides_sittable_surface, provides_stability, ...)

Knowledge primitives used to reason about shape
(dimensions, relative orientation, ...)

All organized into a "category definition tree" which is GRUFF's knowledge about the world.

Category Representation Tree

Conventional Chair
• Provides Sittable Surface
• Provides Stable Support

We imagine the definition of a generic object category to be something like...

straight_back_chair ::= provides_seating_surface + provides_stability + provides_back_support_surface

and recognition is conceptualized as ...

[Figure labels: provides_arm_support, provides_sittable_surface, provides_stable_support, provides_back_support]

Shape-based Knowledge Primitives

A functional requirement such as provides_sittable_surface is implemented as a sequence of calls to shape-based operators.

dimensions(shape_element, dimensions_type, range_parameters)

relative_orientation(normal_1, normal_2, range_parameters)

clearance(shape_element, clearance_volume)
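The idea of a functional requirement built from shape-based operators can be sketched as follows. This is a minimal illustration, not GRUFF's implementation: the `Surface` class, the simplified signatures, the boolean return values, and every numeric range are assumptions chosen only to make the example runnable.

```python
import math
from dataclasses import dataclass


@dataclass
class Surface:
    """Hypothetical shape element: a planar face of a 3D model."""
    area: float      # square metres (assumed unit)
    height: float    # metres above the ground plane (assumed unit)
    normal: tuple    # unit normal vector (x, y, z)


def dimensions(shape_element, dimension_type, low, high):
    """True if the named dimension lies inside [low, high]."""
    return low <= getattr(shape_element, dimension_type) <= high


def relative_orientation(normal_1, normal_2, max_angle_deg):
    """True if the angle between two unit normals is small enough."""
    dot = sum(a * b for a, b in zip(normal_1, normal_2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
    return angle <= max_angle_deg


def clearance(shape_element, obstacle_heights, clearance_height):
    """True if nothing sits within clearance_height above the element."""
    return all(h - shape_element.height >= clearance_height
               for h in obstacle_heights)


def provides_sittable_surface(surface, obstacle_heights):
    """A functional requirement as a sequence of primitive calls.

    All ranges below are illustrative placeholders.
    """
    up = (0.0, 0.0, 1.0)
    return (dimensions(surface, "area", 0.1, 1.0)
            and dimensions(surface, "height", 0.4, 0.6)
            and relative_orientation(surface.normal, up, 10.0)
            and clearance(surface, obstacle_heights, 0.5))
```

A horizontal face of suitable size and height with free space above it passes; a vertical face, or one with something resting just above it, does not.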

Knowledge Primitives

Abstract shape reasoning

• Metric dimensions (width, depth, height, area, contiguous surface, volume)
• Proximity
• Relative orientation
• Clearance
• Stability
• Enclosure

Knowledge Primitives

Physical interaction reasoning

• Change orientation
• Apply force
• Observe deformation

Evaluation Measures

Value returned from knowledge primitive invocation

[Figure: evaluation measure (0.0 to 1.0) plotted against values of the shape property, with breakpoints labeled least, low ideal, high ideal, greatest]
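One way to read the evaluation-measure figure is as a trapezoid over the four breakpoints. The sketch below assumes linear ramps between `least` and `low_ideal` and between `high_ideal` and `greatest`; the slide names the breakpoints but does not state the shape of the ramps, so the function name and the linear interpolation are assumptions.

```python
def evaluation_measure(value, least, low_ideal, high_ideal, greatest):
    """Map a measured shape property to an evaluation measure in [0, 1].

    Assumes least <= low_ideal <= high_ideal <= greatest and linear ramps.
    """
    if low_ideal <= value <= high_ideal:
        return 1.0                      # inside the ideal range
    if value <= least or value >= greatest:
        return 0.0                      # outside the acceptable range
    if value < low_ideal:
        return (value - least) / (low_ideal - least)
    return (greatest - value) / (greatest - high_ideal)
```

For example, with breakpoints 0.3, 0.4, 0.5, 0.7 for a seat height in metres (illustrative numbers only), a height of 0.45 scores 1.0 and a height of 0.2 scores 0.0.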

Combining Evidence

• Combine required measurements using probabilistic AND (0-1)

• Combine descendant subcategory node measures using probabilistic OR
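The two combination rules can be sketched under the common reading of probabilistic AND as a product and probabilistic OR as a + b - ab. The function names are hypothetical, and the slide does not spell out the formulas, so treat these definitions as an assumption.

```python
from functools import reduce


def prob_and(measures):
    """Probabilistic AND: product of measures, each in [0, 1]."""
    return reduce(lambda a, b: a * b, measures, 1.0)


def prob_or(measures):
    """Probabilistic OR: a + b - a*b, folded over the measures."""
    return reduce(lambda a, b: a + b - a * b, measures, 0.0)
```

Under this reading a single failed requirement (measure 0.0) drives the AND to zero, while the OR over subcategories is never lower than its best-scoring subcategory.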

Recognition Process

• Category representation graph is control structure

• Structural Constraint Propagation – subcategory nodes constrained by what was found for the parent

Recognition Stage

2 approaches

1. Check all known categories in the knowledge base

2. Confirm/deny object can/cannot function as a specified (sub)category
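The two recognition approaches can be illustrated with a small traversal in which the category graph drives control flow. This is a sketch under stated assumptions: `CategoryNode`, `evaluate`, `recognize_all`, and `confirm` are hypothetical names, requirements are plain callables returning a measure in [0, 1], and structural constraint propagation is reduced to pruning subcategories when the parent fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CategoryNode:
    """Hypothetical node of a category definition tree."""
    name: str
    requirements: List[Callable[[dict], float]] = field(default_factory=list)
    children: List["CategoryNode"] = field(default_factory=list)


def evaluate(node: CategoryNode, shape: dict) -> Dict[str, float]:
    """Return {category: measure} for every category the shape satisfies."""
    measure = 1.0
    for requirement in node.requirements:
        measure *= requirement(shape)       # product as probabilistic AND
        if measure <= 0.0:
            return {}                       # prune: skip all subcategories
    results = {node.name: measure}
    for child in node.children:
        for name, child_measure in evaluate(child, shape).items():
            results[name] = measure * child_measure
    return results


def recognize_all(root: CategoryNode, shape: dict) -> Dict[str, float]:
    """Approach 1: check all known categories in the knowledge base."""
    return evaluate(root, shape)


def confirm(root: CategoryNode, shape: dict, category: str) -> float:
    """Approach 2: confirm or deny one specified (sub)category."""
    return evaluate(root, shape).get(category, 0.0)
```

In this sketch the second approach simply reuses the full traversal; a real system could restrict the search to the path leading to the requested category.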

Valid Chairs Recognized by GRUFF

History of GRUFF Project

Context-based Reasoning

GRUFF: Generic object recognition system. Reasons about and generates plans for understanding 3D scenes of objects.

Extension to Context-based Reasoning: Determine significance of accumulated functional evidence to infer the existence of scene concepts.

Functionality in the Large

What makes an 'office' an office?

A desk with at least one chair in close proximity.

You categorize areas or workspaces by the functional configuration of the objects in the area.
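The office example can be expressed as a small check over accumulated functional evidence. This is only a sketch: the `Evidence` record, the 2D floor coordinates, and the 1.5 m proximity threshold are assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    """Hypothetical record of functional evidence found in a scene."""
    function: str    # e.g. "seating" or "worksurface"
    x: float         # floor-plane position in metres (assumed)
    y: float


def is_office(evidence: List[Evidence], max_distance: float = 1.5) -> bool:
    """True if some seating lies within max_distance of a worksurface."""
    seats = [e for e in evidence if e.function == "seating"]
    desks = [e for e in evidence if e.function == "worksurface"]
    return any(math.hypot(s.x - d.x, s.y - d.y) <= max_distance
               for s in seats for d in desks)
```

The check depends on what the objects provide and how they are arranged, not on identifying a specific desk or chair model.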

Context-based Reasoning

[Diagram labels: Name: Office; Type: Category; Function Verification Plan; Realized by; Potential Results; Name: Provides potential seating; Name: Provides potential worksurfaces; Shape-based Reasoning; Name: Infer Seating Areas; Name: Infer Back Support; Name: Infer worksurfaces; Context-based Reasoning]

What Did Change?

• Multiple objects in scene
• Relax functional requirements
• Allow partial evidence

What Did Not Change?

• Basic set of functional primitives
• Organization of the representation
• Categorization, not identification

Test Data

Simulated data
- Complete 3D models evaluated, no occlusion surfaces
- Partial 3D models derived from laser range finder simulation tool

Real data
- Stereo camera system generating range data (SRI's Small Vision System software)

Test Scenes Used in Context-based Reasoning

Context-based Reasoning System

Infer contextual relationships from accumulated functional evidence

Provides potential seating (back support and/or seating area)

Provides potential worksurfaces

What is the goal?

Question – How do we recognize objects we have never previously encountered?
- we don't have a model (or do we?)

Essentially, we categorize objects using some type of "model"

Earlier Work

Roberts, “Machine perception of three dimensional solids”, 1965

• Analyze intensity image
• Extract edge information
• Match against library of geometric models

- “Model-based vision” paradigm
- “Single arbitrary view 3-D object recognition” paradigm

Earlier Work

Binford, “Survey of model-based image analysis systems”, 1982

“The essential definition of object class is functional. … Object classes have an associated 3-D form: form equals function. …”

Earlier Work

Binford, “Survey of model-based image analysis systems”, 1982

“An object’s function is often a geometric function. The function of a room is to be an enclosing volume. … The function of a chair… is to be a flat surface at a comfortable height for sitting….”

Earlier Work

Winston, Binford, Katz and Lowry, “Learning physical descriptions from functional definitions, examples and precedents”, 1984

• Discussed use of function-based definitions of object categories
• Infinity of individual physical descriptions of objects in a category…
• Single functional description to represent all (cup example)

Earlier Work

Brady, Agre, Braunegg and Connell, “The mechanics mate”, 1985

Connell and Brady, “Generating and generalizing models of visual objects”, 1987

• Discussed relation between geometric structure and functional significance
• Generalized structural description learned from sequence of examples

Earlier Work

Minsky, “The Society of Mind”, 1985

“… The solution is that we need to combine at least two different kinds of descriptions. On one side, we need structural descriptions for recognizing chairs when we see them.”

Earlier Work

Minsky, “The Society of Mind”, 1985

“… On the other side we need functional descriptions in order to know what we can do with them… we need connections between parts of the chair structure and the requirements of the human body that those parts are supposed to serve.”

Background

DiManzo, Trucco, Giunchiglia, Ricci, “FUR: Understanding Functional Reasoning”, 1989

• Utilized functional knowledge within an expert system framework

• Primitives defined as individual expert systems that evaluate 3D information

Background

Rivlin and Rosenfeld, “Navigational Functionalities”, 1995

• Explored functionality as it relates to mobile robots
• Navigating agent may classify objects in its environment in functional terms as “threat,” “landmark” and so on.

Function-based Approaches

Cognitive Psychology/Human Perception
• Representations of object categories
• Human-robot interaction strategies
• Wayfinding

Artificial Intelligence
• Formal representations of knowledge
• Machine learning techniques to automate reasoning

Computer Vision
• Document/aerial image analysis
• Interpreting human motion
• Object recognition/categorization

Robotics
• Mapping of indoor environments
• Object detection
• Navigation/interaction plans
• Formalisms for autonomous robot control

Artificial Intelligence

Two areas within AI that impact function-based research

• Work on formal representations of knowledge about functionality
• Application of machine learning techniques to automate the process of constructing function-based systems

Artificial Intelligence

• AI approach developed greater formalism and depth than that in computer vision
• Advantage as complexity of system requirements increases

Robotics

• Incorporate best practices from other fields
• Evolution
  • Service robots (controlled environment)
  • Interaction to confirm function
  • General navigational systems

Human Perception Theories

• Klatzky et al. (2005)
  • observe how children interact with objects associated with a specific function
  • use information in design of algorithms for robotic interaction with objects to reason about their function

Functional Knowledge Representation

• Barsalou et al. (2005)
  • HIPE (History, Intentional perspective, Physical environment, and Event sequences)

• Raubal and Moratz (2007)
  • expanded on theory
  • representation of affordance-based attributes

Affordances?

Goal is object recognition using function

According to Webster…

Affordance - <graphics> A visual clue to the function of an object.

Yes, GRUFF uses affordances

Affordances

Some interpretation of Gibson affordance

• Automatic
• Pop out – no processing necessary

Have to admit – there were (are) different camps

Affordances

According to Gibson

“If you know what can be done with… an object, what it can be used for, you can call it whatever you please.”

Affordances

• Considered an error if an object is misclassified. Yes or no?

www.businesssupply.com

Affordances

According to Gibson

“If a surface of support is knee-high above the ground, it affords sitting on. We call it a seat in general. If it can be discriminated as having just these properties, it should look sit-on-able. If it does, the affordance is perceived visually.”

Yes, GRUFF uses affordances

Affordances

Yes, it is a chair

Gibson’s Theory of Affordances

Properties noted    Knowledge Primitives
horizontal          Relative Orientation
flat                Planar
extended            Metric Dimensions
rigid               Requires Interaction

Physical properties, measured relative to the animal. (Shape Properties)

The Ecological Approach to Visual Perception, J.J. Gibson

Open Problems: Across Disciplines

Work to ensure:
• scalability
• efficiency
• accuracy
• ability to learn

What we learned from GRUFF

Open Problem Areas

Data Flow

End Goals

Infer contextual relationships from accumulated functional evidence…

Provides potential seating (back support and/or seating area)

Provides potential worksurfaces

Infer affordances “in the large”… (in scale-space)

Provides potential table area

Provides potential containment

Factors Influencing System Complexity

• Degree of Interaction
• Feedback from Interaction
• Complexity of Interaction

From: Function From Visual Analysis and Physical Interaction. M. Sutton, L. Stark, & K. Bowyer. Image and Vision Computing. 16 (1998) 745-763.

Knowledge Representation

The internal architecture utilized for reasoning about affordances:

Representation

Action/Observation Sequence

Interaction Tests for a Cup Object

Results: Furniture-like objects

Results: Dish-like objects

Representative OPUS Models

Representative Image Sequences

Results: Segmentation Issues

Summary of Unpredicted Subsystem Failures

Category   Model Building Subsystem   Shape-based Reasoning Subsystem   Interaction-based Reasoning Subsystem
Chairs     13/45 (29%)                8/32 (25%)                        3/18 (17%)
Cups       -                          0/27 (0%)                         7/27 (26%)

Task/Affordance Driven Data Flow

[Diagram: Capture image pair -> Calculate disparity and range data (and evaluate) -> Perform segmentation (and evaluate) -> Perform function-based reasoning (and evaluate); "Use task information" feeds the process, and "Reset parameters" paths lead back to earlier stages]

Data flow from function-based reasoning to refinement of image acquisition and range segmentation parameters.
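The feedback idea in this data flow can be sketched as a loop that re-runs acquisition with adjusted parameters until function-based reasoning is satisfied. Everything here is hypothetical: the stage functions are supplied by the caller, evaluation is done once at the end of each pass rather than after every stage, and the acceptance threshold and iteration limit are illustrative.

```python
def run_with_feedback(capture, compute_range, segment, reason, adjust,
                      params, accept=0.5, max_iterations=5):
    """Repeat the pipeline, resetting parameters after weak results.

    capture(params)               -> (left_image, right_image)
    compute_range(l, r, params)   -> range data
    segment(range_data, params)   -> surfaces
    reason(surfaces)              -> (measure in [0, 1], category label)
    adjust(params, measure)       -> new parameter set
    Returns the best (measure, label) pair seen.
    """
    best = (0.0, None)
    for _ in range(max_iterations):
        left, right = capture(params)
        range_data = compute_range(left, right, params)
        surfaces = segment(range_data, params)
        measure, label = reason(surfaces)
        if measure > best[0]:
            best = (measure, label)
        if measure >= accept:
            break                       # functional evidence is good enough
        params = adjust(params, measure)  # the "reset parameters" path
    return best
```

Passing the stages in as callables keeps the loop independent of any particular stereo or segmentation package.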

Implementation Level: Parameter Sets

Implementation Level: Metrics / Error Calculations

Real Data: SVS

Real Data: Parameter Variations

Surface Extraction and Use of Affordances

Capture image pair -> calculate disparity and range -> evaluate range data -> perform/evaluate range segmentation -> perform/evaluate object recognition

Question: How can use of affordances be incorporated into feedback loops?

Guiding Questions AND ANSWERS! (from previous Dagstuhl seminar)

What could or should a robot control architecture look like that makes use of affordances as first-class items in perceiving the environment?

How could or should such an architecture make use of affordances for action and reasoning?

Is there more to affordances than function-oriented perception, action and reasoning?

Guiding Questions AND ANSWERS! (from previous Dagstuhl seminar)

Should affordances in a robot be programmed or learned? (Can they be programmed in the first place?)

What about an affordance needs to be represented in a robot, and how?

How and where in the architecture would attention, intention, or other internal states filter affordances that were perceived on a low level?

How would affordance-based control go together with behavior-based and plan-based control? Is it complementary? Redundant? Inconsistent?

How can affordances be used for reasoning and action?

Affordances:

…in space and time…

Affordances:

…within subsystems…

…supervisors, specialists, agents…

Affordances:

…in scale-space…

QUESTIONS?

In a similar vein, trying to understand perception by studying only neurons is like trying to understand bird flight by studying only feathers: It just cannot be done. In order to understand bird flight, we have to understand aerodynamics; only then do the structure of feathers and the different shapes of birds’ wings make sense.

David Marr (1982)


Thank You!