CVIU Lecture 1



ENGN8530: Computer Vision and Image Understanding: Theories and Research

Topic 1: Introduction to Computer Vision and Image Understanding

Dr Chunhua Shen
Dr Roland Goecke
VISTA / NICTA & RSISE, ANU


What is Computer Vision?

"Vision is a process that produces, from images of the external world, a description that is useful to the viewer and not cluttered with irrelevant information." (Marr and Nishihara, 1978)

"Computer vision is the science and technology of machines that see. Computer vision is concerned with the theory and technology for building artificial systems that obtain information from images or multi-dimensional data." (Wikipedia)

Reference:
D. Marr and K. Nishihara, "Representation and recognition of the spatial organisation of three-dimensional shapes", Proc. Royal Society, B-200, 1978, pp. 269-294.


What is Computer Vision? (2)

Sometimes seen as complementary to biological vision. In biological vision, the visual perception of humans and various animals is studied, resulting in models of how these systems operate in terms of physiological processes.

Computer vision, on the other hand, studies and describes artificial vision systems that are implemented in software and/or hardware.


What is Computer Vision? (3)

Applications:
- Controlling processes (robots, vehicles)
- Detecting events (visual surveillance)
- Organising information (indexing databases of images / videos)
- Modelling objects or environments (medical image analysis)
- Interaction (HCI)

Source: Wikipedia


    Image Understanding

    Computer vision goeshand in hand withimage understanding

    What information dowe need to know tounderstand the scene?

    How can we makedecisions about whatobjects are present,their shape, their

    positioning? Source: CMU Computer Vision course


Image Understanding (2)

Many different questions and approaches to solving computer vision / image understanding problems:
- Can we build useful machines to solve specific (and limited) vision problems?
- Is there anything special about the environment which makes vision possible?
- Can we build a model of the world / scene from 2D images?

Many different fields are involved, e.g. computer science, AI, neuroscience, psychology, engineering, philosophy, art.


Sub-areas of CVIU

- Scene reconstruction
- Event detection
- Object tracking
- Object recognition
- Object structure recovery
- Ego-motion
- Multi-view geometry
- Indexing of image / video databases


Scene Reconstruction

- From stereo
- From multiple views


    Event Detection

    Source: MERL

    Source: Roland Goecke


    Object Tracking

    Source: Roland Goecke


    Object Recognition

[Figure: a query image is matched against an image database to retrieve results]

Source: David Nister


    Object Structure Recovery

Reference:
A.D. Worrall, J.M. Ferryman, G.D. Sullivan and K.D. Baker, "Pose and structure recovery using active models", Proc. 6th British Machine Vision Conference, Vol. 1, Birmingham, UK, 1995, pp. 137-146.


    Ego-motion

    Estimated camera path

    Optical flow

    Source: Roland Goecke


    Multi-View Geometry

    Epipolar geometry

    Source: Richard Hartley, Andrew Zisserman


    Indexing and Retrieval

[Figure: a query region retrieves ranked results from a video database]

Reference:
J. Sivic and A. Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos", Proc. International Conference on Computer Vision, Nice, France, 2003, pp. 1470-1477.


The Default Approach (Marr)

Work bottom-up from the image to a 3D world model via a hierarchy of representations as follows:
- Pixel array: the image
- Raw primal sketch: edge, corner, etc. representation
- Primal sketch: structural information, i.e. groupings, segmentations, etc.
- 2½-D sketch: depth information in an image-centred view
- 3-D world model

Reference:
D. Marr, Vision, Freeman, 1982.


The Default Approach (2)

Image sensor (visible, infra-red, radar)
→ Image capture
→ Digitisation
→ Image processing
→ Feature detection (edges, corners, regions)
→ Feature grouping
→ Characterisation of parts
→ Object recognition
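As one possible concrete rendering of the early stages of this pipeline, here is a minimal Python sketch using OpenCV; the input file name, blur kernel, and detector thresholds are illustrative assumptions, not values from the slides.

```python
# A minimal sketch of the early pipeline stages using OpenCV and NumPy.
# File name and all parameter values below are illustrative only.
import cv2

# Image capture / digitisation: load an already-digitised frame.
image = cv2.imread("frame.png")  # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Image processing: reduce noise before feature detection.
gray = cv2.GaussianBlur(gray, (5, 5), 0)

# Feature detection: edges and corners.
edges = cv2.Canny(gray, 100, 200)
corners = cv2.goodFeaturesToTrack(gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=10)

# Feature grouping: connected components of the edge map as a crude grouping.
num_groups, labels = cv2.connectedComponents(edges)
n_corners = 0 if corners is None else len(corners)
print(f"{num_groups - 1} edge groups, {n_corners} corners")
```

Characterisation of parts and object recognition would build on these grouped features; they are omitted here.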


What is an Image?

An image is an array/matrix of values (picture elements = pixels) on a plane which describes the world from the point of view of the observer.

Because of the line-of-sight effect, this is a 2D representation of the 3D world.

The meaning of the pixels depends on the sensors used for their acquisition.

Source: Antonio Robles-Kelly


Imaging Sensors

The information seen by the imaging device is digitised and stored as pixel values.

Two important quantities of imaging sensors are:
- Spatial resolution: How many pixels are there? (image size)
- Signal resolution: How many values per pixel?

There are many different types of sensors:
- Optical: CCDs, CMOS, photodiodes, photomultipliers, photoresistors
- Infrared: bolometers
- Others: range sensors (laser), Synthetic Aperture Radar (SAR), Positron Emission Tomography (PET), Computed (Axial) Tomography (CAT/CT), Magnetic Resonance Imaging (MRI)
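To make the two resolution quantities concrete, here is a minimal NumPy sketch; the image size and bit depth are arbitrary assumptions.

```python
# Spatial resolution = number of pixels; signal resolution = values per pixel.
# The size (640x480) and bit depth (8-bit) are arbitrary illustrative choices.
import numpy as np

image = np.zeros((480, 640), dtype=np.uint8)      # 8-bit greyscale image

spatial_resolution = image.shape                  # (rows, cols) = (480, 640)
values_per_pixel = np.iinfo(image.dtype).max + 1  # 2**8 = 256 grey levels

print(f"Spatial resolution: {spatial_resolution[1]}x{spatial_resolution[0]} pixels")
print(f"Signal resolution: {values_per_pixel} values per pixel")
```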


Electro-Magnetic Spectrum

[Figure: spectrum bands — UV | Visible (from 0.4 µm) | NIR (to 1.0 µm) | SWIR (1.7–2.5 µm) | MWIR (3.0–5.0 µm) | LWIR (8.0–14.0 µm)]

The human eye can see light between 400 and 700 nm.


Charge-Coupled Device (CCD)

- CCDs (Charge-Coupled Devices) were invented in 1969 by Willard Boyle and George Smith at AT&T.
- They are composed of an array of capacitors which are sensitive to light.
- More modern devices are based upon photodiodes.

Source: Wikipedia


CCD (2)

- Generally, the light-sensitive unit of construction is arranged in an array whose topology is a lattice (not always true, e.g. log-polar CCDs).
- Colour CCDs:
  - Bayer filter: 1x Red, 1x Blue, 2x Green, because the human eye is more sensitive to green
  - RGBE filter: 1x Red, 1x Blue, 1x Green, 1x Emerald (Cyan)

Source: Wikipedia
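The sketch below builds a Bayer mosaic from a full-colour image, assuming the common RGGB ordering of the 2x2 cell (real sensors also use GRBG and other orderings); note that green is sampled twice per cell, matching the 2x Green above.

```python
# A minimal sketch of an RGGB Bayer mosaic; the RGGB ordering is one common
# layout, assumed here purely for illustration.
import numpy as np

def bayer_mosaic(rgb):
    """Subsample a full-colour image into a single-channel Bayer mosaic."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at even rows, even cols
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at even rows, odd cols
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at odd rows, even cols
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at odd rows, odd cols
    return mosaic

rgb = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
print(bayer_mosaic(rgb))
```

Recovering a full-colour image from such a mosaic (demosaicing) interpolates the two missing channels at each pixel.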


Bolometers

Invented by the astronomer Samuel Pierpont Langley in 1878.

A bolometer is a device consisting of an "absorber" in contact with a heat sink through an insulator. The sink can be viewed as a reference for the absorber temperature, which is raised by the power of the incident electromagnetic wave.

Source: Los Alamos National Laboratory


Microbolometer

The microbolometer, a particular kind of bolometer, is the basis for thermal cameras.

It is a grid of vanadium oxide or amorphous silicon heat sensors atop a corresponding grid of silicon.

IR radiation from a specific range of wavelengths strikes the vanadium oxide and changes its electrical resistance. This resistance change is measured and processed into temperatures which can be represented graphically.

Source: Roland Goecke


Synthetic Aperture Radar

SAR is an active sensing technique:
- The active sensor transmits radio waves
- The antenna picks up reflections

For a conventional radar, the footprint is governed by the size of the antenna (aperture).

SAR creates a synthetic aperture and delivers a 2D image. One dimension is the range (cross track), whereas the other one is the azimuth (along track).

Sonar and ultrasound work on the same principles but at different wavelengths.


SAR (2)

[Figure: SAR imaging geometry — radar track, range (cross track), azimuth (along track), nadir track; SAR image of Venus]

RADAR = Radio Detection and Ranging
NADIR = opposite of zenith

Source: Wikipedia


Positron Emission Tomography

- Active sensing technique; PET is based on measuring emitted radiation.
- PET is a nuclear medicine imaging technique which uses radiation from a radio-isotope introduced into the target.
- PET produces a 3D image or map of functional processes in the body.

Source: Wikipedia


Magnetic Resonance Imaging

- Active sensing technique; MRI is also based on measuring emitted radiation.
- MRI stimulates the emission of radiation by aligning the spins of water molecules, making use of a strong magnetic field (several tesla!).
- Good for showing soft tissue
- Not good for showing bones

[Images: MRI; Magnetic Resonance Angiography]

Source: Wikipedia


Functional MRI

- Functional MRI (fMRI) measures signal changes in the brain that are due to changing neural activity.
- Increases in neural activity cause changes in the MR signal due to a change in the ratio of oxygenated to deoxygenated haemoglobin.
- Deoxygenated haemoglobin attenuates the MR signal.

fMRI of head: highlighted areas show primary visual cortex

Source: Wikipedia


Computed (Axial) Tomography

- Employs a set of axially acquired X-ray images to recover a 3D representation of the object.
- Originally, the images were in axial or transverse planes, but modern CT scanners deliver volumetric data.
- Digital geometry processing is used to generate a 3D image of the internals of an object from a large series of 2D X-ray images taken around a single axis of rotation.

CT scan of head

Source: Wikipedia


CAT/CT

- Good for showing bones
- Not good for showing soft tissue

[Image: modern diagnostic software]


Camera Geometry

- The aperture allows light to enter the camera.
- The image plane is where the image is formed.
- The focal length is the distance between the aperture and the image plane.
- The optical axis passes through the centre of the aperture and is perpendicular to it.

[Figure: aperture of diameter d, image plane at focal length f, image coordinates (x', y'), optical axis z]


Camera Geometry (2)

[Figure: a point at height x and depth z projects to x' on the image plane at focal length f]

By similar triangles, $x'/f = x/z$, so

$$x' = \frac{f x}{z} = f \tan\theta$$

For a small angle $\theta$, $\tan\theta \approx \theta$, so $x' \approx f\theta$.
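A direct Python transcription of this projection equation; the focal length and point coordinates in the example are made-up values, not from the slides.

```python
# Pinhole projection x' = f * x / z, as on the slide; example values are
# arbitrary. Units just need to be consistent (metres here).
def project(x, y, z, f):
    """Project a 3D camera-frame point onto the image plane at distance f."""
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return f * x / z, f * y / z

# A point 2 m away, 0.5 m off-axis, with a 35 mm focal length:
x_img, y_img = project(0.5, 0.1, 2.0, 0.035)
print(f"image coordinates: ({x_img * 1000:.2f} mm, {y_img * 1000:.2f} mm)")
```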


Camera Geometry (3)

[Figure: an object spanning from $x_b$ (bottom) to $x_t$ (top) at depth $z$ projects to $x'_b$ and $x'_t$ on the image plane]

Using the formula from the previous slide,

$$x'_t = \frac{f x_t}{z}, \qquad x'_b = \frac{f x_b}{z}$$

Hence, size transforms as

$$x'_t - x'_b = \frac{f (x_t - x_b)}{z}$$

and, in terms of the angle $\theta$ subtended by the object, $x'_t - x'_b = 2 f \tan(\theta/2)$.


Camera Geometry (4)

[Figure: rays from a close object and a distant object passing through the aperture]

- Rays that pass through the camera aperture spread out and do not make a sharp point on the image.
- These rays need to be focussed to make a sharp point in the image.
- The rays from close objects diverge more than those from distant objects.
- For very distant objects, the rays are effectively parallel.


Aperture and Resolution

- Light diffracts as it passes through the aperture.
- A point in the scene spreads out into a blob in the image (a fundamental limit on image sharpness).
- The size of the Airy disk (and the best resolution) is given by the Rayleigh criterion:

$$\theta_{\min} = 1.22\,\frac{\lambda}{d}, \qquad R_{\min} = 1.22\,\frac{\lambda f}{d}$$

where $\lambda$ is the wavelength of the light and $d$ is the aperture diameter.

[Figure: Airy disk for a circular aperture; diffraction pattern for a square aperture; two just-separable points]
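A small helper that evaluates the Rayleigh criterion above; the 550 nm wavelength (mid-visible green), 5 mm aperture, and 50 mm focal length are assumed example values.

```python
import math

# Rayleigh criterion from the slide: theta_min = 1.22 * lambda / d and
# R_min = 1.22 * lambda * f / d. Example values below are assumptions.
def rayleigh(wavelength, aperture, focal_length):
    theta_min = 1.22 * wavelength / aperture   # angular limit, radians
    r_min = theta_min * focal_length           # spot size on the image plane
    return theta_min, r_min

theta, r = rayleigh(550e-9, 5e-3, 50e-3)  # 550 nm light, 5 mm aperture, 50 mm lens
print(f"theta_min = {math.degrees(theta) * 3600:.1f} arcsec, "
      f"R_min = {r * 1e6:.1f} um")
```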


Resolution

- The resolution of a camera is the minimum separation between two points such that they appear separately on the image plane.
- Since distant objects appear smaller and closer together, the resolution varies with respect to the distance.
- The angle between separable objects does not vary with distance → angular resolution.
- The distance on the image plane does not vary → image-plane resolution.


Camera Models

- Pinhole camera
- Camera with lenses


Pinhole Camera

Advantages:
- No distortion of image
- Depth of field from a few cm to infinity
- Wide angular field
- Works on ultra-violet and X-rays

Disadvantages:
- Very limited light gathering
- Poor resolution


Pinhole Camera (2)

- The simplest camera
- The pinhole (aperture d) must be small to get a sharp image
- But we need a large pinhole to get enough light!


Pinhole Camera (3)

For distant objects the geometric limit is

$$R = d$$

The diffraction limit is

$$R = 1.22\,\frac{\lambda f}{d}$$

The best resolution occurs when these two are equal:

$$d = 1.22\,\frac{\lambda f^*}{d} \quad\text{or}\quad f^* = \frac{d^2}{1.22\,\lambda}$$

$f^*$ is the optimal focal length.

[Figure: resolution R versus focal length f — the geometric limit is constant at R = d, while the diffraction limit grows with f; the two curves cross at f*]
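This sketch evaluates $f^* = d^2/(1.22\lambda)$ and the corresponding diffraction-limited angle; with a 0.5 mm pinhole and an assumed mid-visible wavelength of 550 nm it reproduces the ~37 cm and ~4.6' figures quoted on the examples slide below.

```python
import math

# Optimal pinhole focal length f* = d**2 / (1.22 * lambda), from the slide.
# lambda = 550 nm (mid-visible green) is an assumed value.
def optimal_focal_length(aperture, wavelength=550e-9):
    return aperture**2 / (1.22 * wavelength)

d = 0.5e-3                         # 0.5 mm pinhole
f_star = optimal_focal_length(d)
theta_min = 1.22 * 550e-9 / d      # diffraction-limited angle, radians
print(f"f* = {f_star * 100:.0f} cm")                             # ~37 cm
print(f"theta_min = {math.degrees(theta_min) * 60:.1f} arcmin")  # ~4.6'
```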


Pinhole Camera (4)

[Plots: the geometric limit, and the effect of a longer wavelength and of a smaller aperture on the resolution curves]


Cameras with Lenses

- For better light-gathering capabilities, we need to increase the aperture.
- A lens removes the geometric limit on resolution, since it focuses all light entering through the aperture on the same point on the image.

[Figure: a lens of aperture d and focal length f focusing rays that would otherwise follow the pinhole path]


Cameras with Lenses (2)

- We can have apertures as large as we like.
- The price to pay: chromatic and spherical aberration.
- The angular resolution of a lens-based camera is the diffraction limit of the aperture:

$$\theta = 1.22\,\frac{\lambda}{d}$$

- The larger the aperture, the better the resolution.
- The image-plane resolution is still

$$R = 1.22\,\frac{\lambda f}{d}$$


Camera Resolution Examples

Pinhole camera, 0.5 mm pinhole:
- Optimal focal length f* = 37 cm
- θ = 4.6', equivalent to 1 mm at 75 cm

For a 35 mm lens camera and visible light:
- θ = 3.9'', equivalent to 1 mm at 52 m
- Focal length depends on the lens, but typically


Illumination

- The amount of light entering the camera is proportional to the area of the lens ($\pi d^2/4$).
- The area covered by the image is proportional to $f^2$.
- So, the brightness of the image is proportional to $d^2/f^2$, i.e. dependent on the focal ratio $f/d$.
- Brightness is controlled by a moveable aperture which changes d.
- Referred to by a sequence of f-stops: f:1 is fully open, and each successive f-stop halves the brightness (so the aperture diameter is reduced by $\sqrt{2}$): f:1.4, f:2, f:2.8, f:4, f:5.6
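A short sketch of this f-stop progression: each stop multiplies the focal ratio by √2, which halves the aperture area and hence the brightness (the starting point f:1 follows the slide; everything else is arithmetic).

```python
import math

# Each successive f-stop scales the focal ratio f/d by sqrt(2); since the
# aperture area is proportional to d**2, this halves the image brightness.
f_number = 1.0  # f:1 is fully open
for stop in range(6):
    relative_brightness = 1.0 / f_number**2  # brightness ∝ d^2/f^2 = 1/N^2
    print(f"f:{f_number:.1f}  relative brightness = {relative_brightness:.3f}")
    f_number *= math.sqrt(2)
```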


Absorption and Reflection

Incident light energy = reflected + absorbed + transmitted energy

All of these are object (material, surface) dependent!


The BSDF

Bidirectional Scattering Distribution Function: describes the way in which light is scattered by a surface.

BSDF = BRDF + BSSRDF + BTDF
- BRDF: Bidirectional reflectance distribution function
- BSSRDF: Bidirectional surface scattering reflectance distribution function (incl. subsurface scattering)
- BTDF: Bidirectional transmittance distribution function

Source: Wikipedia


The BRDF

- It describes the reflectance of an object as a function of the illumination, viewing geometry and wavelength.
- It is given by the ratio of the reflected radiance to the incident irradiance (incident flux per unit area).

Reference:
F. Nicodemus, "Reflectance nomenclature and directional reflectance and emissivity", Appl. Opt., Vol. 9, 1970, pp. 1474-1475.


The BRDF (2)

- The modelling of the lighting conditions in the scene is of pivotal importance for the acquisition and processing of digital imagery.
- The radiance function can be decomposed into a linear combination of ambient, diffuse and specular components.
- Recovering the radiance function from a single image is an underconstrained problem.


The BRDF (3)

In general, the BRDF has the standard form (after Nicodemus, 1970)

$$f_r(\omega_i, \omega_o) = \frac{\mathrm{d}L_o(\omega_o)}{\mathrm{d}E_i(\omega_i)} = \frac{\mathrm{d}L_o(\omega_o)}{L_i(\omega_i)\cos\theta_i \,\mathrm{d}\omega_i}$$

The function depends on:
- Incoming and outgoing angle
- Incoming and outgoing wavelength
- Incoming and outgoing polarisation
- Incoming and outgoing position (subsurface scattering)
- Delay between the incoming and outgoing light rays


Radiance

- Power per unit projected area perpendicular to the ray, per unit solid angle in the direction of the ray.
- The flux is given by

$$\mathrm{d}\Phi = L(x, \omega)\cos\theta \,\mathrm{d}\omega \,\mathrm{d}A$$

- The solid angle is proportional to the surface area S of a projection of the object onto a sphere, divided by the square of its radius R: $\omega = S/R^2$.

[Figure: radiance L(x, ω) leaving a surface patch dA into solid angle dω]
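One step the slide leaves implicit, worth writing out because the BRDF on the previous slides divides by exactly this quantity: integrating radiance over the hemisphere $\Omega$ of incoming directions gives the irradiance at a point (standard radiometry, not specific to these slides):

$$E(x) = \frac{\mathrm{d}\Phi}{\mathrm{d}A} = \int_{\Omega} L(x, \omega)\cos\theta \,\mathrm{d}\omega$$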


Example BRDFs

- Oren and Nayar
- Cook and Torrance


Example BRDFs (2)

where $m_p$ is the microfacet slope


Example BRDFs (3)

- Phong
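As a minimal concrete example of one of these models, here is a sketch of Phong-style reflectance (diffuse plus specular lobes) in Python; the coefficients k_d, k_s and the shininess exponent n are arbitrary assumptions, and normalisation factors are omitted for simplicity.

```python
import numpy as np

# A minimal sketch of Phong-style reflectance: a Lambertian diffuse term plus
# a specular lobe around the mirror direction. k_d, k_s and n are arbitrary
# example values; normalisation constants are omitted.
def phong(normal, light_dir, view_dir, k_d=0.7, k_s=0.3, n=32):
    normal = normal / np.linalg.norm(normal)
    light_dir = light_dir / np.linalg.norm(light_dir)
    view_dir = view_dir / np.linalg.norm(view_dir)

    diffuse = k_d * max(np.dot(normal, light_dir), 0.0)
    # Mirror reflection of the light direction about the normal:
    reflect = 2.0 * np.dot(normal, light_dir) * normal - light_dir
    specular = k_s * max(np.dot(reflect, view_dir), 0.0) ** n
    return diffuse + specular

# Light 45 degrees off the normal, viewer along the mirror direction:
print(phong(np.array([0.0, 0.0, 1.0]),
            np.array([1.0, 0.0, 1.0]),
            np.array([-1.0, 0.0, 1.0])))
```

With the viewer exactly on the mirror direction, the specular term is at its maximum; tilting the view direction away makes it fall off as the cosine raised to the power n.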