
Copyright 1999 Qiming Zhou. All rights reserved.

DIGITAL IMAGE PROCESSING AND INTERPRETATION*

Qiming Zhou

Department of Geography, Hong Kong Baptist University, Kowloon Tong, Kowloon, Hong Kong

Phone: (852) 23395048, Fax: (852) 23395990, E-mail: [email protected]

Digital images, particularly those from remote sensing technology, have become an important source of spatial information. In modern Geographical Information Systems (GIS), digital remotely sensed images are widely recognised as one of the most practical means of updating spatial information, especially in real-time applications. In most of today's applications, however, remotely sensed data can only realise their full potential if they are correctly interpreted, classified and presented in the same way as other terrestrial spatial information, such as thematic maps.

This lecture note demonstrates the methodologies and techniques for extracting thematic information from digital images. As an introduction, the nature of digital images and the characteristics of earth objects in relation to image interpretation are discussed. The discussion then focuses on the techniques of image enhancement, interpretation and automated classification using black-and-white (or single-band) images or multispectral images. Methods and techniques used for integrating digital images with spatial information systems are also discussed. For the purpose of this note and ease of discussion, the digital images referred to here include only those from passive remote sensing.

1 The Nature of Digital Images

In our everyday experience, we are exposed to images in magazines, on billboards, on television and in snapshot photos. It is easy to think of all images as photographs, but photography is only one way that an image can be made. It is common today to use video cameras, digital cameras or other machines to record images, as well as conventional photography. In many cases, a hard copy of the image is distributed on photographic film when the original images were recorded in some other manner. Photographs are routinely converted to digital images for computer enhancement. It is important to understand the distinction between the photographic process and electronic image recording processes.

1.1 Photographic Film

Photographic film reacts chemically to the presence of light. The more light, the greater is the chemical reaction in the film. Light causes a reaction in the silver salts in the film that turns part of the grains to silver. A chemical developer is used to complete the conversion of exposed grains to pure silver. In a black and white negative film, the silver salts are washed away, leaving only black silver where the light exposed the film. The density of silver grains controls how dark the negative appears.

* Lecture Notes for Subject GEOG3610: "Remote Sensing and Image Interpretation", Department of Geography, Hong Kong Baptist University, Kowloon Tong, Kowloon, Hong Kong.


The shading between very light and very dark objects in the film appears as a smooth transition to the eye, even though it is caused by a number of solid black grains (Figure 1). Through a microscope, however, photographic film is merely a sea of black particles on a clear base. Photographic paper is essentially the same as film except that it is backed with paper rather than clear plastic. To create a print, photographic paper is exposed using an apparatus that focuses the negative onto the paper. Colour photography works in a similar manner to black and white except that the silver grains are replaced with colour pigments during the developing process.

Figure 1. A photo appears continuous; however, individual grains of silver-halide turn dark in response to light.

1.2 Photo-Electrical Imaging

Electronically recorded images are not made using a chemical reaction, but are created using light-sensitive computer chips. Certain types of computer chips create an electrical current when illuminated by light. This is known as the photo-electrical effect. If light is focused on a photoelectric chip (a photo-detector), the amount of electricity created is proportional to the amount of light hitting the chip. If an array of detectors is placed onto the back of a camera instead of film, an electrical image is produced whereby each picture element (or pixel) has an electrical intensity instead of a different density of exposed silver halide grains (Figure 2). The size of the pixel depends on the size of the array of detectors and the size of the object that is focused onto the array (Figure 3).


Figure 2. Light from the object is focused onto a light-sensitive computer chip. The electricity produced is proportional to the brightness.

Figure 3. An image made using photoelectric chips records the relative amount of light hitting each chip. If the film in a camera were replaced with an array of 16 chips, the light pattern shown at right would create an image as shown at left.

Digital still-frame cameras and video cameras work on much the same principle as a photographic camera (Figure 4, top and middle). The digital still-frame camera replaces the film with an array of photo-detectors. Video cameras store the image as a series of lines "sampled" from the image in the camera. Each line is a record of the brightness of that strip across the image. Many earth resource remote sensing systems focus the image directly onto the photo-detector, rather than using a camera-like system. The idea is the same as if you viewed the earth through a cardboard tube, recording the image in strips much like a video camera, scanning across each line (Figure 4, bottom).


Figure 4. Digital images can be created by measuring the light from a number of sources: a digital still-frame camera (array of photo-detectors), a digital video camera (lines across the image scanned onto photo-detectors), and an electro-optical scanner (a mirror scans the object onto photo-detectors).

1.3 Analogue versus Digital

An analogue signal is a continuous record of phenomena. The record can be physically etched into a surface (e.g. the sound energy etched into the grooves of an LP record, or the inked lines drawn by a seismograph); stored on magnetic tape (e.g. the sound energy stored by magnetising the magnetic tape in a cassette); or broadcast electromagnetically (e.g. the sound energy carried on an electromagnetic wave as radio sound). Figure 5 shows an analogue signal of the level of brightness across the image above the graph (draw any line horizontally across the shaded image).

One of the main problems with a recording of an analogue signal is that any smudges, impurities, marks or anything that changes the recording in any way is interpreted as part of the signal during playback. The crackle and pop of old LP records, for example, are dust and marks in the grooves, whilst the hiss on a cassette tape is the random background magnetic signal on the tape.

Figure 5. Analogue signal (a cross-section of the top image is shown on the bottom); the graph plots signal strength against distance.


1.4 The Digital Signal

One solution to the problems with analogue signals is to convert them to a numeric representation of the original signal. The numbers can be stored in any available manner, and then used to reproduce the original signal without noise or distortion. The process of dividing the analogue signal up into intervals is called sampling. Figure 6 shows how both the intensity (Y axis) and the distance (X axis) can be broken up into a number of discrete regions. The greater the number of regions or bins, the greater the sampling density.

Figure 6. The first step of digitising – determining the sampling interval. If this is applied to Figure 5, X is the distance across the image and Y is the signal strength.

Once the sampling density along the distance axis (Figure 5; X in Figure 6) has been determined, the signal must be reduced to a single value for each bin (Figure 7a and Figure 7b), which ideally is the average of the signal within the bin. Generally the number of intensity levels is fixed, requiring that the digital value be set to the closest possible alternative (Figure 7c – the darkest boxes show the digital signal that is to be stored).

The sampling interval along the horizontal axis in Figure 7 (the base in Figure 8) is a known interval, and thus does not have to be recorded explicitly. The intensity values (vertical axis) are all that need to be recorded, as long as they are kept in order. The digital values for the signal shown in Figure 5 are shown along the base in Figure 8 (4, 8, 11, 11, 6, 2, …). The sampling process described here is performed using an analogue-to-digital converter (ADC).

Figure 7. Sampling an analogue signal along one axis (a), the sampled signal (b), and reducing it to a discrete intensity value (c).

Figure 8. The resulting digital signal, read along the bottom of the plot: 4, 8, 11, 11, 6, 2, 3, 4, 6, 8, 8, 6, 5, 3, 2, 2, 3, 3, 4, 5, 6, 7, 7, 7, 6, 6, 5 (intensity scale 0–13).
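As a minimal sketch of the sampling and quantisation just described, the Python fragment below reduces a synthetic analogue signal to a string of digital numbers. The signal, the number of bins and the number of intensity levels are all invented for illustration; they are not the values of any real recording.

```python
import numpy as np

def digitise(signal, n_bins, n_levels):
    """Sample an analogue record into n_bins intervals along the distance axis
    and quantise each bin average onto n_levels discrete intensity values
    (the job of the analogue-to-digital converter, ADC)."""
    sig = np.asarray(signal, dtype=float)
    bins = np.array_split(sig, n_bins)                  # equal-width bins along the axis
    bin_means = np.array([b.mean() for b in bins])      # one value per bin
    lo, hi = sig.min(), sig.max()
    # Map the averages onto the closest of the fixed intensity levels.
    return np.round((bin_means - lo) / (hi - lo) * (n_levels - 1)).astype(int)

# A synthetic brightness profile standing in for the cross-section in Figure 5.
x = np.linspace(0, 1, 1000)
analogue = 6 + 5 * np.sin(2 * np.pi * 2 * x) + np.random.normal(0, 0.2, x.size)
print(digitise(analogue, n_bins=26, n_levels=14))       # 26 DNs in the range 0-13
```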

1.5 Pixels and Digital Images

The numeric values recorded for each sampled element of the signal are known as digital numbers, or DNs. To convert a photograph into a digital image, the image is divided into a regularly spaced grid (an array) where the average brightness of each cell is measured and stored as a DN. Each cell of the grid that makes up the image is known as a picture element, or pixel. The DN recorded for each pixel represents the average brightness of the area covered by that pixel. The process of measuring each pixel is known as scanning, which is why the term scanner crops up repeatedly in desktop publishing and remote sensing (desktop scanner, multispectral scanner). Figure 9 shows a photograph that was scanned using various numbers of pixels.

Figure 9. A digital image and its spatial resolution. As fewer and fewer pixels are used, the area within each pixel is averaged. From left to right: 275 x 337 pixels, 92 x 112, 28 x 34 and 7 x 8.

The number of pixels used to create a digital image depends on the degree of clarity that is needed and the particular characteristics of the scanner. The fewer pixels that the image is divided into, the less clear the image appears. If any digital image is looked at very closely, it will appear blocky, as the individual pixels become visible.

Satellite sensors focus a small patch of the surface of the earth on a single detector, recording a pixel value for an area that varies from a few metres across to tens or hundreds of metres across. A digital sensor in a camera used by news reporters can produce an image with pixels so small that it is difficult to tell the difference between the digital image and a photograph. A desktop scanner used for computer publishing can be controlled to measure pixels in micrometres. The actual size of each pixel on the ground or on the subject is controlled by the distance from the target and the individual characteristics of the sensor. The size of the pixels in an image scanned from a photograph depends on the scale of the original photograph and the number of pixels that it is divided into.
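A small sketch of the block averaging that produces the coarser versions in Figure 9, assuming an image whose dimensions trim neatly to a multiple of the block size; the array below is random stand-in data rather than the scanned photograph itself.

```python
import numpy as np

def reduce_resolution(image, factor):
    """Average non-overlapping factor x factor blocks of pixels, so each
    output pixel holds the mean brightness of the area it covers."""
    rows, cols = image.shape
    rows, cols = rows - rows % factor, cols - cols % factor        # trim to a multiple
    blocks = image[:rows, :cols].reshape(rows // factor, factor,
                                         cols // factor, factor)
    return blocks.mean(axis=(1, 3))

fine = np.random.randint(0, 256, size=(276, 336))   # stand-in for a scanned photo
coarse = reduce_resolution(fine, factor=3)          # 92 x 112 pixels
print(fine.shape, "->", coarse.shape)
```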

1.6 Digital Numbers

In a remote sensing image, the range of DN values in the image depends on the particular sensor. The electric signal created by the sensor is converted to a digital number by an ADC; the resulting numeric value ranges from some minimum value (when the target is very dark) to a maximum value (when the target is extremely bright). The ADC converts that range into numerical values that are stored as binary integers, often in the range of 0 to 255 or 0 to 65,535 (in one or two bytes). The data range is often arbitrary, although it is related to the radiometric sensitivity of the sensor. Figure 10 shows an image with an arbitrary range of ten values from 0 to 9.

Figure 10. The string of digital numbers on the left (8, 8, 8, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8, 6, 0, 0, 2, 6, 8, 6, 0, 0, 2, 6, 8, 6, 2, 2, 2, 6) represents the image on the right.


The entire range of data values is often not used. The sensor and the ADC are designed so that the image source is very seldom darker or brighter than the sensor is capable of reading (and thus the DNs are seldom at the minimum or maximum value). Most images are neither very dark nor extremely bright, and fall somewhere in between the minimum and maximum digital number. In some rare cases, however, an object may be brighter than the sensor can measure, as with snow or clouds; the surface is then recorded with the maximum value (it is said to be saturated).
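To make the quantisation and saturation behaviour concrete, the sketch below converts a hypothetical detector signal into an 8-bit DN, clipping anything brighter than the working range to the maximum value of 255. The voltages are invented for illustration and are not those of any real sensor.

```python
def to_digital_number(voltage, v_dark, v_bright, bits=8):
    """Map a detector signal onto an integer DN in [0, 2**bits - 1].
    Signals outside the sensor's working range saturate at the extremes."""
    levels = 2 ** bits - 1
    scaled = (voltage - v_dark) / (v_bright - v_dark) * levels
    return int(round(min(max(scaled, 0), levels)))

print(to_digital_number(0.42, v_dark=0.1, v_bright=0.9))   # mid-range target -> 102
print(to_digital_number(1.50, v_dark=0.1, v_bright=0.9))   # snow or cloud: saturated at 255
```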

1.7 Image Geometry

Digital images can be thought of as a regular grid of (normally) square pixels. The pixel shape is generally assumed to be square by most computer image processing programs, which may not be the case with raw data. The degree of distortion depends on the type of sensor and platform, the width of the view angle or Field of View (FOV), and the altitude of the platform. Distortion is of two types: systematic distortion and non-systematic distortion. Systematic distortion occurs in a regular, predictable manner, usually due to known characteristics of the platform, the scanner or the orientation of the platform relative to the earth, while non-systematic distortion is due to external factors that cannot be calculated or computed from known variables.

The distortions in aircraft scanner images are often much more complex than those in satellite-based scanners because the attitude of the aircraft changes along the flight path, and the scanning arc is generally greater than with a satellite-based system. Distortion due to flight-path perturbation can be removed systematically only if the exact orientation of the aircraft is known from gyroscopic data recorded at the time of the flight.

1.7.1 Systematic distortion

Systematic distortion is a regular distortion of an image that can be calculated and corrected. The causes of systematic distortion include, for example, the FOV of the sensor, the earth's curvature, and the earth's rotation. The type of systematic distortion that occurs in an image depends largely on the altitude of the platform, and the type of sensor and platform used.

The large view angle common with aircraft-based scanners and satellite-based sensors such as NOAA AVHRR causes the pixel size to change as the distance from the flight path increases (Figure 11). The extent of the distortion can be calculated from the FOV of the scanner, the altitude of the platform, and the look angle for each pixel. The curvature of the earth also needs to be taken into account with high-altitude imagery with a large view angle, such as NOAA AVHRR. This is not a problem with Landsat MSS or TM, or SPOT HRV, due to their high altitude and very small view angle.

Satellite-based platforms such as Landsat or SPOT have a near-polar orbit that passes from north to south. If the earth did not rotate, the strip of image data captured by the sensor would have little geometric distortion, since the sensors on board both satellites have a small view angle and normally look straight down at the earth (SPOT HRV can also look to the side). The earth rotates out from under the satellite, however, resulting in the strip slowly moving westward towards the south end of the pass (Figure 12). The westward skew can be calculated from the trajectory of the satellite orbit and the speed of the earth underneath the satellite at any given latitude. The data vendor generally takes care of systematic distortion unless the raw data is specifically requested.


Figure 11. If a scanner has a wide view angle, pixels that are farther from the nadir (straight down) point are increasingly distorted. If this is not compensated for, the edges of the image become compressed.

Figure 12. Earth resource satellites, such as Landsat, orbit at approximately 700 km above the earth's surface in a near-polar orbit. The earth rotates out from under the satellite, resulting in the image area moving progressively westward as the satellite passes from the north to the south.
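To give an order of magnitude for this skew, the sketch below estimates it from the earth's surface speed at a given latitude and the time taken to acquire one scene. All of the figures (earth radius, scene length, satellite ground speed) are approximate and purely illustrative, not the parameters of any particular mission.

```python
import math

def westward_skew_m(latitude_deg, scene_length_km=185.0, ground_speed_km_s=6.8):
    """Rough estimate of how far the ground rotates east (so the image skews
    west) while one scene is being acquired. All figures are approximate."""
    earth_radius_km = 6378.0
    sidereal_day_s = 86164.0
    # Surface speed of the rotating earth at this latitude, in km per second.
    surface_speed = (2 * math.pi * earth_radius_km *
                     math.cos(math.radians(latitude_deg)) / sidereal_day_s)
    acquisition_time_s = scene_length_km / ground_speed_km_s
    return surface_speed * acquisition_time_s * 1000        # metres of skew

print(round(westward_skew_m(0)))    # roughly 12-13 km of skew near the equator
print(round(westward_skew_m(45)))   # smaller at higher latitudes
```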

1.7.2 Non-systematic distortion

Many factors that affect the geometric orientation of the image cannot be calculated because of unknown variables or perturbations in the orientation of the scanner. In the case of two satellite images, the absolute orientation will be different on each pass of the satellite (although it will be close). The orbital characteristics of the satellite dictate where each IFOV is located, and although the corner of the image can be estimated to some degree using the known orbit, the images usually are offset from one another.

1.7.2.1 Radial Distortion

The physical manner in which the photograph or image is captured leads to other types of distortion (see Figure 11 for the geometry of an aircraft-based scanner). The arrows shown in the photograph and the image in Figure 13 show how tall objects appear in a photograph and in a side-to-side image scanner. The camera focuses an entire scene onto film at one time, while the scanner image is built up a line at a time (the tall object is laid over along the scanned line). In these examples the aircraft is flying from the top of the photo or image to the bottom.

Figure 13. Radial distortion in photographs, outward distortion in airborne scanner images, and displacement due to elevation differences. The degree of displacement on an image plane is shown in the right diagram: a shows the location on the image plane for a tree located at A, while b shows the location of the elevated tree B.


1.7.2.2 Displacement due to Elevation

The relative location of objects in both cases will be different from that on a map, due to displacement caused by elevation differences. In Figure 13, the tree B appears on the photograph or image at point b on the imaging plane (a side view of the photo or image). If the tree were located at A, the position on the imaging plane would be a. This displacement from elevation is used by the eye to gauge distances. The difference in the location of objects in two images or photographs can be used to determine the elevation of the ground. The size of the difference is known as parallax: the closer the object is to the camera, the greater the parallax will be in subsequent stereo photographs. The same effect can be seen when driving down the highway; nearby telephone poles or fence posts pass by quickly (large displacement), while distant objects pass by slowly.
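The note does not give a formula for this effect, but the standard photogrammetric relation for relief displacement on a vertical photograph makes the idea concrete: a raised point is displaced radially outward by d = r·h/H. The numbers in the sketch below are illustrative only.

```python
def relief_displacement(radial_distance_mm, object_height_m, flying_height_m):
    """Standard relation for a vertical photograph: a point of height h above
    the datum is displaced radially outward by d = r * h / H, where r is its
    radial distance from the nadir point on the photo and H is the flying height."""
    return radial_distance_mm * object_height_m / flying_height_m

# A 30 m tall tree imaged 80 mm from the photo centre at 1,500 m flying height
# is displaced about 1.6 mm outward on the photo.
print(relief_displacement(80, 30, 1500))
```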

2 Factors to be Considered for Thematic Information Extraction

Remotely sensed images are presented in a way similar to a normal photograph, except that they consist of digital numbers that represent brightness values. To extract thematic information from these digital numbers, three basic factors must be considered, namely spectral, spatial and temporal characteristics. The spectral characteristics refer to the nature of the electromagnetic radiation that is emitted or reflected from the earth's surface and the capability of the sensor to detect it. The spatial characteristics describe the size of the earth objects to be detected in comparison with the spatial resolution of the sensor. Given that we live in a changing world, the temporal characteristics must also be considered when attempting to monitor our environment, with respect not only to the nature of the phenomena but also to the capability of the sensor.

2.1 Spectral Characteristics

Human eyes are a form of remote sensing detector. They react to light that comes from or is reflected from things around us. The light that human eyes see, however, is just a small portion of a continuous spectrum of energy called the electromagnetic spectrum.

The electromagnetic spectrum is made up of energy that is known as electromagnetic radiation (EMR) because the pulses or waves can be measured both electrically and magnetically. Specific names are used to describe the different wavelengths that make up the electromagnetic spectrum (Figure 14). The light that human eyes can see is called visible light only because it is visible to the eye. Indeed, there is nothing particularly special about the visible portion of the electromagnetic spectrum other than the fact that humans can see it.


Figure 14. The range of wavelengths known as the electromagnetic spectrum, running from short wavelengths (high frequency) to long wavelengths (low frequency): gamma rays, X-rays, visible light (0.4 – 0.7 micrometres: blue, green, red), infrared, thermal infrared, microwave and radio, with wavelengths measured from nanometres through micrometres and centimetres to metres. Our eyes can only see the small part of this energy known as visible light.

2.1.1 The Nature of Electromagnetic Radiation

The energy of EMR can be detected when it interacts with matter. In the absence of matter (in a vacuum) EMR travels at just under 300,000 km per second. In matter (such as the atmosphere) the speed is slightly slower. In fact, the denser the matter, the slower the speed.

Figure 15. White light is separated into a fan of light with distinct wavelengths as it passes through a prism. The shortest wavelengths slow the most (and thus are bent the greatest amount).

Electromagnetic radiation interacts with different types of matter in different ways. In fact, the way that the EMR is viewed causes it to appear to be either a wave or a particle. For instance, the refraction (or bending) of light as it passes through glass is best explained by describing the radiation as a wave. Light with a shorter wavelength, such as blue, is slowed more than light with a longer wavelength, such as red. The shorter waves are bent or refracted more than the longer waves, causing the light to spread out like a fan with the longest wavelengths at one side of the fan and the shortest wavelengths at the other (Figure 15).

The wave oscillates at a frequency that is inversely proportional to the length of the wave (the distance between the peaks of two waves).

c = λν (1)

where c = speed of light = 299,792.8 km/second
ν = frequency (oscillations per second, in Hertz)
λ = wavelength (m)

The energy associated with EMR is proportional to the frequency of the radiation. As the energy level of the radiation increases, it does so in discrete steps – as if the radiation were made of individual vibrating bundles of energy. These bundles are known as photons or quanta and can behave in a manner similar to particles except that they have no mass. The amount of energy in a photon can be calculated using either the frequency or the wavelength of the energy.


E = hν = hc/λ (2)

where E = energy of a photon in Joules
h = Planck's constant = 6.626 × 10⁻³⁴ Joules·second
ν = frequency (oscillations per second, in Hertz)
λ = wavelength (m)
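A short numerical check of equations (1) and (2), using the constants given above; the two example wavelengths (roughly red light and thermal infrared) are chosen only for illustration.

```python
C = 299_792_458.0        # speed of light, m/s
H = 6.626e-34            # Planck's constant, J.s

def frequency_hz(wavelength_m):
    """Equation (1): c = lambda * nu, so nu = c / lambda."""
    return C / wavelength_m

def photon_energy_j(wavelength_m):
    """Equation (2): E = h * nu = h * c / lambda."""
    return H * C / wavelength_m

# Red light (about 0.65 micrometres) versus thermal infrared (about 11 micrometres):
for wavelength in (0.65e-6, 11e-6):
    print(wavelength, frequency_hz(wavelength), photon_energy_j(wavelength))
```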

Most remote sensing is based on the detection of EMR, whether the detector is a human eye, a camera or a scanner on a satellite (an electronic "camera"). What can be learned from remote sensing depends on what type of radiation is detected and how it interacts with the surface of the earth (or any other surface that is being looked at). Human vision is particularly important because any information that is collected is generally interpreted visually from a printed image or on a computer screen.

2.1.2 The Electromagnetic Radiation from Earth Materials

All matter in the universe that is warmer than 0°K (or -273.15°C) emits electromagnetic energy. Molecular motion stops at 0°K – the coldest possible temperature, which is also known as absolute zero. All objects in everyday life are warmer than absolute zero.

The wavelength of EMR ranges from very short (nanometres) to very long (kilometres), as was shown in Figure 14. The amount and type of energy that is emitted depends on the temperature of the object. Very cold objects only emit energy with very long wavelengths, while warm objects emit both long and short wavelengths. In addition to emitting a wider range of wavelengths, the warmer object emits more energy than the cold object. This higher energy level is caused not only by an increase in the amount of EMR being emitted, but also by the fact that shorter wavelength EMR contains more energy (Figure 16).

Figure 16. A very cold object (bottom) only emits long-wave energy. As the object is heated (middle), the amount of emitted energy increases and the average wavelength of the energy decreases. The hottest object (top) not only emits the most energy, but also emits short-wave energy.

The total amount of energy and the range of wavelengths of that energy that are emitted from a surface are predicted by a set of equations. The term blackbody refers to a surface that absorbs all incoming energy (and hence looks black) and re-emits that energy in perfect accordance with the Stefan-Boltzmann law:

M = σT⁴ (3)


where M = total radiant exitance from the surface in W/m² (W = Watts)
σ = Stefan-Boltzmann constant = 5.6697 × 10⁻⁸ W·m⁻²·K⁻⁴
T = absolute temperature of the surface (°K).

To quote Lillesand and Kiefer (1994, p. 7), "a blackbody is a hypothetical, ideal radiator that totally absorbs and re-emits all energy incident upon it. Actual objects only approach this ideal". The wavelength at the peak of the blackbody curve is the wavelength of maximum emittance, which is directly related to the temperature of the surface (Wien's displacement law):

λmax = W / T (4)

where λmax = wavelength of maximum emittance in µm
W = Wien's constant = 2,897 µm·K
T = absolute temperature (°K).
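A short numerical check of equations (3) and (4) for the three surfaces sketched in Figure 17, treating each as a blackbody (which, as noted above, real materials only approximate).

```python
SIGMA = 5.6697e-8    # Stefan-Boltzmann constant, W m^-2 K^-4
WIEN = 2897.0        # Wien's constant, micrometre.K

def radiant_exitance(t_kelvin):
    """Equation (3): M = sigma * T^4, in W/m^2."""
    return SIGMA * t_kelvin ** 4

def peak_wavelength_um(t_kelvin):
    """Equation (4): wavelength of maximum emittance = W / T, in micrometres."""
    return WIEN / t_kelvin

# The three surfaces of Figure 17, with temperatures converted to kelvin:
for name, temp in (("sun", 5700 + 273), ("hot iron", 800 + 273), ("human body", 37 + 273)):
    print(name, round(radiant_exitance(temp)), "W/m2, peak at",
          round(peak_wavelength_um(temp), 2), "um")
```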

A surface needs to be very hot before it emits energy at such short wavelengths. Figure 17 shows a diagram of the amount of energy and the wavelengths of energy emitted from the sun, a hot stove element, and the human body. The light from the sun is slightly yellow because there is slightly more green and red light than blue light being emitted. In addition, some of the blue light that is emitted by the sun is scattered by the atmosphere, further reducing the amount of blue and making the sky appear blue. A hot stove element looks red because it is not hot enough to emit blue and green light, only red light and energy with longer wavelengths than we can see (infrared and longer). The energy that is emitted from the human body cannot be seen at all by human eyes because all of the energy is at wavelengths that are much longer than the eye can see.

A surface that emits energy close to that of a blackbody is called a greybody, while a surface that emits close to a blackbody at some wavelengths but not at others is called a selective radiator. Most earth materials are selective radiators (Figure 18).

Figure 17. The wavelength and amount of energy emitted from the sun at about 5700°C (top curve), hot iron at about 800°C, such as a glowing stove element (middle curve), and the human body at 37°C (bottom curve); energy is plotted against wavelength, from shorter to longer, with the visible band (blue, green, red) marked.

Figure 18. Curves A, B and C show the energy emitted from three surfaces with different emission characteristics: (A) a blackbody, (B) a greybody, and (C) a selective radiator; energy is plotted against wavelength, from shorter to longer.

2.1.3 Spectral Characteristics of Sensors

One important consideration in selecting the wavelength range in which a remote sensor will detect is the atmospheric transmittance. The Earth's atmosphere itself selectively scatters and absorbs energy in certain spectral ranges, allowing the rest of the solar energy to be transmitted through it. Areas of the spectrum where specific wavelengths can pass relatively unimpeded through the atmosphere are called transmission bands, or atmospheric windows, whereas absorption bands, or atmospheric blinds, are those areas where specific wavelengths are totally or partially blocked. For a remote sensor to be capable of 'seeing' objects on the ground, the detectors must use the transmission bands (Figure 19).

Figure 19. Spectral characteristics of (a) energy sources (the sun's energy at about 6000°K and the earth's energy at about 300°K), (b) atmospheric transmittance, and (c) common remote sensing systems (the human eye, photography, electro-optical sensors, thermal IR scanners, passive microwave and imaging radar), plotted over wavelengths from 0.3 µm to 1 m (after Lillesand and Kiefer, 1994).

The spectral response that a remote sensor can 'see' depends upon the spectral bands that the sensor detects. In remote sensing, the spectral range is usually composed of a number of spectral bands (falling within the 'atmospheric windows'), ranging from a single-band image (panchromatic image) to several hundred bands (hyperspectral image). Usually, the term 'multispectral' is applied to images that are composed of several spectral bands. The spectral characteristics of commonly used space-borne sensors, their spectral bands and their primary uses are listed in Table 1.


Table 1. The spectral characteristics of some currently operational space-borne remote sensors.

Satellite, sensor/image and number of bands – band, spectral range (µm): primary use

Landsat MSS (4 bands):
  Band 1, 0.5 – 0.6: cultural features, clear water penetration
  Band 2, 0.6 – 0.7: vegetation/soil discrimination
  Band 3, 0.7 – 0.8: delineating water bodies, geology
  Band 4, 0.8 – 1.1: delineating water bodies, vegetation vigour and biomass

Landsat TM (7 bands):
  Band 1, 0.45 – 0.52: coastal water mapping, soil/vegetation discrimination, forest type mapping, and cultural features
  Band 2, 0.52 – 0.60: vegetation discrimination and vigour assessment, and cultural features
  Band 3, 0.63 – 0.69: plant species differentiation, and cultural features
  Band 4, 0.76 – 0.90: vegetation types, vigour and biomass content, delineating water bodies, and soil moisture
  Band 5, 1.55 – 1.75: vegetation and soil moisture content, differentiation of snow from clouds
  Band 6, 10.4 – 12.5: vegetation stress analysis, soil moisture discrimination, and thermal mapping applications
  Band 7, 2.08 – 2.35: discrimination of mineral and rock types, and vegetation moisture content

SPOT PAN (1 band):
  Band 1, 0.51 – 0.73: general mapping, DTM generation

SPOT XS (3 bands):
  Band 1, 0.50 – 0.59: cultural features, clear water penetration
  Band 2, 0.61 – 0.68: vegetation/soil discrimination and plant species differentiation
  Band 3, 0.79 – 0.89: delineating water bodies, vegetation types, vigour and biomass

NOAA AVHRR (5 bands):
  Band 1, 0.58 – 0.68: daytime cloud and surface mapping, snow and ice extent
  Band 2, 0.725 – 1.1: surface water delineation, snow and ice extent
  Band 3, 3.55 – 3.93: detecting hot targets (e.g. forest fires), night-time cloud mapping
  Band 4, 10.3 – 11.3: determining cloud and surface temperatures, day or night cloud mapping
  Band 5, 11.5 – 12.5: determining cloud and surface temperatures, day or night cloud mapping, water vapour correction

2.1.4 Spectral Signatures of Some Earth Materials

For passive remote sensing, the 'light' that a sensor detects is mainly the reflectance of sunlight, which has an energy distribution over the entire spectrum, although some sensors do have the ability to detect energy emitted from the earth's surface itself (e.g. thermal infrared). In theory, the reflectance of sunlight differs from one kind of earth material to another. The reflectance spectrum of a given earth material is often unique, and it is therefore called the spectral signature of the material. In remote sensing, it is fundamental to investigate the spectral signature before a correct image interpretation can be achieved.


There is a huge variety of materials on the earth's surface, so recording their spectral signatures (in what is also known as a spectral library) requires substantial investment of money and time. For years, efforts have been made to establish such spectral libraries, and some of them are already available, often supplied with remote sensing image processing software packages. The differentiation of the spectral signatures of some typical earth materials, such as vegetation, soil and water, is now common knowledge (Figure 20).

Figure 20. Typical spectral reflectance curves (reflectance in per cent against wavelength, 0.4 – 2.6 µm) of common earth surface materials: dry bare soil (grey-brown), green vegetation and clear water. The positions of the spectral bands of Landsat MSS, Landsat TM, SPOT HRV and NOAA AVHRR are also indicated (after Richards, 1993).

For information extraction and image interpretation, selecting the bands of a multispectral image that are appropriate for the application objectives is a crucial task. Comparing the multispectral bands that present the most distinct difference between the cover types of interest gives the most promising hope for correct interpretation and classification, whereas difficulties are often experienced in separating cover types using image bands that record spectral regions where the cover types present a similar response.
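As a crude illustration of this band-comparison idea, the sketch below ranks bands by the gap in mean reflectance between two cover types. The band names and reflectance values are invented for illustration and are not read from Figure 20.

```python
# Hypothetical mean reflectance (%) of two cover types in four broad bands.
vegetation = {"green": 12, "red": 8, "NIR": 45, "MIR": 22}
bare_soil = {"green": 18, "red": 22, "NIR": 30, "MIR": 35}

# Rank bands by the absolute difference in response between the two covers:
# the larger the gap, the better the band separates them.
ranking = sorted(vegetation, key=lambda b: abs(vegetation[b] - bare_soil[b]), reverse=True)
print(ranking)   # NIR and red show the largest contrast for this pair of covers
```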

2.2 Spatial Characteristics

Another important factor for information extraction from digital images is the spatial extent of the objects to be interpreted relative to the sensor's spatial resolution. In theory, if the objects are smaller than the image resolution, they cannot be reliably perceived and so will not be correctly interpreted. In reality, however, some small objects may be visible on images with a lower resolution, provided that the objects have enough contrast against their background. In Figure 21, the linear feature crossing the ancient lake is clearly visible, indicating the road crossing the area. Since the road has a high contrast against its surrounding background, it is shown on the image even though its width is far less than the 82 m resolution of the MSS imagery.

According to sampling theory, the spatial resolution (pixel size) of the digital image must be no coarser than half the size of the smallest object of interest, so that the shape of the object can be reliably presented on the image. This sets the bottom-line limitation of the digital image concerned. Table 2 outlines the spatial resolution of commonly used remotely sensed data, together with the interpretation features and primary applications related to their spatial resolution; a small worked example follows the table.

Figure 21. Landsat MSS image of an ancient dry lake near Pooncarie, western New South Wales, Australia.

Table 2. Examples of spatial resolution of some commonly used remotely sensed imagery.

Platform and sensor/image (spatial resolution) – interpretation features; primary applications:

Aircraft, digital airphoto (1 – 2 m) – control points on cultural features; photogrammetry and mapping, urban management.
SPOT PAN (10 m) – houses and streets; urban planning.
SPOT XS (20 m) – crop fields, water bodies, urban areas; regional planning, agriculture, land use change.
Landsat TM (30 m) – crop fields, water bodies, urban areas; regional planning, agriculture, land use change.
Landsat MSS (82 m) – landforms, forest, pasture and agriculture areas; environment and pasture, rangeland management.
NOAA AVHRR (1.1 km) – regional landforms, coastline; regional monitoring, coastline and oceanography.
GOES (2.5 – 5 km) – clouds, coastline; weather forecast, oceanography, global change.
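The worked example promised above applies the half-size rule of thumb to the resolutions listed in Table 2; the "smallest object" sizes are hypothetical and serve only to show how the rule narrows the choice of sensor.

```python
# Approximate pixel sizes of the sensors listed in Table 2 (metres).
sensors = {"Digital airphoto": 2, "SPOT PAN": 10, "SPOT XS": 20,
           "Landsat TM": 30, "Landsat MSS": 82, "NOAA AVHRR": 1100}

def adequate_sensors(smallest_object_m):
    """Sensors whose pixel size is no coarser than half the object size."""
    limit = smallest_object_m / 2.0
    return [name for name, resolution in sensors.items() if resolution <= limit]

print(adequate_sensors(50))    # a 50 m field: airphoto, SPOT PAN and SPOT XS qualify
print(adequate_sensors(200))   # a 200 m feature: Landsat TM and MSS also qualify
```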

One of the greatest challenges in remote sensing lies in the techniques and methodologies that deal with so-called 'mixels' (i.e. pixels covering various cover types within their spatial resolution). There have been a large number of reports in the literature on sub-pixel component modelling intended to extract quantitative information from these mixels. Detailed discussion of this issue, however, is beyond the scope of this note.


2.3 Temporal Characteristics

Since we are living in a changing world, frequent and regular monitoring of our environment is one of the major application areas for digital remote sensing. Multitemporal remotely sensed images make this possible.

For thematic information extraction, the temporal factors that may influence the interpretation process and potential applications include, for example, the acquisition date and time, the frequency of coverage, and the history of coverage.

2.3.1 Acquisition date and time

Date and time are often, in addition to the geographical position, the first criteria applied to the acquisition of remotely sensed data. For many application projects, it is fundamental to acquire simultaneous (or near-simultaneous) imagery to match up with the 'ground truthing' information. Even for applications where simultaneous coverage is less critical (e.g. geological applications), appropriate timing of image acquisition can often provide great assistance to image interpretation. For example, shading effects and shadow, which may vary with acquisition date (e.g. winter or summer), may greatly assist the interpretation of topography, whereas day and night images acquired by some sensors may be used to distinguish different types of surface materials.

2.3.2 Frequency of coverage

The frequency of coverage of a particular type of remote sensor determines how often we may use the derived images to monitor a given area. For most of today's commonly available satellite data, which are basically orthographic in nature, the re-visit frequency is determined by the satellite's orbital characteristics. A satellite designed for earth resource monitoring usually takes a sun-synchronous orbit, meaning that the satellite passes over all places on the earth having the same latitude at approximately the same local time. However, some satellites (e.g. SPOT) are able to take 'orders' to acquire 'side-looking' images, thus providing more frequent coverage than orthographic images alone.

Table 3. Temporal characteristics of some commonly used satellite data and their area coverage.

Satellite and sensor/image (pixel resolution) – daylight crossing local time at the equator; re-visit period at the equator; swath width:

SPOT HRV PAN (10 m) and HRV XS (20 m) – 10:30 am; 26 days, with up to 7 passes per 26 days using side-looking; 60 km.
Landsat MSS (82 m) and TM (30 m) – 9:45 am; 16 days; 185 km.
NOAA AVHRR (1.1 km) – even-numbered satellites 7:30 am/pm, odd-numbered 2:30 am/pm; 2 passes per day; 2,800 km.
GOES (2.5 – 5 km) – geo-stationary; full-disk view.

For some applications, such as flood and bush fire monitoring, the re-visit period is crucial. These events place a critical demand on timely coverage, and situations may change very rapidly. Generally speaking, for a given sensor, the re-visit period and the spatial resolution are negatively related. This is because lower resolution imagery covers a larger area, so the same area is covered again more frequently. Given this fact, it is quite understandable that most of today's large-area remote sensing applications relying on land monitoring use lower resolution data (e.g. NOAA AVHRR).

2.3.3 History of coverage

Since the first launch of a Landsat satellite, we now have nearly 30 years of regular collection of satellite-borne imagery. This does not count the even longer collection history of aerial photographs and meteorological satellites. The historical collection of remotely sensed imagery gives us an opportunity to study long-term change and human impact in many aspects of our living environment.

For a particular objective involving past remotely sensed imagery, it is necessary to examine the history of coverage of the data concerned. This also involves the way in which the data were acquired and archived, as many past data sets have already become unavailable because of technical and management problems.

3 Feature Identification and Image Interpretation

In today's GIS environment, digital remotely sensed data have become one of the major input data sources for information updating. However, the digital images need to be interpreted and transformed into thematic maps (or classified images) that are presented in the same way that other data layers are stored and manipulated. Correct feature identification and image interpretation in a given application context, therefore, are the key to the successful utilisation of remotely sensed information.

3.1 Image Interpretation Keys

Identifying cover types that represent basic materials on the ground is an exercise both in recognising patterns and in interpreting the colours or shades in an image. The ability to recognise the shape of particular features and the colour of various cover types is crucial to determining what is happening on the landscape. Those who are familiar with the landscape are one step ahead.

The colour, or spectral response, of natural and man-made features is often similar because of the similarity of the materials. Trees, whether planted by man or occurring naturally, appear the same colour. However, the patterns formed by the plants usually make it possible to distinguish between them, owing to the species mix and the symmetrical nature of artificially planted forests. Crops are easily detected from their colour and dense pattern. Pavement and many rock types are often indistinguishable by colour alone because soils, sand and gravel are often used as construction materials. Concrete can look like certain rock types because its primary constituents are gravel and limestone derivatives. Looking for patterns is the first step in interpreting an image, and the next step is to determine what material a surface is likely to be made from.


3.1.1 Grey Scale and Colour

The brightness and appearance of a cover type in an image depend on the wavelength band of the image, i.e., how much radiation is reflected in that colour or wavelength, and on the relative brightness of other features in the same image (contrast). Boundaries between vegetated areas and non-vegetated areas show up well in the near-visible infrared (NIR) because of the high reflection of vegetation, the relatively lower reflection of other surfaces, and the lack of haze and atmospheric scattering in the infrared. Moreover, water absorbs most NIR, resulting in sharp water-land boundaries.

Figure 22. Digital airphoto showing an agricultural area in northern New South Wales, Australia. The difference in brightness indicates whether a field is cropped or left bare. Man-made features such as a road, a quarry and houses are also clearly shown.

The visible bands are good for identifying man-made objects, partially because they tend not to be made of growing vegetation. This also applies to identifying agricultural crop fields and monitoring the growth of crops (Figure 22). Vegetation absorbs most red light because the chlorophyll in the leaves uses red light (and blue light, and to a lesser extent green light) to turn nutrients and water into plant matter, in the process using carbon dioxide and releasing oxygen into the atmosphere. The bright response from concrete, brick and gravel tends to show up well against the dark background of vegetation (in parks, gardens, fields and pastures). The stark contrast between bright vegetation in NIR, and bright concrete and rock in red, means that a colour composite that includes red and NIR wavelengths is very useful for identifying features.
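Building on this red/NIR contrast, one widely used piece of image arithmetic (a class of operation treated more generally in Section 4, though not described in detail in this note) is a normalised difference of the two bands. The sketch below uses toy DN values only, to show how the contrast can be turned into a single layer that highlights vegetation.

```python
import numpy as np

def normalised_difference(red, nir):
    """Normalised difference of NIR and red brightness. Vegetated pixels,
    bright in NIR and dark in red, score close to +1; water and bare
    surfaces score near zero or below."""
    red = red.astype(float)
    nir = nir.astype(float)
    return (nir - red) / (nir + red + 1e-9)   # small constant avoids division by zero

red_band = np.array([[30, 120], [25, 110]])   # toy 2 x 2 DN values
nir_band = np.array([[90, 60], [95, 55]])
print(normalised_difference(red_band, nir_band))   # left column (vegetation) scores high
```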

The mid-infrared bands of Landsat TM imagery are especially useful in rural environments. Soil differences tend to show up in the mid-infrared (MIR), generally because of differences in water and mineral content. Water absorbs radiation in the MIR area of the spectrum. Certain minerals, specifically clays that contain water in their structure, absorb mid-infrared radiation for this reason. The water content of plants will change the amount of MIR radiation that is absorbed. For example, dry grass will reflect much more MIR than wetter grass. Other minerals such as carbonates absorb radiation only in the longer MIR. The usefulness of the MIR bands is one reason that TM data, even with a lower spatial resolution compared with SPOT HRV, are most commonly used in rural areas.

3.1.2 Pattern and Texture

Spatial patterns are formed by the overall shape of things on the ground, and the texture of the image is due to repetitive features such as rows of trees, patches of woody weed or blocks of cultivated land. Naturally occurring patterns tend to be somewhat curvilinear and random, while man-made patterns are often rectilinear and regular. Table 4 outlines some of the key cues that can be used to determine whether a particular feature is likely to be naturally occurring or artificial. Figure 23 shows examples of typical spatial patterns and textures for natural and artificial features.

Table 4. Pattern and texture of some natural and artificial features.

Natural Features | Artificial Features
Irregular patterns - curves, meanders | Regular patterns - lines, regular shapes
Hills, valleys, gullies | Paddocks, fields, bare soil or dense crops
Rivers, streams, lakes, dry riverbeds | Roads, railroad tracks, canals
Vegetation type - trees, bushes, grasses | Buildings, bridges, dams
Drainage patterns | Road networks, fire breaks, power lines
Vegetation patterns | Vegetation patterns (linear, rectangular)

Figure 23. Some examples of spatial patterns and textures for natural and artificial features: forest (airphoto), gullies (SPOT Pan), buildings (SPIN-2) and a road network (airphoto).

In Figure 23, it is quite noticeable that artificial features are likely to follow regular patterns with sharp contrast between the features and the background. Natural features, on the other hand, rarely show such regular patterns. Rather, they often show highly irregular patterns with relatively lower contrast between the features and the background.

3.1.3 The Shape of Objects

The shape of an object describes the external form or configuration of the object. Cultural objects tend to have geometrical shapes and distinct boundaries, whereas natural features tend toward irregular shapes with irregular boundaries. From a vertical perspective, such as the view from a satellite, the shape of some objects is so distinctive that they can be conclusively identified by their shape alone. Figure 24 shows a well-known cultural feature with a shape so unique that it needs very little effort to interpret. Other cultural features having easily recognised shapes include airport runways (on the right side of Figure 25), agricultural fields, and certain archaeological sites. Many natural features also have distinctive shapes, such as sand dunes, volcanic cinder cones, alluvial fans and riverbeds (Figure 25).


Figure 24. SPIN-2 imagery showing the Great Pyramids of Giza (Cheops, Cephren and Mycerinus) near Cairo, Egypt.

Figure 25. SPOT panchromatic image of Yulin City, Shaanxi Province, China. The sand dunes at the bottom show distinctive shapes for interpretation.

3.1.4 The Spatial Context

Certain objects are "geographically" linked to other objects, so that identifying one tends to indicate or confirm the other. The spatial context is one of the most helpful clues for identifying cultural features that comprise aggregate components. For example, in Figure 26, the bright area next to the runway, which shows the spectral response of concrete materials, is a characteristic feature of an airport terminal, whereas the scattered brighter features on the water can reasonably be 'guessed' to be sea vessels. Spatial context for natural features, however, requires some knowledge of geography. For example, Figure 21 shows an ancient lake that is dry in modern times. On its lee side (the right side of the image), a bright strip indicates a sandbank (also called a "lunette"), the result of a long history of aeolian sedimentation.

Figure 26. Landsat TM image (band 3) showing the geographical association between an airport runway and the adjacent concrete area.


3.2 Procedure for Image Interpretation

Image interpretation is a task that relies on the human analyst's knowledge and experience. Although knowledge of the general principles of remote sensing is fundamental for successful interpretation, extensive knowledge and experience of the features of interest (i.e. of application fields such as geography) will in many ways enhance our ability to make correct feature identifications and image interpretations.

To make a map through image interpretation, the following procedure is helpful, although in practice some of the steps may be varied.

Establish interpretation goals: This is arguably the most important step. The aim and objectives of the image interpretation must be clearly established at the beginning of the project. Without clear goals, the analyst may waste a great deal of resources and time on something that may not even be relevant. Decisions that need to be made at this stage include: a) the types of maps to be produced, b) the map scale, and c) the output products (digital or hardcopy).

Establish a classification system: Since most interpretation projects aim to produce maps, it is essential to establish the classification for the end product. Sometimes a disciplinary or national standard must be followed. Practically, however, it may be difficult to follow the standard completely, and a compromise might be needed that simplifies or modifies the classification system so that it differs from the standard. Regardless of the extent to which the classification has to be varied, it is important to understand that the classification must serve the project goals properly.

Image acquisition: Based on the map scale and classification system, images are acquired for the interpretation exercise. While acquiring the images, the spectral, spatial and temporal characteristics of the images must be matched to the corresponding requirements for feature identification and image interpretation. In many cases, the ratio of image cost to the benefit for image interpretation also has to be balanced.

Collection of reference material: Reference materials such as maps, field records and photographs can greatly help image interpretation, particularly in areas with which the analyst is not very familiar. With today's development of GIS technology, thematic information may also be available from sources such as national and regional geographical databases, or even from the Internet. For example, Microsoft's terra server web site now provides on-line catalogue and ordering services for image data distribution, together with ancillary materials such as digital maps, text and multimedia presentations.

Establish interpretation keys: Interpretation keys need to be established in the early stage of interpretation. Usually this is done by selecting the typical combination of colour, pattern and texture for a known feature. These keys are then used as guidance for later mapping and interpretation operations.

Interpretation and draft map: With the established interpretation keys and classification system, the image can be interpreted and classified, and a draft map can be produced.

Accuracy assessment: The draft map needs to be further checked against the reference data sets, so that appropriate accuracy assessment exercises can be carried out, as outlined in Section 4.5 of this note.


3.3 Limitations

Although feature identification and image interpretation are basic skills that a remote sensing analyst must master, they have some significant limitations in real-world applications if they are the sole methodology used.

First, image interpretation (here we mean image interpretation by a human interpreter) places a high demand on the analyst's skills and experience, not only in remote sensing but also in the application discipline (e.g. geography, geology, agriculture or urban studies). This often presents a major constraint in an application project and can be very costly if the required expertise is not readily available.

Secondly, because manual operations form the major methodological input to the interpretation, the results can be quite subjective, with different interpreters delivering different results.

Thirdly, for monitoring purposes, visual interpretation takes too long for some applications to deliver the results required for real-time decision making. Examples include bush fire monitoring and control, agricultural management (particularly with today's "precision farming" concept), and environmental hazard detection and monitoring (e.g. oil slicks at sea).

Due to these limitations, image interpretation operations are now commonly associated with machine processing of digital images, which can largely reduce the demand on human resources and also make it possible to deliver interpretation results in real time or near real time.

4 Image Processing for Thematic Information Extraction

Machine processing of digital images involves the manipulation and interpretation of digital images with the aid of a computer. The central idea behind digital image processing is quite simple. The digital image is fed into a computer one pixel at a time with its brightness value, or digital number (DN). The computer is programmed to insert these data into an equation, or series of equations, and then store the results of the computation for each pixel. These results form a new digital image that may be displayed or recorded in pictorial format, or may be further manipulated by additional programs.

Although the possible forms of digital image manipulation are literally infinite, from the point of view of this note we only discuss some important operations related to thematic information identification and extraction, namely geometric correction, image enhancement, image arithmetic and image classification.
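As a minimal sketch of this pixel-by-pixel processing, the fragment below feeds every DN of a toy image through one simple enhancement equation, a linear contrast stretch; it is illustrative only and does not describe any particular software package.

```python
import numpy as np

def linear_stretch(image, out_min=0, out_max=255):
    """Feed every DN through the same equation: rescale the image's actual
    minimum-maximum range onto the full output range, producing a new image."""
    img = image.astype(float)
    lo, hi = img.min(), img.max()
    stretched = (img - lo) / (hi - lo) * (out_max - out_min) + out_min
    return np.clip(np.round(stretched), out_min, out_max).astype(np.uint8)

dull = np.random.randint(60, 110, size=(4, 4))   # a low-contrast toy image
print(linear_stretch(dull))                      # the same scene spread over 0-255
```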

4.1 Geometric Correction

This section looks at the image warping techniques that are used to geometrically correct a distorted image. Systematic distortion, such as image skew due to the earth's rotation (see 1.7.1), can be corrected using a known mathematical correction. Image distortion due to elevation differences (see 1.7.2.2), for example, cannot be predicted as easily.


In most cases the goal of geometric correction is to create a new image that is oriented the same as a map grid such as UTM, or in such a manner that it fits exactly on top of an existing image. The data can then be used with existing knowledge of the ground, information about specific locations can be easily found, and images from different times can be compared. For the purposes of this note, a map is used to define the orientation of the new image, but another image could just as easily be used.

The process of geometric correction involves four distinct steps:

Step 1: Creating a new blank image oriented with the pixels centred on evenly spaced map coordinates.

Step 2: Locating points that can be easily identified in both the original image and the map. The location in both the original image and the new map-oriented image can then be determined. Figure 27 shows how the original image would look if fitted to a map (1, top). A new image fitted to map coordinates is also shown in Figure 27 (2, bottom).

Step 3: Calculating an equation that transforms the original image coordinates (i.e. in columns and rows) to map image coordinates (in Easting and Northing), as in Figure 27 (1). The equation is a best average in most cases because the geometry of the original image is different from that of the map image, the map may be incorrect, and it is difficult to locate the discrete pixels on the continuous map coordinate system (errors are common).

Figure 27. The process of using Ground Control Points as a reference between an existing image and an image oriented to map coordinates.

Step 4: Inverting the equation to find a value for each pixel in the map image. If an equation can satisfactorily transform the image pixels into their known map coordinates, the inverse of the equation can be used to locate the pixels in the map image (which are presently valueless) in the original image. The pixel values shown in Figure 27 (2) can then be filled.

4.1.1 Image Coordinates versus Map Coordinates

4.1.1.1 Image coordinates

The coordinate system of a raw image is expressed simply in terms of columns and rows, or pixels and lines, with the origin of the data set located in the top left hand corner at location [1, 1] (which may also be referred to as [0, 0]). Each pixel usually has a known width and height, enabling the calculation of relative distances within the image (e.g. TM has a pixel size of 30 m). The distance between pixels is taken from the pixel centre, resulting in even multiples of the pixel size along columns or rows. Two similar satellite images, such as two TM scenes of the same area on different dates, may seem close but they usually will not overlap exactly. For this reason, image-to-image resampling often needs to be done before images from different dates can be compared.

4.1.1.2 Map coordinates

Map coordinate systems are usually oriented in a Cartesian manner with the origin at the lower left corner. The units may be Northing and Easting, such as in UTM, latitude and longitude, or even x and y. The coordinates are continuous. The first step towards creating a new image is to decide whether the pixels or pixel boundaries in the new rectified image are centred on map grid lines. The location of pixel centres then becomes a multiple of the pixel size, which may be different from that in the original raw image (Figure 28).

Figure 28. Image coordinates (columns or pixels, rows or lines; left) and map coordinates (Easting or x, Northing or y; right).

4.1.2 Locating Ground Control Points

Finding the location of Ground Control Points (GCP's) is often a difficult task. The process is complicated by the fact that human activity and natural processes change the landscape over time, altering the location and appearance of features in the landscape. Points that are relatively stable could be road intersections, airport runway intersections, bends in a river, prominent coastline features, and perhaps purposely located reflectors on the ground. If such points can be found, the first step is to locate the map coordinates of each point, and then develop transformation equations that convert the Ground Control Points from the image coordinate system to the map coordinate system.

4.1.3 Linear transformations

If the transformation involves only rescaling and rotation, the equation will be quite simple. If an image is being placed into a map projection, however, the process usually involves a change of scale and perhaps warping to fit the image into the map projection. The equations for basic changes of scale, translation (shift of coordinates) and rotation are straightforward:

Let [x, y] denote the old coordinates and [u, v] the new coordinates; then:

Translation: u = x + A; v = y + B
Scaling: u = Cx; v = Dy
Rotation: u = x cosθ + y sinθ; v = y cosθ − x sinθ

where A and B are the shifts in x and y, respectively; C and D are the scaling factors in x and y, respectively; and θ is the rotation angle.
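As an illustration only, these three operations can be sketched directly in code. The following Python fragment (assuming NumPy is available; the point coordinates and parameter values are hypothetical) applies a translation, a scaling and a rotation to a coordinate pair:

import numpy as np

def translate(x, y, A, B):
    # u = x + A; v = y + B
    return x + A, y + B

def scale(x, y, C, D):
    # u = C*x; v = D*y
    return C * x, D * y

def rotate(x, y, theta):
    # u = x*cos(theta) + y*sin(theta); v = y*cos(theta) - x*sin(theta)
    return (x * np.cos(theta) + y * np.sin(theta),
            y * np.cos(theta) - x * np.sin(theta))

# Shift a point by (10, 20), double its scale, then rotate it by 30 degrees.
x, y = 100.0, 50.0
x, y = translate(x, y, 10, 20)
x, y = scale(x, y, 2.0, 2.0)
x, y = rotate(x, y, np.radians(30))
print(x, y)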

These equations can be added together to produce a combined transformation known as a linear transformation. A linear transformation, also known as a first-order transformation, changes the orientation and scaling of the image but does not bend or warp it. A first-order equation contains only variables of the first degree (x, y as opposed to second-degree x², y²; Figure 29).

Figure 29. The shape of an image can be changed with a first-order (linear) equation. The examples show (a) the original image, (b) a scale change in x, (c) a scale change in y, (d) a skew, or translation that varies with y, (e) a skew that varies with x, and (f) a rotation.

4.1.4 Non-linear transformations

If the image needs to be warped, rather than just re-oriented, second-degree or higher equations need to be used. Non-linear equations, such as a second-order equation, form a curve. Figure 30 shows the difference between a simple two-dimensional linear equation and second- and third-order equations. The second-order equation contains at least one minimum or maximum; this example contains a maximum (the top of the curve). The third-order equation has both a minimum and a maximum.

Figure 30. The diagrams show the appearance of a line that results from a first-order equation (linear, left), a second-order equation (middle) and a third-order equation (right).

A non-linear transformation applied to an image can be used for warping (Figure 31). The higher the order of the transformation, the greater the undulations in the surface. One problem with high-order polynomials (the type of equation described here) is that they can become unpredictable at high orders. Generally a second- or third-degree polynomial is sufficient for image warping. Equations of higher order will often become unpredictable at the edges and may oscillate between high and low values if there are not enough control points (Figure 32).
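Fitting such a transformation to a set of GCP's can be sketched as a least-squares problem. The following Python fragment (assuming NumPy; the GCP coordinates are invented for illustration) fits a first- or second-order polynomial mapping from image (column, row) positions to map (Easting, Northing) positions:

import numpy as np

def poly_terms(col, row, order):
    # Polynomial terms of the image coordinates up to the given order (1 or 2).
    terms = [np.ones_like(col), col, row]
    if order == 2:
        terms += [col * row, col ** 2, row ** 2]
    return np.column_stack(terms)

def fit_transform(img_xy, map_xy, order=1):
    # Least-squares fit of image (col, row) -> map (Easting, Northing).
    A = poly_terms(img_xy[:, 0], img_xy[:, 1], order)
    coef_e, _, _, _ = np.linalg.lstsq(A, map_xy[:, 0], rcond=None)
    coef_n, _, _, _ = np.linalg.lstsq(A, map_xy[:, 1], rcond=None)
    return coef_e, coef_n

def apply_transform(img_xy, coef_e, coef_n, order=1):
    A = poly_terms(img_xy[:, 0], img_xy[:, 1], order)
    return np.column_stack([A @ coef_e, A @ coef_n])

# Hypothetical GCPs: image (col, row) and corresponding map (Easting, Northing).
img_gcp = np.array([[10, 12], [200, 15], [25, 180], [210, 190], [110, 100]], float)
map_gcp = np.array([[300100, 809900], [305800, 810100], [300500, 804900],
                    [306100, 805000], [303200, 807500]], float)
ce, cn = fit_transform(img_gcp, map_gcp, order=1)
print(apply_transform(img_gcp, ce, cn, order=1))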

Figure 31. The original image (a) can be warped using a non-linear polynomial equation (b).

Figure 32. A three-dimensional surface produced by a linear equation (a), a second-order equation (b), and a higher-order equation (c).

4.1.5 Residuals

If the orientation of GCP's on a map is slightly different from that on an image (due to real distortion, or measurement error), the new transformed image coordinates will not exactly coincide with the chosen GCP's on the map. This difference is known as a residual. If the residual for a GCP is much larger than those of the others, a measurement error is often the likely cause.

The concept of residuals can be shown easily using a line of best fit in a two-dimensional graph. Figure 33 shows a series of points with a first-, second- and third-order line of best fit. The fit, in this case, was done to minimise the distance from the line to each point. The difference between the predicted value (the line) and the actual point is known as the residual, or error. Note that in this case the third-order line of best fit has greater error than the second-order line.

Figure 33. Lines of best fit (regression on y).

Figure 34. The error is given as a radius from a known GCP. The root-mean-square (RMS) error is the square root of the mean of the squared errors.

In the image, this residual is the difference between the location of a known GCP and the value calculated from the transformation equation. In some cases the use of a very high order equation could fit the new image almost exactly to the coordinates, but the precision associated with this can be spurious. Not only is some measurement error expected in the GCP location coordinates, but high-order equations can also become unpredictable between control points, actually increasing the error away from the GCP's. Root Mean Square (RMS) error, defined from the distances between the map coordinates and the transformed coordinates of the GCP's, is a common means of reporting error in a transformed image (Figure 34).
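A minimal sketch of the residual and RMS calculation is given below (Python with NumPy; the predicted and surveyed GCP positions are hypothetical numbers, not taken from any real data set):

import numpy as np

def rms_error(predicted_xy, actual_xy):
    # Root-mean-square of the GCP residual distances.
    residuals = np.linalg.norm(predicted_xy - actual_xy, axis=1)
    return residuals, np.sqrt(np.mean(residuals ** 2))

# Hypothetical predicted and surveyed map positions of four GCPs (metres).
predicted = np.array([[300105.0, 809897.0], [305797.0, 810104.0],
                      [300508.0, 804893.0], [306094.0, 805006.0]])
actual = np.array([[300100.0, 809900.0], [305800.0, 810100.0],
                   [300500.0, 804900.0], [306100.0, 805000.0]])
res, rms = rms_error(predicted, actual)
print(res)   # per-GCP residual distance; one much larger value hints at a blunder
print(rms)   # overall RMS error of the transformation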

4.1.6 Selection of GCP’s

The choice and accuracy of ground control points determine how good the transformation may be. GCP's must be well distributed throughout the image in order to give the best fit over the entire image. In a region with little control (few GCP's), the transformation equation can extrapolate large amounts of distortion that may make the transformation result unrealistic (Figure 35). The effect is the same as in Figure 33, where the end-points of the line (beyond which there is no control) continue increasing or decreasing.

Figure 35. GCP's should be spread out as evenly as possible around the image - generally just beyond the area of interest. (a) Map showing the distribution of GCP's; (b) transformed image using the GCP's and a linear equation; (c) transformed image using the GCP's and a high-order equation. The GCP's fall into and around the box shown on images (b) and (c). Within the box, the image is well registered. In contrast, outside the box, the transformation extrapolated a large amount of distortion. Note also that the extrapolated distortion increases dramatically with a high-order transformation equation.

4.1.7 Resampling

Resampling is the process of determining the new DN (or brightness) value for each of the pixels in the transformed image. The transformation equation that was calculated to locate the GCP's on the new image is inverted so that the DN for each pixel in the new image can be determined. Even though the equation pinpoints where each new pixel is located on the old image, it is most likely that it will not coincide with the centre of a pixel (Figure 36). The process of determining what DN to use, or how to estimate the new DN, is known as resampling. When DN values are estimated from those of nearby pixels, the process is called interpolation.

Figure 36. The location of each pixel centre on the new image is located on the original image in order to determine the DN value of each pixel. The pixel centres seldom coincide.

4.1.7.1 Nearest Neighbour Resampling

In nearest neighbour resampling the DN value of the closest neighbour is chosen as the DN for the new image. In Figure 37 the actual location of a pixel is w; however, the closest pixel in the original image is d, so the DN value is taken from d. This technique often produces images that have a ragged appearance, since the original location of pixel values has been changed. However, since the original values have not been altered, this method is the best if further manipulation (e.g. classification) of the DN values is required. An example of an image resampled using nearest neighbour is shown in Figure 38.
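A minimal sketch of nearest neighbour resampling is shown below (Python with NumPy; the image and sampling positions are toy values). It assumes the inverse transformation has already produced, for each new pixel, a fractional (column, row) position in the original image:

import numpy as np

def nearest_neighbour(image, cols, rows):
    # Sample `image` at fractional (col, row) positions by taking the closest pixel.
    # Positions falling outside the image are returned as 0.
    c = np.rint(cols).astype(int)
    r = np.rint(rows).astype(int)
    out = np.zeros(c.shape, dtype=image.dtype)
    inside = (r >= 0) & (r < image.shape[0]) & (c >= 0) & (c < image.shape[1])
    out[inside] = image[r[inside], c[inside]]
    return out

# Toy example: a 4 x 4 image sampled at three fractional locations.
img = np.arange(16).reshape(4, 4)
print(nearest_neighbour(img, np.array([0.2, 1.6, 3.4]), np.array([0.4, 2.1, 0.9])))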

Figure 37. Nearest neighbour resampling locates the closest pixel value in the original image and uses that in the new image.

Figure 38. The original image (a) is transformed into (b). There is a scale change and rotation associated with the transformation. The overlaid images are shown in (c). The resulting resampled image is shown in (d).

4.1.7.2 Bilinear Interpolation

Bilinear interpolation treats the four neighbouring values as corners of a plane or facet. The DN of the transformed pixel is determined at the location where the centre of the pixel intersects the facet. The new DN is essentially a weighted average of the neighbouring points. Figure 39 shows how a DN for point w is determined, first by estimating u and v, where u is the weighted average of a and b, and v is the weighted average of c and d. The distances between all points are known. The final value for w is calculated from the weighted average of u and v.

Figure 39. The point w is determined using weighted averages.
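The two-step weighted average described above can be written directly as a short function. The following sketch (plain Python; the DN values and fractional offsets are illustrative) interpolates the value at w from the four surrounding DNs a, b, c and d:

def bilinear(a, b, c, d, fx, fy):
    # a, b are the upper-left / upper-right DNs, c, d the lower-left / lower-right DNs;
    # fx, fy are the fractional offsets (0..1) of the new pixel centre within the cell.
    u = a + (b - a) * fx          # weighted average along the top edge
    v = c + (d - c) * fx          # weighted average along the bottom edge
    return u + (v - u) * fy       # weighted average between the two edges

# Example: a point 30% of the way across and 60% of the way down the cell.
print(bilinear(10, 20, 30, 40, 0.3, 0.6))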

4.1.7.3 Cubic Convolution

Cubic convolution, like bilinear interpolation, uses approximating curves to define a surface or facet from which the value of the transformed pixel is determined. However, the cubic convolution method uses a cubic function rather than a linear function to approximate the surface. A cubic function has third-degree terms, which allow a fair degree of flexibility along an interpolating curve. Figure 40 shows the two-step operation of cubic convolution. At Step A, a curve is fitted along each of the four rows, which is subsequently used to interpolate the DN value at locations a, b, c and d. A curve can then be fitted through the interpolated values to estimate the value at w (Step B).

Figure 40. A curve is fitted through the points on each of the four lines (think of the pixel DN as a z value on a 3-dimensional graph). The values of a, b, c and d are calculated using the known distance along each line. Curve B is then fitted through those points and used to estimate the value at w.

4.2 Image Enhancement

Image interpretation is often facilitated when the radiometric nature of the image is enhanced to improve its visual impact. Specific differences in vegetation and soil types, for example, may be better visualised by increasing the contrast of an image. In a similar manner, differences in brightness value can be highlighted either by contrast modification or by assigning different colours to those levels.

Moreover, the geometric detail in an image may be modified and enhanced. In contrast to the pixel-by-pixel operation used for radiometric enhancement, techniques for geometric enhancement are characterised by operations over neighbourhoods. Although the procedure still aims to modify the DN of an image pixel, the new value is derived from the DNs of its surrounding pixels. It is this spatial interdependence of the pixel values that leads to variations in the perceived geometric detail of the image. The geometric enhancements of most interest in remote sensing generally relate to smoothing, edge detection and sharpening.

4.2.1 Contrast stretch

Given a digital image with poor contrast, such as that in Figure 42a, it is desirable to improve its contrast to obtain an image with a good spread of pixels over the available brightness range. In other words, a so-called contrast stretch of the image data is required.

Consider a transfer function that maps the intensities in the original image into intensities in a transformed image with improved contrast. The mapping of brightness values associated with contrast modification can be described as

y = f(x) (5)

where x is the original DN and y is the corresponding new brightness value after the contrast stretch.

One of the simplest contrast modifications is the linear contrast enhancement, which can simply be described as

y = f(x) = ax + b (6)

Relative to the original image, the modified version is shifted owing to the effect of b, is spread or compressed depending on whether a is greater or less than 1, and is modified in amplitude. In practice, the linear stretch is applied to each pixel in the image using the algorithm

y = (x − Bmin)(Lmax − Lmin) / (Bmax − Bmin) (7)

where Bmin, Bmax, Lmin and Lmax denote the minimum and maximum of the old DNs, and the minimum and maximum of the corresponding new brightness values, respectively. For most of today's image visualisation systems, Lmin = 0 and Lmax = 255.

Frequently a better image product is obtained when linear contrast enhancement is used with some degree of saturation (Figure 42b). In this way the variation in the very dark or very bright areas of the image (which are of no interest) is sacrificed in order to expand the range of interest over the maximum possible dynamic range of the display device (Figure 41a).

Figure 41. Contrast stretch functions (output DN plotted against input DN, 0-255): (a) saturating linear stretch, (b) sinusoidal stretch, (c) non-linear stretch.

The algorithm for the saturating linear contrast enhancement is the same as Equation 7 except that Bmin and Bmax are user-defined. Typically, an image processing system employs the saturating linear stretch for automatic contrast enhancement by determining the cut-off and saturation limits (i.e. Bmin and Bmax) from the mean brightness and its standard deviation.
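A minimal sketch of the linear stretch of Equation 7, with optional user-defined cut-off and saturation limits, is given below (Python with NumPy; the small test image is hypothetical):

import numpy as np

def linear_stretch(dn, b_min=None, b_max=None, l_min=0, l_max=255):
    # Linear contrast stretch following Equation 7. If b_min/b_max are supplied
    # (saturating stretch), values outside that range are clipped to the output limits.
    dn = dn.astype(float)
    if b_min is None:
        b_min = dn.min()
    if b_max is None:
        b_max = dn.max()
    y = (dn - b_min) * (l_max - l_min) / (b_max - b_min) + l_min
    return np.clip(y, l_min, l_max).astype(np.uint8)

img = np.array([[30, 40, 45], [50, 60, 90]], dtype=np.uint8)
print(linear_stretch(img))                       # min-max stretch to 0..255
print(linear_stretch(img, b_min=40, b_max=70))   # saturating stretch; 30 and 90 saturate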

A sinusoidal stretch is designed to enhance variance within 'homogeneous' areas in the image, such as an urban area or a water body (Figure 42c). The stretch parameters are usually determined by interpreting the histogram of the image. The distribution is divided into several intervals or ranges and each of these is expanded over the output range (Figure 41b). The reason this stretch is called sinusoidal is that when input and output DNs are plotted against each other, a sinusoidal curve is formed. Because several different old DNs can be mapped to one output value, sinusoidal stretches are usually applied to three multispectral bands to form a colour composite, to reduce the possibility of mapping different features with an identical colour.

Non-linear stretches have flexible parameters that are controlled by DN frequencies and the shape of the original distribution (Figure 41c). One frequently used non-linear stretch is the uniform distribution stretch (or histogram equalisation), in which the original DNs are redistributed on the basis of their frequency of occurrence. The greatest contrast enhancement occurs within the range containing the most original DNs (Figure 42d).
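A simple form of histogram equalisation can be sketched as follows (Python with NumPy; the test image is randomly generated for illustration). Each DN is mapped to a new brightness in proportion to its cumulative frequency of occurrence:

import numpy as np

def histogram_equalise(dn, levels=256):
    # Uniform-distribution (histogram equalisation) stretch: map each DN to a new
    # brightness proportional to its cumulative frequency of occurrence.
    hist, _ = np.histogram(dn.ravel(), bins=levels, range=(0, levels))
    cdf = hist.cumsum() / dn.size                 # cumulative proportion of pixels
    lookup = np.round(cdf * (levels - 1)).astype(np.uint8)
    return lookup[dn]

img = np.random.randint(80, 120, size=(100, 100), dtype=np.uint8)  # poor-contrast image
eq = histogram_equalise(img)
print(eq.min(), eq.max())   # the narrow input range is spread towards 0..255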

Figure 42. Effect of contrast stretch on a digital image: (a) original image, (b) saturating linear contrast enhancement, (c) sinusoidal stretch and (d) non-linear stretch by histogram equalisation.

Other frequently used non-linear functions are the logarithmic and exponential contrast enhancements, which map DNs between the original and modified images to enhance dark and light features, respectively. The logarithmic and exponential functions can be expressed as

y = b log(ax) + c and y = b e^(ax) + c (8)

where the parameters a, b and c are usually included to adjust (or normalise) the overall brightness and contrast of the output values.

4.2.2 Density Slice

Density slicing is an enhancement technique whereby the DNs distributed along the x-axis of an image histogram are divided into a series of pre-determined intervals or 'slices'. All DNs falling within a given interval in the input image are then displayed at a single brightness value. In this way the overall number of discrete brightness values used in the image is reduced and some detail is lost. However, the effect of noise can also be reduced and the image becomes segmented, or looks like a contour map, except that the areas between boundaries are occupied by pixels displayed at the same DN (Figure 43).
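Density slicing amounts to a simple look-up operation, as the following sketch shows (Python with NumPy; the slice boundaries and output brightness values are arbitrary examples):

import numpy as np

def density_slice(dn, boundaries, output_levels):
    # Assign one output brightness (or colour index) to each DN interval.
    # `boundaries` are the interval edges; `output_levels` has one value per interval.
    idx = np.digitize(dn, boundaries)             # which slice each DN falls in
    return np.asarray(output_levels)[idx]

img = np.array([[12, 60, 130], [200, 90, 250]])
# Four slices (0-49, 50-99, 100-199, 200-255) displayed at four brightness values.
print(density_slice(img, boundaries=[50, 100, 200], output_levels=[0, 85, 170, 255]))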

Figure 43. Density slicing. Top: the brightness value mapping function corresponding to black-and-white density slicing (new pixel brightness from Lmin to Lmax plotted against old pixel DN from Bmin to Bmax). Right: the resulting density-sliced image becomes segmented, with reduced noise and detail.

Different colours can also be used in density slicing instead of grey levels. This is known as colour density slicing. Provided the colours are chosen suitably, it can allow fine details to be clearly visualised.

4.2.3 Spatial Filtering

The most basic algorithms used to enhance patterns in imagery are based on comparisons of pixels within small regions of an image. The algorithm is usually designed to identify large differences in brightness value, or more subtle differences that occur in a specific orientation. The simplest of these is known as a neighbourhood function. The neighbourhood of any given pixel comprises a series of adjacent or nearby pixels. A simple 3 by 3 pixel neighbourhood is shown in Figure 44.

Figure 44. The immediate neighbours of a pixel form a "3 × 3 neighbourhood".

Figure 45. The pixels within the 3 by 3 neighbourhood (at left) can be used to calculate a range, which can be used to create a new grid (at right) in which high values indicate an edge.

The data values of the pixels that form the neighbourhood can be used to describe brightness differences within the neighbourhood. For example, the difference between the maximum and minimum values (the range) in the neighbourhood at the left of Figure 45 could be used to quantify the smoothness of the image. If the range value is placed in a new grid (at the right of Figure 45), the larger the number, the greater the variation in the area local to that pixel. A connected region with high range values may represent an edge, while low values represent unchanging (smooth) areas.

In the image in Figure 46, area A has a dark region on the left and a lighter region on the right. Neighbourhoods that straddle the boundary have a larger range of data values than pixels within area B.

In an image, the difference between the brightness of neighbouring pixels can be large where there is an abrupt edge (on a ridge, or along a road), or small where a gradual contrast change is occurring (such as on a smooth hill). If a profile across an image is drawn such that very bright values are high and dark values are low, steep slopes in the profile represent sharp light/dark boundaries (Figure 47).

Figure 46. Areas defined show a region within edges (A, C) and without (B, D).

Figure 47. A cross-section of an image along a row, showing brightness as height. Abrupt changes in contrast can be seen as steep slopes.

The cross-section in Figure 48 shows the profile starting at the left side of the image, through a relatively dark shadowed area, over lighter face tones, and back to the dark area of hair. In a real-world satellite image (Figure 49) the profile is not as smooth as in Figure 48, but the same characteristics apply. Bright pixels have a high value, darker pixels have low values, and the changes between bright and dark areas show up in the profile.

Figure 48. A photo of a human face shows abrupt boundaries between generally smooth regions. The profile line is shown in white on the photo.

Figure 49. Real-world images, such as this one of Hong Kong (Landsat TM band 4), are not as smooth. The profile at upper right follows the horizontal line across the image.

The profile across an image is a mixture of rapidly changing brightness values and slowly changing shades. The abrupt edges may be due to a boundary such as a change from grass to a concrete surface, while the more gradual changes may be due to topographic shading of a hill. Slow, gradual changes in brightness over an image are referred to as low-frequency changes, while abrupt changes are high-frequency changes.

The image at the left in Figure 50 is an example of the greatest possible difference between two pixels (black to white - the "checkerboard" image). The range calculated for any 3 by 3 neighbourhood would be large and the profile would be very rugged when examined closely. In the gradually changing middle image the range calculated for neighbourhoods would be much lower, and the profile would be a smooth, gently rising slope. The image at right would have a number of very high and low values (and a variable profile).

Figure 50. The image on the left has the greatest difference possible between two pixels: minimum to maximum brightness next to each other. The difference between pixels in the middle image is much less, while the right image has a mixture of large and small changes between pixels.

The neighbourhood can also be used to detect, remove or enhance the variability using a numerical comparison of the values within the neighbourhood. The simplest function is the calculation of an average for the neighbourhood, which then becomes the value of the central pixel in the output image.

For a neighbourhood X (see right), the average value is the sum of all neighbourhood values divided by the number of neighbourhood members (9 for a 3 × 3 neighbourhood).

For this simple algorithm (known as an average filter) it is rather obvious what is happening. However, if the algorithm is more complicated it becomes useful to describe the calculation applied to each element in the neighbourhood more completely. A generic equation for this would look like:

x(1,1) x(1,2) x(1,3)

x(2,1) x(2,2) x(2,3)

x(3,1) x(3,2) x(3,3)

y(i, j) = (1 / MN) Σ(m=1..M) Σ(n=1..N) ω(m, n) · x(m, n) (9)

where y(i, j) denotes the output filtered value for pixel x(i, j), ω is the weighting value (for the average, ω = 1), and M and N are the numbers of columns and rows of the neighbourhood (in this case M = N = 3), respectively.
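Equation 9 can be implemented directly as a (slow but transparent) loop over the image, as in the sketch below (Python with NumPy; the weights, the test image and the decision to leave edge pixels unchanged are illustrative choices):

import numpy as np

def neighbourhood_filter(image, weights):
    # Apply Equation 9: each output pixel is the weighted sum of its M x N
    # neighbourhood, divided by M*N. Edge pixels are left unchanged for simplicity.
    img = image.astype(float)
    M, N = weights.shape
    out = img.copy()
    rm, rn = M // 2, N // 2
    for i in range(rm, img.shape[0] - rm):
        for j in range(rn, img.shape[1] - rn):
            window = img[i - rm:i + rm + 1, j - rn:j + rn + 1]
            out[i, j] = np.sum(weights * window) / (M * N)
    return out

average_kernel = np.ones((3, 3))          # the average (low-pass) filter, with w = 1
img = np.array([[10, 10, 10, 10],
                [10, 90, 10, 10],
                [10, 10, 10, 10],
                [10, 10, 10, 10]], float)
print(neighbourhood_filter(img, average_kernel))   # the bright spike is smoothed out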

Following this example, if every element is multiplied by its weight and then all of the results are summed, we only need to represent the weights in a grid representing the neighbourhood. In the sense of image processing, the normalisation of the output value (y) is largely unnecessary since the image would most likely be contrast stretched (see 4.2.1) after filtering. Note that the original image (with no changes) can also be represented using this notation (see right).

Average:
1 1 1
1 1 1
1 1 1

Original (no change):
0 0 0
0 1 0
0 0 0

The example in Figure 51 shows that a flatter, smoother profile results after an average filter is applied. Figure 52 shows the appearance of images after average filters of different neighbourhood sizes are applied. The larger the neighbourhood, the greater the degree of smoothing.

Figure 51. The profiles show how a profile changes due to an averaging filter. The bottom profile is a 3-pixel-wide average of the top profile.

Figure 52. The original image (left), a 3 × 3 average image (middle), and a 7 × 7 average image (right).

The output from the average filter appears blurred because high-frequency information is removed. For this reason, the averaging filter is also known as a low-pass filter, because low-frequency information "passes through" the filter and is retained.

Edge detection filters enhance abrupt changes in the image and filter out the background. One of the simplest ways to detect edges is to subtract the smoothed (average) image from the original, leaving only the high and low points in the data that were removed from the original. The result of this is shown in profile in Figure 53 (the residual information). This process is called edge detection because only the edges are left in the resulting output image (Figure 54). Edge detection filters are also called high-pass filters, since only the high-frequency information "passes through".

Figure 53. The smooth profile (top black line) is subtracted from the original profile (top grey profile) to produce the bottom residuals.

Figure 54. The original image (left) minus the smoothed image (middle) leaves the residual edges (right).

0 0 0     1 1 1     -1 -1 -1
0 9 0  −  1 1 1  =  -1  8 -1
0 0 0     1 1 1     -1 -1 -1

Figure 55. The edge detection kernel (right) as calculated from the original and smoothed kernels (left and middle).

The process of subtracting the smoothed image from the original can be done using the kernels. Arithmetically it is the same as going through the filtering process first and then subtracting the images.

The edges that have been isolated using the edge detection technique can be added back on top of the original to enhance the edges (Figure 56 and Figure 57). This process is called unsharp masking and produces an edge-enhanced image (Figure 58).

0 0 0     -1 -1 -1     -1 -1 -1
0 1 0  +  -1  8 -1  =  -1  9 -1
0 0 0     -1 -1 -1     -1 -1 -1

Figure 56. The unsharp masking procedure using the kernel values.

Figure 57. The profile showing the residuals added on top of the original profile.

Figure 58. The original image (left) plus the residual image (middle) gives the edge-enhanced image (right).
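The smoothing, edge detection and unsharp masking kernels shown in Figures 55 and 56 can be applied with any convolution routine. The sketch below (Python, assuming NumPy and SciPy are available; the test image is a toy two-region scene) leaves the kernels unnormalised, as discussed above:

import numpy as np
from scipy import ndimage   # assuming SciPy is available for the convolution

# Kernels from Figures 55 and 56 (unnormalised, as discussed in the text).
smooth  = np.ones((3, 3))                                       # 3 x 3 average (low-pass)
edges   = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]])   # edge detection (high-pass)
unsharp = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])   # original + edges

img = np.array([[10, 10, 10, 50, 50],
                [10, 10, 10, 50, 50],
                [10, 10, 10, 50, 50],
                [10, 10, 10, 50, 50]], float)

low_pass  = ndimage.convolve(img, smooth / 9.0, mode='nearest')   # smoothed image
high_pass = ndimage.convolve(img, edges, mode='nearest')          # edges only
sharpened = ndimage.convolve(img, unsharp, mode='nearest')        # edge-enhanced image
print(high_pass)    # non-zero only along the boundary between the two regions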

4.3 Image Arithmetic

One of the most important properties of multispectral images is their ability to capture the differences between spectral bands, which can then be compared qualitatively or quantitatively with the spectral signatures of different earth materials (refer to Figure 20). Often image processing functions using multispectral bands can be expressed in a similar way to the variables in a mathematical equation; for example, one image band can be added to, or subtracted from, another band on a pixel-by-pixel basis. This approach is therefore commonly called image arithmetic.

4.3.1 Band Ratio and the Topographic Effect

Among the large variety of image arithmetic functions, the band ratio is arguably the most commonly used. Typical applications include the removal of topographic effects on images and the discrimination of different cover types, including the derivation of vegetation indices.

The amount of illumination that a point on a surface receives is a function of the angle at which the light hits the slope. A typical area on the surface of the earth receives the most light when the sun is directly overhead. The amount of light that is reflected back to a sensor is thus a function not only of the properties of the surface material, but also of the angle of illumination. This property is known as the topographic effect, and it is shown in the simulated image of terrain in Figure 59.

One cover type will usually reflect a different amount of EMR in each band due to the particular absorption characteristics associated with it.

Figure 59. A simulated image of the topographic effect with a constant cover type.

Two stands of identical material will appear different if they receive differing amounts of illumination. The ratio between matching pixels in each band will, however, remain the same even under different light conditions, if the only variable affecting the response is illumination due to slope angle. This is because the percentage of the total light hitting the surface remains the same for each band, even if the absolute amount differs (Figure 60). In reality, atmospheric effects, variation in the cover type response with the angle of illumination, and sensor calibration make quantitative analysis using band ratios difficult. The identification of spectral features using ratios is more promising.

DN on slope facing the sun:
Unit   Red   NIR   Red/NIR
A      60    80    0.75
B      30    60    0.50

DN on slope facing away from the sun:
Unit   Red   NIR   Red/NIR
A      45    60    0.75
B      20    40    0.50

Figure 60. The ratio between areas of the same cover type that are illuminated differently should be constant (sun illumination from the right).

Although band ratios do not entirely remove the effects of illumination, they do reduce them to the point that spectral differences between cover types can be more easily identified. In many cases two surfaces look similar in most bandwidths. A particular characteristic, such as selective absorption due to a mineral component, may impart a subtle difference between two regions in one band. These subtle differences can be enhanced using a ratio.

Band ratios can be used in any instance where the absorption in one band is different from that in another band. Stressed vegetation can be separated from healthy vegetation in many cases; algae can be found in turbid water; and the parent materials of soils can be traced. One typical example is based on the sharp difference in spectral signatures between bare soil and green vegetation. The former presents a near-linear spectral curve in the visible and near-infrared (NIR) region, while the latter typically has a low reflectance in red (R) and very high reflectance in the NIR region. Therefore the R/NIR ratio should largely distinguish bare soil (with a high value) from green vegetation (with a low value). A summary of band ratios that are useful for discriminating various surface materials is as follows:

Soil, vegetation:   R / NIR
Clays:              TM5 / TM7
FeO, hydroxides:    R/B or G/B, or R/G
Plant stress:       TM5/TM7, TM3/TM5, or MSS3/MSS4

4.3.2 Vegetation Indices and their interpretation

A very popular application of the band ratio functions is the vegetation index. Vegetation can be separated from other cover types because it has characteristic absorption features in visible wavelengths and high reflectance in near-infrared wavelengths. The spectral curve of vegetation presented in Figure 20 is clearly different in shape from that of soil. In the red TM image presented below (Figure 61), the fields with green crops appear only slightly darker and are not easily distinguishable from those without. By contrast, the high reflectance of green vegetation is detected by the NIR sensor, so the cropped fields are clearly distinguishable (Figure 62).

Figure 61. Red TM channel (band 3). Figure 62. NIR TM channel (band 4).

When vegetation becomes stressed, absorption decreases in visible wavelengths and increases in the near-infrared (NIR) wavelengths. Additionally, the higher the density of broad-leaf plants, the more distinct the difference between visible and NIR will be.

To date, tens of different vegetation indices have been developed and reported in the literature. Most of them are fundamentally based on a comparison between the red band and the NIR band of remotely sensed images. The simplest index using these features is the difference between the NIR and red bands (DVI):

DVI = NIR − R (10)

If there is a substantial topographic effect due to rugged terrain, the variation in illumination (due to slope orientation) will cause the resulting difference to vary substantially throughout the image.

The topographic effect can be minimised with a ratio, as discussed in 4.3.1. Currently the most widely used vegetation index is the Normalised Difference Vegetation Index (NDVI), which can be expressed as:

NDVI = (NIR − R) / (NIR + R) (11)

The NDVI is primarily based on the NIR/red ratio, but normalises the output values to the range [-1, 1]. This not only reduces the problems related to illumination differences due to topography, but also makes the result easier to interpret. In practice, when NDVI ≤ 0 one can quite comfortably assume that the pixel is not vegetation. The more active (or 'greener') the plant is, the higher the NDVI value returned on the image. As NDVI approaches 1, the pixel is most likely covered by active (green) vegetation. A comparison between a simple NIR/red ratio and NDVI is shown in Figure 63 and Figure 64.
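A minimal NDVI computation following Equation 11 is sketched below (Python with NumPy; the two small band arrays are invented, and the guard against division by zero is an implementation choice, not part of the original definition):

import numpy as np

def ndvi(nir, red):
    # Normalised Difference Vegetation Index (Equation 11), with a guard
    # against division by zero where NIR + R = 0.
    nir = nir.astype(float)
    red = red.astype(float)
    total = nir + red
    return np.where(total == 0, 0.0, (nir - red) / np.where(total == 0, 1, total))

red_band = np.array([[50, 30], [60, 10]], dtype=np.uint8)   # e.g. TM band 3
nir_band = np.array([[55, 90], [62, 80]], dtype=np.uint8)   # e.g. TM band 4
print(ndvi(nir_band, red_band))   # high values suggest green vegetation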

Figure 63. Simple ratio of NIR/red (TM4/TM3). Figure 64. NDVI.

Quite often in applications, vegetation indices are derived to separate vegetation cover from bare soils, rocks, urban areas, etc. A common technique is to compute the vegetation index and then density slice the resulting image so that areas with different levels of vegetation coverage may be distinguished. Attempts have also been made, with variable degrees of success, to quantify the vegetation cover by relating ground measurements to the vegetation indices.

4.4 Image Classification

Image classification is the process of creating a meaningful digital thematic map from an image data set. The classes in the map are derived either from known cover types (wheat, soil) or by algorithms that search the data for similar pixels. Once data values are known for the distinct cover types in the image, a computer algorithm can be used to divide the image into regions that correspond to each cover type or class. The classified image can be converted to a land use map if the use of each area of land is known. The term land use refers to the purpose for which people use the land (e.g. city, national parks or roads), whereas cover type refers to the material that an area is made of (e.g. concrete, soil or vegetation).

• What cover type is this?
• It has a spectral signature like wheat, so the cover type is likely to be wheat.
• In this area, wheat is likely to be a farming land use.
• The thematic class is therefore "farmland".

Figure 65. The process of making a classed thematic map from a digital data set (remotely sensed imagery, computer decisions, resulting class image).

Image classification can be done using a single image data set, multiple images acquired at different times, or even image data with additional information such as elevation measurements or expert knowledge about the area. Pattern matching can also be used to help improve the classification. The discussion here concentrates on the use of a single image data set to create a classified thematic map where each pixel is classified based on its spectral characteristics. The process that would be used for multiple images is essentially the same, with perhaps some extra effort needed to match the images together. If soil type or elevation is used, the algorithm would need to take into account the fact that thematic soil classes need to be treated differently from measured radiance data.

4.4.1 Turning Pixel Values into a Thematic Map

Classification algorithms are grouped into two types: supervised and unsupervised classification. With supervised classification the analyst identifies pixels of known cover types, and a computer algorithm is then used to group all the other pixels into one of those groups. With unsupervised classification a computer algorithm is used to identify unique clusters of points in data space, which are then interpreted by the analyst as different cover types. The resulting image shows the area covered by each group or class of pixels and is usually called a thematic image, or classified image.

Figure 66. Diagram showing the steps to produce a thematic or classified image. Unsupervised classification: the image data are clustered, and the clusters are used to define signatures or used directly as classes. Supervised classification: seed areas (example pixels) are used to derive signature information. In either case a decision rule is then used to class each pixel, producing the thematic image.

Page 42: Digital Image Processing and Interpretation

ZHOU, Q., 1999: DIGITAL IMAGE PROCESSING AND INTERPRETATION

Page 42 of 50

Figure 66 shows the processes used to create the classified image. Unique cover types are identified either by the computer (clustering) or by the analyst (from the image). If clustering is used, the pixels in an image are sorted into groups that are spectrally unique. These can either be used directly as a classified image, or they can be used to define a set of spectrally unique signatures (the statistical description of each class). If the user has chosen example pixels, the pixel samples are used to calculate the signatures of each cover type class (vegetation, sand, etc.; signatures are discussed further below). Once signatures have been defined, an algorithm called a decision rule is used to place each pixel from the image data set into one of the classes. The process is often repeated a number of times, adjusting the signatures and decision rule before each run, each time checking the results against areas in the image in which the cover types are known.

4.4.2 Supervised Classification

Supervised classification relies on the analyst, who provides the 'training' for the computer to recognise different cover types. There are usually three basic steps involved in a typical supervised classification procedure, namely training, classification and output.

The purpose of the training stage is to derive spectral signatures for the cover types of interest, to create 'seeds' for classification in the later stage. The analyst identifies representative training areas and develops a numerical description of the spectral attributes of each land cover type of interest. This training can be carried out interactively on the image processing system by selecting 'training areas' in which the pixel DNs of the multispectral bands can be statistically analysed to derive the spectral signature of the class (Figure 67). Alternatively, one can 'train' the computer by selecting a certain DN range in a multi-dimensional spectral space (e.g. on a scattergram) and then examining the corresponding selected areas on the image (Figure 68).

Cover Type       Colour    No. Points
Water            Cyan      3793
Concrete         Purple    975
High buildings   Thistle   1866
Bare soils       Coral     784
Grass slope      Yellow    924
Forest           Green     3122

Figure 67. Training areas are interactively selected on the image for different cover types to derive their spectral signatures for classification.

Figure 68. Spectral ranges are selected on the scattergram of TM band 3 against TM band 4 (right), and the pixels with the selected spectral characteristics (e.g. concrete, water) are interactively marked as the training areas (left).

In the classification stage, each pixel in the image is categorised into the cover class it most closely resembles. If the pixel is not spectrally similar enough to any seed created by the training process, it is labelled "unknown". The class label (or theme) assigned to each pixel is then recorded in the corresponding cell of an interpreted data set, or classified image.

Today the analyst has a variety of choices in defining how 'close' a pixel is to its nearest seed of the pre-defined classes. This choice often refers to the selection of classifiers, which are based on spectral pattern recognition. Numerous mathematical approaches to spectral pattern recognition have been developed, and extensive discussion of this subject can be found in the literature. For the purpose of this note, our discussion only touches the surface of this vast knowledge base by demonstrating a limited number of examples, namely the parallelepiped, minimum distance and maximum likelihood classifiers. For ease of presentation, the various approaches to classification are illustrated with a two-band multispectral image. In reality, rarely are just two bands employed in an analysis.

Figure 69. The scattergram (TM band 3 against TM band 4) illustrating the pixel observations of six cover types: water, concrete, high buildings, bare soils, grass slope and forest.

Assume that we sample an image with pixel observations from areas of known cover type (i.e. from the training areas). Each pixel value is plotted on the scattergram, which shows the distributions of the spectral response patterns of each cover type to be interpreted in the two-dimensional spectral space (Figure 69). Our concern is thus the strategies for using these 'training' spectral response patterns as interpretation keys by which other pixels are categorised into their appropriate classes.

The parallelepiped classifier considers the range of values in each class's training set. This range may be defined by the highest and lowest digital number values in each band, and appears as a rectangular box in our two-dimensional scattergram (Figure 70). When a pixel lies inside one of the boxes, it is classified into the corresponding class (e.g. point 2 in Figure 70). If a pixel lies outside all regions, it is classified as "unknown".

Difficulties are encountered when class ranges overlap, in which case a pixel has to be classified as "not sure" or be arbitrarily placed in one of the two overlapping classes.

Figure 70. Parallelepiped classification strategy (TM band 3 against TM band 4 scattergram, with rectangular decision regions for the six cover types and two example pixels, 1 and 2).

Because spectral response patterns often exhibit correlation, or high covariance, the rectangular decision regions fit the class training data very poorly, resulting in confusion for a parallelepiped classifier. For example, point 1 shown in Figure 70 would probably be better classified into the class "grass slope" rather than "bare soils" as shown. This problem can be somewhat amended by modifying the single rectangles into a series of rectangles with stepped borders.
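A minimal sketch of the parallelepiped decision rule is given below (Python with NumPy; the class boxes and pixel values are hypothetical). Pixels falling in overlapping boxes are simply given to the first matching class, reflecting the arbitrary assignment mentioned above:

import numpy as np

def parallelepiped(pixels, class_ranges, unknown=-1):
    # Assign each pixel (one row per pixel, one column per band) to the first class
    # whose per-band min/max box contains it; otherwise label it `unknown`.
    labels = np.full(len(pixels), unknown)
    for class_id, (lo, hi) in enumerate(class_ranges):
        inside = np.all((pixels >= lo) & (pixels <= hi), axis=1)
        labels[(labels == unknown) & inside] = class_id
    return labels

# Hypothetical training ranges (band 3 min/max, band 4 min/max) for two classes.
ranges = [(np.array([5, 2]), np.array([20, 15])),      # class 0: water
          (np.array([15, 60]), np.array([40, 120]))]   # class 1: forest
pixels = np.array([[10, 8], [25, 90], [200, 180]])
print(parallelepiped(pixels, ranges))   # -> [0, 1, -1]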

The minimum distance classifier comprises three steps. First the mean of the spectral values in each band for each class is computed (represented in Figure 71 by the symbol "+"). Then the distance between the spectral value of an unknown pixel and each of the category means is computed. The pixel is then assigned to the "closest" class.

The minimum distance classifier is mathematically simple and it overcomes the poor representation problem of the rectangular decision regions used by the parallelepiped classifier. For example, point 1 shown in Figure 71 would be correctly classified as "grass slope". This strategy, however, has its limitations. It is insensitive to different degrees of variance in the spectral response data. In Figure 71, point 2 would be classified as "concrete" in spite of the fact that the pixel would probably be more appropriately labelled "bare soils" because of that class's greater variability.
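The minimum distance rule is equally simple to sketch (Python with NumPy; the class means, pixel values and the optional distance threshold are illustrative):

import numpy as np

def minimum_distance(pixels, class_means, max_distance=None, unknown=-1):
    # Assign each pixel to the class with the nearest mean in spectral space.
    # If `max_distance` is given, pixels further than that from every mean are `unknown`.
    dists = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    if max_distance is not None:
        labels[dists.min(axis=1) > max_distance] = unknown
    return labels

means = np.array([[12.0, 8.0],     # class 0: water (band 3, band 4 means)
                  [28.0, 90.0]])   # class 1: forest
pixels = np.array([[10.0, 9.0], [30.0, 85.0], [120.0, 40.0]])
print(minimum_distance(pixels, means))                     # -> [0, 1, 1]
print(minimum_distance(pixels, means, max_distance=30.0))  # -> [0, 1, -1]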

Figure 71. Minimum distance classification strategy (class means marked "+" on the TM band 3 against TM band 4 scattergram, with example pixels 1 and 2).

The maximum likelihood classifier quantitatively evaluates both the variance and the covariance of the category spectral response patterns when classifying an unknown pixel. To do this, an assumption is made that the pixel spectral cluster forms a normal distribution, which is considered reasonable for common spectral response distributions. Under this assumption, we may compute the statistical probability of a given pixel value being a member of a particular cover class by applying a probability density function for each class derived from its training data.

Using the probability density functions, the classifier calculates the probability of the pixel value occurring in the distribution of class "concrete", then the likelihood of its occurring in class "high buildings", and so on. After evaluating the probability for each class, the pixel is assigned to the most likely class (the one presenting the highest probability value), or labelled "unknown" if the probability values are all below a given threshold.

Figure 72 shows the probability values plotted on our two-dimensional scattergram, where the contour lines are associated with the probability of a pixel value being a member of one of the classes. Essentially, the maximum likelihood classifier delineates ellipsoidal equal-probability contours, whose shape shows the sensitivity of the classifier to both variance and covariance. For example, pixels 1 and 2 would be appropriately assigned to the classes "grass slope" and "bare soils", respectively.
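Under the normal-distribution assumption, the maximum likelihood rule reduces to comparing Gaussian (log) densities built from each class's training mean and covariance. The sketch below (Python with NumPy; equal prior probabilities are assumed and the class statistics are invented) illustrates the idea:

import numpy as np

def gaussian_log_likelihood(pixels, mean, cov):
    # Log of the multivariate normal density for each pixel (constant term dropped).
    diff = pixels - mean
    inv = np.linalg.inv(cov)
    maha = np.einsum('ij,jk,ik->i', diff, inv, diff)   # squared Mahalanobis distance
    return -0.5 * (np.log(np.linalg.det(cov)) + maha)

def maximum_likelihood(pixels, class_stats, threshold=None, unknown=-1):
    # Assign each pixel to the class with the highest Gaussian likelihood,
    # or to `unknown` if every log-likelihood falls below `threshold`.
    scores = np.column_stack([gaussian_log_likelihood(pixels, m, c)
                              for m, c in class_stats])
    labels = np.argmax(scores, axis=1)
    if threshold is not None:
        labels[scores.max(axis=1) < threshold] = unknown
    return labels

# Hypothetical class statistics (mean vector and covariance matrix per class)
# as would be derived from the training areas.
stats = [(np.array([12.0, 8.0]),  np.array([[4.0, 1.0], [1.0, 3.0]])),     # water
         (np.array([28.0, 90.0]), np.array([[60.0, 10.0], [10.0, 80.0]]))] # forest
pixels = np.array([[11.0, 9.0], [35.0, 80.0]])
print(maximum_likelihood(pixels, stats))   # -> [0, 1]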

Figure 72. Equal-probability contours defined by a maximum likelihood classifier (TM band 3 against TM band 4 scattergram, with example pixels 1 and 2).

The principal drawback of maximum likelihood classification is the extensive computation required to classify each pixel. When a large number of spectral bands are involved or a large number of classes must be differentiated, the maximum likelihood classifier performs much more slowly than the other classifiers described. This drawback was one of its major limitations in the past, but is becoming much less critical with the rapid development of computer hardware.

In the output stage, the results are presented in the form of thematic maps, tables of statistics for the various cover classes, and digital data files suitable for inclusion in a GIS. Since multispectral classification methods are primarily based on spectral characteristics, with minimal consideration of the spatial extent of the resulting classes, the classified image often shows considerable high-frequency spatial variation (Figure 73, left image). Because many applications require the classification results to be input into a GIS in a form similar to other thematic data layers, a post-classification process often needs to be performed.

The most common demand on the post-classification process is to remove high-frequency spatial variance (or 'noise') from the classified image. This is often achieved by analysing the neighbourhood of each pixel and removing scattered single pixels (a 'sieve' process), and then merging small patches of pixels together to make more continuous and coherent units (a 'clump' process). The effect of this process is illustrated in Figure 73.

Figure 73. Classified image before (left) and after (right) the post-classification process (classes: water, concrete, high buildings, bare soils, grass slope and forest).

4.4.3 Unsupervised classification

Unsupervised classifiers do not utilise training data as the basis for classification. Rather, they involve algorithms that examine the unknown pixels in an image and aggregate them into a number of classes based on the natural groupings or clusters present in the image values. The basic assumption here is that values within a given cover type should be close together in the multi-dimensional spectral space, whereas data in different classes should be comparatively well separated.

Unlike supervised classification, the classes that result from unsupervised classification are spectral classes. Because they are based solely on the clusters in the image values, the identity of the spectral classes is not initially known. The analyst must compare the classified data with some form of reference data to determine the identity and informational value of the spectral classes.

Clustering algorithms use predefined parameters to identify cluster locations in data space, and then to determine whether individual pixels belong to those clusters or not. In many algorithms the number of clusters may be defined at the start, while others just use cluster size and separation parameters to control the number of clusters that are found. Figure 74 illustrates the type of parameters that can be used to define clusters and to decide whether pixels belong to a cluster. Clustering algorithms either pass once through the data, grouping pixels during that pass, or they pass through a number of times to adjust and improve the clustering assignments. It is impossible to discuss all forms of clustering in this text; however, most clustering algorithms used in remote sensing software operate in a similar manner.

Figure 74. Measures that define a cluster include the size of the cluster, the distance between cluster means, and the distance of a pixel to a cluster mean (shown on a Band A against Band B scattergram).

A typical multiple-pass, or iterative, clustering algorithm works as shown in Figure 75. Pass one: (A) cluster centres are arbitrarily assigned; (B) each pixel is assigned to the cluster centre nearest to it in data space (spectral distance); (C) the cluster means are then calculated from the average of the cluster members (the middle cluster is shown with grey points) and the pixels are reassigned to the new cluster centres. Pass two: (D) the process is repeated. The iteration stops when the cluster centres (or means) move by less than a pre-set amount during an iteration. Over a number of iterations the locations of the clusters tend to stabilise, as the cluster centres change less and less between passes.
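A minimal sketch of such an iterative (k-means style) clustering is given below (Python with NumPy; the number of clusters, the iteration limit and the synthetic two-cluster data are illustrative choices):

import numpy as np

def iterative_clustering(pixels, n_clusters, n_iter=20, seed=0):
    # Assign each pixel to the nearest cluster centre, recompute the centres from the
    # cluster members, and repeat until the centres stop moving (or n_iter passes).
    rng = np.random.default_rng(seed)
    centres = pixels[rng.choice(len(pixels), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centres = np.array([pixels[labels == k].mean(axis=0) if np.any(labels == k)
                                else centres[k] for k in range(n_clusters)])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Two artificial spectral clusters (e.g. Band A and Band B values).
rng = np.random.default_rng(1)
pixels = np.vstack([rng.normal([30, 40], 5, (50, 2)),
                    rng.normal([180, 120], 8, (50, 2))])
labels, centres = iterative_clustering(pixels, n_clusters=2)
print(centres)   # should end up near (30, 40) and (180, 120)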

Figure 75. Iterative clustering of points in data space (panels A to D, each a Band A against Band B scattergram).

Algorithms that pass through the image only once tend to be more affected by the initial conditions than iterative algorithms that repeatedly adjust the cluster means. After each pass through the data, cluster means can be calculated along with other measures such as the standard deviation. In addition to simple straight-line distances, statistical measures of distance can be used, where the distance to a cluster is weighted by the size and importance of that cluster.

The result from unsupervised classification may also need the post-classification process described above. In addition, because the real-world nature of the spectral classes derived from the classification is largely unknown, considerable analysis and interpretation will be required. Often the resulting classes need to be merged into fewer classes to make the classified image more acceptable as a thematic map.

4.5 Accuracy assessment

Classification accuracy analysis is one of the most active research fields in remote sensing. Meaningless or inconclusive assessment of image classification results sometimes precludes the application of automated land cover classification techniques, even when their cost compares favourably with more traditional means of data collection. It is often stated in the remote sensing community that "a classification is not complete until its accuracy is assessed" (Lillesand and Kiefer, 1994).

One of the most common methods of expressing classification accuracy is the preparation of a classification error matrix (or confusion table). The error matrix compares the relationship between known reference data and the corresponding results of the classification. Table 5 shows an error matrix. The numbers listed in the table represent the number of training pixels, for each cover type, that were correctly and incorrectly labelled by the classifier. It is common to average the correct classifications and regard this as the overall classification accuracy (in this case 81%), although a better global measure would be to weight the average according to the areas of the classes in the map.

Table 5. An error matrix expressing classification accuracy.

                            Training data (known cover types)
Classification
results          Water  Concrete  High bldgs  Bare soils  Grass slopes  Forest  Row total
Water               93        0           2           1             0       0         96
Concrete             0       65           4           6             0       0         75
High buildings       2        3         124           5             9      12        155
Bare soils           2        3          21         165            24      12        227
Grass slopes         0        0           6          16           201      45        268
Forest               0        0           8           9            76     512        605
Column total        97       71         165         202           310     581       1426

Producer's accuracy:  W = 93/97 = 96%   C = 65/71 = 92%   H = 124/165 = 75%
                      B = 165/202 = 82%  G = 201/310 = 65%  F = 512/581 = 88%
User's accuracy:      W = 93/96 = 97%   C = 65/75 = 87%   H = 124/155 = 80%
                      B = 165/227 = 73%  G = 201/268 = 75%  F = 512/605 = 85%

Overall accuracy = (93 + 65 + 124 + 165 + 201 + 512) / 1426 = 81%

κ = (1160 - 365.11) / (1426 - 365.11) = 0.749


A distinction is made between omission errors and commission errors. Omission errors correspond to those pixels belonging to the class of interest that the classifier has failed to recognise, whereas commission errors correspond to pixels from other classes that the classifier has labelled as belonging to the class of interest. The former refers to the columns of the error matrix, whereas the latter refers to its rows. For example, in the case presented in Table 5, the omission error for the class “concrete” is (0 + 3 + 3 + 0 + 0)/71 = 8%, whereas the commission error for the same class is (0 + 4 + 6 + 0 + 0)/75 = 13%.

The producer’s accuracy shown in Table 5 is interpreted as the probability that theclassifier has classified the image pixel as, for example, “water” given that the actual class is“water” as indicated by the training data. As a user of the classified image we are moreinterested in the probability that the actual class is “water” given that the pixel has beenlabelled as “water” by the classifier (user’s accuracy). In our case, the producer’s accuracyfor class “forest” is 512/581 = 88%, whereas the user’s accuracy is 512/605 = 85%.

For the global assessment of classification accuracy, a measure called Cohen’s kappa coefficient (κ) is often employed. The kappa coefficient expresses the agreement between the classification and the reference data after discounting the agreement that would be expected by chance, and therefore takes account of unequal sample sizes and the expected proportions of each class.

Let ∑=

+ =n

jiji xx

1, (i.e. the sum over all columns for row i), and ∑

=+ =

n

ijij xx

1, (i.e. the sum

over all rows for column j), then

qN

qd

−−=κ (12)

where N = total number of samples, d = total number of cases in diagonal cells of theerror matrix, and

N

xxq

n

kji∑

=++ ⋅

= 1 . (13)

The optimal κ score is 1.0 (perfect classification). In our case, N = 1426, d = 1160, q =365.11, and κ = 0.749.

5 Summary

Digital remote sensing imagery is widely recognised as an important source for spatial information technology. However, to maximise its potential benefit, the images need to be correctly interpreted, classified and integrated with a GIS operating environment, so that support for real-time decision making can be delivered.

This lecture note discusses the considerations and techniques involved in extracting information from digital images. The discussion is largely focused on remotely sensed images from various satellite platforms. Most such images share multispectral capabilities but differ in spatial resolution. The spectral, spatial and temporal characteristics of remotely sensed images, together with some typical natural and artificial features, are reviewed in order to provide general background for the techniques described later.

The discussion on methodology is focused on two major areas, namely feature identification and image interpretation, and image processing for thematic information extraction. The former describes the keys and methods employed in recognising natural and cultural features on the Earth’s surface, which may be made of different materials. The latter discusses the computer-based image processing techniques for extracting thematic information from digital images.

It is important to understand that we have only ‘scratched the surface’ of the vast knowledge base of interpretation and machine processing of digital images. Within the scope of this volume it is impossible to cover the related topics in greater breadth. Interested readers may find themselves getting lost in the large literature on remote sensing technology, but the few references listed below should provide a useful starting point.

6 Further Readings

ASPRS, 1997, Manual of Remote Sensing, 3rd Edition, American Society for Photogrammetry and Remote Sensing, Bethesda, Maryland, CD-ROM.

Avery, T.E. and Berlin, G.L., 1992, Fundamentals of Remote Sensing and Airphoto Interpretation, 5th Edition, Prentice-Hall, Upper Saddle River, NJ.

Cracknell, A.P. and Hayes, L.W.B., 1991, Introduction to Remote Sensing, Taylor and Francis, London.

Lillesand, T.M. and Kiefer, R.W., 1994, Remote Sensing and Image Interpretation, 3rd Edition, John Wiley & Sons, New York.

Philipson, W.R. (ed.), 1997, Manual of Photographic Interpretation, 2nd Edition, American Society for Photogrammetry and Remote Sensing, Bethesda, Maryland.

Richards, J.A., 1993, Remote Sensing Digital Image Analysis: An Introduction, 2nd Edition, Springer-Verlag, Berlin.