Click here to load reader
Upload
mano-dragon
View
213
Download
0
Embed Size (px)
Citation preview
Characteristics and statistics of digital remote sensing imagery
There are two fundamental ways to obtain digital imagery:
• Acquire remotely sensed imagery in an analog format (often referred to as hard-copy) and then convert it to a digital format through the process of digitization, and
• Acquire remotely sensed imagery already in a digitalformat, such as that obtained by the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) sensor system.
Image Digitization: Analog to Digital (A/D) Image Digitization: Analog to Digital (A/D) Conversion Using Linear Array ScannersConversion Using Linear Array Scanners
Image digitization is the process of turning a hard-copy analog imagery into a digital image. (e.g., a desk-top scanner)
A hardcopy image is illuminated by white light. An objective disperses the light reflected from the hardcopy image on to registered red, green, and blue (RGB) linear arrays of detectors. The linear arrays contain thousands of detectors. The analyst sets the output numbers of pixels per inch (dpi) and the scanner samples the linear arrays accordingly. Digitization progresses line-by-line over the hardcopy image. When a color image is scanned, this results in a three-band (RGB) registered datasets.
RGB = True Color. Any addition array that collect spectral information beyond visible light will be able to generate pseudo color image.
Additional array?
Panchromatic Black & White Infrared
True-color vs. Pseudo (false)-color
With all images are digital, knowledge and skills in digital
image processing are necessary.
Digital Image
• With raster data structure, each image is treated as an array of values of the pixels.
• Image data is organized as rows and columns (or lines and pixels) start from the upper left corner of the image.
• Each pixel (picture element) is treated as a separate unite.
Statistics of Digital Images:
The person responsible for analyzing the digital remote sensor data should first assess its quality and statistical characteristics. This is normally accomplished by: • Looking at the frequency of occurrence of individualbrightness values in the image displayed in a histogramviewing individual pixel brightness values at specificlocations or within a geographic area;• Computing univariate descriptive statistics to determine if there are unusual anomalies in the image data; and • Computing multivariate statistics to determine the amount
of between-band correlation (e.g., to identify redundancy).
Statistics of Digital Images:
• Looking at the frequency of occurrence of individualbrightness values in the image displayed in a histogram• Viewing individual pixel brightness values at specificlocations or within a geographic area
Statistics of Digital Images:• Computing univariate descriptive statistics to determine if there are unusual anomalies in the image data; and
Statistics of Digital Images:• Computing univariate descriptive statistics to determine if there are unusual anomalies in the image data; and
Water
Forest
Statistics of Digital Images:• Computing multivariate statistics to determine the amount
of between-band correlation (e.g., to identify redundancy).
Statistics of Digital Images
It is useful to calculate fundamental univariate and multivariate statistics of the multispectral remote sensor data.
This normally involved computing of maximum and minimum value for each band of imagery, the range, mean, standard deviation, between-band variance-covariance matrix, correlation matrix, and frequencies of brightness values for each band, which are used to produce histograms.
Such statistics provide valuable information necessary for processing and analyzing remote sensor data.
• A “population” is an infinite or finite set of elements. • A “sample” is a subset of the elements taken from a
population used to make inferences about certain characteristics of the population. (e.g., training signatures)
• Large samples drawn randomly from natural populations usually produce a symmetrical frequency distribution.
• Most values are clustered around some central value, and the frequency of occurrence declines away from this central point. A graph of the distribution appears bell shaped is called a normal distribution.
Histogram and Its Significance to Digital Remote Sensing Image Processing
Many statistical tests used in the analysis of remotely sensed data assume that the brightness values recorded in a scene are normally distributed. Unfortunately, remotely sensed data may not be normally distributed. In such instance, nonparametric statistical theory may be preferred.
The histogram is a useful graphic representation of the information content of a remote sensing image, which provide readers with an appreciation of the quality of the original data, e.g. whether it is low in contrast, high in contrast, or multimodal in nature.
Univariate Descriptive Image Statistics
Measures of Central Tendency in Measures of Central Tendency in Remote Sensor DataRemote Sensor Data
• Mode: is the value that occurs most frequently in a distribution and is usually the highest point on the curve. Multiple modes are common in image dataset.
• Median: is the value midway in the frequency distribution.
• Mean: is the arithmetic average and if defined as the sum of all observations divided by the number of observations.
Measures of Central Tendency in Remote Sensor DataMeasures of Central Tendency in Remote Sensor Data
The mean is the arithmetic average and is defined as the sum of all brightness value observations divided by the number of observations. It is the most commonly used measure of central tendency. The mean (k) of a single band of imagery composed of n brightness values (BVik) is computed using the formula:
The sample mean, k, is an unbiased estimate of the population mean. For symmetrical distributions, the sample mean tends to be closer to the population mean than any other unbiased estimate (such as the median or mode).
1 1 1
1 1 1
1 1 19
Mean = 27/9 = 3 (Does not represent the dataset well)
1 1 1
1 1 1
1 1 1
Mean = 9/9 = 1
0 1 2
0 1 2
0 1 2
Mean = 9/9 = 1
Sample mean is a poor measure of central tendency when the set of observations is skewed or contains an outlier.
Outlier
Skewed
Measures of DispersionMeasures of the dispersion about the mean of a distribution provide
valuable information about the image. For example, the range of a band of imagery (rangek) is computed as the difference between the maximum (maxk) and minimum (mink) values:
Unfortunately, when the minimum or maximum values are extreme or unusual observations (i.e., possibly data blunders), the range could be a misleading measure of dispersion. Such extreme values are not uncommon because the remote sensor data are often collected by detector systems with delicate electronics that can experience spikes in voltage and other unfortunate malfunctions. When unusual values are not encountered, the range is a very important statistic often used in image enhancement functions such as min–max contrast stretching.
Standard Deviation (sk): is the positive square root of the variance.
A small sk. suggests that observations clustered tightly around a central value. A large sk indicates that values are scattered widely about the mean.
The sample having the largest variance or standard deviation hasthe greater spread among the values of the observations.
kk vars
Mean
Standard Deviations
Standard Deviation (sk): is the positive square root of the variance.
19 1 12
18 1 13
1 14 12
35.754var kks
548
4328
)912()914()91()913()91()918()912()91()919(
1
)(var
222222222
1
2
n
BVn
ikik
k
kk vars
7.35
Measures of DispersionMeasures of Dispersion
Variance: is the average squared deviation of all possible observations from the sample mean. The variance of a band of imagery, vark, is computed using the equation:
The numerator of the expression is the corrected sum of squares (SS). If the sample mean (mk) were actually the population mean, this would be an accurate measurement of the variance.
1n
)BV(
var
n
1i
2kik
k
BV11 BV12 BV13
BV21 BV22 BV23
BV31 BV32 BV33
548
4328
)912()914()91()913()91()918()912()91()919(
1
)(var
222222222
1
2
n
BVn
ikik
k
19 1 12
18 1 13
1 14 12
1 1 1
1 1 1
1 1 1
019
)11(var
9
1
2
ik
1n
)BV(
var
n
1i
2kik
k
BV11 BV12 BV13
BV21 BV22 BV23
BV31 BV32 BV33
Measures of Distribution (Histogram) Asymmetry and Measures of Distribution (Histogram) Asymmetry and Peak SharpnessPeak Sharpness
SkewnessSkewness is a measure of the asymmetry of a histogram and is a measure of the asymmetry of a histogram and is computed using the formula: is computed using the formula:
A perfectly symmetric histogram has a A perfectly symmetric histogram has a skewnessskewness value of zero.value of zero.
BV11 BV12 BV13
BV21 BV22 BV23
BV31 BV32 BV33
Max. = 102Min. = 6Mean = 27Median = 25Mode = 9
Histogram of A Single Band of Landsat Thematic MapperData
Max. = 188Min. = 38Mean = 73Median = 68Mode = 75
Histogram of Thermal Infrared Imagery of a Thermal Plume in the Savannah River
Remote sensing research is often concerned with the measurement of how much radiant flux is reflected or emitted from an object in more than one band (e.g., in red and near-infrared bands). It is useful to compute multivariatestatistical measures such as covariance and correlationamong the several bands to determine how the measurements covary. Variance–covariance and correlation matrices are the key for principal components analysis (PCA), feature selection, classification and accuracy assessment.
Remote Sensing Multivariate StatisticsRemote Sensing Multivariate Statistics
The different remoteThe different remote--sensingsensing--derived spectral measurements derived spectral measurements for each pixel often change together in some predictable for each pixel often change together in some predictable fashion. fashion.
Remote Sensing Multivariate StatisticsRemote Sensing Multivariate Statistics
Landsat-7 ETM+ Band 1 (Blue band) Landsat-7 ETM+ Band 2 (Green band)
If there is no relationship between the brightness value in one If there is no relationship between the brightness value in one band and that of another for a given pixel, the values are band and that of another for a given pixel, the values are mutually independent; that is, an increase or decrease in one mutually independent; that is, an increase or decrease in one bandband’’s brightness value is not accompanied by a predictable s brightness value is not accompanied by a predictable change in another bandchange in another band’’s brightness value. s brightness value.
Remote Sensing Multivariate StatisticsRemote Sensing Multivariate Statistics
Landsat-7 ETM+ Band 3 (Red band) Landsat-7 ETM+ Band 4 (Near IR band)
Because spectral measurements of individual pixels may not Because spectral measurements of individual pixels may not be independent, some measure of their mutual interaction is be independent, some measure of their mutual interaction is needed. needed.
This measure, called the This measure, called the covariancecovariance, is the joint variation of , is the joint variation of two variables about their common mean.two variables about their common mean.
To calculate covariance, we first compute the To calculate covariance, we first compute the corrected sum corrected sum of productsof products ((SPSP)) defined by the equation:defined by the equation:
Remote Sensing Multivariate StatisticsRemote Sensing Multivariate Statistics
It is computationally more efficient to use the following It is computationally more efficient to use the following formula to arrive at the same result:formula to arrive at the same result:
This quantity is called the This quantity is called the uncorrected sum of productsuncorrected sum of products..
Remote Sensing Multivariate StatisticsRemote Sensing Multivariate Statistics
CovarianceCovariance is calculated by dividing is calculated by dividing SPSP by (by (nn –– 11). ).
Therefore, the covariance between brightness values in bands Therefore, the covariance between brightness values in bands kkand and ll,, covcovklkl, is equal to: , is equal to:
Remote Sensing Multivariate StatisticsRemote Sensing Multivariate Statistics Covariance: is the joint variation of two variables about their common mean. SPkl is the corrected Sum of Products between bands k and l.
1n
SPcov kl
kl
n
1i)lilBV)(kikkl BV(SP
19 1 12
18 1 13
1 14 12
1 1 1
1 1 1
1 1 1
SPkl = (19-9)x(1-1)+(1-9)x(1-1)+(12-9)x(1-1)+(18-9)x(1-1)+(1-9)x(1-1)+(13-9)x(1-1)+(1-9)x(1-1)+(14-9)x(1-1)+(12-9)x(1-1) = 0
Covkl = 0
Band k
k = 9
Band l
l = 1
Covariance: A more efficient formula is:
The sums of products (SP) and sums of squares (SS) can be computed for all possible band combinations.
n
1i
n
1iil
n
1iik
ilikkl n
BVBV
)BVBV(SP
Band k
Total = 91
Band l
Total = 9
19 1 12
18 1 13
1 14 12
1 1 1
1 1 1
1 1 1
SPkl = [(19x1)+(1x1)+(12x1)+(18x1)+(1x1)+(13x1)+(1x1)+(14x1)+(12x1)] –[91x9/9] = 0
Covkl = 0
Band 1Band 1
(green)(green)
Band 2 Band 2
(red)(red)
Band 3 Band 3 (near(near--
infrared)infrared)
Band 4 Band 4 (near(near--
infrared)infrared)
Band 1Band 1 SSSS11covcov1,21,2 covcov1,31,3 covcov1,41,4
Band 2Band 2 covcov2,12,1 SSSS22covcov2,32,3 covcov2,42,4
Band 3Band 3 covcov3,13,1 covcov3,23,2 SSSS33covcov3,43,4
Band 4Band 4 covcov4,14,1 covcov4,24,2 covcov4,34,3 SSSS44
Format of a VarianceFormat of a Variance--Covariance MatrixCovariance Matrix
Correlation between Multiple Bands of Remotely Sensed Correlation between Multiple Bands of Remotely Sensed DataData
To estimate the degree of interrelation between variables in a To estimate the degree of interrelation between variables in a manner not influenced by measurement units, the manner not influenced by measurement units, the correlation correlation coefficient, r,coefficient, r, is commonly used. The correlation between two is commonly used. The correlation between two bands of remotely sensed data, bands of remotely sensed data, rrklkl, is the ratio of their , is the ratio of their covariance (covariance (covcovklkl) to the product of their standard deviations ) to the product of their standard deviations ((sskkssl); thus:); thus:
lk
klkl ss
covr
Correlation Coefficient:
• A correlation coefficient of +1 indicates a positive, perfect relationship between the brightness values of the two bands.
• A correlation coefficient of -1 indicates that the two bands are inversely related.
• A correlation coefficient of zero suggests that there is no linear relationship between the two bands of data.