55
Principal Components Analysis with application to remote sensing image analysis D G Rossiter Cornell University, Section of Soi & Cropl Sciences April 12, 2020

Principal Components Analysis with application to remote ... · PCA 10 Details In practice the system is solved by the Singular Value Decomposition (SVD) of the data matrix. This

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

  • Principal Components Analysis

    with application to remote sensing image analysis

    D G Rossiter

    Cornell University, Section of Soi & Cropl Sciences

    April 12, 2020

  • PCA 1

    Topic: Factor Analysis

    A generic term for methods that consider the inter-relations between a set ofvariables.

    • Often the set of predictors which might be used in a multiple linear regression.

    – Multivariate observations on same objects (e.g., soil samples)– Remote sensing: a set of co-registered images of a scene∗ all bands of one image∗ bands of multiple (co-registered) sensors∗ one band or band product (e.g., NDVI) of a time-series of images

    • This is an analysis of the structure of the multivariate feature space coveredby a set of variables.

    D G Rossiter

  • PCA 2

    Uses of factor analysis in remote sensing

    1. Discover relations between images, and possible groupings of them

    2. Discover groupings of pixels in a set of images (→ classification)

    3. Interpret the resulting groupings in terms of processes

    4. Diagnose multi-collinearity, since images are usually correlated

    • determine which images are most correlated• quantify redundancy, find the most informative subset of images

    5. For data reduction for model inputs; two approaches:

    • Identify representative images for a minimum data set• Compute synthetic images

    D G Rossiter

  • PCA 3

    Topic: Principal Components Analysis (PCA)

    • The simplest form of factor analysis; a data reduction technique.

    • Gives insight into the relation between a set of variables within a dataset

    – This is completely data driven; different sets of observations from the samepopulation will give different relations

    – So, a data mining approach

    • Gives insight into the relation between a set of pixels in the multivariatespace spanned by the set of images

    D G Rossiter

  • PCA 4

    What does PCA do?

    1. The vector space made up of the original observations (e.g., stack of pixelvalues in a set of images) is projected onto another vector space;

    2. The new space has the same dimensionality as the original1, i.e., there are asmany variables in the new space as in the old;

    3. In this space the new synthetic images, also called principal components areorthogonal to each other, i.e. completely uncorrelated;

    4. The synthetic images are arranged in decreasing order of varianceexplained; and the total variance is unchanged;

    5. The contribution of each original variable to each synthetic image is given;

    6. Each observation can be re-projected into the new (PC) space, by its value ofthe synthetic images

    1unless the original was rank-deficientD G Rossiter

  • PCA 5

    Mathematics: the data matrix

    PCA is a direct calculation from a matrix constructed from the multivariatedataset.

    X: centred data matrix (i.e., difference from mean), where:

    • rows are observations (e.g., pixels, observation locations, sampledindividuals)

    • columns are variables (e.g., reflectance in a band, pixel values) measured ateach observation

    • may scale by dividing values by the variable’s sample standard deviation

    – standardized vs. unstandardized, see below

    This gives the location of each observation in multivariate attribute space.

    D G Rossiter

  • PCA 6

    Mathematics: the covariance or correlation matrix

    The data matrix X is used to build a matrix that shows the relation between dataitems:

    • C = XTX : the covariance (unscaled) or correlation (scaled) matrix

    • this is symmetric and positive (semi-)definite, so has all real roots

    D G Rossiter

  • PCA 7

    Why can the correlation matrix be misleading?

    • The individual pairwise correlations do not take into account the degree towhich both of the variables may be correlated to others

    • In the case where both are highly correlated to a several others this is apparentcorrelation, which may not reflect a real process

    • Solution: compute partial correlations

    – the bivariate correlation between the two residuals from linear regression ofeach variable on all the others, less the one with which to pair

    – this accounts for the “lurking” effect of other variables, and shows whatcorrelation remains that can not be otherwise accounted for

    – (see example below)

    D G Rossiter

  • PCA 8

    Mathematics: Eigen decomposition

    The key insight is that the Eigen decomposition2 of C orders the syntheticvariables into descending amounts of variance, and ensures they are orthogonal(Hotelling 1933).

    • Decompose a square, symmetric positive-definite matrix, e.g., the correlationmatrix C formed from a data matrix such that AC = λC

    • Eigenvalues: a diagonal matrix λ; off-diagonals 0, i.e., no covariances, soorthogonal; Eigenvectors: the transformation matrix A

    • The eigenvectors provide a coördinate transformation such that the matrixmultiplied by the diagonal eigenvalues matrix is the same as multiplication bya matrix made up of the eigenvectors

    • Eigenvectors span an orthogonal vector space onto which we can project theoriginal data.

    2 (German eigen ≈ English “own, belonging to oneself”)D G Rossiter

  • PCA 9

    Computation

    • |C− λI| = 0: a determinant to find the eigenvalues of the correlation matrix

    – these are sometimes called the characteristic values– their relative magnitude is the proportion of the original covariance

    explained

    • Then the axes of the new space, the eigenvectors γj (one per dimension) arethe solutions to (C− λjI)γj = 0

    • Obtain synthetic variables by projection: Y = PC where P is the row-wisematrix of eigenvectors (rotations).

    D G Rossiter

  • PCA 10

    Details

    In practice the system is solved by the Singular Value Decomposition (SVD) of thedata matrix.

    This is equivalent but more stable than directly extracting the eigenvectors of thecorrelation matrix.

    Accessible explanations with worked examples:

    • Davis, J. C. (2002). Statistics and data analysis in geology. New York: JohnWiley & Sons.

    • Legendre, P., & Legendre, L. (1998). Numerical ecology. Oxford: ElsevierScience.

    D G Rossiter

  • PCA 11

    Standardized vs. unstandardized – 1

    Standardized each variable (e.g., reflectance in a band) has its mean subtracted(so x.j = 0) and is divided by its sample standard deviation (so σ(x.j) = 1);

    • All variables (e.g., bands) are equally important, no matter their absolutevalues or spreads;

    • Gives equal weight to all variables;• This is usually what we want if variables are measured on different scales.

    – e.g., multivariate measurements of soil constituents– e.g., co-registered images from different sensors

    D G Rossiter

  • PCA 12

    Standardized vs. unstandardized – 2

    Unstandardized use the original variables, in their original scales ofmeasurement; generally the means are also subtracted to centre the variables

    • Variables with wider spreads (often due to measurement scale) are moreimportant, since they contribute more to the original variance

    • This preserves the importance of variables with more variance = moreinformation

    • E.g., sensor with different radiometric resolutions (so wider range of numericvalues); higher resolution will have more weight

    • Bands with more variability will have more weight – maybe we want this.

    D G Rossiter

  • PCA 13

    Potential difference between un/standardized!

    Example: variables with three orders of magnitude difference in standarddeviation (in original measurement scale):

    First, unstandardized PCs:

    k.a k.b p.a caco3.a caco3.b sand.b206.1333028 172.1004094 53.2891699 25.5945396 24.9265695 20.3849822

    p.b clay.b clay.a sand.a silt.b silt.a19.1486608 15.1824385 15.0226513 14.7909094 13.1101435 11.3642023

    cec.b cec.a oc.a oc.b ph.b ph.a1.3351728 0.9138645 0.7702299 0.7265857 0.4260936 0.4153947

    dens.b dens.a0.2721703 0.2313593

    Importance of components:PC1 PC2 PC3 PC4 PC5 PC6

    Standard deviation 250.7955 100.7590 47.69829 34.18739 27.84061 16.31755Proportion of Variance 0.8092 0.1306 0.02927 0.01504 0.00997 0.00343Cumulative Proportion 0.8092 0.9398 0.96909 0.98413 0.99410 0.99753

    PC1 is numerically very large; K values have much larger standard deviations(higher absolute values of all measurements).

    D G Rossiter

  • PCA 14

    Second, standardized PCs

    dens.a silt.a ph.a cec.a caco3.a p.a dens.b sand.b clay.b oc.b1 1 1 1 1 1 1 1 1 1

    cec.b caco3.b p.b sand.a clay.a oc.a k.a silt.b ph.b k.b1 1 1 1 1 1 1 1 1 1

    Importance of components:PC1 PC2 PC3 PC4 PC5 PC6 PC7

    Standard deviation 2.4798 1.8233 1.4605 1.36289 1.16650 1.08224 0.85127Proportion of Variance 0.3075 0.1663 0.1067 0.09289 0.06805 0.05857 0.03624Cumulative Proportion 0.3075 0.4738 0.5805 0.67336 0.74141 0.79998 0.83622

    Same standard deviation (by design), much less total variance explained by PC1.

    D G Rossiter

  • PCA 15

    Difference in variance represented by PCs

    Standardized

    Var

    ianc

    es

    01

    23

    45

    6Unstandardized

    Var

    ianc

    es

    020

    000

    4000

    060

    000

    D G Rossiter

  • PCA 16

    Difference in biplots

    −0.3 −0.2 −0.1 0.0 0.1 0.2

    −0.

    3−

    0.2

    −0.

    10.

    00.

    10.

    2

    PC1

    PC

    2

    12

    3

    4

    5

    67

    89

    10

    11 12131415

    1617

    18

    19202122

    23

    24

    25

    27

    2829

    30

    31

    32

    33

    34

    3536

    37

    38

    39

    40

    41

    42

    43

    44

    454647

    48

    49

    50

    51

    52

    53

    5455

    56

    57585960

    61

    6263

    6465

    66

    6768

    69

    70

    71

    72

    73

    74

    75

    7677

    7879

    8081

    82

    83

    84

    85

    8687

    88

    −10 −5 0 5

    −10

    −5

    05

    dens.a

    sand.a

    silt.a

    clay.a

    oc.a

    ph.acec.a

    caco3.a

    p.a

    k.a

    dens.b

    sand.b

    silt.b

    clay.b

    oc.b

    ph.b

    cec.b

    caco3.b

    p.bk.b

    −0.4 −0.2 0.0 0.2 0.4

    −0.

    4−

    0.2

    0.0

    0.2

    0.4

    PC1

    PC

    2

    12

    3

    4

    5678 9

    10111213

    14

    15

    1617

    181920

    21222324

    25

    272829303132333435

    36

    37

    38

    39

    40

    41

    42

    4344

    45464748

    49

    5051

    5253

    54

    5556

    57

    5859

    606162

    63

    6465

    666768

    69

    707172

    7374

    75

    76

    777879

    808182

    8384

    85

    868788

    −1500 −500 0 500 1000 1500

    −15

    00−

    500

    050

    010

    0015

    00

    dens.asand.asilt.aclay.aoc.aph.acec.acaco3.ap.a

    k.a

    dens.bsand.bsilt.bclay.boc.bph.bcec.bcaco3.bp.b

    k.b

    Standardized UnstandardizedMore or less equal loadings (lengths of arrows) K dominates loadings

    (see below for explanation of biplots).D G Rossiter

  • PCA 17

    Topic: Remote sensing examples

    These will help visualize the transformation from original space into PC space.

    D G Rossiter

  • PCA 18

    Example: Time series of diurnal temperature differences

    • -ý_Ï¿ Pei county in JiangSu province, PRC

    • MODIS3 daily land-surface temperature product4

    • Images are diurnal temperature differences (day – night) between two MODISproducts, units are ∆°C

    • 30 Oct – 03 Nov 2000 (Julian days 304 ff.); Soil is drying after a heavy rain

    • Objective: try to relate DTD to soil texture and organic matter

    • Reference: Zhao, M.-S., Rossiter, D. G., Li, D.-C., Zhao, Y.-G., Liu, F., & Zhang,G.-L. (2014). Mapping soil organic matter in low-relief areas based on landsurface diurnal temperature difference and a vegetation index. EcologicalIndicators, 39, 120–133.5

    3http://modis.gsfc.nasa.gov4https://modis.gsfc.nasa.gov/data/dataprod/mod11.php5http://doi.org/10.1016/j.ecolind.2013.12.015

    D G Rossiter

    http://modis.gsfc.nasa.govhttps://modis.gsfc.nasa.gov/data/dataprod/mod11.phphttp://doi.org/10.1016/j.ecolind.2013.12.015

  • PCA 19

    Original images

    3820000

    3825000

    3830000

    3835000

    3840000

    3845000CK_DTD_304 CK_DTD_306

    480000485000490000495000500000

    CK_DTD_307 CK_DTD_308

    9

    10

    11

    12

    13

    14

    15

    16

    17 ∆°C(day - night)

    Low DTD on first day afterrain; increases overall thendecreases (closer day/nightair T?)

    But: similar overall patternof high vs. low DTD(redundancy)

    D G Rossiter

  • PCA 20

    Reading in the dataset to R

    ## read images as a raster stackrequire(raster)(list

  • PCA 21

    Simple and partial correlations

    > with(stackDTD.df, cor(CK_DTD_307, CK_DTD_308)) # simple correlation of two days[1] 0.9029475> # compute residuals from linear model on other days> r307 r308 cor(r307, r308) # partial correlation = simple correlation of residuals[1] 0.5054682> plot(CK_DTD_308 ~ CK_DTD_307, data=stackDTD.df); plot(r308 ~ r307)

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●●

    ●●

    ●●●

    ● ●

    ●●●●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●●●

    ●●●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●● ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●●

    ●●

    ●●

    ●●●

    ●●●

    ●●●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●●●

    ●●●

    ●●●●

    ●●

    ●●●

    ●●

    ● ●

    ●●

    ●●●

    ●●

    ●●●●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●●●●●

    ●●●

    ●●

    ●●

    ●●

    ●●●

    ●●●●

    ●● ●●●● ●

    ●●

    ●●●

    ●●

    ●● ●

    ●●

    ●●●●●●● ●

    ●●●

    ●●

    ●●●

    ●●

    ●●

    ●●●

    ●●●●●●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●● ●

    ●●●●

    ●●●●●

    ●●

    ●●●

    ●●

    ●●●●

    ●●●●

    ●●●

    ●●●

    ●●●

    ●●●●

    ●●●●●

    ●●●

    ●●●

    ●●●

    ●●

    ●●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●●●●

    ●●●●

    ● ●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●●●●●

    ●●

    ●●

    ●●●● ●

    ●●

    ●●

    ●●

    ●●●

    ●●●

    ●●●

    ●●●

    ●●

    ●●

    ●●●

    ●●

    ●●●●

    ●●

    ●●

    ●●●●

    ●●●

    ●●

    ●●●●

    ●●●

    ●●

    ●●

    ●●●

    ●●

    ●●●●

    ●●●●

    ●●

    ●●●●●

    ● ● ●

    ●●

    ●●

    ●●

    ●●●●●●

    12 13 14 15 16

    1011

    1213

    1415

    Simple

    CK_DTD_307

    CK

    _DT

    D_3

    08

    ●●

    ●● ●

    ●●●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ● ●●

    ●●

    ●●

    ● ●

    ● ●

    ●● ●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●● ●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●●

    ● ●●

    ●●

    ●●

    ●●

    ● ●●● ●●

    ● ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ● ●●

    ●● ●

    ●●

    ● ●

    ● ● ●●

    ●●

    ●●

    ●●

    ●●

    ●● ●●

    ●●●

    ●●●

    ● ●

    ●●

    ●●

    ●●●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●●●

    ●●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●● ●

    ● ●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ●●●

    ●●

    ●●●

    ● ●●

    ● ●

    ● ● ●

    ●●●

    ●●

    ● ●●

    ●●

    ● ●

    ● ●

    ●●

    ●●

    ●●●

    ● ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●●●● ●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ● ●

    ●● ●

    ●●

    ●●

    ●●

    ●●

    ●●

    ●●

    −0.5 0.0 0.5 1.0−

    1.5

    −0.

    50.

    51.

    01.

    52.

    0

    Partial

    r307

    r308

    Partial correlation accounts for correlations of each with the other days

    D G Rossiter

  • PCA 22

    PCA processing in R

    In this example we use unstandardized PCA because the four DTD images are onthe same scale from the same sensor and represent the same phenomenon.

    ## PCA -- unstandardized, use original DTD, ignore coordinatespca

  • PCA 23

    Structure of prcomp object

    > str(pca)List of 5$ sdev : num [1:4] 1.823 0.404 0.323 0.208$ rotation: num [1:4, 1:4] -0.459 -0.59 -0.466 -0.474 -0.675 .....- attr(*, "dimnames")=List of 2.. ..$ : chr [1:4] "CK_DTD_304" "CK_DTD_306" "CK_DTD_307" "CK_DTD_308".. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"$ center : Named num [1:4] 10.6 13.4 12.9 11.8..- attr(*, "names")= chr [1:4] "CK_DTD_304" "CK_DTD_306" "CK_DTD_307" "CK_DTD_308"$ scale : logi FALSE$ x : num [1:784, 1:4] -6.17 -5.71 -4.64 -4.83 -4.43 .....- attr(*, "dimnames")=List of 2.. ..$ : NULL.. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"- attr(*, "class")= chr "prcomp"

    rotation are the eigenvectors

    x are the PC scores (location of each pixel in the PC space)

    center are the image means (subtracted from all values)

    D G Rossiter

  • PCA 24

    PCA results – Importance of components

    > summary(pca)Importance of components:

    PC1 PC2 PC3 PC4Standard deviation 1.8232 0.40406 0.3230 0.20810Proportion of Variance 0.9145 0.04492 0.0287 0.01191Cumulative Proportion 0.9145 0.95938 0.9881 1.00000

    The four DTD images are highly-correlated, 92% of the information is incommon

    I.e., over the four days the same areas tend to have narrow and wide DTD ranges

    D G Rossiter

  • PCA 25

    PCA results – rotations

    > pc$rotationPC1 PC2 PC3 PC4

    DTD304 -0.4447 0.5893 0.5870 0.3322DTD306 -0.6084 0.3379 -0.5200 -0.4952DTD307 -0.4717 -0.3857 -0.3677 0.7025DTD308 -0.4578 -0.6243 0.4998 -0.3884

    PC1 “intensity” of the phenomenon over all days – all signs the same (arbitrary),similar magnitudes

    PC2 contrast between first two and second two days

    PC3 contrast between middle two and end two days

    PC4 Third and first days, contrasted with second and fourth days

    Note PCs are orthogonal (independent)

    D G Rossiter

  • PCA 26

    Screeplot

    pca

    Var

    ianc

    es

    0.0

    0.5

    1.0

    1.5

    2.0

    2.5

    3.0

    D G Rossiter

  • PCA 27

    Biplots

    These show scores of the observations as synthetic variables in the 2-PC space(observation numbers)Loadings of each variable in the 2-PC space shown by the arrows (longer = higherloading).

    −0.10 −0.05 0.00 0.05 0.10

    −0.

    10−

    0.05

    0.00

    0.05

    0.10

    PC1

    PC

    2

    1

    2

    3

    45

    67

    8

    9

    10

    11

    12

    13

    14

    151617

    1819

    20

    21

    2223

    24

    25

    26

    27

    28

    29

    30

    31

    32

    33

    34

    35

    36

    37

    383940

    4142

    43

    44

    45

    4647

    48

    49

    50

    51

    52

    53

    54

    55

    56

    57

    58

    5960

    61

    62

    63

    6465

    66

    6768

    69

    70

    71

    72

    73

    7475

    7677

    7879

    80

    81

    82 83

    84

    85

    8687

    88

    89

    90

    91

    92

    93

    94

    959697

    98

    99

    100

    101

    102

    103104

    105

    106

    107

    108

    109110

    111

    112

    113114

    115

    116 117

    118

    119

    120121

    122

    123

    124125

    126127

    128

    129

    130

    131

    132

    133

    134

    135

    136137

    138

    139

    140

    141 142

    143144145

    146

    147

    148

    149

    150

    151

    152

    153

    154155

    156

    157158

    159

    160

    161

    162

    163164

    165

    166

    167

    168

    169

    170

    171

    172

    173 174

    175

    176177

    178

    179

    180181

    182183

    184

    185

    186

    187

    188

    189

    190

    191

    192

    193

    194

    195

    196

    197198199

    200

    201202

    203

    204

    205

    206

    207208

    209

    210

    211

    212

    213

    214

    215

    216

    217

    218

    219

    220

    221

    222

    223

    224

    225

    226

    227

    228229

    230231

    232233

    234

    235

    236

    237

    238239

    240

    241

    242243

    244

    245

    246

    247

    248

    249

    250

    251

    252

    253

    254

    255256

    257

    258

    259

    260

    261

    262

    263

    264265

    266

    267

    268

    269

    270271

    272

    273

    274275

    276

    277

    278279

    280

    281

    282

    283

    284285

    286

    287288

    289

    290

    291

    292

    293

    294295

    296

    297

    298

    299

    300

    301

    302

    303304

    305306

    307308

    309

    310

    311

    312

    313

    314

    315

    316

    317

    318

    319

    320321

    322

    323

    324

    325

    326327

    328

    329

    330331

    332333

    334335

    336

    337

    338

    339

    340

    341

    342

    343344345

    346347

    348

    349

    350

    351352

    353354355

    356357

    358

    359

    360

    361362363

    364

    365

    366

    367

    368

    369

    370

    371372

    373

    374375

    376377

    378

    379

    380381382383

    384

    385

    386387

    388

    389390

    391

    392

    393

    394

    395396

    397

    398

    399

    400

    401

    402

    403

    404

    405

    406

    407

    408

    409

    410

    411412

    413414

    415

    416417418

    419

    420421

    422

    423

    424

    425

    426

    427

    428429

    430

    431

    432

    433434

    435

    436

    437

    438

    439

    440441

    442

    443444445

    446

    447

    448

    449

    450

    451

    452

    453

    454

    455

    456

    457

    458

    459

    460

    461

    462

    463

    464

    465

    466

    467

    468469

    470

    471

    472473474

    475

    476

    477478

    479

    480

    481

    482

    483

    484

    485486

    487

    488

    489

    490

    491

    492

    493

    494

    495

    496

    497

    498

    499500

    501

    502

    503504

    505506

    507

    508

    509

    510

    511

    512

    513

    514515

    516

    517

    518

    519

    520

    521

    522

    523524

    525

    526

    527

    528

    529

    530

    531

    532533

    534

    535

    536

    537

    538539

    540541

    542

    543

    544

    545546

    547

    548

    549

    550

    551

    552

    553

    554

    555

    556

    557

    558559

    560561562

    563

    564

    565

    566

    567

    568

    569

    570571

    572

    573

    574575

    576577578

    579580

    581582

    583584

    585

    586

    587588

    589

    590

    591

    592

    593594

    595

    596597

    598

    599 600601

    602603

    604

    605

    606

    607608

    609610611

    612613

    614615

    616

    617

    618

    619

    620

    621

    622

    623624625

    626

    627

    628

    629

    630

    631632

    633

    634

    635

    636637

    638639640

    641

    642643644

    645

    646647

    648649650

    651652

    653

    654655

    656

    657658

    659 660

    661

    662

    663

    664

    665666

    667

    668

    669

    670

    671672

    673

    674

    675

    676

    677

    678

    679680

    681682

    683

    684685

    686

    687

    688

    689

    690

    691

    692693694695

    696697

    698

    699

    700

    701

    702703

    704705

    706

    707

    708

    709710

    711

    712713

    714

    715

    716

    717

    718719720721

    722

    723

    724

    725726

    727

    728

    729

    730731

    732

    733

    734

    735

    736

    737

    738

    739

    740

    741

    742

    743

    744

    745

    746

    747748

    749750

    751

    752

    753

    754755756

    757758

    759

    760761

    762

    763

    764

    765

    766

    767

    768769

    770

    771

    772

    773

    774775776

    777778

    779780

    781

    782

    783

    784

    −30 −20 −10 0 10 20 30

    −30

    −20

    −10

    010

    2030

    CK_DTD_304

    CK_DTD_306

    CK_DTD_307

    CK_DTD_308

    −0.10 −0.05 0.00 0.05 0.10

    −0.

    10−

    0.05

    0.00

    0.05

    0.10

    PC2

    PC

    3

    1

    2

    3

    4

    5

    6

    7

    8

    9 10

    11

    12

    13

    14

    1516

    17

    18

    19

    2021

    2223

    24

    25

    26

    27

    28

    29

    30

    31

    32

    33

    34

    35

    3637

    3839

    40

    41

    42

    4344

    45

    46

    47

    48

    49

    50

    51

    52

    53

    54

    55

    56 57

    5859

    60

    61

    62

    63

    64

    65

    66

    67

    68

    6970

    71

    72 7374

    75

    76

    77

    78

    79

    8081

    8283

    84

    85

    86

    87

    88

    89

    90

    91

    9293

    94

    95

    96

    97

    98

    99 100101

    102

    103104 105

    106

    107

    108

    109

    110 111

    112

    113

    114

    115

    116

    117

    118119

    120

    121

    122

    123124

    125126

    127 128

    129 130

    131

    132

    133

    134135

    136

    137138

    139

    140

    141

    142

    143

    144145

    146

    147148

    149

    150

    151

    152

    153

    154

    155

    156

    157

    158

    159160

    161

    162

    163

    164 165

    166

    167

    168

    169170

    171

    172

    173

    174

    175

    176

    177

    178

    179 180181

    182183

    184

    185

    186

    187

    188

    189 190

    191

    192

    193

    194

    195

    196

    197

    198

    199

    200201

    202

    203

    204

    205

    206207208

    209

    210

    211

    212213

    214

    215

    216

    217

    218

    219220

    221

    222

    223

    224

    225

    226

    227

    228

    229

    230231

    232

    233

    234

    235

    236

    237

    238

    239

    240

    241

    242

    243

    244245246

    247

    248249

    250 251252

    253

    254

    255

    256

    257

    258

    259

    260

    261

    262

    263

    264

    265

    266267 268

    269

    270

    271

    272273

    274

    275

    276

    277278

    279

    280

    281282 283

    284

    285

    286

    287

    288

    289290

    291

    292

    293294

    295

    296 297

    298

    299

    300

    301

    302

    303

    304

    305

    306

    307

    308

    309

    310

    311

    312

    313

    314

    315

    316

    317

    318

    319

    320321

    322

    323

    324

    325

    326

    327328

    329

    330331

    332

    333 334

    335

    336

    337

    338

    339

    340341

    342

    343

    344

    345

    346

    347

    348

    349

    350

    351352

    353

    354

    355

    356

    357

    358359360

    361

    362

    363

    364

    365

    366

    367

    368

    369

    370

    371

    372

    373

    374

    375

    376

    377378379 380

    381

    382

    383384

    385

    386

    387

    388

    389

    390391392393

    394

    395

    396

    397398

    399

    400

    401

    402

    403

    404

    405

    406

    407

    408

    409

    410

    411412

    413414

    415

    416

    417

    418419420421

    422 423

    424

    425

    426

    427

    428

    429

    430

    431

    432

    433

    434

    435

    436

    437

    438439

    440

    441

    442

    443

    444

    445

    446447

    448

    449

    450

    451452

    453

    454

    455

    456457

    458

    459

    460 461

    462

    463464

    465

    466467

    468

    469

    470

    471

    472

    473

    474

    475476

    477

    478479

    480481

    482

    483

    484485

    486

    487

    488

    489

    490491

    492

    493

    494

    495

    496

    497498 499

    500

    501

    502

    503504

    505

    506

    507

    508

    509

    510

    511512 513

    514

    515

    516517

    518519

    520

    521

    522

    523

    524

    525526 527

    528

    529

    530

    531532

    533 534

    535

    536

    537538

    539

    540

    541

    542

    543544 545

    546

    547

    548

    549

    550551

    552553

    554

    555

    556

    557 558

    559560

    561

    562

    563564

    565

    566

    567

    568

    569

    570

    571 572

    573

    574575

    576

    577

    578

    579580

    581

    582

    583584

    585

    586

    587588589

    590

    591

    592

    593

    594

    595596

    597 598

    599

    600

    601

    602603

    604605606

    607608 609

    610

    611612

    613

    614615 616617

    618619

    620

    621

    622

    623624

    625

    626

    627

    628

    629

    630631632

    633

    634

    635

    636

    637

    638639640

    641

    642643

    644

    645

    646

    647648

    649

    650

    651

    652

    653

    654

    655656

    657

    658

    659660

    661

    662663

    664

    665

    666

    667

    668

    669

    670

    671672

    673

    674

    675676

    677678

    679

    680681

    682

    683684

    685

    686

    687

    688689

    690691

    692

    693

    694

    695

    696697698

    699

    700

    701702

    703

    704705

    706

    707

    708

    709710711

    712

    713

    714 715

    716717

    718

    719

    720

    721722723

    724

    725726

    727

    728

    729730731

    732

    733

    734

    735

    736737

    738739

    740

    741742

    743

    744

    745

    746

    747

    748

    749

    750 751752753

    754

    755

    756

    757758

    759

    760761

    762

    763764

    765766

    767

    768

    769

    770

    771

    772

    773

    774

    775776

    777

    778

    779

    780781

    782

    783

    784

    −5 0 5

    −5

    05

    CK_DTD_304

    CK_DTD_306

    CK_DTD_307

    CK_DTD_308

    −0.10 −0.05 0.00 0.05 0.10

    −0.

    10−

    0.05

    0.00

    0.05

    0.10

    PC3

    PC

    4

    1

    2

    3

    4

    5 6

    7

    8

    9

    10

    11

    121314

    15

    1617 18

    19

    202122

    23

    24

    25

    26

    27

    28

    29

    30

    31

    32

    33

    34

    35

    363738

    3940

    41

    42

    43

    44

    45

    46

    47

    48

    49

    50

    51

    52

    53

    54

    55

    56

    57

    58

    59

    60

    61

    6263

    64

    65

    66

    67

    68

    69

    70 71

    72

    7374

    7576

    77

    78

    79 80

    81

    82

    83

    84

    85

    86

    87

    88

    89

    90

    91

    92

    93

    94

    95

    96

    97

    98

    99

    100

    101102

    103104

    105

    106

    107

    108

    109

    110

    111

    112

    113

    114 115116

    117

    118

    119

    120121

    122

    123

    124

    125

    126

    127

    128

    129130

    131

    132

    133

    134

    135

    136

    137

    138

    139

    140

    141

    142

    143

    144

    145

    146

    147148

    149

    150151

    152

    153

    154

    155

    156

    157158

    159

    160

    161162

    163

    164

    165

    166

    167

    168

    169

    170

    171

    172173

    174175

    176177

    178

    179

    180

    181

    182

    183

    184

    185

    186

    187

    188

    189

    190

    191

    192

    193

    194

    195196

    197

    198

    199

    200

    201

    202

    203

    204

    205

    206

    207

    208209

    210

    211

    212213

    214

    215

    216

    217

    218

    219

    220

    221

    222

    223

    224

    225

    226

    227

    228

    229

    230231

    232

    233

    234

    235

    236

    237

    238

    239

    240

    241

    242

    243

    244245

    246

    247

    248

    249

    250

    251

    252253

    254

    255

    256

    257

    258

    259

    260

    261

    262

    263

    264

    265

    266

    267268

    269

    270

    271

    272273

    274275

    276

    277278

    279

    280

    281282

    283

    284 285286

    287

    288

    289290

    291 292

    293

    294

    295296

    297

    298

    299

    300

    301

    302

    303

    304

    305

    306

    307

    308

    309

    310311

    312

    313

    314

    315

    316

    317

    318

    319

    320

    321

    322

    323

    324

    325

    326

    327

    328

    329

    330331

    332333

    334

    335

    336

    337

    338

    339

    340

    341

    342

    343

    344

    345

    346347

    348349

    350

    351

    352

    353

    354

    355

    356

    357

    358

    359

    360

    361

    362

    363

    364

    365

    366

    367

    368

    369

    370

    371372

    373

    374375

    376

    377

    378

    379

    380381

    382

    383

    384

    385

    386387

    388

    389390391

    392

    393

    394

    395

    396

    397398

    399

    400

    401

    402

    403 404

    405

    406407

    408

    409

    410 411412

    413414

    415

    416

    417 418

    419420421

    422

    423 424425

    426427

    428

    429

    430431

    432

    433

    434

    435

    436

    437

    438439

    440

    441442

    443444

    445

    446

    447

    448

    449

    450451

    452453

    454

    455

    456

    457

    458

    459

    460

    461

    462

    463

    464465

    466

    467 468469

    470

    471472

    473

    474

    475

    476

    477

    478

    479

    480

    481

    482483484

    485

    486487

    488

    489

    490

    491

    492

    493

    494

    495496

    497

    498

    499

    500501

    502

    503

    504

    505

    506

    507 508

    509510

    511

    512513

    514

    515516

    517

    518

    519

    520

    521

    522

    523

    524525

    526

    527528

    529

    530

    531

    532

    533

    534

    535536

    537538

    539

    540

    541

    542

    543

    544

    545

    546

    547

    548

    549

    550

    551552

    553

    554555

    556

    557558559

    560

    561

    562

    563

    564

    565

    566

    567

    568

    569

    570

    571

    572

    573

    574

    575

    576

    577

    578

    579

    580

    581582

    583

    584

    585

    586

    587

    588

    589590

    591 592

    593

    594

    595

    596

    597

    598

    599

    600

    601

    602

    603

    604

    605

    606607608

    609

    610611

    612

    613

    614

    615

    616

    617

    618

    619

    620

    621

    622

    623

    624625

    626627

    628

    629630

    631

    632

    633

    634

    635

    636

    637638

    639

    640

    641

    642

    643

    644

    645

    646

    647

    648

    649

    650651

    652653

    654

    655

    656

    657 658 659

    660

    661662

    663

    664665

    666

    667

    668

    669

    670

    671

    672

    673674

    675

    676

    677678

    679

    680

    681

    682

    683

    684

    685

    686687

    688

    689

    690

    691

    692

    693

    694

    695

    696

    697

    698

    699

    700

    701

    702

    703

    704

    705

    706

    707

    708

    709710

    711712

    713

    714

    715

    716

    717

    718719720

    721

    722

    723724

    725

    726

    727

    728

    729730

    731

    732

    733

    734

    735

    736

    737

    738

    739

    740

    741

    742

    743

    744

    745

    746

    747 748

    749

    750

    751

    752

    753754

    755

    756

    757

    758

    759

    760

    761762

    763

    764765

    766

    767

    768

    769

    770

    771

    772

    773

    774

    775

    776

    777778

    779

    780

    781

    782

    783

    784

    −4 −2 0 2 4

    −4

    −2

    02

    4

    CK_DTD_304

    CK_DTD_306

    CK_DTD_307

    CK_DTD_308

    First PC is overall intensity across all 4 dates; other PCs show contrasts betweendates

    D G Rossiter

  • PCA 28

    PCs (synthetic bands) – same stretch

    3820000

    3825000

    3830000

    3835000

    3840000

    3845000PC1 PC2

    480000485000490000495000500000

    PC3 PC4

    −6

    −4

    −2

    0

    2

    Note decreasinginformation content asPC number increases

    There is still someinformation in PC2, 3, 4that contrasts with theoverall pattern

    D G Rossiter

  • PCA 29

    PCs (synthetic bands) – individual stretch

    480000 485000 490000 495000 500000

    3820

    000

    3830

    000

    3840

    000

    PC1

    x

    y

    480000 485000 490000 495000 500000

    3820

    000

    3830

    000

    3840

    000

    PC2

    xy

    480000 485000 490000 495000 500000

    3820

    000

    3830

    000

    3840

    000

    PC3

    x

    y

    480000 485000 490000 495000 500000

    3820

    000

    3830

    000

    3840

    000

    PC4

    x

    yThis allows to visualizecontrasts within a PC

    D G Rossiter

  • PCA 30

    Another example of synthetic images

    Cordillera de la Costa, W branch Río Aragua, Aragua state, Venezuela

    D G Rossiter

  • PCA 31

    Conclusion

    • PCA is a powerful data reduction technique

    • It is linear – so if variables are highly skewed or multi-model, apply atransformation

    • It can also reveal the inter-relation between variables (e.g., reflectances invarious spectral bands)

    • It is completely data-driven and is scene-specific

    • An important choice is between standardized and unstandardized

    D G Rossiter

  • PCA 32

    Topic: PCA of long time series of imagery to reveal seasonality

    Groundbreaking paper, 200+ citations:

    Eastman, J., & Fulk, M. (1993). Long Sequence Time-Series Evaluation UsingStandardized Principal Components (reprinted from PhotogrammetricEngineering and Remote-Sensing , Vol 59, Pg 991–996, 1993).Photogrammetric Engineering and Remote Sensing, 59(8), 1307–1312.

    • Sensor was AVHRR

    • Images were monthly maximum NDVI, 1986-1988 (three full years)

    D G Rossiter

  • PCA 33

    Synthetic images

    Green = “+” score, Red= “-”

    Sign is arbitrary, herechosen to show highNDVI in green.

    D G Rossiter

  • PCA 34

    Loadings

    PC1 All bands contribute ≈equally:(overall intensity)

    PC2 shows annual movement ofIntertropical Convergence Zone

    PC3/4 show deviations from PC2seasonality

    D G Rossiter

  • PCA 35

    Interpretation

    Synthetic bands images summarizing all original images, according to the PCs

    Loadings relation between original images (dates, x-axis) and contribution to thePC (y-axis).

    • Note decreased absolute loadings at higher PCs, this is because theyrepresent less of the total variability of the 36 images

    • PC1: ≈ equal contribution of all dates (overall vegetation vigour averaged overthe 3 years)

    • PC2: + correlation in N. hemisphere summer, - in winter (seasonality) –Intertropical Convergence Zone

    • PC3/4: deviations from PC2 seasonality – note time lag of greening/senescence

    • PC5: detecting sensor drift

    D G Rossiter

  • PCA 36

    Topic: PCA for a multi- to hyper-variate dataset

    A small dataset of 30 soil properties observed at 87 locations in the Lake Valenciabasin, Venezuela

    Also recorded coördinates (not used here) and geomorphic region

    Main interest is to see the inter-relation between variables (grouping, contrasts)

    • Could we only measure surface soil properties w.o. loss of informtion?

    • Could we omit some (expensive?) measurements w.o. loss of informtion?

    > dim(dlv.r)[1] 87 33> names(dlv.r)[1] "utm.n" "utm.e" "r.geo" "dens.a" "vcs.a" "cs.a" "ms.a" "fs.a"[9] "vfs.a" "sand.a" "silt.a" "clay.a" "oc.a" "ph.a" "cec.a" "caco3.a"

    [17] "p.a" "k.a" "dens.b" "vcs.b" "cs.b" "ms.b" "fs.b" "vfs.b"[25] "sand.b" "silt.b" "clay.b" "oc.b" "ph.b" "cec.b" "caco3.b" "p.b"[33] "k.b"

    D G Rossiter

  • PCA 37

    Geomorphic regions explain some variation

    > ## boxplot of bulk density of the subsoil, by geomorphic region> require(ggplot2)> qplot(x=as.factor(r.geo), y=dens.b, data=dlv.r, geom=c("boxplot", "point"), shape=I(3))

    0.50

    0.75

    1.00

    1.25

    1.50

    2 3 6 7as.factor(r.geo)

    dens

    .b

    D G Rossiter

  • PCA 38

    Computing standardized PCs

    > pc summary(pc)Importance of components:

    PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8Standard deviation 3.1651 1.9973 1.8312 1.48272 1.33402 1.14400 1.06427 1.01107Proportion of Variance 0.3339 0.1330 0.1118 0.07328 0.05932 0.04362 0.03776 0.03408Cumulative Proportion 0.3339 0.4669 0.5787 0.65195 0.71127 0.75490 0.79265 0.82673

    PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16Standard deviation 0.87600 0.79861 0.76065 0.71565 0.68304 0.60664 0.5797 0.52760Proportion of Variance 0.02558 0.02126 0.01929 0.01707 0.01555 0.01227 0.0112 0.00928Cumulative Proportion 0.85231 0.87357 0.89285 0.90992 0.92548 0.93774 0.9489 0.95822

    PC17 PC18 PC19 PC20 PC21 PC22 PC23 PC24Standard deviation 0.50731 0.43862 0.42948 0.38868 0.33886 0.3146 0.28247 0.25723Proportion of Variance 0.00858 0.00641 0.00615 0.00504 0.00383 0.0033 0.00266 0.00221Cumulative Proportion 0.96680 0.97322 0.97936 0.98440 0.98823 0.9915 0.99418 0.99639

    PC25 PC26 PC27 PC28 PC29 PC30Standard deviation 0.20782 0.1647 0.14647 0.11458 0.04917 0.03113Proportion of Variance 0.00144 0.0009 0.00072 0.00044 0.00008 0.00003Cumulative Proportion 0.99783 0.9987 0.99945 0.99989 0.99997 1.00000

    12 PCs (out of 30) explained 90% of the standardized variance of 30standardized variables

    D G Rossiter

  • PCA 39

    Screeplot

    > screeplot(pc, npcs=16)

    pcV

    aria

    nces

    02

    46

    810

    D G Rossiter

  • PCA 40

    Synthetic variablesThe standardized variables are multiplied by these factors to make up the synthetic variables (PCs)

    > print(pc$rotation[,1:4])PC1 PC2 PC3 PC4

    dens.a 0.18958475 0.3139849696 -0.096232326 0.088356551vcs.a 0.07924278 0.0403831655 0.230370094 -0.048419791cs.a 0.16208849 -0.1202754319 0.363705662 -0.078683279ms.a 0.25194385 -0.1284850503 0.210080111 0.008324169fs.a 0.28293845 -0.1363107368 -0.014925221 0.054220372vfs.a 0.27325614 -0.0910810904 -0.064745942 0.041049247sand.a 0.22462929 -0.2282354263 -0.122083768 0.004537925silt.a -0.04055455 -0.0339314591 0.094532854 -0.128362638clay.a -0.19025200 0.2451180696 0.037298357 0.096288845oc.a -0.22319928 -0.0981217660 0.117350816 -0.103240123ph.a -0.02002915 -0.1607898185 -0.341449972 -0.032629362cec.a -0.11061555 -0.3136921769 0.084300743 0.073849593caco3.a -0.16458980 -0.3322511108 -0.087971037 -0.214771086p.a -0.04422812 -0.1751203044 0.041317218 0.493486269k.a -0.14509102 -0.1129631442 0.068684059 0.389384538dens.b 0.21263155 0.2860476925 -0.081790307 0.119293244vcs.b 0.09246395 0.0124449952 0.416921421 -0.096156354cs.b 0.14040571 -0.0003804787 0.390813767 -0.054448951ms.b 0.23184462 -0.0820839309 0.213227766 -0.002106178fs.b 0.26409667 -0.1415713026 -0.048437418 0.072954308vfs.b 0.27090087 -0.0808317887 -0.116662538 0.078520254sand.b 0.25024710 -0.1834095557 -0.092293422 0.045293178silt.b -0.16691593 -0.0321539033 0.149702956 -0.016192942

    D G Rossiter

  • PCA 41

    clay.b -0.19685387 0.2769034074 -0.000293272 -0.046612720oc.b -0.18170694 -0.1227445391 0.191119295 0.086771096ph.b 0.07542964 -0.1271823250 -0.326496078 -0.036204018cec.b -0.15933624 -0.2679633668 -0.019482653 -0.050358586caco3.b -0.14353695 -0.3045767375 -0.026160011 -0.290825175p.b -0.07072839 -0.0569701860 0.100738984 0.438684722k.b -0.15217588 -0.1233052242 -0.006211632 0.404872772

    D G Rossiter

  • PCA 42

    Biplot

    > plot.it

  • PCA 43

    dens

    .a

    vcs.a

    cs.a ms.afs.avfs.a

    sand.a

    silt.a

    clay.a

    oc.a

    ph.a

    cec.

    a

    caco

    3.a

    p.a

    k.a

    dens

    .b

    vcs.bcs.b

    ms.bfs.b

    vfs.b

    sand.b

    silt.b

    clay.b

    oc.b ph.b

    cec.b

    caco

    3.b

    p.b

    k.b

    −5.0

    −2.5

    0.0

    2.5

    −5 0 5PC1 (33.4% explained var.)

    PC

    2 (1

    3.3%

    exp

    lain

    ed v

    ar.)

    2 3 6 7

    – normal ellipse of each group in PC space; 1 s.d. by default– distance between the points ≈ Mahalanobis distance; inner product between thevariables ≈ correlation– graph produced with ggbiplot package

    D G Rossiter

  • PCA 44

    Interpretation of PC1 and 2

    • strong correlations between top/sub for most properties

    • sand fractions highly correlated

    • clay opposite sand

    • bulk density opposite CEC/OC/silt/CaCO3

    • PC1 +dense, sandy, low silt, low OC

    • PC2 +low fertility, base saturation, carbonates

    • But these are not exactly aligned with the PCs

    • Geomorphic regions cluster points in PC1/2 space

    • especially 2 (piedmont) and 7 (recently-emerged lake sediments)D G Rossiter

  • PCA 45

    dens.a

    vcs.acs.ams.a

    fs.avfs.asand.a

    silt.a

    clay

    .a

    oc.a

    ph.a

    cec.a

    caco

    3.a

    p.a

    k.a

    dens.b

    vcs.bcs.b

    ms.b

    fs.bvfs.bsand.bsilt.b

    clay

    .b

    oc.b

    ph.b

    cec.

    bca

    co3.

    b

    p.bk.b

    −2.5

    0.0

    2.5

    5.0

    −2.5 0.0 2.5 5.0PC3 (11.2% explained var.)

    PC

    4 (7

    .3%

    exp

    lain

    ed v

    ar.)

    2 3 6 7

    – PC3 contrasts top and subsoil sand– PC4 contrasts K, P with CaCO3

    D G Rossiter

  • PCA 46

    Topic: Factor analysis: beyond PCA

    Limitations of PCA:

    • PCA is a data reduction technique

    • It does not impose any structure on the PCs or synthetic variables, they comeout directly from the eigen decomposition of the correlation or covariancematrix

    • So, the resulting PCs or synthetic variables may not be easily interpretable

    – the loadings may not line up with any of the axes – see Lake Valenciaexample

    – Clear associations of variables but none lined up with a PC

    D G Rossiter

  • PCA 47

    Latent variable analysis – concept

    • “Factor analysis” as used in social sciences

    • Hypothesis: the set of observed variables is a measureable expression ofsome (smaller) number of latent variables

    – These can not be directly measured,– They influence a number of the observed variables.– Example: “math ability”, “ability to think abstractly” are assumed to exist

    (based on external evidence or theory) and measured with various tests(observed variables).

    • The set of observed variables can be analyzed with PCA and then (1) reduced;(2) rotated into interpretable components

    D G Rossiter

  • PCA 48

    Latent variable analysis – computation

    • from k observed variables hypothesize p < k latent variables (“factors”)

    • decompose k× k variance-covariance matrix Σ of the original variables into:1. a p × k loadings matrix Λ (the k columns are the original variables, the p

    rows are the latent variables);2. and a k× k diagonal matrix of unexplained variances per original variable

    (its uniqueness) Ψ , so Σ = Λ′Λ+ Ψ• In PCA p = k, there is no Ψ , and all variance is explained by the synthetic

    variables; there is only one way to do this.

    • In factor analysis, the loadings matrix Λ is not unique– it can be multiplied by any k× k orthogonal matrix, known as rotations.– The factor analysis algorithm finds a rotation to satisfy user-specified

    conditions; e.g., varimax.

    D G Rossiter

  • PCA 49

    Latent variables – example

    Lake Valencia, topsoils only; 15 observed variablesHypothesize two latent variables (1) particle-size distribution (texture), (2)reaction (pH, carbonates)

    > (fa

  • PCA 50

    Latent variables – interpretation

    • uniqueness: Ψ , noise left over after factors are fitted, i.e., variable isunexplained

    – here P, K, pH, CaCO3 most unique → more factors are needed

    • variance explained as in PCA. Note in PCA k = p and all variance is eventuallyexplained

    • loadings as in PCA

    – Factor 1 is associated with all sand fractions (coarse-textured soils) and bulkdensity, opposed to clay concentration and organic C

    – Factor 2 is associated with silt opposed to clay– So both latent variables are most associated with particle-size distribution;

    not according to our original hypothesis – this should be modified

    D G Rossiter

  • PCA 51

    Latent variables – plot

    Factor 1 (most variance explained) was rotated to show the maximum contrast(varimax rotation)

    D G Rossiter

  • PCA 52

    Latent variables – DTD images example

    Hypothesis: one factor, “intensity”

    Uniquenesses:CK_DTD_304 CK_DTD_306 CK_DTD_307 CK_DTD_308

    0.181 0.047 0.073 0.171

    Loadings:Factor1

    CK_DTD_304 0.905CK_DTD_306 0.976CK_DTD_307 0.963CK_DTD_308 0.910

    Factor1SS loadings 3.527Proportion Var 0.882

    Strong correlation with each image, little uniqueness, 88% of variance explained.

    Compare with PCA results.

    D G Rossiter

  • PCA 53

    Single “best” image under the hypothesis of one processCompare with PCA synthetic band 1.

    D G Rossiter

  • PCA 54

    Conclusion: PCA vs. latent variable analysis

    PCA pure data reduction, maybe can interpret

    Latent variable analysis aim is to produce interpretable variables representingthe latent process

    D G Rossiter

    1: Factor Analysis1.1. Uses of factor analysis in remote sensing

    2: Principal Components Analysis2.1. What does PCA do?2.2. Mathematics: the data matrix2.3. Mathematics: the covariance or correlation matrix2.4. Mathematics: Eigen decomposition2.5. Computation2.6. Standardized vs. unstandardized -- 1

    3: Remote sensing examples3.1. Example: Time series of diurnal temperature differences3.2. Original images3.3. Reading in the dataset to R3.4. Simple and partial correlations3.5. PCA processing in R3.6. Structure of prcomp object3.7. PCA results -- Importance of components3.8. PCA results -- rotations3.9. Screeplot3.10. Biplots3.11. PCs (synthetic bands) -- same stretch3.12. PCs (synthetic bands) -- individual stretch3.13. Another example of synthetic images

    4: PCA of Long time series to reveal seasonality4.1. Synthetic images4.2. Loadings4.3. Interpretation

    5: PCA for a multi- to hyper-variate dataset5.1. Geomorphic regions explain some variation5.2. Computing standardized PCs5.3. Screeplot5.4. Synthetic variables5.5. Biplot5.6. Interpretation of PC1 and 2

    6: Factor analysis6.1. Latent variable analysis -- concept6.2. Latent variable analysis -- computation6.3. Latent variables -- example6.4. Latent variables -- interpretation6.5. Latent variables -- plot6.6. Latent variables -- DTD images example6.7. Conclusion: PCA vs. latent variable analysis