NAMP Module 17: “Introduction to Multivariate Analysis” Tier 3, Rev.: 4
Program for North American Mobility in Higher Education
Introducing Process Integration for Environmental Control in Engineering Curricula
MODULE 17: “Introduction to Multivariate Analysis”
Created at: Ecole Polytechnique de Montreal &
North Carolina State University, 2003.
2.4: Example (3)
Shorter Timescales
Shorter timescales
The previous two examples used daily averages for the 130 process variables. However, we could just as easily have chosen weekly averages, monthly averages, or several other options.
We could also have chosen shorter timescales, such as 8-hour averages or 30-minute averages. Obviously, at some point the number of observations will become unmanageable. For instance, a spreadsheet with 3 years’ worth of 1-minute averages would have over a million lines.
Simply by choosing the timescale, you are already influencing your MVA results.
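The effect of the averaging period on dataset size can be sketched in a few lines. This is only an illustration: the function name `block_average`, the synthetic readings and the 365-day year are assumptions, not part of the original dataset.

```python
import numpy as np

def block_average(values, window):
    """Average consecutive groups of `window` samples; a trailing partial group is dropped."""
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // window
    return values[: n_blocks * window].reshape(n_blocks, window).mean(axis=1)

# One hypothetical day of 1-minute readings for a single tag (synthetic values).
rng = np.random.default_rng(0)
one_minute = 50.0 + rng.normal(0.0, 1.0, size=24 * 60)

half_hourly = block_average(one_minute, 30)    # 30-minute averages
eight_hourly = block_average(one_minute, 480)  # 8-hour averages

# Number of spreadsheet rows for 3 years of 1-minute averages (365-day years assumed).
rows_three_years = 3 * 365 * 24 * 60

print(len(half_hourly), len(eight_hourly), rows_three_years)
```

One day of 1-minute values shrinks to 48 rows at 30 minutes and 3 rows at 8 hours, while three years of 1-minute averages come to 1,576,800 rows.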
Example 3
[Figure: available timescales, from shortest to longest: 10 s, 1 min, 10 min, 1 h, 8 h, 24 h, 1 wk., 1 mo., 1 year. Annotations: pulp sampled every 2 hours; chips sampled every 8 hours.]
Choosing a timescale
The first thing we need to understand is what timescales are available. For the TMP process we have been studying, the shortest possible time period between two logged values is 10 seconds (note that not all tags are updated this frequently).
Several key values, such as wood and pulp characteristics, are only measured every few hours as shown above. These tags will be of little or no use at a very short timescale.
IMPORTANT CONCEPT: Some variables can only be studied at longer timescales, others at shorter timescales, depending on their sampling/logging frequency.
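As a rough sketch of this concept, we can screen tags by comparing their sampling interval with the chosen timescale. The 2-hour and 8-hour intervals come from the text; the tag names, the 10-second tag and the function `usable_tags` are hypothetical.

```python
# Sampling intervals in seconds. The pulp and chip intervals come from the text;
# the tag names and the 10-second tag are hypothetical.
sampling_interval_s = {
    "refiner_motor_load": 10,
    "pulp_freeness": 2 * 3600,
    "chip_moisture": 8 * 3600,
}

def usable_tags(intervals, timescale_s):
    """Return the tags that are updated at least once per averaging period."""
    return sorted(tag for tag, dt in intervals.items() if dt <= timescale_s)

print(usable_tags(sampling_interval_s, 10))         # 10-second timescale
print(usable_tags(sampling_interval_s, 24 * 3600))  # daily averages
```

At the 10-second timescale only the fast tag passes this simple test, whereas all three pass for daily averages. A real screening would also need process knowledge, not just the logging interval.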
Shortest possible timescale
For the purposes of illustration, we will use the shortest possible timescale in this example, namely 10 seconds. Because some tags are updated less frequently, we will use interpolated values for all variables, which may or may not represent reality.
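The interpolation step might look like the following minimal sketch. It assumes simple linear interpolation between readings, which may not be what the data historian actually does. The readings and variable names are made up.

```python
import numpy as np

# Hypothetical lab tag measured every 2 hours (times in seconds from the start of the day).
sample_times = np.arange(0, 24 * 3600 + 1, 2 * 3600)
sample_values = np.linspace(120.0, 132.0, len(sample_times))  # made-up readings

# Common 10-second grid used for all variables in the dataset.
grid = np.arange(0, 24 * 3600, 10)

# Linear interpolation between neighbouring readings.
interpolated = np.interp(grid, sample_times, sample_values)

measured = len(sample_times)
print(len(grid), measured)
```

Here 8,640 grid values are filled in from only 13 actual readings. Everything in between is an assumption about how the process behaved, which is why the interpolated values may or may not represent reality.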
10 seconds
To keep the size of the dataset manageable, we have taken these data over a 24-hour period, which corresponds to around 9,000 observations. Because we have over 100 tags, the resulting dataset has about one million values.
A million values per day, for only one section of the papermaking process - if we were to include the entire industrial plant over several years, we would have to analyse billions of datapoints.
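The arithmetic behind these figures is easy to check. The sketch below assumes 130 variables (the count quoted for the earlier examples) and 365-day years; the constant names are ours.

```python
SECONDS_PER_DAY = 24 * 60 * 60
LOGGING_INTERVAL_S = 10
N_VARIABLES = 130  # assumption: same variable count as in the earlier examples

observations = SECONDS_PER_DAY // LOGGING_INTERVAL_S
values_per_day = observations * N_VARIABLES

# Three years of the same plant section (365-day years assumed).
values_three_years = values_per_day * 365 * 3

print(observations, values_per_day, values_three_years)
```

This gives 8,640 observations and about 1.1 million values per day. Under these assumptions, three years of this one section alone already exceed a billion values.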
PCA of entire 24-hour period
[Figure: model overview bar chart of R2X(cum) and Q2(cum), on a scale of 0.00 to 1.00, for components 1 to 3. Model: Jun 20 02(1). 10 seconds COMPLETE WITH 45 min LAG.M1 (PCA-X), Untitled.]
Simca found numerous components; 3 were retained.
The PCA for the entire 24-hour period shows quite a strong model, with a cumulative R2 over 60%. This is misleading, however. As shown on the score plot, there is a major process excursion which has totally skewed the MVA results.
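For readers who want to see where a cumulative R2 comes from, here is a minimal sketch using a singular value decomposition on centred, unit-variance data. It is not the MVA software's implementation, the data are synthetic, and Q2 (which requires cross-validation) is not computed.

```python
import numpy as np

def cumulative_r2x(X, n_components):
    """Cumulative fraction of the variance in X explained by the first principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # centre and scale to unit variance
    singular_values = np.linalg.svd(Z, compute_uv=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return np.cumsum(explained)[:n_components]

# Synthetic stand-in for process data: 6 variables driven by 2 hidden factors plus noise.
rng = np.random.default_rng(1)
factors = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X = factors @ mixing + 0.3 * rng.normal(size=(500, 6))

r2 = cumulative_r2x(X, 3)
print(r2)
```

Each entry of `r2` corresponds to one bar of a cumulative R2X chart: the share of total variance captured once that many components are retained.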
Major process excursion
Major process excursion from 8h15 to 8h45
A review of the original data indicates that production dropped below 10 t/d during a ten-minute period (8:15 to 8:25). The cause was a major refiner blockage known as a “feedguard event”, which makes the refiner motor shut down.
Exclude process excursion
The process excursion sticks out like a sore thumb on the score plot. This means that the process temporarily went to a radically different “place” or operating regime, where relationships between the variables are different.
Trying to do PCA on several different operating regimes all at once is a waste of time. The software will try to establish the correlations between the different variables, and if these correlations change abruptly the results will be useless. The way to get around this problem is to divide the observations into different operating regimes, and study each regime separately.
In this case we will remove the low production period to prevent it from skewing the rest of the results.
Sticking out like a sore thumb…or a solar flare
We removed the entire period when the process was perturbed (8:10 to 8:45) and did a PCA on the rest of the observations.
Interestingly, the R2 values went down slightly. This is because many of the variables changed abruptly all together when the process was shut down, making it look like they were “correlated” with each other.
Remember, MVA knows nothing about the process, and just uses the data as it is.
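The effect can be reproduced with synthetic data. In this sketch, eight variables that are otherwise unrelated all drop together during a simulated excursion, and we compare the variance explained by the first component with and without that period. The data, the size of the drop and the use of seconds since midnight are all assumptions made for illustration.

```python
import numpy as np

def first_component_r2(X):
    """Fraction of total variance explained by the first principal component."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    s = np.linalg.svd(Z, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

rng = np.random.default_rng(2)
t = np.arange(0, 24 * 3600, 10)       # 10-second timestamps (seconds since midnight)
X = rng.normal(size=(len(t), 8))      # 8 unrelated synthetic variables

# Simulated excursion: every variable drops at once between 8:10 and 8:45.
excursion = (t >= 8 * 3600 + 10 * 60) & (t < 8 * 3600 + 45 * 60)
X[excursion] -= 10.0

r2_all = first_component_r2(X)               # excursion included
r2_normal = first_component_r2(X[~excursion])  # excursion removed

print(round(r2_all, 2), round(r2_normal, 2))
```

With the excursion included, the first component explains most of the variance even though the variables have nothing to do with each other during normal operation. Once the excursion rows are masked out, that apparent correlation disappears.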
PCA with process excursion removed
[Figure: model overview bar chart of R2X(cum) and Q2(cum), on a scale of 0.00 to 1.00, for components 1 to 3. Model: Jun 20 02(1). 10 seconds COMPLETE WITH 45 min LAG.M2 (PCA-X), Extreme outliers removed.]
Score plot of normal operation
Now that we have removed the process upset, the score plot takes on an entirely different character.
There is now an obvious time trend. During our 24-hour period, the process “snakes” around in multi-dimensional space. It is a moving target.
Almost all process data show this characteristic, because a real process is never really in steady state. The process control systems are constantly responding to outside perturbations, like changes in feed material quality. Operator intervention is another source of perturbation. There are many others. One operating goal is to maintain the “snake” within a certain desirable zone.
Whereas score plots for longer, averaged periods generally resemble clouds, score plots for short timescales resemble snakes.
[Figure: score plot annotated with Start: 01:00, End: 00:59 and “Obvious time trend…”.]
Score plot showing time trend
What is the significance?
This “snaking” of the process at short timescales is highly significant. This was not seen when using the daily averages.
By looking at which variables are changing with time, we can get tremendous insight into the process dynamics. One way to do this is to compare the contribution plots (like we saw in Example 2) at different times.
Contribution plots for the start and end points of our 24-hour period are shown on the next page. Obviously it is impossible to read the names of all the variables, but that is not the point. Just look at the bar graphs. They are very different, indicating a continuous change in operating regime from start to finish.
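To give a feel for what a contribution plot contains, the sketch below computes a simplified contribution for one observation: the scaled deviation of each variable from the average, weighted by the size of that variable's loadings on the first two components. The exact weighting used by the MVA software may differ, and the data and function name are made up.

```python
import numpy as np

def score_contributions(X, obs_index, n_components=2):
    """Simplified contribution of each variable to one observation's deviation from the average."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:n_components].T                    # shape: (variables, components)
    weight = np.sqrt(np.sum(loadings ** 2, axis=1))   # loading magnitude per variable
    return Z[obs_index] * weight

# Synthetic 5-variable process that drifts over the period.
rng = np.random.default_rng(3)
n_obs, n_vars = 1000, 5
drift = np.linspace(-1.0, 1.0, n_obs)
X = rng.normal(scale=0.2, size=(n_obs, n_vars))
X[:, 0] += 3.0 * drift   # variable 0 rises during the period
X[:, 1] -= 3.0 * drift   # variable 1 falls during the period

start = score_contributions(X, 0)
end = score_contributions(X, n_obs - 1)
print(np.round(start, 2))
print(np.round(end, 2))
```

The bars for variables 0 and 1 change sign between the first and last observation, which is the kind of difference we look for when comparing contribution plots at two points in time.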
[Figure: score contribution plot, Score Contrib(Obs 457 - Average), Weight=p1p2, plotted against Var ID (Primary). Model: Jun 20 02(1). 10 seconds COMPLETE WITH 45 min LAG.M3 (PCA-X), More extreme outliers removed.]
[Figure: score contribution plot, Score Contrib(Obs 7910 - Average), Weight=p1p2, plotted against Var ID (Primary). Model: Jun 20 02(1). 10 seconds COMPLETE WITH 45 min LAG.M3 (PCA-X), More extreme outliers removed.]
Time trend within the process
Contribution plots at 01:00 (start) and 00:59 (end)
Studying the “snake”
To gain further insight, we can colour-code the observations on the score plot. We did something similar in Example 1, when we colour-coded the days to show the seasons. This is very easy to do with modern MVA software.
Here we have modified the score plot to show which range each observation falls in for one of the variables. We have chosen “freeness”, an important pulp quality parameter which the process control systems try to maintain at a constant value. We could have chosen any variable.
Note that during the course of our 24-hour period, the freeness starts high, then gets lower, then goes back up again. Someone with an intimate knowledge of the process could gain insight from this result.
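Outside the MVA software, the same colour-coding idea can be sketched by binning the chosen variable into ranges. The freeness values, bin edges, range labels and colours below are all invented for illustration.

```python
import numpy as np

# Hypothetical freeness readings, one per observation on the score plot.
freeness = np.array([152.0, 148.5, 141.0, 133.2, 129.9, 136.4, 144.8, 151.3])

# Three ranges: below 135, from 135 up to 145, and 145 or above (made-up edges).
edges = [135.0, 145.0]
labels = ["low", "medium", "high"]
colours = {"low": "blue", "medium": "green", "high": "red"}

bins = np.digitize(freeness, edges)          # 0, 1 or 2 for each observation
point_labels = [labels[b] for b in bins]
point_colours = [colours[label] for label in point_labels]

print(point_labels)
```

The resulting list of colours can be passed to any plotting routine, one colour per point on the score plot. In this made-up series the labels run from high to low and back to high, mirroring the pattern described above.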
Exactly the same score plot, coloured for pulp “freeness”
Score plot coloured for “freeness”
Same plot, showing 3rd component
[Figure: 3-D score plot with axes Component 1, Component 2 and Component 3.]
Score plot in 3-D
MVA “foresight”
Another powerful use of MVA over short timescales is to predict problems before they become more widely visible.
The residuals plot on the next page tells the whole story. Remember we said that the refiner shut down at 8:15 due to a blockage? It is obvious that the process started to move away from normal operation well before then. The operators tend to look at a handful of key variables when monitoring the process, but MVA looks at all the variables at the same time and is therefore much more sensitive.
An analogy would be a seismometer being used to predict volcanic eruptions.
A seismometer is extremely sensitive to the slightest vibrations.
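The idea behind a residuals plot can be sketched as follows: fit a PCA model on normal operation, then measure how far each new observation lies from that model. The quantity below is only DModX-like (the software's exact normalisation is not reproduced), and the process, the disturbance and the function names are synthetic assumptions.

```python
import numpy as np

# Fixed, made-up mixing of 2 hidden factors into 10 variables.
MIXING = np.array([[1.0] * 5 + [0.5] * 5,
                   [0.8, -0.8] * 5])

def simulate(n, rng):
    """Synthetic 'normal operation': 2 hidden factors plus small measurement noise."""
    latent = rng.normal(size=(n, 2))
    return latent @ MIXING + 0.1 * rng.normal(size=(n, MIXING.shape[1]))

def residual_distance(X_ref, X_new, n_components):
    """RMS residual of each row of X_new from a PCA model fitted on X_ref."""
    mean = X_ref.mean(axis=0)
    std = X_ref.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd((X_ref - mean) / std, full_matrices=False)
    P = vt[:n_components].T                   # loadings
    Z_new = (X_new - mean) / std
    residual = Z_new - (Z_new @ P) @ P.T      # part not explained by the model
    return np.sqrt(np.mean(residual ** 2, axis=1))

rng = np.random.default_rng(4)
normal = simulate(2000, rng)

# Build-up: 100 normal points, then a slowly growing disturbance in one variable.
build_up = simulate(300, rng)
build_up[:, 7] += np.concatenate([np.zeros(100), np.linspace(0.0, 3.0, 200)])

limit = np.percentile(residual_distance(normal, normal, 2), 95)
d = residual_distance(normal, build_up, 2)

early = np.mean(d[:50] > limit)    # fraction of early points above the limit
late = np.mean(d[-50:] > limit)    # fraction of late points above the limit
print(early, late)
```

Early in the build-up only a small fraction of points exceed the limit, as expected by chance. By the end every point does, even though the disturbance sits in a single variable that an operator might not be watching.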
Residuals plot showing MVA “foresight”
Build-up to 8h15 – something is happening to the process!
Using shorter timescales
By now it should be clear that doing MVA at shorter timescales is totally different from studying averages taken over longer timespans. Once again, we conclude that the best solution is to try many different approaches. No single MVA approach will provide all the answers we are seeking.
Part of the power of this technique is the way completely different results can be obtained from exactly the same database, simply by “slicing and dicing” the data in various ways:
• Longer vs. shorter timescales
• More vs. fewer variables
• PCA vs. PLS
MVA is just a “black box”. Its use MUST be driven by an understanding of the process being studied, otherwise it is just meaningless number-crunching.
“Number Cruncher”
End of Example 3:
One step at a time…
End of Tier 2
Congratulations!
This is the end of Tier 2. Obviously the details of these examples are hard to grasp for a first-timer, but hopefully some of the overall patterns are starting to emerge. A true understanding of MVA can only come by actually doing it on your own, which is the purpose of Tier 3.
All that is left is to complete the short quiz that follows…
Tier 2 Quiz
Question 1:
What is the difference between a tag and a variable?
a) The words “tag” and “variable” are synonyms.
b) A tag is an identity label or address, while a variable is an attribute of the process.
c) Tags change with time, but variables are fixed.
d) Variables measure similar attributes, while tags measure dissimilar attributes.
e) Answers (b) and (c).
Question 2:
Does averaging reduce or increase noise?
a) Averaging increases noise significantly.
b) Averaging increases noise, but only slightly.
c) Averaging does not affect noise.
d) Averaging reduces noise.
e) Averaging reduces noise, but increases the likelihood of outliers.
Question 3:
What is the danger of interpolating between readings that are far apart in time?
a) The interpolation will give far more weight to these individual readings than they deserve.
b) The interpolated values will indicate slow upward and downward trends where there are none.
c) The effect of outliers will be enhanced many-fold.
d) The engineer will have the false sense of comparing variables that are similar, when in fact they are very different.
e) All of the above.
Question 4:
If interpolation is such a problem, then why can’t we just use the discrete values instead?
a) This would give far too much weight to periods with a large number of discrete values.
b) Discrete values must be averaged to have meaning.
c) No tag is ever truly discrete.
d) Discrete values have no time signature.
e) Answers (b) and (c).
Question 5:
What is the difference between a process lag and a delayed reading?
a) One is caused by the process itself, the other by the measurement instruments.
b) They are the same thing.
c) A process lag is due to residence time, while a delayed reading is due to the time required for sampling, measurement and recording.
d) One is much longer than the other.
e) Answers (a) and (c).
Question 6:
Why does the MVA software reject variables that do not change enough with time?
a) Only variables which are part of the “experiment” are permitted.
b) Tags change with time, but these variables are fixed.
c) There are insufficient data points.
d) If a variable does not change with time, then it cannot be correlated to any other variables.
e) None of the above.
Question 7:
What should you do if your initial PCA gives a score plot with two distinct and separate data clouds?
a) Study each data cloud separately.
b) Try to determine what these two clouds represent.
c) Ignore the first component, which is probably being artificially induced by the two clouds.
d) Do an MVA on the entire dataset.
e) Answers (a), (b) and (c).
Question 8:
Your residual (“DModX”) plot shows several moderate outliers. What should you do?
a) Remove them and continue.
b) Leave them in and continue.
c) Study their contribution plots.
d) Look at the original data to try to determine the cause.
e) Answers (c) and (d).
Question 9:
Two variables are located in opposite corners of your PCA loadings plot (components 1 and 2). What do you conclude?
a) These variables are uncorrelated with each other.
b) These variables are negatively correlated with each other.
c) These variables contribute to both the first and second components.
d) These variables contribute to neither the first nor the second component.
e) Answers (b) and (c).
Question 10:
Theoretically, on average what proportion of residuals should be above the 95% confidence line? (the red line on the “DModX” plot)
a) Exactly 0.05%.
b) Exactly 5%.
c) More than 5%.
d) Less than 5%.
e) Depends on the dataset.
TIER 3:
Open-Ended Problem
Tier 3: Statement of intent:
The goal of Tier 3 is to finally allow the student to do MVA independently, though in a controlled context. At the end of Tier 3, the student should know how to do the following:
• Prepare a spreadsheet for use in MVA
• Import spreadsheet into MVA software
• Set up dataset within MVA software
• Create simple PCA plots
• Identify and investigate major and moderate outliers
• Create and interpret more elaborate PCA plots
In order to avoid losing the student along the way, each of these steps is broken down into a series of sub-steps with clear instructions.
Open Problem
Tier 3 is broken down into four sections:
3.1 Problem Statement and Dataset
3.2 Preparing and Importing the Spreadsheet
3.3 Initial MVA Results
3.4 Outliers and More Elaborate MVA plots
Unlike the previous two sections, Tier 3 has no quiz. The student must submit the results of the above work in a succinct project report (10-15 pages).
Tier 3: Contents
3.1: Problem Statement and Dataset
Problem Statement
You are the process engineer at the TMP mill from the Tier 2 examples. Your boss, the plant manager, wants to know why the pulp has different properties in the summer than in the winter.
You decide to start by generating PCA results for two different datasets, one taken during the summer, the other during the winter, and then comparing them to each other.
Summer/Winter datasets
After talking to the operators, you decide to take two full weeks of data for 15 key tags, using 1-hour averages.
Your data have already been imported by an IT technician into a standard spreadsheet program. The two files are:
• Summerdata.xls
• Winterdata.xls
Open these files, and have a look at the data. Can you tell anything about the summer/winter question just by looking?
Of course not!
These are the actual data files you are going to use!
3.2: Preparing and Importing the Spreadsheet
Preparing the spreadsheet
As you can see, the spreadsheet has two names for each variable:
• long descriptive name, and
• short “tag” for easy identification on the MVA graphs.
We want to do something similar with the individual observations. The full time signature is too long, and will make the score plots impossible to read. Besides, we already know which year and month it is. This is not useful information. We therefore want to insert a column to the right of the time signature, which gives the number of hours from the start of the two-week period.
Do this now, for both spreadsheets. When you are done, save them under a new name.
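If you prefer to compute the new column outside the spreadsheet, the calculation is simply the elapsed time since the first row. The sketch below assumes a particular time-signature format and uses made-up dates; the format in the actual files may differ.

```python
from datetime import datetime

def hours_from_start(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Convert time signatures to whole hours elapsed since the first one."""
    times = [datetime.strptime(ts, fmt) for ts in timestamps]
    start = times[0]
    return [int((t - start).total_seconds() // 3600) for t in times]

# Hypothetical time signatures; the real format in the spreadsheets may differ.
stamps = ["2002-07-01 00:00", "2002-07-01 01:00", "2002-07-02 00:00", "2002-07-14 23:00"]
hours = hours_from_start(stamps)
print(hours)  # [0, 1, 24, 335]
```

For two full weeks of 1-hour averages, the new column runs from 0 to 335.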
Importing the spreadsheet
Now we are ready to open the MVA software. Do it now.
The first thing we need to do is import the data. Go to “File: import data”, and select your newly renamed file for summer.
The software will ask you a series of questions. Answer them according to the instructions on Page 2 of the spreadsheet file. One of these steps involves saving the new dataset as an MVA file.
Repeat this operation for the winter spreadsheet.
3.3: Initial MVA Results
Initial MVA results
Re-open the summer file, and create the following plot:
• Model bar chart
How many components does the software suggest? Usually for this kind of initial exercise, keeping 3 components is normal. Eliminate the components you do not intend to use.
Now create the following basic PCA plots:
• Score plots: t(1) vs. t(2)
What do you notice about the results? Right! There are major outliers.
Now do the same for the winter dataset.
Copy each plot by right-clicking and paste it into your word processor file. All these plots must appear in your report.
3.4: Outliers and More Elaborate MVA Plots
Investigating Outliers
The summer data contain a major process excursion that is clearly visible on the score plot. Looking at the original data, try to determine the cause.
Once you are satisfied, remove the outliers and save the new model.
The winter data look OK on the score plot, but that is not the entire story. Generate the following residuals plot:
• DModX
What do you notice? Right! There is one major outlier. Create a contribution plot to investigate:
• Contribution plot
What do you conclude? Remove this point and continue.
Comparing Summer and Winter
Now we are ready to compare the summer and winter results. Create the following basic PCA plots:
• Score plots: t(1) vs. t(2); t(1) vs. t(3); 3-D plot
• Loadings plot: p(1) vs. p(2); p(1) vs. p(3); 3-D plot
Do you notice any major differences between summer and winter?
Of course you do! What are they?
And what does this imply about the cause of the summer/winter process differences?
Drawing your conclusions
Now you have something to report to your boss…
More Elaborate MVA Plots
To get familiar with some of the other MVA outputs, create the following for the final summer and winter datasets:
• DModX
• X/Y Contribution Plot
• Residuals distribution
• …
• …
What do these plots indicate to you? Don’t worry about finding the “right” answer, just try to figure out what these plots are trying to tell us. However, you must justify your answers. Don’t just guess.
Don’t just guess!
End of Tier 3
Congratulations!
This is the end of Module 17. Please submit your report to your professor for grading.
We are always interested in suggestions on how to improve the course. You may contact us at www.namppimodule.org