SigClust Gaussian null distribution - Simulation
Now simulate from null distribution using
$X_{ij} \sim N(0, \hat{\lambda}_j)$, $i = 1, \dots, n$, $j = 1, \dots, d$ (indep.)
Again rotation invariance makes this work
(and location invariance)
SigClust Gaussian null distribution - Simulation
Then compare data CI
with simulated null population CIs
• Spirit similar to DiProPerm
• But now significance happens for smaller values of CI
An example (details to follow): p-val = 0.0045
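The simulation recipe above can be sketched in code. This is a minimal illustration, assuming a plain 2-means cluster index and using raw sample covariance eigenvalues for the null; real SigClust additionally adjusts the eigenvalue estimates for background noise, which is omitted here:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def cluster_index(X, seed=1):
    """2-means Cluster Index (CI): within-cluster sum of squares about the
    two cluster means, divided by total sum of squares about the mean."""
    centroids, labels = kmeans2(X, 2, minit='++', seed=seed)
    within = sum(((X[labels == k] - centroids[k]) ** 2).sum() for k in (0, 1))
    total = ((X - X.mean(axis=0)) ** 2).sum()
    return within / total

def sigclust_pvalue(X, n_sim=100, seed=0):
    """Simulate the Gaussian null N(0, diag(lambda_hat)), with lambda_hat the
    sample covariance eigenvalues, and count how often a null CI is as small
    as the data CI.  (Simplified: no background-noise eigenvalue adjustment.)"""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lam = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0.0, None)
    ci_data = cluster_index(X)
    ci_null = np.array([cluster_index(rng.standard_normal((n, d)) * np.sqrt(lam))
                        for _ in range(n_sim)])
    return (ci_null <= ci_data).mean()
```

For two well-separated Gaussian clusters the data CI falls below essentially all null CIs, giving a tiny p-value.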
SigClust Real Data Results
Summary of Perou 500 SigClust Results:
• Lum & Norm vs. Her2 & Basal: p-val = 10^-19
• Luminal A vs. B: p-val = 0.0045
• Her2 vs. Basal: p-val = 10^-10
• Split Luminal A: p-val = 10^-7
• Split Luminal B: p-val = 0.058
• Split Her2: p-val = 0.10
• Split Basal: p-val = 0.005
HDLSS Asymptotics
Modern Mathematical Statistics:
Based on asymptotic analysis,
i.e. uses limiting operations such as $\lim_{n \to \infty}$
Almost always
Occasional misconceptions:
• Indicates behavior for large samples
• Thus only makes sense for "large" samples
• Models phenomenon of "increasing data"
• So other flavors are useless
HDLSS Asymptotics
Modern Mathematical Statistics:
Based on asymptotic analysis
Real Reasons:
• Approximation provides insights
• Can find simple underlying structure
• In complex situations
Thus various flavors are fine,
even desirable (find additional insights):
$\lim_{n \to \infty}$, $\lim_{d \to \infty}$, $\lim_{d \to \infty} \lim_{n \to \infty}$, $\lim_{n \to \infty} \lim_{d \to \infty}$, …
HDLSS Asymptotics Simple Paradoxes
For $d$-dim'al Standard Normal dist'n:
$Z = (Z_1, \dots, Z_d)^t \sim N_d(0, I_d)$
Where are Data?
Near Peak of Density?
(Thanks to psycnet.apa.org)
HDLSS Asymptotics Simple Paradoxes
As $d \to \infty$:
- Data lie roughly on surface of sphere, with radius $\sqrt{d}$, since $\|Z\| = \sqrt{d} + O_p(1)$
- Yet origin is point of highest density
- Paradox resolved by:
density w.r.t. Lebesgue Measure
HDLSS Asymptotics Simple Paradoxes
- Paradox resolved by:
density w.r.t. Lebesgue Measure
Lebesgue Measure pushes mass out
Density pulls data in
Radius $\sqrt{d}$ is the balance point
HDLSS Asymptotics Simple Paradoxes
As $d \to \infty$: $\|Z\| = \sqrt{d} + O_p(1)$
Important Philosophical Consequence:
"Average People"
Parents' Lament:
Why Can't I Have Average Children?
Theorem: Impossible (over many factors)
HDLSS Asymptotics Simple Paradoxes
Distance tends to non-random constant:
$\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$
• Factor $\sqrt{2}$ since $\mathrm{sd}(X_1 - X_2)^2 = \mathrm{sd}(X_1)^2 + \mathrm{sd}(X_2)^2$
Can extend to $Z_1, \dots, Z_n$
HDLSS Asymptotics Simple Paradoxes
For $d$-dim'al Standard Normal dist'n:
$Z_2 \sim N_d(0, I_d)$, indep. of $Z_1$
High dim'al Angles (as $d \to \infty$):
$\mathrm{Angle}(Z_1, Z_2) = 90^\circ + O_p(d^{-1/2})$
- Everything is orthogonal
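These three concentration statements (norm near $\sqrt{d}$, pairwise distance near $\sqrt{2d}$, angle near $90^\circ$) are easy to check numerically; a small illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100_000
z1, z2 = rng.standard_normal(d), rng.standard_normal(d)

norm1 = np.linalg.norm(z1)             # concentrates at sqrt(d) + O_p(1)
dist12 = np.linalg.norm(z1 - z2)       # concentrates at sqrt(2d) + O_p(1)
cosine = z1 @ z2 / (norm1 * np.linalg.norm(z2))
angle = np.degrees(np.arccos(cosine))  # concentrates at 90 + O_p(d^{-1/2}) degrees
```

For $d = 100{,}000$ the norm lands within a few units of $\sqrt{d} \approx 316$, and the angle within a fraction of a degree of $90^\circ$.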
HDLSS Asy's Geometrical Represent'n
Assume $Z_1, \dots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$
Study Subspace Generated by Data
• Hyperplane through 0, of dimension $n$
• Points are "nearly equidistant to 0", & dist $\approx \sqrt{d}$
• Within plane, can "rotate towards Unit Simplex"
• All Gaussian data sets are "near Unit Simplex Vertices"
• "Randomness" appears only in rotation of simplex
Hall, Marron & Neeman (2005)
HDLSS Asy's Geometrical Represent'n
Assume $Z_1, \dots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$
Study Hyperplane Generated by Data
• $n - 1$ dimensional hyperplane
• Points are pairwise equidistant, dist $\approx \sqrt{2d}$
• Points lie at vertices of ($\sqrt{2d}$-scaled) "regular $n$-hedron"
• Again, "randomness in data" is only in rotation
• Surprisingly rigid structure in random data
HDLSS Asy's Geometrical Represent'n
Simulation View: study "rigidity after rotation"
• Simple 3 point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors
HDLSS Asy's Geometrical Represent'n
Simulation View Shows "Rigidity after Rotation"
HDLSS Asy's Geometrical Represent'n
Now Recall HDLSS Simulation Results,
Comparing DWD, SVM & Others, from 10/21/14
HDLSS Discrim'n Simulations
Main idea:
Comparison of:
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, aka Centroid)
Linear versions, across dimensions
HDLSS Discrim'n Simulations
Overall Approach:
• Study different known phenomena
 – Spherical Gaussians
 – Outliers
 – Polynomial Embedding
• Common Sample Sizes: $n_+ = n_- = 25$
• But wide range of dimensions: $d = 10, 40, 100, 400, 1600$
HDLSS Discrim'n Simulations
Spherical Gaussians
HDLSS Discrim'n Simulations
Outlier Mixture
HDLSS Discrim'n Simulations
Wobble Mixture
HDLSS Discrim'n Simulations
Nested Spheres
HDLSS Discrim'n Simulations
…
Interesting Phenomenon:
All methods come together
in very high dimensions
HDLSS Discrim'n Simulations
Can we say more about:
All methods come together
in very high dimensions?
Mathematical Statistical Question:
Mathematics behind this?
(Use Geometric Representation)
HDLSS Asy's Geometrical Represent'n
Explanation of Observed (Simulation) Behavior:
"everything similar for very high d"
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
HDLSS Asy's Geometrical Represent'n
Straightforward Generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi, 2007)
All based on simple "Laws of Large Numbers"
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007):
Assume 2nd Moments
Assume no eigenvalues too large, in sense:
For $\varepsilon = \frac{\left(\sum_{j=1}^{d} \lambda_j\right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$,
assume $\varepsilon \gg \frac{1}{d}$ ($\frac{1}{d}$ = min possible),
i.e. $\frac{\sum_{j=1}^{d} \lambda_j^2}{\left(\sum_{j=1}^{d} \lambda_j\right)^2} \to 0$ as $d \to \infty$
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background:
In classical multivariate analysis, the statistic
$\varepsilon = \frac{\left(\sum_{j=1}^{d} \lambda_j\right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$
is called the "epsilon statistic",
and is used to test "sphericity" of dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic satisfies $\frac{1}{d} \le \varepsilon \le 1$
• For spherical Normal: $\varepsilon = 1$
• Single extreme eigenvalue gives $\varepsilon \approx \frac{1}{d}$
• So assumption $\varepsilon \gg \frac{1}{d}$ is very mild
• Much weaker than mixing conditions
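The epsilon statistic and its two benchmark values (spherical vs. one dominant eigenvalue) are simple to compute; a small sketch:

```python
import numpy as np

def epsilon_statistic(eigvals):
    """Sphericity 'epsilon statistic': (sum lambda)^2 / (d * sum lambda^2).
    Always in [1/d, 1]; equals 1 iff all eigenvalues are equal (spherical)."""
    lam = np.asarray(eigvals, dtype=float)
    return lam.sum() ** 2 / (lam.size * (lam ** 2).sum())

d = 1000
eps_sphere = epsilon_statistic(np.ones(d))                  # spherical: exactly 1
eps_spike = epsilon_statistic(np.r_[1e6, np.ones(d - 1)])   # one dominant: ~ 1/d
```

With one eigenvalue of $10^6$ against 999 unit eigenvalues, the statistic sits essentially at its minimum value $1/d$.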
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007):
Assume 2nd Moments,
assume no eigenvalues too large
Then, as $d \to \infty$:
$\|X_i - X_j\| = \sqrt{d}\, O_p(1)$
Not so strong as before: $\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$
2nd Paper on HDLSS Asymptotics
Can we improve on $\|X_i - X_j\| = \sqrt{d}\, O_p(1)$?
John Kent example: Normal scale mixture,
$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$, i.i.d. vectors
Won't get $\|X_i - X_j\| = C \sqrt{d} + O_p(1)$
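A quick numerical check of Kent's example (here with the mixture component of each vector fixed rather than sampled, for a deterministic illustration): the scaled distances fall near three different constants depending on which components the pair came from, so no single $C$ works:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
d = 100_000
# Kent's scale mixture: each vector is N(0, I_d) or N(0, 100 I_d), w.p. 1/2.
# For the demo, fix 5 vectors of each type instead of sampling the labels.
sigma = np.r_[np.ones(5), 10.0 * np.ones(5)]
X = rng.standard_normal((10, d)) * sigma[:, None]

scaled = np.array([np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
                   for i, j in combinations(range(10), 2)])
# scaled distances sit near sqrt(2) ~ 1.41 (both small-variance pairs),
# sqrt(101) ~ 10.05 (mixed pairs), or sqrt(200) ~ 14.14 (both large-variance):
# so ||Xi - Xj|| is O_p(sqrt(d)) but NOT C*sqrt(d) + O_p(1) for one constant C
```

Each of the three clusters of values is itself tightly concentrated (each pair's distance has $O_p(1)$ fluctuation), which is exactly the failure mode: concentration happens, but around three different constants.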
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using:
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal Scale Mixture, $X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$:
• Data Vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply Independence
0 Covariance is not independence
Simple Example:
• Random Variables $X$ and $Y$
• Make both Gaussian: $X, Y \sim N(0, 1)$
(Note: Not Using Multivariate Gaussian)
• With strong dependence
• Yet 0 covariance
Given $c > 0$, define
$Y = \begin{cases} X, & |X| > c \\ -X, & |X| \le c \end{cases}$
0 Covariance is not independence
Simple Example: choose $c$ to make $\mathrm{cov}(X, Y) = 0$
• Distribution is degenerate
• Supported on diagonal lines $y = \pm x$
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small $c$, have $\mathrm{cov}(X, Y) > 0$
• For large $c$, have $\mathrm{cov}(X, Y) < 0$
• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$
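The balancing cutoff $c$ can be computed explicitly from normal tail moments; a sketch, using the standard integration-by-parts identity $E[X^2; |X| > c] = 2(c\,\varphi(c) + 1 - \Phi(c))$ for the branch convention $Y = X$ on $|X| > c$, $Y = -X$ on $|X| \le c$:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def cov_xy(c):
    """cov(X, Y) for X ~ N(0,1), Y = X on {|X| > c}, Y = -X on {|X| <= c}.
    cov = E[X^2; |X| > c] - E[X^2; |X| <= c] = 2*E[X^2; |X| > c] - 1."""
    tail = 2.0 * (c * norm.pdf(c) + 1.0 - norm.cdf(c))
    return 2.0 * tail - 1.0

c_star = brentq(cov_xy, 0.1, 3.0)   # the balancing cutoff (roughly 1.5)

# Monte Carlo check: Y has an N(0,1) marginal (by symmetry), is completely
# dependent on X (|Y| = |X| exactly), yet has covariance ~ 0 with X
rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)
y = np.where(np.abs(x) > c_star, x, -x)
emp_cov = np.mean(x * y)            # = E[XY]; both means are 0
```

The sign change is visible directly: `cov_xy` is near $+1$ for small $c$ and near $-1$ for large $c$, so the root exists by continuity, exactly as argued above.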
0 Covariance is not independence
Result:
• Joint distribution of $X$ and $Y$:
 – Has Gaussian marginals
 – Has $\mathrm{cov}(X, Y) = 0$
 – Yet strong dependence of $X$ and $Y$
 – Thus not multivariate Gaussian
Shows: Multivariate Gaussian means more than Gaussian Marginals
HDLSS Asy's Geometrical Represent'n
Further Consequences of Geometric Represent'n:
1. DWD more stable than SVM (based on deeper limiting distributions)
 (reflects intuitive idea of feeling sampling variation)
 (something like mean vs. median)
 Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified
 Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample size (motivates weighted version)
 Qiao et al. (2010)
HDLSS Math. Stat. of PCA
Consistency & Strong Inconsistency
(Study Properties of PCA, in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
Spike Covariance Model, Paul (2007):
For Eigenvalues: $\lambda_1 = d^{\alpha}$, $\lambda_2 = \cdots = \lambda_d = 1$, as $d \to \infty$
Note: Critical Parameter $\alpha$
1st Eigenvector $u_1$: Turns out Direction Doesn't Matter
How Good are Empirical Versions, $\hat{\lambda}_1, \dots, \hat{\lambda}_d, \hat{u}_1$, as Estimates?
HDLSS Math. Stat. of PCA
Consistency (big enough spike):
For $\alpha > 1$: $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$
Strong Inconsistency (spike not big enough):
For $\alpha < 1$: $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$
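Both regimes are easy to see in simulation; a sketch under the single-spike model (the sample size, dimension, and seed here are arbitrary choices for illustration, not values from the slides):

```python
import numpy as np

def pc1_angle_deg(d, alpha, n=20, seed=5):
    """Angle (degrees) between sample PC1 and the true first eigendirection
    e1, under the spike model lambda_1 = d^alpha, lambda_2 = ... = 1
    (mean is known to be 0 here, so no centering step)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2.0)          # first coordinate carries the spike
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cosine = min(abs(Vt[0, 0]), 1.0)       # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cosine)))

d = 200_000
a_consistent = pc1_angle_deg(d, alpha=1.5)   # alpha > 1: angle near 0
a_inconsistent = pc1_angle_deg(d, alpha=0.5) # alpha < 1: angle pushed toward 90
```

Even at a finite $d$ the separation is stark: the $\alpha = 1.5$ angle is a degree or so, while the $\alpha = 0.5$ angle is well on its way to $90^\circ$.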
HDLSS Math. Stat. of PCA
Intuition: Random Noise ~ $d^{1/2}$
For $\alpha > 1$ (Recall $d^{\alpha}$ on Scale of Variance):
Spike Pops Out of Pure Noise Sphere
For $\alpha < 1$:
Spike Contained in Pure Noise Sphere
HDLSS Math. Stat. of PCA
Consistency of eigenvalues?
$\frac{\hat{\lambda}_1}{\lambda_1} \xrightarrow{L} \frac{\chi^2_n}{n}$
• Eigenvalues Inconsistent
• But Known Distribution
• Consistent when $n \to \infty$ as Well
HDLSS Math. Stat. of PCA
Conditions for Geo Rep'n & PCA Consist.:
John Kent example:
$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, d\, I_d)$
Can only say: $\|X\| = d^{1/2}\, O_p(1)$ w.p. 1/2, $= 10\, d\, O_p(1)$ w.p. 1/2 (not deterministic)
PCA Conditions Same, since Noise Still $O_p(d^{1/2})$
But for Geo Rep'n, need some Mixing Cond.
Conclude: Need some Mixing Condition
Mixing Conditions
Idea From Probability Theory:
Recall Standard Asymptotic Results, as $n \to \infty$:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions (Usually Ignored),
e.g. Independent and Ident. Dist'd
Explore Weaker Assumptions to Still Get:
• Law of Large Numbers
• Central Limit Theorem
Mixing Conditions:
• A Whole Area in Probability Theory
• a Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better, Newer References?
Mixing Conditions
Mixing Condition Used Here: Rho-Mixing
For Random Variables $X_1, X_2, \dots$, define
$\rho(k) = \sup \left\{ |\mathrm{corr}(f, g)| : f \in L^2(\mathcal{F}_1^i),\ g \in L^2(\mathcal{F}_{i+k}^{\infty}),\ i \ge 1 \right\}$
where $\mathcal{F}_1^i$, $\mathcal{F}_{i+k}^{\infty}$ are the Sigma-Fields Generated by:
• $X_1, \dots, X_i$
• $X_{i+k}, X_{i+k+1}, \dots$
• Note: Gap of Lag $k$
Assume: $\rho(k) \to 0$ as $k \to \infty$
Idea: Uncorrelated at Far Lags
HDLSS Math. Stat. of PCA
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors $X = (X_1, X_2, \dots, X_d)^t$ are $\rho$-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused)
HDLSS Math. Stat. of PCA
Conditions for Geo Rep'n:
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)
Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering,
Not Always Clear, e.g. Microarrays
HDLSS Math. Stat. of PCA
Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$
(Note: Not assumed Gaussian)
Define Standardized Version: $Z_d = \Lambda_d^{-1/2} U_d^t X_d$
Assume ∃ a permutation,
so that (the entries of) $Z_d$ are $\rho$-mixing
HDLSS Math. Stat. of PCA
Careful look at PCA Consistency: $\alpha$ spike
(Reality Check, Suggested by Reviewer)
Condition $\alpha > 1$ is Independent of Sample Size,
So true for n = 1 (?!)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice
HDLSS Math. Stat. of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise
Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180
Manually Brushed Clusters:
Clear Alternate Splicing, Not Noise
Functional Data Analysis
HDLSS Math. Stat. of PCA
Recall Theoretical Separation:
• Strong Inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
Mathematically Driven Conclusion:
Real Data Signals Are This Strong
HDLSS Math. Stat. of PCA
An Interesting Objection:
Should not Study Angles in PCA?
Recall, for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$
For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$
HDLSS Math. Stat. of PCA
An Interesting Objection:
Should not Study Angles in PCA,
Because PC Scores (i.e. projections) Not Consistent
For Scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$
(What we study in PCA scatterplots)
and $s_{ij} = P_{v_j} x_i$:
Can Show $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j \ne 1$ (Random)
Thanks to Dan Shen
HDLSS Math. Stat. of PCA
PC Scores (i.e. projections) Not Consistent,
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors": $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j$,
Same Realization of $R_j$ for $i = 1, \dots, n$
Axes have Inconsistent Scales,
But Relationships are Still Useful
HDLSS Deep Open Problem
In PCA Consistency:
• Strong Inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
What happens at boundary ($\alpha = 1$)?
Result: ∃ interesting Limit Dist'ns
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question: Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In Random Matrix Limit:
• Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD:
Recall Main Advantage is for High d,
So not Clear Embedding Helps;
Thus not yet Implemented in DWD
HDLSS Additional Results
Batch Adjustment (Xuxin Liu)
Recall Intuition from above:
• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust
Mathematics behind this?
SigClust Gaussian null distribution - Simulation
Then compare data CI
With simulated null population CIs
bull Spirit similar to DiProPermbull But now significance happens for
smaller values of CI
An example (details to follow)
P-val = 00045
SigClust Real Data Results
Summary of Perou 500 SigClust ResultsLum amp Norm vs Her2 amp Basal p-val = 10-19
Luminal A vs B p-val = 00045Her 2 vs Basal p-val = 10-10
Split Luminal A p-val = 10-7
Split Luminal B p-val = 0058Split Her 2 p-val = 010Split Basal p-val = 0005
HDLSS Asymptotics
Modern Mathematical Statistics Based on asymptotic analysis Ie Uses limiting operations Almost always Occasional misconceptions
Indicates behavior for large samples Thus only makes sense for ldquolargerdquo samples Models phenomenon of ldquoincreasing datardquo So other flavors are useless
nlim
HDLSS Asymptotics
Modern Mathematical Statistics Based on asymptotic analysis Real Reasons
Approximation provides insightsCan find simple underlying structureIn complex situations
Thus various flavors are fine
Even desirable (find additional insights)
0limlimlimlim dndn
HDLSS Asymptotics Simple Paradoxes
For dimrsquoal Standard Normal distrsquon
Where are Data
Near Peak of Density
Thanks to psycnetapaorg
d
dd
d
IN
Z
Z
Z 0~1
HDLSS Asymptotics Simple Paradoxes
As
-Data lie roughly on surface of sphere
with radius
- Yet origin is point of highest density
- Paradox resolved by
density w r t Lebesgue Measure
d
)1(pOdZ
d
HDLSS Asymptotics Simple Paradoxes
- Paradox resolved by
density w r t Lebesgue Measure
Lebesgue Measure Pushes Mass Out Density Pulls Data In Is The Balance Point
HDLSS Asymptotics Simple Paradoxes
As
Important Philosophical Consequence
ldquoAverage Peoplerdquo
Parents Lament
Why Canrsquot I Have Average Children
Theorem Impossible (over many factors)
d )1(pOdZ
HDLSS Asymptotics Simple Paradoxes
Distance tends to non-random constant
bullFactor since
Can extend to
)1(221 pOdZZ
nZZ
1
222
121 XsdXsdXXsd 2
HDLSS Asymptotics Simple Paradoxes
For dimrsquoal Standard Normal distrsquon
indep of
High dimrsquoal Angles (as )
- Everything is orthogonal
d
d
dd INZ 0~2
)(90 2121
dOZZAngle p
1Z
HDLSS Asyrsquos Geometrical Representrsquon
Assume let
Study Subspace Generated by Data
Hyperplane through 0
of dimension
Points are ldquonearly equidistant to 0rdquo
amp dist
Within plane can
ldquorotate towards Unit Simplexrdquo
All Gaussian data sets are
ldquonear Unit Simplex Verticesrdquo
ldquoRandomnessrdquo appears
only in rotation of simplex
n
d ddn INZZ 0~1
d
d
Hall Marron amp Neeman (2005)
HDLSS Asyrsquos Geometrical Representrsquon
Assume let
Study Hyperplane Generated by Data
dimensional hyperplane
Points are pairwise equidistant dist
Points lie at vertices of
ldquoregular hedronrdquo
Again ldquorandomness in datardquo is only in rotation
Surprisingly rigid structure in random data
1n
d ddn INZZ 0~1
d2d2~
n
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View Shows ldquoRigidity after Rotationrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Now Recall HDLSS Simulation Results
Comparing DWD SVM amp Others from 102114
HDLSS Discrimrsquon Simulations
Main idea
Comparison of
bull SVM (Support Vector Machine)
bull DWD (Distance Weighted Discrimination)
bull MD (Mean Difference aka Centroid)
Linear versions across dimensions
HDLSS Discrimrsquon Simulations
Overall Approachbull Study different known phenomena
ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding
bull Common Sample Sizes
bull But wide range of dimensions25 nn
16004001004010d
HDLSS Discrimrsquon Simulations
Spherical Gaussians
HDLSS Discrimrsquon Simulations
Outlier Mixture
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:
    X ~ ½ N(0, I_d) + ½ N(0, 100 · I_d)
• Data vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore:
    Covariance = 0 "⟹" Independence (?!?)
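These notes can be checked by simulating two coordinates of Kent's mixture: the sample covariance of the entries is near 0, yet the squared entries are clearly correlated, so the entries are dependent. A sketch (sample sizes are my choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200_000                                     # independent data vectors
sd = np.where(rng.random(m) < 0.5, 1.0, 10.0)   # per-vector scale: N(0, I) or N(0, 100 I)
X = sd[:, None] * rng.standard_normal((m, 2))   # two entries of each vector

corr_entries = np.corrcoef(X[:, 0], X[:, 1])[0, 1]            # ~ 0: uncorrelated
corr_squares = np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1]  # clearly positive: dependent
```

The shared random scale is what couples the entries without producing any covariance.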
0 Covariance is not independence

Simple Example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0,1)
  (Note: not using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance
Given c > 0, define
    Y = X     when |X| ≤ c
    Y = −X    when |X| > c

[Graphics: joint dist'n of (X, Y), choosing c to make cov(X,Y) = 0]

• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c > 0, have cov(X,Y) < 0
• For large c, have cov(X,Y) > 0
• By continuity, ∃ c with cov(X,Y) = 0

0 Covariance is not independence

Result:
• Joint distribution of X and Y:
  – Has Gaussian marginals
  – Has cov(X,Y) = 0
  – Yet strong dependence of X and Y
  – Thus is not multivariate Gaussian
Shows: Multivariate Gaussian means more
than Gaussian marginals
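The construction is easy to verify numerically. Using the standard Gaussian identity ∫₋ᶜᶜ x²φ(x)dx = (2Φ(c) − 1) − 2cφ(c) (integration by parts), cov(X,Y) is an increasing function of c, and the zero can be found by bisection; the particular c ≈ 1.54 and sample size are my choices:

```python
import math
import numpy as np

# cov(X, Y) = E[X^2; |X|<=c] - E[X^2; |X|>c] = 2*E[X^2; |X|<=c] - 1,
# with E[X^2; |X|<=c] = (2*Phi(c) - 1) - 2*c*phi(c).
def cov_xy(c):
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))
    return 2 * ((2 * Phi - 1) - 2 * c * phi) - 1

lo, hi = 0.1, 5.0               # cov_xy is increasing in c: bisection for the root
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c = 0.5 * (lo + hi)             # roughly 1.54

rng = np.random.default_rng(2)
X = rng.standard_normal(1_000_000)
Y = np.where(np.abs(X) <= c, X, -X)   # flip the tails: marginal stays N(0, 1)
cov_XY = float(np.mean(X * Y))
```

Note Y² = X² exactly, so the dependence is as strong as possible even though the covariance vanishes.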
HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM
   (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified
   Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes
   (motivates weighted version)
   Qiao et al (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(study properties of PCA,
in estimating eigen-directions & -values)
[Assume data are mean centered]
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency

Spike Covariance Model, Paul (2007):
For eigenvalues:
    λ₁ = d^α,   λ₂ = ⋯ = λ_d = 1
Note critical parameter: α
1st eigenvector: u₁
(turns out direction doesn't matter)
How good are the empirical versions
    λ̂₁, …, λ̂_d, û₁
as estimates?
HDLSS Math Stat of PCA

Consistency (big enough spike):
For α > 1,
    Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough):
For α < 1,
    Angle(û₁, u₁) → 90°
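The dichotomy is already visible at finite d. A simulation sketch of the spike model (dimensions, sample size and the two α values are my choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 10_000, 20

def pc1_angle_deg(alpha):
    # One draw from the spike model: lam1 = d^alpha on coordinate 1, noise variance 1.
    lam1 = float(d) ** alpha
    X = rng.standard_normal((n, d))
    X[:, 0] += np.sqrt(lam1) * rng.standard_normal(n)
    v1 = np.linalg.svd(X, full_matrices=False)[2][0]   # top sample PC direction
    cos = min(abs(float(v1[0])), 1.0)                  # |<v1, u1>| with u1 = e1
    return float(np.degrees(np.arccos(cos)))

angle_consistent = pc1_angle_deg(1.5)     # alpha > 1: angle near 0
angle_inconsistent = pc1_angle_deg(0.3)   # alpha < 1: angle pushed toward 90 degrees
```

With α = 1.5 the empirical direction locks onto u₁; with α = 0.3 the spike is swamped by the noise sphere and the angle is large.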
HDLSS Math Stat of PCA

Intuition: random noise ~ d^{1/2}
(recall d^α is on the scale of variance)
For α > 1: spike pops out of pure noise sphere
For α < 1: spike contained in pure noise sphere
HDLSS Math Stat of PCA

Consistency of eigenvalues?
    λ̂₁ / λ₁ →_L χ²ₙ / n    (as d → ∞)
• Eigenvalues inconsistent (for fixed n)
• But known distribution
• Consistent when n → ∞ as well
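The χ²ₙ/n limit can be seen in simulation: with a very big spike the direction is essentially found, yet λ̂₁/λ₁ keeps mean ≈ 1 and spread ≈ √(2/n) no matter how large d is. A sketch (α = 2 and the replication count are my choices):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 500, 10
lam1 = float(d) ** 2          # very big spike (alpha = 2), direction easily found

ratios = []
for _ in range(200):
    X = rng.standard_normal((n, d))
    X[:, 0] += np.sqrt(lam1) * rng.standard_normal(n)
    lam1_hat = np.linalg.svd(X, compute_uv=False)[0] ** 2 / n   # top sample eigenvalue
    ratios.append(lam1_hat / lam1)
ratios = np.array(ratios)

mean_ratio = float(ratios.mean())   # ~ 1: mean of chi^2_n / n
sd_ratio = float(ratios.std())      # ~ sqrt(2/n): does NOT shrink as d grows
```

The spread comes entirely from the n random scores along u₁, hence consistency only when n → ∞ as well.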
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example:
    X ~ ½ N(0, I_d) + ½ N(0, 100 · I_d)
Can only say:
    ‖X‖ = d^{1/2} · (1 + o_p(1))         w.p. 1/2
    ‖X‖ = 10 · d^{1/2} · (1 + o_p(1))    w.p. 1/2
not deterministic
PCA conditions same, since noise still O_p(d^{1/2})
But for Geo Rep'n, need some mixing cond'n
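The "not deterministic" point shows up immediately in simulation: the normalized norms of Kent-mixture vectors cluster at two values instead of concentrating at one. A sketch (sizes are my choices):

```python
import numpy as np

rng = np.random.default_rng(5)
d, m = 2000, 400
sd = np.where(rng.random(m) < 0.5, 1.0, 10.0)    # per-vector scale of the mixture
X = sd[:, None] * rng.standard_normal((m, d))

r = np.linalg.norm(X, axis=1) / np.sqrt(d)       # ~ 1 or ~ 10, at random
frac_small = float(np.mean(r < 5.0))             # about half in each cluster
```

Each normalized norm is sharply concentrated, but at a randomly chosen one of two centers, so no single deterministic limit exists.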
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Conclude: need some mixing condition
Mixing Conditions

Idea From Probability Theory:
Recall standard asymptotic results, as n → ∞:
• Law of Large Numbers
  ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions
(usually ignored…),
e.g. Independent and Ident. Dist'd
Mixing Conditions:
explore weaker assumptions, to still get
• Law of Large Numbers
• Central Limit Theorem
Mixing Conditions

• A whole area in Probability Theory
• ∃ a large literature
• A comprehensive reference:
  Bradley (2005, update of 1986 version)
• Better: newer references
Mixing Conditions

Mixing Condition Used Here:
Rho-Mixing
For random variables {Xⱼ}, define
    ρ(k) = sup |corr(Y, Z)|
where the sup is over Y, Z measurable w.r.t. the
sigma-fields generated by:
• σ₁ = σ(…, X_{j−1}, Xⱼ)
• σ₂ = σ(X_{j+k}, X_{j+k+1}, …)
• Note: gap of lag k
Assume: ρ(k) → 0, as k → ∞
Idea: uncorrelated at far lags
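The "uncorrelated at far lags" idea can be illustrated with the textbook example of a mixing sequence, a stationary Gaussian AR(1) chain, whose lag correlations decay geometrically (parameters are my choices):

```python
import numpy as np

rng = np.random.default_rng(6)
T, phi = 200_000, 0.8
innov = rng.standard_normal(T) * np.sqrt(1 - phi ** 2)   # keeps N(0,1) marginals
x = np.empty(T)
x[0] = rng.standard_normal()
for t in range(1, T):                                    # AR(1): a classic mixing chain
    x[t] = phi * x[t - 1] + innov[t]

def lag_corr(k):
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

corr_lag1 = lag_corr(1)     # ~ phi
corr_lag40 = lag_corr(40)   # ~ phi^40: essentially zero
```

Nearby entries are strongly correlated, far-apart entries effectively uncorrelated, which is exactly the behavior a ρ-mixing assumption formalizes.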
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):
Assume entries of data vectors
    X = (X₁, X₂, …, X_d)ᵗ
are ρ-mixing
Drawback: strong assumption
(in JRSS-B, since Biometrika refused!)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (fully covariance based, no mixing)
Tricky point: classical mixing conditions
require a notion of time ordering,
not always clear, e.g. microarrays
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):
    X ~ (0, Σ_d),   where   Σ_d = U_d Λ_d U_dᵗ
(note: not necessarily Gaussian)
Define the standardized version:
    Z_d = Λ_d^{−1/2} U_dᵗ X
Assume ∃ a permutation of the entries,
so that Z_d is ρ-mixing
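The standardization step is worth seeing concretely: Z = Λ^{−1/2} Uᵗ X has identity covariance whatever Σ is, so the mixing condition is placed on sphered, unit-variance entries. A sketch with a small known Σ (my construction, for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
d, m = 5, 200_000

A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)          # a known positive definite covariance
lam, U = np.linalg.eigh(Sigma)           # Sigma = U diag(lam) U^T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=m)
Z = (X @ U) / np.sqrt(lam)               # rows: Lambda^{-1/2} U^T x

cov_Z = np.cov(Z.T)                      # close to the identity
max_err = float(np.max(np.abs(cov_Z - np.eye(d))))
```

After sphering, only higher-order dependence remains, which is what the permutation / ρ-mixing assumption then controls.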
HDLSS Math Stat of PCA

Careful look at PCA consistency (α > 1 spike):
(reality check suggested by a reviewer)
• Condition α > 1 is independent of sample size
• So true even for n = 1 (?!?)
• Reviewer's conclusion: absurd, shows
  assumption too strong for practice
HDLSS Math Stat of PCA

[Graphic] HDLSS PCA often finds signal, not pure noise

Recall RNAseq data from 8/23/12:
d ~ 1700, n = 180

[Graphic] Manually brushed clusters:
clear alternate splicing, not noise

Functional Data Analysis
HDLSS Math Stat of PCA

Recall theoretical separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically driven conclusion:
real data signals are this strong!
HDLSS Math Stat of PCA

An Interesting Objection:
Should not study angles in PCA
Recall, for consistency (α > 1):
    Angle(û₁, u₁) → 0
and for strong inconsistency (α < 1):
    Angle(û₁, u₁) → 90°
Because PC scores (i.e. projections) are not consistent:
For scores
    ŝᵢⱼ = P_{v̂ⱼ} xᵢ    and    sᵢⱼ = P_{vⱼ} xᵢ
(what we study in PCA scatterplots),
can show:
    ŝᵢⱼ / sᵢⱼ → Rⱼ ≠ 1    (random)
Thanks to Dan Shen
HDLSS Math Stat of PCA

PC scores (i.e. projections) not consistent:
    ŝᵢⱼ / sᵢⱼ → Rⱼ ≠ 1
So how can PCA find useful signals in data?
Key is "proportional errors":
same realization of Rⱼ, for i = 1, …, n
Axes have inconsistent scales,
but relationships are still useful
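A rough simulation of the "proportional errors" phenomenon (my own illustration with hand-picked constants, not the slides' construction): in a spike model the empirical PC1 scores are inflated relative to the true scores, but by roughly one common factor across observations.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 10_000, 20
lam1 = 1_000.0                           # moderate spike: score inflation is visible

z = rng.standard_normal(n)
X = rng.standard_normal((n, d))
X[:, 0] += np.sqrt(lam1) * z             # true direction u1 = e1

v1 = np.linalg.svd(X, full_matrices=False)[2][0]
if v1[0] < 0:
    v1 = -v1                             # fix the sign ambiguity of the PC direction

s_true = X[:, 0]                         # scores on the true direction
s_hat = X @ v1                           # empirical PC1 scores
ratio = s_hat / s_true                   # roughly one common factor, not 1

r_med = float(np.median(ratio))
r_iqr = float(np.percentile(ratio, 75) - np.percentile(ratio, 25))
```

The ratios cluster tightly around a common value above 1: the score axis is rescaled, but the relative positions of the points, which is what a scatterplot shows, are preserved.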
HDLSS Deep Open Problem

In PCA consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: ∃ interesting limit dist'ns
Jung, Sen & Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
HDLSS Asymptotics & Kernel Methods

Interesting question:
behavior in very high dimension?
Answer: El Karoui (2010)
• In the random matrix limit:
  kernel embedded classifiers ~ linear classifiers

Implications for DWD:
recall main advantage is for high d,
so not clear embedding helps
(thus not yet implemented in DWD)
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall intuition from above:
• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust
Mathematics behind this?
HDLSS Asymptotics: Simple Paradoxes

As d → ∞:
    ‖Z‖ = √d + O_p(1)
Important philosophical consequence:
"Average People"
Parents' lament: why can't I have average children?
Theorem: Impossible (over many factors)

Distance tends to non-random constant:
    ‖Z₁ − Z₂‖ = √(2d) + O_p(1)
• Factor of √2, since variances add:
    sd²(Z₁ − Z₂) = sd²(Z₁) + sd²(Z₂) = 2
Can extend to Z₁, …, Zₙ
HDLSS Asymptotics: Simple Paradoxes

For d-dim'al Standard Normal dist'n:
    Z₂ ~ N_d(0, I_d), indep. of Z₁
High dim'al angles (as d → ∞):
    Angle(Z₁, Z₂) = 90° + O_p(d^{−1/2})
- Everything is orthogonal?!?
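All three concentration statements, norms near √d, pairwise distances near √(2d), angles near 90°, can be verified in one small simulation (dimension and sample size are my choices):

```python
import numpy as np

rng = np.random.default_rng(9)
d, n = 20_000, 5
Z = rng.standard_normal((n, d))

norms = np.linalg.norm(Z, axis=1) / np.sqrt(d)               # all ~ 1
diffs = Z[:, None, :] - Z[None, :, :]
dists = np.linalg.norm(diffs, axis=2)[np.triu_indices(n, 1)] / np.sqrt(2 * d)  # all ~ 1

G = Z @ Z.T
cos = G / np.sqrt(np.outer(np.diag(G), np.diag(G)))
angles = np.degrees(np.arccos(np.clip(cos[np.triu_indices(n, 1)], -1, 1)))     # all ~ 90
```

Every normalized norm, distance and angle lands within a few percent of its limit, which is the rigid structure behind the geometric representation below.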
HDLSS Asy's Geometrical Represent'n

Assume Z₁, …, Zₙ ~ N_d(0, I_d), let d → ∞:
Study subspace generated by data
• Hyperplane through 0, of dimension n
• Points are "nearly equidistant to 0", & dist ≈ √d
• Within plane, can "rotate towards √d · Unit Simplex"
• All Gaussian data sets are "near Unit Simplex vertices"!!!
• "Randomness" appears only in rotation of simplex
Hall, Marron & Neeman (2005)
HDLSS Asy's Geometrical Represent'n

Assume Z₁, …, Zₙ ~ N_d(0, I_d), let d → ∞:
Study hyperplane generated by data
• n − 1 dimensional hyperplane
• Points are pairwise equidistant, dist ≈ √(2d)
• Points lie at vertices of √(2d) · "regular n-hedron"
• Again, "randomness in data" is only in rotation
Surprisingly rigid structure in random data!
HDLSS Asy's Geometrical Represen'tion

Simulation view: study "rigidity after rotation"
• Simple 3 point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors
Simulation view shows "rigidity after rotation"
HDLSS Asy's Geometrical Represen'tion

Now recall HDLSS simulation results,
comparing DWD, SVM & others, from 10/21/14

HDLSS Discrim'n Simulations

Main idea: comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions

Overall approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common sample sizes: n₊ = n₋ = 25
• But wide range of dimensions:
    d = 10, 40, 100, 400, 1600
HDLSS Discrim'n Simulations

[Graphics, one per setting:]
• Spherical Gaussians
• Outlier Mixture
• Wobble Mixture
• Nested Spheres
…
Interesting phenomenon:
all methods come together
in very high dimensions
HDLSS Discrim'n Simulations

Can we say more about:
"all methods come together in very high dimensions"?
A mathematical statistical question
Mathematics behind this?
(use Geometric Representation)
HDLSS Asy's Geometrical Represen'tion

Explanation of observed (simulation) behavior:
"everything similar for very high d"
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• So "sensible methods are all nearly the same"

Straightforward generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild eigenvalue condition on theoretical cov.
  (Ahn, Marron, Muller & Chi, 2007)
All based on simple "Laws of Large Numbers"
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
SigClust Real Data Results
Summary of Perou 500 SigClust Results:
• Lum & Norm vs Her2 & Basal: p-val = 10^-19
• Luminal A vs B: p-val = 0.0045
• Her2 vs Basal: p-val = 10^-10
• Split Luminal A: p-val = 10^-7
• Split Luminal B: p-val = 0.058
• Split Her2: p-val = 0.10
• Split Basal: p-val = 0.005
HDLSS Asymptotics
Modern Mathematical Statistics:
• Based on asymptotic analysis, i.e. uses limiting operations: lim as n → ∞
• Almost always
• Occasional misconceptions:
  - Indicates behavior for large samples, thus only makes sense for "large" samples
  - Models phenomenon of "increasing data", so other flavors are useless???
HDLSS Asymptotics
Modern Mathematical Statistics:
• Based on asymptotic analysis
• Real Reasons:
  - Approximation provides insights
  - Can find simple underlying structure in complex situations
• Thus various flavors are fine: lim as n → ∞, lim as d → ∞, lim as d → ∞ then n → ∞, lim as n → ∞ then d → ∞
• Even desirable (find additional insights)
HDLSS Asymptotics: Simple Paradoxes
For d-dim'al Standard Normal dist'n: Z = (Z_1, …, Z_d)^t ~ N(0, I_d)
Where are the Data? Near Peak of Density?
Thanks to psycnet.apa.org
HDLSS Asymptotics: Simple Paradoxes
As d → ∞:
• Data lie roughly on surface of sphere of radius √d: ||Z|| = √d + O_p(1)
• Yet origin is the point of highest density
• Paradox resolved by: density w.r.t. Lebesgue Measure
HDLSS Asymptotics: Simple Paradoxes
Paradox resolved by density w.r.t. Lebesgue Measure:
Lebesgue Measure Pushes Mass Out, Density Pulls Data In, √d Is The Balance Point
HDLSS Asymptotics: Simple Paradoxes
As d → ∞: ||Z|| = √d + O_p(1)
Important Philosophical Consequence: No "Average People"
Parents' Lament: Why Can't I Have Average Children?
Theorem: Impossible (over many factors)
HDLSS Asymptotics: Simple Paradoxes
Distance tends to non-random constant: ||Z_1 − Z_2|| = √(2d) + O_p(1)
• Factor of √2, since sd²(X_1 − X_2) = sd²(X_1) + sd²(X_2)
• Can extend to Z_1, …, Z_n
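These concentration facts are easy to check numerically. A minimal sketch (assuming numpy is available) compares ||Z|| to √d and ||Z_1 − Z_2|| to √(2d) in d = 10,000 dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000

# Two independent d-dimensional standard normal vectors
Z1 = rng.standard_normal(d)
Z2 = rng.standard_normal(d)

# ||Z|| = sqrt(d) + O_p(1): the norm concentrates near sqrt(d)
norm_gap = abs(np.linalg.norm(Z1) - np.sqrt(d))

# ||Z1 - Z2|| = sqrt(2d) + O_p(1): the sqrt(2) factor comes from
# sd^2(Z1 - Z2) = sd^2(Z1) + sd^2(Z2)
dist_gap = abs(np.linalg.norm(Z1 - Z2) - np.sqrt(2 * d))

print(norm_gap, dist_gap)  # both O(1), tiny relative to sqrt(d) = 100
```

Both gaps stay of order 1 while √d = 100, which is the "data on a sphere of radius √d" paradox in numbers.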
HDLSS Asymptotics: Simple Paradoxes
For d-dim'al Standard Normal dist'n: Z_2 ~ N(0, I_d), indep. of Z_1
High-dim'al Angles (as d → ∞): Angle(Z_1, Z_2) = 90° + O_p(d^(−1/2))
• "Everything is orthogonal"
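The near-orthogonality can be checked the same way; a small sketch (numpy assumed) shows the deviation of Angle(Z_1, Z_2) from 90° shrinking like d^(−1/2) as d grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def angle_deg(u, v):
    # angle between two vectors, in degrees
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(cos))

# Deviation from 90 degrees for independent standard normal pairs
gaps = {}
for d in (10, 1_000, 100_000):
    Z1, Z2 = rng.standard_normal(d), rng.standard_normal(d)
    gaps[d] = abs(angle_deg(Z1, Z2) - 90.0)

print(gaps)  # deviation shrinks roughly like d**(-1/2)
```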
HDLSS Asy's Geometrical Represent'n
Assume Z_1, …, Z_n ~ N(0, I_d), let d → ∞
Study Subspace Generated by Data:
• Hyperplane through 0, of dimension n
• Points are "nearly equidistant to 0", & dist ~ √d
• Within plane, can "rotate towards √d × Unit Simplex"
• All Gaussian data sets are "near Unit Simplex Vertices"
• "Randomness" appears only in rotation of simplex
Hall, Marron & Neeman (2005)
HDLSS Asy's Geometrical Represent'n
Assume Z_1, …, Z_n ~ N(0, I_d), let d → ∞
Study Hyperplane Generated by Data:
• (n − 1)-dimensional hyperplane
• Points are pairwise equidistant, dist ~ √(2d)
• Points lie at vertices of √(2d) × "regular n-hedron"
• Again, "randomness in data" is only in rotation
• Surprisingly rigid structure in random data
HDLSS Asy's Geometrical Represent'n
Simulation View: study "rigidity after rotation"
• Simple 3-point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors
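A non-graphical version of this rigidity check (a sketch, numpy assumed): scale the three pairwise distances of a random 3-point Gaussian data set by √(2d), and the triangle becomes nearly equilateral with unit sides as d grows.

```python
import numpy as np

rng = np.random.default_rng(2)

def scaled_side_lengths(d):
    # three random Gaussian points; pairwise distances scaled by sqrt(2d)
    X = rng.standard_normal((3, d))
    pairs = ((0, 1), (0, 2), (1, 2))
    sides = [np.linalg.norm(X[i] - X[j]) for i, j in pairs]
    return np.array(sides) / np.sqrt(2 * d)

sides_low = scaled_side_lengths(2)        # triangle shapes vary wildly
sides_high = scaled_side_lengths(20_000)  # all sides close to 1
print(sides_low, sides_high)
```

This is the same "rigid structure in random data" the rotation pictures display: in high dimension, only the orientation of the simplex is random, not its shape.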
HDLSS Asy's Geometrical Represent'n
Simulation View Shows "Rigidity after Rotation"
HDLSS Asy's Geometrical Represent'n
Now Recall HDLSS Simulation Results
Comparing DWD, SVM & Others, from 10/21/14
HDLSS Discrim'n Simulations
Main idea: Comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions
HDLSS Discrim'n Simulations
Overall Approach:
• Study different known phenomena:
  - Spherical Gaussians
  - Outliers
  - Polynomial Embedding
• Common Sample Sizes: n+ = n− = 25
• But wide range of dimensions: d = 10, 40, 100, 400, 1600
HDLSS Discrim'n Simulations
Spherical Gaussians
HDLSS Discrim'n Simulations
Outlier Mixture
HDLSS Discrim'n Simulations
Wobble Mixture
HDLSS Discrim'n Simulations
Nested Spheres
HDLSS Discrim'n Simulations
…
HDLSS Discrim'n Simulations
Interesting Phenomenon: All methods come together in very high dimensions???
Can we say more about "All methods come together in very high dimensions"?
Mathematical Statistical Question: Mathematics behind this?
(Use Geometric Representation)
HDLSS Asy's Geometrical Represent'n
Explanation of Observed (Simulation) Behavior: "everything similar for very high d"
• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
HDLSS Asy's Geometrical Represent'n
Straightforward Generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi, 2007)
All based on simple "Laws of Large Numbers"
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large, in the sense:
For λ_1 ≥ ⋯ ≥ λ_d ≥ 0, assume Σ_{j=1}^d λ_j² / (Σ_{j=1}^d λ_j)² = o(1)
(1/d is the min possible value)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background: In classical multivariate analysis, the statistic
ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)
is called the "epsilon statistic",
and is used to test "sphericity" of dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics
Can show the epsilon statistic satisfies: 1/d ≤ ε ≤ 1
• For spherical Normal: ε = 1
• Single extreme eigenvalue gives: ε ≈ 1/d
• So the assumption (ε not near its 1/d minimum) is very mild
• Much weaker than mixing conditions
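The bounds 1/d ≤ ε ≤ 1 and the two extreme cases can be verified directly; a small numpy sketch (the helper name `epsilon` is just for illustration):

```python
import numpy as np

def epsilon(lam):
    # epsilon statistic: (sum lambda_j)^2 / (d * sum lambda_j^2),
    # always in [1/d, 1] by Cauchy-Schwarz
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1_000
eps_spherical = epsilon(np.ones(d))                     # all eigenvalues equal
eps_one_spike = epsilon(np.r_[d ** 2, np.ones(d - 1)])  # one dominant eigenvalue
print(eps_spherical, eps_one_spike, 1 / d)
```

The spherical case gives exactly 1, while a single dominant eigenvalue pushes ε essentially down to its 1/d floor.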
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large: λ_1 ≥ ⋯ ≥ λ_d
Then: ||X_i − X_j|| = O_p(√d)
Not so strong as before: ||Z_1 − Z_2|| = √(2d) + O_p(1)
2nd Paper on HDLSS Asymptotics
Can we improve on ||X_i − X_j|| = O_p(√d)?
John Kent example: Normal scale mixture
X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
Won't get: ||X_i − X_j|| = C √d (1 + o_p(1)) for a single constant C
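A quick simulation makes Kent's point concrete: under the scale mixture, ||X_i − X_j||/√d lands near √2, √101, or √200 depending on which mixture components the pair came from, so it does not settle to a single constant. A sketch, numpy assumed:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 10_000

def kent_sample():
    # John Kent's normal scale mixture:
    # N(0, I_d) or N(0, 100 I_d), with probability 1/2 each
    scale = 1.0 if rng.random() < 0.5 else 10.0
    return scale * rng.standard_normal(d)

# Scaled pairwise distances cluster at sqrt(2) ~ 1.41, sqrt(101) ~ 10.05,
# and sqrt(200) ~ 14.14, depending on the pair's mixture components
ratios = sorted(np.linalg.norm(kent_sample() - kent_sample()) / np.sqrt(d)
                for _ in range(40))
print(ratios[0], ratios[-1])  # spread across distinct levels, not one constant
```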
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using:
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal Scale Mixture: X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
• Data Vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 does NOT imply Independence
0 Covariance is not independence
Simple Example:
• Random Variables X and Y, each ~ N(0, 1)
• Make both Gaussian (Note: Not Using Multivariate Gaussian)
• With strong dependence, yet 0 covariance
• Given c > 0, define: Y = X when |X| > c, and Y = −X when |X| ≤ c
0 Covariance is not independence
Simple Example (choose c to make cov(X, Y) = 0):
• Distribution is degenerate
• Supported on diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c > 0, have cov(X, Y) > 0
• For large c, have cov(X, Y) < 0
• By continuity, ∃ c with cov(X, Y) = 0
0 Covariance is not independence
Result: Joint distribution of X and Y
• Has Gaussian marginals
• Has cov(X, Y) = 0
• Yet strong dependence of X and Y
• Thus not multivariate Gaussian
Shows: Multivariate Gaussian means more than Gaussian Marginals
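A construction matching the slides' description (Gaussian marginals, support on the diagonal lines y = ±x, covariance changing sign in c) is Y = X for |X| > c and Y = −X otherwise; a numpy check of the sign change that forces a zero crossing:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal(1_000_000)

def Y_of(c):
    # Y = X outside [-c, c], Y = -X inside: Y is again N(0,1) by symmetry,
    # but is completely determined by X (strong dependence)
    return np.where(np.abs(X) > c, X, -X)

# cov(X, Y) = E[X^2; |X| > c] - E[X^2; |X| <= c]:
# positive for small c, negative for large c, so it crosses 0 in between
cov_small = np.mean(X * Y_of(0.1))
cov_large = np.mean(X * Y_of(3.0))
print(cov_small, cov_large)
```

Both marginals stay standard normal, yet Y is a deterministic function of X, which is exactly the "covariance 0 without independence" point.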
HDLSS Asy's Geometrical Represent'n
Further Consequences of Geometric Represent'n:
1. DWD more stable than SVM (based on deeper limiting distributions; reflects intuitive idea of feeling sampling variation; something like mean vs. median): Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified: Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes (motivates weighted version): Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(Study Properties of PCA in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
Spike Covariance Model, Paul (2007):
For Eigenvalues: λ_1 = d^α, λ_2 = ⋯ = λ_d = 1, for α > 0
Note Critical Parameter: α
1st Eigenvector: u_1 (Turns out: Direction Doesn't Matter)
How Good are the Empirical Versions λ̂_1, …, λ̂_d, û_1 as Estimates?
HDLSS Math Stat of PCA
Consistency (big enough spike): For α > 1, Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough): For α < 1, Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA
Intuition: Random Noise ~ d^(1/2)
For α > 1 (Recall λ_1 = d^α, on Scale of Variance): Spike Pops Out of Pure Noise Sphere
For α < 1: Spike Contained in Pure Noise Sphere
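The consistency / strong-inconsistency split is visible in a direct simulation of the spike model; a sketch (numpy assumed, with illustrative choices d = 20,000, n = 20, and spike exponents α = 1.5 vs. α = 0.2):

```python
import numpy as np

rng = np.random.default_rng(5)

def pc1_angle(d, alpha, n=20):
    # Spike model: lambda_1 = d^alpha, all other eigenvalues 1,
    # with true first eigenvector u_1 = e_1
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)          # spike sd = d^(alpha/2)
    # first empirical eigenvector via SVD of the (mean-zero) data
    u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
    # angle between u1_hat and e_1, in degrees
    return np.degrees(np.arccos(min(abs(u1_hat[0]), 1.0)))

d = 20_000
angle_strong_spike = pc1_angle(d, alpha=1.5)  # consistency: near 0
angle_weak_spike = pc1_angle(d, alpha=0.2)    # strong inconsistency: near 90
print(angle_strong_spike, angle_weak_spike)
```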
HDLSS Math Stat of PCA
Consistency of eigenvalues?
λ̂_1 / λ_1 →_L χ²_n / n, as d → ∞
• Eigenvalues Inconsistent
• But Known Distribution
• Consistent when n → ∞ as Well
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consist.?
John Kent example: X_d ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
Can only say: ||X_d|| = O_p(d^(1/2)), roughly d^(1/2) w.p. 1/2 and 10 d^(1/2) w.p. 1/2, not deterministic
PCA Conditions Same, since Noise is Still O_p(d^(1/2))
But for Geo Rep'n, need some Mixing Cond.
Conclude: Need some Mixing Condition
Mixing Conditions
Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions (Usually Ignore?), e.g. Independent and Identically Dist'd
Mixing Conditions: Explore Weaker Assumptions, to Still Get Law of Large Numbers & Central Limit Theorem
• A Whole Area in Probability Theory: ∃ a Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better: Newer References
Mixing Conditions
Mixing Condition Used Here: ρ-Mixing
For Random Variables X and Y, define ρ(X, Y) = sup |corr(f(X), g(Y))|, where the sup is over functions f, g with finite variance
Apply to Sigma-Fields generated by:
• X_1, …, X_i (the past)
• X_{i+t}, X_{i+t+1}, … (the future)
• Note: Gap of Lag t
Assume: ρ(t) → 0 as t → ∞
Idea: Uncorrelated at Far Lags
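The ρ-mixing coefficient involves a supremum over all square-integrable functions of past and future, so it is rarely computed directly; but for a stationary Gaussian AR(1) sequence the maximal correlation at lag t reduces to the ordinary correlation |φ|^t → 0, which illustrates "uncorrelated at far lags". A numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(6)

# Gaussian AR(1): X_t = phi * X_{t-1} + noise. For Gaussian sequences
# the maximal correlation at lag t reduces to the largest linear
# correlation, here |phi|**t, which decays to 0 geometrically.
phi, T = 0.8, 200_000
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)   # stationary start
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

lag_corr = {lag: np.corrcoef(x[:-lag], x[lag:])[0, 1] for lag in (1, 5, 20)}
print(lag_corr)  # roughly 0.8, 0.8**5, 0.8**20
```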
HDLSS Math Stat of PCA
Conditions for Geo Rep'n, Hall, Marron & Neeman (2005):
Assume the Entries of the Data Vectors X = (X_1, X_2, …, X_d)^t are ρ-mixing
Drawback: Strong Assumption (In JRSS-B, since Biometrika Refused)
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering, Not Always Clear, e.g. Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t (Note: Not Gaussian)
Define Standardized Version: Z_d = Λ_d^(−1/2) U_d^t X_d
Assume ∃ a permutation of the entries of Z_d so that the permuted entries are ρ-mixing
HDLSS Math Stat of PCA
Careful look at PCA Consistency, α > 1 spike
(Reality Check, Suggested by Reviewer)
Condition is Independent of Sample Size, so true even for n = 1 (!?!)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice?
HDLSS Math Stat of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise
Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180
Manually Brushed Clusters: Clear Alternate Splicing, Not Noise
Functional Data Analysis
HDLSS Math Stat of PCA
Recall Theoretical Separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically Driven Conclusion: Real Data Signals Are This Strong
HDLSS Math Stat of PCA
An Interesting Objection: Should not Study Angles in PCA
Recall:
• For Consistency (α > 1): Angle(û_1, u_1) → 0
• For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°
Because PC Scores (i.e. projections) are Not Consistent:
• Scores ŝ_{i,j} = P_{û_j} x_i (What we study in PCA scatterplots) and s_{i,j} = P_{u_j} x_i
• Can Show: ŝ_{i,j} / s_{i,j} → R_j ≠ 1, with R_j Random (Thanks to Dan Shen)
HDLSS Math Stat of PCA
PC Scores (i.e. projections) Not Consistent
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j ≠ 1, with the Same Realization of R_j across i
Axes have Inconsistent Scales, But Relationships are Still Useful
HDLSS Deep Open Problem
In PCA Consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: ∃ interesting Limit Dist'n's, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question: Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In Random Matrix Limit
• Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD: Recall Main Advantage is for High d
• So not Clear Embedding Helps
• Thus not yet Implemented in DWD
HDLSS Additional Results
Batch Adjustment: Xuxin Liu
Recall Intuition from above:
• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust
Mathematics behind this?
HDLSS Asymptotics
Modern Mathematical Statistics Based on asymptotic analysis Ie Uses limiting operations Almost always Occasional misconceptions
Indicates behavior for large samples Thus only makes sense for ldquolargerdquo samples Models phenomenon of ldquoincreasing datardquo So other flavors are useless
nlim
HDLSS Asymptotics
Modern Mathematical Statistics Based on asymptotic analysis Real Reasons
Approximation provides insightsCan find simple underlying structureIn complex situations
Thus various flavors are fine
Even desirable (find additional insights)
0limlimlimlim dndn
HDLSS Asymptotics Simple Paradoxes
For dimrsquoal Standard Normal distrsquon
Where are Data
Near Peak of Density
Thanks to psycnetapaorg
d
dd
d
IN
Z
Z
Z 0~1
HDLSS Asymptotics Simple Paradoxes
As
-Data lie roughly on surface of sphere
with radius
- Yet origin is point of highest density
- Paradox resolved by
density w r t Lebesgue Measure
d
)1(pOdZ
d
HDLSS Asymptotics Simple Paradoxes
- Paradox resolved by
density w r t Lebesgue Measure
Lebesgue Measure Pushes Mass Out Density Pulls Data In Is The Balance Point
HDLSS Asymptotics Simple Paradoxes
As
Important Philosophical Consequence
ldquoAverage Peoplerdquo
Parents Lament
Why Canrsquot I Have Average Children
Theorem Impossible (over many factors)
d )1(pOdZ
HDLSS Asymptotics Simple Paradoxes
Distance tends to non-random constant
bullFactor since
Can extend to
)1(221 pOdZZ
nZZ
1
222
121 XsdXsdXXsd 2
HDLSS Asymptotics Simple Paradoxes
For dimrsquoal Standard Normal distrsquon
indep of
High dimrsquoal Angles (as )
- Everything is orthogonal
d
d
dd INZ 0~2
)(90 2121
dOZZAngle p
1Z
HDLSS Asyrsquos Geometrical Representrsquon
Assume let
Study Subspace Generated by Data
Hyperplane through 0
of dimension
Points are ldquonearly equidistant to 0rdquo
amp dist
Within plane can
ldquorotate towards Unit Simplexrdquo
All Gaussian data sets are
ldquonear Unit Simplex Verticesrdquo
ldquoRandomnessrdquo appears
only in rotation of simplex
n
d ddn INZZ 0~1
d
d
Hall Marron amp Neeman (2005)
HDLSS Asyrsquos Geometrical Representrsquon
Assume let
Study Hyperplane Generated by Data
dimensional hyperplane
Points are pairwise equidistant dist
Points lie at vertices of
ldquoregular hedronrdquo
Again ldquorandomness in datardquo is only in rotation
Surprisingly rigid structure in random data
1n
d ddn INZZ 0~1
d2d2~
n
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View Shows ldquoRigidity after Rotationrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Now Recall HDLSS Simulation Results
Comparing DWD SVM amp Others from 102114
HDLSS Discrimrsquon Simulations
Main idea
Comparison of
bull SVM (Support Vector Machine)
bull DWD (Distance Weighted Discrimination)
bull MD (Mean Difference aka Centroid)
Linear versions across dimensions
HDLSS Discrimrsquon Simulations
Overall Approachbull Study different known phenomena
ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding
bull Common Sample Sizes
bull But wide range of dimensions25 nn
16004001004010d
HDLSS Discrimrsquon Simulations
Spherical Gaussians
HDLSS Discrimrsquon Simulations
Outlier Mixture
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild eigenvalue condition on Theoretical Cov.
  (Ahn, Marron, Muller & Chi, 2007)
All based on simple "Laws of Large Numbers"
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large, in the sense:
For eigenvalues λ_1 ≥ ⋯ ≥ λ_d, assume
    (Σ_{j=1}^d λ_j²) / (Σ_{j=1}^d λ_j)² = o(1)
(min possible value is 1/d)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics

Background: in classical multivariate analysis, the statistic
    ε = (Σ_{j=1}^d λ_j)² / (d · Σ_{j=1}^d λ_j²)
is called the "epsilon statistic",
and is used to test "sphericity" of the distr'n,
i.e. "are all cov'nce eigenvalues the same?"
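As a quick numerical check, here is a minimal sketch of the two boundary cases of the epsilon statistic (plain NumPy; the function name `epsilon_statistic` and the specific eigenvalue settings are ours, chosen only for illustration):

```python
import numpy as np

def epsilon_statistic(eigenvalues):
    """Epsilon statistic (sum_j lam_j)^2 / (d * sum_j lam_j^2).

    Lies between 1/d (one dominant eigenvalue) and 1 (perfect sphericity).
    """
    lam = np.asarray(eigenvalues, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 100
spherical = np.ones(d)                 # all eigenvalues equal
spiked = np.ones(d)
spiked[0] = 1e6                        # one extreme eigenvalue dominates
print(epsilon_statistic(spherical))    # exactly 1.0
print(epsilon_statistic(spiked))       # close to the minimum 1/d = 0.01
```

The eigenvalue condition above says the reciprocal quantity 1/(d·ε) vanishes, i.e. ε stays well above its minimal order 1/d.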
2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic
    ε = (Σ_{j=1}^d λ_j)² / (d · Σ_{j=1}^d λ_j²)
satisfies 1/d ≤ ε ≤ 1:
• For spherical Normal: ε = 1
• A single extreme eigenvalue gives: ε ≈ 1/d
• So the assumption (ε well above the minimal order 1/d) is very mild
• Much weaker than mixing conditions
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large
Then, for λ_1 ≥ ⋯ ≥ λ_d:
    X_i′X_j / d = o_p(1)
Not so strong as before: ‖Z_1 − Z_2‖ = √(2d) + O_p(1)
2nd Paper on HDLSS Asymptotics

Can we improve on X_i′X_j / d = o_p(1)?
John Kent example, Normal scale mixture:
    X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
Won't get ‖X_i − X_j‖ = C √d + o_p(√d)
(for a single deterministic constant C)
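A small simulation makes Kent's point concrete: scaled squared pairwise distances settle near one of three values, depending on which mixture component each vector came from, so no single constant can work. This is a sketch under our own choices (seed, d = 200,000, 6 vectors; the helper name `kent_sample` is ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def kent_sample(n, d, rng):
    """Draw n independent vectors from 0.5 N(0, I_d) + 0.5 N(0, 100 I_d):
    each vector picks sd 1 or sd 10, then fills all d coordinates."""
    scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)
    return rng.standard_normal((n, d)) * scales[:, None]

d = 200_000
X = kent_sample(6, d, rng)
# ||X_i - X_j||^2 / d lands near 2, 101, or 200, depending on the
# (random) components of vectors i and j: the limit is random, not
# a deterministic constant.
vals = sorted(round(float(np.sum((X[i] - X[j]) ** 2) / d), 1)
              for i in range(6) for j in range(i + 1, 6))
print(vals)
```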
3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:
    X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
• Data vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore:
    Covariance = 0 ⇒ Independence (false!)
0 Covariance is not independence

Simple Example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0,1)
  (Note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance
Given c > 0, define:
    Y = X    if |X| ≤ c
    Y = −X   if |X| > c
0 Covariance is not independence

Simple Example: choose c to make cov(X, Y) = 0
0 Covariance is not independence

Simple Example:
• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c > 0, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0
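The continuity argument can be carried out numerically. Here cov(X, Y) = E[X²; |X| ≤ c] − E[X²; |X| > c], which has the closed form 2(erf(c/√2) − 2cφ(c)) − 1 by integration by parts; a bisection (our own sketch, stdlib only) locates the balancing c:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cov_xy(c):
    """cov(X, Y) for X ~ N(0,1), Y = X on |X| <= c, Y = -X otherwise.

    E[XY] = E[X^2; |X| <= c] - E[X^2; |X| > c]
          = 2 * (erf(c / sqrt(2)) - 2 * c * phi(c)) - 1.
    """
    return 2 * (math.erf(c / math.sqrt(2)) - 2 * c * phi(c)) - 1

# cov < 0 for small c, cov > 0 for large c; bisect for the root.
lo, hi = 0.1, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c0 = (lo + hi) / 2
print(round(c0, 3))   # roughly 1.54: Gaussian marginals, cov 0, yet dependent
```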
0 Covariance is not independence

Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian
Shows: multivariate Gaussian means more
than Gaussian marginals
HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:
1. DWD more stable than SVM
   (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified
   Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes
   (motivates weighted version)
   Qiao et al. (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study Properties of PCA,
in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
Spike Covariance Model, Paul (2007):
For eigenvalues:  λ_1 = d^α,  λ_2 = ⋯ = λ_d = 1
Note critical parameter: α
1st eigenvector: u_1
(Turns out: direction doesn't matter)
How good are the empirical versions
    λ̂_1, …, λ̂_d, û_1
as estimates?
HDLSS Math Stat of PCA

Consistency (big enough spike):
For α > 1:   Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):
For α < 1:   Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA
Intuition: random noise ~ d^(1/2)
For α > 1 (recall λ_1 = d^α is on the scale of variance):
    spike pops out of the pure noise sphere
For α < 1:
    spike contained in the pure noise sphere
HDLSS Math Stat of PCA

Consistency of eigenvalues:
    λ̂_1 / λ_1  →_L  χ²_n / n
• Eigenvalues inconsistent (for fixed n)
• But known limiting distribution
• Consistent when n → ∞ as well
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:
John Kent example:
    X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
Can only say ‖X‖ = O_p(d^(1/2)),
with ‖X‖ ≈ d^(1/2) or ≈ 10 d^(1/2), each w.p. 1/2
(not deterministic)
PCA conditions: same, since noise is still O_p(d^(1/2))
But for Geo Rep'n, need some mixing condition
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Conclude: need some mixing condition
Mixing Conditions

Idea from Probability Theory:
Recall standard asymptotic results, as n → ∞:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!),
e.g. Independent and Ident. Distr'd

Mixing Conditions:
Explore weaker assumptions that still give
• Law of Large Numbers
• Central Limit Theorem
Mixing Conditions

• A whole area in Probability Theory
• ∃ a large literature
• A comprehensive reference:
    Bradley (2005, update of 1986 version)
• Better: newer references
Mixing Conditions

Mixing condition used here: Rho-Mixing
For random variables X_1, X_2, …, define
    ρ(k) = sup |corr(U, V)|
over U, V in the L² spaces of the sigma-fields generated by
    (X_1, …, X_j)  and  (X_{j+k}, X_{j+k+1}, …)
(note: gap of lag k)
Assume: ρ(k) → 0 as k → ∞
Idea: uncorrelated at far lags
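A concrete illustration of "uncorrelated at far lags": in a stationary Gaussian AR(1), lag-k correlations decay like φ^k, the geometric decay a ρ-mixing condition formalizes (for Gaussian processes, maximal correlations are governed by such linear correlations). A sketch under our own settings (φ = 0.8, length, seed):

```python
import numpy as np

rng = np.random.default_rng(5)
phi, T = 0.8, 200_000

# Simulate a stationary Gaussian AR(1): X_t = phi * X_{t-1} + eps_t.
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)   # start in stationarity
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

# Empirical lag-k correlation vs the theoretical value phi^k:
# both shrink toward 0 as the lag grows.
for k in (1, 5, 10, 20):
    r = float(np.corrcoef(x[:-k], x[k:])[0, 1])
    print(k, round(r, 3), round(phi ** k, 3))
```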
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume entries of data vectors X = (X_1, …, X_d)′
are ρ-mixing
Drawback: strong assumption
(In JRSS-B, since Biometrika refused!)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (fully covariance based, no mixing)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Tricky point: classical mixing conditions
require a notion of time ordering,
not always clear, e.g. microarrays
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
    X_d ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_dᵗ
(Note: not necessarily Gaussian)
Define the standardized version
    Z_d = Λ_d^(−1/2) U_dᵗ X_d
Assume ∃ a permutation of the entries
so that Z_d is ρ-mixing
HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike):
• Independent of sample size
• So true even for n = 1 (?!?)
Reviewer's conclusion: absurd, shows
the assumption α > 1 is too strong for practice
HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise
HDLSS Math Stat of PCA

Recall RNAseq data from 8/23/12:
d ~ 1700, n = 180
Manually brushed clusters show
clear alternate splicing, not noise
Functional Data Analysis
HDLSS Math Stat of PCA

Recall theoretical separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically driven conclusion:
real data signals are this strong!
HDLSS Math Stat of PCA

An Interesting Objection:
should not study angles in PCA
Recall, for Consistency (α > 1):
    Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1):
    Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA

An Interesting Objection:
should not study angles in PCA,
because PC scores (i.e. projections)
are not consistent:
For scores  ŝ_ij = P_{û_j} x_i  and  s_ij = P_{u_j} x_i
(what we study in PCA scatterplots),
can show:  ŝ_ij / s_ij → R_j ≠ 1  (random)
Thanks to Dan Shen
HDLSS Math Stat of PCA

PC scores (i.e. projections) not consistent,
so how can PCA find useful signals in data?
Key is "proportional errors":
    ŝ_ij / s_ij → R_j,  with the same realization of R_j for all i
Axes have inconsistent scales,
but relationships are still useful
HDLSS Deep Open Problem

In PCA consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: ∃ interesting limit distr'ns
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea
HDLSS Asymptotics & Kernel Methods

Interesting Question: behavior in very high dimension?
Answer: El Karoui (2010):
• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers
HDLSS Asymptotics & Kernel Methods

Interesting Question: behavior in very high dimension?
Implications for DWD:
recall its main advantage is for high d,
so it is not clear embedding helps;
thus not yet implemented in DWD
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall intuition from above:
• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust
Mathematics behind this?
Modern Mathematical Statistics Based on asymptotic analysis Real Reasons
Approximation provides insightsCan find simple underlying structureIn complex situations
Thus various flavors are fine
Even desirable (find additional insights)
0limlimlimlim dndn
HDLSS Asymptotics, Simple Paradoxes

For d-dim'al Standard Normal distr'n:
    Z = (Z_1, …, Z_d)′ ~ N_d(0, I_d)
Where are the data?
Near the peak of the density?
(Thanks to psycnet.apa.org)
HDLSS Asymptotics, Simple Paradoxes

As d → ∞:
• Data lie roughly on surface of sphere with radius √d,
  since ‖Z‖ = √d + O_p(1)
• Yet origin is the point of highest density
• Paradox resolved by:
  density w.r.t. Lebesgue Measure
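The concentration ‖Z‖ = √d + O_p(1) is easy to verify numerically: the average norm tracks √d while the spread stays O(1) regardless of dimension. A quick sketch (our own seed, dimensions, and 500-replicate choice):

```python
import numpy as np

rng = np.random.default_rng(1)
# Norms of N_d(0, I_d) vectors concentrate at sqrt(d) with O_p(1) spread,
# even though the density is maximized at the origin.
for d in (10, 100, 1000, 10000):
    norms = np.linalg.norm(rng.standard_normal((500, d)), axis=1)
    print(d, round(float(np.sqrt(d)), 1), round(float(norms.mean()), 1),
          round(float(norms.std()), 2))
```

Note the standard deviation column stays near 0.7 for every d, while the mean grows like √d.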
HDLSS Asymptotics, Simple Paradoxes

Paradox resolved by: density w.r.t. Lebesgue Measure
• Lebesgue Measure pushes mass out
• Density pulls data in
• √d is the balance point
HDLSS Asymptotics, Simple Paradoxes

As d → ∞, since ‖Z‖ = √d + O_p(1):
Important philosophical consequence for "Average People"
Parents' lament: why can't I have average children?
Theorem: impossible (over many factors!)
HDLSS Asymptotics, Simple Paradoxes

Distance tends to a non-random constant:
    ‖Z_1 − Z_2‖ = √(2d) + O_p(1)
• Factor √2, since sd(X_1 − X_2)² = sd(X_1)² + sd(X_2)²
• Can extend to Z_1, …, Z_n
HDLSS Asymptotics, Simple Paradoxes

For d-dim'al Standard Normal distr'n:
    Z_2 ~ N_d(0, I_d), indep. of Z_1
High dim'al angles (as d → ∞):
    Angle(Z_1, Z_2) = 90° + O_p(d^(−1/2))
"Everything is orthogonal"
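Both the √(2d) pairwise distances and the near-90° angles show up immediately in simulation. A minimal sketch (seed, d = 10,000, and 4 vectors are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10_000
Z = rng.standard_normal((4, d))
# Pairwise distances sit at sqrt(2d) + O_p(1); pairwise angles at
# 90 degrees + O_p(d^(-1/2)): "everything is orthogonal".
gaps, angles = [], []
for i in range(4):
    for j in range(i + 1, 4):
        gaps.append(float(np.linalg.norm(Z[i] - Z[j]) - np.sqrt(2 * d)))
        cos = Z[i] @ Z[j] / (np.linalg.norm(Z[i]) * np.linalg.norm(Z[j]))
        angles.append(float(np.degrees(np.arccos(cos))))
print([round(g, 2) for g in gaps])      # all O(1), tiny vs sqrt(2d) ~ 141
print([round(a, 1) for a in angles])    # all close to 90
```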
HDLSS Asy's Geometrical Represent'n

Assume Z_1, …, Z_n ~ N_d(0, I_d); let d → ∞, n fixed
Study Subspace Generated by Data:
• Hyperplane through 0, of dimension n
• Points are "nearly equidistant to 0", at dist. ≈ √d
• Within plane, can "rotate towards √d · Unit Simplex"
• All Gaussian data sets are "near Unit Simplex vertices"!
• "Randomness" appears only in rotation of simplex
Hall, Marron & Neeman (2005)
HDLSS Asy's Geometrical Represent'n

Assume Z_1, …, Z_n ~ N_d(0, I_d); let d → ∞, n fixed
Study Hyperplane Generated by Data:
• (n − 1)-dimensional hyperplane
• Points are pairwise equidistant, at dist. ≈ √(2d)
• Points lie at vertices of √(2d) · "regular n-hedron"
• Again, "randomness in data" is only in rotation
Surprisingly rigid structure in random data!
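The rigid-simplex picture can be summarized in one matrix: the scaled Gram matrix of the data is close to the identity, which pins down all norms and pairwise distances simultaneously. A sketch with our own choices of seed, n = 4 and d = 100,000:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 4, 100_000
Z = rng.standard_normal((n, d))
# Z Z^T / d -> I_n: each point at distance ~sqrt(d) from 0 and
# ~sqrt(2d) from every other point, i.e. the vertices of a rigid
# regular simplex; only the simplex's rotation is random.
G = Z @ Z.T / d
print(np.round(G, 2))
```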
HDLSS Asy's Geometrical Represen'tion

Simulation view: study "rigidity after rotation"
• Simple 3-point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors
Simulation view shows "rigidity after rotation"

Now recall HDLSS simulation results,
comparing DWD, SVM & others, from 10/21/14
HDLSS Discrim'n Simulations

Main idea: comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, aka Centroid)
Linear versions, across dimensions
HDLSS Discrim'n Simulations

Overall Approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common Sample Sizes: n₊ = n₋ = 25
• But wide range of dimensions: d = 10, 40, 100, 400, 1600
HDLSS Discrim'n Simulations

Spherical Gaussians

Outlier Mixture

Wobble Mixture

Nested Spheres

…

Interesting phenomenon:
all methods come together in very high dimensions
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Mixing Conditions
Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions (Usually Ignored!), e.g. Independent and Ident. Dist'd
Mixing Conditions: Explore Weaker Assumptions to Still Get
• Law of Large Numbers
• Central Limit Theorem
Mixing Conditions
• A Whole Area in Probability Theory, i.e. a Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better, Newer References also available
Mixing Conditions
Mixing Condition Used Here: Rho-Mixing
For Random Variables X_1, X_2, …, Define
ρ(k) = sup corr(f, g)
Where the sup is over f ∈ L²(F_1^i), g ∈ L²(F_{i+k}^∞),
For F_a^b the Sigma-Fields Generated by X_a, …, X_b
• Note: Gap of Lag k
Assume: ρ(k) → 0, as k → ∞
Idea: Uncorrelated at Far Lags
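As a concrete toy illustration of "uncorrelated at far lags" (my example, not from the slides): a Gaussian AR(1) sequence X_t = φ X_{t−1} + e_t has corr(X_t, X_{t+k}) = φ^k, decaying geometrically in the lag, and is a standard instance of a ρ-mixing process:

```python
import numpy as np

# AR(1) series X_t = phi * X_{t-1} + e_t with Gaussian noise.
# Lag-k autocorrelation is phi^k, so correlation dies off at far lags,
# illustrating the idea behind the rho-mixing assumption.
rng = np.random.default_rng(1)
phi, N = 0.5, 200_000
e = rng.standard_normal(N)
x = np.empty(N)
x[0] = e[0]
for t in range(1, N):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(series, k):
    # empirical correlation between the series and its k-step shift
    return np.corrcoef(series[:-k], series[k:])[0, 1]

corr1 = lag_corr(x, 1)    # near phi = 0.5
corr10 = lag_corr(x, 10)  # near phi^10, essentially 0
```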
HDLSS Math Stat of PCA
Conditions for Geo Rep'n: Hall, Marron and Neeman (2005)
Assume Entries of Data Vectors X = (X_1, X_2, …, X_d)^t are ρ-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused!)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n: Series of Technical Improvements
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)
Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering, Not Always Clear, e.g. for Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Rep'n: Condition from Jung & Marron (2009)
X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t
(Note: Not Gaussian)
Define Standardized Version: Z_d = Λ_d^{−1/2} U_d^t X_d
Assume ∃ a permutation of the entries, so that Z_d is ρ-mixing
HDLSS Math Stat of PCA
Careful look at: PCA Consistency, α > 1 spike
(Reality Check Suggested by Reviewer)
Condition α > 1 is Independent of Sample Size, so true for n = 1 (!?)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice
HDLSS Math Stat of PCA
Yet HDLSS PCA Often Finds Signal, Not Pure Noise
Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180
Manually Brushed Clusters showed Clear Alternate Splicing, Not Noise
(Functional Data Analysis)
HDLSS Math Stat of PCA
Recall Theoretical Separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically Driven Conclusion: Real Data Signals Are This Strong!
HDLSS Math Stat of PCA
An Interesting Objection: Should not Study Angles in PCA
Recall, for Consistency (α > 1 spike): Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1 spike): Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA
An Interesting Objection: Should not Study Angles in PCA,
Because PC Scores (i.e. projections) Not Consistent
For Scores ŝ_ij = P_{v̂_j} x_i and s_ij = P_{v_j} x_i
(What we study in PCA scatterplots)
Can Show: ŝ_ij / s_ij → R_j (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC Scores (i.e. projections) Not Consistent,
So how can PCA find Useful Signals in Data?
(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)
Key is "Proportional Errors": ŝ_ij / s_ij → R_j, with the Same Realization of R_j for all i
Axes have Inconsistent Scales, But Relationships are Still Useful
HDLSS Deep Open Problem
In PCA Consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at boundary (α = 1)?
Result: ∃ interesting Limit Dist'ns, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question: Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In Random Matrix Limit: Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD: Recall Main Advantage is for High d,
So not Clear Embedding Helps. Thus not yet Implemented in DWD.
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asymptotics: Simple Paradoxes
For d-dim'al Standard Normal dist'n: Z = (Z_1, …, Z_d)^t ~ N_d(0, I_d)
Where are the Data? Near Peak of Density?
(Thanks to psycnet.apa.org)
As d → ∞: ‖Z‖ = d^{1/2} + O_p(1)
• Data lie roughly on surface of sphere, with radius d^{1/2}
• Yet origin is point of highest density?!?
• Paradox resolved by: density w.r.t. Lebesgue Measure
Lebesgue Measure Pushes Mass Out; Density Pulls Data In; d^{1/2} Is The Balance Point
Important Philosophical Consequence: "Average People"
Parent's Lament: Why Can't I Have Average Children?
Theorem: Impossible (over many factors)
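A minimal numerical check of the paradox (the dimension and sample count are my choices): every draw from a high-dimensional standard normal lands within a few units of the radius-√d sphere, and none land near the density peak at the origin.

```python
import numpy as np

# Norm concentration for Z ~ N_d(0, I_d): ||Z|| = sqrt(d) + O_p(1),
# even though the density is maximized at the origin.
rng = np.random.default_rng(2)
d, n = 10_000, 100
Z = rng.standard_normal((n, d))
norms = np.linalg.norm(Z, axis=1)   # all near sqrt(10000) = 100
mean_norm = norms.mean()
min_norm, max_norm = norms.min(), norms.max()
```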
HDLSS Asymptotics: Simple Paradoxes
Distance tends to a non-random constant: ‖Z_1 − Z_2‖ = (2d)^{1/2} + O_p(1)
• Factor of 2^{1/2} since sd²(X_1 − X_2) = sd²(X_1) + sd²(X_2) = 2 sd²(X_1)
Can extend to Z_1, …, Z_n
HDLSS Asymptotics: Simple Paradoxes
For d-dim'al Standard Normal dist'n: Z_2 ~ N_d(0, I_d), indep. of Z_1
High dim'al Angles (as d → ∞): Angle(Z_1, Z_2) = 90° + O_p(d^{−1/2})
• Everything is orthogonal!
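The near-orthogonality is also easy to verify by simulation (dimension and number of pairs are my choices); in d = 10,000 the O_p(d^{−1/2}) fluctuation is roughly half a degree:

```python
import numpy as np

# Angles between independent high-dimensional Gaussian vectors:
# Angle(Z1, Z2) = 90 degrees + O_p(d^{-1/2}).
rng = np.random.default_rng(3)
d, pairs = 10_000, 50
devs = []
for _ in range(pairs):
    z1, z2 = rng.standard_normal(d), rng.standard_normal(d)
    cos = z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2))
    devs.append(abs(np.degrees(np.arccos(cos)) - 90.0))
max_dev = max(devs)   # worst deviation from 90 degrees, a couple of degrees at most
```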
HDLSS Asy's: Geometrical Represent'n
Assume Z_1, …, Z_n ~ N_d(0, I_d), let d → ∞
Study Subspace Generated by Data:
• Hyperplane through 0, of dimension n
• Points are "nearly equidistant to 0", & dist ~ d^{1/2}
• Within plane, can "rotate towards d^{1/2} × Unit Simplex"
• All Gaussian data sets are "near Unit Simplex Vertices"
• "Randomness" appears only in rotation of simplex
Hall, Marron & Neeman (2005)
HDLSS Asy's: Geometrical Represent'n
Assume Z_1, …, Z_n ~ N_d(0, I_d), let d → ∞
Study Hyperplane Generated by Data:
• n − 1 dimensional hyperplane
• Points are pairwise equidistant, dist ~ (2d)^{1/2}
• Points lie at vertices of (2d)^{1/2} × "regular n-hedron"
• Again, "randomness in data" is only in rotation
Surprisingly rigid structure in random data!
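A quick simulation of the rigid-simplex picture (sample and dimension sizes are my choices): all pairwise distances between a few Gaussian points in huge dimension agree with (2d)^{1/2} to within a couple of percent.

```python
import numpy as np

# Geometric representation check: n Gaussian points in dimension d are
# nearly pairwise equidistant at distance sqrt(2d), i.e. they sit near the
# vertices of a rigid regular simplex.
rng = np.random.default_rng(4)
n, d = 5, 20_000
Z = rng.standard_normal((n, d))
dists = np.array([np.linalg.norm(Z[i] - Z[j])
                  for i in range(n) for j in range(i + 1, n)])
rel_spread = dists / np.sqrt(2 * d)            # every entry close to 1
max_rel_dev = np.abs(rel_spread - 1.0).max()   # a percent or two
```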
HDLSS Asy's: Geometrical Represent'n
Simulation View: study "rigidity after rotation"
• Simple 3 point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors
Simulation View Shows "Rigidity after Rotation"
Now Recall HDLSS Simulation Results Comparing DWD, SVM & Others, from 10/21/14
HDLSS Discrim'n Simulations
Main idea: Comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions
Overall Approach:
• Study different known phenomena:
– Spherical Gaussians
– Outlier Mixture
– Wobble Mixture
– Nested Spheres (Polynomial Embedding)
• Common Sample Sizes: n+ = n− = 25
• But wide range of dimensions: d = 10, 40, 100, 400, 1600
HDLSS Discrim'n Simulations
Interesting Phenomenon: All methods come together in very high dimensions
Can we say more about "All methods come together in very high dimensions"?
A Mathematical Statistical Question. Mathematics behind this:
(Use Geometric Representation)
HDLSS Asy's: Geometrical Represent'n
Explanation of Observed (Simulation) Behavior: "everything similar for very high d"
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large, in the sense:
For d → ∞, assume ε_d = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²) satisfies 1/(d ε_d) = o(1), i.e. ε_d ≫ 1/d (min possible)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background: In classical multivariate analysis, the statistic
ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)
is called the "epsilon statistic",
and is used to test "sphericity" of dist'n, i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics
Can show the epsilon statistic satisfies: 1/d ≤ ε ≤ 1
• For spherical Normal: ε = 1
• Single extreme eigenvalue gives: ε ≈ 1/d
• So assumption ε ≫ 1/d is very mild
• Much weaker than mixing conditions
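These facts can be checked directly from the definition (a small sketch; the `epsilon` helper function is mine):

```python
import numpy as np

# Epsilon (sphericity) statistic: eps = (sum lambda_j)^2 / (d * sum lambda_j^2).
# By Cauchy-Schwarz it always lies in [1/d, 1]; spherical covariances hit the
# top, a single dominant eigenvalue drives it toward the bottom.
def epsilon(lams):
    lams = np.asarray(lams, dtype=float)
    d = lams.size
    return lams.sum() ** 2 / (d * (lams ** 2).sum())

d = 1000
eps_spherical = epsilon(np.ones(d))            # all eigenvalues equal -> exactly 1
eps_spike = epsilon(np.r_[d, np.ones(d - 1)])  # one huge eigenvalue -> about 4/d
eps_random = epsilon(np.random.default_rng(5).uniform(0.5, 2.0, d))
```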
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments, and no eigenvalues too large. Then:
‖X_i − X_j‖ = d^{1/2} O_p(1)
Not so strong as before: ‖Z_1 − Z_2‖ = (2d)^{1/2} + O_p(1)
2nd Paper on HDLSS Asymptotics
Can we improve on ‖X_i − X_j‖ = d^{1/2} O_p(1)?
John Kent example: Normal scale mixture
X_i ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
Won't get ‖X_i − X_j‖ = C d^{1/2} + O_p(1)
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using:
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal Scale Mixture: X_i ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
• Data Vectors are indep'dent of each other
• But entries of each have strong depend'nce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply Independence
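Simulating Kent's mixture shows the two norm scales directly (sizes are my choice): each vector's norm lands near d^{1/2} or near 10 d^{1/2}, depending on its random mixture component, so ‖X‖/√d never settles on one constant.

```python
import numpy as np

# Kent's normal scale mixture: X ~ 0.5 N_d(0, I) + 0.5 N_d(0, 100 I).
# Per vector, the whole vector is drawn at scale 1 or scale 10, so the
# scaled norms ||X|| / sqrt(d) cluster bimodally near 1 and near 10.
rng = np.random.default_rng(6)
d, n = 2000, 40
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # one component per vector
X = scales[:, None] * rng.standard_normal((n, d))
ratios = np.linalg.norm(X, axis=1) / np.sqrt(d)
near_1 = int(np.sum(np.abs(ratios - 1.0) < 0.5))    # scale-1 draws
near_10 = int(np.sum(np.abs(ratios - 10.0) < 0.5))  # scale-10 draws
```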
0 Covariance is not independence
Simple Example:
• Random Variables X and Y, each ~ N(0,1)
• Make both Gaussian, with strong dependence, yet 0 covariance
(Note: Not Using Multivariate Gaussian)
Given c > 0, define: Y = X when |X| > c, and Y = −X when |X| ≤ c
Choose c to make cov(X, Y) = 0:
• Distribution is degenerate: supported on the diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) > 0
• For large c, have cov(X, Y) < 0
• By continuity, ∃ c with cov(X, Y) = 0
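A numerical sketch of this construction (the sign-flip form Y = X on |X| > c, Y = −X on |X| ≤ c is my reading of the garbled slide, chosen to match the small-c / large-c covariance signs):

```python
import numpy as np

# Sign-flip construction: Y = X where |X| > c, Y = -X where |X| <= c.
# Y remains N(0,1) marginally, the pair lives on y = +/-x, and cov(X, Y)
# moves continuously from +1 (c = 0) to -1 (c large), so some intermediate
# c gives exactly zero covariance despite total dependence.
rng = np.random.default_rng(7)
X = rng.standard_normal(500_000)

def make_y(c):
    return np.where(np.abs(X) > c, X, -X)

cov_small_c = float(np.mean(X * make_y(0.1)))  # mostly Y = X  -> strongly positive
cov_large_c = float(np.mean(X * make_y(3.0)))  # mostly Y = -X -> strongly negative
marginal_var = float(make_y(1.0).var())        # still ~ 1: Gaussian marginal kept
```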
0 Covariance is not independence
Result: Joint distribution of X and Y:
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian
Shows: Multivariate Gaussian means more than Gaussian Marginals
HDLSS Asy's: Geometrical Represent'n
Further Consequences of Geometric Represent'n:
1. DWD more stable than SVM (based on deeper limiting distributions; reflects intuitive idea of feeling sampling variation, something like mean vs. median), Hall, Marron, Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)
3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(Study Properties of PCA, in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
Spike Covariance Model, Paul (2007): For Eigenvalues
λ_{1,d} = d^α, λ_{2,d} = ⋯ = λ_{d,d} = 1
Note Critical Parameter: α
1st Eigenvector: u_1 (Turns out: Direction Doesn't Matter)
How Good are Empirical Versions, λ̂_{1,d}, …, λ̂_{d,d}, û_1, as Estimates?
Consistency (big enough spike): For α > 1, Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough): For α < 1, Angle(û_1, u_1) → 90°
Intuition: Random Noise ~ d^{1/2} (Recall α is on Scale of Variance)
• For α > 1: Spike Pops Out of Pure Noise Sphere
• For α < 1: Spike Contained in Pure Noise Sphere
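The consistency / strong-inconsistency split shows up clearly in simulation, even at moderate size (a sketch; the values of d, n and α are my choices, and the dual Gram-matrix computation is an implementation shortcut, not from the slides):

```python
import numpy as np

# Spike model: lambda_1 = d^alpha along u_1 = e_1, all other eigenvalues 1.
# alpha > 1: Angle(u1_hat, u1) should be small (consistency);
# alpha < 1: Angle(u1_hat, u1) should be near 90 degrees (strong inconsistency).
rng = np.random.default_rng(8)

def pc1_angle(alpha, d=20_000, n=20):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)          # spike along e_1
    # dual PCA: top eigenvector of the n x n Gram matrix, mapped back to R^d
    vals, vecs = np.linalg.eigh(X @ X.T / n)
    u1_hat = X.T @ vecs[:, -1]
    u1_hat /= np.linalg.norm(u1_hat)
    return np.degrees(np.arccos(abs(u1_hat[0])))  # angle to e_1, in degrees

angle_consistent = pc1_angle(alpha=1.5)     # expect a few degrees
angle_inconsistent = pc1_angle(alpha=0.4)   # expect close to 90 degrees
```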
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asymptotics Simple Paradoxes
As
-Data lie roughly on surface of sphere
with radius
- Yet origin is point of highest density
- Paradox resolved by
density w r t Lebesgue Measure
d
)1(pOdZ
d
HDLSS Asymptotics Simple Paradoxes
- Paradox resolved by
density w r t Lebesgue Measure
Lebesgue Measure Pushes Mass Out Density Pulls Data In Is The Balance Point
HDLSS Asymptotics Simple Paradoxes
As
Important Philosophical Consequence
ldquoAverage Peoplerdquo
Parents Lament
Why Canrsquot I Have Average Children
Theorem Impossible (over many factors)
d )1(pOdZ
HDLSS Asymptotics Simple Paradoxes
Distance tends to non-random constant
bullFactor since
Can extend to
)1(221 pOdZZ
nZZ
1
222
121 XsdXsdXXsd 2
HDLSS Asymptotics Simple Paradoxes
For dimrsquoal Standard Normal distrsquon
indep of
High dimrsquoal Angles (as )
- Everything is orthogonal
d
d
dd INZ 0~2
)(90 2121
dOZZAngle p
1Z
HDLSS Asyrsquos Geometrical Representrsquon
Assume let
Study Subspace Generated by Data
Hyperplane through 0
of dimension
Points are ldquonearly equidistant to 0rdquo
amp dist
Within plane can
ldquorotate towards Unit Simplexrdquo
All Gaussian data sets are
ldquonear Unit Simplex Verticesrdquo
ldquoRandomnessrdquo appears
only in rotation of simplex
n
d ddn INZZ 0~1
d
d
Hall Marron amp Neeman (2005)
HDLSS Asyrsquos Geometrical Representrsquon
Assume let
Study Hyperplane Generated by Data
dimensional hyperplane
Points are pairwise equidistant dist
Points lie at vertices of
ldquoregular hedronrdquo
Again ldquorandomness in datardquo is only in rotation
Surprisingly rigid structure in random data
1n
d ddn INZZ 0~1
d2d2~
n
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View Shows ldquoRigidity after Rotationrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Now Recall HDLSS Simulation Results
Comparing DWD SVM amp Others from 102114
HDLSS Discrimrsquon Simulations
Main idea
Comparison of
bull SVM (Support Vector Machine)
bull DWD (Distance Weighted Discrimination)
bull MD (Mean Difference aka Centroid)
Linear versions across dimensions
HDLSS Discrimrsquon Simulations
Overall Approachbull Study different known phenomena
ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding
bull Common Sample Sizes
bull But wide range of dimensions25 nn
16004001004010d
HDLSS Discrimrsquon Simulations
Spherical Gaussians
HDLSS Discrimrsquon Simulations
Outlier Mixture
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asy's Geometrical Represen'tion
Explanation of Observed (Simulation) Behavior:
"everything similar for very high d"
• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• So "sensible methods are all nearly the same"
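The "same distance from the other class" claim can be spot-checked; a sketch with an illustrative mean shift and class sizes:

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 20000, 10
mu = np.zeros(d)
mu[0] = 3.0                              # illustrative mean shift between classes
A = rng.standard_normal((n, d)) + mu     # class +
B = rng.standard_normal((n, d)) - mu     # class -

# Every point of class + is nearly the same distance from every point of class -
dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
print(dists.std() / dists.mean())        # tiny relative spread ("data piling")
```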
HDLSS Asy's Geometrical Represen'tion
Straightforward Generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild eigenvalue condition on theoretical covariance (Ahn, Marron, Muller & Chi 2007)
All based on simple "Laws of Large Numbers".
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): assume 2nd moments.
Assume no eigenvalues too large, in the sense:
for eigenvalues λ1 ≥ … ≥ λd ≥ 0, assume
  (λ1² + … + λd²) / (λ1 + … + λd)² = o(1)  as d → ∞
(the minimum possible value of this ratio is 1/d).
(Much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background: in classical multivariate analysis, the statistic
  ε = (λ1 + … + λd)² / (d · (λ1² + … + λd²))
is called the "epsilon statistic",
and is used to test "sphericity" of the dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics
Can show the epsilon statistic ε = (Σj λj)² / (d Σj λj²) satisfies
  1/d ≤ ε ≤ 1.
• For spherical Normal: ε = 1.
• A single extreme eigenvalue gives ε ≈ 1/d.
• So the assumption (ε well above its minimum 1/d) is very mild.
• Much weaker than mixing conditions.
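The two benchmark values of ε can be verified directly; a small sketch (the eigenvalue inputs are illustrative):

```python
import numpy as np

def epsilon_stat(lam):
    """Sphericity statistic: (sum lam)^2 / (d * sum lam^2)."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
print(epsilon_stat(np.ones(d)))          # spherical: exactly 1
spike = np.r_[1e6, np.ones(d - 1)]
print(epsilon_stat(spike))               # one huge eigenvalue: about 1/d
```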
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): assume 2nd moments,
assume no eigenvalues too large. Then:
  ‖Xi − Xj‖ = √d · Op(1).
Not so strong as before: ‖Z1 − Z2‖ = √(2d) + Op(1).
2nd Paper on HDLSS Asymptotics
Can we improve on ‖Xi − Xj‖ = √d · Op(1)?
John Kent example, Normal scale mixture:
  X ~ 0.5 · Nd(0, Id) + 0.5 · Nd(0, 100 Id).
Won't get ‖Xi − Xj‖ = C · √d · (1 + op(1)).
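A sketch of why no single constant C can work for Kent's mixture (one representative draw of the mixture labels is fixed for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10000
# One representative draw of the mixture labels: first 3 vectors from N(0, I_d),
# last 3 from N(0, 100 I_d)  (in the mixture each label has probability 1/2)
sigma = np.array([1.0, 1.0, 1.0, 10.0, 10.0, 10.0])
X = sigma[:, None] * rng.standard_normal((6, d))

# ||X_i - X_j|| / sqrt(d) concentrates near sqrt(sigma_i^2 + sigma_j^2),
# which takes three distinct values: no single constant C
scaled = np.array([[np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
                    for j in range(6)] for i in range(6)])
targets = np.sqrt(sigma[:, None] ** 2 + sigma[None, :] ** 2)
print(np.unique(np.round(targets[~np.eye(6, dtype=bool)], 2)))
```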
3rd Paper on HDLSS Asymptotics
Get geometrical representation using:
• 4th moment assumption
• stronger covariance matrix (only) assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal scale mixture X ~ 0.5 Nd(0, Id) + 0.5 Nd(0, 100 Id):
• Data vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore: "Covariance = 0 implies Independence" (false!)
0 Covariance is not independence
Simple Example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance.
Given c > 0, define
  Y = X when |X| ≤ c,  Y = −X when |X| > c.
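Since here cov(X, Y) = 2·E[X²·1{|X| ≤ c}] − 1, the balancing c can be found by bisection on the closed form for the truncated second moment; a sketch:

```python
import math

def cov_xy(c):
    """cov(X, Y) for X ~ N(0,1), Y = X on |X| <= c, Y = -X on |X| > c.
    Equals 2 * E[X^2 1{|X| <= c}] - 1 (closed-form truncated 2nd moment)."""
    Phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    trunc_2nd = (2 * Phi - 1) - 2 * c * phi      # E[X^2 1{|X| <= c}]
    return 2 * trunc_2nd - 1

# Bisection: cov < 0 for small c, cov > 0 for large c, so a root exists
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_star = (lo + hi) / 2
print(round(c_star, 2))   # roughly 1.54
```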
0 Covariance is not independence
Simple Example: choose c to make cov(X, Y) = 0.
0 Covariance is not independence
Simple Example:
• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0
0 Covariance is not independence
Result: the joint distribution of X and Y
– has Gaussian marginals,
– has cov(X, Y) = 0,
– yet strong dependence of X and Y,
– thus is not multivariate Gaussian.
Shows: multivariate Gaussian means more than Gaussian marginals.
HDLSS Asy's Geometrical Represen'tion
Further Consequences of Geometric Represen'tion:
1. DWD more stable than SVM (based on deeper limiting distributions; reflects the intuitive idea of feeling sampling variation, something like mean vs. median), Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes (motivates weighted version), Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(study properties of PCA in estimating eigen-directions & -values)
[Assume data are mean centered]
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
Spike Covariance Model, Paul (2007): for eigenvalues
  λ1 = d^α,  λ2 = … = λd = 1.
Note critical parameter: α.
1st eigenvector: u1 (turns out the direction doesn't matter).
How good are the empirical versions λ̂1, …, λ̂d, û1 as estimates?
Consistency (big enough spike): for α > 1,
  Angle(û1, u1) → 0.
Strong Inconsistency (spike not big enough): for α < 1,
  Angle(û1, u1) → 90°.
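A quick simulation of the two regimes (sample size, dimension, seed, and the α values are illustrative choices):

```python
import numpy as np

def top_angle(alpha, d=20000, n=20, seed=4):
    """Angle (degrees) between true and empirical first PC direction
    in the spike model lambda_1 = d^alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)        # spike along the first coordinate
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = abs(Vt[0, 0])                   # |<u1_hat, e1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

print(top_angle(1.2))   # big spike (alpha > 1): small angle (consistency)
print(top_angle(0.3))   # small spike (alpha < 1): large angle (inconsistency)
```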
HDLSS Math Stat of PCA
Intuition: random noise ~ d^(1/2).
For α > 1 (recall d^α is on the scale of variance):
  spike pops out of pure noise sphere.
For α < 1:
  spike contained in pure noise sphere.
HDLSS Math Stat of PCA
Consistency of eigenvalues?
  λ̂1 / λ1 →L χ²n / n  (n fixed, d → ∞)
• Eigenvalues inconsistent
• But known distribution
• Consistent when n → ∞ as well
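A Monte Carlo check, assuming the limit law is χ²n/n for the Gaussian spike model with known (zero) mean and divisor n; the settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, alpha, reps = 2000, 10, 2.0, 200
lam1 = d ** alpha
ratios = []
for _ in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)            # spike along the first coordinate
    S = X @ X.T / n                     # n x n dual of sample covariance (no centering)
    ratios.append(np.linalg.eigvalsh(S).max() / lam1)
ratios = np.array(ratios)

# Compare with chi2_n / n: mean 1, variance 2/n = 0.2
print(ratios.mean(), ratios.var())
```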
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consist.
John Kent example: X ~ 0.5 Nd(0, Id) + 0.5 Nd(0, 100 Id).
Can only say ‖X‖ = d^(1/2) · Op(1):
  ≈ d^(1/2) w.p. 1/2,  ≈ 10 · d^(1/2) w.p. 1/2,
not deterministic.
PCA conditions same, since noise still Op(d^(1/2)).
But for Geo Rep'n, need some mixing cond.
Conditions for Geo Rep'n
Conclude: need some mixing condition.
HDLSS Math Stat of PCA
Mixing Conditions
Idea from Probability Theory:
Recall standard asymptotic results, as n → ∞:
• Law of Large Numbers ("weak" = in prob., "strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!),
e.g. independent and ident. dist'd.
Mixing conditions: explore weaker assumptions that still give the
Law of Large Numbers and Central Limit Theorem.
Mixing Conditions
• A whole area in probability theory: a large literature.
• A comprehensive reference: Bradley (2005, update of 1986 version).
• Better: newer references.
Mixing Conditions
Mixing condition used here: ρ-mixing.
For random variables X1, X2, …, define
  ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(σ(X1, …, Xj)), g ∈ L²(σ(Xj+k, Xj+k+1, …)) },
where the sigma-fields are generated by:
• the early segment X1, …, Xj,
• the late segment Xj+k, Xj+k+1, …
• note the gap of lag k.
Assume ρ(k) → 0 as k → ∞.
Idea: uncorrelated at far lags.
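A tiny illustration: for a stationary Gaussian AR(1) sequence the lag-k correlation is a^k, and for Gaussian sequences the ρ-mixing coefficient decays at the same geometric rate, so the assumption holds (a = 0.8 is arbitrary):

```python
import numpy as np

# Stationary Gaussian AR(1): X_{j+1} = a X_j + e_j, so corr(X_j, X_{j+k}) = a**k.
# The maximal-correlation (rho-mixing) coefficient decays geometrically as well.
a = 0.8
rho = np.array([a ** k for k in range(0, 30, 5)])
print(rho)   # decays toward 0, so the rho-mixing assumption holds
```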
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005) assume the entries X1, X2, …, Xd of the data vectors are ρ-mixing.
Drawback: strong assumption.
(In JRSS-B, since Biometrika refused.)
Conditions for Geo Rep'n
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012)
  (fully covariance based, no mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Tricky point: classical mixing conditions require a notion of time ordering,
not always clear, e.g. for microarrays.
HDLSS Math Stat of PCA
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
  X_d ~ (0, Σ_d),  where Σ_d = U_d Λ_d U_dᵗ
(note: not necessarily Gaussian).
Define the standardized version
  Z_d = Λ_d^(−1/2) U_dᵗ X_d.
Assume ∃ a permutation of the entries so that Z_d is ρ-mixing.
HDLSS Math Stat of PCA
Careful look at PCA consistency, the α > 1 spike
(reality check suggested by a reviewer):
the condition is independent of sample size,
so it holds even for n = 1 (!).
Reviewer's conclusion: absurd, shows the assumption is too strong for practice.
HDLSS PCA often finds signal, not pure noise.
HDLSS Math Stat of PCA
Recall RNAseq data from 8/23/12: d ≈ 1700, n = 180.
HDLSS Math Stat of PCA
Manually brushed clusters show clear alternate splicing, not noise.
Functional Data Analysis
Recall theoretical separation:
• Strong inconsistency: spike α < 1
• Consistency: spike α > 1
Mathematically driven conclusion: real data signals are this strong.
HDLSS Math Stat of PCA
An Interesting Objection:
Should not study angles in PCA.
Recall, for consistency (α > 1): Angle(û1, u1) → 0;
for strong inconsistency (α < 1): Angle(û1, u1) → 90°.
HDLSS Math Stat of PCA
An Interesting Objection:
Should not study angles in PCA,
because PC scores (i.e. projections) are not consistent.
For scores ŝij = P_ûj xi (what we study in PCA scatterplots) and sij = P_uj xi,
can show ŝij / sij → Rj ≠ 1 (random).
(Thanks to Dan Shen.)
HDLSS Math Stat of PCA
PC scores (i.e. projections) not consistent,
so how can PCA find useful signals in data?
Key is "proportional errors": ŝij / sij → Rj,
with the same realization of Rj for all i = 1, …, n.
Axes have inconsistent scales,
but relationships are still useful.
HDLSS Math Stat of PCA
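The "proportional errors" point in a sketch: scaling each score column by a common (per-component) factor leaves within-component relationships exact. The factor values below are illustrative stand-ins for realizations of Rj:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
s = rng.standard_normal((n, 2))          # "true" PC scores, 2 components
R = np.array([1.7, 0.4])                 # one factor per component, same for all i
s_hat = s * R                            # "proportional errors": s_hat_ij = R_j * s_ij

# Scales are wrong, but within each component the relative configuration is exact
corr = [np.corrcoef(s[:, j], s_hat[:, j])[0, 1] for j in range(2)]
print(corr)                              # both equal 1 (up to rounding)
```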
In PCA consistency:
• Strong inconsistency: spike α < 1
• Consistency: spike α > 1
What happens at the boundary (α = 1)?
Result: ∃ interesting limit dist'ns, Jung, Sen & Marron (2012).
HDLSS Deep Open Problem
Recall flexibility from the kernel embedding idea.
HDLSS Asymptotics & Kernel Methods
Interesting Question: behavior in very high dimension?
Answer, El Karoui (2010):
• in the random matrix limit,
• kernel embedded classifiers ~ linear classifiers.
HDLSS Asymptotics & Kernel Methods
Interesting Question: behavior in very high dimension?
Implications for DWD: recall its main advantage is for high d,
so it is not clear that embedding helps;
thus not yet implemented in DWD.
HDLSS Asymptotics & Kernel Methods
HDLSS Additional Results
Batch adjustment (Xuxin Liu):
recall the intuition from above,
the key is the sizes of biological subtypes;
a differing ratio trips up the mean,
but DWD is more robust.
Mathematics behind this?
HDLSS Asymptotics: Simple Paradoxes
As d → ∞, ‖Z‖ = √d + Op(1).
Important philosophical consequence: "average people".
Parents' lament: why can't I have average children?
Theorem: impossible (over many factors).
HDLSS Asymptotics: Simple Paradoxes
Distance tends to a non-random constant:
  ‖Z1 − Z2‖ = √(2d) + Op(1).
• Factor of √2 since variances add: sd(X1 − X2)² = sd(X1)² + sd(X2)²
• Can extend to Z1, …, Zn
HDLSS Asymptotics: Simple Paradoxes
For d-dim'al Standard Normal dist'n:
Z1 ~ Nd(0, Id), independent of Z2.
High dim'al angles (as d → ∞):
  Angle(Z1, Z2) = 90° + Op(d^(−1/2)).
Everything is orthogonal.
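A numerical check of near-orthogonality (d = 20000 and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 20000
Z1, Z2 = rng.standard_normal(d), rng.standard_normal(d)

# cos of the angle is about N(0, 1/d), so the angle is 90 degrees
# up to a fluctuation of order d**-0.5 radians
cos = Z1 @ Z2 / (np.linalg.norm(Z1) * np.linalg.norm(Z2))
angle = np.degrees(np.arccos(cos))
print(angle)    # close to 90
```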
HDLSS Asy's Geometrical Represent'n
Assume n fixed, let d → ∞; Z1, …, Zn ~ Nd(0, Id).
Study subspace generated by data:
• hyperplane through 0 of dimension n
• points are "nearly equidistant to 0", at distance ≈ √d
• within plane, can "rotate towards unit simplex"
• all Gaussian data sets are "near unit simplex vertices"
• "randomness" appears only in rotation of simplex
Hall, Marron & Neeman (2005)
HDLSS Asy's Geometrical Represent'n
Assume n fixed, let d → ∞; Z1, …, Zn ~ Nd(0, Id).
Study hyperplane generated by data:
• n − 1 dimensional hyperplane
• points are pairwise equidistant, distance ≈ √(2d)
• points lie at vertices of "regular n-hedron"
Again ldquorandomness in datardquo is only in rotation
Surprisingly rigid structure in random data
1n
d ddn INZZ 0~1
d2d2~
n
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View Shows ldquoRigidity after Rotationrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Now Recall HDLSS Simulation Results
Comparing DWD SVM amp Others from 102114
HDLSS Discrimrsquon Simulations
Main idea
Comparison of
bull SVM (Support Vector Machine)
bull DWD (Distance Weighted Discrimination)
bull MD (Mean Difference aka Centroid)
Linear versions across dimensions
HDLSS Discrimrsquon Simulations
Overall Approachbull Study different known phenomena
ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding
bull Common Sample Sizes
bull But wide range of dimensions25 nn
16004001004010d
HDLSS Discrimrsquon Simulations
Spherical Gaussians
HDLSS Discrimrsquon Simulations
Outlier Mixture
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
Mixing Conditions

Idea From Probability Theory:
Recall Standard Asymptotic Results, as $n \to \infty$:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions (Usually Ignored!),
E.g. Independent and Ident. Dist'd

Mixing Conditions:
Explore Weaker Assumptions, to Still Get:
• Law of Large Numbers
• Central Limit Theorem

Mixing Conditions:
• A Whole Area in Probability Theory
• a Large Literature
• A Comprehensive Reference:
Bradley (2005, update of 1986 version)
• Better, Newer References?

Mixing Condition Used Here: Rho-Mixing

For Random Variables $X_1, X_2, \dots$, Define:
$\rho(k) = \sup \left\{ |\mathrm{Corr}(f, g)| : f \in L_2(\mathcal{F}_1^j),\ g \in L_2(\mathcal{F}_{j+k}^\infty) \right\}$
Where $\mathcal{F}_1^j$ and $\mathcal{F}_{j+k}^\infty$ are the Sigma-Fields Generated by $X_1, \dots, X_j$ and by $X_{j+k}, X_{j+k+1}, \dots$
(Note: Gap of Lag $k$)

Assume: $\rho(k) \to 0$, as $k \to \infty$
Idea: Uncorrelated at Far Lags

Mixing Conditions
Conditions for Geo Rep'n:

Hall, Marron and Neeman (2005):
Assume the Entries $X^{(1)}, X^{(2)}, \dots, X^{(d)}$ of the Data Vectors Are $\rho$-mixing

Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused!)

Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays

HDLSS Math Stat of PCA
Conditions for Geo Rep'n:

Condition from Jung & Marron (2009):
$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$
(Note: Not Gaussian)

Define Standardized Version:
$Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume $\exists$ a permutation $\pi_d$,
So that $Z_{\pi_d(1)}, \dots, Z_{\pi_d(d)}$ is $\rho$-mixing
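The standardization step can be sketched numerically (assuming numpy; the dimension and the eigenvalues are illustrative choices): $Z_d = \Lambda_d^{-1/2} U_d^t X_d$ has identity covariance, so only dependence beyond second moments remains.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 5, 100_000

# build Sigma_d = U_d Lambda_d U_d^t with a known eigen-decomposition
U, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal U_d
lam = np.array([10.0, 5.0, 2.0, 1.0, 0.5])         # eigenvalues Lambda_d
Sigma = U @ np.diag(lam) @ U.T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # rows ~ (0, Sigma_d)
Z = X @ U @ np.diag(lam ** -0.5)   # row-wise version of Lambda^{-1/2} U^t X

print(np.round(np.cov(Z.T), 2))    # approximately the identity matrix
```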
Careful look at: PCA Consistency, $\alpha > 1$ spike
(Reality Check, Suggested by Reviewer)

The condition is Independent of Sample Size,
So it is true even for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows the assumption is too strong for practice

HDLSS Math Stat of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math Stat of PCA

[Figure: Recall RNAseq Data From 8/23/12, d ~ 1700, n = 180;
Manually Brushed Clusters show Clear Alternate Splicing, Not Noise]
Functional Data Analysis

Recall Theoretical Separation:
Strong Inconsistency: $\alpha < 1$ spike
Consistency: $\alpha > 1$ spike

Mathematically Driven Conclusion:
Real Data Signals Are This Strong!

HDLSS Math Stat of PCA
An Interesting Objection:
Should not Study Angles in PCA

Recall, for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$
For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$

Because PC Scores (i.e. projections) are Not Consistent:
For Scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ and $s_{ij} = P_{v_j} x_i$
(What we study in PCA scatterplots)

Can Show: $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j \neq 1$ (Random)
(Thanks to Dan Shen)

HDLSS Math Stat of PCA
PC Scores (i.e. projections) Not Consistent,
So how can PCA find Useful Signals in Data?

Recall: HDLSS PCA Often Finds Signal, Not Pure Noise

Key is "Proportional Errors": $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j$,
with the Same Realization of $R_j$ for $i = 1, \dots, n$

Axes have Inconsistent Scales,
But Relationships are Still Useful

HDLSS Math Stat of PCA
HDLSS Deep Open Problem

In PCA Consistency:
Strong Inconsistency: $\alpha < 1$ spike
Consistency: $\alpha > 1$ spike
What happens at the boundary ($\alpha = 1$)?

Result: $\exists$ interesting Limit Dist'ns,
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?

Answer: El Karoui (2010):
• In the Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:
Recall the Main Advantage is for High d,
So it is not Clear Embedding Helps,
Thus not yet Implemented in DWD
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall Intuition from above:
• Key is sizes of biological subtypes
• Differing ratio trips up the mean
• But DWD is more robust
Mathematics behind this?
HDLSS Asymptotics: Simple Paradoxes

As $d \to \infty$: $\|Z_d\| = \sqrt{d} + O_p(1)$

Important Philosophical Consequence:
No "Average People"!
Parents' Lament: Why Can't I Have Average Children?
Theorem: Impossible (over many factors)

Distance tends to a non-random constant:
$\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$
• Factor $\sqrt{2}$, since $\mathrm{sd}(X_1 - X_2)^2 = \mathrm{sd}(X_1)^2 + \mathrm{sd}(X_2)^2$
Can extend to $Z_1, \dots, Z_n$

For $d$-dim'al Standard Normal dist'n:
$Z_2 \sim N_d(0, I_d)$, indep. of $Z_1$
High dim'al Angles (as $d \to \infty$):
$\mathrm{Angle}(Z_1, Z_2) = 90^\circ + O_p(d^{-1/2})$
- Everything is orthogonal!
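These paradoxes are easy to see in a small simulation (assuming numpy; $d = 10000$ and $n = 5$ are illustrative choices): norms concentrate at $\sqrt{d}$, pairwise distances at $\sqrt{2d}$, and pairwise angles at $90^\circ$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10_000, 5
Z = rng.standard_normal((n, d))            # n independent N_d(0, I_d) vectors

norms = np.linalg.norm(Z, axis=1)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
dists = np.array([np.linalg.norm(Z[i] - Z[j]) for i, j in pairs])
angles = np.array([np.degrees(np.arccos(Z[i] @ Z[j] / (norms[i] * norms[j])))
                   for i, j in pairs])

print(norms / np.sqrt(d))          # all near 1:  ||Z|| ~ sqrt(d)
print(dists / np.sqrt(2 * d))      # all near 1:  distances ~ sqrt(2d)
print(angles)                      # all near 90 degrees
```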
HDLSS Asy's: Geometrical Represent'n

Assume $Z_1, \dots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$
Study Subspace Generated by Data:
• Hyperplane through 0, of dimension $n$
• Points are "nearly equidistant to 0", & dist $\approx \sqrt{d}$
• Within plane, can "rotate towards Unit Simplex"
• All Gaussian data sets are "near Unit Simplex Vertices"
• "Randomness" appears only in rotation of simplex
Hall, Marron & Neeman (2005)

HDLSS Asy's: Geometrical Represent'n

Assume $Z_1, \dots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$
Study Hyperplane Generated by Data:
• $n - 1$ dimensional hyperplane
• Points are pairwise equidistant, dist $\approx \sqrt{2d}$
• Points lie at vertices of "regular $n$-hedron"
• Again, "randomness in data" is only in rotation
Surprisingly rigid structure in random data!
HDLSS Asy's: Geometrical Represent'n

Simulation View: study "rigidity after rotation"
• Simple 3 point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to the plane of the screen
• Rotate within the plane to make "comparable"
• Repeat 10 times, use different colors

The Simulation View Shows "Rigidity after Rotation"
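A sketch of that simulation (assuming numpy; the alignment convention, rotating the first point onto the positive x-axis, is one illustrative way to make the planes "comparable"): project each 3-point data set onto the 2-d plane it spans, then rotate within the plane. With the $1/\sqrt{2d}$ scaling, the triangles become rigid (near equilateral with unit sides) as $d$ grows.

```python
import numpy as np

rng = np.random.default_rng(3)

def planar_coords(points):
    """Project 3 d-dim points onto the 2-d plane through their mean,
    then rotate within the plane so point 1 lies on the positive x-axis."""
    centered = points - points.mean(axis=0)
    q, _ = np.linalg.qr(centered.T)          # orthonormal basis; span is 2-d
    xy = centered @ q[:, :2]                 # in-plane coordinates
    theta = np.arctan2(xy[0, 1], xy[0, 0])
    c, s = np.cos(-theta), np.sin(-theta)
    return xy @ np.array([[c, -s], [s, c]]).T

for d in (2, 20, 200, 20_000):
    pts = rng.standard_normal((3, d)) / np.sqrt(2 * d)  # scale: sides -> 1
    print(d, np.round(planar_coords(pts), 2))
```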
HDLSS Asy's: Geometrical Represent'n

Now Recall the HDLSS Simulation Results,
Comparing DWD, SVM & Others, from 10/21/14
HDLSS Discrim'n Simulations

Main idea: Comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions

Overall Approach:
• Study different known phenomena:
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common Sample Sizes: $n_+ = n_- = 25$
• But a wide range of dimensions: $d = 10, 40, 100, 400, 1600$

[Figures: Spherical Gaussians; Outlier Mixture; Wobble Mixture; Nested Spheres]

Interesting Phenomenon:
All methods come together in very high dimensions

Can we say more about: "All methods come together in very high dimensions"?
Mathematical Statistical Question: Mathematics behind this?
(Use the Geometric Representation)
HDLSS Asy's: Geometrical Represent'n

Explanation of Observed (Simulation) Behavior:
"everything similar for very high d"
• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are the same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"

Straightforward Generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi 2007)
All based on simple "Laws of Large Numbers"
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,
Assume no eigenvalues too large, in the sense:
For eigenvalues $\lambda_1, \dots, \lambda_d$, assume
$\frac{\sum_{j=1}^d \lambda_j^2}{\left( \sum_{j=1}^d \lambda_j \right)^2} = o(1)$
(where $\tfrac{1}{d}$ is the min possible value)
(much weaker than previous mixing conditions…)

Background: In classical multivariate analysis, the statistic
$\varepsilon = \frac{\left( \sum_{j=1}^d \lambda_j \right)^2}{d \sum_{j=1}^d \lambda_j^2}$
Is called the "epsilon statistic",
And is used to test "sphericity" of the dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic
$\varepsilon = \frac{\left( \sum_{j=1}^d \lambda_j \right)^2}{d \sum_{j=1}^d \lambda_j^2}$
Satisfies $\frac{1}{d} \le \varepsilon \le 1$:
• For the spherical Normal, $\varepsilon = 1$
• A single extreme eigenvalue gives $\varepsilon \approx \frac{1}{d}$
• So the assumption is very mild
• Much weaker than mixing conditions
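The two extreme cases can be verified directly (assuming numpy; the helper name `epsilon_stat` and the choice $d = 1000$ are illustrative):

```python
import numpy as np

def epsilon_stat(lam):
    """Epsilon statistic of covariance eigenvalues:
    (sum lam)^2 / (d * sum lam^2); equals 1 iff all eigenvalues are equal."""
    lam = np.asarray(lam, dtype=float)
    return lam.sum() ** 2 / (lam.size * (lam ** 2).sum())

d = 1000
print(epsilon_stat(np.ones(d)))        # spherical case: exactly 1

spiked = np.ones(d)
spiked[0] = 1e6                        # one extreme eigenvalue
print(epsilon_stat(spiked))            # close to 1/d = 0.001
```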
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,
Assume no eigenvalues too large.
Then: $\|X_i - X_j\| = d^{1/2}\, O_p(1)$, i.e. pairwise distances are of order $d^{1/2}$
Not so strong as before: $\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$

Can we improve on $\|X_i - X_j\| = d^{1/2}\, O_p(1)$?

John Kent example: Normal scale mixture,
$X \sim \tfrac12 N(0,\, 100\, I_d) + \tfrac12 N(0,\, I_d)$, i.i.d.
Won't get $\|X_i - X_j\| = C\, d^{1/2} + O_p(1)$ for a single constant $C$
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:
$X \sim \tfrac12 N(0,\, 100\, I_d) + \tfrac12 N(0,\, I_d)$, i.i.d.
• Data Vectors are indep'dent of each other
• But the entries of each have strong depend'ce
• However, can show the entries have cov = 0
• Recall statistical folklore:
Covariance = 0 does NOT imply Independence
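A quick numeric check of these notes (assuming numpy; the Monte Carlo size is an illustrative choice): two entries of one Kent-mixture vector are uncorrelated, yet their squares are correlated no matter how far apart the entries are, because a single coin flip scales all $d$ coordinates, so no mixing condition can hold.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100_000

# common scale for all entries of each vector: 10 w.p. 1/2, 1 w.p. 1/2
s = np.where(rng.random(m) < 0.5, 10.0, 1.0)
x1 = s * rng.standard_normal(m)            # entry 1 of each data vector
x2 = s * rng.standard_normal(m)            # an entry at an arbitrarily far lag

corr = np.corrcoef(x1, x2)[0, 1]           # the entries have covariance ~ 0 ...
corr_sq = np.corrcoef(x1**2, x2**2)[0, 1]  # ... but their squares are correlated
print(corr, corr_sq)
```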
0 Covariance is not independence

Simple Example:
• Random Variables $X$ and $Y$
• Make both Gaussian: $X, Y \sim N(0, 1)$
(Note: Not Using the Multivariate Gaussian)
• With strong dependence, yet 0 covariance

Given $c > 0$, define
$Y = \begin{cases} X, & |X| \le c \\ -X, & |X| > c \end{cases}$

• The distribution is degenerate
• Supported on the diagonal lines $y = \pm x$
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small $c$, have $\mathrm{cov}(X, Y) < 0$
• For large $c$, have $\mathrm{cov}(X, Y) > 0$
• By continuity, $\exists\ c$ with $\mathrm{cov}(X, Y) = 0$

Result: The joint distribution of $X$ and $Y$:
– Has Gaussian marginals
– Has $\mathrm{cov}(X, Y) = 0$
– Yet strong dependence of $X$ and $Y$
– Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian Marginals
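The construction can be carried out numerically (assuming the standard library plus numpy for the Monte Carlo check; the bisection tolerance is an illustrative choice). The closed form $\mathrm{cov}(X, Y) = 2\,E[X^2; |X| \le c] - 1$ gives the crossover $c$, and simulation confirms zero covariance with total dependence ($|Y| = |X|$ always):

```python
import math
import numpy as np

def cov_xy(c):
    """cov(X, Y) for Y = X on |X| <= c, Y = -X on |X| > c, X ~ N(0,1).
    Uses E[X^2; |X| <= c] = erf(c/sqrt(2)) - 2*c*phi(c)."""
    phi_c = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    inner = math.erf(c / math.sqrt(2)) - 2 * c * phi_c
    return 2 * inner - 1

# bisection: cov < 0 for small c, cov > 0 for large c
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c_star = (lo + hi) / 2
print(c_star)                          # crossover threshold, about 1.54

# Monte Carlo check: Gaussian marginals, zero covariance, total dependence
rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000)
y = np.where(np.abs(x) <= c_star, x, -x)
print(np.cov(x, y)[0, 1])              # approximately 0
print(np.corrcoef(x**2, y**2)[0, 1])   # exactly 1, since |Y| = |X| always
```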
HDLSS Asy's: Geometrical Represent'n

Further Consequences of the Geometric Represent'n:

1. DWD more stable than SVM
(based on deeper limiting distributions)
(reflects the intuitive idea of feeling sampling variation)
(something like mean vs. median)
Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified,
Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes
(motivates the weighted version),
Qiao et al (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency:
(Study Properties of PCA in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):
For Eigenvalues: $\lambda_d^{(1)} = d^{\alpha}, \quad \lambda_d^{(2)} = \cdots = \lambda_d^{(d)} = 1$
Note the Critical Parameter: $\alpha$

1st Eigenvector: $u_1$
(Turns out: the Direction Doesn't Matter)

How Good are the Empirical Versions,
$\hat{\lambda}_d^{(1)}, \dots, \hat{\lambda}_d^{(d)}, \hat{u}_1$,
as Estimates?
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asymptotics Simple Paradoxes
Distance tends to non-random constant
bullFactor since
Can extend to
)1(221 pOdZZ
nZZ
1
222
121 XsdXsdXXsd 2
HDLSS Asymptotics Simple Paradoxes
For dimrsquoal Standard Normal distrsquon
indep of
High dimrsquoal Angles (as )
- Everything is orthogonal
d
d
dd INZ 0~2
)(90 2121
dOZZAngle p
1Z
HDLSS Asyrsquos Geometrical Representrsquon
Assume let
Study Subspace Generated by Data
Hyperplane through 0
of dimension
Points are ldquonearly equidistant to 0rdquo
amp dist
Within plane can
ldquorotate towards Unit Simplexrdquo
All Gaussian data sets are
ldquonear Unit Simplex Verticesrdquo
ldquoRandomnessrdquo appears
only in rotation of simplex
n
d ddn INZZ 0~1
d
d
Hall Marron amp Neeman (2005)
HDLSS Asyrsquos Geometrical Representrsquon
Assume let
Study Hyperplane Generated by Data
dimensional hyperplane
Points are pairwise equidistant dist
Points lie at vertices of
ldquoregular hedronrdquo
Again ldquorandomness in datardquo is only in rotation
Surprisingly rigid structure in random data
1n
d ddn INZZ 0~1
d2d2~
n
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View Shows ldquoRigidity after Rotationrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Now Recall HDLSS Simulation Results
Comparing DWD SVM amp Others from 102114
HDLSS Discrimrsquon Simulations
Main idea
Comparison of
bull SVM (Support Vector Machine)
bull DWD (Distance Weighted Discrimination)
bull MD (Mean Difference aka Centroid)
Linear versions across dimensions
HDLSS Discrimrsquon Simulations
Overall Approachbull Study different known phenomena
ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding
bull Common Sample Sizes
bull But wide range of dimensions25 nn
16004001004010d
HDLSS Discrimrsquon Simulations
Spherical Gaussians
HDLSS Discrimrsquon Simulations
Outlier Mixture
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asy's Geometrical Represent'n

Straightforward generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild eigenvalue condition on the theoretical covariance
  (Ahn, Marron, Muller & Chi, 2007)
All based on simple "Laws of Large Numbers".
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assume 2nd moments.
Assume no eigenvalues are too large, in the sense:
for λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_d, assume

  Σ_{j=1}^d λ_j² / (Σ_{j=1}^d λ_j)² = o(1),

i.e. ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²) ≫ 1/d (the minimum possible).
(Much weaker than the previous mixing conditions…)
2nd Paper on HDLSS Asymptotics

Background: in classical multivariate analysis, the statistic

  ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

is called the "epsilon statistic",
and is used to test "sphericity" of the dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic

  ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

satisfies 1/d ≤ ε ≤ 1.
• For the spherical Normal: ε = 1
• A single extreme eigenvalue gives: ε ≈ 1/d
• So the assumption ε ≫ 1/d is very mild
• Much weaker than mixing conditions
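These bounds and the two extreme cases are easy to check numerically. A minimal sketch (NumPy assumed; `epsilon_stat` is an illustrative helper name, not from the slides):

```python
import numpy as np

def epsilon_stat(lam):
    """Epsilon statistic of eigenvalues lam: (sum lam)^2 / (d * sum lam^2)."""
    lam = np.asarray(lam, dtype=float)
    return float(lam.sum() ** 2 / (lam.size * (lam ** 2).sum()))

d = 1000
eps_sphere = epsilon_stat(np.ones(d))                    # all eigenvalues equal
eps_spike = epsilon_stat(np.r_[d ** 2, np.ones(d - 1)])  # one extreme eigenvalue
print(eps_sphere)     # = 1, the upper bound
print(eps_spike * d)  # ~ 1, i.e. eps ~ 1/d, near the lower bound
```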
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assume 2nd moments,
and no eigenvalues too large (as above). Then:

  X_i' X_j = d · o_p(1)

Not so strong as before (Gaussian case: Z_1' Z_2 = d^{1/2} O_p(1)).
2nd Paper on HDLSS Asymptotics

Can we improve on X_i' X_j = d · o_p(1)?

John Kent example, Normal scale mixture:
  X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
Won't get: X_i' X_j = C d^{1/2} O_p(1)
3rd Paper on HDLSS Asymptotics

Get the Geometrical Representation using:
• a 4th moment assumption
• a stronger covariance matrix (only) assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal scale mixture
  X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):
• Data vectors are indep'dent of each other
• But the entries of each have strong depend'ce
• However, can show the entries have cov = 0
• Recall the statistical folklore:
  Covariance = 0 ⇏ Independence
0 Covariance is not independence

Simple example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (note: not using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance
Given c > 0, define
  Y = X    if |X| ≤ c
  Y = −X   if |X| > c
0 Covariance is not independence

Simple example, with c chosen to make cov(X, Y) = 0:
• The distribution is degenerate
• Supported on the diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, Ǝ c with cov(X, Y) = 0
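The construction can be made concrete: the covariance has a closed form, so the zeroing c can be found by bisection and checked by simulation. A sketch (NumPy assumed; the value c* ≈ 1.54 is derived here, not from the slides):

```python
import math
import numpy as np

def cov_xy(c):
    """Exact cov(X, Y) for X ~ N(0,1), Y = X on {|X| <= c}, Y = -X on {|X| > c}.
    cov = 1 - 2*E[X^2; |X| > c], and E[X^2; |X| > c] = 2*(c*phi(c) + 1 - Phi(c))."""
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))
    return 1 - 4 * (c * phi + 1 - Phi)

# cov rises continuously from -1 (at c = 0) to +1 (as c -> inf): bisect for the zero.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c_star = (lo + hi) / 2

# Monte Carlo check: sample covariance ~ 0 and Gaussian marginal for Y,
# yet Y is a deterministic function of X (maximal dependence).
rng = np.random.default_rng(0)
x = rng.standard_normal(500_000)
y = np.where(np.abs(x) <= c_star, x, -x)
print(round(c_star, 2))  # ~ 1.54
print(float(np.mean(x * y)), float(np.std(y)))
```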
0 Covariance is not independence

Result: the joint distribution of X and Y
– has Gaussian marginals,
– has cov(X, Y) = 0,
– yet strong dependence of X and Y,
– thus is not multivariate Gaussian.
Shows: "multivariate Gaussian" means more than Gaussian marginals.
HDLSS Asy's Geometrical Represent'n

Further consequences of the Geometric Represent'n:
1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects the intuitive idea of feeling sampling variation,
   something like mean vs. median), Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes
   (motivates the weighted version), Qiao et al. (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(study properties of PCA in estimating eigen-directions & -values)
[Assume data are mean centered]

Spike Covariance Model, Paul (2007):
For eigenvalues:  λ_1 = d^α,  λ_2 = ⋯ = λ_d = 1
Note the critical parameter: α
1st eigenvector: u_1 (turns out the direction doesn't matter)
How good are the empirical versions λ̂_1, …, λ̂_d, û_1 as estimates?
HDLSS Math Stat of PCA

Consistency (big enough spike): for α > 1,
  Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough): for α < 1,
  Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA

Intuition: random noise is on the scale d^{1/2}
(recall d^α is on the scale of variance):
• For α > 1, the spike pops out of the pure noise sphere
• For α < 1, the spike is contained in the pure noise sphere
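The α > 1 vs. α < 1 dichotomy shows up clearly in a small simulation. A sketch (NumPy assumed; the spike is placed on the first coordinate axis, which is harmless since the direction doesn't matter):

```python
import numpy as np

def pca_angle(alpha, d=2000, n=20, seed=0):
    """Angle (degrees) between the leading sample PC and the true spike
    direction e_1, for n draws from N(0, diag(d^alpha, 1, ..., 1))."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x[:, 0] *= d ** (alpha / 2)           # spike variance d^alpha
    x -= x.mean(axis=0)                   # mean-center, then PCA via SVD
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    cos = min(abs(float(vt[0, 0])), 1.0)  # |<v-hat_1, e_1>|
    return float(np.degrees(np.arccos(cos)))

print(pca_angle(alpha=1.5))  # alpha > 1: small angle (consistent)
print(pca_angle(alpha=0.5))  # alpha < 1: near 90 deg (strongly inconsistent)
```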
HDLSS Math Stat of PCA

Consistency of eigenvalues?

  λ̂_1 / λ_1 →_L χ²_n / n   (as d → ∞)

• Eigenvalues are inconsistent (for fixed n)
• But with a known limit distribution
• Consistent when n → ∞ as well
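The χ²_n/n limit can be checked by simulation in the strong-spike regime. A sketch (NumPy assumed; α = 2 and an uncentered sample covariance, matching the mean-zero model, are choices of this illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, reps = 2000, 5, 2000
lam1 = float(d) ** 2                  # alpha = 2 spike, well above alpha = 1
ratios = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal((n, d))
    x[:, 0] *= np.sqrt(lam1)
    s = x @ x.T / n                   # n x n dual matrix: same nonzero
    ratios[r] = np.linalg.eigvalsh(s)[-1] / lam1  # eigenvalues as X'X/n
# chi^2_n / n has mean 1 and variance 2/n = 0.4
print(float(ratios.mean()), float(ratios.var()))
```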
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example (Normal scale mixture, as above):
  X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
Can only say: ‖X‖ / d^{1/2} → 1 w.p. 1/2, → 10 w.p. 1/2
  (not deterministic)
PCA conditions: the same, since the noise is still O_p(d^{1/2})
But for the Geo Rep'n, need some mixing condition.
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
conclude that some mixing condition is needed.
Mixing Conditions

Idea from probability theory:
recall the standard asymptotic results, as n → ∞:
• Law of Large Numbers ("weak" = in prob., "strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!),
e.g. independent and ident. dist'd.
Mixing conditions explore weaker assumptions that still give the
Law of Large Numbers and Central Limit Theorem.
• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005 update of 1986 version)
• Better newer references exist

Mixing condition used here: ρ-mixing.
For random variables X_1, X_2, …, define

  ρ(k) = sup_i sup |corr(f, g)|,

where the sup is over f ∈ L²(σ_i) and g ∈ L²(σ'_{i+k}),
with σ_i the sigma-field generated by X_1, …, X_i,
and σ'_{i+k} the sigma-field generated by X_{i+k}, X_{i+k+1}, …
(note the gap of lag k).
Assume: ρ(k) → 0 as k → ∞.
Idea: uncorrelated at far lags.
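For intuition: a Gaussian AR(1) sequence is a textbook example of a ρ-mixing sequence, with correlations decaying geometrically in the lag. A quick empirical look at the "uncorrelated at far lags" idea (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
phi, n = 0.7, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]   # AR(1): corr(x_t, x_{t+k}) ~ phi^k

def acf(x, k):
    """Sample autocorrelation at lag k."""
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

print([round(acf(x, k), 2) for k in (1, 5, 20)])  # decays toward 0 at far lags
```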
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005) assume the entries of the data vectors
  X = (X_1, X_2, …, X_d)'
are ρ-mixing.
Drawback: a strong assumption
(in JRSS-B, since Biometrika refused).
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (fully covariance based, no mixing)
Tricky point: classical mixing conditions require a notion of time
ordering, which is not always clear, e.g. for microarrays.
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
  X_d ~ (0, Σ_d),  where Σ_d = U_d Λ_d U_d^t
(note: not assumed Gaussian).
Define the standardized version
  Z_d = Λ_d^{-1/2} U_d^t X_d
Assume Ǝ a permutation of the entries of Z_d
so that the permuted sequence is ρ-mixing.
HDLSS Math Stat of PCA

Careful look at PCA consistency (α > 1 spike)
(reality check suggested by a reviewer):
the result is independent of the sample size,
so it is true even for n = 1 (!?).
Reviewer's conclusion: absurd; shows the
assumption is too strong for practice.
HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise.
Recall the RNAseq data from 8/23/12:  d ~ 1700,  n = 180.
Manually brushed clusters show clear alternate splicing, not noise.

HDLSS Math Stat of PCA

Recall the theoretical separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically driven conclusion:
real data signals are this strong.
HDLSS Math Stat of PCA

An Interesting Objection:
should not study angles in PCA.
Recall, for consistency (α > 1): Angle(û_1, u_1) → 0;
for strong inconsistency (α < 1): Angle(û_1, u_1) → 90°.
Because the PC scores (i.e. projections) are not consistent:
for the scores  ŝ_ij = P_{v̂_j} x_i  and  s_ij = P_{v_j} x_i
(what we study in PCA scatterplots),
can show  ŝ_ij / s_ij → R_j ≠ 1  (random).
Thanks to Dan Shen.
HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent,
so how can PCA find useful signals in data?
(Recall: HDLSS PCA often finds signal, not pure noise.)
Key is "proportional errors":
  ŝ_ij / s_ij → R_j ≠ 1,  with the same realization of R_j for all i.
Axes have inconsistent scales,
but relationships are still useful.
HDLSS Deep Open Problem

In PCA consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: Ǝ interesting limit dist'ns,
Jung, Sen & Marron (2012).
HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea
(figures not reproduced here).
HDLSS Asymptotics & Kernel Methods

Interesting Question: behavior in very high dimension?
Answer, El Karoui (2010):
• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers.
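A sketch of the flavor of this result (NumPy assumed; the bandwidth scaling γ = 1/d is an assumption of this illustration, not from the slides): off-diagonal squared distances concentrate near 2d, so a first-order expansion makes the Gaussian kernel matrix essentially affine in the inner products x_i'x_j, i.e. a linear-classifier regime.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 8, 50_000
x = rng.standard_normal((n, d))

g = x @ x.T                                              # Gram matrix
sq = np.diag(g)[:, None] + np.diag(g)[None, :] - 2 * g   # squared distances
K = np.exp(-sq / d)                                      # Gaussian kernel, gamma = 1/d

# Off-diagonal: sq_ij ~ 2d + small, so K_ij ~ e^{-2} * (1 + 2 * g_ij / d)
lin = np.exp(-2) * (1 + 2 * g / d)
off = ~np.eye(n, dtype=bool)
print(float(np.max(np.abs((K - lin)[off]))))  # tiny: kernel ~ affine in g
```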
HDLSS Asymptotics & Kernel Methods

Implications for DWD:
recall its main advantage is for high d,
so it is not clear that embedding helps.
Thus not yet implemented in DWD.
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall the intuition from above:
the key is the sizes of the biological subtypes.
Differing ratios trip up the mean,
but DWD is more robust.
What is the mathematics behind this?
HDLSS Asymptotics: Simple Paradoxes

For the d-dim'al standard Normal dist'n:
  Z_1, Z_2 ~ N_d(0, I_d), indep'ent
High dim'al angles (as d → ∞):
  Angle(Z_1, Z_2) = 90° + O_p(d^{-1/2})
- Everything is orthogonal?!
HDLSS Asy's Geometrical Represent'n

Assume Z_1, …, Z_n ~ N_d(0, I_d), and let d → ∞.
Study the subspace generated by the data:
• a hyperplane through 0, of dimension n
• points are "nearly equidistant to 0", at dist ≈ d^{1/2}
• within the plane, can "rotate towards the unit simplex"
• all Gaussian data sets are "near unit simplex vertices"
• "randomness" appears only in the rotation of the simplex
Hall, Marron & Neeman (2005)

Study the hyperplane generated by the data:
• an (n − 1)-dimensional hyperplane
• points are pairwise equidistant, at dist ≈ (2d)^{1/2}
• points lie at the vertices of a "regular n-hedron"
• again, "randomness in the data" is only in the rotation
Surprisingly rigid structure in random data!
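The rigidity is easy to verify numerically: norms concentrate at √d, pairwise distances at √(2d), and pairwise angles at 90°. A minimal check (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 100_000
z = rng.standard_normal((n, d))

norms = np.linalg.norm(z, axis=1)                # each ~ sqrt(d)
i, j = np.triu_indices(n, k=1)
dists = np.linalg.norm(z[i] - z[j], axis=1)      # each ~ sqrt(2 d)
cos = np.sum(z[i] * z[j], axis=1) / (norms[i] * norms[j])
ang = np.degrees(np.arccos(cos))                 # each ~ 90 degrees

print(norms.min() / d ** 0.5, norms.max() / d ** 0.5)
print(dists.min() / (2 * d) ** 0.5, dists.max() / (2 * d) ** 0.5)
print(ang.min(), ang.max())
```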
HDLSS Asy's Geometrical Represent'n

Simulation view: study "rigidity after rotation"
• Simple 3-point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate the hyperplane of dimension 2
• Rotate that to the plane of the screen
• Rotate within the plane to make "comparable"
• Repeat 10 times, using different colors
The simulation view shows "rigidity after rotation".

Now recall the HDLSS simulation results
comparing DWD, SVM & others, from 10/21/14.
HDLSS Discrim'n Simulations

Main idea: comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions.
Overall Approachbull Study different known phenomena
ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding
bull Common Sample Sizes
bull But wide range of dimensions25 nn
16004001004010d
HDLSS Discrimrsquon Simulations
Spherical Gaussians
HDLSS Discrimrsquon Simulations
Outlier Mixture
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Conditions

Mixing condition used here: ρ-mixing.

For random variables X_1, X_2, …, define

ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(F_1^j), g ∈ L²(F_{j+k}^∞) }

where F_a^b is the sigma-field generated by X_a, …, X_b (note the gap of lag k between the two blocks).

Assume: ρ(k) → 0 as k → ∞.

Idea: uncorrelated at far lags.
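As a concrete illustration (a sketch, not from the slides): an AR(1) sequence is a standard example of a ρ-mixing process, since its lag-k correlations decay geometrically, matching the "uncorrelated at far lags" idea.

```python
import numpy as np

# AR(1) process x_t = phi * x_{t-1} + z_t with |phi| < 1 is rho-mixing:
# correlation at lag k is phi**k, which vanishes at far lags.
rng = np.random.default_rng(0)

phi = 0.7          # AR(1) coefficient; |phi| < 1
d = 200_000        # length of one long simulated sequence
z = rng.standard_normal(d)
x = np.empty(d)
x[0] = z[0]
for t in range(1, d):
    x[t] = phi * x[t - 1] + z[t]

def lag_corr(x, k):
    """Empirical correlation between X_t and X_{t+k}."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corr_lag1 = lag_corr(x, 1)    # near phi = 0.7
corr_lag20 = lag_corr(x, 20)  # near phi**20, essentially 0
```

The near-lag correlation is large while the far-lag correlation is negligible, which is the behavior the ρ-mixing assumption formalizes.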
HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005): assume the entries X_1, X_2, …, X_d of the data vectors are ρ-mixing.

Drawback: a strong assumption.
(In JRSS-B, since Biometrika refused!)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.
HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t
(note: not assumed Gaussian)

Define the standardized version
Z_d = Λ_d^{-1/2} U_d^t X_d

Assume ∃ a permutation of the entries of Z_d so that the permuted sequence is ρ-mixing.
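A small numerical check of the standardization step (a sketch with arbitrary illustrative eigenvalues, not the slides' own example): with Σ_d = U_d Λ_d U_d^t, the vector Z_d = Λ_d^{-1/2} U_d^t X_d has identity covariance, so mixing assumptions can sensibly be placed on its entries.

```python
import numpy as np

# Whitening check: Z = Lambda^{-1/2} U^T X has covariance close to I.
rng = np.random.default_rng(1)

d = 5
U, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal U_d
lam = np.array([10.0, 5.0, 2.0, 1.0, 0.5])        # illustrative eigenvalues
Sigma = U @ np.diag(lam) @ U.T

n = 200_000
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # rows are draws of X_d
Z = X @ U @ np.diag(lam ** -0.5)                         # row form of Lambda^{-1/2} U^T X

cov_Z = np.cov(Z, rowvar=False)   # should be near the identity matrix
```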
HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike)
(reality check suggested by a reviewer)

The condition is independent of sample size, so consistency holds even for n = 1 (?!?)

Reviewer's conclusion: absurd, shows the assumption is too strong for practice.
HDLSS Math Stat of PCA

Yet HDLSS PCA often finds signal, not pure noise.

Recall the RNAseq data from 8/23/12 (d ~ 1700, n = 180): manually brushed clusters show clear alternate splicing, not noise.
HDLSS Math Stat of PCA

Recall the theoretical separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong!
HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA.

Recall, for Consistency (α > 1): Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°
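A simulation sketch of this dichotomy (parameter choices here are illustrative assumptions, not from the slides): in the spike model with eigenvalues (d^α, 1, …, 1), the empirical first eigenvector nearly recovers u_1 when α > 1 but points far from it when α < 1.

```python
import numpy as np

# Angle between empirical and true PC1 in the spike model
# Sigma = diag(d**alpha, 1, ..., 1), with u1 = e1.
rng = np.random.default_rng(2)

def pc1_angle(d, n, alpha):
    """Angle (degrees) between empirical PC1 and the true u1 = e1."""
    lam = np.ones(d)
    lam[0] = d ** alpha                      # the spike
    X = rng.standard_normal((n, d)) * np.sqrt(lam)
    # dual (Gram) trick: eigenvectors of X X^T / n give PC directions cheaply
    _, vecs = np.linalg.eigh(X @ X.T / n)
    u1_hat = X.T @ vecs[:, -1]
    u1_hat /= np.linalg.norm(u1_hat)
    cos = min(abs(u1_hat[0]), 1.0)           # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

angle_strong = pc1_angle(d=20_000, n=20, alpha=1.5)  # alpha > 1: consistent
angle_weak = pc1_angle(d=20_000, n=20, alpha=0.3)    # alpha < 1: inconsistent
```

With these (assumed) parameters the α > 1 angle is a few degrees while the α < 1 angle is far from zero.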
HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA, because PC scores (i.e. projections) are not consistent.

For the scores ŝ_{ij} = P_{û_j} x_i (what we study in PCA scatterplots) and s_{ij} = P_{u_j} x_i, can show

ŝ_{ij} / s_{ij} → R_j ≠ 1   (random)

Thanks to Dan Shen.
HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent, so how can PCA find useful signals in data?

Key is "Proportional Errors": ŝ_{ij} / s_{ij} → R_j, with the same realization of R_j for all i = 1, …, n.

Axes have inconsistent scales, but the relationships between points are still useful, which is why HDLSS PCA often finds signal, not pure noise.
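A simulation sketch of "proportional errors" (illustrative, assumed parameters): the empirical PC1 scores are off in scale, but nearly perfectly correlated with the true scores, so the relationships seen in PCA scatterplots survive.

```python
import numpy as np

# Spike model in the consistent regime (alpha > 1): compare true PC1
# scores with empirical PC1 scores.
rng = np.random.default_rng(3)

d, n, alpha = 2_000, 30, 1.5
lam = np.ones(d)
lam[0] = d ** alpha                          # spike along u1 = e1
X = rng.standard_normal((n, d)) * np.sqrt(lam)

true_scores = X[:, 0]                        # s_i1 = projection onto true u1
_, vecs = np.linalg.eigh(X @ X.T / n)        # dual trick for empirical PC1
u1_hat = X.T @ vecs[:, -1]
u1_hat /= np.linalg.norm(u1_hat)
emp_scores = X @ u1_hat                      # s_hat_i1

corr = abs(np.corrcoef(true_scores, emp_scores)[0, 1])  # near 1
```

The near-1 correlation is the sense in which relationships between points are preserved even when the scale is not.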
HDLSS Deep Open Problem → Result

In PCA consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)?
∃ interesting limit distributions: Jung, Sen & Marron (2012).
HDLSS Asymptotics & Kernel Methods

Recall the flexibility from the kernel embedding idea.
HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):
• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers.
HDLSS Asymptotics & Kernel Methods

Implications for DWD: recall its main advantage is for high d, so it is not clear that embedding helps; thus not yet implemented in DWD.
HDLSS Additional Results

Batch Adjustment (Xuxin Liu)

Recall the intuition from above: the key is the sizes of the biological subtypes. A differing ratio trips up the mean, but DWD is more robust.

Mathematics behind this?
HDLSS Asy's: Geometrical Represent'n

Assume Z_1, …, Z_n ~ N_d(0, I_d); let d → ∞ with n fixed.

Study the subspace generated by the data (a hyperplane through 0 of dimension n):
• Points are "nearly equidistant to 0", at distance ~ d^{1/2}
• Within the plane, can "rotate towards the unit simplex"
• All Gaussian data sets are "near unit simplex vertices"
• "Randomness" appears only in the rotation of the simplex

Hall, Marron & Neeman (2005)
HDLSS Asy's: Geometrical Represent'n

Assume Z_1, …, Z_n ~ N_d(0, I_d); let d → ∞ with n fixed.

Study the hyperplane generated by the data ((n − 1)-dimensional):
• Points are pairwise equidistant, dist ~ (2d)^{1/2}
• Points lie at the vertices of a "regular n-hedron"
• Again, "randomness in the data" is only in the rotation
• Surprisingly rigid structure in random data!
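These norms and pairwise distances are easy to check numerically (a minimal sketch; d and n here are arbitrary choices):

```python
import numpy as np

# For Z_i ~ N_d(0, I_d) with large d: ||Z_i|| is close to sqrt(d) and
# ||Z_i - Z_j|| is close to sqrt(2 d), uniformly over the (few) points.
rng = np.random.default_rng(4)

d, n = 20_000, 5
Z = rng.standard_normal((n, d))

norms = np.linalg.norm(Z, axis=1) / np.sqrt(d)              # each ~ 1
dists = np.array([np.linalg.norm(Z[i] - Z[j]) / np.sqrt(2 * d)
                  for i in range(n) for j in range(i + 1, n)])  # each ~ 1
```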
HDLSS Asy's: Geometrical Represent'n

Simulation view: study "rigidity after rotation"
• Simple 3-point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate the hyperplane of dimension 2
• Rotate that to the plane of the screen
• Rotate within the plane to make plots "comparable"
• Repeat 10 times, use different colors

The simulation view shows "rigidity after rotation".

Now recall the HDLSS simulation results comparing DWD, SVM & others from 10/21/14.
HDLSS Discrim'n Simulations

Main idea: comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions.

Overall approach: study different known phenomena
– Spherical Gaussians
– Outliers
– Polynomial Embedding
• Common sample sizes: n = 25 in each class
• But a wide range of dimensions: d = 10, 40, 100, 400, 1600

Settings: Spherical Gaussians, Outlier Mixture, Wobble Mixture, Nested Spheres.
HDLSS Discrim'n Simulations

Interesting phenomenon: all methods come together in very high dimensions.

Mathematical statistical question: what is the mathematics behind this?
(Use the Geometric Representation.)
HDLSS Asy's: Geometrical Represent'n

Explanation of observed (simulation) behavior: "everything similar for very high d"
• The 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All points are the same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• So "sensible methods are all nearly the same"
HDLSS Asy's: Geometrical Represent'n

Straightforward generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild eigenvalue condition on the theoretical covariance (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers".
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assume 2nd moments, and assume no eigenvalues are too large, in the sense:

For eigenvalues λ_1 ≥ λ_2 ≥ … ≥ λ_d, assume

Σ_{j=1}^d λ_j² / (Σ_{j=1}^d λ_j)² = o(1)   as d → ∞

(1/d is the minimum possible value of this ratio)
(much weaker than the previous mixing conditions…)
2nd Paper on HDLSS Asymptotics

Background: in classical multivariate analysis, the statistic

ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)

is called the "epsilon statistic", and is used to test "sphericity" of the distribution, i.e. "are all covariance eigenvalues the same?"
2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies 1/d ≤ ε ≤ 1:
• For the spherical Normal, ε = 1
• A single extreme eigenvalue gives ε = 1/d

So the assumption above (equivalently, d · ε → ∞) is very mild, and much weaker than mixing conditions.
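The two extreme cases of the epsilon statistic can be verified directly (a minimal sketch of the formula above; the eigenvalue patterns are the textbook extremes):

```python
import numpy as np

# Epsilon (sphericity) statistic: (sum lam)^2 / (d * sum lam^2),
# maximized (= 1) by equal eigenvalues, minimized (= 1/d) by a single
# extreme eigenvalue.
def epsilon_stat(lam):
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_spherical = epsilon_stat(np.ones(d))                   # = 1
eps_one_spike = epsilon_stat(np.r_[1e6, np.zeros(d - 1)])  # = 1/d
```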
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): assuming 2nd moments and no eigenvalues too large (λ_1 ≥ … ≥ λ_d as above), then for i ≠ j

X_i' X_j = o_p(d)

Not so strong as before, where Z_1' Z_2 = O_p(d^{1/2}).
2nd Paper on HDLSS Asymptotics

Can we improve on X_i' X_j = o_p(d)?

John Kent example, a Normal scale mixture:

X_i ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Here we won't get X_i' X_j = C d^{1/2} O_p(1).

3rd Paper on HDLSS Asymptotics

Yata & Aoshima (2012) get the Geometrical Representation using:
• a 4th moment assumption
• a stronger covariance matrix (only) assumption
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal scale mixture, X_i ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d):
• The data vectors are independent of each other
• But the entries of each vector have strong dependence
• However, can show the entries have cov = 0
• Recall the statistical folklore: Covariance = 0 does not imply Independence
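A quick simulation of Kent's scale mixture (a sketch; the sample size is an arbitrary choice): two entries of each vector have essentially zero covariance, yet their squares are clearly correlated through the shared random scale.

```python
import numpy as np

# Each vector draws one scale (sd 1 or sd 10 = sqrt(100), each w.p. 1/2);
# all its entries share that scale.  Entries are uncorrelated but dependent.
rng = np.random.default_rng(5)

N = 200_000                                        # number of vectors drawn
scale = np.where(rng.random(N) < 0.5, 1.0, 10.0)   # common per-vector scale
x1 = scale * rng.standard_normal(N)                # entry 1 of each vector
x2 = scale * rng.standard_normal(N)                # entry 2 of each vector

cov_entries = np.mean(x1 * x2)                     # ~ 0: uncorrelated
corr_squares = np.corrcoef(x1 ** 2, x2 ** 2)[0, 1] # clearly positive
```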
0 Covariance is not independence

Simple example:
• Random variables X and Y, both Gaussian: X, Y ~ N(0, 1)
  (note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance

Given c > 0, define
Y = X when |X| ≤ c,  Y = −X when |X| > c.
0 Covariance is not independence

Simple example, with c chosen to make cov(X, Y) = 0:
• The distribution is degenerate: supported on the diagonal lines y = ±x
• Not absolutely continuous w.r.t. 2-d Lebesgue measure
• For small c, cov(X, Y) < 0; for large c, cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0
0 Covariance is not independence

Result: the joint distribution of X and Y
– has Gaussian marginals,
– has cov(X, Y) = 0,
– yet X and Y are strongly dependent,
– thus is not multivariate Gaussian.

Shows: multivariate Gaussian means more than Gaussian marginals!
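The construction can be made concrete (a sketch; the bisection depth and sample size are arbitrary choices). The balancing c solves E[X²; |X| ≤ c] = 1/2, found here numerically; the resulting pair has cov ≈ 0 while |Y| = |X| exactly, an extreme form of dependence.

```python
import math
import numpy as np

# Y = X for |X| <= c, Y = -X for |X| > c.  Choosing c so that
# E[X^2 1{|X| <= c}] = 1/2 makes cov(X, Y) = 0.
def truncated_second_moment(c):
    # E[X^2 1{|X| <= c}] for X ~ N(0, 1), in closed form
    return (math.erf(c / math.sqrt(2))
            - math.sqrt(2 / math.pi) * c * math.exp(-c * c / 2))

lo, hi = 0.0, 5.0
for _ in range(60):                       # bisection for the balancing c
    mid = (lo + hi) / 2
    if truncated_second_moment(mid) < 0.5:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

rng = np.random.default_rng(6)
X = rng.standard_normal(500_000)
Y = np.where(np.abs(X) <= c, X, -X)       # Gaussian marginal by symmetry

cov_XY = np.mean(X * Y)                   # ~ 0 by the choice of c
```

Note |Y| = |X| holds exactly, so X and Y are as far from independent as possible despite the zero covariance.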
HDLSS Asy's: Geometrical Represent'n

Further consequences of the Geometric Represent'n:
1. DWD is more stable than SVM (based on deeper limiting distributions; reflects the intuitive idea of feeling sampling variation, something like mean vs. median), Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version), Qiao et al (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(study properties of PCA in estimating eigen-directions & -values)
[assume the data are mean centered]

Spike covariance model, Paul (2007). For the eigenvalues, take

λ_1 = d^α,  λ_2 = … = λ_d = 1

Note the critical parameter α.

1st eigenvector u_1: it turns out the direction doesn't matter.

How good are the empirical versions λ̂_1, …, λ̂_d, û_1 as estimates?
HDLSS Math Stat of PCA

Consistency (big enough spike): for α > 1,
Angle(û_1, u_1) → 0.

Strong Inconsistency (spike not big enough): for α < 1,
Angle(û_1, u_1) → 90°.

Intuition: random noise ~ d^{1/2}, and recall α is on the scale of variance:
• For α > 1, the spike pops out of the pure noise sphere
• For α < 1, the spike is contained in the pure noise sphere
HDLSS Math Stat of PCA

Consistency of eigenvalues? For the α > 1 spike,

λ̂_1 / λ_1 →_L χ²_n / n

• Eigenvalues are inconsistent (for fixed n)
• But with a known limit distribution
• Consistent when n → ∞ as well
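A simulation sketch of this non-degenerate limit (parameters are illustrative assumptions): the ratio λ̂_1/λ_1 centers near 1 but keeps variability of order (2/n)^{1/2} no matter how large d is.

```python
import numpy as np

# Spike model with alpha > 1 and n fixed: the top sample eigenvalue over
# lambda_1 fluctuates like chi-squared_n / n (mean ~ 1, sd ~ sqrt(2/n)).
rng = np.random.default_rng(7)

d, n, alpha = 2_000, 20, 1.5
lam1 = d ** alpha
reps = 200
ratios = np.empty(reps)
for r in range(reps):
    lam = np.ones(d)
    lam[0] = lam1
    X = rng.standard_normal((n, d)) * np.sqrt(lam)
    # dual trick: X X^T / n has the same nonzero eigenvalues as the
    # d x d sample covariance
    top = np.linalg.eigvalsh(X @ X.T / n)[-1]
    ratios[r] = top / lam1

mean_ratio = ratios.mean()   # near E[chi2_n / n] = 1
sd_ratio = ratios.std()      # clearly bounded away from 0
```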
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asyrsquos Geometrical Representrsquon
Assume let
Study Hyperplane Generated by Data
dimensional hyperplane
Points are pairwise equidistant dist
Points lie at vertices of
ldquoregular hedronrdquo
Again ldquorandomness in datardquo is only in rotation
Surprisingly rigid structure in random data
1n
d ddn INZZ 0~1
d2d2~
n
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View study ldquorigidity after rotationrdquobull Simple 3 point data setsbull In dimensions d = 2 20 200 20000bull Generate hyperplane of dimension 2bull Rotate that to plane of screenbull Rotate within plane to make ldquocomparablerdquobull Repeat 10 times use different colors
HDLSS Asyrsquos Geometrical Represenrsquotion
Simulation View Shows ldquoRigidity after Rotationrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Now Recall HDLSS Simulation Results
Comparing DWD SVM amp Others from 102114
HDLSS Discrimrsquon Simulations
Main idea
Comparison of
bull SVM (Support Vector Machine)
bull DWD (Distance Weighted Discrimination)
bull MD (Mean Difference aka Centroid)
Linear versions across dimensions
HDLSS Discrimrsquon Simulations
Overall Approachbull Study different known phenomena
ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding
bull Common Sample Sizes
bull But wide range of dimensions25 nn
16004001004010d
HDLSS Discrimrsquon Simulations
Spherical Gaussians
HDLSS Discrimrsquon Simulations
Outlier Mixture
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): assume 2nd moments,
and no eigenvalues too large
Then: Xᵢᵗ Xⱼ = o_p(d)
Not so strong as before, where Z₁ᵗ Z₂ = O_p(d^{1/2})
2nd Paper on HDLSS Asymptotics
Can we improve on Xᵢᵗ Xⱼ = o_p(d)?
John Kent example: Normal scale mixture
  X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d), i.i.d.
Won't get Xᵢᵗ Xⱼ = C d^{1/2} O_p(1)
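The contrast shows up in a quick simulation (a sketch, not from the slides): for independent standard normal vectors the normalized inner product X₁ᵗX₂/√d has a fixed N(0,1) scale, while under Kent's mixture its scale is itself random (1, 10, or 100 depending on which components were drawn), so no single constant C works.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100_000

def kent(rng, d):
    """One draw from Kent's mixture 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)."""
    scale = 1.0 if rng.random() < 0.5 else 10.0
    return scale * rng.standard_normal(d)

# Independent standard normals: X1.X2 / sqrt(d) ~ N(0, 1), fixed scale.
z1, z2 = rng.standard_normal(d), rng.standard_normal(d)
print(z1 @ z2 / np.sqrt(d))

# Kent mixture: X1.X2 / sqrt(d) ~ N(0, (s1*s2)^2), random scale in {1, 10, 100}.
scales = [abs(kent(rng, d) @ kent(rng, d)) / np.sqrt(d) for _ in range(6)]
print(np.round(scales, 1))  # very different magnitudes across pairs
```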
3rd Paper on HDLSS Asymptotics
Get geometrical representation using:
• 4th moment assumption
• Stronger covariance matrix (only) assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal scale mixture, X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d):
• Data vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore:
  Covariance = 0 ⇏ Independence
0 Covariance is not independence
Simple Example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (Note: not using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance
Given c > 0, define:
  Y = X,  when |X| ≤ c
  Y = −X, when |X| > c
Choose c to make cov(X, Y) = 0
0 Covariance is not independence
Simple Example:
• Distribution is degenerate
• Supported on diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, Ǝ c with cov(X, Y) = 0
0 Covariance is not independence
Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian
Shows: multivariate Gaussian means more than Gaussian marginals
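The construction is easy to check numerically. A sketch (the cutoff value c* ≈ 1.54 is computed here, not stated on the slides):

```python
import math
import numpy as np

def cov_xy(c):
    """cov(X, Y) for Y = X on |X| <= c, Y = -X on |X| > c, X ~ N(0, 1).
    Uses E[X^2; |X| <= c] = erf(c/sqrt(2)) - 2*c*phi(c)."""
    phi_c = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    return 2 * (math.erf(c / math.sqrt(2)) - 2 * c * phi_c) - 1

# cov < 0 for small c, cov > 0 for large c; bisect for the root.
lo, hi = 0.5, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c_star = (lo + hi) / 2
print(round(c_star, 3))  # about 1.538

# Monte Carlo check: covariance ~ 0, yet |Y| = |X| always (total dependence).
rng = np.random.default_rng(1)
x = rng.standard_normal(10**6)
y = np.where(np.abs(x) <= c_star, x, -x)
print(np.cov(x, y)[0, 1])             # near 0
print(np.corrcoef(x**2, y**2)[0, 1])  # 1.0: X^2 and Y^2 identical
```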
HDLSS Asy's Geometrical Represen'tion
Further consequences of geometric represen'tion:
1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al (2010)
HDLSS Math. Stat. of PCA
Consistency & strong inconsistency
(study properties of PCA in estimating eigen-directions & -values)
[assume data are mean centered]
Spike covariance model, Paul (2007), for eigenvalues:
  λ₁(d) = d^α,  λ₂ = ⋯ = λ_d = 1
Note critical parameter: α
1st eigenvector: u₁ (turns out: direction doesn't matter)
How good are the empirical versions λ̂₁, …, λ̂_d, û₁ as estimates?
HDLSS Math. Stat. of PCA
Consistency (big enough spike): for α > 1,
  Angle(û₁, u₁) → 0
Strong inconsistency (spike not big enough): for α < 1,
  Angle(û₁, u₁) → 90°
Intuition: random noise ~ d^{1/2}
For α > 1 (recall d^α is on the scale of variance):
  spike pops out of pure-noise sphere
For α < 1: spike contained in pure-noise sphere
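The dichotomy shows up clearly in simulation. A sketch (sizes and seed are arbitrary choices): under the spike model λ₁ = d^α with u₁ = e₁, the angle between the sample PC1 and u₁ shrinks for α > 1 and grows toward 90° for α < 1.

```python
import numpy as np

def pc1_angle_deg(d, alpha, n=20, seed=0):
    """Angle between sample PC1 and true u1 = e1, spike lambda_1 = d^alpha."""
    rng = np.random.default_rng(seed)
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)                 # sqrt of the spiked eigenvalue
    X = rng.standard_normal((n, d)) * sd     # n mean-zero samples as rows
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)            # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

for d in (100, 1000, 10_000):
    print(d, round(pc1_angle_deg(d, alpha=1.5), 1),
             round(pc1_angle_deg(d, alpha=0.5), 1))
# alpha = 1.5: angles head toward 0; alpha = 0.5: angles head toward 90.
```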
HDLSS Math. Stat. of PCA
Consistency of eigenvalues:
  λ̂₁ / λ₁ →_L χ²_n / n
Eigenvalues inconsistent (random limit, for fixed n)
But known distribution
Consistent when n → ∞ as well
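A quick check of the χ²_n/n limit (a sketch; model sizes are arbitrary): with a strong spike, the ratio λ̂₁/λ₁ over repeated samples should have mean ≈ 1 and variance ≈ 2/n, matching χ²_n/n.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 2000, 10, 1.6
lam1 = d ** alpha                           # strong spike, alpha > 1
ratios = []
for _ in range(300):
    sd = np.ones(d)
    sd[0] = np.sqrt(lam1)
    X = rng.standard_normal((n, d)) * sd
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    ratios.append((s[0] ** 2 / n) / lam1)   # lambda1_hat / lambda1
print(np.mean(ratios), np.var(ratios))      # about 1 and 2/n = 0.2
```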
HDLSS Math. Stat. of PCA
Conditions for Geo Rep'n & PCA consist.:
John Kent example: X_d ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
Can only say: ‖X_d‖ = d^{1/2} O_p(1) w.p. 1/2,
  and ‖X_d‖ = 10 d^{1/2} O_p(1) w.p. 1/2 — not deterministic
PCA conditions same, since noise still O_p(d^{1/2})
But for Geo Rep'n need some mixing cond.
Conclude: need some mixing condition
Mixing Conditions
Idea from probability theory:
Recall standard asymptotic results, as n → ∞:
• Law of Large Numbers ("weak" = in prob., "strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!),
e.g. independent and ident. dist'd
Mixing conditions: explore weaker assumptions that still give the
Law of Large Numbers & Central Limit Theorem
Mixing Conditions
• A whole area in probability theory
• a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better newer references
Mixing Conditions
Mixing condition used here: ρ-mixing
For random variables X₁, X₂, …, define
  ρ(k) = sup |corr(f, g)|
where the sup is over f and g measurable w.r.t. the sigma-fields
generated by (…, X_{i−1}, X_i) and (X_{i+k}, X_{i+k+1}, …)
Note the gap of lag k
Assume: ρ(k) → 0 as k → ∞
Idea: uncorrelated at far lags
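ρ-mixing sup-correlations are hard to compute in general, but the flavor is easy to see (an illustration, not from the slides): for a stationary Gaussian AR(1), correlation between past and future decays geometrically with the lag, consistent with ρ(k) → 0.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, T = 0.8, 200_000

# Stationary Gaussian AR(1): x_t = phi * x_{t-1} + e_t.
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0] / np.sqrt(1 - phi ** 2)   # start in the stationary distribution
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

for k in (1, 5, 10, 20):
    r = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(k, round(r, 3), round(phi ** k, 3))  # sample lag-k corr vs phi^k
```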
HDLSS Math. Stat. of PCA
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005): assume the entries of the data vectors
  X = (X₁, X₂, …, X_d)ᵗ
are ρ-mixing
Drawback: strong assumption
(In JRSS-B, since Biometrika refused!)
HDLSS Math. Stat. of PCA
Conditions for Geo Rep'n:
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012) (fully covariance based, no mixing)
Tricky point: classical mixing conditions require a notion of time ordering,
not always clear, e.g. microarrays
HDLSS Math. Stat. of PCA
Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
  X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ
(Note: not necessarily Gaussian)
Define the standardized version: Z_d = Λ_d^{−1/2} U_dᵗ X_d
Assume Ǝ a permutation π_d so that the entries of Z_d are ρ-mixing
HDLSS Math. Stat. of PCA
Careful look at PCA consistency (α > 1 spike)
(Reality check suggested by reviewer)
The condition α > 1 is independent of sample size,
so consistency holds even for n = 1 (!?!)
Reviewer's conclusion: absurd, shows assumption too strong for practice
HDLSS Math. Stat. of PCA
Yet HDLSS PCA often finds signal, not pure noise
Recall RNAseq data from 8/23/12: d ~ 1700, n = 180
(Figure: PCA scatterplot from the Functional Data Analysis segment)
Manually brushed clusters show clear alternate splicing, not noise
HDLSS Math. Stat. of PCA
Recall theoretical separation:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically driven conclusion: real data signals are this strong!
HDLSS Math. Stat. of PCA
An interesting objection: should not study angles in PCA
Recall, for consistency (α > 1): Angle(û₁, u₁) → 0
For strong inconsistency (α < 1): Angle(û₁, u₁) → 90°
Because PC scores (i.e. projections) are not consistent:
For scores ŝᵢⱼ = P_{v̂ⱼ} xᵢ (what we study in PCA scatterplots)
and sᵢⱼ = P_{vⱼ} xᵢ,
can show: ŝᵢⱼ / sᵢⱼ → Rⱼ ≠ 1 (random)
Thanks to Dan Shen
HDLSS Math. Stat. of PCA
PC scores (i.e. projections) not consistent
So how can PCA find useful signals in data?
Key is "proportional errors": ŝᵢⱼ / sᵢⱼ → Rⱼ,
with the same realization of Rⱼ for all i
Axes have inconsistent scales,
but relationships are still useful
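The "proportional errors" phenomenon can be checked in simulation (a sketch; the spike strength, sizes, and seed are arbitrary choices): the empirical PC1 scores are close to a common multiple R of the true scores, so the scatterplot geometry survives even though the scale is wrong.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, lam1 = 8, 50_000, 2000.0      # spike lambda_1, true u1 = e1
sd = np.ones(d)
sd[0] = np.sqrt(lam1)
X = rng.standard_normal((n, d)) * sd

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])  # align sign with true u1 = e1
s_hat = X @ u1_hat                  # empirical PC1 scores
s_true = X[:, 0]                    # true PC1 scores (projection on e1)

print(np.round(s_hat / s_true, 2))       # roughly one common ratio R > 1
print(np.corrcoef(s_true, s_hat)[0, 1])  # scores nearly proportional
```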
HDLSS Deep Open Problem
In PCA consistency:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: Ǝ interesting limit dist'ns, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall flexibility from kernel embedding idea
Interesting question: behavior in very high dimension?
Answer: El Karoui (2010):
• In random matrix limit
• Kernel embedded classifiers ~ linear classifiers
Implications for DWD: recall main advantage is for high d
So not clear embedding helps
Thus not yet implemented in DWD
HDLSS Additional Results
Batch adjustment: Xuxin Liu
Recall intuition from above:
Key is sizes of biological subtypes
Differing ratio trips up mean, but DWD more robust
Mathematics behind this?
HDLSS Asy's Geometrical Represen'tion
Simulation view: study "rigidity after rotation"
• Simple 3-point data sets
• In dimensions d = 2, 20, 200, 20000
• Generate hyperplane of dimension 2
• Rotate that to plane of screen
• Rotate within plane to make "comparable"
• Repeat 10 times, use different colors
Simulation view shows "rigidity after rotation"
HDLSS Asy's Geometrical Represen'tion
Now recall HDLSS simulation results
comparing DWD, SVM & others, from 10/21/14
HDLSS Discrim'n Simulations
Main idea: comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions
Overall approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial embedding
• Common sample sizes: n₊ = n₋ = 25
• But wide range of dimensions: d = 10, 40, 100, 400, 1600
HDLSS Discrim'n Simulations
(Figures: Spherical Gaussians, Outlier Mixture, Wobble Mixture, Nested Spheres, …)
Interesting phenomenon:
All methods come together in very high dimensions!
HDLSS Discrim'n Simulations
Can we say more about:
"all methods come together in very high dimensions"?
A mathematical statistics question: mathematics behind this?
(Use geometric representation)
HDLSS Asy's Geometrical Represen'tion
Explanation of observed (simulation) behavior,
"everything similar for very high d":
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• So "sensible methods are all nearly the same"
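The simplex picture behind this explanation can be checked directly (a sketch, mirroring the simulation view above): for n standard Gaussian points, all pairwise distances, normalized by √(2d), approach 1 as d grows, so the configuration becomes a rigid near-regular simplex.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
for d in (2, 200, 20_000):
    X = rng.standard_normal((n, d))
    # All pairwise Euclidean distances, normalized by their d -> inf limit.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    pair = D[np.triu_indices(n, 1)] / np.sqrt(2 * d)
    print(d, round(pair.min(), 2), round(pair.max(), 2))
# As d grows, min and max both approach 1: near-equal pairwise distances.
```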
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection:
Should not Study Angles in PCA,
Because PC Scores (i.e. projections)
Not Consistent
For Scores $\hat{s}_{i,j} = P_{\hat{v}_j} x_i$
(What we study in PCA scatterplots)
and $s_{i,j} = P_{v_j} x_i$,
Can Show $\hat{s}_{i,j} \big/ s_{i,j} \to R_j \ne 1$ (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC Scores (i.e. projections)
Not Consistent
So how can PCA find Useful Signals in Data?
(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)
Key is "Proportional Errors":
$\hat{s}_{i,j} \big/ s_{i,j} \to R_j \ne 1$
Same Realization of $R_j$ for all $i$
Axes have Inconsistent Scales,
But Relationships are Still Useful
HDLSS Math Stat of PCA
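The "proportional errors" behavior can be checked numerically. Below is a minimal sketch, not from the deck: the parameter choices (d = 20000, n = 5, spike exponent alpha = 2) are assumptions for illustration. The empirical PC1 scores track the true scores through a single shared factor, so the scatterplot shape is preserved:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha = 20000, 5, 2.0       # HDLSS regime, spike d^alpha with alpha > 1

# Spike model: coordinate 1 has variance d^alpha, the rest are unit noise
X = rng.standard_normal((n, d))
X[:, 0] *= d ** (alpha / 2)       # true u_1 = e_1

# Empirical first eigenvector via the n x n Gram matrix (cheap since d >> n)
w, V = np.linalg.eigh(X @ X.T)
u1_hat = X.T @ V[:, -1]
u1_hat /= np.linalg.norm(u1_hat)

s_true = X[:, 0]                  # true PC1 scores (projections on e_1)
s_hat = X @ u1_hat                # empirical PC1 scores

# "Proportional errors": s_hat ~ R * s_true with ONE factor R shared by all i
R = float((s_hat @ s_true) / (s_true @ s_true))
rel_resid = float(np.linalg.norm(s_hat - R * s_true) / np.linalg.norm(s_hat))
print(R, rel_resid)               # rel_resid near 0: relationships survive
```

The single least-squares factor R absorbs essentially all of the discrepancy between empirical and true scores, which is the sense in which the errors are "proportional".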
In PCA Consistency:
Strong Inconsistency - spike, $\alpha < 1$
Consistency - spike, $\alpha > 1$
What happens at boundary ($\alpha = 1$)?
Ǝ interesting Limit Distn's:
Jung, Sen & Marron (2012)
HDLSS Deep Open Problem, Result
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension?
Answer: El Karoui (2010):
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD:
Recall Main Advantage is for High d,
So not Clear Embedding Helps,
Thus not yet Implemented in DWD
HDLSS Asymptotics & Kernel Methods
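One way to glimpse El Karoui's random-matrix point in a toy computation (all parameter choices below are assumptions, not the paper's setting): for high-dimensional Gaussian data with the kernel bandwidth on the natural d-scale, the off-diagonal entries of a Gaussian kernel matrix concentrate around a constant, leaving little structure beyond what a linear method already sees:

```python
import numpy as np

rng = np.random.default_rng(1)

def offdiag_spread(d, n=40):
    """Relative spread of off-diagonal Gaussian-kernel entries for N(0, I_d)
    data, with bandwidth sigma^2 = d (an assumed, d-scaled choice)."""
    X = rng.standard_normal((n, d))
    nrm = (X ** 2).sum(1)
    sq = nrm[:, None] + nrm[None, :] - 2 * X @ X.T   # squared distances
    K = np.exp(-sq / (2.0 * d))
    off = K[~np.eye(n, dtype=bool)]
    return float(off.std() / off.mean())

spreads = [offdiag_spread(d) for d in (10, 100, 1000, 10000)]
print(spreads)   # shrinks toward 0: the kernel "flattens out" in high d
```

Since squared distances behave like 2d + O(sqrt(d)), the kernel entries vary only at the 1/sqrt(d) scale, which is the flavor of the "kernel classifiers ~ linear classifiers" conclusion.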
HDLSS Additional Results
Batch Adjustment: Xuxin Liu
Recall Intuition from above:
Key is sizes of biological subtypes;
Differing ratio trips up mean,
But DWD more robust
Mathematics behind this?
HDLSS Asy's Geometrical Represen'tion
Simulation View Shows "Rigidity after Rotation"
HDLSS Asy's Geometrical Represen'tion
Now Recall HDLSS Simulation Results,
Comparing DWD, SVM & Others from 10/21/14
HDLSS Discrim'n Simulations
Main idea: Comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions
HDLSS Discrim'n Simulations
Overall Approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common Sample Sizes: n+ = n- = 25
• But wide range of dimensions: d = 10, 40, 100, 400, 1600
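The deck's exact simulation settings (mean shifts, mixture details) are not reproduced here, but the flavor of the simplest setting can be sketched for the MD rule. The fixed-length mean shift of 2.2 is an assumed value; only the dimension varies, and the error creeps up with d at fixed n:

```python
import numpy as np

rng = np.random.default_rng(9)

def md_error(d, n=25, reps=50, ntest=400):
    """Test error of the Mean Difference (centroid) rule on two spherical
    Gaussians whose means differ by a fixed-length shift (2.2, an assumed
    value), so only the dimension d changes across calls."""
    shift = np.zeros(d)
    shift[0] = 2.2
    errs = []
    for _ in range(reps):
        Xp = rng.standard_normal((n, d)) + shift     # class +
        Xm = rng.standard_normal((n, d))             # class -
        w = Xp.mean(0) - Xm.mean(0)                  # MD direction
        c = (Xp.mean(0) + Xm.mean(0)) / 2            # midpoint cutoff
        Tp = rng.standard_normal((ntest, d)) + shift
        Tm = rng.standard_normal((ntest, d))
        err = ((Tp - c) @ w < 0).mean() / 2 + ((Tm - c) @ w > 0).mean() / 2
        errs.append(err)
    return float(np.mean(errs))

errs_by_d = [md_error(d) for d in (10, 100, 400)]
print(errs_by_d)   # error increases with d at fixed n
```

The degradation comes purely from estimating the mean-difference direction: its noise norm grows like sqrt(d/n) while the signal length stays fixed.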
HDLSS Discrim'n Simulations
Spherical Gaussians
HDLSS Discrim'n Simulations
Outlier Mixture
HDLSS Discrim'n Simulations
Wobble Mixture
HDLSS Discrim'n Simulations
Nested Spheres
HDLSS Discrim'n Simulations
Interesting Phenomenon:
All methods come together
in very high dimensions
HDLSS Discrim'n Simulations
Can we say more about:
All methods come together
in very high dimensions?
Mathematical Statistical Question:
Mathematics behind this?
(Use Geometric Representation)
HDLSS Asy's Geometrical Represen'tion
Explanation of Observed (Simulation) Behavior:
"everything similar for very high d"
• 2 pop'n's are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
HDLSS Asy's Geometrical Represen'tion
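The simplex picture is easy to check by simulation: for one sample of standard Gaussians, all pairwise distances concentrate near sqrt(2d), so the n points form a near-regular n-hedron. A minimal sketch (the sizes n = 10, d = 100000 are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 10, 100_000

# One class of N(0, I_d) data: geometric representation says every pairwise
# distance is close to sqrt(2 d), i.e. the sample is a near-regular simplex
X = rng.standard_normal((n, d))
nrm = (X ** 2).sum(1)
sq = nrm[:, None] + nrm[None, :] - 2 * X @ X.T
dist = np.sqrt(sq[~np.eye(n, dtype=bool)])

rel_err = np.abs(dist / np.sqrt(2 * d) - 1)
print(float(dist.mean()), float(np.sqrt(2 * d)), float(rel_err.max()))
```

The relative deviations shrink like 1/sqrt(d), which is the "rigidity" that makes all sensible linear rules nearly coincide.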
Straightforward Generalizations:
non-Gaussian data: only need moments
non-independent: use "mixing conditions"
Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi 2007)
All based on simple "Laws of Large Numbers"
HDLSS Asy's Geometrical Represen'tion
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,
Assume no eigenvalues too large, in sense:
For $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge 0$, assume
$\sum_{j=1}^{d} \lambda_j^2 \Big/ \Big( \sum_{j=1}^{d} \lambda_j \Big)^2 = o(1)$,
i.e. the epsilon statistic satisfies $\varepsilon \gg 1/d$ (min possible)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background: In classical multivariate analysis, the statistic
$\varepsilon = \Big( \sum_{j=1}^{d} \lambda_j \Big)^2 \Big/ \Big( d \sum_{j=1}^{d} \lambda_j^2 \Big)$
Is called the "epsilon statistic",
And is used to test "sphericity" of dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
$\varepsilon = \Big( \sum_{j=1}^{d} \lambda_j \Big)^2 \Big/ \Big( d \sum_{j=1}^{d} \lambda_j^2 \Big)$
Satisfies $1/d \le \varepsilon \le 1$
• For spherical Normal, $\varepsilon = 1$
• Single extreme eigenvalue gives $\varepsilon = 1/d$
• So assumption $\varepsilon \gg 1/d$ is very mild
• Much weaker than mixing conditions
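These three facts about the epsilon statistic can be verified directly; a small sketch (the particular eigenvalue sequences chosen are illustrative assumptions):

```python
import numpy as np

def epsilon_stat(lam):
    """Sphericity epsilon: (sum lam)^2 / (d * sum lam^2), always in [1/d, 1]."""
    lam = np.asarray(lam, dtype=float)
    return float(lam.sum() ** 2 / (lam.size * (lam ** 2).sum()))

d = 1000
eps_sphere = epsilon_stat(np.ones(d))                      # spherical: 1
eps_spike = epsilon_stat(np.r_[1e9, np.zeros(d - 1)])      # one huge eigenvalue: 1/d
eps_mild = epsilon_stat(np.r_[d ** 0.5, np.ones(d - 1)])   # mild spike: still >> 1/d
print(eps_sphere, eps_spike, eps_mild)
```

A mild d^(1/2) spike barely moves epsilon, showing how weak the "no eigenvalues too large" assumption is.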
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,
Assume no eigenvalues too large.
Then, for $i \ne j$: $X_i^t X_j = d \cdot o_p(1)$
Not so strong as before: $Z_1^t Z_2 = d^{1/2} \cdot O_p(1)$
2nd Paper on HDLSS Asymptotics
Can we improve on $X_i^t X_j = d \cdot o_p(1)$?
John Kent example: Normal scale mixture
$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$
Won't get $X_i^t X_j = C \cdot d^{1/2} \cdot O_p(1)$, with a deterministic constant scaling $C$
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using:
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal Scale Mixture
$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$
• Data Vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore:
Covariance = 0 ⇒ Independence?
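A quick simulation of Kent's mixture illustrates both bullets at once: entry correlations are near zero, yet squared entries (magnitudes) are clearly positively correlated, exposing the dependence. The sample sizes here are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 2000, 50

# Kent's scale mixture at the VECTOR level: each row is N(0, I_d) or
# N(0, 100 I_d) with probability 1/2 (so all entries share one scale draw)
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scale[:, None]

# Entries are uncorrelated across coordinates...
C1 = np.corrcoef(X, rowvar=False)
off1 = np.abs(C1[~np.eye(d, dtype=bool)])

# ...but dependent: magnitudes move together, so squared entries correlate
C2 = np.corrcoef(X ** 2, rowvar=False)
off2 = C2[~np.eye(d, dtype=bool)]
print(off1.mean(), off2.mean())
```

The shared scale draw is invisible to covariances of the entries themselves but shows up immediately in their squares.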
0 Covariance is not independence
Simple Example:
• Random Variables $X$ and $Y$
• Make both Gaussian: $X, Y \sim N(0, 1)$
(Note: Not Using Multivariate Gaussian)
• With strong dependence, yet 0 covariance
Given $c > 0$, define:
$Y = X$ on $\{|X| \le c\}$, $\quad Y = -X$ on $\{|X| > c\}$
Choose $c$ to make $\mathrm{cov}(X, Y) = 0$:
• Distribution is degenerate,
• Supported on diagonal lines,
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small $c$, have $\mathrm{cov}(X, Y) < 0$
• For large $c$, have $\mathrm{cov}(X, Y) > 0$
• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$
0 Covariance is not independence
Result:
• Joint distribution of $X$ and $Y$:
  – Has Gaussian marginals
  – Has $\mathrm{cov}(X, Y) = 0$
  – Yet strong dependence of $X$ and $Y$
  – Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
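The construction can be checked by Monte Carlo. One version (the exact cutoff form is an assumption): with Y = X on {|X| <= c} and Y = -X otherwise, the covariance moves from negative to positive as c grows, so a zero-covariance c exists by continuity and can be found by bisection:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)   # X ~ N(0, 1)

def cov_xy(c):
    """Y = X on {|X| <= c}, Y = -X on {|X| > c}: Y is still N(0,1) marginally."""
    y = np.where(np.abs(x) <= c, x, -x)
    return float(np.mean(x * y))     # this is cov(X, Y), since both means are 0

c_small, c_large = cov_xy(0.2), cov_xy(3.0)

# cov is increasing in c and changes sign, so bisect for the zero
lo, hi = 0.2, 3.0
for _ in range(40):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c_star = (lo + hi) / 2
print(c_small, c_large, c_star)      # c_star lands near 1.54
```

The zero sits where E[X^2; |X| <= c] = 1/2, which for the standard normal is roughly c = 1.54.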
HDLSS Asy's Geometrical Represen'tion
Further Consequences of Geometric Represen'tion:
1. DWD more stable than SVM
(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)
(something like mean vs. median)
Hall, Marron, Neeman (2005)
2. 1-NN rule inefficiency is quantified
Hall, Marron, Neeman (2005)
3. Inefficiency of DWD for uneven sample size
(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency:
(Study Properties of PCA,
In Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
Spike Covariance Model, Paul (2007):
For Eigenvalues: $\lambda_{1,d} = d^{\alpha}$, $\lambda_{2,d} = \cdots = \lambda_{d,d} = 1$
Note: Critical Parameter $\alpha$
1st Eigenvector: $u_1$
Turns out: Direction Doesn't Matter
How Good are Empirical Versions,
$\hat{\lambda}_{1,d}$, $\hat{u}_1$, as Estimates?
HDLSS Math Stat of PCA
Consistency (big enough spike):
For $\alpha > 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$
Strong Inconsistency (spike not big enough):
For $\alpha < 1$, $\mathrm{Angle}(\hat{u}_1, u_1) \to 90°$
HDLSS Math Stat of PCA
Intuition: Random Noise ~ $d^{1/2}$
For $\alpha > 1$ (Recall $d^{\alpha}$ on Scale of Variance),
Spike Pops Out of Pure Noise Sphere
For $\alpha < 1$,
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
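The alpha > 1 versus alpha < 1 dichotomy shows up clearly in simulation. A minimal sketch of the spike model (the sizes d = 20000, n = 20 are assumed choices; with d finite the inconsistent angle is large but not yet at 90 degrees):

```python
import numpy as np

rng = np.random.default_rng(5)

def pc1_angle(alpha, d=20000, n=20):
    """Angle (degrees) between true u_1 = e_1 and PCA's u_1-hat, spike model."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)            # lambda_1 = d^alpha, others = 1
    w, V = np.linalg.eigh(X @ X.T)         # n x n Gram matrix (d >> n shortcut)
    u_hat = X.T @ V[:, -1]
    u_hat /= np.linalg.norm(u_hat)
    return float(np.degrees(np.arccos(min(1.0, abs(float(u_hat[0]))))))

a_cons = pc1_angle(2.0)      # alpha > 1: spike pops out, angle near 0
a_incons = pc1_angle(0.5)    # alpha < 1: swallowed by noise, angle far from 0
print(a_cons, a_incons)
```

The Gram-matrix shortcut keeps the eigendecomposition at size n x n, so the HDLSS regime stays cheap to simulate.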
Consistency of eigenvalues:
$\hat{\lambda}_{1,d} \big/ \lambda_{1,d} \xrightarrow{\;L\;} \chi^2_n \big/ n$
Eigenvalues Inconsistent,
But Known Distribution,
Consistent when $n \to \infty$ as Well
HDLSS Math Stat of PCA
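The chi-squared limit for the eigenvalue ratio can be eyeballed by Monte Carlo; a sketch under assumed parameters (alpha = 1.5, n = 8, 200 replications), checking the limit's mean 1 and variance 2/n:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, alpha = 5000, 8, 1.5
lam1 = d ** alpha

ratios = []
for _ in range(200):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                         # spike along e_1
    lam1_hat = np.linalg.eigvalsh(X @ X.T / n)[-1]   # top eigenvalue of Sigma-hat
    ratios.append(lam1_hat / lam1)
ratios = np.asarray(ratios)

# Limit law: lambda1-hat / lambda1 -> chi^2_n / n (mean 1, variance 2/n)
print(ratios.mean(), ratios.var(), 2 / n)
```

With n fixed the ratio stays random (inconsistent), but its chi-squared-over-n spread vanishes as n grows, matching the slide's "consistent when n -> infinity as well".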
Conditions for Geo Rep'n & PCA Consist.:
John Kent example:
$X_d \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$
Can only say $\|X_d\| = d^{1/2} \cdot O_p(1)$, where
$\|X_d\| \big/ d^{1/2} \to 1$ w.p. $1/2$, $\to 10$ w.p. $1/2$,
not deterministic
PCA Conditions Same, since Noise Still $O_p(d^{1/2})$
But for Geo Rep'n, need some Mixing Cond.
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Conclude: Need some Mixing Condition
HDLSS Math Stat of PCA
Idea: From Probability Theory:
Recall Standard Asymptotic Results, as $n \to \infty$:
• Law of Large Numbers
  ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore)
E.g. Independent and Ident. Dist'd
Mixing Conditions
Idea: From Probability Theory:
Explore Weaker Assumptions, to Still Get:
• Law of Large Numbers
• Central Limit Theorem
Mixing Conditions
Mixing Conditions:
• A Whole Area in Probability Theory
• Ǝ a Large Literature
• A Comprehensive Reference:
  Bradley (2005, update of 1986 version)
• Better: Newer References
Mixing Conditions
Mixing Condition Used Here:
Rho - Mixing
For Random Variables $X_1, X_2, \ldots$, Define
$\rho(k) = \sup_j \sup \big\{ |\mathrm{corr}(f, g)| : f \in L_2(\mathcal{F}_1^j),\ g \in L_2(\mathcal{F}_{j+k}^{\infty}) \big\}$
Where, For Sigma-Fields Generated by:
• $\mathcal{F}_1^j = \sigma(X_1, \ldots, X_j)$
• $\mathcal{F}_{j+k}^{\infty} = \sigma(X_{j+k}, X_{j+k+1}, \ldots)$
• Note: Gap of Lag $k$
Assume: $\rho(k) \to 0$, as $k \to \infty$
Idea: Uncorrelated at Far Lags
Mixing Conditions
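For intuition only, here is a linear AR(1) example in which lag correlations die off geometrically, the behavior the rho-mixing coefficient is designed to capture (the full sup is over all L2 functions f, g, not just the linear ones checked here; phi = 0.7 is an assumed choice):

```python
import numpy as np

rng = np.random.default_rng(7)
T, phi = 200_000, 0.7

# AR(1): corr(X_t, X_{t+k}) = phi^k, a geometric decay in the lag k
x = np.empty(T)
x[0] = rng.standard_normal()
eps = rng.standard_normal(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + np.sqrt(1 - phi ** 2) * eps[t]

lags = (1, 5, 10, 20)
lag_corr = [float(np.corrcoef(x[:-k], x[k:])[0, 1]) for k in lags]
print(lag_corr)   # roughly phi**k: 0.7, 0.17, 0.03, near 0
```

"Uncorrelated at far lags" is exactly what the sample correlations show as the gap k grows.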
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors
$X = (X_1, X_2, \ldots, X_d)^t$
Are $\rho$-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Tricky Point: Classical Mixing Conditions
Require Notion of Time Ordering,
Not Always Clear, e.g. Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$
(Note: Not Gaussian)
Define the Standardized Version
$Z_d = \Lambda_d^{-1/2} U_d^t X_d$
Assume Ǝ a permutation of the entries of $Z_d$,
So that $Z_d$ is $\rho$-mixing
HDLSS Math Stat of PCA
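The standardization in the Jung & Marron condition can be sanity-checked numerically: with Sigma = U Lam U^t, the vector Z = Lam^(-1/2) U^t X has identity covariance whatever the (possibly non-Gaussian) distribution of X. A sketch with an assumed Laplace-based construction:

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 5, 100_000

# Build a correlated, NON-Gaussian X with covariance Sigma = U Lam U^t
A = rng.standard_normal((d, d))
Sigma = A @ A.T
X = rng.laplace(size=(n, d)) / np.sqrt(2) @ A.T   # mean 0, covariance Sigma

lam, U = np.linalg.eigh(Sigma)
Z = X @ U / np.sqrt(lam)       # standardized version Z = Lam^{-1/2} U^t X
C = np.cov(Z, rowvar=False)
print(np.round(C, 2))          # approximately the 5 x 5 identity
```

The mixing assumption is then placed on (a permutation of) these uncorrelated coordinates of Z, sidestepping the need for a natural time ordering of the raw entries.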
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asyrsquos Geometrical Represenrsquotion
Now Recall HDLSS Simulation Results
Comparing DWD SVM amp Others from 102114
HDLSS Discrimrsquon Simulations
Main idea
Comparison of
bull SVM (Support Vector Machine)
bull DWD (Distance Weighted Discrimination)
bull MD (Mean Difference aka Centroid)
Linear versions across dimensions
HDLSS Discrimrsquon Simulations
Overall Approachbull Study different known phenomena
ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding
bull Common Sample Sizes
bull But wide range of dimensions25 nn
16004001004010d
HDLSS Discrimrsquon Simulations
Spherical Gaussians
HDLSS Discrimrsquon Simulations
Outlier Mixture
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(Study Properties of PCA,
In Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
Spike Covariance Model, Paul (2007):
For Eigenvalues: λ_1 = d^α, λ_2 = ⋯ = λ_d = 1
Note Critical Parameter: α
1st Eigenvector: u_1
(Turns out: Direction Doesn't Matter)
How Good are Empirical Versions
λ̂_1, û_1 as Estimates?
Consistency (big enough spike):
For α > 1, Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough):
For α < 1, Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA
Intuition: Random Noise ~ d^(1/2)
For α > 1 (Recall: α on Scale of Variance):
Spike Pops Out of Pure Noise Sphere
For α < 1:
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
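This consistency / strong-inconsistency dichotomy is easy to see numerically. Below is a minimal simulation sketch (my own construction, with illustrative parameter choices) of the spike model λ_1 = d^α, λ_2 = ⋯ = λ_d = 1, comparing the angle between the empirical and true first eigenvector for α above and below 1:

```python
import numpy as np

def pca_angle_deg(d, n, alpha, seed):
    """Angle between empirical and true 1st eigenvector in the spike model."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))      # noise: eigenvalues all 1, mean zero
    X[:, 0] *= d ** (alpha / 2)          # spike: lambda_1 = d**alpha, u_1 = e_1
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)        # |<u_1_hat, e_1>|
    return float(np.degrees(np.arccos(cos)))

# alpha > 1: spike pops out of the noise sphere, angle near 0
# alpha < 1: spike swamped by noise, angle near 90 degrees
big = pca_angle_deg(d=2000, n=20, alpha=1.5, seed=0)
small = pca_angle_deg(d=2000, n=20, alpha=0.5, seed=0)
```

The exact angles depend on the random seed, but the qualitative separation is robust for d this large.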
Consistency of Eigenvalues:
λ̂_1 / d^α →_L χ²_n / n (as d → ∞, n fixed)
Eigenvalues Inconsistent,
But with Known Distribution;
Consistent when n → ∞ as Well
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consist.:
John Kent example:
X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)
Can only say: ‖X‖ = O_p(d^(1/2)), with
‖X‖ / d^(1/2) → 1 w.p. 1/2, → 10 w.p. 1/2,
not deterministic
PCA Conditions Same, since Noise Still O_p(d^(1/2))
But for Geo Rep'n, need some Mixing Cond'n
Conclude: Need some Mixing Condition
HDLSS Math Stat of PCA
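The non-deterministic scaling in Kent's example is easy to verify by simulation. A minimal sketch (my own construction, illustrative sizes):

```python
import numpy as np

# Kent example: X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)
rng = np.random.default_rng(1)
d = 10_000

def norm_ratio():
    scale = 1.0 if rng.random() < 0.5 else 10.0   # which mixture component
    x = scale * rng.standard_normal(d)
    return float(np.linalg.norm(x) / np.sqrt(d))

ratios = np.array([norm_ratio() for _ in range(200)])
# each draw concentrates near 1 or near 10 -- no single deterministic limit
```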
Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
• Law of Large Numbers
  ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions
(Usually Ignored!),
E.g.: Independent and Ident. Dist'd
Mixing Conditions: Explore Weaker Assumptions, to Still Get
Law of Large Numbers &
Central Limit Theorem
Mixing Conditions
Mixing Conditions:
• A Whole Area in Probability Theory
• ∃ a Large Literature
• A Comprehensive Reference:
  Bradley (2005 update of 1986 version)
• Better Newer References exist
Mixing Conditions
Mixing Condition Used Here:
Rho-Mixing
For Random Variables X_1, X_2, …, Define:
ρ(k) = sup { corr(f, g) : f ∈ L²(F_1^j), g ∈ L²(F_{j+k}^∞) }
Where F_1^j, F_{j+k}^∞ are Sigma-Fields Generated by:
• X_1, …, X_j
• X_{j+k}, X_{j+k+1}, …
• Note: Gap of Lag k
Assume: ρ(k) → 0 as k → ∞
Idea: Uncorrelated at Far Lags
Mixing Conditions
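A Gaussian AR(1) process is a standard concrete example of a ρ-mixing sequence (for Gaussian sequences the supremum over sigma-fields reduces to correlations of linear combinations, giving ρ(k) = |φ|^k). The sketch below (my own, illustrative only) checks the decaying lag correlations, not the full supremum over sigma-fields:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n = 0.6, 200_000

# Gaussian AR(1): x_t = phi * x_{t-1} + z_t, a standard rho-mixing example
z = rng.standard_normal(n)
x = np.empty(n)
x[0] = z[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + z[t]

def lag_corr(x, k):
    """Sample correlation between the series and its lag-k shift."""
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

# lag 1 correlation ~ phi; far lags ~ phi**k, essentially 0
```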
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors X = (X_1, X_2, …, X_d)^t
Are ρ-mixing
Drawback: Strong Assumption
(In JRSS-B, since
Biometrika Refused!)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (Fully Covariance Based,
  No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Tricky Point: Classical Mixing Conditions
Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
X ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t
(Note: Not Gaussian)
Define Standardized Version:
Z_d = Λ_d^(−1/2) U_d^t X_d
Assume ∃ a permutation of the d entries,
So that Z_d is ρ-mixing
HDLSS Math Stat of PCA
Careful look at:
PCA Consistency — α > 1 spike
(Reality Check, Suggested by Reviewer):
Independent of Sample Size,
So true for n = 1 (?!?)
Reviewer's Conclusion: Absurd, shows
assumption too strong for practice
HDLSS Math Stat of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise
Recall RNAseq Data From 8/23/12:
d ~ 1700, n = 180
Manually Brushed Clusters Show
Clear Alternate Splicing, Not Noise
HDLSS Math Stat of PCA
Functional Data Analysis
Recall Theoretical Separation:
Strong Inconsistency — α < 1 spike
Consistency — α > 1 spike
Mathematically Driven Conclusion:
Real Data Signals Are This Strong!
HDLSS Math Stat of PCA
An Interesting Objection:
Should not Study Angles in PCA
Recall for Consistency (α > 1): Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°
Objection: Because PC Scores (i.e. Projections)
are Not Consistent
For Scores ŝ_{i,j} = P_{û_j} x_i and s_{i,j} = P_{u_j} x_i
(What we study in PCA scatterplots),
Can Show: ŝ_{i,j} / s_{i,j} → R_j ≠ 1 (Random)
(Thanks to Dan Shen)
HDLSS Math Stat of PCA
PC Scores (i.e. Projections) Not Consistent,
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors":
ŝ_{i,j} / s_{i,j} → R_j ≠ 1,
Same Realization of R_j for All i
Axes have Inconsistent Scales,
But Relationships are Still Useful
HDLSS Math Stat of PCA
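The "proportional errors" phenomenon can be seen directly in a spike-model simulation: empirical scores are close to a common multiple of the true scores, so scatterplot relationships survive even when absolute scales do not. A rough sketch (my own construction, illustrative parameters; with a spike this strong the common factor is near 1):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, alpha = 2000, 40, 1.5
X = rng.standard_normal((n, d))
X[:, 0] *= d ** (alpha / 2)            # spike model, u_1 = e_1

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = np.sign(Vt[0, 0]) * Vt[0]     # align sign with u_1
s_hat = X @ u1_hat                     # empirical PC1 scores
s_true = X[:, 0]                       # true PC1 scores (projection on e_1)

ratio = s_hat / s_true                 # roughly a common factor across i
```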
In PCA Consistency:
Strong Inconsistency — α < 1 spike
Consistency — α > 1 spike
What happens at the boundary (α = 1)?
∃ interesting Limit Distn's:
Jung, Sen & Marron (2012)
HDLSS Deep Open Problem → Result
Recall: Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In Random Matrix Limit:
• Kernel Embedded Classifiers ~ Linear Classifiers
HDLSS Asymptotics & Kernel Methods
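El Karoui's point can be previewed numerically: in high dimension ‖x_i − x_j‖² = ‖x_i‖² + ‖x_j‖² − 2⟨x_i, x_j⟩ with each term concentrating, so a Gaussian kernel matrix is, to first order, an affine function of the norms and the linear Gram matrix. A rough sketch (my own construction, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 30, 20_000
X = rng.standard_normal((n, d))

G = X @ X.T                              # linear (Gram) kernel
sq = np.diag(G).copy()                   # squared norms
D2 = sq[:, None] + sq[None, :] - 2 * G   # squared pairwise distances
K = np.exp(-D2 / (2.0 * d))              # Gaussian kernel, bandwidth ~ d

iu = np.triu_indices(n, 1)
k, g, s2 = K[iu], G[iu], (sq[:, None] + sq[None, :])[iu]

# First-order expansion: K ~ a + b*(||x_i||^2 + ||x_j||^2) + c*<x_i, x_j>
A = np.column_stack([np.ones_like(g), s2, g])
coef, res, *_ = np.linalg.lstsq(A, k, rcond=None)
rms = float(np.sqrt(res[0] / len(k)))    # residual of the affine approximation
```

The residual is tiny relative to the spread of the kernel entries, i.e. almost all of the kernel matrix's off-diagonal variation is linear in Gram-matrix quantities.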
Implications for DWD:
Recall Main Advantage is for High d,
So not Clear Embedding Helps;
Thus not yet Implemented in DWD
HDLSS Asymptotics & Kernel Methods
HDLSS Additional Results
Batch Adjustment: Xuxin Liu
Recall Intuition from above:
Key is sizes of biological subtypes;
Differing ratio trips up mean,
But DWD is more robust
Mathematics behind this:
HDLSS Discrim'n Simulations
Main idea: Comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions
Overall Approach:
• Study different known phenomena:
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common Sample Sizes: n₊ = n₋ = 25
• But wide range of dimensions: d = 10, 40, 100, 400, 1600
HDLSS Discrim'n Simulations
(Result figures:)
Spherical Gaussians
Outlier Mixture
Wobble Mixture
Nested Spheres
Interesting Phenomenon:
All methods come together
in very high dimensions
HDLSS Discrim'n Simulations
Can we say more about
"all methods come together
in very high dimensions"?
A Mathematical Statistical Question
Mathematics behind this:
(Use Geometric Representation)
Explanation of Observed (Simulation) Behavior:
"everything similar for very high d"
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
HDLSS Asy's Geometrical Represen'tion
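The simplex picture is easy to confirm numerically: for n i.i.d. N(0, I_d) points with d large, all pairwise distances are nearly equal (≈ √(2d)), so the sample is close to a regular simplex. A minimal sketch (my own, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 25, 100_000
X = rng.standard_normal((n, d))

G = X @ X.T
sq = np.diag(G)
D = np.sqrt(sq[:, None] + sq[None, :] - 2 * G)   # pairwise distances

iu = np.triu_indices(n, 1)
ratios = D[iu] / np.sqrt(2 * d)   # all close to 1: a near-regular simplex
```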
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007):
• Assume 2nd Moments
• Assume no eigenvalues too large, in sense:
  For λ_1 ≥ ⋯ ≥ λ_d ≥ 0, assume
  (Σ_{j=1}^d λ_j²) / (Σ_{j=1}^d λ_j)² = o(1)
  (min possible value: 1/d)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background: In classical multivariate analysis, the statistic
ε = (Σ_{j=1}^d λ_j)² / (d · Σ_{j=1}^d λ_j²)
is called the "epsilon statistic",
and is used to test "sphericity" of dist'n,
i.e. "are all cov'nce eigenvalues the same?"
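A quick numerical check of the two extreme cases of the epsilon statistic ε = (Σλ_j)² / (d Σλ_j²) (my own sketch):

```python
import numpy as np

def epsilon(lam):
    """Epsilon (sphericity) statistic of an eigenvalue vector."""
    lam = np.asarray(lam, dtype=float)
    return float(lam.sum() ** 2 / (len(lam) * (lam ** 2).sum()))

d = 500
spherical = np.ones(d)                  # all eigenvalues equal
spike = np.zeros(d); spike[0] = 7.0     # one extreme eigenvalue

e_sph, e_spk = epsilon(spherical), epsilon(spike)
# spherical: epsilon = 1 (the max); single spike: epsilon = 1/d (the min)
```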
2nd Paper on HDLSS Asymptotics
Can show: epsilon statistic satisfies ε ∈ [1/d, 1]
• For spherical Normal: ε = 1
• Single extreme eigenvalue gives: ε = 1/d
• So the assumption (d · ε → ∞) is very mild
• Much weaker than mixing conditions
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007):
Assume 2nd Moments,
Assume no eigenvalues too large;
Then: ‖X_i − X_j‖ = d^(1/2) (c + o_p(1))
Not so strong as before:
‖Z_1 − Z_2‖ = (2d)^(1/2) + O_p(1)
2nd Paper on HDLSS Asymptotics
Can we improve on ‖X_i − X_j‖ = d^(1/2) (c + o_p(1))?
John Kent example: Normal scale mixture
X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
Won't get: ‖X_i − X_j‖ = C · d^(1/2) + O_p(1)
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using:
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal Scale Mixture
X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):
• Data Vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore:
  Covariance = 0 ⇏ Independence
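The "cov = 0 but dependent" property of the mixture entries is easy to check by simulation: the entries are uncorrelated, yet their squares are clearly correlated, because the common random scale couples them. A sketch (my own construction):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200_000   # draws of (X_1, X_2), two entries of a Kent-mixture vector

# common random scale: sd 1 or 10 with prob 1/2 each
scale = np.where(rng.random(N) < 0.5, 1.0, 10.0)
X = scale[:, None] * rng.standard_normal((N, 2))

corr_entries = float(np.corrcoef(X[:, 0], X[:, 1])[0, 1])
corr_squares = float(np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1])
# entries: correlation ~ 0;  squares: clearly positive correlation
```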
0 Covariance is not independence
Simple Example:
• Random Variables X and Y ~ N(0, 1)
• Make both Gaussian
  (Note: Not Using Multivariate Gaussian)
• With strong dependence, yet 0 covariance
Given c > 0, define:
Y = X,  for |X| ≤ c
Y = −X, for |X| > c
Choose c to make cov(X, Y) = 0:
• Distribution is degenerate
• Supported on diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0
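The balancing constant c can be found in closed form plus bisection: cov(X, Y) = 2·E[X²; |X| ≤ c] − E[X²], which is negative for small c and positive for large c. A sketch (my own implementation of the construction):

```python
import math
import numpy as np

def cov_xy(c):
    """cov(X, Y) for Y = X on |X| <= c, Y = -X on |X| > c, X ~ N(0, 1).

    E[X^2; |X| <= c] = erf(c/sqrt(2)) - sqrt(2/pi) * c * exp(-c^2/2),
    and cov(X, Y) = 2 * E[X^2; |X| <= c] - 1.
    """
    inside = math.erf(c / math.sqrt(2)) - math.sqrt(2 / math.pi) * c * math.exp(-c * c / 2)
    return 2 * inside - 1

lo, hi = 0.0, 5.0                   # cov < 0 at small c, > 0 at large c
for _ in range(60):                 # bisection for the root
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c = (lo + hi) / 2                   # approx 1.54

# Monte Carlo check: cov(X, Y) ~ 0, and Y is still exactly N(0, 1) by symmetry
rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)
y = np.where(np.abs(x) <= c, x, -x)
```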
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Discrimrsquon Simulations
Overall Approachbull Study different known phenomena
ndash Spherical Gaussiansndash Outliersndash Polynomial Embedding
bull Common Sample Sizes
bull But wide range of dimensions25 nn
16004001004010d
HDLSS Discrimrsquon Simulations
Spherical Gaussians
HDLSS Discrimrsquon Simulations
Outlier Mixture
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments.
Assume no eigenvalues too large, in the sense:
for eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_d, assume
    Σ_{j=1}^d λ_j² / (Σ_{j=1}^d λ_j)² = o(1),  as d → ∞
(i.e., the epsilon statistic stays far above 1/d, its min possible value)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics

Background: in classical multivariate analysis, the statistic
    ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)
is called the "epsilon statistic",
and is used to test "sphericity" of the dist'n,
i.e., "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics

Can show the epsilon statistic satisfies:  1/d ≤ ε ≤ 1
• For the spherical Normal: ε = 1
• A single extreme eigenvalue gives: ε ≈ 1/d
• So the assumption ε ≫ 1/d is very mild
• Much weaker than mixing conditions
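The bounds 1/d ≤ ε ≤ 1 and the two extreme cases are easy to verify directly; a small sketch (the function name is mine):

```python
import numpy as np

def epsilon_stat(lam):
    """Epsilon (sphericity) statistic of a set of covariance eigenvalues:
    (sum lam)^2 / (d * sum lam^2), which lies in [1/d, 1]."""
    lam = np.asarray(lam, dtype=float)
    return lam.sum() ** 2 / (lam.size * (lam ** 2).sum())

d = 100
eps_sphere = epsilon_stat(np.ones(d))                # all eigenvalues equal
eps_spike = epsilon_stat([1e6] + [1.0] * (d - 1))    # one extreme eigenvalue

print(eps_sphere)   # 1.0 for the spherical case
print(eps_spike)    # close to 1/d = 0.01
```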
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd moments,
and assume no eigenvalues too large. Then:
    ‖X_i − X_j‖ = O_p(1) · √d
Not so strong as before:  ‖Z_1 − Z_2‖ = √(2d) + O_p(1)
2nd Paper on HDLSS Asymptotics

Can we improve on  ‖X_i − X_j‖ = O_p(1) · √d ?
John Kent example: Normal scale mixture,
    X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d),  i.i.d.
Won't get:  ‖X_i − X_j‖ = C √d + O_p(1)
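Kent's example can be simulated to see why no single constant C works: the normalized pairwise distances settle near √(s_i² + s_j²), where s_i, s_j ∈ {1, 10} are the (random) component scales, so the limit itself is random. A sketch, with sizes chosen only for illustration:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
d, n = 20_000, 6

# Normal scale mixture: each vector is N(0, I_d) w.p. 1/2, N(0, 100 I_d) w.p. 1/2
scales = rng.choice([1.0, 10.0], size=n)
X = scales[:, None] * rng.standard_normal((n, d))

# ||X_i - X_j|| / sqrt(d) ~= sqrt(s_i^2 + s_j^2): it depends on which mixture
# components i and j drew, so there is no deterministic limit C
rel_err = []
for i, j in combinations(range(n), 2):
    ratio = np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
    target = np.sqrt(scales[i] ** 2 + scales[j] ** 2)
    rel_err.append(abs(ratio - target) / target)
    print(f"pair ({i},{j}): ratio {ratio:.2f}, sqrt(s_i^2+s_j^2) {target:.2f}")
```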
3rd Paper on HDLSS Asymptotics

Yata & Aoshima (2012): get geometrical representation using:
• 4th moment assumption
• Stronger covariance matrix (only) assum'n
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal scale mixture, X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d):
• Data vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show the entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply Independence
0 Covariance is not independence

Simple Example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (Note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance
Given c > 0, define
    Y = X,   if |X| ≤ c
    Y = −X,  if |X| > c
0 Covariance is not independence

Simple Example: choose c to make cov(X, Y) = 0.
• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c > 0, have cov(X, Y) < 0 (Y = −X most of the time)
• For large c, have cov(X, Y) > 0 (Y = X most of the time)
• By continuity, ∃ c with cov(X, Y) = 0
0 Covariance is not independence

Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian
Shows: multivariate Gaussian means more than Gaussian marginals.
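A quick simulation of this construction (the sample size and the two test values of c are my choices) shows the sign change in the covariance, while Y remains a deterministic function of X:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)   # X ~ N(0,1); by symmetry Y is also N(0,1)

def cov_xy(c):
    # Y = X on |X| <= c, Y = -X on |X| > c  (Y is completely determined by X)
    y = np.where(np.abs(x) <= c, x, -x)
    return np.mean(x * y)            # both means are 0, so this estimates cov

cov_small = cov_xy(0.1)   # small c: Y = -X almost always, covariance near -1
cov_large = cov_xy(5.0)   # large c: Y =  X almost always, covariance near +1
print(cov_small, cov_large)
# By continuity, some intermediate c gives covariance exactly 0,
# yet the dependence between X and Y is as strong as possible.
```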
HDLSS Asy's Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:
1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified: Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes (motivates weighted version): Qiao et al. (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study properties of PCA
in estimating eigen-directions & -values)
[Assume data are mean centered]
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
Spike covariance model, Paul (2007):
For eigenvalues:  λ_1(d) = d^α,  λ_2(d) = ⋯ = λ_d(d) = 1
Note critical parameter: α
1st eigenvector: u_1 (turns out the direction doesn't matter)
How good are the empirical versions λ̂_1(d), …, λ̂_d(d), û_1 as estimates?
HDLSS Math Stat of PCA

Consistency (big enough spike): for α > 1,
    Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough): for α < 1,
    Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA

Intuition: random noise ~ d^(1/2)
For α > 1 (recall λ_1 = d^α is on the scale of variance):
    the spike pops out of the pure noise sphere.
For α < 1:
    the spike is contained in the pure noise sphere.
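A small simulation of the single-spike model shows the two regimes (sizes and the two values of α are illustrative; the angle shrinks for the big spike, and moves toward 90° as d grows for the weak one):

```python
import numpy as np

rng = np.random.default_rng(3)

def leading_angle(d, alpha, n=20):
    """Angle (degrees) between the true and the sample first eigenvector
    in the spike model lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(float(d) ** alpha)     # spike along u_1 = e_1
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)             # |<u_1, u_1_hat>|
    return np.degrees(np.arccos(cos))

ang_big = leading_angle(2_000, alpha=1.5)   # consistency regime
ang_weak = leading_angle(2_000, alpha=0.5)  # strong-inconsistency regime
print(ang_big, ang_weak)   # small angle vs. angle well away from 0
```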
HDLSS Math Stat of PCA

Consistency of eigenvalues:
    λ̂_1 / λ_1 →_L χ²_n / n,  as d → ∞
• Eigenvalues inconsistent (for fixed n)
• But known distribution
• Consistent when n → ∞ as well
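The χ²_n/n limit can be checked by Monte Carlo in the consistency regime: λ̂_1/λ_1 should have mean ≈ 1 and variance ≈ 2/n. A sketch (repetitions and sizes are my choices):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha, reps = 5_000, 8, 1.5, 200
lam1 = float(d) ** alpha

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)              # spike along e_1
    s = np.linalg.svd(X, compute_uv=False)
    ratios[r] = s[0] ** 2 / n / lam1      # lambda_1_hat / lambda_1

print(ratios.mean(), ratios.var())  # compare with E[chi2_n / n] = 1, Var = 2/n
```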
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Mixing Conditions

Idea from probability theory:
Recall standard asymptotic results, as n → ∞:
• Law of Large Numbers ("weak" = in prob.; "strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!),
e.g., independent and ident. dist'd.
Mixing conditions: explore weaker assumptions that still give the
Law of Large Numbers and Central Limit Theorem.

Mixing Conditions

• A whole area in probability theory
• I.e., a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better: newer references

Mixing Conditions

Mixing condition used here: rho-mixing.
For random variables X_1, X_2, …, define
    ρ(t) = sup |corr(f, g)|
where the sup is over (square-integrable) f and g measurable w.r.t. the
sigma-fields generated by X_1, …, X_k and by X_(k+t), X_(k+t+1), …
(note the gap of lag t).
Assume ρ(t) → 0 as t → ∞.
Idea: uncorrelated at far lags.
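For intuition, a Gaussian AR(1) sequence is a standard ρ-mixing example: for Gaussian sequences the maximal correlation over lag-t sigma-fields reduces to an ordinary correlation, which here decays like φ^t → 0. A sketch (the parameters are mine):

```python
import numpy as np

rng = np.random.default_rng(5)
phi, T = 0.8, 200_000
eps = rng.standard_normal(T)

# Gaussian AR(1): X_t = phi * X_{t-1} + eps_t -- strong short-lag dependence
x = np.empty(T)
x[0] = eps[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(t):
    return np.corrcoef(x[:-t], x[t:])[0, 1]

corrs = {t: lag_corr(t) for t in (1, 5, 20, 50)}
print(corrs)   # decays roughly like phi**t -> 0: "uncorrelated at far lags"
```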
HDLSS Math Stat of PCA

Conditions for Geo. Rep'n:
Hall, Marron and Neeman (2005):
assume the entries X_1, …, X_d of the data vectors X = (X_1, …, X_d)^t are ρ-mixing.
Drawback: strong assumption.
(In JRSS-B, since Biometrika refused!)

HDLSS Math Stat of PCA

Conditions for Geo. Rep'n:
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)
Tricky point: classical mixing conditions require a notion of time ordering,
which is not always clear, e.g., for microarrays.

HDLSS Math Stat of PCA

Conditions for Geo. Rep'n:
Condition from Jung & Marron (2009):
    X_d ~ (0_d, Σ_d),  where Σ_d = U_d Λ_d U_d^t
(Note: not necessarily Gaussian)
Define the standardized version  Z_d = Λ_d^(−1/2) U_d^t X_d.
Assume ∃ a permutation π_d so that the entries of Z_{d,π_d} are ρ-mixing.
HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike):
(Reality check suggested by a reviewer)
The condition is independent of sample size,
so consistency is true even for n = 1 (!?)
Reviewer's conclusion: absurd; shows the
assumption is too strong for practice.
HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise.
HDLSS Math Stat of PCA

Recall the RNAseq data from 8/23/12:  d ~ 1700,  n = 180.
Functional Data Analysis

Manually brushed clusters show clear alternate splicing: not noise.
HDLSS Math Stat of PCA

Recall the theoretical separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically driven conclusion: real data signals are this strong.
HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA.
Recall: for Consistency (α > 1),  Angle(û_1, u_1) → 0;
for Strong Inconsistency (α < 1),  Angle(û_1, u_1) → 90°.
Because PC scores (i.e., projections) are not consistent:
for the scores  ŝ_ij = P_{v̂_j} x_i  and  s_ij = P_{v_j} x_i
(what we study in PCA scatterplots),
can show  ŝ_ij / s_ij → R_j ≠ 1  (random).
Thanks to Dan Shen.
HDLSS Math Stat of PCA

PC scores (i.e., projections) not consistent.
So how can PCA find useful signals in data?
Key is "proportional errors":  ŝ_ij / s_ij → R_j,
with the same realization of R_j for all i = 1, …, n.
Axes have inconsistent scales,
but relationships are still useful.
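The "proportional errors" point can be illustrated: projecting data on the estimated vs. the true first eigenvector gives score vectors that differ by a (roughly) common factor, so the scatterplot structure survives. A sketch under illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, alpha = 20_000, 10, 0.9
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(float(d) ** alpha)   # spike along u_1 = e_1

_, _, Vt = np.linalg.svd(X, full_matrices=False)
s_true = X[:, 0]        # scores on the true direction u_1
s_hat = X @ Vt[0]       # scores on the estimated direction u_1_hat

# The two score vectors are nearly proportional (common random factor),
# so the relative positions of the points are preserved
corr = abs(np.corrcoef(s_true, s_hat)[0, 1])
print(corr)   # close to 1 even though s_hat differs from s_true in scale
```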
HDLSS Deep Open Problem

In PCA Consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: ∃ interesting limit dist'ns, Jung, Sen & Marron (2012).
HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea.
HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?
Answer, El Karoui (2010):
• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers.
Implications for DWD: recall its main advantage is for high d,
so it is not clear embedding helps.
Thus not yet implemented in DWD.
HDLSS Additional Results

Batch adjustment (Xuxin Liu):
Recall the intuition from above:
• Key is the sizes of the biological subtypes
• A differing ratio trips up the mean
• But DWD is more robust
Mathematics behind this?
HDLSS Discrim'n Simulations
Spherical Gaussians

HDLSS Discrim'n Simulations
Outlier Mixture

HDLSS Discrim'n Simulations
Wobble Mixture

HDLSS Discrim'n Simulations
Nested Spheres
HDLSS Discrim'n Simulations

Interesting phenomenon:
all methods come together in very high dimensions.

Can we say more about
"all methods come together in very high dimensions"?
A mathematical statistical question.
Mathematics behind this:
(use the geometric representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
An interesting objection: should not study angles in PCA,
because PC scores (i.e. projections) are not consistent.
For the scores $\hat s_{ij} = P_{\hat v_j} x_i$ and $s_{ij} = P_{v_j} x_i$
(what we study in PCA scatterplots),
can show $\hat s_{ij} / s_{ij} \to R_j \neq 1$ (random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC scores (i.e. projections) are not consistent,
so how can PCA find useful signals in data?
(Recall: HDLSS PCA often finds signal, not pure noise)
Key is "proportional errors": $\hat s_{ij} / s_{ij} \to R_j$,
with the same realization of $R_j$ for all $i = 1, \dots, n$
Axes have inconsistent scales, but relationships are still useful
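As a quick numerical illustration of the "proportional errors" idea (a sketch, not from the slides: the score matrix `s` and the factors `R` are invented for illustration), rescaling each score direction by a single factor shared across observations leaves within-direction relationships intact:

```python
import numpy as np

rng = np.random.default_rng(5)
s = rng.standard_normal((100, 2))   # hypothetical "true" PC scores, 2 directions
R = np.array([0.3, 2.5])            # one (random) factor per direction, same for all i
s_hat = s * R                       # scores with "proportional errors"

# A positive per-axis rescaling preserves all within-direction relationships:
r = np.corrcoef(s[:, 0], s_hat[:, 0])[0, 1]
print(r)   # 1.0 up to floating point
```

So a scatterplot of the inconsistent scores is just an axis-rescaled version of the true one: clusters and relative positions survive.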
HDLSS Deep Open Problem
In PCA consistency:
Strong inconsistency: $\alpha < 1$ spike
Consistency: $\alpha > 1$ spike
What happens at the boundary ($\alpha = 1$)?
Result: Ǝ interesting limit dist'ns, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall the flexibility gained from the kernel embedding idea
HDLSS Asymptotics & Kernel Methods
Interesting question: behavior in very high dimension?
Answer, El Karoui (2010):
• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers
Implications for DWD: recall its main advantage is for high d,
so it is not clear that embedding helps
Thus not yet implemented in DWD
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Discrim'n Simulations
Outlier Mixture, Wobble Mixture, Nested Spheres, …
Interesting phenomenon: all methods come together in very high dimensions
Can we say more about this? Mathematical statistical question:
what mathematics is behind it? (Use the Geometric Representation)
HDLSS Asy's Geometrical Represen'tion
Explanation of observed (simulation) behavior:
"everything similar for very high d"
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• all are the same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
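A small simulation is consistent with this geometric picture (a sketch; the dimension, sample size, and seed are arbitrary choices of mine): for i.i.d. $N(0, I_d)$ data, norms concentrate at $\sqrt d$, pairwise distances at $\sqrt{2d}$, and pairwise angles at $90^\circ$:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
d, n = 10_000, 5                       # high dimension, low sample size
X = rng.standard_normal((n, d))        # rows are data vectors ~ N(0, I_d)

norms = np.linalg.norm(X, axis=1)
print(norms / np.sqrt(d))              # all close to 1: ||X_i|| ~ sqrt(d)

pairs = list(combinations(range(n), 2))
dists = np.array([np.linalg.norm(X[i] - X[j]) for i, j in pairs])
print(dists / np.sqrt(2 * d))          # all close to 1: distances ~ sqrt(2d)

cosines = np.array([X[i] @ X[j] / (norms[i] * norms[j]) for i, j in pairs])
print(np.degrees(np.arccos(cosines)))  # all close to 90 degrees
```

Equal pairwise distances and right angles are exactly the regular-simplex structure described above.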
HDLSS Asy's Geometrical Represen'tion
Straightforward generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• mild eigenvalue condition on the theoretical covariance (Ahn, Marron, Muller & Chi 2007)
All based on simple "Laws of Large Numbers"
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): assume 2nd moments,
and assume no eigenvalues are too large, in the sense:
for eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, assume
$\frac{\sum_{j=1}^{d} \lambda_j^2}{\left(\sum_{j=1}^{d} \lambda_j\right)^2} = o(1)$ as $d \to \infty$
(much weaker than the previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background: in classical multivariate analysis, the statistic
$\varepsilon = \frac{\left(\sum_{j=1}^{d} \lambda_j\right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$
is called the "epsilon statistic",
and is used to test "sphericity" of the dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics
Can show the epsilon statistic satisfies $\frac{1}{d} \leq \varepsilon \leq 1$
• For the spherical Normal: $\varepsilon = 1$
• A single extreme eigenvalue gives $\varepsilon = \frac{1}{d}$ (min possible)
• So the assumption is very mild
• Much weaker than mixing conditions
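These bounds and boundary cases are easy to check numerically (a sketch; the helper `epsilon` and the example eigenvalue vectors are mine, not from the slides):

```python
import numpy as np

def epsilon(lam):
    """Epsilon (sphericity) statistic of a vector of covariance eigenvalues."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
print(epsilon(np.ones(d)))            # spherical case: exactly 1

spike = np.zeros(d)
spike[0] = 7.0                        # one extreme eigenvalue, rest zero
print(epsilon(spike))                 # minimum possible value: 1/d = 0.001
```

Note the statistic is scale-free, so only the eigenvalue profile matters, not its overall magnitude.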
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): assume 2nd moments,
assume no eigenvalues too large; then, for $i \neq j$:
$X_i^t X_j = o_p(d)$
Not so strong as the earlier $\|Z_1 - Z_2\|^2 = 2d + O_p(1)$
2nd Paper on HDLSS Asymptotics
Can we improve on $X_i^t X_j = o_p(d)$?
John Kent example, Normal scale mixture:
$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$
Won't get the stronger $X_i^t X_j = C\, d^{1/2}\, O_p(1)$
3rd Paper on HDLSS Asymptotics
Yata & Aoshima (2012): get the geometrical representation using
• a 4th moment assumption
• a stronger covariance matrix (only) assump'n
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal scale mixture $X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100\, I_d)$:
• Data vectors are indep'dent of each other
• But the entries of each have strong depend'ce
• However, can show the entries have cov = 0
• Recall statistical folklore: Covariance = 0 does NOT imply Independence
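A quick simulation check of these claims (a sketch; the sample size and seed are arbitrary): the entries are uncorrelated, yet their squares are clearly correlated, because the shared per-vector scale factor couples them:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000                                        # number of sampled vectors
sigma = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # per-vector scale: sd 1 or 10
X = sigma[:, None] * rng.standard_normal((n, 2))   # two coords of the scale mixture

corr_entries = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
corr_squares = np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1]
print(round(corr_entries, 3))   # ~ 0: entries have covariance 0
print(round(corr_squares, 3))   # clearly positive: entries are dependent
```

When one entry is large in magnitude, the vector probably came from the high-variance component, so the other entry tends to be large too; that is the dependence the covariance misses.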
0 Covariance is not independence
Simple example:
• Random variables $X, Y \sim N(0,1)$
(Note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance
Given $c > 0$, define
$Y = \begin{cases} X, & |X| \leq c \\ -X, & |X| > c \end{cases}$
and choose $c$ to make $\mathrm{cov}(X,Y) = 0$:
• The distribution is degenerate, supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small $c$, have $\mathrm{cov}(X,Y) < 0$
• For large $c$, have $\mathrm{cov}(X,Y) > 0$
• By continuity, Ǝ $c$ with $\mathrm{cov}(X,Y) = 0$
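The crossing point can be located explicitly (a sketch; the closed form uses the standard identity $\int_{-c}^{c} x^2 \varphi(x)\,dx = (2\Phi(c)-1) - 2c\varphi(c)$, and the bisection search is mine):

```python
import math

def phi(x):   # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cov_xy(c):
    # Y = X on {|X| <= c}, Y = -X on {|X| > c}, and E[X^2] = 1, so
    # cov(X,Y) = E[X^2; |X|<=c] - E[X^2; |X|>c] = 2*E[X^2; |X|<=c] - 1
    return 2 * ((2 * Phi(c) - 1) - 2 * c * phi(c)) - 1

print(cov_xy(0.5) < 0, cov_xy(3.0) > 0)    # sign change, as on the slide

lo, hi = 0.5, 3.0
for _ in range(60):                        # bisect for the zero crossing
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c0 = (lo + hi) / 2
print(round(c0, 2))                        # ~ 1.54
# Yet |Y| = |X| always: Y is a deterministic function of X (maximal dependence)
```

So a numerically exact zero-covariance cutoff exists, while the dependence stays as strong as possible.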
0 Covariance is not independence
Result: the joint distribution of $X$ and $Y$
– has Gaussian marginals
– has $\mathrm{cov}(X,Y) = 0$
– yet strong dependence of $X$ and $Y$
– thus is not multivariate Gaussian
Shows: multivariate Gaussian means more than Gaussian marginals
HDLSS Asy's Geometrical Represen'tion
Further consequences of the geometric represen'tion:
1. DWD more stable than SVM (based on deeper limiting distributions)
(reflects the intuitive idea of feeling sampling variation; something like mean vs. median)
Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version), Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(Study properties of PCA in estimating eigen-directions & -values)
[Assume data are mean centered]
Spike covariance model, Paul (2007):
For eigenvalues: $\lambda_{1,d} = d^{\alpha}$, $\lambda_{2,d} = \cdots = \lambda_{d,d} = 1$
Note the critical parameter: $\alpha$
1st eigenvector: $u_1$ (turns out the direction doesn't matter)
How good are the empirical versions $\hat\lambda_{1,d}$ and $\hat u_1$ as estimates?
HDLSS Math Stat of PCA
Consistency (big enough spike): for $\alpha > 1$, $\mathrm{Angle}(\hat u_1, u_1) \to 0$
Strong inconsistency (spike not big enough): for $\alpha < 1$, $\mathrm{Angle}(\hat u_1, u_1) \to 90^\circ$
Intuition: random noise $\sim d^{1/2}$ (recall $d^{\alpha}$ is on the scale of variance)
For $\alpha > 1$: spike pops out of the pure noise sphere
For $\alpha < 1$: spike contained in the pure noise sphere
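The dichotomy shows up clearly in simulation (a sketch; the choices of n, d, α, and seed are mine, and the model is the diagonal spike covariance just described):

```python
import numpy as np

rng = np.random.default_rng(2)

def angle_deg(d, alpha, n=20):
    """Angle (degrees) between the first sample PC and the true u1 = e1,
    under the spike model lambda_1 = d**alpha, other eigenvalues = 1."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)
    X = rng.standard_normal((n, d)) * sd   # rows ~ N(0, diag(d^alpha, 1, ..., 1))
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)          # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

big_spike = angle_deg(10_000, alpha=1.5)   # spike pops out: angle near 0
small_spike = angle_deg(10_000, alpha=0.5) # spike swamped by noise: angle near 90
print(round(big_spike, 1), round(small_spike, 1))
```

The mean is known to be zero here, so no centering step is needed before the SVD.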
HDLSS Math Stat of PCA
Consistency of eigenvalues:
$\frac{\hat\lambda_{1,d}}{\lambda_{1,d}} \xrightarrow{\ L\ } \frac{\chi^2_n}{n}$ as $d \to \infty$
Eigenvalues inconsistent, but a known distribution
Consistent when $n \to \infty$ as well
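A simulation consistent with this limit (a sketch; the parameters are mine): with a strong spike, the ratio $\hat\lambda_1 / \lambda_1$ behaves like $\chi^2_n / n$, with mean about 1 and variance about $2/n$, so it does not concentrate for fixed n:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, alpha = 10, 2000, 2.0
lam1 = float(d) ** alpha
sd = np.ones(d)
sd[0] = np.sqrt(lam1)

ratios = []
for _ in range(400):
    X = rng.standard_normal((n, d)) * sd                         # spike model sample
    lam1_hat = np.linalg.svd(X, compute_uv=False)[0] ** 2 / n    # top eig of X'X / n
    ratios.append(lam1_hat / lam1)
ratios = np.array(ratios)
print(round(ratios.mean(), 2), round(ratios.var(), 2))  # ~ mean 1, variance ~ 2/n = 0.2
```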
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consist.
John Kent example: $X_d \sim \frac{1}{2} N(0, I_d) + \frac{1}{2} N(0, 100\, I_d)$
Can only say $\|X_d\| = d^{1/2} \times \{1 \text{ or } 10\}$, w.p. $\frac{1}{2}$ each:
$O_p(d^{1/2})$, but not deterministic
PCA conditions are the same, since the noise is still $O_p(d^{1/2})$
But for the Geo Rep'n, need some mixing condition
Conclude: need some mixing condition
Mixing Conditions
Idea from probability theory:
Recall standard asymptotic results as $n \to \infty$:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!),
e.g. independent and ident. dist'd
Mixing conditions explore weaker assumptions that still give the
Law of Large Numbers and Central Limit Theorem
• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better newer references?
Mixing Conditions
Mixing condition used here: ρ-mixing
For random variables $X_1, X_2, \dots$, define
$\rho(k) = \sup \left\{ \mathrm{corr}(f, g) : f \in L^2(\mathcal{F}_1^i),\ g \in L^2(\mathcal{F}_{i+k}^\infty) \right\}$
where the sigma-fields are:
• $\mathcal{F}_1^i$, generated by $X_1, \dots, X_i$
• $\mathcal{F}_{i+k}^\infty$, generated by $X_{i+k}, X_{i+k+1}, \dots$
• Note the gap of lag $k$
Assume $\rho(k) \to 0$ as $k \to \infty$
Idea: uncorrelated at far lags
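As a concrete instance of "uncorrelated at far lags" (a sketch, not from the slides: a Gaussian AR(1) sequence, whose lag-k correlation is $\phi^k$ and so decays geometrically):

```python
import numpy as np

rng = np.random.default_rng(4)
phi_coef, n = 0.8, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for i in range(1, n):            # Gaussian AR(1): X_i = phi * X_{i-1} + eps_i
    x[i] = phi_coef * x[i - 1] + eps[i]

for k in (1, 5, 20, 50):
    r = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(k, round(r, 3))        # decays roughly like phi_coef**k toward 0
```

Nearby terms are strongly dependent, but the dependence washes out at far lags, which is the behavior the ρ-mixing assumption formalizes.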
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Hall, Marron and Neeman (2005): assume the entries of the data vectors
$X = (X_1, X_2, \dots, X_d)^t$ are ρ-mixing
Drawback: strong assumption
(In JRSS-B, since Biometrika refused!)
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012) (fully covariance based, no mixing)
Tricky point: classical mixing conditions require a notion of time ordering,
not always clear, e.g. for microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Discrimrsquon Simulations
Wobble Mixture
HDLSS Discrimrsquon Simulations
Nested Spheres
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist:
John Kent example:
X_d ~ (1/2) N(0, 100 I_d) + (1/2) N(0, I_d)
Can only say ‖X_d‖ = O_p(d^{1/2}), not deterministic:
‖X_d‖ ≈ 10 d^{1/2} w.p. 1/2, ≈ d^{1/2} w.p. 1/2
PCA Conditions Same, since Noise Still O_p(d^{1/2})
But for Geo Rep'n need some Mixing Cond
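A quick numerical look at the Kent example (my own illustration; assumed d = 4000): the norm scaled by d^{1/2} settles on one of two values, so no deterministic limit exists.

```python
import numpy as np

rng = np.random.default_rng(2)
d, reps = 4000, 400

# Kent's mixture: X ~ 0.5 N(0, 100 I_d) + 0.5 N(0, I_d)
sigma = np.where(rng.random(reps) < 0.5, 10.0, 1.0)  # random scale per vector
Z = rng.standard_normal((reps, d))
norms = np.linalg.norm(sigma[:, None] * Z, axis=1) / np.sqrt(d)

# ||X|| / sqrt(d) concentrates near 10 or near 1: O_p(d**0.5), random scale
print(norms.min(), norms.max())
```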
HDLSS Math Stat of PCA

Conditions for Geo Rep'n: Conclude Need some Mixing Condition

Mixing Conditions

Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
Central Limit Theorem
Both have Technical Assumptions (Usually Ignored!), e.g. Independent and Ident. Dist'd
Mixing Conditions: Explore Weaker Assumptions to Still Get Law of Large Numbers & Central Limit Theorem
Mixing Conditions
• A Whole Area in Probability Theory: a Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better, Newer References
Mixing Conditions

Mixing Condition Used Here: Rho-Mixing
For Random Variables X₁, X₂, X₃, …, Define
ρ(k) = sup |corr(f, g)|
Where the sup is over f, g measurable w.r.t. the Sigma-Fields Generated by (…, X_{t−1}, X_t) and (X_{t+k}, X_{t+k+1}, …)
Note: Gap of Lag k
Assume: ρ(k) → 0 as k → ∞
Idea: Uncorrelated at Far Lags
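As a concrete instance (my own illustration, not from the slides): a Gaussian AR(1) process X_t = φ X_{t−1} + e_t is a standard ρ-mixing example, and for Gaussian pairs the maximal correlation over functions reduces to the plain lag correlation |φ|^k, so the decay is easy to see empirically.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, T = 0.7, 200_000

# Gaussian AR(1): X_t = phi * X_{t-1} + e_t, a standard rho-mixing example
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    """Empirical correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

corrs = [lag_corr(x, k) for k in (1, 5, 10, 20)]
print(corrs)  # roughly phi**k: decays toward 0 at far lags
```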
HDLSS Math Stat of PCA

Conditions for Geo Rep'n, Hall, Marron and Neeman (2005):
Assume the Entries of the Data Vectors X = (X₁, X₂, …, X_d)ᵗ Are ρ-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused!)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n: Series of Technical Improvements
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering
Not Always Clear, e.g. Microarrays
HDLSS Math Stat of PCA

Conditions for Geo Rep'n: Condition from Jung & Marron (2009):
X_d ~ (0_d, Σ_d), where Σ_d = U_d Λ_d U_dᵗ   (Note: Not Gaussian)
Define Standardized Version: Z_d = Λ_d^{−1/2} U_dᵗ X_d
Assume ∃ a permutation of the d entries, so that Z_d is ρ-mixing
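The standardization step can be sanity-checked numerically. A sketch (my own illustration; Gaussian data used only for convenience, since the condition itself is not about Gaussianity):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 5, 100_000

# build Sigma = U Lambda U^t with known eigen-structure
U, _ = np.linalg.qr(rng.standard_normal((d, d)))     # orthonormal eigenvectors
lam = np.array([10.0, 5.0, 2.0, 1.0, 0.5])           # eigenvalues
Sigma = U @ np.diag(lam) @ U.T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # rows ~ (0, Sigma)
Z = X @ U @ np.diag(lam ** -0.5)    # Z = Lambda^{-1/2} U^t X, vector by vector

print(np.round(np.cov(Z.T), 2))     # approximately the identity: standardized
```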
HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike)
(Reality Check Suggested by Reviewer)
Independent of Sample Size, So true for n = 1 (?!?)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice
HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180
Manually Brushed Clusters: Clear Alternate Splicing, Not Noise
Functional Data Analysis
HDLSS Math Stat of PCA

Recall Theoretical Separation:
Strong Inconsistency: α < 1 spike
Consistency: α > 1 spike
Mathematically Driven Conclusion: Real Data Signals Are This Strong
HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA
Recall for Consistency: α > 1, Angle(û₁, u₁) → 0
For Strong Inconsistency: α < 1, Angle(û₁, u₁) → 90°
Because PC Scores (i.e. projections) Not Consistent
For Scores ŝ_ij = P_{û_j} x_i (What we study in PCA scatterplots) and s_ij = P_{u_j} x_i
Can Show: ŝ_ij / s_ij → R_j ≠ 1 (Random)   (Thanks to Dan Shen)
HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors": ŝ_ij / s_ij → R_j ≠ 1, with the Same Realization of R_j for i = 1, …, n
Axes have Inconsistent Scales, But Relationships are Still Useful
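A simulation of the proportional-errors phenomenon (my own illustration; assumed spike model with u₁ = e₁, d = 20000, n = 20, α = 0.75 so the error is visible): the per-observation ratios ŝ_i1 / s_i1 share essentially one random value R₁ ≠ 1.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, alpha = 20_000, 20, 0.75

u1 = np.zeros(d)
u1[0] = 1.0                                  # true first eigenvector
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(d ** alpha)               # spike

_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0] * np.sign(Vt[0] @ u1)             # empirical PC1 direction, sign-aligned
s_hat = X @ v1                               # empirical PC1 scores
s_true = X @ u1                              # true PC1 scores

ratios = s_hat / s_true
print(ratios.round(2))  # similar value across i, but not equal to 1
```

The common inflation R₁ merely rescales the PC1 axis, which is why PCA scatterplots still show the right relationships.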
HDLSS Deep Open Problem

In PCA Consistency:
Strong Inconsistency: α < 1 spike
Consistency: α > 1 spike
What happens at boundary (α = 1)?
Result: ∃ interesting Limit Dist'ns, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea

Interesting Question:
Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
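The flavor of the El Karoui result can be seen numerically (my own illustration; points are placed on the noise sphere of radius d^{1/2}, and the Gaussian-kernel bandwidth is taken on the d^{1/2} scale): off-diagonal Gaussian-kernel entries become an essentially affine, hence monotone, function of the linear-kernel entries.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 5000, 40

X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)  # radius sqrt(d) sphere

G = X @ X.T                                            # linear (inner-product) kernel
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G # squared pairwise distances
K = np.exp(-sq / (2 * d))                              # Gaussian kernel, bandwidth^2 = d

iu = np.triu_indices(n, k=1)                           # off-diagonal entries
r = np.corrcoef(K[iu], G[iu])[0, 1]
print(r)  # near 1: kernel entries nearly affine in the linear-kernel entries
```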
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall Intuition from above:
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this?

HDLSS Discrim'n Simulations

Nested Spheres …

Interesting Phenomenon:
All methods come together in very high dimensions

Can we say more about:
All methods come together in very high dimensions?
Mathematical Statistical Question:
Mathematics behind this (Use Geometric Representation)
HDLSS Asy's Geometrical Represen'tion

Explanation of Observed (Simulation) Behavior: "everything similar for very high d"
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
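The simplex picture is easy to verify for one pure-noise population (my own illustration, with assumed sizes d = 20000, n = 10): all pairwise distances concentrate at (2d)^{1/2}, i.e. the points sit near the vertices of a regular simplex.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 20_000, 10

X = rng.standard_normal((n, d))                  # one population of pure noise

iu = np.triu_indices(n, k=1)
diffs = X[:, None, :] - X[None, :, :]
D = np.sqrt((diffs ** 2).sum(-1))[iu]            # all pairwise distances

print(D.min() / np.sqrt(2 * d), D.max() / np.sqrt(2 * d))  # both near 1
```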
HDLSS Asy's Geometrical Represen'tion

Straightforward Generalizations:
non-Gaussian data: only need moments
non-independent: use "mixing conditions"
Mild Eigenvalue condition on Theoretical Cov (Ahn, Marron, Muller & Chi 2007)
All based on simple "Laws of Large Numbers"
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large, in sense:
For λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_d, assume (Σ_{j=1}^d λ_j²) / (Σ_{j=1}^d λ_j)² = o(1)
(min possible value is 1/d)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis the statistic
ε = (Σ_{j=1}^d λ_j)² / (d · Σ_{j=1}^d λ_j²)
Is called the "epsilon statistic"
And is used to test "sphericity" of dist'n,
i.e. "are all cov'nce eigenvalues the same?"

Can show the epsilon statistic satisfies: 1/d ≤ ε ≤ 1
• For spherical Normal, ε = 1
• Single extreme eigenvalue gives ε ≈ 1/d
• So assumption is very mild
• Much weaker than mixing conditions
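The two extreme cases are immediate to compute. A sketch (my own illustration):

```python
import numpy as np

def epsilon_stat(lam):
    """Sphericity 'epsilon statistic' of a list of covariance eigenvalues."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_sphere = epsilon_stat(np.ones(d))               # all eigenvalues equal
eps_spike = epsilon_stat([1e6] + [1.0] * (d - 1))   # one extreme eigenvalue

print(eps_sphere, eps_spike, 1 / d)  # 1 at one extreme, about 1/d at the other
```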
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments,
Assume no eigenvalues too large
Then: X_iᵗ X_j = d · o_p(1)
Not so strong as before, where ‖Z₁ − Z₂‖² = 2d · (1 + o_p(1))

Can we improve on X_iᵗ X_j = d · o_p(1)?
John Kent example: Normal scale mixture
X_i ~ 0.5 N(0, 100 I_d) + 0.5 N(0, I_d), indep.
Won't get X_iᵗ X_j = C · d^{1/2} · O_p(1)
3rd Paper on HDLSS Asymptotics

Get Geometrical Representation using:
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture: X_i ~ 0.5 N(0, 100 I_d) + 0.5 N(0, I_d), indep.
• Data Vectors are indep'dent of each other
• But entries of each have strong depend'nce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 ⇏ Independence
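The zero-covariance-with-dependence structure of the entries can be checked directly (my own illustration): two coordinates of a Kent-mixture vector are uncorrelated, yet their squares are clearly correlated through the shared random scale.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 200_000

# two coordinates of one Kent-mixture vector: X_k = sigma * Z_k, sigma in {10, 1}
sigma = np.where(rng.random(N) < 0.5, 10.0, 1.0)   # shared scale per vector
x1 = sigma * rng.standard_normal(N)
x2 = sigma * rng.standard_normal(N)

cov12 = np.cov(x1, x2)[0, 1]               # approximately 0: uncorrelated entries
dep = np.corrcoef(x1 ** 2, x2 ** 2)[0, 1]  # clearly positive: dependent entries
print(cov12, dep)
```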
0 Covariance is not independence

Simple Example:
• Random Variables X and Y ~ N(0, 1)
• Make both Gaussian (Note: Not Using Multivariate Gaussian)
• With strong dependence, Yet 0 covariance
Given c > 0, define
Y = X,  |X| > c
Y = −X, |X| ≤ c
Choose c to make cov(X, Y) = 0:
• Distribution is degenerate: Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) > 0
• For large c, have cov(X, Y) < 0
• By continuity, ∃ c with cov(X, Y) = 0
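The construction can be carried out numerically (my own illustration): bisection finds the c that zeroes the covariance, while |Y| = |X| exactly, so X and Y stay completely dependent.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.standard_normal(1_000_000)

def make_y(c):
    """Y = X when |X| > c, Y = -X when |X| <= c; marginal stays N(0, 1)."""
    return np.where(np.abs(X) > c, X, -X)

def cov_xy(c):
    return np.mean(X * make_y(c))   # means are 0, so this estimates cov(X, Y)

# cov decreases in c: positive for small c, negative for large c
lo, hi = 0.0, 3.0
for _ in range(40):                 # bisection for the root c*
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov_xy(mid) > 0 else (lo, mid)
c_star = 0.5 * (lo + hi)

Y = make_y(c_star)
print(c_star, cov_xy(c_star))       # cov ~ 0, yet |Y| = |X| exactly
```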
0 Covariance is not independence

Result: the Joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian
Shows: Multivariate Gaussian means more than Gaussian Marginals
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Discrimrsquon Simulations
hellip
Interesting Phenomenon
All methods come together
in very high dimensions
HDLSS Discrimrsquon Simulations
Can we say more about
All methods come together
in very high dimensions
Mathematical Statistical Question
Mathematics behind this
(Use Geometric Representation)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal scale mixture $X \sim 0.5\, N(0, I_d) + 0.5\, N(0, 100\, I_d)$:
• Data vectors are independent of each other
• But the entries of each have strong dependence
• However, can show the entries have cov = 0
• Recall the statistical folklore: Covariance = 0 does not imply Independence
0 Covariance is not independence
Simple Example:
• Random variables $X$ and $Y$
• Make both Gaussian: $X, Y \sim N(0, 1)$
  (Note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance
Given $c > 0$, define
  $Y = X$ when $|X| > c$,  $Y = -X$ when $|X| \le c$
0 Covariance is not independence
Simple Example: choose $c$ to make $\mathrm{cov}(X, Y) = 0$ [figure]
0 Covariance is not independence
Simple Example:
• Distribution is degenerate
• Supported on diagonal lines
• Not absolutely continuous w.r.t. 2-d Lebesgue measure
• For small $c$, have $\mathrm{cov}(X, Y) > 0$
• For large $c$, have $\mathrm{cov}(X, Y) < 0$
• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$
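The continuity argument above can be carried out numerically: bisect on $c$ until the empirical covariance vanishes, while $Y$ remains a deterministic function of $X$ (hence strongly dependent). A sketch of this construction (my own code, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)

def cov_xy(c):
    # Y = X when |X| > c, Y = -X when |X| <= c; both means are 0, so cov = E[XY]
    y = np.where(np.abs(x) > c, x, -x)
    return np.mean(x * y)

# cov > 0 for small c, cov < 0 for large c; bisect for the balance point
lo, hi = 0.0, 3.0
for _ in range(40):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) > 0 else (lo, mid)
c = (lo + hi) / 2
y = np.where(np.abs(x) > c, x, -x)
assert abs(np.mean(x * y)) < 0.01      # covariance approximately 0 ...
assert np.allclose(y ** 2, x ** 2)     # ... yet Y is a function of X: dependent
```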
0 Covariance is not independence
Result: the joint distribution of $X$ and $Y$
– Has Gaussian marginals
– Has $\mathrm{cov}(X, Y) = 0$
– Yet strong dependence of $X$ and $Y$
– Thus is not multivariate Gaussian
Shows: multivariate Gaussian means more than Gaussian marginals
HDLSS Asy's Geometrical Representation
Further consequences of the Geometric Representation:
1. DWD more stable than SVM (based on deeper limiting distributions; reflects the intuitive idea of feeling sampling variation, something like mean vs. median), Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version), Qiao et al. (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency: study properties of PCA in estimating eigen-directions & -values. [Assume data are mean centered.]
Spike Covariance Model, Paul (2007). For the eigenvalues:
  $\lambda_{1,d} = d^\alpha$,  $\lambda_{2,d} = \cdots = \lambda_{d,d} = 1$
Note the critical parameter $\alpha$.
1st eigenvector $u_1$: turns out the direction doesn't matter.
How good are the empirical versions $\hat\lambda_{1,d}$, $\hat u_1$ as estimates?
HDLSS Math Stat of PCA
Consistency (big enough spike): for $\alpha > 1$,
  $\mathrm{Angle}(\hat u_1, u_1) \to 0$
Strong Inconsistency (spike not big enough): for $\alpha < 1$,
  $\mathrm{Angle}(\hat u_1, u_1) \to 90^\circ$
Intuition: random noise $\sim d^{1/2}$ (recall: on the scale of the variance).
For $\alpha > 1$ the spike pops out of the pure noise sphere; for $\alpha < 1$ the spike is contained in the pure noise sphere.
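The dichotomy above is visible in a small simulation of the spike model: the empirical first eigenvector essentially finds $u_1$ when $\alpha > 1$ and essentially misses it when $\alpha < 1$. A rough sketch assuming NumPy; the function name, dimensions, and thresholds are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def pc1_angle_deg(alpha, d=2000, n=20):
    """Angle between empirical and true first eigenvector in the spike model
    lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)       # u1 = e1 carries the spike
    # leading right singular vector = first eigenvector of the sample covariance
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = abs(Vt[0, 0])                  # |<u1_hat, e1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

assert pc1_angle_deg(alpha=1.5) < 20     # consistent regime (alpha > 1)
assert pc1_angle_deg(alpha=0.5) > 60     # strongly inconsistent regime (alpha < 1)
```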
HDLSS Math Stat of PCA
Consistency of eigenvalues:
  $\hat\lambda_1 / \lambda_1 \xrightarrow{\;L\;} \chi^2_n / n$  as $d \to \infty$
• Eigenvalues are inconsistent
• But with a known distribution
• Consistent when $n \to \infty$ as well
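The $\chi^2_n / n$ limit can be checked by Monte Carlo: over repeated samples, the ratio $\hat\lambda_1 / \lambda_1$ should have mean $\approx 1$ and variance $\approx 2/n$. A sketch under the spike model above (assuming NumPy; uncentered sample covariance for simplicity, constants my own):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n, alpha, reps = 2000, 5, 1.5, 300
lam1 = d ** alpha
ratios = []
for _ in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)            # spike along e1
    # nonzero eigenvalues of the sample covariance via the dual (n x n) matrix
    ev = np.linalg.eigvalsh(X @ X.T / n)
    ratios.append(ev[-1] / lam1)
ratios = np.array(ratios)
# chi2_n / n has mean 1 and variance 2/n
assert abs(ratios.mean() - 1) < 0.15
assert abs(ratios.var() - 2 / n) < 0.15
```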
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consistency
John Kent example: $X \sim 0.5\, N(0, I_d) + 0.5\, N(0, 100\, I_d)$, indep.
Can only say $\|X_d\| = d^{1/2}\, O_p(1)$:
  $\|X_d\| \approx d^{1/2}$ w.p. 1/2,  $\|X_d\| \approx 10\, d^{1/2}$ w.p. 1/2
(not deterministic).
PCA conditions: same, since the noise is still $O_p(d^{1/2})$.
But for the Geo Rep'n need some mixing condition.
Conclude: need some mixing condition.
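The "$O_p(1)$ but not deterministic" point is visible directly: each realized scaled norm sits near 1 or near 10, depending on which mixture component was drawn. A small sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 100_000
norms = []
for _ in range(10):
    s = rng.choice([1.0, 10.0])          # which mixture component
    x = s * rng.standard_normal(d)
    norms.append(np.linalg.norm(x) / np.sqrt(d))
# each scaled norm is near 1 or near 10: bounded, but with a random limit
assert all(min(abs(r - 1), abs(r - 10)) < 0.1 for r in norms)
```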
Mixing Conditions
Idea from probability theory: recall the standard asymptotic results, as $n \to \infty$:
• Law of Large Numbers ("weak" = in probability, "strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!), e.g. independent and identically distributed.
Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.
Mixing Conditions
• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005 update of 1986 version)
• Better: newer references
Mixing Conditions
Mixing condition used here: $\rho$-mixing.
For random variables $X_1, X_2, \dots$, define
  $\rho(k) = \sup_j \sup \{ |\mathrm{corr}(U, V)| \}$
where the supremum is over $U \in L^2(\sigma(X_1, \dots, X_j))$ and $V \in L^2(\sigma(X_{j+k}, X_{j+k+1}, \dots))$, for the sigma-fields generated by the entries up to index $j$ and from index $j + k$ on. Note the gap of lag $k$.
Assume $\rho(k) \to 0$ as $k \to \infty$.
Idea: uncorrelated at far lags.
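As a concrete instance of the "uncorrelated at far lags" idea, a Gaussian AR(1) sequence is a classic $\rho$-mixing example; for jointly Gaussian pairs the maximal correlation reduces to the ordinary correlation, which decays like $\varphi^k$. A sketch assuming NumPy (the AR(1) choice is my illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
# AR(1): x_t = phi * x_{t-1} + eps_t, a classic rho-mixing sequence
phi, n = 0.8, 500_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]
# lag-k correlation decays geometrically: uncorrelated at far lags
for k in (1, 5, 20):
    corr = np.corrcoef(x[:-k], x[k:])[0, 1]
    assert abs(corr - phi ** k) < 0.02
```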
HDLSS Math Stat of PCA
Conditions for Geo Rep'n, Hall, Marron and Neeman (2005): assume the entries of the data vectors $X = (X_1, X_2, \dots, X_d)$ are $\rho$-mixing.
Drawback: a strong assumption. (In JRSS-B, since Biometrika refused!)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n: a series of technical improvements
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012) (fully covariance based, no mixing)
Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.
HDLSS Math Stat of PCA
Condition from Jung & Marron (2009):
  $X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$  (note: not necessarily Gaussian)
Define the standardized version
  $Z_d = \Lambda_d^{-1/2} U_d^t X_d$
Assume $\exists$ a permutation of the entries of $Z_d$ so that they are $\rho$-mixing.
HDLSS Math Stat of PCA
Careful look at PCA consistency in the $\alpha > 1$ spike case (reality check suggested by a reviewer):
The result is independent of sample size, so it is true even for $n = 1$ (!?!)
Reviewer's conclusion: absurd, this shows the assumption is too strong for practice.
HDLSS Math Stat of PCA
HDLSS PCA often finds signal, not pure noise.
Recall the RNAseq data from 8/23/12: d ~ 1700, n = 180. [PCA scatterplot figure]
Manually brushed clusters show clear alternate splicing, not noise.
HDLSS Math Stat of PCA
Recall the theoretical separation:
• Strong inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
Mathematically driven conclusion: real data signals are this strong!
HDLSS Math Stat of PCA
An interesting objection: should not study angles in PCA.
Recall, for consistency ($\alpha > 1$): $\mathrm{Angle}(\hat u_1, u_1) \to 0$
For strong inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat u_1, u_1) \to 90^\circ$
HDLSS Math Stat of PCA
An interesting objection: should not study angles in PCA, because the PC scores (i.e. projections) are not consistent.
For the scores $\hat s_{ij} = P_{\hat v_j} x_i$ (what we study in PCA scatterplots) and $s_{ij} = P_{v_j} x_i$, can show
  $\hat s_{ij} / s_{ij} \to R_j \ne 1$  (random)
Thanks to Dan Shen.
HDLSS Math Stat of PCA
PC scores (i.e. projections) are not consistent, so how can PCA find useful signals in data?
Key is "proportional errors": $\hat s_{ij} / s_{ij} \to R_j$, with the same realization of $R_j$ for all $i = 1, \dots, n$.
The axes have inconsistent scales, but the relationships are still useful.
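A sanity check of the "same realization across $i$" claim: in a simulation of the spike model the score ratios $\hat s_i / s_i$ are nearly identical across observations. (Here, in the strongly consistent regime $\alpha > 1$, the common value is close to 1; the slides' point is that even when the common limit $R_j$ is random and $\ne 1$, it is shared across $i$, so scatterplot relationships survive.) An illustrative sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(8)
d, n, alpha = 4000, 10, 1.5
lam1 = d ** alpha
u1 = np.zeros(d)
u1[0] = 1.0                              # true first eigenvector
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)                 # spike along u1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])       # fix the sign ambiguity
s_true = X @ u1                          # true PC1 scores
s_hat = X @ u1_hat                       # empirical PC1 scores
r = s_hat / s_true
# the ratios share (essentially) one value across observations i
assert np.std(r) / abs(np.mean(r)) < 0.05
```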
HDLSS Deep Open Problem
In PCA consistency:
• Strong inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
What happens at the boundary ($\alpha = 1$)?
Result: $\exists$ interesting limit distributions, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall the flexibility from the kernel embedding idea. [figures]
HDLSS Asymptotics & Kernel Methods
Interesting question: behavior in very high dimension?
Answer, El Karoui (2010):
• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers
HDLSS Asymptotics & Kernel Methods
Implications for DWD: recall its main advantage is for high d.
So it is not clear that embedding helps; thus it is not yet implemented in DWD.
HDLSS Additional Results
Batch adjustment (Xuxin Liu). Recall the intuition from above:
• key is the sizes of the biological subtypes
• a differing ratio trips up the mean
• but DWD is more robust
Mathematics behind this?
HDLSS Discrimination Simulations
Can we say more about "all methods come together in very high dimensions"?
Mathematical statistical question: the mathematics behind this (use the Geometric Representation).
HDLSS Asy's Geometrical Representation
Explanation of the observed (simulation) behavior: "everything similar for very high d"
• The 2 populations are 2 simplices (i.e. regular n-hedrons)
• All points are the same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
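The simplex picture above can be checked by direct simulation: in very high dimension, scaled within-class and between-class distances are nearly constant across all pairs, so every point is (nearly) equidistant from the other class. A sketch assuming NumPy (the dimension, class sizes, and mean shift are my own choices):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 50_000, 5
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d)) + 0.01   # small per-coordinate mean shift
within = [np.linalg.norm(A[i] - A[j]) / np.sqrt(d)
          for i in range(n) for j in range(i + 1, n)]
between = [np.linalg.norm(A[i] - B[j]) / np.sqrt(d)
           for i in range(n) for j in range(n)]
# near-constant distances: each class forms (approximately) a regular simplex,
# and every point is about the same distance from the other class
assert max(within) - min(within) < 0.05
assert max(between) - min(between) < 0.05
```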
HDLSS Asy's Geometrical Representation
Straightforward generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• mild eigenvalue condition on the theoretical covariance (Ahn, Marron, Muller & Chi 2007)
All based on simple "Laws of Large Numbers".
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
HDLSS Asymptotics & Kernel Methods
Recall Flexibility From Kernel Embedding Idea
Interesting Question: Behavior in Very High Dimension?
Answer: El Karoui (2010):
• In Random Matrix Limit
• Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD:
Recall Main Advantage is for High d,
So not Clear Embedding Helps,
Thus not yet Implemented in DWD
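El Karoui's point can be illustrated with a small simulation (a sketch under illustrative settings, not the actual random-matrix derivation; dimensions, seed, and bandwidth below are arbitrary choices): for standard normal data with bandwidth σ² on the order of d, the double-centered Gaussian RBF kernel matrix is nearly a linear function of the centered Gram (inner-product) matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2000, 60
X = rng.standard_normal((n, d))

# Linear kernel (Gram matrix) and Gaussian RBF kernel with bandwidth sigma^2 = d
G = X @ X.T
sq = np.sum(X**2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2 * G       # pairwise squared distances
K = np.exp(-D2 / (2.0 * d))                  # RBF kernel matrix

# Double-center both matrices (removes row/column norm effects)
H = np.eye(n) - np.ones((n, n)) / n
Kc, Gc = H @ K @ H, H @ G @ H

# Off-diagonal entries of the two centered matrices are nearly perfectly correlated
iu = np.triu_indices(n, k=1)
r = np.corrcoef(Kc[iu], Gc[iu])[0, 1]
print(r)
```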
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asy's: Geometrical Represen'tion
Explanation of Observed (Simulation) Behavior:
"everything similar for very high d"
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• So "sensible methods are all nearly the same"
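A quick numerical check of the simplex picture (illustrative d, n, and seed): pure-noise HDLSS vectors all have length ≈ √d and pairwise distances ≈ √(2d), so a sample looks like a regular simplex.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 20000, 10
X = rng.standard_normal((n, d))

# All vector lengths concentrate at sqrt(d)
norms = np.linalg.norm(X, axis=1) / np.sqrt(d)

# All pairwise distances concentrate at sqrt(2d): a regular n-simplex
dists = [np.linalg.norm(X[i] - X[j]) / np.sqrt(2 * d)
         for i in range(n) for j in range(i + 1, n)]
print(norms.min(), norms.max(), min(dists), max(dists))
```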
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large, in sense:
For λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_d, assume  Σ_j λ_j² / (Σ_j λ_j)² = o(1)
(min possible value is 1/d)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background: In classical multivariate analysis, the statistic
ε = (Σ_j λ_j)² / (d · Σ_j λ_j²)
Is called the "epsilon statistic",
And is used to test "sphericity" of dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic satisfies:  1/d ≤ ε ≤ 1
• For spherical Normal:  ε = 1
• Single extreme eigenvalue gives:  ε ≈ 1/d
• So assumption (ε ≫ 1/d) is very mild
• Much weaker than mixing conditions
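These properties are easy to verify numerically; the sketch below computes ε for a spherical spectrum and for a single-extreme-eigenvalue spectrum (d = 1000 is an arbitrary choice).

```python
import numpy as np

def epsilon_stat(lam):
    """Classical sphericity 'epsilon statistic' of a set of eigenvalues."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return lam.sum() ** 2 / (d * (lam ** 2).sum())

d = 1000
eps_sphere = epsilon_stat(np.ones(d))              # spherical: epsilon = 1
eps_spike = epsilon_stat([d] + [1.0] * (d - 1))    # one huge eigenvalue: ~ 1/d
print(eps_sphere, eps_spike)
```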
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large
Then, for i ≠ j:  Xᵢᵗ Xⱼ = oₚ(d)
Not so strong as before:  Z₁ᵗ Z₂ = d^{1/2} · Oₚ(1)
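A numerical illustration of the two scalings (dimensions and seed arbitrary): for independent standard normal vectors, Z₁ᵗZ₂ / d → 0, while Z₁ᵗZ₂ / d^{1/2} remains Oₚ(1).

```python
import numpy as np

rng = np.random.default_rng(3)
over_d, over_sqrtd = [], []
for d in (1000, 10000, 100000):
    z1, z2 = rng.standard_normal(d), rng.standard_normal(d)
    ip = z1 @ z2
    over_d.append(abs(ip) / d)                # -> 0 as d grows
    over_sqrtd.append(abs(ip) / np.sqrt(d))   # stays O_p(1): ~ |N(0,1)|
print(over_d, over_sqrtd)
```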
2nd Paper on HDLSS Asymptotics
Can we improve on  Xᵢᵗ Xⱼ = oₚ(d)?
John Kent example: Normal scale mixture
X ~ 0.5 · N_d(0, I_d) + 0.5 · N_d(0, 100 I_d)
Won't get:  Xᵢᵗ Xⱼ = C · d^{1/2} · Oₚ(1)
3rd Paper on HDLSS Asymptotics
Yata & Aoshima (2012): Get Geometrical Representation using:
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal Scale Mixture:
X ~ 0.5 · N_d(0, I_d) + 0.5 · N_d(0, 100 I_d)
• Data Vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 ⟹ Independence?
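Kent's mixture is easy to simulate; this sketch (sd 1 or sd 10, i.e. variances 1 and 100 as in the slides; sample size and seed arbitrary) checks that two entries of a data vector are uncorrelated, while their squares are clearly correlated, since they share the random scale.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50000
# Each data vector: all entries share one random scale (sd 1 or sd 10, w.p. 1/2 each)
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = scale[:, None] * rng.standard_normal((n, 2))    # look at 2 entries per vector

corr12 = np.corrcoef(X[:, 0], X[:, 1])[0, 1]          # ~ 0: entries uncorrelated
dep = np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1]   # > 0: squares share the scale
print(corr12, dep)
```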
0 Covariance is not independence
Simple Example:
• Random Variables X and Y
• Make both Gaussian:  X, Y ~ N(0, 1)
(Note: Not Using Multivariate Gaussian)
• With strong dependence
• Yet 0 covariance
Given c > 0, define:
Y = X,  when |X| ≤ c
Y = −X,  when |X| > c
Choose c to make cov(X, Y) = 0:
• Distribution is degenerate
• Supported on diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0
Result: Joint distribution of X and Y:
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian
Shows: Multivariate Gaussian means more than Gaussian Marginals
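The construction above can be checked numerically; this sketch solves for c by bisection, using E[X²; |X| ≤ c] = 1/2, equivalently (2Φ(c) − 1) − 2cφ(c) = 1/2 for the standard normal, then verifies near-zero covariance alongside total dependence (|Y| = |X| exactly).

```python
import numpy as np
from math import erf, exp, pi, sqrt

def g(c):
    """(2*Phi(c) - 1) - 2*c*phi(c) - 1/2: root gives cov(X, Y) = 0."""
    Phi = 0.5 * (1.0 + erf(c / sqrt(2.0)))
    phi = exp(-c * c / 2.0) / sqrt(2.0 * pi)
    return (2.0 * Phi - 1.0) - 2.0 * c * phi - 0.5

lo, hi = 0.0, 5.0            # g(0) < 0 < g(5), so a root lies in between
for _ in range(60):          # plain bisection
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
c = 0.5 * (lo + hi)

rng = np.random.default_rng(5)
X = rng.standard_normal(400000)
Y = np.where(np.abs(X) <= c, X, -X)   # flip sign outside [-c, c]; Y still N(0,1) by symmetry
cov_xy = float(np.mean(X * Y))
print(c, cov_xy)                      # cov(X, Y) ~ 0, yet |Y| = |X|: total dependence
```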
HDLSS Asy's: Geometrical Represen'tion
Further Consequences of Geometric Represen'tion:
1. DWD more stable than SVM (based on deeper limiting distributions)
(reflects intuitive idea of feeling sampling variation)
(something like mean vs. median)
Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample size (motivates weighted version), Qiao et al. (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(Study Properties of PCA, in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
Spike Covariance Model, Paul (2007), For Eigenvalues:
λ₁ = d^α,  λ₂ = ⋯ = λ_d = 1
Note Critical Parameter: α
1st Eigenvector: u₁ (Turns out: Direction Doesn't Matter)
How Good are Empirical Versions, λ̂₁ and û₁, as Estimates?
HDLSS Math Stat of PCA
Consistency (big enough spike): For α > 1,  Angle(û₁, u₁) → 0
Strong Inconsistency (spike not big enough): For α < 1,  Angle(û₁, u₁) → 90°
Intuition: Random Noise ~ d^{1/2}
For α > 1 (Recall: d^α on Scale of Variance):
Spike Pops Out of Pure Noise Sphere
For α < 1:
Spike Contained in Pure Noise Sphere
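The dichotomy shows up clearly in simulation; this sketch (illustrative d, n, and seed) computes the angle between û₁ and u₁ under the spike model for α above and below 1.

```python
import numpy as np

rng = np.random.default_rng(6)

def pc1_angle(d, alpha, n=10):
    """Angle between empirical and true PC1 under the spike model lam1 = d^alpha."""
    lam1 = float(d) ** alpha
    u1 = np.zeros(d); u1[0] = 1.0
    X = np.sqrt(lam1) * np.outer(rng.standard_normal(n), u1) \
        + rng.standard_normal((n, d))
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    c = abs(Vt[0] @ u1)
    return np.degrees(np.arccos(np.clip(c, 0.0, 1.0)))

a_big = pc1_angle(d=2000, alpha=1.5)    # alpha > 1: angle near 0 (consistency)
a_small = pc1_angle(d=2000, alpha=0.5)  # alpha < 1: angle near 90 (strong inconsistency)
print(a_big, a_small)
```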
HDLSS Math Stat of PCA
Consistency of eigenvalues?
For α > 1 (n fixed, d → ∞):  λ̂₁ / λ₁  →_L  χ²_n / n
Eigenvalues Inconsistent,
But Known Distribution
Consistent when n → ∞ as Well
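A sketch of the limiting-distribution claim (parameters illustrative; assumes the α > 1 regime and uses the uncentered sample covariance, consistent with the mean-centered model): the ratios λ̂₁/λ₁ should have mean ≈ 1 and variance ≈ 2/n, matching χ²_n / n.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n, alpha = 20000, 5, 1.3
lam1 = float(d) ** alpha
reps = 100

ratios = []
for _ in range(reps):
    z = rng.standard_normal(n)
    X = np.sqrt(lam1) * np.outer(z, np.r_[1.0, np.zeros(d - 1)]) \
        + rng.standard_normal((n, d))
    # Top eigenvalue of (1/n) X X^t via singular values (no centering)
    lam1_hat = np.linalg.svd(X, compute_uv=False)[0] ** 2 / n
    ratios.append(lam1_hat / lam1)

ratios = np.array(ratios)
print(ratios.mean(), ratios.var())   # ~ mean 1 and variance 2/n, as for chi^2_n / n
```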
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consist:
John Kent example:  X ~ 0.5 · N_d(0, I_d) + 0.5 · N_d(0, 100 I_d)
Can only say:  ‖X‖ = d^{1/2} · Oₚ(1),
with ‖X‖ / d^{1/2} → 1 w.p. 1/2, → 10 w.p. 1/2, not deterministic
PCA Conditions Same, since Noise Still Oₚ(d^{1/2})
But for Geo Rep'n, need some Mixing Cond.
Conclude: Need some Mixing Condition
Mixing Conditions
Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
Central Limit Theorem
Both have Technical Assumptions (Usually Ignored!),
E.g. Independent and Ident. Dist'd
Mixing Conditions: Explore Weaker Assumptions, to Still Get:
Law of Large Numbers
Central Limit Theorem
• A Whole Area in Probability Theory, with a Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better Newer References?
Mixing Condition Used Here: Rho-Mixing
For Random Variables X₁, X₂, …, Define:
ρ(k) = sup { |corr(f, g)| : f ∈ L²(σ(X₁, …, X_i)),  g ∈ L²(σ(X_{i+k}, X_{i+k+1}, …)),  i ≥ 1 }
Where σ(⋯) are the Sigma-Fields Generated by the indicated variables
(Note: Gap of Lag k)
Assume:  ρ(k) → 0, as k → ∞
Idea: Uncorrelated at Far Lags
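A standard concrete example of this idea (not from the slides): an AR(1) process is ρ-mixing, and its lag-k correlations decay geometrically, so it is nearly uncorrelated at far lags.

```python
import numpy as np

rng = np.random.default_rng(8)
phi, n = 0.7, 200000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):                 # AR(1): x_t = phi * x_{t-1} + eps_t
    x[t] = phi * x[t - 1] + eps[t]

lags = [1, 5, 10, 20]
corrs = [abs(np.corrcoef(x[:-k], x[k:])[0, 1]) for k in lags]
print(corrs)                          # decays roughly like phi^k -> 0
```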
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors X = (X₁, X₂, …, X_d)ᵗ are ρ-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused!)
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ
(Note: Not Gaussian)
Define Standardized Version:  Z_d = Λ_d^{−1/2} U_dᵗ X_d
Assume ∃ a permutation of the entries, so that Z_d is ρ-mixing
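The standardization can be illustrated directly (the covariance below is a hypothetical example, with Gaussian data used only for convenience): Z_d = Λ_d^{−1/2} U_dᵗ X_d has uncorrelated, unit-variance entries, and it is on (some permutation of) these entries that the ρ-mixing assumption is imposed.

```python
import numpy as np

rng = np.random.default_rng(9)
d, n = 5, 100000

# Build a covariance Sigma = U Lam U^t with a spike (illustrative eigenvalues)
A = rng.standard_normal((d, d))
U, _ = np.linalg.qr(A)
lam = np.array([25.0, 4.0, 1.0, 1.0, 1.0])
Sigma = U @ np.diag(lam) @ U.T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Z = X @ U @ np.diag(lam ** -0.5)      # row-wise Z = Lam^{-1/2} U^t X

C = np.cov(Z, rowvar=False)
print(np.round(C, 2))                 # ~ identity: standardized, uncorrelated entries
```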
HDLSS Math Stat of PCA
Careful look at: PCA Consistency, α > 1 spike
(Reality Check, Suggested by Reviewer)
Independent of Sample Size,
So true for n = 1 (?!)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice
HDLSS PCA Often Finds Signal, Not Pure Noise
HDLSS Math Stat of PCA
Recall RNAseq Data From 8/23/12 (d ~ 1700, n = 180):
Manually Brushed Clusters show Clear Alternate Splicing, Not Noise
(Functional Data Analysis)
HDLSS Math Stat of PCA
Recall Theoretical Separation:
Strong Inconsistency: α < 1 spike
Consistency: α > 1 spike
Mathematically Driven Conclusion: Real Data Signals Are This Strong!
HDLSS Math Stat of PCA
An Interesting Objection:
Should not Study Angles in PCA
Recall, for Consistency (α > 1):  Angle(û₁, u₁) → 0
For Strong Inconsistency (α < 1):  Angle(û₁, u₁) → 90°
Because PC Scores (i.e. projections) Not Consistent:
For Scores  ŝ_{j,i} = P_{û_j} x_i  and  s_{j,i} = P_{u_j} x_i
(What we study in PCA scatterplots)
Can Show:  ŝ_{j,i} ≈ R_j · s_{j,i},  R_j Random
(Thanks to Dan Shen)
PC Scores (i.e. projections) Not Consistent,
So how can PCA find Useful Signals in Data?
HDLSS Math Stat of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other class
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
HDLSS Math Stat of PCA

Conditions for Geo Rep'n

Tricky Point: Classical mixing conditions
require a notion of time ordering,
which is not always clear, e.g. for microarrays

Condition from Jung & Marron (2009):
    X_d ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_d^t
(note: not necessarily Gaussian)

Define the standardized version
    Z_d = Λ_d^{-1/2} U_d^t X_d

Assume ∃ a permutation π_d so that the permuted entries of Z_d are ρ-mixing
HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike)
(Reality check suggested by a reviewer)

The condition α > 1 is independent of sample size,
so consistency holds even for n = 1 (!?)

Reviewer's conclusion: Absurd; shows the
assumption is too strong for practice
HDLSS Math Stat of PCA

But: HDLSS PCA often finds signal, not pure noise

Recall the RNAseq data (from 8/23/12, Functional Data Analysis part):
d ~ 1700, n = 180
Manually brushed clusters show clear alternate splicing, not noise
HDLSS Math Stat of PCA

Recall Theoretical Separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically driven conclusion:
Real data signals are this strong (α > 1)
HDLSS Math Stat of PCA

An Interesting Objection: Should not study angles in PCA

Recall, for Consistency (α > 1):
    Angle(û₁, u₁) → 0
For Strong Inconsistency (α < 1):
    Angle(û₁, u₁) → 90°

The objection: because PC scores (i.e. projections) are not consistent.

For the scores
    ŝ_ij = P_{v̂_j} x_i   and   s_ij = P_{v_j} x_i
(what we study in PCA scatterplots), can show
    ŝ_ij / s_ij → R_j ≠ 1   (random)
Thanks to Dan Shen.

So how can PCA find useful signals in data?

Key is "proportional errors":
• R_j is the same realization for all i
• So axes have inconsistent scales,
• But relationships are still useful
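The "inconsistent scales, useful relationships" point can be seen in a small simulation. A sketch, with arbitrary choices d = 5000, n = 20, λ₁ = 2000, and u₁ taken as the first coordinate direction: the empirical PC1 scores need not match the true scores in scale, yet the two sets of scores are strongly linearly related across observations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Spike model sketch (arbitrary sizes): one large eigenvalue lam1 along u1 = e_1.
d, n, lam1 = 5000, 20, 2000.0
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)          # inflate variance along the true direction

s_true = X[:, 0]                  # true PC1 scores: projections onto u1 = e_1

# Empirical PC1 scores: projections onto the estimated first eigenvector.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
s_hat = X @ Vt[0]

# The scale need not be right, but the scores are strongly linearly related,
# so scatterplot "relationships" survive (the sign of PC1 is arbitrary).
r = float(np.corrcoef(s_true, s_hat)[0, 1])
print(abs(r))
```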
HDLSS Deep Open Problem

In PCA Consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)?

Result: ∃ interesting limit dist'ns
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall the flexibility from the kernel embedding idea

Interesting Question: Behavior in very high dimension?

Answer: El Karoui (2010)
• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers

Implications for DWD:
• Recall its main advantage is for high d
• So it is not clear embedding helps
• Thus not yet implemented in DWD
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall intuition from above:
• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD is more robust
Mathematics behind this: HDLSS asymptotics

HDLSS Asy's, Geometrical Represen'tion

Explanation of observed (simulation) behavior:
"everything similar for very high d"
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• So "sensible methods are all nearly the same"
HDLSS Asy's, Geometrical Represen'tion

Straightforward Generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild eigenvalue condition on theoretical covariance
  (Ahn, Marron, Muller & Chi 2007)

All based on simple "Laws of Large Numbers"
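The law-of-large-numbers flavor of the geometric representation can be seen directly: for independent standard Gaussian vectors, all pairwise distances concentrate around √(2d), so the sample looks like a regular simplex. A minimal sketch (n and d are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# n independent standard Gaussian vectors in dimension d.
n, d = 10, 20000
Z = rng.standard_normal((n, d))

# All pairwise distances, scaled by sqrt(2d): each should be close to 1,
# so the n points sit near the vertices of a regular simplex.
i, j = np.triu_indices(n, k=1)
scaled = np.linalg.norm(Z[i] - Z[j], axis=1) / np.sqrt(2 * d)
print(scaled.min(), scaled.max())
```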
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007):
• Assume 2nd moments
• Assume no eigenvalues too large, in the sense:
  For λ₁ ≥ ⋯ ≥ λ_d, assume
      (Σ_{j=1}^d λ_j²) / (Σ_{j=1}^d λ_j)² = o(1)
  (1/d is the minimum possible value of this ratio)
  (much weaker than the previous mixing conditions…)
2nd Paper on HDLSS Asymptotics

Background: In classical multivariate analysis, the statistic
    ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)
is called the "epsilon statistic",
and is used to test "sphericity" of the dist'n,
i.e. "are all cov'nce eigenvalues the same?"

Can show the epsilon statistic satisfies
    1/d ≤ ε ≤ 1
• For the spherical Normal, ε = 1
• A single extreme eigenvalue gives ε = 1/d
• So the assumption above (the ratio = 1/(dε) is o(1)) is very mild
• Much weaker than mixing conditions
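The stated bounds and the two extreme cases can be checked directly. A sketch; `epsilon` implements the formula ε = (Σλ)² / (d·Σλ²) given above:

```python
import numpy as np

def epsilon(lam):
    """Epsilon statistic (sum lam)^2 / (d * sum lam^2) of an eigenvalue vector."""
    lam = np.asarray(lam, dtype=float)
    return float(lam.sum() ** 2 / (lam.size * (lam ** 2).sum()))

d = 100
sphere = np.ones(d)                    # all eigenvalues equal (spherical)
spike = np.zeros(d); spike[0] = 1.0    # one extreme eigenvalue

print(epsilon(sphere))  # 1.0
print(epsilon(spike))   # 1/d = 0.01
```

Any other nonnegative spectrum gives a value strictly between these extremes, by the Cauchy–Schwarz inequality.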
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007):
Assume 2nd moments, and no eigenvalues too large
(λ₁ ≥ ⋯ ≥ λ_d, as above)

Then:
    X_iᵗ X_j = O_p(d^{1/2})

Not so strong as before, where ‖Z₁ − Z₂‖² = 2d (1 + o_p(1))
2nd Paper on HDLSS Asymptotics

Can we improve on X_iᵗ X_j = O_p(d^{1/2})?

John Kent example: Normal scale mixture
    X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)   (indep.)
Won't get X_iᵗ X_j = C d^{1/2} (1 + o_p(1)) for any single constant C
3rd Paper on HDLSS Asymptotics

Get geometrical representation using:
• 4th moment assumption
• Stronger covariance matrix (only) assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture
    X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
• Data vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore:
  Covariance = 0 does not imply Independence
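The "cov = 0 but strong dependence" claim for the mixture's entries can be checked by Monte Carlo. A sketch, using a milder mixture of scales 1 and 3 (an arbitrary substitution, just to keep the simulation stable) and only two coordinates per vector: the entries are uncorrelated, but their squares are clearly correlated, through the shared random scale:

```python
import numpy as np

rng = np.random.default_rng(3)

# Each 2-d vector is s * Z with Z ~ N(0, I_2) and a random scale s shared
# by both entries (a normal scale mixture, as in Kent's example).
n = 200_000
s = np.where(rng.random(n) < 0.5, 1.0, 3.0)
X = s[:, None] * rng.standard_normal((n, 2))

corr_raw = float(np.corrcoef(X[:, 0], X[:, 1])[0, 1])  # near 0
corr_sq = float(np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1])  # clearly positive
print(round(corr_raw, 3), round(corr_sq, 3))
```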
0 Covariance is not independence

Simple Example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance

Given c > 0, define
    Y = X     if |X| ≤ c
    Y = −X    if |X| > c

Choose c to make cov(X, Y) = 0:
• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0

Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more
than Gaussian marginals
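The threshold c can be found explicitly: cov(X, Y) = E[X²; |X| ≤ c] − E[X²; |X| > c] = 2A(c) − 1, where A(c) = (2Φ(c) − 1) − 2cφ(c) for X ~ N(0, 1). A sketch that solves A(c) = 1/2 by bisection and then confirms the construction by simulation (the root value is computed, not assumed):

```python
import math
import numpy as np

def A(c):
    """E[X^2 ; |X| <= c] for X ~ N(0,1): (2*Phi(c) - 1) - 2*c*phi(c)."""
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))
    return (2 * Phi - 1) - 2 * c * phi

# cov(X, Y) = 2*A(c) - 1: negative for small c, positive for large c,
# and A is increasing, so bisection finds the root of A(c) = 1/2.
lo, hi = 0.5, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if A(mid) < 0.5 else (lo, mid)
c_star = (lo + hi) / 2
print(round(c_star, 3))  # roughly 1.54

# Monte Carlo check: Y = X inside [-c, c], Y = -X outside.
rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)
y = np.where(np.abs(x) <= c_star, x, -x)
print(round(float(np.mean(x * y)), 4))  # near 0; Y is still exactly N(0,1)
```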
HDLSS Asy's, Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM
   (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified
   Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes
   (motivates weighted version)
   Qiao et al (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study properties of PCA
in estimating eigen-directions & -values)
[Assume data are mean centered]

Spike Covariance Model: Paul (2007)
For eigenvalues, take
    λ₁ = d^α,   λ₂ = ⋯ = λ_d = 1
Note the critical parameter: α

1st eigenvector: u₁
(turns out the direction doesn't matter)

How good are the empirical versions
    λ̂₁, û₁
as estimates of λ₁, u₁?
HDLSS Math Stat of PCA

Consistency (big enough spike):
For α > 1,   Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough):
For α < 1,   Angle(û₁, u₁) → 90°
HDLSS Math Stat of PCA

Intuition: random noise ~ d^{1/2}
(recall α is on the scale of variance)
For α > 1: spike pops out of the pure noise sphere
For α < 1: spike is contained in the pure noise sphere
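The α > 1 vs. α < 1 dichotomy shows up clearly in simulation. A sketch with arbitrary d = 2000, n = 50, and u₁ taken as the first coordinate direction (since, as noted, the direction doesn't matter):

```python
import numpy as np

rng = np.random.default_rng(5)

def pc1_angle_deg(alpha, d=2000, n=50):
    """Angle between the sample PC1 direction and u1 = e_1,
    under the spike model lam1 = d**alpha, other eigenvalues 1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(float(d) ** alpha)      # spike along e_1
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = abs(Vt[0, 0])                        # |<u1_hat, e_1>|
    return float(np.degrees(np.arccos(min(cos, 1.0))))

print(pc1_angle_deg(1.5))  # alpha > 1: small angle (consistency)
print(pc1_angle_deg(0.5))  # alpha < 1: angle near 90 (strong inconsistency)
```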
HDLSS Math Stat of PCA

Consistency of eigenvalues?

    λ̂₁ / λ₁ →_L χ²_n / n    as d → ∞

• Eigenvalues are inconsistent (for fixed n)
• But have a known limiting distribution
• Consistent when n → ∞ as well
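The λ̂₁/λ₁ → χ²ₙ/n limit can be checked by Monte Carlo. A sketch (fixed n = 5, large d, α = 1.5 so the spike dominates the noise; the sample covariance uses the known zero mean): the ratio should have mean ≈ 1 and variance ≈ 2/n = 0.4:

```python
import numpy as np

rng = np.random.default_rng(6)

d, n, alpha = 2000, 5, 1.5
lam1 = float(d) ** alpha
reps = 400
ratios = np.empty(reps)
for i in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                 # spike along e_1
    # Top eigenvalue of (1/n) X^t X via the singular values of X.
    s = np.linalg.svd(X, compute_uv=False)
    ratios[i] = (s[0] ** 2 / n) / lam1

# chi^2_n / n has mean 1 and variance 2/n.
print(ratios.mean(), ratios.var())
```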
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consistency

John Kent example:
    X_d ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)

Can only say ‖X_d‖ = O_p(d^{1/2}), with
    d^{-1/2} ‖X_d‖ → 1 w.p. 1/2,   → 10 w.p. 1/2
i.e. the limit is not deterministic

• PCA conditions are the same, since the noise is still O_p(d^{1/2})
• But for Geo Rep'n, need some mixing condition

Conclude: Need some mixing condition
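Kent's example can be simulated directly: the scaled norm d^{-1/2}‖X_d‖ settles near 1 for about half the draws and near 10 for the other half, with nothing in between, so there is no deterministic limit. A sketch (d and the number of draws are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

# Kent's scale mixture: each vector is N_d(0, I_d) or N_d(0, 100 I_d), w.p. 1/2 each.
d, n = 2000, 1000
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = scale[:, None] * rng.standard_normal((n, d))

# Scaled norms concentrate near 1 or near 10, with nothing in between.
r = np.linalg.norm(X, axis=1) / np.sqrt(d)
frac_small = float(np.mean(r < 5))
print(frac_small)  # close to 1/2
```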
Mixing Conditions

Idea From Probability Theory:

Recall standard asymptotic results, as n → ∞:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored!),
e.g. independent and identically dist'd
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Explanation of Observed (Simulation) Behavior
ldquoeverything similar for very high d rdquo
bull 2 popnrsquos are 2 simplices (ie regular n-hedrons)bull All are same distance from the other classbull ie everything is a support vectorbull ie all sensible directions show ldquodata pilingrdquobull so ldquosensible methods are all nearly the samerdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
HDLSS Math Stat of PCA
Consistency of eigenvalues?
Eigenvalues Inconsistent: for α > 1, λ̂_1 / λ_1 →_L χ²_n / n as d → ∞
But Known Distribution
Consistent when n → ∞ as Well
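The limiting χ²_n / n behavior above can be checked by simulation. A minimal sketch, assuming the spike model with α > 1; the particular d, n, and replication count are illustrative, and the sample mean and variance of λ̂_1 / λ_1 are compared with the χ²_n / n values 1 and 2/n.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha, reps = 1000, 5, 1.5, 200
lam1 = d ** alpha
ratios = np.empty(reps)
for r in range(reps):
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2.0)
    X = rng.standard_normal((n, d)) * sd
    G = X @ X.T / n                    # shares nonzero eigenvalues with X'X/n
    ratios[r] = np.linalg.eigvalsh(G)[-1] / lam1

# chi^2_n / n has mean 1 and variance 2/n = 0.4
m, v = ratios.mean(), ratios.var()
```

The ratio does not converge to 1 for fixed n (inconsistency), but its distribution is known.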
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consist.?
John Kent example: X_d ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)
Can only say ‖X_d‖ = O_p(d^{1/2}): ‖X_d‖ ≈ d^{1/2} w.p. 1/2, ≈ 10 d^{1/2} w.p. 1/2, not deterministic
PCA Conditions Same, since Noise is Still O_p(d^{1/2})
But for Geo Rep'n need some Mixing Cond'n
Conclude: Need some Mixing Condition
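The failure of a deterministic scaled length in the Kent example can be seen directly. An illustrative sketch, assuming the scale mixture reconstructed above, (1/2) N(0, I_d) + (1/2) N(0, 100 I_d); dimension and sample count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10000, 50
# Each vector is N(0, I_d) or N(0, 100 I_d) with probability 1/2
sigma = rng.choice([1.0, 10.0], size=n)
X = rng.standard_normal((n, d)) * sigma[:, None]
scaled_norms = np.linalg.norm(X, axis=1) / np.sqrt(d)
# Norms concentrate near 1 or near 10: O_p(d^{1/2}), but not deterministic
```

So the data do not lie near a single sphere, and the rigid geometric representation fails.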
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results, as n → ∞:
Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
Central Limit Theorem
Both have Technical Assumptions (Usually Ignored?), e.g. Independent and Ident. Dist'd
Mixing Conditions: Explore Weaker Assumptions that Still Give the Law of Large Numbers and Central Limit Theorem
Mixing Conditions
• A Whole Area in Probability Theory, with a Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better: Newer References?
Mixing Conditions
Mixing Condition Used Here: Rho-Mixing
For Random Variables X_1, X_2, ..., Define
ρ(k) = sup { |corr(Y, Z)| : Y ∈ L²(σ(X_1, ..., X_j)), Z ∈ L²(σ(X_{j+k}, X_{j+k+1}, ...)) }
Where σ(·) are the Sigma-Fields Generated by the indicated variables (Note: Gap of Lag k)
Assume: ρ(k) → 0 as k → ∞
Idea: Uncorrelated at Far Lags
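A standard example of the "uncorrelated at far lags" idea is an AR(1) sequence, a classical ρ-mixing process whose lag-k correlation φ^k decays to 0. This sketch is an illustration, not from the slides; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, T = 0.8, 200_000
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0] / np.sqrt(1 - phi ** 2)   # start near stationarity
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]      # AR(1): X_t = phi X_{t-1} + e_t

def lag_corr(k):
    # empirical corr(X_t, X_{t+k}); theory gives phi**k
    return np.corrcoef(x[:-k], x[k:])[0, 1]

c1, c20 = lag_corr(1), lag_corr(20)
```

The lag-1 correlation is near φ = 0.8, while the lag-20 correlation is near φ^20 ≈ 0.01.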
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Hall, Marron and Neeman (2005): Assume the Entries X_1, ..., X_d of the Data Vectors Are ρ-mixing
Drawback: Strong Assumption (In JRSS-B, since Biometrika Refused!)
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering, Not Always Clear, e.g. Microarrays
Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_d^t (Note: Not Gaussian)
Define the Standardized Version: Z_d = Λ_d^{-1/2} U_d^t X_d
Assume: ∃ a permutation of the d entries so that the permuted Z_d is ρ-mixing
HDLSS Math Stat of PCA
Careful look at PCA Consistency for the α > 1 spike
(Reality Check Suggested by Reviewer)
Result is Independent of Sample Size, So true even for n = 1 (?!?)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice?
HDLSS Math Stat of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise
Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180
Manually Brushed Clusters: Clear Alternate Splicing, Not Noise
Functional Data Analysis
HDLSS Math Stat of PCA
Recall Theoretical Separation:
Strong Inconsistency: α < 1 spike
Consistency: α > 1 spike
Mathematically Driven Conclusion: Real Data Signals Are This Strong
HDLSS Math Stat of PCA
An Interesting Objection: Should not Study Angles in PCA
Recall for Consistency: α > 1, Angle(û_1, u_1) → 0
For Strong Inconsistency: α < 1, Angle(û_1, u_1) → 90°
Because PC Scores (i.e. projections) are Not Consistent
For Scores ŝ_{i,j} = P_{û_j} x_i and s_{i,j} = P_{u_j} x_i
(What we study in PCA scatterplots)
Can Show: ŝ_{i,j} / s_{i,j} → R_j ≠ 1 (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC Scores (i.e. projections) Not Consistent
So how can PCA find Useful Signals in Data?
(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)
Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} → R_j, with the Same Realization of R_j for all i = 1, ..., n
Axes have Inconsistent Scales, But Relationships are Still Useful
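The "proportional errors" idea can be illustrated by simulation. A hedged sketch, not from the slides: parameters are arbitrary, and only weak, robust facts are checked, namely that empirical and true scores are nearly proportional (high correlation, ratios clustering near a common value) while the empirical scores are systematically inflated, since û_1 maximizes sample variance.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 50_000, 50, 1.0
sd = np.ones(d)
sd[0] = d ** (alpha / 2.0)
X = rng.standard_normal((n, d)) * sd         # spike model, u_1 = e_1
G = X @ X.T / n
w, V = np.linalg.eigh(G)
u_hat = X.T @ V[:, -1]
u_hat /= np.linalg.norm(u_hat)
if u_hat[0] < 0:
    u_hat = -u_hat                           # fix sign to match u_1

s_true = X[:, 0]                             # s_{i,1}: projection on u_1
s_hat = X @ u_hat                            # s-hat_{i,1}: projection on u_hat_1
corr = np.corrcoef(s_true, s_hat)[0, 1]
ratio = np.median(s_hat / s_true)            # common scale factor
```

Mean of s_hat² equals λ̂_1, which always dominates the mean of s_true² because û_1 is the variance maximizer; the scatterplot relationship is preserved even though the scale is off.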
HDLSS Deep Open Problem
In PCA Consistency:
Strong Inconsistency: α < 1 spike
Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: ∃ interesting Limit Dist'ns, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question: Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In the Random Matrix Limit
• Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD: Recall Main Advantage is for High d
So not Clear Embedding Helps, Thus not yet Implemented in DWD
HDLSS Additional Results
Batch Adjustment: Xuxin Liu
Recall Intuition from above:
Key is sizes of biological subtypes
Differing ratio trips up mean, But DWD more robust
Mathematics behind this?
HDLSS Asy's Geometrical Represen'tion
Explanation of Observed (Simulation) Behavior: "everything similar for very high d"
• 2 pop'ns are 2 simplices (i.e. regular n-hedrons)
• All are the same distance from the other class
• i.e. everything is a support vector
• i.e. all sensible directions show "data piling"
• so "sensible methods are all nearly the same"
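The simplex claim above reflects distance concentration: for standard Gaussian data, all pairwise distances are close to √(2d), so n points form an approximately regular simplex. A minimal illustrative sketch (dimension and sample size are arbitrary choices):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
d, n = 20_000, 10
X = rng.standard_normal((n, d))              # one population, unit covariance
dists = np.array([np.linalg.norm(X[i] - X[j])
                  for i, j in combinations(range(n), 2)])
rel = dists / np.sqrt(2 * d)                 # all pairwise distances ~ sqrt(2d)
```

All 45 relative distances are within a few percent of 1, i.e. the points sit at the vertices of a near-regular simplex.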
HDLSS Asy's Geometrical Represen'tion
Straightforward Generalizations:
non-Gaussian data: only need moments
non-independent: use "mixing conditions"
Mild Eigenvalue condition on Theoretical Cov. (Ahn, Marron, Muller & Chi 2007)
All based on simple "Laws of Large Numbers"
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large, in the sense:
For λ_1 ≥ ... ≥ λ_d ≥ 0, assume (Σ_{j=1}^d λ_j²) / (Σ_{j=1}^d λ_j)² = o(1)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background: In classical multivariate analysis, the statistic
ε = (Σ_{j=1}^d λ_j)² / (d Σ_{j=1}^d λ_j²)
is called the "epsilon statistic"
And is used to test "sphericity" of the dist'n, i.e. "are all cov'nce eigenvalues the same?"
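The epsilon statistic is easy to compute from an eigenvalue sequence. A small sketch of the formula above; the eigenvalue configurations below (all equal, one dominant) are illustrative assumptions chosen to hit the two extremes.

```python
import numpy as np

def epsilon_stat(lam):
    """Sphericity epsilon: (sum lam)^2 / (d * sum lam^2)."""
    lam = np.asarray(lam, dtype=float)
    return lam.sum() ** 2 / (lam.size * (lam ** 2).sum())

d = 100
eps_sphere = epsilon_stat(np.ones(d))               # all eigenvalues equal
eps_spike = epsilon_stat([1e6] + [1.0] * (d - 1))   # one dominant eigenvalue
```

For the spherical case ε = 1 exactly; a single extreme eigenvalue drives ε down to its minimum possible value 1/d.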
2nd Paper on HDLSS Asymptotics
Can show the epsilon statistic satisfies 1/d ≤ ε ≤ 1
• For spherical Normal, ε = 1
• A single extreme eigenvalue gives ε = 1/d (min possible)
• So the assumption (equivalent to d·ε → ∞) is very mild
• Much weaker than mixing conditions
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large (for λ_1 ≥ ... ≥ λ_d)
Then: X_i^t X_j = o_p(d) for i ≠ j
Not so strong as before: Z_1^t Z_2 = d^{1/2} O_p(1)
Can we improve on X_i^t X_j = o_p(d)?
John Kent example, Normal scale mixture: X_i ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d), indep.
Won't get X_i^t X_j = C d^{1/2} O_p(1)
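Why no single constant C works in the Kent example: the inner product of two vectors with scales σ_i, σ_j has spread σ_i σ_j d^{1/2}, so the d^{1/2}-normalized inner products have wildly different scales across mixture components. A hedged sketch, conditioning on the two component scales; sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
d, m = 5000, 300
B = rng.standard_normal((m, d))           # partner vectors, sigma = 1
A1 = rng.standard_normal((m, d))          # sigma = 1 mixture component
A2 = 10.0 * rng.standard_normal((m, d))   # sigma = 10 mixture component
ip1 = np.einsum('ij,ij->i', A1, B) / np.sqrt(d)
ip2 = np.einsum('ij,ij->i', A2, B) / np.sqrt(d)
s1, s2 = ip1.std(), ip2.std()             # spreads ~1 vs ~10: no common C
```

Depending on which components the pair comes from, the normalized inner product lives on scale 1, 10, or 100, so a single C d^{1/2} bound with a nondegenerate O_p(1) factor cannot hold.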
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using:
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal Scale Mixture X_i ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d):
• Data Vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply Independence
0 Covariance is not independence
Simple Example:
• Random Variables X and Y, each ~ N(0,1)
• Make both Gaussian (Note: Not Using Multivariate Gaussian)
• With strong dependence, Yet 0 covariance
Given c > 0, define Y = X when |X| ≤ c, and Y = −X when |X| > c
0 Covariance is not independence
Simple Example: choose c to make cov(X, Y) = 0
• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0
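The sign change driving the continuity argument above can be verified by Monte Carlo. An illustrative sketch; the cutoffs c = 0.5 and c = 2.0 are arbitrary choices on either side of the zero-covariance point.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal(400_000)

def cov_xy(c):
    # Y = X inside [-c, c] and Y = -X outside; marginal of Y stays N(0,1)
    Y = np.where(np.abs(X) <= c, X, -X)
    return np.mean(X * Y)                 # E[XY]; both means are 0

small_c, large_c = cov_xy(0.5), cov_xy(2.0)
```

The covariance is clearly negative for small c and positive for large c, so an intermediate c gives exactly 0, even though Y is a deterministic function of X.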
0 Covariance is not independence
Result: The joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian
Shows: Multivariate Gaussian means more than Gaussian Marginals
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (Note: NOT using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance
Given c > 0, define
  Y = X,  when |X| > c
  Y = -X, when |X| <= c

0 Covariance is not independence
Simple Example: choose c to make cov(X, Y) = 0
• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) > 0
• For large c, have cov(X, Y) < 0
• By continuity, ∃ c with cov(X, Y) = 0

0 Covariance is not independence
Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian
Shows: Multivariate Gaussian means more
than Gaussian marginals
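The example above is easy to verify numerically. A minimal sketch (the bisection search for c is my addition, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500_000)  # X ~ N(0,1)

def cov_xy(c):
    # Y = X when |X| > c, Y = -X when |X| <= c; E[X] = E[Y] = 0
    y = np.where(np.abs(x) > c, x, -x)
    return float(np.mean(x * y))

# cov(X,Y) decreases in c: positive for small c, negative for large c,
# so bisect to find the c giving covariance 0
lo, hi = 0.0, 3.0
for _ in range(50):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if cov_xy(mid) > 0 else (lo, mid)
c = 0.5 * (lo + hi)

y = np.where(np.abs(x) > c, x, -x)
print(abs(np.mean(x * y)) < 1e-3)         # covariance essentially 0
print(np.allclose(np.abs(y), np.abs(x)))  # yet |Y| = |X|: total dependence
```

Note the marginal of Y stays N(0,1), since the construction only flips signs and -X ~ N(0,1); so both marginals are Gaussian, covariance is 0, yet X and Y are completely dependent.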
HDLSS Asy's Geometrical Represen'tion
Further Consequences of Geometric Represen'tion:
1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects an intuitive feeling of sampling variation; something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified: Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version): Qiao et al. (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(Study properties of PCA
in estimating eigen-directions & -values)
[Assume data are mean centered]
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
Spike Covariance Model, Paul (2007)
For eigenvalues:
  λ_1(d) = d^α,  λ_2(d) = ... = λ_d(d) = 1
Note critical parameter: α
1st eigenvector: u_1
(turns out the direction doesn't matter)
How good are the empirical versions
  λ̂_1(d), ..., λ̂_d(d), û_1
as estimates?
HDLSS Math Stat of PCA
Consistency (big enough spike):
  For α > 1:  Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough):
  For α < 1:  Angle(û_1, u_1) → 90°
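This dichotomy shows up readily in simulation. A minimal sketch (the sample size n = 20, dimension d = 20000, and placing the spike along the first coordinate so that u_1 = e_1, are all my choices):

```python
import numpy as np

def spike_angle(d, alpha, n=20, seed=0):
    """Angle in degrees between the empirical and true first eigenvectors
    in the spike model: lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)   # spike along the first coordinate, so u1 = e1
    # leading right singular vector of the data = first sample eigenvector
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)  # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cos)))

print(spike_angle(20000, alpha=1.5))  # small angle: consistency
print(spike_angle(20000, alpha=0.5))  # large angle: strong inconsistency
```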
HDLSS Math Stat of PCA
Intuition: random noise lives on the scale d^(1/2)
For α > 1 (recall d^α is on the scale of variance):
  Spike pops out of the pure noise sphere
For α < 1:
  Spike contained in the pure noise sphere
HDLSS Math Stat of PCA
Consistency of eigenvalues?
  λ̂_1(d) / λ_1(d) → χ²_n / n  (in law, as d → ∞)
Eigenvalues inconsistent!
But known distribution
Consistent when n → ∞ as well
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consist.?
John Kent example:
  X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)
Can only say ||X|| = O_p(d^(1/2)), not deterministic:
  ||X|| / d^(1/2) = 10 w.p. 1/2, and = 1 w.p. 1/2 (in the limit)
PCA conditions: same as before, since the noise is still O_p(d^(1/2))
But for Geo Rep'n, need some mixing condition
Conclude: need some mixing condition
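A quick simulation of Kent's example shows the scaled norm settling on two different values at random, so no single deterministic geometric representation can hold (the dimension d = 50000 and sample size are my choices):

```python
import numpy as np

def kent_radii(d, n, seed=1):
    """Scaled norms ||X|| / sqrt(d) for X ~ (1/2) N(0, I_d) + (1/2) N(0, 100 I_d)."""
    rng = np.random.default_rng(seed)
    sd = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # mixture component per vector
    X = rng.standard_normal((n, d)) * sd[:, None]
    return np.linalg.norm(X, axis=1) / np.sqrt(d)

r = kent_radii(d=50_000, n=8)
print(np.round(r, 2))  # each entry lands near 1 or near 10, at random
```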
Mixing Conditions
Idea from probability theory:
Recall standard asymptotic results, as n → ∞:
  Law of Large Numbers
    ("Weak" = in prob., "Strong" = a.s.)
  Central Limit Theorem
Both have technical assumptions
(usually ignored!),
e.g. independent and ident. dist'd
Mixing conditions explore weaker assumptions
that still give the Law of Large Numbers
and the Central Limit Theorem
Mixing Conditions
• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better newer references?
Mixing Conditions
Mixing condition used here: ρ-mixing
For random variables X_1, X_2, ..., define
  ρ(k) = sup { |corr(f, g)| : f ∈ L²(σ(X_1, ..., X_j)), g ∈ L²(σ(X_{j+k}, X_{j+k+1}, ...)) }
for sigma-fields generated by the variables before and after a gap of lag k
Assume: ρ(k) → 0 as k → ∞
Idea: uncorrelated at far lags
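As a toy illustration (my example, not from the slides), an AR(1) sequence has lag-k correlations that decay geometrically. Plain lag correlations are a weaker quantity than the sup over functions in ρ(k), but they convey the "uncorrelated at far lags" idea:

```python
import numpy as np

def ar1_lag_corr(phi, k, n=100_000, seed=0):
    """Sample correlation of (X_t, X_{t+k}) for the AR(1) process
    X_t = phi * X_{t-1} + eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

for k in (1, 5, 20):
    print(k, round(ar1_lag_corr(0.6, k), 3))  # roughly 0.6**k, decaying toward 0
```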
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume the entries of the data vectors
  X = (X_1, X_2, ..., X_d)^t
are ρ-mixing
Drawback: strong assumption
(In JRSS-B, since Biometrika refused!)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012)
  (fully covariance based, no mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Tricky point: classical mixing conditions
require a notion of time ordering,
not always clear, e.g. for microarrays
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
  X_d ~ (0, Σ_d),  where Σ_d = U_d Λ_d U_d^t
  (Note: not necessarily Gaussian)
Define the standardized version:
  Z_d = Λ_d^(-1/2) U_d^t X_d
Assume ∃ a permutation π(d)
so that the permuted Z_d is ρ-mixing
HDLSS Math Stat of PCA
Careful look at:
  PCA consistency for the α > 1 spike
(Reality check suggested by a reviewer)
The condition is independent of sample size,
so it holds even for n = 1 (!?!)
Reviewer's conclusion: absurd, shows the
assumption is too strong for practice
HDLSS Math Stat of PCA
Yet HDLSS PCA often finds signal, not pure noise
Recall the RNAseq data from 8/23/12: d ~ 1700, n = 180
Manually brushed clusters show clear alternate splicing, not noise
Functional Data Analysis
HDLSS Math Stat of PCA
Recall the theoretical separation:
  Strong inconsistency: α < 1 spike
  Consistency: α > 1 spike
Mathematically driven conclusion:
real data signals are this strong!
HDLSS Math Stat of PCA
An interesting objection:
should not study angles in PCA
Recall, for consistency (α > 1): Angle(û_1, u_1) → 0
For strong inconsistency (α < 1): Angle(û_1, u_1) → 90°
The objection: because PC scores (i.e. projections),
what we study in PCA scatterplots, are not consistent
For scores ŝ_ij = P_{û_j} x_i and s_ij = P_{u_j} x_i,
can show: ŝ_ij / s_ij → R_j ≠ 1 (random)
(Thanks to Dan Shen)
PC scores not consistent,
so how can PCA find useful signals in data?
Key is "proportional errors":
the same realization of R_j for all i
Axes have inconsistent scales,
but relationships are still useful
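A rough numerical sketch of this point (the settings, an α = 0.8 spike with n = 20 and d = 2000, are my choices): the empirical and true PC1 scores remain strongly related, even though neither the angle nor the individual scores are consistent here.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, alpha = 20, 2000, 0.8
X = rng.standard_normal((n, d))
X[:, 0] *= d ** (alpha / 2)         # spike along e1, so u1 = e1, lambda_1 = d**alpha

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])  # empirical PC1 direction, sign-aligned with e1

s_true = X[:, 0]                    # true PC1 scores: projections onto u1 = e1
s_hat = X @ u1_hat                  # empirical PC1 scores: projections onto u1_hat

print(np.corrcoef(s_true, s_hat)[0, 1])  # relationships between scores preserved
print(np.std(s_hat / s_true))            # spread of the score ratios
```

Per the proportional-errors result, as d grows the ratios s_hat / s_true cluster around one common random value, so scatterplot structure survives even though absolute scales do not.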
HDLSS Deep Open Problem
In PCA consistency:
  Strong inconsistency: α < 1 spike
  Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: ∃ interesting limit dist'ns
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall the flexibility from the kernel embedding idea
Interesting question: behavior in very high dimension?
Answer: El Karoui (2010)
• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers
HDLSS Asymptotics & Kernel Methods
Implications for DWD:
Recall its main advantage is for high d,
so it is not clear embedding helps;
thus not yet implemented in DWD
HDLSS Additional Results
Batch Adjustment: Xuxin Liu
Recall intuition from above:
  Key is the sizes of biological subtypes
  A differing ratio trips up the mean,
  but DWD is more robust
Mathematics behind this?
HDLSS Asy's Geometrical Represen'tion
Straightforward generalizations:
  non-Gaussian data: only need moments
  non-independent: use "mixing conditions"
Mild eigenvalue condition on theoretical cov. (Ahn, Marron, Muller & Chi 2007)
All based on simple "Laws of Large Numbers"
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007):
Assume 2nd moments;
assume no eigenvalues too large, in the sense:
For λ_1 ≥ ... ≥ λ_d, assume
  (Σ_j λ_j²) / (Σ_j λ_j)² = o(1),  i.e. → 0 as d → ∞
(1/d is the min possible value)
(much weaker than the previous mixing conditions...)
2nd Paper on HDLSS Asymptotics
Background: in classical multivariate analysis the statistic
  ε = (Σ_j λ_j)² / (d Σ_j λ_j²)
is called the "epsilon statistic",
and is used to test "sphericity" of the dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics
Can show the epsilon statistic satisfies 1/d ≤ ε ≤ 1:
• For a spherical Normal, ε = 1
• A single extreme eigenvalue gives ε ≈ 1/d
• So the assumption (Σ_j λ_j²)/(Σ_j λ_j)² → 0, i.e. dε → ∞, is very mild
• Much weaker than mixing conditions
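The two extremes of the epsilon statistic are easy to check numerically (a minimal sketch; the dimension d = 1000 and the size of the extreme eigenvalue are my choices):

```python
import numpy as np

def epsilon_stat(lam):
    """Sphericity 'epsilon' statistic: (sum lam)^2 / (d * sum lam^2).
    Always between 1/d (one dominant eigenvalue) and 1 (spherical)."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    return float(lam.sum() ** 2 / (d * (lam ** 2).sum()))

d = 1000
print(epsilon_stat(np.ones(d)))                    # spherical: exactly 1
print(epsilon_stat(np.r_[d**2, np.ones(d - 1)]))   # one huge eigenvalue: about 1/d
```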
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007):
Assume 2nd moments, and no eigenvalues too large (as above)
Then:  X_i' X_j = o_p(d)
Not so strong as the earlier Gaussian result:  Z_1' Z_2 = d^(1/2) O_p(1)
2nd Paper on HDLSS Asymptotics
Can we improve on X_i' X_j = o_p(d)?
John Kent example: Normal scale mixture
  X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)
Won't get X_i' X_j = C d^(1/2) O_p(1)
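A quick simulation (settings mine) shows why no single constant C can work for Kent's mixture: conditionally on the mixture components, X_i' X_j / d^(1/2) has standard deviation σ_i σ_j, which is 1, 10 or 100 at random:

```python
import numpy as np

def kent_inner_products(d, n_pairs, seed=2):
    """X_i' X_j / sqrt(d) for independent pairs drawn from Kent's mixture
    (1/2) N(0, I_d) + (1/2) N(0, 100 I_d).
    Returns the scaled inner products and the per-pair scale sigma_i * sigma_j."""
    rng = np.random.default_rng(seed)
    s = np.where(rng.random((n_pairs, 2)) < 0.5, 1.0, 10.0)
    x = rng.standard_normal((n_pairs, d)) * s[:, :1]
    y = rng.standard_normal((n_pairs, d)) * s[:, 1:]
    return (x * y).sum(axis=1) / np.sqrt(d), s[:, 0] * s[:, 1]

vals, scale = kent_inner_products(d=5_000, n_pairs=400)
print(np.std(vals / scale))  # near 1: N(0,1) once the random scale is divided out
print(np.std(vals))          # far from 1: the overall scale is random (1, 10, or 100)
```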
3rd Paper on HDLSS Asymptotics
Get the geometrical representation using:
• 4th moment assumption
• Stronger covariance matrix (only) assum'n
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
HDLSS Asyrsquos Geometrical Represenrsquotion
Straightforward Generalizations
non-Gaussian data only need moments
non-independent use ldquomixing conditionsrdquo
Mild Eigenvalue condition on Theoretical Cov (Ahn Marron Muller amp Chi 2007)
All based on simple ldquoLaws of Large Numbersrdquo
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
• Random Variables X and Y
• Make both Gaussian:  X, Y ~ N(0,1)
    (Note: Not Using the Multivariate Gaussian)
• With strong dependence
• Yet 0 covariance
Given c > 0, define
    Y = X     when |X| ≤ c
    Y = −X    when |X| > c
0 Covariance is not independence
Simple Example: choose c to make cov(X,Y) = 0
0 Covariance is not independence
Simple Example
• Distribution is degenerate
• Supported on the diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue measure
• For small c, have cov(X,Y) < 0
• For large c, have cov(X,Y) > 0
• By continuity, ∃ c with cov(X,Y) = 0
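The sign change in c, which guarantees the zero-covariance crossing, can be checked directly (an illustration added here; the helper `cov_xy` is ours):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(2_000_000)

def cov_xy(c):
    # Y = X where |X| <= c, Y = -X where |X| > c (both marginals are N(0,1))
    Y = np.where(np.abs(X) <= c, X, -X)
    return np.mean(X * Y) - np.mean(X) * np.mean(Y)

assert cov_xy(0.1) < 0   # small c: Y is mostly -X, negative covariance
assert cov_xy(3.0) > 0   # large c: Y is mostly X, positive covariance
```

By the intermediate value theorem, some c between 0.1 and 3 gives covariance exactly 0.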
0 Covariance is not independence
Result
• Joint distribution of X and Y:
    – Has Gaussian marginals
    – Has cov(X,Y) = 0
    – Yet strong dependence of X and Y
    – Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
HDLSS Asy's Geometrical Represen'tion
Further Consequences of Geometric Represen'tion
1. DWD more stable than SVM
   (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified
   Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes
   (motivates weighted version)
   Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(Study Properties of PCA
in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
Spike Covariance Model, Paul (2007)
For Eigenvalues:
    λ_{1,d} = d^α,   λ_{2,d} = ⋯ = λ_{d,d} = 1
Note Critical Parameter: α
1st Eigenvector: u₁
    (Turns out: Direction Doesn't Matter)
How Good are Empirical Versions
    λ̂_{1,d}, û₁  as Estimates?
Consistency (big enough spike):
    For α > 1,   Angle(û₁, u₁) → 0
Strong Inconsistency (spike not big enough):
    For α < 1,   Angle(û₁, u₁) → 90°
HDLSS Math Stat of PCA
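A small simulation of this dichotomy (an added illustration; assumes Gaussian data, uses the top right singular vector as û₁, and the helper name `pc1_angle_deg` is ours):

```python
import numpy as np

rng = np.random.default_rng(4)

def pc1_angle_deg(d, alpha, n=20):
    # Spike model: lambda_1 = d^alpha, all other eigenvalues 1, u1 = e1
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2.0)
    X = rng.standard_normal((n, d)) * sd
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt: eigenvector estimates
    cos = min(abs(Vt[0, 0]), 1.0)                     # |<u1_hat, u1>|
    return np.degrees(np.arccos(cos))

assert pc1_angle_deg(2_000, 1.5) < 10   # alpha > 1: consistent, angle near 0
assert pc1_angle_deg(2_000, 0.3) > 55   # alpha < 1: angle heads toward 90 degrees
```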
Intuition: Random Noise ~ d^{1/2}
For α > 1 (Recall d^α is on the Scale of Variance):
    Spike Pops Out of Pure Noise Sphere
For α < 1:
    Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
HDLSS Math Stat of PCA
Consistency of eigenvalues?
Eigenvalues Inconsistent:
    λ̂_{1,d} / λ_{1,d} →_L χ²_n / n   as d → ∞
But Known Distribution
Consistent when n → ∞ as Well
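This fixed-n inconsistency is visible in simulation (an added sketch, assuming a strong α > 1 spike; the ratio λ̂₁/λ₁ stays spread out like χ²ₙ/n rather than concentrating at 1):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, lam = 2_000, 10, 2_000.0 ** 2      # strong spike: alpha = 2
ratios = []
for _ in range(200):
    s = rng.standard_normal(n) * np.sqrt(lam)
    X = np.column_stack([s, rng.standard_normal((n, d - 1))])
    G = X @ X.T / n                       # n x n Gram matrix, same nonzero eigenvalues
    lam_hat = np.linalg.eigvalsh(G)[-1]   # top sample eigenvalue
    ratios.append(lam_hat / lam)
ratios = np.array(ratios)
# lam_hat / lam behaves like chi^2_n / n: mean 1, but substantial spread for fixed n
assert abs(ratios.mean() - 1) < 0.1
assert 0.1 < ratios.std() < 0.8           # spread does not vanish: inconsistent
```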
Conditions for Geo Rep'n & PCA Consist.
John Kent example:
    X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
Can only say  ‖X‖ = d^{1/2} O_p(1),  not deterministic:
    ‖X‖ / d^{1/2} ≈ 1 w.p. 1/2,   ≈ 10 w.p. 1/2
PCA Conditions Same, since Noise is Still O_p(d^{1/2})
But for Geo Rep'n, need some Mixing Cond'n
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Conclude: Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Recall Standard Asymptotic Results as n → ∞:
• Law of Large Numbers  ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions (Usually Ignored!),
E.g. Independent and Identically Dist'd
Explore Weaker Assumptions to Still Get:
• Law of Large Numbers
• Central Limit Theorem
Mixing Conditions
Mixing Conditions
• A Whole Area in Probability Theory
• A Large Literature
• A Comprehensive Reference:
    Bradley (2005, update of 1986 version)
• Better, Newer References
Mixing Conditions
Mixing Condition Used Here:
Rho-Mixing
For Random Variables X₁, X₂, …, Define
    ρ(k) = sup |corr(Y, Z)|
Where the sup is over
    Y ∈ L²(σ(X₁, …, X_i)),   Z ∈ L²(σ(X_{i+k}, X_{i+k+1}, …)),
For the Sigma-Fields Generated by the indicated variables
(Note: Gap of Lag k)
Assume:  ρ(k) → 0  as  k → ∞
Idea: Uncorrelated at Far Lags
Mixing Conditions
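A concrete example of the "uncorrelated at far lags" idea (an added illustration: for a jointly Gaussian AR(1) sequence, the maximal correlation across a lag-k gap is known to reduce to |φ|^k, so the ρ-mixing coefficients decay geometrically):

```python
import numpy as np

rng = np.random.default_rng(6)
phi, T = 0.6, 200_000
# Gaussian AR(1): X_t = phi * X_{t-1} + eps_t
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)   # stationary start
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(k):
    return np.corrcoef(x[:-k], x[k:])[0, 1]

for k in (1, 2, 5):
    assert abs(lag_corr(k) - phi ** k) < 0.02   # geometric decay toward 0
```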
Conditions for Geo Rep'n
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors
    X = (X₁, X₂, …, X_d)ᵀ
Are ρ-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused!)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
    (Fully Covariance Based, No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Tricky Point: Classical Mixing Conditions
Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Condition from Jung & Marron (2009):
    X_d ~ (0, Σ_d),   where  Σ_d = U_d Λ_d U_dᵀ
(Note: Not necessarily Gaussian)
Define Standardized Version:
    Z_d = Λ_d^{−1/2} U_dᵀ X_d
Assume Ǝ a permutation of the d entries
So that the entries of Z_d are ρ-mixing
HDLSS Math Stat of PCA
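The standardization step can be checked numerically (an added sketch: Z = Λ^{−1/2} Uᵀ X exactly whitens the covariance, so the mixing condition is imposed on unit-variance, uncorrelated coordinates):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 5
A = rng.standard_normal((d, d))
Sigma = A @ A.T                          # some covariance matrix
lam, U = np.linalg.eigh(Sigma)           # Sigma = U diag(lam) U^T
X = rng.multivariate_normal(np.zeros(d), Sigma, size=200_000)
Z = X @ U / np.sqrt(lam)                 # rows: Z = Lam^{-1/2} U^T x
C = np.cov(Z.T)
assert np.allclose(C, np.eye(d), atol=0.02)   # standardized: identity covariance
```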
Careful look at:
PCA Consistency, α > 1 spike
(Reality Check Suggested by Reviewer)
• Independent of Sample Size,
• So true for n = 1 (?)
Reviewer's Conclusion: Absurd, shows
assumption α > 1 too strong for practice
HDLSS Math Stat of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise
HDLSS Math Stat of PCA
Recall RNAseq Data From 8/23/12:  d ~ 1700,  n = 180
HDLSS Math Stat of PCA
Manually Brushed Clusters: Clear Alternate Splicing, Not Noise
Functional Data Analysis
Recall Theoretical Separation:
• Strong Inconsistency:  α < 1 spike
• Consistency:  α > 1 spike
Mathematically Driven Conclusion:
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
An Interesting Objection:
Should not Study Angles in PCA,
Because PC Scores (i.e. projections)
are Not Consistent
For Scores  ŝ_{i,j} = P_{v̂ⱼ} x_i  and  s_{i,j} = P_{vⱼ} x_i
(What we study in PCA scatterplots)
Can Show:  ŝ_{i,j} / s_{i,j} → Rⱼ ≠ 1   (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC Scores (i.e. projections) Not Consistent.
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors":
    ŝ_{i,j} / s_{i,j} → Rⱼ ≠ 1,  with the Same Realization of Rⱼ for all i
Axes have Inconsistent Scales,
But Relationships are Still Useful
HDLSS Math Stat of PCA
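The proportional-errors phenomenon can be simulated (an added sketch, assuming a single Gaussian spike with λ comparable to d/n: the empirical scores line up almost perfectly with the true scores, but with a common inflation factor different from 1):

```python
import numpy as np

rng = np.random.default_rng(8)
d, n, lam = 20_000, 20, 1_000.0                  # spike lambda comparable to d/n
s = rng.standard_normal(n) * np.sqrt(lam)        # true PC1 scores, s_i = <x_i, u1>
W = rng.standard_normal((n, d - 1))              # noise part of each x_i
X = np.column_stack([s, W])                      # so u1 = e1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])               # fix the arbitrary sign
s_hat = X @ u1_hat                               # empirical PC1 scores
slope = np.polyfit(s, s_hat, 1)[0]               # common inflation factor ~ R
assert np.corrcoef(s, s_hat)[0, 1] > 0.98        # scores nearly proportional in i
assert 1.1 < slope < 2.0                         # but scale is inflated: R != 1
```

So scatterplots of scores keep their shape (relationships are useful) even though each axis is stretched by a random factor.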
In PCA Consistency:
• Strong Inconsistency:  α < 1 spike
• Consistency:  α > 1 spike
What happens at the boundary (α = 1)?
Ǝ interesting Limit Dist'ns:
Jung, Sen & Marron (2012)
HDLSS Deep Open Problem, Result
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension?
Answer: El Karoui (2010):
• In the Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension?
Implications for DWD:
• Recall Main Advantage is for High d
• So not Clear Embedding Helps
• Thus not yet Implemented in DWD
HDLSS Asymptotics & Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asy's Geometrical Represen'tion
Straightforward Generalizations:
• non-Gaussian data: only need moments
• non-independent: use "mixing conditions"
• Mild Eigenvalue condition on Theoretical Covariance
    (Ahn, Marron, Muller & Chi, 2007)
All based on simple "Laws of Large Numbers"
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007): Assume 2nd Moments
Assume no eigenvalues too large, in the sense:
For eigenvalues λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_d, assume
    Σ_{j=1}^d λⱼ² / (Σ_{j=1}^d λⱼ)² → 0,
i.e.  ε ≫ 1/d
(1/d = min possible value of ε)
(much weaker than previous mixing conditions…)
2nd Paper on HDLSS Asymptotics
Background:
In classical multivariate analysis, the statistic
    ε = (Σ_{j=1}^d λⱼ)² / (d Σ_{j=1}^d λⱼ²)
is called the "epsilon statistic",
and is used to test "sphericity" of the dist'n,
i.e. "are all cov'nce eigenvalues the same?"
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal scale mixture
$X \sim 0.5\, N_d(0, I_d) + 0.5\, N_d(0, 100 I_d)$:
• Data vectors are independent of each other
• But the entries of each vector have strong dependence
• However, can show the entries have covariance = 0
• Recall the statistical folklore:
  "Covariance = 0 implies independence" (false!)
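A quick simulation makes the point concrete; a sketch restricted to two entries of each vector (the sample sizes and the shared-scale construction are my illustrative choices, matching the mixture above):

```python
import numpy as np

# Kent's Normal scale mixture: each data VECTOR is N(0, I) w.p. 1/2
# and N(0, 100 I) w.p. 1/2, so all entries of a vector share one random scale.
rng = np.random.default_rng(0)
n = 50_000
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # per-vector sd multiplier
X1 = scale * rng.standard_normal(n)                # entry 1 of each vector
X2 = scale * rng.standard_normal(n)                # entry 2 of each vector

corr_entries = np.corrcoef(X1, X2)[0, 1]                      # ~ 0: entries uncorrelated
corr_magnitudes = np.corrcoef(np.abs(X1), np.abs(X2))[0, 1]   # clearly > 0: entries dependent
```

The entries are uncorrelated, yet their magnitudes are strongly correlated (both are large when the vector drew the big scale), which is exactly the dependence the slide describes.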
0 Covariance is not independence
Simple example:
• Random variables $X, Y \sim N(0, 1)$
  (note: not using the multivariate Gaussian)
• Make both Gaussian, with strong dependence, yet 0 covariance
• Given $c > 0$, define
  $Y = X$ for $|X| \le c$, and $Y = -X$ for $|X| > c$
• Choose $c$ to make $\mathrm{cov}(X, Y) = 0$:
  - The joint distribution is degenerate,
    supported on diagonal lines,
    not absolutely continuous w.r.t. 2-d Lebesgue measure
  - For small $c$, $\mathrm{cov}(X, Y) < 0$
  - For large $c$, $\mathrm{cov}(X, Y) > 0$
  - By continuity, there is a $c$ with $\mathrm{cov}(X, Y) = 0$
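The continuity argument can be carried out numerically; a sketch using the standard closed form $E[X^2; |X| \le c] = \mathrm{erf}(c/\sqrt{2}) - 2c\varphi(c)$ (the bisection bracket is my choice):

```python
import math

def cov_xy(c):
    """cov(X, Y) for Y = X on |X| <= c, Y = -X on |X| > c, X ~ N(0, 1).
    cov = E[X^2; |X| <= c] - E[X^2; |X| > c] = 2 * E[X^2; |X| <= c] - 1."""
    phi_c = math.exp(-c * c / 2.0) / math.sqrt(2.0 * math.pi)  # standard normal density
    return 2.0 * (math.erf(c / math.sqrt(2.0)) - 2.0 * c * phi_c) - 1.0

# cov < 0 for small c (mostly Y = -X), cov > 0 for large c (mostly Y = X):
lo, hi = 0.1, 3.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_star = 0.5 * (lo + hi)   # the zero-covariance threshold, roughly 1.54
```

Note $Y$ is exactly $N(0,1)$ by the symmetry of the construction, so the marginals stay Gaussian while the covariance is tuned to zero.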
0 Covariance is not independence
Result: the joint distribution of $X$ and $Y$
- has Gaussian marginals,
- has $\mathrm{cov}(X, Y) = 0$,
- yet has strong dependence of $X$ and $Y$,
- and thus is not multivariate Gaussian.
Shows: "multivariate Gaussian" means more than Gaussian marginals.
HDLSS Asymptotics: Geometrical Representation
Further consequences of the geometric representation:
1. DWD is more stable than SVM
   (based on deeper limiting distributions)
   (reflects the intuitive feeling of sampling variation, something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes
   (motivates the weighted version), Qiao et al. (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(study properties of PCA in estimating eigen-directions & -values)
[assume data are mean centered]

Spike Covariance Model, Paul (2007):
Eigenvalues $\lambda_{1,d} = d^\alpha$, $\lambda_{2,d} = \cdots = \lambda_{d,d} = 1$
Note the critical parameter: $\alpha$
1st eigenvector: $u_1$ (turns out the direction doesn't matter)
How good are the empirical versions $\hat{\lambda}_{1,d}, \ldots, \hat{\lambda}_{d,d}, \hat{u}_1$ as estimates?
HDLSS Math Stat of PCA
Consistency (big enough spike): for $\alpha > 1$,
$\mathrm{Angle}(\hat{u}_1, u_1) \to 0$
Strong Inconsistency (spike not big enough): for $\alpha < 1$,
$\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$

Intuition: random noise $\sim d^{1/2}$,
and $d^\alpha$ is on the scale of variance, so
for $\alpha > 1$ the spike pops out of the pure noise sphere,
while for $\alpha < 1$ the spike is contained in the pure noise sphere.
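Both regimes show up in a small simulation; a sketch under the spike model above (the dimensions, sample size, and seed are my illustrative choices; convergence for $\alpha$ just below 1 is slow, so a small $\alpha$ is used to make the inconsistent regime visible at moderate $d$):

```python
import numpy as np

def pc1_angle_deg(d, alpha, n=20, seed=0):
    """Angle (degrees) between sample PC1 and the true spike direction e_1,
    under the spike model lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2.0)            # coordinate 1 gets variance d**alpha
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)            # |<u1_hat, e_1>|
    return float(np.degrees(np.arccos(cos)))

angle_big_spike = pc1_angle_deg(d=2000, alpha=1.5)    # alpha > 1: angle near 0
angle_small_spike = pc1_angle_deg(d=2000, alpha=0.2)  # alpha < 1: angle near 90
```

The big spike is recovered almost exactly, while the small spike is swamped by the noise sphere and the empirical direction is nearly orthogonal to the truth.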
HDLSS Math Stat of PCA
Consistency of eigenvalues? For $\alpha > 1$:
$\hat{\lambda}_1 / \lambda_1 \xrightarrow{\;L\;} \chi^2_n / n$
• Eigenvalues are inconsistent (for fixed $n$)
• But with a known distribution
• Consistent when $n \to \infty$ as well
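The $\chi^2_n / n$ limit law can be checked by Monte Carlo; a sketch (no mean centering, and the dual $n \times n$ Gram form keeps each replication cheap; sizes and seed are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha, reps = 2000, 10, 1.5, 500
lam1 = float(d) ** alpha

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                       # spike model: coordinate 1 variance d**alpha
    G = X @ X.T / n                                # dual form of the sample covariance
    ratios[r] = np.linalg.eigvalsh(G)[-1] / lam1   # lambda_1_hat / lambda_1

mean_ratio = ratios.mean()   # ~ E[chi2_n / n] = 1
var_ratio = ratios.var()     # ~ Var[chi2_n / n] = 2 / n = 0.2
```

The ratio does not concentrate at 1 for fixed $n$ (its variance stays near $2/n$), which is the inconsistency, but the distribution is the known $\chi^2_n / n$.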
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
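The random (non-deterministic) limit of $\|X\| / d^{1/2}$ is easy to see by simulation; a sketch (sizes and seed are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4000, 200
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # Kent mixture: per-vector sd 1 or 10
X = rng.standard_normal((n, d)) * scale[:, None]

norms = np.linalg.norm(X, axis=1) / np.sqrt(d)     # ||X|| / sqrt(d) per vector
near_1 = np.abs(norms - 1.0) < 1.0                 # cluster at radius 1
near_10 = np.abs(norms - 10.0) < 1.0               # cluster at radius 10
```

Every scaled norm lands near 1 or near 10, each with probability 1/2, so there is no single deterministic limit for $\|X\| / d^{1/2}$.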
Mixing Conditions
Idea from probability theory:
Recall the standard asymptotic results as $n \to \infty$:
• Law of Large Numbers ("weak" = in probability, "strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!),
e.g. independent and identically distributed.
Mixing conditions: explore weaker assumptions that still give the
Law of Large Numbers & Central Limit Theorem.
Mixing Conditions
• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better newer references?
Mixing Conditions
The mixing condition used here: ρ-mixing.
For random variables $X_1, X_2, \ldots$, define
$\rho(k) = \sup \big\{ |\mathrm{corr}(f, g)| : f \in L_2(\sigma(X_1, \ldots, X_j)),\ g \in L_2(\sigma(X_{j+k}, X_{j+k+1}, \ldots)),\ j \ge 1 \big\}$,
where the sigma-fields are generated by the past ($X_1, \ldots, X_j$)
and the future ($X_{j+k}, X_{j+k+1}, \ldots$); note the gap of lag $k$.
Assume $\rho(k) \to 0$ as $k \to \infty$.
Idea: uncorrelated at far lags.
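The "uncorrelated at far lags" idea can be illustrated with an AR(1) process, a classical ρ-mixing sequence. A sketch (note this only shows plain lag-$k$ autocorrelation decaying like $\phi^k$; ρ-mixing itself takes a sup over all $L_2$ functions of past and future, which is stronger):

```python
import numpy as np

# AR(1): X_t = phi * X_{t-1} + noise
rng = np.random.default_rng(0)
phi, n = 0.8, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(series, k):
    """Sample correlation between the series and its lag-k shift."""
    return float(np.corrcoef(series[:-k], series[k:])[0, 1])

corr_lag_1 = lag_corr(x, 1)    # ~ phi = 0.8
corr_lag_30 = lag_corr(x, 30)  # ~ phi**30, essentially 0
```

Nearby observations are strongly correlated, but the correlation dies out at far lags, which is the intuition the mixing condition formalizes.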
HDLSS Math Stat of PCA
Conditions for Geo Rep'n, Hall, Marron and Neeman (2005):
Assume the entries $X_1, \ldots, X_d$ of the data vectors are ρ-mixing.
Drawback: a strong assumption.
(In JRSS-B, since Biometrika refused!)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (fully covariance based, no mixing)
Tricky point: classical mixing conditions require a notion of time ordering,
which is not always clear, e.g. for microarrays.
HDLSS Math Stat of PCA
Conditions for Geo Rep'n, condition from Jung & Marron (2009):
$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$
(note: not necessarily Gaussian).
Define the standardized version $Z_d = \Lambda_d^{-1/2} U_d^t X_d$.
Assume there exists a permutation of the $d$ entries
so that the entries of $Z_d$ are ρ-mixing.
HDLSS Math Stat of PCA
A careful look at PCA consistency for the $\alpha > 1$ spike
(reality check suggested by a reviewer):
the result is independent of sample size, so it is true even for $n = 1$ (!?).
Reviewer's conclusion: absurd, shows the assumption is too strong for practice.
HDLSS Math Stat of PCA
Yet HDLSS PCA often finds signal, not pure noise.
Recall the RNAseq data from 8/23/12: $d \approx 1700$, $n = 180$.
Manually brushed clusters show clear alternate splicing, not noise.
Functional Data Analysis
HDLSS Math Stat of PCA
Recall the theoretical separation:
• Strong inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
Mathematically driven conclusion: real data signals are this strong!
HDLSS Math Stat of PCA
An interesting objection: should not study angles in PCA.
Recall for consistency ($\alpha > 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$,
and for strong inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$.
The objection: because the PC scores (i.e. projections),
which are what we study in PCA scatterplots, are not consistent.
For the scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ and $s_{ij} = P_{v_j} x_i$,
can show $\hat{s}_{ij} / s_{ij} \to R_j$ (random).
Thanks to Dan Shen.
HDLSS Math Stat of PCA
PC scores (i.e. projections) are not consistent,
so how can PCA find useful signals in data
(recall HDLSS PCA often finds signal, not pure noise)?
Key is "proportional errors": $\hat{s}_{ij} / s_{ij} \to R_j$,
with the same realization of $R_j$ for every $i$.
The axes have inconsistent scales,
but the relationships are still useful.
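The proportionality of the errors can be seen in a simulation; a sketch in a boundary-case spike $\lambda_1 = c \cdot d$, where the score ratio settles to a common random factor (model sizes, $c$, and the seed are my illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, c = 20_000, 5, 1.0
u1 = np.zeros(d)
u1[0] = 1.0                                   # true first eigen-direction
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(c * d)                     # spike: coordinate 1 has variance c*d

s_true = X @ u1                               # true PC1 scores
_, _, Vt = np.linalg.svd(X, full_matrices=False)
s_hat = X @ Vt[0]                             # empirical PC1 scores

# The two score vectors are (nearly) proportional: each empirical score is
# the true score times one shared random factor, so scatterplot
# relationships survive even though the overall scale is random.
score_corr = float(np.corrcoef(s_hat, s_true)[0, 1])
```

The near-perfect (up to sign) correlation across observations is the "same realization of $R_j$ for every $i$" phenomenon: only the common scale is wrong, not the configuration of the scores.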
HDLSS Deep Open Problem
In PCA consistency:
• Strong inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
What happens at the boundary ($\alpha = 1$)?
Result: there exist interesting limit distributions,
Jung, Sen & Marron (2012).
HDLSS Asymptotics & Kernel Methods
Recall the flexibility gained from the kernel embedding idea.
HDLSS Asymptotics & Kernel Methods
Interesting question: behavior in very high dimension?
Answer, El Karoui (2010):
• in the random matrix limit,
• kernel embedded classifiers ~ linear classifiers.
Implications for DWD: recall its main advantage is for high $d$,
so it is not clear that embedding helps;
thus not yet implemented in DWD.
HDLSS Additional Results
Batch adjustment, Xuxin Liu:
Recall the intuition from above:
the key is the sizes of the biological subtypes;
a differing ratio trips up the mean,
but DWD is more robust.
Mathematics behind this?
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large in sense
For assume ie
(min possible)
(much weaker than previous mixing conditionshellip)
d
jj
d
jj
d1
2
2
1
)(1 do 1 d
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture
    X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
• Data vectors are indep'dent of each other
• But entries of each have strong depend'ce
• However, can show entries have cov = 0
• Recall statistical folklore:
    Covariance = 0  ⇒  Independence  (?!)
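These properties of Kent's mixture show up clearly in simulation. A sketch (sizes are illustrative choices): entries of one vector are uncorrelated, yet their squares are strongly correlated because they share the random scale.

```python
import numpy as np

# Kent's normal scale mixture: each vector is N_d(0, I_d) w.p. 1/2
# and N_d(0, 100 I_d) w.p. 1/2, so all entries share a common random scale.
rng = np.random.default_rng(0)
d, n = 200, 5000
scales = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # per-vector std. dev. (1 or 10)
X = rng.standard_normal((n, d)) * scales[:, None]

x1, x2 = X[:, 0], X[:, 1]                  # two entries, viewed across the n vectors
c_entries = np.corrcoef(x1, x2)[0, 1]
c_squares = np.corrcoef(x1 ** 2, x2 ** 2)[0, 1]
print(c_entries)    # ~ 0: entries are uncorrelated
print(c_squares)    # clearly positive: dependence through the shared scale
```

Zero correlation between entries, but positive correlation between their squares, which is exactly the "cov = 0 yet strongly dependent" point of the slide.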
0 Covariance is not independence

Simple Example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0,1)
  (Note: not using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance
Given c > 0, define
    Y = X,   when |X| > c
    Y = −X,  when |X| ≤ c
0 Covariance is not independence

Simple Example: choose c to make cov(X,Y) = 0
• Distribution is degenerate
• Supported on the diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c > 0, have cov(X,Y) > 0
• For large c, have cov(X,Y) < 0
• By continuity, ∃ c with cov(X,Y) = 0
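The continuity argument can be made concrete. A sketch under the reflection construction above (Y = X when |X| > c, Y = −X otherwise, for X ~ N(0,1)), where cov(X,Y) = E[X²; |X| > c] − E[X²; |X| ≤ c] is available in closed form:

```python
import math

def phi(x):  # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):  # standard normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cov_xy(c):
    # E[X^2 1{|X| <= c}] = 2*(Phi(c) - 1/2) - 2*c*phi(c) for X ~ N(0,1)
    inside = 2 * (Phi(c) - 0.5) - 2 * c * phi(c)
    return 1.0 - 2.0 * inside          # E[X^2] = 1, so cov = (1 - inside) - inside

assert cov_xy(0.1) > 0 and cov_xy(3.0) < 0   # sign change, so a root exists
lo, hi = 0.1, 3.0
for _ in range(60):                          # bisection for cov(X, Y) = 0
    m = (lo + hi) / 2
    lo, hi = (m, hi) if cov_xy(m) > 0 else (lo, m)
print(lo)   # the c with cov(X, Y) = 0 (about 1.5)
```

Small c leaves the positive E[X²] term dominant, large c flips the sign, and bisection locates the crossing promised by continuity.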
0 Covariance is not independence

Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X,Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian

Shows: "multivariate Gaussian" means more
than Gaussian marginals
HDLSS Asy's: Geometrical Represen'tion

Further consequences of the Geometric Represen'tion:
1. DWD more stable than SVM
   (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified
   Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes
   (motivates weighted version)
   Qiao et al (2010)
HDLSS Math. Stat. of PCA

Consistency & Strong Inconsistency
(study properties of PCA
in estimating eigen-directions & -values)
[Assume data are mean centered]
HDLSS Math. Stat. of PCA

Consistency & Strong Inconsistency
Spike Covariance Model, Paul (2007):
For eigenvalues, take
    λ_1 = d^α,   λ_2 = … = λ_d = 1
Note critical parameter: α
1st eigenvector: u_1
(turns out its direction doesn't matter)
How good are the empirical versions
    λ̂_1, û_1
as estimates?
HDLSS Math. Stat. of PCA

Consistency (big enough spike):
For α > 1,
    Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough):
For α < 1,
    Angle(û_1, u_1) → 90°
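The α > 1 vs. α < 1 dichotomy is visible in small Monte Carlo runs. A sketch (function name, sample size, and dimensions are illustrative choices) under the spike model λ_1 = d^α, λ_2 = … = λ_d = 1 with u_1 = e_1:

```python
import numpy as np

rng = np.random.default_rng(1)

def first_pc_angle(d, alpha, n=20):
    """Angle in degrees between u_1 = e_1 and the leading sample eigenvector."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)                          # std. dev. of spike coordinate
    X = rng.standard_normal((n, d)) * sd
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt: sample eigenvectors
    cos = min(abs(Vt[0, 0]), 1.0)
    return np.degrees(np.arccos(cos))

for d in (100, 1000, 10000):
    print(d, first_pc_angle(d, 1.5), first_pc_angle(d, 0.5))
# alpha = 1.5 > 1: angle shrinks toward 0; alpha = 0.5 < 1: angle grows toward 90
```

As d grows with n fixed, the α = 1.5 angles drift toward 0° and the α = 0.5 angles toward 90°, matching consistency and strong inconsistency.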
HDLSS Math. Stat. of PCA

Intuition: random noise ~ d^{1/2}
For α > 1 (recall d^α is on the scale of variance):
    spike pops out of the pure-noise sphere
For α < 1:
    spike is contained in the pure-noise sphere
HDLSS Math. Stat. of PCA

Consistency of eigenvalues?
    λ̂_1 / λ_1 →_L χ²_n / n
• Eigenvalues inconsistent (for fixed n)
• But known distribution
• Consistent when n → ∞ as well
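The fixed-n eigenvalue limit above can be checked by simulation. A sketch (dimension, sample size, α, and replication count are illustrative) comparing the empirical ratio λ̂_1 / λ_1 with the moments of χ²_n / n:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha, reps = 2000, 5, 2.0, 400
lam1 = float(d) ** alpha
sd = np.ones(d)
sd[0] = lam1 ** 0.5                       # spike model: lam_1 = d**alpha, rest 1
ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d)) * sd
    s = np.linalg.svd(X, compute_uv=False)
    ratios[r] = (s[0] ** 2 / n) / lam1    # largest sample-cov eigenvalue over lam1
print(ratios.mean(), ratios.var())        # chi^2_n / n has mean 1, variance 2/n = 0.4
```

The ratio does not concentrate at 1 (inconsistency for fixed n), but its mean and variance match χ²_n / n, the known limit distribution.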
HDLSS Math. Stat. of PCA

Conditions for Geo. Rep'n & PCA Consist.
John Kent example:
    X ~ 0.5 N_d(0, Σ_d) + 0.5 N_d(0, 100 I_d),   Σ_d = diag(d², 1, …, 1)
Can only say
    ‖X‖ = O_p(d) w.p. 1/2,   ‖X‖ ≈ 10 d^{1/2} w.p. 1/2
i.e. random, not deterministic
• PCA conditions: same, since noise is still O_p(d^{1/2})
• But for Geo. Rep'n, need some mixing cond'n
Conclude: need some mixing condition
Mixing Conditions

Idea from probability theory:
Recall standard asymptotic results as n → ∞:
• Law of Large Numbers
  ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions
(usually ignored!),
e.g. Independent and Ident. Dist'd.
Mixing conditions explore weaker assumptions
that still give the LLN and CLT.
Mixing Conditions

• A whole area in probability theory
• A large literature
• A comprehensive reference:
  Bradley (2005, update of 1986 version)
• Better, newer references?
Mixing Conditions

Mixing condition used here: rho-mixing
For random variables X_1, X_2, …, define
    ρ(t) = sup { corr(f, g) : f ∈ L²(F_1^i), g ∈ L²(F_{i+t}^∞) }
where F_a^b is the sigma-field generated by X_a, …, X_b
(note the gap of lag t).
Assume ρ(t) → 0 as t → ∞.
Idea: uncorrelated at far lags.
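A standard concrete instance (an illustration, not the slides' construction): entries generated as a Gaussian AR(1) chain are ρ-mixing, since for jointly Gaussian variables the maximal correlation at lag t equals |corr|, here φ^t → 0. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
phi_coef, d, n = 0.8, 80, 20000
# each row of X is one vector whose entries follow a stationary AR(1) chain
X = np.zeros((n, d))
X[:, 0] = rng.standard_normal(n)
for j in range(1, d):
    X[:, j] = phi_coef * X[:, j - 1] + np.sqrt(1 - phi_coef ** 2) * rng.standard_normal(n)

for t in (1, 5, 20):
    # empirical correlation between entries t apart, about phi_coef**t
    print(t, np.corrcoef(X[:, 40], X[:, 40 + t])[0, 1])
```

The lag-t correlation decays geometrically, the "uncorrelated at far lags" behavior that ρ-mixing formalizes.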
HDLSS Math. Stat. of PCA

Conditions for Geo. Rep'n:
Hall, Marron and Neeman (2005):
Assume the entries of the data vectors
    X = (X_1, X_2, …, X_d)ᵗ
are ρ-mixing.
Drawback: strong assumption
(in JRSS-B, since Biometrika refused!)
HDLSS Math. Stat. of PCA

Conditions for Geo. Rep'n:
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (fully covariance based, no mixing)
Tricky point: classical mixing conditions
require a notion of time ordering,
which is not always clear, e.g. for microarrays.
HDLSS Math. Stat. of PCA

Conditions for Geo. Rep'n:
Condition from Jung & Marron (2009):
    X ~ (0_d, Σ_d),   where Σ_d = U_d Λ_d U_dᵗ
(note: not necessarily Gaussian)
Define the standardized version
    Z_d = Λ_d^{−1/2} U_dᵗ X_d
Assume ∃ a permutation of the entries of Z_d
so that the permuted sequence is ρ-mixing.
HDLSS Math. Stat. of PCA

Careful look at PCA consistency (α > 1 spike):
(reality check suggested by a reviewer)
The condition α > 1 is independent of sample size,
so consistency holds even for n = 1 (!?!)
Reviewer's conclusion: absurd, shows the
assumption is too strong for practice.
HDLSS Math. Stat. of PCA

HDLSS PCA often finds signal, not pure noise.
Recall RNAseq data from 8/23/12:
d ~ 1700, n = 180.
Manually brushed clusters show clear
alternate splicing, not noise.
(Functional Data Analysis)
HDLSS Math. Stat. of PCA

Recall the theoretical separation:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically driven conclusion:
real data signals are this strong!
HDLSS Math. Stat. of PCA

An interesting objection:
should not study angles in PCA.
Recall, for consistency (α > 1):
    Angle(û_1, u_1) → 0
For strong inconsistency (α < 1):
    Angle(û_1, u_1) → 90°
Objection: because PC scores (i.e. projections)
are not consistent.
For scores s_ij = P_{u_j} x_i and ŝ_ij = P_{û_j} x_i
(what we study in PCA scatterplots),
can show
    ŝ_ij / s_ij → R_j ≠ 1   (random)
Thanks to Dan Shen.
HDLSS Math. Stat. of PCA

PC scores (i.e. projections) are not consistent.
So how can PCA find useful signals in data?
Key is "proportional errors":
    ŝ_ij / s_ij → R_j,  the same realization of R_j for all i
Axes have inconsistent scales,
but relationships are still useful.
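The "proportional errors" point can be seen in a toy two-cluster example. A sketch (dimension, sample size, and separation are illustrative choices): the empirical PC1 scores are not on the right scale, yet the relationship of interest, the separation between groups, survives.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, mu = 5000, 40, 30.0
labels = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, d))
X[:, 0] += np.where(labels == 0, -mu, mu)      # signal along e_1
Xc = X - X.mean(axis=0)                        # mean center
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[0]                            # empirical PC1 scores
s0, s1 = scores[labels == 0], scores[labels == 1]
# score scale is distorted by noise accumulation, but the groups do not overlap
print(sorted([s0.mean(), s1.mean()]))
```

Even though each score is contaminated, the contamination acts roughly proportionally, so the scatterplot still cleanly separates the two clusters, which is how HDLSS PCA finds useful signal despite inconsistency.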
HDLSS Deep Open Problem

In PCA consistency:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: ∃ interesting limit dist'ns,
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall the flexibility gained from the kernel embedding idea.
HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?
Answer: El Karoui (2010):
• In the random matrix limit,
  kernel embedded classifiers ~ linear classifiers
Implications for DWD:
recall its main advantage is for high d,
so it is not clear embedding helps;
thus not yet implemented in DWD.
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall intuition from above:
• Key is the sizes of biological subtypes
• Differing ratios trip up the mean
• But DWD is more robust
Mathematics behind this
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
2nd Paper on HDLSS Asymptotics
Background
In classical multivariate analysis the statistic
Is called the ldquoepsilon statisticrdquo
And is used to test ldquosphericityrdquo of distrsquon
ie ldquoare all covrsquonce eigenvalues the samerdquo
d
jj
d
jj
d1
2
2
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture

    X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 · I_d)

• Data vectors are independent of each other
• But entries of each have strong dependence
• However, can show entries have cov = 0
• Recall statistical folklore:
  Covariance = 0 does not imply Independence
0 Covariance is not independence

Simple Example:

• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (Note: not using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance

Given c > 0, define

    Y = X,    if |X| ≤ c
    Y = −X,   if |X| > c
0 Covariance is not independence

Simple Example: choose c to make cov(X, Y) = 0

• Distribution is degenerate
• Supported on the diagonal lines y = x and y = −x
• Not abs. cont. w.r.t. 2-d Lebesgue measure
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0
0 Covariance is not independence

Result: the joint distribution of X and Y
- Has Gaussian marginals
- Has cov(X, Y) = 0
- Yet strong dependence of X and Y
- Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian marginals
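The continuity argument above can be made concrete. A small sketch that evaluates cov(X, Y) in closed form for this construction and bisects for the root (the closed form 1 − 4cφ(c) − 4(1 − Φ(c)) follows from E[X²·1{|X| > c}] = 2cφ(c) + 2(1 − Φ(c)); the specific value of c found is illustrative):

```python
import math

def phi(x):   # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal cdf, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cov_xy(c):
    """cov(X, Y) for Y = X on {|X| <= c}, Y = -X on {|X| > c}, X ~ N(0,1):
    E[X^2 1{|X|<=c}] - E[X^2 1{|X|>c}] = 1 - 4*c*phi(c) - 4*(1 - Phi(c))."""
    return 1 - 4 * c * phi(c) - 4 * (1 - Phi(c))

# small c gives negative covariance, large c positive; bisect for the zero
lo, hi = 0.1, 3.0
for _ in range(80):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_star = (lo + hi) / 2   # c making cov(X, Y) = 0, roughly 1.54
```

For this c, X and Y are each N(0, 1) with zero covariance, yet Y is a deterministic function of X.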
HDLSS Asy's: Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version),
   Qiao et al (2010)
HDLSS Math. Stat. of PCA

Consistency & Strong Inconsistency
(Study properties of PCA in estimating eigen-directions & -values)
[Assume data are mean centered]

Spike Covariance Model, Paul (2007)

For eigenvalues:

    λ_1 = d^α,   λ_2 = ... = λ_d = 1

Note: critical parameter α

1st eigenvector: u_1 (turns out its direction doesn't matter)

How good are the empirical versions λ̂_1, ..., λ̂_d, û_1 as estimates?
HDLSS Math. Stat. of PCA

Consistency (big enough spike):

    For α > 1:  Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough):

    For α < 1:  Angle(û_1, u_1) → 90°

Intuition: random noise ~ d^{1/2}
(recall λ_1 = d^α is on the scale of variance)
• For α > 1: spike pops out of pure noise sphere
• For α < 1: spike contained in pure noise sphere
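The consistency / strong inconsistency dichotomy is easy to see in simulation. A minimal sketch under the spike model above (function name and the specific choices d = 1000, n = 20, α ∈ {1.5, 0.3} are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

def leading_angle(d, alpha, n=20):
    """Angle in degrees between the true first eigenvector e_1 and the sample
    first eigenvector, in the spike model lambda_1 = d**alpha, others = 1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)           # give coordinate 1 the spike variance
    S = X.T @ X / n                       # sample covariance (known zero mean)
    u1_hat = np.linalg.eigh(S)[1][:, -1]  # eigh: last column = top eigenvector
    cos = min(abs(u1_hat[0]), 1.0)        # true u_1 = e_1
    return float(np.degrees(np.arccos(cos)))

angle_consistent = leading_angle(d=1000, alpha=1.5)    # alpha > 1: near 0
angle_inconsistent = leading_angle(d=1000, alpha=0.3)  # alpha < 1: near 90
```

For α > 1 the spike dominates the aggregated noise and the sample direction locks on; for α < 1 the noise sphere swallows it and the estimate is essentially orthogonal to the truth.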
HDLSS Math. Stat. of PCA

Consistency of eigenvalues?

• Eigenvalues inconsistent:

    λ̂_1 / λ_1 →_L χ²_n / n

• But known distribution
• Consistent when n → ∞ as well
HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n & PCA Consist.?

John Kent example:

    X ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 · I_d)

Can only say

    ‖X‖ ≈ d^{1/2},       w.p. 1/2
    ‖X‖ ≈ 10 · d^{1/2},  w.p. 1/2

i.e. not deterministic

• PCA conditions same, since noise still O_p(d^{1/2})
• But for Geo Rep'n, need some mixing condition

Conclude: need some mixing condition
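The failure of a deterministic length in Kent's example is visible directly. A small sketch, assuming the 0.5 N(0, I_d) + 0.5 N(0, 100 I_d) mixture above (`kent_vector` and the sizes d = 20000, 200 draws are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def kent_vector(d):
    """One draw from Kent's normal scale mixture 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)."""
    scale = 10.0 if rng.random() < 0.5 else 1.0
    return scale * rng.standard_normal(d)

d = 20000
norms = np.array([np.linalg.norm(kent_vector(d)) / np.sqrt(d) for _ in range(200)])
# each scaled norm concentrates near 1 or near 10, chosen at random per vector,
# so ||X|| / sqrt(d) does not converge to a single deterministic constant
```

Within each mixture component the norm concentrates sharply, but which constant it concentrates at stays random, which is exactly why the deterministic geometric representation fails here.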
Mixing Conditions

Idea From Probability Theory:

Recall standard asymptotic results, as n → ∞:
• Law of Large Numbers
  ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored),
e.g. independent and identically distributed

Mixing conditions: explore weaker assumptions that still give
the Law of Large Numbers & Central Limit Theorem
Mixing Conditions

• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better, newer references also exist
Mixing Conditions

Mixing Condition Used Here: ρ-Mixing

For random variables {X_j}, define

    ρ(k) = sup |corr(f, g)|,

where the sup is over f and g with finite variance, measurable w.r.t.
the sigma-fields generated by {X_j : j ≤ t} and {X_j : j ≥ t + k},
respectively (note the gap of lag k)

Assume: ρ(k) → 0 as k → ∞

Idea: uncorrelated at far lags
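The "uncorrelated at far lags" idea can be illustrated with an AR(1) coordinate sequence, a standard example of a ρ-mixing process (the AR coefficient 0.7 and sample sizes are illustrative choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(1)

# Entries of one "data vector" following an AR(1) in the coordinate index j;
# its lag-k correlation is a**k, which dies off geometrically with the lag.
d, a = 50000, 0.7
z = rng.standard_normal(d)
x = np.empty(d)
x[0] = z[0]
for j in range(1, d):
    x[j] = a * x[j - 1] + np.sqrt(1 - a * a) * z[j]

def lag_corr(x, k):
    """Sample correlation between entries k apart."""
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

r1, r20 = lag_corr(x, 1), lag_corr(x, 20)   # ~0.7 at lag 1, ~0 at lag 20
```

The decaying lag correlation is the empirical signature that the mixing coefficient ρ(k) vanishes as the gap k grows.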
HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n

Hall, Marron and Neeman (2005):

Assume the entries of the data vectors X = (X_1, X_2, ..., X_d)′ are ρ-mixing

Drawback: strong assumption
(In JRSS-B, since Biometrika refused)
HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n

Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of
time ordering, which is not always clear, e.g. for microarrays
HDLSS Math. Stat. of PCA

Conditions for Geo Rep'n

Condition from Jung & Marron (2009):

    X_d ~ (0_d, Σ_d),  where Σ_d = U_d Λ_d U_d^t

(Note: not necessarily Gaussian)

Define the standardized version:

    Z_d = Λ_d^{−1/2} U_d^t X_d

Assume ∃ a permutation of the entries so that Z_d is ρ-mixing
HDLSS Math. Stat. of PCA

Careful look at PCA consistency for the α > 1 spike
(reality check suggested by a reviewer)

The condition α > 1 is independent of sample size,
so consistency holds even for n = 1 (?!?)

Reviewer's conclusion: absurd, shows the
assumption is too strong for practice
HDLSS Math. Stat. of PCA

[Figure: HDLSS PCA often finds signal, not pure noise]

Recall RNAseq data from 8/23/12: d ~ 1700, n = 180

[Figure: manually brushed clusters show clear alternate splicing, not noise]
Functional Data Analysis
HDLSS Math. Stat. of PCA

Recall theoretical separation:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically driven conclusion:
real data signals are this strong
HDLSS Math. Stat. of PCA

An Interesting Objection:
should not study angles in PCA

Recall, for consistency (α > 1):

    Angle(û_1, u_1) → 0

For strong inconsistency (α < 1):

    Angle(û_1, u_1) → 90°

Because PC scores (i.e. projections) are not consistent:

For scores  ŝ_{j,i} = P_{v̂_j} x_i  and  s_{j,i} = P_{v_j} x_i
(what we study in PCA scatterplots)

Can show:

    ŝ_{j,i} / s_{j,i} → R_j ≠ 1   (random)

Thanks to Dan Shen
HDLSS Math. Stat. of PCA

PC scores (i.e. projections) are not consistent,
so how can PCA find useful signals in data?

Key is "proportional errors":

    ŝ_{j,i} / s_{j,i} → R_j,  the same realization R_j for all i

Axes have inconsistent scales,
but relationships are still useful
HDLSS Deep Open Problem

In PCA consistency:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

Result: ∃ interesting limit dist'ns, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall: flexibility from the kernel embedding idea
HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer, El Karoui (2010):
• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers

Implications for DWD:
• Recall main advantage is for high d
• So not clear embedding helps
• Thus not yet implemented in DWD
HDLSS Additional Results

Batch Adjustment (Xuxin Liu)

Recall intuition from above:
• Key is sizes of biological subtypes
• Differing ratios trip up the mean
• But DWD is more robust

Mathematics behind this: the HDLSS asymptotic results above
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
d
jj
d
jj
d1
2
2
1
11d
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
d
jj
d
jj
d1
2
2
1
11d1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
d
jj
d
jj
d1
2
2
1
11d1
d
1
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence

Simple Example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
  (note: not using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance

Given c > 0, define
  Y = X,   if |X| ≤ c
  Y = −X,  if |X| > c
and choose c to make cov(X, Y) = 0.

• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue measure
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, Ǝ c with cov(X, Y) = 0
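The zero-covariance c can be pinned down numerically. A sketch under the construction above (the bisection tolerances and Monte Carlo size are illustrative): cov(X, Y) = E[X²; |X| ≤ c] − E[X²; |X| > c], which rises from −1 toward +1 as c grows, so a root exists.

```python
import math
import numpy as np

def cov_xy(c):
    # cov(X, Y) = E[X^2; |X|<=c] - E[X^2; |X|>c] for X ~ N(0, 1)
    phi_c = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    inside = math.erf(c / math.sqrt(2)) - 2 * c * phi_c  # E[X^2; |X|<=c]
    return 2 * inside - 1

lo, hi = 0.0, 5.0  # cov is -1 at c = 0, -> +1 as c -> infinity
for _ in range(60):  # bisection for the root
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cov_xy(mid) < 0 else (lo, mid)
c = (lo + hi) / 2

# Monte Carlo check: Y has the N(0,1) marginal, |Y| = |X| exactly
# (strong dependence), yet the sample covariance is ~ 0
rng = np.random.default_rng(1)
X = rng.standard_normal(200000)
Y = np.where(np.abs(X) <= c, X, -X)  # flip the tails
print(c, float(np.mean(X * Y)))
```

The root lands near c ≈ 1.54; since |Y| = |X| exactly, the pair is as dependent as possible in absolute value while still uncorrelated.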
0 Covariance is not independence

Result: the joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian

Shows: multivariate Gaussian means more
than Gaussian marginals
HDLSS Asy's: Geometrical Representation
Further Consequences of the Geometric Representation

1. DWD more stable than SVM
   (based on deeper limiting distributions)
   (reflects the intuitive idea of feeling sampling variation,
   something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified
   Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes
   (motivates the weighted version)
   Qiao et al. (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study properties of PCA
in estimating eigen-directions & -values)
[Assume data are mean centered]

Spike Covariance Model, Paul (2007)
For eigenvalues:
  λ_1(d) = d^α,  λ_2(d) = ⋯ = λ_d(d) = 1
Note critical parameter: α
1st eigenvector: u_1
(turns out: direction doesn't matter)
How good are the empirical versions
  λ̂_1(d), …, λ̂_d(d), û_1
as estimates?
HDLSS Math Stat of PCA

Consistency (big enough spike):
  For α > 1,  Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough):
  For α < 1,  Angle(û_1, u_1) → 90°
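The α > 1 vs. α < 1 dichotomy shows up clearly in simulation. A sketch of the spike model (taking u_1 = e_1, λ_1 = d^α, all other eigenvalues 1; the particular n, d, and seed are illustrative):

```python
import numpy as np

def angle_to_truth(d, alpha, n=20, seed=2):
    # Spike model: lambda_1 = d^alpha along u_1 = e_1; noise elsewhere
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)  # first coordinate gets sd d^(alpha/2)
    # leading eigenvector of the sample covariance, via SVD of X
    u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
    cos = min(abs(float(u1_hat[0])), 1.0)  # |<u1_hat, e_1>|
    return float(np.degrees(np.arccos(cos)))

consistent = angle_to_truth(20000, alpha=1.5)    # alpha > 1: angle -> 0
inconsistent = angle_to_truth(20000, alpha=0.5)  # alpha < 1: angle -> 90
print(consistent, inconsistent)
```

Even at d = 20000 with n = 20, the α = 1.5 angle is a few degrees while the α = 0.5 angle is already well on its way to 90°.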
HDLSS Math Stat of PCA

Intuition: random noise ~ d^{1/2}
For α > 1 (recall d^α is on the scale of variance):
  spike pops out of the pure noise sphere
For α < 1:
  spike contained in the pure noise sphere
HDLSS Math Stat of PCA

Consistency of eigenvalues:
  λ̂_1 / λ_1 →_L χ²_n / n,  as d → ∞
• Eigenvalues inconsistent (for fixed n)
• But known distribution
• Consistent when n → ∞ as well
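The fixed-n limit λ̂_1 / λ_1 ≈ χ²_n / n can be seen numerically. A sketch (replication count, d, and α are illustrative): the ratio should have mean ≈ 1 and variance ≈ 2/n, regardless of how large d is.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, alpha = 10, 4000, 1.5
lam1 = float(d) ** alpha  # the spike eigenvalue
ratios = []
for _ in range(300):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)  # inject the d^alpha spike along e_1
    top_sv = np.linalg.svd(X, compute_uv=False)[0]
    lam1_hat = top_sv**2 / n  # top eigenvalue of sample covariance X'X / n
    ratios.append(lam1_hat / lam1)
ratios = np.array(ratios)
print(ratios.mean(), ratios.var())  # chi2_n / n has mean 1, variance 2/n
```

With n = 10 the variance should hover near 2/n = 0.2, which is why the estimate is inconsistent for fixed n but consistent when n grows too.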
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example:
  X_d ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
Can only say ‖X_d‖ = O_p(d^{1/2}):
  ‖X_d‖ ≈ d^{1/2} w.p. 1/2,  ≈ 10 d^{1/2} w.p. 1/2,
not deterministic.
PCA conditions: the same, since the noise is still O_p(d^{1/2}).
But for the Geo Rep'n, need some mixing condition.

Conclude: need some mixing condition
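Kent's example makes the limit of ‖X_d‖ / d^{1/2} genuinely random, which a quick simulation shows (dimension and seed are illustrative): every realized norm sits near 1 or near 10 on the d^{1/2} scale, with nothing in between.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 100000, 12
# mixture component per vector: scale 1 or 10, each w.p. 1/2
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = rng.standard_normal((n, d)) * scale[:, None]
norms = np.linalg.norm(X, axis=1) / np.sqrt(d)  # ||X_d|| / d^{1/2}
print(np.round(norms, 3))  # each is ~1 or ~10, so the limit is random
```

Each norm concentrates tightly around its own (random) scale, so no single deterministic sphere captures the data.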
Mixing Conditions

Idea from probability theory:
Recall standard asymptotic results, as n → ∞:
• Law of Large Numbers
  ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions
(usually ignored!),
e.g. independent and identically distributed.
Mixing conditions explore weaker assumptions
that still give the LLN and the CLT.
Mixing Conditions
• A whole area in probability theory
• Ǝ a large literature
• A comprehensive reference:
  Bradley (2005, update of 1986 version)
• Better: newer references
Mixing Conditions

Mixing condition used here: ρ-mixing
For random variables X_1, X_2, …, define
  ρ(k) = sup |corr(f, g)|,
where the sup is over f, g with
  f measurable w.r.t. the sigma-field generated by X_1, …, X_j,
  g measurable w.r.t. the sigma-field generated by X_{j+k}, X_{j+k+1}, …
(note the gap of lag k).
Assume: ρ(k) → 0 as k → ∞
Idea: uncorrelated at far lags
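For a concrete ρ-mixing example, consider a Gaussian AR(1) sequence: for jointly Gaussian variables the maximal correlation reduces to the ordinary correlation, which here decays geometrically in the lag. A sketch (the coefficient, length, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
phi, N = 0.7, 100000
eps = rng.standard_normal(N)
x = np.empty(N)
x[0] = eps[0] / np.sqrt(1 - phi**2)  # start in stationarity
for t in range(1, N):
    x[t] = phi * x[t - 1] + eps[t]   # AR(1): corr at lag k is phi^k

# sample lag-k correlations vs. the geometric decay phi^k
corrs = {k: float(np.corrcoef(x[:-k], x[k:])[0, 1]) for k in (1, 3, 6)}
print({k: round(v, 3) for k, v in corrs.items()}, [round(phi**k, 3) for k in (1, 3, 6)])
```

The decay of corr at far lags is exactly the "uncorrelated at far lags" idea behind ρ(k) → 0.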
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume the entries of the data vectors
  X = (X_1, X_2, …, X_d)^t
are ρ-mixing.
Drawback: strong assumption
(in JRSS-B, since Biometrika refused!)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010); Yata & Aoshima (2012)
  (fully covariance based, no mixing)
Tricky point: classical mixing conditions
require a notion of time ordering,
not always clear, e.g. for microarrays.
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
  X_d ~ (0, Σ_d),  where Σ_d = U_d Λ_d U_d^t
(note: not necessarily Gaussian)
Define the standardized version
  Z_d = Λ_d^{-1/2} U_d^t X_d
Assume Ǝ a permutation of the entries
so that the permuted Z_d is ρ-mixing.
HDLSS Math Stat of PCA

Careful look at PCA consistency, d^α spike, α > 1
(reality check suggested by a reviewer):
The result is independent of sample size,
so it is true even for n = 1 (!?)
Reviewer's conclusion: absurd; shows the
assumption is too strong for practice.
HDLSS Math Stat of PCA

[Figure: HDLSS PCA often finds signal, not pure noise]

[Figure: recall RNAseq data from 8/23/12, d ~ 1700, n = 180;
manually brushed clusters show clear alternate splicing, not noise
(Functional Data Analysis)]
HDLSS Math Stat of PCA

Recall theoretical separation:
• Strong Inconsistency: d^α spike, α < 1
• Consistency: d^α spike, α > 1
Mathematically driven conclusion:
real data signals are this strong!
HDLSS Math Stat of PCA

An interesting objection:
should not study angles in PCA.
Recall, for consistency (α > 1):
  Angle(û_1, u_1) → 0
and for strong inconsistency (α < 1):
  Angle(û_1, u_1) → 90°
Because PC scores (i.e. projections) are not consistent:
for the scores
  ŝ_{ij} = P_{v̂_j} x_i   and   s_{ij} = P_{v_j} x_i
(what we study in PCA scatterplots),
can show  ŝ_{ij} / s_{ij} → R_j ≠ 1  (random).
Thanks to Dan Shen.
HDLSS Math Stat of PCA

PC scores (i.e. projections) not consistent,
so how can PCA find useful signals in data?
Key is "proportional errors":
  ŝ_{ij} / s_{ij} → R_j ≠ 1,
with the same realization R_j for i = 1, …, n.
Axes have inconsistent scales,
but relationships are still useful.
HDLSS Deep Open Problem

In PCA consistency:
• Strong Inconsistency: d^α spike, α < 1
• Consistency: d^α spike, α > 1
What happens at the boundary (α = 1)?
Result: Ǝ interesting limit dist'ns,
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall the flexibility from the kernel embedding idea.

Interesting question: behavior in very high dimension?
Answer, El Karoui (2010):
• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers

Implications for DWD:
recall the main advantage is for high d,
so it is not clear embedding helps.
Thus not yet implemented in DWD.
HDLSS Additional Results

Batch adjustment: Xuxin Liu
Recall the intuition from above:
key is the sizes of the biological subtypes.
Differing ratios trip up the mean,
but DWD is more robust.
Mathematics behind this:
2nd Paper on HDLSS Asymptotics

Key assumption: the epsilon statistic
  ε_d = Σ_j λ_j² / (Σ_j λ_j)²
satisfies ε_d → 0.
• For the spherical Normal, ε_d = 1/d → 0
• A single dominant eigenvalue gives ε_d → 1 (ruled out)
• So the assumption is very mild
• Much weaker than mixing conditions
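Assuming the ε statistic has the form ε_d = Σ_j λ_j² / (Σ_j λ_j)², consistent with the two special cases on the slide, both claims are immediate to verify numerically (the dimension and the λ_1 = d² choice below are illustrative):

```python
import numpy as np

def eps_stat(lams):
    # epsilon statistic of an eigenvalue sequence (assumed form)
    lams = np.asarray(lams, dtype=float)
    return float((lams**2).sum() / lams.sum() ** 2)

d = 1000
spherical = np.ones(d)                 # all eigenvalues equal: eps = 1/d
spiked = np.r_[d**2, np.ones(d - 1)]   # one dominant eigenvalue: eps -> 1
print(eps_stat(spherical), eps_stat(spiked))
```

The spherical case gives exactly 1/d, and the dominant-spike case pushes ε_d toward 1, matching the slide's two bullet points.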
2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007):
• Assume 2nd moments
• Assume no eigenvalues too large: ε_d → 0
Then ‖X_i − X_j‖² = d · O_p(1), and for the
standardized version, ‖Z_1 − Z_2‖² = 2d (1 + o_p(1)),
not so strong as before.

Can we improve to ‖X_i − X_j‖² = C d (1 + o_p(1)),
with deterministic C?
John Kent example, normal scale mixture:
  X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)
Won't get it: the limiting scale is random.
3rd Paper on HDLSS Asymptotics

Get the geometrical representation using:
• a 4th moment assumption
• a stronger covariance matrix (only) assumption
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
2nd Paper on HDLSS Asymptotics
Can show epsilon statistic
Satisfies
bull For spherical Normal
bull Single extreme eigenvalue gives
bull So assumption is very mild
bull Much weaker than mixing conditions
d
jj
d
jj
d1
2
2
1
11d
1 d
1
d
1
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
1 d
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
Spike Covariance Model, Paul (2007):
For Eigenvalues: λ_1 = d^α, λ_2 = ⋯ = λ_d = 1, α > 0
Note Critical Parameter: α
1st Eigenvector: u_1 (Turns out: Direction Doesn't Matter)
How Good are Empirical Versions λ̂_1, …, λ̂_d, û_1 as Estimates?
HDLSS Math Stat of PCA
Consistency (big enough spike): For α > 1, Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough): For α < 1, Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA
Intuition: Random Noise ~ d^{1/2}
For α > 1 (Recall α is on the Scale of Variance): Spike Pops Out of Pure Noise Sphere
For α < 1: Spike Contained in Pure Noise Sphere
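The consistency / strong inconsistency dichotomy is easy to see numerically. The sketch below (my own illustration; the sample size, dimension, and thresholds are illustrative choices, not from the slides) simulates the spike model λ_1 = d^α, λ_2 = ⋯ = λ_d = 1 and measures Angle(û_1, u_1) for a spike above and below the critical value α = 1:

```python
import numpy as np

def angle_to_spike(d, alpha, n=20, seed=0):
    """Angle (degrees) between the true and empirical first eigenvector
    in the spike model lambda_1 = d^alpha, lambda_2 = ... = lambda_d = 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)          # spike along first coordinate, u_1 = e_1
    # dual trick: get top eigenvector of X^t X from the small n x n matrix X X^t
    G = X @ X.T / n
    w, V = np.linalg.eigh(G)             # eigh returns ascending eigenvalues
    u_hat = X.T @ V[:, -1]               # top eigenvector of sample covariance
    u_hat /= np.linalg.norm(u_hat)
    cos = abs(u_hat[0])                  # |<u_hat, e_1>|
    return np.degrees(np.arccos(min(cos, 1.0)))

angle_big = angle_to_spike(d=5000, alpha=1.5)    # big spike: angle near 0
angle_small = angle_to_spike(d=5000, alpha=0.5)  # small spike: angle near 90
```

Even at d = 5000 the separation is stark: the α = 1.5 spike is recovered almost exactly, while the α = 0.5 empirical direction is nearly orthogonal to the truth.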
HDLSS Math Stat of PCA
Consistency of Eigenvalues?
For α > 1: λ̂_1 / λ_1 →_L χ²_n / n, as d → ∞
Eigenvalues Inconsistent (limit is random, not 1)
But Known Distribution
Consistent when n → ∞ as Well
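The χ²_n / n limit for λ̂_1 / λ_1 can be checked by simulation. A rough sketch under the same spike model (my own; the dimensions and repetition count are illustrative assumptions), using that χ²_n / n has mean 1 and variance 2/n:

```python
import numpy as np

def eig_ratio(d=2000, alpha=1.5, n=10, reps=200, seed=1):
    """Simulate lambda_hat_1 / lambda_1 in the spike model; for alpha > 1
    this should be approximately chi^2_n / n (mean 1, variance 2/n)."""
    rng = np.random.default_rng(seed)
    lam1 = d ** alpha
    out = []
    for _ in range(reps):
        X = rng.standard_normal((n, d))
        X[:, 0] *= np.sqrt(lam1)                      # spike coordinate
        # top eigenvalue of sample covariance X^t X / n via the dual matrix
        top = np.linalg.eigvalsh(X @ X.T / n)[-1]
        out.append(top / lam1)
    return np.array(out)

ratios = eig_ratio()
mean_ratio = ratios.mean()   # near 1, as for chi^2_n / n
var_ratio = ratios.var()     # near 2 / n = 0.2
```

The ratio is not close to 1 in any single realization, matching "inconsistent, but with a known distribution."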
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consist.?
John Kent example: X ~ (1/2) N_d(0, 100 I_d) + (1/2) N_d(0, I_d)
Can only say ‖X‖ = O_p(d^{1/2}):
‖X‖ ≈ 10 d^{1/2} w.p. 1/2, ‖X‖ ≈ d^{1/2} w.p. 1/2, not deterministic
PCA Conditions Same, since Noise Still O_p(d^{1/2})
But for Geo Rep'n, need some Mixing Condition
Conclude: Need some Mixing Condition
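Kent's point shows up directly in simulation: the scaled norm ‖X‖ / d^{1/2} is bimodal at 10 and 1, never in between, so it does not converge to a single deterministic constant. A minimal sketch (my own; sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10_000, 400
# Kent's normal scale mixture: variance 100 w.p. 1/2, variance 1 w.p. 1/2
scale = np.where(rng.random(n) < 0.5, 10.0, 1.0)
X = scale[:, None] * rng.standard_normal((n, d))

r = np.linalg.norm(X, axis=1) / np.sqrt(d)   # concentrates near 10 or near 1
big = r > 5
frac_big = big.mean()                        # about 1/2 of the vectors
```

Each vector still concentrates tightly on one of the two spheres, which is why the PCA conditions are unaffected while the single-sphere geometric representation fails.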
Mixing Conditions
Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
Central Limit Theorem
Both have Technical Assumptions (Usually Ignored!), E.g. Independent and Identically Dist'd
Mixing Conditions: Explore Weaker Assumptions that Still Give
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
• A Whole Area in Probability Theory
• A Large Literature
• A Comprehensive Reference: Bradley (2005 update of 1986 version)
• Better: Newer References
Mixing Conditions
Mixing Condition Used Here: Rho-Mixing
For Random Variables X_1, X_2, …, Define:
ρ(k) = sup_j ρ(σ(X_1, …, X_j), σ(X_{j+k}, X_{j+k+1}, …))
Where ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) },
For the Sigma-Fields Generated by the Variables; Note Gap of Lag k
Assume ρ(k) → 0, as k → ∞
Idea: Uncorrelated at Far Lags
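The "uncorrelated at far lags" idea can be illustrated on the simplest mixing-friendly process, an AR(1) sequence, whose lag-k correlation decays like φ^k. A small sketch (my own example, not from the slides; the empirical lag correlation is only a proxy for the sigma-field supremum in the definition):

```python
import numpy as np

rng = np.random.default_rng(3)
T, phi = 100_000, 0.8

# AR(1): X_t = phi * X_{t-1} + eps_t; corr(X_t, X_{t+k}) = phi^k -> 0
eps = rng.standard_normal(T)
x = np.empty(T)
x[0] = eps[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(z, k):
    """Empirical correlation between the series and its lag-k shift."""
    return np.corrcoef(z[:-k], z[k:])[0, 1]

c1 = lag_corr(x, 1)     # near phi = 0.8
c20 = lag_corr(x, 20)   # near phi**20, essentially 0
```

Geometric decay of the lag correlations is exactly the behavior ρ-mixing formalizes at the level of whole past and future sigma-fields.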
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005): Assume Entries of Data Vectors X = (X_1, X_2, …, X_d)ᵗ Are ρ-mixing
Drawback: Strong Assumption (In JRSS-B, since Biometrika Refused!)
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Condition from Jung & Marron (2009): X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ (Note: Not Necessarily Gaussian)
Define Standardized Version: Z_d = Λ_d^{-1/2} U_dᵗ X_d
Assume Ǝ a permutation of the d entries,
So that Z_d is ρ-mixing
HDLSS Math Stat of PCA
Careful look at: PCA Consistency - α > 1 spike
(Reality Check, Suggested by Reviewer)
Condition is Independent of Sample Size,
So true even for n = 1 (?!?)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice
HDLSS Math Stat of PCA
[Scatterplot: HDLSS PCA Often Finds Signal, Not Pure Noise]
HDLSS Math Stat of PCA
[Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180;
Manually Brushed Clusters Show Clear Alternate Splicing, Not Noise]
HDLSS Math Stat of PCA
Recall Theoretical Separation:
Strong Inconsistency - α < 1 spike
Consistency - α > 1 spike
Mathematically Driven Conclusion: Real Data Signals Are This Strong
HDLSS Math Stat of PCA
An Interesting Objection: Should not Study Angles in PCA
Recall for Consistency: α > 1, Angle(û_1, u_1) → 0
For Strong Inconsistency: α < 1, Angle(û_1, u_1) → 90°
Because PC Scores (i.e. projections) Not Consistent:
For Scores ŝ_ij = P_{v̂_j} x_i (What we study in PCA scatterplots) and s_ij = P_{v_j} x_i,
Can Show: ŝ_ij / s_ij → R_j ≠ 1 (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC Scores (i.e. projections) Not Consistent:
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors": ŝ_ij / s_ij → R_j,
with the Same Realization of R_j for i = 1, …, n
Axes have Inconsistent Scales,
But Relationships are Still Useful
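The "proportional errors" claim — the score ratio being essentially one common factor across all cases i — can be seen in a quick spike-model simulation. This is my own sketch: I use unit-variance (eigenvalue-normalized) scores, a choice of mine, under which the common factor is random and typically differs from 1; sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, alpha = 10, 4000, 2.0
lam1 = d ** alpha
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)                 # true direction v_1 = e_1

# empirical top eigenpair via the dual n x n problem
w, V = np.linalg.eigh(X @ X.T / n)
lam1_hat = w[-1]
v_hat = X.T @ V[:, -1]
v_hat /= np.linalg.norm(v_hat)

# unit-variance scores: projection divided by sqrt(eigenvalue)
s_true = X[:, 0] / np.sqrt(lam1)
s_hat = (X @ v_hat) / np.sqrt(lam1_hat)
if s_hat @ s_true < 0:                   # fix the arbitrary eigenvector sign
    s_hat = -s_hat

ratio = s_hat / s_true                   # one common random factor R_1
spread = ratio.std() / abs(ratio.mean()) # near 0: same realization for every i
```

All n ratios agree to high accuracy, so the scatterplot is a (randomly) rescaled version of the truth: inconsistent axes, useful relationships.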
HDLSS Deep Open Problem
In PCA Consistency:
Strong Inconsistency - α < 1 spike
Consistency - α > 1 spike
What happens at boundary (α = 1)?
Result: Ǝ interesting Limit Dist'ns, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
[Recall Flexibility From Kernel Embedding Idea]
HDLSS Asymptotics & Kernel Methods
Interesting Question: Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD: Recall Main Advantage is for High d
So not Clear Embedding Helps,
Thus not yet Implemented in DWD
HDLSS Additional Results
Batch Adjustment: Xuxin Liu
Recall Intuition from above:
Key is sizes of biological subtypes;
Differing ratio trips up mean,
But DWD more robust
Mathematics behind this?
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007):
Assume 2nd Moments,
Assume no eigenvalues too large: λ_1 = o(d)
Then: X_iᵗ X_j = o_p(d)
Not so strong as before: Z_1ᵗ Z_2 = O_p(d^{1/2})
2nd Paper on HDLSS Asymptotics
Can we improve on X_iᵗ X_j = o_p(d)?
John Kent example, Normal scale mixture:
X_i ~ 0.5 N_d(0, 100 I_d) + 0.5 N_d(0, I_d)
Won't get X_iᵗ X_j = C_ij d + o_p(d) with deterministic C_ij (norms are random)
3rd Paper on HDLSS Asymptotics
Yata & Aoshima (2012): Get Geometrical Representation using
• 4th Moment Assumption
• Stronger Covariance Matrix (only) Assumption
2nd Paper on HDLSS Asymptotics
Notes on Kent's Normal Scale Mixture: X_i ~ 0.5 N_d(0, 100 I_d) + 0.5 N_d(0, I_d)
• Data Vectors are independent of each other
• But entries of each have strong dependence
• However, can show entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply Independence
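The "cov = 0 yet strongly dependent" property of the entries is easy to verify by simulation: within one mixture vector, two entries share the same random scale, so they are uncorrelated but their squares are not. A minimal sketch (my own; sample size is illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200_000
# first two entries of N independent Kent vectors: X = s * (W1, W2),
# with shared scale s = 10 or 1, each w.p. 1/2
s = np.where(rng.random(N) < 0.5, 10.0, 1.0)
W = rng.standard_normal((N, 2))
X1, X2 = s * W[:, 0], s * W[:, 1]

corr_raw = np.corrcoef(X1, X2)[0, 1]        # near 0: entries uncorrelated
corr_sq = np.corrcoef(X1**2, X2**2)[0, 1]   # clearly positive: dependent
```

The shared scale makes the magnitudes move together even though the signed entries are uncorrelated, which is exactly the dependence the mixing conditions are meant to rule out.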
0 Covariance is not independence
Simple Example:
• Random Variables X, Y ~ N(0, 1)
• Make both Gaussian (Note: Not Using the Multivariate Gaussian)
• With strong dependence, Yet 0 covariance
Given c > 0, define: Y = X when |X| ≤ c, Y = -X when |X| > c
(Symmetry gives Y ~ N(0, 1))
Choose c to make cov(X, Y) = 0
0 Covariance is not independence
Simple Example:
• Distribution is degenerate
• Supported on the diagonal lines y = ±x
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, Ǝ c with cov(X, Y) = 0
0 Covariance is not independence
Result:
• Joint distribution of X and Y:
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian
Shows: Multivariate Gaussian means more than Gaussian Marginals
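The construction can be carried out numerically: bisect on c until the sample covariance crosses zero (the root lands near c ≈ 1.5), then check that Y is marginally standard normal and uncorrelated with X, yet completely determined by X. A sketch under these assumptions (my own implementation of the slide's example):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 400_000
X = rng.standard_normal(N)

def make_Y(X, c):
    """Y = X on |X| <= c, Y = -X on |X| > c; marginally N(0,1) by symmetry."""
    return np.where(np.abs(X) <= c, X, -X)

# cov(X, Y) is negative for small c, positive for large c: bisect for the root
lo, hi = 0.1, 3.0
for _ in range(60):
    c = 0.5 * (lo + hi)
    if np.mean(X * make_Y(X, c)) > 0:
        hi = c
    else:
        lo = c
Y = make_Y(X, c)

cov_xy = np.mean(X * Y)                     # near 0 by construction
same_abs = np.all(np.abs(X) == np.abs(Y))   # True: Y is a function of X
```

So (X, Y) has Gaussian marginals and zero covariance, but |Y| = |X| always: maximal dependence, hence not multivariate Gaussian.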
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
2nd Paper on HDLSS Asymptotics
Ahn Marron Muller amp Chi (2007) Assume 2nd Moments
Assume no eigenvalues too large
Then
Not so strong as before
1 d
dOXX pji )1(
)1(221 pOdZZ
2nd Paper on HDLSS Asymptotics
Can we improve on
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Mixing Conditions

Idea from probability theory:
Recall the standard asymptotic results as $n \to \infty$:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!), e.g. independent and identically distributed.
Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.
• A whole area in probability theory, with a large literature.
• A comprehensive reference: Bradley (2005, update of 1986 version).
• Better newer references are available.
Mixing Conditions

Mixing condition used here: $\rho$-mixing.
For random variables $X_1, X_2, \ldots$, define
$\rho(k) = \sup_{i}\, \sup \left\{ \left| \mathrm{corr}(f, g) \right| : f \in L^2(\mathcal{F}_1^i),\ g \in L^2(\mathcal{F}_{i+k}^\infty) \right\}$
where $\mathcal{F}_1^i$ and $\mathcal{F}_{i+k}^\infty$ are the sigma-fields generated by $X_1, \ldots, X_i$ and by $X_{i+k}, X_{i+k+1}, \ldots$
(Note the gap of lag $k$.)
Assume: $\rho(k) \to 0$ as $k \to \infty$.
Idea: uncorrelated at far lags.
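The "uncorrelated at far lags" idea can be seen on a familiar example. A stationary AR(1) process is a standard case where $\rho$-mixing holds with geometric rate; the sketch below (coefficient, length, and seed are arbitrary) just shows its lag correlations decaying:

```python
import numpy as np

# Stationary AR(1): X_t = phi * X_{t-1} + e_t, so corr(X_t, X_{t+k}) = phi^k,
# which decays to 0 - the series decorrelates at far lags.
rng = np.random.default_rng(3)
phi, T = 0.8, 200_000

e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

corrs = [lag_corr(x, k) for k in (1, 5, 20)]
print([round(c, 3) for c in corrs])   # roughly phi**1, phi**5, phi**20
```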
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005): assume the entries $X_1, X_2, \ldots, X_d$ of the data vectors are $\rho$-mixing.
Drawback: a strong assumption.
(Published in JRSS-B, since Biometrika refused.)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)
Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
$X_d \sim (0_d, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$ (note: not assumed Gaussian).
Define the standardized version $Z_d = \Lambda_d^{-1/2} U_d^t X_d$.
Assume $\exists$ a permutation of the entries so that $Z_d$ is $\rho$-mixing.
HDLSS Math Stat of PCA

Careful look at PCA consistency ($\alpha > 1$ spike):
(Reality check suggested by a reviewer.)
The condition is independent of sample size, so consistency holds even for $n = 1$ (!?).
Reviewer's conclusion: absurd; shows the assumption is too strong for practice.
HDLSS Math Stat of PCA

Yet HDLSS PCA often finds signal, not pure noise.
Recall the RNAseq data from 8/23/12: $d \approx 1700$, $n = 180$.
Manually brushed clusters showed clear alternate splicing, not noise.
Functional Data Analysis

Recall the theoretical separation:
• Strong inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
Mathematically driven conclusion: real data signals are this strong.
HDLSS Math Stat of PCA

An interesting objection: should not study angles in PCA.
Recall: for consistency ($\alpha > 1$), $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$;
for strong inconsistency ($\alpha < 1$), $\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$.
The objection: PC scores (i.e. projections) are not consistent.
For the scores $\hat{s}_{j,i} = P_{\hat{v}_j} x_i$ (what we study in PCA scatterplots) and $s_{j,i} = P_{v_j} x_i$,
can show $\hat{s}_{j,i} / s_{j,i} \to R_j \neq 1$ (random).
Thanks to Dan Shen.
HDLSS Math Stat of PCA

PC scores (i.e. projections) are not consistent,
so how can PCA find useful signals in data?
Key is "proportional errors": $\hat{s}_{j,i} / s_{j,i} \to R_j \neq 1$,
with the same realization of $R_j$ for $i = 1, \ldots, n$.
Axes have inconsistent scales, but relationships are still useful.
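The "proportional errors" point can be illustrated numerically: even without consistent scores, the empirical scores stay essentially proportional to the true scores across the sample, so scatterplot relationships survive. A sketch under an assumed spike setup (sizes, $\alpha$, and seed are illustrative, not from the slides):

```python
import numpy as np

# Spike model lambda_1 = d^alpha; compare true scores (projections onto u_1)
# with empirical scores (projections onto the estimated u_1_hat).
rng = np.random.default_rng(5)
d, n, alpha = 20_000, 20, 1.5

u1 = np.zeros(d)
u1[0] = 1.0                                      # true first eigenvector
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(d ** alpha)                   # inject the spike

u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]   # empirical eigenvector

s = X @ u1                                       # true scores
s_hat = X @ u1_hat                               # empirical scores

r = float(np.corrcoef(s, s_hat)[0, 1])
print(round(abs(r), 4))                          # near 1: common multiple
```

The absolute correlation is used because the sign of an estimated eigenvector is arbitrary.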
HDLSS Deep Open Problem

In PCA consistency:
• Strong inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
What happens at the boundary ($\alpha = 1$)?
Result: $\exists$ interesting limit distributions, Jung, Sen & Marron (2012).
HDLSS Asymptotics & Kernel Methods

Recall the flexibility gained from the kernel embedding idea.
Interesting question: behavior in very high dimension?
Answer: El Karoui (2010):
• In the random matrix limit, kernel embedded classifiers ~ linear classifiers.
Implications for DWD: recall its main advantage is for high $d$,
so it is not clear that embedding helps;
thus kernel embedding is not yet implemented in DWD.
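El Karoui's phenomenon can be glimpsed with a small simulation. This is an illustration only, with assumed choices throughout (data scaled to the noise sphere, as the geometric representation suggests, and bandwidth $2d$): the off-diagonal entries of a Gaussian (RBF) kernel matrix become a nearly linear function of the linear kernel, so kernel classifiers behave like linear ones.

```python
import numpy as np

# In high dimension, squared distances concentrate: ||x_i - x_j||^2 ~ 2d - 2<x_i, x_j>,
# so exp(-||x_i - x_j||^2 / (2d)) is a smooth monotone (nearly affine) function
# of the linear kernel <x_i, x_j>.
rng = np.random.default_rng(6)
d, n = 2_000, 60

X = rng.standard_normal((n, d))
X = X / np.linalg.norm(X, axis=1, keepdims=True) * np.sqrt(d)   # radius d^{1/2}

G = X @ X.T                                              # linear kernel (Gram matrix)
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G   # squared distances
K = np.exp(-sq / (2 * d))                                # RBF kernel

iu = np.triu_indices(n, k=1)                             # off-diagonal pairs only
r = float(np.corrcoef(G[iu], K[iu])[0, 1])
print(round(r, 4))                                       # near 1: essentially linear
```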
HDLSS Additional Results

Batch adjustment (Xuxin Liu):
Recall the intuition from above: the key is the sizes of the biological subtypes.
A differing ratio trips up the mean, but DWD is more robust.
The mathematics behind this:
2nd Paper on HDLSS Asymptotics

Can we improve on $\|X_i - X_j\| = d^{1/2}\, O_p(1)$?
John Kent example (normal scale mixture):
$X_i \sim 0.5\, N_d(0, 100\, I_d) + 0.5\, N_d(0, I_d)$
Won't get $\|X_i - X_j\| = C\, d^{1/2} (1 + o_p(1))$.

3rd Paper on HDLSS Asymptotics

Get the geometrical representation using:
• a 4th moment assumption,
• a stronger covariance matrix (only) assumption:
Yata & Aoshima (2012).
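Why no single deterministic constant works can be seen directly: under the scale mixture, the scaled pairwise distances concentrate near one of three values, chosen at random by the mixture labels. A sketch (sizes and seed are illustrative assumptions):

```python
import numpy as np

# Kent mixture: each coordinate of X_i - X_j has variance 2, 101, or 200,
# depending on the components drawn, so ||X_i - X_j|| / d^{1/2} lands near
# sqrt(2), sqrt(101), or sqrt(200) - a random constant.
rng = np.random.default_rng(7)
d, n = 20_000, 12

scales = np.where(rng.random(n) < 0.5, 10.0, 1.0)         # component per vector
X = rng.standard_normal((n, d)) * scales[:, None]

targets = [np.sqrt(2.0), np.sqrt(101.0), np.sqrt(200.0)]
ratios = []
for i in range(n):
    for j in range(i + 1, n):
        ratios.append(np.linalg.norm(X[i] - X[j]) / np.sqrt(d))

hits = [min(abs(r - t) for t in targets) for r in ratios]
print(round(max(hits), 3))   # every pair sits near one of the three constants
```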
2nd Paper on HDLSS Asymptotics

Notes on Kent's normal scale mixture $X_i \sim 0.5\, N_d(0, 100\, I_d) + 0.5\, N_d(0, I_d)$:
• Data vectors are independent of each other.
• But the entries of each vector have strong dependence.
• However, can show the entries have covariance = 0.
• Recall the statistical folklore: covariance = 0 does not imply independence.
0 Covariance is not independence

Simple example:
• Random variables $X$ and $Y$, each Gaussian: $X, Y \sim N(0,1)$.
(Note: not using the multivariate Gaussian.)
• With strong dependence, yet 0 covariance.
• Given $c > 0$, define
$Y = \begin{cases} X & |X| \le c \\ -X & |X| > c \end{cases}$
• Choose $c$ to make $\mathrm{cov}(X, Y) = 0$.
• The distribution is degenerate: supported on diagonal lines,
not absolutely continuous w.r.t. 2-d Lebesgue measure.
• For small $c$, $\mathrm{cov}(X, Y) < 0$; for large $c$, $\mathrm{cov}(X, Y) > 0$.
• By continuity, $\exists\, c$ with $\mathrm{cov}(X, Y) = 0$.
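The construction is fully computable. Here $\mathrm{cov}(X,Y) = E[X^2; |X| \le c] - E[X^2; |X| > c]$, which crosses 0 as $c$ grows; the sketch finds the critical $c$ by bisection and confirms by Monte Carlo that the covariance vanishes while $|Y| = |X|$ exactly (total dependence):

```python
import numpy as np
from math import erf, exp, pi, sqrt

def g(c):
    # E[X^2 1{|X| <= c}] - 1/2 for X ~ N(0,1); its root gives cov(X, Y) = 0,
    # using the identity E[X^2 1{|X| <= c}] = erf(c / sqrt(2)) - 2 c phi(c).
    phi_c = exp(-c * c / 2) / sqrt(2 * pi)
    return erf(c / sqrt(2)) - 2 * c * phi_c - 0.5

lo, hi = 0.5, 3.0
for _ in range(60):                       # bisection (g is increasing here)
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
c = (lo + hi) / 2

rng = np.random.default_rng(8)
X = rng.standard_normal(200_000)
Y = np.where(np.abs(X) <= c, X, -X)       # the slide's construction

cov = float(np.mean(X * Y))               # ~ 0 by the choice of c
print(round(c, 3), round(cov, 4))         # yet |Y| == |X| exactly
```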
0 Covariance is not independence

Result: the joint distribution of $X$ and $Y$
– has Gaussian marginals,
– has $\mathrm{cov}(X, Y) = 0$,
– yet has strong dependence of $X$ and $Y$,
– and thus is not multivariate Gaussian.
Shows: multivariate Gaussian means more than Gaussian marginals.
HDLSS Asy's: Geometrical Represen'tion

Further consequences of the geometric representation:
1. DWD is more stable than SVM (based on deeper limiting distributions; reflects the intuitive idea of feeling sampling variation, something like mean vs. median), Hall, Marron & Neeman (2005).
2. 1-NN rule inefficiency is quantified, Hall, Marron & Neeman (2005).
3. Inefficiency of DWD for uneven sample sizes (motivates the weighted version), Qiao et al (2010).
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study properties of PCA in estimating eigen-directions & -values.)
[Assume data are mean centered.]

Spike covariance model, Paul (2007):
For eigenvalues: $\lambda_{1,d} = d^{\alpha}$, $\lambda_{2,d} = \cdots = \lambda_{d,d} = 1$.
Note the critical parameter: $\alpha$.
1st eigenvector: $u_1$ (turns out its direction doesn't matter).
How good are the empirical versions $\hat{\lambda}_{1,d}, \ldots, \hat{\lambda}_{d,d}, \hat{u}_1$ as estimates?
Consistency (big enough spike): for $\alpha > 1$,
$\mathrm{Angle}(\hat{u}_1, u_1) \to 0$.
Strong inconsistency (spike not big enough): for $\alpha < 1$,
$\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$.
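The dichotomy shows up clearly in simulation. A sketch of the two regimes (dimension, sample size, exponents, and seed are illustrative choices):

```python
import numpy as np

# Spike model lambda_1 = d^alpha: the angle between the empirical and true
# first eigenvectors is small for alpha > 1 and near 90 degrees for alpha < 1.
rng = np.random.default_rng(9)
d, n = 5_000, 20

def angle_deg(alpha):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)        # spike along the first coordinate
    u1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
    cos = min(abs(u1_hat[0]), 1.0)        # |<u1_hat, e_1>|, clipped for safety
    return float(np.degrees(np.arccos(cos)))

big, small = angle_deg(2.0), angle_deg(0.2)
print(round(big, 1), round(small, 1))     # big spike: near 0; small: near 90
```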
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
ddddiININX 10050050~
dOXX pji )1(
2nd Paper on HDLSS Asymptotics
Can we improve on
John Kent example Normal scale mixture
Wonrsquot get
ddddiININX 10050050~
dOXX pji )1(
)1(pjiOdCXX
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS Math Stat of PCA

But HDLSS PCA often finds signal, not pure noise. Recall the RNAseq data from 8/23/12 ($d \approx 1700$, $n = 180$): manually brushed clusters showed clear alternate splicing, not noise.
Recall Theoretical Separation:
• Strong Inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike

Mathematically driven conclusion: real data signals are this strong.
An Interesting Objection: Should Not Study Angles in PCA

Recall: for consistency ($\alpha > 1$), $\mathrm{Angle}(\hat u_1, u_1) \to 0$; for strong inconsistency ($\alpha < 1$), $\mathrm{Angle}(\hat u_1, u_1) \to 90°$.

The objection: because the PC scores (i.e. projections) are not consistent. For the scores $\hat s_{i,j} = P_{\hat v_j} x_i$ (what we study in PCA scatterplots) and $s_{i,j} = P_{v_j} x_i$, can show $\hat s_{i,j} / s_{i,j} \to R_j \neq 1$ (random). Thanks to Dan Shen.
PC scores (i.e. projections) are not consistent. So how can PCA find useful signals in data?

Key is "Proportional Errors": $\hat s_{i,j} / s_{i,j} \to R_j$, with the same realization of $R_j$ for all $i$. The axes have inconsistent scales, but the relationships between points are still useful.
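A small simulation illustrates the "proportional errors" point (a hedged sketch with arbitrary parameters, using a spike model with $\alpha = 1.5$): the empirical PC1 scores are essentially a common multiple of the true scores, so the scatterplot geometry is preserved up to scale:

```python
import numpy as np

rng = np.random.default_rng(2)

# Spike model: lambda_1 = d**alpha, all other eigenvalues 1; true u1 = e_1.
d, n, alpha = 2000, 20, 1.5
u1 = np.zeros(d)
u1[0] = 1.0
sd = np.ones(d)
sd[0] = d ** (alpha / 2)
X = rng.standard_normal((n, d)) * sd      # rows are mean-zero data vectors

# Empirical first PC direction via SVD (sign-aligned with the truth).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0] @ u1)

s = X @ u1                            # true PC1 scores
s_hat = X @ u1_hat                    # empirical PC1 scores
slope = (s_hat @ s) / (s @ s)         # common proportionality factor
resid = s_hat - slope * s
rel_resid = np.linalg.norm(resid) / np.linalg.norm(s_hat)   # tiny
```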
HDLSS Deep Open Problem

In PCA consistency:
• Strong Inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
What happens at the boundary ($\alpha = 1$)?

Result: ∃ interesting limit dist'ns: Jung, Sen & Marron (2012).
HDLSS Asymptotics & Kernel Methods

Recall the flexibility from the kernel embedding idea.

Interesting question: behavior in very high dimension?

Answer: El Karoui (2010):
• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers.

Implications for DWD: recall the main advantage is for high $d$, so it is not clear embedding helps. Thus not yet implemented in DWD.
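The flavor of this random-matrix result can be sketched numerically (an illustration under assumed standardized Gaussian data, not El Karoui's actual derivation): for an RBF kernel with bandwidth on the scale of $d$, the off-diagonal kernel entries are nearly an affine function of the inner products, i.e. the kernel behaves ~ linearly:

```python
import numpy as np

rng = np.random.default_rng(3)

# High-dimensional standardized data; RBF kernel with bandwidth^2 = d.
d, n = 5000, 40
X = rng.standard_normal((n, d))
G = X @ X.T                                # inner products
sq = np.diag(G)
D2 = sq[:, None] + sq[None, :] - 2 * G     # squared pairwise distances
K = np.exp(-D2 / (2 * d))                  # RBF kernel matrix

# Since ||x||^2 concentrates near d, ||x - y||^2 ~ 2d - 2<x,y>, so to first
# order K ~ exp(-1) * (1 + <x,y>/d): affine in the inner product.
K_lin = np.exp(-1.0) * (1 + G / d)
off = ~np.eye(n, dtype=bool)
max_err = np.abs(K - K_lin)[off].max()     # small off-diagonal error
```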
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall the intuition from above: the key is the sizes of the biological subtypes. A differing ratio trips up the mean, but DWD is more robust. Mathematics behind this:
2nd Paper on HDLSS Asymptotics

Can we improve on the conditions? John Kent example, a normal scale mixture:
$$X_i \sim \tfrac12 N_d(0, I_d) + \tfrac12 N_d(0, 100\, I_d), \quad \text{(independent)}$$
Won't get the geometric representation: pairwise distances do not satisfy $\|X_i - X_j\| = C\, d^{1/2} (1 + o_p(1))$ for a single constant $C$.

3rd Paper on HDLSS Asymptotics

Get the geometrical representation using:
• a 4th moment assumption
• a stronger covariance matrix (only) assumption
Yata & Aoshima (2012)
2nd Paper on HDLSS Asymptotics

Notes on Kent's normal scale mixture, $X_i \sim \tfrac12 N_d(0, I_d) + \tfrac12 N_d(0, 100\, I_d)$:
• Data vectors are independent of each other
• But the entries of each have strong dependence
• However, can show the entries have cov = 0
• Recall statistical folklore: Covariance = 0 does not imply Independence
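These bullets are easy to verify by simulation (a sketch; two coordinates of each vector suffice). Entries of one Kent vector share a random scale, so they are uncorrelated yet dependent: their squares are positively correlated:

```python
import numpy as np

rng = np.random.default_rng(4)

# Kent's normal scale mixture: each vector is N(0, I) w.p. 1/2 and
# N(0, 100 I) w.p. 1/2.  Look at two entries of each vector.
n = 200_000
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)        # common per-vector sd
X = rng.standard_normal((n, 2)) * scale[:, None]

cov_entries = float(np.mean(X[:, 0] * X[:, 1]))         # ~ 0: entries uncorrelated
sq_corr = np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1] # > 0: dependence via shared scale
```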
0 Covariance is not Independence

Simple example:
• Random variables $X$ and $Y$
• Make both Gaussian: $X, Y \sim N(0, 1)$ (Note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance

Given $c > 0$, define
$$Y = \begin{cases} X, & |X| \le c \\ -X, & |X| > c \end{cases}$$
and choose $c$ to make $\mathrm{cov}(X, Y) = 0$.

• The joint distribution is degenerate: supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue measure
• For small $c$, have $\mathrm{cov}(X, Y) < 0$
• For large $c$, have $\mathrm{cov}(X, Y) > 0$
• By continuity, ∃ $c$ with $\mathrm{cov}(X, Y) = 0$

Result: the joint distribution of $X$ and $Y$
– has Gaussian marginals,
– has $\mathrm{cov}(X, Y) = 0$,
– yet strong dependence of $X$ and $Y$,
– thus is not multivariate Gaussian.

Shows: multivariate Gaussian means more than Gaussian marginals.
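The cutoff $c$ can be pinned down numerically. A sketch (stdlib math plus numpy for the Monte Carlo check; the closed form for $E[X^2 \mathbf{1}\{|X| \le c\}]$ is the standard truncated second moment of the normal):

```python
import math
import numpy as np

# For Y = X on {|X| <= c} and Y = -X otherwise,
# cov(X, Y) = 2 m(c) - 1, where m(c) = E[X^2 1{|X| <= c}]
#           = erf(c / sqrt(2)) - 2 c phi(c)   (phi = standard normal density).
def cov_xy(c):
    phi_c = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    m = math.erf(c / math.sqrt(2)) - 2 * c * phi_c
    return 2 * m - 1

lo, hi = 1.0, 2.0          # cov < 0 at c = 1, cov > 0 at c = 2
for _ in range(60):        # bisection for the zero-covariance cutoff
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c_star = (lo + hi) / 2     # about 1.54

# Monte Carlo check: covariance ~ 0 and the marginal of Y is still standard
# normal, even though Y is a deterministic function of X.
rng = np.random.default_rng(5)
x = rng.standard_normal(500_000)
y = np.where(np.abs(x) <= c_star, x, -x)
emp_cov = float(np.mean(x * y))
```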
HDLSS Asy's: Geometrical Represen'tion

Further consequences of the geometric represen'tion:
1. DWD more stable than SVM (based on deeper limiting distributions; reflects the intuitive idea of feeling sampling variation, something like mean vs. median): Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified: Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes (motivates a weighted version): Qiao et al (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study properties of PCA in estimating eigen-directions & -values)
[Assume data are mean centered]

Spike Covariance Model: Paul (2007)
For eigenvalues: $\lambda_1^{(d)} = d^{\alpha}, \quad \lambda_2^{(d)} = \cdots = \lambda_d^{(d)} = 1$
Note the critical parameter: $\alpha$
1st eigenvector: $u_1$ (turns out the direction doesn't matter)
How good are the empirical versions $\hat\lambda_1^{(d)}, \dots, \hat\lambda_d^{(d)}, \hat u_1$ as estimates?
Consistency (big enough spike): for $\alpha > 1$,
$$\mathrm{Angle}(\hat u_1, u_1) \to 0$$
Strong Inconsistency (spike not big enough): for $\alpha < 1$,
$$\mathrm{Angle}(\hat u_1, u_1) \to 90°$$

Intuition: random noise ~ $d^{1/2}$ (recall $\alpha$ is on the scale of variance):
For $\alpha > 1$, the spike pops out of the pure noise sphere; for $\alpha < 1$, the spike is contained in the pure noise sphere.
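This dichotomy shows up clearly in simulation (a sketch with arbitrary $d$ and $n$; the true $u_1$ is taken to be the first coordinate axis). Note the angle for $\alpha < 1$ drifts toward 90° only slowly in $d$:

```python
import numpy as np

rng = np.random.default_rng(6)

# Spike model lambda_1 = d**alpha, others 1; measure Angle(u1_hat, u1) in degrees.
def angle_to_truth(d, alpha, n=20):
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)
    X = rng.standard_normal((n, d)) * sd
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)          # |<u1_hat, e_1>|
    return float(np.degrees(np.arccos(cos)))

ang_big_spike = angle_to_truth(d=20_000, alpha=1.5)    # alpha > 1: near 0
ang_small_spike = angle_to_truth(d=20_000, alpha=0.5)  # alpha < 1: toward 90
```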
Consistency of eigenvalues:
$$\frac{\hat\lambda_1}{\lambda_1} \xrightarrow{\ L\ } \frac{\chi^2_n}{n} \quad \text{as } d \to \infty$$
Eigenvalues are inconsistent, but with a known distribution; consistent when $n \to \infty$ as well.
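A Monte Carlo sketch of this limit law (parameters are arbitrary; with $\alpha > 1$ and fixed $n$ the ratio $\hat\lambda_1/\lambda_1$ should behave like $\chi^2_n/n$, i.e. mean about 1 and variance about $2/n$):

```python
import numpy as np

rng = np.random.default_rng(7)

d, n, alpha, reps = 20_000, 10, 1.5, 200
lam1 = d ** alpha
sd = np.ones(d)
sd[0] = lam1 ** 0.5

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d)) * sd
    sv = np.linalg.svd(X, compute_uv=False)
    lam1_hat = sv[0] ** 2 / n          # top eigenvalue of (1/n) X^t X
    ratios[r] = lam1_hat / lam1

mean_ratio = float(ratios.mean())      # ~ E[chi2_n / n] = 1
var_ratio = float(ratios.var())        # ~ Var[chi2_n / n] = 2/n = 0.2
```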
Conditions for Geo Rep'n & PCA Consist.

John Kent example: $X_d \sim \tfrac12 N_d(0, I_d) + \tfrac12 N_d(0, 100\, I_d)$

Can only say $\|X_d\| = O_p(d^{1/2})$, with
$$\|X_d\| \approx d^{1/2} \times \begin{cases} 1 & \text{w.p. } 1/2 \\ 10 & \text{w.p. } 1/2 \end{cases}$$
not deterministic.

PCA conditions are the same, since the noise is still $O_p(d^{1/2})$.
But for the Geo Rep'n, need some mixing condition.

Conclude: need some mixing condition.
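A quick simulation of the Kent example confirms the two-point limit (a sketch; it uses $\|X\|^2 = \text{scale}^2 \cdot \chi^2_d$ to avoid generating full $d$-dimensional vectors):

```python
import numpy as np

rng = np.random.default_rng(8)

# ||X|| / sqrt(d) for the mixture 0.5 N(0, I_d) + 0.5 N(0, 100 I_d):
# concentrates near 1 or 10, w.p. 1/2 each -- random, not deterministic.
d, n = 100_000, 2_000
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
r = scale * np.sqrt(rng.chisquare(d, size=n) / d)   # ||X|| / sqrt(d)

near_1 = float(np.mean(np.abs(r - 1.0) < 0.1))
near_10 = float(np.mean(np.abs(r - 10.0) < 0.1))
```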
Mixing Conditions

Idea from probability theory: recall the standard asymptotic results as $n \to \infty$:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!), e.g. independent and identically dist'd.

Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.

• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better newer references exist
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
3rd Paper on HDLSS Asymptotics
Get Geometrical Representation using
bull 4th Moment Assumption
bull Stronger Covariance Matrix (only) Assumrsquon
Yata amp Aoshima (2012)
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall Intuition from above:
•	Key is sizes of biological subtypes
•	Differing ratio trips up mean
•	But DWD more robust
Mathematics behind this:
2nd Paper on HDLSS Asymptotics

Notes on Kent's Normal Scale Mixture:
$$X_i \sim 0.5\, N(0, I_d) + 0.5\, N(0, 100\, I_d), \quad \text{i.i.d.}$$
•	Data Vectors are independent of each other
•	But entries of each have strong dependence
•	However, can show entries have cov = 0
•	Recall statistical folklore: Covariance = 0 ⟹ Independence (?)
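The zero-covariance-but-dependent structure of the scale mixture is easy to check numerically. A minimal sketch (not from the slides; sample sizes and seed are arbitrary): entries of $X = S \cdot Z$ share the random scale $S$, so distinct entries are uncorrelated, yet their squares are strongly positively correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 2  # many independent draws of a 2-entry vector suffice

# Kent's normal scale mixture: X ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)  # sd = 1 or 10, prob 1/2 each
X = scale[:, None] * rng.standard_normal((n, d))

# Entries are uncorrelated (cov(X1, X2) = E[S^2] E[Z1 Z2] = 0) ...
cov01 = np.mean(X[:, 0] * X[:, 1])
# ... but strongly dependent: the shared scale links the squared entries
corr_sq = np.corrcoef(X[:, 0] ** 2, X[:, 1] ** 2)[0, 1]
print(cov01, corr_sq)
```

The theoretical correlation of the squared entries here is about 0.19, driven entirely by the shared mixing variable $S$.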
0 Covariance is not independence

Simple Example:
•	Random Variables $X$ and $Y$
•	Make both Gaussian: $X, Y \sim N(0,1)$
	(Note: Not Using the Multivariate Gaussian)
•	With strong dependence, yet 0 covariance
Given $c > 0$, define
$$Y = \begin{cases} X & |X| \le c \\ -X & |X| > c \end{cases}$$
Choose $c$ to make $\mathrm{cov}(X,Y) = 0$:
•	Distribution is degenerate
•	Supported on the diagonal lines $y = \pm x$
•	Not abs. cont. w.r.t. 2-d Lebesgue meas.
•	For small $c$, have $\mathrm{cov}(X,Y) < 0$
•	For large $c$, have $\mathrm{cov}(X,Y) > 0$
•	By continuity, Ǝ $c$ with $\mathrm{cov}(X,Y) = 0$
Result:
•	Joint distribution of $X$ and $Y$:
	– Has Gaussian marginals
	– Has $\mathrm{cov}(X,Y) = 0$
	– Yet strong dependence of $X$ and $Y$
	– Thus is not multivariate Gaussian
Shows: Multivariate Gaussian means more than Gaussian Marginals
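This construction can be verified by simulation. A hedged sketch (not from the slides): the critical $c$ is found by bisection on the exact Gaussian integral $\int_{-c}^{c} x^2 \varphi(x)\,dx = 2\Phi(c) - 1 - 2c\varphi(c)$, then a Monte Carlo sample confirms zero covariance alongside complete dependence ($|Y| = |X|$ exactly).

```python
import numpy as np
from math import erf, exp, pi, sqrt

# cov(X, Y) = E[X^2 1{|X|<=c}] - E[X^2 1{|X|>c}]; zero when E[X^2 1{|X|<=c}] = 1/2
def trunc_second_moment(c):
    Phi = 0.5 * (1 + erf(c / sqrt(2)))       # standard normal cdf
    phi = exp(-c * c / 2) / sqrt(2 * pi)     # standard normal density
    return (2 * Phi - 1) - 2 * c * phi       # integral of x^2 phi(x) over [-c, c]

# bisection for the c that makes cov(X, Y) = 0
lo, hi = 0.1, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if trunc_second_moment(mid) < 0.5 else (lo, mid)
c = (lo + hi) / 2  # roughly 1.54

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)
Y = np.where(np.abs(X) <= c, X, -X)          # Y is also N(0,1) by symmetry

print(np.mean(X * Y))                        # near 0: zero covariance
print(np.corrcoef(np.abs(X), np.abs(Y))[0, 1])  # 1: complete dependence
```

The sign flip outside $[-c, c]$ leaves the marginal of $Y$ standard normal, which is why both marginals are Gaussian while the joint is not.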
HDLSS Asymptotics: Geometrical Representation
Further Consequences of the Geometric Representation:

1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified: Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes (motivates weighted version):
   Qiao et al (2010)
HDLSS Math. Stat. of PCA

Consistency & Strong Inconsistency
(Study Properties of PCA, in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):
For Eigenvalues:
$$\lambda_{1,d} = d^{\alpha}, \qquad \lambda_{2,d} = \cdots = \lambda_{d,d} = 1$$
Note Critical Parameter: $\alpha$
1st Eigenvector: $u_1$  (Turns out: Direction Doesn't Matter)
How Good are Empirical Versions $\hat\lambda_{1,d}, \dots, \hat\lambda_{d,d}, \hat u_1$ as Estimates?
Consistency (big enough spike):
For $\alpha > 1$:  $\mathrm{Angle}(\hat u_1, u_1) \to 0$
Strong Inconsistency (spike not big enough):
For $\alpha < 1$:  $\mathrm{Angle}(\hat u_1, u_1) \to 90°$

Intuition: Random Noise ~ $d^{1/2}$
For $\alpha > 1$ (Recall $d^{\alpha}$ is on the Scale of Variance):
Spike Pops Out of Pure Noise Sphere
For $\alpha < 1$:
Spike Contained in Pure Noise Sphere
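This dichotomy shows up clearly in simulation. A hedged sketch (not from the slides; the choices $u_1 = e_1$, $d = 2000$, $n = 20$ and the two $\alpha$ values are arbitrary, and the assertion thresholds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2000, 20  # HDLSS: dimension >> sample size

def top_pc_angle(alpha):
    """Angle (degrees) between sample and true first eigenvector
    under the spike model  lambda_1 = d^alpha, lambda_2 = ... = 1."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= d ** (alpha / 2)          # true u1 = e1, eigenvalue d^alpha
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.degrees(np.arccos(min(1.0, abs(Vt[0, 0]))))

a_hi = top_pc_angle(1.5)   # alpha > 1: angle small (consistency)
a_lo = top_pc_angle(0.3)   # alpha < 1: angle large, heads to 90 as d grows
print(a_hi, a_lo)
```

With $\alpha = 1.5$ the spike pops out of the noise sphere and the angle is a few degrees; with $\alpha = 0.3$ the empirical direction is mostly noise.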
Consistency of eigenvalues?
$$\frac{\hat\lambda_{1,d}}{\lambda_{1,d}} \ \xrightarrow{\ L\ }\ \frac{\chi^2_n}{n}$$
•	Eigenvalues Inconsistent (for fixed $n$)
•	But Known Distribution
•	Consistent when $n \to \infty$ as Well
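The $\chi^2_n / n$ limit law can be probed by simulation. A sketch under stated assumptions (mean-zero model, uncentered sample covariance with divisor $n$; the parameters $d = 2000$, $n = 10$, $\alpha = 1.5$ and 200 replicates are arbitrary): the ratio should have mean about 1 and variance about $2/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha, reps = 2000, 10, 1.5, 200
lam1 = d ** alpha

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)             # spike along e1
    # top eigenvalue of the sample covariance (1/n) X' X via singular values
    s = np.linalg.svd(X, compute_uv=False)
    ratios[r] = (s[0] ** 2 / n) / lam1

# Limit law (d -> infinity, n fixed): ratio ~ chi^2_n / n, so mean 1, var 2/n
print(ratios.mean(), ratios.var())
```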
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
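The Kent example's two-radius behavior is visible directly. A minimal sketch (not from the slides; $d$, sample count and seed arbitrary): scaled norms $\|X_d\| / d^{1/2}$ concentrate near 1 or near 10, each with probability 1/2, so the radius is $O_p(d^{1/2})$ but not deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5000, 400

# Kent example: X_d ~ 0.5 N(0, I_d) + 0.5 N(0, 100 I_d)
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = scale[:, None] * rng.standard_normal((n, d))

r = np.linalg.norm(X, axis=1) / np.sqrt(d)   # concentrates near 1 or 10
frac_small = np.mean(np.abs(r - 1) < 0.1)
frac_big = np.mean(np.abs(r - 10) < 0.5)
print(frac_small, frac_big)
```

Essentially every draw lands in one of the two thin shells, which is exactly why the geometric representation (a single deterministic radius) fails here.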
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here:  Rho-Mixing

For Random Variables $X_1, X_2, \dots$, define
$$\rho(k) = \sup\big\{\, |\mathrm{corr}(f, g)| :\ f \in L_2(\sigma(X_1, \dots, X_i)),\ g \in L_2(\sigma(X_{i+k}, X_{i+k+1}, \dots)) \,\big\}$$
where the Sigma-Fields are Generated by the indicated variables
(Note: Gap of Lag $k$)
Assume:  $\rho(k) \to 0$ as $k \to \infty$
Idea: Uncorrelated at Far Lags
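The full $\rho$-mixing coefficient is a supremum over $L_2$ functions, but for a stationary Gaussian sequence it reduces (by the Kolmogorov–Rozanov characterization) to the maximal linear correlation across the gap. A sketch under that assumption, using a Gaussian AR(1) with $X_t = \phi X_{t-1} + \varepsilon_t$, where the lag-$k$ correlation is $\phi^k \to 0$ (parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi = 50_000, 0.7

# Gaussian AR(1) is rho-mixing; its mixing rate is the lag correlation phi^k
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)   # start in stationarity
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# empirical lag correlations: roughly phi, phi^5, phi^10, decaying toward 0
lag_corr = [np.corrcoef(x[:-k], x[k:])[0, 1] for k in (1, 5, 10)]
print(lag_corr)
```

"Uncorrelated at far lags" is visible as the rapid geometric decay of the lag correlations.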
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors $X = (X_1, X_2, \dots, X_d)^t$ are $\rho$-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused!)
Series of Technical Improvements:
•	Ahn, Marron, Muller & Chi (2007)
•	Aoshima (2010), Yata & Aoshima (2012)
	(Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays
Condition from Jung & Marron (2009):
$$X_d \sim (0, \Sigma_d), \qquad \Sigma_d = U_d \Lambda_d U_d^t$$
(Note: Not Gaussian)
Define Standardized Version:
$$Z_d = \Lambda_d^{-1/2} U_d^t X_d$$
Assume Ǝ a permutation of the entries, so that $Z_d$ is $\rho$-mixing
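The standardization step can be sketched numerically. A minimal example (not from the paper; the dimension, eigenvalues and seed are arbitrary) verifying that $Z_d = \Lambda_d^{-1/2} U_d^t X_d$ has identity covariance, which is the object on which the mixing condition is imposed:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100_000

# a covariance Sigma = U Lambda U^t (random orthogonal U, fixed eigenvalues)
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
lam = np.array([10.0, 5.0, 2.0, 1.0, 0.5])
Sigma = U @ np.diag(lam) @ U.T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Z = X @ U @ np.diag(lam ** -0.5)     # rows: Z = Lambda^{-1/2} U^t X

print(np.cov(Z.T))                   # ~ identity: standardized version
```

Gaussian data is used only for convenience of simulation; the condition itself does not require Gaussianity.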
Careful look at:  PCA Consistency, $\alpha > 1$ spike
(Reality Check, Suggested by Reviewer)
•	Condition is Independent of Sample Size
•	So true for $n = 1$ (!?)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice
But: HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12:  $d \approx 1700$,  $n = 180$
Manually Brushed Clusters show Clear Alternate Splicing, Not Noise

Functional Data Analysis
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection: Should not Study Angles in PCA?
Recall, for Consistency ($\alpha > 1$):  $\mathrm{Angle}(\hat u_1, u_1) \to 0$
For Strong Inconsistency ($\alpha < 1$):  $\mathrm{Angle}(\hat u_1, u_1) \to 90°$
Because PC Scores (i.e. projections) are Not Consistent:
For Scores  $\hat s_{i,j} = P_{\hat v_j} x_i$  and  $s_{i,j} = P_{v_j} x_i$
(What we study in PCA scatterplots)
Can Show:  $\dfrac{\hat s_{i,j}}{s_{i,j}} \to R_j \neq 1$  (Random)
Thanks to Dan Shen
PC Scores (i.e. projections) Not Consistent:
So how can PCA find Useful Signals in Data?
(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)
Key is "Proportional Errors":
$$\frac{\hat s_{i,j}}{s_{i,j}} \to R_j \neq 1$$
Same Realization of $R_j$ for all scores along axis $j$:
Axes have Inconsistent Scales, But Relationships are Still Useful
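The proportional-errors phenomenon can be illustrated by simulation. A hedged sketch (not from the slides; $\alpha = 0.8$, $d = 50000$, $n = 10$ and the assertion bounds are arbitrary choices): empirical PC1 scores line up with the true scores along a line through the origin whose slope $R$ is clearly different from 1, so the scatterplot is rescaled but the relationships survive.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha = 50_000, 10, 0.8          # alpha < 1: inconsistent regime
X = rng.standard_normal((n, d))
X[:, 0] *= d ** (alpha / 2)            # true u1 = e1

s_true = X[:, 0]                       # true PC1 scores  s_i = u1' x_i
_, _, Vt = np.linalg.svd(X, full_matrices=False)
s_hat = X @ Vt[0]                      # empirical PC1 scores

slope = (s_hat @ s_true) / (s_true @ s_true)   # common factor R (sign arbitrary)
corr = np.corrcoef(s_hat, s_true)[0, 1]
print(abs(slope), abs(corr))           # slope away from 1, correlation near 1
```

Inconsistent scale (|slope| noticeably above 1) together with near-perfect correlation is exactly the "proportional errors" picture.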
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
ddddiININX 10050050~
2nd Paper on HDLSS Asymptotics
Notes on Kentrsquos Normal Scale Mixture
bull Data Vectors are indeprsquodent of each other
bull But entries of each have strong dependrsquoce
bull However can show entries have cov = 0
bull Recall statistical folklore
Covariance = 0 Independence
ddddiININX 10050050~
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study properties of PCA in estimating eigen-directions & -values)
[Assume data are mean centered]

Spike Covariance Model, Paul (2007):
For eigenvalues:  λ_{1,d} = d^α,  λ_{2,d} = ⋯ = λ_{d,d} = 1
Note critical parameter: α
1st eigenvector: u₁  (turns out direction doesn't matter)
How good are the empirical versions  λ̂_{1,d}, …, λ̂_{d,d}, û₁  as estimates?
HDLSS Math Stat of PCA

Consistency (big enough spike):
For α > 1:  Angle(û₁, u₁) → 0

Strong Inconsistency (spike not big enough):
For α < 1:  Angle(û₁, u₁) → 90°
HDLSS Math Stat of PCA

Intuition: Random noise ~ d^{1/2}
(Recall d^α is on the scale of variance, so the spike is on the scale d^{α/2})
For α > 1: spike pops out of pure noise sphere
For α < 1: spike contained in pure noise sphere
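The α > 1 vs. α < 1 dichotomy can be checked numerically. Below is an illustrative sketch, not from the slides: it simulates the spike model with u₁ = e₁ and measures the angle between û₁ and u₁ (the choices d = 2000, n = 20 and the α values are arbitrary):

```python
# Hypothetical simulation of the spike covariance model:
# lambda_1 = d^alpha, lambda_2 = ... = lambda_d = 1, with u_1 = e_1.
import numpy as np

rng = np.random.default_rng(0)

def angle_to_truth(d, n, alpha):
    """Angle (degrees) between the first sample eigenvector and u_1 = e_1."""
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)                  # spike standard deviation
    X = rng.standard_normal((n, d)) * sd      # n draws from N(0, diag(sd^2))
    # Leading right singular vector = leading eigenvector of sample covariance
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cosang = min(abs(Vt[0, 0]), 1.0)          # |<u1_hat, e_1>|
    return np.degrees(np.arccos(cosang))

for alpha in (1.5, 0.2):
    angs = [angle_to_truth(d=2000, n=20, alpha=alpha) for _ in range(5)]
    print(f"alpha = {alpha}: mean angle ~ {np.mean(angs):.1f} degrees")
```

For α = 1.5 the angles come out near 0°, and for α = 0.2 near 90°, matching the consistency / strong inconsistency split above.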
HDLSS Math Stat of PCA

Consistency of eigenvalues:
  λ̂₁ / λ₁ →_L χ²_n / n,  as d → ∞
• Eigenvalues inconsistent (for fixed n)
• But known distribution
• Consistent when n → ∞ as well
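The χ²_n / n limit can also be seen in simulation. A hypothetical numerical check (parameter choices arbitrary):

```python
# Hypothetical check that lambda1_hat / lambda1 behaves like chi^2_n / n
# under a big spike (alpha > 1): random for fixed n, mean 1, variance 2/n.
import numpy as np

rng = np.random.default_rng(1)

def eig1_ratio(d, n, alpha):
    lam1 = d ** alpha
    sd = np.ones(d)
    sd[0] = np.sqrt(lam1)
    X = rng.standard_normal((n, d)) * sd
    # top eigenvalue of the (uncentered) sample covariance (1/n) X^T X
    lam1_hat = np.linalg.svd(X, compute_uv=False)[0] ** 2 / n
    return lam1_hat / lam1

ratios = np.array([eig1_ratio(d=5000, n=10, alpha=1.5) for _ in range(200)])
print("mean ~", ratios.mean(), "  var ~", ratios.var())
# compare: chi^2_10 / 10 has mean 1 and variance 2/10 = 0.2
```

The sample mean and variance of the ratios track 1 and 2/n, so the eigenvalue stays random for fixed n but would concentrate at 1 as n grows.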
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:
John Kent example:
  X_d ~ ½ N_d(0, I_d) + ½ N_d(0, 100 I_d)
Can only say  X_d = O_p(d^{1/2}):
  ‖X_d‖ ≈ d^{1/2} w.p. ½,  ≈ 10·d^{1/2} w.p. ½,
i.e. not deterministic
• PCA conditions same, since noise is still O_p(d^{1/2})
• But for Geo Rep'n, need some mixing condition

Conclude: Need some mixing condition
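The failure of the deterministic d^{1/2} norm in Kent's example is easy to see by simulation. An illustrative sketch (d and the number of draws are arbitrary):

```python
# Hypothetical simulation of the Kent example mixture:
# the normalized norm ||X_d|| / d^(1/2) is ~1 or ~10 at random, not a constant.
import numpy as np

rng = np.random.default_rng(2)
d = 10000
norms = []
for _ in range(20):
    sd = 10.0 if rng.random() < 0.5 else 1.0   # which mixture component
    x = sd * rng.standard_normal(d)
    norms.append(float(np.linalg.norm(x) / np.sqrt(d)))
print(np.round(norms, 2))   # values cluster near 1.0 or 10.0, at random
```

Each draw individually concentrates (near 1 or near 10), but the sample as a whole does not sit on a single sphere, which is what breaks the geometric representation.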
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Conditions

Mixing Condition Used Here: ρ-Mixing
For random variables Z₁, Z₂, …, define
  ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L₂(F₁ʲ), g ∈ L₂(F_{j+k}^∞) }
where F_a^b is the sigma-field generated by Z_a, …, Z_b
(note the gap of lag k)
Assume:  ρ(k) → 0  as  k → ∞
Idea: uncorrelated at far lags
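The "uncorrelated at far lags" idea can be illustrated with a simple stationary Gaussian AR(1) sequence, where corr(Z_j, Z_{j+k}) = φ^k decays geometrically in the lag. (This is a hypothetical illustration, not from the slides; for Gaussian sequences the ρ-mixing coefficient is known to reduce to a supremum of ordinary correlations.)

```python
# Hypothetical AR(1) illustration: Z_t = phi * Z_{t-1} + e_t,
# so corr(Z_j, Z_{j+k}) = phi^k, decaying geometrically in the lag k.
import numpy as np

rng = np.random.default_rng(3)
phi, T = 0.7, 200_000
e = rng.standard_normal(T)
z = np.empty(T)
z[0] = e[0] / np.sqrt(1 - phi ** 2)      # stationary initialization
for t in range(1, T):
    z[t] = phi * z[t - 1] + e[t]

for k in (1, 2, 5, 10):
    r = np.corrcoef(z[:-k], z[k:])[0, 1]
    print(f"lag {k}: corr ~ {r:.3f}   (phi^k = {phi ** k:.3f})")
```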
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume the entries X₁, X₂, …, X_d of the data vectors are ρ-mixing
Drawback: strong assumption
(In JRSS-B, since Biometrika refused!)

Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (fully covariance based, no mixing)

Tricky point: classical mixing conditions require a notion of
time ordering, not always clear, e.g. microarrays

Condition from Jung & Marron (2009):
  X_d ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_dᵗ
  (Note: not Gaussian)
Define the standardized version  Z_d = Λ_d^{−1/2} U_dᵗ X_d
Assume ∃ a permutation of the entries of Z_d
so that the permuted sequence is ρ-mixing
HDLSS Math Stat of PCA

Careful look at PCA consistency (α > 1 spike):
(Reality check suggested by reviewer)
• Result is independent of sample size
• So true even for n = 1 (!?)
• Reviewer's conclusion: absurd, shows the
  assumption is too strong for practice
HDLSS Math Stat of PCA

HDLSS PCA often finds signal, not pure noise.

Recall RNAseq data from 8/23/12:  d ~ 1700,  n = 180
[Figure: manually brushed clusters show clear alternate
splicing, not noise]

Functional Data Analysis
HDLSS Math Stat of PCA

Recall theoretical separation:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically driven conclusion: real data signals are this strong!
HDLSS Math Stat of PCA

An Interesting Objection: should not study angles in PCA
Recall for consistency: for α > 1,  Angle(û₁, u₁) → 0
For strong inconsistency: for α < 1,  Angle(û₁, u₁) → 90°

Because PC scores (i.e. projections) are not consistent:
For the scores  ŝ_{ij} = P_{û_j} x_i  (what we study in PCA scatterplots)
and  s_{ij} = P_{u_j} x_i,
can show  ŝ_{ij} / s_{ij} → R_j ≠ 1  (random)
(Thanks to Dan Shen)

PC scores (i.e. projections) not consistent,
so how can PCA find useful signals in data?
Key is "proportional errors":
  ŝ_{ij} / s_{ij} → R_j ≠ 1,  with the same realization of R_j for i = 1, …, n
Axes have inconsistent scales,
but relationships are still useful
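The proportional-errors phenomenon shows up clearly in simulation. An illustrative sketch (not from the slides; d, n and α are arbitrary choices) comparing empirical and true PC1 scores in the spike model:

```python
# Hypothetical simulation of "proportional errors": the ratios of empirical
# to true PC1 scores are nearly the same for every observation i, so the
# relative positions in a PCA scatterplot survive.
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 5000, 25, 1.2
sd = np.ones(d)
sd[0] = d ** (alpha / 2)                    # spike model with u_1 = e_1
X = rng.standard_normal((n, d)) * sd
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])          # fix sign for comparison
s_hat = X @ u1_hat                          # empirical PC1 scores
s_true = X[:, 0]                            # true scores <x_i, u_1>
ratios = s_hat / s_true
print("score ratios:", np.round(ratios, 3))  # nearly constant across i
```

All n ratios come out essentially equal: a single (random) rescaling of the axis, which distorts scale but not the configuration of the scores.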
HDLSS Deep Open Problem

In PCA consistency:
• Strong inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)?
Result: ∃ interesting limit dist'ns
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

[Figure: recall flexibility from kernel embedding idea]

Interesting question: behavior in very high dimension?
Answer, El Karoui (2010):
• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers

Implications for DWD:
• Recall main advantage is for high d
• So not clear embedding helps
• Thus not yet implemented in DWD
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
• Recall intuition from above: key is the sizes of biological subtypes
• Differing ratios trip up the mean, but DWD is more robust
• Mathematics behind this: 2nd paper on HDLSS asymptotics

Notes on Kent's Normal Scale Mixture:
  X ~ 0.5 N_d(0, I_d) + 0.5 N_d(0, 100 I_d)  (indep.)
• Data vectors are independent of each other
• But the entries of each have strong dependence
• However, can show the entries have cov = 0
• Recall statistical folklore: "covariance = 0 ⇒ independence"
0 Covariance is not independence

Simple Example:
• Random variables X and Y
• Make both Gaussian:  X, Y ~ N(0, 1)
  (Note: not using the multivariate Gaussian)
• With strong dependence, yet 0 covariance
Given c > 0, define
  Y = X   on  {|X| ≤ c}
  Y = −X  on  {|X| > c}
Choose c to make cov(X, Y) = 0:
• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0
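The zero-covariance c can be computed and the construction verified numerically. A hypothetical sketch (the closed form for g(c) is a standard integration-by-parts calculation, not taken from the slides):

```python
# Hypothetical check of the construction Y = X on {|X| <= c}, Y = -X on
# {|X| > c}: Y is again N(0,1), totally dependent on X, and c can be tuned
# so that cov(X, Y) = 0.
import math
import numpy as np

def cov_xy(c):
    # cov(X,Y) = E[X^2; |X|<=c] - E[X^2; |X|>c] = 2*g(c) - 1, where by
    # integration by parts g(c) = E[X^2; |X|<=c] = erf(c/sqrt(2)) - 2*c*phi(c)
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    g = math.erf(c / math.sqrt(2)) - 2 * c * phi
    return 2 * g - 1

lo, hi = 0.1, 5.0                  # cov < 0 at lo, cov > 0 at hi: bisect
for _ in range(60):
    mid = (lo + hi) / 2
    if cov_xy(mid) < 0:
        lo = mid
    else:
        hi = mid
c0 = (lo + hi) / 2
print("zero-covariance c ~", round(c0, 3))

# Monte Carlo check: Gaussian marginal, cov ~ 0, yet |Y| = |X| exactly
rng = np.random.default_rng(5)
X = rng.standard_normal(1_000_000)
Y = np.where(np.abs(X) <= c0, X, -X)
print("var(Y) ~", Y.var(), "  cov(X,Y) ~", np.cov(X, Y)[0, 1])
```

The Monte Carlo draws confirm a unit-variance Gaussian marginal for Y and covariance near 0, while |Y| = |X| holds exactly: maximal dependence with zero covariance.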
0 Covariance is not independence

Result: the joint distribution of X and Y
• Has Gaussian marginals
• Has cov(X, Y) = 0
• Yet strong dependence of X and Y
• Thus is not multivariate Gaussian
Shows: multivariate Gaussian means more than Gaussian marginals
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
(Note Not Using Multivariate Gaussian)
YX
10~ NYX
0 Covariance is not independence
Simple Example
bull Random Variables and
bull Make both Gaussian
bull With strong dependence
bull Yet 0 covariance
Given define
YX
10~ NYX
0c
cXX
cXXY
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS PCA often finds signal, not pure noise.

HDLSS Math Stat of PCA

[Figure: Recall RNAseq data from 8/23/12, d ~ 1700, n = 180]

[Figure: Manually brushed clusters show clear alternate splicing, not noise]

Functional Data Analysis
Recall theoretical separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically driven conclusion: real data signals are this strong.

HDLSS Math Stat of PCA
An Interesting Objection: Should not study angles in PCA.

Recall for Consistency (α > 1): Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°

Because PC scores (i.e., projections) are not consistent:

For scores ŝ_{j,i} = P_{û_j} x_i (what we study in PCA scatterplots) and s_{j,i} = P_{u_j} x_i,

can show ŝ_{j,i} / s_{j,i} → R_j ≠ 1 (random).

Thanks to Dan Shen.

HDLSS Math Stat of PCA
PC scores (i.e., projections) are not consistent. So how can PCA find useful signals in data?

Key is "Proportional Errors": ŝ_{j,i} / s_{j,i} → R_j ≠ 1, with the same realization of R_j for every i = 1, ..., n.

Axes have inconsistent scales, but relationships are still useful.

HDLSS Math Stat of PCA
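A small simulation of the "proportional errors" idea in a single-spike model: the empirical scores ŝ_{1,i} = û_1^t x_i are very nearly a common multiple of the true scores s_{1,i} = u_1^t x_i, so the scatterplot structure survives. The model, spike size, and variable names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 2000, 50
lam1 = d ** 1.2                       # illustrative spike eigenvalue

u1 = np.zeros(d)
u1[0] = 1.0                           # true first eigen-direction
# Columns x_i = sqrt(lam1) * z_i * u1 + isotropic noise
X = np.sqrt(lam1) * np.outer(u1, rng.standard_normal(n)) \
    + rng.standard_normal((d, n))

u1_hat = np.linalg.svd(X, full_matrices=False)[0][:, 0]
s_true = u1 @ X                       # true PC 1 scores
s_hat = u1_hat @ X                    # empirical PC 1 scores

ratios = s_hat / s_true               # approximately one common factor
corr = np.corrcoef(s_hat, s_true)[0, 1]
spread = np.median(np.abs(ratios - np.median(ratios)))
```

The per-observation ratios cluster tightly around a single (sign-ambiguous) value, and the empirical scores are almost perfectly correlated with the true ones: inconsistent scale, useful relationships.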
In PCA Consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

What happens at the boundary (α = 1)?

∃ interesting limit dist'n's: Jung, Sen & Marron (2012)

HDLSS Deep Open Problem → Result
Recall flexibility from the kernel embedding idea.

HDLSS Asymptotics & Kernel Methods

Interesting question: behavior in very high dimension?

Answer: El Karoui (2010):
• In the random matrix limit,
• kernel embedded classifiers ~ linear classifiers.

Implications for DWD: Recall its main advantage is for high d, so it is not clear embedding helps. Thus not yet implemented in DWD.

HDLSS Asymptotics & Kernel Methods
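One concrete mechanism behind the kernel-degeneracy phenomenon, sketched numerically (an illustration consistent with, but not taken from, El Karoui's argument): for high-dimensional data, scaled pairwise distances concentrate, so a Gaussian kernel matrix is nearly constant off the diagonal and carries little beyond its linear part.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 20_000, 40

X = rng.standard_normal((n, d))       # rows are observations

# Scaled squared pairwise distances ||x_i - x_j||^2 / d concentrate near 2
G = X @ X.T
sqnorm = np.diag(G)
sq = (sqnorm[:, None] + sqnorm[None, :] - 2.0 * G) / d
iu = np.triu_indices(n, k=1)
scaled = sq[iu]

# A Gaussian kernel on this scale is then nearly constant across all pairs
kvals = np.exp(-scaled)
```

With d = 20,000 all 780 pairwise distances fall in a narrow band around 2, so every off-diagonal kernel value is essentially the same number.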
HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall intuition from above: the key is the sizes of the biological subtypes. A differing ratio trips up the mean, but DWD is more robust.

Mathematics behind this:
0 Covariance is not independence

Simple example:
• Random variables X and Y
• Make both Gaussian: X, Y ~ N(0, 1)
(Note: Not using the multivariate Gaussian)
• With strong dependence
• Yet 0 covariance

Given c > 0, define

Y = X when |X| ≤ c,  Y = −X when |X| > c,

choosing c to make cov(X, Y) = 0:

• The distribution is degenerate, supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue measure
• For small c, have cov(X, Y) < 0
• For large c, have cov(X, Y) > 0
• By continuity, ∃ c with cov(X, Y) = 0

Result: The joint distribution of X and Y
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus is not multivariate Gaussian

Shows: Multivariate Gaussian means more than Gaussian marginals.
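The construction above can be carried out numerically. The closed-form truncated second moment and the bisection step are my own working, under the assumption that the cutoff c solves E[X² 1{|X| ≤ c}] = 1/2:

```python
import math
import numpy as np

# cov(X, Y) = E[X^2 1{|X| <= c}] - E[X^2 1{|X| > c}], so cov = 0 exactly
# when E[X^2 1{|X| <= c}] = 1/2 for standard normal X.
def trunc_second_moment(c):
    # integral_{-c}^{c} x^2 phi(x) dx = erf(c/sqrt(2)) - 2 c phi(c)
    phi_c = math.exp(-c * c / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erf(c / math.sqrt(2.0)) - 2.0 * c * phi_c

lo, hi = 0.0, 5.0
for _ in range(60):                    # bisection: the moment is increasing in c
    mid = (lo + hi) / 2.0
    if trunc_second_moment(mid) < 0.5:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2.0

rng = np.random.default_rng(4)
X = rng.standard_normal(500_000)
Y = np.where(np.abs(X) <= c, X, -X)    # flip the tails

cov_xy = np.mean(X * Y)                # approximately 0 by the choice of c
```

Y still has a standard Gaussian marginal and zero covariance with X, yet |Y| = |X| exactly, i.e., Y is a deterministic function of X: as dependent as it gets.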
HDLSS Asymptotics: Geometrical Representation

Further consequences of the geometric representation:

1. DWD more stable than SVM (based on deeper limiting distributions; reflects the intuitive idea of feeling sampling variation, something like mean vs. median). Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified. Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version). Qiao et al (2010)
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study properties of PCA in estimating eigen-directions & -values)
[Assume data are mean centered]

Spike Covariance Model, Paul (2007):

For eigenvalues: λ_{1,d} = d^α, λ_{2,d} = ... = λ_{d,d} = 1

Note critical parameter: α

1st eigenvector: u_1 (turns out the direction doesn't matter)

How good are the empirical versions λ̂_{1,d}, ..., λ̂_{d,d}, û_1 as estimates?
Consistency (big enough spike): for α > 1, Angle(û_1, u_1) → 0

Strong Inconsistency (spike not big enough): for α < 1, Angle(û_1, u_1) → 90°

Intuition: random noise ~ d^{1/2}.
For α > 1 (recall d^α is on the scale of variance), the spike pops out of the pure noise sphere.
For α < 1, the spike is contained in the pure noise sphere.

HDLSS Math Stat of PCA
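The consistency / strong inconsistency dichotomy shows up clearly in simulation. This is a rough numerical sketch of the spike model (d, n, and the α values are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)

def angle_to_truth(alpha, d=10_000, n=20):
    # Spike model: lambda_1 = d**alpha along u1 = e1, all other eigenvalues 1
    X = rng.standard_normal((d, n))
    X[0, :] *= np.sqrt(d ** alpha)     # inject the spike along e1
    # Empirical first eigenvector = top left singular vector of the data
    u1_hat = np.linalg.svd(X, full_matrices=False)[0][:, 0]
    cos = min(abs(u1_hat[0]), 1.0)     # |cos Angle(u1_hat, e1)|
    return float(np.degrees(np.arccos(cos)))

angle_big_spike = angle_to_truth(alpha=1.5)    # consistency: angle near 0
angle_small_spike = angle_to_truth(alpha=0.2)  # strong inconsistency: near 90
```

Even with only n = 20 observations, the α > 1 spike is recovered almost exactly, while for α < 1 the empirical direction is nearly orthogonal to the truth.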
Consistency of eigenvalues:

λ̂_{1,d} / λ_{1,d} →_L χ²_n / n

Eigenvalues are inconsistent, but with a known distribution; consistent when n → ∞ as well.

HDLSS Math Stat of PCA
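A Monte Carlo check of the limit above, assuming the χ²_n / n form of the limiting law (d, n, α, and the replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, alpha, reps = 2000, 10, 2.0, 300
lam1 = d ** alpha                      # big spike, well inside alpha > 1

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((d, n))
    X[0, :] *= np.sqrt(lam1)           # spike along e1
    s1 = np.linalg.svd(X, compute_uv=False)[0]
    lam1_hat = s1 ** 2 / n             # top eigenvalue of (1/n) X X^t
    ratios[r] = lam1_hat / lam1
```

The ratio does not concentrate at 1 for fixed n: it has mean ≈ 1 and variance ≈ 2/n, matching χ²_n / n, and only tightens as n grows.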
Conditions for Geo Rep'n & PCA Consist.

John Kent example:

X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say ‖X_d‖ = O_p(d^{1/2}): ‖X_d‖ ≈ d^{1/2} w.p. 1/2 and ≈ 10 d^{1/2} w.p. 1/2, not deterministic.

PCA conditions are the same, since the noise is still O_p(d^{1/2}).

HDLSS Math Stat of PCA
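The Kent mixture is easy to simulate; this sketch (dimensions and counts are illustrative) shows the scaled norm landing near two different values rather than one deterministic limit:

```python
import numpy as np

rng = np.random.default_rng(7)
d, reps = 20_000, 200

# Each draw: with prob 1/2 from N(0, I_d), with prob 1/2 from N(0, 100 I_d)
scale = np.where(rng.random(reps) < 0.5, 1.0, 10.0)
X = scale[:, None] * rng.standard_normal((reps, d))

# ||X_d|| / sqrt(d) concentrates near 1 or near 10, each with prob 1/2
scaled_norms = np.linalg.norm(X, axis=1) / np.sqrt(d)
near_1 = np.abs(scaled_norms - 1.0) < 0.5
near_10 = np.abs(scaled_norms - 10.0) < 0.5
```

Every realization sits tightly at one of the two radii, so the geometric representation's single deterministic sphere fails here, which is why some mixing-type condition is needed.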
Conditions for Geo Rep'n

Conclude: Need some mixing condition.

HDLSS Math Stat of PCA
Idea From Probability Theory

Recall standard asymptotic results as n → ∞:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem

Both have technical assumptions (usually ignored!), e.g., independent and identically distributed.

Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and Central Limit Theorem.

• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better newer references

Mixing Conditions
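As a small illustration that the CLT survives weak dependence, here is a simulation for an AR(1) process (a standard ρ-mixing example; the process and its long-run variance 1/(1−φ)² are my illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(8)
phi, T, reps = 0.5, 2_000, 300
sigma_lr = 1.0 / (1.0 - phi)          # long-run std dev of the AR(1) mean

z = np.empty(reps)
for r in range(reps):
    eps = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = eps[0]
    for t in range(1, T):
        x[t] = phi * x[t - 1] + eps[t]
    # CLT under mixing: sqrt(T) * mean / sigma_lr is approximately N(0, 1)
    z[r] = np.sqrt(T) * x.mean() / sigma_lr
```

Despite the serial dependence, the standardized means behave like standard normals, just with the i.i.d. variance replaced by the long-run variance.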
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asy's: Geometrical Represen'tion

Further Consequences of Geometric Represen'tion:

1. DWD more stable than SVM (based on deeper limiting distributions)
   (reflects intuitive feeling about sampling variation)
   (something like mean vs. median)
   Hall, Marron, Neeman (2005)

2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)

3. Inefficiency of DWD for uneven sample sizes (motivates weighted version)
   Qiao et al. (2010)
HDLSS Math. Stat. of PCA

Consistency & Strong Inconsistency
(Study Properties of PCA, in Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]

Spike Covariance Model, Paul (2007):
For Eigenvalues: λ₁(d) = d^α, λ₂(d) = … = λ_d(d) = 1
Note Critical Parameter: α
1st Eigenvector: u₁ (Turns out: Direction Doesn't Matter)
How Good are Empirical Versions λ̂₁(d), …, λ̂_d(d), û₁ as Estimates?

Consistency (big enough spike): For α > 1, Angle(û₁, u₁) → 0
Strong Inconsistency (spike not big enough): For α < 1, Angle(û₁, u₁) → 90°

Intuition: Random Noise ~ d^(1/2)
For α > 1 (Recall α Is on the Scale of Variance): Spike Pops Out of Pure Noise Sphere
For α < 1: Spike Contained in Pure Noise Sphere
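The consistency dichotomy is easy to see numerically. A sketch under the spike model as reconstructed above (λ₁ = d^α along u₁ = e₁, all other eigenvalues 1; sample size n held fixed while d grows):

```python
import numpy as np

rng = np.random.default_rng(1)

def angle_to_e1(d, alpha, n=20):
    """Angle (degrees) between empirical and true first eigenvector."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(float(d) ** alpha)   # spike of size d^alpha along e1
    Xc = X - X.mean(axis=0)                  # mean center
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    cos = abs(Vt[0, 0])                      # |<u1_hat, e1>|
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

results = {}
for d in (1_000, 10_000, 100_000):
    results[d] = (angle_to_e1(d, 1.5), angle_to_e1(d, 0.5))
    print(d, round(results[d][0], 1), round(results[d][1], 1))
# alpha = 1.5: angle shrinks toward 0; alpha = 0.5: angle climbs toward 90
```

With α = 1.5 the spike variance d^1.5 dwarfs the total noise energy d, so û₁ locks onto u₁; with α = 0.5 the spike drowns in the noise sphere and the empirical direction is nearly orthogonal to the truth.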
HDLSS Math. Stat. of PCA

Consistency of eigenvalues?
Eigenvalues Inconsistent: λ̂₁ / λ₁ →_L χ²_n / n as d → ∞
But Known Distribution
Consistent when n → ∞ as Well
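The fixed-n limit stated here (λ̂₁/λ₁ behaving like χ²_n/n as d → ∞, a reconstruction of the garbled formula) can be checked by simulation. The uncentered sample covariance is used for simplicity, since the population mean is 0:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 50_000, 5, 1.5
lam1 = float(d) ** alpha                         # spike eigenvalue d^alpha

ratios = []
for _ in range(200):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                     # spike along e1
    s1 = np.linalg.svd(X, compute_uv=False)[0]   # top singular value
    lam1_hat = s1**2 / n                         # top eigenvalue of X'X / n
    ratios.append(lam1_hat / lam1)

ratios = np.array(ratios)
# chi-squared_n / n has mean 1 and variance 2/n = 0.4
print(ratios.mean(), ratios.var())
```

The ratio does not concentrate at 1 for fixed n (inconsistent), but its spread matches the χ²_n/n benchmark, and it shrinks to 1 only as n grows.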
HDLSS Math. Stat. of PCA

Conditions for Geo. Rep'n & PCA Consist.?

John Kent example: X_d ~ ½ N_d(0, 100 I_d) + ½ N_d(0, I_d)
Can only say ‖X_d‖ = O_p(d^(1/2)),
with d^(-1/2) ‖X_d‖ → 10 or 1 (w.p. ½ each),
not deterministic
PCA Conditions Same, since Noise Still O_p(d^(1/2))
But for Geo. Rep'n, need some Mixing Cond.

Conclude: Need some Mixing Condition
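Kent's phenomenon is easy to reproduce. This sketch reads the garbled slide as the mixture ½ N_d(0, 100 I_d) + ½ N_d(0, I_d) (an assumption), under which d^(−1/2)‖X_d‖ concentrates near 10 or near 1 but is not deterministic:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 10_000, 500
big = rng.random(n) < 0.5                     # mixture component per draw
X = rng.standard_normal((n, d))
X[big] *= 10.0                                # the N(0, 100 I_d) component
norms = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(np.quantile(norms, [0.05, 0.5, 0.95]))  # two tight clusters, near 1 and 10
```

Each draw lands on one of two spheres, so the scaled norm is O_p(1) without converging to a single constant — which is exactly why a pure covariance condition is not enough for the geometric representation and some mixing condition is needed.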
Mixing Conditions

Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions (Usually Ignored ...),
E.g. Independent and Ident. Dist'd

Mixing Conditions: Explore Weaker Assumptions, to Still Get:
• Law of Large Numbers
• Central Limit Theorem

• A Whole Area in Probability Theory
• ∃ a Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better, Newer References

Mixing Condition Used Here: Rho-Mixing
For Random Variables X₁, X₂, …, Define:
ρ(k) = sup_j ρ( σ(X₁, …, X_j), σ(X_{j+k}, X_{j+k+1}, …) )
Where ρ(A, B) = sup { |corr(f, g)| : f ∈ L²(A), g ∈ L²(B) },
For the Sigma-Fields Generated by:
• X₁, …, X_j
• X_{j+k}, X_{j+k+1}, …
• Note: Gap of Lag k
Assume: ρ(k) → 0, as k → ∞
Idea: Uncorrelated at Far Lags
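For intuition, a Gaussian AR(1) sequence is a textbook example of a ρ-mixing sequence: its correlations, and with them the ρ-mixing coefficients, decay geometrically in the lag k:

```python
import numpy as np

rng = np.random.default_rng(4)
phi, T = 0.8, 200_000
eps = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):                 # AR(1): x_t = phi * x_{t-1} + noise
    x[t] = phi * x[t - 1] + eps[t]

for k in (1, 5, 20, 50):
    c = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(k, round(c, 3))             # roughly phi**k: geometric decay in lag
```

Dependence at nearby lags is strong, but correlation at far lags is negligible, which is the "uncorrelated at far lags" idea the condition formalizes.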
HDLSS Math. Stat. of PCA

Conditions for Geo. Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors X = (X₁, X₂, …, X_d)ᵗ Are ρ-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused!)

Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering,
Not Always Clear, e.g. Microarrays

Condition from Jung & Marron (2009):
X_d ~ (0_d, Σ_d), where Σ_d = U_d Λ_d U_dᵗ
(Note: Not Gaussian)
Define Standardized Version: Z_d = Λ_d^(-1/2) U_dᵗ X_d
Assume ∃ a permutation of the d entries,
So that the entries of Z_d are ρ-mixing
HDLSS Math. Stat. of PCA

Careful look at: PCA Consistency – α > 1 spike
(Reality Check, Suggested by Reviewer)
Independent of Sample Size: So true for n = 1 (???)
Reviewer's Conclusion: Absurd, shows assumption α > 1 is too strong for practice

Yet HDLSS PCA Often Finds Signal, Not Pure Noise
[Scatterplot: Recall RNAseq Data From 8/23/12, d ~ 1700, n = 180;
Manually Brushed Clusters show Clear Alternate Splicing, Not Noise
(cf. Functional Data Analysis)]
HDLSS Math. Stat. of PCA

Recall Theoretical Separation:
• Strong Inconsistency – α < 1 spike
• Consistency – α > 1 spike
Mathematically Driven Conclusion: Real Data Signals Are This Strong!

An Interesting Objection: Should not Study Angles in PCA
Recall for Consistency: α > 1, Angle(û₁, u₁) → 0
For Strong Inconsistency: α < 1, Angle(û₁, u₁) → 90°

Because PC Scores (i.e. projections) Not Consistent:
For Scores ŝ_ij = P_{û_j} x_i and s_ij = P_{u_j} x_i
(What we study in PCA scatterplots)
Can Show: ŝ_ij / s_ij → R_j ≠ 1 (Random)
(Thanks to Dan Shen)

PC Scores (i.e. projections) Not Consistent:
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors": Same Realization of R_j, for i = 1, …, n
Axes have Inconsistent Scales, But Relationships are Still Useful
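A numerical sketch of the "proportional errors" point, under the same spike model (the spike size is chosen near the α = 1 boundary so the scale error is visible; the constants are illustrative, and centering is omitted since the population mean is 0):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 20_000, 10
lam1 = 2_000.0                        # spike comparable to total noise energy d
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)              # true direction u1 = e1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = np.sign(Vt[0, 0]) * Vt[0]    # empirical direction, sign-aligned with e1

s_true = X[:, 0]                      # scores on the true direction u1
s_hat = X @ u1_hat                    # scores on the empirical direction u1_hat
print(np.corrcoef(s_true, s_hat)[0, 1])   # relationships preserved
print(np.std(s_hat) / np.std(s_true))     # but the scale is inflated
```

The empirical scores are nearly a common multiple of the true scores: the axis scale is wrong (a random factor shared by all n cases), but the configuration of the points, which is what a PCA scatterplot displays, survives.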
HDLSS Deep Open Problem:

In PCA Consistency:
• Strong Inconsistency – α < 1 spike
• Consistency – α > 1 spike
What happens at boundary (α = 1)?
Result: ∃ interesting Limit Dist'ns, Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From Kernel Embedding Idea

Interesting Question: Behavior in Very High Dimension?
Answer, El Karoui (2010):
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD: Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall Intuition from above:
• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust
Mathematics behind this?
0 Covariance is not independence
Simple Example
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
0 Covariance is not independence
Simple Example c to make cov(XY) = 0
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection:
Should Not Study Angles in PCA
Recall for Consistency (α > 1): Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°
Objection: Because PC Scores (i.e. Projections) Are Not Consistent
For Scores ŝ_{ij} = P_{v̂_j} x_i
(What We Study in PCA Scatterplots)
And s_{ij} = P_{v_j} x_i
Can Show: ŝ_{ij} / s_{ij} → R_j ≠ 1 (Random)
(Thanks to Dan Shen)
HDLSS Math Stat of PCA
PC Scores (i.e. Projections) Not Consistent,
So How Can PCA Find Useful Signals in Data?
Key is "Proportional Errors": ŝ_{ij} / s_{ij} → R_j,
With the Same Realization of R_j for i = 1, ..., n
Axes Have Inconsistent Scales,
But Relationships Are Still Useful
HDLSS Math Stat of PCA
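A small simulation can illustrate the "proportional errors" phenomenon. This sketch assumes a single-spike model (λ_1 = d^α with α = 2, u_1 = e_1, fixed n = 20); the parameter values and names are my choices, not from the slides. The ratio of empirical to true score comes out nearly identical across all data points (in this strong-spike regime the common factor is close to 1; the theory says its limit is random).

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha = 4_000, 20, 2.0
lam1 = float(d) ** alpha               # strong spike, alpha > 1

# Single-spike data: u1 = e1; x_i = sqrt(lam1) z_i e1 + isotropic noise
Z1 = rng.standard_normal(n)
X = rng.standard_normal((d, n))
X[0, :] += np.sqrt(lam1) * Z1

# Empirical first eigenvector = top left singular vector of X
U, _, _ = np.linalg.svd(X, full_matrices=False)
u1_hat = U[:, 0] * np.sign(U[0, 0])    # align sign with e1

s_true = X[0, :]                       # true scores  s_i = u1^t x_i
s_hat = u1_hat @ X                     # empirical scores  u1_hat^t x_i
ratios = s_hat / s_true
print(ratios.round(4))                 # nearly the same value for every i
```

The point of the printout is that the per-point errors share one multiplicative factor, which is why PCA scatterplots remain interpretable despite inconsistency of the scores.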
In PCA Consistency:
Strong Inconsistency: α < 1 spike
Consistency: α > 1 spike
What Happens at the Boundary (α = 1)?
Ǝ Interesting Limit Dist'ns: Jung, Sen & Marron (2012)
HDLSS Deep Open Problem → Result
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In the Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension?
Implications for DWD:
Recall Main Advantage is for High d,
So Not Clear Embedding Helps;
Thus Not Yet Implemented in DWD
HDLSS Asymptotics & Kernel Methods
HDLSS Additional Results
Batch Adjustment: Xuxin Liu
Recall Intuition from Above:
Key is Sizes of Biological Subtypes;
Differing Ratios Trip Up the Mean,
But DWD is More Robust
Mathematics Behind This:
0 Covariance is not independence
Simple Example (family indexed by c):
• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small c: cov(X,Y) < 0
• For large c: cov(X,Y) > 0
• By continuity, Ǝ c with cov(X,Y) = 0
0 Covariance is not independence
Result:
• Joint distribution of X and Y:
  – Has Gaussian marginals
  – Has cov(X,Y) = 0
  – Yet strong dependence of X and Y
  – Thus not multivariate Gaussian
Shows: Multivariate Gaussian Means More
Than Gaussian Marginals
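A closely related construction can be checked in a few lines. This sign-flip variant is my simplification (the slides use a one-parameter family of weights on the same diagonal lines): mass lives on y = ±x, both marginals are Gaussian, the covariance vanishes, yet X and Y are totally dependent since |Y| = |X|.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

X = rng.standard_normal(n)
S = rng.choice([-1.0, 1.0], size=n)    # random sign, independent of X
Y = S * X                              # mass on the diagonal lines y = +/- x

print(round(float(np.cov(X, Y)[0, 1]), 4))                   # ~ 0
print(round(float(Y.mean()), 4), round(float(Y.var()), 4))   # N(0,1) marginal
print(bool(np.all(np.abs(Y) == np.abs(X))))                  # |Y| = |X| exactly
```

Since (X, Y) concentrates on two lines, it cannot be bivariate Gaussian despite the Gaussian marginals and zero covariance.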
HDLSS Asy's: Geometrical Represen'tion
Further Consequences of Geometric Represen'tion:
1. DWD more stable than SVM
   (based on deeper limiting distributions)
   (reflects intuitive idea of feeling sampling variation)
   (something like mean vs. median)
   Hall, Marron & Neeman (2005)
2. 1-NN rule inefficiency is quantified
   Hall, Marron & Neeman (2005)
3. Inefficiency of DWD for uneven sample sizes
   (motivates weighted version)
   Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
Spike Covariance Model, Paul (2007)
For Eigenvalues: λ_{1,d} = d^α, λ_{2,d} = ... = λ_{d,d} = 1
Note Critical Parameter: α
1st Eigenvector: u_1 (Turns Out: Direction Doesn't Matter)
How Good are Empirical Versions λ̂_{1,d}, û_1
as Estimates of λ_{1,d}, u_1?
Consistency (Big Enough Spike):
For α > 1: Angle(û_1, u_1) → 0
Strong Inconsistency (Spike Not Big Enough):
For α < 1: Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA
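The consistency / strong-inconsistency dichotomy is easy to see numerically. A minimal sketch, assuming the spike model λ_1 = d^α with all other eigenvalues 1 and u_1 = e_1 (the sample size and the particular α values are my choices): for α > 1 the empirical angle shrinks toward 0° as d grows, while for α < 1 it climbs toward 90°.

```python
import numpy as np

def angle_deg(d, alpha, n=20, seed=0):
    """Angle between the top empirical eigenvector and u1 = e1, in degrees."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((d, n))                # isotropic noise, variance 1
    X[0, :] *= np.sqrt(float(d) ** alpha)          # spike: lambda_1 = d^alpha
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    c = min(abs(float(U[0, 0])), 1.0)              # |cos(angle(u1_hat, e1))|
    return float(np.degrees(np.arccos(c)))

for d in (100, 1_000, 10_000):
    print(d, round(angle_deg(d, 1.5), 1), round(angle_deg(d, 0.3), 1))
# alpha = 1.5 column trends toward 0; alpha = 0.3 column trends toward 90
```

Note n stays fixed at 20 while d grows, which is exactly the HDLSS limit the slides study.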
Intuition: Random Noise ~ d^{1/2}
For α > 1 (Recall d^α is on the Scale of Variance):
Spike Pops Out of Pure Noise Sphere
For α < 1:
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
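The "pure noise sphere" intuition itself is a one-liner to verify: norms of N(0, I_d) vectors, divided by d^{1/2}, concentrate ever more tightly around 1 as d grows (the dimensions and sample counts below are my choices).

```python
import numpy as np

rng = np.random.default_rng(6)
for d in (100, 10_000, 100_000):
    # 50 pure-noise vectors; ratio ||Z|| / sqrt(d) should hug 1 as d grows
    r = np.linalg.norm(rng.standard_normal((50, d)), axis=1) / np.sqrt(d)
    print(d, round(float(r.min()), 3), round(float(r.max()), 3))
```

A spike direction with standard deviation d^{α/2} therefore escapes this radius-d^{1/2} sphere exactly when α > 1.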
Consistency of Eigenvalues?
λ̂_{1,d} / λ_{1,d} →_L χ²_n / n
Eigenvalues Inconsistent (for Fixed n)
But Known Distribution
Consistent When n → ∞ as Well
HDLSS Math Stat of PCA
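The χ²_n / n limit can be checked by simulation. A sketch assuming the spike model λ_1 = d^α with α > 1 and known zero mean (all parameter values are mine): over repeated draws with n fixed, λ̂_1 / λ_1 should show mean ≈ 1 and variance ≈ 2/n, matching χ²_n / n.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, alpha, reps = 20_000, 5, 1.5, 400
lam1 = float(d) ** alpha

ratios = np.empty(reps)
for rep in range(reps):
    X = rng.standard_normal((d, n))
    X[0, :] *= np.sqrt(lam1)           # spike along e1
    # top eigenvalue of the sample covariance (1/n) X X^t via singular values
    s = np.linalg.svd(X, compute_uv=False)
    ratios[rep] = (s[0] ** 2 / n) / lam1

# chi^2_n / n has mean 1 and variance 2/n = 0.4 here
print(round(float(ratios.mean()), 3), round(float(ratios.var()), 3))
```

The eigenvalue estimate does not settle down as d grows (n is fixed), but its limiting distribution is known, which is the slides' point.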
Conditions for Geo Rep'n & PCA Consist.
John Kent Example:
X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 1000 I_d)
Can Only Say: ||X_d|| = O_p(d^{1/2}),
With ||X_d|| ≈ d^{1/2} w.p. 1/2 and ≈ (1000 d)^{1/2} w.p. 1/2,
Not Deterministic
PCA Conditions Same, Since Noise Still O_p(d^{1/2})
But for Geo Rep'n Need Some Mixing Cond.
Conclude: Need Some Mixing Condition
HDLSS Math Stat of PCA
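The Kent example is easy to visualize numerically (the sizes below are my choices): every norm is on the d^{1/2} scale, but the scale factor flips between 1 and 1000^{1/2} ≈ 31.6, so no single deterministic radius works and the geometric representation fails.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 20_000, 200

big = rng.random(n) < 0.5                       # mixture component, prob 1/2 each
scale = np.where(big, np.sqrt(1000.0), 1.0)     # sd 1 or sqrt(1000)
X = rng.standard_normal((n, d)) * scale[:, None]

r = np.linalg.norm(X, axis=1) / np.sqrt(d)      # norms on the d^(1/2) scale
print(np.sort(r.round(2)))  # two tight clusters, near 1 and near 31.6
```

Each cluster is internally very tight (concentration of measure), which is why PCA-style conditions are unaffected even though the representation is random rather than deterministic.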
Mixing Conditions
Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
• Law of Large Numbers
  ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both Have Technical Assumptions (Usually Ignored),
E.g. Independent and Ident. Dist'd
Mixing Conditions: Explore Weaker Assumptions
That Still Give the LLN and CLT
Mixing Conditions:
• A Whole Area in Probability Theory
• Ǝ a Large Literature
• A Comprehensive Reference:
  Bradley (2005, update of 1986 version)
• Ǝ Better, Newer References
Mixing Condition Used Here: Rho-Mixing
For Random Variables X_1, X_2, ..., Define
ρ(k) = sup |corr(f, g)|
Where the sup is over f ∈ L²(σ(X_1, ..., X_j)),
g ∈ L²(σ(X_{j+k}, X_{j+k+1}, ...))
(Sigma-Fields Generated by the Past and the Future;
Note Gap of Lag k)
Assume: ρ(k) → 0 as k → ∞
Idea: Uncorrelated at Far Lags
Mixing Conditions
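As a flavor of the idea (only a flavor: the true ρ(k) takes a sup over all L² functions of the past and the future, not just the raw variables; the AR(1) choice and its parameter are mine): for an AR(1) sequence, correlation across a gap of lag k decays like φ^k, so dependence vanishes at far lags.

```python
import numpy as np

rng = np.random.default_rng(5)
phi, n = 0.8, 200_000

eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):                  # AR(1): x_t = phi * x_{t-1} + eps_t
    x[t] = phi * x[t - 1] + eps[t]

for k in (1, 5, 10, 20):
    c = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(k, round(float(c), 3), round(phi ** k, 3))  # sample corr vs phi^k
```

Geometric decay like this is far stronger than the bare requirement ρ(k) → 0, but it makes the "uncorrelated at far lags" idea concrete.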
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
0cov YXc
c 0cov YX
0 Covariance is not independence
Simple Example
bull Distribution is degenerate
bull Supported on diagonal lines
bull Not abs cont wrt 2-d Lebesgue meas
bull For small have
bull For large have
bull By continuity with
0cov YXc
c
c 0cov YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
HDLSS Math Stat of PCA
Consistency of eigenvalues?
$\frac{\hat\lambda_1}{\lambda_1} \xrightarrow{L} \frac{\chi^2_n}{n}$, as $d \to \infty$ ($n$ fixed)
Eigenvalues Inconsistent
But Known Distribution
Consistent when $n \to \infty$ as Well
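The fixed-$n$ limit can be sanity-checked by simulation: under the spike model with $\alpha > 1$, the ratio $\hat\lambda_1 / \lambda_1$ should behave like $\chi^2_n / n$ (mean 1, variance $2/n$). A sketch (not from the slides; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def top_eig_ratio(d=4000, n=5, alpha=1.5):
    """One draw of lambda1_hat / lambda1 under the spike model."""
    lam1 = d ** alpha
    sd = np.ones(d)
    sd[0] = np.sqrt(lam1)
    X = rng.standard_normal((n, d)) * sd
    # nonzero eigenvalues of X^t X / n equal those of the n x n Gram matrix
    lam1_hat = np.linalg.eigvalsh(X @ X.T / n).max()
    return lam1_hat / lam1

ratios = np.array([top_eig_ratio() for _ in range(400)])
# compare to chi2_n / n with n = 5: mean 1, variance 2/5 = 0.4
print(ratios.mean(), ratios.var())
```

So the top eigenvalue is inconsistent for fixed $n$ (the ratio stays random), but its limiting distribution is explicit.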
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consist.?
John Kent example:
$X_d \sim \tfrac{1}{2} N_d(0, I_d) + \tfrac{1}{2} N_d(0, 100\, I_d)$
Can only say: $\|X_d\| = O_p(d^{1/2})$,
with $\|X_d\| \approx d^{1/2}$ w.p. $\tfrac{1}{2}$ and $\approx 10\, d^{1/2}$ w.p. $\tfrac{1}{2}$,
not deterministic
PCA Conditions Same, since Noise Still $O_p(d^{1/2})$
But for Geo Rep'n, need some Mixing Cond.
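The Kent example is simple to simulate. A sketch (not from the slides; assumes the second mixture component has covariance $100 I_d$, as reconstructed above), showing that $\|X_d\| / d^{1/2}$ concentrates near two different values rather than one:

```python
import numpy as np

rng = np.random.default_rng(2)

def kent_draw(d):
    """One draw from the mixture 0.5 N(0, I_d) + 0.5 N(0, 100 I_d),
    returning ||X|| / sqrt(d)."""
    scale = 10.0 if rng.random() < 0.5 else 1.0
    X = scale * rng.standard_normal(d)
    return np.linalg.norm(X) / np.sqrt(d)

vals = np.array([kent_draw(10_000) for _ in range(200)])
# ||X||/sqrt(d) lands near 1 or near 10, each w.p. 1/2: random, not deterministic
print(np.round(vals.min(), 2), np.round(vals.max(), 2))
```

So the geometric representation (a single deterministic radius) fails here, even though each mixture component separately satisfies it.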
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Conclude: Need some Mixing Condition
Mixing Conditions
Idea From Probability Theory:
Recall Standard Asymptotic Results, as $n \to \infty$:
Law of Large Numbers
("Weak" = in prob., "Strong" = a.s.)
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignored...)
E.g. Independent and Ident. Dist'd
Mixing Conditions:
Explore Weaker Assumptions, to Still Get
Law of Large Numbers
Central Limit Theorem
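The point that LLN and CLT survive under dependence, provided it decays, can be illustrated with a stationary AR(1) chain (a standard mixing example; this sketch is not from the slides and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1_mean(T=2000, phi=0.6):
    """Sample mean of a stationary Gaussian AR(1) chain.
    Correlations decay geometrically, so LLN and CLT still apply."""
    x = np.empty(T)
    x[0] = rng.standard_normal() / np.sqrt(1 - phi ** 2)  # stationary start
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x.mean()

means = np.array([ar1_mean() for _ in range(300)])
# LLN: means concentrate near 0; CLT: fluctuations are Gaussian-scale,
# with long-run variance 1/(1-phi)^2 = 6.25, so sd of mean ~ sqrt(6.25/2000)
print(np.round(means.mean(), 3), np.round(means.std(), 3))
```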
Mixing Conditions
• A Whole Area in Probability Theory
• A Large Literature
• A Comprehensive Reference:
Bradley (2005, update of 1986 version)
• Better, Newer References?
Mixing Conditions
Mixing Condition Used Here:
Rho-Mixing
For Random Variables $X_1, X_2, \dots$, Define:
$\rho(s) = \sup_t \rho\big(\sigma(X_1, \dots, X_t),\ \sigma(X_{t+s}, X_{t+s+1}, \dots)\big)$
Where, for Sigma-Fields $\mathcal{A}$, $\mathcal{B}$ (Generated by the Indicated Variables):
$\rho(\mathcal{A}, \mathcal{B}) = \sup\{\, |\mathrm{corr}(f, g)| : f \in L^2(\mathcal{A}),\ g \in L^2(\mathcal{B}) \,\}$
Note: Gap of Lag $s$
Assume: $\rho(s) \to 0$, as $s \to \infty$
Idea: Uncorrelated at Far Lags
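For a Gaussian AR(1) the maximal correlation across a gap of lag $s$ reduces to the ordinary lag-$s$ autocorrelation $\varphi^s$ (via the Kolmogorov-Rozanov property of Gaussian processes), so the decay of $\rho(s)$ can be seen directly. A sketch, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(4)

# For a Gaussian AR(1), the lag-s correlation is phi^s, so the chain is
# rho-mixing: uncorrelated at far lags.
phi, T = 0.7, 100_000
x = np.empty(T)
x[0] = rng.standard_normal() / np.sqrt(1 - phi ** 2)  # stationary start
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.standard_normal()

for s in (1, 5, 10):
    r = np.corrcoef(x[:-s], x[s:])[0, 1]
    print(s, round(r, 3), round(phi ** s, 3))   # empirical vs. theoretical phi^s
```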
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors
$X = (X_1, X_2, \dots, X_d)^t$
Are $\rho$-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused...)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)
Tricky Point: Classical Mixing Conditions
Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$
Note: Not Gaussian
Define Standardized Version:
$Z_d = \Lambda_d^{-1/2} U_d^t X_d$
Assume Ǝ a permutation of the entries,
So that $Z_d$ is ρ-mixing
HDLSS Math Stat of PCA
Careful look at:
PCA Consistency ($\alpha > 1$ spike)
(Reality Check, Suggested by Reviewer)
Condition $\alpha > 1$ is Independent of Sample Size,
So true for n = 1 (?!?)
Reviewer's Conclusion: Absurd, shows
assumption too strong for practice
HDLSS Math Stat of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise
Recall RNAseq Data From 8/23/12:
d ~ 1700, n = 180
Manually Brushed Clusters:
Clear Alternate Splicing, Not Noise
(Functional Data Analysis)
HDLSS Math Stat of PCA
Recall Theoretical Separation:
Strong Inconsistency: $\alpha < 1$ spike
Consistency: $\alpha > 1$ spike
Mathematically Driven Conclusion:
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
An Interesting Objection:
Should not Study Angles in PCA
Recall for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat u_1, u_1) \to 0$
For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat u_1, u_1) \to 90^{\circ}$
Objection: Should not Study Angles in PCA,
Because PC Scores (i.e. projections)
Not Consistent
For Scores $\hat s_{ij} = P_{\hat v_j} x_i$ and $s_{ij} = P_{v_j} x_i$
(What we study in PCA scatterplots)
Can Show: $\hat s_{ij} / s_{ij} \to R_j \neq 1$ (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC Scores (i.e. projections) Not Consistent
So how can PCA find Useful Signals in Data?
(Recall: HDLSS PCA Often Finds Signal, Not Pure Noise)
Key is "Proportional Errors":
$\hat s_{ij} / s_{ij} \to R_j$, Same Realization of $R_j$ for All $i$
Axes have Inconsistent Scales,
But Relationships are Still Useful
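The "proportional errors" idea can be seen in a small experiment: scores along the empirical PC1 track scores along the true direction up to one common factor, so scatterplot relationships survive. A sketch under the spike model (not from the slides; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Spike model with true u1 = e1, in the consistent regime (alpha > 1)
d, n, alpha = 2000, 20, 2.0
sd = np.ones(d)
sd[0] = d ** (alpha / 2.0)
X = rng.standard_normal((n, d)) * sd

_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1_hat = Vt[0] if Vt[0, 0] > 0 else -Vt[0]   # fix sign for comparability

s_true = X[:, 0]        # scores on the true direction e1
s_hat = X @ v1_hat      # empirical PC1 scores

# Errors are (nearly) proportional: s_hat ~ R * s_true with one common R
slope = (s_hat @ s_true) / (s_true @ s_true)
print(round(float(np.corrcoef(s_hat, s_true)[0, 1]), 5))
```

The near-perfect correlation shows the two sets of scores differ essentially by a single scale factor, which is why PCA scatterplots remain interpretable even when individual scores are not consistent.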
HDLSS Deep Open Problem
In PCA Consistency:
Strong Inconsistency: $\alpha < 1$ spike
Consistency: $\alpha > 1$ spike
What happens at the boundary ($\alpha = 1$)?
Result: Ǝ interesting Limit Dist'ns
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD:
Recall Main Advantage is for High d,
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Additional Results
Batch Adjustment: Xuxin Liu
Recall Intuition from above:
Key is sizes of biological subtypes
Differing ratio trips up mean,
But DWD more robust
Mathematics behind this:
0. Covariance is not independence
Simple Example:
• Distribution is degenerate
• Supported on diagonal lines
• Not abs. cont. w.r.t. 2-d Lebesgue meas.
• For small $c$, have $\mathrm{cov}(X, Y) < 0$
• For large $c$, have $\mathrm{cov}(X, Y) > 0$
• By continuity, Ǝ $c$ with $\mathrm{cov}(X, Y) = 0$
Result:
• Joint distribution of $X$ and $Y$:
– Has Gaussian marginals
– Has $\mathrm{cov}(X, Y) = 0$
– Yet strong dependence of $X$ and $Y$
– Thus not multivariate Gaussian
Shows: Multivariate Gaussian means more
than Gaussian Marginals
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
YX
0cov YX
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
YX
0cov YX
X Y
0 Covariance is not independence
Result
bull Joint distribution of and ndash Has Gaussian marginals
ndash Has
ndash Yet strong dependence of and
ndash Thus not multivariate Gaussian
Shows Multivariate Gaussian means more
than Gaussian Marginals
YX
0cov YX
X Y
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA

Consistency & Strong Inconsistency
(Study Properties of PCA
in Estimating Eigen-Directions & -Values)
[Assume Data Are Mean Centered]

Spike Covariance Model, Paul (2007)
For Eigenvalues:  λ_1(d) = d^α,   λ_2(d) = ⋯ = λ_d(d) = 1
Note Critical Parameter:  α
1st Eigenvector:  u_1
(Turns Out: Direction Doesn't Matter)
How Good Are the Empirical Versions
λ̂_1(d), û_1  as Estimates of  λ_1(d), u_1?
HDLSS Math Stat of PCA

Consistency (big enough spike):
For α > 1,   Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough):
For α < 1,   Angle(û_1, u_1) → 90°

Intuition:  Random Noise ~ d^(1/2)
For α > 1  (recall d^α is on the scale of variance, so the spike is ~ d^(α/2)):
Spike Pops Out of Pure Noise Sphere
For α < 1:
Spike Contained in Pure Noise Sphere
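The dichotomy above is easy to see numerically. Here is a minimal numpy sketch (not from the slides; the dimensions, sample size, and α values are illustrative choices): draw samples from the spike model and track the angle between the sample and true first eigenvectors as d grows.

```python
# Illustrative check of PCA (in)consistency in the spike covariance model
# lambda_1 = d^alpha, lambda_2 = ... = lambda_d = 1.  Parameter choices are
# illustrative assumptions, not taken from the slides.
import numpy as np

rng = np.random.default_rng(0)

def angle_deg(d, alpha, n=20):
    """Angle (degrees) between sample and true first eigenvector."""
    u1 = np.zeros(d); u1[0] = 1.0              # true first eigenvector e_1
    sd = np.ones(d); sd[0] = d ** (alpha / 2)  # sqrt of the eigenvalues
    X = rng.standard_normal((n, d)) * sd       # rows ~ N(0, diag(d^alpha, 1, ..., 1))
    # first sample eigenvector = top right-singular vector of the data matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    c = abs(Vt[0] @ u1)
    return np.degrees(np.arccos(min(c, 1.0)))

for d in (100, 1000, 10000):
    print(d, round(angle_deg(d, alpha=1.5), 1), round(angle_deg(d, alpha=0.5), 1))
# alpha = 1.5 > 1: angle shrinks toward 0 as d grows (consistency);
# alpha = 0.5 < 1: angle climbs toward 90 degrees (strong inconsistency).
```

With α above the critical value the estimated direction locks onto the spike; below it, the spike is swamped by the pure-noise sphere.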
HDLSS Math Stat of PCA

Consistency of Eigenvalues?
λ̂_1(d) / λ_1(d)  →_L  χ²_n / n,   as d → ∞ (n fixed)
Eigenvalues Inconsistent
But Known Distribution
Consistent when n → ∞ as Well
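The χ²_n / n limit can be checked by simulation. A sketch with illustrative parameters (assuming the ratio form stated above; χ²_n / n has mean 1 and variance 2/n):

```python
# Illustrative check: in the alpha > 1 spike model the top sample eigenvalue
# is inconsistent for fixed n, but lambdahat_1 / lambda_1 settles to a
# chi-squared_n / n limit as d grows.  Parameter choices are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d, alpha = 10, 20000, 1.5
lam1 = d ** alpha
sd = np.ones(d); sd[0] = np.sqrt(lam1)
ratios = []
for _ in range(200):
    X = rng.standard_normal((n, d)) * sd
    # top eigenvalue of the sample covariance (1/n) X'X via the top singular value
    lam1_hat = np.linalg.svd(X, compute_uv=False)[0] ** 2 / n
    ratios.append(lam1_hat / lam1)
ratios = np.array(ratios)
print(ratios.mean(), ratios.var())   # near 1 and near 2/n = 0.2
```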
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
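The Kent example is easy to see numerically: each mixture component concentrates on its own sphere, so the scaled norm lands near 1 or near 10 at random and never settles on one constant. A short sketch (sizes are illustrative):

```python
# Illustrative John Kent example: equal mixture of N_d(0, I_d) and
# N_d(0, 100 I_d).  ||X|| / sqrt(d) does not converge to a constant --
# it lands near 1 or near 10 depending on the mixture component, so the
# geometric representation fails despite each component concentrating.
import numpy as np

rng = np.random.default_rng(2)
d = 100000
scaled_norms = []
for _ in range(10):
    scale = 1.0 if rng.random() < 0.5 else 10.0   # pick a mixture component
    x = scale * rng.standard_normal(d)
    scaled_norms.append(np.linalg.norm(x) / np.sqrt(d))
print(np.round(scaled_norms, 3))   # values cluster near 1.0 or near 10.0
```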
Mixing Conditions

Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
Law of Large Numbers  ("Weak" = in prob., "Strong" = a.s.)
Central Limit Theorem
Both Have Technical Assumptions (usually ignored),
e.g. Independent and Identically Dist'd
Mixing Conditions Explore Weaker Assumptions
That Still Give the
Law of Large Numbers &
Central Limit Theorem

• A Whole Area in Probability Theory
• A Large Literature
• A Comprehensive Reference:
Bradley (2005, update of 1986 version)
• Better: Newer References

Mixing Condition Used Here:  Rho-Mixing
For Random Variables X_1, X_2, …, Define
ρ(k) = sup_j sup { |corr(f, g)| :  f ∈ L²(σ(X_1, …, X_j)),  g ∈ L²(σ(X_(j+k), X_(j+k+1), …)) }
Where σ(·) Are the Sigma-Fields Generated by the Indicated Variables
(Note the Gap of Lag k)
Assume:  ρ(k) → 0  as  k → ∞
Idea: Uncorrelated at Far Lags
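A concrete ρ-mixing example may help: for a stationary Gaussian AR(1) sequence with coefficient φ, the maximal-correlation coefficient decays geometrically (ρ(k) = |φ|^k), so the entries satisfy the assumption, and averages over the d entries of a single data vector still settle down despite the dependence. A sketch with illustrative parameters:

```python
# Illustrative rho-mixing example: entries of one data vector follow a
# stationary Gaussian AR(1), for which rho(k) = |phi|^k -> 0.  The law of
# large numbers still applies, so ||X||^2 / d concentrates.
import numpy as np

rng = np.random.default_rng(3)

def ar1_vector(d, phi):
    """One data vector whose d entries are a stationary AR(1) sequence."""
    x = np.empty(d)
    x[0] = rng.standard_normal() / np.sqrt(1 - phi**2)  # stationary start
    for i in range(1, d):
        x[i] = phi * x[i - 1] + rng.standard_normal()
    return x

phi, d = 0.7, 200000
x = ar1_vector(d, phi)
# sample average of squared entries vs the true variance 1 / (1 - phi^2)
print(np.mean(x**2), 1 / (1 - phi**2))
```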
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of the Data Vectors  X = (X_1, X_2, …, X_d)ᵗ  Are ρ-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused)

Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions
Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays

Condition from Jung & Marron (2009):
X_d ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_dᵗ
(Note: Not Gaussian)
Define the Standardized Version:  Z_d = Λ_d^(−1/2) U_dᵗ X_d
Assume Ǝ a Permutation of the Entries of Z_d
So That the Result Is ρ-mixing
HDLSS Math Stat of PCA

Careful Look at PCA Consistency (α > 1 spike)
(Reality Check Suggested by Reviewer):
The Condition Is Independent of Sample Size,
So True Even for n = 1 (!?)
Reviewer's Conclusion: Absurd, Shows
Assumption Too Strong for Practice

But: HDLSS PCA Often Finds Signal, Not Pure Noise
Recall RNAseq Data From 8/23/12:  d ~ 1700,  n = 180

Functional Data Analysis

Manually Brushed Clusters:
Clear Alternate Splicing, Not Noise

Recall Theoretical Separation:
Strong Inconsistency:  α < 1 spike
Consistency:  α > 1 spike
Mathematically Driven Conclusion:
Real Data Signals Are This Strong
HDLSS Math Stat of PCA

An Interesting Objection:
Should Not Study Angles in PCA,
Because PC Scores (i.e. projections) Are Not Consistent

Recall, for Consistency (α > 1):  Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1):  Angle(û_1, u_1) → 90°

For Scores  ŝ_(j,i) = P_(û_j) x_i
(What We Study in PCA Scatterplots)
and  s_(j,i) = P_(u_j) x_i,
Can Show:  ŝ_(j,i) / s_(j,i) → R_j ≠ 1  (Random)
(Thanks to Dan Shen)

PC Scores Not Consistent,
So How Can PCA Find Useful Signals in Data?
Key Is "Proportional Errors":
Same Realization of R_j for i = 1, …, n
Axes Have Inconsistent Scales,
But Relationships Are Still Useful
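The "proportional errors" point can be illustrated directly: in a spike-model simulation, the estimated scores differ from the true ones by (approximately) one shared multiplicative factor, so the scatterplot geometry survives. A hedged numpy sketch (illustrative parameters; the exact limit theory is in the Dan Shen work cited above):

```python
# Illustrative "proportional errors" in PC scores: shat_{1,i} / s_{1,i}
# is close to a single (random) factor shared by all cases i.
# Parameter choices are assumptions, not from the slides.
import numpy as np

rng = np.random.default_rng(4)
n, d, alpha = 20, 20000, 1.0
sd = np.ones(d); sd[0] = d ** (alpha / 2)    # spike along e_1
X = rng.standard_normal((n, d)) * sd
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] if Vt[0, 0] > 0 else -Vt[0]   # fix the arbitrary sign
s_true = X[:, 0]                             # s_{1,i}: scores on the true u_1 = e_1
s_hat = X @ u1_hat                           # shat_{1,i}: scores on the estimate
# ratios cluster near one shared factor (cases with tiny s_true can wobble)
print(np.round(s_hat / s_true, 3))
```

The estimated scores are a rescaled version of the truth, which is why PCA scatterplots remain interpretable even though the scores themselves are inconsistent.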
HDLSS Deep Open Problem

In PCA Consistency:
Strong Inconsistency:  α < 1 spike
Consistency:  α > 1 spike
What Happens at the Boundary (α = 1)?
Result: Ǝ Interesting Limit Dist'ns,
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall: Flexibility From the Kernel Embedding Idea

Interesting Question:
Behavior in Very High Dimension?
Answer: El Karoui (2010):
• In Random Matrix Limit
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:
Recall Main Advantage Is for High d,
So Not Clear Embedding Helps;
Thus Not Yet Implemented in DWD
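El Karoui's phenomenon can be glimpsed in a small experiment: for high-dimensional data with a bandwidth on the scale of the dimension, pairwise distances concentrate, so the Gaussian kernel matrix is nearly an affine function of the inner-product (linear) kernel. A sketch (the bandwidth convention σ² = d and all sizes are illustrative assumptions):

```python
# Illustrative high-dimensional kernel flattening: for standard Gaussian data,
# squared distances concentrate near 2d, so the Gaussian kernel matrix is well
# approximated by an affine function of the linear kernel G = X X'.
import numpy as np

rng = np.random.default_rng(5)
n, d = 40, 50000
X = rng.standard_normal((n, d))
G = X @ X.T                                              # linear kernel
sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G   # squared distances
K = np.exp(-sq / (2 * d))                                # Gaussian kernel, sigma^2 = d
# Expanding around the concentrated distance 2d gives, for i != j,
# K_ij ≈ e^{-1} * (1 + G_ij / d):
K_lin = np.exp(-1.0) * (1 + G / d)
off = ~np.eye(n, dtype=bool)
print(np.max(np.abs(K - K_lin)[off]))   # small: kernel ≈ affine in the linear kernel
```

With the kernel matrix essentially affine in G, kernel classifiers built from it behave like linear classifiers, matching the random-matrix-limit statement above.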
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall Intuition From Above:
Key Is Sizes of Biological Subtypes;
Differing Ratio Trips Up the Mean,
But DWD Is More Robust
Mathematics Behind This:

Zero Covariance Is Not Independence
Result: Joint Distribution of X and Y:
– Has Gaussian marginals
– Has cov(X, Y) = 0
– Yet strong dependence of X and Y
– Thus not multivariate Gaussian
Shows: Multivariate Gaussian Means More
Than Gaussian Marginals
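A classical example matching the slide's description (the particular construction is an illustrative choice, not necessarily the one used in the course): take X ~ N(0,1) and Y = S·X with S = ±1 an independent fair sign. Both marginals are standard Gaussian and cov(X, Y) = 0, yet |Y| = |X|, and X + Y has an atom at 0, so (X, Y) is not bivariate Gaussian. A quick numerical check:

```python
# Illustrative check that zero covariance with Gaussian marginals does not
# imply independence or joint Gaussianity: Y = S*X with an independent sign S.
import numpy as np

rng = np.random.default_rng(6)
m = 200000
X = rng.standard_normal(m)
S = rng.choice([-1.0, 1.0], size=m)
Y = S * X                                  # marginal of Y is still N(0, 1)
print(np.cov(X, Y)[0, 1])                  # near 0
print(np.mean(X + Y == 0))                 # near 0.5: X+Y has an atom at 0, not Gaussian
print(np.allclose(np.abs(X), np.abs(Y)))   # True: strong dependence
```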
HDLSS Asy's: Geometrical Represen'tion

Further Consequences of the Geometric Represen'tion:
1. DWD more stable than SVM
(based on deeper limiting distributions)
(reflects intuitive idea of feeling sampling variation,
something like mean vs. median)
Hall, Marron, Neeman (2005)
2. 1-NN rule inefficiency is quantified
Hall, Marron, Neeman (2005)
3. Inefficiency of DWD for uneven sample size
(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection:
Should not Study Angles in PCA
Recall for Consistency (α > 1): Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1): Angle(û_1, u_1) → 90°
Because PC Scores (i.e. projections) Not Consistent
For Scores ŝ_{ij} = P_{v̂_j} x_i and s_{ij} = P_{v_j} x_i
(What we study in PCA scatterplots)
Can Show: ŝ_{ij} / s_{ij} → R_j ≠ 1 (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC Scores (i.e. projections) Not Consistent,
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors": ŝ_{ij} / s_{ij} → R_j
Same Realization of R_j for All Scores
Axes have Inconsistent Scales,
But Relationships are Still Useful
HDLSS Math Stat of PCA
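The point that relationships among scores stay useful can be illustrated by simulation. In this sketch (hypothetical cluster sizes and mean shift, chosen for illustration), two high-dimensional clusters remain clearly separated in PC1 scores even with d far larger than n:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 2000, 40
# Two clusters with a (hypothetical) mean shift along one coordinate
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))        # N(0, I_d) noise
X[labels == 1, 0] += 50.0          # shift second cluster
Xc = X - X.mean(0)                 # mean center

# PC1 scores via economy SVD of the centered data
U, svals, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]

# Standardized separation of the two clusters in PC1 scores
a, b = pc1[labels == 0], pc1[labels == 1]
sep = abs(a.mean() - b.mean()) / np.sqrt((a.var() + b.var()) / 2)
print(sep > 3)   # clusters clearly separated, despite d >> n
```

The estimated direction and scale are noisy here, but the cluster structure in the scatterplot of scores is preserved, matching the "proportional errors" message above.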
In PCA Consistency:
Strong Inconsistency: α < 1 spike
Consistency: α > 1 spike
What happens at boundary (α = 1)?
∃ interesting Limit Dist'ns:
Jung, Sen & Marron (2012)
HDLSS Deep Open Problem: Result
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question: Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD:
Recall Main Advantage is for High d,
So not Clear Embedding Helps,
Thus not yet Implemented in DWD
HDLSS Asymptotics & Kernel Methods
HDLSS Additional Results
Batch Adjustment: Xuxin Liu
Recall Intuition from above:
Key is sizes of biological subtypes.
Differing ratio trips up mean,
But DWD more robust.
Mathematics behind this?
HDLSS Asy's: Geometrical Represen'tion
Further Consequences of Geometric Represen'tion
1. DWD more stable than SVM
(based on deeper limiting distributions)
(reflects intuitive idea "feeling sampling variation")
(something like mean vs. median)
Hall, Marron, Neeman (2005)
2. 1-NN rule inefficiency is quantified, Hall, Marron, Neeman (2005)
3. Inefficiency of DWD for uneven sample size
(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
(Study Properties of PCA,
In Estimating Eigen-Directions & -Values)
[Assume Data are Mean Centered]
Spike Covariance Model, Paul (2007):
For Eigenvalues: λ_1(d) = d^α, λ_2(d) = ⋯ = λ_d(d) = 1
Note Critical Parameter: α
1st Eigenvector: u_1
(Turns out: Direction Doesn't Matter)
How Good are Empirical Versions
λ̂_1(d), …, λ̂_d(d), û_1
as Estimates?
Consistency (big enough spike):
For α > 1, Angle(û_1, u_1) → 0
Strong Inconsistency (spike not big enough):
For α < 1, Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA
Intuition: Random Noise ~ d^{1/2}
For α > 1 (Recall λ on Scale of Variance):
Spike Pops Out of Pure Noise Sphere
For α < 1:
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
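The α > 1 vs. α < 1 dichotomy above can be seen directly by simulation. A sketch assuming a single spike along the first coordinate (the particular d, n, and α values are arbitrary choices for illustration):

```python
import numpy as np

def spike_angle(d, alpha, n=20, seed=1):
    """Angle (degrees) between true and sample first eigenvector
    in the spike model: lambda_1 = d^alpha, all others = 1."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))            # N(0, I_d) noise
    X[:, 0] *= np.sqrt(d ** alpha)         # spike in first coordinate
    S = np.cov(X.T)                        # sample covariance (d x d)
    w, V = np.linalg.eigh(S)
    u1_hat = V[:, -1]                      # top sample eigenvector
    cos = min(abs(u1_hat[0]), 1.0)         # |<u1_hat, e_1>|
    return np.degrees(np.arccos(cos))

d = 2000
print(spike_angle(d, alpha=1.5))   # small angle: consistency
print(spike_angle(d, alpha=0.5))   # angle near 90: strong inconsistency
```

With the spike above the critical exponent the sample eigenvector locks onto the true direction; below it, the estimate is essentially a random noise direction.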
Consistency of eigenvalues?
Eigenvalues Inconsistent:
λ̂_1(d) / λ_1(d) →_L χ²_n / n
But Known Distribution,
Consistent when n → ∞ as Well
HDLSS Math Stat of PCA
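The limiting law above can be checked by Monte Carlo. A sketch under the simplifying assumption that the mean is known to be zero (so no centering, matching χ²_n with n degrees of freedom; sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 5000, 10, 1.5
lam1 = d ** alpha                           # alpha > 1 spike
ratios = []
for _ in range(200):
    X = rng.normal(size=(n, d))
    X[:, 0] *= np.sqrt(lam1)
    # dual (n x n) matrix shares nonzero eigenvalues with (1/n) X^t X
    S = X @ X.T / n
    lam1_hat = np.linalg.eigvalsh(S)[-1]
    ratios.append(lam1_hat / lam1)

# chi^2_n / n has mean 1 and variance 2/n = 0.2
print(np.mean(ratios), np.var(ratios))
```

The ratio does not concentrate at 1 for fixed n (inconsistency), but its distribution is known, and it tightens as n grows.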
Conditions for Geo Rep'n & PCA Consist.
John Kent example:
X_d ~ (1/2) N(0_d, 100 I_d) + (1/2) N(0_d, I_d)
Can only say ||X_d|| = O_p(d^{1/2}):
||X_d|| / d^{1/2} → 10 w.p. 1/2, → 1 w.p. 1/2,
not deterministic
PCA Conditions Same, since Noise Still O_p(d^{1/2})
But for Geo Rep'n need some Mixing Cond.
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Conclude: Need some Mixing Condition
HDLSS Math Stat of PCA
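The bimodal limit of the scaled norm in the Kent example is easy to see by simulation (the sample sizes here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 10_000, 400
# Each vector is N(0, 100 I_d) or N(0, I_d), with probability 1/2 each
big = rng.integers(0, 2, size=n).astype(bool)
X = rng.normal(size=(n, d))
X[big] *= 10.0                     # sd 10 component has covariance 100 I_d

scaled = np.linalg.norm(X, axis=1) / np.sqrt(d)
# ||X_d|| / d^{1/2} concentrates near 10 or near 1, not one value
print(sorted(set(np.round(scaled).astype(int))))   # → [1, 10]
```

So the norms do not settle on a single deterministic radius, which is why the geometric representation needs more than a covariance condition here.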
Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞
Law of Large Numbers
("Weak" = in prob., "Strong" = a.s.)
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignored!)
E.g. Independent and Ident. Dist'd
Mixing Conditions:
Explore Weaker Assumptions, to Still Get
Law of Large Numbers
Central Limit Theorem
• A Whole Area in Probability Theory,
• a Large Literature
• A Comprehensive Reference:
Bradley (2005, update of 1986 version)
• Better Newer References?
Mixing Condition Used Here:
Rho-Mixing
For Random Variables X_1, X_2, …, Define
ρ(k) = sup |Corr(f, g)|
Where the sup is over f, g measurable with respect to the
Sigma-Fields Generated by {X_i : i ≤ j} and {X_i : i ≥ j + k}
(Note Gap of Lag k)
Assume ρ(k) → 0, as k → ∞
Idea: Uncorrelated at Far Lags
Mixing Conditions
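The "uncorrelated at far lags" idea can be illustrated with a Gaussian AR(1) sequence, a standard example of a ρ-mixing process, whose lag-k correlation φ^k decays geometrically (φ here is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
phi = 0.7                 # AR(1) coefficient (hypothetical example)
T = 200_000
eps = rng.normal(size=T)
x = np.empty(T)
x[0] = eps[0] / np.sqrt(1 - phi**2)   # start in stationarity
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(x, k):
    """Empirical correlation between the series and its lag-k shift."""
    return np.corrcoef(x[:-k], x[k:])[0, 1]

for k in (1, 5, 20):
    print(k, round(lag_corr(x, k), 2))   # ~ phi**k: decays toward 0
```

Correlations between events separated by a large lag vanish, which is exactly what the ρ-mixing assumption formalizes for the entries of the (permuted) standardized data vector.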
Conditions for Geo Rep'n
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors X = (X_1, X_2, …, X_d)^t
Are ρ-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused!)
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)
Tricky Point: Classical Mixing Conditions
Require Notion of Time Ordering,
Not Always Clear, e.g. Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Asyrsquos Geometrical RepresenrsquotionFurther Consequences of Geometric Represenrsquotion
1 DWD more stable than SVM(based on deeper limiting distributions)
(reflects intuitive idea feeling sampling variation)(something like mean vs median)
Hall Marron Neeman (2005)
2 1-NN rule inefficiency is quantified Hall Marron Neeman (2005)
3 Inefficiency of DWD for uneven sample size(motivates weighted version)
Qiao et al (2010)
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
(Study Properties of PCA
In Estimating Eigen-Directions amp -Values)
[Assume Data are Mean Centered]
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues 11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
Note Critical Parameter
11 21 dddd d
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
Turns out Direction Doesnrsquot Matter
11 21 dddd d
1u
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
HDLSS Math Stat of PCA
PC scores (i.e. projections) are not consistent.
So how can PCA find useful signals in data?
(Recall: in HDLSS settings PCA often finds signal, not pure noise.)
Key is "proportional errors": ŝᵢⱼ / sᵢⱼ → Rⱼ ≠ 1,
with the same realization of Rⱼ for all i = 1, …, n.
Axes have inconsistent scales, but relationships are still useful.
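The "proportional errors" claim can be illustrated with a minimal simulation. This is a sketch under illustrative assumptions of mine (spike model with u₁ = e₁, λ₁ = d^α, α = 1.5, n = 10), not the exact setting of the cited result:

```python
import numpy as np

# Sketch of "proportional errors" in the spike model: empirical PC1 scores
# differ from the true scores, but the two score vectors stay nearly
# proportional, so PCA scatterplots remain useful.
rng = np.random.default_rng(0)
d, n, alpha = 4000, 10, 1.5
lam1 = d ** alpha

X = rng.standard_normal((n, d))   # rows = mean-zero observations (noise part)
X[:, 0] *= np.sqrt(lam1)          # spike along the first coordinate (u1 = e1)

# Empirical first eigenvector via SVD of the (uncentered, mean-zero) data
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0]
if u1_hat[0] < 0:                 # fix the sign ambiguity
    u1_hat = -u1_hat

s_true = X[:, 0]                  # true PC1 scores: projections onto u1 = e1
s_hat = X @ u1_hat                # empirical PC1 scores

# Near-perfect correlation: the errors are a common (random) scale factor
r = np.corrcoef(s_true, s_hat)[0, 1]
print(r)
```

The correlation is essentially 1 even though the individual scores are not consistent: the error enters as a common factor across all i.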
HDLSS Deep Open Problem & Result
In PCA consistency:
• Strong Inconsistency: spike α < 1
• Consistency: spike α > 1
What happens at the boundary (α = 1)?
Ǝ interesting limit dist'ns: Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods
Recall the flexibility gained from the kernel embedding idea
HDLSS Asymptotics & Kernel Methods
Interesting question: behavior in very high dimension?
Answer: El Karoui (2010)
• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers

Implications for DWD: recall its main advantage is for high d,
so it is not clear that embedding helps; thus not yet implemented in DWD.
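El Karoui's random-matrix point can be sketched numerically. The assumptions here are mine (standard Gaussian data, RBF kernel with bandwidth σ² = d), not the paper's exact setting: pairwise squared distances concentrate near 2d, so the kernel matrix is essentially its first-order linearization, an affine function of the inner products — which is why kernel classifiers behave like linear ones:

```python
import numpy as np

# Sketch: in high dimension an RBF kernel matrix is close to its linearization.
rng = np.random.default_rng(1)
d, n = 2000, 40
X = rng.standard_normal((n, d))

sq_norms = (X ** 2).sum(axis=1)
D2 = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T  # squared distances
sigma2 = d
K = np.exp(-D2 / (2 * sigma2))                            # RBF kernel matrix

# First-order Taylor expansion of exp around the concentration point D2 = 2d:
# K_ij ≈ e^{-1} * (1 - (D2_ij - 2d) / (2d)), affine in the inner products
K_lin = np.exp(-1.0) * (1.0 - (D2 - 2 * d) / (2 * sigma2))

off = ~np.eye(n, dtype=bool)          # diagonal excluded (D2_ii = 0, far from 2d)
max_err = np.abs(K - K_lin)[off].max()
print(max_err)
```

The off-diagonal approximation error is tiny because the distances only fluctuate by O(√d) around 2d.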
HDLSS Additional Results
Batch Adjustment (Xuxin Liu)
Recall the intuition from above:
• Key is the sizes of the biological subtypes
• Differing ratios trip up the mean
• But DWD is more robust
Mathematics behind this:
HDLSS Math Stat of PCA
Consistency & Strong Inconsistency
Spike Covariance Model, Paul (2007)
For eigenvalues: λ₁(d) = d^α, λ₂(d) = … = λ_d(d) = 1
Note the critical parameter: α
1st eigenvector: u₁ (turns out its direction doesn't matter)
How good are the empirical versions λ̂₁(d), …, λ̂_d(d), û₁ as estimates?

Consistency (big enough spike): for α > 1, Angle(û₁, u₁) → 0
Strong Inconsistency (spike not big enough): for α < 1, Angle(û₁, u₁) → 90°

Intuition: random noise ~ d^(1/2)
For α > 1 (recall d^α is on the scale of variance), the spike pops out of the pure noise sphere
For α < 1, the spike is contained in the pure noise sphere
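A quick simulation sketch of the two regimes (parameters are illustrative, spike along e₁, n fixed at 10):

```python
import numpy as np

# Consistency vs. strong inconsistency in the spike model:
# the empirical first eigenvector is nearly aligned with u1 when alpha > 1,
# and nearly perpendicular when alpha < 1.
rng = np.random.default_rng(2)

def pc1_angle_deg(d, alpha, n=10):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)          # spike along u1 = e1
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    c = abs(Vt[0, 0])                       # |cos(Angle(u1_hat, u1))|
    return np.degrees(np.arccos(min(c, 1.0)))

angle_consistent = pc1_angle_deg(d=20000, alpha=1.5)    # near 0 degrees
angle_inconsistent = pc1_angle_deg(d=20000, alpha=0.3)  # near 90 degrees
print(angle_consistent, angle_inconsistent)
```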
HDLSS Math Stat of PCA
Consistency of eigenvalues:
λ̂₁(d) / λ₁(d) →_L χ²ₙ / n
Eigenvalues are inconsistent (for fixed n), but have a known distribution.
Consistent when n → ∞ as well.
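This can be checked by Monte Carlo (a sketch with illustrative parameters): for fixed n the ratio λ̂₁/λ₁ keeps fluctuating like χ²ₙ/n, which has mean 1 and variance 2/n:

```python
import numpy as np

# With n fixed and d large (alpha > 1), lambda1_hat / lambda1 is not
# consistent, but its law is approximately chi^2_n / n.
rng = np.random.default_rng(3)
d, n, alpha, reps = 2000, 20, 1.5, 200
lam1 = d ** alpha

ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                     # spike along e1
    gram_eigs = np.linalg.eigvalsh(X @ X.T / n)  # = nonzero sample cov eigenvalues
    ratios[r] = gram_eigs[-1] / lam1             # lambda1_hat / lambda1

print(ratios.mean(), ratios.var())   # mean near 1, variance near 2/n = 0.1
```

Note the Gram-matrix trick: the n×n matrix XXᵗ/n has the same nonzero eigenvalues as the d×d sample covariance, so no d×d eigendecomposition is needed.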
HDLSS Math Stat of PCA
Conditions for Geo Rep'n & PCA Consistency
John Kent example: X_d ~ (1/2) N_d(0, 100 I_d) + (1/2) N_d(0, I_d)
Can only say ‖X_d‖ = O_p(d^(1/2)); in fact
‖X_d‖ ≈ 10 d^(1/2) w.p. 1/2 and ‖X_d‖ ≈ d^(1/2) w.p. 1/2,
so the radius is not deterministic.
PCA conditions are the same, since the noise is still O_p(d^(1/2)).
But for the Geometric Rep'n, some mixing condition is needed.
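A short simulation sketch of Kent's example (sizes are illustrative):

```python
import numpy as np

# Kent example: X_d ~ (1/2) N_d(0, 100 I_d) + (1/2) N_d(0, I_d).
# ||X_d|| / sqrt(d) concentrates near 10 or near 1, each w.p. 1/2,
# so ||X_d|| = O_p(d^{1/2}) but there is no single deterministic radius.
rng = np.random.default_rng(4)
d, n = 10000, 200

big = rng.random(n) < 0.5                  # mixture component indicator
scale = np.where(big, 10.0, 1.0)           # sd 10 (variance 100) or sd 1
X = rng.standard_normal((n, d)) * scale[:, None]
radii = np.linalg.norm(X, axis=1) / np.sqrt(d)

frac_big = (radii > 5).mean()              # close to 1/2
print(frac_big)
```

Every normalized radius lands very close to either 1 or 10; the random choice between the two is what defeats a single-sphere geometric representation.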
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Conclude: need some mixing condition
Mixing Conditions
Idea from probability theory.
Recall standard asymptotic results as n → ∞:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!), e.g. independent and identically dist'd.
Mixing conditions explore weaker assumptions that still give the Law of Large Numbers and the Central Limit Theorem.

Mixing Conditions
• A whole area in probability theory, with a large literature
• A comprehensive reference: Bradley (2005, update of 1986 version)
• Better newer references also exist
Mixing Conditions
Mixing condition used here: ρ-mixing.
For random variables X₁, X₂, …, define
ρ(k) = sup_j sup |Corr(f, g)|,
where the inner sup is over f ∈ L²(σ(X₁, …, X_j)) and g ∈ L²(σ(X_{j+k}, X_{j+k+1}, …)),
for the sigma-fields generated by the indicated variables; note the gap of lag k.
Assume ρ(k) → 0 as k → ∞.
Idea: uncorrelated at far lags.
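The "uncorrelated at far lags" idea can be illustrated with an AR(1) sequence (a standard example of my choosing, not from the slides): its lag-k correlation is φ^k, which decays to 0, consistent with ρ(k) → 0 (for Gaussian AR(1) the ρ-mixing coefficient is in fact |φ|^k):

```python
import numpy as np

# AR(1) sequence X_t = phi * X_{t-1} + e_t: correlations die off at far lags.
rng = np.random.default_rng(5)
phi, T = 0.6, 50_000

e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0]
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]

def lag_corr(x, k):
    # empirical correlation between the series and its lag-k shift
    return np.corrcoef(x[:-k], x[k:])[0, 1]

c1, c20 = lag_corr(x, 1), lag_corr(x, 20)
print(c1, c20)   # near phi = 0.6, and near phi^20 ≈ 0
```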
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Hall, Marron and Neeman (2005): assume the entries X₁, X₂, …, X_d of the data vectors are ρ-mixing.
Drawback: a strong assumption.
(In JRSS-B, since Biometrika refused!)
Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (fully covariance based, no mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Tricky point: classical mixing conditions require a notion of time ordering, which is not always clear, e.g. for microarrays.
Condition from Jung & Marron (2009):
X_d ~ (0, Σ_d) where Σ_d = U_d Λ_d U_dᵗ   (note: not Gaussian)
Define the standardized version Z_d = Λ_d^(-1/2) U_dᵗ X_d
Assume Ǝ a permutation of the entries of Z_d so that the permuted sequence is ρ-mixing
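A small sketch of the standardization step (Σ_d here is an arbitrary positive definite covariance of my choosing, with moderate d so the sample covariance of Z_d is easy to check):

```python
import numpy as np

# Jung & Marron (2009) standardization: with Sigma_d = U_d Lambda_d U_d^t,
# the vector Z_d = Lambda_d^{-1/2} U_d^t X_d has uncorrelated, unit-variance
# entries; the rho-mixing assumption is placed on (a permutation of) Z_d.
rng = np.random.default_rng(6)
d, n = 20, 100_000

A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)              # a positive definite covariance
lam, U = np.linalg.eigh(Sigma)           # Sigma = U diag(lam) U^t

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)   # rows are samples
Z = (U.T @ X.T / np.sqrt(lam)[:, None]).T                 # Z = Lam^{-1/2} U^t X

C = np.cov(Z, rowvar=False)
max_dev = np.abs(C - np.eye(d)).max()
print(max_dev)   # near 0: entries of Z are uncorrelated with unit variance
```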
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS Math Stat of PCA
Consistency amp Strong Inconsistency
Spike Covariance Model Paul (2007)
For Eigenvalues
1st Eigenvector
How Good are Empirical Versions
as Estimates
11 21 dddd d
1u
11 ˆˆˆ uddd
HDLSS Math Stat of PCA

Consistency (big enough spike): for α > 1,
Angle(û₁, u₁) → 0
Strong Inconsistency (spike not big enough): for α < 1,
Angle(û₁, u₁) → 90°
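This dichotomy is easy to check numerically. The sketch below (not from the source; the dimension, sample size and spike exponents are arbitrary choices) plants a spike of size d^α along the first coordinate and measures the angle between the true and empirical first eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def pc1_angle_deg(d, n, alpha):
    """Angle between true and empirical 1st eigenvector in the
    spike model: lambda_1 = d**alpha, lambda_2 = ... = lambda_d = 1."""
    lam1 = d ** alpha
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)              # spike along e1, so u1 = e1
    # top right singular vector of X is the first empirical eigenvector
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)         # |<u1_hat, e1>|
    return float(np.degrees(np.arccos(cos)))

big   = pc1_angle_deg(d=2000, n=20, alpha=2.0)   # alpha > 1: angle near 0
small = pc1_angle_deg(d=2000, n=20, alpha=0.4)   # alpha < 1: angle near 90
print(big, small)
```

With these sizes the strong spike is recovered almost exactly, while the weak spike is swamped by the accumulated noise and the empirical direction is nearly orthogonal to u₁.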
HDLSS Math Stat of PCA

Intuition: Random Noise ~ d^{1/2}
For α > 1 (recall λ₁ = d^α is on the scale of variance):
Spike Pops Out of Pure Noise Sphere
For α < 1:
Spike Contained in Pure Noise Sphere
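The d^{1/2} noise scale behind this intuition can be seen directly (a sketch with arbitrary sizes): standard normal vectors concentrate on the sphere of radius d^{1/2}.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10_000, 50
Z = rng.standard_normal((n, d))
# norms of pure-noise vectors, relative to sqrt(d): all close to 1
ratios = np.linalg.norm(Z, axis=1) / np.sqrt(d)
print(ratios.min(), ratios.max())
```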
HDLSS Math Stat of PCA

Consistency of eigenvalues:
λ̂₁ / λ₁ →_L χ²_n / n   (as d → ∞, n fixed)
• Eigenvalues Inconsistent,
• But Known Distribution,
• Consistent when n → ∞ as Well
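A sketch of this fixed-n, growing-d behavior (sizes and spike exponent are arbitrary choices): over replications, λ̂₁/λ₁ stays random with mean near 1 and spread near sqrt(2/n), matching a χ²_n/n limit rather than collapsing to a point.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha = 5000, 5, 1.5
lam1 = d ** alpha

ratios = []
for _ in range(200):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)                       # spike along e1
    # largest eigenvalue of the n x n dual sample covariance (1/n) X X^t
    lam1_hat = np.linalg.eigvalsh(X @ X.T / n).max()
    ratios.append(lam1_hat / lam1)

ratios = np.array(ratios)
print(ratios.mean(), ratios.std())   # mean near 1, sd near sqrt(2/n) ~ 0.63
```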
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.:
John Kent example:
X_d ~ ½ N_d(0, 100 I_d) + ½ N_d(0, I_d)
Can only say ‖X_d‖ = O_p(d^{1/2}):
‖X_d‖ ≈ 10 d^{1/2} w.p. ½, ≈ d^{1/2} w.p. ½,
not deterministic.
PCA Conditions Same, since Noise Still O_p(d^{1/2})
But for Geo Rep'n need some Mixing Cond.
Conclude: Need some Mixing Condition
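The two-scale behavior in this Kent-type example can be simulated (a sketch; the mixture is reconstructed from the slide's fragments, and the sizes are arbitrary): the scaled norm ‖X_d‖/d^{1/2} lands near 1 or near 10 at random, so it is O_p(d^{1/2}) but not deterministic, which blocks the geometric representation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10_000, 40
# Kent-type mixture: each vector is N(0, I) w.p. 1/2, N(0, 100 I) w.p. 1/2
scale = np.where(rng.random(n) < 0.5, 1.0, 10.0)
X = scale[:, None] * rng.standard_normal((n, d))
ratios = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(ratios.min(), ratios.max())   # clusters near 1 and near 10
```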
Mixing Conditions

Idea From Probability Theory:
Recall Standard Asymptotic Results, as n → ∞:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions (Usually Ignored!),
E.g. Independent and Ident. Dist'd
Mixing Conditions: Explore Weaker Assumptions to Still Get
• Law of Large Numbers
• Central Limit Theorem
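Both classical results are easy to illustrate under the i.i.d. assumption (a quick sketch; sample sizes arbitrary): sample means of Uniform(0,1) draws concentrate at 1/2, and the standardized means have spread near 1, as the CLT predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2500, 1000
X = rng.random((reps, n))            # i.i.d. Uniform(0,1): mean 1/2, var 1/12

means = X.mean(axis=1)               # LLN: concentrate at 0.5
z = (means - 0.5) / np.sqrt(1 / 12 / n)   # CLT: approx standard normal
print(means.mean(), z.std())
```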
Mixing Conditions
• A Whole Area in Probability Theory, with a Large Literature
• A Comprehensive Reference: Bradley (2005, update of 1986 version)
• Better Newer References
Mixing Conditions

Mixing Condition Used Here: Rho-Mixing
For Random Variables X₁, X₂, …, Define:
ρ(k) = sup_j sup { |corr(f, g)| : f ∈ L²(σ(X₁, …, X_j)), g ∈ L²(σ(X_{j+k}, X_{j+k+1}, …)) }
Where σ(⋯) denotes the Sigma-Field Generated by ⋯
(Note Gap of Lag k)
Assume: ρ(k) → 0 as k → ∞
Idea: Uncorrelated at Far Lags
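A sketch of the idea on a Gaussian AR(1) sequence (for Gaussian sequences the ρ-mixing coefficient reduces to the maximal lag correlation, here φ^k; the parameters are arbitrary choices): empirical correlations decay toward 0 as the lag grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi = 200_000, 0.8

# Gaussian AR(1): X_t = phi * X_{t-1} + e_t, so corr(X_j, X_{j+k}) = phi**k
x = np.empty(n)
x[0] = rng.standard_normal() / np.sqrt(1 - phi**2)   # stationary start
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

def lag_corr(x, k):
    """Empirical correlation at lag k."""
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

print(lag_corr(x, 1), lag_corr(x, 10), lag_corr(x, 50))
```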
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume the entries X₁, X₂, …, X_d of the data vectors are ρ-mixing.
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused!)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012) (Fully Covariance Based, No Mixing)
Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
X_d ~ (0, Σ_d), where Σ_d = U_d Λ_d U_dᵗ
(Note: Not Gaussian)
Define the Standardized Version: Z_d = Λ_d^{-1/2} U_dᵗ X_d
Assume Ǝ a permutation of the entries
So that Z_d is ρ-mixing
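The standardization is a whitening step: a sketch (with an arbitrary d = 4 covariance) verifying that Z_d = Λ^{-1/2} Uᵗ X has approximately identity covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 50_000

# build an arbitrary covariance Sigma = U Lambda U^t
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)
lam, U = np.linalg.eigh(Sigma)

# sample X ~ (0, Sigma), then standardize: Z = Lambda^{-1/2} U^t X
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Z = (np.diag(lam ** -0.5) @ U.T @ X.T).T

C = np.cov(Z, rowvar=False)
print(np.round(C, 2))   # approximately the identity matrix
```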
HDLSS Math Stat of PCA

Careful look at PCA Consistency (α > 1 spike)
(Reality Check Suggested by Reviewer):
The condition is Independent of Sample Size,
So true for n = 1 (?!)
Reviewer's Conclusion: Absurd; shows assumption too strong for practice
HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise:
Recall RNAseq Data (from 8/23/12): d ~ 1700, n = 180.
Manually Brushed Clusters show Clear Alternate Splicing, Not Noise.
(Functional Data Analysis)
HDLSS Math Stat of PCA

Recall Theoretical Separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
Mathematically Driven Conclusion:
Real Data Signals Are This Strong
HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA.
Recall for Consistency (α > 1): Angle(û₁, u₁) → 0
For Strong Inconsistency (α < 1): Angle(û₁, u₁) → 90°
Because PC Scores (i.e. projections) Not Consistent:
For Scores s_{i,j} = P_{v_j} x_i and ŝ_{i,j} = P_{v̂_j} x_i
(What we study in PCA scatterplots),
Can Show: ŝ_{i,j} / s_{i,j} → R_j ≠ 1 (Random)
(Thanks to Dan Shen)
HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent.
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors": ŝ_{i,j} ≈ R_j s_{i,j},
with the Same Realization of R_j for i = 1, …, n.
Axes have Inconsistent Scales,
But Relationships are Still Useful.
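A sketch of the "proportional errors" point (spike model with arbitrary sizes, not from the source): the empirical PC1 scores are nearly a common rescaling of the true scores, so their correlation is near 1 and scatterplot relationships survive, even though the scale itself is not consistent.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha = 2000, 20, 2.0
lam1 = d ** alpha

X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)                 # true u1 = e1

s_true = X[:, 0]                          # true PC1 scores, P_{u1} x_i
_, _, Vt = np.linalg.svd(X, full_matrices=False)
s_hat = X @ Vt[0]                         # empirical PC1 scores, P_{u1_hat} x_i

corr = np.corrcoef(s_true, s_hat)[0, 1]
print(abs(corr))   # near 1: same relationships up to a common scale factor
```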
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Consistency (big enough spike)
For 1
0ˆ 11 uuAngle
HDLSS Math Stat of PCA
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Consistency (big enough spike)
For
Strong Inconsistency (spike not big enough)
For
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
HDLSS Math Stat of PCA
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
HDLSS Math Stat of PCA
1
Intuition Random Noise ~ d12
For (Recall on Scale of Variance)
Spike Pops Out of Pure Noise Sphere
For
Spike Contained in Pure Noise Sphere
HDLSS Math Stat of PCA
1
1
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo. Rep'n

Tricky Point: Classical Mixing Conditions
Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays

HDLSS Math. Stat. of PCA
Conditions for Geo. Rep'n

Condition from Jung & Marron (2009):

    X_d ~ (0, Σ_d),  where  Σ_d = U_d Λ_d U_d^t

(eigen-decomposition of the covariance;
 Note: Not necessarily Gaussian)

Define the Standardized (sphered) Version

    Z_d = Λ_d^(-1/2) U_d^t X_d

Assume Ǝ a permutation of the d entries of Z_d
So that Z_d is ρ-mixing

HDLSS Math. Stat. of PCA
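Not from the slides: a small numpy check of the standardization step Z = Λ^(-1/2) Uᵗ X, which spheres the data so that Z has (approximately) identity covariance. The dimension, sample size, and covariance here are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(6)

d, n = 5, 100_000
A = rng.standard_normal((d, d))
Sigma = A @ A.T                       # a generic positive-definite covariance
lam, U = np.linalg.eigh(Sigma)        # Sigma = U diag(lam) U^t

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n).T   # d x n data
Z = np.diag(lam ** -0.5) @ U.T @ X    # Z = Lambda^(-1/2) U^t X

# The sphered data have covariance ~ I_d:
print(np.round(np.cov(Z), 1))
```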
Careful Look at PCA Consistency (α > 1 spike)
(Reality Check Suggested by Reviewer)

Condition is Independent of Sample Size,
So true even for n = 1 (?!?)

Reviewer's Conclusion: Absurd, shows
assumption too strong for practice

HDLSS Math. Stat. of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise

HDLSS Math. Stat. of PCA

Recall RNAseq Data From 8/23/12:
d ~ 1700, n = 180

Manually Brushed Clusters:
Clear Alternate Splicing, Not Noise

Functional Data Analysis
Recall Theoretical Separation:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike

Mathematically Driven Conclusion:
Real Data Signals Are This Strong

HDLSS Math. Stat. of PCA
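Not from the slides: a minimal numpy simulation of the consistency vs. strong-inconsistency separation, under an assumed single-spike model with λ₁ = d^α (on the variance scale) and true direction u₁ = e₁; the particular d, n, and α values are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(1)

def pc1_angle_deg(d, n, alpha):
    # Single spike: eigenvalues (d**alpha, 1, ..., 1), true PC1 = e1.
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)          # alpha is on the variance scale
    X = rng.standard_normal((n, d)) * sd
    # Top right singular vector of the data = empirical PC1 direction.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)     # |<u1_hat, e1>|
    return np.degrees(np.arccos(cos))

a_cons = pc1_angle_deg(d=5_000, n=20, alpha=1.5)    # consistency: small angle
a_incons = pc1_angle_deg(d=5_000, n=20, alpha=0.2)  # strong inconsistency:
print(a_cons, a_incons)                             # angle approaching 90
```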
An Interesting Objection:
Should Not Study Angles in PCA

Recall, for Consistency (α > 1 spike):
    Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1 spike):
    Angle(û_1, u_1) → 90°

Objection: Because PC Scores (i.e. projections)
are Not Consistent

For Scores  ŝ_ij = P_{û_j} x_i  and  s_ij = P_{u_j} x_i
(What we study in PCA scatterplots)

Can Show:
    ŝ_ij / s_ij → R_j ≠ 1   (Random)
(Thanks to Dan Shen)

HDLSS Math. Stat. of PCA

PC Scores (i.e. projections) Not Consistent:
So how can PCA find Useful Signals in Data?

Key is "Proportional Errors":
Same Realization of R_j for every i, so
Axes have Inconsistent Scales,
But Relationships are Still Useful

HDLSS Math. Stat. of PCA
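Not from the slides: a numpy sketch of the "proportional errors" phenomenon, under an assumed single-spike model with λ₁ = d/10 and u₁ = e₁ (toy illustration values). The empirical-to-true score ratios cluster around one common constant different from 1, rather than around 1.

```python
import numpy as np

rng = np.random.default_rng(3)

# Single-spike toy model: lambda_1 = d/10, other eigenvalues 1, u1 = e1.
d, n = 50_000, 20
sd = np.ones(d)
sd[0] = np.sqrt(d / 10)
X = rng.standard_normal((n, d)) * sd

_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] * np.sign(Vt[0, 0])   # fix the sign ambiguity of PC1

s_true = X[:, 0]        # true scores  <u1, x_i> = first coordinate
s_hat = X @ u1_hat      # empirical scores  <u1_hat, x_i>

ratios = s_hat / s_true
# Quartiles: tightly clustered around one common ratio R > 1.
print(np.percentile(ratios, [25, 50, 75]))
```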
In PCA Consistency:
• Strong Inconsistency: α < 1 spike
• Consistency: α > 1 spike
What happens at the boundary (α = 1)? (???)

Ǝ interesting Limit Dist'ns:
Jung, Sen & Marron (2012)

HDLSS Deep Open Problem → Result
Recall Flexibility From Kernel Embedding Idea

HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension?

Answer: El Karoui (2010)
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers

Implications for DWD:
Recall Main Advantage is for High d,
So not Clear Embedding Helps.
Thus not yet Implemented in DWD.

HDLSS Asymptotics & Kernel Methods
HDLSS Additional Results

Batch Adjustment: Xuxin Liu

Recall Intuition from above:
• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust

Mathematics behind this:
Intuition: Random Noise ~ d^(1/2)

For α > 1 (Recall: α is on the Scale of Variance):
Spike Pops Out of Pure Noise Sphere
For α < 1:
Spike Contained in Pure Noise Sphere

HDLSS Math. Stat. of PCA
Consistency of Eigenvalues (α > 1 spike)

• Eigenvalues are Inconsistent:  for fixed n, as d → ∞,

      λ̂_1 / λ_1  →_L  χ²_n / n

• But the Limiting Distribution is Known
• Consistent when n → ∞ as Well

HDLSS Math. Stat. of PCA
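Not from the slides: a numpy simulation checking the eigenvalue behavior, under the assumed spike λ₁ = d^1.5 with fixed n; the ratio λ̂₁/λ₁ stays random (inconsistent) but its distribution matches χ²_n/n (mean 1, variance 2/n). Parameter values are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(4)

d, n, reps = 20_000, 10, 200
lam1 = d ** 1.5            # alpha = 1.5 spike, on the variance scale
ratios = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(lam1)            # signal coordinate ~ N(0, lam1)
    # Largest eigenvalue of the sample covariance S = X^t X / n,
    # computed cheaply from the small n x n Gram matrix.
    lam_hat = np.linalg.eigvalsh(X @ X.T / n).max()
    ratios[r] = lam_hat / lam1

# Compare with chi2_n / n: mean 1, variance 2/n = 0.2.
print(ratios.mean(), ratios.var())
```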
Conditions for Geo. Rep'n & PCA Consist.

John Kent example:

    X_d ~ (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d)

Can only say  X_d = O_p(d^(1/2)):

    ||X_d|| / d^(1/2) → 1  w.p. 1/2,   → 10  w.p. 1/2,

not deterministic.

PCA Conditions Same, since Noise is Still O_p(d^(1/2))

But for Geo. Rep'n, need some Mixing Cond.

HDLSS Math. Stat. of PCA
Conditions for Geo. Rep'n

Conclude: Need some Mixing Condition

HDLSS Math. Stat. of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Consistency of eigenvalues
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Consistency of eigenvalues
Eigenvalues Inconsistent
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
nn
dL
d
2
11
HDLSS Math Stat of PCA
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist:
John Kent example:
$X_d \sim \tfrac{1}{2} N_d(0, 100 I_d) + \tfrac{1}{2} N_d(0, I_d)$
Can only say $\|X_d\| = O_p(d^{1/2})$:
$\approx 10\, d^{1/2}$ w.p. $\tfrac{1}{2}$, $\approx d^{1/2}$ w.p. $\tfrac{1}{2}$
(not deterministic)
PCA Conditions Same, since Noise Still $O_p(d^{1/2})$
But for Geo Rep'n need some Mixing Cond
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Conclude: Need some Mixing Condition
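The Kent example is simple to simulate: each draw is either $N_d(0, 100 I_d)$ or $N_d(0, I_d)$, so the scaled norm $\|X_d\|/\sqrt{d}$ concentrates near 10 or near 1 depending on a coin flip. The radius is $O_p(d^{1/2})$ but genuinely random, so no deterministic geometric representation applies. A small sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
d, reps = 10000, 50
scaled_norms = np.empty(reps)
for r in range(reps):
    # mixture: w.p. 1/2 from N(0, 100 I_d), w.p. 1/2 from N(0, I_d)
    scale = 10.0 if rng.random() < 0.5 else 1.0
    x = scale * rng.standard_normal(d)
    scaled_norms[r] = np.linalg.norm(x) / np.sqrt(d)

# each draw sits near radius 10 or near radius 1 -- never one fixed radius
print(scaled_norms.min(), scaled_norms.max())
```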
Mixing Conditions

Idea From Probability Theory:
Recall Standard Asymptotic Results as $n \to \infty$:
bull Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
bull Central Limit Theorem
Both have Technical Assumptions (Usually Ignored),
E.g. Independent and Identically Dist'd
Mixing Conditions

Idea From Probability Theory: Mixing Conditions
Explore Weaker Assumptions to Still Get:
bull Law of Large Numbers
bull Central Limit Theorem
Mixing Conditions

bull A Whole Area in Probability Theory
bull A Large Literature
bull A Comprehensive Reference: Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions

Mixing Condition Used Here: Rho-Mixing
For Random Variables $\{X_i\}$, Define
$\rho(k) = \sup_j \sup \left\{ |\mathrm{corr}(Y, Z)| : Y \in L^2(\mathcal{F}_1^j),\ Z \in L^2(\mathcal{F}_{j+k}^\infty) \right\}$
Where $\mathcal{F}_1^j, \mathcal{F}_{j+k}^\infty$ are the Sigma-Fields Generated by $X_1, \dots, X_j$ and $X_{j+k}, X_{j+k+1}, \dots$
(Note Gap of Lag $k$)
Assume $\rho(k) \to 0$ as $k \to \infty$
Idea: Uncorrelated at Far Lags
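As a concrete instance (not from the slides): a stationary Gaussian AR(1) sequence is ρ-mixing, and its lag-k correlation $\varphi^k$ decays geometrically, which is exactly the "uncorrelated at far lags" idea. A minimal sketch with illustrative $\varphi = 0.6$:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, N = 0.6, 200_000
eps = rng.standard_normal(N)
x = np.empty(N)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)    # start the chain in stationarity
for t in range(1, N):
    x[t] = phi * x[t - 1] + eps[t]       # AR(1): x_t = phi x_{t-1} + noise

# empirical lag-k correlations decay like phi^k toward 0
lag_corr = {k: np.corrcoef(x[:-k], x[k:])[0, 1] for k in (1, 2, 5, 10)}
print(lag_corr)
```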
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of the Data Vectors $X = (X_1, X_2, \dots, X_d)^t$ Are $\rho$-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Series of Technical Improvements:
bull Ahn, Marron, Muller & Chi (2007)
bull Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Tricky Point: Classical Mixing Conditions Require a Notion of Time Ordering,
Not Always Clear, e.g. for Microarrays
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Condition from Jung & Marron (2009):
$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$
(Note: Not Gaussian)
Define the Standardized Version $Z_d = \Lambda_d^{-1/2} U_d^t X_d$
Assume Ǝ a permutation of the entries of $Z_d$ so that the result is ρ-mixing
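The standardized version is just a whitening transform: rotate by $U_d^t$, then rescale by $\Lambda_d^{-1/2}$, after which the entries are uncorrelated with unit variance (Gaussianity is not needed for this, only the covariance structure). A minimal sketch with an arbitrary illustrative covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 5, 100_000
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)          # an arbitrary covariance Sigma_d
lam, U = np.linalg.eigh(Sigma)           # Sigma = U diag(lam) U^t

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n).T   # d x n sample
Z = np.diag(lam ** -0.5) @ U.T @ X       # Z_d = Lambda^{-1/2} U^t X_d

C = np.cov(Z)                            # should be close to the identity
print(np.round(C, 3))
```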
HDLSS Math Stat of PCA

Careful look at PCA Consistency ($\alpha > 1$ spike)
(Reality Check Suggested by Reviewer)
Independent of Sample Size, so true for n = 1 (!)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice
HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise
HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12: d ~ 1700, n = 180
Functional Data Analysis

Manually Brushed Clusters: Clear Alternate Splicing, Not Noise
HDLSS Math Stat of PCA

Recall Theoretical Separation:
Strong Inconsistency ($\alpha < 1$ spike) vs. Consistency ($\alpha > 1$ spike)
Mathematically Driven Conclusion: Real Data Signals Are This Strong
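This separation shows up clearly in simulation: with a spike of size $d^\alpha$, the angle between the sample and true first PC directions is small for $\alpha$ above 1 and large for $\alpha$ below 1 (tending to 90° as d grows with n fixed). A hedged sketch at one illustrative (d, n); the sizes are assumptions, not from the slides:

```python
import numpy as np

def pc1_angle(alpha, d=5000, n=20, seed=0):
    """Angle (degrees) between sample and true PC1 for a d^alpha spike."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)       # true PC1 direction is e_1
    G = X @ X.T / n                      # dual n x n problem
    w, V = np.linalg.eigh(G)
    v1 = X.T @ V[:, -1]                  # lift top dual eigenvector to R^d
    v1 /= np.linalg.norm(v1)
    return np.degrees(np.arccos(min(abs(v1[0]), 1.0)))

consistent, inconsistent = pc1_angle(1.5), pc1_angle(0.5)
print(consistent, inconsistent)
```

At finite d the $\alpha = 0.5$ angle is large but not yet at 90°; it grows toward 90° as d increases with n held fixed.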
HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA
Recall for Consistency ($\alpha > 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$
For Strong Inconsistency ($\alpha < 1$): $\mathrm{Angle}(\hat{u}_1, u_1) \to 90°$
HDLSS Math Stat of PCA

An Interesting Objection: Should not Study Angles in PCA,
Because PC Scores (i.e. projections) Not Consistent
For Scores $\hat{s}_{ij} = P_{\hat{v}_j} x_i$ and $s_{ij} = P_{v_j} x_i$
(What we study in PCA scatterplots)
Can Show: $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j \neq 1$ (Random)
(Thanks to Dan Shen)
HDLSS Math Stat of PCA

PC Scores (i.e. projections) Not Consistent,
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors":
Same Realization of $R_j$ in $\frac{\hat{s}_{ij}}{s_{ij}} \to R_j$ for all $i$
Axes have Inconsistent Scales, But Relationships are Still Useful
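The "proportional errors" point can be seen directly: estimated scores differ from true scores by (approximately) one common multiplicative factor, the same realization for every data point, so the shape of a scores scatterplot survives even though its scale does not. A sketch in the easy $\alpha = 1.5$ regime, where that common factor happens to sit near 1; the construction is illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, alpha = 20000, 20, 1.5
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(d ** alpha)           # true PC1 direction u_1 = e_1

G = X @ X.T / n                          # dual n x n problem
w, V = np.linalg.eigh(G)
v1 = X.T @ V[:, -1]                      # sample PC1 direction
v1 /= np.linalg.norm(v1)
if v1[0] < 0:
    v1 = -v1                             # align sign with u_1

s_hat = X @ v1                           # estimated PC1 scores
s_true = X[:, 0]                         # true PC1 scores (projection on e_1)
ratio = s_hat / s_true                   # nearly one common factor across i
print(ratio.min(), ratio.max())
```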
HDLSS Deep Open Problem Result

In PCA Consistency:
Strong Inconsistency ($\alpha < 1$ spike) vs. Consistency ($\alpha > 1$ spike)
What happens at the boundary ($\alpha = 1$)?
Ǝ interesting Limit Distn's: Jung, Sen & Marron (2012)
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Consistency of eigenvalues
Eigenvalues Inconsistent
But Known Distribution
Consistent when as Well
nn
dL
d
2
11
n
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
PCA Conditions Same since Noise Still
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
21dOp
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
HDLSS Deep Open Problem & Result

In PCA Consistency:
• Strong Inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
What happens at the boundary ($\alpha = 1$)?
∃ interesting Limit Dist'ns: Jung, Sen & Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall Intuition from above:
• Key is sizes of biological subtypes
• Differing ratio trips up mean
• But DWD more robust
Mathematics behind this:
HDLSS Math Stat of PCA

Conditions for Geo Rep'n & PCA Consist.

John Kent example:
    $X_d \sim \tfrac{1}{2} N_d(0, I_d) + \tfrac{1}{2} N_d(0, 100\, I_d)$
Can only say $\|X_d\| = O_p(d^{1/2})$, not deterministic:
    $\|X_d\| \approx d^{1/2}$ w.p. $\tfrac{1}{2}$,   $\|X_d\| \approx 10\, d^{1/2}$ w.p. $\tfrac{1}{2}$
PCA Conditions Same, since Noise Still Spherical
But for Geo Rep'n need some Mixing Cond'n

Conclude: Need some Mixing Condition
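The Kent-style example can be checked numerically. A sketch (mixture constants are illustrative, matching the two-scale structure above): each observation is $N_d(0, I_d)$ with probability $\tfrac12$ and $N_d(0, 100 I_d)$ with probability $\tfrac12$, so the scaled radius $\|X_d\|/\sqrt{d}$ concentrates near 1 or near 10, and there is no single deterministic radius.

```python
import numpy as np

# Two-component scale mixture: no geometric representation with a
# deterministic radius, even though ||X_d|| = O_p(d^(1/2)).
rng = np.random.default_rng(1)
d, n = 10_000, 200
big = rng.random(n) < 0.5                    # which mixture component
X = rng.standard_normal((n, d))
X[big] *= 10                                 # sd 10  =>  variance 100
r = np.linalg.norm(X, axis=1) / np.sqrt(d)   # scaled radii

print(np.round(np.sort(r)[[0, n // 2, -1]], 2))  # two well-separated scales
```

Each component concentrates tightly around its own radius, so the sample of scaled radii splits cleanly into two clusters rather than a single sphere.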
Mixing Conditions

Idea From Probability Theory:
Recall Standard Asymptotic Results, as $n \to \infty$:
• Law of Large Numbers ("Weak" = in prob., "Strong" = a.s.)
• Central Limit Theorem
Both have Technical Assumptions (Usually Ignored!),
e.g. Independent and Ident. Dist'd

Mixing Conditions:
Explore Weaker Assumptions that Still Give
• Law of Large Numbers
• Central Limit Theorem

• A Whole Area in Probability Theory
• ∃ a Large Literature
• A Comprehensive Reference: Bradley (2005 update of 1986 version)
• Better, Newer References

Mixing Condition Used Here: Rho-Mixing

For Random Variables $X_1, X_2, \dots$, Define
    $\rho(k) = \sup_j \rho\big(\sigma(X_1, \dots, X_j),\ \sigma(X_{j+k}, X_{j+k+1}, \dots)\big)$
Where, for Sigma-Fields $\mathcal{A}, \mathcal{B}$ (Generated by the $X_i$),
    $\rho(\mathcal{A}, \mathcal{B}) = \sup\{\,|\mathrm{corr}(f, g)| : f \in L^2(\mathcal{A}),\ g \in L^2(\mathcal{B})\,\}$
Note: Gap of Lag $k$
Assume: $\rho(k) \to 0$ as $k \to \infty$
Idea: Uncorrelated at Far Lags
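The "uncorrelated at far lags" intuition can be illustrated with a toy sequence. This sketch uses an AR(1) process (a standard example of a $\rho$-mixing sequence) as a stand-in, and checks that lagged correlations die out as the gap grows:

```python
import numpy as np

# AR(1) with coefficient phi: correlation at lag k is phi**k,
# so dependence between entries vanishes at far lags.
rng = np.random.default_rng(2)
n, phi = 50_000, 0.8
z = np.empty(n)
z[0] = rng.standard_normal()
for t in range(1, n):
    z[t] = phi * z[t - 1] + rng.standard_normal()

def lag_corr(x, k):
    """Empirical correlation between the series and its lag-k shift."""
    return float(np.corrcoef(x[:-k], x[k:])[0, 1])

print(lag_corr(z, 1))    # substantial correlation at lag 1
print(lag_corr(z, 20))   # near zero at lag 20
```

The $\rho$-mixing coefficient controls exactly this kind of decay, but uniformly over all square-integrable functions of the past and the far future, not just linear correlations.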
HDLSS Math Stat of PCA

Conditions for Geo Rep'n, Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors
    $X = (X_1, X_2, \dots, X_j, \dots, X_d)^t$
Are $\rho$-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused)

Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions Require Notion of Time Ordering,
Not Always Clear, e.g. Microarrays

Condition from Jung & Marron (2009):
    $X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$
(Note: Not Gaussian)
Define Standardized Version
    $Z_d = \Lambda_d^{-1/2} U_d^t X_d$
Assume ∃ a permutation $\pi_d$ so that the entries of $Z_d$, reordered by $\pi_d$, are $\rho$-mixing
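The standardization $Z_d = \Lambda_d^{-1/2} U_d^t X_d$ is a whitening transform. A sketch (simulated data, Gaussian only for convenience; the condition itself does not require Gaussianity) verifying that after this transform the entries are uncorrelated with unit variance:

```python
import numpy as np

# Whitening: if X has covariance Sigma = U diag(lam) U^T, then
# Z = diag(lam)^(-1/2) U^T X has identity covariance.
rng = np.random.default_rng(3)
d, n = 5, 100_000
A = rng.standard_normal((d, d))
Sigma = A @ A.T                          # a positive definite covariance
lam, U = np.linalg.eigh(Sigma)           # Sigma = U diag(lam) U^T
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Z = X @ U @ np.diag(lam ** -0.5)         # row-wise Lambda^(-1/2) U^T x_i

print(np.round(np.cov(Z.T), 1))          # approximately the identity
```

The mixing assumption is then placed on these standardized entries (after a suitable permutation), which sidesteps the arbitrary scaling and rotation of the raw coordinates.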
HDLSS Math Stat of PCA

Careful look at PCA Consistency ($\alpha > 1$ spike)
(Reality Check Suggested by Reviewer):
Condition is Independent of Sample Size,
So true for $n = 1$ (?!?)
Reviewer's Conclusion: Absurd, shows assumption too strong for practice
HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise

Recall RNAseq Data From 8/23/12 (Functional Data Analysis):
    $d \approx 1700$,  $n = 180$
Manually Brushed Clusters: Clear Alternate Splicing, Not Noise

Recall Theoretical Separation:
• Strong Inconsistency: $\alpha < 1$ spike
• Consistency: $\alpha > 1$ spike
Mathematically Driven Conclusion: Real Data Signals Are This Strong
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon amp PCA Consist
John Kent example
Can only say
not deterministic
But for Geo Reprsquon need some Mixing Cond
HDLSS Math Stat of PCA
dddddd ININX 10002
10
2
1~
21212121
21
10
)(
pwd
pwddOX p
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon
Conclude Need some Mixing Condition
HDLSS Math Stat of PCA
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Idea From Probability Theory
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting question: behavior in very high dimension?
Answer, El Karoui (2010):
• In the random matrix limit,
• Kernel embedded classifiers ~ linear classifiers.

Implications for DWD:
Recall its main advantage is for high $d$,
so it is not clear that embedding helps.
Thus kernel embedding is not yet implemented in DWD.
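One way to see the flavor of the El Karoui result is a toy computation (my setup, assumed bandwidth $\sigma^2 = d$): for high-dimensional standardized data, pairwise distances concentrate, so off-diagonal RBF kernel entries are nearly an affine function of inner products and norms, and kernel machines built on them behave like linear ones.

```python
import numpy as np

# Off-diagonal RBF kernel entries vs an affine function of Gram entries.
rng = np.random.default_rng(2)
n, d = 80, 2_000
X = rng.standard_normal((n, d))

sq = (X ** 2).sum(axis=1)
G = X @ X.T                                   # Gram matrix of inner products
D2 = sq[:, None] + sq[None, :] - 2 * G        # squared pairwise distances
K = np.exp(-D2 / (2 * d))                     # RBF kernel, bandwidth^2 = d

iu = np.triu_indices(n, k=1)                  # off-diagonal pairs only
y = K[iu]
A = np.column_stack([np.ones_like(y), G[iu], sq[iu[0]], sq[iu[1]]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # affine fit in linear statistics
r2 = 1 - (y - A @ coef).var() / y.var()
print(f"R^2 of affine fit to kernel entries: {r2:.4f}")
```

The near-perfect affine fit is the intuition behind "kernel embedded classifiers ~ linear classifiers" in this limit.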
HDLSS Additional Results

Batch adjustment (Xuxin Liu):
Recall the intuition from above:
the key is the sizes of the biological subtypes.
A differing ratio trips up the mean, but DWD is more robust.
Mathematics behind this:
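A deterministic one-dimensional toy example (my construction, not the authors') shows how differing subtype ratios trip up mean-based batch adjustment: the same subtype lands in different places after per-batch mean centering.

```python
import numpy as np

# Two batches containing the same two subtypes, A at 0 and B at 1,
# but mixed in different proportions.
batch1 = np.array([0.0] * 80 + [1.0] * 20)    # 80% A, 20% B
batch2 = np.array([0.0] * 50 + [1.0] * 50)    # 50% A, 50% B

# Mean-centering each batch shifts subtype A by a different amount:
a1 = 0.0 - batch1.mean()                      # A lands at -0.2 in batch 1
a2 = 0.0 - batch2.mean()                      # A lands at -0.5 in batch 2
print(a1, a2, abs(a1 - a2))                   # residual subtype gap of 0.3
```

Mean centering leaves a residual gap of 0.3 between copies of the *same* subtype, which is exactly the artifact a direction-based adjustment such as DWD is more robust to.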
Mixing Conditions

Idea from probability theory:
Recall standard asymptotic results as $n \to \infty$:
• Law of Large Numbers ("weak" = in probability, "strong" = a.s.)
• Central Limit Theorem
Both have technical assumptions (usually ignored!),
e.g. independent and identically distributed.
Mixing conditions explore weaker assumptions that still give the
Law of Large Numbers and Central Limit Theorem.
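Both classical results are easy to check by simulation in the i.i.d. case. The quick sanity check below uses Exponential(1) variables ($\mu = \sigma = 1$; my choice of distribution for illustration).

```python
import numpy as np

# LLN and CLT sanity check for i.i.d. Exponential(1) variables.
rng = np.random.default_rng(3)
batches = rng.exponential(1.0, size=(2_000, 500))

# LLN: sample means concentrate near mu = 1.
grand_mean = batches.mean()

# CLT: standardized batch means are approximately N(0, 1).
z = (batches.mean(axis=1) - 1.0) * np.sqrt(500)
print(f"grand mean = {grand_mean:.3f}, sd(z) = {z.std():.3f}, "
      f"P(|z| < 1.96) = {(np.abs(z) < 1.96).mean():.3f}")
```

The mixing-condition literature asks how far the independence assumption here can be weakened while keeping both conclusions.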
Mixing conditions:
• A whole area in probability theory, with a large literature.
• A comprehensive reference: Bradley (2005 update of 1986 version).
• Better: newer references.
Mixing Conditions

Mixing condition used here: $\rho$-mixing.
For random variables $\ldots, X_{-1}, X_0, X_1, \ldots$ define
$$\rho(j) = \sup \left\{ \left| \mathrm{Corr}(f, g) \right| :
f \in L^2(\sigma(\ldots, X_{-1}, X_0)),\ g \in L^2(\sigma(X_j, X_{j+1}, \ldots)) \right\},$$
for the sigma-fields generated by the past ($\ldots, X_{-1}, X_0$)
and the future ($X_j, X_{j+1}, \ldots$).
Note the gap of lag $j$.
Assume $\rho(j) \to 0$ as $j \to \infty$.
Idea: uncorrelated at far lags.
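The "uncorrelated at far lags" idea can be seen in a standard example: a Gaussian AR(1) process, which is known to be $\rho$-mixing at a geometric rate. The sketch below (illustrative numbers) shows empirical autocorrelations dying out as the lag grows.

```python
import numpy as np

# AR(1): X_t = phi * X_{t-1} + innovation; autocorrelation at lag j is phi^j.
rng = np.random.default_rng(4)
phi, N = 0.7, 100_000
x = np.empty(N)
x[0] = rng.standard_normal()
for t in range(1, N):
    x[t] = phi * x[t - 1] + rng.standard_normal()

def acf(x, lag):
    # Empirical correlation across a gap of the given lag.
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

print(f"lag 1: {acf(x, 1):.3f}, lag 5: {acf(x, 5):.3f}, "
      f"lag 25: {acf(x, 25):.4f}")
```

Correlation is strong at lag 1 ($\approx \phi = 0.7$) and negligible at lag 25 ($\phi^{25} \approx 10^{-4}$), matching the $\rho(j) \to 0$ requirement. (Note $\rho$-mixing bounds correlations of *all* $L^2$ functions of past and future, not just linear ones; the linear autocorrelation is shown only as the simplest instance.)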
HDLSS Math Stat of PCA

Conditions for geometric representation:
Hall, Marron and Neeman (2005):
assume the entries $X_{d,1}, X_{d,2}, \ldots, X_{d,d}$ of the data vectors are $\rho$-mixing.
Drawback: a strong assumption.
(In JRSS-B, since Biometrika refused.)

Series of technical improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(fully covariance based, no mixing)
Tricky point: classical mixing conditions require a notion of time ordering,
which is not always clear, e.g. for microarrays.

Condition from Jung & Marron (2009):
$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$.
(Note: not necessarily Gaussian.)
Define the standardized version $Z_d = \Lambda_d^{-1/2} U_d^t X_d$.
Assume there exists a permutation of the $d$ entries
so that $Z_d$ is $\rho$-mixing.
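The standardization step is a whitening transform: applying $\Lambda_d^{-1/2} U_d^t$ with the true eigendecomposition gives identity covariance, so the mixing condition can be imposed on uncorrelated (unit-variance) entries. A minimal sketch with toy numbers (my choices of dimension and eigenvalues):

```python
import numpy as np

# Z_d = Lambda_d^{-1/2} U_d^t X_d has identity covariance when
# Sigma_d = U_d Lambda_d U_d^t is the true eigendecomposition.
rng = np.random.default_rng(5)
dim, n = 5, 200_000
U, _ = np.linalg.qr(rng.standard_normal((dim, dim)))   # random orthogonal U_d
lam = np.array([10.0, 5.0, 2.0, 1.0, 0.5])             # Lambda_d diagonal

# Draw X ~ (0, Sigma) with Sigma = U diag(lam) U^t: rows are U sqrt(Lam) g.
X = rng.standard_normal((n, dim)) * np.sqrt(lam) @ U.T
Z = X @ U / np.sqrt(lam)                               # rows: Lam^{-1/2} U^t x

print(np.round(np.cov(Z.T), 2))                        # approximately identity
```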
HDLSS Math Stat of PCA

Careful look at PCA consistency ($\alpha > 1$ spike):
(Reality check suggested by a reviewer.)
The condition is independent of the sample size,
so it holds even for $n = 1$ (!?).
Reviewer's conclusion: absurd; shows the assumption is too strong for practice.
HDLSS Math Stat of PCA

Yet HDLSS PCA often finds signal, not pure noise.
Recall the RNAseq data from 8/23/12:
$d \approx 1700$, $n = 180$.

Functional Data Analysis

Manually brushed clusters show clear alternate splicing, not noise.
HDLSS Math Stat of PCA

Recall the theoretical separation:
Strong inconsistency: $\alpha < 1$ spike.
Consistency: $\alpha > 1$ spike.
Mathematically driven conclusion: real data signals are this strong.
HDLSS Math Stat of PCA

An Interesting Objection:
Should not study angles in PCA.
Recall, for consistency ($\alpha > 1$ spike): $\mathrm{Angle}(\hat{u}_1, u_1) \to 0$,
while for strong inconsistency ($\alpha < 1$ spike): $\mathrm{Angle}(\hat{u}_1, u_1) \to 90°$.
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
(ldquoWeakrdquo = in prob ldquoStrongrdquo = as)
Mixing Conditions
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Idea From Probability Theory
Recall Standard Asymptotic Results as
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
(Usually Ignore )
Mixing Conditions
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Idea From Probability Theory
Law of Large Numbers
Central Limit Theorem
Both have Technical Assumptions
Eg Independent and Ident Distrsquod
Mixing Conditions
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Hall, Marron and Neeman (2005):
Assume the Entries of the Data Vectors
$X = (X_1, X_2, \ldots, X_d)^t$
Are $\rho$-mixing
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused)
Conditions for Geo Rep'n
Series of Technical Improvements:
• Ahn, Marron, Müller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
  (Fully Covariance Based, No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Tricky Point: Classical Mixing Conditions
Require a Notion of Time Ordering,
Not Always Clear, e.g. for Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Rep'n
Condition from Jung & Marron (2009):
$X_d \sim (0, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$
(Note: Not Gaussian)
Define the Standardized Version
$Z_d = \Lambda_d^{-1/2} U_d^t X_d$
Assume Ǝ a permutation of the entries of $Z_d$,
So that the entries of $Z_d$ are ρ-mixing
HDLSS Math Stat of PCA
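A small numpy sketch of the standardization $Z_d = \Lambda_d^{-1/2} U_d^t X_d$, under the simplifying assumption that $\Sigma_d$ is known (a toy 3-dimensional Gaussian just to exercise the algebra; the slide's setting is explicitly non-Gaussian):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy covariance Sigma = U Lambda U^t via its eigendecomposition.
d = 3
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)      # positive definite covariance
lam, U = np.linalg.eigh(Sigma)       # Sigma = U diag(lam) U^t

X = rng.multivariate_normal(np.zeros(d), Sigma, size=50_000).T  # d x n
Z = np.diag(lam ** -0.5) @ U.T @ X   # standardized version

# Entries of Z are uncorrelated with unit variance (up to sampling error),
# which is what makes a mixing assumption on them natural.
C = np.cov(Z)
print(np.round(C, 2))
```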
Careful Look at PCA Consistency ($\alpha > 1$ spike)
(Reality Check Suggested by a Reviewer):
The Condition is Independent of Sample Size,
So it is True even for n = 1 (!?)
Reviewer's Conclusion: Absurd, shows the
assumption is too strong for practice
HDLSS Math Stat of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise
HDLSS Math Stat of PCA
Recall the RNAseq Data From 8/23/12:
d ≈ 1700, n = 180
Manually Brushed Clusters Show
Clear Alternate Splicing, Not Noise
HDLSS Math Stat of PCA
Functional Data Analysis
Recall Theoretical Separation:
Strong Inconsistency ($\alpha < 1$ spike)
Consistency ($\alpha > 1$ spike)
Mathematically Driven Conclusion:
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
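A quick simulation sketch of this dichotomy in the spike model (first eigenvalue $d^\alpha$, all others 1; the particular d, n, and alpha values are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Angle between the sample and true first PC directions in a spike model.
# For alpha > 1 the angle stays small; for alpha < 1 it drifts toward
# 90 degrees as d grows with n fixed (the HDLSS regime).
def angle_deg(d, alpha, n=20):
    u1 = np.zeros(d); u1[0] = 1.0                 # true first eigenvector
    sd = np.ones(d); sd[0] = d ** (alpha / 2)     # spike model std devs
    X = rng.standard_normal((d, n)) * sd[:, None]
    # Leading eigenvector of the sample covariance = top left singular vector.
    uhat = np.linalg.svd(X, full_matrices=False)[0][:, 0]
    c = abs(uhat @ u1)
    return np.degrees(np.arccos(min(c, 1.0)))

a_cons = angle_deg(2000, alpha=1.5)    # consistency regime
a_incons = angle_deg(2000, alpha=0.5)  # strong inconsistency regime
print(round(a_cons, 1), round(a_incons, 1))
```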
An Interesting Objection:
Should not Study Angles in PCA
Recall, for Consistency ($\alpha > 1$ spike):
$\mathrm{Angle}(\hat{u}_1, u_1) \to 0$
For Strong Inconsistency ($\alpha < 1$ spike):
$\mathrm{Angle}(\hat{u}_1, u_1) \to 90^\circ$
The Objection: Because PC Scores (i.e. projections)
are Not Consistent
For Scores $\hat{s}_{j,i} = P_{\hat{v}_j} x_i$ and $s_{j,i} = P_{v_j} x_i$
(What we study in PCA scatterplots),
Can Show $\hat{s}_{j,i} / s_{j,i} \to R_j \neq 1$ (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC Scores (i.e. projections) are Not Consistent,
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors":
$\hat{s}_{j,i} / s_{j,i} \to R_j$, the Same Realization for $i = 1, \ldots, n$
Axes have Inconsistent Scales,
But Relationships are Still Useful
HDLSS Math Stat of PCA
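A toy numpy sketch of the proportional-errors phenomenon in the same spike model (illustrative parameters, a heuristic check rather than the paper's proof): estimated PC1 scores come out inflated, yet remain strongly linearly related to the true scores, so scatterplot relationships survive.

```python
import numpy as np

rng = np.random.default_rng(4)

# Spike model: first eigenvalue d**alpha, all others 1; n fixed, d large.
d, n, alpha = 5000, 20, 0.5
u1 = np.zeros(d); u1[0] = 1.0
sd = np.ones(d); sd[0] = d ** (alpha / 2)
X = rng.standard_normal((d, n)) * sd[:, None]

uhat = np.linalg.svd(X, full_matrices=False)[0][:, 0]
s_hat = uhat @ X    # estimated PC1 scores (what scatterplots show)
s_true = u1 @ X     # true PC1 scores

r = np.corrcoef(s_hat, s_true)[0, 1]
# Scores are inflated (inconsistent scale), but the linear relationship
# between estimated and true scores is strong.
print(round(abs(r), 3))
```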
HDLSS Deep Open Problem
In PCA Consistency:
Strong Inconsistency ($\alpha < 1$ spike)
Consistency ($\alpha > 1$ spike)
What happens at the boundary ($\alpha = 1$)?
Result: Ǝ interesting Limit Distn's,
Jung, Sen & Marron (2012)
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question: Behavior in Very High Dimension?
Answer: El Karoui (2010):
• In the Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
Implications for DWD:
Recall its Main Advantage is for High d,
So it is not Clear Embedding Helps,
Thus not yet Implemented in DWD
HDLSS Asymptotics & Kernel Methods
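A small numpy sketch in the spirit of that result (a heuristic check, not El Karoui's theorem): for high-dimensional points scaled to the unit sphere, pairwise squared distances are exactly $2 - 2\langle x_i, x_j\rangle$ with tiny inner products, so off-diagonal RBF kernel entries are an essentially affine function of the inner products, i.e. kernel embedding adds little beyond a linear kernel.

```python
import numpy as np

rng = np.random.default_rng(5)

# High-dimensional points on the unit sphere; inner products are
# O(1/sqrt(d)) small, so exp(2*<x_i,x_j>) linearizes.
d, n = 20000, 40
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

lin = X @ X.T                 # linear (inner product) kernel
D2 = 2.0 - 2.0 * lin          # squared pairwise distances
rbf = np.exp(-D2)             # RBF kernel, bandwidth 1

mask = ~np.eye(n, dtype=bool)
# Off-diagonal RBF entries vary almost perfectly linearly with the
# inner products:
r = np.corrcoef(rbf[mask], lin[mask])[0, 1]
print(round(r, 4))
```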
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Idea From Probability Theory
Mixing Conditions
Explore Weaker Assumptions to Still Get
Law of Large Numbers
Central Limit Theorem
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
Mixing Conditions
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Mixing Conditions
bull A Whole Area in Probability Theory
bull a Large Literature
bull A Comprehensive Reference
Bradley (2005 update of 1986 version)
bull Better Newer References
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results

Batch Adjustment: Xuxin Liu
Recall Intuition from above:
Key is sizes of biological subtypes.
Differing ratio trips up mean,
But DWD is more robust.
Mathematics behind this:
Mixing Conditions

Mixing Condition Used Here: ρ-Mixing
For Random Variables $X_1, X_2, \ldots$, define
$$\rho(k) = \sup_m \sup \left\{ |\mathrm{corr}(f, g)| : f \in L^2(\mathcal{F}_1^m),\ g \in L^2(\mathcal{F}_{m+k}^\infty) \right\}$$
where the sigma-fields $\mathcal{F}_1^m$ and $\mathcal{F}_{m+k}^\infty$ are
generated by $X_1, \ldots, X_m$ and by $X_{m+k}, X_{m+k+1}, \ldots$
(Note: gap of lag $k$)
Assume: $\rho(k) \to 0$ as $k \to \infty$
Idea: Uncorrelated at Far Lags
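The "uncorrelated at far lags" idea can be illustrated with a standard example (parameters made up): a Gaussian AR(1) sequence is ρ-mixing, and its lag-$k$ correlation, hence $\rho(k)$, decays geometrically.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative rho-mixing example: a Gaussian AR(1) sequence
# x[t] = phi * x[t-1] + noise, whose lag-k correlation decays like phi**k.
phi, N = 0.8, 200_000
eps = rng.standard_normal(N)
x = np.empty(N)
x[0] = eps[0]
for t in range(1, N):
    x[t] = phi * x[t - 1] + eps[t]

def lag_corr(series, k):
    # empirical correlation between the series and itself shifted by lag k
    return float(np.corrcoef(series[:-k], series[k:])[0, 1])

c1, c20 = lag_corr(x, 1), lag_corr(x, 20)
print(round(c1, 3), round(c20, 3))   # roughly 0.8 at lag 1, near 0 at lag 20
```

Nearby entries are strongly correlated, far-apart entries essentially uncorrelated, which is exactly the behavior the ρ-mixing assumption formalizes.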
HDLSS Math Stat of PCA

Conditions for Geo Rep'n:
Hall, Marron and Neeman (2005):
Assume Entries of Data Vectors
$X = (X_1, X_2, \ldots, X_j, \ldots, X_d)^t$
are ρ-mixing.
Drawback: Strong Assumption
(In JRSS-B, since Biometrika Refused)

Series of Technical Improvements:
• Ahn, Marron, Muller & Chi (2007)
• Aoshima (2010), Yata & Aoshima (2012)
(Fully Covariance Based, No Mixing)

Tricky Point: Classical Mixing Conditions
Require a Notion of Time Ordering,
Not Always Clear, e.g. Microarrays.

Condition from Jung & Marron (2009):
$X_d \sim (0_d, \Sigma_d)$, where $\Sigma_d = U_d \Lambda_d U_d^t$
(Note: Not Gaussian)
Define the Standardized Version:
$Z_d = \Lambda_d^{-1/2} U_d^t X_d$
Assume ∃ a permutation of the entries
so that $Z_d$ is ρ-mixing.
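A quick numerical check of the standardization $Z_d = \Lambda_d^{-1/2} U_d^t X_d$: it removes the covariance structure, so $\mathrm{Cov}(Z_d) \approx I$. (The dimension, eigenvalues, and Gaussian sampling below are illustrative only; the Jung & Marron condition does not require Gaussianity.)

```python
import numpy as np

rng = np.random.default_rng(3)

# Check that Z = Lambda^{-1/2} U^t X whitens the data:
# Cov(Z) should be close to the identity matrix.
d, N = 5, 100_000
lam = np.array([10.0, 5.0, 2.0, 1.0, 0.5])            # diagonal of Lambda_d
U, _ = np.linalg.qr(rng.standard_normal((d, d)))      # orthogonal U_d
Sigma = U @ np.diag(lam) @ U.T                        # Sigma_d = U Lambda U^t

X = rng.multivariate_normal(np.zeros(d), Sigma, size=N)  # rows ~ (0, Sigma_d)
Z = X @ U @ np.diag(lam ** -0.5)                         # standardized version

err = float(np.abs(np.cov(Z.T) - np.eye(d)).max())
print(err)   # small: empirical Cov(Z) is near the identity
```

Only the joint distribution of $Z_d$ matters for the mixing assumption; the whitening step here just verifies the algebra of the standardization.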
HDLSS Math Stat of PCA

Careful look at PCA Consistency (spike $\alpha > 1$):
(Reality Check Suggested by Reviewer)
The condition is Independent of Sample Size,
So true even for n = 1 (?!?)
Reviewer's Conclusion: Absurd, shows
assumption too strong for practice.
HDLSS Math Stat of PCA

HDLSS PCA Often Finds Signal, Not Pure Noise
HDLSS Math Stat of PCA

Recall RNAseq Data From 8/23/12:
d ~ 1700, n = 180

Functional Data Analysis

Manually Brushed Clusters:
Clear Alternate Splicing, Not Noise
HDLSS Math Stat of PCA

Recall Theoretical Separation:
• Strong Inconsistency: spike $\alpha < 1$
• Consistency: spike $\alpha > 1$
Mathematically Driven Conclusion:
Real Data Signals Are This Strong
HDLSS Math Stat of PCA

An Interesting Objection:
Should not Study Angles in PCA.
Recall for Consistency (spike $\alpha > 1$):
$\mathrm{Angle}(\hat{u}_1, u_1) \to 0$
For Strong Inconsistency (spike $\alpha < 1$):
$\mathrm{Angle}(\hat{u}_1, u_1) \to 90°$
Because PC Scores (i.e. projections) Not Consistent:
For Scores $\hat{s}_{i,j} = P_{\hat{v}_j} x_i$ and $s_{i,j} = P_{v_j} x_i$
(What we study in PCA scatterplots),
Can Show: $\hat{s}_{i,j} / s_{i,j} \to R_j \neq 1$ (Random)
Thanks to Dan Shen.
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
For Sigma-Fields Generated bybull bull bull Note Gap of Lag
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Mixing Conditions
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - α > 1 spike
(Reality Check Suggested by Reviewer)
Condition is Independent of Sample Size,
So true for n = 1 (!)
Reviewer's Conclusion: Absurd, shows
assumption too strong for practice
HDLSS Math Stat of PCA
HDLSS PCA Often Finds Signal, Not Pure Noise
HDLSS Math Stat of PCA
Recall RNAseq Data From 8/23/12:
d ~ 1700, n = 180
HDLSS Math Stat of PCA
Manually Brushed Clusters:
Clear Alternate Splicing, Not Noise
Functional Data Analysis
Recall Theoretical Separation:
Strong Inconsistency - α < 1 spike
Consistency - α > 1 spike
Mathematically Driven Conclusion:
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
An Interesting Objection:
Should not Study Angles in PCA
Recall for Consistency (α > 1 spike):
Angle(û_1, u_1) → 0
For Strong Inconsistency (α < 1 spike):
Angle(û_1, u_1) → 90°
HDLSS Math Stat of PCA
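The two angle regimes can be checked by simulation. A rough sketch (not from the slides; the single-spike model λ_1 = d^α with unit noise eigenvalues, and the sizes d = 20000, n = 10, are illustrative assumptions) computes the angle between the top sample eigenvector and the true first PC direction:

```python
import numpy as np

def pc1_angle(d, alpha, n=10, seed=0):
    """Angle (degrees) between sample and true first PC direction
    in a single-spike model: lambda_1 = d**alpha, other eigenvalues 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)          # spike along e_1, so u_1 = e_1
    # Top right singular vector of the data matrix = top sample eigenvector
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(Vt[0, 0]), 1.0)           # |<u_hat_1, e_1>|
    return float(np.degrees(np.arccos(cos)))

print(pc1_angle(20000, 1.5))   # alpha > 1: small angle (consistency)
print(pc1_angle(20000, 0.5))   # alpha < 1: large angle (toward 90 degrees)
```

At finite d the α < 1 angle is large but not yet 90°; it drifts toward 90° as d grows with n fixed, in line with the strong-inconsistency statement above.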
An Interesting Objection:
Should not Study Angles in PCA,
Because PC Scores (i.e. projections)
Not Consistent
For Scores ŝ_{ji} = P_{v̂_j} x_i and s_{ji} = P_{v_j} x_i
(What we study in PCA scatterplots),
Can Show: ŝ_{ji} / s_{ji} → R_j ≠ 1 (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
PC Scores (i.e. projections) Not Consistent:
ŝ_{ji} / s_{ji} → R_j ≠ 1
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors":
Same Realization of R_j for All i,
So Axes have Inconsistent Scales,
But Relationships are Still Useful
HDLSS Math Stat of PCA
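The proportional-errors phenomenon shows up directly in simulation. A rough illustration (not from the slides; the single-spike model and the sizes d = 20000, n = 10, α = 0.9 are assumptions for the demo): the per-observation ratios ŝ_{1i} / s_{1i} cluster around one common realization-dependent value, rather than around 1:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha = 20000, 10, 0.9
lam1 = d ** alpha                     # spiked first eigenvalue

X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(lam1)              # spike along e_1, so v_1 = e_1

_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1_hat = Vt[0] * np.sign(Vt[0, 0])    # resolve the sign ambiguity

s_true = X[:, 0]                      # true scores   s_1i = v_1^t x_i
s_hat = X @ v1_hat                    # sample scores s_hat_1i = v_hat_1^t x_i
ratios = s_hat / s_true
print(ratios.round(2))                # clustered around one value R_1, not 1
```

So the scatterplot axes are rescaled by an unknown common factor, but the relative positions of the points, which is what cluster and relationship hunting uses, survive.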
In PCA Consistency:
Strong Inconsistency - α < 1 spike
Consistency - α > 1 spike
What happens at boundary (α = 1)?
∃ interesting Limit Distn's:
Jung, Sen & Marron (2012)
HDLSS Deep Open Problem Result
Recall Flexibility From Kernel Embedding Idea
HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension?
Answer: El Karoui (2010)
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
HDLSS Asymptotics & Kernel Methods
Interesting Question:
Behavior in Very High Dimension
Implications for DWD:
Recall Main Advantage is for High d,
So not Clear Embedding Helps;
Thus not yet Implemented in DWD
HDLSS Asymptotics & Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Mixing Condition Used Here
Rho ndash Mixing
For Random Variables Define
Where
Assume
Idea Uncorrelated at Far Lags
Mixing Conditions
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Assume Entries of Data Vectors
Are -mixing
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon
Hall Marron and Neeman (2005)
Drawback Strong Assumption
(In JRSS-B since
Biometrika Refused)
HDLSS Math Stat of PCA
d
j
X
X
X
X
2
1
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon
Series of Technical Improvements
bull Ahn Marron Muller amp Chi (2007)
bull Aoshima (2010) Yata amp Aoshima (2012)
(Fully Covariance Based
No Mixing)
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon
Tricky Point Classical Mixing Conditions
Require Notion of Time Ordering
Not Always Clear eg Microarrays
HDLSS Math Stat of PCA
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Note Not Gaussian
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
HDLSS Math Stat of PCA

Recall theoretical separation:
• Strong inconsistency: spike index α < 1
• Consistency: spike index α > 1

Mathematically driven conclusion:
Real data signals are this strong!
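The α-dichotomy is easy to see numerically. Below is a hedged sketch of the spiked-covariance setting (the function name, parameter values, and seed are mine): first population eigenvalue d^α, the rest 1, true first PC direction u₁ = e₁.

```python
import numpy as np

# Toy spiked model: eigenvalues (d^alpha, 1, ..., 1), true PC1 = e1.
# alpha > 1: sample PC1 consistent, Angle(u1_hat, u1) -> 0.
# alpha < 1: strongly inconsistent, Angle(u1_hat, u1) -> 90 degrees.
def pc1_angle(d, alpha, n=20, seed=0):
    rng = np.random.default_rng(seed)
    sd = np.ones(d)
    sd[0] = d ** (alpha / 2)                  # sqrt of eigenvalues
    # rows have covariance diag(d^alpha, 1, ..., 1); mean known zero,
    # so no centering in this sketch
    X = rng.standard_normal((n, d)) * sd
    # leading right singular vector of X = sample PC1 direction
    v1 = np.linalg.svd(X, full_matrices=False)[2][0]
    return np.degrees(np.arccos(min(1.0, abs(v1[0]))))

print(pc1_angle(20000, alpha=1.5))   # small angle: consistent
print(pc1_angle(20000, alpha=0.5))   # near 90: strongly inconsistent
```

Note the condition depends only on d and α, echoing the reviewer's observation that the result holds even for tiny n.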
HDLSS Math Stat of PCA

An Interesting Objection:
Should not study angles in PCA

Recall, for consistency (spike index α > 1):
  Angle(û₁, u₁) → 0
For strong inconsistency (α < 1):
  Angle(û₁, u₁) → 90°

The objection: because PC scores (i.e. projections)
are not consistent.
For scores s_ij = P_{v_j} x_i and ŝ_ij = P_{v̂_j} x_i
(what we study in PCA scatterplots),
can show:
  ŝ_ij / s_ij → R_j ≠ 1    (R_j random)
Thanks to Dan Shen
HDLSS Math Stat of PCA

PC scores (i.e. projections) not consistent,
so how can PCA find useful signals in data?

Key is "proportional errors": ŝ_ij / s_ij → R_j
• Same realization of R_j for all i
• Axes have inconsistent scales
• But relationships are still useful

(Recall: HDLSS PCA often finds signal, not pure noise)
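The proportionality can also be checked numerically. This is my own sketch of the spiked setting (variable names and parameter values are mine): individual PC1 scores are badly biased, yet the estimated scores are, to good approximation, one shared random multiple R₁ of the true scores, so scatterplot relationships survive.

```python
import numpy as np

# Toy spiked model with alpha < 1, so PC1 scores are inconsistent:
# true scores = projections on true u1 = e1,
# estimated scores = projections on the sample PC1 direction.
rng = np.random.default_rng(2)
n, d, alpha = 20, 20000, 0.7
sd = np.ones(d)
sd[0] = d ** (alpha / 2)
X = rng.standard_normal((n, d)) * sd

v1_hat = np.linalg.svd(X, full_matrices=False)[2][0]
s_true = X[:, 0]                     # scores on true u1 = e1
s_hat = X @ v1_hat                   # scores on estimated PC1
s_hat *= np.sign(s_hat @ s_true)     # fix arbitrary sign of v1_hat

# least-squares fit s_hat ~= R * s_true: one shared scale factor
R = (s_hat @ s_true) / (s_true @ s_true)
rel_resid = np.linalg.norm(s_hat - R * s_true) / np.linalg.norm(s_hat)

print(R)            # well away from 1: scores are inconsistent
print(rel_resid)    # small: errors are (nearly) proportional across i
```

So the estimated score axis is stretched by an (unknown, random) factor, but the configuration of points is essentially preserved.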
HDLSS Deep Open Problem: Result

In PCA consistency:
• Strong inconsistency: spike index α < 1
• Consistency: spike index α > 1
What happens at the boundary (α = 1)?
Ǝ interesting limit dist'ns: Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall flexibility from the kernel embedding idea
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Standardized
Version
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Conditions for Geo Reprsquon
Condition from Jung amp Marron (2009)
where
Define
Assume Ǝ a permutation
So that is ρ-mixing
HDLSS Math Stat of PCA
ddX 0~ tdddd UU
dtddd XUZ 21
d
ddZ
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Careful look at
PCA Consistency - spike
(Reality Check Suggested by Reviewer)
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
HDLSS Math Stat of PCA
1
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Careful look at
PCA Consistency - spike
Independent of Sample Size
So true for n = 1 ()
Reviewers Conclusion Absurd shows
assumption too strong for practice
HDLSS Math Stat of PCA
1
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Recall
RNAseq
Data From
82312
d ~ 1700
n = 180
HDLSS Math Stat of PCA
Manually
Brushed
Clusters
Clear
Alternate
Splicing
Not
Noise
Functional Data Analysis
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
HDLSS Deep Open Problem, Result

In PCA Consistency:
Strong Inconsistency - α < 1 spike
Consistency - α > 1 spike
What happens at boundary (α = 1)?

Ǝ interesting Limit Dist'ns
Jung, Sen & Marron (2012)
HDLSS Asymptotics & Kernel Methods

Recall Flexibility From Kernel Embedding Idea
[figure: kernel embedding illustration]
HDLSS Asymptotics & Kernel Methods

Interesting Question:
Behavior in Very High Dimension?

Answer: El Karoui (2010):
• In Random Matrix Limit,
• Kernel Embedded Classifiers ~ Linear Classifiers
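El Karoui's point can be previewed with a toy computation (a sketch of the intuition, not his derivation): for high-dimensional data at the typical scale of N(0, I_d), squared distances concentrate near 2d, so a Gaussian kernel with the conventional bandwidth² ≈ d is, to first order, an affine function of the inner products, and a classifier built on such a kernel matrix behaves like a linear one. The sizes n = 50, d = 5000 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 5000

# Points on the sphere of radius sqrt(d): the typical scale of N(0, I_d) data.
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)

G = X @ X.T                                        # Gram matrix (inner products)
sq = np.add.outer(np.diag(G), np.diag(G)) - 2 * G  # squared pairwise distances
K = np.exp(-sq / (2 * d))                          # RBF kernel, bandwidth^2 = d

iu = np.triu_indices(n, k=1)
r = np.corrcoef(K[iu], G[iu])[0, 1]
print(r)   # near 1: kernel entries are nearly affine in the inner products
```

Here K_ij = e^{-1} · e^{G_ij/d} with G_ij/d = O(d^{-1/2}), so the exponential is effectively linear, which is why the embedding buys little beyond a linear classifier in this limit.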
HDLSS Asymptotics & Kernel Methods

Implications for DWD:
Recall Main Advantage is for High d,
So not Clear Embedding Helps;
Thus not yet Implemented in DWD
HDLSS Additional Results

Batch Adjustment (Xuxin Liu):
Recall Intuition from above:
Key is sizes of biological subtypes;
Differing ratio trips up mean,
But DWD more robust
Mathematics behind this
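The "differing ratio trips up mean" point admits a one-dimensional sketch (illustrative numbers only; the DWD-based adjustment itself is not shown here): with no true batch effect but different subtype mixes, subtracting each batch's own mean shifts the biology of the two batches in opposite directions.

```python
import numpy as np

rng = np.random.default_rng(3)

# One biological direction: subtype A centered at +3, subtype B at -3.
# No true batch effect; only the A:B ratio differs between batches.
a1, b1 = rng.normal(3, 1, 80), rng.normal(-3, 1, 20)   # batch 1: 80% A
a2, b2 = rng.normal(3, 1, 20), rng.normal(-3, 1, 80)   # batch 2: 20% A

# Mean batch adjustment: subtract each batch's own overall mean.
m1 = np.r_[a1, b1].mean()          # ~ +1.8, dominated by subtype A
m2 = np.r_[a2, b2].mean()          # ~ -1.8, dominated by subtype B

# Subtype A now sits near +1.2 in batch 1 but near +4.8 in batch 2:
print((a1 - m1).mean(), (a2 - m2).mean())
```

The adjustment has manufactured a spurious subtype-by-batch shift of about 3.6, which is the kind of artifact a direction found by a robust classifier such as DWD is less prone to.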
Functional Data Analysis

Manually Brushed Clusters:
Clear Alternate Splicing,
Not Noise
HDLSS Math Stat of PCA

Recall Theoretical Separation:
Strong Inconsistency - α < 1 spike
Consistency - α > 1 spike

Mathematically Driven Conclusion:
Real Data Signals Are This Strong
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
HDLSS Math Stat of PCA
1
1
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
Recall Theoretical Separation
Strong Inconsistency - spike
Consistency - spike
Mathematically Driven Conclusion
Real Data Signals Are This Strong
HDLSS Math Stat of PCA
1
1
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
An Interesting Objection
Should not Study Angles in PCA
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
An Interesting Objection
Should not Study Angles in PCA
Recall for Consistency
For Strong Inconsistency
HDLSS Math Stat of PCA
1
0ˆ 11 uuAngle
1
011 90ˆ uuAngle
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
HDLSS Math Stat of PCA
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores
What we study in PCA scatterplots
HDLSS Math Stat of PCA
ivji xPsjˆˆ
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Same Realization for
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
Key is ldquoProportional Errorsrdquo
Axes have Inconsistent Scales
But Relationships are Still Useful
HDLSS Math Stat of PCA
1ˆ
jji
ji Rs
s
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
HDLSS Deep Open Problem
In PCA Consistency
Strong Inconsistency - spike
Consistency - spike
What happens at boundary ()
Ǝ interesting Limit Distnrsquos
Jung Sen amp Marron (2012)
HDLSS Deep Open Problem Result
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Recall
Flexibility
From
Kernel
Embedding
Idea
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Answer El Karoui (2010)
bull In Random Matrix Limit
bull Kernel Embedded Classifiers ~
~ Linear Classifiers
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
HDLSS Asymptotics amp Kernel Methods
Interesting Question
Behavior in Very High Dimension
Implications for DWD
Recall Main Advantage is for High d
So not Clear Embedding Helps
Thus not yet Implemented in DWD
HDLSS Asymptotics amp Kernel Methods
HDLSS Additional Results
Batch Adjustment Xuxin Liu
Recall Intuition from above
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
Mathematics behind this
An Interesting Objection
Should not Study Angles in PCA
Because PC Scores (ie projections)
Not Consistent
For Scores and
Can Show (Random)
Thanks to Dan Shen
HDLSS Math Stat of PCA
ivji xPsjˆˆ ivji xPs
j
1ˆ
jji
ji Rs
s
PC Scores (ie projections)
Not Consistent
So how can PCA find Useful Signals in Data
HDLSS Math Stat of PCA
HDLSS
PCA
Often
Finds
Signal
Not Pure
Noise
HDLSS Math Stat of PCA
PC Scores (i.e. projections)
Not Consistent
So how can PCA find Useful Signals in Data?
Key is "Proportional Errors": ŝ_{i,j} / s_{i,j} ≈ R_j ≠ 1
Same Realization of R_j for all i
Axes have Inconsistent Scales,
But Relationships are Still Useful
HDLSS Math Stat of PCA
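The "proportional errors" phenomenon can be seen in a small simulation. This is an illustrative sketch of a single-spike model (spike size lam = 2d/n chosen only so the effect is visible; not code from the slides): the estimated scores are approximately a common random multiple R of the true scores, so relative positions of the observations survive even though the scale does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 20, 20000                       # HDLSS: d >> n
lam = 2 * d / n                        # spike variance (illustrative choice)

u1 = np.zeros(d); u1[0] = 1.0          # true first PC direction
X = np.sqrt(lam) * np.outer(rng.standard_normal(n), u1) \
    + rng.standard_normal((n, d))

# sample PC1 direction (uncentered PCA via SVD; the model is mean zero)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1_hat = Vt[0] if Vt[0, 0] > 0 else -Vt[0]   # resolve sign ambiguity

s_hat, s_true = X @ u1_hat, X @ u1     # estimated vs true PC scores

# common proportionality factor R, and how much of s_hat it fails to explain
R = (s_hat @ s_true) / (s_true @ s_true)
rel_resid = np.linalg.norm(s_hat - R * s_true) / np.linalg.norm(s_hat)
print(f"R = {R:.3f} (not 1), relative residual = {rel_resid:.3f} (small)")
```

R being clearly different from 1 is the inconsistency of the scores; the small residual is the "same realization for all i" part: one random rescaling, shared across observations.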
In PCA Consistency:
Strong Inconsistency - spike d^α, α < 1
Consistency - spike d^α, α > 1
What happens at boundary (α = 1)?
∃ interesting Limit Distn's
Jung, Sen & Marron (2012)
HDLSS Deep Open Problem, Result
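The two spike regimes are easy to see numerically. A hedged sketch (single-spike model with exponents 0.3 and 1.5 chosen well away from the α = 1 boundary; illustrative, not from the slides): below the boundary the sample PC1 direction is nearly orthogonal to the truth, above it the two directions nearly coincide.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 20, 20000                       # HDLSS: d >> n

def pc1_alignment(alpha):
    """|cos(angle)| between sample PC1 and true PC1 for spike lambda = d**alpha."""
    lam = float(d) ** alpha
    u1 = np.zeros(d); u1[0] = 1.0
    X = np.sqrt(lam) * np.outer(rng.standard_normal(n), u1) \
        + rng.standard_normal((n, d))
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return abs(Vt[0] @ u1)

print("alpha = 0.3:", pc1_alignment(0.3))   # near 0: strong inconsistency
print("alpha = 1.5:", pc1_alignment(1.5))   # near 1: consistency
```

At the boundary α = 1 itself the alignment has a nondegenerate limit distribution, which is the interesting case studied by Jung, Sen & Marron (2012).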