Ichigaku Takigawa
[Slide 1/36]
Ichigaku Takigawa (JST)
2016.11.18 @ IBIS2016 (the 19th Workshop on Information-Based Induction Sciences)
[Slide 2/36]
(http://art.ist.hokudai.ac.jp)
1974 Turing Award Lecture: "Computer Programming as an Art" (Don Knuth)
Science / Engineering / Art
https://doi.org/10.1145/361604.361612
[Slide 3/36]
Active compounds: 1,737 / Inactive compounds: 26,895
https://pubchem.ncbi.nlm.nih.gov/bioassay/41
PC3 (human prostate cancer cell line)
Structure-activity relationships (SAR/QSAR)
[Slide 4/36]
Atom + Bond representations: SYBYL MOL2, pharmacophore models
[Figure: 2D structure of Imatinib (CID 5291) and its conformers (1-284) in PubChem3D]
Molecular graph representations (implicit hydrogens)
Molecular descriptors
(variations: constitutional, topological, atom pairs, geometrical, electronic, thermodynamical, physicochemical, WHIM, fingerprints, RDF, autocorrelations, functional groups, structural keys, properties, interaction fields, ...)
Examples: lipophilicity (LogP), HOMO/LUMO energies, ...
/365
[Ramakrishnan+ 2014] Sci Data. 2014 Aug 5;1:140022 Quantum chemistry structures and properties of 134 kilo molecules.
C,O,N,F9133,88515(C7H10O26,095 )
?(c.f. Ugly duckling theorem , 1969)
[Slide 6/36]
The Art of Feature Engineering
"Applied machine learning is basically feature engineering." — Andrew Ng
"Feature engineering is the next buzz word after big data." — Nayyar A. Zaidi
Feature engineering remains an art.
[Slide 7/36]
Watch out for confounders!
[Slide 8/36]
Abduction/Induction: ..., FM, ...
Tree ensembles / DAGs: RF, GBM/MART/AnyBoost, XGBoost, RGF, DJ
Sparse selection: Best Subset, LASSO, SCAD, MC+, SIS; Stability Selection (aka Randomized LASSO)
Pitfalls: Chance Correlation, Concentration of Measure
Dimensionality reduction: PLS, PCA, t-SNE, Embeddings (*2vec)
Randomization: RP/ELM/RC, ExtraTrees, VR-Trees; Stacked Generalization (aka Stacking/Blending)
Transformations: ACE (Alternating Conditional Expectations)
Validation: AD (Applicability Domain), Y-Scrambling Test

Leo Breiman (1928-2005)
CART (Classification and Regression Trees), PIMPLE, Random Forests, Arcing (aka Boosting), Bagging, Pasting, ACE (Alternating Conditional Expectations), Stacked Generalization (aka Stacking/Blending), Nonnegative Garrote (a LASSO precursor for subset selection), Instability / Stabilization in Model Selection
Shannon-McMillan-Breiman Theorem (ergodic theory), Kelly-Breiman Strategy (optimal gambling)
UC Berkeley; 2005 SIGKDD Innovation Award; started out as a probability theorist
"If statistics is an applied field and not a minor branch of mathematics, then 99% of the published papers are useless exercises." ("Reflections After Refereeing Papers for NIPS", The Mathematics of Generalization, Ed. D.H. Wolpert, 1995)
https://en.wikipedia.org/wiki/File:Leo_Breiman.jpg
[Slide 10/36] Abduction/Induction
Hypotheses/Axioms <-> Experimental Facts
deduction / abduction / induction
"The grand aim of science is to cover the greatest number of experimental facts by logical deduction from the smallest number of hypotheses or axioms." (Albert Einstein)
[Slide 11/36]
Crystal-structure prediction of octet binary compounds (+/- energy differences, 82 materials)
14 primary features (atomic properties of the constituent elements A and B)
Zinc blende (ZB) / Wurtzite (WZ) / Rocksalt (RS)
Candidate descriptors built by algebraically combining primary features (sums, ratios, powers, etc.): ~10,000 candidates
1. Pre-select with LASSO  2. Exhaustive search (Best Subset) over the survivors
Case Study: PRL 114, 105503, 2015
[Slide 12/36]
Case study (continued): what makes a good descriptor?
Four conditions a good descriptor must satisfy (1-4; see the paper)
Example 1: the nuclear charges (Z_A, Z_B)
Example 2: violates condition 3; kernel ridge regression (KRR) violates conditions 2 and 4
Case Study: PRL 114, 105503, 2015
[Slide 13/36]
The selected descriptors (top 3)
Case Study: PRL 114, 105503, 2015
[Slide 14/36]
Feature interactions: effects that only appear through combinations of features i and j
No single feature is informative on its own (e.g. XOR / parity problems)
[Slide 15/36] Additive models: GAM, FM, ...
Linear models + pairwise interaction terms (e.g. Factorization Machines)
Polynomial regression (PolyReg)
Generalized Additive Models (GAM)
Demo: http://playground.tensorflow.org/ (by Big Picture group, Google)
Choosing the input features is effectively architecture engineering: with raw inputs only, the network must learn the XOR interaction itself (and may overfit, depending on the ReLU units and tuning); supplying the interaction as an input feature makes the problem easy
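The XOR point above is easy to demonstrate: a linear classifier cannot separate XOR from the raw inputs, but adding the explicit product feature makes the problem linear. A minimal sketch with scikit-learn (the tiny dataset and the regularization settings are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

# XOR: not linearly separable in the raw inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

raw = LogisticRegression().fit(X, y)

# Adding the explicit interaction feature x1*x2 makes the classes separable
Xi = PolynomialFeatures(degree=2, interaction_only=True,
                        include_bias=False).fit_transform(X)
inter = LogisticRegression(C=1e6, max_iter=1000).fit(Xi, y)

print(raw.score(X, y), inter.score(Xi, y))
```

With the interaction column present, a weight vector such as (1, 1, -2) with intercept -0.5 separates the four points, which is exactly the "feed the interaction as an input" trick on the playground slide.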
[Slide 18/36] Feature selection and chance correlation
Selecting the best m out of n candidate features (Best Subset)
Recognized early in QSAR (Topliss 1972, 1979)
J. Fan, "Features of Big Data and sparsest solution in high confidence set", 2014
Fan (2014): even when all candidates are pure noise, the best subset of 5 variables can exhibit high spurious correlation with the response
Chance Correlation / Spurious Correlation
[Slide 19/36] Distance concentration in high dimensions
Beyer+ 1999: when is nearest neighbor meaningful?
K. Beyer+, "When Is Nearest Neighbor Meaningful?", ICDT'99; V. Pestov, "On the geometry of similarity search: dimensionality curse and concentration of measure", Information Processing Letters, 1999.
As the dimension d grows, the distances from a query point to its nearest and farthest neighbors become relatively indistinguishable
Concentration of Measure Phenomena
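The Beyer et al. phenomenon can be reproduced in a few lines: compare the relative gap between the nearest and farthest neighbor of a random query in low and high dimension. A quick numerical sketch (the dimensions and sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = {}
for d in (2, 1000):
    X = rng.standard_normal((1000, d))   # 1000 random points
    q = rng.standard_normal(d)           # a random query point
    dist = np.linalg.norm(X - q, axis=1)
    # Relative contrast between the farthest and nearest neighbor
    ratios[d] = (dist.max() - dist.min()) / dist.min()
print(ratios)
```

In d=2 the nearest neighbor is far closer than the farthest one; in d=1000 all 1000 distances concentrate in a narrow band, so nearest-neighbor contrast nearly vanishes.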
[Slide 20/36] Sparse selection: Best Subset and LASSO
Filter methods (correlation / mutual information / RELIEFF / t-statistics, etc.); Best Subset (L0)
LASSO (Tibshirani 1996) (L1; aka Basis Pursuit Denoising)
Best Subset: leaps and bounds (Furnival & Wilson 1974) or exhaustive enumeration (Morgan & Tatar 1972); combinatorial and expensive
LASSO: LARS (Efron+ 2003) or coordinate descent (Friedman+ 2007); in the orthogonal case LASSO is soft thresholding where Best Subset is hard thresholding; LASSO estimates are biased relative to Best Subset
glmnet (Friedman+ 2008): L1+L2 (Elastic-Net); when p>n, plain LASSO selects at most n variables
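The L0-vs-L1 contrast on this slide can be made concrete on a toy problem: exhaustive Best Subset is combinatorial (feasible only for small p), while LASSO recovers the same support by convex optimization. A sketch with scikit-learn; the synthetic data, `alpha`, and the helper `best_subset` are illustrative assumptions:

```python
import itertools

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.standard_normal(n)

def best_subset(X, y, k):
    """Exhaustive L0 search over all size-k subsets (combinatorial cost)."""
    best, best_rss = None, np.inf
    for S in itertools.combinations(range(X.shape[1]), k):
        m = LinearRegression().fit(X[:, list(S)], y)
        rss = ((y - m.predict(X[:, list(S)])) ** 2).sum()
        if rss < best_rss:
            best, best_rss = S, rss
    return best

bs = best_subset(X, y, 2)
support = np.flatnonzero(Lasso(alpha=0.1).fit(X, y).coef_)
print(bs, support)
```

Both find features 0 and 3 here, but note the bias point from the slide: the LASSO coefficients are shrunk toward zero relative to the least-squares fit on the best subset.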
[Slide 21/36] Beyond LASSO: SCAD, MC+, and SIS
Oracle property (Fan & Li, 2001)
Adaptive LASSO (Zou 2006): a two-stage reweighted LASSO; SCAD (Fan & Li 2001): a nonconvex penalty; MC+ (Zhang 2010): minimax concave penalty, in the spirit of SCAD
Direction 1: approximate Best Subset; Direction 2: remove the bias of LASSO
Recommended recipe: pre-select with SIS, then fit SCAD
Sure Independence Screening (SIS) (Fan & Lv 2008): marginal screening that, with probability tending to one, reduces p to below n while keeping the true features
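The SIS-then-penalize recipe above is simple to sketch: rank features by marginal correlation with the response, keep O(n/log n) of them, then run a sparse fit on the survivors. A minimal illustration (synthetic p >> n data; Lasso stands in for SCAD, which scikit-learn does not provide):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 80, 500                       # p >> n
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] + 2 * X[:, 1] + 0.5 * rng.standard_normal(n)

# SIS step: keep the d = O(n / log n) features with the largest
# marginal correlation with the response
d = int(n / np.log(n))
score = np.abs(X.T @ (y - y.mean())) / n
keep = np.argsort(score)[-d:]

# Second stage: a sparse penalized fit on the survivors only
model = Lasso(alpha=0.1).fit(X[:, keep], y)
selected = sorted(keep[np.flatnonzero(model.coef_)])
print(selected)
```

Screening reduces the 500-dimensional problem to 18 features before any multivariate fitting, which is exactly what makes the second-stage (SCAD in the slide's recipe) tractable and stable.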
[Slide 22/36] Feature selection via randomization: Randomized Sparse Models
Stability Selection (Meinshausen & Buhlmann 2010)
Randomized LASSO (Meinshausen & Buhlmann 2010)
cf. Bolasso: intersecting the supports of m bootstrapped LASSO fits (Bach 2008)
Regularization path (LASSO) vs stability path: the selection frequency of each feature over many subsampled Randomized LASSO runs
[Slide 23/36] Decision trees
CART (Breiman+ 1984), AID (Morgan & Sonquist 1963), CHAID (Kass 1980), CLS (Hunt 1966), ID3 (Quinlan 1986), C4.5/C5.0 (Quinlan 1993), VFDT/Hoeffding Trees (Domingos & Hulten 2000)
Hyafil, Laurent; Rivest, R.L. (1976). "Constructing Optimal Binary Decision Trees is NP-complete". Information Processing Letters 5(1): 15-17. doi:10.1016/0020-0190(76)90095-8
Known facts: trees detect interactions automatically (Automatic Interaction Detector, AID); CART is Bayes-risk consistent (Gordon & Olshen 1978, 1980); constructing an optimal binary tree is NP-complete (Hyafil & Rivest 1976), so greedy growth + pruning is the practical answer
A decision tree is a disjunction of conjunctions (DNF)
https://dx.doi.org/10.1016/0020-0190(76)90095-8
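The greedy-growth-plus-pruning recipe (the practical workaround for the NP-completeness result above) is directly available in scikit-learn as CART with cost-complexity pruning. A sketch on illustrative XOR-quadrant data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (400, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)  # XOR-like quadrants

# Greedy growth fits the training data exactly ...
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# ... then cost-complexity pruning (the CART recipe, Breiman+ 1984)
# trades leaves for training error along a path of alpha values
path = tree.cost_complexity_pruning_path(X, y)
pruned = DecisionTreeClassifier(
    random_state=0, ccp_alpha=path.ccp_alphas[-2]).fit(X, y)

print(tree.get_n_leaves(), pruned.get_n_leaves())
```

Note also the interaction point from earlier slides: the tree handles XOR with no hand-made interaction feature, because each root-to-leaf path is a conjunction of conditions (the DNF view above).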
[Slide 24/36]
Tree ensembles / DAGs: RF, GBM/MART/AnyBoost, XGBoost, RGF, DJ
Dimensionality reduction: PLS, PCA, t-SNE, Embeddings
Randomization: RP/ELM/RC, ExtraTrees, VR-Trees; Stacked Generalization (aka Stacking/Blending)
Transformations: ACE (Alternating Conditional Expectations)
Also: MARS; ... (aka Kantorovich)
[Slide 25/36] Dimensionality reduction
Supervised PCA (Bair+ 2006), Sparse PCA (Zou+ 2006), Sparse PLS (Lê Cao+ 2008; Chun & Keleş 2010), ICA (Comon 1994), ...
Linear projections: PLS, PCA, ...
PCA (unsupervised) finds directions of maximal variance in X; PLS (supervised) finds directions of maximal covariance with the response y
Manifold Learning: ISOMAP (Tenenbaum+ 2000), LLE (Roweis & Saul 2000), t-SNE (van der Maaten & Hinton 2008), ...
Neural Networks: Embeddings (*2vec), AutoEncoders, ...
[Slide 26/36] Transforming the variables
ACE (Alternating Conditional Expectations), Breiman & Friedman 1985
ACE estimates nonlinear transformations of the response and the predictors that maximize the linear correlation between them
[Slide 27/36] Tree ensembles
Random Forest (Breiman 2001)
XGBoost (Chen & Guestrin, KDD'16): gradient boosting + L2 regularization; GBM/MART/GBDT/AnyBoost (Friedman 1999; Mason+ NIPS'99)
Regularized Greedy Forests, RGF (Johnson & Zhang 2014)
Importance Sampled Learning Ensembles, ISLE (Friedman & Popescu 2003): build a large library of trees (on subsamples of about half the data), then post-process the ensemble with LASSO
Decision Jungles (Shotton+ NIPS'13): DAGs instead of trees
Random Forest: bootstrap resampling + a random feature subset at each split
Boosting: greedy stagewise fitting; RGF adds fully-corrective updates
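The two ensemble families above can be compared side by side: bagging with random feature subsets (Random Forest, variance reduction) versus greedy stagewise boosting (GBM, bias reduction). A sketch on the Friedman #1 benchmark (the dataset and hyperparameters are illustrative choices):

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Friedman #1 benchmark: 10 features, only the first 5 informative
X, y = make_friedman1(n_samples=500, noise=0.5, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Bootstrap + random feature subsets at each split (variance reduction) ...
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)
# ... vs greedy stagewise additive fitting (bias reduction)
gbm = GradientBoostingRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)

print(rf.score(Xte, yte), gbm.score(Xte, yte))
```

Both capture the nonlinear interactions of this benchmark with no hand-engineered features, which is the reason these ensembles appear throughout the later competition slides.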
[Slide 28/36] Feature Importance and Partial Dependence
Which features does the fitted model actually use?
In CART, the split choices themselves yield an importance measure for each feature (cf. Best Subset)
Feature Importance and Partial Dependence Plots (PDP) make tree ensembles interpretable
Caveat: after transformations such as PCA or blending, importances no longer refer to the original features!
Reference: ESLII (2009)
[Slide 29/36] Randomization works
Random features are often good enough (and cheap)
Randomized Trees
Extreme Learning Machine, ELM (Huang 2006): a random hidden layer; only the output (FC) layer is trained
Reservoir Computing, RC (e.g. Schrauwen+ 2007): a fixed random recurrent reservoir; only the readout is trained
Extremely Randomized Trees (Geurts+ 2006); VR-Trees (Liu 2008)
Random Projections
Split points are drawn at random instead of optimized as in CART
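Two of the randomized recipes above in a few lines each: Extremely Randomized Trees (random split thresholds) and an ELM-style model (a fixed random nonlinear expansion with only a linear readout trained). Dataset and sizes are illustrative; the tanh expansion is one common ELM choice, not prescribed by the slides:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=500, noise=0.5, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Extremely Randomized Trees: split thresholds drawn at random
et = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)

# ELM-style model: 200 random tanh features, only the readout is fit
rng = np.random.default_rng(0)
W = rng.standard_normal((X.shape[1], 200))
b = rng.standard_normal(200)
hidden = lambda Z: np.tanh(Z @ W + b)
readout = Ridge(alpha=1.0).fit(hidden(Xtr), ytr)

print(et.score(Xte, yte), readout.score(hidden(Xte), yte))
```

Neither model optimizes its internal randomness, yet both fit this nonlinear benchmark, which is the slide's point: randomization is a cheap substitute for expensive search.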
[Slide 30/36] Stacking
The Netflix Prize popularized Stacking/Blending
Key point: the combiner is trained on CV (out-of-fold) predictions
[Figure: 10-fold split; each base model predicts its held-out fold (Out-of-Sample Estimates)]
Stacked Generalization (Wolpert 1992; Breiman 1996)
[Slide 31/36] Stacking (continued)
A linear combiner is common (Linear Stacking)
Base-model predictions are stacked on as new features
= automatic feature construction
... (aka Kantorovich)
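The Wolpert/Breiman scheme on the last two slides is packaged in scikit-learn: the level-2 linear model is fit on cross-validated (out-of-fold) predictions of the level-1 learners, exactly to avoid leaking their training fit into the combiner. A sketch; the base learners and dataset are illustrative choices:

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_friedman1(n_samples=500, noise=0.5, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Level-1 learners + a linear level-2 combiner trained on
# out-of-fold predictions (cv=5 controls the fold split)
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
                ("knn", KNeighborsRegressor(n_neighbors=10))],
    final_estimator=RidgeCV(),
    cv=5,
).fit(Xtr, ytr)

print(stack.score(Xte, yte))
```

From the feature-engineering viewpoint of this slide, the base models' predictions are simply learned features handed to the linear combiner.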
[Slide 32/36] NIPS 2003 Feature Selection Challenge
NIPS 2003 Challenge (Guyon+ NIPS'04)
Winner: PCA (or raw inputs) + Bayesian Neural Networks (MCMC) with ARD priors or Dirichlet Diffusion Tree priors (Neal & Zhang 2006); two hidden layers
Guyon's five-year retrospective (Guyon+ 2007)
Revisited in ESLII (Hastie+ 2009) with NN and RF comparisons
16 entries
(cf. entries based on kernel methods)
[Slide 33/36] KDD Cup 2015
https://speakerdeck.com/smly/techniques-tricks-for-data-mining-competitions
Churn prediction: dropout from a MOOC (XuetangX); first prize $20,000
"Techniques (Tricks) for Data Mining Competitions" (@smly)
821 engineered features (+ possibly leaky ones?)
Winning solution: 3-level stacking topped by a Linear Stacker; stage-1 and stage-2 learners include GBM, NN + LR, KRR, and ET (Extra Trees)
[Slide 34/36] Validation practices from QSAR
AD (Applicability Domain): standard practice in QSAR
AD = the region of input (chemical) space where the model's predictions can be trusted
C. Rücker+, J. Chem. Inf. Model., 2007, 47(6), pp 2345-2357
Y-Scrambling Test / Y-Randomization
Refit after randomly permuting y: a genuine model should lose all predictive power (if it does not, suspect chance correlation)
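The Y-scrambling test above takes only a few lines: compare cross-validated scores on the real responses against scores obtained after permuting y. A sketch on synthetic regression data (dataset, model, and the number of permutations are illustrative):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_friedman1(n_samples=300, noise=0.5, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0)

true_score = cross_val_score(rf, X, y, cv=5).mean()

# Y-scrambling: refit on randomly permuted responses; a genuine model
# should collapse to chance level (R^2 near zero or below)
rng = np.random.default_rng(0)
null_scores = [cross_val_score(rf, X, rng.permutation(y), cv=5).mean()
               for _ in range(5)]

print(true_score, max(null_scores))
```

A real structure-activity relationship survives cross-validation while every scrambled refit collapses; if the scrambled scores stay high, the apparent fit is chance correlation, exactly the failure mode of slide 18.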
[Slide 35/36] Summary: a practical recipe
If you underfit, try strong tree ensembles (RF, XGBoost, RGF, ET, DJ, ...) and consider blending
Add interaction-aware models (FM) and diverse learners (e.g. random-projection NNs), combined by Stacking
Screen features (SIS, t-statistics) and visualize (t-SNE, PLS, PCA, etc.)
Validate with Cross Validation, AD, and Y-Scrambling
Sparse selection (Best Subset and its surrogates)
Ensembling (Boosting, Bagging, Stacking) and randomization (Stability Selection, Bagging/Feature Bagging, ELM, ExtraTrees, etc.)
[Slide 36/36]
Ichigaku Takigawa (JST)