Apr. 26, 2010 CMU Density Ratio Estimation: A New Versatile Tool for Machine Learning Department of Computer Science Tokyo Institute of Technology Masashi Sugiyama [email protected] http://sugiyama-www.cs.titech.ac.j


Apr. 26, 2010, CMU

Density Ratio Estimation: A New Versatile Tool for Machine Learning

Department of Computer Science, Tokyo Institute of Technology

Masashi Sugiyama [email protected]

http://sugiyama-www.cs.titech.ac.jp/~sugi


2 Overview of My Talk (1)

Consider the ratio of two probability densities.

If the ratio is estimated, various machine learning problems can be solved!

Non-stationarity adaptation, domain adaptation, multi-task learning, outlier detection, change detection in time series, feature selection, dimensionality reduction, independent component analysis, conditional density estimation, classification, two-sample test


3 Overview of My Talk (2)

Estimating density ratio is substantially easier than estimating densities!

Various direct density ratio estimation methods have been developed recently.

Vapnik said: When solving a problem of interest, one should not solve a more general problem as an intermediate step.

Knowing densities → Knowing ratio


4 Organization of My Talk

1. Usage of Density Ratios:
  • Covariate shift adaptation
  • Outlier detection
  • Mutual information estimation
  • Conditional probability estimation

2. Methods of Density Ratio Estimation


5 Covariate Shift Adaptation

Training/test input distributions are different, but target function remains unchanged: “extrapolation”

[Figure labels: Training samples, Test samples, Function, Target function, Input density]

Shimodaira (JSPI2000)


6 Adaptation Using Density Ratios

Ordinary least-squares is not consistent.

Density-ratio weighted least-squares is consistent.

Applicable to any likelihood-based methods!
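The contrast between ordinary and density-ratio weighted least-squares can be illustrated with a small numerical sketch. It assumes a one-dimensional toy problem in which both input densities are known Gaussians, so the ratio of test to training input density is available in closed form. The target function, the straight-line model, and all function names are illustrative choices, not taken from the talk.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here only to form the known ratio."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def weighted_least_squares(X, y, w):
    """Minimise sum_i w_i * (y_i - X_i @ theta)^2 via the normal equations."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

def design(x):
    """Straight-line model (deliberately misspecified for the target below)."""
    return np.column_stack([np.ones_like(x), x])

rng = np.random.default_rng(0)

# Hypothetical setting: training inputs ~ N(1, 0.5^2), test inputs ~ N(2, 0.25^2).
x_tr = rng.normal(1.0, 0.5, size=2000)
y_tr = np.sinc(x_tr) + rng.normal(0.0, 0.1, size=x_tr.size)
x_te = rng.normal(2.0, 0.25, size=2000)
y_te = np.sinc(x_te)

# Density ratio (test over training), known exactly in this toy setting.
ratio = gaussian_pdf(x_tr, 2.0, 0.25) / gaussian_pdf(x_tr, 1.0, 0.5)

theta_ols = weighted_least_squares(design(x_tr), y_tr, np.ones_like(x_tr))
theta_iw = weighted_least_squares(design(x_tr), y_tr, ratio)

mse_ols = np.mean((design(x_te) @ theta_ols - y_te) ** 2)
mse_iw = np.mean((design(x_te) @ theta_iw - y_te) ** 2)
print(f"test MSE, ordinary least-squares:       {mse_ols:.4f}")
print(f"test MSE, ratio-weighted least-squares: {mse_iw:.4f}")
```

In practice the ratio is not known and would be replaced by an estimate from one of the methods discussed later in the talk.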


7 Model Selection

Controlling bias-variance trade-off is important in covariate shift adaptation.
  • No weighting: low-variance, high-bias
  • Density ratio weighting: low-bias, high-variance

Density ratio weighting plays an important role for unbiased model selection:
  • Akaike information criterion (regular models): Shimodaira (JSPI2000)
  • Subspace information criterion (linear models): Sugiyama & Müller (Stat&Dec.2005)
  • Cross-validation (arbitrary models): Sugiyama, Krauledat & Müller (JMLR2007)
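As a rough sketch of the cross-validation variant, the held-out losses can be multiplied by the density ratio so that the score targets the test distribution. The example below assumes the same kind of toy setting with a known Gaussian ratio; the polynomial candidates, the fold count, and the helper name `iwcv_score` are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def iwcv_score(x, y, ratio, degree, weighted_fit, n_folds=5):
    """Importance-weighted K-fold CV estimate of the test squared error."""
    idx = np.arange(x.size)
    losses = []
    for held_out in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, held_out)
        w = ratio[train] if weighted_fit else np.ones(train.size)
        # np.polyfit weights multiply the unsquared residuals, hence the sqrt.
        coef = np.polyfit(x[train], y[train], degree, w=np.sqrt(w))
        residual = np.polyval(coef, x[held_out]) - y[held_out]
        # Reweight held-out losses so the score targets the test density.
        losses.append(np.mean(ratio[held_out] * residual ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(1)
x = rng.normal(1.0, 0.5, size=2000)                      # training inputs
y = np.sinc(x) + rng.normal(0.0, 0.1, size=x.size)       # hypothetical target plus noise
ratio = gaussian_pdf(x, 2.0, 0.25) / gaussian_pdf(x, 1.0, 0.5)   # known in this toy case

# Candidate models: polynomial degree, with or without ratio-weighted fitting.
scores = {
    (degree, weighted): iwcv_score(x, y, ratio, degree, weighted)
    for degree in (1, 2, 3)
    for weighted in (False, True)
}
best = min(scores, key=scores.get)
print("selected (degree, weighted fit):", best)
```

The same reweighting of held-out losses applies to any model family, which is why cross-validation is listed for arbitrary models.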


8 Active Learning

Covariate shift naturally occurs in active learning scenarios:
  • The training input distribution is designed by users.
  • The test input distribution is determined by the environment.

Density ratio weighting plays an important role for low-bias active learning.
  • Population-based methods: Wiens (JSPI2000), Kanamori & Shimodaira (JSPI2003), Sugiyama (JMLR2006)
  • Pool-based methods: Kanamori (NeuroComp2007), Sugiyama & Nakajima (MLJ2009)


9 Real-world Applications

  • Age prediction from faces (with NECsoft): illumination and angle change (KLS&CV). Ueki, Sugiyama & Ihara (ICPR2010)
  • Speaker identification (with Yamaha): voice quality change (KLR&CV). Yamada, Sugiyama & Matsui (SP2010)
  • Brain-computer interface: mental condition change (LDA&CV). Sugiyama, Krauledat & Müller (JMLR2007); Li, Kambara, Koike & Sugiyama (IEEE-TBE2010)


10 Real-world Applications (cont.)

  • Japanese text segmentation (with IBM): domain adaptation from general conversation to medicine (CRF&CV). Tsuboi, Kashima, Hido, Bickel & Sugiyama (JIP2008)
    Segmentation example: こんな・失敗・は・ご・愛敬・だ・よ・. ("A slip like this is just part of the charm.")
  • Semiconductor wafer alignment (with Nikon): optical marker allocation (AL). Sugiyama & Nakajima (MLJ2009)
  • Robot control: dynamic motion update (KLS&CV), optimal exploration (AL). Akiyama, Hachiya & Sugiyama (NN2010); Hachiya, Akiyama, Sugiyama & Peters (NN2009); Hachiya, Peters & Sugiyama (ECML2009)


12 Inlier-based Outlier Detection

Test samples having different tendency from regular ones are regarded as outliers.

[Figure label: Outlier]

Hido, Tsuboi, Kashima, Sugiyama & Kanamori (ICDM2008); Smola, Song & Teo (AISTATS2009)


13 Real-world Applications

  • Steel plant diagnosis (with JFE Steel)
  • Printer roller quality control (with Canon)
  • Loan customer inspection (with IBM)
  • Sleep therapy from biometric data

Takimoto, Matsugu & Sugiyama (DMSS2009)

Hido, Tsuboi, Kashima, Sugiyama & Kanamori (KAIS2010)

Kawahara & Sugiyama (SDM2009)


15 Mutual Information Estimation

Mutual information (MI):

MI as an independence measure: MI is zero if and only if x and y are statistically independent.

MI can be approximated using density ratio:

Suzuki, Sugiyama & Tanaka (ISIT2009)
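For reference, the standard definitions behind this slide can be written as follows. The notation (p(x, y) for the joint density, r for the density ratio, n paired samples) is an assumption, since the slide's own formulas are not preserved here, and the plug-in estimator in the last line is one common form rather than necessarily the one used in the talk.

```latex
\begin{align}
\mathrm{MI}(X, Y) &= \iint p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y, \\
\mathrm{MI}(X, Y) &= 0 \iff p(x, y) = p(x)\,p(y), \\
r(x, y) &= \frac{p(x, y)}{p(x)\,p(y)}, \qquad
\widehat{\mathrm{MI}} = \frac{1}{n}\sum_{i=1}^{n} \log \hat{r}(x_i, y_i).
\end{align}
```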


16 Usage of MI Estimator

  • Independence between input and output: feature selection, sufficient dimension reduction
  • Independence between inputs: independent component analysis
  • Independence between input and residual: causal inference

[Figure: diagram of inputs x and x’, output y, and residual ε, labeled with feature selection, sufficient dimension reduction, independent component analysis, and causal inference]

Suzuki & Sugiyama (AISTATS2010)

Yamada & Sugiyama (AAAI2010)

Suzuki, Sugiyama, Sese & Kanamori (BMC Bioinformatics 2009)

Suzuki & Sugiyama (ICA2009)


18 Conditional Probability Estimation

  • When the output is continuous: conditional density estimation (e.g., mobile robots’ transition estimation). Sugiyama, Takeuchi, Suzuki, Kanamori, Hachiya & Okanohara (AISTATS2010)
  • When the output is categorical: probabilistic classification. Sugiyama (Submitted)

[Figure: a pattern assigned to Class 1 with 80% and to Class 2 with 20%]


19 Organization of My Talk

1. Usage of Density Ratios

2. Methods of Density Ratio Estimation:

A) Probabilistic Classification Approach

B) Moment Matching Approach

C) Ratio Matching Approach

D) Comparison

E) Dimensionality Reduction


20 Density Ratio Estimation

Density ratios are shown to be versatile.

In practice, however, the density ratio should be estimated from data.

Naïve approach: Estimate two densities separately and take their ratio


21 Vapnik’s Principle

Estimating density ratio is substantially easier than estimating densities!

We estimate density ratio without going through density estimation.

When solving a problem, don’t solve more difficult problems as an intermediate step.

Knowing densities → Knowing ratio


23 Probabilistic Classification

Separate numerator and denominator samples by a probabilistic classifier:

Logistic regression achieves the minimum asymptotic variance when model is correct.

Qin (Biometrika1998)
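A minimal sketch of the classification route, assuming a logistic model that is linear in the input and fitted with Newton's method. By Bayes' rule the ratio equals the class-posterior odds times the ratio of sample sizes. The two Gaussians, the feature map `phi`, and the function names are illustrative assumptions, chosen so that the true log-ratio is linear and the model is correct.

```python
import numpy as np

def fit_logistic(features, labels, n_iter=50):
    """Maximum-likelihood logistic regression fitted by Newton's method."""
    theta = np.zeros(features.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-features @ theta))
        gradient = features.T @ (prob - labels)
        hessian = (features * (prob * (1.0 - prob))[:, None]).T @ features
        theta -= np.linalg.solve(hessian + 1e-8 * np.eye(theta.size), gradient)
    return theta

def phi(x):
    """Feature map for a logit that is linear in x (intercept plus slope)."""
    return np.column_stack([np.ones_like(x), x])

rng = np.random.default_rng(0)
x_de = rng.normal(0.0, 1.0, size=3000)   # denominator samples, label 0
x_nu = rng.normal(1.0, 1.0, size=3000)   # numerator samples, label 1

features = np.vstack([phi(x_de), phi(x_nu)])
labels = np.concatenate([np.zeros(x_de.size), np.ones(x_nu.size)])
theta = fit_logistic(features, labels)

def ratio_estimate(x):
    """Bayes' rule: p_nu(x) / p_de(x) = (n_de / n_nu) * P(nu | x) / P(de | x)."""
    return (x_de.size / x_nu.size) * np.exp(phi(x) @ theta)

grid = np.array([-1.0, 0.0, 1.0, 2.0])
true_ratio = np.exp(grid - 0.5)          # closed form for these two Gaussians
print("estimated:", np.round(ratio_estimate(grid), 3))
print("true:     ", np.round(true_ratio, 3))
```

For these two unit-variance Gaussians the true log-ratio is x - 1/2, so the fitted intercept and slope should land near -0.5 and 1.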


25 Moment Matching

Match moments of the numerator density and the ratio-weighted denominator density.

Ex) Matching the mean:
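Written out, mean matching asks the ratio-weighted denominator mean to reproduce the numerator mean. The symbols below (numerator density p_nu, denominator density p_de, ratio model r, sample sizes n_nu and n_de) are assumed notation, since the slide's formula is not preserved here.

```latex
\begin{align}
&\min_{r}\; \left\| \int x\, r(x)\, p_{\mathrm{de}}(x)\,\mathrm{d}x
  - \int x\, p_{\mathrm{nu}}(x)\,\mathrm{d}x \right\|^{2}, \\
&\text{empirically:}\quad
\min_{r}\; \left\| \frac{1}{n_{\mathrm{de}}}\sum_{j=1}^{n_{\mathrm{de}}} x^{\mathrm{de}}_{j}\, r\bigl(x^{\mathrm{de}}_{j}\bigr)
  - \frac{1}{n_{\mathrm{nu}}}\sum_{i=1}^{n_{\mathrm{nu}}} x^{\mathrm{nu}}_{i} \right\|^{2}.
\end{align}
```

The true ratio attains zero because r(x) p_de(x) = p_nu(x), but many other functions also match a single moment.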


26 Kernel Mean Matching

Matching a finite number of moments does not yield the true ratio even asymptotically.

Kernel mean matching: All the moments are efficiently matched using Gaussian RKHS:

Ratio values at samples can be estimated via quadratic program.

Variants for learning entire ratio function and other loss functions are also available.

Huang, Smola, Gretton, Borgwardt & Schölkopf (NIPS2006)

Kanamori, Suzuki & Sugiyama (arXiv2009)


27 Organization of My Talk

1. Usage of Density Ratios

2. Methods of Density Ratio Estimation:

A) Probabilistic Classification Approach

B) Moment Matching Approach

C) Ratio Matching Approach

i. Kullback-Leibler Divergence

ii. Squared Distance

D) Comparison

E) Dimensionality Reduction


28 Ratio Matching

Match a ratio model with true ratio under the Bregman divergence:

Its empirical approximation is given by

(Constant is ignored)

Sugiyama, Suzuki & Kanamori (RIMS2010)
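One way to write the Bregman criterion, under assumed notation: f is a differentiable, strictly convex function, r* is the true ratio p_nu / p_de, r is the ratio model, and the divergence is integrated against the denominator density. The constant that is dropped is the term that depends only on r*.

```latex
\begin{align}
\mathrm{BR}_f(r^{*} \,\|\, r)
 &= \int p_{\mathrm{de}}(x)\,\Bigl[ f\bigl(r^{*}(x)\bigr) - f\bigl(r(x)\bigr)
    - f'\bigl(r(x)\bigr)\,\bigl(r^{*}(x) - r(x)\bigr) \Bigr]\,\mathrm{d}x \\
 &= C + \int p_{\mathrm{de}}(x)\,\Bigl[ f'\bigl(r(x)\bigr)\,r(x) - f\bigl(r(x)\bigr) \Bigr]\,\mathrm{d}x
    - \int p_{\mathrm{nu}}(x)\, f'\bigl(r(x)\bigr)\,\mathrm{d}x, \\
\widehat{\mathrm{BR}}_f(r)
 &= \frac{1}{n_{\mathrm{de}}}\sum_{j=1}^{n_{\mathrm{de}}}
    \Bigl[ f'\bigl(r(x^{\mathrm{de}}_{j})\bigr)\,r(x^{\mathrm{de}}_{j}) - f\bigl(r(x^{\mathrm{de}}_{j})\bigr) \Bigr]
    - \frac{1}{n_{\mathrm{nu}}}\sum_{i=1}^{n_{\mathrm{nu}}} f'\bigl(r(x^{\mathrm{nu}}_{i})\bigr).
\end{align}
```

The second line uses r*(x) p_de(x) = p_nu(x), which is what removes the unknown true ratio from the part of the criterion that depends on the model.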


30 Kullback-Leibler Divergence (KL)

Put . Then BR yields

For a linear model (ex. Gauss kernel), minimization of KL is convex.

Sugiyama, Nakajima, Kashima, von Bünau & Kawanabe (NIPS2007); Nguyen, Wainwright & Jordan (NIPS2007)
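As a sketch of how the KL case can be derived, assume the choice f(t) = t log t - t (an assumption here, since the slide's own choice is not preserved), so that f'(t) = log t and f'(t) t - f(t) = t. With a linear-in-parameters model and non-negative coefficients, also assumed notation, the empirical Bregman criterion becomes:

```latex
\begin{align}
r_{\alpha}(x) &= \sum_{\ell=1}^{b} \alpha_{\ell}\,\varphi_{\ell}(x), \qquad \alpha_{\ell} \ge 0, \\
\widehat{\mathrm{KL}}(\alpha)
 &= \frac{1}{n_{\mathrm{de}}}\sum_{j=1}^{n_{\mathrm{de}}} r_{\alpha}\bigl(x^{\mathrm{de}}_{j}\bigr)
  - \frac{1}{n_{\mathrm{nu}}}\sum_{i=1}^{n_{\mathrm{nu}}} \log r_{\alpha}\bigl(x^{\mathrm{nu}}_{i}\bigr).
\end{align}
```

The first term is linear in α and the second is the negative logarithm of a linear function of α, so the objective is convex in α.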


31 Properties

Parametric case: learned parameter converges to the optimal value with order n^{-1/2}.

Non-parametric case: learned function converges to the optimal function with order slightly slower than n^{-1/2} (depending on the covering number or the bracketing entropy of the function class).

Nguyen, Wainwright & Jordan (NIPS2007); Sugiyama, Suzuki, Nakajima, Kashima, von Bünau & Kawanabe (AISM2008)


32 Experiments

Setup: d-dim. Gaussian with covariance identity and
  • Denominator: mean (0,0,0,…,0)
  • Numerator: mean (1,0,0,…,0)

Ratio of kernel density estimators (RKDE):
  • Estimate two densities separately and take ratio.
  • Gaussian width is chosen by CV.

KL method:
  • Estimate density ratio directly.
  • Gaussian width is chosen by CV.
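For this setup the true ratio has a simple closed form, which is what the estimators are compared against. Writing e_1 = (1, 0, …, 0) and x_1 for the first coordinate of x (assumed notation):

```latex
\begin{align}
r(x) = \frac{\mathcal{N}(x;\, e_1,\, I_d)}{\mathcal{N}(x;\, 0,\, I_d)}
     = \exp\!\Bigl( -\tfrac{1}{2}\,\|x - e_1\|^{2} + \tfrac{1}{2}\,\|x\|^{2} \Bigr)
     = \exp\!\Bigl( x_1 - \tfrac{1}{2} \Bigr).
\end{align}
```

The ratio depends only on the first coordinate regardless of d, so the added dimensions make the estimation problem harder without changing the target.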


33 Accuracy as a Function of Input Dimensionality

[Figure: normalized MSE versus input dimensionality (Dim.) for RKDE and KL]

  • RKDE: “curse of dimensionality”
  • KL: works well


34 Variations

EM algorithms for KL with
  • Log-linear models: Tsuboi, Kashima, Hido, Bickel & Sugiyama (JIP2009)
  • Gaussian mixture models: Yamada & Sugiyama (IEICE2009)
  • Probabilistic PCA mixture models: Yamada, Sugiyama, Wichern & Jaak (Submitted)
are also available.


36 Squared Distance (SQ)

Put . Then BR yields

For a linear model, the minimizer of SQ is analytic:

Computationally efficient!

Kanamori, Hido & Sugiyama (JMLR2009)
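A sketch of the analytic solution, assuming a Gaussian kernel model and a ridge regulariser. Under the squared-distance criterion the coefficients solve a single linear system built from the second moment of the basis functions under the denominator samples and their mean under the numerator samples. The kernel centres, fixed width, ridge value, and function names are illustrative assumptions; in practice the width and regulariser would be chosen by cross-validation.

```python
import numpy as np

def gaussian_design(x, centers, width):
    """Gaussian kernel basis functions evaluated at the rows of x."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def fit_sq_ratio(x_nu, x_de, centers, width, ridge):
    """Closed-form minimiser of 0.5*a'Ha - h'a + 0.5*ridge*a'a."""
    phi_de = gaussian_design(x_de, centers, width)
    phi_nu = gaussian_design(x_nu, centers, width)
    H = phi_de.T @ phi_de / x_de.shape[0]     # second moment under denominator samples
    h = phi_nu.mean(axis=0)                   # mean under numerator samples
    return np.linalg.solve(H + ridge * np.eye(H.shape[0]), h)

rng = np.random.default_rng(0)
x_de = rng.normal(0.0, 1.0, size=(2000, 1))   # denominator samples
x_nu = rng.normal(1.0, 1.0, size=(2000, 1))   # numerator samples
centers = x_nu[:100]                          # kernel centres on numerator samples
width, ridge = 1.0, 0.01                      # fixed here for brevity

alpha = fit_sq_ratio(x_nu, x_de, centers, width, ridge)

grid = np.array([[-1.0], [0.0], [1.0], [2.0]])
estimate = gaussian_design(grid, centers, width) @ alpha
true_ratio = np.exp(grid[:, 0] - 0.5)         # closed form for these two Gaussians
print("estimated:", np.round(estimate, 2))
print("true:     ", np.round(true_ratio, 2))
```

No iterative optimisation is involved, which is the source of the computational efficiency noted on the slide.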


37 Properties

  • Optimal parametric/non-parametric rates of convergence are achieved.
  • Leave-one-out cross-validation score can be computed analytically.
  • SQ has the smallest condition number.
  • Analytic solution allows computation of the derivative of mutual information estimator: sufficient dimension reduction, independent component analysis, causal inference.

Kanamori, Suzuki & Sugiyama (arXiv2009); Kanamori, Hido & Sugiyama (JMLR2009)


39 Density Ratio Estimation Methods

Method  Density estimation  Domains of de/nu  Model selection  Computation time
RKDE    Involved            Could differ      Possible         Very fast
LR      Free                Same?             Possible         Slow
KMM     Free                Same?             ???              Slow
KL      Free                Could differ      Possible         Slow
SQ      Free                Could differ      Possible         Fast

SQ would be preferable in practice.


41 Direct Density Ratio Estimation with Dimensionality Reduction (D3)

Direct density ratio estimation without going through density estimation is promising.

However, for high-dimensional data, density ratio estimation is still challenging.

We combine direct density ratio estimation with dimensionality reduction!


42 Hetero-distributional Subspace (HS)

Key assumption: the numerator and denominator densities are different only in a subspace (called HS).

This allows us to estimate the density ratio only within the low-dimensional HS!

[Figure: HS. The transformation matrix is full-rank and orthogonal.]


43 HS Search Based on Supervised Dimension Reduction

In HS, the numerator and denominator densities are different, so the numerator and denominator samples would be separable.

We may use any supervised dimension reduction method for HS search.

Local Fisher discriminant analysis (LFDA):
  • Can handle within-class multimodality
  • No limitation on reduced dimensions
  • Analytic solution available

Sugiyama (JMLR2007)

Sugiyama, Kawanabe & Chui (NN2010)


44 Pros and Cons of Supervised Dimension Reduction Approach

Pros:
  • SQ+LFDA gives analytic solution for D3.

Cons:
  • Rather restrictive implicit assumption that the components inside and outside the HS are independent.
  • Separability does not always lead to HS (e.g., two overlapped, but different densities).

[Figure labels: Separable, HS]


45 Characterization of HS

HS is given as the maximizer of the Pearson divergence (PE):

Thus, we can identify HS by finding the maximizer of PE with respect to the subspace.

PE can be analytically approximated by SQ!

Sugiyama, Hara, von Bünau, Suzuki, Kanamori & Kawanabe (SDM2010)
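The Pearson divergence and its link to the squared-distance estimator can be sketched as follows. The notation is assumed: u denotes the data projected onto a candidate subspace, r(u) = p_nu(u) / p_de(u), and r-hat is a ratio estimate obtained by SQ.

```latex
\begin{align}
\mathrm{PE}
 &= \frac{1}{2}\int \bigl( r(u) - 1 \bigr)^{2}\, p_{\mathrm{de}}(u)\,\mathrm{d}u
  = \frac{1}{2}\int r(u)\, p_{\mathrm{nu}}(u)\,\mathrm{d}u - \frac{1}{2}, \\
\widehat{\mathrm{PE}}
 &= \frac{1}{2\,n_{\mathrm{nu}}}\sum_{i=1}^{n_{\mathrm{nu}}} \hat{r}\bigl(u^{\mathrm{nu}}_{i}\bigr) - \frac{1}{2}.
\end{align}
```

The second equality uses r(u) p_de(u) = p_nu(u) and the fact that both densities integrate to one. Because the SQ solution is analytic, the estimate is an explicit function of the projection, which is what makes a search over subspaces practical.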


46 HS Search

  • Gradient descent:
  • Natural gradient descent: utilize the Stiefel-manifold structure for efficiency
  • Givens rotation: choose a pair of axes and rotate in the subspace
  • Subspace rotation: ignore rotation within subspaces for efficiency

[Figure labels: Hetero-distributional subspace, Rotation across the subspace, Rotation within the subspace]

Amari (NeuralComp1998)

Plumbley (NeuroComp2005)


47 Experiments

[Figure panels: Samples (2d), True ratio (2d), Plain SQ (2d), D3-SQ (2d)]

[Figure: Plain SQ versus D3-SQ with increasing dimensionality (by adding noisy dims)]


48 Conclusions

Many ML tasks can be solved via density ratio estimation: non-stationarity adaptation, domain adaptation, multi-task learning, outlier detection, change detection in time series, feature selection, dimensionality reduction, independent component analysis, conditional density estimation, classification, two-sample test.

Directly estimating density ratios without going through density estimation is the key: LR, KMM, KL, SQ, and D3.


49 Books

Quiñonero-Candela, Sugiyama, Schwaighofer & Lawrence (Eds.), Dataset Shift in Machine Learning, MIT Press, 2009.

Sugiyama, von Bünau, Kawanabe & Müller, Covariate Shift Adaptation in Machine Learning, MIT Press (coming soon!)

Sugiyama, Suzuki & Kanamori, Density Ratio Estimation in Machine Learning, Cambridge University Press (coming soon!)


50 The World of Density Ratios

  • Theoretical analysis: convergence, information criteria, numerical stability
  • Density ratio estimation: fundamental algorithms (LR, KMM, KL, SQ); large-scale, high-dimensionality, stabilization, robustification
  • Machine learning algorithms: covariate shift adaptation, outlier detection, conditional probability estimation, mutual information estimation
  • Real-world applications: brain-computer interface, robot control, image understanding, speech recognition, natural language processing, bioinformatics