
Support Cluster Machine Paper from ICML2007 Read by Haiqin Yang 2007-10-18



The paper "Support Cluster Machine" was written by Bin Li, Mingmin Chi, Jianping Fan, and Xiangyang Xue and was published in 2007.


Outline

Background and Motivation

Support Cluster Machine - SCM

Kernel in SCM

Experiments

An Interesting Application: Privacy-preserving Data Mining

Discussions


Background and Motivation

Large-scale classification problem

  • Decomposition methods: Osuna et al., 1997; Joachims, 1999; Platt, 1999; Collobert & Bengio, 2001; Keerthi et al., 2001

  • Incremental algorithms: Cauwenberghs & Poggio, 2000; Fung & Mangasarian, 2002; Laskov et al., 2006

  • Parallel techniques: Collobert et al., 2001; Graf et al., 2004

  • Approximate formula: Fung & Mangasarian, 2001; Lee & Mangasarian, 2001

  • Choose representatives
      Active learning: Schohn & Cohn, 2003
      Cluster-Based SVM: Yu et al., 2003
      Core Vector Machine (CVM): Tsang et al., 2005
      Clustering SVM: Boley & Cao, 2004


Support Cluster Machine - SCM

Given training samples:

Procedure


SCM Solution

Dual representation

Decision function


Kernel

Probability product kernel

By Gaussian assumption, i.e.,

Hence
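As a hedged illustration of the kernel named on this slide: the probability product kernel between two densities p and p' is generally defined as K(p, p') = ∫ p(x)^ρ p'(x)^ρ dx. For ρ = 1 and Gaussian densities the integral has a closed form, namely the density of N(μ1; μ2, Σ1 + Σ2). The sketch below assumes ρ = 1 and diagonal covariances; the function name `gaussian_product_kernel` is illustrative, and the exact form used in SCM may differ, for example by including cluster priors.

```python
import math

def gaussian_product_kernel(mu1, var1, mu2, var2):
    """Probability product kernel (rho = 1) between two diagonal Gaussians.

    Each Gaussian is given by a list of means and a list of per-dimension
    variances. The integral of the product of the two densities equals the
    density of N(mu1; mu2, var1 + var2), which factorises over dimensions.
    """
    log_k = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = v1 + v2  # variances add under the integral
        log_k += -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (m1 - m2) ** 2 / v
    return math.exp(log_k)

# Two 2-D clusters: the kernel is larger when the means are closer.
near = gaussian_product_kernel([0.0, 0.0], [1.0, 1.0], [0.5, 0.0], [1.0, 1.0])
far = gaussian_product_kernel([0.0, 0.0], [1.0, 1.0], [4.0, 0.0], [1.0, 1.0])
print(near > far)
```

Working in log space keeps the product over dimensions from underflowing when the dimensionality is high.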


Kernel

Property I

That is

Decision function

Property II
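The discussion slide notes that training units are generative models while testing units are vectors. One way to read the decision function is to treat a test vector as a Gaussian with zero covariance, so that the kernel against a training cluster reduces to that cluster's Gaussian density at the test point. The sketch below rests on that reading and further assumes diagonal covariances, a prior-weighted kernel, and already-trained dual coefficients; `Cluster`, `cluster_vector_kernel`, `scm_decision`, and all numbers are illustrative and not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Cluster:
    prior: float       # mixing weight of the cluster (assumed part of the kernel)
    mean: List[float]
    var: List[float]   # diagonal covariance
    label: int         # +1 or -1
    alpha: float       # dual coefficient from training (assumed given)

def cluster_vector_kernel(c: Cluster, x: List[float]) -> float:
    """Prior-weighted Gaussian density of cluster c evaluated at point x."""
    log_p = 0.0
    for m, v, xi in zip(c.mean, c.var, x):
        log_p += -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (xi - m) ** 2 / v
    return c.prior * math.exp(log_p)

def scm_decision(clusters: List[Cluster], bias: float, x: List[float]) -> int:
    """Sign of sum_k alpha_k * y_k * kernel(cluster_k, x) + bias."""
    score = bias
    for c in clusters:
        score += c.alpha * c.label * cluster_vector_kernel(c, x)
    return 1 if score >= 0 else -1

clusters = [
    Cluster(prior=0.5, mean=[2.0, 2.0], var=[1.0, 1.0], label=+1, alpha=1.0),
    Cluster(prior=0.5, mean=[-2.0, -2.0], var=[1.0, 1.0], label=-1, alpha=1.0),
]
print(scm_decision(clusters, 0.0, [1.5, 2.5]))    # point near the positive cluster
print(scm_decision(clusters, 0.0, [-1.0, -3.0]))  # point near the negative cluster
```

Only clusters with a nonzero coefficient contribute to the sum, which is the sense in which they act as "support clusters".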


Experiments

  • Datasets
      Toydata
      MNIST: handwritten digit (‘0’-‘9’) classification
      Adult: privacy-preserving dataset

  • Clustering algorithms
      Threshold Order Dependent (TOD)
      EM algorithm

  • Classification methods
      libSVM
      SVMTorch
      SVMlight
      CVM (Core Vector Machine)
      SCM

  • Model selection

  • CPU: 3.0GHz


Toydata

  • Samples: 2500 samples/class, generated from a mixture of Gaussian distributions

  • Clustering algorithm: TOD

  • Clustering results: 25 positive clusters, 25 negative clusters
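The slides do not spell out the TOD procedure. As commonly described, a threshold order-dependent scheme makes one pass over the data: a sample joins its nearest existing cluster if it lies within a distance threshold, and otherwise starts a new cluster. The sketch below follows that description under stated assumptions: Euclidean distance, and a running-mean update of the cluster centre (the update rule in particular is an assumption). The name `tod_cluster` is illustrative.

```python
import math

def tod_cluster(samples, threshold):
    """One-pass threshold clustering; returns (centers, members)."""
    centers, members = [], []
    for x in samples:
        # Find the nearest existing cluster centre, if any.
        best, best_d = None, float("inf")
        for k, c in enumerate(centers):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = k, d
        if best is None or best_d > threshold:
            # Too far from every centre: start a new cluster at this sample.
            centers.append(list(x))
            members.append([x])
        else:
            # Assign to the nearest cluster and update its running mean.
            members[best].append(x)
            n = len(members[best])
            centers[best] = [c + (xi - c) / n for c, xi in zip(centers[best], x)]
    return centers, members

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
centers, members = tod_cluster(points, threshold=1.0)
print(len(centers), [len(m) for m in members])
```

The result depends on the order of the samples and on the threshold, which indirectly sets the number of clusters.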


MNIST

Data description

  • 10 classes: handwritten digits ‘0’-‘9’

  • Training samples: 60,000, about 6,000 for each class

  • Testing samples: 10,000

  • Construct 45 binary classifiers

Results

  • 25 clusters for EM algorithm
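The count of 45 binary classifiers is consistent with a one-versus-one scheme over the 10 digit classes, since 10 · 9 / 2 = 45. The slides do not name the scheme, so treating it as one-versus-one is an assumption; the short sketch below only enumerates the class pairs.

```python
from itertools import combinations

digits = range(10)
# One binary classifier per unordered pair of classes.
pairs = list(combinations(digits, 2))
print(len(pairs))
print(pairs[:3])
```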


MNIST

Test results for TOD algorithm


Privacy-preserving Data Mining

Inter-Enterprise data mining

  • Problem: Two parties owning confidential databases wish to build a decision-tree classifier on the union of their databases, without revealing any unnecessary information.

  • Horizontally partitioned: records (users) split across companies. Example: credit card fraud detection model

  • Vertically partitioned: attributes split across companies. Example: associations across websites


Privacy-preserving Data Mining

Randomization approach

[Figure: original records (50 | 40K | ..., 30 | 70K | ...) each pass through a randomizer, giving randomized records (65 | 20K | ..., 25 | 60K | ...). The distributions of Age and Salary are reconstructed from the randomized records and passed to the data mining algorithms, which produce the model.]
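A minimal sketch of the idea behind the randomization approach, under assumptions not stated on the slide: each party adds independent zero-mean uniform noise to its values, and the miner, who knows the noise distribution, recovers aggregate statistics from the noisy values. The full approach reconstructs the whole distribution; this sketch recovers only the mean and variance. The function `randomize`, the noise range, and the synthetic ages are illustrative.

```python
import random

def randomize(values, spread, rng):
    """Add independent uniform noise in [-spread, spread] to each value."""
    return [v + rng.uniform(-spread, spread) for v in values]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
spread = 30.0
ages = [rng.gauss(40.0, 10.0) for _ in range(20000)]  # synthetic "true" ages
noisy = randomize(ages, spread, rng)                   # what the miner sees

# Zero-mean noise leaves the mean unchanged in expectation.
est_mean = mean(noisy)
# Uniform noise on [-a, a] has variance a^2 / 3, which can be subtracted out.
est_var = variance(noisy) - spread ** 2 / 3.0

print(round(mean(ages), 1), round(est_mean, 1))
print(round(variance(ages), 1), round(est_var, 1))
```

Individual noisy values can be far from the originals, yet the aggregate estimates stay close, which is what lets a model be built without access to the exact records.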


Classification Example

Age | Salary | Repeat Visitor?
23 | 50K | Repeat
17 | 30K | Repeat
43 | 40K | Repeat
68 | 50K | Single
32 | 70K | Single
20 | 20K | Repeat

[Figure: decision tree. Age < 25? Yes: Repeat. No: Salary < 50K? Yes: Repeat. No: Single.]
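The tree on this slide (a split on Age < 25, then on Salary < 50K) can be written as a small function and checked against the six records in the table. The branch layout is read from the slide's flattened figure, so treat it as a reconstruction; `classify_visitor` is an illustrative name.

```python
def classify_visitor(age, salary_k):
    """Decision tree from the slide; salary_k is the salary in thousands."""
    if age < 25:
        return "Repeat"
    if salary_k < 50:
        return "Repeat"
    return "Single"

# (age, salary in thousands, label) for the six records on the slide.
records = [
    (23, 50, "Repeat"),
    (17, 30, "Repeat"),
    (43, 40, "Repeat"),
    (68, 50, "Single"),
    (32, 70, "Single"),
    (20, 20, "Repeat"),
]
matches = [classify_visitor(age, sal) == label for age, sal, label in records]
print(all(matches))
```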


Privacy-preserving Dataset: Adult

Data description

  • Training samples: 30162

  • Testing samples: 15060

  • Percentage of positive samples: 24.78%

Procedure

  • Horizontally partition data into three subsets (parties)

  • Cluster by TOD algorithm

  • Obtain three positive and three negative GMMs

  • Combine positive and negative GMMs into one positive and one negative GMM with modified priors

  • Classify them by SCM
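The procedure above does not say how the priors are modified when the per-party GMMs are combined. One plausible reading, assumed here, is that each component prior is rescaled by its party's share of the samples so that the pooled priors again sum to one. In the sketch below, `merge_gmms`, the component tuples, and the sample counts are all hypothetical.

```python
def merge_gmms(party_gmms, party_counts):
    """Pool per-party GMMs into one GMM by rescaling component priors.

    party_gmms:   list of GMMs, each a list of (prior, mean, var) components
                  whose priors sum to 1 within the party.
    party_counts: number of samples each party used to fit its GMM.
    """
    total = sum(party_counts)
    merged = []
    for gmm, n in zip(party_gmms, party_counts):
        share = n / total  # this party's fraction of all samples
        for prior, mean, var in gmm:
            merged.append((prior * share, mean, var))
    return merged

# Three parties' positive-class GMMs (1-D components for brevity).
party_gmms = [
    [(0.5, [0.0], [1.0]), (0.5, [3.0], [1.0])],
    [(1.0, [1.0], [2.0])],
    [(0.25, [-1.0], [1.0]), (0.75, [2.0], [0.5])],
]
party_counts = [100, 300, 600]

merged = merge_gmms(party_gmms, party_counts)
print(len(merged), round(sum(p for p, _, _ in merged), 6))
```

Only component parameters and sample counts cross party boundaries in this sketch; no individual record is shared.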


Privacy-preserving Dataset: Adult

Partition results

Experimental results


Discussions

Solved problems

  • Large-scale problems: downsample by clustering + classifier

  • Privacy-preserving problems: hide individual information

Differences to other methods

  • Training units are generative models, testing units are vectors

  • Training units contain complete statistical information

  • Only one parameter for model selection

  • Easy implementation

  • Generalization ability is not clear, whereas the RBF kernel in SVM has the property that a larger width leads to a lower VC dimension


Discussions

Advantages of using priors and covariances


Thank you!