
Machine Learning and Bioinformatics 機器學習與生物資訊學


Page 1: Machine Learning and Bioinformatics 機器學習與生物資訊學

Molecular Biomedical Informatics 分子生醫資訊實驗室
Machine Learning and Bioinformatics 機器學習與生物資訊學

Page 2: Machine Learning and Bioinformatics 機器學習與生物資訊學

Using kernel density estimation in enhancing conventional instance-based classifier

Kernel approximation
Kernel approximation in O(n)
Multi-dimensional kernel approximation
Kernel density estimation
Application in data classification

Page 3: Machine Learning and Bioinformatics 機器學習與生物資訊學


Identifying class boundary

Page 4: Machine Learning and Bioinformatics 機器學習與生物資訊學


Boundary identified

Page 5: Machine Learning and Bioinformatics 機器學習與生物資訊學


Kernel approximation

Page 6: Machine Learning and Bioinformatics 機器學習與生物資訊學

Kernel approximation

Problem definition: given the values of a function $f$ at a set of samples, we want to find a set of symmetric kernel functions $K(\mathbf{v}, c_i, b_i)$ and the corresponding weights $w_i$ such that

$\hat{f}(\mathbf{v}) = \sum_i w_i \cdot K(\mathbf{v}, c_i, b_i) \cong f(\mathbf{v})$

Page 7: Machine Learning and Bioinformatics 機器學習與生物資訊學

Equations (1)

Page 8: Machine Learning and Bioinformatics 機器學習與生物資訊學

Kernel approximation

A 1-D example:

$\hat{f}(\mathbf{v}) = \sum_i w_i \cdot K(\mathbf{v}, c_i, b_i) \cong f(\mathbf{v})$

(figure: the individual weighted kernels $w_i \cdot K(\mathbf{v}, c_i, b_i)$ and their sum)
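To make the form above concrete, here is a minimal Python sketch of the same idea. It uses a Gaussian as the symmetric kernel; the function names (`gaussian_kernel`, `kernel_approx`) and the toy centers, weights, and bandwidths are illustrative choices, not values from the slides.

```python
import numpy as np

def gaussian_kernel(x, center, bandwidth):
    """Symmetric 1-D Gaussian kernel K(x, c, b)."""
    return np.exp(-(x - center) ** 2 / (2.0 * bandwidth ** 2))

def kernel_approx(x, centers, weights, bandwidths):
    """Evaluate f_hat(x) = sum_i w_i * K(x, c_i, b_i)."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for c, w, b in zip(centers, weights, bandwidths):
        total += w * gaussian_kernel(x, c, b)
    return total

# Toy usage: three kernels whose weighted sum forms the approximation.
centers = [0.0, 1.0, 2.0]
weights = [0.5, 1.2, 0.8]
bandwidths = [0.4, 0.4, 0.4]
x = np.linspace(-1.0, 3.0, 9)
print(kernel_approx(x, centers, weights, bandwidths))
```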

Page 9: Machine Learning and Bioinformatics 機器學習與生物資訊學

Kernel approximation

Spherical Gaussian functions
Hartman et al. showed that a linear combination of spherical Gaussian functions can approximate any function with arbitrarily small error ("Layered neural networks with Gaussian hidden units as universal approximations", Neural Computation, Vol. 2, No. 2, 1990)
With Gaussian kernel functions, we want to find the corresponding weights and bandwidths such that the approximation above holds
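For reference, a small sketch of a normalized spherical (isotropic) Gaussian in d dimensions, the kind of kernel referred to above; the function name and the example values are illustrative only.

```python
import numpy as np

def spherical_gaussian(v, center, sigma):
    """Normalized spherical (isotropic) Gaussian kernel in d dimensions."""
    v = np.asarray(v, dtype=float)
    center = np.asarray(center, dtype=float)
    d = center.size
    norm = 1.0 / ((2.0 * np.pi) ** (d / 2.0) * sigma ** d)
    sq_dist = np.sum((v - center) ** 2)
    return norm * np.exp(-sq_dist / (2.0 * sigma ** 2))

# Example: value of a 2-D spherical Gaussian centered at the origin.
print(spherical_gaussian([0.5, -0.5], [0.0, 0.0], sigma=1.0))
```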

Page 10: Machine Learning and Bioinformatics 機器學習與生物資訊學

Equations (1) (2)

Page 11: Machine Learning and Bioinformatics 機器學習與生物資訊學


Kernel approximation in O(n)

Page 12: Machine Learning and Bioinformatics 機器學習與生物資訊學

Kernel approximation in O(n)

In the learning algorithm, we assume uniform sampling
– namely, samples are located at the crosses of an evenly-spaced grid in the d-dimensional vector space
Let $\delta$ denote the distance between two adjacent samples
If the assumption of uniform sampling does not hold, then some sort of interpolation can be conducted to obtain the approximate function values at the crosses of the grid (see the sketch below)
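The slides do not say which interpolation scheme is used; as one possible choice, the following sketch resamples scattered 2-D function values onto an evenly-spaced grid with `scipy.interpolate.griddata`. The grid range, spacing, and the toy function are arbitrary assumptions of this example.

```python
import numpy as np
from scipy.interpolate import griddata

# Scattered (non-uniform) samples of a toy function in 2-D.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(200, 2))
values = np.sin(points[:, 0]) * np.cos(points[:, 1])

# Evenly-spaced grid with spacing delta, as assumed by the learning algorithm.
delta = 0.1
axis = np.arange(-1.0, 1.0 + delta, delta)
gx, gy = np.meshgrid(axis, axis)

# Interpolate the scattered values onto the grid crosses; points outside the
# convex hull of the samples come back as NaN with the "linear" method.
grid_values = griddata(points, values, (gx, gy), method="linear")
print(grid_values.shape)
```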

Page 13: Machine Learning and Bioinformatics 機器學習與生物資訊學

A 2-D example of uniform sampling
(figure: grid samples separated by $\delta$ along each axis)

Page 14: Machine Learning and Bioinformatics 機器學習與生物資訊學

The O(n) kernel approximation

Under the assumption that the sampling density is sufficiently high, the function values at a sample and at its k nearest samples are virtually equal
In other words, $f$ is virtually a constant function, equal to its value at that sample, in the proximity of the sample
Accordingly, we can expect that the kernels placed at these neighbouring samples can share virtually the same weights and bandwidths

Page 15: Machine Learning and Bioinformatics 機器學習與生物資訊學

The O(n) kernel approximation

A 1-D example: for $x$ near $h\delta$, $f(x) \cong f(h\delta)$
(figure: samples at $(h-\tfrac{k}{2})\delta, \ldots, (h-1)\delta, h\delta, (h+1)\delta, \ldots, (h+\tfrac{k}{2})\delta$)

Page 16: Machine Learning and Bioinformatics 機器學習與生物資訊學

In the 1-D example, samples are located at $h\delta$, where h is an integer
Under the assumption that the sampling density is sufficiently high, we have $f(x) \cong f(h\delta)$ in the proximity of $h\delta$
The issue now is to find appropriate weights and bandwidths such that the kernel approximation matches $f(h\delta)$ in the proximity of $h\delta$

Page 17: Machine Learning and Bioinformatics 機器學習與生物資訊學

Equations (1) (2) (3)

Page 18: Machine Learning and Bioinformatics 機器學習與生物資訊學

What's the difference between equations (2) and (3)?

Page 19: Machine Learning and Bioinformatics 機器學習與生物資訊學

If we set $\sigma_h = \delta$, we have, in the proximity of $h\delta$, an approximation of $f(h\delta)$ built from terms of the form $\exp\!\left(-\dfrac{(x - i\delta)^2}{2\sigma_h^2}\right)$

Page 20: Machine Learning and Bioinformatics 機器學習與生物資訊學

Therefore, with $\sigma_h = \delta$, we can set the weights accordingly and obtain the approximation
In fact, it can be shown that the deviation is bounded
Therefore, we have the following function approximation
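A minimal numerical sketch of this construction, assuming, as one natural choice consistent with the discussion above (not a formula quoted from the deck), that each grid sample $i\delta$ carries a Gaussian with bandwidth $\sigma = \beta\delta$ and weight $w_i = f(i\delta)\,\delta / (\sqrt{2\pi}\,\sigma)$, which reproduces a locally constant $f$ almost exactly:

```python
import numpy as np

def o_n_kernel_approx(sample_values, delta, beta=1.0):
    """Build a 1-D kernel approximation with one Gaussian per uniform sample.

    Assumption (not taken verbatim from the slides): each sample h*delta gets
    bandwidth sigma = beta*delta and weight w_h = f(h*delta) * delta / (sqrt(2*pi)*sigma),
    so that a locally constant f is reproduced almost exactly.
    """
    sigma = beta * delta
    weights = sample_values * delta / (np.sqrt(2.0 * np.pi) * sigma)
    centers = np.arange(len(sample_values)) * delta

    def f_hat(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        diffs = x[:, None] - centers[None, :]
        kernels = np.exp(-diffs ** 2 / (2.0 * sigma ** 2))
        return kernels @ weights

    return f_hat

# Sample f(x) = sin(x) on a uniform grid and check the approximation off-grid.
delta = 0.05
grid = np.arange(0.0, 2.0 * np.pi, delta)
f_hat = o_n_kernel_approx(np.sin(grid), delta, beta=1.0)
x = np.array([1.0, 2.5, 4.0])
print(np.abs(f_hat(x) - np.sin(x)).max())   # small approximation error
```

For this smooth toy target and small δ, the printed error comes out on the order of 10⁻³, which is the kind of behaviour the O(n) construction relies on.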

Page 21: Machine Learning and Bioinformatics 機器學習與生物資訊學


Equations (1) (2) (3) (4) (5)

Page 22: Machine Learning and Bioinformatics 機器學習與生物資訊學

The O(n) kernel approximation

Generalization: we can generalize the result by setting $\sigma_h = \beta\delta$, where $\beta$ is a real number
The table shows the bounds of the deviation:

$\beta$ | Bounds
0.5 |
1.0 |
1.5 |

Page 23: Machine Learning and Bioinformatics 機器學習與生物資訊學


Equations (1) (2) (3) (4) (5) (6)

Page 24: Machine Learning and Bioinformatics 機器學習與生物資訊學

The effect of $\beta$

$\displaystyle\sum_{j=-\infty}^{\infty} \exp\!\left(-\frac{(x - j\delta)^2}{2(0.5\delta)^2}\right)$

(figure: the sum above compared with the individual terms $\exp\!\left(-\frac{(x - j\delta)^2}{2(0.5\delta)^2}\right)$)
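A quick numerical check of how flat this sum is for the β values listed in the table above; the truncation to ±50 terms and the sampling of a single period are choices of this sketch, and the printed max/min ratios are not the exact bound values from the slides.

```python
import numpy as np

delta = 1.0
x = np.linspace(0.0, delta, 2001)    # the sum is periodic with period delta
j = np.arange(-50, 51)               # +/-50 terms approximate the infinite sum well

for beta in (0.5, 1.0, 1.5):
    sigma = beta * delta
    comb = np.exp(-(x[:, None] - j[None, :] * delta) ** 2 / (2.0 * sigma ** 2)).sum(axis=1)
    # The closer max/min is to 1, the closer the sum is to a constant.
    print(beta, comb.max() / comb.min())
```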

Page 25: Machine Learning and Bioinformatics 機器學習與生物資訊學

The smoothing effect

The kernel approximation function is actually a weighted average of the sampled function values
Selecting a larger $\beta$ value implies that the smoothing effect will be more significant
Our suggestion is to set $\beta$ to …

Page 26: Machine Learning and Bioinformatics 機器學習與生物資訊學

An example of the smoothing effect
(two plots of the target function and the fitted approximation over $x \in [-2, 2]$, $y \in [-1, 1]$; legend: target, fit)

Page 27: Machine Learning and Bioinformatics 機器學習與生物資訊學


Multi-dimensional kernel approximation

Page 28: Machine Learning and Bioinformatics 機器學習與生物資訊學

Multi-dimensional kernel approximation

The basic concepts
Under the assumption that the sampling density is sufficiently high, the function values at a sample and at its k nearest samples are virtually equal
As a result, we can expect that the weights and the bandwidths of the Gaussian functions located at these neighbouring samples are virtually equal as well

Page 29: Machine Learning and Bioinformatics 機器學習與生物資訊學

Let …; the scalar form of … is …
If we set … to …, then … is bounded by …

Page 30: Machine Learning and Bioinformatics 機器學習與生物資訊學

Accordingly, we have that … is bounded by … and, with …,
In RVKDE, d is relaxed to a parameter
Finally, by setting the bandwidths uniformly to …, we obtain the following kernel smoothing function that approximates $f$:
…, where … are the k nearest samples of the object located at v
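The precise RVKDE smoothing function is given by the equations referenced above; as a rough illustration only, the sketch below shows the general shape of such an estimator: the value at v is a weighted average over the k nearest training samples, with Gaussian weights. The single user-supplied bandwidth `sigma` is a simplifying assumption of this sketch; RVKDE derives its bandwidths from the data instead.

```python
import numpy as np

def knn_kernel_smoother(samples, values, k, sigma):
    """Rough k-nearest-sample Gaussian smoother (illustrative only).

    samples: (n, d) sample locations; values: (n,) function values there.
    A single shared bandwidth sigma is an assumption of this sketch.
    """
    samples = np.asarray(samples, dtype=float)
    values = np.asarray(values, dtype=float)

    def f_hat(v):
        v = np.asarray(v, dtype=float)
        dists = np.linalg.norm(samples - v, axis=1)
        nearest = np.argsort(dists)[:k]                  # k nearest samples of v
        w = np.exp(-dists[nearest] ** 2 / (2.0 * sigma ** 2))
        return np.sum(w * values[nearest]) / np.sum(w)   # normalized weighted average

    return f_hat

# Toy 2-D usage.
rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(500, 2))
vals = np.sin(pts[:, 0] + pts[:, 1])
f_hat = knn_kernel_smoother(pts, vals, k=20, sigma=0.2)
print(f_hat([0.3, -0.2]))   # should be close to sin(0.1) ~= 0.0998
```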

Page 31: Machine Learning and Bioinformatics 機器學習與生物資訊學


Equations (1) (2) (3) (4) (5) (6) (7)

Page 32: Machine Learning and Bioinformatics 機器學習與生物資訊學

Multi-dimensional kernel approximation

Generalization
If we set the bandwidths uniformly to …, then …
As a result, we want to set …, where …, and obtain the following function approximation:
…, where … are the k nearest samples of the object located at v
In RVKDE, k is a parameter $k_t$

Page 33: Machine Learning and Bioinformatics 機器學習與生物資訊學


Equations (1) (2) (3) (4) (5) (6) (7) (8)

Page 34: Machine Learning and Bioinformatics 機器學習與生物資訊學

Multi-dimensional kernel approximation

Simplification
An interesting observation is that, as long as …, then regardless of the value of …, we have …
With this observation, we have …

Page 35: Machine Learning and Bioinformatics 機器學習與生物資訊學


Equations (1) (2) (5) (6) (7) (8) (9)

Page 36: Machine Learning and Bioinformatics 機器學習與生物資訊學


Any Questions?

Page 37: Machine Learning and Bioinformatics 機器學習與生物資訊學

Lacking something?

Page 38: Machine Learning and Bioinformatics 機器學習與生物資訊學


Kernel density estimation

Page 39: Machine Learning and Bioinformatics 機器學習與生物資訊學

Kernel density estimation

Problem definition
Assume that we are given a set of samples taken from a probability distribution in a d-dimensional vector space
The problem of kernel density estimation is to find a linear combination of kernel functions that provides a good approximation of the probability density function

Page 40: Machine Learning and Bioinformatics 機器學習與生物資訊學

Probability density estimation

The value of the probability density function at a vector v can be estimated as follows:
n is the total number of samples
R(v) is the distance between vector v and its $k_s$-th nearest sample
… is the volume of a sphere with radius R(v) in a d-dimensional vector space
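The estimator itself appears on the slide as an equation; the standard k-nearest-neighbour estimate built from exactly these three quantities is $\hat{p}(\mathbf{v}) = k_s / \bigl(n \cdot V(R(\mathbf{v}))\bigr)$, and the sketch below assumes that form (RVKDE's published estimator may include an extra correction term).

```python
import math
import numpy as np

def sphere_volume(radius, d):
    """Volume of a d-dimensional ball of the given radius (uses the Gamma function)."""
    return (math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)) * radius ** d

def knn_density(v, samples, ks):
    """Assumed estimator: p_hat(v) = ks / (n * volume of the sphere of radius R(v)),
    where R(v) is the distance from v to its ks-th nearest sample."""
    samples = np.asarray(samples, dtype=float)
    v = np.asarray(v, dtype=float)
    n, d = samples.shape
    dists = np.sort(np.linalg.norm(samples - v, axis=1))
    r = dists[ks - 1]                        # R(v)
    return ks / (n * sphere_volume(r, d))

# Toy usage: density of a 2-D standard normal near the origin (true value ~ 1/(2*pi) ~ 0.159).
rng = np.random.default_rng(2)
data = rng.standard_normal((5000, 2))
print(knn_density([0.0, 0.0], data, ks=50))
```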

Page 41: Machine Learning and Bioinformatics 機器學習與生物資訊學

The Gamma function
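For reference, the textbook identities that connect the Gamma function to the d-dimensional sphere volume used above (standard facts, not copied from this slide):

$V_d\bigl(R(\mathbf{v})\bigr) = \dfrac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2}+1\right)}\,R(\mathbf{v})^{d}$, with $\Gamma(1) = 1$, $\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}$, and $\Gamma(x+1) = x\,\Gamma(x)$.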

Page 42: Machine Learning and Bioinformatics 機器學習與生物資訊學


Application in data classification

Page 43: Machine Learning and Bioinformatics 機器學習與生物資訊學

Application in data classification

Recent development in data classification focuses on support vector machines (SVM), due to accuracy concerns
RVKDE
– delivers the same level of accuracy as the SVM and enjoys some advantages
– constructs one approximate probability density function for each class of objects (a classification sketch follows below)
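The slides do not spell out how the per-class densities are turned into a decision; a common choice, assumed in this sketch, is a Bayes-style rule that picks the class maximizing prior times estimated density. The names and the toy data are illustrative only.

```python
import math
import numpy as np

def sphere_volume(radius, d):
    return (math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)) * radius ** d

def knn_density(v, samples, ks):
    # Same assumed estimator as in the earlier density sketch.
    n, d = samples.shape
    dists = np.sort(np.linalg.norm(samples - v, axis=1))
    return ks / (n * sphere_volume(dists[ks - 1], d))

def classify(v, train_x, train_y, ks=10):
    """Pick the class whose (prior * estimated density) is largest at v.

    This Bayes-style decision rule is an assumption of this sketch; the slides
    only state that one density function is constructed per class.
    """
    v = np.asarray(v, dtype=float)
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    scores = {}
    for c in np.unique(train_y):
        xc = train_x[train_y == c]
        prior = len(xc) / len(train_x)
        scores[c] = prior * knn_density(v, xc, min(ks, len(xc)))
    return max(scores, key=scores.get)

# Toy usage: two Gaussian blobs in 2-D.
rng = np.random.default_rng(3)
x = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
print(classify([3.5, 3.5], x, y))   # expected: 1
```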

Page 44: Machine Learning and Bioinformatics 機器學習與生物資訊學

Time complexity

The average time complexity to construct a classifier is …, if the k-d tree structure is employed, where n is the number of training samples
The time complexity to classify m new objects with unknown class is …

Page 45: Machine Learning and Bioinformatics 機器學習與生物資訊學

Comparison of classification accuracy

Data sets | RVKDE | SVM | 1NN | 3NN
1. iris (150) | 97.33 () | 97.33 | 94.00 | 94.67
2. wine (178) | 99.44 () | 99.44 | 96.08 | 94.97
3. vowel (528) | 99.62 () | 99.05 | 99.43 | 97.16
4. segment (2310) | 97.27 () | 97.40 | 96.84 | 95.98
Avg. 1-4 | 98.42 | 98.31 | 96.59 | 95.70
5. glass (214) | 75.74 () | 71.50 | 69.65 | 72.45
6. vehicle (846) | 73.53 () | 86.64 | 70.45 | 71.98
Avg. 1-6 | 90.49 | 91.89 | 87.74 | 87.87

Page 46: Machine Learning and Bioinformatics 機器學習與生物資訊學

Comparison of classification accuracy

3 larger data sets

Data sets | RVKDE | SVM | 1NN | 3NN
7. satimage (4435, 2000) | 92.30 () | 91.30 | 89.35 | 90.60
8. letter (15000, 5000) | 97.12 () | 97.98 | 95.26 | 95.46
9. shuttle (43500, 14500) | 99.94 () | 99.92 | 99.91 | 99.92
Avg. 7-9 | 96.45 | 96.40 | 94.84 | 95.33

Page 47: Machine Learning and Bioinformatics 機器學習與生物資訊學

Data reduction

As the learning algorithm is instance-based, removal of redundant training samples will lower the complexity of the prediction phase
The effect of a naïve data reduction mechanism was studied
The naïve mechanism removes a training sample if all of its 10 nearest samples belong to the same class as this particular sample (see the sketch below)
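A minimal sketch of that naïve rule; the brute-force O(n²) distance computation and the toy two-blob data are choices of this example (the deck itself relies on a k-d tree structure for efficiency).

```python
import numpy as np

def reduce_training_set(x, y, k=10):
    """Naive data reduction: drop a training sample if all of its k nearest
    (other) training samples belong to the same class as the sample itself."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = len(x)
    keep = np.ones(n, dtype=bool)
    # Brute-force O(n^2) distances; fine for a sketch.
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)          # exclude the sample itself
    for i in range(n):
        neighbours = np.argsort(dists[i])[:k]
        if np.all(y[neighbours] == y[i]):    # deep inside its own class
            keep[i] = False
    return x[keep], y[keep]

# Toy usage: two overlapping blobs; mostly boundary samples survive.
rng = np.random.default_rng(4)
x = np.vstack([rng.normal(0.0, 1.0, (300, 2)), rng.normal(3.0, 1.0, (300, 2))])
y = np.array([0] * 300 + [1] * 300)
xr, yr = reduce_training_set(x, y, k=10)
print(len(x), "->", len(xr))
```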

Page 48: Machine Learning and Bioinformatics 機器學習與生物資訊學

With the KDE-based statistical learning algorithm, each training sample is associated with a kernel function, typically with a width that varies from sample to sample

Page 49: Machine Learning and Bioinformatics 機器學習與生物資訊學

This is in contrast with the decision function of the SVM, in which only the support vectors are associated with a kernel function, and that kernel has a fixed width

Page 50: Machine Learning and Bioinformatics 機器學習與生物資訊學

Effect of data reduction

 | satimage | letter | shuttle
# of training samples in the original data set | 4435 | 15000 | 43500
# of training samples after data reduction is applied | 1815 | 7794 | 627
% of training samples remaining | 40.92% | 51.96% | 1.44%
Classification accuracy after data reduction is applied | 92.15% | 96.18% | 99.32%
Degradation of accuracy due to data reduction | -0.15% | -0.94% | -0.62%

Page 51: Machine Learning and Bioinformatics 機器學習與生物資訊學

Effect of data reduction

 | # of training samples after data reduction is applied | # of support vectors identified by LIBSVM
satimage | 1815 | 1689
letter | 7794 | 8931
shuttle | 627 | 287

Page 52: Machine Learning and Bioinformatics 機器學習與生物資訊學

Execution times (in seconds)

 | Proposed algorithm without data reduction | Proposed algorithm with data reduction | SVM
Cross validation: satimage | 670 | 265 | 64622
Cross validation: letter | 2825 | 1724 | 386814
Cross validation: shuttle | 96795 | 59.9 | 467825
Make classifier: satimage | 5.91 | 0.85 | 21.66
Make classifier: letter | 17.05 | 6.48 | 282.05
Make classifier: shuttle | 1745 | 0.69 | 129.84
Test: satimage | 21.3 | 7.4 | 11.53
Test: letter | 128.6 | 51.74 | 94.91
Test: shuttle | 996.1 | 5.58 | 2.13

Page 53: Machine Learning and Bioinformatics 機器學習與生物資訊學


Today’s exercise

Page 54: Machine Learning and Bioinformatics 機器學習與生物資訊學

Multiple-class prediction

Design your own select, feature, buy and sell programs. Upload and test them in our simulation system. Finally, commit your best version before 23:59 10/15 (Mon).