
OTL: A Framework of Online Transfer Learning


Peilin Zhao¹, Steven C.H. Hoi¹
¹Nanyang Technological University

The Twenty-Third Annual Conference on Neural Information Processing Systems (NIPS 2009)

Problem Setting

Old/source data space: $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{X} = \mathbb{R}^m$ and $\mathcal{Y} = \{-1, +1\}$.

Kernel in the old/source data space: $K(\cdot, \cdot): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$.

Old/source classifier: $h(\mathbf{x}) = \sum_{i=1}^{B} \alpha_i K(\mathbf{x}_i, \mathbf{x})$,

where $\{\mathbf{x}_i\}_{i=1}^{B}$ is the set of support vectors for the old/source training data set, and $\{\alpha_i\}_{i=1}^{B}$ are the coefficients of the support vectors.

New/target domain: $\mathcal{X}' \times \mathcal{Y}'$, with $\mathcal{Y}' = \{-1, +1\}$.
• Homogeneous domain: $\mathcal{X}' = \mathcal{X}$.
• Heterogeneous domain: $\mathcal{X}' \neq \mathcal{X}$ (here, the old space $\mathcal{X}$ is a subspace of $\mathcal{X}'$; see the assumption in the heterogeneous section below).

A sequence of examples from the new/target domain: $\{(\mathbf{x}_t, y_t) \mid t = 1, \ldots, T\}$, where $\mathbf{x}_t \in \mathcal{X}'$ and $y_t \in \mathcal{Y}'$.

Goal: learn a prediction function $f$ on the new/target domain in an online fashion from the sequence of examples.
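
To make the setting concrete, here is a minimal Python sketch of a kernel source classifier of the form above. The RBF kernel matches the kernel used in the experiments below, but the function names, the parameter `sigma`, and the data layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    # K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2)); sigma is an assumed parameter
    return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))

def source_classifier(x, support_vectors, alphas, sigma=1.0):
    # h(x) = sum_i alpha_i * K(x_i, x), built from the old/source training data
    return sum(a * rbf_kernel(sv, x, sigma) for a, sv in zip(alphas, support_vectors))
```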

Homogeneous Domain: Online Transfer Learning (OTL) Algorithm

Concept drift: the target variable to be predicted changes over time in the learning process.

Basic idea: transfer knowledge from the old/source domain through an ensemble. We construct an entirely new prediction function $f$ from the data in the target domain only, and combine its predictions with those of the old classifier $h$ using adaptively updated weights.

Prediction: $\hat{y}_t = \mathrm{sign}\!\left(w_{1,t}\,\Pi(h(\mathbf{x}_t)) + w_{2,t}\,\Pi(f_t(\mathbf{x}_t)) - \frac{1}{2}\right)$, where $w_{1,t}$ and $w_{2,t}$ are non-negative weights with $w_{1,t} + w_{2,t} = 1$, and $\Pi(a) = \max\!\left(0, \min\!\left(1, \frac{a+1}{2}\right)\right)$ projects a prediction value into $[0, 1]$.

Updating weights:

$w_{1,t+1} = \frac{w_{1,t}\, s_t(h)}{w_{1,t}\, s_t(h) + w_{2,t}\, s_t(f_t)}, \qquad w_{2,t+1} = \frac{w_{2,t}\, s_t(f_t)}{w_{1,t}\, s_t(h) + w_{2,t}\, s_t(f_t)},$

where $s_t(g) = \exp\{-\eta\, \ell^*(\Pi(g(\mathbf{x}_t)), \Pi(y_t))\}$, $\ell^*(z, y) = (z - y)^2$ is the square loss, and $\eta > 0$ is a learning-rate parameter. The classifier $f_t$ itself is updated online on the target data (e.g., with the Passive-Aggressive algorithm, which the experiments build on).
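
To make one round of the homogeneous algorithm concrete, here is a minimal Python sketch following the prediction and weight-update formulas above. The function names (`otl_step`, `f_predict`, `f_update`) and the interface are my assumptions; `f_update` stands for whatever online update (e.g., Passive-Aggressive) maintains $f_t$.

```python
import numpy as np

def proj01(a):
    # Pi(a): clip (a + 1) / 2 into [0, 1]
    return min(1.0, max(0.0, (a + 1.0) / 2.0))

def otl_step(x, y, h, f_predict, f_update, w, eta=0.5):
    """One homogeneous-OTL round; h and f_predict map an instance to a real score,
    w = (w1, w2) are the current ensemble weights with w1 + w2 = 1."""
    hx, fx = h(x), f_predict(x)
    w1, w2 = w
    y_hat = 1 if w1 * proj01(hx) + w2 * proj01(fx) >= 0.5 else -1
    # Hedge-style multiplicative weight update using the square loss on Pi-values
    s_h = np.exp(-eta * (proj01(hx) - proj01(y)) ** 2)
    s_f = np.exp(-eta * (proj01(fx) - proj01(y)) ** 2)
    z = w1 * s_h + w2 * s_f
    f_update(x, y)  # online update of f on the target example
    return y_hat, (w1 * s_h / z, w2 * s_f / z)
```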

Experimental Results

Experimental Testbed and Setup

Datasets: "a7a", "w7a", "mushrooms", "spambase", "usenet2" and "newsgroup4".
Downloaded from:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
http://www.ics.uci.edu/~mlearn/MLRepository.html
"newsgroup4" is a concept-drift dataset created from "newsgroup20"; its details are shown in Table 1.

Algorithms:
• PA is the Passive-Aggressive algorithm.
• PAIO is PA initialized with h (using only the first view in the heterogeneous case).
• OTL(fixed) is OTL with fixed weights of 0.5.
• COTL0 is COTL with the first-view classifier initialized to zero.
• SCT combines PAIO on the first view with PA on the second view using fixed weights of 0.5.

An RBF kernel is used, with one fixed parameter setting for the homogeneous experiments and another for the heterogeneous experiments, the same for every dataset and algorithm. Each experiment is repeated 20 times per dataset.


Observations from the experimental results:

Homogeneous:
• For the datasets without concept drift, PA suffers more mistakes than the transfer-based algorithms.
• For the datasets with concept drift, OTL performs the best among the compared algorithms (the weight dynamics sketched after this list suggest why).
• OTL produces denser classifiers and spends more time.

Heterogeneous:
• PA, which uses no information from the old domain, achieves the highest mistake rate.
• The COTL algorithm has the smallest mistake rate.
• COTL produces denser classifiers and spends more time.
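
As a quick illustration of why the adaptive weighting copes with concept drift, the following toy simulation (mine, not from the paper) runs the weight update above in the regime where h keeps suffering square loss 1 while f suffers none: h's weight decays geometrically and the ensemble shifts to f.

```python
import numpy as np

# Toy check of the Hedge-style weight dynamics under concept drift:
# the old classifier h always suffers square loss 1, the online f suffers 0.
eta, w1, w2 = 0.5, 0.5, 0.5
for _ in range(20):
    s_h, s_f = np.exp(-eta * 1.0), np.exp(-eta * 0.0)  # s_t(h), s_t(f_t)
    z = w1 * s_h + w2 * s_f
    w1, w2 = w1 * s_h / z, w2 * s_f / z
print(f"w1 after 20 rounds: {w1:.1e}")  # ~4.5e-05: almost all weight is on f
```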

Mistake Bound for OTL Algorithm

Theorem 1. Denote by M the number of mistakes made by the OTL algorithm; then M is bounded from above by:

$M \le \frac{4\left(\ln 2 + \eta\, \min(L_h, L_f)\right)}{1 - e^{-\eta}},$

where $L_h = \sum_{t=1}^{T} \ell^*(\Pi(h(\mathbf{x}_t)), \Pi(y_t))$ and $L_f = \sum_{t=1}^{T} \ell^*(\Pi(f_t(\mathbf{x}_t)), \Pi(y_t))$ are the cumulative square losses of the old classifier and of the online classifier on the new sequence.

Remark: Denote by $M_h$ and $M_f$ the mistake bounds of the models h and f, respectively. Notice that a mistaken prediction incurs square loss at least 1/4, and a correct one at most 1/4. Assume, for intuition, that the projected outputs $\Pi(h(\mathbf{x}_t))$ and $\Pi(f_t(\mathbf{x}_t))$ are binary, so that $L_h = M_h$ and $L_f = M_f$. We then have $M = O(\min(M_h, M_f))$: OTL performs comparably to the better of the old classifier and the purely online classifier.
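
The step linking mistakes to the square losses in the bound can be made explicit (my expansion of the argument, not verbatim from the poster):

```latex
% On a mistake, the weighted prediction p_t = w_{1,t} \Pi(h(x_t)) + w_{2,t} \Pi(f_t(x_t))
% lies on the wrong side of 1/2, so its square loss is at least 1/4:
\hat{y}_t \neq y_t
  \;\Longrightarrow\;
  \bigl(p_t - \Pi(y_t)\bigr)^2 \;\ge\; \tfrac{1}{4},
  \qquad\text{hence}\qquad
  \frac{M}{4} \;\le\; \sum_{t=1}^{T} \bigl(p_t - \Pi(y_t)\bigr)^2 .
```

By convexity of the square loss, each term is at most $w_{1,t}\,\ell^*(\Pi(h(\mathbf{x}_t)), \Pi(y_t)) + w_{2,t}\,\ell^*(\Pi(f_t(\mathbf{x}_t)), \Pi(y_t))$, and the Hedge-style weight update bounds the sum of these mixtures by $\frac{\ln 2 + \eta\, \min(L_h, L_f)}{1 - e^{-\eta}}$, which yields the theorem.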

Heterogeneous Domain: Co-regularized Online Transfer Learning (COTL) Algorithm & Its Mistake Bound

Assumption: $\mathcal{X} \subset \mathcal{X}'$; for simplicity, $\mathcal{X} = \mathbb{R}^m$ and $\mathcal{X}' = \mathbb{R}^n$ with $m < n$, so the first m dimensions of $\mathcal{X}'$ represent the old feature space $\mathcal{X}$.
Basic idea: split each instance $\mathbf{x}_t \in \mathcal{X}'$ into two views, $\mathbf{x}_t^{(1)} \in \mathbb{R}^m$ (the old features) and $\mathbf{x}_t^{(2)} \in \mathbb{R}^{n-m}$ (the new features), then adopt a co-regularization principle to update the two classifiers $f_t^{(1)}$ and $f_t^{(2)}$ simultaneously for the two views.

Prediction: $\hat{y}_t = \mathrm{sign}\!\left(\frac{1}{2}\left(f_t^{(1)}(\mathbf{x}_t^{(1)}) + f_t^{(2)}(\mathbf{x}_t^{(2)})\right)\right)$.

Initialization: $f_1^{(1)} = h$, $f_1^{(2)} = 0$.

Updating: on rounds with positive hinge loss $\ell_t = \max\!\left(0,\; 1 - y_t \cdot \frac{1}{2}\left(f_t^{(1)}(\mathbf{x}_t^{(1)}) + f_t^{(2)}(\mathbf{x}_t^{(2)})\right)\right)$, solve

$\left(f_{t+1}^{(1)}, f_{t+1}^{(2)}\right) = \arg\min_{f^{(1)},\, f^{(2)}} \frac{\gamma_1}{2}\left\|f^{(1)} - f_t^{(1)}\right\|^2 + \frac{\gamma_2}{2}\left\|f^{(2)} - f_t^{(2)}\right\|^2 + C\,\ell\!\left(f^{(1)}, f^{(2)}; (\mathbf{x}_t, y_t)\right),$

where $\gamma_1$ and $\gamma_2$ are positive co-regularization parameters and $C > 0$ controls the aggressiveness of the update.
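
A minimal linear-view sketch of one COTL round, obtained by solving the co-regularized objective above in closed form via standard Passive-Aggressive reasoning; the function name, the linear views, and the closed-form step are my reconstruction, not the authors' code.

```python
import numpy as np

def cotl_step(x1, x2, y, f1, f2, gamma1=1.0, gamma2=1.0, C=1.0):
    """One COTL round with linear classifiers f1 (old-feature view x1)
    and f2 (new-feature view x2); y is the true label in {-1, +1}."""
    margin = 0.5 * (f1 @ x1 + f2 @ x2)
    y_hat = 1 if margin >= 0 else -1
    loss = max(0.0, 1.0 - y * margin)
    if loss > 0:
        # Closed-form solution of the co-regularized objective: each view
        # moves along y * x, scaled inversely to its regularization weight.
        denom = (x1 @ x1) / (4 * gamma1) + (x2 @ x2) / (4 * gamma2)
        tau = min(C, loss / denom) if denom > 0 else C
        f1 = f1 + (tau * y / (2 * gamma1)) * x1
        f2 = f2 + (tau * y / (2 * gamma2)) * x2
    return y_hat, f1, f2
```

Initializing f1 with the source classifier's weights and f2 with zeros reproduces the initialization above, so the first view starts from the transferred knowledge.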

Theorem 2. Let $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_T, y_T)$ be a sequence of examples, where $\mathbf{x}_t \in \mathbb{R}^n$ and $y_t \in \{-1, +1\}$ for all t. In addition, assume $\|\mathbf{x}_t^{(1)}\| \le 1$ and $\|\mathbf{x}_t^{(2)}\| \le 1$, and split each instance into two views $\mathbf{x}_t = (\mathbf{x}_t^{(1)}, \mathbf{x}_t^{(2)})$. Then for any $f^{(1)}$ and $f^{(2)}$, the number of mistakes M made by the proposed COTL algorithm is bounded from above by:

$M \le \max\!\left\{\frac{1}{4\gamma_1} + \frac{1}{4\gamma_2},\; \frac{1}{C}\right\} \left(\gamma_1\left\|f^{(1)} - h\right\|^2 + \gamma_2\left\|f^{(2)}\right\|^2 + 2C \sum_{t=1}^{T} \ell_t^*\right),$

where $\ell_t^* = \max\!\left(0,\; 1 - y_t \cdot \frac{1}{2}\left(f^{(1)}(\mathbf{x}_t^{(1)}) + f^{(2)}(\mathbf{x}_t^{(2)})\right)\right)$ is the hinge loss of the fixed competitor $(f^{(1)}, f^{(2)})$ on round t.

Remark: when the distance between h and the optimal $f^{(1)}$ is smaller than the distance between $f^{(1)}$ and the zero function, i.e., $\|f^{(1)} - h\| < \|f^{(1)}\|$, the bound is tighter than the one obtained by initializing the first view with zero, so we can transfer knowledge from the old classifier to the new domain.

Conclusions

• We propose the first framework for online transfer learning.
• We propose algorithms for two cases: the homogeneous domain and the heterogeneous domain.
• We offer theoretical analysis, and encouraging results show that the proposed algorithms are effective.