20
1 A Two-Stage Approach to Domain Adaptation for Statistical Classifiers Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation?

Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

1

A Two-Stage Approach to Domain Adaptation for Statistical Classifiers

Jing Jiang & ChengXiang Zhai

Department of Computer ScienceUniversity of Illinois at Urbana-Champaign

��������������� �������������

What is domain adaptation?

Page 2: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

2

��������������� ������������� �

Example: named entity recognition

New York Times New York Times

train (labeled)

test (unlabeled)

NER Classifier

standard supervised learning 85.5%

persons, locations, organizations, etc.

��������������� ������������� �

Example: named entity recognition

New York Times

NER Classifier

train (labeled)

test (unlabeled)

New York Times

labeled data not availableReuters

64.1%

persons, locations, organizations, etc.

non-standard (realistic) setting

Page 3: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

3

��������������� ������������� �

Domain difference �

performance drop

train test

NYT NYT

New York Times New York Times

NER Classifier 85.5%

Reuters NYT

Reuters New York Times

NER Classifier 64.1%

ideal setting

realistic setting

��������������� ������������� �

Another NER example

train test

mouse mouse

gene name

recognizer

fly mouse

gene name

recognizer

ideal setting

realistic setting

54.1%

28.1%

Page 4: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

4

��������������� ������������� �

Other examples

� Spam filtering:� Public email collection

�personal inboxes

� Sentiment analysis of product reviews� Digital cameras

�cell phones

� Movies �

books� Can we do better than standard supervised

learning?� Domain adaptation: to design learning methods that

are aware of the training and test domain difference.

��������������� ������������� �

How do we solve the problem in general?

Page 5: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

5

��������������� ������������� �

Observation 1domain-specific features

winglessdaughterlesseyelessapexless…

��������������� ������������� ��

Observation 1domain-specific features

winglessdaughterlesseyelessapexless…

• describing phenotype

• in fly gene nomenclature

• feature “ -less” weighted high

CD38PABPC5

feature still useful for other

organisms?

No!

Page 6: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

6

��������������� ������������� � �

Observation 2generalizable features

� ������������ ����������������� � ������������� �� "!!$#&%� '!�()(*!��,+-�������.0/�12/43$(5%��76869!: ;�$(

in each �

� 6=<��>6 ?A@CBED +0(F!>#:%� "!�(*()!��G7HIG /J6=<I��!:3� "/���(K�:���L1�.M+N��.OP!�.-.0( � 6=<��76 Q�RTS�QE?VU +0(!$#&%� '!�()(*!��,+-�XWY!>6Z��. G "��+-������,+[�\�] 8���$1�!\/�W^����3�. 66_+0(*(`3�!�($a

��������������� ������������� ��

Observation 2generalizable features

� ��!�OP��%$!��76Z��%:.0!�1�+0O����� bc+[��1�.0!�()( �� "!��d^�ef�g���J��� +h�������.0/�12/43$(5%��76869!: ;�$(

in each �

� 6=<��>6 ijlk�m +0( ��d^�en�g���>���G7HIG /�6=<L��!:3� "/���(K�:���I1�.-+o��.OP!:.M.0( � 6=<��76 p7qsr�p�i�t +0(��d^�ef�g���J��� +h�5WY!J6u��. G 8��+[������,+h�v�w Y����1�!\/�W^����3�. 66_+0()(73�!�(>a

feature “X be expressed”

Page 7: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

7

��������������� ������������� � �

General idea: two-stage approach

SourceDomain Target

Domain

features

generalizable features

domain-specific features

��������������� ������������� � �

Goal

TargetDomain

SourceDomain

features

Page 8: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

8

��������������� ������������� � �

Regular classification

SourceDomain Target

Domain

features

��������������� ������������� � �

Generalization: to emphasize generalizable features in the trained model

SourceDomain Target

Domain

features Stage 1

Page 9: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

9

��������������� ������������� ��

Adaptation: to pick up domain-specific features for the target domain

SourceDomain Target

Domain

features Stage 2

��������������� ������������� � �

Regular semi-supervised learning

SourceDomain Target

Domain

features

Page 10: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

10

��������������� ������������� � �

Comparison with related work

� We explicitly model generalizable features.� Previous work models it implicitly [Blitzer et al. 2006, Ben-

David et al. 2007, Daumé III 2007].� We do not need labeled target data but we need

multiple source (training) domains.� Some work requires labeled target data [Daumé III 2007].

� We have a 2nd stage of adaptation, which uses semi-supervised learning.

� Previous work does not incorporate semi-supervised learning [Blitzer et al. 2006, Ben-David et al. 2007, DauméIII 2007].

��������������� ������������� �

Implementation of the two-stage approach with logistic

regression classifiers

Page 11: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

11

��������������� ������������� �

x

Logistic regression classifiers01001::010

0.24.55

-0.33.0

::

2.1-0.90.4

-less

X be expressed

wyT x

∑=

'' )exp(

)exp(),|(

y

Ty

Tyyp

xw

xwwx

p binary features

… and wingless are expressed in…

wy

��������������� ������������� �

Learning a logistic regression classifier

01001::010

0.24.55

-0.33.0

::

2.1-0.90.4

(

���

=

∑ ∑=

N

iy

Ty

Ty

N 1'

'

2

)exp(

)exp(log

1

minargˆ

xw

xw

www

λ

log likelihood of training data

regularization term

penalize large weights

control model complexity

wyT x

Page 12: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

12

��������������� ������������� ��

Generalizable features in weight vectors

0.24.55

-0.33.0

::

2.1-0.90.4

3.20.54.5-0.13.5

::

0.1-1.0-0.2

0.10.74.20.13.2::

1.70.10.3

D1 D2 DK

w1 w2 wK

K source domains

generalizable features

domain-specific features

��������������� ������������� ��

We want to decompose w in this way

0.24.55

-0.33.0

::

2.1-0.90.4

00

4.60

3.2::000

0.24.50.4-0.3-0.2

::

2.1-0.90.4

= +

h non-zero entries for h

generalizable features

Page 13: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

13

��������������� ������������� ��

Feature selection matrix A

0 0 1 0 0 … 00 0 0 0 1 … 0

::

0 0 0 0 0 … 1

01001::010

01::0

x

A z = Ax

h

matrix A selects h generalizable features

��������������� ������������� ��

0.24.55

-0.33.0

::

2.1-0.90.4

4.63.2

::

3.6

0.24.50.4-0.3-0.2

::

2.1-0.90.4

= +

Decomposition of w

01001::010

01::0

01001::010

wT x = vT z + uT x

weights for generalizable features

weights for domain-specific features

Page 14: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

14

��������������� ������������� �

Decomposition of w

wTx = vTz + uTx

= (Av)Tx + uTx

= vTAx + uTx

w = AT v + u

��������������� ������������� ��

0.24.55

-0.33.0

::

2.1-0.90.4

4.63.2::

3.6

0.24.50.4-0.3-0.2

::

2.1-0.90.4

= +

0 0 … 00 0 … 01 0 … 00 0 … 00 1 … 0

::

0 0 … 00 0 … 00 0 … 0

Decomposition of wshared by all domainsdomain-specific

w = AT v + u

Page 15: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

15

��������������� ������������� ��

Framework for generalization

Fix A, optimize:

wkregularization term�s >> 1: to penalize domain-specific features

SourceDomain

TargetDomain

���

+−

��� �� �+=

� ��

= =

=

k k

k

N

k

N

i

kTki

ki

k

K

k

ks

k

AypNK 1 1

1

22

}{,

);|(log11

minarg} )ˆ{,ˆ(

uvx

uvuvuv

λλ

likelihood of labeled data from K source domains

��������������� ������������� ���

Framework for adaptation

������++

���� ++

��� ������++=

�� �

=

= =

=

);|(log1

);|(log1

1

1

minarg})ˆ{,ˆ,ˆ(

1

1 1

2

1

22

}{,

m

i

tTti

ti

N

k

N

i

kTki

ki

k

tt

K

k

ks

kt

Aypm

AypNK

k k

kt

uvx

uvx

uuvuuvuuv

λλλ

SourceDomain

TargetDomain

likelihood of pseudo labeled target domain

examples

�t = 1 <<

�s : to pick up domain-specific

features in the target domain

Fix A, optimize:

Page 16: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

16

��������������� ������������� ���

� Joint optimization

� Alternating optimization

���

+−

��� !"#$%

+=

∑ ∑

= =

=

k k

k

N

k

N

i

kTki

ki

k

K

k

ks

A,

k

AypNK

A

1 1

1

22

}{,

);|(log11

minarg})ˆ{,ˆ,ˆ(

uvx

uvuvuv

λλ

How to find A? (1)

&�'�(�)�*+�,�,�) -�.�/�0�+�,�,�) 1+

How to find A? (2)

2 Domain cross validation3 Idea: training on (K – 1) source domains and test on the held-out source domain3 Approximation:4 wf

k: weight for feature f learned from domain k4 wfk: weight for feature f learned from other domains4 rank features by

3 See paper for details

5=

⋅K

k

kf

kf ww

1

Page 17: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

17

&�'�(�)�*+�,�,�) -�.�/�0�+�,�,�) 1 1

Intuition for domain cross validation

…domains

………expressed………-less

D1 D2 Dk-1 Dk (fly)

……-less……expressed……

w

1.5

0.05

w

2.0

1.2………expressed………-less

1.8

0.1

&�'�(�)�*+�,�,�) -�.�/�0�+�,�,�) 1��

Experiments

2 Data set3 BioCreative Challenge Task 1B3 Gene/protein name recognition3 3 organisms/domains: fly, mouse and yeast2 Experiment setup3 2 organisms for training, 1 for testing3 F1 as performance measure

Page 18: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

18

&�'�(�)�*+�,�,�) -�.�/�0�+�,�,�) 1��

Experiments: Generalization

0.4700.1950.654DA-2(domain CV)

0.4250.1530.627DA-1 (joint-opt)

0.4160.1290.633BL

Y+F�

MM+Y�

FF+M�

YMethod

SourceDomain

TargetDomain

SourceDomain

TargetDomain using generalizable features is effective

domain cross validation is more effective than joint optimization

F: fly M: mouse Y: yeast

&�'�(�)�*+�,�,�) -�.�/�0�+�,�,�) 1��

Experiments: Adaptation

0.5010.3050.759DA-2-SSL0.4580.2410.633BL-SSL

Y+F � MM+Y � FF+M � YMethod

SourceDomain

TargetDomain

F: fly M: mouse Y: yeast

domain-adaptive bootstrapping is more effective than regular bootstrapping

SourceDomain

TargetDomain

Page 19: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

19

��������������� ������������� ���

Experiments: Adaptation

domain-adaptive SSL is more effective, especially with a small number of pseudo labels

��������������� ������������� ���

Conclusions and future work

� Two-stage domain adaptation� Generalization: outperformed standard supervised

learning� Adaptation: outperformed standard bootstrapping

� Two ways to find generalizable features� Domain cross validation is more effective

� Future work� Single source domain?� Setting parameters h and m

Page 20: Jing Jiang & ChengXiang Zhai07.pdf · Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign What is domain adaptation? ... Adaptation:

20

��������������� ������������� ���

References

� S. Ben-David, J. Blitzer, K. Crammer & F. Pereira. Analysis of representations for domain adaptation. NIPS 2007.

� J. Blitzer, R. McDonald & F. Pereira. Domain adaptation with structural correspondence learning. EMNLP 2006.

� H. Daumé III. Frustratingly easy domain adaptation. ACL 2007.

��������������� ������������� ���

Thank you!