Language Models Natural Language Processing Emory University Jinho D. Choi

CS517: Language Models


Language Models

Natural Language Processing

Emory University

Jinho D. Choi

Probability

Probability of tomorrow being cloudy?

Observed days: sunny 5 days, cloudy 3 days, snowy 2 days

$$P(\text{cloudy}) = \frac{C(\text{cloudy})}{C(\text{sunny}) + C(\text{cloudy}) + C(\text{snowy})} = \frac{3}{10}$$

Conditional Probability

Probability of tomorrow being cloudy if today is snowy?

$$P(\text{cloudy} \mid \text{snowy}) = \frac{C(\text{snowy}, \text{cloudy})}{C(\text{snowy})} = \frac{1}{2}$$

Conditional Probability

Probability of tomorrow being cloudy if today and yesterday are snowy?

$$P(\text{cloudy} \mid \text{snowy}, \text{snowy}) = \frac{C(\text{snowy}, \text{snowy}, \text{cloudy})}{C(\text{snowy}, \text{snowy})} = \frac{1}{1} = 1$$
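The three estimates above are all ratios of counts, which can be sketched in a few lines of Python. The slides only give the totals (5 sunny, 3 cloudy, 2 snowy days), so the day-by-day sequence `days` below is a hypothetical history chosen to match those totals; `ngram_counts` and `prob` are illustrative helper names, not part of the lecture.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical 10-day history, chosen only so that its counts match the slides:
# sunny 5 days, cloudy 3 days, snowy 2 days.
days = ["sunny", "sunny", "sunny", "snowy", "snowy",
        "cloudy", "cloudy", "sunny", "sunny", "cloudy"]

def ngram_counts(seq, n):
    """Count every run of n consecutive items in seq."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def prob(seq, target, history=()):
    """Estimate P(target | history) as C(history, target) / C(history)."""
    history = tuple(history)
    joint = ngram_counts(seq, len(history) + 1)[history + (target,)]
    if not history:
        # No history: plain relative frequency C(target) / N.
        return Fraction(joint, len(seq))
    return Fraction(joint, ngram_counts(seq, len(history))[history])

p_cloudy = prob(days, "cloudy")
p_cloudy_given_snowy = prob(days, "cloudy", ["snowy"])
p_cloudy_given_snowy_snowy = prob(days, "cloudy", ["snowy", "snowy"])
print(p_cloudy, p_cloudy_given_snowy, p_cloudy_given_snowy_snowy)
```

With this particular sequence the three values agree with the slides (3/10, 1/2, and 1). A different ordering of the same ten days would keep the first value but could change the conditional ones.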

Joint Probability

Probability of next 2 days being cloudy, sunny?

$$P(\text{cloudy}, \text{sunny}) = P(\text{cloudy}) \cdot P(\text{sunny} \mid \text{cloudy})$$

Joint Probability

Probability of next 3 days being snowy, cloudy, sunny?

$$P(\text{snowy}, \text{cloudy}, \text{sunny}) = P(\text{snowy}) \cdot P(\text{cloudy} \mid \text{snowy}) \cdot P(\text{sunny} \mid \text{snowy}, \text{cloudy})$$

N-gram Models

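1-gram (Unigram)

$$P(w_i) = \frac{C(w_i)}{\sum_{\forall k} C(w_k)} = \frac{C(w_i)}{N}$$

N = # of tokens (token vs. type?)

2-gram (Bigram)

$$P(w_{i+1} \mid w_i) = \frac{C(w_i, w_{i+1})}{\sum_{\forall k} C(w_i, w_k)} = \frac{C(w_i, w_{i+1})}{C(w_i)}$$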


N-gram Models

• Unigram model

- Given any word w, it shows how likely w is to appear in context.

- This is known as the likelihood (probability) of w, written as P(w).

- How likely is the word “Emory” to appear in context?

Emory University was founded as Emory College by John Emory. Emory University is 16th among the colleges and universities in US.

$$P(\text{Emory}) = \frac{4}{23} \approx 0.1739$$

- Does this mean “Emory” appears 17.39% of the time in any context?

- How can we measure more accurate likelihoods?
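As a rough check of the 4/23 figure, the unigram estimate can be reproduced by counting tokens. This is a minimal sketch that assumes whitespace tokenization with each period split off as its own token, which is the only reading under which the two sentences contain 23 tokens; `unigram` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction

# The two example sentences, with periods pre-separated so that split()
# yields them as individual tokens (an assumption about the tokenization).
text = ("Emory University was founded as Emory College by John Emory . "
        "Emory University is 16th among the colleges and universities in US .")
tokens = text.split()

counts = Counter(tokens)   # C(w) for every word type
N = len(tokens)            # number of tokens, not number of types

def unigram(word):
    """Maximum likelihood estimate P(word) = C(word) / N."""
    return Fraction(counts[word], N)

p_emory = unigram("Emory")
print(N, len(counts), p_emory, float(p_emory))
```

Here `N` is the token count (23) while `len(counts)` is the type count (18); the denominator of the unigram estimate is the former.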


N-gram Models

• Bigram model

- Given any words w_i and w_j in sequence, it shows the likelihood of w_j following w_i in context.

- This can be represented as the conditional probability P(w_j | w_i).

- What is the most likely word following “Emory”?

Emory University was founded as Emory College by John Emory. Emory University is the 20th among the national universities in US.

$$P(\text{University} \mid \text{Emory}) = \frac{2}{4} = 0.5$$

$$P(\text{College} \mid \text{Emory}) = \frac{1}{4} = 0.25$$

$$P(\text{.} \mid \text{Emory}) = \frac{1}{4} = 0.25$$

$$\arg\max_k P(w_k \mid \text{Emory})$$
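The bigram estimates and the argmax over followers of “Emory” can be sketched as below. As before, whitespace tokenization with separated periods is an assumption, and `bigram`, `followers`, and `best` are illustrative names.

```python
from collections import Counter
from fractions import Fraction

text = ("Emory University was founded as Emory College by John Emory . "
        "Emory University is the 20th among the national universities in US .")
tokens = text.split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))  # adjacent token pairs

def bigram(prev, word):
    """MLE P(word | prev) = C(prev, word) / C(prev).

    C(prev) equals the sum of C(prev, w_k) over k as long as prev is not
    the very last token of the corpus, which holds for "Emory" here.
    """
    return Fraction(bigram_counts[(prev, word)], unigram_counts[prev])

# Every word observed right after "Emory", with its conditional probability.
followers = {w: bigram("Emory", w) for (p, w) in bigram_counts if p == "Emory"}
best = max(followers, key=followers.get)  # argmax_k P(w_k | Emory)
print(followers, best)
```

On this corpus `best` is "University", with probability 1/2.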

Maximum Likelihood

$$x_1^n = x_1, \ldots, x_n$$

Chain rule

$$P(x_1^n) = P(x_1) \cdot P(x_2 \mid x_1) \cdot P(x_3 \mid x_1^2) \cdots P(x_n \mid x_1^{n-1})$$

Any practical issue?

(x_1, …, x_k) can be very sparse.

Markov assumption

$$P(x_k \mid x_1^{k-1}) \approx P(x_k \mid x_{k-1})$$

$$P(x_1^n) \approx P(x_1) \cdot P(x_2 \mid x_1) \cdot P(x_3 \mid x_2) \cdots P(x_n \mid x_{n-1})$$


Maximum Likelihood

• Maximum likelihood

- Given any word sequence w_i, …, w_n, it shows how likely this sequence is to appear in context.

- This can be represented as the joint probability P(w_i, …, w_n).

- How likely is the sequence “you know” to appear in context?

you know , I know you know that you do .

$$P(\text{you}, \text{know}) = \frac{2}{11} \quad \text{(not 10?)}$$

Chain rule

$$P(\text{you}) \cdot P(\text{know} \mid \text{you}) = \frac{3}{11} \cdot \frac{2}{3} = \frac{2}{11}$$
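The chain-rule computation for “you know” can be checked with a short sketch that scores any word sequence under the first-order Markov assumption, using maximum likelihood estimates from the 11-token example. `sequence_prob` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction

tokens = "you know , I know you know that you do .".split()
N = len(tokens)                          # 11 tokens
uni = Counter(tokens)                    # C(w)
bi = Counter(zip(tokens, tokens[1:]))    # C(w_prev, w), 10 adjacent pairs

def sequence_prob(words):
    """P(w_1..w_n) ~ P(w_1) * product of P(w_k | w_{k-1}), all by MLE."""
    p = Fraction(uni[words[0]], N)
    for prev, word in zip(words, words[1:]):
        if uni[prev] == 0:
            # Unseen history: the MLE has no estimate, so treat it as 0.
            return Fraction(0)
        p *= Fraction(bi[(prev, word)], uni[prev])
    return p

p_you_know = sequence_prob(["you", "know"])
print(p_you_know)
```

The result is 3/11 · 2/3 = 2/11. Counting the pair directly over the 10 adjacent pairs would instead give 2/10, which is the discrepancy the "not 10?" remark draws attention to.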


Word Segmentation

• Word segmentation

- Segment a chunk of string into a sequence of words.

- Is there more than one possible sequence?

- Choose the sequence that most likely appears in context.

youknow

P (you) · P (know|you) > P (yo) · P (uk|yo) · P (now|uk)

log(P (you) · P (know|you)) > log(P (yo) · P (uk|yo) · P (now|uk))

log(P (you)) + log(P (know|you)) > log(P (yo)) + log(P (uk|yo)) + log(P (now|uk))
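The comparison above can be sketched as a tiny segmenter that enumerates every split of the string into known words and keeps the one with the highest log probability. All probability values in `unigram_p` and `bigram_p` are hypothetical numbers invented for illustration; in practice they would be estimated from a corpus.

```python
import math

# Hypothetical probabilities, invented purely for illustration.
unigram_p = {"you": 0.03, "yo": 0.001, "know": 0.02, "uk": 0.0005, "now": 0.01}
bigram_p = {("you", "know"): 0.2, ("yo", "uk"): 0.001, ("uk", "now"): 0.01}

def log_score(words):
    """log P(w_1) + sum of log P(w_k | w_{k-1}); -inf if a factor is unknown."""
    p = unigram_p.get(words[0], 0.0)
    total = math.log(p) if p > 0 else -math.inf
    for pair in zip(words, words[1:]):
        p = bigram_p.get(pair, 0.0)
        total += math.log(p) if p > 0 else -math.inf
    return total

def segmentations(s):
    """Yield every way to split s into words from the vocabulary."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        if s[:i] in unigram_p:
            for rest in segmentations(s[i:]):
                yield [s[:i]] + rest

candidates = list(segmentations("youknow"))
best = max(candidates, key=log_score)
print(candidates, best)
```

Summing logs rather than multiplying probabilities avoids numeric underflow on long sequences and leaves the ranking unchanged, since the logarithm is monotonic. Exhaustive enumeration is only practical for short strings; a dynamic-programming search would be the usual replacement.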

Laplace Smoothing

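$$P(x_1^n) \approx P(x_1) \cdot P(x_2 \mid x_1) \cdot P(x_3 \mid x_2) \cdots P(x_n \mid x_{n-1})$$

What if P(x_1) = 0? Then P(x_1^n) ≈ 0 ← BAD!!

Laplace Smoothing

$$P(x_i) = \frac{C(x_i)}{\sum_k C(x_k)}$$

$$P_l(x_i) = \frac{C(x_i) + \alpha}{\sum_k (C(x_k) + \alpha)} = \frac{C(x_i) + \alpha}{\sum_k C(x_k) + \alpha|X|} = \frac{C(x_i) + \alpha}{N + \alpha|X|}$$

$$P_l(x_?) = \frac{C(x_?) + \alpha}{\sum_k C(x_k) + \alpha|X|} = \frac{\alpha}{N + \alpha|X|}$$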
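A minimal sketch of the smoothed unigram estimate follows. It assumes that |X| is the vocabulary size and that the vocabulary reserves one extra slot, written `<unk>` here, for unseen words; both choices and the name `laplace_unigram` are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

def laplace_unigram(tokens, vocab, alpha=1):
    """Return a function word -> (C(word) + alpha) / (N + alpha * |X|)."""
    alpha = Fraction(alpha)
    counts = Counter(tokens)
    denom = len(tokens) + alpha * len(vocab)
    return lambda word: (counts[word] + alpha) / denom

tokens = "you know , I know you know that you do .".split()
vocab = set(tokens) | {"<unk>"}   # assumption: one extra slot for unseen words
p = laplace_unigram(tokens, vocab)
print(p("you"), p("<unk>"), sum(p(w) for w in vocab))
```

With α = 1 the 11 tokens and 8 vocabulary entries give P_l(you) = 4/19 and P_l(<unk>) = 1/19, and the probabilities over the vocabulary still sum to 1.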

Laplace Smoothing

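$$P(x_j \mid x_i) = \frac{C(x_i, x_j)}{\sum_k C(x_i, x_k)} = \frac{C(x_i, x_j)}{C(x_i)}$$

$$P_l(x_j \mid x_i) = \frac{C(x_i, x_j) + \alpha}{\sum_k (C(x_i, x_k) + \alpha)} = \frac{C(x_i, x_j) + \alpha}{\sum_k C(x_i, x_k) + \alpha|X_{i,*}|} = \frac{C(x_i, x_j) + \alpha}{C(x_i) + \alpha|X_{i,*}|}$$

$$P_l(x_? \mid x_i) = \frac{\alpha}{C(x_i) + \alpha|X_{i,*}|}$$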
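The bigram version can be sketched the same way. The slides do not spell out how |X_{i,*}| is chosen; the sketch below assumes it is the full vocabulary size, so that every word may follow x_i. The name `laplace_bigram` is illustrative.

```python
from collections import Counter
from fractions import Fraction

def laplace_bigram(tokens, vocab, alpha=1):
    """Return p(prev, word) = (C(prev, word) + alpha) / (C(prev) + alpha * |X|).

    Assumption: |X_{i,*}| is taken to be the vocabulary size.
    """
    alpha = Fraction(alpha)
    # Count each word only where it has a successor, so that
    # C(prev) equals the sum over k of C(prev, x_k).
    history = Counter(tokens[:-1])
    pairs = Counter(zip(tokens, tokens[1:]))

    def p(prev, word):
        return (pairs[(prev, word)] + alpha) / (history[prev] + alpha * len(vocab))
    return p

tokens = "you know , I know you know that you do .".split()
vocab = set(tokens)
p = laplace_bigram(tokens, vocab)
print(p("you", "know"), p("you", "I"))
```

On the 11-token example with α = 1, the seen bigram "you know" gets (2 + 1) / (3 + 7) = 3/10 and the unseen bigram "you I" gets 1/10.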

Discount Smoothing

• Issues with Laplace smoothing

- Unfair discounts.

- Unseen likelihood may get penalized too harshly when the minimum count is much greater than α.

- How to reduce the gap between the minimum count and unseen count?

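$$\frac{50}{100} = 0.5 \;\rightarrow\; \frac{50 + 1}{100 + 10} = 0.46 \quad (-0.04)$$

$$\frac{10}{100} = 0.1 \;\rightarrow\; \frac{10 + 1}{100 + 10} = 0.1 \quad (0)$$

$$\frac{1}{100} = 0.01 \;\rightarrow\; \frac{1 + 1}{100 + 10} = 0.018 \quad (+0.008)$$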
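The uneven shifts in the example can be reproduced numerically. The slide only shows the denominator 100 + 10, so the sketch assumes α = 1 and |X| = 10 (any pair with α|X| = 10 and α = 1 in the numerator fits); `laplace` is an illustrative name.

```python
def laplace(count, total=100, alpha=1, vocab_size=10):
    """Laplace-smoothed estimate (count + alpha) / (total + alpha * vocab_size)."""
    return (count + alpha) / (total + alpha * vocab_size)

for c in (50, 10, 1):
    mle = c / 100
    smoothed = laplace(c)
    # The sign of the change shows which counts pay for the smoothing.
    print(f"C={c:>2}  MLE={mle:.3f}  Laplace={smoothed:.3f}  change={smoothed - mle:+.3f}")
```

Frequent items lose probability mass, rare items gain it, and items at the average count (here 10 out of 100 over 10 types) are left unchanged.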

Discount Smoothing

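Laplace

$$P_l(x_i) = \frac{C(x_i) + \alpha}{N + \alpha|X|}$$

$$P_l(x_?) = \frac{\alpha}{N + \alpha|X|}$$

$$P_l(x_j \mid x_i) = \frac{C(x_i, x_j) + \alpha}{C(x_i) + \alpha|X_{i,*}|}$$

$$P_l(x_? \mid x_i) = \frac{\alpha}{C(x_i) + \alpha|X_{i,*}|}$$

Discount

$$P_d(x_i) = \frac{C(x_i) - P_d(x_?)}{N}$$

$$P_d(x_?) = \alpha \cdot \min_k P(x_k)$$

$$P_d(x_j \mid x_i) = \frac{C(x_i, x_j) - P_d(x_? \mid x_i)}{C(x_i)}$$

$$P_d(x_? \mid x_i) = \alpha \cdot \min_k P(x_k \mid x_i)$$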
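A literal transcription of the unigram discount formulas might look like the sketch below. The slides give no value for α, so α = 0.5 is an assumption, as is the name `discount_unigram`; the code follows the formulas as written and does not check that the resulting values sum to 1.

```python
from collections import Counter

def discount_unigram(tokens, alpha=0.5):
    """Discount smoothing for unigrams, transcribed from the slide formulas.

    P_d(x?)  = alpha * min_k P(x_k)
    P_d(x_i) = (C(x_i) - P_d(x?)) / N
    """
    counts = Counter(tokens)
    n = len(tokens)
    p_unseen = alpha * min(counts.values()) / n   # alpha * smallest MLE probability

    def p(word):
        if counts[word] == 0:
            return p_unseen
        return (counts[word] - p_unseen) / n
    return p

tokens = "you know , I know you know that you do .".split()
p = discount_unigram(tokens, alpha=0.5)
print(p("you"), p("zebra"))
```

Because the unseen probability is tied to the smallest observed probability rather than to a fixed α added to every count, an unseen word stays within a factor of α of the rarest seen word.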

Good-Turing Smoothing

N_c = the count of n-grams that appear c times

Type         C    MLE    C*
carp         4    0.33   5.00
perch        3    0.25   4.00
whitefish    2    0.17   3.00
trout        1    0.08   0.67
salmon       1    0.08   0.67
eel          1    0.08   0.67
?            0    0.00   0.17
Total       12    1.00

$$N_1 = 3, \quad N_2 = 1, \quad N_3 = 1, \quad N_4 = 1$$

$$C(x_i)^* = \frac{(C(x_i) + 1) \cdot N_{c+1}}{N_c}, \quad c = C(x_i)$$

$$P(x_?) = \frac{N_1}{N}$$

$$C(\text{eel})^* = \frac{(C(\text{eel}) + 1) \cdot N_2}{N_1} = \frac{2}{3}$$

$$P(\text{eel}) = \frac{2/3}{12}$$

$$P(\text{carp}) = \;?$$
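The adjusted counts can be recomputed from the fish counts with a short sketch. `adjusted_count` is an illustrative name, and returning `None` when N_{c+1} = 0 is a choice made here: the raw formula would give 0 for carp because no type appears 5 times, which is presumably why the slide asks about P(carp), and the sketch does not guess how that case should be handled.

```python
from collections import Counter
from fractions import Fraction

counts = {"carp": 4, "perch": 3, "whitefish": 2, "trout": 1, "salmon": 1, "eel": 1}
N = sum(counts.values())                  # 12 observations
freq_of_freq = Counter(counts.values())   # N_c: how many types occur c times

def adjusted_count(c):
    """Good-Turing C* = (c + 1) * N_{c+1} / N_c, or None if it cannot be formed."""
    if freq_of_freq[c + 1] == 0 or freq_of_freq[c] == 0:
        return None
    return Fraction((c + 1) * freq_of_freq[c + 1], freq_of_freq[c])

p_unseen = Fraction(freq_of_freq[1], N)      # N_1 / N
c_star_eel = adjusted_count(counts["eel"])   # (1 + 1) * N_2 / N_1
p_eel = c_star_eel / N
print(p_unseen, c_star_eel, p_eel, adjusted_count(counts["carp"]))
```

The sketch gives N_1 / N = 1/4 for unseen types, C* = 2/3 for eel, and P(eel) = 1/18.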

Backoff

• Backoff

- Bigrams are more accurate than unigrams.

- Bigrams are sparser than unigrams.

- Use bigrams in general, and use unigrams only when bigrams don’t exist.

$$P_b(x_j \mid x_i) = \begin{cases} P_{l|d}(x_j \mid x_i) & P(x_j \mid x_i) > 0 \\ \lambda \cdot P_{l|d}(x_j) & \text{otherwise} \end{cases}$$

How to measure?

$$\lambda = \alpha \cdot \frac{\langle P(x_j \mid x_i) \rangle_{i,j}}{\langle P(x_j) \rangle_j}$$
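The backoff rule can be sketched as follows, using Laplace smoothing for both component models. The weight `lam=0.4` is a hypothetical constant chosen for illustration rather than derived from the averaged probabilities; the vocabulary-size denominator and the name `backoff_model` are likewise assumptions.

```python
from collections import Counter

def backoff_model(tokens, lam=0.4, alpha=1.0):
    """Return p(prev, word): smoothed bigram if seen, else lam * smoothed unigram."""
    vocab_size = len(set(tokens))
    n = len(tokens)
    uni = Counter(tokens)
    history = Counter(tokens[:-1])            # words counted as bigram histories
    pairs = Counter(zip(tokens, tokens[1:]))

    def p_uni(word):                          # Laplace-smoothed unigram
        return (uni[word] + alpha) / (n + alpha * vocab_size)

    def p_bi(prev, word):                     # Laplace-smoothed bigram
        return (pairs[(prev, word)] + alpha) / (history[prev] + alpha * vocab_size)

    def p(prev, word):
        if pairs[(prev, word)] > 0:           # bigram observed: use it directly
            return p_bi(prev, word)
        return lam * p_uni(word)              # otherwise back off to the unigram
    return p

tokens = "you know , I know you know that you do .".split()
p = backoff_model(tokens)
print(p("you", "know"), p("you", "I"))
```

On the 11-token example, the seen bigram "you know" scores (2 + 1) / (3 + 7) = 0.3, while the unseen bigram "you I" falls back to 0.4 · (1 + 1) / (11 + 7).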

Interpolation

• Interpolation

- Unigrams and bigrams provide different but useful information.

- Use them both with different weights.

$$\hat{P}(x_j \mid x_i) = \lambda_1 \cdot P_{l|d}(x_j) + \lambda_2 \cdot P_{l|d}(x_j \mid x_i)$$

$$\sum_k \lambda_k = 1$$
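Interpolation can be sketched as a weighted sum of any two probability functions. The weights 0.3 and 0.7 and the constant component models in the usage lines are hypothetical placeholders; in practice the weights are usually tuned on held-out data and the components would be smoothed unigram and bigram models.

```python
def interpolate(p_uni, p_bi, lam1=0.3, lam2=0.7):
    """Return p(prev, word) = lam1 * p_uni(word) + lam2 * p_bi(prev, word)."""
    if abs(lam1 + lam2 - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return lambda prev, word: lam1 * p_uni(word) + lam2 * p_bi(prev, word)

# Hypothetical component models that return fixed values, for illustration only.
toy_unigram = lambda word: 0.1
toy_bigram = lambda prev, word: 0.5

p_hat = interpolate(toy_unigram, toy_bigram)
print(p_hat("you", "know"))
```

Unlike backoff, the unigram always contributes, so a bigram with a zero estimate still receives lam1 times the unigram probability.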