
Page 1: Generation of Descriptive Elements for Text

Generation of Descriptive Elements for Text

Mutsugu Kuboki, Kazuhide Yamamoto

Nagaoka University of Technology, Japan


Page 2: Generation of Descriptive Elements for Text

What do these texts say about the query?

Query: “LPF”

We can't tell immediately. A text may not describe the query at all.

We want to know the content of web search results at a glance, but this is difficult.

Page 3: Generation of Descriptive Elements for Text

We try to generate Descriptive Elements (DEs) for each text.

Query: “LPF”

Structure, Type, …

Background, Work, …

Page 4: Generation of Descriptive Elements for Text

Main tasks

1. Extracting DE candidates

2. Assigning DEs to text (tried on Japanese texts only)


Page 6: Generation of Descriptive Elements for Text

Extraction of DEs

DEs differ from query to query.

Examples
Apple: Kind, Size, Area-of-Production, …
LPF: Role, Structure, Performance, …

We try to obtain DE candidates in advance.

Page 7: Generation of Descriptive Elements for Text

Extraction of DEs

We extract DEs from web search results using the following rules.

(1) Pattern

‘noun or compound noun’ of ‘query’ (eng)

‘query “no” noun or compound noun’ (jpn)

Note: the Japanese word “no” means “of”.

ex)

enforcement of LawProtectingPersonalInformation (eng)

kojinjyouhouhogohou-no-shikou (jpn)

(2) A DE is a single word in Japanese.
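The pattern rule can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the text has already been split into (surface, part-of-speech) pairs by a morphological analyzer, and the function name `extract_de_candidates`, the romanized tokens, and the `"noun"` tag are all hypothetical.

```python
from collections import Counter


def extract_de_candidates(tagged_sentences, query, particle="no"):
    """Count nouns / compound nouns that directly follow 'query no'.

    tagged_sentences: list of sentences, each a list of (surface, pos) pairs.
    The tagging is assumed to come from a morphological analyzer.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        for i in range(len(sentence) - 2):
            if sentence[i][0] != query or sentence[i + 1][0] != particle:
                continue
            # Collect the run of consecutive nouns after the particle.
            nouns = []
            for surface, pos in sentence[i + 2:]:
                if pos != "noun":
                    break
                nouns.append(surface)
            if nouns:
                # A compound noun is treated as one candidate DE.
                counts["".join(nouns)] += 1
    return counts


# Hypothetical, romanized toy input.
sentences = [
    [("kojinjouhouhogohou", "noun"), ("no", "particle"),
     ("shikou", "noun"), ("ga", "particle")],
    [("kojinjouhouhogohou", "noun"), ("no", "particle"),
     ("shikou", "noun"), ("joukyou", "noun")],
    [("kojinjouhouhogohou", "noun"), ("wa", "particle"),
     ("houritsu", "noun")],
]
candidates = extract_de_candidates(sentences, "kojinjouhouhogohou")
print(candidates.most_common())
```

Counting the candidates makes it straightforward to drop low-frequency ones later.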

Page 8: Generation of Descriptive Elements for Text

Candidate extraction

Query: ‘law protecting personal information’ (“kojinjouhouhogohou” in Japanese)

Data: top 10,000 Google search results

Evaluation: we evaluate the candidates manually.

Page 9: Generation of Descriptive Elements for Text

Result of candidate extraction

Candidates: 366
Adequate DEs: 289 (79%)
Inadequate DEs: 77 (21%)

Adequate: infraction, operation, influence, …

Inadequate: learning, expert, …

79% of the candidates are useful.

Page 10: Generation of Descriptive Elements for Text

Result of candidate extraction

Candidates: 366
Adequate DEs: 289 (79%)
Inadequate DEs: 77 (21%)

The above results include many low-frequency DEs. These DEs are removed from the candidates.

The next experiment uses 54 DEs from the adequate candidates.

Page 11: Generation of Descriptive Elements for Text

Main tasks

1. Extracting DE candidates

2. Assigning DEs to text (tried on Japanese texts only)

Page 12: Generation of Descriptive Elements for Text

Method

We assume that texts with the same DE contain the same words.

(1) Extract text from the web.

Paragraphs of DE X:
Paragraph 1 = {w1, w2, w3, …}
Paragraph 2 = {w2, w3, w4, …}
…

(2) Extract co-occurring words.

{w2, w3}

(3) Collect Triggers.

Trigger of DE X: {w2, w3}

Triggers consist of 1, 2, or 3 morphemes.

Page 13: Generation of Descriptive Elements for Text

Method

We assume that texts with the same DE contain the same words.

Triggers of DE X: {w2, w3}, {w4, w7, w9}, {w11}, …

Does the text include a Trigger?

Yes: the text is assigned DE X.

No: the text is not DE X.
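The decision above can be sketched as a set-containment test. This assumes a text "includes" a Trigger when it contains every word of that Trigger. The slides do not state this explicitly, and the function name `assign_des` and the toy data are hypothetical.

```python
def assign_des(text_words, triggers_by_de):
    """Return the DEs whose Triggers match the text.

    text_words: iterable of content words in the text.
    triggers_by_de: dict mapping a DE name to a list of Triggers,
    where each Trigger is a set of 1 to 3 words.
    Assumption: a Trigger matches when all of its words occur in the text.
    """
    words = set(text_words)
    return sorted(
        de
        for de, triggers in triggers_by_de.items()
        if any(set(trigger) <= words for trigger in triggers)
    )


# Toy Triggers following the notation on the slide.
triggers_by_de = {
    "X": [{"w2", "w3"}, {"w4", "w7", "w9"}, {"w11"}],
    "Y": [{"w5", "w6"}],
}
print(assign_des(["w1", "w2", "w3"], triggers_by_de))
print(assign_des(["w2", "w5"], triggers_by_de))
```

A text can receive several DEs at once under this test, which is one reason the number of nominations per text can be large.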

Page 14: Generation of Descriptive Elements for Text

How to make Triggers (1)

1. Extract paragraphs that include “query-no-DE” (jpn) from the web.

2. Extract the content words from the paragraphs.

3. Extract the words that co-occur across paragraphs of the same DE.

The resulting word sets are the Triggers.

Page 15: Generation of Descriptive Elements for Text

How to make Triggers (2)

Triggers that match either of the following rules are excluded.

• The Trigger appears in 10% or fewer of all paragraphs that include ‘query-no-DE’.

• The Trigger is the same as a query word.
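Trigger construction with the two exclusion rules can be sketched as below. This is one possible reading, not the authors' code: it assumes each paragraph is already reduced to a set of content words, treats every combination of 1 to 3 words as a Trigger candidate, and counts in how many paragraphs each combination occurs. The name `build_triggers` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_triggers(paragraphs, query_words, max_size=3, min_ratio=0.10):
    """Build Triggers for one DE from its 'query-no-DE' paragraphs.

    paragraphs: list of sets of content words, one set per paragraph.
    query_words: words of the query, which may not be used in Triggers.
    A candidate is kept only if it occurs in MORE than min_ratio of the
    paragraphs (candidates at 10% or under are excluded).
    """
    query_words = set(query_words)
    counts = Counter()
    for words in paragraphs:
        usable = sorted(set(words) - query_words)
        for size in range(1, max_size + 1):
            for combo in combinations(usable, size):
                counts[frozenset(combo)] += 1
    threshold = min_ratio * len(paragraphs)
    return {trigger for trigger, n in counts.items() if n > threshold}


# Toy data following the slide notation; "q" stands for a query word.
paragraphs = [{"q", "w1", "w2", "w3"}, {"q", "w2", "w3", "w4"}]
shared = build_triggers(paragraphs, {"q"}, min_ratio=0.5)
print(sorted(sorted(t) for t in shared))
```

Enumerating all combinations grows quickly with paragraph length, so a real system would need to bound the vocabulary first.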

Page 16: Generation of Descriptive Elements for Text

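How to make Triggers (3)

We run a pilot test and then apply combinations of the following restriction rules.

(1) [Used Triggers] Triggers that were used in the pilot test.

This has the effect of increasing accuracy.

(2) [Unused Triggers] Triggers that caused more than two assignment mistakes.

(3) [Unused Triggers] Triggers that are used by more than two DEs.

These lead to errors (they are not a decisive factor).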
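The three restriction rules can be sketched as filters over the Trigger sets. The slides are terse here, so this is an interpretation under stated assumptions: “over two” is read as “more than two”, mistakes are counted per Trigger, and the names `restrict_triggers`, `used_in_pilot`, and `mistakes` are hypothetical.

```python
from collections import Counter


def restrict_triggers(triggers_by_de, used_in_pilot, mistakes, rules=(1, 2, 3)):
    """Apply a combination of restriction rules (1), (2), (3).

    triggers_by_de: dict mapping a DE name to a set of Triggers (frozensets).
    used_in_pilot: set of Triggers that fired in the pilot test.
    mistakes: dict mapping a Trigger to its number of wrong assignments.
    """
    # Number of DEs that share each Trigger, needed for rule (3).
    de_count = Counter(t for ts in triggers_by_de.values() for t in set(ts))
    restricted = {}
    for de, triggers in triggers_by_de.items():
        kept = set(triggers)
        if 1 in rules:  # keep only Triggers used in the pilot test
            kept = {t for t in kept if t in used_in_pilot}
        if 2 in rules:  # drop Triggers with more than two mistakes
            kept = {t for t in kept if mistakes.get(t, 0) <= 2}
        if 3 in rules:  # drop Triggers shared by more than two DEs
            kept = {t for t in kept if de_count[t] <= 2}
        restricted[de] = kept
    return restricted


t1, t2, t3, t4 = (frozenset({w}) for w in ("a", "b", "c", "d"))
triggers_by_de = {"A": {t1, t2, t3}, "B": {t2}, "C": {t2, t4}}
used_in_pilot = {t1, t2, t3}
mistakes = {t3: 3}
print(restrict_triggers(triggers_by_de, used_in_pilot, mistakes))
```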

Page 17: Generation of Descriptive Elements for Text

Result of co-occurrence Triggers

• “1, 2, 3 Trigger” gives the number of morphemes in a Trigger.

• (1), (2), (3) are the restriction rules.

Type                    Recall  Precision  F value  Average nominations
Data (100 texts)        0.72    0.06       0.10     54.0
1 Trigger (1)           0.70    0.07       0.13     41.4
2 Trigger (1)           0.70    0.08       0.14     36.5
3 Trigger (1)           0.62    0.09       0.16     27.3
1 Trigger (1)(2)        0.42    0.15       0.22     5.9
2 Trigger (1)(2)        0.54    0.10       0.17     20.9
3 Trigger (1)(2)        0.55    0.10       0.16     21.8
1 Trigger (1)(2)(3)     0.37    0.16       0.22     3.4
2 Trigger (1)(2)(3)     0.52    0.10       0.17     18.5
3 Trigger (1)(2)(3)     0.55    0.10       0.17     20.3
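The slides do not define the F value. Assuming it is the usual harmonic mean of recall and precision, it can be recomputed from the other two columns as in this small sketch (the function name `f_value` is hypothetical). Because the table shows rounded recall and precision, a recomputed value can differ from the table in the last digit.

```python
def f_value(recall, precision):
    """Harmonic mean of recall and precision (assumed definition)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Two rows from the table: "1 Trigger (1)" and "1 Trigger (1)(2)".
print(round(f_value(0.70, 0.07), 2))
print(round(f_value(0.42, 0.15), 2))
```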

Page 18: Generation of Descriptive Elements for Text

Result of co-occurrence Triggers

• Recall is high and precision is low.

• The restriction rules are effective (the number of nominations decreases).


Page 19: Generation of Descriptive Elements for Text

Issues of the proposed method

Precision is low.

We want to know which factors decide DEs.

Let's try stronger rules: modification-relation Triggers.

Note: the next experiment uses 19 DEs for simplicity (similar DEs are removed from the candidates).

Page 20: Generation of Descriptive Elements for Text

Modification-relation Triggers

Patterns used:

1. a noun and a DE

2. a noun and a synonym of a DE

3. a noun and a hyponym of a DE

Synonyms and hyponyms are obtained from Japanese WordNet.

Page 21: Generation of Descriptive Elements for Text

Result of modification-relation Triggers

Columns are system/answer:

p/p = the system assigns a DE and the answer data agrees

p/n = the system assigns a DE by mistake

n/p = the system does not find a DE that the answer data has

n/n = the system correctly recognizes a text without the DE

Trigger      Prec.  p/p  p/n  n/p  n/n
All          0.31   11   24   181  1615
DE           0.67   6    3    -    -
Synonym      0.21   3    11   -    -
Hyponym      0.17   2    10   -    -
Answer data  -      192  -    -    1708
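The Prec. column follows from the p/p and p/n counts: precision is the share of system assignments that are correct. A short check, using only numbers from the table (the helper name `precision` is hypothetical):

```python
def precision(pp, pn):
    """Correct assignments divided by all assignments made by the system."""
    total = pp + pn
    return pp / total if total else 0.0


# (p/p, p/n) counts per Trigger type, taken from the table.
rows = {"All": (11, 24), "DE": (6, 3), "Synonym": (3, 11), "Hyponym": (2, 10)}
for name, (pp, pn) in rows.items():
    print(f"{name}: {precision(pp, pn):.2f}")
```

The DE, Synonym, and Hyponym counts sum to the All row (6 + 3 + 2 = 11 and 3 + 11 + 10 = 24).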

Page 22: Generation of Descriptive Elements for Text

Result of modification relation Triggers

The results contain many mistakes.

Are Triggers ineffective for judging whether a DE applies?

We check the results.

Page 23: Generation of Descriptive Elements for Text

Result: p/n (assignment errors) (1)

Almost all Triggers consist of words related to the DE.

In 22 of the 24 p/n results, the Trigger consists of related words.

Examples
Operation (cabinet, citizen, month)
Enforcement-status (announcement, cabinet, year)

The Trigger words are related to the DE, but precision is still low.

Page 24: Generation of Descriptive Elements for Text

Result: p/n (assignment errors) (2)

Conclusion

The Trigger words are not necessarily the cause of the errors.

Judging only whether a text contains the keywords does not ensure precision.

What factor would increase precision?

Page 25: Generation of Descriptive Elements for Text

Result: n/p (the system can't find the DE)

How do people decide DEs? We check the n/p results (181 pairs) manually.

Factor is unclear
• 28 pairs (15%)
• In 11 pairs, the DE is “description”.
• The others are low-frequency cases.

Factor is clear
• 153 pairs (85%)
• These contain a specific expression: a word, several words, or a phrase.

Page 26: Generation of Descriptive Elements for Text

How people decide DEs

Conclusion

People use only part of the text to decide almost all DEs (the part that explains the query).

Point

Don't use the whole text.

Example
…Law protecting personal information is established for fiscal year 2003…

Page 27: Generation of Descriptive Elements for Text

Conclusions

Effect of Triggers

Triggers do not ensure precision.

How to increase precision

Don't use the whole text.

The system has to use only the part of the text that explains the query.
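One simple way to approximate “the part of the text that explains the query” is to keep only the sentences that mention the query and apply the Triggers to those. This is an assumption made for illustration, not a method evaluated in the slides. The function name `query_sentences` is hypothetical, and the sentence splitting is deliberately naive.

```python
import re

# A sentence is a run of non-terminator characters plus an optional terminator.
SENTENCE = re.compile(r"[^。.!?！？]+[。.!?！？]?")


def query_sentences(text, query):
    """Return the sentences of `text` that contain `query`."""
    sentences = (s.strip() for s in SENTENCE.findall(text))
    return [s for s in sentences if s and query in s]


text = "LPF is a filter. It was cold today. An LPF passes low frequencies."
print(query_sentences(text, "LPF"))
```

Splitting on every period breaks on abbreviations and decimals, so a real system would use a proper sentence segmenter.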

Page 28: Generation of Descriptive Elements for Text

Future work

Assign DEs using only part of the text.

Look into…

• The effect of using part of the text

• Other factors that decide DEs


Page 30: Generation of Descriptive Elements for Text

Mistake in my paper…

I'm sorry…

In “III. Experiments and Results, C. Assigning DE Using Restricted Triggers”, please change (1) to (2).

Page 31: Generation of Descriptive Elements for Text

Example of p/p

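● Definition (生存: living, 識別: recognize)
Even information about a deceased person must be handled as personal information under the law protecting personal information if its content can identify (識別) a living (生存) individual, such as a bereaved family member.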

Page 32: Generation of Descriptive Elements for Text

Example of p/n

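● Effect (多い: many, 施行: enforcement)
石戸谷和政, secretary-general of the Obihiro branch of 道中小企業家同友会, which hosted the event, speaks for the managers: “Frankly, many (多い) managers do not know where to start with the law protecting personal information. Enforcement (施行) is right around the corner, and they are in a tight spot.”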

Page 33: Generation of Descriptive Elements for Text

Example of n/p

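● Effect (影響)
Behind the emergence of an information-leakage offense is the recognition that employees leak personal information in many cases and that technical defenses have limits. Perfect information security is impossible. Even without aiming for perfection, information security measures cost money, and since the enforcement of the law protecting personal information, companies have been suffering under a heavy cost burden.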