1
Hilofumi Yamamoto Tokyo Institute of Technology Bor Hodoˇ cek Osaka University A Study on the Distribution of Cooccurrence Weight Patterns of Classical Japanese Poetry Introduction more generic more specific P: Hairball effect or spoke effect (Yamamoto 2005) P: Difficult to observe all word adjacency features. O: Checking the distribution of cooccurrence weight (Yamamoto 2006) G: Drawing clear figures and extracting function words. S: Cooccurrence weight is like the mean value of two person’s weights. light heavy C. F. Gauss (1777–1855) x σ 0 -σ y Tall Head Tall Head Tall Head Long Tail Dinosaur Long Tail Dinosaur Long Tail Dinosaur Content Tail Function Tail Methods Material: the Hachidaish¯ u (ca. 905–1205) Calculation of Cooccurrence Weight: cw w(t, d) = (1+log tf (t, d)) · idf (t) cw(t 1 ,t 2 ,d) = (1+log ctf (t 1 ,t 2 ,d)) · cidf (t 1 ,t 2 ) cidf (t 1 ,t 2 ) = idf (t 1 ) · idf (t 2 ) idf (t) = log N df (t) Bell curve Distribution of cw becomes Bell curve. ring!ring! Over σ = Content Tail. Under -σ Function Tail. Result -2 -1 0 1 2 3 0 200 400 600 800 1000 z-value frequency Sakura (cherry) -2 -1 0 1 2 3 4 0 200 400 600 800 1000 1200 1400 1600 z-value frequency Ume (plum) Figure 1: Bell curves Table 1: Upper cutoff patterns of ame (sakura): cw = co-occurrence weight; z = z-value (normalized value of frequency). word annotations: ari(be), ba(cond.), ha(topic.), hana(flower), hito(human), keri(past.), ki(past.), koso(emphatic.), miru(see), mo (also), nasi(no exist), nu(neg.), o(obj.), omou(think), ramu(aux.will), su(do), te(p.), to(and), ware(we), zo(emphatic.), zu(neg.) cw z pattern cw z pattern cw z pattern 1 0.62 -0.91 mo–keri 11 0.59 -0.96 nasi–ha 21 0.52 -1.05 nu–o 2 0.62 -0.92 hana–o 12 0.57 -0.98 o–ramu 22 0.52 -1.05 o–zo 3 0.62 -0.92 o–koso 13 0.57 -0.98 mo–ramu 23 0.52 -1.05 miru–o 4 0.60 -0.94 zu–keri 14 0.57 -0.98 ha–ki 24 0.48 -1.09 ba–mo 5 0.60 -0.94 su–ha 15 0.56 -1.00 zu–mo 25 0.48 -1.09 o–keri 6 0.60 -0.94 to–ba 16 0.56 -1.00 o–te 26 0.43 -1.16 zu–ha 7 0.59 -0.96 ari–ha 17 0.55 -1.01 hito–mo 27 0.43 -1.16 to–o 8 0.59 -0.96 ari–mo 18 0.54 -1.02 zu–te 28 0.43 -1.16 te–ha 9 0.59 -0.96 ware–mo 19 0.52 -1.05 zo–ha 29 0.34 -1.27 o–ha 10 0.59 -0.96 nasi–o 20 0.52 -1.05 omou–o 30 0.34 -1.27 o–mo well as in classical texts (Fig. 2). Therefore we will attempt to divide t Table 2: Lower cutoff patterns of ame (sakura) in Kokinsh¯ u: 30 out of 164 patterns extracted; cw = co-occurrence weight; z = z-value (normalized value of fre- quency) word annotations: ba(cond.), bakari(only), besi(should be), chiru(fall), fukakusa(deepgreen), hana(flower), isa(already), kakusu(hide), katu(win), koku(pull), komoru(go deep inside), magiru(mix), makasu(entrust), maku(wind up), manimani(as it is), masi(as), mazu(mix), me(eye), minami(south), miyako(city), mono(thing), nagara(even if), sakura(cherry), si(emphasic.), sumi(black ink), tatu(start,stand), tazumu(being around), tu(past.), uturou(change), watasu(give), yamakaze(mountain wind), yamu(stop), yanagi(willow), yononaka(world) cw z pattern cw z pattern 1 3.86 3.18 yamu–manimani 106 2.38 1.31 si–fukakusa 2 3.75 3.04 minami–magiru 107 2.38 1.31 sakura–hana 3 3.67 2.93 minami–maku 108 2.38 1.31 sakura–isa 4 3.61 2.86 maku–magiru 109 2.38 1.31 sakura–ba 5 3.42 2.62 yanagi–koku 110 2.38 1.30 sakura–me 6 3.38 2.57 yamu–makasu 7 3.38 2.56 mazu–koku 155 2.17 1.04 chiru–katu 8 3.27 2.43 yanagi–mazu 156 2.17 1.04 bakari–sumi 9 3.26 2.42 sakura–yamu 157 2.16 1.03 maku–besi 10 3.25 2.40 minami–yamakaze 158 2.16 1.03 tatu–maku 159 2.16 1.03 tatu–tazumu 101 2.40 1.33 uturou–komoru 160 2.16 1.03 tazumu–tu 102 2.40 1.33 sakura–watasu 161 2.16 1.03 miyako–sakura 103 2.40 1.33 katu–nagara 162 2.16 1.02 kakusu–si 104 2.39 1.32 sakura–masi 163 2.14 1.00 yononaka–sakura 105 2.39 1.31 sakura–makasu 164 2.14 1.00 mono–sakura Conclusion omou (think.1) B1 VERB a3 LK a1 a2 DN gamma b1 b2 END omou (think) suru (do.2) 1 haseru (run) 1 (ra)reru (PASS/POT) 5 te/de (LK) 24 ni (LK) 1 ta (PERF) 5 no/koto (DN) 38 node (because.2) 1 hodo (as much as..) 2 yo (FP) 7 wa (FP) 13 . 37 u (CJR) 1 torawareru (be captured) 1 1 1 2 1 1 iru (be-AS) 23 kuru (toward-AS) 1 koi (love.2) 1 fuan (anxiety) 1 1 15 ka (Q) 1 2 5 5 1 5 da (ASSER) 34 4 sa (FP) 12 5 kara (because.1) 1 taisetsu (precious) 1 1 1 6 1 14 20 Figure 2: Construction of the predicate of omou (think) with Function Tail cherry blossom (28/131,4.28): CT ctf.>0.00; non-dist=off; idf=off; pruned under U:1: here way home 1 village.1 1 lodging 1 scatter.1 1 forget 1 confuse 1 do.1 1 (adv.neg) 1 (must) 1 while 2 ã® 2 ã ¦ 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 2 2 1 1 2 1 10 4 2 8 15 8 perplexed 1 (adv.will) 4 forcibly 1 ã ¨ 9 ã ´ 6 even 3 see.3 10 ã 11 ã ã 2 get used 1 know 1 say 2 (p.wish) 6 ã ¯ 7 bloom 1 ç ¡ã ˙ 5 (adv.would) 1 ã ˛ 4 break off 1 ã 1 ã 2 ã `ã 1 increase 1 love.1 1 will not 1 kurabu.pl 1 ã 2 cloud.vi 1 such as.1 1 ã ªã 2 wait 2 ã ° 8 ã 3 borrow 1 regrettable 1 here we go 1 like.2 1 blow 2 å¦ ä½¯ã « 1 regret 1 bear 1 think 1 fast/early 1 ã ˙ 1 (ku) 3 lovely 1 sorrowful.1 1 1 1 1 1 2 2 1 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 6 3 2 2 1 1 2 (aux.past) 1 (aux.perf) 1 2 1 8 4 11 8 4 1 2 1 3 1 1 2 11 3 3 6 1 1 3 3 (suffix.forbid) 1 1 painful.1 1 high.1 2 (emph) 1 wish 1 1 (interj.) 2 Yoshino.PN 1 seemed 1 3 2 1 1 5 1 quiet 2 although 1 2 1 1 2 2 2 1 2 1 1 15 7 1 7 7 4 㠤㠤 1 seem 1 1 5 2 1 1 1 1 5 1 4 2 ã ˆã ˝ 2 2 1 1 6 2 1 11 1 7 7 2 2 3 2 2 2 1 2 2 1 1 2 2 3 3 1 1 4 2 2 1 2 1 4 1 1 either way 1 till 1 person 1 road 1 stop.vi.2 1 go 1 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 lodge.n 1 be.3 1 take 1 1 1 1 1 4 2 7 2 7 8 2 1 1 1 1 6 1 3 3 4 7 3 mountain village 1 1 3 1 2 mind 2 wind.n 1 mountain 2 desolate 1 come 1 2 be lonely 1 prosper 1 1 1 1 1 stone 1 waterfall 1 hand 1 1 run 1 1 stand.vi 1 1 cross 1 go over 1 send.v2 word 1 hear 1 1 1 present 1 tell 1 2 add 1 get tired.1 1 2 1 name 1 1 vain 1 rare.1 1 1 1 1 fall 1 1 spring rain 1 tear 1 1 1 1 1 1 2 1 1 1 1 1 2 1 2 2 1 1 exchange 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 2 1 2 2 1 1 1 see.1 1 1 1 1 1 1 2 6 5 4 1 3 2 4 1 1 1 3 1 2 1 1 1 1 1 2 4 8 1 4 5 2 3 2 1 2 1 1 1 2 1 1 2 1 1 1 3 2 3 1 1 2 1 what 1 1 2 thing.2 1 between 1 1 spring haze 1 hide.vt 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 4 1 1 2 1 1 1 1 2 1 2 1 myself 1 2 1 number 1 only 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 2 3 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 snow 1 1 1 1 such as.2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 2 3 1 2 1 1 1 1 trail 2 1 change.1 1 transfer.1 1 1 1 2 1 1 1 1 7 1 1 7 5 6 6 2 2 1 10 1 3 5 2 3 1 1 3 1 trust.2 1 1 1 1 2 1 1 1 1 3 1 1 1 4 2 1 2 5 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 2 1 1 1 thing.1 1 1 1 spring 1 this year 1 ã ã 1 begin 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 year 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 white cloud 1 foot 1 pull 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 2 1 2 2 1 1 2 1 4 1 2 1 1 1 1 1 3 2 1 1 2 1 out 1 1 2 1 1 1 1 after 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 2 2 5 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 5 2 5 9 3 2 1 1 3 4 1 4 1 1 1 1 1 1 1 1 3 1 1 1 1 1 2 1 1 1 3 1 2 1 1 1 1 mi 2 2 1 1 1 1 1 2 1 1 1 1 day 1 1 1 2 1 cloud.1 1 3 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 2 2 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 3 3 1 1 1 1 3 2 1 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 3 2 2 3 3 3 1 1 2 1 2 1 2 1 1 1 1 1 1 1 1 wave 1 sky 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 5 9 1 2 2 2 3 4 3 1 2 1 1 1 1 1 2 1 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 6 1 3 2 3 3 2 2 1 5 3 1 2 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 4 2 4 4 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 for 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 remnant 1 1 2 1 1 1 1 1 water 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 4 1 2 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 mistake 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 gorge 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 every 1 3 1 1 2 1 1 1 1 1 1 1 1 3 1 2 1 1 1 1 1 3 1 2 1 1 1 1 1 old age 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 2 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 3 1 1 1 1 1 1 2 1 2 1 2 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 today 2 1 1 2 1 1 1 1 1 1 1 a glance 1 you.3 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 3 4 1 1 2 2 2 1 1 mountain area 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 2 1 2 1 3 3 1 2 1 2 1 1 1 1 2 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 remain 1 be over 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 1 1 1 1 2 1 1 1 1 2 1 2 3 1 2 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1 2 1 person from old village 3 2 1 1 1 1 1 1 1 1 1 1 world.2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 colour 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Figure 3: Hairball effect cherry blossom (28/131,4.28): CT ctf.>0.00; non-dist=off; idf=off; pruned under U:1: here way home 1 lodging 1 confuse 1 1 1 village.1 1 scatter.1 1 forget 1 (must) 1 1 cloud.vi 1 such as.1 1 either way road 1 stop.vi.2 1 perplexed 1 forcibly 1 1 1 1 1 2 exchange 1 1 1 1 1 what hide.vt 1 out after 1 mountain village 1 1 mi cloud.1 1 mountain 3 desolate 1 prosper 1 painful.1 1 high.1 3 seem 1 1 break off 1 1 3 trail 2 1 1 1 (suffix.forbid) 1 be lonely 1 (emph) 1 1 1 1 1 1 1 1 1 for stone 1 waterfall 1 hand 1 run 1 wish 1 1 1 1 1 1 1 1 1 1 1 1 present 1 tell 1 1 1 come 1 1 1 remnant wave 1 1 1 2 1 kurabu.pl 1 1 (p.wish) 6 see.3 10 even 3 (ku) 3 lovely 1 fast/early 1 between 1 2 spring haze 1 send.v2 word 1 1 1 2 transfer.1 1 2 get tired.1 1 Yoshino.PN 1 gorge white cloud 1 foot 1 pull 1 bloom 1 1 seemed 1 1 1 1 1 1 1 1 1 such as.2 1 quiet 1 1 1 1 1 every 1 1 1 1 1 (adv.will) 2 old age 1 1 1 1 say 1 1 1 1 1 1 1 1 1 1 spring add 1 1 (aux.past) 1 (aux.perf) 1 year 1 rare.1 1 day 1 1 this year begin 1 get used 1 1 1 1 1 1 1 today a glance 1 2 1 you.3 1 2 wait 1 mountain area mistake 1 1 1 1 1 number 1 till 1 borrow 1 like.2 1 1 regrettable 1 here we go 1 1 1 1 1 only 1 1 myself 1 1 1 1 1 1 1 1 1 person from old village 3 1 1 1 lodge.n 1 world.2 1 sorrowful.1 1 name 1 vain 1 1 wind.n 3 1 1 1 spring rain regret 1 snow 1 6 remain 1 be over 1 take 1 1 1 1 1 1 1 1 1 1 1 1 1 change.1 1 cross 1 go over 1 1 increase 1 bear 1 think 1 1 love.1 1 blow 1 trust.2 1 1 will not 1 Figure 4: Only with Content Tail Conclusion 1) the distribution of classical texts fits a Gaussian (Bell) curve as well as in modern texts (Hodoˇ cek and Yamamoto 2013); 2) the cw value can separate patterns into three layers (low-, mid-, and high-range) using inflection points (-1σ and 1σ); 3) of the three layers, the high-range could be extracted without a list of stop words; 4) the mid-range lexical layer might include mathematical traits not yet revealed in the present study. Reference Yamamoto, H. (2005), Visualisation of the construction of poetic vocabulary using the database of the Kokinsh¯ u ., Jinbun kagaku to etab¯ esu (Humanities and Database) the 11th symposium, 81–8, The council of humanities and database. Yamamoto, H. (2006), Extraction and Visualisation of the Connotation of Classical Japanese Poetic Vocabulary, Symposium for Computer and Humanities, vol. 2006, 21–28, The information processing society of Japan. Hodoˇ cek, B. and H. Yamamoto (2013) “Analysis and Application of Midrange Terms of Modern Japanese”, in Computer and Hu- manities 2013 Symposium Proceedings, No. 4, pp. 21–26.

Gaussian curve. In this study, we will attempt to expand this …yamagen/JADH2018-poster02.pdf · sider the threshold values that separate upper cuto , mid-range, and lower cuto not

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Gaussian curve. In this study, we will attempt to expand this …yamagen/JADH2018-poster02.pdf · sider the threshold values that separate upper cuto , mid-range, and lower cuto not

Hilofumi YamamotoTokyo Institute of Technology

Bor HodoscekOsaka University

A Study on the Distribution of Cooccurrence Weight Patterns ofClassical Japanese Poetry

Introduction

← more generic more specific →

• P: Hairball effect or spoke effect (Yamamoto 2005)• P: Difficult to observe all word adjacency features.• O: Checking the distribution of cooccurrence

weight (Yamamoto 2006)• G: Drawing clear figures and extracting function words.• S: Cooccurrence weight is like the mean

value of two person’s weights.

light heavyC.F.Gauss(1777–1855)

x

σ0−σ

y

1

Tall HeadTall HeadTall Head

~Long Tail DinosaurLong Tail DinosaurLong Tail Dinosaur

Content TailFunction Tail

MethodsMaterial: the Hachidaishu (ca. 905–1205)

Calculation of Cooccurrence Weight: cw

We need the process of untangling hairball to obtain a clear graph model.(Nocajet al. 2014)

One of the methods used in Hodoscek and Yamamoto (2013) exploited theoccurrence not of individual words but of pairwise or cooccurrence patternssuch as‘ fragranceflower ’relationships and revealed that the distributionof cooccurrence weights in modern Japanese texts approximately fitted aGaussian curve. In this study, we will attempt to expand this analysis toclassical texts by utilizing the characteristics of the Gaussian distribution toautomatically group words into three clusters of cooccurrence patterns. Wewill show the differences between each of the three by visualizing with graphmodels.

2 Methods

We use the Hachidaishu as the material of the present study, which comprisesthe eight anthologies compiled under order of the Emperors (ca. 9051205)and contains about 9,500 poems. We developed the corpus and a method ofcooccurrence weighting similar to the tf-idf method, cw (Yamamoto 2006),which calculates the weight of patterns of any two words occurring in a poemsentence (Sparck Jones 1972, Robertson 2004, Manning and Schutze 1999,Rajaraman and Ullman 2012).

w(t, d) = (1+log tf(t, d)) · idf(t)cw(t1, t2, d) = (1+log ctf(t1, t2, d)) · cidf(t1, t2)cidf(t1, t2) =

√idf(t1) · idf(t2)

idf(t) = logN

df(t)

Where w is a weight, t a token, and N the number of tokens. Thefunction idf is called the “inverse document frequency.”(Sparck Jones 1972,Robertson 2004, Manning and Schutze 1999) The function cw is called the“co-occurrence weight,” which allows us to examine the patterns of poeticword constructions through mathematical modeling.

As in Fig. 1, there is a concept (Losee 2001: 1019) of terms located ineach layer being effective query terms. Luhn (1968) cuts the top and bot-tom words of the frequency and uses mid-range vocabulary for the devel-opment of an automatic outline generation system.(Fig. 1) Nagao (1983: 28)also mentioned mid-range vocabulary to be effective in generating automaticabstracts. Nagao (1983)’s viewpoint is slightly different with Luhn (1968) in

2

Bell curve~Distribution of cw becomes Bell curve.ring!ring!

• Over σ =⇒ Content Tail.• Under −σ ⇒ Function Tail.

Result

−2 −1 0 1 2 30

200

400

600

800

1000

z−value

frequency Sakura

(cherry)

−2 −1 0 1 2 3 40

200

400

600

800

1000

1200

1400

1600

z−value

frequency Ume

(plum)

Figure 1: Bell curves

Table 1: Upper cutoff patterns of ame (sakura): cw = co-occurrence weight; z = z-value(normalized value of frequency). word annotations: ari(be), ba(cond.), ha(topic.),hana(flower), hito(human), keri(past.), ki(past.), koso(emphatic.), miru(see), mo(also), nasi(no exist), nu(neg.), o(obj.), omou(think), ramu(aux.will), su(do), te(p.),to(and), ware(we), zo(emphatic.), zu(neg.)

cw z pattern cw z pattern cw z pattern

1 0.62 -0.91 mo–keri 11 0.59 -0.96 nasi–ha 21 0.52 -1.05 nu–o2 0.62 -0.92 hana–o 12 0.57 -0.98 o–ramu 22 0.52 -1.05 o–zo3 0.62 -0.92 o–koso 13 0.57 -0.98 mo–ramu 23 0.52 -1.05 miru–o4 0.60 -0.94 zu–keri 14 0.57 -0.98 ha–ki 24 0.48 -1.09 ba–mo5 0.60 -0.94 su–ha 15 0.56 -1.00 zu–mo 25 0.48 -1.09 o–keri6 0.60 -0.94 to–ba 16 0.56 -1.00 o–te 26 0.43 -1.16 zu–ha7 0.59 -0.96 ari–ha 17 0.55 -1.01 hito–mo 27 0.43 -1.16 to–o8 0.59 -0.96 ari–mo 18 0.54 -1.02 zu–te 28 0.43 -1.16 te–ha9 0.59 -0.96 ware–mo 19 0.52 -1.05 zo–ha 29 0.34 -1.27 o–ha

10 0.59 -0.96 nasi–o 20 0.52 -1.05 omou–o 30 0.34 -1.27 o–mo

as well as in classical texts (Fig. 2). Therefore we will attempt to divide thisshape into three layers by inflection points.

The co-occurrence patterns of sakura (cherry) under -0.9 (near -1) cwvalue are adjacent patterns comprising function words, and over 1 cw valueare those of the patterns with content words as we expected (Table 1 and2). As for the upper cutoff, we used an under -0.9 (near -1) σ value ofcw, which could extract patterns of functional tokens: almost all patternsincluded functional words, while as lower cutoff, we used over 1 σ values,which could extract patterns of content tokens: almost all patterns includedcontent words. Both under -1 and over 1 σ are regarded as inflection pointswhich have mathematically interesting property.

4 Discussion

Inflection points are defined as the points on the curve where the curvaturechanges its sign while a tangent exists.(Bronshtein et al. 2004: 231) We con-sider the threshold values that separate upper cutoff, mid-range, and lowercutoff not as coincidental but as evidential points. It is, however, necessary toconduct further experiments and continue to discuss the mathematical traitsbehind the distributions of cooccurrence weights. In terms of removing thelow-range (upper cutoff) and extracting the high-range (lower cutoff) frompoetic texts, we found that we do not need to use any filters to eliminateterms, since cw values returned semantically cooccurring patterns. Apartfrom low-range and high-range, the characteristics of the mid-range lexicallayer are still unknown.

4

Table 2: Lower cutoff patterns of ame (sakura) in Kokinshu: 30 out of 164 patternsextracted; cw = co-occurrence weight; z = z-value (normalized value of fre-quency) word annotations: ba(cond.), bakari(only), besi(should be), chiru(fall),fukakusa(deepgreen), hana(flower), isa(already), kakusu(hide), katu(win), koku(pull),komoru(go deep inside), magiru(mix), makasu(entrust), maku(wind up), manimani(asit is), masi(as), mazu(mix), me(eye), minami(south), miyako(city), mono(thing),nagara(even if), sakura(cherry), si(emphasic.), sumi(black ink), tatu(start,stand),tazumu(being around), tu(past.), uturou(change), watasu(give), yamakaze(mountainwind), yamu(stop), yanagi(willow), yononaka(world)

cw z pattern cw z pattern

1 3.86 3.18 yamu–manimani 106 2.38 1.31 si–fukakusa2 3.75 3.04 minami–magiru 107 2.38 1.31 sakura–hana3 3.67 2.93 minami–maku 108 2.38 1.31 sakura–isa4 3.61 2.86 maku–magiru 109 2.38 1.31 sakura–ba5 3.42 2.62 yanagi–koku 110 2.38 1.30 sakura–me6 3.38 2.57 yamu–makasu —7 3.38 2.56 mazu–koku 155 2.17 1.04 chiru–katu8 3.27 2.43 yanagi–mazu 156 2.17 1.04 bakari–sumi9 3.26 2.42 sakura–yamu 157 2.16 1.03 maku–besi

10 3.25 2.40 minami–yamakaze 158 2.16 1.03 tatu–maku– 159 2.16 1.03 tatu–tazumu

101 2.40 1.33 uturou–komoru 160 2.16 1.03 tazumu–tu102 2.40 1.33 sakura–watasu 161 2.16 1.03 miyako–sakura103 2.40 1.33 katu–nagara 162 2.16 1.02 kakusu–si104 2.39 1.32 sakura–masi 163 2.14 1.00 yononaka–sakura105 2.39 1.31 sakura–makasu 164 2.14 1.00 mono–sakura

5 Conclusion

Using the distribution characteristics of cooccurrence weights, we were ableto classify cooccurrence patterns into three layers of cooccurrence patterns:high-, mid-, and low-range patterns.

We found that 1) the distribution of classical texts fits a Gaussian curveas well as in modern texts; 2) the cw value can separate patterns into threelayers (low-, mid-, and high-range) using inflection points (-1σ and 1σ); 3)of the three layers, the high-range could be extracted without a list of stopwords; 4) the mid-range lexical layer might include mathematical traits notyet revealed in the present study.

References

Bronshtein, I.N., K. A. Semendyayev, G. Musiol, and H. Muehlig (2004)Handbook of Mathematics : Springer-Verlag, 4th edition.

Hodoscek, Bor and Hilofumi Yamamoto (2013) “Analysis and Application ofMidrange Terms of Modern Japanese”, in Computer and Humanities 2013Symposium Proceedings, No. 4, pp. 21–26.

5

omou (think.1) B1

VERB a3 LK a1 a2 DN gamma b1 b2 END

omou(think)

suru(do.2)

1

haseru(run)

1

(ra)reru(PASS/POT)5

te/de(LK)

24

ni(LK)

1

ta(PERF)

5

no/koto(DN)38

node(because.2)

1

hodo(as much as..)

2

yo(FP)

7

wa(FP)

13

.

37

u(CJR)

1

torawareru(be captured)

1

1

1

2

11

iru(be-AS)23 kuru

(toward-AS)1 koi(love.2)

1

fuan(anxiety)

1

1

15

ka(Q)

1

2

5

5

1

5

da(ASSER)

34

4 sa(FP)

125

kara(because.1) 1

taisetsu(precious)

1

1 1

6

1

14

20

Figure 2: Construction of the predicate of omou (think)

with Function Tail cherry blossom (28/131,4.28): CT ctf.>0.00;non-dist=off; idf=off; pruned under U:1:

here way home1village.11

lodging1

scatter.1

1

forget1

confuse

1

do.1

1

(adv.neg)

1

(must)

1

while

2

�

2

�

1

11

1

1

1

1

1

1

2 21

1

1

1

1

1

1

1

22 1

1

1

1

1

1

1

2 21

1

21

104

2

8

15

8

perplexed

1

(adv.will)4

forcibly

1

�9

�6

even

3

see.310�11

��

2

get used

1

know

1say

2

(p.wish)

6

�7

bloom

1

ç�¡ã�˙5

(adv.would)

1

ã�˛

4break off

1

�

1

�

2

ã�`ã��

1

increase

1

love.1

1

will not

1

kurabu.pl

1

�2

cloud.vi

1

such as.1

1

��2

wait2

�8�3

borrow1

regrettable1

here we go1

like.2

1

blow

2

�佯�

1

regret

1

bear1

think

1

fast/early1

ã�˙ 1

(ku)

3

lovely

1

sorrowful.1

11

1

1

1

22 1

1

1

1

2

3

1

1

1

1

1

1

1

1 1

1

6

3 2

2

1

1

2

(aux.past)

1(aux.perf)1

2

1

8

4

118

41

2 13

1

1

2

113

3

6 1

1

33

(suffix.forbid)

11

painful.1

1high.1

2

(emph)

1

wish

1

1

(interj.)

2

Yoshino.PN1seemed

1

3

2

1

151

quiet

2

although

1

21

1

2

22 1

2 1

1

15

71

7

7

4

��

1

seem

1

1

5

2

1

1

11

5

14

2

ã�ˆã�˝

2

2

11

6

2

1

11

1

7

7

2

2

3

2

2

2

1 2

2

1

1

2

233

1

1

42

2

1

2

1

41

1

either way

1 till

1

person

1

road1

stop.vi.21

go

1

1

1

1

1

2

2

1

1

11

1

1

2

1

1

2 111

1

1

11 lodge.n

1

be.3

1

take1

1

1

1

1

4

2

727 8

2

1

1

1

1

6

1

3

34

73

mountain village

1

1

3

1

2

mind2

wind.n1

mountain2

desolate

1

come1

2

be lonely

1

prosper

1

1

1

1

1

stone

1

waterfall

1

hand

11

run

1

1

stand.vi

11

cross

1

go over

1

send.v2 word

1

hear

1 1

1

present

1

tell

1

2

add 1

get tired.1

12

1

name

1

1vain

1

rare.11

1

1

1

fall

1

1spring rain

1

tear

1

1

1

1

1

1

2

1

1

1

1

1

2

1

2

2

1

1

exchange

1

11

1

1

1

1

1

1

1

2

1112

1

1

2

1

2

2

11

1

see.1

1

1

1

1

1

12

65

4

1

3

2

4

1

1

1

3

1

2

1

1

1

1

1

2

4 8

1

45

2

3

2

1

2

1

1

1

2

1

12

1

1

13

2

3

1

12

1

what1

1

2

thing.2

1

between

1

1spring haze1

hide.vt1

1

1

1

1

1

2

1

11

1

11

1

21

1

24

11

2

11

1

12

1

2

1

myself1

2

1

number

1

only

1

1

1 1

1

1

11

1

1

1

2

1

2 1

1

11

2

1

12

3

11

11

2

1

2

1 1

1

1

1

1

11 1

1

11

111 1

1

snow

1

11

1

such as.2

1

11

1

1

1

11

1

12

1

1

1

2

1

23

1

2

1

1

1

1

trail2

1

change.1

1

transfer.1

11

1

2

1

1

1

1

71

1

7

56

62

2

1

10

1

3

5

23

11

3

1

trust.2

1

1

1

1

2

1

1 11

3

1

1

1

4

2

1

2

51

1

11

1

1 11

11

1

1

1

2

1

1

2

12

1 1

1

thing.1

11

1

spring

1

this year1

��

1

begin1

11

11

1

1

1

21

11

1

1

1

1

1

111

1

1

21

1

year

1

1

11

1

1

1

1

1

111

11

1

11

1

2

1

111

1

1

1

1

1

1

1

white cloud

1

foot

1

pull

1

1

1

1

1

11

1

1

11

1

1

11

1

1

1

1

1

1

11 1

2

1

1

11

11

1

2

2

1

22

1

1

2

1

4

1

2

1

1

1

1

1

3

21

1

2

1

out

1

12

1

111

after

11

11

11112

11

1

1

1

11

1

1

11

2

111

11

1

1

2 225

1

1

11

2

12

1

1

1 1

1

1

1

1

11

1

1

5 25

9

3

2

1

1

34

14

1

1

1

1

1

1

1

1

3

1

1

1

1

1

211

1

31

2

1

1

1

1

mi

221

1

1 1

121

1

1

1

day

111

2

1

cloud.1

1

3

1

1

1

1

1

1

1

1

1

3

1

11

1

1

11

11

1

1

1

1

1

1

1

1

1

13122

1 11

211

1 12

1

11

1

1 12

1

1

11

1

1

1

112

1 11

1

1

11

1

1

1

11

1

1

1

1

413

3

1

1

1 132

1

2

2

1

2

1

1

1

1

21

1 1

1

1

1

11

11

1

1

1

3

2

233

3

11

2

1

2

12

1

1

1

1

1

1 1

1

wave

1

sky

1

11

1

11

1

2

1

1

1

1

11

11

1

1

1

1

1

1

1

1 415

9

1

2

2

2

3

43

121

1

1

1

1

2

1

1

1

1

1

32

1 11

1

1

11

1

1

1

2

1

1

1 1

1

1

1

1

1

2

1

1

1

11

1

1

1

1

1

1

1

1

6

1

32

33

22

1

5

3

12

2

1

1

1

1

1

11

2

1

1

11

1

1

1

2

1

1

24

2

4

421

2

1

1

1

11

1

1

1

1

1

11

1

111

1

1

1

1

1

11

13

1 1

1

1

11

1

1

1

11

11

1

111

1

1

1

1

11

1

1

1

1

11

1

1

21

1

11

1

1

1

1 11

1

22

1

12

1 1

1

1

1

1

1

1

1

for

11

1

1 11

1

1

1

1

1

11 1

1

1

11

11

1

1

1

1

11 1

1

1

1 1

111

1

1

1

1 1

1

1

131 1

1

2

11 11

1

1

11

1

1

1

1

1

1

11

11

1

1

1

1

1

1 1

11

remnant

112

1

1

1

1

1

water1

1

1

11

1 121

1

11

1

1

1

11

1121

1

1

1

1

11 11

2 1

1

11

1

1

11

41

1

11

1

1

2

1

11

1

1

1

1

112

1

11

1

11

1

1

1

1

1

1

1

1

5

4

1

22

3

11

1

1

11

1

1

1

1

1

111

1

1

1

1

11

1

1

1

2

mistake

1

21

11

1

112

1

11

11

1

112

1

1

11

111 2

1

11

1

1 2

1

11

1

111

1

12

1

11

1

111

112

1

1

1 1

1

gorge12

1

1

1

1

1

1

1

1

1

12

1

1

1

1

1

1

1

12

1

1

1

1

1

121

1

1

1

every 1

3

1

12

11

1

1

1 1

1

1

3

1

21

1

1

1

1

31

2

1

1

1

11

old age

1

1

1

1

11

1

11

11

1

1

1

1

11

11

1

11

11

1

1

1

1

131

21

1

211

11

1

1

12

1

1

11

1

1

1

1

1 1

11

1

11

1

1

2

2

2

31

1

1

1

1

1

2

1

2

1

2

1

1

11

1

111

1

2

1

2

2

1

1

1

1

11

11

1

1

21 1

1

1

1

1

2

1

1

1

1

11

1

1

today2

1

1

21

1

11

1

1

1

a glance

1

you.3

1 12

1

1

2

1

1

1

1

1

1

1

11

21

1 1

1

1

1

33

1

1

11

11

11

1

1

1

1

1

34

1

1

2

2

2

1

1

mountain area

11

1

1

1

1

1

1

1

1

11

1

112

3

1

111

1

11

1

1

2

111

1

1 1

1

11

1

11

11

1

1

1 1

1

1

1

1

1

111

1

11

11

1

11

11

1

332

3

2

12

13

3

1

2

1

2

1

1

1

1

2

1

32

1

1

11

1

1

1

1

1

1

1

1

remain

1

be over

111

1

1

1

11

1

1

1

11

1

1

1

1

1

1111

11

1

1

1

1

111

1

11

1

1

1

11 1

11

1

1

1

1

1

1

1

2

1

1

11

1

1

1

1

1

1

1

1

2 12

2

2

1

1

1 121

1

1

1

2 12

31

2

11

1

1

1

1

2

1

1

1

112

1

1

1

12

1

11

11

1

2

111

1

1

1

1 11

1

11

1

1 11

1

1

1 1

1

1

1

1

111

1

1 11

1

21

1

1

2

1

1

1

1

21

1

1

2

11

1

1

1

2

1

person from old village32

1

11

1

1

1

1

1

1

1

world.2

1

11

1

1

1

1

11

11

111

1

1

1

1

11

11

1

1

1

1

11

111

1

1

111 1

colour1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

Figure 3: Hairball effect

cherry blossom (28/131,4.28): CT ctf.>0.00;

non-dist=off; idf=off; pruned under U:1:

here

way home

1

lodging1

confuse

1

1

1

village.1

1

scatter.1

1

forget

1

(must)

1

1cloud.vi

1

such as.1

1

either way

road

1

stop.vi.2

1

perplexed1

forcibly1

1

1

1

1

2

exchange

11

1

1

1

what

hide.vt

1

out after1

mountain village

1 1

mi

cloud.1

1

mountain

3

desolate1

prosper

1

painful.1

1

high.13

seem

1

1

break off

1

1

3

trail

2

1

1

1

(suffix.forbid)

1

be lonely

1

(emph)

1

1

11

1

1

1

1

1

for

stone

1waterfall 1

hand

1

run 1

wish

1

1

1

1

1

1

1

1

1

1

1

1

present

1

tell

1

1 1

come

1

1

1

remnant

wave1

1

1

2

1kurabu.pl

1

1(p.wish)

6

see.310even3

(ku)3

lovely

1

fast/early

1

between

1

2

spring haze

1

send.v2 word

1

1

1

2

transfer.11

2

get tired.1

1Yoshino.PN

1

gorge

white cloud

1

foot 1

pull

1

bloom

1

1

seemed1

1

1

11

1

1

1

1

such as.2

1

quiet

1

1

1

1

1

every1

11

1

1

(adv.will)

2

old age

1

11

1

say

11

1

1

1

1 1

1

11

spring

add

1

1

(aux.past)

1

(aux.perf)

1

year

1

rare.1

1

day

1

1

this year

begin 1

get used

1

11

1

1

1

1

today

a glance

1

2

1

you.3

1

2

wait

1

mountain area

mistake

1

1

1

1

1

number

1

till

1

borrow

1

like.2

1

1

regrettable

1

here we go

1

1

1

1

1

only

1

1

myself

1

1

1

1

1

1 1

1

1

person from old village3

1

11

lodge.n

1

world.2

1

sorrowful.1

1

name

1

vain

1

1

wind.n

3

1

1

1

spring rain

regret1

snow1

6

remain

1

be over

1

take

1 1

1

11

1

1

1

1

1

1

1

1

change.11

cross

1

go over1

1

increase

1

bear

1

think

1

1

love.11

blow

1

trust.2

1

1

will not

1

Figure 4: Only with Content TailConclusion1) the distribution of classical texts fits a Gaussian (Bell) curve as well as in modern texts (Hodoscek and Yamamoto 2013);2) the cw value can separate patterns into three layers (low-, mid-, and high-range) using inflection points (-1σ and 1σ);3) of the three layers, the high-range could be extracted without a list of stop words;4) the mid-range lexical layer might include mathematical traits not yet revealed in the present study.

Reference• Yamamoto, H. (2005), Visualisation of the construction of poetic vocabulary using the database of the Kokinshu., Jinbun kagaku todetabesu (Humanities and Database) the 11th symposium, 81–8, The council of humanities and database.

• Yamamoto, H. (2006), Extraction and Visualisation of the Connotation of Classical Japanese Poetic Vocabulary, Symposium forComputer and Humanities, vol. 2006, 21–28, The information processing society of Japan.

• Hodoscek, B. and H. Yamamoto (2013) “Analysis and Application of Midrange Terms of Modern Japanese”, in Computer and Hu-manities 2013 Symposium Proceedings, No. 4, pp. 21–26.

1