Finding Semantic Classes of Nouns for Hindi/Urdu Complex Predicates. Sebastian Sulger & Ashwini Vaidya, Universität Konstanz. ParGram Meeting, Spring 2014.


Finding Semantic Classes of Nouns for Hindi/Urdu Complex Predicates

Sebastian Sulger & Ashwini Vaidya

Universität Konstanz

ParGram Meeting, Spring 2014


The situation

Spoken and written Hindi/Urdu: heavy, productive use of complex predicates (CPs) across domains

Different types of CPs:

Aspectual V+V CPs: gIr pAr. ‘suddenly fall’ (lit. ‘fall fall’)
Permissive V+V CPs: jane de ‘let go’ (lit. ‘go give’)
N+V CPs: yad kAr ‘remember’ (lit. ‘memory do’)

In other languages:

take a bath (≈ ‘bathe’)
give a stir (≈ ‘stir’)
in Betracht ziehen ‘consider’ (lit. ‘in look-at pull’)


The challenges

General problem in deep and shallow parsing methods for Hindi/Urdu (and other South Asian languages): proper treatment of complex predicates

Automatic distinction of CPs from simplex verbs
Extraction of subcategorization frames
Semantic role labeling
Drawing semantic inferences

Research questions:

What existing resources may be employed to explore CP usage?
Can we confirm/reject existing theoretical hypotheses of N+V CPs?
How far can clustering algorithms take us?
... and how “good” are the resulting classes?


Contents

1 Hindi/Urdu Noun-Verb Complex Predicates

2 Corpus study

3 Evaluation

4 Semantic Classes


Hindi/Urdu Noun-Verb Complex Predicates

The construction

Combination of noun and light verb to form a single predicational unit

Noun contributes main predicational content (including argument(s)), light verb dictates case marking and expresses subtle lexical semantic differences

Highly productive constructions

[Ahmed and Butt, 2011]: proposal for different classes of N+V CPs based on a small case study of 45 nouns

           light verb
N+V type   kAr ‘do’   ho ‘be’   hu- ‘become’   analysis
class a    +          +         +              psych predications
class b    +          −         +              only agentive
class c    +          +         −              subject is not an undergoer

Table: Classes of nouns identified by [Ahmed and Butt, 2011]
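The table can be read as a lookup from a noun's compatibility pattern to its class. The sketch below is only an illustration of that reading: the function name `classify_noun` and the "unclassified" fallback are assumptions, not part of [Ahmed and Butt, 2011].

```python
# Compatibility patterns from the table, in the order
# (kAr 'do', ho 'be', hu- 'become').
AHMED_BUTT_CLASSES = {
    (True, True, True): "class a (psych predications)",
    (True, False, True): "class b (only agentive)",
    (True, True, False): "class c (subject is not an undergoer)",
}


def classify_noun(with_kar, with_ho, with_hu):
    """Return the class for a compatibility pattern, or 'unclassified'.

    The fallback label is an assumption for patterns the table does not list.
    """
    return AHMED_BUTT_CLASSES.get((with_kar, with_ho, with_hu), "unclassified")


# yad 'memory' occurs with all three light verbs in example (1).
print(classify_noun(True, True, True))
```

Patterns outside the three listed rows fall through to the fallback, which is where an extended set of light verbs would need additional classes.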


Class A: psych predications

Occur with all three light verbs examined by [Ahmed and Butt, 2011]

(1) a. lAr.ki=ne kAhani yad k-i
       girl.F.Sg=Erg story.F.Sg memory.F.Sg do-Perf.F.Sg
       ‘The girl remembered a/the story.’
       (lit. ‘The girl did memory of the story.’)

    b. lAr.ki=ko kAhani yad hE
       girl.F.Sg=Dat story.F.Sg memory.F.Sg be.Pres.3.Sg
       ‘The girl remembers/knows a/the story.’
       (lit. ‘Memory of the story is at the girl.’)

    c. lAr.ki=ko kAhani yad hu-i
       girl.F.Sg=Dat story.F.Sg memory.F.Sg be.Perf-F.Sg
       ‘The girl came to remember a/the story.’
       (lit. ‘Memory of the story became to be at the girl.’)


Class B: agentive CPs

Require an agentive (ergative-marked) subject and light verb kAr ‘do’

(2) a. bIlal=ne mAkan tAmir kI-ya
       Bilal.M.Sg=Erg house.M.Sg construction.F.Sg do-Perf.M.Sg
       ‘Bilal built a/the house.’
       (lit. ‘Bilal did construction of the house.’)

    b. * bIlal=ko mAkan tAmir hE
       Bilal.M.Sg=Dat house.M.Sg construction.F.Sg be.Pres.3.Sg

    c. * bIlal=ko mAkan tAmir hu-a
       Bilal.M.Sg=Dat house.M.Sg construction.F.Sg be.Perf-M.Sg


Class C: subject not an undergoer

Exclude the light verb hu- ‘become’

(3) a. bIlal=ne yih sArt. tAslim k-i
       Bilal.M.Sg=Erg this condition.F.Sg acceptance.M.Sg do-Perf.F.Sg
       ‘Bilal accepted this condition.’
       (lit. ‘Bilal did acceptance of this condition.’)

    b. bIlal=ko yih sArt. tAslim hE
       Bilal.M.Sg=Dat this condition.F.Sg acceptance.M.Sg be.Pres.3.Sg
       ‘Bilal accepted this condition.’
       (lit. ‘Acceptance of this condition was at Bilal.’)

    c. * bIlal=ko yih sArt. tAslim hu-a
       Bilal.M.Sg=Dat this condition.F.Sg acceptance.M.Sg be.Perf-M.Sg


And beyond ...

[Ahmed and Butt, 2011] looked at a set of three light verbs

Extending the set of light verbs brings up new questions

Nouns that occur with kAr ‘do’ and de ‘give’ (but exclude other light verbs)

(4) a. nadya=ne lAr.ki=ko pAramArs kI-ya
       Nadya.F.Sg=Erg girl.F.Sg=Acc advice.M.Sg do-Perf.M.Sg
       ‘Nadya advised the girl.’
       (lit. ‘Nadya did advice to the girl.’)

    b. nadya=ne lAr.ki=ko pAramArs dI-ya
       Nadya.F.Sg=Erg girl.F.Sg=Acc advice.M.Sg give-Perf.M.Sg
       ‘Nadya advised the girl.’
       (lit. ‘Nadya gave advice to the girl.’)


Nouns that occur with kAr ‘do’ only, not with de ‘give’

(5) a. bIlal=ne mAkan tAmir kI-ya
       Bilal.M.Sg=Erg house.M.Sg construction.F.Sg do-Perf.M.Sg
       ‘Bilal built a/the house.’
       (lit. ‘Bilal did construction of a/the house.’)
       [Ahmed and Butt, 2011, p. 3]

    b. * bIlal=ne mAkan tAmir dI-ya
       Bilal.M.Sg=Erg house.M.Sg construction.F.Sg give-Perf.M.Sg


Nouns that occur with le ‘take’ only, not with any other light verb

(6) a. nadya=ne lAr.ki=ko god lI-ya
       Nadya.F.Sg=Erg girl.F.Sg=Acc lap.F.Sg take-Perf.M.Sg
       ‘Nadya adopted the girl.’
       (lit. ‘Nadya took lap to the girl.’)

    b. * nadya=ne lAr.ki=ko god kI-ya
       Nadya.F.Sg=Erg girl.F.Sg=Acc lap.F.Sg do-Perf.M.Sg

    c. * nadya=ne lAr.ki=ko god dI-ya
       Nadya.F.Sg=Erg girl.F.Sg=Acc lap.F.Sg give-Perf.M.Sg


Goals of the investigation

How do the proposals by [Ahmed and Butt, 2011] hold up against a larger empirical basis (i.e., bigger corpora)?

Extend the set of light verbs

Apply different strategies of acquiring knowledge about CPs:

“Brute-force” statistical approach, based on bigram extraction, collocation analysis and clustering [Butt et al., 2012]
“Seed list” approach, using knowledge amassed from treebanks and clustering, followed by an evaluation of the clusters

Come up with semantic classes of nouns:

Members of classes will behave in a coherent way with respect to the light verbs they may occur with
Of great use for the Hindi/Urdu grammar: extend noun lexicon, define templates of N+V CPs


Corpus study

Methodology

In a recent corpus study on Hindi, we used the approach below:

1 Use a corpus of 17 million words harvested from the BBC Hindi website and Hindi Wikipedia

2 Look at a set of seven light verbs: kAr ‘do’, ho ‘be’, de ‘give’, le ‘take’, rAkh ‘put’, lAg ‘be attached’, a ‘come’ (the seven most frequently occurring light verbs)

The corpus is POS-tagged and lemmatized using a state-of-the-art Hindi tagger [Reddy and Sharoff, 2011]


3 Make use of the Hindi-Urdu Treebank (HUTB) [Bhatt et al., 2009]

Includes dependency annotation scheme
Employs label pof (for part of) to annotate complex predicates
Extract all items that are tagged as nouns and carry pof label

→ “Seed list” of nouns that we know take part in N-V CPs

4 Extract all bigrams which have one of the seven light verbs (their lemmas) on the right (frequency cutoff 10, to get rid of some spelling variation as well as marginal usages)
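Steps 3 and 4 can be sketched roughly as follows. This is a simplified illustration, not the original pipeline: the `Token` structure, the tag names, and the toy data are assumptions, the treebank's real file format is not modelled, and restricting the left element of each bigram to seed-list nouns is one reading of the procedure. The cutoff is interpreted here as "at least 10 occurrences".

```python
from collections import Counter
from typing import NamedTuple

LIGHT_VERBS = {"kAr", "ho", "de", "le", "rAkh", "lAg", "a"}


class Token(NamedTuple):
    lemma: str
    pos: str     # part-of-speech tag (tag names here are assumptions)
    deprel: str  # dependency label; "pof" marks the nominal part of a CP


def seed_nouns(tokens, noun_tags=("NN",)):
    """Step 3: collect lemmas of nouns that carry the pof label."""
    return {t.lemma for t in tokens if t.pos in noun_tags and t.deprel == "pof"}


def light_verb_bigrams(lemmas, seed, cutoff=10):
    """Step 4: count (noun, light verb) bigrams and apply the frequency cutoff."""
    counts = Counter(
        (left, right)
        for left, right in zip(lemmas, lemmas[1:])
        if left in seed and right in LIGHT_VERBS
    )
    return {pair: n for pair, n in counts.items() if n >= cutoff}


# Toy data, invented for illustration only.
treebank = [Token("yad", "NN", "pof"), Token("kAr", "VM", "root"),
            Token("kAhani", "NN", "k2")]
corpus_lemmas = ["yad", "kAr"] * 3 + ["kAhani", "sUn"]

seed = seed_nouns(treebank)
print(light_verb_bigrams(corpus_lemmas, seed, cutoff=3))
```

With the default cutoff of 10 the toy corpus yields no bigrams, which is the intended effect for marginal usages.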


5 Compute relative frequencies of noun combined with light verbs (880 noun instances)

ID  noun                kAr    ho     de      le      rAkh   lAg       a
                        ‘do’   ‘be’   ‘give’  ‘take’  ‘put’  ‘attach’  ‘come’
1   tAnav ‘tension’     0.115  0.562  0.058   0.058   0.000  0.000     0.207
2   bhag ‘part’         0.149  0.365  0.119   0.253   0.000  0.000     0.115
3   ag ‘fire’           0.110  0.251  0.087   0.000   0.055  0.443     0.055
4   mAzuri ‘sanction’   0.000  0.000  0.757   0.243   0.000  0.000     0.000
5   dhava ‘attack’      1.000  0.000  0.000   0.000   0.000  0.000     0.000
6   krIpa ‘mercy’       0.409  0.486  0.000   0.000   0.105  0.000     0.000

Table: Relative frequencies of co-occurrence of nouns with light verbs
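As a minimal sketch of step 5, each noun's count with a given light verb is divided by its total count across the seven light verbs, so that every row sums to 1 (up to rounding). The function name and the raw counts below are invented for illustration; they are not corpus figures.

```python
LIGHT_VERBS = ["kAr", "ho", "de", "le", "rAkh", "lAg", "a"]


def relative_frequencies(counts, light_verbs=LIGHT_VERBS):
    """Turn raw noun + light verb counts into a vector of relative frequencies."""
    total = sum(counts.get(verb, 0) for verb in light_verbs)
    if total == 0:
        # A noun never seen with any light verb gets an all-zero vector.
        return [0.0] * len(light_verbs)
    return [counts.get(verb, 0) / total for verb in light_verbs]


# Invented counts for one noun (not taken from the corpus).
example_counts = {"de": 3, "le": 1}
print(relative_frequencies(example_counts))
```

The resulting seven-dimensional vectors are what the clustering step operates on.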


6 Apply clustering algorithm to the data

Clustering the nouns based on their occurrence patterns with light verbs
k-means clustering
Problems: How good are resulting clusters? What value should we use for k?

→ How to evaluate?

We already know that our combinations (“seed list” nouns + light verbs) form legitimate CPs.
What we don’t know is how semantically coherent the clusters are.
We also don’t know which k is giving us the best (i.e. most expressive/semantically most coherent) clusters.
(But k = 8 seemed to be a good value during initial inspection.)
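For readers unfamiliar with k-means, the following is a small pure-Python sketch of the standard algorithm (Lloyd's iteration) applied to relative-frequency vectors. It is not the implementation used in the study; the random initialisation, the iteration limit, and the toy vectors are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, iterations=100, seed=0):
    """Cluster vectors into k groups; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no vector changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy two-dimensional vectors, invented for illustration.
toy_vectors = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 0.9]]
toy_assignment, toy_centroids = k_means(toy_vectors, k=2)
print(toy_assignment)
```

Because the result depends on k and on the initialisation, the procedure itself gives no indication of which k is best, which is why an external evaluation is needed.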


Evaluation

Preliminary evaluation using WordNet

Hindi WordNet publicly available [Bhattacharyya, 2010]

Follow the technique described by e.g. [Van de Cruys, 2006] for each k = 2, ..., 10

Extract synonyms, hypernyms and hyponyms for every word in a cluster
Choose cluster centroid: word with most semantic relations with every other word in cluster
Extract co-hyponyms, i.e. the hyponyms of the hypernyms (sisters in the ontology tree), for each centroid from WordNet (along with their synonyms, hypernyms and hyponyms)
Calculate precision for each cluster: count number of words that overlap with words in centroid’s relations & divide by number of words in cluster
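The four steps above can be sketched as below, under several assumptions: WordNet access is replaced by two plain dictionaries (`relations` for synonyms, hypernyms and hyponyms; `cohyponyms` for sisters), a "relation" between two cluster words is counted when either word appears in the other's relation set, and the English placeholder words are invented. The exact counting conventions of [Van de Cruys, 2006] may differ.

```python
def pick_centroid(cluster, relations):
    """Pick the word related to the largest number of other cluster words."""
    def links(word):
        return sum(
            1
            for other in cluster
            if other != word
            and (other in relations.get(word, set())
                 or word in relations.get(other, set()))
        )
    return max(cluster, key=links)


def cluster_precision(cluster, relations, cohyponyms):
    """Share of cluster words found among the centroid's expanded relations."""
    centroid = pick_centroid(cluster, relations)
    expanded = set(relations.get(centroid, set()))
    # Add the centroid's co-hyponyms together with their own relations.
    for sister in cohyponyms.get(centroid, set()):
        expanded.add(sister)
        expanded |= relations.get(sister, set())
    hits = sum(1 for word in cluster if word in expanded)
    return hits / len(cluster)


# Placeholder lexical data, invented for illustration (not from Hindi WordNet).
toy_cluster = ["sadness", "regret", "grief", "table"]
toy_relations = {
    "sadness": {"grief", "emotion"},
    "grief": {"sadness"},
    "regret": {"emotion"},
    "table": {"furniture"},
}
toy_cohyponyms = {"sadness": {"regret"}}

print(cluster_precision(toy_cluster, toy_relations, toy_cohyponyms))
```

Averaging this score over all clusters for a given k yields one precision value per k.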


k    Precision
2    0.0452
3    0.0371
4    0.0567
5    0.0811
6    0.0822
7    0.0798
8    0.0898
9    0.0740
10   0.082

Table: Evaluating cluster size using semantic relations in WordNet (low precision values because of small size of data given to the algorithm)

→ Result: most coherent clusters according to evaluation with k = 8
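Selecting k then amounts to taking the maximum over the precision table. The snippet below simply restates the reported values; the variable names are illustrative.

```python
# Precision per k, as reported in the table above.
precision_by_k = {
    2: 0.0452, 3: 0.0371, 4: 0.0567, 5: 0.0811, 6: 0.0822,
    7: 0.0798, 8: 0.0898, 9: 0.0740, 10: 0.082,
}

# The k whose clusters score the highest precision.
best_k = max(precision_by_k, key=precision_by_k.get)
print(best_k)
```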


Semantic Classes

Overview

Preliminary overview of semantic classes of nouns (labels partly borrowed from [Ahmed, 2010]/[Ahmed and Butt, 2011]):

Property                          Cluster Size   Light Verbs
1. Change of state                22             a ‘come’
2. Mental state                   102            ho ‘be’
3. Sending away                   110            de ‘give’
4. Mental state/mental action     101            ho ‘be’; kAr ‘do’
5. Action                         476            kAr ‘do’
6. Sudden event                   10             lAg ‘attach’
7. Short duration/durative        28             rAkh ‘keep’
8. Ingestive/mental gain          31             le ‘take’

Table: Occurrences of light verb with semantic classes of nouns


Description of classes I

Class 1: a ‘come’ — events with a direction that has a beginning, path and end

bAdlav a ‘change come, change’

Class 2: ho ‘be’ — mental states/psych predicates, require dative subjects

dukh ho ‘sadness be, be sad’, khed ho ‘regret be, regret’

Class 3: de ‘give’ — events involve “transmissions” away from the sender/subject

sAndes de ‘message give, give a message’, hUkm de ‘order give, order someone’

Class 4: ho ‘be’, kAr ‘do’ — mental states/mental actions, subject case marking alternates between dative and ergative, depending on light verb

pyar ho ‘love be, love’


Description of classes II

Class 5: kAr ‘do’ — largest class, dynamic events/actions, take ergative subjects

dAstAkhAt kAr ‘signature do, sign’, fon kAr ‘phone do, call someone’, tExt kAr ‘text do, text someone’ (and other borrowings from English)

Class 6: lAg ‘attach’ — sudden events

jhAtka lAg ‘jolt attach, get jolted’, chot lAg ‘injury attach, get injured’

Class 7: rAkh ‘keep’ — durative, non-momentary events

tAllUkh rAkh ‘contact keep, keep in touch’, khAyal rAkh ‘care keep, take care’

Class 8: le ‘take’ — events involve “transmissions” to the receiver/subject, which is the endpoint of transmission

sApAth le ‘oath take, take an oath’, sAharA le ‘shelter take, take shelter’


Summary

Some nouns are heavily lexicalized towards a particular semantic configuration (i.e., they are compatible with only a small subset of light verbs)

Others may occur with a wider range of light verbs

→ Use for grammar development?

Lexicon development
Can define templates, based on classification
Handle new coinages/borrowings, predict their usage

Future work:

Apply method to Urdu data
Refine/narrow down clusters (using more data/more features/more light verbs)


References I

Ahmed, T. (2010).
The interaction of light verbs and verb classes of Urdu.
In Interdisciplinary Workshop on Verbs - The Identification and Representation of Verb Features, Pisa.

Ahmed, T. and Butt, M. (2011).
Discovering Semantic Classes for Urdu N-V Complex Predicates.
In Proceedings of the International Conference on Computational Semantics (IWCS 2011).

Bhatt, R., Narasimhan, B., Palmer, M., Rambow, O., Sharma, D., and Xia, F. (2009).
A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu.
In Proceedings of the Third Linguistic Annotation Workshop, pages 186–189, Suntec, Singapore. Association for Computational Linguistics.

Bhattacharyya, P. (2010).
IndoWordNet.
In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10), pages 3785–3792.

Butt, M., Bögel, T., Hautli, A., Sulger, S., and Ahmed, T. (2012).
Identifying Urdu Complex Predication via Bigram Extraction.
In Proceedings of COLING 2012, Technical Papers, pages 409–424, Mumbai, India.

Lamprecht, A., Hautli, A., Rohrdantz, C., and Bögel, T. (2013).
A Visual Analytics System for Cluster Exploration.
In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 109–114, Sofia, Bulgaria. Association for Computational Linguistics.


References II

Reddy, S. and Sharoff, S. (2011).
Cross Language POS Taggers (and other Tools) for Indian Languages: An Experiment with Kannada using Telugu Resources.
In Proceedings of the Fifth International Workshop On Cross Lingual Information Access, pages 11–19, Chiang Mai, Thailand. Asian Federation of Natural Language Processing.

Van de Cruys, T. (2006).
Semantic Clustering in Dutch.
In Proceedings of the Sixteenth Computational Linguistics in Netherlands (CLIN), pages 17–32.
