Kirrkirr: Transforming the Representation of Lexical Knowledge

Christopher Manning
University of Sydney

http://www.sultry.arts.usyd.edu.au/kirrkirr/



Project Objectives

Aims of the project:

• examining the richness of lexical structure, in particular the connotational and figurative use of words

• providing innovative ways for representing a dictionary, through creative use of the medium of computers

• augmenting dictionaries from corpora

• providing practical, educationally useful programs as a result (at low labor cost)

Main initial target: an interactive front end for exploring or using the Warlpiri dictionary.


Acknowledgements

• Ken Hale, Mary Laughren, Robert Hoogenraad, Jane Simpson, David Nash

• Many Warlpiri (Kay Ross for the audio)

• Kevin Jansz, Nitin Indurkhya, Wee Jim Sng

• Susan Poetsch, Miriam Corris

• and many others


Research Program: Lexicon

• A lexicon is not just words but a vast network of associations between words and within and across the concepts represented by words

• The aim of this work is to provide people with a better understanding of this conceptual map.

• Traditional paper dictionaries offer very limited ways for making such networks visible

• On a computer, one can imagine all sorts of ways of bringing out such relationships


Research: Computational Lexicography

• Dictionaries on computers are now commonplace

• But there has been little attempt to utilize the potential of the new medium

• Goal: fun dictionary tools that are effective for language learning, browsing, and research

• Special interest: dictionaries for minority languages. Here economic, motivational, and user support reasons all point to an important role for computers.


ANLPF Tutorial 12th January 1998

Research: Computational Lexicography

• Dictionaries on computers are now commonplace
  – But there has been little attempt to utilise the potential of the new medium
  – Most present a plain, search-oriented representation of the paper version

• Goal: fun dictionary tools that are effective for browsing and language learning (cf. Kegl 1995)


Research: Computational Lexicography

• Fun dictionary tools:
  – Like flicking through a paper dictionary, but better
  – Innovative ways for representing and linking dictionary information, through creative use of computer software
  – Should improve user supports and incidental learning

• Focus: exploration/dissemination, not creation


MRD Structure

• The internal structures of current Machine Readable Dictionaries usually merely mimic the structure of the printed form (Boguraev 1990)

• Some work, notably WordNet (Miller 1995), has involved a fundamental rethinking of dictionary content and organization (here, organization via “synsets” which are related via links of part, subkind, opposite)

• But this research hasn’t been taken to users.


Research Program: Education

• Dictionary structure and usability are often dictated by professional linguists, while the needs of others (speakers, semi-speakers, young users, second language learners) are not met

• Weiner (1994): The initial purpose of the OED:
  – “to create a record of vocabulary so that English literature could be understood by all. But English scholarship grew up and lexicography grew with it … inevitably parting company with the man in the street.”

• Challenge is to avoid this.


Dictionary usefulness and usability

Kegl (1995) “Machine-Readable Dictionaries and Education”

• “Originally, this paper was intended as a survey of educational applications using MRDs. As far as I have been able to determine, no such applications currently exist”

• Standard dictionaries are reference works, ill-suited for use as learning tools

• Studies of American ‘dictionary skills training’ show that many tasks achieve little in the way of education (but do teach word lookup!)


Educational value of dictionaries

However, derived lexical information is useful!

Think of a high school foreign language textbook:

• terminology sets

• pictures with parts named

• vocabulary lists

• word explications

Major issue:

• Not many people sit around reading dictionaries – need something fun


Data on usability: evaluating a paper dictionary

• Study of paper dictionary usability by Susan Poetsch, tested using Alawa dictionary (draft by Margaret Sharpe)

• In community, old people are very concerned to keep language strong, and help as volunteers in bilingual education. They are keen on dictionary

• However, they lack the literacy skills to use it

• Susan worked with people aged 25–50

• Since volunteers, probably better than average literacy skills for the community


Findings

• Not very literate: A big dictionary is overwhelming to someone with emerging literacy skills

• People knew words are ordered but could not use ordering effectively (restart or flick randomly)

• Often around 3 minutes a word lookup

• People lost place in page regularly

• An overcrowding of information is confusing

• One word correspondences are easiest for users, but often unrealistic linguistically

• Subentries were confusing; part of speech puzzling


Findings (2)

• Regular dictionary users (especially, compilers!) grossly underestimate the time they have spent becoming familiar with dictionary structure

• If a dictionary is going to be made for a speech community, then the people in that community need to feel confident in using it.

• Teachers felt that the draft dictionary is too long and detailed for school use

• Conclusion: These people need a different dictionary (My First Alawa)

• Would probably be used by adults as well as kids


Our educational goals

• Aim at school kids

• “Information seeking is a complex process which is often not attended to in K-12 education” (Wallace et al. 1998)

• Provide learner supports for getting started with dictionaries

• Adaptable interface: can cater to different needs

• Support for active reading by allowing note taking

• An interface where you can see words, but are not required to know words


Kirrkirr: A Warlpiri dictionary browser

(Jansz 1998; Jansz, Manning and Indurkhya 1999)

• An environment for the interactive exploration of dictionaries.

• The design is general, but our current work has just been with Warlpiri

• Attempts to more fully utilize graphical interfaces, hypertext, multimedia, and different ways of indexing and accessing information

• Written in Java, it can either be run over the web [high bandwidth] or run locally (here Java’s main advantage is cross-platform support).


Specific goals

• An interactive environment that encourages exploration: easy and fun to use

• Reduction of the dependence on alphabetical order: The low level of literacy in the region makes an e-dictionary potentially more useful than a paper edition

• Catering to the needs of different user groups (kids, teachers, professionals)

• Flexible enough to display appropriate information in appropriate ways depending on user level


Overview

Kirrkirr provides various modules:

• Graph layout of word relationships

• Formatted dictionary entries

• Semantic domain browsing

• A notes facility for ‘jotting in the margin’

• Multimedia: audio, pictures

• Advanced searching interfaces

• others in planning: colors, figuration patterns

These attempt to cater to users with different competence levels


The lexical database

• Original text materials are stored in an ad hoc format using backslash codes [origin: runoff]

• These are converted to XML using an error-correcting stack-based parser (written in Perl)
  – The inconsistency and flexibility of dictionary entries made this a surprisingly difficult task.
  – Many structural errors/inconsistencies/typos from years of hand maintenance in text editors and via regular expressions
  – Many problems with link consistency
  – Heuristic content-sensitive parser imposes data integrity
  – Lots of Information Systems 101
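
The conversion step can be illustrated with a minimal sketch. This is not the project's Perl parser: the field codes (`\me`, `\se`, `\gl`, `\pos`), their nesting levels, the `hw` attribute, and the recovery rule are all assumptions invented for illustration. It shows one way a stack can keep the output well-formed: each container code has a depth, and opening a container first closes anything still open at the same or a deeper level.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical field codes (not the real Warlpiri dictionary codes).
LEVELS = {"me": 0, "se": 1}   # container fields: main entry, subentry
LEAVES = {"gl", "pos"}        # leaf fields holding plain text


def backslash_to_xml(lines):
    """Convert backslash-coded lines to XML, closing open containers as needed."""
    out, stack = [], []       # stack holds the currently open container codes

    def close_to(level):
        # Pop every open container at this depth or deeper.
        while stack and LEVELS[stack[-1]] >= level:
            out.append(f"</{stack.pop()}>")

    for raw in lines:
        line = raw.strip()
        if not line.startswith("\\"):
            continue          # tolerate stray text instead of failing
        code, _, text = line[1:].partition(" ")
        if code in LEVELS:
            close_to(LEVELS[code])
            stack.append(code)
            out.append(f"<{code} hw={quoteattr(text)}>")
        elif code in LEAVES:
            out.append(f"<{code}>{escape(text)}</{code}>")
        # Unknown codes are skipped here; a real converter would report them.
    close_to(0)
    return "".join(out)
```

Because containers are closed implicitly, a subentry that is never explicitly terminated still ends up properly nested inside its main entry.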


XML

• XML: a descendant of SGML for structured markup of text

• XML separates the structure of the data from its presentation

• Much of the recent enthusiasm for XML has centered around representing simple and rigid structures such as database records

• The rich hierarchical and variable structure of dictionary entries is really more what something like XML excels at!

• Result remains a portable, tangible text file


XML indexing

• XML is a middle ground between the structure, indexing, etc. of a database, and the freedom of a word processor.

• To improve speed, an ad hoc index to the XML file is built. It supports rapid headword and gloss lookup, and records which parts of the XML file need to be processed.
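
One simple form such an index could take is a map from headword to byte offset, sketched below. The slides do not describe the actual index format, so the `ENTRY` element, the `hw` attribute, and the one-entry-per-line layout are assumptions for illustration only.

```python
import re

# Hypothetical entry markup; the real element and attribute names may differ.
ENTRY_RE = re.compile(rb'<ENTRY hw="([^"]*)"')


def build_index(f):
    """Map each headword to the byte offset of the line holding its entry.

    f must be a seekable file object opened in binary mode.
    """
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        match = ENTRY_RE.search(line)
        if match:
            # Keep the first occurrence if a headword is repeated.
            index.setdefault(match.group(1).decode("utf-8"), offset)
    return index


def fetch_entry(f, index, headword):
    """Seek straight to one entry instead of parsing the whole file."""
    f.seek(index[headword])
    return f.readline().decode("utf-8").rstrip("\n")
```

The file is scanned once to build the index; afterwards each lookup reads only the lines it needs, and only that fragment has to be handed to an XML parser.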


Visualization of dictionary information

• For applications with simple textual content behind them, there is little that can be done but an on-line reflection of a printed page

• But we want more than just definitions of words: we want to know their relationships to other words, and the patterning in these relationships

• In a computational approach, the program can mediate between the lexical data and the user

• The interface can select from and choose how to present information (according to the user’s preferences) – in many different ways


Previous work

• Current systems present the search-dominated interface of classic Information Retrieval systems: you type a word in a search box

• Results try to mimic, but are generally inferior to, the printed version of the dictionary

• Good feature: rapid searching

• These systems do little to utilize the captivating qualities of computers: interactivity, user control and adaptability (Brown 1985).


Previous work (2)

• Only effective when user has a clearly specified information need – even here, we are ignoring the distinction between information gained and knowledge sought (Sharpe 1995)

• Lack browsing, and chances for incidental or curiosity driven learning

• Lack tangibility and situatedness of paper: ineffective for getting an idea of a collection

• We wish to exploit the essence of hypertext, which is “click to explore” browsing


Previous work (3)

• Little research work (in corpus linguistics, visualization etc.) on dictionary visualization

• WordNet built a rich network of relationships, which fundamentally departed from the paper dictionary tradition, and has been used in many computational projects

• However very little has been done in the way of interfaces that make these relationships visible and intelligible to users.

• Graphical representations seem particularly important given our target users.


MRD Interfaces: WordNet


Graph-based visualization

• There is a little previous work on graphical representations of dictionaries

• For instance, the visual-thesaurus by plumbdesign derived from WordNet

• But it is also a good demonstration of how chaotic and confusing graphical interfaces can become.


Perils of visualization


Graph-based visualization

(Jansz 1998; Jansz, Manning and Indurkhya 1999)

• Classic graph layout problem

• Adapts work by Eades et al. (1998) and Huang et al. (1998) on visualization and navigation of WWW document linkages

• Uses the spring algorithm. Big advantage is that it is an iterative updating algorithm, and so gives an easy interactivity:
  – it wiggles and people can play with it.

• Clarity and simplicity of graph: Software maintains a set of focus nodes to prevent overcrowding
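
To make the "iterative updating" point concrete, here is a minimal sketch of one step of a generic spring (force-directed) layout. It is not Kirrkirr's Java code and may differ from the variant the project adapted: the force laws and the constants `rest_len`, `k_spring`, and `k_repel` are assumptions chosen for illustration.

```python
import math


def spring_step(pos, edges, rest_len=1.0, k_spring=0.1, k_repel=0.05):
    """Return new node positions after one small update of a spring layout.

    pos maps node -> (x, y); edges is a list of (node, node) pairs.
    """
    force = {n: [0.0, 0.0] for n in pos}

    def apply(a, b, magnitude_of):
        # Push a away from b (or pull it closer if the magnitude is negative);
        # b receives the equal and opposite force.
        dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
        d = math.hypot(dx, dy) or 1e-9      # avoid division by zero
        m = magnitude_of(d)
        fx, fy = m * dx / d, m * dy / d
        force[a][0] += fx
        force[a][1] += fy
        force[b][0] -= fx
        force[b][1] -= fy

    nodes = list(pos)
    # Every pair of nodes repels, which spreads the graph out.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            apply(a, b, lambda d: k_repel / (d * d))
    # Linked nodes act like springs pulled toward their rest length.
    for a, b in edges:
        apply(a, b, lambda d: -k_spring * (d - rest_len))

    return {n: (pos[n][0] + force[n][0], pos[n][1] + force[n][1]) for n in pos}
```

Calling `spring_step` repeatedly, once per animation frame, is what produces the wiggling effect: if the user drags a node, the next few steps simply relax the layout around the new position.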


Educational advantages

• Alphabetical order is important, but

• A web of words offers other effective opportunities for learning

• A student can opportunistically explore words that are related in various ways

• Important semantic relationships can be understood


Kirrkirr network display



Formatted dictionary entries

• Are produced automatically from the XML by using XSL (a style language)

• XSL allows easy modeling of some user preferences.

• Most trivially, one can leave out information such as part of speech, or detailed definitions

• This is useful as many users find information overload quite confusing and demotivating

• Can produce bilingual or monolingual dictionary

• Opportunities for various output styles, and formats such as RTF or TeX for printing.
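
The idea of leaving out fields for some users can be sketched as follows. Kirrkirr does this with XSL stylesheets; the Python below only imitates the effect of such a stylesheet, and the element names (`ENTRY`, `POS`, `GL`, `DEF`), the `hw` attribute, and the two user profiles are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical profiles: which child elements of an entry each user level sees.
PROFILES = {
    "beginner": ["GL"],
    "advanced": ["POS", "GL", "DEF"],
}


def format_entry(entry_xml, level):
    """Render one entry as plain text, keeping only the fields for this level."""
    entry = ET.fromstring(entry_xml)
    parts = [entry.get("hw", "")]
    for tag in PROFILES[level]:
        for field in entry.findall(tag):
            parts.append(f"{tag.lower()}: {field.text}")
    return " | ".join(parts)
```

The same stored entry yields a short gloss-only line for a beginner and a fuller line for an advanced user, which is the separation of structure from presentation that the XML representation makes possible.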



Rich typology of link types

• The semantically rich types of linkages present in a dictionary (synonym, antonym, hyponym, subheadword, variant, coverbs, …) solve one of the major problems of the web: we have many link types with a clear semantic interpretation

• Use consistent color-coded text and edges to show these link types

• Can tell where you are going before clicking

• Dictionary links can be supplemented by links derived from collocational analysis of texts


A collocations example

pangurnu ‘digging scoop’:
– pangurnu
– pili small coolamon/digging scoop
– rdaku hole in the ground
– kaninjarra downwards
– pangirni dig, produce cavity
– mulju soak in soft earth (dig for water)
– karlaja foot end of sleeping area
– pirrkirni scrape
– yirrarni put down
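
A list like the one above could come from counting which words occur near a target word in a corpus. The sketch below is a generic windowed co-occurrence count, not the analysis actually used for Kirrkirr; the window size is an assumption, and a fuller analysis would typically rank candidates with an association measure rather than raw counts.

```python
from collections import Counter


def collocates(tokens, target, window=2):
    """Count tokens occurring within `window` positions of each `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbours = tokens[lo:i] + tokens[i + 1:hi]
        counts.update(t for t in neighbours if t != target)
    return counts
```

Calling `collocates(tokens, "pangurnu").most_common(10)` on a tokenized text would then give candidate links to add alongside the links already recorded in the dictionary.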


Browsing

• Work (at PARC and elsewhere: Pirolli et al. 1996) has stressed role for browsing as well as searching in information access

• It provides a context for learning

• We provide browsing in several ways:
  – conventional hypertext
    • but with rich semantically-interpreted links
    • their color-coding matches network edges
  – network-based display of words
  – browsing through semantic domains


Semantic Domains

• Alphabetical order is one indexing strategy, but there are many others

• Most requested is ability to find things by semantic domains: e.g., food, manufactured items.

• Essentially the noun structure of WordNet, or the classical KR ISA hierarchy

• We can exploit the domain info in the dictionary



Other components

• Multimedia (currently pictures and audio)
  – Can hear pronunciations – gives a better understanding of pronunciation than phonetic symbols
  – pictures are more intelligible than descriptions
  – (future: videos of Warlpiri sign language?)

• Advanced search page
  – search various fields, regular expressions, fuzzy spelling, etc.

• Notes:
  – one can annotate dictionary entries (to correct or personalise)


Simple features

• Show the alphabet

• The list on the left gives concreteness, and tangibility
  – people can start with one of those words

• One can just type a few letters and then look at the list – traditional benefit of paper dictionary

• English lookup can be helpful when Warlpiri spelling fails

• “Fuzzy spelling” of Warlpiri – a user support
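
The slides do not say how fuzzy spelling is implemented. One generic approach is approximate string matching against the headword list, sketched here with `difflib.get_close_matches` from the Python standard library. The `limit` and `cutoff` values are arbitrary assumptions, and a version tuned for Warlpiri would more likely encode the specific spelling confusions that learners make.

```python
import difflib


def fuzzy_lookup(query, headwords, limit=3, cutoff=0.6):
    """Return up to `limit` headwords whose spelling is close to `query`."""
    return difflib.get_close_matches(
        query.lower(), headwords, n=limit, cutoff=cutoff
    )
```

With this in place a user who types a near miss still gets a short list of candidate headwords to click on, rather than an empty result.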


User study

Mim Corris (Yuendumu, Willowra), Jane Simpson (Lajamanu)

• Observation and testing with primary and (lower) secondary students

• Observation of Warlpiri literacy workers

• Comments from teachers, other adults etc.

• Purely qualitative observational studies of dictionary use.

• Initial reactions quite enthusiastic

• Could use as a basis for classroom activities (better with some further development: games and puzzles)


A positive anecdote

“One of the introductory Warlpiri literacy students, who had not been very interested in the literacy class, spent nearly 3/4 hour looking at Kirrkirr apparently in absorbed concentration. She wasn’t especially interested in the sound and picture possibilities. She moved between words, scrolling along the list, typing in the search, clicking on the words in the network pane. She wasn’t even put off when the dictionary definitions stopped appearing – looking at the networks of words instead. This is quite unlike her attitude to the backslash coded electronic dictionary (where she lost interest quickly because of the difficulty for her of narrowing down searches). After the Kirrkirr demo she asked if she could have a printed dictionary to take away with her to use in camp to learn the words. I interpret this as a desire to learn words in her own time and place.”


Endangered lang. dictionaries

(Corris, Manning, Poetsch, and Simpson 1999). Based on 72 people.

• Testing both paper and electronic dictionaries
  – competing goals: documentation dictionaries vs. maintenance/learning dictionaries
  – symbolic vs. practically useful organization
  – lack of training, and limited literacy can make paper dictionaries ineffective
    • 45–60 minutes for 12 dictionary lookups…
  – lack of electricity makes e-dictionaries ineffective in some places (e.g., Indonesia)

• E-dictionaries can solve many usability issues
  – font size, amount of info, ‘infinite’ space, easy lookup, sound


Conclusions

• Kirrkirr is just a prototype of what one can do to develop new ways to visualise lexicons

• We have demonstrated an approach to making dictionary information usable through the creation of an application which mediates between well-structured data and users’ needs for searching/browsing and presentation

• While we have focused our research on Warlpiri, the system can be easily applied to other languages – the design is general


Conclusions (cont.)

• “... The best future applications of MRDs in education will be those most able to respond to the insights and needs of their users” (Kegl 1995)

• Kirrkirr can be seen as a step towards the future of e-dictionaries
