Page 1: Contextualising concordances for corpusCALL

EUROCALL September 2006 — Universidad de Granada

Contextualising concordances for corpusCALL

Hans Paulussen & Piet Desmet

K.U.Leuven / KULAK

ALT Research Center on CALL

Page 2: Contextualising concordances for corpusCALL

Overview

• Corpora for CALL: samples

• Types of sample rendition

• XML: new opportunities

Page 3: Contextualising concordances for corpusCALL

Corpora for CALL /1

• Corpora for learning activities
  – before: preparation of exercises
  – during: corpus material as part of the learning activity
  – after: corpus material for feedback

Page 4: Contextualising concordances for corpusCALL

Corpora for CALL /2

• Corpora as reference material
  – learner dictionaries
  – learner grammars

Page 5: Contextualising concordances for corpusCALL

Corpora during learning activities

• corpus is part of learning activity
  – Mariana (Vordingborg Gymnasium, Denmark)
    • http://www.vordingbg-gym.dk/km/ict4lt/

• corpus supports learning activity
  – NEDERLEX (FUNDP Namur)
    • http://obelix.droit.fundp.ac.be/droit1/index.php

Page 7: Contextualising concordances for corpusCALL

REBECA

• Ressources Electroniques Bilingues Extraites de Corpus Alignés (bilingual electronic resources extracted from aligned corpora)

• Parallel corpus: 5,000,000 Dutch, 5,000,000 French

• automatic corpus selection

• sentence alignment

Page 9: Contextualising concordances for corpusCALL

Resource links

[Diagram with the labels: REBECA, aligned bilingual corpus, filtered KWIC files, course texts, lexicon]

Page 10: Contextualising concordances for corpusCALL

http://corpora.informatik.uni-leipzig.de/

Page 11: Contextualising concordances for corpusCALL

Drinking glasses /1

• At the winery, Evxinograd's director, Ivan Penkov, is pouring out glasses of his 20-year-old brandy. (source: Wall Street Journal 1991)

• Drinking glasses are plastic and are weighted on the bottom so that they are tough to knock over. (source: Wall Street Journal 1991)

• Noting that Winston Churchill regularly drank several glasses of whiskey, brandy, champagne, and at least one high-ball during the working day, the Economist observed on March 4, "he could never have been trusted to run the Pentagon." (source: Wall Street Journal 1989)

• The two drain their glasses in one gulp. (source: Wall Street Journal 1991)

Page 12: Contextualising concordances for corpusCALL

Drinking glasses /2

• Yet, consumerism lived, even if it didn't fill too many champagne glasses. (source: Wall Street Journal 1988)

• After Investcorp took over, Tiffany played down its $10 wine glasses to concentrate on the high-priced diamonds and gold jewelry that had made it famous. (source: Wall Street Journal 1990)

• At a black-tie benefit hosted by ARA Services Inc. a few weeks ago, Chairman Joseph Neubauer and members of his management team exuded confidence as they moved from one dinner table to the next, shaking hands, patting backs and clinking glasses. (source: Wall Street Journal 1988)

Page 13: Contextualising concordances for corpusCALL

Spectacles

• Mr. Brown wears tinted aviator glasses, combat boots, and a Soldier of Fortune cap on his closely shaved head. (source: Wall Street Journal 1991)

• She must wear prism glasses to correct double vision caused by the accident. (source: Wall Street Journal 1991)

• In meetings, he often can be seen chewing on the end of his reading glasses; sometimes, he speaks so softly that he can't be heard. (source: Wall Street Journal 1990)

• His horn-rimmed glasses and rakish beret were irresistibly photogenic. (source: Wall Street Journal 1987)

Page 14: Contextualising concordances for corpusCALL

Problematic glasses

• And its replaceable filters are good for only about 100 glasses. (source: Wall Street Journal 1991)

• I still have to find his glasses and keys for him. (source: Wall Street Journal 1988)

• The police confiscated her watch and glasses. (source: Wall Street Journal 1989)

• He plans to hand out 100 glasses when he performs in Washington, D.C., in December at the Kennedy Center's Mozart Festival. (source: Wall Street Journal 1991)

• He often hands out glasses to his audience and has them play chords. (source: Wall Street Journal 1991)

• The glasses were my idea. (source: Wall Street Journal 1988)
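
The three groups of samples above suggest that the sentence alone sometimes carries a clear cue for the sense of "glasses" and sometimes does not. A minimal sketch of that idea follows, assuming two hand-picked cue lists (`DRINK_CUES` and `SPECTACLE_CUES` are hypothetical and taken only from words that occur in the slide samples); it is an illustration, not the procedure used by the authors.

```python
import re

# Hypothetical cue lists, chosen by hand from the sample sentences on the slides.
DRINK_CUES = {"brandy", "champagne", "wine", "whiskey", "drain", "pouring", "clinking"}
SPECTACLE_CUES = {"wear", "wears", "reading", "tinted", "prism", "horn-rimmed"}


def guess_sense(sentence: str) -> str:
    """Return 'drinking', 'spectacles' or 'undecided' for one corpus sample."""
    # Lower-cased word forms, keeping hyphenated words such as "horn-rimmed" whole.
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower()))
    drink = bool(words & DRINK_CUES)
    spectacles = bool(words & SPECTACLE_CUES)
    if drink and not spectacles:
        return "drinking"
    if spectacles and not drink:
        return "spectacles"
    # No cue, or conflicting cues: the sentence alone does not settle the sense.
    return "undecided"


samples = [
    "The two drain their glasses in one gulp.",
    "She must wear prism glasses to correct double vision caused by the accident.",
    "The glasses were my idea.",
]
for s in samples:
    print(guess_sense(s), "|", s)
```

Samples that come back as "undecided" are the ones for which a learner would need more than the isolated sentence.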

Page 15: Contextualising concordances for corpusCALL

Rendering authentic text samples

• Linked samples

• Extracted samples

• Embedded samples

Page 16: Contextualising concordances for corpusCALL

Linked samples

• The sample is linked to the original document (e.g. PDF document)
  – Original context & layout
  – Full context
  – Problem: sample skimming

Page 17: Contextualising concordances for corpusCALL

Extracted samples

• The example is extracted from the original document (e.g. KWIC concordance)
  – Sample shown in immediate context
  – Layout: not authentic
  – Context: limited
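
As an illustration of an extracted sample, the sketch below builds a simple KWIC (keyword in context) concordance from a plain-text string. The function name `kwic` and the `width` parameter are assumptions made for this example; real concordancers typically work on tokenised and indexed corpora rather than raw strings.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one fixed-width concordance line per whole-word match of keyword."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        # Cut a window of at most `width` characters on each side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context and left-align the right context
        # so that the keyword lines up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


text = (
    "The two drain their glasses in one gulp. "
    "The police confiscated her watch and glasses."
)
for line in kwic(text, "glasses", width=25):
    print(line)
```

The fixed window makes the limits listed above visible: the original layout is lost and the context is cut off at an arbitrary character position.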

Page 18: Contextualising concordances for corpusCALL

Embedded samples

• The example is embedded in the original document
  – Sample shown in full context
  – Layout: authentic
  – Problem: recreating and indexing the document
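
One way to picture "recreating and indexing the document" is to re-emit the full text with every sample wrapped in an addressable element. The sketch below does this for a plain-text document; the function name `embed_samples`, the `sample-N` id scheme and the `class="sample"` attribute are assumptions for illustration only, not the markup used in the project.

```python
import html
import re


def embed_samples(document: str, keyword: str) -> tuple[str, list[str]]:
    """Wrap each whole-word match in a span with an id; return (markup, list of ids)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    parts, ids, last = [], [], 0
    for n, m in enumerate(pattern.finditer(document), start=1):
        anchor = f"sample-{n}"  # hypothetical id scheme
        ids.append(anchor)
        # Text before the match is copied unchanged (escaped for markup).
        parts.append(html.escape(document[last:m.start()]))
        parts.append(
            f'<span class="sample" id="{anchor}">{html.escape(m.group(0))}</span>'
        )
        last = m.end()
    parts.append(html.escape(document[last:]))
    return "<p>" + "".join(parts) + "</p>", ids


doc = "I still have to find his glasses and keys for him. The glasses were my idea."
markup, index = embed_samples(doc, "glasses")
print(index)
print(markup)
```

Each id in the returned list can serve as a link target, so a concordance entry can point to the sample inside the full text instead of showing a cut-out window.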

Page 19: Contextualising concordances for corpusCALL

XML -> XHTML

• XML: extensible markup language

• Stylesheets:
  – CSS: Cascading Style Sheets
  – XSLT: Extensible Stylesheet Language Transformations

• XHTML
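
The XML to XHTML step can be sketched as follows. The slide names XSLT for this transformation; because the Python standard library has no XSLT processor, the sketch performs the same kind of element-to-element mapping with `xml.etree.ElementTree` instead. The input is an abridged, closed copy of the poem document used later in this deck, and the output element choices (`div`, `h1`, `p`, `br`) are assumptions.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the poem document (assumption: shortened and closed for the example).
POEM = """<poème>
  <préambule>
    <titre>Chanson d'automne</titre>
    <auteur>Paul Verlaine</auteur>
  </préambule>
  <corps>
    <stance>
      <ligne>Les sanglots longs</ligne>
      <ligne>Des violons</ligne>
    </stance>
  </corps>
</poème>"""


def poem_to_xhtml(xml_text: str) -> str:
    """Map the poem markup onto an XHTML-style fragment."""
    poem = ET.fromstring(xml_text)
    body = ET.Element("div", {"class": "poem"})
    ET.SubElement(body, "h1").text = poem.findtext("préambule/titre")
    ET.SubElement(body, "p", {"class": "author"}).text = poem.findtext("préambule/auteur")
    for stance in poem.iter("stance"):
        p = ET.SubElement(body, "p", {"class": "stanza"})
        lines = ["".join(ligne.itertext()) for ligne in stance.findall("ligne")]
        for i, line in enumerate(lines):
            if i == 0:
                p.text = line
            else:
                # Each further verse line follows a line break.
                ET.SubElement(p, "br").tail = line
    return ET.tostring(body, encoding="unicode")


print(poem_to_xhtml(POEM))
```

An XSLT stylesheet would express the same mapping declaratively as templates, one per source element.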

Page 20: Contextualising concordances for corpusCALL

Web reinvents standardisation

• SGML: standard generalized markup language (1968; ISO in 1986)

• HTML: hypertext markup language (1993)

• XML: extensible markup language (1998)

• XHTML: extensible HTML

Page 25: Contextualising concordances for corpusCALL

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE poème SYSTEM "poemfr.dtd">
<poème>
  <préambule>
    <titre>Chanson d'automne</titre>
    <recueil>Poèmes saturniens</recueil>
    <date>1866</date>
    <auteur>Paul Verlaine</auteur>
  </préambule>
  <corps>
    <stance>
      <ligne>Les sanglots longs</ligne>
      <ligne>Des violons</ligne>
      <ligne><r/>De l'automne</ligne>
      <ligne>Blessent mon coeur</ligne>
      <ligne>D'une langueur</ligne>
      <ligne><r/>Monotone.</ligne>
    </stance>

Page 26: Contextualising concordances for corpusCALL

poem.dtd

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- poemfr.dtd: DTD for poetry, M. Goossens -->
<!ELEMENT poème (préambule, corps)>
<!ELEMENT préambule (titre, recueil?, date?, auteur)>
<!ELEMENT titre (#PCDATA)>
<!ELEMENT recueil (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT auteur (#PCDATA)>
<!ELEMENT corps (stance|ligne)+>
<!ELEMENT stance (ligne)+>
<!ELEMENT ligne (#PCDATA|r)*>
<!ELEMENT r EMPTY>

Page 27: Contextualising concordances for corpusCALL

xpath

$ xpath -e '//*/stance[contains(., "langueur")]' Verlaine1.xml
Found 1 nodes in Verlaine1.xml:
-- NODE --
<stance>
  <ligne>Les sanglots longs</ligne>
  <ligne>Des violons</ligne>
  <ligne><r />De l'automne</ligne>
  <ligne>Blessent mon coeur</ligne>
  <ligne>D'une langueur</ligne>
  <ligne><r />Monotone.</ligne>
</stance>
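
The same query can be approximated in Python. `xml.etree.ElementTree` supports only a limited XPath subset that does not include `contains()`, so the sketch below walks the `stance` elements and tests their concatenated text, which corresponds to the string value that `contains(., ...)` inspects. The function name `stanzas_containing` and the trimmed, closed copy of the poem are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the poem (assumption: preamble omitted, closing tags added).
POEM = """<poème><corps>
<stance>
  <ligne>Les sanglots longs</ligne>
  <ligne>Des violons</ligne>
  <ligne><r/>De l'automne</ligne>
  <ligne>Blessent mon coeur</ligne>
  <ligne>D'une langueur</ligne>
  <ligne><r/>Monotone.</ligne>
</stance>
</corps></poème>"""


def stanzas_containing(xml_text: str, word: str) -> list[ET.Element]:
    """Return every stance element whose text content contains word."""
    root = ET.fromstring(xml_text)
    # itertext() yields all descendant text, like the XPath string value of a node.
    return [s for s in root.iter("stance") if word in "".join(s.itertext())]


hits = stanzas_containing(POEM, "langueur")
print(f"Found {len(hits)} stanza(s)")
for stance in hits:
    for ligne in stance.findall("ligne"):
        print("".join(ligne.itertext()))
```

Returning the whole `stance` rather than the matching `ligne` gives the sample together with its surrounding verse lines.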

Page 34: Contextualising concordances for corpusCALL

Conclusion

• Recreating an authentic document containing indexed samples is feasible

• At what cost?
  – Full control of production cycle
  – Text and images?
  – Optimisation of on-the-fly rendition