Contextualising concordances for corpusCALL

EUROCALL September 2006 — Universidad de Granada

Hans Paulussen & Piet Desmet

K.U.Leuven / KULAK

ALT Research Center on CALL

Overview

• Corpora for CALL: samples

• Types of sample rendition

• XML: new opportunities

Corpora for CALL /1

• Corpora for learning activities
  – before: preparation of exercises
  – during: corpus material as part of the learning activity
  – after: corpus material for feedback

Corpora for CALL /2

• Corpora as reference material
  – learner dictionaries
  – learner grammars

Corpora during learning activities

• corpus is part of learning activity
  – Mariana (Vordingborg Gymnasium, Denmark)
  – http://www.vordingbg-gym.dk/km/ict4lt/

• corpus supports learning activity
  – NEDERLEX (FUNDP Namur)
  – http://obelix.droit.fundp.ac.be/droit1/index.php

REBECA

• Ressources Electroniques Bilingues Extraites de Corpus Alignés (bilingual electronic resources extracted from aligned corpora)

• Parallel corpus: 5,000,000 Dutch, 5,000,000 French

• automatic corpus selection

• sentence alignment

Resource links

REBECA

• aligned bilingual corpus

• filtered KWIC files

• course texts

• lexicon

http://corpora.informatik.uni-leipzig.de/

Drinking glasses /1

• At the winery, Evxinograd's director, Ivan Penkov, is pouring out glasses of his 20-year-old brandy. (source: Wall Street Journal 1991)

• Drinking glasses are plastic and are weighted on the bottom so that they are tough to knock over. (source: Wall Street Journal 1991)

• Noting that Winston Churchill regularly drank several glasses of whiskey, brandy, champagne, and at least one high-ball during the working day, the Economist observed on March 4, "he could never have been trusted to run the Pentagon." (source: Wall Street Journal 1989)

• The two drain their glasses in one gulp. (source: Wall Street Journal 1991)

Drinking glasses /2

• Yet, consumerism lived, even if it didn't fill too many champagne glasses. (source: Wall Street Journal 1988)

• After Investcorp took over, Tiffany played down its $10 wine glasses to concentrate on the high-priced diamonds and gold jewelry that had made it famous. (source: Wall Street Journal 1990)

• At a black-tie benefit hosted by ARA Services Inc. a few weeks ago, Chairman Joseph Neubauer and members of his management team exuded confidence as they moved from one dinner table to the next, shaking hands, patting backs and clinking glasses. (source: Wall Street Journal 1988)

Spectacles

• Mr. Brown wears tinted aviator glasses, combat boots, and a Soldier of Fortune cap on his closely shaved head. (source: Wall Street Journal 1991)

• She must wear prism glasses to correct double vision caused by the accident. (source: Wall Street Journal 1991)

• In meetings, he often can be seen chewing on the end of his reading glasses; sometimes, he speaks so softly that he can't be heard. (source: Wall Street Journal 1990)

• His horn-rimmed glasses and rakish beret were irresistibly photogenic. (source: Wall Street Journal 1987)

Problematic glasses

• And its replaceable filters are good for only about 100 glasses. (source: Wall Street Journal 1991)

• I still have to find his glasses and keys for him. (source: Wall Street Journal 1988)

• The police confiscated her watch and glasses. (source: Wall Street Journal 1989)

• He plans to hand out 100 glasses when he performs in Washington, D.C., in December at the Kennedy Center's Mozart Festival. (source: Wall Street Journal 1991)

• He often hands out glasses to his audience and has them play chords. (source: Wall Street Journal 1991)

• The glasses were my idea. (source: Wall Street Journal 1988)

Rendering authentic text samples

• Linked samples

• Extracted samples

• Embedded samples

Linked samples

• The sample is linked to the original document (e.g. PDF document)
  – Original context & layout
  – Full context
  – Problem: sample skimming

Extracted samples

• The example is extracted from the original document (e.g. KWIC concordance)
  – Sample shown in immediate context
  – Layout: not authentic
  – Context: limited
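
A minimal sketch of how an extracted sample of this kind might be produced: a keyword-in-context (KWIC) routine that cuts a fixed window of characters around each hit. The function names `kwic` and `format_kwic` and the `width` parameter are illustrative assumptions, not part of any tool mentioned on the slides; the sample sentences are taken from the "glasses" slides.

```python
import re

def kwic(text, keyword, width=30):
    """Return one (left, match, right) tuple per whole-word hit of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]   # context before the hit
        right = text[m.end():m.end() + width]              # context after the hit
        lines.append((left, m.group(0), right))
    return lines

def format_kwic(lines, width=30):
    """Align the keyword in a fixed column, as in a classic concordance."""
    return [f"{left:>{width}} [{kw}] {right:<{width}}".rstrip()
            for left, kw, right in lines]

sample = ("The two drain their glasses in one gulp. "
          "She must wear prism glasses to correct double vision caused by the accident.")

for row in format_kwic(kwic(sample, "glasses")):
    print(row)
```

The fixed character window is what makes the context "limited": the original layout and everything outside the window are discarded.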

Embedded samples

• The example is embedded in the original document
  – Sample shown in full context
  – Layout: authentic
  – Problem: recreating and indexing the document
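
As a rough illustration of the "recreating and indexing" step, the sketch below wraps every hit in the full text in an anchored `span` and returns an index of anchor ids, so a concordance line could link straight to the sample in its full context. The function name `embed_samples`, the `hit-N` id scheme and the `hit` class are assumptions made for this example only.

```python
import html
import re

def embed_samples(document, keyword):
    """Return (xhtml_fragment, index); index lists (anchor_id, character_offset)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    parts, index, pos = [], [], 0
    for n, m in enumerate(pattern.finditer(document), start=1):
        hit_id = f"hit-{n}"
        parts.append(html.escape(document[pos:m.start()]))   # text before the hit
        parts.append(f'<span class="hit" id="{hit_id}">'
                     f'{html.escape(m.group(0))}</span>')     # the hit itself
        index.append((hit_id, m.start()))
        pos = m.end()
    parts.append(html.escape(document[pos:]))                 # remaining text
    return "<p>" + "".join(parts) + "</p>", index

doc = "I still have to find his glasses and keys for him. The glasses were my idea."
fragment, index = embed_samples(doc, "glasses")
print(fragment)
print(index)
```

This only handles plain text; a real document with layout and images would need the markup-aware approach discussed on the XML slides.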

XML -> XHTML

• XML: extensible markup language

• Stylesheets:
  – CSS: Cascading Style Sheets
  – XSLT: Extensible Stylesheet Language Transformations

• XHTML

Web reinvents standardisation

• SGML: standard generalized markup language (1968; ISO in 1986)

• HTML: hypertext markup language (1993)

• XML: extensible markup language (1998)

• XHTML: extensible HTML

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE poème SYSTEM "poemfr.dtd">
<poème>
  <préambule>
    <titre>Chanson d'automne</titre>
    <recueil>Poèmes saturniens</recueil>
    <date>1866</date>
    <auteur>Paul Verlaine</auteur>
  </préambule>
  <corps>
    <stance>
      <ligne>Les sanglots longs</ligne>
      <ligne>Des violons</ligne>
      <ligne><r/>De l'automne</ligne>
      <ligne>Blessent mon coeur</ligne>
      <ligne>D'une langueur</ligne>
      <ligne><r/>Monotone.</ligne>
    </stance>
    <!-- ... -->
  </corps>
</poème>
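
The XML -> XHTML step would normally be written as an XSLT stylesheet. As a stand-in, the sketch below does a comparable transformation with Python's standard `xml.etree.ElementTree`, which has no XSLT support. The function name `poem_to_xhtml`, the output layout (`h1`, `p`, `br`) and the class names are assumptions for illustration, and the input is a shortened copy of the poem markup.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the poem markup (no DOCTYPE, first three lines only).
POEM = """<poème>
  <préambule><titre>Chanson d'automne</titre><auteur>Paul Verlaine</auteur></préambule>
  <corps><stance>
    <ligne>Les sanglots longs</ligne>
    <ligne>Des violons</ligne>
    <ligne><r/>De l'automne</ligne>
  </stance></corps>
</poème>"""

def poem_to_xhtml(xml_text):
    """Build an XHTML tree from the poem and return it as a string."""
    poem = ET.fromstring(xml_text)
    page = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    head = ET.SubElement(page, "head")
    ET.SubElement(head, "title").text = poem.findtext("préambule/titre")
    body = ET.SubElement(page, "body")
    ET.SubElement(body, "h1").text = poem.findtext("préambule/titre")
    ET.SubElement(body, "p", {"class": "author"}).text = poem.findtext("préambule/auteur")
    for stance in poem.iter("stance"):
        p = ET.SubElement(body, "p", {"class": "stanza"})
        for i, ligne in enumerate(stance.findall("ligne")):
            text = "".join(ligne.itertext())   # ignores the empty <r/> marker
            if i == 0:
                p.text = text
            else:
                ET.SubElement(p, "br").tail = text   # line break, then the next line
    return ET.tostring(page, encoding="unicode")

print(poem_to_xhtml(POEM))
```

The same XML source could feed several such renditions (plain stanza, highlighted sample, concordance view) without touching the text itself.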

poemfr.dtd

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- poemfr.dtd: DTD for poetry, M. Goossens -->
<!ELEMENT poème     (préambule, corps)>
<!ELEMENT préambule (titre, recueil?, date?, auteur)>
<!ELEMENT titre     (#PCDATA)>
<!ELEMENT recueil   (#PCDATA)>
<!ELEMENT date      (#PCDATA)>
<!ELEMENT auteur    (#PCDATA)>
<!ELEMENT corps     (stance|ligne)+>
<!ELEMENT stance    (ligne)+>
<!ELEMENT ligne     (#PCDATA|r)*>
<!ELEMENT r         EMPTY>

xpath

$ xpath -e '//*/stance[contains(., "langueur")]' Verlaine1.xml
Found 1 nodes in Verlaine1.xml:
-- NODE --
<stance><ligne>Les sanglots longs</ligne><ligne>Des violons</ligne><ligne><r />De l'automne</ligne><ligne>Blessent mon coeur</ligne><ligne>D'une langueur</ligne><ligne><r />Monotone.</ligne></stance>
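
The same lookup can be sketched in Python. The XPath subset in the standard `xml.etree.ElementTree` module does not include the `contains()` function, so this version filters `stance` elements in plain Python instead. The function name `stanzas_containing` is an assumption, and the XML string is a cut-down stand-in for Verlaine1.xml.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for Verlaine1.xml (first stanza only, no DOCTYPE).
XML = """<poème><corps>
<stance><ligne>Les sanglots longs</ligne><ligne>Des violons</ligne><ligne><r/>De l'automne</ligne><ligne>Blessent mon coeur</ligne><ligne>D'une langueur</ligne><ligne><r/>Monotone.</ligne></stance>
</corps></poème>"""

def stanzas_containing(xml_text, word):
    """Return every <stance> element whose text content contains word."""
    root = ET.fromstring(xml_text)
    return [stance for stance in root.iter("stance")
            if word in "".join(stance.itertext())]

hits = stanzas_containing(XML, "langueur")
print(f"Found {len(hits)} node(s)")
for stance in hits:
    print(ET.tostring(stance, encoding="unicode"))
```

Returning the whole stanza rather than the single matching line is what gives the sample its wider context.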

Conclusion

• Recreating an authentic document containing indexed samples is feasible

• At what cost?
  – Full control of production cycle
  – Text and images?
  – Optimisation of on-the-fly rendition
