Language resources and translation studies: “Search, find, archive and share”

Vesna Lušicky, University of Vienna, Austria

Tanja Wissik, Austrian Academy of Sciences, Austria

CLARIN Workshop, Vienna, 8th February 2018 Vesna Lušicky & Tanja Wissik

Overview

1. Introduction
2. Language resources
3. Language resource management lifecycle
4. Research infrastructures
5. Search, Find, Create, Use, Archive, Share
6. Benefits of CLARIN
7. Conclusions

Language Resources I.

“a set of speech or language data and descriptions in machine readable form, used for building, improving or evaluating natural language and speech algorithms or systems, or, as core resources for the software localisation and language services industries, for language studies, electronic publishing, international transactions, subject-area specialists and end users.” (ELRA)

Language Resources II.

• Data

• Tools

• Standards, annotation frameworks…

(based on Mörth 2017)

Language Resources III.

• Examples of language resources include:

– corpora (written and spoken; mono-, bi- and multilingual; parallel and comparable),

– translation memories,

– termbases (terminology databases),

– computational lexica,

– digital dictionaries,

– ontologies, etc.

Language Resources IV.

• Creation and processing:

○ Time-consuming and costly,

○ Can involve legal challenges,

○ Can involve technical challenges, etc.

Language Resource Management Lifecycle

The lifecycle comprises six stages:

• Search: lookup, exploratory search

• Find: research infrastructures (RI), catalogues, repositories

• Create: data collection, data processing

• Use: for research, in teaching

• Archive: institutional repository, domain-specific repository, infrastructural repository

• Share: repositories, catalogues, citation, legal issues

Research Infrastructures

“Facilities, resources and related services that are used by the scientific community to conduct top-level research in their respective fields and covers major scientific equipment or sets of instruments; knowledge-based resources such as collections, archives or structures for scientific information […]”

(Council Regulation (EC) No 723/2009 of 25 June 2009)

SEARCH

● Lookup

○ Has precise search goals, such as finding facts to answer a specific question.

● Exploratory search

○ Includes a variety of qualitative definitions

○ Open-ended search goals

○ Faceted search
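To make the contrast between lookup and faceted exploratory search concrete, here is a minimal Python sketch over a toy catalogue. The `Record` class, the facet names (`type`, `language`, `modality`) and the three entries are hypothetical; real catalogues define their own facets and run these operations on a search engine rather than on an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical catalogue entry: a title plus its facet values."""
    title: str
    facets: dict = field(default_factory=dict)  # e.g. {"language": "fin"}


def lookup(records, title):
    """Lookup: a precise goal with one expected answer (or None)."""
    return next((r for r in records if r.title == title), None)


def facet_counts(records, facet):
    """Exploratory step: how many records carry each value of a facet."""
    return Counter(r.facets[facet] for r in records if facet in r.facets)


def refine(records, **selected):
    """Narrow the result set to records matching every selected facet value."""
    return [r for r in records
            if all(r.facets.get(name) == value for name, value in selected.items())]


# Hypothetical toy catalogue; titles and facet values are invented.
catalogue = [
    Record("Corpus A", {"type": "corpus", "language": "fin", "modality": "spoken"}),
    Record("Corpus B", {"type": "corpus", "language": "fin", "modality": "written"}),
    Record("Termbase C", {"type": "termbase", "language": "deu"}),
]

print(facet_counts(catalogue, "type"))  # Counter({'corpus': 2, 'termbase': 1})
print([r.title for r in refine(catalogue, language="fin", modality="spoken")])  # ['Corpus A']
```

Showing counts per facet value before each refinement is what supports open-ended goals: the user sees what exists before deciding how to narrow down.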

FIND

● USE CASE I.

○ Finding a language resource.

● USE CASE II.

○ Finding elements within a resource.

FIND: Use Case I. Repositories

• Preserve, manage, and provide access to language resources in a variety of formats;

• Curated to enable search, discovery, and reuse;

• Persistent Identifiers;

• Data can be cited;

• Long-term preservation;

• Examples: CLARIN Centres, Zenodo, institutional repositories.

Source: https://repo.clarino.uib.no
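Persistent identifiers are only useful if they can be turned into a working link. The sketch below maps three common PID forms to public resolvers: hdl.handle.net for Handles, doi.org for DOIs, and urn.fi for Finnish URN:NBN identifiers (the form that appears, URL-encoded, in the VLO record links quoted in this deck). The prefix table and the example Handle are illustrative assumptions, not an exhaustive or official mapping.

```python
# Prefix -> (resolver base URL, whether to strip the prefix).
# The table is an illustrative assumption, not a complete PID registry.
RESOLVERS = [
    ("hdl:", "https://hdl.handle.net/", True),   # Handle System resolver
    ("doi:", "https://doi.org/", True),          # DOI resolver
    ("urn:nbn:fi:", "http://urn.fi/", False),    # Finnish URN:NBN resolver keeps the full URN
]


def resolve_url(pid: str) -> str:
    """Turn a persistent identifier into an actionable URL."""
    pid = pid.strip()
    if pid.lower().startswith(("http://", "https://")):
        return pid  # already a resolvable URL
    for prefix, base, strip_prefix in RESOLVERS:
        if pid.lower().startswith(prefix):
            return base + (pid[len(prefix):] if strip_prefix else pid)
    raise ValueError(f"unsupported PID scheme: {pid!r}")


print(resolve_url("urn:nbn:fi:lb-2016042502"))  # http://urn.fi/urn:nbn:fi:lb-2016042502
print(resolve_url("hdl:00000/EXAMPLE-1"))       # hypothetical Handle
```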

FIND: Use Case I. Catalogues

• Contain metadata* from repositories about language resources;

• Do not contain the data or tool itself but only refer to it;

• Often enable faceted search;

• Useful as a first step to find out which language resources are available.

* Data that provides information about other data.

Source: https://vlo.clarin.eu
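Catalogues such as the VLO hold metadata harvested from repositories rather than the resources themselves; in CLARIN this harvesting is done over OAI-PMH, with CMDI as the main metadata format. The minimal sketch below builds an OAI-PMH `ListRecords` request and extracts a title and a link from a response. For brevity it assumes simple Dublin Core (`oai_dc`) instead of CMDI, and the repository URL and sample response are invented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request for one repository endpoint."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def parse_records(xml_text):
    """Extract catalogue-style entries: metadata only, plus a link to the resource."""
    root = ET.fromstring(xml_text)
    entries = []
    for rec in root.iterfind(".//oai:record", NS):
        entries.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "link": rec.findtext(".//dc:identifier", default="", namespaces=NS),
        })
    return entries


# Hypothetical, heavily trimmed response from an invented repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example parallel corpus</dc:title>
          <dc:identifier>https://hdl.handle.net/00000/EXAMPLE-42</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repo.example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

Note that the entry stores only a link (`link`) to the resource, which mirrors the point above: the catalogue refers to the data but does not contain it.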

FIND: Use Case II.

● Aggregator: content search in several resources and several repositories at the same time.

● Useful as a first step to discover where interesting language resources are hosted.

Source: https://spraakbanken.gu.se/ws/fcs/2.0/aggregator
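The aggregator shown here belongs to CLARIN Federated Content Search, which sends one query to many SRU-based endpoints. The sketch below shows only the fan-out idea: build an SRU `searchRetrieve` URL per endpoint, query them in parallel, and keep failures separate so one unreachable centre does not break the search. It uses SRU 1.2 parameter names (FCS 2.0 endpoints use SRU 2.0 and further parameters), leaves fetching and parsing to a caller-supplied function, and uses invented endpoint URLs.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode


def sru_url(endpoint, query, max_records=10):
    """Build an SRU searchRetrieve request (SRU 1.2 parameter names)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": max_records,
    }
    return endpoint + "?" + urlencode(params)


def federated_search(endpoints, query, fetch, max_records=10):
    """Send one query to every endpoint in parallel; collect hits and failures.

    `fetch` is supplied by the caller (url -> list of hits), so the sketch
    makes no assumption about HTTP libraries or response parsing.
    """
    hits, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        futures = {ep: pool.submit(fetch, sru_url(ep, query, max_records))
                   for ep in endpoints}
        for ep, future in futures.items():
            try:
                hits[ep] = future.result()
            except Exception as exc:  # one unreachable endpoint must not sink the search
                failures[ep] = str(exc)
    return hits, failures


# Usage with a fake fetcher and invented endpoint URLs (no network needed).
def fake_fetch(url):
    if url.startswith("https://a.example.org/sru"):
        return ["... prieglobstis ..."]
    raise ConnectionError("endpoint unreachable")


hits, failures = federated_search(
    ["https://a.example.org/sru", "https://b.example.org/sru"], "prieglobstis", fake_fetch)
print(hits)
print(failures)
```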

FIND: Use Case II.

● Search results can be downloaded.

● Next step: Specialised search in the resource in the repository.

Source: https://spraakbanken.gu.se/ws/fcs/2.0/aggregator

FIND: Use Case II.

● Specialised search in the resource in the repository.

● Finding elements in authentic texts, e.g. in a monolingual corpus, as support for producing the target text.

○ Example: Corpus of the Contemporary Lithuanian Language

Source: http://corpus.vdu.lt/en/?word=prieglobstis
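Searching a monolingual corpus for a word, as in the example above, typically returns a keyword-in-context (KWIC) concordance. The minimal sketch below produces KWIC lines from plain text with a regular expression. It matches the keyword as a word prefix, which is only a crude stand-in for the lemma-based search that corpus query tools usually offer; the sample text is invented.

```python
import re


def kwic(text, keyword, width=30):
    """Keyword-in-context lines: (left context, match, right context).

    The pattern matches the keyword as a word prefix (e.g. "asylum", "asylums"),
    a crude substitute for lemma-based search.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}\w*", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left.rjust(width), m.group(0), right.ljust(width)))
    return lines


# Invented two-sentence "corpus" for illustration.
sample = "Asylum seekers filed claims. The asylum procedure takes months."
for left, match, right in kwic(sample, "asylum", width=20):
    print(f"{left} [{match}] {right}")
```

Aligning the keyword in a fixed column is the design choice that makes concordances useful to translators: recurring collocates to the left and right become visible at a glance.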

FIND: Use Case II.

● Specialised search in the resource in the repository.

● Finding contrastive elements, e.g. in an aligned parallel corpus

○ Example: EU DGT 2015

Source: https://www.clarin.si/kontext/
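In an aligned parallel corpus each source segment is paired with its translation, so a contrastive query can ask how a source term is rendered. The sketch below splits the aligned pairs containing a source term into those whose target side uses a given candidate equivalent and those that do not. The segment pairs and the term choices are invented, and plain substring matching stands in for the query language of a tool such as KonText.

```python
def contrastive_search(pairs, source_term, target_term):
    """Split aligned segments containing `source_term` by whether the target
    side contains `target_term` (case-insensitive substring match)."""
    src_term, tgt_term = source_term.casefold(), target_term.casefold()
    with_target, without_target = [], []
    for src, tgt in pairs:
        if src_term in src.casefold():
            bucket = with_target if tgt_term in tgt.casefold() else without_target
            bucket.append((src, tgt))
    return with_target, without_target


# Invented English-German segment pairs, standing in for an aligned corpus.
pairs = [
    ("The asylum procedure was suspended.", "Das Asylverfahren wurde ausgesetzt."),
    ("Asylum was granted.", "Es wurde Asyl gewährt."),
    ("The procedure ended.", "Das Verfahren endete."),
]
rendered, other = contrastive_search(pairs, "asylum", "Asylverfahren")
print(rendered)
print(other)
```

The second list is often the more interesting one for contrastive work: it collects the contexts in which translators chose a different equivalent.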

CREATE

● Data collection

- Data found in CLARIN

- Data collection from scratch

● Data processing

- CLARIN tools/services for data processing

- Tool chains for data processing
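A tool chain passes the output of one processing step to the next; WebLicht, for instance, lets users chain web services such as tokenisers and taggers. The sketch below shows only the chaining principle with plain Python functions. The steps, the tiny stopword list and the sample sentence are hypothetical, and nothing here reproduces WebLicht's services or its exchange format.

```python
import re
from collections import Counter
from functools import reduce


def tokenise(text):
    """Split text into word tokens (a deliberately naive tokeniser)."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    return [t.lower() for t in tokens]


def drop_stopwords(tokens, stopwords=frozenset({"the", "a", "of", "and"})):
    return [t for t in tokens if t not in stopwords]


def frequency_list(tokens):
    """Word frequencies, most frequent first."""
    return Counter(tokens).most_common()


def chain(*steps):
    """Compose steps so each one consumes the previous step's output."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


pipeline = chain(tokenise, lowercase, drop_stopwords, frequency_list)
print(pipeline("The corpus and the termbase of the corpus"))  # [('corpus', 2), ('termbase', 1)]
```

Because every step has the same shape (input in, output out), steps can be swapped or reordered without touching the others, which is the main appeal of chained processing.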

CREATE: Example WebLicht

Source: https://weblicht.sfs.uni-tuebingen.de/weblicht/

CREATE: Example ContaWords

Source: http://contawords.iula.upf.edu

USE: Research – Example

Source: https://vlo.clarin.eu/record?0&docId=http_58__47__47_urn.fi_47_urn_58_nbn_58_fi_58_lb-2016042502&q=the+finnish+broadcasting+corpus&index=1&count=5

Source: https://vlo.clarin.eu/record?3&docId=http_58__47__47_urn.fi_47_urn_58_nbn_58_fi_58_lb-20140730134&q=finnish+broadcasting+corpus&index=0&count=6

USE: Research – Example

Example: Mikhailov, M., & Cooper, R. (2016). Corpus Linguistics for Translation and Contrastive Studies: A Guide for Research. Routledge.

Source: https://vlo.clarin.eu/record?4&docId=http_58__47__47_urn.fi_47_urn_58_nbn_58_fi_58_lb-201405278&q=mulcold&index=1&count=5

USE: Research – Example

Mörth, Karlheinz, Daniel Schopper, and Omar Siam. 2017. Linking Instead of Lemmatising: Enriching the TUNICO Corpus with the Dictionary of Tunis Arabic. In Tunisian and Libyan Arabic Dialects: Common Trends - Recent Developments - Diachronic Aspects, ed. V. Ritt-Benmimoun, 219-238. Zaragoza: Prensas de la Universidad de Zaragoza.

Source: https://vlo.clarin.eu, accessed 7th February 2018

Source: https://arche.acdh.oeaw.ac.at, accessed 7th February 2018

USE: Research – Example

● Maegaard, B. et al. (2017). Observatory for Language Resources and Machine Translation in Europe – LT_Observatory. In: Quesada, J., Martín Mateos, F. J., López Soto, T. (eds.) Future and Emerging Trends in Language Technology. Machine Learning and Big Data. FETLT 2016. Lecture Notes in Computer Science, vol. 10341. Springer, Cham.

USE: In the classroom

● Technology-related courses

● Information mining

● Language-specific courses

Source: Lušicky, Vesna (2017). Towards convergence of e-research in translation studies and blended learning in translator training through technology and language resources.

Source: Centre for Translation Studies, University of Vienna.

ARCHIVE

“A curation activity that ensures that data are properly selected, stored, and can be accessed, and for which logical and physical integrity are maintained over time, including security and authenticity”

(http://dictionary.casrai.org/Archiving)
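One common way to check that physical integrity is maintained over time is a fixity check: record a checksum for every file at deposit and recompute it later. The sketch below does this with SHA-256 for a directory of files. It is a generic illustration under that assumption, not a description of how CLARIN centres or ARCHE verify their holdings, and the deposited file is invented.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large corpora need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root):
    """Map every file under `root` (relative path) to its SHA-256 checksum."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root, manifest):
    """List files whose checksum changed or that are missing since deposit."""
    current = make_manifest(root)
    return sorted(name for name, checksum in manifest.items()
                  if current.get(name) != checksum)


def demo():
    """Deposit one invented file, verify it, alter it, verify again."""
    with tempfile.TemporaryDirectory() as d:
        Path(d, "corpus.txt").write_text("version 1", encoding="utf-8")
        manifest = make_manifest(d)
        before = verify(d, manifest)
        Path(d, "corpus.txt").write_text("version 2", encoding="utf-8")
        after = verify(d, manifest)
    return before, after


print(demo())  # ([], ['corpus.txt'])
```

The manifest itself has to be stored safely alongside the data; packaging conventions such as BagIt keep exactly this kind of checksum manifest next to the payload.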

Repositories

● Institutional repository

● Domain-specific repository

● Infrastructural repository

CLARIN depositing services

● Decentralized

● CLARIN Centres

● List of CLARIN Centres: https://www.clarin.eu/content/depositing-services

Source: https://www.clarin.eu/content/services

SHARE

● Citability

● Persistent identifiers: assigned permanent addresses

● Licences

Source: https://arche.acdh.oeaw.ac.at/browser/
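Citability, persistent identifiers and licences come together in a data citation. The sketch below assembles a citation string from deposit metadata, loosely modelled on the pattern DataCite recommends (creator, year, title, version, publisher, identifier) and appending the licence. The `DatasetMetadata` fields and the example values are hypothetical; repositories usually generate their own citation text.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata needed to cite a deposited resource."""
    creators: list
    year: int
    title: str
    publisher: str
    pid_url: str        # resolvable persistent identifier
    version: str = ""
    licence: str = ""


def format_citation(md):
    """Creator (Year). Title. Version. Publisher. Identifier, plus the licence."""
    parts = [f"{'; '.join(md.creators)} ({md.year}). {md.title}."]
    if md.version:
        parts.append(f"Version {md.version}.")
    parts.append(f"{md.publisher}.")
    parts.append(md.pid_url)
    citation = " ".join(parts)
    if md.licence:
        citation += f" Licence: {md.licence}."
    return citation


example = DatasetMetadata(["Doe, Jane", "Roe, Richard"], 2018, "Example Termbase",
                          "Example Repository", "https://hdl.handle.net/00000/0000",
                          version="1.0", licence="CC BY 4.0")
print(format_citation(example))
```

Putting the resolvable PID rather than a repository page URL in the citation is what keeps the reference working if the repository reorganises its site.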

BENEFITS OF CLARIN

• Time and cost effectiveness: re-use of language resources from existing repositories and catalogues.

• Accessibility: distributed access to a larger number of resources.

• Durability: archiving your own data so that it survives technological change.

• Discoverability: make your data discoverable.

• Citability: make your data citable and traceable for other researchers.

• Interoperability: make resources reusable in various tools and scenarios.

Thank you!

Questions?
