Language Materials/Tools and CLARIN
Bente Maegaard
Centre for Language Technology
University of Copenhagen
DIGHUMLAB 1 – Language Materials and Tools
Objective of DIGHUMLAB
• Provide humanities researchers with access to digital data and tools for research and education
The theme Language Materials and Tools covers all types of collections of materials expressed in language, whether written or spoken, including multimodal materials such as videos.
Collections of materials are of course interesting, but the real value comes with the services which are offered together with the materials – tools for analysing, visualising, storing and retrieving, modifying, comparing, annotating etc.
Many humanities researchers may think: Do we need this? Or why do we need this?
Slide 2
Centre for Language Technology
Aarhus September 2012
Why is this interesting?
The answer is that many sciences have benefitted immensely from introducing IT into their research.
• Makes it easier to do the same things as before
• More importantly: adds new dimensions and new perspectives – and maybe new research questions
• Opens the possibility of sharing
• But of course not all research will benefit
In what way is this new?
It is true that many researchers have been using digital resources and tools for years.
What makes this special is that everything gets under the same roof – in two ways
• National roof: DIGHUMLAB
• International roof: CLARIN
Another important feature is that this research infrastructure is meant to be persistent – if not eternal.
Data stored here will remain available after your research project is over, and more data may be provided by others.
Collaboration is made possible and supported this way.
DIGHUMLAB and CLARIN
This part of DIGHUMLAB relates to the European research infrastructure CLARIN ERIC, to be presented later.
So, the activities that are being performed constitute the Danish contribution to CLARIN, and some of them are coordinated by CLARIN.
Our work plan in short
• Collect digital material, make available
• Collect and create tools, make available
• Improve and extend the existing technical infrastructure
• Disseminate knowledge about language resources, tools etc. – knowledge sharing at the national level as well as the international level
• Provide a CLARIN technical centre
Workplan – digital resources and tools
First step – a survey of existing resources and needs
• Information meetings and follow-up meetings with researchers at the Danish universities
• Identify existing resources which could be integrated
• Determine the need for update, conversion, etc. to CLARIN formats
• Clarify the existing licenses, copyrights etc.
• Identify needs – what is needed in order for a teacher to use language materials and tools in the classroom, for exercises etc.? Very important for the take-up, so that the next generation is better prepared.
• We have had meetings at the University of Copenhagen, and will visit all universities (with humanities faculties) in Denmark
The same applies to tools, but we already have a long list of wishes for tools and services.
Examples of data and services
Data
• Text, old and modern
• Literature, language for special purposes
• Parallel texts for translation studies
• Videos: audio and gestures
• Newspapers, news on other media
• Parliament debates
• Tombstones
Services
• Add annotation (e.g. morphology, lemma, analysis of gestures)
• Search all occurrences of the same gesture
• Find the most common pattern of xx
• Find all names in historical texts
• Find all different pronunciations of the letter 'a' in Danish and their frequency
• Find positive or negative expressions relating to the bank sector in FR and DK newspapers between 1980 and 2000
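To make the service examples above concrete, here is a minimal sketch of two of them: counting the most common lemma and finding names. It assumes the text has already been annotated with lemma and part-of-speech tags. The sample tokens, the tag names (NOUN, VERB, PROPN) and the function names are hypothetical illustrations, not the clarin.dk interface.

```python
from collections import Counter

# Hypothetical annotated tokens: (word form, lemma, part-of-speech tag).
# The sample data and tag set are illustrative assumptions only.
tokens = [
    ("Banken", "bank", "NOUN"),
    ("stiger", "stige", "VERB"),
    ("Bankerne", "bank", "NOUN"),
    ("renten", "rente", "NOUN"),
    ("Aarhus", "Aarhus", "PROPN"),
]


def lemma_frequencies(annotated, pos=None):
    """Count lemmas, optionally restricted to one part-of-speech tag."""
    return Counter(
        lemma for _, lemma, tag in annotated if pos is None or tag == pos
    )


def find_names(annotated):
    """Return the word forms that are tagged as proper nouns."""
    return [form for form, _, tag in annotated if tag == "PROPN"]


most_common_noun = lemma_frequencies(tokens, pos="NOUN").most_common(1)
names = find_names(tokens)
```

Because the counts are taken over lemmas rather than word forms, inflected variants such as "Banken" and "Bankerne" are grouped together, which is why annotation comes first in the list of services.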
Extend current technical infrastructure, CLARIN technical centre
It is part of the Danish CLARIN contribution to provide a technical centre, authorisation and authentication mechanisms, trust federation with the other European CLARIN centres etc.
This will be part of the DIGHUMLAB central operations. Ongoing.
In the DK-CLARIN national project a first technical infrastructure was built, clarin.dk, which already has many of the features required and contains many resources.
User friendliness to be improved, tools and services to be extended and developed.
Knowledge sharing
National and international activities
• PhD courses
• Courses at undergraduate level
• Workshops on special issues
Centres of expertise – Important instrument
• Identify existing centres of expertise in Denmark that are prepared to act as knowledge centres. Ongoing.
Summing up: Language materials and CLARIN
User driven
Focus on
• Long-term storage
• Tools and services
• User-friendliness
• Being present at all universities and at other institutions (cultural institutions such as libraries (SB, KB), National Museum, Danish Language Council, Society for Danish Language and Literature etc.)