Language Technologies within · Language Technologies within A Digital Agenda for Europe and other perspectives DG Information Society and Media of the European Commission supports

LanguageTechnologies withinA Digital Agenda for

Europe and otherperspectives

DG Information Society and Media ofthe European Commission supportsmultilingual technologies through researchand innovation programmes. The currentlevel of EU support for languagetechnology projects exceeds � 70 millionand will reach � 130 million in 2012

Kimmo RossiEuropean CommissionDirectorate General InformationSociety and Media

The challenge

Future online services, whether addressingbusiness, administrative or social needs, have tobridge language barriers. To achieve this, lan-guage technologies still need to find their wayinto constantly evolving web services and tech-nologies. New ways of communicating andcollaborating online are no longer the realm ofa few corporate sites or online communities,but an integral feature of most portals and ser-vices. Millions of users produce and consumeonline content which changes shape every sec-ond. Language technologies need to adapt tothe challenge of delivering transparently, “on-the-fly”, while coping with different forms anduses of written and spoken language. At the

same time, the European map is still fragment-ed and scattered with too many blank spots.

With the help of our portfolio of projects, wehave made a good start – bringing differentcommunities and specialist groups together,stimulating cooperation between users andproviders, making supply and demand meet inspecific application domains. But we still havea long way to go, and the process will taketime, substantial resources, and a commonvision.

What happens next?

The ICT Work Programme for 2011-2012 oper-ations will be published in the coming weeks.A dozen calls for proposals are planned over thenext two years. The 7th call launched in lateSeptember will provide � 50 million forresearch activities under Objective 4.2 –Language Technologies. In February 2011,another call will be published providing � 35million within Objective 4.1 – SME action onDigital Content and Languages.

The small and medium size enterprises (SME)action is a new concept in many ways. Firstly,it provides an opportunity for fast-movingSMEs, including start-ups and university spin-offs, to test and validate innovative ideas.Secondly, it provides new ways of linking theinformation management community with thelanguage technology and resources communi-ty. Thirdly, since participation is not limited toSMEs, the action provides result-orientedopportunities for new partnerships: publicwith private, SMEs with well established com-panies, data providers with developers andleading users. The overall goal is to generateadded value from trading, pooling and exploit-ing data in order to produce novel technolo-gies, products and services.

Looking further into the future

The recently published A Digital Agenda forEurope (DAE) throws a big challenge to thelanguage technology community. The DAEsets out to boost economic growth and exploit

the full benefits of the digital economy by cre-ating a borderless and interoperable digital sin-gle market. More concretely, the DAE invitesthe Commission to “work with stakeholders todevelop a new generation of web-based appli-cations and services, including for multilingualcontent and services, by supporting standardsand open platforms through EU-funded pro-grammes.”

In practice, this means broader and closer col-laboration between EU projects, businesses,institutions and administrations. We will workon multilingual services and platforms togeth-er with other EC departments, EU institutionsand international organisations exposed tomultilingualism. We will liaise with the webindustry to establish standards and best prac-tices to make the web a more language-friend-ly environment, and to pave the way for wide-spread deployment of language technologies.

In 2011 we will set up a Business Forum, bring-ing together current and emerging stakeholdersfrom the supply and demand side. The forumis expected to help define the requirements ofboth providers and users of language technolo-gy products and services, with emphasis oninnovation and demand-driven technologydevelopment. At the same time, we expectongoing and upcoming EU-funded actions todeliver a compelling and widely supportedStrategic Research Agenda for the field at large.This agenda will be released in 2012 at a pointin time when important decisions are going tobe taken on the next R&D framework pro-gramme, due to start towards the end of 2013.

Data sharing and portability have becomemore important than ever for researchers,developers and users. Fifteen years after ELRAwas established, new initiatives such as CLAR-IN and META-NET can promote a new cul-ture and help pool, preserve and reuse valuableresources. While these projects serve differentcommunities and purposes, there is amplescope for collaboration, e.g. implementingopen repositories, unifying metadata descrip-tions, and solving legal issues hampering thewide availability of research results. C

Number 9-10, 2010, March-June

22 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN

Editors’Foreword

Marko Tadi}& Dan CristeaCLARIN Newsletter editors

It seems that, as we approach the end of theproject, the agenda of CLARIN-related

activities and events becomes more and moreheavy. If you go through this double issue, evenfrom the tip of the finger, you will be surprisedto see how many important things has hap-pened from the beginning of this year, and atwhat speed we are moving forward. In the fol-lowing we will browse for you the content ofthis CLARIN Newsletter, the double issue9-10, not necessarily in the order the articlesare included.

First, we are happy to bring to the CLARINreaders a word from the very heart of theEuropean Commission, there where the strate-gic decisions are being taken in LanguageTechnology. Kimmo Rossi is giving the freshand very welcome news of a hot autumn andwinter: very generous calls of the ICT WorkProgramme are waiting for us behind a doorthat we are invited to open and 2011-2012 willbring even more amazing initiatives.

Then, we announce new launches of pro-grammes, projects and collaborations. InNederlands, the CLARIN-NL Project hasreceived an extremely generous financing,which could be taken as an example by many

CLARIN member states, news offered to us byJan Odijk. António Branco and Marko Tadi}relate about the recent launch of thePortuguese CLARIN in Lisbon. The sameAntónio, together with Vera Lúcia Strube deLima, Thiago Pardo and Steven Krawer,describe, from the position of principal actors,another important event involving thePortuguese language, the PROPOR confer-ence, which can be seen as initiating a promis-ing collaboration ofCLARIN on the otherside of the Atlantic, inBrasil. From Kaunas,we hear the voice ofRu-ta Marcinkevi~iene

.,

bringing to the CLAR-IN community newsfrom Lithuania, thenew member of CLA-RIN, and its remark-able progresses in HLT.

Then, as in all our pre-vious issues, a series ofarticles relate about CLARIN activities. Thenew face and functionalities of the CLARINwebsite are commented by Corina Dima, itsmain implementer (WP6). Ville Oksanen,Krister Lindér and Hanna Westerlund, a Finishteam (WP8), present the principal LRs distrib-ution licenses standardized within the project.And the state of the art in speech technologiesand how are they related to language learningCALL systems, as possible part of the CLAR-IN infrastructure, is contributed by CatiaCucchiarini.

Finally, a number of articles present the mainLR&T events in Europe within the last fewmonths. The LREC series of conferences, dueto a rich heritage and an almost perfect con-ceptualization and organization, has becomeone of the major global events of our commu-nity, attended now by 1,246 registered partici-pants at the main conference and workshops.Less known, perhaps even funny, stories from

the recent Malta event are brought forward bythe chair of the local organization committee,Mike Rosner, while the conference main chair,Nicoletta Calzolari, together with ClaudiaSoria and Riccardo Del Gratta, describe one ofthe most important side effects of the MaltaLREC – the LRT Map. A great idea, which,will certainly have important consequences inthe making of a global thesaurus of languageresources and tools. Then, the already signifi-

cant CICLing, held thisyear for the first time inEurope, in Ias,i, is pre-sented by its very inven-tor, Alexander Gelbukh,and the principal localorganizer, Corina Fo-rDscu. One of the mostimportant CLARINmeetings since thebeginning of the projectwas the join gatheringof its boards, whichincluded the Executive

Board, the Scientific Board, the StrategicCoordination Board and the InternationalAdvisory Board. The coordinator of the pro-ject, Steven Krauwer, is giving a short presenta-tion, with a stress on the advances towardsestablishing the CLARIN ERIC, the newframework under which the project will func-tion starting with the construction phase. Youwill find a word, then, on the recent join meet-ing of CLARIN and FLaReNet in Stockholm,by Rolf Carlson, Kjell Elenius and DavidHouse. The discussions focused on speech andmultimodality, as well as on the best practicesthat would enable to integrate tools and toachieve uniformity in annotation. You can findalso a very short survey of the LT Days thattook place in Luxembourg, reported by MarkoTadi}.

Enjoy reading it! We would be happy to know

your opinion, your news, your suggestions.

Write us! C

List of national

correspondents

Austria

Gerhard Budin

Belgium – Flanders

Inneke Schuurman

Bulgaria

Svetla Koeva

Croatia

Marko Tadi}

Czech Republic

Karel Pala

Denmark

Bente Maegaard

Hanne Fersøe

ELRA/ELDA

Stelios Piperidis

Khalid Choukri

Estonia

Tiit Roosmaa

Finland

Kimmo Koskenniemi

France

William Del Mancino

Bertrand Gaiffe

Germany

Lothar Lemnitzer

Greece

Maria Gavrilidou

Hungary

Tamás Váradi

Italy

Valeria Quochi

Latvia

Andrejs Vasiljevs

Malta

Mike Rosner

Netherlands

Peter Wittenburg

Norway

Koenraad De Smedt

Poland

Maciej Piasecki

Portugal

Antonio Branco

Romania

Dan Cristea

Dan Tufis,

Spain

Nuria Bel

Sweden

Sven Strömqvist

UK

Martin Wynne

Call for contributions

Dear readers of the CLARINNewsletter,If you have ideas, thoughts,comments, additions, corrections,arguments, questions etc. which areconnected to the CLARIN project,even remotely, please feel free tosend them to us as your contributionat [email protected] or directly tothe editors at [email protected] [email protected].

CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 33

Jan OdijkUtrecht University

CLARIN-NL forms the Dutch

national counterpart of the CLAR-

IN enterprise on the European level

(CLARIN-EU). CLARIN-NL covers a

period of 6 years, partitioned in three

phases of two years: preparation, con-

struction, and exploitation. CLARIN-

NL effectively started in April 2009 and

has a budget of 9.01 million euro.

CLARIN-NL is the first CLARIN-relat-

ed national project that has been award-

ed money for the implementation and

exploitation phases. Its website

(http://www.clarin.nl) is mostly in

English, so additional information on

the project can be easily obtained here.

CLARIN-NL has been set up as a mix-

ture between a programme and a pro-

ject, providing both the flexibility to

adapt the contents to new develop-

ments, and to new players in the field

(e.g. humanities researchers not reached

yet). At the same time it offers opportu-

nities for defining a few longer term pro-

jects in selected areas so that knowledge

and expertise built up will be sustained

in the participating institutes.

Wide coverage of academicinstitutions

Currently CLARIN-NL has 23 partici-

pants from linguistics and the humani-

ties more broadly, and it is open for new

Dutch participants. It includes all uni-

versities with a humanities faculty or

with expertise in language and/or speech

technology, and several institutes from

the Royal Netherlands Academy of Arts

and Sciences. The institutes carry out

research in linguistics construed broadly

(10), language technology (6), speech

technology (2), culture (2), lexicography

(2), social history (4), and literature (1).

CLARIN-NL thus covers a large part of

humanities research and some social sci-

ences research. Libraries are represented

as well (2), and 5 institutes are data cen-

tres. Of these, 4 (INL, MPI, Meertens,

and DANS) have expressed the inten-

tion to become a CLARIN centre (type

A or B), and some others are considering

this.

CLARIN-NL sub-projects

CLARIN-NL has initiated a range of

sub-projects. Some of these concern the

design and implementation of the infra-

structure and the CLARIN centres in

the Netherlands, among them a metada-ta project to carry out first tests of the

Component Metadata Infrastructure

(CMDI) as developed in CLARIN-prep

(WP2 and WP5) against a representative

sample of data residing in Dutch

research organizations; the infrastructureimplementation project (IIP) to design

and implement the technical infrastruc-

ture with CLARIN data centre candi-

dates as participants; the Search &Develop project in which centralized

metadata search and distributed content

search functionality will be implemen-

ted.

A project has been started up to carry

out a User Survey and Baseline Measure-ment. The survey will be carried out via

a set of interactive interviews in which

the surveyor gets into a dialogue with

the researcher to get an overview of

his/her research questions and may sug-

gest possible tools and data that might

facilitate the research. The baseline mea-

surement will inventory the current use

or non-use of digital data and tools in

the humanities making it possible to

determine the impact of the CLARIN

infrastructure on the humanities

research in a later stage.

A call has been launched in 2009 for

Data Curation and Demonstrator Projectswhich has led to 11 small projects car-

ried out by LST and humanities

researchers, coordinated by members

from the target group (humanities

researchers) and focusing on needs

derived from the humanities researchers'

research questions.

The goal of a data curation project is to

adapt an existing resource so that it is

visible, uniquely referable and accessible

via the web, and properly documented.

The goal of a demonstrator project is to

create a documented web application

The CLARIN-NL Project

Continued on the next page �

CCLLAARRIINN--NNLL kkiicckk--ooffff mmeeeettiinngg,, UUttrreecchhtt,, 22000099--0055--2277


starting from an existing tool or applica-

tion that can be used as a demonstrator

and function as a showcase of the type of

functionality CLARIN will incorporate

and support.

Important goals common to both types

of projects are (1) apply standards and

best practices and make use of the sug-

gested CLARIN architecture and agree-

ments to understand their limitations

and the requirements for extensions; and

(2) establish requirements and desidera-

ta for the CLARIN infrastructure.

In particular, all projects will have to

make metadata for their resources in

accordance with the CMDI, and con-

tribute to semantic interoperability by

mapping the data categories used to data

categories in ISOCAT, or by creating

new data categories in ISOCAT.

In this way, curated resources and a

range of showcases will become avail-

able. At the same time evidence-based

requirements and desiderata for the

CLARIN infrastructure and supported

standards and best practices will be

obtained making it possible to influence

the final selection of standards and best

practices promoted by CLARIN. These

data curation and demonstrator projects

neatly complement the European

CLARIN preparatory project, in which

the budget for a work package on these

issues was cut by the European

Commission. An overview of the award-

ed projects can be found on the CLAR-

IN-NL website.

Dutch-Flanders cooperation

A 3-year project for cooperation betweenthe Netherlands and Flanders has started.

The cooperation project consists of mul-

tiple aspects, but the core is a project in

which existing tools for Dutch are

adapted to become web services that can

be used in a work flow system. This will

be done for two modalities: text and

speech. The users involved in this pro-

ject are researchers from literary studies

and archaeology on the text side, and

social history on the speech side.

Offering web services in a work flow will

become one of the most important func-

tionalities the CLARIN infrastructure

will offer. The added value of the coop-

eration with Flanders can be found in

the fact that the tools have originally

been developed together and concern

the shared Dutch language.

Education, training andawareness

The CLARIN-NL project includes a

plan for creating awareness and for edu-

cation and training. These activities

(some already undertaken, more in the

pipeline) involve active contributions as

well as financial and logistic support for

a range of events such as workshops,

tutorials, summer and winter schools,

project web site and wiki, newsflashes,

mailing lists, publications in magazines,

etc. CLARIN-NL is also working on

introducing courses for working with

digital data and tools in the CLARIN

infrastructure as part of the regular

humanities curriculum.

Relation with CLARIN Europe

CLARIN-NL is complementary to the

European CLARIN preparatory project

in several respects: First, CLARIN-NL

does not only cover the preparatory

phase, but also the implementation

phase and part of the exploitation phase

of the CLARIN infrastructure. Second,

CLARIN-NL carries out activities in the

preparatory phase, such as the metadata

project, data curation and demonstrator

projects, for which no funds are avail-

able in the European preparatory pro-

ject.

Future

The CLARIN-NL executive board is

currently analyzing the situation in the

Netherlands. It attempts to identify gaps

in topics and disciplines covered that are

as yet underrepresented, as well as essen-

tial infrastructure functionality that is

not covered yet. It will – based on this

analysis, the results of the user survey,

and after consultation with its advisory

panels – work out a proposal for a new

call for sub-projects in 2010 with clearly

identified priorities. It will also deter-

mine the best organisation of this call,

both in terms of character (open call,

tender, direct assignments), budget, and

timing. It is expected that the plans for a

new call will be available in June 2010.

CLARIN-NL aims to create an infra-

structure that is supposed to have a long

term existence, although it is just a pro-

ject, finishing in 2014. It is therefore of

utmost importance to already start

working on embedding the CLARIN

infrastructure in the normal research

and centre activities and preparing both

a governance structure and structural

financing to ensure the long-term exis-

tence of the infrastructure. Therefore it

is closely tracking the developments in

Europe, especially with regard to the

CLARIN ERIC, and assessing how to

best play a role in this organisation.

Conclusions

The CLARIN-NL project can serve as

an excellent example for other national

CLARIN projects: its set-up as a mix of

programme and project, creates flexibil-

ity, and is an excellent way of involving

as many relevant national parties.

Moreover data curation and demonstra-

tor projects offer opportunities to seri-

ously test the standards and best prac-

tices promoted by CLARIN, strength-

ening these standards and best practices,

and provide arguments for modifica-

tions or extensions. More generally,

these projects will yield evidence-based

requirements and desiderata for the

CLARIN infrastructure. This is the best

way to ensure possibilities for influenc-

ing a selection of standards and best

practices in CLARIN that is compatible

with the existing national data. In addi-

tion the projects will yield curated data,

and, via the demonstrators, a range of

showcases that can be used to explain

and demonstrate the advantages of the

CLARIN infrastructure and the new

possibilities it will offer to researchers.

Finally, the project requires cooperation

between intended users (humanities

researchers) and the technology

providers (infrastructure specialists and

HLT providers), with a central role for

the users' research questions and thus

contributes to bringing these different

communities together in collaborative

projects. C


Corina Dima“Alexandru Ioan Cuza”University of Ias,i

The CLARIN project internal website is a

space used by the researchers in the pro-

ject to work on their tasks and collaborate

with the other members of their group.

There are currently 194 member institutions

in CLARIN, from 33 different countries,

and a number of 406 registered users.

In March this year the CLARIN project

internal website was updated. It received a

fresh look and a lot of new functionality was

added. In this article we'd like to give a short

overview of these changes.

Site navigation

The site navigation system now features a

blue drop-down menu at the top of the page,

which contains handy links to the most

important pages in the site. Here you can

find an overview of CLARIN, its structure

and its member institutions, in the Clarinmenu, a series of published material under

Publications, links to the home pages of the

project's groups under Clarin Groups, a list

of events relevant for CLARIN under Events,a list of metadata inventories hosted by

CLARIN under Resources and a list of fre-

quently asked questions about CLARIN and

a Website section under Help Desk.

News, events and newsletter

On the left side of the page, three informa-

tion blocks contain updates relevant to the

CLARIN community:

1. The News section brings under the mem-

bers attention information about new

developments in CLARIN and related

infrastructures, about important upcom-

ing events and various other news relevant

for the community.

2. The Upcoming events section offers infor-

mation about future events.

3. The CLARIN Newsletter section high-

lights the latest newsletter published by

CLARIN. The newsletter presents infor-

mation and new developments from all

over the CLARIN world, and is published

4 times a year.

Sections for registered users

The employees of CLARIN Member institu-

tions are allowed to create an account on the

CLARIN internal website. Registered users

have access to content creation facilities,

which are found in the grey user menu.

Content creation

Under the Create content tab members can

find several types of content that they are

able to create on the internal site:

• Book page: book pages are the simplest

type of content on the site; they can be

used to add new pages of information.

• Deliverable: the deliverables content type is

used to add the project deliverables and

milestones to the site.

• Event: the event content type allows the

members to add new events to the internal

website; the events can be either public

(like a conference) or organized by CLAR-

IN.

• News item: relevant news items can be

added to the website using this form; ideal-

ly, the added information should be gener-

al and relevant for all the CLARIN mem-

bers;

• Presentation: members can upload their

CLARIN-related presentations using the

provided form;

• Resource: members are encouraged to use

this form to add metadata about existing

linguistic resources to the CLARIN inven-

tory;

• Tool: tools are also harvested in the CLAR-

IN inventory; this form offers the possibil-

ity to add metadata about new tools.

Site outline

All the content on the website was reorga-

nized and added to a hierarchy. It can be

accessed via the Site Outline menu link

found in the grey user menu. On the root

level there are the most important sections of

the site, mainly related to the CLARIN

groups, but also to the most important types

of content. Almost all the created content is

added automatically to the hierarchy. For

example, all the events are by default placed

The New Face of theCLARIN Project Website

CCLLAARRIINN wweebb ssiittee rreeddeessiiggnneedd

Continued on the next page �

under the Events node when they are creat-

ed. The pages are the only type of content

for whom the user can choose the place in

the hierarchy where it should be added. The

site outline can also be edited and the nodes

inside rearranged, but this operation in

restricted to a group of trusted users.

Home pages for groups

All the CLARIN groups now have a home

page. Here one can find information about

the mission of a group and about its mem-

bers. A Documents section presents the pages

that contain relevant information for that

group. A News section is present on the right

side of the page. Group administrators can

add news items that are only relevant for

their group using the Group News form and

selecting their group as a target for that

piece of news. For the larger, work package-

level groups, a Deliverables & Milestones sec-

tion is present. Deliverables are added auto-

matically to these sections as soon as they

are uploaded to the site using the Deliverable

form. Three special pages are the ones for the

CLARIN Members, CLARIN Partners and

CLARIN Executive Board groups. The

CLARIN Members page contains an extra

Flyers & Posters section. The Partners page

has 4 additional sections:

• Templates, containing various templates

with the CLARIN logo (PPT presenta-

tion, deliverable, letter);

• Administration & Finances, with adminis-

trative and financial information related to

CLARIN;

• Consortium Meetings, which presents the

past and future CLARIN consortium

meetings;

• News from the Executive Board, where

small updates, special for partners, coming

from the Executive Board can be seen.

The Executive Board page contains sections

created especially for the Executive Board

Meetings, Agendas, Minutes and Present-

ations. The access to the Partners and

Executive Board pages is restricted to the

members of these groups only.

This is just a short presentation of the inter-

nal site. You can find more detailed infor-

mation in the Help Desk menu under

Website. Also, if you have a question or need

to report a problem regarding the function-

ality of the website, you can use the Contact

webmaster form under Help Desk > Website.

Enjoy your browsing! C

Ville OksanenKrister LindérHanna WesterlundUniversity of Helsinki

One of the most challenging tasks inbuilding language resources is the

management of copyright licences.There are several reasons for this. First ofall, the current European copyright sys-tem is designed to a large extent to satis-fy the commercial actors, e.g. publishers,record companies, etc. This means thatthe scope and duration of the rights arevery extensive and there are even certainforms of protection that do not existelsewhere in the world, e.g. databaserights. On the other hand, the excep-tions for research and teaching are typi-cally very narrow.

In CLARIN the typical flow of the contentis the following: A copyright holder, e.g. anewspaper, licenses its content to a CLAR-IN Content Provider that distributes thecontent to the End Users through a CLAR-IN Service Provider. This means that thelicense chain has to follow a similar struc-ture. Unfortunately even the first step isoften difficult because there is a group ofresources, for which there are no writtenlicense agreements and individuals familiarwith the details are no longer available.Another problem from the CLARIN per-spective is the variation in the existinglicense agreements, which makes it hard tooffer a centralized service. To tackle theseproblems, several sets of agreements have to

be used. For an outline of the resource clas-sification procedure, see Figure 1.

Regarding the license variation, we carriedout an extensive survey and found that it ispossible to categorize the licenses into threedifferent groups:

• Publicly Available Resources

• Resources for Academic Use

• Resources for Restricted Use

We followed the model used by CreativeCommons and created simple icons, i.e. caresymbols, making it easier for the end-user toimmediately see under which conditions theresource can be used, see Figure 2. In addi-tion, a deed describes the rights in humanreadable textual form. Finally, there is alsothe actual license agreement and the meta-data, i.e. the machine readable information.However, Creative Commons is not suffi-cient as such for CLARIN, because CreativeCommons does not allow for distributionrestricted to academia or even more limitedgroups of users, which is essential for manyof the older resources to be included inCLARIN.

Publicly Available (PUB) is one of the cate-gories endorsed by CLARIN. To belong tothis group, the following requirements haveto be met:

• the licence should allow distribution of the

tools and resources from the CLARIN

infrastructure,

• there must be no limitations, e.g. based on

status or geographical location etc., on

who can access and use the tools and

resources and

• there must be no limitations on the pur-

pose for which the tools and the resources

are used.

For Academic Use (ACA) the licence agree-ment includes an additional requirementthat the use is somehow related to an acade-mic institution. Here the problem may arisefrom the definition of academic use. Toqualify under this category, the tools andresources:

• should be available at least for anyone

doing research or studying in an academic


FFiigguurree 11:: RReessoouurrccee ccllaassssiiffiiccaattiioonn ttaasskk

institution recognized by the Identity

Provider Federation and

• should be available for

studying, research and

teaching purposes.

The last category, RestrictedUse (RES) includes theresources that do not fulfillthe previous requirementsbut still could be offered tothe users if certain addition-al requirements are met. Themost typical reasons for aresource to fall under thescope of RES are:

• a requirement to submit

detailed information, e.g.

an abstract, on the planned

usage or

• specific ethical or data pro-

tection-related additional

requirements.

In conjunction with the main license cate-gories PUB, ACA and RES, there can also beall or any of three additional requirements:

• A requirement for strictly non-commercial

use (NC)

• A requirement to inform the copyright

holder regarding the usage of the tools

and/or the resources in published articles

(Inf )

• A requirement to redeposit modified ver-

sions of the tools and resources with the

Service Provider (ReD)

Figure 3 displays the symbols designed forthe additional requirements.

However, this does not solve all the prob-lems. In some cases there either is no licenceagreement at all, because such an agreementhas never been made. It is also quite com-mon that the existing agreement is somehowproblematic, e.g. very low in details, makingthe categorization impossible. For those sit-uations we created the CLARIN UpdateModel Agreements with the purpose to pro-cure the required rights. The best option isto re-license the content with the CC0-license. See the Berlin Declaration (2003)for best scientific licensing practices. It iswell-understood and offers enough rights for

all parties in different digi-tal and non-digital envi-ronments. It is also com-patible with most of theother open content licen-ces. Unfortunately it is notalways possible to useCC0 due to the demandsof the copyright holders.Thus Update ModelAgreements for Academicand Non-Commercial Useare also available.

In order to test the usabil-ity of the classification sys-tem and our classificationguidelines, we did an ini-tial classification test. Wesent out a request to thecustodians of 116resources found in the

CLARIN LRT inventory. We received ananswer for 40 of the resources, i.e. 34.5%.In Figure 4, we see that more than half of theresources were classified into the (RES)restricted category. Approximately one thirdwere classified as (PUB) publicly or (ACA)academically available. Finally, one sixth wasfound to be exceptional.

One of the publicly available resources(PUB) was also classified as non-commer-cial, whereas all of the academically availableresources (ACA) were non-commercial. Therestricted resources (RES) were roughlyequally divided among no additional restric-tions (27.3 %), a non-commercial restric-tion (31.8 %) and a requirement that thelicense be personally granted by the contentowner (40.9 %).

Only one resource was such that the contentprovider found no applicable distributiontype because there was no formal agreementbetween the content owner and the contentprovider. In addition, there were 5 resourcesfor which the research project was still ongo-ing and the question of distribution wouldbe discussed only after the project had fin-ished. All in all, some kind of classificationwas received for a total of 40 resources inthis initial survey. For additional informa-

tion, see the Oksanen & al (2010). C

References

Berlin Declaration. (2003). Berlin Declaration onOpen Access to Knowledge in the Sciences andHumanities. Berlin. http://www.zim.mpg.de/ope-naccess-berlin/ berlindeclaration.html.

Hietanen, H., Oksanen, V., and Välimäki, M. (2007).Community created content. Law, business andpolicy.

Oksanen, Lindén, Westerlund (2010). LaundrySymbols and License Management – PracticalConsiderations for the Distribution of LRs basedon experiences from CLARIN. In Proceedings ofLREC 2010. Malta.


Laundry Symbols and Licence ManagementPractical Considerations for the Distribution of LRs

based on experiences from CLARIN

FFiigguurree 22:: SSyymmbboollss ffoorr tthhee mmaaiinnddiissttrriibbuuttiioonn ccllaasssseess

FFiigguurree 33:: SSyymmbboollss ffoorr aaddddiittiioonnaallddiissttrriibbuuttiioonn rreessttrriiccttiioonnss

FFiigguurree 44:: MMaaiinn ddiissttrriibbuuttiioonn ccaatteeggoorriieess..

Catia CucchiariniNederlandse Taalunie, The Hague

The interest in applying speech technolo-gy and in particular Automatic Speech

Recognition (ASR) technology to secondlanguage (L2) learning has been growingconsiderably. The addition of ASR technol-ogy to Computer Assisted LanguageLearning (CALL) systems makes it possibleto assess oral skills in a second language andto provide corrective feedback automatically.The latter feature appears particularlyappealing, because providing practice andfeedback on speaking proficiency is extreme-ly time-consuming, with the consequencethat the necessary amount of practice isalmost never achieved in traditional teacher-fronted lessons. Against this background,ASR-based CALL systems would seem tomake for an interesting supplement to tradi-tional L2 classes.

An additional advantage of ASR-basedCALL systems that often goes unmentionedis that such systems can also be employed forlanguage resource acquisition and for con-ducting innovative research on languagelearning. In this contribution we explainhow.

Developing ASR-based CALL systems thatcan provide accurate and useful feedback onoral proficiency is not trivial, because thespeech of non-natives poses special difficul-ties to ASR technology. In addition, existingsystems usually fail to provide correctivefeedback that is detailed enough and accu-rate, especially on L2 pronunciation, whichis considered a particularly challenging skill,both for L2 learners and CALL systems. Thekey factor in developing such systems con-sists in circumventing the limitations of thetechnology while taking account of itsadvantages and of pedagogical criteria (Neriet al. 2009). Recent research shows that thisis possible (Cucchiarini, et al. 2009).

Another problem that has hampered therealization of ASR-based CALL systems,especially for the smaller languages, is that

although publishers are willing to use thetechnology once it is developed, in generalthey do not have the means to finance thedevelopment of such technology and theunderlying language resources required. Inthis connection it may be interesting to referto an initiative in the Netherlands andFlanders aimed at supporting research anddevelopment in the field of languageresources and language and speech technol-ogy, which, among other things, is likely tohave positive consequences for the field ofCALL and language learning in general.

In 2004 the Dutch-Flemish programmeSTEVIN (a Dutch acronym that stands forEssential Language Resources in Dutch) waslaunched with the aim of stimulating thedevelopment of basic language and speechresources and technology for the Dutch lan-guage (see:http://taalunieversum.org/taal/technologie/stevin/english/). The programme is fundedby the Flemish and Dutch governments andis coordinated by the Dutch-Flemish organi-zation for language policy, the DutchLanguage Union(http://taalunieversum.org/taalunie).

Within the framework of the STEVIN pro-gramme various resources have been or arebeing realized which will be beneficial forthe development of ASR-based CALL appli-cations, and, more generally, for research onlanguage learning, such as an ASR toolkit, acorpus of non-native speech and a prototypeof an ASR-based CALL system. Theseresources are briefly discussed below.

The SPRAAK speech recognizer

The STEVIN project SPRAAK extendedthe work on the ASR package that had beendeveloped for over 15 years by ESAT at theUniversity of Leuven to a) develop a highlymodular toolkit for research into speechrecognition algorithms and b) provide astate-of-the art recogniser for Dutch with asimple interface that could be used by non-specialists. (Demuynck et al., 2008).SPRAAK is distributed as open source foracademic usage and at moderate cost for

commercial exploitation (for further details,see http://www.spraak.org/).

The JASMIN corpus of Dutchnon-native speech

The STEVIN project JASMIN (Cucchiariniet al. 2008) produced a corpus of speech bychildren of different age groups, elderly peo-ple and non-natives with different mothertongues (http://www.esat.kuleuven.be/psi/spraak/projects/JASMIN), which was col-lected in the Netherlands and Flanders andis specifically aimed at facilitating the devel-opment of speech-based applications forchildren, non-natives and elderly people. Inthe case of non-native speakers the applica-tions envisaged were especially languagelearning applications because there is consid-erable demand for CALL products that canhelp making Dutch L2 teaching more effi-cient.

The DISCO prototype of anASR-based CALL application

The STEVIN project DISCO (Develop-ment and Integration of Speech technologyinto COurseware for language learning)aims at developing a prototype of an ASR-based CALL system for practicing oral skillsin Dutch (Strik et al, 2009; for furtherdetails, see: http://lands.let.ru.nl/~strik/research/DISCO/). This project makes useof the SPRAAK toolkit and the JASMINnon-native speech corpus. DISCO addressesdifferent aspects of speaking proficiency,detects errors in speaking performance,points them out to the learners and givethem the opportunity to try again until theymanage to produce the correct form.

One of the interesting features of theSTEVIN projects is that the resources andthe technology developed are made publiclyavailable to interested users (researchers,HLT companies and publishers) through theDutch-Flemish HLT Agency(http://www.inl.nl/tst-centrale), a centralrepository for Dutch digital languageresources set up and financed by the Dutch


Language Resources, Speech Technologyand Language Learning:How to Establish a Virtuous CirclesState of the art in speech technologies as a part of CLARIN infrastructure

Language Union and hosted by the Institutefor Dutch Lexicology in Leiden andAntwerp. These resources can be employedfor developing applications and for conduct-ing research.

ASR-based CALL and languagelearning research

Once an ASR-based CALL system has beendeveloped, language learners can use it topractice oral skills. The system can thus beused for acquiring additional non-nativespeech data, for extending already existingcorpora like JASMIN, or for creating newones.

In addition, the CALL system can bedesigned and developed in such a way that itis possible to log details regarding the inter-actions with the users. The logbook can con-tain information on what appeared on thescreen, how long the user waited, how (s)heresponded, the feedback provided by thesystem, and how the user reacted to thisfeedback.

The additional corpus of non-native speechand the log-files can be useful for research onlanguage learning and for developing new,improved CALL systems.

In particular, ASR-based CALL systemsallow to create research conditions that werehitherto impossible, thus opening up oppor-

tunities for innovative research on languagelearning.

For instance, at the moment a project isbeing carried out at the Radboud Universityof Nijmegen, which is aimed at studying theimpact of corrective feedback on the acquisi-tion of syntax in oral proficiency(http://lands.let.kun.nl/~strik/research/FASOP).

Within this project an adapted version of theDISCO ASR-based CALL system is used tostudy how corrective feedback on oral skillsis processed on-line, whether it leads touptake in the short term and to actual acqui-sition in the long term. This has severaladvantages compared to previous studies: thelearner's oral production can be assessed on-line, near-optimal corrective feedback (clear,intensive, systematic, adapted to the learner'sdevelopmental stage and learning style) canbe provided immediately, all interactionsbetween learner and system can be logged sothat data on input, output and feedback arereadily available to be studied from differentangles.

The approach chosen in this project indi-cates how CALL and language and speechtechnology can offer new opportunities forlanguage learning research (Ellis and Bogart,2007). In turn, new insights in languagelearning can inform the development of bet-

ter CALL systems and language and speechtechnology. In this way a virtuous circle canbe established in which these fields profitfrom each other and lead to increased knowl-edge and better language learning systemsand this systems could be an integral part of

CLARIN infrastructure. C

References

Cucchiarini, C., Driesen, J., Van Hamme, H. andSanders, E., Recording speech of children, non-natives and elderly people for HLT applications: theJASMIN-CGN corpus, in Proceedings of LREC,2008.

Cucchiarini, C., Neri, A. and Strik, H., (2009) Oralproficiency training in Dutch L2: The contributionof ASR-based corrective feedback, SpeechCommunication, 51, 853-863.

Demuynck, K., Roelens, J., Compernolle, D. van, andWambacq, P. (2008).SPRAAK: an open sourceSPeech Recognition and Automatic Annotation Kit,In Proceedings of Interspeech 2008.

Ellis, N.C., Bogart, P.S.H., (2007), Speech andLanguage Technology in Education: the perspectivefrom SLA research and practice, Proceedings ISCAITRW SLaTE, Farmington PA.

Neri, A., Cucchiarini, C., Strik, H. and Boves, L.W.J.(2009). The pedagogy-technology interface in com-puter-assisted pronunciation training. In P.Hubbard (Ed.), Computer Assisted LanguageLearning: Critical Concepts in Linguistics, VolumesI-IV (pp. 140-164). London/New York: Routledge.

Strik, H., Cornillie, F., Colpaert, J., Doremalen,J.J.H.C. van and Cucchiarini, C. (2009).Developing a CALL System for Practicing OralProficiency: How to Design for Speech Technology,Pedagogy and Learners. In Proceedings of SLATE(pp. CD-rom). Warwickshire, U.K.. Available from:http://www.eee.bham.ac.uk/SLaTE2009/papers/SLaTE2009-32.pdf [05-09-2009].


FFiigguurree 11:: EExxaammppllee ooff aa DDIISSCCOO pprroonnuunncciiaattiioonn eexxeerrcciissee wwiitthh ffeeeeddbbaacckk


Nicoletta CalzolariClaudia SoriaRiccardo Del GrattaIstituto di LinguisticaComputazionale “AntonioZampolli”, CNRS, Pisa

This year, the LREC Conference intro-

duced a very special feature, the LREC

Map of Language Resources and Tools,

jointly sponsored by FLaReNet and ELRA.

The purpose of the Map is to shed light onthe vast amount of resources that representthe background of the research presented atLREC, in the attempt to fill in gaps in com-munity knowledge about the resources thatare used or created worldwide.

Basic information aboutresources in one place

In a nutshell, the Map can be described as acompanion to LREC providing basic infor-mation about all the resources (in a broadsense, i.e. also tools, standards, evaluationpackages, etc.) – either used or created byLREC authors and described in their papers.

Knowledge about existing resources is essen-tial to the overall advancement of research inthe field of Language Resources andTechnology: it is important to be able tolocate and retrieve the right resources for theright applications, and to exploit existingones before building new ones from scratch.Having a clear picture of which resources areavailable for which languages and for whichuse is important in order to identify existinggaps for certain languages at a given timeand estimate the amount of investmentneeded to fill them in. Knowledge about thecurrent use of resources is equally important.Knowing which resources are most used forthe various applications will help to betterunderstand the reason behind their success(their intrinsic quality, their wide availabili-ty, their licensing model, etc.). Knowingwhich standards are used in resource repre-sentation would help improve the develop-ment of standards themselves, by gettingthem more tuned to actual needs andrequirements.

Clear and easy-to-reach information of thistype about resources and related technolo-gies is lacking. At the same time, it is veryimportant to stress that most resources arevery poorly documented, or not document-ed at all, thus hindering their accessibilityand in the end, their full deployment.

The LREC 2010 Mapof Language Resourcesand ToolsLREC2010, Valetta, Malta, May 17-23, 2010

LREC2010:The Inside Story

Mike Rosner

University of Malta

So LREC 2010 is over. And yes, it seems tohave run smoothly, to judge from the reac-

tions of some of the 1200 participants. Great!We managed to fool the lot of them.Appearances can be deceptive. Read belowfor a blow-by-blow account of some of thepotential disasters.

The story begins in October 2009 when a del-egation of veteran LRECians from Paris andPisa visited to reconvince themselves thatMalta really was an appropriate venue forLREC 2010. We almost blew it on the first

evening. To create a goodimpression, I had chosen awell-known restaurant for din-ner on the Tower Road prome-nade. Everything was fine untildessert time when one of thePisa team turned white veryfast and had to be escortedfrom the table supported bytwo colleagues. Next day, asecond member of the Pisateam was looking shaky butstoically maintained a brave but quiet faceduring our first meeting with the enthusiasticRector of the University. Was this food poison-ing? Was this the delayed effect of some north-ern Italian bacteria? We shall never know forsure. What we can safely report is that themeal made an impression on all - but perhapsnot quite the expected one; and Maltaremained the choice.

LREC is dominated by two events: a stand-upwelcome reception, usually held on the first

evening, and a sit-down ban-quet, held on the last. Bothpresent considerable problemsfor the local organizer, sincevenues for 1200 people donot exactly grow on trees (par-ticularly in treeless Malta), andalso, in May, contrary to popu-lar opinion, the weather is veryunpredictable so purely out-door locations are out of thequestion.

We decided to adopt an indoor solution forthe banquet. But the reception venueremained undecided until the second visit ofthe Paris/Pisa team in March. On this occa-sion, accompanied by the Rector, we went tosee the President of the Republic in his mag-nificent Palace in Valletta. The mission hadthree goals to achieve: to ask for LREC to beheld under his Excellency's Patronage, to askfor his Excellency to open the conference, andalso to solve the problem of the reception


The Map, thus, is intended to complementongoing cataloguing efforts in order toenhance visibility for all resources, for alllanguages and intended applications, in par-ticular the ELRA and LDC catalogues aswell as the ELRA Universal Catalogue.

Apart from providing a portrait of theresources behind the LREC community, theLREC 2010 Map intends to become a mea-suring instrument for monitoring the field oflanguage resources. In particular, the Map istailored to portraying uses and usability ofresources, either newly developed or alreadyexisting.

The Map contains information about 1994resources (data and tools) for 170 differentlanguages: their type, modality, availability,intended or actual application purposes,innovation potential, etc.

The first analyses alreadyavailable

The large amount of information containedin the Map can be analyzed in many differ-ent ways. For instance, it provides informa-tion about the most frequent type ofresource, the most represented language, theapplications for which resources are used orare being developed, the proportion of newresources vs. already existing ones, or theway in which resources are distributed to thecommunity.

The first analyses of the LREC data are avail-able at www.resourcebook.eu, where thebrowsing interface which is being developedwill be soon made accessible.

The Map is conceived as a community-builtand community-maintained repository ofinformation: the intention is to make itevolve so that it may become a place whereeach user of a resource can update and addnew resources or edit already submittedones.

The Map has a great potential for many uses.In addition to being an information gather-ing tool it may become a great instrumentfor monitoring the evolution of the field(useful also for funders), if applied in differ-ent contexts and times. It can be seen as ahuge joint effort, the beginning of an evenlarger cooperative action. It can thus becomealso an “educational” means towards thebroad acknowledgment of the need of meta-research activities with the active involve-ment of many. Finally, the Map is alsoinstrumental in introducing the new notionof “citation of resources” that could providean award and a means of scholarly recogni-tion for researchers engaged in resource cre-ation.

Lack of information and documentationabout resources is, in the e-science paradigm,a very critical issue. With the Map we aimnot only at starting to fill this gap, but alsoat encouraging the needed “change in cul-ture”. We need in fact to make each andevery researcher aware of the importance ofhis/her personal engagement in document-ing resources. It is a task as important as cre-ating new resources and is not an accessoryto be disregarded. It is the necessary serviceto the whole community that we all have toprovide.

The Map is already being adopted in othercontexts. The Language Resources andEvaluation Journal will also make use of it, asannounced in the new Special Issue LREC2008: Selected papers (freely accessible to allLREC participants). And quite recently alsoCOLING 2010 has decided to make use ofthe Map.

This work could not have been possiblewithout the invaluable contribution of allthe LREC authors, who patiently providedthe input requested. The entire communityis deeply indebted to them. C

venue. How exactly does youquietly slip a request to lend aPalace into the conversation?Another mystery that only histo-ry may solve. Rector uttered amagic incantation in Malteseand Lo! The President's nextwords were ".... and I may havejust the place for your recep-tion". Verdala Palace, a formerhunting lodge used by theKnights of Malta for gamehunting, surrounded by its own wood, was theperfect solution. With the mission totallyaccomplished, we emerged from that meetingfeeling triumphant.

That feeling remained until three weeks beforethe conference, when out of the blue thePresident's office issued a communication: dueto urgent works, Verdala Palace is no longeravailable, so please find somewhere else and,sorry for the inconvenience and all that, but wedon't have any other palaces available. The

same day, we learned that thePresident, who was visitingChina at the time, had fallenoff a platform, and had to behospitalized. So the one per-son who might have beenable to intervene was not onlyout of the country but indis-posed. What does one dounder such circumstances?Simple. You fight like with like.And what is like the President's

Office? Well how about the Prime Minister'sOffice? Helped again by the Rector, we man-aged to get our message into appropriate cor-ridor at the Auberge de Castille. And Lo! Oneweek later the President's Office called back tosay that after careful consideration, they haddecided to delay the works that we might pro-ceed with our reception at Verdala. Phew!

But that's not all, folks. Knowing that thePresident would be unable to open the confer-ence as planned due to his injuries, I was

forced to use magic again to come up with asolution. And Lo! At very short notice theMinister of IT kindly accepted to perform theopening. Once again the warm feeling of tri-umph, which lasted until… the night before theconference when I was informed, just like that,that the Minister was no longer available.Help! Just 30 mins before the official openingthe conference was lacking an official opener.Clearly, more magic was required … anotherseries of desparate phone calls. And Lo! TheMayor of Valletta had managed to cancel ameeting and would be arriving in 10 mins. Theopening was saved.

What have we learned from this? Well, magicstill works in Malta but has to be used spar-ingly since it only works in the right context.That context was provided by the many peoplewho, though their hard work, contributed tothat overwhelming impression that LREC 2010ran without a hitch. Thanks to all of you. C

IImmaaggee ccoo

uurrttee

ssyy ooff WW

oorrdd

llee ((hh

ttttpp::////ww

wwww

..wwoorrdd

llee..nn

eett))

CICLing2010, Ias,i,March 21-27, 2010

Alexander GelbukhCentro de Investigación enComputación, InstitutoPolitécnico Nacional, MexicoCorina Fora

u

scu

University Alexandru IoanCuza, Ias,i

From 21st to 27th March 2010, the 11th

CICLing, International Conferenceon Intelligent Text Processing andComputational Linguistics (http://profs.info.uaic.ro/~cicling2010 and http://www.cicling.org/2010) took place at theAlexandru Ioan Cuza University of Ias,i.The conference was jointly organized bythe Faculty of Computer Science of theAlexandru Ioan Cuza University of Ias,i(UAIC), Romania, and the NaturalLanguage Processing Laboratory of theCIC (Center for Computing Research) ofthe IPN (National Polytechnic Institute),Mexico. Romania was chosen for manyreasons, from its touristic attractions tothe host institution's experience in orga-nizing international events. Most impor-tantly, however, with this event CICLingpaid tribute to a nation that gave theworld many of the greatest wonderfulcomputational linguists, of which somehave been Program Committee members

or keynote speakers at past CICLingevents. Yet another reason for holdingCICling in Ias,i was the sesquicentennialjubilee of the Alexandru Ioan CuzaUniversity of Iasi, the oldest higher educa-tion institution in Romania.

European edition

Held for the first time in Europe, this year'sCICLing event received the record highnumber of submissions in its 11-year histo-ry: 271 submitted papers from 47 countries,the previous record being 232 papersreceived for 2008 event in Haifa, Israel. The61 accepted oral papers and the three invit-ed papers were published in ComputationalLinguistics and Intelligent Text Processing, aspecial issue of Springer's Lecture Notes inComputer Science. In addition, two otherjournals dedicated special issues to thisCICLing event: 27 papers were published inNatural Language Processing and its Applic-ations, volume 46 of Research in ComputingScience, IPN, México, and 18 papers in thefirst inaugural issue of International Journalof Computational Linguistics and Applicat-ions, Bahri Publications, India.

Almost 150 participants, including regis-tered participants, organizers, students, andthe great student volunteers, enjoyed theinvited lectures delivered by Shuly Wintnerof the University of Haifa, Israel, who pre-sented the keynote talk ComputationalModels of Language Acquisition, and JamesPustejovsky of Brandeis University, USA,

who called his keynote talk The Recognitionand Interpretation of Motion in Language. Inaddition, each keynote speaker organizedvery interesting “special events” – this is adistinctive feature of CICLing conferences.In this year both special events took theform of a discussion on ComputationalLinguistics vs. Natural Language Engineering.

Gheorghe Grigoras,, the Dean of the Facultyof Computer Science, and Dan Cristea of

the Al. I. Cuza University of Ias,i, who kind-ly accepted to be the Honorary Local Chairof CICLing 2010 and delivered a veryappreciated presentation about dictionaries,addressed the participants warm welcomespeeches at the opening ceremony. A verydear friend of CICLing, Nancy Ide of theVassar College, USA, who was withCICLing from its early days and is a pastCICLing keynote speaker, kindly acceptedto address some words to the CICLingersright after the ceremony organized by the Al.I. Cuza University where she was awardedthe title of Honorary Professor of the Al. I.

Cuza University of Ias,i.

Best paper awards and liveinternet coverage

The scientific program included, apart from

the keynote lectures and special events, oral

presentations of the CICLing authors. The

CICLing sessions were organized under the

following sections: Lexical Resources, Syntax

and Parsing, Word Sense Disambiguation

and Named Entity Recognition, Semantics

and Dialog, Humor and Emotions, Machine

Translation and Multilingualism, Infor-

mation Extraction, Information Retrieval,

Text Categorization and Classification,

Plagiarism Detection, Text Summarization,

Speech Generation. All scientific activities

took place in the Mihai Eminescu Aula

Magna of the Al. I. Cuza University of Ias,i(see the picture to the left). The poster ses-

sion, which started with a brief oral presen-

tation of all posters, was combined with the

welcome reception and was held in the Hall

of the Lost Footsteps of the University, see

the other picture.

Based on the Program Committee votes,three best papers awards were offered:

• First place: George Tsatsaronis, IraklisVarlamis, and Kjetil Nørvåg, AnExperimental Study on Unsupervised Graph-Based Word Sense Disambiguation; accord-ing to a ballot among all attendees, these


CICLing Comes to Europe

authors also won the best presentationaward.

• Second place: Paolo Annesi and RobertoBasili, Cross-lingual Alignment of FrameNetAnnotations through Hidden MarkovModels;

• Third place: Lieve Macken and WalterDaelemans, A Chunk-driven BootstrappingApproach to Extracting Translation Patterns.

Another special award was offered for thebest student paper: Integer LinearProgramming for Dutch Sentence Com-pression, co-authored by Jan De Belder andMarie-Francine Moens.

For the first time in the history of CICLingand also in the history of the UniversityAula, all the CICLing 2010 scientific activi-ties were transmitted live over the internet;

er, with only small rains, and only duringthe days of scientific program. This wasgood also for the cultural program ofCICLing 2010-from the very first CICLingevent its famous touristic program is one ofits most attractive features. This yearCICLingers participated in three tours tothe most attractive touristic areas ofRomania. The first tour, on Sunday, March21, was dedicated especially to nature and

history of the Neamt, county. Wednesday,

March 24, was dedicated to the myths: we

visited the old city centre of Bras,ov, with the

Black Church, and after a tour at the Peles,Castle in Sinaia we had an excellent lunchwith Romanian cuisine; those CICLingerswho were very keen at the idea and still hadenergy, went also to the Bran Castle, famousfor its connection with the legend of Count

Association of Students in Informatics in

Ias,i (ASII).

Abundant local support

We also greatly appreciate the support of the

Romanian Academy. In particular, Dan

Tufis, was closely involved in the organiza-

tion of PROMISE conference (ProcessingROmanian in Multilingual, Interoperationaland Scalable Environments), a satellite event

of CICLing 2010 co-organised under the

aegis of the Romanian Academy together

with the Al. I. Cuza University of Ias,i. Very

special thanks go to Rada Mihalcea, one of

the greatest and oldest friends of CICLing,

who was, like many times before, guiding

and helping the organizing team in many

the recordings of all the sessions are availablefor download at http://profs.info.uaic.ro/~cicling2010/historyCic.html. Other pic-tures and videos are available at http://pica-saweb.google.com/cicling2010.iasi; all par-ticipants willing to share their memories cancontribute their files to this page.

Since it was the first CICLing held inEurope, the usual dates of CICLing, whichused to be in mid-April, had to be changedtaking into account other important scien-tific events as well as the local Romanianprogram and, of course, the RomanianSpring weather. Luckily, even though theusual weather in the Spring is a rainy one,during CICLing we had a very nice weath-

Dracula. The last trip, on Saturday, March27, was dedicated to the Romanian paintedmonasteries in the Bucovina county.

From all the feedback received from theCICLingers during and after the confer-ence, we can consider that the conferencewas a success, especially taking into accountthe world economic crisis. This success is ina great degree due to active participationand constant support of many people. Firstof all we want to thank all people involvedfrom the Alexandru Ioan Cuza University,starting from the leaders of the University,passing through all the University depart-ments and ending with the wonderful stu-dent volunteers – most of them from the

activities, behind the scene but always pre-sent.

Finally, after many months of close collabo-ration from the opposite sides of theAtlantic – and now separated by it again –we would say a lot of very warm wordsabout each other if we were not both theauthors of these lines – it's a pity we have toomit here this part of our emotions andgratitude...

Some months after CICLing 2010 we arehappy we survived all the arduous work, andwe are looking forward for the nextCICLing in 2011, and for the courageouslocal organizers willing to contribute to yet

another successful CICLing event. C


GGrroouupp pphhoottoo ooff CCIICCLLiinngg ppaarrttiicciippaannttss


Utrecht, March 3-4, 2010

Steven KrauwerCLARIN coordinator

As many of you may know, CLARINhas, in addition to the Executive

Board, three other important Boards. TheScientific Board consists of high-level sci-entists, one from each participating coun-try and appointed by the national fundingagency. It will monitor the execution ofthe programme and ensure the overall sci-

entific soundness, coherence, complete-ness, consistency and feasibility. It deter-mines the overall scientific strategy andreviews the project deliverables. TheStrategic Coordination Board consists ofrepresentatives appointed by the fundingagencies, one per country. It will monitorthe execution of the programme of workwith a view to compliance with nationalgovernments' and funding agencies' poli-cies. It determines the overall governanceand financial strategy. The StrategicCoordination Board will review all projectdeliverables.

The International Advisory Board consists ofhigh-ranking international experts, and willgive advice to the Executive, Scientific andStrategic Coordination Boards on issues ofcommon interest, such as opportunities for

collaboration, coordination and harmoniza-tion with initiatives at the internationallevel. On http://www.clarin.eu/external/index.php?page=about-clarin&sub=4 onecan find a table that presents the composi-tion of the three boards.

All-boards meeting

On March 3rd this year the three Boardsmet for the first time with the ExecutiveBoard in order to discuss the project resultsand deliverables of the first two years of theproject. On March 4th there was a meetingbetween the Strategic Coordination Boardand the Executive Board, with representa-tives from the Dutch Ministry of Science,where the focus was on the future structure

of CLARIN after completion of thePreparatory Phase project.

The various components of the project werepresented by the Work Package leaders, anddiscussed with the participants. On thewhole the members of the Boards expressedtheir satisfaction with the results presentedand were happy to endorse our deliverables.It was encouraging to hear that some of themembers of our international advisory boardtold us that they were impressed by ourachievements, especially in comparison withwhat was happening outside Europe.

The discussion on the first day brought upmany different topics, both at the global andat the detail level, and we collected a largenumber of comments and suggestions. Arecurring topic was, as could be expected,how to reach the users, especially users from

outside the field of linguistics. Even if mostof the activities related to this specific issuehad to be deleted from our original workplan at the request of the EC it was general-ly felt that this was one of the most pressingproblems that should be urgently addressedin the following phases of CLARIN, both atthe European and at the national level.Collaboration with existing and emerginginfrastructure activities within and outsideEurope could be extremely beneficial, andthe creation of CHAIN (Coalition ofHumanities and Arts Infrastructures andNetworks, http://www.chaincoalition.org/)was seen as an important step towards this.

Heading for ERIC framework

On March 4th the focus was on the futuregovernance and financing of CLARIN. Thecurrent vision to create an ERIC for CLAR-IN was presented for the Strategic Coor-dination Board. An important feature of theERIC is that it is a consortium of govern-ments (as opposed to universities, digitalarchives or research organizations). The roleof the present project can only be to makethe proper preparations for this, so that thegovernments can take over to conduct thefinal negotiations. At the meeting the cur-rent plans for the organisation and theimplementation of the ERIC were presentedand discussed. The presence of representa-tives from ministries or funding agenciesfrom most CLARIN countries made it anexcellent opportunity for an open discussionof possible obstacles. Representatives of theDutch ministry of research confirmed theirintention to take the lead in this process andto act as the host country for CLARIN. Theparticipants expressed their agreement withthe general direction taken and their willing-ness to continue the discussions. One of theimportant conclusions from this meetingwas that the original time schedule wherebythe ERIC would become operational on Jan1st 2011 was recognized as unrealistic, asmost countries needed more time to finalizetheir national roadmaps and to start thefunding allocation process. It was agreedthat the project would apply for an unfund-ed extension till July 1st 2011, to avoid a gapbetween the end of the preparatory phaseand the construction phase of CLARIN.

The meeting on March 3rd and 4th was avery important one for CLARIN, becausethis was the first opportunity for a directexchange of views and ideas between CLAR-IN and its three Boards. The fact that therewas broad support for the directions the pro-ject is taking (along its various dimensions)was very encouraging and makes us feel con-fident that we will succeed in makingCLARIN happen! C

The First Joint Meeting of the CLARIN Boards

AA ppiiccttoorreessqquuee bbuuiillddiinngg ooff tthhee UUttrreecchhtt UUnniivveerrssiittyy wwhheerree bbooaarrddss mmeeeettiinngg ttooookk ppllaaccee


EstablishingPortugueseCLARINLisbon, March 19, 2010

António BrancoUniversity of LisbonMarko Tadi}University of Zagreb

The CLARIN Lisbon Meeting onCommon Language Resources and

Technology Infrastructure for the Humanitiesand Social Sciences took place in March 19,2010. It was requested by the Portuguesefunding agency FCT-Fundação para aCiência e Tecnologia, from the MCTES-Ministério da Ciência, Tecnologia e EnsinoSuperior (Portuguese Ministry for Science,Technology and Higher Education) in orderto establish the Portuguese CLARIN com-munity. Also, the ultimate goal of this meet-ing was to support the national decisionprocess on the joining of the CLARIN infra-structure.

The meeting was divided in four sessions whereeach was conducted by a chair person and a rap-porteur.

Session A

This introductory session started with CLARINcoordinator Steven Krauwer and his overall pre-sentation of the CLARIN project, its goals,achievements, challenges and present state ofCLARIN community that is shaping up at theEuropean as well as at the different national lev-els. Krauwer’s presentation was followed by threetalks from three different partners presentingtheir national situations.

The first one was given by Adam Przepiórkowski(Polish Academy of Sciences, Warsaw) and it wasa report from a national network with a status oflanguage resources and technology developmentand maturity quite similar to the one of Portugal.The second talk was given by Núria Bel(University Pompeu Fabra, Barcelona) where shepresented a report from a national network with amajor language from the neighbouring country,whose global projection matters. The third onewas given by Erhard Hinrichs (University ofTübingen) and he reported on a national networkthat is a major driving force of the infrastructure,with a large companion national funding pro-gramme of support. All three talks presented dif-ferent approaches and sources of inspiration toorganize a CLARIN community at the nationallevel thus giving examples to the PortugueseCLARIN initiative.

Session B

The main purpose of Session B was to plot thesituation concerning the Portuguese language andits position in the CLARIN project. The speakersfocused on the Portuguese participation in theCLARIN project and the formation of thenational CLARIN network, the Portuguese andBrazilian NLP and Linguistics community work-ing on Portuguese language, and the objective offostering collaboration between Portuguese andBrazilian research centres under CLARIN (seealso contribution in this issue of CLARINNewsletter at page 18). This session was closedwith the presentation of the view of thePortuguese Funding Agency on the ESFRI pro-jects and on the involvement of Portugal in theforthcoming development of the CLARIN frame-work.

The introductory talk in session B was given byAntónio Branco (University of Lisbon), coordina-tor of the CLARIN-Portugal network titled TheCLARIN project in Portugal. He was followed byVera Lúcia Strube de Lima (FACIN-PUCRS,Porto Alegre, Brazil) with the talk Research inNLP in Brazil: an overview. The closing contribu-tion in this section was given by Isabel Figueiredo(on behalf of Lígia Amâncio, FCT Vice-Presidentand Portuguese delegate at ESFRI) under the titleThe Portuguese participation in ESFRI projects.

Session C

This session had two main goals: description ofexamples of use cases and presentation and dis-cussion of technical problems and possible solu-tions.

The description of use cases was provided in thefollowing presentations:

1. Language databases in the psychologist's toolbox,by São Luís Castro (University of Porto);

2. Research in History meets LT: the case of Portu-guese Royal Inquiries of 1258, by Luís Gomes(University of Azores);

3. Multilinguality: global access to information,entrepreneurship and research: a global Portu-guese perspective, by José Gabriel Lopes (NewUniversity of Lisbon.)

The discussion on technical problems was initiat-ed by Daan Broader (Max-Planck Institute forPsycholinguistics) who made a detailed analysis ofthe main issues involved:

• An ecology of “e-infrastructures”, from networkservices to community services

• Technical, organizational and juridical issues

• What is needed (in terms of ICT infrastructure)to allow researchers to find, process, and storelanguage resources?

Session D

The last session started with Overview of Con-tributions from 16 CLARIN-PT members on theirown RUs, by Maria Francisca Xavier (Centro deLinguistica da Universidade Nova de Lisboa) andwas followed by a round table. The discussion inthe round table was organized around the follow-ing questions the Research Units (RUs) had beenpreviously given, and their position papers trig-gered by such questions:

• the kind of research that should be supported byCLARIN;

• the expected role of CLARIN in preservingknowledge and scientific results;

• the resources and services that CLARIN shouldmade available;

• the expected impact of CLARIN on researchand on the support of Portuguese language;

• the contributions to CLARIN that the RU willbe ready to make available.

The CLARIN Lisbon meeting finished with afruitful discussion around these key issues thusundoubtedly leading to an expected Portugueseparticipation in ERIC This meeting yielded threedeliverables, available for distribution:Memorandum with the institutonal profiles of thePortuguese CLARIN members (24 pp.);Memoradum with the postion papers of thePortuguese CLARIN members for the round table(23 pp.); The Proceedings of the meeting (128

pp.). C

PPaarrttiicciippaannttss ooff PPoorrttuugguueessee CCLLAARRIINN kkiicckk--ooffff mmeeeettiinngg


CLARIN and FLaReNetworkshop, StockholmNovember 25-26, 2009

Rolf CarlsonKjell EleniusDavid HouseKTH, Stockholms

In November 2009, CLARIN andFLaReNet organized a two day work-

shop at KTH in Stockholm. The themewas “Best practices for speech and multi-modal databases”. The participants werespecially invited to cover different aspectsof infrastructure for spoken and multi-modal databases. The workshop web pagecan be found on http://www.speech.kth.se/clarin/ with links to all presenta-tions. The following is a short report fromthis meeting including a few snapshotsfrom the subjects that were presented anddiscussed during these intensive two days.

The background for the workshop was todiscuss research needs and share ideas andviews among spoken language researchersnot yet heavily involved in the CLARINmission. It is obvious that currentadvances and rapid development ofspeech-based interfaces particularly haveto a large degree benefited from successfulcollection and collaborative use of large-scale speech and multimodal databases.

The spoken and multimodal researchcommunity is large and varied, and thereis a great need to define the requirementsfor infrastructures which can be fruitfullyand easily shared. The annotation of inter-esting corpora in this field is, however,divergent and would benefit from findingways for harmonization and interoperabil-ity. Thus, it is of importance to look intostandards and best practices to encode var-ious relevant features of these corpora tofacilitate their use for a wide diversity ofresearchers. The features may be intona-tion, facial expressions, gestures, turn-tak-ing and emotions.

When we look at speech and multimodalresearch topics in the humanities andsocial sciences such as child-directedspeech, accent shifts, dialects and soci-olects and dialogues in different environ-ments, we do not have a strong tradition

for collection and distribution of relevantdatabases. Currently it is very much theresearchers' own task to collect the data ortry to do research on a corpus collected forsome other purpose. To be able to easilylocate, access and share data oriented tothe humanities and social sciences wouldbe a big step for many researchers. Thusan important goal towards increasing ourunderstanding of human communicationand making our applications more intelli-gent is to make language resources andtechnology available and readily usable forall kinds of scholars. Common standardsare essential for achieving this.

CLARIN framework

The first session, chaired by ErhardHinrichs, was opened by Steven Krauwerexplaining the CLARIN framework andsetting up the challenges we are facing toinclude or engage the spoken languageand multimodal community. While it is sofar the case that most CLARIN playersoriginate from text processing, it isacknowledged that speech and multi-modal resources are also important, fore.g. phoneticians, linguists, historians ofthe future and social scientists. Despitethis, speech and multimodal actors areunderrepresented in the CLARIN activi-ties. Steven pointed out that sharingspeech and multimodal resources pays offbecause of the high cost (in time &money) of annotation, and that consulta-tion with the community, especially aboutstandards is urgently needed. This is in thecommon interest of both CLARIN andFLaReNet.

Several presentations were focused on cur-rent work on data collection, annotationand analyses. The aim of PeterWittenburg's contribution was to reviewsome of the multimodal annotations inuse and point to the heterogeneity ofmany of the encoding schemes even whenbased on a harmonized format for annota-tion structures. Anton Batliner's presenta-tion was entitled “Annotations andbeyond – what (not) to standardize, andhow come that sometimes, it just hap-

Following thePrevious SuccessLT Days, LuxembourgMarch 22-23, 2010

Marko Tadi}

Editor

Following the event successfully organ-ised in January 2009 (see CLARIN

Newsletter No 4 for full report and intro-ductory details on pages 1 and 8) whereLanguage Resources and Technologieswere strongly supported by the pro-grammes that EC grants through FP7 ICTand CIP ICT-PSP calls, this year the sameformula was applied.

New calls

LT Days 2010 aimed at informing potentialproposers about announced ICT-PSP 4thCall, Theme 6: Multilingual Web and theICT-FP7 Work Programme 2010-2011.The details of each programme and callwere presented by EC officials in Sessions

3 and 4 preceeded by contextualisation ofthis calls within existing projects/initiativesin Session 1: “Different initatives, differenttarget communities, different businessmodels... one common effort” and Session2: “Enterprise Language Processing:mega-trends and developments” wherehigh-level speakers gave presentationscovering varied related topics, includingnewly started projects, the commercialviewpoint and future prospect.

In this introductory session Steven Krauwersuccesfully presented CLARIN and itsachievements so far, particularly when itcomes to sharing LRT. Common metadataschemes that would allow different projectsand initiatives to coordinate and cooperatein order to avoid the doubling of efforts,were listed as one of the top priorities. C


pens” and also detailed some other typesof (missing) standards, wrong standards,and lacunae (such as agreed upon perfor-mance measures) where we badly needsome conventions which, however, possi-bly should not be pinned down as “newstandards”. Concluding the first sessionwas Volker Steinbiss with a talk entitled“Best practices that could help avoidingthe mess”. Among several aspects he pre-sented a few observations and best prac-tices that have been helpful to keep thestress of working with large amounts ofdata at an acceptable level.

The tools for spoken data collection devel-oped by Christoph Draxler have the

potential to become important web basedservices for efficient and economic corpuswork. Data accessibility using web basedservices were also presented by AndersEriksson, together with a description ofthe extensive dialect project SWEDIA.Important aspects of all corpus work arethe ethical or privacy issues which was thefinal subject in the session discussed byMartti Vainio.

Sarah Hawkins' presentation “Tools forwork that uses acoustic-phonetic analysis”gave a closer view on the detailed work inphonetics and especially within the S2Snetwork. Of special interest is the devel-

opment of freeware acoustic-phoneticanalysis tools that will allow student oroccasional professional users to quicklylearn how to do specific measurements,possibly manually, yet also appeal toexpert researchers working with corpora,including those who are competent pro-grammers. Such a common platformshould aim to build on and integrate,rather than replace, existing software, andto facilitate cross-disciplinary collaborativeresearch. Such a development is also ofimportance for sharing speech data fromsmall languages discussed in the presenta-tion by Ingunn Amdal. Emotions in spon-taneous conversation and in music were

themes covered by Klara Vicsi andRoberto Bresin respectively.

Hot topic: Multimodality

Multimodality is a current hot topic inhuman interaction research and the needfor annotation on multiple levels was acommon denominator for the sequence ofpresentations by Jean Carletta, KirstenBergmann, Jonas Beskow, Jens Edlundand David House.

Based on his experience of working withseveral different tools for manual annota-tion and also talking to people who usethem regularly, Nick Campbell concluded

that using these tools can be tedious andrestrictive. Rather than prescribe a stan-dard at this time, he suggested that wemight benefit more from creating a sup-port group whereby people who annotatedata regularly can communicate and sharesamples, tools, and formats.

Questions of ethics and privacy

During the course of the workshop, ampletime was devoted to general discussionwhich covered many of the topicsreviewed above. Especially lively discus-sion points concerned ethics and privacyas well as tools and standards. The finaldiscussion was led by Lou Boves andfocused on concrete objectives for CLAR-IN in meeting the challenges of engagingthe spoken language community to amuch larger extent. Lou summarized theimportant medium-term topics discussedduring the workshop as the following:

• Taxonomy of research objectives and therequired degree of granularity in annota-tions;

• Different resources and different toolsfor different research objectives;

• Re-purposing of resources;

• How dangerous is easy-to-use software?;

• Better understanding of the interplay ofresearch and infrastructure;

while the long term issues concerned are:

• Legal Issues;

• Data preservation;

• Additional Centres;

• Sustainable business model;

• Requirements of journals and fundingagencies to make data available;

• How can we create a research environ-ment in which research on small lan-guages makes similar contributions toscience as the NIST competition.

It is our conviction that the workshop

resulted in a deeper understanding of

research issues and needs within the two

research communities dealing with either

text or spoken language and also helped to

more clearly articulate and define the

basic challenges we need to face to make

further progress in the future. C

Best practices for speech and multimodal databases

KKTTHH ccaammppuuss


PROPOR, Porto Alegre,April 27-30, 2010

António Branco

University of Lisbon

Vera Lucia Strube de Lima

PUCRS Porto Alegre

Thiago Pardo

University of São Paulo

Steven Krauwer

Utrecht University

PROPOR is the Portuguese verb 'topropose'. It is also the acronym of

the international conference on thecomputational processing of Portuguese,the sixth language with the largest num-ber of speakers, spread over theAmerican, African and Asian continents.

This conference is held every other year,alternating between Portugal and Braziland gathering over one hundred partici-pants. It features the full range of initia-tives of a mature scientific event of itssize, with oral and poster presentations,plenary keynote presentations, demosessions, tutorials, best dissertationawards, etc. Its selected papers are pub-lished in Springer's Lecture Notes inArtificial Intelligence, with a very com-petitive acceptance rate, scoring 27% inthe last edition. Of course, it is also a fes-tive occasion where old acquaintancesmeet again and new friends are made.

A relevance to CLARIN

In terms of research community mobi-lization, the PROPOR conference playsthe role of what in other communities isthe attribution of some formal associa-tive structure. It is the reference focalpoint of this research community fromboth margins of the Atlantic, and fromall over the world, whose work is on thelanguage resources and technology forPortuguese.

As its acronym suggests in thePortuguese reading, the PROPOR con-ference is the place to propose new ideasand lines of action. That was what hap-pened once more in the last edition,which took place in last April 27-30 inPorto Alegre, Brazil (http://www.inf.pucrs.br/~propor2010). And this timewith special relevance for our CLARINproject.

At the end of the morning of the secondday, the conference accommodated aone hour special plenary session devotedto CLARIN. This provided a uniqueopportunity to present the project andits progress to an audience includingboth future researchers – starting now asPhD students – and today's major play-ers in the language resources and tech-nologies from Portugal and Brazil.

This special session included three parts.In a first slot, Steven Krauwer presenteda broad overview of the project anddescribed its results and current status.Next, António Branco reported on theachievements of the Portuguese CLAR-IN network and underlined the interest

of enlarging the participation of thePortuguese language in CLARIN withthe contribution of the Brazilian col-leagues. Finally, there was time to answerquestions and clarify issues raised by theaudience.

Initiative for cooperation ofCLARIN with Brasilian centres

In the following day, we were invited bythe conference organizers to a side meet-ing with Brazilian colleagues, includingthe major players and group leaders inBrazil. The goal was to get a deeperunderstanding of the project and a bet-ter grasp of the possibility for Brazil tojoin CLARIN and the measures needed,both in Brazil and in the project, toeventually achieve that goal.

At the end of the meeting the enthusi-asm and confidence among participantswas very high about these new chancesof cooperation across continents. It isour belief that very soon we will hearabout more developments coming outof these seeds of collaboration sown inPorto Alegre.

As Brazil is a South-American country,this would clearly represent a firstopportunity and test bed towards mak-ing CLARIN a truly internationalendeavour across continents. Naturally,it brings about also the challenge forCLARIN to find ways to expand its ini-tial parameters and our accrued respon-sibility of making it an open and a reli-able organization for colleagues from allover the world! C

Bridging the Portuguese Speaking Margins ofthe Atlantic with CLARIN

SStteevveenn KKrraauuwweerr aanndd AAnnttóónniioo BBrraannccoo pprreesseennttiinngg CCLLAARRIINN


Ru-ta Marcinkevi~iene.

Vytautas Magnus University,Kaunas

Introduction

The last two decades have been important forLithuania as a number of activities relating toHuman Language Technologies (HLT) havebeen carried out, including:

• localization of general tools,

• digitization (including adaptation of digi-tized resources),

• compilation of tools, language resources andknowledge bases,

• training and research,

• documentation and dissemination.

The first two types of activities, i.e. localizationof the user interface and digitization of cultur-al heritage, cannot be classified under HLTproper. However, some types of digitized prod-ucts can be used as linguistic resources, e.g.Database of Old Lithuanian Writings,Dictionary of Lithuanian Language,Dictionary of Contemporary LithuanianLanguage, Dictionary of Toponyms, Databaseof Lithuanian Dialects etc.

However, digitized resources are of limited useas resources, therefore a greater prominence isgiven to the third type of activity, the compila-tion of general and special corpora and lan-guage processing tools.

An overview of Lithuanian HLT

The most obvious outcome of the programmefor Lithuanian HLT was compilation of thecorpus of 100 million running words and sometools (e.g. corpus query system and collocationextraction tool, a system of morphologicalannotation and disambiguation) available forpublic use. Later developments in the field,financed mostly by national foundations,resulted in the production of the followingtools and resources:

• a morphologically annotated corpus (115million running words),

• an annotated manually checked corpus ofone million words,

• a set of parallel corpora:

• a bi-directional Czech-Lithuanian andLithuanian-Czech corpus of five millionwords

• English-Lithuanian corpus of 18 millionwords in size,

• a database of Lithuanian nominal colloca-tions, extracted from the corpus of 100 mil-lion words,

• a number of tools such as:

• a tool for the automatic identification oftext functions for the Lithuanian language,

• the tool for the extraction of collocations,

• a Lithuanian tagger,

• the Aligner2067,

• an automatic accentuation tool for theLithuanian language,

• a corpus of Spoken Lithuanian language,

• a universal annotated database of speechrecordings.

The above list is confined to the tools for lan-guage resources made at Vytautas MagnusUniversity and sponsored mainly by twonational funding agencies, the Lithuanian StateLanguage Commission and the LithuanianState Science and Studies Foundation.

Other institutions have developed a set of toolsand databases for public use or purchase. TheState Commission of the Lithuanian Languageis monitoring an open terminological database.The Institute of Mathematics and Informaticshas digitized term dictionaries from 27 disci-plines into one database. A private company

Fotonija is known for its electronic dictionariesInterleksis, T��; English-Lithuanian dictionar-ies Alkonas and Anglonas, French-Lithuaniandictionary Frankonas and a spellchecker Juodosavys. A corpus of academic discourse has beenstarted at Vilnius University, Faculty ofPhilology.

The most recent jointly developed tool was arule-based machine translation system for thetranslation of English internet texts intoLithuanian. It was developed by a group ofcompanies including Promt (St. Petersburg),Fotonija (Vilnius), Alna Software (Kaunas).They co-operated within the framework of aproject financed by EU Structural Funds.

Before the automatic machine translation sys-tem there was an automatized translation toolVertimo Vedlys incorporated in text editor Tilde.sbiuras. together with a spellchecker and multi-lingual support software. It translates nounphrases and simple sentences.

National policies related toresearch infrastructures

On a national level various research and devel-opment programs continue to promote HLTrelated activities. However, the most importantdevelopment and support of resources is fore-seen in the framework of the NationalResearch Infrastructure (NRI) compatible withESFRI requirements for national states. Thestrategy of NRI includes documentation andunification of existing national resources aswell as support for trans-national initiativessuch as CLARIN, CESSDA and other similarjoint infrastructures for the Social Sciences andHumanities (SSH). National support forresearch infrastructures in general and HLT inparticular is timely since “SSH researchers relyon new technologies, and real overhead costs

for SSH research have increased dramaticallyover the past 20 years, without governmentsubsidies necessarily reflecting these changes.Consequently, more and more SSH researchdepends on capital injections to develop cut-ting edge data sets and develop retrieval sys-tems” (METRIS report 2009).

It can be concluded that most of LithuanianHLT related activities, mentioned in theIntroduction, are taken care of on nationallevel. Training and research, however, remainthe least attended activities. Therefore, it isexpected that our partnership in CLARIN con-sortium will boost the development of bothresearch infrastructures and research itself. C

The State of Lithuanian HLT

TThhee bbuuiillddiinngg ooff VVyyttaauuttaass MMaaggnnuuss UUnniivveerrssiittyy iinn KKaauunnaass


Join CLARINThe CLARIN project is a combination ofCollaborative Projects and Coordinationand Support Actions, registered at the EUunder the number FRA-2007-2.2.1.2. Itstarted with the preparatory phase in 2008that will make the grounds for the nextphases and it will cover the generic, lan-guage independent activities. In order todo our work properly we have to rely on amuch wider circle than just the formal con-sortium partners in the project. For thisreason we have opened up all our projectworking groups for participation by orga-nizations that are not part of the consor-tium.

MembersAAuussttrriiaa ((NNCCPP:: GGeerrhhaarrdd BBuuddiinn)) University of Graz (Graz): AustrianGerman Research Centre (C:Rudolf Muhr); Institut für Romanistik(C:Stefan Schneider)Austrian Academy of Sciences (Vienna): Austrian Academy Corpus

(C:Christoph Benda); Department of Linguistics andCommunication Research (C:Sabine Laaha); Institute ofLexicography of Austrian Dialects and Names (C:Eveline Wandl-Vogt)

Secure Business Austria (Vienna): (C:Edgar R. Weippl)University of Vienna (Vienna): Center for Translation Studies

(C:Gerhard Budin)BBeellggiiuumm ((NNCCPP:: IInneekkee SScchhuuuurrmmaann)) University of Antwerp (Antwerp):

Center for Dutch Language and Speech (C:Walter Daelemans)Vrije Universiteit Brussel (Brussels): Laboratory for Digital Speech and

Audio Processing, Department of Electronics and InformationProcessing (C:Werner Verhelst)

Gent University (Gent): Digital Speech and Signal Processing researchgroup at the Electronics and Information Systems department(C:Jean-Pierre Martens)

University College Ghent (Gent): Faculty of Translation Studies,Language and Translation Technology Team (C:Veronique Hoste)

Katholieke Universiteit Leuven (Leuven): Center for ComputationalLinguistics (C:Frank Van Eynde); ESAT-PSI/Speech (C:PatrickWambacq); Language Intelligence & Information Retrieval(C:Marie-Francine Moens)

Katholieke Universiteit Leuven (Leuven - Kortrijk): iTec(Interdisciplinary research on Technology, Education &Communication) (C:Hans Paulussen)

BBuullggaarriiaa ((NNCCPP:: KKiirriill SSiimmoovv)) University of Plovdiv (Plovdiv): Faculty ofMathematics and Informatics (C:Veska Noncheva)

Bulgarian Academy of Sciences (Sofia): Department of ComputationalLinguistics, Institute for Bulgarian Language (C:Svetla Koeva);Institute for Parallel Processing of Bulgarian Academy of Sciences(Sofia): Linguistic Modelling Department (C:Kiril Simov); Instituteof Mathematics and Informatics, Bulgarian Academy of Sciences(Sofia): Mathematical Linguistics Departement (C:LudmilaDimitrova)

St. Cyril and St. Methodius University (Veliko Turnovo): (C:BoryanaBratanova)

CCrrooaattiiaa ((NNCCPP:: MMaarrkkoo TTaaddii}})) Institute of Croatian Language andLinguistics (Zagreb): (C:Damir ]avar)

University of Zagreb (Zagreb): Department of Linguistics, Faculty ofHumanities and Social Sciences (C:Marko Tadi}); Zagreb UniversityComputing Center (C:Zoran Beki})

CCyypprruuss ((NNCCPP:: --)) Cyprus College (Nicosia): Research Center (C:AntonisTheocharous)

CCzzeecchh RReeppuubblliicc ((NNCCPP:: EEvvaa HHaajjii~~oovváá)) Masaryk University (Brno):Faculty of Informatics (C:Aleš Horák)

Charles University (Prague): Institute of Formal and AppliedLinguistics, Faculty of Mathematics and Physics (C:Eva Haji~ová)

The Institute of the Czech Language, Czech Academy of Sciences(Prague): The Institute of the Czech Language (C:Karel Oliva)

DDeennmmaarrkk ((NNCCPP:: BBeennttee MMaaeeggaaaarrdd)) Copenhagen Business School(Copenhagen): Department of International Language Studies andComputational Linguistics (C:Peter Juel Henrichsen)

Dansk Sprognævn - Danish Language Council (Copenhagen):(C:Sabine Kirchmeier-Andersen)

Society for Danish Language and Literature (Copenhagen): (C:JørgAsmussen)

The National Museum of Denmark (Copenhagen): (C:Birgit Rønne)The Royal Library (Copenhagen): (C:Anders Conrad)University of Copenhagen (Copenhagen): Centre for Language

Technology, Faculty of Humanities (C:Bente Maegaard)University of Southern Denmark (Kolding): Faculty of Humanities

(C:Johannes Wagner)EEssttoonniiaa ((NNCCPP:: TTiiiitt RRoooossmmaaaa)) University of Tartu (Tartu): Institute of

Computer Science (C:Tiit Roosmaa)FFiinnllaanndd ((NNCCPP:: KKiimmmmoo KKoosskkeennnniieemmii)) CSC - the Finnish IT Center for

Science (Espoo): (C:Pirjo-Leena Forsström)Lingsoft Inc. (Helsinki): (C:Juhani Reiman)The Research Institute for the Languages of Finland (Helsinki): (C:Toni

Suutari)University of Helsinki (Helsinki): Department of General Linguistics

(C:Kimmo Koskenniemi)University of Joensuu (Joensuu): Department of Foreign Languages

and Translation Studies (C:Jussi Niemi)

University of Oulu (Oulu): Faculty of Humanities, Finnish Language(C:Marketta Harju-Autti)

University of Tampere (Tampere): Faculty of Information Sciences,Department of Information Studies and Interactive Media (C:EeroSormunen)

FFrraannccee ((NNCCPP:: JJeeaann--MMaarriiee PPiieerrrreell)) Centre de ressources pour la docu-mentation de l'oral (Aix-en-Provence) (C:Bernard Bel)

National Center for Scientific Research (CNRS) (Marseille): Laboratoired'Informatique Fondamentale de Marseille (LIF-CNRS) (C:MichaelZock)

Centre National de Ressources Textuelles et Lexicales (CNTRL) (Nancy):(C:Bertrand Gaiffe)

National Center for Scientific Research (CNRS) (Nancy): Analyse etTraitement Informatique de la Langue Française (ALTIF) (C:Jean-Marie Pierrel)

National Center for Scientific Research (CNRS) (Orsay): Institute forMultilingual and Multimedia Information (IMMI-CNRS) (C:JosephMariani)

Evaluations and Language resources Distribution Agency (ELDA)(Paris): (C:Khalid Choukri)

National Center for Scientific Research (CNRS) (Paris): TraitementÉlectronique des Manuscrits et des Archives (TELMA/DIS)(C:Florence Clavaud)

Université Paris 4 Sorbonne (Paris): Centre de linguistique théoriqueet appliquée (CELTA) (C:Andre Wlodarczyk)

Université de Strasbourg (Strasbourg): Equipe de recherche LiLPa(Linguistique, Langues, Parole) (C:Amalia Todirascu)

National Center for Scientific Research (CNRS) (Vandoeuvre lesNancy): L'Institut de l'Information Scientifique et Technique (INIST-CNRS) (C:Fabrice Lecocq)

University Paris Est/Paris 12 (Vitry Sur Seine): LISSI Laboratory(C:Yacine Amirat)

GGeerrmmaannyy ((NNCCPP:: EErrhhaarrdd HHiinnrriicchhss)) University of Augsburg (Augsburg):Philologisch-Historische Fakultät (C:Ulrike Gut)

Berlin-Brandenburg Academy of Sciences (Berlin): (C:AlexanderGeyken)

Humboldt-University Berlin (Berlin): Institut für deutsche Sprache undLinguistik (C:Anke Lüdeling)

Technische Universität Darmstadt (Darmstadt): Ubiquitous KnowledgeProcessing (UKP) Lab (C:Iryna Gurevych)

TU Dortmund University (Dortmund): Institute for German Languageand Literature (C:Michael Beißwenger)

Universität Duisburg-Essen (Essen): Fakultät Geisteswissenschaften /Germanistik / Linguistik (C:Bernhard Schröder)

University of Frankfurt/Main (Frankfurt/Main): Comparative LinguisticsDepartment (C:Jost Gippert)

University of Giessen (Giessen): Institut für Germanistik (C:HenningLobin)

DFG-Project “Language Variation in Northern Germany”(Sprachvariation in Norddeutschland - SiN) (Hamburg): (C:IngridSchröder)

University of Hamburg (Hamburg): Faculty for Language Literatureand Media, Arbeitsstelle “Computerphilologie” (C:Cristina Vertan);Fakultät für Geisteswissenschaften, Fachbereich Sprache, Literatur,Medien (C:Angelika Redder); Institute of German Sign Languageand Communication of the Deaf (C:Thomas Hanke); SFB 538Multilingualism (C:Thomas Schmidt)

University of Heidelberg (Heidelberg): Computational LinguisticsDepartment (C:Anette Frank)

University of Cologne (Köln): Institut für Linguistik - Phonetik(C:Dagmar Jung)

Max Planck Institute for Evolutionary Anthropology (Leipzig):Department of Linguistics (C:Hans-Jörg Bibiko)

University of Leipzig (Leipzig): Institut für Informatik, AbteilungAutomatische Sprachverarbeitung (C:Codrina Lauth)

Institut für Deutsche Sprache (Mannheim): (C:Marc Kupietz)Westfälische Wilhelms-Universität Münster (Münster): Institut für

Allgemeine Sprachwissenschaft (C:Gabriele Müller)University of Potsdam (Potsdam): Department of Linguistics

(C:Manfred Stede)German Research Center for Artificial Intelligence (Saarbrücken):

Language Technology Lab (C:Thierry Declerck)University of Stuttgart (Stuttgart): Institut für Maschinelle

Sprachverarbeitung (C:Ulrich Heid)Universität Trier (Trier): Kompetenzzentrum für elektronische

Erschließungs- und Publikationsverfahren in denGeisteswissenschaften (C:Andrea Rapp)

Universität Tübingen (Tübingen): Asien-Orient-Institut (C:Ulrich Apel);Seminar für Sprachwissenschaft (C:Erhard Hinrichs)

GGrreeeeccee ((NNCCPP:: SStteelliiooss PPiippeerriiddiiss)) Institute for Language and SpeechProcessing (Athens): Department of Language TechnologyApplications (C:Stelios Piperidis)

HHuunnggaarryy ((NNCCPP:: TTaammááss VVáárraaddii)) Hungarian Academy of Sciences(Budapest): Research Institute for Linguistics (C:Tamás Váradi);Institute for Psychological Research of the Hungarian Academy ofSciences (Budapest): (C:Bea Ehmann)

Budapest University of Technology and Economics (Budapest):Department of Sociology and Communications, Media ResearchCenter (C:Peter Halacsy)

Department of Telecommunication and Media Informatics, Laboratoryof Speech Acoustics (C:Klára Vicsi)

MorphoLogic Ltd. (Budapest): MorphoLogic Ltd. (C:László Tihanyi)University of Szeged (Szeged): Department of Informatics, Human

Language Technology Group (C:Dóra Csendes)IIcceellaanndd ((NNCCPP:: EEiirrííkkuurr RRööggnnvvaallddssssoonn)) Icelandic Centre for Language

Technology (Reykjavík): (C:Eiríkur Rögnvaldsson)University of Iceland (Reykjavík): Institute of Linguistics (C:Eiríkur

Rögnvaldsson)IIrreellaanndd ((NNCCPP:: --)) National University of Ireland (Galway): Department

of English (C:Sean Ryder)IIssrraaeell ((NNCCPP:: --)) Technion-Israel Institute of Technology (Haifa):

Computer Science Department (C:Alon Itai)

IIttaallyy ((NNCCPP:: NNiiccoolleettttaa CCaallzzoollaarrii)) European Academy Bozen/Bolzano(Bolzano): Institute for Specialised Communication andMultilingualism (C:Andrea Abel)

Università di Pavia (Pavia): Dipartimento di Linguistica Teorica eApplicata (C:Andrea Sansò)

National Research Council (Pisa): Istituto di LinguisticaComputazionale (C:Nicoletta Calzolari)

University of Rome “Tor Vergata” (Rome): Department of ComputerScience (C:Fabio Massimo Zanzotto)

LLaattvviiaa ((NNCCPP:: IInngguunnaa SSkkaaddiinnaa)) Tilde Language Technologies (Riga):Tilde Language Technologies (C:Andrejs Vasiljevs)

University of Latvia (Riga): Institute of Mathematics and ComputerScience (C:Inguna Skadina)

LLiitthhuuaanniiaa ((NNCCPP:: RRuuttaa MMaarrcciinneekkiieevviicciieennee)) Vytautas Magnus University(Kaunas): Center of Computational Linguistics (C:RutaMarcinkeviciene)

Institute of the Lithuanian Language (Vilnius): (C:Daiva Vaisniene)LLuuxxeemmbboouurrgg ((NNCCPP:: --)) European Language Resources Association

(ELRA) (Luxembourg): (C:S.Piperidis/K.Choukri)MMaallttaa ((NNCCPP:: MMiikkee RRoossnneerr)) University of Malta (Malta): Department of

Computer Science (C:Michael Rosner)NNeetthheerrllaannddss ((NNCCPP:: JJaann OOddiijjkk)) Meertens Institute (Amsterdam):

Meertens Institute (C:H.J. Bennis)Univerity of Amsterdam (Amsterdam): Intelligent Systems Lab

Amsterdam (ISLA) (C:Maarten de Rijke)Vrije Universiteit Amsterdam (Amsterdam): Computational Lexicology,

Faculteit der Letteren (C:Piek Vossen)Data Archiving and Networked Services (Den Haag): (C:Henk

Harmsen)Huygens Instituut KNAW (Den Haag): (C:K.van Dalen-Oskam)University of Twente (Enschede): Human Media Interaction Group,

Department of Electrical Engineering, Mathematics and ComputerScience (C:Roeland Ordelman)

University of Gröningen (Gröningen): Faculty of Arts, Center forLanguage and Cognition (C:Wyke van der Meer)

Digital Library for Dutch Literature (Leiden): (C:C.A. Klapwijk)Institute for Dutch Lexicology (Leiden): Instituut voor Nederlandse

Lexicologie (C:Remco van Veenendaal)Universiteit Leiden (Leiden): Leiden University Centre for Linguistics,

Faculty of Humanities (C:Jeroen van de Weijer)Max Planck Institute for Psycholinguistics (Nijmegen): (C:Peter

Wittenburg)Radboud University (Nijmegen): Centre for Language and Speech

Technology (C:L. Boves / N. Oostdijk); Centre for Language Studies(C:Pieter Muysken)

Tilburg University (Tilburg): ILK Research Group, Department ofCommunication and Information Sciences, Faculty of Humanities(C:Antal van den Bosch)

University of Utrecht (Utrecht): Utrecht Institute of Linguistics OTS,Faculty of Humanities (C:Steven Krauwer)

NNoorrwwaayy ((NNCCPP:: KKooeennrraaaadd DDee SSmmeeddtt)) Norwegian School of Economicsand Business Administration (NHH) (Bergen): (C:Gisle Andersen)

Unifob AS (Bergen): (C:Eli Hagen)University of Bergen (Bergen): Language Models and Resources group

(C:Koenraad de Smedt)SINTEF (Oslo): (C:Diana Santos)The Language Council of Norway (Oslo): (C:Torbjoerg Breivik)The National Library of Norway (Oslo) (C:Kristin Bakken)University of Oslo (Oslo): Department of Linguistics and Nordic

Studies, Faculty of Humanities (C:Janne Bondi Johannessen)University of Tromsø (Tromsø): Det humanistiske fakultet (C:Trond

Trosterud)Norwegian University of Science and Technology (Trondheim):

Department of Electronics and Telecommunications (C:TorbjørnSvendsen)

PPoollaanndd ((NNCCPP:: MMaacciieejj PPiiaasseecckkii)) University of Lodz (Lodz): Institute ofEnglish Language (C:Piotr Pezik)

Polish Academy of Sciences (Warsaw): Institute of Computer Science,Department of Artificial Intelligence (C:Adam Przepiórkowski);Institute of Slavic Studies (C:Violetta Koseska-Toszewa)

Polish-Japanese Institute of Information Technology (Warsaw):(C:Krzysztof Marasek)

University of Wroclaw (Wroclaw): Instytut Informatyki Stosowanej(C:Maciej Piasecki)

Wroclaw University of Technology (Wroclaw): Institute of AppliedInformatics (C:Maciej Piasecki)

PPoorrttuuggaall ((NNCCPP:: AAnnttoonniioo BBrraannccoo)) Universidade Católica Portuguesa(Braga): Centro de Estudos Filosóficos e Humanísticos (C:AugustoSoares da Silva)

University of Minho (Braga): Centro de Estudos Humanísticos (C:PilarBarbosa)

New University of Lisbon (Caparica): Faculdade de Ciências eTecnologia (C:José Gabriel Pereira Lopes)

Instituto de Telecomunicações (Coimbra): Pólo de Coimbra(C:Fernando Perdigão)

University of Coimbra (Coimbra): Centro de Esudos de LinguísticaGeral e Aplicada (CELGA ) (C:Cristina Martins); Centro deInvestigação do Núcleo de Estudos (C:José Augusto SimõesGonçalves Leitão)

Universidade de Évora (Évora): School of Sciences and Technology(C:Paulo Quaresma)

INESC-ID, Instituto de Engenharia de Sistemas e ComputadoresInvestigação e Desenvolvimento em Lisboa (Lisboa) (C:NunoMamede)

Instituto de Linguística Teórica e Computacional (Lisbon): (C:MargaritaCorreia)

New University of Lisbon (Lisbon): Centro de Linguística (C:MariaFrancisca Xavier)

University of Lisbon (Lisbon): Centro de Linguística da Universidade deLisboa (CLUL) (C:Amália Mendes); Natural Language and SpeechGroup (NLX-Group), Department of Informatics (C:António Branco)

University of Azores (Ponta Delgada (Azores)): (C:Luis Mendes Gomes)

University of Porto (Porto): Centro de Linguística (C:Fátima Oliveira);Laboratory of Artificial Intelligence and Computer Science(C:Miguel Filgueiras)

RRoommaanniiaa ((NNCCPP:: DDaann TTuuffiiss,,)) Romanian Academy of Sciences(Bucharest): Research Institute for Artificial Intelligence (C:DanTufis,)

University Babes-Bolyai (Cluj-Napoca): Faculty of Mathematics andComputer Science (C:Tatar Doina)

“Al. I. Cuza” University of Ias,i (Ias,i): Faculty of Computer Science(C:Dan Cristea)

Romanian Academy of Sciences (Ias,i): Institute of Computer Science(C:Horia-Nicolai Teodorescu)

University of Pites,ti (Pites,ti): Faculty of Letters (C:Mihaela Mitu)West University of Timis,oara (Timis,oara): Faculty of Mathematics and

Informatics (C:Viorel Negru)SSeerrbbiiaa ((NNCCPP:: --)) University of Belgrade (Belgrade): Faculty of

Mathematics (C:Du{ko Vitas)SSlloovvaakkiiaa ((NNCCPP:: --)) Slovak Academy of Sciences (Bratislava): L’. Štúr

Institute of Linguistics (C:Radovan Garabík)SSlloovveenniiaa ((NNCCPP:: TToommaa`̀ EErrjjaavveecc)) Alpineon d.o.o. (Ljubljana): (C:Jerneja

Zganec Gros)Josef Stefan Institute (Ljubljana): Dept. of Knowledge Technologies

(C:Toma� Erjavec)SSppaaiinn ((NNCCPP:: NNúúrriiaa BBeell)) University of Alicante (Alicante):

Departamento de Lenguajes y Sistemas Informáticos (C:PatricioMartínez-Barco)

Institut d'Estudis Catalans (Barcelona) (C:Joan Soler i Bou)Technical University of Catalonia (UPC) (Barcelona): Centro de

Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)(C:Asuncion Moreno)

Universitat Autònoma de Barcelona (Barcelona): Facultat de Filosofiai Lletres, Dpt. de Filologia Anglesa i de Germanística (C:AnaFernández Montraveta)

Universitat de Barcelona (Barcelona): Departament de LingüisticaGeneral (C:Irene Castellón)

Universitat Oberta de Catalunya (Barcelona): Department ofLanguages and Cultures (C:Salvador Climent)

Universitat Pompeu Fabra (Barcelona): Institut Universitari deLingüística Aplicada (C:Núria Bel)

University of Barcelona (Barcelona): Facultat de Filologia - RamonLlull Documentation Centre (C:Joana Alvarez)

Autonomous University of Barcelona (Bellaterra): Facultad de Letras,Dept. Filología Española (C:Carlos Subirats)

Girona City Council (Girona): Records Management, Archives andPublications Service (C:Joan Boadas i Raset)

University of Jaén (Jaén): Escuela Politécnica Superior, Departamentode Informática, SINAI group (C:María Teresa Martín Valdivia)

University of the Basque Country (Leioa): Computer Science Faculty,Natural Language Processing Group (C:Arantza Diaz de Ilarraza)

University of Lleida (Lleida): Departament d'Angles i Linguistica(C:Gloria Vázquez)

Autonomous University of Madrid (Madrid): Laboratorio de LingüísticaInformática (C:Manuel Alcantara Plá)

University of Málaga (Málaga): Facultad de Filosofía y Letras, Dept.of English, French, and German Philology (C:Antonio Moreno Ortiz)

Universidad Politécnica de Valencia - ITACA (Valencia): Grid and HighPerformance Computing Group (C:Vicente Hernández García)

University of Vigo (Vigo): Facultade de Filoloxía e Tradución,Department of English, Research group LVTC (C:Javier Perez-Guerra); Seminario de Lingüística Informática, Departamento deTradución e Lingüística, TALG Research Group (C:Xavier GómezGuinovart)

University of Zaragoza (Zaragoza): Facultad de Filosofía y Letras(C:Carmen Pérez-Llantada)

SSwweeddeenn ((NNCCPP:: LLaarrss BBoorriinn)) University of Gothenburg (Gothenburg):Department of Linguistics, Faculty of Arts (C:Anders Eriksson);Språkbanken, Dept. of Swedish Language (C:Lars Borin)

Linköping University (Linköping): Department of Computer andInformation Sciences (C:Lars Ahrenberg)

Lund University (Lund): Humanities Laboratory (C:Sven Strömqvist)KTH Royal Institute of Technology (Stockholm): Department of

Speech, Music and Hearing, CSC (C:Rolf Carlson)Language Council of Sweden (Stockholm): (C:Rickard Domeij)Swedish Institute of Computer Science AB (Stockholm): (C:Björn

Gambäck)Umeå University (Umeå): HUMlab (C:Patrik Svensson)Uppsala University (Uppsala): Department of Linguistics and

Philosophy (C:Joakim Nivre)TTuurrkkeeyy ((NNCCPP:: GGüüllss,,eenn EErryyiiggvviitt)) Istanbul Technical University (Istanbul):

Elektrics-Electronics Faculty, Computer Science Department,Natural Language Processing Group (C:Güls,en Eryigvit)

Sabanci University (Istanbul): Human Language and SpeechLaboratory, Faculty of Engineering and Natural Sciences (C:KemalOflazer)

UUnniitteedd KKiinnggddoomm ((NNCCPP:: MMaarrttiinn WWyynnnnee)) Bangor University (Bangor):Language Technologies Unit (C:Briony Williams)

University of Birmingham (Birmingham): Department of English(C:Oliver Mason)

University of Surrey (Guildford): Department of Computing, Faculty ofEngineering and Physical Science (C:Lee Gillam)

Lancaster University (Lancaster): Department of Linguistics andEnglish Langauge (C:Paul Rayson)

National Centre for Text Mining (Manchester): National Centre for TextMining (C:Bill Black)

Oxford Text Archive (Oxford): Oxford University Computing Services(C:Martin Wynne)

University of Sheffield (Sheffield): Natural Language Processinggroup, Department of Computer Science (C:Wim Peters)

University of Wolverhampton (Wolverhampton): Research Institute ofInformation and Language Processing (C:Constantin Orasan)

CCLLAARRIINN nneewwsslleetttteerr •• ppuubblliisshheerr CCLLAARRIINN •• eeddiittoorr--iinn--cchhiieeff MMaarrkkoo TTaaddii}} •• eeddiittoorriiaall bbooaarrdd GGeerrhhaarrdd BBuuddiinn,, DDaann CCrriisstteeaa,, KKooeennrraaaadd DDee SSmmeeddtt,, LLaauurreenntt RRoommaarryy,,

MMaarrttiinn WWyynnnnee •• hhttttpp::////wwwwww..ccllaarriinn..eeuu