Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
LanguageTechnologies withinA Digital Agenda for
Europe and otherperspectives
DG Information Society and Media ofthe European Commission supportsmultilingual technologies through researchand innovation programmes. The currentlevel of EU support for languagetechnology projects exceeds � 70 millionand will reach � 130 million in 2012
Kimmo RossiEuropean CommissionDirectorate General InformationSociety and Media
The challenge
Future online services, whether addressingbusiness, administrative or social needs, have tobridge language barriers. To achieve this, lan-guage technologies still need to find their wayinto constantly evolving web services and tech-nologies. New ways of communicating andcollaborating online are no longer the realm ofa few corporate sites or online communities,but an integral feature of most portals and ser-vices. Millions of users produce and consumeonline content which changes shape every sec-ond. Language technologies need to adapt tothe challenge of delivering transparently, “on-the-fly”, while coping with different forms anduses of written and spoken language. At the
same time, the European map is still fragment-ed and scattered with too many blank spots.
With the help of our portfolio of projects, wehave made a good start – bringing differentcommunities and specialist groups together,stimulating cooperation between users andproviders, making supply and demand meet inspecific application domains. But we still havea long way to go, and the process will taketime, substantial resources, and a commonvision.
What happens next?
The ICT Work Programme for 2011-2012 oper-ations will be published in the coming weeks.A dozen calls for proposals are planned over thenext two years. The 7th call launched in lateSeptember will provide � 50 million forresearch activities under Objective 4.2 –Language Technologies. In February 2011,another call will be published providing � 35million within Objective 4.1 – SME action onDigital Content and Languages.
The small and medium size enterprises (SME)action is a new concept in many ways. Firstly,it provides an opportunity for fast-movingSMEs, including start-ups and university spin-offs, to test and validate innovative ideas.Secondly, it provides new ways of linking theinformation management community with thelanguage technology and resources communi-ty. Thirdly, since participation is not limited toSMEs, the action provides result-orientedopportunities for new partnerships: publicwith private, SMEs with well established com-panies, data providers with developers andleading users. The overall goal is to generateadded value from trading, pooling and exploit-ing data in order to produce novel technolo-gies, products and services.
Looking further into the future
The recently published A Digital Agenda forEurope (DAE) throws a big challenge to thelanguage technology community. The DAEsets out to boost economic growth and exploit
the full benefits of the digital economy by cre-ating a borderless and interoperable digital sin-gle market. More concretely, the DAE invitesthe Commission to “work with stakeholders todevelop a new generation of web-based appli-cations and services, including for multilingualcontent and services, by supporting standardsand open platforms through EU-funded pro-grammes.”
In practice, this means broader and closer col-laboration between EU projects, businesses,institutions and administrations. We will workon multilingual services and platforms togeth-er with other EC departments, EU institutionsand international organisations exposed tomultilingualism. We will liaise with the webindustry to establish standards and best prac-tices to make the web a more language-friend-ly environment, and to pave the way for wide-spread deployment of language technologies.
In 2011 we will set up a Business Forum, bring-ing together current and emerging stakeholdersfrom the supply and demand side. The forumis expected to help define the requirements ofboth providers and users of language technolo-gy products and services, with emphasis oninnovation and demand-driven technologydevelopment. At the same time, we expectongoing and upcoming EU-funded actions todeliver a compelling and widely supportedStrategic Research Agenda for the field at large.This agenda will be released in 2012 at a pointin time when important decisions are going tobe taken on the next R&D framework pro-gramme, due to start towards the end of 2013.
Data sharing and portability have becomemore important than ever for researchers,developers and users. Fifteen years after ELRAwas established, new initiatives such as CLAR-IN and META-NET can promote a new cul-ture and help pool, preserve and reuse valuableresources. While these projects serve differentcommunities and purposes, there is amplescope for collaboration, e.g. implementingopen repositories, unifying metadata descrip-tions, and solving legal issues hampering thewide availability of research results. C
Number 9-10, 2010, March-June
22 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
Editors’Foreword
Marko Tadi}& Dan CristeaCLARIN Newsletter editors
It seems that, as we approach the end of theproject, the agenda of CLARIN-related
activities and events becomes more and moreheavy. If you go through this double issue, evenfrom the tip of the finger, you will be surprisedto see how many important things has hap-pened from the beginning of this year, and atwhat speed we are moving forward. In the fol-lowing we will browse for you the content ofthis CLARIN Newsletter, the double issue9-10, not necessarily in the order the articlesare included.
First, we are happy to bring to the CLARINreaders a word from the very heart of theEuropean Commission, there where the strate-gic decisions are being taken in LanguageTechnology. Kimmo Rossi is giving the freshand very welcome news of a hot autumn andwinter: very generous calls of the ICT WorkProgramme are waiting for us behind a doorthat we are invited to open and 2011-2012 willbring even more amazing initiatives.
Then, we announce new launches of pro-grammes, projects and collaborations. InNederlands, the CLARIN-NL Project hasreceived an extremely generous financing,which could be taken as an example by many
CLARIN member states, news offered to us byJan Odijk. António Branco and Marko Tadi}relate about the recent launch of thePortuguese CLARIN in Lisbon. The sameAntónio, together with Vera Lúcia Strube deLima, Thiago Pardo and Steven Krawer,describe, from the position of principal actors,another important event involving thePortuguese language, the PROPOR confer-ence, which can be seen as initiating a promis-ing collaboration ofCLARIN on the otherside of the Atlantic, inBrasil. From Kaunas,we hear the voice ofRu-ta Marcinkevi~iene
.,
bringing to the CLAR-IN community newsfrom Lithuania, thenew member of CLA-RIN, and its remark-able progresses in HLT.
Then, as in all our pre-vious issues, a series ofarticles relate about CLARIN activities. Thenew face and functionalities of the CLARINwebsite are commented by Corina Dima, itsmain implementer (WP6). Ville Oksanen,Krister Lindér and Hanna Westerlund, a Finishteam (WP8), present the principal LRs distrib-ution licenses standardized within the project.And the state of the art in speech technologiesand how are they related to language learningCALL systems, as possible part of the CLAR-IN infrastructure, is contributed by CatiaCucchiarini.
Finally, a number of articles present the mainLR&T events in Europe within the last fewmonths. The LREC series of conferences, dueto a rich heritage and an almost perfect con-ceptualization and organization, has becomeone of the major global events of our commu-nity, attended now by 1,246 registered partici-pants at the main conference and workshops.Less known, perhaps even funny, stories from
the recent Malta event are brought forward bythe chair of the local organization committee,Mike Rosner, while the conference main chair,Nicoletta Calzolari, together with ClaudiaSoria and Riccardo Del Gratta, describe one ofthe most important side effects of the MaltaLREC – the LRT Map. A great idea, which,will certainly have important consequences inthe making of a global thesaurus of languageresources and tools. Then, the already signifi-
cant CICLing, held thisyear for the first time inEurope, in Ias,i, is pre-sented by its very inven-tor, Alexander Gelbukh,and the principal localorganizer, Corina Fo-rDscu. One of the mostimportant CLARINmeetings since thebeginning of the projectwas the join gatheringof its boards, whichincluded the Executive
Board, the Scientific Board, the StrategicCoordination Board and the InternationalAdvisory Board. The coordinator of the pro-ject, Steven Krauwer, is giving a short presenta-tion, with a stress on the advances towardsestablishing the CLARIN ERIC, the newframework under which the project will func-tion starting with the construction phase. Youwill find a word, then, on the recent join meet-ing of CLARIN and FLaReNet in Stockholm,by Rolf Carlson, Kjell Elenius and DavidHouse. The discussions focused on speech andmultimodality, as well as on the best practicesthat would enable to integrate tools and toachieve uniformity in annotation. You can findalso a very short survey of the LT Days thattook place in Luxembourg, reported by MarkoTadi}.
Enjoy reading it! We would be happy to know
your opinion, your news, your suggestions.
Write us! C
List of national
correspondents
Austria
Gerhard Budin
Belgium – Flanders
Inneke Schuurman
Bulgaria
Svetla Koeva
Croatia
Marko Tadi}
Czech Republic
Karel Pala
Denmark
Bente Maegaard
Hanne Fersøe
ELRA/ELDA
Stelios Piperidis
Khalid Choukri
Estonia
Tiit Roosmaa
Finland
Kimmo Koskenniemi
France
William Del Mancino
Bertrand Gaiffe
Germany
Lothar Lemnitzer
Greece
Maria Gavrilidou
Hungary
Tamás Váradi
Italy
Valeria Quochi
Latvia
Andrejs Vasiljevs
Malta
Mike Rosner
Netherlands
Peter Wittenburg
Norway
Koenraad De Smedt
Poland
Maciej Piasecki
Portugal
Antonio Branco
Romania
Dan Cristea
Dan Tufis,
Spain
Nuria Bel
Sweden
Sven Strömqvist
UK
Martin Wynne
Call for contributions
Dear readers of the CLARINNewsletter,If you have ideas, thoughts,comments, additions, corrections,arguments, questions etc. which areconnected to the CLARIN project,even remotely, please feel free tosend them to us as your contributionat [email protected] or directly tothe editors at [email protected] [email protected].
CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 33
Jan OdijkUtrecht University
CLARIN-NL forms the Dutch
national counterpart of the CLAR-
IN enterprise on the European level
(CLARIN-EU). CLARIN-NL covers a
period of 6 years, partitioned in three
phases of two years: preparation, con-
struction, and exploitation. CLARIN-
NL effectively started in April 2009 and
has a budget of 9.01 million euro.
CLARIN-NL is the first CLARIN-relat-
ed national project that has been award-
ed money for the implementation and
exploitation phases. Its website
(http://www.clarin.nl) is mostly in
English, so additional information on
the project can be easily obtained here.
CLARIN-NL has been set up as a mix-
ture between a programme and a pro-
ject, providing both the flexibility to
adapt the contents to new develop-
ments, and to new players in the field
(e.g. humanities researchers not reached
yet). At the same time it offers opportu-
nities for defining a few longer term pro-
jects in selected areas so that knowledge
and expertise built up will be sustained
in the participating institutes.
Wide coverage of academicinstitutions
Currently CLARIN-NL has 23 partici-
pants from linguistics and the humani-
ties more broadly, and it is open for new
Dutch participants. It includes all uni-
versities with a humanities faculty or
with expertise in language and/or speech
technology, and several institutes from
the Royal Netherlands Academy of Arts
and Sciences. The institutes carry out
research in linguistics construed broadly
(10), language technology (6), speech
technology (2), culture (2), lexicography
(2), social history (4), and literature (1).
CLARIN-NL thus covers a large part of
humanities research and some social sci-
ences research. Libraries are represented
as well (2), and 5 institutes are data cen-
tres. Of these, 4 (INL, MPI, Meertens,
and DANS) have expressed the inten-
tion to become a CLARIN centre (type
A or B), and some others are considering
this.
CLARIN-NL sub-projects
CLARIN-NL has initiated a range of
sub-projects. Some of these concern the
design and implementation of the infra-
structure and the CLARIN centres in
the Netherlands, among them a metada-ta project to carry out first tests of the
Component Metadata Infrastructure
(CMDI) as developed in CLARIN-prep
(WP2 and WP5) against a representative
sample of data residing in Dutch
research organizations; the infrastructureimplementation project (IIP) to design
and implement the technical infrastruc-
ture with CLARIN data centre candi-
dates as participants; the Search &Develop project in which centralized
metadata search and distributed content
search functionality will be implemen-
ted.
A project has been started up to carry
out a User Survey and Baseline Measure-ment. The survey will be carried out via
a set of interactive interviews in which
the surveyor gets into a dialogue with
the researcher to get an overview of
his/her research questions and may sug-
gest possible tools and data that might
facilitate the research. The baseline mea-
surement will inventory the current use
or non-use of digital data and tools in
the humanities making it possible to
determine the impact of the CLARIN
infrastructure on the humanities
research in a later stage.
A call has been launched in 2009 for
Data Curation and Demonstrator Projectswhich has led to 11 small projects car-
ried out by LST and humanities
researchers, coordinated by members
from the target group (humanities
researchers) and focusing on needs
derived from the humanities researchers'
research questions.
The goal of a data curation project is to
adapt an existing resource so that it is
visible, uniquely referable and accessible
via the web, and properly documented.
The goal of a demonstrator project is to
create a documented web application
The CLARIN-NL Project
Continued on the next page �
CCLLAARRIINN--NNLL kkiicckk--ooffff mmeeeettiinngg,, UUttrreecchhtt,, 22000099--0055--2277
44 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
starting from an existing tool or applica-
tion that can be used as a demonstrator
and function as a showcase of the type of
functionality CLARIN will incorporate
and support.
Important goals common to both types
of projects are (1) apply standards and
best practices and make use of the sug-
gested CLARIN architecture and agree-
ments to understand their limitations
and the requirements for extensions; and
(2) establish requirements and desidera-
ta for the CLARIN infrastructure.
In particular, all projects will have to
make metadata for their resources in
accordance with the CMDI, and con-
tribute to semantic interoperability by
mapping the data categories used to data
categories in ISOCAT, or by creating
new data categories in ISOCAT.
In this way, curated resources and a
range of showcases will become avail-
able. At the same time evidence-based
requirements and desiderata for the
CLARIN infrastructure and supported
standards and best practices will be
obtained making it possible to influence
the final selection of standards and best
practices promoted by CLARIN. These
data curation and demonstrator projects
neatly complement the European
CLARIN preparatory project, in which
the budget for a work package on these
issues was cut by the European
Commission. An overview of the award-
ed projects can be found on the CLAR-
IN-NL website.
Dutch-Flanders cooperation
A 3-year project for cooperation betweenthe Netherlands and Flanders has started.
The cooperation project consists of mul-
tiple aspects, but the core is a project in
which existing tools for Dutch are
adapted to become web services that can
be used in a work flow system. This will
be done for two modalities: text and
speech. The users involved in this pro-
ject are researchers from literary studies
and archaeology on the text side, and
social history on the speech side.
Offering web services in a work flow will
become one of the most important func-
tionalities the CLARIN infrastructure
will offer. The added value of the coop-
eration with Flanders can be found in
the fact that the tools have originally
been developed together and concern
the shared Dutch language.
Education, training andawareness
The CLARIN-NL project includes a
plan for creating awareness and for edu-
cation and training. These activities
(some already undertaken, more in the
pipeline) involve active contributions as
well as financial and logistic support for
a range of events such as workshops,
tutorials, summer and winter schools,
project web site and wiki, newsflashes,
mailing lists, publications in magazines,
etc. CLARIN-NL is also working on
introducing courses for working with
digital data and tools in the CLARIN
infrastructure as part of the regular
humanities curriculum.
Relation with CLARIN Europe
CLARIN-NL is complementary to the
European CLARIN preparatory project
in several respects: First, CLARIN-NL
does not only cover the preparatory
phase, but also the implementation
phase and part of the exploitation phase
of the CLARIN infrastructure. Second,
CLARIN-NL carries out activities in the
preparatory phase, such as the metadata
project, data curation and demonstrator
projects, for which no funds are avail-
able in the European preparatory pro-
ject.
Future
The CLARIN-NL executive board is
currently analyzing the situation in the
Netherlands. It attempts to identify gaps
in topics and disciplines covered that are
as yet underrepresented, as well as essen-
tial infrastructure functionality that is
not covered yet. It will – based on this
analysis, the results of the user survey,
and after consultation with its advisory
panels – work out a proposal for a new
call for sub-projects in 2010 with clearly
identified priorities. It will also deter-
mine the best organisation of this call,
both in terms of character (open call,
tender, direct assignments), budget, and
timing. It is expected that the plans for a
new call will be available in June 2010.
CLARIN-NL aims to create an infra-
structure that is supposed to have a long
term existence, although it is just a pro-
ject, finishing in 2014. It is therefore of
utmost importance to already start
working on embedding the CLARIN
infrastructure in the normal research
and centre activities and preparing both
a governance structure and structural
financing to ensure the long-term exis-
tence of the infrastructure. Therefore it
is closely tracking the developments in
Europe, especially with regard to the
CLARIN ERIC, and assessing how to
best play a role in this organisation.
Conclusions
The CLARIN-NL project can serve as
an excellent example for other national
CLARIN projects: its set-up as a mix of
programme and project, creates flexibil-
ity, and is an excellent way of involving
as many relevant national parties.
Moreover data curation and demonstra-
tor projects offer opportunities to seri-
ously test the standards and best prac-
tices promoted by CLARIN, strength-
ening these standards and best practices,
and provide arguments for modifica-
tions or extensions. More generally,
these projects will yield evidence-based
requirements and desiderata for the
CLARIN infrastructure. This is the best
way to ensure possibilities for influenc-
ing a selection of standards and best
practices in CLARIN that is compatible
with the existing national data. In addi-
tion the projects will yield curated data,
and, via the demonstrators, a range of
showcases that can be used to explain
and demonstrate the advantages of the
CLARIN infrastructure and the new
possibilities it will offer to researchers.
Finally, the project requires cooperation
between intended users (humanities
researchers) and the technology
providers (infrastructure specialists and
HLT providers), with a central role for
the users' research questions and thus
contributes to bringing these different
communities together in collaborative
projects. C
CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 55
Corina Dima“Alexandru Ioan Cuza”University of Ias,i
The CLARIN project internal website is a
space used by the researchers in the pro-
ject to work on their tasks and collaborate
with the other members of their group.
There are currently 194 member institutions
in CLARIN, from 33 different countries,
and a number of 406 registered users.
In March this year the CLARIN project
internal website was updated. It received a
fresh look and a lot of new functionality was
added. In this article we'd like to give a short
overview of these changes.
Site navigation
The site navigation system now features a
blue drop-down menu at the top of the page,
which contains handy links to the most
important pages in the site. Here you can
find an overview of CLARIN, its structure
and its member institutions, in the Clarinmenu, a series of published material under
Publications, links to the home pages of the
project's groups under Clarin Groups, a list
of events relevant for CLARIN under Events,a list of metadata inventories hosted by
CLARIN under Resources and a list of fre-
quently asked questions about CLARIN and
a Website section under Help Desk.
News, events and newsletter
On the left side of the page, three informa-
tion blocks contain updates relevant to the
CLARIN community:
1. The News section brings under the mem-
bers attention information about new
developments in CLARIN and related
infrastructures, about important upcom-
ing events and various other news relevant
for the community.
2. The Upcoming events section offers infor-
mation about future events.
3. The CLARIN Newsletter section high-
lights the latest newsletter published by
CLARIN. The newsletter presents infor-
mation and new developments from all
over the CLARIN world, and is published
4 times a year.
Sections for registered users
The employees of CLARIN Member institu-
tions are allowed to create an account on the
CLARIN internal website. Registered users
have access to content creation facilities,
which are found in the grey user menu.
Content creation
Under the Create content tab members can
find several types of content that they are
able to create on the internal site:
• Book page: book pages are the simplest
type of content on the site; they can be
used to add new pages of information.
• Deliverable: the deliverables content type is
used to add the project deliverables and
milestones to the site.
• Event: the event content type allows the
members to add new events to the internal
website; the events can be either public
(like a conference) or organized by CLAR-
IN.
• News item: relevant news items can be
added to the website using this form; ideal-
ly, the added information should be gener-
al and relevant for all the CLARIN mem-
bers;
• Presentation: members can upload their
CLARIN-related presentations using the
provided form;
• Resource: members are encouraged to use
this form to add metadata about existing
linguistic resources to the CLARIN inven-
tory;
• Tool: tools are also harvested in the CLAR-
IN inventory; this form offers the possibil-
ity to add metadata about new tools.
Site outline
All the content on the website was reorga-
nized and added to a hierarchy. It can be
accessed via the Site Outline menu link
found in the grey user menu. On the root
level there are the most important sections of
the site, mainly related to the CLARIN
groups, but also to the most important types
of content. Almost all the created content is
added automatically to the hierarchy. For
example, all the events are by default placed
The New Face of theCLARIN Project Website
CCLLAARRIINN wweebb ssiittee rreeddeessiiggnneedd
Continued on the next page �
under the Events node when they are creat-
ed. The pages are the only type of content
for whom the user can choose the place in
the hierarchy where it should be added. The
site outline can also be edited and the nodes
inside rearranged, but this operation in
restricted to a group of trusted users.
Home pages for groups
All the CLARIN groups now have a home
page. Here one can find information about
the mission of a group and about its mem-
bers. A Documents section presents the pages
that contain relevant information for that
group. A News section is present on the right
side of the page. Group administrators can
add news items that are only relevant for
their group using the Group News form and
selecting their group as a target for that
piece of news. For the larger, work package-
level groups, a Deliverables & Milestones sec-
tion is present. Deliverables are added auto-
matically to these sections as soon as they
are uploaded to the site using the Deliverable
form. Three special pages are the ones for the
CLARIN Members, CLARIN Partners and
CLARIN Executive Board groups. The
CLARIN Members page contains an extra
Flyers & Posters section. The Partners page
has 4 additional sections:
• Templates, containing various templates
with the CLARIN logo (PPT presenta-
tion, deliverable, letter);
• Administration & Finances, with adminis-
trative and financial information related to
CLARIN;
• Consortium Meetings, which presents the
past and future CLARIN consortium
meetings;
• News from the Executive Board, where
small updates, special for partners, coming
from the Executive Board can be seen.
The Executive Board page contains sections
created especially for the Executive Board
Meetings, Agendas, Minutes and Present-
ations. The access to the Partners and
Executive Board pages is restricted to the
members of these groups only.
This is just a short presentation of the inter-
nal site. You can find more detailed infor-
mation in the Help Desk menu under
Website. Also, if you have a question or need
to report a problem regarding the function-
ality of the website, you can use the Contact
webmaster form under Help Desk > Website.
Enjoy your browsing! C
Ville OksanenKrister LindérHanna WesterlundUniversity of Helsinki
One of the most challenging tasks inbuilding language resources is the
management of copyright licences.There are several reasons for this. First ofall, the current European copyright sys-tem is designed to a large extent to satis-fy the commercial actors, e.g. publishers,record companies, etc. This means thatthe scope and duration of the rights arevery extensive and there are even certainforms of protection that do not existelsewhere in the world, e.g. databaserights. On the other hand, the excep-tions for research and teaching are typi-cally very narrow.
In CLARIN the typical flow of the contentis the following: A copyright holder, e.g. anewspaper, licenses its content to a CLAR-IN Content Provider that distributes thecontent to the End Users through a CLAR-IN Service Provider. This means that thelicense chain has to follow a similar struc-ture. Unfortunately even the first step isoften difficult because there is a group ofresources, for which there are no writtenlicense agreements and individuals familiarwith the details are no longer available.Another problem from the CLARIN per-spective is the variation in the existinglicense agreements, which makes it hard tooffer a centralized service. To tackle theseproblems, several sets of agreements have to
be used. For an outline of the resource clas-sification procedure, see Figure 1.
Regarding the license variation, we carriedout an extensive survey and found that it ispossible to categorize the licenses into threedifferent groups:
• Publicly Available Resources
• Resources for Academic Use
• Resources for Restricted Use
We followed the model used by CreativeCommons and created simple icons, i.e. caresymbols, making it easier for the end-user toimmediately see under which conditions theresource can be used, see Figure 2. In addi-tion, a deed describes the rights in humanreadable textual form. Finally, there is alsothe actual license agreement and the meta-data, i.e. the machine readable information.However, Creative Commons is not suffi-cient as such for CLARIN, because CreativeCommons does not allow for distributionrestricted to academia or even more limitedgroups of users, which is essential for manyof the older resources to be included inCLARIN.
Publicly Available (PUB) is one of the cate-gories endorsed by CLARIN. To belong tothis group, the following requirements haveto be met:
• the licence should allow distribution of the
tools and resources from the CLARIN
infrastructure,
• there must be no limitations, e.g. based on
status or geographical location etc., on
who can access and use the tools and
resources and
• there must be no limitations on the pur-
pose for which the tools and the resources
are used.
For Academic Use (ACA) the licence agree-ment includes an additional requirementthat the use is somehow related to an acade-mic institution. Here the problem may arisefrom the definition of academic use. Toqualify under this category, the tools andresources:
• should be available at least for anyone
doing research or studying in an academic
66 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
FFiigguurree 11:: RReessoouurrccee ccllaassssiiffiiccaattiioonn ttaasskk
institution recognized by the Identity
Provider Federation and
• should be available for
studying, research and
teaching purposes.
The last category, RestrictedUse (RES) includes theresources that do not fulfillthe previous requirementsbut still could be offered tothe users if certain addition-al requirements are met. Themost typical reasons for aresource to fall under thescope of RES are:
• a requirement to submit
detailed information, e.g.
an abstract, on the planned
usage or
• specific ethical or data pro-
tection-related additional
requirements.
In conjunction with the main license cate-gories PUB, ACA and RES, there can also beall or any of three additional requirements:
• A requirement for strictly non-commercial
use (NC)
• A requirement to inform the copyright
holder regarding the usage of the tools
and/or the resources in published articles
(Inf )
• A requirement to redeposit modified ver-
sions of the tools and resources with the
Service Provider (ReD)
Figure 3 displays the symbols designed forthe additional requirements.
However, this does not solve all the prob-lems. In some cases there either is no licenceagreement at all, because such an agreementhas never been made. It is also quite com-mon that the existing agreement is somehowproblematic, e.g. very low in details, makingthe categorization impossible. For those sit-uations we created the CLARIN UpdateModel Agreements with the purpose to pro-cure the required rights. The best option isto re-license the content with the CC0-license. See the Berlin Declaration (2003)for best scientific licensing practices. It iswell-understood and offers enough rights for
all parties in different digi-tal and non-digital envi-ronments. It is also com-patible with most of theother open content licen-ces. Unfortunately it is notalways possible to useCC0 due to the demandsof the copyright holders.Thus Update ModelAgreements for Academicand Non-Commercial Useare also available.
In order to test the usabil-ity of the classification sys-tem and our classificationguidelines, we did an ini-tial classification test. Wesent out a request to thecustodians of 116resources found in the
CLARIN LRT inventory. We received ananswer for 40 of the resources, i.e. 34.5%.In Figure 4, we see that more than half of theresources were classified into the (RES)restricted category. Approximately one thirdwere classified as (PUB) publicly or (ACA)academically available. Finally, one sixth wasfound to be exceptional.
One of the publicly available resources(PUB) was also classified as non-commer-cial, whereas all of the academically availableresources (ACA) were non-commercial. Therestricted resources (RES) were roughlyequally divided among no additional restric-tions (27.3 %), a non-commercial restric-tion (31.8 %) and a requirement that thelicense be personally granted by the contentowner (40.9 %).
Only one resource was such that the contentprovider found no applicable distributiontype because there was no formal agreementbetween the content owner and the contentprovider. In addition, there were 5 resourcesfor which the research project was still ongo-ing and the question of distribution wouldbe discussed only after the project had fin-ished. All in all, some kind of classificationwas received for a total of 40 resources inthis initial survey. For additional informa-
tion, see the Oksanen & al (2010). C
References
Berlin Declaration. (2003). Berlin Declaration onOpen Access to Knowledge in the Sciences andHumanities. Berlin. http://www.zim.mpg.de/ope-naccess-berlin/ berlindeclaration.html.
Hietanen, H., Oksanen, V., and Välimäki, M. (2007).Community created content. Law, business andpolicy.
Oksanen, Lindén, Westerlund (2010). LaundrySymbols and License Management – PracticalConsiderations for the Distribution of LRs basedon experiences from CLARIN. In Proceedings ofLREC 2010. Malta.
CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 77
Laundry Symbols and Licence ManagementPractical Considerations for the Distribution of LRs
based on experiences from CLARIN
FFiigguurree 22:: SSyymmbboollss ffoorr tthhee mmaaiinnddiissttrriibbuuttiioonn ccllaasssseess
FFiigguurree 33:: SSyymmbboollss ffoorr aaddddiittiioonnaallddiissttrriibbuuttiioonn rreessttrriiccttiioonnss
FFiigguurree 44:: MMaaiinn ddiissttrriibbuuttiioonn ccaatteeggoorriieess..
Catia CucchiariniNederlandse Taalunie, The Hague
The interest in applying speech technolo-gy and in particular Automatic Speech
Recognition (ASR) technology to secondlanguage (L2) learning has been growingconsiderably. The addition of ASR technol-ogy to Computer Assisted LanguageLearning (CALL) systems makes it possibleto assess oral skills in a second language andto provide corrective feedback automatically.The latter feature appears particularlyappealing, because providing practice andfeedback on speaking proficiency is extreme-ly time-consuming, with the consequencethat the necessary amount of practice isalmost never achieved in traditional teacher-fronted lessons. Against this background,ASR-based CALL systems would seem tomake for an interesting supplement to tradi-tional L2 classes.
An additional advantage of ASR-basedCALL systems that often goes unmentionedis that such systems can also be employed forlanguage resource acquisition and for con-ducting innovative research on languagelearning. In this contribution we explainhow.
Developing ASR-based CALL systems thatcan provide accurate and useful feedback onoral proficiency is not trivial, because thespeech of non-natives poses special difficul-ties to ASR technology. In addition, existingsystems usually fail to provide correctivefeedback that is detailed enough and accu-rate, especially on L2 pronunciation, whichis considered a particularly challenging skill,both for L2 learners and CALL systems. Thekey factor in developing such systems con-sists in circumventing the limitations of thetechnology while taking account of itsadvantages and of pedagogical criteria (Neriet al. 2009). Recent research shows that thisis possible (Cucchiarini, et al. 2009).
Another problem that has hampered therealization of ASR-based CALL systems,especially for the smaller languages, is that
although publishers are willing to use thetechnology once it is developed, in generalthey do not have the means to finance thedevelopment of such technology and theunderlying language resources required. Inthis connection it may be interesting to referto an initiative in the Netherlands andFlanders aimed at supporting research anddevelopment in the field of languageresources and language and speech technol-ogy, which, among other things, is likely tohave positive consequences for the field ofCALL and language learning in general.
In 2004 the Dutch-Flemish programmeSTEVIN (a Dutch acronym that stands forEssential Language Resources in Dutch) waslaunched with the aim of stimulating thedevelopment of basic language and speechresources and technology for the Dutch lan-guage (see:http://taalunieversum.org/taal/technologie/stevin/english/). The programme is fundedby the Flemish and Dutch governments andis coordinated by the Dutch-Flemish organi-zation for language policy, the DutchLanguage Union(http://taalunieversum.org/taalunie).
Within the framework of the STEVIN pro-gramme various resources have been or arebeing realized which will be beneficial forthe development of ASR-based CALL appli-cations, and, more generally, for research onlanguage learning, such as an ASR toolkit, acorpus of non-native speech and a prototypeof an ASR-based CALL system. Theseresources are briefly discussed below.
The SPRAAK speech recognizer
The STEVIN project SPRAAK extendedthe work on the ASR package that had beendeveloped for over 15 years by ESAT at theUniversity of Leuven to a) develop a highlymodular toolkit for research into speechrecognition algorithms and b) provide astate-of-the art recogniser for Dutch with asimple interface that could be used by non-specialists. (Demuynck et al., 2008).SPRAAK is distributed as open source foracademic usage and at moderate cost for
commercial exploitation (for further details,see http://www.spraak.org/).
The JASMIN corpus of Dutchnon-native speech
The STEVIN project JASMIN (Cucchiariniet al. 2008) produced a corpus of speech bychildren of different age groups, elderly peo-ple and non-natives with different mothertongues (http://www.esat.kuleuven.be/psi/spraak/projects/JASMIN), which was col-lected in the Netherlands and Flanders andis specifically aimed at facilitating the devel-opment of speech-based applications forchildren, non-natives and elderly people. Inthe case of non-native speakers the applica-tions envisaged were especially languagelearning applications because there is consid-erable demand for CALL products that canhelp making Dutch L2 teaching more effi-cient.
The DISCO prototype of anASR-based CALL application
The STEVIN project DISCO (Develop-ment and Integration of Speech technologyinto COurseware for language learning)aims at developing a prototype of an ASR-based CALL system for practicing oral skillsin Dutch (Strik et al, 2009; for furtherdetails, see: http://lands.let.ru.nl/~strik/research/DISCO/). This project makes useof the SPRAAK toolkit and the JASMINnon-native speech corpus. DISCO addressesdifferent aspects of speaking proficiency,detects errors in speaking performance,points them out to the learners and givethem the opportunity to try again until theymanage to produce the correct form.
One of the interesting features of theSTEVIN projects is that the resources andthe technology developed are made publiclyavailable to interested users (researchers,HLT companies and publishers) through theDutch-Flemish HLT Agency(http://www.inl.nl/tst-centrale), a centralrepository for Dutch digital languageresources set up and financed by the Dutch
88 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
Language Resources, Speech Technologyand Language Learning:How to Establish a Virtuous CirclesState of the art in speech technologies as a part of CLARIN infrastructure
Language Union and hosted by the Institutefor Dutch Lexicology in Leiden andAntwerp. These resources can be employedfor developing applications and for conduct-ing research.
ASR-based CALL and languagelearning research
Once an ASR-based CALL system has beendeveloped, language learners can use it topractice oral skills. The system can thus beused for acquiring additional non-nativespeech data, for extending already existingcorpora like JASMIN, or for creating newones.
In addition, the CALL system can bedesigned and developed in such a way that itis possible to log details regarding the inter-actions with the users. The logbook can con-tain information on what appeared on thescreen, how long the user waited, how (s)heresponded, the feedback provided by thesystem, and how the user reacted to thisfeedback.
The additional corpus of non-native speechand the log-files can be useful for research onlanguage learning and for developing new,improved CALL systems.
In particular, ASR-based CALL systemsallow to create research conditions that werehitherto impossible, thus opening up oppor-
tunities for innovative research on languagelearning.
For instance, at the moment a project isbeing carried out at the Radboud Universityof Nijmegen, which is aimed at studying theimpact of corrective feedback on the acquisi-tion of syntax in oral proficiency(http://lands.let.kun.nl/~strik/research/FASOP).
Within this project an adapted version of theDISCO ASR-based CALL system is used tostudy how corrective feedback on oral skillsis processed on-line, whether it leads touptake in the short term and to actual acqui-sition in the long term. This has severaladvantages compared to previous studies: thelearner's oral production can be assessed on-line, near-optimal corrective feedback (clear,intensive, systematic, adapted to the learner'sdevelopmental stage and learning style) canbe provided immediately, all interactionsbetween learner and system can be logged sothat data on input, output and feedback arereadily available to be studied from differentangles.
The approach chosen in this project indi-cates how CALL and language and speechtechnology can offer new opportunities forlanguage learning research (Ellis and Bogart,2007). In turn, new insights in languagelearning can inform the development of bet-
ter CALL systems and language and speechtechnology. In this way a virtuous circle canbe established in which these fields profitfrom each other and lead to increased knowl-edge and better language learning systemsand this systems could be an integral part of
CLARIN infrastructure. C
References
Cucchiarini, C., Driesen, J., Van Hamme, H. andSanders, E., Recording speech of children, non-natives and elderly people for HLT applications: theJASMIN-CGN corpus, in Proceedings of LREC,2008.
Cucchiarini, C., Neri, A. and Strik, H., (2009) Oralproficiency training in Dutch L2: The contributionof ASR-based corrective feedback, SpeechCommunication, 51, 853-863.
Demuynck, K., Roelens, J., Compernolle, D. van, andWambacq, P. (2008).SPRAAK: an open sourceSPeech Recognition and Automatic Annotation Kit,In Proceedings of Interspeech 2008.
Ellis, N.C., Bogart, P.S.H., (2007), Speech andLanguage Technology in Education: the perspectivefrom SLA research and practice, Proceedings ISCAITRW SLaTE, Farmington PA.
Neri, A., Cucchiarini, C., Strik, H. and Boves, L.W.J.(2009). The pedagogy-technology interface in com-puter-assisted pronunciation training. In P.Hubbard (Ed.), Computer Assisted LanguageLearning: Critical Concepts in Linguistics, VolumesI-IV (pp. 140-164). London/New York: Routledge.
Strik, H., Cornillie, F., Colpaert, J., Doremalen,J.J.H.C. van and Cucchiarini, C. (2009).Developing a CALL System for Practicing OralProficiency: How to Design for Speech Technology,Pedagogy and Learners. In Proceedings of SLATE(pp. CD-rom). Warwickshire, U.K.. Available from:http://www.eee.bham.ac.uk/SLaTE2009/papers/SLaTE2009-32.pdf [05-09-2009].
CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 99
FFiigguurree 11:: EExxaammppllee ooff aa DDIISSCCOO pprroonnuunncciiaattiioonn eexxeerrcciissee wwiitthh ffeeeeddbbaacckk
1100 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
Nicoletta CalzolariClaudia SoriaRiccardo Del GrattaIstituto di LinguisticaComputazionale “AntonioZampolli”, CNRS, Pisa
This year, the LREC Conference intro-
duced a very special feature, the LREC
Map of Language Resources and Tools,
jointly sponsored by FLaReNet and ELRA.
The purpose of the Map is to shed light onthe vast amount of resources that representthe background of the research presented atLREC, in the attempt to fill in gaps in com-munity knowledge about the resources thatare used or created worldwide.
Basic information aboutresources in one place
In a nutshell, the Map can be described as acompanion to LREC providing basic infor-mation about all the resources (in a broadsense, i.e. also tools, standards, evaluationpackages, etc.) – either used or created byLREC authors and described in their papers.
Knowledge about existing resources is essen-tial to the overall advancement of research inthe field of Language Resources andTechnology: it is important to be able tolocate and retrieve the right resources for theright applications, and to exploit existingones before building new ones from scratch.Having a clear picture of which resources areavailable for which languages and for whichuse is important in order to identify existinggaps for certain languages at a given timeand estimate the amount of investmentneeded to fill them in. Knowledge about thecurrent use of resources is equally important.Knowing which resources are most used forthe various applications will help to betterunderstand the reason behind their success(their intrinsic quality, their wide availabili-ty, their licensing model, etc.). Knowingwhich standards are used in resource repre-sentation would help improve the develop-ment of standards themselves, by gettingthem more tuned to actual needs andrequirements.
Clear and easy-to-reach information of thistype about resources and related technolo-gies is lacking. At the same time, it is veryimportant to stress that most resources arevery poorly documented, or not document-ed at all, thus hindering their accessibilityand in the end, their full deployment.
The LREC 2010 Mapof Language Resourcesand ToolsLREC2010, Valetta, Malta, May 17-23, 2010
LREC2010:The Inside Story
Mike Rosner
University of Malta
So LREC 2010 is over. And yes, it seems tohave run smoothly, to judge from the reac-
tions of some of the 1200 participants. Great!We managed to fool the lot of them.Appearances can be deceptive. Read belowfor a blow-by-blow account of some of thepotential disasters.
The story begins in October 2009 when a del-egation of veteran LRECians from Paris andPisa visited to reconvince themselves thatMalta really was an appropriate venue forLREC 2010. We almost blew it on the first
evening. To create a goodimpression, I had chosen awell-known restaurant for din-ner on the Tower Road prome-nade. Everything was fine untildessert time when one of thePisa team turned white veryfast and had to be escortedfrom the table supported bytwo colleagues. Next day, asecond member of the Pisateam was looking shaky butstoically maintained a brave but quiet faceduring our first meeting with the enthusiasticRector of the University. Was this food poison-ing? Was this the delayed effect of some north-ern Italian bacteria? We shall never know forsure. What we can safely report is that themeal made an impression on all - but perhapsnot quite the expected one; and Maltaremained the choice.
LREC is dominated by two events: a stand-upwelcome reception, usually held on the first
evening, and a sit-down ban-quet, held on the last. Bothpresent considerable problemsfor the local organizer, sincevenues for 1200 people donot exactly grow on trees (par-ticularly in treeless Malta), andalso, in May, contrary to popu-lar opinion, the weather is veryunpredictable so purely out-door locations are out of thequestion.
We decided to adopt an indoor solution forthe banquet. But the reception venueremained undecided until the second visit ofthe Paris/Pisa team in March. On this occa-sion, accompanied by the Rector, we went tosee the President of the Republic in his mag-nificent Palace in Valletta. The mission hadthree goals to achieve: to ask for LREC to beheld under his Excellency's Patronage, to askfor his Excellency to open the conference, andalso to solve the problem of the reception
CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 1111
The Map, thus, is intended to complementongoing cataloguing efforts in order toenhance visibility for all resources, for alllanguages and intended applications, in par-ticular the ELRA and LDC catalogues aswell as the ELRA Universal Catalogue.
Apart from providing a portrait of theresources behind the LREC community, theLREC 2010 Map intends to become a mea-suring instrument for monitoring the field oflanguage resources. In particular, the Map istailored to portraying uses and usability ofresources, either newly developed or alreadyexisting.
The Map contains information about 1994resources (data and tools) for 170 differentlanguages: their type, modality, availability,intended or actual application purposes,innovation potential, etc.
The first analyses alreadyavailable
The large amount of information containedin the Map can be analyzed in many differ-ent ways. For instance, it provides informa-tion about the most frequent type ofresource, the most represented language, theapplications for which resources are used orare being developed, the proportion of newresources vs. already existing ones, or theway in which resources are distributed to thecommunity.
The first analyses of the LREC data are avail-able at www.resourcebook.eu, where thebrowsing interface which is being developedwill be soon made accessible.
The Map is conceived as a community-builtand community-maintained repository ofinformation: the intention is to make itevolve so that it may become a place whereeach user of a resource can update and addnew resources or edit already submittedones.
The Map has a great potential for many uses.In addition to being an information gather-ing tool it may become a great instrumentfor monitoring the evolution of the field(useful also for funders), if applied in differ-ent contexts and times. It can be seen as ahuge joint effort, the beginning of an evenlarger cooperative action. It can thus becomealso an “educational” means towards thebroad acknowledgment of the need of meta-research activities with the active involve-ment of many. Finally, the Map is alsoinstrumental in introducing the new notionof “citation of resources” that could providean award and a means of scholarly recogni-tion for researchers engaged in resource cre-ation.
Lack of information and documentationabout resources is, in the e-science paradigm,a very critical issue. With the Map we aimnot only at starting to fill this gap, but alsoat encouraging the needed “change in cul-ture”. We need in fact to make each andevery researcher aware of the importance ofhis/her personal engagement in document-ing resources. It is a task as important as cre-ating new resources and is not an accessoryto be disregarded. It is the necessary serviceto the whole community that we all have toprovide.
The Map is already being adopted in othercontexts. The Language Resources andEvaluation Journal will also make use of it, asannounced in the new Special Issue LREC2008: Selected papers (freely accessible to allLREC participants). And quite recently alsoCOLING 2010 has decided to make use ofthe Map.
This work could not have been possiblewithout the invaluable contribution of allthe LREC authors, who patiently providedthe input requested. The entire communityis deeply indebted to them. C
venue. How exactly does youquietly slip a request to lend aPalace into the conversation?Another mystery that only histo-ry may solve. Rector uttered amagic incantation in Malteseand Lo! The President's nextwords were ".... and I may havejust the place for your recep-tion". Verdala Palace, a formerhunting lodge used by theKnights of Malta for gamehunting, surrounded by its own wood, was theperfect solution. With the mission totallyaccomplished, we emerged from that meetingfeeling triumphant.
That feeling remained until three weeks beforethe conference, when out of the blue thePresident's office issued a communication: dueto urgent works, Verdala Palace is no longeravailable, so please find somewhere else and,sorry for the inconvenience and all that, but wedon't have any other palaces available. The
same day, we learned that thePresident, who was visitingChina at the time, had fallenoff a platform, and had to behospitalized. So the one per-son who might have beenable to intervene was not onlyout of the country but indis-posed. What does one dounder such circumstances?Simple. You fight like with like.And what is like the President's
Office? Well how about the Prime Minister'sOffice? Helped again by the Rector, we man-aged to get our message into appropriate cor-ridor at the Auberge de Castille. And Lo! Oneweek later the President's Office called back tosay that after careful consideration, they haddecided to delay the works that we might pro-ceed with our reception at Verdala. Phew!
But that's not all, folks. Knowing that thePresident would be unable to open the confer-ence as planned due to his injuries, I was
forced to use magic again to come up with asolution. And Lo! At very short notice theMinister of IT kindly accepted to perform theopening. Once again the warm feeling of tri-umph, which lasted until… the night before theconference when I was informed, just like that,that the Minister was no longer available.Help! Just 30 mins before the official openingthe conference was lacking an official opener.Clearly, more magic was required … anotherseries of desparate phone calls. And Lo! TheMayor of Valletta had managed to cancel ameeting and would be arriving in 10 mins. Theopening was saved.
What have we learned from this? Well, magicstill works in Malta but has to be used spar-ingly since it only works in the right context.That context was provided by the many peoplewho, though their hard work, contributed tothat overwhelming impression that LREC 2010ran without a hitch. Thanks to all of you. C
IImmaaggee ccoo
uurrttee
ssyy ooff WW
oorrdd
llee ((hh
ttttpp::////ww
wwww
..wwoorrdd
llee..nn
eett))
CICLing2010, Ias,i,March 21-27, 2010
Alexander GelbukhCentro de Investigación enComputación, InstitutoPolitécnico Nacional, MexicoCorina Fora
u
scu
University Alexandru IoanCuza, Ias,i
From 21st to 27th March 2010, the 11th
CICLing, International Conferenceon Intelligent Text Processing andComputational Linguistics (http://profs.info.uaic.ro/~cicling2010 and http://www.cicling.org/2010) took place at theAlexandru Ioan Cuza University of Ias,i.The conference was jointly organized bythe Faculty of Computer Science of theAlexandru Ioan Cuza University of Ias,i(UAIC), Romania, and the NaturalLanguage Processing Laboratory of theCIC (Center for Computing Research) ofthe IPN (National Polytechnic Institute),Mexico. Romania was chosen for manyreasons, from its touristic attractions tothe host institution's experience in orga-nizing international events. Most impor-tantly, however, with this event CICLingpaid tribute to a nation that gave theworld many of the greatest wonderfulcomputational linguists, of which somehave been Program Committee members
or keynote speakers at past CICLingevents. Yet another reason for holdingCICling in Ias,i was the sesquicentennialjubilee of the Alexandru Ioan CuzaUniversity of Iasi, the oldest higher educa-tion institution in Romania.
European edition
Held for the first time in Europe, this year'sCICLing event received the record highnumber of submissions in its 11-year histo-ry: 271 submitted papers from 47 countries,the previous record being 232 papersreceived for 2008 event in Haifa, Israel. The61 accepted oral papers and the three invit-ed papers were published in ComputationalLinguistics and Intelligent Text Processing, aspecial issue of Springer's Lecture Notes inComputer Science. In addition, two otherjournals dedicated special issues to thisCICLing event: 27 papers were published inNatural Language Processing and its Applic-ations, volume 46 of Research in ComputingScience, IPN, México, and 18 papers in thefirst inaugural issue of International Journalof Computational Linguistics and Applicat-ions, Bahri Publications, India.
Almost 150 participants, including regis-tered participants, organizers, students, andthe great student volunteers, enjoyed theinvited lectures delivered by Shuly Wintnerof the University of Haifa, Israel, who pre-sented the keynote talk ComputationalModels of Language Acquisition, and JamesPustejovsky of Brandeis University, USA,
who called his keynote talk The Recognitionand Interpretation of Motion in Language. Inaddition, each keynote speaker organizedvery interesting “special events” – this is adistinctive feature of CICLing conferences.In this year both special events took theform of a discussion on ComputationalLinguistics vs. Natural Language Engineering.
Gheorghe Grigoras,, the Dean of the Facultyof Computer Science, and Dan Cristea of
the Al. I. Cuza University of Ias,i, who kind-ly accepted to be the Honorary Local Chairof CICLing 2010 and delivered a veryappreciated presentation about dictionaries,addressed the participants warm welcomespeeches at the opening ceremony. A verydear friend of CICLing, Nancy Ide of theVassar College, USA, who was withCICLing from its early days and is a pastCICLing keynote speaker, kindly acceptedto address some words to the CICLingersright after the ceremony organized by the Al.I. Cuza University where she was awardedthe title of Honorary Professor of the Al. I.
Cuza University of Ias,i.
Best paper awards and liveinternet coverage
The scientific program included, apart from
the keynote lectures and special events, oral
presentations of the CICLing authors. The
CICLing sessions were organized under the
following sections: Lexical Resources, Syntax
and Parsing, Word Sense Disambiguation
and Named Entity Recognition, Semantics
and Dialog, Humor and Emotions, Machine
Translation and Multilingualism, Infor-
mation Extraction, Information Retrieval,
Text Categorization and Classification,
Plagiarism Detection, Text Summarization,
Speech Generation. All scientific activities
took place in the Mihai Eminescu Aula
Magna of the Al. I. Cuza University of Ias,i(see the picture to the left). The poster ses-
sion, which started with a brief oral presen-
tation of all posters, was combined with the
welcome reception and was held in the Hall
of the Lost Footsteps of the University, see
the other picture.
Based on the Program Committee votes,three best papers awards were offered:
• First place: George Tsatsaronis, IraklisVarlamis, and Kjetil Nørvåg, AnExperimental Study on Unsupervised Graph-Based Word Sense Disambiguation; accord-ing to a ballot among all attendees, these
1122 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
CICLing Comes to Europe
authors also won the best presentationaward.
• Second place: Paolo Annesi and RobertoBasili, Cross-lingual Alignment of FrameNetAnnotations through Hidden MarkovModels;
• Third place: Lieve Macken and WalterDaelemans, A Chunk-driven BootstrappingApproach to Extracting Translation Patterns.
Another special award was offered for thebest student paper: Integer LinearProgramming for Dutch Sentence Com-pression, co-authored by Jan De Belder andMarie-Francine Moens.
For the first time in the history of CICLingand also in the history of the UniversityAula, all the CICLing 2010 scientific activi-ties were transmitted live over the internet;
er, with only small rains, and only duringthe days of scientific program. This wasgood also for the cultural program ofCICLing 2010-from the very first CICLingevent its famous touristic program is one ofits most attractive features. This yearCICLingers participated in three tours tothe most attractive touristic areas ofRomania. The first tour, on Sunday, March21, was dedicated especially to nature and
history of the Neamt, county. Wednesday,
March 24, was dedicated to the myths: we
visited the old city centre of Bras,ov, with the
Black Church, and after a tour at the Peles,Castle in Sinaia we had an excellent lunchwith Romanian cuisine; those CICLingerswho were very keen at the idea and still hadenergy, went also to the Bran Castle, famousfor its connection with the legend of Count
Association of Students in Informatics in
Ias,i (ASII).
Abundant local support
We also greatly appreciate the support of the
Romanian Academy. In particular, Dan
Tufis, was closely involved in the organiza-
tion of PROMISE conference (ProcessingROmanian in Multilingual, Interoperationaland Scalable Environments), a satellite event
of CICLing 2010 co-organised under the
aegis of the Romanian Academy together
with the Al. I. Cuza University of Ias,i. Very
special thanks go to Rada Mihalcea, one of
the greatest and oldest friends of CICLing,
who was, like many times before, guiding
and helping the organizing team in many
the recordings of all the sessions are availablefor download at http://profs.info.uaic.ro/~cicling2010/historyCic.html. Other pic-tures and videos are available at http://pica-saweb.google.com/cicling2010.iasi; all par-ticipants willing to share their memories cancontribute their files to this page.
Since it was the first CICLing held inEurope, the usual dates of CICLing, whichused to be in mid-April, had to be changedtaking into account other important scien-tific events as well as the local Romanianprogram and, of course, the RomanianSpring weather. Luckily, even though theusual weather in the Spring is a rainy one,during CICLing we had a very nice weath-
Dracula. The last trip, on Saturday, March27, was dedicated to the Romanian paintedmonasteries in the Bucovina county.
From all the feedback received from theCICLingers during and after the confer-ence, we can consider that the conferencewas a success, especially taking into accountthe world economic crisis. This success is ina great degree due to active participationand constant support of many people. Firstof all we want to thank all people involvedfrom the Alexandru Ioan Cuza University,starting from the leaders of the University,passing through all the University depart-ments and ending with the wonderful stu-dent volunteers – most of them from the
activities, behind the scene but always pre-sent.
Finally, after many months of close collabo-ration from the opposite sides of theAtlantic – and now separated by it again –we would say a lot of very warm wordsabout each other if we were not both theauthors of these lines – it's a pity we have toomit here this part of our emotions andgratitude...
Some months after CICLing 2010 we arehappy we survived all the arduous work, andwe are looking forward for the nextCICLing in 2011, and for the courageouslocal organizers willing to contribute to yet
another successful CICLing event. C
CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 1133
GGrroouupp pphhoottoo ooff CCIICCLLiinngg ppaarrttiicciippaannttss
1144 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
Utrecht, March 3-4, 2010
Steven KrauwerCLARIN coordinator
As many of you may know, CLARINhas, in addition to the Executive
Board, three other important Boards. TheScientific Board consists of high-level sci-entists, one from each participating coun-try and appointed by the national fundingagency. It will monitor the execution ofthe programme and ensure the overall sci-
entific soundness, coherence, complete-ness, consistency and feasibility. It deter-mines the overall scientific strategy andreviews the project deliverables. TheStrategic Coordination Board consists ofrepresentatives appointed by the fundingagencies, one per country. It will monitorthe execution of the programme of workwith a view to compliance with nationalgovernments' and funding agencies' poli-cies. It determines the overall governanceand financial strategy. The StrategicCoordination Board will review all projectdeliverables.
The International Advisory Board consists ofhigh-ranking international experts, and willgive advice to the Executive, Scientific andStrategic Coordination Boards on issues ofcommon interest, such as opportunities for
collaboration, coordination and harmoniza-tion with initiatives at the internationallevel. On http://www.clarin.eu/external/index.php?page=about-clarin&sub=4 onecan find a table that presents the composi-tion of the three boards.
All-boards meeting
On March 3rd this year the three Boardsmet for the first time with the ExecutiveBoard in order to discuss the project resultsand deliverables of the first two years of theproject. On March 4th there was a meetingbetween the Strategic Coordination Boardand the Executive Board, with representa-tives from the Dutch Ministry of Science,where the focus was on the future structure
of CLARIN after completion of thePreparatory Phase project.
The various components of the project werepresented by the Work Package leaders, anddiscussed with the participants. On thewhole the members of the Boards expressedtheir satisfaction with the results presentedand were happy to endorse our deliverables.It was encouraging to hear that some of themembers of our international advisory boardtold us that they were impressed by ourachievements, especially in comparison withwhat was happening outside Europe.
The discussion on the first day brought upmany different topics, both at the global andat the detail level, and we collected a largenumber of comments and suggestions. Arecurring topic was, as could be expected,how to reach the users, especially users from
outside the field of linguistics. Even if mostof the activities related to this specific issuehad to be deleted from our original workplan at the request of the EC it was general-ly felt that this was one of the most pressingproblems that should be urgently addressedin the following phases of CLARIN, both atthe European and at the national level.Collaboration with existing and emerginginfrastructure activities within and outsideEurope could be extremely beneficial, andthe creation of CHAIN (Coalition ofHumanities and Arts Infrastructures andNetworks, http://www.chaincoalition.org/)was seen as an important step towards this.
Heading for ERIC framework
On March 4th the focus was on the futuregovernance and financing of CLARIN. Thecurrent vision to create an ERIC for CLAR-IN was presented for the Strategic Coor-dination Board. An important feature of theERIC is that it is a consortium of govern-ments (as opposed to universities, digitalarchives or research organizations). The roleof the present project can only be to makethe proper preparations for this, so that thegovernments can take over to conduct thefinal negotiations. At the meeting the cur-rent plans for the organisation and theimplementation of the ERIC were presentedand discussed. The presence of representa-tives from ministries or funding agenciesfrom most CLARIN countries made it anexcellent opportunity for an open discussionof possible obstacles. Representatives of theDutch ministry of research confirmed theirintention to take the lead in this process andto act as the host country for CLARIN. Theparticipants expressed their agreement withthe general direction taken and their willing-ness to continue the discussions. One of theimportant conclusions from this meetingwas that the original time schedule wherebythe ERIC would become operational on Jan1st 2011 was recognized as unrealistic, asmost countries needed more time to finalizetheir national roadmaps and to start thefunding allocation process. It was agreedthat the project would apply for an unfund-ed extension till July 1st 2011, to avoid a gapbetween the end of the preparatory phaseand the construction phase of CLARIN.
The meeting on March 3rd and 4th was avery important one for CLARIN, becausethis was the first opportunity for a directexchange of views and ideas between CLAR-IN and its three Boards. The fact that therewas broad support for the directions the pro-ject is taking (along its various dimensions)was very encouraging and makes us feel con-fident that we will succeed in makingCLARIN happen! C
The First Joint Meeting of the CLARIN Boards
AA ppiiccttoorreessqquuee bbuuiillddiinngg ooff tthhee UUttrreecchhtt UUnniivveerrssiittyy wwhheerree bbooaarrddss mmeeeettiinngg ttooookk ppllaaccee
CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 1155
EstablishingPortugueseCLARINLisbon, March 19, 2010
António BrancoUniversity of LisbonMarko Tadi}University of Zagreb
The CLARIN Lisbon Meeting onCommon Language Resources and
Technology Infrastructure for the Humanitiesand Social Sciences took place in March 19,2010. It was requested by the Portuguesefunding agency FCT-Fundação para aCiência e Tecnologia, from the MCTES-Ministério da Ciência, Tecnologia e EnsinoSuperior (Portuguese Ministry for Science,Technology and Higher Education) in orderto establish the Portuguese CLARIN com-munity. Also, the ultimate goal of this meet-ing was to support the national decisionprocess on the joining of the CLARIN infra-structure.
The meeting was divided in four sessions whereeach was conducted by a chair person and a rap-porteur.
Session A
This introductory session started with CLARINcoordinator Steven Krauwer and his overall pre-sentation of the CLARIN project, its goals,achievements, challenges and present state ofCLARIN community that is shaping up at theEuropean as well as at the different national lev-els. Krauwer’s presentation was followed by threetalks from three different partners presentingtheir national situations.
The first one was given by Adam Przepiórkowski(Polish Academy of Sciences, Warsaw) and it wasa report from a national network with a status oflanguage resources and technology developmentand maturity quite similar to the one of Portugal.The second talk was given by Núria Bel(University Pompeu Fabra, Barcelona) where shepresented a report from a national network with amajor language from the neighbouring country,whose global projection matters. The third onewas given by Erhard Hinrichs (University ofTübingen) and he reported on a national networkthat is a major driving force of the infrastructure,with a large companion national funding pro-gramme of support. All three talks presented dif-ferent approaches and sources of inspiration toorganize a CLARIN community at the nationallevel thus giving examples to the PortugueseCLARIN initiative.
Session B
The main purpose of Session B was to plot thesituation concerning the Portuguese language andits position in the CLARIN project. The speakersfocused on the Portuguese participation in theCLARIN project and the formation of thenational CLARIN network, the Portuguese andBrazilian NLP and Linguistics community work-ing on Portuguese language, and the objective offostering collaboration between Portuguese andBrazilian research centres under CLARIN (seealso contribution in this issue of CLARINNewsletter at page 18). This session was closedwith the presentation of the view of thePortuguese Funding Agency on the ESFRI pro-jects and on the involvement of Portugal in theforthcoming development of the CLARIN frame-work.
The introductory talk in session B was given byAntónio Branco (University of Lisbon), coordina-tor of the CLARIN-Portugal network titled TheCLARIN project in Portugal. He was followed byVera Lúcia Strube de Lima (FACIN-PUCRS,Porto Alegre, Brazil) with the talk Research inNLP in Brazil: an overview. The closing contribu-tion in this section was given by Isabel Figueiredo(on behalf of Lígia Amâncio, FCT Vice-Presidentand Portuguese delegate at ESFRI) under the titleThe Portuguese participation in ESFRI projects.
Session C
This session had two main goals: description ofexamples of use cases and presentation and dis-cussion of technical problems and possible solu-tions.
The description of use cases was provided in thefollowing presentations:
1. Language databases in the psychologist's toolbox,by São Luís Castro (University of Porto);
2. Research in History meets LT: the case of Portu-guese Royal Inquiries of 1258, by Luís Gomes(University of Azores);
3. Multilinguality: global access to information,entrepreneurship and research: a global Portu-guese perspective, by José Gabriel Lopes (NewUniversity of Lisbon.)
The discussion on technical problems was initiat-ed by Daan Broader (Max-Planck Institute forPsycholinguistics) who made a detailed analysis ofthe main issues involved:
• An ecology of “e-infrastructures”, from networkservices to community services
• Technical, organizational and juridical issues
• What is needed (in terms of ICT infrastructure)to allow researchers to find, process, and storelanguage resources?
Session D
The last session started with Overview of Con-tributions from 16 CLARIN-PT members on theirown RUs, by Maria Francisca Xavier (Centro deLinguistica da Universidade Nova de Lisboa) andwas followed by a round table. The discussion inthe round table was organized around the follow-ing questions the Research Units (RUs) had beenpreviously given, and their position papers trig-gered by such questions:
• the kind of research that should be supported byCLARIN;
• the expected role of CLARIN in preservingknowledge and scientific results;
• the resources and services that CLARIN shouldmade available;
• the expected impact of CLARIN on researchand on the support of Portuguese language;
• the contributions to CLARIN that the RU willbe ready to make available.
The CLARIN Lisbon meeting finished with afruitful discussion around these key issues thusundoubtedly leading to an expected Portugueseparticipation in ERIC This meeting yielded threedeliverables, available for distribution:Memorandum with the institutonal profiles of thePortuguese CLARIN members (24 pp.);Memoradum with the postion papers of thePortuguese CLARIN members for the round table(23 pp.); The Proceedings of the meeting (128
pp.). C
PPaarrttiicciippaannttss ooff PPoorrttuugguueessee CCLLAARRIINN kkiicckk--ooffff mmeeeettiinngg
1166 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
CLARIN and FLaReNetworkshop, StockholmNovember 25-26, 2009
Rolf CarlsonKjell EleniusDavid HouseKTH, Stockholms
In November 2009, CLARIN andFLaReNet organized a two day work-
shop at KTH in Stockholm. The themewas “Best practices for speech and multi-modal databases”. The participants werespecially invited to cover different aspectsof infrastructure for spoken and multi-modal databases. The workshop web pagecan be found on http://www.speech.kth.se/clarin/ with links to all presenta-tions. The following is a short report fromthis meeting including a few snapshotsfrom the subjects that were presented anddiscussed during these intensive two days.
The background for the workshop was todiscuss research needs and share ideas andviews among spoken language researchersnot yet heavily involved in the CLARINmission. It is obvious that currentadvances and rapid development ofspeech-based interfaces particularly haveto a large degree benefited from successfulcollection and collaborative use of large-scale speech and multimodal databases.
The spoken and multimodal researchcommunity is large and varied, and thereis a great need to define the requirementsfor infrastructures which can be fruitfullyand easily shared. The annotation of inter-esting corpora in this field is, however,divergent and would benefit from findingways for harmonization and interoperabil-ity. Thus, it is of importance to look intostandards and best practices to encode var-ious relevant features of these corpora tofacilitate their use for a wide diversity ofresearchers. The features may be intona-tion, facial expressions, gestures, turn-tak-ing and emotions.
When we look at speech and multimodalresearch topics in the humanities andsocial sciences such as child-directedspeech, accent shifts, dialects and soci-olects and dialogues in different environ-ments, we do not have a strong tradition
for collection and distribution of relevantdatabases. Currently it is very much theresearchers' own task to collect the data ortry to do research on a corpus collected forsome other purpose. To be able to easilylocate, access and share data oriented tothe humanities and social sciences wouldbe a big step for many researchers. Thusan important goal towards increasing ourunderstanding of human communicationand making our applications more intelli-gent is to make language resources andtechnology available and readily usable forall kinds of scholars. Common standardsare essential for achieving this.
CLARIN framework
The first session, chaired by ErhardHinrichs, was opened by Steven Krauwerexplaining the CLARIN framework andsetting up the challenges we are facing toinclude or engage the spoken languageand multimodal community. While it is sofar the case that most CLARIN playersoriginate from text processing, it isacknowledged that speech and multi-modal resources are also important, fore.g. phoneticians, linguists, historians ofthe future and social scientists. Despitethis, speech and multimodal actors areunderrepresented in the CLARIN activi-ties. Steven pointed out that sharingspeech and multimodal resources pays offbecause of the high cost (in time &money) of annotation, and that consulta-tion with the community, especially aboutstandards is urgently needed. This is in thecommon interest of both CLARIN andFLaReNet.
Several presentations were focused on cur-rent work on data collection, annotationand analyses. The aim of PeterWittenburg's contribution was to reviewsome of the multimodal annotations inuse and point to the heterogeneity ofmany of the encoding schemes even whenbased on a harmonized format for annota-tion structures. Anton Batliner's presenta-tion was entitled “Annotations andbeyond – what (not) to standardize, andhow come that sometimes, it just hap-
Following thePrevious SuccessLT Days, LuxembourgMarch 22-23, 2010
Marko Tadi}
Editor
Following the event successfully organ-ised in January 2009 (see CLARIN
Newsletter No 4 for full report and intro-ductory details on pages 1 and 8) whereLanguage Resources and Technologieswere strongly supported by the pro-grammes that EC grants through FP7 ICTand CIP ICT-PSP calls, this year the sameformula was applied.
New calls
LT Days 2010 aimed at informing potentialproposers about announced ICT-PSP 4thCall, Theme 6: Multilingual Web and theICT-FP7 Work Programme 2010-2011.The details of each programme and callwere presented by EC officials in Sessions
3 and 4 preceeded by contextualisation ofthis calls within existing projects/initiativesin Session 1: “Different initatives, differenttarget communities, different businessmodels... one common effort” and Session2: “Enterprise Language Processing:mega-trends and developments” wherehigh-level speakers gave presentationscovering varied related topics, includingnewly started projects, the commercialviewpoint and future prospect.
In this introductory session Steven Krauwersuccesfully presented CLARIN and itsachievements so far, particularly when itcomes to sharing LRT. Common metadataschemes that would allow different projectsand initiatives to coordinate and cooperatein order to avoid the doubling of efforts,were listed as one of the top priorities. C
CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 1177
pens” and also detailed some other typesof (missing) standards, wrong standards,and lacunae (such as agreed upon perfor-mance measures) where we badly needsome conventions which, however, possi-bly should not be pinned down as “newstandards”. Concluding the first sessionwas Volker Steinbiss with a talk entitled“Best practices that could help avoidingthe mess”. Among several aspects he pre-sented a few observations and best prac-tices that have been helpful to keep thestress of working with large amounts ofdata at an acceptable level.
The tools for spoken data collection devel-oped by Christoph Draxler have the
potential to become important web basedservices for efficient and economic corpuswork. Data accessibility using web basedservices were also presented by AndersEriksson, together with a description ofthe extensive dialect project SWEDIA.Important aspects of all corpus work arethe ethical or privacy issues which was thefinal subject in the session discussed byMartti Vainio.
Sarah Hawkins' presentation “Tools forwork that uses acoustic-phonetic analysis”gave a closer view on the detailed work inphonetics and especially within the S2Snetwork. Of special interest is the devel-
opment of freeware acoustic-phoneticanalysis tools that will allow student oroccasional professional users to quicklylearn how to do specific measurements,possibly manually, yet also appeal toexpert researchers working with corpora,including those who are competent pro-grammers. Such a common platformshould aim to build on and integrate,rather than replace, existing software, andto facilitate cross-disciplinary collaborativeresearch. Such a development is also ofimportance for sharing speech data fromsmall languages discussed in the presenta-tion by Ingunn Amdal. Emotions in spon-taneous conversation and in music were
themes covered by Klara Vicsi andRoberto Bresin respectively.
Hot topic: Multimodality
Multimodality is a current hot topic inhuman interaction research and the needfor annotation on multiple levels was acommon denominator for the sequence ofpresentations by Jean Carletta, KirstenBergmann, Jonas Beskow, Jens Edlundand David House.
Based on his experience of working withseveral different tools for manual annota-tion and also talking to people who usethem regularly, Nick Campbell concluded
that using these tools can be tedious andrestrictive. Rather than prescribe a stan-dard at this time, he suggested that wemight benefit more from creating a sup-port group whereby people who annotatedata regularly can communicate and sharesamples, tools, and formats.
Questions of ethics and privacy
During the course of the workshop, ampletime was devoted to general discussionwhich covered many of the topicsreviewed above. Especially lively discus-sion points concerned ethics and privacyas well as tools and standards. The finaldiscussion was led by Lou Boves andfocused on concrete objectives for CLAR-IN in meeting the challenges of engagingthe spoken language community to amuch larger extent. Lou summarized theimportant medium-term topics discussedduring the workshop as the following:
• Taxonomy of research objectives and therequired degree of granularity in annota-tions;
• Different resources and different toolsfor different research objectives;
• Re-purposing of resources;
• How dangerous is easy-to-use software?;
• Better understanding of the interplay ofresearch and infrastructure;
while the long term issues concerned are:
• Legal Issues;
• Data preservation;
• Additional Centres;
• Sustainable business model;
• Requirements of journals and fundingagencies to make data available;
• How can we create a research environ-ment in which research on small lan-guages makes similar contributions toscience as the NIST competition.
It is our conviction that the workshop
resulted in a deeper understanding of
research issues and needs within the two
research communities dealing with either
text or spoken language and also helped to
more clearly articulate and define the
basic challenges we need to face to make
further progress in the future. C
Best practices for speech and multimodal databases
KKTTHH ccaammppuuss
1188 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
PROPOR, Porto Alegre,April 27-30, 2010
António Branco
University of Lisbon
Vera Lucia Strube de Lima
PUCRS Porto Alegre
Thiago Pardo
University of São Paulo
Steven Krauwer
Utrecht University
PROPOR is the Portuguese verb 'topropose'. It is also the acronym of
the international conference on thecomputational processing of Portuguese,the sixth language with the largest num-ber of speakers, spread over theAmerican, African and Asian continents.
This conference is held every other year,alternating between Portugal and Braziland gathering over one hundred partici-pants. It features the full range of initia-tives of a mature scientific event of itssize, with oral and poster presentations,plenary keynote presentations, demosessions, tutorials, best dissertationawards, etc. Its selected papers are pub-lished in Springer's Lecture Notes inArtificial Intelligence, with a very com-petitive acceptance rate, scoring 27% inthe last edition. Of course, it is also a fes-tive occasion where old acquaintancesmeet again and new friends are made.
A relevance to CLARIN
In terms of research community mobi-lization, the PROPOR conference playsthe role of what in other communities isthe attribution of some formal associa-tive structure. It is the reference focalpoint of this research community fromboth margins of the Atlantic, and fromall over the world, whose work is on thelanguage resources and technology forPortuguese.
As its acronym suggests in thePortuguese reading, the PROPOR con-ference is the place to propose new ideasand lines of action. That was what hap-pened once more in the last edition,which took place in last April 27-30 inPorto Alegre, Brazil (http://www.inf.pucrs.br/~propor2010). And this timewith special relevance for our CLARINproject.
At the end of the morning of the secondday, the conference accommodated aone hour special plenary session devotedto CLARIN. This provided a uniqueopportunity to present the project andits progress to an audience includingboth future researchers – starting now asPhD students – and today's major play-ers in the language resources and tech-nologies from Portugal and Brazil.
This special session included three parts.In a first slot, Steven Krauwer presenteda broad overview of the project anddescribed its results and current status.Next, António Branco reported on theachievements of the Portuguese CLAR-IN network and underlined the interest
of enlarging the participation of thePortuguese language in CLARIN withthe contribution of the Brazilian col-leagues. Finally, there was time to answerquestions and clarify issues raised by theaudience.
Initiative for cooperation ofCLARIN with Brasilian centres
In the following day, we were invited bythe conference organizers to a side meet-ing with Brazilian colleagues, includingthe major players and group leaders inBrazil. The goal was to get a deeperunderstanding of the project and a bet-ter grasp of the possibility for Brazil tojoin CLARIN and the measures needed,both in Brazil and in the project, toeventually achieve that goal.
At the end of the meeting the enthusi-asm and confidence among participantswas very high about these new chancesof cooperation across continents. It isour belief that very soon we will hearabout more developments coming outof these seeds of collaboration sown inPorto Alegre.
As Brazil is a South-American country,this would clearly represent a firstopportunity and test bed towards mak-ing CLARIN a truly internationalendeavour across continents. Naturally,it brings about also the challenge forCLARIN to find ways to expand its ini-tial parameters and our accrued respon-sibility of making it an open and a reli-able organization for colleagues from allover the world! C
Bridging the Portuguese Speaking Margins ofthe Atlantic with CLARIN
SStteevveenn KKrraauuwweerr aanndd AAnnttóónniioo BBrraannccoo pprreesseennttiinngg CCLLAARRIINN
CCLLAARRIINN NNeewwsslleetttteerr // NNoo 99--1100 1199
Ru-ta Marcinkevi~iene.
Vytautas Magnus University,Kaunas
Introduction
The last two decades have been important forLithuania as a number of activities relating toHuman Language Technologies (HLT) havebeen carried out, including:
• localization of general tools,
• digitization (including adaptation of digi-tized resources),
• compilation of tools, language resources andknowledge bases,
• training and research,
• documentation and dissemination.
The first two types of activities, i.e. localizationof the user interface and digitization of cultur-al heritage, cannot be classified under HLTproper. However, some types of digitized prod-ucts can be used as linguistic resources, e.g.Database of Old Lithuanian Writings,Dictionary of Lithuanian Language,Dictionary of Contemporary LithuanianLanguage, Dictionary of Toponyms, Databaseof Lithuanian Dialects etc.
However, digitized resources are of limited useas resources, therefore a greater prominence isgiven to the third type of activity, the compila-tion of general and special corpora and lan-guage processing tools.
An overview of Lithuanian HLT
The most obvious outcome of the programmefor Lithuanian HLT was compilation of thecorpus of 100 million running words and sometools (e.g. corpus query system and collocationextraction tool, a system of morphologicalannotation and disambiguation) available forpublic use. Later developments in the field,financed mostly by national foundations,resulted in the production of the followingtools and resources:
• a morphologically annotated corpus (115million running words),
• an annotated manually checked corpus ofone million words,
• a set of parallel corpora:
• a bi-directional Czech-Lithuanian andLithuanian-Czech corpus of five millionwords
• English-Lithuanian corpus of 18 millionwords in size,
• a database of Lithuanian nominal colloca-tions, extracted from the corpus of 100 mil-lion words,
• a number of tools such as:
• a tool for the automatic identification oftext functions for the Lithuanian language,
• the tool for the extraction of collocations,
• a Lithuanian tagger,
• the Aligner2067,
• an automatic accentuation tool for theLithuanian language,
• a corpus of Spoken Lithuanian language,
• a universal annotated database of speechrecordings.
The above list is confined to the tools for lan-guage resources made at Vytautas MagnusUniversity and sponsored mainly by twonational funding agencies, the Lithuanian StateLanguage Commission and the LithuanianState Science and Studies Foundation.
Other institutions have developed a set of toolsand databases for public use or purchase. TheState Commission of the Lithuanian Languageis monitoring an open terminological database.The Institute of Mathematics and Informaticshas digitized term dictionaries from 27 disci-plines into one database. A private company
Fotonija is known for its electronic dictionariesInterleksis, T��; English-Lithuanian dictionar-ies Alkonas and Anglonas, French-Lithuaniandictionary Frankonas and a spellchecker Juodosavys. A corpus of academic discourse has beenstarted at Vilnius University, Faculty ofPhilology.
The most recent jointly developed tool was arule-based machine translation system for thetranslation of English internet texts intoLithuanian. It was developed by a group ofcompanies including Promt (St. Petersburg),Fotonija (Vilnius), Alna Software (Kaunas).They co-operated within the framework of aproject financed by EU Structural Funds.
Before the automatic machine translation sys-tem there was an automatized translation toolVertimo Vedlys incorporated in text editor Tilde.sbiuras. together with a spellchecker and multi-lingual support software. It translates nounphrases and simple sentences.
National policies related toresearch infrastructures
On a national level various research and devel-opment programs continue to promote HLTrelated activities. However, the most importantdevelopment and support of resources is fore-seen in the framework of the NationalResearch Infrastructure (NRI) compatible withESFRI requirements for national states. Thestrategy of NRI includes documentation andunification of existing national resources aswell as support for trans-national initiativessuch as CLARIN, CESSDA and other similarjoint infrastructures for the Social Sciences andHumanities (SSH). National support forresearch infrastructures in general and HLT inparticular is timely since “SSH researchers relyon new technologies, and real overhead costs
for SSH research have increased dramaticallyover the past 20 years, without governmentsubsidies necessarily reflecting these changes.Consequently, more and more SSH researchdepends on capital injections to develop cut-ting edge data sets and develop retrieval sys-tems” (METRIS report 2009).
It can be concluded that most of LithuanianHLT related activities, mentioned in theIntroduction, are taken care of on nationallevel. Training and research, however, remainthe least attended activities. Therefore, it isexpected that our partnership in CLARIN con-sortium will boost the development of bothresearch infrastructures and research itself. C
The State of Lithuanian HLT
TThhee bbuuiillddiinngg ooff VVyyttaauuttaass MMaaggnnuuss UUnniivveerrssiittyy iinn KKaauunnaass
2200 NNoo 99--1100 // NNeewwsslleetttteerr CCLLAARRIINN
Join CLARINThe CLARIN project is a combination ofCollaborative Projects and Coordinationand Support Actions, registered at the EUunder the number FRA-2007-2.2.1.2. Itstarted with the preparatory phase in 2008that will make the grounds for the nextphases and it will cover the generic, lan-guage independent activities. In order todo our work properly we have to rely on amuch wider circle than just the formal con-sortium partners in the project. For thisreason we have opened up all our projectworking groups for participation by orga-nizations that are not part of the consor-tium.
MembersAAuussttrriiaa ((NNCCPP:: GGeerrhhaarrdd BBuuddiinn)) University of Graz (Graz): AustrianGerman Research Centre (C:Rudolf Muhr); Institut für Romanistik(C:Stefan Schneider)Austrian Academy of Sciences (Vienna): Austrian Academy Corpus
(C:Christoph Benda); Department of Linguistics andCommunication Research (C:Sabine Laaha); Institute ofLexicography of Austrian Dialects and Names (C:Eveline Wandl-Vogt)
Secure Business Austria (Vienna): (C:Edgar R. Weippl)University of Vienna (Vienna): Center for Translation Studies
(C:Gerhard Budin)BBeellggiiuumm ((NNCCPP:: IInneekkee SScchhuuuurrmmaann)) University of Antwerp (Antwerp):
Center for Dutch Language and Speech (C:Walter Daelemans)Vrije Universiteit Brussel (Brussels): Laboratory for Digital Speech and
Audio Processing, Department of Electronics and InformationProcessing (C:Werner Verhelst)
Gent University (Gent): Digital Speech and Signal Processing researchgroup at the Electronics and Information Systems department(C:Jean-Pierre Martens)
University College Ghent (Gent): Faculty of Translation Studies,Language and Translation Technology Team (C:Veronique Hoste)
Katholieke Universiteit Leuven (Leuven): Center for ComputationalLinguistics (C:Frank Van Eynde); ESAT-PSI/Speech (C:PatrickWambacq); Language Intelligence & Information Retrieval(C:Marie-Francine Moens)
Katholieke Universiteit Leuven (Leuven - Kortrijk): iTec(Interdisciplinary research on Technology, Education &Communication) (C:Hans Paulussen)
BBuullggaarriiaa ((NNCCPP:: KKiirriill SSiimmoovv)) University of Plovdiv (Plovdiv): Faculty ofMathematics and Informatics (C:Veska Noncheva)
Bulgarian Academy of Sciences (Sofia): Department of ComputationalLinguistics, Institute for Bulgarian Language (C:Svetla Koeva);Institute for Parallel Processing of Bulgarian Academy of Sciences(Sofia): Linguistic Modelling Department (C:Kiril Simov); Instituteof Mathematics and Informatics, Bulgarian Academy of Sciences(Sofia): Mathematical Linguistics Departement (C:LudmilaDimitrova)
St. Cyril and St. Methodius University (Veliko Turnovo): (C:BoryanaBratanova)
CCrrooaattiiaa ((NNCCPP:: MMaarrkkoo TTaaddii}})) Institute of Croatian Language andLinguistics (Zagreb): (C:Damir ]avar)
University of Zagreb (Zagreb): Department of Linguistics, Faculty ofHumanities and Social Sciences (C:Marko Tadi}); Zagreb UniversityComputing Center (C:Zoran Beki})
CCyypprruuss ((NNCCPP:: --)) Cyprus College (Nicosia): Research Center (C:AntonisTheocharous)
CCzzeecchh RReeppuubblliicc ((NNCCPP:: EEvvaa HHaajjii~~oovváá)) Masaryk University (Brno):Faculty of Informatics (C:Aleš Horák)
Charles University (Prague): Institute of Formal and AppliedLinguistics, Faculty of Mathematics and Physics (C:Eva Haji~ová)
The Institute of the Czech Language, Czech Academy of Sciences(Prague): The Institute of the Czech Language (C:Karel Oliva)
DDeennmmaarrkk ((NNCCPP:: BBeennttee MMaaeeggaaaarrdd)) Copenhagen Business School(Copenhagen): Department of International Language Studies andComputational Linguistics (C:Peter Juel Henrichsen)
Dansk Sprognævn - Danish Language Council (Copenhagen):(C:Sabine Kirchmeier-Andersen)
Society for Danish Language and Literature (Copenhagen): (C:JørgAsmussen)
The National Museum of Denmark (Copenhagen): (C:Birgit Rønne)The Royal Library (Copenhagen): (C:Anders Conrad)University of Copenhagen (Copenhagen): Centre for Language
Technology, Faculty of Humanities (C:Bente Maegaard)University of Southern Denmark (Kolding): Faculty of Humanities
(C:Johannes Wagner)EEssttoonniiaa ((NNCCPP:: TTiiiitt RRoooossmmaaaa)) University of Tartu (Tartu): Institute of
Computer Science (C:Tiit Roosmaa)FFiinnllaanndd ((NNCCPP:: KKiimmmmoo KKoosskkeennnniieemmii)) CSC - the Finnish IT Center for
Science (Espoo): (C:Pirjo-Leena Forsström)Lingsoft Inc. (Helsinki): (C:Juhani Reiman)The Research Institute for the Languages of Finland (Helsinki): (C:Toni
Suutari)University of Helsinki (Helsinki): Department of General Linguistics
(C:Kimmo Koskenniemi)University of Joensuu (Joensuu): Department of Foreign Languages
and Translation Studies (C:Jussi Niemi)
University of Oulu (Oulu): Faculty of Humanities, Finnish Language(C:Marketta Harju-Autti)
University of Tampere (Tampere): Faculty of Information Sciences,Department of Information Studies and Interactive Media (C:EeroSormunen)
FFrraannccee ((NNCCPP:: JJeeaann--MMaarriiee PPiieerrrreell)) Centre de ressources pour la docu-mentation de l'oral (Aix-en-Provence) (C:Bernard Bel)
National Center for Scientific Research (CNRS) (Marseille): Laboratoired'Informatique Fondamentale de Marseille (LIF-CNRS) (C:MichaelZock)
Centre National de Ressources Textuelles et Lexicales (CNTRL) (Nancy):(C:Bertrand Gaiffe)
National Center for Scientific Research (CNRS) (Nancy): Analyse etTraitement Informatique de la Langue Française (ALTIF) (C:Jean-Marie Pierrel)
National Center for Scientific Research (CNRS) (Orsay): Institute forMultilingual and Multimedia Information (IMMI-CNRS) (C:JosephMariani)
Evaluations and Language resources Distribution Agency (ELDA)(Paris): (C:Khalid Choukri)
National Center for Scientific Research (CNRS) (Paris): TraitementÉlectronique des Manuscrits et des Archives (TELMA/DIS)(C:Florence Clavaud)
Université Paris 4 Sorbonne (Paris): Centre de linguistique théoriqueet appliquée (CELTA) (C:Andre Wlodarczyk)
Université de Strasbourg (Strasbourg): Equipe de recherche LiLPa(Linguistique, Langues, Parole) (C:Amalia Todirascu)
National Center for Scientific Research (CNRS) (Vandoeuvre lesNancy): L'Institut de l'Information Scientifique et Technique (INIST-CNRS) (C:Fabrice Lecocq)
University Paris Est/Paris 12 (Vitry Sur Seine): LISSI Laboratory(C:Yacine Amirat)
GGeerrmmaannyy ((NNCCPP:: EErrhhaarrdd HHiinnrriicchhss)) University of Augsburg (Augsburg):Philologisch-Historische Fakultät (C:Ulrike Gut)
Berlin-Brandenburg Academy of Sciences (Berlin): (C:AlexanderGeyken)
Humboldt-University Berlin (Berlin): Institut für deutsche Sprache undLinguistik (C:Anke Lüdeling)
Technische Universität Darmstadt (Darmstadt): Ubiquitous KnowledgeProcessing (UKP) Lab (C:Iryna Gurevych)
TU Dortmund University (Dortmund): Institute for German Languageand Literature (C:Michael Beißwenger)
Universität Duisburg-Essen (Essen): Fakultät Geisteswissenschaften /Germanistik / Linguistik (C:Bernhard Schröder)
University of Frankfurt/Main (Frankfurt/Main): Comparative LinguisticsDepartment (C:Jost Gippert)
University of Giessen (Giessen): Institut für Germanistik (C:HenningLobin)
DFG-Project “Language Variation in Northern Germany”(Sprachvariation in Norddeutschland - SiN) (Hamburg): (C:IngridSchröder)
University of Hamburg (Hamburg): Faculty for Language Literatureand Media, Arbeitsstelle “Computerphilologie” (C:Cristina Vertan);Fakultät für Geisteswissenschaften, Fachbereich Sprache, Literatur,Medien (C:Angelika Redder); Institute of German Sign Languageand Communication of the Deaf (C:Thomas Hanke); SFB 538Multilingualism (C:Thomas Schmidt)
University of Heidelberg (Heidelberg): Computational LinguisticsDepartment (C:Anette Frank)
University of Cologne (Köln): Institut für Linguistik - Phonetik(C:Dagmar Jung)
Max Planck Institute for Evolutionary Anthropology (Leipzig):Department of Linguistics (C:Hans-Jörg Bibiko)
University of Leipzig (Leipzig): Institut für Informatik, AbteilungAutomatische Sprachverarbeitung (C:Codrina Lauth)
Institut für Deutsche Sprache (Mannheim): (C:Marc Kupietz)Westfälische Wilhelms-Universität Münster (Münster): Institut für
Allgemeine Sprachwissenschaft (C:Gabriele Müller)University of Potsdam (Potsdam): Department of Linguistics
(C:Manfred Stede)German Research Center for Artificial Intelligence (Saarbrücken):
Language Technology Lab (C:Thierry Declerck)University of Stuttgart (Stuttgart): Institut für Maschinelle
Sprachverarbeitung (C:Ulrich Heid)Universität Trier (Trier): Kompetenzzentrum für elektronische
Erschließungs- und Publikationsverfahren in denGeisteswissenschaften (C:Andrea Rapp)
Universität Tübingen (Tübingen): Asien-Orient-Institut (C:Ulrich Apel);Seminar für Sprachwissenschaft (C:Erhard Hinrichs)
GGrreeeeccee ((NNCCPP:: SStteelliiooss PPiippeerriiddiiss)) Institute for Language and SpeechProcessing (Athens): Department of Language TechnologyApplications (C:Stelios Piperidis)
HHuunnggaarryy ((NNCCPP:: TTaammááss VVáárraaddii)) Hungarian Academy of Sciences(Budapest): Research Institute for Linguistics (C:Tamás Váradi);Institute for Psychological Research of the Hungarian Academy ofSciences (Budapest): (C:Bea Ehmann)
Budapest University of Technology and Economics (Budapest):Department of Sociology and Communications, Media ResearchCenter (C:Peter Halacsy)
Department of Telecommunication and Media Informatics, Laboratoryof Speech Acoustics (C:Klára Vicsi)
MorphoLogic Ltd. (Budapest): MorphoLogic Ltd. (C:László Tihanyi)University of Szeged (Szeged): Department of Informatics, Human
Language Technology Group (C:Dóra Csendes)IIcceellaanndd ((NNCCPP:: EEiirrííkkuurr RRööggnnvvaallddssssoonn)) Icelandic Centre for Language
Technology (Reykjavík): (C:Eiríkur Rögnvaldsson)University of Iceland (Reykjavík): Institute of Linguistics (C:Eiríkur
Rögnvaldsson)IIrreellaanndd ((NNCCPP:: --)) National University of Ireland (Galway): Department
of English (C:Sean Ryder)IIssrraaeell ((NNCCPP:: --)) Technion-Israel Institute of Technology (Haifa):
Computer Science Department (C:Alon Itai)
IIttaallyy ((NNCCPP:: NNiiccoolleettttaa CCaallzzoollaarrii)) European Academy Bozen/Bolzano(Bolzano): Institute for Specialised Communication andMultilingualism (C:Andrea Abel)
Università di Pavia (Pavia): Dipartimento di Linguistica Teorica eApplicata (C:Andrea Sansò)
National Research Council (Pisa): Istituto di LinguisticaComputazionale (C:Nicoletta Calzolari)
University of Rome “Tor Vergata” (Rome): Department of ComputerScience (C:Fabio Massimo Zanzotto)
LLaattvviiaa ((NNCCPP:: IInngguunnaa SSkkaaddiinnaa)) Tilde Language Technologies (Riga):Tilde Language Technologies (C:Andrejs Vasiljevs)
University of Latvia (Riga): Institute of Mathematics and ComputerScience (C:Inguna Skadina)
LLiitthhuuaanniiaa ((NNCCPP:: RRuuttaa MMaarrcciinneekkiieevviicciieennee)) Vytautas Magnus University(Kaunas): Center of Computational Linguistics (C:RutaMarcinkeviciene)
Institute of the Lithuanian Language (Vilnius): (C:Daiva Vaisniene)LLuuxxeemmbboouurrgg ((NNCCPP:: --)) European Language Resources Association
(ELRA) (Luxembourg): (C:S.Piperidis/K.Choukri)MMaallttaa ((NNCCPP:: MMiikkee RRoossnneerr)) University of Malta (Malta): Department of
Computer Science (C:Michael Rosner)NNeetthheerrllaannddss ((NNCCPP:: JJaann OOddiijjkk)) Meertens Institute (Amsterdam):
Meertens Institute (C:H.J. Bennis)Univerity of Amsterdam (Amsterdam): Intelligent Systems Lab
Amsterdam (ISLA) (C:Maarten de Rijke)Vrije Universiteit Amsterdam (Amsterdam): Computational Lexicology,
Faculteit der Letteren (C:Piek Vossen)Data Archiving and Networked Services (Den Haag): (C:Henk
Harmsen)Huygens Instituut KNAW (Den Haag): (C:K.van Dalen-Oskam)University of Twente (Enschede): Human Media Interaction Group,
Department of Electrical Engineering, Mathematics and ComputerScience (C:Roeland Ordelman)
University of Gröningen (Gröningen): Faculty of Arts, Center forLanguage and Cognition (C:Wyke van der Meer)
Digital Library for Dutch Literature (Leiden): (C:C.A. Klapwijk)Institute for Dutch Lexicology (Leiden): Instituut voor Nederlandse
Lexicologie (C:Remco van Veenendaal)Universiteit Leiden (Leiden): Leiden University Centre for Linguistics,
Faculty of Humanities (C:Jeroen van de Weijer)Max Planck Institute for Psycholinguistics (Nijmegen): (C:Peter
Wittenburg)Radboud University (Nijmegen): Centre for Language and Speech
Technology (C:L. Boves / N. Oostdijk); Centre for Language Studies(C:Pieter Muysken)
Tilburg University (Tilburg): ILK Research Group, Department ofCommunication and Information Sciences, Faculty of Humanities(C:Antal van den Bosch)
University of Utrecht (Utrecht): Utrecht Institute of Linguistics OTS,Faculty of Humanities (C:Steven Krauwer)
NNoorrwwaayy ((NNCCPP:: KKooeennrraaaadd DDee SSmmeeddtt)) Norwegian School of Economicsand Business Administration (NHH) (Bergen): (C:Gisle Andersen)
Unifob AS (Bergen): (C:Eli Hagen)University of Bergen (Bergen): Language Models and Resources group
(C:Koenraad de Smedt)SINTEF (Oslo): (C:Diana Santos)The Language Council of Norway (Oslo): (C:Torbjoerg Breivik)The National Library of Norway (Oslo) (C:Kristin Bakken)University of Oslo (Oslo): Department of Linguistics and Nordic
Studies, Faculty of Humanities (C:Janne Bondi Johannessen)University of Tromsø (Tromsø): Det humanistiske fakultet (C:Trond
Trosterud)Norwegian University of Science and Technology (Trondheim):
Department of Electronics and Telecommunications (C:TorbjørnSvendsen)
PPoollaanndd ((NNCCPP:: MMaacciieejj PPiiaasseecckkii)) University of Lodz (Lodz): Institute ofEnglish Language (C:Piotr Pezik)
Polish Academy of Sciences (Warsaw): Institute of Computer Science,Department of Artificial Intelligence (C:Adam Przepiórkowski);Institute of Slavic Studies (C:Violetta Koseska-Toszewa)
Polish-Japanese Institute of Information Technology (Warsaw):(C:Krzysztof Marasek)
University of Wroclaw (Wroclaw): Instytut Informatyki Stosowanej(C:Maciej Piasecki)
Wroclaw University of Technology (Wroclaw): Institute of AppliedInformatics (C:Maciej Piasecki)
PPoorrttuuggaall ((NNCCPP:: AAnnttoonniioo BBrraannccoo)) Universidade Católica Portuguesa(Braga): Centro de Estudos Filosóficos e Humanísticos (C:AugustoSoares da Silva)
University of Minho (Braga): Centro de Estudos Humanísticos (C:PilarBarbosa)
New University of Lisbon (Caparica): Faculdade de Ciências eTecnologia (C:José Gabriel Pereira Lopes)
Instituto de Telecomunicações (Coimbra): Pólo de Coimbra(C:Fernando Perdigão)
University of Coimbra (Coimbra): Centro de Esudos de LinguísticaGeral e Aplicada (CELGA ) (C:Cristina Martins); Centro deInvestigação do Núcleo de Estudos (C:José Augusto SimõesGonçalves Leitão)
Universidade de Évora (Évora): School of Sciences and Technology(C:Paulo Quaresma)
INESC-ID, Instituto de Engenharia de Sistemas e ComputadoresInvestigação e Desenvolvimento em Lisboa (Lisboa) (C:NunoMamede)
Instituto de Linguística Teórica e Computacional (Lisbon): (C:MargaritaCorreia)
New University of Lisbon (Lisbon): Centro de Linguística (C:MariaFrancisca Xavier)
University of Lisbon (Lisbon): Centro de Linguística da Universidade deLisboa (CLUL) (C:Amália Mendes); Natural Language and SpeechGroup (NLX-Group), Department of Informatics (C:António Branco)
University of Azores (Ponta Delgada (Azores)): (C:Luis Mendes Gomes)
University of Porto (Porto): Centro de Linguística (C:Fátima Oliveira);Laboratory of Artificial Intelligence and Computer Science(C:Miguel Filgueiras)
RRoommaanniiaa ((NNCCPP:: DDaann TTuuffiiss,,)) Romanian Academy of Sciences(Bucharest): Research Institute for Artificial Intelligence (C:DanTufis,)
University Babes-Bolyai (Cluj-Napoca): Faculty of Mathematics andComputer Science (C:Tatar Doina)
“Al. I. Cuza” University of Ias,i (Ias,i): Faculty of Computer Science(C:Dan Cristea)
Romanian Academy of Sciences (Ias,i): Institute of Computer Science(C:Horia-Nicolai Teodorescu)
University of Pites,ti (Pites,ti): Faculty of Letters (C:Mihaela Mitu)West University of Timis,oara (Timis,oara): Faculty of Mathematics and
Informatics (C:Viorel Negru)SSeerrbbiiaa ((NNCCPP:: --)) University of Belgrade (Belgrade): Faculty of
Mathematics (C:Du{ko Vitas)SSlloovvaakkiiaa ((NNCCPP:: --)) Slovak Academy of Sciences (Bratislava): L’. Štúr
Institute of Linguistics (C:Radovan Garabík)SSlloovveenniiaa ((NNCCPP:: TToommaa`̀ EErrjjaavveecc)) Alpineon d.o.o. (Ljubljana): (C:Jerneja
Zganec Gros)Josef Stefan Institute (Ljubljana): Dept. of Knowledge Technologies
(C:Toma� Erjavec)SSppaaiinn ((NNCCPP:: NNúúrriiaa BBeell)) University of Alicante (Alicante):
Departamento de Lenguajes y Sistemas Informáticos (C:PatricioMartínez-Barco)
Institut d'Estudis Catalans (Barcelona) (C:Joan Soler i Bou)Technical University of Catalonia (UPC) (Barcelona): Centro de
Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)(C:Asuncion Moreno)
Universitat Autònoma de Barcelona (Barcelona): Facultat de Filosofiai Lletres, Dpt. de Filologia Anglesa i de Germanística (C:AnaFernández Montraveta)
Universitat de Barcelona (Barcelona): Departament de LingüisticaGeneral (C:Irene Castellón)
Universitat Oberta de Catalunya (Barcelona): Department ofLanguages and Cultures (C:Salvador Climent)
Universitat Pompeu Fabra (Barcelona): Institut Universitari deLingüística Aplicada (C:Núria Bel)
University of Barcelona (Barcelona): Facultat de Filologia - RamonLlull Documentation Centre (C:Joana Alvarez)
Autonomous University of Barcelona (Bellaterra): Facultad de Letras,Dept. Filología Española (C:Carlos Subirats)
Girona City Council (Girona): Records Management, Archives andPublications Service (C:Joan Boadas i Raset)
University of Jaén (Jaén): Escuela Politécnica Superior, Departamentode Informática, SINAI group (C:María Teresa Martín Valdivia)
University of the Basque Country (Leioa): Computer Science Faculty,Natural Language Processing Group (C:Arantza Diaz de Ilarraza)
University of Lleida (Lleida): Departament d'Angles i Linguistica(C:Gloria Vázquez)
Autonomous University of Madrid (Madrid): Laboratorio de LingüísticaInformática (C:Manuel Alcantara Plá)
University of Málaga (Málaga): Facultad de Filosofía y Letras, Dept.of English, French, and German Philology (C:Antonio Moreno Ortiz)
Universidad Politécnica de Valencia - ITACA (Valencia): Grid and HighPerformance Computing Group (C:Vicente Hernández García)
University of Vigo (Vigo): Facultade de Filoloxía e Tradución,Department of English, Research group LVTC (C:Javier Perez-Guerra); Seminario de Lingüística Informática, Departamento deTradución e Lingüística, TALG Research Group (C:Xavier GómezGuinovart)
University of Zaragoza (Zaragoza): Facultad de Filosofía y Letras(C:Carmen Pérez-Llantada)
SSwweeddeenn ((NNCCPP:: LLaarrss BBoorriinn)) University of Gothenburg (Gothenburg):Department of Linguistics, Faculty of Arts (C:Anders Eriksson);Språkbanken, Dept. of Swedish Language (C:Lars Borin)
Linköping University (Linköping): Department of Computer andInformation Sciences (C:Lars Ahrenberg)
Lund University (Lund): Humanities Laboratory (C:Sven Strömqvist)KTH Royal Institute of Technology (Stockholm): Department of
Speech, Music and Hearing, CSC (C:Rolf Carlson)Language Council of Sweden (Stockholm): (C:Rickard Domeij)Swedish Institute of Computer Science AB (Stockholm): (C:Björn
Gambäck)Umeå University (Umeå): HUMlab (C:Patrik Svensson)Uppsala University (Uppsala): Department of Linguistics and
Philosophy (C:Joakim Nivre)TTuurrkkeeyy ((NNCCPP:: GGüüllss,,eenn EErryyiiggvviitt)) Istanbul Technical University (Istanbul):
Elektrics-Electronics Faculty, Computer Science Department,Natural Language Processing Group (C:Güls,en Eryigvit)
Sabanci University (Istanbul): Human Language and SpeechLaboratory, Faculty of Engineering and Natural Sciences (C:KemalOflazer)
UUnniitteedd KKiinnggddoomm ((NNCCPP:: MMaarrttiinn WWyynnnnee)) Bangor University (Bangor):Language Technologies Unit (C:Briony Williams)
University of Birmingham (Birmingham): Department of English(C:Oliver Mason)
University of Surrey (Guildford): Department of Computing, Faculty ofEngineering and Physical Science (C:Lee Gillam)
Lancaster University (Lancaster): Department of Linguistics andEnglish Langauge (C:Paul Rayson)
National Centre for Text Mining (Manchester): National Centre for TextMining (C:Bill Black)
Oxford Text Archive (Oxford): Oxford University Computing Services(C:Martin Wynne)
University of Sheffield (Sheffield): Natural Language Processinggroup, Department of Computer Science (C:Wim Peters)
University of Wolverhampton (Wolverhampton): Research Institute ofInformation and Language Processing (C:Constantin Orasan)
CCLLAARRIINN nneewwsslleetttteerr •• ppuubblliisshheerr CCLLAARRIINN •• eeddiittoorr--iinn--cchhiieeff MMaarrkkoo TTaaddii}} •• eeddiittoorriiaall bbooaarrdd GGeerrhhaarrdd BBuuddiinn,, DDaann CCrriisstteeaa,, KKooeennrraaaadd DDee SSmmeeddtt,, LLaauurreenntt RRoommaarryy,,
MMaarrttiinn WWyynnnnee •• hhttttpp::////wwwwww..ccllaarriinn..eeuu