Multilingual Information Services in the area of agricultural data
The AGRIS use case
Fabrizio Celli – FAO of the UN – 06/02/2014
OVERVIEW
The setting
• The AGRIS database is a collection of more than 7.7 million bibliographic references in the agricultural domain
• They are enhanced by the AGROVOC thesaurus, which is extensively used by cataloguers to enrich data indexing in agricultural information systems
• AGRIS is an RDF-aware system (http://agris.fao.org), a mashup application that allows users to query the AGRIS-RDF content, interlinking all records to external sources of information
• 7 million bibliographic records become 7 million mashup pages!
Some statistics
• 7.7 million bibliographic references
• 190 million triples
• ~300,000 visits/month
• Used worldwide (accessed from more than 200 countries)
How data come to AGRIS
• Centralization: bibliographic references in the AGRIS domain (agriculture, forestry, animal husbandry, aquatic sciences and fisheries, and human nutrition)
• Interlinking: other kinds of information related to the AGRIS domain (statistics, maps, country profiles, etc.)
Data consumption
• AGRIS consumes metadata provided by the community and publishes them as open data
• Metadata are captured either by pulling data, i.e. harvesting from clients (e.g. aggregators or institutional repositories, using protocols such as OAI-PMH)
• or by having clients push data to AGRIS (e.g. national libraries or journal publishers)
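The pull model can be illustrated with an OAI-PMH harvest. This is a minimal sketch, not the AGRIS harvester: it builds a ListRecords request URL and reads record identifiers and the resumption token from a response. The verb, the arguments and the XML namespace come from the OAI-PMH 2.0 protocol; the repository URL and the sample response are made up for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up response, shaped like an OAI-PMH ListRecords reply (metadata omitted).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In the protocol, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (list of record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default=None,
                          namespaces=OAI_NS)
    return ids, (token or None)  # an empty token element means "no more pages"


ids, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://repository.example.org/oai",
                            resumption_token=token)
```

A real harvester would fetch each URL over HTTP and repeat until no resumption token is returned.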
Accept any input format!
The AGRIS metadata format
• AGRIS tries to accept any input format
• The AGRIS input module is responsible for the translation of the source input format to the AGRIS RDF
• The translation currently requires an intermediate step, in which metadata are converted to the AGRIS AP, a metadata standard based on Dublin Core
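The two-step translation (source format to AGRIS AP, then AGRIS AP to RDF) can be pictured roughly as follows. This is only a sketch under stated assumptions: the source field names, the record URI and both mapping tables are hypothetical, and the choice of Dublin Core Terms predicates is an assumption rather than the actual AGRIS RDF model.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from one provider's field names to AGRIS AP elements.
SOURCE_TO_AP = {
    "article_title": "dc:title",
    "summary": "dcterms:abstract",
    "lang": "dc:language",
}

# Assumed mapping from AGRIS AP elements to RDF predicates.
AP_TO_PREDICATE = {
    "dc:title": DCTERMS + "title",
    "dcterms:abstract": DCTERMS + "abstract",
    "dc:language": DCTERMS + "language",
}


def to_agris_ap(source_record):
    """Step 1: rename source fields to AGRIS AP elements; drop unknown fields."""
    return {SOURCE_TO_AP[key]: value
            for key, value in source_record.items() if key in SOURCE_TO_AP}


def to_triples(record_uri, ap_record):
    """Step 2: emit (subject, predicate, object) triples for the AP record."""
    return [(record_uri, AP_TO_PREDICATE[element], value)
            for element, value in ap_record.items()]


source = {"article_title": "A title", "summary": "An abstract", "internal_id": 7}
ap = to_agris_ap(source)
triples = to_triples("http://example.org/record/1", ap)
```

Keeping the intermediate step means each new input format needs only one mapping to AGRIS AP, while the AP-to-RDF step is written once.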
MULTILINGUAL METADATA
Multilingual metadata
• 80% of AGRIS references have English content: title, abstract, etc.
• Most of the time, when a reference comes in another language, an English translation is provided for both the abstract and the title
• Data providers send us multilingual records, in which English is effectively the default
<dc:title xml:lang="en">Effects of straw returned to the field on growth and …</dc:title>
<dc:title xml:lang="Zh">砂姜黑土区秸秆还田对玉米生育及水分利用效率的影响</dc:title>
<dc:creator>
  <ags:creatorPersonal>Shen Xueshan, Anhui Agricultural University, Hefei(China)</ags:creatorPersonal>
  <ags:creatorPersonal>Li Jincai, Anhui Agricultural University, Hefei(China)</ags:creatorPersonal>
  <ags:creatorPersonal>Qu Huijuan, Anhui Agricultural University, Hefei(China)</ags:creatorPersonal>
</dc:creator>
<dc:date>
  <dcterms:dateIssued>Apr. 2011</dcterms:dateIssued>
</dc:date>
<dc:subject>
  <ags:subjectClassification scheme="ags:ASC">F01</ags:subjectClassification>
  <ags:subjectThesaurus scheme="ags:AGROVOC">…</ags:subjectThesaurus>
</dc:subject>
<dc:description>
  <dcterms:abstract xml:lang="Zh">摘要:为了在淮北砂姜黑土区推广小麦玉米秸秆全量还田技术,采用大田定位试验,设置小麦玉米秸秆不还田、小麦玉米秸秆单季还田和小麦玉米秸秆两季还田 4 种秸秆还田方式,研究了小麦、玉米秸秆全量粉碎还田对机播夏玉米出苗、 ...</dcterms:abstract>
  <dcterms:abstract xml:lang="En">The effects of straw returned to the field which including no straw returning (CK), wheat straw returning (T1), maize straw returning (T2) and wheat and maize straw returning (T3) on emergence, growth...</dcterms:abstract>
</dc:description>
<dc:language scheme="ags:ISO639-1">Zh</dc:language>
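A record like this can be read so that titles and abstracts are grouped by language. This is a minimal sketch using Python's standard library. The wrapping <record> element is hypothetical (the fragment on the slide has no root element or namespace declarations), and the abstract text is shortened. The Dublin Core namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical wrapper element around a shortened copy of the record.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title xml:lang="en">Effects of straw returned to the field on growth and …</dc:title>
  <dc:title xml:lang="Zh">砂姜黑土区秸秆还田对玉米生育及水分利用效率的影响</dc:title>
  <dc:description>
    <dcterms:abstract xml:lang="En">The effects of straw returned to the field ...</dcterms:abstract>
  </dc:description>
</record>"""


def by_language(root, path):
    """Map lower-cased language tag -> text for every element matching path."""
    result = {}
    for element in root.findall(path, NS):
        # Language tags are case-insensitive, so "Zh" and "zh" are equivalent.
        lang = element.get(XML_LANG, "").lower()
        result[lang] = (element.text or "").strip()
    return result


root = ET.fromstring(RECORD)
titles = by_language(root, "dc:title")
abstracts = by_language(root, "dc:description/dcterms:abstract")
```

Normalizing the case of the language tag matters here because the sample mixes "en", "En" and "Zh".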
What about AGROVOC?
• AGRIS records are indexed with the AGROVOC thesaurus, the FAO multilingual vocabulary containing more than 40 000 concepts in 21 languages
• Each record can contain one or more AGROVOC strings in a specific language
• The translation to RDF makes it possible to assign AGROVOC URIs to AGRIS records
• From an AGROVOC URI, the user can extract a great deal of information, such as the translations of AGROVOC strings into many languages
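The step from a concept URI to its labels in several languages can be sketched as a lookup over language-tagged label triples. This is an illustration, not the AGROVOC API: the triples are held in a plain list standing in for an RDF store, the concept identifier is a placeholder rather than a real AGROVOC ID, and the use of skos:prefLabel assumes the thesaurus is modelled in SKOS. The four labels are the ones quoted later in this presentation.

```python
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Placeholder concept URI: "c_EXAMPLE" is not a real AGROVOC identifier.
CONCEPT = "http://aims.fao.org/aos/agrovoc/c_EXAMPLE"

# (subject, predicate, (label, language)) triples standing in for an RDF store.
TRIPLES = [
    (CONCEPT, SKOS_PREF_LABEL, ("Aromatic compounds", "en")),
    (CONCEPT, SKOS_PREF_LABEL, ("Composti aromatici", "it")),
    (CONCEPT, SKOS_PREF_LABEL, ("Compuestos aromáticos", "es")),
    (CONCEPT, SKOS_PREF_LABEL, ("芳香类", "zh")),
]


def labels_for(concept_uri, triples=TRIPLES):
    """Return {language: preferred label} for one concept URI."""
    return {lang: label
            for subject, predicate, (label, lang) in triples
            if subject == concept_uri and predicate == SKOS_PREF_LABEL}


labels = labels_for(CONCEPT)
```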
MULTILINGUALITY PROBLEMS AND NEEDS
The scope of this presentation
• Multilinguality problems and needs for the AGRIS online service
Two issues
• Displaying multilingual information
• Multilingual search
Displaying multilingual information
• AGRIS can display its content in all the languages available in the source metadata
• For other languages, a naive translation is provided by the Google translator gadget (this step could be improved)
• http://agris.fao.org/agris-search/search.do?request_locale=en&recordID=CN2012002999
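The display rule described here can be sketched as a small selection function: use the requested locale when the source metadata has it, otherwise fall back to another language and flag the text for machine translation. The function name, the English fallback and the return shape are assumptions made for illustration, not the AGRIS implementation.

```python
def pick_display_text(texts_by_lang, requested, fallback="en"):
    """Return (text, language, needs_machine_translation).

    texts_by_lang maps a language tag to the text available in the source
    metadata; tags are compared case-insensitively.
    """
    if not texts_by_lang:
        raise ValueError("record has no text in any language")
    normalized = {lang.lower(): text for lang, text in texts_by_lang.items()}
    wanted = requested.lower()
    if wanted in normalized:
        return normalized[wanted], wanted, False
    if fallback in normalized:
        # Not available in the source: show the fallback and translate it.
        return normalized[fallback], fallback, True
    lang, text = next(iter(normalized.items()))
    return text, lang, True


titles = {
    "en": "Effects of straw returned to the field on growth and …",
    "Zh": "砂姜黑土区秸秆还田对玉米生育及水分利用效率的影响",
}
zh_title = pick_display_text(titles, "zh")
fr_title = pick_display_text(titles, "fr")
```

In this sketch a request for French yields the English title together with a flag saying that a translation step (such as the translation gadget) is still needed.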
Multilingual search
• Currently not available
• AGRIS records are indexed with AGROVOC keywords in a specific language
• The translation to RDF provides AGROVOC URIs, which could be used to perform a multilingual search
• Currently, only AGROVOC strings go to the Apache Solr index
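One way to picture URI-based search is an index document that carries the concept URI next to the indexing string. This is a sketch under stated assumptions: the `agrovoc_uri` field name, the string-to-URI lookup table, the record identifiers and the placeholder concept URI are all hypothetical, and plain Python dictionaries stand in for Solr documents.

```python
# Placeholder concept URI: "c_EXAMPLE" is not a real AGROVOC identifier.
CONCEPT = "http://aims.fao.org/aos/agrovoc/c_EXAMPLE"

# Hypothetical lookup from an indexing string to its concept URI.
STRING_TO_URI = {
    "AROMATIC COMPOUNDS": CONCEPT,
    "COMPUESTOS AROMATICOS": CONCEPT,
}


def to_index_doc(record_id, agrovoc_strings):
    """Build a Solr-style document holding both strings and concept URIs."""
    uris = sorted({STRING_TO_URI[s.upper()]
                   for s in agrovoc_strings if s.upper() in STRING_TO_URI})
    return {"id": record_id,
            "agrovoc": list(agrovoc_strings),
            "agrovoc_uri": uris}


def search_by_uri(docs, uri):
    """Return the ids of documents indexed with the given concept URI."""
    return [doc["id"] for doc in docs if uri in doc["agrovoc_uri"]]


docs = [
    to_index_doc("rec-en", ["Aromatic compounds"]),
    to_index_doc("rec-es", ["Compuestos aromaticos"]),
    to_index_doc("rec-other", ["Extraction"]),
]
hits = search_by_uri(docs, CONCEPT)
```

Because both records resolve to the same URI, a search on the URI finds them regardless of the language used at indexing time.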
An example of the issue
• The search +agrovoc:(AROMATIC COMPOUNDS) +agrovoc:(EXTRACTION) returns 467 results, but they do not include the article «Degradacion de compuestos aromaticos por microorganismos y sus aplicaciones biotecnologicas», which was indexed with the Spanish term «Compuestos aromaticos»
A possible need
• It would be great for the AGRIS community if, when the user looks for «Aromatic compounds», the system also returned records indexed with «Composti aromatici», «Compuestos aromáticos», «芳香类», etc.
• AGROVOC could help
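A query-time alternative is to expand the user's term into the labels of the same concept in every language before the query reaches the index. This is a hedged sketch, not an AGRIS feature: the label table is a hand-written stand-in for an AGROVOC lookup, the function name is hypothetical, and only the `agrovoc` field name and the four labels come from this presentation.

```python
# Hand-written stand-in for the labels of one AGROVOC concept.
AROMATIC_COMPOUNDS = {
    "en": "Aromatic compounds",
    "it": "Composti aromatici",
    "es": "Compuestos aromáticos",
    "zh": "芳香类",
}


def expand_query(term, concepts, field="agrovoc"):
    """Build a required query clause matching the term in every known language.

    If the term is not a label of any known concept, the clause contains
    only the term itself.
    """
    for labels in concepts:
        if term.casefold() in (label.casefold() for label in labels.values()):
            phrases = " OR ".join('"{}"'.format(label)
                                  for label in labels.values())
            return "+{}:({})".format(field, phrases)
    return '+{}:("{}")'.format(field, term)


clause = expand_query("aromatic compounds", [AROMATIC_COMPOUNDS])
```

Whether the accented «Compuestos aromáticos» also matches records indexed as «Compuestos aromaticos» depends on how the index analyzes text, which this sketch does not cover.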
Thank you!