AGRIS – exploiting bibliographic records to create rich Linked Open Data pages

AIMS Webinar

Fabrizio Celli, Johannes Keizer

Presentations by AIMS is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Release of AGRIS 2.0: Searching agricultural bibliographic data


http://agris.fao.org

Outline

AGRIS network and dataflow

Data consumption

• Centralization

• Interlinking

Provenance


AGRIS

The AGRIS database is a collection of more than 7.7 million bibliographic records in the agricultural domain

They are enhanced by the AGROVOC thesaurus, which is extensively used by cataloguers to enrich data indexing in agricultural information systems

AGRIS is an RDF-aware system, a mashup application that allows users to query the AGRIS-RDF content, interlinking all records to external sources of information

7 million bibliographic records become 7 million mashup pages!


AGRIS data consumption

Centralization: bibliographic references in the AGRIS domain (agriculture, forestry, animal husbandry, aquatic sciences and fisheries, and human nutrition)

Interlinking: other kinds of information related to the AGRIS domain (statistics, maps, country profiles, etc.)


Data consuming

AGRIS consumes metadata provided by the community and publishes it as open data

The metadata is captured in one of two ways:

• pull: AGRIS harvests data from clients (e.g. aggregators, institutional repositories) using protocols such as OAI-PMH

• push: clients (e.g. national libraries or journal publishers) send their data to AGRIS
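The pull route can be illustrated with a minimal OAI-PMH sketch. It only builds a `ListRecords` request URL and parses a response; it is not the AGRIS harvester. The base URL, the sample response, and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response (not real AGRIS data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would call `build_list_records_url` again with the returned token until the repository stops sending one.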


Interoperability: accept any input format!


AGRIS data flow


Centralization: Data processing

Random samples of the metadata are checked manually for inconsistencies or recurring semantic errors

The input format is mapped to AGRIS RDF

Metadata are converted to AGRIS RDF; the AgroTagger is run when AGROVOC keywords are not available

Before adding metadata to the triplestore and indexing them in the Solr index, duplicates are detected and managed, as the same record may be indexed in multiple collections or be duplicated in the same repository
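The slides do not say how duplicates are detected. One possible heuristic, sketched here purely as an assumption, is to compare records on a normalized title plus year before loading them. The field names `title` and `year` are hypothetical.

```python
import re
import unicodedata


def dedup_key(record):
    """Build a comparison key from title and year (a hypothetical heuristic)."""
    title = unicodedata.normalize("NFKD", record.get("title", ""))
    # Drop combining marks so accented and unaccented spellings compare equal.
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    # Lowercase and collapse punctuation and whitespace into single spaces.
    title = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return (title, str(record.get("year", "")))


def split_duplicates(records):
    """Return (unique records, duplicates), keeping the first record per key."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = dedup_key(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates
```

A real pipeline would likely compare more fields (authors, source, identifiers) and decide how to merge duplicates rather than simply set them aside.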


AgroTagger

Not yet implemented

Maui is named after the Polynesian mythological hero and demi-god, who would transform himself into different kinds of birds to perform many of his exploits.


RDF-ization

bibo:Article

bibo:abstract

bibo:doi

bibo:isbn

bibo:presentedAt -> bibo:Conference -> dct:title

bibo:uri

dct:alternative

dct:creator -> foaf:Organization -> foaf:name

dct:creator -> foaf:Person -> foaf:name

dct:dateSubmitted

dct:description

dct:extent

dct:identifier

dct:language

dct:isPartOf

dct:issued

dct:publisher -> foaf:Organization -> foaf:name

dct:source

dct:subject

dct:title

dct:type

dct:rights

Choice of vocabularies and mapping!
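To make the mapping step concrete, here is a minimal sketch that turns a flat input record into N-Triples using a few of the properties listed above. The input field names, the subject URI, and the choice of plain literals for every value are assumptions. The actual AGRIS mapping is not shown in the slides.

```python
BIBO = "http://purl.org/ontology/bibo/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from input field names to properties from the list.
FIELD_TO_PROPERTY = {
    "title": DCT + "title",
    "abstract": BIBO + "abstract",
    "doi": BIBO + "doi",
    "language": DCT + "language",
    "issued": DCT + "issued",
}


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_uri, record):
    """Return N-Triples lines typing the record as bibo:Article."""
    lines = [f"<{subject_uri}> <{RDF_TYPE}> <{BIBO}Article> ."]
    for field, prop in FIELD_TO_PROPERTY.items():
        value = record.get(field)
        if value:
            literal = escape_literal(str(value))
            lines.append(f'<{subject_uri}> <{prop}> "{literal}" .')
    return lines
```

Supporting a new input format then mostly means writing a new field-to-property table, which is one way to read "accept any input format".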


RDF/XML snapshot


Provenance

Each AGRIS record has an identifier (ARN), which has a predefined structure and contains information on the data source together with the bibliographic record’s year of creation

“IT 2008 0 00091” refers to a record created in 2008 from a specific AGRIS data provider in Italy, whose progressive number is 91

Information about data providers is stored in the CIARD RING and triplified in the AGRIS centers dataset (each data provider has its own unique URI)
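The ARN structure described above can be sketched as a small parser. The pattern is inferred from the single example "IT 2008 0 00091" and is an assumption, not an official specification. The slides do not explain the one-digit field between the year and the progressive number, so it is kept verbatim.

```python
import re
from typing import NamedTuple


class Arn(NamedTuple):
    country: str      # two-letter code of the data provider's country
    year: int         # year the bibliographic record was created
    middle: str       # meaning not stated in the slides; kept as-is
    progressive: int  # progressive number of the record


# Layout assumed from the one example given in the slides.
ARN_PATTERN = re.compile(r"^([A-Z]{2})\s*(\d{4})\s*(\d)\s*(\d{5})$")


def parse_arn(arn):
    """Split an ARN such as 'IT 2008 0 00091' into its parts."""
    match = ARN_PATTERN.match(arn.strip())
    if match is None:
        raise ValueError(f"not a recognised ARN: {arn!r}")
    country, year, middle, progressive = match.groups()
    return Arn(country, int(year), middle, int(progressive))
```

For the example in the slides, `parse_arn("IT 2008 0 00091")` gives country `IT`, year `2008` and progressive number `91`.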


Storage system

AGRIS RDF is stored in Malaysia, at MIMOS (http://www.mimos.my/)

Triples are managed by the AllegroGraph triplestore (http://www.franz.com/agraph/allegrograph/)

A 90 GB machine is dedicated to the triplestore. Some months ago we used a 32 GB machine, but AllegroGraph went down at least once a month (pending processes, memory problems)

We ran tests with OWLIM and could move to that triplestore, or find another kind of solution


Interlinking

AGROVOC is the backbone

Align AGROVOC to other thesauri (skos:exactMatch, skos:closeMatch)

Discover SPARQL endpoints

Discover web services and APIs

Write the code and interlink!


The IFPRI case

A user queries the system

AGRIS record with AGROVOC keywords

At least one AGROVOC keyword is a country name

The system queries the IFPRI SPARQL endpoint (http://data.ifpri.org/sparql/) to retrieve the Global Hunger Index (GHI) and the child mortality rate for that country
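The flow of this case can be sketched as follows. The endpoint URL comes from the slides; everything else is an assumption. In particular, the `ex:` properties are invented placeholders because the slides do not show the IFPRI vocabulary, and the country set is a hypothetical stand-in for "AGROVOC keywords that are country names".

```python
from urllib.parse import urlencode

IFPRI_ENDPOINT = "http://data.ifpri.org/sparql/"  # endpoint named in the slides

# Hypothetical stand-in for the AGROVOC concepts that are country names.
COUNTRY_KEYWORDS = {"Italy", "Kenya", "India"}


def country_keywords(record_keywords):
    """Return the keywords of a record that are country names."""
    return [kw for kw in record_keywords if kw in COUNTRY_KEYWORDS]


def build_indicator_query(country):
    """Build a SPARQL query; the ex: vocabulary is invented for illustration."""
    safe = country.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX ex: <http://example.org/hypothetical#>\n"
        "SELECT ?ghi ?childMortality WHERE {\n"
        f'  ?c rdfs:label "{safe}" ;\n'
        "     ex:globalHungerIndex ?ghi ;\n"
        "     ex:childMortalityRate ?childMortality .\n"
        "}"
    )


def build_request_url(country):
    """Build a SPARQL Protocol GET URL carrying the query parameter."""
    query = build_indicator_query(country)
    return IFPRI_ENDPOINT + "?" + urlencode({"query": query})
```

For a record tagged with "maize", "Kenya" and "drought", only "Kenya" would trigger a request. The mashup page would then show the returned indicators next to the bibliographic record.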


Some numbers (02/12/2013)

7,636,069 bibliographic records

187,238,716 triples in the AGRIS records dataset: http://202.45.142.113:10035/repositories/agris

372,462 triples in the AGRIS serials dataset: http://202.45.142.113:10035/repositories/jad

11,414 triples in the AGRIS centers dataset: http://202.45.142.113:10035/repositories/centers


AGRIS RDF RECORD

AGROVOC

Thank you!