AGRIS: From a bibliographical database to a linked open data application, extending knowledge mining to the World Wide Web
Fabrizio Celli and Johannes Keizer – 04/11/2015
fabrizio celli johannes keizer http://aims.fao.org
Outline
• What is AGRIS?
• (S)Mash-up!
• Mining and indexing the web
WHAT IS AGRIS?
AGRIS
• The International System for Agricultural Science and Technology
• A collection of more than 8 million multilingual bibliographic resources
• A network of more than 150 institutions from 65 countries
• A Web portal (http://agris.fao.org/)
AGRIS 2001
AGRIS 2015
AGRIS users
• Researchers, professors, and graduate students looking for bibliographies
• Librarians and cataloguers
• Small journal publishers, professional associations, and conference organizers
• Government officers asking for reports on a specific topic
Impact
• It supports both developed and developing countries
• Accessed from more than 200 countries and territories
Google Analytics October 2015
Statistics
8,142,755 multilingual bibliographic records
• ~400,000 from Latin America
• ~150,000 from Africa
• ~760,000 from Asia + 400,000 links to CASDD (China)
253,286,038 triples
(S)Mash-up!
LOD infrastructure
In December 2013, AGRIS moved to the RDF world
Generation of mashup pages
• Users looking for specific topics can access a publication from the AGRIS database, combined with related resources extracted from other preselected datasets
• External resources are not only bibliographic metadata, but also distribution maps, statistics, germplasm accessions, and so on
The RDF-ization process
Translation of the AGRIS AP XML database to RDF
• Selection of existing vocabularies
• Data cleaning and normalization
• Index all records with the AGROVOC thesaurus
• Run the conversion and publish RDF data!
Selection of external datasets we want to interlink to AGRIS
AGRIS RDF
bibo:Article
bibo:abstract
bibo:doi
bibo:isbn
bibo:presentedAt -> bibo:Conference -> dct:title
bibo:uri
dct:alternative
dct:creator -> foaf:Organization -> foaf:name
dct:creator -> foaf:Person -> foaf:name
dct:dateSubmitted
dct:description
dct:extent
dct:identifier
dct:language
dct:isPartOf
dct:issued
dct:publisher -> foaf:Organization -> foaf:name
dct:source
dct:subject
dct:title
dct:type
dct:rights
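To make the property list concrete, here is a minimal Python sketch that serializes one record as N-Triples using a few of the properties above (bibo:Article, dct:title, dct:issued, dct:subject). The flat record dictionary, the function names, and the example.org URIs are hypothetical and chosen for illustration only; the slides do not show the actual AGRIS conversion code.

```python
# Namespace URIs for the vocabularies named on the slide.
BIBO = "http://purl.org/ontology/bibo/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value, lang=None):
    """Serialize a Python string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def record_to_ntriples(record):
    """Turn a flat record dict (hypothetical shape) into N-Triples lines."""
    subject = f"<{record['uri']}>"
    title = literal(record["title"], record.get("lang"))
    lines = [
        f"{subject} <{RDF_TYPE}> <{BIBO}Article> .",
        f"{subject} <{DCT}title> {title} .",
    ]
    if "issued" in record:
        lines.append(f"{subject} <{DCT}issued> {literal(record['issued'])} .")
    # One dct:subject triple per indexing concept (AGROVOC URIs in AGRIS).
    for concept_uri in record.get("subjects", []):
        lines.append(f"{subject} <{DCT}subject> <{concept_uri}> .")
    return lines


example = {
    "uri": "http://example.org/record/1",             # placeholder, not a real AGRIS URI
    "title": 'Rice yields under "dry" conditions',
    "lang": "en",
    "issued": "2011",
    "subjects": ["http://example.org/concept/rice"],  # placeholder concept URI
}
for line in record_to_ntriples(example):
    print(line)
```

N-Triples is used here only because it can be written without an RDF library; a real conversion would more likely rely on one.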
AGROVOC
• The FAO multilingual vocabulary containing around 32 000 concepts in up to 21 languages
• Backbone: the magic that allows the interlinking to external datasets
Two ways to implement the interlinking:
• Using AGROVOC formal alignments to other thesauri
• Querying external web services with scientific names
Relationships, Relationships
http://aims.fao.org/aos/agrovoc/c_1474.html
http://agris.fao.org
http://agris.fao.org/agris-search/search.do?recordID=PH2011000084
http://agris.fao.org/agris-search/search.do?recordID=PL2003002036
Mashup
From AGRIS to DBPedia
AGRIS URI -> dcterms:subject -> AGROVOC URI
AGROVOC URI -> skos:closeMatch / skos:exactMatch -> DBPedia URI
DBPedia URI -> dbpedia-owl:abstract -> DBPedia Abstract
DBPedia URI -> foaf:isPrimaryTopicOf -> Wikipedia URL
DBPedia URI -> foaf:depiction -> DBPedia Picture
Entry point!
AGROVOC is the backbone
SPARQL in action!
1. From an AGRIS URI, get the list of the AGROVOC URIs (dcterms:subject)
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?agr
    WHERE {
      <AGRIS_Uri> dct:subject ?agr .
    }
2. For each AGROVOC URI
2.1. Get skos:closeMatch and skos:exactMatch (formal alignments to other thesauri)
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?em ?cm {
      OPTIONAL { <AGROVOC_Uri> skos:exactMatch ?em } .
      OPTIONAL { <AGROVOC_Uri> skos:closeMatch ?cm } .
    }
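Steps 1 and 2.1 can be sketched in Python as two query templates plus a helper that reads the standard SPARQL JSON results format. This is an illustration under stated assumptions, not the AGRIS code: the function names, the URI check, the example.org URIs, and the hand-written sample response are all hypothetical, and sending the query to an endpoint is left out.

```python
import json

SUBJECTS_QUERY = """PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?agr WHERE { <%s> dct:subject ?agr . }"""

MATCHES_QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?em ?cm {
  OPTIONAL { <%s> skos:exactMatch ?em } .
  OPTIONAL { <%s> skos:closeMatch ?cm } .
}"""


def check_uri(uri):
    """Reject characters that may not appear inside a SPARQL <IRI>."""
    if any(ch in uri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"unsafe URI: {uri!r}")
    return uri


def subjects_query(agris_uri):
    """Step 1: query for the AGROVOC URIs of one AGRIS record."""
    return SUBJECTS_QUERY % check_uri(agris_uri)


def matches_query(agrovoc_uri):
    """Step 2.1: query for exact and close matches of one AGROVOC concept."""
    uri = check_uri(agrovoc_uri)
    return MATCHES_QUERY % (uri, uri)


def bound_values(results_json, *variables):
    """Collect the values bound to the given variables in SPARQL JSON results."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [row[var]["value"] for row in bindings for var in variables if var in row]


# Hand-written sample response, for illustration only.
sample = json.dumps({
    "head": {"vars": ["em", "cm"]},
    "results": {"bindings": [{
        "em": {"type": "uri", "value": "http://dbpedia.org/resource/Rice"},
        "cm": {"type": "uri", "value": "http://example.org/other/rice"},
    }]},
})
print(matches_query("http://example.org/agrovoc/concept"))
print(bound_values(sample, "em", "cm"))
```

Checking the URI before substitution keeps a malformed value from breaking out of the angle brackets in the query text.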
Get DBPedia
2.2. The Java code filters DBPedia URIs, to avoid adding a new FILTER in the SPARQL query (it’s heavy…)
2.3. For each DBPedia URI, query the DBPedia SPARQL endpoint to get information to display in an AGRIS widget
    PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?abs ?img ?wiki WHERE {
      OPTIONAL { <DBP_Uri> dbpedia-owl:abstract ?abs } .
      OPTIONAL { <DBP_Uri> foaf:depiction ?img } .
      OPTIONAL { <DBP_Uri> foaf:isPrimaryTopicOf ?wiki } .
      FILTER ( (lang(?abs) = "en") || (!bound(?abs)) )
    }
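The client-side filtering of step 2.2 and the query construction of step 2.3 might look like the sketch below. The slides say this logic lives in Java; Python is used here only for brevity. The function names are hypothetical, and the rule for recognizing a DBPedia URI (host dbpedia.org, path under /resource/) is an assumption, since the slides do not state the actual criterion.

```python
from urllib.parse import urlparse

DBPEDIA_QUERY = """PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?abs ?img ?wiki WHERE {
  OPTIONAL { <%(uri)s> dbpedia-owl:abstract ?abs } .
  OPTIONAL { <%(uri)s> foaf:depiction ?img } .
  OPTIONAL { <%(uri)s> foaf:isPrimaryTopicOf ?wiki } .
  FILTER ( (lang(?abs) = "en") || (!bound(?abs)) )
}"""


def is_dbpedia_resource(uri):
    """Assumed rule: host is dbpedia.org and the path starts with /resource/."""
    parsed = urlparse(uri)
    return (parsed.netloc == "dbpedia.org"
            and parsed.path.startswith("/resource/")
            and not any(ch in uri for ch in '<>" '))


def dbpedia_queries(matched_uris):
    """Keep only DBPedia URIs (step 2.2) and build one query for each (step 2.3)."""
    kept = [uri for uri in matched_uris if is_dbpedia_resource(uri)]
    return {uri: DBPEDIA_QUERY % {"uri": uri} for uri in kept}


matched = [
    "http://dbpedia.org/resource/Rice",
    "http://example.org/other/rice",        # placeholder for a non-DBPedia alignment
    "http://de.dbpedia.org/resource/Reis",  # language edition, excluded by the assumed rule
]
for uri, query in dbpedia_queries(matched).items():
    print(uri)
    print(query)
```

Filtering in application code mirrors the trade-off named in step 2.2: the triple store returns all alignments, and the cheap string test runs on the client.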
Bibliography
«Migrating bibliographic datasets to the Semantic Web: The AGRIS case». Stefano Anibaldi, Yves Jaques, Fabrizio Celli, Armando Stellato, Johannes Keizer. Semantic Web journal.
«OpenAGRIS: using bibliographical data for linking into the agricultural knowledge web». Fabrizio Celli, Stefano Anibaldi, Maria Folch, Yves Jaques, Johannes Keizer. AOS 2011.
Mining and indexing the web
The context
• Scientists and researchers publish their results not only in journals or at conferences, but also via web 2.0 tools and other media
• Corpora of ongoing research activities, unpublished material, grey literature, quick discussions, and experiments with negative results and ideas
• This information is usually unstructured and not exposed using web services
Goal
• Crawl the web (manually preselected websites)
• Machine learning algorithms to index discovered web resources using AGROVOC
• Select relevant resources using a recommender system
• Interlink to AGRIS!
Crawling and indexing
https://github.com/fcproj/agrotagger
Recommender system
• A Java component that computes meaningful intersections between the Crawler Database and the AGRIS database
• Offline process: recommendations are stored in a triplestore
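The slides do not define what makes an intersection "meaningful". As one possible reading, the sketch below scores each pair of an AGRIS record and a crawled web resource by the Jaccard overlap of their AGROVOC concept sets and keeps pairs above a threshold. The Jaccard measure, the threshold value, the function names, and the sample identifiers are all assumptions made for illustration; the actual component is written in Java and may work differently.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(agris_concepts, crawled_concepts, threshold=0.5):
    """Pair each AGRIS record with web resources that share enough concepts.

    Both arguments map an identifier to a set of AGROVOC concept identifiers.
    Returns (agris_id, web_id, score) tuples, best score first.
    """
    pairs = []
    for agris_id, concepts in agris_concepts.items():
        for web_id, tags in crawled_concepts.items():
            score = jaccard(concepts, tags)
            if score >= threshold:
                pairs.append((agris_id, web_id, score))
    return sorted(pairs, key=lambda p: (-p[2], p[0], p[1]))


# Hypothetical sample data: identifiers and concept names are placeholders.
agris = {"record-1": {"c_rice", "c_irrigation", "c_yield"}}
crawled = {
    "blog-post-a": {"c_rice", "c_irrigation"},
    "blog-post-b": {"c_maize"},
}
for agris_id, web_id, score in recommend(agris, crawled):
    print(agris_id, web_id, round(score, 2))
```

Because the process runs offline, the quadratic pairwise loop is acceptable for a sketch; the resulting pairs are what would then be written to the triplestore.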
Interlinking
https://github.com/fcproj/recommender
Bibliography
«Discovering, Indexing and Interlinking Information Resources». Fabrizio Celli, Johannes Keizer, Yves Jaques, Stasinos Konstantopoulos, Dušan Vudragović. F1000Research. Version 2 under revision.