Slides of my talk to members of the Agricultural Information Institute (AII) of the Chinese Academy of Agricultural Sciences (CAAS), on September 19th, 2014.
Agro-Know & the European agricultural research information ecosystem
Nikos Manouselis (PhD)
CEO, Agro-Know
www.agroknow.gr
ToC
• about me & Agro-Know
• our context of work
• building a European data e-infrastructure for agricultural research
• collaboration between CAAS AII & Agro-Know
about me
Nikos
• MSc, MEng, PhD
• >150 pubs
• 1 post-doc
• 1 project management position
• Agro-Know
Κρήτη (Crete)
• Crete is the largest and most populous of the Greek islands
• It forms a significant part of the economy and cultural heritage of Greece while retaining its own local cultural traits (such as its own poetry, and music)
• Crete was once the center of the Minoan civilization (circa 2700–1420 BC), which is currently regarded as the earliest recorded civilization in Europe
Minoan civilisation
• Named after King Minos
• A king of Crete, son of Zeus and Europa
Minoans: enemies with Athens
• Every nine years, King Minos of Crete made King Aegeus of Athens pick seven young boys and seven young girls to be sent to his palace, the labyrinth, to be eaten by the monster Minotaur (half man, half bull)
Theseus prince of Athens
princess Ariadne, daughter of Minos
so the myth is about navigating through a labyrinth
helping people navigate through agricultural information
An extraordinary company that captures, organizes and adds value to the rich information available in agricultural and biodiversity sciences, in order to make it universally accessible, useful and meaningful.
http://www.agroknow.gr
We develop and put into practice solutions that transform data into meaningful knowledge and services
We help people solve problems informed by data
[Figure: Agro-Know data framework. Labels in the diagram:]
• Input: unorganized content in local and remote sites
• Data Framework: Ingestion, Translation, Publication; Harvesting, Blossom, Cultivation; Enrichment
• Output: organized and structured content in local and remote DBs (educational, bibliographic, other)
• Services: widgets, authoring services, data discovery services, analytics services
Aggregate data from diverse sources
Works with different types of data
Prepare data for meaningful services
data aggregation & sharing solutions
working with high profile partners & clients
• Food and Agriculture Organization (FAO) of the United Nations
• World Bank Group
• UK’s Dept for International Development (DFID)
• Michigan State University (MSU)
• Wageningen University & Research (WUR)
• French Institute of Agricultural Research (INRA)
• Creative Commons
context
CIARD
• “towards a Knowledge Commons on Agricultural Research for Development”
• “agricultural knowledge is freely accessible and contributes to reducing hunger and poverty”
• “open knowledge makes it easier to provide better solutions”
http://www.ciard.net/about/manifesto
Open Knowledge Convening (February 2013)
• Open Knowledge for Agricultural Development Convening, hosted by MSU in February 2013
launch of RDA (March 2013)
• joint USA, EU, Australia Research Data Alliance
– “researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society”
• Interest Group on Agricultural Data Interoperability
– Wheat Data Interoperability Working Group
– Germplasm Data Interoperability Working Group
– …more
https://rd-alliance.org
G8 conference (April 2013)
“How Open Data can be harnessed to help meet the challenge of sustainably feeding nine billion people by 2050”
GODAN initiative
• “support global efforts to make agricultural and nutritionally relevant data available, accessible, and usable for unrestricted use worldwide”
• “advocate for the release and re-usability of data in support of Innovation and Economic Growth, Improved Service Delivery and Effective Governance, and Improved Environmental and Social Outcomes”
http://godan.info/statement.html
building a European data e-infrastructure for agricultural research
• Agricultural research can be broadly defined as any research activity aimed at improving productivity and quality of crops
– by genetic improvement, better plant protection, irrigation, storage methods, farm mechanization, efficient marketing, better management of resources, human development
[Loebenstein & Thottappilly, 2007]
agricultural research
• Primary data:
– Structured, e.g. datasets as tables
– Digitized: images, videos, etc.
• Secondary data (elaborations, e.g. a dendrogram)
• Provenance information, incl. authors, their organizations and projects
• Methods and procedures followed
• Reports, including papers
• Secondary documents, e.g. training resources
• Metadata about the above
• Social data, tags, ratings, etc.
agricultural research information
there is a lot of data
…but where do I start searching?
simple goal of agINFRA
• demonstrate how we can make information on European agricultural research
– more discoverable
– better linked
– interoperable & exchangeable
• focus on selected types of information (primarily bibliographic information, educational resources; also germplasm data, soil maps, …)
• collaboration cases with international partners (such as CAAS)
[Figure: agINFRA e-infrastructure. Labels in the diagram:]
• Registry of Datasets and APIs
• Registry of vocabularies and tools (VEST registry)
• LOD Vocabularies: AGROVOC, local KOSs, controlled lists (document types, data types, file formats (IANA +), protocols, audiences, licenses, etc.)
• agINFRA RDF vocabularies
• agINFRA LOD KOSs
• agINFRA data sources, collections and APIs: bibliographic, educational, germplasm, soil, datasets, APIs, etc.
• Information services, including:
– Grid jobs and Grid workflows: agKEA, ag@RDF, agHarvest…
– Public REST APIs: agHarvest, agTransform, agTagger
– Cloud / SaaS tools
• Productivity Tools
• Tools named: Omeka, AgriDrupal, AgriOcean DSpace, VocBench
• Shared URIs
• Actors (call APIs): data providers, information systems providers, researchers, taxonomists, policy makers, developers
actors over the infrastructure
new agINFRA RING
moving forward
[Figure: two aggregation architectures compared. Labels in each panel:]
NOW (2012), case of agricultural infrastructures
• OAI-PMH Service Provider #1 … #n, each with its own schema (Schema #1 … Schema #n): AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...
• HARVESTER
• Aggregated XML Repository
• INDEXER
• Web Portals: Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...
2015 (agINFRA), case of agricultural infrastructures
• SPARQL endpoint (Data Source #1) … SPARQL endpoint (Data Source #n)
• RDF Triple Store with a Common Schema
• INDEXER
• Web Portals
• SPARQL endpoint
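The harvesting step in the 2012 setup can be illustrated with a minimal sketch, assuming a standard OAI-PMH 2.0 interface. The base URL, the sample response and the function names are hypothetical and are not taken from the agINFRA code; only the `verb`, `metadataPrefix` and `set` request parameters and the `resumptionToken` element come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository (base_url is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Invented sample response so the sketch runs without network access.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://repository.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester built this way has to repeat the request with each resumption token and keep its own copy of every record, which is the cloning that the 2015 SPARQL-based setup aims to avoid.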
problem when scaling up
• enable the seamless federation of:
– large, live, constantly updated datasets and streams
– heterogeneous data
• involve data publishers that
– cannot or will not join a tight, centrally controlled distributed database
– cannot or will not directly and immediately make the transition to new vocabularies
the SemaGrow solution
• a SPARQL endpoint that federates several heterogeneous data sources
– client poses a query in their preferred schema
  • no need to know where to ask for what
  • no need to know the source’s schema
– by means of collecting and indexing meta-information about the data stored in each data source
• in this manner the data sources do not need to be cloned and re-hashed, and the way data is distributed among them does not need to be centrally controlled
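The idea of routing a query by indexed meta-information can be sketched as follows. This is an illustrative toy, not SemaGrow's actual algorithm: the `SourceSummary` structure, the endpoint URLs, the counts and the ranking by instance count are all assumptions made for the example. Only the two Dublin Core predicate URIs are real.

```python
from dataclasses import dataclass, field

DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


@dataclass
class SourceSummary:
    """Meta-information kept about one data source (hypothetical structure)."""
    endpoint: str
    # predicate URI -> number of triples using it (a simple instance statistic)
    predicate_counts: dict = field(default_factory=dict)


def select_sources(predicate, summaries):
    """Return endpoints that hold data for the predicate, largest count first."""
    matches = [s for s in summaries if s.predicate_counts.get(predicate, 0) > 0]
    # Assumption: prefer sources with more matching triples; ties by endpoint name.
    matches.sort(key=lambda s: (-s.predicate_counts[predicate], s.endpoint))
    return [s.endpoint for s in matches]


def plan_query(patterns, summaries):
    """Map each (subject, predicate, object) pattern to its candidate sources."""
    return {pattern: select_sources(pattern[1], summaries) for pattern in patterns}


# Invented summaries for two hypothetical SPARQL endpoints.
SUMMARIES = [
    SourceSummary("http://source-a.example.org/sparql",
                  {DCT_TITLE: 1200, DCT_SUBJECT: 300}),
    SourceSummary("http://source-b.example.org/sparql", {DCT_TITLE: 50}),
]

if __name__ == "__main__":
    patterns = [("?doc", DCT_TITLE, "?title"), ("?doc", DCT_SUBJECT, "?topic")]
    for pattern, sources in plan_query(patterns, SUMMARIES).items():
        print(pattern, "->", sources)
```

The client only states the patterns it wants; the index decides where each one is sent, so no source needs to be copied into a central store.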
[Figure: SemaGrow architecture. Components and data flows labelled in the diagram:]
• Client: SPARQL query in, query results out
• SemaGrow SPARQL endpoint
• Query Manager
• Query Decomposition (Query Decomposer): query patterns, set of query patterns
• Query Pattern Discovery Service: query pattern, equivalent patterns, Semantic Proximity
• Resource Discovery (Resource Selector, Data Source(s) Selector): Candidate Source(s) List based on Instance Statistics, Load Info and Semantic Proximity
• P-Store with POWDER Inference Layer: Instance Statistics, Data Summaries
• Query Transformation Service: Schema Mappings; query fragment and target source in, transformed query out; query results schema, transformed schema
• Federated endpoint Wrapper: query fragment and Source (#1 … #n); query request #1 … #n; SPARQL query to SPARQL endpoint (Data Source #1) … SPARQL endpoint (Data Source #n); query results
• Query Results Merger
• Control signals (Ctrl), Reactivity parameters
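Two of the boxes in the diagram, the Query Transformation Service and the Query Results Merger, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the mapping table, the `http://example.org/lom#title` predicate and both function names are hypothetical, and real schema mappings are usually richer than a one-to-one predicate rename.

```python
DCT_TITLE = "http://purl.org/dc/terms/title"
# Hypothetical predicate standing in for a source-specific schema.
LOM_TITLE = "http://example.org/lom#title"


def transform_pattern(pattern, mapping):
    """Rewrite the predicate of a (subject, predicate, object) pattern.

    Predicates without a mapping are passed through unchanged.
    """
    subject, predicate, obj = pattern
    return (subject, mapping.get(predicate, predicate), obj)


def merge_results(result_sets):
    """Concatenate result rows from several sources, dropping exact duplicates."""
    seen = set()
    merged = []
    for rows in result_sets:
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


if __name__ == "__main__":
    client_pattern = ("?doc", DCT_TITLE, "?title")
    # The client's schema is translated into the schema of one target source.
    print(transform_pattern(client_pattern, {DCT_TITLE: LOM_TITLE}))
    # Invented result rows from two sources, with one overlapping row.
    print(merge_results([
        [{"title": "Soil maps"}],
        [{"title": "Soil maps"}, {"title": "Wheat data"}],
    ]))
```

Keeping the transformation per source means each data provider can stay on its own vocabulary while the client keeps querying in the schema it prefers.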
what Semantic Web can bring into the picture
• One Data Access Point for the entire Data Cloud
– Enabling Service-Data level agreements with Data providers
• Application-level Vocabularies / Thesauri / Ontologies
– Enabling different application facets for different communities of users over the same data pool
• Going beyond existing Distributed Triple Store Implementations
– Link Heterogeneous but Semantically Connected Data
– Index Extremely Large Information Volumes (Peta Sizes)
– Improve Information Retrieval response
• Data (+Metadata) physically stored in Data Provider
– No need for harvesting
• Vocabularies / Thesauri / Ontologies of Data Provider choice
– No need for aligning according to common schemas
research challenges
• develop novel methods for querying distributed triple stores
– that can overcome the problems stemming from heterogeneity and the undetermined distribution of data over nodes
• develop scalable and robust semantic indexing algorithms
– that can serve detailed and accurate data source annotations (metadata) about extremely large datasets
what is next
similar/relevant efforts
• PubAg: forthcoming service by National Agricultural Library (NAL) for discovering USDA publications – and beyond
• LGU community of ag knowledge: forthcoming service federating institutional repositories of Land Grant Universities in the US
• CGIAR open: (to be) federating & providing access to publications and data from all CG center repositories
• …and maybe more to come
collaboration between CAAS AII & Agro-Know
a route for sharing knowledge
what happens when we are hosting?
we make a formal intro & present plans
then we eat
we do some work
we eat again
we drink a bit
we drink a bit more
and of course we eat
what will happen when you host us?
I have an idea…
who is next?
thank you!
[email protected]
blog.agroknow.gr