TEXT MINING: THE NEXT DATA FRONTIER
An Infrastructural Approach
@openminted_eu
An EU infrastructure Project
Text mining – it seems so easy:
ICT2015 conference - Lisbon, 20-22 Oct
[Figure: a four-stage text mining pipeline (Stage 1 to Stage 4), labelled with Information Retrieval, NLP Analysis, Entity Recognition, Information Extraction, Data Mining and Knowledge Discovery]
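The staged pipeline on this slide can be illustrated with a toy example. This is a minimal sketch, not OpenMinTeD code: the assignment of labels to stages (retrieval, then entity recognition, then information extraction, then mining) is an assumption, and the corpus, the `GENES` lexicon, the "is associated with" pattern and all function names are invented for illustration.

```python
import re
from collections import Counter

# Toy corpus and lexicon, invented for illustration only.
DOCS = {
    "d1": "BRCA1 is associated with breast cancer.",
    "d2": "TP53 is associated with lung cancer. BRCA1 is associated with ovarian cancer.",
    "d3": "Wheat yield depends on rainfall.",
}
GENES = {"BRCA1", "TP53"}
PATTERN = re.compile(r"(\w+) is associated with ([\w ]+?)\.")


def retrieve(docs, keyword):
    """Stage 1 (assumed: information retrieval): keep documents mentioning the keyword."""
    return {doc_id: text for doc_id, text in docs.items()
            if keyword.lower() in text.lower()}


def recognise_entities(text):
    """Stage 2 (assumed: NLP analysis / entity recognition): split into
    sentences and tag tokens found in the lexicon."""
    sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]
    return [(s, [tok for tok in re.findall(r"\w+", s) if tok in GENES])
            for s in sentences]


def extract_relations(tagged_sentences):
    """Stage 3 (assumed: information extraction): turn tagged sentences
    into (gene, disease) pairs."""
    relations = []
    for sentence, entities in tagged_sentences:
        match = PATTERN.search(sentence)
        if match and match.group(1) in entities:
            relations.append((match.group(1), match.group(2)))
    return relations


def mine(relations):
    """Stage 4 (assumed: data mining / knowledge discovery): count how
    often each gene appears across the extracted relations."""
    return Counter(gene for gene, _ in relations)


def run_pipeline(docs, keyword):
    """Chain the four stages and return the relations plus their summary."""
    relations = []
    for text in retrieve(docs, keyword).values():
        relations.extend(extract_relations(recognise_entities(text)))
    return relations, mine(relations)


relations, counts = run_pipeline(DOCS, "cancer")
```

Even in this toy form, each stage depends on the output format of the one before it, which is where the challenges on the following slides come from.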
OPENMINTED = The Open Mining Infrastructure for Text and Data
But it actually poses many challenges…
Current TDM challenges for researchers
1. Content challenges
- Barriers and obstacles due to non-availability, technical restrictions, copyright law or licensing issues
- No uniform way to search for, retrieve and access content for TDM
2. Services challenges
How to identify the most fitting TDM service? Do I have permission to use it?
How to combine it with other TDM services I have access to? How to use them on my content?
3. Processing challenges
Where to deploy? Are my machines powerful enough?
How can I get access to powerful machines?
Where to store intermediate and final results?
How to ensure persistence of storage?
OpenMinTeD offers a solution for all TDM challenges:
It establishes an open and sustainable TDM infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific sources.
OpenMinTeD brings together:
- ACCESSIBLE CONTENT: Via standardised programmatic interfaces and access rules
- DISCOVERABLE SERVICES: Well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text
- EFFICIENT PROCESSING: Operates on public e-Infrastructures via standardised APIs
- TDM COMMUNITIES: Different scientific communities have different challenges
- VALUE ADDED APPS: Community-driven applications to illustrate the value of the infrastructure. Engage with industry.
The project
Starts: June 2015
Duration: 3 years
16 Partners:
- 6 mining research groups
- 3 content providers
- 1 data center
- 1 library association
- 2 legal experts
- 6 community related partners
- 2 SMEs
PARTNERS
Athena RIC, Univ. of Manchester (NaCTeM), Univ. of Darmstadt, INRA, EMBL-EBI, Agro-Know, LIBER, Univ. of Amsterdam, Open University UK, EPFL, CNIO, Univ. of Sheffield (GATE), GESIS, GRNET, Frontiers, Univ. of Stirling
Infrastructural approach
• OpenMinTeD does not build new services, but adopts and adapts existing services for new communities
• Focus on interoperability across text mining services and content providers
• Open & collaborative
The OpenMinTeD landscape
[Figure: architecture diagram with three interoperability layers]
Users: researchers, curators, text-miners and new services developers
Platform services: Registry, Workflow Management, Auth2 & Policy management, Annotator, Accounting
Layer 1: Interoperability of text mining services (platforms or components)
- Mining platforms, including proprietary architectures
Layer 2: Interoperability of language resources & corpora
- Language resources and corpora registry service
- Language resources
- Text corpora: Publisher, OpenAIRE/CORE, PMC, other text corpora, other types of text corpora
Layer 3: Interoperability to shared storage and computing resources
- Data centres, including a data centre in public cloud
Design
Interoperability framework
Bringing together mining tools, resources and content:
1. Content metadata & transfer standards
To document scientific literature, language resources, taxonomies and provenance, and to define transfer protocols for full text retrieval
2. Service metadata & pipelining
To document and classify text mining services: how they receive input, in what form they output their results, how they combine into workflows, and what granularity to consider.
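The role of service metadata in pipelining can be sketched as follows. This is a hypothetical illustration, not the OpenMinTeD metadata schema: the `ServiceDescription` fields, the format strings and the `validate_workflow` helper are all assumptions. The idea shown is only that, once each service declares what it accepts and what it produces, a workflow can be checked before it runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical metadata record for one text mining service."""
    name: str
    category: str       # classification, e.g. "tokenizer" or "entity-recogniser"
    input_format: str   # how the service receives input
    output_format: str  # in what form it outputs results
    granularity: str    # unit it works on: "document", "sentence" or "token"


def validate_workflow(services):
    """Check that each service's output format matches the next one's input.

    Returns a list of human-readable problems; an empty list means the
    services can be chained in the given order.
    """
    problems = []
    for upstream, downstream in zip(services, services[1:]):
        if upstream.output_format != downstream.input_format:
            problems.append(
                f"{upstream.name} outputs {upstream.output_format}, "
                f"but {downstream.name} expects {downstream.input_format}"
            )
    return problems


# Invented example services and format names.
tokenizer = ServiceDescription(
    "ExampleTokenizer", "tokenizer", "plain-text", "token-annotations", "token")
recogniser = ServiceDescription(
    "ExampleRecogniser", "entity-recogniser", "token-annotations",
    "entity-annotations", "sentence")

ok = validate_workflow([tokenizer, recogniser])      # compatible order
broken = validate_workflow([recogniser, tokenizer])  # incompatible order
```

A real interoperability framework would need richer compatibility rules (for example format conversion or matching granularity), which this sketch leaves out.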
3. IPR and licensing
To study IPR restrictions, describe license metadata for re-use of content and of TDM services & tools, and provide information on how to apply for academic and non-commercial mining research
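One way to picture machine-readable license metadata is a record per resource that states which kinds of use are allowed, so a planned mining job can be checked against every content source and service it touches. The sketch below is an assumption throughout: the `LicensedResource` fields, the use labels ("academic", "commercial") and the example entries are invented, and the permitted uses are supplied by hand rather than derived from what any real license legally allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicensedResource:
    """Hypothetical license metadata for a content source or a TDM service."""
    name: str
    kind: str                  # "content" or "service"
    license_label: str         # free-text label, not interpreted by the code
    permitted_uses: frozenset  # uses declared by the provider


def blocked_resources(resources, intended_use):
    """Return the names of resources whose metadata does not list the intended use."""
    return [r.name for r in resources if intended_use not in r.permitted_uses]


# Invented example entries.
corpus = LicensedResource(
    "Example open corpus", "content", "open license",
    frozenset({"academic", "commercial"}))
tagger = LicensedResource(
    "Example tagger", "service", "research-only license",
    frozenset({"academic"}))

academic_blockers = blocked_resources([corpus, tagger], "academic")
commercial_blockers = blocked_resources([corpus, tagger], "commercial")
```

Here an academic job could use both resources, while a commercial one would be stopped by the tagger's metadata.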
Working groups
1. Resource metadata: content, services, language resources
2. Text, lexica, terminologies and ontologies representation and access
3. IPR and licensing
4. Text annotation and text-mining services workflows
OpenMinTeD’s users
1. End users who will consume TM services
- Researchers, database curators, …
- Novice: use services to advance their science
- Advanced: include TM services into more complex research workflows (SMEs).
2. Content and service providers that will provide their content and/or TM services for consumption
- Publishers, libraries, scientific database centres, etc.
- TM research communities
- SMEs
Bottom-up approach
OpenMinTeD works with 4 use cases, which give their requirements and evaluate the results:
- Research analytics
- Social sciences
- Agriculture
- Life sciences
What can OpenMinTeD do for you?
Are you a content provider? (data centre, library, publisher, etc.)
OpenMinTeD helps you make your content available for mining
Register your collections in the OpenMinTeD registry, make them discoverable!
Go to www.openminted.eu
Are you a TDM service provider?
OpenMinTeD helps you share and collaborate with other TDM services
Register your TDM service in the OpenMinTeD registry, make it easily discoverable!
Go to www.openminted.eu
THANK YOU!
Go to: www.openminted.eu
to get involved!