DESCRIPTION
Alessio Bosca (CELI) presented how CELI is exploiting linked data. The company focuses on speech applications, semantic search, text analytics, opinion mining and social media intelligence. Its core technology encompasses language processing steps such as language identification, morphological analysis and semantic analysis. CELI exploits the linked data in the LOD cloud a) as a user, for example for NER, and b) as a provider, for internal use and for crafting RDF artifacts. Two projects were addressed: a book project for the digital humanities and the Homer project, which provides multilingual interfaces for accessing data from different public administrations. Based on this work with linked open data, the LOD cloud community is advised to put more emphasis on truly linking the datasets. With regard to the public sector, it is suggested that more data should be published as linked open data and that international standards should be used. The issue of publishing companies’ linked data under an open license was also addressed. The speaker made the point that, besides the resistance to sharing because of valid competitive concerns, company data is generally over-fitted to the company's solutions and clients. In other words, companies need to be able to manage ‘micro-domains’, which are regarded as less useful in general. As a compromise, the audience suggested that companies should answer not the question of why they do not publish their linked data, but the question of what they could publish.
Citation preview
Linked Data for content analytics in Celi
Semantics 2014 - Leipzig
Alessio Bosca
Agenda
• Presentation of Celi
• Technologies (and what we do with them)
• Focus on LOD for content analytics in Celi
• … what we’d like to do
1999 CELI srl was born
2002 Speech Technology
2006 BlogMeter
2010 Milan, Rome, Trento
2011 Cross Library
2013 Korean Market
4 Seats: Torino, Milano, Trento, Roma
6 Markets: Italy, Belgium, France, Spain, Korea, Poland
50 Employees + Collaborators
>100 Active clients
4 Business branches: NLP components, Speech technology, Social Media Intelligence, Digital Humanities
15 Years of experience
Relationships with the scientific community
>50 Published papers
15 Research projects
6 Agreements with research centers: Scuola Normale Superiore, Università di Torino, Università di Pisa, Università di Trento, Fondazione Bruno Kessler, Politecnico di Milano
Core technology
• opinion mining, mood and sentiment analysis
• language identification
• normalization
• tokenization
• NSW processing
• morphological analysis
• disambiguation
• chunking and phrasing
• phonetic transcription with word stress
• semantic clustering
• automatic classification
• named entities
Techs
Guava
Kestrel
Virtuoso OpenSource
Clients
Speech Technology Semantic Solutions Social Media Monitoring
Linked (and/or Open) Data
[Figure: diagram relating Linked Data and Open Data; labels: Linked Data, Open Data, LOD, ?]
Private Sector: how Celi exploits L(O)D
• as user: LODs as linguistic resources for NER, content enrichment, machine linking, discovery search…
• as provider for the PA (public administration): publishing, data integration
• internal use (e.g. asset management)
• crafting of RDF artifacts for custom projects and applications
LOD for NER
• GENDER GUESSER
• LOCATION GUESSER
• ENTITY LINKER
• ETC.
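The slide does not say how these components are built. As a rough illustration of the general idea of using LOD as a linguistic resource for NER, the sketch below turns a handful of triples into a label gazetteer and uses it as a naive entity linker. The `ex:` URIs, the sample triples and the function names are all hypothetical, and this is not Celi's implementation; a real system would load labels from an LOD dump or a SPARQL endpoint and would need proper tokenization and disambiguation.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, predicate, object). In practice these
# would come from an LOD dump or a SPARQL endpoint, not a hard-coded list.
TRIPLES = [
    ("ex:Turin", "rdf:type", "ex:Place"),
    ("ex:Turin", "rdfs:label", "Turin"),
    ("ex:Turin", "rdfs:label", "Torino"),
    ("ex:Leipzig", "rdf:type", "ex:Place"),
    ("ex:Leipzig", "rdfs:label", "Leipzig"),
    ("ex:Homer", "rdf:type", "ex:Person"),
    ("ex:Homer", "rdfs:label", "Homer"),
]


def build_gazetteer(triples):
    """Map each lowercased label to the set of (uri, type) pairs it may denote."""
    types = defaultdict(set)
    labels = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        elif p == "rdfs:label":
            labels[s].add(o)
    gazetteer = defaultdict(set)
    for uri, names in labels.items():
        for name in names:
            # Resources without an explicit type get a placeholder type.
            for t in types.get(uri, {"ex:Unknown"}):
                gazetteer[name.lower()].add((uri, t))
    return gazetteer


def link_entities(text, gazetteer):
    """Return (surface form, uri, type) for each whitespace token in the gazetteer."""
    hits = []
    for token in text.split():
        surface = token.strip(".,;:!?")
        for uri, t in sorted(gazetteer.get(surface.lower(), ())):
            hits.append((surface, uri, t))
    return hits


if __name__ == "__main__":
    gaz = build_gazetteer(TRIPLES)
    print(link_entities("Celi is based in Torino.", gaz))
```

A location guesser in this style would simply filter the hits by type (here `ex:Place`); multi-word labels and ambiguous labels are deliberately left out of the sketch.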
[Figure: architecture diagram; components: dump, indexer, Celi triple stores, indexes, linguistic analysis, SPARQL queries, searcher, custom RDF, webapps]
Faceted Semantic Search
Browse through documents and contents
Relations between Facets
LOD for CLIR
THE AGROVOC THESAURUS HAS BEEN USED IN THE ORGANIC.LINGUA PROJECT FOR ONTOLOGY-BASED CLIR
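The slide only names the approach. One common way to do ontology-based cross-language retrieval with a multilingual thesaurus is to map query terms to concepts and then expand them with the concepts' labels in the target language. The sketch below illustrates that idea under stated assumptions: the concept identifiers and the tiny label table are invented for the example and are not real AGROVOC entries, and the Organic.Lingua pipeline may work differently.

```python
# Hypothetical multilingual thesaurus: concept id -> {language: [labels]}.
# The ids and entries are illustrative only, not taken from AGROVOC.
THESAURUS = {
    "c_wheat": {"en": ["wheat"], "it": ["frumento", "grano"], "de": ["weizen"]},
    "c_soil": {"en": ["soil"], "it": ["suolo"], "de": ["boden"]},
}


def concepts_for(term, lang, thesaurus):
    """Return the ids of all concepts that carry `term` as a label in `lang`."""
    term = term.lower()
    return sorted(
        cid for cid, labels in thesaurus.items() if term in labels.get(lang, [])
    )


def translate_query(query, source_lang, target_lang, thesaurus):
    """Replace each query term with the target-language labels of its concepts.

    Terms that match no concept are kept unchanged, so the query never
    silently loses words.
    """
    translated = []
    for term in query.lower().split():
        concept_ids = concepts_for(term, source_lang, thesaurus)
        if not concept_ids:
            translated.append(term)
            continue
        for cid in concept_ids:
            translated.extend(thesaurus[cid].get(target_lang, []))
    return translated


if __name__ == "__main__":
    print(translate_query("organic wheat soil", "en", "it", THESAURUS))
```

Going through concepts rather than word-to-word dictionaries means synonyms in the target language ("frumento", "grano") are all added to the translated query.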
Sem-web techs for internal models
Information in the CRUNCHED BOOK is represented using combinations of RDF and GRAPH DBS
Public Sector: clear process …
acquire data
set open license
open formats
publish
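To make the "open formats" step concrete, here is a minimal sketch of turning one tabular record into N-Triples lines, a plain-text RDF serialization suitable for dumps. The base URI, the vocabulary URI and the sample record are hypothetical, and the sketch assumes that field names are already safe to use inside an IRI and that every value is published as a plain string literal.

```python
def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(base_uri, vocab_uri, key_field, record):
    """Serialize one record (a dict) as a list of N-Triples lines.

    The subject IRI is built from `base_uri` plus the value of `key_field`;
    every other non-empty field becomes one triple with a string literal.
    """
    subject = f"<{base_uri}{record[key_field]}>"
    lines = []
    for field, value in record.items():
        if field == key_field or value in (None, ""):
            continue
        literal = escape_literal(str(value))
        lines.append(f'{subject} <{vocab_uri}{field}> "{literal}" .')
    return lines


if __name__ == "__main__":
    # Hypothetical record, e.g. one row of a CSV file from a public body.
    row = {"id": "42", "name": "Museo Egizio", "city": "Torino", "notes": ""}
    for line in record_to_ntriples(
        "http://example.org/resource/", "http://example.org/vocab#", "id", row
    ):
        print(line)
```

In a real publication workflow the ad hoc `vocab#` properties would be replaced by terms from a shared vocabulary, which is what makes the resulting data linkable to other datasets.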
Celi for the public sector (CSI Piemonte): the Homer project
(Public sector contd.) … but …
LACK OF MONEY
LACK OF WILLINGNESS
USE OF “STANDARDS”
… hard problems
OPAQUE DATASETS
POOR RDF/SPARQL SUPPORT
Why companies’ RDF is not published
Provocation: it would be neither interesting nor usable
• It reflects customers’ needs
• It reflects internal data models
HENCE → OVERFITTING
WAYS OUT: having more standard models for particular micro-domains could permit their direct (re)use by private companies (and hence the publication of enhanced versions)
Recipes
Public Sector: use “true” LOD technologies (RDF dumps and SPARQL endpoints)
Private companies: use standard data models, internally and for their artifacts
OpenData Community: please stress the linked in LOD!
The success of LOD is bound to the use of Linked Data (as a technology).
The use of LD in the private sector will feed back positively into the public sector too, spreading the necessary expertise and awareness.
Thank You!