Linked Open Data: CNR

<ul><li> 1. and the Semantic Scout. CNR Semantic Technology Lab, ISTC - SI. Aldo Gangemi, Alberto Salvati, Enrico Daga, Gianluca Troiani. Thanks to Claudio Baldassarre (UN-FAO) and Alfio Gliozzo (IBM-Watson). http://stlab.istc.cnr.it http://data.cnr.it</li></ul> <p> 2. data.cnr.it
3. Enhanced SPARQL endpoint
4. Ontologies
5. Sample class from ontology
6. The Semantic Scout: A framework for search, presentation, and analysis of entities and their associated knowledge. Employs SW, LOD, NLP, IR. Scientific work goes back to 2006, first presented at ISWC2007. An evolving prototype for requirements of the EU IP IKS: semantic search, hybrid IR/SW identity management, automatic document classification (against DBpedia). 2009 requirements from the technology transfer office of CNR for the NetwOrK initiative.
7. The CNR: CNR is the largest research institution in Italy. About 8000 permanent researchers (+14000). 7 departments focused on the main scientific research areas. 108 institutes spread all over Italy. Subdivided into research units, labs, etc.
8. The CNR data sources (diagram): organizational data, activity-related data, personnel-related data, and financial data, held in databases (DB) and on the file system. Labels in the diagram: Administration; Departments, Institutes, Central admin; Frameworks, Programmes, Workpackages; documentation; Publications; Curricula; Permanent employees; Other research employees; Accounting, Contracts, Invoicing; Externally funded projects. Only partly as open data!
9. The CNR tasks. Strategic objective: matching the research demand to the research supply. Requirements: semantic interoperability between heterogeneous data sources; expert finding based on competence; monitoring funding and evolution of different research areas and units; browsing and reporting capabilities.
10. Architecture
11.
12. Methods for data conversion, extraction, inference, integration, linking, publishing, and searching
13. Figures. CNR Ontology: 28 modules, 120 classes, 300 relations, 1200 axioms. CNR Data: &gt;200K entities, 3M facts (about 2M inferred or extracted), 240 datasets.
14. Sources and lifting. Situation usually not as clean as using a unique CMS for most organizational tasks. DB (e.g. SQL Server) + a lot of textual records + HTML Web Site + textual corpus + linked open data. DB + interaction schemata (XML templates and HTML scraping, needed because of schemata degradation and user perspective evolution).
15. Ontology design. Starting from XML templates as module/pattern drafts. Reengineering XML and scraped templates. Reengineering DB schemata (system engineer involved). Obtained modular, pattern-based, task-based ontology. Textual DB records with identity: precondition for hybridizing IR and SW (see later). Alignments to FOAF, SIOC, SKOS, WordNet ontologies. Used patterns: situation, place, transitive reduction.
16. The CNR ontology
17. Data design. Triplifiers based on SQL rules (automatic scripting on JDBC drivers not enough because of legacy degradation of physical schemata). Cf. also: Semion reengineering tool. Inferences: OWL (Pellet, HermiT), SPARQL CONSTRUCT. Extraction tool: Semiosearch, categorizer over Wikipedia categories. Next: deep parsing approach (facts, relations, entities).
18. Publishing and hybridizing. Publishing OWL-RDF datasets: linked data approach (persistent URIs, triple stores for RDF dataset management, linking to common vocabularies: FOAF, DBpedia, Geonames, Bibo, ...); OWL ontologies for dataset generation, querying, inference (new enriched datasets); subgraph extraction through SNA. Virtual semantic corpus: IRW to distinguish information and non-information resources; SPARQL rules to generate virtual texts associated with entities. Indexing: Lucene+LSA indexing of semantic corpus; Semantic Lucene extension to produce tight coupling of virtual texts with entities. Multilinguality.
19. Consuming. SPARQL endpoint, with interface enhancement. Keyword-based search. Semantic browsing with SPARQL-based AJAX DHTML, RDF relation browser, or XML-based relation browser. Category-based search. Keyword-based result focusing.
20.
21.
22.
23. Expert finding: Task-based testing. It is based on the ability to materialize on demand a contextual network of relevant information. It is performed with a combination of tools in the toolkit to: identify the main topics of research; recursively search the CNR data cloud.
24. Identifying the main topics of research: project description. Reputation is a social knowledge, on which a number of social decisions are accomplished. Regulating society from the morning of mankind becomes more crucial with the pace of development of ICT technologies, dramatically enlarging the range of interaction and generating new types of aggregation. Despite its critical role, reputation generation, transmission and use are unclear. The project aims to an interdisciplinary theory of reputation and to modeling the interplay between direct evaluations and meta-evaluations in three types of decisions, epistemic (whether to form a given evaluation), strategic (whether and how interact with target), and memetic (whether and which evaluation to transmit). Project About: Social Knowledge for e-Governance. Topics can be manually annotated, or automatically induced, e.g.: ethics, sociology, collaboration, social network, reputation.
25. Identifying the main topics of research: text categorization. Query: ethics, sociology, collaboration, social network, reputation.
26. Search the CNR data cloud: identify an entry point. Commessa (programme): Il Circuito dell'Integrazione: Mente, Relazioni e Reti Sociali. Simulazione Sociale e Strumenti di Governance [The Circuit of Integration: Mind, Relations and Social Networks. Social Simulation and Governance Tools].
27. Search the CNR data cloud: identify key people. Ing. Jordi Sabater: Cognitive Science; Dott. Mario Paolucci: Sociology, Psychology; Gennaro di Tosto: Artificial Intelligence; Walter Quattrociocchi: Interdisciplinary Fields; Giuseppe Castaldi: Ethics; Aldo Gangemi: Semantic Web, Knowledge representation.
28. Expert Finding: Results. The description of the eRep project was adopted as a gold standard to evaluate the results when testing the Semantic Scout. 6 out of 10 CNR researchers were correctly retrieved, plus a project member affiliated with another institution. Project Coordinator: Dott. Mario Paolucci. External Member: Jordi Sabater Mir.
29. Functional evaluation of Semantic Scout (example). Expert finding accuracy: all 6 retrieved people scored among the first 10 in the result from the search engine. Benefit of integrated data cloud: the user judged an activity to be relevant to his goal and used it as an entry point to the CNR network of resources.
30. Functional evaluation of Semantic Scout. Accessibility and interaction: multiple user interfaces guarantee the users an adaptive level of interaction for each specific type of required information. Completeness of retrieval: 4 people have not been included in our result set. Antonietta Di Salvatore: scored below the first 10 people in the list; (+1) Giulia Andrighetto was not listed among the people relevant to the query, but belongs to the social network of Dr. Rosaria Conte. (+1) Marco Capenni and Stefano Picascia: have a technician profile, hence they are neither reported among the people relevant to the search query, nor belong to the network of any of the other researchers.
31. Ongoing work. More data linking (e.g. DBLP, georeferencing). Synchronization with data sources. More interaction paradigms. Privacy issues interlaced with hierarchical and idiosyncratic practices.
32. Conclusions. Hybridizing several semantic and retrieval technologies provides added value to a research organization. Scalability works for CNR figures. Interaction is a core selling point. Try it at @data_cnr_it, @semanticscout, @aldogangemi.</p>
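<p>Slide 17 mentions "triplifiers based on SQL rules" without showing one. The following is a minimal sketch of that idea, not the Semantic Scout implementation. Each rule pairs an SQL query with a function that maps a result row to RDF triples, serialized as N-Triples. Everything specific here is an assumption for illustration: the <code>researcher</code> table, the <code>http://example.org/cnr/</code> base URI, and the use of an in-memory SQLite database in place of the SQL Server source mentioned on slide 14. FOAF is used only because the slides list it among the aligned vocabularies.</p>

```python
import sqlite3

# Hypothetical base URI; the real data.cnr.it URI scheme is not given in the slides.
BASE = "http://example.org/cnr/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def literal(value):
    """Serialize a value as a plain N-Triples string literal."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def researcher_triples(row):
    """Map one (id, name) row of the hypothetical researcher table to triples."""
    subject = f"<{BASE}researcher/{row[0]}>"
    return [
        (subject, f"<{RDF_TYPE}>", f"<{FOAF_PERSON}>"),
        (subject, f"<{FOAF_NAME}>", literal(row[1])),
    ]


# A rule is an SQL query plus a row-to-triples function (illustrative only).
RULES = [
    ("SELECT id, name FROM researcher ORDER BY id", researcher_triples),
]


def triplify(conn, rules):
    """Run every rule against the database and return N-Triples lines."""
    lines = []
    for query, to_triples in rules:
        for row in conn.execute(query):
            for s, p, o in to_triples(row):
                lines.append(f"{s} {p} {o} .")
    return lines


def demo():
    """Build a tiny in-memory database and triplify it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE researcher (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO researcher VALUES (1, 'Example Researcher')")
    return triplify(conn, RULES)


if __name__ == "__main__":
    for line in demo():
        print(line)
```

<p>Writing the rules by hand, rather than deriving them automatically from the schema, fits the slide's remark that automatic scripting over JDBC drivers was not enough for degraded legacy schemata: each query can rename, join, or filter columns before the mapping function sees them.</p>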