D6.2 – Interim Project Report

2 Publishable Summary

2.1 Context and Objectives

It is broadly acknowledged that a unified solution for transforming and renovating existing data sources, regardless of the original data format, would greatly enhance the ability of public organisations to provide usable, machine-processable linked data, while offering SMEs the opportunity to combine and link existing public sector information with privately-owned data in the most resourceful and cost-effective manner. To this end, however, there is also a strong need to support consumers unfamiliar with the linked data paradigm through interfaces that hide the underlying complexity and allow the re-use of existing software applications and database management systems. LinDA aims to assist SMEs and data providers in renovating public sector information and in analysing and interlinking it with enterprise data by developing:

• A cross-platform, extensible software framework that provides a simplified workflow for renovating and converting a set of common data containers, structures and formats into arbitrary RDF graphs. The framework can be used to develop custom solutions for SMEs and public sector organisations, or be integrated into existing open data applications, in order to support the automated conversion of data into linked data. The platform will also allow the export of arbitrary RDF graphs as tabular data, so that SMEs can store the final results of data linking in relational databases or process them further with spreadsheet and data analysis software.

• A repository for accessing and sharing Linked Data vocabularies and metadata amongst SMEs that can be linked to the LOD (Linked Open Data) cloud. The system will allow SMEs to reference and enrich metadata shared by well-established vocabulary catalogues (LOV, prefix.cc, LODStats), thus making it easier and more efficient to map existing data structures to the RDF format and increasing the semantic interoperability of the SMEs' datasets.

• An ecosystem of Linked Data publication and consumption apps, which can be bound together dynamically, leading to new, unforeseen insights. While traditional RDF representations and SPARQL query access are provided to support advanced users, a Linked Data API will be deployed as a proxy that provides access in other widely established formats, such as CSV, JSON and XML, based on the internal RDF data (RDF2Any). This will allow consumers both familiar and unfamiliar with the linked data paradigm to leverage the provided knowledge bases. In particular, this solution enables the re-use of advanced JavaScript-based data visualisation components for data presentation, as well as Java-based analytics / data mining components. We aim to realise an ecosystem of data extractions and visualisations that can be bound together in dynamic and unforeseen ways. This will enable users to explore datasets even if the publisher of the data does not provide any exploration or visualisation means. Most existing work on visualising RDF focuses on specific domains and specific datatypes, so the envisioned visualisation ecosystem is one of the main innovations of LinDA.

• A library of visualisation tools for different data modalities (e.g. spatial, temporal and statistical) based on HTML, CSS and JavaScript that can consume output from the Linked Data API and generic web APIs. Such visualisations will include map views of spatial information (e.g. for WMS/WFS endpoints, geocoded data) as well as common graphs and charts for statistical information (e.g. statistical data in the DataCube RDF vocabulary as well as CSV time series data).

• A library of end-user analytics and data mining apps, based on existing Java-based components (e.g. Weka) extended to accept RDF as a source format, specifically targeted at leveraging the potential of Linked Data sources, especially in terms of pattern and link analysis.

• End-to-end business scenarios and models for the utilisation of Linked Data in analytics by SMEs.
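The framework described in the first item above works in two directions: tabular data into RDF, and RDF graphs back into tabular data. The following is a minimal sketch of both directions using only the Python standard library. It assumes a hypothetical base namespace `http://example.org/`, a single identifier column per table and plain string literals, and it only handles the N-Triples escapes listed in the code. It is not the LinDA Transformation Engine, which maps columns to terms chosen from existing vocabularies; the auto-minted property URIs here simply stand in for that mapping step.

```python
import csv
import io
import re
from urllib.parse import quote, unquote

# Hypothetical namespace: a real mapping would reuse terms from shared vocabularies
# rather than minting one property URI per column name.
BASE = "http://example.org/"
RES = f"<{BASE}resource/"
PROP = f"<{BASE}property/"

# N-Triples string-literal escapes handled by this sketch (not the full grammar).
_ESCAPE = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}
_UNESCAPE = {"\\": "\\", '"': '"', "n": "\n", "r": "\r"}


def csv_to_ntriples(csv_text, key_column):
    """Turn every CSV row into one subject and every other column into a triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the key so the subject URI contains no spaces or slashes.
        subject = f"{RES}{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            if column == key_column:
                continue
            literal = "".join(_ESCAPE.get(ch, ch) for ch in value)
            triples.append(f'{subject} {PROP}{quote(column, safe="")}> "{literal}" .')
    return triples


def ntriples_to_csv(triples, key_column):
    """Pivot the triples back into a table with one row per subject."""
    rows, columns = {}, [key_column]
    for line in triples:
        # Subject and predicate contain no spaces, so the rest is the literal plus " .".
        subject, predicate, rest = line.split(" ", 2)
        key = unquote(subject[len(RES):-1])
        column = unquote(predicate[len(PROP):-1])
        raw = rest[1:rest.rindex('"')]  # text between the literal's quotes
        value = re.sub(r"\\(.)", lambda m: _UNESCAPE[m.group(1)], raw)
        if column not in columns:
            columns.append(column)
        # If a subject has several values for one property, the last one wins.
        rows.setdefault(key, {key_column: key})[column] = value
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows.values())
    return out.getvalue()
```

For example, `csv_to_ntriples("id,name\n1,Alice\n", "id")` yields the single triple `<http://example.org/resource/1> <http://example.org/property/name> "Alice" .`, and `ntriples_to_csv` pivots such triples back into the original table. The pivot step shows why exporting arbitrary RDF graphs as tables needs design decisions: a subject with several values for the same property would require extra rows or columns, which this sketch does not attempt.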

Figure 2-1: The LinDA Concept

The consortium partners will use their portfolios to bring in SMEs that are interested in adopting the LinDA solutions and that will test the LinDA suite once it is deployed. The three pilots developed during the LinDA project are the following:

• Linked Data Analytics in Business Intelligence (CP Pilot): The main objective of this pilot is to demonstrate innovative and gainful business-intelligence-based consulting to customers, as well as strategy planning, through the LinDA transformation and analytic tools.

• Linked Data Analytics in the Environmental Sector (HYPERBOREA Pilot): The objective of this pilot is to utilise the LinDA solutions for the efficient management and analysis of the environmental data of the Italian regions. Data available in existing open government data initiatives and repositories will be transformed into the Linked Data format.

• Linked Data Analytics in the Media Industry (TTNEWS24 Pilot): The purpose of this pilot is to demonstrate the potential of the LinDA renovation and consumption tools in the media industry, as well as to set up an initial library of visualisation and exploration applications created to serve TTNEWS24 services and to be shared through the LinDA ecosystem.

The LinDA project will be realised through the following objectives:

• Objective 1: Enhance the ability of data providers, especially public organisations, to provide re-usable, machine-processable linked data.

• Objective 2: Provide out-of-the-box software components and analytic tools for SMEs that offer the opportunity to combine and link existing public sector information with privately-owned data in the most resourceful and cost-effective manner.

• Objective 3: Deliver an ecosystem of Linked Data publication and consumption applications that can be bound together in dynamic and unforeseen ways.

• Objective 4: Demonstrate the feasibility and impact of the LinDA approach in the European SME sector through a set of pilot applications.

• Objective 5: Achieve international recognition and spread excellence for the research undertaken during the LinDA implementation towards enterprises, scientific communities, data providers and end-users. Diffuse and communicate readily exploitable project results of a pro-normative nature. Contribute to standardisation and education.

2.2 Consortium

The LinDA consortium consists of 8 partners from 4 European countries, listed below.

Partner | Country
NATIONAL TECHNICAL UNIVERSITY OF ATHENS, DECISION SUPPORT SYSTEMS LABORATORY (NTUA-DSSLab), Co-ordinator | Greece
FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V. (FOKUS) | Germany
GIOUMPITEK MELETI SCHEDIASMOS YLOPOIISI KAI POLISI ERGON PLIROFORIKIS ETAIRIA PERIORISMENIS EFTHYNIS (UBITECH) | Greece
UNIVERSITY OF BONN (UBONN) | Germany
PIKSEL SPA (PIKS) | Italy
CRITICAL PUBLICS LTD (CP) | United Kingdom
HYPERBOREA S.R.L. (HYPERBOREA) | Italy
TTNEWS24 S.R.L. (TTNEWS24) | Italy

2.3 Work Performed and Results Achieved

During the first year of the project, the 1st version of the LinDA tools has been developed and deployed in a common environment (the LinDA Workbench) according to the user requirements and usage scenarios that were defined and communicated with the pilot users. Moreover, a comprehensive pilot operation and evaluation plan has been developed, which will guide the operation of the pilots during the 2nd year of the project and provide feedback for further enhancement and fine-tuning of the LinDA tools. In general, the LinDA activities have proceeded according to plan and as per the DoW, and have produced significant results, which are summarised in the following categories:

User Requirements and Business Scenarios

During the 1st year of the LinDA project, emphasis was given to the definition of user requirements and business scenarios that set the groundwork for the creation of the LinDA tools. More specifically, the key achievements for the 1st year are:

• Definition and detailed description of 10 Business Scenarios for the utilisation of Linked Data in the domains of Business Intelligence, the Environmental Sector, the Media Industry, Tourism, Analytics and Public Data providers. The Business Scenarios generated 35 user stories, which form the base elements of the LinDA project.

• A state-of-the-art analysis of existing open source and commercial methods and components that can be integrated into the LinDA Transformation and Analytics suites. For each of the Linked Data components, a dedicated testbed environment has been set up in order to analyse and report its benefits, capabilities, shortcomings and limitations (e.g. ease of integration, licensing issues, complexity).

• An online "Landscape of Linked Data Tools" section in the LinDA project website (http://linda-project.eu/linked-data-tools/) that provides a convenient overview of the state-of-the-art analysis and allows much more efficient maintenance and updating of the analysed linked data tools.

• An initial list of 75 technical functional and non-functional requirements driven by the business scenarios and user stories. The requirements have been represented and managed as GitHub issues (https://github.com/LinDA-tools/LindaWorkbench/issues) for the efficient management of the LinDA tools development.

• A complete set of Acceptance Criteria and an Acceptance Testing Procedure have been defined.

LinDA Development

During the 1st year of the LinDA implementation, the following key achievements have been reached:

• The 1st version of the LinDA Transformation Engine has been developed and deployed to the LinDA Workbench. The engine can be used to support the mapping and transformation of traditional data structures and formats into linked data. To this end, in Year 1 of the project the Transformation Engine focused on two widely used data sources: a) relational data from databases such as PostgreSQL, and b) tabular data from CSV files and Excel sheets.

• The 1st version of the LinDA Vocabulary and Metadata Repository has been created. The LinDA Vocabulary and Metadata Repository leverages and syncs with existing online linked data vocabulary services (LOV, prefix.cc, LODStats) in order to assist users and SMEs, during the data transformation process, in selecting appropriate vocabularies (in terms of popularity, rating and relevance to the specific domain / industry) for the semantic representation of their current data structures in the RDF format.

• A "Suggest API" has been developed that performs automatic and intelligent mapping suggestions based on the LinDA vocabulary, to be used by external apps, including the LinDA Transformation Engine.

• The LinDA Workbench, an integrated environment for hosting the LinDA tools, has been created. The LinDA Workbench facilitates the workflow between the tools and handles the main communication with a selected triple store.

• The 1st version of the RDF2Any API for transforming data from the Linked Data format into a number of formats, including RDB, XML, CSV and PDF.

• The Query Builder tool, which enables non-experts to formulate SPARQL queries and explore open datasets.

• The development of the ConQuer Ontology, which defines transformations executed using the Publication and Consumption Framework.

• The Query Designer tool, which provides an innovative and easy way to use graphical methods to interactively build simple or complex queries over multiple data sources and view the results in a SPARQL editor. The Query Designer follows the paradigm and quality of the SQL query designers of popular relational database management systems (Oracle, SQL Server, etc.), but is seamlessly adjusted to harness the potential of Linked Data. With simple drag-and-drop functionality, users can perform complex SPARQL queries and filtering, including interlinking with external SPARQL endpoints through the use of SPARQL 1.1 Federated Query.

• The LinDA SPARQL editor, which provides the functionality of a text-based query wizard over linked data. More specifically, the LinDA SPARQL editor provides code style formatting and intelligent code completion (by pressing Ctrl+Space) that suggests SPARQL syntax, namespaces, available endpoints, classes and properties.

• The 1st version of the LinDA Visualisation and Exploration system, which allows different visual representations of data sources in the Linked Data format and provides automatic recommendations by determining the compatibility between the selected dataset and the available visualisations and suggesting a list of visualisations accordingly.

• The 1st version of the LinDA Analytics Engine, which allows for the construction and processing of analytic algorithms and procedures on data coming from Linked Data sets.
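The Query Designer and Query Builder produce SPARQL from a graphical description of a query. The following Python sketch is not LinDA's code; it assumes a hypothetical `QuerySpec`/`Pattern` structure standing in for what a user assembles by drag and drop, and shows how such a description could be serialised into a SPARQL 1.1 SELECT query. Patterns meant for an external endpoint are wrapped in a `SERVICE` block, which is the mechanism SPARQL 1.1 Federated Query defines for interlinking with remote endpoints.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Pattern:
    """One triple pattern, e.g. ?city ex:name ?name (terms are passed through verbatim)."""
    s: str
    p: str
    o: str

    def render(self) -> str:
        return f"{self.s} {self.p} {self.o} ."


@dataclass
class QuerySpec:
    """Hypothetical description of what a user assembled in a graphical designer."""
    variables: List[str]
    patterns: List[Pattern]
    prefixes: Dict[str, str] = field(default_factory=dict)
    remote: Dict[str, List[Pattern]] = field(default_factory=dict)  # endpoint -> patterns
    filters: List[str] = field(default_factory=list)
    limit: Optional[int] = None


def build_query(spec: QuerySpec) -> str:
    """Serialise the spec as a SPARQL 1.1 SELECT query."""
    if not spec.variables or any(not v.startswith("?") for v in spec.variables):
        raise ValueError("projection must be a non-empty list of ?variables")
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in spec.prefixes.items()]
    lines.append(f"SELECT {' '.join(spec.variables)} WHERE {{")
    lines += [f"  {p.render()}" for p in spec.patterns]
    # Patterns targeting an external endpoint go into a SERVICE block
    # (SPARQL 1.1 Federated Query): they are evaluated by that remote endpoint.
    for endpoint, patterns in spec.remote.items():
        lines.append(f"  SERVICE <{endpoint}> {{")
        lines += [f"    {p.render()}" for p in patterns]
        lines.append("  }")
    lines += [f"  FILTER ({expr})" for expr in spec.filters]
    lines.append("}")
    if spec.limit is not None:
        lines.append(f"LIMIT {spec.limit}")
    return "\n".join(lines)


if __name__ == "__main__":
    spec = QuerySpec(
        variables=["?name", "?pop"],
        prefixes={
            "ex": "http://example.org/property/",
            "owl": "http://www.w3.org/2002/07/owl#",
            "dbo": "http://dbpedia.org/ontology/",
        },
        patterns=[Pattern("?city", "ex:name", "?name"),
                  Pattern("?city", "owl:sameAs", "?dbp")],
        remote={"https://dbpedia.org/sparql": [Pattern("?dbp", "dbo:populationTotal", "?pop")]},
        filters=["?pop > 100000"],
        limit=10,
    )
    print(build_query(spec))
```

In the usage example the `ex:` namespace is hypothetical, and the DBpedia endpoint and `dbo:populationTotal` property are merely illustrative. Term strings are passed through verbatim, so a production query builder would also need to validate or escape user input before splicing it into the query text.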

LinDA Pilots

For the first year, a comprehensive pilot operation and evaluation plan has been developed. More specifically, the following tasks have been performed:

• Three pilots have been set up with the direct contribution of the pilot users (SMEs), namely the Business Intelligence Analytics (BIA) pilot, the Environmental Analytics pilot and the Media Analytics pilot.

• A detailed description of the LinDA pilots, together with details regarding their operation and evaluation phases, has been prepared and documented.

• An in-depth analysis has been performed of a) the redesign of business processes required by the use of the LinDA tools, b) the datasets that have to be created or used for the analysis, c) the type of analysis along with the selected algorithms and d) the consumption applications that are going to be developed.

• A defined set of evaluation criteria and targets, together with an evaluation plan for the LinDA Workbench and the LinDA Pilots, has been identified and documented.

Networking, Dissemination and Exploitation

During the 1st year, LinDA dissemination, engagement and exploitation activities focused on the following tasks:

• Online tools and printed material (leaflets, brochures, etc.) have been created for dissemination purposes.

• A first version of the LinDA website has been launched for the dissemination of the project's results.

• Establishment of 5 social channels to keep users' interest alive and drive traffic to the website (Facebook, Twitter, Google+, YouTube, SlideShare).

• A press release, written in English, has been produced to announce the start of the project. The press release has been translated into Greek and Italian and submitted to popular news channels.

• Liaison and collaboration activities with more than 10 related projects.

• Establishment of collaboration agreements (signed MoUs) with 3 projects in the LOD field (SDI4APPS, PolicyCompass, E-SPACE).

• Active participation in 15 conferences and workshops with LinDA presentations and a poster.

• 2 conference papers.

• Organisation of 2 project workshops:
  ◦ "Linked Open Data: Improving SME Competitiveness and Generating New Value" on 2nd September 2014, hosted in Leipzig in conjunction with Semantics 2014.
  ◦ "Making (Linked) Open Data Available for Business" on 30th October 2014, hosted in Belfast in conjunction with eChallenges 2014.

• 1st version of the Exploitation and Sustainability plan, which identified the potentially exploitable assets of LinDA, conducted a market analysis and proposed a set of exploitation and sustainability paths that will be thoroughly discussed at consortium level during the second year of the project in order to settle on the most appropriate options.

All public deliverables of the project can be accessed online at: http://linda-project.eu/deliverables/

LinDA Key Public Deliverables for the 1st year

• D1.1: Linked Data Components & Tools SotA and Business Scenarios for Linked Data Utilization - 1st Version
• D1.2: LinDA Technical Requirements
• D2.1: LinDA Architecture: Transformation Engine & Linked Data Vocabularies and Metadata Repository
• D2.2: LinDA Transformation Engine, Linked Data Vocabularies and Metadata Repository - 1st Version
• D3.1: LinDA Publication and Consumption Frameworks, Libraries and Interfaces - 1st Version
• D5.3: Dissemination Plan
• D5.4: Dissemination Activities Report - 1st Version

2.4 Website and Social Media Channels

In order to reach people at a pan-European scale, which is not possible through physical contact by the consortium members alone, the project has invested in the design, development and regular update of an interactive Web 2.0 portal that will operate as a one-stop shop for any interested party that wishes to be informed about LinDA's advancements. The project's website includes all the required information about the LinDA project. Since LinDA applies a Web 2.0 strategy to stakeholder involvement alongside the traditional communication channels, it has a noticeable presence in social media, which is linked to the website.

The LinDA website resides at http://www.linda-project.eu and is based on a 3-tier architecture built using open source software. Specifically, the infrastructure contains a web server (Apache⁵) and a database server (running MySQL Community Edition⁶), which work together to provide a seamless experience to the visitors of the site. The CMS engine currently used is version 4.0.1 of WordPress⁷. The site is complemented by 5 social channels that keep users' interest alive and drive traffic to the website. These are the following:

• Facebook (news): https://www.facebook.com/LinDAFP7
• Twitter (news): https://twitter.com/LinDa_FP7
• Google+ (news): https://www.google.com/+Linda-projectEu
• YouTube (videos): https://www.youtube.com/channel/UCfZHhkxIN_O1jRovE2lTWwA
• SlideShare (presentations): http://www.slideshare.net/LinDa_FP7

The following screenshot presents the landing page of the website in its present state. It should be noted that the website will be optimised to offer an improved user experience, while content population will be continuous.

⁵ Apache Web Server, from www.apache.org
⁶ MySQL is an open source product of Oracle Corporation and/or its affiliates, available from www.mysql.com
⁷ WordPress is open source web authoring software, available from www.wordpress.org

Figure 2-2: Website Landing Page