
WP3 – Information Platform


Mário J. Silva
Universidade de Lisboa, Faculdade de Ciências, Departamento de Informática
[email protected]

15 Mar 2011 - 2nd Epiwork Review Brussels

What will be necessary to predict epidemics precisely?
• Data of many different types and many unrelated sources.
• Improved accuracy makes required data a never-ending story.
• We all want to see realistic and timely plots of epidemics propagation.
• Available, but hard to find, collect and maintain!

Epiwork

The EPIWORK project proposes a multidisciplinary research effort aimed at developing the appropriate framework of tools and knowledge needed for the design of epidemic forecast infrastructures to be used by epidemiologists and public health scientists. The project is a truly interdisciplinary effort, anchored to the research questions and needs of epidemiology research by the participation in the consortium of leading epidemiologists, public health specialists and mathematical biologists.

Epidemic researchers, along with leading scientists in informatics, computer science, complex systems and physics, will tackle most of the much needed development of modelling, computational and ICT tools for epidemic forecast, such as: i) the foundation and development of the mathematical and computational methods needed to achieve prediction and predictability of disease spreading in complex techno-social systems; ii) the development of large scale, data driven computational models endowed with a high level of realism and aimed at epidemic scenario forecast; iii) the design and implementation of original data-collection schemes motivated by identified modelling needs, such as the collection of real-time disease incidence, through innovative web and ICT applications; iv) the set up of a computational platform for epidemic research and data sharing that will generate important synergies between research communities and countries.

http://www.gripenet.pt/


Other Internet Monitoring Sources

Social Media Sources

Data.gov.uk, keyword=epidemiology

data.gov, epidemiology

Linked Data

http://linkeddata.org/

Data in Epiwork
Classic Sources
• [National Bureau of Statistics] demographics, transportation data, ..
• [Public Health authorities] surveillance data (maybe?)
Modern Sources
• [Internet Monitoring Sources]
• [Social Media]
• behavioural data

To be shared by epidemic modellers in a digital library, dubbed the Epidemic Marketplace


Outline
• The need for an Epidemic Marketplace
• Epidemic Marketplace 1.0
• D3.3 Public Release of the Epidemic Marketplace Platform
• Where we stand and plans for work ahead

Steps for Creating the EM
• Elaborate meta-model for describing datasets used by epidemic modellers.
• Provide query services over the meta-data to discover resources.
• Select ontologies for characterizing data and develop an ontology of epidemic concepts.
• Ingest, harmonize and cross-link data.
• Provide query services to select epidemic data using the EM meta-data and ontologies.

Common Reference Model
Open domain: a detailed description of the datasets used in the models of all sorts of epidemics would require describing virtually every kind of information, given the diversity of factors and the interdisciplinarity of epidemiological studies.

Data model needs to support interlinked data.

Describing the datasets used in the models of all sorts of epidemics requires a model capable of describing virtually every kind of information, given the diversity of factors and the interdisciplinarity of epidemiological studies. In the study of a specific disease it is possible to have datasets describing the disease, how it spreads, clinical data about a population and so on. Data may be geo-referenced, and geospatial data may be necessary for modelling the disease transmission. Other data can be important for the study of diseases, such as genetic, socio-economic, demographic, environmental and behavioural data. The need to encompass so many areas of study will reflect on the contents of the datasets and ultimately on their metadata, calling for a data organisation supporting interlinked data (Bodenreider and Stevens 2006; Bizer in press).

Given the high diversity and heterogeneity of the epidemic data involved, a common reference model based on metadata is needed. Metadata terms are being defined based on controlled vocabularies and ontology terms, and ontologies will also be used to characterize the entities in the datasets and the relationships among them. As a result, the information model of the Epidemic Marketplace is directly defined through metadata and ontologies. Together, they will be essential in the development of epidemic modelling digital libraries, as they make documents and other data sources accessible in a more sophisticated, structured and meaningful manner.

For example, using a specific ontology to describe a disease makes everybody referring to that disease use the same term, which makes information discovery simpler and more complete. It also keeps the metadata text simpler, since the ontology itself contains other data that does not need to be inserted as metadata. Through an ontology of places (a geographic ontology), if we have a specific location code, we can obtain other information about that location, such as country, coordinates, altitude, city and so on.
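
The location-code example can be sketched in a few lines of Python. This is only an illustration of the idea, not the Epidemic Marketplace implementation: the `PLACES` table stands in for a real geographic ontology, and the codes (`PLACE:0001`), the field names and the rough coordinate and altitude values are all assumptions made up for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    country: str
    latitude: float
    longitude: float
    altitude_m: int


# Toy stand-in for a geographic ontology. Codes and values are
# hypothetical and only roughly accurate; they are here for illustration.
PLACES = {
    "PLACE:0001": Place("Lisboa", "Portugal", 38.72, -9.14, 2),
    "PLACE:0002": Place("Torino", "Italy", 45.07, 7.69, 239),
}


def expand_location(metadata: dict) -> dict:
    """Return a copy of the metadata enriched with facts derived from its location code."""
    place = PLACES[metadata["location"]]
    return {
        **metadata,
        "city": place.name,
        "country": place.country,
        "coordinates": (place.latitude, place.longitude),
        "altitude_m": place.altitude_m,
    }


# The stored metadata only carries the code; the rest is looked up on demand.
record = {"title": "Weekly incidence reports", "location": "PLACE:0001"}
expanded = expand_location(record)
print(expanded["city"], expanded["country"])
```

The stored record stays small because everything except the code lives in the ontology, which is the point made in the notes above.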

Meta-data and Ontologies
• The information model of the EM is directly defined as metadata and ontologies.
• Ontology and meta-data standards, the pros and cons of using them, annotation and deployment strategies, and the steps for creating a metamodel for epidemic data were the subject of D3.1, reviewed last year.

EM: Main Components

EM 1.0 Software Components
• Fedora Commons 2.X for the implementation of the main features of the repository.
• Access control in the platform
  • XACML (OASIS 2010), LDAP (Tuttle et al. 2004)
  • Shibboleth (identity management).
• Front-end based on Muradora
• Forum based on phpBB (+ Muradora)

Outline
• The need for an Epidemic Marketplace
• Epidemic Marketplace 1.0
• D3.3 Public Release of the Epidemic Marketplace Platform
• Where we stand and plans for work ahead

What is new since Mar 2010?
• Improved reliability
• MEDCollector automatic data collector
• Meta-data policies and editor
• Web services API + Simple EM Client
• Improved user interface
• Public: anyone can browse and register (required for upload)

Improved Reliability
Reorganizations and back-end Services Before Public Deployment
• Virtualized environment: every major component running on two separate virtual machines - production + development environments (Xen+CentOS)
• Monitoring and alerts for all services (Nagios)
• Logging and Analysis (Google Analytics)

MEDCollector
• Web Services
• Workflow Processes
• Local Storage
• Dashboard for Workflow Design

MEDCollector Data Model
• Geonames.org: All Countries and Capitals
• UMLS: Twitter searched on a subset of 89 terms related to Disease or Syndrome

MEDCollector Services
• Data Collection Services
  • Query Selection Services
  • Data Harvesting Services
  • XML Transformation Services
  • Database Loading Service
• Data Packaging Services
  • To CSV
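
As a rough illustration of how the services listed above could chain together, the following Python sketch runs one pass from query selection to CSV packaging. It is not MEDCollector code: the function names, the XML layout, the `mentions` table and the stubbed `harvest` step (which fabricates a fixed result instead of calling any external source) are all assumptions made for the example.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def select_queries(terms, places):
    """Query selection: pair every disease term with every place."""
    return [(term, place) for term in terms for place in places]


def harvest(query):
    """Data harvesting: stub for a call to an external source; returns XML text."""
    term, place = query
    return f"<result><term>{term}</term><place>{place}</place><count>1</count></result>"


def transform(xml_text):
    """XML transformation: reduce a source-specific document to a flat row."""
    root = ET.fromstring(xml_text)
    return (root.findtext("term"), root.findtext("place"), int(root.findtext("count")))


def load(conn, rows):
    """Database loading: store the harmonized rows in local storage."""
    conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", rows)


def package_csv(conn):
    """Data packaging: export local storage as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["term", "place", "count"])
    writer.writerows(
        conn.execute("SELECT term, place, count FROM mentions ORDER BY term, place")
    )
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (term TEXT, place TEXT, count INTEGER)")
queries = select_queries(["influenza", "measles"], ["Lisbon"])
load(conn, [transform(harvest(q)) for q in queries])
csv_text = package_csv(conn)
print(csv_text)
```

Keeping each stage as a separate function mirrors the idea of separate services that a workflow can recombine; in the real system that orchestration is done by workflow processes rather than by a Python script.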

MEDCollector - BPEL

• Language to define how Web-Services Communicate
• Standard graphical notation BPMN
• Complex!

MEDCollector: Dashboard
• WireIt! - http://javascript.neyric.com/wireit/

MEDCollector: Dashboard
Watch the Demo!

Automatically Collected Data
• Twitter: 89 diseases, world-coverage
• ProMed-mail
• Google Flu Trends
• CDC RSS Feeds
  • Flu updates
  • Travel Notices
• ...
Periodically packed and uploaded to the EM repository

What is new since Mar 2010?
• Improved reliability
• MEDCollector automatic data collector
• Meta-data policies and editor
• Web services API + Simple EM Client
• Improved user interface
• Public: anyone can browse and register (for upload)

Meta-data Policies and Editor
• Meta-data introduction simplified
  • Editor that pops up on upload now fills most of the entries with appropriate defaults.
• EM Repository Meta-data Vocabulary
  • Generic DCTERMS adopted for datasets characterisation
  • Epidemics-specific DCTERMS defined for epidemic datasets characterisation
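
The default-filling behaviour of the editor can be pictured with a small Python sketch. It is a guess at the general idea, not the actual editor logic: which fields get defaults, the default values, and the `em:` prefix used for the epidemics-specific term are assumptions. The `dcterms:` names used here are standard DCMI terms.

```python
from datetime import date

# Assumed generic defaults; the real editor's choices are not given in the slides.
DC_DEFAULTS = {
    "dcterms:language": "en",
    "dcterms:type": "Dataset",
}


def prefill(user_fields, uploader, today=None):
    """Build a metadata record: defaults first, then whatever the user typed."""
    today = today or date.today()
    record = dict(DC_DEFAULTS)
    record["dcterms:creator"] = uploader
    record["dcterms:rightsHolder"] = uploader
    record["dcterms:dateSubmitted"] = today.isoformat()
    record.update(user_fields)  # explicit user entries win over defaults
    return record


record = prefill(
    {"dcterms:title": "Weekly incidence reports", "em:hostGroup": "humans"},
    uploader="jdoe",
    today=date(2011, 3, 15),
)
for term, value in sorted(record.items()):
    print(term, "=", value)
```

The user only has to type the title and the epidemics-specific entries; anything typed explicitly overrides the corresponding default.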

DC Term Example: RightsHolder

EM Term Example: HostGroup

Mediator Web Services
[Diagram: Client, Mediator, OpenLDAP and Fedora Commons Repository; interfaces: OAI-PMH, RESTful Interface, OAI-ORE; operations: Fetch/Search, Upload]

Simple EM Client
• Mapping of client filenames to EM resources (FC data streams and Collections)
• Operations: Check-out, check-in
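
One way to picture the filename mapping and the two operations is the Python sketch below. Everything in it is hypothetical: the `em` root collection, the rule that directories become collections and the file name becomes the datastream, and the in-memory `FakeRepository` that stands in for the mediator services. The slides do not describe the actual mapping used by the Simple EM Client.

```python
from pathlib import PurePosixPath


def to_resource(filename, root="em"):
    """Map a client path such as 'flu2010/cases.csv' to (collection, datastream)."""
    path = PurePosixPath(filename)
    collection = "/".join((root, *path.parent.parts))
    return collection, path.name


class FakeRepository:
    """In-memory stand-in for the repository behind the mediator."""

    def __init__(self):
        self.store = {}

    def check_in(self, filename, content):
        """Upload local content to the resource the filename maps to."""
        self.store[to_resource(filename)] = content

    def check_out(self, filename):
        """Fetch the content of the resource the filename maps to."""
        return self.store[to_resource(filename)]


repo = FakeRepository()
repo.check_in("flu2010/cases.csv", b"week,cases\n1,42\n")
print(to_resource("flu2010/cases.csv"))
print(repo.check_out("flu2010/cases.csv"))
```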

Watch the Demo!
Download from http://epimarketplace.net/mediator/EM

Old Graphic Style

Try it at: http://epimarketplace.net

Outline
• The need for an Epidemic Marketplace
• The Epidemic Marketplace
• D3.3 Public Release of the Epidemic Marketplace Platform
• Where we stand and plans for work ahead

WP3: status (what we have done)
• Deliverable D3.1 (meta-model) released
• Deliverable D3.2 (prototype) released
  • Hardware and base software deployed; initial prototype of EM with initial set of characterized datasets
• Deliverable D3.3 (public version) released
  • Data-collector
  • EM DCAP and meta-data handling
  • Web Services

Events 2nd year
London, Delhi, Bilbao, ERCIM News

Publications in the 1st year
• Mário J. Silva, Fabrício A.B. Silva, Luís Filipe Lopes, Francisco M. Couto, Building a Digital Library for Epidemic Modelling. Proceedings of ICDL 2010 - The International Conference on Digital Libraries 1, p. 447--459, New Delhi, India, 23--27 February, 2010. TERI Press -- New Delhi, India. Invited Paper.
• Luís Filipe Lopes, João Zamite, Bruno Tavares, Francisco Couto, Fabrício A.B. Silva, Mário J. Silva, Automated Social Network Epidemic Data Collector. INForum - Simpósio de Informática, September 2009.

EM-related Publications (2nd year)
• Mário J. Silva, Fabrício A.B. Silva, Luís Filipe Lopes, Francisco M. Couto, Building a Digital Library for Epidemic Modelling. Proceedings of ICDL 2010 - The International Conference on Digital Libraries 1, p. 447--459, New Delhi, India, 23--27 February, 2010. TERI Press -- New Delhi, India. Invited Paper.
• Fabrício A.B. Silva, Mário J. Silva, Francisco M. Couto, Epidemic Marketplace: an e-Science Platform for Epidemic Modelling and Analysis. ERCIM News 82, Special Theme: Computational Biology. July 2010.
• Luís Filipe Lopes, Fabrício A.B. Silva, Francisco M. Couto, João Zamite, Hugo Ferreira, Carla Sousa, Mário J. Silva, Epidemic Marketplace: An Information Management System for Epidemiological Data. Proceedings of ITBAM'10 - 1st International Conference on Information Technology in Bio- and Medical Informatics - DEXA 2010. August 2010.
• João Zamite, Fabrício A.B. Silva, Francisco M. Couto, Mário J. Silva, MEDCollector: Multisource Epidemic Data Collector. Proceedings of ITBAM'10 - 1st International Conference on Information Technology in Bio- and Medical Informatics - DEXA 2010. August 2010.
• João Zamite, Multisource Epidemic Data Collector. Master Dissertation, University of Lisbon, Faculty of Sciences, September 2010.
• Luís Filipe Lopes, A Metadata Model for the Annotation of Epidemiological Data. Master Dissertation, University of Lisbon, Faculty of Sciences, September 2010.
• Hugo Ferreira, O Mediador do Epidemic Marketplace. Master Dissertation, University of Lisbon, Faculty of Sciences, September 2010 (in Portuguese).

WP3: status (what we will do)
• Overcoming the initial difficulties in hiring the planned resources
  • Refreshed team with competencies required for the 2nd and 3rd year; hiring 1 sw eng for push in release of EM 2.0
• Working on Epidemic Marketplace 2.0
  • D3.4 and D3.5 due Feb 2012
  • site analytics
  • interlinking
• Peeking on how to address challenges for the 4th year
  • negotiating access to content

Changes in UL WP3 Team
Out
• Fabrício Silva
• Luís F. Lopes (meta-data)
• Hugo Ferreira (mediator)
In
• Dulce Domingos (access control)
• Juliana Duque (information architecture, graphics)
• João Ferreira (ontologies)
+ (always in)
• Mário
• Francisco
• João Zamite

Scheduled Deliverables

Todo List and Planning (Brussels, Mar 2011)
• Evolve Simple EM Client and GleamViz to become showcase for tight integration with Computational Platform
• Refine and populate the catalogue of epidemic resources: enrichment, interlinking and semantification of epidemic data
• Release second version of the EM.
  • Re-implemented Web Services (no more Muradora)
  • New information architecture, new front-end design
  • New social network access control

On the nature of Socially Intelligent Systems
• Who should learn behaviours about individuals from the network?
• No Silver Bullet
  • Classic Engineering approaches too slow for 21st century pace
  • We are now all part of a huge Living Lab
• How much longer will the fact that your cat sneezed be relevant?
  • ...we might have to ask again.
• Are we still under control?
  • We may need more flexible ways to control access to sensitive data.

2 Aug 2010 - Assyst, London

Classical Approaches
Role Based Access Control (RBAC):
Advantages:
• Roles are intuitive concepts in organizations
• Users can easily be reassigned from one role to another
Disadvantages:
• Central Administration has to manage roles
• Does not take into account collaborative/social dynamics

Access Control Based on Social Networks
• Objects have owners (or publishers)
• Owners are part of a social network and define access policies based on the network information
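
To make the idea of owner-defined policies over network information concrete, here is a minimal Python sketch. It assumes one particular kind of policy, "readable by anyone within N hops of the owner", which is only one possibility and is not stated in the slides; the user names and the network are made up.

```python
from collections import deque


def distance(network, start, goal):
    """Breadth-first search: number of hops between two users, or None if unconnected."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        user, hops = queue.popleft()
        for friend in network.get(user, ()):
            if friend == goal:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


def can_read(network, owner, max_hops, requester):
    """Owner-defined policy: allow requesters at most max_hops away from the owner."""
    hops = distance(network, owner, requester)
    return hops is not None and hops <= max_hops


# Hypothetical social network: ana - bruno - carla, with dinis unconnected.
network = {
    "ana": ["bruno"],
    "bruno": ["ana", "carla"],
    "carla": ["bruno"],
    "dinis": [],
}

print(can_read(network, "ana", 1, "bruno"))  # direct contact
print(can_read(network, "ana", 1, "carla"))  # contact of a contact
```

Unlike RBAC, no central administrator assigns roles here: the decision follows from the owner's position in the network and the threshold the owner picked.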

EM 2.0 Software Components
• Fedora Commons 3.4 - main features of the repository.
  • Mediator services reimplemented. Web services provided by FC invoked directly.
• Access control in the platform
  • XACML + LDAP (Tuttle et al. 2004)
  • Shibboleth (identity management).
  • Access Control Based on Social Networks
• Front-end based on the Drupal CMS
• Integrated forum

EM 2.0 Mock-up interface
http://v2.epimarketplace.net/mockup

WP3 SWOT Analysis (Unchanged!)
Strengths
• Epiwork-driven EM
• Standards-based
• Open Source modules
• Supported (until 2012)
Weaknesses
• Unpopulated EM
• Looking for the right policies
• What are the incentives?
• Interfaces to WP4 and WP5?

WP3 SWOT Analysis (Unchanged!)
Opportunities
• Epiwork testbed
• Creation of a baseline for epidemic modelling
• Showcase for partners' outputs
Threats
• Consortium enters "everyone for himself" mode.
• "Somebody will take care of that" attitude
• EM perceived as a very expensive, complex and useless cache

http://epimarketplace.net

Current Focus EM 2.0
• Designing and implementing the new user interface.
  • Must be useful to both the expert and the occasional user.
• New Access Control mechanisms addressing data privacy in a socially intelligent environment.
• Refining, populating and enriching the catalogue of epidemic resources using the initial prototype.
  • The method of scanning published epidemic modelling studies and then inferring the metadata descriptions has proven to be very useful.

Todo list and planning (Torino, Nov 2009)
• Populate Repository
• Linked Epidemic Data
• Ethics, Privacy and Anonymization
• Access control policies
• Dataset selection generation
• Distributed Authentication
• Replicate EM node

Todo list and planning (Brussels, Mar 2010)
• Populate Repository
• Linked Epidemic Data
• Ethics, Privacy and Anonymization
• Access control policies
• Dataset selection generation
• Distributed Authentication
• Replicate EM node

Todo list and planning (Torino, Dec 2010)
• Populate Repository
• New Front-end and updated components (prepare market)
• Access control policies & distributed authentication
• Linked Epidemic Data
• Ethics, Privacy and Anonymization
• Replicate EM node

The fallacies of free-text
• The initial proof-of-concept prototype showed the limitations stemming from annotating the datasets using free text in the meta-data description fields.
• A much simpler model, inspired by Web 2.0 tags: EM users will be able to freely annotate their datasets using their own terminologies (also dubbed folksonomies).

Kdnuggets, March 2010
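
A tag-based annotation model of the kind described above (free tags chosen by users, in the Web 2.0 style) can be sketched as a simple inverted index. This is an illustrative Python sketch only; the `TagIndex` class, the normalization rule (lower-casing and collapsing whitespace) and the dataset identifiers are assumptions, not part of the EM design.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index from user-chosen tags to dataset identifiers."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    @staticmethod
    def normalize(tag):
        """Fold case and whitespace so trivially different spellings match."""
        return " ".join(tag.lower().split())

    def annotate(self, dataset_id, *tags):
        """Attach any number of free tags to a dataset."""
        for tag in tags:
            self._by_tag[self.normalize(tag)].add(dataset_id)

    def find(self, tag):
        """Return the datasets carrying a tag, in a stable order."""
        return sorted(self._by_tag.get(self.normalize(tag), ()))


index = TagIndex()
index.annotate("ds1", "Influenza", "H1N1")
index.annotate("ds2", "  influenza ")
print(index.find("INFLUENZA"))
```

Tags avoid the free-text problem only partly: normalization catches spelling variants, but synonyms still need a controlled vocabulary or ontology behind them.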