Herman Tolentino, MD
Director, Public Health Informatics Fellowship Program
Presentation Outline
• What is EpiSPIDER?
• Why was EpiSPIDER built?
• What is event-based surveillance?
• How was EpiSPIDER built?
• The EpiSPIDER “Information Ecosystem”
• Evolution of EpiSPIDER
• How has EpiSPIDER been used?
• What are the challenges in implementing EpiSPIDER?
• Overall challenges in event-based surveillance
• Next steps
• Summary
What is EpiSPIDER?
The acronym stands for Semantic Processing and Integration of Distributed Electronic Resources for Epidemics and Disasters.
Key words:
• Semantic processing
• Integration of distributed electronic resources
  ◦ “Mashup”
  ◦ Visualization
Why was EpiSPIDER built?
2005: Request from ProMED Mail to represent their emerging infectious disease reports in time and space and provide RSS feeds to their members
2006: Growth beyond ProMED Mail and Google Maps
2009 and beyond: Leveraging linked data to reduce information overload
Why was EpiSPIDER built?
• Early response to disease outbreaks is a public health priority.
• Emerging infectious diseases may not be part of routine public health reporting in many countries.
• We can potentially leverage non-traditional sources of data to provide practitioners with early warning.
• Specifically, we can leverage Internet “killer applications” to collect and exchange health event information.
• Computer algorithms such as natural language processing (NLP) and text mining can extract and visualize event information from unstructured data (80% of health information remains locked in free text).
The Role of Information Technology and Surveillance Systems in Bioterrorism Readiness. Bioterrorism and Health System Preparedness, Issue Brief No. 5. AHRQ Publication No. 05-0072, March 2005. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/news/ulp/btbriefs/btbrief5.htm
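As a minimal illustration of pulling event information out of free text, the sketch below counts disease mentions by matching against a small keyword dictionary. The dictionary, the function name `find_disease_mentions`, and the sample sentence are hypothetical; simple dictionary lookup is a stand-in for the NLP and text-mining services the slides describe, not a reproduction of them.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary mapping surface forms to a preferred disease term.
DISEASE_TERMS = {
    "cholera": "cholera",
    "avian influenza": "avian influenza",
    "bird flu": "avian influenza",
    "dengue": "dengue",
    "measles": "measles",
}

# Try longer terms first so multi-word terms win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(DISEASE_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def find_disease_mentions(text: str) -> Counter:
    """Count preferred disease terms mentioned in free text (case-insensitive)."""
    return Counter(DISEASE_TERMS[m.group(1).lower()] for m in _PATTERN.finditer(text))

report = "Officials confirmed 12 cholera cases; rumors of bird flu and Cholera deaths persist."
print(find_disease_mentions(report))
```

Mapping synonyms such as “bird flu” to one preferred term is the smallest step toward the shared terminologies discussed later in the deck.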
What is event-based surveillance? WHO definition
Definition: The organized and rapid capture of information about events that are a potential risk to public health
Event information can include rumors and other ad hoc reports transmitted through formal channels (i.e., established routine reporting systems) and informal channels (i.e., media, health workers, and reports from nongovernmental organizations), including:
• Events related to the occurrence of disease in humans, such as clustered cases of a disease or syndromes, unusual disease patterns, or unexpected deaths as recognized by health workers and other key informants in the country; and
• Events related to potential exposure for humans, such as events related to diseases and deaths in animals, contaminated food products or water, and environmental hazards including chemical and radio-nuclear events.
Information received through event-based surveillance should be rapidly assessed for the risk the event poses to public health and responded to appropriately
Source: WHO, A guide to establishing event-based surveillance, 2008. URL: http://www.wpro.who.int/internet/resources.ashx/CSR/Publications/eventbasedsurv.pdf
Role of event-based surveillance in national surveillance system (WHO)
Source: WHO, A guide to establishing event-based surveillance, 2008. URL: http://www.wpro.who.int/internet/resources.ashx/CSR/Publications/eventbasedsurv.pdf
Indicator-based surveillance
• Routine reporting of cases of disease, including:
  ◦ Notifiable disease surveillance system
  ◦ Sentinel surveillance
  ◦ Laboratory-based surveillance
• Commonly:
  ◦ Health care facility based
  ◦ Weekly or monthly reporting
Event-based surveillance
• Rapid detection, reporting, confirmation, and assessment of public health events, including:
  ◦ Clusters of disease
  ◦ Rumors of unexplained deaths
• Commonly:
  ◦ Immediate reporting
Response
• Linked to surveillance
• National and subnational capacity to respond to alerts
Role of event-based surveillance in national surveillance (ECDC)
• Indicator-based component: Surveillance systems → Data → Collect, Analyse, Interpret → Signal
• Event-based component: Event monitoring → Events → Capture, Filter, Validate → Signal
• Signal → Assess → Public health alert → Investigate → Control measures
• Disseminate:
  ◦ Confidential: EWRS
  ◦ Restricted access: network inquiries, ECDC threat bulletin
  ◦ Public: Eurosurveillance, press release, web site
Paquet C, et al. Epidemic intelligence: A new framework for strengthening disease surveillance in Europe. Euro Surveill. 2006;11(12):212-4. URL: http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=665
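To make the event-based branch of this framework concrete, here is a minimal sketch of capture, filter, and validate feeding signals into an assessment step. The `Event` record, the keyword filter, the `confirmed` flag, and the “deaths escalate to an alert” rule are all hypothetical placeholders; they do not reproduce ECDC's actual criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Event:
    source: str      # where the report was captured (e.g., a news feed)
    text: str        # raw report text
    confirmed: bool  # hypothetical flag: verified by an official source?

# Hypothetical keywords standing in for the "filter" step.
KEYWORDS = ("outbreak", "cluster", "unexplained death")

def capture(raw: Iterable[Event]) -> List[Event]:
    return list(raw)

def filter_events(events: List[Event]) -> List[Event]:
    # Keep only reports that mention at least one keyword.
    return [e for e in events if any(k in e.text.lower() for k in KEYWORDS)]

def validate(events: List[Event]) -> List[Event]:
    # Keep only events marked as verified.
    return [e for e in events if e.confirmed]

def assess(signal: Event) -> str:
    # Placeholder risk rule: any mention of death escalates to an alert.
    return "public health alert" if "death" in signal.text.lower() else "monitor"

def event_based_branch(raw: Iterable[Event]) -> List[str]:
    signals = validate(filter_events(capture(raw)))
    return [assess(s) for s in signals]
```

Keeping each stage a separate function mirrors the framework's boxes, so a stage can be replaced (for example, an NLP service in place of the keyword filter) without touching the others.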
Major challenges in developing automated event-based surveillance systems
Can event-based surveillance systems be automated?
Major challenges:
• Describing what information can be extracted from event reports
• Identifying methods to extract the desired information
• Identifying methods to convert unstructured data to structured data
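Converting unstructured reports to structured data means mapping free text onto fixed fields. The sketch below assumes a hypothetical headline convention of the form “Disease - Place: details” and a tiny hypothetical country list; real reports rarely follow a single convention, which is why this appears as a challenge.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class EventRecord:
    disease: str
    country: Optional[str]
    reported: date
    headline: str

# Hypothetical gazetteer; a real system would call a georeferencing service.
COUNTRIES = {"zimbabwe", "indonesia", "mexico", "viet nam"}

# Hypothetical headline convention: "Disease - Place" optionally followed by ": details".
HEADLINE = re.compile(r"^(?P<disease>[^-:]+?)\s*-\s*(?P<place>[^:]+?)\s*(?::|$)")

def to_record(headline: str, reported: date) -> Optional[EventRecord]:
    m = HEADLINE.match(headline)
    if not m:
        return None  # text that does not fit the convention stays unstructured
    place = m.group("place").strip()
    country = place if place.lower() in COUNTRIES else None
    return EventRecord(m.group("disease").strip().lower(), country, reported, headline)
```

Returning `None` rather than guessing keeps unparseable reports visible for manual review instead of silently producing wrong fields.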
How was EpiSPIDER built?
Began as a fellowship project in 2005 with Dr. Raoul Kamadjeu
• Built on a “shoestring budget,” using open-source software and freely available web services and data sources:
  ◦ Linux, Apache, MySQL, and PHP (LAMP)
  ◦ Initially Scalable Vector Graphics, then Yahoo Maps and Google Maps
  ◦ Existing RSS feeds and unstructured web content
• Custom-developed NLP, later replaced with the OpenCalais NLP web service
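EpiSPIDER itself was built on PHP (LAMP); purely to illustrate consuming an existing RSS feed, here is a Python sketch using only the standard library. The sample feed is made up. RSS 2.0 places items at `channel/item`, each with `title`, `link`, and an optional RFC 822-style `pubDate`.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from typing import Dict, List

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example outbreak feed</title>
<item><title>Cholera - Zimbabwe</title><link>http://example.org/1</link>
<pubDate>Mon, 01 Dec 2008 10:00:00 GMT</pubDate></item>
<item><title>Dengue - Brazil</title><link>http://example.org/2</link></item>
</channel></rss>"""

def parse_rss(xml_text: str) -> List[Dict]:
    """Return title, link, and publication time for each RSS 2.0 item."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iterfind("./channel/item"):
        pub = item.findtext("pubDate")
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            # pubDate is optional in RSS 2.0; keep None when it is absent.
            "published": parsedate_to_datetime(pub) if pub else None,
        })
    return items

for entry in parse_rss(SAMPLE_RSS):
    print(entry["title"], entry["published"])
```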
The Ecosystem
• Definition: Any natural unit or entity including living and non-living parts that interact to produce a stable system through cyclic exchange of materials [NASA Earth Observatory Glossary].
• The concept can be applied to Internet-based applications that function as information-consuming or information-producing “organisms” that interact with each other interdependently through the exchange of information.
This information “ecosystem” has:
• Producers of data
• Transformers of data
• Consumers of data
http://earthobservatory.nasa.gov/Glossary/?mode=all
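The producer/transformer/consumer roles can be read as a chain in which each “organism” only has to agree on the record format it exchanges with its neighbors. A minimal sketch with hypothetical stand-in functions (no real services are called); the `VOCABULARY` set and record fields are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# Hypothetical vocabulary used by the stand-in keyword transformer.
VOCABULARY = {"measles", "cholera", "cluster", "outbreak"}

def producer() -> Iterable[Record]:
    """Stand-in data source, e.g. items read from a news feed."""
    yield {"text": "Measles cluster reported in Springfield"}

def add_language(rec: Record) -> Record:
    """Stand-in transformer; a real one might call a language-identification service."""
    return {**rec, "lang": "en"}

def add_keywords(rec: Record) -> Record:
    """Stand-in transformer: keep words that appear in the vocabulary."""
    words = [w for w in rec["text"].lower().split() if w in VOCABULARY]
    return {**rec, "keywords": ",".join(words)}

def consumer(records: Iterable[Record]) -> List[str]:
    """Stand-in consumer: render one line per record, as a feed reader might."""
    return [f"[{r['lang']}] {r['text']} ({r['keywords']})" for r in records]

def run_ecosystem(transformers: List[Callable[[Record], Record]]) -> List[str]:
    records: Iterable[Record] = producer()
    for transform in transformers:
        records = map(transform, records)  # lazily chain each transformer
    return consumer(records)

print(run_ecosystem([add_language, add_keywords]))
```

The consumer fails if a transformer it depends on is missing (no `lang` key), which is the interdependence the ecosystem metaphor points to.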
Graphical depiction of “ecosystem”
• Producers: ProMED Mail, UNDP, CIA, WAHID (unstructured text); Google News, Moreover, Reuters, WHO, GDACS, Twitter (RSS)
• Transformers: Yahoo Pipes, Dapper, OpenCalais, Alchemy, UMLSKS, uClassifier, Geonames, Google Translate, Yahoo Maps, Wikipedia
• EpiSPIDER
• Consumers: Exhibit (faceted browsing), Google Maps, KML, RSS, SMS via mobile provider (SMTP)
• Formats and protocols linking the components: RSS, GeoRSS, KML, JSON, RDF, XML, SOAP, REST, SMTP, SMS
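Consumers such as Google Earth read KML. A minimal sketch of turning georeferenced events into a KML document with the standard library; the `(name, latitude, longitude)` input shape and the sample point are hypothetical. Note that KML writes coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"

def events_to_kml(events: Iterable[Tuple[str, float, float]]) -> str:
    """Convert (name, latitude, longitude) tuples into a KML document string."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    for name, lat, lon in events:
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = name
        point = ET.SubElement(placemark, "Point")
        # KML coordinates are "longitude,latitude[,altitude]".
        ET.SubElement(point, "coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

print(events_to_kml([("Sample event", 46.2, 6.15)]))
```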
EpiSPIDER Web Services: Categories by Task
• Information retrieval: search engines, RSS feeds, raw HTML sources
• Information extraction: Dapper, Yahoo Pipes, Alchemy
• Language identification: Alchemy, Twitter, uClassifier
• Language translation: Google Translate
• Keyword extraction: Alchemy
• Named entity recognition: OpenCalais, Alchemy
• Text classification: uClassifier
• Visualization: SIMILE Exhibit, Google Visualization API, Google Maps
• Georeferencing: Google Maps, Yahoo Maps, Geonames, Twitter, OpenCalais, Alchemy
• Concept annotation: UMLS Knowledge Source Server
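Because several services cover the same task (georeferencing lists six), a system can try providers in order and fall back when one fails or changes its interface. The provider functions below are hypothetical stubs, not the real Google, Yahoo, or Geonames APIs.

```python
from typing import Callable, List, Optional, Tuple

Coord = Tuple[float, float]
Provider = Callable[[str], Optional[Coord]]

def geocode_with_fallback(
    place: str, providers: List[Tuple[str, Provider]]
) -> Tuple[Optional[str], Optional[Coord]]:
    """Return (provider name, coordinates) from the first provider that succeeds."""
    for name, provider in providers:
        try:
            result = provider(place)
        except Exception:
            continue  # treat errors (timeouts, changed responses) as a miss
        if result is not None:
            return name, result
    return None, None

# Hypothetical stubs standing in for real web-service clients.
def broken_service(place: str) -> Optional[Coord]:
    raise ConnectionError("service unavailable")

def small_gazetteer(place: str) -> Optional[Coord]:
    return {"geneva": (46.2, 6.15)}.get(place.lower())

print(geocode_with_fallback("Geneva", [("broken", broken_service), ("gazetteer", small_gazetteer)]))
```

Recording which provider answered makes it easier to notice when a preferred service has quietly stopped working.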
Technology Adoption Timeline
2005
• Data sources: RSS feeds (2), unstructured content (1)
• Visualization tools: Scalable Vector Graphics, JPGraph
• Web services: Yahoo Maps, askMEDLINE
• Products: RSS feeds, visualizations
2006
• Data sources: RSS feeds (4), unstructured content (3), email
• Visualization tools: Google Maps, Yahoo Maps, JPGraph
• Web services: Yahoo Maps, Google Maps, askMEDLINE, Geonames
• Products: RSS feeds, visualizations
2007
• Data sources: RSS feeds (8), unstructured content (4), email
• Visualization tools: SIMILE Exhibit, AJAX visualization tools
• Web services: Yahoo Maps, Google Maps, askMEDLINE, Geonames, Wikipedia
• Products: RSS and GeoRSS feeds, KML feeds, SMS, visualizations, custom products
2008
• Data sources: RSS feeds (8), unstructured content (4), email (server)
• Visualization tools: SIMILE Exhibit, AJAX visualization tools, Google Earth
• Web services: Yahoo Maps, Google Maps, Google Visualization API (1), askMEDLINE, Geonames, Wikipedia, UMLSKS, OpenCalais, Yahoo Pipes, Dapper
• Products: RSS and GeoRSS feeds, KML feeds, SMS, visualizations, custom products
2009
• Data sources: RSS feeds (9), unstructured content (6), linked data, email (server), social networks (Twitter)
• Visualization tools: SIMILE Exhibit, AJAX visualization tools, Google Earth, Wordle
• Web services: Yahoo Maps, Google Maps, Google Translate, Google Visualization API (3), askMEDLINE, Geonames, Wikipedia, UMLSKS, OpenCalais, Yahoo Pipes, Dapper, uClassifier, Alchemy, Twitter, URL services
• Products: RSS and GeoRSS feeds, KML feeds, SMS, visualizations, custom products
EpiSPIDER, 2005-2006: Scalable Vector Graphics map interface
EpiSPIDER, 2006: Google Maps interface
ProMED Mail RSS Feeds, 2006
EpiSPIDER, 2009: SIMILE Exhibit interface
EpiSPIDER, 2009
EpiSPIDER, 2009
EpiSPIDER, 2008: KML feeds for Google Earth
EpiSPIDER, 2009: SMS using mobile provider gateways
SMS messages shown: Server Load Alert, RSS Feed Outage, ProMED Mail Latest
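Mobile provider email-to-SMS gateways let a server send text messages as ordinary email over SMTP. A minimal sketch with Python's standard library; the gateway domain, sender address, subject line, and SMTP host are placeholders, since each provider publishes its own gateway address. Only message construction runs without a mail server; `send_alert` needs a reachable SMTP host.

```python
import smtplib
from email.message import EmailMessage

def build_sms_alert(number: str, gateway_domain: str, sender: str, text: str) -> EmailMessage:
    """Address an alert to a (hypothetical) provider email-to-SMS gateway."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{number}@{gateway_domain}"
    msg["Subject"] = "EpiSPIDER alert"  # placeholder subject
    # Trim to 160 characters, the length of a single classic SMS.
    msg.set_content(text[:160])
    return msg

def send_alert(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Hand the message to an SMTP server, which relays it to the gateway."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)

alert = build_sms_alert("5555550100", "sms.example.net", "alerts@example.org", "Server load alert")
print(alert["To"])
```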
How has EpiSPIDER been used?
• Access by type (most to least): RSS, Exhibit, KML
• Access by organization:
  ◦ Government agencies
  ◦ Academic institutions
  ◦ Research organizations
  ◦ Health departments
• Access by individuals
Challenges in implementing EpiSPIDER
• Changing nature of data
• Emergent nature of web services
• Understanding and developing connections with complex APIs
• Information extraction and data linking challenges
• Service delivery expansion increases resource demands
Changing nature of web data (Challenges in implementing EpiSPIDER)
• Challenges with underlying HTML structure
  ◦ Non-standard HTML use prevents effective parsing of content
• Need to map data to shared terminologies, ontologies, and knowledge metadata
  ◦ For better integration into an information ecosystem, the system needs to let other “organisms” know what information it needs and what type of information it produces
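Non-standard HTML (unclosed tags, inconsistent case) breaks parsers that expect well-formed markup. A minimal sketch, assuming only the standard library: `html.parser.HTMLParser` tolerates malformed input and lowercases tag names, so text can still be collected from each table cell. The sample markup and class name are hypothetical.

```python
from html.parser import HTMLParser
from typing import List

class CellTextExtractor(HTMLParser):
    """Collect text inside <td> cells, tolerating missing closing tags."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # A new <td> starts a new cell even if the previous one was never closed.
            self._in_cell = True
            self.cells.append("")
        elif tag in ("tr", "table"):
            self._in_cell = False

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data

def extract_cells(html: str) -> List[str]:
    parser = CellTextExtractor()
    parser.feed(html)
    parser.close()
    return [c.strip() for c in parser.cells]

# Malformed sample: missing </td> and </tr>, mixed-case tag names.
SAMPLE = "<TABLE><tr><TD>Cholera<td>Zimbabwe<tr><td>Dengue<TD>Brazil</table>"
print(extract_cells(SAMPLE))
```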
Emergent nature of web services
• Adapting to changing interfaces
  ◦ Must go beyond “taping” applications together manually; automated “duct tape” adjustments are needed
  ◦ Automated adjustment is difficult for some (non-SOAP) interfaces
• Feed URL changes
  ◦ Tracking them requires subscribing to multiple mailing lists
• Changes in the data structure of service responses
  ◦ Services may add new data elements, e.g., new Twitter geolocation elements
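When a service adds fields (the slide's example is new Twitter geolocation elements), code that indexes fields directly either breaks or ignores the new data. A minimal sketch of tolerant parsing; the payloads are hypothetical, loosely modeled on a GeoJSON-style `{"type": "Point", "coordinates": [lon, lat]}` object, and are not a specification of any real service's response.

```python
import json
from typing import Optional, Tuple

def extract_point(payload: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) if the response carries a point, else None.

    Unknown extra fields are ignored and missing ones are tolerated,
    so older and newer versions of the (hypothetical) response both parse.
    """
    data = json.loads(payload)
    geo = data.get("coordinates")  # absent or null in older responses
    if not isinstance(geo, dict) or geo.get("type") != "Point":
        return None
    coords = geo.get("coordinates")
    if not isinstance(coords, list) or len(coords) < 2:
        return None
    lon, lat = coords[0], coords[1]  # GeoJSON order: longitude first
    return float(lat), float(lon)

old = '{"text": "fever cluster"}'
new = '{"text": "fever cluster", "coordinates": {"type": "Point", "coordinates": [-99.13, 19.43]}, "new_field": 1}'
print(extract_point(old), extract_point(new))
```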
Understanding complex APIs
• APIs are in continuous development
  ◦ Complexity is increasing
  ◦ Knowledge bases are rapidly expanding
  ◦ Example: OpenCalais and Alchemy have added named entities, relationships, and linked data (Wikipedia, Freebase) for disambiguation
• Promising developments
  ◦ The number of APIs in different task categories is increasing
Information extraction and data linking challenges
• Named entity recognition and disambiguation
  ◦ Web services may lag behind in recognizing emerging diseases as named entities and may return non-specific references
  ◦ Example: H1N1 may just be tagged as “influenza” (non-specific)
• Missing piece: a UMLS Knowledge Source Server web service for named entity extraction and concept annotation
  ◦ Currently a standalone download: MetaMap Transfer (MMTx)
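One workaround for non-specific entity tags is a post-processing rule that looks for a more specific mention in the same text before accepting the generic label. A minimal sketch, assuming a hypothetical regular expression for influenza A subtype names (H<n>N<n>); it is not how OpenCalais, Alchemy, or MetaMap work internally.

```python
import re
from typing import List

# Hypothetical pattern for influenza A subtype names such as H1N1 or H5N1.
SUBTYPE = re.compile(r"\bH([1-9]|1[0-8])N([1-9]|1[01])\b", re.IGNORECASE)

def refine_entities(text: str, entities: List[str]) -> List[str]:
    """Replace a generic 'influenza' tag with any subtype named in the text."""
    subtypes = sorted({m.group(0).upper() for m in SUBTYPE.finditer(text)})
    refined: List[str] = []
    for ent in entities:
        if ent.lower() == "influenza" and subtypes:
            refined.extend(f"influenza A({s})" for s in subtypes)
        else:
            refined.append(ent)
    return refined

print(refine_entities("Novel H1N1 cases rise in Mexico", ["influenza", "Mexico"]))
```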
Service delivery increases resource demands
• Managing contention for scarce computing resources
  ◦ How to process huge amounts of information without crashing the server
  ◦ Automated responses to monitored parameters (feedback loop)
  ◦ Avoiding process collisions
• Alerting mechanisms
  ◦ How to send alerts when the server is about to crash
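The feedback-loop and collision-avoidance ideas can be combined in a guard that runs before each batch job: skip the run if another instance holds a lock file, and send an alert instead of processing when load exceeds a threshold. A minimal sketch; the lock path, the threshold, and the `alert` callback are hypothetical, and `os.getloadavg` exists only on Unix-like systems.

```python
import os
import tempfile
from typing import Callable

# Hypothetical default lock location for the batch job.
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "episp_batch.lock")

def current_load() -> float:
    """1-minute load average (Unix-like systems only)."""
    return os.getloadavg()[0]

def run_guarded(job: Callable[[], None], alert: Callable[[str], None],
                max_load: float = 4.0, lock_path: str = DEFAULT_LOCK,
                load_fn: Callable[[], float] = current_load) -> str:
    """Run job unless another run holds the lock or the server is overloaded."""
    try:
        # O_EXCL makes creation atomic: it fails if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return "skipped: another run in progress"
    try:
        os.write(fd, str(os.getpid()).encode())
        load = load_fn()
        if load > max_load:
            alert(f"Server load alert: {load:.2f} exceeds {max_load:.2f}")
            return "deferred: load too high"
        job()
        return "completed"
    finally:
        # Release the lock only if this process acquired it.
        os.close(fd)
        os.remove(lock_path)
```

Passing `alert` as a callback lets the same guard feed the SMS channel shown earlier or simply write to a log.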
Overall challenges in event-based surveillance for public health threats
• Increasing dependence on, and need for development of, semantic tools to:
  ◦ Identify emerging outbreaks
  ◦ Assign outbreak severity
  ◦ Track escalation or decline, social disruption, and government response over time
• Promoting semantic data sharing among similar systems:
  ◦ Shared terminologies
  ◦ Ontologies
  ◦ Knowledge metadata
Chute C. Biosurveillance, Classification, and Semantic Health Technologies (editorial), J Am Med Inform Assoc. 2008;15:172–173.
Advantages of web services
• Main advantages
  ◦ Outsource complex tasks to agents who can devote resources and economies of scale to deliver high-quality, reliable services and outputs
  ◦ Promote use of standards for information exchange
• Other advantages
  ◦ Develop and reuse standard tools for processing unstructured information
What could be next steps?
• Critical
  ◦ Incorporate an ontology for event-based surveillance and map the knowledge base to it, to enable sharing of data across event-based surveillance systems
  ◦ Implement event-based surveillance systems at the national level to enable targeted, distributed collection of event-based data
  ◦ Expose the underlying database as Resource Description Framework (RDF) or other standards-based data
  ◦ Collaborate across event-based surveillance systems to enable system-to-system interoperability
• Non-critical
  ◦ Continue to explore new data sources
  ◦ Provide an annotated view of news articles
  ◦ Provide citizen reporting and participatory information processing interfaces to end users
Summary
• An inflection point in the evolution of web services is just “around the corner”
• Challenges remain in:
  ◦ Automation and integration of web services in event-based surveillance systems
  ◦ Integrating event-based surveillance in national surveillance systems (local public health context)
  ◦ Enabling sharing of data across event-based surveillance systems
Acknowledgements
• NCIRD: Raoul Kamadjeu
• NLM: Paul Fontelo, Fang Liu, Olivier Bodenreider
• ProMED Mail: Larry Madoff, Marjorie Pollack, Alison Bodenheimer, Drew Tenenholz
The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention