IBM Research
© 2006 IBM Corporation
Technology Challenges for Health Monitoring and Automated Stream Analysis
Archan Misra (archan@us.ibm.com), IBM T.J. Watson Research Center, USA
With contributions from Watson colleagues (Marion Blount, Maria Ebling, Anastasios Kementsietsidis, Iqbal Mohomed, Daby Sow, Min Wang) and IBM UCL Korea
Contents of Talk
- Motivation and Challenges for Remote Monitoring and Analytics
- Harmoni: Context-based Event Processing on the Mobile Hub
- Century Technologies: Stream Storage, Provenance, and QoE
Chapter 1
The Motivation and Challenges for Remote Monitoring and Analytics
Remote Health Monitoring: The Business Motivation
- The United States spends $1.9 trillion on healthcare, or more than 16% of its GDP.
- More than 133 million Americans live with chronic illnesses.
  - Chronic diseases account for 70% of all deaths in the United States.
  - The medical care costs of people with chronic diseases account for more than 75% of the nation's $1.4 trillion medical care costs.
  - Chronic diseases require long-term management.
- By 2010, the US will have more citizens aged 65 or over than at any time in its history.
  - A deficit of 200,000 doctors is projected by 2010.
  - Huge growth is forecast in managed care residences.
  - 45 million Americans (15% of the population) will be 65 years or older by 2015.
  - There were about 4 million long-term care beds in the US as of 2003.
[Figure: Information-based Medicine: The Remote Monitoring Roadmap (source: Kathy Schweda). Stages: Traditional HC (non-specific; treat symptoms), Information Augmented, Organized (error reduction), and Personalized (disease prevention), with labels for Evolutionary Practices and Revolutionary Technology. Milestones shown: patient-reported data, episodic treatment, electronic health records, information correlation, 1st-generation diagnosis, chronic disease management, clinical trial data collection, in-patient automated vitals, rules-based clinical response, pre-symptomatic treatment, lifetime treatment, remote monitoring, and personalized health care. Enablers shown: automated systems, throughput analytics, data and systems integration, CDI, and clinical decision support.]
Research Background: From Today to 2015
Ubiquitous computing technologies, patient-centric networks, and healthcare analytics are the key factors in establishing the healthcare ecosystem.
[Figure: Two care-delivery segmentation grids.
Patients: health status (healthy, minor ailments, at risk, acutely ill, chronically ill, catastrophically ill); in the second grid, age group (infants, adolescents, adult men, adult women, senior men, senior women).
Care delivery: setting (rural, suburban, urban); socio-economic status (high, medium, low); access (in person, telephonic, electronic); location (home, outpatient setting, hospital/emergency department, long-term care, Internet, call center); service (wellness/risk assessment, prevention, acute care, chronic care, complementary care; in the second grid, risk assessment/prevention, acute diagnosis, acute treatment, chronic diagnosis, chronic treatment); provider (traditional providers, public/private insurers, alternate providers, midlevel providers, health infomediaries); catchment area (local, regional, national, international).]
Ubiquitous Computing Technologies
- Emergence of sensor devices that generate more complex events at higher rates (e.g., ETRI's blood test sensor; SNU's non-intrusive ECG sensor and multichannel ECG)
- Low-power wireless communication (e.g., Bluetooth, ZigBee)
Patient-Centric Network (Interoperability)
- IHE, Continua
Healthcare Analytics Research
- Techniques to monitor the effects of drugs on patients
- Biomedical trend analysis
- Preventive analysis
IT Infrastructure
Remote Health Monitoring: The Opportunity
- Long-term monitoring offers benefits:
  - Early disease detection and trend analysis for healthy and at-risk individuals
  - Treatment and progress monitoring for patients
  - Monitoring of participants in drug trials or experimental treatments, to gauge efficacy and side effects
  - Reduced workload on doctors, nurses, and other healthcare providers
- Enabled by rapid improvements in two key technologies:
  - Improvements in wireless communications (WiFi, 3G, Bluetooth)
  - Continuing miniaturization of wireless sensors
[Figure: Sensor data and a patient diary relayed over Bluetooth (BT) to a server.]
Chapter 2
Harmoni: Context-based Event Processing on the Mobile Hub
HARMONI (Healthcare Adaptive Remote Monitoring): Overview
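[Figure: Sensors connect to the mobile hub over a Bluetooth PAN; the hub sends data to the server over a GPRS WAN.]
Type of Sensor Device | Bits/Sensor Sample | Channels/Device | Raw Data Rate (KB/day)
GPS | 1408 | 1 | 14,850
SpO2 | 3000 | 1 | 94,922
EKG (cardiac) | 12 | 6 | 194,400
Accelerometer | 64 | 3 | 202,500
EEG (brain) | 12 | 12 | 388,800
EMG (muscle) | 12 | 6 | 777,600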
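The raw rates in the table are consistent with bits/sample × channels × samples/s × 86,400 s/day, divided by 8 × 1024 to express the result in KB. The slide does not state sampling rates; the values below (1 Hz GPS, 3 Hz SpO2, 256 Hz EKG and EEG, 100 Hz accelerometer, 1024 Hz EMG) are assumptions, back-calculated so that this formula reproduces the table. A minimal sketch:

```python
SECONDS_PER_DAY = 86_400


def raw_rate_kb_per_day(bits_per_sample: int, channels: int, samples_per_sec: float) -> float:
    """Raw sensor output in KB/day, taking 1 KB = 1024 bytes."""
    bits_per_day = bits_per_sample * channels * samples_per_sec * SECONDS_PER_DAY
    return bits_per_day / 8 / 1024


# (bits/sample, channels, samples/s); the sampling rates are back-calculated assumptions.
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG (cardiac)": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG (brain)": (12, 12, 256),
    "EMG (muscle)": (12, 6, 1024),
}

if __name__ == "__main__":
    for name, (bits, channels, hz) in SENSORS.items():
        print(f"{name:<14} {raw_rate_kb_per_day(bits, channels, hz):>10,.0f} KB/day")
```

The 3 Hz SpO2 assumption matches the Nonin monitor's three packets per second described later; the other rates are only plausible guesses that fit the numbers.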
Need to reduce uplink transmission rates/power.
HARMONI Novel Features
1. Context-aware event filtering: IF (user in "gym" & 90 < hr < 120) THEN send AVG(hr)/min
2. Personalization of rules: apply machine learning at the server to learn what is "normal" for each user (e.g., "normal" for user A at the gym = 110-150)
3. Anticipation-driven transmission: based on past behavior, if Prob(802.11 within 2 hrs) > 0.7, cache data
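The first two features can be sketched together as a per-minute filter. This is a minimal illustration, not HARMONI's rule engine: it assumes one context label per minute of heart-rate samples, that "send AVG(hr)/min" means replacing a minute of in-band samples with a single average, and that out-of-band minutes are sent raw. The band tables, user IDs, and `filter_minute` are hypothetical; only the 90-120 gym band and user A's 110-150 band come from the examples above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple

Band = Tuple[float, float]  # exclusive (low, high) "normal" HR range in bpm

# Hypothetical rule tables: a generic band per context, plus per-user overrides
# of the kind a server-side learner could push down to the device.
GENERIC_BANDS: Dict[str, Band] = {"gym": (90, 120)}
PERSONAL_BANDS: Dict[Tuple[str, str], Band] = {("userA", "gym"): (110, 150)}


@dataclass
class Message:
    kind: str            # "avg": one summary value; "raw": every sample
    values: List[float]


def normal_band(user: str, context: str) -> Optional[Band]:
    """Prefer a personalized band; fall back to the generic one, if any."""
    return PERSONAL_BANDS.get((user, context), GENERIC_BANDS.get(context))


def filter_minute(user: str, context: str, hr_samples: List[float]) -> Message:
    """Replace a minute of in-band HR samples with their average; otherwise send raw."""
    band = normal_band(user, context)
    if hr_samples and band is not None and all(band[0] < hr < band[1] for hr in hr_samples):
        return Message("avg", [mean(hr_samples)])
    return Message("raw", list(hr_samples))


if __name__ == "__main__":
    print(filter_minute("userB", "gym", [130, 140]))  # outside the generic band
    print(filter_minute("userA", "gym", [130, 140]))  # inside user A's personal band
```

The same readings are summarized for user A but sent raw for a user on the generic rule, which is the bandwidth effect personalization is meant to capture.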
HARMONI Architecture: Key Components on the Mobile Device
- Lightweight rule/event processing engine
  - Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action, e.g., if 60 < AVG(last 10 heart rate values) < 90, then "transmit AVG to server"
  - Processed events themselves act as predicates for new rules
- Rule Manager
  - Coordinates with the server to determine patterns of current interest and the consequent actions
  - The RM populates or modifies the rules in the event engine
- Intelligent Data Transmission
  - Compresses the filtered data (from the event engine) for transmission to the server
- Anticipation Mechanism
  - Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates
HARMONI Functional Architecture
[Figure: A sensor connects to the mobile device over a short-range wireless link, and the mobile device connects to a remote server over a wireless link to the Internet. Mobile-side components shown: data collection/data adapters, light-weight pattern recognition engine, rule manager, action triggering mechanism, intelligent data transmission, anticipation mechanism, and user interface, with inputs from context, sensor readings, sensor control, and device resources. Server-side components shown: DB, pattern recognition engine, pattern learning engine, rule server (TAPAS), and external action triggering mechanism, with external context sources, rule specifications, and action specifications.]
- Simple recognition of pre-specified temporal patterns across sensor streams
  - Event engine driven by deterministic finite automata (DFAs)
  - Actions associated with rules include data transmission, data transformation, and local triggers (alarms, reminders)
- Compressed transmission to the server when an interface is available AND the "anticipation" mechanism indicates "OK-to-Send"
  - Intelligent compression schemes reduce the volume of traffic
  - Store-and-forward otherwise: data are stored in a memory buffer and on FLASH storage
- The Rule Manager coordinates with the backend server to download currently active or applicable rules
  - Can run, activate, or deactivate multiple rules simultaneously and dynamically
  - Dynamic downloading of rules provides implicit "context awareness" at the client
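The transmission policy above (send only when an interface is up and anticipation says OK-to-Send, otherwise store and forward) can be sketched as follows. Assumptions: the anticipation input is a single probability that 802.11 will be available within 2 hours, reusing the 0.7 threshold from the novel-features example; the buffer lives in memory only (HARMONI also spills to FLASH); and Python's `zlib` (DEFLATE) stands in for whatever compression scheme HARMONI actually uses. `StoreAndForward` and its methods are hypothetical names.

```python
import zlib
from collections import deque
from typing import Deque, Optional


class StoreAndForward:
    """Buffer filtered readings; flush them compressed when sending is worthwhile."""

    def __init__(self, wifi_threshold: float = 0.7):
        self.buffer: Deque[bytes] = deque()
        self.wifi_threshold = wifi_threshold

    def ok_to_send(self, interface: Optional[str], p_wifi_within_2h: float) -> bool:
        if interface is None:                      # no uplink at all: must store
            return False
        if interface == "gprs" and p_wifi_within_2h > self.wifi_threshold:
            return False                           # 802.11 likely soon: keep caching
        return True

    def submit(self, payload: bytes, interface: Optional[str],
               p_wifi_within_2h: float) -> Optional[bytes]:
        """Queue a payload; return one compressed batch if it is OK to send now."""
        self.buffer.append(payload)
        if not self.ok_to_send(interface, p_wifi_within_2h):
            return None
        batch = b"".join(self.buffer)
        self.buffer.clear()
        return zlib.compress(batch)


if __name__ == "__main__":
    link = StoreAndForward()
    link.submit(b"hr=72;", None, 0.0)              # no interface: stored
    link.submit(b"hr=73;", "gprs", 0.9)            # cellular, but Wi-Fi expected: stored
    sent = link.submit(b"hr=74;", "wifi", 0.0)     # Wi-Fi: flush everything
    print(zlib.decompress(sent))
```

Batching before compressing also helps the compressor, since it sees longer runs of similar data.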
Pattern Recognition Engine Implementation
Rules are specified using EventScript, which is similar to regular expressions. A compiler (on a desktop computer) translates EventScript into a specification for a deterministic finite automaton (DFA). We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications. Effectively, we are running finite state machines with some tweaks:
- A unique memory heap for each DFA, accessible across states
- A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and on state transitions in the DFA
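EventScript syntax and the real engine are not reproduced here. The sketch below only illustrates the two tweaks listed above: each DFA owns a private heap visible from every state, and Python callables stand in for the Scheme snippets run at initialization and on transitions. The example rule (three or more consecutive HR samples above 100 bpm) and all names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

Heap = dict
Action = Callable[[Heap, float], None]
Transitions = Dict[Tuple[Hashable, str], Tuple[Hashable, Optional[Action]]]


class DFA:
    """Finite state machine with a private heap that every state can read and write."""

    def __init__(self, start: Hashable, transitions: Transitions, accepting: Set[Hashable],
                 on_init: Optional[Callable[[Heap], None]] = None):
        self.start = self.state = start
        self.transitions = transitions
        self.accepting = accepting
        self.heap: Heap = {}
        if on_init:
            on_init(self.heap)                  # stands in for the init-time script

    def feed(self, symbol: str, value: float) -> bool:
        """Consume one input symbol; return True if the DFA is now in an accepting state."""
        # Unlisted (state, symbol) pairs fall back to the start state with no action.
        next_state, action = self.transitions.get((self.state, symbol), (self.start, None))
        if action:
            action(self.heap, value)            # stands in for the per-transition script
        self.state = next_state
        return self.state in self.accepting


def remember(heap: Heap, hr: float) -> None:
    heap["recent"].append(hr)


def forget(heap: Heap, hr: float) -> None:
    heap["recent"].clear()


# Hypothetical rule: three or more consecutive HR samples above 100 bpm.
RULE: Transitions = {}
for s in range(4):
    RULE[(s, "H")] = (min(s + 1, 3), remember)
    RULE[(s, "N")] = (0, forget)


def symbol_for(hr: float) -> str:
    """Map a raw HR value onto the DFA's input alphabet."""
    return "H" if hr > 100 else "N"


if __name__ == "__main__":
    dfa = DFA(0, RULE, accepting={3}, on_init=lambda heap: heap.update(recent=[]))
    for hr in [95, 105, 110, 120]:
        if dfa.feed(symbol_for(hr), hr):
            print("pattern matched; samples:", dfa.heap["recent"])
```

Keeping the samples in the heap, rather than encoding them in the state, keeps the automaton small while still letting the matched action see the data behind the match.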
[Figure: On a desktop, a compiler translates EventScript into a DFA; the rule server on the remote server delivers it to the rule manager and event engine on the Nokia 770.]
Context-Based Filtering
HARMONI Implementation Platform
Nokia 770 Internet Tablet/N800
- ARM processor, Linux-based
- High-resolution display (800x480), touch screen with up to 65,536 colors
- 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
- Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
- Relatively cheap: $350
- http://www.maemo.org provides open-source software and a development environment
- Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor
- Provides heart rate and oxygen saturation
- Supports the Bluetooth Serial Port Profile (SPP)
- 120 hours of continuous operation with 2 AA batteries
- Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis accelerometer
- Output baud rate of 57.6 Kbps
- 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome, and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: Data generated from sensor: heart rate (bpm) vs. sample.]
- Compression (S2) by itself results in a >50% reduction in bandwidth consumption.
- Filtering (S3) results in an 85% reduction in bandwidth consumption.
- The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor.
[Figure: Data transmitted over network (KB) vs. context assumed (None, Office, Gym, Office/Gym), uncompressed and with LZ-78 compression.]
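The explanation for the small S3-to-S4 gain is that an LZ-family compressor only benefits from exact repeats. A textbook LZ78 encoder (our own illustration, not HARMONI's code) makes this concrete: a stream of identical integer readings parses into few phrases, while slightly varying floating-point readings of similar length parse into many more. The readings in the demo are synthetic.

```python
from typing import List, Tuple


def lz78_encode(text: str) -> List[Tuple[int, str]]:
    """Textbook LZ78: emit (index of longest known phrase, next character) pairs."""
    dictionary = {"": 0}
    pairs: List[Tuple[int, str]] = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                          # keep extending a known phrase
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                    # flush a trailing, already-known phrase
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decode(pairs: List[Tuple[int, str]]) -> str:
    """Rebuild the phrase table in the same order the encoder created it."""
    phrases = [""]
    for index, ch in pairs:
        phrases.append(phrases[index] + ch)
    return "".join(phrases[1:])


if __name__ == "__main__":
    exact = "72," * 200                                                       # identical readings
    noisy = ",".join(f"{72 + (i * 37 % 100) / 100:.2f}" for i in range(100))  # varying floats
    for name, s in (("exact", exact), ("noisy", noisy)):
        pairs = lz78_encode(s)
        print(f"{name}: {len(s)} chars -> {len(pairs)} pairs")
```

Quantizing or rounding readings before compression is one way to recover exact repeats, at the cost of precision.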
Result 2: Impact of Personalization on Event Filtering
[Figure: Variation in heart rate readings (beats per minute) over time for different individuals.]
[Figure: Data transmitted over network (KB) for users 2-1 and 2-2: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering.]
- Sensor streams exhibit significant statistical variation across individuals.
- The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor.
HARMONI in Practice: Sensor-based Context
- Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
  - Progressively higher "normal" thresholds for the 3 distinct states (0-70, 70-110, 110-170), applied across all users
  - Aside: accelerometer readings are also used to classify "falls" for elderly patients
- Bandwidth savings of between 26% and 73% for our sample population
Chapter 3
Century Technologies: Stream Storage, Provenance, and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytics infrastructure for the healthcare ecosystem that supports a variety of services.
Challenges:
- Integrating massive volumes of disparate data
- Need for sophisticated analytics
- Growing collaboration across the ecosystem
- Integrity of data and tracking of the analytic data
- Timely feedback
- Large number of concurrent users
- Privacy and security
Architectural Goals:
- Extensible data model; scalable infrastructure
- Open and scalable analysis framework
- Enterprise sharing of sensor data
- Quality of Events (QoE) and provenance support for backtracking and auditing
- Real-time processing
- Scalable device and user management
- Protection of personal medical data
Technologies and Approaches:
- Extensible Stream Storage System
- Distributed Stream Analysis Runtime
- Quality of Event
- Hybrid Provenance System
- Interoperability
- Privacy and Security Technology
- Stream Processing Platform
Analysis Framework (1): The Approach
Current Solutions
- Vertically designed systems for the analysis of health monitoring data exist today
  - Closed: custom-built for a specific diagnosis
  - Non-extensible: support for limited sensor devices
  - Small patient footprint: only good for trials
New Requirements
- Extensible
  - To new data sources (e.g., EMG, EEG, accelerometer, activity)
  - To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
- Scalability: handles large numbers of patients
- Robustness: resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings
- Easy to use: separates medical domain knowledge from IT knowledge
Our Innovations
- Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
- Programming model centered around the System S concept of a Processing Element (PE)
  - PEs represent atomic stream transformers
  - Stream data flow with pub-sub semantics
- Century also provides useful medical analytic services
  - Persistence of state (e.g., analyze HR variations over 6 months)
  - Management of patient data (e.g., a patient's intermediate diagnosis)
  - Computation and rebinding based on the quality of events from medical streams
- Century supports an open set of data sources
  - Currently supporting ECG, BP, glucose, weight, SpO2, and temperature
[Figure: An analysis job is a graph of analysis components (PEs), running from a data source through a source PE to downstream PEs (e.g., NES, IPN, VBF, DUP, WLG, SPA, SPM, DFE, PSD, INQ, GUI, N(CE)S^2) and joins.]
Major practical benefit: separates the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.).
Analysis Framework (2): PE Example
- PE developers need not be concerned with the communication of Stream Data Objects (SDOs).
- When an SDO is received, System S routes it to the appropriate PE.
- Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs.
[Figure: Example PE pipeline from input ECG to output alert; components shown: normalization, fast Fourier transform, AR PE, neural network (trained during initialization), and alert generator.]
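The division of labor described above (developers write per-SDO logic; the middleware handles routing) can be illustrated with a toy pub-sub router. This is not the System S API: `PE`, `Router`, `Smooth`, and `AlertGenerator` are hypothetical names, SDOs are modeled as plain dicts, and a 120 bpm average-HR alert stands in for the ECG pipeline in the figure.

```python
from statistics import mean
from typing import Dict, List

SDO = dict  # a stream data object, modeled as a plain dict with a "stream" key


class PE:
    """Hypothetical base class: a developer overrides process() and nothing else."""

    def process(self, sdo: SDO) -> List[SDO]:
        raise NotImplementedError


class Router:
    """Toy pub-sub router standing in for the middleware's stream routing."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[PE]] = {}
        self.sink: List[SDO] = []          # SDOs on streams nobody subscribes to

    def subscribe(self, stream: str, pe: PE) -> None:
        self.subscribers.setdefault(stream, []).append(pe)

    def publish(self, sdo: SDO) -> None:
        pes = self.subscribers.get(sdo["stream"], [])
        if not pes:
            self.sink.append(sdo)
        for pe in pes:
            for out in pe.process(sdo):
                self.publish(out)


class Smooth(PE):
    """Annotate: turn a window of HR samples into one average."""

    def process(self, sdo: SDO) -> List[SDO]:
        return [{"stream": "hr.avg", "patient": sdo["patient"], "value": mean(sdo["samples"])}]


class AlertGenerator(PE):
    """Create a new SDO only when the average crosses a threshold."""

    def __init__(self, limit: float) -> None:
        self.limit = limit

    def process(self, sdo: SDO) -> List[SDO]:
        if sdo["value"] > self.limit:
            return [{"stream": "alerts", "patient": sdo["patient"], "value": sdo["value"]}]
        return []


if __name__ == "__main__":
    router = Router()
    router.subscribe("hr", Smooth())
    router.subscribe("hr.avg", AlertGenerator(limit=120))
    router.publish({"stream": "hr", "patient": "p1", "samples": [118, 125, 130]})
    router.publish({"stream": "hr", "patient": "p2", "samples": [80, 82]})
    print(router.sink)
```

Neither PE knows where its input comes from or who consumes its output; rewiring the graph only touches the `subscribe` calls.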
Century Event Management Service
[Figure: Century server infrastructure. Components shown include a USN gateway (data transformer, sensor data sender), sensor data reception, a developer toolkit (IDE for application developers, sensor simulator, event management client library), patient, administrator, and stakeholder portals (doctor, lifestyle consultant, healthcare IT specialist, government/CDC), solution delivery services, the Event Management Service (event preprocessor, event filter/annotator, event storage manager, event store query service), the Provenance Service (provenance storage manager, provenance query manager and service, provenance access control, resolver), the analysis framework and analysis applications, a QoE manager, state manager, group manager, job/application manager, device catalogue, device, patient, and user group data managers, authentication and authorization, privacy manager, subscription service, remote access manager, an interoperability container with an IHE adapter, and storage for stakeholder, patient, device, and application information, state data, provenance metadata, events, group data, and privacy policies.]
Event Management Service (1)
The Problem
- Persistence of large amounts of streaming data
- Efficient query mechanisms for stakeholders
Requirements
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language
Potential Approaches
1. Relational DB (annotation table): not extensible; limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: An annotator writing annotated events over time (t = 1, 2, 3) into each store.]
Event Management Service (3): Hybrid Storage Model
Key assumptions:
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate
[Figure: Bursty arrival rate over time, compared with the average arrival rate and the average database service rate.]
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ.
[Figure: Database behavior as a function of ρ, with an instability region as ρ approaches 1.]
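The slide assumes Poisson arrivals with λ < μ on average. If we additionally assume exponentially distributed service times (an M/M/1 queue; this is our assumption, not the slide's), standard queueing results show why operating near ρ = 1 is dangerous: the mean number of events in the system, ρ/(1 - ρ), and the mean time in the system, 1/(μ - λ), both grow without bound as ρ approaches 1. A minimal sketch:

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 metrics; only defined when rho = lambda / mu < 1."""
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError(f"unstable queue: rho = {rho:.2f} >= 1")
    return {
        "rho": rho,
        "mean_in_system": rho / (1 - rho),                         # L = rho / (1 - rho)
        "mean_time_in_system": 1 / (service_rate - arrival_rate),  # W = 1 / (mu - lambda)
    }


if __name__ == "__main__":
    mu = 100.0  # events/s the database can absorb (illustrative value)
    for lam in (50.0, 80.0, 90.0, 95.0, 99.0):
        m = mm1_metrics(lam, mu)
        print(f"rho={m['rho']:.2f}  L={m['mean_in_system']:6.1f}  "
              f"W={m['mean_time_in_system'] * 1000:6.1f} ms")
```

Bursts that temporarily push λ above μ are exactly what the stream buffer is meant to absorb, so the database itself never sees the unstable regime.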
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)
[Figure: An annotator sends annotated XML events over time (t = 1, 2, 3) through a stream buffer into the Relational-XML DB.]
Stream buffer: how do we design the stream buffer?
D. Sow, L. Lim, M. Wang, and A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
Event Management Service (4): The Stream Buffer
[Figure: Stream buffer design. An analysis graph (PE1 to PE5, SPE) issues storage requests (events of type "eventStore"); components shown include the System S STG (file system store), a traffic monitor, traffic models, a traffic predictor, an event store load monitor, a database PE, and the event store (Relational-XML DB), with output flowing toward the delivery system.]
Approach:
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database
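One simple way to "throttle the traffic arriving at the database" is to admit events at a fixed budget per tick, chosen so that the admitted rate keeps ρ safely below 1, and to queue the surplus. The sketch shows only that admission step; the traffic models and predictor in the figure are not modeled, and `ThrottledStreamBuffer` and `max_per_tick` are hypothetical names.

```python
from collections import deque
from typing import Deque, Iterable, List


class ThrottledStreamBuffer:
    """Admit at most `max_per_tick` events to the database per tick; queue the rest."""

    def __init__(self, max_per_tick: int) -> None:
        self.max_per_tick = max_per_tick
        self.pending: Deque[object] = deque()

    def tick(self, arrivals: Iterable[object]) -> List[object]:
        """Accept this tick's arrivals and return the events to write to the database now."""
        self.pending.extend(arrivals)
        n = min(self.max_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]


if __name__ == "__main__":
    # e.g. a database that absorbs 100 events/tick, throttled to rho = 0.8
    buf = ThrottledStreamBuffer(max_per_tick=80)
    for burst in (200, 0, 0, 50):
        admitted = buf.tick(range(burst))
        print(f"arrived={burst:3d} admitted={len(admitted):3d} backlog={len(buf.pending):3d}")
```

A fixed budget is the simplest policy; a traffic predictor, as in the figure, could instead raise or lower the budget ahead of expected bursts.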
Century Provenance Service
[Figure: Century server infrastructure (same diagram as on the Century Event Management Service slide), here in the context of the Provenance Service.]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the drug's effect on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:
URGENT ALERT
Patient: Doe, John
Condition: Abnormal reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating: a known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept this emerging technology, we must provide them with the information they need to make these decisions responsibly. The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
Process provenance: the ability to retrace which "system components" were on the data path.
[Figure: PEs running algorithms f1, f2, f3 report to a provenance server/store. PE1: "I ran algorithm f1 with parameters u, v, and w at time t0; I published on port 1." PE2: "I received some data on port 2; I ran algorithm f3 on the data at time t1."]
Data provenance: the ability to retrace which "data elements" led to the generation of output data.
[Figure: Two sensors produce Data1 and Data2; PEj reports: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
Provenance (2): Prior Work on Provenance
[Figure: Prior provenance systems positioned by type of provenance reconstruction (process vs. data provenance) and by event and processing rates.]
- EU Provenance (PASOA, PreServ): captures workflow bindings
- Scientific SOA workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
- Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
- Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
- File systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
- Medical stream provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
- Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
- Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
- Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
- Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that:
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model
We need to support both:
- Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
- Data provenance: "Show me the stream segments related to the generation of alert Ei"
TVC Model of Hybrid Provenance
- Associate an "equation-based" specification of data dependency between every output port of an individual PE and the corresponding input ports
  - The dependency is thus not specified on every data element, but remains constant for the same processing logic
- Provenance data is recorded at the granularity of stream "subsets": dependence over sets of time windows
  - The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ
$$
S_i^{\mathrm{out}}(t) \Leftarrow \bigcup_{j \,:\, S_j \text{ is input to Component } i} S_j^{\mathrm{inp}}(t_k), \qquad t_k \in [\, t - \Delta_j,\; t \,]
$$
[Figure: A processing element with input streams Sj (samples Sj(t) to Sj(t-3)) and Sk (samples Sk(t) to Sk(t-3)) and output stream Si (samples Si(t) to Si(t-2)).]
TVC dependency rule blocks (PE state defined by location context):
Rule Block ID | PE State | Dependency Rule
1 | "gym" | Ei(t) ⇐ Sj(t) ∪ Sj[t-3, t-2] ∪ Sk[t-2, t-1]
2 | "home" | Ei(t) ⇐ Sj[t-2, t] ∪ Sk[t-1, t]
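The rule blocks above can be represented as lists of relative time windows keyed by PE state. A minimal sketch, assuming integer time steps and inclusive windows (so Sj[t-3, t-2] means the samples at t-3 and t-2); `RULE_BLOCKS`, `causative_windows`, and `causative_elements` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (input stream, start offset, end offset), both offsets relative to the output time t
Term = Tuple[str, int, int]

RULE_BLOCKS: Dict[str, List[Term]] = {
    "gym": [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],   # rule block 1
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],                  # rule block 2
}


def causative_windows(t: int, state: str) -> Dict[str, List[Tuple[int, int]]]:
    """Absolute inclusive [start, end] windows, per input stream, behind output Ei(t)."""
    windows: Dict[str, List[Tuple[int, int]]] = {}
    for stream, lo, hi in RULE_BLOCKS[state]:
        windows.setdefault(stream, []).append((t + lo, t + hi))
    return windows


def causative_elements(t: int, state: str,
                       inputs: Dict[str, Dict[int, float]]) -> List[Tuple[str, int]]:
    """inputs maps stream -> {timestamp: value}; return the (stream, ts) pairs Ei(t) depends on."""
    result: List[Tuple[str, int]] = []
    for stream, spans in causative_windows(t, state).items():
        for lo, hi in spans:
            result.extend((stream, ts) for ts in sorted(inputs[stream]) if lo <= ts <= hi)
    return result


if __name__ == "__main__":
    inputs = {"Sj": {ts: 0.0 for ts in range(5, 11)}, "Sk": {ts: 0.0 for ts in range(5, 11)}}
    print(causative_elements(10, "home", inputs))
    print(causative_elements(10, "gym", inputs))
```

Only the rule blocks and the PE state need to be stored; the causative set is recomputed from them when a query arrives.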
The dependency equation can, however, vary based on the PE's internal state or some external context. For example:
  IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
  ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
- Solution: allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data
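The state-dependent rule above, together with "store the state as metadata with the data", might look like the following sketch. Assumptions: HR samples are (timestamp, bpm) pairs on an integer time axis, windows are inclusive, and the output record format and the name `hr_alert` are hypothetical; the thresholds and window lengths are the ones in the example.

```python
from statistics import mean
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp, heart rate in bpm)


def hr_alert(loc: str, samples: List[Sample], t: int) -> Optional[dict]:
    """State-dependent alert rule; the state is recorded with any alert it produces."""
    if loc == "gym":
        window, limit = 10, 140    # ALERT iff AVG(HR(t-10, t)) > 140
    else:
        window, limit = 30, 90     # ALERT iff AVG(HR(t-30, t)) > 90
    recent = [hr for ts, hr in samples if t - window <= ts <= t]
    if recent and mean(recent) > limit:
        # Storing the PE state lets a provenance query pick the matching dependency rule.
        return {"t": t, "alert": "HR", "state": loc, "window": (t - window, t)}
    return None


if __name__ == "__main__":
    samples = [(ts, 100.0) for ts in range(0, 31)]
    print(hr_alert("gym", samples, 30))    # 100 bpm is normal at the gym
    print(hr_alert("home", samples, 30))   # the same readings raise an alert at home
```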
Resolving TVC-based Provenance Queries
- Reconstruct provenance only in response to a corresponding query
  - For efficiency, the relevant metadata is distributed across multiple DBs
  - Tradeoff: faster provenance storage and processing in exchange for higher provenance reconstruction overhead
- Determine the IDs of the input ports and the causative stream bindings
- Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
- Apply the TVC filter to obtain the causative set of elements
[Figure: A provenance query server resolves queries using three stores.]
- PC dependency table, e.g., (PEj, f1(), Po1)
- Dynamic stream mapping table (stream and port mapping, SAPM), e.g., (t=10, S1, Pi1), (t=15, S15, I4), (t=16, S17, O6)
- Data store, e.g., element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home"
At storage time:
1. Store bindings between every input/output port and the associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port/input port
At query time, Query (Ei):
1. Obtain I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve elements
4. Apply the equation to determine the set of causative elements, and return the set of elements
M. Blount, J. Davis, A. Misra, D. Sow, and M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
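The query path above can be sketched against three small in-memory tables. Everything in this sketch is hypothetical: the table layouts, element IDs, and dependency function are illustrative, loosely modeled on the examples in the figure, and are not Century's schema. One design point it does capture is that each candidate element is checked against the stream bound to the port at that element's own timestamp, so provenance stays correct across dynamic rebinding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    eid: str
    ts: int
    sid: int      # ID of the stream the element was published on
    state: str    # PE state stored as metadata with the data


Windows = Dict[str, Tuple[int, int]]   # input port -> inclusive [start, end]

# Hypothetical provenance stores; names and contents are illustrative only.
DEPENDENCY_TABLE: Dict[Tuple[str, str], Callable[[int, str], Windows]] = {
    ("PEj", "Po1"): lambda t, state: {"Pi1": (t - 2, t) if state == "home" else (t - 1, t)},
}
STREAM_MAP: List[Tuple[int, int, str]] = [   # (binding time, stream ID, port)
    (10, 1, "Pi1"),
    (15, 12, "Pi1"),
]
DATA_STORE: List[Element] = [
    Element("e44", 14, 1, "home"),
    Element("e45", 15, 12, "home"),
    Element("e46", 16, 12, "home"),
    Element("e47", 17, 12, "home"),
]


def stream_on_port(port: str, ts: int) -> Optional[int]:
    """Stream bound to `port` at time ts: the latest binding made at or before ts."""
    bindings = [(bt, sid) for bt, sid, p in STREAM_MAP if p == port and bt <= ts]
    return max(bindings)[1] if bindings else None


def resolve(pe: str, out_port: str, t: int, state: str) -> List[str]:
    """Return IDs of the input elements that caused output (pe, out_port) at time t."""
    windows = DEPENDENCY_TABLE[(pe, out_port)](t, state)         # retrieve dependency equation
    causes: List[str] = []
    for port, (lo, hi) in windows.items():
        in_window = [e for e in DATA_STORE if lo <= e.ts <= hi]  # retrieve elements in window
        # keep only elements on the stream that was bound to this port at that moment
        causes += [e.eid for e in in_window if stream_on_port(port, e.ts) == e.sid]
    return causes


if __name__ == "__main__":
    print(resolve("PEj", "Po1", 16, "home"))   # window spans the rebinding at t=15
    print(resolve("PEj", "Po1", 16, "gym"))
```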
Key Research Benefits of TVC Model for Provenance
TVC model overhead decreases when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
- Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance/Karma systems)
- Statefulness: time and value functions allow compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 Kevents/sec) system throughput.
Quality of Events in Century
[Figure: Century server infrastructure (same diagram as on the Century Event Management Service slide), here in the context of Quality of Events.]
Quality of Events in Data Streams
- Prior work has focused on defining streams in terms of their output data type
  - Enables a pub-sub model of stream binding
- However, the quality of any stream (raw sensor data or intermediate analytic result) varies dynamically. Various types of dynamism:
  - Sensor (source) impairments: sensor faults (transient or persistent)
  - Transmission impairments: loss of stream elements, or delay/order reversal of streams
- The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
  - Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
  - Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
  - Dynamic rebinding of the stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
[Figure: A PE generates an ALERT from timestamped input streams, raising the question "What generated this alert, and can we trust it?" Annotations: one sensor is faulty and generates stream elements of low quality (remedy: replace sensor); in another stream, elements are missing (possibly due to network issues).]
Century QoE: Some Basic QoE Measures
1. Expected vs. received input. [Figure: the PE expected input elements t-5, t-4, t-3, t-2, t-1, t but received only t-5, t-2, t.] By correlating what was expected with what was received, we can estimate the quality of our input. In this case, we can say that the output of the PE has low quality, since half of the expected input is missing.
2. Timestamp correlation. [Figure: one input stream delivers elements t-12, t-11, t-10, while the other delivers t-5, t-2, t.] Another possibility is to correlate the timestamps of the inputs to estimate quality. In this case, we can say that the output of the PE has low quality, since the two inputs are out of sync.
3. Consumed vs. useful input. [Figure: the PE consumed input elements t-5 through t, but only element t was "useful".] Finally, we can correlate the consumed input with the useful input (the input actually used to trigger an output). Here, the output of the PE has low quality, since its generation can be thought of as circumstantial, due to the small fraction of useful input stream elements.
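The three measures above can be written as simple ratios over timestamps. A minimal sketch: the names `completeness`, `skew`, and `usefulness` are our own labels, and how Century combines such measures into a single QoE value is not specified here.

```python
from typing import Iterable


def completeness(expected: Iterable[int], received: Iterable[int]) -> float:
    """Fraction of expected timestamps that actually arrived."""
    exp = set(expected)
    return len(exp & set(received)) / len(exp) if exp else 1.0


def skew(stream_a: Iterable[int], stream_b: Iterable[int]) -> int:
    """Gap between the newest timestamps of two input streams (0 means in sync)."""
    return abs(max(stream_a) - max(stream_b))


def usefulness(consumed: Iterable[int], useful: Iterable[int]) -> float:
    """Fraction of consumed elements that actually contributed to an output."""
    used = set(consumed)
    return len(used & set(useful)) / len(used) if used else 0.0


if __name__ == "__main__":
    t = 100
    window = range(t - 5, t + 1)                       # timestamps t-5 .. t
    print(completeness(window, [t - 5, t - 2, t]))     # case 1: missing input
    print(skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))  # case 2: out-of-sync inputs
    print(usefulness(window, [t]))                     # case 3: little useful input
```

Each measure is cheap to compute per output, so it could be attached to an event as metadata in the same way the PE state is.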
Open Challenges in Health Stream Collection and Diagnostics
- Model-driven acquisition of data from sensor feeds
  - Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
  - HR may be obtained at different (accuracy, acquisition cost) points from ECG, SpO2, and BP monitors
  - How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
- Sensors have different, time-varying QoE
  - QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
  - How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams
- Provenance architectures must address the prohibitive cost of storing all stream data
  - Much of the intermediate analysis can be re-created if the state of analysis components can be stored
  - How to develop a hybrid combination of model-based and replay-based provenance
- Better techniques for establishing provenance relationships
  - User-specified models may not always be available: learn provenance from observed behavior
  - Explicit provenance may not be an accurate reflection of analysis logic
- Provenance data itself is sensitive and subject to privacy requirements
  - The right to access provenance may depend on the current "role" of the provider and the user's medical history
  - How to define context-dependent access-control models for provenance data in a multi-provider environment
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission
The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality
Contents of Talk
Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
[Figure: Harmoni on the mobile hub and Century on the server]
Chapter 1
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Remote Health Monitoring: The Business Motivation
The United States spends $1.9 trillion on healthcare, or more than 16% of its GDP
More than 133 million Americans live with chronic illnesses
– Chronic diseases account for 70% of all deaths in the United States
– The medical care costs of people with chronic diseases account for more than 75% of the nation's $1.4 trillion medical care costs
– Chronic diseases require long-term management
By 2010, the US will have more citizens aged 65 or over than at any time in its history
– 200,000-doctor deficit by the year 2010
– Huge growth forecast in managed care residences
• 45 million Americans (15% of the population) will be 65 years or older by 2015
• About 4 million long-term care beds in the US as of 2003
Information-based Medicine: The Remote Monitoring Roadmap
[Figure: roadmap from Traditional HC through Information Augmented care to Personalized Health Care, plotted against Evolutionary Practices, Revolutionary Technology and Throughput Analytics. Labels include patient-reported data, episodic treatment, electronic health records, clinical trial data collection, in-patient automated vitals, rules-based clinical response, remote monitoring, chronic disease management, pre-symptomatic treatment, lifetime treatment, automated systems, data and systems integration, information correlation, 1st-generation diagnosis, CDI, clinical decision support and Century; stages progress from Non-specific (Treat Symptoms) to Organized (Error Reduction) to Personalized (Disease Prevention). Source: Kathy Schweda]
Research Background: From Today to 2015
Ubiquitous computing technologies, patient-centric networks and healthcare analytics are the key factors in establishing the healthcare ecosystem
[Figure: segmentation of the healthcare ecosystem today and in 2015 along patients (health status today, from healthy to catastrophically ill; age group in 2015, from infants to senior men and women), setting (rural, suburban, urban), socio-economic status (high, medium, low), access (in person, telephonic, electronic, internet, call center), care delivery location (home, outpatient setting, hospital emergency department, long-term care), service (today: wellness/risk assessment, prevention, acute, chronic and complementary care; 2015: risk assessment, prevention, acute and chronic diagnosis and treatment), provider (traditional providers, public/private insurers, alternate providers, midlevel provider, health infomediary) and catchment area (local, regional, national, international)]
Ubiquitous Computing Technologies
– Emergence of sensor devices that generate more complex events at higher rates, e.g., ETRI's blood test sensor, SNU's non-intrusive ECG sensor, multichannel ECG
– Low-power wireless communication, e.g., Bluetooth, ZigBee
Patient-Centric Network (Interoperability)
– IHE, Continua
Healthcare Analytics Research
– Techniques to monitor the effects of drugs on patients
– Biomedical trend analysis
– Preventive analysis
IT Infrastructure
Remote Health Monitoring: The Opportunity
Long-term monitoring offers benefits
– Early disease detection and trend analysis for healthy and at-risk individuals
– Treatment and progress monitoring for patients
– Participants in drug trials or experimental treatments, to gauge efficacy and side effects
– Reduced workload on doctors, nurses and other healthcare providers
Enabled by rapid improvements in two key technologies
– Improvements in wireless communications (WiFi, 3G, Bluetooth)
– Continuing miniaturization of wireless sensors
[Figure: patient diary and sensor data relayed over Bluetooth (BT) to a server]
Chapter 2
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Harmoni (Healthcare Adaptive Remote Monitoring): Overview
[Figure: sensor data carried over a PAN (Bluetooth) to the mobile hub and over a WAN (GPRS) to the server]
Type of sensor device | Bits/sensor sample | Channels/device | Raw data rate (KB/day)
GPS | 1408 | 1 | 14,850
SpO2 | 3000 | 1 | 94,922
EKG (cardiac) | 12 | 6 | 194,400
Accelerometer | 64 | 3 | 202,500
EEG (brain) | 12 | 12 | 388,800
EMG (muscle) | 12 | 6 | 777,600
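The raw data rates in this table are consistent with a 24-hour day and 1 KB = 1024 bytes, which lets us back out the sampling rate each row implies. The helpers below are a sketch under those two assumptions; the implied rates are derived here, not stated on the slide (the SpO2 row, for example, works out to three 375-byte packets per second, which matches the Nonin monitor described later).

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_KB = 1024  # assumption: KB means 1024 bytes

def raw_kb_per_day(bits_per_sample: int, channels: int, samples_per_sec: float) -> float:
    """Raw (uncompressed) volume generated by one device in a day."""
    bytes_per_sec = bits_per_sample * channels * samples_per_sec / 8
    return bytes_per_sec * SECONDS_PER_DAY / BYTES_PER_KB

def implied_rate(bits_per_sample: int, channels: int, kb_per_day: float) -> float:
    """Invert raw_kb_per_day to recover the sampling rate in samples/s."""
    return kb_per_day * BYTES_PER_KB * 8 / (SECONDS_PER_DAY * bits_per_sample * channels)

TABLE = {  # device: (bits/sample, channels, KB/day), copied from the table
    "GPS": (1408, 1, 14_850),
    "SpO2": (3000, 1, 94_922),
    "EKG": (12, 6, 194_400),
    "Accelerometer": (64, 3, 202_500),
    "EEG": (12, 12, 388_800),
    "EMG": (12, 6, 777_600),
}

for device, (bits, ch, kb) in TABLE.items():
    print(f"{device}: ~{round(implied_rate(bits, ch, kb))} samples/s")
```

This prints about 1, 3, 256, 100, 256 and 1024 samples/s respectively, which shows why uplink reduction matters: a single EMG channel set already produces roughly 760 MB of raw data per day.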
HARMONI Novel Features
1. Context-aware event filtering
IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr) per min
2. Personalization of rules: apply machine learning at the server to learn "normal" for user A (gym = 110-150)
3. Anticipation-driven transmission: based on the past, if Prob(802.11 within 2 hrs) > 0.7, cache data
Need to reduce uplink transmission rates/power
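Features 1 and 3 reduce to two small decisions on the mobile hub. This is an illustrative sketch, not HARMONI's EventScript: it assumes readings arrive in one-minute batches of (second, heart rate) pairs, that a minute containing any out-of-range reading is forwarded raw, and that the 802.11 probability comes from a predictor outside this code. `filter_minute` and `should_cache` are hypothetical names.

```python
from statistics import mean
from typing import List, Tuple

Reading = Tuple[int, float]  # (timestamp in seconds, heart rate in bpm)

def filter_minute(context: str, readings: List[Reading]) -> List[Reading]:
    """Feature 1: IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr) per min.
    Assumption: if any reading in the minute is out of range, send the raw minute."""
    if context == "gym" and readings and all(90 < hr < 120 for _, hr in readings):
        return [(readings[0][0], mean(hr for _, hr in readings))]
    return list(readings)

def should_cache(p_wifi_within_2h: float, threshold: float = 0.7) -> bool:
    """Feature 3: cache (defer) data if Prob(802.11 within 2 hrs) > 0.7."""
    return p_wifi_within_2h > threshold

print(filter_minute("gym", [(0, 100.0), (20, 104.0), (40, 108.0)]))  # [(0, 104.0)]
print(should_cache(0.8))  # True
```

The first call collapses three readings into one averaged sample; in an office context, or with a reading outside 90-120 bpm, the same minute would be sent unchanged.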
HARMONI Architecture: Key Components on the Mobile Device
Lightweight rule/event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
• e.g., if 60 < AVG(last 10 heart rate values) < 90, then "transmit AVG to server"
– Processed events themselves act as predicates for new rules
Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent action
– RM populates or modifies the rules in the event engine
Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server
Anticipation Mechanism
– Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates
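The example rule on this slide, if 60 < AVG(last 10 heart rate values) < 90 then "transmit AVG to server", maps naturally onto a fixed-length sliding window. This is a sketch assuming one reading per call and a caller-supplied `action` callback; `AvgRangeRule` is a hypothetical class, not part of HARMONI.

```python
from collections import deque
from typing import Callable, Deque

class AvgRangeRule:
    """Fires when the average of the last n values lies strictly inside (lo, hi)."""

    def __init__(self, n: int, lo: float, hi: float, action: Callable[[float], None]):
        self.window: Deque[float] = deque(maxlen=n)  # oldest value drops out automatically
        self.lo, self.hi, self.action = lo, hi, action

    def on_event(self, value: float) -> bool:
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False  # not enough history yet
        avg = sum(self.window) / len(self.window)
        if self.lo < avg < self.hi:
            self.action(avg)  # e.g. hand the average to the transmission module
            return True
        return False

sent = []
rule = AvgRangeRule(n=10, lo=60, hi=90, action=sent.append)
for hr in [70] * 9 + [80]:
    rule.on_event(hr)
print(sent)  # [71.0]
```

Because `action` receives the derived value, it could just as well feed another rule's `on_event`, which is one way to read "processed events themselves act as predicates for new rules".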
HARMONI Functional Architecture
[Figure: sensor → short-range wireless link → mobile device → wireless link to the Internet → remote server. Mobile device components: data collection/data adapters, light-weight pattern recognition engine, action triggering mechanism, intelligent data transmission, anticipation mechanism, rule manager and user interface, drawing on context, sensor readings, sensor control and device resources. Remote server components: DB, pattern recognition engine, pattern learning engine, TAPAS rule server, external context sources, external rule specifications, external action specifications and external action triggering mechanism]
Simple recognition of pre-specified temporal patterns across sensor streams
– Event engine driven by deterministic finite automata (DFAs)
– Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders)
Compressed transmission to the server when an interface is available AND the "anticipation" mechanism indicates "OK-to-Send"
– Use of intelligent compression schemes to reduce the volume of traffic
– Store-and-forward otherwise: data is stored in a memory buffer and on FLASH storage
Rule Manager coordinates with the backend server to download currently active or applicable rules
– Can run, activate or deactivate multiple rules simultaneously and dynamically
– Dynamic downloading of rules provides implicit "context awareness" at the client
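The transmission policy above (compress and send when an interface is up and the anticipation mechanism says OK-to-Send; store-and-forward otherwise) can be sketched as follows. Assumptions: `zlib` (DEFLATE, an LZ77-family codec) stands in for whatever compressor HARMONI actually uses, the buffer lives only in memory rather than on FLASH, events are JSON-serializable dicts, and `link_up`/`ok_to_send` are supplied by the caller. `StoreAndForward` is a hypothetical name.

```python
import json
import zlib
from typing import Callable, List

class StoreAndForward:
    def __init__(self, send: Callable[[bytes], None]):
        self.send = send
        self.buffer: List[dict] = []  # stands in for the memory buffer / FLASH store

    def submit(self, event: dict, link_up: bool, ok_to_send: bool) -> bool:
        """Queue an event; if transmission is allowed, flush everything compressed."""
        self.buffer.append(event)
        if link_up and ok_to_send:
            payload = zlib.compress(json.dumps(self.buffer).encode("utf-8"))
            self.send(payload)
            self.buffer.clear()
            return True
        return False  # store now, forward later

out: List[bytes] = []
saf = StoreAndForward(out.append)
saf.submit({"hr": 72}, link_up=False, ok_to_send=True)  # no interface: buffered
saf.submit({"hr": 75}, link_up=True, ok_to_send=True)   # both events sent in one payload
print(json.loads(zlib.decompress(out[0])))  # [{'hr': 72}, {'hr': 75}]
```

Batching buffered events into one compressed payload also tends to compress better than sending them one by one, since the compressor sees more repeated structure.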
Pattern Recognition Engine Implementation
Rules are specified using EventScript, which is similar to regular expressions
A compiler (on a desktop computer) translates EventScript into a specification for a deterministic finite automaton (DFA)
We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications
Effectively, we are running finite state machines with some tweaks
– A unique memory heap for each DFA, accessible across states
– A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and on state transitions in the DFA
[Figure: EventScript → compiler (desktop) → DFA → rule server (remote server) → rule manager → event engine (Nokia 770)]
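A toy version of such a run-time engine: a DFA whose transitions fire on event predicates, with a per-DFA heap shared across states and an action hook on each transition. This does not reproduce EventScript syntax or the LISP/Scheme interpreter. The `Dfa` class, the state names and the "three consecutive readings above 120 bpm" pattern are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Predicate = Callable[[Any], bool]
Action = Callable[[Dict[str, Any], Any], None]
Transition = Tuple[Predicate, str, Action]

class Dfa:
    """Finite state machine over events, with a per-DFA heap shared across states."""

    def __init__(self, start: str, accept: str, transitions: Dict[str, List[Transition]]):
        self.start, self.accept, self.transitions = start, accept, transitions
        self.state = start
        self.heap: Dict[str, Any] = {}
        self.last_match: Optional[Dict[str, Any]] = None

    def feed(self, event: Any) -> bool:
        """Advance on one event; return True when the accepting state is reached."""
        for predicate, next_state, action in self.transitions.get(self.state, []):
            if predicate(event):
                action(self.heap, event)  # side effect on the transition
                self.state = next_state
                break
        else:
            # No transition matched: reset (the event is not re-examined from start).
            self.state, self.heap = self.start, {}
            return False
        if self.state == self.accept:
            self.last_match, self.heap = self.heap, {}
            self.state = self.start  # re-arm for the next occurrence
            return True
        return False

def high(hr: float) -> bool:
    return hr > 120

def record(heap: Dict[str, Any], hr: float) -> None:
    heap.setdefault("seen", []).append(hr)

three_high = Dfa("s0", "s3", {
    "s0": [(high, "s1", record)],
    "s1": [(high, "s2", record)],
    "s2": [(high, "s3", record)],
})
print([three_high.feed(hr) for hr in [125, 130, 110, 121, 122, 123]])
# [False, False, False, False, False, True]
print(three_high.last_match)  # {'seen': [121, 122, 123]}
```

Resetting without re-examining the failing event is adequate for this pattern, because a reading that breaks the run cannot also start one; patterns with overlapping prefixes would need a proper compiled automaton.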
HARMONI Implementation Platform
Nokia 770 Internet Tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB; can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides an open-source software and development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring", to appear, PerCom 2008
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: (left) data generated from the sensor: heart rate (bpm) by sample index; (right) data transmitted over the network (in KB) for each assumed context (None, Office, Gym, Office+Gym), uncompressed and with LZ-78 compression]
Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
Filtering (S3) results in an 85% reduction in bandwidth consumption
The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
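The remark about LZ and floating-point values follows from how dictionary coders work: they only save space when exact substrings recur. Here is a textbook LZ78 encoder and decoder as a sketch; it is the generic algorithm, not HARMONI's compressor, and it operates on strings, so readings must be serialized first (how HARMONI serialized them is not stated).

```python
from typing import Dict, List, Tuple

def lz78_encode(text: str) -> List[Tuple[int, str]]:
    """Emit (index of longest known prefix, next char); index 0 is the empty phrase."""
    dictionary: Dict[str, int] = {}
    out: List[Tuple[int, str]] = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a phrase we have already seen
            continue
        out.append((dictionary.get(phrase, 0), ch))
        dictionary[phrase + ch] = len(dictionary) + 1
        phrase = ""
    if phrase:  # input ended inside a known phrase
        out.append((dictionary[phrase], ""))
    return out

def lz78_decode(tokens: List[Tuple[int, str]]) -> str:
    phrases = [""]
    for index, ch in tokens:
        phrases.append(phrases[index] + ch)
    return "".join(phrases[1:])

print(lz78_encode("ABABABA"))  # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
```

Repeated readings such as "72,72,72" quickly build long reusable phrases. Readings like "72.013,72.026" differ in their trailing digits, so matches stay short and the output shrinks far less.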
Result 2: Impact of Personalization on Event Filtering
[Figure: (left) variation in heart rate readings (beats per minute) over time for different individuals; (right) data transmitted over the network (in KB) for User 2-1 and User 2-2: uncompressed, compressed, uncompressed with generic context filtering and uncompressed with personalized context filtering]
Sensor streams exhibit significant statistical variation across individuals
The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
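Personalization means learning each user's own "normal" band on the server instead of applying one generic range. The slides say machine learning is used but do not name the model. As a deliberately simple stand-in, this sketch learns a band per (user, context) as mean ± k population standard deviations of historical readings. The choice of k = 2 and all function names are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

Band = Tuple[float, float]

def learn_band(history: Iterable[float], k: float = 2.0) -> Band:
    """Normal band as mean +/- k population standard deviations."""
    values = list(history)
    mu, sigma = mean(values), pstdev(values)
    return (mu - k * sigma, mu + k * sigma)

def learn_profiles(history: Dict[Tuple[str, str], Iterable[float]],
                   k: float = 2.0) -> Dict[Tuple[str, str], Band]:
    """Learn one band per (user, context) key, e.g. ("A", "gym")."""
    return {key: learn_band(values, k) for key, values in history.items()}

def is_normal(hr: float, band: Band) -> bool:
    lo, hi = band
    return lo <= hr <= hi

profiles = learn_profiles({("A", "gym"): [128, 132], ("B", "gym"): [98, 102]})
print(profiles[("A", "gym")])  # (126.0, 134.0)
print(is_normal(131, profiles[("A", "gym")]), is_normal(131, profiles[("B", "gym")]))  # True False
```

The same 131 bpm reading would be suppressed for user A but transmitted for user B. This per-individual difference is what the generic versus personalized filtering bars compare.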
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' thresholds for the 3 distinct states (0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients
Bandwidth savings between 26% and 73% for our sample population
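Sensor-derived context can be sketched in two steps: classify the accelerometer amplitude into sitting, walking or running, then look up that state's normal heart-rate band. The HR bands (0-70, 70-110, 110-170) come from the slide and are assumed to map to sitting, walking and running in that order. The amplitude cut-offs (in g) and the gravity-removal step are hypothetical placeholders, not values from the study.

```python
import math
from typing import Tuple

# Hypothetical amplitude cut-offs in g; the slides do not give them.
WALK_MIN_G, RUN_MIN_G = 0.3, 1.2
# Normal HR bands (bpm) from the slide, assumed to map to the states in this order.
HR_BANDS = {"sitting": (0, 70), "walking": (70, 110), "running": (110, 170)}

def amplitude(x: float, y: float, z: float) -> float:
    """Deviation of a 3-axis sample's magnitude from 1 g (a crude gravity removal)."""
    return abs(math.sqrt(x * x + y * y + z * z) - 1.0)

def activity(sample: Tuple[float, float, float]) -> str:
    a = amplitude(*sample)
    if a >= RUN_MIN_G:
        return "running"
    if a >= WALK_MIN_G:
        return "walking"
    return "sitting"

def needs_transmission(hr: float, sample: Tuple[float, float, float]) -> bool:
    """Send only heart-rate readings outside the current state's normal band."""
    lo, hi = HR_BANDS[activity(sample)]
    return not (lo <= hr <= hi)

print(needs_transmission(100, (0.0, 0.0, 1.0)))  # True: 100 bpm while sitting
print(needs_transmission(100, (0.0, 0.0, 1.5)))  # False: 100 bpm while walking
```

A real classifier would work on windows of samples rather than single readings. The point here is only that the context comes from a sensor rather than from a declared location.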
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytics infrastructure for the healthcare ecosystem that supports a variety of services
Challenges: integrating massive volumes of disparate data; the need for sophisticated analytics; growing collaboration across the ecosystem; integrity of data and tracking of analytic data; timely feedback; a large number of concurrent users; privacy and security
Architectural goals: extensible data model and scalable infrastructure; open and scalable analysis framework; enterprise sharing of sensor data; quality of events (QoE) and provenance support for backtracking and auditing; real-time processing; scalable device and user management; protection of personal medical data
Technologies and approaches: extensible stream storage system; distributed stream analysis runtime; quality of events; hybrid provenance system; interoperability; privacy and security technology; stream processing platform
Analysis Framework (1): The Approach
Current solutions
– Vertically designed systems for the analysis of health monitoring data exist today
– Closed, custom-built for specific diagnoses
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials
New requirements
– Extensible: to new data sources (e.g., EMG, EEG, accelerometer, activity) and to new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
– Scalability: handles large numbers of patients
– Robustness: resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings
– Easy to use: separate medical domain knowledge from IT knowledge
Our innovations
– Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
– Programming model centered around the System S concept of a Processing Element (PE): PEs represent atomic stream transformers, and stream data flows with pub-sub semantics
– Century also provides useful medical analytic services: persistence of state (e.g., analyze HR variations over 6 months); management of patient data (e.g., a patient's intermediate diagnosis); computation and rebinding based on the quality of events from medical streams
– Century supports an open set of data sources, currently ECG, BP, glucose, weight, SpO2 and temperature
[Figure: an analysis job composed of analysis components, in which data-source PEs feed chains of PEs (e.g., SPA, SPM, DFE, PSD, NES, IPN, VBF, DUP, WLG, JoinL, N(CE)S^2) ending in INQ/GUI outputs]
Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
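The PE programming model can be illustrated with a toy in-process pub-sub bus. Each PE is an atomic transformer that subscribes to one named stream and publishes results to another, while the bus owns all routing. This is not the System S API: `Bus`, `ProcessingElement`, the stream names and the beats-per-second input are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Optional

class Bus:
    """Routes each published item to every PE subscribed to that stream name."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List["ProcessingElement"]] = defaultdict(list)

    def subscribe(self, stream: str, pe: "ProcessingElement") -> None:
        self.subscribers[stream].append(pe)

    def publish(self, stream: str, item: Any) -> None:
        for pe in self.subscribers[stream]:
            pe.receive(item)

class ProcessingElement:
    """An atomic stream transformer: domain logic only, no connection handling."""

    def __init__(self, bus: Bus, inp: str, out: str, fn: Callable[[Any], Optional[Any]]):
        self.bus, self.out, self.fn = bus, out, fn
        bus.subscribe(inp, self)

    def receive(self, item: Any) -> None:
        result = self.fn(item)
        if result is not None:  # returning None drops the item (filtering)
            self.bus.publish(self.out, result)

bus = Bus()
alerts: List[Any] = []
ProcessingElement(bus, "hr.raw", "hr.bpm", lambda beats_per_sec: round(beats_per_sec * 60))
ProcessingElement(bus, "hr.bpm", "hr.alerts", lambda bpm: bpm if bpm > 120 else None)
ProcessingElement(bus, "hr.alerts", "unused", lambda a: alerts.append(a))  # sink PE
for reading in [1.2, 2.5]:
    bus.publish("hr.raw", reading)
print(alerts)  # [150]
```

The lambdas carry all the medical logic, while subscription and delivery live in `Bus`. That separation is the practical benefit the slide claims for the PE model.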
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs
[Figure: AR PE: input ECG → normalization → fast Fourier transform → neural network (trained during initialization) → alert generator → output alert]
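The AR PE pipeline (normalization, Fourier transform, trained classifier, alert generator) can be sketched in pure Python. Two substitutions should be noted: a naive DFT replaces a real FFT library, and a fixed logistic scorer with hypothetical weights stands in for the neural network trained during initialization. Window length and weights are illustrative only.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import List, Optional, Sequence

def normalize(window: Sequence[float]) -> List[float]:
    """Z-score normalization; a flat window maps to all zeros."""
    mu, sigma = mean(window), pstdev(window)
    return [0.0] * len(window) if sigma == 0 else [(v - mu) / sigma for v in window]

def spectrum(x: Sequence[float]) -> List[float]:
    """Magnitudes of a naive DFT for bins 0..n/2 (an FFT gives the same values faster)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

class ArPe:
    """Normalization -> Fourier transform -> classifier -> alert generator."""

    def __init__(self, weights: Sequence[float], bias: float, threshold: float = 0.5):
        # Stand-in for the trained neural network: a fixed logistic scorer.
        self.weights, self.bias, self.threshold = list(weights), bias, threshold

    def score(self, features: Sequence[float]) -> float:
        z = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        z = max(-60.0, min(60.0, z))  # avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))

    def process(self, ecg_window: Sequence[float]) -> Optional[dict]:
        p = self.score(spectrum(normalize(ecg_window)))
        return {"type": "ALERT", "score": p} if p > self.threshold else None

# Hypothetical weights that react to energy in frequency bin 3 of a 32-sample window.
weights = [0.0] * 17
weights[3] = 1.0
pe = ArPe(weights, bias=-10.0)
window = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
print(pe.process(window) is not None, pe.process([0.0] * 32))  # True None
```

Only the wiring follows the slide's block diagram; the features and decision rule of the real AR PE are not described here.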
Century Event Management Service
[Figure: Century server infrastructure diagram. Components shown include the USN gateway (data transformer, sensor data sender), sensor data reception, event preprocessor, event filter/annotator, event storage manager, event store query service, subscription service, provenance service (storage manager, query manager, query service, access control resolver), analysis framework and analysis applications, QoE manager, state manager, job/application manager, group manager and group management service, portal service (patient, administrator and stakeholder portals), authentication and authorization, privacy manager, device catalogue and device data manager, remote access manager, IHE adapter and interoperability container, platform service and solution delivery services; storage for events, provenance metadata, state data, group data and privacy policies; a developer toolkit (application IDE, sensor simulator, event management client library); and stakeholders such as doctors, lifestyle consultants, healthcare IT specialists and government/CDC]
Event Management Service (1)
The problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: an annotator producing events at times 1, 2, 3, stored either as rows of an annotation table or as XML documents]
Event Management Service (3): Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ
[Figure: arrival rate over time, with bursts rising above the average database service rate into the instability region (λ > μ), while the average arrival rate stays below it]
3. Hybrid storage model (stream buffer + relational-XML DB)
[Figure: annotator events at times 1, 2, 3 passing through a stream buffer before being stored as XML]
How do we design the stream buffer?
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS", DEBS, June 2007
Event Management Service (4): The Stream Buffer
Approach
– Keep the relational-XML database in the design, for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
[Figure: PEs in the analysis graph (PE1-PE5, SPE) send storage requests (events of type "eventStore") towards the delivery system; components shown include the System S STG (file system store), a traffic monitor with traffic models and a traffic predictor, an event store load monitor, a database PE and the event store (relational-XML)]
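One way to realize "avoid driving the database into the instability region" is to cap the rate at which buffered events are drained into the database at a target utilization below 1. This sketch assumes a known database service rate μ in events per second, a hypothetical target ρ of 0.8 and fixed-length time steps. None of these values comes from the slides, and the actual design also relies on traffic prediction, which is omitted here.

```python
from collections import deque
from typing import Deque, List

class StreamBuffer:
    """Absorbs bursts; drains into the database at most rho_target * mu events/s."""

    def __init__(self, mu: float, rho_target: float = 0.8):
        if not 0 < rho_target < 1:
            raise ValueError("target utilization must be in (0, 1)")
        self.max_drain = mu * rho_target  # keeps the DB's lambda / mu <= rho_target
        self.pending: Deque[str] = deque()
        self.credit = 0.0  # fractional write allowance carried between steps

    def tick(self, arrivals: List[str], dt: float = 1.0) -> List[str]:
        """Accept this step's arrivals; return the events to write to the DB now."""
        self.pending.extend(arrivals)
        self.credit += self.max_drain * dt
        n = min(len(self.pending), int(self.credit))
        # Keep only the fractional allowance so idle periods are not banked.
        self.credit = (self.credit - n) % 1.0
        return [self.pending.popleft() for _ in range(n)]

buf = StreamBuffer(mu=10, rho_target=0.8)
burst = buf.tick([f"e{i}" for i in range(50)])  # a burst of 50 events
writes = [len(burst)] + [len(buf.tick([])) for _ in range(10)]
print(writes)  # [8, 8, 8, 8, 8, 8, 2, 0, 0, 0, 0]
```

The burst is absorbed by the buffer and reaches the database at no more than 8 events per step, so the database never sees ρ above 0.8. The trade-off is added storage latency for the buffered events.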
Century Provenance Service
[Figure: Century server infrastructure diagram, as on the Century Event Management Service slide]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction. Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation, and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
[Figure: process provenance: PEs running algorithms f1, f2 and f3 report to a provenance server/store ("I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." / "I received some data on port 2. I ran algorithm f3 on the data at time t1."). Data provenance: sensors produce Data1 and Data2, and PEj puts metadata in DataOut4 saying it was dependent on Data1 and Data2]
Provenance (2): Prior Work on Provenance
Prior systems, positioned by type of provenance reconstruction (process vs. data provenance) and by event and processing rates:
– EU Provenance (PASOA, PreServ): captures workflow bindings
– Scientific SOA workflows (Karma): workflow and application binding notifications; stateless, application-defined data provenance
– Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
– Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
– File systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
– Medical stream provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "sub-sets", i.e., dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ:
$$S_i^{out}(t) \Leftarrow \bigcup_{j \,:\, S_j \in \mathrm{inps}(i)} S_j\left(t - \Delta_j,\, t\right)$$
[Figure: a processing element with input streams Sj (elements Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and Sk (elements Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) producing output stream Si (elements Si(t), Si(t-1), Si(t-2))]
TVC dependency rule blocks (PE state defined by location context):
Rule block ID | PE state | Dependency rule
1 | "gym" | Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2 | "home" | Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
The dependency equation can, however, vary based on
– The PE's internal state or some external context, e.g.:
IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– The solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
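The state-dependent rule blocks on this slide (block 1 for "gym", block 2 for "home") can be stored as data instead of per-element annotations. This is a sketch, assuming integer time steps and windows encoded as (start, end) offsets relative to the output time t. The encoding and the names are our own, not the paper's.

```python
from typing import Dict, List, Tuple

# (input stream, start offset, end offset): the window is [t - start, t - end]
Window = Tuple[str, int, int]

RULE_BLOCKS: Dict[str, List[Window]] = {
    "gym": [("Sj", 0, 0), ("Sj", 3, 2), ("Sk", 2, 1)],  # block 1
    "home": [("Sj", 2, 0), ("Sk", 1, 0)],               # block 2
}

def causative_windows(state: str, t: int) -> List[Tuple[str, int, int]]:
    """Absolute input windows that Ei(t) depends on, given the stored PE state."""
    return [(stream, t - start, t - end) for stream, start, end in RULE_BLOCKS[state]]

print(causative_windows("home", 10))  # [('Sj', 8, 10), ('Sk', 9, 10)]
print(causative_windows("gym", 10))   # [('Sj', 10, 10), ('Sj', 7, 8), ('Sk', 8, 9)]
```

Only the state label has to travel with each output element. The windows are recomputed from the rule table when a provenance query arrives, which is what keeps the per-element overhead small.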
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to the corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Trade-off: faster provenance storage and processing in exchange for higher provenance reconstruction overhead
Determine the IDs of input ports and the causative stream bindings
Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
Apply the TVC filter to obtain the causative set of elements
[Figure: a provenance query server resolves Query(Ei) against three stores. Stream and Port Mapping (SAPM): (1) stores bindings between every input/output port and the associated streams, (2) stores creation/modification times of PEs, (3) stores the TVC function f() for every output port/input port pair; e.g., PC dependency table entry "PEj f1() Po1" and dynamic stream mapping table entries "t=10 S1 Pi1", "t=15 S15 I4", "t=16 S17 O6". Data store: e.g., element45 ts=15 SID=12 state="gym"; element66 ts=16 SID=17 state="home". Query steps: 1. obtain I/O stream mappings; 2. retrieve the dependency equation; 3. retrieve elements; 4. apply the equation to determine the set of causative elements; return the set of elements]
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams", HealthNet, June 2007
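The four query steps on this slide can be sketched end to end with in-memory tables. Everything here is hypothetical: the binding table, the dependency table keyed by PE state, and the stored elements only loosely echo the slide's examples. Each element is assumed to carry its timestamp, stream ID and the PE state stored as metadata.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Element:
    elem_id: str
    ts: int
    stream_id: str
    state: str  # PE state stored as metadata with each element

# Hypothetical dynamic stream mapping table: (binding time, stream id, input port).
BINDINGS: List[Tuple[int, str, str]] = [(0, "S1", "Pi1"), (0, "S2", "Pi2"), (15, "S15", "Pi1")]

# Hypothetical dependency table for one output port: state -> (port, start, end),
# meaning the window [t - start, t - end] on that input port.
DEPENDENCIES: Dict[str, List[Tuple[str, int, int]]] = {
    "gym": [("Pi1", 0, 0), ("Pi1", 3, 2), ("Pi2", 2, 1)],
    "home": [("Pi1", 2, 0), ("Pi2", 1, 0)],
}

def stream_on_port(port: str, t: int) -> Optional[str]:
    """Step 1: the stream bound to `port` at time t (latest binding at or before t)."""
    bound = [(bt, sid) for bt, sid, p in BINDINGS if p == port and bt <= t]
    return max(bound)[1] if bound else None

def resolve(output: Element, store: List[Element]) -> List[Element]:
    """Steps 2-4: pick the rule for the output's state, fetch windowed elements, filter."""
    causes: List[Element] = []
    for port, start, end in DEPENDENCIES[output.state]:  # step 2
        lo, hi = output.ts - start, output.ts - end
        for e in store:  # step 3: candidates inside the window
            # step 4: keep only elements from the stream bound to this port at e.ts
            if lo <= e.ts <= hi and stream_on_port(port, e.ts) == e.stream_id:
                if e not in causes:
                    causes.append(e)
    return sorted(causes, key=lambda e: (e.ts, e.elem_id))

store = [Element("a", 8, "S1", "home"), Element("b", 9, "S2", "home"),
         Element("c", 10, "S1", "home"), Element("d", 7, "S1", "home"),
         Element("e", 10, "S2", "home")]
alert = Element("E1", 10, "S9", "home")
print([e.elem_id for e in resolve(alert, store)])  # ['a', 'b', 'c', 'e']
```

All the work happens at query time, which reflects the trade-off named on the slide: cheap provenance capture during streaming, in exchange for higher reconstruction overhead.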
Key Research Benefits of the TVC Model for Provenance
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance/Karma systems)
– Statefulness: time and value functions allow a compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
Other provenance systems cannot handle O(100 Kevents/sec) system throughput
Quality of Events in Century
[Figure: Century server infrastructure diagram, as on the Century Event Management Service slide]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform, or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE raising an ALERT ("What generated this alert, and can we trust it?") from input streams with irregular timestamps; annotations: "The sensor is faulty and generates stream elements of low quality" (replace sensor) and "Stream elements are missing (possibly due to network issues)"]
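The dynamic-rebinding case reduces to a selection rule: use the cheapest HR source whose current QoE clears a threshold, and switch to the more expensive ECG-derived HR when the SpO2-derived HR dips in quality. This sketch uses hypothetical quality scores, costs and a 0.7 threshold. The fallback to the highest-quality source when no source qualifies is also an assumption.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HrSource:
    name: str
    cost: float     # relative acquisition cost (hypothetical units)
    quality: float  # current QoE estimate in [0, 1]

def select_source(sources: List[HrSource], min_quality: float = 0.7) -> HrSource:
    """Cheapest source meeting the quality bar; otherwise the highest-quality one."""
    ok = [s for s in sources if s.quality >= min_quality]
    if ok:
        return min(ok, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.quality)

spo2 = HrSource("SpO2", cost=1.0, quality=0.9)
ecg = HrSource("ECG", cost=5.0, quality=0.95)
print(select_source([spo2, ecg]).name)  # SpO2
spo2.quality = 0.4                      # SpO2-derived HR dips in quality
print(select_source([spo2, ecg]).name)  # ECG
```

In a stream graph, re-evaluating this choice whenever a QoE score changes and rewiring the consuming PE's input would implement the rebinding. How often to re-evaluate is a design choice the slides leave open.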
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained from ECG, SpO2, and BP monitors at different (accuracy, acquisition cost) points
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
How to develop a combination of hybrid model- and replay-based provenance
How to define context-dependent access-control models for provenance data in a multi-provider environment
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques are needed for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission
The server must not simply be a repository of sensor streams but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality
Information-based Medicine: The Remote Monitoring Roadmap
[Figure: roadmap from Traditional HC through Remote Monitoring to Personalized Health Care (source: Kathy Schweda). Care progresses from non-specific (treat symptoms) through information correlation and 1st-generation diagnosis and organized care (error reduction) to personalized care (disease prevention). Milestones include patient-reported data, episodic treatment, electronic health records, Information Augmented, chronic disease management, clinical trial data collection, in-patient automated vitals, rules-based clinical response, CDI, pre-symptomatic treatment, and lifetime treatment. Axes span evolutionary practices to revolutionary technology, throughput analytics, automated systems, and data and systems integration; clinical decision support and Century are marked on the roadmap.]
Research Background: From Today to 2015
Ubiquitous computing technologies, patient-centric networks, and healthcare analytics are the key factors in establishing the healthcare ecosystem
[Figure: two panels segmenting the healthcare ecosystem by Patients, Care Delivery (setting, access, service), Socio-economic Status, Catchment Area, Location, and Provider. One panel segments patients by health status (healthy, minor ailments, at risk, acutely ill, chronically ill, catastrophically ill); the other segments them by age group (infants, adolescents, adult men, adult women, senior men, senior women). Other values shown: rural, suburban, urban; high, medium, low; in person, telephonic, electronic, internet, call center; home, outpatient setting, hospital/emergency department, long-term care; local, regional, national, international. Services range over wellness/risk assessment, prevention, acute care (diagnosis, treatment), chronic care (diagnosis, treatment), and complementary care. Providers include traditional providers, public/private insurers, alternate providers, midlevel providers, and health infomediaries.]
Ubiquitous Computing Technologies
– Emergence of sensor devices that generate more complex events at higher rates (e.g., ETRI's blood test sensor, SNU's non-intrusive ECG sensor, multichannel ECG)
– Low-power wireless communication (e.g., Bluetooth, ZigBee)
Patient-Centric Network (Interoperability)
– IHE, Continua
Healthcare Analytics Research
– Techniques to monitor the effects of drugs on patients
– Biomedical trend analysis
– Preventive analysis
IT Infrastructure
Remote Health Monitoring: The Opportunity
Long-term monitoring offers benefits
– Early disease detection and trend analysis for healthy and at-risk individuals
– Treatment and progress monitoring for patients
– Participants in drug trials or experimental treatments, to gauge efficacy and side effects
– Reduced workload on doctors, nurses, and other healthcare providers
Enabled by rapid improvements in two key technologies
– Improvements in wireless communications (WiFi, 3G, Bluetooth)
– Continuing miniaturization of wireless sensors
[Figure: data path from sensors over Bluetooth (BT) to a patient diary and a server]
Chapter 2
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance, and QoE
Harmoni (Healthcare Adaptive Remote Monitoring): Overview
[Figure: sensors linked to the mobile hub over a PAN (Bluetooth); data sent onward over a WAN (GPRS)]
Type of sensor device   Bits/sensor sample   Channels/device   Raw data rate (KB/day)
GPS                     1408                 1                 14,850
SpO2                    3000                 1                 94,922
EKG (cardiac)           12                   6                 194,400
Accelerometer           64                   3                 202,500
EEG (brain)             12                   12                388,800
EMG (muscle)            12                   6                 777,600
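The raw data rates in the table follow from bits/sample × channels × sampling rate × 86,400 s/day, divided by 8,192 bits per KB.

The table does not list sampling rates. The rates assumed below are inferred because they reproduce every figure in the table:
- GPS: 1 Hz
- SpO2: 3 Hz
- EKG and EEG: 256 Hz
- Accelerometer: 100 Hz
- EMG: 1,024 Hz

The 3 Hz SpO2 rate is consistent with the Nonin monitor described later in this section, which sends three 375-byte (3,000-bit) packets per second.

```python
SECONDS_PER_DAY = 86_400


def raw_kb_per_day(bits_per_sample: int, channels: int, samples_per_sec: float) -> float:
    """Uncompressed data volume per day in KB (1 KB = 1024 bytes)."""
    bits_per_day = bits_per_sample * channels * samples_per_sec * SECONDS_PER_DAY
    return bits_per_day / 8 / 1024


# (bits/sample, channels, ASSUMED sampling rate in Hz): the rates are inferred, not given.
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG (cardiac)": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG (brain)": (12, 12, 256),
    "EMG (muscle)": (12, 6, 1024),
}

for name, (bits, channels, rate) in SENSORS.items():
    print(f"{name:15s} {raw_kb_per_day(bits, channels, rate):>10,.0f} KB/day")
```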
HARMONI Novel Features
1. Context-Aware Event Filtering
IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr)/min
2. Personalization of Rules: apply machine learning at the server to learn what is "normal" for each user (e.g., normal for user A in the gym = 110-150)
3. Anticipation-Driven Transmission: based on past history, if Prob(802.11 within 2 hrs) > 0.7, cache data
Need to reduce uplink transmission rates/power
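A minimal Python sketch of the context-aware filtering rule above follows. The slide does not say what happens outside the range, so this sketch makes two assumptions:
- a minute of in-range gym readings is summarized as a single average;
- any out-of-range reading, or a context with no rule, causes the raw samples to be forwarded instead.

Passing a learned per-user range, such as user A's 110-150 bpm gym range, shows how personalization changes what is transmitted. `filter_minute` and the message format are hypothetical.

```python
from statistics import mean
from typing import Optional

# Generic rule from the slide: in the gym, 90 < hr < 120 counts as "normal".
GENERIC_RANGES = {"gym": (90.0, 120.0)}


def filter_minute(context: str, hr_samples: list[float],
                  ranges: Optional[dict[str, tuple[float, float]]] = None) -> list[dict]:
    """Filter one minute of HR samples under a context-aware rule.

    An all-in-range minute collapses to one AVG(hr) message; otherwise (some
    sample out of range, or no rule for this context) every raw sample is sent.
    """
    rule = (ranges if ranges is not None else GENERIC_RANGES).get(context)
    if rule is not None and hr_samples:
        low, high = rule
        if all(low < hr < high for hr in hr_samples):
            return [{"type": "avg_hr", "context": context, "value": mean(hr_samples)}]
    return [{"type": "raw_hr", "context": context, "value": hr} for hr in hr_samples]


samples = [100.0, 104.0, 112.0]
print(filter_minute("gym", samples))                             # one summary message
print(filter_minute("gym", samples, {"gym": (110.0, 150.0)}))    # user A's range: 3 raw samples
```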
HARMONI Architecture: Key Components on the Mobile Device
Lightweight rule/event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
• e.g., if 60 < AVG(last 10 heart rate values) < 90, then "transmit AVG to server"
– Processed events themselves act as predicates for new rules
Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent action
– The RM populates or modifies the rules in the event engine
Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server
Anticipation Mechanism
– Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates
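The example rule above (transmit the average when 60 < AVG(last 10 heart rate values) < 90) can be sketched in Python, along with the idea that processed events feed further rules. `AvgWindowRule` and the second, chained rule are hypothetical illustrations, not HARMONI's engine, which compiles rules into DFAs. Here the "transmit" action only collects values in a list.

```python
from collections import deque
from statistics import mean
from typing import Callable


class AvgWindowRule:
    """Call action(avg) whenever the mean of the last `size` inputs lies in (low, high)."""

    def __init__(self, size: int, low: float, high: float,
                 action: Callable[[float], None]) -> None:
        self.window: deque = deque(maxlen=size)
        self.low, self.high, self.action = low, high, action

    def on_event(self, value: float) -> None:
        self.window.append(value)
        if len(self.window) == self.window.maxlen:  # wait until the window is full
            avg = mean(self.window)
            if self.low < avg < self.high:
                self.action(avg)


sent: list[float] = []      # stand-in for "transmit AVG to server"
derived: list[float] = []   # output of a hypothetical second rule

rule2 = AvgWindowRule(3, 0, 75, derived.append)


def transmit_and_chain(avg: float) -> None:
    sent.append(avg)        # transmit the average
    rule2.on_event(avg)     # the processed event becomes input to another rule


rule1 = AvgWindowRule(10, 60, 90, transmit_and_chain)
for hr in [70] * 12:
    rule1.on_event(hr)
print(sent, derived)
```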
HARMONI Functional Architecture
[Figure: a sensor connects over a short-range wireless link to the mobile device, which connects over a wireless link to the Internet to the remote server. Mobile-device components: data collection/data adapters (sensor readings, sensor control, context, device resources), light-weight pattern recognition engine, action triggering mechanism, user interface, intelligent data transmission, anticipation mechanism, and rule manager (data processing). Remote-server components: DB, pattern recognition engine, pattern learning engine, external action triggering mechanism, and TAPAS rule server. Inputs: external context sources, external rule specifications, and external action specifications.]
Simple recognition of pre-specified temporal patterns across sensor streams
– Event engine driven by Deterministic Finite Automata (DFAs)
– Actions associated with rules include data transmission, data transformation, and local triggers (alarms, reminders)
Compressed transmission to the server when the interface is available AND "anticipation" indicates "OK-to-Send"
– Use of intelligent compression schemes to reduce the volume of traffic
– Store-and-forward otherwise: data stored in a memory buffer and on FLASH storage
The Rule Manager coordinates with the backend server to download currently active or applicable rules
– Can run, activate, or deactivate multiple rules simultaneously and dynamically
– Dynamic downloading of rules provides implicit "context awareness" at the client
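The transmission policy above can be sketched in Python together with the anticipation rule from the novel-features slide. The policy is: send when an interface is available and anticipation says OK-to-Send; otherwise store and forward. The sketch makes three assumptions:
- there are two link types, a cheap 802.11 link ("wifi") and a costly GPRS link ("gprs");
- a WiFi link is always used when present;
- zlib stands in for the "intelligent compression" scheme.

The class name and link labels are hypothetical.

```python
import zlib
from typing import Optional


class AnticipatoryUplink:
    """Store-and-forward uplink gated by an anticipation estimate (illustrative sketch)."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold        # slide: cache if P(802.11 within 2 hrs) > 0.7
        self.buffer: list[bytes] = []     # stand-in for the memory/FLASH store

    def ok_to_send(self, link: Optional[str], p_wifi_2h: float) -> bool:
        if link is None:
            return False                  # no interface: store and forward later
        if link == "wifi":
            return True                   # the cheap link is up now
        return p_wifi_2h <= self.threshold  # costly link: wait if WiFi is likely soon

    def submit(self, record: bytes, link: Optional[str],
               p_wifi_2h: float) -> Optional[bytes]:
        """Queue a record; return a compressed batch if it is OK to send now."""
        self.buffer.append(record)
        if not self.ok_to_send(link, p_wifi_2h):
            return None
        batch = zlib.compress(b"\n".join(self.buffer))  # newline-delimited records
        self.buffer.clear()
        return batch


up = AnticipatoryUplink()
up.submit(b"hr=72", None, 0.0)          # no link: buffered
up.submit(b"hr=75", "gprs", 0.9)        # WiFi likely soon: keep caching
batch = up.submit(b"hr=80", "gprs", 0.2)
print(zlib.decompress(batch))
```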
Pattern Recognition Engine Implementation
Rules are specified using EventScript, which is similar to regular expressions
– A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA)
– We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications
Effectively, we are running finite state machines with some tweaks
– A unique memory heap for each DFA, accessible across states
– A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and on state transitions in the DFA
[Figure: EventScript is compiled into a DFA on a desktop; the Rule Server on the remote server provides DFAs to the Rule Manager on the Nokia 770, which loads them into the event engine]
Context-Based Filtering
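The run-time engine described above can be sketched in Python: one DFA per rule, a private heap per DFA shared across its states, and hooks run on initialization and on each transition. The following are all hypothetical, since EventScript's compiled format is not shown in the source:
- the spec format;
- the hooks written as Python functions (standing in for the LISP/Scheme interpreter);
- the `classify` threshold;
- the example pattern (three consecutive high heart-rate readings).

Unmatched transitions fall back to the start state. That is sufficient for this pattern, but it is not a general substitute for a fully compiled DFA.

```python
from dataclasses import dataclass
from typing import Callable

InitHook = Callable[[dict], None]
TransitionHook = Callable[[dict, str, str, str], None]


def _no_init(heap: dict) -> None:
    pass


def _no_transition(heap: dict, src: str, symbol: str, dst: str) -> None:
    pass


@dataclass
class DFASpec:
    """Compiled rule: transition table plus hooks (stand-ins for interpreted code)."""
    start: str
    accepting: set[str]
    transitions: dict[tuple[str, str], str]
    on_init: InitHook = _no_init
    on_transition: TransitionHook = _no_transition


class DFAInstance:
    """One running copy of a rule, with a private heap shared across its states."""

    def __init__(self, spec: DFASpec) -> None:
        self.spec = spec
        self.state = spec.start
        self.heap: dict = {}
        spec.on_init(self.heap)

    def feed(self, symbol: str) -> bool:
        """Advance on one input symbol; return True if now in an accepting state."""
        nxt = self.spec.transitions.get((self.state, symbol), self.spec.start)
        self.spec.on_transition(self.heap, self.state, symbol, nxt)
        self.state = nxt
        return nxt in self.spec.accepting


def classify(hr: float) -> str:
    return "H" if hr > 120 else "N"          # assumed "high heart rate" threshold


def init_heap(heap: dict) -> None:
    heap["alarms"] = 0


def count_alarms(heap: dict, src: str, symbol: str, dst: str) -> None:
    if dst == "s3" and src != "s3":          # just entered the accepting state
        heap["alarms"] += 1


spec = DFASpec(
    start="s0",
    accepting={"s3"},
    transitions={("s0", "H"): "s1", ("s1", "H"): "s2",
                 ("s2", "H"): "s3", ("s3", "H"): "s3"},
    on_init=init_heap,
    on_transition=count_alarms,
)
dfa = DFAInstance(spec)
hits = [dfa.feed(classify(hr)) for hr in [130, 125, 140, 150, 90, 130, 131, 132]]
print(hits, dfa.heap)
```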
HARMONI Implementation Platform
Nokia 770 Internet Tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and a development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports the Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome, and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure, left: data generated from the sensor, heart rate (bpm, 0-200) versus sample index (about 10,000 samples). Right: data transmitted over the network (KB) for the assumed contexts None, Office, Gym, and Office+Gym, uncompressed and with LZ-78 compression.]
Compression (S2) by itself results in a >50% reduction in bandwidth consumption
Filtering (S3) results in an 85% reduction in bandwidth consumption
The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
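The last point above concerns dictionary compression: LZ-78 only saves space when exact substrings recur. The following Python sketch is a textbook LZ-78 encoder, not HARMONI's implementation. It is applied to synthetic, randomly generated heart-rate readings twice: once at full floating-point precision and once rounded to whole beats per minute. The rounded text repeats far more often and encodes into far fewer (index, character) pairs.

```python
import random


def lz78_encode(text: str) -> list[tuple[int, str]]:
    """Textbook LZ-78: emit (index of longest known phrase, next character) pairs."""
    dictionary = {"": 0}
    phrase = ""
    tokens: list[tuple[int, str]] = []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                          # keep extending a known phrase
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                    # flush a trailing known phrase
        tokens.append((dictionary[phrase], ""))
    return tokens


# Synthetic readings (not data from the study): HR near 70 bpm with small noise.
rng = random.Random(42)
hr = [70 + rng.gauss(0, 0.5) for _ in range(200)]
as_float = ",".join(f"{x:.6f}" for x in hr)
as_int = ",".join(str(round(x)) for x in hr)

for label, text in [("float", as_float), ("int", as_int)]:
    print(label, len(text), "chars ->", len(lz78_encode(text)), "tokens")
```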
Result 2: Impact of Personalization on Event Filtering
[Figure, left: variation in heart rate readings (beats per minute, 0-200) over time for different individuals. Right: data transmitted over the network (KB) for User 2-1 and User 2-2: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering.]
Sensor streams exhibit significant statistical variation across individuals
The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' threshold for each of the 3 distinct states (0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients
Bandwidth savings between 26% and 73% for our sample population
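The slide's generic per-state thresholds and the personalization idea can be combined in a short Python sketch. It reads the slide's ranges as heart rate in bpm for sitting, walking, and running. Two choices are assumptions for illustration only, since the source specifies neither:
- the accelerometer cut-offs in `classify_activity`;
- using the 5th-95th percentile of past readings as a user's personal "normal" range.

```python
from statistics import quantiles
from typing import Optional

# Slide's generic "normal" ranges, read here as heart rate (bpm) per activity state.
GENERIC_HR_RANGE = {"sitting": (0.0, 70.0), "walking": (70.0, 110.0),
                    "running": (110.0, 170.0)}


def classify_activity(accel_amplitude: float) -> str:
    """Map accelerometer amplitude to an activity state (cut-offs are hypothetical)."""
    if accel_amplitude < 0.2:
        return "sitting"
    if accel_amplitude < 1.0:
        return "walking"
    return "running"


def learn_personal_range(history: list[float]) -> tuple[float, float]:
    """Assumed personalization: a user's 5th-95th percentile of past readings."""
    cuts = quantiles(history, n=20)        # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


def is_abnormal(hr: float, accel_amplitude: float,
                personal: Optional[dict[str, tuple[float, float]]] = None) -> tuple[bool, str]:
    """Check HR against the personal range for the current state, else the generic one."""
    state = classify_activity(accel_amplitude)
    low, high = (personal or {}).get(state, GENERIC_HR_RANGE[state])
    return not (low <= hr <= high), state


# Hypothetical user whose resting heart rate normally lies between 60 and 100 bpm.
user_b = {"sitting": learn_personal_range([float(v) for v in range(60, 101)])}
print(is_abnormal(95, 0.1))            # generic sitting range flags it
print(is_abnormal(95, 0.1, user_b))    # personalized range does not
```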
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance, and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports its various services
Challenges
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across the ecosystem
– Integrity of data & tracking the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security
Architectural goals
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking & auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data
Technologies & approaches
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform
Analysis Framework (1): The Approach
Current solutions
– Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for a specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials
New requirements
– Extensible: to new data sources (e.g., EMG, EEG, accelerometer, activity) and to new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
– Scalability: handles large numbers of patients
– Robustness: resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings
– Easy to use: separates medical domain knowledge from IT knowledge
Our innovations
– Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
– Programming model centered around the System S concept of a Processing Element (PE): PEs represent atomic stream transformers, and stream data flows with pub-sub semantics
– Century also provides useful medical analytic services: persistence of state (e.g., analyze HR variations over 6 months), management of patient data (e.g., a patient's intermediate diagnosis), and computation and rebinding based on the quality of events from medical streams
– Century supports an open set of data sources, currently ECG, BP, glucose, weight, SpO2, and temperature
[Figure: several analysis jobs, each a graph of interconnected PEs (analysis components) fed from a data source through a source PE; PE labels include SPA, SPM, DFE, PSD, INQ, GUI, NES, IPN, VBF, DUP, WLG, and several join PEs]
Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs
[Figure: an AR PE processing an input ECG stream through normalization, a fast Fourier transform, and a neural network (trained during initialization), followed by an alert generator that emits the output alert]
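The AR PE pipeline in the figure (normalization, Fourier transform, trained classifier, alert generator) can be sketched as a single Python class. Its `process` method consumes one SDO and optionally returns an alert, leaving routing to the runtime. This is not the System S API: the `SDO` fields, the `process` method, and the naive DFT are hypothetical. A fixed linear scorer over spectral magnitudes stands in for the neural network trained during initialization.

```python
import cmath
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional, Sequence


@dataclass
class SDO:
    """Minimal stand-in for a Stream Data Object (fields are hypothetical)."""
    patient_id: str
    timestamp: float
    samples: Sequence[float]


def normalize(x: list[float]) -> list[float]:
    """Zero-mean, unit-variance scaling; a flat window maps to all zeros."""
    mu, sd = mean(x), pstdev(x)
    return [(v - mu) / sd if sd else 0.0 for v in x]


def dft_magnitudes(x: list[float]) -> list[float]:
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (an FFT computes the same values faster)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) / n
            for k in range(n // 2 + 1)]


class ArPE:
    """Sketch of the AR PE: normalization -> Fourier transform -> scorer -> alert."""

    def __init__(self, weights: list[float], bias: float, threshold: float = 0.5) -> None:
        # Stand-in for the neural network trained during initialization.
        self.weights, self.bias, self.threshold = weights, bias, threshold

    def process(self, sdo: SDO) -> Optional[dict]:
        """Consume one SDO; return an alert record or None. Routing is the runtime's job."""
        spectrum = dft_magnitudes(normalize(list(sdo.samples)))
        score = self.bias + sum(w * m for w, m in zip(self.weights, spectrum))
        if score > self.threshold:
            return {"type": "alert", "patient_id": sdo.patient_id,
                    "timestamp": sdo.timestamp, "score": score}
        return None


pe = ArPE(weights=[0, 0, 1, 0, 0], bias=0.0)   # hypothetical: watch frequency bin 2
print(pe.process(SDO("john-doe", 0.0, [1, 0, -1, 0, 1, 0, -1, 0])))
```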
Century Event Management Service
[Figure: Century server infrastructure. Components shown: USN gateway (data transformer, sensor data sender); toolkit (IDE for application developer, sensor simulator, event management client library); event management service (sensor data reception, event preprocessor, event filter/annotator, event storage manager, event store query service, subscription service, remote access manager); analysis framework (analysis applications, job/application manager, QoE manager, state manager); provenance service (provenance storage manager, provenance query manager, provenance query service, provenance access control, resolver); group management service (group manager, user group data manager); portal service (patient, administrator, and stakeholder portals); service data management (patient, device, and stakeholder data managers, device catalogue); platform service (authentication & authorization, privacy manager); interoperability container with an IHE adapter; storages (event store, state data, provenance metadata, group data, privacy policy, stakeholder/patient/device/application info). Stakeholders: doctor, lifestyle consultant, healthcare IT specialist, government/CDC, solution delivery services.]
Event Management Service (1)
The problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: an annotator emitting events at times 1, 2, 3 into each storage option, as table rows or as XML documents]
Event Management Service (3): Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ. The second assumption means ρ < 1 on average, but bursts can push the instantaneous arrival rate above μ, into the instability region
[Figure: arrival rate over time, fluctuating around the average arrival rate and at times exceeding the average database service rate (instability region)]
3. Hybrid storage model (stream buffer + relational-XML DB)
– Stream buffer: how do we design the stream buffer?
[Figure: an annotator emits XML events at times 1, 2, 3 into a stream buffer in front of the relational-XML DB]
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007
Event Management Service (4): The Stream Buffer
[Figure: PEs in the analysis graph (PE1-PE5, SPE) send storage requests (events of type "eventStore") towards the delivery system. A traffic monitor, traffic models, a traffic predictor, and an event store load monitor govern how requests pass from the System S STG (file system store) to the database PE and on to the event store (relational-XML).]
Approach
– Keep the relational-XML database in the design, for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
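The approach above can be sketched in Python: buffer bursts and throttle what reaches the database so it never operates in the instability region. Three things are assumptions for illustration:
- a fixed per-tick release cap;
- a tick-based time model;
- a Knuth-style Poisson generator that simulates bursty arrivals.

This is not the design of Century's actual stream buffer, which uses traffic models and a predictor. Choosing the cap below the database's service rate μ keeps the per-tick load on the database below μ, while the buffer absorbs bursts.

```python
import math
import random
from collections import deque


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def throttle(arrivals: list[list[int]], max_per_tick: int) -> tuple[list[list[int]], list[int]]:
    """Buffer storage requests and release at most max_per_tick per tick, in FIFO order.

    Returns the per-tick batches forwarded to the database and the leftover backlog.
    """
    buffer: deque = deque()
    forwarded = []
    for batch in arrivals:
        buffer.extend(batch)
        n = min(max_per_tick, len(buffer))
        forwarded.append([buffer.popleft() for _ in range(n)])
    return forwarded, list(buffer)


# Hypothetical numbers: lambda = 8 requests/tick on average, database mu = 10/tick,
# release cap 9 < mu, so no single tick pushes the database past its service rate.
rng = random.Random(1)
arrivals = [list(range(poisson(rng, 8.0))) for _ in range(1000)]
forwarded, backlog = throttle(arrivals, max_per_tick=9)
print(max(map(len, arrivals)), max(map(len, forwarded)), len(backlog))
```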
Century Provenance Service
[Figure: the Century server infrastructure diagram, repeated from the Century Event Management Service slide]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction.
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee's reaction: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
[Figure: PE1 and PE2 (running algorithms f1, f2, f3) report to a provenance server/store, e.g., "I ran algorithm f1 with parameters u, v, and w at time t0. I published on port 1." and "I received some data on port 2. I ran algorithm f3 on the data at time t1."]
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
[Figure: sensors produce Data1 and Data2; PEj produces DataOut4 and notes: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
Provenance (2): Prior Work on Provenance
[Figure: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) and by event and processing rates]
– EU Provenance (PASOA, PreServ): captures workflow bindings
– Scientific SOA workflows (Karma): workflow and application binding notifications; stateless, application-defined data provenance
– Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
– Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
– File systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
– Medical stream provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "sub-sets", i.e., dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ:
$$\mathit{out}_i(t) \Leftarrow \bigcup_{s_j\ \text{input to component}\ i} \mathit{inp}_{s_j}(t'), \qquad t' \in [\,t - \Delta_j,\ t\,]$$
[Figure: a processing element with input streams Sj (Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and Sk (Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (Si(t), Si(t-1), Si(t-2)). Its TVC dependency rule blocks are indexed by PE state (defined here by location context) and rule block ID:
– "gym" (block 1): Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
– "home" (block 2): Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)]
The dependency equation can, however, vary based on the PE's internal state or some external context, e.g.:
IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– The solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
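The state-dependent rule above and its TVC dependency can be sketched in Python. Each output stores only its timestamp and the PE state; the causative input window is recomputed from the rule block for that state rather than annotated on each element. Two things are assumptions: treating window bounds as inclusive timestamp ranges, and the names `RULE_BLOCKS`, `evaluate`, and `tvc_dependency`.

```python
from statistics import mean

# TVC rule blocks keyed by PE state: (look-back window, alert threshold in bpm).
# Values follow the slide's example rule; "default" is the ELSE branch (e.g. "home").
RULE_BLOCKS = {"gym": (10, 140), "default": (30, 90)}


def pe_state(loc: str) -> str:
    return "gym" if loc == "gym" else "default"


def evaluate(t: int, loc: str, hr: dict[int, float]) -> dict:
    """Apply the state-dependent alert rule at time t; hr maps timestamp -> HR reading."""
    state = pe_state(loc)
    window, threshold = RULE_BLOCKS[state]
    readings = [v for ts, v in hr.items() if t - window <= ts <= t]
    alert = bool(readings) and mean(readings) > threshold
    # Only the state is stored with the output: no per-element provenance annotation.
    return {"ts": t, "alert": alert, "state": state}


def tvc_dependency(output: dict) -> tuple[str, int, int]:
    """Recompute the causative input window for an output from its stored state."""
    window, _ = RULE_BLOCKS[output["state"]]
    return ("HR", output["ts"] - window, output["ts"])


hr = {ts: 100.0 for ts in range(61)}
for loc in ("gym", "home"):
    out = evaluate(60, loc, hr)
    print(out, tvc_dependency(out))
```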
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to the corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage/processing in exchange for higher provenance reconstruction overhead
1. Determine the IDs of the input ports and the causative stream bindings
2. Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
3. Apply the TVC filter to obtain the causative set of elements
[Figure: the Stream and Port Mapping (SAPM) component stores (1) bindings between every input/output port and the associated streams, (2) creation/modification times of PEs, and (3) the TVC function f() for every output port/input port pair. Given Query(Ei), the provenance query server (1) obtains the I/O stream mappings, (2) retrieves the dependency equation, (3) retrieves elements, and (4) applies the equation to determine the set of causative elements, which it returns. Example tables: a PC dependency table (PEj, f1(), Po1); a dynamic stream mapping table (t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6); and a data store (element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home").]
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007
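The query-resolution steps above can be sketched in Python over in-memory stand-ins for the dynamic stream mapping table and the data store. The following are hypothetical: the `Element` and `TVCFunction` types, the single input port, and the value predicate (keep readings above 100). The point is that the binding history and one time-and-value function per port replace per-element provenance annotations. Each candidate element is checked against the stream bound to the port at that element's own timestamp, so a rebinding inside the window is handled.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Element:
    eid: str
    ts: int
    stream_id: int
    value: float


# Dynamic stream mapping for one input port: (bind_time, stream_id), sorted by time.
Binding = tuple[int, int]


def stream_bound_at(bindings: list[Binding], t: int) -> Optional[int]:
    """Which stream was bound to the port at time t (None before the first binding)."""
    i = bisect_right([bt for bt, _ in bindings], t) - 1
    return bindings[i][1] if i >= 0 else None


@dataclass(frozen=True)
class TVCFunction:
    window: int                            # time part: output at t depends on [t - window, t]
    predicate: Callable[[Element], bool]   # value part: which elements count as causative


def resolve(query_ts: int, bindings: list[Binding], f: TVCFunction,
            store: list[Element]) -> list[Element]:
    lo, hi = query_ts - f.window, query_ts                # step 2: dependency equation
    in_window = [e for e in store if lo <= e.ts <= hi]    # step 3: retrieve elements
    # Steps 1 and 4: keep elements from the stream bound at their own timestamp
    # that also satisfy the value part of the TVC function.
    return [e for e in in_window
            if e.stream_id == stream_bound_at(bindings, e.ts) and f.predicate(e)]


bindings = [(0, 1), (15, 12)]    # port read stream 1 until t=15, then stream 12
store = [
    Element("e1", 12, 1, 120.0), Element("e2", 13, 12, 130.0),
    Element("e3", 15, 12, 110.0), Element("e4", 16, 12, 95.0),
    Element("e5", 16, 1, 150.0), Element("e6", 5, 1, 200.0),
]
f = TVCFunction(window=5, predicate=lambda e: e.value > 100)
print([e.eid for e in resolve(16, bindings, f, store)])
```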
Key Research Benefits of the TVC Model for Provenance
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
– Statefulness: time and value functions allow a compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
Other provenance systems cannot handle O(100 Kevents/sec) system throughput
Quality of Events in Century
[Figure: the Century server infrastructure diagram, repeated from the Century Event Management Service slide]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE consuming streams with elements t-3, t-5, t-4, t+1 and t, t-5, t-1, t+1, raising the question "What generated this alert, and can we trust it?"]
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
How to develop a hybrid of model-based and replay-based provenance?
How to define context-dependent access-control models for provenance data in a multi-provider environment?
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission
The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality
Remote Health Monitoring: The Business Motivation
The United States spends $1.9 trillion on healthcare, or more than 16% of its GDP
More than 133 million Americans live with chronic illnesses
– Chronic diseases account for 70% of all deaths in the United States
– The medical care costs of people with chronic diseases account for more than 75% of the nation's $1.4 trillion medical care costs
– Chronic diseases require long-term management
By 2010, the US will have more citizens aged 65 or over than at any time in its history
– 200,000-doctor deficit by the year 2010
– Huge growth forecast in managed care residences
• 45 million Americans (15% of the population) will be 65 years or older by 2015
• About 4 million long-term care beds in the US as of 2003
[Figure: Information-based Medicine: The Remote Monitoring Roadmap (source: Kathy Schweda). Axes: evolutionary practices and revolutionary technology. Stages run from traditional health care (episodic treatment; non-specific, treat symptoms) through information-augmented care to personalized health care (disease prevention). Labels shown include patient-reported data, electronic health records, chronic disease management, clinical trial data collection, in-patient automated vitals, rules-based clinical response, remote monitoring, pre-symptomatic treatment, lifetime treatment, automated systems, information correlation, 1st-generation diagnosis, organized (error reduction), throughput analytics, data and systems integration, CDI, clinical decision support, and Century.]
Research Background: From Today to 2015
Ubiquitous computing technologies, patient-centric networks, and healthcare analytics are the key factors in establishing the healthcare ecosystem.
[Figure: two segmentations of the healthcare ecosystem into patients and care delivery. Patients are segmented by health status (healthy, minor ailments, at risk, acutely ill, chronically ill, catastrophically ill) in one view and by age group (infants, adolescents, adult men, adult women, senior men, senior women) in the other. Care delivery is segmented by setting (rural, suburban, urban), socio-economic status (high, medium, low), access (in person, telephonic, electronic), location (home, outpatient setting, hospital emergency department, long-term care, Internet, call center), service, provider (traditional providers, public/private insurers, alternate providers, midlevel provider, health infomediary), and catchment area (local, regional, national, international). Services are wellness/risk assessment, prevention, acute care, chronic care, and complementary care in one view, and risk assessment, prevention, acute diagnosis, acute treatment, chronic diagnosis, and chronic treatment in the other.]
Ubiquitous Computing Technologies
– Emergence of sensor devices that generate more complex events at higher rates (e.g., ETRI's blood test sensor, SNU's non-intrusive ECG sensor, multichannel ECG)
– Low-power wireless communication (e.g., Bluetooth, ZigBee)
Patient-Centric Network (Interoperability)
– IHE, Continua
Healthcare Analytics Research
– Techniques to monitor the effects of drugs on patients
– Biomedical trend analysis
– Preventive analysis
IT Infrastructure
Remote Health Monitoring: The Opportunity
Long-term monitoring offers benefits
– Early disease detection and trend analysis for healthy and at-risk individuals
– Treatment and progress monitoring for patients
– Monitoring of participants in drug trials or experimental treatments, to gauge efficacy and side effects
– Reduced workload on doctors, nurses, and other healthcare providers
Enabled by rapid improvements in two key technologies
– Improvements in wireless communications (WiFi, 3G, Bluetooth)
– Continuing miniaturization of wireless sensors
[Figure: a sensor sends data over Bluetooth (BT) to a mobile device holding a patient diary, which forwards the data to a server.]
Chapter 2
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance, and QoE
Harmoni (Healthcare Adaptive Remote Monitoring): Overview
[Figure: sensors connect to the mobile hub over a PAN (Bluetooth); the hub sends data over a WAN (GPRS).]
Type of sensor device   Bits/sensor sample   Channels/device   Raw data rate (KB/day)
GPS                     1408                 1                 14,850
SpO2                    3000                 1                 94,922
EKG (cardiac)           12                   6                 194,400
Accelerometer           64                   3                 202,500
EEG (brain)             12                   12                388,800
EMG (muscle)            12                   6                 777,600
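The raw-rate column follows from bits/sample × channels × sampling rate × 86,400 s/day. The table does not list sampling rates. The rates in this sketch (1, 3, 256, 100, 256, and 1024 samples/s) are back-computed assumptions that reproduce the KB/day figures exactly when 1 KB = 1024 bytes. For SpO2, 3 samples/s × 3000 bits is consistent with the Nonin monitor's three 375-byte packets per second described later in the talk.

```python
SECONDS_PER_DAY = 86_400


def raw_kb_per_day(bits_per_sample: int, channels: int, samples_per_sec: float) -> float:
    """Raw (uncompressed) volume in KB/day, taking 1 KB = 1024 bytes."""
    bits_per_day = bits_per_sample * channels * samples_per_sec * SECONDS_PER_DAY
    return bits_per_day / 8 / 1024


# Sampling rates are back-computed from the table, not stated in the source.
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG": (12, 12, 256),
    "EMG": (12, 6, 1024),
}

if __name__ == "__main__":
    for name, (bits, ch, hz) in SENSORS.items():
        print(f"{name:14s} {raw_kb_per_day(bits, ch, hz):>10,.0f} KB/day")
```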
HARMONI Novel Features
1. Context-aware event filtering
   IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr)/min
2. Personalization of rules: apply machine learning at the server to learn what is "normal" for each user (e.g., normal HR for user A at the gym = 110–150)
3. Anticipation-driven transmission: based on past history, if Prob(802.11 within 2 hrs) > 0.7, cache the data
Need to reduce uplink transmission rates/power
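Feature 3 can be sketched as an empirical estimate of Prob(802.11 within 2 hrs) taken from past days, followed by the cache-or-send decision. This is a minimal sketch, assuming hour-of-day granularity and a history recorded as the set of hours in which 802.11 was seen each day. The function names and sample history are hypothetical. The 0.7 threshold and the 2-hour horizon come from the slide.

```python
from typing import List, Set


def prob_wifi_within(history: List[Set[int]], hour: int, window: int = 2) -> float:
    """Fraction of past days on which 802.11 was seen during [hour, hour + window).

    history[d] holds the hours (0-23) of day d in which an 802.11 network was available.
    """
    if not history:
        return 0.0
    upcoming = {(hour + k) % 24 for k in range(window)}
    hits = sum(1 for day in history if day & upcoming)
    return hits / len(history)


def plan_transmission(history: List[Set[int]], hour: int, threshold: float = 0.7) -> str:
    """Cache locally if cheap 802.11 is likely soon; otherwise send over the WAN now."""
    return "cache" if prob_wifi_within(history, hour) > threshold else "send_now"


if __name__ == "__main__":
    past_days = [{18, 19}, {19}, {9}, {18}]  # hypothetical observations
    print(plan_transmission(past_days, 18))  # WiFi seen on 3 of 4 days -> 0.75
    print(plan_transmission(past_days, 9))   # WiFi seen on 1 of 4 days -> 0.25
```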
HARMONI Architecture: Key Components on the Mobile Device
Lightweight rule/event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
• e.g., if 60 < AVG(last 10 heart rate values) < 90 then "transmit AVG to server"
– Processed events themselves act as predicates for new rules
Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent actions
– Populates or modifies the rules in the event engine
Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server
Anticipation Mechanism
– Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates
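The example rule (if 60 < AVG(last 10 heart rate values) < 90, transmit the average) can be sketched as a fixed-size sliding window with a range check. This is a minimal sketch, assuming the rule fires on every new sample while the condition holds; a real engine might rate-limit or latch. `WindowAverageRule` is a hypothetical name, and the `action` callback stands in for "transmit AVG to server".

```python
from collections import deque
from typing import Callable, Deque, Optional


class WindowAverageRule:
    """Fires action(avg) when lo < mean(last `size` values) < hi."""

    def __init__(self, size: int, lo: float, hi: float,
                 action: Callable[[float], None]) -> None:
        self.size = size
        self.window: Deque[float] = deque(maxlen=size)
        self.lo, self.hi, self.action = lo, hi, action

    def on_event(self, value: float) -> Optional[float]:
        self.window.append(value)          # oldest value drops out automatically
        if len(self.window) < self.size:
            return None                    # not enough history yet
        avg = sum(self.window) / self.size
        if self.lo < avg < self.hi:
            self.action(avg)               # e.g. "transmit AVG to server"
            return avg
        return None


if __name__ == "__main__":
    sent = []
    rule = WindowAverageRule(size=10, lo=60, hi=90, action=sent.append)
    for hr in [72] * 10 + [200, 200]:
        rule.on_event(hr)
    print(sent)
```

The first nine samples only fill the window. The tenth fires with an average of 72, the first 200 bpm reading still leaves the average inside the band, and the second pushes it above 90, so nothing more is sent.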
HARMONI Functional Architecture
[Figure: sensors connect over a short-range wireless link to the mobile device, which reaches the remote server over a wireless link to the Internet. Mobile-device components shown: data collection/data adapters (sensor readings, sensor control), context, device resources, data processing, light-weight pattern recognition engine, rule manager, action-triggering mechanism, user interface, intelligent data transmission, and anticipation mechanism. Server-side components shown: DB, pattern recognition engine, pattern learning engine, and the TAPAS rule server, together with external context sources, external rule specifications, external action specifications, and an external action-triggering mechanism.]
Simple recognition of pre-specified temporal patterns across sensor streams
– Event engine driven by Deterministic Finite Automata (DFAs)
– Actions associated with rules include data transmission, data transformation, and local triggers (alarms, reminders)
Compressed transmission to the server when an interface is available AND "anticipation" indicates "OK-to-Send"
– Use of intelligent compression schemes to reduce the volume of traffic
Store-and-forward otherwise
– Data stored in a memory buffer and on FLASH storage
The Rule Manager coordinates with the backend server to download the currently active or applicable rules
– Can run, activate, or deactivate multiple rules simultaneously and dynamically
– Dynamic downloading of rules provides implicit "context awareness" at the client
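The transmission policy above (send compressed data when an interface is up and anticipation says OK-to-Send, otherwise store and forward) can be sketched as a small buffer class. This is a minimal sketch under stated assumptions. It models only the in-memory buffer, not the FLASH spill. It uses zlib's DEFLATE as a stand-in for HARMONI's unspecified "intelligent compression schemes". `StoreAndForward` and its methods are hypothetical names.

```python
import zlib
from collections import deque
from typing import Callable, Deque


class StoreAndForward:
    """Buffers filtered records; forwards them as one compressed batch only
    when a network interface is up AND anticipation says OK-to-Send."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self.pending: Deque[bytes] = deque()
        self.send = send

    def enqueue(self, record: str) -> None:
        self.pending.append(record.encode("utf-8"))

    def try_flush(self, link_up: bool, ok_to_send: bool) -> bool:
        if not (link_up and ok_to_send) or not self.pending:
            return False                        # store: keep buffering
        batch = b"\n".join(self.pending)
        self.send(zlib.compress(batch))         # forward: one compressed batch
        self.pending.clear()
        return True


if __name__ == "__main__":
    outbox = []
    sf = StoreAndForward(outbox.append)
    sf.enqueue("hr=72")
    sf.enqueue("hr=74")
    sf.try_flush(link_up=True, ok_to_send=False)   # anticipation says wait
    sf.try_flush(link_up=True, ok_to_send=True)    # now forwarded
    print(zlib.decompress(outbox[0]))
```

Batching before compressing gives the compressor more repeated context than compressing each reading separately would.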
Pattern Recognition Engine Implementation
Rules are specified using EventScript, which is similar to regular expressions. A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA). We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications.
Effectively, we are running finite state machines with some tweaks
– A unique memory heap for each DFA, accessible across states
– A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and on state transitions in the DFA
[Figure: on the desktop, a compiler translates EventScript into a DFA; the rule server on the remote server delivers it to the rule manager on the Nokia 770, which loads it into the event engine.]
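The talk does not show EventScript syntax or compiler output, so this sketch starts from an already-compiled transition table rather than from a rule. It is a minimal sketch, assuming a total transition table keyed by (state, symbol). Python callables stand in for the Scheme snippets run on transitions. The three-consecutive-high-readings rule, the 120 bpm cut-off, and all names are hypothetical. The point is the two "tweaks": a private heap per DFA instance and actions attached to transitions.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Heap = Dict[str, Any]
Action = Callable[[Heap, Any], None]


class DFA:
    """Table-driven DFA with a private heap shared by all of its states."""

    def __init__(self, start: str, transitions: Dict[Tuple[str, str], str],
                 actions: Optional[Dict[Tuple[str, str], Action]] = None,
                 init: Optional[Callable[[Heap], None]] = None) -> None:
        self.state = start
        self.transitions = transitions          # must be total over the alphabet
        self.actions = actions or {}            # run when a transition is taken
        self.heap: Heap = {}                    # unique per DFA instance
        if init:
            init(self.heap)                     # initialization hook

    def feed(self, symbol: str, event: Any = None) -> None:
        key = (self.state, symbol)
        self.state = self.transitions[key]
        action = self.actions.get(key)
        if action:
            action(self.heap, event)


def three_high_hr_detector() -> DFA:
    """Alarm on three consecutive high-HR events (hypothetical rule)."""
    transitions = {}
    for state, on_high in [("s0", "s1"), ("s1", "s2"), ("s2", "s3"), ("s3", "s3")]:
        transitions[(state, "H")] = on_high
        transitions[(state, "N")] = "s0"

    def raise_alarm(heap: Heap, hr: Any) -> None:
        heap["alarms"] += 1
        heap["last_alarm_hr"] = hr

    return DFA("s0", transitions, {("s2", "H"): raise_alarm},
               init=lambda heap: heap.update(alarms=0))


def feed_hr(dfa: DFA, hr: float, threshold: float = 120) -> None:
    dfa.feed("H" if hr > threshold else "N", hr)   # symbolize the raw event


if __name__ == "__main__":
    detector = three_high_hr_detector()
    for hr in [130, 135, 140, 80, 125, 126, 127, 128]:
        feed_hr(detector, hr)
    print(detector.heap)
```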
HARMONI Implementation Platform
Nokia 770 Internet Tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64–128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides an open-source software and development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart-rate monitor
– Provides heart rate and oxygen saturation
– Supports the Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis accelerometer
– Output baud rate of 57.6 Kbps
– 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome, and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure, left: "Data Generated From Sensor", heart rate (bpm, 0–200) vs. sample number. Right: data transmitted over the network (in KB, 0–80) for each context assumed (None, Office, Gym, Office+Gym), uncompressed and with LZ-78 compression.]
Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
Filtering (S3) results in an 85% reduction in bandwidth consumption
The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
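The last point, that LZ compression depends on exact pattern matches, can be illustrated with a textbook LZ-78 encoder. This is a minimal sketch, not HARMONI's compressor. It compares a stream of identical readings with a stream of readings jittered by up to ±1 bpm (hypothetical data). The jittered stream produces many more (index, character) tokens, because few floating-point strings repeat exactly.

```python
import random
from typing import List, Tuple

Token = Tuple[int, str]  # (index of longest known prefix, next character)


def lz78_encode(data: str) -> List[Token]:
    dictionary = {"": 0}
    phrase = ""
    tokens: List[Token] = []
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                         # keep extending an exact match
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                   # flush a trailing exact match
        tokens.append((dictionary[phrase], ""))
    return tokens


def lz78_decode(tokens: List[Token]) -> str:
    entries = [""]
    out = []
    for index, ch in tokens:
        entry = entries[index] + ch
        out.append(entry)
        entries.append(entry)
    return "".join(out)


if __name__ == "__main__":
    steady = ",".join(["72.0"] * 200)            # identical readings
    rng = random.Random(0)
    jittery = ",".join(f"{72 + rng.uniform(-1, 1):.2f}" for _ in range(200))
    for name, text in [("steady", steady), ("jittery", jittery)]:
        print(name, len(text), "chars ->", len(lz78_encode(text)), "tokens")
```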
Result 2: Impact of Personalization on Event Filtering
[Figure, left: "Variation in Heart Rate Readings for Different Individuals", heart rate (beats per minute, 0–200) vs. time. Right: data transmitted over the network (in KB, 0–100) for User 2-1 and User 2-2 under four schemes: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering.]
Sensor streams exhibit significant statistical variation across individuals
The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
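Personalization can be sketched as learning each user's "normal" band from that user's own history and suppressing in-band readings. The slide only says machine learning is applied at the server, so this substitutes a simpler technique: a 5th–95th percentile band. It is a minimal sketch in which the generic 60–100 bpm band, the sample history, and the function names are all hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

Range = Tuple[float, float]

GENERIC_NORMAL: Range = (60.0, 100.0)  # hypothetical population-wide band (bpm)


def learn_normal_range(history: Sequence[float]) -> Range:
    """Per-user 'normal' band: 5th to 95th percentile of past readings."""
    cuts = statistics.quantiles(history, n=20)  # 19 cut points, 5% apart
    return cuts[0], cuts[-1]


def readings_to_send(readings: Sequence[float], normal: Range) -> List[float]:
    """Transmit only readings outside the normal band; suppress the rest."""
    lo, hi = normal
    return [r for r in readings if not lo <= r <= hi]


if __name__ == "__main__":
    athlete_history = list(range(100, 151))     # this user runs "high"
    personal = learn_normal_range(athlete_history)
    today = [105, 120, 145, 160]
    print("generic:", readings_to_send(today, GENERIC_NORMAL))
    print("personal:", personal, readings_to_send(today, personal))
```

For a user whose baseline sits above the population band, the generic filter forwards every reading, while the learned band forwards only the outlier.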
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' thresholds for the 3 distinct states (0–70, 70–110, 110–170) across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients
Bandwidth savings between 26% and 73% for our sample population
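Sensor-based context can be sketched as two steps. First, classify the accelerometer amplitude into one of the three states. Second, check the HR reading against that state's band. This is a minimal sketch. The HR bands are the slide's (treated here as half-open intervals). The amplitude cut-offs (0.3 g and 1.0 g) and the function names are hypothetical.

```python
from typing import Dict, Tuple

# 'Normal' HR band (bpm) per activity state, from the slide: 0-70, 70-110, 110-170.
STATE_HR_BANDS: Dict[str, Tuple[float, float]] = {
    "sitting": (0, 70),
    "walking": (70, 110),
    "running": (110, 170),
}


def classify_activity(amplitude: float, walk_at: float = 0.3, run_at: float = 1.0) -> str:
    """Map accelerometer amplitude (g) to a state; cut-offs are hypothetical."""
    if amplitude < walk_at:
        return "sitting"
    return "walking" if amplitude < run_at else "running"


def should_transmit(hr: float, amplitude: float) -> bool:
    """Send the HR reading only if it is abnormal for the current activity state."""
    lo, hi = STATE_HR_BANDS[classify_activity(amplitude)]
    return not lo <= hr < hi


if __name__ == "__main__":
    print(should_transmit(130, amplitude=1.5))  # 130 bpm while running: normal
    print(should_transmit(130, amplitude=0.1))  # 130 bpm while sitting: abnormal
```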
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance, and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytics infrastructure that supports a variety of services across the healthcare ecosystem.
Challenges
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across the ecosystem
– Integrity of data and tracking of the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security
Architectural goals
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking and auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data
Technologies and approaches
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy and Security Technology
– Stream Processing Platform
Analysis Framework (1): The Approach
Current solutions
– Vertically designed systems for the analysis of health monitoring data exist today
– Closed and custom-built for a specific diagnosis
– Non-extensible: support for a limited set of sensor devices
– Small patient footprint: only good for trials
New requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures
• Exacerbated in remote monitoring settings
Easy to use
– Separates medical domain knowledge from IT knowledge
Our innovations
Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics
Programming model centered on the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flows with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., a patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2, and temperature
[Figure: several analysis jobs, each a graph of PEs (e.g., SPA, SPM, DFE, PSD, NES, IPN, VBF, DUP, WLG, and joins) fed by Source PEs attached to data sources; one analysis component and one analysis job are highlighted.]
Major practical benefit: separates the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
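The PE programming model (atomic stream transformers wired together with pub-sub semantics) can be sketched with a tiny in-process router. This is a minimal sketch and not the System S API. `StreamRouter`, `ProcessingElement`, the dict-based SDO, and the tachycardia rule are hypothetical. It shows the separation the slide highlights: the developer writes only `transform`, and the router handles delivery.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

SDO = Dict[str, object]           # a Stream Data Object, modeled as a dict
Handler = Callable[[SDO], None]


class StreamRouter:
    """Delivers each published SDO to every subscriber of that stream name."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, stream: str, handler: Handler) -> None:
        self.subscribers[stream].append(handler)

    def publish(self, stream: str, sdo: SDO) -> None:
        for handler in self.subscribers[stream]:
            handler(sdo)


class ProcessingElement:
    """Atomic stream transformer: the developer supplies only `transform`."""

    def __init__(self, router: StreamRouter, in_stream: str, out_stream: str,
                 transform: Callable[[SDO], Optional[SDO]]) -> None:
        self.router, self.out_stream, self.transform = router, out_stream, transform
        router.subscribe(in_stream, self.on_sdo)   # wiring handled by the framework

    def on_sdo(self, sdo: SDO) -> None:
        result = self.transform(sdo)
        if result is not None:
            self.router.publish(self.out_stream, result)


if __name__ == "__main__":
    router = StreamRouter()
    alerts: List[SDO] = []
    router.subscribe("alerts", alerts.append)
    ProcessingElement(router, "hr", "alerts",
                      lambda s: {**s, "alert": "tachycardia"} if s["hr"] > 100 else None)
    router.publish("hr", {"patient": "p1", "hr": 120})
    router.publish("hr", {"patient": "p1", "hr": 70})
    print(alerts)
```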
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs
[Figure: an AR PE that passes an input ECG through normalization and a Fast Fourier Transform into a neural network (trained during initialization) and an alert generator, producing an output alert.]
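The AR PE's internal pipeline (normalization, Fourier transform, classifier, alert generator) can be sketched on a single signal window. This is a minimal sketch under stated assumptions. A naive O(n²) DFT stands in for an FFT library. The neural network is replaced by a caller-supplied classifier, shown here as a hypothetical "dominant frequency above bin 2" rule. The function names are hypothetical, and the synthetic sine windows are not real ECG data.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import Callable, List, Optional, Sequence


def normalize(x: Sequence[float]) -> List[float]:
    """Zero-mean, unit-variance scaling of one signal window."""
    mu, sigma = mean(x), pstdev(x)
    return [(v - mu) / sigma if sigma else 0.0 for v in x]


def dft_magnitudes(x: Sequence[float]) -> List[float]:
    """|X[k]| for k = 0..n/2 via a naive O(n^2) DFT (a real PE would use an FFT)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def dominant_bin(mags: Sequence[float]) -> int:
    """Index of the strongest non-DC frequency bin."""
    return max(range(1, len(mags)), key=lambda k: mags[k])


def ar_pe(window: Sequence[float],
          classify: Callable[[List[float]], Optional[str]]) -> Optional[dict]:
    """Normalization -> DFT -> classifier -> alert generator."""
    label = classify(dft_magnitudes(normalize(window)))
    return {"alert": label} if label else None


if __name__ == "__main__":
    # Stand-in classifier (the slide uses a neural network): flag fast rhythms.
    fast = lambda mags: "fast rhythm" if dominant_bin(mags) > 2 else None
    window = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
    print(ar_pe(window, fast))
```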
Century Event Management Service
[Figure: CENTURY SERVER INFRASTRUCTURE. Components shown include the USN Gateway (Data Transformer, Sensor Data Sender), Sensor Data Reception, a Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library), the Patient, Administrator, and Stakeholder Portals (doctor, lifestyle consultant, healthcare IT specialist, Govt/CDC), Portal Service, Patient Data Manager, Service Data Management, User Group Data Manager, Job/Application Manager, Device Data Manager, Event Filter, Annotator, Analysis Applications, Analysis Framework, QoE Manager, State Manager, Group Manager, Device Catalogue, Authentication & Authorization, Privacy Manager, Platform Service, Subscription Service, Event Store Query Service, Provenance Query Service, Group Management Service, IHE Adapter, Interoperability Container, Event Management Service (Event Preprocessor, Event Storage Manager, Remote Access Manager), Provenance Service (Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver), Solution Delivery Services, and storages (Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy).]
Event Management Service (1)
The problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: an Annotator emitting events at times 1, 2, 3 into each storage option.]
Event Management Service (3): Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ.
[Figure: arrival rate over time, plotted against the average arrival rate and the average database service rate; bursts above the service rate fall into the instability region.]
3. Hybrid storage model (stream buffer + relational-XML DB): the Annotator writes to a stream buffer, which feeds the relational-XML DB
How do we design the stream buffer?
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
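The assumptions above (Poisson bursts, λ < μ on average) explain why a buffer in front of the database helps: bursts are absorbed and then drained at the service rate. This is a minimal discrete-time sketch, assuming the database accepts at most `mu` events per tick and the buffer is unbounded. Knuth's multiplication method generates the Poisson arrivals. The rates λ = 8 and μ = 10 (ρ = 0.8) are hypothetical.

```python
import math
import random
from typing import Iterable, List


def poisson_sample(rng: random.Random, lam: float) -> int:
    """One Poisson(lam) draw via Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def buffer_backlog(arrivals: Iterable[int], mu: int) -> List[int]:
    """Events waiting in the stream buffer after each tick, when the database
    is fed at most `mu` events per tick (the throttled service rate)."""
    backlog, history = 0, []
    for a in arrivals:
        backlog = max(0, backlog + a - mu)
        history.append(backlog)
    return history


if __name__ == "__main__":
    lam, mu = 8.0, 10                     # hypothetical rates: rho = 0.8
    rng = random.Random(42)
    arrivals = [poisson_sample(rng, lam) for _ in range(1000)]
    backlog = buffer_backlog(arrivals, mu)
    print(f"rho = {lam / mu:.2f}, peak backlog = {max(backlog)}, final = {backlog[-1]}")
```

With ρ < 1 the backlog repeatedly returns to zero between bursts. If ρ > 1, `buffer_backlog` grows without bound, which is the instability region the slide warns about.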
Event Management Service (4): The Stream Buffer
Approach
– Keep the relational-XML database in the design, for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
[Figure: storage requests (events of type "eventStore") from the PEs of the analysis graph (PE1–PE5, SPE) pass through a traffic monitor (traffic models, traffic predictor, event store load monitor) backed by the System S STG (file-system store); a database PE writes them to the event store (relational-XML), while other output flows towards the delivery system.]
Century Provenance Service
[Figure: the Century server infrastructure diagram, repeated from the Century Event Management Service slide.]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:
  Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction.
  Recommendation: The Century system has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
[Figure: sensors feed PE1 (algorithm f1), PE2 (algorithm f3), and PEj, alongside a provenance server/store. Process-provenance reports: PE1: "I ran algorithm f1 with parameters u, v, and w at time t0. I published on port 1." PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1." Data-provenance annotation: PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
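The two approaches can be contrasted in a few lines. This is a minimal sketch in which all names and values are hypothetical. Process provenance is a log of component-level reports ("I ran f1 ... at time t0"). Data provenance attaches dependency metadata to each element and walks it back to the sensor data. That per-element metadata is the cost that grows with stream rate.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ProcessRecord:
    """Process provenance: a PE reports what it ran, with which parameters, when."""
    pe: str
    algorithm: str
    params: Dict[str, float]
    time: float
    port: str


@dataclass
class DataElement:
    """Data provenance: each element carries the IDs of the elements it depends on."""
    elem_id: str
    value: float
    derived_from: Tuple[str, ...] = ()


def lineage(elem_id: str, store: Dict[str, DataElement]) -> List[str]:
    """Follow derived_from links back to raw sensor elements (no dependencies)."""
    elem = store[elem_id]
    if not elem.derived_from:
        return [elem_id]
    roots: List[str] = []
    for parent in elem.derived_from:
        roots.extend(lineage(parent, store))
    return roots


if __name__ == "__main__":
    process_log = [
        ProcessRecord("PE1", "f1", {"u": 1.0, "v": 2.0, "w": 3.0}, time=0.0, port="port1"),
        ProcessRecord("PE2", "f3", {}, time=1.0, port="port2"),
    ]
    store = {
        "Data1": DataElement("Data1", 71.0),
        "Data2": DataElement("Data2", 0.97),
        "DataOut4": DataElement("DataOut4", 1.0, derived_from=("Data1", "Data2")),
    }
    print([r.pe for r in process_log if r.time <= 1.0])  # components on the path
    print(lineage("DataOut4", store))                     # data elements behind the output
```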
Provenance (2): Prior Work on Provenance
[Figure: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates.]
EU Provenance (PASOA, PreServ): captures workflow bindings
Scientific SOA workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level, automatic metadata collection at the stream level
File systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
Medical stream provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data means that "per-record" annotation is very heavyweight
Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PE i is a function of specific samples (from input streams) within a designated interval Δ:

$$S_i^{out}(t) \Leftarrow \bigcup_{j \,:\, S_j \text{ is input to the component}} \left\{ S_j^{inp}(\tau) \;:\; \tau \in [\,t-\Delta,\ t\,] \right\}$$

[Figure: a processing element with input streams Sj (Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and Sk (Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (Si(t), Si(t-1), Si(t-2)).]
TVC dependency rule blocks (PE state defined by location context), where $S_j(a, b)$ denotes the samples of $S_j$ with timestamps in $[a, b]$:
PE state | Dependency rule | Rule block ID
"gym" | $E_i(t) \Leftarrow S_j(t) \cup S_j(t-3, t-2) \cup S_k(t-2, t-1)$ | 1
"home" | $E_i(t) \Leftarrow S_j(t-2, t) \cup S_k(t-1, t)$ | 2
The dependency equation can, however, vary based on
– The PE's internal state or some external context, e.g.:
  IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
  ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– The solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
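The state-dependent ALERT rule, and the way its dependency window can be recovered later from stored state alone, can be sketched as follows. This is a minimal sketch, assuming HR samples keyed by integer timestamp and reading HR(t-Δ, t) as the closed interval [t-Δ, t]. The table layout and names are hypothetical. The rule-block ids, windows, and thresholds follow the IF/ELSE rule above. Only the state and rule-block id travel with each alert, never the inputs.

```python
from typing import Dict, Optional, Tuple

# Rule blocks keyed by PE state: (rule block id, window length Δ, HR threshold).
RULE_BLOCKS: Dict[str, Tuple[int, int, float]] = {
    "gym": (1, 10, 140.0),
    "default": (2, 30, 90.0),
}


def alert_pe(hr: Dict[int, float], t: int, loc: str) -> Optional[dict]:
    """ALERT iff AVG(HR over [t - Δ, t]) exceeds the state's threshold."""
    block_id, delta, threshold = RULE_BLOCKS.get(loc, RULE_BLOCKS["default"])
    window = [hr[s] for s in range(t - delta, t + 1) if s in hr]
    if window and sum(window) / len(window) > threshold:
        # Only the state and rule block travel with the output, not the inputs.
        return {"ts": t, "state": loc, "rule_block": block_id}
    return None


def causative_window(alert: dict) -> Tuple[int, int]:
    """Re-derive the input interval from the stored state (TVC-style)."""
    _, delta, _ = RULE_BLOCKS.get(alert["state"], RULE_BLOCKS["default"])
    return alert["ts"] - delta, alert["ts"]


if __name__ == "__main__":
    hr = {s: 150.0 for s in range(41)}   # hypothetical sustained 150 bpm
    alert = alert_pe(hr, 40, "gym")
    print(alert, causative_window(alert))
```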
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Trade-off: faster provenance storage and processing in exchange for higher provenance reconstruction overhead
Resolution steps
– Determine the IDs of input ports and the causative stream bindings
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
– Apply the TVC filter to obtain the causative set of elements
[Figure: Provenance Query Server. The Stream and Port Mapping (SAPM) component (1) stores bindings between every input/output port and its associated streams, (2) stores creation/modification times of PEs, and (3) stores the TVC function f() for every output port/input port pair. Example PC Dependency Table entry: PEj, f1(), Po1. Example Dynamic Stream Mapping Table: t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6. Example Data Store entries: element45 (ts=15, SID=12, state="gym"); element66 (ts=16, SID=17, state="home"). Query(Ei): 1. obtain I/O stream mappings; 2. retrieve the dependency equation; 3. retrieve elements; 4. apply the equation to determine the set of causative elements; return the set of elements.]
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
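The four query steps can be sketched against three in-memory tables: port bindings recorded only when they change, a TVC function per output port, and a data store. This is a minimal sketch, not the Century implementation. The class, the table layouts, and the example streams and timestamps are hypothetical. Each candidate element is checked against the binding in force at that element's own timestamp, so a rebinding inside the window is handled correctly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Window = Tuple[int, int]                                # inclusive (start, end)
DependencyFn = Callable[[int, str], Dict[str, Window]]  # (ts, state) -> {input port: window}


@dataclass(frozen=True)
class Element:
    eid: str
    ts: int
    sid: str      # stream id
    state: str


class ProvenanceQueryServer:
    def __init__(self) -> None:
        self.bindings: Dict[str, List[Tuple[int, str]]] = {}  # port -> [(bind time, stream id)]
        self.dependencies: Dict[str, DependencyFn] = {}        # output port -> TVC function
        self.store: List[Element] = []

    def bind(self, port: str, t: int, sid: str) -> None:
        """Record a stream binding only when it changes (reactive capture)."""
        self.bindings.setdefault(port, []).append((t, sid))

    def stream_at(self, port: str, t: int) -> Optional[str]:
        """Step 1: the stream bound to `port` at time t, if any."""
        bound = [(bt, sid) for bt, sid in self.bindings.get(port, []) if bt <= t]
        return max(bound)[1] if bound else None

    def query(self, out_port: str, ts: int, state: str) -> List[str]:
        windows = self.dependencies[out_port](ts, state)    # step 2: dependency equation
        causes = [e for port, (lo, hi) in windows.items()   # steps 3-4: retrieve + filter
                  for e in self.store
                  if lo <= e.ts <= hi and self.stream_at(port, e.ts) == e.sid]
        return [e.eid for e in sorted(causes, key=lambda e: e.ts)]


if __name__ == "__main__":
    pqs = ProvenanceQueryServer()
    pqs.bind("Pi1", 10, "S1")
    pqs.bind("Pi1", 16, "S17")                             # dynamic rebinding
    pqs.dependencies["Po1"] = lambda t, state: {"Pi1": (t - 3, t) if state == "gym" else (t - 1, t)}
    pqs.store += [Element("a15", 15, "S1", "gym"), Element("b16", 16, "S17", "gym"),
                  Element("x17", 17, "S5", "gym"), Element("b17", 17, "S17", "gym"),
                  Element("b18", 18, "S17", "gym")]
    print(pqs.query("Po1", 18, "gym"))
```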
Key Research Benefits of the TVC Model for Provenance
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1–2% overhead, compared to ~60–100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance and Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
TVC model overhead decreases when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
Other provenance systems cannot handle O(100 Kevents/sec) system throughput
Quality of Events in Century
[Figure: the Century server infrastructure diagram, repeated from the Century Event Management Service slide.]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect
– Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the quality of HR readings from the SpO2 sensor dips
[Figure: stream elements flowing through an analysis graph to an ALERT, annotated "What generated this alert, and can we trust it?". Callouts: "The sensor is faulty and generates stream elements of low quality" (with a "Replace sensor" action) and "Stream elements are missing (possibly due to network issues)".]
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2, and BP monitors
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history
Key questions
– How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
– How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
– How to develop a combination of hybrid model- and replay-based provenance?
– How to define context-dependent access-control models of provenance data in a multi-provider environment?
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission
The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality
IBM Research
© 2007 IBM Corporation
Information-based Medicine: The Remote Monitoring Roadmap
[Figure: roadmap from Traditional HC to Personalized Health Care, plotted against Evolutionary Practices and Revolutionary Technology. Stages: Non-specific (Treat Symptoms); Information Correlation; 1st Generation Diagnosis; Organized (Error Reduction); Personalized (Disease Prevention). Milestones: Episodic Treatment, Patient Reported Data, Electronic Health Records, Information Augmented, Chronic Disease Mgmt, Clinical Trial Data Collection, In-Pt Automated Vitals, Rules Based Clinical Response, Remote Monitoring, Pre-symptomatic Treatment, Lifetime Treatment. Enablers: Data and Systems Integration, Automated Systems, Throughput Analytics, CDI, Clinical Decision Support, Century. Source: Kathy Schweda.]
Research Background: From Today To 2015
Ubiquitous computing technologies, patient-centric networks, and healthcare analytics are the key factors in establishing the healthcare ecosystem.
[Figure: two segmentations of Patients and Care Delivery. Patient dimensions: health status (healthy, minor ailments, at risk, acutely ill, chronically ill, catastrophically ill) or age group (infants, adolescents, adult men, adult women, senior men, senior women); setting (rural, suburban, urban); socio-economic status (high, medium, low). Care delivery dimensions: access (in person, telephonic, electronic); location (home, outpatient setting, hospital/emergency department, long-term care, internet, call center); service (wellness/risk assessment, prevention, acute care, chronic care, complementary care; or risk assessment/prevention, acute diagnosis, acute treatment, chronic diagnosis, chronic treatment); provider (traditional providers, public/private insurers, alternate providers, midlevel provider, health infomediary); catchment area (local, regional, national, international).]
Ubiquitous Computing Technologies
– Emergence of sensor devices that generate more complex events at higher rates (e.g., ETRI's blood test sensor, SNU's non-intrusive ECG sensor, multichannel ECG)
– Low-power wireless communication (e.g., Bluetooth, ZigBee)
Patient-Centric Network (Interoperability)
– IHE, Continua
Healthcare Analytics Research
– Techniques to monitor the effects of drugs on patients
– Biomedical trend analysis
– Preventive analysis
IT Infrastructure
Remote Health Monitoring: The Opportunity
Long-term monitoring offers benefits
– Early disease detection and trend analysis for healthy and at-risk individuals
– Treatment and progress monitoring for patients
– Monitoring of participants in drug trials or experimental treatments, to gauge efficacy and side effects
– Reduced workload on doctors, nurses, and other healthcare providers
Enabled by rapid improvements in two key technologies
– Improvements in wireless communications (WiFi, 3G, Bluetooth)
– Continuing miniaturization of wireless sensors
[Figure labels: Server, Patient Diary, BT, Data.]
Chapter 2
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance, and QoE
Harmoni (Healthcare Adaptive Remote Monitoring): Overview
[Figure: sensors connect to the mobile hub over a PAN (Bluetooth); the hub sends data onward over a WAN (GPRS).]
Type of Sensor Device   Bits/sensor sample   Channels/device   Raw data rate (KB/day)
GPS                     1408                 1                 14850
SpO2                    3000                 1                 94922
EKG (cardiac)           12                   6                 194400
Accelerometer           64                   3                 202500
EEG (brain)             12                   12                388800
EMG (muscle)            12                   6                 777600
Need to reduce uplink transmission rates/power
HARMONI Novel Features
1. Context-Aware Event Filtering: IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr)/min
2. Personalization of Rules: apply machine learning at the server to learn "normal for user A at gym = 110-150"
3. Anticipation-Driven Transmission: based on the past, if Prob(802.11 within 2 hrs) > 0.7, cache data
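The raw data rates in the table follow from bits/sample × channels × sampling rate × 86,400 s/day. The slide does not list sampling rates; the rates below are assumptions, chosen because they reproduce every row when 1 KB = 1,024 bytes (the SpO2 rate of 3 packets/s of 3,000 bits matches the Nonin 4100 figures given later in this deck).

```python
SECONDS_PER_DAY = 86_400

# name: (bits per sample, channels, ASSUMED samples per second)
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG": (12, 12, 256),
    "EMG": (12, 6, 1024),
}


def raw_kb_per_day(bits, channels, rate_hz):
    """Uncompressed daily volume in KB, taking 1 KB = 1024 bytes."""
    bytes_per_day = bits * channels * rate_hz * SECONDS_PER_DAY / 8
    return bytes_per_day / 1024


for name, spec in SENSORS.items():
    print(f"{name:14s} {raw_kb_per_day(*spec):>10,.0f} KB/day")
```

Even the lightest sensor produces more than 14 MB/day uncompressed, which is why the slide stresses reducing uplink rates.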
HARMONI Architecture: Key Components on the Mobile Device
Lightweight Rule/Event Processing Engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
  • e.g., if 60 < AVG(last 10 heart rate values) < 90 then "transmit AVG to server"
– Processed events themselves act as predicates for new rules
Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent action
– The RM populates or modifies the rules in the event engine
Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server
Anticipation Mechanism
– Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates
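The example rule above can be sketched as a small stateful filter. This is a hypothetical sketch, not the HARMONI engine (which compiles rules to DFAs): it assumes a tumbling window, so one average is emitted per full window, and an optional location context that enables or disables the rule.

```python
from collections import deque
from statistics import mean


class AvgRangeRule:
    """Hypothetical rule: when the average of the last `window` samples lies
    strictly between `low` and `high`, emit one ("transmit AVG", value) action."""

    def __init__(self, window, low, high, context=None):
        self.samples = deque(maxlen=window)
        self.low, self.high = low, high
        self.context = context  # None: the rule is active in every context

    def on_sample(self, hr, context=None):
        if self.context is not None and context != self.context:
            return None                  # rule inactive in this context
        self.samples.append(hr)
        if len(self.samples) < self.samples.maxlen:
            return None                  # window not yet full
        avg = mean(self.samples)
        if self.low < avg < self.high:
            self.samples.clear()         # tumbling window: one summary per window
            return ("transmit AVG", avg)
        return None                      # outside the range: no summary action


rule = AvgRangeRule(window=10, low=60, high=90)
actions = [rule.on_sample(hr) for hr in [72, 75, 71, 74, 78, 80, 76, 73, 70, 71]]
print(actions[-1])
```

Readings that fall outside the range would be handled by other rules (for example, forwarding raw samples); the slide does not specify that path.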
HARMONI Functional Architecture
[Figure: Sensor ↔ Mobile Device over a short-range wireless link; Mobile Device ↔ Remote Server over a wireless link to the Internet. Components shown: Data Collection/Data Adapters, Light-Weight Pattern Recognition Engine, Action Triggering Mechanism, User Interface, Intelligent Data Transmission, Anticipation Mechanism, Rule Manager, Data Processing (inputs: context, sensor readings, sensor control, device resources); DB, Pattern Recognition Engine, Pattern Learning Engine, TAPAS Rule Server, External Context Sources, External Rule Specifications, External Action Specifications, External Action Triggering Mechanism.]
Simple recognition of pre-specified temporal patterns across sensor streams
– Event engine driven by Deterministic Finite Automata (DFAs)
– Actions associated with rules include data transmission, data transformation, and local triggers (alarms, reminders)
Compressed transmission to the server when the interface is available AND "anticipation" indicates "OK-to-Send"
– Use of intelligent compression schemes to reduce the volume of traffic
Store-and-forward otherwise
– Data stored in a memory buffer and on FLASH storage
Rule Manager coordinates with the backend server to download currently active or applicable rules
– Can run, activate, or deactivate multiple rules simultaneously and dynamically
– Dynamic downloading of rules provides implicit "context awareness" at the client
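The send-or-cache decision can be sketched as follows, assuming the anticipation mechanism exposes a probability that 802.11 connectivity will appear soon (the 0.7 threshold comes from the Harmoni overview slide). The class is hypothetical; zlib's DEFLATE stands in for HARMONI's compression scheme, and an in-memory list stands in for the RAM/FLASH store-and-forward buffer.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class AnticipatoryTransmitter:
    threshold: float = 0.7                      # cache if P(802.11 soon) exceeds this
    buffer: list = field(default_factory=list)  # store-and-forward queue
    sent: list = field(default_factory=list)    # (channel, compressed batch) pairs

    def _flush(self, channel):
        batch = zlib.compress("\n".join(self.buffer).encode())
        self.sent.append((channel, batch))
        self.buffer.clear()
        return channel

    def submit(self, record, wifi_up, gprs_up, p_wifi_soon):
        """Queue a record, then send everything if the anticipation logic says OK-to-Send."""
        self.buffer.append(record)
        if wifi_up:
            return self._flush("802.11")        # cheap interface available now
        if gprs_up and p_wifi_soon <= self.threshold:
            return self._flush("GPRS")          # Wi-Fi unlikely soon: use cellular
        return None                             # keep the data cached


tx = AnticipatoryTransmitter()
print(tx.submit("hr=72", wifi_up=False, gprs_up=True, p_wifi_soon=0.9))  # None: cached
print(tx.submit("hr=75", wifi_up=False, gprs_up=True, p_wifi_soon=0.2))  # GPRS: batch flushed
```

Batching cached records into one compressed message is a design choice here; it also lets the compressor exploit redundancy across readings.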
Pattern Recognition Engine Implementation
Rules are specified using EventScript, which is similar to regular expressions
A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA)
We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications
Effectively, we are running finite state machines with some tweaks
– Unique memory heap for each DFA, accessible across states
– Run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and on state transitions in the DFA
[Figure: EventScript is compiled on the desktop into a DFA; the DFA is held by the Rule Server on the remote server and downloaded by the Rule Manager into the Event Engine on the Nokia 770.]
Context-Based Filtering
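A run-time of this kind can be sketched as a table-driven DFA with a private heap and callbacks on initialization and on transitions. This minimal sketch assumes Python callables in place of the LISP/Scheme snippets and a hypothetical alphabet ("H" for a heart-rate sample above 120 bpm, "L" otherwise). It is not the EventScript compiler's output format.

```python
class DFARuntime:
    """Executes one compiled DFA over a stream of input symbols."""

    def __init__(self, transitions, start, accepting, on_init=None, on_transition=None):
        self.transitions = transitions       # {(state, symbol): next_state}
        self.start = start
        self.state = start
        self.accepting = set(accepting)
        self.heap = {}                       # per-DFA memory, persists across states
        self.on_transition = on_transition   # stands in for the embedded interpreter
        if on_init:
            on_init(self.heap)

    def feed(self, symbol):
        """Consume one symbol; return True when an accepting state is reached."""
        # Simplification: a transition missing from the table falls back to the start state.
        nxt = self.transitions.get((self.state, symbol), self.start)
        if self.on_transition:
            self.on_transition(self.heap, self.state, symbol, nxt)
        self.state = nxt
        if nxt in self.accepting:
            self.state = self.start          # restart matching after each match
            return True
        return False


# Pattern: three consecutive high heart-rate samples
dfa = DFARuntime(
    transitions={(0, "H"): 1, (1, "H"): 2, (2, "H"): 3},
    start=0,
    accepting={3},
    on_init=lambda heap: heap.update(matches=0),
    on_transition=lambda heap, s, sym, nxt: heap.update(matches=heap["matches"] + (nxt == 3)),
)
symbols = ["H" if hr > 120 else "L" for hr in [130, 125, 90, 140, 150, 160]]
print([dfa.feed(s) for s in symbols], dfa.heap)
```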
HARMONI Implementation Platform
Nokia 770 Internet Tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and a development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports the Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis Accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome, M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: data generated from the sensor; heart rate (bpm) per sample over about 10,000 samples.]
[Figure: data transmitted over the network (in KB) for each context assumed (None, Office, Gym, Office/Gym), uncompressed and with LZ-78 compression.]
Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
Filtering (S3) results in an 85% reduction in bandwidth consumption
The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in LZ compression
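The last observation can be illustrated with a textbook LZ-78 encoder: it only reuses phrases that repeat exactly, so a stream of identical readings compresses far better than readings whose decimals differ. This is an illustrative sketch, assuming readings are encoded as comma-separated text; the function names are hypothetical and this is not HARMONI's compressor.

```python
import random


def lz78_encode(text):
    """Textbook LZ-78: emit (index of longest known prefix, next character) pairs."""
    dictionary = {"": 0}
    phrase, output = "", []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                     # keep extending a phrase seen before
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                               # flush a trailing, already-known phrase
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs):
    entries, parts = [""], []
    for index, ch in pairs:
        entry = entries[index] + ch
        parts.append(entry)
        entries.append(entry)
    return "".join(parts)


rng = random.Random(42)
steady = ",".join(["72.0"] * 50)                                   # identical readings
noisy = ",".join(f"{rng.uniform(70, 75):.2f}" for _ in range(50))  # readings with varying decimals
print(len(steady), len(lz78_encode(steady)))
print(len(noisy), len(lz78_encode(noisy)))
```

Quantizing readings before compression (for example, to whole bpm) would create more exact repeats; whether HARMONI did so is not stated.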
Result 2: Impact of Personalization on Event Filtering
[Figure: variation in heart rate readings (beats per minute) over time for different individuals.]
[Figure: data transmitted over the network (in KB) for User 2-1 and User 2-2: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering.]
Sensor streams exhibit significant statistical variation across individuals
The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
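The slide does not say which learning method the server uses. As a stand-in, this hypothetical sketch learns each user's "normal" band from the 5th and 95th percentiles of that user's history and forwards only readings outside it; percentile bands and the sample histories are assumptions, not HARMONI's model.

```python
from statistics import quantiles


def learn_normal_band(history, lower_pct=5, upper_pct=95):
    """Per-user 'normal' band: the [lower_pct, upper_pct] percentiles of past readings."""
    cuts = quantiles(history, n=100)         # cuts[k - 1] approximates the k-th percentile
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def should_transmit(hr, band):
    """Send a reading in full only when it falls outside the user's learned band."""
    low, high = band
    return not (low <= hr <= high)


user_a = list(range(110, 151))   # hypothetical gym history for one user (bpm)
user_b = list(range(60, 91))     # hypothetical history for a different user (bpm)
band_a, band_b = learn_normal_band(user_a), learn_normal_band(user_b)
print(band_a, band_b)
print(should_transmit(130, band_a), should_transmit(130, band_b))
```

The same reading (130 bpm) is unremarkable for one user and worth transmitting for the other, which is the point of personalizing the rules.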
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Progressively higher 'normal' thresholds for the 3 distinct states (0-70, 70-110, 110-170), across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients
Bandwidth savings between 26% and 73% for our sample population
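Combining activity classification with state-specific thresholds can be sketched as a lookup from activity state to a "normal" band. This sketch assumes the three ranges on the slide are heart-rate bands in bpm; the accelerometer amplitude cut-offs (in g) are hypothetical values chosen only for illustration.

```python
# Assumed: the slide's ranges are heart-rate bands (bpm) for each activity state
HR_NORMAL_BY_STATE = {"sitting": (0, 70), "walking": (70, 110), "running": (110, 170)}

# Hypothetical accelerometer amplitude cut-offs in g: (upper bound, state)
AMPLITUDE_CUTOFFS = [(0.3, "sitting"), (1.5, "walking")]


def classify_activity(amplitude):
    """Map accelerometer amplitude to one of three coarse activity states."""
    for upper, state in AMPLITUDE_CUTOFFS:
        if amplitude < upper:
            return state
    return "running"


def is_abnormal(hr, amplitude):
    """A reading is abnormal if it lies outside the band for the current activity."""
    low, high = HR_NORMAL_BY_STATE[classify_activity(amplitude)]
    return not (low <= hr <= high)


for hr, amp in [(100, 0.1), (100, 1.0), (150, 2.0)]:
    print(hr, classify_activity(amp), is_abnormal(hr, amp))
```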
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance, and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytic infrastructure that supports the various services of the healthcare ecosystem.
Challenges
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across the ecosystem
– Integrity of data & tracking of the analytic data
– Timely feedback
– Large number of concurrent users
– Protection of personal medical data
Architectural Goals
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking & auditing
– Real-time processing
– Scalable device and user management
– Privacy and security
Technologies & Approaches
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform
Analysis Framework (1): The Approach
Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for a specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials
New Requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures
  • Exacerbated in remote monitoring settings
Easy to Use
– Separate medical domain knowledge from IT knowledge
Our Innovations
Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., a patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2, temperature
[Figure: analysis jobs drawn as graphs of analysis components (PEs) fed by data sources through source PEs.]
Major Practical Benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
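The PE programming model can be illustrated with a toy broker: PEs subscribe to named streams, and the developer writes only each PE's `process` logic. This is a hypothetical sketch of pub-sub stream routing, not the System S API; the stream names and the use of plain dicts as SDOs are assumptions.

```python
from collections import defaultdict


class StreamBroker:
    """Routes stream data objects (SDOs) to every PE subscribed to a stream."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, stream, pe):
        self.subscribers[stream].append(pe)

    def publish(self, stream, sdo):
        for pe in self.subscribers[stream]:
            for out_stream, out_sdo in pe.process(sdo):
                self.publish(out_stream, out_sdo)  # outputs flow on to downstream PEs


class ThresholdPE:
    """Atomic transformer: annotates readings above a limit and republishes them."""

    def __init__(self, out_stream, limit):
        self.out_stream, self.limit = out_stream, limit

    def process(self, sdo):
        if sdo["value"] > self.limit:
            yield self.out_stream, {**sdo, "alert": True}


class CollectorPE:
    """Sink PE that simply records what it receives."""

    def __init__(self):
        self.received = []

    def process(self, sdo):
        self.received.append(sdo)
        return []


broker = StreamBroker()
sink = CollectorPE()
broker.subscribe("hr", ThresholdPE("hr.alerts", limit=120))
broker.subscribe("hr.alerts", sink)
for value in [80, 130, 95, 140]:
    broker.publish("hr", {"patient": "p1", "value": value})
print(sink.received)
```

Neither PE knows who produces or consumes its data; the broker holds all wiring, which is the separation of concerns the slide describes.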
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• System S routes each SDO to the appropriate PE when it is received
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs
[Figure: Input ECG → AR PE (Normalization, Fast Fourier Transform, Neural Network trained during initialization) → Alert Generator → Output Alert.]
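The stages inside the example PE can be sketched in plain Python: z-score normalization, then a discrete Fourier transform to find the dominant frequency. Here the neural network is replaced by a hypothetical frequency threshold, and the sampling rate and threshold are illustrative assumptions, so this shows only the shape of the pipeline.

```python
import cmath
import math
from statistics import mean, pstdev


def normalize(samples):
    """Z-score normalization (zero mean, unit variance)."""
    mu, sigma = mean(samples), pstdev(samples)
    return [(x - mu) / sigma for x in samples] if sigma else [0.0] * len(samples)


def dft_magnitudes(x):
    """Magnitude spectrum via a direct O(n^2) DFT (an FFT computes the same values faster)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def dominant_frequency(samples, fs):
    """Frequency (Hz) of the strongest non-DC component."""
    mags = dft_magnitudes(normalize(samples))
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return k * fs / len(samples)


def alert_generator(samples, fs, max_hz):
    """Stand-in for the trained classifier: alert when the dominant rhythm is too fast."""
    return dominant_frequency(samples, fs) > max_hz


fs = 32  # assumed sampling rate (Hz)
signal = [math.sin(2 * math.pi * 3 * t / fs) for t in range(fs)]  # 3 Hz test tone, 1 s long
print(dominant_frequency(signal, fs), alert_generator(signal, fs, max_hz=2.0))
```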
Century Event Management Service
[Figure: Century server infrastructure diagram. Components shown: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC), Portal Service, Patient Data Manager, Device Data Manager, Service Data Management, User Group Data Manager, Job/Application Manager, Event Preprocessor, Event Filter/Annotator, Event Storage Manager, Event Store Query Service, Event Management Service, Provenance Service, Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver, Provenance Query Service, Analysis Framework, Analysis Applications, QoE Manager, State Manager, Platform Service, Subscription Service, Authentication & Authorization, Privacy Manager, Group Management Service, Group Manager, Device Catalogue, Remote Access Manager, Interoperability Container, IHE Adapter, Solution Delivery Services; storages: Event Store, State Data, Provenance Metadata, Stakeholder Info, Patient Info, Device Info, Application Info, Group Data, Privacy Policy.]
Event Management Service (1)
The Problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: for each approach, an Annotator emits events (as XML in approach 2) at times 1, 2, 3 into the database.]
Event Management Service (3): Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, let λ be the arrival rate at the database, and let ρ = λ/μ.
[Figure: traffic arrival rate over time, marking the average arrival rate, the average database service rate, and the instability region.]
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)
– Stream Buffer: how do we design the stream buffer?
[Figure: an Annotator emits XML events at times 1, 2, 3 through a stream buffer into the database.]
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
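The stability condition behind the instability region can be made concrete. If, in addition to Poisson arrivals, service times are assumed exponential (an M/M/1 queue, an assumption the slide does not make), standard queueing results give the mean number of events in the database system, L, and the mean time an event spends there, W:

```latex
\rho = \frac{\lambda}{\mu} < 1 \quad \text{(required for stability)}, \qquad
L = \frac{\rho}{1 - \rho}, \qquad
W = \frac{1}{\mu - \lambda}
```

For example, with μ = 1000 events/s and λ = 900 events/s, ρ = 0.9, L = 9, and W = 10 ms. A sustained rate of λ = 990 events/s gives L = 99 and W = 100 ms, and for λ ≥ μ the queue grows without bound, which is why bursts must be absorbed before they reach the database.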
Event Management Service (4): The Stream Buffer
Approach
– Keep the relational-XML database in the design, for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
[Figure: PEs (PE1-PE5, SPE) in the analysis graph send storage requests (events of type "eventStore") to a Database PE in front of the Event Store (relational-XML DB). Other components: System S STG (file system store), Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor; an arrow leads towards the delivery system.]
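Throttling can be sketched as a buffer that forwards at most a fixed budget of events per tick, set below the database's capacity so that its load stays under 1. This sketch assumes discrete ticks, a hypothetical `headroom` factor, and an in-memory deque standing in for the System S file-system store; it does not model the traffic predictor.

```python
from collections import deque


def throttle(arrivals_per_tick, db_capacity_per_tick, headroom=0.8):
    """Buffer bursty storage requests and release at most a fixed budget per tick.

    Returns (events written to the DB per tick, backlog left in the buffer per tick).
    """
    budget = int(db_capacity_per_tick * headroom)      # keep DB load below capacity
    buffer = deque()
    written, backlog = [], []
    for tick, count in enumerate(arrivals_per_tick):
        buffer.extend((tick, i) for i in range(count))  # enqueue this tick's requests
        n = min(budget, len(buffer))
        for _ in range(n):
            buffer.popleft()                            # FIFO: oldest requests first
        written.append(n)
        backlog.append(len(buffer))
    return written, backlog


written, backlog = throttle([5, 30, 0, 0, 5], db_capacity_per_tick=10)
print(written, backlog)
```

The burst of 30 events is spread over the following ticks instead of hitting the database at once; a traffic predictor could adjust `headroom` dynamically.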
Century Provenance Service
[Figure: Century server infrastructure diagram, showing the same components as on the Century Event Management Service slide.]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly. The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
[Figure, process provenance: PE1 and PE2 (running algorithms f1, f2, f3) report to a Provenance Server/Store, e.g., "I ran algorithm f1 with parameters u, v, and w at time t0. I published on port 1." and "I received some data on port 2. I ran algorithm f3 on the data at time t1."]
[Figure, data provenance: sensors send Data1 and Data2 to PEj, which outputs DataOut4: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
Provenance (2): Prior Work on Provenance
[Figure: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) and by event and processing rates.]
– EU Provenance (PASOA, PreServ): captures workflow bindings
– Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
– Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
– Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level
– File Systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
– Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage Efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing Efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic Stream Bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that
– Is reactive: generates provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PE i is a function of specific samples (from its input streams) within a designated interval Δ:
$$\mathrm{out}_i(t) \Leftarrow \bigcup_{j \in \mathrm{inps}(i)} \{\, S_j(t') : t' \in [t - \Delta_j,\ t] \,\}$$
[Figure: a Processing Element with input streams S_j (S_j(t), S_j(t-1), S_j(t-2), S_j(t-3)) and S_k (S_k(t) … S_k(t-3)) producing output stream S_i (S_i(t), S_i(t-1), S_i(t-2)). TVC dependency rule blocks, keyed by PE state (defined by location context):
  Rule block 1, state "gym": E_i(t) ⇐ S_j(t) ∪ S_j(t-3, t-2) ∪ S_k(t-2, t-1)
  Rule block 2, state "home": E_i(t) ⇐ S_j(t-2, t) ∪ S_k(t-1, t)]
The dependency equation can, however, vary based on
– The PE's internal state or some external context:
  IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
  ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– The solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
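The two rule blocks can be written down as data and evaluated to find the causative elements of an output at time t. This sketch assumes integer timestamps and inclusive interval bounds (so S_j(t-3, t-2) means timestamps t-3 through t-2); the table layout and function names are hypothetical.

```python
# Rule blocks keyed by PE state: (input stream, start offset, end offset), relative to t
TVC_RULE_BLOCKS = {
    "gym": [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],  # block 1
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],                 # block 2
}


def causative_windows(t, state, rules=TVC_RULE_BLOCKS):
    """Instantiate the dependency equation for an output produced at time t in `state`."""
    return [(stream, t + lo, t + hi) for stream, lo, hi in rules[state]]


def causative_elements(t, state, elements, rules=TVC_RULE_BLOCKS):
    """Filter stored (stream, timestamp) elements down to those the output depends on."""
    windows = causative_windows(t, state, rules)
    return sorted({(s, ts) for s, ts in elements
                   if any(s == ws and lo <= ts <= hi for ws, lo, hi in windows)})


stored = [(s, ts) for s in ("Sj", "Sk") for ts in range(10, 21)]
print(causative_elements(20, "home", stored))
print(causative_elements(20, "gym", stored))
```

Only the small rule table is stored per output port; per-element cost is limited to a timestamp and the state tag, which is the storage saving the TVC model aims for.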
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage and processing in exchange for higher provenance reconstruction overhead
Steps
– Determine the IDs of input ports and the causative stream bindings
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
– Apply the TVC filter to obtain the causative set of elements
[Figure: the Stream and Port Mapping (SAPM) service (1) stores bindings between every input/output port and the associated streams, (2) stores creation/modification times of PEs, and (3) stores the TVC function f() for every output port/input port. Example tables: PC Dependency Table (PEj, f1(), Po1); Dynamic Stream Mapping Table (t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6); Data Store (element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home"). The Provenance Query Server answers Query(Ei) by (1) obtaining I/O stream mappings, (2) retrieving the dependency equation, (3) retrieving elements, and (4) applying the equation to determine the set of causative elements, then returns that set.]
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
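The four query steps can be sketched with three in-memory tables: time-stamped port-to-stream bindings (as in SAPM), per-state dependency windows for each output port, and a data store whose elements carry their state as metadata. Everything here is a hypothetical sketch; in particular, it resolves each input port's binding once, at the alert's timestamp, which is a simplification.

```python
from bisect import bisect_right


class ProvenanceQueryServer:
    def __init__(self):
        self.bindings = {}      # port -> sorted [(time, stream_id)]: when a stream was bound
        self.dependencies = {}  # output port -> {state: [(input port, start offset, end offset)]}
        self.elements = []      # data store: {"stream", "ts", "state"} dicts

    def bind(self, port, time, stream):
        self.bindings.setdefault(port, []).append((time, stream))
        self.bindings[port].sort()

    def stream_at(self, port, time):
        """Step 1: the stream bound to `port` at `time` (latest binding not after it)."""
        history = self.bindings.get(port, [])
        i = bisect_right([t for t, _ in history], time)
        return history[i - 1][1] if i else None

    def query(self, out_port, alert):
        t, state = alert["ts"], alert["state"]   # the PE state travels with the alert
        causes = []
        for in_port, lo, hi in self.dependencies[out_port][state]:  # step 2: equation
            stream = self.stream_at(in_port, t)                      # step 1: mapping
            causes += [e for e in self.elements                      # steps 3-4: retrieve, filter
                       if e["stream"] == stream and t + lo <= e["ts"] <= t + hi]
        return causes


qs = ProvenanceQueryServer()
qs.bind("Pi1", 10, "S1")
qs.bind("Pi1", 15, "S15")                        # the port is rebound at t = 15
qs.dependencies["Po1"] = {"home": [("Pi1", -2, 0)]}
qs.elements = [{"stream": "S1", "ts": 14, "state": "home"},
               {"stream": "S15", "ts": 16, "state": "home"},
               {"stream": "S15", "ts": 17, "state": "home"}]
print(qs.query("Po1", {"ts": 17, "state": "home"}))
```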
Key Research Benefits of the TVC Model for Provenance
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
– Statefulness: time and value functions allow a compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
TVC model overhead decreases when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
Other provenance systems cannot handle O(100 Kevents/sec) system throughput
Quality of Events in Century
[Figure: Century server infrastructure diagram, showing the same components as on the Century Event Management Service slide.]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
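The rebinding example can be sketched as a selection rule: among the available heart-rate sources, use the cheapest one whose current QoE meets a required level, and fall back to the highest-quality source if none does. Costs, quality scores, and the threshold are hypothetical; the slide does not specify how Century scores or selects sources.

```python
from dataclasses import dataclass


@dataclass
class HRSource:
    name: str
    cost: float     # relative acquisition cost (hypothetical units)
    quality: float  # current QoE estimate in [0, 1]


def choose_hr_source(sources, min_quality):
    """Cheapest source meeting the quality bar; otherwise the best available (best effort)."""
    eligible = [s for s in sources if s.quality >= min_quality]
    if not eligible:
        return max(sources, key=lambda s: s.quality)
    return min(eligible, key=lambda s: s.cost)


spo2 = HRSource("SpO2", cost=1.0, quality=0.9)
ecg = HRSource("ECG", cost=5.0, quality=0.95)
print(choose_hr_source([spo2, ecg], min_quality=0.8).name)  # SpO2 is good enough and cheaper
spo2.quality = 0.4                                           # SpO2 stream quality dips
print(choose_hr_source([spo2, ecg], min_quality=0.8).name)  # rebind to the ECG-derived HR
```

Re-running the selection whenever a source's QoE estimate changes gives the dynamic rebinding behavior; hysteresis would be needed in practice to avoid flapping between sources.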
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission
The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality
Ubiquitous computing technologies, patient-centric networks and healthcare analytics are the key factors in establishing the healthcare ecosystem
Research Background: From Today to 2015
[Figure: two panels describing the healthcare ecosystem along patient and care-delivery dimensions]
Panel 1 – Patients: health status (healthy, minor ailments, at risk, acutely ill, chronically ill, catastrophically ill); setting (rural, suburban, urban); socio-economic status (high, medium, low). Care delivery: access (in person, telephonic, electronic); location (home, outpatient setting, hospital/emergency department, long-term care, Internet, call center); service (wellness/risk assessment, prevention, acute care, chronic care, complementary care); provider (traditional providers, public/private insurers, alternate providers, midlevel provider, health infomediary); catchment area (local, regional, national, international)
Panel 2 – Patients: age group (infants, adolescent, adult men, adult women, senior men, senior women); setting (rural, suburban, urban); socio-economic status (high, medium, low). Care delivery: access (in person, telephonic, electronic); location (home, outpatient setting, hospital/emergency department, long-term care, Internet, call center); service (risk assessment, prevention, acute diagnosis, acute treatment, chronic diagnosis, chronic treatment); provider (traditional providers, public/private insurers, alternate providers, midlevel provider, health infomediary); catchment area (local, regional, national, international)
Ubiquitous Computing Technologies
– Emergence of sensor devices that generate more complex events at higher rates (e.g., ETRI's blood test sensor, SNU's non-intrusive ECG sensor, multichannel ECG)
– Low-power wireless communication (e.g., Bluetooth, ZigBee)
Patient-Centric Network (Interoperability)
– IHE, Continua
Healthcare Analytics Research
– Techniques to monitor the effects of drugs on patients
– Biomedical trend analysis
– Preventive analysis
IT Infrastructure
Remote Health Monitoring: The Opportunity
Long-term monitoring offers benefits
– Early disease detection and trend analysis for healthy and at-risk individuals
– Treatment and progress monitoring for patients
– Monitoring of participants in drug trials or experimental treatments, to gauge efficacy and side effects
– Reduced workload on doctors, nurses and other healthcare providers
Enabled by rapid improvements in two key technologies
– Improvements in wireless communications (WiFi, 3G, Bluetooth)
– Continuing miniaturization of wireless sensors
[Figure: sensor data sent over Bluetooth (BT) to a patient diary device and on to a server]
Chapter 2
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Harmoni (Healthcare Adaptive Remote Monitoring): Overview
[Figure: sensors connect to the mobile hub over a PAN (Bluetooth); the hub sends data to the server over a WAN (GPRS)]
Type of sensor device   Bits/sensor sample   Channels/device   Raw data rate (KB/day)
GPS                     1408                 1                 14,850
SpO2                    3000                 1                 94,922
EKG (cardiac)           12                   6                 194,400
Accelerometer           64                   3                 202,500
EEG (brain)             12                   12                388,800
EMG (muscle)            12                   6                 777,600
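The raw data rates in the table follow from bits/sample × channels × sampling rate. The slide does not state the sampling rates, so the rates below are assumptions: they are chosen because they reproduce every figure in the table when 1 KB = 1024 bytes (the 3 Hz SpO2 rate is consistent with the Nonin monitor's "three packets per second, 375 bytes each" on the implementation-platform slide).

```python
SECONDS_PER_DAY = 86_400

# name: (bits per sample, channels, ASSUMED sampling rate in Hz)
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG": (12, 12, 256),
    "EMG": (12, 6, 1024),
}


def raw_kb_per_day(bits_per_sample: int, channels: int, rate_hz: float) -> float:
    """Raw (uncompressed) data volume per day, in KB of 1024 bytes."""
    bytes_per_second = bits_per_sample * channels * rate_hz / 8
    return bytes_per_second * SECONDS_PER_DAY / 1024


if __name__ == "__main__":
    for name, (bits, channels, hz) in SENSORS.items():
        print(f"{name:14s} {raw_kb_per_day(bits, channels, hz):>10,.0f} KB/day")
```

Even the lightest sensor here produces several megabytes per day, which is why the slide stresses reducing uplink transmission.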
HARMONI Novel Features
1. Context-Aware Event Filtering
– IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr)/min
2. Personalization of Rules
– Apply machine learning at the server to learn "normal" for user A (gym = 110–150)
3. Anticipation-Driven Transmission
– Based on past observations, if Prob(802.11 within 2 hrs) > 0.7, cache data
Need to reduce uplink transmission rates/power
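A minimal sketch of the context-aware filtering rule in feature 1, applied to one minute of heart-rate samples. The function name is hypothetical, and the "else" branch is an assumption: the slide only says what to send when the rule fires, so here every raw sample is forwarded otherwise.

```python
from statistics import mean
from typing import List, Tuple


def filter_minute(context: str, hr_minute: List[float]) -> Tuple[str, object]:
    """Apply the slide's example rule to one minute of heart-rate samples.

    Rule: IF user in 'gym' AND 90 < hr < 120 THEN send AVG(hr) for the minute.
    Assumption: when the rule does not fire, all raw samples are sent so that
    out-of-range readings reach the server unchanged.
    """
    in_band = all(90 < hr < 120 for hr in hr_minute)
    if context == "gym" and hr_minute and in_band:
        return ("avg", mean(hr_minute))  # one value replaces a whole minute
    return ("raw", list(hr_minute))


print(filter_minute("gym", [100, 110, 105]))
print(filter_minute("gym", [100, 130]))
```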
HARMONI Architecture: Key Components on the Mobile Device
Lightweight rule/event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
• e.g., if 60 < AVG(last 10 heart rate values) < 90, then "transmit AVG to server"
– Processed events themselves act as predicates for new rules
Rule Manager
– Coordinates with the server to determine the patterns of current interest and the consequent actions
– The RM populates or modifies the rules in the event engine
Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server
Anticipation Mechanism
– Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates
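The sketch below shows the event engine's two ideas: a sliding-window rule (the slide's 60 < AVG(last 10 HR values) < 90 example) and derived events that are re-published so they can serve as predicates for further rules. It is an illustration only. HARMONI compiles rules into DFAs, and the class names and the chained "hr_stable" rule here are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Callable, Deque, List, Optional, Tuple

Event = Tuple[str, float]  # (event type, value)


class WindowRule:
    """Fires when the average of the last `size` events of type `source`
    satisfies `predicate`; the fired event has type `emits`."""

    def __init__(self, source: str, size: int,
                 predicate: Callable[[float], bool], emits: str) -> None:
        self.source, self.emits, self.predicate = source, emits, predicate
        self.window: Deque[float] = deque(maxlen=size)

    def feed(self, event: Event) -> Optional[Event]:
        etype, value = event
        if etype != self.source:
            return None
        self.window.append(value)
        if len(self.window) == self.window.maxlen:
            avg = mean(self.window)
            if self.predicate(avg):
                return (self.emits, avg)
        return None


class EventEngine:
    def __init__(self, rules: List[WindowRule]) -> None:
        self.rules = rules
        self.derived: List[Event] = []  # events produced by rules (e.g. to transmit)

    def publish(self, event: Event) -> None:
        pending = deque([event])
        while pending:
            current = pending.popleft()
            for rule in self.rules:
                fired = rule.feed(current)
                if fired is not None:
                    self.derived.append(fired)
                    pending.append(fired)  # derived events feed other rules


engine = EventEngine([
    # Slide example: if 60 < AVG(last 10 heart rate values) < 90, transmit AVG.
    WindowRule("hr", 10, lambda avg: 60 < avg < 90, "transmit_avg"),
    # Hypothetical chained rule over the last 3 transmit_avg events.
    WindowRule("transmit_avg", 3, lambda avg: True, "hr_stable"),
])
for _ in range(12):
    engine.publish(("hr", 75))
print(engine.derived)
```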
HARMONI Functional Architecture
[Figure: sensor → short-range wireless link → mobile device → wireless link to the Internet → remote server. Mobile device: data collection/data adapters, light-weight pattern recognition engine, action triggering mechanism, user interface, data processing, intelligent data transmission, anticipation mechanism and rule manager, with sensor readings, context and device resources as inputs and sensor control as an output. Remote server: DB, pattern recognition engine, pattern learning engine and TAPAS rule server. External context sources, external rule specifications, external action specifications and an external action triggering mechanism are also shown.]
Simple recognition of pre-specified temporal patterns across sensor streams
Event engine driven by Deterministic Finite Automata (DFAs)
Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders)
Compressed transmission to the server when the interface is available AND "anticipation" indicates "OK-to-Send"
– Use of intelligent compression schemes to reduce the volume of traffic
Store-and-forward otherwise
– Data stored in a memory buffer and on FLASH storage
Rule Manager coordinates with the backend server to download currently active or applicable rules
– Can run, activate or deactivate multiple rules simultaneously and dynamically
– Dynamic downloading of rules provides implicit "context awareness" at the client
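A minimal sketch of the transmit-or-store decision described above. Several parts are assumptions: zlib stands in for HARMONI's unspecified "intelligent compression", an in-memory deque stands in for the memory/FLASH buffer, and the anticipation policy encodes the overview slide's "if Prob(802.11 within 2 hrs) > 0.7, cache data" rule, with the added guess that data is always sent once 802.11 is available. All names are hypothetical.

```python
import zlib
from collections import deque
from typing import Callable, Deque


def anticipation_ok_to_send(on_wifi: bool, p_wifi_within_2h: float,
                            threshold: float = 0.7) -> bool:
    """Slide rule: if Prob(802.11 within 2 hrs) > 0.7, cache the data.
    Assumption: once on 802.11 we always send; on the cellular link we send
    only when 802.11 is unlikely to appear soon."""
    return on_wifi or p_wifi_within_2h <= threshold


class StoreAndForward:
    """Buffers filtered records and sends them as one compressed batch when the
    uplink is available AND anticipation says OK-to-Send; otherwise stores."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self.send = send
        self.buffer: Deque[bytes] = deque()  # stands in for RAM + FLASH storage

    def enqueue(self, record: bytes) -> None:
        self.buffer.append(record)

    def try_flush(self, link_up: bool, ok_to_send: bool) -> int:
        """Return the number of records sent (0 if they stay buffered)."""
        if not (link_up and ok_to_send) or not self.buffer:
            return 0
        self.send(zlib.compress(b"\n".join(self.buffer)))
        sent = len(self.buffer)
        self.buffer.clear()
        return sent


uplink: list = []
sf = StoreAndForward(uplink.append)
for reading in (b"hr=72", b"hr=74", b"hr=71"):
    sf.enqueue(reading)
print(sf.try_flush(link_up=True, ok_to_send=anticipation_ok_to_send(False, 0.9)))
print(sf.try_flush(link_up=True, ok_to_send=anticipation_ok_to_send(True, 0.9)))
```

Batching before compression is a deliberate choice in this sketch: compressors generally do better on one larger batch than on many small records.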
Pattern Recognition Engine Implementation
Rules are specified using EventScript, which is similar to regular expressions
A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA)
We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications
Effectively, we are running finite state machines with some tweaks
– A unique memory heap for each DFA, accessible across states
– A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA
[Figure: EventScript is compiled on a desktop into a DFA, which the rule server (on the remote server) passes to the rule manager and event engine on the Nokia 770]
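The run-time engine described above can be pictured as a table-driven DFA with a private heap and hooks on initialization and transitions. This is a sketch under assumptions: EventScript syntax is not shown on the slides, so the transition table is written by hand; Python callables replace the embedded Scheme interpreter; and the "three consecutive high readings" pattern and the 120 bpm cut-off are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Set, Tuple

Heap = Dict[str, Any]
Action = Callable[[Heap, Any], None]
Table = Dict[Tuple[str, str], Tuple[str, Optional[Action]]]


class DFA:
    """Table-driven DFA with a private memory heap shared by all its states."""

    def __init__(self, start: str, accepting: Set[str], table: Table,
                 on_init: Optional[Action] = None) -> None:
        self.start, self.accepting, self.table = start, accepting, table
        self.state = start
        self.heap: Heap = {}
        if on_init:
            on_init(self.heap, None)  # like the interpreter's initialization hook

    def step(self, symbol: str, payload: Any = None) -> bool:
        """Consume one input symbol; return True if the DFA is now accepting.
        Symbols with no transition reset the DFA to its start state."""
        next_state, action = self.table.get((self.state, symbol), (self.start, None))
        self.state = next_state
        if action:
            action(self.heap, payload)  # like a transition hook
        return self.state in self.accepting


def init(heap: Heap, _: Any) -> None:
    heap["alarms"] = 0


def count_alarm(heap: Heap, hr: Any) -> None:
    heap["alarms"] += 1
    heap["last_hr"] = hr


# Hypothetical pattern: three or more consecutive 'high' heart-rate events.
table: Table = {
    ("s0", "high"): ("s1", None),
    ("s1", "high"): ("s2", None),
    ("s2", "high"): ("alarm", count_alarm),
    ("alarm", "high"): ("alarm", count_alarm),
}
for s in ("s0", "s1", "s2", "alarm"):
    table[(s, "normal")] = ("s0", None)

dfa = DFA("s0", {"alarm"}, table, on_init=init)
for hr in (130, 132, 135, 140, 80):
    dfa.step("high" if hr > 120 else "normal", hr)  # 120 bpm cut-off is hypothetical
print(dfa.heap)
```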
HARMONI Implementation Platform
Nokia 770 Internet Tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64–128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and a development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports the Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome, M. Ebling and A. Misra, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: (left) data generated from the sensor, heart rate (bpm) vs. sample number; (right) data transmitted over the network (in KB) for each assumed context (None, Office, Gym, Office/Gym), uncompressed and with LZ-78 compression]
Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
Filtering (S3) results in an 85% reduction in bandwidth consumption
The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches for floating-point values in LZ compression
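The last observation, that dictionary compressors gain little on jittery floating-point values, can be illustrated with a textbook LZ78 encoder. This is a sketch, not HARMONI's compressor. It assumes readings are serialized as comma-separated text, and the two input series are synthetic.

```python
from typing import List, Tuple


def lz78_encode(text: str) -> List[Tuple[int, str]]:
    """LZ78: emit (index of the longest known prefix, next character) pairs."""
    dictionary = {"": 0}
    phrase = ""
    out: List[Tuple[int, str]] = []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a phrase we have already seen
        else:
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:  # flush a trailing phrase that is already in the dictionary
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out


def lz78_decode(pairs: List[Tuple[int, str]]) -> str:
    entries = [""]
    for index, ch in pairs:
        entries.append(entries[index] + ch)
    return "".join(entries[1:])


# Identical integer readings vs. jittery floating-point readings (synthetic data).
ints = ",".join(["72"] * 200)
floats = ",".join(f"{72 + (i * 37 % 100) / 100:.2f}" for i in range(200))
for name, text in (("ints", ints), ("floats", floats)):
    pairs = lz78_encode(text)
    print(name, len(text), "chars ->", len(pairs), "pairs")
```

The repeated integer series collapses into long dictionary phrases, while the varying decimal digits keep the float series from forming long exact matches.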
Result 2: Impact of Personalization on Event Filtering
[Figure: (left) variation in heart-rate readings for different individuals, heart rate (beats per minute) vs. time; (right) data transmitted over the network (in KB) for users 2-1 and 2-2: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering]
Sensor streams exhibit significant statistical variation across individuals
The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches for floating-point values in the LZ compressor
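Personalization means learning a per-user "normal" band at the server and filtering against it on the device. The slides do not say which learning method is used, so this sketch swaps in a deliberately simple one: empirical 5th/95th percentiles of a user's history. The function names and the two user histories are hypothetical. `statistics.quantiles` requires Python 3.8+.

```python
from statistics import quantiles
from typing import Dict, List, Tuple

Band = Tuple[float, float]


def learn_normal_band(history: List[float], low_pct: int = 5,
                      high_pct: int = 95) -> Band:
    """Per-user 'normal' band from past readings, using empirical percentiles."""
    cuts = quantiles(history, n=100)  # cuts[k - 1] is roughly the k-th percentile
    return cuts[low_pct - 1], cuts[high_pct - 1]


def personalized_filter(readings: List[float], band: Band) -> List[float]:
    """Keep (i.e. transmit) only readings outside the user's normal band."""
    low, high = band
    return [r for r in readings if not low <= r <= high]


# Hypothetical histories for two users with different typical heart rates.
bands: Dict[str, Band] = {
    "user_a": learn_normal_band([float(x) for x in range(60, 101)]),
    "user_b": learn_normal_band([float(x) for x in range(80, 121)]),
}
print(personalized_filter([55, 85, 118], bands["user_a"]))
print(personalized_filter([55, 85, 118], bands["user_b"]))
```

The same reading (118 bpm) is reported for one user and suppressed for the other, which is the effect personalization is meant to have.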
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' threshold for each of the 3 distinct states (0–70, 70–110, 110–170) across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients
Bandwidth savings of 26–73% for our sample population
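A sketch of sensor-derived context: classify the activity state from accelerometer amplitude, then check heart rate against that state's normal band. Two assumptions are made here. The slide's ranges (0–70, 70–110, 110–170) are taken to be heart-rate bands in bpm, and the accelerometer cut-offs are invented purely for illustration.

```python
# Normal bands per activity state from the slide, ASSUMED to be heart rate in bpm.
NORMAL_HR_BANDS = {
    "sitting": (0, 70),
    "walking": (70, 110),
    "running": (110, 170),
}

# Hypothetical accelerometer-amplitude cut-offs (in g); not given on the slide.
WALK_THRESHOLD_G = 0.2
RUN_THRESHOLD_G = 0.8


def classify_activity(amplitude_g: float) -> str:
    if amplitude_g < WALK_THRESHOLD_G:
        return "sitting"
    if amplitude_g < RUN_THRESHOLD_G:
        return "walking"
    return "running"


def should_transmit(amplitude_g: float, hr_bpm: float) -> bool:
    """Transmit only readings outside the normal band for the inferred state."""
    low, high = NORMAL_HR_BANDS[classify_activity(amplitude_g)]
    return not low <= hr_bpm <= high


print(should_transmit(0.05, 100))  # sitting with 100 bpm: outside the band
print(should_transmit(1.0, 150))   # running with 150 bpm: within the band
```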
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytics infrastructure for the healthcare ecosystem that supports its various services
Challenges → Architectural goals
– Integrating massive volumes of disparate data → Extensible data model, scalable infrastructure
– Need for sophisticated analytics → Open and scalable analysis framework
– Growing collaboration across the ecosystem → Enterprise sharing of sensor data
– Integrity of data & tracking the analytic data → Quality of Events (QoE) and provenance support for backtracking & auditing
– Timely feedback → Real-time processing
– Large number of concurrent users → Scalable device and user management
– Privacy and security → Protection of personal medical data
Technologies & approaches: Extensible Stream Storage System; Distributed Stream Analysis Runtime; Quality of Event; Hybrid Provenance System; Interoperability; Privacy & Security Technology; Stream Processing Platform
Analysis Framework (1): The Approach
Current solutions
Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for a specific diagnosis
– Non-extensible: support for a limited set of sensor devices
– Small patient footprint: only good for trials
New requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures
• Exacerbated in remote monitoring settings
Easy to use
– Separate medical domain knowledge from IT knowledge
Our innovations
Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., a patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2 and temperature
[Figure: an analysis job composed of several analysis components, each a graph of PEs fed by a data source through a source PE]
Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
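The PE programming model (atomic stream transformers wired together by pub-sub) can be sketched in a few lines. This is not the System S API. `StreamBus` and `PE` are hypothetical stand-ins that show how the developer supplies only the per-SDO logic while the runtime handles routing. The 100 bpm threshold is illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

SDO = Dict[str, object]  # a Stream Data Object, modeled as a plain dict


class StreamBus:
    """Topic-based pub-sub standing in for the stream runtime's routing."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[SDO], None]]] = defaultdict(list)

    def subscribe(self, stream: str, handler: Callable[[SDO], None]) -> None:
        self.subscribers[stream].append(handler)

    def publish(self, stream: str, sdo: SDO) -> None:
        for handler in self.subscribers[stream]:
            handler(sdo)


class PE:
    """An atomic stream transformer: the developer supplies only `logic`."""

    def __init__(self, bus: StreamBus, in_stream: str, out_stream: str,
                 logic: Callable[[SDO], Optional[SDO]]) -> None:
        self.bus, self.out_stream, self.logic = bus, out_stream, logic
        bus.subscribe(in_stream, self._on_sdo)

    def _on_sdo(self, sdo: SDO) -> None:
        result = self.logic(sdo)
        if result is not None:  # None means 'drop this SDO'
            self.bus.publish(self.out_stream, result)


bus = StreamBus()
alerts: List[SDO] = []
PE(bus, "hr", "hr.annotated", lambda s: {**s, "high": s["bpm"] > 100})
PE(bus, "hr.annotated", "alerts", lambda s: s if s["high"] else None)
bus.subscribe("alerts", alerts.append)
for bpm in (72, 128):
    bus.publish("hr", {"bpm": bpm})
print(alerts)
```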
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs
[Figure: the AR PE processes an input ECG stream through normalization, a fast Fourier transform, a neural network (trained during initialization) and an alert generator, producing an output alert]
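The example PE's stages (normalization, Fourier transform, classifier, alert generator) can be sketched with the standard library alone. This keeps the pipeline's shape but replaces two parts. A naive DFT stands in for a real FFT, and a simple spectral-energy threshold stands in for the trained neural network, which the slide does not describe. `ar_pe`, `cutoff_bin` and the 0.5 ratio are hypothetical.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import List, Optional


def normalize(xs: List[float]) -> List[float]:
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma if sigma else 0.0 for x in xs]


def dft_magnitudes(xs: List[float]) -> List[float]:
    """Naive O(n^2) DFT; a real PE would call an FFT library."""
    n = len(xs)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs)))
            for k in range(n // 2 + 1)]


def classify(mags: List[float], cutoff_bin: int, ratio: float = 0.5) -> bool:
    """Stand-in for the trained neural network: flag windows whose spectral
    energy at or above `cutoff_bin` exceeds `ratio` of the total."""
    total = sum(m * m for m in mags[1:])  # skip the DC bin
    high = sum(m * m for m in mags[cutoff_bin:])
    return total > 0 and high / total > ratio


def ar_pe(window: List[float], cutoff_bin: int = 8) -> Optional[str]:
    """Normalization -> Fourier transform -> classifier -> alert generator."""
    mags = dft_magnitudes(normalize(window))
    return "ALERT" if classify(mags, cutoff_bin) else None


n = 64
slow = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]   # energy in bin 2
fast = [math.sin(2 * math.pi * 20 * t / n) for t in range(n)]  # energy in bin 20
print(ar_pe(slow), ar_pe(fast))
```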
Century Event Management Service
[Figure: Century server infrastructure. Components shown include the USN gateway (data transformer, sensor data sender, sensor data reception), a toolkit (IDE for application developers, sensor simulator, event management client library), patient, administrator and stakeholder portals (for doctors, lifestyle consultants, healthcare IT specialists, government/CDC), the event management service (event preprocessor, event filter/annotator, event storage manager, event store query service), the provenance service, the analysis framework with analysis applications, QoE manager and state manager, group management, platform services (device catalogue, authentication & authorization, privacy manager, subscription service, remote access manager), an IHE adapter and interoperability container, solution delivery services, and storages for events, state data, provenance metadata, group data and privacy policies]
Event Management Service (1)
The problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: an annotator writing events (at times 1, 2, 3) into each storage model, as XML documents in the second case]
Event Management Service (3): Hybrid Storage Model
3. Hybrid storage model (stream buffer + relational-XML DB)
– Stream buffer: how do we design the stream buffer?
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ; the second assumption means that ρ < 1 on average.
[Figure: arrival rate over time compared with the average arrival rate and the average database service rate, with an instability region marked]
[Figure: annotator → stream buffer → relational-XML DB]
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
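To make ρ = λ/μ concrete, the sketch below computes the standard M/M/1 steady-state quantities. The slide only states Poisson arrivals, so exponential service times are an added assumption here. The second function is a fluid approximation of how much backlog a burst above μ leaves behind, which is the situation the stream buffer is meant to absorb.

```python
import math


def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 metrics (assumes exponential service times as well).

    rho = lambda / mu; mean number in system L = rho / (1 - rho);
    mean time in system W = 1 / (mu - lambda). Unstable when rho >= 1.
    """
    rho = arrival_rate / service_rate
    if rho >= 1:
        return {"rho": rho, "stable": False, "L": math.inf, "W": math.inf}
    return {"rho": rho, "stable": True,
            "L": rho / (1 - rho),
            "W": 1 / (service_rate - arrival_rate)}


def burst_backlog(burst_rate: float, service_rate: float, duration_s: float) -> float:
    """Events left queued after a burst above the service rate (fluid approximation)."""
    return max(0.0, (burst_rate - service_rate) * duration_s)


print(mm1_metrics(800, 1000))        # rho = 0.8: stable, about 4 events in the system
print(burst_backlog(1500, 1000, 10))  # a 10 s burst at 1500 events/s leaves 5000 queued
```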
Event Management Service (4): The Stream Buffer
Approach
– Keep the relational-XML database in the design for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
[Figure: storage requests (events of type "eventStore") from the analysis graph (PE1–PE5, SPE) pass through a traffic monitor (traffic models, traffic predictor, event store load monitor) before reaching the database PE and the event store (relational-XML); a System S STG (file system store) is also shown, and the analysis graph continues towards the delivery system]
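One way to "throttle the traffic arriving at the database" is a token bucket: writes are admitted at a configured rate kept below μ, and the excess waits in a FIFO backlog (standing in for the file-system store). The token bucket and all names here are assumptions for illustration. The slides describe traffic models and a predictor but not the throttling algorithm itself.

```python
from collections import deque
from typing import Deque, List


class StreamBuffer:
    """Admits at most `db_rate` events per second to the database on average;
    the rest wait in a FIFO backlog. Unused allowance is capped at `burst`."""

    def __init__(self, db_rate: float, burst: int) -> None:
        self.db_rate, self.capacity = db_rate, burst
        self.tokens = float(burst)
        self.backlog: Deque[dict] = deque()
        self.db: List[dict] = []  # stand-in for the relational-XML event store

    def offer(self, event: dict) -> None:
        self.backlog.append(event)

    def tick(self, dt: float) -> int:
        """Advance time by dt seconds and write as many events as tokens allow."""
        self.tokens += self.db_rate * dt
        written = 0
        while self.backlog and self.tokens >= 1:
            self.db.append(self.backlog.popleft())
            self.tokens -= 1
            written += 1
        # Cap leftover allowance so an idle period cannot build an arbitrarily
        # large burst towards the database.
        self.tokens = min(self.tokens, float(self.capacity))
        return written


buf = StreamBuffer(db_rate=100, burst=10)
for i in range(250):          # a burst of 250 storage requests
    buf.offer({"id": i})
print(buf.tick(0), buf.tick(1), len(buf.backlog))
```

The database sees at most about `db_rate` writes per second however bursty the arrivals are; the backlog drains once the burst subsides.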
Century Provenance Service
[Figure: Century server infrastructure diagram, focusing on the provenance service: provenance storage manager, provenance query manager, provenance access control resolver, provenance query service, and storages for provenance metadata and state data]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert – Patient: Doe, John – Condition: Abnormal Reaction
Recommendation: The Century system has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
– e.g., PE1 reports to the provenance server/store: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2 reports: "I received some data on port 2. I ran algorithm f3 on the data at time t1."
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
– e.g., PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."
[Figure: sensors produce Data1 and Data2, which flow through PEs running f1, f2 and f3 to produce DataOut4]
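The two kinds of provenance record in the slide's speech bubbles can be sketched as two small stores. Process provenance is a log of which component ran which algorithm, when, and on which port. Data provenance maps each output element to the elements it depended on, so lineage is a graph traversal. The `ProvenanceStore` class and the upstream element `sensor_sample_7` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# (pe, algorithm, params, time, port)
ProcessRecord = Tuple[str, str, Dict[str, float], float, str]


@dataclass
class ProvenanceStore:
    process_log: List[ProcessRecord] = field(default_factory=list)
    data_deps: Dict[str, Set[str]] = field(default_factory=dict)

    def record_process(self, pe: str, algorithm: str, params: Dict[str, float],
                       t: float, port: str) -> None:
        """Process provenance: 'I ran algorithm f with parameters ... at time t'."""
        self.process_log.append((pe, algorithm, params, t, port))

    def record_data(self, output_id: str, input_ids: List[str]) -> None:
        """Data provenance: 'output_id was dependent on input_ids'."""
        self.data_deps[output_id] = set(input_ids)

    def components_between(self, t0: float, t1: float) -> List[str]:
        """Which components were on the data path during [t0, t1]?"""
        return [pe for pe, _, _, t, _ in self.process_log if t0 <= t <= t1]

    def lineage(self, element_id: str) -> Set[str]:
        """All data elements that (transitively) led to element_id."""
        seen: Set[str] = set()
        stack = [element_id]
        while stack:
            for parent in self.data_deps.get(stack.pop(), set()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


store = ProvenanceStore()
store.record_process("PE1", "f1", {"u": 1.0, "v": 2.0, "w": 3.0}, t=0.0, port="port1")
store.record_process("PE2", "f3", {}, t=1.0, port="port2")
store.record_data("DataOut4", ["Data1", "Data2"])
store.record_data("Data2", ["sensor_sample_7"])  # hypothetical upstream element
print(store.components_between(0.0, 1.0), sorted(store.lineage("DataOut4")))
```

Recording a dependency set per output element, as done here, is exactly the per-record annotation whose cost the following slides call heavyweight for high-rate streams.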
Provenance (2): Prior Work on Provenance
[Figure: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates]
– EU Provenance (PASOA, PreServ): captures workflow bindings
– Scientific SOA workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
– Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
– Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
– File systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
– Medical stream provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$\mathrm{out}_i(t) \Leftarrow \bigcup_{j \,:\, s_j \text{ input to component } i} \{\, s_j(t_k) : t_k \in [\,t-\Delta,\ t\,] \,\}$$

[Figure: a processing element with input streams Sj (Sj(t), …, Sj(t−3)) and Sk (Sk(t), …, Sk(t−3)) and output stream Si (Si(t), Si(t−1), Si(t−2))]
TVC dependency rule blocks (state defined by location context)
Dependency rule | PE state | Rule block ID
Ei(t) ⇐ Sj(t−2, t) ∪ Sk(t−1, t) | "home" | 2
Ei(t) ⇐ Sj(t) ∪ Sj(t−3, t−2) ∪ Sk(t−2, t−1) | "gym" | 1
The dependency equation can, however, vary based on
– The PE's internal state or some external context, e.g.:
IF loc = "gym" THEN generate ALERT iff AVG(HR(t−10, t)) > 140
ELSE generate ALERT iff AVG(HR(t−30, t)) > 90
– Solution: allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
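The rule-block table above can be evaluated directly. Given the PE state stored with an output element, pick the matching rule block and collect the input samples inside its windows. This sketch assumes integer timestamps and reads a window such as Sj(t−2, t) as the closed interval [t−2, t]. The names `RuleBlock` and `causative_elements` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (stream id, lo offset, hi offset): samples with t + lo <= ts <= t + hi.
Window = Tuple[str, int, int]


@dataclass
class RuleBlock:
    block_id: int
    state: str             # PE state / context under which this block applies
    windows: List[Window]  # union of input windows the output depends on


# The two rule blocks from the slide's table (offsets relative to output time t).
RULE_BLOCKS = [
    RuleBlock(1, "gym", [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)]),
    RuleBlock(2, "home", [("Sj", -2, 0), ("Sk", -1, 0)]),
]


def causative_elements(t: int, state: str,
                       inputs: Dict[str, Dict[int, float]]) -> List[Tuple[str, int]]:
    """(stream, timestamp) pairs that the output at time t depends on."""
    block = next(b for b in RULE_BLOCKS if b.state == state)
    result = set()
    for stream, lo, hi in block.windows:
        for ts in inputs.get(stream, {}):
            if t + lo <= ts <= t + hi:
                result.add((stream, ts))
    return sorted(result)


inputs = {"Sj": {ts: 0.0 for ts in range(11)}, "Sk": {ts: 0.0 for ts in range(11)}}
print(causative_elements(10, "home", inputs))
print(causative_elements(10, "gym", inputs))
```

Only the rule blocks and the per-element state have to be stored; the causative set is recomputed on demand, which is the storage saving the TVC model aims for.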
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead
Query resolution
– Determine the IDs of input ports and the causative stream bindings
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
– Apply the TVC filter to obtain the causative set of elements
Stream and Port Mapping (SAPM)
1. Store bindings between every input/output port and the associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port/input port
[Figure: the provenance query server resolves Query(Ei) in four steps: 1. obtain I/O stream mappings (dynamic stream mapping table, e.g., t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6); 2. retrieve the dependency equation (PC dependency table, e.g., PEj, f1(), Po1); 3. retrieve elements from the data store (e.g., element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home"); 4. apply the equation to determine the set of causative elements, and return that set]
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
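A sketch of the four query-resolution steps using in-memory tables. Everything concrete here is hypothetical: the table contents, a single input port that is rebound from stream S1 to S15 at t=15, integer timestamps, and closed windows. The point is that a window crossing a rebinding is split, so each part is read from the stream that was actually bound at that time.

```python
from typing import Dict, List, Tuple

# Step 1 table: dynamic stream mapping, port -> time-ordered (bind time, stream id).
STREAM_MAP: Dict[str, List[Tuple[int, str]]] = {
    "PEj.in1": [(10, "S1"), (15, "S15")],
}
# Step 2 table: TVC dependency, output port -> (input port, lo, hi) windows.
DEPENDENCIES: Dict[str, List[Tuple[str, int, int]]] = {
    "PEj.out1": [("PEj.in1", -3, 0)],
}
# Step 3 table: data store, stream id -> (timestamp, element id).
DATA_STORE: Dict[str, List[Tuple[int, str]]] = {
    "S1": [(11, "e11"), (12, "e12")],
    "S15": [(15, "e45"), (16, "e46")],
}


def resolve(output_port: str, t: int) -> List[str]:
    """Return the causative element ids for the output produced at time t."""
    causes: List[str] = []
    for in_port, lo, hi in DEPENDENCIES[output_port]:        # step 2
        start, end = t + lo, t + hi
        bindings = STREAM_MAP[in_port]                        # step 1
        for idx, (bound_at, stream) in enumerate(bindings):
            until = bindings[idx + 1][0] if idx + 1 < len(bindings) else float("inf")
            # Part of the window during which this stream fed the port.
            lo_t, hi_t = max(start, bound_at), min(end, until - 1)
            if lo_t > hi_t:
                continue
            for ts, elem in DATA_STORE.get(stream, []):       # step 3
                if lo_t <= ts <= hi_t:                        # step 4: TVC filter
                    causes.append(elem)
    return causes


print(resolve("PEj.out1", 12), resolve("PEj.out1", 16))
```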
Key Research Benefits of the TVC Model for Provenance
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1–2% overhead, compared to ~60–100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance and Karma systems)
– Statefulness: time and value functions allow a compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 Kevents/sec) system throughput
TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
Quality of Events in Century
[Figure: Century server infrastructure diagram, focusing on the QoE manager within the analysis framework]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables the pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform, or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE emits an ALERT from input streams with missing and out-of-order elements, prompting the question "What generated this alert, and can we trust it?". Annotations: "The sensor is faulty and generates stream elements of low quality" (remedy: replace the sensor); "Stream elements are missing (possibly due to network issues)"]
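The dynamic-rebinding example above (switch HR from SpO2 to the more expensive ECG sensor when SpO2 quality dips) can be sketched as a small selection policy. The QoE scores, relative costs, 0.7 threshold and "stay put while good enough" rule are all hypothetical. The slides do not define how QoE is scored or when to rebind.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Source:
    name: str
    cost: float     # relative acquisition cost (hypothetical units)
    quality: float  # current QoE estimate in [0, 1]


def choose_hr_source(sources: Dict[str, Source], current: str,
                     min_quality: float = 0.7) -> str:
    """Keep the current HR source while its quality is acceptable; otherwise
    rebind to the cheapest source that meets the threshold (or, if none does,
    to the best available one)."""
    if sources[current].quality >= min_quality:
        return current
    acceptable = [s for s in sources.values() if s.quality >= min_quality]
    if not acceptable:
        return max(sources.values(), key=lambda s: s.quality).name
    return min(acceptable, key=lambda s: s.cost).name


sources = {"SpO2": Source("SpO2", cost=1.0, quality=0.9),
           "ECG": Source("ECG", cost=5.0, quality=0.95)}
print(choose_hr_source(sources, "SpO2"))  # SpO2 quality is fine: stay
sources["SpO2"].quality = 0.4             # e.g. a faulty or loose sensor
print(choose_hr_source(sources, "SpO2"))  # rebind to ECG
```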
Century QoE: Some Basic QoE Measures
[Figure: a PE expects input elements t−5, t−4, t−3, t−2, t−1 and t, but its current input contains only t−5, t−2 and t]
By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE receives one input stream with elements t−5, t−2 and t, and a second with elements t−12, t−11 and t−10]
Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: of the current input t−5, …, t consumed by a PE, only a small "useful" subset triggers the output]
Finally, we can correlate the consumed input with the useful input (the input actually used to trigger an output). For example, here we can say that the output of the PE has low quality: its generation can be regarded as circumstantial, since only a small fraction of the input stream elements were useful.
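The first two measures on this slide can be sketched as follows, using the slide's own examples. This is an illustration only: elements are assumed to be identified by integer timestamps relative to t, and the function names are hypothetical. How Century combines such measures into a single QoE value is not specified.

```python
from typing import Sequence


def completeness(expected: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of the expected stream elements that actually arrived."""
    exp = set(expected)
    return len(exp.intersection(received)) / len(exp) if exp else 1.0


def skew(stream_a: Sequence[int], stream_b: Sequence[int]) -> int:
    """Gap between the newest elements of two input streams (0 = in sync)."""
    return abs(max(stream_a) - max(stream_b))


# Slide example 1: expected t-5 ... t, received only t-5, t-2 and t.
print(completeness(range(-5, 1), [-5, -2, 0]))  # 0.5: half the input is missing
# Slide example 2: one input at t-5, t-2, t and the other at t-12 ... t-10.
print(skew([-5, -2, 0], [-12, -11, -10]))       # 10 time units out of sync
```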
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2006 IBM Corporation
Remote Health Monitoring The Opportunity
Long-term monitoring offers benefitsndash Early disease detection and trend analysis for healthy and at-risk individuals ndash Treatment and progress monitoring for patientsndash Participants in drug trials or experimental treatments to gauge efficacy and side
effectsndash Reduced workload on doctors nurses and other healthcare providers
Enabled by rapid improvements in two key technologiesndash Improvements in wireless communications (WiFi 3G Bluetooth)ndash Continuing miniaturization of wireless sensors
Server
Patient Diary
BT
Data
IBM Research
copy 2006 IBM Corporation
Chapter 2
The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics
Harmoni Context-based Event Processing on the Mobile Hub
Century Technologies Stream Storage Century Technologies Stream Storage Provenance and Provenance and QoEQoE
IBM Research
copy 2006 IBM Corporation
Harmoni (Healthcare Adaptive Remote Monitoring) Overview
PAN (Bluetooth)WAN
(GPRS)Data
Type of Sensor Device
Bits sensor sample
Channels device
Raw data rate
(KBday)
GPS 1408 1 14850
SpO2 3000 1 94922
EKG (cardiac) 12 6 194400
Accelero-meter 64 3 202500
EEG (brain) 12 12 388800
EMG (muscle) 12 6 777600
HARMONI Novel Features
1. Context-Aware Event Filtering
   IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr) per minute
2. Personalization of Rules: apply machine learning at the server to learn what is "normal" for each user (e.g., for user A, gym = 110-150)
3. Anticipation-Driven Transmission: based on past history, if Prob(802.11 within 2 hrs) > 0.7, cache data
Need to reduce uplink transmission rates/power
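A minimal sketch of feature 1, the context-aware filter, applied to one minute of readings. Two details are assumptions not stated on the slide: the range test is applied to every reading in the minute, and readings that do not match the rule are forwarded unfiltered.

```python
from statistics import mean


def filter_minute(context, hr_samples):
    """Apply the rule IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr) per minute.

    ASSUMPTIONS: every reading in the minute must satisfy the range test, and
    non-matching minutes are sent as raw readings.
    """
    if context == "gym" and all(90 < hr < 120 for hr in hr_samples):
        return [mean(hr_samples)]  # expected behaviour: one summary value suffices
    return list(hr_samples)        # unexpected values or other context: send raw data
```

With one reading per second, a matching minute shrinks from 60 values to a single average, which is where the bandwidth savings reported later come from.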
HARMONI Architecture: Key Components on the Mobile Device
Lightweight Rule/Event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
  • e.g., if 60 < AVG(last 10 heart rate values) < 90 then "transmit AVG to server"
– Processed events themselves act as predicates for new rules
Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent action
– The Rule Manager (RM) populates or modifies the rules in the event engine
Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server
Anticipation Mechanism
– Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates
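The example rule "if 60 < AVG(last 10 heart rate values) < 90 then transmit AVG to server" can be sketched as a sliding-window rule. The class name and the choice to evaluate the rule on every new reading are illustrative assumptions, not the HARMONI engine's interface.

```python
from collections import deque


class WindowAverageRule:
    """Hypothetical sliding-window rule: 60 < AVG(last 10 HR values) < 90 -> transmit AVG."""

    def __init__(self, size=10, low=60, high=90):
        self.window = deque(maxlen=size)  # oldest reading drops out automatically
        self.low, self.high = low, high

    def on_sample(self, hr):
        """Feed one reading; return the average to transmit, or None if the rule does not fire."""
        self.window.append(hr)
        if len(self.window) < self.window.maxlen:
            return None  # not enough history yet
        avg = sum(self.window) / len(self.window)
        return avg if self.low < avg < self.high else None
```

Because the window slides by one reading, a single outlier shifts the average only gradually; a tumbling window (reset after each firing) would be an equally plausible reading of the rule.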
HARMONI Functional Architecture
[Figure: a sensor, the mobile device and the remote server, connected by a short-range wireless link and a wireless link to the Internet. Labeled components include data collection/data adapters, a light-weight pattern recognition engine, an action triggering mechanism, intelligent data transmission, an anticipation mechanism, a rule manager, a user interface, a DB, a pattern recognition engine, a pattern learning engine, the TAPAS rule server, external context sources, and external rule and action specifications. Data flows are labeled context, sensor readings, sensor control, device resources and data processing.]
Simple recognition of pre-specified temporal patterns across sensor streams
– Event engine driven by Deterministic Finite Automata (DFAs)
– Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders)
Compressed transmission to the server when an interface is available AND "anticipation" indicates "OK-to-Send"
– Use of intelligent compression schemes to reduce the volume of traffic
Store-and-forward otherwise
– Data stored in a memory buffer and on FLASH storage
Rule Manager coordinates with the backend server to download currently active or applicable rules
– Can run, activate or deactivate multiple rules simultaneously and dynamically
– Dynamic downloading of rules provides implicit "context awareness" at the client
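The send-or-buffer decision can be sketched as follows. The policy is a hypothetical reading of the slides: on 802.11 the buffer is always flushed; on GPRS it is flushed only when the predicted probability of 802.11 within the next 2 hours is at most 0.7 (the slide's caching threshold); with no link, data stays buffered. The link names and `send` callable are illustrative stand-ins.

```python
from collections import deque
from typing import Callable, Optional


class StoreAndForward:
    """Buffer filtered data until a link is up and anticipation says OK-to-Send.

    HYPOTHETICAL policy: 'wifi' always flushes; 'gprs' flushes only if the
    predicted Prob(802.11 within 2 hrs) <= threshold; no link keeps buffering.
    """

    def __init__(self, send: Callable[[list], None], threshold: float = 0.7):
        self.send = send            # transmits one batch (stand-in for the real uplink)
        self.threshold = threshold
        self.buffer = deque()       # stands in for the memory buffer / FLASH storage

    def ok_to_send(self, link: Optional[str], p_wifi_2h: float) -> bool:
        if link == "wifi":
            return True
        if link == "gprs":
            return p_wifi_2h <= self.threshold  # cheap link unlikely soon: send now
        return False                            # no interface available

    def submit(self, item, link: Optional[str], p_wifi_2h: float) -> None:
        self.buffer.append(item)
        if self.ok_to_send(link, p_wifi_2h):
            batch = list(self.buffer)
            self.buffer.clear()
            self.send(batch)
```

Batching buffered items into one transmission is a design choice here; it amortizes radio wake-up cost, which matters on a battery-powered hub.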
Pattern Recognition Engine Implementation
Rules are specified using EventScript, which is similar to regular expressions. A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA). We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications.
Effectively, we are running finite state machines with some tweaks:
– A unique memory heap for each DFA, accessible across states
– A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA
[Figure: EventScript is compiled into a DFA on the desktop; the DFA reaches the event engine on the Nokia 770 through the rule server (remote server) and the rule manager]
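EventScript's syntax is not shown in the talk, so the sketch below hand-writes the kind of DFA a compiler might emit for "three consecutive high heart-rate events" (roughly the regular expression `H H H`). Python callables stand in for the LISP/Scheme hooks, and the fallback-to-start rule for unlisted transitions is an assumption.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

State = Hashable
Action = Callable[[dict], None]


class DFA:
    """Event-driven DFA with a private heap shared by all its states.

    transitions maps (state, symbol) -> next state; any unlisted pair falls back
    to the start state (ASSUMPTION, makes the automaton total). on_enter maps a
    state to an action run with the heap after every transition landing there.
    """

    def __init__(self, start: State,
                 transitions: Dict[Tuple[State, str], State],
                 on_enter: Optional[Dict[State, Action]] = None):
        self.start = start
        self.state = start
        self.transitions = transitions
        self.on_enter = on_enter or {}
        self.heap: dict = {}  # per-DFA memory, visible from every state

    def feed(self, symbol: str) -> State:
        self.state = self.transitions.get((self.state, symbol), self.start)
        action = self.on_enter.get(self.state)
        if action is not None:
            action(self.heap)
        return self.state


def count_alarm(heap: dict) -> None:
    heap["alarms"] = heap.get("alarms", 0) + 1


# Pattern: three consecutive 'H' (high heart rate) symbols; any 'N' resets.
detector = DFA(
    start="s0",
    transitions={("s0", "H"): "s1", ("s1", "H"): "s2",
                 ("s2", "H"): "alarm", ("alarm", "H"): "alarm"},
    on_enter={"alarm": count_alarm},
)
```

Usage: map each reading to a symbol first, e.g. `detector.feed("H" if hr > 120 else "N")`; the 120 bpm cut-off is illustrative.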
HARMONI Implementation Platform
Nokia 770 Internet Tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and a development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis Accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: data generated from the sensor, heart rate (bpm) vs. sample]
Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
Filtering (S3) results in an 85% reduction in bandwidth consumption
Improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches among floating-point values in the LZ compressor
[Figure: data transmitted over the network (in KB) for each assumed context (None, Office, Gym, Office+Gym), uncompressed and with LZ-78 compression]
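The slide attributes the small S3-to-S4 gain to the LZ compressor finding few exact matches among floating-point values. The minimal LZ-78 sketch below (a textbook version, not HARMONI's compressor) illustrates the effect on a synthetic heart-rate series: the same readings serialized with two decimals need many more dictionary phrases than when rounded to whole beats per minute.

```python
def lz78_encode(text):
    """Textbook LZ-78: emit (prefix_index, next_char) pairs; index 0 is the empty phrase."""
    dictionary = {"": 0}
    tokens, phrase = [], ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate  # keep extending a known phrase
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:  # flush a trailing phrase already in the (prefix-closed) dictionary
        tokens.append((dictionary[phrase[:-1]], phrase[-1]))
    return tokens


def lz78_decode(tokens):
    phrases, out = [""], []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


# Synthetic readings (ASSUMPTION, not data from the talk): 72.00 .. 72.99 bpm.
readings = [72 + ((i * 37) % 100) / 100 for i in range(500)]
as_float = ",".join(f"{r:.2f}" for r in readings)
as_int = ",".join(str(round(r)) for r in readings)

if __name__ == "__main__":
    print("float tokens:", len(lz78_encode(as_float)))
    print("int tokens:  ", len(lz78_encode(as_int)))
```

Quantizing readings before compression is one possible remedy; whether the clinical precision loss is acceptable depends on the analysis consuming the data.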
Result 2: Impact of Personalization on Event Filtering
[Figure: variation in heart rate readings (beats per minute) over time for different individuals]
[Figure: data transmitted over the network (in KB) for User 2-1 and User 2-2: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering]
Sensor streams exhibit significant statistical variation across individuals
Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) not significant due to lack of exact pattern match of floating point values in LZ compressor
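Personalization can be sketched as learning a per-user "normal" band from that user's history and transmitting only readings outside it. The 5th to 95th percentile bounds are an assumption; the talk says only that machine learning at the server learns what is normal for each user.

```python
from statistics import quantiles


def learn_normal_range(history, low_pct=5, high_pct=95):
    """Learn a per-user 'normal' heart-rate band (ASSUMPTION: percentile bounds)."""
    cuts = quantiles(history, n=100)  # cuts[k - 1] is the k-th percentile
    return cuts[low_pct - 1], cuts[high_pct - 1]


def readings_to_send(readings, normal_range):
    """Only readings outside the band are treated as events worth transmitting."""
    low, high = normal_range
    return [hr for hr in readings if not low <= hr <= high]
```

For a user whose gym heart rate usually sits between 100 and 150 bpm, a generic band such as (60, 100) would flag every reading as abnormal, while the learned band suppresses them; this mirrors the gap between the generic and personalized bars in the figure.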
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' threshold for each of the 3 distinct states (0-70, 70-110, 110-170), applied across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients
Bandwidth savings between 26% and 73% for our sample population
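A sketch of sensor-derived context: the accelerometer amplitude selects an activity state, and the state selects the 'normal' band (0-70, 70-110, 110-170 from the slide, read here as heart rate in bpm). The amplitude cut-offs and their unit (g) are hypothetical; the talk does not give them.

```python
# HYPOTHETICAL accelerometer-amplitude cut-offs (unit assumed to be g).
ACTIVITY_CUTOFFS = [(0.2, "sitting"), (1.0, "walking")]  # >= 1.0 -> running

# Per-state 'normal' bands from the slide, interpreted as heart rate in bpm.
NORMAL_HR = {"sitting": (0, 70), "walking": (70, 110), "running": (110, 170)}


def classify_activity(amplitude: float) -> str:
    for cutoff, state in ACTIVITY_CUTOFFS:
        if amplitude < cutoff:
            return state
    return "running"


def is_abnormal(hr: float, amplitude: float) -> bool:
    """Flag a reading only if it falls outside the band for the sensed activity."""
    low, high = NORMAL_HR[classify_activity(amplitude)]
    return not low <= hr <= high
```

The same 100 bpm reading is thus abnormal while sitting or running, but normal while walking.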
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytic infrastructure that supports the various services of the healthcare ecosystem.
Challenges → Architectural goals
– Integrating massive volumes of disparate data → Extensible data model, scalable infrastructure
– Need for sophisticated analytics → Open and scalable analysis framework
– Growing collaboration across the ecosystem → Enterprise sharing of sensor data
– Integrity of data & tracking the analytic data → Quality of Events (QoE) and provenance support for backtracking & auditing
– Timely feedback → Real-time processing
– Large number of concurrent users → Scalable device and user management
– Privacy and security → Protection of personal medical data
Technologies & approaches
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform
Analysis Framework (1): The Approach
Current solutions
Vertically designed systems for the analysis of health monitoring data exist today
– Closed, custom-built for a specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials
New requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures
  • Exacerbated in remote monitoring settings
Easy to use
– Separate medical domain knowledge from IT knowledge
Our innovations
Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., a patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2 and temperature
[Figure: analysis jobs built from analysis components, each a graph of PEs fed by data sources through source PEs; PE labels include NES, IPN, VBF, DUP, WLG, SPA, SPM, DFE, PSD, INQ, GUI, JoinCC, JoinSC and JoinL]
Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
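The division of labour described above, where PE developers write only the transformation logic and the platform handles stream connections, can be illustrated with a toy pub-sub router. `StreamBus` is a hypothetical stand-in written for this sketch; it is not the System S API.

```python
from collections import defaultdict


class PE:
    """Atomic stream transformer: a developer implements only process()."""

    def process(self, sdo: dict):
        """Return an iterable of (output_stream, sdo) pairs."""
        raise NotImplementedError


class StreamBus:
    """Toy pub-sub router standing in for System S (NOT its actual API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, stream: str, pe: PE) -> None:
        self.subscribers[stream].append(pe)

    def publish(self, stream: str, sdo: dict) -> None:
        # Deliver the SDO to every PE subscribed to this stream, then route its outputs.
        for pe in self.subscribers[stream]:
            for out_stream, out_sdo in pe.process(sdo):
                self.publish(out_stream, out_sdo)


class HighHeartRate(PE):
    """Medical logic only: annotate readings above a threshold as alerts."""

    def __init__(self, threshold: float = 120):
        self.threshold = threshold

    def process(self, sdo):
        if sdo["hr"] > self.threshold:
            yield "alerts", {**sdo, "alert": "high HR"}


class Collector(PE):
    """Sink PE that records what it receives."""

    def __init__(self):
        self.received = []

    def process(self, sdo):
        self.received.append(sdo)
        return []
```

Usage: `bus.subscribe("hr", HighHeartRate())` and `bus.subscribe("alerts", sink)` wire the graph by stream name, so neither PE knows about the other.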
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs
[Figure: example PE labeled "AR PE": input ECG, normalization, fast Fourier transform, a neural network trained during initialization, an alert generator and an output alert]
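The example PE's stages can be sketched as plain functions. Two substitutions are made and should be read as such: a direct O(N²) DFT replaces an FFT library so the block has no dependencies, and the trained neural network is replaced by a caller-supplied `classifier` callable, a hypothetical stand-in.

```python
import cmath
import math
from statistics import mean, pstdev


def normalize(window):
    """Z-score normalization of one ECG window."""
    mu, sigma = mean(window), pstdev(window)
    return [(x - mu) / sigma for x in window] if sigma else [0.0] * len(window)


def magnitude_spectrum(window):
    """|DFT| for bins 0..N/2, computed directly; a real PE would call an FFT library."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(window)))
            for k in range(n // 2 + 1)]


def ecg_pe(window, classifier):
    """Normalization -> Fourier transform -> classifier -> alert SDO (or None)."""
    features = magnitude_spectrum(normalize(window))
    return {"type": "alert", "source": "AR PE"} if classifier(features) else None
```

A sine wave with a period of 4 samples over a 16-sample window concentrates its energy in bin 4, which is a quick sanity check of the feature stage.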
Century Event Management Service
[Figure: Century server infrastructure. Labeled components include the USN gateway, data transformer, sensor data sender and sensor data reception; a toolkit with an IDE for application developers, a sensor simulator and an event management client library; patient, administrator and stakeholder portals (doctor, lifestyle consultant, healthcare IT specialist, government/CDC); portal, platform, subscription, group management, event management and provenance services; event store and provenance query services; the analysis framework with analysis applications, job/application manager, event filter/annotator, QoE manager and state manager; storage for events, provenance metadata, state data, group data and privacy policies; authentication & authorization, privacy manager, device catalogue, IHE adapter, interoperability container and solution delivery services]
Event Management Service (1)
[Figure: an annotator emitting events at times 1, 2, 3, stored either as rows of a relational annotation table or as XML documents]
The problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
Event Management Service (3): Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ. Bursts during which the arrival rate exceeds μ (ρ > 1) drive the database into the instability region.
[Figure: arrival rate over time; bursts rise above the average database service rate while the average arrival rate stays below it]
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)
– Stream buffer: how do we design the stream buffer?
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
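Why ρ must stay below 1 can be made concrete with standard queueing results. This assumes, beyond the Poisson arrivals stated on the slide, that service times are exponential (an M/M/1 model the talk does not claim); N̄ is the mean number of events in the system and T̄ the mean time an event spends there. The rates are illustrative.

```latex
\begin{aligned}
\rho &= \frac{\lambda}{\mu}, \qquad
\bar{N} = \frac{\rho}{1-\rho}, \qquad
\bar{T} = \frac{1}{\mu-\lambda} \qquad (\rho < 1) \\
\mu = 1000\ \text{events/s},\ \lambda = 900 &\;\Rightarrow\; \rho = 0.9,\ \bar{N} = 9,\ \bar{T} = 10\ \text{ms} \\
\mu = 1000\ \text{events/s},\ \lambda = 990 &\;\Rightarrow\; \rho = 0.99,\ \bar{N} = 99,\ \bar{T} = 100\ \text{ms}
\end{aligned}
```

A 10% increase in load multiplies the backlog and the delay by 11 as ρ approaches 1, and for ρ ≥ 1 there is no steady state at all: the backlog grows without bound.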
Event Management Service (4): The Stream Buffer
[Figure: stream buffer. Storage requests (events of type "eventStore") from an analysis graph (PE1-PE5, SPE) reach the event store (Relational-XML DB) through a database PE; labeled components include a traffic monitor, traffic models, a traffic predictor, an event store load monitor and the System S STG (file system store); an output leads towards the delivery system]
Approach
– Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
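The throttling approach can be sketched as a buffer that forwards at most μ storage requests per tick to the database and holds the excess. The per-tick model, the class name and the `write_db` callable are assumptions; an in-memory deque stands in for the file-system store, and the traffic models and predictor are omitted.

```python
from collections import deque
from typing import Callable, Iterable


class StreamBuffer:
    """Sketch: forward at most `service_rate` storage requests per tick; hold the rest."""

    def __init__(self, service_rate: int, write_db: Callable[[list], None]):
        self.service_rate = service_rate  # events the DB can absorb per tick (mu)
        self.write_db = write_db          # persists one batch (hypothetical callable)
        self.spill = deque()              # stands in for the file-system store (STG)

    def tick(self, arrivals: Iterable) -> int:
        """Accept this tick's arrivals, write a throttled batch, return the backlog size."""
        self.spill.extend(arrivals)
        n = min(self.service_rate, len(self.spill))
        batch = [self.spill.popleft() for _ in range(n)]  # FIFO keeps event order
        if batch:
            self.write_db(batch)
        return len(self.spill)
```

During a burst the backlog absorbs the excess instead of the database; once arrivals drop below μ, the backlog drains in arrival order.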
Century Provenance Service
[Figure: Century server infrastructure diagram]
An Example of Data Provenance Use
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe. A few days later, Dr Lee receives an alert from the Century prescription program:
Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction.
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
• Example: PE1 reports "I ran algorithm f1 with parameters u, v and w at time t0; I published on port 1" and PE2 reports "I received some data on port 2; I ran algorithm f3 on the data at time t1" to a provenance server/store
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
• Example: PEj, fed by sensors producing Data1 and Data2, puts 'metadata' in DataOut4 saying it was dependent on Data1 and Data2
Provenance (2): Prior Work on Provenance
[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance vs. data provenance) and by event and processing rates]
• EU Provenance (PASOA, PreServ): captures workflow bindings
• Scientific SOA workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
• Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
• Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level
• File systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
• Century (medical stream provenance): arbitrary logic for individual PEs (statefulness and long memory), dynamic stream bindings, high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me stream segments related to the generation of alert Ei"
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ
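$$\mathrm{out}_i(t) \;\Leftarrow\; \bigcup_{s_j \,\in\, \mathrm{inps}(i)} \{\, s_j(t') \;:\; t' \in [\,t-\Delta_j,\; t\,] \,\}$$
where inps(i) is the set of input streams to component i.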
[Figure: a processing element with input streams Sj (Sj(t) … Sj(t-3)) and Sk (Sk(t) … Sk(t-3)) and output stream Si (Si(t), Si(t-1), Si(t-2))]
TVC dependency rule blocks (PE state defined by location context):
Rule block 1, state "gym": Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
Rule block 2, state "home": Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
The dependency equation can, however, vary based on
– The PE's internal state or some external context, e.g.:
    IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
    ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– Solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
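The two rule blocks above can be encoded as state-dependent lists of time windows. The tuple encoding `(stream, start_offset, end_offset)`, meaning timestamps in [t - start, t - end], is a hypothetical representation chosen for this sketch; the point it illustrates is that only the stored state and the equation are needed to recover causative inputs, not per-element annotations.

```python
# HYPOTHETICAL encoding of the rule blocks: per PE state, a list of
# (input_stream, start_offset, end_offset) -> samples with ts in [t - start, t - end].
RULE_BLOCKS = {
    "gym":  [("Sj", 0, 0), ("Sj", 3, 2), ("Sk", 2, 1)],  # Sj(t) U Sj(t-3,t-2) U Sk(t-2,t-1)
    "home": [("Sj", 2, 0), ("Sk", 1, 0)],                # Sj(t-2,t) U Sk(t-1,t)
}


def causative_windows(t, state):
    """Return {stream: [(lo, hi), ...]}: the windows the output at time t depends on."""
    windows = {}
    for stream, start, end in RULE_BLOCKS[state]:
        windows.setdefault(stream, []).append((t - start, t - end))
    return windows


def causative_elements(t, state, elements):
    """Filter stored (stream, ts, value) tuples to those inside the causative windows."""
    windows = causative_windows(t, state)
    return [e for e in elements
            if any(lo <= e[1] <= hi for lo, hi in windows.get(e[0], []))]
```

The state recorded with Ei(t) selects the rule block, which is why the slide stores the state as metadata with the data.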
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead
Determine the ID of input ports and causative stream bindings
Retrieve appropriate data elements within the specific time/sequence window specified by the TVC equation
Apply the TVC filter to obtain the causative set of elements
[Figure: provenance query resolution. Example contents: PC dependency table (PEj, f1(), Po1); dynamic stream mapping table (t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6); data store (element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home")]
Stream and Port Mapping (SAPM)
1. Store bindings between every input/output port and the associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port/input port
Provenance Query Server, on Query(Ei)
1. Obtain I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve elements
4. Apply the equation to determine the set of causative elements
Return the set of elements
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
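A simplified, self-contained sketch of the query steps, using in-memory stand-ins for the dynamic stream mapping table, the dependency table and the data store. All table contents, port names and the dependency function are hypothetical. Checking the binding at each element's own timestamp lets a window span a stream rebinding correctly.

```python
import bisect

# HYPOTHETICAL stand-ins for the stores on the slide.
# Dynamic stream mapping: input port -> time-ordered (bind_time, stream_id) records.
STREAM_MAP = {"Pi1": [(0, "S1"), (15, "S15")], "Pi2": [(0, "S7")]}

# Data store: (element_id, timestamp, stream_id).
DATA_STORE = [("e1", 5, "S1"), ("e2", 12, "S1"), ("e3", 16, "S15"),
              ("e4", 16, "S1"), ("e5", 14, "S7")]


def po1_dependency(t, state):
    """TVC function for output port Po1: input port -> (lo, hi) window, by PE state."""
    if state == "gym":
        return {"Pi1": (t - 10, t), "Pi2": (t - 30, t)}
    return {"Pi1": (t - 30, t)}


DEPENDENCY = {"Po1": po1_dependency}


def stream_bound_at(port, ts):
    """Stream bound to `port` at time ts: the latest binding made at or before ts."""
    records = STREAM_MAP[port]
    idx = bisect.bisect_right([bind_time for bind_time, _ in records], ts) - 1
    return records[idx][1] if idx >= 0 else None


def resolve(output_port, t, state):
    """Return ids of stored elements that caused the output on `output_port` at time t."""
    windows = DEPENDENCY[output_port](t, state)  # retrieve the dependency equation
    causative = []
    for elem_id, ts, stream in DATA_STORE:      # retrieve candidate elements
        for port, (lo, hi) in windows.items():  # apply the TVC filter
            if lo <= ts <= hi and stream_bound_at(port, ts) == stream:
                causative.append(elem_id)
                break
    return causative
```

Here element e4 is excluded even though it falls inside the window, because at ts=16 port Pi1 was already bound to S15 rather than S1.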
Key Research Benefits of TVC Model for Provenance
TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 Kevents/sec) system throughput
Quality of Events in Century
[Figure: Century server infrastructure diagram]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE emits an ALERT from input streams with missing and out-of-order elements, raising the question "What generated this alert, and can we trust it?"; annotations: "The sensor is faulty and generates stream elements of low quality" (replace sensor) and "Stream elements are missing (possibly due to network issues)"]
Century QoE: Some Basic QoE Measures
[Figure: a PE whose expected input is t-5, t-4, t-3, t-2, t-1, t but whose current input is only t-5, t-2, t]
By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this case we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE receiving one input at t-12, t-11, t-10 and another at t-5, t-2, t]
Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this case we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: a PE whose current input is t-5 … t, of which only a small part is the "useful" input]
Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, here we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, due to the small fraction of useful input stream elements.
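The three measures can be written as simple ratios and differences over timestamps. The function names and the exact formulas (fraction received, spread between the newest timestamps of each input, fraction of consumed elements that were useful) are an illustrative reading of the slide, not Century's QoE definitions.

```python
def completeness(expected_ts, received_ts):
    """Fraction of expected stream elements that actually arrived."""
    expected = set(expected_ts)
    return len(expected & set(received_ts)) / len(expected) if expected else 1.0


def timestamp_skew(*input_ts):
    """Spread between the newest timestamps of several inputs; large means out of sync."""
    newest = [max(ts) for ts in input_ts]
    return max(newest) - min(newest)


def usefulness(consumed, useful):
    """Fraction of consumed elements that actually contributed to triggering the output."""
    return len(useful) / len(consumed) if consumed else 0.0
```

With t taken as 0, the slide's first example gives `completeness([-5, -4, -3, -2, -1, 0], [-5, -2, 0]) == 0.5` (half the expected input missing), and the second gives a skew of 10 time units between the two inputs.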
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2 and BP monitors
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
How to develop a combination of hybrid model-based and replay-based provenance?
How to define context-dependent access-control models of provenance data in a multi-provider environment?
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2006 IBM Corporation
Chapter 2
The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics
Harmoni Context-based Event Processing on the Mobile Hub
Century Technologies Stream Storage Century Technologies Stream Storage Provenance and Provenance and QoEQoE
IBM Research
copy 2006 IBM Corporation
Harmoni (Healthcare Adaptive Remote Monitoring) Overview
PAN (Bluetooth)WAN
(GPRS)Data
Type of Sensor Device
Bits sensor sample
Channels device
Raw data rate
(KBday)
GPS 1408 1 14850
SpO2 3000 1 94922
EKG (cardiac) 12 6 194400
Accelero-meter 64 3 202500
EEG (brain) 12 12 388800
EMG (muscle) 12 6 777600
1 Context-Aware Event Filtering
HARMONI Novel Features
IF (user in lsquogymrsquo amp 90lt hrlt120)
THEN send AVG(hr) min
2 Personalization of Rules Apply Machine Learning at Server to learn ldquoNormal for user A gym= 110150)
3 Anticipation-Driven Transmission Based on past Prob(80211 within 2 hrs) gt 07 cache data
Need to reduce
uplink transmission
ratespower
IBM Research
copy 2006 IBM Corporation
HARMONI Architecture Key Components on Mobile Device
Lightweight RuleEvent processing enginendash Identifies appropriate temporal patterns in sensor data stream(s) and
consequent actionbull eg if 60ltAVG(last 10 heart rate values) lt90 then ldquotransmit AVG to serverrdquo
ndash Processed events themselves act as predicates for new rules
Rule Managerndash Coordinates with server to determine patterns of current interest and
consequent actionndash RM populates or modifies the rules in the event engine
Intelligent Data Transmissionndash Compresses the filtered data (from event engine) for transmission to server
Anticipation Mechanismndash Schedules the transmission of (compressed) data based on predicted
availability of network connections and incoming sensor data rates
IBM Research
copy 2006 IBM Corporation
Mobile Device Remote ServerSensor
Short-RangeWireless Link
Wireless Linkto the Internet
DB
PatternRecognition
Engine
PatternLearningEngine
External Context Sources
External Rule
Specifications
304
External Action
Specifications
ExternalAction
TriggeringMechanism
Data CollectionData Adapters
Light-Weight PatternRecognition
Engine
UserInterface
ActionTriggering
Mechanism
Intelligent Data Transmission
AnticipationMechanism
Context
SensorReadings
SensorControl
DeviceResources
Data Processing
Rule Manager
TAPAS
Rule Server
HARMONI Functional Architecture
Simple recognition of pre-specified temporalpatterns across sensor streams
Event engine driven by Deterministic Finite Automata (DFAs)
Actions associated with rules include data transmission data transformation and local triggers (alarms reminders)
Compressed transmission ro server when interface is available AND ldquoanticipationrdquo indicates ldquoOK-to-Sendrdquo
Use of intelligent compression schemes to reduce the volume of traffic
Store-and-forward otherwise
Data stored in memory buffer and on FLASH storage
Rule Manager coordinates with backend server to download currently active or applicable rulesCan run activate or deactivate multiple rules
simultaneously and dynamicallyDynamic downloading of rules provides
implicit ldquocontext awarenessrdquo at client
IBM Research
copy 2006 IBM Corporation
Pattern Recognition Engine Implementation
Rules are specified using EventScript which is similar to regular expressions A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA) We implemented a run-time engine on the mobile device that takes input events and ldquoexecutes DFA specificationsEffectively we are running finite state machines with some tweaksndash Unique memory heap for each DFA
accessible across statesndash Run-time LISPScheme interpreter allows
arbitrary instructions to be carried out upon initialization and state transitions in the DFA
Compiler
Rule Server
EventScript
DFA
Rule Manager
Desktop
770
Remote Server
Event
Engine
Context-Based Filtering
IBM Research
copy 2006 IBM Corporation
HARMONI Implementation Platform
Nokia 770 Internet Tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480) touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and development environment
– Code compiled on an Intel/Debian Linux 3.1 box using cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis Accelerometer
– Output baud of 57.6 kbps
– 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome, M. Ebling, “HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring,” to appear, PerCom 2008.
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: “Data Generated From Sensor”: heart rate (bpm) vs. sample number]
Compression (S2) by itself results in > 50% reduction in bandwidth consumption
Filtering (S3) results in 85% reduction in bandwidth consumption
Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to lack of exact pattern match of floating point values in the LZ compressor
[Figure: data transmitted over network (in KB) for each context assumed (None, Office, Gym, Office+Gym), uncompressed and with LZ-78 compression]
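To see why filtering gains can be masked by the compressor, here is a minimal Python sketch of textbook LZ-78. It is not HARMONI's compressor code, which the slides do not show. The sketch encodes the same synthetic heart-rate series twice: once rounded to integers, and once as floating-point text with three decimals. The series and its ±0.5 bpm noise are invented for illustration. Repeated integer readings keep reusing dictionary phrases, while noisy float strings rarely match exactly.

```python
import random
from typing import List, Tuple


def lz78_encode(text: str) -> List[Tuple[int, str]]:
    """Textbook LZ-78: emit (index of longest known phrase, next character) pairs."""
    dictionary = {"": 0}
    phrase = ""
    output = []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a phrase seen before
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:  # flush a trailing phrase that is already in the dictionary
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output


def lz78_decode(pairs: List[Tuple[int, str]]) -> str:
    entries = [""]
    for index, ch in pairs:
        entries.append(entries[index] + ch)
    return "".join(entries[1:])


# Hypothetical heart-rate trace: a slow step pattern plus small sensor noise.
rng = random.Random(42)
readings = [72 + (i % 20) // 5 + rng.uniform(-0.5, 0.5) for i in range(500)]
as_ints = ",".join(str(round(r)) for r in readings)
as_floats = ",".join(f"{r:.3f}" for r in readings)

for label, text in (("integers", as_ints), ("floats", as_floats)):
    print(label, len(text), "chars ->", len(lz78_encode(text)), "LZ-78 tokens")
```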
Result 2: Impact of Personalization on Event Filtering
[Figure: “Variation in Heart Rate Readings for Different Individuals”: heart rate (beats per minute) over time]
[Figure: data transmitted over network (in KB) for User 2-1 and User 2-2: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering]
Sensor streams exhibit significant statistical variation across individuals
Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to lack of exact pattern match of floating point values in the LZ compressor
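The slides say the server learns a per-user "normal" range, but they do not say which learning method is used. As a stand-in, this Python sketch fits a per-context band of mean ± k standard deviations for one user. The method, `k = 2`, the function names and the sample history are all assumptions, not HARMONI's learner.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Band = Tuple[float, float]


def learn_normal_bands(history: Iterable[Tuple[str, float]], k: float = 2.0) -> Dict[str, Band]:
    """Learn a per-context 'normal' heart-rate band (mean +/- k std devs) for one user."""
    by_context = defaultdict(list)
    for context, hr in history:
        by_context[context].append(hr)
    bands = {}
    for context, values in by_context.items():
        mu = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        bands[context] = (mu - k * sd, mu + k * sd)
    return bands


def is_abnormal(bands: Dict[str, Band], context: str, hr: float) -> bool:
    """True if the reading falls outside this user's learned band for the context."""
    low, high = bands[context]
    return not low <= hr <= high


# Hypothetical gym history for one user (bpm).
user_a = learn_normal_bands([("gym", v) for v in (118, 125, 130, 135, 140, 122, 128)])
print(user_a["gym"])                    # roughly (113.2, 143.4)
print(is_abnormal(user_a, "gym", 130))  # False: inside user A's gym band
print(is_abnormal(user_a, "gym", 175))  # True
```

A generic rule that ignores this per-user band would either miss this user's abnormal readings or flag normal ones, which is the effect the personalization result points to.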
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher ‘normal’ thresholds for the 3 distinct states (0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are also used to classify ‘falls’ for elderly patients
Bandwidth savings between 26-73% for our sample population
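Here is a minimal sketch of sensor-derived context. It assumes that the three ranges above are heart-rate ranges in bpm, and that activity is classified from how far the acceleration magnitude deviates from 1 g. Both are interpretations, not facts from the slide, and the amplitude cut-offs are hypothetical rather than values from the HARMONI study.

```python
import math
from typing import Tuple

# Hypothetical cut-offs (in g) on dynamic acceleration amplitude.
AMPLITUDE_CUTOFFS = [("sitting", 0.15), ("walking", 0.6), ("running", math.inf)]

# Per-state 'normal' heart-rate ranges, read here as bpm (assumption).
NORMAL_HR = {"sitting": (0, 70), "walking": (70, 110), "running": (110, 170)}


def activity_state(sample: Tuple[float, float, float]) -> str:
    ax, ay, az = sample
    amplitude = abs(math.sqrt(ax * ax + ay * ay + az * az) - 1.0)  # remove gravity (1 g)
    for state, cutoff in AMPLITUDE_CUTOFFS:
        if amplitude < cutoff:
            return state
    return AMPLITUDE_CUTOFFS[-1][0]


def hr_is_normal(hr: float, sample: Tuple[float, float, float]) -> bool:
    low, high = NORMAL_HR[activity_state(sample)]
    return low <= hr < high


print(activity_state((0.0, 0.0, 1.0)))     # sitting
print(hr_is_normal(130, (0.0, 0.0, 2.0)))  # True: 130 bpm is normal while running
print(hr_is_normal(130, (0.0, 0.0, 1.0)))  # False: the same reading while sitting is flagged
```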
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytics infrastructure that supports the various services of the healthcare ecosystem.
Challenges
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across the ecosystem
– Integrity of data & tracking of the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security
Architectural Goals
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking & auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data
Technologies & Approaches
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform
Analysis Framework (1): The Approach
Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for a specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials
New Requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures
• Exacerbated in remote monitoring settings
Easy to Use
– Separate medical domain knowledge from IT knowledge
Our Innovations
Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2, temperature
[Figure: an analysis job (Analysis Component / AnalysisJob) built as a graph of PEs, from DataSource and Source PE through PEs such as DUP, IPN, NES, VBF, WLG, SPA, SPM, DFE, PSD and Join, to INQ and GUI]
Major Practical Benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
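The separation of concerns described above can be illustrated with a toy Python sketch. A developer writes only `process()`, and a router delivers each SDO to the PEs subscribed to its stream. `PE`, `Router`, `HighHRFilter` and modelling an SDO as a dict are all hypothetical. This is not the System S API, and it ignores distribution, fault tolerance and scheduling.

```python
from collections import defaultdict
from typing import Dict, List

SDO = dict  # a Stream Data Object, modelled here as a plain dict


class PE:
    """Developers subclass PE and implement process(); routing is not their concern."""

    def __init__(self, name: str, inputs: List[str], output: str):
        self.name, self.inputs, self.output = name, inputs, output

    def process(self, sdo: SDO) -> List[SDO]:
        raise NotImplementedError


class Router:
    """Toy stand-in for the runtime: pub-sub delivery of SDOs between PEs."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[PE]] = defaultdict(list)
        self.sinks: Dict[str, List[SDO]] = defaultdict(list)  # everything published, per stream

    def deploy(self, pe: PE) -> None:
        for stream in pe.inputs:
            self.subscribers[stream].append(pe)

    def publish(self, stream: str, sdo: SDO) -> None:
        self.sinks[stream].append(sdo)
        for pe in self.subscribers[stream]:
            for out in pe.process(sdo):
                self.publish(pe.output, out)


class HighHRFilter(PE):
    """Domain logic only: annotate and forward readings above 100 bpm."""

    def process(self, sdo: SDO) -> List[SDO]:
        return [dict(sdo, flag="high")] if sdo["hr"] > 100 else []


router = Router()
router.deploy(HighHRFilter("filter", inputs=["hr"], output="alerts"))
for value in [80, 120, 95, 130]:
    router.publish("hr", {"hr": value})
print(router.sinks["alerts"])  # [{'hr': 120, 'flag': 'high'}, {'hr': 130, 'flag': 'high'}]
```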
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes the SDO to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs
[Figure: AR PE example: Input ECG → Normalization → Fast Fourier Transform → Neural Network (trained during initialization) → Alert Generator → Output Alert]
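The slide does not give the AR PE's code. This rough Python sketch follows the same chain inside one PE: normalization, a frequency transform, a scoring step and an alert generator. The transform is a naive DFT instead of an FFT, to stay dependency-free. The scoring step is a hypothetical stand-in for the trained neural network. Thresholds and the spectral-energy score are invented for illustration and have no clinical meaning.

```python
import cmath
import math
from typing import List, Optional


def normalize(window: List[float]) -> List[float]:
    """Zero-mean, unit-variance scaling of one ECG window."""
    mu = sum(window) / len(window)
    sd = math.sqrt(sum((x - mu) ** 2 for x in window) / len(window)) or 1.0
    return [(x - mu) / sd for x in window]


def dft_magnitudes(x: List[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for bins 0..n/2; a real PE would use an FFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(n // 2 + 1)]


def high_freq_fraction(spectrum: List[float], cutoff_bin: int) -> float:
    """Placeholder for the neural network: share of non-DC energy at or above cutoff_bin."""
    energy = [m * m for m in spectrum[1:]]  # energy[i] belongs to bin i + 1
    total = sum(energy) or 1.0
    return sum(energy[cutoff_bin - 1:]) / total


def ar_pe(ecg_window: List[float], cutoff_bin: int = 8, threshold: float = 0.5) -> Optional[dict]:
    """Normalization -> transform -> classifier -> alert generator, as one PE."""
    score = high_freq_fraction(dft_magnitudes(normalize(ecg_window)), cutoff_bin)
    if score > threshold:
        return {"alert": "abnormal-spectrum", "score": round(score, 3)}
    return None


slow = [math.sin(2 * math.pi * 2 * t / 64) for t in range(64)]   # energy in bin 2
fast = [math.sin(2 * math.pi * 20 * t / 64) for t in range(64)]  # energy in bin 20
print(ar_pe(slow), ar_pe(fast))
```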
Century Event Management Service
[Figure: Century server infrastructure, shown here with the Event Management Service in focus. Components include the USN gateway (data transformer, sensor data sender and reception), a toolkit (IDE for application developers, sensor simulator, event management client library), the portal service with patient, administrator and stakeholder portals (doctor, lifestyle consultant, healthcare IT specialist, govt/CDC), the analysis framework and applications, QoE manager, state manager, provenance service, event management service (event preprocessor, event filter annotator, event storage manager, event store query service), group management, device catalogue, authentication & authorization, privacy manager, IHE adapter and interoperability container]
Event Management Service (1)
The Problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: the Annotator's output over time (1, 2, 3) persisted either as relational rows or as XML documents]
Event Management Service (3): Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
[Figure: arrival rate over time; bursts rise above the average database service rate into the instability region, while the average arrival rate stays below it]
Let $\mu$ be the service rate of the database, $\lambda$ the arrival rate at the database, and $\rho = \lambda/\mu$.
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)
[Figure: the Annotator's XML events (time 1, 2, 3) pass through a stream buffer into the Relational-XML DB]
Stream buffer: how do we design the stream buffer?
D. Sow, L. Lim, M. Wang, A. Kim, “Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS,” DEBS, June 2007.
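The slide states only Poisson arrivals and an average arrival rate below the service rate. If one also assumes exponential service times, the system becomes an M/M/1 queue. That extra assumption is ours, not the paper's model. Under it, the standard formulas show why bursts matter. The mean number of events in the system is ρ/(1 - ρ), which grows without bound as ρ approaches 1. During a burst where λ exceeds μ there is no steady state at all.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 metrics; only defined when rho = lambda / mu < 1."""
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError(f"unstable: rho = {rho:.2f} >= 1, the backlog grows without bound")
    return {
        "rho": rho,
        "mean_in_system": rho / (1 - rho),                        # L = rho / (1 - rho)
        "mean_time_in_system": 1 / (service_rate - arrival_rate),  # W = 1 / (mu - lambda)
    }


for lam in (500, 800, 950, 1200):  # events/s against a hypothetical mu = 1000 events/s
    try:
        print(lam, mm1_metrics(lam, 1000))
    except ValueError as err:
        print(lam, err)
```

Going from λ = 800 to λ = 950 raises the mean backlog from 4 to 19 events, and a 1200 events/s burst has no steady state. This motivates buffering and throttling in front of the database.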
Event Management Service (4): The Stream Buffer
[Figure: storage requests (events of type “eventStore”) from PE1-PE5 and an SPE in the System S analysis graph, towards the delivery system, pass through the stream buffer: Traffic Monitor, Traffic Predictor with traffic models, Event Store Load Monitor, System S STG (file system store), Database PE and the Event Store (Relational-XML DB)]
Approach
– Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
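Here is a minimal Python sketch of the throttling idea. It assumes a fixed per-tick budget instead of the traffic predictor and load monitor shown in the figure. Events are forwarded to the database at no more than the budget, and the excess waits in memory. Anything beyond the memory limit spills to a queue standing in for the file-system store. `StreamBuffer` and its parameters are hypothetical.

```python
from collections import deque
from typing import Any, List


class StreamBuffer:
    def __init__(self, budget_per_tick: int, memory_limit: int):
        self.budget = budget_per_tick  # max storage requests sent to the DB per tick
        self.memory_limit = memory_limit
        self.memory: deque = deque()   # oldest pending events
        self.spill: deque = deque()    # newer overflow (stands in for the file-system store)

    def offer(self, event: Any) -> None:
        # Use memory only while nothing is spilled, so arrival order is preserved.
        if not self.spill and len(self.memory) < self.memory_limit:
            self.memory.append(event)
        else:
            self.spill.append(event)

    def tick(self) -> List[Any]:
        """Return the batch forwarded to the database in this tick."""
        batch = []
        while len(batch) < self.budget and (self.memory or self.spill):
            batch.append(self.memory.popleft() if self.memory else self.spill.popleft())
        while self.spill and len(self.memory) < self.memory_limit:
            self.memory.append(self.spill.popleft())  # page spilled events back in
        return batch


buf = StreamBuffer(budget_per_tick=3, memory_limit=4)
for event in range(10):  # a burst of 10 storage requests
    buf.offer(event)
print([buf.tick() for _ in range(4)])  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

The burst arrives in one tick but reaches the database spread over four, which is the "throttle the traffic" step in miniature.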
Century Provenance Service
[Figure: Century server infrastructure, shown here with the Provenance Service in focus (provenance storage manager, provenance metadata, state data, provenance query service and manager, provenance access control, resolver), alongside the USN gateway, toolkit, portals, analysis framework, QoE manager, event management service, group management, privacy and security components, IHE adapter and interoperability container]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.
A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction.
Recommendation: The Century system has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
“That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is.”
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which “system components” were on the data path
[Figure: PEs running algorithms f1, f2, f3 report to a Provenance Server/Store. PE1: “I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1.” PE2: “I received some data on port 2. I ran algorithm f3 on the data at time t1.”]
What is data provenance?
• The ability to retrace which “data elements” led to generation of output data
[Figure: PEj receives Data1 and Data2 from sensors and emits DataOut4: “I put ‘metadata’ in DataOut4 saying it was dependent on Data1 and Data2.”]
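The two approaches can be contrasted in a small Python sketch. Process provenance is a log of what each component did. Data provenance attaches parent IDs to each element, so its lineage can be walked back. The record fields and sample values are hypothetical and only echo the example on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProcessRecord:
    """Process provenance: which component ran which algorithm, when, and where it published."""
    pe: str
    algorithm: str
    params: Dict[str, float]
    time: float
    port: int


@dataclass
class DataElement:
    """Data provenance: each element carries the IDs of the elements it was derived from."""
    id: str
    value: float
    derived_from: List[str] = field(default_factory=list)


def lineage(element_id: str, store: Dict[str, DataElement]) -> Set[str]:
    """Follow derived_from links back to the source (sensor) elements."""
    element = store[element_id]
    if not element.derived_from:
        return {element_id}
    sources: Set[str] = set()
    for parent in element.derived_from:
        sources |= lineage(parent, store)
    return sources


process_log = [ProcessRecord("PE1", "f1", {"u": 1.0, "v": 2.0, "w": 3.0}, time=0.0, port=1)]
store = {
    "Data1": DataElement("Data1", 72.0),
    "Data2": DataElement("Data2", 97.0),
    "DataOut4": DataElement("DataOut4", 1.0, derived_from=["Data1", "Data2"]),
}
print([r.pe for r in process_log if r.algorithm == "f1"])  # ['PE1']
print(sorted(lineage("DataOut4", store)))                  # ['Data1', 'Data2']
```

Note that the data-provenance side stores metadata on every element, which is exactly the per-record cost the following slides try to avoid for high-rate streams.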
Provenance (2): Prior Work on Provenance
[Figure axes: type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates]
– EU Provenance (PASOA, PreServ): captures workflow bindings
– Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
– Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
– Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level
– File Systems (PASS, LFS): captures system calls and modifications to file records; annotation per file
– Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that “per-record” annotation is very heavyweight
Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that:
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both:
– Process provenance: “Show me the set of components involved in the generation of alert Ei”
– Data provenance: “Show me the stream segments related to the generation of alert Ei”
TVC Model of Hybrid Provenance
Associate an ‘equation-based’ specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream “sub-sets”: dependence over sets of time windows
– The output stream of PE i is a function of specific samples (from input streams) within a designated interval Δ
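$$E_i(t) \Leftarrow \bigcup_{S_j \in \mathrm{inp}(i)} \{\, S_j(t') : t' \in T_{ij}(t) \,\}, \qquad T_{ij}(t) \subseteq [\,t-\Delta,\; t\,]$$
where $E_i(t)$ is the output of component $i$ at time $t$, $\mathrm{inp}(i)$ is the set of input streams to component $i$, and $S_j(a, b)$ below denotes the samples of $S_j$ with timestamps in $[a, b]$.
[Figure: a processing element with input streams Sj (Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and Sk (Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (Si(t), Si(t-1), Si(t-2)). Its TVC dependency rule blocks select a dependency rule by PE state, where the state is defined by location context:]
Rule block 1, PE state “gym”: $E_i(t) \Leftarrow S_j(t) \cup S_j(t-3, t-2) \cup S_k(t-2, t-1)$
Rule block 2, PE state “home”: $E_i(t) \Leftarrow S_j(t-2, t) \cup S_k(t-1, t)$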
The dependency equation can, however, vary based on:
– The PE's internal state or some external context, e.g.:
IF loc = “gym” THEN generate ALERT iff AVG(HR(t-10, t)) > 140
ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– The solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
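Here is a minimal Python sketch of the TVC idea. It assumes integer timestamps and windows given as inclusive (start, end) offsets from t. Only the two rule blocks, one per PE state, are stored, rather than per-element annotations. The state recorded with the output selects the block. `RULE_BLOCKS` and `causative_elements` are hypothetical names. The windows are copied from the "gym" and "home" dependency rules above.

```python
from typing import Dict, List, Tuple

# PE state -> input stream -> list of (start, end) offsets relative to t, inclusive.
RULE_BLOCKS: Dict[str, Dict[str, List[Tuple[int, int]]]] = {
    "gym":  {"Sj": [(0, 0), (-3, -2)], "Sk": [(-2, -1)]},  # rule block 1
    "home": {"Sj": [(-2, 0)],          "Sk": [(-1, 0)]},   # rule block 2
}


def causative_elements(state: str, t: int,
                       streams: Dict[str, Dict[int, float]]) -> List[Tuple[str, int]]:
    """Return the (stream, timestamp) pairs that E_i(t) depends on in the given state."""
    deps = []
    for stream, windows in RULE_BLOCKS[state].items():
        for start, end in windows:
            for ts in range(t + start, t + end + 1):
                if ts in streams.get(stream, {}):
                    deps.append((stream, ts))
    return sorted(deps)


streams = {"Sj": {ts: 70.0 + ts for ts in range(11)}, "Sk": {ts: 0.5 * ts for ts in range(11)}}
print(causative_elements("home", 10, streams))
# [('Sj', 8), ('Sj', 9), ('Sj', 10), ('Sk', 9), ('Sk', 10)]
print(causative_elements("gym", 10, streams))
# [('Sj', 7), ('Sj', 8), ('Sj', 10), ('Sk', 8), ('Sk', 9)]
```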
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage and processing, at the cost of higher provenance reconstruction overhead
Determine the IDs of the input ports and the causative stream bindings
Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
Apply the TVC filter to obtain the causative set of elements
[Figure: the Provenance Query Server consults three stores. PC Dependency Table (e.g., PEj, f1(), Po1); Dynamic Stream Mapping Table (e.g., t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6); Data Store (e.g., element45: ts=15, SID=12, state=“gym”; element66: ts=16, SID=17, state=“home”). Also labelled: Stream and Port Mapping (SAPM)]
Storage:
1. Store bindings between every input/output port and the associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port/input port pair
Query (Ei):
1. Obtain I/O stream mappings
2. Retrieve dependency equation
3. Retrieve elements
4. Apply equation to determine the set of causative elements
Return the set of elements
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, “A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams,” HealthNet, June 2007.
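The four query steps can be sketched in Python over three toy tables. The sketch assumes a stream mapping table of (since_time, stream, PE, port) rows, a dependency table keyed by PE output port and state, and a data store keyed by stream and timestamp. All table layouts, names and values are hypothetical. Bindings are looked up separately at each timestamp, so a rebinding inside the dependency window is handled.

```python
from typing import Dict, List, Optional, Tuple

# Dynamic stream mapping table: (since_time, stream_id, pe, port). Port in1 is rebound at t=15.
BINDINGS = [(0, "S_alert", "PEj", "out1"), (0, "S1", "PEj", "in1"), (15, "S15", "PEj", "in1")]
# Dependency table: (pe, output port) -> state -> input port -> (start, end) offsets.
DEPENDENCIES = {("PEj", "out1"): {"gym": {"in1": (-3, 0)}, "home": {"in1": (-1, 0)}}}


def bound_stream(pe: str, port: str, ts: int) -> Optional[str]:
    """Step 1: which stream was bound to (pe, port) at time ts?"""
    candidates = [(since, s) for since, s, p, q in BINDINGS if (p, q) == (pe, port) and since <= ts]
    return max(candidates)[1] if candidates else None


def resolve(pe: str, out_port: str, ts: int,
            data_store: Dict[str, Dict[int, dict]]) -> List[Tuple[str, int, float]]:
    alert = data_store[bound_stream(pe, out_port, ts)][ts]
    rules = DEPENDENCIES[(pe, out_port)][alert["state"]]  # step 2: dependency equation
    causes = []
    for in_port, (start, end) in rules.items():
        for t in range(ts + start, ts + end + 1):
            stream = bound_stream(pe, in_port, t)          # step 1, per timestamp
            element = data_store.get(stream, {}).get(t)    # step 3: retrieve element
            if element is not None:                        # step 4: keep causative elements
                causes.append((stream, t, element["value"]))
    return causes


data_store = {
    "S_alert": {16: {"value": 1.0, "state": "gym"}},
    "S1": {t: {"value": 60.0 + t} for t in range(10, 17)},
    "S15": {t: {"value": 100.0 + t} for t in (15, 16)},
}
print(resolve("PEj", "out1", 16, data_store))
# [('S1', 13, 73.0), ('S1', 14, 74.0), ('S15', 15, 115.0), ('S15', 16, 116.0)]
```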
Key Research Benefits of TVC Model for Provenance
The TVC model's overhead decreases when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
TVC is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 Kevents/sec) system throughput
Quality of Events in Century
[Figure: Century server infrastructure, shown here with the QoE Manager in focus, alongside the USN gateway, toolkit, portals, analysis framework and applications, state manager, provenance service, event management service, group management, privacy and security components, IHE adapter and interoperability container]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE emits an ALERT: “What generated this alert, and can we trust it?” One input comes from a faulty sensor that generates stream elements of low quality (“Replace sensor”); another input has missing stream elements (possibly due to network issues)]
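The rebinding case above can be sketched as choosing, for each heart-rate consumer, the cheapest source whose current QoE clears a threshold. The `Source` fields, relative costs, QoE scores and the 0.7 threshold are hypothetical. How QoE itself is computed is left open here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    name: str
    cost: float  # relative acquisition cost (hypothetical units)
    qoe: float   # current quality estimate in [0, 1]


def choose_hr_source(sources: List[Source], min_qoe: float = 0.7) -> Source:
    """Bind to the cheapest source meeting min_qoe; otherwise fall back to the best-quality one."""
    acceptable = [s for s in sources if s.qoe >= min_qoe]
    if acceptable:
        return min(acceptable, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.qoe)


spo2 = Source("SpO2 sensor HR", cost=1.0, qoe=0.9)
ecg = Source("ECG sensor HR", cost=5.0, qoe=0.95)
print(choose_hr_source([spo2, ecg]).name)  # SpO2 sensor HR
spo2.qoe = 0.4                             # e.g., probe slipping, many missing samples
print(choose_hr_source([spo2, ecg]).name)  # ECG sensor HR
```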
Century QoE: Some Basic QoE Measures
[Figure: the input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input to the PE is t-5, t-2, t]
By correlating what was expected with what was received, we can estimate the quality of our input. In this example, we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE with two inputs, one carrying t-12, t-11, t-10 and the other t-5, t-2, t]
Another possibility is to correlate the timestamps of the inputs to estimate the quality. In this example, we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: the current input is t-5, t-4, t-3, t-2, t-1, t; the “useful” input is only t]
Finally, we can correlate the consumed input with the useful input (the input actually used to trigger an output). In this example, we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
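The three measures above can be written as simple ratios. This Python sketch uses integer timestamps with t = 100 to reproduce the three slide examples. The function names, and the choice of the gap between the newest timestamps as the sync measure, are assumptions.

```python
from typing import Sequence


def completeness(expected: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of expected timestamps that actually arrived."""
    return len(set(expected) & set(received)) / len(expected)


def sync_skew(input_a: Sequence[int], input_b: Sequence[int]) -> int:
    """Gap between the newest elements of two inputs; large values mean they are out of sync."""
    return abs(max(input_a) - max(input_b))


def usefulness(consumed: Sequence[int], useful: Sequence[int]) -> float:
    """Fraction of consumed elements that actually triggered the output."""
    return len(useful) / len(consumed)


t = 100
print(completeness(range(t - 5, t + 1), [t - 5, t - 2, t]))    # 0.5: half the input is missing
print(sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))  # 10
print(usefulness(range(t - 5, t + 1), [t]))                    # 0.1666...
```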
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2 and BP monitors
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
How to develop a hybrid of model-based and replay-based provenance?
How to define context-dependent access-control models of provenance data in a multi-provider environment?
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current ‘role’ of the provider and the user's medical history
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
The pervasive device must be more than a relay: an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission
The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality
Harmoni (Healthcare Adaptive Remote Monitoring) Overview
[Figure: sensors send data over a PAN (Bluetooth) to the mobile hub, which forwards it over a WAN (GPRS)]
Type of sensor device | Bits/sensor sample | Channels/device | Raw data rate (KB/day)
GPS | 1408 | 1 | 14850
SpO2 | 3000 | 1 | 94922
EKG (cardiac) | 12 | 6 | 194400
Accelerometer | 64 | 3 | 202500
EEG (brain) | 12 | 12 | 388800
EMG (muscle) | 12 | 6 | 777600
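The table's last column follows from bits/sample × channels × sampling rate × 86,400 s/day, divided by 8 bits/byte and 1,024 bytes/KB. The sampling rates below are not on the slide. They are back-calculated assumptions that reproduce every row. The SpO2 rate is consistent with the Nonin monitor's three 375-byte packets per second.

```python
SECONDS_PER_DAY = 86_400


def raw_kb_per_day(bits_per_sample: int, channels: int, samples_per_second: float) -> float:
    """Raw sensor output in KB/day, with 1 KB = 1,024 bytes."""
    return bits_per_sample * channels * samples_per_second * SECONDS_PER_DAY / (8 * 1024)


# (device, bits/sample, channels, assumed samples/s)
SENSORS = [
    ("GPS", 1408, 1, 1),
    ("SpO2", 3000, 1, 3),
    ("EKG (cardiac)", 12, 6, 256),
    ("Accelerometer", 64, 3, 100),
    ("EEG (brain)", 12, 12, 256),
    ("EMG (muscle)", 12, 6, 1024),
]
for name, bits, channels, rate in SENSORS:
    print(f"{name:15s} {raw_kb_per_day(bits, channels, rate):>10,.0f} KB/day")
```

Even the lowest-rate sensor here produces megabytes per day, which is why the next slide stresses reducing uplink transmission.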
HARMONI Novel Features
1. Context-Aware Event Filtering, e.g.: IF (user in ‘gym’ & 90 < hr < 120) THEN send AVG(hr)/min
2. Personalization of Rules: apply machine learning at the server to learn “normal” for user A (e.g., gym = 110-150)
3. Anticipation-Driven Transmission: if, based on the past, Prob(802.11 within 2 hrs) > 0.7, cache data
Need to reduce uplink transmission rates/power
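Here is a minimal Python sketch of the anticipation rule. It assumes the device keeps, for each past day, the minutes of the day at which an 802.11 network was seen. The history format, function names and sample data are hypothetical. The 2-hour horizon and the 0.7 threshold come from the rule above.

```python
from typing import List


def prob_wifi_within(history: List[List[int]], now_min: int, horizon_min: int = 120) -> float:
    """Empirical probability that 802.11 appears within horizon_min minutes of now_min."""
    if not history:
        return 0.0
    hits = sum(any(now_min <= m <= now_min + horizon_min for m in day) for day in history)
    return hits / len(history)


def transmission_decision(history: List[List[int]], now_min: int, threshold: float = 0.7) -> str:
    return "cache" if prob_wifi_within(history, now_min) > threshold else "send over WAN"


# Ten past days: on eight of them the user reached home Wi-Fi around 19:00 (minute 1140).
history = [[1140, 1200]] * 8 + [[]] * 2
print(transmission_decision(history, now_min=17 * 60 + 30))  # cache (p = 0.8)
print(transmission_decision(history, now_min=12 * 60))       # send over WAN (p = 0.0)
```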
HARMONI Architecture: Key Components on Mobile Device
Lightweight Rule/Event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
• e.g., if 60 < AVG(last 10 heart rate values) < 90 then “transmit AVG to server”
– Processed events themselves act as predicates for new rules
Rule Manager
– Coordinates with server to determine patterns of current interest and the consequent action
– RM populates or modifies the rules in the event engine
Intelligent Data Transmission
– Compresses the filtered data (from event engine) for transmission to server
Anticipation Mechanism
– Schedules the transmission of (compressed) data based on predicted availability of network connections and incoming sensor data rates
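The example rule above can be sketched in Python as a sliding-window check, together with a rule manager that activates and deactivates rules at run time. `AvgRangeRule` and `RuleManager` are hypothetical names. Real HARMONI rules are compiled EventScript DFAs rather than Python objects.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


class AvgRangeRule:
    """if low < AVG(last n values) < high then emit ("transmit", avg)."""

    def __init__(self, n: int = 10, low: float = 60, high: float = 90):
        self.window: deque = deque(maxlen=n)
        self.low, self.high = low, high

    def feed(self, hr: float) -> Optional[Tuple[str, float]]:
        self.window.append(hr)
        if len(self.window) < self.window.maxlen:
            return None  # not enough history yet
        avg = sum(self.window) / len(self.window)
        return ("transmit", avg) if self.low < avg < self.high else None


class RuleManager:
    """Holds the currently active rules; the server side would decide which ones to push."""

    def __init__(self) -> None:
        self.active: Dict[str, AvgRangeRule] = {}

    def activate(self, name: str, rule: AvgRangeRule) -> None:
        self.active[name] = rule

    def deactivate(self, name: str) -> None:
        self.active.pop(name, None)

    def feed(self, hr: float) -> List[Tuple[str, Tuple[str, float]]]:
        events = []
        for name, rule in self.active.items():
            event = rule.feed(hr)
            if event is not None:
                events.append((name, event))
        return events


rm = RuleManager()
rm.activate("resting-avg", AvgRangeRule())
outputs = [rm.feed(72) for _ in range(10)]
print(outputs[-1])  # [('resting-avg', ('transmit', 72.0))]
```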
HARMONI Functional Architecture
[Figure: HARMONI functional architecture spanning Sensor, Mobile Device and Remote Server, linked by a short-range wireless link and a wireless link to the Internet. Components: data collection/data adapters, light-weight pattern recognition engine, rule manager, action triggering mechanism, user interface, intelligent data transmission, anticipation mechanism and data processing (inputs: context, sensor readings, sensor control, device resources); DB, pattern recognition engine, pattern learning engine, rule server, TAPAS, external context sources, external rule specifications, external action specifications and external action triggering mechanism]
Simple recognition of pre-specified temporal patterns across sensor streams
Event engine driven by Deterministic Finite Automata (DFAs)
Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders)
Compressed transmission to server when the interface is available AND “anticipation” indicates “OK-to-Send”
Use of intelligent compression schemes to reduce the volume of traffic
Store-and-forward otherwise
Data stored in memory buffer and on FLASH storage
Rule Manager coordinates with backend server to download currently active or applicable rules; can run, activate or deactivate multiple rules simultaneously and dynamically
Dynamic downloading of rules provides implicit “context awareness” at client
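Here is a minimal Python sketch of the store-and-forward path. It assumes a fixed number of in-memory slots, with overflow going to a list that stands in for FLASH storage. The class and method names, and the `link_up`/`ok_to_send` flags, are hypothetical. Compression is omitted.

```python
from collections import deque
from typing import Any, Callable, List


class StoreAndForward:
    def __init__(self, ram_slots: int):
        self.ram_slots = ram_slots
        self.ram: deque = deque()
        self.flash: List[Any] = []  # stands in for FLASH storage on the device

    def enqueue(self, packet: Any) -> None:
        """Buffer a packet in RAM, spilling to flash once RAM is full."""
        if len(self.ram) < self.ram_slots:
            self.ram.append(packet)
        else:
            self.flash.append(packet)

    def flush(self, link_up: bool, ok_to_send: bool, send: Callable[[Any], None]) -> int:
        """Transmit everything only if the interface is up AND anticipation says OK-to-Send."""
        if not (link_up and ok_to_send):
            return 0
        pending = list(self.ram) + self.flash  # RAM holds the oldest packets
        for packet in pending:
            send(packet)
        self.ram.clear()
        self.flash.clear()
        return len(pending)


sent: List[str] = []
sf = StoreAndForward(ram_slots=2)
for p in ("a", "b", "c"):
    sf.enqueue(p)
print(sf.flush(link_up=True, ok_to_send=False, send=sent.append))  # 0: wait for a better link
print(sf.flush(link_up=True, ok_to_send=True, send=sent.append))   # 3
print(sent)                                                        # ['a', 'b', 'c']
```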
IBM Research
copy 2006 IBM Corporation
Pattern Recognition Engine Implementation
Rules are specified using EventScript which is similar to regular expressions A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA) We implemented a run-time engine on the mobile device that takes input events and ldquoexecutes DFA specificationsEffectively we are running finite state machines with some tweaksndash Unique memory heap for each DFA
accessible across statesndash Run-time LISPScheme interpreter allows
arbitrary instructions to be carried out upon initialization and state transitions in the DFA
Compiler
Rule Server
EventScript
DFA
Rule Manager
Desktop
770
Remote Server
Event
Engine
Context-Based Filtering
IBM Research
copy 2006 IBM Corporation
HARMONI Implementation PlatformNokia 770 Internet tablet N800 ndash ARM processor Linux-basedndash High-resolution display(800x480) touch screen with up to 65536 colors ndash 64-128 MB RAM 64 MB FLASH storage (expandable up to 1GB hellip can
be used for virtual memory)ndash Built-in Bluetooth (BlueZ stack) and 80211 interfacesndash Relatively cheap $350ndash httpwwwmaemoorg provides open-source software and development
environmentndash Code compiled on an IntelDebian Linux 31 box using cross-compiler
(httpwwwscratchboxorg)
Nonin Model 4100 Sp02heart rate monitorndash Provides Heart rate and Oxygen saturationndash Supports Bluetooth Serial Port Profile (SPP)ndash 120 hours of continuous operation with 2 AA batteriesndash Three packets transmitted per second where each packet is 375 bytes
WiTilt 3-axis Accelerometerndash Output baud of 576 Kbpsndash 40 mA consumption when operating
Delrone Earthmate GPSI Mohomed A Misra W Jerome M Ebling and A Misra HARMONI Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring to appear Percom2008
Context-Based Filtering
IBM Research
copy 2006 IBM Corporation
Impact of HARMONI on Transmission Bandwidth Idealized Context
Data Generated From Sensor
0
50
100
150
200
1 1049 2097 3145 4193 5241 6289 7337 8385 9433
Sample
Hear
t Rat
e (b
pm)
Compression (S2) by itself results in gt 50 reduction in bandwidth consumption
Filtering (S3) results in 85 reduction in bandwidth consumption
Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) not significant due to lack of exact pattern match of floating point values in LZ
7030
2906
4890
1959
2964
17021454
2204
0
10
20
30
40
50
60
70
80
None Office Gym OfficeGym
Context Assumed
Dat
a Tr
ansm
itted
Ove
r Net
wor
k (in
KB
)
Uncompressed With LZ-78 Compression
Context-Based Filtering
IBM Research
copy 2006 IBM Corporation
Result 2 Impact of Personalization on Event Filtering
Variation in Heart Rate Readings for Different Individuals
0
50
100
150
200
Time
Hea
rt Ra
te
(bea
ts p
er m
inut
e)
9510
6500
3494
23022018 1902
1184 926
0
10
20
30
40
50
60
70
80
90
100
User 2-1 User 2-2
Dat
a Tr
ansm
itted
Ove
r Net
wor
k (in
KB
)
Uncompressed
Compressed
UncompressedGeneric Context-filtering
Uncompressed Personalized Context filtering
Sensor streams exhibit significant statistical variation across individuals
Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) not significant due to lack of exact pattern match of floating point values in LZ compressor
Context-Based Filtering
IBM Research
copy 2006 IBM Corporation
HARMONI in Practice Sensor-based Context
Accelerometer amplitude can be used to classify user into 3 different states sitting walking runningndash Higher lsquonormalrsquo threshold
for 3 distinct states (0-70 70-110 110-170) across all users
ndash Aside accelerometer readings used to classify lsquofallsrsquo for elderly patients
Bandwidth savings between 26-73 for our sample population
Context-Based Filtering
IBM Research
copy 2006 IBM Corporation
Chapter 3
The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics
HarmoniHarmoni Context Context--based Event based Event Processing on the Mobile HubProcessing on the Mobile Hub
Century Technologies Stream Storage Provenance and QoE
IBM Research
copy 2006 IBM Corporation
Century Research Goals and TechnologiesOur research goal is to make Online Health Analytic Infrastructure for healthcare ecosystem to support the various services
Challenges Architectural Goals Technologies amp Approaches
Integrating massive volumes of disparate data
Extensible Data ModelScalable Infrastructure
Need for sophisticated analytics
Growing collaboration across ecosystem
Integrity of data ampTracking the analytic data
Timely feedback
Open and scalable analysis framework
Enterprise sharing of sensor data
Quality of Events (QoE) and provenance support for backtracking amp auditing
Real-time Processing
Large number of concurrent users
Privacy and Security
Scalable device and user management
Protection of personal medical data
Extensible Stream Storage System
Distributed Stream Analysis Runtime
Quality Of Event
Hybrid Provenance System
Interoperability
Privacy amp Security Technology
Stream Processing Platform
Analysis Framework (1): The Approach
Current solutions:
• Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for a specific diagnosis
– Non-extensible: support for a limited set of sensor devices
– Small patient footprint: only good for trials
New requirements:
• Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
• Scalability
– Handles large numbers of patients
• Robustness
– Resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings
• Easy to use
– Separates medical domain knowledge from IT knowledge
Our innovations:
• Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
• Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flows with publish-subscribe semantics
• Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., a patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
• Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2, and temperature
[Figure: an analysis job composed of analysis components, each a graph of PEs (e.g., SPA, SPM, DFE, PSD, NES, IPN, VBF, DUP, WLG) fed by a data source through a Source PE]
Major practical benefit: separates the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
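To make the PE programming model concrete, here is a minimal in-process sketch. It is not the System S API: `SDO`, `PE`, and `StreamBroker` are hypothetical names, and the broker simply routes each stream data object (SDO) to every PE subscribed to its stream name, mimicking pub-sub stream binding. Each PE holds only domain logic (`fn`), which is the separation of concerns described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SDO:
    """Stream data object: stream name, timestamp, payload."""
    stream: str
    ts: int
    payload: dict = field(default_factory=dict)


@dataclass
class PE:
    """Atomic stream transformer: one input stream in, one output stream out."""
    name: str
    input_stream: str
    output_stream: str
    fn: Callable[[SDO], Optional[dict]]  # domain logic only; None means "emit nothing"


class StreamBroker:
    """Hypothetical pub-sub router standing in for the middleware's stream connections."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[PE]] = defaultdict(list)
        self.published: Dict[str, List[SDO]] = defaultdict(list)

    def register(self, pe: PE) -> None:
        self.subscribers[pe.input_stream].append(pe)

    def publish(self, sdo: SDO) -> None:
        self.published[sdo.stream].append(sdo)  # retained so results can be inspected
        for pe in self.subscribers[sdo.stream]:
            out = pe.fn(sdo)
            if out is not None:
                self.publish(SDO(pe.output_stream, sdo.ts, out))


# Usage: a PE that forwards only heart-rate samples above 120 bpm.
broker = StreamBroker()
broker.register(PE("hr_filter", "raw_hr", "high_hr",
                   lambda s: s.payload if s.payload["bpm"] > 120 else None))
for t, bpm in enumerate([80, 130, 95, 140]):
    broker.publish(SDO("raw_hr", t, {"bpm": bpm}))
print([s.ts for s in broker.published["high_hr"]])  # [1, 3]
```

The PE author writes only the lambda; routing, fan-out, and chaining are the broker's job, which is the point of the PE abstraction.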
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO arrives, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs
[Figure: an AR PE that passes an input ECG stream through normalization, a fast Fourier transform, and a neural network trained during initialization, followed by an alert generator that produces the output alert]
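The AR PE pipeline (normalization, Fourier transform, classifier, alert generator) can be sketched in plain Python. This is an illustration under stated assumptions: a naive DFT replaces a library FFT, and a fixed range check on the dominant frequency stands in for the neural network trained during initialization; the function names and the 50-120 per-minute bounds are hypothetical.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import List, Optional


def normalize(samples: List[float]) -> List[float]:
    """Normalization stage: zero mean, unit variance."""
    mu, sigma = mean(samples), pstdev(samples)
    if sigma == 0:
        return [0.0] * len(samples)
    return [(x - mu) / sigma for x in samples]


def dft_magnitudes(samples: List[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (a real PE would use an FFT library)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * f * k / n) for k, x in enumerate(samples)))
            for f in range(n // 2 + 1)]


def dominant_rate_per_min(samples: List[float], fs_hz: float) -> float:
    """Frequency of the strongest non-DC bin, in cycles per minute."""
    mags = dft_magnitudes(normalize(samples))
    peak = max(range(1, len(mags)), key=lambda f: mags[f])
    return peak * fs_hz / len(samples) * 60.0


def alert_generator(rate: float, low: float = 50.0, high: float = 120.0) -> Optional[str]:
    """Stand-in for the trained neural network: a fixed range check (hypothetical bounds)."""
    if rate < low or rate > high:
        return f"ALERT: dominant rate {rate:.0f}/min outside [{low:.0f}, {high:.0f}]"
    return None


# Usage: a synthetic 2.5 Hz signal sampled at 50 Hz for 4 s (150 cycles per minute).
signal = [math.sin(2 * math.pi * 2.5 * k / 50.0) for k in range(200)]
rate = dominant_rate_per_min(signal, fs_hz=50.0)
print(rate, alert_generator(rate))
```

Each function corresponds to one box in the figure, so in a stream framework each could be its own PE or stage inside one PE.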
Century Event Management Service
[Figure: the Century server infrastructure (USN gateway, analysis framework and applications, event management, provenance, QoE, portal, group management, and privacy/security services, with event, provenance, and state storage), serving stakeholders such as doctors, lifestyle consultants, healthcare IT specialists, and government/CDC; this slide highlights the Event Management Service]
Event Management Service (1)
The problem: persistence of large amounts of streaming data, and efficient query mechanisms for stakeholders
Requirements:
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential approaches:
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: annotated events arriving over time (1, 2, 3) and persisted by each approach]
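The two candidate data models can be contrasted with Python's built-in `sqlite3` and `xml.etree.ElementTree`. This is only an illustration: SQLite has no native XML type, so the XML annotations are stored as text and filtered in application code, standing in for the XPath/XQuery support of a hybrid Relational-XML DBMS. Table and column names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Callable, List

conn = sqlite3.connect(":memory:")
# Approach 1: fixed relational annotation table; every new annotation type needs ALTER TABLE.
conn.execute("CREATE TABLE annotation (event_id INTEGER, ts INTEGER, hr REAL, spo2 REAL)")
# Approach 2: relational key columns plus one XML annotation document per event.
conn.execute("CREATE TABLE event (event_id INTEGER PRIMARY KEY, ts INTEGER, doc TEXT)")
conn.executemany("INSERT INTO event VALUES (?, ?, ?)", [
    (1, 100, "<event><hr>72</hr><spo2>97</spo2></event>"),
    (2, 101, "<event><hr>135</hr><activity>running</activity></event>"),  # new tag, same schema
])


def events_where(tag: str, predicate: Callable[[str], bool]) -> List[int]:
    """IDs of events whose XML annotation has `tag` satisfying `predicate`.
    Filtering happens in Python, standing in for XPath/XQuery inside the DBMS."""
    matches = []
    for event_id, doc in conn.execute("SELECT event_id, doc FROM event ORDER BY ts"):
        node = ET.fromstring(doc).find(tag)
        if node is not None and node.text is not None and predicate(node.text):
            matches.append(event_id)
    return matches


print(events_where("hr", lambda v: float(v) > 120))         # [2]
print(events_where("activity", lambda v: v == "running"))   # [2]
```

The `activity` annotation needed no schema change, which is the extensibility argument for approach 2; the cost is per-event XML parsing, which is why it struggles at high event rates.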
Event Management Service (3): Hybrid Storage Model
Key assumptions:
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, let λ be the arrival rate at the database, and let ρ = λ/μ.
[Figure: bursty arrival rate over time against the average arrival rate and the average database service rate, with the instability region marked]
3. Hybrid storage model (stream buffer + Relational-XML DB): events from the annotator pass through a stream buffer before reaching the database. How do we design the stream buffer?
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
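The stability argument can be made quantitative with the textbook M/M/1 queue. This sketch adds an assumption the slide does not state, namely exponentially distributed database service times; with Poisson arrivals at rate λ and service rate μ, the queue is stable only when ρ = λ/μ < 1, and the mean backlog ρ/(1 − ρ) grows without bound as ρ approaches 1.

```python
def mm1_metrics(lam: float, mu: float) -> dict:
    """Steady-state M/M/1 metrics (Poisson arrivals, exponential service assumed)."""
    if lam < 0 or mu <= 0:
        raise ValueError("need lam >= 0 and mu > 0")
    rho = lam / mu
    if rho >= 1:
        # Instability region: the backlog grows without bound; no steady state exists.
        return {"rho": rho, "stable": False}
    return {
        "rho": rho,
        "stable": True,
        "mean_events_in_system": rho / (1 - rho),   # L = rho / (1 - rho)
        "mean_time_in_system_s": 1 / (mu - lam),     # W = 1 / (mu - lambda) = L / lambda
    }


# Usage: a database that can absorb 1000 events/s under increasing offered load.
for lam in (500, 800, 950, 990, 1200):
    print(lam, mm1_metrics(lam, 1000.0))
```

Going from ρ = 0.8 to ρ = 0.99 raises the mean backlog from 4 to 99 events, which is why bursts near the service rate are dangerous even when the long-run average is below it.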
Event Management Service (4): The Stream Buffer
Approach:
– Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
[Figure: storage requests (events of type "eventStore") from the analysis graph (PE1–PE5, SPE) flow through a traffic monitor and a traffic predictor (with traffic models and an event store load monitor) to the database PE, which writes to the event store (Relational-XML); the figure also shows a System S STG (file system store); analysis results continue towards the delivery system]
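The throttling idea can be sketched as a rate-limited forwarder with a FIFO overflow buffer. Assumptions: time advances in discrete ticks, the database accepts a fixed `db_rate` requests per tick (the design above instead uses a traffic predictor and a load monitor to decide how much to release), and an in-memory deque stands in for the file-system store. `ThrottledStore` is a hypothetical name.

```python
from collections import deque
from typing import Deque, List


class ThrottledStore:
    """Releases at most db_rate storage requests per tick to the database;
    the excess waits in a FIFO buffer instead of overloading the database."""

    def __init__(self, db_rate: int) -> None:
        self.db_rate = db_rate
        self.buffer: Deque[int] = deque()   # stand-in for the file-system store
        self.db: List[int] = []             # stand-in for the Relational-XML event store

    def tick(self, arrivals: List[int]) -> int:
        """Accept this tick's arrivals and return how many were written to the database."""
        self.buffer.extend(arrivals)
        n = min(self.db_rate, len(self.buffer))
        for _ in range(n):
            self.db.append(self.buffer.popleft())
        return n


store = ThrottledStore(db_rate=3)
writes = [store.tick(burst) for burst in ([1, 2, 3, 4, 5], [6], [], [7, 8])]
print(writes, store.db)  # [3, 3, 0, 2] [1, 2, 3, 4, 5, 6, 7, 8]
```

The burst of five is smoothed over two ticks and arrival order is preserved, so the database never sees more than its rated load.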
Century Provenance Service
[Figure: the Century server infrastructure diagram, highlighting the Provenance Service (provenance storage manager, provenance query service and manager, provenance access control, and the provenance metadata and state data stores)]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction.
Recommendation: The Century system has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating: a known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept this upcoming technology, we must provide them with the information they need to make these decisions responsibly. The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
[Figure: PEs running algorithms f1, f2, f3 report to a provenance server/store, e.g., one reports "I ran algorithm f1 with parameters u, v, and w at time t0; I published on port 1" and another reports "I received some data on port 2; I ran algorithm f3 on the data at time t1"]
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
[Figure: sensors produce Data1 and Data2; PEj consumes them and emits DataOut4, noting "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2"]
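The two approaches differ in where the provenance lives, which a few lines of Python can show. In this hypothetical sketch, process provenance is a log entry each PE reports (what it ran, on which port, when), while data provenance is metadata carried inside the output element (the IDs it depends on), mirroring the DataOut4 example. The averaging step is a placeholder transformation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Process provenance: components report what they ran, on which port, and when.
process_log: List[Tuple[str, str, str, int]] = []


@dataclass(frozen=True)
class Datum:
    id: str
    value: float
    derived_from: FrozenSet[str] = frozenset()  # data provenance carried with the element


def run_pe(pe: str, algorithm: str, port: str, t: int,
           inputs: List[Datum], out_id: str) -> Datum:
    process_log.append((pe, algorithm, port, t))         # process provenance record
    value = sum(d.value for d in inputs) / len(inputs)   # placeholder transformation
    return Datum(out_id, value, frozenset(d.id for d in inputs))


out4 = run_pe("PEj", "f3", "port2", 1,
              [Datum("Data1", 70.0), Datum("Data2", 74.0)], "DataOut4")
print(sorted(out4.derived_from))  # ['Data1', 'Data2']
print(process_log)                # [('PEj', 'f3', 'port2', 1)]
```

Process provenance answers "which components touched this?", while data provenance answers "which inputs produced this?"; the per-element `derived_from` set is exactly the per-record annotation cost that becomes heavy at stream rates.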
Provenance (2): Prior Work on Provenance
[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates]
• EU Provenance (PASOA, PreServ): captures workflow bindings
• Scientific SOA workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
• Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
• Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
• File systems (PASS, LFS): captures system calls and modifications to file records; annotation per file
• Century (medical stream provenance): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
• Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
• Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput
• Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
• Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that:
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both:
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
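Reactivity, the first requirement above, amounts to writing a provenance record only when a stream binding changes, then answering point-in-time questions from that sparse history. A minimal sketch using the standard `bisect` module, assuming observations for a given port arrive in time order; `BindingLog` and the port/stream names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional


class BindingLog:
    """Records port->stream bindings only when they change; answers
    'which stream fed this port at time t?' by binary search."""

    def __init__(self) -> None:
        self._times: Dict[str, List[int]] = {}
        self._streams: Dict[str, List[str]] = {}

    def observe(self, port: str, t: int, stream: str) -> bool:
        """Called per event; writes a record only if the binding changed (returns True)."""
        streams = self._streams.setdefault(port, [])
        if streams and streams[-1] == stream:
            return False                    # unchanged binding: nothing to store
        self._times.setdefault(port, []).append(t)
        streams.append(stream)
        return True

    def bound_at(self, port: str, t: int) -> Optional[str]:
        times = self._times.get(port, [])
        i = bisect.bisect_right(times, t) - 1   # latest change at or before t
        return self._streams[port][i] if i >= 0 else None


log = BindingLog()
writes = sum(log.observe("PEj.in1", t, "S1" if t < 15 else "S15") for t in range(10, 20))
print(writes, log.bound_at("PEj.in1", 12), log.bound_at("PEj.in1", 15))  # 2 S1 S15
```

Ten events produced only two provenance records, and the cost of a lookup is logarithmic in the number of binding changes rather than in the number of events.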
TVC (Time-and-Value Centric) Model of Hybrid Provenance
• Associate an "equation-based" specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
• Provenance data is recorded at the granularity of stream "subsets", i.e., dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ
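$$S_i^{\mathrm{out}}(t) \;\Leftarrow\; \bigcup_{j \,:\, S_j \text{ is input to component } i} \big\{\, s_j(t') \;:\; t' \in [\,t-\Delta_j,\ t\,] \,\big\}$$
where s_j(t′) is the sample of input stream S_j at time t′ and Δ_j is the dependency interval for that input.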
[Figure: a processing element with input streams Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (samples Si(t), Si(t-1), Si(t-2))]
TVC dependency rule blocks (PE state defined by location context):
– Rule block 1, state "gym": Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
– Rule block 2, state "home": Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
• The dependency equation can, however, vary based on:
– The PE's internal state or some external context, e.g.:
IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– The solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
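The state-dependent rule can be sketched as a table of rule blocks keyed by location context. The thresholds and window lengths come from the example rule; everything else is an assumption: windows are treated as closed intervals [t − w, t], any location other than "gym" takes the ELSE branch, and the function returns the timestamps the decision depended on, i.e. the causative set that TVC would reconstruct from the rule rather than store per element.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Rule blocks keyed by PE state: (window length, HR threshold) from the example rule.
RULE_BLOCKS: Dict[str, Tuple[int, float]] = {"gym": (10, 140.0), "else": (30, 90.0)}


def evaluate(hr: List[Tuple[int, float]], t: int, loc: str) -> Tuple[bool, List[int]]:
    """Return (alert?, timestamps the decision depended on) for location context `loc`."""
    window, threshold = RULE_BLOCKS.get(loc, RULE_BLOCKS["else"])
    in_window = [(ts, v) for ts, v in hr if t - window <= ts <= t]
    if not in_window:
        return False, []
    alert = mean(v for _, v in in_window) > threshold
    # The causative set follows from the rule block; nothing was stored per sample.
    return alert, [ts for ts, _ in in_window]


hr_stream = [(ts, 100.0) for ts in range(41)]  # constant 100 bpm, t = 0..40
print(evaluate(hr_stream, 40, "gym")[0])   # False: window [30, 40], 100 <= 140
print(evaluate(hr_stream, 40, "home")[0])  # True: window [10, 40], 100 > 90
```

Only the state ("gym" or "home") needs to be stored with the data; given it, the same samples yield different alerts and different causative windows.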
Resolving TVC-based Provenance Queries
• Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Trade-off: faster provenance storage processing in exchange for higher provenance reconstruction overhead
• Determine the IDs of the input ports and the causative stream bindings
• Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
• Apply the TVC filter to obtain the causative set of elements
[Figure: the provenance query server and its stores. Stored metadata: 1. bindings between every input/output port and the associated streams, in the Stream and Port Mapping (SAPM) dynamic stream mapping table (e.g., t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6); 2. creation/modification times of PEs; 3. the TVC function f() for every output port/input port, in the PC dependency table (e.g., PEj, f1(), Po1). The data store holds elements such as element45 (ts=15, SID=12, state="gym") and element66 (ts=16, SID=17, state="home"). Query(Ei): 1. obtain I/O stream mappings; 2. retrieve the dependency equation; 3. retrieve elements; 4. apply the equation to determine the set of causative elements; return the set of elements.]
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
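The four query steps can be strung together over toy in-memory tables. This is a hypothetical sketch, not the Century query server: the tables mimic the SAPM, PC dependency table, and data store described above, a TVC function is reduced to a single trailing window, and the binding is assumed not to change inside that window.

```python
from typing import Dict, List, Tuple

# SAPM: (input port, bound-since time, stream id), written only when a binding changes.
SAPM: List[Tuple[str, int, int]] = [("PEj.in", 0, 12), ("PEj.in", 16, 17)]
# Dependency table: output port -> (input port, window length), a one-window TVC function.
DEPENDENCY: Dict[str, Tuple[str, int]] = {"PEj.out": ("PEj.in", 2)}
# Data store: element id -> (timestamp, stream id).
DATA: Dict[str, Tuple[int, int]] = {
    "element43": (11, 12), "element44": (14, 12),
    "element45": (15, 12), "element66": (16, 17),
}


def stream_bound(port: str, t: int) -> int:
    """The stream bound to `port` at time t (latest binding at or before t)."""
    return max((since, sid) for p, since, sid in SAPM if p == port and since <= t)[1]


def causative_elements(out_port: str, t_alert: int) -> List[str]:
    in_port, window = DEPENDENCY[out_port]         # look up the dependency equation
    sid = stream_bound(in_port, t_alert)           # resolve the I/O stream mapping
    candidates = {e: ts for e, (ts, s) in DATA.items() if s == sid}  # retrieve elements
    return sorted(e for e, ts in candidates.items()                  # apply the TVC filter
                  if t_alert - window <= ts <= t_alert)


print(causative_elements("PEj.out", 15))  # ['element44', 'element45']
print(causative_elements("PEj.out", 16))  # ['element66']
```

Nothing here was written per element at ingest time; all of the work happens at query time, which is the storage-versus-reconstruction trade-off noted above.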
Key Research Benefits of the TVC Model for Provenance
• The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in the EU Provenance and Karma systems)
– Statefulness: time and value functions allow a compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
• The TVC model's overhead decreases when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
• Other provenance systems cannot handle O(100 Kevents/sec) system throughput
Quality of Events in Century
[Figure: the Century server infrastructure diagram, highlighting the QoE Manager]
Quality of Events in Data Streams
• Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding
• However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
• The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE raises an ALERT from input streams in which some elements are missing (possibly due to network issues) and others are of low quality because the sensor is faulty ("Replace sensor"), prompting the question "What generated this alert, and can we trust it?"]
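Dynamic rebinding driven by QoE could look like the following sketch: bind to the cheapest heart-rate source whose current QoE clears a threshold, otherwise to the best available one. The cost and QoE numbers, the 0.8 threshold, and the `HRSource`/`choose_source` names are hypothetical; the selection policy itself is an assumption, not Century's.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HRSource:
    name: str
    cost: float   # relative acquisition cost (hypothetical units)
    qoe: float    # current quality estimate in [0, 1]


def choose_source(sources: List[HRSource], min_qoe: float = 0.8) -> HRSource:
    """Cheapest source meeting the QoE threshold, else the highest-QoE source."""
    acceptable = [s for s in sources if s.qoe >= min_qoe]
    if acceptable:
        return min(acceptable, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.qoe)


spo2 = HRSource("SpO2", cost=1.0, qoe=0.95)
ecg = HRSource("ECG", cost=5.0, qoe=0.99)
print(choose_source([spo2, ecg]).name)  # SpO2
spo2.qoe = 0.4                          # SpO2-derived HR dips in quality
print(choose_source([spo2, ecg]).name)  # ECG
```

Re-running the selection whenever a QoE estimate changes gives the rebinding behavior; a real system would also add hysteresis so that the binding does not flap around the threshold.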
Century QoE: Some Basic QoE Measures
• By correlating what was expected with what was received, we can estimate the quality of our input. For example, if the PE expected inputs at t-5, t-4, t-3, t-2, t-1, and t but received only t-5, t-2, and t, we can say that the output of the PE has low quality, since half of the expected input is missing.
• Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, if one input carries t-12, t-11, t-10 while the other carries t-5, t-2, t, we can say that the output of the PE has low quality, since the two inputs are out of sync.
• Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say that the output of the PE has low quality if its generation can be thought of as circumstantial, due to the small fraction of useful input stream elements.
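The three measures reduce to simple ratios and differences over timestamps. The function names, and the choice of "newest timestamp" for the synchronization gap, are assumptions made for this sketch; the usage lines reproduce the first two examples above with t = 100.

```python
from typing import Sequence


def completeness(received: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of expected timestamps that actually arrived."""
    exp = set(expected)
    return len(exp & set(received)) / len(exp) if exp else 1.0


def sync_gap(a: Sequence[int], b: Sequence[int]) -> int:
    """Distance between the newest timestamps of two inputs; large means out of sync."""
    return abs(max(a) - max(b))


def usefulness(consumed: int, useful: int) -> float:
    """Fraction of consumed elements that actually contributed to an output."""
    return useful / consumed if consumed else 0.0


t = 100
print(completeness([t - 5, t - 2, t], range(t - 5, t + 1)))   # 0.5
print(sync_gap([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))  # 10
print(usefulness(consumed=6, useful=1))
```

How these raw numbers map onto a single QoE score for a PE's output (weighting, thresholds) is left open here, since the measures above are only described qualitatively.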
Open Challenges in Health Stream Collection and Diagnostics
• Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors
• Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
• How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
• How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
• How to develop a combination of hybrid model-based and replay-based provenance?
• How to define context-dependent access-control models of provenance data in a multi-provider environment?
• Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
• Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of the analysis logic
• Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current "role" of the provider and the user's medical history
Conclusions
• A successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
• The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission
• The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality
HARMONI Architecture: Key Components on the Mobile Device
• Lightweight rule/event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action, e.g., if 60 < AVG(last 10 heart rate values) < 90, then "transmit AVG to server"
– Processed events themselves act as predicates for new rules
• Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent action
– The RM populates or modifies the rules in the event engine
• Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server
• Anticipation Mechanism
– Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates
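The example rule can be expressed as a sliding-window check. This is a hypothetical sketch in Python rather than the EventScript rule language HARMONI uses; the `WindowRule` name and its callback interface are illustrative.

```python
from collections import deque
from typing import Callable, Deque, Optional


class WindowRule:
    """Fires action(avg) when the mean of the last n readings lies strictly in (lo, hi)."""

    def __init__(self, n: int, lo: float, hi: float,
                 action: Callable[[float], None]) -> None:
        self.window: Deque[float] = deque(maxlen=n)
        self.lo, self.hi, self.action = lo, hi, action

    def on_event(self, hr: float) -> Optional[float]:
        self.window.append(hr)
        if len(self.window) < self.window.maxlen:
            return None                     # not enough history yet
        avg = sum(self.window) / len(self.window)
        if self.lo < avg < self.hi:
            self.action(avg)                # e.g. "transmit AVG to server"
            return avg
        return None


sent = []
rule = WindowRule(n=10, lo=60, hi=90, action=sent.append)
for reading in [70] * 9 + [80] + [200] * 10:
    rule.on_event(reading)
print(sent)  # [71.0, 84.0]
```

Only two averages leave the device for twenty readings; once the window fills with out-of-range values the rule stops transmitting, which is where the bandwidth savings of event filtering come from.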
HARMONI Functional Architecture
[Figure: sensor → short-range wireless link → mobile device → wireless link to the Internet → remote server. Mobile-device components include data collection/data adapters (sensor readings, sensor control, device resources, context), a light-weight pattern recognition engine, a rule manager, an action triggering mechanism, a user interface, intelligent data transmission, and an anticipation mechanism. Other components shown include a DB, a pattern recognition engine, a pattern learning engine, the TAPAS rule server, external context sources, external rule and action specifications, and an external action triggering mechanism.]
• Simple recognition of pre-specified temporal patterns across sensor streams
– Event engine driven by deterministic finite automata (DFAs)
• Actions associated with rules include data transmission, data transformation, and local triggers (alarms, reminders)
• Compressed transmission to the server when an interface is available AND the "anticipation" mechanism indicates "OK-to-Send"
– Use of intelligent compression schemes to reduce the volume of traffic
– Store-and-forward otherwise: data stored in a memory buffer and on FLASH storage
• The Rule Manager coordinates with the backend server to download currently active or applicable rules
– Can run, activate, or deactivate multiple rules simultaneously and dynamically
– Dynamic downloading of rules provides implicit "context awareness" at the client
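The transmission policy (send compressed data when a link is up and the anticipation mechanism says OK-to-Send, otherwise store and forward later) can be sketched as follows. Assumptions: the anticipation mechanism is reduced to a caller-supplied predicate, the buffer lives only in memory (HARMONI also uses FLASH), and Python's `zlib` (DEFLATE, an LZ77-family codec) stands in for the LZ-78 compressor used in the experiments. `StoreAndForward` is a hypothetical name.

```python
import zlib
from collections import deque
from typing import Callable, Deque, List


class StoreAndForward:
    """Buffers readings; sends one compressed batch when the link is up and OK-to-send."""

    def __init__(self, ok_to_send: Callable[[], bool]) -> None:
        self.pending: Deque[str] = deque()
        self.sent: List[bytes] = []
        self.ok_to_send = ok_to_send        # stand-in for the anticipation mechanism

    def enqueue(self, reading: str) -> None:
        self.pending.append(reading)

    def try_flush(self, link_up: bool) -> bool:
        if not (link_up and self.pending and self.ok_to_send()):
            return False                    # store: keep everything buffered
        batch = "\n".join(self.pending).encode("utf-8")
        self.sent.append(zlib.compress(batch))  # forward: one compressed batch
        self.pending.clear()
        return True


saf = StoreAndForward(ok_to_send=lambda: True)
for i in range(100):
    saf.enqueue(f"hr={70 + i % 3}")
print(saf.try_flush(link_up=False), saf.try_flush(link_up=True))  # False True
raw = "\n".join(f"hr={70 + i % 3}" for i in range(100)).encode("utf-8")
print(len(raw), len(saf.sent[0]))  # the compressed batch is far smaller than the raw text
```

Batching before compressing matters: a dictionary coder finds more repetition in one large batch than in many single readings.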
Pattern Recognition Engine Implementation
• Rules are specified using EventScript, which is similar to regular expressions
• A compiler (on a desktop computer) translates EventScript into a specification for a deterministic finite automaton (DFA)
• We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications
• Effectively, we are running finite state machines with some tweaks:
– A unique memory heap for each DFA, accessible across states
– A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and on state transitions in the DFA
[Figure: EventScript → compiler (desktop) → DFA → rule server (remote server) → rule manager → event engine on the Nokia 770]
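The run-time engine's core loop, a DFA with a private heap and transition hooks, can be sketched in Python. This is not the EventScript runtime or its Scheme interpreter; the example automaton (alert on three consecutive readings above 100 bpm), the fallback-to-start rule for undefined transitions, and all names are assumptions for illustration.

```python
from typing import Any, Callable, Dict, Optional, Set, Tuple

Heap = Dict[str, Any]
Action = Callable[[Heap, Any], None]
Transitions = Dict[Tuple[str, str], Tuple[str, Optional[Action]]]


class DFA:
    """A DFA over classified events, with a per-DFA heap shared by all states."""

    def __init__(self, start: str, accept: Set[str], delta: Transitions,
                 classify: Callable[[Any], str]) -> None:
        self.start = self.state = start
        self.accept, self.delta, self.classify = accept, delta, classify
        self.heap: Heap = {}                # private memory, survives state changes

    def step(self, event: Any) -> bool:
        """Consume one event; return True if the DFA is now in an accepting state."""
        key = (self.state, self.classify(event))
        # Undefined transitions fall back to the start state with no action.
        self.state, action = self.delta.get(key, (self.start, None))
        if action is not None:
            action(self.heap, event)        # hook run on the transition
        return self.state in self.accept


def record_alert(heap: Heap, event: Any) -> None:
    heap["alerts"] = heap.get("alerts", 0) + 1
    heap["last"] = event


delta: Transitions = {
    ("s0", "H"): ("s1", None),
    ("s1", "H"): ("s2", None),
    ("s2", "H"): ("s3", record_alert),      # third consecutive high reading
    ("s3", "H"): ("s3", None),              # stay accepting without re-alerting
}
dfa = DFA("s0", {"s3"}, delta, classify=lambda hr: "H" if hr > 100 else "L")
results = [dfa.step(hr) for hr in [90, 105, 110, 120, 130, 80, 101]]
print(results, dfa.heap)
```

Because each event costs one dictionary lookup, this style of engine suits a resource-constrained handheld; the heap carries whatever aggregate state the transition hooks need.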
HARMONI Implementation Platform
• Nokia 770 Internet Tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and a development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
• Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports the Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
• WiTilt 3-axis accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating
• DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome, M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: (left) "Data Generated From Sensor": heart rate (bpm) vs. sample number; (right) data transmitted over the network (in KB) for each assumed context (None, Office, Gym, Office+Gym), uncompressed and with LZ-78 compression]
• Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
• Filtering (S3) results in an 85% reduction in bandwidth consumption
• The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in LZ compression
Result 2: Impact of Personalization on Event Filtering
[Figure: (left) "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute) vs. time; (right) data transmitted over the network (in KB) for User 2-1 and User 2-2: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering]
• Sensor streams exhibit significant statistical variation across individuals
• The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
HARMONI in Practice: Sensor-based Context
• Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– A higher "normal" threshold for each of the 3 distinct states (0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are also used to classify "falls" for elderly patients
• Bandwidth savings between 26% and 73% for our sample population
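Context-based filtering with an accelerometer-derived state can be sketched as below. The per-state normal bands are the three ranges listed above, assumed here to be heart-rate ranges in bpm for sitting, walking, and running respectively; the accelerometer cut points (0.2 and 1.0, in arbitrary units) are hypothetical placeholders, since no classifier details are given.

```python
from typing import List, Tuple

# Per-state "normal" bands from the slide, assumed to be heart rate in bpm.
NORMAL_HR = {"sitting": (0, 70), "walking": (70, 110), "running": (110, 170)}


def activity_state(accel_amplitude: float) -> str:
    """Classify activity from accelerometer amplitude (cut points are hypothetical)."""
    if accel_amplitude < 0.2:
        return "sitting"
    if accel_amplitude < 1.0:
        return "walking"
    return "running"


def readings_to_send(samples: List[Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Keep only heart-rate readings that are abnormal for the user's current state."""
    out = []
    for accel, hr in samples:
        state = activity_state(accel)
        lo, hi = NORMAL_HR[state]
        if not lo <= hr <= hi:
            out.append((state, hr))
    return out


samples = [(0.1, 65), (0.1, 95), (0.5, 95), (1.5, 150), (1.5, 60)]
print(readings_to_send(samples))  # [('sitting', 95), ('running', 60)]
```

The same 95 bpm reading is suppressed while walking but transmitted while sitting, which is how context turns a fixed threshold into a state-aware filter and reduces bandwidth.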
IBM Research
copy 2006 IBM Corporation
Chapter 3
The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics
HarmoniHarmoni Context Context--based Event based Event Processing on the Mobile HubProcessing on the Mobile Hub
Century Technologies Stream Storage Provenance and QoE
IBM Research
copy 2006 IBM Corporation
Century Research Goals and TechnologiesOur research goal is to make Online Health Analytic Infrastructure for healthcare ecosystem to support the various services
Challenges Architectural Goals Technologies amp Approaches
Integrating massive volumes of disparate data
Extensible Data ModelScalable Infrastructure
Need for sophisticated analytics
Growing collaboration across ecosystem
Integrity of data ampTracking the analytic data
Timely feedback
Open and scalable analysis framework
Enterprise sharing of sensor data
Quality of Events (QoE) and provenance support for backtracking amp auditing
Real-time Processing
Large number of concurrent users
Privacy and Security
Scalable device and user management
Protection of personal medical data
Extensible Stream Storage System
Distributed Stream Analysis Runtime
Quality Of Event
Hybrid Provenance System
Interoperability
Privacy amp Security Technology
Stream Processing Platform
IBM Research
copy 2007 IBM Corporation
Extensiblendash To new data sources (eg EMG EEG accelerometer Activity)ndash To new analysis algorithms (eg myocardia polyneuropathy Alzheimer)
Scalabilityndash Handles large numbers of patients
Robustnessndash Resilient to sensor or infrastructure failures
bull Exacerbated in remote monitoring settings
Easy to Usendash Separate medical domain knowledge from IT knowledge
Framework built on top of IBM System Sndash state of the art middleware for high-volume stream processing analytics
Programming model centered around System S concept of Processing Element (PE)
ndash PEs represent atomic stream transformersndash Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
ndash Persistence of state (eg analyze HR variations over 6 months)ndash Management of patient data (eg patientrsquos intermediate diagnosis)ndash Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sourcesndash Currently supporting ECG BP Glucose Weight SPO2 Temperature
Vertically designed systems for the analysis of health monitoring data exist today
ndash Closed custom-built for specific diagnosisndash Non-extensible Support for limited sensor devicesndash Small patient footprint only good for trials
Analysis Framework (1) The Approach
SPA
SPM
DFE PSD
INQ GUI
DataSource
Source PE
NES IPNVBF
ldquoMarkrdquo
DUP
WLG
IPN
NES
DUP
PSDDFESPASPM
VBFSPA
SPM DFE PSDDataSource
VBF
Source PEWLG SGD PCP JAE DSN
N(CE)S^2
DUPP
IPN
SCF
JoinCCIPN
N(CE)S^2
PFA IPN
SSFJoinSC IPN
N(CE)S^2
SLF
JoinL
IPN
NES
DUP
IPN
ES IPN
JoinL IPN
DataSource
Source PEWLG
SPA
SPM
DFE PSD
INQ GUI
DataSource
Source PE
NES IPNVBF
ldquoMarkrdquo
DUP
WLG
SPA
SPM
DFE PSD
INQ GUI
DataSource
Source PE
NES IPNVBF
ldquoMarkrdquo
DUP
WLG
Analysis Component
AnalysisJob
Major Practical BenefitSeparate PE developerrsquos job (medical domain knowledge) from infrastructure issues(scalability fault tolerance stream connections etc)
Current Solutions New Requirements Our Innovations
IBM Research
copy 2007 IBM Corporation
Analysis Framework (2) PE Example
bull PE developers need not to be concerned with the communication of Stream Data Objectsbull When an SDO is received by the PE System S routes the SDO to the appropriate PE bull Developers only need to code the logic of the algorithms used by the PE to annotate modify
or create new SDOs
Normalization
Fast FourierTransform
Input ECG
AR PE
AlertGenerator
Output Alert
Neural Network Trained during
initialization
IBM Research
copy 2007 IBM Corporation
Century Event Management Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2007 IBM Corporation
Event Management Service (1)
Annotator
time 1 2 3
XML
Annotator
time 1 2 3
Not extensibleLimited scalability
Fully extensibleBut does not scale to
high event rate
The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders
Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language
XML
XML
XML
Potential Approaches
1 Relational DB (Annotation table)
2 Relational-XML DB (XML-based Annotation data model)
IBM Research
copy 2007 IBM Corporation
Event Management Service (3) Hybrid Storage Model
Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate
time
arrival rateaverage arrival rate
traffic average databaseservice rate
Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ
UnstabilityRegion
Annotator
time 1 2 3
3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)
Stream Buffer How do we designthe stream buffer
XMLXML
XML
XML
D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007
IBM Research
copy 2007 IBM Corporation
Event Management Service (4) The Stream Buffer
System S STG(File System Store)
Traffic Monitor
Storage Requests(events of type ldquoeventStorerdquo)
DatabasePE
Traffic Models Traffic
Predictor Event Store Load Monitor
Storage Requests
PE1
PE2
PE3
PE4
PE5SPE
Analysis Graph
Towards the delivery system
Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database
EventStore
(Relational-XML
IBM Research
copy 2007 IBM Corporation
Century Provenance Service
[Figure: Century server infrastructure, with the Provenance Service highlighted (Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver, and provenance metadata and state data storage). Other components shown include:
- the USN gateway
- a toolkit (IDE for application developers, sensor simulator, event management client library)
- patient, administrator and stakeholder portals
- the event management service and event store
- the analysis framework with QoE manager and state manager
- group management
- privacy manager
- authentication & authorization
- device catalogue
- subscription service
- an IHE adapter and interoperability container
- solution delivery services for doctors, lifestyle consultants, healthcare IT specialists and government/CDC.]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert. Patient: Doe, John. Condition: Abnormal reaction. Recommendation: The Century system has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating; a known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly. The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
[Figure: sensors feed processing elements PE1, PE2 and PEj, which run algorithms f1, f2 and f3 and report to a provenance server/store.]
What is process provenance? The ability to retrace which "system components" were on the data path. For example, PEs report to the provenance store: "I ran algorithm f1 with parameters u, v and w at time t0; I published on port 1," or "I received some data on port 2; I ran algorithm f3 on the data at time t1."
What is data provenance? The ability to retrace which "data elements" led to the generation of output data. For example, PEj receives Data1 and Data2 from the sensors and puts metadata in DataOut4 saying it was dependent on Data1 and Data2.
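The two approaches record different things. Process provenance logs what each component did. Data provenance attaches dependency metadata to each output element. The minimal sketch below contrasts the two. The record layouts, the parameter values and the averaging step in PEj are hypothetical illustrations, not Century's schema.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """Process provenance: what a component did, reported to a provenance store."""
    pe: str
    algorithm: str
    params: dict
    time: float
    port: int


@dataclass
class DataElement:
    """Data provenance: each element carries the IDs of the elements it came from."""
    element_id: str
    value: float
    derived_from: list = field(default_factory=list)


# Process provenance: PE1 reports "I ran f1 with u, v, w at t0; I published on port 1".
provenance_store = [ProcessRecord("PE1", "f1", {"u": 1, "v": 2, "w": 3}, time=0.0, port=1)]

# Data provenance: PEj combines Data1 and Data2 and annotates DataOut4 with both IDs.
data1 = DataElement("Data1", 72.0)
data2 = DataElement("Data2", 98.0)
data_out4 = DataElement("DataOut4", (data1.value + data2.value) / 2,
                        derived_from=[data1.element_id, data2.element_id])


def components_on_path(store, time_lo, time_hi):
    """Answer a process-provenance query: which PEs were active in the time range?"""
    return [r.pe for r in store if time_lo <= r.time <= time_hi]


print(components_on_path(provenance_store, 0.0, 1.0), data_out4.derived_from)
```

Per-element annotation (`derived_from`) is simple but grows with every output. That storage cost is what the stream-specific challenges slide is concerned with.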
Provenance (2): Prior Work on Provenance
[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance vs. data provenance) and by event and processing rates.]
- EU Provenance (PASOA, PreServ): captures workflow bindings
- Scientific SOA workflows (Karma): workflow and application binding notifications; stateless, application-defined data provenance
- Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
- Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
- File systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
- Medical stream provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight.
Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput.
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data.
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time.
We need a provenance system that:
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model
We need to support both:
- Process provenance: "Show me the set of components involved in the generation of alert Ei."
- Data provenance: "Show me the stream segments related to the generation of alert Ei."
TVC Model of Hybrid Provenance
Associate an "equation-based" specification of data dependency between every output port of an individual PE and the corresponding input ports.
- The dependency is thus not specified on every data element, but remains constant for the same processing logic.
Provenance data is recorded at the granularity of stream "sub-sets", i.e., dependence over sets of time windows.
- The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ:
$$\mathrm{out}_i(t) \Leftarrow \bigcup_{S_j \,\in\, \text{inputs to component } i} S_j(t'), \qquad t' \in [\,t - \Delta_j,\; t\,]$$
[Figure: a processing element with input streams Sj (samples Sj(t) to Sj(t-3)) and Sk (samples Sk(t) to Sk(t-3)) and output stream Si (samples Si(t), Si(t-1), Si(t-2)). Its TVC dependency rule blocks map each PE state, here defined by location context, to a rule block ID and a dependency rule:]
- State "gym", rule block 1: Ei(t) ⇐ Sj(t) ∪ Sj(t-3 : t-2) ∪ Sk(t-2 : t-1)
- State "home", rule block 2: Ei(t) ⇐ Sj(t-2 : t) ∪ Sk(t-1 : t)
The dependency equation can, however, vary based on the PE's internal state or some external context. For example:
    IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10 : t)) > 140
    ELSE generate ALERT iff AVG(HR(t-30 : t)) > 90
- The solution is to allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data
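The state-dependent rule blocks can be sketched as a lookup from PE state to a list of time windows. The code evaluates the window for a given state to get the causative samples, alongside the location-dependent alert rule. This is a minimal sketch with several assumptions:
- the rule blocks and thresholds come from the slide;
- windows are inclusive and timestamps are integer sample indices;
- the `(stream, far, near)` encoding and a dict-based heart-rate store are illustrative choices only.

```python
# Rule blocks from the slide: state -> windows (stream, far, near), meaning samples
# with timestamps in [t - far, t - near], inclusive (an assumed encoding).
RULE_BLOCKS = {
    "gym":  {"block_id": 1, "windows": [("Sj", 0, 0), ("Sj", 3, 2), ("Sk", 2, 1)]},
    "home": {"block_id": 2, "windows": [("Sj", 2, 0), ("Sk", 1, 0)]},
}


def causative_elements(t: int, state: str) -> set:
    """Return the (stream, timestamp) samples that output Ei(t) depends on."""
    result = set()
    for stream, far, near in RULE_BLOCKS[state]["windows"]:
        for ts in range(t - far, t - near + 1):
            result.add((stream, ts))
    return result


def should_alert(hr: dict, t: int, loc: str) -> bool:
    """Slide's alert rule; hr maps integer time -> heart rate (assumed layout)."""
    window, threshold = (10, 140) if loc == "gym" else (30, 90)
    samples = [hr[s] for s in range(t - window, t + 1) if s in hr]
    return bool(samples) and sum(samples) / len(samples) > threshold


print(sorted(causative_elements(10, "gym")))
print(sorted(causative_elements(10, "home")))
```

Only the state ("gym" or "home") needs to be stored with each output element. The rule blocks themselves are stored once per PE, which is where the storage savings of the TVC approach come from.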
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query:
- For efficiency, the relevant metadata is distributed across multiple DBs
- Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead
Resolving a query requires three steps:
- Determine the IDs of the input ports and the causative stream bindings
- Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
- Apply the TVC filter to obtain the causative set of elements
[Figure: provenance query architecture. The Provenance Query Server consults the Stream and Port Mapping (SAPM) store, holding a PC dependency table (e.g., PEj, f1(), Po1) and a dynamic stream mapping table (e.g., t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6). It also consults the data store (e.g., element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home").]
At storage time, the system:
1. Stores bindings between every input/output port and the associated streams
2. Stores creation/modification times of PEs
3. Stores the TVC function f() for every output port/input port pair
To answer Query(Ei), the Provenance Query Server:
1. Obtains the I/O stream mappings
2. Retrieves the dependency equation
3. Retrieves the elements
4. Applies the equation to determine the set of causative elements, and returns that set
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
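The query steps can be sketched end to end over three toy tables, loosely modeled on the slide's dependency table, dynamic stream mapping table and data store. This is a minimal sketch. Every table name, layout and value here is hypothetical, and a real system would query databases instead of in-memory lists. The point it illustrates is that a stream rebinding at t=15 changes which stored elements count as causative.

```python
from bisect import bisect_right

# Hypothetical dependency table: (PE, output port) -> state -> [(input port, far, near)],
# meaning input samples with timestamps in [t - far, t - near].
DEPENDENCY_TABLE = {("PEj", "Po1"): {"gym": [("Pi1", 3, 0)], "home": [("Pi1", 1, 0)]}}

# Hypothetical dynamic stream mapping: port -> sorted [(bind time, stream ID)].
STREAM_MAPPING = {"Pi1": [(10, "S1"), (15, "S12")]}

# Hypothetical data store: (element ID, timestamp, stream ID).
DATA_STORE = [(f"e{ts}", ts, "S1" if ts < 15 else "S12") for ts in range(10, 21)]


def stream_bound_at(port: str, ts: int):
    """Step 1: find which stream was bound to `port` at time `ts` (None if unbound)."""
    bindings = STREAM_MAPPING[port]
    idx = bisect_right([bind_time for bind_time, _ in bindings], ts) - 1
    return bindings[idx][1] if idx >= 0 else None


def resolve(pe: str, out_port: str, t: int, state: str) -> set:
    """Return IDs of stored elements that caused the output of (pe, out_port) at time t."""
    rules = DEPENDENCY_TABLE[(pe, out_port)][state]        # Step 2: dependency equation
    causes = set()
    for in_port, far, near in rules:
        for ts in range(t - far, t - near + 1):
            sid = stream_bound_at(in_port, ts)              # Step 1: stream binding at ts
            # Steps 3-4: retrieve elements in the window and keep those on the bound stream.
            causes.update(eid for eid, ets, esid in DATA_STORE if ets == ts and esid == sid)
    return causes


print(sorted(resolve("PEj", "Po1", 16, "gym")))
```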
Key Research Benefits of TVC Model for Provenance
TVC model overhead decreases when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
- Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 ms) provenance latency, compared to O(s) in the EU Provenance and Karma systems)
- Statefulness: time and value functions allow a compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 K events/s) system throughput.
Quality of Events in Century
[Figure: Century server infrastructure architecture diagram; the component relevant here is the QoE Manager in the Analysis Framework.]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type, which enables a pub-sub model of stream binding.
However, the quality of any stream (raw sensor data or intermediate analytic result) varies dynamically. Various types of dynamism occur:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- The action taken on an analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
- Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
- Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE raises an ALERT, prompting the question "What generated this alert, and can we trust it?" Input stream elements arrive with gaps and out of order. Annotations: "The sensor is faulty and generates stream elements of low quality" (remedy: replace sensor) and "Stream elements are missing (possibly due to network issues)".]
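The dynamic-rebinding case (switching from SpO2-derived to ECG-derived heart rate when quality dips) can be sketched as a small policy object. This is a minimal sketch assuming a scalar QoE score in [0, 1]. The class name and the 0.6/0.8 thresholds are hypothetical. The hysteresis band is a design choice added here, not something the slide specifies; it keeps the binding from flapping when QoE hovers near a single threshold.

```python
class HrSourceBinder:
    """Hypothetical rebinding policy with hysteresis: switch to ECG when the SpO2
    stream's QoE falls below `low`; switch back only once it recovers above `high`."""

    def __init__(self, low: float = 0.6, high: float = 0.8):
        self.low, self.high = low, high
        self.source = "SpO2"  # cheaper source preferred by default

    def update(self, spo2_qoe: float) -> str:
        if self.source == "SpO2" and spo2_qoe < self.low:
            self.source = "ECG"    # rebind to the more expensive, higher-quality source
        elif self.source == "ECG" and spo2_qoe > self.high:
            self.source = "SpO2"   # quality has recovered; rebind to the cheaper source
        return self.source


binder = HrSourceBinder()
print([binder.update(q) for q in (0.9, 0.5, 0.7, 0.85)])
```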
Century QoE: Some Basic QoE Measures
[Figure: a PE expects input elements at t-5, t-4, t-3, t-2, t-1 and t, but currently receives only t-5, t-2 and t.]
By correlating what was expected with what was received, we can estimate the quality of our input. In this case, we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE receives one input stream with elements at t-12, t-11 and t-10, and another with elements at t-5, t-2 and t.]
Another possibility is to correlate the timestamps of the inputs to estimate the quality. In this case, we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: a PE consumes input elements t-5 through t, of which only t is "useful".]
Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). Here, we can say that the output of the PE has low quality: its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
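The three measures can be computed directly from timestamps. This minimal sketch reproduces the slide's three examples with t = 0. The function names and the choice to measure synchronization as the gap between the latest timestamps of two streams are assumptions made for illustration.

```python
def completeness(expected: list, received: list) -> float:
    """Fraction of expected timestamps that actually arrived."""
    return len(set(expected) & set(received)) / len(expected)


def max_skew(stream_a: list, stream_b: list) -> int:
    """Gap between the latest timestamps of two input streams (0 means in sync)."""
    return abs(max(stream_a) - max(stream_b))


def useful_fraction(consumed: list, useful: list) -> float:
    """Fraction of consumed input elements that actually triggered the output."""
    return len(set(useful) & set(consumed)) / len(consumed)


expected = [-5, -4, -3, -2, -1, 0]                # t-5 .. t, with t = 0
print(completeness(expected, [-5, -2, 0]))       # half the expected input is missing
print(max_skew([-12, -11, -10], [-5, -2, 0]))    # inputs are 10 time units apart
print(useful_fraction(expected, [0]))            # only 1 of 6 inputs was useful
```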
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds:
- Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
- HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2 and BP monitors
Open question: how to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
Sensors have different, time-varying QoE:
- QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
Open question: how to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
Provenance architectures must address the prohibitive cost of storing all stream data:
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Open question: how to develop a combination of hybrid model- and replay-based provenance?
Better techniques for establishing provenance relationships are needed:
- User-specified models may not always be available; learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of the analysis logic
Provenance data itself is sensitive and subject to privacy requirements:
- The right to access provenance may depend on the current "role" of the provider and the user's medical history
Open question: how to define context-dependent access-control models of provenance data in a multi-provider environment?
Conclusions
- A successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
- The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
  - Key features include adaptive event filtering, personalization and predictive data transmission
- The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics
  - Key features include high-volume stream support, provenance, and event/information quality
HARMONI Functional Architecture
[Figure: sensors connect over a short-range wireless link to the mobile device, which connects over a wireless link to the Internet and the remote server.
- Mobile device: data collection/data adapters (sensor readings, sensor control, device resources, context), data processing, a light-weight pattern recognition engine, a rule manager, an action triggering mechanism, a user interface, and intelligent data transmission with an anticipation mechanism.
- Remote server: DB, pattern recognition engine, pattern learning engine, TAPAS rule server, external context sources, external rule specifications, external action specifications, and an external action triggering mechanism.]
- Simple recognition of pre-specified temporal patterns across sensor streams
- Event engine driven by deterministic finite automata (DFAs)
- Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders)
- Compressed transmission to the server when the interface is available AND "anticipation" indicates "OK-to-Send"
  - Intelligent compression schemes reduce the volume of traffic
- Store-and-forward otherwise
  - Data is stored in a memory buffer and on FLASH storage
- The Rule Manager coordinates with the backend server to download currently active or applicable rules
  - It can run, activate or deactivate multiple rules simultaneously and dynamically
  - Dynamic downloading of rules provides implicit "context awareness" at the client
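The transmission rule above (send compressed data only when the link is up and anticipation says OK-to-Send, otherwise store and forward) can be sketched as follows. This is a minimal sketch. The `Transmitter` class, its `send_fn` callback and JSON framing are assumptions. Python's `zlib` (DEFLATE, an LZ77-family compressor) is swapped in for convenience; HARMONI's own compression scheme is not reproduced here.

```python
import json
import zlib
from collections import deque


class Transmitter:
    """Hypothetical sender: compress and send when the interface is up and the
    anticipation mechanism says OK-to-Send; otherwise store and forward."""

    def __init__(self, send_fn):
        self.send_fn = send_fn   # callback that puts bytes on the network
        self.pending = deque()   # stand-in for the memory/FLASH buffer

    def submit(self, readings: list, link_up: bool, ok_to_send: bool) -> bool:
        self.pending.append(readings)
        if not (link_up and ok_to_send):
            return False         # store-and-forward: keep buffering
        payload = json.dumps(list(self.pending)).encode()
        self.send_fn(zlib.compress(payload))
        self.pending.clear()
        return True


sent = []
tx = Transmitter(sent.append)
tx.submit([72, 73], link_up=False, ok_to_send=True)   # link down: buffered
tx.submit([74], link_up=True, ok_to_send=False)       # anticipation says wait
tx.submit([75], link_up=True, ok_to_send=True)        # flushes all three batches
print(json.loads(zlib.decompress(sent[0])))
```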
Pattern Recognition Engine Implementation
Rules are specified using EventScript, which is similar to regular expressions. A compiler (on a desktop computer) translates EventScript into a specification for a deterministic finite automaton (DFA). We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications. Effectively, we are running finite state machines with some tweaks:
- A unique memory heap for each DFA, accessible across states
- A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and on state transitions in the DFA
[Figure: on the desktop, the compiler translates EventScript into a DFA. The DFA is served by the rule server on the remote server to the rule manager and event engine on the Nokia 770.]
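The run-time side of this design can be sketched as a small DFA runner with a per-DFA memory heap and a callback on accepting transitions. The callback stands in for the slide's LISP/Scheme hooks. This is a minimal sketch with several assumptions:
- EventScript syntax is not shown on the slide, so the transition table is written by hand;
- the example pattern ("three consecutive high heart-rate readings") and its 100 bpm threshold are hypothetical.

```python
class DFA:
    """Minimal DFA runner with a per-DFA memory heap and an accept-time action."""

    def __init__(self, start, transitions, accepting, on_accept=None):
        self.start = start
        self.state = start
        self.transitions = transitions  # (state, symbol) -> next state
        self.accepting = accepting
        self.on_accept = on_accept      # action run on entering an accepting state
        self.heap = {}                  # memory shared across this DFA's states

    def feed(self, symbol):
        # Any (state, symbol) pair not in the table sends the DFA back to its start.
        self.state = self.transitions.get((self.state, symbol), self.start)
        if self.state in self.accepting and self.on_accept:
            self.on_accept(self)
        return self.state


def count_alert(dfa):
    dfa.heap["alerts"] = dfa.heap.get("alerts", 0) + 1


# Accepts whenever the last three symbols were HIGH (alphabet: HIGH, NORMAL).
tachy = DFA(start=0,
            transitions={(0, "HIGH"): 1, (1, "HIGH"): 2, (2, "HIGH"): 3, (3, "HIGH"): 3},
            accepting={3}, on_accept=count_alert)

for hr in [95, 110, 120, 130, 140, 90, 105]:
    tachy.feed("HIGH" if hr > 100 else "NORMAL")
print(tachy.heap, tachy.state)
```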
HARMONI Implementation Platform
Nokia 770 Internet Tablet / N800:
- ARM processor, Linux-based
- High-resolution display (800x480), touch screen with up to 65,536 colors
- 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB; can be used for virtual memory)
- Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
- Relatively cheap: $350
- http://www.maemo.org provides open-source software and a development environment
- Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor:
- Provides heart rate and oxygen saturation
- Supports the Bluetooth Serial Port Profile (SPP)
- 120 hours of continuous operation with 2 AA batteries
- Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis accelerometer:
- Output baud of 57.6 Kbps
- 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome, M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear in PerCom 2008.
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: data generated from the sensor; heart rate (bpm, 0 to 200) plotted against sample number (1 to over 9,400).]
- Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
- Filtering (S3) results in an 85% reduction in bandwidth consumption
- The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in LZ compression
[Figure: data transmitted over the network (in KB, 0 to 80) for each assumed context (None, Office, Gym, Office+Gym), uncompressed and with LZ-78 compression.]
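The observation about floating-point values can be seen with a textbook LZ-78 encoder, which emits (index of longest known prefix, next character) pairs. This is a minimal sketch of classic LZ-78, not HARMONI's compressor. The sample strings are synthetic: exactly repeated readings build long dictionary phrases, while noisy two-decimal readings rarely repeat and produce many more tokens.

```python
import random


def lz78_encode(text: str) -> list:
    """Classic LZ-78: emit (index of longest known prefix, next char) pairs."""
    dictionary = {"": 0}
    out, prefix = [], ""
    for ch in text:
        if prefix + ch in dictionary:
            prefix += ch
        else:
            out.append((dictionary[prefix], ch))
            dictionary[prefix + ch] = len(dictionary)
            prefix = ""
    if prefix:  # flush a trailing phrase that is already in the dictionary
        out.append((dictionary[prefix], ""))
    return out


def lz78_decode(tokens: list) -> str:
    phrases = [""]
    for idx, ch in tokens:
        phrases.append(phrases[idx] + ch)
    return "".join(phrases[1:])


exact = ",".join(["72.0"] * 100)  # identical readings
rng = random.Random(0)
noisy = ",".join(f"{rng.uniform(60, 80):.2f}" for _ in range(100))  # jittery readings
print(len(lz78_encode(exact)), len(lz78_encode(noisy)))
```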
Result 2: Impact of Personalization on Event Filtering
[Figure: variation in heart rate readings (beats per minute, 0 to 200) over time for different individuals.]
[Figure: data transmitted over the network (in KB, 0 to 100) for User 2-1 and User 2-2. It compares uncompressed, compressed, uncompressed generic context filtering, and uncompressed personalized context filtering.]
- Sensor streams exhibit significant statistical variation across individuals
- The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running.
- A higher "normal" threshold applies for each of the 3 distinct states (0-70, 70-110, 110-170), across all users
- Aside: accelerometer readings are also used to classify "falls" for elderly patients
Bandwidth savings ranged between 26% and 73% for our sample population.
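Context-based filtering can be sketched in two steps:
1. Classify the activity state from accelerometer amplitude.
2. Transmit a heart-rate reading only if it exceeds the "normal" ceiling for that state.

This is a minimal sketch under loudly stated assumptions. The slide's ranges (0-70, 70-110, 110-170) are read here as accelerometer-amplitude bins. The per-state heart-rate ceilings are hypothetical values chosen for illustration.

```python
# Assumption: the slide's ranges are accelerometer-amplitude bins (upper bounds).
STATE_BINS = [(70, "sitting"), (110, "walking"), (170, "running")]
# Hypothetical per-state "normal" heart-rate ceilings in bpm (not from the slide).
HR_CEILING = {"sitting": 100, "walking": 130, "running": 160}


def activity_state(amplitude: float) -> str:
    for upper, state in STATE_BINS:
        if amplitude < upper:
            return state
    return "running"  # amplitudes at or above 170 are treated as running


def filter_readings(samples: list) -> list:
    """Keep (amplitude, hr) samples whose HR exceeds the ceiling for their context."""
    return [(a, hr) for a, hr in samples if hr > HR_CEILING[activity_state(a)]]


samples = [(20, 85), (20, 115), (90, 120), (150, 150), (150, 175)]
print(filter_readings(samples))
```

A single context-free ceiling of 100 bpm would forward four of these five readings. The context-aware rule forwards two. This is the kind of saving the slide reports, although these numbers are illustrative only.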
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports its various services.
Challenges:
- Integrating massive volumes of disparate data
- Need for sophisticated analytics
- Growing collaboration across the ecosystem
- Integrity of data and tracking of the analytic data
- Timely feedback
- Large number of concurrent users
- Privacy and security
Architectural goals:
- Extensible data model, scalable infrastructure
- Open and scalable analysis framework
- Enterprise sharing of sensor data
- Quality of Events (QoE) and provenance support for backtracking and auditing
- Real-time processing
- Scalable device and user management
- Protection of personal medical data
Technologies and approaches:
- Extensible stream storage system
- Distributed stream analysis runtime
- Quality of Event
- Hybrid provenance system
- Interoperability
- Privacy and security technology
- Stream processing platform
Analysis Framework (1): The Approach
Current solutions:
- Vertically designed systems for the analysis of health monitoring data exist today
  - Closed: custom-built for a specific diagnosis
  - Non-extensible: support for limited sensor devices
  - Small patient footprint: only good for trials
New requirements:
- Extensible
  - To new data sources (e.g., EMG, EEG, accelerometer, activity)
  - To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
- Scalability: handles large numbers of patients
- Robustness: resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings
- Easy to use: separate medical domain knowledge from IT knowledge
Our innovations:
- Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
- Programming model centered around the System S concept of a Processing Element (PE)
  - PEs represent atomic stream transformers
  - Stream data flow with pub-sub semantics
- Century also provides useful medical analytic services
  - Persistence of state (e.g., analyze HR variations over 6 months)
  - Management of patient data (e.g., the patient's intermediate diagnosis)
  - Computation and rebinding based on the quality of events from medical streams
- Century supports an open set of data sources
  - Currently supporting ECG, BP, glucose, weight, SpO2 and temperature
[Figure: an analysis job composed of several analysis components; each component is a graph of PEs fed by a data source through a source PE.]
Major practical benefit: the PE developer's job (medical domain knowledge) is separated from infrastructure issues (scalability, fault tolerance, stream connections, etc.).
Analysis Framework (2): PE Example
- PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
- When an SDO is received, System S routes it to the appropriate PE
- Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs
[Figure: an AR PE. Input ECG passes through normalization, a fast Fourier transform, and a neural network (trained during initialization) to an alert generator, which outputs an alert.]
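The division of labor described above can be sketched with a toy pub-sub router. Routing lives in the router, and a PE subclass only implements `process`. This is a minimal sketch. `Router`, `PE`, the topic names and the SDO dict layout are hypothetical and are not the System S API. To keep the example short, the FFT and neural-network stages are replaced by a simple mean-threshold rule that has no clinical meaning.

```python
from collections import defaultdict


class Router:
    """Toy pub-sub router standing in for System S stream routing (hypothetical API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, pe: "PE") -> None:
        self.subscribers[topic].append(pe)

    def publish(self, topic: str, sdo: dict) -> None:
        for pe in self.subscribers[topic]:
            pe.receive(sdo)


class PE:
    """A PE developer only implements `process`; delivery is handled by the router."""

    def __init__(self, router: Router, out_topic: str):
        self.router, self.out_topic = router, out_topic

    def receive(self, sdo: dict) -> None:
        result = self.process(sdo)
        if result is not None:
            self.router.publish(self.out_topic, result)

    def process(self, sdo: dict):
        raise NotImplementedError


class Normalize(PE):
    def process(self, sdo):
        xs = sdo["samples"]
        lo, hi = min(xs), max(xs)
        span = (hi - lo) or 1.0  # avoid division by zero on flat signals
        return {**sdo, "samples": [(x - lo) / span for x in xs]}


class AlertGenerator(PE):
    # Placeholder for the FFT + neural-network stages: a simple threshold rule.
    def process(self, sdo):
        mean = sum(sdo["samples"]) / len(sdo["samples"])
        return {"patient": sdo["patient"], "alert": True} if mean > 0.5 else None


class Sink(PE):
    def __init__(self, router: Router):
        super().__init__(router, out_topic="")
        self.received = []

    def process(self, sdo):
        self.received.append(sdo)
        return None


router = Router()
sink = Sink(router)
router.subscribe("ecg", Normalize(router, "ecg.norm"))
router.subscribe("ecg.norm", AlertGenerator(router, "alerts"))
router.subscribe("alerts", sink)
router.publish("ecg", {"patient": "p1", "samples": [0, 10, 9, 8]})
router.publish("ecg", {"patient": "p2", "samples": [0, 10, 1, 2]})
print(sink.received)
```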
Century Event Management Service
[Figure: Century server infrastructure architecture diagram; the component relevant here is the Event Management Service (event preprocessor, event filter/annotator, event storage manager, event store query service and event store).]
Event Management Service (1)
The problem:
- Persistence of large amounts of streaming data
- Efficient query mechanisms for stakeholders
Requirements:
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language
Potential approaches:
1 Relational DB (annotation table): not extensible, limited scalability
2 Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: an annotator storing events at times 1, 2, 3 into each of the two stores.]
IBM Research
copy 2007 IBM Corporation
Event Management Service (3) Hybrid Storage Model
Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate
time
arrival rateaverage arrival rate
traffic average databaseservice rate
Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ
UnstabilityRegion
Annotator
time 1 2 3
3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)
Stream Buffer How do we designthe stream buffer
XMLXML
XML
XML
D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007
IBM Research
copy 2007 IBM Corporation
Event Management Service (4) The Stream Buffer
System S STG(File System Store)
Traffic Monitor
Storage Requests(events of type ldquoeventStorerdquo)
DatabasePE
Traffic Models Traffic
Predictor Event Store Load Monitor
Storage Requests
PE1
PE2
PE3
PE4
PE5SPE
Analysis Graph
Towards the delivery system
Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database
EventStore
(Relational-XML
IBM Research
copy 2007 IBM Corporation
Century Provenance Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM Corporation26
An Example of Data Provenance Use
Urgent AlertPatient Doe John
Condition Abnormal ReactionRecommendation
The Century System has found a developing issue in your patientrsquos
medical condition The condition of the patient is deteriorating A known
side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day
Ultimately the physician is responsible for medical decisions and any actions taken In order
for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition
With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe
A few days later Dr Lee receives an alert from the Century prescription program
The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)
Thatrsquos unusual and a large decrease Before agreeing to
this change I need to understand on what basis the
system has made this recommendation and how
accurate it is
IBM Research
copy 2006 IBM Corporation
Provenance (1) Two Approaches to Provenance
f1
f2
f3
Provenance ServerStore
I ran algorithm f1 with parameters uv and w at time t0 I published on port 1
PE1
PE2
I recvd some data on port 2 I ran algorithm f3 on the data at time t1
What is process provenancebull The ability to retrace
which ldquosystem componentsrdquowere on the data path
What is data provenancebull The ability to retrace
which ldquodata elementsrdquo ledto generation of output data
PEj
Sensor
Sensor
Data2
DataOut4
Data1 Data2
I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2
Data1
Data1
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage processing, at the cost of higher provenance reconstruction overhead
Determine the IDs of input ports and the causative stream bindings
Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
Apply the TVC filter to obtain the causative set of elements
[Figure: TVC provenance query architecture]
Stream and Port Mapping (SAPM):
1. Store bindings between every input/output port and its associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port/input port
PC Dependency Table (example entry): PEj, f1(), Po1
Dynamic Stream Mapping Table (example entries): t=10, S1, Pi1; t=15, S15, I4; t=16, S17, O6
Data Store (example entries): element45, ts=15, SID=12, state="gym"; element66, ts=16, SID=17, state="home"
Provenance Query Server, Query(Ei):
1. Obtain I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve elements
4. Apply the equation to determine the set of causative elements
Return the set of elements
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007
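The four query steps can be sketched as follows. This is a hedged sketch, not the Century implementation: the table layouts, `stream_at`, `resolve` and all sample values are hypothetical, and for brevity it uses the stream bound to each input port at the query time t, so it ignores bindings that changed partway through a dependency window.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Element:
    elem_id: str
    ts: int
    stream_id: int


# Dynamic stream mapping table: input port -> [(bind_time, stream_id)], sorted by time.
STREAM_MAP: Dict[str, List[Tuple[int, int]]] = {"Pi1": [(10, 1), (15, 12)], "Pi2": [(10, 17)]}

# Dependency table: (PE, output port) -> {state: [(input port, start, end)]}, offsets relative to t.
DEPENDENCY: Dict[Tuple[str, str], Dict[str, List[Tuple[str, int, int]]]] = {
    ("PEj", "Po1"): {"gym": [("Pi1", -1, 0)], "home": [("Pi1", -2, 0), ("Pi2", -1, 0)]},
}


def stream_at(port: str, t: int) -> Optional[int]:
    """Stream bound to `port` at time t: the latest binding made at or before t."""
    bindings = STREAM_MAP[port]
    i = bisect_right([bind_time for bind_time, _ in bindings], t)
    return bindings[i - 1][1] if i else None


def resolve(pe: str, out_port: str, t: int, state: str, store: List[Element]) -> List[str]:
    """IDs of the elements that caused the output of (pe, out_port) at time t."""
    causes: Set[str] = set()
    for port, start, end in DEPENDENCY[(pe, out_port)][state]:  # dependency equation
        sid = stream_at(port, t)                                   # I/O stream mapping
        if sid is None:
            continue
        for e in store:                                            # retrieve + TVC filter
            if e.stream_id == sid and t + start <= e.ts <= t + end:
                causes.add(e.elem_id)
    return sorted(causes)
```

For example, with elements on streams 12 and 17 stored around t = 16, `resolve("PEj", "Po1", 16, "home", store)` returns the stream-12 elements in [14, 16] plus the stream-17 elements in [15, 16].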
Key Research Benefits of TVC Model for Provenance
TVC model overhead reduces when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in the EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 Kevents/sec) system throughput
Quality of Events in Century
[Figure: Century server infrastructure: USN gateway, toolkit (IDE, sensor simulator, event management client library), patient/administrator/stakeholder portals, storages, and the Portal, Platform, Analysis Framework (including the QoE Manager), Event Management, Provenance and Group Management services]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– This enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality
[Figure: an ALERT produced by a graph of PEs prompts the question "What generated this alert, and can we trust it?". Annotations: "The sensor is faulty and generates stream elements of low quality" (replace sensor); "Stream elements are missing (possibly due to network issues)".]
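The SpO2-to-ECG rebinding can be sketched as a small policy function. The QoE thresholds `low` and `high`, the hysteresis band between them, and the source names are assumptions for illustration; the slides do not say how Century decides when to rebind.

```python
from typing import Dict, List, Sequence


def choose_hr_source(qoe: Dict[str, float], current: str,
                     low: float = 0.6, high: float = 0.8) -> str:
    """Pick the heart-rate source for the next window (hypothetical policy).

    Stay on the cheaper SpO2-derived HR unless its QoE drops below `low`;
    once on ECG, return to SpO2 only after its QoE recovers to `high` or more.
    The gap between the two thresholds (hysteresis) avoids flapping.
    """
    if current == "spo2" and qoe["spo2"] < low:
        return "ecg"
    if current == "ecg" and qoe["spo2"] >= high:
        return "spo2"
    return current


def trace(spo2_qoe: Sequence[float], start: str = "spo2") -> List[str]:
    """Replay a series of SpO2 QoE estimates and record the binding chosen after each."""
    source, history = start, []
    for q in spo2_qoe:
        source = choose_hr_source({"spo2": q}, source)
        history.append(source)
    return history


print(trace([0.9, 0.5, 0.7, 0.85]))  # ['spo2', 'ecg', 'ecg', 'spo2']
```

Note that a QoE of 0.7 does not trigger a switch back to SpO2: without the hysteresis band, a sensor hovering around a single threshold would cause the stream graph to rebind on every window.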
Century QoE: Some Basic QoE Measures
[Figure: a PE expects input elements t-5, t-4, t-3, t-2, t-1, t but currently receives only t-5, t-2, t]
By correlating what was expected with what was received, we can estimate the quality of our input. In this particular case, we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE produces output t from two inputs, one carrying elements t-12, t-11, t-10 and the other t-5, t-2, t]
Another possibility is to correlate the timestamps of the inputs to estimate the quality. In this particular case, we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: a PE consumes input elements t-5 to t, of which only a small subset is the "useful" input]
Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). Here, we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, due to the small fraction of useful input stream elements.
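Each of the three measures can be computed as a simple ratio. In this sketch, timestamps are integers relative to t = 0, the set of "useful" elements is hypothetical, and the linear decay in `sync_quality` with its `tolerance` parameter is an assumption of the example rather than Century's definition.

```python
from typing import Iterable


def completeness(expected: Iterable[int], received: Iterable[int]) -> float:
    """Fraction of the expected timestamps that actually arrived."""
    exp = set(expected)
    return len(exp & set(received)) / len(exp) if exp else 1.0


def sync_quality(a: Iterable[int], b: Iterable[int], tolerance: int) -> float:
    """1.0 when the newest elements of two inputs align; falls linearly to 0 at `tolerance` ticks of skew."""
    skew = abs(max(a) - max(b))
    return max(0.0, 1.0 - skew / tolerance)


def useful_fraction(consumed: Iterable[int], useful: Iterable[int]) -> float:
    """Share of the consumed elements that actually contributed to triggering the output."""
    cons = set(consumed)
    return len(cons & set(useful)) / len(cons) if cons else 0.0


print(completeness(range(-5, 1), [-5, -2, 0]))          # 0.5
print(sync_quality([-12, -11, -10], [-5, -2, 0], 20))  # 0.5
print(useful_fraction(range(-5, 1), [0]))              # one useful element out of six
```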
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2 and BP monitors
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
How to develop a hybrid combination of model-based and replay-based provenance?
How to define context-dependent access-control models for provenance data in a multi-provider environment?
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission
The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality
Pattern Recognition Engine Implementation
Rules are specified using EventScript, which is similar to regular expressions. A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA). We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications.
Effectively, we are running finite state machines with some tweaks:
– A unique memory heap for each DFA, accessible across states
– A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA
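[Figure: EventScript rule deployment: EventScript is compiled into a DFA on the desktop and delivered via a Rule Server (remote server) to the Rule Manager and Event Engine on the Nokia 770]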
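A toy version of such an engine, with a private heap per DFA and actions on initialization and on transitions, is sketched below. Python callables stand in for the LISP/Scheme actions, a `classify` function stands in for the compiled EventScript symbol mapping, and the "three consecutive high heart-rate readings" rule is a hypothetical example; none of this is the EventScript toolchain itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Set, Tuple

Heap = Dict[str, Any]
Action = Callable[[Heap, Any], None]


@dataclass
class DFA:
    """A DFA with its own memory heap and optional actions on init and on transitions."""
    start: str
    accepting: Set[str]
    transitions: Dict[Tuple[str, str], Tuple[str, Optional[Action]]]  # (state, symbol) -> (next, action)
    classify: Callable[[Any], str]              # maps a raw event to an input symbol
    on_init: Optional[Action] = None
    heap: Heap = field(default_factory=dict)    # private to this DFA, persists across states
    state: str = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.start
        if self.on_init:
            self.on_init(self.heap, None)

    def feed(self, event: Any) -> bool:
        """Consume one event; return True if the DFA is now in an accepting state."""
        symbol = self.classify(event)
        nxt, action = self.transitions.get((self.state, symbol), (self.start, None))
        if action:
            action(self.heap, event)
        self.state = nxt
        return self.state in self.accepting


# Hypothetical rule: fire on three or more consecutive readings above 120 bpm.
def record(heap: Heap, hr: Any) -> None:
    heap["run"].append(hr)


def reset(heap: Heap, _: Any) -> None:
    heap["run"] = []


STATES = ["s0", "s1", "s2", "s3"]
rules = {}
for i, s in enumerate(STATES):
    rules[(s, "H")] = (STATES[min(i + 1, 3)], record)
    rules[(s, "L")] = ("s0", reset)

dfa = DFA("s0", {"s3"}, rules, lambda hr: "H" if hr > 120 else "L", on_init=reset)
fired = [dfa.feed(hr) for hr in [110, 125, 130, 135, 90]]
print(fired)  # [False, False, False, True, False]
```

The heap lets a transition action see data gathered in earlier states (here, the run of high readings) without encoding that data in the state set itself.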
HARMONI Implementation Platform
Nokia 770 Internet tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480) touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides an open-source software and development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports the Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis accelerometer
– Output baud rate of 57.6 Kbps
– 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: "Data Generated From Sensor": heart rate (bpm) versus sample number]
Compression (S2) by itself results in a >50% reduction in bandwidth consumption
Filtering (S3) results in an 85% reduction in bandwidth consumption
The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in LZ compression
[Figure: data transmitted over the network (in KB) versus context assumed (None, Office, Gym, Office/Gym), uncompressed and with LZ-78 compression]
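The reason exact-match compressors gain little on raw floating-point readings can be seen with a minimal LZ78 encoder. The sketch below counts the (prefix index, next character) pairs LZ78 emits for a synthetic stream of identical readings and for a synthetic stream of slightly varying two-decimal readings; both series are invented for illustration.

```python
from typing import List, Tuple


def lz78_encode(data: str) -> List[Tuple[int, str]]:
    """Classic LZ78: emit (index of longest known phrase, next character) pairs."""
    dictionary = {"": 0}
    out: List[Tuple[int, str]] = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending a phrase seen before
        else:
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                # flush a trailing, already-known phrase
        out.append((dictionary[phrase], ""))
    return out


def lz78_decode(pairs: List[Tuple[int, str]]) -> str:
    phrases = [""]
    for idx, ch in pairs:
        phrases.append(phrases[idx] + ch)
    return "".join(phrases[1:])


exact = ",".join(["72"] * 200)                                          # identical readings
noisy = ",".join(f"{72 + (i * 37 % 100) / 100:.2f}" for i in range(200))  # jittery readings
print(len(lz78_encode(exact)), len(lz78_encode(noisy)))
```

The identical readings collapse into a few dozen ever-longer phrases, while the jittery readings keep producing new short phrases. This is consistent with the observation that filtering away near-duplicate floating-point values leaves the compressor with little extra to exploit.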
Result 2: Impact of Personalization on Event Filtering
[Figure: "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute) versus time]
[Figure: data transmitted over the network (in KB) for User 2-1 and User 2-2: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering]
Sensor streams exhibit significant statistical variation across individuals
The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor
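A per-user "normal" band shows why personalization matters when individuals differ statistically. This sketch assumes the hub forwards only out-of-band heart-rate readings; the generic band (60, 90) bpm, the mean ± 2σ personal band, and the sample readings are hypothetical values chosen for illustration, not the thresholds used in the experiments.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Band = Tuple[float, float]


def personal_band(baseline: Sequence[float], k: float = 2.0) -> Band:
    """Hypothetical per-user band: baseline mean +/- k population standard deviations."""
    m, s = mean(baseline), pstdev(baseline)
    return (m - k * s, m + k * s)


def forward(readings: Sequence[float], band: Band) -> List[float]:
    """Transmit only readings outside the normal band; in-band readings are dropped on the hub."""
    lo, hi = band
    return [r for r in readings if r < lo or r > hi]


GENERIC: Band = (60.0, 90.0)                      # hypothetical one-size-fits-all band (bpm)
baseline = [94.0, 96.0, 95.0, 95.0, 94.0, 96.0]   # a user whose resting HR is naturally high
readings = [95.0, 96.0, 94.0, 110.0]

print(forward(readings, GENERIC))                  # [95.0, 96.0, 94.0, 110.0]
print(forward(readings, personal_band(baseline)))  # [110.0]
```

For this user, the generic band treats every ordinary reading as abnormal and transmits all of them, whereas the personalized band forwards only the genuine excursion.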
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' threshold for the 3 distinct states (0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients
Bandwidth savings between 26% and 73% for our sample population
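Sensor-based context filtering can be sketched as two steps: classify the activity state from accelerometer amplitude, then forward only heart-rate readings above that state's "normal" threshold. The cut points, per-state thresholds and samples below are hypothetical. The slide's ranges (0-70, 70-110, 110-170) are deliberately not reused because their units are not stated.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

STATES = ("sitting", "walking", "running")


def classify(amplitude: float, cutpoints: Sequence[float]) -> str:
    """Map accelerometer amplitude to an activity state using two ascending cut points."""
    return STATES[bisect_right(cutpoints, amplitude)]


def filter_hr(samples: Sequence[Tuple[float, float]], cutpoints: Sequence[float],
              threshold: Dict[str, float]) -> Tuple[List[float], float]:
    """Forward HR readings above the current state's threshold; also return the bandwidth saving."""
    sent = [hr for amp, hr in samples if hr > threshold[classify(amp, cutpoints)]]
    saving = 1 - len(sent) / len(samples) if samples else 0.0
    return sent, saving


CUTPOINTS = (1.2, 2.5)                                              # hypothetical amplitude cut points
THRESHOLD = {"sitting": 100.0, "walking": 130.0, "running": 170.0}  # hypothetical HR limits (bpm)
samples = [(1.0, 80.0), (1.0, 105.0), (2.0, 125.0), (3.0, 160.0), (3.0, 175.0)]
print(filter_hr(samples, CUTPOINTS, THRESHOLD))  # ([105.0, 175.0], 0.6)
```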
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports its various services
Challenges:
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across the ecosystem
– Integrity of data & tracking the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security
Architectural goals:
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking & auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data
Technologies & approaches:
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform
Analysis Framework (1): The Approach
Current solutions
Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for a specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials
New requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures
  • Exacerbated in remote monitoring settings
Easy to use
– Separate medical domain knowledge from IT knowledge
Our innovations
Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., a patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2, temperature
[Figure: an Analysis Job composed of several Analysis Components, each a graph of PEs fed by a DataSource/Source PE]
Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
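The division of labour can be sketched with a toy publish/subscribe runtime in which PE authors write only a function from one SDO to zero or more (stream, SDO) outputs, and the runtime does all routing. This is not the System S API: `Runtime`, `subscribe`, `publish`, the stream names and the tachycardia rule are hypothetical, and SDOs are modelled as plain dictionaries.

```python
from collections import defaultdict, deque
from typing import Any, Callable, DefaultDict, Dict, List, Tuple

SDO = Dict[str, Any]                 # a stream data object, modelled as a plain dict
Emit = List[Tuple[str, SDO]]         # (output stream name, SDO) pairs
PE = Callable[[SDO], Emit]


class Runtime:
    """Toy pub-sub router standing in for the stream middleware (not the System S API)."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[PE]] = defaultdict(list)
        self.seen: DefaultDict[str, List[SDO]] = defaultdict(list)

    def subscribe(self, stream: str, pe: PE) -> None:
        self.subscribers[stream].append(pe)

    def publish(self, stream: str, sdo: SDO) -> None:
        queue = deque([(stream, sdo)])
        while queue:
            name, obj = queue.popleft()
            self.seen[name].append(obj)       # record every SDO observed on each stream
            for pe in self.subscribers[name]:
                queue.extend(pe(obj))         # PEs return outputs; routing is the runtime's job


# PE logic written by a domain expert: no sockets, ports or threads in sight.
def annotate_hr(sdo: SDO) -> Emit:
    return [("hr.annotated", {**sdo, "tachy": sdo["hr"] > 100})]


def tachy_alert(sdo: SDO) -> Emit:
    return [("alerts", {"patient": sdo["patient"], "msg": "tachycardia"})] if sdo["tachy"] else []


rt = Runtime()
rt.subscribe("hr.raw", annotate_hr)
rt.subscribe("hr.annotated", tachy_alert)
rt.publish("hr.raw", {"patient": "p1", "hr": 120})
rt.publish("hr.raw", {"patient": "p2", "hr": 80})
print(rt.seen["alerts"])  # [{'patient': 'p1', 'msg': 'tachycardia'}]
```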
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes the SDO to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs
[Figure: the AR PE takes an input ECG stream through Normalization, a Fast Fourier Transform and a Neural Network (trained during initialization) to an Alert Generator that outputs alerts]
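The stages of such a PE can be approximated with the Python standard library. In this hedged sketch, z-score normalization stands in for the Normalization stage, a naive O(n²) DFT stands in for the FFT, and a simple dominant-frequency threshold replaces the trained neural network. The function names, the 64 Hz test signals and the 3 Hz threshold are all assumptions.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import List, Sequence


def normalize(x: Sequence[float]) -> List[float]:
    """Z-score normalization, standing in for the Normalization stage."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s if s else 0.0 for v in x]


def dft_magnitudes(x: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT standing in for the FFT stage; returns |X[k]| for k < n/2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]


def dominant_bin(x: Sequence[float]) -> int:
    """Index of the strongest non-DC frequency bin after normalization."""
    mags = dft_magnitudes(normalize(x))
    return max(range(1, len(mags)), key=mags.__getitem__)


def ar_pe(window: Sequence[float], fs: float, max_hz: float) -> bool:
    """Alert if the dominant frequency exceeds max_hz (a threshold replacing the trained network)."""
    return dominant_bin(window) * fs / len(window) > max_hz


fs = 64.0                                                       # hypothetical sampling rate (Hz)
slow = [math.sin(2 * math.pi * 2 * t / fs) for t in range(64)]  # 2 Hz test tone
fast = [math.sin(2 * math.pi * 5 * t / fs) for t in range(64)]  # 5 Hz test tone
print(ar_pe(slow, fs, 3.0), ar_pe(fast, fs, 3.0))               # False True
```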
Century Event Management Service
[Figure: Century server infrastructure: USN gateway, toolkit (IDE, sensor simulator, event management client library), patient/administrator/stakeholder portals, storages, and the Portal, Platform, Analysis Framework, Event Management, Provenance and Group Management services]
Event Management Service (1)
The Problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: an Annotator emitting events at times 1, 2, 3 into each storage approach, as relational rows or as XML documents]
Event Management Service (3): Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, let λ be the arrival rate at the database, and let ρ = λ/μ.
[Figure: arrival rate over time against the average arrival rate and the average database service rate; bursts above the service rate fall into the instability region]
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)
– Stream buffer: how do we design the stream buffer?
[Figure: an Annotator emitting XML events at times 1, 2, 3 into a stream buffer in front of the Relational-XML DB]
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007
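With ρ = λ/μ as defined above, the danger of the instability region can be made concrete under one extra assumption the slide does not state: exponentially distributed database service times, which makes the database an M/M/1 queue. The standard M/M/1 steady-state results are then L = ρ/(1−ρ) events in the system and W = 1/(μ−λ) time per event, and both diverge as ρ approaches 1. The helper below and its sample rates are hypothetical.

```python
from typing import Dict


def mm1_metrics(lam: float, mu: float) -> Dict[str, float]:
    """Steady-state M/M/1 results for arrival rate lam and service rate mu (same time unit)."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError(f"unstable: rho = {rho:.2f} >= 1, so the backlog grows without bound")
    return {
        "rho": rho,
        "mean_in_system": rho / (1 - rho),      # L
        "mean_time_in_system": 1 / (mu - lam),  # W; Little's law gives L = lam * W
    }


for lam in (500.0, 800.0, 950.0, 990.0):        # hypothetical arrival rates, events/s
    m = mm1_metrics(lam, mu=1000.0)              # hypothetical database service rate
    print(f"rho={m['rho']:.2f}  L={m['mean_in_system']:.1f}  "
          f"W={m['mean_time_in_system'] * 1000:.1f} ms")
```

Under these assumptions, the mean backlog grows from about 1 event at ρ = 0.5 to about 99 events at ρ = 0.99, which is why even short bursts near the service rate are costly.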
Event Management Service (4): The Stream Buffer
[Figure: storage requests (events of type "eventStore") from the analysis graph (PE1 to PE5, SPE) flow through a Traffic Monitor (traffic models, traffic predictor, event store load monitor) and a Database PE, with a System S STG (file system store), into the EventStore (Relational-XML DB); other output flows towards the delivery system]
Approach
– Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
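The throttling idea can be sketched as an in-memory buffer that absorbs bursts and releases at most a fixed budget of storage requests per tick, set below the database service rate. The `headroom` factor, the tick abstraction and the class name are assumptions. The figure's file-system store and traffic predictor are omitted, so the backlog here simply lives in memory.

```python
from collections import deque
from typing import Deque, List, Sequence


class StreamBuffer:
    """Absorbs bursty storage requests and releases at most `budget` of them per tick."""

    def __init__(self, mu_per_tick: int, headroom: float = 0.8) -> None:
        # Keep the offered load on the database below its service rate (rho < 1).
        self.budget = max(1, int(mu_per_tick * headroom))
        self.pending: Deque[str] = deque()

    def tick(self, arrivals: Sequence[str]) -> List[str]:
        """Enqueue this tick's arrivals, then drain up to `budget` events, oldest first."""
        self.pending.extend(arrivals)
        n = min(self.budget, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]


buf = StreamBuffer(mu_per_tick=10)  # hypothetical: the DB can store 10 events per tick
burst = [f"e{i}" for i in range(20)]
released = [buf.tick(batch) for batch in (burst, [], [], ["e20", "e21", "e22", "e23", "e24"])]
print([len(r) for r in released], len(buf.pending))  # [8, 8, 4, 5] 0
```

The burst of 20 requests is spread over three ticks instead of hitting the database at twice its service rate, at the price of added storage latency for the buffered events.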
Century Provenance Service
[Figure: Century server infrastructure: USN gateway, toolkit (IDE, sensor simulator, event management client library), patient/administrator/stakeholder portals, storages, and the Portal, Platform, Analysis Framework, Event Management, Provenance and Group Management services]
An Example of Data Provenance Use
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe.
A few days later, Dr Lee receives an alert from the Century prescription program:
Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction. Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
"That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation, and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
[Figure: PEs running algorithms f1, f2 and f3 report to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1."]
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
[Figure: PEj receives Data1 and Data2 from two sensors and emits DataOut4: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
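The two approaches can be contrasted in a few lines. In this sketch, all class and function names and the sample values are hypothetical. Process provenance is a log of what each PE ran and when, while data provenance is carried as per-element annotations, as in the DataOut4 example. The annotation approach attaches metadata to every element, which is exactly the per-record cost that becomes a problem for high-volume streams.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass
class ProcessLog:
    """Process provenance: each PE reports what it ran, on which port, and when."""
    records: List[Tuple[str, str, Dict[str, Any], int, str]] = field(default_factory=list)

    def report(self, pe: str, algorithm: str, params: Dict[str, Any], t: int, port: str) -> None:
        self.records.append((pe, algorithm, params, t, port))

    def components_until(self, t: int) -> List[str]:
        """Which system components were on the data path up to time t."""
        return sorted({pe for pe, _, _, ts, _ in self.records if ts <= t})


@dataclass(frozen=True)
class Datum:
    """Data provenance by annotation: each element names the elements it depends on."""
    data_id: str
    value: float
    derived_from: Tuple[str, ...] = ()


def lineage(d: Datum, index: Dict[str, Datum]) -> Set[str]:
    """Follow annotations back to the raw (un-derived) elements."""
    if not d.derived_from:
        return {d.data_id}
    roots: Set[str] = set()
    for parent in d.derived_from:
        roots |= lineage(index[parent], index)
    return roots


log = ProcessLog()
log.report("PE1", "f1", {"u": 1, "v": 2, "w": 3}, t=0, port="port1")
log.report("PE2", "f3", {}, t=1, port="port2")

d1, d2 = Datum("Data1", 72.0), Datum("Data2", 97.0)
out = Datum("DataOut4", 84.5, derived_from=("Data1", "Data2"))
index = {d.data_id: d for d in (d1, d2, out)}
print(log.components_until(1), sorted(lineage(out, index)))  # ['PE1', 'PE2'] ['Data1', 'Data2']
```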
Provenance (2): Prior Work on Provenance
[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance to data provenance) versus event and processing rates]
– EU Provenance (PASOA, PreServ): captures workflow bindings
– Scientific SOA workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
– Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
– Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
– File systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
– Medical stream provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that:
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both:
Process provenance: show me the set of components involved in the generation of alert Ei
Data provenance: show me the stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission
The server must not be simply a repository of sensor streams but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality
HARMONI Implementation Platform
Nokia 770 / N800 Internet Tablet
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB flash storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and a development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)
Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports the Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes
WiTilt 3-axis accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating
DeLorme Earthmate GPS
I. Mohomed, A. Misra, W. Jerome and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.
Context-Based Filtering
Impact of HARMONI on Transmission Bandwidth: Idealized Context
[Figure: "Data Generated From Sensor": heart rate (bpm) vs. sample number]
Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
Filtering (S3) results in an 85% reduction in bandwidth consumption
Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant due to the lack of exact pattern matches of floating-point values in the LZ compressor
[Figure: data transmitted over the network (KB) vs. context assumed (None, Office, Gym, Office/Gym), uncompressed and with LZ-78 compression]
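The remark about floating-point values follows from how an LZ-78 coder works: it can only reuse a phrase it has already seen verbatim, so readings that jitter in their decimal digits rarely repeat. The following minimal encoder is our own illustrative sketch (not the HARMONI implementation, whose code is not shown); `lz78_encode` is a hypothetical helper that returns the emitted (dictionary index, next character) pairs.

```python
def lz78_encode(text: str) -> list[tuple[int, str]]:
    """Encode text as LZ-78 (dictionary index, next character) pairs.

    Index 0 denotes the empty prefix; every emitted pair adds one new
    phrase to the dictionary.
    """
    dictionary: dict[str, int] = {}
    output: list[tuple[int, str]] = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending while the phrase has been seen before.
            phrase = candidate
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Flush a trailing phrase that exactly matched an existing entry.
        output.append((dictionary[phrase], ""))
    return output
```

For example, four heart-rate readings rendered as integers ("72,72,72,72") need fewer LZ-78 phrases than similar readings rendered with two decimals ("72.13,71.87,72.41,72.06"), because the integer rendering repeats exactly.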
Result 2: Impact of Personalization on Event Filtering
[Figure: "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute) vs. time]
[Figure: data transmitted over the network (KB) for User 2-1 and User 2-2 under four settings: Uncompressed, Compressed, Uncompressed Generic Context-filtering, Uncompressed Personalized Context-filtering]
Sensor streams exhibit significant statistical variation across individuals
Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant due to the lack of exact pattern matches of floating-point values in the LZ compressor
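The slides do not say how personalized thresholds are derived. One simple possibility, shown purely as an assumption, is to learn each user's 'normal' heart-rate band from a baseline window as mean ± k standard deviations and to transmit only readings outside that band. The function names and the default `k = 2.0` are hypothetical.

```python
from statistics import mean, pstdev


def personal_band(baseline: list[float], k: float = 2.0) -> tuple[float, float]:
    """Return a per-user (low, high) band learned from baseline readings."""
    mu = mean(baseline)
    sigma = pstdev(baseline)  # population standard deviation of the baseline
    return mu - k * sigma, mu + k * sigma


def filter_readings(readings: list[float], band: tuple[float, float]) -> list[float]:
    """Keep only readings outside the user's normal band (the ones worth sending)."""
    low, high = band
    return [r for r in readings if r < low or r > high]
```

With a resting baseline of [60, 62, 64, 62] the readings [63, 75, 90] reduce to [75, 90], while a user whose baseline is [88, 92, 90, 90] would instead transmit [63, 75]: the same stream is filtered differently for different individuals.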
HARMONI in Practice: Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' threshold for each of the 3 distinct states (0-70, 70-110, 110-170), applied across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients
Bandwidth savings of between 26% and 73% for our sample population
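A minimal sketch of sensor-based context filtering under stated assumptions: the per-state 'normal' ranges are taken from the slide, but the accelerometer-amplitude cut-offs that separate sitting, walking and running are hypothetical placeholders (the slides do not give them), as are all function names.

```python
# Hypothetical accelerometer-amplitude cut-offs; the slides do not specify them.
SITTING_MAX_AMPLITUDE = 0.3
WALKING_MAX_AMPLITUDE = 1.2

# Per-state 'normal' ranges as listed on the slide (0-70, 70-110, 110-170).
NORMAL_RANGE = {
    "sitting": (0, 70),
    "walking": (70, 110),
    "running": (110, 170),
}


def classify_activity(amplitude: float) -> str:
    """Map an accelerometer amplitude to one of the three activity states."""
    if amplitude <= SITTING_MAX_AMPLITUDE:
        return "sitting"
    if amplitude <= WALKING_MAX_AMPLITUDE:
        return "walking"
    return "running"


def should_transmit(heart_rate: float, amplitude: float) -> bool:
    """Forward a reading only if it lies outside the normal range for the inferred state."""
    low, high = NORMAL_RANGE[classify_activity(amplitude)]
    return not (low <= heart_rate <= high)


def filter_stream(samples: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep the (heart_rate, amplitude) pairs worth sending to the server."""
    return [s for s in samples if should_transmit(*s)]
```

For example, a heart rate of 95 is forwarded while the user is sitting but suppressed while the user is walking, which is where the bandwidth savings come from.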
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytic infrastructure that supports the various services of the healthcare ecosystem.
Challenges:
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across the ecosystem
– Integrity of data and tracking of the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security
Architectural goals:
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking and auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data
Technologies and approaches:
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Events
– Hybrid Provenance System
– Interoperability
– Privacy and Security Technology
– Stream Processing Platform
Analysis Framework (1): The Approach
Current solutions
– Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for a specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials
New requirements
– Extensible: to new data sources (e.g., EMG, EEG, accelerometer, activity) and to new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
– Scalability: handles large numbers of patients
– Robustness: resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings
– Easy to use: separate medical domain knowledge from IT knowledge
Our innovations
– Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
– Programming model centered around the System S concept of a Processing Element (PE): PEs represent atomic stream transformers, and stream data flows with pub-sub semantics
– Century also provides useful medical analytic services: persistence of state (e.g., analyzing HR variations over 6 months), management of patient data (e.g., a patient's intermediate diagnosis), and computation and rebinding based on the quality of events from medical streams
– Century supports an open set of data sources, currently ECG, BP, glucose, weight, SpO2 and temperature
[Figure: an AnalysisJob built from Analysis Components, each a graph of PEs fed by DataSource/Source PEs]
Major practical benefit: separates the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• System S routes each incoming SDO to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs
[Figure: AR PE example: Input ECG → Normalization → Fast Fourier Transform → Neural Network (trained during initialization) → Alert Generator → Output Alert]
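The division of labour in the PE example can be sketched as follows. This is not System S code (its PE API is not shown on the slides): an SDO is modelled as a plain dict, each stage is an ordinary Python function, the FFT is replaced by a naive DFT, and the trained neural network is replaced by a hypothetical frequency rule. All names are illustrative.

```python
import cmath
import math


def normalize(sdo: dict) -> dict:
    """Remove the mean and scale the ECG samples to unit peak amplitude."""
    samples = sdo["samples"]
    avg = sum(samples) / len(samples)
    centered = [s - avg for s in samples]
    peak = max(abs(c) for c in centered) or 1.0
    return {**sdo, "samples": [c / peak for c in centered]}


def magnitude_spectrum(sdo: dict) -> dict:
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (a stand-in for an FFT)."""
    x = sdo["samples"]
    n = len(x)
    mags = [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    return {**sdo, "spectrum": mags}


def classify(sdo: dict) -> dict:
    """Placeholder for the trained neural network: a hypothetical rule that flags
    the window if its dominant non-DC frequency bin lies in the upper half."""
    mags = sdo["spectrum"]
    dominant = max(range(1, len(mags)), key=lambda k: mags[k])
    return {**sdo, "abnormal": dominant > (len(mags) - 1) // 2}


def generate_alerts(sdo: dict) -> list[dict]:
    """Emit an alert SDO only for windows classified as abnormal."""
    return [{"type": "ALERT", "patient": sdo["patient"]}] if sdo["abnormal"] else []


def ar_pe(sdo: dict) -> list[dict]:
    """The PE developer's logic only; delivering input SDOs is the platform's job."""
    return generate_alerts(classify(magnitude_spectrum(normalize(sdo))))
```

Feeding a 16-sample window containing one slow sine cycle yields no alert, while a window dominated by a six-cycle component yields a single ALERT SDO. Nothing in `ar_pe` deals with stream connections, which is the point of the framework.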
Century Event Management Service
[Figure: Century server infrastructure. Labelled components include the USN Gateway (Data Transformer, Sensor Data Sender), Sensor Data Reception, a developer Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library), Patient, Administrator and Stakeholder portals (for doctors, lifestyle consultants, healthcare IT specialists and Govt/CDC), the Portal, Platform, Subscription, Group Management, Event Management and Provenance services, the Analysis Framework with Analysis Applications, Event Preprocessor, Event Filter and Annotator, the QoE, State, Group, Job/Application, Privacy, Remote Access, Patient Data, User Group Data, Device Data, Event Storage, Provenance Storage and Provenance Query managers, Authentication & Authorization, a Device Catalogue, Event Store and Provenance Query services, Provenance Access Control, a Resolver, Service Data Management, an Interoperability Container with an IHE Adapter, Solution Delivery Services, and storages for stakeholder, patient, device and application info, state data, provenance metadata, events, group data and privacy policy]
Event Management Service (1)
The problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event-rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: an Annotator writing events at times 1, 2, 3 into each store, as annotation rows or as XML documents]
Event Management Service (3): Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ.
[Figure: traffic arrival rate over time, plotted against the average arrival rate and the average database service rate, with the instability region marked]
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)
[Figure: an Annotator writing XML events at times 1, 2, 3 through a stream buffer into the Relational-XML DB]
Stream buffer: how do we design the stream buffer?
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
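The stability condition behind this design can be made concrete with standard queueing results. The slide assumes Poisson arrivals; if we additionally assume exponentially distributed database service times (an M/M/1 queue, which is our simplifying assumption rather than the authors' stated model), then:

$$
\rho = \frac{\lambda}{\mu} < 1, \qquad
L = \frac{\rho}{1-\rho}, \qquad
W = \frac{L}{\lambda} = \frac{1}{\mu - \lambda}
$$

Here L is the mean number of events waiting or being stored and W is the mean time an event spends at the database. Both grow without bound as ρ approaches 1. With illustrative numbers μ = 1000 events/s and λ = 800 events/s, ρ = 0.8, L = 4 and W = 5 ms; at λ = 990 events/s, ρ = 0.99, L = 99 and W = 100 ms. This is why the design keeps bursts away from the instability region rather than relying on the long-run average alone.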
Event Management Service (4): The Stream Buffer
[Figure: stream buffer architecture. Components: System S STG (file system store), Traffic Monitor, Traffic Predictor with traffic models, Event Store Load Monitor, Database PE and the Event Store (Relational-XML); storage requests (events of type "eventStore") come from PEs PE1 to PE5 of the analysis graph, which also flows towards the delivery system]
Approach
– Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
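A minimal sketch of the throttling idea, assuming a fixed per-tick drain rate chosen below the database service rate μ. The class and parameter names are hypothetical. The traffic predictor and load monitor from the figure are omitted, and an in-memory deque stands in for the System S file-system store.

```python
from collections import deque
from typing import Any, Iterable


class StreamBuffer:
    """Hypothetical throttle in front of the event store.

    Bursty storage requests are queued, and at most `drain_per_tick` events
    are released to the database per tick, so the database never sees a
    burst larger than that rate.
    """

    def __init__(self, drain_per_tick: int):
        if drain_per_tick <= 0:
            raise ValueError("drain_per_tick must be positive")
        self.drain_per_tick = drain_per_tick
        self.pending: deque = deque()

    def offer(self, events: Iterable[Any]) -> None:
        """Accept a (possibly bursty) batch of storage requests."""
        self.pending.extend(events)

    def tick(self) -> list:
        """Release the next batch in arrival order, never exceeding the rate."""
        n = min(self.drain_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]
```

A burst of seven events offered to `StreamBuffer(3)` is released as batches of three, three and one over successive ticks. In a fuller design, the drain rate would be adjusted from the load monitor's estimate of μ.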
Century Provenance Service
An Example of Data Provenance Use
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe. A few days later, Dr Lee receives an alert from the Century prescription program:
Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
[Figure: process provenance. PEs running algorithms f1, f2 and f3 report to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1."]
[Figure: data provenance. PEj receives Data1 and Data2 from two sensors and emits DataOut4: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
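The two approaches can be contrasted in a few lines of code. This is a hypothetical sketch, not a Century API: data provenance is kept as dependency metadata carried by each element (as DataOut4 records Data1 and Data2), and process provenance as a log of what each component reported running, when, and on which port.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataElement:
    """A data element whose metadata lists the element ids it was derived from."""
    eid: str
    derived_from: tuple[str, ...] = ()


@dataclass
class ProvenanceStore:
    """Hypothetical store holding both kinds of provenance record."""
    elements: dict[str, DataElement] = field(default_factory=dict)
    process_log: list[tuple[str, str, float, int]] = field(default_factory=list)

    def record_element(self, element: DataElement) -> None:
        """Data provenance: keep the element together with its dependency metadata."""
        self.elements[element.eid] = element

    def record_step(self, component: str, algorithm: str, time: float, port: int) -> None:
        """Process provenance: a component reports what it ran, when, and on which port."""
        self.process_log.append((component, algorithm, time, port))

    def lineage(self, eid: str) -> set[str]:
        """All upstream element ids that (transitively) led to `eid`."""
        seen: set[str] = set()
        stack = list(self.elements[eid].derived_from)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            parent = self.elements.get(current)
            if parent is not None:
                stack.extend(parent.derived_from)
        return seen

    def components_between(self, start: float, end: float) -> list[str]:
        """Components that reported activity in [start, end], in reporting order."""
        return [comp for (comp, _alg, t, _port) in self.process_log if start <= t <= end]
```

Asking for the lineage of an alert derived from DataOut4 returns DataOut4, Data1 and Data2 (data provenance), while `components_between` answers which PEs were active on the path (process provenance). Note that the data-provenance side stores metadata on every element, which is exactly the per-record cost the following slides argue against for high-rate streams.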
Provenance (2): Prior Work on Provenance
[Figure: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates]
– EU Provenance (PASOA, PreServ): captures workflow bindings
– Scientific SOA workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
– Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
– Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
– File systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
– Medical stream provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that:
– is reactive, generating provenance events only when things change
– does not require annotating large amounts of metadata per sensor sample
– can accommodate variations in the statefulness model
We need to support both:
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ:
$$ S_i^{\mathrm{out}}(t) \Leftarrow \bigcup_{S_j \in \mathrm{inps}(i)} \{\, S_j(t') : t' \in [\,t-\Delta_j,\ t\,] \,\} $$
[Figure: a Processing Element with input streams Sj (Sj(t) to Sj(t-3)) and Sk (Sk(t) to Sk(t-3)) and output stream Si (Si(t), Si(t-1), Si(t-2))]
TVC dependency rule blocks (PE state defined by location context):
– Rule block 1, PE state "gym": Ei(t) ⇐ Sj(t) ∪ Sj[t-3, t-2] ∪ Sk[t-2, t-1]
– Rule block 2, PE state "home": Ei(t) ⇐ Sj[t-2, t] ∪ Sk[t-1, t]
The dependency equation can, however, vary based on
– the PE's internal state or some external context, for example:
  IF loc = "gym" THEN generate ALERT iff AVG(HR[t-10, t]) > 140
  ELSE generate ALERT iff AVG(HR[t-30, t]) > 90
– The solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
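The rule blocks and the state-dependent alert rule above can be sketched directly. This is an illustrative reading, not the Century implementation: timestamps are assumed to be integer sample indices, windows are assumed closed at both ends, and the names `Window`, `RULE_BLOCKS`, `causative_elements` and `alert` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Closed interval [t + start, t + end] on one input stream (offsets <= 0)."""
    stream: str
    start: int
    end: int


# The slide's two example rule blocks, keyed by PE state (location context).
RULE_BLOCKS = {
    "gym": (Window("Sj", 0, 0), Window("Sj", -3, -2), Window("Sk", -2, -1)),  # block 1
    "home": (Window("Sj", -2, 0), Window("Sk", -1, 0)),                        # block 2
}


def causative_elements(t: int, state: str, elements) -> list[tuple[str, int]]:
    """Select the stored (stream, timestamp) pairs that output Ei(t) depends on."""
    windows = RULE_BLOCKS[state]
    return sorted(
        (stream, ts)
        for (stream, ts) in elements
        if any(w.stream == stream and t + w.start <= ts <= t + w.end for w in windows)
    )


def alert(location: str, hr_samples, t: int) -> bool:
    """State-dependent alert rule; hr_samples holds (timestamp, bpm) pairs."""
    span, threshold = (10, 140) if location == "gym" else (30, 90)
    window = [bpm for (ts, bpm) in hr_samples if t - span <= ts <= t]
    return bool(window) and sum(window) / len(window) > threshold
```

Only the rule block and the PE state need to be stored; the per-element dependency set is recomputed on demand. For a sustained heart rate of 120 bpm, `alert` fires at "home" but not at the "gym", and the corresponding causative windows differ as the two rule blocks specify.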
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Trade-off: faster provenance storage and processing in exchange for higher provenance reconstruction overhead
– Determine the IDs of the input ports and the causative stream bindings
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
– Apply the TVC filter to obtain the causative set of elements
[Figure: Provenance Query Server architecture]
Stream and Port Mapping (SAPM):
1. Store bindings between every input/output port and its associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port/input port
Example tables:
– PC Dependency Table: PEj, f1(), Po1
– Dynamic Stream Mapping Table: t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6
– Data Store: element45 (ts=15, SID=12, state="gym"); element66 (ts=16, SID=17, state="home")
Query (Ei):
1. Obtain the I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve the elements
4. Apply the equation to determine the set of causative elements, and return that set
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
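The four query steps can be sketched over in-memory stand-ins for the tables above. This is a simplified, hypothetical illustration: the binding, dependency and element structures are our own. It also assumes that each port's stream binding does not change inside the dependency window, whereas a full resolver would split windows at rebinding times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    time: int     # when the stream was bound to the port
    stream: str
    port: str


@dataclass(frozen=True)
class Element:
    eid: str
    ts: int
    stream: str
    state: str = ""  # PE state stored as metadata with the data


def stream_at(bindings, port, t):
    """Stream bound to `port` by the latest binding at or before time t, or None."""
    active = [b for b in bindings if b.port == port and b.time <= t]
    return max(active, key=lambda b: b.time).stream if active else None


def resolve(output, input_ports, dependency, bindings, store):
    """Return ids of stored elements that caused `output`.

    dependency: {state: {input_port: (start_offset, end_offset)}}, offsets <= 0.
    """
    # Step 1: obtain the input-stream mappings valid when the output was produced.
    streams = {p: stream_at(bindings, p, output.ts) for p in input_ports}
    # Step 2: retrieve the dependency equation for the state stored with the output.
    windows = dependency[output.state]
    causes = []
    for port, stream in streams.items():
        if stream is None or port not in windows:
            continue
        lo, hi = windows[port]
        # Step 3: retrieve the stored elements of the bound stream.
        candidates = [e for e in store if e.stream == stream]
        # Step 4: apply the time window to keep only the causative elements.
        causes += [e.eid for e in candidates if output.ts + lo <= e.ts <= output.ts + hi]
    return sorted(causes)
```

Because bindings are recorded only when they change, a rebinding that happens after the output (for example, port Pi1 moving to a new stream at t=17) does not affect a query about an output produced at t=16.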
Key Research Benefits of the TVC Model for Provenance
TVC model overhead reduces when:
1. monitoring is long-lived (over weeks)
2. per-instance state is minimized
3. the number of monitored individuals increases
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
– Storage efficiency: only minimal metadata per data element and a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 ms) provenance latency, compared to O(s) in the EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 Kevents/s) system throughput
Quality of Events in Century
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE producing an ALERT from input streams with missing and out-of-order elements. Callouts: "What generated this alert, and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (remedy: replace sensor); "Stream elements are missing (possibly due to network issues)"]
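The dynamic-rebinding case can be illustrated with a small selection rule. This is a hypothetical policy, since the slides do not specify one: prefer the cheapest heart-rate source whose current QoE meets a threshold, and fall back to the best available source otherwise. Costs, quality scores and the threshold are illustrative values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    cost: float     # relative acquisition cost (hypothetical units)
    quality: float  # current QoE estimate in [0, 1]


def choose_source(sources: list[Source], min_quality: float) -> Source:
    """Pick the cheapest source whose QoE meets min_quality; if none
    qualifies, fall back to the highest-quality source available."""
    eligible = [s for s in sources if s.quality >= min_quality]
    if eligible:
        return min(eligible, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.quality)
```

While the SpO2-derived heart rate is of good quality, the cheaper SpO2 stream stays bound; once its QoE dips below the threshold, the rule selects the more expensive ECG-derived stream, mirroring the slide's example.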
Century QoE: Some Basic QoE Measures
[Figure: a PE whose current input holds elements t-5, t-2 and t, while the expected input was t-5, t-4, t-3, t-2, t-1 and t]
By correlating what was expected with what was received, we can estimate the quality of our input. In this particular case, we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE with two inputs, one holding elements t-12, t-11 and t-10 and the other t-5, t-2 and t]
Another possibility is to correlate the timestamps of the inputs to estimate the quality. In this particular case, we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: a PE whose current input holds elements t-5 to t, of which only element t is "useful"]
Finally, we can correlate the consumed input with the useful input (the input actually used to trigger an output). Here we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
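The three measures can be written as simple ratios and differences. The exact formulas are our assumption; the slides describe the measures only qualitatively. Timestamps are taken relative to t (so t-5 is -5), and the function names are hypothetical.

```python
def completeness(expected, received) -> float:
    """Fraction of the expected timestamps that actually arrived."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(received)) / len(expected)


def sync_skew(stream_a, stream_b) -> int:
    """Gap between the newest timestamps of two inputs (0 means in sync)."""
    return abs(max(stream_a) - max(stream_b))


def usefulness(consumed, useful) -> float:
    """Fraction of consumed elements that actually contributed to an output."""
    consumed = set(consumed)
    if not consumed:
        return 0.0
    return len(consumed & set(useful)) / len(consumed)
```

Applied to the three panels: completeness is 0.5 (3 of the 6 expected elements arrived), the two inputs are skewed by 10 time units (newest elements t-10 vs. t), and usefulness is 1/6 (only element t triggered the output). Each value would be a low-quality signal in the sense described above.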
Chapter 3
The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics
HarmoniHarmoni Context Context--based Event based Event Processing on the Mobile HubProcessing on the Mobile Hub
Century Technologies Stream Storage Provenance and QoE
IBM Research
copy 2006 IBM Corporation
Century Research Goals and TechnologiesOur research goal is to make Online Health Analytic Infrastructure for healthcare ecosystem to support the various services
Challenges Architectural Goals Technologies amp Approaches
Integrating massive volumes of disparate data
Extensible Data ModelScalable Infrastructure
Need for sophisticated analytics
Growing collaboration across ecosystem
Integrity of data ampTracking the analytic data
Timely feedback
Open and scalable analysis framework
Enterprise sharing of sensor data
Quality of Events (QoE) and provenance support for backtracking amp auditing
Real-time Processing
Large number of concurrent users
Privacy and Security
Scalable device and user management
Protection of personal medical data
Extensible Stream Storage System
Distributed Stream Analysis Runtime
Quality Of Event
Hybrid Provenance System
Interoperability
Privacy amp Security Technology
Stream Processing Platform
IBM Research
copy 2007 IBM Corporation
Extensiblendash To new data sources (eg EMG EEG accelerometer Activity)ndash To new analysis algorithms (eg myocardia polyneuropathy Alzheimer)
Scalabilityndash Handles large numbers of patients
Robustnessndash Resilient to sensor or infrastructure failures
bull Exacerbated in remote monitoring settings
Easy to Usendash Separate medical domain knowledge from IT knowledge
Framework built on top of IBM System Sndash state of the art middleware for high-volume stream processing analytics
Programming model centered around System S concept of Processing Element (PE)
ndash PEs represent atomic stream transformersndash Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
ndash Persistence of state (eg analyze HR variations over 6 months)ndash Management of patient data (eg patientrsquos intermediate diagnosis)ndash Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sourcesndash Currently supporting ECG BP Glucose Weight SPO2 Temperature
Vertically designed systems for the analysis of health monitoring data exist today
ndash Closed custom-built for specific diagnosisndash Non-extensible Support for limited sensor devicesndash Small patient footprint only good for trials
Analysis Framework (1) The Approach
SPA
SPM
DFE PSD
INQ GUI
DataSource
Source PE
NES IPNVBF
ldquoMarkrdquo
DUP
WLG
IPN
NES
DUP
PSDDFESPASPM
VBFSPA
SPM DFE PSDDataSource
VBF
Source PEWLG SGD PCP JAE DSN
N(CE)S^2
DUPP
IPN
SCF
JoinCCIPN
N(CE)S^2
PFA IPN
SSFJoinSC IPN
N(CE)S^2
SLF
JoinL
IPN
NES
DUP
IPN
ES IPN
JoinL IPN
DataSource
Source PEWLG
SPA
SPM
DFE PSD
INQ GUI
DataSource
Source PE
NES IPNVBF
ldquoMarkrdquo
DUP
WLG
SPA
SPM
DFE PSD
INQ GUI
DataSource
Source PE
NES IPNVBF
ldquoMarkrdquo
DUP
WLG
Analysis Component
AnalysisJob
Major Practical BenefitSeparate PE developerrsquos job (medical domain knowledge) from infrastructure issues(scalability fault tolerance stream connections etc)
Current Solutions New Requirements Our Innovations
IBM Research
copy 2007 IBM Corporation
Analysis Framework (2) PE Example
bull PE developers need not to be concerned with the communication of Stream Data Objectsbull When an SDO is received by the PE System S routes the SDO to the appropriate PE bull Developers only need to code the logic of the algorithms used by the PE to annotate modify
or create new SDOs
Normalization
Fast FourierTransform
Input ECG
AR PE
AlertGenerator
Output Alert
Neural Network Trained during
initialization
IBM Research
copy 2007 IBM Corporation
Century Event Management Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2007 IBM Corporation
Event Management Service (1)
Annotator
time 1 2 3
XML
Annotator
time 1 2 3
Not extensibleLimited scalability
Fully extensibleBut does not scale to
high event rate
The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders
Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language
XML
XML
XML
Potential Approaches
1 Relational DB (Annotation table)
2 Relational-XML DB (XML-based Annotation data model)
IBM Research
copy 2007 IBM Corporation
Event Management Service (3) Hybrid Storage Model
Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate
time
arrival rateaverage arrival rate
traffic average databaseservice rate
Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ
UnstabilityRegion
Annotator
time 1 2 3
3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)
Stream Buffer How do we designthe stream buffer
XMLXML
XML
XML
D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007
Event Management Service (4) The Stream Buffer
[Figure: stream buffer design. An analysis graph (PE1 to PE5, SPE) issues storage requests (events of type "eventStore") towards a Database PE; supporting components include a Traffic Monitor, Traffic Models, a Traffic Predictor, an Event Store Load Monitor, the System S STG (file system store) and the Event Store (Relational-XML DB); output flows towards the delivery system.]
Approach:
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database
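The throttling approach can be illustrated with a minimal discrete-time sketch. Everything here is a hypothetical stand-in: the `StreamBuffer` class, its FIFO (playing the role of the file-system store) and the fixed `margin` parameter. In the design on the slide, the traffic monitor, predictor and load monitor would presumably tune the drain rate dynamically; this sketch fixes it at a fraction of the database service rate so that ρ stays below 1.

```python
from collections import deque


class StreamBuffer:
    """Hypothetical stream buffer that throttles writes to the event store.

    Bursts are absorbed in a FIFO (a stand-in for the file-system store),
    and at most `drain_per_tick` events are forwarded to the database per
    tick, keeping the database's effective arrival rate below its service rate.
    """

    def __init__(self, service_rate_per_tick: int, margin: float = 0.8):
        if not 0 < margin < 1:
            raise ValueError("margin must be in (0, 1)")
        # Drain at a fraction of mu so that rho stays below 1.
        self.drain_per_tick = max(1, int(service_rate_per_tick * margin))
        self.pending = deque()

    def offer(self, events):
        """Accept storage requests from the analysis graph without blocking."""
        self.pending.extend(events)

    def tick(self):
        """Return the batch of events forwarded to the database in this tick (FIFO order)."""
        n = min(self.drain_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]


buf = StreamBuffer(service_rate_per_tick=10, margin=0.8)
buf.offer(range(20))                         # a burst of 20 storage requests
print([len(buf.tick()) for _ in range(4)])   # [8, 8, 4, 0]
```

The burst is spread over three ticks instead of hitting the database at once; the cost is added storage latency for buffered events.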
Century Provenance Service
[Figure: CENTURY SERVER INFRASTRUCTURE architecture diagram.]
An Example of Data Provenance Use
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe. A few days later, Dr Lee receives an alert from the Century prescription program:
  Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction.
  Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating; a known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly. The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1) Two Approaches to Provenance
What is process provenance? The ability to retrace which "system components" were on the data path.
[Figure: processing elements PE1 and PE2 (running algorithms f1, f2, f3) report to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0; I published on port 1." PE2: "I received some data on port 2; I ran algorithm f3 on the data at time t1."]
What is data provenance? The ability to retrace which "data elements" led to the generation of output data.
[Figure: PEj receives Data1 and Data2 from two sensors and emits DataOut4: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
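The two approaches can be contrasted in a few lines. This is a minimal sketch with hypothetical names (`Element`, `record_step`, `retrace`): process provenance is a log of what each component did, while data provenance attaches parent references to each output element and retraces them transitively, mirroring the DataOut4 example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    eid: str
    parents: tuple = ()  # data provenance: ids of the elements this one was derived from


# Process provenance: each component reports what it did to a provenance store.
process_log = []


def record_step(component, algorithm, params, time, port):
    process_log.append({"component": component, "algorithm": algorithm,
                        "params": params, "time": time, "port": port})


def retrace(eid, store):
    """Data provenance: all ancestor element ids of `eid` (transitive closure)."""
    seen, stack = set(), [eid]
    while stack:
        for parent in store[stack.pop()].parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


store = {e.eid: e for e in [Element("Data1"), Element("Data2"),
                            Element("DataOut4", parents=("Data1", "Data2"))]}
record_step("PE1", "f1", {"u": 1, "v": 2, "w": 3}, "t0", "port1")  # hypothetical parameter values
record_step("PE2", "f3", {}, "t1", "port2")

print(sorted(retrace("DataOut4", store)))           # ['Data1', 'Data2']
print([step["component"] for step in process_log])  # ['PE1', 'PE2']
```

Note that the data-provenance variant stores metadata on every element, which is exactly the per-record cost that becomes prohibitive for high-rate medical streams.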
Provenance (2) Prior Work on Provenance
[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates.]
- EU Provenance (PASOA, PreServ): captures workflow bindings
- Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
- Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
- Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
- File Systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
- Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
- Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
- Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput
- Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
- Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that:
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model
We need to support both:
- Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
- Data provenance: "Show me the stream segments related to the generation of alert Ei"
TVC Model of Hybrid Provenance
- Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
  - The dependency is thus not specified on every data element, but remains constant for the same processing logic
- Provenance data is recorded at the granularity of stream "sub-sets", i.e. dependence over sets of time windows
  - The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ
$$\mathrm{out}_i(t) \Leftarrow \bigcup_{s_j \,\text{input to Component}\, i} s_j(t'), \qquad t' \in [\,t-\Delta,\ t\,]$$
[Figure: a Processing Element reads input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and produces output stream Si (samples Si(t), Si(t-1), Si(t-2)).]
TVC dependency rule blocks (PE state defined by location context):
- Rule Block 1, PE state "gym": Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
- Rule Block 2, PE state "home": Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
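A rule block can be stored once and expanded into concrete causative samples only when a query needs them. The sketch below assumes integer sample indices and reads a window such as Sj(t-3, t-2) as inclusive of both ends; `RULE_BLOCKS` and `causative_samples` are hypothetical names, not Century's API.

```python
# Windows are (stream, start_offset, end_offset) relative to the output time t,
# inclusive on both ends; e.g. ("Sj", -3, -2) stands for Sj(t-3, t-2).
RULE_BLOCKS = {
    "gym": [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],  # Rule Block 1
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],                  # Rule Block 2
}


def causative_samples(t: int, state: str) -> list:
    """Return the (stream, time) samples that Ei(t) depends on in the given PE state."""
    samples = set()
    for stream, lo, hi in RULE_BLOCKS[state]:
        samples.update((stream, t + k) for k in range(lo, hi + 1))
    return sorted(samples)


print(causative_samples(10, "home"))
# [('Sj', 8), ('Sj', 9), ('Sj', 10), ('Sk', 9), ('Sk', 10)]
```

The storage cost is one small rule per (PE, state), independent of how many output events the PE produces.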
The dependency equation can, however, vary based on:
- The PE's internal state or some external context, e.g.:
  IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
  ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
- Solution: allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data
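The location-dependent alert rule can be sketched as follows, assuming HR is sampled once per time unit and stored in a list indexed by time. The names `ALERT_RULES`, `DEFAULT_RULE` and `evaluate_alert` are hypothetical. The function returns the state and window alongside the decision, which is the "store the state as metadata" step that later lets a provenance query select the matching rule block.

```python
from statistics import mean

# PE state -> (window length in samples, HR threshold in bpm), per the rule above.
ALERT_RULES = {"gym": (10, 140.0)}
DEFAULT_RULE = (30, 90.0)  # the ELSE branch


def evaluate_alert(location: str, hr: list, t: int) -> dict:
    """Apply the state-dependent alert rule at time index t.

    Returns the decision together with the state and window used, so the state
    can be stored as metadata and the matching dependency rule found later.
    """
    window, threshold = ALERT_RULES.get(location, DEFAULT_RULE)
    start = max(0, t - window)
    samples = hr[start:t + 1]  # HR(t - window, t), inclusive
    return {"alert": mean(samples) > threshold, "state": location,
            "window": (start, t), "threshold": threshold}


hr = [100.0] * 40  # steady 100 bpm
print(evaluate_alert("gym", hr, 39)["alert"])   # False: 100 bpm is normal at the gym
print(evaluate_alert("home", hr, 39)["alert"])  # True: 100 bpm exceeds the default threshold
```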
Resolving TVC-based Provenance Queries
- Reconstruct provenance only in response to a corresponding query
  - For efficiency, the relevant metadata is distributed across multiple DBs
  - Tradeoff: faster provenance storage and processing in exchange for higher provenance reconstruction overhead
- Determine the IDs of the input ports and the causative stream bindings
- Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
- Apply the TVC filter to obtain the causative set of elements
[Figure: Provenance Query Server architecture.
Components: Stream and Port Mapping (SAPM); PC Dependency Table (e.g. PEj, f1(), Po1); Dynamic Stream Mapping Table (e.g. t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6); Data Store (e.g. element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home").
Storage steps: 1. Store bindings between every input/output port and the associated streams. 2. Store creation/modification times of PEs. 3. Store the TVC function f() for every output port/input port.
Query(Ei): 1. Obtain I/O stream mappings. 2. Retrieve the dependency equation. 3. Retrieve elements. 4. Apply the equation to determine the set of causative elements. Return the set of elements.]
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
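The query-time reconstruction can be sketched end to end with in-memory tables. All table layouts, port and stream names, and the functions `stream_at` and `resolve` are hypothetical; a real deployment would query the distributed databases by time range instead of scanning a list. The key point is that bindings are recorded only when they change, and the TVC filter keeps an element only if its stream was bound to the relevant input port at the element's own timestamp.

```python
from bisect import bisect_right

# 1. Stream and port mapping: per input port, time-ordered (since, stream_id) bindings,
#    recorded only when a binding changes.
BINDINGS = {
    "Pi1": [(0, "S1"), (15, "S15")],  # port Pi1 rebound to stream S15 at t=15
    "Pi2": [(0, "S17")],
}
# 2. Dependency table: (PE, output port, state) -> TVC function as (input port, lo, hi) offsets.
DEPENDENCIES = {("PEj", "Po1", "gym"): [("Pi1", -3, 0), ("Pi2", -1, 0)]}
# 3. Data store: (element id, timestamp, stream id).
ELEMENTS = [("e10", 12, "S1"), ("e45", 15, "S15"), ("e46", 16, "S15"),
            ("e60", 14, "S17"), ("e61", 16, "S17")]


def stream_at(port, ts):
    """Stream bound to `port` at time `ts`, or None if the port was not yet bound."""
    history = BINDINGS[port]
    idx = bisect_right([since for since, _ in history], ts) - 1
    return history[idx][1] if idx >= 0 else None


def resolve(pe, out_port, state, ts):
    """Return ids of the elements that caused the output of (pe, out_port) at time ts."""
    rule = DEPENDENCIES[(pe, out_port, state)]  # retrieve the dependency equation
    causes = set()
    for in_port, lo, hi in rule:
        for eid, ets, sid in ELEMENTS:          # retrieve elements in the time window
            in_window = ts + lo <= ets <= ts + hi
            # Apply the TVC filter: keep an element only if its stream was
            # bound to this input port at the element's own timestamp.
            if in_window and stream_at(in_port, ets) == sid:
                causes.add(eid)
    return causes


print(sorted(resolve("PEj", "Po1", "gym", 16)))  # ['e45', 'e46', 'e61']
```

Element e60 falls inside Pi1's time window but is excluded because Pi1 was bound to S1, not S17, at t=14; that is the dynamic-binding case that purely time-based filtering would get wrong.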
Key Research Benefits of TVC Model for Provenance
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
- Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance and Karma systems)
- Statefulness: time and value functions allow a compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
The TVC model's overhead decreases when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
Other provenance systems cannot handle O(100K events/sec) system throughput.
Quality of Events in Century
[Figure: CENTURY SERVER INFRASTRUCTURE architecture diagram.]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type.
  - This enables a pub-sub model of stream binding.
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Types of dynamism include:
  - Sensor (source) impairments: sensor faults (transient or persistent)
  - Transmission impairments: loss of stream elements, or delay and order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
  - The action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
  - Modification of analysis logic: the resolution of an analysis component (e.g. the coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
  - Dynamic rebinding of the stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE consumes input streams whose elements are missing or out of order (e.g. t-3, t-5, t-4, t+1) and raises an ALERT. Annotations: "What generated this alert, and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (with the remedy "Replace sensor"); "Stream elements are missing (possibly due to network issues)".]
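The dynamic-rebinding idea can be reduced to a simple selection policy: use the cheapest heart-rate source whose current QoE is acceptable, and fall back to the best available source otherwise. The function `select_source`, the cost figures and the QoE scale in [0, 1] are all hypothetical; the slide does not specify how costs or quality are quantified.

```python
def select_source(sources, min_quality):
    """Pick the cheapest source whose current QoE meets `min_quality`.

    `sources` is a list of (name, acquisition_cost, qoe) with qoe in [0, 1].
    If no source qualifies, fall back to the highest-quality one.
    """
    eligible = [s for s in sources if s[2] >= min_quality]
    if not eligible:
        return max(sources, key=lambda s: s[2])[0]
    return min(eligible, key=lambda s: s[1])[0]


hr_sources = [("SpO2", 1.0, 0.9), ("ECG", 5.0, 0.95)]  # hypothetical costs and QoE values
print(select_source(hr_sources, 0.7))  # SpO2: cheaper and good enough
hr_sources[0] = ("SpO2", 1.0, 0.4)     # SpO2 quality dips
print(select_source(hr_sources, 0.7))  # ECG
```

In practice some hysteresis would likely be needed so that a QoE value hovering around the threshold does not cause the stream graph to rebind repeatedly.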
Century QoE: Some Basic QoE Measures
[Figure, panel 1: a PE expected inputs t-5 to t but received only t-5, t-2 and t.] By correlating what was expected with what was received, we can estimate the quality of our input. In this case, we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure, panel 2: a PE receives one input with elements t-12, t-11, t-10 and another with elements t-5, t-2, t.] Another possibility is to correlate the timestamps of the inputs to estimate quality. In this case, we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure, panel 3: a PE consumes inputs t-5 to t, of which only t is the "useful" input.] Finally, we can correlate the consumed input with the useful input (the input that is actually used to trigger an output). Here, we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
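The three measures can each be written as a small function over timestamps. This is a sketch: the function names are hypothetical, and how Century combines these raw measures into a single QoE value is not specified here. The example reproduces the three panels with t = 0.

```python
def completeness(received, expected):
    """Fraction of expected timestamps that actually arrived."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(received)) / len(expected)


def sync_skew(latest_per_input):
    """Spread between the newest timestamps of a PE's inputs (0 means in sync)."""
    return max(latest_per_input) - min(latest_per_input)


def useful_fraction(consumed, useful):
    """Fraction of consumed elements that actually contributed to an output."""
    return len(useful) / len(consumed) if consumed else 0.0


t = 0
print(completeness([t - 5, t - 2, t], range(t - 5, t + 1)))  # 0.5: half the input is missing
print(sync_skew([t - 10, t]))                                 # 10: inputs are out of sync
print(useful_fraction(list(range(t - 5, t + 1)), [t]))        # 1 of 6 elements was useful
```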
Open Challenges in Health Stream Collection and Diagnostics
- Model-driven acquisition of data from sensor feeds
  - Many environments (e.g. retirement homes) exhibit redundancy in sensor type and coverage
  - HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2 and BP monitors
- Sensors have different, time-varying QoE
  - QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
- Provenance architectures must address the prohibitive cost of storing all stream data
  - Much of the intermediate analysis can be re-created if the state of analysis components can be stored
- Better techniques for establishing provenance relationships
  - User-specified models may not always be available: learn provenance from observed behavior
  - Explicit provenance may not be an accurate reflection of analysis logic
- Provenance data itself is sensitive and subject to privacy requirements
  - The right to access provenance may depend on the current 'role' of the provider and the user's medical history
Open questions:
- How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams
- How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
- How to develop a combination of hybrid model- and replay-based provenance
- How to define context-dependent access-control models for provenance data in a multi-provider environment
Conclusions
- A successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
- The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
  - Key features include adaptive event filtering, personalization and predictive data transmission
- The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
  - Key features include high-volume stream support, provenance and event/information quality
Result 2: Impact of Personalization on Event Filtering
[Figure: left panel, "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute, 0 to 200) over time. Right panel: data transmitted over the network (in KB) for User 2-1 and User 2-2 under four schemes: uncompressed, compressed, uncompressed with generic context filtering, and uncompressed with personalized context filtering.]
- Sensor streams exhibit significant statistical variation across individuals.
- The improvement from filtering (S3) to context-sensitive filtering (S4) is not significant, due to the lack of exact pattern matches among floating-point values in the LZ compressor.
Context-Based Filtering
HARMONI in Practice: Sensor-based Context
- Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
  - Higher 'normal' threshold for 3 distinct states (0-70, 70-110, 110-170) across all users
  - Aside: accelerometer readings are also used to classify 'falls' for elderly patients
- Bandwidth savings of 26-73% for our sample population
Context-Based Filtering
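Context-based filtering on the mobile hub can be sketched as: classify the activity state from the accelerometer, then transmit a heart-rate reading only if it exceeds that state's "normal" threshold. The slide's ranges are ambiguous; this sketch reads (0-70, 70-110, 110-170) as accelerometer-amplitude bins for sitting, walking and running, and the per-state heart-rate thresholds are hypothetical values, not HARMONI's. The names `classify_activity` and `to_transmit` are likewise illustrative.

```python
# Assumption: the slide's ranges (0-70, 70-110, 110-170) are read as
# accelerometer-amplitude bins for the three activity states.
STATE_BINS = [(70, "sitting"), (110, "walking")]  # amplitude < bound -> state
# Hypothetical upper "normal" heart rate per state (bpm), higher for more activity.
HR_NORMAL_MAX = {"sitting": 100, "walking": 130, "running": 170}


def classify_activity(amplitude: float) -> str:
    """Map an accelerometer amplitude to an activity state."""
    for bound, state in STATE_BINS:
        if amplitude < bound:
            return state
    return "running"  # 110 and above


def to_transmit(samples):
    """Keep only (amplitude, hr) samples whose HR is abnormal for the current activity."""
    return [(amp, hr) for amp, hr in samples if hr > HR_NORMAL_MAX[classify_activity(amp)]]


readings = [(20, 95), (20, 110), (90, 125), (150, 150), (150, 175)]
sent = to_transmit(readings)
print(sent)                           # [(20, 110), (150, 175)]
print(1 - len(sent) / len(readings))  # fraction of readings filtered out on the device
```

A personalized variant would replace the shared `HR_NORMAL_MAX` table with per-user thresholds, which is the distinction the generic vs. personalized context-filtering results draw.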
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytic infrastructure that supports the various services of the healthcare ecosystem.
Challenges and the corresponding architectural goals:
- Integrating massive volumes of disparate data: extensible data model, scalable infrastructure
- Need for sophisticated analytics: open and scalable analysis framework
- Growing collaboration across the ecosystem: enterprise sharing of sensor data
- Integrity of data and tracking of the analytic data: Quality of Events (QoE) and provenance support for backtracking and auditing
- Timely feedback: real-time processing
- Large number of concurrent users: scalable device and user management
- Privacy and security: protection of personal medical data
Technologies and approaches: Extensible Stream Storage System, Distributed Stream Analysis Runtime, Quality of Event, Hybrid Provenance System, Interoperability, Privacy & Security Technology, Stream Processing Platform.
Analysis Framework (1) The Approach
Current solutions: vertically designed systems for the analysis of health monitoring data exist today
- Closed: custom-built for a specific diagnosis
- Non-extensible: support for limited sensor devices
- Small patient footprint: only good for trials
New requirements:
- Extensible
  - To new data sources (e.g. EMG, EEG, accelerometer, activity)
  - To new analysis algorithms (e.g. myocardia, polyneuropathy, Alzheimer's)
- Scalability: handles large numbers of patients
- Robustness: resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings
- Easy to use: separate medical domain knowledge from IT knowledge
Our innovations:
- Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
- Programming model centered around the System S concept of a Processing Element (PE)
  - PEs represent atomic stream transformers
  - Stream data flow with pub-sub semantics
- Century also provides useful medical analytic services
  - Persistence of state (e.g. analyze HR variations over 6 months)
  - Management of patient data (e.g. a patient's intermediate diagnosis)
  - Computation and rebinding based on the quality of events from medical streams
- Century supports an open set of data sources
  - Currently supporting ECG, BP, glucose, weight, SpO2 and temperature
[Figure: example analysis jobs, each composed of analysis components (PEs) wired from a data source and source PE through processing stages to a GUI.]
Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.).
Analysis Framework (2) PE Example
- PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
- When an SDO arrives, System S routes it to the appropriate PE
- Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs
[Figure: the AR PE takes Input ECG and produces an Output Alert through Normalization, a Fast Fourier Transform, a Neural Network (trained during initialization) and an Alert Generator.]
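The point of the PE model is that the developer writes only the per-window logic. The sketch below shows what that logic might look like for the AR PE pipeline, using the standard library only. Two substitutions are made plainly: a naive O(n²) DFT stands in for a real FFT, and a single logistic unit with caller-supplied weights stands in for the trained neural network. All function names are hypothetical, and System S's actual PE interface is not shown.

```python
import cmath
import math
from statistics import mean, pstdev


def normalize(samples):
    """Z-score normalization of one ECG window."""
    mu, sigma = mean(samples), pstdev(samples)
    return [(x - mu) / sigma if sigma else 0.0 for x in samples]


def dft_magnitudes(samples):
    """Magnitude spectrum via a naive O(n^2) DFT (a stand-in for an FFT library)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


def nn_stand_in(features, weights, bias):
    """Hypothetical stand-in for the trained neural network: one logistic unit."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def ar_pe(window, weights, bias, threshold=0.5):
    """Normalization -> spectrum -> classifier -> alert generator, for one window."""
    score = nn_stand_in(dft_magnitudes(normalize(window)), weights, bias)
    return {"alert": score > threshold, "score": score}


print(dft_magnitudes([1.0, -1.0, 1.0, -1.0]))  # approximately [0, 0, 4]
print(ar_pe([1.0, -1.0, 1.0, -1.0], weights=[0.0, 0.0, 1.0], bias=-2.0))  # alert: True, score about 0.88
```

Routing SDOs between PEs, buffering and fault tolerance are left to the framework, which is the separation of concerns the slide describes.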
Century Event Management Service
[Figure: CENTURY SERVER INFRASTRUCTURE architecture diagram.]
IBM Research
copy 2007 IBM Corporation
Event Management Service (1)
Annotator
time 1 2 3
XML
Annotator
time 1 2 3
Not extensibleLimited scalability
Fully extensibleBut does not scale to
high event rate
The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders
Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language
XML
XML
XML
Potential Approaches
1 Relational DB (Annotation table)
2 Relational-XML DB (XML-based Annotation data model)
IBM Research
copy 2007 IBM Corporation
Event Management Service (3) Hybrid Storage Model
Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate
time
arrival rateaverage arrival rate
traffic average databaseservice rate
Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ
UnstabilityRegion
Annotator
time 1 2 3
3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)
Stream Buffer How do we designthe stream buffer
XMLXML
XML
XML
D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007
IBM Research
copy 2007 IBM Corporation
Event Management Service (4) The Stream Buffer
System S STG(File System Store)
Traffic Monitor
Storage Requests(events of type ldquoeventStorerdquo)
DatabasePE
Traffic Models Traffic
Predictor Event Store Load Monitor
Storage Requests
PE1
PE2
PE3
PE4
PE5SPE
Analysis Graph
Towards the delivery system
Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database
EventStore
(Relational-XML
IBM Research
copy 2007 IBM Corporation
Century Provenance Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM Corporation26
An Example of Data Provenance Use
Urgent AlertPatient Doe John
Condition Abnormal ReactionRecommendation
The Century System has found a developing issue in your patientrsquos
medical condition The condition of the patient is deteriorating A known
side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day
Ultimately the physician is responsible for medical decisions and any actions taken In order
for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition
With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe
A few days later Dr Lee receives an alert from the Century prescription program
The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)
Thatrsquos unusual and a large decrease Before agreeing to
this change I need to understand on what basis the
system has made this recommendation and how
accurate it is
IBM Research
copy 2006 IBM Corporation
Provenance (1) Two Approaches to Provenance
f1
f2
f3
Provenance ServerStore
I ran algorithm f1 with parameters uv and w at time t0 I published on port 1
PE1
PE2
I recvd some data on port 2 I ran algorithm f3 on the data at time t1
What is process provenancebull The ability to retrace
which ldquosystem componentsrdquowere on the data path
What is data provenancebull The ability to retrace
which ldquodata elementsrdquo ledto generation of output data
PEj
Sensor
Sensor
Data2
DataOut4
Data1 Data2
I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2
Data1
Data1
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
[Figure: CENTURY SERVER INFRASTRUCTURE block diagram, showing the USN Gateway, Sensor Data Reception, developer Toolkit (IDE, Sensor Simulator, Event Management Client Library), Patient, Administrator and Stakeholder Portals, Event Management Service, Provenance Service, QoE Manager, Analysis Framework, Group Management Service, Platform Service, IHE Adapter, Interoperability Container, and the associated storages (Event Store, Provenance Metadata, State Data, Group Data, Privacy Policy)]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– This enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Types of dynamism include:
– Sensor (source) impairments: transient or persistent sensor faults
– Transmission impairments: loss of stream elements, or delay and order reversal of stream elements
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE consumes timestamped stream elements and raises an ALERT. Annotations: "What generated this alert, and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (remedy: replace sensor); "Stream elements are missing (possibly due to network issues)"]
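The dynamic-rebinding bullet above can be sketched as a selection rule: keep using the cheaper SpO2-derived HR while its quality is acceptable, and rebind to the ECG-derived HR when it dips. This is a minimal sketch, assuming one scalar QoE score in [0, 1] per source and made-up relative costs; `HRSource`, `select_hr_source` and the 0.7 threshold are hypothetical, not Century's mechanism.

```python
from dataclasses import dataclass

@dataclass
class HRSource:
    name: str
    cost: float   # hypothetical relative acquisition cost
    qoe: float    # hypothetical current quality estimate in [0, 1]

def select_hr_source(sources, min_qoe=0.7):
    """Return the cheapest source whose QoE meets min_qoe; if none does, the highest-QoE source."""
    acceptable = [s for s in sources if s.qoe >= min_qoe]
    if acceptable:
        return min(acceptable, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.qoe)

spo2 = HRSource("SpO2", cost=1.0, qoe=0.9)
ecg = HRSource("ECG", cost=5.0, qoe=0.95)
print(select_hr_source([spo2, ecg]).name)  # SpO2: good enough and cheaper

spo2.qoe = 0.4                             # SpO2-derived HR dips in quality
print(select_hr_source([spo2, ecg]).name)  # ECG: rebind to the more expensive sensor
```

In a real stream graph the rule would be re-evaluated whenever a QoE estimate changes, and the subscription rewired accordingly.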
Century QoE Some Basic QoE Measures
[Figure: a PE expects input elements t-5, t-4, t-3, t-2, t-1, t; the current input contains only t-5, t-2, t]
By correlating what was expected with what was received, we can estimate the quality of our input. In this example, we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE receives one input carrying t-12, t-11, t-10 and another carrying t-5, t-2, t]
Another possibility is to correlate the timestamps of the inputs to estimate quality. In this example, we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: a PE consumes input elements t-5 through t, but only element t is "useful"]
Finally, we can correlate the consumed input with the useful input (the input that is actually used to trigger an output). Here we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
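The three measures described above can be computed directly from timestamps. This is a minimal sketch, assuming integer timestamps and simple ratio/difference definitions of our own choosing; the function names are hypothetical and the slides do not specify exact formulas.

```python
def completeness(expected, received):
    """Fraction of expected timestamps that actually arrived."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(received)) / len(expected)

def sync_skew(stream_a, stream_b):
    """Gap between the newest timestamps of two input streams; 0 means they are in sync."""
    return abs(max(stream_a) - max(stream_b))

def usefulness(consumed, useful):
    """Fraction of consumed elements that actually contributed to triggering an output."""
    consumed = set(consumed)
    if not consumed:
        return 0.0
    return len(consumed & set(useful)) / len(consumed)

t = 100  # current time on an arbitrary integer clock
expected = [t - 5, t - 4, t - 3, t - 2, t - 1, t]
print(completeness(expected, [t - 5, t - 2, t]))               # 0.5: half the input is missing
print(sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))  # 10: inputs are out of sync
print(round(usefulness(expected, [t]), 2))                     # 0.17: only 1 of 6 elements was useful
```

Each score could then be mapped onto a QoE level for the PE's output; how to combine them is left open here.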
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
How to develop a hybrid provenance scheme combining model-based and replay-based provenance
How to define context-dependent access-control models for provenance data in a multi-provider environment
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques are needed for establishing provenance relationships
– User-specified models may not always be available, so provenance may need to be learned from observed behavior
– Explicit provenance may not be an accurate reflection of the analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history
Conclusions
A successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay; it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission
The server must not simply be a repository of sensor streams; it must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality
HARMONI in Practice Sensor-based Context
Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' threshold for 3 distinct states (amplitude bands 0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients
Context-Based Filtering: bandwidth savings between 26% and 73% for our sample population
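Context-based filtering on the mobile hub can be sketched as: classify the activity state from accelerometer amplitude using the bands above, then forward a heart-rate reading only if it is abnormal for that state. This is a minimal sketch; the amplitude bands come from the slide, but the per-state HR limits, the function names and the toy readings are hypothetical, and the printed 75% is a property of this toy data, not a measured result.

```python
# Amplitude bands from the slide (units as reported by the accelerometer).
STATE_BANDS = [(0, 70, "sitting"), (70, 110, "walking"), (110, 170, "running")]

# Hypothetical per-state upper limits for "normal" heart rate (bpm); not from the slides.
HR_NORMAL_MAX = {"sitting": 100, "walking": 130, "running": 170}

def classify(amplitude):
    for low, high, state in STATE_BANDS:
        if low <= amplitude < high:
            return state
    return "unknown"  # outside the calibrated 0-170 range

def should_transmit(amplitude, heart_rate):
    """Forward a reading only if the HR is abnormal for the current activity context."""
    state = classify(amplitude)
    if state == "unknown":
        return True  # no reliable context: send rather than risk dropping an event
    return heart_rate > HR_NORMAL_MAX[state]

readings = [(50, 80), (50, 120), (90, 110), (150, 160)]  # (amplitude, HR) pairs
sent = [r for r in readings if should_transmit(*r)]
print(f"suppressed {1 - len(sent) / len(readings):.0%} of readings")  # suppressed 75% of readings
```

The same HR value (e.g., 120 bpm) is forwarded while sitting but suppressed while running, which is where the bandwidth savings come from.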
Chapter 3
The Motivation and Challenges for Remote Monitoring and Analytics
Harmoni: Context-based Event Processing on the Mobile Hub
Century Technologies: Stream Storage, Provenance and QoE
Century Research Goals and Technologies
Our research goal is to build an online health analytic infrastructure that supports the various services of the healthcare ecosystem.
Challenges: integrating massive volumes of disparate data; need for sophisticated analytics; growing collaboration across the ecosystem; integrity of data & tracking of the analytic data; timely feedback; large number of concurrent users; privacy and security
Architectural Goals: extensible data model and scalable infrastructure; open and scalable analysis framework; enterprise sharing of sensor data; Quality of Events (QoE) and provenance support for backtracking & auditing; real-time processing; scalable device and user management; protection of personal medical data
Technologies & Approaches: Extensible Stream Storage System; Distributed Stream Analysis Runtime; Quality of Events; Hybrid Provenance System; Interoperability; Privacy & Security Technology; Stream Processing Platform
Analysis Framework (1) The Approach
Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for a specific diagnosis
– Non-extensible: support for a limited set of sensor devices
– Small patient footprint: only good for trials
New Requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings
Easy to Use
– Separate medical domain knowledge from IT knowledge
Our Innovations
Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flows with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., a patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2, and temperature
[Figure: an analysis job composed of analysis components (PEs such as Source PE, SPA, SPM, DFE, PSD, VBF, NES, IPN, DUP, WLG, N(CE)S^2 and Join operators) fed by data sources, with an INQ GUI]
Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
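The pub-sub programming model described above can be illustrated with a toy router: a PE subscribes to a stream type and publishes derived SDOs, without knowing who produces or consumes them. This is a minimal sketch and does not model the System S API; `StreamBus`, `tachycardia_pe` and the 120 bpm threshold are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

SDO = Dict[str, Any]  # a Stream Data Object, modeled here as a plain dict

class StreamBus:
    """Toy pub-sub router: PEs subscribe to stream *types*, not to specific producers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[SDO], None]]] = defaultdict(list)

    def subscribe(self, stream_type: str, pe: Callable[[SDO], None]) -> None:
        self._subscribers[stream_type].append(pe)

    def publish(self, stream_type: str, sdo: SDO) -> None:
        for pe in self._subscribers[stream_type]:
            pe(sdo)

bus = StreamBus()
alerts: List[SDO] = []

def tachycardia_pe(sdo: SDO) -> None:
    """Domain logic only: the PE never opens a connection or names its producer."""
    if sdo["hr"] > 120:  # hypothetical threshold
        bus.publish("alert", {"patient": sdo["patient"], "reason": "high HR"})

bus.subscribe("hr", tachycardia_pe)
bus.subscribe("alert", alerts.append)

bus.publish("hr", {"patient": "p1", "hr": 80})
bus.publish("hr", {"patient": "p2", "hr": 135})
print(alerts)  # [{'patient': 'p2', 'reason': 'high HR'}]
```

Because bindings are by stream type, a new HR source (ECG- or SpO2-derived) can be added without touching `tachycardia_pe`.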
Analysis Framework (2) PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• System S routes each SDO to the appropriate PE
• Developers only need to code the logic of the algorithms the PE uses to annotate, modify, or create new SDOs
[Figure: AR PE. Input ECG → Normalization → Fast Fourier Transform → Neural Network (trained during initialization) → Alert Generator → Output Alert]
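The AR PE pipeline above (normalization, frequency transform, trained classifier, alert generator) can be sketched as plain functions. This is a minimal sketch under several assumptions: a naive DFT stands in for an FFT library, a single logistic unit with caller-supplied weights stands in for the neural network trained at initialization, and `ARPE` is a hypothetical name (the slides do not expand "AR").

```python
import cmath
import math

def normalize(samples):
    """Zero-mean, unit-variance scaling (constant input is only mean-shifted)."""
    mean = sum(samples) / len(samples)
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples)) or 1.0
    return [(x - mean) / std for x in samples]

def spectrum(samples):
    """Magnitudes of a naive O(n^2) DFT for bins 0..n/2; a real PE would use an FFT library."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * f * k / n) for k, x in enumerate(samples))) / n
        for f in range(n // 2 + 1)
    ]

class ARPE:
    """Hypothetical stand-in for the slide's AR PE: the trained neural network is replaced
    by a single logistic unit whose weights are supplied at initialization."""

    def __init__(self, weights, bias, threshold=0.5):
        self.weights, self.bias, self.threshold = weights, bias, threshold

    def process(self, ecg_window):
        features = spectrum(normalize(ecg_window))
        z = self.bias + sum(w * f for w, f in zip(self.weights, features))
        score = 1 / (1 + math.exp(-z))
        if score > self.threshold:
            return {"type": "alert", "score": score}  # output SDO for the alert stream
        return None                                   # no alert for this window
```

For example, `ARPE(weights=[0.0, 4.0, 0.0], bias=-1.0).process([0, 1, 0, -1])` returns an alert SDO, while a flat window such as `[1, 1, 1, 1]` returns `None`. Note that the PE contains only signal-processing logic; routing SDOs in and out is left to the platform.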
Century Event Management Service
[Figure: CENTURY SERVER INFRASTRUCTURE block diagram]
Event Management Service (1)
The Problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: an Annotator emitting events at times 1, 2, 3 into each store; in approach 2 the events are stored as XML documents]
Event Management Service (3) Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let $\mu$ be the service rate of the database, $\lambda$ the arrival rate at the database, and $\rho = \lambda/\mu$.
[Figure: arrival rate over time fluctuating around the average arrival rate; bursts above the average database service rate fall into the instability region]
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)
[Figure: an Annotator emits XML events at times 1, 2, 3 into a Stream Buffer in front of the Relational-XML DB]
Stream Buffer: how do we design the stream buffer?
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
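To see why the database must be kept away from $\rho \ge 1$, one can additionally assume exponential service times, i.e., an M/M/1 queue (our assumption; the slides only state Poisson arrivals). Standard results then give the mean number of events in the system $L = \rho/(1-\rho)$ and the mean time in the system $W = 1/(\mu - \lambda)$, both of which blow up as $\rho \to 1$: at $\rho = 0.5$ about 1 event is queued on average, at $\rho = 0.95$ about 19. The 1000 events/sec service rate below is hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state M/M/1 metrics; assumes Poisson arrivals and exponential service times."""
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError(f"unstable: rho = {rho:.2f} >= 1, so the queue grows without bound")
    return {
        "rho": rho,
        "mean_events_in_system": rho / (1 - rho),                   # L = rho / (1 - rho)
        "mean_time_in_system": 1 / (service_rate - arrival_rate),   # W = 1 / (mu - lambda)
    }

for lam in (500, 800, 950):  # events/sec against a hypothetical DB that stores 1000 events/sec
    m = mm1_metrics(lam, 1000)
    print(f"rho={m['rho']:.2f}  L={m['mean_events_in_system']:.1f}  "
          f"W={m['mean_time_in_system'] * 1000:.1f} ms")
```

Even when the average load is safe, a burst that pushes the instantaneous $\lambda$ above $\mu$ makes the backlog grow for the duration of the burst, which is what the stream buffer has to absorb.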
Event Management Service (4) The Stream Buffer
Approach
– Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
[Figure: stream buffer design. Components: an analysis graph (PE1 to PE5) issuing storage requests (events of type "eventStore") alongside traffic towards the delivery system; a Traffic Monitor (traffic models, traffic predictor, Event Store load monitor); a Database PE; the System S STG (file system store); and the Event Store (Relational-XML)]
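The throttling approach can be illustrated with a deliberately simple policy: admit at most a fixed fraction of the database's capacity per tick and keep the rest in FIFO order. This is a minimal sketch, not the slides' predictive design (which uses traffic models and a load monitor); the in-memory deque stands in for the System S STG file-system store, and `StreamBuffer` and the 0.8 target utilization are hypothetical.

```python
from collections import deque

class StreamBuffer:
    """Admit at most a fixed budget of storage requests to the DB per tick; queue the rest.

    The deque stands in for the slides' file-system store (System S STG)."""

    def __init__(self, db_capacity_per_tick, target_utilization=0.8):
        # Admitting only a fraction of capacity keeps the DB out of the rho >= 1 region.
        self.budget = max(1, int(db_capacity_per_tick * target_utilization))
        self.backlog = deque()

    def tick(self, arrivals):
        """Buffer new storage requests, then release up to `budget` of the oldest to the DB."""
        self.backlog.extend(arrivals)
        n = min(self.budget, len(self.backlog))
        return [self.backlog.popleft() for _ in range(n)]

buf = StreamBuffer(db_capacity_per_tick=10)      # budget: 8 requests per tick
burst = [f"event{i}" for i in range(20)]
for arrivals in (burst, [], [], []):
    sent = buf.tick(arrivals)
    print(len(sent), "to DB,", len(buf.backlog), "buffered")
```

A 20-event burst is spread over three ticks (8, 8, 4) instead of hitting the database at once; a predictive variant would adjust `budget` from the observed load.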
Century Provenance Service
[Figure: CENTURY SERVER INFRASTRUCTURE block diagram]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. Along with this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.
A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century system has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating, and a known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept this emerging technology, we must provide them with the information they need to make these decisions responsibly.
The information technologies that will provide the foundation for this understanding are provenance and Quality of Events (QoE).
Provenance (1) Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
[Figure: PEs running algorithms f1, f2, f3 report to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1."]
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
[Figure: two sensors produce Data1 and Data2, which PEj consumes to produce DataOut4. PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
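The two approaches above can be contrasted in a few lines: process provenance logs which component ran which algorithm and when, while annotation-based data provenance attaches input IDs to every output element. This is a minimal sketch; `DataElement`, `run_pe` and `mean` are hypothetical names, and the per-element `derived_from` list is exactly the per-record metadata that becomes heavyweight at high stream rates.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Tuple

_next_id = itertools.count(1)

@dataclass
class DataElement:
    value: float
    derived_from: List[int] = field(default_factory=list)  # data provenance: IDs of input elements
    eid: int = field(default_factory=lambda: next(_next_id))

process_log: List[Tuple[float, str, str]] = []  # process provenance: (time, PE, algorithm)

def mean(values):
    return sum(values) / len(values)

def run_pe(pe_name, algorithm, inputs, t):
    """Run `algorithm`, log the step (process provenance), annotate the output (data provenance)."""
    process_log.append((t, pe_name, algorithm.__name__))
    return DataElement(algorithm([x.value for x in inputs]), derived_from=[x.eid for x in inputs])

data1, data2 = DataElement(72.0), DataElement(75.0)  # readings from two sensors
data_out4 = run_pe("PEj", mean, [data1, data2], t=1.0)
print(data_out4.derived_from, process_log)
```

Here `data_out4.derived_from` lists the IDs of `data1` and `data2`, and `process_log` records that PEj ran `mean` at time 1.0.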
Provenance (2) Prior Work on Provenance
[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates]
– EU Provenance (PASOA, PreServ): captures workflow bindings
– Scientific SOA Workflows (Karma): workflow and application binding notifications; stateless, application-defined data provenance
– Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
– File Systems (PASS, LFS): captures system calls and modifications to file records; annotation per file
– Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level
– Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage Efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing Efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic Stream Bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that:
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both:
– Process provenance: "Show me the set of components involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
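The "reactive" requirement combined with dynamic stream bindings can be sketched as a per-port log that writes only when the bound stream changes and answers "which stream was bound at time t?" by binary search. This is a minimal sketch, assuming timestamps arrive in non-decreasing order; `BindingLog` is a hypothetical name.

```python
import bisect

class BindingLog:
    """Reactive record of the stream bound to one PE port: one entry per change, not per event."""

    def __init__(self):
        self.change_times = []
        self.streams = []

    def observe(self, t, stream_id):
        # Called for every event (t must be non-decreasing); writes only when the binding changes.
        if not self.streams or self.streams[-1] != stream_id:
            self.change_times.append(t)
            self.streams.append(stream_id)

    def bound_at(self, t):
        """Stream bound to the port at time t, or None if t precedes the first binding."""
        i = bisect.bisect_right(self.change_times, t) - 1
        return self.streams[i] if i >= 0 else None

log = BindingLog()
for t, stream in [(10, "S1"), (11, "S1"), (12, "S1"), (15, "S15"), (16, "S15")]:
    log.observe(t, stream)
print(log.change_times)  # [10, 15]: five events, two provenance records
print(log.bound_at(13), log.bound_at(15), log.bound_at(5))  # S1 S15 None
```

Storage grows with the number of rebindings rather than the number of events, which is what keeps provenance capture cheap for long-lived, high-rate streams.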
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "subsets", i.e., dependence over sets of time windows
– The output stream of PE_i is a function of specific samples (from input streams) within a designated interval Δ_j:
$$\mathrm{out}_i(t) \Leftarrow \bigcup_{s_j \text{ input to Component } i} \{\, s_j(t') : t' \in [t - \Delta_j,\, t] \,\}$$
[Figure: a Processing Element with input streams Sj (Sj(t), ..., Sj(t-3)) and Sk (Sk(t), ..., Sk(t-3)) and output stream Si (Si(t), Si(t-1), Si(t-2)); PE state defined by location context]
TVC dependency rule blocks (Rule Block ID, PE State, Dependency Rule):
1  "gym"   Ei(t) ⇐ Sj(t) ∪ Sj[t-3, t-2] ∪ Sk[t-2, t-1]
2  "home"  Ei(t) ⇐ Sj[t-2, t] ∪ Sk[t-1, t]
The dependency equation can, however, vary based on:
– The PE's internal state or some external context, e.g.:
  IF loc = "gym" THEN generate ALERT iff AVG(HR[t-10, t]) > 140
  ELSE generate ALERT iff AVG(HR[t-30, t]) > 90
– The solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
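The state-dependent rule blocks above can be encoded as data and applied as a filter over stored elements. This is a minimal sketch, assuming integer timestamps and inclusive window bounds; `RULE_BLOCKS` and `causative_set` are hypothetical names, and only the two rules are taken from the slide.

```python
# Rule blocks from the slide, keyed by PE state (location context).
# Each term is (input stream, window start, window end) relative to the output time t, inclusive.
RULE_BLOCKS = {
    "gym":  [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],  # Ei(t) <= Sj(t) u Sj[t-3,t-2] u Sk[t-2,t-1]
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],                   # Ei(t) <= Sj[t-2,t] u Sk[t-1,t]
}

def causative_set(t, state, elements):
    """Apply the TVC rule for `state` to (stream_id, timestamp) pairs; return the causes of Ei(t)."""
    rule = RULE_BLOCKS[state]
    return sorted(
        (stream, ts)
        for stream, ts in elements
        if any(stream == s and t + lo <= ts <= t + hi for s, lo, hi in rule)
    )

stored = [(s, ts) for s in ("Sj", "Sk") for ts in range(5, 11)]
print(causative_set(10, "home", stored))  # [('Sj', 8), ('Sj', 9), ('Sj', 10), ('Sk', 9), ('Sk', 10)]
print(causative_set(10, "gym", stored))   # [('Sj', 7), ('Sj', 8), ('Sj', 10), ('Sk', 8), ('Sk', 9)]
```

Only the rule table and the recorded PE state are stored; the causative set is computed on demand, which is the source of the model's low per-element overhead.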
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Trade-off: faster provenance storage and processing in exchange for higher provenance reconstruction overhead
Resolution steps:
– Determine the IDs of input ports and the causative stream bindings
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
– Apply the TVC filter to obtain the causative set of elements
Stream and Port Mapping (SAPM):
1. Store bindings between every input/output port and the associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port/input port pair
Example tables:
– PC Dependency Table: PEj, f1(), Po1
– Dynamic Stream Mapping Table: t=10 S1 Pi1; t=15 S15 I4; t=16 S17 O6
– Data Store: element45 ts=15 SID=12 state="gym"; element66 ts=16 SID=17 state="home"
Provenance Query Server, Query(Ei):
1. Obtain I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve elements
4. Apply the equation to determine the set of causative elements
Return the set of elements
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
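The query-resolution flow above can be sketched end to end with three small in-memory tables: a dependency table (TVC terms per output port), a dynamic stream mapping table (binding changes per input port), and a data store. This is a minimal sketch; all table contents and names (`DEPENDENCIES`, `BINDINGS`, `DATA_STORE`, `resolve`) are hypothetical, and a real system would index the data store by stream and timestamp instead of scanning it.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    eid: str
    ts: int
    stream: str

# PC dependency table: (PE, output port) -> TVC terms (input port, window start, window end).
DEPENDENCIES = {("PEj", "Po1"): [("Pi1", -2, 0), ("Pi2", -1, 0)]}
# Dynamic stream mapping table: input port -> [(time of binding change, stream)], sorted by time.
BINDINGS = {"Pi1": [(0, "S1"), (15, "S15")], "Pi2": [(0, "S7")]}
# Data store of stream elements.
DATA_STORE = [
    Element("e1", 13, "S1"), Element("e2", 14, "S1"), Element("e3", 15, "S15"),
    Element("e4", 16, "S15"), Element("e5", 15, "S7"), Element("e6", 16, "S7"),
]

def stream_at(port, t):
    """Look up which stream was bound to `port` at time t."""
    changes = BINDINGS[port]
    i = bisect.bisect_right([when for when, _ in changes], t) - 1
    return changes[i][1] if i >= 0 else None

def resolve(pe, out_port, t):
    """Return IDs of the elements that caused the output of (pe, out_port) at time t."""
    causes = set()
    for in_port, lo, hi in DEPENDENCIES[(pe, out_port)]:   # retrieve the dependency equation
        for e in DATA_STORE:                                 # retrieve candidate elements
            if not (t + lo <= e.ts <= t + hi):               # apply the TVC time window
                continue
            if stream_at(in_port, e.ts) == e.stream:         # use the stream mapping valid at e.ts
                causes.add(e.eid)
    return sorted(causes)

print(resolve("PEj", "Po1", 16))  # ['e2', 'e3', 'e4', 'e5', 'e6']
```

Looking up the binding at each element's own timestamp (rather than at query time) is what lets the query follow port Pi1 across its rebinding from S1 to S15 at t=15.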
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2006 IBM Corporation
Chapter 3
The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics
HarmoniHarmoni Context Context--based Event based Event Processing on the Mobile HubProcessing on the Mobile Hub
Century Technologies Stream Storage Provenance and QoE
IBM Research
copy 2006 IBM Corporation
Century Research Goals and TechnologiesOur research goal is to make Online Health Analytic Infrastructure for healthcare ecosystem to support the various services
Challenges Architectural Goals Technologies amp Approaches
Integrating massive volumes of disparate data
Extensible Data ModelScalable Infrastructure
Need for sophisticated analytics
Growing collaboration across ecosystem
Integrity of data ampTracking the analytic data
Timely feedback
Open and scalable analysis framework
Enterprise sharing of sensor data
Quality of Events (QoE) and provenance support for backtracking amp auditing
Real-time Processing
Large number of concurrent users
Privacy and Security
Scalable device and user management
Protection of personal medical data
Extensible Stream Storage System
Distributed Stream Analysis Runtime
Quality Of Event
Hybrid Provenance System
Interoperability
Privacy amp Security Technology
Stream Processing Platform
IBM Research
copy 2007 IBM Corporation
Extensiblendash To new data sources (eg EMG EEG accelerometer Activity)ndash To new analysis algorithms (eg myocardia polyneuropathy Alzheimer)
Scalabilityndash Handles large numbers of patients
Robustnessndash Resilient to sensor or infrastructure failures
bull Exacerbated in remote monitoring settings
Easy to Usendash Separate medical domain knowledge from IT knowledge
Framework built on top of IBM System Sndash state of the art middleware for high-volume stream processing analytics
Programming model centered around System S concept of Processing Element (PE)
ndash PEs represent atomic stream transformersndash Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
ndash Persistence of state (eg analyze HR variations over 6 months)ndash Management of patient data (eg patientrsquos intermediate diagnosis)ndash Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sourcesndash Currently supporting ECG BP Glucose Weight SPO2 Temperature
Vertically designed systems for the analysis of health monitoring data exist today
ndash Closed custom-built for specific diagnosisndash Non-extensible Support for limited sensor devicesndash Small patient footprint only good for trials
Analysis Framework (1) The Approach
SPA
SPM
DFE PSD
INQ GUI
DataSource
Source PE
NES IPNVBF
ldquoMarkrdquo
DUP
WLG
IPN
NES
DUP
PSDDFESPASPM
VBFSPA
SPM DFE PSDDataSource
VBF
Source PEWLG SGD PCP JAE DSN
N(CE)S^2
DUPP
IPN
SCF
JoinCCIPN
N(CE)S^2
PFA IPN
SSFJoinSC IPN
N(CE)S^2
SLF
JoinL
IPN
NES
DUP
IPN
ES IPN
JoinL IPN
DataSource
Source PEWLG
SPA
SPM
DFE PSD
INQ GUI
DataSource
Source PE
NES IPNVBF
ldquoMarkrdquo
DUP
WLG
SPA
SPM
DFE PSD
INQ GUI
DataSource
Source PE
NES IPNVBF
ldquoMarkrdquo
DUP
WLG
Analysis Component
AnalysisJob
Major Practical BenefitSeparate PE developerrsquos job (medical domain knowledge) from infrastructure issues(scalability fault tolerance stream connections etc)
Current Solutions New Requirements Our Innovations
IBM Research
copy 2007 IBM Corporation
Analysis Framework (2) PE Example
bull PE developers need not to be concerned with the communication of Stream Data Objectsbull When an SDO is received by the PE System S routes the SDO to the appropriate PE bull Developers only need to code the logic of the algorithms used by the PE to annotate modify
or create new SDOs
Normalization
Fast FourierTransform
Input ECG
AR PE
AlertGenerator
Output Alert
Neural Network Trained during
initialization
IBM Research
copy 2007 IBM Corporation
Century Event Management Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2007 IBM Corporation
Event Management Service (1)
Annotator
time 1 2 3
XML
Annotator
time 1 2 3
Not extensibleLimited scalability
Fully extensibleBut does not scale to
high event rate
The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders
Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language
XML
XML
XML
Potential Approaches
1 Relational DB (Annotation table)
2 Relational-XML DB (XML-based Annotation data model)
IBM Research
copy 2007 IBM Corporation
Event Management Service (3) Hybrid Storage Model
Key assumptions
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ. Bursts in which the arrival rate exceeds the service rate (ρ > 1) fall in the instability region, where the database backlog grows.
[Figure: traffic arrival rate over time, compared with the average arrival rate and the average database service rate; the instability region is marked.]
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)
Stream buffer: how do we design the stream buffer?
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
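To make the stability condition concrete, here is a minimal Python sketch computing textbook M/M/1 steady-state metrics. It assumes, beyond the slide's Poisson arrivals, exponentially distributed database service times; the function name and the example rates (800 and 1000 events/s) are hypothetical. Utilization ρ = λ/μ must stay below 1, and as ρ approaches 1 the expected backlog ρ/(1 − ρ) grows without bound.

```python
from dataclasses import dataclass


@dataclass
class MM1Metrics:
    rho: float                  # utilization, lambda / mu
    mean_in_system: float       # expected events queued or in service, rho / (1 - rho)
    mean_time_in_system: float  # expected time an event spends in the system, 1 / (mu - lambda)


def mm1_metrics(arrival_rate: float, service_rate: float) -> MM1Metrics:
    """Steady-state M/M/1 metrics; raises ValueError when rho >= 1 (no steady state)."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError(f"unstable: rho = {rho:.2f} >= 1, the backlog grows without bound")
    return MM1Metrics(
        rho=rho,
        mean_in_system=rho / (1.0 - rho),
        mean_time_in_system=1.0 / (service_rate - arrival_rate),
    )


m = mm1_metrics(arrival_rate=800.0, service_rate=1000.0)  # events per second
print(round(m.rho, 3), round(m.mean_in_system, 3), round(m.mean_time_in_system, 4))  # 0.8 4.0 0.005
```

Note how sharply the backlog rises near saturation: at λ = 950 the same formula gives ρ/(1 − ρ) = 19, which is why throttling the traffic that reaches the database is attractive.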
Event Management Service (4) The Stream Buffer
[Figure: stream buffer design. Labeled elements: analysis graph (PE1 to PE5 and an SPE); storage requests (events of type "eventStore"); Traffic Monitor; Traffic Models; Traffic Predictor; Event Store Load Monitor; Database PE; System S STG (file system store); Event Store (Relational-XML DB); output towards the delivery system.]
Approach
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database
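The throttling idea can be sketched in a few lines of Python. This is a simplified illustration, not the Century design: the class name `StreamBuffer`, the notion of a "tick", and the fixed `target_utilization` are hypothetical. The slide's Traffic Predictor and Load Monitor would presumably adjust the drain rate adaptively, whereas here it is a constant fraction of the database service rate, and an in-memory deque stands in for the file-system store.

```python
from collections import deque


class StreamBuffer:
    """Hypothetical throttle placed in front of the event-store database.

    Storage requests are queued in a FIFO (standing in for the file-system
    store); each tick forwards at most a fixed budget of events, chosen as a
    fraction of the database service rate so the offered load stays below it.
    """

    def __init__(self, service_rate_per_tick: int, target_utilization: float = 0.8):
        if not 0.0 < target_utilization < 1.0:
            raise ValueError("target_utilization must be in (0, 1)")
        self.budget_per_tick = max(1, int(service_rate_per_tick * target_utilization))
        self.pending = deque()

    def offer(self, events):
        """Absorb a burst of storage requests without touching the database."""
        self.pending.extend(events)

    def drain(self, write_batch):
        """Forward at most budget_per_tick queued events, oldest first."""
        n = min(self.budget_per_tick, len(self.pending))
        batch = [self.pending.popleft() for _ in range(n)]
        if batch:
            write_batch(batch)
        return len(batch)


stored = []
buf = StreamBuffer(service_rate_per_tick=100, target_utilization=0.8)
buf.offer(range(250))                      # a burst of 250 events in one tick
written = [buf.drain(stored.extend) for _ in range(4)]
print(written, len(buf.pending))           # [80, 80, 80, 10] 0
```

The burst is spread over four ticks, so the database never sees more than 80% of its capacity, at the cost of added storage latency for the buffered events.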
Century Provenance Service
[Figure: Century server infrastructure architecture diagram.]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction. Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly. The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1) Two Approaches to Provenance
What is process provenance? The ability to retrace which "system components" were on the data path.
[Figure: processing elements PE1 and PE2, running algorithms f1 to f3, and a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v, and w at time t0. I published on port 1." PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1."]
What is data provenance? The ability to retrace which "data elements" led to the generation of output data.
[Figure: two sensors produce Data1 and Data2, which flow into PEj. PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
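The two approaches can be contrasted with a small Python sketch. The record layouts (`ProcessRecord`, `DataElement`), the parameter values, and the `lineage` helper are hypothetical illustrations: process provenance is a log of what each component did, while data provenance attaches parent IDs to each output element so the sources can be retraced by walking those links.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    # Process provenance: what a component did, reported to a provenance store.
    pe: str
    algorithm: str
    params: dict
    time: float
    port: int


@dataclass
class DataElement:
    # Data provenance: each element carries the IDs of the elements it came from.
    id: str
    derived_from: list = field(default_factory=list)


def lineage(element_id, elements):
    """Return the IDs of all raw (source) elements an element depends on."""
    stack, sources, seen = [element_id], set(), set()
    while stack:
        eid = stack.pop()
        if eid in seen:
            continue
        seen.add(eid)
        parents = elements[eid].derived_from
        if parents:
            stack.extend(parents)
        else:
            sources.add(eid)          # no parents: a raw sensor element
    return sources


process_log = [ProcessRecord("PE1", "f1", {"u": 1, "v": 2, "w": 3}, time=0.0, port=1)]
r = process_log[0]
print(f"{r.pe} ran {r.algorithm} at t={r.time}, published on port {r.port}")

elements = {e.id: e for e in [DataElement("Data1"), DataElement("Data2"),
                              DataElement("DataOut4", ["Data1", "Data2"])]}
print(sorted(lineage("DataOut4", elements)))  # ['Data1', 'Data2']
```

Per-element annotation like `derived_from` is exactly what becomes expensive at high stream rates, which motivates the challenges discussed next.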
Provenance (2) Prior Work on Provenance
[Figure: prior provenance work positioned by type of provenance reconstruction (process provenance vs. data provenance) and by event and processing rates.]
- EU Provenance (PASOA, PreServ): captures workflow bindings
- Scientific SOA workflows (Karma): workflow and application binding notifications; stateless, application-defined data provenance
- Database view inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
- Stream provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
- File systems (PASS, LFS): captures system calls and modifications to file records; annotation per file
- Medical stream provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight.
Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput.
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data.
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time.
We need a provenance system that:
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model
We need to support both:
- Process provenance: "Show me the set of components that are involved in the generation of alert E_i."
- Data provenance: "Show me the stream segments related to the generation of alert E_i."
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports.
- The dependency is thus not specified on every data element, but remains constant for the same processing logic.
Provenance data is recorded at the granularity of stream "sub-sets", i.e., dependence over sets of time windows.
- The output stream of PE_i is a function of specific samples (from input streams) within a designated interval Δ.
$$S_i^{\mathrm{out}}(t) \;\Leftarrow\; \bigcup_{S_j \,\text{input to}\, \mathrm{PE}_i} \left\{\, S_j(t') : t' \in [\,t - \Delta_j,\; t\,] \right\}$$
[Figure: a processing element with input streams S_j (samples S_j(t) to S_j(t-3)) and S_k (samples S_k(t) to S_k(t-3)), and output stream S_i (samples S_i(t) to S_i(t-2)).]
TVC dependency rule blocks (PE state defined by location context):
- Rule block 1, PE state "gym": E_i(t) ⇐ S_j(t) ∪ S_j[t-3, t-2] ∪ S_k[t-2, t-1]
- Rule block 2, PE state "home": E_i(t) ⇐ S_j[t-2, t] ∪ S_k[t-1, t]
The dependency equation can, however, vary based on:
- The PE's internal state or some external context, for example:
  IF loc = "gym" THEN generate ALERT iff AVG(HR[t-10, t]) > 140
  ELSE generate ALERT iff AVG(HR[t-30, t]) > 90
- Solution: allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data
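A minimal Python sketch of the state-dependent rule above, assuming HR samples arrive one per integer time step and are held in a list indexed by time. The names `RULES`, `DEFAULT_RULE`, `Alert`, and `check_hr` are hypothetical. The point is that the alert records the state it was generated under, so a later provenance query can select the matching rule block.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical rule table: location state -> (window length in samples, HR threshold in bpm).
RULES = {"gym": (10, 140.0)}
DEFAULT_RULE = (30, 90.0)          # the ELSE branch


@dataclass
class Alert:
    ts: int
    state: str      # stored as metadata so provenance queries can pick the rule block
    avg_hr: float


def check_hr(hr_samples, ts, loc):
    """Average HR over [ts - window, ts] (inclusive) and alert if above the threshold."""
    window, threshold = RULES.get(loc, DEFAULT_RULE)
    lo = max(0, ts - window)
    avg = float(mean(hr_samples[lo:ts + 1]))
    return Alert(ts=ts, state=loc, avg_hr=avg) if avg > threshold else None


hr = [100] * 40
print(check_hr(hr, 35, "gym"))    # None
print(check_hr(hr, 35, "home"))   # Alert(ts=35, state='home', avg_hr=100.0)
```

The same 100 bpm reading is normal at the gym but triggers an alert at home, so the dependency window that explains an alert depends on the stored state.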
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query.
- For efficiency, the relevant metadata is distributed across multiple DBs.
- Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead.
Resolving a query:
- Determine the IDs of input ports and the causative stream bindings.
- Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation.
- Apply the TVC filter to obtain the causative set of elements.
Example tables:
- PC Dependency Table: (PEj, f1(), Po1)
- Dynamic Stream Mapping Table: (t=10, S1, Pi1); (t=15, S15, I4); (t=16, S17, O6)
- Data Store: element45 (ts=15, SID=12, state="gym"); element66 (ts=16, SID=17, state="home")
[Figure: Provenance Query Server with a Stream and Port Mapping (SAPM) component.]
Storage:
1. Store bindings between every input/output port and the associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port/input port pair
Query(E_i):
1. Obtain I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve elements
4. Apply the equation to determine the set of causative elements
Return the set of elements.
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
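The four query steps can be sketched in Python as follows. Everything here is a hypothetical, in-memory stand-in for the distributed tables: `BINDINGS` plays the dynamic stream mapping table, `RULE_BLOCKS` encodes the two rule blocks from the TVC slide (port Pi1 standing for S_j, Pi2 for S_k), and the store is a plain list. As a simplification, the binding in force at the alert time is assumed to hold over the whole dependency window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    eid: str
    ts: int
    stream: str


# Hypothetical dynamic stream mapping table: (bound_at, stream_id, input_port).
BINDINGS = [(10, "S1", "Pi1"), (15, "S15", "Pi2")]

# Hypothetical TVC rule blocks for one output port, keyed by the PE state stored
# with the alert. Each entry (input_port, lo, hi) marks elements with timestamps
# in [t + lo, t + hi] on the stream bound to that port as causative.
RULE_BLOCKS = {
    "gym":  [("Pi1", 0, 0), ("Pi1", -3, -2), ("Pi2", -2, -1)],
    "home": [("Pi1", -2, 0), ("Pi2", -1, 0)],
}


def stream_on_port(port, t):
    """Step 1: the most recent stream bound to `port` at or before time t."""
    current = None
    for bound_at, stream, p in sorted(BINDINGS):
        if p == port and bound_at <= t:
            current = stream
    return current


def resolve(alert_ts, alert_state, store):
    """Return the IDs of the elements that caused an alert raised at alert_ts."""
    causes = set()
    for port, lo, hi in RULE_BLOCKS[alert_state]:      # step 2: rule block for the state
        stream = stream_on_port(port, alert_ts)
        if stream is None:                             # port not yet bound at alert time
            continue
        for e in store:                                # step 3: retrieve the window
            if e.stream == stream and alert_ts + lo <= e.ts <= alert_ts + hi:
                causes.add(e.eid)                      # step 4: keep causative elements
    return causes


store = ([Element(f"S1@{t}", t, "S1") for t in range(10, 21)]
         + [Element(f"S15@{t}", t, "S15") for t in range(15, 21)])
print(sorted(resolve(18, "home", store)))
```

Nothing is annotated per element at ingest time; all the work happens when a query arrives, which is the tradeoff stated above.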
Key Research Benefits of TVC Model for Provenance
TVC model overhead decreases when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
- Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 ms) provenance latency, compared to O(s) in the EU Provenance and Karma systems)
- Statefulness: time and value functions allow a compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 K events/s) system throughput.
Quality of Events in Century
[Figure: Century server infrastructure architecture diagram.]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type.
- This enables a pub-sub model of stream binding.
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- Action taken on the analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
- Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
- Dynamic rebinding of the stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE raises an ALERT from input streams whose elements are missing or out of order, prompting the question "What generated this alert, and can we trust it?" Callouts: "The sensor is faulty and generates stream elements of low quality"; "Stream elements are missing (possibly due to network issues)"; "Replace sensor".]
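Dynamic rebinding driven by QoE can be sketched as a selection policy. This Python example is an assumption-laden illustration: the `HrSource` record, the relative costs, the quality scores in [0, 1], and the "cheapest source that meets a quality floor" rule are all hypothetical, chosen to mirror the SpO2-to-ECG example.

```python
from dataclasses import dataclass


@dataclass
class HrSource:
    name: str
    cost: float      # relative acquisition cost (hypothetical units)
    quality: float   # current QoE estimate in [0, 1]


def choose_source(sources, min_quality):
    """Bind to the cheapest source whose current QoE meets min_quality;
    fall back to the highest-quality source if none qualifies."""
    if not sources:
        raise ValueError("no HR sources available")
    ok = [s for s in sources if s.quality >= min_quality]
    if ok:
        return min(ok, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.quality)


spo2 = HrSource("SpO2", cost=1.0, quality=0.9)
ecg = HrSource("ECG", cost=5.0, quality=0.95)
print(choose_source([spo2, ecg], min_quality=0.8).name)  # SpO2
spo2.quality = 0.4                                        # SpO2 readings dip in quality
print(choose_source([spo2, ecg], min_quality=0.8).name)  # ECG
```

A production policy would likely add hysteresis so that a source hovering around the threshold does not cause the stream graph to rebind repeatedly.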
Century QoE Some Basic QoE Measures
[Figure: a PE receiving input elements with timestamps t-5, t-2, and t, while elements t-5 through t were expected.]
By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this case we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE receiving two inputs, one with timestamps t-12, t-11, t-10 and the other with t-5, t-2, t.]
Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this case we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: a PE consuming input elements t-5 through t, of which only a small fraction is the "useful" input.]
Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
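One plausible way to turn the three measures into numbers is sketched below in Python. The slide does not define formulas; these ratios and the skew definition are assumptions, and the example uses t = 100 with integer timestamps to reproduce the three figures.

```python
def completeness(expected_ts, received_ts):
    """Fraction of the expected timestamps that were actually received."""
    expected = set(expected_ts)
    if not expected:
        return 1.0
    return len(expected & set(received_ts)) / len(expected)


def sync_skew(*input_ts):
    """Gap between the newest elements of the most- and least-advanced inputs."""
    latest = [max(ts) for ts in input_ts if ts]
    return max(latest) - min(latest) if latest else 0


def usefulness(consumed, useful):
    """Fraction of consumed elements that actually contributed to the output."""
    consumed = set(consumed)
    return len(consumed & set(useful)) / len(consumed) if consumed else 0.0


t = 100
print(completeness(range(t - 5, t + 1), [t - 5, t - 2, t]))    # 0.5
print(sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))  # 10
print(round(usefulness(range(t - 5, t + 1), [t]), 3))          # 0.167
```

Each measure is computed locally at a PE from timestamps it already sees, so none of them requires extra per-element metadata.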
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
- Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
- HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors
Sensors have different, time-varying QoE
- QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
Provenance architectures must address the prohibitive cost of storing all stream data
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques for establishing provenance relationships
- User-specified models may not always be available: learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of analysis logic
Provenance data itself is sensitive and subject to privacy requirements
- The right to access provenance may depend on the current 'role' of the provider and the user's medical history
Open questions:
- How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
- How to adaptively alter the acquisition pattern based on cost and accuracy needs?
- How to develop a combination of hybrid model-based and replay-based provenance?
- How to define context-dependent access-control models of provenance data in a multi-provider environment?
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay.
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure.
- Key features include adaptive event filtering, personalization, and predictive data transmission.
The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics.
- Key features include high-volume stream support, provenance, and event/information quality.
Century Research Goals and Technologies
Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports its various services.
Challenges and corresponding architectural goals:
- Integrating massive volumes of disparate data: extensible data model, scalable infrastructure
- Need for sophisticated analytics: open and scalable analysis framework
- Growing collaboration across the ecosystem: enterprise sharing of sensor data
- Integrity of data and tracking of the analytic data: quality of events (QoE) and provenance support for backtracking and auditing
- Timely feedback: real-time processing
- Large number of concurrent users: scalable device and user management
- Privacy and security: protection of personal medical data
Technologies and approaches:
- Extensible stream storage system
- Distributed stream analysis runtime
- Quality of events
- Hybrid provenance system
- Interoperability
- Privacy and security technology
- Stream processing platform
Analysis Framework (1) The Approach
Current solutions
- Vertically designed systems for the analysis of health monitoring data exist today
- Closed: custom-built for a specific diagnosis
- Non-extensible: support for limited sensor devices
- Small patient footprint: only good for trials
New requirements
- Extensible
  - To new data sources (e.g., EMG, EEG, accelerometer, activity)
  - To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
- Scalability: handles large numbers of patients
- Robustness: resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings
- Easy to use: separates medical domain knowledge from IT knowledge
Our innovations
- Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
- Programming model centered around the System S concept of a Processing Element (PE)
  - PEs represent atomic stream transformers
  - Stream data flow with pub-sub semantics
- Century also provides useful medical analytic services
  - Persistence of state (e.g., analyze HR variations over 6 months)
  - Management of patient data (e.g., a patient's intermediate diagnosis)
  - Computation and rebinding based on the quality of events from medical streams
- Century supports an open set of data sources
  - Currently supporting ECG, BP, glucose, weight, SpO2, and temperature
[Figure: several analysis jobs, each a graph of analysis components (PEs such as SPA, SPM, DFE, PSD, NES, IPN, VBF, DUP, WLG, and join operators) fed by a data source PE.]
Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.).
Analysis Framework (2) PE Example
- PE developers need not be concerned with the communication of Stream Data Objects (SDOs).
- System S routes each received SDO to the appropriate PE.
- Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs.
[Figure: the AR PE. Input ECG → Normalization → Fast Fourier Transform → Neural Network (trained during initialization) → Alert Generator → Output Alert.]
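The programming model can be illustrated with a toy Python sketch. It is not the System S API: the `PE` base class, the `run_chain` runtime stand-in, and the SDO-as-dictionary representation are hypothetical. A naive DFT stands in for the FFT, and a simple band-energy threshold replaces the trained neural network, so only the shape of the AR PE pipeline is shown. Each stage implements just `process()`, leaving routing to the runtime.

```python
import cmath
import math
from abc import ABC, abstractmethod


class PE(ABC):
    """Hypothetical PE interface: the developer implements process() only;
    routing SDOs between PEs is left to the runtime."""

    @abstractmethod
    def process(self, sdo: dict) -> list:
        """Consume one SDO and return zero or more output SDOs."""


class Normalize(PE):
    def process(self, sdo):
        x = sdo["samples"]
        mu = sum(x) / len(x)
        sd = (sum((v - mu) ** 2 for v in x) / len(x)) ** 0.5 or 1.0  # avoid /0 on flat input
        return [{**sdo, "samples": [(v - mu) / sd for v in x]}]


class Spectrum(PE):
    def process(self, sdo):
        # Naive O(n^2) DFT magnitudes, standing in for the FFT stage.
        x = sdo["samples"]
        n = len(x)
        mags = [abs(sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n)))
                for f in range(n // 2 + 1)]
        return [{**sdo, "spectrum": mags}]


class BandEnergyAlert(PE):
    """Placeholder for the trained neural network and alert generator:
    alerts when spectral energy in a bin range exceeds a threshold."""

    def __init__(self, lo_bin, hi_bin, threshold):
        self.lo_bin, self.hi_bin, self.threshold = lo_bin, hi_bin, threshold

    def process(self, sdo):
        energy = sum(sdo["spectrum"][self.lo_bin:self.hi_bin])
        if energy > self.threshold:
            return [{"alert": True, "patient": sdo["patient"], "energy": energy}]
        return []


def run_chain(pes, sdo):
    """Toy stand-in for the runtime: push an SDO through a linear chain of PEs."""
    batch = [sdo]
    for pe in pes:
        batch = [out for item in batch for out in pe.process(item)]
    return batch


chain = [Normalize(), Spectrum(), BandEnergyAlert(3, 6, threshold=10.0)]
ecg_like = {"patient": "p1", "samples": [math.sin(2 * math.pi * 4 * k / 32) for k in range(32)]}
print(run_chain(chain, ecg_like))
print(run_chain(chain, {"patient": "p1", "samples": [1.0] * 32}))  # []
```

Because each stage only maps an SDO to zero or more SDOs, swapping the threshold stage for a real classifier would not change the other PEs.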
IBM Research
copy 2007 IBM Corporation
Century Event Management Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2007 IBM Corporation
Event Management Service (1)
Annotator
time 1 2 3
XML
Annotator
time 1 2 3
Not extensibleLimited scalability
Fully extensibleBut does not scale to
high event rate
The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders
Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language
XML
XML
XML
Potential Approaches
1 Relational DB (Annotation table)
2 Relational-XML DB (XML-based Annotation data model)
IBM Research
copy 2007 IBM Corporation
Event Management Service (3) Hybrid Storage Model
Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate
time
arrival rateaverage arrival rate
traffic average databaseservice rate
Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ
UnstabilityRegion
Annotator
time 1 2 3
3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)
Stream Buffer How do we designthe stream buffer
XMLXML
XML
XML
D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007
IBM Research
copy 2007 IBM Corporation
Event Management Service (4) The Stream Buffer
System S STG(File System Store)
Traffic Monitor
Storage Requests(events of type ldquoeventStorerdquo)
DatabasePE
Traffic Models Traffic
Predictor Event Store Load Monitor
Storage Requests
PE1
PE2
PE3
PE4
PE5SPE
Analysis Graph
Towards the delivery system
Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database
EventStore
(Relational-XML
IBM Research
copy 2007 IBM Corporation
Century Provenance Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM Corporation26
An Example of Data Provenance Use
Urgent AlertPatient Doe John
Condition Abnormal ReactionRecommendation
The Century System has found a developing issue in your patientrsquos
medical condition The condition of the patient is deteriorating A known
side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day
Ultimately the physician is responsible for medical decisions and any actions taken In order
for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition
With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe
A few days later Dr Lee receives an alert from the Century prescription program
The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)
Thatrsquos unusual and a large decrease Before agreeing to
this change I need to understand on what basis the
system has made this recommendation and how
accurate it is
IBM Research
copy 2006 IBM Corporation
Provenance (1) Two Approaches to Provenance
f1
f2
f3
Provenance ServerStore
I ran algorithm f1 with parameters uv and w at time t0 I published on port 1
PE1
PE2
I recvd some data on port 2 I ran algorithm f3 on the data at time t1
What is process provenancebull The ability to retrace
which ldquosystem componentsrdquowere on the data path
What is data provenancebull The ability to retrace
which ldquodata elementsrdquo ledto generation of output data
PEj
Sensor
Sensor
Data2
DataOut4
Data1 Data2
I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2
Data1
Data1
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor feed or intermediate analytic result) varies dynamically. Sources of this dynamism include:
– Sensor (source) impairments: transient or persistent sensor faults
– Transmission impairments: loss of stream elements, or delay and order reversal of stream elements
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– The action taken on an analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– The analysis logic itself: the resolution of an analysis component (e.g., the coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
– The binding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the quality of the HR readings from the SpO2 sensor dips
[Figure: a PE consumes two input streams and emits an ALERT, asking "What generated this alert, and can we trust it?" Callouts: "The sensor is faulty and generates stream elements of low quality"; "Stream elements are missing (possibly due to network issues)"; "Replace sensor"]
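The dynamic-rebinding case above can be sketched as a selector that keeps the cheaper HR source while its quality is acceptable and rebinds to the costlier one when it dips. This is a minimal illustration, not Century's QoE Manager; the `HrSource` type, the 0-to-1 quality scale, the cost figures and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HrSource:
    """Hypothetical heart-rate stream with a current QoE score in [0, 1]."""
    name: str
    cost: float     # relative acquisition cost (illustrative units)
    quality: float  # current quality estimate: 0 = unusable, 1 = perfect


def select_hr_source(sources, min_quality=0.7):
    """Bind to the cheapest source whose quality meets the threshold.

    If no source qualifies, fall back to the highest-quality one so the
    analysis keeps running; its output should then carry a low QoE.
    """
    acceptable = [s for s in sources if s.quality >= min_quality]
    if acceptable:
        return min(acceptable, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.quality)


spo2 = HrSource("SpO2", cost=1.0, quality=0.9)
ecg = HrSource("ECG", cost=5.0, quality=0.95)
print(select_hr_source([spo2, ecg]).name)  # SpO2: cheaper and good enough
spo2.quality = 0.4                         # SpO2-derived HR dips in quality
print(select_hr_source([spo2, ecg]).name)  # ECG: rebind to the costlier sensor
```

Preferring the cheapest acceptable source, rather than always the best one, reflects the slide's point that the ECG-derived HR is the more expensive option.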
Century QoE: Some Basic QoE Measures
[Figure: a PE whose expected input is t-5 ... t, but whose current input is only t-5, t-2 and t]
By correlating what was expected with what was received, we can estimate the quality of our input. In this example, we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE with two inputs, one carrying elements t-12 to t-10 and the other t-5, t-2 and t]
Another possibility is to correlate the timestamps of the inputs to estimate their quality. In this example, we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: a PE comparing its current input t-5 ... t with the "useful" input]
Finally, we can correlate the consumed input with the useful input (the input actually used to trigger an output). In this example, we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
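Each of the three measures reduces to a simple ratio. The sketch below uses integer timestamps relative to t = 0; the function names, the linear synchrony score and the `max_skew` tolerance are hypothetical choices, not Century's actual QoE formulas. The first two example values follow the slide's panels; treating only element t as useful in the third case is an assumption.

```python
def completeness(expected, received):
    """Measure 1: fraction of the expected timestamps that actually arrived."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(received)) / len(expected)


def synchrony(input_a, input_b, max_skew):
    """Measure 2: alignment of two inputs' newest timestamps.

    1.0 when aligned, falling linearly to 0.0 once the skew between the
    newest elements reaches max_skew (a hypothetical tolerance).
    """
    skew = abs(max(input_a) - max(input_b))
    return max(0.0, 1.0 - skew / max_skew)


def usefulness(consumed, useful):
    """Measure 3: fraction of consumed elements that actually triggered the output."""
    consumed = list(consumed)
    return len(useful) / len(consumed) if consumed else 0.0


t = 0  # timestamps relative to the current time t
print(completeness(range(t - 5, t + 1), [t - 5, t - 2, t]))                  # 0.5
print(synchrony([t - 12, t - 11, t - 10], [t - 5, t - 2, t], max_skew=20))  # 0.5
print(round(usefulness(range(t - 5, t + 1), [t]), 3))                       # 0.167
```

A QoE manager could combine such scores (for example, by taking their minimum) before attaching a quality value to the PE's output, but that combination rule is likewise an assumption.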
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2 and BP monitors
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
How to develop a hybrid of model-based and replay-based provenance
How to define context-dependent access-control models for provenance data in a multi-provider environment
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of the analysis components can be stored
Better techniques are needed for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of the analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission
The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality
Analysis Framework (1): The Approach
Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for a specific diagnosis
– Non-extensible: support for a limited set of sensor devices
– Small patient footprint: only good for trials
New Requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures
• Exacerbated in remote monitoring settings
Easy to use
– Separate medical domain knowledge from IT knowledge
Our Innovations
Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., a patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2 and temperature
[Figure: example analysis jobs built as graphs of PEs (e.g., Source PE, DUP, NES, IPN, VBF, SPA, SPM, DFE, PSD, WLG, Join), grouped into analysis components and analysis jobs]
Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)
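The PE programming model (atomic transformers wired together by pub-sub stream types) can be mimicked in a few lines. This is a toy in-process sketch, not the System S API; `StreamBus`, `PE`, the stream-type strings and the 120 bpm threshold are hypothetical. It shows the division of labor the slide describes: the PE author writes only `fn`, while routing belongs to the infrastructure.

```python
from collections import defaultdict


class StreamBus:
    """Toy in-process pub-sub router standing in for System S stream routing:
    every item published on a stream type is delivered to each PE subscribed to it."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def register(self, pe):
        self.subscribers[pe.in_type].append(pe)

    def publish(self, stream_type, item):
        for pe in self.subscribers[stream_type]:
            pe.process(item, self)


class PE:
    """An atomic stream transformer: consumes one stream type, emits another.
    The developer supplies only `fn`; routing is the bus's job."""

    def __init__(self, in_type, out_type, fn):
        self.in_type, self.out_type, self.fn = in_type, out_type, fn

    def process(self, item, bus):
        result = self.fn(item)
        if result is not None:  # returning None drops the item (filtering)
            bus.publish(self.out_type, result)


alerts = []
bus = StreamBus()
bus.register(PE("hr", "alert", lambda bpm: f"HR high: {bpm}" if bpm > 120 else None))
bus.register(PE("alert", "sink", alerts.append))  # append() returns None: terminal PE
for bpm in [80, 130, 95, 140]:
    bus.publish("hr", bpm)
print(alerts)  # ['HR high: 130', 'HR high: 140']
```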
Analysis Framework (2): PE Example
• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• System S routes each SDO to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs
[Figure: an AR PE processing an input ECG stream through Normalization, a Fast Fourier Transform and a Neural Network (trained during initialization), with an Alert Generator producing the output alert]
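The AR PE's stages can be sketched as plain functions. The slide does not specify the neural network, so a fixed spectral-concentration rule stands in for it here; the function names, the 64-sample window and the 0.5 threshold are hypothetical, and the DFT is a naive dependency-free stand-in for a real FFT.

```python
import cmath
import math


def normalize(window):
    """Normalization stage: zero mean, unit variance for one ECG window."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n) or 1.0
    return [(x - mean) / std for x in window]


def dft_magnitudes(window):
    """Transform stage: naive O(n^2) DFT magnitudes |X[k]| for k < n/2
    (a dependency-free stand-in for a real FFT)."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(window)))
            for k in range(n // 2)]


def ar_pe(window, min_peak_ratio=0.5):
    """Hypothetical AR PE: normalize, transform, classify, alert.

    A fixed rule replaces the trained neural network: if no single frequency
    bin holds at least min_peak_ratio of the spectral energy, the rhythm is
    treated as irregular and an alert is emitted.
    """
    spectrum = dft_magnitudes(normalize(window))[1:]  # drop the DC bin
    energy = [m * m for m in spectrum]
    total = sum(energy)
    if total == 0:
        return None
    return "ALERT" if max(energy) / total < min_peak_ratio else None


n = 64
regular = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
irregular = [sum(math.sin(2 * math.pi * f * i / n) for f in (3, 7, 11, 15))
             for i in range(n)]
print(ar_pe(regular))    # None
print(ar_pe(irregular))  # ALERT
```

Only the classification rule would change if a trained model were plugged in; the normalize-then-transform front end mirrors the stage order shown in the figure.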
Century Event Management Service
[Figure: Century Server Infrastructure architecture diagram]
Event Management Service (1)
The Problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders
Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language
Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates
[Figure: an Annotator writing timestamped events (time 1, 2, 3) into each storage option]
Event Management Service (3): Hybrid Storage Model
Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ
[Figure: arrival rate over time, compared with the average arrival rate and the average database service rate, with the instability region marked]
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)
– Stream buffer: how do we design the stream buffer?
[Figure: an Annotator emitting XML events (time 1, 2, 3) through a stream buffer into the Relational-XML DB]
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS", DEBS, June 2007
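The stability condition behind the hybrid design can be made concrete with standard queueing results. As an illustration only, assume the database behaves as an M/M/1 queue: Poisson arrivals, as stated above, plus exponentially distributed service times, which the slide does not state. Then utilization, mean number of queued requests L and mean time in system W are:

```latex
\rho = \frac{\lambda}{\mu}, \qquad \text{stable} \iff \rho < 1,
\qquad
L = \frac{\rho}{1 - \rho}, \qquad
W = \frac{1}{\mu - \lambda}
```

With hypothetical rates μ = 1000 events/s and an average λ = 800 events/s, ρ = 0.8, L = 4 and W = 5 ms. A burst at λ = 1200 events/s gives ρ = 1.2 > 1, the instability region, in which the queue grows without bound for as long as the burst lasts. Absorbing such bursts without letting the average rate exceed μ is the stream buffer's role.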
Event Management Service (4): The Stream Buffer
[Figure: stream buffer design. Components: the analysis graph (PE1 to PE5, SPE) issuing storage requests (events of type "eventStore"), a Traffic Monitor, a Traffic Predictor with Traffic Models, an Event Store Load Monitor, a Database PE, the System S STG (file system store) and the Event Store (Relational-XML DB); analysis output flows towards the delivery system]
Approach
– Keep the Relational-XML database in the design, for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
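A minimal sketch of the throttling idea, assuming a fixed per-tick database service rate: storage requests are queued, and at most `db_rate` of them reach the database per tick, so a burst is spread over later ticks instead of overloading the event store. The real design also uses traffic models, a traffic predictor and the System S file-system store; `StreamBuffer`, `db_rate` and the tick-based simulation are hypothetical simplifications.

```python
from collections import deque


class StreamBuffer:
    """Hypothetical throttle in front of the event store.

    Storage requests are queued; each tick at most `db_rate` of them are
    written to the database, so the database never sees more than its
    service rate even when arrivals burst above it.
    """

    def __init__(self, db_rate):
        self.db_rate = db_rate
        self.pending = deque()
        self.stored = []  # stands in for rows written to the database

    def offer(self, events):
        self.pending.extend(events)

    def tick(self):
        """Forward up to db_rate queued events, oldest first; return the count."""
        n = min(self.db_rate, len(self.pending))
        for _ in range(n):
            self.stored.append(self.pending.popleft())
        return n


buf = StreamBuffer(db_rate=3)
arrivals = [2, 6, 1, 0, 0]  # a burst of 6 in the second tick
next_id = 0
writes = []
for count in arrivals:
    buf.offer(range(next_id, next_id + count))
    next_id += count
    writes.append(buf.tick())
print(writes)  # [2, 3, 3, 1, 0]
```

The FIFO queue preserves event order, which keeps the stored stream consistent with arrival order for later time-window queries.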
Century Provenance Service
[Figure: Century Server Infrastructure architecture diagram]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition
With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe
A few days later, Dr. Lee receives an alert from the Century prescription program:
Urgent Alert. Patient: Doe, John. Condition: Abnormal Reaction. Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation, and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
– Example: PEs report their activity to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1."
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
– Example: sensors produce Data1 and Data2, and PEj produces DataOut4: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."
[Figure: PE1, PE2 and PEj running algorithms f1, f2 and f3 on sensor data, with a Provenance Server/Store]
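The two approaches differ in where provenance lives: in a separate log of component activity, or inside the data elements themselves. The record types, the `lineage` helper and the sample values below are hypothetical illustrations of the slide's callouts, not a Century API.

```python
from dataclasses import dataclass


@dataclass
class ProcessRecord:
    """Process provenance: a component reports what it ran, when, and on which port."""
    pe: str
    algorithm: str
    time: str
    port: int
    params: tuple = ()


@dataclass
class DataElement:
    """Data provenance: each element carries the IDs of the elements it came from."""
    id: str
    value: object
    derived_from: tuple = ()


def lineage(element_id, index):
    """Return every element ID that (transitively) led to element_id."""
    seen = set()
    stack = list(index[element_id].derived_from)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(index[parent].derived_from)
    return seen


# Process provenance: a separate store of component activity.
provenance_store = [
    ProcessRecord("PE1", "f1", "t0", port=1, params=("u", "v", "w")),
    ProcessRecord("PE2", "f3", "t1", port=2),
]
# Data provenance: dependencies travel with the data itself.
elements = [DataElement("Data1", 72), DataElement("Data2", 98),
            DataElement("DataOut4", "alert", derived_from=("Data1", "Data2"))]
index = {e.id: e for e in elements}

print([(r.pe, r.algorithm) for r in provenance_store])  # [('PE1', 'f1'), ('PE2', 'f3')]
print(sorted(lineage("DataOut4", index)))               # ['Data1', 'Data2']
```

The `derived_from` field is exactly the per-element annotation that becomes expensive at high stream rates.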
Provenance (2): Prior Work on Provenance
[Figure: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates]
EU Provenance (PASOA, PreServ)
– Captures workflow bindings
Scientific SOA workflows (KARMA)
– Workflow and application binding notifications
– Stateless, application-defined data provenance
Database view inversion (Trio, Widom)
– Specific transforms and extensions to SQL
– Data dependencies specified statically for relations
Stream provenance (Simhan)
– Stores the temporal history of stream connectors as a per-stream stack
– System-level automatic metadata collection at the stream level
File systems (PASS, LFS)
– Capture system calls and modifications to file records
– Annotation per file
Medical stream provenance (Century)
– Arbitrary logic for individual PEs (statefulness and long memory)
– Dynamic stream bindings
– High processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that:
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both:
– Process provenance: "Show me the set of components involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
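The "reactive" requirement can be illustrated with a change log: 1,000 elements flow through a port, but only two provenance entries are written, one per binding change. `BindingLog` and its methods are hypothetical, the stream IDs are illustrative, and `observe` assumes non-decreasing timestamps.

```python
import bisect


class BindingLog:
    """Hypothetical reactive provenance recorder for one PE input port.

    Rather than annotating every element, it writes an entry only when the
    stream bound to the port changes, then answers "which stream was bound
    at time t?" by binary search over the change times.
    """

    def __init__(self):
        self.times = []    # times at which the binding changed (increasing)
        self.streams = []  # stream bound from the matching time onwards

    def observe(self, t, stream_id):
        """Called per element; records nothing unless the binding changed."""
        if not self.streams or self.streams[-1] != stream_id:
            self.times.append(t)
            self.streams.append(stream_id)

    def bound_at(self, t):
        i = bisect.bisect_right(self.times, t) - 1
        return self.streams[i] if i >= 0 else None


log = BindingLog()
for t in range(1000):                          # 1,000 elements flow through the port...
    log.observe(t, "S1" if t < 500 else "S15")
print(len(log.times))                          # 2 ...but only two provenance entries
print(log.bound_at(499), log.bound_at(500))    # S1 S15
```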
TVC (Time-and-Value Centric) Model of Hybrid Provenance
Associate an "equation-based" specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PE_i is a function of specific samples (from its input streams) within a designated interval Δ
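\[ S_i^{\mathrm{out}}(t) \;\Leftarrow \bigcup_{j \,:\, S_j \text{ is an input to component } i} \big\{\, s_j(t') \;:\; t' \in t - \Delta_{ij} \,\big\} \]
where t − Δ_ij = {t − δ : δ ∈ Δ_ij} is the designated set of time offsets for input stream S_j.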
[Figure: a Processing Element with input streams Sj (Sj(t) ... Sj(t-3)) and Sk (Sk(t) ... Sk(t-3)) and output stream Si (Si(t), Si(t-1), Si(t-2))]
TVC Dependency Rule Blocks (state defined by location context)
Rule Block ID | PE State | Dependency Rule
1 | "gym" | Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2 | "home" | Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
The dependency equation can, however, vary based on:
– The PE's internal state or some external context, e.g.:
IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– Solution: allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
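A sketch of the rule blocks as data, with the rule chosen by the PE state stored alongside each output. The rules are copied from the example table, but the inclusive-window reading of Sj(t-3, t-2), the dictionary layout and the function names are assumptions. The second function writes out the gym/home alert rule, showing why the dependency window (10 vs. 30 time units) must follow the state.

```python
# TVC rule blocks from the example table: PE state -> [(stream, start, end)],
# with offsets relative to the output time t, read here as inclusive windows
# [t + start, t + end] (an interpretation of the slide's notation).
TVC_RULES = {
    "gym": [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],  # rule block 1
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],                 # rule block 2
}


def causative_elements(t, state, streams, rules=TVC_RULES):
    """Return the (stream, timestamp) pairs output Ei(t) depends on, choosing
    the rule block by the PE state stored as metadata with the output."""
    deps = set()
    for stream, start, end in rules[state]:
        for ts in streams.get(stream, ()):
            if t + start <= ts <= t + end:
                deps.add((stream, ts))
    return deps


def hr_alert(state, hr, t):
    """The state-dependent alert rule; hr maps timestamp -> bpm."""
    window, limit = (10, 140) if state == "gym" else (30, 90)
    values = [bpm for ts, bpm in hr.items() if t - window <= ts <= t]
    return bool(values) and sum(values) / len(values) > limit


streams = {"Sj": [6, 7, 8, 9, 10], "Sk": [6, 7, 8, 9, 10]}
print(sorted(causative_elements(10, "gym", streams)))
# [('Sj', 7), ('Sj', 8), ('Sj', 10), ('Sk', 8), ('Sk', 9)]
hr = {ts: 100 for ts in range(41)}  # a steady 100 bpm
print(hr_alert("gym", hr, 40), hr_alert("home", hr, 40))  # False True
```

Only the state value needs to be stored per output; the windows themselves are recovered from the rule block at query time.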
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage and processing, in exchange for higher provenance reconstruction overhead
Stream and Port Mapping (SAPM), at storage time:
1. Store the bindings between every input/output port and its associated streams
2. Store the creation/modification times of PEs
3. Store the TVC function f() for every output port/input port pair
Provenance Query Server, on Query(Ei):
1. Obtain the I/O stream mappings: determine the IDs of the input ports and the causative stream bindings
2. Retrieve the dependency equation
3. Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
4. Apply the equation (the TVC filter) to determine the causative set of elements, and return that set
[Figure: example tables. PC Dependency Table: PEj, f1(), Po1. Dynamic Stream Mapping Table: t=10 S1 Pi1; t=15 S15 I4; t=16 S17 O6. Data Store: element45 ts=15 SID=12 state="gym"; element66 ts=16 SID=17 state="home"]
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams", HealthNet, June 2007
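The query path can be sketched end to end against in-memory stand-ins for the metadata stores. The port names Po1/Pi1 and stream IDs S1/S15 echo the example tables; the rule windows, data values and table layouts are hypothetical, and integer timestamps are assumed so that a window can be enumerated. Note that a window may straddle a rebinding, so elements are drawn from whichever stream was bound at each instant.

```python
import bisect

# Hypothetical in-memory stand-ins for the distributed metadata stores.
# Dependency table: (PE, output port) -> {PE state: [(input port, start, end)]}
DEPENDENCIES = {("PEj", "Po1"): {"gym": [("Pi1", -1, 0)],
                                 "home": [("Pi1", -3, 0)]}}
# Dynamic stream mapping table: input port -> [(binding time, stream ID)], sorted
BINDINGS = {"Pi1": [(0, "S1"), (15, "S15")]}
# Data store: stream ID -> [(timestamp, value)]
DATA = {"S1": [(ts, 70 + ts) for ts in range(15)],
        "S15": [(ts, 70 + ts) for ts in range(15, 20)]}


def stream_at(port, t):
    """Which stream was bound to `port` at time t (None before the first binding)."""
    times = [bound_time for bound_time, _ in BINDINGS[port]]
    i = bisect.bisect_right(times, t) - 1
    return BINDINGS[port][i][1] if i >= 0 else None


def resolve(pe, out_port, t, state):
    """Answer Query(Ei) for the output emitted by (pe, out_port) at time t."""
    rule = DEPENDENCIES[(pe, out_port)][state]        # retrieve dependency equation
    causes = []
    for in_port, start, end in rule:
        lo, hi = t + start, t + end
        # I/O stream mapping: every stream bound to the port during the window.
        sids = {stream_at(in_port, ts) for ts in range(lo, hi + 1)} - {None}
        for sid in sorted(sids):
            for ts, value in DATA[sid]:               # retrieve elements
                # Apply the equation: keep elements inside the window that
                # arrived while this stream was bound to the port.
                if lo <= ts <= hi and stream_at(in_port, ts) == sid:
                    causes.append((sid, ts, value))
    return causes


print(resolve("PEj", "Po1", 16, "home"))
# [('S1', 13, 83), ('S1', 14, 84), ('S15', 15, 85), ('S15', 16, 86)]
```

All of the work happens at query time, which is the trade-off stated above: cheap storage of bindings and rules, in exchange for a heavier reconstruction step.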
Key Research Benefits of TVC Model for Provenance
The TVC model's overhead decreases as:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance and Karma systems)
– Statefulness: time and value functions allow a compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 Kevents/sec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2007 IBM Corporation
Analysis Framework (2) PE Example
bull PE developers need not to be concerned with the communication of Stream Data Objectsbull When an SDO is received by the PE System S routes the SDO to the appropriate PE bull Developers only need to code the logic of the algorithms used by the PE to annotate modify
or create new SDOs
Normalization
Fast FourierTransform
Input ECG
AR PE
AlertGenerator
Output Alert
Neural Network Trained during
initialization
IBM Research
copy 2007 IBM Corporation
Century Event Management Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2007 IBM Corporation
Event Management Service (1)
Annotator
time 1 2 3
XML
Annotator
time 1 2 3
Not extensibleLimited scalability
Fully extensibleBut does not scale to
high event rate
The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders
Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language
XML
XML
XML
Potential Approaches
1 Relational DB (Annotation table)
2 Relational-XML DB (XML-based Annotation data model)
IBM Research
copy 2007 IBM Corporation
Event Management Service (3) Hybrid Storage Model
Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate
time
arrival rateaverage arrival rate
traffic average databaseservice rate
Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ
UnstabilityRegion
Annotator
time 1 2 3
3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)
Stream Buffer How do we designthe stream buffer
XMLXML
XML
XML
D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007
IBM Research
copy 2007 IBM Corporation
Event Management Service (4) The Stream Buffer
System S STG(File System Store)
Traffic Monitor
Storage Requests(events of type ldquoeventStorerdquo)
DatabasePE
Traffic Models Traffic
Predictor Event Store Load Monitor
Storage Requests
PE1
PE2
PE3
PE4
PE5SPE
Analysis Graph
Towards the delivery system
Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database
EventStore
(Relational-XML
IBM Research
copy 2007 IBM Corporation
Century Provenance Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM Corporation26
An Example of Data Provenance Use
Urgent AlertPatient Doe John
Condition Abnormal ReactionRecommendation
The Century System has found a developing issue in your patientrsquos
medical condition The condition of the patient is deteriorating A known
side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day
Ultimately the physician is responsible for medical decisions and any actions taken In order
for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition
With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe
A few days later Dr Lee receives an alert from the Century prescription program
The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)
Thatrsquos unusual and a large decrease Before agreeing to
this change I need to understand on what basis the
system has made this recommendation and how
accurate it is
IBM Research
copy 2006 IBM Corporation
Provenance (1) Two Approaches to Provenance
f1
f2
f3
Provenance ServerStore
I ran algorithm f1 with parameters uv and w at time t0 I published on port 1
PE1
PE2
I recvd some data on port 2 I ran algorithm f3 on the data at time t1
What is process provenancebull The ability to retrace
which ldquosystem componentsrdquowere on the data path
What is data provenancebull The ability to retrace
which ldquodata elementsrdquo ledto generation of output data
PEj
Sensor
Sensor
Data2
DataOut4
Data1 Data2
I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2
Data1
Data1
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
[Figure: a processing element with input streams Sj and Sk (samples t, t−1, t−2, t−3) and output stream Si (samples t, t−1, t−2)]
TVC dependency rule blocks (PE state defined by location context):
  Rule block 1, PE state “gym”: Ei(t) ⇐ Sj(t) ∪ Sj(t−3, t−2) ∪ Sk(t−2, t−1)
  Rule block 2, PE state “home”: Ei(t) ⇐ Sj(t−2, t) ∪ Sk(t−1, t)
• The dependency equation can, however, vary based on:
  – the PE’s internal state or some external context, for example:
      IF loc = “gym” THEN generate ALERT iff AVG(HR(t−10, t)) > 140
      ELSE generate ALERT iff AVG(HR(t−30, t)) > 90
  – The solution is to allow multiple functions, each associated with a distinct range of states.
  – Store the state as metadata with the data.
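The rule blocks and the context-dependent alert rule above can be sketched in Python. This illustration rests on stated assumptions and is not the Century implementation:

- Timestamps are integers.
- Each rule is a hypothetical list of (stream, start offset, end offset) windows relative to t. Windows are inclusive, so (0, 0) is the single sample at t.
- HR(t−10, t) means the samples whose timestamps fall in [t − 10, t].

`RULE_BLOCKS`, `causative_windows`, and `should_alert` are made-up names.

```python
# Hypothetical encoding of the two TVC rule blocks.
# Each window is (stream, start_offset, end_offset), inclusive, relative to t.
RULE_BLOCKS = {
    "gym": [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],  # rule block 1
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],                 # rule block 2
}


def causative_windows(state, t):
    """Return {stream: sorted timestamps} that output Ei(t) depends on in this state."""
    deps = {}
    for stream, lo, hi in RULE_BLOCKS[state]:
        deps.setdefault(stream, set()).update(range(t + lo, t + hi + 1))
    return {s: sorted(ts) for s, ts in deps.items()}


def should_alert(loc, hr):
    """Context-dependent ALERT rule; hr maps integer time -> heart-rate sample.

    Assumes the averaging window ends at the latest available sample.
    """
    t = max(hr)
    window, threshold = (10, 140) if loc == "gym" else (30, 90)
    samples = [v for ts, v in hr.items() if t - window <= ts <= t]
    return sum(samples) / len(samples) > threshold


print(causative_windows("gym", 10))   # {'Sj': [7, 8, 10], 'Sk': [8, 9]}
print(causative_windows("home", 10))  # {'Sj': [8, 9, 10], 'Sk': [9, 10]}
resting = {ts: 120 for ts in range(31)}  # HR of 120 for 31 time steps
print(should_alert("gym", resting), should_alert("home", resting))  # False True
```

Storing only the state label with the data is enough to select the correct rule block when provenance is later reconstructed.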
Resolving TVC-based Provenance Queries
• Reconstruct provenance only in response to a corresponding query.
  – For efficiency, the relevant metadata is distributed across multiple DBs.
  – Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead.
• Determine the IDs of the input ports and the causative stream bindings.
• Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation.
• Apply the TVC filter to obtain the causative set of elements.
[Figure: provenance query architecture]
PC Dependency Table: PEj, f1(), Po1
Dynamic Stream Mapping Table:
  t=10  S1   Pi1
  t=15  S15  I4
  t=16  S17  O6
Data Store:
  element45: ts=15, SID=12, state=“gym”
  element66: ts=16, SID=17, state=“home”
Stream and Port Mapping (SAPM):
  1. Store the bindings between every input/output port and its associated streams.
  2. Store the creation/modification times of PEs.
  3. Store the TVC function f() for every output port/input port pair.
Provenance Query Server, Query(Ei):
  1. Obtain the I/O stream mappings.
  2. Retrieve the dependency equation.
  3. Retrieve the elements.
  4. Apply the equation to determine the set of causative elements.
  Return the set of elements.
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, “A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams,” HealthNet, June 2007.
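The query steps can be sketched in Python with in-memory stand-ins for the SAPM tables and the data store. Everything here is hypothetical: the table layouts, port names, sample values, and the `bound_stream`/`resolve` functions are loosely modeled on the example rows above. A real deployment would distribute these tables across multiple databases, and this sketch ignores PE state.

```python
import bisect

# Hypothetical in-memory stand-ins for the SAPM tables and the data store.
# Dynamic stream mapping: input port -> sorted list of (bind_time, stream id).
STREAM_MAP = {
    "Pi1": [(10, "S1"), (15, "S12")],
    "Pi2": [(10, "S17")],
}
# PC dependency table: (PE, output port) -> TVC windows as
# (input port, start offset, end offset), inclusive, relative to the output time.
DEPENDENCY = {
    ("PEj", "Po1"): [("Pi1", -3, 0), ("Pi2", -1, 0)],
}
# Data store: stream id -> {timestamp: value}.
DATA_STORE = {
    "S1": {ts: f"S1@{ts}" for ts in range(10, 15)},
    "S12": {ts: f"S12@{ts}" for ts in range(15, 30)},
    "S17": {ts: f"S17@{ts}" for ts in range(10, 30)},
}


def bound_stream(port, ts):
    """Stream bound to `port` at time ts: the latest binding made at or before ts."""
    bindings = STREAM_MAP[port]
    i = bisect.bisect_right([bind_time for bind_time, _ in bindings], ts) - 1
    return bindings[i][1] if i >= 0 else None


def resolve(pe, out_port, t):
    """Return the causative (stream, ts, value) elements for the output emitted at time t."""
    causative = []
    for in_port, lo, hi in DEPENDENCY[(pe, out_port)]:  # retrieve the dependency equation
        for ts in range(t + lo, t + hi + 1):           # apply its time window
            stream = bound_stream(in_port, ts)         # look up the I/O stream mapping
            if stream is None:
                continue
            value = DATA_STORE.get(stream, {}).get(ts)  # retrieve the stored element
            if value is not None:
                causative.append((stream, ts, value))
    return causative


print(resolve("PEj", "Po1", 16))
```

For t=16 the result contains the S1 elements at 13 and 14, the S12 elements at 15 and 16, and the S17 elements at 15 and 16. Port Pi1 was rebound from S1 to S12 at t=15, so the causative set for one output spans two streams. This is why the mapping is looked up per timestamp.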
Key Research Benefits of TVC Model for Provenance
• TVC model overhead decreases when (1) monitoring is long-lived (over weeks), (2) per-instance state is minimized, and (3) the number of monitored individuals increases.
• The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
  – Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1–2% overhead, compared to ~60–100% for annotation).
  – Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 ms) provenance latency, compared to O(s) in the EU Provenance and Karma systems).
  – Statefulness: time and value functions allow a compact representation of long dependency windows.
  – Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder).
• Other provenance systems cannot handle O(100 K events/s) system throughput.
Quality of Events in Century
[Figure: CENTURY SERVER INFRASTRUCTURE diagram. Components shown: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception; Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library); Event Management Service, Event Preprocessor, Event Filter Annotator, Event Storage Manager; Analysis Framework, Analysis Applications, Job/Application Manager, QoE Manager, State Manager; Provenance Service, Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver; Group Management Service, Group Manager; Service Data Management, Patient Data Manager, Device Data Manager, User Group Data Manager, Device Catalogue; Platform Service, Authentication & Authorization, Privacy Manager, Subscription Service, Event Store Query Service, Provenance Query Service, Remote Access Manager; IHE Adapter, Interoperability Container; Storages (Event Store, Provenance Metadata, State Data, Group Data, Privacy Policy, Stakeholder Info, Patient Info, Device Info, Application Info); Portal Service (Patient Portal, Administrator Portal, Stakeholder Portal for doctors, lifestyle consultants, healthcare IT specialists, and Govt/CDC); Solution Delivery Services.]
Quality of Events in Data Streams
• Prior work has focused on defining streams in terms of their output data type.
  – This enables a pub-sub model of stream binding.
• However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Types of dynamism include:
  – Sensor (source) impairments: sensor faults (transient or persistent).
  – Transmission impairments: loss of stream elements, or delay/order reversal of streams.
• The quality of a stream’s elements should become a first-class attribute of a stream processing system. It can affect:
  – The action taken on an analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts.
  – Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality.
  – Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality.
[Figure: sensor streams with out-of-order and missing elements feed a PE that raises an ALERT. Callouts: “What generated this alert, and can we trust it?”; “The sensor is faulty and generates stream elements of low quality” (Replace sensor); “Stream elements are missing (possibly due to network issues).”]
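The dynamic-rebinding idea can be sketched as a small selection policy in Python. The sketch makes three hypothetical assumptions:

- Each candidate heart-rate source carries a QoE score in [0, 1].
- Each source also carries a relative acquisition cost.
- The policy prefers the cheapest source whose QoE meets a threshold.

Under this policy, a PE stays on SpO2-derived HR until that stream's quality dips, and then rebinds to the more expensive ECG-derived HR. The threshold, costs, and names are illustrative.

```python
from typing import NamedTuple, Optional


class Source(NamedTuple):
    name: str
    cost: float  # relative acquisition cost (illustrative units)
    qoe: float   # current quality-of-events score, assumed to lie in [0, 1]


def choose_source(sources, min_qoe=0.8) -> Optional[Source]:
    """Cheapest source whose QoE meets the threshold; otherwise the highest-QoE source."""
    acceptable = [s for s in sources if s.qoe >= min_qoe]
    if acceptable:
        return min(acceptable, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.qoe, default=None)


spo2 = Source("HR from SpO2", cost=1.0, qoe=0.95)
ecg = Source("HR from ECG", cost=5.0, qoe=0.9)
print(choose_source([spo2, ecg]).name)                    # HR from SpO2
print(choose_source([spo2._replace(qoe=0.4), ecg]).name)  # HR from ECG
```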
Century QoE: Some Basic QoE Measures
• [Figure: a PE that expected input elements t−5, t−4, t−3, t−2, t−1, t but currently received only t−5, t−2, t.] By correlating what was expected with what was received, we can estimate the quality of our input. In this case, we can say that the output of the PE has low quality, since half of the expected input is missing.
• [Figure: a PE whose two inputs carry elements t−12, t−11, t−10 and t−5, t−2, t, respectively.] Another possibility is to correlate the timestamps of the inputs to estimate quality. In this case, we can say that the output of the PE has low quality, since the two inputs are out of sync.
• [Figure: a PE consuming input elements t−5 through t, only a few of which form the “useful” input.] Finally, we can correlate the consumed input with the useful input (the input actually used to trigger an output). Here we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
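The three measures can be written as simple functions. This Python sketch is one possible formalization, not Century's definition:

- Completeness is the fraction of expected timestamps that arrived.
- Synchronization compares the newest timestamps of two inputs against an assumed tolerance.
- Usefulness is the fraction of consumed elements that actually triggered the output.

The function names and the `tolerance` parameter are hypothetical.

```python
def completeness(expected, received):
    """Fraction of expected element timestamps that were actually received."""
    expected = set(expected)
    return len(expected & set(received)) / len(expected) if expected else 1.0


def in_sync(input_a, input_b, tolerance):
    """True if the newest elements of two inputs are at most `tolerance` apart."""
    return abs(max(input_a) - max(input_b)) <= tolerance


def usefulness(consumed, useful):
    """Fraction of consumed elements that were used to trigger the output."""
    consumed = set(consumed)
    return len(consumed & set(useful)) / len(consumed) if consumed else 0.0


t = 100  # current time; timestamps are integers
expected = [t - 5, t - 4, t - 3, t - 2, t - 1, t]
print(completeness(expected, [t - 5, t - 2, t]))                           # 0.5
print(in_sync([t - 12, t - 11, t - 10], [t - 5, t - 2, t], tolerance=2))  # False
print(usefulness(expected, [t]))  # one useful element out of six consumed
```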
Open Challenges in Health Stream Collection and Diagnostics
• Model-driven acquisition of data from sensor feeds
  – Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage.
  – HR may be obtained at different (accuracy, acquisition cost) points from ECG, SpO2, and BP monitors.
• Sensors have different, time-varying QoE.
  – QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors).
• How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
• How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
• How to develop a hybrid combination of model-based and replay-based provenance?
• How to define context-dependent access-control models for provenance data in a multi-provider environment?
• Provenance architectures must address the prohibitive cost of storing all stream data.
  – Much of the intermediate analysis can be re-created if the state of the analysis components can be stored.
• Better techniques are needed for establishing provenance relationships.
  – User-specified models may not always be available; learn provenance from observed behavior.
  – Explicit provenance may not be an accurate reflection of the analysis logic.
• Provenance data is itself sensitive and subject to privacy requirements.
  – The right to access provenance may depend on the current “role” of the provider and the user’s medical history.
Conclusions
• A successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay.
• The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure.
  – Key features include adaptive event filtering, personalization, and predictive data transmission.
• The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics.
  – Key features include high-volume stream support, provenance, and event/information quality.
Century Event Management Service
[Figure: the same Century server infrastructure diagram, here introducing the Event Management Service.]
Event Management Service (1)
The problem:
• Persistence of large amounts of streaming data
• Efficient query mechanisms for stakeholders
Requirements:
• Event rate scalability
• Resiliency
• Extensibility
• Expressiveness of the query language
Potential approaches:
1. Relational DB (annotation table): not extensible; limited scalability.
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates.
[Figure: an Annotator emitting events at times 1, 2, 3 into each store; the Relational-XML store holds XML annotations.]
Event Management Service (3): Hybrid Storage Model
3. Hybrid storage model (stream buffer + Relational-XML DB)
Key assumptions:
• Traffic is bursty, modeled by a Poisson process.
• The average arrival rate is lower than the average database service rate.
Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ.
[Figure: arrival rate over time, plotted against the average arrival rate and the average database service rate, with the instability region marked; an Annotator emits events (times 1, 2, 3) through the stream buffer into the Relational-XML DB.]
Stream buffer: how do we design the stream buffer?
D. Sow, L. Lim, M. Wang, A. Kim, “Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS,” DEBS, June 2007.
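A worked check of the stability condition follows. With utilization ρ = λ/μ, the database queue is stable only while ρ < 1; this is the slide's assumption that the average arrival rate is below the average service rate. The Python sketch below also assumes an M/M/1 queue: Poisson arrivals, as on the slide, plus exponential service times, which the slide does not state. Under that assumption, the mean number of queued events is ρ/(1 − ρ), which grows without bound as ρ approaches 1. The rates are made-up examples.

```python
def utilization(arrival_rate, service_rate):
    """rho = lambda / mu."""
    return arrival_rate / service_rate


def mm1_mean_in_system(arrival_rate, service_rate):
    """Mean number of events in an M/M/1 system, L = rho / (1 - rho); infinite if rho >= 1."""
    rho = utilization(arrival_rate, service_rate)
    return rho / (1 - rho) if rho < 1 else float("inf")


mu = 1000.0  # events/s the database can store (illustrative)
for lam in (500.0, 900.0, 990.0, 1200.0):
    # L is 1 at rho = 0.5, about 9 at 0.9, about 99 at 0.99, and unbounded above 1.
    print(lam, utilization(lam, mu), mm1_mean_in_system(lam, mu))
```

Even with a stable average, bursts push the instantaneous ρ above 1, which is what the stream buffer has to absorb.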
Event Management Service (4): The Stream Buffer
Approach:
• Keep the Relational-XML database in the design for extensibility and efficient query mechanisms.
• Avoid driving the database into the instability region.
• Throttle the traffic arriving at the database.
[Figure: an analysis graph of PEs (PE1 to PE5, SPE) sends results towards the delivery system and storage requests (events of type “eventStore”) to a Database PE; components include a Traffic Monitor, Traffic Models, a Traffic Predictor, an Event Store Load Monitor, the System S STG (file system store), and the Event Store (Relational-XML).]
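One simple way to throttle traffic into the database is a token bucket placed in front of it. Bursts are absorbed in a buffer, and events are released no faster than a configured rate chosen below the database service rate μ. The Python sketch below is a swapped-in technique for illustration only. It is not the Century stream buffer design, which uses traffic models, a traffic predictor, and a load monitor. The class name, rate, and bucket size are assumptions.

```python
from collections import deque


class ThrottledBuffer:
    """Token-bucket throttle: buffers bursts, releases at most `rate` events/s on average."""

    def __init__(self, rate, burst):
        self.rate = rate            # sustained release rate, chosen below the DB service rate
        self.burst = burst          # bucket capacity: largest back-to-back release
        self.tokens = float(burst)
        self.last = 0.0             # time of the previous release call (seconds)
        self.queue = deque()

    def offer(self, event):
        """Buffer an incoming storage request."""
        self.queue.append(event)

    def release(self, now):
        """Return the events that may be written to the database at time `now`."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        out = []
        while self.queue and self.tokens >= 1:
            self.tokens -= 1
            out.append(self.queue.popleft())
        return out


buf = ThrottledBuffer(rate=100, burst=10)
for i in range(50):              # a burst of 50 storage requests arrives at t = 0
    buf.offer(i)
print(len(buf.release(0.0)))     # 10: only the bucket capacity is released immediately
print(len(buf.release(0.1)))     # 10: 0.1 s at 100 events/s refills 10 tokens
print(len(buf.queue))            # 30 events still buffered
```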
Century Provenance Service
[Figure: the same Century server infrastructure diagram, here introducing the Provenance Service.]
An Example of Data Provenance Use
Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient’s medical condition. The condition of the patient is deteriorating; a known side effect has emerged. Century recommends that you decrease the patient’s dosage of the prescribed medication to 10 mg twice a day.
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.
• Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition.
• Along with this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.
• A few days later, Dr. Lee receives an alert from the Century prescription program.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Dr. Lee: “That’s unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation, and how accurate it is.”
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which “system components” were on the data path.
[Figure: processing elements PE1 and PE2, algorithms f1, f2, f3, and a Provenance Server/Store. PE1 reports: “I ran algorithm f1 with parameters u, v, and w at time t0. I published on port 1.” PE2 reports: “I received some data on port 2. I ran algorithm f3 on the data at time t1.”]
What is data provenance?
• The ability to retrace which “data elements” led to the generation of output data.
[Figure: two sensors send Data1 and Data2 to PEj, which emits DataOut4. PEj: “I put ‘metadata’ in DataOut4 saying it was dependent on Data1 and Data2.”]
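The two approaches can be contrasted in a few lines of Python. In this hypothetical sketch, process provenance is a log of which PE ran which algorithm and when, as in the PE1/PE2 messages above. Data provenance attaches dependency metadata to each output element, as DataOut4 records Data1 and Data2. The record formats, `run_pe`, and `DataElement` are illustrative assumptions. Per-element annotation of this kind is what the storage-efficiency challenge calls heavyweight at high data volumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    value: float
    derived_from: tuple = ()  # data provenance: names of the input elements


process_log = []  # process provenance: which component ran which algorithm, and when


def run_pe(pe, algorithm, inputs, out_name, t):
    """Run a toy PE: log the step (process provenance) and annotate the output (data provenance)."""
    process_log.append({"pe": pe, "algorithm": algorithm.__name__, "time": t})
    value = algorithm([e.value for e in inputs])
    return DataElement(out_name, value, tuple(e.name for e in inputs))


def f3(values):
    """Toy analysis step: the mean of the input values."""
    return sum(values) / len(values)


data1 = DataElement("Data1", 70.0)
data2 = DataElement("Data2", 74.0)
out = run_pe("PEj", f3, [data1, data2], "DataOut4", t=1)
print(out)          # DataElement(name='DataOut4', value=72.0, derived_from=('Data1', 'Data2'))
print(process_log)  # [{'pe': 'PEj', 'algorithm': 'f3', 'time': 1}]
```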
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2007 IBM Corporation
Event Management Service (1)
Annotator
time 1 2 3
XML
Annotator
time 1 2 3
Not extensibleLimited scalability
Fully extensibleBut does not scale to
high event rate
The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders
Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language
XML
XML
XML
Potential Approaches
1 Relational DB (Annotation table)
2 Relational-XML DB (XML-based Annotation data model)
IBM Research
copy 2007 IBM Corporation
Event Management Service (3) Hybrid Storage Model
Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate
time
arrival rateaverage arrival rate
traffic average databaseservice rate
Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ
UnstabilityRegion
Annotator
time 1 2 3
3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)
Stream Buffer How do we designthe stream buffer
XMLXML
XML
XML
D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007
IBM Research
copy 2007 IBM Corporation
Event Management Service (4) The Stream Buffer
System S STG(File System Store)
Traffic Monitor
Storage Requests(events of type ldquoeventStorerdquo)
DatabasePE
Traffic Models Traffic
Predictor Event Store Load Monitor
Storage Requests
PE1
PE2
PE3
PE4
PE5SPE
Analysis Graph
Towards the delivery system
Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database
EventStore
(Relational-XML
IBM Research
copy 2007 IBM Corporation
Century Provenance Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM Corporation26
An Example of Data Provenance Use
Urgent AlertPatient Doe John
Condition Abnormal ReactionRecommendation
The Century System has found a developing issue in your patientrsquos
medical condition The condition of the patient is deteriorating A known
side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day
Ultimately the physician is responsible for medical decisions and any actions taken In order
for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition
With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe
A few days later Dr Lee receives an alert from the Century prescription program
The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)
Thatrsquos unusual and a large decrease Before agreeing to
this change I need to understand on what basis the
system has made this recommendation and how
accurate it is
IBM Research
copy 2006 IBM Corporation
Provenance (1) Two Approaches to Provenance
f1
f2
f3
Provenance ServerStore
I ran algorithm f1 with parameters uv and w at time t0 I published on port 1
PE1
PE2
I recvd some data on port 2 I ran algorithm f3 on the data at time t1
What is process provenancebull The ability to retrace
which ldquosystem componentsrdquowere on the data path
What is data provenancebull The ability to retrace
which ldquodata elementsrdquo ledto generation of output data
PEj
Sensor
Sensor
Data2
DataOut4
Data1 Data2
I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2
Data1
Data1
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
• Storage efficiency: the high volume of data makes "per-record" annotation very heavyweight
• Processing efficiency: generating timestamps and a large amount of metadata per stream event reduces system throughput
• Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
• Dynamic stream bindings: streams may be ephemeral, so the transformation graph may vary with time
We need a provenance system that:
• Is reactive, generating provenance events only when things change
• Does not require annotating large amounts of metadata per sensor sample
• Can accommodate variations in the statefulness model
We need to support both:
• Process provenance: "Show me the set of components involved in the generation of alert Ei"
• Data provenance: "Show me the stream segments related to the generation of alert Ei"
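A reactive recorder can be sketched as follows, assuming provenance only needs to capture which stream feeds each port and when that binding changes. The class name, the port `Pi1`, and the streams `S1`/`S15` are hypothetical. For 1,000 events and one rebinding, the recorder writes two entries instead of 1,000 annotations.

```python
from dataclasses import dataclass, field

@dataclass
class ReactiveBindingLog:
    """Hypothetical reactive recorder: writes a provenance entry only when the
    stream bound to a port changes, never once per sensor sample."""
    current: dict = field(default_factory=dict)  # port -> bound stream id
    entries: list = field(default_factory=list)  # (time, port, stream id)

    def observe(self, t, port, stream_id):
        # Called for every event; only a cheap dictionary check in the common case.
        if self.current.get(port) != stream_id:
            self.current[port] = stream_id
            self.entries.append((t, port, stream_id))

log = ReactiveBindingLog()
for t in range(1000):
    log.observe(t, "Pi1", "S1" if t < 600 else "S15")  # one rebinding at t=600

print(len(log.entries))  # 2
print(log.entries)       # [(0, 'Pi1', 'S1'), (600, 'Pi1', 'S15')]
```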
TVC (Time-and-Value Centric) Model of Hybrid Provenance
• Associate an "equation-based" specification of data dependency between every output port of an individual PE and the corresponding input ports
  – The dependency is thus not specified on every data element, but remains constant for the same processing logic
• Provenance data is recorded at the granularity of stream "sub-sets", i.e., dependence over sets of time windows
  – The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ
$$\mathrm{out}_i(t) \Leftarrow \bigcup_{s_j \text{ is input to Component } i} \mathrm{inp}_j(t'), \qquad t' \in [\,t-\Delta_j,\; t\,]$$
[Figure: a Processing Element consumes input streams Sj and Sk (samples Sj(t) to Sj(t-3) and Sk(t) to Sk(t-3)) and produces output stream Si (samples Si(t), Si(t-1), Si(t-2))]
TVC dependency rule blocks (PE state defined by location context):
Rule block ID | PE state | Dependency rule
1 | "gym" | Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2 | "home" | Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
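The two rule blocks can be encoded as data and evaluated for a given output time. This sketch assumes that each window Sx(a, b) covers the inclusive offsets a..b relative to t, and that the single sample Sj(t) is the window (0, 0). The `RULE_BLOCKS` layout and `causative_samples` are hypothetical.

```python
# Hypothetical encoding of the two TVC rule blocks, keyed by PE state.
# Each window (lo, hi) is an inclusive offset range relative to t.
RULE_BLOCKS = {
    "gym":  {"Sj": [(0, 0), (-3, -2)], "Sk": [(-2, -1)]},  # rule block 1
    "home": {"Sj": [(-2, 0)],          "Sk": [(-1, 0)]},   # rule block 2
}

def causative_samples(t, state):
    """Return the (stream, timestamp) pairs that Ei(t) depends on in `state`."""
    deps = set()
    for stream, windows in RULE_BLOCKS[state].items():
        for lo, hi in windows:
            deps.update((stream, ts) for ts in range(t + lo, t + hi + 1))
    return deps

print(sorted(causative_samples(10, "home")))
# [('Sj', 8), ('Sj', 9), ('Sj', 10), ('Sk', 9), ('Sk', 10)]
print(sorted(causative_samples(10, "gym")))
# [('Sj', 7), ('Sj', 8), ('Sj', 10), ('Sk', 8), ('Sk', 9)]
```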
• The dependency equation can, however, vary based on
  – The PE's internal state or some external context:
      IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
      ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
  – Solution: allow multiple functions, each associated with a distinct range of states
  – Store the state as metadata with the data
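Here is a minimal sketch of the context-dependent rule above. It assumes heart-rate samples are keyed by integer timestamps and that both windows include t. The state is returned with the result, reflecting the last point above: it is stored as metadata so that the matching dependency function can be selected when provenance is reconstructed. `hr_alert` and the returned dictionary layout are hypothetical.

```python
from statistics import mean

def hr_alert(hr, t, loc):
    """Hypothetical sketch of the context-dependent alert rule.

    hr maps timestamp -> heart-rate sample; windows are assumed to include t.
    """
    window_len, threshold = (10, 140) if loc == "gym" else (30, 90)
    window = [hr[s] for s in range(t - window_len, t + 1) if s in hr]
    fired = bool(window) and mean(window) > threshold
    # The state travels with the output as metadata, so a later provenance
    # query can pick the dependency function that matches this state.
    return {"ts": t, "alert": fired, "state": loc, "window": (t - window_len, t)}

hr = {s: 120 for s in range(60)}
print(hr_alert(hr, 50, "gym")["alert"])   # False: 120 is not above 140
print(hr_alert(hr, 50, "home")["alert"])  # True: 120 is above 90
```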
Resolving TVC-based Provenance Queries
• Reconstruct provenance only in response to a corresponding query
  – For efficiency, the relevant metadata is distributed across multiple DBs
  – Tradeoff: faster provenance storage and processing in exchange for higher provenance reconstruction overhead
• Determine the IDs of input ports and the causative stream bindings
• Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
• Apply the TVC filter to obtain the causative set of elements
[Figure: Provenance Query Server]
Stream and Port Mapping (SAPM):
  1. Store bindings between every input/output port and the associated streams
  2. Store creation/modification times of PEs
  3. Store the TVC function f() for every (output port, input port) pair
  PC Dependency Table, e.g.: PEj, f1(), Po1
  Dynamic Stream Mapping Table, e.g.: t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6
Data Store, e.g.:
  element45: ts=15, SID=12, state="gym"
  element66: ts=16, SID=17, state="home"
Query (Ei):
  1. Obtain I/O stream mappings
  2. Retrieve the dependency equation
  3. Retrieve elements
  4. Apply the equation to determine the set of causative elements
  Return the set of elements
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
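The four query steps can be sketched against in-memory stand-ins for the SAPM tables and the data store. Everything here is hypothetical: the table layouts, the port and stream names, and the example values are chosen for illustration and do not reproduce the slide's tables. Windows are again assumed to be inclusive offsets relative to the output time t.

```python
# Hypothetical stand-ins for the SAPM tables and the data store.

# Dynamic stream mapping: port -> [(time binding took effect, stream id)],
# appended only when a binding changes.
BINDINGS = {"Pi1": [(0, "S1"), (15, "S15")], "Pi2": [(0, "S7")]}

# PC dependency table: (PE, output port) -> state -> input port -> list of
# inclusive (lo, hi) offsets relative to the output timestamp.
DEPENDENCIES = {
    ("PEj", "Po1"): {
        "gym":  {"Pi1": [(0, 0), (-3, -2)], "Pi2": [(-2, -1)]},
        "home": {"Pi1": [(-2, 0)], "Pi2": [(-1, 0)]},
    }
}

# Data store: the (stream id, timestamp) pairs of stored elements.
DATA_STORE = {(s, ts) for s in ("S1", "S15", "S7") for ts in range(10, 21)}

def stream_at(port, ts):
    """Stream bound to `port` at time `ts` (latest binding at or before ts)."""
    bound = [sid for start, sid in BINDINGS[port] if start <= ts]
    return bound[-1] if bound else None

def resolve(pe, out_port, t, state):
    """Resolve Query(Ei) for the output of (pe, out_port) at time t."""
    rule = DEPENDENCIES[(pe, out_port)][state]  # step 2: dependency equation
    causative = set()
    for port, windows in rule.items():
        lo = min(a for a, _ in windows)
        hi = max(b for _, b in windows)
        # Steps 1 and 3: map each timestamp to its bound stream, then fetch
        # every stored element in the rule's overall time span.
        span = {(stream_at(port, ts), ts) for ts in range(t + lo, t + hi + 1)}
        candidates = span & DATA_STORE
        # Step 4: keep only elements whose offset lies inside some window.
        causative |= {(s, ts) for s, ts in candidates
                      if any(a <= ts - t <= b for a, b in windows)}
    return sorted(causative)

print(resolve("PEj", "Po1", 16, "gym"))
# [('S1', 13), ('S1', 14), ('S15', 16), ('S7', 14), ('S7', 15)]
```

In the "gym" state, the Pi1 sample at offset -1 is retrieved, because it lies inside the overall span, but the filter drops it because neither window covers it. Port Pi1 was also rebound from S1 to S15 at t=15, so samples before and after the rebinding come from different streams.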
Key Research Benefits of TVC Model for Provenance
TVC model overhead reduces when:
  1. Monitoring is long-lived (over weeks)
  2. Per-instance state is minimized
  3. The number of monitored individuals increases
• The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
  – Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
  – Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in the EU Provenance/KARMA systems)
  – Statefulness: time and value functions allow a compact representation of long dependency windows
  – Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
• Other provenance systems cannot handle O(100 Kevents/sec) system throughput
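A back-of-envelope cost model shows why the overhead shrinks with longer monitoring. All byte sizes below are hypothetical assumptions chosen only to show the trend. They are not measurements from Century or from the cited paper. The per-PC specification and the binding-change log are one-off costs, so their share falls as more elements are stored. That happens with longer monitoring, or with more patients sharing the same analysis (assuming the specification is shared and bindings change rarely). The per-element state tag sets a floor that shrinks only if per-instance state is minimized.

```python
def annotation_overhead(element_bytes=100, inputs_per_output=8, id_bytes=8):
    """Per-record annotation: each element carries the IDs of its inputs."""
    return inputs_per_output * id_bytes / element_bytes

def tvc_overhead(n_elements, element_bytes=100, state_bytes=1,
                 spec_bytes=2_000, binding_changes=10, binding_bytes=50):
    """TVC: a small per-element state tag plus one-off costs (the per-PC
    functional specification and the binding-change log) amortized over
    every stored element. All sizes are illustrative assumptions."""
    one_off = spec_bytes + binding_changes * binding_bytes
    return (n_elements * state_bytes + one_off) / (n_elements * element_bytes)

print(f"annotation: {annotation_overhead():.0%}")
for n in (10**3, 10**5, 10**7):  # longer monitoring or more patients -> larger n
    print(f"TVC, {n:>10,} elements: {tvc_overhead(n):.2%}")
```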
Quality of Events in Century
[Figure: Century server infrastructure. Components shown include the USN Gateway, Sensor Data Reception, Data Transformer, Event Management Service, Event Preprocessor, Event Filter/Annotator, Event Storage Manager, Event Store, Analysis Framework, Analysis Applications, QoE Manager, State Manager, Provenance Service, Provenance Query Manager, Provenance Access Control, Group Management Service, Privacy Manager, Authentication & Authorization, Subscription Service, IHE Adapter, Remote Access Manager, a developer toolkit (IDE for application developers, sensor simulator, event management client library), and patient, administrator, and stakeholder portals (doctor, lifestyle consultant, healthcare IT specialist, government/CDC)]
Quality of Events in Data Streams
• Prior work has focused on defining streams in terms of their output data type
  – Enables a pub-sub model of stream binding
• However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
  – Sensor (source) impairments: sensor faults (transient or persistent)
  – Transmission impairments: loss of stream elements, or delay/order reversal of streams
• The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
  – Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
  – Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
  – Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE emits an ALERT, prompting "What generated this alert and can we trust it?". On one input, the sensor is faulty and generates stream elements of low quality (remedy: replace sensor). On another input, stream elements are missing, possibly due to network issues.]
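The dynamic-rebinding example can be expressed as a small policy, assuming each candidate sensor exposes a QoE score between 0 and 1. The function name, the 0.8 threshold, and the cost ordering are hypothetical.

```python
def choose_hr_source(qoe, threshold=0.8, by_cost=("SpO2", "ECG")):
    """Pick the cheapest HR source whose QoE score meets the threshold.

    qoe maps sensor name -> score in [0, 1]; by_cost lists sensors from
    cheapest to most expensive. If none qualifies, fall back to the best score.
    """
    for sensor in by_cost:
        if qoe.get(sensor, 0.0) >= threshold:
            return sensor
    return max(by_cost, key=lambda s: qoe.get(s, 0.0))

print(choose_hr_source({"SpO2": 0.95, "ECG": 0.99}))  # SpO2
print(choose_hr_source({"SpO2": 0.40, "ECG": 0.97}))  # ECG
print(choose_hr_source({"SpO2": 0.50, "ECG": 0.30}))  # SpO2
```

A real policy would likely add hysteresis, so that a score hovering around the threshold does not rebind the stream graph on every sample.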
Century QoE: Some Basic QoE Measures
[Figure: a PE expected input elements at t-5, t-4, t-3, t-2, t-1, and t, but the current input holds only those at t-5, t-2, and t]
• By correlating what was expected with what was received, we can estimate the quality of our input. In this case, for example, we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE receives one input with elements at t-12, t-11, t-10 and another with elements at t-5, t-2, t]
• Another possibility is to correlate the timestamps of the inputs to estimate the quality. In this case, for example, we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: of the current input elements t-5 through t consumed by a PE, only a subset is marked as the "useful" input]
• Finally, we can correlate the consumed input with the useful input (the input actually used to trigger an output). Here, for example, we can say that the output of the PE has low quality, since its generation can be considered circumstantial, given the small fraction of useful input stream elements.
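The three measures can be sketched as simple functions over timestamp sets. These are hypothetical formulations, one plausible reading of the slide rather than Century's QoE definitions. In particular, measuring synchronization as the gap between the newest elements of two inputs is an assumption. The example values mirror the three figures with t = 100; the "useful" subset in the last case is made up.

```python
def completeness(expected_ts, received_ts):
    """Fraction of expected input elements that actually arrived."""
    expected = set(expected_ts)
    return len(expected & set(received_ts)) / len(expected)

def sync_skew(ts_a, ts_b):
    """Gap between the newest elements of two inputs (larger = less in sync)."""
    return abs(max(ts_a) - max(ts_b))

def usefulness(consumed_ts, useful_ts):
    """Fraction of consumed input elements that actually triggered an output."""
    consumed = set(consumed_ts)
    return len(consumed & set(useful_ts)) / len(consumed)

t = 100
print(completeness(range(t - 5, t + 1), [t - 5, t - 2, t]))    # 0.5
print(sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))  # 10
print(usefulness(range(t - 5, t + 1), [t - 1]))                # 1/6 of input
```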
Open Challenges in Health Stream Collection and Diagnostics
• Model-driven acquisition of data from sensor feeds
  – Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
  – HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors
• Sensors have different, time-varying QoE
  – QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
• How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams
• How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
• How to develop a hybrid of model-based and replay-based provenance
• How to define context-dependent access-control models for provenance data in a multi-provider environment
• Provenance architectures must address the prohibitive cost of storing all stream data
  – Much of the intermediate analysis can be re-created if the state of analysis components can be stored
• Better techniques for establishing provenance relationships
  – User-specified models may not always be available: learn provenance from observed behavior
  – Explicit provenance may not be an accurate reflection of analysis logic
• Provenance data itself is sensitive and subject to privacy requirements
  – The right to access provenance may depend on the current "role" of the provider and the user's medical history
Conclusions
• A successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
• The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
  – Key features include adaptive event filtering, personalization, and predictive data transmission
• The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics
  – Key features include high-volume stream support, provenance, and event/information quality
Event Management Service (3): Hybrid Storage Model
• Key assumptions:
  – Traffic is bursty, modeled by a Poisson process
  – Average arrival rate < average database service rate
• Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ
[Figure: traffic arrival rate over time, plotted against the average arrival rate and the average database service rate, with the instability region marked]
[Figure: 3. Hybrid Storage Model (Stream Buffer + Relational-XML DB): the Annotator emits XML events over time into a Stream Buffer in front of the database. How do we design the stream buffer?]
D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
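The stability condition behind the instability region can be illustrated with the textbook M/M/1 queue. This goes beyond the slide: it additionally assumes exponentially distributed database service times, and the rates are hypothetical. With ρ = λ/μ < 1, the mean number of events in the system is ρ/(1 − ρ) and the mean time in the system is 1/(μ − λ). As ρ approaches 1 both grow without bound, and for ρ ≥ 1 there is no steady state.

```python
def mm1_metrics(lam, mu):
    """Steady-state M/M/1 metrics for arrival rate lam and service rate mu.

    Assumes Poisson arrivals (as on the slide) and exponential service times
    (an added assumption).
    """
    rho = lam / mu
    if rho >= 1:
        # No steady state: on average the queue grows without bound.
        return {"rho": rho, "stable": False}
    return {
        "rho": rho,
        "stable": True,
        "mean_in_system": rho / (1 - rho),      # L = rho / (1 - rho)
        "mean_time_in_system": 1 / (mu - lam),  # W = 1 / (mu - lambda)
    }

# Hypothetical rates in events/sec.
normal = mm1_metrics(800, 1000)  # rho = 0.8: about 4 events in the system
burst = mm1_metrics(1200, 1000)  # a burst above the service rate
print(normal["mean_in_system"], normal["mean_time_in_system"])
print(burst["stable"])           # False
```

Little's law (L = λW) ties the two results together: 800 events/sec × 0.005 sec = 4 events.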
Event Management Service (4): The Stream Buffer
[Figure: stream buffer components: Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, Database PE, System S STG (file system store), and EventStore (Relational-XML). Storage requests (events of type "eventStore") arrive from the analysis graph (PE1 to PE5, SPE), which also feeds the delivery system.]
Approach:
• Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
• Avoid driving the database into the instability region
• Throttle the traffic arriving at the database
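Throttling can be sketched as a buffer that forwards at most a fixed number of storage requests per tick to the database and holds the rest. The per-tick limit would be set below the database service rate. This is a hypothetical simplification: it does not model the traffic predictor, the load monitor, or the file system store (System S STG) shown in the figure.

```python
from collections import deque

class StreamBuffer:
    """Forward at most `max_per_tick` storage requests to the database per
    tick and hold the excess, smoothing bursts into a steady rate."""

    def __init__(self, max_per_tick):
        self.max_per_tick = max_per_tick  # chosen below the DB service rate
        self.pending = deque()

    def tick(self, arrivals):
        self.pending.extend(arrivals)
        n = min(self.max_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]  # sent to the DB

buf = StreamBuffer(max_per_tick=3)
for arrivals in (["a", "b", "c", "d", "e"], [], ["f"]):
    sent = buf.tick(arrivals)
    print(sent, len(buf.pending))
# ['a', 'b', 'c'] 2
# ['d', 'e'] 0
# ['f'] 0
```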
Century Provenance Service
[Figure: the same Century server infrastructure diagram as in "Quality of Events in Century", here presenting the Provenance Service: Provenance Storage Manager, Provenance Metadata and State Data stores, Provenance Query Service, Provenance Query Manager, Provenance Access Control, and Resolver]
An Example of Data Provenance Use
Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly. The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
What is process provenancebull The ability to retrace
which ldquosystem componentsrdquowere on the data path
What is data provenancebull The ability to retrace
which ldquodata elementsrdquo ledto generation of output data
PEj
Sensor
Sensor
Data2
DataOut4
Data1 Data2
I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2
Data1
Data1
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2007 IBM Corporation
Event Management Service (4) The Stream Buffer
System S STG(File System Store)
Traffic Monitor
Storage Requests(events of type ldquoeventStorerdquo)
DatabasePE
Traffic Models Traffic
Predictor Event Store Load Monitor
Storage Requests
PE1
PE2
PE3
PE4
PE5SPE
Analysis Graph
Towards the delivery system
Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database
EventStore
(Relational-XML
IBM Research
copy 2007 IBM Corporation
Century Provenance Service
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM Corporation26
An Example of Data Provenance Use
Urgent AlertPatient Doe John
Condition Abnormal ReactionRecommendation
The Century System has found a developing issue in your patientrsquos
medical condition The condition of the patient is deteriorating A known
side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day
Ultimately the physician is responsible for medical decisions and any actions taken In order
for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition
With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe
A few days later Dr Lee receives an alert from the Century prescription program
The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)
Thatrsquos unusual and a large decrease Before agreeing to
this change I need to understand on what basis the
system has made this recommendation and how
accurate it is
IBM Research
copy 2006 IBM Corporation
Provenance (1) Two Approaches to Provenance
f1
f2
f3
Provenance ServerStore
I ran algorithm f1 with parameters uv and w at time t0 I published on port 1
PE1
PE2
I recvd some data on port 2 I ran algorithm f3 on the data at time t1
What is process provenancebull The ability to retrace
which ldquosystem componentsrdquowere on the data path
What is data provenancebull The ability to retrace
which ldquodata elementsrdquo ledto generation of output data
PEj
Sensor
Sensor
Data2
DataOut4
Data1 Data2
I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2
Data1
Data1
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Trade-off: faster provenance storage processing in exchange for higher provenance reconstruction overhead
Determine the IDs of input ports and the causative stream bindings
Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
Apply the TVC filter to obtain the causative set of elements
[Figure: Provenance Query Server architecture.]
Stream and Port Mapping (SAPM):
1. Store the bindings between every input/output port and its associated streams
2. Store the creation/modification times of PEs
3. Store the TVC function f() for every (output port, input port) pair
Example tables:
– PC Dependency Table: PEj | f1() | Po1
– Dynamic Stream Mapping Table: t=10: (S1, Pi1); t=15: (S15, I4); t=16: (S17, O6)
– Data Store: element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home"
Query (Ei):
1. Obtain the I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve the elements
4. Apply the equation to determine the set of causative elements
Return the set of elements
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, D. Sow, L. Lim, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
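The four query steps can be sketched in Python under simplifying assumptions: all tables live in memory rather than in separate databases, each input port's binding is resolved once at the alert's timestamp (a binding change inside the dependency window would need a per-element lookup), and `Element`, `bound_stream`, `resolve`, `f1`, and the toy data are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Element:
    eid: str    # element id
    ts: int     # timestamp
    sid: str    # id of the stream the element was published on
    state: str  # PE state stored as metadata with the element


# A TVC equation maps (t, state) to windows (input port, lo, hi) in absolute time.
Equation = Callable[[int, str], List[Tuple[str, int, int]]]


def bound_stream(bindings: List[Tuple[int, str]], t: int) -> str:
    """Stream bound to a port at time t, given a time-sorted list of
    (binding time, stream id) entries that grows only when the binding changes."""
    i = bisect.bisect_right([tb for tb, _ in bindings], t) - 1
    if i < 0:
        raise LookupError(f"no binding at or before t={t}")
    return bindings[i][1]


def resolve(event: Element, out_port: str, in_ports: List[str],
            dependency_table: Dict[str, Equation],
            stream_map: Dict[str, List[Tuple[int, str]]],
            data_store: List[Element]) -> List[Element]:
    # 1. Obtain the I/O stream mappings at the event's timestamp.
    streams = {p: bound_stream(stream_map[p], event.ts) for p in in_ports}
    # 2. Retrieve the dependency equation and instantiate it with the
    #    event's timestamp and the state stored alongside it.
    windows = dependency_table[out_port](event.ts, event.state)
    causes: List[Element] = []
    for port, sid in streams.items():
        spans = [(lo, hi) for p, lo, hi in windows if p == port]
        if not spans:
            continue
        # 3. Retrieve candidate elements of the bound stream over the overall span.
        first, last = min(lo for lo, _ in spans), max(hi for _, hi in spans)
        candidates = [e for e in data_store
                      if e.sid == sid and first <= e.ts <= last]
        # 4. Apply the equation: keep only elements inside one of the windows.
        causes += [e for e in candidates
                   if any(lo <= e.ts <= hi for lo, hi in spans)]
    return causes


# Toy usage (hypothetical data): Pi1 is rebound from S1 to S15 at t=15.
def f1(t: int, state: str) -> List[Tuple[str, int, int]]:
    if state == "gym":
        return [("Pi1", t, t), ("Pi1", t - 3, t - 2), ("Pi2", t - 2, t - 1)]
    return [("Pi1", t - 2, t), ("Pi2", t - 1, t)]


stream_map = {"Pi1": [(0, "S1"), (15, "S15")], "Pi2": [(0, "S2")]}
store = [Element(f"{sid}@{ts}", ts, sid, "gym")
         for ts in range(10, 21) for sid in ("S1", "S15", "S2")]
alert = Element("E1", 20, "S99", "gym")
print([(e.sid, e.ts) for e in resolve(alert, "Po1", ["Pi1", "Pi2"],
                                      {"Po1": f1}, stream_map, store)])
# [('S15', 17), ('S15', 18), ('S15', 20), ('S2', 18), ('S2', 19)]
```

Nothing here is computed at ingest time beyond appending a binding when it changes; all the work happens when a query arrives, which is the storage-versus-reconstruction trade-off described above.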
Key Research Benefits of TVC Model for Provenance
The TVC model's overhead decreases when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance and Karma systems)
– Statefulness: time and value functions allow a compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100 K events/sec) system throughput.
Quality of Events in Century
[Figure: CENTURY SERVER INFRASTRUCTURE architecture diagram, comprising the USN Gateway, Event Management Service, Analysis Framework, QoE Manager, Provenance Service, Portal Service, Group Management Service, Platform Service, storages, IHE Adapter, developer toolkit, and Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC).]
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– This enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor data or intermediate analytic result) varies dynamically. Types of dynamism include:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE consumes input streams with missing and delayed elements and raises an ALERT. Callouts: "What generated this alert, and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (action: "Replace sensor"); "Stream elements are missing (possibly due to network issues)".]
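The dynamic-rebinding case can be sketched as a small selection rule in Python. Everything here is an assumption for illustration: QoE is taken as a score in [0, 1], the 0.8 threshold and the relative costs are made up (the slide only says the ECG source is more expensive), and `choose_hr_source` and `rebind` are hypothetical names.

```python
from typing import Dict


def choose_hr_source(qoe: Dict[str, float], cost: Dict[str, float],
                     min_qoe: float = 0.8) -> str:
    """Pick the cheapest HR source whose current QoE meets min_qoe; if none
    qualifies, fall back to the source with the highest QoE."""
    eligible = [s for s, q in qoe.items() if q >= min_qoe]
    if eligible:
        return min(eligible, key=lambda s: cost[s])
    return max(qoe, key=lambda s: qoe[s])


def rebind(binding: Dict[str, str], port: str, qoe: Dict[str, float],
           cost: Dict[str, float], min_qoe: float = 0.8) -> bool:
    """Rebind `port` to the chosen source; return True if the binding changed."""
    new_source = choose_hr_source(qoe, cost, min_qoe)
    changed = binding.get(port) != new_source
    binding[port] = new_source
    return changed


if __name__ == "__main__":
    cost = {"SpO2": 1.0, "ECG": 3.0}  # hypothetical relative costs
    binding = {"hr_in": "SpO2"}
    print(rebind(binding, "hr_in", {"SpO2": 0.95, "ECG": 0.90}, cost), binding)
    # False {'hr_in': 'SpO2'}
    print(rebind(binding, "hr_in", {"SpO2": 0.40, "ECG": 0.90}, cost), binding)
    # True {'hr_in': 'ECG'}
```

Because `rebind` reports whether the binding actually changed, a caller could record a new stream-mapping entry only on change, in line with the reactive provenance recording described earlier.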
Century QoE: Some Basic QoE Measures
By correlating what was expected with what was received, we can estimate the quality of our input. For example, if a PE expects elements at t-5, t-4, t-3, t-2, t-1 and t but receives only t-5, t-2 and t, we can say that the output of the PE has low quality, since half of the expected input is missing.
Another possibility is to correlate the timestamps of the inputs to estimate quality. For example, if one input of a PE carries elements at t-12, t-11 and t-10 while the other carries t-5, t-2 and t, we can say that the PE's output at t has low quality, since the two inputs are out of sync.
Finally, we can correlate the consumed input with the "useful" input (the input actually used to trigger an output). For example, if a PE consumes elements at t-5 through t but only the element at t is useful, we can say that its output has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
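The three measures can be turned into simple scores. This Python sketch is one possible formalization, not Century's definition: completeness as the fraction of expected timestamps received, synchronization as the gap between the newest elements of two inputs, and usefulness as the fraction of consumed elements that triggered the output. The function names are hypothetical, and the examples take t = 0 so that t-5 becomes -5.

```python
from typing import Iterable


def completeness(received: Iterable[int], expected: Iterable[int]) -> float:
    """Fraction of expected timestamps that actually arrived."""
    exp = set(expected)
    return len(exp & set(received)) / len(exp) if exp else 1.0


def skew(input_a: Iterable[int], input_b: Iterable[int]) -> int:
    """Gap between the newest elements of two inputs; larger means less in sync."""
    return abs(max(input_a) - max(input_b))


def usefulness(consumed: Iterable[int], useful: Iterable[int]) -> float:
    """Fraction of consumed elements that actually contributed to the output."""
    cons = set(consumed)
    return len(cons & set(useful)) / len(cons) if cons else 0.0


if __name__ == "__main__":
    t = 0
    print(completeness([t - 5, t - 2, t], range(t - 5, t + 1)))  # 0.5
    print(skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))      # 10
    print(round(usefulness(range(t - 5, t + 1), [t]), 3))         # 0.167
```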
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained from ECG, SpO2, and BP monitors, each at a different (accuracy, acquisition cost)
Sensors have different, time-varying QoE
– QoE determination should be bidirectional: feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
How to develop a hybrid combination of model-based and replay-based provenance?
How to define context-dependent access-control models for provenance data in a multi-provider environment?
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques are needed for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current "role" of the provider and the user's medical history
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay.
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure.
– Key features include adaptive event filtering, personalization, and predictive data transmission.
The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics.
– Key features include high-volume stream support, provenance, and event/information quality.
Century Provenance Service
[Figure: CENTURY SERVER INFRASTRUCTURE architecture diagram; provenance-related components shown include the Provenance Storage Manager, Provenance Query Service, Provenance Query Manager, Provenance Access Control, Resolver, and the Provenance Metadata and State Data stores.]
An Example of Data Provenance Use
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition. Along with this drug, Dr Lee has also prescribed a program to monitor the drug's effect on Mr Doe. A few days later, Dr Lee receives an alert from the Century prescription program:
    Urgent Alert
    Patient: Doe, John
    Condition: Abnormal Reaction
    Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.
Dr Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."
Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept this upcoming technology, we must provide them with the information they need to make these decisions responsibly.
The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path.
[Figure: PE1 and PE2, running algorithms f1, f2 and f3, report their actions to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1."]
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data.
[Figure: two sensors feed Data1 and Data2 into PEj, which emits DataOut4 tagged with (Data1, Data2). PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
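The two approaches can be contrasted in a few lines of Python. This is a toy model, not how any of the systems discussed here store provenance: process provenance is a log of what each component ran and when, while data provenance is a per-record annotation listing the parent elements, retraced by walking those annotations. `DataElement`, `process_log`, `run_pe`, and `data_lineage` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataElement:
    name: str
    # Per-record annotation: the elements this one was derived from.
    depends_on: List["DataElement"] = field(default_factory=list)


# Process provenance: a log of which component ran which algorithm when.
process_log: List[Dict[str, object]] = []


def run_pe(pe: str, algorithm: str, t: int,
           inputs: List[DataElement], out_name: str) -> DataElement:
    """Record the PE's action (process provenance) and annotate the output
    with its inputs (data provenance)."""
    process_log.append({"pe": pe, "algorithm": algorithm, "time": t})
    return DataElement(out_name, depends_on=list(inputs))


def data_lineage(e: DataElement) -> Set[str]:
    """Retrace every data element that led to e by walking its annotations."""
    seen: Set[str] = set()
    stack = list(e.depends_on)
    while stack:
        d = stack.pop()
        if d.name not in seen:
            seen.add(d.name)
            stack.extend(d.depends_on)
    return seen


data1, data2 = DataElement("Data1"), DataElement("Data2")
out4 = run_pe("PEj", "f3", 1, [data1, data2], "DataOut4")
print(sorted(data_lineage(out4)))      # ['Data1', 'Data2']
print([r["pe"] for r in process_log])  # ['PEj']
```

Every output here carries its own list of parents; at stream rates, this per-record cost is exactly what makes annotation-based data provenance heavyweight.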
Provenance (2): Prior Work on Provenance
[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates.]
EU Provenance (PASOA, PreServ): captures workflow bindings
Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level, automatic metadata collection at stream level
File Systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2006 IBM Corporation26
An Example of Data Provenance Use
Urgent AlertPatient Doe John
Condition Abnormal ReactionRecommendation
The Century System has found a developing issue in your patientrsquos
medical condition The condition of the patient is deteriorating A known
side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day
Ultimately the physician is responsible for medical decisions and any actions taken In order
for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly
Dr Lee has prescribed a specific medication to patient John Doe for his heart condition
With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe
A few days later Dr Lee receives an alert from the Century prescription program
The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)
Thatrsquos unusual and a large decrease Before agreeing to
this change I need to understand on what basis the
system has made this recommendation and how
accurate it is
IBM Research
copy 2006 IBM Corporation
Provenance (1) Two Approaches to Provenance
f1
f2
f3
Provenance ServerStore
I ran algorithm f1 with parameters uv and w at time t0 I published on port 1
PE1
PE2
I recvd some data on port 2 I ran algorithm f3 on the data at time t1
What is process provenancebull The ability to retrace
which ldquosystem componentsrdquowere on the data path
What is data provenancebull The ability to retrace
which ldquodata elementsrdquo ledto generation of output data
PEj
Sensor
Sensor
Data2
DataOut4
Data1 Data2
I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2
Data1
Data1
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE: Some Basic QoE Measures
[Figure: three panels, each showing a processing element (PE) and its timestamped input stream elements.]
1. Expected vs. received input. The PE expected elements t-5, t-4, t-3, t-2, t-1, t but received only t-5, t-2, t. By correlating what was expected with what was received, we can estimate the quality of the input. In this case, we can say that the output of the PE has low quality, since half of the expected input is missing.
2. Timestamp alignment. One input delivers elements t-12, t-11, t-10, while the other delivers t-5, t-2, t. Another possibility is to correlate the timestamps of the inputs to estimate quality. In this case, we can say that the output of the PE has low quality, since the two inputs are out of sync.
3. Consumed vs. useful input. Finally, we can correlate the consumed input with the "useful" input (the elements actually used to trigger an output). Here we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
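The three measures reduce to simple ratios and differences over timestamps. This is a minimal sketch under stated assumptions. The function names and exact formulas are hypothetical choices, and timestamps are written as integer offsets from the current time t, so t-5 becomes -5. The slide does not say which consumed elements were "useful", so the third example simply assumes only element t was used.

```python
from typing import Iterable


def completeness(expected: Iterable[int], received: Iterable[int]) -> float:
    """Measure 1: fraction of the expected timestamps that actually arrived."""
    exp = set(expected)
    if not exp:
        return 1.0
    return len(exp & set(received)) / len(exp)


def timestamp_skew(*inputs: Iterable[int]) -> int:
    """Measure 2: spread between the newest element of each input (0 means in sync)."""
    newest = [max(s) for s in map(list, inputs) if s]
    return max(newest) - min(newest) if newest else 0


def useful_fraction(consumed: Iterable[int], useful: Iterable[int]) -> float:
    """Measure 3: share of consumed elements that actually triggered the output."""
    cons = set(consumed)
    if not cons:
        return 0.0
    return len(cons & set(useful)) / len(cons)


print(completeness(range(-5, 1), [-5, -2, 0]))       # 0.5 (panel 1: half missing)
print(timestamp_skew([-12, -11, -10], [-5, -2, 0]))  # 10 (panel 2: out of sync)
print(useful_fraction(range(-5, 1), [0]))            # 0.16666666666666666 (assumed panel 3)
```

How these raw scores map to a "low quality" label (for example, via thresholds) is left open here, as it is on the slide.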
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained with different (accuracy, acquisition cost) characteristics from ECG, SpO2, and BP monitors
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
How to develop a hybrid combination of model-based and replay-based provenance?
How to define context-dependent access-control models for provenance data in a multi-provider environment?
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques are needed for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current "role" of the provider and the user's medical history
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission
The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality
Provenance (1): Two Approaches to Provenance
What is process provenance?
• The ability to retrace which "system components" were on the data path
[Figure: processing elements PE1 and PE2, running algorithms f1, f2, f3, report to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1."]
What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
[Figure: sensor data Data1 and Data2 flow into PEj, which emits DataOut4. PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
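The contrast between the two approaches can be made concrete with two record shapes. This is a minimal sketch, assuming hypothetical types `ProcessRecord` and `DataItem` and a hypothetical `lineage` helper. Process provenance is a log of what each PE ran. Data provenance is metadata carried by each item that names its inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ProcessRecord:
    """Process provenance: what a PE ran, with which parameters, when, and on which port."""
    pe: str
    algorithm: str
    params: Tuple[str, ...]
    time: str
    port: int


@dataclass
class DataItem:
    """Data provenance: each item names the items it was derived from."""
    name: str
    derived_from: List[str] = field(default_factory=list)


def lineage(item: DataItem, index: Dict[str, DataItem]) -> Set[str]:
    """Collect every upstream item reachable through derived_from (assumes no cycles)."""
    found: Set[str] = set()
    for parent in item.derived_from:
        found.add(parent)
        found |= lineage(index[parent], index)
    return found


# Process provenance: PEs report to a provenance store.
store = [
    ProcessRecord("PE1", "f1", ("u", "v", "w"), "t0", 1),
    ProcessRecord("PE2", "f3", (), "t1", 2),
]
print(sorted({r.pe for r in store}))  # ['PE1', 'PE2']

# Data provenance: DataOut4 carries metadata pointing at Data1 and Data2.
index = {
    "Data1": DataItem("Data1"),
    "Data2": DataItem("Data2"),
    "DataOut4": DataItem("DataOut4", ["Data1", "Data2"]),
}
print(sorted(lineage(index["DataOut4"], index)))  # ['Data1', 'Data2']
```

The per-item `derived_from` list is exactly the per-record annotation that becomes heavyweight at high stream rates.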
Provenance (2): Prior Work on Provenance
[Chart: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) and by event and processing rates.]
EU Provenance (PASOA, PreServ): captures workflow bindings
Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at the stream level
File Systems (PASS, LFS): capture system calls and modifications to file records; annotation per file
Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput
Provenance Challenges Unique to Streams
Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput
Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time
We need a provenance system that:
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model
We need to support both:
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me the stream segments related to the generation of alert Ei"
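The "reactive" requirement means emitting provenance only when a binding changes, not once per sample. This is a minimal sketch of that idea, assuming a hypothetical `BindingLog` class. The port and stream names are borrowed loosely from the mapping-table example later in the deck and are illustrative only.

```python
from typing import Dict, List, Tuple


class BindingLog:
    """Reactive capture: record a port-to-stream binding only when it changes."""

    def __init__(self) -> None:
        self.current: Dict[str, str] = {}
        self.events: List[Tuple[int, str, str]] = []  # (time, port, stream)

    def observe(self, time: int, port: str, stream: str) -> None:
        # Called for every element, but appends an event only for a new binding.
        if self.current.get(port) != stream:
            self.current[port] = stream
            self.events.append((time, port, stream))


log = BindingLog()
for t in range(1000):            # 1000 samples over an unchanged binding
    log.observe(t, "Pi1", "S1")
log.observe(1000, "Pi1", "S15")  # the stream graph is rebound once
print(log.events)  # [(0, 'Pi1', 'S1'), (1000, 'Pi1', 'S15')]
```

A thousand samples produce two provenance events rather than a thousand annotations, which illustrates the storage and throughput argument.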
TVC (Time-and-Value Centric) Model of Hybrid Provenance
Associate an "equation-based" specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "sub-sets", i.e., dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ
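$$
\mathit{out}_i(t) \;\Leftarrow\; \bigcup_{j \,:\, S_j \text{ is input to Component } i} \mathit{inps}_j(t_k), \qquad t_k \in [\,t - \Delta_j,\; t\,]
$$
[Figure: a Processing Element reads input streams Sj (samples Sj(t) to Sj(t-3)) and Sk (samples Sk(t) to Sk(t-3)) and produces output stream Si (samples Si(t), Si(t-1), Si(t-2)).]
TVC dependency rule blocks (PE state defined by location context):
$$
\begin{array}{c|c|l}
\text{Rule block ID} & \text{PE state} & \text{Dependency rule} \\ \hline
1 & \text{``gym''} & E_i(t) \Leftarrow S_j(t) \cup S_j(t-3,\, t-2) \cup S_k(t-2,\, t-1) \\
2 & \text{``home''} & E_i(t) \Leftarrow S_j(t-2,\, t) \cup S_k(t-1,\, t)
\end{array}
$$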
The dependency equation can, however, vary based on:
– The PE's internal state or some external context, e.g.:
    IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
    ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– Solution: allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
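The location-dependent alert rule can be written as an ordinary function that returns the PE state alongside each alert, so that the state can be stored as metadata. This is a minimal sketch, assuming a hypothetical `Alert` type and a dense, integer-indexed HR list where `hr[k]` is the sample at time k. HR(t-10, t) is read as an inclusive window.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class Alert:
    ts: int
    state: str  # PE state stored as metadata alongside the alert


def hr_alert(hr: Sequence[float], t: int, loc: str) -> Optional[Alert]:
    """Apply the location-dependent rule; hr[k] is the HR sample at integer time k."""
    if loc == "gym":
        lookback, threshold = 10, 140
    else:
        lookback, threshold = 30, 90
    window = hr[max(0, t - lookback): t + 1]  # inclusive window HR(t - lookback, t)
    if window and mean(window) > threshold:
        return Alert(ts=t, state=loc)
    return None


print(hr_alert([100] * 40, 35, "gym"))   # None: an average of 100 does not exceed 140
print(hr_alert([100] * 40, 35, "home"))  # Alert(ts=35, state='home')
```

Keeping `state` on the alert is what later allows a provenance query to pick the matching dependency rule block.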
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage and processing, at the cost of higher provenance reconstruction overhead
Determine the IDs of input ports and the causative stream bindings
Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
Apply the TVC filter to obtain the causative set of elements
[Figure: Provenance Query Server and its metadata stores.]
PC Dependency Table: PEj, f1(), Po1
Stream and Port Mapping (SAPM), Dynamic Stream Mapping Table:
  t=10: S1, Pi1
  t=15: S15, I4
  t=16: S17, O6
Data Store:
  element45: ts=15, SID=12, state="gym"
  element66: ts=16, SID=17, state="home"
Provenance Query Server:
1. Store the bindings between every input/output port and its associated streams
2. Store the creation/modification times of PEs
3. Store the TVC function f() for every output port/input port
Query (Ei):
1. Obtain the I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve the elements
4. Apply the equation to determine the set of causative elements
Return the set of elements
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, D. Sow, L. Lim, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
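The four query steps can be sketched end to end using the "gym" and "home" rule blocks from the TVC slide. This is a minimal illustration, not the published architecture. The `Element` type, the port names `in_j` and `in_k`, the in-memory store, and the inclusive `(lo, hi)` window reading are all hypothetical. The sketch also assumes that bindings do not change inside the dependency window, so it looks them up only at the alert time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Bindings = Dict[str, List[Tuple[int, str]]]  # port -> [(bind_time, stream_id)], sorted


@dataclass(frozen=True)
class Element:
    eid: str
    ts: int
    sid: str    # stream ID
    state: str  # PE state stored as metadata, e.g. "gym" or "home"


# Rule blocks: PE state -> {input port: [(lo, hi) offsets relative to t, inclusive]}.
RULES: Dict[str, Dict[str, List[Tuple[int, int]]]] = {
    "gym":  {"in_j": [(0, 0), (-3, -2)], "in_k": [(-2, -1)]},
    "home": {"in_j": [(-2, 0)],          "in_k": [(-1, 0)]},
}


def stream_at(bindings: Bindings, port: str, t: int) -> Optional[str]:
    """Return the stream bound to `port` at time t, or None if unbound."""
    history = bindings.get(port, [])
    i = bisect_right([bt for bt, _ in history], t) - 1
    return history[i][1] if i >= 0 else None


def resolve(alert: Element, bindings: Bindings, store: List[Element]) -> Set[str]:
    # Step 1: obtain the I/O stream mappings in force at the alert time.
    streams = {port: stream_at(bindings, port, alert.ts) for port in bindings}
    # Step 2: retrieve the dependency equation (rule block) for the recorded state.
    rule = RULES[alert.state]
    causes: Set[str] = set()
    for port, windows in rule.items():
        # Step 3: retrieve the elements of the stream bound to this port.
        candidates = [e for e in store if e.sid == streams.get(port)]
        # Step 4: keep only the elements that fall inside the TVC time windows.
        for lo, hi in windows:
            causes |= {e.eid for e in candidates if alert.ts + lo <= e.ts <= alert.ts + hi}
    return causes


bindings = {"in_j": [(0, "S1"), (15, "S15")], "in_k": [(0, "S2")]}
store = [
    Element("a", 20, "S15", "gym"), Element("b", 19, "S15", "gym"),
    Element("c", 17, "S15", "gym"), Element("d", 18, "S2", "gym"),
    Element("e", 20, "S2", "gym"), Element("f", 20, "S1", "gym"),
]
print(sorted(resolve(Element("E1", 20, "Si", "gym"), bindings, store)))  # ['a', 'c', 'd']
```

A production resolver would query the distributed metadata DBs rather than scan a list. The point here is only that nothing per-element was stored beyond timestamp, stream ID, and state.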
Key Research Benefits of TVC Model for Provenance
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance/Karma systems)
– Statefulness: time and value functions allow a compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100K events/sec) system throughput
Note: the TVC model's overhead decreases when (1) monitoring is long-lived (over weeks), (2) per-instance state is minimized, and (3) the number of monitored individuals increases
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2006 IBM Corporation
Provenance (2) Prior Work on Provenance
EU Provenance (PASOA PreServ))
Captures Workflow bindings
Scientific SOA Workflows (KARMA)Workflow and application binding notifications
Stateless application-defined data provenance
Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL
Data dependencies specified statically for relations
Event and Processing Rates
Stream Provenance (Simhan)
Store temporal history of stream connectors as a per-stream stack
System-level automatic metadata collection at stream-levelTy
pe o
f Pr
oven
ance
Rec
onst
ruct
ion
Proc
ess
prov
enan
ceD
ata
prov
enan
ce
File Systems PASS LFSCaptures system calls and
modifications to file recordsAnnotation per file
Medical Stream Provenance Arbitrary logic for invidual PEs (
statefulness and long memory) Dynamic stream bindings
High proessingthroughput
Century
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2006 IBM Corporation
Provenance Challenges Unique to Streams
Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation
is very heavyweight
Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput
Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data
Dynamic Stream Bindings Streams may be ephemeral as a consequence the
transformation graph may vary with time
We need a provenance system that
Is reactivemdashgenerating provenance events only when things change
Does not require annotating large amounts of metadata per sensor sample
Can accommodate variations in the statefulness model
We need to support both
Process provenance Show me the set of components that are involved in the generation of alert Ei
Data provenance Show me stream segments related to the generation of alert Ei
IBM Research
copy 2006 IBM CorporationIBM Research
TVC Model of Hybrid Provenance
Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data
element but remains constant for the same processing logic
Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input
streams) within a designated interval Δ
( ) ( ) UComponent
tttouti
jjjki inps
input to is
ΔminusisinlArr
Processing Element
Si(t) Si(t-1) Si(t-2)
Output stream Si
Sj(t)Sj(t-1)
Sj(t-2)Sj(t-3)
Sk(t)Sk(t-1) Sk(t-2)
Sk(t-3)
Input stream Sj
Input stream Sk
Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2
Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)
ldquogymrdquo1
Dependency RulePE State
Rule Block ID
TVC Dependency Ruleblocks
State defined by Location Context
The dependency equation can however vary based on
ndash The PErsquos internal state or some external context
IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)
ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)
ndash Solution is to allow multiple functions each associated with a distinct range of states
ndash Store the state as metadata with the data
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of the stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
[Figure: a PE receiving incomplete, out-of-order input streams emits an ALERT, annotated "What generated this alert, and can it be trusted?" Callouts: "The sensor is faulty and generates stream elements of low quality" (remedy: "Replace sensor"); "Stream elements are missing (possibly due to network issues)".]
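The rebinding idea in the last bullet can be sketched as a selection rule: bind the HR input to the cheapest source whose current QoE meets a threshold, and fall back to the best available source otherwise. This is a minimal Python sketch under stated assumptions; `HrSource`, `choose_hr_source`, the cost and quality values, and the threshold are hypothetical and not part of Century.

```python
from dataclasses import dataclass


@dataclass
class HrSource:
    """A candidate heart-rate (HR) stream (hypothetical fields)."""
    name: str
    cost: float     # relative acquisition cost (assumed units)
    quality: float  # current QoE estimate in [0, 1] (assumed scale)


def choose_hr_source(sources: list[HrSource], min_quality: float) -> HrSource:
    """Bind to the cheapest source whose quality meets the threshold.

    If no source meets the threshold, fall back to the highest-quality one.
    """
    acceptable = [s for s in sources if s.quality >= min_quality]
    if acceptable:
        return min(acceptable, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.quality)


spo2 = HrSource("SpO2", cost=1.0, quality=0.9)
ecg = HrSource("ECG", cost=5.0, quality=0.95)
print(choose_hr_source([spo2, ecg], min_quality=0.8).name)  # SpO2
spo2.quality = 0.4  # the SpO2-derived HR dips in quality
print(choose_hr_source([spo2, ecg], min_quality=0.8).name)  # ECG
```

Re-evaluating the choice whenever a QoE estimate changes is what turns a static stream graph into a dynamically rebound one; a real system would also need hysteresis to avoid flapping between sources.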
Century QoE: Some Basic QoE Measures
– By correlating what was expected with what was received, we can estimate the quality of our input. For example, if the PE expects elements t-5, t-4, t-3, t-2, t-1, t but receives only t-5, t-2, t, we can say that the output of the PE has low quality, since half of the expected input is missing.
– Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, if one input carries elements t-12, t-11, t-10 while the other carries t-5, t-2, t, we can say that the output of the PE has low quality, since the two inputs are out of sync.
– Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, if the PE consumes elements t-5 through t but only the element at t is useful, we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
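The three measures above can be turned into simple numeric scores. The sketch below is one possible formulation, not Century's QoE definition: timestamps are encoded as integers relative to t (so t-5 becomes -5), the first and third measures are plain fractions, and the synchronization measure is assumed to be the gap between the newest elements of two inputs. All function names are hypothetical.

```python
def completeness(expected: list[int], received: list[int]) -> float:
    """Fraction of expected timestamps that actually arrived (1.0 = nothing missing)."""
    if not expected:
        return 1.0
    got = set(received)
    return sum(1 for ts in expected if ts in got) / len(expected)


def sync_skew(input_a: list[int], input_b: list[int]) -> int:
    """Gap between the newest elements of two inputs (0 = in sync)."""
    return abs(max(input_a) - max(input_b))


def usefulness(consumed: list[int], useful: list[int]) -> float:
    """Fraction of consumed elements that actually triggered the output."""
    if not consumed:
        return 0.0
    return len(set(useful) & set(consumed)) / len(consumed)


window = [-5, -4, -3, -2, -1, 0]  # t-5 .. t
print(completeness(window, [-5, -2, 0]))
print(sync_skew([-12, -11, -10], [-5, -2, 0]))
print(usefulness(window, [0]))
```

Applied to the slide's examples, completeness is 0.5 (three of six expected elements), the skew between the two inputs is 10 time units, and usefulness is 1/6; each low score flags a low-quality PE output.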
Open Challenges in Health Stream Collection and Diagnostics
Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors
Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
How to develop a combination of hybrid model- and replay-based provenance?
How to define context-dependent access-control models of provenance data in a multi-provider environment?
Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of the analysis logic
Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission
The server must not simply be a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality
TVC Model of Hybrid Provenance
Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic
Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PE_i is a function of specific samples (from input streams) within a designated interval Δ (Δ_j for input stream s_j):
$$\mathrm{out}_i(t) \Leftarrow \bigcup_{s_j\ \text{input to component}\ i} s_j(t_k), \qquad t_k \in [\,t-\Delta_j,\ t\,]$$
[Figure: a processing element with input streams S_j and S_k and output stream S_i, showing which input samples each output element depends on.]
TVC dependency rule blocks (PE state defined by location context):
Rule Block ID | PE State | Dependency Rule
1 | "gym" | E_i(t) ⇐ S_j(t) ∪ S_j(t-3, t-2) ∪ S_k(t-2, t-1)
2 | "home" | E_i(t) ⇐ S_j(t-2, t) ∪ S_k(t-1, t)
The dependency equation can, however, vary based on:
– The PE's internal state or some external context, e.g.:
IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– The solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
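The rule blocks and the state-dependent alert rule can be sketched in Python as follows. This is an illustration under assumptions, not the Century implementation: a window such as S_j(t-3, t-2) is read as the closed interval [t-3, t-2] on stream S_j, S_j(t) as the single element at t, the PE state is passed in as it would be read from the metadata stored with the output element, and `Window`, `RULE_BLOCKS`, `causative_elements`, and `should_alert` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Closed window [t - lo, t - hi] on one input stream (offsets back from t)."""
    stream: str
    lo: int  # e.g. 3 for t-3
    hi: int  # e.g. 2 for t-2; 0 means t itself


# Rule blocks from the slide, keyed by the PE state (location context).
RULE_BLOCKS = {
    "gym": [Window("Sj", 0, 0), Window("Sj", 3, 2), Window("Sk", 2, 1)],  # rule block 1
    "home": [Window("Sj", 2, 0), Window("Sk", 1, 0)],                     # rule block 2
}


def causative_elements(t, state, elements):
    """Return the (stream, ts) input elements that E_i(t) depends on in `state`."""
    windows = RULE_BLOCKS[state]
    return sorted(
        (stream, ts)
        for stream, ts in elements
        if any(w.stream == stream and t - w.lo <= ts <= t - w.hi for w in windows)
    )


def should_alert(loc, hr_samples, t):
    """Slide rule: in the gym alert iff AVG(HR(t-10, t)) > 140, else iff AVG(HR(t-30, t)) > 90."""
    span, threshold = (10, 140) if loc == "gym" else (30, 90)
    values = [hr for ts, hr in hr_samples if t - span <= ts <= t]
    return bool(values) and sum(values) / len(values) > threshold


inputs = [(s, ts) for s in ("Sj", "Sk") for ts in range(11)]
print(causative_elements(10, "gym", inputs))   # [('Sj', 7), ('Sj', 8), ('Sj', 10), ('Sk', 8), ('Sk', 9)]
print(causative_elements(10, "home", inputs))  # [('Sj', 8), ('Sj', 9), ('Sj', 10), ('Sk', 9), ('Sk', 10)]
```

Because each output element carries only its state tag, the same two rule blocks serve every output element; the causative set is computed on demand rather than stored per element.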
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Trade-off: faster provenance storage processing in exchange for higher provenance reconstruction overhead
Determine the IDs of the input ports and the causative stream bindings
Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
Apply the TVC filter to obtain the causative set of elements
[Figure: the Stream and Port Mapping (SAPM) component and the Provenance Query Server, backed by three stores with example entries:
– PC Dependency Table: (PEj, f1(), Po1)
– Dynamic Stream Mapping Table: (t=10, S1, Pi1), (t=15, S15, I4), (t=16, S17, O6)
– Data Store: element45 (ts=15, SID=12, state="gym"), element66 (ts=16, SID=17, state="home")]
On storage:
1. Store the bindings between every input/output port and the associated streams
2. Store the creation/modification times of PEs
3. Store the TVC function f() for every output port/input port pair
On Query(Ei):
1. Obtain the I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve the elements
4. Apply the equation to determine the set of causative elements
Return the set of elements
M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, D. Sow, L. Lim, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
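The query path above can be sketched in Python against hypothetical in-memory stand-ins for the three metadata stores: a per-port binding history (the dynamic stream mapping), a dependency table keyed by output port and PE state, and the data store. The structures and names (`Element`, `bound_stream`, `resolve`, the port and stream IDs) are assumptions for illustration; the real system distributes this metadata across multiple databases. Here the rule is fetched before the bindings, because the rule names the input ports whose bindings are needed.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    eid: str
    ts: int
    sid: int    # ID of the stream the element belongs to
    state: str  # PE state recorded as metadata with the element


def bound_stream(bindings, port, t):
    """Stream bound to `port` at time t: the latest binding whose start time is <= t."""
    history = bindings[port]  # list of (start_ts, stream_id), sorted by start_ts
    idx = bisect_right([start for start, _ in history], t) - 1
    if idx < 0:
        raise LookupError(f"port {port!r} unbound at t={t}")
    return history[idx][1]


def resolve(element, out_port, bindings, dependencies, store):
    """Return the causative input elements of `element` (sketch of Query(Ei))."""
    # Retrieve the dependency equation for this output port and the element's state.
    rule = dependencies[(out_port, element.state)]  # list of (in_port, lo, hi)
    result = []
    for in_port, lo, hi in rule:
        # Obtain the input stream that was bound to this port when the element was produced.
        sid = bound_stream(bindings, in_port, element.ts)
        # Retrieve that stream's elements and keep those inside the TVC window.
        result.extend(
            e for e in store
            if e.sid == sid and element.ts - lo <= e.ts <= element.ts - hi
        )
    return sorted(set(result), key=lambda e: (e.sid, e.ts))


bindings = {"I1": [(0, 12), (20, 15)]}  # port I1: stream 12 from t=0, stream 15 from t=20
dependencies = {("O1", "gym"): [("I1", 3, 0)], ("O1", "home"): [("I1", 1, 0)]}
store = [Element(f"e{ts}", ts, 12, "gym") for ts in range(20)]
store += [Element(f"f{ts}", ts, 15, "home") for ts in range(20, 30)]
print([e.eid for e in resolve(Element("o1", 15, 99, "gym"), "O1", bindings, dependencies, store)])
```

Because bindings are recorded only when they change, a binding history lookup (here a binary search over start times) is enough to recover which stream fed a port at any past instant.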
Key Research Benefits of TVC Model for Provenance
TVC model overhead reduces when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases
The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
Other provenance systems cannot handle O(100K events/sec) system throughput
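The first list can be read as an amortization argument: under TVC, the per-PC functional specification and the binding records are fixed costs shared by every element (and every monitored individual) processed by the same logic, while the per-element cost is limited to a small state tag. The back-of-the-envelope model below makes that explicit. It is a sketch with hypothetical byte sizes and function names; the ratios it prints are illustrative and are not the measurements reported on the slide.

```python
def tvc_overhead(n_elements, elem_bytes, state_bytes, spec_bytes, n_rebinds, binding_bytes):
    """Provenance bytes / data bytes under TVC: per-element state tag plus shared fixed costs."""
    data = n_elements * elem_bytes
    provenance = n_elements * state_bytes + spec_bytes + n_rebinds * binding_bytes
    return provenance / data


def annotation_overhead(elem_bytes, parents_per_element, id_bytes):
    """Provenance bytes / data bytes when every element lists the IDs of its causative inputs."""
    return parents_per_element * id_bytes / elem_bytes


# Hypothetical sizes: 100-byte readings, 1-byte state tag, 1 KB of rule specifications,
# 10 binding changes of 32 bytes each.
for n in (1_000, 100_000, 10_000_000):
    print(n, round(tvc_overhead(n, 100, 1, 1_000, 10, 32), 4))
print("annotation", annotation_overhead(100, 8, 8))
```

As the number of elements grows (longer monitoring, or more individuals sharing the same rules), the TVC ratio falls toward state_bytes / elem_bytes (1% with these assumed sizes), which is why minimizing per-instance state matters; per-element annotation stays at a constant, much larger ratio.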
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2006 IBM CorporationIBM Research
Resolving TVC-based Provenance Queries
Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is
distributed across multiple DBsndash Tradeoff faster provenance storage
processing for higher provenance reconstruction overhead
Determine ID of input ports and causative stream bindings
Retrieve appropriate data elements within specific timesequence window specified by TVC Equation
Apply TVC filter to obtain causative set of elements
PC Dependency Table
PEj f1() Po1
Dynamic Stream Mapping Table
t=10 S1 Pi1t=15 S15 I4t=16 S17 O6
Data Store)
element45 ts=15 SID= 12 state=ldquogymrdquo
element66 ts=16 SID= 17 state=lsquohomersquo
Stream and Port Mapping (SAPM)
Provenance Query Server
1 Store bindings between every input output port and associated streams
2 Store creation modification times of PEs
3 Store TVC function f() for every output port input port
Query (Ei)
1 Obtain IO stream mappings
2 Retrieve Dependency equation
4 Apply equation to determine set of causativeelements
Return set of elements
3 Retrieve elements
M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2006 IBM CorporationIBM Research
Key Research Benefits of TVC Model for Provenance
1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases
TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata
per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)
ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)
ndash Statefulness- time and value functions allow compact representation of long dependency windows
ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)
Other provenance systems cannot handle O(100 Keventssec) system throughput
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if
the state of analysis components can be stored
Better techniques for establishing provenance relationshipsndash User-specified models may not always be available
learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of
analysis logic
Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the
currnt lsquorolersquo of the provider and the userrsquos medical history
IBM Research
copy 2006 IBM Corporation
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and
predictive data transmission
Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and
eventinformation quality
IBM Research
copy 2006 IBM Corporation
IBM Research
copy 2006 IBM CorporationIBM Research
Quality of Events in Century
USN Gateway
DataTransformer
Sensor DataSender
Sensor DataReception
Toolkit
IDE forApplicationDeveloper
SensorSimulator
Event ManagementClient Library
Patient Portal
Administrator Portal
StakeholderInfo
PatientInfo
DeviceInfo
ApplicationInfo
Provenance Service
ProvenanceStorage Manager
StateData
ProvenanceMetadata
EventStore
Storages
Portal Service
Patient DataManager
Service DataManagement
User Group Data Manager
JobApplicationManager
Event Filter Annotator
Analysis Applications
Analysis Framework
QoEManager
StateManager
GroupManager
DeviceCatalogue
Authenticationamp Authorization
PrivacyManager
PlatformService
SubscriptionService
Event StoreQuery Service
ProvenanceQuery Service
GroupManagement
Service
IHE Adapter
GroupData
PrivacyPolicy
Event StorageManager
EventManagement
Service
EventPreprocessor
Remote AccessManager
CENTURY SERVER INFRASTRUCTURE
Stakeholder Portal
Doctor LifestyleConsultant
HealthcareIT Specialist
GovtCDC
Solution DeliveryServices
ProvenanceQuery Manager
ProvenanceAccess Control
Resolver
Device DataManager
InteroperabilityContainer
IBM Research
copy 2006 IBM CorporationIBM Research
PE
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding
However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or
persistent)ndash Transmission impairments Loss of stream elements or
delayorder reversal of streams
Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or
adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis
component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality
t-3 t-5t-4t+1
t t-5t-1t+1
What generated this alert and
can we trusted
ALERT
t
t-2t-1t
t-3 t-4t-2
t-16t-14t-13tt+1
The sensor is faulty and generates
stream elements of low quality
Stream elements are missing
(possibly due to network issues)
Replace
sensor
IBM Research
copy 2006 IBM CorporationIBM Research
Century QoE Some Basic QoE Measures
t-3 t-5t-4t-2t-1t
PE
t-5t-2t The current
input
The input that
was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing
t
PE
t-12t-11t-10
t-5t-2t
t
Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync
t-3 t-5t-4t-2t-1t
PE
The current
inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements
t
The ldquousefulrdquo input
IBM Research
copy 2006 IBM CorporationIBM Research
Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit
redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition
cost) from ECG SpO2 BP monitors
Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback
from medical professionals on quality of analysis results should affect QoE of upstream sensors)
How to compute QoE of medical streams and how
to refine them based on QoE of other streams
How to adaptively alter acquisition pattern based
on (cost accuracy needs)
How to develop a combination of hybrid model-replay
based provenance
How to define context-dependent access-control
models of provenance data in a multi-provider
environment
Provenance architectures must address the prohibitive cost of storing all stream data
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored
Better techniques are needed for establishing provenance relationships
- User-specified models may not always be available; learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of the analysis logic
Provenance data itself is sensitive and subject to privacy requirements
- The right to access provenance may depend on the current 'role' of the provider and the user's medical history
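The point that intermediate analysis can be re-created from stored component state suggests a replay-based approach: keep raw inputs and periodic snapshots of each analysis component, but not its outputs. The sketch below is one possible reading, assuming a deterministic component whose state can be deep-copied; `MovingSum`, `ReplayProvenance`, and `checkpoint_every` are hypothetical names.

```python
import copy
from bisect import bisect_right


class MovingSum:
    """Hypothetical stateful analysis component: sum over the last `window` inputs."""

    def __init__(self, window):
        self.window = window
        self.buf = []

    def step(self, x):
        self.buf.append(x)
        if len(self.buf) > self.window:
            self.buf.pop(0)
        return sum(self.buf)


class ReplayProvenance:
    """Keeps raw inputs plus periodic snapshots of the component's state, but not
    its outputs; any past output can then be re-created by replay."""

    def __init__(self, component, checkpoint_every):
        self.component = component
        self.every = checkpoint_every
        self.inputs = []       # raw stream elements (stored)
        self.ckpt_pos = []     # input index at which each snapshot was taken
        self.ckpt_state = []   # snapshot of the component *before* that input

    def process(self, x):
        if len(self.inputs) % self.every == 0:
            self.ckpt_pos.append(len(self.inputs))
            self.ckpt_state.append(copy.deepcopy(self.component))
        self.inputs.append(x)
        return self.component.step(x)  # output is returned, not stored

    def recreate(self, i):
        """Re-create the output produced for input index i."""
        if not 0 <= i < len(self.inputs):
            raise IndexError(i)
        k = bisect_right(self.ckpt_pos, i) - 1          # latest snapshot at or before i
        replica = copy.deepcopy(self.ckpt_state[k])     # never disturb the live component
        out = None
        for x in self.inputs[self.ckpt_pos[k]: i + 1]:  # replay up to and including i
            out = replica.step(x)
        return out


rp = ReplayProvenance(MovingSum(window=3), checkpoint_every=4)
outputs = [rp.process(x) for x in range(1, 11)]
print(outputs[5], rp.recreate(5))  # 15 15
```

The checkpoint interval trades storage for replay time: sparser snapshots save space but lengthen the replay needed to re-create an output.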
Conclusions
Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay
The pervasive device must be more than a relay; it must be an extended part of the analytic infrastructure
- Key features include adaptive event filtering, personalization, and predictive data transmission
The server must not be simply a repository of sensor streams but must provide flexible and scalable analytics
- Key features include high-volume stream support, provenance, and event/information quality
Quality of Events in Data Streams
Prior work has focused on defining streams in terms of their output data type
- This enables a pub-sub model of stream binding
However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism exist:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams
The quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- The action taken on the analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
- Modification of analysis logic: the resolution of an analysis component (e.g., the coefficients of a Fourier transform or the set of features used) may be varied depending on stream quality
- Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality
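The rebinding rule in the last item, together with the (cost, accuracy) trade-off between redundant HR sources, can be sketched as a selection policy: use the cheapest source whose current QoE is acceptable, and fall back to the best available source otherwise. The `HRSource` type, the `bind_hr_source` function, and the cost and quality numbers are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class HRSource:
    name: str
    cost: float     # relative acquisition cost (hypothetical units)
    quality: float  # current QoE estimate in [0, 1]


def bind_hr_source(sources, min_quality):
    """Cheapest source whose current QoE meets min_quality; if none does,
    fall back to the highest-quality source available."""
    acceptable = [s for s in sources if s.quality >= min_quality]
    if acceptable:
        return min(acceptable, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.quality)


spo2 = HRSource("SpO2", cost=1.0, quality=0.9)
ecg = HRSource("ECG", cost=5.0, quality=0.95)
print(bind_hr_source([spo2, ecg], min_quality=0.8).name)  # SpO2
spo2.quality = 0.4  # SpO2-derived HR dips in quality
print(bind_hr_source([spo2, ecg], min_quality=0.8).name)  # ECG
```

In a running system this check would be re-evaluated whenever a source's QoE estimate changes, so the stream graph rebinds back to the cheaper sensor once its quality recovers.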
(Figure: a stream graph producing an ALERT, asking "What generated this alert, and can we trust it?". Annotations: "The sensor is faulty and generates stream elements of low quality" (action: replace sensor); "Stream elements are missing (possibly due to network issues)".)