Technology Challenges for Health Monitoring and Automated Stream Analysis*

Archan Misra ([email protected]), IBM TJ Watson Research Center, USA

*Contributions from Watson colleagues (Marion Blount, Maria Ebling, Anastasios Kementsietsidis, Iqbal Mohomed, Daby Sow, Min Wang) and IBM UCL, Korea

IBM Research, © 2006 IBM Corporation


Contents of Talk

- Motivation and Challenges for Remote Monitoring and Analytics
- Harmoni: Context-based Event Processing on the Mobile Hub
- Century Technologies: Stream Storage, Provenance and QoE

Chapter 1

The Motivation and Challenges for Remote Monitoring and Analytics

Remote Health Monitoring: The Business Motivation

The United States spends $1.9 trillion on healthcare, or more than 16% of its GDP

More than 133 million Americans live with chronic illnesses
- Chronic diseases account for 70% of all deaths in the United States
- The medical care costs of people with chronic diseases account for more than 75% of the nation's $1.4 trillion medical care costs
- Chronic diseases require long-term management

By 2010, the US will have more citizens aged 65 or over than at any point in its history
- A deficit of 200,000 doctors by the year 2010
- Huge growth forecast in managed care residences
  - 45 million Americans (15% of the population) will be 65 years or older by 2015
  - About 4 million long-term care beds in the US as of 2003

Information-based Medicine: The Remote Monitoring Roadmap

[Figure: roadmap with axes "Evolutionary Practices" and "Revolutionary Technology". Labels: Traditional HC, Patient Reported Data, Episodic Treatment, Electronic Health Records, Information Augmented, Chronic Disease Mgmt, Clinical Trial Data Collection, In-Pt Automated Vitals, Rules Based Clinical Response, Pre-symptomatic Treatment, Lifetime Treatment, Automated Systems, Data and Systems Integration, Throughput Analytics, Information Correlation, 1st Generation Diagnosis, Clinical Decision Support, CDI, Remote Monitoring, Personalized Health Care, Century. Stages: Non-specific (Treat Symptoms), Organized (Error Reduction), Personalized (Disease Prevention). Source: Kathy Schweda.]

Research Background: From Today To 2015

Ubiquitous computing technologies, a patient-centric network, and healthcare analytics are the key factors in establishing the healthcare ecosystem

[Figure: two matrices describing Patients and Care Delivery, one for today and one for 2015. Dimensions and values:

Patients
- Health Status (first matrix): Healthy, Minor Ailments, At Risk, Acutely Ill, Chronically Ill, Catastrophically Ill
- Age Group (second matrix): Infants, Adolescent, Adult Men, Adult Women, Senior Men, Senior Women
- Setting: Rural, Suburban, Urban
- Socio-economic Status: High, Medium, Low

Care Delivery
- Access: In Person, Telephonic, Electronic
- Location: Home, Outpatient Setting, Hospital, Emergency Department, Long Term Care, Internet, Call Center
- Service (first matrix): Wellness, Risk Assessment, Prevention, Acute Care, Chronic Care, Complementary Care
- Service (second matrix): Risk Assessment, Prevention, Acute - Diagnosis, Acute - Treatment, Chronic - Diagnosis, Chronic - Treatment
- Provider: Traditional Providers, Public/Private Insurers, Alternate Providers, Midlevel Provider, Health Infomediary
- Catchment Area: Local, Regional, National, International]

Ubiquitous Computing Technologies
- Emergence of sensor devices that generate more complex events at higher rates (e.g., ETRI's blood test sensor, SNU's non-intrusive ECG sensor, multichannel ECG)
- Low-power wireless communication (e.g., Bluetooth, ZigBee)

Patient-Centric Network (Interoperability)
- IHE, Continua

Healthcare Analytics Research
- Techniques to monitor the effects of drugs on patients
- Biomedical trend analysis
- Preventive analysis

IT Infrastructure

Remote Health Monitoring: The Opportunity

Long-term monitoring offers benefits
- Early disease detection and trend analysis for healthy and at-risk individuals
- Treatment and progress monitoring for patients
- Gauging efficacy and side effects for participants in drug trials or experimental treatments
- Reduced workload on doctors, nurses, and other healthcare providers

Enabled by rapid improvements in two key technologies
- Improvements in wireless communications (WiFi, 3G, Bluetooth)
- Continuing miniaturization of wireless sensors

[Figure labels: Server, Patient Diary, BT, Data]

Chapter 2

Harmoni: Context-based Event Processing on the Mobile Hub

Harmoni (Healthcare Adaptive Remote Monitoring): Overview

[Figure labels: PAN (Bluetooth), WAN (GPRS), Data]

Type of sensor device    Bits/sensor sample    Channels/device    Raw data rate (KB/day)
GPS                      1408                  1                  14,850
SpO2                     3000                  1                  94,922
EKG (cardiac)            12                    6                  194,400
Accelerometer            64                    3                  202,500
EEG (brain)              12                    12                 388,800
EMG (muscle)             12                    6                  777,600
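The raw data rates in the table follow from the bits per sample, the channels per device, and a sampling rate. The slide does not list sampling rates, so the rates in the sketch below are assumptions, chosen only because they reproduce the table's KB/day figures when 1 KB is taken as 1024 bytes. The function and dictionary names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def raw_rate_kb_per_day(bits_per_sample, channels, samples_per_sec):
    """Raw data rate in KB/day, with 1 KB = 1024 bytes."""
    bytes_per_sec = bits_per_sample * channels * samples_per_sec / 8
    return bytes_per_sec * SECONDS_PER_DAY / 1024


# (bits/sample, channels/device, ASSUMED samples/sec).
# Only the first two columns come from the slide.
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG": (12, 12, 256),
    "EMG": (12, 6, 1024),
}


def rate_table():
    """Raw data rate per sensor type, in KB/day."""
    return {name: raw_rate_kb_per_day(*spec) for name, spec in SENSORS.items()}


if __name__ == "__main__":
    for name, rate in rate_table().items():
        print(f"{name:15s}{rate:12.0f} KB/day")
```

Under these assumed rates, even a single EMG device produces several hundred megabytes per day, which is the motivation for filtering on the mobile hub rather than relaying everything.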

HARMONI Novel Features

Need to reduce uplink transmission rates/power

1. Context-Aware Event Filtering: IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr)/min
2. Personalization of Rules: apply machine learning at the server to learn "normal" for user A (gym = (110, 150))
3. Anticipation-Driven Transmission: based on past behavior, if Prob(802.11 within 2 hrs) > 0.7, cache data
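The context-aware filtering rule can be sketched as a small function. This is an illustration rather than the HARMONI implementation: the function name is hypothetical, and sending the unfiltered samples when the rule does not match is an assumption, since the slide only states the matching case.

```python
from statistics import mean


def filter_minute(context, hr_samples):
    """Decide what to uplink for one minute of heart-rate samples.

    If the user is in the 'gym' and every reading lies in the expected
    90-120 band, only the per-minute average is sent. Otherwise the raw
    samples are passed through (an assumption, not stated on the slide).
    """
    if hr_samples and context == "gym" and all(90 < hr < 120 for hr in hr_samples):
        return [mean(hr_samples)]
    return list(hr_samples)


if __name__ == "__main__":
    print(filter_minute("gym", [100, 110]))     # in-band readings are summarized
    print(filter_minute("office", [100, 110]))  # different context: raw samples
```

Personalization (feature 2) would amount to replacing the fixed 90-120 band with per-user bounds learned at the server.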

HARMONI Architecture: Key Components on Mobile Device

Lightweight Rule/Event processing engine
- Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
  - e.g., if 60 < AVG(last 10 heart rate values) < 90 then "transmit AVG to server"
- Processed events themselves act as predicates for new rules

Rule Manager
- Coordinates with server to determine patterns of current interest and the consequent action
- RM populates or modifies the rules in the event engine

Intelligent Data Transmission
- Compresses the filtered data (from event engine) for transmission to server

Anticipation Mechanism
- Schedules the transmission of (compressed) data based on predicted availability of network connections and incoming sensor data rates
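A minimal sketch of the anticipation decision, using the 0.7 probability threshold quoted earlier in the talk. Estimating the probability as a simple frequency over past observations is an assumption made here for illustration; the slides do not describe the predictor, and all names are hypothetical.

```python
def prob_wifi_within_horizon(history):
    """Fraction of past observations in which 802.11 connectivity
    appeared within the horizon (e.g., 2 hours). Assumed estimator."""
    return sum(history) / len(history) if history else 0.0


def transmission_decision(wifi_available, history, threshold=0.7):
    """Return 'send-wifi', 'cache', or 'send-wan'."""
    if wifi_available:
        return "send-wifi"   # cheap interface is up: send now
    if prob_wifi_within_horizon(history) > threshold:
        return "cache"       # WiFi is likely soon: store and forward later
    return "send-wan"        # fall back to the costlier WAN uplink


if __name__ == "__main__":
    past = [True] * 8 + [False] * 2
    print(transmission_decision(False, past))
```

A fuller scheduler would also weigh incoming sensor data rates against the available buffer space, as the slide notes.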

HARMONI Functional Architecture

[Figure: functional architecture spanning Sensor, Mobile Device, and Remote Server, connected by a short-range wireless link and a wireless link to the Internet. Mobile device labels: Data Collection, Data Adapters, Data Processing, Light-Weight Pattern Recognition Engine, Rule Manager, User Interface, Action Triggering Mechanism, Intelligent Data Transmission, Anticipation Mechanism, Context, Sensor Readings, Sensor Control, Device Resources. Server labels: DB, Pattern Recognition Engine, Pattern Learning Engine, TAPAS, Rule Server, External Context Sources, External Rule Specifications, External Action Specifications, External Action Triggering Mechanism.]

- Simple recognition of pre-specified temporal patterns across sensor streams
- Event engine driven by Deterministic Finite Automata (DFAs)
- Actions associated with rules include data transmission, data transformation, and local triggers (alarms, reminders)
- Compressed transmission to server when an interface is available AND "anticipation" indicates "OK-to-Send"
- Use of intelligent compression schemes to reduce the volume of traffic
- Store-and-forward otherwise
- Data stored in memory buffer and on FLASH storage
- Rule Manager coordinates with backend server to download currently active or applicable rules
- Can run, activate, or deactivate multiple rules simultaneously and dynamically
- Dynamic downloading of rules provides implicit "context awareness" at client

Pattern Recognition Engine Implementation

- Rules are specified using EventScript, which is similar to regular expressions
- A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA)
- We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications
- Effectively, we are running finite state machines with some tweaks
  - Unique memory heap for each DFA, accessible across states
  - Run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA
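The two tweaks (a per-DFA heap shared across states, and arbitrary code run on initialization and on transitions) can be sketched as follows. This is an illustration in Python, not EventScript or the LISP/Scheme interpreter the slide mentions; the class, the example rule, and all names are hypothetical.

```python
class DFA:
    """Finite state machine with a private heap and per-transition actions."""

    def __init__(self, start, transitions, on_init=None):
        # transitions: {(state, symbol): (next_state, action or None)}
        self.state = start
        self.transitions = transitions
        self.heap = {}  # memory private to this DFA, visible in every state
        if on_init:
            on_init(self.heap)

    def feed(self, symbol, value=None):
        key = (self.state, symbol)
        if key not in self.transitions:
            return None  # no matching transition: ignore the event
        self.state, action = self.transitions[key]
        return action(self.heap, value) if action else None


def remember(heap, value):
    heap["highs"].append(value)


def alert(heap, value):
    heap["highs"].append(value)
    readings, heap["highs"] = heap["highs"], []
    return ("ALERT", readings)


def reset(heap, value):
    heap["highs"] = []


def make_three_high_dfa():
    """Example rule: raise an alert after three consecutive 'high' events."""
    transitions = {
        ("idle", "high"): ("one", remember),
        ("one", "high"): ("two", remember),
        ("two", "high"): ("idle", alert),
        ("idle", "low"): ("idle", None),
        ("one", "low"): ("idle", reset),
        ("two", "low"): ("idle", reset),
    }
    return DFA("idle", transitions, on_init=lambda heap: heap.update(highs=[]))


if __name__ == "__main__":
    dfa = make_three_high_dfa()
    for event in [("high", 130), ("low", 80), ("high", 131), ("high", 132), ("high", 133)]:
        print(event, "->", dfa.feed(*event))
```

Keeping the heap separate from the state set lets a rule accumulate values (here, the triggering readings) without multiplying the number of states.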

[Figure: rule toolchain. Labels: Desktop, Compiler, EventScript, DFA, Rule Server, Rule Manager, Event Engine, 770, Remote Server]

Context-Based Filtering

HARMONI Implementation Platform

Nokia 770 Internet Tablet / N800
- ARM processor, Linux-based
- High-resolution display (800x480), touch screen with up to 65,536 colors
- 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB ... can be used for virtual memory)
- Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
- Relatively cheap: $350
- http://www.maemo.org provides open-source software and development environment
- Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor
- Provides heart rate and oxygen saturation
- Supports Bluetooth Serial Port Profile (SPP)
- 120 hours of continuous operation with 2 AA batteries
- Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis Accelerometer
- Output baud of 57.6 Kbps
- 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome, and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008

Impact of HARMONI on Transmission Bandwidth: Idealized Context

[Figure: "Data Generated From Sensor". Heart rate (bpm, axis 0-200) against sample number (1 to 9433).]

[Figure: bar chart of data transmitted over the network (in KB, axis 0-80) by context assumed (None, Office, Gym, Office/Gym), uncompressed and with LZ-78 compression. Bar labels in extraction order: 70.30, 29.06, 48.90, 19.59, 29.64, 17.02, 14.54, 22.04.]

- Compression (S2) by itself results in > 50% reduction in bandwidth consumption
- Filtering (S3) results in 85% reduction in bandwidth consumption
- Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating point values in LZ

Result 2: Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals". Heart rate (beats per minute, axis 0-200) against time.]

[Figure: bar chart of data transmitted over the network (in KB, axis 0-100) for User 2-1 and User 2-2. Series: Uncompressed; Compressed; Uncompressed, Generic Context-filtering; Uncompressed, Personalized Context filtering. Bar labels in extraction order: 95.10, 65.00, 34.94, 23.02, 20.18, 19.02, 11.84, 9.26.]

- Sensor streams exhibit significant statistical variation across individuals
- Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating point values in the LZ compressor

HARMONI in Practice: Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
- Higher 'normal' threshold for the 3 distinct states (0-70, 70-110, 110-170) across all users
- Aside: accelerometer readings are also used to classify 'falls' for elderly patients

Bandwidth savings between 26-73% for our sample population
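Sensor-derived context can drive the filter directly. The sketch below reads the three ranges on the slide as 'normal' heart-rate bands for sitting, walking, and running, in that order, which is an interpretation. The accelerometer amplitude cut-offs are hypothetical, as are the function names, and transmitting only out-of-band readings is an assumed policy.

```python
# 'Normal' heart-rate band per activity state, taken from the slide's ranges.
HR_NORMAL = {"sitting": (0, 70), "walking": (70, 110), "running": (110, 170)}


def classify_activity(accel_amplitude, walk_min=0.2, run_min=1.0):
    """Map accelerometer amplitude to a state. Cut-offs are hypothetical."""
    if accel_amplitude >= run_min:
        return "running"
    if accel_amplitude >= walk_min:
        return "walking"
    return "sitting"


def should_transmit(hr, accel_amplitude):
    """Transmit only readings outside the band for the current state."""
    low, high = HR_NORMAL[classify_activity(accel_amplitude)]
    return not (low <= hr <= high)


if __name__ == "__main__":
    print(should_transmit(100, 0.05))  # 100 bpm while sitting
    print(should_transmit(100, 0.5))   # 100 bpm while walking
```

The same reading is thus reported or suppressed depending on the activity state, which is where the bandwidth savings come from.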

Chapter 3

Century Technologies: Stream Storage, Provenance and QoE

Century: Research Goals and Technologies

Our research goal is to build an online health analytics infrastructure that supports the various services of the healthcare ecosystem

Challenges
- Integrating massive volumes of disparate data
- Need for sophisticated analytics
- Growing collaboration across ecosystem
- Integrity of data and tracking the analytic data
- Timely feedback
- Large number of concurrent users
- Privacy and security

Architectural Goals
- Extensible data model, scalable infrastructure
- Open and scalable analysis framework
- Enterprise sharing of sensor data
- Quality of Events (QoE) and provenance support for backtracking and auditing
- Real-time processing
- Scalable device and user management
- Protection of personal medical data

Technologies & Approaches
- Extensible Stream Storage System
- Distributed Stream Analysis Runtime
- Quality of Event
- Hybrid Provenance System
- Interoperability
- Privacy & Security Technology
- Stream Processing Platform

Analysis Framework (1): The Approach

Current Solutions

Vertically designed systems for the analysis of health monitoring data exist today
- Closed, custom-built for a specific diagnosis
- Non-extensible; support for limited sensor devices
- Small patient footprint; only good for trials

New Requirements

Extensible
- To new data sources (e.g., EMG, EEG, accelerometer, activity)
- To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)

Scalability
- Handles large numbers of patients

Robustness
- Resilient to sensor or infrastructure failures
  - Exacerbated in remote monitoring settings

Easy to Use
- Separate medical domain knowledge from IT knowledge

Our Innovations

Framework built on top of IBM System S
- State-of-the-art middleware for high-volume stream processing analytics

Programming model centered around the System S concept of a Processing Element (PE)
- PEs represent atomic stream transformers
- Stream data flow with pub-sub semantics

Century also provides useful medical analytic services
- Persistence of state (e.g., analyze HR variations over 6 months)
- Management of patient data (e.g., patient's intermediate diagnosis)
- Computation and rebinding based on the quality of events from medical streams

Century supports an open set of data sources
- Currently supporting ECG, BP, Glucose, Weight, SpO2, Temperature

[Figure: analysis jobs drawn as graphs of analysis components (PEs) fed by Data Source and Source PE nodes. Component labels: SPA, SPM, DFE, PSD, INQ, GUI, NES, ES, IPN, VBF, "Mark", DUP, DUPP, WLG, SGD, PCP, JAE, DSN, N(CE)S^2, SCF, JoinCC, PFA, SSF, JoinSC, SLF, JoinL.]

Major Practical Benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)

Analysis Framework (2): PE Example

- PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
- System S routes each incoming SDO to the appropriate PE
- Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs

[Figure: example PE pipeline. Input ECG, Normalization, Fast Fourier Transform, AR PE (neural network trained during initialization), Alert Generator, Output Alert]
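The division of labor described above can be sketched as follows. This is not the System S API: `SDO`, `ProcessingElement`, and `run_pipeline` are hypothetical stand-ins, and a mean-energy threshold replaces the FFT and neural-network stages of the slide's pipeline to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class SDO:
    """Stream Data Object: a payload plus optional annotations."""
    stream: str
    payload: dict
    annotations: dict = field(default_factory=dict)


class ProcessingElement:
    """Hypothetical base class. The runtime calls process();
    a subclass contains only analysis logic."""

    def process(self, sdo):
        raise NotImplementedError


class Normalize(ProcessingElement):
    def process(self, sdo):
        samples = sdo.payload["samples"]
        peak = max(abs(x) for x in samples) or 1.0
        return SDO("ecg.normalized", {"samples": [x / peak for x in samples]})


class AlertGenerator(ProcessingElement):
    def __init__(self, threshold):
        self.threshold = threshold

    def process(self, sdo):
        samples = sdo.payload["samples"]
        energy = sum(x * x for x in samples) / len(samples)
        if energy > self.threshold:
            return SDO("alerts", {"energy": energy})
        return None  # nothing to publish


def run_pipeline(pes, sdo):
    """Stand-in for the middleware's routing between PEs."""
    for pe in pes:
        if sdo is None:
            return None
        sdo = pe.process(sdo)
    return sdo


if __name__ == "__main__":
    pipeline = [Normalize(), AlertGenerator(threshold=0.5)]
    print(run_pipeline(pipeline, SDO("ecg", {"samples": [2.0, -2.0, 2.0]})))
```

Neither PE class contains any connection or routing code, which is the separation the slide describes.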

Century Event Management Service

[Figure: Century server infrastructure. Component labels:
- USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception
- Toolkit: IDE for Application Developer, Sensor Simulator, Event Management Client Library
- Portals: Patient Portal, Administrator Portal, Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC)
- Portal Service: Patient Data Manager, Service Data Management, User Group Data Manager, Device Data Manager
- Analysis Framework: Job/Application Manager, Event Filter, Annotator, Analysis Applications
- Platform Service: QoE Manager, State Manager, Group Manager, Device Catalogue, Authentication & Authorization, Privacy Manager
- Subscription Service, Event Store Query Service, Provenance Query Service, Group Management Service
- Event Management Service: Event Storage Manager, Event Preprocessor, Remote Access Manager
- Provenance Service: Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver
- Storages: Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy
- Interoperability Container, IHE Adapter, Solution Delivery Services]

Event Management Service (1)

The Problem
- Persistence of large amounts of streaming data
- Efficient query mechanisms for stakeholders

Requirements
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language

Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure labels: Annotator, time 1 2 3, XML]

Event Management Service (3): Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate

Let μ be the service rate of the database
Let λ be the arrival rate at the database
Let ρ = λ/μ

[Figure: arrival rate over time, showing the bursty traffic, the average arrival rate, the average database service rate, and the instability region in which the arrival rate exceeds the service rate. Other labels: Annotator, time 1 2 3, XML.]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007
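The role of ρ = λ/μ can be made concrete with a short calculation. The slide assumes Poisson arrivals; the sketch additionally assumes exponentially distributed service times (an M/M/1 queue), which the slide does not state, in order to use the standard result that the mean number of events in the system is ρ/(1 - ρ). Function names are illustrative.

```python
def utilization(arrival_rate, service_rate):
    """rho = lambda / mu."""
    return arrival_rate / service_rate


def is_stable(arrival_rate, service_rate):
    """The database keeps up only while rho < 1."""
    return utilization(arrival_rate, service_rate) < 1.0


def mean_events_in_system(arrival_rate, service_rate):
    """M/M/1 steady-state mean number in system, rho / (1 - rho).
    Assumes exponential service times, which is not stated on the slide."""
    rho = utilization(arrival_rate, service_rate)
    if rho >= 1.0:
        return float("inf")  # instability region: the backlog grows without bound
    return rho / (1.0 - rho)


if __name__ == "__main__":
    for lam in (50, 80, 95, 120):
        print(lam, utilization(lam, 100), mean_events_in_system(lam, 100))
```

The backlog grows sharply as ρ approaches 1, so bursts that push the instantaneous arrival rate above μ need to be absorbed before they reach the database, which is the job of the stream buffer.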

Event Management Service (4): The Stream Buffer

Approach
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database

[Figure: an analysis graph of PEs (PE1 to PE5, SPE) issues storage requests (events of type "eventStore") on the way towards the delivery system. Labels: Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, Database PE, System S STG (File System Store), Event Store (Relational-XML]
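A minimal sketch of the throttling idea: storage requests are queued, and at most a fixed budget of events is released to the database per tick, so the offered load stays below the service rate. The fixed budget with a headroom factor is an assumption made for brevity; the slide's design uses a traffic predictor and a load monitor instead. All names are hypothetical.

```python
from collections import deque


class StreamBuffer:
    """Queue storage requests and release them below the DB service rate."""

    def __init__(self, db_service_rate, headroom=0.9):
        self.queue = deque()
        # Events released per tick; headroom keeps utilization below 1.
        self.budget = int(db_service_rate * headroom)

    def offer(self, events):
        """Accept a burst of storage requests."""
        self.queue.extend(events)

    def drain_tick(self):
        """Release at most `budget` events to the database for this tick."""
        n = min(self.budget, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]


if __name__ == "__main__":
    buf = StreamBuffer(db_service_rate=10, headroom=0.8)
    buf.offer(range(20))          # a burst of 20 events
    while buf.queue:
        print(buf.drain_tick())   # at most 8 per tick
```

The burst is spread over several ticks at the cost of added storage latency, which is acceptable as long as the average arrival rate stays below the service rate.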

Century Provenance Service

[Figure: the Century server infrastructure diagram, repeated with the same component labels as on the Century Event Management Service slide.]

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).

Provenance (1): Two Approaches to Provenance

What is process provenance?
- The ability to retrace which "system components" were on the data path

What is data provenance?
- The ability to retrace which "data elements" led to the generation of output data

[Figure: sensors feed PE1, PE2, and PEj (running algorithms f1, f2, f3), which report to a Provenance Server/Store. Process provenance annotations: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." and "I recvd some data on port 2. I ran algorithm f3 on the data at time t1." Data provenance annotation: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]

Provenance (2): Prior Work on Provenance

[Figure: prior systems positioned by event and processing rates (horizontal axis) and by type of provenance reconstruction, process provenance or data provenance (vertical axis).]

- EU Provenance (PASOA, PreServ): captures workflow bindings
- Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
- Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
- Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level
- File Systems (PASS, LFS): captures system calls and modifications to file records; annotation per file
- Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput

Provenance Challenges Unique to Streams

- Storage Efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
- Processing Efficiency: generation of timestamps and large amounts of metadata per stream event will reduce system throughput
- Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
- Dynamic Stream Bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model

We need to support both
- Process provenance: show me the set of components that are involved in the generation of alert Ei
- Data provenance: show me the stream segments related to the generation of alert Ei

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
- The dependency is thus not specified on every data element, but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
- The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ s_i^{out}(t) \;\Leftarrow\; \bigcup_{j \,:\, s_j \text{ is input to Component } i} \left\{\, s_j^{inp}(t_k) \;:\; t_k \in [\,t - \Delta_j,\; t\,] \,\right\} $$

[Figure: a Processing Element with input stream Sj (Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), and output stream Si (Si(t), Si(t-1), Si(t-2)).]

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID    PE State    Dependency Rule
1                "gym"       Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2                "home"      Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)

The dependency equation can, however, vary based on
- The PE's internal state or some external context, for example:
  IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
  ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)
- Solution is to allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data
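The two rule blocks on this slide can be expanded mechanically into the input timestamps an output depends on. The sketch assumes integer timestamps and closed windows, so that Sj(t-3, t-2) means the samples at t-3 and t-2; the slide does not spell this out. The data layout and function name are hypothetical.

```python
# For each PE state: per input stream, a list of windows written as
# (steps back to window start, steps back to window end), relative to t.
RULE_BLOCKS = {
    "gym": {"Sj": [(0, 0), (3, 2)], "Sk": [(2, 1)]},   # rule block 1
    "home": {"Sj": [(2, 0)], "Sk": [(1, 0)]},          # rule block 2
}


def causative_timestamps(state, t):
    """Expand the dependency rule for `state` into the input timestamps
    that output Ei(t) depends on, per input stream."""
    result = {}
    for stream, windows in RULE_BLOCKS[state].items():
        stamps = set()
        for back_from, back_to in windows:
            stamps.update(range(t - back_from, t - back_to + 1))
        result[stream] = sorted(stamps)
    return result


if __name__ == "__main__":
    print(causative_timestamps("gym", 10))
    print(causative_timestamps("home", 10))
```

Only the state needs to be stored with each output. The windows themselves are stored once per rule block, which is the source of the model's storage savings.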

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to the corresponding query
- For efficiency, the relevant metadata is distributed across multiple DBs
- Tradeoff: faster provenance storage processing for higher provenance reconstruction overhead

What is stored (Stream and Port Mapping, SAPM)
1. Store bindings between every input/output port and associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every (output port, input port)

Resolving Query(Ei) at the Provenance Query Server
1. Obtain I/O stream mappings: determine the IDs of the input ports and the causative stream bindings
2. Retrieve the dependency equation
3. Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
4. Apply the equation (TVC filter) to determine the set of causative elements, and return that set

[Figure: example table contents.
PC Dependency Table: PEj, f1(), Po1
Dynamic Stream Mapping Table: t=10, S1, Pi1; t=15, S15, I4; t=16, S17, O6
Data Store: element45, ts=15, SID=12, state="gym"; element66, ts=16, SID=17, state="home"]

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007
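The four resolution steps can be sketched over in-memory tables. The table layouts, field names, and sample data below are hypothetical and much simpler than a real deployment, where the metadata is spread over several databases. The dependency rule here is purely a time window, so steps 3 and 4 reduce to a single filter.

```python
def streams_bound_at(bindings, pe, t):
    """Step 1: latest stream bound to each input port of `pe` at time t.
    bindings: list of (time, stream_id, pe, port) binding events."""
    latest = {}
    for ts, stream_id, owner, port in sorted(bindings):
        if owner == pe and ts <= t:
            latest[port] = stream_id
    return latest


def resolve(pe, t, state, bindings, dependency, store):
    """Return the IDs of the stored elements that caused the output at time t."""
    ports = streams_bound_at(bindings, pe, t)   # step 1: I/O stream mappings
    windows = dependency[(pe, state)]           # step 2: dependency equation
    causative = []
    for port, (back_from, back_to) in windows.items():
        sid = ports[port]
        lo, hi = t - back_from, t - back_to
        # Steps 3 and 4: retrieve elements of the bound stream and keep
        # those inside the rule's time window.
        causative += [e["id"] for e in store
                      if e["sid"] == sid and lo <= e["ts"] <= hi]
    return sorted(causative)


# Hypothetical sample data.
BINDINGS = [(10, "S1", "PEj", "in1"), (12, "S9", "PEj", "in1"), (15, "S15", "PEj", "in2")]
DEPENDENCY = {("PEj", "gym"): {"in1": (2, 0), "in2": (1, 1)}}
STORE = [
    {"id": "e1", "sid": "S9", "ts": 14},
    {"id": "e2", "sid": "S9", "ts": 13},
    {"id": "e3", "sid": "S15", "ts": 15},
    {"id": "e4", "sid": "S1", "ts": 15},
    {"id": "e5", "sid": "S15", "ts": 16},
]

if __name__ == "__main__":
    print(resolve("PEj", 16, "gym", BINDINGS, DEPENDENCY, STORE))
```

Because bindings are time-stamped, the rebinding of port in1 from S1 to S9 at t=12 is honored: element e4 on the old stream is not reported as a cause.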

Key Research Benefits of TVC Model for Provenance

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
- Storage efficiency: only minimal metadata per data element, functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
- Statefulness: time and value functions allow compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

Other provenance systems cannot handle O(100 Kevents/sec) system throughput

Quality of Events in Century

[Figure: the Century server infrastructure diagram, repeated with the same component labels as on the Century Event Management Service slide.]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
- Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
- Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
- Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE graph producing an ALERT at time t from input elements with assorted timestamps. Callouts: "What generated this alert and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (replace sensor); "Stream elements are missing (possibly due to network issues)".]

IBM Research

copy 2006 IBM CorporationIBM Research

Century QoE Some Basic QoE Measures

t-3 t-5t-4t-2t-1t

PE

t-5t-2t The current

input

The input that

was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing

t

PE

t-12t-11t-10

t-5t-2t

t

Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync

[Figure: a PE whose current input is the elements t-5, t-4, t-3, t-2, t-1, t, with only a small subset marked as the "useful" input.]

Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
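The third measure can be sketched in Python as the ratio of useful to consumed elements. The function name `useful_fraction` is hypothetical, and the example data (one useful element out of six consumed) is an assumed reading of the figure rather than a value stated in the slides.

```python
def useful_fraction(consumed, useful):
    """Fraction of consumed elements that actually contributed to the output."""
    consumed = set(consumed)
    if not consumed:
        return 0.0  # no input consumed, so no evidence supports the output
    return len(consumed & set(useful)) / len(consumed)


t = 100  # arbitrary "current" timestamp, chosen only for the example
consumed = [t - 5, t - 4, t - 3, t - 2, t - 1, t]
useful = [t]  # assumed: only the newest element triggered the output

score = useful_fraction(consumed, useful)
print(f"{score:.2f}")
```

A low score suggests that the output rests on little evidence, which is the "circumstantial" case described above.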


Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
– Many environments (e.g. retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2 and BP monitors

Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of the analysis components can be stored

Better techniques for establishing provenance relationships
– User-specified models may not always be available; learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of the analysis logic

Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history

How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?

How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?

How to develop hybrid provenance that combines model-based and replay-based approaches?

How to define context-dependent access-control models for provenance data in a multi-provider environment?


Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay.

The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission

The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring: The Business Motivation
  • Information-based Medicine: The Remote Monitoring Roadmap
  • Ubiquitous computing technologies, Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring: The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring): Overview
  • HARMONI Architecture: Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth: Idealized Context
  • Result 2: Impact of Personalization on Event Filtering
  • HARMONI in Practice: Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1): The Approach
  • Analysis Framework (2): PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3): Hybrid Storage Model
  • Event Management Service (4): The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1): Two Approaches to Provenance
  • Provenance (2): Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE: Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

IBM Research

copy 2006 IBM Corporation

Century

Contents of Talk

Motivation and Challenges for Remote Monitoring and AnalyticsHarmoni Context-based Event Processing on the Mobile HubCentury Technologies Stream Storage Provenance and

QoE

Server

Server

Harmoni

IBM Research

copy 2006 IBM Corporation

Chapter 1

The Motivation and Challenges for Remote Monitoring and Analytics

HarmoniHarmoni Context Context--based Event based Event Processing on the Mobile HubProcessing on the Mobile Hub

Century Technologies Stream Storage Century Technologies Stream Storage Provenance and Provenance and QoEQoE

IBM Research

copy 2006 IBM Corporation

Remote Health Monitoring The Business Motivation

The United States spends $19 trillion on healthcare or more than 16 of its GDPMore than 133 million Americans live with chronic illnessesndash Chronic diseases account for 70 of all deaths in

the United States

ndash The medical care costs of people with chronic diseases account for more than 75 of the nationrsquos $14 trillion medical care costs

ndash Chronic diseases require long-term management

By 2010 the US will experience the most citizens in history age 65 or over

ndash 200000 Doctor Deficit by the year 2010

ndash Huge growth forecast in managed care residences

bull 45 million Americans (15 of population) will be 65 years or older by 2015

bull About 4 million long-term care beds in US as of 2003

IBM Research

copy 2007 IBM Corporation

Personalized Health Care

Remote Monitoring

Traditional HCPatient Reported Data

Episodic Treatment Electronic Health Records Information Augmented

Chronic Disease Mgmt

Clinical Trial Data Collection

In-Pt Automated Vitals

Rules Based Clinical Response

Pre-symptomatic Treatment

Lifetime Treatment

Evolutionary Practices

Rev

olut

iona

ryTe

chno

logy

Automated Systems

Non-specific (Treat Symptoms)

InformationCorrelation

1st Generation Diagnosis

Organized(Error Reduction)

Personalized(Disease Prevention)

Thro

ughp

ut A

naly

tics

Data and Systems Integration

Information-based Medicine The Remote Monitoring Roadmap

CDI

Source Kathy Schweda

Clinical Decision Support

Century

IBM Research

copy 2006 IBM Corporation

Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healthcare ecosystem

Research Background From Today To 2015

Care DeliveryPatients

HealthStatus Setting

Socio-economic

StatusCatchments

Area Access Location ServiceProvider

Healthy

Minor Ailments

At Risk

Acutely Ill

Chronically Ill

Catastrophic-ally Ill

Rural

Suburban

Urban

High

Medium

Low

In Person

Telephonic

Electronic

Home

Outpatient Setting

HospitalEmergency Departmen

tLong Term

Care

Internet

Call Center

WellnessRisk

Assessment

Prevention

Acute Care

Chronic Care

Complement-ary Care

Traditional ProvidersPublicPriv

ate InsurersAlternate ProvidersMidlevel ProviderHealth

Infomediary

Local

Regional

National

International

Patients Care Delivery

Age Group SettingSocio-

economicStatus

Access Location Provider Service

Infants

Adolescent

Adult Men

Adult WomenSenior Men

Senior Women

Rural

Suburban

Urban

High

Medium

Low

In Person

Telephonic

Electronic

Home

Outpatient Setting

HospitalEmergency Departmen

tLong Term

Care

Internet

Call Center

Risk Assessme

ntPrevention

Acute -Diagnosis

Acute -TreatmentChronic -DiagnosisChronic -Treatment

Traditional ProvidersPublicPriv

ate InsurersAlternate ProvidersMidlevel ProviderHealth

Infomediary

CatchmentsArea

Local

Regional

National

International

Ubiquitous ComputingTechnologies

- Emergence of sensor devices that generate more complex events at higher rates (ex) ETRIrsquos blood test sensor

SNUrsquos non-intrusive ECG SensorMultichannel ECG

- Low power wireless communication(ex) Bluetooth ZigBee

Patient Centric Network(Interoperability)

- IHE Continua

Healthcare AnalyticsResearch

- Techniques to monitor effects of drugof patients

- Biomedical trend analysis- Preventive analysis

IT Infrastructure

IBM Research

copy 2006 IBM Corporation

Remote Health Monitoring The Opportunity

Long-term monitoring offers benefitsndash Early disease detection and trend analysis for healthy and at-risk individuals ndash Treatment and progress monitoring for patientsndash Participants in drug trials or experimental treatments to gauge efficacy and side

effectsndash Reduced workload on doctors nurses and other healthcare providers

Enabled by rapid improvements in two key technologiesndash Improvements in wireless communications (WiFi 3G Bluetooth)ndash Continuing miniaturization of wireless sensors

Server

Patient Diary

BT

Data

IBM Research

copy 2006 IBM Corporation

Chapter 2

The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics

Harmoni Context-based Event Processing on the Mobile Hub

Century Technologies Stream Storage Century Technologies Stream Storage Provenance and Provenance and QoEQoE

IBM Research

copy 2006 IBM Corporation

Harmoni (Healthcare Adaptive Remote Monitoring) Overview

PAN (Bluetooth)WAN

(GPRS)Data

Type of Sensor Device

Bits sensor sample

Channels device

Raw data rate

(KBday)

GPS 1408 1 14850

SpO2 3000 1 94922

EKG (cardiac) 12 6 194400

Accelero-meter 64 3 202500

EEG (brain) 12 12 388800

EMG (muscle) 12 6 777600

1 Context-Aware Event Filtering

HARMONI Novel Features

IF (user in lsquogymrsquo amp 90lt hrlt120)

THEN send AVG(hr) min

2 Personalization of Rules Apply Machine Learning at Server to learn ldquoNormal for user A gym= 110150)

3 Anticipation-Driven Transmission Based on past Prob(80211 within 2 hrs) gt 07 cache data

Need to reduce

uplink transmission

ratespower

IBM Research

copy 2006 IBM Corporation

HARMONI Architecture Key Components on Mobile Device

Lightweight RuleEvent processing enginendash Identifies appropriate temporal patterns in sensor data stream(s) and

consequent actionbull eg if 60ltAVG(last 10 heart rate values) lt90 then ldquotransmit AVG to serverrdquo

ndash Processed events themselves act as predicates for new rules

Rule Managerndash Coordinates with server to determine patterns of current interest and

consequent actionndash RM populates or modifies the rules in the event engine

Intelligent Data Transmissionndash Compresses the filtered data (from event engine) for transmission to server

Anticipation Mechanismndash Schedules the transmission of (compressed) data based on predicted

availability of network connections and incoming sensor data rates

IBM Research

copy 2006 IBM Corporation

Mobile Device Remote ServerSensor

Short-RangeWireless Link

Wireless Linkto the Internet

DB

PatternRecognition

Engine

PatternLearningEngine

External Context Sources

External Rule

Specifications

304

External Action

Specifications

ExternalAction

TriggeringMechanism

Data CollectionData Adapters

Light-Weight PatternRecognition

Engine

UserInterface

ActionTriggering

Mechanism

Intelligent Data Transmission

AnticipationMechanism

Context

SensorReadings

SensorControl

DeviceResources

Data Processing

Rule Manager

TAPAS

Rule Server

HARMONI Functional Architecture

Simple recognition of pre-specified temporalpatterns across sensor streams

Event engine driven by Deterministic Finite Automata (DFAs)

Actions associated with rules include data transmission data transformation and local triggers (alarms reminders)

Compressed transmission ro server when interface is available AND ldquoanticipationrdquo indicates ldquoOK-to-Sendrdquo

Use of intelligent compression schemes to reduce the volume of traffic

Store-and-forward otherwise

Data stored in memory buffer and on FLASH storage

Rule Manager coordinates with backend server to download currently active or applicable rulesCan run activate or deactivate multiple rules

simultaneously and dynamicallyDynamic downloading of rules provides

implicit ldquocontext awarenessrdquo at client

IBM Research

copy 2006 IBM Corporation

Pattern Recognition Engine Implementation

Rules are specified using EventScript which is similar to regular expressions A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA) We implemented a run-time engine on the mobile device that takes input events and ldquoexecutes DFA specificationsEffectively we are running finite state machines with some tweaksndash Unique memory heap for each DFA

accessible across statesndash Run-time LISPScheme interpreter allows

arbitrary instructions to be carried out upon initialization and state transitions in the DFA

Compiler

Rule Server

EventScript

DFA

Rule Manager

Desktop

770

Remote Server

Event

Engine

Context-Based Filtering

IBM Research

copy 2006 IBM Corporation

HARMONI Implementation PlatformNokia 770 Internet tablet N800 ndash ARM processor Linux-basedndash High-resolution display(800x480) touch screen with up to 65536 colors ndash 64-128 MB RAM 64 MB FLASH storage (expandable up to 1GB hellip can

be used for virtual memory)ndash Built-in Bluetooth (BlueZ stack) and 80211 interfacesndash Relatively cheap $350ndash httpwwwmaemoorg provides open-source software and development

environmentndash Code compiled on an IntelDebian Linux 31 box using cross-compiler

(httpwwwscratchboxorg)

Nonin Model 4100 Sp02heart rate monitorndash Provides Heart rate and Oxygen saturationndash Supports Bluetooth Serial Port Profile (SPP)ndash 120 hours of continuous operation with 2 AA batteriesndash Three packets transmitted per second where each packet is 375 bytes

WiTilt 3-axis Accelerometerndash Output baud of 576 Kbpsndash 40 mA consumption when operating

Delrone Earthmate GPSI Mohomed A Misra W Jerome M Ebling and A Misra HARMONI Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring to appear Percom2008

Context-Based Filtering

IBM Research

copy 2006 IBM Corporation

Impact of HARMONI on Transmission Bandwidth Idealized Context

Data Generated From Sensor

0

50

100

150

200

1 1049 2097 3145 4193 5241 6289 7337 8385 9433

Sample

Hear

t Rat

e (b

pm)

Compression (S2) by itself results in gt 50 reduction in bandwidth consumption

Filtering (S3) results in 85 reduction in bandwidth consumption

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) not significant due to lack of exact pattern match of floating point values in LZ

7030

2906

4890

1959

2964

17021454

2204

0

10

20

30

40

50

60

70

80

None Office Gym OfficeGym

Context Assumed

Dat

a Tr

ansm

itted

Ove

r Net

wor

k (in

KB

)

Uncompressed With LZ-78 Compression

Context-Based Filtering

IBM Research

copy 2006 IBM Corporation

Result 2 Impact of Personalization on Event Filtering

Variation in Heart Rate Readings for Different Individuals

0

50

100

150

200

Time

Hea

rt Ra

te

(bea

ts p

er m

inut

e)

9510

6500

3494

23022018 1902

1184 926

0

10

20

30

40

50

60

70

80

90

100

User 2-1 User 2-2

Dat

a Tr

ansm

itted

Ove

r Net

wor

k (in

KB

)

Uncompressed

Compressed

UncompressedGeneric Context-filtering

Uncompressed Personalized Context filtering

Sensor streams exhibit significant statistical variation across individuals

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) not significant due to lack of exact pattern match of floating point values in LZ compressor

Context-Based Filtering

IBM Research

copy 2006 IBM Corporation

HARMONI in Practice Sensor-based Context

Accelerometer amplitude can be used to classify user into 3 different states sitting walking runningndash Higher lsquonormalrsquo threshold

for 3 distinct states (0-70 70-110 110-170) across all users

ndash Aside accelerometer readings used to classify lsquofallsrsquo for elderly patients

Bandwidth savings between 26-73 for our sample population

Context-Based Filtering

IBM Research

copy 2006 IBM Corporation

Chapter 3

The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics

HarmoniHarmoni Context Context--based Event based Event Processing on the Mobile HubProcessing on the Mobile Hub

Century Technologies Stream Storage Provenance and QoE

IBM Research

copy 2006 IBM Corporation

Century Research Goals and TechnologiesOur research goal is to make Online Health Analytic Infrastructure for healthcare ecosystem to support the various services

Challenges Architectural Goals Technologies amp Approaches

Integrating massive volumes of disparate data

Extensible Data ModelScalable Infrastructure

Need for sophisticated analytics

Growing collaboration across ecosystem

Integrity of data ampTracking the analytic data

Timely feedback

Open and scalable analysis framework

Enterprise sharing of sensor data

Quality of Events (QoE) and provenance support for backtracking amp auditing

Real-time Processing

Large number of concurrent users

Privacy and Security

Scalable device and user management

Protection of personal medical data

Extensible Stream Storage System

Distributed Stream Analysis Runtime

Quality Of Event

Hybrid Provenance System

Interoperability

Privacy amp Security Technology

Stream Processing Platform

IBM Research

copy 2007 IBM Corporation

Extensiblendash To new data sources (eg EMG EEG accelerometer Activity)ndash To new analysis algorithms (eg myocardia polyneuropathy Alzheimer)

Scalabilityndash Handles large numbers of patients

Robustnessndash Resilient to sensor or infrastructure failures

bull Exacerbated in remote monitoring settings

Easy to Usendash Separate medical domain knowledge from IT knowledge

Framework built on top of IBM System Sndash state of the art middleware for high-volume stream processing analytics

Programming model centered around System S concept of Processing Element (PE)

ndash PEs represent atomic stream transformersndash Stream data flow with pub-sub semantics

Century also provides useful medical analytic services

ndash Persistence of state (eg analyze HR variations over 6 months)ndash Management of patient data (eg patientrsquos intermediate diagnosis)ndash Computation and rebinding based on the quality of events from medical streams

Century supports an open set of data sourcesndash Currently supporting ECG BP Glucose Weight SPO2 Temperature

Vertically designed systems for the analysis of health monitoring data exist today

ndash Closed custom-built for specific diagnosisndash Non-extensible Support for limited sensor devicesndash Small patient footprint only good for trials

Analysis Framework (1) The Approach

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

IPN

NES

DUP

PSDDFESPASPM

VBFSPA

SPM DFE PSDDataSource

VBF

Source PEWLG SGD PCP JAE DSN

N(CE)S^2

DUPP

IPN

SCF

JoinCCIPN

N(CE)S^2

PFA IPN

SSFJoinSC IPN

N(CE)S^2

SLF

JoinL

IPN

NES

DUP

IPN

ES IPN

JoinL IPN

DataSource

Source PEWLG

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

Analysis Component

AnalysisJob

Major Practical BenefitSeparate PE developerrsquos job (medical domain knowledge) from infrastructure issues(scalability fault tolerance stream connections etc)

Current Solutions New Requirements Our Innovations

IBM Research

copy 2007 IBM Corporation

Analysis Framework (2) PE Example

bull PE developers need not to be concerned with the communication of Stream Data Objectsbull When an SDO is received by the PE System S routes the SDO to the appropriate PE bull Developers only need to code the logic of the algorithms used by the PE to annotate modify

or create new SDOs

Normalization

Fast FourierTransform

Input ECG

AR PE

AlertGenerator

Output Alert

Neural Network Trained during

initialization

IBM Research

copy 2007 IBM Corporation

Century Event Management Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2007 IBM Corporation

Event Management Service (1)

Annotator

time 1 2 3

XML

Annotator

time 1 2 3

Not extensibleLimited scalability

Fully extensibleBut does not scale to

high event rate

The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders

Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language

XML

XML

XML

Potential Approaches

1 Relational DB (Annotation table)

2 Relational-XML DB (XML-based Annotation data model)

IBM Research

copy 2007 IBM Corporation

Event Management Service (3) Hybrid Storage Model

Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate

time

arrival rateaverage arrival rate

traffic average databaseservice rate

Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ

UnstabilityRegion

Annotator

time 1 2 3

3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Stream Buffer How do we designthe stream buffer

XMLXML

XML

XML

D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007

IBM Research

copy 2007 IBM Corporation

Event Management Service (4) The Stream Buffer

System S STG(File System Store)

Traffic Monitor

Storage Requests(events of type ldquoeventStorerdquo)

DatabasePE

Traffic Models Traffic

Predictor Event Store Load Monitor

Storage Requests

PE1

PE2

PE3

PE4

PE5SPE

Analysis Graph

Towards the delivery system

Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database

EventStore

(Relational-XML

IBM Research

copy 2007 IBM Corporation

Century Provenance Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM Corporation26

An Example of Data Provenance Use

Urgent AlertPatient Doe John

Condition Abnormal ReactionRecommendation

The Century System has found a developing issue in your patientrsquos

medical condition The condition of the patient is deteriorating A known

side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day

Ultimately the physician is responsible for medical decisions and any actions taken In order

for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition

With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe

A few days later Dr Lee receives an alert from the Century prescription program

The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)

Thatrsquos unusual and a large decrease Before agreeing to

this change I need to understand on what basis the

system has made this recommendation and how

accurate it is

IBM Research

copy 2006 IBM Corporation

Provenance (1) Two Approaches to Provenance

f1

f2

f3

Provenance ServerStore

I ran algorithm f1 with parameters uv and w at time t0 I published on port 1

PE1

PE2

I recvd some data on port 2 I ran algorithm f3 on the data at time t1

What is process provenancebull The ability to retrace

which ldquosystem componentsrdquowere on the data path

What is data provenancebull The ability to retrace

which ldquodata elementsrdquo ledto generation of output data

PEj

Sensor

Sensor

Data2

DataOut4

Data1 Data2

I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2

Data1

Data1

IBM Research

copy 2006 IBM Corporation

Provenance (2) Prior Work on Provenance

EU Provenance (PASOA PreServ))

Captures Workflow bindings

Scientific SOA Workflows (KARMA)Workflow and application binding notifications

Stateless application-defined data provenance

Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL

Data dependencies specified statically for relations

Event and Processing Rates

Stream Provenance (Simhan)

Store temporal history of stream connectors as a per-stream stack

System-level automatic metadata collection at stream-levelTy

pe o

f Pr

oven

ance

Rec

onst

ruct

ion

Proc

ess

prov

enan

ceD

ata

prov

enan

ce

File Systems PASS LFSCaptures system calls and

modifications to file recordsAnnotation per file

Medical Stream Provenance Arbitrary logic for invidual PEs (

statefulness and long memory) Dynamic stream bindings

High proessingthroughput

Century

IBM Research

copy 2006 IBM Corporation

Provenance Challenges Unique to Streams

Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation

is very heavyweight

Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput

Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data

Dynamic Stream Bindings Streams may be ephemeral as a consequence the

transformation graph may vary with time

We need a provenance system that

Is reactivemdashgenerating provenance events only when things change

Does not require annotating large amounts of metadata per sensor sample

Can accommodate variations in the statefulness model

We need to support both

Process provenance Show me the set of components that are involved in the generation of alert Ei

Data provenance Show me stream segments related to the generation of alert Ei

IBM Research

copy 2006 IBM CorporationIBM Research

TVC Model of Hybrid Provenance

Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data

element but remains constant for the same processing logic

Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input

streams) within a designated interval Δ

( ) ( ) UComponent

tttouti

jjjki inps

input to is

ΔminusisinlArr

Processing Element

Si(t) Si(t-1) Si(t-2)

Output stream Si

Sj(t)Sj(t-1)

Sj(t-2)Sj(t-3)

Sk(t)Sk(t-1) Sk(t-2)

Sk(t-3)

Input stream Sj

Input stream Sk

Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2

Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)

ldquogymrdquo1

Dependency RulePE State

Rule Block ID

TVC Dependency Ruleblocks

State defined by Location Context

The dependency equation can however vary based on

ndash The PErsquos internal state or some external context

IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)

ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)

ndash Solution is to allow multiple functions each associated with a distinct range of states

ndash Store the state as metadata with the data

IBM Research

copy 2006 IBM CorporationIBM Research

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is

distributed across multiple DBsndash Tradeoff faster provenance storage

processing for higher provenance reconstruction overhead

Determine ID of input ports and causative stream bindings

Retrieve appropriate data elements within specific timesequence window specified by TVC Equation

Apply TVC filter to obtain causative set of elements

PC Dependency Table

PEj f1() Po1

Dynamic Stream Mapping Table

t=10 S1 Pi1t=15 S15 I4t=16 S17 O6

Data Store)

element45 ts=15 SID= 12 state=ldquogymrdquo

element66 ts=16 SID= 17 state=lsquohomersquo

Stream and Port Mapping (SAPM)

Provenance Query Server

1 Store bindings between every input output port and associated streams

2 Store creation modification times of PEs

3 Store TVC function f() for every output port input port

Query (Ei)

1 Obtain IO stream mappings

2 Retrieve Dependency equation

4 Apply equation to determine set of causativeelements

Return set of elements

3 Retrieve elements

M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007

IBM Research

copy 2006 IBM CorporationIBM Research

Key Research Benefits of TVC Model for Provenance

1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases

TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata

per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)

ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)

ndash Statefulness- time and value functions allow compact representation of long dependency windows

ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)

Other provenance systems cannot handle O(100 Keventssec) system throughput

IBM Research

copy 2006 IBM CorporationIBM Research

Quality of Events in Century

[Figure: Century server infrastructure component diagram. Components shown include the USN Gateway, Toolkit, Patient/Administrator/Stakeholder Portals, Portal Service, Analysis Framework (with QoE Manager, State Manager, and Group Manager), Event Management Service, Provenance Service, Platform Service, Interoperability Container (IHE Adapter), and Storages.]


Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
- Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
- Modification of analysis logic: resolution of analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
- Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality
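The dynamic-rebinding case can be illustrated with a small Python sketch. It assumes a quality score in [0, 1] for the SpO2-derived HR stream and uses two hypothetical thresholds (`low`, `high`) with hysteresis so the graph does not flap between sources; neither the score scale nor the hysteresis comes from the slides, and `choose_hr_source` is an illustrative name only.

```python
def choose_hr_source(spo2_quality, current, low=0.6, high=0.8):
    """Pick the HR source for the next interval.

    The cheaper SpO2 sensor is preferred; the graph rebinds to the more
    expensive ECG sensor only while SpO2 quality is poor. The thresholds are
    hypothetical: leave SpO2 below `low`, return to it at or above `high`.
    """
    if current == "spo2" and spo2_quality < low:
        return "ecg"
    if current == "ecg" and spo2_quality >= high:
        return "spo2"
    return current


def rebinding_trace(qualities, start="spo2"):
    """Apply the policy to a sequence of quality scores and return the bindings."""
    source, trace = start, []
    for q in qualities:
        source = choose_hr_source(q, source)
        trace.append(source)
    return trace


if __name__ == "__main__":
    print(rebinding_trace([0.9, 0.7, 0.5, 0.7, 0.85]))
```

The gap between the two thresholds is a design choice for the sketch: a quality score hovering around a single cutoff would otherwise trigger a rebinding on every sample.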

[Figure: a graph of PEs raising an ALERT at time t, annotated with the question "What generated this alert, and can it be trusted?" One sensor is faulty and generates stream elements of low quality (remedy: replace the sensor). On another input, stream elements are missing, possibly due to network issues.]


Century QoE: Some Basic QoE Measures

1. Expected vs. received input. By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: a PE emitting an output at t. The input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input is t-5, t-2, t.]

2. Timestamp correlation. Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: a PE emitting an output at t from two inputs, one carrying t-12, t-11, t-10 and the other carrying t-5, t-2, t.]

3. Consumed vs. useful input. Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial, due to the small fraction of useful input stream elements.
[Figure: a PE emitting an output at t. The current input is t-5, t-4, t-3, t-2, t-1, t; only a small part of it is marked as the "useful" input.]
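The three measures above can be written down as simple ratios and differences. The sketch below is one possible reading, not the Century QoE Manager: the function names are hypothetical, timestamps are plain integers, and "out of sync" is interpreted as the gap between the newest timestamps of two inputs, which is an assumption. The example values follow the slide's figures with t = 100.

```python
def completeness(expected_ts, received_ts):
    """Fraction of the expected input elements that actually arrived."""
    expected = set(expected_ts)
    if not expected:
        return 1.0
    return len(expected & set(received_ts)) / len(expected)


def sync_skew(input_a_ts, input_b_ts):
    """Gap between the newest timestamps of two inputs (0 means in sync)."""
    return abs(max(input_a_ts) - max(input_b_ts))


def usefulness(consumed_ts, useful_ts):
    """Fraction of the consumed input that actually triggered the output."""
    consumed = set(consumed_ts)
    if not consumed:
        return 0.0
    return len(consumed & set(useful_ts)) / len(consumed)


if __name__ == "__main__":
    t = 100
    # Expected t-5..t, but only t-5, t-2 and t were received.
    print(completeness(range(t - 5, t + 1), [t - 5, t - 2, t]))
    # One input stops at t-10 while the other reaches t.
    print(sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))
    # Six elements consumed; assume only the one at t was useful.
    print(usefulness(range(t - 5, t + 1), [t]))
```

How these scores map to a "low quality" label (for example, a cutoff on completeness) is left open here, as the slides do not specify one.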


Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
- Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
- HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors

Sensors have different time-varying QoE
- QoE determination should be bidirectional (feedback from medical professionals on quality of analysis results should affect QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
- User-specified models may not always be available: learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
- The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions:
- How to compute QoE of medical streams, and how to refine them based on QoE of other streams
- How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
- How to develop a combination of hybrid model-replay based provenance
- How to define context-dependent access-control models of provenance data in a multi-provider environment


Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay.

The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
- Key features include adaptive event filtering, personalization, and predictive data transmission

The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
- Key features include high-volume stream support, provenance, and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

Chapter 3

The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics

HarmoniHarmoni Context Context--based Event based Event Processing on the Mobile HubProcessing on the Mobile Hub

Century Technologies Stream Storage Provenance and QoE

IBM Research

copy 2006 IBM Corporation

Century Research Goals and TechnologiesOur research goal is to make Online Health Analytic Infrastructure for healthcare ecosystem to support the various services

Challenges Architectural Goals Technologies amp Approaches

Integrating massive volumes of disparate data

Extensible Data ModelScalable Infrastructure

Need for sophisticated analytics

Growing collaboration across ecosystem

Integrity of data ampTracking the analytic data

Timely feedback

Open and scalable analysis framework

Enterprise sharing of sensor data

Quality of Events (QoE) and provenance support for backtracking amp auditing

Real-time Processing

Large number of concurrent users

Privacy and Security

Scalable device and user management

Protection of personal medical data

Extensible Stream Storage System

Distributed Stream Analysis Runtime

Quality Of Event

Hybrid Provenance System

Interoperability

Privacy amp Security Technology

Stream Processing Platform

IBM Research

copy 2007 IBM Corporation

Extensiblendash To new data sources (eg EMG EEG accelerometer Activity)ndash To new analysis algorithms (eg myocardia polyneuropathy Alzheimer)

Scalabilityndash Handles large numbers of patients

Robustnessndash Resilient to sensor or infrastructure failures

bull Exacerbated in remote monitoring settings

Easy to Usendash Separate medical domain knowledge from IT knowledge

Framework built on top of IBM System Sndash state of the art middleware for high-volume stream processing analytics

Programming model centered around System S concept of Processing Element (PE)

ndash PEs represent atomic stream transformersndash Stream data flow with pub-sub semantics

Century also provides useful medical analytic services

ndash Persistence of state (eg analyze HR variations over 6 months)ndash Management of patient data (eg patientrsquos intermediate diagnosis)ndash Computation and rebinding based on the quality of events from medical streams

Century supports an open set of data sourcesndash Currently supporting ECG BP Glucose Weight SPO2 Temperature

Vertically designed systems for the analysis of health monitoring data exist today

ndash Closed custom-built for specific diagnosisndash Non-extensible Support for limited sensor devicesndash Small patient footprint only good for trials

Analysis Framework (1) The Approach

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

IPN

NES

DUP

PSDDFESPASPM

VBFSPA

SPM DFE PSDDataSource

VBF

Source PEWLG SGD PCP JAE DSN

N(CE)S^2

DUPP

IPN

SCF

JoinCCIPN

N(CE)S^2

PFA IPN

SSFJoinSC IPN

N(CE)S^2

SLF

JoinL

IPN

NES

DUP

IPN

ES IPN

JoinL IPN

DataSource

Source PEWLG

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

Analysis Component

AnalysisJob

Major Practical BenefitSeparate PE developerrsquos job (medical domain knowledge) from infrastructure issues(scalability fault tolerance stream connections etc)

Current Solutions New Requirements Our Innovations

IBM Research

copy 2007 IBM Corporation

Analysis Framework (2) PE Example

bull PE developers need not to be concerned with the communication of Stream Data Objectsbull When an SDO is received by the PE System S routes the SDO to the appropriate PE bull Developers only need to code the logic of the algorithms used by the PE to annotate modify

or create new SDOs

Normalization

Fast FourierTransform

Input ECG

AR PE

AlertGenerator

Output Alert

Neural Network Trained during

initialization

IBM Research

copy 2007 IBM Corporation

Century Event Management Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2007 IBM Corporation

Event Management Service (1)

Annotator

time 1 2 3

XML

Annotator

time 1 2 3

Not extensibleLimited scalability

Fully extensibleBut does not scale to

high event rate

The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders

Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language

XML

XML

XML

Potential Approaches

1 Relational DB (Annotation table)

2 Relational-XML DB (XML-based Annotation data model)

IBM Research

copy 2007 IBM Corporation

Event Management Service (3) Hybrid Storage Model

Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate

time

arrival rateaverage arrival rate

traffic average databaseservice rate

Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ

UnstabilityRegion

Annotator

time 1 2 3

3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Stream Buffer How do we designthe stream buffer

XMLXML

XML

XML

D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007

IBM Research

copy 2007 IBM Corporation

Event Management Service (4) The Stream Buffer

System S STG(File System Store)

Traffic Monitor

Storage Requests(events of type ldquoeventStorerdquo)

DatabasePE

Traffic Models Traffic

Predictor Event Store Load Monitor

Storage Requests

PE1

PE2

PE3

PE4

PE5SPE

Analysis Graph

Towards the delivery system

Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database

EventStore

(Relational-XML

IBM Research

copy 2007 IBM Corporation

Century Provenance Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM Corporation26

An Example of Data Provenance Use

Urgent AlertPatient Doe John

Condition Abnormal ReactionRecommendation

The Century System has found a developing issue in your patientrsquos

medical condition The condition of the patient is deteriorating A known

side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day

Ultimately the physician is responsible for medical decisions and any actions taken In order

for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition

With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe

A few days later Dr Lee receives an alert from the Century prescription program

The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)

Thatrsquos unusual and a large decrease Before agreeing to

this change I need to understand on what basis the

system has made this recommendation and how

accurate it is

IBM Research

copy 2006 IBM Corporation

Provenance (1) Two Approaches to Provenance

f1

f2

f3

Provenance ServerStore

I ran algorithm f1 with parameters uv and w at time t0 I published on port 1

PE1

PE2

I recvd some data on port 2 I ran algorithm f3 on the data at time t1

What is process provenancebull The ability to retrace

which ldquosystem componentsrdquowere on the data path

What is data provenancebull The ability to retrace

which ldquodata elementsrdquo ledto generation of output data

PEj

Sensor

Sensor

Data2

DataOut4

Data1 Data2

I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2

Data1

Data1

IBM Research

copy 2006 IBM Corporation

Provenance (2) Prior Work on Provenance

EU Provenance (PASOA PreServ))

Captures Workflow bindings

Scientific SOA Workflows (KARMA)Workflow and application binding notifications

Stateless application-defined data provenance

Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL

Data dependencies specified statically for relations

Event and Processing Rates

Stream Provenance (Simhan)

Store temporal history of stream connectors as a per-stream stack

System-level automatic metadata collection at stream-levelTy

pe o

f Pr

oven

ance

Rec

onst

ruct

ion

Proc

ess

prov

enan

ceD

ata

prov

enan

ce

File Systems PASS LFSCaptures system calls and

modifications to file recordsAnnotation per file

Medical Stream Provenance Arbitrary logic for invidual PEs (

statefulness and long memory) Dynamic stream bindings

High proessingthroughput

Century

IBM Research

copy 2006 IBM Corporation

Provenance Challenges Unique to Streams

Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation

is very heavyweight

Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput

Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data

Dynamic Stream Bindings Streams may be ephemeral as a consequence the

transformation graph may vary with time

We need a provenance system that

Is reactivemdashgenerating provenance events only when things change

Does not require annotating large amounts of metadata per sensor sample

Can accommodate variations in the statefulness model

We need to support both

Process provenance Show me the set of components that are involved in the generation of alert Ei

Data provenance Show me stream segments related to the generation of alert Ei

IBM Research

copy 2006 IBM CorporationIBM Research

TVC Model of Hybrid Provenance

Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data

element but remains constant for the same processing logic

Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input

streams) within a designated interval Δ

( ) ( ) UComponent

tttouti

jjjki inps

input to is

ΔminusisinlArr

Processing Element

Si(t) Si(t-1) Si(t-2)

Output stream Si

Sj(t)Sj(t-1)

Sj(t-2)Sj(t-3)

Sk(t)Sk(t-1) Sk(t-2)

Sk(t-3)

Input stream Sj

Input stream Sk

Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2

Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)

ldquogymrdquo1

Dependency RulePE State

Rule Block ID

TVC Dependency Ruleblocks

State defined by Location Context

The dependency equation can, however, vary based on

– The PE's internal state or some external context

IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140

ELSE generate ALERT iff AVG(HR(t-30, t)) > 90

– The solution is to allow multiple functions, each associated with a distinct range of states

– Store the state as metadata with the data
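The state-dependent rule blocks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `RuleBlock`, `RULE_BLOCKS` and `causative_samples` are hypothetical, and a term such as Sj(t-3, t-2) is read as the inclusive window of samples from t-3 to t-2.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A window is (stream_id, start_offset, end_offset): samples of that stream with
# timestamps in [t - start_offset, t - end_offset] are causes of the output at t.
Window = Tuple[str, int, int]


@dataclass(frozen=True)
class RuleBlock:
    block_id: int
    state: str
    windows: Tuple[Window, ...]


# Hypothetical encoding of the two rule blocks shown on the slide.
RULE_BLOCKS: Dict[str, RuleBlock] = {
    "gym": RuleBlock(1, "gym", (("Sj", 0, 0), ("Sj", 3, 2), ("Sk", 2, 1))),
    "home": RuleBlock(2, "home", (("Sj", 2, 0), ("Sk", 1, 0))),
}


def causative_samples(state: str, t: int) -> List[Tuple[str, int]]:
    """Expand the rule block for `state` into (stream, timestamp) pairs for output time t."""
    block = RULE_BLOCKS[state]
    pairs: List[Tuple[str, int]] = []
    for stream, start, end in block.windows:
        for ts in range(t - start, t - end + 1):
            if (stream, ts) not in pairs:
                pairs.append((stream, ts))
    return pairs
```

Because the rule block is stored once per state rather than once per sample, only the state label needs to travel with each data element.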


Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to the corresponding query

– For efficiency, the relevant metadata is distributed across multiple DBs

– Tradeoff: faster provenance storage processing for higher provenance reconstruction overhead

Determine the ID of input ports and causative stream bindings

Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation

Apply the TVC filter to obtain the causative set of elements

[Figure: resolving a provenance query. Tables shown: PC Dependency Table (example row: PEj, f1(), Po1); Dynamic Stream Mapping Table (example rows: t=10, S1, Pi1; t=15, S15, I4; t=16, S17, O6); Data Store (example rows: element45, ts=15, SID=12, state="gym"; element66, ts=16, SID=17, state="home").]

Stream and Port Mapping (SAPM)

1 Store bindings between every input/output port and associated streams

2 Store creation/modification times of PEs

3 Store TVC function f() for every output port/input port

Provenance Query Server: Query (Ei)

1 Obtain I/O stream mappings

2 Retrieve dependency equation

3 Retrieve elements

4 Apply equation to determine the set of causative elements

Return set of elements
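The four query steps can be sketched as follows. This is a minimal sketch with in-memory stand-ins for the stores; the table contents, the single input port `Pi1`, the window lengths and the function names are all hypothetical and chosen only to show how a stream rebinding affects the result.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Element(NamedTuple):
    element_id: str
    ts: int
    stream_id: str


# Hypothetical stream/port bindings: (time the binding took effect, stream id, input port).
STREAM_MAPPING: List[Tuple[int, str, str]] = [(0, "S12", "Pi1"), (16, "S17", "Pi1")]

# Hypothetical dependency table: (PE state, input port) -> window length before the alert time.
DEPENDENCY: Dict[Tuple[str, str], int] = {("gym", "Pi1"): 1, ("home", "Pi1"): 3}

# Hypothetical data store contents.
DATA_STORE: List[Element] = [
    Element("e1", 13, "S12"),
    Element("e2", 14, "S12"),
    Element("e3", 15, "S12"),
    Element("e4", 15, "S99"),  # a stream never bound to the port
    Element("e5", 16, "S17"),
    Element("e6", 16, "S12"),  # old stream, after the port was rebound
]


def stream_bound_to(port: str, ts: int) -> Optional[str]:
    """Step 1: find which stream was bound to `port` at time `ts`."""
    current = None
    for since, stream_id, bound_port in sorted(STREAM_MAPPING):
        if bound_port == port and since <= ts:
            current = stream_id
    return current


def resolve_query(alert_ts: int, state: str, port: str = "Pi1") -> List[str]:
    """Return ids of the elements that caused the alert emitted at `alert_ts`."""
    window = DEPENDENCY[(state, port)]  # step 2: retrieve the dependency equation
    lo = alert_ts - window
    candidates = [el for el in DATA_STORE if lo <= el.ts <= alert_ts]  # step 3
    # Step 4: keep only elements from the stream bound to the port at their timestamp.
    return [el.element_id for el in candidates
            if el.stream_id == stream_bound_to(port, el.ts)]
```

Nothing is computed at storage time beyond recording bindings and state, which is the tradeoff the slide describes: cheap capture, more work at query time.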

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, D. Sow, L. Lim, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007


Key Research Benefits of TVC Model for Provenance

TVC model overhead reduces when

1 Monitoring is long-lived (over weeks)

2 Per-instance state is minimized

3 Number of monitored individuals increases

The TVC Model is a new model for provenance, specially tuned to the challenges of high-rate stream computing

– Storage efficiency: only minimal metadata per data element, functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)

– Processing efficiency: stream and port bindings stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in EU Provenance/Karma systems)

– Statefulness: time and value functions allow compact representation of long dependency windows

– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

Other provenance systems cannot handle O(100 Kevents/sec) system throughput


Quality of Events in Century

[Figure: Century server infrastructure architecture diagram. Labeled components: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal, Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC, Portal Service, Solution Delivery Services, Analysis Applications, Analysis Framework, Job Application Manager, Event Filter, Annotator, QoE Manager, State Manager, Group Manager, Event Management Service, Event Preprocessor, Event Storage Manager, Event Store Query Service, Subscription Service, Provenance Service, Provenance Storage Manager, Provenance Query Service, Provenance Query Manager, Provenance Access Control, Resolver, Group Management Service, Platform Service, Device Catalogue, Authentication & Authorization, Privacy Manager, Remote Access Manager, Patient Data Manager, Device Data Manager, Service Data Management, User Group Data Manager, Interoperability Container, IHE Adapter, Storages, Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy.]


Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type

– Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:

– Sensor (source) impairments: sensor faults (transient or persistent)

– Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:

– Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts

– Modification of analysis logic: resolution of analysis component (e.g. coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality

– Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a graph of PEs consuming timestamped stream elements and producing an ALERT at time t. Annotations: "What generated this alert, and can it be trusted?"; "The sensor is faulty and generates stream elements of low quality" (action: replace sensor); "Stream elements are missing (possibly due to network issues)".]


Century QoE Some Basic QoE Measures

[Figure: a PE expects input elements t-5, t-4, t-3, t-2, t-1, t but its current input contains only t-5, t-2, t.]

By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.

[Figure: a PE with two inputs, one carrying elements t-12, t-11, t-10 and the other t-5, t-2, t.]

Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.

[Figure: a PE consumes input elements t-5 through t, of which only a small "useful" subset triggers the output at t.]

Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
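The three measures described above could be computed along the following lines. This is a hedged sketch rather than Century's actual QoE computation: the function names and the choice of ratios and timestamp differences as scores are assumptions made for illustration.

```python
from typing import Iterable, Set


def completeness(expected: Iterable[int], received: Iterable[int]) -> float:
    """Fraction of the expected input timestamps that actually arrived."""
    exp: Set[int] = set(expected)
    if not exp:
        return 1.0
    return len(exp & set(received)) / len(exp)


def sync_skew(input_a: Iterable[int], input_b: Iterable[int]) -> int:
    """Gap between the newest timestamps of two inputs; a large gap means out of sync."""
    return abs(max(input_a) - max(input_b))


def useful_fraction(consumed: Iterable[int], useful: Iterable[int]) -> float:
    """Fraction of consumed elements that were actually used to trigger the output."""
    used: Set[int] = set(consumed)
    if not used:
        return 0.0
    return len(used & set(useful)) / len(used)
```

With t = 100, the first slide example corresponds to `completeness(range(95, 101), [95, 98, 100])`, which evaluates to 0.5, matching "half of the expected input is missing".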


Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds

– Many environments (e.g. retirement homes) exhibit redundancy in sensor type and coverage

– HR may be obtained at different (accuracy/acquisition cost) from ECG, SpO2, BP monitors

Sensors have different time-varying QoE

– QoE determination should be bidirectional (feedback from medical professionals on quality of analysis results should affect QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data

– Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships

– User-specified models may not always be available: learn provenance from observed behavior

– Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements

– The right to access provenance may depend on the current 'role' of the provider and the user's medical history

How to compute QoE of medical streams, and how to refine them based on QoE of other streams

How to adaptively alter acquisition pattern based on (cost, accuracy) needs

How to develop a combination of hybrid model-replay based provenance

How to define context-dependent access-control models of provenance data in a multi-provider environment


Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure

– Key features include adaptive event filtering, personalization and predictive data transmission

The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics

– Key features include high-volume stream support, provenance and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions


Remote Health Monitoring The Business Motivation

The United States spends $1.9 trillion on healthcare, or more than 16% of its GDP

More than 133 million Americans live with chronic illnesses

– Chronic diseases account for 70% of all deaths in the United States

– The medical care costs of people with chronic diseases account for more than 75% of the nation's $1.4 trillion medical care costs

– Chronic diseases require long-term management

By 2010, the US will have the most citizens aged 65 or over in its history

– 200,000 doctor deficit by the year 2010

– Huge growth forecast in managed care residences

• 45 million Americans (15% of population) will be 65 years or older by 2015

• About 4 million long-term care beds in US as of 2003


Information-based Medicine: The Remote Monitoring Roadmap

[Figure: roadmap chart with axis labels "Evolutionary Practices", "Revolutionary Technology" and "Throughput Analytics". Stages: Non-specific (Treat Symptoms), Organized (Error Reduction), Personalized (Disease Prevention). Plotted items include Traditional HC, Patient Reported Data, Episodic Treatment, Electronic Health Records, Data and Systems Integration, Automated Systems, In-Pt Automated Vitals, Remote Monitoring, Clinical Trial Data Collection, Chronic Disease Mgmt, Information Correlation, Information Augmented, Rules Based Clinical Response, Clinical Decision Support, 1st Generation Diagnosis, Pre-symptomatic Treatment, Lifetime Treatment, Personalized Health Care, CDI and Century. Source: Kathy Schweda]


Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healthcare ecosystem

Research Background: From Today To 2015

[Figure: two segmentation grids of the healthcare ecosystem, each split into "Patients" and "Care Delivery" dimensions.]

Grid 1, Patients

– Health Status: Healthy, Minor Ailments, At Risk, Acutely Ill, Chronically Ill, Catastrophically Ill

– Setting: Rural, Suburban, Urban

– Socio-economic Status: High, Medium, Low

Grid 1, Care Delivery

– Catchments Area: Local, Regional, National, International

– Access: In Person, Telephonic, Electronic

– Location: Home, Outpatient Setting, Hospital, Emergency Department, Long Term Care, Internet, Call Center

– Service: Wellness, Risk Assessment, Prevention, Acute Care, Chronic Care, Complementary Care

– Provider: Traditional Providers, Public/Private Insurers, Alternate Providers, Midlevel Provider, Health Infomediary

Grid 2, Patients

– Age Group: Infants, Adolescent, Adult Men, Adult Women, Senior Men, Senior Women

– Setting: Rural, Suburban, Urban

– Socio-economic Status: High, Medium, Low

Grid 2, Care Delivery

– Catchments Area: Local, Regional, National, International

– Access: In Person, Telephonic, Electronic

– Location: Home, Outpatient Setting, Hospital, Emergency Department, Long Term Care, Internet, Call Center

– Service: Risk Assessment, Prevention, Acute - Diagnosis, Acute - Treatment, Chronic - Diagnosis, Chronic - Treatment

– Provider: Traditional Providers, Public/Private Insurers, Alternate Providers, Midlevel Provider, Health Infomediary

Ubiquitous Computing Technologies

- Emergence of sensor devices that generate more complex events at higher rates (e.g. ETRI's blood test sensor, SNU's non-intrusive ECG sensor, multichannel ECG)

- Low-power wireless communication (e.g. Bluetooth, ZigBee)

Patient Centric Network (Interoperability)

- IHE, Continua

Healthcare Analytics Research

- Techniques to monitor effects of drugs on patients

- Biomedical trend analysis

- Preventive analysis

IT Infrastructure


Remote Health Monitoring The Opportunity

Long-term monitoring offers benefits

– Early disease detection and trend analysis for healthy and at-risk individuals

– Treatment and progress monitoring for patients

– Participants in drug trials or experimental treatments, to gauge efficacy and side effects

– Reduced workload on doctors, nurses and other healthcare providers

Enabled by rapid improvements in two key technologies

– Improvements in wireless communications (WiFi, 3G, Bluetooth)

– Continuing miniaturization of wireless sensors

[Figure: remote monitoring setup; labels: Server, Patient Diary, BT, Data.]


Chapter 2

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE


Harmoni (Healthcare Adaptive Remote Monitoring) Overview

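[Figure: sensors send data to the mobile hub over a PAN (Bluetooth); the hub forwards data over a WAN (GPRS).]

Type of Sensor Device | Bits/sensor sample | Channels/device | Raw data rate (KB/day)
GPS | 1408 | 1 | 14850
SpO2 | 3000 | 1 | 94922
EKG (cardiac) | 12 | 6 | 194400
Accelerometer | 64 | 3 | 202500
EEG (brain) | 12 | 12 | 388800
EMG (muscle) | 12 | 6 | 777600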
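The raw data rates in the table follow from bits per sample, channel count and sampling rate. The sketch below reproduces the arithmetic. Note the assumption: the slide does not list sampling rates, so the per-sensor rates used here are back-calculated values that happen to reproduce the table (only the SpO2 figure of three 375-byte packets per second appears elsewhere in the deck), and 1 KB is taken as 1024 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def raw_kb_per_day(bits_per_sample: int, channels: int, samples_per_second: int) -> float:
    """Raw data volume in KB/day (1 KB = 1024 bytes) for one sensor device."""
    bits_per_day = bits_per_sample * channels * samples_per_second * SECONDS_PER_DAY
    return bits_per_day / 8 / 1024


# (bits per sample, channels, ASSUMED samples per second); rates are not from the slide.
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG": (12, 12, 256),
    "EMG": (12, 6, 1024),
}

for name, spec in SENSORS.items():
    print(f"{name}: {raw_kb_per_day(*spec):.0f} KB/day")
```

Even the lowest-rate device produces several megabytes per day, which is the motivation for reducing uplink traffic on the mobile hub.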

HARMONI Novel Features

1 Context-Aware Event Filtering: IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr)/min

2 Personalization of Rules: apply machine learning at the server to learn "normal" per user (e.g. for user A, gym = (110, 150))

3 Anticipation-Driven Transmission: based on past history, if Prob(802.11 within 2 hrs) > 0.7, cache data

Need to reduce uplink transmission rates/power


HARMONI Architecture Key Components on Mobile Device

Lightweight Rule/Event processing engine

– Identifies appropriate temporal patterns in sensor data stream(s) and consequent action

• e.g. if 60 < AVG(last 10 heart rate values) < 90 then "transmit AVG to server"

– Processed events themselves act as predicates for new rules

Rule Manager

– Coordinates with server to determine patterns of current interest and consequent action

– RM populates or modifies the rules in the event engine

Intelligent Data Transmission

– Compresses the filtered data (from event engine) for transmission to server

Anticipation Mechanism

– Schedules the transmission of (compressed) data based on predicted availability of network connections and incoming sensor data rates
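The example rule above (transmit the average when it lies between 60 and 90 over the last 10 heart rate values) can be sketched as a small sliding-window filter. This is an illustrative sketch, not HARMONI's engine; the class name `WindowAverageRule` and the choice to stay silent until the window is full are assumptions.

```python
from collections import deque
from typing import Deque, Optional


class WindowAverageRule:
    """Emit the window average only when it falls strictly between low and high."""

    def __init__(self, low: float, high: float, window: int = 10) -> None:
        self.low = low
        self.high = high
        self.window = window
        self.samples: Deque[float] = deque(maxlen=window)

    def on_sample(self, heart_rate: float) -> Optional[float]:
        """Feed one reading; return the average to transmit, or None to send nothing."""
        self.samples.append(heart_rate)
        if len(self.samples) < self.window:
            return None  # not enough history yet
        average = sum(self.samples) / len(self.samples)
        return average if self.low < average < self.high else None


rule = WindowAverageRule(low=60, high=90)
outputs = [rule.on_sample(70) for _ in range(10)]
# The first nine calls return None; the tenth returns the average of the full window.
```

A value returned by `on_sample` would then be handed to the transmission component, while `None` means the reading is filtered out on the device.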


[Figure: HARMONI functional architecture. A Sensor connects over a short-range wireless link to the Mobile Device, which connects over a wireless link to the Internet to the Remote Server. Labeled blocks: Data Collection, Data Adapters, Data Processing, Light-Weight Pattern Recognition Engine, Rule Manager, Action Triggering Mechanism, User Interface, Intelligent Data Transmission, Anticipation Mechanism, Context, Sensor Readings, Sensor Control, Device Resources, DB, Pattern Recognition Engine, Pattern Learning Engine, Rule Server, External Context Sources, External Rule Specifications, External Action Specifications, External Action Triggering Mechanism, TAPAS.]

HARMONI Functional Architecture

Simple recognition of pre-specified temporal patterns across sensor streams

Event engine driven by Deterministic Finite Automata (DFAs)

Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders)

Compressed transmission to server when interface is available AND "anticipation" indicates "OK-to-Send"

Use of intelligent compression schemes to reduce the volume of traffic

Store-and-forward otherwise

Data stored in memory buffer and on FLASH storage

Rule Manager coordinates with backend server to download currently active or applicable rules

Can run, activate or deactivate multiple rules simultaneously and dynamically

Dynamic downloading of rules provides implicit "context awareness" at client


Pattern Recognition Engine Implementation

Rules are specified using EventScript, which is similar to regular expressions.

A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA).

We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications.

Effectively, we are running finite state machines with some tweaks

– Unique memory heap for each DFA, accessible across states

– Run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA
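A finite state machine with a per-automaton heap and transition hooks can be sketched as below. This is not EventScript or the HARMONI engine: plain Python callables stand in for the Scheme instructions, and the rule itself (three consecutive "high" heart rate events raise an alert, with a 140 bpm threshold) is a hypothetical example.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Heap = Dict[str, Any]
Action = Callable[[Heap, float], None]
Transitions = Dict[Tuple[str, str], Tuple[str, Optional[Action]]]


class Dfa:
    """A DFA whose transitions may run an action against a heap shared across states."""

    def __init__(self, start: str, transitions: Transitions,
                 on_init: Optional[Callable[[Heap], None]] = None) -> None:
        self.state = start
        self.heap: Heap = {}
        self.transitions = transitions
        if on_init is not None:
            on_init(self.heap)  # instructions carried out upon initialization

    def feed(self, symbol: str, value: float) -> str:
        """Consume one event; unmatched (state, symbol) pairs leave the state unchanged."""
        key = (self.state, symbol)
        if key in self.transitions:
            next_state, action = self.transitions[key]
            if action is not None:
                action(self.heap, value)  # instructions carried out on the transition
            self.state = next_state
        return self.state


def remember_peak(heap: Heap, value: float) -> None:
    heap["peak"] = max(heap["peak"], value)


def reset_peak(heap: Heap, value: float) -> None:
    heap["peak"] = 0.0


# Hypothetical rule: three consecutive "high" events move the automaton to "alert".
TRANSITIONS: Transitions = {
    ("idle", "high"): ("one", remember_peak),
    ("one", "high"): ("two", remember_peak),
    ("two", "high"): ("alert", remember_peak),
    ("one", "normal"): ("idle", reset_peak),
    ("two", "normal"): ("idle", reset_peak),
    ("alert", "normal"): ("idle", reset_peak),
}


def run(readings: Iterable[float], threshold: float = 140.0) -> Dfa:
    dfa = Dfa("idle", TRANSITIONS, on_init=lambda heap: heap.update(peak=0.0))
    for hr in readings:
        dfa.feed("high" if hr > threshold else "normal", hr)
    return dfa
```

The heap lets a rule carry values such as a running peak across states without multiplying the number of states.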

[Figure: rule deployment path; labels: EventScript, Compiler, DFA (Desktop); Rule Server (Remote Server); Rule Manager, Event Engine (770).]

Context-Based Filtering


HARMONI Implementation Platform

Nokia 770 Internet tablet / N800

– ARM processor, Linux-based

– High-resolution display (800x480), touch screen with up to 65,536 colors

– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB; can be used for virtual memory)

– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces

– Relatively cheap: $350

– http://www.maemo.org provides open-source software and development environment

– Code compiled on an Intel/Debian Linux 3.1 box using cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor

– Provides heart rate and oxygen saturation

– Supports Bluetooth Serial Port Profile (SPP)

– 120 hours of continuous operation with 2 AA batteries

– Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis Accelerometer

– Output baud of 57.6 Kbps

– 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome, M. Ebling and A. Misra, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008

Context-Based Filtering


Impact of HARMONI on Transmission Bandwidth Idealized Context

[Figure: "Data Generated From Sensor"; heart rate (bpm, 0-200) plotted against sample number (1 to 9433).]

Compression (S2) by itself results in > 50% reduction in bandwidth consumption

Filtering (S3) results in 85% reduction in bandwidth consumption

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to lack of exact pattern match of floating point values in LZ

[Figure: bar chart of data transmitted over network (in KB, 0-80) by context assumed (None, Office, Gym, Office/Gym), uncompressed versus with LZ-78 compression. Bar labels: 70.30, 29.06, 48.90, 19.59, 29.64, 17.02, 14.54, 22.04.]

Context-Based Filtering


Result 2 Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals"; heart rate (beats per minute, 0-200) plotted against time.]

[Figure: bar chart of data transmitted over network (in KB, 0-100) for User 2-1 and User 2-2, with series Uncompressed, Compressed, Uncompressed Generic Context-filtering and Uncompressed Personalized Context-filtering. Bar labels: 95.10, 65.00, 34.94, 23.02, 20.18, 19.02, 11.84, 9.26.]

Sensor streams exhibit significant statistical variation across individuals

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to lack of exact pattern match of floating point values in LZ compressor

Context-Based Filtering


HARMONI in Practice Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running

– Higher 'normal' threshold for 3 distinct states (0-70, 70-110, 110-170) across all users

– Aside: accelerometer readings used to classify 'falls' for elderly patients

Bandwidth savings between 26-73% for our sample population

Context-Based Filtering


Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE


Century Research Goals and Technologies

Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem to support the various services

Challenges

– Integrating massive volumes of disparate data

– Need for sophisticated analytics

– Growing collaboration across ecosystem

– Integrity of data & tracking the analytic data

– Timely feedback

– Large number of concurrent users

– Privacy and security

Architectural Goals

– Extensible data model, scalable infrastructure

– Open and scalable analysis framework

– Enterprise sharing of sensor data

– Quality of Events (QoE) and provenance support for backtracking & auditing

– Real-time processing

– Scalable device and user management

– Protection of personal medical data

Technologies & Approaches

– Extensible Stream Storage System

– Distributed Stream Analysis Runtime

– Quality of Event

– Hybrid Provenance System

– Interoperability

– Privacy & Security Technology

– Stream Processing Platform


Analysis Framework (1): The Approach

Extensible

– To new data sources (e.g. EMG, EEG, accelerometer, activity)

– To new analysis algorithms (e.g. myocardia, polyneuropathy, Alzheimer)

Scalability

– Handles large numbers of patients

Robustness

– Resilient to sensor or infrastructure failures

• Exacerbated in remote monitoring settings

Easy to Use

– Separate medical domain knowledge from IT knowledge

Framework built on top of IBM System S

– State-of-the-art middleware for high-volume stream processing analytics

Programming model centered around the System S concept of Processing Element (PE)

– PEs represent atomic stream transformers

– Stream data flow with pub-sub semantics

Century also provides useful medical analytic services

– Persistence of state (e.g. analyze HR variations over 6 months)

– Management of patient data (e.g. patient's intermediate diagnosis)

– Computation and rebinding based on the quality of events from medical streams

Century supports an open set of data sources

– Currently supporting ECG, BP, Glucose, Weight, SpO2, Temperature

Vertically designed systems for the analysis of health monitoring data exist today

– Closed: custom-built for specific diagnosis

– Non-extensible: support for limited sensor devices

– Small patient footprint: only good for trials

[Figure: example analysis jobs, each a graph of analysis components (PEs) fed by Data Source / Source PE nodes. Node labels include SPA, SPM, DFE, PSD, INQ, GUI, NES, IPN, VBF, DUP, WLG, SGD, PCP, JAE, DSN, SCF, PFA, SSF, SLF, JoinCC, JoinSC and JoinL.]

Major Practical Benefit: Separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)

Current Solutions New Requirements Our Innovations


Analysis Framework (2) PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)

• When an SDO is received by the PE, System S routes the SDO to the appropriate PE

• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs

[Figure: example PE graph; labels: Input ECG, Normalization, Fast Fourier Transform, AR PE, Alert Generator, Output Alert, "Neural Network: trained during initialization".]


Century Event Management Service

[Figure: Century server infrastructure architecture diagram. Labeled components: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal, Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC, Portal Service, Solution Delivery Services, Analysis Applications, Analysis Framework, Job Application Manager, Event Filter, Annotator, QoE Manager, State Manager, Group Manager, Event Management Service, Event Preprocessor, Event Storage Manager, Event Store Query Service, Subscription Service, Provenance Service, Provenance Storage Manager, Provenance Query Service, Provenance Query Manager, Provenance Access Control, Resolver, Group Management Service, Platform Service, Device Catalogue, Authentication & Authorization, Privacy Manager, Remote Access Manager, Patient Data Manager, Device Data Manager, Service Data Management, User Group Data Manager, Interoperability Container, IHE Adapter, Storages, Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy.]


Event Management Service (1)

The Problem

– Persistence of large amounts of streaming data

– Efficient query mechanisms for stakeholders

Requirements

- Event rate scalability

- Resiliency

- Extensibility

- Expressiveness of the query language

Potential Approaches

1 Relational DB (Annotation table): not extensible, limited scalability

2 Relational-XML DB (XML-based Annotation data model): fully extensible, but does not scale to high event rate

[Figure: for each approach, an Annotator writes events arriving at times 1, 2, 3 into the store; in the second approach each event is stored as XML.]


Event Management Service (3) Hybrid Storage Model

3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions

- Traffic is bursty, modeled by a Poisson process

- Average arrival rate < average database service rate

Let μ be the service rate of the database

Let λ be the arrival rate at the database

Let ρ = λ/μ

[Figure: arrival rate versus time. Bursty traffic fluctuates around the average arrival rate, which lies below the average database service rate; an instability region is marked. An Annotator writes events arriving at times 1, 2, 3 through a Stream Buffer into the XML store.]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007


Event Management Service (4) The Stream Buffer

[Figure: stream buffer design. PEs in the analysis graph (PE1 to PE5, SPE) emit storage requests (events of type "eventStore"). Labeled blocks: System S STG (File System Store), Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, Database PE, Event Store (Relational-XML); results also flow towards the delivery system.]

Approach

- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms

- Avoid driving the database into the instability region

- Throttle the traffic arriving at the database
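The throttling idea can be illustrated with a toy discrete-time simulation: bursts are absorbed by a buffer and the database never sees more than its service rate per tick. This is a sketch under simplifying assumptions (fixed service rate, FIFO buffer, one tick per time unit); the function names are hypothetical and nothing here reflects the actual Century stream buffer implementation.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def utilization(arrivals_per_tick: Sequence[int], service_rate: int) -> float:
    """rho = lambda / mu, with lambda estimated as the mean arrivals per tick."""
    arrival_rate = sum(arrivals_per_tick) / len(arrivals_per_tick)
    return arrival_rate / service_rate


def simulate_buffer(arrivals_per_tick: Sequence[int],
                    service_rate: int) -> Tuple[List[int], List[int]]:
    """Return (events written to the DB per tick, buffer backlog after each tick)."""
    buffer: Deque[Tuple[int, int]] = deque()
    written: List[int] = []
    backlog: List[int] = []
    for tick, count in enumerate(arrivals_per_tick):
        buffer.extend((tick, i) for i in range(count))  # the burst lands in the buffer
        batch_size = min(service_rate, len(buffer))     # throttle to the service rate
        for _ in range(batch_size):
            buffer.popleft()                            # hand one event to the database
        written.append(batch_size)
        backlog.append(len(buffer))
    return written, backlog


written, backlog = simulate_buffer([0, 8, 0, 0, 4, 0], service_rate=3)
```

As long as ρ stays below 1 on average, the backlog built up by a burst drains during the quiet ticks that follow, so the database is never pushed past its service rate.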


Century Provenance Service

[Figure: Century server infrastructure architecture diagram. Labeled components: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal, Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC, Portal Service, Solution Delivery Services, Analysis Applications, Analysis Framework, Job Application Manager, Event Filter, Annotator, QoE Manager, State Manager, Group Manager, Event Management Service, Event Preprocessor, Event Storage Manager, Event Store Query Service, Subscription Service, Provenance Service, Provenance Storage Manager, Provenance Query Service, Provenance Query Manager, Provenance Access Control, Resolver, Group Management Service, Platform Service, Device Catalogue, Authentication & Authorization, Privacy Manager, Remote Access Manager, Patient Data Manager, Device Data Manager, Service Data Management, User Group Data Manager, Interoperability Container, IHE Adapter, Storages, Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy.]


An Example of Data Provenance Use

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition.

With this drug, Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe.

A few days later, Dr Lee receives an alert from the Century prescription program:

Urgent Alert

Patient: Doe, John

Condition: Abnormal Reaction

Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).


Provenance (1) Two Approaches to Provenance

What is process provenance?

• The ability to retrace which "system components" were on the data path

[Figure: PE1 and PE2, running algorithms f1, f2 and f3, report to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1."]

What is data provenance?

• The ability to retrace which "data elements" led to generation of output data

[Figure: sensors feed Data1 and Data2 into PEj, which emits DataOut4. PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]


Provenance (2) Prior Work on Provenance

[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance versus data provenance) and by event and processing rates.]

EU Provenance (PASOA, PreServ)

– Captures workflow bindings

Scientific SOA Workflows (KARMA)

– Workflow and application binding notifications

– Stateless, application-defined data provenance

Database View Inversion (Trio, Widom)

– Specific transforms and extensions to SQL

– Data dependencies specified statically for relations

Stream Provenance (Simhan)

– Store temporal history of stream connectors as a per-stream stack

– System-level automatic metadata collection at stream level

File Systems (PASS, LFS)

– Captures system calls and modifications to file records

– Annotation per file

Medical Stream Provenance (Century)

– Arbitrary logic for individual PEs (statefulness and long memory)

– Dynamic stream bindings

– High processing throughput

Provenance Challenges Unique to Streams

• Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
• Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
• Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
• Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that:
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model

We need to support both:
- Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
- Data provenance: "Show me the stream segments related to the generation of alert Ei"

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports.
- The dependency is thus not specified on every data element but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets", that is, dependence over sets of time windows.
- The output stream of PE i is a function of specific samples (from the input streams) within a designated interval Δ:

$$\mathrm{out}_i(t) \;\Leftarrow\; \bigcup_{s_j \text{ is input to Component } i} \left\{\, \mathrm{inp}_j(t_k) \;:\; t_k \in [\,t - \Delta_j,\; t\,] \,\right\}$$

[Figure: a Processing Element with input stream Sj (elements Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and input stream Sk (elements Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), producing output stream Si (elements Si(t), Si(t-1), Si(t-2)).]

The dependency equation can, however, vary based on:
- The PE's internal state or some external context, for example:

IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
ELSE generate ALERT iff AVG(HR(t-30, t)) > 90

- The solution is to allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data

TVC dependency rule blocks (PE state defined by location context):
- Rule block 1, PE state "gym": Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
- Rule block 2, PE state "home": Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
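The state-dependent rule blocks above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Century implementation: `RuleBlock`, `RULE_BLOCKS`, and `causative_samples` are hypothetical names, timestamps are integer ticks, and the windows reuse the gym/home heart-rate example (last 10 versus last 30 time units).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A window is an inclusive pair of offsets relative to the output time t,
# e.g. (-10, 0) stands for the interval [t-10, t].
Window = Tuple[int, int]
Sample = Tuple[int, float]  # (timestamp, value)


@dataclass(frozen=True)
class RuleBlock:
    """One TVC rule block: the dependency windows valid for a single PE state."""
    state: str
    windows: Dict[str, List[Window]]  # input stream id -> dependency windows


# Hypothetical rule blocks mirroring the gym/home alert rule on the slide.
RULE_BLOCKS = {
    "gym": RuleBlock("gym", {"HR": [(-10, 0)]}),
    "home": RuleBlock("home", {"HR": [(-30, 0)]}),
}


def causative_samples(state: str, t: int,
                      streams: Dict[str, List[Sample]]) -> Dict[str, List[Sample]]:
    """Return, per input stream, the samples an output at time t depends on."""
    block = RULE_BLOCKS[state]
    result = {}
    for stream_id, windows in block.windows.items():
        result[stream_id] = [
            (ts, value)
            for ts, value in streams.get(stream_id, [])
            if any(t + lo <= ts <= t + hi for lo, hi in windows)
        ]
    return result
```

Because the windows are stored once per rule block rather than once per sample, the only per-element metadata needed is the PE state that was active when the output was produced.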

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to the corresponding query.
- For efficiency, the relevant metadata is distributed across multiple DBs
- Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead

Stored ahead of time:
1. Bindings between every input/output port and the associated streams
2. Creation and modification times of PEs
3. The TVC function f() for every (output port, input port) pair

Resolving Query(Ei) at the Provenance Query Server:
1. Obtain the I/O stream mappings: determine the IDs of the input ports and the causative stream bindings
2. Retrieve the dependency equation
3. Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
4. Apply the equation (the TVC filter) to determine the set of causative elements, and return that set

Example contents of the stores involved:
- PC Dependency Table: PEj, f1(), Po1
- Dynamic Stream Mapping Table (Stream and Port Mapping, SAPM): t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6
- Data Store: element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home"
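The query steps above can be sketched with three in-memory tables. Everything here is an assumption made for illustration: the table layouts, the names `DEPENDENCY_TABLE`, `STREAM_PORT_MAP`, `DATA_STORE`, and `resolve`, and the use of integer timestamps are hypothetical stand-ins for the distributed stores on the slide.

```python
# Hypothetical stand-ins for the three stores named on the slide.
# Dependency table: output port -> PE state -> input port -> (lo, hi) offsets.
DEPENDENCY_TABLE = {
    "Po1": {
        "gym": {"Pi1": (-10, 0)},
        "home": {"Pi1": (-30, 0)},
    }
}
# Dynamic stream mapping: (time the binding took effect, port, stream id).
STREAM_PORT_MAP = [(0, "Pi1", "S1"), (50, "Pi1", "S15")]
# Data store: stream id -> list of (timestamp, value), integer timestamps.
DATA_STORE = {
    "S1": [(ts, 80 + ts % 5) for ts in range(0, 50)],
    "S15": [(ts, 120 + ts % 5) for ts in range(50, 101)],
}


def bindings_in_window(port, start, end):
    """Return (stream_id, from_ts, to_ts) segments bound to port in [start, end]."""
    entries = sorted(e for e in STREAM_PORT_MAP if e[1] == port)
    segments = []
    for idx, (t_bind, _, stream_id) in enumerate(entries):
        # A binding lasts until the next binding on the same port takes effect.
        t_next = entries[idx + 1][0] if idx + 1 < len(entries) else float("inf")
        lo, hi = max(start, t_bind), min(end, t_next - 1)
        if lo <= hi:
            segments.append((stream_id, lo, hi))
    return segments


def resolve(output_port, alert_ts, state):
    """Return the (stream_id, timestamp, value) elements behind one alert."""
    rule = DEPENDENCY_TABLE[output_port][state]            # dependency equation
    causative = []
    for in_port, (lo_off, hi_off) in rule.items():
        start, end = alert_ts + lo_off, alert_ts + hi_off
        for stream_id, lo, hi in bindings_in_window(in_port, start, end):
            causative += [                                  # retrieve and filter
                (stream_id, ts, value)
                for ts, value in DATA_STORE[stream_id]
                if lo <= ts <= hi
            ]
    return causative
```

Note how a rebinding at t=50 splits the causative set across two streams; this is the case that time-based capture of stream and port bindings is meant to handle.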

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.

Key Research Benefits of TVC Model for Provenance

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
- Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance/Karma systems)
- Statefulness: time and value functions allow compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases

Other provenance systems cannot handle O(100 Kevents/sec) system throughput.


Quality of Events in Century

[Figure: Century server infrastructure diagram. Labels shown include: USN Gateway (Sensor Data Reception, Data Transformer, Sensor Data Sender); Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library); Patient Portal, Administrator Portal, and Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC); Portal Service; Event Management Service (Event Preprocessor, Event Filter, Annotator, Event Storage Manager, Event Store Query Service, Subscription Service); Analysis Framework (Analysis Applications, QoE Manager, State Manager, Job/Application Manager); Provenance Service (Provenance Storage Manager, Provenance Query Service, Provenance Query Manager, Provenance Access Control, Resolver); Group Management Service (Group Manager); Platform Service (Authentication & Authorization, Privacy Manager, Device Catalogue, Remote Access Manager); data managers (Patient Data Manager, Device Data Manager, User Group Data Manager, Service Data Management); Storages (Event Store, Provenance Metadata, State Data, Group Data, Privacy Policy, Stakeholder Info, Patient Info, Device Info, Application Info); Interoperability Container (IHE Adapter); Solution Delivery Services.]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type.
- Enables a pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Types of dynamism include:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- Action taken on analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
- Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier transform, or the set of features used) may be varied depending on stream quality
- Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE raises an ALERT at time t from several input streams, prompting the question "What generated this alert, and can it be trusted?" One input comes from a faulty sensor that generates stream elements of low quality (remedy: replace the sensor); another input is missing stream elements, possibly due to network issues.]

Century QoE: Some Basic QoE Measures

1. Expected versus received input. By correlating what was expected with what was received, we can estimate the quality of our input. [Figure: the PE expected input elements t-5, t-4, t-3, t-2, t-1, t, but the current input holds only t-5, t-2, t.] In this case, we can say that the output of the PE has low quality, since half of the expected input is missing.

2. Timestamp correlation across inputs. Another possibility is to correlate the timestamps of the inputs to estimate the quality. [Figure: one input holds elements t-12, t-11, t-10 while the other holds t-5, t-2, t.] In this case, we can say that the output of the PE has low quality, since the two inputs are out of sync.

3. Consumed versus useful input. Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). [Figure: the current input holds elements t-5 through t, of which only a few are marked as the "useful" input.] Here we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, given the small fraction of useful input stream elements.
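The three measures above could be computed roughly as follows. This is a sketch under assumptions, not Century's QoE Manager: the function names are hypothetical, timestamps are plain numbers, and each measure is reduced to a single scalar for illustration.

```python
def completeness(expected_ts, received_ts):
    """Fraction of the expected input elements that actually arrived."""
    expected = set(expected_ts)
    if not expected:
        return 1.0
    return len(expected & set(received_ts)) / len(expected)


def sync_skew(input_a_ts, input_b_ts):
    """Gap between the newest timestamps of two inputs (both non-empty)."""
    return abs(max(input_a_ts) - max(input_b_ts))


def usefulness(consumed_ts, useful_ts):
    """Fraction of the consumed input elements that triggered the output."""
    consumed = set(consumed_ts)
    if not consumed:
        return 0.0
    return len(consumed & set(useful_ts)) / len(consumed)
```

A low completeness or usefulness score, or a large skew, would mark the PE output as low quality; how the scores are thresholded or combined is left open here.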

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
- Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
- HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors
- Open question: how to adaptively alter the acquisition pattern based on (cost, accuracy) needs

Sensors have different, time-varying QoE
- QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
- Open question: how to compute the QoE of medical streams, and how to refine it based on the QoE of other streams

Provenance architectures must address the prohibitive cost of storing all stream data
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored
- Open question: how to develop a hybrid of model-based and replay-based provenance

Better techniques for establishing provenance relationships
- User-specified models may not always be available: learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of the analysis logic

Provenance data itself is sensitive and subject to privacy requirements
- The right to access provenance may depend on the current 'role' of the provider and the user's medical history
- Open question: how to define context-dependent access-control models of provenance data in a multi-provider environment

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay.

The pervasive device must be more than a relay: it must be an extended part of the analytic infrastructure.
- Key features include adaptive event filtering, personalization, and predictive data transmission

The server must not be simply a repository of sensor streams but must provide flexible and scalable analytics.
- Key features include high-volume stream support, provenance, and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

Information-based Medicine: The Remote Monitoring Roadmap

[Figure: roadmap chart (source: Kathy Schweda) with axes labeled "Evolutionary Practices" and "Revolutionary Technology". Labels on the chart: Traditional HC, Patient Reported Data, Episodic Treatment, Electronic Health Records, Information Augmented, Remote Monitoring, Chronic Disease Mgmt, Clinical Trial Data Collection, In-Pt Automated Vitals, Rules Based Clinical Response, Clinical Decision Support, CDI, 1st Generation Diagnosis, Pre-symptomatic Treatment, Lifetime Treatment, Personalized Health Care; Automated Systems, Data and Systems Integration, Information Correlation, Throughput Analytics; Non-specific (Treat Symptoms), Organized (Error Reduction), Personalized (Disease Prevention). Century is marked on the roadmap.]

Ubiquitous computing technologies, patient-centric networks, and healthcare analytics are the key factors in establishing the healthcare ecosystem

Research Background: From Today To 2015

[Figure: two segmentation grids of the healthcare ecosystem, each split into Patients and Care Delivery dimensions, above an IT Infrastructure layer.]

Grid 1
- Patients, Health Status: Healthy; Minor Ailments; At Risk; Acutely Ill; Chronically Ill; Catastrophically Ill
- Patients, Setting: Rural; Suburban; Urban
- Patients, Socio-economic Status: High; Medium; Low
- Care Delivery, Catchments Area: Local; Regional; National; International
- Care Delivery, Access: In Person; Telephonic; Electronic
- Care Delivery, Location: Home; Outpatient Setting; Hospital/Emergency Department; Long Term Care; Internet; Call Center
- Care Delivery, Service: Wellness/Risk Assessment; Prevention; Acute Care; Chronic Care; Complementary Care
- Care Delivery, Provider: Traditional Providers; Public/Private Insurers; Alternate Providers; Midlevel Provider; Health Infomediary

Grid 2
- Patients, Age Group: Infants; Adolescent; Adult Men; Adult Women; Senior Men; Senior Women
- Patients, Setting: Rural; Suburban; Urban
- Patients, Socio-economic Status: High; Medium; Low
- Care Delivery, Access: In Person; Telephonic; Electronic
- Care Delivery, Location: Home; Outpatient Setting; Hospital/Emergency Department; Long Term Care; Internet; Call Center
- Care Delivery, Provider: Traditional Providers; Public/Private Insurers; Alternate Providers; Midlevel Provider; Health Infomediary
- Care Delivery, Service: Risk Assessment; Prevention; Acute - Diagnosis; Acute - Treatment; Chronic - Diagnosis; Chronic - Treatment
- Care Delivery, Catchments Area: Local; Regional; National; International

IT Infrastructure

Ubiquitous Computing Technologies
- Emergence of sensor devices that generate more complex events at higher rates (e.g., ETRI's blood test sensor, SNU's non-intrusive ECG sensor, multichannel ECG)
- Low-power wireless communication (e.g., Bluetooth, ZigBee)

Patient Centric Network (Interoperability)
- IHE, Continua

Healthcare Analytics Research
- Techniques to monitor the effects of drugs on patients
- Biomedical trend analysis
- Preventive analysis

Remote Health Monitoring: The Opportunity

Long-term monitoring offers benefits:
- Early disease detection and trend analysis for healthy and at-risk individuals
- Treatment and progress monitoring for patients
- Monitoring of participants in drug trials or experimental treatments, to gauge efficacy and side effects
- Reduced workload on doctors, nurses, and other healthcare providers

Enabled by rapid improvements in two key technologies:
- Improvements in wireless communications (WiFi, 3G, Bluetooth)
- Continuing miniaturization of wireless sensors

[Figure: monitoring setup. Labels shown: Server, Patient Diary, BT, Data.]

Chapter 2

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance, and QoE

Harmoni (Healthcare Adaptive Remote Monitoring): Overview

[Figure: sensors reach the mobile hub over a PAN (Bluetooth); the hub sends data onward over a WAN (GPRS).]

Type of Sensor Device    Bits per sensor sample    Channels per device    Raw data rate (KB/day)
GPS                      1408                      1                      14850
SpO2                     3000                      1                      94922
EKG (cardiac)            12                        6                      194400
Accelerometer            64                        3                      202500
EEG (brain)              12                        12                     388800
EMG (muscle)             12                        6                      777600
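The raw data rates in the table can be reproduced with simple arithmetic. The sampling rates below are not stated on the slide; they are assumptions, chosen because they make the formula match the tabulated KB/day values (taking 1 KB = 1024 bytes). The function and dictionary names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def raw_kb_per_day(bits_per_sample, channels, samples_per_second):
    """Raw sensor output in KB/day, with 1 KB = 1024 bytes."""
    bits_per_day = bits_per_sample * channels * samples_per_second * SECONDS_PER_DAY
    return bits_per_day / 8 / 1024


# (bits per sample, channels, ASSUMED samples per second per channel).
# The rates are inferred to match the table; they are not from the slide.
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG": (12, 12, 256),
    "EMG": (12, 6, 1024),
}

if __name__ == "__main__":
    for name, spec in SENSORS.items():
        print(f"{name:14s} {raw_kb_per_day(*spec):>10.0f} KB/day")
```

The SpO2 assumption of 3 samples per second at 3000 bits each is consistent with the Nonin monitor described later in the deck (three 375-byte packets per second).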

HARMONI Novel Features

1. Context-Aware Event Filtering: IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr) per minute
2. Personalization of Rules: apply machine learning at the server to learn, for example, "Normal for user A at gym = (110, 150)"
3. Anticipation-Driven Transmission: based on the past, if Prob(802.11 within 2 hrs) > 0.7, cache the data

Need to reduce uplink transmission rates/power
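The three features above can be illustrated with a small sketch. It is not HARMONI code: `filter_minute`, `should_cache`, and the `DEFAULT_RANGES` table are hypothetical, the "office" range is invented for the example, and only the gym range (90, 120), the personalized range (110, 150), and the 0.7 threshold come from the slide.

```python
from statistics import mean

# Per-context "normal" heart-rate ranges in bpm. The gym range is from the
# slide; the office range is an assumption for illustration.
DEFAULT_RANGES = {"gym": (90, 120), "office": (60, 90)}


def filter_minute(context, hr_samples, ranges=DEFAULT_RANGES):
    """Context-aware filtering of one minute of heart-rate samples.

    If every sample is within the normal range for the current context,
    only the average is sent; otherwise the raw samples are sent.
    """
    if not hr_samples:
        return []
    low, high = ranges[context]
    if all(low < hr < high for hr in hr_samples):
        return [mean(hr_samples)]
    return list(hr_samples)


def should_cache(prob_wifi_within_2h, threshold=0.7):
    """Anticipation: hold data locally if 802.11 is likely to appear soon."""
    return prob_wifi_within_2h > threshold
```

Personalization then amounts to passing a per-user table, for example `filter_minute("gym", samples, ranges={"gym": (110, 150)})`, with the ranges learned at the server.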

HARMONI Architecture: Key Components on Mobile Device

Lightweight Rule/Event Processing Engine
- Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
  • e.g., if 60 < AVG(last 10 heart rate values) < 90, then "transmit AVG to server"
- Processed events themselves act as predicates for new rules

Rule Manager
- Coordinates with the server to determine patterns of current interest and the consequent action
- Populates or modifies the rules in the event engine

Intelligent Data Transmission
- Compresses the filtered data (from the event engine) for transmission to the server

Anticipation Mechanism
- Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates

HARMONI Functional Architecture

[Figure: functional architecture. A Sensor connects over a short-range wireless link to the Mobile Device, which connects over a wireless link to the Internet to the Remote Server. Mobile Device labels: Data Collection, Data Adapters, Data Processing, Light-Weight Pattern Recognition Engine, Rule Manager, Action Triggering Mechanism, Intelligent Data Transmission, Anticipation Mechanism, User Interface, Context, Sensor Readings, Sensor Control, Device Resources. Remote Server labels: Pattern Recognition Engine, Pattern Learning Engine, Rule Server, DB, TAPAS, External Action Triggering Mechanism, External Context Sources, External Rule Specifications, External Action Specifications.]

Event engine
- Simple recognition of pre-specified temporal patterns across sensor streams
- Driven by Deterministic Finite Automata (DFAs)
- Actions associated with rules include data transmission, data transformation, and local triggers (alarms, reminders)

Data transmission
- Compressed transmission to the server when an interface is available AND "anticipation" indicates "OK-to-Send"
- Intelligent compression schemes reduce the volume of traffic
- Store-and-forward otherwise
- Data is stored in a memory buffer and on FLASH storage

Rule Manager
- Coordinates with the backend server to download currently active or applicable rules
- Can run, activate, or deactivate multiple rules simultaneously and dynamically
- Dynamic downloading of rules provides implicit "context awareness" at the client
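The transmit-or-store decision described above might look like the following sketch. The `StoreAndForward` class is hypothetical, the buffer is memory-only (the FLASH tier is omitted), and `zlib` is used purely as a stand-in for HARMONI's compression scheme, which the slides do not specify in detail.

```python
import zlib
from collections import deque


class StoreAndForward:
    """Buffers filtered data and flushes it only when sending is allowed."""

    def __init__(self, send):
        self._send = send        # callable that takes compressed bytes
        self._buffer = deque()   # in-memory buffer; FLASH spill-over omitted

    def submit(self, payload, interface_up, ok_to_send):
        """Queue a payload; flush the whole backlog if conditions allow.

        Returns the number of payloads transmitted by this call.
        """
        self._buffer.append(payload)
        if not (interface_up and ok_to_send):
            return 0             # store now, forward later
        sent = 0
        while self._buffer:
            self._send(zlib.compress(self._buffer.popleft()))
            sent += 1
        return sent
```

Here `ok_to_send` would come from the anticipation mechanism and `interface_up` from the device's network state; both are plain booleans in this sketch.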

Pattern Recognition Engine Implementation

Rules are specified using EventScript, which is similar to regular expressions.

A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA).

We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications.

Effectively, we are running finite state machines with some tweaks:
- A unique memory heap for each DFA, accessible across states
- A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA
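A run-time engine of this kind can be sketched in a few lines. This is an illustration, not the HARMONI engine: `DFAEngine`, the transition table, and the "three consecutive high readings" pattern are hypothetical, EventScript itself is not shown, and plain Python callables stand in for the embedded LISP/Scheme interpreter.

```python
class DFAEngine:
    """Executes a DFA specification over a stream of event symbols."""

    def __init__(self, start, transitions, accepting, hooks=None):
        self.start = start
        self.state = start
        self.transitions = transitions   # {(state, symbol): next_state}
        self.accepting = set(accepting)
        self.hooks = hooks or {}         # {state: callable(heap, value)}
        self.heap = {}                   # per-DFA memory shared across states

    def feed(self, symbol, value=None):
        """Consume one event; return True if the pattern is matched."""
        # Unlisted (state, symbol) pairs reset the automaton to its start state.
        self.state = self.transitions.get((self.state, symbol), self.start)
        hook = self.hooks.get(self.state)
        if hook:
            hook(self.heap, value)       # runs on entering the state
        return self.state in self.accepting


def classify(hr, threshold=140):
    """Map a raw heart-rate reading to an input symbol (threshold assumed)."""
    return "high" if hr > threshold else "normal"


# Hypothetical pattern: three consecutive "high" readings raise an alert.
TRANSITIONS = {
    ("s0", "high"): "s1",
    ("s1", "high"): "s2",
    ("s2", "high"): "alert",
    ("alert", "high"): "alert",
}


def remember_peak(heap, hr):
    heap["peak"] = max(heap.get("peak", 0), hr)


def make_engine():
    return DFAEngine("s0", TRANSITIONS, {"alert"}, hooks={"alert": remember_peak})
```

The per-DFA `heap` is what lets a hook carry information (here, the peak reading) across state transitions, mirroring the memory-heap tweak listed above.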

[Figure: rule deployment path. Labels shown: EventScript, Compiler, and DFA (Desktop); Rule Server (Remote Server); Rule Manager and Event Engine (770).]

HARMONI Implementation Platform

Nokia 770 Internet Tablet / N800
- ARM processor, Linux-based
- High-resolution display (800x480), touch screen with up to 65,536 colors
- 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB; can be used for virtual memory)
- Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
- Relatively cheap: $350
- http://www.maemo.org provides open-source software and a development environment
- Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor
- Provides heart rate and oxygen saturation
- Supports the Bluetooth Serial Port Profile (SPP)
- 120 hours of continuous operation with 2 AA batteries
- Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis accelerometer
- Output baud of 57.6 Kbps
- 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome, and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.

Impact of HARMONI on Transmission Bandwidth: Idealized Context

[Figure: "Data Generated From Sensor": heart rate (bpm, axis 0-200) versus sample number (1 to about 9433).]

[Figure: bar chart of data transmitted over the network (in KB, axis 0-80) by context assumed (None, Office, Gym, Office/Gym), uncompressed versus with LZ-78 compression. Data labels as extracted, with decimal points lost: 7030, 2906, 4890, 1959, 2964, 1702, 1454, 2204.]

- Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
- Filtering (S3) results in an 85% reduction in bandwidth consumption
- The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in LZ

Result 2: Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute, axis 0-200) versus time.]

[Figure: bar chart of data transmitted over the network (in KB, axis 0-100) for User 2-1 and User 2-2 under four schemes: Uncompressed; Compressed; Uncompressed with Generic Context-filtering; Uncompressed with Personalized Context-filtering. Data labels as extracted, with decimal points lost: 9510, 6500, 3494, 2302, 2018, 1902, 1184, 926.]

- Sensor streams exhibit significant statistical variation across individuals
- The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor

HARMONI in Practice: Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
- Higher 'normal' threshold for the 3 distinct states (0-70, 70-110, 110-170) across all users
- Aside: accelerometer readings are also used to classify 'falls' for elderly patients

Bandwidth savings between 26-73% for our sample population
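Sensor-derived context of this kind could drive the filter as in the sketch below. Several things are assumptions: the amplitude thresholds are invented for illustration, the three ranges from the slide are read here as per-state normal heart-rate bands in bpm, and the function names are hypothetical.

```python
# Per-state "normal" bands from the slide, interpreted here (an assumption)
# as heart-rate ranges in bpm.
HR_NORMAL = {"sitting": (0, 70), "walking": (70, 110), "running": (110, 170)}


def classify_activity(amplitude, walk_threshold=0.3, run_threshold=1.0):
    """Map accelerometer amplitude to a state; thresholds are illustrative."""
    if amplitude >= run_threshold:
        return "running"
    if amplitude >= walk_threshold:
        return "walking"
    return "sitting"


def is_reportable(amplitude, hr):
    """True if the heart rate falls outside the band for the current state."""
    low, high = HR_NORMAL[classify_activity(amplitude)]
    return not (low <= hr <= high)
```

The same reading of 150 bpm is therefore suppressed while running but reported while sitting, which is where the bandwidth saving comes from.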

Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance, and QoE

Century Research Goals and Technologies

Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports its various services.

Challenges
- Integrating massive volumes of disparate data
- Need for sophisticated analytics
- Growing collaboration across the ecosystem
- Integrity of data and tracking of the analytic data
- Timely feedback
- Large number of concurrent users
- Privacy and security

Architectural Goals
- Extensible data model, scalable infrastructure
- Open and scalable analysis framework
- Enterprise sharing of sensor data
- Quality of Events (QoE) and provenance support for backtracking and auditing
- Real-time processing
- Scalable device and user management
- Protection of personal medical data

Technologies & Approaches
- Extensible Stream Storage System
- Distributed Stream Analysis Runtime
- Quality of Event
- Hybrid Provenance System
- Interoperability
- Privacy & Security Technology
- Stream Processing Platform

Analysis Framework (1): The Approach

Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today:
- Closed: custom-built for a specific diagnosis
- Non-extensible: support for limited sensor devices
- Small patient footprint: only good for trials

New Requirements
Extensible
- To new data sources (e.g., EMG, EEG, accelerometer, activity)
- To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer's)
Scalability
- Handles large numbers of patients
Robustness
- Resilient to sensor or infrastructure failures
  • Exacerbated in remote monitoring settings
Easy to Use
- Separates medical domain knowledge from IT knowledge

Our Innovations
Framework built on top of IBM System S
- State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
- PEs represent atomic stream transformers
- Stream data flows with pub-sub semantics
Century also provides useful medical analytic services
- Persistence of state (e.g., analyze HR variations over 6 months)
- Management of patient data (e.g., a patient's intermediate diagnosis)
- Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
- Currently supporting ECG, BP, glucose, weight, SpO2, and temperature

[Figure: example analysis jobs drawn as graphs of analysis components (PEs) fed by a DataSource and Source PE. PE labels shown include WLG, SPA, SPM, DFE, PSD, VBF, NES, IPN, DUP, INQ, GUI, SGD, PCP, JAE, DSN, SCF, SSF, SLF, PFA, and several Join PEs.]

Major practical benefit: separates the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.).

Analysis Framework (2): PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is produced, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs
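The division of labor described above can be illustrated with a toy PE. Nothing here is the System S API: the `SDO` dataclass, the `process` method, and `run_pipeline` are hypothetical stand-ins, with `run_pipeline` playing the role of the routing that the middleware would otherwise perform.

```python
from dataclasses import dataclass, field


@dataclass
class SDO:
    """Hypothetical Stream Data Object: a payload plus annotations."""
    stream: str
    timestamp: float
    payload: list
    annotations: dict = field(default_factory=dict)


class NormalizationPE:
    """Scales a non-empty ECG window to zero mean and unit peak amplitude."""

    def process(self, sdo):
        mean = sum(sdo.payload) / len(sdo.payload)
        centered = [x - mean for x in sdo.payload]
        peak = max(abs(x) for x in centered) or 1.0   # avoid dividing by zero
        return SDO(
            stream="ecg.normalized",
            timestamp=sdo.timestamp,
            payload=[x / peak for x in centered],
            annotations={**sdo.annotations, "normalized": True},
        )


def run_pipeline(pes, sdo):
    """Stand-in for middleware routing: pass the SDO through each PE in turn."""
    for pe in pes:
        sdo = pe.process(sdo)
    return sdo
```

The PE author writes only `process`; how SDOs reach the PE and where its output goes is left to the infrastructure.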

[Figure: example PE chain. Labels shown: Input ECG, Normalization, Fast Fourier Transform, AR PE, Neural Network (trained during initialization), Alert Generator, Output Alert.]


Century Event Management Service

[Figure: Century server infrastructure diagram. Labels shown include: USN Gateway (Sensor Data Reception, Data Transformer, Sensor Data Sender); Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library); Patient Portal, Administrator Portal, and Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC); Portal Service; Event Management Service (Event Preprocessor, Event Filter, Annotator, Event Storage Manager, Event Store Query Service, Subscription Service); Analysis Framework (Analysis Applications, QoE Manager, State Manager, Job/Application Manager); Provenance Service (Provenance Storage Manager, Provenance Query Service, Provenance Query Manager, Provenance Access Control, Resolver); Group Management Service (Group Manager); Platform Service (Authentication & Authorization, Privacy Manager, Device Catalogue, Remote Access Manager); data managers (Patient Data Manager, Device Data Manager, User Group Data Manager, Service Data Management); Storages (Event Store, Provenance Metadata, State Data, Group Data, Privacy Policy, Stakeholder Info, Patient Info, Device Info, Application Info); Interoperability Container (IHE Adapter); Solution Delivery Services.]

Event Management Service (1)

The Problem
- Persistence of large amounts of streaming data
- Efficient query mechanisms for stakeholders

Requirements
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language

Potential Approaches
1. Relational DB (annotation table): not extensible; limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure: for each approach, an Annotator writes events arriving at times 1, 2, 3 into the store; in the second approach each event is stored as XML.]

Event Management Service (3): Hybrid Storage Model

Key assumptions
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate

[Figure: arrival rate of traffic over time, plotted against the average arrival rate and the average database service rate.]

Let μ be the service rate of the database.
Let λ be the arrival rate at the database.
Let ρ = λ/μ.
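The role of ρ can be made concrete with a short calculation. The slide only states Poisson arrivals; treating the database as an M/M/1 queue (exponential service times, a single server) is an additional assumption made here so that a closed-form backlog exists. The function names are hypothetical.

```python
def utilization(arrival_rate, service_rate):
    """rho = lambda / mu."""
    if service_rate <= 0:
        raise ValueError("service rate must be positive")
    return arrival_rate / service_rate


def is_stable(arrival_rate, service_rate):
    """The queue in front of the database stays bounded only if rho < 1."""
    return utilization(arrival_rate, service_rate) < 1.0


def mean_backlog_mm1(arrival_rate, service_rate):
    """Mean number of requests in an M/M/1 system: rho / (1 - rho)."""
    rho = utilization(arrival_rate, service_rate)
    if rho >= 1.0:
        return float("inf")   # instability region: backlog grows without bound
    return rho / (1.0 - rho)
```

Under this model the backlog grows sharply as ρ approaches 1, which is why a burst that pushes the instantaneous arrival rate past μ is a problem even when the long-run average is below it.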

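Potential approach 3: Hybrid Storage Model (Stream Buffer + Relational-XML DB)

[Figure: an Annotator writes events arriving at times 1, 2, 3 into a Stream Buffer placed in front of the Relational-XML DB, where they are stored as XML. On the traffic plot, the part where the arrival rate exceeds the database service rate is labeled "Instability Region".]

How do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.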

Event Management Service (4): The Stream Buffer

Approach
- Keep the Relational-XML database in the design, for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database

[Figure: stream buffer design. Labels shown: Analysis Graph (PE1, PE2, PE3, PE4, PE5, SPE) issuing Storage Requests (events of type "eventStore"); Database PE; Traffic Monitor; Traffic Models; Traffic Predictor; Event Store Load Monitor; System S STG (File System Store); Event Store (Relational-XML); "Towards the delivery system".]
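The throttling idea in the approach above can be sketched as a buffer that releases storage requests at a capped rate. This is an assumption-laden illustration: the `StreamBuffer` class, the fixed `target_utilization`, and the per-tick drain are hypothetical, and the traffic prediction and load monitoring components shown on the slide are not modeled.

```python
from collections import deque


class StreamBuffer:
    """Absorbs bursts and releases events to the database at a capped rate."""

    def __init__(self, db_service_rate, target_utilization=0.5):
        # Cap the release rate below the service rate so that rho stays < 1.
        self.max_per_tick = int(db_service_rate * target_utilization)
        self.pending = deque()

    def enqueue(self, events):
        """Accept a burst of storage requests of any size."""
        self.pending.extend(events)

    def drain(self):
        """Release at most max_per_tick events for one time unit."""
        n = min(self.max_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]
```

A burst larger than the per-tick cap is spread over several ticks instead of hitting the database at once; in a fuller design the cap would be adjusted from the predicted traffic and the observed event store load.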


Century Provenance Service

[Figure: Century server infrastructure diagram. Labels shown include: USN Gateway (Sensor Data Reception, Data Transformer, Sensor Data Sender); Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library); Patient Portal, Administrator Portal, and Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC); Portal Service; Event Management Service (Event Preprocessor, Event Filter, Annotator, Event Storage Manager, Event Store Query Service, Subscription Service); Analysis Framework (Analysis Applications, QoE Manager, State Manager, Job/Application Manager); Provenance Service (Provenance Storage Manager, Provenance Query Service, Provenance Query Manager, Provenance Access Control, Resolver); Group Management Service (Group Manager); Platform Service (Authentication & Authorization, Privacy Manager, Device Catalogue, Remote Access Manager); data managers (Patient Data Manager, Device Data Manager, User Group Data Manager, Service Data Management); Storages (Event Store, Provenance Metadata, State Data, Group Data, Privacy Policy, Stakeholder Info, Patient Info, Device Info, Application Info); Interoperability Container (IHE Adapter); Solution Delivery Services.]

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition.

Along with this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

"That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1) Two Approaches to Provenance

What is process provenance?
• The ability to retrace which "system components" were on the data path
• Example: each processing element (PE) running f1, f2 or f3 reports to a Provenance Server/Store:
– PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1."
– PE2: "I received some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
• Example: PEj receives Data1 and Data2 from two sensors and emits DataOut4:
– PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

Provenance (2) Prior Work on Provenance

[Figure: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates]

EU Provenance (PASOA, PreServ)
– Captures workflow bindings

Scientific SOA Workflows (KARMA)
– Workflow and application binding notifications
– Stateless application-defined data provenance

Database View Inversion (Trio, Widom)
– Specific transforms and extensions to SQL
– Data dependencies specified statically for relations

Stream Provenance (Simhan)
– Store temporal history of stream connectors as a per-stream stack
– System-level automatic metadata collection at stream level

File Systems (PASS, LFS)
– Captures system calls and modifications to file records
– Annotation per file

Medical Stream Provenance (Century)
– Arbitrary logic for individual PEs (statefulness and long memory)
– Dynamic stream bindings
– High processing throughput

Provenance Challenges Unique to Streams

Storage Efficiency: The high volume of data implies that "per-record" annotation is very heavyweight.

Processing Efficiency: Generating timestamps and a large amount of metadata per stream event will reduce system throughput.

Statefulness: Transformations on medical data have long temporal state; a single output depends on a large window of stream data.

Dynamic Stream Bindings: Streams may be ephemeral; as a consequence, the transformation graph may vary with time.

We need a provenance system that
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model

We need to support both
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me stream segments related to the generation of alert Ei"

TVC Model of Hybrid Provenance

Associate an "equation-based" specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ s_i^{\text{out}}(t) \;\Leftarrow\; \bigcup_{j \,:\, s_j \text{ is input to Component } i} \left\{\, s_j^{\text{inp}}(t_k) \;:\; t_k \in (t - \Delta_j,\; t) \,\right\} $$

[Figure: a Processing Element with input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (samples Si(t), Si(t-1), Si(t-2))]

The dependency equation can, however, vary based on
– The PE's internal state or some external context, for example:
IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)
– Solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID    PE State    Dependency Rule
1                "gym"       Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2                "home"      Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
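The rule-block idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Window` type, the `RULE_BLOCKS` table and the `causative_windows` function are hypothetical names, not part of Century, and the offsets simply transcribe the two example rules ("gym" and "home") from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A closed time window on one input stream, as offsets back from t."""
    stream: str   # input stream ID, e.g. "Sj"
    oldest: int   # window starts at t - oldest
    newest: int   # window ends at t - newest


# One dependency rule per PE state (hypothetical encoding of the slide's table).
RULE_BLOCKS = {
    "gym": [Window("Sj", 0, 0), Window("Sj", 3, 2), Window("Sk", 2, 1)],
    "home": [Window("Sj", 2, 0), Window("Sk", 1, 0)],
}


def causative_windows(state, t):
    """Return the (stream, start, end) windows that output Ei(t) depends on."""
    return [(w.stream, t - w.oldest, t - w.newest) for w in RULE_BLOCKS[state]]


print(causative_windows("home", 10))  # [('Sj', 8, 10), ('Sk', 9, 10)]
print(causative_windows("gym", 10))   # [('Sj', 10, 10), ('Sj', 7, 8), ('Sk', 8, 9)]
```

Because the rule is stored once per PE state rather than once per output element, only the state label has to travel with each element.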

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead

Resolution procedure
– Determine the IDs of the input ports and the causative stream bindings
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
– Apply the TVC filter to obtain the causative set of elements

Stream and Port Mapping (SAPM)
1. Store bindings between every input/output port and the associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port/input port

Example store contents
– PC Dependency Table: PEj, f1(), Po1
– Dynamic Stream Mapping Table: (t=10, S1, Pi1), (t=15, S15, I4), (t=16, S17, O6)
– Data Store: element45 (ts=15, SID=12, state="gym"), element66 (ts=16, SID=17, state="home")

Provenance Query Server, handling Query(Ei)
1. Obtain I/O stream mappings
2. Retrieve the dependency equation
3. Retrieve elements
4. Apply the equation to determine the set of causative elements
Return the set of elements

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, D. Sow, L. Lim, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
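The four query steps can be sketched as follows. This is a minimal in-memory illustration, not the Century implementation: the three dictionaries stand in for the dependency table, the dynamic stream mapping table and the data store, and all names and sample values are hypothetical.

```python
from bisect import bisect_right

# Dependency table: output port -> PE state -> [(input port, oldest, newest)],
# where the window is [t - oldest, t - newest]. Hypothetical contents.
DEPENDENCY = {"Po1": {"gym": [("Pi1", 2, 0)], "home": [("Pi1", 1, 0)]}}

# Dynamic stream mapping: input port -> [(time the binding started, stream ID)],
# sorted by time. A new entry is written only when the binding changes.
BINDINGS = {"Pi1": [(0, "S1"), (12, "S2")]}

# Data store: stream ID -> {timestamp: value}.
DATA = {
    "S1": {9: 70, 10: 72, 11: 75},
    "S2": {12: 130, 13: 142, 14: 150},
}


def stream_bound_at(port, ts):
    """Step 1: find which stream was bound to the port at time ts."""
    history = BINDINGS[port]
    idx = bisect_right([start for start, _ in history], ts) - 1
    return history[idx][1] if idx >= 0 else None


def resolve(output_port, ts, state):
    """Steps 2-4: fetch the rule, retrieve elements, keep the causative ones."""
    causative = []
    for port, oldest, newest in DEPENDENCY[output_port][state]:
        for t in range(ts - oldest, ts - newest + 1):
            sid = stream_bound_at(port, t)
            if sid is not None and t in DATA.get(sid, {}):
                causative.append((sid, t, DATA[sid][t]))
    return causative


print(resolve("Po1", 13, "gym"))
# [('S1', 11, 75), ('S2', 12, 130), ('S2', 13, 142)]
```

The "gym" query spans the rebinding at t=12, so the result draws on both S1 and S2. This is the case that time-stamped binding records are meant to handle.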

Key Research Benefits of TVC Model for Provenance

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

Other provenance systems cannot handle O(100 Kevents/sec) system throughput.

Quality of Events in Century

[Figure: Century server infrastructure block diagram, fed by the USN Gateway (Sensor Data Sender, Data Transformer, Sensor Data Reception) and a Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library). Server component groups include the Patient, Administrator and Stakeholder Portals, Portal Service, Platform Service, Analysis Framework (including the QoE Manager), Event Management Service, Provenance Service, Interoperability Container and the storage layer.]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect
– Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE consumes input streams with gaps and out-of-order timestamps and emits an ALERT at time t. Callouts: "What generated this alert, and can it be trusted?"; "The sensor is faulty and generates stream elements of low quality" (action: replace sensor); "Stream elements are missing (possibly due to network issues)".]

Century QoE Some Basic QoE Measures

1. Expected versus received input. By correlating what was expected with what was received, we can estimate the quality of our input. For example, if the expected input is (t-5, t-4, t-3, t-2, t-1, t) but the current input is only (t-5, t-2, t), we can say that the output of the PE has low quality, since half of the expected input is missing.

2. Timestamp correlation across inputs. Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, if one input carries (t-12, t-11, t-10) while the other carries (t-5, t-2, t), we can say that the output of the PE has low quality, since the two inputs are out of sync.

3. Consumed versus useful input. Finally, we can correlate the consumed input with the "useful" input (that which is actually used to trigger an output). For example, if only a small fraction of the consumed elements (t-5, t-4, t-3, t-2, t-1, t) was useful, we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial.
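The three measures can be written down as small functions. This is a hedged sketch only: the function names and the exact formulas (a ratio, a gap between newest timestamps, a fraction) are assumptions chosen to mirror the examples on the slide, not the QoE definitions used inside Century.

```python
def completeness(expected_ts, received_ts):
    """Measure 1: fraction of the expected timestamps that actually arrived."""
    expected = set(expected_ts)
    if not expected:
        return 1.0
    return len(expected & set(received_ts)) / len(expected)


def sync_gap(ts_a, ts_b):
    """Measure 2: gap between the newest timestamps of two inputs (0 = in sync)."""
    return abs(max(ts_a) - max(ts_b))


def useful_fraction(consumed_count, useful_count):
    """Measure 3: share of consumed elements that actually triggered the output."""
    if consumed_count == 0:
        return 0.0
    return useful_count / consumed_count


t = 100
print(completeness(range(t - 5, t + 1), [t - 5, t - 2, t]))   # 0.5
print(sync_gap([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))  # 10
print(useful_fraction(6, 1))
```

A PE could attach such scores to each output element, so that downstream components or physicians can decide how much weight an alert deserves.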

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2 and BP monitors

Sensors have different time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current "role" of the provider and the user's medical history

Open questions
– How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams
– How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
– How to develop a combination of hybrid model-replay based provenance
– How to define context-dependent access-control models of provenance data in a multi-provider environment

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay.

The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission

The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

Ubiquitous computing technologies, Patient centric network and Healthcare Analytics are the key factors to establish the healthcare ecosystem

Research Background: From Today To 2015

[Figure: two matrices relating Patients to Care Delivery]

Patient dimensions
– Health Status (first matrix): Healthy, Minor Ailments, At Risk, Acutely Ill, Chronically Ill, Catastrophically Ill
– Age Group (second matrix): Infants, Adolescent, Adult Men, Adult Women, Senior Men, Senior Women
– Setting: Rural, Suburban, Urban
– Socio-economic Status: High, Medium, Low

Care Delivery dimensions
– Access: In Person, Telephonic, Electronic
– Location: Home, Outpatient Setting, Hospital, Emergency Department, Long Term Care, Internet, Call Center
– Service (first matrix): Wellness, Risk Assessment, Prevention, Acute Care, Chronic Care, Complementary Care
– Service (second matrix): Risk Assessment, Prevention, Acute - Diagnosis, Acute - Treatment, Chronic - Diagnosis, Chronic - Treatment
– Provider: Traditional Providers, Public/Private Insurers, Alternate Providers, Midlevel Provider, Health Infomediary
– Catchment Area: Local, Regional, National, International

Ubiquitous Computing Technologies
– Emergence of sensor devices that generate more complex events at higher rates (e.g., ETRI's blood test sensor, SNU's non-intrusive ECG sensor, multichannel ECG)
– Low-power wireless communication (e.g., Bluetooth, ZigBee)

Patient Centric Network (Interoperability)
– IHE, Continua

Healthcare Analytics Research
– Techniques to monitor the effects of drugs on patients
– Biomedical trend analysis
– Preventive analysis

IT Infrastructure

Remote Health Monitoring The Opportunity

Long-term monitoring offers benefits
– Early disease detection and trend analysis for healthy and at-risk individuals
– Treatment and progress monitoring for patients
– Participants in drug trials or experimental treatments, to gauge efficacy and side effects
– Reduced workload on doctors, nurses and other healthcare providers

Enabled by rapid improvements in two key technologies
– Improvements in wireless communications (WiFi, 3G, Bluetooth)
– Continuing miniaturization of wireless sensors

[Figure labels: Server, Patient Diary, BT, Data]

Chapter 2

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni Context-based Event Processing on the Mobile Hub

Century Technologies Stream Storage Provenance and QoE

Harmoni (Healthcare Adaptive Remote Monitoring) Overview

[Figure: sensors connect over a PAN (Bluetooth) to the mobile hub, which forwards data over a WAN (GPRS)]

Type of Sensor Device    Bits/sensor sample    Channels/device    Raw data rate (KB/day)
GPS                      1408                  1                  14850
SpO2                     3000                  1                  94922
EKG (cardiac)            12                    6                  194400
Accelerometer            64                    3                  202500
EEG (brain)              12                    12                 388800
EMG (muscle)             12                    6                  777600
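The raw data rates in the table are consistent with bits per sample × channels × sampling rate, accumulated over a day. The slide does not list sampling rates, so the rates below are assumptions, chosen only because they reproduce the table when 1 KB is taken as 1024 bytes. The 3 Hz SpO2 figure agrees with the three 375-byte packets per second quoted for the Nonin monitor on the platform slide.

```python
SECONDS_PER_DAY = 86_400


def raw_kb_per_day(bits_per_sample, channels, samples_per_second):
    """Raw data rate in KB/day, with 1 KB = 1024 bytes."""
    bits_per_day = bits_per_sample * channels * samples_per_second * SECONDS_PER_DAY
    return bits_per_day / 8 / 1024


# (sensor, bits/sample, channels, ASSUMED sampling rate in Hz)
SENSORS = [
    ("GPS", 1408, 1, 1),
    ("SpO2", 3000, 1, 3),
    ("EKG (cardiac)", 12, 6, 256),
    ("Accelerometer", 64, 3, 100),
    ("EEG (brain)", 12, 12, 256),
    ("EMG (muscle)", 12, 6, 1024),
]

for name, bits, channels, hz in SENSORS:
    print(f"{name:15s} {round(raw_kb_per_day(bits, channels, hz)):>7d} KB/day")
```

At these rates a single EMG sensor alone produces roughly 760 MB per day, which is why the slide stresses reducing uplink transmission.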

HARMONI Novel Features

Need to reduce uplink transmission rates/power

1. Context-Aware Event Filtering
IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr)/min

2. Personalization of Rules
Apply machine learning at the server to learn what is normal for each user (e.g., "Normal for user A in gym = (110, 150)")

3. Anticipation-Driven Transmission
Based on past behavior, if Prob(802.11 within 2 hrs) > 0.7, cache data
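The anticipation rule (feature 3) can be illustrated with a simple frequency estimate. This is a sketch under assumptions: the slide does not say how Prob(802.11 within 2 hrs) is computed, so the per-hour history format and the function names here are hypothetical.

```python
def prob_wifi_soon(history, hour):
    """Fraction of past observations at this hour of day in which an
    802.11 connection became available within 2 hours."""
    outcomes = [seen for h, seen in history if h == hour]
    if not outcomes:
        return 0.0  # no evidence: do not anticipate a connection
    return sum(outcomes) / len(outcomes)


def decide(history, hour, threshold=0.7):
    """Cache data if cheap connectivity is likely soon, else send over the WAN."""
    return "cache" if prob_wifi_soon(history, hour) > threshold else "send_now"


# Hypothetical log of (hour of day, was 802.11 seen within 2 hours?)
history = [(17, True), (17, True), (17, True), (17, False), (9, False), (9, True)]

print(decide(history, 17))  # cache     (3 of 4 past observations)
print(decide(history, 9))   # send_now  (1 of 2 past observations)
```

Caching trades delivery latency for lower GPRS usage, so a real deployment would presumably bypass this rule for urgent events.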

HARMONI Architecture Key Components on Mobile Device

Lightweight Rule/Event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
• e.g., if 60 < AVG(last 10 heart rate values) < 90 then "transmit AVG to server"
– Processed events themselves act as predicates for new rules

Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent action
– RM populates or modifies the rules in the event engine

Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server

Anticipation Mechanism
– Schedules the transmission of (compressed) data based on predicted availability of network connections and incoming sensor data rates
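The example rule for the event engine ("if 60 < AVG(last 10 heart rate values) < 90 then transmit AVG to server") can be sketched as below. The class name is hypothetical, and one detail is an assumption the slide does not specify: the window is cleared after each transmission, so one average replaces a whole batch of raw samples.

```python
from collections import deque


class AverageRule:
    """Emit the window average when it falls strictly between low and high."""

    def __init__(self, low, high, window=10):
        self.low = low
        self.high = high
        self.values = deque(maxlen=window)  # keeps only the last `window` samples

    def on_sample(self, hr):
        """Feed one heart-rate sample; return the average to transmit, or None."""
        self.values.append(hr)
        if len(self.values) < self.values.maxlen:
            return None  # not enough samples yet
        avg = sum(self.values) / len(self.values)
        if self.low < avg < self.high:
            self.values.clear()  # assumption: start a fresh window after sending
            return avg
        return None


rule = AverageRule(low=60, high=90, window=3)
print([rule.on_sample(hr) for hr in [70, 80, 75, 100, 110, 120]])
# [None, None, 75.0, None, None, None]
```

Samples whose average falls outside the range produce no action here. In HARMONI they would presumably be handled by a different rule, such as sending the raw values.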

HARMONI Functional Architecture

[Figure: sensors connect over a short-range wireless link to the mobile device, which connects over a wireless link to the Internet to the remote server. Mobile device blocks: Data Collection (Data Adapters), Data Processing (Light-Weight Pattern Recognition Engine, Rule Manager, Action Triggering Mechanism, User Interface), Intelligent Data Transmission, Anticipation Mechanism; inputs and controls include Context, Sensor Readings, Sensor Control and Device Resources. Remote server blocks: DB, Pattern Recognition Engine, Pattern Learning Engine, Rule Server, External Action Triggering Mechanism, External Context Sources, External Rule Specifications, External Action Specifications, TAPAS.]

Light-weight pattern recognition engine
– Simple recognition of pre-specified temporal patterns across sensor streams
– Event engine driven by Deterministic Finite Automata (DFAs)
– Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders)

Intelligent data transmission
– Compressed transmission to server when an interface is available AND "anticipation" indicates "OK-to-Send"
– Use of intelligent compression schemes to reduce the volume of traffic
– Store-and-forward otherwise
– Data stored in memory buffer and on FLASH storage

Rule Manager
– Coordinates with the backend server to download currently active or applicable rules
– Can run, activate or deactivate multiple rules simultaneously and dynamically
– Dynamic downloading of rules provides implicit "context awareness" at the client

Pattern Recognition Engine Implementation

Rules are specified using EventScript, which is similar to regular expressions.

A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA).

We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications.

Effectively, we are running finite state machines with some tweaks
– Unique memory heap for each DFA, accessible across states
– Run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA

[Figure: EventScript is compiled into a DFA; labels: Desktop, Compiler, Remote Server, Rule Server, Rule Manager, 770, Event Engine]

Context-Based Filtering
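A DFA-driven event engine with a per-automaton memory and actions on transitions can be sketched as follows. This is an illustrative stand-in only: it uses plain Python callables where the slide describes a LISP/Scheme interpreter, it does not use EventScript syntax, and the rule itself (alert after three consecutive high readings) is invented for the example.

```python
class Dfa:
    """Finite state machine with a private memory dict shared across states."""

    def __init__(self, start, transitions, on_init=None):
        self.state = start
        self.transitions = transitions  # {(state, symbol): (next_state, action)}
        self.memory = {}                # per-DFA "heap", visible in every state
        self.fired = []                 # outputs produced by actions
        if on_init is not None:
            on_init(self.memory)

    def feed(self, symbol, value=None):
        """Consume one event; unknown (state, symbol) pairs are ignored."""
        key = (self.state, symbol)
        if key not in self.transitions:
            return
        self.state, action = self.transitions[key]
        if action is not None:
            result = action(self.memory, value)
            if result is not None:
                self.fired.append(result)


def remember(mem, hr):
    mem["peak"] = max(mem.get("peak", 0), hr)


def alert(mem, hr):
    remember(mem, hr)
    return f"ALERT peak={mem.pop('peak')}"


def reset(mem, hr):
    mem.pop("peak", None)


# Hypothetical rule: three consecutive "high" readings raise an alert.
TRANSITIONS = {
    ("idle", "high"): ("one", remember),
    ("one", "high"): ("two", remember),
    ("two", "high"): ("idle", alert),
    ("one", "normal"): ("idle", reset),
    ("two", "normal"): ("idle", reset),
}

engine = Dfa("idle", TRANSITIONS)
for hr in [150, 120, 145, 160, 155]:
    engine.feed("high" if hr > 140 else "normal", hr)
print(engine.fired)  # ['ALERT peak=160']
```

The shared memory lets the alert carry a value accumulated over several states, which a pure DFA cannot express.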

HARMONI Implementation Platform

Nokia 770 Internet tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and a development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis Accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome, M. Ebling and A. Misra, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.

Impact of HARMONI on Transmission Bandwidth Idealized Context

[Figure: "Data Generated From Sensor": heart rate (bpm, 0-200) versus sample number (1 to 9433)]

[Figure: data transmitted over network (in KB, 0-80), uncompressed versus with LZ-78 compression, by context assumed (None, Office, Gym, Office/Gym). Bar labels in extraction order: 70.30, 29.06, 48.90, 19.59, 29.64, 17.02, 14.54, 22.04]

Compression (S2) by itself results in > 50% reduction in bandwidth consumption

Filtering (S3) results in 85% reduction in bandwidth consumption

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating point values in LZ

Result 2 Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute, 0-200) versus time]

[Figure: data transmitted over network (in KB, 0-100) for User 2-1 and User 2-2 under four schemes: uncompressed; compressed; uncompressed with generic context filtering; uncompressed with personalized context filtering. Bar labels in extraction order: 95.10, 65.00, 34.94, 23.02, 20.18, 19.02, 11.84, 9.26]

Sensor streams exhibit significant statistical variation across individuals

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating point values in the LZ compressor

HARMONI in Practice Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher "normal" threshold for 3 distinct states (0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are used to classify "falls" for elderly patients

Bandwidth savings between 26-73% for our sample population

Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni Context-based Event Processing on the Mobile Hub

Century Technologies Stream Storage Provenance and QoE

Century Research Goals and Technologies

Our research goal is to build an Online Health Analytic Infrastructure for the healthcare ecosystem that supports various services.

Challenges
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across ecosystem
– Integrity of data & tracking the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security

Architectural Goals
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking & auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data

Technologies & Approaches
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality Of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform

IBM Research

copy 2007 IBM Corporation

Analysis Framework (1) The Approach

Current Solutions

Vertically designed systems for the analysis of health monitoring data exist today
– Closed, custom-built for a specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials

New Requirements

Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)

Scalability
– Handles large numbers of patients

Robustness
– Resilient to sensor or infrastructure failures
• Exacerbated in remote monitoring settings

Easy to Use
– Separate medical domain knowledge from IT knowledge

Our Innovations

Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics

Programming model centered around the System S concept of Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics

Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams

Century supports an open set of data sources
– Currently supporting ECG, BP, Glucose, Weight, SpO2, Temperature

[Figure: an analysis job built from analysis components; each component is a graph of PEs fed by a data source through a source PE, with PE labels such as SPA, SPM, DFE, PSD, VBF, NES, IPN, DUP, WLG, INQ and GUI]

Major Practical Benefit: Separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)

Analysis Framework (2) PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO arrives, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs

[Figure: example PE graph from Input ECG to Output Alert, with PEs labeled Normalization, Fast Fourier Transform, AR PE, Neural Network (trained during initialization) and Alert Generator]
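The division of labor described above, where the runtime routes SDOs and the developer writes only the per-PE logic, can be sketched with a toy pub-sub runtime. Nothing here is the System S API: `Runtime`, `Normalize`, `AlertGenerator` and the stream names are hypothetical, and SDOs are plain dictionaries.

```python
from collections import defaultdict


class Runtime:
    """Toy stand-in for the stream middleware: routes SDOs by stream name."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, stream, pe):
        self.subscribers[stream].append(pe)

    def publish(self, stream, sdo):
        # Deliver to every subscribed PE and re-publish whatever it emits.
        for pe in self.subscribers[stream]:
            for out_stream, out_sdo in pe.process(sdo):
                self.publish(out_stream, out_sdo)


class Normalize:
    """PE logic only: subtract a baseline and emit on a new stream."""

    def __init__(self, baseline):
        self.baseline = baseline

    def process(self, sdo):
        yield "hr.normalized", {**sdo, "value": sdo["value"] - self.baseline}


class AlertGenerator:
    """PE logic only: record an alert when the value exceeds a threshold."""

    def __init__(self, threshold, alerts):
        self.threshold = threshold
        self.alerts = alerts

    def process(self, sdo):
        if sdo["value"] > self.threshold:
            self.alerts.append(sdo)
        return []  # terminal PE: emits nothing downstream


alerts = []
runtime = Runtime()
runtime.subscribe("hr.raw", Normalize(baseline=70))
runtime.subscribe("hr.normalized", AlertGenerator(threshold=60, alerts=alerts))

runtime.publish("hr.raw", {"t": 1, "value": 150})
runtime.publish("hr.raw", {"t": 2, "value": 100})
print(alerts)  # [{'t': 1, 'value': 80}]
```

Neither PE class refers to its upstream or downstream neighbors, so the graph can be rewired by changing subscriptions alone.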

Century Event Management Service

[Figure: Century server infrastructure block diagram, fed by the USN Gateway (Sensor Data Sender, Data Transformer, Sensor Data Reception) and a Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library). Server component groups include the Patient, Administrator and Stakeholder Portals, Portal Service, Platform Service, Analysis Framework, Event Management Service (Event Preprocessor, Event Filter, Annotator, Event Storage Manager), Provenance Service, Interoperability Container and the storage layer.]

Event Management Service (1)

The Problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders

Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language

Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

Event Management Service (3) Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate

Let $\mu$ be the service rate of the database, let $\lambda$ be the arrival rate at the database, and let $\rho = \lambda / \mu$.

[Figure: arrival rate versus time; traffic bursts rise above the average database service rate into the instability region, while the average arrival rate stays below it]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
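The role of the stream buffer can be illustrated with a small discrete-time simulation: bursts arrive faster than the database can store them, the buffer absorbs the excess, and the backlog drains as long as the average arrival rate stays below the service rate (ρ < 1). The function and the traffic trace are hypothetical. The real design question of how to size and manage the buffer is not answered here.

```python
from collections import deque


def simulate(arrivals, service_rate):
    """Feed per-tick arrival counts through a buffer in front of a database
    that can store at most `service_rate` events per tick.

    Returns (events stored, peak buffer occupancy, events left in buffer).
    """
    buffer = deque()
    stored = 0
    peak = 0
    for tick, count in enumerate(arrivals):
        buffer.extend((tick, i) for i in range(count))    # burst enters buffer
        for _ in range(min(service_rate, len(buffer))):   # throttled DB writes
            buffer.popleft()
            stored += 1
        peak = max(peak, len(buffer))
    return stored, peak, len(buffer)


arrivals = [0, 10, 0, 0, 2, 0]   # bursty trace, average 2 events per tick
service_rate = 3                 # database stores 3 events per tick

rho = (sum(arrivals) / len(arrivals)) / service_rate
print(f"rho = {rho:.2f}")                 # rho = 0.67
print(simulate(arrivals, service_rate))   # (12, 7, 0)
```

With ρ below 1 every event is eventually stored, and the database never sees more than its service rate. The peak occupancy of 7 is what the buffer must be able to hold.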

Event Management Service (4) The Stream Buffer

Approach
– Keep the Relational-XML database in the design, for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database

[Figure: PEs in the analysis graph (PE1 to PE5, SPE) issue storage requests (events of type "eventStore") on the way towards the delivery system. Requests are held in the System S STG (file system store). A Traffic Monitor, Traffic Models, a Traffic Predictor and an Event Store Load Monitor control a Database PE, which writes to the Event Store (Relational-XML).]

IBM Research

copy 2007 IBM Corporation

Century Provenance Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM Corporation26

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition.

With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).


Provenance (1) Two Approaches to Provenance

What is process provenance?
• The ability to retrace which "system components" were on the data path
• Example reports sent by PEs to the Provenance Server/Store: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." "I received some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
• Example annotation made by PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

[Figure labels: Sensors; PE1, PE2 and PEj running algorithms f1, f2, f3; data elements Data1, Data2 and DataOut4; Provenance Server/Store.]
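The two approaches can be contrasted in a few lines of code. The sketch below is only an illustration: `ProvenanceStore`, `DataElement` and `average_pe` are hypothetical names, and the averaging step stands in for whatever algorithm a real PE would run.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """A stream element; `depends_on` carries data provenance as metadata."""
    name: str
    value: float
    depends_on: tuple = ()


@dataclass
class ProvenanceStore:
    """Central log of process-provenance statements reported by PEs."""
    log: list = field(default_factory=list)

    def report(self, pe, algorithm, timestamp, port):
        self.log.append({"pe": pe, "algorithm": algorithm,
                         "time": timestamp, "port": port})

    def components_on_path(self):
        """Process provenance: which PEs were on the data path, in order."""
        return [entry["pe"] for entry in self.log]


def average_pe(store, inputs, timestamp):
    """Hypothetical PEj: averages its inputs, reports what it ran (process
    provenance) and annotates the output with its inputs (data provenance)."""
    store.report(pe="PEj", algorithm="average", timestamp=timestamp, port=1)
    mean = sum(e.value for e in inputs) / len(inputs)
    return DataElement("DataOut4", mean, tuple(e.name for e in inputs))


store = ProvenanceStore()
out = average_pe(store,
                 [DataElement("Data1", 72.0), DataElement("Data2", 76.0)],
                 timestamp=0)
```

Here `store.components_on_path()` answers the process-provenance question, while `out.depends_on` answers the data-provenance question for a single output element.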


Provenance (2) Prior Work on Provenance

[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance versus data provenance) against event and processing rates.]

EU Provenance (PASOA, PreServ)
– Captures workflow bindings

Scientific SOA Workflows (KARMA)
– Workflow and application binding notifications
– Stateless, application-defined data provenance

Database View Inversion (Trio, Widom)
– Specific transforms and extensions to SQL
– Data dependencies specified statically for relations

Stream Provenance (Simhan)
– Store temporal history of stream connectors as a per-stream stack
– System-level automatic metadata collection at stream level

File Systems (PASS, LFS)
– Captures system calls and modifications to file records
– Annotation per file

Medical Stream Provenance (Century)
– Arbitrary logic for individual PEs (statefulness and long memory)
– Dynamic stream bindings
– High processing throughput


Provenance Challenges Unique to Streams

Storage Efficiency: The high volume of data implies that "per-record" annotation is very heavyweight

Processing Efficiency: Generating timestamps and a large amount of metadata per stream event will reduce system throughput

Statefulness: Transformations on medical data have long temporal state; a single output depends on a large window of stream data

Dynamic Stream Bindings: Streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model

We need to support both
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me stream segments related to the generation of alert Ei"


TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ \mathrm{out}_i(t) \;\Leftarrow\; \bigcup_{j \,:\, s_j \text{ is input to Component}_i} \mathrm{inp}_j(t - \Delta_j,\; t) $$

where inp_j(a, b) denotes the samples of input stream s_j with timestamps between a and b.

[Figure: a Processing Element with input stream Sj (elements Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (elements Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (elements Si(t), Si(t-1), Si(t-2)).]

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID | PE State | Dependency Rule
1             | "gym"    | Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2             | "home"   | Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)

The dependency equation can, however, vary based on
– The PE's internal state or some external context
    IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
    ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
– Solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data
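The rule-block idea can be made concrete with a small sketch. It assumes integer timestamps and inclusive windows (the slide does not say whether window endpoints are included), and the names `RULE_BLOCKS`, `causative_set` and `should_alert` are hypothetical.

```python
# Each rule block maps a PE state to time windows on input streams,
# expressed as (stream, earliest offset, latest offset) relative to t.
RULE_BLOCKS = {
    "gym":  [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],   # rule block 1
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],                   # rule block 2
}


def causative_set(state, t, rule_blocks=RULE_BLOCKS):
    """Return the set of (stream, timestamp) pairs that output Ei(t)
    depends on, given the PE state stored with the output."""
    deps = set()
    for stream, lo, hi in rule_blocks[state]:
        for ts in range(t + lo, t + hi + 1):
            deps.add((stream, ts))
    return deps


def should_alert(location, heart_rate, t):
    """State-dependent alert rule from the slide: a 10-step window with
    threshold 140 in the gym, otherwise a 30-step window with threshold 90.
    `heart_rate` maps integer timestamps to HR samples (an assumption)."""
    window, threshold = (10, 140) if location == "gym" else (30, 90)
    samples = [heart_rate[ts] for ts in range(t - window, t + 1)
               if ts in heart_rate]
    return bool(samples) and sum(samples) / len(samples) > threshold
```

Because the dependency is a function of time and state rather than a per-element annotation, only the state label needs to be stored with each output; the causative elements are recomputed on demand.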


Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to a corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead

Resolution steps
– Determine the IDs of input ports and causative stream bindings
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
– Apply the TVC filter to obtain the causative set of elements

Stream and Port Mapping (SAPM)
1. Store bindings between every input/output port and associated streams
2. Store creation/modification times of PEs
3. Store TVC function f() for every output port/input port

Provenance Query Server, on Query(Ei)
1. Obtain I/O stream mappings
2. Retrieve dependency equation
3. Retrieve elements
4. Apply equation to determine the set of causative elements
Return the set of elements

[Figure: example table contents. PC Dependency Table: PEj, f1(), Po1. Dynamic Stream Mapping Table: t=10, S1, Pi1; t=15, S15, I4; t=16, S17, O6. Data Store: element45, ts=15, SID=12, state="gym"; element66, ts=16, SID=17, state="home".]

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007
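A minimal in-memory sketch of the four query steps follows. The table contents are made up for illustration (only loosely modeled on the slide's example rows), and `STREAM_MAP`, `DEPENDENCY`, `DATA_STORE`, `stream_bound_to` and `resolve` are hypothetical names rather than Century interfaces.

```python
from bisect import bisect_right

# Dynamic stream mapping table: (binding time, port, stream id), sorted by time.
STREAM_MAP = [(10, "Pi1", "S1"), (15, "Pi1", "S15"), (16, "Po1", "S17")]

# Dependency table: per output port and PE state, windows on input ports.
DEPENDENCY = {"Po1": {"gym": [("Pi1", -1, 0)], "home": [("Pi1", -3, 0)]}}

# Data store: stored elements with timestamp, stream id and (for outputs) state.
DATA_STORE = [
    {"id": "element43", "ts": 13, "sid": "S1"},
    {"id": "element44", "ts": 14, "sid": "S1"},
    {"id": "element45", "ts": 15, "sid": "S15"},
    {"id": "element66", "ts": 16, "sid": "S17", "state": "gym"},
]


def stream_bound_to(port, ts):
    """Step 1: find which stream was bound to `port` at time `ts`."""
    bindings = [(t, sid) for t, p, sid in STREAM_MAP if p == port]
    idx = bisect_right([t for t, _ in bindings], ts) - 1
    return bindings[idx][1] if idx >= 0 else None


def resolve(output_element, output_port="Po1"):
    """Steps 2-4: fetch the dependency rule for the stored state, then
    collect stored input elements that fall inside its time windows."""
    t, state = output_element["ts"], output_element["state"]
    causative = []
    for port, lo, hi in DEPENDENCY[output_port][state]:
        for ts in range(t + lo, t + hi + 1):
            sid = stream_bound_to(port, ts)
            causative += [e["id"] for e in DATA_STORE
                          if e["ts"] == ts and e["sid"] == sid]
    return causative


result = resolve(DATA_STORE[-1])
```

The binding lookup is time-aware: the same input port resolves to stream S1 before t=15 and to S15 afterwards, which is how the sketch accommodates dynamic stream bindings.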


Key Research Benefits of TVC Model for Provenance

TVC Model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, with a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

Other provenance systems cannot handle O(100 Kevents/sec) system throughput


Quality of Events in Century

[Figure: Century server infrastructure diagram, with blocks for the USN Gateway, Toolkit, Patient/Administrator/Stakeholder Portals, Solution Delivery Services, Portal Service, Platform Service (including the QoE Manager), Analysis Framework, Event Management Service, Provenance Service, Group Management Service, Interoperability Container and Storages.]


Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
– Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect
– Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
– Modification of analysis logic: resolution of analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
– Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE consumes timestamped stream elements (some out of order, some missing) and raises an ALERT, prompting the question "What generated this alert, and can it be trusted?" Callouts: "The sensor is faulty and generates stream elements of low quality" (action: replace sensor); "Stream elements are missing (possibly due to network issues)".]


Century QoE Some Basic QoE Measures

[Figure: a PE expects input elements t-5, t-4, t-3, t-2, t-1, t, but the current input contains only t-5, t-2, t.]
By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.

[Figure: a PE combines one input with elements t-12, t-11, t-10 and another with elements t-5, t-2, t.]
Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.

[Figure: a PE consumes the current input t-5, t-4, t-3, t-2, t-1, t, of which only a small part is the "useful" input.]
Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
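The three measures above can be sketched as simple ratios and differences over timestamps. The function names and the exact formulas below are illustrative assumptions; the slides describe the measures only qualitatively.

```python
def completeness(expected_ts, received_ts):
    """Fraction of expected input elements that actually arrived."""
    expected = set(expected_ts)
    return len(expected & set(received_ts)) / len(expected)


def sync_skew(ts_a, ts_b):
    """Gap between the newest elements of two inputs; a large gap means
    the inputs being combined are out of sync."""
    return abs(max(ts_a) - max(ts_b))


def useful_fraction(consumed_ts, useful_ts):
    """Fraction of consumed input elements that actually triggered output."""
    consumed = set(consumed_ts)
    return len(consumed & set(useful_ts)) / len(consumed)


t = 100  # current time, arbitrary units
expected = [t - 5, t - 4, t - 3, t - 2, t - 1, t]
received = [t - 5, t - 2, t]

q_complete = completeness(expected, received)             # half the input arrived
q_skew = sync_skew([t - 12, t - 11, t - 10], received)    # inputs are 10 units apart
q_useful = useful_fraction(expected, [t])                 # one of six elements used
```

A QoE manager could attach such scores to each output element, so that downstream PEs or clinicians can weigh an alert by the quality of the inputs behind it.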


Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2 and BP monitors

Sensors have different time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on quality of analysis results should affect QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions
– How to compute QoE of medical streams, and how to refine them based on QoE of other streams?
– How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
– How to develop a combination of hybrid model/replay-based provenance?
– How to define context-dependent access-control models of provenance data in a multi-provider environment?


Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relay: an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission

Server must not be simply a repository of sensor streams but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions


Remote Health Monitoring The Opportunity

Long-term monitoring offers benefits
– Early disease detection and trend analysis for healthy and at-risk individuals
– Treatment and progress monitoring for patients
– Participants in drug trials or experimental treatments to gauge efficacy and side effects
– Reduced workload on doctors, nurses and other healthcare providers

Enabled by rapid improvements in two key technologies
– Improvements in wireless communications (WiFi, 3G, Bluetooth)
– Continuing miniaturization of wireless sensors

[Figure labels: Server, Patient Diary, BT, Data.]


Chapter 2

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE


Harmoni (Healthcare Adaptive Remote Monitoring) Overview

[Figure labels: PAN (Bluetooth), WAN (GPRS), Data.]

Type of sensor device | Bits/sensor sample | Channels/device | Raw data rate (KB/day)
GPS                   | 1408               | 1               | 14850
SpO2                  | 3000               | 1               | 94922
EKG (cardiac)         | 12                 | 6               | 194400
Accelerometer         | 64                 | 3               | 202500
EEG (brain)           | 12                 | 12              | 388800
EMG (muscle)          | 12                 | 6               | 777600

Need to reduce uplink transmission rates/power

HARMONI Novel Features
1. Context-Aware Event Filtering
   IF (user in 'gym' & 90 < hr < 120) THEN send AVG(hr)/min
2. Personalization of Rules: apply machine learning at the server to learn "normal" for user A (gym = (110, 150))
3. Anticipation-Driven Transmission: based on past behavior, if Prob(802.11 within 2 hrs) > 0.7, cache data
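The raw data rates in the table follow from bits per sample, channel count and sampling rate. The sampling rates are not given on the slide; the values below are assumptions chosen because they reproduce the tabulated figures with 1 KB = 1024 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def raw_kb_per_day(bits_per_sample, channels, samples_per_sec):
    """Raw data rate in KB/day (1 KB = 1024 bytes)."""
    bytes_per_sec = bits_per_sample * channels * samples_per_sec / 8
    return bytes_per_sec * SECONDS_PER_DAY / 1024


# (bits per sample, channels, ASSUMED samples per second)
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG": (12, 12, 256),
    "EMG": (12, 6, 1024),
}

rates = {name: round(raw_kb_per_day(*spec)) for name, spec in SENSORS.items()}
```

For example, the SpO2 row corresponds to three 375-byte packets per second: 1125 bytes/sec over 86,400 seconds is about 94,922 KB/day, which is the motivation for filtering on the device before using a GPRS uplink.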


HARMONI Architecture Key Components on Mobile Device

Lightweight Rule/Event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
  • e.g., if 60 < AVG(last 10 heart rate values) < 90 then "transmit AVG to server"
– Processed events themselves act as predicates for new rules

Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent action
– RM populates or modifies the rules in the event engine

Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server

Anticipation Mechanism
– Schedules the transmission of (compressed) data based on predicted availability of network connections and incoming sensor data rates
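The example rule for the event engine (transmit the average when the mean of the last 10 heart rate values lies between 60 and 90) can be sketched as a sliding-window filter. The class name `AverageRangeRule` and its interface are hypothetical; the real engine executes compiled DFA specifications rather than Python objects.

```python
from collections import deque


class AverageRangeRule:
    """Fires when the mean of the last `window` samples lies strictly
    between `low` and `high`; the action is to emit that mean."""

    def __init__(self, low, high, window=10):
        self.low, self.high = low, high
        self.samples = deque(maxlen=window)

    def on_sample(self, value):
        """Feed one sensor sample; return the average to transmit, or None."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return None                      # not enough history yet
        avg = sum(self.samples) / len(self.samples)
        return avg if self.low < avg < self.high else None


rule = AverageRangeRule(low=60, high=90, window=10)
sent = [rule.on_sample(hr) for hr in [70] * 10 + [300]]
```

Nothing is emitted for the first nine samples, the tenth yields an average of 70.0, and the outlier pushes the window average to 93.0, so the rule stays silent. A separate rule would be needed to escalate such out-of-range readings.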


HARMONI Functional Architecture

[Figure: functional architecture spanning Sensor, Mobile Device and Remote Server, connected by a short-range wireless link and a wireless link to the Internet. Mobile device blocks: Data Collection/Data Adapters, Data Processing, Light-Weight Pattern Recognition Engine, Action Triggering Mechanism, Intelligent Data Transmission, Anticipation Mechanism, Rule Manager, User Interface, TAPAS, with inputs Context, Sensor Readings, Sensor Control and Device Resources. Remote server blocks: DB, Pattern Recognition Engine, Pattern Learning Engine, Rule Server, External Action Triggering Mechanism, External Context Sources, External Rule Specifications, External Action Specifications.]

Simple recognition of pre-specified temporal patterns across sensor streams
– Event engine driven by Deterministic Finite Automata (DFAs)

Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders)

Compressed transmission to the server when an interface is available AND "anticipation" indicates "OK-to-Send"
– Use of intelligent compression schemes to reduce the volume of traffic
– Store-and-forward otherwise
– Data stored in memory buffer and on FLASH storage

Rule Manager coordinates with the backend server to download currently active or applicable rules
– Can run, activate or deactivate multiple rules simultaneously and dynamically
– Dynamic downloading of rules provides implicit "context awareness" at the client
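The "OK-to-Send" decision and store-and-forward behavior can be sketched as follows. This is a simplified model, not the HARMONI implementation: the `Transmitter` class and its parameters are hypothetical, the 0.7 threshold comes from the overview slide, and `zlib` (DEFLATE) merely stands in for the compression scheme.

```python
import zlib


class Transmitter:
    """Store-and-forward uplink with an 'anticipation' gate (illustrative)."""

    def __init__(self, wifi_probability_threshold=0.7):
        self.threshold = wifi_probability_threshold
        self.buffer = []          # filtered events awaiting transmission

    def ok_to_send(self, link_up, p_wifi_soon, on_wifi):
        """Send now on a cheap link, or on any link unless a cheap link is
        likely to appear soon (in which case keep caching)."""
        if not link_up:
            return False
        return on_wifi or p_wifi_soon <= self.threshold

    def submit(self, event, link_up, p_wifi_soon, on_wifi=False):
        """Buffer the event; return a compressed payload if sending now."""
        self.buffer.append(event)
        if not self.ok_to_send(link_up, p_wifi_soon, on_wifi):
            return None
        payload = zlib.compress("\n".join(self.buffer).encode("utf-8"))
        self.buffer.clear()
        return payload


tx = Transmitter()
first = tx.submit("hr_avg=72", link_up=True, p_wifi_soon=0.9)                 # cached
second = tx.submit("hr_avg=75", link_up=True, p_wifi_soon=0.9, on_wifi=True)  # flushed
```

The first event is cached because an 802.11 connection is anticipated; once that connection is present, both events are sent together in one compressed payload.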


Pattern Recognition Engine Implementation

Rules are specified using EventScript, which is similar to regular expressions

A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA)

We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications

Effectively, we are running finite state machines with some tweaks
– Unique memory heap for each DFA, accessible across states
– Run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA

[Figure labels: EventScript, Compiler (Desktop), DFA, Rule Server (Remote Server), Rule Manager and Event Engine (770). Slide tag: Context-Based Filtering.]
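A table-driven DFA with per-automaton memory and transition actions can be sketched in a few lines. This is an illustration of the general technique only: the transition table is written by hand rather than compiled from EventScript, Python callables stand in for the embedded LISP/Scheme interpreter, and all names are hypothetical.

```python
class DFA:
    """Tiny table-driven DFA with a per-automaton memory dict and optional
    actions on transitions (standing in for the embedded interpreter)."""

    def __init__(self, start, transitions, accepting):
        self.state = start
        self.transitions = transitions    # {(state, symbol): (next_state, action)}
        self.accepting = accepting
        self.memory = {}                  # heap shared across states

    def feed(self, symbol, value=None):
        """Consume one event; return True when an accepting state is reached."""
        next_state, action = self.transitions.get(
            (self.state, symbol), (self.state, None))
        if action is not None:
            action(self.memory, value)
        self.state = next_state
        return self.state in self.accepting


def remember_peak(memory, value):
    """Transition action: track the highest value seen so far."""
    memory["peak"] = max(memory.get("peak", value), value)


# Pattern: three consecutive "high" heart-rate events; "normal" resets.
TRANSITIONS = {
    ("s0", "high"): ("s1", remember_peak),
    ("s1", "high"): ("s2", remember_peak),
    ("s2", "high"): ("alarm", remember_peak),
    ("s1", "normal"): ("s0", None),
    ("s2", "normal"): ("s0", None),
}


def classify(hr):
    """Map a raw heart-rate sample to an input symbol (threshold assumed)."""
    return "high" if hr > 140 else "normal"


engine = DFA("s0", TRANSITIONS, accepting={"alarm"})
fired = [engine.feed(classify(hr), hr) for hr in [150, 100, 145, 150, 160]]
```

The automaton fires only on the fifth sample, after three consecutive high readings, and `engine.memory["peak"]` then holds the highest value seen across all states.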


HARMONI Implementation Platform

Nokia 770 Internet tablet, N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB … can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis Accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008


Impact of HARMONI on Transmission Bandwidth Idealized Context

[Figure: "Data Generated From Sensor": heart rate (bpm, axis 0-200) versus sample number (1 to 9433).]

Compression (S2) by itself results in > 50% reduction in bandwidth consumption

Filtering (S3) results in 85% reduction in bandwidth consumption

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to lack of exact pattern match of floating point values in the LZ compressor

[Figure: bar chart of data transmitted over the network (in KB, axis 0-80) for each assumed context (None, Office, Gym, Office/Gym), uncompressed and with LZ-78 compression. Bar labels in extraction order: 70.30, 29.06, 48.90, 19.59, 29.64, 17.02, 14.54, 22.04.]


Result 2 Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute, axis 0-200) versus time.]

Sensor streams exhibit significant statistical variation across individuals

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to lack of exact pattern match of floating point values in the LZ compressor

[Figure: bar chart of data transmitted over the network (in KB, axis 0-100) for User 2-1 and User 2-2, with four series: Uncompressed; Compressed; Uncompressed, Generic Context-filtering; Uncompressed, Personalized Context-filtering. Bar labels in extraction order: 95.10, 65.00, 34.94, 23.02, 20.18, 19.02, 11.84, 9.26.]


HARMONI in Practice Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' threshold for 3 distinct states (0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients

Bandwidth savings between 26% and 73% for our sample population
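One way to read the slide is that each activity state has its own 'normal' heart rate range, and only readings outside the range for the current state need to be sent. The sketch below follows that reading. The amplitude cutoffs are placeholder assumptions (the slide gives none), and the function names are hypothetical.

```python
# 'Normal' heart-rate range per activity state, as listed on the slide.
NORMAL_HR = {"sitting": (0, 70), "walking": (70, 110), "running": (110, 170)}


def classify_activity(accel_amplitude, walk_cutoff=0.5, run_cutoff=1.5):
    """Map accelerometer amplitude to a state. The cutoffs are placeholder
    assumptions, not values from the study."""
    if accel_amplitude < walk_cutoff:
        return "sitting"
    return "walking" if accel_amplitude < run_cutoff else "running"


def filter_samples(samples):
    """Keep only (amplitude, hr) samples whose HR is outside the normal
    range for the inferred state; only these would be sent uplink."""
    kept = []
    for amplitude, hr in samples:
        low, high = NORMAL_HR[classify_activity(amplitude)]
        if not low <= hr <= high:
            kept.append((amplitude, hr))
    return kept


samples = [(0.1, 65), (0.1, 95), (1.0, 95), (2.0, 150), (2.0, 180)]
abnormal = filter_samples(samples)
```

A heart rate of 95 is kept when the user appears to be sitting but dropped when the user is walking, which is the kind of context sensitivity that yields the reported bandwidth savings.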


Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE


Century Research Goals and Technologies

Our research goal is to build an Online Health Analytic Infrastructure for the healthcare ecosystem that supports its various services

Challenges and corresponding architectural goals
– Integrating massive volumes of disparate data: Extensible Data Model, Scalable Infrastructure
– Need for sophisticated analytics: Open and scalable analysis framework
– Growing collaboration across ecosystem: Enterprise sharing of sensor data
– Integrity of data & tracking the analytic data: Quality of Events (QoE) and provenance support for backtracking & auditing
– Timely feedback: Real-time Processing
– Large number of concurrent users: Scalable device and user management
– Privacy and Security: Protection of personal medical data

Technologies & Approaches
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality Of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform


Analysis Framework (1) The Approach

Current Solutions

Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials

New Requirements

Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)

Scalability
– Handles large numbers of patients

Robustness
– Resilient to sensor or infrastructure failures
  • Exacerbated in remote monitoring settings

Easy to Use
– Separate medical domain knowledge from IT knowledge

Our Innovations

Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics

Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics

Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams

Century supports an open set of data sources
– Currently supporting ECG, BP, Glucose, Weight, SpO2, Temperature

[Figure: analysis jobs drawn as graphs of analysis components (PEs) fed by a Data Source/Source PE. Node labels include SPA, SPM, DFE, PSD, INQ, GUI, NES, ES, IPN, VBF, DUP, DUPP, WLG, SGD, PCP, JAE, DSN, SCF, SSF, SLF, PFA, JoinCC, JoinSC, JoinL, N(CE)S^2 and "Mark".]

Major Practical Benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)


Analysis Framework (2) PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes the SDO to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs

[Figure: example PE graph from "Input: ECG" to "Output: Alert", with PEs labeled Normalization, Fast Fourier Transform, AR PE and Alert Generator; annotation: "Neural network trained during initialization".]
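The division of labor described above (developers write per-PE logic, the runtime moves data between PEs) can be illustrated with a toy pipeline. This is not the System S API: the function names are hypothetical, plain Python lists stand in for SDOs, a naive DFT stands in for the FFT, and the AR and neural-network stages shown on the slide are omitted.

```python
import cmath
import math


def normalize(samples):
    """Normalization PE: zero-mean, unit-peak scaling."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    peak = max(abs(c) for c in centered) or 1.0
    return [c / peak for c in centered]


def dft_magnitudes(samples):
    """Transform PE: naive O(n^2) DFT magnitudes (a stand-in for an FFT)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2)]


def alert_generator(magnitudes, bin_index, threshold):
    """Alert PE: emits an alert object if one frequency bin is too strong."""
    if magnitudes[bin_index] > threshold:
        return {"type": "alert", "bin": bin_index,
                "power": magnitudes[bin_index]}
    return None


def run_pipeline(sdo, stages):
    """Minimal stand-in for the runtime: passes each PE's output to the next."""
    for stage in stages:
        sdo = stage(sdo)
    return sdo


# Synthetic input: 32 samples of a sinusoid with 4 cycles, plus an offset.
signal = [5.0 + math.sin(2 * math.pi * 4 * i / 32) for i in range(32)]
alert = run_pipeline(signal, [
    normalize,
    dft_magnitudes,
    lambda mags: alert_generator(mags, bin_index=4, threshold=8.0),
])
```

Each stage knows nothing about where its input comes from or where its output goes, which is the separation between medical analysis logic and infrastructure that the framework aims for.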


Century Event Management Service

[Figure: Century server infrastructure diagram, with blocks for the USN Gateway, Toolkit, Patient/Administrator/Stakeholder Portals, Solution Delivery Services, Portal Service, Platform Service, Analysis Framework, Event Management Service (Event Preprocessor, Event Filter, Annotator, Event Storage Manager, Event Store Query Service, Subscription Service), Provenance Service, Group Management Service, Interoperability Container and Storages.]


Event Management Service (1)

The Problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders

Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language

Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure: for each approach, an Annotator writes events arriving at times 1, 2, 3; in the second approach each event is stored as XML.]


Event Management Service (3) Hybrid Storage Model

Potential Approaches (continued)
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate

Let μ be the service rate of the database
Let λ be the arrival rate at the database
Let ρ = λ/μ

[Figure: arrival rate of traffic over time, compared with the average arrival rate and the average database service rate; the instability region is marked.]

[Figure: an Annotator emits XML events at times 1, 2, 3 into a Stream Buffer. Question posed: how do we design the stream buffer?]

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007
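To see why operating near ρ = 1 is problematic, consider an idealized single-server queue. With Poisson arrivals at rate λ (as assumed on the slide) and, additionally, exponentially distributed service times at rate μ (an assumption not stated on the slide), the standard M/M/1 results give the mean number of requests in the system, L, and the mean time a request spends in the system, W:

```latex
\rho = \frac{\lambda}{\mu}, \qquad
L = \frac{\rho}{1-\rho}, \qquad
W = \frac{1}{\mu - \lambda}, \qquad \rho < 1
```

Both quantities grow without bound as ρ approaches 1: ρ = 0.5 gives L = 1, ρ = 0.9 gives L = 9, and ρ = 0.99 gives L = 99. Bursts that briefly push the arrival rate above μ therefore need to be absorbed before they reach the database.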

IBM Research

copy 2007 IBM Corporation

Event Management Service (4) The Stream Buffer

System S STG(File System Store)

Traffic Monitor

Storage Requests(events of type ldquoeventStorerdquo)

DatabasePE

Traffic Models Traffic

Predictor Event Store Load Monitor

Storage Requests

PE1

PE2

PE3

PE4

PE5SPE

Analysis Graph

Towards the delivery system

Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database

EventStore

(Relational-XML

IBM Research

copy 2007 IBM Corporation

Century Provenance Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee's reaction: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly. The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).

Provenance (1): Two Approaches to Provenance

What is process provenance?
• The ability to retrace which "system components" were on the data path.
– Example reports to the Provenance Server/Store: PE1 says "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2 says "I received some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data.
– Example: PEj, which receives Data1 and Data2 from sensors, says "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

Provenance (2): Prior Work on Provenance

[Chart: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates.]

• EU Provenance (PASOA, PreServ): captures workflow bindings.
• Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance.
• Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations.
• Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level.
• File Systems (PASS, LFS): captures system calls and modifications to file records; annotation per file.
• Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput.

Provenance Challenges Unique to Streams

• Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight.
• Processing efficiency: generating timestamps and a large amount of metadata per stream event will reduce system throughput.
• Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data.
• Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time.

We need a provenance system that:
– Is reactive, generating provenance events only when things change.
– Does not require annotating large amounts of metadata per sensor sample.
– Can accommodate variations in the statefulness model.

We need to support both:
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei."
– Data provenance: "Show me stream segments related to the generation of alert Ei."

TVC Model of Hybrid Provenance

• Associate an "equation-based" specification of data dependency between every output port of an individual PE and the corresponding input ports.
– The dependency is thus not specified on every data element, but remains constant for the same processing logic.
• Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows.
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ:

$$\mathrm{out}_i(t) \;\Leftarrow\; \bigcup_{s_j \,\in\, \text{inputs to Component}_i} \mathrm{inp}_j\,(t - \Delta_j,\; t)$$

[Figure: a Processing Element with input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), producing output stream Si (samples Si(t), Si(t-1), Si(t-2)).]

• The dependency equation can, however, vary based on the PE's internal state or some external context. For example, with the state defined by location context:

IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
ELSE generate ALERT iff AVG(HR(t-30, t)) > 90

– The solution is to allow multiple functions, each associated with a distinct range of states.
– Store the state as metadata with the data.

TVC dependency rule blocks (state defined by location context):

Rule Block ID | PE State | Dependency Rule
1             | "gym"    | Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2             | "home"   | Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
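A minimal sketch of the state-dependent rule on this slide, assuming heart-rate samples are held in a dictionary keyed by integer timestamp. The names `RULE_BLOCKS`, `rule_for` and `evaluate_alert` are hypothetical and not part of Century; the windows and thresholds are taken from the gym/else example. The function returns the alert decision together with the timestamps inside the active window, which is the set a TVC dependency rule would name as causative.

```python
from statistics import mean

# Hypothetical rule blocks keyed by location state: (window length, HR threshold).
RULE_BLOCKS = {"gym": (10, 140)}
DEFAULT_RULE = (30, 90)  # the ELSE branch of the slide's example rule


def rule_for(state):
    """Pick the rule block that applies to the PE's current state."""
    return RULE_BLOCKS.get(state, DEFAULT_RULE)


def evaluate_alert(hr_by_time, t, state):
    """Return (alert?, causative timestamps) for the decision made at time t."""
    window, threshold = rule_for(state)
    causative = [ts for ts in sorted(hr_by_time) if t - window <= ts <= t]
    if not causative:
        return False, []
    average = mean(hr_by_time[ts] for ts in causative)
    return average > threshold, causative


if __name__ == "__main__":
    readings = {ts: 150 for ts in range(0, 31)}
    print(evaluate_alert(readings, 30, "gym"))
```

Because the window is chosen from the stored state, the same lookup can be repeated later to reconstruct which samples an alert depended on, without annotating each sample.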

Resolving TVC-based Provenance Queries

• Reconstruct provenance only in response to a corresponding query.
– For efficiency, the relevant metadata is distributed across multiple DBs.
– Tradeoff: faster provenance storage processing, in exchange for higher provenance reconstruction overhead.
• To resolve a query:
– Determine the IDs of the input ports and the causative stream bindings.
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation.
– Apply the TVC filter to obtain the causative set of elements.

Stream and Port Mapping (SAPM), populated while the system runs:
1. Store bindings between every input/output port and the associated streams (Dynamic Stream Mapping Table, e.g. t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6).
2. Store creation/modification times of PEs.
3. Store the TVC function f() for every output port/input port (PC Dependency Table, e.g. PEj, f1(), Po1).

Data Store examples: element45 (ts=15, SID=12, state="gym"); element66 (ts=16, SID=17, state="home").

Query(Ei) at the Provenance Query Server:
1. Obtain I/O stream mappings.
2. Retrieve the dependency equation.
3. Retrieve elements.
4. Apply the equation to determine the set of causative elements, then return that set.
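The four query steps can be sketched with in-memory tables. This is an illustration under stated assumptions, not the Century implementation: the table layouts (`BINDINGS`, `DEPENDENCY`, `ELEMENTS`), the port and stream names, and the function names are all hypothetical, and each input port is assumed to stay bound to a single stream across the dependency window.

```python
# (bind_time, port, stream_id): the stream is bound to the port from bind_time on.
BINDINGS = [(10, "Pi1", "S1"), (15, "Pi1", "S15"), (10, "Pi2", "S7")]

# output port -> {input port: (start_back, end_back)}, i.e. window [t - start_back, t - end_back].
DEPENDENCY = {"Po1": {"Pi1": (2, 0), "Pi2": (1, 0)}}

# Stored stream elements: (timestamp, stream_id, value).
ELEMENTS = [
    (13, "S7", 97), (14, "S1", 70), (15, "S15", 72),
    (16, "S15", 75), (16, "S7", 98),
]


def stream_bound_to(port, t):
    """Step 1: find the stream most recently bound to `port` at or before t."""
    candidates = [(ts, sid) for ts, p, sid in BINDINGS if p == port and ts <= t]
    return max(candidates)[1] if candidates else None


def resolve(output_port, t):
    """Steps 2-4: fetch the dependency rule, retrieve elements, keep the causative ones."""
    causative = []
    for in_port, (start_back, end_back) in DEPENDENCY[output_port].items():
        sid = stream_bound_to(in_port, t)
        causative += [
            e for e in ELEMENTS
            if e[1] == sid and t - start_back <= e[0] <= t - end_back
        ]
    return sorted(causative)


if __name__ == "__main__":
    print(resolve("Po1", 16))
```

All of the work happens at query time; nothing is attached to individual elements when they are stored, which matches the storage-versus-reconstruction tradeoff stated on the slide.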

Reference: M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.

Key Research Benefits of TVC Model for Provenance

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation).
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in the EU Provenance/Karma systems).
– Statefulness: time and value functions allow a compact representation of long dependency windows.
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder).

TVC model overhead reduces when:
1. Monitoring is long-lived (over weeks).
2. Per-instance state is minimized.
3. The number of monitored individuals increases.

Other provenance systems cannot handle O(100 Kevents/sec) system throughput.

Quality of Events in Century

[Figure: Century server infrastructure block diagram. Labeled blocks include the USN Gateway, Toolkit, Patient/Administrator/Stakeholder Portals, Portal Service, Event Management Service, Provenance Service, Analysis Framework (with QoE Manager and State Manager), Platform Service, Interoperability Container, Storages and Solution Delivery Services.]

Quality of Events in Data Streams

• Prior work has focused on defining streams in terms of their output data type.
– This enables a pub-sub model of stream binding.
• However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Types of dynamism include:
– Sensor (source) impairments: sensor faults (transient or persistent).
– Transmission impairments: loss of stream elements, or delay/order reversal of streams.
• Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts.
– Modification of analysis logic: the resolution of an analysis component (e.g. the coefficients of a Fourier Transform, or the set of features used) may be varied depending on stream quality.
– Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality.

[Figure: a graph of PEs raises an ALERT at time t, prompting the question "What generated this alert, and can it be trusted?" One sensor is faulty and generates stream elements of low quality (remedy: replace the sensor). On another input, stream elements are missing (possibly due to network issues).]

Century QoE: Some Basic QoE Measures

• Expected vs. received input: by correlating what was expected with what was received, we can estimate the quality of our input. For example, if the expected input is the elements at t-5, t-4, t-3, t-2, t-1 and t, but the current input holds only t-5, t-2 and t, we can say that the output of the PE has low quality, since half of the expected input is missing.
• Timestamp correlation across inputs: another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, if one input holds elements at t-12, t-11 and t-10 while the other holds t-5, t-2 and t, we can say that the output of the PE has low quality, since the two inputs are out of sync.
• Consumed vs. "useful" input: finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, if the current input holds t-5 through t but only the element at t is useful, we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
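The three measures can be written as simple ratios over timestamps. The sketch below is only one possible formulation, assuming integer timestamps; the function names and the exact formulas (set-overlap ratios and a newest-element gap) are illustrative choices, not definitions from Century.

```python
def completeness(expected, received):
    """Fraction of the expected timestamps that actually arrived."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(received)) / len(expected)


def sync_gap(times_a, times_b):
    """Gap between the newest elements of two inputs (0 means in sync)."""
    return abs(max(times_a) - max(times_b))


def useful_fraction(consumed, useful):
    """Fraction of consumed elements that actually triggered the output."""
    consumed = set(consumed)
    if not consumed:
        return 0.0
    return len(consumed & set(useful)) / len(consumed)


if __name__ == "__main__":
    t = 100
    window = list(range(t - 5, t + 1))
    print(completeness(window, [t - 5, t - 2, t]))                      # half missing
    print(sync_gap([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))        # inputs out of sync
    print(useful_fraction(window, [t]))                                 # mostly circumstantial
```

A PE could combine such scores into a single QoE value attached to its output, though how to weight them is left open here.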

Open Challenges in Health Stream Collection and Diagnostics

• Model-driven acquisition of data from sensor feeds.
– Many environments (e.g. retirement homes) exhibit redundancy in sensor type and coverage.
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2 and BP monitors.
• Sensors have different, time-varying QoE.
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors).
• Provenance architectures must address the prohibitive cost of storing all stream data.
– Much of the intermediate analysis can be re-created if the state of the analysis components can be stored.
• Better techniques are needed for establishing provenance relationships.
– User-specified models may not always be available: learn provenance from observed behavior.
– Explicit provenance may not be an accurate reflection of the analysis logic.
• Provenance data itself is sensitive and subject to privacy requirements.
– The right to access provenance may depend on the current "role" of the provider and the user's medical history.

Open questions:
– How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
– How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
– How to develop a combination of hybrid model/replay-based provenance?
– How to define context-dependent access-control models of provenance data in a multi-provider environment?

Conclusions

• Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay.
• The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure.
– Key features include adaptive event filtering, personalization and predictive data transmission.
• The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics.
– Key features include high-volume stream support, provenance and event/information quality.

  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

Chapter 2

– The Motivation and Challenges for Remote Monitoring and Analytics
– Harmoni: Context-based Event Processing on the Mobile Hub
– Century Technologies: Stream Storage, Provenance and QoE

Harmoni (Healthcare Adaptive Remote Monitoring): Overview

[Figure: sensors reach the mobile hub over a PAN (Bluetooth); the hub sends data onward over a WAN (GPRS).]

Type of Sensor Device | Bits per sensor sample | Channels per device | Raw data rate (KB/day)
GPS                   | 1408                   | 1                   | 14850
SpO2                  | 3000                   | 1                   | 94922
EKG (cardiac)         | 12                     | 6                   | 194400
Accelerometer         | 64                     | 3                   | 202500
EEG (brain)           | 12                     | 12                  | 388800
EMG (muscle)          | 12                     | 6                   | 777600
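As a worked check of the table, the raw daily volume follows from bits per sample, channels and sampling rate. The table does not list sampling rates, so the rates below are assumptions: the SpO2 rate uses the three 375-byte (3000-bit) packets per second quoted for the Nonin monitor elsewhere in this deck, while the GPS (1 sample/s) and EKG (256 samples/s) rates are back-computed guesses that happen to reproduce the table values. The function name `raw_kb_per_day` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def raw_kb_per_day(bits_per_sample, channels, samples_per_second):
    """Raw sensor output in KB/day (1 KB = 1024 bytes)."""
    bits_per_day = bits_per_sample * channels * samples_per_second * SECONDS_PER_DAY
    return bits_per_day / 8 / 1024


if __name__ == "__main__":
    spo2 = raw_kb_per_day(3000, 1, 3)   # 3 packets/s of 375 bytes each
    gps = raw_kb_per_day(1408, 1, 1)    # assumed 1 sample/s
    ekg = raw_kb_per_day(12, 6, 256)    # assumed 256 samples/s per channel
    print(round(spo2), round(gps), round(ekg))  # 94922 14850 194400
```

Even the smallest stream is roughly 14 MB/day uncompressed, which is the motivation for filtering on the mobile hub rather than relaying every sample.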

HARMONI Novel Features (motivated by the need to reduce uplink transmission rates/power)

1. Context-Aware Event Filtering: IF (user in "gym" & 90 < hr < 120) THEN send AVG(hr)/min.
2. Personalization of Rules: apply machine learning at the server to learn what is "normal" for each user (e.g. for user A, gym = (110, 150)).
3. Anticipation-Driven Transmission: based on past behavior, if Prob(802.11 within 2 hrs) > 0.7, cache the data.
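The context-aware filtering rule can be sketched as a per-minute reducer. This is a hedged illustration: `filter_minute` is a hypothetical name, and applying the 90-120 band to every sample in the minute (rather than to the average) is an assumption, since the slide does not say which is meant.

```python
def filter_minute(context, hr_samples, low=90, high=120):
    """Return what to transmit for one minute of heart-rate samples.

    In the "gym" context, a minute whose samples all fall inside the expected
    band is reduced to its average; anything else is passed through unchanged.
    """
    in_band = all(low < hr < high for hr in hr_samples)
    if context == "gym" and hr_samples and in_band:
        return [sum(hr_samples) / len(hr_samples)]
    return list(hr_samples)


if __name__ == "__main__":
    print(filter_minute("gym", [100, 110, 105]))     # one averaged value
    print(filter_minute("gym", [100, 130]))          # out of band: send raw
    print(filter_minute("office", [100, 110]))       # other context: send raw
```

Personalization would replace the fixed `low` and `high` defaults with per-user bounds learned at the server.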

HARMONI Architecture: Key Components on Mobile Device

• Lightweight Rule/Event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action.
  • e.g. if 60 < AVG(last 10 heart rate values) < 90, then "transmit AVG to server".
– Processed events themselves act as predicates for new rules.
• Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent action.
– The RM populates or modifies the rules in the event engine.
• Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server.
• Anticipation Mechanism
– Schedules the transmission of (compressed) data based on the predicted availability of network connections and incoming sensor data rates.
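A rough sketch of how an anticipation mechanism might decide between caching and sending immediately. The probability estimate (fraction of past observations in which an 802.11 connection appeared within the look-ahead period), the buffer check, and the names `estimate_probability` and `plan_transmission` are all assumptions for illustration; only the idea of caching when a cheaper link is likely, and the 0.7 threshold from the deck's anticipation example, come from the source.

```python
def estimate_probability(history):
    """Fraction of past observations in which Wi-Fi became available in time."""
    return sum(1 for seen in history if seen) / len(history) if history else 0.0


def plan_transmission(p_wifi_soon, pending_bytes, free_buffer_bytes, threshold=0.7):
    """Cache data when a cheaper link is likely soon and the buffer can hold it."""
    if p_wifi_soon > threshold and pending_bytes <= free_buffer_bytes:
        return "cache"
    return "send_now"


if __name__ == "__main__":
    p = estimate_probability([True, True, False, True])
    print(p, plan_transmission(p, pending_bytes=1000, free_buffer_bytes=5000))
```

The buffer check reflects the slide's point that scheduling depends on incoming sensor data rates as well as on predicted connectivity.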

HARMONI Functional Architecture

[Figure: functional block diagram. A sensor connects over a short-range wireless link to the mobile device, which connects over a wireless link to the Internet to the remote server. Mobile device blocks: Data Collection/Data Adapters, Light-Weight Pattern Recognition Engine, Rule Manager, Action Triggering Mechanism, Intelligent Data Transmission, Anticipation Mechanism and User Interface, with context, sensor readings, sensor control and device resources as inputs. Remote server blocks: Data Processing, DB, Pattern Recognition Engine, Pattern Learning Engine, Rule Server, TAPAS and External Action Triggering Mechanism, with external context sources, external rule specifications and external action specifications.]

• Event engine: simple recognition of pre-specified temporal patterns across sensor streams, driven by Deterministic Finite Automata (DFAs).
• Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders).
• Transmission: compressed transmission to the server when an interface is available AND "anticipation" indicates "OK-to-Send"; store-and-forward otherwise.
– Intelligent compression schemes reduce the volume of traffic.
– Data is stored in a memory buffer and on FLASH storage.
• Rule Manager: coordinates with the backend server to download currently active or applicable rules.
– Can run, activate or deactivate multiple rules simultaneously and dynamically.
– Dynamic downloading of rules provides implicit "context awareness" at the client.

Pattern Recognition Engine Implementation

• Rules are specified using EventScript, which is similar to regular expressions.
• A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA).
• We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications.
• Effectively, we are running finite state machines with some tweaks:
– A unique memory heap for each DFA, accessible across states.
– A run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA.
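The run-time idea (a DFA with its own memory and with actions attached to initialization and transitions) can be illustrated in a few lines. This is not EventScript or the HARMONI engine: the `Dfa` class, the event kinds and the example rule are hypothetical, and plain Python callables stand in for the LISP/Scheme interpreter.

```python
class Dfa:
    """A tiny DFA executor with per-DFA memory shared across states."""

    def __init__(self, start, transitions, on_init=None):
        self.state = start
        # {(state, event_kind): (next_state, action or None)}
        self.transitions = transitions
        self.memory = {}
        if on_init:
            on_init(self.memory)

    def feed(self, kind, value=None):
        """Consume one event; return the transition action's result, if any."""
        key = (self.state, kind)
        if key not in self.transitions:
            return None  # events with no matching transition are ignored
        self.state, action = self.transitions[key]
        return action(self.memory, value) if action else None


def init(memory):
    memory["highs"] = 0


def count_high(memory, value):
    memory["highs"] += 1
    return "ALARM" if memory["highs"] >= 3 else None


def reset(memory, value):
    memory["highs"] = 0


# Example rule: raise an alarm on the third consecutive "high" heart-rate event.
RULE = {
    ("calm", "high"): ("elevated", count_high),
    ("elevated", "high"): ("elevated", count_high),
    ("elevated", "normal"): ("calm", reset),
    ("calm", "normal"): ("calm", None),
}

if __name__ == "__main__":
    dfa = Dfa("calm", RULE, on_init=init)
    events = ["high", "high", "normal", "high", "high", "high"]
    print([dfa.feed(kind) for kind in events])
```

Keeping the counter in `memory` rather than encoding it in extra states is the kind of "tweak" the slide describes: the automaton stays small while the actions carry the arithmetic.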

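[Figure: on the desktop, the Compiler translates EventScript into a DFA; the Rule Server on the remote server delivers it to the Rule Manager on the Nokia 770, which loads it into the Event Engine.]

Context-Based Filtering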

HARMONI Implementation Platform

• Nokia 770 Internet tablet / N800
– ARM processor, Linux-based.
– High-resolution display (800x480), touch screen with up to 65,536 colors.
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB; can be used for virtual memory).
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces.
– Relatively cheap: $350.
– http://www.maemo.org provides open-source software and a development environment.
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org).
• Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation.
– Supports the Bluetooth Serial Port Profile (SPP).
– 120 hours of continuous operation with 2 AA batteries.
– Three packets transmitted per second, where each packet is 375 bytes.
• WiTilt 3-axis accelerometer
– Output baud of 57.6 Kbps.
– 40 mA consumption when operating.
• DeLorme Earthmate GPS

Reference: I. Mohomed, A. Misra, W. Jerome and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.

Impact of HARMONI on Transmission Bandwidth: Idealized Context

[Figure: "Data Generated From Sensor": heart rate (bpm, axis 0-200) against sample number (1 to 9433).]

[Figure: bar chart of data transmitted over the network (in KB, axis 0-80) for each assumed context (None, Office, Gym, Office/Gym), uncompressed and with LZ-78 compression. Bar values in the order they appear: 70.30, 29.06, 48.90, 19.59, 29.64, 17.02, 14.54, 22.04.]

• Compression (S2) by itself results in a > 50% reduction in bandwidth consumption.
• Filtering (S3) results in an 85% reduction in bandwidth consumption.
• The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in LZ.

Result 2: Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute, axis 0-200) against time.]

[Figure: bar chart of data transmitted over the network (in KB, axis 0-100) for User 2-1 and User 2-2 under four schemes: uncompressed; compressed; uncompressed with generic context filtering; uncompressed with personalized context filtering. Bar values in the order they appear: 95.10, 65.00, 34.94, 23.02, 20.18, 19.02, 11.84, 9.26.]

• Sensor streams exhibit significant statistical variation across individuals.
• The improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor.

HARMONI in Practice: Sensor-based Context

• Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running.
– Higher "normal" threshold for the 3 distinct states (0-70, 70-110, 110-170) across all users.
– Aside: accelerometer readings are also used to classify "falls" for elderly patients.
• Bandwidth savings of between 26% and 73% for our sample population.

Chapter 3

– The Motivation and Challenges for Remote Monitoring and Analytics
– Harmoni: Context-based Event Processing on the Mobile Hub
– Century Technologies: Stream Storage, Provenance and QoE

Century Research Goals and Technologies

Our research goal is to build an online health analytic infrastructure that supports the various services of the healthcare ecosystem.

Challenges:
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across the ecosystem
– Integrity of data and tracking of the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security

Architectural goals:
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking and auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data

Technologies and approaches:
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform

Analysis Framework (1): The Approach

Current solutions: vertically designed systems for the analysis of health monitoring data exist today.
– Closed: custom-built for a specific diagnosis.
– Non-extensible: support for limited sensor devices.
– Small patient footprint: only good for trials.

New requirements:
• Extensible
– To new data sources (e.g. EMG, EEG, accelerometer, activity).
– To new analysis algorithms (e.g. myocardia, polyneuropathy, Alzheimer).
• Scalability
– Handles large numbers of patients.
• Robustness
– Resilient to sensor or infrastructure failures, which are exacerbated in remote monitoring settings.
• Easy to use
– Separates medical domain knowledge from IT knowledge.

Our innovations:
• Framework built on top of IBM System S, a state-of-the-art middleware for high-volume stream processing analytics.
• Programming model centered around the System S concept of a Processing Element (PE).
– PEs represent atomic stream transformers.
– Stream data flow with pub-sub semantics.
• Century also provides useful medical analytic services.
– Persistence of state (e.g. analyze HR variations over 6 months).
– Management of patient data (e.g. a patient's intermediate diagnosis).
– Computation and rebinding based on the quality of events from medical streams.
• Century supports an open set of data sources.
– Currently supporting ECG, BP, glucose, weight, SpO2 and temperature.

[Figure: several analysis jobs, each drawn as a graph of analysis components (PEs) that starts at a data source and its source PE.]

Major practical benefit: separates the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.).

Analysis Framework (2): PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs).
• When an SDO is received, System S routes it to the appropriate PE.
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs.

[Figure: example PE graph with the blocks Input ECG, Normalization, Fast Fourier Transform, AR PE, Alert Generator and Output Alert; a neural network is trained during initialization.]

Century Event Management Service

[Figure: Century server infrastructure block diagram. Labeled blocks include the USN Gateway, Toolkit, Patient/Administrator/Stakeholder Portals, Portal Service, Event Management Service (Event Preprocessor, Event Storage Manager, Event Store Query Service, Subscription Service), Provenance Service, Analysis Framework, Platform Service, Interoperability Container, Storages and Solution Delivery Services.]

Event Management Service (1)

The problem:
– Persistence of a large amount of streaming data.
– Efficient query mechanisms for stakeholders.

Requirements:
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language

Potential approaches:
1. Relational DB (annotation table): not extensible; limited scalability.
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates.

Event Management Service (3): Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions:
– Traffic is bursty, modeled by a Poisson process.
– Average arrival rate < average database service rate.

Let μ be the service rate of the database, let λ be the arrival rate at the database, and let ρ = λ/μ.

[Figure: traffic arrival rate over time, with the average arrival rate and the average database service rate marked; the region labeled "instability region" is shown on the plot.]

Open design question: how do we design the stream buffer?

Reference: D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.

Event Management Service (4): The Stream Buffer

Approach:
– Keep the Relational-XML database in the design, for extensibility and efficient query mechanisms.
– Avoid driving the database into the instability region.
– Throttle the traffic arriving at the database.

[Figure: PEs in the analysis graph (PE1 to PE5, SPE) send results towards the delivery system and emit storage requests (events of type "eventStore"). Labeled blocks on the storage path: Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, Database PE, System S STG (file system store) and the Event Store (Relational-XML DB).]
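The throttling idea can be sketched as a buffer that releases at most a fixed number of storage requests per tick, so a burst never reaches the database faster than it can be served. This is a simplified illustration, not the Century stream buffer: the `StreamBuffer` class, the tick-based model and the fixed `max_rate` are assumptions, and the real design also involves traffic prediction and load monitoring.

```python
from collections import deque


class StreamBuffer:
    """Absorb bursts and release events to the database at a bounded rate."""

    def __init__(self, max_rate):
        self.max_rate = max_rate  # events the database may receive per tick
        self.queue = deque()

    def offer(self, events):
        """Accept a (possibly bursty) batch of storage requests."""
        self.queue.extend(events)

    def drain(self):
        """Release at most max_rate events for this tick; the rest stay buffered."""
        count = min(self.max_rate, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]


if __name__ == "__main__":
    buffer = StreamBuffer(max_rate=3)
    buffer.offer(range(5))        # burst of 5 requests
    print(buffer.drain())         # first 3 go to the database
    buffer.offer([5])
    print(buffer.drain())         # backlog plus the new request
```

Choosing `max_rate` below the database service rate μ keeps the load seen by the database under ρ = 1, at the cost of added storage latency during bursts.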

Century Provenance Service

[Figure: Century server infrastructure block diagram. Labeled blocks include the USN Gateway, Toolkit, Patient/Administrator/Stakeholder Portals, Portal Service, Event Management Service, Provenance Service (Provenance Storage Manager, Provenance Query Manager, Provenance Query Service, Provenance Access Control, Resolver), Analysis Framework, Platform Service, Interoperability Container, Storages and Solution Delivery Services.]

IBM Research

copy 2006 IBM Corporation26

An Example of Data Provenance Use

Urgent AlertPatient Doe John

Condition Abnormal ReactionRecommendation

The Century System has found a developing issue in your patientrsquos

medical condition The condition of the patient is deteriorating A known

side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day

Ultimately the physician is responsible for medical decisions and any actions taken In order

for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition

With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe

A few days later Dr Lee receives an alert from the Century prescription program

The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)

Thatrsquos unusual and a large decrease Before agreeing to

this change I need to understand on what basis the

system has made this recommendation and how

accurate it is

IBM Research

copy 2006 IBM Corporation

Provenance (1) Two Approaches to Provenance

f1

f2

f3

Provenance ServerStore

I ran algorithm f1 with parameters uv and w at time t0 I published on port 1

PE1

PE2

I recvd some data on port 2 I ran algorithm f3 on the data at time t1

What is process provenancebull The ability to retrace

which ldquosystem componentsrdquowere on the data path

What is data provenancebull The ability to retrace

which ldquodata elementsrdquo ledto generation of output data

PEj

Sensor

Sensor

Data2

DataOut4

Data1 Data2

I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2

Data1

Data1

IBM Research

copy 2006 IBM Corporation

Provenance (2): Prior Work on Provenance

[Figure: prior systems placed on two axes. Horizontal axis: Event and Processing Rates. Vertical axis: Type of Provenance Reconstruction, from process provenance to data provenance.]

EU Provenance (PASOA, PreServ)
– Captures workflow bindings

Scientific SOA Workflows (KARMA)
– Workflow and application binding notifications
– Stateless, application-defined data provenance

Database View Inversion (Trio, Widom)
– Specific transforms and extensions to SQL
– Data dependencies specified statically for relations

Stream Provenance (Simhan)
– Stores temporal history of stream connectors as a per-stream stack
– System-level automatic metadata collection at stream level

File Systems (PASS, LFS)
– Captures system calls and modifications to file records
– Annotation per file

Medical Stream Provenance (Century)
– Arbitrary logic for individual PEs (statefulness and long memory)
– Dynamic stream bindings
– High processing throughput


Provenance Challenges Unique to Streams

Storage efficiency: the high volume of data implies that “per-record” annotation is very heavyweight.

Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput.

Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data.

Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time.

We need a provenance system that:
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model

We need to support both:
– Process provenance: “Show me the set of components that are involved in the generation of alert Ei.”
– Data provenance: “Show me stream segments related to the generation of alert Ei.”


TVC Model of Hybrid Provenance

Associate an ‘equation-based’ specification of data dependency between every output port of an individual PE and the corresponding input ports.
– The dependency is thus not specified on every data element, but remains constant for the same processing logic.

Provenance data is recorded at the granularity of stream “sub-sets”: dependence over sets of time windows.
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ.

$$ out_i(t) \;\Leftarrow\; \bigcup_{s_j \text{ is input to Component}_i} \{\, inp_j(t_k) : t_k \in [\,t - \Delta,\ t\,] \,\} $$

[Figure: a Processing Element with input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (samples Si(t), Si(t-1), Si(t-2)).]

The dependency equation can, however, vary based on:
– The PE’s internal state or some external context, for example:
  IF loc = “gym” THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
  ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)
– Solution is to allow multiple functions, each associated with a distinct range of states.
– Store the state as metadata with the data.

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID   PE State   Dependency Rule
1               “gym”      Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2               “home”     Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
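
The state-dependent rule blocks from this slide can be sketched in a few lines of Python. This is only an illustration, assuming integer timestamps and windows expressed as offsets back from the output time t; the names `Window`, `RULE_BLOCKS` and `causative_windows` are hypothetical and are not part of Century.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    """Closed interval [t - start_back, t - end_back] on one input stream."""
    stream: str
    start_back: int  # how far back from t the window starts
    end_back: int    # how far back from t the window ends (0 means "up to t")

# The two rule blocks from the slide's example table, keyed by PE state.
RULE_BLOCKS = {
    "gym": [Window("Sj", 0, 0), Window("Sj", 3, 2), Window("Sk", 2, 1)],
    "home": [Window("Sj", 2, 0), Window("Sk", 1, 0)],
}

def causative_windows(state: str, t: int) -> list[tuple[str, int, int]]:
    """Return (stream, t_from, t_to) intervals that output Ei(t) depends on."""
    return [(w.stream, t - w.start_back, t - w.end_back) for w in RULE_BLOCKS[state]]

print(causative_windows("home", 10))  # [('Sj', 8, 10), ('Sk', 9, 10)]
```

Because the rule is stored once per state rather than once per data element, only the state label has to travel with each element.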


Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to a corresponding query.
– For efficiency, the relevant metadata is distributed across multiple DBs.
– Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead.

Resolution steps:
– Determine the ID of input ports and causative stream bindings.
– Retrieve appropriate data elements within the specific time/sequence window specified by the TVC equation.
– Apply the TVC filter to obtain the causative set of elements.

Stream and Port Mapping (SAPM) stores:
1. Bindings between every input/output port and associated streams
2. Creation/modification times of PEs
3. TVC function f() for every output port/input port

Provenance Query Server, on Query(Ei):
1. Obtain I/O stream mappings
2. Retrieve dependency equation
3. Retrieve elements
4. Apply equation to determine the set of causative elements
Then return the set of elements.

Example metadata:

PC Dependency Table
PEj   f1()   Po1

Dynamic Stream Mapping Table
t=10   S1    Pi1
t=15   S15   I4
t=16   S17   O6

Data Store
element45   ts=15   SID=12   state=“gym”
element66   ts=16   SID=17   state=“home”

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, “A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams,” HealthNet, June 2007.
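
The four query-resolution steps listed on this slide can be illustrated with a small in-memory sketch. Everything here is hypothetical: the three dictionaries and lists stand in for the separate metadata stores, the sample values are invented for illustration, and the real Century services are not shown in the talk.

```python
# Hypothetical stand-ins for the three metadata stores.
# Binding history per input port: (time bound, stream id), oldest first.
PORT_BINDINGS = {"Pi1": [(10, "S1"), (15, "S15")]}
# Dependency rule per PE state: port -> (start_back, end_back) offsets from t.
DEPENDENCY = {"gym": {"Pi1": (2, 0)}, "home": {"Pi1": (5, 0)}}
# Stored elements: (timestamp, stream id, value).
DATA_STORE = [(13, "S1", 88), (14, "S1", 90), (15, "S15", 131),
              (16, "S15", 140), (17, "S15", 142)]

def stream_bound_at(port, ts):
    """Step 1: latest stream bound to `port` at or before time `ts`."""
    bound = None
    for bind_time, stream in PORT_BINDINGS[port]:
        if bind_time <= ts:
            bound = stream
    return bound

def resolve(alert_ts, state):
    """Return stored elements that fall in the alert's dependency windows."""
    causative = []
    for port, (start_back, end_back) in DEPENDENCY[state].items():  # step 2
        lo, hi = alert_ts - start_back, alert_ts - end_back
        for ts, stream, value in DATA_STORE:                        # step 3
            # Step 4: keep elements inside the window that arrived on the
            # stream bound to this port at that moment.
            if lo <= ts <= hi and stream == stream_bound_at(port, ts):
                causative.append((ts, stream, value))
    return causative

print(resolve(16, "gym"))  # [(14, 'S1', 90), (15, 'S15', 131), (16, 'S15', 140)]
```

Nothing is computed at ingest time beyond storing bindings and state, which mirrors the tradeoff stated above: cheap storage, more work per query.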


Key Research Benefits of TVC Model for Provenance

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
– Storage efficiency: only minimal metadata per data element, functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

Other provenance systems cannot handle O(100K events/sec) system throughput.


Quality of Events in Century

[Figure: Century server infrastructure architecture diagram. Labeled components: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC), Solution Delivery Services, Portal Service, Patient Data Manager, Service Data Management, User Group Data Manager, Device Data Manager, Job/Application Manager, Analysis Applications, Analysis Framework, QoE Manager, State Manager, Group Manager, Device Catalogue, Authentication & Authorization, Privacy Manager, Platform Service, Subscription Service, Event Management Service, Event Preprocessor, Event Filter, Annotator, Event Storage Manager, Event Store Query Service, Provenance Service, Provenance Storage Manager, Provenance Query Service, Provenance Query Manager, Provenance Access Control, Resolver, Group Management Service, Remote Access Manager, IHE Adapter, Interoperability Container. Storages: Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy.]


Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type.
– Enables a pub-sub model of stream binding.

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Types of dynamism include:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream’s elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts.
– Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform, or the set of features used) may be varied depending on stream quality.
– Dynamic rebinding of the stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality.

[Figure: a PE consumes timestamped input streams and emits an ALERT at time t. Callouts: “What generated this alert, and can it be trusted?”; “The sensor is faulty and generates stream elements of low quality” (replace sensor); “Stream elements are missing (possibly due to network issues)”.]


Century QoE: Some Basic QoE Measures

[Figure: the input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input to the PE is t-5, t-2, t.]
By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.

[Figure: one PE input carries t-12, t-11, t-10; the other carries t-5, t-2, t.]
Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.

[Figure: the current input is t-5, t-4, t-3, t-2, t-1, t; only a small part of it is marked as the “useful” input.]
Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
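
The three measures described on this slide can each be written as a one-line ratio or difference. The sketch below is an assumption about how such measures might be computed, not Century's actual QoE Manager; the function names and the sample timestamps (with t = 100) are hypothetical.

```python
def completeness(expected_ts, received_ts):
    """Fraction of expected input timestamps that actually arrived."""
    expected = set(expected_ts)
    return len(expected & set(received_ts)) / len(expected)

def max_skew(latest_ts_per_input):
    """Largest gap between the newest timestamps seen on each PE input."""
    return max(latest_ts_per_input) - min(latest_ts_per_input)

def useful_fraction(consumed_count, useful_count):
    """Share of consumed elements that actually contributed to the output."""
    return useful_count / consumed_count

t = 100
# Expected t-5..t, received only t-5, t-2 and t: half the input is missing.
print(completeness(range(t - 5, t + 1), [t - 5, t - 2, t]))  # 0.5
# One input last seen at t-10, the other at t: the inputs are 10 ticks apart.
print(max_skew([t - 10, t]))                                 # 10
```

A low `completeness`, a large `max_skew` or a small `useful_fraction` would each mark the PE's output as low quality in the sense used above.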


Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage.
– HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2 and BP monitors.

Sensors have different, time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors).

Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored.

Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior.
– Explicit provenance may not be an accurate reflection of analysis logic.

Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current ‘role’ of the provider and the user’s medical history.

Open questions:
– How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
– How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
– How to develop a hybrid that combines model-based and replay-based provenance?
– How to define context-dependent access-control models for provenance data in a multi-provider environment?


Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay.

The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure.
– Key features include adaptive event filtering, personalization and predictive data transmission.

The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics.
– Key features include high-volume stream support, provenance and event/information quality.


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions


Harmoni (Healthcare Adaptive Remote Monitoring): Overview

[Figure: sensors reach the mobile hub over a PAN (Bluetooth); the hub sends data onward over a WAN (GPRS).]

Type of Sensor Device   Bits/sensor sample   Channels/device   Raw data rate (KB/day)
GPS                     1408                 1                 14850
SpO2                    3000                 1                 94922
EKG (cardiac)           12                   6                 194400
Accelerometer           64                   3                 202500
EEG (brain)             12                   12                388800
EMG (muscle)            12                   6                 777600

Need to reduce uplink transmission rates/power.

HARMONI Novel Features
1. Context-Aware Event Filtering: IF (user in ‘gym’ & 90 < hr < 120) THEN send AVG(hr)/min
2. Personalization of Rules: apply machine learning at the server to learn “Normal for user A, gym = (110, 150)”
3. Anticipation-Driven Transmission: based on past behavior, if Prob(802.11 within 2 hrs) > 0.7, cache data
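
The raw data rates in the sensor table follow from bits per sample, channels per device and a sampling rate. The slide does not list sampling rates, so the rates below are assumptions back-solved from the table (the SpO2 figure of 3 per second matches the three 375-byte packets per second quoted for the Nonin monitor elsewhere in this talk); `raw_kb_per_day` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400

def raw_kb_per_day(bits_per_sample, channels, samples_per_sec):
    """Raw data rate in KB/day, taking 1 KB = 1024 bytes."""
    bits_per_day = bits_per_sample * channels * samples_per_sec * SECONDS_PER_DAY
    return bits_per_day / 8 / 1024

# (bits/sample, channels, ASSUMED samples/sec back-solved from the table)
SENSORS = {
    "GPS": (1408, 1, 1),
    "SpO2": (3000, 1, 3),
    "EKG": (12, 6, 256),
    "Accelerometer": (64, 3, 100),
    "EEG": (12, 12, 256),
    "EMG": (12, 6, 1024),
}

for name, spec in SENSORS.items():
    print(f"{name}: {raw_kb_per_day(*spec):.0f} KB/day")
```

With these assumed rates the computed values reproduce the table's last column, which suggests the table was built the same way.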


HARMONI Architecture: Key Components on Mobile Device

Lightweight Rule/Event processing engine
– Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
  • e.g., if 60 < AVG(last 10 heart rate values) < 90, then “transmit AVG to server”
– Processed events themselves act as predicates for new rules

Rule Manager
– Coordinates with the server to determine patterns of current interest and the consequent action
– Populates or modifies the rules in the event engine

Intelligent Data Transmission
– Compresses the filtered data (from the event engine) for transmission to the server

Anticipation Mechanism
– Schedules the transmission of (compressed) data based on predicted availability of network connections and incoming sensor data rates
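
The example rule on this slide (“if 60 < AVG(last 10 heart rate values) < 90 then transmit AVG to server”) can be sketched as a sliding-window check. This is a plain-Python illustration, not HARMONI's event engine (which the talk says is DFA-based); the class name `AvgWindowRule` and the `action` callback are hypothetical.

```python
from collections import deque

class AvgWindowRule:
    """Fire an action when the mean of the last `size` samples is in (low, high)."""

    def __init__(self, size, low, high, action):
        self.window = deque(maxlen=size)  # old samples fall off automatically
        self.low, self.high, self.action = low, high, action

    def on_sample(self, value):
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return  # not enough history yet
        avg = sum(self.window) / len(self.window)
        if self.low < avg < self.high:
            self.action(avg)

sent = []  # stands in for "transmit AVG to server"
rule = AvgWindowRule(size=10, low=60, high=90, action=sent.append)
for hr in [72] * 10:
    rule.on_sample(hr)
print(sent)  # [72.0]
```

Ten raw readings collapse into one transmitted average, which is the bandwidth saving the filtering is after.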


HARMONI Functional Architecture

[Figure: Sensor, Mobile Device and Remote Server, joined by a short-range wireless link and a wireless link to the Internet. Labeled blocks include: Data Collection, Data Adapters, Data Processing, Light-Weight Pattern Recognition Engine, Rule Manager, Action Triggering Mechanism, Intelligent Data Transmission, Anticipation Mechanism, User Interface, Context, Sensor Readings, Sensor Control, Device Resources, Pattern Recognition Engine, Pattern Learning Engine, Rule Server, External Action Triggering Mechanism, DB, TAPAS, External Context Sources, External Rule Specifications, External Action Specifications.]

Simple recognition of pre-specified temporal patterns across sensor streams.

Event engine driven by Deterministic Finite Automata (DFAs).

Actions associated with rules include data transmission, data transformation and local triggers (alarms, reminders).

Compressed transmission to the server when an interface is available AND “anticipation” indicates “OK-to-Send”.
– Use of intelligent compression schemes to reduce the volume of traffic.

Store-and-forward otherwise.
– Data stored in memory buffer and on FLASH storage.

Rule Manager coordinates with the backend server to download currently active or applicable rules.
– Can run, activate or deactivate multiple rules simultaneously and dynamically.
– Dynamic downloading of rules provides implicit “context awareness” at the client.
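
The transmit-or-store decision described on this slide can be sketched as a small function. This is an assumption-laden illustration: the 0.7 threshold echoes the overview slide's anticipation example, the RAM-first-then-FLASH order is a guess, and the names `ok_to_send` and `next_step` are hypothetical.

```python
def ok_to_send(p_cheap_link_soon, threshold=0.7):
    """Hold data back when a cheaper 802.11 link is likely to appear soon."""
    return p_cheap_link_soon <= threshold

def next_step(interface_up, p_cheap_link_soon, buffered_bytes, ram_limit):
    """Pick what the hub does with its filtered, compressed data."""
    if interface_up and ok_to_send(p_cheap_link_soon):
        return "transmit"
    # Store-and-forward: keep data in RAM first, spill to FLASH when full.
    return "store_in_memory" if buffered_bytes < ram_limit else "store_on_flash"

print(next_step(True, 0.2, 0, 1000))      # transmit
print(next_step(True, 0.9, 10, 1000))     # store_in_memory
print(next_step(False, 0.1, 2000, 1000))  # store_on_flash
```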


Pattern Recognition Engine Implementation

Context-Based Filtering

Rules are specified using EventScript, which is similar to regular expressions.

A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA).

We implemented a run-time engine on the mobile device that takes input events and “executes” DFA specifications.

Effectively, we are running finite state machines with some tweaks:
– Unique memory heap for each DFA, accessible across states
– Run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA

[Figure: rule toolchain. Labels: EventScript, Compiler, DFA, Desktop, Rule Server, Remote Server, Rule Manager, Event Engine, 770.]
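
The “finite state machine with a per-DFA heap and transition hooks” idea can be sketched as follows. This is not EventScript and not the HARMONI run-time: Python callables stand in for the LISP/Scheme instructions, and the class `Dfa`, the hook functions and the three-high-readings pattern are all hypothetical.

```python
class Dfa:
    """Tiny DFA runner with a per-automaton heap shared across states."""

    def __init__(self, start, transitions):
        # transitions: {(state, symbol): (next_state, hook or None)}
        self.state = start
        self.transitions = transitions
        self.heap = {}  # memory visible from every state of this DFA

    def feed(self, symbol, value=None):
        key = (self.state, symbol)
        if key not in self.transitions:
            return  # no matching edge: stay in the current state
        self.state, hook = self.transitions[key]
        if hook:
            hook(self.heap, value)  # arbitrary code run on the transition

def remember(heap, value):
    heap.setdefault("highs", []).append(value)

def raise_alarm(heap, value):
    remember(heap, value)
    heap["alarm"] = True

def reset(heap, value):
    heap.clear()

# Pattern: three consecutive "high" heart-rate events trigger an alarm.
dfa = Dfa("idle", {
    ("idle", "high"): ("one", remember),
    ("one", "high"): ("two", remember),
    ("two", "high"): ("alarm", raise_alarm),
    ("one", "normal"): ("idle", reset),
    ("two", "normal"): ("idle", reset),
})
for symbol, hr in [("high", 150), ("high", 155), ("high", 160)]:
    dfa.feed(symbol, hr)
print(dfa.state, dfa.heap)  # alarm {'highs': [150, 155, 160], 'alarm': True}
```

The heap is what lets a later state see values collected in earlier states, which a plain DFA cannot do.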


HARMONI Implementation Platform

Nokia 770 Internet tablet / N800
– ARM processor, Linux-based
– High-resolution display (800x480), touch screen with up to 65,536 colors
– 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB; can be used for virtual memory)
– Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
– Relatively cheap: $350
– http://www.maemo.org provides open-source software and development environment
– Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor
– Provides heart rate and oxygen saturation
– Supports Bluetooth Serial Port Profile (SPP)
– 120 hours of continuous operation with 2 AA batteries
– Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis Accelerometer
– Output baud of 57.6 Kbps
– 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome and M. Ebling, “HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring,” to appear, PerCom 2008.


Impact of HARMONI on Transmission Bandwidth: Idealized Context

[Figure: “Data Generated From Sensor”. Heart rate (bpm, axis 0-200) plotted against sample number (1 to 9433).]

[Figure: bar chart of data transmitted over the network (in KB, axis 0-80) for each assumed context (None, Office, Gym, Office/Gym), uncompressed and with LZ-78 compression. Bar labels as extracted, with decimal points lost: 7030, 2906, 4890, 1959, 2964, 1702, 1454, 2204.]

Compression (S2) by itself results in > 50% reduction in bandwidth consumption.

Filtering (S3) results in 85% reduction in bandwidth consumption.

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating point values in LZ.


Result 2: Impact of Personalization on Event Filtering

[Figure: “Variation in Heart Rate Readings for Different Individuals”. Heart rate (beats per minute, axis 0-200) plotted against time.]

[Figure: bar chart of data transmitted over the network (in KB, axis 0-100) for User 2-1 and User 2-2. Series: Uncompressed; Compressed; Uncompressed, Generic Context-filtering; Uncompressed, Personalized Context filtering. Bar labels as extracted, with decimal points lost: 9510, 6500, 3494, 2302, 2018, 1902, 1184, 926.]

Sensor streams exhibit significant statistical variation across individuals.

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating point values in the LZ compressor.


HARMONI in Practice: Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running.
– Higher ‘normal’ threshold for the 3 distinct states (0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are also used to classify ‘falls’ for elderly patients

Bandwidth savings between 26-73% for our sample population.


Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE


Century Research Goals and Technologies

Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports its various services.

Challenges
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across ecosystem
– Integrity of data & tracking the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security

Architectural Goals
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking & auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data

Technologies & Approaches
– Stream Processing Platform
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology

IBM Research

copy 2007 IBM Corporation

Analysis Framework (1): The Approach

Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today.
– Closed: custom-built for specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials

New Requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures
  • Exacerbated in remote monitoring settings
Easy to Use
– Separate medical domain knowledge from IT knowledge

Our Innovations
Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., patient’s intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2, temperature

[Figure: an analysis job drawn as a graph of analysis components (PEs) connected by streams, running from Data Source and Source PE nodes through processing PEs to a GUI.]

Major practical benefit: separate the PE developer’s job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.).


Analysis Framework (2): PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs).
• When an SDO is received, System S routes the SDO to the appropriate PE.
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs.

[Figure: example PE graph. Labels: Input ECG, Normalization, Fast Fourier Transform, AR PE, Alert Generator, Output Alert; a Neural Network is trained during initialization.]


Century Event Management Service

[Figure: Century server infrastructure architecture diagram (gateway, toolkit, portals, platform services, Event Management Service, Provenance Service, Analysis Framework and storages), shown here to locate the Event Management Service.]


Event Management Service (1)

The Problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders

Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language

Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure: for each approach, an Annotator writes events arriving at times 1, 2, 3 into the store; in the second approach each event is stored as XML.]


Event Management Service (3): Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate

Let μ be the service rate of the database.
Let λ be the arrival rate at the database.
Let ρ = λ/μ.

[Figure: arrival rate plotted over time against the average arrival rate and the average database service rate, with an “Instability Region” marked. An Annotator feeds events (times 1, 2, 3) through a Stream Buffer into the XML store.]

How do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, “Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS,” DEBS, June 2007.
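
A short worked example shows why the database must be kept away from ρ = 1. The slide states only Poisson arrivals; the formula below additionally assumes exponentially distributed service times (an M/M/1 queue), and the numbers are illustrative rather than taken from the talk.

```latex
\rho = \frac{\lambda}{\mu}, \qquad \text{the queue is stable only if } \rho < 1
\\[4pt]
L = \frac{\rho}{1-\rho} \quad \text{(mean number of requests in the system, M/M/1)}
\\[4pt]
\lambda = 80,\ \mu = 100 \;\Rightarrow\; \rho = 0.8,\ L = 4
\qquad
\lambda = 95,\ \mu = 100 \;\Rightarrow\; \rho = 0.95,\ L = 19
```

Under this assumption, a rise in load from 80% to 95% of the service rate almost quintuples the backlog, so bursts that briefly push λ above μ need to be absorbed somewhere before they reach the database.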


Event Management Service (4): The Stream Buffer

Approach
– Keep the Relational-XML database in the design, for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database

[Figure: PEs in the analysis graph (PE1 to PE5, SPE) issue storage requests (events of type “eventStore”) on the way towards the delivery system. Labeled blocks: Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, Database PE, System S STG (File System Store), Event Store (Relational-XML).]
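
The throttling idea can be sketched as a queue that releases at most a fixed number of storage requests per tick. This is a minimal stand-in, assuming a fixed per-tick budget set below the database service rate; it omits the traffic monitor and predictor shown on the slide, and the class name `StreamBuffer` and its methods are hypothetical.

```python
from collections import deque

class StreamBuffer:
    """Queue storage requests and release them no faster than the DB can take."""

    def __init__(self, max_per_tick):
        self.pending = deque()
        self.max_per_tick = max_per_tick  # chosen below the DB service rate

    def offer(self, request):
        self.pending.append(request)

    def drain(self):
        """Called once per tick; returns the requests forwarded to the DB."""
        n = min(self.max_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]

buf = StreamBuffer(max_per_tick=2)
for i in range(5):       # a burst of five requests arrives within one tick
    buf.offer(f"event-{i}")
print(buf.drain())       # ['event-0', 'event-1']
print(len(buf.pending))  # 3
```

The burst is spread over later ticks, so the database sees a smooth arrival rate even when the analysis graph does not produce one.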


Century Provenance Service

[Figure: Century server infrastructure architecture diagram (gateway, toolkit, portals, platform services, Event Management Service, Provenance Service, Analysis Framework and storages), shown here to locate the Provenance Service.]


An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition.

With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program.

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient’s medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient’s dosage of the prescribed medication to 10 mg twice a day.

“That’s unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is.”

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and Quality of Event (QoE).

IBM Research

copy 2006 IBM Corporation

Provenance (1) Two Approaches to Provenance

f1

f2

f3

Provenance ServerStore

I ran algorithm f1 with parameters uv and w at time t0 I published on port 1

PE1

PE2

I recvd some data on port 2 I ran algorithm f3 on the data at time t1

What is process provenancebull The ability to retrace

which ldquosystem componentsrdquowere on the data path

What is data provenancebull The ability to retrace

which ldquodata elementsrdquo ledto generation of output data

PEj

Sensor

Sensor

Data2

DataOut4

Data1 Data2

I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2

Data1

Data1

IBM Research

copy 2006 IBM Corporation

Provenance (2) Prior Work on Provenance

EU Provenance (PASOA PreServ))

Captures Workflow bindings

Scientific SOA Workflows (KARMA)Workflow and application binding notifications

Stateless application-defined data provenance

Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL

Data dependencies specified statically for relations

Event and Processing Rates

Stream Provenance (Simhan)

Store temporal history of stream connectors as a per-stream stack

System-level automatic metadata collection at stream-levelTy

pe o

f Pr

oven

ance

Rec

onst

ruct

ion

Proc

ess

prov

enan

ceD

ata

prov

enan

ce

File Systems PASS LFSCaptures system calls and

modifications to file recordsAnnotation per file

Medical Stream Provenance Arbitrary logic for invidual PEs (

statefulness and long memory) Dynamic stream bindings

High proessingthroughput

Century

IBM Research

copy 2006 IBM Corporation

Provenance Challenges Unique to Streams

Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation

is very heavyweight

Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput

Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data

Dynamic Stream Bindings Streams may be ephemeral as a consequence the

transformation graph may vary with time

We need a provenance system that

Is reactivemdashgenerating provenance events only when things change

Does not require annotating large amounts of metadata per sensor sample

Can accommodate variations in the statefulness model

We need to support both

Process provenance Show me the set of components that are involved in the generation of alert Ei

Data provenance Show me stream segments related to the generation of alert Ei

IBM Research

copy 2006 IBM CorporationIBM Research

TVC Model of Hybrid Provenance

Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data

element but remains constant for the same processing logic

Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input

streams) within a designated interval Δ

( ) ( ) UComponent

tttouti

jjjki inps

input to is

ΔminusisinlArr

[Figure: a Processing Element with input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), producing output stream Si (samples Si(t), Si(t-1), Si(t-2))]

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID   PE State   Dependency Rule
1               "gym"      Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2               "home"     Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)

The dependency equation can, however, vary based on
  - The PE's internal state or some external context, for example:
      IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
      ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
  - Solution is to allow multiple functions, each associated with a distinct range of states
  - Store the state as metadata with the data
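The state-dependent rule blocks can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Century implementation: timestamps are treated as integers, each window is inclusive at both ends, and the names `RuleBlock`, `RULE_BLOCKS` and `causative_timestamps` are hypothetical. The two rule blocks mirror the "gym" and "home" rows of the slide.

```python
from dataclasses import dataclass

# Assumed encoding of one window: (stream, start_offset, end_offset) stands for
# the samples of `stream` with timestamps in [t - start_offset, t - end_offset].
Window = tuple[str, int, int]


@dataclass(frozen=True)
class RuleBlock:
    block_id: int
    state: str
    windows: tuple[Window, ...]


# The two rule blocks from the slide, keyed by PE state (location context).
RULE_BLOCKS = {
    "gym": RuleBlock(1, "gym", (("Sj", 0, 0), ("Sj", 3, 2), ("Sk", 2, 1))),
    "home": RuleBlock(2, "home", (("Sj", 2, 0), ("Sk", 1, 0))),
}


def causative_timestamps(state: str, t: int) -> dict[str, list[int]]:
    """Expand the rule block for `state` into the input timestamps E_i(t) depends on."""
    deps: dict[str, set[int]] = {}
    for stream, start, end in RULE_BLOCKS[state].windows:
        deps.setdefault(stream, set()).update(range(t - start, t - end + 1))
    return {stream: sorted(stamps) for stream, stamps in deps.items()}
```

Because the rule is stored once per state rather than once per sample, only the state label has to travel with each data element.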


Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to a corresponding query
  - For efficiency, the relevant metadata is distributed across multiple DBs
  - Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead

Storage: Stream and Port Mapping (SAPM)
  1. Store bindings between every input/output port and the associated streams
  2. Store creation/modification times of PEs
  3. Store the TVC function f() for every output port / input port

Query resolution at the Provenance Query Server, for Query(Ei)
  1. Obtain I/O stream mappings: determine the IDs of the input ports and the causative stream bindings
  2. Retrieve the dependency equation
  3. Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
  4. Apply the equation (TVC filter) to determine the causative set of elements, and return that set

[Figure: example table contents]
  PC Dependency Table: PEj, f1(), Po1
  Dynamic Stream Mapping Table: (t=10, S1, Pi1), (t=15, S15, I4), (t=16, S17, O6)
  Data Store: element45 (ts=15, SID=12, state="gym"); element66 (ts=16, SID=17, state="home")

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007
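The four query-resolution steps can be walked through with a small in-memory sketch. Everything here is an assumption made for illustration: the table contents, the per-port window lengths in `DEPENDENCY`, and the names `Element`, `bindings_at` and `resolve` are hypothetical and are not taken from the Century code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    element_id: int
    ts: int
    sid: int      # ID of the stream the element belongs to
    state: str    # PE state stored as metadata with the data


# Dynamic stream mapping table: (time the binding took effect, stream ID, input port).
STREAM_PORT_MAP = [(10, 11, "I4"), (15, 12, "I4"), (12, 17, "I5")]

# Dependency table: per PE state, how far back (in time units) each input port is read.
DEPENDENCY = {"gym": {"I4": 1, "I5": 0}, "home": {"I4": 2, "I5": 1}}

DATA_STORE = [
    Element(44, 14, 12, "gym"), Element(45, 15, 12, "gym"), Element(46, 16, 12, "gym"),
    Element(60, 16, 11, "gym"), Element(65, 15, 17, "gym"), Element(66, 16, 17, "gym"),
]


def bindings_at(t: int) -> dict[str, int]:
    """Step 1: port -> stream binding in force at time t (later bindings override)."""
    active: dict[str, int] = {}
    for since, sid, port in sorted(STREAM_PORT_MAP):
        if since <= t:
            active[port] = sid
    return active


def resolve(alert: Element) -> list[Element]:
    ports = bindings_at(alert.ts)                         # step 1
    windows = DEPENDENCY[alert.state]                     # step 2
    horizon = max(windows.values())
    candidates = [e for e in DATA_STORE                   # step 3
                  if alert.ts - horizon <= e.ts <= alert.ts]
    causative = []
    for port, sid in ports.items():                       # step 4
        delta = windows.get(port)
        if delta is None:
            continue
        causative += [e for e in candidates
                      if e.sid == sid and alert.ts - delta <= e.ts <= alert.ts]
    return sorted(causative, key=lambda e: e.element_id)
```

Nothing is computed at ingest time beyond storing the bindings and the state label; the cost is paid only when `resolve` is called, which is the tradeoff the slide describes.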


Key Research Benefits of TVC Model for Provenance

TVC model overhead reduces when
  1. Monitoring is long-lived (over weeks)
  2. Per-instance state is minimized
  3. Number of monitored individuals increases

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
  - Storage efficiency: only minimal metadata per data element, functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
  - Processing efficiency: stream and port bindings stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in the EU Provenance/Karma systems)
  - Statefulness: time and value functions allow compact representation of long dependency windows
  - Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

Other provenance systems cannot handle O(100K events/sec) system throughput


Quality of Events in Century

[Figure: Century server infrastructure diagram. Labeled components include: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC), Portal Service, Solution Delivery Services, Event Management Service, Event Preprocessor, Event Filter/Annotator, Event Storage Manager, Event Store Query Service, Provenance Service, Provenance Storage Manager, Provenance Query Manager, Provenance Query Service, Provenance Access Control, Resolver, Analysis Framework, Analysis Applications, Job/Application Manager, Platform Service, QoE Manager, State Manager, Group Manager, Group Management Service, Device Catalogue, Device Data Manager, Patient Data Manager, User Group Data Manager, Service Data Management, Authentication & Authorization, Privacy Manager, Subscription Service, Remote Access Manager, IHE Adapter, Interoperability Container; storages for Event Store, Provenance Metadata, State Data, Group Data, Privacy Policy, and Stakeholder, Patient, Device and Application Info]


Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
  - Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
  - Sensor (source) impairments: sensor faults (transient or persistent)
  - Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect
  - Action taken on analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
  - Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
  - Dynamic rebinding of stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality

[Figure: a graph of PEs raising an ALERT at time t from input streams whose elements arrive with gaps and out of order. Callouts: "What generated this alert, and can it be trusted?"; "The sensor is faulty and generates stream elements of low quality" (action: replace sensor); "Stream elements are missing (possibly due to network issues)"]


Century QoE Some Basic QoE Measures

Expected vs. received input: by correlating what was expected with what was received, we can estimate the quality of our input. For example, if the expected input is t-5, t-4, t-3, t-2, t-1, t but the current input is only t-5, t-2, t, we can say that the output of the PE has low quality, since half of the expected input is missing.

Timestamp correlation across inputs: another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, if one input carries t-12, t-11, t-10 while the other carries t-5, t-2, t, we can say that the output of the PE has low quality, since the two inputs are out of sync.

Consumed vs. useful input: finally, we can correlate the consumed input with the "useful" input (that which is actually used to trigger an output). For example, if the current input is t-5, t-4, t-3, t-2, t-1, t but only a small fraction of those stream elements is useful, we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial.
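The three measures can be written down as small functions. This is a minimal sketch assuming integer timestamps; the function names and the exact scoring (a ratio for completeness and usefulness, a gap between the newest timestamps for synchronization) are illustrative choices, not definitions taken from Century.

```python
def completeness(expected_ts, received_ts) -> float:
    """Fraction of the expected input elements that actually arrived."""
    expected = set(expected_ts)
    if not expected:
        return 1.0
    return len(expected & set(received_ts)) / len(expected)


def sync_skew(ts_a, ts_b) -> int:
    """Gap between the newest timestamps of two inputs; 0 means they are in sync."""
    return abs(max(ts_a) - max(ts_b))


def usefulness(consumed_ts, useful_ts) -> float:
    """Fraction of the consumed input elements that actually triggered the output."""
    consumed = set(consumed_ts)
    if not consumed:
        return 0.0
    return len(consumed & set(useful_ts)) / len(consumed)
```

With t = 20, the first slide example gives `completeness(range(15, 21), [15, 18, 20]) == 0.5`, matching the "half of the expected input is missing" case.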


Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
  - Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
  - HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2, and BP monitors

Sensors have different time-varying QoE
  - QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
  - Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
  - User-specified models may not always be available: learn provenance from observed behavior
  - Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
  - The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions
  - How to compute QoE of medical streams, and how to refine them based on QoE of other streams
  - How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
  - How to develop a combination of hybrid model-replay based provenance
  - How to define context-dependent access-control models of provenance data in a multi-provider environment


Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relay: an extended part of the analytic infrastructure
  - Key features include adaptive event filtering, personalization, and predictive data transmission

Server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
  - Key features include high-volume stream support, provenance, and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions


HARMONI Architecture Key Components on Mobile Device

Lightweight Rule/Event processing engine
  - Identifies appropriate temporal patterns in sensor data stream(s) and the consequent action
      • e.g., if 60 < AVG(last 10 heart rate values) < 90 then "transmit AVG to server"
  - Processed events themselves act as predicates for new rules

Rule Manager
  - Coordinates with server to determine patterns of current interest and the consequent action
  - RM populates or modifies the rules in the event engine

Intelligent Data Transmission
  - Compresses the filtered data (from event engine) for transmission to server

Anticipation Mechanism
  - Schedules the transmission of (compressed) data based on predicted availability of network connections and incoming sensor data rates
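The example rule above ("if 60 < AVG(last 10 heart rate values) < 90 then transmit AVG to server") can be sketched as a small stateful filter. This is an illustration only: the class name `AvgHeartRateRule`, the sliding-window interpretation of "last 10 values", and the `("transmit", avg)` return value are assumptions, and the actual HARMONI engine expresses rules in EventScript rather than Python.

```python
from collections import deque
from statistics import mean


class AvgHeartRateRule:
    """Emit the window average when it lies strictly between `low` and `high`."""

    def __init__(self, window: int = 10, low: float = 60.0, high: float = 90.0):
        self.samples = deque(maxlen=window)  # keeps only the most recent samples
        self.low = low
        self.high = high

    def on_sample(self, heart_rate: float):
        self.samples.append(heart_rate)
        if len(self.samples) < self.samples.maxlen:
            return None                      # not enough history yet
        avg = mean(self.samples)
        if self.low < avg < self.high:
            return ("transmit", avg)         # the consequent action of the rule
        return None
```

Only the single averaged value leaves the device when the rule fires, which is where the bandwidth saving of event filtering comes from.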


HARMONI Functional Architecture

[Figure: Sensor, Mobile Device and Remote Server, connected by a short-range wireless link (sensor to device) and a wireless link to the Internet (device to server). Labeled blocks: Data Collection / Data Adapters, Data Processing, Light-Weight Pattern Recognition Engine, Action Triggering Mechanism, Rule Manager, Intelligent Data Transmission, Anticipation Mechanism, User Interface, Context, Sensor Readings, Sensor Control, Device Resources, DB, Pattern Recognition Engine, Pattern Learning Engine, External Action Triggering Mechanism, Rule Server, TAPAS, External Context Sources, External Rule Specifications, External Action Specifications]

Simple recognition of pre-specified temporal patterns across sensor streams

Event engine driven by Deterministic Finite Automata (DFAs)

Actions associated with rules include data transmission, data transformation, and local triggers (alarms, reminders)

Compressed transmission to server when interface is available AND "anticipation" indicates "OK-to-Send"

Use of intelligent compression schemes to reduce the volume of traffic

Store-and-forward otherwise

Data stored in memory buffer and on FLASH storage

Rule Manager coordinates with backend server to download currently active or applicable rules

Can run, activate, or deactivate multiple rules simultaneously and dynamically

Dynamic downloading of rules provides implicit "context awareness" at client


Pattern Recognition Engine Implementation

Rules are specified using EventScript, which is similar to regular expressions

A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA)

We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications

Effectively, we are running finite state machines with some tweaks
  - Unique memory heap for each DFA, accessible across states
  - Run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA

[Figure: rule deployment path. Labels: EventScript, Compiler, Desktop, DFA, Rule Server, Remote Server, Rule Manager, Event Engine, 770]
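A table-driven DFA with a per-automaton memory and actions on state entry can be sketched as follows. This is a hedged illustration rather than the HARMONI engine: Python callables stand in for the LISP/Scheme snippets, the EventScript compiler is not modeled, and the names `DFA`, `build_hr_dfa` and the "three consecutive high readings" pattern are invented for the example.

```python
class DFA:
    def __init__(self, start, transitions, on_enter=None):
        self.state = start
        self.transitions = transitions    # {(state, symbol): next_state}
        self.on_enter = on_enter or {}    # {state: callable(memory, event)}
        self.memory = {}                  # per-DFA heap, shared across states

    def feed(self, symbol, event=None):
        key = (self.state, symbol)
        if key not in self.transitions:
            return self.state             # no matching transition: stay put
        self.state = self.transitions[key]
        action = self.on_enter.get(self.state)
        if action is not None:
            action(self.memory, event)    # arbitrary code run on the transition
        return self.state


def count_alarm(memory, event):
    memory["alarms"] = memory.get("alarms", 0) + 1


def build_hr_dfa() -> DFA:
    """Hypothetical pattern: raise an alarm after three consecutive 'high' events."""
    transitions = {
        ("idle", "high"): "h1",
        ("h1", "high"): "h2",
        ("h2", "high"): "alarm",
    }
    for state in ("h1", "h2", "alarm"):
        transitions[(state, "normal")] = "idle"
    return DFA("idle", transitions, on_enter={"alarm": count_alarm})
```

The `memory` dictionary plays the role of the per-DFA heap: any state's action can read or update values left behind by earlier transitions.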


HARMONI Implementation Platform

Nokia 770 Internet Tablet / N800
  - ARM processor, Linux-based
  - High-resolution display (800x480), touch screen with up to 65536 colors
  - 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB ... can be used for virtual memory)
  - Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
  - Relatively cheap: $350
  - http://www.maemo.org provides open-source software and development environment
  - Code compiled on an Intel/Debian Linux 3.1 box using cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor
  - Provides heart rate and oxygen saturation
  - Supports Bluetooth Serial Port Profile (SPP)
  - 120 hours of continuous operation with 2 AA batteries
  - Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis Accelerometer
  - Output baud of 57.6 Kbps
  - 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome, and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008


Impact of HARMONI on Transmission Bandwidth Idealized Context

[Figure: "Data Generated From Sensor": heart rate (bpm, 0-200) vs. sample number (1 to 9433)]

[Figure: bar chart of data transmitted over network (in KB, 0-80) for each assumed context (None, Office, Gym, Office/Gym), uncompressed vs. with LZ-78 compression. Bar values as extracted (decimal points lost): 7030, 2906, 4890, 1959, 2964, 1702, 1454, 2204]

Compression (S2) by itself results in > 50% reduction in bandwidth consumption

Filtering (S3) results in 85% reduction in bandwidth consumption

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to lack of exact pattern match of floating-point values in LZ


Result 2 Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute, 0-200) vs. time]

[Figure: bar chart of data transmitted over network (in KB, 0-100) for User 2-1 and User 2-2 under four schemes: Uncompressed; Compressed; Uncompressed, Generic Context-filtering; Uncompressed, Personalized Context-filtering. Bar values as extracted (decimal points lost): 9510, 6500, 3494, 2302, 2018, 1902, 1184, 926]

Sensor streams exhibit significant statistical variation across individuals

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to lack of exact pattern match of floating-point values in the LZ compressor


HARMONI in Practice Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
  - Higher 'normal' threshold for the 3 distinct states (0-70, 70-110, 110-170) across all users
  - Aside: accelerometer readings are also used to classify 'falls' for elderly patients

Bandwidth savings between 26-73% for our sample population


Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE


Century Research Goals and Technologies

Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports the various services

Challenges
  - Integrating massive volumes of disparate data
  - Need for sophisticated analytics
  - Growing collaboration across ecosystem
  - Integrity of data & tracking the analytic data
  - Timely feedback
  - Large number of concurrent users
  - Privacy and security

Architectural Goals
  - Extensible data model, scalable infrastructure
  - Open and scalable analysis framework
  - Enterprise sharing of sensor data
  - Quality of Events (QoE) and provenance support for backtracking & auditing
  - Real-time processing
  - Scalable device and user management
  - Protection of personal medical data

Technologies & Approaches
  - Extensible Stream Storage System
  - Distributed Stream Analysis Runtime
  - Quality of Event
  - Hybrid Provenance System
  - Interoperability
  - Privacy & Security Technology
  - Stream Processing Platform


Analysis Framework (1) The Approach

Current Solutions: vertically designed systems for the analysis of health monitoring data exist today
  - Closed: custom-built for a specific diagnosis
  - Non-extensible: support for limited sensor devices
  - Small patient footprint: only good for trials

New Requirements
  - Extensible: to new data sources (e.g., EMG, EEG, accelerometer, activity) and to new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)
  - Scalability: handles large numbers of patients
  - Robustness: resilient to sensor or infrastructure failures (exacerbated in remote monitoring settings)
  - Easy to use: separate medical domain knowledge from IT knowledge

Our Innovations
  - Framework built on top of IBM System S, state-of-the-art middleware for high-volume stream processing analytics
  - Programming model centered around the System S concept of a Processing Element (PE)
      • PEs represent atomic stream transformers
      • Stream data flow with pub-sub semantics
  - Century also provides useful medical analytic services
      • Persistence of state (e.g., analyze HR variations over 6 months)
      • Management of patient data (e.g., patient's intermediate diagnosis)
      • Computation and rebinding based on the quality of events from medical streams
  - Century supports an open set of data sources
      • Currently supporting ECG, BP, Glucose, Weight, SpO2, Temperature

[Figure: analysis jobs drawn as graphs of analysis components, each starting at a Data Source and Source PE and continuing through PEs labeled WLG, SPA, SPM, DFE, PSD, VBF, NES, IPN, DUP, INQ, GUI, SGD, PCP, JAE, DSN, SCF, SSF, SLF, PFA, JoinCC, JoinSC, JoinL, N(CE)S^2, and "Mark"]

Major Practical Benefit: separates the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)


Analysis Framework (2) PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs

[Figure: example analysis graph between "Input ECG" and "Output Alert", with blocks for Normalization, Fast Fourier Transform, an AR PE, and an Alert Generator; note: "Neural network trained during initialization"]
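The division of labor described above can be illustrated with a toy chain of PEs. This sketch does not use the System S API: the `PE` base class, the dictionary-based SDOs, and `run_chain` (a stand-in for the runtime's routing) are all hypothetical, and a simple threshold replaces the FFT and neural-network stages of the slide's example.

```python
class PE:
    """A processing element: consumes one SDO, returns zero or more SDOs."""

    def process(self, sdo: dict) -> list[dict]:
        raise NotImplementedError


class NormalizePE(PE):
    def __init__(self, lo: float, hi: float):
        self.lo, self.hi = lo, hi

    def process(self, sdo):
        # Modify the SDO: rescale the raw value into [0, 1].
        value = (sdo["value"] - self.lo) / (self.hi - self.lo)
        return [{**sdo, "value": value}]


class AlertPE(PE):
    def __init__(self, threshold: float):
        self.threshold = threshold

    def process(self, sdo):
        # Create a new SDO only when the normalized value exceeds the threshold.
        if sdo["value"] > self.threshold:
            return [{"type": "alert", "ts": sdo["ts"], "value": sdo["value"]}]
        return []


def run_chain(pes, sdos):
    """Stand-in for the runtime: routes each PE's output SDOs to the next PE."""
    for pe in pes:
        sdos = [out for sdo in sdos for out in pe.process(sdo)]
    return sdos
```

Each PE class contains only analysis logic; everything about how SDOs travel between PEs lives in `run_chain`, which is the part a stream-processing runtime would own.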


Century Event Management Service

[Figure: Century server infrastructure diagram, showing where the Event Management Service (Event Preprocessor, Event Filter/Annotator, Event Storage Manager, Event Store Query Service, Event Store) sits among the USN Gateway, Toolkit, portals, Portal Service, Provenance Service, Analysis Framework, Platform Service, and storage components]


Event Management Service (1)

The Problem
  - Persistence of large amounts of streaming data
  - Efficient query mechanisms for stakeholders

Requirements
  - Event rate scalability
  - Resiliency
  - Extensibility
  - Expressiveness of the query language

Potential Approaches
  1. Relational DB (annotation table): not extensible, limited scalability
  2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure: for each approach, an Annotator storing the events arriving at times 1, 2, 3; in approach 2 each event is stored as an XML document]


Event Management Service (3) Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
  - Traffic is bursty, modeled by a Poisson process
  - Average arrival rate < average database service rate

Let μ be the service rate of the database
Let λ be the arrival rate at the database
Let ρ = λ/μ

[Figure: arrival rate vs. time, showing the bursty traffic, the average arrival rate, and the average database service rate; the region where the arrival rate exceeds the service rate is marked as the instability region]

[Figure: an Annotator feeding events at times 1, 2, 3 through a Stream Buffer into XML storage]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007
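The role of ρ = λ/μ can be made concrete with a short calculation. The sketch below additionally assumes an M/M/1 queue (exponential service times, a single server), which the slide does not state; under that assumption the mean number of requests in the system is ρ / (1 - ρ) and grows without bound as ρ approaches 1. The function names are hypothetical.

```python
def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu."""
    return arrival_rate / service_rate


def mean_backlog(arrival_rate: float, service_rate: float) -> float:
    """Mean number of requests in an M/M/1 system; infinite when rho >= 1."""
    rho = utilization(arrival_rate, service_rate)
    if rho >= 1:
        return float("inf")   # the instability region: the queue never drains
    return rho / (1 - rho)
```

For example, at λ = 50 and μ = 100 events/sec the mean backlog is 1 request, while any burst that holds λ above μ pushes the database into the unstable regime, which is what the stream buffer is meant to absorb.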


Event Management Service (4) The Stream Buffer

Approach
  - Keep the Relational-XML database in the design, for extensibility and efficient query mechanisms
  - Avoid driving the database into the instability region
  - Throttle the traffic arriving at the database

[Figure: an analysis graph (PE1 to PE5, SPE) issues storage requests (events of type "eventStore") to a Database PE containing a Traffic Monitor, Traffic Models, a Traffic Predictor, and an Event Store Load Monitor; the Database PE is backed by the System S STG (file system store) and the Event Store (Relational-XML); the analysis graph output continues towards the delivery system]
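The throttling idea can be sketched as a buffer that releases a bounded number of storage requests per time step. This is a simplification under stated assumptions: the release rate is a fixed constant here, whereas the slide's design derives it from traffic models, a traffic predictor, and a load monitor; the class and method names are hypothetical.

```python
from collections import deque


class StreamBuffer:
    """Absorbs bursts and releases at most `max_per_tick` events per tick."""

    def __init__(self, max_per_tick: int):
        self.max_per_tick = max_per_tick
        self.pending = deque()

    def submit(self, event) -> None:
        self.pending.append(event)       # bursts accumulate here, not in the DB

    def drain(self) -> list:
        """Return the events to hand to the database during this tick."""
        n = min(self.max_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]
```

A burst of five events with `max_per_tick=2` reaches the database as batches of 2, 2 and 1, so the database never sees more than its sustainable rate.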


Century Provenance Service

[Figure: Century server infrastructure diagram, showing where the Provenance Service (Provenance Storage Manager, Provenance Query Manager, Provenance Query Service, Provenance Access Control, Resolver, Provenance Metadata) sits among the USN Gateway, Toolkit, portals, Portal Service, Event Management Service, Analysis Framework, Platform Service, and storage components]


An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program:

  Urgent Alert
  Patient: Doe, John
  Condition: Abnormal Reaction
  Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).


Provenance (1) Two Approaches to Provenance

What is process provenance?
  • The ability to retrace which "system components" were on the data path
  - Illustration: PEs (PE1, PE2; algorithms f1, f2, f3) report to a Provenance Server/Store, e.g., "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." and "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
  • The ability to retrace which "data elements" led to the generation of output data
  - Illustration: PEj consumes sensor data elements Data1 and Data2 and produces DataOut4: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."


Provenance (2) Prior Work on Provenance

[Chart axes: type of provenance reconstruction (process provenance, data provenance) vs. event and processing rates]

EU Provenance (PASOA, PreServ): captures workflow bindings

Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance

Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations

Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level

File Systems (PASS, LFS): captures system calls and modifications to file records; annotation per file

Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput

IBM Research

copy 2006 IBM Corporation

Provenance Challenges Unique to Streams

Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation

is very heavyweight

Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput

Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data

Dynamic Stream Bindings Streams may be ephemeral as a consequence the

transformation graph may vary with time

We need a provenance system that

Is reactivemdashgenerating provenance events only when things change

Does not require annotating large amounts of metadata per sensor sample

Can accommodate variations in the statefulness model

We need to support both

Process provenance Show me the set of components that are involved in the generation of alert Ei

Data provenance Show me stream segments related to the generation of alert Ei

IBM Research

copy 2006 IBM CorporationIBM Research

TVC Model of Hybrid Provenance

Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data

element but remains constant for the same processing logic

Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input

streams) within a designated interval Δ

( ) ( ) UComponent

tttouti

jjjki inps

input to is

ΔminusisinlArr

Processing Element

Si(t) Si(t-1) Si(t-2)

Output stream Si

Sj(t)Sj(t-1)

Sj(t-2)Sj(t-3)

Sk(t)Sk(t-1) Sk(t-2)

Sk(t-3)

Input stream Sj

Input stream Sk

Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2

Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)

ldquogymrdquo1

Dependency RulePE State

Rule Block ID

TVC Dependency Ruleblocks

State defined by Location Context

The dependency equation can however vary based on

ndash The PErsquos internal state or some external context

IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)

ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)

ndash Solution is to allow multiple functions each associated with a distinct range of states

ndash Store the state as metadata with the data

IBM Research

copy 2006 IBM CorporationIBM Research

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is

distributed across multiple DBsndash Tradeoff faster provenance storage

processing for higher provenance reconstruction overhead

Determine ID of input ports and causative stream bindings

Retrieve appropriate data elements within specific timesequence window specified by TVC Equation

Apply TVC filter to obtain causative set of elements

PC Dependency Table

PEj f1() Po1

Dynamic Stream Mapping Table

t=10 S1 Pi1t=15 S15 I4t=16 S17 O6

Data Store)

element45 ts=15 SID= 12 state=ldquogymrdquo

element66 ts=16 SID= 17 state=lsquohomersquo

Stream and Port Mapping (SAPM)

Provenance Query Server

1 Store bindings between every input output port and associated streams

2 Store creation modification times of PEs

3 Store TVC function f() for every output port input port

Query (Ei)

1 Obtain IO stream mappings

2 Retrieve Dependency equation

4 Apply equation to determine set of causativeelements

Return set of elements

3 Retrieve elements

M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007

IBM Research

copy 2006 IBM CorporationIBM Research

Key Research Benefits of TVC Model for Provenance

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
- Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
- Statefulness: time and value functions allow compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

Other provenance systems cannot handle O(100 Kevents/sec) system throughput

Quality of Events in Century

[Figure: Century server infrastructure block diagram. Labeled components include the USN Gateway (Data Transformer, Sensor Data Sender, Sensor Data Reception), Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library), Patient, Administrator, and Stakeholder Portals, Portal Service, Platform Service, Event Management Service, Provenance Service, Analysis Framework and Analysis Applications, QoE Manager, State Manager, Group Manager, Privacy Manager, Authentication & Authorization, Interoperability Container (IHE Adapter), and Storages (Event Store, State Data, Provenance Metadata, Patient/Device/Application/Stakeholder Info).]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
- Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
- Modification of analysis logic: resolution of an analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
- Dynamic rebinding of stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE consuming timestamped input streams with out-of-order and missing elements and emitting an ALERT at time t. Callouts: "What generated this alert, and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (action: replace sensor); "Stream elements are missing (possibly due to network issues)".]

Century QoE Some Basic QoE Measures

[Figure: three PE input timelines, each producing an output at time t.]

1. Expected vs. received input (expected: t-5, t-4, t-3, t-2, t-1, t; current: t-5, t-2, t)
By correlating what was expected with what was received, we can estimate the quality of our input. In this particular case, we can say that the output of the PE has low quality, since half of the expected input is missing.

2. Timestamp correlation across inputs (one input: t-12, t-11, t-10; the other: t-5, t-2, t)
Another possibility is to correlate the timestamps of the inputs to estimate the quality. In this particular case, we can say that the output of the PE has low quality, since the two inputs are out of sync.

3. Consumed vs. "useful" input (current input: t-5, t-4, t-3, t-2, t-1, t; "useful" input: t)
Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). Here, we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial, due to the small fraction of useful input stream elements.
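The three measures on this slide can be written down as simple ratios and gaps. The sketch below is one possible reading, not the Century QoE Manager: the function names and the choice of "newest timestamp" as the synchronization reference are assumptions made for illustration.

```python
from typing import Sequence


def completeness(expected: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of the expected timestamps that actually arrived."""
    if not expected:
        return 1.0
    got = set(received)
    return sum(1 for t in expected if t in got) / len(expected)


def sync_gap(a: Sequence[int], b: Sequence[int]) -> int:
    """Gap between the newest timestamps of two inputs (0 means in sync)."""
    return abs(max(a) - max(b))


def useful_fraction(consumed: Sequence[int], useful: Sequence[int]) -> float:
    """Fraction of consumed elements that actually triggered the output."""
    if not consumed:
        return 0.0
    used = set(useful)
    return sum(1 for t in consumed if t in used) / len(consumed)


# The slide's three cases, with t = 100 chosen arbitrarily.
t = 100
window = [t - 5, t - 4, t - 3, t - 2, t - 1, t]
case1 = completeness(window, [t - 5, t - 2, t])              # half is missing
case2 = sync_gap([t - 12, t - 11, t - 10], [t - 5, t - 2, t])  # 10 ticks apart
case3 = useful_fraction(window, [t])                         # 1 of 6 was useful
```

A downstream component could compare such scores against a threshold to decide whether to discount an alert or rebind to another source; what threshold to use is left open here, as it is on the slide.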

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
- Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
- HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors

Sensors have different time-varying QoE
- QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
- User-specified models may not always be available: learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
- The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions
- How to compute QoE of medical streams, and how to refine it based on the QoE of other streams
- How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
- How to develop a hybrid of model-based and replay-based provenance
- How to define context-dependent access-control models of provenance data in a multi-provider environment

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
- Key features include adaptive event filtering, personalization, and predictive data transmission

Server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
- Key features include high-volume stream support, provenance, and event/information quality

  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

HARMONI Functional Architecture

[Figure: HARMONI functional architecture. A sensor connects over a short-range wireless link to the mobile device, which connects over a wireless link to the Internet to the remote server. Labeled blocks include Data Collection / Data Adapters, Data Processing, Light-Weight Pattern Recognition Engine, Rule Manager, Action Triggering Mechanism, Intelligent Data Transmission, Anticipation Mechanism, User Interface, Context, Sensor Readings, Sensor Control, Device Resources, DB, Pattern Recognition Engine, Pattern Learning Engine, Rule Server, TAPAS, External Action Triggering Mechanism, External Context Sources, External Rule Specifications, and External Action Specifications.]

Simple recognition of pre-specified temporal patterns across sensor streams

Event engine driven by Deterministic Finite Automata (DFAs)

Actions associated with rules include data transmission, data transformation, and local triggers (alarms, reminders)

Compressed transmission to server when an interface is available AND "anticipation" indicates "OK-to-Send"
- Use of intelligent compression schemes to reduce the volume of traffic

Store-and-forward otherwise
- Data stored in memory buffer and on FLASH storage

Rule Manager coordinates with backend server to download currently active or applicable rules
- Can run, activate, or deactivate multiple rules simultaneously and dynamically
- Dynamic downloading of rules provides implicit "context awareness" at client
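The transmit-or-store decision described above can be sketched as follows. This is a toy model under stated assumptions rather than HARMONI code: the `Uplink` class and its fields are hypothetical, the "server" is just a list, and `zlib` (DEFLATE) stands in for whatever compression scheme the device actually uses.

```python
import zlib
from collections import deque
from typing import Deque, List


class Uplink:
    """Buffers readings and sends them compressed when conditions allow."""

    def __init__(self) -> None:
        self.pending: Deque[bytes] = deque()  # store-and-forward buffer
        self.sent: List[bytes] = []           # stands in for the remote server

    def submit(self, reading: bytes, interface_up: bool, ok_to_send: bool) -> None:
        """Queue a reading; flush the whole buffer only if both checks pass."""
        self.pending.append(reading)
        if interface_up and ok_to_send:
            batch = b"\n".join(self.pending)
            self.sent.append(zlib.compress(batch))  # one compressed transmission
            self.pending.clear()


uplink = Uplink()
uplink.submit(b"hr=70", interface_up=False, ok_to_send=True)   # stored
uplink.submit(b"hr=72", interface_up=True, ok_to_send=False)   # stored
uplink.submit(b"hr=75", interface_up=True, ok_to_send=True)    # all three sent
```

Batching the stored readings into one compressed payload is a design choice of this sketch; it gives the compressor more repeated context than sending each reading on its own.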

Pattern Recognition Engine Implementation

Rules are specified using EventScript, which is similar to regular expressions

A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA)

We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications

Effectively, we are running finite state machines with some tweaks
- Unique memory heap for each DFA, accessible across states
- Run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA
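A table-driven DFA with a private heap and per-transition actions can be sketched as below. The sketch does not reproduce EventScript or the on-device engine: the `DFA` class, the `remember_peak` action, the three-readings rule, and the choice to clear the heap on reset are all assumptions, and plain Python callables stand in for the LISP/Scheme instructions.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

Action = Optional[Callable[[dict, int], None]]


class DFA:
    """Table-driven DFA with a private heap shared across its states."""

    def __init__(self, start: str,
                 transitions: Dict[Tuple[str, str], Tuple[str, Action]],
                 accepting: Set[str]) -> None:
        self.start = start
        self.state = start
        self.transitions = transitions
        self.accepting = accepting
        self.heap: dict = {}

    def feed(self, symbol: str, value: int) -> bool:
        """Consume one event; return True if an accepting state is reached."""
        key = (self.state, symbol)
        if key not in self.transitions:
            # Unlisted pairs reset the automaton (and, in this sketch, its heap).
            self.state = self.start
            self.heap.clear()
            return False
        self.state, action = self.transitions[key]
        if action is not None:
            action(self.heap, value)  # arbitrary instruction run on transition
        return self.state in self.accepting


def remember_peak(heap: dict, hr: int) -> None:
    """Transition action: track the highest reading seen in the current run."""
    heap["peak"] = max(heap.get("peak", hr), hr)


# Hypothetical rule: three consecutive "high" heart-rate readings.
RULE: Dict[Tuple[str, str], Tuple[str, Action]] = {
    ("s0", "high"): ("s1", remember_peak),
    ("s1", "high"): ("s2", remember_peak),
    ("s2", "high"): ("alarm", remember_peak),
}


def run(readings: Sequence[int], threshold: int = 140):
    """Return (index, peak) of the first match, or None if the rule never fires."""
    dfa = DFA("s0", RULE, {"alarm"})
    for i, hr in enumerate(readings):
        if dfa.feed("high" if hr > threshold else "normal", hr):
            return i, dfa.heap["peak"]
    return None
```

With the readings `[120, 150, 130, 145, 150, 160]`, the single high value at index 1 is discarded by the reset at index 2, and the rule fires at index 5 with a peak of 160.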

[Figure: rule compilation and deployment. Labels: EventScript, Compiler, Desktop, DFA, Rule Server, Remote Server, Rule Manager, Event Engine, 770.]

Context-Based Filtering

HARMONI Implementation Platform

Nokia 770 Internet tablet / N800
- ARM processor, Linux-based
- High-resolution display (800x480), touch screen with up to 65,536 colors
- 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB; can be used for virtual memory)
- Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
- Relatively cheap: $350
- http://www.maemo.org provides open-source software and development environment
- Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor
- Provides heart rate and oxygen saturation
- Supports Bluetooth Serial Port Profile (SPP)
- 120 hours of continuous operation with 2 AA batteries
- Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis accelerometer
- Output baud of 57.6 Kbps
- 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome, and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008

Impact of HARMONI on Transmission Bandwidth Idealized Context

[Figure: "Data Generated From Sensor": heart rate (bpm, axis 0-200) versus sample number (1 to 9433).]

Compression (S2) by itself results in > 50% reduction in bandwidth consumption

Filtering (S3) results in 85% reduction in bandwidth consumption

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor

[Figure: data transmitted over network (in KB, axis 0-80) by context assumed (None, Office, Gym, Office/Gym), uncompressed versus with LZ-78 compression. Bar labels as extracted (decimal points lost): 7030, 2906, 4890, 1959, 2964, 1702, 1454, 2204.]

Result 2 Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute, axis 0-200) versus time.]

[Figure: data transmitted over network (in KB, axis 0-100) for User 2-1 and User 2-2 under four schemes: uncompressed; compressed; uncompressed with generic context filtering; uncompressed with personalized context filtering. Bar labels as extracted (decimal points lost): 9510, 6500, 3494, 2302, 2018, 1902, 1184, 926.]

Sensor streams exhibit significant statistical variation across individuals

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor

HARMONI in Practice Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
- Higher 'normal' threshold for the 3 distinct states (0-70, 70-110, 110-170) across all users
- Aside: accelerometer readings are also used to classify 'falls' for elderly patients

Bandwidth savings between 26-73% for our sample population

Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance, and QoE

Century Research Goals and Technologies

Our research goal is to build an online health analytics infrastructure that supports the various services of the healthcare ecosystem

Challenges
- Integrating massive volumes of disparate data
- Need for sophisticated analytics
- Growing collaboration across ecosystem
- Integrity of data and tracking of the analytic data
- Timely feedback
- Large number of concurrent users
- Privacy and security

Architectural Goals
- Extensible data model, scalable infrastructure
- Open and scalable analysis framework
- Enterprise sharing of sensor data
- Quality of Events (QoE) and provenance support for backtracking and auditing
- Real-time processing
- Scalable device and user management
- Protection of personal medical data

Technologies & Approaches
- Extensible Stream Storage System
- Distributed Stream Analysis Runtime
- Quality of Event
- Hybrid Provenance System
- Interoperability
- Privacy & Security Technology
- Stream Processing Platform

IBM Research

© 2007 IBM Corporation

Analysis Framework (1) The Approach

Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today
- Closed: custom-built for a specific diagnosis
- Non-extensible: support for limited sensor devices
- Small patient footprint: only good for trials

New Requirements
Extensible
- To new data sources (e.g., EMG, EEG, accelerometer, activity)
- To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)
Scalability
- Handles large numbers of patients
Robustness
- Resilient to sensor or infrastructure failures (exacerbated in remote monitoring settings)
Easy to Use
- Separate medical domain knowledge from IT knowledge

Our Innovations
Framework built on top of IBM System S
- State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
- PEs represent atomic stream transformers
- Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
- Persistence of state (e.g., analyze HR variations over 6 months)
- Management of patient data (e.g., patient's intermediate diagnosis)
- Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
- Currently supporting ECG, BP, glucose, weight, SpO2, temperature

[Figure: example System S analysis jobs, each a graph of analysis components (PEs) fed by data sources and source PEs.]

Major Practical Benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)

Analysis Framework (2) PE Example

- PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
- When an SDO is received, System S routes the SDO to the appropriate PE
- Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs

[Figure: example PE graph. Labels: Input ECG, Normalization, Fast Fourier Transform, AR PE, Alert Generator, Output Alert; note: "Neural network trained during initialization".]

Century Event Management Service

[Figure: Century server infrastructure block diagram, showing the Event Management Service (Event Preprocessor, Event Filter/Annotator, Event Storage Manager, Event Store Query Service) among the other server components.]

Event Management Service (1)

The Problem
- Persistence of a large amount of streaming data
- Efficient query mechanisms for stakeholders

Requirements
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language

Potential Approaches
1. Relational DB (Annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based Annotation data model): fully extensible, but does not scale to a high event rate

[Figure: for each approach, an Annotator writing timestamped events (time 1, 2, 3) into the store; the second approach stores them as XML.]

Event Management Service (3) Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate

Let μ be the service rate of the database
Let λ be the arrival rate at the database
Let ρ = λ/μ

[Figure: arrival rate versus time, showing the bursty traffic, its average arrival rate, and the average database service rate, with the instability region marked. An Annotator writes events (time 1, 2, 3) through a Stream Buffer into the XML store.]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007

Event Management Service (4) The Stream Buffer

[Figure: stream buffer design. Labels: Analysis Graph (PE1 to PE5, SPE), Storage Requests (events of type "eventStore"), Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, System S STG (File System Store), Database PE, Event Store (Relational-XML), "Towards the delivery system".]

Approach
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database
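The throttling idea can be illustrated with a small discrete-time simulation. It is a sketch under simplifying assumptions, not the Century stream buffer: time advances in integer ticks, the database serves a fixed number of events per tick, and the names `utilization` and `throttle` are hypothetical. The only formula used is the utilization ρ = λ/μ, with ρ < 1 as the stability condition.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu; the database queue is stable only if rho < 1."""
    return arrival_rate / service_rate


def throttle(arrivals: Iterable[int], service_rate: int) -> Tuple[List[int], int]:
    """Forward at most `service_rate` events per tick; buffer the rest.

    Returns the per-tick load offered to the database and the peak backlog.
    """
    buffer: Deque[Tuple[int, int]] = deque()
    offered: List[int] = []
    peak = 0
    for tick, count in enumerate(arrivals):
        buffer.extend((tick, k) for k in range(count))  # enqueue this tick's burst
        sent = min(service_rate, len(buffer))
        for _ in range(sent):
            buffer.popleft()                            # FIFO hand-off to the DB
        offered.append(sent)
        peak = max(peak, len(buffer))
    return offered, peak


# A bursty trace whose average rate (2 per tick) is below the service rate (4).
trace = [0, 9, 0, 0, 3, 0]
rho = utilization(sum(trace) / len(trace), 4)
offered, peak_backlog = throttle(trace, 4)
```

Here the burst of 9 events is spread over three ticks, so the database never sees more than 4 events per tick, at the price of a temporary backlog of 5 events in the buffer.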

Century Provenance Service

[Figure: Century server infrastructure block diagram, showing the Provenance Service (Provenance Storage Manager, Provenance Query Manager, Provenance Query Service, Provenance Access Control, Resolver, Provenance Metadata) among the other server components.]

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

"That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1) Two Approaches to Provenance

What is process provenance?
- The ability to retrace which "system components" were on the data path
- Example reports sent by PEs to the Provenance Server/Store: "I ran algorithm f1 with parameters u, v, and w at time t0. I published on port 1." "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
- The ability to retrace which "data elements" led to the generation of output data
- Example annotation by PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

[Figure: two panels. Process provenance: PE1 and PE2, running algorithms f1, f2, f3, report to a Provenance Server/Store. Data provenance: sensors feed Data1 and Data2 into PEj, which emits DataOut4.]

Provenance (2) Prior Work on Provenance

[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance to data provenance) and by event and processing rates.]

EU Provenance (PASOA, PreServ)
- Captures workflow bindings

Scientific SOA Workflows (KARMA)
- Workflow and application binding notifications
- Stateless, application-defined data provenance

Database View Inversion (Trio, Widom)
- Specific transforms and extensions to SQL
- Data dependencies specified statically for relations

Stream Provenance (Simhan)
- Store temporal history of stream connectors as a per-stream stack
- System-level automatic metadata collection at stream level

File Systems (PASS, LFS)
- Captures system calls and modifications to file records
- Annotation per file

Medical Stream Provenance (Century)
- Arbitrary logic for individual PEs (statefulness and long memory)
- Dynamic stream bindings
- High processing throughput

Provenance Challenges Unique to Streams

Storage Efficiency: the high volume of data implies that "per-record" annotation is very heavyweight

Processing Efficiency: generation of timestamps and large amounts of metadata per stream event will reduce system throughput

Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data

Dynamic Stream Bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model

We need to support both
- Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
- Data provenance: "Show me stream segments related to the generation of alert Ei"

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
- The dependency is thus not specified on every data element, but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
- The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ out_i(t) \Leftarrow \bigcup_{j \,:\, inp_j \text{ is input to Component}} \{\, s_j(t_k) : t_k \in (t - \Delta_j,\ t) \,\} $$

[Figure: a Processing Element with input stream Sj (elements Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and input stream Sk (elements Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), producing output stream Si (elements Si(t), Si(t-1), Si(t-2)).]

The dependency equation can, however, vary based on
- The PE's internal state or some external context

IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)

- Solution is to allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data

TVC Dependency Rule Blocks (state defined by location context)
Rule Block 1, PE state "gym": Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
Rule Block 2, PE state "home": Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
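The state-dependent rule blocks on this slide can be sketched as a small lookup from PE state to a window function. The code is illustrative only: `RULE_BLOCKS`, `causative_set`, the integer timestamps, and the reading of Sj(a, b) as the inclusive window [a, b] are assumptions, not part of the published TVC model.

```python
from typing import Callable, Dict, List, Set, Tuple

# A dependency rule maps an output time t to (stream, from, to) windows.
Rule = Callable[[int], List[Tuple[str, int, int]]]

# Hypothetical encoding of the two rule blocks, keyed by PE state.
RULE_BLOCKS: Dict[str, Rule] = {
    "gym": lambda t: [("Sj", t, t), ("Sj", t - 3, t - 2), ("Sk", t - 2, t - 1)],
    "home": lambda t: [("Sj", t - 2, t), ("Sk", t - 1, t)],
}


def causative_set(t: int, state: str,
                  streams: Dict[str, Set[int]]) -> List[Tuple[str, int]]:
    """Return the (stream, timestamp) pairs that output Ei(t) depends on.

    `state` is the PE state stored as metadata with the output element;
    `streams` maps each stream name to the timestamps actually stored.
    """
    result: List[Tuple[str, int]] = []
    for name, lo, hi in RULE_BLOCKS[state](t):
        result += [(name, ts) for ts in sorted(streams[name]) if lo <= ts <= hi]
    return result


stored = {"Sj": set(range(6, 11)), "Sk": set(range(6, 11))}
gym = causative_set(10, "gym", stored)
home = causative_set(10, "home", stored)
```

Only the state label travels with each output element; the windows themselves are computed from the rule block at query time, which is what keeps the per-element metadata small.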

IBM Research

copy 2006 IBM CorporationIBM Research

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is

distributed across multiple DBsndash Tradeoff faster provenance storage

processing for higher provenance reconstruction overhead

Determine ID of input ports and causative stream bindings

Retrieve appropriate data elements within specific timesequence window specified by TVC Equation

Apply TVC filter to obtain causative set of elements

PC Dependency Table

PEj f1() Po1

Dynamic Stream Mapping Table

t=10 S1 Pi1t=15 S15 I4t=16 S17 O6

Data Store)

element45 ts=15 SID= 12 state=ldquogymrdquo

element66 ts=16 SID= 17 state=lsquohomersquo

Stream and Port Mapping (SAPM)

Provenance Query Server

1 Store bindings between every input output port and associated streams

2 Store creation modification times of PEs

3 Store TVC function f() for every output port input port

Query (Ei)

1 Obtain IO stream mappings

2 Retrieve Dependency equation

4 Apply equation to determine set of causativeelements

Return set of elements

3 Retrieve elements

M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007

IBM Research

copy 2006 IBM CorporationIBM Research

Key Research Benefits of TVC Model for Provenance

1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases

TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata

per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)

ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)

ndash Statefulness- time and value functions allow compact representation of long dependency windows

ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)

Other provenance systems cannot handle O(100 Keventssec) system throughput

IBM Research

copy 2006 IBM CorporationIBM Research

Quality of Events in Century

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM CorporationIBM Research

PE

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding

However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or

persistent)ndash Transmission impairments Loss of stream elements or

delayorder reversal of streams

Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or

adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis

component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality

ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality

t-3 t-5t-4t+1

t t-5t-1t+1

What generated this alert and

can we trusted

ALERT

t

t-2t-1t

t-3 t-4t-2

t-16t-14t-13tt+1

The sensor is faulty and generates

stream elements of low quality

Stream elements are missing

(possibly due to network issues)

Replace

sensor

Century QoE: Some Basic QoE Measures

By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: PE at time t. The input that was expected: t-5, t-4, t-3, t-2, t-1, t. The current input: t-5, t-2, t.]

Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: PE at time t with one input at t-12, t-11, t-10 and the other at t-5, t-2, t.]

Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
[Figure: PE at time t. The current input: t-5, t-4, t-3, t-2, t-1, t. The "useful" input: t.]
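The three measures above can be sketched in a few lines of Python. This is a minimal illustration, not Century's implementation: the function names, the use of integer timestamps, and the choice of ratios as scores are all assumptions made for the example.

```python
from typing import Iterable, Sequence


def completeness(expected: Iterable[int], received: Iterable[int]) -> float:
    """Fraction of the expected timestamps that actually arrived."""
    expected_set = set(expected)
    if not expected_set:
        return 1.0
    return len(expected_set & set(received)) / len(expected_set)


def sync_skew(input_a: Sequence[int], input_b: Sequence[int]) -> int:
    """Gap between the newest timestamps of two inputs (0 means in sync)."""
    return abs(max(input_a) - max(input_b))


def useful_fraction(consumed: Iterable[int], useful: Iterable[int]) -> float:
    """Fraction of the consumed elements that really triggered the output."""
    consumed_set = set(consumed)
    if not consumed_set:
        return 0.0
    return len(consumed_set & set(useful)) / len(consumed_set)


# Hypothetical run with t = 100, mirroring the three cases on the slide.
t = 100
expected = range(t - 5, t + 1)                       # t-5 ... t
score_missing = completeness(expected, [t - 5, t - 2, t])
score_skew = sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t])
score_useful = useful_fraction(expected, [t])
```

With these inputs, half of the expected elements are present, the two inputs are 10 time units apart, and only one of six consumed elements was useful, so all three measures point to a low-quality output.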

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
- Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
- HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2, and BP monitors

Sensors have different time-varying QoE
- QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
- User-specified models may not always be available: learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
- The right to access provenance may depend on the current 'role' of the provider and the user's medical history

How to compute QoE of medical streams, and how to refine them based on QoE of other streams
How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
How to develop a combination of hybrid model-replay based provenance
How to define context-dependent access-control models of provenance data in a multi-provider environment

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relay: an extended part of the analytic infrastructure
- Key features include adaptive event filtering, personalization, and predictive data transmission

Server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
- Key features include high-volume stream support, provenance, and event/information quality

  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

Pattern Recognition Engine Implementation

Rules are specified using EventScript, which is similar to regular expressions
A compiler (on a desktop computer) translates EventScript into a specification for a Deterministic Finite Automaton (DFA)
We implemented a run-time engine on the mobile device that takes input events and "executes" DFA specifications
Effectively, we are running finite state machines with some tweaks
- Unique memory heap for each DFA, accessible across states
- Run-time LISP/Scheme interpreter allows arbitrary instructions to be carried out upon initialization and state transitions in the DFA

[Figure: rule deployment path. Labels: EventScript, Compiler, DFA, Rule Server, Rule Manager, Event Engine, Desktop, 770, Remote Server.]
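The run-time engine described above can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: EventScript and its compiler are not reproduced, the DFA specification is a hand-written transition table, and plain Python callables stand in for the LISP/Scheme snippets run on initialization and on state transitions. All names (`DfaEngine`, `record`, `reset`, the state labels) are hypothetical.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Heap = Dict[str, float]
Action = Callable[[Heap, float], None]
# (state, event label) -> (next state, optional action run on the transition)
Transitions = Dict[Tuple[str, str], Tuple[str, Optional[Action]]]


class DfaEngine:
    """Executes one DFA specification; each engine owns its own heap."""

    def __init__(self, start: str, accepting: Set[str],
                 transitions: Transitions,
                 on_init: Optional[Action] = None) -> None:
        self.state = start
        self.accepting = accepting
        self.transitions = transitions
        self.heap: Heap = {}          # memory shared across all states
        if on_init is not None:
            on_init(self.heap, 0.0)   # initialization hook

    def feed(self, label: str, value: float) -> bool:
        """Consume one event; return True while in an accepting state."""
        step = self.transitions.get((self.state, label))
        if step is not None:
            next_state, action = step
            if action is not None:
                action(self.heap, value)
            self.state = next_state
        return self.state in self.accepting


def reset(heap: Heap, _value: float) -> None:
    heap["count"] = 0.0
    heap["peak"] = 0.0


def record(heap: Heap, value: float) -> None:
    heap["count"] += 1
    heap["peak"] = max(heap["peak"], value)


# Hypothetical rule: alert after three consecutive "high" heart-rate events.
RULE: Transitions = {
    ("idle", "high"): ("one", record),
    ("one", "high"): ("two", record),
    ("two", "high"): ("alert", record),
    ("alert", "high"): ("alert", record),
}
for state in ("idle", "one", "two", "alert"):
    RULE[(state, "normal")] = ("idle", reset)

engine = DfaEngine("idle", {"alert"}, RULE, on_init=reset)
readings = [120.0, 150.0, 155.0, 160.0, 130.0]
alerts = [engine.feed("high" if hr > 140 else "normal", hr) for hr in readings]
```

Keeping the heap on the engine rather than on individual states is what lets an action in one state (here `record`) build up values that a later state can read.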

HARMONI Implementation Platform

Nokia 770 Internet tablet / N800
- ARM processor, Linux-based
- High-resolution display (800x480), touch screen with up to 65536 colors
- 64-128 MB RAM, 64 MB FLASH storage (expandable up to 1 GB ... can be used for virtual memory)
- Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
- Relatively cheap: $350
- http://www.maemo.org provides open-source software and development environment
- Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor
- Provides heart rate and oxygen saturation
- Supports Bluetooth Serial Port Profile (SPP)
- 120 hours of continuous operation with 2 AA batteries
- Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis accelerometer
- Output baud of 57.6 Kbps
- 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome, M. Ebling, and A. Misra, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.

Impact of HARMONI on Transmission Bandwidth: Idealized Context

[Figure: "Data Generated From Sensor". x-axis: Sample (1 to 9433); y-axis: Heart Rate (bpm), 0 to 200.]

[Figure: bar chart. x-axis: Context Assumed (None, Office, Gym, Office/Gym); y-axis: Data Transmitted Over Network (in KB), 0 to 80; series: Uncompressed, With LZ-78 Compression. Bar labels: 70.30, 29.06, 48.90, 19.59, 29.64, 17.02, 14.54, 22.04.]

Compression (S2) by itself results in > 50% reduction in bandwidth consumption

Filtering (S3) results in 85% reduction in bandwidth consumption

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating point values in LZ

Result 2: Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals". x-axis: Time; y-axis: Heart Rate (beats per minute), 0 to 200.]

[Figure: bar chart for User 2-1 and User 2-2. y-axis: Data Transmitted Over Network (in KB), 0 to 100; series: Uncompressed; Compressed; Uncompressed, Generic Context-filtering; Uncompressed, Personalized Context-filtering. Bar labels: 95.10, 65.00, 34.94, 23.02, 20.18, 19.02, 11.84, 9.26.]

Sensor streams exhibit significant statistical variation across individuals

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating point values in the LZ compressor

HARMONI in Practice: Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
- Higher 'normal' threshold for 3 distinct states (0-70, 70-110, 110-170) across all users
- Aside: accelerometer readings are also used to classify 'falls' for elderly patients

Bandwidth savings between 26-73% for our sample population
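A minimal sketch of sensor-based context filtering follows, in Python. Several things here are assumptions rather than facts from the slides: the three ranges (0-70, 70-110, 110-170) are read as per-state 'normal' heart-rate bands in bpm, the accelerometer cut-offs `WALK_AMPLITUDE` and `RUN_AMPLITUDE` are invented for illustration, and a reading is transmitted only when it exceeds the upper bound for the current state.

```python
from typing import Iterable, List, Tuple

# Assumed reading of the slide: "normal" heart-rate band (bpm) per state.
NORMAL_HR = {"sitting": (0, 70), "walking": (70, 110), "running": (110, 170)}

# Hypothetical accelerometer-amplitude cut-offs; not taken from the source.
WALK_AMPLITUDE = 1.2
RUN_AMPLITUDE = 2.5


def classify_activity(amplitude: float) -> str:
    """Map accelerometer amplitude to one of the three activity states."""
    if amplitude >= RUN_AMPLITUDE:
        return "running"
    if amplitude >= WALK_AMPLITUDE:
        return "walking"
    return "sitting"


def filter_readings(
    samples: Iterable[Tuple[float, float]]
) -> List[Tuple[str, float]]:
    """Keep only (state, hr) pairs whose HR exceeds the state's upper bound."""
    kept = []
    for amplitude, hr in samples:
        state = classify_activity(amplitude)
        _low, high = NORMAL_HR[state]
        if hr > high:
            kept.append((state, hr))
    return kept


# (accelerometer amplitude, heart rate) pairs; values are made up.
samples = [(0.5, 65.0), (0.6, 90.0), (1.5, 100.0),
           (1.6, 120.0), (3.0, 150.0), (3.1, 180.0)]
kept = filter_readings(samples)
savings = 1 - len(kept) / len(samples)
```

The same heart rate (for example 100 bpm) is dropped while walking but would be sent while sitting, which is the point of deriving the threshold from context rather than fixing it globally.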

Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE

Century: Research Goals and Technologies

Our research goal is to build an online health analytic infrastructure that supports the various services of the healthcare ecosystem

Challenges
- Integrating massive volumes of disparate data
- Need for sophisticated analytics
- Growing collaboration across ecosystem
- Integrity of data & tracking the analytic data
- Timely feedback
- Large number of concurrent users
- Privacy and security

Architectural Goals
- Extensible data model, scalable infrastructure
- Open and scalable analysis framework
- Enterprise sharing of sensor data
- Quality of Events (QoE) and provenance support for backtracking & auditing
- Real-time processing
- Scalable device and user management
- Protection of personal medical data

Technologies & Approaches
- Extensible Stream Storage System
- Distributed Stream Analysis Runtime
- Quality of Event
- Hybrid Provenance System
- Interoperability
- Privacy & Security Technology
- Stream Processing Platform

Analysis Framework (1): The Approach

Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today
- Closed: custom-built for specific diagnosis
- Non-extensible: support for limited sensor devices
- Small patient footprint: only good for trials

New Requirements
Extensible
- To new data sources (e.g., EMG, EEG, accelerometer, activity)
- To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)
Scalability
- Handles large numbers of patients
Robustness
- Resilient to sensor or infrastructure failures (exacerbated in remote monitoring settings)
Easy to use
- Separate medical domain knowledge from IT knowledge

Our Innovations
Framework built on top of IBM System S
- State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of a Processing Element (PE)
- PEs represent atomic stream transformers
- Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
- Persistence of state (e.g., analyze HR variations over 6 months)
- Management of patient data (e.g., patient's intermediate diagnosis)
- Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
- Currently supporting ECG, BP, glucose, weight, SpO2, temperature

[Figure: example analysis jobs, each a graph from a Data Source and Source PE through chains of analysis components. PE labels include WLG, DUP, NES, IPN, VBF, SPA, SPM, DFE, PSD, INQ, GUI, SGD, PCP, JAE, DSN, SCF, SSF, SLF, PFA, ES, N(CE)S^2, and several Join PEs.]

Major practical benefit: separates the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)

Analysis Framework (2): PE Example

- PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
- When an SDO is received, System S routes the SDO to the appropriate PE
- Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs

[Figure: example PE chain. Labels: Input ECG, Normalization, Fast Fourier Transform, AR PE, Alert Generator, Output Alert; note: "Neural network trained during initialization".]
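The division of labor described above (the developer writes only PE logic, the middleware routes SDOs) can be sketched in Python as follows. This is not the System S API: `Bus`, `add_pe`, and the dict-based SDO are hypothetical stand-ins, the chain order (normalization, Fourier transform, alert generation) is assumed from the figure labels, and a fixed frequency-bin rule replaces the trained neural network.

```python
import cmath
from typing import Any, Callable, Dict, List, Optional

SDO = Dict[str, Any]  # Stream Data Object, modelled here as a plain dict
Logic = Callable[[SDO], Optional[SDO]]


class Bus:
    """Toy pub-sub router standing in for the stream middleware."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[SDO], None]]] = {}

    def subscribe(self, stream: str, handler: Callable[[SDO], None]) -> None:
        self.subscribers.setdefault(stream, []).append(handler)

    def publish(self, stream: str, sdo: SDO) -> None:
        for handler in self.subscribers.get(stream, []):
            handler(sdo)


def add_pe(bus: Bus, in_stream: str, out_stream: str, logic: Logic) -> None:
    """Register a PE: the developer supplies only `logic`, never routing."""
    def on_sdo(sdo: SDO) -> None:
        result = logic(sdo)
        if result is not None:
            bus.publish(out_stream, result)
    bus.subscribe(in_stream, on_sdo)


def normalize(sdo: SDO) -> SDO:
    samples = sdo["samples"]
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    peak = max(abs(c) for c in centered) or 1.0  # avoid dividing by zero
    return {"samples": [c / peak for c in centered]}


def spectrum(sdo: SDO) -> SDO:
    """Naive discrete Fourier transform magnitudes for bins 0..n/2."""
    x = sdo["samples"]
    n = len(x)
    mags = [abs(sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i in range(n)))
            for k in range(n // 2 + 1)]
    return {"magnitudes": mags}


def detect(sdo: SDO) -> Optional[SDO]:
    """Placeholder rule: alert when the dominant bin is 3 or higher."""
    mags = sdo["magnitudes"]
    dominant = max(range(1, len(mags)), key=lambda k: mags[k])
    if dominant >= 3:
        return {"alert": "fast rhythm", "dominant_bin": dominant}
    return None


bus = Bus()
alerts: List[SDO] = []
add_pe(bus, "ecg", "normalized", normalize)
add_pe(bus, "normalized", "spectrum", spectrum)
add_pe(bus, "spectrum", "alerts", detect)
bus.subscribe("alerts", alerts.append)
bus.publish("ecg", {"samples": [0.0, 1.0, 0.0, -1.0] * 4})
```

Each logic function sees one SDO and returns one SDO (or nothing), so swapping the placeholder `detect` for a trained classifier would not touch any routing code.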

Century Event Management Service

[Figure: Century server infrastructure architecture. Components shown: USN Gateway (Data Transformer, Sensor Data Sender); Sensor Data Reception; Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library); Patient Portal, Administrator Portal, Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC); Portal Service; Solution Delivery Services; Analysis Applications; Analysis Framework; Event Management Service (Event Preprocessor, Event Filter, Annotator, Event Storage Manager, Event Store Query Service, Subscription Service); Provenance Service (Provenance Storage Manager, Provenance Query Manager, Provenance Query Service, Provenance Access Control, Resolver); Group Management Service (Group Manager, User Group Data Manager); Platform Service (QoE Manager, State Manager, Device Catalogue, Authentication & Authorization, Privacy Manager, Remote Access Manager); Service Data Management (Patient Data Manager, Device Data Manager, Job Application Manager); Interoperability Container (IHE Adapter); Storages (Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy).]

Event Management Service (1)

The Problem
- Persistence of large amounts of streaming data
- Efficient query mechanisms for stakeholders

Requirements
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language

Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure: for each approach, an Annotator writing events at times 1, 2, 3 into the store (as XML documents in the Relational-XML case).]

Event Management Service (3): Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate

Let μ be the service rate of the database
Let λ be the arrival rate at the database
Let ρ = λ/μ

[Figure: traffic arrival rate over time, plotted against the average arrival rate and the average database service rate, with the instability region marked. An Annotator writes events at times 1, 2, 3 through the Stream Buffer into XML storage.]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.

Event Management Service (4): The Stream Buffer

Approach
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database

[Figure: stream buffer design. An analysis graph (PE1 to PE5, SPE) issues storage requests (events of type "eventStore"). Labels: System S STG (File System Store), Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, Database PE, Event Store (Relational-XML database), "Towards the delivery system".]
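The throttling idea can be sketched as a simple buffer that absorbs bursts and releases at most a fixed number of events per tick to the database. This Python example is an illustration only: the class name `StreamBuffer`, the tick-based model, and the sample burst pattern are assumptions, and the traffic monitor and predictor from the figure are not modelled.

```python
from collections import deque
from typing import Deque, Iterable, List


def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu; the database is stable only while rho < 1."""
    return arrival_rate / service_rate


class StreamBuffer:
    """Queues storage requests and releases them at the database's pace."""

    def __init__(self, service_rate: int) -> None:
        self.service_rate = service_rate   # max events written per tick
        self.pending: Deque[int] = deque()

    def offer(self, events: Iterable[int]) -> None:
        self.pending.extend(events)

    def drain(self) -> List[int]:
        """Release up to `service_rate` events for this tick."""
        count = min(self.service_rate, len(self.pending))
        return [self.pending.popleft() for _ in range(count)]


# Hypothetical bursty arrivals per tick; the average is 1.5 events/tick.
bursts = [5, 0, 0, 1, 6, 0, 0, 0]
buffer = StreamBuffer(service_rate=2)
written_per_tick = []
next_id = 0
for burst in bursts:
    buffer.offer(range(next_id, next_id + burst))
    next_id += burst
    written_per_tick.append(len(buffer.drain()))

rho = utilization(sum(bursts) / len(bursts), buffer.service_rate)
backlog = len(buffer.pending)
```

Although individual bursts (5 and 6 events) exceed what the database can absorb in a tick, the database never sees more than 2 writes per tick, and because the average load keeps rho below 1 the backlog empties again.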

Century Provenance Service

[Figure: Century server infrastructure architecture diagram, including the Provenance Service (Provenance Storage Manager, Provenance Query Manager, Provenance Query Service, Provenance Access Control, Resolver) and its storages (State Data, Provenance Metadata, Event Store).]

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee's reaction: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1): Two Approaches to Provenance

What is process provenance?
- The ability to retrace which "system components" were on the data path
- Example reports sent by PEs to the Provenance Server/Store: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
- The ability to retrace which "data elements" led to generation of output data
- Example: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

[Figure: left, PE1 and PE2 running algorithms f1, f2, f3 and reporting to the Provenance Server/Store; right, two Sensors producing Data1 and Data2, which PEj combines into DataOut4.]

Provenance (2): Prior Work on Provenance

[Figure: prior systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates.]

EU Provenance (PASOA, PreServ)
- Captures workflow bindings

Scientific SOA Workflows (KARMA)
- Workflow and application binding notifications
- Stateless application-defined data provenance

Database View Inversion (Trio, Widom)
- Specific transforms and extensions to SQL
- Data dependencies specified statically for relations

Stream Provenance (Simhan)
- Store temporal history of stream connectors as a per-stream stack
- System-level automatic metadata collection at stream level

File Systems (PASS, LFS)
- Captures system calls and modifications to file records
- Annotation per file

Medical Stream Provenance (Century)
- Arbitrary logic for individual PEs (statefulness and long memory)
- Dynamic stream bindings
- High processing throughput

Provenance Challenges Unique to Streams

Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight

Processing efficiency: generation of timestamps and large amounts of metadata per stream event will reduce system throughput

Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data

Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model

We need to support both
- Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
- Data provenance: "Show me stream segments related to the generation of alert Ei"

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
- The dependency is thus not specified on every data element, but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
- The output stream of PE_i is a function of specific samples (from input streams) within a designated interval Δ

$$ out_i(t) \;\Leftarrow\; \bigcup_{s_j \text{ is input to Component } i} \{\, inp_j(t_k) : t_k \in [t - \Delta_j,\ t] \,\} $$

[Figure: a Processing Element with input stream Sj (Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), and output stream Si (Si(t), Si(t-1), Si(t-2)).]

The dependency equation can, however, vary based on the PE's internal state or some external context:
  IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
  ELSE generate ALERT iff AVG(HR(t-30, t)) > 90
- Solution is to allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data

TVC dependency rule blocks (state defined by location context):
Rule Block ID | PE State | Dependency Rule
1 | "gym" | Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2 | "home" | Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
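The state-dependent rule blocks can be encoded as data and expanded on demand. The following Python sketch assumes integer timestamps and represents each window as a pair of offsets back from t; `RULE_BLOCKS` and `causative_set` are hypothetical names, and only the "gym" and "home" rules shown on the slide are encoded.

```python
from typing import Dict, List, Set, Tuple

# (input stream, oldest offset, newest offset): covers t-oldest ... t-newest
Window = Tuple[str, int, int]

# Dependency rule per PE state, transcribed from the two rule blocks above.
RULE_BLOCKS: Dict[str, List[Window]] = {
    # Ei(t) <= Sj(t) U Sj(t-3, t-2) U Sk(t-2, t-1)
    "gym": [("Sj", 0, 0), ("Sj", 3, 2), ("Sk", 2, 1)],
    # Ei(t) <= Sj(t-2, t) U Sk(t-1, t)
    "home": [("Sj", 2, 0), ("Sk", 1, 0)],
}


def causative_set(state: str, t: int) -> Set[Tuple[str, int]]:
    """Expand the rule for `state` into concrete (stream, timestamp) pairs."""
    result: Set[Tuple[str, int]] = set()
    for stream, oldest, newest in RULE_BLOCKS[state]:
        for ts in range(t - oldest, t - newest + 1):
            result.add((stream, ts))
    return result


home_deps = causative_set("home", 10)
gym_deps = causative_set("gym", 10)
```

Only the state needs to be stored alongside each output element; the per-element dependency list is recomputed from the rule block when a query arrives.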

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to the corresponding query
- For efficiency, the relevant metadata is distributed across multiple DBs
- Tradeoff: faster provenance storage processing for higher provenance reconstruction overhead

Stored metadata (Stream and Port Mapping, SAPM)
1. Store bindings between every input/output port and associated streams
2. Store creation/modification times of PEs
3. Store TVC function f() for every (output port, input port)

Resolving Query(Ei) at the Provenance Query Server
1. Obtain I/O stream mappings: determine the IDs of input ports and causative stream bindings
2. Retrieve the dependency equation
3. Retrieve elements: the appropriate data elements within the specific time/sequence window specified by the TVC equation
4. Apply the equation (TVC filter) to determine the set of causative elements
Return the set of elements

Example table contents
PC Dependency Table: PEj | f1() | Po1
Dynamic Stream Mapping Table: t=10 | S1 | Pi1; t=15 | S15 | I4; t=16 | S17 | O6
Data Store: element45 | ts=15 | SID=12 | state="gym"; element66 | ts=16 | SID=17 | state="home"

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
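The four query-resolution steps can be walked through with in-memory tables. This Python sketch is an assumption-laden illustration: the table layouts (`BINDINGS`, `DEPENDENCY`, `STORE`), the window lengths, and the stream names S1, S2, S9 are invented for the example and do not reflect the actual Century schema.

```python
from typing import Dict, List, NamedTuple, Tuple


class Element(NamedTuple):
    stream: str
    ts: int
    value: float


# Stream and port mapping: (PE, output port) -> input streams bound to the PE.
BINDINGS: Dict[Tuple[str, str], List[str]] = {("PEj", "Po1"): ["S1", "S2"]}

# TVC dependency: (PE, output port) -> {state: {stream: window length}}.
DEPENDENCY: Dict[Tuple[str, str], Dict[str, Dict[str, int]]] = {
    ("PEj", "Po1"): {
        "gym": {"S1": 2, "S2": 0},
        "home": {"S1": 1, "S2": 1},
    }
}

# Data store holding raw stream elements (S9 is unrelated to PEj).
STORE: List[Element] = (
    [Element("S1", ts, 100.0 + ts) for ts in range(10, 17)]
    + [Element("S2", ts, 60.0) for ts in range(10, 17)]
    + [Element("S9", 16, 1.0)]
)


def resolve(pe: str, port: str, alert_ts: int, state: str) -> List[Element]:
    """Return the stored elements that caused the output at `alert_ts`."""
    streams = BINDINGS[(pe, port)]                       # 1. I/O mappings
    windows = DEPENDENCY[(pe, port)][state]              # 2. dependency rule
    fetched = [e for e in STORE if e.stream in streams]  # 3. retrieve
    return [e for e in fetched                           # 4. apply TVC filter
            if alert_ts - windows[e.stream] <= e.ts <= alert_ts]


causes = [(e.stream, e.ts) for e in resolve("PEj", "Po1", 16, "gym")]
```

Nothing is computed at write time beyond storing the bindings and rules, which reflects the tradeoff noted above: cheap provenance capture in exchange for more work per query.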

Key Research Benefits of TVC Model for Provenance

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
- Storage efficiency: only minimal metadata per data element, functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
- Statefulness: time and value functions allow compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

Other provenance systems cannot handle O(100 K events/sec) system throughput

Quality of Events in Century

[Figure: Century server infrastructure architecture diagram, including the QoE Manager within the platform services.]


HARMONI Implementation Platform

Nokia 770 Internet Tablet / N800
- ARM processor, Linux-based
- High-resolution display (800x480), touch screen with up to 65,536 colors
- 64-128 MB RAM, 64 MB flash storage (expandable up to 1 GB; can be used for virtual memory)
- Built-in Bluetooth (BlueZ stack) and 802.11 interfaces
- Relatively cheap: $350
- http://www.maemo.org provides open-source software and a development environment
- Code compiled on an Intel/Debian Linux 3.1 box using a cross-compiler (http://www.scratchbox.org)

Nonin Model 4100 SpO2/heart rate monitor
- Provides heart rate and oxygen saturation
- Supports Bluetooth Serial Port Profile (SPP)
- 120 hours of continuous operation with 2 AA batteries
- Three packets transmitted per second, where each packet is 375 bytes

WiTilt 3-axis accelerometer
- Output baud of 57.6 Kbps
- 40 mA consumption when operating

DeLorme Earthmate GPS

I. Mohomed, A. Misra, W. Jerome, and M. Ebling, "HARMONI: Context-aware Filtering of Sensor Data for Continuous Remote Health Monitoring," to appear, PerCom 2008.

Impact of HARMONI on Transmission Bandwidth: Idealized Context

[Figure: "Data Generated From Sensor". Heart rate (bpm, 0-200) versus sample number (1-9433).]

[Figure: data transmitted over network (in KB, 0-80) for each assumed context (None, Office, Gym, Office/Gym), uncompressed versus with LZ-78 compression.]

- Compression (S2) by itself results in a > 50% reduction in bandwidth consumption
- Filtering (S3) results in an 85% reduction in bandwidth consumption
- Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in LZ

Result 2: Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals". Heart rate (beats per minute, 0-200) versus time.]

[Figure: data transmitted over network (in KB, 0-100) for User 2-1 and User 2-2 under four schemes: uncompressed; compressed; uncompressed with generic context filtering; uncompressed with personalized context filtering.]

- Sensor streams exhibit significant statistical variation across individuals
- Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern matches of floating-point values in the LZ compressor

HARMONI in Practice: Sensor-based Context

- Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
  - Higher 'normal' threshold for 3 distinct states (0-70, 70-110, 110-170) across all users
  - Aside: accelerometer readings are also used to classify 'falls' for elderly patients
- Bandwidth savings between 26-73% for our sample population
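The sensor-based context idea can be illustrated with a minimal sketch. It assumes the three ranges on the slide are per-state heart-rate bands in bpm, and the accelerometer amplitude cut-offs (`walk_threshold`, `run_threshold`) are hypothetical values chosen only for illustration; none of the names below come from the HARMONI implementation.

```python
from typing import Iterable, List, Tuple

# Assumption: the slide's three ranges are 'normal' heart-rate bands (bpm) per state.
NORMAL_HR_BAND = {
    "sitting": (0, 70),
    "walking": (70, 110),
    "running": (110, 170),
}


def classify_activity(accel_amplitude: float,
                      walk_threshold: float = 0.5,
                      run_threshold: float = 1.5) -> str:
    """Map accelerometer amplitude to a state (thresholds are hypothetical)."""
    if accel_amplitude < walk_threshold:
        return "sitting"
    if accel_amplitude < run_threshold:
        return "walking"
    return "running"


def filter_events(samples: Iterable[Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Keep only heart-rate samples that fall outside the band for the current state.

    Each sample is (accelerometer amplitude, heart rate in bpm); samples inside
    the band are considered normal for that context and are not transmitted.
    """
    kept = []
    for accel_amplitude, heart_rate in samples:
        state = classify_activity(accel_amplitude)
        low, high = NORMAL_HR_BAND[state]
        if not (low <= heart_rate <= high):
            kept.append((state, heart_rate))
    return kept
```

With this kind of filter, a reading of 95 bpm is forwarded while sitting but suppressed while walking, which is where the bandwidth saving comes from.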

Chapter 3

- The Motivation and Challenges for Remote Monitoring and Analytics
- Harmoni: Context-based Event Processing on the Mobile Hub
- Century Technologies: Stream Storage, Provenance, and QoE

Century Research Goals and Technologies

Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports the various services.

Challenges
- Integrating massive volumes of disparate data
- Need for sophisticated analytics
- Growing collaboration across ecosystem
- Integrity of data and tracking the analytic data
- Timely feedback
- Large number of concurrent users
- Privacy and security

Architectural Goals
- Extensible data model, scalable infrastructure
- Open and scalable analysis framework
- Enterprise sharing of sensor data
- Quality of Events (QoE) and provenance support for backtracking and auditing
- Real-time processing
- Scalable device and user management
- Protection of personal medical data

Technologies & Approaches
- Extensible Stream Storage System
- Distributed Stream Analysis Runtime
- Quality of Event
- Hybrid Provenance System
- Interoperability
- Privacy & Security Technology
- Stream Processing Platform

Analysis Framework (1): The Approach

Current Solutions
- Vertically designed systems for the analysis of health monitoring data exist today
  - Closed: custom-built for specific diagnosis
  - Non-extensible: support for limited sensor devices
  - Small patient footprint: only good for trials

New Requirements
- Extensible
  - To new data sources (e.g., EMG, EEG, accelerometer, activity)
  - To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)
- Scalability
  - Handles large numbers of patients
- Robustness
  - Resilient to sensor or infrastructure failures
    - Exacerbated in remote monitoring settings
- Easy to use
  - Separate medical domain knowledge from IT knowledge

Our Innovations
- Framework built on top of IBM System S: state-of-the-art middleware for high-volume stream processing analytics
- Programming model centered around the System S concept of a Processing Element (PE)
  - PEs represent atomic stream transformers
  - Stream data flow with pub-sub semantics
- Century also provides useful medical analytic services
  - Persistence of state (e.g., analyze HR variations over 6 months)
  - Management of patient data (e.g., patient's intermediate diagnosis)
  - Computation and rebinding based on the quality of events from medical streams
- Century supports an open set of data sources
  - Currently supporting ECG, BP, glucose, weight, SpO2, temperature

[Figure: analysis jobs built as graphs of analysis components (PEs), connecting data sources and source PEs to a GUI.]

Major practical benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)

Analysis Framework (2): PE Example

- PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
- When an SDO is received, System S routes the SDO to the appropriate PE
- Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs

[Figure: example PE graph. Labeled blocks: Input ECG, Normalization, Fast Fourier Transform, AR PE, Alert Generator, Output Alert; a neural network is trained during initialization.]
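The division of labour described on this slide can be sketched in a few lines: the developer writes small stream transformers, and a generic runner connects them. This is only an illustration in plain Python, not the System S API; `compose`, `normalize`, `dft_magnitudes`, and `alert_generator` are hypothetical names, the threshold is arbitrary, and a simple spectral-peak test stands in for the AR and neural-network stages shown in the figure.

```python
import cmath
import math
from typing import Callable, List, Sequence

Stage = Callable[[Sequence[float]], List[float]]


def normalize(window: Sequence[float]) -> List[float]:
    """Remove the mean and scale the window to a peak amplitude of 1."""
    mean = sum(window) / len(window)
    peak = max(abs(x - mean) for x in window) or 1.0
    return [(x - mean) / peak for x in window]


def dft_magnitudes(window: Sequence[float]) -> List[float]:
    """Naive discrete Fourier transform; returns magnitudes for bins 0..n/2."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(window))) / n
        for k in range(n // 2 + 1)
    ]


def compose(*stages: Stage) -> Stage:
    """Stand-in for the middleware: feeds each stage's output to the next."""
    def run(window: Sequence[float]) -> List[float]:
        data = list(window)
        for stage in stages:
            data = stage(data)
        return data
    return run


def alert_generator(spectrum: Sequence[float], threshold: float = 0.4) -> bool:
    """Raise an alert if any non-DC frequency bin exceeds the threshold."""
    return max(spectrum[1:], default=0.0) > threshold


pipeline = compose(normalize, dft_magnitudes)
ecg_like = [math.sin(2 * math.pi * 2 * i / 16) for i in range(16)]
alert = alert_generator(pipeline(ecg_like))
```

Each stage knows nothing about where its input comes from or where its output goes, which mirrors the separation between PE logic and stream routing.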

Century Event Management Service

[Figure: Century server infrastructure. Labeled components:
- USN Gateway: Data Transformer, Sensor Data Sender, Sensor Data Reception
- Toolkit: IDE for Application Developer, Sensor Simulator, Event Management Client Library
- Portals: Patient Portal, Administrator Portal, Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC, Solution Delivery Services)
- Storages: Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy
- Services: Portal Service, Platform Service, Subscription Service, Event Management Service, Group Management Service, Provenance Service, Event Store Query Service, Provenance Query Service
- Managers and other components: Patient Data Manager, Service Data Management, User Group Data Manager, Device Data Manager, Job/Application Manager, QoE Manager, State Manager, Group Manager, Privacy Manager, Remote Access Manager, Event Storage Manager, Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver, Event Filter, Annotator, Event Preprocessor, Device Catalogue, Authentication & Authorization, IHE Adapter, Interoperability Container, Analysis Framework, Analysis Applications]

Event Management Service (1)

The problem
- Persistence of large amounts of streaming data
- Efficient query mechanisms for stakeholders

Requirements
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language

Potential approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure: for each approach, an annotator writing timestamped events (time 1, 2, 3) to the store, with XML documents in the Relational-XML case.]

Event Management Service (3): Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate

Let μ be the service rate of the database
Let λ be the arrival rate at the database
Let ρ = λ/μ

[Figure: arrival rate versus time, showing the bursty traffic, the average arrival rate, the average database service rate, and the instability region.]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
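A small numerical sketch may help make the role of ρ concrete. It assumes an M/M/1 queue, that is, Poisson arrivals as stated on the slide plus exponentially distributed service times, which is an added assumption not taken from the slide. Under that model the mean number of events in the system is ρ/(1 - ρ), which grows without bound as ρ approaches 1. The function names are illustrative.

```python
import math


def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu: fraction of the database's capacity that is in use."""
    return arrival_rate / service_rate


def mean_events_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean number of queued or in-service events for an M/M/1 queue.

    Returns infinity when rho >= 1, i.e. when the database is driven into
    the instability region and the backlog grows without bound.
    """
    rho = utilization(arrival_rate, service_rate)
    if rho >= 1.0:
        return math.inf
    return rho / (1.0 - rho)
```

For example, at 80 events/sec against a service rate of 100 events/sec the backlog averages about 4 events, while a burst at 120 events/sec has no finite steady state; this is the situation the stream buffer is meant to absorb.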

Event Management Service (4): The Stream Buffer

Approach
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database

[Figure: stream buffer design. Labeled components: analysis graph (PE1-PE5, SPE) leading towards the delivery system; storage requests (events of type "eventStore"); System S STG (file system store); Traffic Monitor; Traffic Models; Traffic Predictor; Event Store Load Monitor; Database PE; Event Store (Relational-XML).]
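The throttling idea can be sketched as a buffer that accepts storage requests at any rate but releases at most a fixed number per time tick to the database. This is a minimal illustration only: the class name `StreamBuffer`, the fixed per-tick limit, and the in-memory queue are assumptions, whereas the design on the slide uses a file system store and a traffic predictor to set the rate.

```python
from collections import deque
from typing import Deque, List


class StreamBuffer:
    """Absorbs bursts of storage requests and drains them at a bounded rate."""

    def __init__(self, max_writes_per_tick: int) -> None:
        if max_writes_per_tick <= 0:
            raise ValueError("max_writes_per_tick must be positive")
        self.max_writes_per_tick = max_writes_per_tick
        self._pending: Deque[str] = deque()

    def offer(self, event: str) -> None:
        """Accept a storage request; never blocks the analysis graph."""
        self._pending.append(event)

    def drain(self) -> List[str]:
        """Return the next batch for the database, oldest events first."""
        batch: List[str] = []
        while self._pending and len(batch) < self.max_writes_per_tick:
            batch.append(self._pending.popleft())
        return batch

    def backlog(self) -> int:
        """Number of events still waiting to be written."""
        return len(self._pending)
```

Calling `drain()` once per tick keeps the arrival rate seen by the database at or below `max_writes_per_tick`, so a burst shows up as temporary backlog in the buffer rather than as overload in the database.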

Century Provenance Service

[Figure: Century server infrastructure architecture, the same diagram as on the Century Event Management Service slide.]

An Example of Data Provenance Use

- Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition
- With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe
- A few days later, Dr. Lee receives an alert from the Century prescription program:

  Urgent Alert
  Patient: Doe, John
  Condition: Abnormal Reaction
  Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

- Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1): Two Approaches to Provenance

What is process provenance?
- The ability to retrace which "system components" were on the data path
- Example PE reports (PE1, PE2) to the Provenance Server/Store:
  - "I ran algorithm f1 with parameters u, v, and w at time t0. I published on port 1."
  - "I received some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
- The ability to retrace which "data elements" led to the generation of output data
- Example (PEj): "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

[Figure: sensors producing Data1 and Data2, processed by PEs running f1, f2, and f3 to produce DataOut4.]

Provenance (2): Prior Work on Provenance

[Figure: prior systems positioned by event and processing rates (horizontal axis) and type of provenance reconstruction (vertical axis: process provenance, data provenance).]

- EU Provenance (PASOA, PreServ): captures workflow bindings
- Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
- Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
- Stream Provenance (Simhan): stores temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level
- File Systems (PASS, LFS): captures system calls and modifications to file records; annotation per file
- Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput

Provenance Challenges Unique to Streams

- Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
- Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
- Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
- Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model

We need to support both
- Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
- Data provenance: "Show me stream segments related to the generation of alert Ei"

TVC Model of Hybrid Provenance

- Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
  - The dependency is thus not specified on every data element, but remains constant for the same processing logic
- Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
  - The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ \mathrm{out}_i(t) \;\Leftarrow\; \bigcup_{s_j \,\in\, \text{input to Component}_k} \mathrm{inp}_j(t - \Delta_j,\; t) $$

[Figure: a Processing Element with input stream Sj (Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), and output stream Si (Si(t), Si(t-1), Si(t-2)).]

TVC dependency rule blocks (state defined by location context):

Rule Block ID | PE State | Dependency Rule
1             | "gym"    | Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2             | "home"   | Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)

- The dependency equation can, however, vary based on the PE's internal state or some external context

  IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
  ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)

  - Solution is to allow multiple functions, each associated with a distinct range of states
  - Store the state as metadata with the data
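The state-dependent rule blocks can be written down as data rather than per-element annotations. The sketch below encodes the two rules shown for the "gym" and "home" states as time offsets relative to the output time t; the representation (`RULE_BLOCKS`, `causative_windows`) is hypothetical and only illustrates why one compact specification per state can replace metadata on every sample.

```python
from typing import Dict, List, Tuple

# (stream, start_offset, end_offset): the window [t - start_offset, t - end_offset]
Window = Tuple[str, int, int]

# Rule blocks keyed by PE state, transcribed from the two example rules.
RULE_BLOCKS: Dict[str, List[Window]] = {
    "gym": [("Sj", 0, 0), ("Sj", 3, 2), ("Sk", 2, 1)],
    "home": [("Sj", 2, 0), ("Sk", 1, 0)],
}


def causative_windows(state: str, t: int) -> List[Tuple[str, int, int]]:
    """Return (stream, first_ts, last_ts) windows that output Ei(t) depends on."""
    return [(stream, t - start, t - end)
            for stream, start, end in RULE_BLOCKS[state]]
```

For an alert at t = 10 in the "gym" state this yields Sj at time 10, Sj over times 7 to 8, and Sk over times 8 to 9. Only the state needs to be stored with the data; the windows are recomputed on demand.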

Resolving TVC-based Provenance Queries

- Reconstruct provenance only in response to the corresponding query
  - For efficiency, the relevant metadata is distributed across multiple DBs
  - Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead
- Determine the IDs of input ports and causative stream bindings
- Retrieve appropriate data elements within the specific time/sequence window specified by the TVC equation
- Apply the TVC filter to obtain the causative set of elements

Stored metadata
1. Bindings between every input/output port and associated streams
2. Creation/modification times of PEs
3. TVC function f() for every (output port, input port)

Resolving Query(Ei) at the Provenance Query Server
1. Obtain I/O stream mappings
2. Retrieve dependency equation
3. Retrieve elements
4. Apply equation to determine the set of causative elements, and return the set of elements

[Figure: data stores used during resolution.
- PC Dependency Table, e.g. PEj, f1(), Po1
- Dynamic Stream Mapping Table (Stream and Port Mapping, SAPM), e.g. t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6
- Data Store, e.g. element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home"]

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
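The four resolution steps can be sketched end to end over in-memory tables. Everything here is illustrative: the `Element` record, the binding tuples, and the rule format are hypothetical stand-ins for the stream mapping table, dependency table, and data store, and the sketch makes the simplifying assumption that the binding in force at the alert time applies to the whole window.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Element:
    element_id: str
    ts: int
    stream_id: str


Binding = Tuple[int, str, str]            # (time bound, input port, stream id)
Rules = Dict[str, Dict[str, Tuple[int, int]]]  # state -> port -> (start, end) offsets


def stream_bound_at(bindings: List[Binding], port: str, t: int) -> str:
    """Step 1: find the stream most recently bound to the port at or before t."""
    candidates = [(bound_at, sid) for bound_at, p, sid in bindings
                  if p == port and bound_at <= t]
    if not candidates:
        raise LookupError("no binding for port %s at time %d" % (port, t))
    return max(candidates)[1]


def resolve(alert_ts: int, pe_state: str, bindings: List[Binding],
            rules: Rules, store: List[Element]) -> List[Element]:
    """Return the stored elements that the alert at alert_ts depends on."""
    causative: List[Element] = []
    # Step 2: the dependency equation is selected by the PE state.
    for port, (start, end) in rules[pe_state].items():
        stream_id = stream_bound_at(bindings, port, alert_ts)
        first_ts, last_ts = alert_ts - start, alert_ts - end
        # Steps 3 and 4: retrieve elements and keep those inside the window.
        causative.extend(e for e in store
                         if e.stream_id == stream_id
                         and first_ts <= e.ts <= last_ts)
    return causative
```

The work happens only at query time, which matches the stated tradeoff: cheap storage of bindings and rules, with the cost of reconstruction paid by the query.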

Key Research Benefits of TVC Model for Provenance

- The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
  - Storage efficiency: only minimal metadata per data element, and a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
  - Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
  - Statefulness: time and value functions allow compact representation of long dependency windows
  - Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)
- TVC model overhead reduces when
  1. Monitoring is long-lived (over weeks)
  2. Per-instance state is minimized
  3. Number of monitored individuals increases
- Other provenance systems cannot handle O(100 Kevents/sec) system throughput

Quality of Events in Century

[Figure: Century server infrastructure architecture, the same diagram as on the Century Event Management Service slide.]

Quality of Events in Data Streams

- Prior work has focused on defining streams in terms of their output data type
  - Enables pub-sub model of stream binding
- However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
  - Sensor (source) impairments: sensor faults (transient or persistent)
  - Transmission impairments: loss of stream elements, or delay/order reversal of streams
- Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
  - Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
  - Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
  - Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE raising an ALERT at time t from input streams of timestamped elements. Annotations: "What generated this alert and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (replace sensor); "Stream elements are missing (possibly due to network issues)".]

Century QoE: Some Basic QoE Measures

- Expected versus received input: by correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
  [Figure: the input that was expected (t-5, t-4, t-3, t-2, t-1, t) versus the current input (t-5, t-2, t).]
- Timestamp correlation: another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
  [Figure: one input with elements at t-12, t-11, t-10 and another with elements at t-5, t-2, t.]
- Consumed versus useful input: finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
  [Figure: the current input (t-5, t-4, t-3, t-2, t-1, t) and the "useful" input (t).]
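The three measures lend themselves to simple ratios. The following sketch is one possible formalization, not the Century definitions: the function names and the exact formulas (set overlap for completeness, difference of latest timestamps for skew, a plain fraction for usefulness) are assumptions made for illustration.

```python
from typing import Iterable, Sequence


def completeness(expected: Iterable[int], received: Iterable[int]) -> float:
    """Fraction of expected element timestamps that actually arrived."""
    expected_set = set(expected)
    if not expected_set:
        return 1.0
    return len(expected_set & set(received)) / len(expected_set)


def sync_skew(timestamps_a: Sequence[int], timestamps_b: Sequence[int]) -> int:
    """Gap between the most recent elements of two inputs (0 means in sync)."""
    return abs(max(timestamps_a) - max(timestamps_b))


def usefulness(consumed: int, useful: int) -> float:
    """Fraction of consumed elements that actually contributed to the output."""
    return useful / consumed if consumed else 0.0
```

Using t = 20 for the three cases on the slide: receiving only t-5, t-2, and t out of six expected elements gives a completeness of 0.5; inputs ending at t-10 and t give a skew of 10; and one useful element out of six consumed gives a usefulness of about 0.17.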

Open Challenges in Health Stream Collection and Diagnostics

- Model-driven acquisition of data from sensor feeds
  - Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
  - HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2, and BP monitors
- Sensors have different time-varying QoE
  - QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
- Provenance architectures must address the prohibitive cost of storing all stream data
  - Much of the intermediate analysis can be re-created if the state of analysis components can be stored
- Better techniques for establishing provenance relationships
  - User-specified models may not always be available: learn provenance from observed behavior
  - Explicit provenance may not be an accurate reflection of analysis logic
- Provenance data itself is sensitive and subject to privacy requirements
  - The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions
- How to compute QoE of medical streams, and how to refine them based on QoE of other streams
- How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
- How to develop a combination of hybrid model-replay based provenance
- How to define context-dependent access-control models of provenance data in a multi-provider environment

Conclusions

- Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
- The pervasive device must be more than a relay: an extended part of the analytic infrastructure
  - Key features include adaptive event filtering, personalization, and predictive data transmission
- The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
  - Key features include high-volume stream support, provenance, and event/information quality

IBM Research

copy 2006 IBM Corporation

  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

IBM Research

copy 2006 IBM Corporation

Impact of HARMONI on Transmission Bandwidth Idealized Context

Data Generated From Sensor

0

50

100

150

200

1 1049 2097 3145 4193 5241 6289 7337 8385 9433

Sample

Hear

t Rat

e (b

pm)

Compression (S2) by itself results in gt 50 reduction in bandwidth consumption

Filtering (S3) results in 85 reduction in bandwidth consumption

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) not significant due to lack of exact pattern match of floating point values in LZ

7030

2906

4890

1959

2964

17021454

2204

0

10

20

30

40

50

60

70

80

None Office Gym OfficeGym

Context Assumed

Dat

a Tr

ansm

itted

Ove

r Net

wor

k (in

KB

)

Uncompressed With LZ-78 Compression

Context-Based Filtering

IBM Research

copy 2006 IBM Corporation

Result 2 Impact of Personalization on Event Filtering

Variation in Heart Rate Readings for Different Individuals

0

50

100

150

200

Time

Hea

rt Ra

te

(bea

ts p

er m

inut

e)

9510

6500

3494

23022018 1902

1184 926

0

10

20

30

40

50

60

70

80

90

100

User 2-1 User 2-2

Dat

a Tr

ansm

itted

Ove

r Net

wor

k (in

KB

)

Uncompressed

Compressed

UncompressedGeneric Context-filtering

Uncompressed Personalized Context filtering

Sensor streams exhibit significant statistical variation across individuals

Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) not significant due to lack of exact pattern match of floating point values in LZ compressor

Context-Based Filtering

IBM Research

copy 2006 IBM Corporation

HARMONI in Practice Sensor-based Context

Accelerometer amplitude can be used to classify user into 3 different states sitting walking runningndash Higher lsquonormalrsquo threshold

for 3 distinct states (0-70 70-110 110-170) across all users

ndash Aside accelerometer readings used to classify lsquofallsrsquo for elderly patients

Bandwidth savings between 26-73 for our sample population

Context-Based Filtering

IBM Research

copy 2006 IBM Corporation

Chapter 3

The Motivation and Challenges for The Motivation and Challenges for Remote Monitoring and AnalyticsRemote Monitoring and Analytics

HarmoniHarmoni Context Context--based Event based Event Processing on the Mobile HubProcessing on the Mobile Hub

Century Technologies Stream Storage Provenance and QoE

IBM Research

copy 2006 IBM Corporation

Century Research Goals and TechnologiesOur research goal is to make Online Health Analytic Infrastructure for healthcare ecosystem to support the various services

Challenges Architectural Goals Technologies amp Approaches

Integrating massive volumes of disparate data

Extensible Data ModelScalable Infrastructure

Need for sophisticated analytics

Growing collaboration across ecosystem

Integrity of data ampTracking the analytic data

Timely feedback

Open and scalable analysis framework

Enterprise sharing of sensor data

Quality of Events (QoE) and provenance support for backtracking amp auditing

Real-time Processing

Large number of concurrent users

Privacy and Security

Scalable device and user management

Protection of personal medical data

Extensible Stream Storage System

Distributed Stream Analysis Runtime

Quality Of Event

Hybrid Provenance System

Interoperability

Privacy amp Security Technology

Stream Processing Platform

IBM Research

copy 2007 IBM Corporation

Extensiblendash To new data sources (eg EMG EEG accelerometer Activity)ndash To new analysis algorithms (eg myocardia polyneuropathy Alzheimer)

Scalabilityndash Handles large numbers of patients

Robustnessndash Resilient to sensor or infrastructure failures

bull Exacerbated in remote monitoring settings

Easy to Usendash Separate medical domain knowledge from IT knowledge

Framework built on top of IBM System Sndash state of the art middleware for high-volume stream processing analytics

Programming model centered around System S concept of Processing Element (PE)

ndash PEs represent atomic stream transformersndash Stream data flow with pub-sub semantics

Century also provides useful medical analytic services

ndash Persistence of state (eg analyze HR variations over 6 months)ndash Management of patient data (eg patientrsquos intermediate diagnosis)ndash Computation and rebinding based on the quality of events from medical streams

Century supports an open set of data sourcesndash Currently supporting ECG BP Glucose Weight SPO2 Temperature

Vertically designed systems for the analysis of health monitoring data exist today

ndash Closed custom-built for specific diagnosisndash Non-extensible Support for limited sensor devicesndash Small patient footprint only good for trials

Analysis Framework (1) The Approach

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

IPN

NES

DUP

PSDDFESPASPM

VBFSPA

SPM DFE PSDDataSource

VBF

Source PEWLG SGD PCP JAE DSN

N(CE)S^2

DUPP

IPN

SCF

JoinCCIPN

N(CE)S^2

PFA IPN

SSFJoinSC IPN

N(CE)S^2

SLF

JoinL

IPN

NES

DUP

IPN

ES IPN

JoinL IPN

DataSource

Source PEWLG

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

Analysis Component

AnalysisJob

Major Practical BenefitSeparate PE developerrsquos job (medical domain knowledge) from infrastructure issues(scalability fault tolerance stream connections etc)

Current Solutions New Requirements Our Innovations

IBM Research

copy 2007 IBM Corporation

Analysis Framework (2) PE Example

bull PE developers need not to be concerned with the communication of Stream Data Objectsbull When an SDO is received by the PE System S routes the SDO to the appropriate PE bull Developers only need to code the logic of the algorithms used by the PE to annotate modify

or create new SDOs

Normalization

Fast FourierTransform

Input ECG

AR PE

AlertGenerator

Output Alert

Neural Network Trained during

initialization

IBM Research

copy 2007 IBM Corporation

Century Event Management Service

[Figure: CENTURY SERVER INFRASTRUCTURE architecture diagram. Labels in the diagram: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal, Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC, Solution Delivery Services, Portal Service, Patient Data Manager, Device Data Manager, Service Data Management, User Group Data Manager, Job/Application Manager, Event Filter, Annotator, Analysis Applications, Analysis Framework, QoE Manager, State Manager, Group Manager, Device Catalogue, Authentication & Authorization, Privacy Manager, Platform Service, Subscription Service, Event Store Query Service, Provenance Query Service, Group Management Service, IHE Adapter, Interoperability Container, Event Management Service, Event Preprocessor, Event Storage Manager, Remote Access Manager, Provenance Service, Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver, and Storages (Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy).]

Event Management Service (1)

The Problem
– Persistence of large amounts of streaming data
– Efficient query mechanisms for stakeholders

Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language

Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure: for each approach, an Annotator writing events at times 1, 2, 3 into the store; the second approach stores the events as XML.]

Event Management Service (3) Hybrid Storage Model

Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate

Let μ be the service rate of the database, λ the arrival rate at the database, and ρ = λ/μ.

[Figure: arrival rate over time, showing the bursty traffic, the average arrival rate, the average database service rate, and the instability region.]

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Stream Buffer: how do we design the stream buffer?

[Figure: an Annotator writing events at times 1, 2, 3 through the stream buffer into XML storage.]

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
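A small numeric sketch of the stability condition on the Hybrid Storage Model slide. It makes an extra assumption that the slides do not state: service times are exponential, so the database behaves as an M/M/1 queue with mean occupancy ρ/(1 − ρ). The function names are illustrative.

```python
def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu, the fraction of time the database is busy."""
    return arrival_rate / service_rate


def mean_events_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean number of events queued or in service for an M/M/1 queue.

    Returns infinity when rho >= 1, i.e. inside the instability region.
    """
    rho = utilization(arrival_rate, service_rate)
    if rho >= 1.0:
        return float("inf")
    return rho / (1.0 - rho)


for lam in (50.0, 90.0, 99.0, 120.0):
    print(lam, mean_events_in_system(lam, 100.0))
```

The backlog grows sharply as ρ approaches 1, which is why bursts that push the instantaneous arrival rate past μ need to be absorbed before they reach the database.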

Event Management Service (4) The Stream Buffer

[Figure: stream buffer architecture. Labels: Analysis Graph (PE1, PE2, PE3, PE4, PE5, SPE), Storage Requests (events of type "eventStore"), Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, Database PE, Event Store (Relational-XML), System S STG (File System Store), and an arrow "Towards the delivery system".]

Approach
– Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database
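The throttling idea on the Stream Buffer slide can be illustrated with a toy buffer that forwards at most a fixed number of events per tick and queues the rest. This is a sketch of one possible design, not Century's implementation; `StreamBuffer`, `db_rate` and the tick-based clock are assumptions made for the example.

```python
from collections import deque
from typing import Deque, Iterable, List


class StreamBuffer:
    """Forwards at most `db_rate` events per tick to the database; queues the rest."""

    def __init__(self, db_rate: int) -> None:
        self.db_rate = db_rate
        self.pending: Deque[str] = deque()
        self.stored: List[str] = []  # stands in for the Relational-XML event store

    def offer(self, events: Iterable[str]) -> None:
        """Accept a burst of storage requests without blocking the producers."""
        self.pending.extend(events)

    def tick(self) -> int:
        """Drain up to `db_rate` events into the store; return how many were written."""
        n = min(self.db_rate, len(self.pending))
        for _ in range(n):
            self.stored.append(self.pending.popleft())
        return n


buffer = StreamBuffer(db_rate=2)
bursts = [5, 0, 0, 1]  # events arriving in each tick
written, backlog = [], []
event_id = 0
for size in bursts:
    buffer.offer(f"e{event_id + i}" for i in range(size))
    event_id += size
    written.append(buffer.tick())
    backlog.append(len(buffer.pending))
```

With a burst of five events and a database rate of two per tick, the database never sees more than two writes per tick and the backlog drains over the following quiet ticks.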

Century Provenance Service

[Figure: Century server infrastructure diagram, with the same component labels as on the "Century Event Management Service" slide.]

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1) Two Approaches to Provenance

What is process provenance?
• The ability to retrace which "system components" were on the data path

[Figure: PE1 and PE2 (algorithms f1, f2, f3) report to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2: "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."]

What is data provenance?
• The ability to retrace which "data elements" led to generation of output data

[Figure: sensors feed Data1 and Data2 into PEj, which emits DataOut4. PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]
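The two approaches on the "Two Approaches to Provenance" slide can be contrasted in a few lines of Python. The record layouts below are hypothetical, chosen only to mirror the statements in the slide: process provenance is a log of what each component did, while data provenance is dependency metadata carried by each data element.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class ProcessRecord:
    """Process provenance: a component reports what it ran, when, and on which port."""
    pe: str
    algorithm: str
    params: Dict[str, float]
    port: str
    time: int


@dataclass
class DataItem:
    """Data provenance: each element names the elements it was derived from."""
    name: str
    depends_on: Tuple[str, ...] = ()


def lineage(name: str, items: Dict[str, DataItem]) -> Set[str]:
    """Follow depends_on links transitively to collect every ancestor element."""
    seen: Set[str] = set()
    stack = list(items[name].depends_on)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            if current in items:
                stack.extend(items[current].depends_on)
    return seen


# Process provenance: the server stores a log of component activity.
process_log: List[ProcessRecord] = [
    ProcessRecord("PE1", "f1", {"u": 1.0, "v": 2.0, "w": 3.0}, "out:1", time=0),
    ProcessRecord("PE2", "f3", {}, "in:2", time=1),
]
components_on_path = [r.pe for r in process_log]

# Data provenance: DataOut4 carries metadata naming Data1 and Data2.
items = {
    "Data1": DataItem("Data1"),
    "Data2": DataItem("Data2"),
    "DataOut4": DataItem("DataOut4", depends_on=("Data1", "Data2")),
}
ancestors = lineage("DataOut4", items)
```

The first structure answers "which components touched this path"; the second answers "which inputs produced this output", at the cost of metadata on every element.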

Provenance (2) Prior Work on Provenance

[Figure: prior work placed on two axes: type of provenance reconstruction (process provenance vs. data provenance) and event and processing rates.]

• EU Provenance (PASOA, PreServ): captures workflow bindings
• Scientific SOA Workflows (KARMA): workflow and application binding notifications
• Stateless application-defined data provenance
• Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
• Stream Provenance (Simhan): stores temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level
• File Systems (PASS, LFS): captures system calls and modifications to file records; annotation per file
• Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput

Provenance Challenges Unique to Streams

• Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
• Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
• Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
• Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
• Is reactive, generating provenance events only when things change
• Does not require annotating large amounts of metadata per sensor sample
• Can accommodate variations in the statefulness model

We need to support both
• Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
• Data provenance: "Show me stream segments related to the generation of alert Ei"

TVC Model of Hybrid Provenance

• Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
  – The dependency is thus not specified on every data element, but remains constant for the same processing logic
• Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
  – The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ out_i(t) \Leftarrow \bigcup_{s_j \text{ is input to Component } i} \{\, inp_j(t') : t' \in [t - \Delta_j,\ t] \,\} $$

[Figure: a Processing Element with input stream Sj (elements Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (elements Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (elements Si(t), Si(t-1), Si(t-2)).]

• The dependency equation can, however, vary based on
  – The PE's internal state or some external context, for example:
      IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
      ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)
  – Solution is to allow multiple functions, each associated with a distinct range of states
  – Store the state as metadata with the data

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID    PE State    Dependency Rule
1                "gym"       Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2                "home"      Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
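A minimal sketch of state-dependent TVC rule blocks, using the "gym" and "home" rules and the heart-rate alert condition from the TVC slide. It assumes integer timestamps and inclusive windows, neither of which the slide specifies; `RULE_BLOCKS`, `causative_set` and `should_alert` are illustrative names.

```python
from typing import Dict, List, Set, Tuple

# (stream id, start offset, end offset); offsets are relative to the output time t.
Window = Tuple[str, int, int]

RULE_BLOCKS: Dict[str, List[Window]] = {
    # Ei(t) <= Sj(t) U Sj(t-3, t-2) U Sk(t-2, t-1)
    "gym": [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],
    # Ei(t) <= Sj(t-2, t) U Sk(t-1, t)
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],
}


def causative_set(state: str, t: int) -> Set[Tuple[str, int]]:
    """Expand the rule block for `state` into the (stream, timestamp) pairs behind Ei(t)."""
    result: Set[Tuple[str, int]] = set()
    for stream, lo, hi in RULE_BLOCKS[state]:
        for ts in range(t + lo, t + hi + 1):
            result.add((stream, ts))
    return result


def should_alert(loc: str, hr: Dict[int, float], t: int) -> bool:
    """Alert iff the average HR over the state-specific window exceeds the state-specific limit."""
    window, limit = (10, 140.0) if loc == "gym" else (30, 90.0)
    samples = [hr[ts] for ts in range(t - window, t + 1) if ts in hr]
    return bool(samples) and sum(samples) / len(samples) > limit


hr_readings = {ts: 120.0 for ts in range(0, 31)}
```

Only the state label has to be stored with each element; the window shapes live once in `RULE_BLOCKS`, which is what keeps per-element metadata small.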

Resolving TVC-based Provenance Queries

• Reconstruct provenance only in response to the corresponding query
  – For efficiency, the relevant metadata is distributed across multiple DBs
  – Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead
• Determine ID of input ports and causative stream bindings
• Retrieve appropriate data elements within the specific time/sequence window specified by the TVC equation
• Apply TVC filter to obtain the causative set of elements

Storage (Stream and Port Mapping, SAPM)
1. Store bindings between every input/output port and associated streams
2. Store creation/modification times of PEs
3. Store TVC function f() for every output port/input port

Query(Ei), handled by the Provenance Query Server
1. Obtain I/O stream mappings
2. Retrieve dependency equation
3. Retrieve elements
4. Apply equation to determine the set of causative elements
Return set of elements

[Figure: the stores consulted by the Provenance Query Server.
 PC Dependency Table: PEj, f1(), Po1
 Dynamic Stream Mapping Table: t=10 S1 Pi1; t=15 S15 I4; t=16 S17 O6
 Data Store: element45 ts=15 SID=12 state="gym"; element66 ts=16 SID=17 state="home"]

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
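The four query steps on the "Resolving TVC-based Provenance Queries" slide can be walked through with in-memory tables. The table contents, the single input port `Pi1` and the helper names are made up for the example; a real deployment would issue these lookups against separate databases.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Element(NamedTuple):
    ts: int
    sid: str
    state: str
    value: float


class Binding(NamedTuple):
    since: int  # time at which the stream was bound to the port
    sid: str
    port: str


# Dependency table: output port -> PE state -> {input port: (start, end) offsets}.
DEPENDENCY: Dict[str, Dict[str, Dict[str, Tuple[int, int]]]] = {
    "Po1": {"gym": {"Pi1": (-1, 0)}, "home": {"Pi1": (-2, 0)}},
}
# Dynamic stream mapping: Pi1 is rebound from stream S1 to stream S15 at t=15.
BINDINGS: List[Binding] = [Binding(10, "S1", "Pi1"), Binding(15, "S15", "Pi1")]
STORE: List[Element] = [
    Element(13, "S1", "home", 70.0),
    Element(14, "S1", "home", 72.0),
    Element(15, "S15", "gym", 150.0),
    Element(16, "S15", "gym", 155.0),
]


def bound_stream(port: str, ts: int) -> Optional[str]:
    """Step 1: the stream bound to `port` at time `ts` (latest binding not after ts)."""
    candidates = [b for b in BINDINGS if b.port == port and b.since <= ts]
    return max(candidates, key=lambda b: b.since).sid if candidates else None


def resolve(out_port: str, t: int, state: str) -> List[Element]:
    """Steps 2 to 4: fetch the rule, then collect the elements inside its windows."""
    rule = DEPENDENCY[out_port][state]
    causative: List[Element] = []
    for port, (lo, hi) in rule.items():
        for ts in range(t + lo, t + hi + 1):
            sid = bound_stream(port, ts)
            causative.extend(e for e in STORE if e.sid == sid and e.ts == ts)
    return causative
```

For an alert at t=15 in state "home", the window reaches back across the rebinding at t=15, so the causative set mixes elements from both streams; this is the case that time-stamped bindings are meant to handle.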

Key Research Benefits of TVC Model for Provenance

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

Other provenance systems cannot handle O(100 Kevents/sec) system throughput

Quality of Events in Century

[Figure: Century server infrastructure diagram, with the same component labels as on the "Century Event Management Service" slide.]

Quality of Events in Data Streams

• Prior work has focused on defining streams in terms of their output data type
  – Enables pub-sub model of stream binding
• However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
  – Sensor (source) impairments: sensor faults (transient or persistent)
  – Transmission impairments: loss of stream elements, or delay/order reversal of streams
• Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
  – Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
  – Modification of analysis logic: the resolution of an analysis component (e.g. coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
  – Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE consumes timestamped stream elements (sequences around t with gaps and out-of-order arrivals) and emits an ALERT at time t. Callouts: "What generated this alert and can it be trusted?"; "The sensor is faulty and generates stream elements of low quality" (replace sensor); "Stream elements are missing (possibly due to network issues)".]

Century QoE Some Basic QoE Measures

1. By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
   [Figure: the input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input is t-5, t-2, t; the PE emits an output at t.]

2. Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
   [Figure: one input carries t-12, t-11, t-10 and the other carries t-5, t-2, t; the PE emits an output at t.]

3. Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
   [Figure: the current input is t-5, t-4, t-3, t-2, t-1, t, with the "useful" input marked; the PE emits an output at t.]
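The three basic QoE measures above can be written as small functions. This is a sketch under simplifying assumptions: timestamps are integers, each measure is reduced to a single number, and the function names are illustrative rather than Century's QoE API.

```python
from typing import Iterable


def completeness(expected_ts: Iterable[int], received_ts: Iterable[int]) -> float:
    """Fraction of the expected input elements that actually arrived."""
    expected = set(expected_ts)
    if not expected:
        return 1.0
    return len(expected & set(received_ts)) / len(expected)


def sync_skew(input_a_ts: Iterable[int], input_b_ts: Iterable[int]) -> int:
    """Gap between the newest timestamps of two inputs; large values mean out of sync."""
    return abs(max(input_a_ts) - max(input_b_ts))


def useful_fraction(consumed_ts: Iterable[int], useful_ts: Iterable[int]) -> float:
    """Fraction of consumed elements that actually contributed to the output."""
    consumed = set(consumed_ts)
    if not consumed:
        return 0.0
    return len(consumed & set(useful_ts)) / len(consumed)


t = 100
# Measure 1: t-5..t expected, only t-5, t-2 and t received.
q_complete = completeness(range(t - 5, t + 1), [t - 5, t - 2, t])
# Measure 2: one input stops at t-10 while the other reaches t.
q_skew = sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t])
# Measure 3: six elements consumed, one (hypothetically) useful.
q_useful = useful_fraction(range(t - 5, t + 1), [t])
```

How these numbers are combined into one quality score for the PE output is left open here, as it is in the slides.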

Open Challenges in Health Stream Collection and Diagnostics

• Model-driven acquisition of data from sensor feeds
  – Many environments (e.g. retirement homes) exhibit redundancy in sensor type and coverage
  – HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2 and BP monitors
• Sensors have different time-varying QoE
  – QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
• Provenance architectures must address the prohibitive cost of storing all stream data
  – Much of the intermediate analysis can be re-created if the state of analysis components can be stored
• Better techniques for establishing provenance relationships
  – User-specified models may not always be available; learn provenance from observed behavior
  – Explicit provenance may not be an accurate reflection of analysis logic
• Provenance data itself is sensitive and subject to privacy requirements
  – The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions
• How to compute QoE of medical streams, and how to refine them based on QoE of other streams
• How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
• How to develop a combination of hybrid model-replay based provenance
• How to define context-dependent access-control models of provenance data in a multi-provider environment

Conclusions

• Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
• The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
  – Key features include adaptive event filtering, personalization and predictive data transmission
• The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
  – Key features include high-volume stream support, provenance and event/information quality

  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions


Result 2 Impact of Personalization on Event Filtering

[Figure: "Variation in Heart Rate Readings for Different Individuals": heart rate (beats per minute, axis 0 to 200) over time.]

[Figure: data transmitted over the network (in KB, axis 0 to 100) for User 2-1 and User 2-2 under four schemes: Uncompressed; Compressed; Uncompressed, Generic Context-filtering; Uncompressed, Personalized Context-filtering. Bar labels as extracted: 9510 6500 3494 23022018 1902 1184 926.]

• Sensor streams exhibit significant statistical variation across individuals
• Improvement from Filtering (S3) to Context-Sensitive Filtering (S4) is not significant, due to the lack of exact pattern match of floating point values in the LZ compressor

Context-Based Filtering


HARMONI in Practice Sensor-based Context

• Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
  – Higher 'normal' threshold for 3 distinct states (0-70, 70-110, 110-170) across all users
  – Aside: accelerometer readings are used to classify 'falls' for elderly patients
• Bandwidth savings between 26-73% for our sample population

Context-Based Filtering


Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE

Century Research Goals and Technologies

Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports the various services.

Challenges and the corresponding Architectural Goals
• Integrating massive volumes of disparate data: extensible data model, scalable infrastructure
• Need for sophisticated analytics: open and scalable analysis framework
• Growing collaboration across ecosystem: enterprise sharing of sensor data
• Integrity of data & tracking the analytic data: Quality of Events (QoE) and provenance support for backtracking & auditing
• Timely feedback: real-time processing
• Large number of concurrent users: scalable device and user management
• Privacy and security: protection of personal medical data

Technologies & Approaches
• Extensible Stream Storage System
• Distributed Stream Analysis Runtime
• Quality of Event
• Hybrid Provenance System
• Interoperability
• Privacy & Security Technology
• Stream Processing Platform

• Extensible
  – To new data sources (e.g. EMG, EEG, accelerometer, activity)
  – To new analysis algorithms (e.g. myocardia, polyneuropathy, Alzheimer)
• Scalability
  – Handles large numbers of patients
• Robustness
  – Resilient to sensor or infrastructure failures (exacerbated in remote monitoring settings)
• Easy to use
  – Separate medical domain knowledge from IT knowledge

• Framework built on top of IBM System S
  – State-of-the-art middleware for high-volume stream processing analytics
• Programming model centered around the System S concept of Processing Element (PE)
  – PEs represent atomic stream transformers
  – Stream data flow with pub-sub semantics
• Century also provides useful medical analytic services
  – Persistence of state (e.g. analyze HR variations over 6 months)
  – Management of patient data (e.g. patient's intermediate diagnosis)
  – Computation and rebinding based on the quality of events from medical streams
• Century supports an open set of data sources
  – Currently supporting ECG, BP, glucose, weight, SpO2, temperature

Vertically designed systems for the analysis of health monitoring data exist today

ndash Closed custom-built for specific diagnosisndash Non-extensible Support for limited sensor devicesndash Small patient footprint only good for trials

Analysis Framework (1) The Approach

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

IPN

NES

DUP

PSDDFESPASPM

VBFSPA

SPM DFE PSDDataSource

VBF

Source PEWLG SGD PCP JAE DSN

N(CE)S^2

DUPP

IPN

SCF

JoinCCIPN

N(CE)S^2

PFA IPN

SSFJoinSC IPN

N(CE)S^2

SLF

JoinL

IPN

NES

DUP

IPN

ES IPN

JoinL IPN

DataSource

Source PEWLG

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

SPA

SPM

DFE PSD

INQ GUI

DataSource

Source PE

NES IPNVBF

ldquoMarkrdquo

DUP

WLG

Analysis Component

AnalysisJob

Major Practical BenefitSeparate PE developerrsquos job (medical domain knowledge) from infrastructure issues(scalability fault tolerance stream connections etc)

Current Solutions New Requirements Our Innovations

IBM Research

copy 2007 IBM Corporation

Analysis Framework (2) PE Example

bull PE developers need not to be concerned with the communication of Stream Data Objectsbull When an SDO is received by the PE System S routes the SDO to the appropriate PE bull Developers only need to code the logic of the algorithms used by the PE to annotate modify

or create new SDOs

Normalization

Fast FourierTransform

Input ECG

AR PE

AlertGenerator

Output Alert

Neural Network Trained during

initialization

IBM Research

copy 2007 IBM Corporation

Century Event Management Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2007 IBM Corporation

Event Management Service (1)

Annotator

time 1 2 3

XML

Annotator

time 1 2 3

Not extensibleLimited scalability

Fully extensibleBut does not scale to

high event rate

The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders

Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language

XML

XML

XML

Potential Approaches

1 Relational DB (Annotation table)

2 Relational-XML DB (XML-based Annotation data model)

IBM Research

copy 2007 IBM Corporation

Event Management Service (3) Hybrid Storage Model

Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate

time

arrival rateaverage arrival rate

traffic average databaseservice rate

Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ

UnstabilityRegion

Annotator

time 1 2 3

3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Stream Buffer How do we designthe stream buffer

XMLXML

XML

XML

D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007

IBM Research

copy 2007 IBM Corporation

Event Management Service (4) The Stream Buffer

System S STG(File System Store)

Traffic Monitor

Storage Requests(events of type ldquoeventStorerdquo)

DatabasePE

Traffic Models Traffic

Predictor Event Store Load Monitor

Storage Requests

PE1

PE2

PE3

PE4

PE5SPE

Analysis Graph

Towards the delivery system

Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database

EventStore

(Relational-XML

IBM Research

copy 2007 IBM Corporation

Century Provenance Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM Corporation26

An Example of Data Provenance Use

Urgent AlertPatient Doe John

Condition Abnormal ReactionRecommendation

The Century System has found a developing issue in your patientrsquos

medical condition The condition of the patient is deteriorating A known

side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day

Ultimately the physician is responsible for medical decisions and any actions taken In order

for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition

With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe

A few days later Dr Lee receives an alert from the Century prescription program

The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)

Thatrsquos unusual and a large decrease Before agreeing to

this change I need to understand on what basis the

system has made this recommendation and how

accurate it is

IBM Research

copy 2006 IBM Corporation

Provenance (1) Two Approaches to Provenance

f1

f2

f3

Provenance ServerStore

I ran algorithm f1 with parameters uv and w at time t0 I published on port 1

PE1

PE2

I recvd some data on port 2 I ran algorithm f3 on the data at time t1

What is process provenancebull The ability to retrace

which ldquosystem componentsrdquowere on the data path

What is data provenancebull The ability to retrace

which ldquodata elementsrdquo ledto generation of output data

PEj

Sensor

Sensor

Data2

DataOut4

Data1 Data2

I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2

Data1

Data1

IBM Research

copy 2006 IBM Corporation

Provenance (2) Prior Work on Provenance

EU Provenance (PASOA PreServ))

Captures Workflow bindings

Scientific SOA Workflows (KARMA)Workflow and application binding notifications

Stateless application-defined data provenance

Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL

Data dependencies specified statically for relations

Event and Processing Rates

Stream Provenance (Simhan)

Store temporal history of stream connectors as a per-stream stack

System-level automatic metadata collection at stream-levelTy

pe o

f Pr

oven

ance

Rec

onst

ruct

ion

Proc

ess

prov

enan

ceD

ata

prov

enan

ce

File Systems PASS LFSCaptures system calls and

modifications to file recordsAnnotation per file

Medical Stream Provenance Arbitrary logic for invidual PEs (

statefulness and long memory) Dynamic stream bindings

High proessingthroughput

Century

IBM Research

copy 2006 IBM Corporation

Provenance Challenges Unique to Streams

Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation

is very heavyweight

Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput

Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data

Dynamic Stream Bindings Streams may be ephemeral as a consequence the

transformation graph may vary with time

We need a provenance system that

Is reactivemdashgenerating provenance events only when things change

Does not require annotating large amounts of metadata per sensor sample

Can accommodate variations in the statefulness model

We need to support both

Process provenance Show me the set of components that are involved in the generation of alert Ei

Data provenance Show me stream segments related to the generation of alert Ei

IBM Research

copy 2006 IBM CorporationIBM Research

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ out_i(t) \;\Leftarrow\; \bigcup_{s_j \text{ is input to Component } i} \{\, inp_j(t_k) : t_k \in [t - \Delta,\ t] \,\} $$

[Figure: a Processing Element consumes input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and produces output stream Si (samples Si(t), Si(t-1), Si(t-2))]

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID   PE State   Dependency Rule
1               "gym"      Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2               "home"     Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)

The dependency equation can, however, vary based on

– The PE's internal state or some external context, for example:

IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)

ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)

– Solution is to allow multiple functions, each associated with a distinct range of states

– Store the state as metadata with the data
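A minimal sketch of how state-dependent rule blocks might be encoded, assuming the two location states ("gym", "home") and the window offsets shown on this slide. The names `RULE_BLOCKS`, `causative_windows`, and `should_alert` are hypothetical and are not part of Century.

```python
# Hypothetical encoding of the slide's two rule blocks. Each entry is
# (stream, start_offset, end_offset): samples of that stream whose
# timestamps fall in [t - start_offset, t - end_offset].
RULE_BLOCKS = {
    "gym":  [("Sj", 0, 0), ("Sj", 3, 2), ("Sk", 2, 1)],  # rule block 1
    "home": [("Sj", 2, 0), ("Sk", 1, 0)],                # rule block 2
}


def causative_windows(state, t):
    """Return absolute (stream, t_start, t_end) windows for an alert at time t."""
    return [(s, t - a, t - b) for (s, a, b) in RULE_BLOCKS[state]]


def should_alert(loc, hr, t):
    """Context-dependent alert rule: the averaging window and limit depend on loc.

    hr maps timestamp -> heart-rate sample (an assumed representation).
    """
    window, limit = (10, 140) if loc == "gym" else (30, 90)
    samples = [v for ts, v in hr.items() if t - window <= ts <= t]
    return bool(samples) and sum(samples) / len(samples) > limit
```

Because the rule block is selected by state, only the state label needs to be stored with each data element; the windows themselves are stored once per rule block.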

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to the corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead

Resolution steps:
– Determine the IDs of input ports and causative stream bindings
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
– Apply the TVC filter to obtain the causative set of elements

Stream and Port Mapping (SAPM)
1. Store bindings between every input/output port and associated streams
2. Store creation/modification times of PEs
3. Store TVC function f() for every (output port, input port) pair

Provenance Query Server, on Query(Ei)
1. Obtain I/O stream mappings
2. Retrieve dependency equation
3. Retrieve elements
4. Apply equation to determine set of causative elements
Return set of elements

[Figure: example contents of the stores consulted by the Provenance Query Server]
PC Dependency Table: PEj, f1(), Po1
Dynamic Stream Mapping Table: (t=10, S1, Pi1), (t=15, S15, I4), (t=16, S17, O6)
Data Store: (element45, ts=15, SID=12, state="gym"), (element66, ts=16, SID=17, state="home")

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams, HealthNet, June 2007
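The four query steps on this slide can be sketched as follows. This is an illustrative toy under stated assumptions: the table layouts, the port and stream names, and the function names `stream_bound_to` and `resolve` are all invented for the example and do not reflect the actual Century schema.

```python
# Dependency table: (PE, output port) -> state -> [(input port, start, end)].
# A window (start, end) selects timestamps in [t - start, t - end].
DEPENDENCY_RULES = {
    ("PEj", "Po1"): {
        "gym":  [("Pi1", 3, 2), ("Pi2", 2, 1)],
        "home": [("Pi1", 2, 0), ("Pi2", 1, 0)],
    }
}

# Dynamic stream mapping: (time the binding took effect, stream ID, port).
STREAM_BINDINGS = [(0, "S1", "Pi1"), (0, "S2", "Pi2"), (12, "S3", "Pi1")]

# Data store: (timestamp, stream ID, value) for three synthetic streams.
DATA_STORE = [(t, sid, 60 + t) for t in range(0, 21) for sid in ("S1", "S2", "S3")]


def stream_bound_to(port, t):
    """Step 1: the stream most recently bound to `port` at or before time t."""
    bound = [(ts, sid) for ts, sid, p in STREAM_BINDINGS if p == port and ts <= t]
    return max(bound)[1]


def resolve(pe, out_port, state, t):
    """Return the causative elements for the output emitted at time t."""
    rules = DEPENDENCY_RULES[(pe, out_port)][state]      # step 2: dependency equation
    causative = []
    for port, start, end in rules:
        lo, hi = t - start, t - end
        for ts, sid, value in DATA_STORE:                 # step 3: retrieve elements
            # Step 4: keep elements inside the window on the bound stream.
            if lo <= ts <= hi and sid == stream_bound_to(port, ts):
                causative.append((ts, sid, value))
    return sorted(causative)
```

In this toy data, port Pi1 is rebound from S1 to S3 at t=12, so a query at t=13 returns elements from both streams. That is the case the time-stamped binding table exists to handle.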

Key Research Benefits of TVC Model for Provenance

TVC model overhead reduces when:
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

The TVC Model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

Other provenance systems cannot handle O(100 Kevents/sec) system throughput

Quality of Events in Century

[Figure: Century server infrastructure diagram. Components shown include the USN Gateway, Toolkit, Portal Service (Patient, Administrator, and Stakeholder Portals), Platform Service, Analysis Framework (QoE Manager, State Manager, Analysis Applications), Event Management Service, Provenance Service, Interoperability Container (IHE Adapter), Storages (Event Store, Provenance Metadata, State Data), and Solution Delivery Services for stakeholders (doctor, lifestyle consultant, healthcare IT specialist, government/CDC)]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
– Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
– Modification of analysis logic: resolution of an analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
– Dynamic rebinding of stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality

[Figure: a graph of PEs consuming timestamped stream elements and raising an ALERT at time t. Callouts: "What generated this alert, and can it be trusted?"; "The sensor is faulty and generates stream elements of low quality" (action: replace sensor); "Stream elements are missing (possibly due to network issues)"]

Century QoE Some Basic QoE Measures

[Figure: a PE expects input elements t-5, t-4, t-3, t-2, t-1, t, but its current input contains only t-5, t-2, t]

By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.

[Figure: a PE with two inputs, one carrying elements t-12, t-11, t-10 and the other carrying t-5, t-2, t]

Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.

[Figure: a PE consumes input elements t-5 through t, of which only a small "useful" subset triggers the output]

Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
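The three measures on this slide could be computed along the following lines. This is a sketch only: the function names and the use of integer timestamps are assumptions, and the slide does not prescribe specific formulas.

```python
def completeness(expected_ts, received_ts):
    """Fraction of the expected input elements that actually arrived."""
    expected = set(expected_ts)
    return len(expected & set(received_ts)) / len(expected)


def sync_skew(ts_a, ts_b):
    """Gap between the newest timestamps on two inputs (0 means in sync)."""
    return abs(max(ts_a) - max(ts_b))


def useful_fraction(consumed_ts, useful_ts):
    """Fraction of consumed elements that actually contributed to the output."""
    consumed = set(consumed_ts)
    return len(consumed & set(useful_ts)) / len(consumed)
```

With t = 100, the first figure's case (expected t-5 through t, received t-5, t-2, t) gives `completeness(range(95, 101), [95, 98, 100])`, which is 0.5 and matches the slide's "half of the expected input is missing".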

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) tradeoffs from ECG, SpO2, and BP monitors

Sensors have different time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions:
– How to compute QoE of medical streams, and how to refine them based on QoE of other streams
– How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
– How to develop a combination of hybrid model-replay based provenance
– How to define context-dependent access-control models of provenance data in a multi-provider environment

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission

The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality

  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

HARMONI in Practice Sensor-based Context

Accelerometer amplitude can be used to classify the user into 3 different states: sitting, walking, running
– Higher 'normal' threshold for 3 distinct states (0-70, 70-110, 110-170) across all users
– Aside: accelerometer readings are also used to classify 'falls' for elderly patients

Bandwidth savings between 26-73% for our sample population

[Figure: Context-Based Filtering]

Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE

Century Research Goals and Technologies

Our research goal is to build an online health analytic infrastructure for the healthcare ecosystem that supports a variety of services

Challenges
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across the ecosystem
– Integrity of data and tracking of the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security

Architectural Goals
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking and auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data

Technologies & Approaches
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform

Analysis Framework (1) The Approach

Current Solutions

Vertically designed systems for the analysis of health monitoring data exist today
– Closed: custom-built for specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials

New Requirements

Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)

Scalability
– Handles large numbers of patients

Robustness
– Resilient to sensor or infrastructure failures
• Exacerbated in remote monitoring settings

Easy to Use
– Separate medical domain knowledge from IT knowledge

Our Innovations

Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics

Programming model centered around the System S concept of a Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics

Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams

Century supports an open set of data sources
– Currently supporting ECG, BP, glucose, weight, SpO2, temperature

[Figure: example System S analysis jobs, each a graph of analysis components (PEs) fed from data sources through source PEs]

Major Practical Benefit: Separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)

Analysis Framework (2) PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• System S routes each incoming SDO to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs

[Figure: example PE graph for ECG analysis, with components Input ECG, Normalization, Fast Fourier Transform, AR PE, Neural Network (trained during initialization), Alert Generator, and Output Alert]
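To illustrate the division of labour described above, here is a toy sketch in which the developer writes only each PE's `process` logic and a stand-in runtime does the routing. Nothing here is the System S API: `ProcessingElement`, `Normalize`, `AlertGenerator`, `run_job`, and the dictionary-shaped SDOs are hypothetical.

```python
class ProcessingElement:
    """Hypothetical base class: a PE author only overrides process()."""

    def process(self, sdo):
        raise NotImplementedError


class Normalize(ProcessingElement):
    def __init__(self, baseline, scale):
        self.baseline, self.scale = baseline, scale

    def process(self, sdo):
        out = dict(sdo)  # create a new SDO rather than mutating the input
        out["value"] = (sdo["value"] - self.baseline) / self.scale
        return out


class AlertGenerator(ProcessingElement):
    def __init__(self, threshold):
        self.threshold = threshold

    def process(self, sdo):
        if abs(sdo["value"]) > self.threshold:
            return {"ts": sdo["ts"], "alert": True, "value": sdo["value"]}
        return None  # nothing to publish for this SDO


def run_job(pes, sdos):
    """Toy stand-in for the runtime: routes each SDO through the PE chain."""
    outputs = []
    for sdo in sdos:
        for pe in pes:
            sdo = pe.process(sdo)
            if sdo is None:
                break
        else:
            outputs.append(sdo)
    return outputs
```

For example, `run_job([Normalize(70, 10), AlertGenerator(2.0)], readings)` emits an alert only for readings that lie more than two scale units from the baseline.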

Century Event Management Service

[Figure: Century server infrastructure diagram. Components shown include the USN Gateway, Toolkit, Portal Service (Patient, Administrator, and Stakeholder Portals), Platform Service, Analysis Framework (QoE Manager, State Manager, Analysis Applications), Event Management Service, Provenance Service, Interoperability Container (IHE Adapter), Storages (Event Store, Provenance Metadata, State Data), and Solution Delivery Services for stakeholders (doctor, lifestyle consultant, healthcare IT specialist, government/CDC)]

Event Management Service (1)

The Problem
– Persistence of a large amount of streaming data
– Efficient query mechanisms for stakeholders

Requirements
– Event rate scalability
– Resiliency
– Extensibility
– Expressiveness of the query language

Potential Approaches

1. Relational DB (annotation table): not extensible, limited scalability

2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to a high event rate

[Figure: for each approach, an Annotator writing events at times 1, 2, 3 into the store; the second approach stores XML documents]

Event Management Service (3) Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
– Traffic is bursty, modeled by a Poisson process
– Average arrival rate < average database service rate

Let μ be the service rate of the database
Let λ be the arrival rate at the database
Let ρ = λ/μ

[Figure: arrival rate over time, plotted against the average arrival rate and the average database service rate, with the instability region marked; an Annotator writes events at times 1, 2, 3 through the Stream Buffer into the XML store]

Stream Buffer: How do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS, DEBS, June 2007

Event Management Service (4) The Stream Buffer

Approach
– Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
– Avoid driving the database into the instability region
– Throttle the traffic arriving at the database

[Figure: an Analysis Graph (PE1 to PE5 and a storage PE, SPE) issues storage requests (events of type "eventStore") and also feeds the delivery system. The requests pass through the System S STG (File System Store) and a Traffic Monitor to the Database PE, which writes to the Event Store (Relational-XML). A Traffic Predictor uses Traffic Models, and an Event Store Load Monitor observes the database]
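A minimal sketch of the throttling idea, assuming a discrete-time model in which the database can absorb a fixed number of events per tick. The `StreamBuffer` class and its methods are hypothetical and are not the Century implementation; the stability condition ρ < 1 is the standard single-server queueing result.

```python
from collections import deque


def utilization(arrival_rate, service_rate):
    """rho = lambda / mu; a single-server queue is stable only when rho < 1."""
    return arrival_rate / service_rate


class StreamBuffer:
    """Absorbs bursts and releases at most `service_rate` events per tick."""

    def __init__(self, service_rate):
        self.service_rate = service_rate
        self.queue = deque()

    def offer(self, events):
        """Accept a burst of storage requests of any size."""
        self.queue.extend(events)

    def drain_tick(self):
        """Release the next batch for the database, never above its service rate."""
        n = min(self.service_rate, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

A burst of eight events offered to `StreamBuffer(3)` reaches the database as batches of three, three, and two, so the instantaneous load never exceeds the service rate even though the burst did.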

Century Provenance Service

[Figure: Century server infrastructure diagram. Components shown include the USN Gateway, Toolkit, Portal Service (Patient, Administrator, and Stakeholder Portals), Platform Service, Analysis Framework (QoE Manager, State Manager, Analysis Applications), Event Management Service, Provenance Service, Interoperability Container (IHE Adapter), Storages (Event Store, Provenance Metadata, State Data), and Solution Delivery Services for stakeholders (doctor, lifestyle consultant, healthcare IT specialist, government/CDC)]

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition.

With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

The physician's reaction: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1) Two Approaches to Provenance

What is process provenance?
• The ability to retrace which "system components" were on the data path

[Figure: PE1 and PE2, running algorithms f1, f2, f3, report to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." PE2: "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."]

What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data

[Figure: two sensors feed Data1 and Data2 into PEj, which emits DataOut4. PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]

Provenance (2) Prior Work on Provenance

EU Provenance (PASOA, PreServ): Captures workflow bindings

Scientific SOA Workflows (KARMA): Workflow and application binding notifications

Stateless application-defined data provenance

Database View Inversion (Trio, Widom): Specific transforms and extensions to SQL

Data dependencies specified statically for relations

Stream Provenance (Simhan): Store temporal history of stream connectors as a per-stream stack

System-level automatic metadata collection at stream level

File Systems (PASS, LFS): Captures system calls and modifications to file records; annotation per file

Medical Stream Provenance (Century): Arbitrary logic for individual PEs (statefulness and long memory), dynamic stream bindings, high processing throughput

[Figure: prior work positioned by Type of Provenance Reconstruction (process provenance, data provenance) against Event and Processing Rates]

IBM Research

copy 2006 IBM Corporation

Provenance Challenges Unique to Streams

Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation

is very heavyweight

Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput

Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data

Dynamic Stream Bindings Streams may be ephemeral as a consequence the

transformation graph may vary with time

We need a provenance system that

Is reactivemdashgenerating provenance events only when things change

Does not require annotating large amounts of metadata per sensor sample

Can accommodate variations in the statefulness model

We need to support both

Process provenance Show me the set of components that are involved in the generation of alert Ei

Data provenance Show me stream segments related to the generation of alert Ei

IBM Research

copy 2006 IBM CorporationIBM Research

TVC Model of Hybrid Provenance

Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data

element but remains constant for the same processing logic

Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input

streams) within a designated interval Δ

( ) ( ) UComponent

tttouti

jjjki inps

input to is

ΔminusisinlArr

Processing Element

Si(t) Si(t-1) Si(t-2)

Output stream Si

Sj(t)Sj(t-1)

Sj(t-2)Sj(t-3)

Sk(t)Sk(t-1) Sk(t-2)

Sk(t-3)

Input stream Sj

Input stream Sk

Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2

Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)

ldquogymrdquo1

Dependency RulePE State

Rule Block ID

TVC Dependency Ruleblocks

State defined by Location Context

The dependency equation can however vary based on

ndash The PErsquos internal state or some external context

IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)

ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)

ndash Solution is to allow multiple functions each associated with a distinct range of states

ndash Store the state as metadata with the data

IBM Research

copy 2006 IBM CorporationIBM Research

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is

distributed across multiple DBsndash Tradeoff faster provenance storage

processing for higher provenance reconstruction overhead

Determine ID of input ports and causative stream bindings

Retrieve appropriate data elements within specific timesequence window specified by TVC Equation

Apply TVC filter to obtain causative set of elements

PC Dependency Table

PEj f1() Po1

Dynamic Stream Mapping Table

t=10 S1 Pi1t=15 S15 I4t=16 S17 O6

Data Store)

element45 ts=15 SID= 12 state=ldquogymrdquo

element66 ts=16 SID= 17 state=lsquohomersquo

Stream and Port Mapping (SAPM)

Provenance Query Server

1 Store bindings between every input output port and associated streams

2 Store creation modification times of PEs

3 Store TVC function f() for every output port input port

Query (Ei)

1 Obtain IO stream mappings

2 Retrieve Dependency equation

4 Apply equation to determine set of causativeelements

Return set of elements

3 Retrieve elements

M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007

IBM Research

copy 2006 IBM CorporationIBM Research

Key Research Benefits of TVC Model for Provenance

1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases

TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata

per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)

ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)

ndash Statefulness- time and value functions allow compact representation of long dependency windows

ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)

Other provenance systems cannot handle O(100 Keventssec) system throughput

IBM Research

copy 2006 IBM CorporationIBM Research

Quality of Events in Century

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM CorporationIBM Research

PE

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding

However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or

persistent)ndash Transmission impairments Loss of stream elements or

delayorder reversal of streams

Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or

adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis

component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality

ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality

t-3 t-5t-4t+1

t t-5t-1t+1

What generated this alert and

can we trusted

ALERT

t

t-2t-1t

t-3 t-4t-2

t-16t-14t-13tt+1

The sensor is faulty and generates

stream elements of low quality

Stream elements are missing

(possibly due to network issues)

Replace

sensor

IBM Research

copy 2006 IBM CorporationIBM Research

Century QoE Some Basic QoE Measures

t-3 t-5t-4t-2t-1t

PE

t-5t-2t The current

input

The input that

was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing

t

PE

t-12t-11t-10

t-5t-2t

t

Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync

t-3 t-5t-4t-2t-1t

PE

The current

inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements

t

The ldquousefulrdquo input

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) points from ECG, SpO2 and BP monitors

Sensors have different time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on quality of analysis results should affect QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions
– How to compute QoE of medical streams, and how to refine them based on QoE of other streams
– How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
– How to develop a combination of hybrid model-replay based provenance
– How to define context-dependent access-control models of provenance data in a multi-provider environment

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relay: an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization and predictive data transmission

Server must not be simply a repository of sensor streams but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance and event/information quality

  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

Chapter 3

The Motivation and Challenges for Remote Monitoring and Analytics

Harmoni: Context-based Event Processing on the Mobile Hub

Century Technologies: Stream Storage, Provenance and QoE

Century Research Goals and Technologies

Our research goal is to build an online health analytics infrastructure that supports the various services of the healthcare ecosystem.

Challenges
– Integrating massive volumes of disparate data
– Need for sophisticated analytics
– Growing collaboration across ecosystem
– Integrity of data & tracking the analytic data
– Timely feedback
– Large number of concurrent users
– Privacy and security

Architectural Goals
– Extensible data model, scalable infrastructure
– Open and scalable analysis framework
– Enterprise sharing of sensor data
– Quality of Events (QoE) and provenance support for backtracking & auditing
– Real-time processing
– Scalable device and user management
– Protection of personal medical data

Technologies & Approaches
– Extensible Stream Storage System
– Distributed Stream Analysis Runtime
– Quality of Event
– Hybrid Provenance System
– Interoperability
– Privacy & Security Technology
– Stream Processing Platform

Analysis Framework (1) The Approach

Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today
– Closed, custom-built for specific diagnosis
– Non-extensible: support for limited sensor devices
– Small patient footprint: only good for trials

New Requirements
Extensible
– To new data sources (e.g., EMG, EEG, accelerometer, activity)
– To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)
Scalability
– Handles large numbers of patients
Robustness
– Resilient to sensor or infrastructure failures
  • Exacerbated in remote monitoring settings
Easy to Use
– Separate medical domain knowledge from IT knowledge

Our Innovations
Framework built on top of IBM System S
– State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of Processing Element (PE)
– PEs represent atomic stream transformers
– Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
– Persistence of state (e.g., analyze HR variations over 6 months)
– Management of patient data (e.g., patient's intermediate diagnosis)
– Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
– Currently supporting ECG, BP, Glucose, Weight, SpO2, Temperature

[Figure: System S analysis jobs, each a graph of analysis components (PEs) fed by data sources through source PEs]

Major Practical Benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)

Analysis Framework (2) PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is published, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs

[Figure: example analysis graph with the stages Input ECG, Normalization, Fast Fourier Transform, AR PE and Alert Generator, ending in Output Alert; a neural network is trained during initialization]
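The division of labor described on this slide can be sketched as follows. This is a hypothetical illustration and not the System S API: the `Router` class stands in for the platform's pub-sub routing, and `SDO`, `normalization_pe` and `alert_pe` are invented names. The PE authors write only the two handler functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SDO:
    """A Stream Data Object: a stream name, a payload and annotations."""
    stream: str
    payload: List[float]
    annotations: Dict[str, object] = field(default_factory=dict)


class Router:
    """Stand-in for the platform: delivers each SDO to subscribed PEs."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[SDO], None]]] = {}
        self.delivered: List[SDO] = []

    def subscribe(self, stream: str, handler: Callable[[SDO], None]) -> None:
        self.subscribers.setdefault(stream, []).append(handler)

    def publish(self, sdo: SDO) -> None:
        self.delivered.append(sdo)
        for handler in self.subscribers.get(sdo.stream, []):
            handler(sdo)


def normalization_pe(router: Router) -> Callable[[SDO], None]:
    """PE logic only: scale a non-empty payload by its peak magnitude."""
    def on_sdo(sdo: SDO) -> None:
        peak = max(abs(x) for x in sdo.payload) or 1.0
        router.publish(SDO("ecg.normalized", [x / peak for x in sdo.payload]))
    return on_sdo


def alert_pe(router: Router, threshold: float = 0.5) -> Callable[[SDO], None]:
    """PE logic only: create a new alert SDO when the mean exceeds a threshold."""
    def on_sdo(sdo: SDO) -> None:
        mean = sum(sdo.payload) / len(sdo.payload)
        if mean > threshold:
            router.publish(SDO("alert", [mean], {"source": sdo.stream}))
    return on_sdo


router = Router()
router.subscribe("ecg.raw", normalization_pe(router))
router.subscribe("ecg.normalized", alert_pe(router))
router.publish(SDO("ecg.raw", [2.0, 4.0, 4.0]))
print([s.stream for s in router.delivered])
```

Neither handler knows which PE consumes its output. The wiring lives entirely in the `subscribe` calls, which is the separation the bullets above describe.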

Century Event Management Service

[Figure: CENTURY SERVER INFRASTRUCTURE block diagram. Labeled components: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal, Portal Service, Event Management Service, Event Preprocessor, Event Filter, Annotator, Event Storage Manager, Subscription Service, Event Store Query Service, Provenance Service, Provenance Storage Manager, Provenance Query Manager, Provenance Query Service, Provenance Access Control, Resolver, Analysis Framework, Analysis Applications, Job/Application Manager, QoE Manager, State Manager, Platform Service, Authentication & Authorization, Privacy Manager, Device Catalogue, Group Manager, Group Management Service, Patient Data Manager, Device Data Manager, Service Data Management, User Group Data Manager, Remote Access Manager, Interoperability Container, IHE Adapter. Storages: Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy. Stakeholders: Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC, Solution Delivery Services]

Event Management Service (1)

The Problem
- Persistence of large amount of streaming data
- Efficient query mechanisms for stakeholders

Requirements
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language

Potential Approaches
1. Relational DB (Annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rate

[Figure: for each approach, an Annotator writing events at times 1, 2, 3 into the store; in the second approach the events are stored as XML]

Event Management Service (3) Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate

Let μ be the service rate of the database
Let λ be the arrival rate at the database
Let ρ = λ/μ

[Figure: bursty traffic arrival rate over time, plotted against the average arrival rate and the average database service rate, with the instability region marked; an Annotator writes XML events at times 1, 2, 3 through the Stream Buffer]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS", DEBS, June 2007
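To see why ρ matters, the following sketch evaluates the utilization and the mean time an event spends at the database. It assumes an M/M/1 queue, i.e. Poisson arrivals as stated on the slide plus exponentially distributed service times, which the slide does not state; the function names and the rates are hypothetical.

```python
def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu."""
    return arrival_rate / service_rate


def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """M/M/1 mean sojourn time 1 / (mu - lambda); unbounded when rho >= 1."""
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical rates in events per second.
for lam in (500.0, 800.0, 990.0, 1200.0):
    print(lam, utilization(lam, 1000.0), mean_time_in_system(lam, 1000.0))
```

Under this assumption the delay grows without bound as ρ approaches 1, which is the region the stream buffer is meant to keep the database out of during bursts.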

Event Management Service (4) The Stream Buffer

Approach
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database

[Figure: an analysis graph (PE1 to PE5, SPE) emits storage requests (events of type "eventStore") on the way towards the delivery system. Components: Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, System S STG (File System Store), Database PE, and the Event Store (Relational-XML)]
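The throttling idea can be sketched as a buffer that absorbs bursts and forwards at most a fixed number of storage requests per tick. This is a deliberately simplified, hypothetical sketch: it uses a fixed budget in place of the traffic predictor and load monitor shown on the slide, and the class and method names are invented.

```python
from collections import deque
from typing import Callable, Deque


class StreamBuffer:
    """Queues storage requests and releases them at a bounded rate."""

    def __init__(self, max_writes_per_tick: int) -> None:
        self.max_writes_per_tick = max_writes_per_tick
        self.pending: Deque[object] = deque()

    def submit(self, event: object) -> None:
        """Accept a storage request immediately, whatever the burst size."""
        self.pending.append(event)

    def drain(self, write: Callable[[object], None]) -> int:
        """Forward at most max_writes_per_tick events, oldest first."""
        count = min(self.max_writes_per_tick, len(self.pending))
        for _ in range(count):
            write(self.pending.popleft())
        return count


stored = []
buffer = StreamBuffer(max_writes_per_tick=2)
for event_id in range(5):          # a burst of five storage requests
    buffer.submit(event_id)
while buffer.pending:              # the database sees at most two per tick
    buffer.drain(stored.append)
print(stored)
```

Choosing the per-tick budget below the database service rate keeps the arrival rate seen by the database under μ even when the raw traffic bursts above it.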

Century Provenance Service

[Figure: the CENTURY SERVER INFRASTRUCTURE block diagram, with the same labeled components as on the Century Event Management Service slide]

An Example of Data Provenance Use

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe.

A few days later, Dr Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr Lee's reaction: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1) Two Approaches to Provenance

What is process provenance?
• The ability to retrace which "system components" were on the data path
• Example reports to the Provenance Server/Store:
  – PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1."
  – PE2: "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
• The ability to retrace which "data elements" led to generation of output data
• Example: PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

[Figure: sensors feed Data1 and Data2 through components f1, f2, f3 (PE1, PE2, PEj), producing DataOut4]
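The two kinds of record can be contrasted in a small sketch. The record layouts and helper names here are assumptions made for illustration; only the example values (f1, f3, ports 1 and 2, Data1, Data2, DataOut4) come from the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ProcessRecord:
    """Process provenance: which component ran what, when, on which port."""
    component: str
    algorithm: str
    params: Tuple[str, ...]
    time: int
    port: int


@dataclass(frozen=True)
class DataRecord:
    """Data provenance: which data elements an output element depends on."""
    output: str
    depends_on: Tuple[str, ...]


def components_on_path(records: List[ProcessRecord]) -> List[str]:
    """Retrace the system components in the order they ran."""
    return [r.component for r in sorted(records, key=lambda r: r.time)]


def inputs_of(records: List[DataRecord], output: str) -> List[str]:
    """Retrace the data elements that led to a given output."""
    return [d for r in records if r.output == output for d in r.depends_on]


process_log = [
    ProcessRecord("PE2", "f3", (), time=1, port=2),
    ProcessRecord("PE1", "f1", ("u", "v", "w"), time=0, port=1),
]
data_log = [DataRecord("DataOut4", ("Data1", "Data2"))]
print(components_on_path(process_log))
print(inputs_of(data_log, "DataOut4"))
```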

Provenance (2) Prior Work on Provenance

[Figure: prior work positioned by type of provenance reconstruction (process provenance, data provenance) against event and processing rates]

EU Provenance (PASOA, PreServ)
– Captures workflow bindings

Scientific SOA Workflows (KARMA)
– Workflow and application binding notifications
– Stateless application-defined data provenance

Database View Inversion (Trio, Widom)
– Specific transforms and extensions to SQL
– Data dependencies specified statically for relations

File Systems (PASS, LFS)
– Captures system calls and modifications to file records
– Annotation per file

Stream Provenance (Simhan)
– Store temporal history of stream connectors as a per-stream stack
– System-level automatic metadata collection at stream level

Medical Stream Provenance (Century)
– Arbitrary logic for individual PEs (statefulness and long memory)
– Dynamic stream bindings
– High processing throughput

Provenance Challenges Unique to Streams

Storage Efficiency: the high volume of data implies that "per-record" annotation is very heavyweight

Processing Efficiency: generation of timestamps and large amounts of metadata per stream event will reduce system throughput

Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data

Dynamic Stream Bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model

We need to support both
– Process provenance: show me the set of components that are involved in the generation of alert Ei
– Data provenance: show me stream segments related to the generation of alert Ei

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element but remains constant for the same processing logic

Provenance data recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– Output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$out_i(t) \Leftarrow \bigcup_{s_j \text{ is input to Component } i} \{\, inp_j(k) : k \in (t - \Delta_j,\ t) \,\}$$

[Figure: a Processing Element with output stream Si (elements Si(t), Si(t-1), Si(t-2)) and input streams Sj (Sj(t) to Sj(t-3)) and Sk (Sk(t) to Sk(t-3))]

The dependency equation can however vary based on
– The PE's internal state or some external context

  IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
  ELSE generate ALERT iff AVG(HR(t-30, t)) > 90

– Solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data

TVC Dependency Rule blocks (state defined by location context)

Rule Block ID   PE State   Dependency Rule
1               "gym"      Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2               "home"     Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
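A minimal sketch of state-dependent rule blocks, using the "gym" and "home" rules and the heart-rate alert condition from this slide. The encoding of each rule as (stream, start offset, end offset) triples, the closed-interval reading of the windows, and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# One rule block per PE state; offsets are relative to the output time t.
RULE_BLOCKS: Dict[str, List[Tuple[str, int, int]]] = {
    "gym": [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],
}


def causative_windows(state: str, t: int) -> List[Tuple[str, int, int]]:
    """Absolute (stream, start, end) windows that the output Ei(t) depends on."""
    return [(stream, t + lo, t + hi) for stream, lo, hi in RULE_BLOCKS[state]]


def should_alert(loc: str, hr_by_time: Dict[int, float], t: int) -> bool:
    """The slide's alert condition: the window and threshold depend on location."""
    window, limit = (10, 140) if loc == "gym" else (30, 90)
    samples = [hr for ts, hr in hr_by_time.items() if t - window <= ts <= t]
    return bool(samples) and sum(samples) / len(samples) > limit


print(causative_windows("home", 10))
print(causative_windows("gym", 10))
print(should_alert("gym", {ts: 100.0 for ts in range(11)}, 10))
print(should_alert("home", {ts: 100.0 for ts in range(11)}, 10))
```

Only the state ("gym" or "home") needs to be stored with each output; the windows themselves are recomputed from the rule block when a provenance query arrives.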

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to the corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage processing for higher provenance reconstruction overhead

What is stored
1. Bindings between every input/output port and associated streams
2. Creation/modification times of PEs
3. TVC function f() for every output port/input port

Stores
– PC Dependency Table, e.g. (PEj, f1(), Po1)
– Dynamic Stream Mapping Table, also labeled Stream and Port Mapping (SAPM), e.g. (t=10, S1, Pi1), (t=15, S15, I4), (t=16, S17, O6)
– Data Store, e.g. (element45, ts=15, SID=12, state="gym"), (element66, ts=16, SID=17, state="home")

Handling Query(Ei) at the Provenance Query Server
1. Obtain I/O stream mappings: determine the ID of input ports and causative stream bindings
2. Retrieve the dependency equation
3. Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
4. Apply the equation (TVC filter) to determine the set of causative elements
Return the set of elements

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams", HealthNet, June 2007
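The four query steps can be sketched over in-memory tables. Everything here is a hypothetical stand-in: the table layouts, the names `binding_at` and `resolve`, and the simplification that the stream binding is looked up once, at the alert time, rather than per element across the window.

```python
from typing import Dict, List, Optional, Tuple


def binding_at(history: List[Tuple[int, str]], t: int) -> Optional[str]:
    """Stream bound to a port at time t, given (time, stream) changes in time order."""
    current = None
    for ts, stream_id in history:
        if ts <= t:
            current = stream_id
    return current


def resolve(alert_time: int,
            state: str,
            port_history: Dict[str, List[Tuple[int, str]]],
            rules: Dict[str, List[Tuple[str, int, int]]],
            data_store: List[dict]) -> List[str]:
    """Return the IDs of the stored elements that caused the alert."""
    causative = []
    for port, lo, hi in rules[state]:                       # 2. dependency equation
        stream_id = binding_at(port_history[port], alert_time)  # 1. I/O mapping
        for elem in data_store:                             # 3. retrieve elements
            in_window = alert_time + lo <= elem["ts"] <= alert_time + hi
            if elem["sid"] == stream_id and in_window:      # 4. apply the filter
                causative.append(elem["id"])
    return causative


port_history = {"Pi1": [(10, "S1"), (15, "S15")]}
rules = {"gym": [("Pi1", -2, 0)]}
data_store = [
    {"id": "e1", "ts": 14, "sid": "S15"},
    {"id": "e2", "ts": 15, "sid": "S15"},
    {"id": "e3", "ts": 16, "sid": "S15"},
    {"id": "e4", "ts": 15, "sid": "S1"},
]
print(resolve(16, "gym", port_history, rules, data_store))
```

Nothing is written per data element at ingest time beyond the element itself and its state. The cost is paid at query time, which is the tradeoff noted at the top of this slide.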

Key Research Benefits of TVC Model for Provenance

TVC Model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
– Storage efficiency: only minimal metadata per data element, functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
– Processing efficiency: stream and port bindings stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in EU Provenance/Karma systems)
– Statefulness: time and value functions allow compact representation of long dependency windows
– Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

Other provenance systems cannot handle O(100 Kevents/sec) system throughput

Quality of Events in Century

[Figure: the CENTURY SERVER INFRASTRUCTURE block diagram, with the same labeled components as on the Century Event Management Service slide]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
– Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
– Modification of analysis logic: resolution of analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
– Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a graph of PEs raises an ALERT at time t, prompting the question "What generated this alert and can we trust it?". One sensor is faulty and generates stream elements of low quality (remedy: replace sensor); on another stream, elements are missing (possibly due to network issues)]
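The dynamic-rebinding case can be sketched as a source selection driven by QoE and cost. The QoE scale (0 to 1), the cost values and the function name are assumptions made for illustration; the slide only says that the more expensive ECG source is used when the SpO2-derived HR dips in quality.

```python
from typing import Dict


def choose_hr_source(qoe: Dict[str, float],
                     costs: Dict[str, float],
                     min_quality: float) -> str:
    """Cheapest source meeting the quality floor, else the best-quality source."""
    acceptable = [source for source in qoe if qoe[source] >= min_quality]
    if acceptable:
        return min(acceptable, key=lambda source: costs[source])
    return max(qoe, key=lambda source: qoe[source])


costs = {"SpO2": 1.0, "ECG": 5.0}   # hypothetical acquisition costs
print(choose_hr_source({"SpO2": 0.9, "ECG": 0.95}, costs, min_quality=0.8))
print(choose_hr_source({"SpO2": 0.4, "ECG": 0.95}, costs, min_quality=0.8))
```

Re-running the selection whenever a stream's QoE estimate changes gives the rebinding behavior described in the last bullet above.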











Event Management Service (4) The Stream Buffer

System S STG(File System Store)

Traffic Monitor

Storage Requests(events of type ldquoeventStorerdquo)

DatabasePE

Traffic Models Traffic

Predictor Event Store Load Monitor

Storage Requests

PE1

PE2

PE3

PE4

PE5SPE

Analysis Graph

Towards the delivery system

Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database

EventStore

(Relational-XML
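The throttling idea can be sketched as a buffer that absorbs bursts and releases a bounded number of storage requests per time step. This is a minimal illustration, not the System S implementation: the class name, the per-tick cap, and the in-memory queue are all assumptions (the slide indicates a file-system store and a traffic predictor, neither of which is modeled here).

```python
from collections import deque


class StreamBuffer:
    """Absorbs bursts and releases at most `max_per_tick` events per tick (hypothetical)."""

    def __init__(self, max_per_tick: int):
        if max_per_tick <= 0:
            raise ValueError("max_per_tick must be positive")
        self.max_per_tick = max_per_tick
        self._pending = deque()

    def offer(self, event) -> None:
        """Accept a storage request immediately, whatever the database load."""
        self._pending.append(event)

    def drain(self) -> list:
        """Release the next batch, capped so the database is never overdriven."""
        batch = []
        while self._pending and len(batch) < self.max_per_tick:
            batch.append(self._pending.popleft())
        return batch

    def backlog(self) -> int:
        return len(self._pending)


buf = StreamBuffer(max_per_tick=3)
for i in range(8):                      # a burst of 8 events arrives at once
    buf.offer(f"event-{i}")
sizes = [len(buf.drain()) for _ in range(3)]
print(sizes, buf.backlog())             # [3, 3, 2] 0
```

A fixed cap is the simplest policy; a predictor-driven design would adjust the cap from observed traffic and database load.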

Century Provenance Service

[Figure: Century server infrastructure architecture diagram. Component labels: USN Gateway; Data Transformer; Sensor Data Sender; Sensor Data Reception; Toolkit; IDE for Application Developer; Sensor Simulator; Event Management Client Library; Patient Portal; Administrator Portal; Stakeholder Info; Patient Info; Device Info; Application Info; Provenance Service; Provenance Storage Manager; State Data; Provenance Metadata; Event Store; Storages; Portal Service; Patient Data Manager; Service Data Management; User Group Data Manager; Job Application Manager; Event Filter; Annotator; Analysis Applications; Analysis Framework; QoE Manager; State Manager; Group Manager; Device Catalogue; Authentication & Authorization; Privacy Manager; Platform Service; Subscription Service; Event Store Query Service; Provenance Query Service; Group Management Service; IHE Adapter; Group Data; Privacy Policy; Event Storage Manager; Event Management Service; Event Preprocessor; Remote Access Manager; Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC); Solution Delivery Services; Provenance Query Manager; Provenance Access Control; Resolver; Device Data Manager; Interoperability Container]

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe. A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. For medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1): Two Approaches to Provenance

What is process provenance?
• The ability to retrace which "system components" were on the data path
- PE1 to the Provenance Server/Store: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1."
- PE2 to the Provenance Server/Store: "I received some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
• The ability to retrace which "data elements" led to the generation of output data
- PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

[Figure: sensors feed PEs running algorithms f1, f2, f3, which report to the Provenance Server/Store; separately, a PEj consumes Data1 and Data2 from sensors and emits DataOut4 annotated with its dependencies]
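The difference between the two approaches can be sketched with two small record types: process provenance is a log of what each PE did, while data provenance is dependency metadata carried by the output element. The class and function names below are hypothetical and only mirror the slide's example (PE1, PE2, Data1, Data2, DataOut4).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessRecord:
    """A process-provenance record a PE reports to the provenance store."""
    pe: str
    algorithm: str
    time: str


@dataclass
class DataElement:
    """A data element carrying data-provenance metadata about its inputs."""
    name: str
    depends_on: list = field(default_factory=list)


def components_on_path(store):
    """Process provenance: which PEs reported activity, in reporting order."""
    return [record.pe for record in store]


def data_ancestors(element):
    """Data provenance: names of every element the output transitively depends on."""
    seen = []
    stack = list(element.depends_on)
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen.append(current.name)
            stack.extend(current.depends_on)
    return sorted(seen)


store = [ProcessRecord("PE1", "f1", "t0"), ProcessRecord("PE2", "f3", "t1")]
data1, data2 = DataElement("Data1"), DataElement("Data2")
data_out4 = DataElement("DataOut4", depends_on=[data1, data2])

print(components_on_path(store))   # ['PE1', 'PE2']
print(data_ancestors(data_out4))   # ['Data1', 'Data2']
```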

Provenance (2): Prior Work on Provenance

[Figure: prior systems positioned by type of provenance reconstruction (process provenance, data provenance) against event and processing rates]

- EU Provenance (PASOA, PreServ): captures workflow bindings
- Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
- Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
- Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level
- File Systems (PASS, LFS): captures system calls and modifications to file records; annotation per file
- Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput

Provenance Challenges Unique to Streams

- Storage efficiency: the high volume of data implies that "per-record" annotation is very heavyweight
- Processing efficiency: generating timestamps and large amounts of metadata per stream event will reduce system throughput
- Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data
- Dynamic stream bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model

We need to support both
- Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
- Data provenance: "Show me stream segments related to the generation of alert Ei"

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
- The dependency is thus not specified on every data element, but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
- The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ s_i^{out}(t) \Leftarrow \bigcup_{j \,:\, s_j \text{ is input to Component } i} \left\{ s_j^{inp}(t_k) : t_k \in [t - \Delta_j,\, t] \right\} $$

[Figure: a Processing Element with input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), producing output stream Si (samples Si(t), Si(t-1), Si(t-2))]

The dependency equation can, however, vary based on
- The PE's internal state or some external context

  IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
  ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)

- The solution is to allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data

TVC dependency rule blocks (state defined by location context)
- Rule block 1, PE state "gym": Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
- Rule block 2, PE state "home": Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
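The state-dependent rule blocks can be sketched as a lookup from PE state to per-stream time windows. The sketch assumes integer timestamps and closed windows (both endpoints included), which the slide does not specify; the window offsets copy the "gym" and "home" rules above, and the names `RULE_BLOCKS` and `causative_timestamps` are hypothetical.

```python
# Each rule block maps a PE state to time windows per input stream.
# Windows are (start_offset, end_offset) relative to the output time t,
# assumed closed on both ends and expressed in integer time steps.
RULE_BLOCKS = {
    "gym":  {"Sj": [(0, 0), (-3, -2)], "Sk": [(-2, -1)]},
    "home": {"Sj": [(-2, 0)],          "Sk": [(-1, 0)]},
}


def causative_timestamps(state: str, t: int) -> dict:
    """Per input stream, the sorted timestamps that an output at time t depends on."""
    result = {}
    for stream, windows in RULE_BLOCKS[state].items():
        stamps = set()
        for start, end in windows:
            stamps.update(range(t + start, t + end + 1))
        result[stream] = sorted(stamps)
    return result


print(causative_timestamps("gym", 10))    # {'Sj': [7, 8, 10], 'Sk': [8, 9]}
print(causative_timestamps("home", 10))   # {'Sj': [8, 9, 10], 'Sk': [9, 10]}
```

Only the state label has to travel with each output; the windows themselves are stored once per rule block rather than once per data element.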

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to a corresponding query
- For efficiency, the relevant metadata is distributed across multiple DBs
- Tradeoff: faster provenance storage processing, in exchange for higher provenance reconstruction overhead

Stored metadata (Stream and Port Mapping, SAPM)
1. Store bindings between every input/output port and the associated streams
2. Store creation/modification times of PEs
3. Store the TVC function f() for every output port / input port

Resolving Query(Ei) at the Provenance Query Server
1. Obtain I/O stream mappings: determine the IDs of the input ports and the causative stream bindings
2. Retrieve the dependency equation
3. Retrieve elements: the appropriate data elements within the specific time/sequence window specified by the TVC equation
4. Apply the equation (TVC filter) to determine the set of causative elements
Return the set of elements

Example contents of the stores
- PC Dependency Table: PEj, f1(), Po1
- Dynamic Stream Mapping Table: t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6
- Data Store: element45 (ts=15, SID=12, state="gym"); element66 (ts=16, SID=17, state="home")

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
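The four query steps can be illustrated with in-memory stand-ins for the stream mapping table, the dependency table, and the data store. Everything here is a hypothetical sketch: the table layouts, the sample values, the integer timestamps, and the function names are assumptions, not the Century schema.

```python
# Hypothetical in-memory stand-ins for the stored provenance metadata.
STREAM_MAPPING = [            # (time the binding became active, stream ID, port)
    (10, "S1", "Pi1"),
    (15, "S15", "Pi1"),       # the port was rebound to another stream at t=15
]
DEPENDENCY = {                # (PE, output port) -> input port -> window offsets
    ("PEj", "Po1"): {"Pi1": (-2, 0)},
}
DATA_STORE = [                # (timestamp, stream ID, value)
    (12, "S1", 71), (13, "S1", 74), (14, "S1", 78),
    (15, "S15", 90), (16, "S15", 95),
]


def stream_bound_at(port: str, t: int):
    """Step 1: the most recent stream bound to `port` at or before time t."""
    bound = [(ts, sid) for ts, sid, p in STREAM_MAPPING if p == port and ts <= t]
    return max(bound)[1] if bound else None


def resolve(pe: str, out_port: str, t: int) -> list:
    """Return the stored elements that an output of (pe, out_port) at time t depends on."""
    causative = []
    windows = DEPENDENCY[(pe, out_port)]                      # step 2
    for in_port, (start, end) in windows.items():
        for ts in range(t + start, t + end + 1):
            sid = stream_bound_at(in_port, ts)
            # Steps 3 and 4: retrieve elements, keep those inside the window.
            causative += [e for e in DATA_STORE if e[0] == ts and e[1] == sid]
    return causative


print(resolve("PEj", "Po1", 16))   # [(14, 'S1', 78), (15, 'S15', 90), (16, 'S15', 95)]
```

Because the binding is looked up per timestamp, the result spans two different streams, which is the case a static dependency specification cannot express.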

Key Research Benefits of TVC Model for Provenance

The TVC model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
- Storage efficiency: only minimal metadata per data element, with the functional specification stored per PC (only 1-2% overhead, compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
- Statefulness: time and value functions allow compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. The number of monitored individuals increases

Other provenance systems cannot handle O(100 Kevents/sec) system throughput.

Quality of Events in Century

[Figure: Century server infrastructure architecture diagram, as on the Century Provenance Service slide; the component labels include the QoE Manager, State Manager and Analysis Framework]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
- Enables a pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- Action taken on analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
- Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform, or the set of features used) may be varied depending on stream quality
- Dynamic rebinding of stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE raises an ALERT at time t from input streams whose elements carry irregular timestamps. Callouts: "What generated this alert, and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (replace sensor); "Stream elements are missing (possibly due to network issues)"]
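The dynamic-rebinding case can be sketched as a simple selection rule: stay on the cheaper SpO2-derived heart-rate stream while its quality holds, and switch to the ECG-derived stream when it dips. The quality score in [0, 1], the 0.7 threshold, and the function name are all assumptions made for illustration.

```python
def choose_hr_source(spo2_quality: float, threshold: float = 0.7) -> str:
    """Prefer the cheaper SpO2-derived HR stream; fall back to ECG when quality dips."""
    return "SpO2" if spo2_quality >= threshold else "ECG"


# Hypothetical quality scores observed over four consecutive intervals.
bindings = [choose_hr_source(q) for q in (0.95, 0.80, 0.55, 0.90)]
print(bindings)   # ['SpO2', 'SpO2', 'ECG', 'SpO2']
```

A practical policy would likely add hysteresis so the graph does not flip between sources when the score hovers around the threshold.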

Century QoE: Some Basic QoE Measures

Expected versus received input. By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: the input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input is t-5, t-2, t; the PE emits an output at t]

Timestamp correlation across inputs. Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: one input carries t-12, t-11, t-10 and the other t-5, t-2, t; the PE emits an output at t]

Consumed versus useful input. Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial, due to the small fraction of useful input stream elements.
[Figure: the current input is t-5, t-4, t-3, t-2, t-1, t, of which only a small part is marked as the "useful" input; the PE emits an output at t]
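The three measures above can be sketched as simple ratios over timestamps. The function names, the integer timestamps, and the choice of "gap between latest timestamps" as the synchronization measure are assumptions; the slide only describes the ideas qualitatively.

```python
def completeness(expected, received) -> float:
    """Fraction of expected timestamps that actually arrived."""
    expected = set(expected)
    return len(expected & set(received)) / len(expected) if expected else 1.0


def sync_gap(stream_a, stream_b) -> int:
    """Gap between the latest timestamps of two inputs (0 means in sync)."""
    return abs(max(stream_a) - max(stream_b))


def useful_fraction(consumed, useful) -> float:
    """Fraction of consumed elements that actually contributed to the output."""
    consumed = set(consumed)
    return len(consumed & set(useful)) / len(consumed) if consumed else 0.0


t = 100  # hypothetical current time step
print(completeness(range(t - 5, t + 1), [t - 5, t - 2, t]))     # 0.5: half is missing
print(sync_gap([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))    # 10: inputs out of sync
print(useful_fraction(range(t - 5, t + 1), [t]))                # one useful element in six
```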

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
- Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
- HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2 and BP monitors

Sensors have different, time-varying QoE
- QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
- User-specified models may not always be available: learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
- The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions
- How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams
- How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
- How to develop a hybrid of model-based and replay-based provenance
- How to define context-dependent access-control models of provenance data in a multi-provider environment

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both the server and the relay

The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
- Key features include adaptive event filtering, personalization and predictive data transmission

The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
- Key features include high-volume stream support, provenance and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

Analysis Framework (1): The Approach

Current Solutions
Vertically designed systems for the analysis of health monitoring data exist today
- Closed: custom-built for a specific diagnosis
- Non-extensible: support for limited sensor devices
- Small patient footprint: only good for trials

New Requirements
Extensible
- To new data sources (e.g., EMG, EEG, accelerometer, activity)
- To new analysis algorithms (e.g., myocardia, polyneuropathy, Alzheimer)
Scalability
- Handles large numbers of patients
Robustness
- Resilient to sensor or infrastructure failures
  • Exacerbated in remote monitoring settings
Easy to Use
- Separate medical domain knowledge from IT knowledge

Our Innovations
Framework built on top of IBM System S
- State-of-the-art middleware for high-volume stream processing analytics
Programming model centered around the System S concept of Processing Element (PE)
- PEs represent atomic stream transformers
- Stream data flow with pub-sub semantics
Century also provides useful medical analytic services
- Persistence of state (e.g., analyze HR variations over 6 months)
- Management of patient data (e.g., patient's intermediate diagnosis)
- Computation and rebinding based on the quality of events from medical streams
Century supports an open set of data sources
- Currently supporting ECG, BP, glucose, weight, SpO2, temperature

[Figure: several analysis jobs, each a graph of analysis components (PEs) fed by a data source through a source PE]

Major Practical Benefit: separate the PE developer's job (medical domain knowledge) from infrastructure issues (scalability, fault tolerance, stream connections, etc.)

Analysis Framework (2): PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes the SDO to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify or create new SDOs

[Figure: example PE graph from Input ECG to Output Alert, with PEs labeled Normalization, Fast Fourier Transform, AR PE, Neural Network (trained during initialization) and Alert Generator]
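The division of labor described above can be sketched as follows: the developer writes only a `process` method per PE, and a stand-in for the middleware routes SDOs between PEs. This is not the System S API; the base class, the dictionary-shaped SDO, and the simple energy threshold (used in place of the FFT and neural-network stages of the figure) are all hypothetical.

```python
class ProcessingElement:
    """Hypothetical base class: the framework, not the developer, moves SDOs between PEs."""

    def process(self, sdo: dict) -> dict:
        raise NotImplementedError


class Normalization(ProcessingElement):
    def process(self, sdo: dict) -> dict:
        samples = sdo["samples"]
        peak = max(abs(x) for x in samples) or 1.0   # avoid dividing by zero
        return {**sdo, "samples": [x / peak for x in samples]}


class AlertGenerator(ProcessingElement):
    def __init__(self, threshold: float):
        self.threshold = threshold

    def process(self, sdo: dict) -> dict:
        # Placeholder rule: mean signal energy above a threshold raises an alert.
        energy = sum(x * x for x in sdo["samples"]) / len(sdo["samples"])
        return {**sdo, "alert": energy > self.threshold}


def run_graph(pes, sdo: dict) -> dict:
    """Stand-in for the middleware: route each SDO through the PEs in order."""
    for pe in pes:
        sdo = pe.process(sdo)
    return sdo


result = run_graph(
    [Normalization(), AlertGenerator(threshold=0.5)],
    {"patient": "p1", "samples": [0.0, 2.0, -4.0, 2.0]},
)
print(result["samples"], result["alert"])   # [0.0, 0.5, -1.0, 0.5] False
```

Adding a stage means adding one class with one method; the routing code does not change.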

Century Event Management Service

[Figure: Century server infrastructure architecture diagram, as on the Century Provenance Service slide; the component labels include the Event Management Service, Event Preprocessor, Event Storage Manager, Event Store and Event Store Query Service]

IBM Research

copy 2007 IBM Corporation

Event Management Service (1)

Annotator

time 1 2 3

XML

Annotator

time 1 2 3

Not extensibleLimited scalability

Fully extensibleBut does not scale to

high event rate

The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders

Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language

XML

XML

XML

Potential Approaches

1 Relational DB (Annotation table)

2 Relational-XML DB (XML-based Annotation data model)

IBM Research

copy 2007 IBM Corporation

Event Management Service (3) Hybrid Storage Model

Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate

time

arrival rateaverage arrival rate

traffic average databaseservice rate

Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ

UnstabilityRegion

Annotator

time 1 2 3

3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Stream Buffer How do we designthe stream buffer

XMLXML

XML

XML

D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007

IBM Research

copy 2007 IBM Corporation

Event Management Service (4) The Stream Buffer

System S STG(File System Store)

Traffic Monitor

Storage Requests(events of type ldquoeventStorerdquo)

DatabasePE

Traffic Models Traffic

Predictor Event Store Load Monitor

Storage Requests

PE1

PE2

PE3

PE4

PE5SPE

Analysis Graph

Towards the delivery system

Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database

EventStore

(Relational-XML

IBM Research

copy 2007 IBM Corporation

Century Provenance Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM Corporation26

An Example of Data Provenance Use

Urgent AlertPatient Doe John

Condition Abnormal ReactionRecommendation

The Century System has found a developing issue in your patientrsquos

medical condition The condition of the patient is deteriorating A known

side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day

Ultimately the physician is responsible for medical decisions and any actions taken In order

for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition

With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe

A few days later Dr Lee receives an alert from the Century prescription program

The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)

Thatrsquos unusual and a large decrease Before agreeing to

this change I need to understand on what basis the

system has made this recommendation and how

accurate it is

IBM Research

copy 2006 IBM Corporation

Provenance (1) Two Approaches to Provenance

f1

f2

f3

Provenance ServerStore

I ran algorithm f1 with parameters uv and w at time t0 I published on port 1

PE1

PE2

I recvd some data on port 2 I ran algorithm f3 on the data at time t1

What is process provenancebull The ability to retrace

which ldquosystem componentsrdquowere on the data path

What is data provenancebull The ability to retrace

which ldquodata elementsrdquo ledto generation of output data

PEj

Sensor

Sensor

Data2

DataOut4

Data1 Data2

I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2

Data1

Data1

IBM Research

copy 2006 IBM Corporation

Provenance (2) Prior Work on Provenance

EU Provenance (PASOA PreServ))

Captures Workflow bindings

Scientific SOA Workflows (KARMA)Workflow and application binding notifications

Stateless application-defined data provenance

Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL

Data dependencies specified statically for relations

Event and Processing Rates

Stream Provenance (Simhan)

Store temporal history of stream connectors as a per-stream stack

System-level automatic metadata collection at stream-levelTy

pe o

f Pr

oven

ance

Rec

onst

ruct

ion

Proc

ess

prov

enan

ceD

ata

prov

enan

ce

File Systems PASS LFSCaptures system calls and

modifications to file recordsAnnotation per file

Medical Stream Provenance Arbitrary logic for invidual PEs (

statefulness and long memory) Dynamic stream bindings

High proessingthroughput

Century

IBM Research

copy 2006 IBM Corporation

Provenance Challenges Unique to Streams

Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation

is very heavyweight

Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput

Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data

Dynamic Stream Bindings Streams may be ephemeral as a consequence the

transformation graph may vary with time

We need a provenance system that

Is reactivemdashgenerating provenance events only when things change

Does not require annotating large amounts of metadata per sensor sample

Can accommodate variations in the statefulness model

We need to support both

Process provenance Show me the set of components that are involved in the generation of alert Ei

Data provenance Show me stream segments related to the generation of alert Ei

IBM Research

copy 2006 IBM CorporationIBM Research

TVC Model of Hybrid Provenance

Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data

element but remains constant for the same processing logic

Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input

streams) within a designated interval Δ

( ) ( ) UComponent

tttouti

jjjki inps

input to is

ΔminusisinlArr

Processing Element

Si(t) Si(t-1) Si(t-2)

Output stream Si

Sj(t)Sj(t-1)

Sj(t-2)Sj(t-3)

Sk(t)Sk(t-1) Sk(t-2)

Sk(t-3)

Input stream Sj

Input stream Sk

Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2

Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)

ldquogymrdquo1

Dependency RulePE State

Rule Block ID

TVC Dependency Ruleblocks

State defined by Location Context

The dependency equation can however vary based on

ndash The PErsquos internal state or some external context

IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)

ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)

ndash Solution is to allow multiple functions each associated with a distinct range of states

ndash Store the state as metadata with the data

IBM Research

copy 2006 IBM CorporationIBM Research

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is

distributed across multiple DBsndash Tradeoff faster provenance storage

processing for higher provenance reconstruction overhead

Determine ID of input ports and causative stream bindings

Retrieve appropriate data elements within specific timesequence window specified by TVC Equation

Apply TVC filter to obtain causative set of elements

PC Dependency Table

PEj f1() Po1

Dynamic Stream Mapping Table

t=10 S1 Pi1t=15 S15 I4t=16 S17 O6

Data Store)

element45 ts=15 SID= 12 state=ldquogymrdquo

element66 ts=16 SID= 17 state=lsquohomersquo

Stream and Port Mapping (SAPM)

Provenance Query Server

1 Store bindings between every input output port and associated streams

2 Store creation modification times of PEs

3 Store TVC function f() for every output port input port

Query (Ei)

1 Obtain IO stream mappings

2 Retrieve Dependency equation

4 Apply equation to determine set of causativeelements

Return set of elements

3 Retrieve elements

M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007

IBM Research

copy 2006 IBM CorporationIBM Research

Key Research Benefits of TVC Model for Provenance

1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases

TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata

per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)

ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)

ndash Statefulness- time and value functions allow compact representation of long dependency windows

ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)

Other provenance systems cannot handle O(100 Keventssec) system throughput

IBM Research

copy 2006 IBM CorporationIBM Research

Quality of Events in Century

[Figure: Century server infrastructure diagram. Labeled components include the USN Gateway, Toolkit, Patient/Administrator/Stakeholder Portals, Portal Service, Event Management Service, Provenance Service, Analysis Framework, QoE Manager, State Manager, Group Manager, Privacy Manager, Authentication & Authorization, Subscription Service, IHE Adapter, and the storages (Event Store, State Data, Provenance Metadata).]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
– Enables a pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
– Sensor (source) impairments: sensor faults (transient or persistent)
– Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
– Action taken on analysis outcome: doctors may ignore, or adjust their responsiveness to, specific types of alerts
– Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform, or the set of features used) may be varied depending on stream quality
– Dynamic rebinding of stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality

[Figure: several timestamped input streams feed a PE that raises an ALERT, prompting the question "What generated this alert, and can we trust it?" One sensor is faulty and generates stream elements of low quality (remedy: replace sensor); on another input, stream elements are missing (possibly due to network issues).]
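
The dynamic rebinding case (fall back from the SpO2 sensor to the more expensive ECG sensor when HR quality dips) can be sketched roughly as follows. This is only an illustration, not Century's implementation: the `HRSource` type, the quality scores in [0, 1], the cost values, and the 0.7 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HRSource:
    name: str       # e.g. "SpO2" or "ECG"
    cost: float     # relative acquisition cost (hypothetical units)
    quality: float  # current QoE estimate in [0, 1] (hypothetical scale)


def choose_hr_source(sources, min_quality=0.7):
    """Pick the cheapest source whose quality is acceptable.

    Falls back to the highest-quality source if none meets the threshold.
    """
    acceptable = [s for s in sources if s.quality >= min_quality]
    if acceptable:
        return min(acceptable, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.quality)


spo2 = HRSource("SpO2", cost=1.0, quality=0.9)
ecg = HRSource("ECG", cost=5.0, quality=0.95)

before = choose_hr_source([spo2, ecg]).name  # SpO2 is cheap and good enough
spo2.quality = 0.4                           # SpO2 HR readings dip in quality
after = choose_hr_source([spo2, ecg]).name   # rebind to the ECG sensor
```

A real system would presumably also damp the decision (hysteresis) so that a noisy quality estimate does not cause the graph to flip between sensors.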

Century QoE: Some Basic QoE Measures

By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: the input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input is t-5, t-2, t.]

Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: one PE input carries t-12, t-11, t-10; the other carries t-5, t-2, t.]

Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
[Figure: the current input is t-5, t-4, t-3, t-2, t-1, t; the "useful" input is t.]
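
The three measures above can be written down as simple ratios and differences. The sketch below is one possible reading, assuming integer timestamps; the function names and the choice of "latest timestamp gap" as the out-of-sync measure are assumptions for illustration, not definitions taken from Century.

```python
def completeness(expected, received):
    """Fraction of expected timestamps that actually arrived."""
    expected = set(expected)
    return len(expected & set(received)) / len(expected)


def sync_skew(ts_a, ts_b):
    """Gap between the latest timestamps seen on two input streams."""
    return abs(max(ts_a) - max(ts_b))


def useful_fraction(consumed, useful):
    """Fraction of consumed elements that actually triggered the output."""
    consumed = set(consumed)
    return len(consumed & set(useful)) / len(consumed)


t = 100  # arbitrary "current time" for the example

# Expected t-5..t, received only t-5, t-2, t: half of the input is missing.
q_complete = completeness(range(t - 5, t + 1), [t - 5, t - 2, t])

# One input stopped at t-10 while the other is current: inputs out of sync.
q_skew = sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t])

# Six elements consumed, but only the one at time t triggered the output.
q_useful = useful_fraction(range(t - 5, t + 1), [t])
```

With these inputs the completeness is 0.5, the skew is 10 time units, and the useful fraction is 1/6, matching the three low-quality cases on the slide.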

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2, and BP monitors

Sensors have different time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)

Provenance architectures must address the prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
– User-specified models may not always be available: learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions:
– How to compute QoE of medical streams, and how to refine them based on the QoE of other streams?
– How to adaptively alter the acquisition pattern based on (cost, accuracy needs)?
– How to develop a combination of hybrid model-replay based provenance?
– How to define context-dependent access-control models of provenance data in a multi-provider environment?

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission

The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality

  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

IBM Research

© 2007 IBM Corporation

Analysis Framework (2): PE Example

• PE developers need not be concerned with the communication of Stream Data Objects (SDOs)
• When an SDO is received, System S routes it to the appropriate PE
• Developers only need to code the logic of the algorithms used by the PE to annotate, modify, or create new SDOs

[Figure: example PE chain with the labels Input ECG, Normalization, Fast Fourier Transform, AR PE, Alert Generator, and Output Alert, plus the note "Neural network trained during initialization".]

Century Event Management Service

[Figure: Century server infrastructure diagram, locating the Event Management Service (Event Preprocessor, Event Storage Manager, Event Store Query Service) among the other server components.]

Event Management Service (1)

The Problem
- Persistence of large amounts of streaming data
- Efficient query mechanisms for stakeholders

Requirements
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language

Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure: for each approach, an Annotator writes events arriving at times 1, 2, 3 into the store; in the second approach each event is stored as XML.]

Event Management Service (3): Hybrid Storage Model

3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate

Let μ be the service rate of the database
Let λ be the arrival rate at the database
Let ρ = λ/μ

[Figure: bursty traffic arrival rate over time, with the average arrival rate and the average database service rate marked, and an instability region indicated.]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
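
To make the role of ρ = λ/μ concrete, the sketch below computes the utilization and, under an extra assumption, the mean backlog. The slide only states Poisson arrivals; treating service times as exponential (an M/M/1 queue, where the mean number of requests in the system is ρ/(1 − ρ)) is an assumption added here for illustration, as are the function names and the example rates.

```python
def utilization(arrival_rate, service_rate):
    """rho = lambda / mu."""
    return arrival_rate / service_rate


def mm1_mean_in_system(arrival_rate, service_rate):
    """Mean number of requests in an M/M/1 queue: rho / (1 - rho).

    Returns infinity when rho >= 1, i.e. in the instability region.
    """
    rho = utilization(arrival_rate, service_rate)
    if rho >= 1.0:
        return float("inf")
    return rho / (1.0 - rho)


mu = 1000.0  # hypothetical database service rate, events/sec
for lam in (500.0, 900.0, 990.0, 1200.0):
    # The backlog grows sharply as rho approaches 1 and is unbounded past it.
    print(lam, utilization(lam, mu), mm1_mean_in_system(lam, mu))
```

The steep growth near ρ = 1 is the reason a burst above the service rate is dangerous even when the long-run average arrival rate is below it, which motivates placing a buffer in front of the database.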

Event Management Service (4): The Stream Buffer

Approach
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database

[Figure: stream buffer architecture. Labels: Analysis Graph (PE1 to PE5, SPE), Storage Requests (events of type "eventStore"), Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, Database PE, System S STG (File System Store), Event Store (Relational-XML), and "Towards the delivery system".]
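
The throttling idea can be illustrated with a very small buffer that absorbs bursts and releases a bounded number of storage requests per time step. This is a sketch under stated assumptions: the `StreamBuffer` class, the in-memory deque (standing in for the file-system store), and the fixed `max_per_tick` cap (standing in for whatever the traffic predictor and load monitor would compute) are all hypothetical.

```python
from collections import deque


class StreamBuffer:
    """Queues storage requests and releases at most `max_per_tick` of them
    per tick, so a burst never exceeds the database service rate."""

    def __init__(self, max_per_tick):
        self.max_per_tick = max_per_tick
        self.pending = deque()

    def submit(self, event):
        """Accept a storage request from the analysis graph."""
        self.pending.append(event)

    def drain(self):
        """Return the events to write to the database during this tick."""
        n = min(self.max_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]


buf = StreamBuffer(max_per_tick=3)
for i in range(7):  # a burst of 7 storage requests arrives at once
    buf.submit(f"event{i}")

# The burst is spread over three ticks in arrival order.
batches = [buf.drain() for _ in range(3)]
```

An adaptive version would adjust the cap from observed database load rather than fixing it in advance.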

Century Provenance Service

[Figure: Century server infrastructure diagram, locating the Provenance Service (Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver) among the other server components.]

An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program:

    Urgent Alert
    Patient: Doe, John
    Condition: Abnormal Reaction
    Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1): Two Approaches to Provenance

What is process provenance?
• The ability to retrace which "system components" were on the data path
[Figure: PEs running algorithms f1, f2, f3 report to a Provenance Server/Store. PE1: "I ran algorithm f1 with parameters u, v, and w at time t0. I published on port 1." PE2: "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."]

What is data provenance?
• The ability to retrace which "data elements" led to generation of output data
[Figure: sensors supply Data1 and Data2 to PEj, which emits DataOut4. PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]

Provenance (2): Prior Work on Provenance

[Chart: prior systems placed by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates.]

– EU Provenance (PASOA, PreServ): captures workflow bindings
– Scientific SOA Workflows (KARMA): workflow and application binding notifications; stateless, application-defined data provenance
– Database View Inversion (Trio, Widom): specific transforms and extensions to SQL; data dependencies specified statically for relations
– Stream Provenance (Simhan): stores the temporal history of stream connectors as a per-stream stack; system-level automatic metadata collection at stream level
– File Systems (PASS, LFS): captures system calls and modifications to file records; annotation per file
– Medical Stream Provenance (Century): arbitrary logic for individual PEs (statefulness and long memory); dynamic stream bindings; high processing throughput

Provenance Challenges Unique to Streams

Storage Efficiency: the high volume of data implies that "per-record" annotation is very heavyweight

Processing Efficiency: generation of timestamps and large amounts of metadata per stream event will reduce system throughput

Statefulness: transformations on medical data have long temporal state; a single output depends on a large window of stream data

Dynamic Stream Bindings: streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
– Is reactive, generating provenance events only when things change
– Does not require annotating large amounts of metadata per sensor sample
– Can accommodate variations in the statefulness model

We need to support both
– Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
– Data provenance: "Show me stream segments related to the generation of alert Ei"

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
– The dependency is thus not specified on every data element, but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
– The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

\[
out_i(t) \;\Leftarrow\; \bigcup_{j:\ s_j \text{ is input to Component } i} \{\, inp_j(k) : k \in (t - \Delta_j,\ t) \,\}
\]

[Figure: a Processing Element with input stream Sj (Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)) and input stream Sk (Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) producing output stream Si (Si(t), Si(t-1), Si(t-2)).]

The dependency equation can, however, vary based on
– The PE's internal state or some external context

    IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
    ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)

– Solution is to allow multiple functions, each associated with a distinct range of states
– Store the state as metadata with the data

TVC dependency rule blocks (state defined by location context):

Rule Block ID   PE State   Dependency Rule
1               "gym"      Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2               "home"     Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
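
The state-dependent rule blocks on this slide can be encoded as a small lookup table and expanded into concrete timestamps on demand. The sketch below is an illustration only: the dictionary layout, the function name, and the reading of each window as a closed interval over integer timestamps are assumptions, not the representation used by Century.

```python
# Each rule block maps a PE state to time windows per input stream.
# A window (a, b) stands for the samples with timestamps in [t - a, t - b].
RULE_BLOCKS = {
    # "gym":  Ei(t) <= Sj(t) U Sj(t-3, t-2) U Sk(t-2, t-1)
    "gym": {"Sj": [(0, 0), (3, 2)], "Sk": [(2, 1)]},
    # "home": Ei(t) <= Sj(t-2, t) U Sk(t-1, t)
    "home": {"Sj": [(2, 0)], "Sk": [(1, 0)]},
}


def causative_timestamps(state, t):
    """Expand the rule block for `state` into concrete timestamps at time t."""
    result = {}
    for stream, windows in RULE_BLOCKS[state].items():
        stamps = set()
        for start_offset, end_offset in windows:
            stamps.update(range(t - start_offset, t - end_offset + 1))
        result[stream] = sorted(stamps)
    return result


gym = causative_timestamps("gym", t=10)
home = causative_timestamps("home", t=10)
```

Because the rule is stored once per state rather than once per output element, only the state label has to travel with each element as metadata.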

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to the corresponding query
– For efficiency, the relevant metadata is distributed across multiple DBs
– Tradeoff: faster provenance storage processing for higher provenance reconstruction overhead

Resolution outline
– Determine the IDs of input ports and causative stream bindings
– Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
– Apply the TVC filter to obtain the causative set of elements

Storage steps
1. Store bindings between every input/output port and associated streams
2. Store creation/modification times of PEs
3. Store TVC function f() for every (output port, input port)

Query steps, for Query(Ei)
1. Obtain I/O stream mappings
2. Retrieve dependency equation
3. Retrieve elements
4. Apply equation to determine the set of causative elements
Return the set of elements

[Figure: Provenance Query Server and Stream and Port Mapping (SAPM) component over three stores, with example contents:
PC Dependency Table: PEj, f1(), Po1
Dynamic Stream Mapping Table: t=10: S1, Pi1; t=15: S15, I4; t=16: S17, O6
Data Store: element45: ts=15, SID=12, state="gym"; element66: ts=16, SID=17, state="home"]

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," Healthnet, June 2007.
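
The four query steps (obtain stream mappings, retrieve the dependency equation, retrieve elements, apply the equation) can be walked through with in-memory stand-ins for the three stores. Everything below is a hypothetical sketch: the table layouts, the sample values, and the `binding_at` and `resolve` helpers are invented for illustration, and integer timestamps with closed windows are assumed.

```python
# Hypothetical stand-ins for the three stores named on the slide.
# PC dependency table: (PE, output port) -> state -> input port -> window,
# where a window (a, b) covers timestamps in [ts - a, ts - b].
PC_DEPENDENCY = {
    ("PEj", "Po1"): {"gym": {"Pi1": (3, 0)}, "home": {"Pi1": (1, 0)}},
}
# Dynamic stream mapping: (time the binding took effect, port, stream ID).
STREAM_MAPPING = [(0, "Pi1", "S1"), (12, "Pi1", "S15")]
# Data store: (stream ID, timestamp, value).
DATA_STORE = [
    ("S1", 9, 71), ("S1", 11, 73),
    ("S15", 13, 140), ("S15", 14, 150), ("S15", 15, 155),
]


def binding_at(port, ts):
    """Stream bound to `port` at time `ts` (latest binding not after ts)."""
    bound = [(t0, sid) for t0, p, sid in STREAM_MAPPING if p == port and t0 <= ts]
    return max(bound)[1]


def resolve(pe, out_port, ts, state):
    """Return the stored input elements that the output at `ts` depends on."""
    causative = []
    rule = PC_DEPENDENCY[(pe, out_port)][state]          # step 2: equation
    for in_port, (start, end) in rule.items():
        for k in range(ts - start, ts - end + 1):        # step 4: apply window
            sid = binding_at(in_port, k)                 # step 1: mapping
            causative += [e for e in DATA_STORE          # step 3: retrieve
                          if e[0] == sid and e[1] == k]
    return causative


late = resolve("PEj", "Po1", ts=15, state="gym")
early = resolve("PEj", "Po1", ts=12, state="gym")
```

The second query shows why bindings are recorded with timestamps: the window for `ts=12` reaches back before the port was rebound at time 12, so the causative elements come from stream S1 rather than S15.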



Century Event Management Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2007 IBM Corporation

Event Management Service (1)

Annotator

time 1 2 3

XML

Annotator

time 1 2 3

Not extensibleLimited scalability

Fully extensibleBut does not scale to

high event rate

The ProblemPersistence of large amount of streaming dataEfficient query mechanisms for stakeholders

Requirements- Event rate scalability- Resiliency- Extensibility- Expressiveness of the query language

XML

XML

XML

Potential Approaches

1 Relational DB (Annotation table)

2 Relational-XML DB (XML-based Annotation data model)

IBM Research

copy 2007 IBM Corporation

Event Management Service (3) Hybrid Storage Model

Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate

time

arrival rateaverage arrival rate

traffic average databaseservice rate

Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ

UnstabilityRegion

Annotator

time 1 2 3

3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Stream Buffer How do we designthe stream buffer

XMLXML

XML

XML

D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007

IBM Research

copy 2007 IBM Corporation

Event Management Service (4) The Stream Buffer

System S STG(File System Store)

Traffic Monitor

Storage Requests(events of type ldquoeventStorerdquo)

DatabasePE

Traffic Models Traffic

Predictor Event Store Load Monitor

Storage Requests

PE1

PE2

PE3

PE4

PE5SPE

Analysis Graph

Towards the delivery system

Approach- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms - Avoid driving the database in the unstability region- Throttle the traffic arriving at the database

EventStore

(Relational-XML

IBM Research

copy 2007 IBM Corporation

Century Provenance Service

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM Corporation26

An Example of Data Provenance Use

Urgent AlertPatient Doe John

Condition Abnormal ReactionRecommendation

The Century System has found a developing issue in your patientrsquos

medical condition The condition of the patient is deteriorating A known

side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day

Ultimately the physician is responsible for medical decisions and any actions taken In order

for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition

With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe

A few days later Dr Lee receives an alert from the Century prescription program

The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)

Thatrsquos unusual and a large decrease Before agreeing to

this change I need to understand on what basis the

system has made this recommendation and how

accurate it is

IBM Research

copy 2006 IBM Corporation

Provenance (1): Two Approaches to Provenance

What is process provenance?
- The ability to retrace which "system components" were on the data path
- Each PE reports its actions to a Provenance Server/Store, for example:
  - PE1: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1."
  - PE2: "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
- The ability to retrace which "data elements" led to generation of output data
- The PE annotates its output, for example:
  - PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

[Figure: left, PE1 and PE2 running algorithms f1, f2, f3 and reporting to the Provenance Server/Store; right, sensors feeding Data1 and Data2 into PEj, which emits DataOut4.]


Provenance (2): Prior Work on Provenance

[Figure: prior systems plotted by "Event and Processing Rates" (horizontal axis) against "Type of Provenance Reconstruction" (vertical axis: process provenance, data provenance).]

- EU Provenance (PASOA, PreServ)
  - Captures workflow bindings
- Scientific SOA Workflows (KARMA)
  - Workflow and application binding notifications
  - Stateless application-defined data provenance
- Database View Inversion (Trio, Widom)
  - Specific transforms and extensions to SQL
  - Data dependencies specified statically for relations
- File Systems (PASS, LFS)
  - Captures system calls and modifications to file records
  - Annotation per file
- Stream Provenance (Simhan)
  - Store temporal history of stream connectors as a per-stream stack
  - System-level automatic metadata collection at stream level
- Medical Stream Provenance (Century)
  - Arbitrary logic for individual PEs (statefulness and long memory)
  - Dynamic stream bindings
  - High processing throughput


Provenance Challenges Unique to Streams

- Storage Efficiency: The high volume of data implies that "per-record" annotation is very heavyweight
- Processing Efficiency: Generating timestamps and large amounts of metadata per stream event will reduce system throughput
- Statefulness: Transformations on medical data have long temporal state; a single output depends on a large window of stream data
- Dynamic Stream Bindings: Streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model

We need to support both
- Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
- Data provenance: "Show me stream segments related to the generation of alert Ei"
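The "reactive" requirement above can be illustrated with a small sketch: a log that writes a provenance record only when a port's stream binding changes, instead of once per sensor sample. The class name `BindingLog`, its method `observe`, and the record layout are hypothetical and chosen only for illustration; the slides do not describe Century's actual interfaces.

```python
class BindingLog:
    """Hypothetical reactive log: one record per binding change, not per sample."""

    def __init__(self):
        self.records = []    # (time, port, stream id), appended only on change
        self._current = {}   # port -> stream id currently bound to it

    def observe(self, time, port, stream_id):
        """Record a provenance event only if the port's binding changed."""
        if self._current.get(port) != stream_id:
            self._current[port] = stream_id
            self.records.append((time, port, stream_id))


log = BindingLog()
log.observe(10, "Pi1", "S1")    # first binding: recorded
log.observe(11, "Pi1", "S1")    # unchanged: nothing written
log.observe(15, "Pi1", "S15")   # rebinding: recorded
```

With this design the provenance cost scales with the number of binding changes rather than with the event rate, which is the property the slide asks for.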


TVC Model of Hybrid Provenance

- Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
  - The dependency is thus not specified on every data element, but remains constant for the same processing logic
- Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
  - The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ s_i^{out}(t) \;\Leftarrow\; \bigcup_{j\,:\,s_j \text{ is input to Component } i} \left\{\, s_j^{inp}(t_k) \;:\; t_k \in [\,t - \Delta_j,\; t\,] \,\right\} $$

[Figure: a Processing Element with input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), and output stream Si (samples Si(t), Si(t-1), Si(t-2)).]

- The dependency equation can, however, vary based on the PE's internal state or some external context:

  IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
  ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)

  - The solution is to allow multiple functions, each associated with a distinct range of states
  - Store the state as metadata with the data

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID | PE State | Dependency Rule
1             | "gym"    | Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2             | "home"   | Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
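As a rough illustration of how state-dependent rule blocks could be evaluated, the sketch below encodes the "gym" and "home" dependency rules from this slide and expands them into the concrete timestamps an output Ei(t) depends on. It assumes integer timestamps and inclusive windows; the `Window` type, the `RULE_BLOCKS` table, and the function name are hypothetical, not part of the Century system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    stream: str  # input stream id, e.g. "Sj"
    start: int   # how far back from t the window opens (inclusive)
    end: int     # how far back from t the window closes (inclusive)


# Rule blocks as read from the slide, keyed by PE state (location context).
RULE_BLOCKS = {
    "gym": [Window("Sj", 0, 0), Window("Sj", 3, 2), Window("Sk", 2, 1)],
    "home": [Window("Sj", 2, 0), Window("Sk", 1, 0)],
}


def causative_timestamps(state, t):
    """Return {stream: sorted timestamps} that output Ei(t) depends on."""
    deps = {}
    for w in RULE_BLOCKS[state]:
        deps.setdefault(w.stream, set()).update(range(t - w.start, t - w.end + 1))
    return {stream: sorted(ts) for stream, ts in deps.items()}


print(causative_timestamps("gym", 10))   # {'Sj': [7, 8, 10], 'Sk': [8, 9]}
print(causative_timestamps("home", 10))  # {'Sj': [8, 9, 10], 'Sk': [9, 10]}
```

Only the state has to be stored with each output element; the windows themselves are stored once per rule block, which is where the storage saving over per-record annotation comes from.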


Resolving TVC-based Provenance Queries

- Reconstruct provenance only in response to the corresponding query
  - For efficiency, the relevant metadata is distributed across multiple DBs
  - Tradeoff: faster provenance storage processing in exchange for higher provenance reconstruction overhead
- Resolution proceeds in three stages
  - Determine the IDs of the input ports and the causative stream bindings
  - Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
  - Apply the TVC filter to obtain the causative set of elements

Stream and Port Mapping (SAPM)
1. Store bindings between every input/output port and associated streams
2. Store creation/modification times of PEs
3. Store TVC function f() for every output port/input port

Provenance Query Server, on Query(Ei)
1. Obtain I/O stream mappings
2. Retrieve dependency equation
3. Retrieve elements
4. Apply equation to determine set of causative elements
Then return the set of elements.

Example contents of the stores
- PC Dependency Table: PEj | f1() | Po1
- Dynamic Stream Mapping Table: t=10 | S1 | Pi1; t=15 | S15 | I4; t=16 | S17 | O6
- Data Store: element45, ts=15, SID=12, state="gym"; element66, ts=16, SID=17, state="home"

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
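The four query steps can be sketched with in-memory stand-ins for the three stores. Everything here is an assumption made for illustration: the table layouts, the element ids, the per-state window lengths, and the function names are hypothetical, and a real deployment would query separate databases rather than Python dictionaries.

```python
from bisect import bisect_right

# Dependency table: (PE, output port) -> state -> {input port: window length}
DEPENDENCY = {("PEj", "Po1"): {"gym": {"Pi1": 1}, "home": {"Pi1": 2}}}

# Dynamic stream mapping: port -> time-ordered list of (binding time, stream id)
STREAM_MAP = {"Pi1": [(10, "S1"), (15, "S15")]}

# Data store: (element id, timestamp, stream id)
DATA_STORE = [
    ("e1", 12, "S1"), ("e2", 14, "S1"),
    ("e3", 15, "S15"), ("e4", 16, "S15"), ("e5", 16, "S17"),
]


def stream_bound_at(port, ts):
    """Step 1: which stream was bound to this port at time ts?"""
    bindings = STREAM_MAP[port]
    idx = bisect_right([time for time, _ in bindings], ts) - 1
    return bindings[idx][1] if idx >= 0 else None


def resolve(pe, out_port, state, t):
    """Return ids of the elements that output Ei(t) of (pe, out_port) depends on."""
    windows = DEPENDENCY[(pe, out_port)][state]          # step 2: dependency equation
    causative = []
    for port, delta in windows.items():
        for elem_id, ts, sid in DATA_STORE:              # step 3: retrieve elements
            in_window = t - delta <= ts <= t             # step 4: apply the equation
            if in_window and sid == stream_bound_at(port, ts):
                causative.append(elem_id)
    return causative


print(resolve("PEj", "Po1", "home", 16))  # ['e2', 'e3', 'e4']
print(resolve("PEj", "Po1", "gym", 16))   # ['e3', 'e4']
```

Note that the "home" query spans a rebinding at t=15: element e2 is attributed to stream S1 and e3, e4 to S15, which is why the binding history has to be time-indexed.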


Key Research Benefits of TVC Model for Provenance

The TVC Model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
- Storage efficiency: only minimal metadata per data element, plus one functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in EU Provenance/Karma systems)
- Statefulness: time and value functions allow compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

Other provenance systems cannot handle O(100 Kevents/sec) system throughput


Quality of Events in Century

[Figure: Century server infrastructure diagram. Labeled components include the USN Gateway, Toolkit, Patient, Administrator and Stakeholder Portals, Portal Service, Analysis Framework, QoE Manager, State Manager, Event Management Service, Provenance Service, Platform Service, Storages, and Solution Delivery Services.]


Quality of Events in Data Streams

- Prior work has focused on defining streams in terms of their output data type
  - Enables a pub-sub model of stream binding
- However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
  - Sensor (source) impairments: sensor faults (transient or persistent)
  - Transmission impairments: loss of stream elements, or delay/order reversal of streams
- Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
  - Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
  - Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform or the set of features used) may be varied depending on stream quality
  - Dynamic rebinding of stream graph: rebind to HR readings from the (more expensive) ECG sensor if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE consumes timestamped elements from several input streams and raises an ALERT at time t. Annotations: "What generated this alert and can it be trusted?"; "The sensor is faulty and generates stream elements of low quality" (action: replace sensor); "Stream elements are missing (possibly due to network issues)".]


Century QoE: Some Basic QoE Measures

1. Expected versus received input
   - Expected input: t-5, t-4, t-3, t-2, t-1, t. Current input: t-5, t-2, t.
   - By correlating what was expected with what was received, we can estimate the quality of our input. In this particular case, we can say that the output of the PE has low quality, since half of the expected input is missing.

2. Timestamp correlation across inputs
   - First input: t-12, t-11, t-10. Second input: t-5, t-2, t.
   - Another possibility is to correlate the timestamps of the inputs to estimate the quality. In this particular case, we can say that the output of the PE has low quality, since the two inputs are out of sync.

3. Consumed versus useful input
   - Current input: t-5, t-4, t-3, t-2, t-1, t. The "useful" input: t.
   - Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). Here we can say that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
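The three measures can be written down as simple ratios over timestamp sets. This is a minimal sketch under the assumption of integer timestamps; the function names and the exact formulas (set intersection ratios and the gap between the newest timestamps) are illustrative choices, not the definitions used inside Century.

```python
def completeness(expected, received):
    """Fraction of the expected elements that actually arrived."""
    expected = set(expected)
    return len(expected & set(received)) / len(expected) if expected else 1.0


def max_skew(*input_timestamps):
    """Gap between the newest timestamps of the inputs (0 means in sync)."""
    latest = [max(ts) for ts in input_timestamps]
    return max(latest) - min(latest)


def useful_fraction(consumed, useful):
    """Fraction of consumed elements that actually triggered the output."""
    consumed = set(consumed)
    return len(consumed & set(useful)) / len(consumed) if consumed else 0.0


t = 100
expected = [t - 5, t - 4, t - 3, t - 2, t - 1, t]

print(completeness(expected, [t - 5, t - 2, t]))                  # 0.5
print(max_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))      # 10
print(useful_fraction(expected, [t]))                             # 0.16666666666666666
```

The first call reproduces the slide's "half of the expected input is missing" case; thresholds for what counts as low quality would have to be chosen per application.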


Open Challenges in Health Stream Collection and Diagnostics

- Model-driven acquisition of data from sensor feeds
  - Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
  - HR may be obtained at different (accuracy, acquisition cost) trade-offs from ECG, SpO2, and BP monitors
- Sensors have different time-varying QoE
  - QoE determination should be bidirectional (feedback from medical professionals on the quality of analysis results should affect the QoE of upstream sensors)
- Provenance architectures must address the prohibitive cost of storing all stream data
  - Much of the intermediate analysis can be re-created if the state of analysis components can be stored
- Better techniques for establishing provenance relationships
  - User-specified models may not always be available: learn provenance from observed behavior
  - Explicit provenance may not be an accurate reflection of analysis logic
- Provenance data itself is sensitive and subject to privacy requirements
  - The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions
- How to compute the QoE of medical streams, and how to refine it based on the QoE of other streams?
- How to adaptively alter the acquisition pattern based on (cost, accuracy) needs?
- How to develop a combination of hybrid model-based and replay-based provenance?
- How to define context-dependent access-control models of provenance data in a multi-provider environment?


Conclusions

- Successful transition to large-scale remote monitoring requires significant innovations at both server and relay
- The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
  - Key features include adaptive event filtering, personalization, and predictive data transmission
- The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
  - Key features include high-volume stream support, provenance, and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions


Event Management Service (1)

The Problem
- Persistence of large amounts of streaming data
- Efficient query mechanisms for stakeholders

Requirements
- Event rate scalability
- Resiliency
- Extensibility
- Expressiveness of the query language

Potential Approaches
1. Relational DB (annotation table): not extensible, limited scalability
2. Relational-XML DB (XML-based annotation data model): fully extensible, but does not scale to high event rates

[Figure: for each approach, an Annotator writing events at times 1, 2, 3 into the store; in the second approach each event is stored as an XML document.]


Event Management Service (3): Hybrid Storage Model

Potential Approaches (continued)
3. Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Key assumptions
- Traffic is bursty, modeled by a Poisson process
- Average arrival rate < average database service rate

Let μ be the service rate of the database.
Let λ be the arrival rate at the database.
Let ρ = λ/μ.

[Figure: arrival rate plotted against time, with horizontal lines for the average arrival rate and the average database service rate; the intervals where traffic exceeds the service rate are marked as the instability region. An Annotator writes events at times 1, 2, 3 through the Stream Buffer into XML storage.]

Stream Buffer: how do we design the stream buffer?

D. Sow, L. Lim, M. Wang, A. Kim, "Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS," DEBS, June 2007.
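A small worked example may help make the role of ρ concrete. The sketch below computes the utilization ρ = λ/μ, checks the stability condition ρ < 1, and estimates how many events a buffer must absorb during a burst. The backlog estimate is a simple fluid approximation introduced here for illustration, not a result from the slides, and the rates in the example are made-up numbers.

```python
def utilization(arrival_rate, service_rate):
    """rho = lambda / mu."""
    return arrival_rate / service_rate


def is_stable(arrival_rate, service_rate):
    """The database keeps up on average only if rho < 1."""
    return utilization(arrival_rate, service_rate) < 1.0


def backlog_after_burst(burst_rate, service_rate, duration):
    """Events left waiting after a burst (fluid approximation, an assumption)."""
    return max(0.0, (burst_rate - service_rate) * duration)


# Hypothetical rates in events per second.
print(utilization(800, 1000))             # 0.8
print(is_stable(800, 1000))               # True
print(backlog_after_burst(1500, 1000, 2)) # 1000.0
```

Even with ρ = 0.8 on average, a two-second burst at 1.5 times the service rate leaves 1000 events queued, which is the gap the stream buffer is meant to cover.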


Event Management Service (4): The Stream Buffer

Approach
- Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
- Avoid driving the database into the instability region
- Throttle the traffic arriving at the database

[Figure: an analysis graph of PEs (PE1 to PE5, SPE) emits storage requests (events of type "eventStore") and also feeds the delivery system. The storage requests pass a Traffic Monitor, which uses Traffic Models, a Traffic Predictor, and an Event Store Load Monitor, before the Database PE writes them to the Event Store (Relational-XML); the System S STG (file system store) serves as the buffer.]
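One way to picture the throttling step is a buffer that queues storage requests and releases them to the database at a bounded rate. The sketch below is only an illustration under that assumption: the class name `StreamBuffer`, the fixed per-tick limit, and the in-memory queue are hypothetical, whereas the slide describes a file-system-backed store driven by traffic prediction and load monitoring.

```python
from collections import deque


class StreamBuffer:
    """Queues storage requests and releases at most max_per_tick per tick."""

    def __init__(self, max_per_tick):
        self.max_per_tick = max_per_tick  # assumed to sit below the DB service rate
        self.pending = deque()

    def submit(self, event):
        """Accept a storage request from the analysis graph."""
        self.pending.append(event)

    def drain(self):
        """Return the events to forward to the database in this tick."""
        n = min(self.max_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]


buf = StreamBuffer(max_per_tick=2)
for event in range(5):        # a burst of five storage requests
    buf.submit(event)

print(buf.drain())  # [0, 1]
print(buf.drain())  # [2, 3]
print(buf.drain())  # [4]
```

A predictive version would adjust `max_per_tick` from the load monitor's readings instead of keeping it fixed.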


Century Provenance Service

[Figure: Century server infrastructure diagram. Labeled components include the USN Gateway, Toolkit, Patient, Administrator and Stakeholder Portals, Portal Service, Analysis Framework, Event Management Service, Platform Service, Storages, Solution Delivery Services, and the Provenance Service with its Provenance Storage Manager, Provenance Query Service, Provenance Query Manager, Provenance Access Control, and Resolver.]

IBM Research

copy 2006 IBM Corporation26

An Example of Data Provenance Use

Urgent AlertPatient Doe John

Condition Abnormal ReactionRecommendation

The Century System has found a developing issue in your patientrsquos

medical condition The condition of the patient is deteriorating A known

side effect has emerged Century recommends that you decrease the patientrsquos dosage of the prescribed medication to 10 mg twice a day

Ultimately the physician is responsible for medical decisions and any actions taken In order

for medical professionals to accept the upcoming technology we must provide them with the information they need to make these decisions responsibly

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition

With this drug Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe

A few days later Dr Lee receives an alert from the Century prescription program

The information technology that will provide the foundation for this understanding is provenance and quality of event(QoE)

Thatrsquos unusual and a large decrease Before agreeing to

this change I need to understand on what basis the

system has made this recommendation and how

accurate it is

IBM Research

copy 2006 IBM Corporation

Provenance (1) Two Approaches to Provenance

f1

f2

f3

Provenance ServerStore

I ran algorithm f1 with parameters uv and w at time t0 I published on port 1

PE1

PE2

I recvd some data on port 2 I ran algorithm f3 on the data at time t1

What is process provenancebull The ability to retrace

which ldquosystem componentsrdquowere on the data path

What is data provenancebull The ability to retrace

which ldquodata elementsrdquo ledto generation of output data

PEj

Sensor

Sensor

Data2

DataOut4

Data1 Data2

I put lsquometadatarsquo inDataout4 saying itwas dependent onData1 and Data2

Data1

Data1

IBM Research

copy 2006 IBM Corporation

Provenance (2) Prior Work on Provenance

EU Provenance (PASOA PreServ))

Captures Workflow bindings

Scientific SOA Workflows (KARMA)Workflow and application binding notifications

Stateless application-defined data provenance

Database View Inversion (Trio Widom)Specific Transforms and extensionsto SQL

Data dependencies specified statically for relations

Event and Processing Rates

Stream Provenance (Simhan)

Store temporal history of stream connectors as a per-stream stack

System-level automatic metadata collection at stream-levelTy

pe o

f Pr

oven

ance

Rec

onst

ruct

ion

Proc

ess

prov

enan

ceD

ata

prov

enan

ce

File Systems PASS LFSCaptures system calls and

modifications to file recordsAnnotation per file

Medical Stream Provenance Arbitrary logic for invidual PEs (

statefulness and long memory) Dynamic stream bindings

High proessingthroughput

Century

IBM Research

copy 2006 IBM Corporation

Provenance Challenges Unique to Streams

Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation

is very heavyweight

Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput

Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data

Dynamic Stream Bindings Streams may be ephemeral as a consequence the

transformation graph may vary with time

We need a provenance system that

Is reactivemdashgenerating provenance events only when things change

Does not require annotating large amounts of metadata per sensor sample

Can accommodate variations in the statefulness model

We need to support both

Process provenance Show me the set of components that are involved in the generation of alert Ei

Data provenance Show me stream segments related to the generation of alert Ei

IBM Research

copy 2006 IBM CorporationIBM Research

TVC Model of Hybrid Provenance

Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data

element but remains constant for the same processing logic

Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input

streams) within a designated interval Δ

( ) ( ) UComponent

tttouti

jjjki inps

input to is

ΔminusisinlArr

Processing Element

Si(t) Si(t-1) Si(t-2)

Output stream Si

Sj(t)Sj(t-1)

Sj(t-2)Sj(t-3)

Sk(t)Sk(t-1) Sk(t-2)

Sk(t-3)

Input stream Sj

Input stream Sk

Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2

Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)

ldquogymrdquo1

Dependency RulePE State

Rule Block ID

TVC Dependency Ruleblocks

State defined by Location Context

The dependency equation can however vary based on

ndash The PErsquos internal state or some external context

IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)

ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)

ndash Solution is to allow multiple functions each associated with a distinct range of states

ndash Store the state as metadata with the data

IBM Research

copy 2006 IBM CorporationIBM Research

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is

distributed across multiple DBsndash Tradeoff faster provenance storage

processing for higher provenance reconstruction overhead

Determine ID of input ports and causative stream bindings

Retrieve appropriate data elements within specific timesequence window specified by TVC Equation

Apply TVC filter to obtain causative set of elements

PC Dependency Table

PEj f1() Po1

Dynamic Stream Mapping Table

t=10 S1 Pi1t=15 S15 I4t=16 S17 O6

Data Store)

element45 ts=15 SID= 12 state=ldquogymrdquo

element66 ts=16 SID= 17 state=lsquohomersquo

Stream and Port Mapping (SAPM)

Provenance Query Server

1 Store bindings between every input output port and associated streams

2 Store creation modification times of PEs

3 Store TVC function f() for every output port input port

Query (Ei)

1 Obtain IO stream mappings

2 Retrieve Dependency equation

4 Apply equation to determine set of causativeelements

Return set of elements

3 Retrieve elements

M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007

IBM Research

copy 2006 IBM CorporationIBM Research

Key Research Benefits of TVC Model for Provenance

1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases

TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata

per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)

ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)

ndash Statefulness- time and value functions allow compact representation of long dependency windows

ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)

Other provenance systems cannot handle O(100 Keventssec) system throughput

IBM Research

copy 2006 IBM CorporationIBM Research

Quality of Events in Century

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM CorporationIBM Research

PE

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding

However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or

persistent)ndash Transmission impairments Loss of stream elements or

delayorder reversal of streams

Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or

adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis

component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality

ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality

t-3 t-5t-4t+1

t t-5t-1t+1

What generated this alert and

can we trusted

ALERT

t

t-2t-1t

t-3 t-4t-2

t-16t-14t-13tt+1

The sensor is faulty and generates

stream elements of low quality

Stream elements are missing

(possibly due to network issues)

Replace

sensor

IBM Research

copy 2006 IBM CorporationIBM Research

Century QoE Some Basic QoE Measures

t-3 t-5t-4t-2t-1t

PE

t-5t-2t The current

input

The input that

was expectedBy correlating what was expected with what was received we can estimate the quality of our input For example in this particular case can say that the output of the PE has low quality since half of the expected input is missing

t

PE

t-12t-11t-10

t-5t-2t

t

Another possibility is to correlate the timestamps of the inputs to estimate the quality For example in this particular case can say that the output of the PE has low quality since the two inputs are out of sync

t-3 t-5t-4t-2t-1t

PE

The current

inputFinally we can correlate theconsumed input with the useful input(that which is actually used to trigger an output) For example we can say here that the output of the PE has low quality since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements

t

The ldquousefulrdquo input

IBM Research

copy 2006 IBM CorporationIBM Research

Open Challenges in Health Stream Collection and DiagnosticsModel driven acquisition of data from sensor feedsndash Many environments (eg retirement homes) exhibit

redundancy in sensor type and coveragendash HR may be obtained at different (accuracyacquisition

cost) from ECG SpO2 BP monitors

Sensors have different time-varying QoE ndash QoE determination should be bidirectional (feedback

from medical professionals on quality of analysis results should affect QoE of upstream sensors)

How to compute QoE of medical streams and how

to refine them based on QoE of other streams

How to adaptively alter acquisition pattern based

on (cost accuracy needs)

How to develop a combination of hybrid model-replay

based provenance

How to define context-dependent access-control

models of provenance data in a multi-provider

environment

Provenance architectures must address prohibitive cost of storing all stream datandash Much of the intermediate analysis can be re-created if

the state of analysis components can be stored

Better techniques for establishing provenance relationshipsndash User-specified models may not always be available

learn provennace from observed behaviorndash Explicit provenance may not be accurate reflection of

analysis logic

Provenance data itself is sensitive and subject to privacy requirementsndash The right to access provenance may depend on the

currnt lsquorolersquo of the provider and the userrsquos medical history

IBM Research

copy 2006 IBM Corporation

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relaymdashan extended part of the analytic infrastructurendash Key features include adaptive event filtering personalization and

predictive data transmission

Server must not be simply a repository of sensor streams but must provide flexible and scalable analyticsndash Key features include high-volume stream support provenance and

eventinformation quality

IBM Research

copy 2006 IBM Corporation

  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

IBM Research

copy 2007 IBM Corporation

Event Management Service (3) Hybrid Storage Model

Key assumptions-Traffic is bursty modeled by a Poisson process- average arrival rate lt average database service rate

time

arrival rateaverage arrival rate

traffic average databaseservice rate

Let μ be the service rate of the databaseLet λ be the arrival rate at the databaseLet ρ = λμ

UnstabilityRegion

Annotator

time 1 2 3

3 Hybrid Storage Model (Stream Buffer + Relational-XML DB)

Stream Buffer How do we designthe stream buffer

XMLXML

XML

XML

D Sow L Lim M Wang A Kim Persisting and Querying Biometric Event Streams with Hybrid Relational-XML DBMS DEBS June 2007

IBM Research

copy 2007 IBM Corporation

Event Management Service (4) The Stream Buffer

[Figure: the stream buffer architecture. Labeled components: an analysis graph of PEs (PE1 to PE5) in the SPE, storage requests (events of type "eventStore"), Database PE, Traffic Monitor, Traffic Models, Traffic Predictor, Event Store Load Monitor, System S STG (File System Store), Event Store (Relational-XML), and an arrow towards the delivery system.]

Approach:
  • Keep the Relational-XML database in the design for extensibility and efficient query mechanisms
  • Avoid driving the database into the instability region
  • Throttle the traffic arriving at the database
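The slide does not say how the throttling is done. One simple way to sketch it, under that caveat, is a rate-limited queue placed in front of the database: storage requests are buffered, and each drain call releases no more than the configured rate allows. The class name `StreamBuffer`, its methods, and the rate parameter are hypothetical; the traffic models and predictor from the figure are not modeled here.

```python
from collections import deque


class StreamBuffer:
    """Buffers storage requests and releases them at a bounded rate."""

    def __init__(self, max_rate_per_sec: float):
        self.max_rate = max_rate_per_sec  # kept below the DB service rate
        self._queue = deque()
        self._credit = 0.0  # number of events we may currently release

    def offer(self, event) -> None:
        """Accept a storage request from the analysis graph."""
        self._queue.append(event)

    def drain(self, elapsed_sec: float) -> list:
        """Release the events allowed for the elapsed time, in FIFO order."""
        self._credit += self.max_rate * elapsed_sec
        released = []
        while self._queue and self._credit >= 1.0:
            released.append(self._queue.popleft())
            self._credit -= 1.0
        if not self._queue:
            # Do not bank unused credit while idle, so a later burst
            # still reaches the database at no more than max_rate.
            self._credit = 0.0
        return released

    def __len__(self) -> int:
        return len(self._queue)
```

A burst of five requests offered to a buffer with `max_rate_per_sec=2.0` reaches the database over three one-second drain calls instead of all at once.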


Century Provenance Service

[Figure: the Century server infrastructure. Labeled components include the USN Gateway (Data Transformer, Sensor Data Sender, Sensor Data Reception), the Toolkit (IDE for Application Developer, Sensor Simulator, Event Management Client Library), the Patient, Administrator, and Stakeholder Portals (stakeholders: Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC), and Solution Delivery Services.
Services and managers: Portal Service, Platform Service, Subscription Service, Group Management Service, Event Management Service, Provenance Service, Event Store Query Service, Provenance Query Service, Analysis Framework, Analysis Applications, Event Filter, Annotator, Event Preprocessor, Event Storage Manager, Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver, QoE Manager, State Manager, Group Manager, Privacy Manager, Patient Data Manager, Device Data Manager, Service Data Management, User Group Data Manager, Job/Application Manager, Remote Access Manager, Authentication & Authorization, Device Catalogue, IHE Adapter, Interoperability Container.
Storages: Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy.]


An Example of Data Provenance Use

Dr. Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr. Lee has also prescribed a program to monitor the effect of the drug on Mr. Doe.

A few days later, Dr. Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

Dr. Lee: "That's unusual, and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of events (QoE).


Provenance (1) Two Approaches to Provenance

What is process provenance?
  • The ability to retrace which "system components" were on the data path.
  • Example reports from PEs (PE1, PE2) to the Provenance Server/Store:
    – "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1."
    – "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?
  • The ability to retrace which "data elements" led to generation of output data.
  • Example annotation by PEj: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

[Figure: left, PEs running algorithms f1, f2, f3 and reporting to a Provenance Server/Store; right, sensors feeding Data1 and Data2 into PEj, which emits DataOut4 annotated with Data1 and Data2.]
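To make the contrast between the two approaches concrete, here is a minimal Python sketch. Process provenance is modeled as reports appended to a central store, and data provenance as a dependency annotation carried inside each output element. All names (`ProcessRecord`, `DataElement`, `report`, `combine`, the in-memory `provenance_store`) are hypothetical, and averaging the inputs is only a placeholder for whatever a real PE computes.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """One report from a PE: which algorithm ran, with what, and when."""
    component: str
    algorithm: str
    params: dict
    time: int
    port: int


@dataclass
class DataElement:
    """A data element carrying its own data-provenance annotation."""
    name: str
    value: float
    depends_on: list = field(default_factory=list)


# Stand-in for the Provenance Server/Store (assumption: in-memory list).
provenance_store: list = []


def report(component: str, algorithm: str, params: dict, time: int, port: int) -> None:
    """Process provenance: the PE tells the store what it did."""
    provenance_store.append(ProcessRecord(component, algorithm, params, time, port))


def components_on_path(store: list) -> list:
    """Retrace which components were on the data path, in time order."""
    return [r.component for r in sorted(store, key=lambda r: r.time)]


def combine(name: str, inputs: list) -> DataElement:
    """Data provenance: the output records which inputs it depends on."""
    value = sum(e.value for e in inputs) / len(inputs)  # placeholder logic
    return DataElement(name, value, depends_on=[e.name for e in inputs])
```

In the first style the question "which components were involved?" is answered from the store; in the second, "which data elements led to this output?" is answered from the element itself.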


Provenance (2) Prior Work on Provenance

[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance vs. data provenance) against event and processing rates.]

  • EU Provenance (PASOA, PreServ)
    – Captures workflow bindings
  • Scientific SOA Workflows (KARMA)
    – Workflow and application binding notifications
    – Stateless application-defined data provenance
  • Database View Inversion (Trio, Widom)
    – Specific transforms and extensions to SQL
    – Data dependencies specified statically for relations
  • Stream Provenance (Simhan)
    – Store temporal history of stream connectors as a per-stream stack
    – System-level automatic metadata collection at stream level
  • File Systems (PASS, LFS)
    – Captures system calls and modifications to file records
    – Annotation per file
  • Medical Stream Provenance (Century)
    – Arbitrary logic for individual PEs (statefulness and long memory)
    – Dynamic stream bindings
    – High processing throughput


Provenance Challenges Unique to Streams

  • Storage Efficiency: The high volume of data implies that "per-record" annotation is very heavyweight.
  • Processing Efficiency: Generating timestamps and a large amount of metadata per stream event will reduce system throughput.
  • Statefulness: Transformations on medical data have long temporal state; a single output depends on a large window of stream data.
  • Dynamic Stream Bindings: Streams may be ephemeral; as a consequence, the transformation graph may vary with time.

We need a provenance system that:
  • Is reactive, generating provenance events only when things change
  • Does not require annotating large amounts of metadata per sensor sample
  • Can accommodate variations in the statefulness model

We need to support both:
  • Process provenance: "Show me the set of components that are involved in the generation of alert Ei."
  • Data provenance: "Show me stream segments related to the generation of alert Ei."


TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports.
  – The dependency is thus not specified on every data element, but remains constant for the same processing logic.

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows.
  – The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ.

$$ out_i(t) \;\Leftarrow\; \bigcup_{s_j \text{ is input to Component } i} inp_j(t_k), \qquad t_k \in [\,t - \Delta,\; t\,] $$

[Figure: a Processing Element with input stream Sj (elements Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (elements Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), and output stream Si (elements Si(t), Si(t-1), Si(t-2)).]

The dependency equation can, however, vary based on:
  – The PE's internal state or some external context, for example:
      IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
      ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)
  – Solution is to allow multiple functions, each associated with a distinct range of states
  – Store the state as metadata with the data

TVC dependency rule blocks (PE state defined by location context):

  Rule Block ID   PE State   Dependency Rule
  1               "gym"      Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
  2               "home"     Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
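The two rule blocks on this slide ("gym" and "home") can be written down as data and evaluated on demand. The Python sketch below is one possible encoding, not the Century implementation: it assumes integer timestamps and closed intervals, and the names `Window`, `RULE_BLOCKS`, and `causative_elements` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A time window on one input stream, as offsets relative to t."""
    stream: str
    start_offset: int  # inclusive, e.g. -2 means t-2
    end_offset: int    # inclusive, 0 means t


# One rule block per PE state, mirroring the two rules on the slide.
RULE_BLOCKS = {
    "gym": [Window("Sj", 0, 0), Window("Sj", -3, -2), Window("Sk", -2, -1)],
    "home": [Window("Sj", -2, 0), Window("Sk", -1, 0)],
}


def causative_elements(state: str, t: int, streams: dict) -> set:
    """Return the (stream, timestamp) pairs that output Ei(t) depends on.

    `streams` maps a stream name to {timestamp: value}; timestamps that
    were never received are simply absent from the result.
    """
    result = set()
    for w in RULE_BLOCKS[state]:
        for ts in range(t + w.start_offset, t + w.end_offset + 1):
            if ts in streams.get(w.stream, {}):
                result.add((w.stream, ts))
    return result
```

Because the rule is stored once per PE state rather than once per output element, only the state label has to travel with the data.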


Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to a corresponding query.
  – For efficiency, the relevant metadata is distributed across multiple DBs
  – Tradeoff: faster provenance storage processing for higher provenance reconstruction overhead

Resolution steps:
  • Determine ID of input ports and causative stream bindings
  • Retrieve appropriate data elements within the specific time/sequence window specified by the TVC equation
  • Apply TVC filter to obtain the causative set of elements

Stored metadata (Stream and Port Mapping, SAPM):
  1. Store bindings between every input/output port and associated streams
  2. Store creation/modification times of PEs
  3. Store TVC function f() for every output port/input port

Query (Ei) at the Provenance Query Server:
  1. Obtain I/O stream mappings
  2. Retrieve dependency equation
  3. Retrieve elements
  4. Apply equation to determine set of causative elements
  Return set of elements

[Figure: example contents of the stores.
  PC Dependency Table: PEj, f1(), Po1
  Dynamic Stream Mapping Table: (t=10, S1, Pi1), (t=15, S15, I4), (t=16, S17, O6)
  Data Store: element45 (ts=15, SID=12, state="gym"); element66 (ts=16, SID=17, state="home")]

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams," HealthNet, June 2007.
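The four query steps on this slide can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Century code: the three stores are plain in-memory structures, the dependency "equation" is reduced to a window length per input port, and all identifiers and sample values (`STREAM_MAPPING`, `DEPENDENCY_TABLE`, `DATA_STORE`, `resolve`) are hypothetical.

```python
from bisect import bisect_right

# Dynamic stream mapping: input port -> [(bind_time, stream_id)], time-sorted.
STREAM_MAPPING = {"Pi1": [(10, "S1"), (15, "S15")]}
# Dependency table: output port -> {input port: window length in time units}.
DEPENDENCY_TABLE = {"Po1": {"Pi1": 3}}
# Data store: (stream_id, timestamp, value) tuples.
DATA_STORE = [
    ("S1", 12, 70), ("S1", 13, 72), ("S1", 14, 75),
    ("S15", 14, 99), ("S15", 15, 150), ("S15", 16, 151),
]


def stream_bound_at(port: str, t: int):
    """Return the stream bound to `port` at time t, or None if unbound."""
    bindings = STREAM_MAPPING[port]
    idx = bisect_right([bind_time for bind_time, _ in bindings], t) - 1
    return bindings[idx][1] if idx >= 0 else None


def resolve(output_port: str, t: int) -> list:
    """Reconstruct the causative elements for the output emitted at time t."""
    causative = []
    # Step 2: retrieve the dependency specification for this output port.
    for in_port, window in DEPENDENCY_TABLE[output_port].items():
        for ts in range(t - window, t + 1):
            # Step 1: find which stream fed the input port at that time.
            stream_id = stream_bound_at(in_port, ts)
            # Steps 3 and 4: retrieve elements and keep those in the window.
            causative.extend(
                e for e in DATA_STORE if e[0] == stream_id and e[1] == ts
            )
    return causative
```

Note that the element `("S15", 14, 99)` is excluded for a query at t=16: stream S15 was not yet bound to the port at time 14, which is the case the time-stamped binding table exists to handle.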


Key Research Benefits of TVC Model for Provenance

TVC model overhead reduces when:
  1. Monitoring is long-lived (over weeks)
  2. Per-instance state is minimized
  3. Number of monitored individuals increases

The TVC Model is a new model for provenance, specially tuned to the challenges of high-rate stream computing:
  – Storage efficiency: only minimal metadata per data element, plus a functional specification per PC (only 1-2% overhead, compared to ~60-100% for annotation)
  – Processing efficiency: stream and port bindings are stored only when stream-level bindings change (O(10 msec) provenance latency, compared to O(sec) in EU Provenance/Karma systems)
  – Statefulness: time and value functions allow compact representation of long dependency windows
  – Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

Other provenance systems cannot handle O(100 Kevents/sec) system throughput.


Quality of Events in Century

[Figure: the Century server infrastructure diagram, with the same labeled components as on the "Century Provenance Service" slide, including the QoE Manager.]


Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type.
  – Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
  – Sensor (source) impairments: sensor faults (transient or persistent)
  – Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
  – Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
  – Modification of analysis logic: resolution of the analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
  – Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a graph of PEs producing an ALERT at time t from input streams with gaps and out-of-order elements (e.g., t-16, t-14, t-13, t, t+1). Callouts: "What generated this alert, and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (replace sensor); "Stream elements are missing (possibly due to network issues)".]


Century QoE Some Basic QoE Measures

  • Expected vs. received input. By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
    [Figure: the input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input is t-5, t-2, t.]

  • Timestamp correlation across inputs. Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
    [Figure: one input carries t-12, t-11, t-10; the other carries t-5, t-2, t.]

  • Consumed vs. useful input. Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
    [Figure: the current input is t-5, t-4, t-3, t-2, t-1, t, of which only a small part is marked as the "useful" input.]
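The three measures on this slide are described only qualitatively. One way to turn them into numbers, offered as a sketch and not as the Century QoE Manager's definitions, is shown below in Python; the function names, the use of integer timestamps, and the choice of "latest timestamp difference" as the sync measure are all assumptions.

```python
def completeness(expected_ts, received_ts) -> float:
    """Fraction of the expected input timestamps that actually arrived."""
    expected = set(expected_ts)
    if not expected:
        return 1.0
    return len(expected & set(received_ts)) / len(expected)


def sync_skew(input_a_ts, input_b_ts) -> int:
    """Gap between the latest timestamps of two inputs (0 means in sync)."""
    return abs(max(input_a_ts) - max(input_b_ts))


def useful_fraction(consumed_ts, useful_ts) -> float:
    """Fraction of consumed elements that actually triggered the output."""
    consumed = set(consumed_ts)
    if not consumed:
        return 0.0
    return len(consumed & set(useful_ts)) / len(consumed)
```

Taking t = 100, the first example on the slide (t-5, t-2, t received out of six expected elements) gives a completeness of 0.5, and the second example (one input ending at t-10, the other at t) gives a skew of 10 time units.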


Open Challenges in Health Stream Collection and Diagnostics

Challenges:
  • Model-driven acquisition of data from sensor feeds
    – Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
    – HR may be obtained at different (accuracy, acquisition cost) levels from ECG, SpO2, and BP monitors
  • Sensors have different time-varying QoE
    – QoE determination should be bidirectional (feedback from medical professionals on quality of analysis results should affect QoE of upstream sensors)
  • Provenance architectures must address the prohibitive cost of storing all stream data
    – Much of the intermediate analysis can be re-created if the state of analysis components can be stored
  • Better techniques for establishing provenance relationships
    – User-specified models may not always be available: learn provenance from observed behavior
    – Explicit provenance may not be an accurate reflection of analysis logic
  • Provenance data itself is sensitive and subject to privacy requirements
    – The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Open questions:
  • How to compute QoE of medical streams, and how to refine them based on QoE of other streams
  • How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
  • How to develop a combination of hybrid model-replay based provenance
  • How to define context-dependent access-control models of provenance data in a multi-provider environment


Conclusions

  • Successful transition to large-scale remote monitoring requires significant innovations at both server and relay.
  • The pervasive device must be more than a relay: it is an extended part of the analytic infrastructure.
    – Key features include adaptive event filtering, personalization, and predictive data transmission
  • The server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics.
    – Key features include high-volume stream support, provenance, and event/information quality



  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

Century Provenance Service

[Figure: Century server infrastructure architecture diagram. Component labels: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC), Stakeholder Info, Patient Info, Device Info, Application Info, Provenance Service, Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver, State Data, Provenance Metadata, Event Store, Storages, Portal Service, Patient Data Manager, Device Data Manager, Service Data Management, User Group Data Manager, Job Application Manager, Event Filter Annotator, Analysis Applications, Analysis Framework, QoE Manager, State Manager, Group Manager, Device Catalogue, Authentication & Authorization, Privacy Manager, Platform Service, Subscription Service, Event Store Query Service, Provenance Query Service, Group Management Service, IHE Adapter, Group Data, Privacy Policy, Event Storage Manager, Event Management Service, Event Preprocessor, Remote Access Manager, Solution Delivery Services, Interoperability Container]

An Example of Data Provenance Use

Dr Lee has prescribed a specific medication to patient John Doe for his heart condition. With this drug, Dr Lee has also prescribed a program to monitor the effect of the drug on Mr Doe.

A few days later, Dr Lee receives an alert from the Century prescription program:

Urgent Alert
Patient: Doe, John
Condition: Abnormal Reaction
Recommendation: The Century System has found a developing issue in your patient's medical condition. The condition of the patient is deteriorating. A known side effect has emerged. Century recommends that you decrease the patient's dosage of the prescribed medication to 10 mg twice a day.

The physician's reaction: "That's unusual and a large decrease. Before agreeing to this change, I need to understand on what basis the system has made this recommendation and how accurate it is."

Ultimately, the physician is responsible for medical decisions and any actions taken. In order for medical professionals to accept the upcoming technology, we must provide them with the information they need to make these decisions responsibly.

The information technology that will provide the foundation for this understanding is provenance and quality of event (QoE).

Provenance (1) Two Approaches to Provenance

What is process provenance?
- The ability to retrace which "system components" were on the data path

[Figure: processing elements (PE1, PE2; algorithms f1, f2, f3) report to a Provenance Server/Store. Example reports: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." and "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."]

What is data provenance?
- The ability to retrace which "data elements" led to generation of output data

[Figure: PEj receives Data1 and Data2 from sensors and produces DataOut4. Example annotation: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."]

Provenance (2) Prior Work on Provenance

[Figure: prior provenance systems positioned by type of provenance reconstruction (process provenance vs. data provenance) and by event and processing rates]

EU Provenance (PASOA, PreServ)
- Captures workflow bindings

Scientific SOA Workflows (KARMA)
- Workflow and application binding notifications
- Stateless application-defined data provenance

Database View Inversion (Trio, Widom)
- Specific transforms and extensions to SQL
- Data dependencies specified statically for relations

Stream Provenance (Simhan)
- Store temporal history of stream connectors as a per-stream stack
- System-level automatic metadata collection at stream level

File Systems (PASS, LFS)
- Captures system calls and modifications to file records
- Annotation per file

Medical Stream Provenance (Century)
- Arbitrary logic for individual PEs (statefulness and long memory)
- Dynamic stream bindings
- High processing throughput

Provenance Challenges Unique to Streams

Storage Efficiency: High volume of data implies that "per-record" annotation is very heavyweight

Processing Efficiency: Generation of timestamps and large amounts of metadata per stream event will reduce system throughput

Statefulness: Transformations on medical data have long temporal state; a single output depends on a large window of stream data

Dynamic Stream Bindings: Streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that
- Is reactive, generating provenance events only when things change
- Does not require annotating large amounts of metadata per sensor sample
- Can accommodate variations in the statefulness model

We need to support both
- Process provenance: "Show me the set of components that are involved in the generation of alert Ei"
- Data provenance: "Show me stream segments related to the generation of alert Ei"

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
- The dependency is thus not specified on every data element but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
- The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ S_i^{out}(t) \;\Leftarrow\; \bigcup_{j \,:\, inp_j \text{ is input to Component } i} \{\, S_j(\tau) : \tau \in [\,t - \Delta_j,\ t\,] \,\} $$

[Figure: a Processing Element with input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (samples Si(t), Si(t-1), Si(t-2))]

The dependency equation can, however, vary based on
- The PE's internal state or some external context

  IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
  ELSE generate ALERT iff AVG(HR(t-30, t)) > 90

- Solution is to allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data

TVC Dependency Rule blocks (state defined by location context)

Rule Block ID | PE State | Dependency Rule
1             | "gym"    | Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2             | "home"   | Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
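The state-dependent rule blocks on this slide can be illustrated with a minimal Python sketch. It is not the Century implementation: the names `RuleBlock`, `RULE_BLOCKS`, `causative_set` and `should_alert`, the integer timestamps, and the `(timestamp, value)` sample representation are all assumptions made for illustration. Each window is stored as a pair of offsets `(lo, hi)` meaning the closed interval `[t - lo, t - hi]`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp, value); hypothetical representation


@dataclass(frozen=True)
class RuleBlock:
    """One TVC rule block: for each input stream, a list of offset pairs
    (lo, hi), each denoting the closed time window [t - lo, t - hi]."""
    windows: Dict[str, List[Tuple[int, int]]]


# Rule blocks mirroring the slide's example, keyed by PE state (location).
RULE_BLOCKS = {
    "gym": RuleBlock({"Sj": [(0, 0), (3, 2)], "Sk": [(2, 1)]}),
    "home": RuleBlock({"Sj": [(2, 0)], "Sk": [(1, 0)]}),
}


def causative_set(state: str, t: int,
                  streams: Dict[str, List[Sample]]) -> Dict[str, List[Sample]]:
    """Return, per input stream, the samples the output at time t depends on."""
    block = RULE_BLOCKS[state]
    result: Dict[str, List[Sample]] = {}
    for sid, wins in block.windows.items():
        result[sid] = [
            (ts, v) for ts, v in streams.get(sid, [])
            if any(t - lo <= ts <= t - hi for lo, hi in wins)
        ]
    return result


def should_alert(loc: str, t: int, hr: List[Sample]) -> bool:
    """Context-dependent alert rule from the slide: a 10-unit window with
    threshold 140 at the gym, otherwise a 30-unit window with threshold 90."""
    window, threshold = (10, 140.0) if loc == "gym" else (30, 90.0)
    recent = [v for ts, v in hr if t - window <= ts <= t]
    return bool(recent) and sum(recent) / len(recent) > threshold
```

Because the rule block is selected by state, only the state label needs to be stored with each data element; the dependency windows themselves are stored once per rule block.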

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to the corresponding query
- For efficiency, the relevant metadata is distributed across multiple DBs
- Tradeoff: faster provenance storage processing for higher provenance reconstruction overhead

[Figure labels: Stream and Port Mapping (SAPM), Provenance Query Server]

Provenance metadata stored
1. Store bindings between every input/output port and associated streams
2. Store creation/modification times of PEs
3. Store TVC function f() for every (output port, input port)

Query (Ei)
1. Obtain I/O stream mappings: determine ID of input ports and causative stream bindings
2. Retrieve dependency equation
3. Retrieve elements: the appropriate data elements within the specific time/sequence window specified by the TVC equation
4. Apply equation (TVC filter) to determine the set of causative elements
Return set of elements

PC Dependency Table (example row)
PEj  f1()  Po1

Dynamic Stream Mapping Table (example rows)
t=10  S1   Pi1
t=15  S15  I4
t=16  S17  O6

Data Store (example rows)
element45  ts=15  SID=12  state="gym"
element66  ts=16  SID=17  state="home"

M Blount, J Davis, A Misra, D Sow, M Wang, D Sow, L Lim, M Wang, "A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams", HealthNet, June 2007
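The four query steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Element`, `Binding`, `active_stream` and `resolve` names are hypothetical, the three tables are modelled as in-memory lists and dictionaries rather than separate databases, and the sketch assumes the stream bound to a port at query time stays bound for the whole dependency window.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Element(NamedTuple):
    eid: str
    ts: int
    sid: str
    state: str


class Binding(NamedTuple):
    t: int      # time at which this stream-to-port binding became active
    sid: str
    port: str


# A TVC function maps (t, state) to an inclusive time window (lo, hi).
Tvc = Callable[[int, str], Tuple[int, int]]


def active_stream(bindings: List[Binding], port: str, t: int) -> Optional[str]:
    """Stream most recently bound to `port` at or before time t, if any."""
    candidates = [b for b in bindings if b.port == port and b.t <= t]
    return max(candidates, key=lambda b: b.t).sid if candidates else None


def resolve(out_port: str, t: int, state: str,
            dependency: Dict[str, Dict[str, Tvc]],
            bindings: List[Binding],
            store: List[Element]) -> List[Element]:
    """Return the causative elements for an output produced at time t."""
    causative: List[Element] = []
    # Step 2: retrieve the dependency functions for this output port.
    for in_port, tvc in dependency[out_port].items():
        # Step 1: map the input port to the stream bound to it at time t.
        sid = active_stream(bindings, in_port, t)
        if sid is None:
            continue
        # Steps 3 and 4: fetch that stream's elements and keep those that
        # fall inside the window given by the TVC function.
        lo, hi = tvc(t, state)
        causative += [e for e in store if e.sid == sid and lo <= e.ts <= hi]
    return causative
```

Nothing is computed when data arrives; all the work happens inside `resolve`, which reflects the tradeoff of cheap provenance storage against a higher reconstruction cost at query time.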

Key Research Benefits of TVC Model for Provenance

TVC Model is a new model for provenance, specially tuned to the challenges of high-rate stream computing
- Storage efficiency: only minimal metadata per data element, functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)
- Processing efficiency: stream and port bindings stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in EU Provenance/Karma systems)
- Statefulness: time and value functions allow compact representation of long dependency windows
- Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

TVC model overhead reduces when
1. Monitoring is long-lived (over weeks)
2. Per-instance state is minimized
3. Number of monitored individuals increases

Other provenance systems cannot handle O(100 Kevents/sec) system throughput

Quality of Events in Century

[Figure: Century server infrastructure architecture diagram. Component labels: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception, Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library, Patient Portal, Administrator Portal, Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC), Stakeholder Info, Patient Info, Device Info, Application Info, Provenance Service, Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver, State Data, Provenance Metadata, Event Store, Storages, Portal Service, Patient Data Manager, Device Data Manager, Service Data Management, User Group Data Manager, Job Application Manager, Event Filter Annotator, Analysis Applications, Analysis Framework, QoE Manager, State Manager, Group Manager, Device Catalogue, Authentication & Authorization, Privacy Manager, Platform Service, Subscription Service, Event Store Query Service, Provenance Query Service, Group Management Service, IHE Adapter, Group Data, Privacy Policy, Event Storage Manager, Event Management Service, Event Preprocessor, Remote Access Manager, Solution Delivery Services, Interoperability Container]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
- Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
- Modification of analysis logic: resolution of analysis component (e.g. coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
- Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE raises an ALERT at time t from input streams whose timestamps show gaps and reordering. Annotations: "What generated this alert and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (replace sensor); "Stream elements are missing (possibly due to network issues)"]
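The dynamic rebinding case (fall back from SpO2-derived HR to the more expensive ECG when quality dips) could look roughly like the Python sketch below. The class name `HrSourceSelector`, the quality scale of 0 to 1, and both thresholds are assumptions for illustration; the two-threshold hysteresis is an addition of this sketch, not something the slide specifies, and is there only to avoid rebinding back and forth on small quality fluctuations.

```python
class HrSourceSelector:
    """Chooses which sensor stream the HR input is bound to."""

    def __init__(self, rebind_below: float = 0.6, restore_above: float = 0.8):
        self.rebind_below = rebind_below    # leave SpO2 when quality drops below this
        self.restore_above = restore_above  # return to SpO2 when quality exceeds this
        self.current = "SpO2"               # cheaper source is the default

    def update(self, spo2_quality: float) -> str:
        """Feed the latest SpO2 stream quality; return the source to bind to."""
        if self.current == "SpO2" and spo2_quality < self.rebind_below:
            self.current = "ECG"
        elif self.current == "ECG" and spo2_quality > self.restore_above:
            self.current = "SpO2"
        return self.current
```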

Century QoE Some Basic QoE Measures

By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: the input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input is t-5, t-2, t]

Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: one input carries timestamps t-12, t-11, t-10; the other carries t-5, t-2, t]

Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
[Figure: the current input is t-5, t-4, t-3, t-2, t-1, t; the "useful" input is a small subset of it]
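The three measures can be written down as small Python functions. This is a sketch under assumptions: the function names, the use of integer timestamps, and the choice of "difference between the latest timestamps" as the synchronization measure are illustrative and are not taken from Century.

```python
from typing import Iterable


def completeness(expected: Iterable[int], received: Iterable[int]) -> float:
    """Fraction of the expected timestamps that actually arrived."""
    expected_set = set(expected)
    if not expected_set:
        return 1.0
    return len(expected_set & set(received)) / len(expected_set)


def sync_skew(input_a: Iterable[int], input_b: Iterable[int]) -> int:
    """Gap between the most recent timestamps of two inputs (0 = in sync)."""
    return abs(max(input_a) - max(input_b))


def useful_fraction(consumed: Iterable[int], useful: Iterable[int]) -> float:
    """Fraction of consumed elements that actually triggered the output."""
    consumed_set = set(consumed)
    if not consumed_set:
        return 0.0
    return len(consumed_set & set(useful)) / len(consumed_set)
```

With t = 100, the first example on this slide gives `completeness(range(95, 101), [95, 98, 100])` equal to 0.5, matching the statement that half of the expected input is missing, and the second gives `sync_skew([88, 89, 90], [95, 98, 100])` equal to 10.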

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
- Many environments (e.g. retirement homes) exhibit redundancy in sensor type and coverage
- HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2 and BP monitors

Sensors have different time-varying QoE
- QoE determination should be bidirectional (feedback from medical professionals on quality of analysis results should affect QoE of upstream sensors)

How to compute QoE of medical streams and how to refine them based on QoE of other streams

How to adaptively alter acquisition pattern based on (cost, accuracy) needs

How to develop a combination of hybrid model-replay based provenance

How to define context-dependent access-control models of provenance data in a multi-provider environment

Provenance architectures must address the prohibitive cost of storing all stream data
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
- User-specified models may not always be available: learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
- The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
- Key features include adaptive event filtering, personalization and predictive data transmission

Server must not be simply a repository of sensor streams but must provide flexible and scalable analytics
- Key features include high-volume stream support, provenance and event/information quality



IBM Research

copy 2006 IBM Corporation

Provenance (1) Two Approaches to Provenance

What is process provenance?

  • The ability to retrace which "system components" were on the data path

  • Example records sent to the Provenance Server/Store: "I ran algorithm f1 with parameters u, v and w at time t0. I published on port 1." "I recvd some data on port 2. I ran algorithm f3 on the data at time t1."

What is data provenance?

  • The ability to retrace which "data elements" led to generation of output data

  • Example: "I put 'metadata' in DataOut4 saying it was dependent on Data1 and Data2."

[Figure: two sensors, processing elements PE1, PE2 and PEj, algorithms f1, f2 and f3, data items Data1, Data2 and DataOut4, and a Provenance Server/Store.]


Provenance (2) Prior Work on Provenance

[Figure: prior provenance systems placed by type of provenance reconstruction (vertical axis: process provenance, data provenance) and by event and processing rates (horizontal axis).]

EU Provenance (PASOA, PreServ)

  – Captures workflow bindings

Scientific SOA Workflows (KARMA)

  – Workflow and application binding notifications

  – Stateless application-defined data provenance

Database View Inversion (Trio, Widom)

  – Specific transforms and extensions to SQL

  – Data dependencies specified statically for relations

Stream Provenance (Simhan)

  – Store temporal history of stream connectors as a per-stream stack

  – System-level automatic metadata collection at stream level

File Systems (PASS, LFS)

  – Captures system calls and modifications to file records

  – Annotation per file

Medical Stream Provenance (Century)

  – Arbitrary logic for individual PEs (statefulness and long memory)

  – Dynamic stream bindings

  – High processing throughput


Provenance Challenges Unique to Streams

Storage Efficiency: High volume of data implies that "per-record" annotation is very heavyweight

Processing Efficiency: Generation of timestamps and a large amount of metadata per stream event will reduce system throughput

Statefulness: Transformations on medical data have long temporal state; a single output depends on a large window of stream data

Dynamic Stream Bindings: Streams may be ephemeral; as a consequence, the transformation graph may vary with time

We need a provenance system that:

  – Is reactive, generating provenance events only when things change

  – Does not require annotating large amounts of metadata per sensor sample

  – Can accommodate variations in the statefulness model

We need to support both:

  – Process provenance: "Show me the set of components that are involved in the generation of alert Ei"

  – Data provenance: "Show me stream segments related to the generation of alert Ei"


TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports

  – The dependency is thus not specified on every data element but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows

  – The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$ s_i^{out}(t) \;\Leftarrow\; \bigcup_{s_j \,\in\, \text{input to Component } i} s_j^{inp}(t - \Delta_j,\ t) $$

[Figure: a Processing Element with input stream Sj (samples Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (samples Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)) and output stream Si (samples Si(t), Si(t-1), Si(t-2)).]

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID    PE State    Dependency Rule
1                "gym"       Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2                "home"      Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)

The dependency equation can, however, vary based on

  – The PE's internal state or some external context

      IF loc = "gym" THEN generate ALERT iff (AVG(HR(t-10, t)) > 140)
      ELSE generate ALERT iff (AVG(HR(t-30, t)) > 90)

  – Solution is to allow multiple functions, each associated with a distinct range of states

  – Store the state as metadata with the data
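The state-dependent rule blocks above can be sketched in a few lines of Python. This is only an illustration, not Century code: the names `Window`, `RULE_BLOCKS` and `causative_windows` are hypothetical, and the sketch assumes that each term such as Sj(t-2, t) denotes an inclusive window of offsets relative to the output time t.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    stream: str     # input stream ID, e.g. "Sj"
    start_off: int  # window start, as an offset from output time t
    end_off: int    # window end, as an offset from output time t


# One rule block per PE state (hypothetical encoding of the slide's table).
RULE_BLOCKS = {
    "gym": [Window("Sj", 0, 0), Window("Sj", -3, -2), Window("Sk", -2, -1)],
    "home": [Window("Sj", -2, 0), Window("Sk", -1, 0)],
}


def causative_windows(state, t):
    """Return the (stream, t_start, t_end) intervals that output Ei(t) depends on."""
    return [(w.stream, t + w.start_off, t + w.end_off) for w in RULE_BLOCKS[state]]


print(causative_windows("home", 20))
print(causative_windows("gym", 20))
```

Because the rule is stored once per state rather than once per output element, the only per-element metadata needed to pick the right rule is the state label itself.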


Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to the corresponding query

  – For efficiency, the relevant metadata is distributed across multiple DBs

  – Tradeoff: faster provenance storage processing for higher provenance reconstruction overhead

Resolution proceeds in three stages:

  – Determine ID of input ports and causative stream bindings

  – Retrieve appropriate data elements within the specific time/sequence window specified by the TVC equation

  – Apply TVC filter to obtain the causative set of elements

Stream and Port Mapping (SAPM)

  1. Store bindings between every input/output port and associated streams

  2. Store creation/modification times of PEs

  3. Store TVC function f() for every (output port, input port)

Provenance Query Server, handling Query(Ei)

  1. Obtain I/O stream mappings

  2. Retrieve dependency equation

  3. Retrieve elements

  4. Apply equation to determine set of causative elements

  Return set of elements

[Figure: the Provenance Query Server and the stores it consults. Example contents shown:
  PC Dependency Table: PEj, f1(), Po1
  Dynamic Stream Mapping Table: (t=10, S1, Pi1), (t=15, S15, I4), (t=16, S17, O6)
  Data Store: element45 (ts=15, SID=12, state="gym"); element66 (ts=16, SID=17, state="home")]

M. Blount, J. Davis, A. Misra, D. Sow, M. Wang, D. Sow, L. Lim, M. Wang. A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams. HealthNet, June 2007.
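The four query steps can be walked through with small in-memory tables. The sketch below is a hedged illustration only: `PORT_BINDINGS`, `DEPENDENCY`, `DATA_STORE`, `stream_bound_at` and `resolve` are hypothetical names, the table contents are invented for the example, and the binding is assumed to be looked up at each element's own timestamp so that a rebinding in the middle of a window is respected.

```python
from bisect import bisect_right

# Dynamic stream mapping table: per input port, time-ordered
# (binding_time, stream_id) rows, appended only when a binding changes.
PORT_BINDINGS = {
    "Pi1": [(10, "S1"), (15, "S15")],
    "Pi2": [(10, "S7")],
}

# Dependency table: (PE, output port) -> {state: [(input_port, start_off, end_off)]}
DEPENDENCY = {
    ("PEj", "Po1"): {
        "gym": [("Pi1", -1, 0)],
        "home": [("Pi1", -2, 0), ("Pi2", -1, 0)],
    }
}

# Data store rows: (element_id, timestamp, stream_id)
DATA_STORE = [
    (41, 14, "S1"), (42, 15, "S15"), (43, 15, "S7"),
    (44, 16, "S15"), (45, 16, "S7"), (46, 17, "S15"),
]


def stream_bound_at(port, ts):
    """Step 1: find the stream bound to `port` at time `ts`."""
    rows = PORT_BINDINGS[port]
    idx = bisect_right([t for t, _ in rows], ts) - 1
    return rows[idx][1] if idx >= 0 else None


def resolve(pe, out_port, state, t):
    """Steps 2-4: fetch the rule, scan the window, keep the causative elements."""
    causative = []
    for port, start_off, end_off in DEPENDENCY[(pe, out_port)][state]:
        lo, hi = t + start_off, t + end_off
        for elem_id, ts, sid in DATA_STORE:
            if lo <= ts <= hi and sid == stream_bound_at(port, ts):
                causative.append(elem_id)
    return sorted(causative)


print(resolve("PEj", "Po1", "home", 16))
print(resolve("PEj", "Po1", "gym", 16))
```

Nothing is written per output event here; all the work happens at query time, which is the storage-versus-reconstruction tradeoff the slide describes.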


Key Research Benefits of TVC Model for Provenance

TVC model overhead reduces when

  1. Monitoring is long-lived (over weeks)

  2. Per-instance state is minimized

  3. Number of monitored individuals increases

TVC Model is a new model for provenance, specially tuned to the challenges of high-rate stream computing

  – Storage efficiency: only minimal metadata per data element, functional specification per PC (only 1-2% overhead compared to ~60-100% for annotation)

  – Processing efficiency: stream and port bindings stored only when stream-level bindings change (O(10 msec) provenance latency compared to O(sec) in EU Provenance/Karma systems)

  – Statefulness: time and value functions allow compact representation of long dependency windows

  – Dynamic stream mappings: time-based capture of stream and port bindings (capturing arbitrary data provenance, unlike Calder)

Other provenance systems cannot handle O(100 Kevents/sec) system throughput


Quality of Events in Century

[Figure: CENTURY SERVER INFRASTRUCTURE. Component labels in the diagram: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception; Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library; Patient Portal, Administrator Portal, Stakeholder Portal (Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt CDC), Solution Delivery Services; Portal Service, Patient Data Manager, Service Data Management, User Group Data Manager, Job Application Manager, Device Data Manager; Analysis Framework, Analysis Applications, Event Filter Annotator, QoE Manager, State Manager, Group Manager; Event Management Service, Event Preprocessor, Event Storage Manager, Event Store Query Service, Subscription Service, Remote Access Manager; Provenance Service, Provenance Storage Manager, Provenance Query Manager, Provenance Query Service, Provenance Access Control, Resolver; Platform Service, Device Catalogue, Authentication & Authorization, Privacy Manager, Group Management Service, IHE Adapter, Interoperability Container; Storages: Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy.]


Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type

  – Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:

  – Sensor (source) impairments: sensor faults (transient or persistent)

  – Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:

  – Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts

  – Modification of analysis logic: resolution of an analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality

  – Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE raises an ALERT at time t from input streams of timestamped elements, some with gaps. Annotations: "What generated this alert, and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (replace sensor); "Stream elements are missing (possibly due to network issues)".]
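The rebinding decision described above (fall back to the more expensive ECG-derived HR when the SpO2-derived HR degrades) could look roughly like the following Python sketch. The function name `choose_hr_source`, the quality scores in [0, 1], the cost values and the 0.8 threshold are all assumptions made for illustration; the slides do not specify how Century scores or selects streams.

```python
def choose_hr_source(quality, cost, min_quality=0.8):
    """Pick the cheapest HR source whose current quality meets the threshold.

    Falls back to the highest-quality source if none meets the threshold.
    """
    acceptable = [s for s in quality if quality[s] >= min_quality]
    if acceptable:
        return min(acceptable, key=lambda s: cost[s])
    return max(quality, key=lambda s: quality[s])


# Hypothetical relative acquisition costs.
cost = {"SpO2": 1.0, "ECG": 5.0}

print(choose_hr_source({"SpO2": 0.95, "ECG": 0.99}, cost))  # SpO2 is good enough
print(choose_hr_source({"SpO2": 0.40, "ECG": 0.99}, cost))  # rebind to ECG
```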


Century QoE Some Basic QoE Measures

By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.

[Figure: the input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input to the PE is t-5, t-2, t; the PE emits an output at t.]

Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.

[Figure: one input to the PE carries elements t-12, t-11, t-10; the other carries t-5, t-2, t; the PE emits an output at t.]

Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.

[Figure: the current input to the PE is t-5, t-4, t-3, t-2, t-1, t, of which only a small part is marked as the "useful" input; the PE emits an output at t.]
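The three measures above can be expressed as simple ratios and gaps over timestamps. The Python sketch below is one possible reading, not Century's definition: the function names are hypothetical, the sync measure is assumed to be the gap between the latest timestamps of the two inputs, and the choice of a single "useful" element in the last example is invented for illustration.

```python
def completeness(expected_ts, received_ts):
    """Fraction of expected elements that actually arrived."""
    expected = set(expected_ts)
    if not expected:
        return 1.0
    return len(expected & set(received_ts)) / len(expected)


def sync_gap(ts_a, ts_b):
    """Gap between the latest timestamps of two input streams."""
    return abs(max(ts_a) - max(ts_b))


def useful_fraction(consumed_ts, useful_ts):
    """Fraction of consumed elements that actually contributed to the output."""
    consumed = set(consumed_ts)
    if not consumed:
        return 0.0
    return len(consumed & set(useful_ts)) / len(consumed)


t = 100
expected = [t - 5, t - 4, t - 3, t - 2, t - 1, t]
received = [t - 5, t - 2, t]

print(completeness(expected, received))                       # half the input is missing
print(sync_gap([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))   # inputs are out of sync
print(useful_fraction(expected, [t]))                          # only one element was useful
```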


Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds

  – Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage

  – HR may be obtained at different (accuracy, acquisition cost) levels from ECG, SpO2 and BP monitors

  – Open question: how to adaptively alter the acquisition pattern based on (cost, accuracy) needs

Sensors have different time-varying QoE

  – QoE determination should be bidirectional (feedback from medical professionals on quality of analysis results should affect QoE of upstream sensors)

  – Open question: how to compute QoE of medical streams and how to refine them based on QoE of other streams

Provenance architectures must address the prohibitive cost of storing all stream data

  – Much of the intermediate analysis can be re-created if the state of analysis components can be stored

  – Open question: how to develop a combination of hybrid model-replay based provenance

Better techniques for establishing provenance relationships

  – User-specified models may not always be available: learn provenance from observed behavior

  – Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements

  – The right to access provenance may depend on the current 'role' of the provider and the user's medical history

  – Open question: how to define context-dependent access-control models of provenance data in a multi-provider environment


Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relay: an extended part of the analytic infrastructure

  – Key features include adaptive event filtering, personalization and predictive data transmission

Server must not be simply a repository of sensor streams but must provide flexible and scalable analytics

  – Key features include high-volume stream support, provenance and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions


IBM Research

copy 2006 IBM Corporation

Provenance Challenges Unique to Streams

Storage Efficiency High-volume of data implies that ldquoper-recordrdquo annotation

is very heavyweight

Processing Efficiency Generation of timestamps and large amount of metadataper stream event will reduce system throughput

Statefulness Transformations on medical data has long temporal statemdasha single output depends on a large windowof stream data

Dynamic Stream Bindings Streams may be ephemeral as a consequence the

transformation graph may vary with time

We need a provenance system that

Is reactivemdashgenerating provenance events only when things change

Does not require annotating large amounts of metadata per sensor sample

Can accommodate variations in the statefulness model

We need to support both

Process provenance Show me the set of components that are involved in the generation of alert Ei

Data provenance Show me stream segments related to the generation of alert Ei

IBM Research

copy 2006 IBM CorporationIBM Research

TVC Model of Hybrid Provenance

Associate an lsquoequation-basedrsquo specification of data dependency between every output port of individual PE and the corresponding input portsndash The dependency is thus not specified on every data

element but remains constant for the same processing logic

Provenance data recorded at the granularity of stream ldquosub-setsrdquomdashdependence over sets of time windowsndash output stream of PEi is a function of specific samples (from input

streams) within a designated interval Δ

( ) ( ) UComponent

tttouti

jjjki inps

input to is

ΔminusisinlArr

Processing Element

Si(t) Si(t-1) Si(t-2)

Output stream Si

Sj(t)Sj(t-1)

Sj(t-2)Sj(t-3)

Sk(t)Sk(t-1) Sk(t-2)

Sk(t-3)

Input stream Sj

Input stream Sk

Ei(t) Sj(t-2t) cup Sk (t-1t)ldquohomerdquo2

Ei(t) Sj(t) cup Sj(t-3t-2) cupSk (t-2t-1)

ldquogymrdquo1

Dependency RulePE State

Rule Block ID

TVC Dependency Ruleblocks

State defined by Location Context

The dependency equation can however vary based on

ndash The PErsquos internal state or some external context

IF IF loc= loc= ldquoldquogymgymrdquordquo then generate ALERT then generate ALERT iffiff(AVG(HR(t(AVG(HR(t--1010t)gt140)t)gt140)

ELSE ELSE generate ALERT generate ALERT iffiff (AVG(HR((AVG(HR(tt--30t30t)gt )gt 90)90)

ndash Solution is to allow multiple functions each associated with a distinct range of states

ndash Store the state as metadata with the data

IBM Research

copy 2006 IBM CorporationIBM Research

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to corresponding queryndash For efficiency the relevant metadata is

distributed across multiple DBsndash Tradeoff faster provenance storage

processing for higher provenance reconstruction overhead

Determine ID of input ports and causative stream bindings

Retrieve appropriate data elements within specific timesequence window specified by TVC Equation

Apply TVC filter to obtain causative set of elements

PC Dependency Table

PEj f1() Po1

Dynamic Stream Mapping Table

t=10 S1 Pi1t=15 S15 I4t=16 S17 O6

Data Store)

element45 ts=15 SID= 12 state=ldquogymrdquo

element66 ts=16 SID= 17 state=lsquohomersquo

Stream and Port Mapping (SAPM)

Provenance Query Server

1 Store bindings between every input output port and associated streams

2 Store creation modification times of PEs

3 Store TVC function f() for every output port input port

Query (Ei)

1 Obtain IO stream mappings

2 Retrieve Dependency equation

4 Apply equation to determine set of causativeelements

Return set of elements

3 Retrieve elements

M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007

IBM Research

copy 2006 IBM CorporationIBM Research

Key Research Benefits of TVC Model for Provenance

1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases

TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata

per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)

ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)

ndash Statefulness- time and value functions allow compact representation of long dependency windows

ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)

Other provenance systems cannot handle O(100 Keventssec) system throughput

Quality of Events in Century

[Figure: CENTURY SERVER INFRASTRUCTURE architecture diagram. Labelled components: USN Gateway, Data Transformer, Sensor Data Sender, Sensor Data Reception; Toolkit, IDE for Application Developer, Sensor Simulator, Event Management Client Library; Patient Portal, Administrator Portal, Stakeholder Portal; Doctor, Lifestyle Consultant, Healthcare IT Specialist, Govt/CDC; Solution Delivery Services; Portal Service, Patient Data Manager, Service Data Management, User Group Data Manager, Job Application Manager, Device Data Manager; Platform Service, Subscription Service, Group Management Service, Event Management Service, Event Store Query Service, Provenance Query Service, Provenance Service; Analysis Framework, Analysis Applications, Event Filter Annotator, QoE Manager, State Manager, Group Manager, Device Catalogue, Authentication & Authorization, Privacy Manager; Event Preprocessor, Event Storage Manager, Provenance Storage Manager, Provenance Query Manager, Provenance Access Control, Resolver, Remote Access Manager, Interoperability Container, IHE Adapter; Storages, Stakeholder Info, Patient Info, Device Info, Application Info, State Data, Provenance Metadata, Event Store, Group Data, Privacy Policy.]

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data type
- Enables pub-sub model of stream binding

However, the quality of any stream (raw sensor or intermediate analytic result) varies dynamically. Various types of dynamism:
- Sensor (source) impairments: sensor faults (transient or persistent)
- Transmission impairments: loss of stream elements, or delay/order reversal of streams

Quality of a stream's elements should become a first-class attribute of a stream processing system. It can affect:
- Action taken on analysis outcome: doctors may ignore or adjust their responsiveness to specific types of alerts
- Modification of analysis logic: the resolution of an analysis component (e.g., coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality
- Dynamic rebinding of stream graph: rebind to HR readings from the ECG sensor (more expensive) if the HR readings from the SpO2 sensor dip in quality

[Figure: a PE raises an ALERT at time t from input streams whose timestamped elements arrive with gaps and out of order. Callouts: "What generated this alert and can we trust it?"; "The sensor is faulty and generates stream elements of low quality" (action: replace sensor); "Stream elements are missing (possibly due to network issues)".]
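The dynamic-rebinding case above (fall back to the more expensive ECG-derived HR when the SpO2-derived HR dips in quality) can be sketched as follows. This is only an illustration under stated assumptions, not Century's implementation: the `Source` record, the quality scores in [0, 1], the cost values, and the `choose_source` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A candidate stream that can supply heart-rate (HR) readings."""
    name: str       # hypothetical sensor label, e.g. "SpO2" or "ECG"
    cost: float     # assumed relative acquisition cost (higher = more expensive)
    quality: float  # assumed current quality estimate in [0, 1]


def choose_source(sources, min_quality):
    """Return the cheapest source whose quality meets the threshold.

    If no source qualifies, fall back to the highest-quality one.
    """
    good = [s for s in sources if s.quality >= min_quality]
    if good:
        return min(good, key=lambda s: s.cost)
    return max(sources, key=lambda s: s.quality)


# While the SpO2-derived HR is good enough, stay on the cheaper sensor.
healthy = [Source("SpO2", cost=1.0, quality=0.9),
           Source("ECG", cost=5.0, quality=0.95)]
# When SpO2 quality dips, rebind to the more expensive ECG stream.
degraded = [Source("SpO2", cost=1.0, quality=0.4),
            Source("ECG", cost=5.0, quality=0.95)]

print(choose_source(healthy, min_quality=0.8).name)
print(choose_source(degraded, min_quality=0.8).name)
```

A cost-then-threshold policy is just one plausible choice; a real system would also have to decide how the quality estimate itself is computed and how often to re-evaluate the binding.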

Century QoE: Some Basic QoE Measures

1. Expected versus received input. By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.
[Figure: the input that was expected is t-5, t-4, t-3, t-2, t-1, t; the current input is t-5, t-2, t; the PE emits an output at t]

2. Timestamp correlation across inputs. Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.
[Figure: one input carries t-12, t-11, t-10; the other carries t-5, t-2, t; the PE emits an output at t]

3. Consumed versus useful input. Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.
[Figure: the current input is t-5, t-4, t-3, t-2, t-1, t; the "useful" input is highlighted; the PE emits an output at t]
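The three measures described above can be illustrated with a small sketch. The function names, the use of plain ratios and a timestamp gap as scores, and the example timestamps are assumptions made for illustration; the slides do not specify how Century computes these values.

```python
def completeness(expected, received):
    """Fraction of expected timestamps that actually arrived."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(received)) / len(expected)


def sync_skew(input_a, input_b):
    """Gap between the newest timestamps of two inputs (0 = in sync)."""
    return abs(max(input_a) - max(input_b))


def useful_fraction(consumed, useful):
    """Fraction of consumed elements that actually triggered the output."""
    consumed = set(consumed)
    if not consumed:
        return 0.0
    return len(consumed & set(useful)) / len(consumed)


t = 100  # arbitrary "current" time for the example

# Measure 1: six elements expected (t-5 .. t), only t-5, t-2 and t received.
print(completeness(range(t - 5, t + 1), [t - 5, t - 2, t]))   # 0.5

# Measure 2: one input stops at t-10 while the other reaches t.
print(sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))  # 10

# Measure 3: six elements consumed; assume only one was used for the output.
print(useful_fraction(range(t - 5, t + 1), [t]))
```

Low completeness, a large skew, or a small useful fraction would each flag the PE output as low quality; how such scores are thresholded or combined is left open here.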

Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
- Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
- HR may be obtained at different (accuracy, acquisition cost) from ECG, SpO2, and BP monitors

Sensors have different time-varying QoE
- QoE determination should be bidirectional (feedback from medical professionals on quality of analysis results should affect QoE of upstream sensors)

Open questions
- How to compute QoE of medical streams, and how to refine them based on QoE of other streams
- How to adaptively alter the acquisition pattern based on (cost, accuracy) needs
- How to develop a combination of hybrid model-replay based provenance
- How to define context-dependent access-control models of provenance data in a multi-provider environment

Provenance architectures must address the prohibitive cost of storing all stream data
- Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
- User-specified models may not always be available: learn provenance from observed behavior
- Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
- The right to access provenance may depend on the current 'role' of the provider and the user's medical history

Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relay: it is an extended part of the analytic infrastructure
- Key features include adaptive event filtering, personalization, and predictive data transmission

Server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
- Key features include high-volume stream support, provenance, and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions

TVC Model of Hybrid Provenance

Associate an 'equation-based' specification of data dependency between every output port of an individual PE and the corresponding input ports
- The dependency is thus not specified on every data element, but remains constant for the same processing logic

Provenance data is recorded at the granularity of stream "sub-sets": dependence over sets of time windows
- The output stream of PEi is a function of specific samples (from input streams) within a designated interval Δ

$$out_i(t) \;\Leftarrow \bigcup_{s_j \text{ is input to Component } i} \;\; \bigcup_{k \,\in\, [t-\Delta_j,\; t]} inp_j(k)$$

[Figure: a Processing Element with input stream Sj (elements Sj(t), Sj(t-1), Sj(t-2), Sj(t-3)), input stream Sk (elements Sk(t), Sk(t-1), Sk(t-2), Sk(t-3)), and output stream Si (elements Si(t), Si(t-1), Si(t-2))]

The dependency equation can, however, vary based on
- The PE's internal state or some external context

    IF loc = "gym" THEN generate ALERT iff AVG(HR(t-10, t)) > 140
    ELSE generate ALERT iff AVG(HR(t-30, t)) > 90

- Solution is to allow multiple functions, each associated with a distinct range of states
- Store the state as metadata with the data

TVC Dependency Rule Blocks (state defined by location context)

Rule Block ID    PE State    Dependency Rule
1                "gym"       Ei(t) ⇐ Sj(t) ∪ Sj(t-3, t-2) ∪ Sk(t-2, t-1)
2                "home"      Ei(t) ⇐ Sj(t-2, t) ∪ Sk(t-1, t)
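As a rough illustration of state-dependent rule blocks, the sketch below encodes the slide's two example rules ("gym" and "home") as time-window offsets and resolves them against toy streams. The encoding as `(stream, start offset, end offset)` tuples, the reading of each window as a closed interval, the helper names, and the sample data are all assumptions for illustration, not the published TVC implementation.

```python
# Rule blocks keyed by PE state. Each entry is (input stream, start offset,
# end offset), meaning "elements of that stream with timestamps in
# [t + start, t + end]" (closed interval, an assumption of this sketch).
RULE_BLOCKS = {
    "gym":  [("Sj", 0, 0), ("Sj", -3, -2), ("Sk", -2, -1)],
    "home": [("Sj", -2, 0), ("Sk", -1, 0)],
}


def dependency_windows(state, t):
    """Resolve the rule block for `state` into absolute time windows at time t."""
    return [(stream, t + lo, t + hi) for stream, lo, hi in RULE_BLOCKS[state]]


def causative_elements(state, t, streams):
    """Collect the (stream, timestamp) pairs that output Ei(t) depends on."""
    result = set()
    for stream, lo, hi in dependency_windows(state, t):
        for ts in streams.get(stream, {}):
            if lo <= ts <= hi:
                result.add((stream, ts))
    return sorted(result)


# Toy input streams: timestamp -> value (values are made up).
streams = {
    "Sj": {7: 88, 8: 91, 9: 95, 10: 97},
    "Sk": {8: "a", 9: "b", 10: "c"},
}

print(causative_elements("gym", 10, streams))
print(causative_elements("home", 10, streams))
```

Because the rule is stored once per state rather than once per output element, only the state label needs to travel with each element, which is the storage saving the model relies on.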

Resolving TVC-based Provenance Queries

Reconstruct provenance only in response to a corresponding query
- For efficiency, the relevant metadata is distributed across multiple DBs
- Tradeoff: faster provenance storage/processing for higher provenance reconstruction overhead

- Determine the IDs of the input ports and the causative stream bindings
- Retrieve the appropriate data elements within the specific time/sequence window specified by the TVC equation
- Apply the TVC filter to obtain the causative set of elements

[Figure: provenance query resolution, involving the Stream and Port Mapping (SAPM) component, the Provenance Query Server, and the data store]

Storage steps
1. Store bindings between every input/output port and its associated streams
2. Store creation/modification times of PEs
3. Store TVC function f() for every output port/input port

Query (Ei)
1. Obtain I/O stream mappings
2. Retrieve dependency equation
3. Retrieve elements
4. Apply equation to determine the set of causative elements
Return the set of elements

PC Dependency Table
PEj    f1()    Po1

Dynamic Stream Mapping Table
t=10    S1     Pi1
t=15    S15    I4
t=16    S17    O6

Data Store
element45    ts=15    SID=12    state="gym"
element66    ts=16    SID=17    state="home"
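The four query steps listed on this slide can be sketched end to end with in-memory tables. Everything here is a hypothetical stand-in: the table layouts, port names, offsets, and element records are invented for illustration and do not reproduce the Century schema or the values in the slide's tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    elem_id: str
    ts: int
    sid: int      # stream ID
    state: str    # PE state stored as metadata with the element


# Dynamic stream mapping: (time the binding took effect, stream ID, port).
STREAM_MAPPINGS = [(10, 1, "Pi1"), (12, 2, "Pi2"), (14, 3, "Po1")]

# Dependency rules per (output port, state):
# input port -> (start offset, end offset), relative to the output timestamp.
DEPENDENCY = {
    ("Po1", "gym"):  {"Pi1": (-2, 0), "Pi2": (-1, 0)},
    ("Po1", "home"): {"Pi1": (-1, 0)},
}

DATA_STORE = [
    Element("e1", 13, 1, ""), Element("e2", 14, 1, ""), Element("e3", 15, 1, ""),
    Element("e4", 14, 2, ""), Element("e5", 15, 2, ""),
    Element("out", 15, 3, "gym"),
]


def stream_bound_to(port, at_time):
    """Step 1: latest stream bound to `port` at or before `at_time`."""
    bound = [(t, sid) for t, sid, p in STREAM_MAPPINGS
             if p == port and t <= at_time]
    return max(bound)[1] if bound else None


def resolve(output, out_port):
    """Return the IDs of the elements that `output` causally depends on."""
    rule = DEPENDENCY[(out_port, output.state)]                # step 2
    causes = []
    for in_port, (lo, hi) in rule.items():
        sid = stream_bound_to(in_port, output.ts)              # step 1
        candidates = [e for e in DATA_STORE if e.sid == sid]   # step 3
        causes += [e.elem_id for e in candidates               # step 4
                   if output.ts + lo <= e.ts <= output.ts + hi]
    return sorted(causes)


print(resolve(DATA_STORE[-1], "Po1"))
```

Nothing is computed at ingest time beyond storing bindings and state, so the cost is paid only when a query arrives, which mirrors the tradeoff the slide names.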

M Blount J Davis A Misra D Sow M Wang D Sow L Lim M Wang A Time-and-Value Centric Provenance Model and Architecture for Medical Event Streams Healthnet June 2007




IBM Research

copy 2006 IBM CorporationIBM Research

Key Research Benefits of TVC Model for Provenance

1 TVC model overhead reduces when1 Monitoring is long-lived (over weeks)2 Per-instance state is minimized3 Number of monitored individuals increases

TVC Model is a new model for provenance specially tuned to the challenges of high-rate stream computing ndash Storage efficiencymdashonly minimal medata

per data element functional specification per PC (only 1-2 overhead compared to ~60-100 for annotation)

ndash Processing efficiencymdash stream and port bindings stored only when stream-level bindings change (O(10msec) provenance latency compared to O(sec) in EU ProvenanceKarma systems)

ndash Statefulness- time and value functions allow compact representation of long dependency windows

ndash Dynamic stream mappings- time-based capture of stream and port bindings (capturing arbitrary data provenance unlike Calder)

Other provenance systems cannot handle O(100 Keventssec) system throughput

IBM Research

copy 2006 IBM CorporationIBM Research

Quality of Events in Century

USN Gateway

DataTransformer

Sensor DataSender

Sensor DataReception

Toolkit

IDE forApplicationDeveloper

SensorSimulator

Event ManagementClient Library

Patient Portal

Administrator Portal

StakeholderInfo

PatientInfo

DeviceInfo

ApplicationInfo

Provenance Service

ProvenanceStorage Manager

StateData

ProvenanceMetadata

EventStore

Storages

Portal Service

Patient DataManager

Service DataManagement

User Group Data Manager

JobApplicationManager

Event Filter Annotator

Analysis Applications

Analysis Framework

QoEManager

StateManager

GroupManager

DeviceCatalogue

Authenticationamp Authorization

PrivacyManager

PlatformService

SubscriptionService

Event StoreQuery Service

ProvenanceQuery Service

GroupManagement

Service

IHE Adapter

GroupData

PrivacyPolicy

Event StorageManager

EventManagement

Service

EventPreprocessor

Remote AccessManager

CENTURY SERVER INFRASTRUCTURE

Stakeholder Portal

Doctor LifestyleConsultant

HealthcareIT Specialist

GovtCDC

Solution DeliveryServices

ProvenanceQuery Manager

ProvenanceAccess Control

Resolver

Device DataManager

InteroperabilityContainer

IBM Research

copy 2006 IBM CorporationIBM Research

PE

Quality of Events in Data Streams

Prior work has focused on defining streams in terms of their output data typendash Enables pub-sub model of stream binding

However the quality of any stream (raw sensor or intermediate analytic result) varies dynamically Various types of dynamismndash Sensor (source) impairments Sensor faults (transient or

persistent)ndash Transmission impairments Loss of stream elements or

delayorder reversal of streams

Quality of a streamrsquos elements should become a first-class attribute of a stream processing system It can affectndash Action taken on analysis outcome Doctors may ignore or

adjust their responsiveness to specific types of alertsndash Modification of analysis logic Resolution of analysis

component (eg coefficients of a Fourier Transform or set of features used) may be varied depending on stream quality

ndash Dynamic rebinding of stream graph Rebind to HR readings from the ECG sensor (more-expensive) if the HR readings from the SpO2 sensor dip in quality

t-3 t-5t-4t+1

t t-5t-1t+1

What generated this alert and

can we trusted

ALERT

t

t-2t-1t

t-3 t-4t-2

t-16t-14t-13tt+1

The sensor is faulty and generates

stream elements of low quality

Stream elements are missing

(possibly due to network issues)

Replace

sensor

IBM Research

copy 2006 IBM CorporationIBM Research

Century QoE: Some Basic QoE Measures

By correlating what was expected with what was received, we can estimate the quality of our input. For example, in this particular case we can say that the output of the PE has low quality, since half of the expected input is missing.

[Figure: a PE producing an output at t. The input that was expected: t-5, t-4, t-3, t-2, t-1, t. The current input: t-5, t-2, t.]

Another possibility is to correlate the timestamps of the inputs to estimate the quality. For example, in this particular case we can say that the output of the PE has low quality, since the two inputs are out of sync.

[Figure: a PE producing an output at t from two input streams, one carrying t-12, t-11, t-10 and the other carrying t-5, t-2, t.]

Finally, we can correlate the consumed input with the useful input (that which is actually used to trigger an output). For example, we can say here that the output of the PE has low quality, since its generation can be thought of as circumstantial due to the small fraction of useful input stream elements.

[Figure: a PE producing an output at t. The current input: t-5, t-4, t-3, t-2, t-1, t, of which only a small part is marked as the "useful" input.]
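The three measures above can be sketched as simple ratios over timestamps. This is an illustration only: the function names, the use of the newest timestamp to measure skew, and the choice of which elements count as "useful" are assumptions, not definitions taken from Century.

```python
def completeness(expected, received):
    """Fraction of expected elements (by timestamp) that actually arrived."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(received)) / len(expected)


def sync_skew(stream_a, stream_b):
    """Gap between the newest timestamps of two input streams."""
    return abs(max(stream_a) - max(stream_b))


def useful_fraction(consumed, useful):
    """Fraction of consumed elements that actually triggered the output."""
    consumed = set(consumed)
    if not consumed:
        return 0.0
    return len(consumed & set(useful)) / len(consumed)


t = 100  # arbitrary "current" timestamp
expected = [t - 5, t - 4, t - 3, t - 2, t - 1, t]
received = [t - 5, t - 2, t]

# Measure 1: half of the expected input is missing.
print(completeness(expected, received))  # 0.5

# Measure 2: the two inputs are out of sync by 10 time units.
print(sync_skew([t - 12, t - 11, t - 10], [t - 5, t - 2, t]))  # 10

# Measure 3: assume only the element at t triggered the output.
print(useful_fraction(expected, [t]))
```

Low completeness, large skew, or a small useful fraction would each lower the quality attached to the PE's output. How the three are combined into one score is left open here.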


Open Challenges in Health Stream Collection and Diagnostics

Model-driven acquisition of data from sensor feeds
– Many environments (e.g., retirement homes) exhibit redundancy in sensor type and coverage
– HR may be obtained at different (accuracy/acquisition cost) trade-offs from ECG, SpO2, and BP monitors

Sensors have different time-varying QoE
– QoE determination should be bidirectional (feedback from medical professionals on quality of analysis results should affect QoE of upstream sensors)

How to compute QoE of medical streams, and how to refine them based on QoE of other streams?

How to adaptively alter acquisition pattern based on (cost, accuracy) needs?

How to develop a combination of hybrid model-replay based provenance?

How to define context-dependent access-control models of provenance data in a multi-provider environment?

Provenance architectures must address prohibitive cost of storing all stream data
– Much of the intermediate analysis can be re-created if the state of analysis components can be stored

Better techniques for establishing provenance relationships
– User-specified models may not always be available; learn provenance from observed behavior
– Explicit provenance may not be an accurate reflection of analysis logic

Provenance data itself is sensitive and subject to privacy requirements
– The right to access provenance may depend on the current 'role' of the provider and the user's medical history
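The point that intermediate analysis can be re-created from stored component state can be sketched as checkpoint-and-replay. The moving-average PE, the `Checkpoint` type, and the `replay` helper below are hypothetical; the sketch also assumes the PE is deterministic, since replay cannot reproduce outputs otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Stored state of a hypothetical moving-average PE at some instant."""
    window: tuple  # the most recent inputs held by the PE
    size: int      # window length


def step(state, value):
    """Advance the PE by one input; return (new_state, output)."""
    window = (state.window + (value,))[-state.size:]
    return Checkpoint(window, state.size), sum(window) / len(window)


def replay(checkpoint, inputs):
    """Re-create intermediate outputs from a checkpoint plus raw inputs."""
    outputs, state = [], checkpoint
    for value in inputs:
        state, out = step(state, value)
        outputs.append(out)
    return outputs


# Only the checkpoint and the raw inputs are stored; the intermediate
# averages are recomputed on demand when a provenance query needs them.
saved = Checkpoint(window=(70, 72), size=3)
print(replay(saved, [74, 90]))
```

The trade-off is storage against recomputation: less stream data is kept, but each provenance query pays the cost of re-running the component from the nearest checkpoint.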


Conclusions

Successful transition to large-scale remote monitoring requires significant innovations at both server and relay

Pervasive device must be more than a relay: an extended part of the analytic infrastructure
– Key features include adaptive event filtering, personalization, and predictive data transmission

Server must not be simply a repository of sensor streams, but must provide flexible and scalable analytics
– Key features include high-volume stream support, provenance, and event/information quality


  • Technology Challenges for Health Monitoring and Automated Stream Analysis
  • Contents of Talk
  • Chapter 1
  • Remote Health Monitoring The Business Motivation
  • Information-based Medicine The Remote Monitoring Roadmap
  • Ubiquitous computing technologies Patient centric network and Healthcare Analytics are the key factors to establish the healt
  • Remote Health Monitoring The Opportunity
  • Chapter 2
  • Harmoni (Healthcare Adaptive Remote Monitoring) Overview
  • HARMONI Architecture Key Components on Mobile Device
  • HARMONI Functional Architecture
  • Pattern Recognition Engine Implementation
  • HARMONI Implementation Platform
  • Impact of HARMONI on Transmission Bandwidth Idealized Context
  • Result 2 Impact of Personalization on Event Filtering
  • HARMONI in Practice Sensor-based Context
  • Chapter 3
  • Century Research Goals and Technologies
  • Analysis Framework (1) The Approach
  • Analysis Framework (2) PE Example
  • Century Event Management Service
  • Event Management Service (1)
  • Event Management Service (3) Hybrid Storage Model
  • Event Management Service (4) The Stream Buffer
  • Century Provenance Service
  • An Example of Data Provenance Use
  • Provenance (1) Two Approaches to Provenance
  • Provenance (2) Prior Work on Provenance
  • Provenance Challenges Unique to Streams
  • TVC Model of Hybrid Provenance
  • Resolving TVC-based Provenance Queries
  • Key Research Benefits of TVC Model for Provenance
  • Quality of Events in Century
  • Quality of Events in Data Streams
  • Century QoE Some Basic QoE Measures
  • Open Challenges in Health Stream Collection and Diagnostics
  • Conclusions
