Graphical Causal Models: Determining Causes from Observations William Marsh Risk Assessment and Decision Analysis (RADAR) Computer Science

Graphical Causal Models: Determining Causes from Observations

William Marsh

Risk Assessment and Decision Analysis (RADAR), Computer Science

RADAR Group, Computer Science

Risk Assessment and Decision Analysis

Research areas: software engineering, safety, finance, legal

A new initiative in medical data analysis: DIADEM

Norman Fenton, Group leader

Martin Neil

http://www.dcs.qmul.ac.uk/researchgp/radar/

Outline

Graphical Causal Models

Bayesian networks: prediction or diagnosis

Causal induction: learning causes from data

Causal effect estimation: strength of causal relationships from data

DIADEM project

Bayesian Nets

Detecting Asthma Exacerbations

Aim to assist early detection of asthma episodes in Paediatric A&E

Using only data already available electronically

Network created by: experts, data

Bayes’ Theorem

$$P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

Joint probability

$$P(A \mid B) \propto P(B \mid A)\,P(A)$$

P(A | B): revised belief about A, given evidence B

P(A): prior probability of A

P(B | A): factor to update belief about A, given evidence B

Bayes’ Theorem (Made Easy)

A person has a positive test result. How likely is it they are infected? 17%

Infection (I): yes, no

Test (T): pos, neg

False positive: P(T=pos | I=no) = 5%

Negligible false negative

Infection rate: P(I=yes) = 1%
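The 17% figure follows from Bayes' theorem applied to the numbers on this slide. A minimal sketch, assuming "negligible false negative" means P(T=pos | I=yes) = 1; the function name and parameter names are illustrative only:

```python
def posterior_infected(prior, sensitivity, false_positive_rate):
    """P(I=yes | T=pos) by Bayes' theorem."""
    # Total probability of a positive test, over infected and uninfected people.
    p_pos = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_pos


# Slide values: P(I=yes) = 1%, P(T=pos | I=no) = 5%.
# Assumption: sensitivity is exactly 1 (false negatives ignored).
p = posterior_infected(prior=0.01, sensitivity=1.0, false_positive_rate=0.05)
print(f"{p:.1%}")  # 16.8%
```

Most positive results come from the much larger uninfected group, which is why the posterior stays well below 50% even though the test rarely errs.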

Medical Uses of BNs

Diagnosis: differential diagnosis from symptoms

Prediction: likely outcome

Building a BN

From expert knowledge: expert system

From data: data mining

Beyond Bayesian Networks

Cause versus Association

Both represent the association between fever and infection

‘Causal model’ has arrow from cause to effect

Infection → Fever or Fever → Infection?

Joint probability same:

$$P(I, F) = P(F \mid I)\,P(I) = P(I \mid F)\,P(F)$$
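The point that both arrow directions describe the same joint distribution can be checked numerically. A small sketch with made-up probabilities (none of these numbers come from the slides): the joint is built from P(I) and P(F | I), then rebuilt from P(F) and P(I | F).

```python
# Hypothetical parameters for the Infection -> Fever factorisation.
p_i = 0.01                                # P(I = yes)
p_f_given_i = {True: 0.90, False: 0.05}   # P(F = yes | I)


def bern(p, x):
    """Probability that a yes/no variable with P(yes) = p takes value x."""
    return p if x else 1.0 - p


values = (True, False)

# Factorisation 1: P(I, F) = P(F | I) P(I)
joint_causal = {(i, f): bern(p_i, i) * bern(p_f_given_i[i], f)
                for i in values for f in values}

# Parameters of the reversed model, read off the same joint.
p_f = sum(p for (i, f), p in joint_causal.items() if f)          # P(F = yes)
p_i_given_f = {f: joint_causal[(True, f)] / bern(p_f, f) for f in values}

# Factorisation 2: P(I, F) = P(I | F) P(F)
joint_reversed = {(i, f): bern(p_f, f) * bern(p_i_given_f[f], i)
                  for i in values for f in values}

same = all(abs(joint_causal[k] - joint_reversed[k]) < 1e-12 for k in joint_causal)
print(same)
```

With only two variables, observational data therefore cannot tell the two graphs apart; the direction has to come from other knowledge.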

Causal Induction

Discover causal relationships from data

Sometimes distinguishable

… different conditional independence

[Figure: two alternative causal graphs over A, B and C]
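One way to see how different graphs imply different conditional independences is to compare a chain with a collider. The two structures below (A → B → C and A → B ← C) and all probabilities are assumptions chosen for illustration; the graphs on the original slide are not recoverable from the text.

```python
from itertools import product


def bern(p, x):
    """Probability that a binary variable with P(1) = p takes value x."""
    return p if x else 1.0 - p


def chain_joint():
    # Assumed structure: A -> B -> C
    return {(a, b, c): bern(0.5, a) * bern(0.9 if a else 0.1, b) * bern(0.8 if b else 0.2, c)
            for a, b, c in product((0, 1), repeat=3)}


def collider_joint():
    # Assumed structure: A -> B <- C
    return {(a, b, c): bern(0.5, a) * bern(0.5, c) * bern(0.9 if (a or c) else 0.1, b)
            for a, b, c in product((0, 1), repeat=3)}


def prob(joint, a=None, b=None, c=None):
    """Marginal probability of the given assignments (None means summed out)."""
    return sum(p for (x, y, z), p in joint.items()
               if (a is None or x == a) and (b is None or y == b) and (c is None or z == c))


def a_indep_c(joint, given_b=False, tol=1e-12):
    """True if A and C are independent, optionally conditional on every value of B."""
    for b in ((0, 1) if given_b else (None,)):
        norm = prob(joint, b=b)
        for a, c in product((0, 1), repeat=2):
            p_ac = prob(joint, a=a, b=b, c=c) / norm
            p_a = prob(joint, a=a, b=b) / norm
            p_c = prob(joint, b=b, c=c) / norm
            if abs(p_ac - p_a * p_c) > tol:
                return False
    return True


chain, collider = chain_joint(), collider_joint()
print("chain:    A indep C:", a_indep_c(chain), "| given B:", a_indep_c(chain, given_b=True))
print("collider: A indep C:", a_indep_c(collider), "| given B:", a_indep_c(collider, given_b=True))
```

The chain makes A and C dependent until B is known; the collider does the opposite. Patterns of this kind are what causal induction methods look for in data.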

Causal Induction – Application

Discover causal relationships from data

Need lots of data

Applied to gene regulatory networks

Data from micro-array experiments

Recent explanation of limitations

Estimating Causal Effects

Suppose A is a cause of B

What is the causal effect?

Is it p(B | A)?

A → B

Benefits of Sports?

Is there a relationship between sport and exam success?

Data available

‘Intelligence’ correlate

Is this the correct test?

[Figure: causal graph with nodes intelligence, sport, exam result]

P(exam=pass|sport) > P(exam=pass| no-sport)

Benefits of Sports?

When we condition on ‘sport’, the probabilities for both ‘exam result’ and ‘intelligence’ change

What if I decide to start sport?

Observe: p(pass | sport) = 73% > p(pass | no-sport) = 67%

[Figure: causal graph with nodes intelligence, sport, exam result]

Intervention v Observation

Causal effect differs from conditional probability

Mostly interested in consequence of change

Causal effects can be measured by a Randomised Controlled Trial

Causal effect of sport on exam results not identifiable

Change: P(pass | do(sport)) < P(pass | do(no sport))

[Figure: causal graph with nodes intelligence, sport, exam result]
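The gap between observing and intervening can be made concrete with a fully specified toy model. The sketch below assumes the graph intelligence → sport, intelligence → exam result, sport → exam result. Every probability in it is invented; the values were picked so that the observational figures come out at the slide's 73% and 67%, and they are not the author's data.

```python
# Hypothetical model: intelligence (hidden) influences both sport and exam result.
p_high = 0.5                                   # P(intelligence = high)
p_sport_given_int = {"high": 0.8, "low": 0.2}  # P(sport | intelligence)
p_pass = {                                     # P(pass | intelligence, sport)
    ("high", True): 0.85, ("high", False): 0.91,
    ("low", True): 0.25, ("low", False): 0.61,
}
p_int = {"high": p_high, "low": 1.0 - p_high}


def p_pass_observed(sport):
    """P(pass | sport): conditioning also shifts our belief about intelligence."""
    num = den = 0.0
    for level, p_level in p_int.items():
        p_s = p_sport_given_int[level] if sport else 1.0 - p_sport_given_int[level]
        num += p_level * p_s * p_pass[(level, sport)]
        den += p_level * p_s
    return num / den


def p_pass_do(sport):
    """P(pass | do(sport)): sport is set externally, intelligence keeps its prior."""
    return sum(p_level * p_pass[(level, sport)] for level, p_level in p_int.items())


print("observe:", p_pass_observed(True), p_pass_observed(False))
print("do:     ", p_pass_do(True), p_pass_do(False))
```

With these numbers, people who do sport pass more often (0.73 against 0.67), yet making someone take up sport lowers their chance of passing (0.55 against 0.76). The computation of `p_pass_do` needs the hidden intelligence variable, which is why the effect is not identifiable from sport and exam data alone.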

Benefit of Sport

New observable variable ‘attendance at lectures’

Causal effect of sport on exam results now identifiable

[Figure: causal graph with nodes sport (S), attendance (A), exam result (E), intelligence]

$$P(E \mid do(S)) = \sum_{A} P(A \mid S) \sum_{S'} P(E \mid S', A)\,P(S')$$
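The formula on this slide has the form of a front-door adjustment: a sum over A of P(A | S), times a sum over sport values S' of P(E | S', A) P(S'). The sketch below checks it on a toy model. It assumes the graph intelligence → S, intelligence → E, S → A, A → E with intelligence unobserved; that structure and all probabilities are assumptions, not taken from the slides.

```python
from itertools import product

# Hypothetical parameters; I (intelligence) is treated as unobserved.
p_i1 = 0.5                                 # P(I = 1)
p_s1_given_i = {1: 0.8, 0: 0.2}            # P(S = 1 | I)
p_a1_given_s = {1: 0.4, 0: 0.9}            # P(A = 1 | S)
p_e1_given_ai = {(1, 1): 0.95, (1, 0): 0.5,
                 (0, 1): 0.7, (0, 0): 0.2}  # P(E = 1 | A, I)


def bern(p, x):
    return p if x == 1 else 1.0 - p


# Observed joint P(S, A, E), with I summed out.
observed = {}
for i, s, a, e in product((0, 1), repeat=4):
    pr = (bern(p_i1, i) * bern(p_s1_given_i[i], s)
          * bern(p_a1_given_s[s], a) * bern(p_e1_given_ai[(a, i)], e))
    observed[(s, a, e)] = observed.get((s, a, e), 0.0) + pr


def p_obs(s=None, a=None, e=None):
    """Marginal of the observed joint (None means summed out)."""
    return sum(p for (x, y, z), p in observed.items()
               if (s is None or x == s) and (a is None or y == a) and (e is None or z == e))


def front_door(s):
    """P(E=1 | do(S=s)) using observed quantities only."""
    total = 0.0
    for a in (0, 1):
        p_a_given_s = p_obs(s=s, a=a) / p_obs(s=s)
        inner = sum(p_obs(s=s2, a=a, e=1) / p_obs(s=s2, a=a) * p_obs(s=s2)
                    for s2 in (0, 1))
        total += p_a_given_s * inner
    return total


def truth(s):
    """P(E=1 | do(S=s)) computed from the full model, including hidden I."""
    return sum(bern(p_i1, i) * bern(p_a1_given_s[s], a) * p_e1_given_ai[(a, i)]
               for i, a in product((0, 1), repeat=2))


for s in (1, 0):
    print(s, front_door(s), truth(s))
```

The two columns agree, which illustrates how observing attendance makes the effect of sport recoverable without measuring intelligence, provided the assumed graph holds.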

Estimating Causal Effects

Rules to convert causal to statistical questions

Generalises e.g. stratification, potential outcomes

Assumptions: a causal model

Some assumptions may be testable

Causal model

Some variables observed, others not measured

Some causal effects identifiable

Challenges

Causal models for complex applications

Statistical implications

Example Application

Royal London trauma service

Criteria for activation of the trauma team

Aim to prevent unnecessary trauma team calls

Extensive records of trauma patient outcomes

US study of 1495 admissions proposed new ‘triage’ criteria

Significant decrease in overtriage: 51% → 29%

Insignificant increase in undertriage: 1% → 3%

None of the patients undertriaged by new criteria died

Does this show safety of new criteria?

DIADEM Project

Digital Economy in Healthcare

Data Information and Analysis for clinical DEcision Making

EPSRC Digital Economy Cluster

Partnership between solution providers and clinical data analysis problem holders

Summarise unsolved data analysis needs, in relation to the analysis techniques available

Join the DIADEM cluster

Cluster Activities and Outcomes

Engage stakeholders and build a community:

Creation of a community web-site and forum

Meetings with potential ‘problem holders’

Workshops

A road map: data and information

Follow-up proposal

A self-sustaining website – health data analytics

Summary

Bayesian networks: prediction and diagnosis

Causal induction: identify (some) causal relationships from (lots of) data

Causal effects

Experimental results from …

… non-experimental data

… assumptions (causal model)

Join the DIADEM cluster