MUDIM (Petr Šimeček, Euromise): a system for multidimensional compositional models (Radim Jiroušek), implemented in C++, distributed as an R package, focused on medical applications


Contents:

idea of conditional independence (CI) and (de)composition

possible applications of MUDIM: expert systems, data mining

STULONG dataset


CI - Theory of Storks

[Figure: two variables, BIRTH RATE and STORK POPULATION]


CI - Theory of Storks

[Figure: BIRTH RATE and STORK POPULATION, statistically connected]

Do storks deliver newborns?


CI - Theory of Storks

[Figure: BIRTH RATE, STORK POPULATION and a third variable, ENVIRONMENT]

No!


CI - Theory of Storks

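[Figure: ENVIRONMENT connected to both BIRTH RATE and STORK POPULATION]

Both variables depend on ENVIRONMENT; once ENVIRONMENT is known, BIRTH RATE and STORK POPULATION are conditionally independent.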
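A minimal numeric sketch of the stork argument, assuming binary variables and made-up probabilities (the names `p_env`, `p_storks`, `p_births` and all numbers are hypothetical, not data). The joint distribution is built so that storks and births interact only through ENVIRONMENT: the two are correlated marginally, but conditioning on ENVIRONMENT removes the connection.

```python
from itertools import product

# Hypothetical numbers chosen only for illustration; they are not survey data.
p_env = {"rural": 0.5, "urban": 0.5}      # P(ENVIRONMENT)
p_storks = {"rural": 0.8, "urban": 0.2}   # P(many storks | ENVIRONMENT)
p_births = {"rural": 0.7, "urban": 0.3}   # P(high birth rate | ENVIRONMENT)

def bern(p, value):
    """Probability of a binary outcome `value` when P(True) = p."""
    return p if value else 1 - p

# Joint P(E, S, B) = P(E) * P(S | E) * P(B | E): S and B interact only through E.
joint = {(e, s, b): p_env[e] * bern(p_storks[e], s) * bern(p_births[e], b)
         for e, s, b in product(p_env, (True, False), (True, False))}

def prob(event):
    """Total probability of the cells (e, s, b) for which event(e, s, b) holds."""
    return sum(p for (e, s, b), p in joint.items() if event(e, s, b))

p_b = prob(lambda e, s, b: b)
p_b_given_s = prob(lambda e, s, b: b and s) / prob(lambda e, s, b: s)
p_b_given_rural = prob(lambda e, s, b: b and e == "rural") / prob(lambda e, s, b: e == "rural")
p_b_given_s_rural = (prob(lambda e, s, b: b and s and e == "rural")
                     / prob(lambda e, s, b: s and e == "rural"))

print(f"P(B) = {p_b:.2f}, P(B | S) = {p_b_given_s:.2f}")           # 0.50 vs 0.62
print(f"P(B | rural) = {p_b_given_rural:.2f}, "
      f"P(B | S, rural) = {p_b_given_s_rural:.2f}")                # 0.70 vs 0.70
```

Seeing many storks raises the probability of a high birth rate from 0.50 to 0.62, but within the rural environment it adds nothing: that is conditional independence given ENVIRONMENT.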


CI – Weather

[Figure: three variables, WEATHER YESTERDAY, WEATHER TODAY and WEATHER TOMORROW]


CI – Weather

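[Figure: WEATHER YESTERDAY, WEATHER TODAY and WEATHER TOMORROW, connected in time order]

Given today's weather, tomorrow's weather is conditionally independent of yesterday's.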


CI – Sample Medical Data

[Figure legend: node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …]


CI – Sample Medical Data

[Figure legend: edge = (unconditional) statistical connection (correlation) between the pair of variables; node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …]


CI – Storks & Weather

[Figure: the stork graph (BIRTH RATE, STORK POPULATION, ENVIRONMENT) and the weather graph (YESTERDAY, TODAY, TOMORROW)]



CI – Sample Medical Data

[Figure legend: directed edge = causality between the pair of variables; node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …]


Locality - illustration

[Figure: variable X, the directly explanatory variables for X, and other variables]

If we know the values of the directly explanatory variables for X, then knowledge about the other variables is useless for predicting X.
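A small sketch of locality, assuming a hypothetical chain O → D → X of binary variables with made-up probabilities: D stands for the directly explanatory variable of X and O for an "other" variable (all names and numbers are illustrative). The check confirms that P(X | D, O) equals P(X | D) for every combination of values.

```python
from itertools import product

# Hypothetical chain O -> D -> X over binary variables; all numbers are made up.
# D is the only directly explanatory variable for X; O is an "other" variable.
p_o = {0: 0.6, 1: 0.4}                                     # P(O)
p_d_given_o = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # P(D | O)
p_x_given_d = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}   # P(X | D)

joint = {(o, d, x): p_o[o] * p_d_given_o[o][d] * p_x_given_d[d][x]
         for o, d, x in product((0, 1), repeat=3)}

def marginal(o=None, d=None, x=None):
    """Sum of joint cells matching the given values (None = sum over that variable)."""
    return sum(v for (oo, dd, xx), v in joint.items()
               if (o is None or oo == o) and (d is None or dd == d)
               and (x is None or xx == x))

for d, o in product((0, 1), repeat=2):
    with_o = marginal(o=o, d=d, x=1) / marginal(o=o, d=d)   # P(X=1 | D, O)
    without_o = marginal(d=d, x=1) / marginal(d=d)          # P(X=1 | D)
    print(f"D={d}, O={o}: {with_o:.3f} vs {without_o:.3f}")
```

Both columns agree (0.300 when D = 0, 0.900 when D = 1), so once D is known, O carries no extra information about X.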


Applications – Expert Systems

Causality


Idea of Compositional Models

$$\pi(X_1,X_2) \triangleright \kappa(X_2,X_3) = \frac{\pi(X_1,X_2)\,\kappa(X_2,X_3)}{\kappa(X_2)}$$
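A minimal sketch of the composition of two low-dimensional distributions, π(X1,X2) ▷ κ(X2,X3) = π(X1,X2)κ(X2,X3)/κ(X2), using plain dictionaries over binary variables. This is an illustration, not MUDIM's C++ implementation; `compose` and the example tables are hypothetical. It also checks a basic property: the (X1,X2) marginal of the composed distribution is π itself.

```python
from collections import defaultdict

def compose(pi, kappa):
    """Composition pi(X1,X2) |> kappa(X2,X3) = pi(X1,X2) * kappa(X2,X3) / kappa(X2).

    pi maps (x1, x2) -> probability, kappa maps (x2, x3) -> probability.
    Simplification (assumption): cells with kappa(x2) = 0 are dropped; the
    general definition only applies when kappa(x2) = 0 implies pi(x2) = 0.
    """
    kappa_x2 = defaultdict(float)          # marginal kappa(X2)
    for (x2, _x3), v in kappa.items():
        kappa_x2[x2] += v
    composed = {}
    for (x1, x2), p in pi.items():
        for (y2, x3), k in kappa.items():
            if y2 == x2 and kappa_x2[x2] > 0:
                composed[(x1, x2, x3)] = p * k / kappa_x2[x2]
    return composed

# Made-up two-dimensional distributions over binary variables.
pi = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}       # pi(X1, X2)
kappa = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.1, (1, 1): 0.4}  # kappa(X2, X3)
joint = compose(pi, kappa)

# The (X1, X2) marginal of the composition reproduces pi.
marg_12 = defaultdict(float)
for (x1, x2, _x3), v in joint.items():
    marg_12[(x1, x2)] += v
print(dict(marg_12))
```

The composition takes the X2 "overlap" from π and borrows only the conditional κ(X3 | X2) from κ, which is why the operator is not commutative in general.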


Applications – Expert Systems

Causality

What is the distribution of if we know ?
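The variable names in this question were lost with the slide's figure. As a hedged illustration of the kind of query an expert system answers, the sketch below builds a small hypothetical compositional model over binary X1, X2, X3 and computes P(X3 | X1 = 1) by brute-force enumeration (`query` is a hypothetical helper; enumeration is only workable for a handful of variables, and a real system would use more efficient computations).

```python
from collections import defaultdict

# Hypothetical compositional model over binary X1, X2, X3 (numbers made up):
# joint(x1, x2, x3) = pi(x1, x2) * kappa(x2, x3) / kappa(x2)
pi = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
kappa = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.1, (1, 1): 0.4}
kappa_x2 = {0: 0.5, 1: 0.5}                 # marginal of kappa on X2
joint = {(x1, x2, x3): pi[(x1, x2)] * kappa[(x2, x3)] / kappa_x2[x2]
         for (x1, x2) in pi for x3 in (0, 1)}

def query(joint, target, evidence):
    """Return P(X_target | evidence) as a dict; evidence maps axis index -> value."""
    scores = defaultdict(float)
    for assignment, p in joint.items():
        if all(assignment[i] == v for i, v in evidence.items()):
            scores[assignment[target]] += p
    total = sum(scores.values())            # probability of the evidence
    return {value: s / total for value, s in scores.items()}

posterior = query(joint, target=2, evidence={0: 1})   # P(X3 | X1 = 1)
print(posterior)
```

Knowing X1 = 1 shifts the distribution of X3 only through X2, the variable the two factors share.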


Data Mining

We don’t know “anything”: there are lots of variables and lots of possible relations between them.

We need to formulate possible hypotheses, suggest some promising models, etc. (useful in pre-research).


Data Mining

[Figure: two panels, Variables and Data]


Direction of Causality Problem

is equivalent to

are equivalent, but they are not equivalent to
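The diagrams on this slide did not survive extraction. The textbook instance of the problem is that X → Y → Z, X ← Y ← Z and X ← Y → Z all encode the same conditional independence (X and Z independent given Y), while the collider X → Y ← Z does not; whether these are exactly the slide's graphs is an assumption. The sketch applies the standard criterion: two directed acyclic graphs are Markov equivalent when they share the same skeleton and the same v-structures (function names are hypothetical).

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a directed graph given as (parent, child) pairs."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    adjacent = skeleton(edges)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in adjacent:
                found.add((a, child, b))
    return found

def markov_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain = [("X", "Y"), ("Y", "Z")]      # X -> Y -> Z
reverse = [("Z", "Y"), ("Y", "X")]    # X <- Y <- Z
fork = [("Y", "X"), ("Y", "Z")]       # X <- Y -> Z
collider = [("X", "Y"), ("Z", "Y")]   # X -> Y <- Z

print(markov_equivalent(chain, reverse),
      markov_equivalent(chain, fork),
      markov_equivalent(chain, collider))   # True True False
```

This is why data alone cannot always orient an arrow: the first three graphs fit any distribution equally well, and only a collider leaves a distinguishable trace.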


STULONG Dataset = dataset containing research data on cardiovascular disease (1976-79)

1417 patients (Czech middle-aged men)

244 attributes surveyed with each patient at the entry examination

37 selected attributes are described here


(Incomplete) List of Attributes

AGE, MARITAL STATUS, EDUCATION, OCCUPATION, PHYSICAL ACTIVITY, TRANSPORT TO JOB, SMOKING, ALCOHOL, TEA AND COFFEE

MYOCARDIAL INFARCTION

HYPERTENSION, ICTUS, HYPERLIPIDEMIA, CHEST PAIN, ASTHMA, HEIGHT & WEIGHT, BLOOD PRESSURE, …


Graph of Correlated Pairs

[Figure: graph on the 37 selected attributes MARIT.STAT, EDUC, RESP, ACT.IN.JOB, ACT.AFTER.JOB, TRANSPORT, TRANSPORT.TIME, SMOKING, SMOKING.YR, ALCOHOL.FREQ, BEER.DAILY, WINE.DAILY, LIQ.DAILY, COFFEE, TEA, SUGAR, IM, HT, HTD, HTL, DIABET, HYPLIP, PAIN.CHEST, PAIN.LL, ASTHMA, HEIGHT, WEIGHT, SYST1, DIAST1, SYST2, DIAST2, TRIC, SUBSC, CHLST, TRIGL, URINE, AGE; an edge joins each statistically connected pair]

464 of 666 possible pairs are statistically connected (p = 0.05).


Graph of Correlated Pairs 2

160 of 666 possible pairs are statistically connected (p = 0.05/666, a Bonferroni correction for the 666 pairwise tests).

[Figure: the same 37 attributes; an edge joins each pair that is significant at the corrected level]
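The pair counts on these two slides follow from simple arithmetic, sketched below: 37 attributes give 37 · 36 / 2 = 666 unordered pairs, and the Bonferroni threshold divides 0.05 by that count. The last line shows why the correction matters: if no pair were truly connected, about 33 pairs would still pass the uncorrected 0.05 test by chance.

```python
from math import comb

n_attributes = 37                       # selected STULONG attributes
n_pairs = comb(n_attributes, 2)         # unordered pairs tested: 37 * 36 / 2
alpha = 0.05
bonferroni_alpha = alpha / n_pairs      # per-pair threshold of the stricter graph
expected_false_hits = alpha * n_pairs   # expected count if no pair were truly connected

print(n_pairs)                          # 666
print(f"{bonferroni_alpha:.2e}")        # 7.51e-05
print(f"{expected_false_hits:.1f}")     # 33.3
```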


[Figure: directed graph over the same 37 attributes, with 56 arrows]


Risk Factors for Hypertension

> summary(glm(HT ~ HYPLIP + IM + AGE + SUBSC, data = C, family = "binomial"))

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.322730   1.274252  -3.392 0.000693 ***
IM           1.246937   0.513342   2.429 0.015138 *
HYPLIP       1.126383   0.333971   3.373 0.000744 ***
SUBSC        0.009521   0.003978   2.393 0.016699 *
AGE          0.245182   0.136678   1.794 0.072835 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Risk Factors for Hypertension

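Interpretation:

HYPERLIPIDEMIA (HYPLIP) and IM each roughly triple the odds of hypertension (odds ratios exp(1.126) ≈ 3.1 and exp(1.247) ≈ 3.5).

Each three years of AGE roughly double the odds (exp(3 × 0.245) ≈ 2.1).

There is also a small but demonstrable connection to the skinfold above musculus subscapularis (SUBSC).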
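The interpretation converts the logistic-regression coefficients (log-odds) into odds ratios by exponentiating them. A short sketch of that step, using the estimates and standard errors from the glm() output on the previous slide; the Wald 95% interval exp(b ± 1.96·se) is an addition for illustration, and the three-year step assumes AGE is recorded in years, as the interpretation implies.

```python
import math

# Estimates and standard errors copied from the glm() output (log-odds scale).
coef = {"IM": (1.246937, 0.513342), "HYPLIP": (1.126383, 0.333971),
        "SUBSC": (0.009521, 0.003978), "AGE": (0.245182, 0.136678)}

odds_ratios = {}
for name, (estimate, std_error) in coef.items():
    # Odds ratio for a one-unit increase, with a Wald 95% interval exp(b +/- 1.96 * se).
    low = math.exp(estimate - 1.96 * std_error)
    high = math.exp(estimate + 1.96 * std_error)
    odds_ratios[name] = math.exp(estimate)
    print(f"{name:7s} OR = {odds_ratios[name]:.2f}  95% CI {low:.2f} to {high:.2f}")

# A three-unit step in AGE (three years, assuming AGE is in years) multiplies the odds by:
age_3 = math.exp(3 * coef["AGE"][0])
print(f"AGE +3: odds x {age_3:.2f}")
```

For AGE the Wald interval includes 1, which matches its p-value of about 0.07 in the table: the age effect points in a clear direction but is not significant at the 0.05 level in this model.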