MUDIM (Petr Šimeček, Euromise)
System for multidimensional compositional models (Radim Jiroušek)
C++ code, distributed as an R package
focused on medical applications
Contents:
idea of conditional independence and (de)composition
possible applications of MUDIM: expert systems, data mining
STULONG dataset
CI - Theory of Storks
[Figure: three nodes, BIRTH RATE, STORK POPULATION and ENVIRONMENT]
BIRTH RATE and STORK POPULATION are statistically connected.
Do storks deliver newborns?
No! BIRTH RATE and STORK POPULATION are each connected to ENVIRONMENT.
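The stork argument can be checked numerically. The sketch below is only an illustration: the environment labels ("rural", "urban") and all probabilities are made-up assumptions, not values from the presentation. It builds a joint distribution in which ENVIRONMENT influences both BIRTH RATE and STORK POPULATION, and shows that the two are dependent marginally but independent once the environment is known.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
p_env = {"rural": 0.5, "urban": 0.5}          # P(ENVIRONMENT)
p_high_birth = {"rural": 0.8, "urban": 0.2}   # P(BIRTH RATE = high | ENVIRONMENT)
p_many_storks = {"rural": 0.9, "urban": 0.1}  # P(STORK POPULATION = many | ENVIRONMENT)


def bernoulli(p_true, value):
    """Probability of a binary value, given the probability of True."""
    return p_true if value else 1.0 - p_true


# Joint P(env, birth, storks): birth rate and storks depend only on the environment.
joint = {
    (e, b, s): p_env[e] * bernoulli(p_high_birth[e], b) * bernoulli(p_many_storks[e], s)
    for e, b, s in product(p_env, (True, False), (True, False))
}


def prob(pred):
    """Sum the joint probability over all outcomes that satisfy pred."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))


# Marginally, "high birth rate" and "many storks" are dependent:
# P(b, s) differs from P(b) * P(s).
p_b = prob(lambda e, b, s: b)
p_s = prob(lambda e, b, s: s)
p_bs = prob(lambda e, b, s: b and s)
marginal_gap = p_bs - p_b * p_s

# Within each environment the dependence disappears:
# P(b, s | e) equals P(b | e) * P(s | e).
conditional_gaps = {}
for env in p_env:
    p_e = prob(lambda e, b, s: e == env)
    p_b_e = prob(lambda e, b, s: e == env and b) / p_e
    p_s_e = prob(lambda e, b, s: e == env and s) / p_e
    p_bs_e = prob(lambda e, b, s: e == env and b and s) / p_e
    conditional_gaps[env] = p_bs_e - p_b_e * p_s_e

print("marginal gap:", round(marginal_gap, 4))
print("conditional gaps:", {k: round(v, 4) for k, v in conditional_gaps.items()})
```

A non-zero marginal gap together with zero conditional gaps is what "conditionally independent given ENVIRONMENT" means here.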
CI – Weather
[Figure: three nodes, WEATHER YESTERDAY, WEATHER TODAY and WEATHER TOMORROW]
CI – Sample Medical Data
[Figure legend]
node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …
edge = (unconditional) statistical connection (correlation) between the pair of variables
CI – Storks & Weather
[Figure: two three-node graphs side by side: BIRTH RATE, STORK POPULATION, ENVIRONMENT and YESTERDAY, TODAY, TOMORROW]
CI – Sample Medical Data
[Figure legend]
node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …
arrow = causality between the pair of variables
Locality - illustration
[Figure: Variable X; directly explanatory variables for X; other variables]
If we know the values of the directly explanatory variables for X, then knowledge of the other variables is useless for predicting X.
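A small numeric check of the locality idea, using the weather example. This is a sketch under assumptions: it treats the weather as a chain in which tomorrow depends only on today, and the states and transition probabilities are hypothetical. TODAY then plays the role of the directly explanatory variable for TOMORROW, and YESTERDAY is an "other" variable.

```python
from itertools import product

STATES = ("sun", "rain")

# Hypothetical numbers, for illustration only.
p_first = {"sun": 0.6, "rain": 0.4}  # P(yesterday)
p_next = {                           # P(next day | current day)
    "sun": {"sun": 0.7, "rain": 0.3},
    "rain": {"sun": 0.4, "rain": 0.6},
}

# Joint P(yesterday, today, tomorrow) of the assumed chain.
joint = {
    (y, t, m): p_first[y] * p_next[y][t] * p_next[t][m]
    for y, t, m in product(STATES, repeat=3)
}


def p_tomorrow(tomorrow, today, yesterday=None):
    """P(tomorrow | today), or P(tomorrow | today, yesterday) if yesterday is given."""
    keep = [k for k in joint
            if k[1] == today and (yesterday is None or k[0] == yesterday)]
    total = sum(joint[k] for k in keep)
    return sum(joint[k] for k in keep if k[2] == tomorrow) / total


# Largest change in the prediction of tomorrow caused by also knowing yesterday.
max_diff = max(
    abs(p_tomorrow(m, t) - p_tomorrow(m, t, y))
    for y, t, m in product(STATES, repeat=3)
)
print("largest difference:", max_diff)
```

The difference is zero up to rounding: once TODAY is known, YESTERDAY adds nothing to the prediction of TOMORROW.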
Applications – Expert Systems
Causality
Idea of Compositional Models
$$\pi(X_1, X_2) \triangleright \kappa(X_2, X_3) = \frac{\pi(X_1, X_2)\,\kappa(X_2, X_3)}{\kappa(X_2)}$$
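The composition of a distribution π(X1, X2) with a distribution κ(X2, X3) multiplies the two and divides by the marginal κ(X2). The following is a minimal sketch of that operation on small probability tables. It is not MUDIM code: the function name `compose`, the dictionary representation and the numbers are assumptions made for illustration.

```python
from collections import defaultdict


def compose(pi, kappa):
    """Compose pi(x1, x2) with kappa(x2, x3) into a table over (x1, x2, x3).

    Each entry is pi(x1, x2) * kappa(x2, x3) / kappa(x2). The result is only
    a proper distribution if kappa(x2) > 0 wherever pi gives x2 positive mass.
    """
    # Marginal kappa(x2), obtained by summing out x3.
    kappa_x2 = defaultdict(float)
    for (x2, _x3), q in kappa.items():
        kappa_x2[x2] += q

    result = {}
    for (x1, x2), p in pi.items():
        for (k2, x3), q in kappa.items():
            if k2 == x2 and kappa_x2[x2] > 0:
                result[(x1, x2, x3)] = p * q / kappa_x2[x2]
    return result


# Hypothetical low-dimensional distributions over binary variables.
pi = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
kappa = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.1, (1, 1): 0.4}

model = compose(pi, kappa)

# The composed table keeps pi as its marginal on (x1, x2).
marginal_x1_x2 = defaultdict(float)
for (x1, x2, _x3), p in model.items():
    marginal_x1_x2[(x1, x2)] += p

print("total mass:", round(sum(model.values()), 10))
print("marginal on (x1, x2):", {k: round(v, 10) for k, v in marginal_x1_x2.items()})
```

In this way a three-dimensional distribution is assembled from two two-dimensional ones, with π preserved as a marginal.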
Applications – Expert Systems
Causality
What is the distribution of if we know ?
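A query of this kind, the distribution of one variable given known values of others, can be answered from a joint probability table by keeping the rows that agree with the evidence and renormalising. The sketch below is an assumption-laden illustration: the helper `conditional`, the variable names "X1" and "X2" and the probabilities are all hypothetical.

```python
def conditional(joint, names, target, evidence):
    """Return P(target | evidence) from a joint table keyed by value tuples.

    joint    -- dict mapping tuples of values to probabilities
    names    -- variable names, in the same order as the tuple positions
    target   -- name of the variable whose distribution is wanted
    evidence -- dict mapping variable names to their known values
    """
    t = names.index(target)
    scores = {}
    for values, p in joint.items():
        # Keep only rows that agree with every piece of evidence.
        if all(values[names.index(n)] == v for n, v in evidence.items()):
            scores[values[t]] = scores.get(values[t], 0.0) + p
    total = sum(scores.values())
    if total == 0:
        raise ValueError("evidence has zero probability")
    return {value: p / total for value, p in scores.items()}


# Hypothetical joint distribution of two binary variables.
names = ("X1", "X2")
joint = {
    ("yes", "yes"): 0.2,
    ("yes", "no"): 0.2,
    ("no", "yes"): 0.1,
    ("no", "no"): 0.5,
}

answer = conditional(joint, names, target="X2", evidence={"X1": "yes"})
print(answer)
```

Enumerating the full joint table only works for a handful of variables; it is shown here for clarity, not as a scalable method.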
Data Mining
We don't know "anything": there are lots of variables and lots of possible relations between them.
We need to formulate possible hypotheses, suggest some promising models, etc. (useful in pre-research).
[Figure labels: Variables, Data]
Direction of Causality Problem
[diagram] is equivalent to [diagram]
[diagrams] are equivalent, but they are not equivalent to [diagram]
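The diagrams of this slide did not survive, so the following is a sketch of the standard fact that it presumably illustrates; the choice of graphs is an assumption. Data generated by a chain X → Y → Z are reproduced exactly by the reversed chain X ← Y ← Z and by the fork X ← Y → Z, but not by the collider X → Y ← Z. All probabilities are hypothetical.

```python
from itertools import product

VALS = (0, 1)

# Hypothetical chain X -> Y -> Z.
p_x = {0: 0.3, 1: 0.7}
p_y_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_y_x[x][y] = P(y | x)
p_z_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # p_z_y[y][z] = P(z | y)

joint = {
    (x, y, z): p_x[x] * p_y_x[x][y] * p_z_y[y][z]
    for x, y, z in product(VALS, repeat=3)
}


def marg(keep):
    """Marginal of the joint over the listed positions (0 = X, 1 = Y, 2 = Z)."""
    out = {}
    for k, p in joint.items():
        key = tuple(k[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


pX, pY, pZ = marg([0]), marg([1]), marg([2])
pXY, pYZ, pXZ = marg([0, 1]), marg([1, 2]), marg([0, 2])

# Fork X <- Y -> Z: P(y) P(x | y) P(z | y).
fork = {(x, y, z): pXY[(x, y)] * pYZ[(y, z)] / pY[(y,)] for x, y, z in joint}

# Reversed chain X <- Y <- Z: P(z) P(y | z) P(x | y).
reversed_chain = {
    (x, y, z): pZ[(z,)] * (pYZ[(y, z)] / pZ[(z,)]) * (pXY[(x, y)] / pY[(y,)])
    for x, y, z in joint
}

# Collider X -> Y <- Z: P(x) P(z) P(y | x, z); it forces X and Z to be independent.
collider = {
    (x, y, z): pX[(x,)] * pZ[(z,)] * joint[(x, y, z)] / pXZ[(x, z)]
    for x, y, z in joint
}


def max_gap(model):
    """Largest absolute difference between a model and the true joint."""
    return max(abs(model[k] - joint[k]) for k in joint)


print("fork:", max_gap(fork))
print("reversed chain:", max_gap(reversed_chain))
print("collider:", max_gap(collider))
```

The first two gaps are zero up to rounding, so data alone cannot tell those arrow directions apart. The collider cannot reproduce the joint because it implies that X and Z are marginally independent, which does not hold here.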
STULONG Dataset
= dataset containing research data on cardiovascular disease (1976-79)
1417 patients (Czech middle-aged men)
244 attributes surveyed for each patient at the entry examination
37 selected attributes are described here
(Incomplete) List of Attributes
AGE, MARITAL STATUS, EDUCATION, OCCUPATION, PHYSICAL ACTIVITY, TRANSPORT TO JOB, SMOKING, ALCOHOL, TEA AND COFFEE,
MYOCARDIAL INFARCTION, HYPERTENSION, ICTUS, HYPERLIPIDEMIA, CHEST PAIN, ASTHMA, HEIGHT & WEIGHT, BLOOD PRESSURE, …
Graph of Correlated Pairs
[Figure: graph whose nodes are the 37 attributes MARIT.STAT, EDUC, RESP, ACT.IN.JOB, ACT.AFTER.JOB, TRANSPORT, TRANSPORT.TIME, SMOKING, SMOKING.YR, ALCOHOL.FREQ, BEER.DAILY, WINE.DAILY, LIQ.DAILY, COFFEE, TEA, SUGAR, IM, HT, HTD, HTL, DIABET, HYPLIP, PAIN.CHEST, PAIN.LL, ASTHMA, HEIGHT, WEIGHT, SYST1, DIAST1, SYST2, DIAST2, TRIC, SUBSC, CHLST, TRIGL, URINE, AGE]
464 of 666 possible pairs are statistically connected (p=0.05)
Graph of Correlated Pairs 2
[Figure: graph over the 37 selected attributes]
160 of 666 possible pairs are statistically connected (p=0.05/666)
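The two thresholds differ by a Bonferroni correction: 37 attributes give 666 pairs, and the stricter slide divides the 0.05 level by that number of tests. If no pair were truly connected, testing each of the 666 pairs at 0.05 would still flag about 33 by chance. The arithmetic can be sketched as follows; the p-values in the example are hypothetical, not taken from STULONG.

```python
from math import comb

n_attributes = 37
n_pairs = comb(n_attributes, 2)    # number of unordered pairs of attributes
alpha = 0.05
bonferroni_alpha = alpha / n_pairs  # per-test level after Bonferroni correction


def significant(p_values, threshold):
    """Flag each p-value that falls at or below the threshold."""
    return [p <= threshold for p in p_values]


# Hypothetical p-values, for illustration only.
example_p_values = [0.03, 0.0004, 0.00002]

print("pairs:", n_pairs)
print("corrected level:", bonferroni_alpha)
print("uncorrected:", significant(example_p_values, alpha))
print("Bonferroni:", significant(example_p_values, bonferroni_alpha))
```

Only the smallest of the three example p-values stays significant after the correction, which mirrors the drop from 464 to 160 connected pairs.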
[Figure: directed graph over the 37 selected attributes]
56 arrows
Risk Factors for Hypertension
> summary(glm(HT~HYPLIP+IM+AGE+SUBSC,data=C,family="binomial"))
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.322730   1.274252  -3.392 0.000693 ***
IM           1.246937   0.513342   2.429 0.015138 *
HYPLIP       1.126383   0.333971   3.373 0.000744 ***
SUBSC        0.009521   0.003978   2.393 0.016699 *
AGE          0.245182   0.136678   1.794 0.072835 .
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Risk Factors for Hypertension
Interpretation:
HYPERLIPIDEMIA and IM each roughly triple the odds of hypertension.
Each three years of AGE roughly double the odds.
There is also a small but demonstrable connection to the skinfold above the musculus subscapularis (SUBSC).
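The interpretation follows from exponentiating the logistic-regression coefficients: exp(estimate) is the factor by which the odds of HT are multiplied per unit increase of a variable. The sketch below redoes that arithmetic with the estimates printed in the R output. It assumes that one unit of AGE corresponds to one year, as the slide's wording suggests.

```python
from math import exp

# Estimates copied from the glm summary (intercept omitted).
coefficients = {
    "IM": 1.246937,
    "HYPLIP": 1.126383,
    "SUBSC": 0.009521,
    "AGE": 0.245182,
}

# Odds ratio per one-unit increase of each variable.
odds_ratios = {name: exp(beta) for name, beta in coefficients.items()}

# Odds ratio for a three-unit increase of AGE (assumed to mean three years).
age_three_units = exp(3 * coefficients["AGE"])

for name, ratio in odds_ratios.items():
    print(f"{name}: odds multiplied by {ratio:.2f} per unit")
print(f"AGE, three units: odds multiplied by {age_three_units:.2f}")
```

The odds ratios for HYPLIP and IM come out between 3 and 3.5, and three units of AGE give a factor close to 2, in line with "triple" and "double" above. The SUBSC ratio is barely above 1 per unit, which is why its effect is described as small.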