Finding Fuzzy Approximate Dependencies within STULONG Data
Discovery Challenge, ECML/PKDD 2003
September 22-27, 2003
Berzal F., Cubero J.C., Sanchez D., Serrano J.M., Vila M.A.
University of Granada (Spain)
2 Discovery Challenge – ECML/PKDD 2003
Introduction
KDD allows us to obtain relations within data that are:
  Non-trivial.
  Previously unknown.
  Potentially useful.
Fuzzy data: extensions of KDD tools and techniques.
Problem representation
Fuzzy relational database.
aij values: numeric, scalar (nominal), linguistic labels.
Membership degrees.
Fuzzy similarity relations, SA1, ..., SAm.
t#   A1            A2            ...   Am
t1   a11, t1(A1)   a12, t1(A2)   ...   a1m, t1(Am)
t2   a21, t2(A1)   a22, t2(A2)   ...   a2m, t2(Am)
t3   a31, t3(A1)   a32, t3(A2)   ...   a3m, t3(Am)
…    …             …             ...   …
Fuzzy Approximate Dependencies
We define Fuzzy Approximate Dependencies by relaxing some properties of Functional Dependencies:
$V \rightarrow W \iff \forall t, s:\; t[V] = s[V] \Rightarrow t[W] = s[W]$
Equality relaxation: considering linguistic labels and membership degrees.
Universal quantifier relaxation: allowing exceptions.
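The two relaxations can be illustrated with a small sketch. It is not the authors' definition: it only checks a crisp functional dependency over pairs of tuples and then measures how often the implication holds when exceptions are tolerated. The function names, the pluggable `similar` comparison, and the toy attributes `transport` and `activity` are assumptions made for illustration.

```python
from itertools import combinations

def agree(t, s, attrs, similar=lambda a, b: a == b):
    """True when tuples t and s match on every attribute in attrs."""
    return all(similar(t[a], s[a]) for a in attrs)

def fd_holds(rows, V, W):
    """Crisp functional dependency: no pair may agree on V and differ on W."""
    return all(agree(t, s, W)
               for t, s in combinations(rows, 2) if agree(t, s, V))

def pair_fulfilment(rows, V, W, similar=lambda a, b: a == b):
    """Fraction of pairs agreeing on V that also agree on W.

    Exceptions lower the value instead of invalidating the dependency.
    Passing a looser `similar` function relaxes the equality test as well.
    """
    matching = [(t, s) for t, s in combinations(rows, 2)
                if agree(t, s, V, similar)]
    if not matching:
        return 1.0  # no pair agrees on V, so nothing contradicts V -> W
    return sum(agree(t, s, W, similar) for t, s in matching) / len(matching)

# Hypothetical toy data, not taken from STULONG.
rows = [
    {"transport": "car", "activity": "low"},
    {"transport": "car", "activity": "low"},
    {"transport": "car", "activity": "high"},      # the exception
    {"transport": "on foot", "activity": "high"},
]

crisp = fd_holds(rows, ["transport"], ["activity"])
relaxed = pair_fulfilment(rows, ["transport"], ["activity"])
```

In this toy table the crisp dependency fails because of a single tuple, while the relaxed measure still reports that one of the three pairs agreeing on `transport` also agrees on `activity`.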
FAD Measures
Relevance degree:
  Support, $supp(V \rightarrow W)$.
Fulfilment degrees:
  Confidence, $conf(V \rightarrow W)$.
  Certainty factor, $CF(V \rightarrow W)$ [Shortliffe and Buchanan, 1975]. Measures variations in the degree of belief.
    $CF(V \rightarrow W) = 1$: maximum increment (perfect positive).
    $CF(V \rightarrow W) = -1$: maximum decrement.
    $CF(V \rightarrow W) = 0$: statistical independence.
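The slide gives only the three reference values of the certainty factor, not its formula. The sketch below assumes the formulation commonly used for rules, in which the confidence of the dependency is compared with the support of its consequent. The function name and the parameter names `conf` and `supp_w` are illustrative.

```python
def certainty_factor(conf, supp_w):
    """Certainty factor from the confidence of V -> W and the support of W.

    Assumed formulation (not spelled out in the slides):
      conf > supp_w: (conf - supp_w) / (1 - supp_w)   belief increases
      conf < supp_w: (conf - supp_w) / supp_w         belief decreases
      otherwise    : 0                                independence
    """
    if conf > supp_w:
        return (conf - supp_w) / (1.0 - supp_w)
    if conf < supp_w:
        return (conf - supp_w) / supp_w
    return 0.0

# The three reference points listed on the slide.
cf_max = certainty_factor(1.0, 0.4)   # maximum increment
cf_min = certainty_factor(0.0, 0.4)   # maximum decrement
cf_ind = certainty_factor(0.4, 0.4)   # statistical independence
```

Under this assumption the result always lies in [-1, 1], and neither branch can divide by zero: `conf > supp_w` implies `supp_w < 1`, and `conf < supp_w` implies `supp_w > 0`.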
Applications
Fuzzy Databases.
Approximate Dependencies Discovery.
Functional Dependencies Discovery.
Other applications:
  Low granularity data.
  Overlapping semantics.
STULONG Database
Entry Table.
Normal Group (attribute KONSKUP having values 1 or 2).
Risk Group (attribute KONSKUP having values 3 or 4).
Pathologic Group (value 5 for attribute KONSKUP).
Data Preprocessing (I)
Problem: semantic overlapping in symbolic or scalar attributes.
Solution: fuzzy similarity relations (subjective).
E.g.: DOPRAVA (means of transport for getting to work):
               by bike   public means   car   not stated
on foot        0.4       0.3            0.3   0.0
by bike                  0.3            0.3   0.0
public means                            0.4   0.0
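One way to store such a relation is a lookup keyed by unordered pairs. This sketch reads the values above as the upper triangle of a matrix over the five DOPRAVA values, which is an interpretation of the slide layout. Symmetry and a similarity of 1 between identical values are assumptions that the slide does not state, and the function name `similarity` is illustrative.

```python
# Degrees as read from the slide; frozenset keys make the lookup symmetric.
SIMILARITY = {
    frozenset(("on foot", "by bike")): 0.4,
    frozenset(("on foot", "public means")): 0.3,
    frozenset(("on foot", "car")): 0.3,
    frozenset(("on foot", "not stated")): 0.0,
    frozenset(("by bike", "public means")): 0.3,
    frozenset(("by bike", "car")): 0.3,
    frozenset(("by bike", "not stated")): 0.0,
    frozenset(("public means", "car")): 0.4,
    frozenset(("public means", "not stated")): 0.0,
}

def similarity(a, b):
    """Degree to which two DOPRAVA values are considered alike."""
    if a == b:
        return 1.0  # assumption: every value is fully similar to itself
    # Pairs that do not appear on the slide raise KeyError instead of
    # silently receiving an invented degree.
    return SIMILARITY[frozenset((a, b))]

car_vs_public = similarity("car", "public means")
```

Raising on unlisted pairs, such as car versus not stated, keeps the sketch from filling in a value the slide does not give.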
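Data Preprocessing (II)
Problem: high granularity in numeric attributes.
Solution: definition of sets of linguistic labels starting from intervals.
Numeric value $\rightarrow$ <Label, degree>
E.g.: BMI (Body mass index):
[Figure: membership functions of linguistic labels over BMI, with membership degree up to 1; surviving labels: "thin", "overweight"; surviving axis marks: 24.73, 25.0, 25.12]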
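The mapping from a numeric value to <Label, degree> pairs can be sketched with trapezoidal membership functions. All breakpoints below are hypothetical, since the figure's actual label definitions did not survive, and the "normal" label is added only so that the example partition covers the whole axis. The function names are illustrative as well.

```python
from math import inf

def trapezoid(x, a, b, c, d):
    """Membership that rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical BMI labels; infinite bounds make open-ended shoulders.
LABELS = {
    "thin": (-inf, -inf, 18.0, 20.0),
    "normal": (18.0, 20.0, 24.0, 26.0),
    "overweight": (24.0, 26.0, inf, inf),
}

def fuzzify(x):
    """Return the (label, degree) pairs with non-zero membership for x."""
    pairs = []
    for label, bounds in LABELS.items():
        degree = trapezoid(x, *bounds)
        if degree > 0.0:
            pairs.append((label, degree))
    return pairs

borderline = fuzzify(25.0)
```

With these made-up breakpoints a BMI of 25.0 belongs partly to "normal" and partly to "overweight", so nearby numeric values no longer count as entirely different when dependencies are evaluated.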
Analytical Questions (I)
Dependencies between social factors and physical activity.
ROKVSTUP STAV VZDELANI ZODPOV
TELAKTZA 0.67/0.14 0.24/0.37 0.25/0.28
AKTPOZAM 0.14/0.47 0.58/0.28 0.14/0.49 0.18/0.47
DOPRAVA 0.20/0.32 0.64/0.14 0.19/0.32 0.26/0.32
DOPRATRV 0.17/0.47 0.57/0.22 0.16/0.46 0.21/0.44
Analytical Questions (II)
Dependencies between social factors and smoking.
ROKVSTUP STAV VZDELANI ZODPOV
KOURENI 0.68/0.07
DOBAKOUR 0.64/0.11 0.26/0.25
BYVKURAK 0.10/0.64 0.42/0.39 0.09/0.65 0.13/0.64
Analytical Questions (III)
Dependencies between social factors and alcohol consumption.
ROKVSTUP STAV VZDELANI ZODPOV
ALKOHOL 0.21/0.35 0.63/0.15 0.19/0.34 0.24/0.31
PIVO10 0.16/0.43 0.58/0.21 0.16/0.43 0.21/0.41
PIVO12 0.10/0.62 0.47/0.39 0.10/0.62 0.13/0.61
VINO 0.16/0.43 0.58/0.21 0.16/0.44 0.21/0.41
LIHOV 0.16/0.43 0.58/0.21 0.16/0.43 0.20/0.41
PIVOMN 0.21/0.33 0.65/0.14 0.20/0.32 0.24/0.29
VINOMN 0.20/0.33 0.64/0.15 0.19/0.33 0.24/0.31
LIHMN 0.20/0.31 0.64/0.14 0.19/0.30 0.25/0.29
Analytical Questions (IV)
Dependencies between social factors and physical features.
ROKVSTUP STAV VZDELANI ZODPOV
BMI 0.16/0.44 0.58/0.23 0.15/0.45 0.20/0.42
SYST1 0.65/0.12 0.25/0.26
DIAST1 0.19/0.32 0.63/0.14 0.19/0.32 0.24/0.30
SYST2 0.65/0.12 0.25/0.25
DIAST2 0.19/0.33 0.63/0.15 0.18/0.33 0.23/0.30
Analytical Questions (V)
Dependencies between physical activity and smoking.
TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
KOURENI 0.50/0.11 0.45/0.13
DOBAKOUR 0.27/0.24 0.47/0.18 0.30/0.24 0.42/0.19
BYVKURAK 0.13/0.62 0.26/0.51 0.15/0.51 0.23/0.55
Analytical Questions (VI)
Dependencies between physical activity and alcohol consumption.
TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
ALKOHOL 0.27/0.31 0.46/0.23 0.29/0.30 0.41/0.25
PIVO10 0.22/0.39 0.40/0.30 0.24/0.39 0.35/0.33
PIVO12 0.14/0.59 0.29/0.50 0.16/0.59 0.23/0.50
VINO 0.22/0.40 0.40/0.31 0.24/0.39 0.35/0.33
LIHOV 0.22/0.39 0.39/0.30 0.24/0.38 0.35/0.33
PIVOMN 0.27/0.29 0.46/0.21 0.30/0.29 0.42/0.24
VINOMN 0.27/0.31 0.46/0.23 0.28/0.30 0.41/0.24
LIHMN 0.27/0.28 0.46/0.21 0.29/0.27 0.41/0.23
Analytical Questions (VII)
Dependencies between physical activity and physical features.
TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
BMI 0.21/0.41 0.39/0.32 0.23/0.40 0.34/0.34
SYST1 0.27/0.26 0.46/0.19 0.29/0.25 0.42/0.21
DIAST1 0.25/0.29 0.44/0.22 0.28/0.29 0.39/0.23
SYST2 0.27/0.25 0.47/0.18 0.29/0.24 0.42/0.20
DIAST2 0.25/0.29 0.45/0.22 0.27/0.29 0.39/0.24
Analytical Questions (VIII)
Dependencies between physical activity and cholesterol levels.
TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
CHLST 0.28/0.24 0.47/0.17 0.30/0.23 0.42/0.19
TRIGL 0.49/0.13 0.45/0.14
Analytical Questions (IX)
Dependencies between alcohol consumption and physical features.
BMI SYST1 DIAST1 SYST2 DIAST2
ALKOHOL 0.40/0.24 0.25/0.30 0.28/0.29 0.24/0.31 0.28/0.29
PIVO10 0.35/0.33 0.21/0.39 0.38/0.24 0.20/0.40 0.24/0.38
PIVO12 0.25/0.52 0.14/0.60 0.16/0.59 0.13/0.60 0.17/0.58
VINO 0.35/0.32 0.21/0.40 0.24/0.38 0.20/0.40 0.24/0.38
LIHOV 0.35/0.33 0.21/0.40 0.24/0.38 0.20/0.40 0.24/0.38
PIVOMN 0.41/0.23 0.25/0.28 0.29/0.27 0.25/0.29 0.29/0.27
VINOMN 0.40/0.24 0.25/0.30 0.28/0.28 0.24/0.30 0.28/0.28
LIHMN 0.41/0.22 0.25/0.28 0.29/0.27 0.24/0.28 0.29/0.27
Analytical Questions (X)
Dependencies between alcohol consumption and smoking.
KOURENI DOBAKOUR BYVKURAK
ALKOHOL 0.23/0.30 0.61/0.15
PIVO10 0.13/0.44 0.20/0.40 0.56/0.22
PIVO12 0.08/0.65 0.13/0.60 0.44/0.40
VINO 0.13/0.44 0.20/0.40 0.56/0.22
LIHOV 0.13/0.44 0.20/0.40 0.56/0.22
PIVOMN 0.23/0.28 0.61/0.14
VINOMN 0.23/0.30 0.61/0.15
LIHMN 0.24/0.28 0.62/0.14
Analytical Questions (XI)
Dependencies between skin folds and BMI:
$[TRIC] \rightarrow [BMI]$, supp 15.85%, CF 0.54
$[SUBSC] \rightarrow [BMI]$, supp 17.28%, CF 0.58
Concluding Remarks
FADs allow us to discover relations within imprecise or uncertain data.
Expert aid is desirable for:
  Data preprocessing.
  Results interpretation.