
Page 1 (Publish Cody Fields, Modified 2 years ago)

$V(d) \equiv \mathrm{Var}\,\mathrm{DPP}_d(X) = \overline{(X\circ d)^2} - \big(\overline{X\circ d}\big)^2$

$\quad = \frac{1}{N}\sum_{i=1}^{N}\Big(\sum_{j=1}^{n} x_{i,j}d_j\Big)^2 - \Big(\sum_{j=1}^{n}\bar X_j d_j\Big)^2$

$\quad = \frac{1}{N}\sum_{i}\Big(\sum_{j} x_{i,j}d_j\Big)\Big(\sum_{k} x_{i,k}d_k\Big) - \Big(\sum_{j}\bar X_j d_j\Big)\Big(\sum_{k}\bar X_k d_k\Big)$

$\quad = \frac{1}{N}\sum_{i}\Big[\sum_{j} x_{i,j}^2 d_j^2 + 2\sum_{j<k} x_{i,j}x_{i,k}d_j d_k\Big] - \Big[\sum_{j}\bar X_j^2 d_j^2 + 2\sum_{j<k}\bar X_j\bar X_k d_j d_k\Big]$

$\quad = \sum_{j=1}^{n}\big(\overline{X_j^2}-\bar X_j^2\big)d_j^2 + 2\sum_{j<k}\big(\overline{X_jX_k}-\bar X_j\bar X_k\big)d_j d_k .$

Writing $a_{jk}=\overline{X_jX_k}-\bar X_j\bar X_k$ (the covariance of attribute columns $j$ and $k$, so $A=(a_{jk})$ is the covariance matrix $V_X$):

$V(d)=\sum_{j}a_{jj}d_j^2+2\sum_{j<k}a_{jk}d_jd_k=\sum_{j,k}a_{jk}d_jd_k=d^T\circ V_X\circ d \equiv V,\qquad \text{subject to }\sum_{i=1}^{n}d_i^2=1 .$

The gradient, componentwise, is

$\nabla V(d)=2\,A\circ d,\qquad (\nabla V(d))_k = 2a_{kk}d_k + 2\sum_{j\ne k}a_{kj}d_j .$

Heuristic 1: take $d_0=e_k$ where $k$ is chosen so that $a_{kk}$ is maximal. Then choose $d_1\equiv\nabla V(d_0)/|\nabla V(d_0)|$, $d_2\equiv\nabla V(d_1)/|\nabla V(d_1)|$, and so on, until $V(d_m)$ is stable.

The functional itself: with $X$ the $N\times n$ data matrix with rows $x_1,\dots,x_N$ and $d=(d_1,\dots,d_n)^T$ a unit vector,

$X\circ d=\begin{pmatrix}x_1\\x_2\\\vdots\\x_N\end{pmatrix}\circ d=\begin{pmatrix}x_1\circ d\\x_2\circ d\\\vdots\\x_N\circ d\end{pmatrix}=F_d(X)=\mathrm{DPP}_d(X).$
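Heuristic 1 can be sketched in a few lines (a sketch only, assuming numpy; `hill_climb_d` is a hypothetical name). Since the normalized gradient step d ← A∘d/|A∘d| is exactly power iteration on the covariance matrix A, the iteration converges to A's principal eigenvector, i.e. the first principal-component direction.

```python
import numpy as np

def hill_climb_d(X, iters=100, tol=1e-9):
    """Heuristic 1 sketch: d0 = e_k with a_kk maximal, then repeatedly
    replace d by the normalized gradient of V(d) = d^T A d until V(d)
    stabilizes.  The gradient is proportional to A o d, so this is power
    iteration on the covariance matrix A."""
    A = np.cov(X, rowvar=False, bias=True)   # a_jk = cov(X_j, X_k)
    d = np.zeros(X.shape[1])
    d[int(np.argmax(np.diag(A)))] = 1.0      # d0 = e_k, a_kk maximal
    for _ in range(iters):
        g = A @ d                            # gradient direction
        g /= np.linalg.norm(g)
        if np.linalg.norm(g - d) < tol:
            break
        d = g
    return d, float(d @ A @ d)               # (direction, V(d))
```

By construction V(d) returned here equals the variance of the projections X∘d, which is what the heuristic maximizes.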

FAUST CLUSTER

Where are we at? Perfecting FAUST Clustering (using distance-dominated functional gap analysis).

Primary functional is DPPd(x).

Sequence through grid d's until good gap is found? (expensive?)

Heuristic for picking a "great" d?

Optimize the variance of F(X)? Why? Because if there is low dispersion, there can't be lots of large gaps. But, just because there is high dispersion does not mean there IS a large gap (a simple example follows).

The best starting d would be one that maximizes the maximum consecutive difference within the [sorted] array, F(X).
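The "maximum consecutive difference" criterion is easy to operationalize (a sketch; `best_gap` is a hypothetical helper name):

```python
def best_gap(F):
    """Sort the functional values F(X) and return (width, cut) for the
    widest consecutive gap; cutting at `cut` splits the cluster there."""
    s = sorted(F)
    width, cut = max(
        ((hi - lo, (hi + lo) / 2.0) for lo, hi in zip(s, s[1:])),
        key=lambda t: t[0],
    )
    return width, cut
```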

A candidate "good" heuristic is to find the d that maximizes |Mean(F(X)) - Median(F(X))|, but the latter (so far) seems difficult to calculate. We can estimate it with F(VectorOfMedians) = F(VOM), which we can calculate.

Page 2

CONC4150(C,W,FA,A)

d=(0,0,1,0) hill-climbed to (-.33,-.09,.86,.37). F-value histogram (FX, Ct, gp4, under rescaling (F-MN)/8), as F(Ct): 0(1) 2(1) 4(2) 6(2) 8(1) 9(1) 10(1) 11(1) 13(2) 15(1) 17(1) 22(1) 23(3) 24(1) 25(1) 26(3) 27(1) 28(2) 29(2) 30(3) 31(2) 32(1) 33(1) 34(1) 35(2) 36(8) 37(7) 38(7) 39(3) 40(3) 41(3) 42(2) 44(7) 45(3) 46(4) 47(8) 48(7) 49(5) 50(6) 51(10) 52(6) 53(2) 54(6) 55(3) 56(1) 57(3) 59(1) 61(2) 62(2) 63(1) 65(2). The only gap of size ≥ 4 is 17→22.

akk=(104320, 30955, 605471, 4683)

(-.15,-.10,.93,-.32) F(d)=12002
(-.20,-.10,.91,-.35) F(d)=12318
(-.24,-.10,.90,-.36) F(d)=12493
(-.27,-.10,.88,-.37) F(d)=12598

( -.3 ,-.1,.87,-.37) F(d)=12665

___ ___ [0,22) 0L 14M 0H C1

d = hill-climbed unit gradient, starting from (0, 0, 1, 0), F(d)=11376

(-.33,-.09,.86,-.37) F(d)=12708
(-.33,-.09,.86,-.37) F(d)=12736

___ ___ [22,66) 43L 38M 55H C2

akk=(106405, 30207, 613481, 3653). [Flattened hill-climb trace (d1–d4, V(d)) omitted: from d=(0,0,1,0), V=9430, V rises monotonically to 11560 at d=(-0.46,-0.08,0.83,-0.31), then dips at -0.48. Followed by the flattened C2 histogram (C2X, Ct, gp4, (F-MN)/8); F runs 0–60 with its largest gap 15→19.]

___ ___ [0,19) 0L 10M 1H C2.1

___ ___ [19,61) 43L 28M 54H C2.2

akk=(104883, 29672, 618463, 2618). [Flattened hill-climb trace (d1–d4, V(d)) omitted: from d=(0,0,1,0), V=8233, rising to 10256 at d≈(0.99,-0.05,0.08,0.04). Followed by the flattened C2.2 histogram (C2.2X, Ct, gp3, (F-MN)/8); F runs 0–50 with gaps 15→18 and 42→49.]

___ ___ [0,18) 32L 13M 0H C2.2.1

___ ___ [18,40) 11L 10M 47H C2.2.2

___ ___ [40,49) 0L 3M 6H C2.2.3

___ ___ [49,51) 0L 2M 1H C2.2.4

The method fails on CONCRETE4150.

On the next slide I investigate whether that failure might become a success if a different starting point is used.

I will try using d=akk/|akk|
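Forming that starting vector is a one-liner (a sketch, assuming numpy; `diag_start` is a hypothetical name — akk here is the diagonal of the covariance matrix, i.e. the per-attribute variances):

```python
import numpy as np

def diag_start(X):
    """Alternative starting direction from the slide: the normalized
    covariance-matrix diagonal, d = akk/|akk|."""
    akk = np.var(X, axis=0)          # akk = per-column variances
    return akk / np.linalg.norm(akk)
```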

Page 3

CONC4150(C,W,FA,A)

akk=(104320, 30955, 605471, 4683). [Flattened hill-climb trace (d1–d4, V(d)) omitted: from d=(0.17,0.05,0.98,0.01), V=9327, rising to 12773 at d=(-0.40,-0.09,0.84,-0.36). Followed by the flattened FX histogram (Ct, gp4, (F-MN)/8); F runs 0–64 with its largest gap 17→21.]

___ ___ [0,21) 0L 14M 1H C1

CONCRETE4150.

try starting at d=akk/|akk|

From here it fails again.

Page 4

IRIS4150

akk=(3414, 933, 1398, 144)
d1 d2 d3 d4 V(d):
0.90  0.24  0.37  0.04  180
0.43 -0.02  0.87  0.24  411
0.27 -0.05  0.93  0.24  408
0.24 -0.05  0.94  0.23  405
[Flattened FX histogram (Ct, gp4, (F-MN)/8) omitted; F runs 0–68 with its largest gap 15→23.]

starting at d=akk/|akk|

___ ___ [0,23) 50set 0vers 1virg C1

___ ___ [23,69) 0set 50vers 49virg C2

akk=(3925, 824, 2408, 280)
d1 d2 d3 d4 V(d):
0.84  0.18  0.51  0.06  98
0.64  0.14  0.72  0.21  116
0.84  0.18  0.51  0.06  98
0.64  0.14  0.72  0.21  116
0.55  0.13  0.79  0.23  117
0.52  0.13  0.81  0.24  116
C2 histogram (C2X, Ct, gp3, (F-MN)/8) as F(Ct): 0(1) 1(2) 3(1) 7(2) 8(2) 9(1) 10(2) 11(2) 12(2) 13(5) 14(4) 15(2) 16(2) 17(1) 18(4) 19(1) 20(4) 21(6) 22(4) 23(6) 24(4) 25(3) 26(1) 27(3) 28(2) 29(4) 30(3) 31(2) 32(3) 33(3) 34(1) 35(4) 36(1) 37(1) 38(1) 39(1) 40(1) 41(1) 42(1) 45(1) 46(2) 48(2); gaps of size ≥ 3 at 3→7 and 42→45.

___ ___ [0,7) 0set 4vers 0virg C2.1

___ ___ [7,45) 0set 4vers 44virg C2.2 0set 0vers 5virg C2.3

akk=(3894, 828, 2384, 282)
d1 d2 d3 d4 V(d):
0.17 0.05 0.98 0.01  55
0.35 0.10 0.90 0.24  72
0.42 0.12 0.85 0.29  76
[Flattened histogram (C21X, Ct, gp2, (F-MN)/8) omitted; F runs 0–35 with small gaps at 7→9, 19→21 and 21→23.]

___ ___ [0,7) 0set 19vers 1virg C2.2.1

___ ___ [7,21) 0set 32vers 13virg C2.2.2
___ ___ [21,23) 0set 0vers 4virg C2.2.3

___ ___ [23,36) 0set 0vers 27virg C2.2.4

akk=(3939, 1024, 3988, 232)
d1 d2 d3 d4 V(d):
0.69 0.18 0.70  0.04   8063
0.06 0.12 0.99 -0.01  12880
0.05 0.11 0.99 -0.01  12848
0.05 0.11 0.99 -0.01  12847

[Flattened histogram (C21X, Ct, gp2, (F-MN)/8) omitted; F runs 0–88 with its largest gap 0→13.]

___ ___ [0,7) 0set 5vers 0virg C2.2.1

___ ___ [7,35) 0set 8vers 1virg C2.2.1

___ ___ [35,60) 0set 14vers 0virg C2.2.1

___ ___ [60,69) 0set 5vers 1virg C2.2.1

___ ___ [69,83) 0set 2vers 6virg C2.2.1

___ ___ [83,89) 0set 2vers 5virg C2.2.1

Page 5

Worked example: eight candidate F(X) columns, each a sorted array of 11 values (the median is the 6th value, marked row):

          c1  c2  c3  c4  c5  c6  c7  c8
           0   0   0   0   0   0   0   0
           1   0   5   0   0   0   0   0
           2   0   5   2   0   0   0   0
           3   0   5   2   3   0   0   0
           4   0   5   4   3   6   0   0
  median:  5   0   5   4   3   6   9   0
           6   0   5   6   6   6   9  10
           7   0   5   6   6   6   9  10
           8   0   5   8   6   9   9  10
           9   0   5   8   9   9   9  10
          10  10  10  10  10  10  10  10

  std       3.16 2.87 2.13 3.20 3.35 3.82 4.57 4.98
  variance  10.0  8.3  4.5 10.2 11.2 14.6 20.9 24.8
  mean      5.00 0.91 5.00 4.55 4.18 4.73 5.00 4.55

Consecutive differences (between adjacent sorted values):

           1   0   5   0   0   0   0   0
           1   0   0   2   0   0   0   0
           1   0   0   0   3   0   0   0
           1   0   0   2   0   6   0   0
           1   0   0   0   0   0   9   0
           1   0   0   2   3   0   0  10
           1   0   0   0   0   0   0   0
           1   0   0   2   0   3   0   0
           1   0   0   0   3   0   0   0
           1  10   5   2   1   1   1   0

  avgCD          1.00  1.00 1.00 1.00 1.00 1.00 1.00  1.00
  maxCD          1.00 10.00 5.00 2.00 3.00 6.00 9.00 10.00
  |mean-median|  0.00  0.91 0.00 0.55 1.18 1.27 4.00  4.55
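The table can be checked mechanically (a sketch, assuming numpy): the |mean - median| score ranks the last two columns highest, while variance alone ranks column 3 (max consecutive difference 5) below columns 4 and 5 (max consecutive differences only 2 and 3) — high dispersion without a large gap.

```python
import numpy as np

# The eight example columns from the table above (11 sorted F values each).
cols = np.array([
    [ 0,  0,  0,  0,  0,  0,  0,  0],
    [ 1,  0,  5,  0,  0,  0,  0,  0],
    [ 2,  0,  5,  2,  0,  0,  0,  0],
    [ 3,  0,  5,  2,  3,  0,  0,  0],
    [ 4,  0,  5,  4,  3,  6,  0,  0],
    [ 5,  0,  5,  4,  3,  6,  9,  0],   # median row (6th of 11 values)
    [ 6,  0,  5,  6,  6,  6,  9, 10],
    [ 7,  0,  5,  6,  6,  6,  9, 10],
    [ 8,  0,  5,  8,  6,  9,  9, 10],
    [ 9,  0,  5,  8,  9,  9,  9, 10],
    [10, 10, 10, 10, 10, 10, 10, 10],
])
max_cd = np.diff(cols, axis=0).max(axis=0)                     # max consecutive difference
mean_med = np.abs(cols.mean(axis=0) - np.median(cols, axis=0)) # |mean - median|
```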

$\mathrm{Mean}(\mathrm{DPP}_d X)=\sum_{j=1}^{n}\bar X_j d_j$

Finding a good unit vector, d, for the Dot Product functional, DPP, to maximize gaps.

$\mathrm{Mean}(\mathrm{DPP}_d X)=\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n} x_{i,j} d_j$

With $X$ the $N\times n$ matrix with columns $X_1,\dots,X_j,\dots,X_n$, rows $x_1,\dots,x_i,\dots,x_N$ and entries $x_{i,j}$, and $d=(d_1,\dots,d_n)^T$:

$X\circ d=\begin{pmatrix}x_1\circ d\\x_2\circ d\\\vdots\\x_i\circ d\\\vdots\\x_N\circ d\end{pmatrix}=\mathrm{DPP}_d(X)$

subject to $\sum_{i=1}^{n} d_i^2=1$

Maximize w.r.t. d: $\left|\mathrm{Mean}(\mathrm{DPP}_d(X))-\mathrm{Median}(\mathrm{DPP}_d(X))\right|$

$\mathrm{Mean}(\mathrm{DPP}_d X)=\sum_{j=1}^{n}\Big(\frac{1}{N}\sum_{i=1}^{N} x_{i,j}\Big) d_j$

But how do we compute Median(DPP_d(X))? We want to use only pTree processing. We want to end up with a formula involving d and numbers only (like the one above for the mean, which involves only the vector d and the numbers X̄_1, …, X̄_n).

A heuristic is to substitute the projection of the Vector of Medians (VOM) for Median(DPP_d(X)).
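Both points can be made concrete (a sketch, assuming numpy; the function names are hypothetical): the mean of the projections is exactly the projection of the column means, while the VOM projection is only an estimate of the median of the projections.

```python
import numpy as np

def mean_dpp(X, d):
    """Mean(DPP_d(X)) = sum_j Xbar_j d_j: only the n column means
    (pTree-computable) and d are needed, never the N projections."""
    return float(X.mean(axis=0) @ d)

def median_dpp_estimate(X, d):
    """Heuristic stand-in for Median(DPP_d(X)): project the Vector of
    Medians, F(VOM) = VOM o d.  Unlike the mean identity this is only an
    estimate: the median of projections need not equal the projection of
    the column-wise medians."""
    return float(np.median(X, axis=0) @ d)
```

The second test case below shows the estimate disagreeing with the exact median, which is why it is only a heuristic.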

Should we maximize variance?

MEAN-MEDIAN picks out the last two sequences, which have the best gaps (discounting outlier gaps at the extremes).

Sooo...

Page 6

[Flattened DPP histogram (F, Ct) on IRIS150 omitted; F runs 0–120 with its largest gap 76→100 (24) and gaps of 6 at 7→13 and 70→76.]

_________ [0,88) 50 versicolor 48 virginica CLUS_1

50 setosa 2 virginica CLUS_2

VOM→mean on IRIS150(SL,SW,PL,PW); F values spread out using G=(F-minF)*2 (since F-minF ranges over [0,60]).

Splitting at the MaxGap=24 first, [76,100]

Split at next hi MaxGap=6, [7,13], [70,76] (outliers)

Split at thinning [39,44]

_________ [0,39) 2 versicolor 44 virginica CLUS_1.1

5 versicolor 3 virginica CLUS_1.2
[45,76) 42 versicolor 1 virginica CLUS_1.3

[Flattened histogram (F, Ct) omitted; F runs 10–69 with its largest gap 19→30.]

_________ [0,25) 50 setosa 1 virginica CLUS1

CLUS2

Choosing d=e_k, with k the attribute of max std (here d=e3 = e_PL on IRIS150).

_________[25,49) 46 versicolor 2 virginica CLUS2.1

CLUS2.2

_________[49,54) 4 versicolor 17 virginica CLUS2.2.1

CLUS2.2.2

_________[54,70) 4 versicolor 30 virginica

d=VOMMN DPP on IRIS_150_SEI_(SL,SW,PL,PW). CLUS1.1 F[0,39] 44_Virginica with 2_Versicolor errors; CLUS1.3 F[45,76] 42_Versicolor with 1_Virginica error; CLUS2 F[80,120] 50_Setosa with 2_Virginica errors; and CLUS1.2 F[39,44], if classified as 5_Versicolor has 3_Virginica errors. So the classification accuracy is 142/150 or 94.6%

CLUS2.2 F[49,69] 47_Virginica with 4_Versicolor errors; CLUS2.1 F[25,48] 46_Versicolor with 2_Virginica errors; CLUS1 F[0, 25] 50_Setosa with 1 Virginica error. So the classification accuracy is 143/150 or 95.3%

IRIS

Page 7

[Flattened DPP histogram (F, Ct) on WINE150hl omitted; F runs 0–119 with its largest gap 88→99.]

_________ [0,94) 57 low 89 high CLUS_1

0 low 4 high CLUS_2

VOMmean on WINE150hl(FA,FS,TS,AL)

_________ [0,78) 57 low 82 high CLUS_1.1

[78,94) 0 low 7 high CLUS1.2

[Flattened histogram (F, Ct) omitted; F runs 7–119 with its largest gap 89→100.]

_________ [0,95) 57 low 89 high CLUS1; 4 high CLUS2

STDs=(1.9, 9, 23, 1.2); maxSTD=23, so d=e_TS on WN150hl(FA,FS,TS,AL).

_________ [0,70) 57 low 80 high CLUS_1.1.1; [70,78) 0 low 2 high CLUS 1.1.2

_________ [0,58) 57 low 75 high CLUS_1.1.1.1; [58,70) 0 low 5 high CLUS 1.1.1.2

_________ [0,31) 57 low 47 high CLUS_1.1.1.1.1; [31,58) 0 low 28 high CLUS 1.1.1.1.2

_________ [0,16) 57 low 12 high CLUS 1.1.1.1.1; [16,31) 0 low 35 high CLUS 1.1.1.1.2

_________ [0,10) 42 low 0 high CLUS_1.1.1.1.1.1; [10,16) 15 low 12 high CLUS 1.1.1.1.1.2

d=VOM→MEAN DPP on WINE_150_HL_(FA,FSO2,TSO2,ALCOHOL). Some agglomeration required: CLUS1.1.1.1.1.1 is LOW quality F[0,10); else HIGH quality F[13,119] with 15 LOW errors. Classification accuracy = 90% (had it been cut at 13, 99.3% accuracy!).

_________ [0,80) 57 low 89 high CLUS1.1; [80,95] 7 high CLUS1.2

_________ [0,72) 57 low 89 high CLUS1.1.1; [72,80] 2 high CLUS1.1.2

_________ [0,60) 57 low 75 high CLUS_1.1.1.1; [60,72) 0 low 5 high CLUS 1.1.1.2

_________ [0,33) 57 low 44 high CLUS_1.1.1.1.1; [33,60) 0 low 31 high CLUS 1.1.1.1.2

_________ [0,22) 57 low 12 high CLUS_1.1.1.1.1; [22,33) 0 low 32 high CLUS 1.1.1.1.2

_________ [0,16) 42 low CLUS 1.1.1.1.1.1; [16,22) 15 low 12 high CLUS 1.1.1.1.1.2

_________ [0,19) 56 low; [19,22) 1 low 12 high. But no algorithm would pick 19 as a cut!
_________ [0,13) 56 low

[13,16) 1 low 12 high. But no algorithm would pick a 13 cut!

Identical cuts and accuracy! Tells us that d=eTotal_SO2 is responsible for all separation.

WINE

Page 8

[Flattened DPP histogram (F, Ct) on SEED4150 omitted; F runs 0–88 with its largest gap 60→64.]

_________ [0,62) 50 Kama 16 Rosa 50 Canada CLUS_1; [62,89) 0 Kama 34 Rosa 0 Canada CLUS_2

VOMmean w G=DPP(xod*10) SEED4150(AREA,LENKER,ASYMCOEF,LENKERGRV)

F  Ct
11 18
12 25
13 18
14 18
15 15
16 13
17  8
18  8
19 21
20  2
21  4

STDs=(2.9, .6, 1.6, .5); maxSTD=2.9, so d=e1.

d=eA SEED4150(A,LK,AC,LCG)

11 errors, so accuracy = 93%

But that's the only thinning! Therefore, we are unable to separate Kama and Canada at all.

_________ [0,19) 0 Kama 0 Rosa 33 Canada CLUS_1.1; [19,62) 50 Kama 16 Rosa 17 Canada CLUS_1.2
_________ [19,23) 0 Kama 0 Rosa 11 Canada CLUS_1.2.1; [23,62) 50 Kama 16 Rosa 6 Canada CLUS_1.2.2

_________ [23,30) 6 Kama 0 Rosa 4 Canada CLUS_1.2.2.1; [30,62) 44 Kama 16 Rosa 2 Canada CLUS_1.2.2.2
_________ [30,33) 5 Kama 0 Rosa 1 Canada CLUS_1.2.2.2.1; [33,62) 39 Kama 16 Rosa 1 Canada CLUS_1.2.2.2.2
_________ [33,36) 6 Kama 0 Rosa 1 Canada CLUS_1.2.2.2.2.1; [36,62) 33 Kama 16 Rosa 0 Canada CLUS_1.2.2.2.2.2

_________[36,45) 18 Kama 2 Rosa 0 Canada CLUS_1.2.2.2.2.2.1 [45,62) 15 Kama 14 Rosa 0 Canada CLUS_1.2.2.2.2.2.2

_________ [45,50) 8 Kama 1 Rosa 0 Canada CLUS_1.2.2.2.2.2.2.1
_________ [50,52) 0 Kama 3 Rosa 0 Canada CLUS_1.2.2.2.2.2.2.2.1

_________ [52,55) 3 Kama 2 Rosa 0 Canada CLUS_1.2.2.2.2.2.2.2.2.1

_________ [55,58) 3 Kama 3 Rosa 0 Canada CLUS_1.2.2.2.2.2.2.2.2.2.1; [58,62) 1 Kama 5 Rosa 0 Canada CLUS 1.2.2.2.2.2.2.2.2.2.2

_________ [0,17) 49 Kama 8 Rosa 50 Canada CLUS_1; [17,22) 1 Kama 42 Rosa 0 Canada CLUS_2

[0,14) 1 Kama 42 Canada. But no algorithm would pick 14 as a cut!

[13,14) 10 Kama 8 Canada. That's either 8 or 10 errors, and no algorithm would cut at 14.

[14,15) 18 Kama. But no algorithm would cut at 15.

[15,16) 13 Kama 2 Rosa, no algorithm would cut. [16,17) 7 Kama 6 Rosa

SEEDS

Page 9

_______ . =24 0 Lo 8 Med 0 Hi CLUS_5

[Flattened DPP histogram (F, Ct) on Concrete4150 omitted; F runs 0–112 with its largest gap 86→100 (14).]

_________ [0,90) 43 Low 46 Medium 55 High CLUS_1; [90,113) 0 Low 6 Medium 0 High CLUS_2

VOM→mean with F=(DPP-MN)/4 on Concrete4150(C, W, FA, Ag). Second histogram (for d=e1, below) as F(Ct): 0(4) 6(3) 7(2) 12(10) 13(2) 14(3) 18(9) 20(4) 22(5) 23(3) 24(5) 27(3) 31(3) 36(4) 41(2) 42(1) 43(4) 44(3) 46(3) 48(2) 49(2) 55(13) 58(8) 60(6) 62(5) 65(4) 71(16) 72(4) 74(3) 82(4) 83(7) 97(2) 100(1).

STD=(101,28,99,81) d=e1 Conc4150

Entirely inconclusive using e1 !

______ [52,90) 0 Low 11 Medium 0 High CLUS_1.2

. [39,52) 0 Low 12 Medium 1 High CLUS_1.1.2

[36,39) 1 Low 1 Medium 2 High CLUS_1.1.1.2

[31,36) 3 Low 7 Medium 11 High CLUS_1.1.1.1.2

[23,31) 25 Low 4 Medium 19 High CLUS_1.1.1.1.1.2

[18,23) 6 Low 1 Medium 13 High CLUS_1.1.1.1.1.1.2

[14,18) 5 Low 2 Medium 4 High CLUS_1.1.1.1.1.1.1.2

_________[0,9) 2 Low 5 Medium 5 High CLUS_1.1.1.1.1.1.1.1.1 [9,14) 1 Low 3 Medium 5 High CLUS_1.1.1.1.1.1.1.1.2

________ [0,80) 43 Low 46 Medium 55 High CLUS_1; [80,101) 0 Low 7 Medium 7 High CLUS_2

______ [52,80) 4 Low 17 Medium 38High CLUS_1.2

. [39,52) 5 Low 5 Medium 7 High CLUS_1.1.2

[31,39) 1 Low 3 Medium 3 High CLUS_1.1.1.2

[16,31) 18 Low 11 Medium 0 High CLUS_1.1.1.1.2

[9,16) 8 Low 7 Medium 0 High CLUS_1.1.1.1.1.2

F(Ct): 0(15) 3(2) 5(4) 17(3) 19(8) 21(1) 29(3) 41(28) 46(3) 47(8) 48(3) 52(4) 53(15) 58(3) 62(4) 63(4) 64(1) 65(7) 67(3) 69(4) 72(3) 73(12) 75(2) 78(5) 83(1) 100(4)

d=e3 Conc4150

________ [0,32) 4 Low 24 Medium 8 High CLUS_1; [32,101) 39 Low 28 Medium 47 High CLUS_2

________ [0,9) 3 Low 16 Medium 2 High CLUS_1.1; [9,32) 1 Low 8 Medium 6 High CLUS_1.2

________ [32,55) 21 Low 12 Medium 28 High CLUS_2.1 [55,101) 1 Low 8 Medium 6 High CLUS_2.2

Inconclusive on e2!

F(Ct): 0(17) 1(11) 3(12) 6(35) 13(25) 22(25) 24(8) 44(7) 67(4) 89(2) 91(4)

d=e4 Conc4150

________ =67 0 Lo 4 Med 0 Hi CLUS_1
________ =89 0 Lo 4 Med 0 Hi CLUS_2
________ =91 0 Lo 4 Med 0 Hi CLUS_4

________=44 0 Lo 7 Med 0 Hi CLUS_3

_______ =22 0 Lo 6 Med 19 Hi CLUS_6
________ =13 3 Lo 3 Med 19 Hi CLUS_7
________ =6 13 Lo 5 Med 17 Hi CLUS_8
________ =3 12 Lo 0 Med 0 Hi CLUS_9
________ =1 2 Lo 9 Med 0 Hi CLUS_10
________ =0 13 Lo 4 Med 0 Hi CLUS_11

e4 accuracy rate = 104/150 = 69%

F(Ct): 0(2) 3(4) 8(4) 10(2) 12(8) 13(4) 15(4) 16(14) 17(3) 18(1) 19(3) 20(8) 22(15) 24(4) 27(9) 29(3) 30(6) 31(3) 32(3) 33(7) 34(4) 35(8) 37(5) 38(3) 53(23)

d=e2 Conc4150

________=53 1 Lo 22 Med 0 Hi CLUS_1

________ [0,5) 3 Lo 1 Med 2 Hi CLUS_2
________ =8 0 Lo 1 Med 3 Hi CLUS_3
________ =10 0 Lo 2 Med 0 Hi CLUS_4

________[10,14) 1 Lo 4 Med 9 Hi CLUS_5

________[14,21) 9 Lo 9 Med 15 Hi CLUS_6

________ [21,25) 3 Lo 1 Med 15 Hi CLUS_7
________ [25,28) 5 Lo 1 Med 3 Hi CLUS_8

________ [28,36) 14 Lo 12 Med 8 Hi CLUS_9; [36,40) 7 Lo 1 Med 0 Hi CLUS_10

e2 accuracy=93/150=62%

VOM2,4MN2,4

F(Ct): 0(2) 1(5) 2(6) 3(12) 4(2) 5(1) 6(6) 7(6) 8(1) 9(4) 10(12) 11(11) 12(5) 13(3) 18(2) 19(9) 20(10) 21(4) 29(1) 30(4) 31(9) 32(4) 34(9) 35(4) 36(1) 62(2) 64(5) 93(4) 121(2) 125(4)

[93,m) 0L 10M 0H C1

[50,93) 0L 7 M 0H C2

[33,50) 0L 14 M 0H C3

[25,33) 0L 0 M 18H C4

[15,25) 2L 4 M 19H C5

=13 0L 3M 0H C6
=12 5L 0M 0H C7
=11 5L 2M 4H C8

[9,11) 3L 0 M 13H C9

[5,9) 14L 0M 0H C10

=4 0L 2M 0H C11

[2,4) 11L 6M 1H C12

=1 1L 4M 0H C13
=0 2L 0M 0H C14

Accuracy= 127/150=85%

CONCRETE

Page 10

Concrete4150(C, W, FA, Ag): Redo without cheating ;-) Even though the accuracy is high, no algorithm would make all of those cuts. VOM2,4MN2,4

F(Ct): 0(2) 1(5) 2(6) 3(12) 4(2) 5(1) 6(6) 7(6) 8(1) 9(4) 10(12) 11(11) 12(5) 13(3) 18(2) 19(9) 20(10) 21(4) 29(1) 30(4) 31(9) 32(4) 34(9) 35(4) 36(1) 62(2) 64(5) 93(4) 121(2) 125(4)

[78,m) 0L 10M 0H C1 (gap=29)

[50,78) 0L 7 M 0H C2 (gap=26)

[25,50) 0L 14 M 18H C3 (gap=8)

[15,25) 2L 4 M 19H C4 (gap=5)

Cut only at gaps ≥ 5 on the first round. Then we iteratively repeat on each subcluster. C1 and C2 accuracy=100%, so we skip them and concentrate on C3, C4, C5 to see if a second round will purify them. Start with C5: (F-MN)/4.

[0,15) 41L 17 M 18H C5

F(Ct): 0(2) 6(3) 7(2) 12(6) 13(2) 14(2) 18(1) 19(3) 21(3) 23(4) 24(2) 25(1) 26(3) 27(1) 29(2) 33(4) 37(1) 41(2) 42(1) 43(2) 44(3) 46(1) 49(2) 54(3) 58(2) 60(3) 61(2) 66(3) 69(1) 77(4) 78(1) 87(1) 99(2) 110(1)
=110 0L 0M 1H (outlier)

= 99 0L 0M 2H (outliers)= 87 0L 0M 1H (outlier)

_______ [77,83) 0L 0M 5H C1

=0 2L 0M 0H (outliers)

_______[1,10) 5L 0M 0H C2

________ [63,73) 0L 0M 4H C3

________ [52,66) 0L 5M 5H C4

[56,59) 0L 2M 0H; [59,66) 0L 0M 5H. (These will show up when we get to gaps of 4 and 2, i.e. actual gap sizes 16 and 8.)

[52,56) 0L 3M 0H '

_______[10,16) 10L 0M 0H C5

= 33 4L 0M 0H c6= 37 4L 0M 0H c7

________ [48,52) 0L 2M 0H C3

________ [45,48) 0L 1M 0H outlier

________ [39,45) 0L 8M 0H c8

_______[19,20) 3L 0M 0H C9

_______[20,32) 16L 0M 0H C10

Accuracy=100% on C5!

C3 next: (F-MN)/4. F(Ct): 0(2) 7(1) 11(1) 13(1) 16(1) 18(1) 21(1) 22(2) 23(1) 25(1) 31(2) 32(1) 34(1) 36(1) 41(2) 47(3) 50(1) 54(1) 68(1) 70(1) 72(1) 94(1) 107(1) 119(1) 121(1) 126(1)

________ [83,127) 5M 0H C1

________ [61,83) 2M 1H C2 (68=H, separated at gap=2)

=0 2M 0H (outliers)

________ [44,61) 1M 4H C3 (50=M, separated at gap=3)

________ [31,44) 1M 6H C4 (one 31=M so 1 error!)

=7 0M 1H (outliers)

_______[9,15) 2M 0H C10

_______[15,20) 0M 2H C10

=25 1M 0H (outliers)
_______ [20,24) 0M 4H C10

Accuracy=100% on C1 and C2! 1 error on C1,2,3,5

C4 next: (F-MN)/3. F(Ct): 0(1) 5(1) 10(2) 11(1) 13(1) 18(1) 21(1) 27(1) 29(1) 33(1) 35(1) 36(1) 40(1) 42(2) 47(1) 49(1) 51(1) 58(1) 59(1) 64(1) 66(1) 68(1) 117(1)
=117 0L 1M 0H outlier

_______ [55,83) 1L 3M 1H C1 {58,59} are the L and H: doubleton outlier set bdd by gaps of 7 and 5

_______ [0,24) 0L 0M 8H C2

_______ [45,55) 0L 0M 3H C3

_______ [38,45) 0L 0M 3H C4

_______ [24,31) 1L 0M 1H C5 (29=L outlier with gaps 2,4) [31,38) 0L 0M 3H C6

I need to redo this using all 4 attributes.

Another issue is: How can we follow this with an agglomeration step which might glue the intra-class subclusters back together?

Agglomerate after FAUST Gap Clustering using "separation of the subcluster medians" [or means?] as the measure?!?!

So there is but 1 error (in the C3 step), for an accuracy of 149/150 = 99.3%. However, I realized I am still cheating ;-( How would I know to use VOM2,4MN2,4 as the first round instead of VOM1,2,3,4MN1,2,3,4?

CONCRETE

Page 11

[Flattened DPP histogram (F, Ct) on Concrete4150 omitted; F runs 0–112 with its largest gap 86→100 (14).]

________ [0,90) 43L 46M 55H gap=14; [90,113) 0L 6M 0H CLUS_1

VOMmean w F=(DPP-MN)/4 Concrete4150(C, W, FA, Ag)

______ gap=6 [74,90) 0L 4M 0H CLUS_2

Redo with all 4 attributes and Fgap5 (which is actually gap=5*4=20).

______ CLUS 4 gap=7 [52,74) 0L 7M 0H CLUS_3

CLUS 4 (F=(DPP-MN)/2, Fgap2). F(Ct): 0(3) 7(4) 9(1) 10(12) 11(8) 12(7) 15(4) 18(10) 21(3) 22(7) 23(2) 25(2) 26(3) 27(1) 28(2) 29(1) 31(3) 32(1) 34(2) 40(4) 47(3) 52(1) 53(3) 54(3) 55(4) 56(2) 57(3) 58(1) 60(2) 61(2) 62(4) 64(4) 67(2) 68(1) 71(7) 72(3) 79(5) 85(1) 87(2)

______ gap=7 =79 5L 0M 0H CLUS_4.1.1 gap=6 Median=79 Avg=79 [74,90) 2L 0M 1H CLUS_4.1 1 Merr in L Median=87 Avg=86.3

_______ =0 0L 0M 3H CLUS 4.4.1 gap=7 Median=0 Avg=0 =7 0L 0M 4H CLUS 4.4.2 gap=2 Median=7 Avg=7 [8,14] 1L 5M 22H CLUS 4.4.3 1L+5M err H Median=11 Avg=10.7

gap=2 [30,33] 0L 4M 0H CLUS 4.2.1 gap=2 Median=31 Avg=32.3; =34 0L 2M 0H CLUS 4.2.2 gap=6 Median=34 Avg=34; ______ =40 0L 4M 0H CLUS_4.2.3 gap=7 Median=40 Avg=40; =47 0L 3M 0H CLUS_4.2.4 gap=5 Median=47 Avg=47

gap=3 [70,79) 10L 0M 0H CLUS_4.5 Median=71 Avg=71.7

gap=2______ =64 2L 0M 2H CLUS 4.6.1 gap=3 Median=64 Avg=64 2 H errs in L [66,70) 10L 0M 0H CLUS 4.6.2 Median=67 Avg=67.3

gap=3______ =15 0L 0M 4H CLUS 4.3.1 gap=3 Median=15 Avg=15 =18 0L 0M 10H CLUS 4.3.2 gap=3 Median=18 Avg=18

Accuracy=90%

______ [20,24) 0L 10M 2H CLUS 4.7.2 gap=2 Median=22 Avg=22 2H errs in L [24,30) 10L 0M 0H CLUS_4.7.1 Median=26 Avg=26

______ [50,59) 12L 1M 4H CLUS 4.8.1 gap=2 Median=55 Avg=55 1M+4H errs in L [59,63) 8L 0M 0H CLUS_4.8.2 Median=61.5 Avg=61.3

Agglomerate (build the dendrogram) by iteratively gluing together the clusters with minimum median separation. Should I have normalized the rounds? Should I have used the same F divisor and made sure the range of values was the same in the 2nd round as in the 1st round (on CLUS 4)? Can I normalize after the fact, by multiplying 1st-round values by 100/88 = 1.76? Or agglomerate the 1st-round clusters and then independently agglomerate the 2nd-round clusters?

C1 C2 C3 C4

_____________At this level, FinalClus1={17M} 0 errors

(Subcluster medians at this level: 62, 33, 17, 71, 23, 21, 9, 34, 57, 86, 71, 10, 56, 14, 61, 18, 40.)

Let's review agglomerative clustering in general next (dendrograms).

CONCRETE

Page 12

Hierarchical Clustering

Any maximal anti-chain (a maximal set of nodes in which no two are directly connected) is a clustering (a dendrogram offers many).

(Example dendrogram, drawing lost: leaves A, B, C, D, E, F, G; internal nodes BC, DE, FG; root DEFGABC.)
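The anti-chain condition is easy to state as code (a sketch; the grouping of the slide's nodes into ABC and DEFG under the root is an assumption, since the drawing itself is lost):

```python
def leafset(tree, node):
    """Leaves under a dendrogram node (leaves are nodes with no children)."""
    if node not in tree:
        return {node}
    return set().union(*(leafset(tree, c) for c in tree[node]))

def is_clustering(tree, root, nodes):
    """A node set is a clustering iff it is a maximal anti-chain:
    its leaf sets are disjoint and cover all leaves under the root."""
    sets = [leafset(tree, n) for n in nodes]
    leaves = leafset(tree, root)
    return sum(len(s) for s in sets) == len(leaves) and set().union(*sets) == leaves

# Hypothetical wiring of the slide's nodes (ABC and DEFG groupings assumed):
tree = {"BC": ["B", "C"], "DE": ["D", "E"], "FG": ["F", "G"],
        "ABC": ["A", "BC"], "DEFG": ["DE", "FG"],
        "DEFGABC": ["ABC", "DEFG"]}
```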

Page 13

Hierarchical Clustering

But the "horizontal" anti-chains are the clusterings resulting from the top-down (or bottom-up) method(s).

Page 14


Suppose we know that we want 3 strength clusters: Low, Medium and High. We can use an anti-chain that gives us exactly 3 subclusters in two ways, one shown in brown and the other in purple. Which would we choose? The brown seems to give slightly more uniform subcluster sizes. Brown error count: Low (bottom) 11, Medium (middle) 0, High (top) 26, so 96/133 = 72% accurate. Purple error count: Low 2, Medium 22, High 35, so 74/133 = 56% accurate. What about agglomerating using single-link agglomeration (minimum pairwise distance)?

CONCRETE

Page 15


Agglomerating using single link: min pairwise distance = min gap size (glue min-gap adjacent clusters first).
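A minimal sketch of that rule on 1-D subcluster medians (`single_link` is a hypothetical name; real FAUST subclusters would carry their point sets along):

```python
def single_link(medians, k):
    """Single-link agglomeration on 1-D subcluster medians: repeatedly
    glue the two adjacent clusters with the minimum gap (min pairwise
    distance) until only k clusters remain."""
    clusters = [[m] for m in sorted(medians)]
    while len(clusters) > k:
        # Gap between each cluster and its right neighbor on the F axis.
        gaps = [nxt[0] - cur[-1] for cur, nxt in zip(clusters, clusters[1:])]
        i = gaps.index(min(gaps))
        clusters[i] += clusters.pop(i + 1)   # glue the min-gap pair
    return clusters
```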

The first thing we notice is that outliers mess up agglomerations that are supervised by knowledge of the number of subclusters expected. Therefore we might remove outliers by backing away from all gap≥5 agglomerations, then looking for a 3-subcluster max anti-chain.

What we have done is to declare F<7 and F>84 as extreme tripleton outlier sets; and F=79, F=40 and F=47 as singleton outlier sets, because they are F-gapped by at least 5 (actually 10) on either side.

The brown gives more uniform sizes. Brown errors: Low (bottom) 8, Medium (middle) 12 and High (top) 6, so 107/133=80% accurate.

The one decision to agglomerate C4.7.1 to C4.7.2 (gap=3) instead of C4.3.2 to C4.7.2 (gap=3) causes lots of error. C4.7.1 and C4.7.2 are problematic since they separate out, but in increasing F order the pattern is H M L M L, so if we suspected this pattern we would look for 5 subclusters.

The 5 orange errors in increasing F-order are: 6, 2, 0, 0, 8 so 127/133=95% accurate.

If you have ever studied concrete, you know it is a very complex material. The fact that it clusters out with an F-order pattern of HMLML is just bizarre! So we should expect errors. CONCRETE

Page 16:

Redo: Weight d with |MN-VOM|/Len. (F, Ct, gap) triples (gap≥5): (0,2,-) (1,2,1) (2,2,1) (24,1,22) (25,2,1) (26,1,1) (45,2,19) (46,2,1) (47,1,1) (48,2,1) (66,7,18) (67,2,1) (68,1,1) (69,3,1) (70,9,1) (71,2,1) (72,7,1) (73,1,1) (75,1,2) (76,1,1) (77,5,1) (78,2,1) (79,7,1) (80,8,1) (81,2,1) (82,1,1) (83,3,1) (84,14,1) (85,1,1) (86,6,1) (87,16,1) (88,7,1) (89,1,1) (90,4,1) (91,6,1) (92,4,1) (93,8,1) (94,3,1) (95,1,1)

___ ___ [0,20) 0L 6M 0H C1

CONCRETE

meanVOM w F=(DPP-MN)/4 Concrete4150(C, W, FA, Ag)

___ ___ [20,30) 0L 4M 0H C2

___ ___ [30,40) 0L 7M 0H C3 C4

C4, (F, Ct, gap) triples (range is ~2 so gap≥3): (0,1,-) (20,1,20) (29,1,9) (36,1,7) (37,1,1) (42,1,5) (44,1,2) (45,1,1) (46,2,1) (49,2,3) (51,1,2) (53,1,2) (54,2,1) (56,1,2) (57,2,1) (58,3,1) (59,3,1) (60,2,1) (61,4,1) (62,5,1) (63,2,1) (64,4,1) (65,2,1) (66,2,1) (67,2,1) (68,4,1) (69,2,1) (70,1,1) (71,3,1) (72,1,1) (74,1,2) (75,1,1) (76,4,1) (77,6,1) (79,1,2) (80,1,1) (81,2,1) (83,6,2) (84,1,1) (85,3,1) (86,1,1) (87,1,1) (88,10,1) (89,1,1) (90,1,1) (91,1,1) (92,1,1) (93,2,1) (94,5,1) (95,3,1) (96,3,1) (97,1,1) (98,1,1) (99,4,1) (100,2,1) (102,2,2) (103,3,1) (107,2,4) (112,1,5) (118,1,6) (122,2,4)

___ ___ [0,30) 0L 3M 0H C41

___ ___ [30,40) 0L 2M 0H C42

___ __ . [120,123) 0L 2M 0H C43

___ __ . [115,120) 0L 0M 1H C44

___ __ . [110,115) 0L 1M 0H C45

___ __ . [105,110) 0L 0M 2H C46

___ ___ [40,43) 0L 0M 1H C47

___ ___ [43,48) 0L 4M 0H C48
___ ___ [48,105) 43L 23M 51H C49

C49, F=(f-mn)/5, (F, Ct, gap) triples (same range, so gap≥3): (0,1,-) (10,1,10) (11,1,1) (12,1,1) (13,3,1) (14,1,1) (17,2,3) (18,1,1) (19,1,1) (20,2,1) (22,4,2) (23,3,1) (24,5,1) (25,4,1) (26,2,1) (27,4,1) (28,1,1) (29,6,1) (30,2,1) (31,2,1) (32,1,1) (33,1,1) (35,1,2) (36,1,1) (37,3,1) (38,2,1) (39,4,1) (40,2,1) (41,2,1) (42,2,1) (43,2,1) (44,5,1) (45,2,1) (46,1,1) (47,3,1) (48,4,1) (49,6,1) (50,3,1) (52,2,2) (54,8,2) (55,2,1) (56,2,1) (57,2,1) (58,1,1) (60,3,2) (61,1,1) (64,2,3) (74,1,10) (76,1,2)

___ ___ [0,5) 0L 1M 0H C491

___ ___ [5,15) 2L 4M 1H C492

___ ___ [15,62) 41L 17M 47H C493
___ ___ [62,70) 0L 0M 2H C494

___ ___ [70,77) 0L 1M 1H C495 1 err

This uncovers the fact that repeated applications of meanVOM can be non-productive, since each application basically removes sets of outliers at the extremes of the F-value array (and when outliers are removed, the VOM may move toward the mean).

Page 17:

[Figure: the 15 Spaeth points z1..zf (labeled 1..9, a..f) plotted on the (x1, x2) grid, with q=z1 and p marked; the coordinates are listed in the Z table on the next page.]

The 15 Count_Arrays

z1 2 2 4 1 1 1 1 2 1

z2 2 2 4 1 1 1 1 2 1

z3 1 5 2 1 1 1 1 2 1

z4 2 4 2 2 1 1 2 1

z5 2 2 3 1 1 1 1 1 2 1

z6 2 1 1 1 1 3 3 3

z7 1 4 1 3 1 1 1 2 1

z8 1 2 3 1 3 1 1 2 1

z9 2 1 1 2 1 3 1 1 2 1

za 2 1 1 1 1 1 4 1 1 2

zb 1 2 1 1 3 2 1 1 1 2

zc 1 1 1 2 2 1 1 1 1 1 1 2

zd 3 3 3 1 1 1 1 2

ze 1 1 2 1 3 2 1 1 2 1

zf 1 2 1 1 2 1 2 2 2 1

The 15 Value_Arrays (one for each q=z1,z2,z3,...)

z1 0 1 2 5 6 10 11 12 14

z2 0 1 2 5 6 10 11 12 14

z3 0 1 2 5 6 10 11 12 14

z4 0 1 3 6 10 11 12 14

z5 0 1 2 3 5 6 10 11 12 14

z6 0 1 2 3 7 8 9 10

z7 0 1 2 3 4 6 9 11 12

z8 0 1 2 3 4 6 9 11 12

z9 0 1 2 3 4 6 7 10 12 13

za 0 1 2 3 4 5 7 11 12 13

zb 0 1 2 3 4 6 8 10 11 12

zc 0 1 2 3 5 6 7 8 9 11 12 13

zd 0 1 2 3 7 8 9 10

ze 0 1 2 3 5 7 9 11 12 13

zf 0 1 3 5 6 7 8 9 10 11

0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 Level0, stride=z1 PointSet (as a pTree mask)

z1 z2 z3 z4 z5 z6 z7 z8 z9 za zb zc zd ze zf

gap: [F=2, F=5]

APPENDIX: Functional Gap Clustering using Fpq(x)=RND[(x-p)o(q-p)/|q-p| - minF] on Spaeth image (p=avg

z13 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0

z12 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1

z11 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0

pTree masks of the 3 z1_clusters (obtained by ORing)

The FAUST algorithm: 1. project onto each pq line using the dot product with the unit vector from p to q.

2. Generate ValueArrays (also generate the CountArray and the mask pTrees). 3. Analyze all gaps and create sub-cluster pTree Masks.
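The three steps can be sketched in plain Python (hypothetical function name; the pTree mask generation of step 3 is replaced here by plain index lists):

```python
# Sketch of the FAUST gap-clustering steps: project onto the p->q line,
# sort the projections (the ValueArray), and cut at every gap >= min_gap.
def faust_gap_clusters(points, p, q, min_gap):
    d = [qi - pi for pi, qi in zip(p, q)]
    norm = sum(di * di for di in d) ** 0.5
    u = [di / norm for di in d]                       # unit vector from p to q
    # F(x) = RND[ (x-p) o (q-p) / |q-p| ]
    F = [round(sum((xi - pi) * ui for xi, pi, ui in zip(x, p, u)))
         for x in points]
    order = sorted(range(len(points)), key=lambda i: F[i])
    clusters, current = [], [order[0]]
    for prev, i in zip(order, order[1:]):
        if F[i] - F[prev] >= min_gap:                 # gap found: cut here
            clusters.append(current)
            current = []
        current.append(i)
    clusters.append(current)
    return clusters                                    # lists of point indices

pts = [(1, 1), (2, 2), (3, 1), (14, 2), (15, 1), (15, 3)]
print(faust_gap_clusters(pts, p=(0, 0), q=(1, 0), min_gap=5))
```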

gap: [F=6, F=10]

[Plot annotations for F with p=MN, q=z1: F=0, F=1, F=2]

Page 18:

Z: z1 (1,1), z2 (3,1), z3 (2,2), z4 (3,3), z5 (6,2), z6 (9,3), z7 (15,1), z8 (14,2), z9 (15,3), za (13,4), zb (10,9), zc (11,10), zd (9,11), ze (11,11), zf (7,8)

F=zod: 11 27 23 34 53 80 118 114 125 114 110 121 109 125 83

p6 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

p5 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0

p4 0 1 1 0 1 1 1 1 1 1 0 1 0 1 1

p3 1 1 0 0 0 0 0 0 1 0 1 1 1 1 0

p2 0 0 1 0 1 0 1 0 1 0 1 0 1 1 0

p1 1 1 1 1 0 0 1 1 0 1 1 0 0 0 1

p0 1 1 1 0 1 0 0 0 1 0 0 1 1 1 1

p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0

p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1

p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0

p3' 0 0 1 1 1 1 1 1 0 1 0 0 0 0 1

p2' 1 1 0 1 0 1 0 1 0 1 0 1 0 0 1

p1' 0 0 0 0 1 1 0 0 1 0 0 1 1 1 0

p0' 0 0 0 1 0 1 1 1 0 1 1 0 0 0 0

&p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0

C=1

p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0

p4 0 1 1 0 1 1 1 1 1 1 0 1 0 1 1

C=2

p4 0 1 1 0 1 1 1 1 1 1 0 1 0 1 1

p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0

C=1

p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0

p4 0 1 1 0 1 1 1 1 1 1 0 1 0 1 1

C=1

p4 0 1 1 0 1 1 1 1 1 1 0 1 0 1 1

&p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0

C=0

p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0

p4 0 1 1 0 1 1 1 1 1 1 0 1 0 1 1

C=2

p4 0 1 1 0 1 1 1 1 1 1 0 1 0 1 1

p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0

C=2

p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0

p4 0 1 1 0 1 1 1 1 1 1 0 1 0 1 1

C=6

p4 0 1 1 0 1 1 1 1 1 1 0 1 0 1 1

&p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1

C=3

p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1

p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1

C=3

p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1

&p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1

C=2

p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1

p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1

C=2

p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1

p5 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0

C=2

p5 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0

p5 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0

C=2

p5 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0

p5 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0

C=8

p5 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0

p5 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0

C=8

p5 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0

p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 C=5

p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0

p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 C=5

p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0

p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 C=5

p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0

p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 C=5

p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0

p6 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 C=10

p6 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

p6 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 C=10

p6 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

p6 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 C=10

p6 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

p6 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 C=10

p6 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

[000 0000, 000 1111] = [0,15] = [0,16) has 1 point, z1. This is a 2^4 thinning. z1od=11 is only 5 units from the right edge, so z1 is not declared an outlier.

Next, we check the min distance from the right edge of the next interval to see if z1's right-side gap is actually ≥ 2^4 (the calculation of the min is a pTree process - no x looping required!)
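A plain-Python sketch of the 2^4 gap test (a stand-in for the pTree computation; it flags singleton outliers only, so doubleton outlier sets like {z6, zf} would need the same test applied to runs of close values):

```python
# Sketch: flag F-values whose nearest neighbors on BOTH sides are more
# than 2^4 = 16 away (the pTree version gets these mins/maxes per
# width-16 interval with bit-slice ANDs instead of loops).
def gap16_outliers(F):
    vals = sorted(set(F))
    outliers = []
    for v in vals:
        left = max((w for w in vals if w < v), default=None)
        right = min((w for w in vals if w > v), default=None)
        if (left is None or v - left > 16) and (right is None or right - v > 16):
            outliers.append(v)
    return outliers

# The z-point projections from this page (F = zod):
F = [11, 27, 23, 34, 53, 80, 118, 114, 125, 114, 110, 121, 109, 125, 83]
print(gap16_outliers(F))   # z5 (F=53) is 19 from 34 and 27 from 80
```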


Gap Revealer, Width = 2^4 = 16

so compute all pTree combinations down to p4 and p4'. d = M-p

[010 0000, 010 1111] = [32,48). z4od=34 is within 2 of 32, so z4 is not declared an anomaly.

[011 0000, 011 1111] = [48,64). z5od=53 is 19 from z4od=34 (>2^4=16) but 11 from 64. However, the next interval [64,80) is empty, so z5 is 27 from its right neighbor. z5 is declared an outlier and we put a subcluster cut through z5.

[100 0000, 100 1111] = [64,80). This is clearly a 2^4 gap.

[Grid plot of the points z1..zf and the mean M in the (x1, x2) plane, axes labeled 0..f.]

[001 0000, 001 1111] = [16,32). The minimum, z3od=23, is 7 units from the left edge, 16, so z1 has only a 5+7=12 unit gap on its right (not a 2^4 gap). So z1 is not declared a 2^4 outlier (it is a 2^4 inlier).

[101 0000 , 101 1111]= [80, 96). z6od=80, zfod=83

[110 0000, 110 1111] = [96,112). zbod=110, zdod=109. So both {z6, zf} are declared outliers (gap > 16 on both sides).

[111 0000, 111 1111] = [112,128): z7od=118, z8od=114, z9od=125, zaod=114, zcod=121, zeod=125. No 2^4 gaps. But we can consult SpS(d^2(x,y)) for actual distances:

X1 X2 dX1X2
z7 z8 1.4
z7 z9 2.0
z7 z10 3.6
z7 z11 9.4
z7 z12 9.8
z7 z13 11.7
z7 z14 10.8

z8 z9 1.4
z8 z10 2.2
z8 z11 8.1
z8 z12 8.5
z8 z13 10.3
z8 z14 9.5

X1 X2 dX1X2
z9 z10 2.2
z9 z11 7.8
z9 z12 8.1
z9 z13 10.0
z9 z14 8.9

z10 z11 5.8
z10 z12 6.3
z10 z13 8.1
z10 z14 7.3

X1 X2 dX1X2
z11 z12 1.4
z11 z13 2.2
z11 z14 2.2

z12 z13 2.2
z12 z14 1.0

z13 z14 2.0

This reveals that there are no 2^4 gaps in this subcluster.

And, incidentally, it reveals a 5.8 gap between {7,8,9,a} and {b,c,d,e}, but that analysis is messy and the gap would be revealed by the next xofM round on this sub-cluster anyway.

Page 19:

Barrel Clustering (this method attempts to build barrel-shaped gaps around clusters): it allows a better fit around convex clusters that are elongated in one direction (not round).

[Figure: a barrel around a cluster on the p-q line, showing a point y, gaps in dot product lengths (projections) on the line, the barrel-cap gap width, and the barrel-radius gap width.]

Exhaustive search for all barrel gaps: it takes two parameters for a pseudo-exhaustive search (exhaustive modulo a grid width): 1. A StartPoint, p (an n-vector, so n-dimensional). 2. A UnitVector, d (an n-direction, so (n-1)-dimensional - a grid on the surface of the sphere in R^n).

Then for every choice of (p,d) (e.g., in a grid of points in R^(2n-1)) two functionals are used to enclose subclusters in barrel-shaped gaps: a. SquareBarrelRadius functional, SBR(y) = (y-p)o(y-p) - ((y-p)od)^2. b. BarrelLength functional, BL(y) = (y-p)od.

Given a p, do we need a full grid of ds (directions)? No! d and -d give the same BL-gaps.

Given d, do we need a full grid of p starting points? No! All p' s.t. p'=p+cd give the same gaps. Hill-climb gap width from a good starting point and direction.
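The two functionals can be sketched directly from their definitions (plain Python, one y at a time, whereas the pTree version computes them for all rows at once):

```python
# Sketch of the two barrel functionals SBR and BL from the text.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def barrel_functionals(y, p, d):
    """d must be a unit vector. Returns (SBR(y), BL(y))."""
    yp = [yi - pi for yi, pi in zip(y, p)]
    bl = dot(yp, d)                    # BarrelLength: (y-p)od
    sbr = dot(yp, yp) - bl * bl        # SquareBarrelRadius: (y-p)o(y-p) - ((y-p)od)^2
    return sbr, bl

# A point 4 along d and 3 off-axis: SBR = 3^2 = 9, BL = 4.
print(barrel_functionals((4, 3), p=(0, 0), d=(1, 0)))
```

Gapping BL then encloses clusters with caps; gapping SBR encloses them with the barrel wall.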

MATH: Need dot product projection length and dot product projection distance (in red).

Dot product projection length of y on f: yo(f/|f|).

The projection-distance vector is y - (yof/fof)f, so its squared length is

( y - (yof/fof)f ) o ( y - (yof/fof)f ) = yoy - 2(yof)^2/fof + (fof)(yof)^2/(fof)^2 = yoy - (yof)^2/fof.

Squared y-on-f projection distance: yoy - (yof)^2/fof.

Squared y-p on q-p projection distance: (y-p)o(y-p) - ((y-p)o(q-p))^2/((q-p)o(q-p)) = yoy - 2yop + pop - ( (yo(q-p) - po(q-p))/|q-p| )^2.

For the dot product length projections (caps) we already needed: (y-p)o(M-p)/|M-p| = ( yo(M-p) - po(M-p) )/|M-p|.

That is, we needed to compute the green constants and the blue and red dot product functionals in an optimal way (and then do the PTreeSet additions/subtractions/multiplications). What is optimal? (minimizing PTreeSet functional creations and PTreeSet operations.)
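A quick numerical check of the projection-distance identity (a sketch, not part of the original slides): yoy - (yof)^2/fof equals the squared length of y minus its projection onto f.

```python
# Numeric check that yoy - (yof)^2/fof == |y - (yof/fof) f|^2.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

y = [3.0, 4.0, 5.0]
f = [1.0, 2.0, 2.0]

lhs = dot(y, y) - dot(y, f) ** 2 / dot(f, f)     # the closed-form distance
c = dot(y, f) / dot(f, f)
perp = [yi - c * fi for yi, fi in zip(y, f)]      # y minus its projection on f
rhs = dot(perp, perp)
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-9
```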

Page 20:

F=(y-M)o(x-M)/|x-M|-mn restricted to a cosine cone on IRIS

x=s1, cone=1/√2

F:Ct: 60:3 61:4 62:3 63:10 64:15 65:9 66:3 67:1 69:2 (total 50)

x=s2, cone=1/√2

F:Ct: 47:1 59:2 60:4 61:3 62:6 63:10 64:10 65:5 66:4 67:4 69:1 70:1 (total 51)

x=s2, cone=.9

F:Ct: 59:2 60:3 61:3 62:5 63:9 64:10 65:5 66:4 67:4 69:1 70:1 (total 47)

x=s2, cone=.1

F:Ct: 39:2 40:1 41:1 44:1 45:1 46:1 47:1 52:1 (i39) | 59:2 60:4 61:3 62:6 63:10 64:10 65:5 66:4 67:4 69:1 70:1 (total 59)

x=e1, cone=.707

F:Ct: 33:1 36:2 37:2 38:3 39:1 40:5 41:4 42:2 43:1 44:1 45:6 46:4 47:5 48:1 49:2 50:5 51:1 52:2 54:2 55:1 57:2 58:1 60:1 62:1 63:1 64:1 65:2 (total 60)

x=i1, cone=.707

F:Ct: 34:1 35:1 36:2 37:2 38:3 39:5 40:4 42:6 43:2 44:7 45:5 47:2 48:3 49:3 50:3 51:4 52:3 53:2 54:2 55:4 56:2 57:1 58:1 59:1 60:1 61:1 62:1 63:1 64:1 66:1 (total 75)

w maxs, cone=.707

F:Ct: 0:2 8:1 10:3 12:2 13:1 14:3 15:1 16:3 17:5 18:3 19:5 20:6 21:2 22:4 23:3 24:3 25:9 26:3 27:3 28:3 29:5 30:3 31:4 32:3 33:2 34:2 35:2 36:4 37:1 38:1 40:1 41:4 42:5 43:5 44:7 45:3 46:1 47:6 48:6 49:2 51:1 52:2 53:1 55:1 (total 137)

w maxs, cone=.93

F:Ct: 8:1 (i10) 13:1 14:3 16:2 17:2 18:1 19:3 20:4 21:1 24:1 25:4 26:1 (e21 e34) 27:2 29:2 37:1 (i7). 27/29 are i's

w maxs, cone=.925

F:Ct: 8:1 (i10) 13:1 14:3 16:3 17:2 18:2 19:3 20:4 21:1 24:1 25:5 26:1 (e21 e34) 27:2 28:1 29:2 31:1 (e35) 37:1 (i7). 31/34 are i's

w maxs-to-mins, cone=.939

F:Ct: 14:1 (i25) 16:1 (i40) 18:2 (i16 i42) 19:2 (i17 i38) 20:2 (i11 i48) 22:2 23:1 24:4 (i34 i50) 25:3 (i24 i28) 26:3 (i27) 27:5 28:3 29:2 30:2 31:3 32:4 34:3 35:4 36:2 37:2 38:2 39:3 40:1 41:2 46:1 47:2 48:1 49:1 (i39) 53:1 54:2 55:1 56:1 57:8 58:5 59:4 60:7 61:4 62:5 63:5 64:1 65:3 66:1 67:1 68:1 (total 114). 14 i and 100 s/e, so picks i as 0

w xnnn-nxxx, cone=.95

F:Ct: 8:2 (i22 i50) 10:2 11:2 (i28) 12:4 (i24 i27 i34) 13:2 14:4 15:3 16:8 17:4 18:7 19:3 20:5 21:1 22:1 23:1 34:1 (i39). 43/50 e, so picks out e

w naaa-xaaa, cone=.95

F:Ct: 12:1 13:2 14:1 15:2 16:1 17:1 18:4 19:3 20:2 21:3 22:5 23:6 (i21) 24:5 25:1 27:1 28:1 29:2 30:2 (i7). 41/43 e, so picks e

w aaan-aaax, cone=.54

F:Ct: 7:3 (i27 i28) 8:1 9:3 10:12 (i20 i34) 11:7 12:13 13:5 14:3 15:7 19:1 20:1 21:7 22:7 23:28 24:6 (total 104). 100/104 s or e, so 0 picks i

Corner points

Gap in dot product projections onto the cornerpoints line.

Cosine cone gap (over some angle)

Cosine conical gapping seems quick and easy (cosine = dot product divided by both lengths).

Length of the fixed vector, x-M, is a one-time calculation. Length y-M changes with y, so build the PTreeSet.

Cone Clustering: (finding cone-shaped clusters)
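A sketch of the cosine-cone membership test (plain Python; x is the fixed corner point, M the mean, and the threshold `cone` plays the role of the cone=.707, .9, ... values above):

```python
# Sketch: keep only points y inside the cosine cone around the x-M line.
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def in_cone(y, x, M, cone):
    xm = [xi - mi for xi, mi in zip(x, M)]
    ym = [yi - mi for yi, mi in zip(y, M)]
    # cosine = dot product divided by both lengths; |x-M| is a one-time cost
    cos = dot(ym, xm) / (dot(ym, ym) ** 0.5 * dot(xm, xm) ** 0.5)
    return cos >= cone

M, x = (0.0, 0.0), (1.0, 0.0)
print(in_cone((2.0, 1.9), x, M, cone=1 / 2 ** 0.5))   # inside the 45-degree cone
print(in_cone((1.0, 2.0), x, M, cone=1 / 2 ** 0.5))   # outside
```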

Page 21:

[Figure: r points and v points scattered along the d-line, with the means mR and mV marked.]

FAUST Classifier: separate class r and class v using the midpoint of the means:


Set a = (mR + (mV-mR)/2) o d = ((mR+mV)/2) o d ?


Training amounts to choosing the Cut hyperplane = (n-1)-dimensional hyperplane (which thus cuts the space in two). Classify with one horizontal program (AND/OR) across the pTrees to get a mask pTree for each class (bulk classification). Improve accuracy? E.g., by considering the dispersion within classes. Use:
1. vector_of_medians (vomv ≡ (median(v1), median(v2), ...)) instead of means; then use the stdev ratio to place the cut.
2. Cut at the midpoint of Max{rod}, Min{vod}. If there is no gap, move the Cut until r_errors + v_errors is minimized.
3. Hill-climb d to maximize the gap (or minimize errors when applied to the training set).
4. Replace mR, mV with the average of the margin points?
5. Round classes expected? Use SDmR < |D|/2 for the r-class and SDmV < |D|/2 for the v-class.

[Figure labels: vomV, vomR, v1, v2, dim 1, dim 2, d-line]

Pr = P(xod < a), Pv = P(xod ≥ a)

where D ≡ mV - mR (the vector from mR to mV) and d = D/|D|
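A minimal sketch of the midpoint-of-means training and classification on toy data (plain Python; the real FAUST version performs the xod-vs-a comparison as one horizontal AND/OR program across pTrees rather than per-row):

```python
# Sketch of the FAUST midpoint-of-means cut for two classes r and v:
# d is the unit vector from mean(r) to mean(v) and the cut value a is the
# midpoint of the means projected onto d.
def mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train(r_rows, v_rows):
    mR, mV = mean(r_rows), mean(v_rows)
    D = [v - r for r, v in zip(mR, mV)]           # D = mV - mR
    norm = dot(D, D) ** 0.5
    d = [Di / norm for Di in D]                   # d = D/|D|
    a = dot([(r + v) / 2 for r, v in zip(mR, mV)], d)   # a = ((mR+mV)/2) o d
    return d, a

def classify(x, d, a):
    return 'r' if dot(x, d) < a else 'v'

d, a = train([(1, 1), (2, 1)], [(8, 1), (9, 2)])
print(classify((2, 2), d, a), classify((7, 0), d, a))
```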

Page 22:

Data Mining Big Data. Big data: up to trillions of rows (or more) and, possibly, thousands of columns (or many more). I structure data vertically (pTrees) and process it horizontally. Looping across thousands of columns can be orders of magnitude faster than looping down trillions of rows. So sometimes a task can be done in human time only if the data is vertically organized.

Data mining is [largely] CLASSIFICATION or PREDICTION (assigning a class label to a row based on a training set of classified rows). What about clustering and ARM? They are important and related! Roughly, clustering creates/improves training sets, and ARM is used to data mine more complex data (e.g., relationship matrices, etc.).

2/2/13

CLASSIFICATION is [largely] case-based reasoning. To make a decision we typically search our memory for similar situations (near neighbor cases) and base our decision on the decisions we made in those cases (we do what worked before for us or others). We let near neighbors vote. "The Magical Number Seven, Plus or Minus Two ... Information" [2] is cited to argue that the number of objects (contexts) an average human can hold in working memory is 7 ± 2. We can think of classification as providing a better 7 (so it's decision support, not decision making). One can say that all classification methods (even model-based ones) are a form of near neighbor classification. E.g., in Decision Tree Induction (DTI) the classes at the bottom of a decision branch ARE the near neighbor set, due to the fact that the sample arrived at that leaf.

Rows of an entity table (e.g., Iris(SL,SW,PL,PW) or Image(R,G,B)) describe instances of the entity (irises or image pixels). Columns are descriptive information on the row instances (e.g., Sepal Length, Sepal Width, Pedal Length, Pedal Width, or Red, Green, Blue photon counts). If the table consists entirely of real numbers, then the row set can be viewed [as a subset of] a real vector space with dimension = # of columns. Then the notion of "near" [in classification and clustering] can be defined using a dissimilarity (~distance) or a similarity: two rows are near if the distance between them is low or their similarity is high. Near for columns can be defined using a correlation (e.g., Pearson's, Spearman's, ...).

If the columns also describe instances of an entity, then the table is really a matrix or relationship between instances of the row entity and the column entity. Each matrix cell measures some attribute of that relationship pair. The simplest: 1 if that row is related to that column, else 0. The most complex: an entire structure of data describing that pair (that row instance and that column instance). In Market Basket Research (MBR), the row entity is customers and the column entity is items; each cell: 1 iff that customer has that item in the basket. In Netflix Cinematch, the row entity is customers, the column entity is movies, and each cell has the 5-star rating that customer gave to that movie. In Bioinformatics, the row entity might be experiments and the column entity might be genes, and each cell has the expression level of that gene in that experiment; or the row and column entities might both be proteins, and each cell has a 1-bit iff the two proteins interact in some way. In Facebook the rows might be people and the columns might also be people (and a cell has a one bit iff the row and column persons are friends).

Even when the table appears to be a simple entity table with descriptive feature columns, it may be viewable as a relationship between 2 entities. E.g., Image(R,G,B) is a table of pixel instances with columns R,G,B. The R-values count the photons in a "red" frequency range detected at that pixel over an interval of time. That red frequency range is determined more by the camera technology than by any scientific definition. If we had separate CCD cameras that could count photons in each of a million very thin adjacent frequency intervals, we could view the column values of that image as instances of a frequency entity; then the image would be a relationship matrix between the pixel and the frequency entities. So an entity table can often be usefully viewed as a relationship matrix. If so, it can also be rotated, so that the former column entity is viewed as the new row entity and the former row entity is viewed as the new set of descriptive columns.

The bottom line is that we can often do data mining on a table of data in many ways: as an entity table (classification and clustering), as a relationship matrix (ARM), or, upon rotation of that matrix, as another entity table. For a rotated entity table, the concepts of nearness that can be used also rotate (e.g., the cosine correlation of two columns morphs into the cosine of the angle between 2 vectors as a row similarity measure).
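A tiny illustration of the rotation idea on hypothetical toy data: transposing a relationship matrix swaps the row and column entities, so the cosine correlation of two columns is exactly the cosine similarity of two rows of the rotated table.

```python
# Toy illustration: a customer-by-item basket matrix, rotated so items
# become the row entity. Cosine of two columns of M equals cosine of the
# corresponding two rows of the transpose.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (dot(a, a) ** 0.5 * dot(b, b) ** 0.5)

M = [[1, 0, 1],          # customer 1's basket over items A, B, C
     [1, 1, 0],          # customer 2
     [0, 1, 1]]          # customer 3
Mt = [list(col) for col in zip(*M)]   # rotate: rows are now items

col_A = [row[0] for row in M]
col_B = [row[1] for row in M]
print(cosine(col_A, col_B) == cosine(Mt[0], Mt[1]))
```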

Page 23:

DBs, DWs are merging as In-memory DBs:

SAP® In-Memory Computing

Enabling Real-Time Computing SAP® In-Memory enables real-time computing by bringing together online transaction proc. OLTP (DB) and online analytical proc. OLAP (DW).

Combining advances in hardware technology with SAP InMemory Computing empowers business – from shop floor to boardroom – by giving real-time bus. proc. instantaneous access to data-eliminating today’s info lag for your business.

In-memory computing is already under way. The question isn’t if this revolution will impact businesses but when/ how.

In-memory computing won't be introduced because a company can afford the technology. It will be introduced because a business cannot afford to allow its competitors to adopt it first.

Here is sample of what in-memory computing can do for you:• Enable mixed workloads of analytics, operations, and performance management in a single software landscape.• Support smarter business decisions by providing increased visibility of very large volumes of business information• Enable users to react to business events more quickly through real-time analysis and reporting of operational data.• Deliver innovative real-time analysis and reporting.• Streamline IT landscape and reduce total cost of ownership.

In manufacturing enterprises, in-memory computing tech will connect the shop floor to the boardroom, and the shop floor associate will have instant access to the same data as the board [[shop floor = daily transaction processing. Boardroom = executive data mining]]. The shop floor will then see the results of their actions reflected immediately in the relevant Key Performance Indicators (KPI).

SAP BusinessObjects Event Insight software is key. In what used to be called exception reporting, the software deals with huge amounts of realtime data to determine immediate and appropriate action for a real-time situation.

Product managers will still look at inventory and point-of-sale data, but in the future they will also receive, e.g., notice that customers have broadcast dissatisfaction with a product over Twitter. Or they might be alerted to a negative product review released online that highlights some unpleasant product features requiring immediate action.

From the other side, small businesses running real-time inventory reports will be able to announce to their Facebook and Twitter communities that a high demand product is available, how to order, and where to pick up.

Bad movies have been able to enjoy a great opening weekend before crashing 2nd weekend when negative word-of-mouth feedback cools enthusiasm. That week-long grace period is about to disappear for silver screen flops.

Consumer feedback won’t take a week, a day, or an hour.

The very second showing of a movie could suffer from a noticeable falloff in attendance due to consumer criticism piped instantaneously through the new technologies.

It will no longer be good enough to have weekend numbers ready for executives on Monday morning. Executives will run their own reports on revenue, Twitter their reviews, and by Monday morning have acted on their decisions.

The final example is from the utilities industry: the most expensive energy a utility provides is energy to meet unexpected demand during peak periods of consumption. If the company could analyze trends in power consumption based on real-time meter reads, it could offer customers – in real time – extra low rates for the week or month if they reduce their consumption during the following few hours.

This advantage will become much more dramatic when we switch to electric cars; predictably, those cars are recharged the minute the owners return home from work. Hardware: blade servers, multicore CPUs, and memory capacities measured in terabytes. Software: in-memory database with highly compressible row/column storage designed to maximize in-memory computing tech. [[Both row and column storage! They convert to column-wise storage only for Long-Lived-High-Value data?]] Parallel processing takes place in the database layer rather than in the app layer - as it does in the client-server architecture.

Total cost is 30% lower than traditional RDBMSs due to: • Leaner hardware and less system capacity required, as mixed workloads of analytics, operations, and performance management run in a single system, which also reduces redundant data storage. [[Back to a single DB rather than a DB for TP and a DW for boardroom decision support.]] • Less extract-transform-load (ETL) between systems and fewer prebuilt reports, reducing the support required to run the software.

Report runtime improvements of up to 1000 times. Compression rates of up to 10 times. Performance improvements are expected to be even higher in SAP apps natively developed for in-memory DBs. Initial results: a reduction of computing time from hours to seconds. However, in-memory computing will not eliminate the need for data warehousing. Real-time reporting will solve old challenges and create new opportunities, but new challenges will arise. SAP HANA 1.0 software supports realtime database access to data from the SAP apps that support OLTP. Formerly, operational reporting functionality was transferred from OLTP applications to a data warehouse. With in-memory computing technology, this functionality is integrated back into the transaction system.

Adopting in-memory computing results in an uncluttered arch based on a few, tightly aligned core systems enabled by service-oriented architecture (SOA) to provide harmonized, valid metadata and master data across business processes. Some of the most salient shifts and trends in future enterprise architectures will be:• A shift to BI self-service apps like data exploration, instead of static report solutions.• Central metadata and masterdata repositories that define the data architecture, allowing data stewards to work across all business units and all platforms

Real-time in-memory computing technology will cause a decline in Structured Query Language (SQL) satellite databases. The purpose of those databases as flexible, ad hoc, more business-oriented, less IT-static tools might still be required, but their offline status will be a disadvantage and will delay data updates. Some might argue that satellite systems with in-memory computing technology will take over from satellite SQL DBs. SAP Business Explorer tools that use in-memory computing technology represent a paradigm shift: instead of waiting for IT to work on a long queue of support tickets to create new reports, business users can explore large data sets and define reports on the fly.

Page 24:

IRIS(SL,SW,PL,PW) DPPMinVec,MaxVEC

i1 63 33 60 25 10 0 0 0 1 0 1 0i2 58 27 51 19 19 0 0 1 0 0 1 1i3 71 30 59 21 11 0 0 0 1 0 1 1i4 63 29 56 18 15 0 0 0 1 1 1 1i5 65 30 58 22 12 0 0 0 1 1 0 0i6 76 30 66 21 5 0 0 0 0 1 0 1i7 49 25 45 17 24 0 0 1 1 0 0 0i8 73 29 63 18 8 0 0 0 1 0 0 0i9 67 25 58 18 12 0 0 0 1 1 0 0i10 72 36 61 25 10 0 0 0 1 0 1 0i11 65 32 51 20 19 0 0 1 0 0 1 1i12 64 27 53 19 16 0 0 1 0 0 0 0i13 68 30 55 21 15 0 0 0 1 1 1 1i14 57 25 50 20 19 0 0 1 0 0 1 1i15 58 28 51 24 17 0 0 1 0 0 0 1i16 64 32 53 23 17 0 0 1 0 0 0 1i17 65 30 55 18 16 0 0 1 0 0 0 0i18 77 38 67 22 6 0 0 0 0 1 1 0i19 77 26 69 23 0 0 0 0 0 0 0 0e50 57 28 41 13 30 0 0 1 1 1 1 0i1 63 33 60 25 10 0 0 0 1 0 1 0i2 58 27 51 19 19 0 0 1 0 0 1 1i3 71 30 59 21 11 0 0 0 1 0 1 1i4 63 29 56 18 15 0 0 0 1 1 1 1i5 65 30 58 22 12 0 0 0 1 1 0 0i6 76 30 66 21 5 0 0 0 0 1 0 1i7 49 25 45 17 24 0 0 1 1 0 0 0i8 73 29 63 18 8 0 0 0 1 0 0 0i9 67 25 58 18 12 0 0 0 1 1 0 0i10 72 36 61 25 10 0 0 0 1 0 1 0i11 65 32 51 20 19 0 0 1 0 0 1 1i12 64 27 53 19 16 0 0 1 0 0 0 0i13 68 30 55 21 15 0 0 0 1 1 1 1i14 57 25 50 20 19 0 0 1 0 0 1 1i15 58 28 51 24 17 0 0 1 0 0 0 1i16 64 32 53 23 17 0 0 1 0 0 0 1i17 65 30 55 18 16 0 0 1 0 0 0 0i18 77 38 67 22 6 0 0 0 0 1 1 0i19 77 26 69 23 0 0 0 0 0 0 0 0i40 69 31 54 21 16 0 0 1 0 0 0 0i41 67 31 56 24 13 0 0 0 1 1 0 1i42 69 31 51 23 18 0 0 1 0 0 1 0i43 58 27 51 19 19 0 0 1 0 0 1 1i44 68 32 59 23 11 0 0 0 1 0 1 1i45 67 33 57 25 12 0 0 0 1 1 0 0i46 67 30 52 23 17 0 0 1 0 0 0 1i47 63 25 50 19 19 0 0 1 0 0 1 1i48 65 30 52 20 18 0 0 1 0 0 1 0i49 62 34 54 23 16 0 0 1 0 0 0 0i50 59 30 51 18 20 0 0 1 0 1 0 0

ID PL PW SL DPPs1 51 35 14 2 60 0 1 1 1 1 0 0s2 49 30 14 2 59 0 1 1 1 0 1 1s3 47 32 13 2 60 0 1 1 1 1 0 0s4 46 31 15 2 58 0 1 1 1 0 1 0s5 50 36 14 2 60 0 1 1 1 1 0 0s6 54 39 17 4 58 0 1 1 1 0 1 0s7 46 34 14 3 60 0 1 1 1 1 0 0s8 50 34 15 2 59 0 1 1 1 0 1 1s9 44 29 14 2 59 0 1 1 1 0 1 1s10 49 31 15 1 58 0 1 1 1 0 1 0s11 54 37 15 2 60 0 1 1 1 1 0 0s12 48 34 16 2 58 0 1 1 1 0 1 0s13 48 30 14 1 59 0 1 1 1 0 1 1s14 43 30 11 1 62 0 1 1 1 1 1 0s15 58 40 12 2 63 0 1 1 1 1 1 1s16 57 44 15 4 61 0 1 1 1 1 0 1s17 54 39 13 4 61 0 1 1 1 1 0 1s18 51 35 14 3 60 0 1 1 1 1 0 0s19 57 38 17 3 58 0 1 1 1 0 1 0s20 51 38 15 3 60 0 1 1 1 1 0 0s21 54 34 17 2 57 0 1 1 1 0 0 1s22 51 37 15 4 59 0 1 1 1 0 1 1s23 46 36 10 2 64 1 0 0 0 0 0 0s24 51 33 17 5 56 0 1 1 1 0 0 0s25 48 34 19 2 56 0 1 1 1 0 0 0s26 50 30 16 2 57 0 1 1 1 0 0 1s27 50 34 16 4 57 0 1 1 1 0 0 1s28 52 35 15 2 59 0 1 1 1 0 1 1s29 52 34 14 2 60 0 1 1 1 1 0 0s30 47 32 16 2 58 0 1 1 1 0 1 0s31 48 31 16 2 57 0 1 1 1 0 0 1s32 54 34 15 4 58 0 1 1 1 0 1 0s33 52 41 15 1 61 0 1 1 1 1 0 1s34 55 42 14 2 62 0 1 1 1 1 1 0s35 49 31 15 1 58 0 1 1 1 0 1 0s36 50 32 12 2 61 0 1 1 1 1 0 1s37 55 35 13 2 61 0 1 1 1 1 0 1s38 49 31 15 1 58 0 1 1 1 0 1 0s39 44 30 13 2 60 0 1 1 1 1 0 0s40 51 34 15 2 59 0 1 1 1 0 1 1s41 50 35 13 3 61 0 1 1 1 1 0 1s42 45 23 13 3 57 0 1 1 1 0 0 1s43 44 32 13 2 60 0 1 1 1 1 0 0s44 50 35 16 6 57 0 1 1 1 0 0 1s45 51 38 19 4 56 0 1 1 1 0 0 0s46 48 30 14 3 58 0 1 1 1 0 1 0s47 51 38 16 2 59 0 1 1 1 0 1 1s48 46 32 14 2 59 0 1 1 1 0 1 1s49 53 37 15 2 60 0 1 1 1 1 0 0s50 50 33 14 2 60 0 1 1 1 1 0 0e1 70 32 47 14 25 0 0 1 1 0 0 1e2 64 32 45 15 27 0 0 1 1 0 1 1e3 69 31 49 15 22 0 0 1 0 1 1 0e4 55 23 40 13 29 0 0 1 1 1 0 1e5 65 28 46 15 24 0 0 1 1 0 0 0e6 57 28 45 13 26 0 0 1 1 0 1 0e7 63 33 47 16 25 0 0 1 1 0 0 1e8 49 24 33 10 37 0 1 0 0 1 0 1e9 66 29 46 13 25 0 0 1 1 0 0 1e10 52 27 39 14 31 0 0 1 1 1 1 1e11 50 20 35 10 34 0 1 0 0 0 1 0e12 59 30 42 15 29 0 0 1 1 1 0 1e13 60 22 40 10 30 0 0 1 1 1 1 0e14 61 29 47 14 24 0 0 1 1 0 0 0e15 
56 29 36 13 35 0 1 0 0 0 1 1e16 67 31 44 14 27 0 0 1 1 0 1 1e17 56 30 45 15 26 0 0 1 1 0 1 0e18 58 27 41 10 31 0 0 1 1 1 1 1e19 62 22 45 15 23 0 0 1 0 1 1 1e20 56 25 39 11 32 0 1 0 0 0 0 0e21 59 32 48 18 23 0 0 1 0 1 1 1e22 61 28 40 13 31 0 0 1 1 1 1 1e23 63 25 49 15 21 0 0 1 0 1 0 1e24 61 28 47 12 25 0 0 1 1 0 0 1e25 64 29 43 13 28 0 0 1 1 1 0 0

e26 66 30 44 14 27 0 0 1 1 0 1 1e27 68 28 48 14 23 0 0 1 0 1 1 1e28 67 30 50 17 21 0 0 1 0 1 0 1e29 60 29 45 15 26 0 0 1 1 0 1 0e30 57 26 35 10 36 0 1 0 0 1 0 0e31 55 24 38 11 32 0 1 0 0 0 0 0e32 55 24 37 10 33 0 1 0 0 0 0 1e33 58 27 39 12 32 0 1 0 0 0 0 0e34 60 27 51 16 20 0 0 1 0 1 0 0e35 54 30 45 15 27 0 0 1 1 0 1 1e36 60 34 45 16 27 0 0 1 1 0 1 1e37 67 31 47 15 24 0 0 1 1 0 0 0e38 63 23 44 13 25 0 0 1 1 0 0 1e39 56 30 41 13 31 0 0 1 1 1 1 1e40 55 25 40 13 30 0 0 1 1 1 1 0e41 55 26 44 12 27 0 0 1 1 0 1 1e42 61 30 46 14 26 0 0 1 1 0 1 0e43 58 26 40 12 30 0 0 1 1 1 1 0e44 50 23 33 10 37 0 1 0 0 1 0 1e45 56 27 42 13 29 0 0 1 1 1 0 1e46 57 30 42 12 30 0 0 1 1 1 1 0e47 57 29 42 13 29 0 0 1 1 1 0 1e48 62 29 43 13 28 0 0 1 1 1 0 0e49 51 25 30 11 40 0 1 0 1 0 0 0e50 57 28 41 13 30 0 0 1 1 1 1 0

set 51 35 14 2 0 1 1 0 0 1 1 1 0 0 0 1 1 0 0 0 1 1 1 0 0 0 0 0 1 0set 49 30 14 2 0 1 1 0 0 0 1 0 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0set 47 32 13 2 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0set 46 31 15 2 0 1 0 1 1 1 0 0 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 1 0set 50 36 14 2 0 1 1 0 0 1 0 1 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0set 54 39 17 4 0 1 1 0 1 1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 0 0 0 1 0 0set 46 34 14 3 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 1set 50 34 15 2 0 1 1 0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 0set 44 29 14 2 0 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 1 1 0 0 0 0 0 1 0set 49 31 15 1 0 1 1 0 0 0 1 0 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1set 54 37 15 2 0 1 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 1 1 0 0 0 0 1 0set 48 34 16 2 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0set 48 30 14 1 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1set 43 30 11 1 0 1 0 1 0 1 1 0 1 1 1 1 0 0 0 0 1 0 1 1 0 0 0 0 0 1set 58 40 12 2 0 1 1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0set 57 44 15 4 0 1 1 1 0 0 1 1 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 1 0 0set 54 39 13 4 0 1 1 0 1 1 0 1 0 0 1 1 1 0 0 0 1 1 0 1 0 0 0 1 0 0set 51 35 14 3 0 1 1 0 0 1 1 1 0 0 0 1 1 0 0 0 1 1 1 0 0 0 0 0 1 1set 57 38 17 3 0 1 1 1 0 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1set 51 38 15 3 0 1 1 0 0 1 1 1 0 0 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1set 54 34 17 2 0 1 1 0 1 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0set 51 37 15 4 0 1 1 0 0 1 1 1 0 0 1 0 1 0 0 0 1 1 1 1 0 0 0 1 0 0set 46 36 10 2 0 1 0 1 1 1 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0set 51 33 17 5 0 1 1 0 0 1 1 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 1set 48 34 19 2 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0set 50 30 16 2 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0set 50 34 16 4 0 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0set 52 35 15 2 0 1 1 0 1 0 0 1 0 0 0 1 1 0 0 0 1 1 1 1 0 0 0 0 1 0set 52 34 14 2 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0set 47 32 16 2 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0set 48 31 16 2 0 1 
1 0 0 0 0 0 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0set 54 34 15 4 0 1 1 0 1 1 0 1 0 0 0 1 0 0 0 0 1 1 1 1 0 0 0 1 0 0set 52 41 15 1 0 1 1 0 1 0 0 1 0 1 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 1set 55 42 14 2 0 1 1 0 1 1 1 1 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 1 0set 49 31 15 1 0 1 1 0 0 0 1 0 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1set 50 32 12 2 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0set 55 35 13 2 0 1 1 0 1 1 1 1 0 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 1 0set 49 31 15 1 0 1 1 0 0 0 1 0 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1set 44 30 13 2 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0set 51 34 15 2 0 1 1 0 0 1 1 1 0 0 0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 0set 50 35 13 3 0 1 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 1 1set 45 23 13 3 0 1 0 1 1 0 1 0 1 0 1 1 1 0 0 0 1 1 0 1 0 0 0 0 1 1set 44 32 13 2 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0set 50 35 16 6 0 1 1 0 0 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 0set 51 38 19 4 0 1 1 0 0 1 1 1 0 0 1 1 0 0 0 1 0 0 1 1 0 0 0 1 0 0set 48 30 14 3 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 1 1set 51 38 16 2 0 1 1 0 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0set 46 32 14 2 0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0set 53 37 15 2 0 1 1 0 1 0 1 1 0 0 1 0 1 0 0 0 1 1 1 1 0 0 0 0 1 0set 50 33 14 2 0 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 1 0ver 70 32 47 14 1 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 1 1 1 0 0 1 1 1 0ver 64 32 45 15 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 1ver 69 31 49 15 1 0 0 0 1 0 1 0 1 1 1 1 1 0 1 1 0 0 0 1 0 0 1 1 1 1ver 55 23 40 13 0 1 1 0 1 1 1 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1ver 65 28 46 15 1 0 0 0 0 0 1 0 1 1 1 0 0 0 1 0 1 1 1 0 0 0 1 1 1 1ver 57 28 45 13 0 1 1 1 0 0 1 0 1 1 1 0 0 0 1 0 1 1 0 1 0 0 1 1 0 1ver 63 33 47 16 0 1 1 1 1 1 1 1 0 0 0 0 1 0 1 0 1 1 1 1 0 1 0 0 0 0ver 49 24 33 10 0 1 1 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 0ver 66 29 46 13 1 0 0 0 0 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 0 0 1 1 0 1ver 52 27 39 14 0 1 1 0 1 0 0 0 1 1 0 1 1 0 1 0 0 1 1 1 0 0 1 1 1 0ver 50 20 35 10 0 1 1 0 0 1 
0 0 1 0 1 0 0 0 1 0 0 0 1 1 0 0 1 0 1 0ver 59 30 42 15 0 1 1 1 0 1 1 0 1 1 1 1 0 0 1 0 1 0 1 0 0 0 1 1 1 1ver 60 22 40 10 0 1 1 1 1 0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 0 0 1 0 1 0ver 61 29 47 14 0 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 1 1 1 0 0 1 1 1 0ver 56 29 36 13 0 1 1 1 0 0 0 0 1 1 1 0 1 0 1 0 0 1 0 0 0 0 1 1 0 1ver 67 31 44 14 1 0 0 0 0 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0 0 0 1 1 1 0

ver 56 30 45 15 0 1 1 1 0 0 0 0 1 1 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 1ver 58 27 41 10 0 1 1 1 0 1 0 0 1 1 0 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0ver 62 22 45 15 0 1 1 1 1 1 0 0 1 0 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 1ver 56 25 39 11 0 1 1 1 0 0 0 0 1 1 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 1ver 59 32 48 18 0 1 1 1 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 1 0ver 61 28 40 13 0 1 1 1 1 0 1 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 1ver 63 25 49 15 0 1 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 1 1 1ver 61 28 47 12 0 1 1 1 1 0 1 0 1 1 1 0 0 0 1 0 1 1 1 1 0 0 1 1 0 0ver 64 29 43 13 1 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 1 0 1 1 0 0 1 1 0 1ver 66 30 44 14 1 0 0 0 0 1 0 0 1 1 1 1 0 0 1 0 1 1 0 0 0 0 1 1 1 0ver 68 28 48 14 1 0 0 0 1 0 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0ver 67 30 50 17 1 0 0 0 0 1 1 0 1 1 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1ver 60 29 45 15 0 1 1 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 0 1 0 0 1 1 1 1ver 57 26 35 10 0 1 1 1 0 0 1 0 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 1 0ver 55 24 38 11 0 1 1 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1 1 0 0 0 1 0 1 1ver 55 24 37 10 0 1 1 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 0ver 58 27 39 12 0 1 1 1 0 1 0 0 1 1 0 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0ver 60 27 51 16 0 1 1 1 1 0 0 0 1 1 0 1 1 0 1 1 0 0 1 1 0 1 0 0 0 0ver 54 30 45 15 0 1 1 0 1 1 0 0 1 1 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 1ver 60 34 45 16 0 1 1 1 1 0 0 1 0 0 0 1 0 0 1 0 1 1 0 1 0 1 0 0 0 0ver 67 31 47 15 1 0 0 0 0 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1ver 63 23 44 13 0 1 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 1 0 0 0 0 1 1 0 1ver 56 30 41 13 0 1 1 1 0 0 0 0 1 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1ver 55 25 40 13 0 1 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1 0 1ver 55 26 44 12 0 1 1 0 1 1 1 0 1 1 0 1 0 0 1 0 1 1 0 0 0 0 1 1 0 0ver 61 30 46 14 0 1 1 1 1 0 1 0 1 1 1 1 0 0 1 0 1 1 1 0 0 0 1 1 1 0

ver 58 26 40 12 0 1 1 1 0 1 0 0 1 1 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0ver 50 23 33 10 0 1 1 0 0 1 0 0 1 0 1 1 1 0 1 0 0 0 0 1 0 0 1 0 1 0ver 56 27 42 13 0 1 1 1 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 0 0 1 1 0 1ver 57 30 42 12 0 1 1 1 0 0 1 0 1 1 1 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0ver 57 29 42 13 0 1 1 1 0 0 1 0 1 1 1 0 1 0 1 0 1 0 1 0 0 0 1 1 0 1ver 62 29 43 13 0 1 1 1 1 1 0 0 1 1 1 0 1 0 1 0 1 0 1 1 0 0 1 1 0 1ver 51 25 30 11 0 1 1 0 0 1 1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 0 1 1ver 57 28 41 13 0 1 1 1 0 0 1 0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 1 1 0 1

vir 63 33 60 25 0 1 1 1 1 1 1 1 0 0 0 0 1 0 1 1 1 1 0 0 0 1 1 0 0 1vir 58 27 51 19 0 1 1 1 0 1 0 0 1 1 0 1 1 0 1 1 0 0 1 1 0 1 0 0 1 1vir 71 30 59 21 1 0 0 0 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 1vir 63 29 56 18 0 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 0 0 0 0 1 0 0 1 0vir 65 30 58 22 1 0 0 0 0 0 1 0 1 1 1 1 0 0 1 1 1 0 1 0 0 1 0 1 1 0vir 76 30 66 21 1 0 0 1 1 0 0 0 1 1 1 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1vir 49 25 45 17 0 1 1 0 0 0 1 0 1 1 0 0 1 0 1 0 1 1 0 1 0 1 0 0 0 1vir 73 29 63 18 1 0 0 1 0 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 0 1 0 0 1 0vir 67 25 58 18 1 0 0 0 0 1 1 0 1 1 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1 0vir 72 36 61 25 1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 1 1 1 0 1 0 1 1 0 0 1vir 65 32 51 20 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0vir 64 27 53 19 1 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 1 0 1 0 1 0 0 1 1vir 68 30 55 21 1 0 0 0 1 0 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 1vir 57 25 50 20 0 1 1 1 0 0 1 0 1 1 0 0 1 0 1 1 0 0 1 0 0 1 0 1 0 0vir 58 28 51 24 0 1 1 1 0 1 0 0 1 1 1 0 0 0 1 1 0 0 1 1 0 1 1 0 0 0vir 64 32 53 23 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 1vir 65 30 55 18 1 0 0 0 0 0 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 0 1 0vir 77 38 67 22 1 0 0 1 1 0 1 1 0 0 1 1 0 1 0 0 0 0 1 1 0 1 0 1 1 0vir 77 26 69 23 1 0 0 1 1 0 1 0 1 1 0 1 0 1 0 0 0 1 0 1 0 1 0 1 1 1vir 60 22 50 15 0 1 1 1 1 0 0 0 1 0 1 1 0 0 1 1 0 0 1 0 0 0 1 1 1 1vir 69 32 57 23 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 1 1 0 0 1 0 1 0 1 1 1vir 56 28 49 20 0 1 1 1 0 0 0 0 1 1 1 0 0 0 1 1 0 0 0 1 0 1 0 1 0 0vir 77 28 67 20 1 0 0 1 1 0 1 0 1 1 1 0 0 1 0 0 0 0 1 1 0 1 0 1 0 0vir 63 27 49 18 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 0 0 1 0 1 0 0 1 0vir 67 33 57 21 1 0 0 0 0 1 1 1 0 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 0 1vir 72 32 60 18 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 1 1 0 0 0 1 0 0 1 0vir 62 28 48 18 0 1 1 1 1 1 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 1 0 0 1 0vir 61 30 49 18 0 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 0vir 64 28 56 21 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 0 1 0 1 0 1vir 72 30 58 16 1 0 0 1 0 0 0 0 1 1 1 1 0 0 1 1 1 0 1 0 
0 1 0 0 0 0vir 74 28 61 19 1 0 0 1 0 1 0 0 1 1 1 0 0 0 1 1 1 1 0 1 0 1 0 0 1 1vir 79 38 64 20 1 0 0 1 1 1 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 1 0 1 0 0vir 64 28 56 22 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 0 1 0 1 1 0vir 63 28 51 15 0 1 1 1 1 1 1 0 1 1 1 0 0 0 1 1 0 0 1 1 0 0 1 1 1 1vir 61 26 56 14 0 1 1 1 1 0 1 0 1 1 0 1 0 0 1 1 1 0 0 0 0 0 1 1 1 0vir 77 30 61 23 1 0 0 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 0 1 1 1vir 63 34 56 24 0 1 1 1 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0 0 0 1 1 0 0 0vir 64 31 55 18 1 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 0vir 60 30 18 18 0 1 1 1 1 0 0 0 1 1 1 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0vir 69 31 54 21 1 0 0 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1 1 0 0 1 0 1 0 1vir 67 31 56 24 1 0 0 0 0 1 1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1 0 0 0vir 69 31 51 23 1 0 0 0 1 0 1 0 1 1 1 1 1 0 1 1 0 0 1 1 0 1 0 1 1 1vir 58 27 51 19 0 1 1 1 0 1 0 0 1 1 0 1 1 0 1 1 0 0 1 1 0 1 0 0 1 1vir 68 32 59 23 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 1 0 1 1 0 1 0 1 1 1vir 67 33 57 25 1 0 0 0 0 1 1 1 0 0 0 0 1 0 1 1 1 0 0 1 0 1 1 0 0 1vir 67 30 52 23 1 0 0 0 0 1 1 0 1 1 1 1 0 0 1 1 0 1 0 0 0 1 0 1 1 1vir 63 25 50 19 0 1 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 0 1 0 0 1 0 0 1 1vir 65 30 52 20 1 0 0 0 0 0 1 0 1 1 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0vir 62 34 54 23 0 1 1 1 1 1 0 1 0 0 0 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1vir 59 30 51 18 0 1 1 1 0 1 1 0 1 1 1 1 0 0 1 1 0 0 1 1 0 1 0 0 1 0

Page 25:

gap>=4, p=nnnn, q=xxxx. F Count: 0:1 1:1 2:1 3:3 4:1 5:6 6:4 7:5 8:7 9:3 10:8 11:5 12:1 13:2 14:1 15:1 19:1 20:1 21:3 26:2 28:1 29:4 30:2 31:2 32:2 33:4 34:3 36:5 37:2 38:2 39:2 40:5 41:6 42:5 43:7 44:2 45:1 46:3 47:2 48:1 49:5 50:4 51:1 52:3 53:2 54:2 55:3 56:2 57:1 58:1 59:1 61:2 64:2 66:2 68:1

Dot Product Projection (DPP): Check F(y) = (y-p)o(q-p)/|q-p| for gaps or thin intervals. Check actual distances at sparse ends.
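A minimal sketch of this check in code (the function and parameter names are mine, not from the slides): project onto d = (q-p)/|q-p| and scan the sorted F-values for gaps of at least the required width.

```python
import numpy as np

def dpp(X, p, q):
    """Dot Product Projection: F(y) = (y-p) o (q-p) / |q-p|,
    i.e. the projection of each row of X onto the unit vector d = (q-p)/|q-p|."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = (q - p) / np.linalg.norm(q - p)
    return (np.asarray(X, float) - p) @ d

def find_gaps(F, min_gap=4):
    """Pairs (lo, hi) of consecutive sorted F-values with hi - lo >= min_gap."""
    s = np.sort(np.asarray(F, float))
    return [(s[i], s[i + 1]) for i in range(len(s) - 1) if s[i + 1] - s[i] >= min_gap]
```

For example, `find_gaps([0, 1, 2, 10, 11])` returns the single gap `[(2.0, 10.0)]`.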

Sparse Lower end: Checking [0,4] distances:

F:    0   1   2   3   3   3   4
    s14 s42 s45 s23 s16 s43  s3
s14   0   8  14   7  20   3   5
s42   8   0  17  13  24   9   9
s45  14  17   0  11   9  11  10
s23   7  13  11   0  15   5   5
s16  20  24   9  15   0  18  16
s43   3   9  11   5  18   0   3
s3    5   9  10   5  16   3   0

s42 is revealed as an outlier because F(s42)=1 is ≥4 from 5,6,..., and it is ≥4 (in distance) from the others in [0,4].

Gaps = [15,19] and [21,26]. Check distances in [12,28] to see whether s16, i39, e49, e8, e11, e44 are outliers.

F:   12  13  13  14  15  19  20  21  21  21  26  26  28
    s34  s6 s45 s19 s16 i39 e49  e8 e11 e44 e32 e30 e31
s34   0   5   8   5   4  21  25  28  32  28  30  28  31
s6    5   0   4   3   6  18  21  23  27  24  26  23  27
s45   8   4   0   6   9  18  18  21  25  21  24  22  25
s19   5   3   6   0   6  17  21  24  27  24  25  23  27
s16   4   6   9   6   0  20  26  29  33  29  30  28  31
i39  21  18  18  17  20   0  17  21  24  21  22  19  23
e49  25  21  18  21  26  17   0   4   7   4   8   8   9
e8   28  23  21  24  29  21   4   0   5   1   7   8   8
e11  32  27  25  27  33  24   7   5   0   4   7   9   7
e44  28  24  21  24  29  21   4   1   4   0   6   8   7
e32  30  26  24  25  30  22   8   7   7   6   0   3   1
e30  28  23  22  23  28  19   8   8   9   8   3   0   4
e31  31  27  25  27  31  23   9   8   7   7   1   4   0

So s16, i39, e49, e11 are outliers; {e8,e44} is a doubleton outlier set. Separate at 17 and 23, giving:

CLUS1: F<17 (50 Setosa, with s16, s42 declared as outliers)

CLUS2: 17<F<23 (e8, e11, e44, e49, i39, all already declared outliers)

CLUS3: 23<F (46 vers, 49 virg, with i6, i10, i18, i19, i23, i32 declared as outliers)

To illustrate the DPP algorithm, we use IRIS to see how close it comes to separating into the 3 known classes (s=Setosa, e=Versicolor, i=Virginica). We require a DPP-gap of at least 4. We also check any sparse ends of the DPP-range to find outliers (using a table of pairwise distances). We start with p = MinVector of the 4 column minimums and q = MaxVector of the 4 column maximums. Then we replace some of those with the average.
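The starting p and q described above can be computed directly from the table; a small sketch (the function name is mine):

```python
import numpy as np

def pq_from_extremes(X):
    """Starting endpoints for DPP on a table X:
    p = MinVector (column minimums), q = MaxVector (column maximums)."""
    X = np.asarray(X, float)
    return X.min(axis=0), X.max(axis=0)
```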

Sparse Upper end: Checking [57,68] distances:

F:   57  58  59  61  61  64  64  66  66  68
    i26 i31  i8 i10 i36  i6 i23 i19 i32 i18
i26   0   5   4   8   7   8  10  13  10  11
i31   5   0   3  10   5   6   7  10  12  12
i8    4   3   0  10   7   5   6   9  11  11
i10   8  10  10   0   8  10  12  14   9   9
i36   7   5   7   8   0   5   7   9   9  10
i6    8   6   5  10   5   0   3   5   9   8
i23  10   7   6  12   7   3   0   4  11  10
i19  13  10   9  14   9   5   4   0  13  12
i32  10  12  11   9   9   9  11  13   0   4
i18  11  12  11   9  10   8  10  12   4   0

i10, i36, i19, i32, i18 are singleton outliers because F is ≥4 from 56 and they are ≥4 (in distance) from each other. {i6,i23} is a doubleton outlier set.

CLUS3 outliers removed. p=aaax, q=aaan. F Cnt: 0:4 1:2 2:5 3:13 4:8 5:12 6:4 7:2 8:11 9:5 10:4 11:5 12:2 13:7 14:3 15:2

Thinning = [6,7]. CLUS3.1: F<6.5 (44 ver, 4 vir)

CLUS3.2: F>6.5 (2 ver, 39 vir)

No sparse ends

CLUS3.1, p=anxa, q=axna. F Cnt: 0:2 3:1 5:2 6:1 8:2 9:4 10:3 11:6 12:6 13:7 14:7 15:4 16:3 19:2

No Thinning. Sparse Lo end: Check [0,8] distances:

F:    0   0   3   5   5   6   8   8
    i30 i35 i20 e34 i34 e23 e19 e27
i30   0  12  17  14  12  14  18  11
i35  12   0   7   6   6   7  12  11
i20  17   7   0   5   7   4   5  10
e34  14   6   5   0   3   4   8   9
i34  12   6   7   3   0   4   9   6
e23  14   7   4   4   4   0   5   6
e19  18  12   5   8   9   5   0   9
e27  11  11  10   9   6   6   9   0

i30, i35, i20 are outliers because at F≤3 they are ≥4 from 5,6,7,8. {e34,i34} is a doubleton outlier set.

Sparse Upper end: Check [16,19] distances:

F:   16  16  16  19  19
     e7 e32 e33 e30 e15
e7    0  17  12  16  14
e32  17   0   5   3   6
e33  12   5   0   5   4
e30  16   3   5   0   4
e15  14   6   4   4   0

e15 is an outlier. So CLUS3.1 = 42 Versicolor.

CLUS3.2 = 39 virg, 2 vers (unable to separate the 2 vers from the 39 virg)

Page 26:

"Gap Hill Climbing": mathematical analysisOne way to increase the size of the functional gaps is to hill climb the standard deviation of the functional, F (hoping that a "rotation" of d toward a higher STDev would increase the likelihood that gaps would be larger ( more dispersion allows for more and/or larger gaps).

We can also try to grow one particular gap or thinning using support pairs as follows:

F-slices are hyperplanes (assuming F is a dot product with a fixed d), so it makes sense to try to "re-orient" d so that the gap grows. Instead of taking the "improved" p and q to be the means of the entire n-dimensional half-spaces cut off by the gap (or thinning), take p and q to be the means of the (n-1)-dimensional F-slice hyperplanes bounding the gap or thinning. This is easy, since our method produces the pTree mask of each F-slice ordered by increasing F-value (in fact, it is the sequence of F-values and the sequence of counts of points giving those values that we use to find large gaps in the first place).
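A sketch of this refinement step, assuming F holds the current F-values and the gap is bounded by the slice values lo and hi (the function name is mine):

```python
import numpy as np

def refine_pq(X, F, lo, hi):
    """Replace p and q by the means of the two F-slices bounding the gap
    (the points with F == lo and F == hi), then re-derive d from them."""
    X, F = np.asarray(X, float), np.asarray(F)
    p = X[F == lo].mean(axis=0)   # mean of the low-side boundary slice
    q = X[F == hi].mean(axis=0)   # mean of the high-side boundary slice
    d = (q - p) / np.linalg.norm(q - p)
    return p, q, d
```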

[Diagram: points labeled 1-9, a-s on a 16×16 grid, showing direction d1 with a small d1-gap and direction d2 with a larger d2-gap; p and q mark the means bounding the gap.]

The d2-gap is much larger than the d1-gap. It is still not the optimal gap, though. Would it be better to use a weighted mean (weighted by the distance from the gap, that is, by the d-barrel radius, from the center of the gap, on which each point lies)?

[Second diagram: the same grid with fewer points, after re-orienting d; the boundary pair p, q now bounds a larger d2-gap.]

In this example it seems to make for a larger gap, but what weightings should be used? (e.g., 1/radius²; zero weighting after the first gap gives a result identical to the previous one). Also, we really want to identify the support-vector pair of the gap (the pair, one from each side, that are closest together) as p and q (in this case, 9 and a, but we were just lucky to draw our vector through them). We could check the d-barrel radius of just these gap-slice pairs and select the closest pair as p and q.

Page 27:

HILL CLIMBING GAP WIDTH

Dot F, p=aaan, q=aaax. F Ct: 0:6 1:28 2:7 3:7 4:1 5:1 9:7 10:3 11:5 12:13 13:8 14:12 15:4 16:2 17:12 18:5 19:6 20:6 21:3 22:8 23:3 24:3

CLUS1<7 (50 Set)

7 <CLUS2< 16 (4 Virg, 48 Vers)

On CLUS2 ∪ CLUS3, p=avg of the F<16 side, q=avg of the F>16 side. F Ct: 0:1 1:1 2:2 3:1 7:2 9:2 10:2 11:3 12:3 13:2 14:5 15:1 16:3 17:3 18:2 19:2 20:4 21:5 22:2 23:5 24:9 25:1 26:1 27:3 28:2 29:1 30:3 31:5 32:2 33:3 34:3 35:1 36:2 37:4 38:1 39:1 42:2 44:1 45:2 47:2

Next we attempt to hill-climb the gap at 16 using the half-space averages.

CLUS3>16 (46 Virg, 2 Vers)

No conclusive gaps. Sparse Lo end: Check [0,9] distances:

F:    0   1   2   2   3   7   7   9   9
    i39 e49  e8 e44 e11 e32 e30 e15 e31
i39   0  17  21  21  24  22  19  19  23
e49  17   0   4   4   7   8   8   9   9
e8   21   4   0   1   5   7   8  10   8
e44  21   4   1   0   4   6   8   9   7
e11  24   7   5   4   0   7   9  11   7
e32  22   8   7   6   7   0   3   6   1
e30  19   8   8   8   9   3   0   4   4
e15  19   9  10   9  11   6   4   0   6
e31  23   9   8   7   7   1   4   6   0

i39, e49, e11 are singleton outliers. {e8,e44} is a doubleton outlier set.

Sparse Hi end: Check [38,47] distances:

F:   38  39  42  42  44  45  45  47  47
    i31  i8 i36 i10  i6 i23 i32 i18 i19
i31   0   3   5  10   6   7  12  12  10
i8    3   0   7  10   5   6  11  11   9
i36   5   7   0   8   5   7   9  10   9
i10  10  10   8   0  10  12   9   9  14
i6    6   5   5  10   0   3   9   8   5
i23   7   6   7  12   3   0  11  10   4
i32  12  11   9   9   9  11   0   4  13
i18  12  11  10   9   8  10   4   0  12
i19  10   9   9  14   5   4  13  12   0

i10, i18, i19, i32, i36 are singleton outliers. {i6,i23} is a doubleton outlier set.

There is a thinning at 22, and it is the same one, but it is not more prominent. Next we attempt to hill-climb the gap at 16 using the means of the half-space boundary slices (i.e., p is the avg at 14; q is the avg at 17).

CL123, p is avg at 14, q is avg at 17. F Ct: 0:1 2:3 3:2 4:4 5:7 6:4 7:8 8:2 9:11 10:4 12:3 13:1 20:1 21:1 22:2 23:1 27:2 28:1 29:1 30:2 31:4 32:2 33:3 34:4 35:1 36:3 37:4 38:2 39:2 40:5 41:3 42:3 43:6 44:8 45:1 46:2 47:1 48:3 49:3 51:7 52:2 53:2 54:3 55:1 56:3 57:3 58:1 61:2 63:2 64:1 66:1 67:1

Here, the gap between CLUS1 and CLUS2 is made more pronounced (why?), but the thinning between CLUS2 and CLUS3 seems even more obscure.

Although this doesn't prove anything, it is not good news for the method!

It did not grow the gap we wanted to grow (between CLUS2 and CLUS3).

Page 28:

CAINE 2013 Call for Papers. 26th International Conference on Computer Applications in Industry and Engineering, September 25-27, 2013, Omni Hotel, Los Angeles, California, USA. Sponsored by the International Society for Computers and Their Applications (ISCA). CAINE-2013 will feature contributed papers as well as workshops and special sessions. Papers will be accepted into oral presentation sessions. The topics will include, but are not limited to, the following areas: Agent-Based Systems, Image/Signal Processing, Autonomous Systems, Information Assurance, Big Data Analytics, Information Systems/Databases, Bioinformatics, Biomedical Systems/Engineering, Internet and Web-Based Systems, Computer-Aided Design/Manufacturing, Knowledge-based Systems, Computer Architecture/VLSI, Mobile Computing, Computer Graphics and Animation, Multimedia Applications, Computer Modeling/Simulation, Neural Networks, Computer Security, Pattern Recognition/Computer Vision, Computers in Education, Rough Set and Fuzzy Logic, Computers in Healthcare, Robotics, Computer Networks, Fuzzy Logic Control Systems, Sensor Networks, Data Communication, Scientific Computing, Data Mining, Software Engineering/CASE, Distributed Systems, Visualization, Embedded Systems, Wireless Networks and Communication. Important Dates: Workshop/special session proposal: May 25, 2013. Full Paper Submission: June 5, 2013. Notification of Acceptance: July 5, 2013. Pre-registration & Camera-Ready Paper Due: August 5, 2013. Event Dates: Sept 25-27, 2013.

SEDE Conf is interested in gathering researchers and professionals in the domains of SE and DE to present and discuss high-quality research results and outcomes in their fields. SEDE 2013 aims at facilitating cross-fertilization of ideas in Software and Data Engineering, The conference topics include, but not limited to:. Requirements Engineering for Data Intensive Software Systems. Software Verification and Model of Checking. Model-Based Methodologies. Software Quality and Software Metrics. Architecture and Design of Data Intensive Software Systems. Software Testing. Service- and Aspect-Oriented Techniques. Adaptive Software Systems. Information System Development. Software and Data Visualization. Development Tools for Data Intensive. Software Systems. Software Processes. Software Project Mgnt. Applications and Case Studies. Engineering Distributed, Parallel, and Peer-to-Peer Databases. Cloud infrastructure, Mobile, Distributed, and Peer-to-Peer Data Management. Semi-Structured Data and XML Databases. Data Integration, Interoperability, and Metadata. Data Mining: Traditional, Large-Scale, and Parallel. Ubiquitous Data Management and Mobile Databases. Data Privacy and Security. Scientific and Biological Databases and Bioinformatics. Social networks, web, and personal information management. Data Grids, Data Warehousing, OLAP. Temporal, Spatial, Sensor, and Multimedia Databases. Taxonomy and Categorization. Pattern Recognition, Clustering, and Classification. Knowledge Management and Ontologies. Query Processing and Optimization. Database Applications and Experiences. Web Data Mgnt and Deep WebMay 23, 2013 Paper Submission Deadline June 30, 2013 Notification of AcceptanceJuly 20, 2013 Registration and Camera-Ready Manuscript Conference Website: http://theory.utdallas.edu/SEDE2013/

ACC-2013 provides an international forum for presentation and discussion of research on a variety of aspects of advanced computing and its applications, and communication and networking systems. Important Dates May 5, 2013 - Special Sessions Proposal June 5, 2013 - Full Paper Submission July 5, 2013 - Author Notification Aug. 5, 2013 - Advance Registration & Camera Ready Paper Due

CBR International Workshop Case-Based Reasoning CBR-MD 2013 July 19, 2013, New York/USA Topics of interest include (but are not limited to): CBR for signals, images, video, audio and text Similarity assessment Case representation and case mining Retrieval and indexing Conversational CBR Meta-learning for model improvement and parameter setting for processing with CBR Incremental model improvement by CBR Case base maintenance for systems Case authoring Life-time of a CBR system Measuring coverage of case bases Ontology learning with CBR Submission Deadline: March 20th, 2013 Notification Date: April 30th, 2013 Camera-Ready Deadline: May 12th, 2013

Workshop on Data Mining in Life Sciences DMLS. Discovery of high-level structures, incl. e.g. association networks; Text mining from biomedical literature; Medical image mining; Biomedical signal mining; Temporal and sequential data mining; Mining heterogeneous data; Mining data from molecular biology, genomics, proteomics, phylogenetic classification. With regard to different methodologies and case studies: Data mining project development methodology for biomedicine; Integration of data mining in the clinic; Ontology-driven data mining in life sciences; Methodology for mining complex data, e.g. a combination of laboratory test results, images, signals, genomic and proteomic samples; Data mining for personal disease management; Utility considerations in DMLS, including e.g. cost-sensitive learning. Submission Deadline: March 20th, 2013. Notification Date: April 30th, 2013. Camera-Ready Deadline: May 12th, 2013. Workshop date: July 19th, 2013.

Workshop on Data Mining in Marketing DMM'2013. In the business environment, data warehousing (the practice of creating huge, central stores of customer data that can be used throughout the enterprise) is becoming more and more common practice and, as a consequence, the importance of data mining is growing stronger. Applications in Marketing; Methods for User Profiling; Mining Insurance Data; E-Marketing with Data Mining; Logfile Analysis; Churn Management; Association Rules for Marketing Applications; Online Targeting and Controlling; Behavioral Targeting; Juridical Conditions of E-Marketing, Online Targeting and so on; Control of Online-Marketing Activities; New Trends in Online Marketing; Aspects of E-Mailing Activities and Newsletter Mailing. Submission Deadline: March 20th, 2013. Notification Date: April 30th, 2013. Camera-Ready Deadline: May 12th, 2013. Workshop date: July 19th, 2013.

Workshop Data Mining in Ag DMA 2013 Data Mining on Sensor and Spatial Data from Agricultural Applications Analysis of Remote Sensor Data Feature Selection on Agricultural Data Evaluation of Data Mining Experiments Spatial Autocorrelation in Agricultural Data Submission Deadline: March 20th, 2013 Notification Date: April 30th, 2013 Camera-Ready Deadline: May 12th, 2013 Workshop date: July 19th, 2013

Page 29:

Let X be the N×n data table with rows $x_i$ and columns $X_j$, and let $d = (d_1,\ldots,d_n)$ be a unit vector. Applying d to each row gives the functional as a matrix product:

$X \circ d \;=\; \begin{pmatrix} x_1 \circ d \\ x_2 \circ d \\ \vdots \\ x_N \circ d \end{pmatrix} \;=\; \mathrm{DPP}_d(X)$

The variance of this functional is the quadratic form

$\mathrm{VarDPP}_d(X) \;=\; d^{T}\, VX\, d \;=\; \sum_{i,j} VX_{ij}\, d_i d_j,$

where $VX$ is the n×n matrix with entries $VX_{ij} = \overline{X_iX_j} - \overline{X_i}\,\overline{X_j}$ (an overbar denotes the column average).

Viewing the outer product $dd^T$ as an n²-vector: $|dd^T| = 1$ iff $|d| = 1$ (so $dd^T$ is a unit vector iff d is a unit vector).

The "if" direction: if $|d| = 1$ then $|dd^T| = 1$:

$|dd^T| = \sqrt{\sum_{i=1}^{n} d_i^2 d_1^2 + \sum_{i=1}^{n} d_i^2 d_2^2 + \cdots + \sum_{i=1}^{n} d_i^2 d_n^2} = \sqrt{\sum_{j=1}^{n}\Big(\sum_{i=1}^{n} d_i^2\Big) d_j^2} = \sqrt{\sum_{j=1}^{n} 1\cdot d_j^2} = 1$

The converse: if $|dd^T| = 1$ then $|d| = 1$:

$1 = |dd^T| = \sqrt{\Big(\sum_{i=1}^{n} d_i^2\Big)\Big(\sum_{j=1}^{n} d_j^2\Big)} = \sqrt{\Big(\sum_{i=1}^{n} d_i^2\Big)^2} = \sum_{i=1}^{n} d_i^2$

Page 30:

Dot Product Projection: DPPd(y) ≡ (y-p)od, where the unit vector d can be obtained as d = (p-q)/|p-q| for points p and q.
Square Distance Functional: SDp(y) ≡ (y-p)o(y-p)

FAUST Functional-Gap clustering (FAUST=Functional Analytic Unsupervised and Supervised machine Teaching)

relies on choosing a distance-dominating functional (a map F to R1 s.t. |F(x)-F(y)| ≤ Dis(x,y) for all x,y), so that any F-gap implies a linear cluster break.

Coordinate Projection is the simplest DPP: ej(y) ≡ yj

Dot Product Radius: DPRpq(y) ≡ √(SDp(y) - DPPpq(y)²)
Square Dot Product Radius: SDPRpq(y) ≡ SDp(y) - DPPpq(y)²
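These functionals can be written down directly; a minimal sketch (the helper names are mine):

```python
import numpy as np

def sd(y, p):
    """Square Distance Functional: SDp(y) = (y-p) o (y-p)."""
    v = np.asarray(y, float) - np.asarray(p, float)
    return float(v @ v)

def dpp_pq(y, p, q):
    """DPPpq(y) = (y-p) o (q-p)/|q-p|."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = (q - p) / np.linalg.norm(q - p)
    return float((np.asarray(y, float) - p) @ d)

def sdpr(y, p, q):
    """Square Dot Product Radius: SDp(y) - DPPpq(y)^2, the squared distance
    from y to the line through p and q (by the Pythagorean theorem)."""
    return sd(y, p) - dpp_pq(y, p, q) ** 2
```

For y=(3,4), p=(0,0), q=(1,0): SD=25, DPP=3, SDPR=16 (so DPR=4).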

Note: The same DPPd gaps are revealed by DPd(y) ≡ yod, since (y-p)od = yod - pod, and thus DP just shifts all DPP values by pod. Next: finding a good unit vector, d, for the Dot Product functional, DPP, to maximize gaps.

$\mathrm{VarDPP}_d(X) \;=\; \overline{(X\circ d)^2} - \big(\overline{X\circ d}\big)^2$

$= \frac{1}{N}\sum_{i=1}^{N}\Big(\sum_{j=1}^{n} x_{i,j} d_j\Big)^2 \;-\; \Big(\sum_{j=1}^{n}\overline{X_j}\, d_j\Big)^2$

$= \frac{1}{N}\sum_{i=1}^{N}\Big(\sum_{j=1}^{n} x_{i,j} d_j\Big)\Big(\sum_{k=1}^{n} x_{i,k} d_k\Big) \;-\; \Big(\sum_{j=1}^{n}\overline{X_j} d_j\Big)\Big(\sum_{k=1}^{n}\overline{X_k} d_k\Big)$

$= \frac{1}{N}\sum_{i=1}^{N}\Big(\sum_{j=1}^{n} x_{i,j}^2 d_j^2 + 2\sum_{j<k} x_{i,j} x_{i,k} d_j d_k\Big) \;-\; \Big(\sum_{j=1}^{n}\overline{X_j}^2 d_j^2 + 2\sum_{j<k}\overline{X_j}\,\overline{X_k}\, d_j d_k\Big)$

$= \sum_{j=1}^{n}\big(\overline{X_j^2} - \overline{X_j}^2\big) d_j^2 \;+\; 2\sum_{j<k}\big(\overline{X_jX_k} - \overline{X_j}\,\overline{X_k}\big) d_j d_k$

As before, with X the N×n data table and d a unit vector: $X \circ d = (x_1\circ d,\ \ldots,\ x_N\circ d)^T = \mathrm{DPP}_d(X)$, and $\mathrm{VarDPP}_d(X) = d^T\, VX\, d$, where $VX_{ij} = \overline{X_iX_j} - \overline{X_i}\,\overline{X_j}$.

Algorithm-3 (an optimum): Find the d producing maximum VarDPPd(X). View the n×n matrices VX and dd^T as n²-vectors. Then V = VX o dd^T as n²-vectors, and the unit n²-vector forming the minimum angle (angle = 0) with VX is VX/|VX|; so we want d such that dd^T forms the minimum angle with VX. Maximize F(d) = VX o dd^T / |VX|; since |VX| is constant, maximize F(d) = VX o dd^T, subject to Σ(i=1..n) di² = 1.
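Under the hood this is the classical principal-eigenvector problem: maximizing the quadratic form d^T·VX·d over unit vectors d is solved by the eigenvector of VX with the largest eigenvalue. A sketch using numpy (my formulation, not the slides' n²-vector argument; the function name is mine):

```python
import numpy as np

def best_d(X):
    """Unit d maximizing VarDPP_d(X) = d^T VX d: the eigenvector of the
    covariance matrix VX belonging to its largest eigenvalue."""
    VX = np.cov(np.asarray(X, float), rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(VX)   # eigenvalues in ascending order
    return eigvecs[:, -1]                   # top eigenvector (sign arbitrary)
```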

Method-1: Maximize VarDPPd(X) with respect to d. Let an overbar denote the column average. Then $\mathrm{VarDPP}_d(X) = \overline{(X\circ d)^2} - \big(\overline{X\circ d}\big)^2$.

Algorithm-2 (a heuristic): Find k such that $\overline{X_k^2} - \overline{X_k}^2$ is maximum; set dk = 1, dh = 0 for h≠k. (We've already done this: it is using the ek with maximum standard deviation.)

Algorithm-1 (a heuristic): Compute the vector $Y = (\overline{X_1^2}-\overline{X_1}^2,\ \ldots,\ \overline{X_n^2}-\overline{X_n}^2)$. The unit vector A ≡ (a1,...,an) maximizing YoA is A = Y/|Y|. So let $D \equiv (\sqrt{\overline{X_1^2}-\overline{X_1}^2},\ \ldots,\ \sqrt{\overline{X_n^2}-\overline{X_n}^2})$ and d ≡ D/|D|. Remove outliers first?
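Algorithm-1 amounts to weighting each coordinate by its standard deviation; a sketch (the function name is mine, and it assumes outliers were removed first):

```python
import numpy as np

def algorithm1_d(X):
    """D_j = sqrt(mean(X_j^2) - mean(X_j)^2), i.e. the column standard
    deviation; the heuristic direction is d = D/|D|."""
    X = np.asarray(X, float)
    D = np.sqrt((X ** 2).mean(axis=0) - X.mean(axis=0) ** 2)
    return D / np.linalg.norm(D)
```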

Page 31:

F Ct: 0:1 2:3 3:2 4:4 5:4 6:6 7:8 8:3 9:10 10:5 12:2 13:2 18:1 21:1 22:1 23:1 24:1 28:2 29:1 30:1 31:3 32:3 33:2 34:3 35:5 37:3 38:3 39:3 40:1 41:5 42:4 43:5 44:5 45:7 47:3 48:1 49:3 51:6 52:3 53:3 54:1 55:4 56:2 57:1 58:3 59:1 60:1 62:2 65:1 66:2 68:1 69:1

In the neighborhood of F=15:

F:   18  21  22  23  24
    i39 e49  e8 e44 e11
i39   0  17  21  21  24
e49  17   0   4   4   7
e8   21   4   0   1   5
e44  21   4   1   0   4
e11  24   7   5   4   0

i39, e49, e11 are outliers (gap=(13,28), width 15). {e8,e44} is a doubleton outlier set.

Sparse Hi end of CLUS.2:

F:   59  60  62  62  65  66  66  68  69
    i31  i8 i36 i10  i6 i23 i32 i19 i18
i31   0   3   5  10   6   7  12  10  12
i8    3   0   7  10   5   6  11   9  11
i36   5   7   0   8   5   7   9   9  10
i10  10  10   8   0  10  12   9  14   9
i6    6   5   5  10   0   3   9   5   8
i23   7   6   7  12   3   0  11   4  10
i32  12  11   9   9   9  11   0  13   4
i19  10   9   9  14   5   4  13   0  12
i18  12  11  10   9   8  10   4  12   0

i10, i32, i19, i18 are outliers. {i6,i23} is a doubleton outlier set.

F Ct on CLUS.2: 0:2 1:1 2:1 3:3 4:2 5:4 6:3 7:2 8:1 9:3 10:2 11:2 12:3 13:4 14:3 15:7 16:5 17:6 18:2 19:1 20:3 21:2 22:2 23:4 24:4 25:1 26:3 27:2 28:1 29:3 30:3 31:1 32:1 35:1

Sparse Hi end:

F:   30  30  30  31  32  35
     i3 i26 i44 i31  i8 i36
i3    0   4   4   5   5   7
i26   4   0   6   5   4   7
i44   4   6   0   8   9   9
i31   5   5   8   0   3   5
i8    5   4   9   3   0   7
i36   7   7   9   5   7   0

i36 (the point at F=35) is an outlier.

F Ct on CLUS.2.1: 0:1 1:2 2:1 3:4 4:1 5:4 6:3 7:2 8:1 9:2 10:4 11:1 12:4 13:2 14:5 15:9 16:4 17:3 18:3

10F Ct on CLUS.2.1.2: 0:1 4:1 15:1 24:1 27:3 34:1 35:1 36:1 43:1 44:1 46:1 53:1 54:1 55:2 56:1 57:1 58:1 69:1 76:2 88:1

__________(20,30) 2 virg; 2 vers (all outliers?)

__________(30,40) 0 virg; 3 vers (all outliers?)

__________(40,50) 1 virg; 2 vers (all outliers?)

__________(50,60) 4 virg; 3 vers

__________ [0.20) 2 virg; 1 vers (all outliers)

__________(60,88] 1 virg; 3 vers (all outliers)

Algorithm-2: Take the ei corresponding to max STD(Xi). STD(PL)=17, more than twice the others, so F(x)=x3.

__________CLUS.1 <25 1 virg; 50 seto

__________25< CLUS.2 <49: 3 virg, 46 vers

49< CLUS.3 <70: 46 virg, 4 vers. But would one pick out 49 as a gap/thinning?

F Ct: 10:1 11:1 12:2 13:7 14:12 15:14 16:7 17:4 18:1 19:2 30:1 33:2 35:2 36:1 37:1 38:1 39:3 40:5 41:3 42:4 43:2 44:4 45:8 46:3 47:5 48:3 49:5 50:4 51:8 52:2 53:2 54:2 55:3 56:6 57:3 58:3 59:2 60:2 61:3 63:1 64:1 66:1 67:2 69:1

Algorithm-1 (a heuristic): Compute the vector $Y = (\overline{X_1^2}-\overline{X_1}^2,\ \ldots,\ \overline{X_n^2}-\overline{X_n}^2)$. The unit vector A ≡ (a1,...,an) maximizing YoA is A = Y/|Y|. So let $D \equiv (\sqrt{\overline{X_1^2}-\overline{X_1}^2},\ \ldots,\ \sqrt{\overline{X_n^2}-\overline{X_n}^2})$ and d ≡ D/|D|. F(x) = xod

_________CLUS.1 <15 (50 Setosa)

CLUS.2 >15 (50 Versicolor, 50 Virginica)

_________CLUS.2.1 <19 (42 Vers, 14 Virg)

CLUS.2.2 ≥19 (2 Vers, 29 Virg)

CLUS.2.1.1 <14 (30 Vers, 2 Virg)

CLUS.2.1.2 ≥14 (12 Vers, 12 Virg)

Page 32:

F Ct: 0:1 2:3 3:2 4:4 5:4 6:6 7:8 8:3 9:10 10:5 12:2 13:2 18:1 21:1 22:1 23:1 24:1 28:2 29:1 30:1 31:3 32:3 33:2 34:3 35:5 37:3 38:3 39:3 40:1 41:5 42:4 43:5 44:5 45:7 47:3 48:1 49:3 51:6 52:3 53:3 54:1 55:4 56:2 57:1 58:3 59:1 60:1 62:2 65:1 66:2 68:1 69:1

In the neighborhood of F=15:

F:   18  21  22  23  24
    i39 e49  e8 e44 e11
i39   0  17  21  21  24
e49  17   0   4   4   7
e8   21   4   0   1   5
e44  21   4   1   0   4
e11  24   7   5   4   0

i39, e49, e11 are outliers (gap=(13,28), width 15). {e8,e44} is a doubleton outlier set.

Sparse Hi end of CLUS.2:

F:   59  60  62  62  65  66  66  68  69
    i31  i8 i36 i10  i6 i23 i32 i19 i18
i31   0   3   5  10   6   7  12  10  12
i8    3   0   7  10   5   6  11   9  11
i36   5   7   0   8   5   7   9   9  10
i10  10  10   8   0  10  12   9  14   9
i6    6   5   5  10   0   3   9   5   8
i23   7   6   7  12   3   0  11   4  10
i32  12  11   9   9   9  11   0  13   4
i19  10   9   9  14   5   4  13   0  12
i18  12  11  10   9   8  10   4  12   0

i10, i32, i19, i18 are outliers. {i6,i23} is a doubleton outlier set.

CLUS2, 2(F-mn). F Ct: 0:2 2:1 4:1 5:2 6:1 7:1 8:1 9:1 10:3 12:4 14:2 15:1 17:2 18:1 19:1 20:1 21:2 23:2 24:1 25:1 26:2 27:2 28:2 29:5 30:1 31:4 32:2 33:2 34:4 36:1 37:1 39:2 40:1 42:2 43:1 44:1 45:1 46:2 47:3 48:2 51:3 52:1 54:2 55:1 57:3 59:2 60:1 62:1 64:1 70:1

Algorithm-1 (a heuristic): Compute the vector $(\overline{X_1^2}-\overline{X_1}^2,\ \ldots,\ \overline{X_n^2}-\overline{X_n}^2)$. Redo CLUS2, spreading the F-values out as 2(F-min). $D \equiv (\sqrt{\overline{X_1^2}-\overline{X_1}^2},\ \ldots,\ \sqrt{\overline{X_n^2}-\overline{X_n}^2})$ and d ≡ D/|D|. F(x) = xod

_________CLUS.1 <15 (50 Setosa)

CLUS.2 >15 (50 Versicolor, 50 Virginica)

_________[0,1) 2 Vers, 0 Virg, all outliers

_________[1,2] 1 Vers, 0 Virg, outlier

________(2,11) 9 Vers, 1 Virg

________[11,12] 4 Vers, 0 Virg, quad outlier set?

_________[14,16) 3 Vers, 0 Virg all outliers?

_________(16,22) 7 Vers, 0 Virg || [0,22) 26 Vers, 1 Virg

_________(22,31) 13 Vers, 3 Virg || [0,31) 39 Vers, 4 Virg

_________[31,35) 4 Vers, 8 Virg

_________(35,38) 1 Vers, 1 Virg outliers

_________(38,41) 2 Vers, 1 Virg outliers?

_________(41,50) 0 Vers, 12 Virg

_________(50,53) 0 Vers, 4 Virg

_________(53,56) 0 Vers, 3 Virg, outliers?

_________(56,58) 0 Vers, 3 Virg, outliers?

_________(58,61) 0 Vers, 3 Virg outliers?

_________(61,71) 0 Vers, 3 Virg outlier

Page 33:

F Ct: 0:1 1:1 2:1 8:1 20:1 25:1 26:1 28:1 30:1 32:2 34:1 36:1 37:2 38:1 39:2 40:2 41:1 42:4 43:4 44:2 45:3 47:1 48:3 49:3 50:2 51:4 52:2 53:4 54:3 55:1 56:4 57:2 58:3 59:1 60:1 62:1 63:1 64:1 65:1 66:2 67:5 69:5 70:1 71:2 72:1 73:3 74:2 75:3 76:1 77:2 78:3 79:2 80:4 81:4 82:2 84:3 85:3 86:2 87:1 88:1 89:1 90:2 92:7 93:1 95:2 96:2 97:2 98:3 102:1 103:2 104:1 109:1 120:1

Algorithm-1: Compute $(\overline{X_1^2}-\overline{X_1}^2,\ \ldots,\ \overline{X_n^2}-\overline{X_n}^2)$, applied to Sat150 (the Satlog dataset with 150 pixels). $D \equiv (\sqrt{\overline{X_1^2}-\overline{X_1}^2},\ \ldots,\ \sqrt{\overline{X_n^2}-\overline{X_n}^2})$ and d ≡ D/|D|. F(x) = xod

_________[0,5) 3c=5

_________[5,14) 1c=5

_________[14,23) 1c=7

_________[23,27) 1c=2 1c=7

_________[27,31) 2c=7

_________[31,35) 1c=1 2c=7

_________[35,46) 10c=2 1c=4 1c=5 10c=7

_________[46,61) 2c=1 19c=2 1c=4 5c=5 7c=7

_________[61.68) 2c=1 1c=2 7c=4 1c=7

________ [68,83) 11c=1 1c=2 17c=3 3c=4 3c=7

_________[83,91) 13c=3

_________[91,94) 1c=1 7c=3

_________[94,100) 1c=1 8c=3

_________[100,106) 1c=1 3c=3

_________[106,115) 1c=3

_________[115,121) 1c=3

This Satlog dataset is 150 rows (pixels) and 4 feature columns (R, G, IR1, IR2). There are 6 row-classes with row counts as follows:

Count Class# Class Description
19    c=1    red soil
32    c=2    cotton crop
50    c=3    grey soil
12    c=4    damp grey soil
10    c=5    soil with vegetation stubble
27    c=7    very damp grey soil

There are no significant gaps.

There is some localization of classes with respect to F, but in a strictly unsupervised setting, that would be impossible to detect.

This is somewhat expected since the changes in ground cover class are gradual and smooth (in general) so that classes butt up against one-another (no gaps between them).

Page 34:

[F-MN/4 table (F-value : count) for Concrete149 omitted; numeric columns garbled in extraction]

Algorithm-1: on Concrete149 (Strength=ClassLabel, Mix, Water, FineAgregate, Age). D ≡ ( √(mean(X1²)−mean(X1)²), ..., √(mean(Xn²)−mean(Xn)²) ) and d ≡ D/|D|. F(x) = x∘d.

_________[0,4) 1c=2

Concrete149 dataset has 149 rows; 1 class column and 4 feature columns (ST, MX, WA, FA, AG).
There are 4 Strength classes with row counts as follows:
Count Class# Class Description (Concrete Strength of ...)
19 c=0 [0,10)
32 c=2 [20,30)
50 c=4 [40,50)
12 c=6 [60,100)

I deleted Strength= 10's, 30's, 50's so as to introduce gaps to identify.

I really didn't find any!!

_________[4,9) 1c=4

_________[9,13) 1c=0 1c=2 1c=4

_________[13,19) 1c=0 4c=2 2c=4 1c=6

_________[19,28) 1c=0 13c=2 6c=4 3c=6

_________[28,31) 1c=0 1c=2 1c=4 3c=6

_________[31,34) 1c=0 13c=2 6c=4

_________[34,44) 1c=0 16c=2 7c=4 6c=6

_________[44,55) 1c=0 1c=2 10c=4 18c=6

_________[55,64) 1c=0 13c=2 6c=4 7c=6

_________[64,70) 1c=0 1c=2 6c=4 4c=6

_________[70,78) 1c=0 1c=2 3c=4 10c=6

_________[78,81) 1c=0 13c=2 1c=4 4c=6

Page 35:

[F=MX table (value : count) for Concrete149 omitted; numeric columns garbled in extraction]

Algorithm-2: on Concrete149(Strength=ClassLabel,Mix,Water,FineAgregate,Age)

STD(MX)=101, STD(WA)=28, STD(FA)=99, STD(AG)=81, so we pick MX (the attribute with maximum STD).

_________[0,5) 1c=0 1c=2 2c=4

_________[5,10) 1c=0 4c=2

_________[10,16) 1c=0 7c=2 7c=4
_________[16,19) 1c=0 3c=2 6c=4
_________[19,21) 1c=0 3c=2 1c=4

_________[21,25) 1c=0 10c=2 3c=4
_________[25,29) 1c=0 2c=2 1c=4
_________[29,31) 1c=0 1c=2 3c=4
_________[31,39) 1c=0 3c=2 3c=4

_________[39,45) 1c=0 2c=2 3c=4 4c=6

_________[45,52) 1c=0 2c=2 3c=4 3c=6
_________[52,56) 1c=0 1c=2 3c=4 12c=6

_________[56,64) 1c=0 3c=2 8c=4 8c=6
_________[64,66) 1c=0 2c=2 1c=4 3c=6

_________[66,78) 1c=0 2c=2 8c=4 15c=6

_________[78,90) 1c=0 2c=2 5c=4 6c=6

_________[90,101) 1c=0 2c=2 2c=4 1c=6

[F=FA table (value : count) for Concrete149 omitted; numeric columns garbled in extraction]

_________[0,2) 1c=0 1c=2 14c=4 1c=6

_________[2,10) 1c=0 2c=2 2c=4 2c=6

_________[10,25) 1c=0 1c=2 8c=4 3c=6
_________[25,30) 1c=0 1c=2 14c=4 3c=6
_________[30,43) 1c=0 6c=2 3c=4 19c=6

_________[43,50) 1c=0 6c=2 4c=4 3c=6

_________[50,55) 2c=0 6c=2 5c=4 6c=6
_________[55,60) 1c=0 1c=2 2c=4 3c=6

_________[60,70) 1c=0 10c=2 6c=4 6c=6

_________[70,80) 1c=0 4c=2 7c=4 11c=6
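Algorithm-2 as applied here amounts to taking d = e_k for the attribute k with maximum STD, so F(X) is simply that column. A minimal sketch; the deterministic stand-in data and helper names are assumptions, and only the four STD values come from the slide:

```python
import numpy as np

def algorithm2_pick_axis(X, names):
    """Algorithm-2: choose the coordinate axis e_k whose attribute has
    the maximum STD; F(X) = X o e_k is then just column k of X."""
    k = int(np.argmax(X.std(axis=0)))
    return names[k], X[:, k]

# Deterministic stand-in data whose column STDs scale exactly like the
# slide's values STD(MX)=101, STD(WA)=28, STD(FA)=99, STD(AG)=81.
pattern = np.resize(np.array([-1.0, 1.0]), 149)   # fixed zero-mean pattern
cols = ["MX", "WA", "FA", "AG"]
X = np.column_stack([s * pattern for s in (101.0, 28.0, 99.0, 81.0)])
picked, F = algorithm2_pick_axis(X, cols)
```

Because every column is the same pattern scaled by its STD, the argmax is deterministic and MX is picked, matching the slide's choice.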

d = STDVector = (101, 28, 99, 81) divided by its length.
[d=StdVec table (F-value : count) omitted; numeric columns garbled in extraction]

_________[0,14) 1c=0 2c=2 2c=4

_________[14,19) 1c=0 4c=2 2c=4 1c=6

_________[19,28) 1c=0 13c=2 6c=4 3c=6

_________[28,31) 1c=0 1c=2 1c=4 1c=6

_________[31,34) 1c=0 4c=2 6c=4 1c=6

_________[34,44) 1c=0 12c=2 7c=4 6c=6

_________[44,55) 1c=0 1c=2 10c=4 18c=6

_________[55,64) 1c=0 4c=2 6c=4 7c=6

_________[64,70) 1c=0 1c=2 6c=4 4c=6

_________[70,78) 1c=0 1c=2 3c=4 10c=6

_________[78,81) 1c=0 1c=2 1c=4 4c=6

_________[81,88) 1c=0 1c=2 2c=4 2c=6
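The d = STDVector normalization used above can be checked numerically; D = (101, 28, 99, 81) is from the slide, the rest is a sketch:

```python
import numpy as np

# Normalize the per-attribute STD vector D = (STD(MX), STD(WA), STD(FA), STD(AG))
# into the unit projection direction d = D/|D|.
D = np.array([101.0, 28.0, 99.0, 81.0])
d = D / np.linalg.norm(D)   # |D| = sqrt(101^2 + 28^2 + 99^2 + 81^2), about 165.4
```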

Page 36:

[F table (value : count) for SEEDS210 omitted; numeric columns garbled in extraction]

Alg-1: on SEEDS210(CLS123, area, compact, asym_coef, len_kernel_groove)

The classes are

Class1 = Kama

Class2 = Rosa

Class3 = Canadian

_________[56,58) 1c=1 0c=2 0c=3
_________[58,60) 3c=1 0c=2 2c=3

_________[60,72) 59c=1 2c=2 51c=3

_________[72,76) 3c=1 14c=2 11c=3

_________[76,81) 4c=1 37c=2 4c=3

_________[81,91) 0c=1 17c=2 2c=3

Page 37:

[F table (value : count) for SEEDS210 with 7 features omitted; numeric columns garbled in extraction]

Alg-1: on SEEDS210

(CLS123, area, perimeter, compact, kern_length, kern_width, asym_coef, len_kernel_groove)

_________[85,92) 6c=1 0c=2 6c=3

_________[92,106) 60c=1 4c=2 61c=3

_________[106,112) 4c=1 33c=2 3c=3

_________[112,126) 0c=1 33c=2 0c=3

Page 38:

[F table (value : count) for WINE_Quality150 omitted; numeric columns garbled in extraction]

Alg-1: on WINE_Quality150 (150 wine samples, 4 feature columns, 0-10 quality levels (only 4-7 occur))

_________[6,15) 1c=4 12c=5 8c=6 3c=7

_________[15,17) 0c=4 2c=5 3c=6 1c=7

_________[17,23) 1c=4 13c=5 5c=6 1c=7

_________[23,30) 1c=4 13c=5 6c=6 3c=7

_________[30,39) 1c=4 15c=5 6c=6 3c=7

_________[39,51) 1c=4 13c=5 4c=6 3c=7

_________[51,59) 1c=4 10c=5 1c=6 1c=7
_________[59,63) 1c=4 2c=5 8c=6 3c=7

_________[63,71) 1c=4 3c=5 2c=6 3c=7

_________[71,79) 1c=4 4c=5 8c=6 3c=7

Page 39:

[F table (value : count) for WN150468 omitted; numeric columns garbled in extraction]

Alg-1: on WN150468 (149 wine samples, 4 features (highest std/(max-min)), 4,6,8 quality levels only)

_________[0,31) 22c=4 85c=6 14c=8

_________[31,41) 4c=4 14c=6

_________[41,51) 1c=4 1c=6 3c=8

_________[51,91) 1c=4 3c=6

Page 40:

[F table (value : count) for WN150468, L/H split, omitted; numeric columns garbled in extraction]

Alg-1: on WN150468 (149 wine samples, 4 features (highest std/(max-min)), L=3+4, H=7+8, 2 quality levels only)

_________[0,21) 23c=L 26c=H

_________[21,37) 13c=L 37c=H

_________[37,42) 2c=L 3c=H

_________[42,60) 10c=L 13c=H

_________[60,72) 3c=L 2c=H

_________[72,80) 2c=L 2c=H

_________[80,100) 2c=L 6c=H

_________[100,122) 2c=L 2c=H

DPPMinVec-MaxVec: on WN150468 (149 wine samples, 4 features (highest std/(max-min)), L=3+4, H=7+8, 2 quality levels only)

[F table (value : count) for DPPMinVec-MaxVec omitted; numeric columns garbled in extraction]

_________[0,11) 38c=L 67c=H

_________[11,20) 13c=L 16c=H

_________[20,30) 4c=L 8c=H
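DPPMinVec-MaxVec presumably projects onto the diagonal from the vector of column minima (MinVec) toward the vector of column maxima (MaxVec); that reading of the name is an assumption, and the uniform stand-in data below is illustrative, not the WN150468 rows:

```python
import numpy as np

def dpp_minvec_maxvec(X):
    """Project rows of X onto the unit vector along MaxVec - MinVec,
    i.e. the diagonal from the column minima to the column maxima
    (this interpretation of 'DPPMinVec-MaxVec' is assumed)."""
    D = X.max(axis=0) - X.min(axis=0)   # MaxVec - MinVec: per-column ranges
    d = D / np.linalg.norm(D)
    return X @ d

rng = np.random.default_rng(2)
X = rng.uniform(0, 100, (149, 4))       # stand-in for a 149-row, 4-feature set
F = dpp_minvec_maxvec(X)
```

This direction needs no statistics beyond per-column min and max, which makes it cheaper than the STD-based d of Algorithm-1.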