
A data set of 52 countries with the following attributes:

Attributes:

GNI per capita = gross national income per capita

GNP growth rate = growth rate of GNP (in percent)

Debt/export (%) = debt / export revenue

Reserves/imports (%) = a country's reserves / its level of imports

Inflation = inflation rate (%)

CC = current account = the sum of the trade balance (exports minus imports of goods and services), income from production and transfer payments (such as foreign aid) (% of GDP)

FDI = foreign direct investment (% of GDP)

Trade = sum of exports and imports of goods and services (% of GDP)

In the file tari.csv these columns are named GNI, GDPgrowthrate, Debt.export, Reserves.imports, Inflation, CCbalance, FDI and Trade. They are preceded by Country (the country name) and followed by RiskClass (the country's risk class, which takes the values 3, 5, 6 and 7 in this data).

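> library(e1071)

> library(scales)

> exem <- read.csv("C:/tari.csv")

> exem

> set.seed(5)

> km <- kmeans(exem[,2:5], 4)

K-means with 4 clusters is applied only to columns 2 to 5 (GNI, GDPgrowthrate, Debt.export, Reserves.imports), on the raw values. Because the data are not standardized, the variables measured on the largest scales (GNI and Debt.export) dominate the Euclidean distances.

> km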

K-means clustering with 4 clusters of sizes 6, 11, 31, 4

Cluster means:

        GNI GDPgrowthrate Debt.export Reserves.imports
1  627.1850      3.450000   1570.7983         40.97167
2 3377.0955      2.454545    149.0145         41.50000
3  623.5642      6.621290    220.6948         39.30097
4 7436.6600      7.070000     54.0650        114.36250

Clustering vector:

[1] 2 2 3 3 2 3 1 3 3 3 1 3 3 3 2 4 3 4 3 1 3 3 1 3 3 1 4 2 2 3 3 3 3 3 2 1 3

[38] 3 4 2 2 3 2 3 3 3 3 2 3 3 3 3

Within cluster sum of squares by cluster:

[1] 3092095 10841994 6159941 8727649

(between_SS / total_SS = 88.2 %)

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss"

[6] "betweenss" "size" "iter" "ifault"
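The ratio between_SS / total_SS printed above is betweenss / totss, and kmeans() guarantees that totss = betweenss + tot.withinss. The sketch below checks this on the built-in iris measurements, which stand in for the country data (an assumption, since tari.csv is not available here), and also runs k-means on standardized columns, one way to keep large-scale variables such as GNI from dominating the distances.

```r
# Sketch only: iris[, 1:4] stands in for the country data (an assumption).
set.seed(5)
x <- iris[, 1:4]

# Raw variables, as in kmeans(exem[,2:5], 4) above; nstart = 10 tries 10 random starts.
km_raw <- kmeans(x, centers = 3, nstart = 10)

# totss = betweenss + tot.withinss, so this ratio is the share of the total
# sum of squares that lies between the clusters.
ratio <- km_raw$betweenss / km_raw$totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))

# totss is the sum of squared deviations from the column means.
totss_manual <- sum(scale(x, center = TRUE, scale = FALSE)^2)

# Standardizing first puts all variables on a comparable scale.
km_std <- kmeans(scale(x), centers = 3, nstart = 10)
table(raw = km_raw$cluster, standardized = km_std$cluster)
```

Cluster numbers in the two runs are arbitrary, so the cross-table is read by which cells are large, not by its diagonal.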

> plot(exem[,2], exem[,3], col=km$cluster)

[Figure: exem[, 2] (GNI) on the x axis against exem[, 3] (GDPgrowthrate) on the y axis, points colored by k-means cluster]

> table(km$cluster, exem$RiskClass)

This displays the contingency table of the k-means clusters (rows) against the known risk classes (columns):

     3  5  6  7
  1  0  1  0  5
  2  1  1  6  3
  3  0  2 11 18
  4  0  1  2  1

> km$cluster

[1] 2 2 3 3 2 3 1 3 3 3 1 3 3 3 2 4 3 4 3 1 3 3 1 3 3 1 4 2 2 3 3 3 3 3 2 1 3

[38] 3 4 2 2 3 2 3 3 3 3 2 3 3 3 3

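> o <- order(km$cluster)

> data.frame(exem$Country[o], km$cluster[o])

This lists the countries grouped by cluster. Within each cluster they keep their original (alphabetical) order, because order() leaves ties in their original order.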

exem.Country.o. km.cluster.o.

1 Burundi 1

2 Comoros 1

3 Guinea-Bissau 1

4 Kiribati 1

5 Liberia 1

6 Sao Tome and Principe 1

7 Albania 2

8 Algeria 2

9 Belarus 2

10 Dominica 2

11 Maldives 2

12 Marshall Islands 2

13 Samoa 2

14 St. Lucia 2

15 St. Vincent and the Grenadines 2

16 Swaziland 2

17 Tonga 2

18 Angola 3

19 Bangladesh 3

20 Bhutan 3

21 Cambodia 3

22 Central African Republic 3

23 Chad 3

24 Congo, Dem. Rep. 3

25 Cote d'Ivoire 3

26 Djibouti 3

27 Ethiopia 3

28 Guinea 3

29 Guyana 3

30 Haiti 3

31 Kyrgyz Republic 3

32 Lao PDR 3

33 Mauritania 3

34 Myanmar 3

35 Nepal 3

36 Niger 3

37 Rwanda 3

38 Sierra Leone 3

39 Solomon Islands 3

40 Sudan 3

41 Syrian Arab Republic 3

42 Tajikistan 3

43 Tanzania 3

44 Togo 3

45 Uzbekistan 3

46 Vanuatu 3

47 Yemen, Rep. 3

48 Zambia 3

49 Equatorial Guinea 4

50 Gabon 4

51 Libya 4

52 St. Kitts and Nevis 4
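Instead of order(), split() returns the members of each cluster as a list, one element per cluster label. The toy data frame below, with made-up values, is a hypothetical stand-in for exem and km.

```r
# Sketch: listing cluster members with split() instead of order().
# Assumption: `toy` and its values are hypothetical, not the country data.
toy <- data.frame(
  Country = c("A", "B", "C", "D", "E", "F"),
  GNI     = c(500, 520, 3300, 3400, 7400, 7500)
)
set.seed(1)
cl <- kmeans(toy$GNI, centers = 3, nstart = 10)$cluster

# One character vector of country names per cluster label.
members <- split(toy$Country, cl)
members

# Cluster sizes, in the same order as the list above.
lengths(members)
```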

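INTERPRETATION: According to the k-means clustering, cluster 1 contains Burundi, Comoros, Guinea-Bissau, Kiribati, Liberia and Sao Tome and Principe. The cluster numbers are arbitrary labels assigned by kmeans(), so they do not rank the clusters by risk; the contingency table above shows that five of these six countries are in risk class 7 and one is in risk class 5.

Cluster 2: Albania, Algeria, Belarus, Dominica, Maldives, Marshall Islands, Samoa, St. Lucia, St. Vincent and the Grenadines, Swaziland and Tonga (1 country in risk class 3, 1 in class 5, 6 in class 6 and 3 in class 7).

Clusters 3 and 4 are read in the same way; for example, cluster 4 (Equatorial Guinea, Gabon, Libya, St. Kitts and Nevis) contains 1 country in risk class 5, 2 in class 6 and 1 in class 7.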

> plot(exem, col = km$cluster)

[Figure: scatter-plot matrix of all columns (Country, GNI, GDPgrowthrate, Debt.export, Reserves.imports, Inflation, CCbalance, FDI, Trade, RiskClass), points colored by k-means cluster]

> plot(exem$GNI, exem$GDPgrowthrate, xlab="GNI", ylab="GDP gr rate", col=km$cluster)

[Figure: GNI on the x axis against GDP growth rate on the y axis, points colored by k-means cluster]

> text(x=exem$GNI, y=exem$GDPgrowthrate, labels=exem$Country, col=km$cluster)

text() writes each country's name at its point on the current plot, in the color of its cluster.

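Hierarchical clustering:

> d <- dist(exem[,-10], method = "euclidean")

> d

Or:

> d <- dist(exem[,-10], method = "manhattan")

> d

exem[,-10] removes RiskClass but still contains Country; dist() coerces that non-numeric column to NA (with a warning) and skips it. exem[,-c(1, 10)] passes only the numeric attributes explicitly.

> fit <- hclust(d, method="ward.D")

Computes the distances between clusters using Ward's method. According to the hclust() documentation, "ward.D" does not implement Ward's criterion on unsquared distances such as those returned by dist(); method = "ward.D2" does.

> plot(fit)

Displays the dendrogram.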

[Figure: Cluster Dendrogram, hclust (*, "ward.D"); leaves labeled with the row numbers of the 52 countries; y axis: Height]

> groups <- cutree(fit, k=4)

Cuts the dendrogram into 4 clusters.

> groups

Displays the cluster of each country:

[1] 1 1 2 2 1 2 3 2 3 2 3 2 2 2 4 4 2 4 2 3 2 2 3 2 2 3 4 1 1 2 2 2 2 3 1 3 3

[38] 2 4 4 4 2 1 2 2 2 2 1 2 2 2 2

> rect.hclust(fit, k=4, border="red")

Red rectangles outline the 4 clusters on the dendrogram.

[Figure: the same Cluster Dendrogram with the clusters outlined by red rectangles]
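A minimal sketch of the same pipeline on standardized numeric columns, assuming the built-in USArrests data in place of the country data and "ward.D2" in place of "ward.D". It also shows that cutree() with k = 4 gives the same partition as cutting at a height between the third and fourth largest merge heights.

```r
# Sketch: hierarchical clustering on standardized numeric columns only.
# Assumption: USArrests (built into R) stands in for the country data.
x <- scale(USArrests)                  # numeric columns only, standardized
d <- dist(x, method = "euclidean")     # pairwise distances
fit <- hclust(d, method = "ward.D2")   # Ward's criterion on unsquared distances
groups <- cutree(fit, k = 4)           # cluster label for every row
table(groups)

# Cutting strictly between the 4th and 3rd largest merge heights
# leaves exactly 4 clusters.
h <- mean(rev(fit$height)[3:4])
groups_h <- cutree(fit, h = h)
```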

> hcd <- as.dendrogram(fit)

> hcd

A summary of the dendrogram is printed:

'dendrogram' with 2 branches and 52 members total, at height 53389.49

> plot(hcd)

Another way to display the dendrogram.

> plot(hcd, type="triangle")

The dendrogram drawn with triangular branches.

To inspect the upper part of the dendrogram:

> op <- par(mfrow = c(2, 1))

par() sets graphical parameters; mfrow = c(nrows, ncols) splits the page into a grid of plots filled row by row, here 2 plots one above the other. par() returns the previous settings, which are saved in op.

> plot(cut(hcd, h = 1000)$upper, main = "Upper tree of cut at h=1000")

Shows the part of the dendrogram above height 1000.

> plot(cut(hcd, h = 1000)$lower[[2]], main = "Second branch of lower tree with cut at h=1000")

Shows the second branch of the dendrogram below height 1000.

[Figure, two panels: "Upper tree of cut at h=1000", with leaves Branch 1 to Branch 15; "Second branch of lower tree with cut at h=1000", with leaves 20, 7 and 26]

> par(op)

Restores the previous graphical settings.
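cut() splits a dendrogram at height h into $upper and a list $lower with one sub-dendrogram per branch cut at h. The sketch below, assuming USArrests instead of the country data and a height h = 3 chosen for that data, checks that these branches correspond to the clusters of cutree(fit, h = h).

```r
# Sketch: cutting a dendrogram with cut() and comparing it with cutree().
# Assumption: USArrests (built into R) stands in for the country data.
fit <- hclust(dist(scale(USArrests)), method = "ward.D2")
hcd <- as.dendrogram(fit)
h <- 3

parts <- cut(hcd, h = h)
# $upper: the tree above h; $lower: one sub-dendrogram per branch cut at h.
n_branches <- length(parts$lower)

# Each branch in $lower should match one cluster of cutree(fit, h = h).
n_clusters <- max(cutree(fit, h = h))
sizes_lower  <- sort(sapply(parts$lower, function(b) attr(b, "members")))
sizes_cutree <- sort(as.vector(table(cutree(fit, h = h))))
```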

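Another way to visualize the dendrogram is with the "ape" package. After installing it:

> library(ape)

> plot(as.phylo(fit), cex = 0.9, label.offset = 1)

as.phylo() converts the hclust object fit into a phylogenetic tree. The label.offset argument controls the space between the tips of the tree and their labels, and cex scales the label size.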

[Figure: the dendrogram drawn as a phylogenetic tree with plot(as.phylo(...))]

> plot(as.phylo(fit), type = "cladogram", cex = 0.9, label.offset = 1)

> plot(as.phylo(fit), type = "fan")

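The k-medoids algorithm

> library(fpc)

> scurt <- subset(exem, select = -c(1, 10))

I want to work with a data set without columns 1 (Country) and 10 (RiskClass).

> scurt

> scurt <- subset(exem, select = -c(1))

I remove only column 1 from the data set, because the k-medoids algorithm works only with numeric values. Note that this second assignment overwrites the first one, so scurt keeps RiskClass (now column 9), which is therefore used as a clustering variable below; it appears among the medoid coordinates.

> pamk.result <- pamk(scurt)

> pamk.result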

$pamobject

Medoids:

ID GNI GDPgrowthrate Debt.export Reserves.imports Inflation CCbalance

41 41 4798.21 2.13 125.72 23.87 3.733429 -18.53

52 52 574.60 5.34 213.21 21.27 18.324440 -8.35

FDI Trade RiskClass

41 7.27 88.55481 7

52 4.97 71.23470 5

Clustering vector:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

1 1 2 2 1 2 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 2 2 2

27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52

1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 2 2 2 2 2 2

Objective function:

build swap

917.9195 868.4497

Available components:

[1] "medoids" "id.med" "clustering" "objective" "isolation"

[6] "clusinfo" "silinfo" "diss" "call" "data"

$nc

[1] 2

$crit

[1] 0.0000000 0.6595510 0.5954880 0.6054187 0.6054221 0.4967805 0.4999702

[8] 0.5248114 0.4886549 0.5037766

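Since the number of clusters was not specified, pamk() chose it and split the data into 2 clusters. $crit holds the average silhouette width of the solution for each number of clusters from 1 to 10 (0 for a single cluster); the largest value, 0.6596, is reached at k = 2.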
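The same selection rule can be sketched with cluster::pam(): fit k-medoids for several k and keep the k with the largest average silhouette width. The cluster package (distributed with R), the range k = 2 to 6 and the USArrests data are assumptions standing in for fpc::pamk() and the country data.

```r
# Sketch: choosing k by average silhouette width, the idea behind pamk().
# Assumptions: cluster::pam() and USArrests replace fpc::pamk() and the country data.
library(cluster)

x <- scale(USArrests)
ks <- 2:6

# Average silhouette width of the PAM (k-medoids) solution for each k.
asw <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)
names(asw) <- ks
best_k <- ks[which.max(asw)]

fit <- pam(x, best_k)
fit$medoids            # the medoids are actual rows of the data
table(fit$clustering)  # cluster sizes
```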

> pamk.result$nc

Displays the number of clusters created, 2.

> table(pamk.result$pamobject$clustering, scurt$RiskClass)

Displays the contingency table (clusters in rows, risk classes in columns):

     3  5  6  7
  1  1  2  5  4
  2  0  3 14 23

> tab <- table(pamk.result$pamobject$clustering, scurt$RiskClass)

> classAgreement(tab)

$diag

[1] 0.07692308

$kappa

[1] -0.001605136

$rand

[1] 0.5098039

$crand

[1] 0.07031078
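To make the four statistics concrete, here is a base-R sketch of what classAgreement() reports, assuming rows and columns are paired by position (its default). The function name agreement_stats is hypothetical. Applied to the k-medoids contingency table above, it should reproduce the printed values 0.0769, -0.0016, 0.5098 and 0.0703.

```r
# Sketch: the statistics printed by e1071::classAgreement(), recomputed in base R.
# Assumption: rows and columns are paired by position (match.names = FALSE).
agreement_stats <- function(tab) {
  tab <- as.matrix(tab)
  n  <- sum(tab)
  ni <- rowSums(tab)
  nj <- colSums(tab)
  m  <- min(nrow(tab), ncol(tab))

  p0 <- sum(diag(tab[1:m, 1:m, drop = FALSE])) / n    # share on the diagonal
  pc <- sum((ni[1:m] / n) * (nj[1:m] / n))            # agreement expected by chance
  kappa <- (p0 - pc) / (1 - pc)                       # Cohen's kappa

  n2 <- choose(n, 2)
  rand <- 1 + (sum(tab^2) - (sum(ni^2) + sum(nj^2)) / 2) / n2

  # Adjusted (chance-corrected) Rand index; it ignores how labels are numbered.
  s_ij <- sum(choose(tab, 2))
  s_i  <- sum(choose(ni, 2))
  s_j  <- sum(choose(nj, 2))
  expected <- s_i * s_j / n2
  crand <- (s_ij - expected) / ((s_i + s_j) / 2 - expected)

  list(diag = p0, kappa = kappa, rand = rand, crand = crand)
}

# The k-medoids table from the text: clusters 1, 2 x risk classes 3, 5, 6, 7.
tab <- matrix(c(1, 2,  5,  4,
                0, 3, 14, 23), nrow = 2, byrow = TRUE,
              dimnames = list(c("1", "2"), c("3", "5", "6", "7")))
res <- agreement_stats(tab)
res
```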

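The fuzzy c-means algorithm

We want a fuzzy c-means clustering with 3 clusters.

> result <- cmeans(scurt[,-10], 3, 100, m=3, method="cmeans")

The arguments are the data, the number of clusters (3), the maximum number of iterations (100) and the fuzzification exponent m = 3. Since scurt has only 9 columns, scurt[,-10] removes nothing: RiskClass is included in the clustering and appears among the cluster centers below. To cluster on the economic attributes only, scurt[,-9] would be used.

> result

Fuzzy c-means clustering with 3 clusters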

Cluster centers:

GNI GDPgrowthrate Debt.export Reserves.imports Inflation CCbalance

1 6984.6295 7.916898 49.95741 132.43272 4.167117 15.241457

2 2740.3475 3.165602 184.38649 50.91883 4.995079 -6.802395

3 504.7024 6.107225 321.16541 33.78820 10.065506 -0.739523

FDI Trade RiskClass

1 3.347202 110.59149 6.177649

2 2.208445 99.08389 5.890744

3 4.027318 73.98097 6.578867

Memberships:

1 2 3

1 0.01819712 0.94687502 0.03492786

2 0.06687442 0.82439033 0.10873525

3 0.09590958 0.45338490 0.45070553

4 0.02249801 0.06419775 0.91330423

5 0.07980812 0.80080484 0.11938704

6 0.07017390 0.24916885 0.68065725

7 0.10415322 0.25219831 0.64364847

8 0.03318873 0.09459157 0.87221971

9 0.04095704 0.11199492 0.84704804

10 0.03718570 0.10552406 0.85729024

11 0.14693307 0.33475139 0.51831554

12 0.04839847 0.12674637 0.82485516

13 0.05181875 0.16925727 0.77892398

14 0.05992914 0.20275012 0.73732073

15 0.41318950 0.39286515 0.19394534

16 0.94900275 0.03093745 0.02005981

17 0.04223468 0.11176704 0.84599828

18 0.57482464 0.27201738 0.15315798

19 0.02848940 0.07786530 0.89364530

20 0.08007379 0.21371274 0.70621347

21 0.06991964 0.24803956 0.68204080

22 0.01683486 0.04806666 0.93509848

23 0.13636242 0.44518168 0.41845590

24 0.01557795 0.04459914 0.93982291

25 0.01038617 0.02966119 0.95995264

26 0.08856148 0.22143649 0.69000203

27 0.82273104 0.10525715 0.07201181

28 0.08831928 0.78476826 0.12691246

29 0.15164585 0.66205977 0.18629439

30 0.02337579 0.07016511 0.90645910

31 0.03891965 0.10512003 0.85596032

32 0.02952092 0.08083563 0.88964345

33 0.04578063 0.14093428 0.81328509

34 0.04389534 0.11833118 0.83777348

35 0.08254127 0.67715272 0.24030601

36 0.14105472 0.33587412 0.52307116

37 0.05286293 0.14018678 0.80695029

38 0.05446047 0.17889924 0.76664028

39 0.59421255 0.23106112 0.17472633

40 0.41118102 0.39465554 0.19416343

41 0.38853562 0.41341891 0.19804547

42 0.02546826 0.07770023 0.89683152

43 0.06120001 0.79749970 0.14130029

44 0.09424394 0.41609818 0.48965787

45 0.03743814 0.10370902 0.85885284

46 0.02080902 0.05794930 0.92124168

47 0.02754074 0.07623115 0.89622812

48 0.03505848 0.89030387 0.07463765

49 0.03345270 0.09802371 0.86852359

50 0.09673117 0.50213370 0.40113513

51 0.04430726 0.13772871 0.81796403

52 0.01873041 0.05544900 0.92582059

Closest hard clustering:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

2 2 2 3 2 3 3 3 3 3 3 3 3 3 1 1 3 1 3 3 3 3 2 3 3 3

27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52

1 2 2 3 3 3 3 3 2 3 3 3 1 1 2 3 2 3 3 3 3 2 3 2 3 3

Available components:

[1] "centers" "size" "cluster" "membership" "iter"

[6] "withinerror" "call"

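The output above shows the degree of membership of each country in each of the 3 clusters. The memberships in each row sum to 1, and the closest hard clustering assigns each country to the cluster in which its membership is largest (for example, country 3 has memberships 0.096, 0.453 and 0.451 and is assigned to cluster 2).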
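For fixed cluster centers, the fuzzy c-means membership of point i in cluster j is u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), where d is the Euclidean distance and m the fuzzification exponent. The sketch computes only this membership step on made-up points and centers (cmeans() alternates it with a center update until convergence); fcm_memberships is a hypothetical helper name.

```r
# Sketch: the fuzzy c-means membership step for fixed centers.
# Assumptions: toy points and centers; fcm_memberships is a hypothetical helper.
fcm_memberships <- function(x, centers, m = 2) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  # Euclidean distance from every row of x to every center (n x c matrix).
  d <- sapply(seq_len(nrow(centers)), function(j)
    sqrt(rowSums(sweep(x, 2, centers[j, ])^2)))
  d <- matrix(d, nrow = nrow(x))
  d <- pmax(d, 1e-12)              # avoid division by zero at a center
  p <- d^(-2 / (m - 1))            # u_ij is proportional to d_ij^(-2/(m-1))
  p / rowSums(p)                   # normalize so each row sums to 1
}

x <- rbind(c(0, 0), c(0.2, 0.1), c(5, 5), c(5.1, 4.9), c(2.5, 2.5))
centers <- rbind(c(0.1, 0.05), c(5.05, 4.95))
u <- fcm_memberships(x, centers, m = 3)
round(u, 3)

# Closest hard clustering: the cluster with the largest membership in each row.
hard <- max.col(u, ties.method = "first")
hard
```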

> plot(scurt[,2], scurt[,3], col=result$cluster)

[Figure: scurt[, 2] (GDPgrowthrate) on the x axis against scurt[, 3] (Debt.export) on the y axis, points colored by the closest hard clustering]

> points(result$centers[,c(2,3)], col=1:3, pch=8, cex=2)

The centers of the 3 fuzzy clusters are added to the plot as large stars (pch = 8, cex = 2), in the colors of the corresponding hard clusters.

[Figure: the same scatter plot with the 3 cluster centers marked]

> tab <- table(scurt$RiskClass, result$cluster)

> tab

The contingency table is displayed (rows: actual risk class; columns: cluster of the closest hard clustering):

      1  2  3
  3   0  1  0
  5   2  1  2
  6   3  6 10
  7   1  4 22

Interpretation: the first cluster contains 2 countries in risk class 5, 3 countries in risk class 6 and 1 country in risk class 7.

Cluster 2 contains 1 country in risk class 3, 1 in risk class 5, 6 in risk class 6 and 4 in risk class 7.

> classAgreement(tab)

$diag

[1] 0.2115385

$kappa

[1] -0.07028112

$rand

[1] 0.5701357

$crand

[1] 0.1348571

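Interpretation: 21.15% of the countries lie on the diagonal of the table ($diag = 0.2115), and Cohen's kappa is -0.0703, i.e. the agreement between clusters and risk classes is no better than chance. Keep in mind that classAgreement() pairs rows and columns by position by default (match.names = FALSE), so here risk class 3 is compared with cluster 1, class 5 with cluster 2 and class 6 with cluster 3. Because cluster numbers are arbitrary, $diag and $kappa are not accuracy measures for a clustering; $rand (0.570) and the corrected Rand index $crand (0.135) do not depend on how the clusters are numbered.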

SOM

> library(kohonen)

> set.seed(101)

> exemplu <- exem[-1]

I removed the first column of the data set, which holds the country names.

> train.obs <- sample(nrow(exemplu), 12)

I draw a sample of 12 observations for the training set.

> train.obs

> train.set <- scale(exemplu[train.obs,][,-9])

I remove the class variable, i.e. column 9 (RiskClass), and build the training set. scale() works only with numeric data; it standardizes each column.

> train.set

> test.set <- scale(exemplu[-train.obs, ][-9], center = attr(train.set, "scaled:center"), scale = attr(train.set, "scaled:scale"))

center = attr(train.set, "scaled:center"): the training mean of each column is subtracted from that column.

scale = attr(train.set, "scaled:scale"): each (centered) column is divided by its training standard deviation.

This builds the test set from the observations that are not in the training set, standardized with the training set's means and standard deviations.
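A sketch with the iris data and a random 100/50 split (assumptions, in place of exemplu and train.obs) shows that scaling the test set with the training attributes is the same as computing (value - training mean) / training sd by hand.

```r
# Sketch: standardizing a test set with the training means and standard deviations.
# Assumptions: iris[, 1:4] and a random split stand in for the country data.
set.seed(101)
x <- iris[, 1:4]
train_idx <- sample(nrow(x), 100)

train_set <- scale(x[train_idx, ])
ctr <- attr(train_set, "scaled:center")   # training column means
sds <- attr(train_set, "scaled:scale")    # training column standard deviations

test_set <- scale(x[-train_idx, ], center = ctr, scale = sds)

# The same computation written out: subtract the training mean, divide by the training sd.
manual <- sweep(sweep(as.matrix(x[-train_idx, ]), 2, ctr), 2, sds, "/")
```

The test columns will not have mean exactly 0 and sd exactly 1; that is expected, since the test set must not influence the transformation.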

> test.set

We build the Kohonen map:

> somexemplu <- som(train.set, grid = somgrid(3, 2, "hexagonal"))

> somexemplu

NOTE: when building the Kohonen map, the number of map units, i.e. the product of the grid dimensions (here 3 x 2 = 6), must be smaller than the number of observations in the training set (here 12).

> plot(somexemplu)

[Figure: SOM codes plot, one segment chart per map unit; legend: GNI, GDPgrowthrate, Debt.export, Reserves.imports, Inflation, CCbalance, FDI, Trade]

Each unit of the map is characterized by one or more dominant variables, shown by the largest colored segments.

Prediction with the SOM:

> somprediction <- predict(somexemplu, newdata=test.set, trainX=train.set, trainY=classvec2classmat(exemplu[,9][train.obs]))

classvec2classmat() turns a vector (here the class variable) into a matrix of 0s and 1s with one column per class, where 1 marks membership of the class and 0 non-membership.
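A base-R sketch of the same one-hot encoding; classvec_to_matrix is a hypothetical helper, not part of kohonen, and the risk vector is made up.

```r
# Sketch: one-hot encoding of a class vector in base R.
# Assumption: classvec_to_matrix is a hypothetical helper, not a kohonen function.
classvec_to_matrix <- function(y) {
  y <- factor(y)
  # Row i has a 1 in the column of its class and 0 elsewhere.
  m <- outer(as.integer(y), seq_along(levels(y)), "==") * 1
  colnames(m) <- levels(y)
  m
}

risk <- c(7, 6, 7, 5, 3, 6)
m <- classvec_to_matrix(risk)
m
```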

We predict the classes of the test set.

> somprediction

> tab <- table(exemplu[,9][-train.obs], somprediction$prediction)

> tab

The confusion matrix is displayed (rows: actual class; columns: predicted class):

       3  5  6  7
  5    0  0  2  2
  6    1  2  7  4
  7    2  5 10  5

> classAgreement(tab)

$diag

[1] 0.3

$kappa

[1] -0.04477612

$rand

[1] 0.5051282

$crand

[1] -0.0496444

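Interpretation

With the SOM, the accuracy rate of the model is 30%: 12 of the 40 test countries are classified correctly (0 in class 5, 7 in class 6 and 5 in class 7). Cohen's kappa is -0.045, i.e. the predictions agree with the actual risk classes no better than chance.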

Other types of dendrograms:

> set.seed(101)

> samplexemplu <- exem[sample(1:52, 15),]

> samplexemplu

> distance <- dist(samplexemplu[,-10], method="euclidean")

> cluster <- hclust(distance, method="average")

> plot(cluster, hang=-1, labels=samplexemplu$RiskClass)

hang = -1 makes all leaves hang down to height 0, and labels replaces the row numbers with each country's risk class.

[Figure: Cluster Dendrogram, hclust (*, "average"); leaves labeled with the risk classes 5 5 7 6 7 6 7 5 7 6 6 7 7 7 7; y axis: Height]

> plot(as.dendrogram(cluster), edgePar=list(col="darkgreen", lwd=2), horiz=TRUE)

edgePar is a list that specifies the graphical parameters of the dendrogram's edges:

lwd = line width of the edges; it takes only positive values

lwd = 1 by default

horiz = TRUE draws the dendrogram horizontally, with the heights on the x axis.

[Figure: horizontal dendrogram with dark green edges; x axis: height from 5000 to 0; leaves labeled with the original row numbers 15, 18, 27, 16, 28, 29, 36, 3, 20, 37, 12, 33, 24, 30, 42]

> str(as.dendrogram(cluster))

Prints the structure of the dendrogram as text.

> group.3 <- cutree(cluster, k = 3)

Cuts the dendrogram into 3 clusters.

> table(group.3, samplexemplu$RiskClass)

Compares the 3 clusters with the known risk classes:

group.3  5  6  7
      1  1  2  6
      2  2  1  1
      3  0  1  1

Cluster 1 contains 1 country in risk class 5, 2 countries in risk class 6 and 6 countries in risk class 7, and so on.

> plot(cluster); rect.hclust(cluster, k=5, border="red")

[Figure: Cluster Dendrogram, hclust (*, "average"), leaves 15, 18, 27, 16, 28, 29, 36, 3, 20, 37, 12, 33, 24, 30, 42, with 5 clusters outlined in red; y axis: Height]

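> par(mfrow=c(1,1))

> z <- as.dendrogram(cluster)

> attr(z[[2]][[2]], "edgePar") <- list(col="blue", lwd=4, pch=NA)

> attr(z[[2]][[1]], "edgePar") <- list(col="red", lwd=3, lty=3, pch=NA)

> plot(z, horiz=TRUE)

[Figure: the horizontal dendrogram with some edges drawn in thick blue lines (lwd = 4) and some in red dotted lines (lwd = 3, lty = 3)]

lty = line type (lty = 3 gives a dotted line)

> z[[2]]

z[[2]] is the second of the two main branches of the dendrogram; z[[2]][[1]] and z[[2]][[2]] are its two sub-branches, whose edgePar attributes were set above.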

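Decision trees

Install the rpart package if needed, then load it.

> library(rpart)

> fit <- rpart(RiskClass ~ ., data=exemplu)

> fit

The tree and its rules are printed. Because RiskClass is numeric, rpart() builds a regression tree (method "anova"): yval is the mean risk class of the observations in a node, and deviance is the sum of squared deviations from that mean.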

n= 52

node), split, n, deviance, yval
      * denotes terminal node

 1) root 52 34.057690 6.365385
   2) GNI>=1563.5 18 16.944440 5.944444 *
   3) GNI< 1563.5 34 12.235290 6.588235
     6) GDPgrowthrate>=3.855 24  9.833333 6.416667
      12) Trade< 73.67603 12  5.666667 6.166667 *
      13) Trade>=73.67603 12  2.666667 6.666667 *
     7) GDPgrowthrate< 3.855 10  0.000000 7.000000 *

Nodes marked with * are terminal nodes (leaves).

> plot(fit, uniform=TRUE, main="Arbore decizional")

Draws the branches of the tree (the title means "Decision tree").

> text(fit, use.n=TRUE, all=TRUE, cex=.8)

Labels the nodes of the tree.

[Figure: "Arbore decizional" with the splits GNI>=1564, GDPgrowthrate>=3.855 and Trade< 73.68; each node shows its mean risk class and number of observations: 6.365 (n=52), 5.944 (n=18), 6.588 (n=34), 6.417 (n=24), 6.167 (n=12), 6.667 (n=12), 7 (n=10)]

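The tree above yields:

If GNI >= 1563.5, there are 18 observations, with a mean risk class of 5.944.

If GNI < 1563.5, there are 34 observations (mean risk class 6.588), which are split further:

   If GDPgrowthrate < 3.855, there are 10 observations, with a mean risk class of 7 (deviance 0: all 10 countries are in risk class 7).

   If GDPgrowthrate >= 3.855, there are 24 observations (mean risk class 6.417), which are split further:

      If Trade < 73.67603, there are 12 observations, with a mean risk class of 6.166667.

      If Trade >= 73.67603, there are 12 observations, with a mean risk class of 6.666667.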
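rpart() chooses the type of tree from the response: a numeric response gives a regression tree (method "anova"), a factor gives a classification tree (method "class"). The sketch uses the iris data as a stand-in (an assumption), with Species recoded as 1, 2, 3 to mimic a numeric RiskClass, and checks that the root deviance is the sum of squared deviations from the mean; converting RiskClass with factor() would likewise turn the tree above into a classification tree.

```r
# Sketch: regression tree vs classification tree with rpart.
# Assumption: iris stands in for the country data; SpeciesNum mimics a numeric RiskClass.
library(rpart)

d <- iris
d$SpeciesNum <- as.integer(d$Species)   # numeric codes 1, 2, 3

# Numeric response -> regression tree: each leaf predicts a mean code.
fit_anova <- rpart(SpeciesNum ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                   data = d)
# Factor response -> classification tree: each leaf predicts a class.
fit_class <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                   data = d)

fit_anova$method   # "anova"
fit_class$method   # "class"

# Root node deviance of the regression tree = sum of squared deviations from the mean.
root_dev <- fit_anova$frame$dev[1]
pred <- predict(fit_class, d, type = "class")
head(pred)
```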


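The naive Bayes classifier (NBC)

> library(e1071)

> library(class)

> model <- naiveBayes(exem[,1:9], factor(exem[,10]))

Columns 1 to 9 are the predictors and column 10 (RiskClass), converted to a factor, is the class. Note that column 1 is Country, whose value is different for every row, so the classifier can use it to largely recognize the training countries individually.

> model

The a priori and conditional probabilities are displayed.

Prediction with the NBC

We want to determine the risk class of the first 9 countries in the data set using the NBC:

> predict(model, exem[1:9,-10])

[1] 6 7 5 6 7 6 7 6 7

Levels: 3 5 6 7

So the first 9 countries in the data set are assigned the risk classes 6 7 5 6 7 6 7 6 7.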

> table(predict(model, exem[,-10]), exem[,10], dnn=list('predicted','actual'))

         actual
predicted  3  5  6  7
        3  0  0  0  0
        5  0  5  0  0
        6  0  0 19  0
        7  1  0  0 27

> tab <- table(predict(model, exem[,-10]), exem[,10], dnn=list('predicted','actual'))

> classAgreement(tab)

$diag

[1] 0.9807692

$kappa

[1] 0.9667093

$rand

[1] 0.979638

$crand

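[1] 0.9579734

Interpretation: 51 of the 52 countries (98.1%) are classified correctly and Cohen's kappa is 0.967. These are in-sample figures: the table is computed on the same 52 countries used to fit the model, and the model includes Country among its predictors, so they are likely to overstate how well the classifier would do on new countries.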
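A fairer check of a naive Bayes classifier leaves identifier columns such as Country out of the predictors and measures accuracy on rows not used for fitting. The sketch does this with a small Gaussian naive Bayes written in base R on the iris data, used as a stand-in for the country data. gnb_fit and gnb_predict are hypothetical helpers that follow what naiveBayes() does for numeric predictors: one normal density per class and variable.

```r
# Sketch: Gaussian naive Bayes in base R, evaluated on a hold-out set.
# Assumptions: iris replaces the country data; gnb_fit/gnb_predict are hypothetical.
gnb_fit <- function(x, y) {
  y <- factor(y)
  groups <- split(x, y)                                # one data frame per class
  list(
    prior  = as.vector(table(y)) / length(y),
    mean   = sapply(groups, colMeans),                 # variables x classes
    sd     = sapply(groups, function(s) apply(s, 2, sd)),
    levels = levels(y)
  )
}

gnb_predict <- function(model, x) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Log prior + sum of per-variable log densities, for every row and class.
  scores <- sapply(seq_along(model$levels), function(k) {
    ll <- dnorm(x, mean = rep(model$mean[, k], each = n),
                sd = rep(model$sd[, k], each = n), log = TRUE)
    log(model$prior[k]) + rowSums(matrix(ll, nrow = n))
  })
  scores <- matrix(scores, nrow = n)
  factor(model$levels[max.col(scores, ties.method = "first")], levels = model$levels)
}

set.seed(123)
x <- iris[, 1:4]            # predictors only: no identifier column
y <- iris$Species
train <- sample(nrow(x), 100)

model <- gnb_fit(x[train, ], y[train])
pred  <- gnb_predict(model, x[-train, ])
tab <- table(predicted = pred, actual = y[-train])
tab
accuracy <- sum(diag(tab)) / sum(tab)
```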

Neural networks

> library(neuralnet)

> set.seed(123)

> head(exem)

> size.sample <- 12

> trainset <- exem[sample(1:nrow(exem), size.sample),]

> trainset

Each cbind() below adds a TRUE/FALSE indicator column for one risk class. Four indicators (classes 3, 5, 6 and 7) are needed for the names assigned to columns 11 to 14:

> nnet_train <- cbind(trainset, trainset$RiskClass == '3')

> nnet_train <- cbind(nnet_train, trainset$RiskClass == '5')

> nnet_train <- cbind(nnet_train, trainset$RiskClass == '6')

> nnet_train <- cbind(nnet_train, trainset$RiskClass == '7')

> names(nnet_train)[11] <- 'Clasa3'

> names(nnet_train)[12] <- 'Clasa5'

> names(nnet_train)[13] <- 'Clasa6'

> names(nnet_train)[14] <- 'Clasa7'

> names(nnet_train)

[1] "Country" "GNI" "GDPgrowthrate" "Debt.export"

[5] "Reserves.imports" "Inflation" "CCbalance" "FDI"

[9] "Trade" "RiskClass" "Clasa3" "Clasa5"

[13] "Clasa6" "Clasa7"

> nn <- neuralnet(Clasa5+Clasa6+Clasa7~GNI+GDPgrowthrate+Debt.export+Reserves.imports+Inflation+CCbalance+Trade+FDI, data=nnet_train, hidden=c(3,3))

> nn

Call: neuralnet(formula = Clasa5 + Clasa6 + Clasa7 ~ GNI + GDPgrowthrate + Debt.export + Reserves.imports + Inflation + CCbalance + Trade + FDI, data = nnet_train, hidden = c(3, 3))

1 repetition was calculated.

Error Reached Threshold Steps

1 2.082674381 0.008577950098 110

> plot(nn)

[Figure: the trained network, with the 8 inputs GNI, GDPgrowthrate, Debt.export, Reserves.imports, Inflation, CCbalance, FDI and Trade, two hidden layers of 3 neurons each, the 3 outputs Clasa5, Clasa6 and Clasa7, and the fitted weights on the edges; Error: 2.082674, Steps: 110]

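> exem1 <- exem[,-1]

We remove the first (non-numeric) column from the data set.

> exem1

> predictie <- compute(nn, exem1[-9])$net.result

compute() may take the columns of its second argument by position rather than by name. In the formula Trade comes before FDI, while in exem1 FDI comes before Trade; passing exem1[, nn$model.list$variables] keeps the inputs in the order the network was trained with. Note also that the predictions cover all 52 countries, including the 12 used for training.

> maxid <- function(argument){return(which(argument==max(argument)))}

> idx <- apply(predictie, c(1), maxid)

> idx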

[1] 2 2 2 3 2 3 3 2 3 2 3 3 2 2 2 2 3 2 3 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 3

[38] 2 2 2 2 3 2 2 2 3 3 2 2 2 2 3

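> prediction <- c('Clasa5', 'Clasa6', 'Clasa7')[idx]

The network has three outputs, in the order of the left-hand side of the formula (Clasa5, Clasa6, Clasa7), so idx = 1, 2, 3 refers to these three classes. Indexing c('Clasa3', 'Clasa5', 'Clasa6', 'Clasa7') with idx instead would label every prediction one class too low.

> prediction

> table(prediction, exem$RiskClass)

The confusion matrix is displayed (rows: predicted class; columns: actual risk class):

prediction  3  5  6  7
    Clasa6  1  3 11 12
    Clasa7  0  2  8 15

Of the 4 risk classes (3, 5, 6, 7), the network predicts only 2: Clasa6 and Clasa7.

> tab <- table(prediction, exem$RiskClass)

> library(e1071)

> classAgreement(tab)

$diag

[1] 0.05769230769

$kappa

[1] 0.001567398119

$rand

[1] 0.4969834087

$crand

[1] -0.009643901671

classAgreement() pairs rows and columns by position, so $diag = 3/52 compares Clasa6 with class 3 and Clasa7 with class 5. Matching by label, 11 + 15 = 26 of the 52 countries are classified correctly (50%), slightly fewer than by always predicting the most frequent class, 7 (27 of 52).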
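The decoding step can be written with which.max(), which returns a single index even when two outputs tie (which(argument == max(argument)) would return both), together with labels listed in the order of the formula's responses. The output matrix below is made up for illustration.

```r
# Sketch: turning a matrix of network outputs into class labels.
# Assumption: `out` is a made-up output matrix whose columns follow the
# order of the formula's left-hand side (Clasa5, Clasa6, Clasa7).
out <- rbind(c(0.10, 0.70, 0.20),
             c(0.05, 0.40, 0.55),
             c(0.60, 0.30, 0.10))
labels <- c("Clasa5", "Clasa6", "Clasa7")

# One index per row: the column with the largest output.
idx <- apply(out, 1, which.max)
prediction <- labels[idx]
prediction   # "Clasa6" "Clasa7" "Clasa5"
```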