The data set covers 52 countries, described by the following attributes:
GNI per capita = gross national income per capita
GNP growth rate = growth rate of GNP (in percent)
Debt/export (%) = debt relative to export revenue
Reserves/imports (%) = a country's reserves relative to its level of imports
Inflation = inflation rate (%)
CC = current account balance = the sum of the trade balance (exports minus imports of goods and services), income from production and transfer payments (such as foreign aid), as % of GDP
FDI = foreign direct investment (% of GDP)
Trade = sum of exports and imports of goods and services (% of GDP)
> library(e1071)
>library(scales)
>exem <- read.csv("C:/tari.csv")
>exem
> set.seed(5)
> km<-kmeans(exem[,2:5],4)
> km
K-means clustering with 4 clusters of sizes 6, 11, 31, 4
Cluster means:
        GNI GDPgrowthrate Debt.export Reserves.imports
1  627.1850      3.450000   1570.7983         40.97167
2 3377.0955      2.454545    149.0145         41.50000
3  623.5642      6.621290    220.6948         39.30097
4 7436.6600      7.070000     54.0650        114.36250
Clustering vector:
[1] 2 2 3 3 2 3 1 3 3 3 1 3 3 3 2 4 3 4 3 1 3 3 1 3 3 1 4 2 2 3 3 3 3 3 2 1 3
[38] 3 4 2 2 3 2 3 3 3 3 2 3 3 3 3
Within cluster sum of squares by cluster:
[1] 3092095 10841994 6159941 8727649
(between_SS / total_SS = 88.2 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
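The 88.2 % reported above is the ratio betweenss / totss. Since the four columns are on very different scales (GNI in the thousands, growth rate in single digits), unscaled Euclidean distances are driven mostly by the columns with the largest spread. The following is a minimal sketch on synthetic data, not on tari.csv: the data frame `toy` and its columns `income` and `growth` are made up for illustration. It shows how the ratio is obtained from the kmeans() components and how scale() can be applied before clustering.

```r
# Synthetic stand-in for the real data: one large-scale and one small-scale column.
set.seed(5)
toy <- data.frame(
  income = c(rnorm(20, 600, 100), rnorm(20, 3400, 300)),  # hypothetical values
  growth = c(rnorm(20, 3, 1),     rnorm(20, 7, 1))        # hypothetical values
)

# nstart > 1 reruns k-means from several random starts and keeps the best fit.
km_raw    <- kmeans(toy, centers = 2, nstart = 25)
km_scaled <- kmeans(scale(toy), centers = 2, nstart = 25)

# Share of the total sum of squares explained by the partition
# (this is the "between_SS / total_SS" line of the printed output).
ratio_raw    <- km_raw$betweenss / km_raw$totss
ratio_scaled <- km_scaled$betweenss / km_scaled$totss
print(c(raw = ratio_raw, scaled = ratio_scaled))

# totss always splits into a between-cluster and a within-cluster part.
stopifnot(isTRUE(all.equal(km_raw$totss, km_raw$betweenss + km_raw$tot.withinss)))
```

Whether to standardise depends on the analysis; the sketch only makes the effect of the column scales visible.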
>plot(exem[,2],exem[,3], col=km$cluster)
[Figure: scatter plot of exem[, 2] (horizontal axis, 0 to 10000) against exem[, 3] (vertical axis, -5 to 20), points colored by cluster.]
> table(km$cluster, exem$RiskClass)   # displays the confusion matrix: clusters in rows, risk classes in columns
     3  5  6  7
  1  0  1  0  5
  2  1  1  6  3
  3  0  2 11 18
  4  0  1  2  1
> km$cluster
[1] 2 2 3 3 2 3 1 3 3 3 1 3 3 3 2 4 3 4 3 1 3 3 1 3 3 1 4 2 2 3 3 3 3 3 2 1 3
[38] 3 4 2 2 3 2 3 3 3 3 2 3 3 3 3
>o<-order(km$cluster)
> data.frame(exem$Country[o], km$cluster[o])   # lists the countries alphabetically within each of the 4 clusters
exem.Country.o. km.cluster.o.
1 Burundi 1
2 Comoros 1
3 Guinea-Bissau 1
4 Kiribati 1
5 Liberia 1
6 Sao Tome and Principe 1
7 Albania 2
8 Algeria 2
9 Belarus 2
10 Dominica 2
11 Maldives 2
12 Marshall Islands 2
13 Samoa 2
14 St. Lucia 2
15 St. Vincent and the Grenadines 2
16 Swaziland 2
17 Tonga 2
18 Angola 3
19 Bangladesh 3
20 Bhutan 3
21 Cambodia 3
22 Central African Republic 3
23 Chad 3
24 Congo, Dem. Rep. 3
25 Cote d'Ivoire 3
26 Djibouti 3
27 Ethiopia 3
28 Guinea 3
29 Guyana 3
30 Haiti 3
31 Kyrgyz Republic 3
32 Lao PDR 3
33 Mauritania 3
34 Myanmar 3
35 Nepal 3
36 Niger 3
37 Rwanda 3
38 Sierra Leone 3
39 Solomon Islands 3
40 Sudan 3
41 Syrian Arab Republic 3
42 Tajikistan 3
43 Tanzania 3
44 Togo 3
45 Uzbekistan 3
46 Vanuatu 3
47 Yemen, Rep. 3
48 Zambia 3
49 Equatorial Guinea 4
50 Gabon 4
51 Libya 4
52 St. Kitts and Nevis 4
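INTERPRETATION: According to the k-means clustering, cluster 1 contains Burundi, Comoros, Guinea-Bissau, Kiribati, Liberia and Sao Tome and Principe, which would have the lowest degree of risk.
Cluster 2 contains Albania, Algeria, Belarus, Dominica, Maldives, Marshall Islands, Samoa, St. Lucia, St. Vincent and the Grenadines, Swaziland and Tonga, which would have the next degree of risk.
Clusters 3 and 4 are read in the same way; the countries in cluster 4 would have the highest degree of risk.
This ordering should be treated with caution: k-means numbers its clusters arbitrarily, and the confusion matrix above shows that 5 of the 6 countries in cluster 1 belong to risk class 7.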
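Because the cluster numbers carry no order of their own, one way to see which cluster is actually associated with more risk is to summarise the known risk classes inside each cluster. A minimal sketch, using short made-up vectors rather than the real data (`cluster` and `risk_class` are illustrative names and values):

```r
# Illustrative vectors (not the real data): cluster labels and known risk classes.
cluster    <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
risk_class <- c(7, 7, 6, 5, 6, 3, 6, 7, 7, 6)

# Mean risk class per cluster, sorted from lowest to highest.
mean_risk <- tapply(risk_class, cluster, mean)
print(sort(mean_risk))

# Row-wise proportions of the cross-tabulation show the mix inside each cluster.
print(round(prop.table(table(cluster, risk_class), margin = 1), 2))
```

On the real data the same two calls would take km$cluster and exem$RiskClass as inputs.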
>plot(exem, col = km$cluster)
[Figure: scatterplot matrix of Country, GNI, GDPgrowthrate, Debt.export, Reserves.imports, Inflation, CCbalance, FDI, Trade and RiskClass, points colored by cluster.]
>plot(exem$GNI, exem$GDPgrowthrate, xlab="GNI", ylab="GDP gr rate", col=km$cluster)
[Figure: scatter plot of GNI (horizontal axis, 0 to 10000) against GDP gr rate (vertical axis, -5 to 20), points colored by cluster.]
>text(x=exem$GNI, y=exem$GDPgrowthrate, labels=exem$Country, col=km$cluster)
Hierarchical clustering:
> d <- dist(exem[,-10], method = "euclidean")
> d
Or:
> d <- dist(exem[,-10], method = "manhattan")
> d
> fit <- hclust(d, method = "ward.D")   # merges the clusters according to Ward's method
> plot(fit)   # draws the dendrogram
[Figure: Cluster Dendrogram, hclust (*, "ward.D") on d; vertical axis: Height.]
> groups <- cutree(fit, k=4)   # cuts the dendrogram into 4 clusters
> groups   # shows the cluster assigned to each country
[1] 1 1 2 2 1 2 3 2 3 2 3 2 2 2 4 4 2 4 2 3 2 2 3 2 2 3 4 1 1 2 2 2 2 3 1 3 3
[38] 2 4 4 4 2 1 2 2 2 2 1 2 2 2 2
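The sequence dist(), hclust(), cutree() can be tried on any numeric table. The sketch below uses the USArrests data set that ships with R as a stand-in for the country data; the object names `fit_demo` and `groups_demo` are illustrative, and standardising with scale() is an assumption of this sketch rather than something done in the notes above.

```r
# USArrests ships with R; it stands in here for the country data.
x <- scale(USArrests)                   # standardise so that no column dominates the distance
d_demo <- dist(x, method = "euclidean")
fit_demo <- hclust(d_demo, method = "ward.D")

# cutree() returns one cluster label per row, in the order of the data.
groups_demo <- cutree(fit_demo, k = 4)
print(table(groups_demo))               # cluster sizes
print(head(groups_demo))
```

When the clusters are outlined on the dendrogram with rect.hclust(), using the same k as in cutree() keeps the picture and the labels consistent.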
> rect.hclust(fit, k=4, border="red")   # outlines the 4 clusters on the dendrogram with red rectangles
[Figure: Cluster Dendrogram, hclust (*, "ward.D") on d, with the clusters outlined in red; vertical axis: Height.]
> hcd = as.dendrogram(fit)
> hcd   # prints a summary of the dendrogram:
'dendrogram' with 2 branches and 52 members total, at height 53389.49
> plot(hcd)   # another way of drawing the dendrogram
> plot(hcd, type="triangle")   # dendrogram with triangular branches
To inspect the upper part of the dendrogram:
> op <- par(mfrow = c(2, 1))   # par() sets graphical parameters; mfrow = c(nrows, ncols) places several plots on the same page in a grid filled row by row
> plot(cut(hcd, h = 1000)$upper, main = "Upper tree of cut at h=1000")   # the part of the dendrogram above height 1000
> plot(cut(hcd, h = 1000)$lower[[2]], main = "Second branch of lower tree with cut at h=1000")   # the second branch below height 1000
[Figure, two panels. Top: "Upper tree of cut at h=1000", with leaves Branch 1 to Branch 15. Bottom: "Second branch of lower tree with cut at h=1000", with leaves 20, 7 and 26.]
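> par(op)   # restores the previous graphical settings
Another way of drawing the dendrogram is with the "ape" package. Once it is installed:
> library(ape)
> plot(as.phylo(hc), cex = 0.9, label.offset = 1)   # hc is an hclust object; it is not created earlier in these notes, and fit from the hierarchical clustering above can be used in its place
The label.offset argument controls the space between the tips of the tree and their labels.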
[Figure: tree drawn by plot(as.phylo(hc)); the tip labels in the original figure run from 1 to 150.]
plot(as.phylo(hc), type = "cladogram", cex = 0.9, label.offset = 1)
plot(as.phylo(hc), type = "fan")
The k-medoids algorithm
> library(fpc)
> scurt <- subset(exem, select = -c(1, 10))   # a data set without columns 1 and 10
> scurt
> scurt <- subset(exem, select = -c(1))   # drops only column 1, because the k-medoids algorithm works only with numeric values; this is the version used below
>pamk.result<-pamk(scurt)
>pamk.result
$pamobject
Medoids:
ID GNI GDPgrowthrate Debt.export Reserves.imports Inflation CCbalance
41 41 4798.21 2.13 125.72 23.87 3.733429 -18.53
52 52 574.60 5.34 213.21 21.27 18.324440 -8.35
FDI Trade RiskClass
41 7.27 88.55481 7
52 4.97 71.23470 5
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1 1 2 2 1 2 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 2 2 2 2 2
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 2 2 2 2 2 2
Objective function:
build swap
917.9195 868.4497
Available components:
[1] "medoids" "id.med" "clustering" "objective" "isolation"
[6] "clusinfo" "silinfo" "diss" "call" "data"
$nc
[1] 2
$crit
[1] 0.0000000 0.6595510 0.5954880 0.6054187 0.6054221 0.4967805 0.4999702
[8] 0.5248114 0.4886549 0.5037766
Since the number of clusters was not specified, pamk() chose it itself: the data were split into 2 clusters, the value with the largest entry in $crit (0.6595510).
> pamk.result$nc   # the number of clusters created, here 2
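The selection of the number of clusters can be imitated by hand. The sketch below assumes that the criterion is the average silhouette width (to my knowledge the default of pamk(), but treat this as an assumption) and uses pam() from the cluster package on synthetic data; the matrix `x`, the range `ks` and the other names are illustrative and not part of the notes above.

```r
library(cluster)  # recommended package distributed with R; provides pam()

set.seed(1)
# Synthetic data with two well-separated groups (illustrative only).
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 6), ncol = 2))

# Average silhouette width of the k-medoids solution for each candidate k.
ks <- 2:6
avg_width <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)
names(avg_width) <- ks
print(round(avg_width, 3))

# The candidate with the largest average silhouette width is retained.
best_k <- ks[which.max(avg_width)]
print(best_k)
```

A value of k fixed in advance can be passed to pam() directly when the number of clusters is dictated by the problem, for example by the number of risk classes.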
> table(pamk.result$pamobject$clustering, scurt$RiskClass)   # displays the contingency table
3 5 6 7
1 1 2 5 4
2 0 3 14 23
>tab<- table(pamk.result$pamobject$clustering, scurt$RiskClass)
> classAgreement(tab)
$diag
[1] 0.07692308
$kappa
[1] -0.001605136
$rand
[1] 0.5098039
$crand
[1] 0.07031078
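To see what $diag and $kappa measure, they can be recomputed by hand from the contingency table. The function `agreement` below is a hypothetical helper written for this illustration, not part of e1071; it pairs the i-th row with the i-th column, and with that convention it reproduces the two values printed above.

```r
# Contingency table copied from the output above: clusters in rows, risk classes in columns.
tab <- matrix(c(1, 2,  5,  4,
                0, 3, 14, 23),
              nrow = 2, byrow = TRUE,
              dimnames = list(cluster = c("1", "2"),
                              RiskClass = c("3", "5", "6", "7")))

agreement <- function(tab) {
  n <- sum(tab)
  m <- min(dim(tab))  # only the first m rows and columns are paired up
  # Share of observations on the diagonal (row i matched with column i).
  p_obs <- sum(diag(tab[1:m, 1:m, drop = FALSE])) / n
  # Share expected on the diagonal if rows and columns were independent.
  p_exp <- sum(rowSums(tab)[1:m] * colSums(tab)[1:m]) / n^2
  c(diag = p_obs, kappa = (p_obs - p_exp) / (1 - p_exp))
}

res <- agreement(tab)
print(res)
```

The diagonal here matches cluster 1 with risk class 3 and cluster 2 with risk class 5, a pairing that depends on the arbitrary numbering of the clusters. For a clustering, the Rand index and its corrected version ($rand, $crand) are the more informative entries, since they do not depend on how the clusters are labelled.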
The fuzzy c-means algorithm
We want a c-means clustering with 3 clusters.
>result<-cmeans(scurt[,-10],3, 100, m=3, method="cmeans")
> result
Fuzzy c-means clustering with 3 clusters
Cluster centers:
GNI GDPgrowthrate Debt.export Reserves.imports Inflation CCbalance
1 6984.6295 7.916898 49.95741 132.43272 4.167117 15.241457
2 2740.3475 3.165602 184.38649 50.91883 4.995079 -6.802395
3 504.7024 6.107225 321.16541 33.78820 10.065506 -0.739523
FDI Trade RiskClass
1 3.347202 110.59149 6.177649
2 2.208445 99.08389 5.890744
3 4.027318 73.98097 6.578867
Memberships:
1 2 3
1 0.01819712 0.94687502 0.03492786
2 0.06687442 0.82439033 0.10873525
3 0.09590958 0.45338490 0.45070553
4 0.02249801 0.06419775 0.91330423
5 0.07980812 0.80080484 0.11938704
6 0.07017390 0.24916885 0.68065725
7 0.10415322 0.25219831 0.64364847
8 0.03318873 0.09459157 0.87221971
9 0.04095704 0.11199492 0.84704804
10 0.03718570 0.10552406 0.85729024
11 0.14693307 0.33475139 0.51831554
12 0.04839847 0.12674637 0.82485516
13 0.05181875 0.16925727 0.77892398
14 0.05992914 0.20275012 0.73732073
15 0.41318950 0.39286515 0.19394534
16 0.94900275 0.03093745 0.02005981
17 0.04223468 0.11176704 0.84599828
18 0.57482464 0.27201738 0.15315798
19 0.02848940 0.07786530 0.89364530
20 0.08007379 0.21371274 0.70621347
21 0.06991964 0.24803956 0.68204080
22 0.01683486 0.04806666 0.93509848
23 0.13636242 0.44518168 0.41845590
24 0.01557795 0.04459914 0.93982291
25 0.01038617 0.02966119 0.95995264
26 0.08856148 0.22143649 0.69000203
27 0.82273104 0.10525715 0.07201181
28 0.08831928 0.78476826 0.12691246
29 0.15164585 0.66205977 0.18629439
30 0.02337579 0.07016511 0.90645910
31 0.03891965 0.10512003 0.85596032
32 0.02952092 0.08083563 0.88964345
33 0.04578063 0.14093428 0.81328509
34 0.04389534 0.11833118 0.83777348
35 0.08254127 0.67715272 0.24030601
36 0.14105472 0.33587412 0.52307116
37 0.05286293 0.14018678 0.80695029
38 0.05446047 0.17889924 0.76664028
39 0.59421255 0.23106112 0.17472633
40 0.41118102 0.39465554 0.19416343
41 0.38853562 0.41341891 0.19804547
42 0.02546826 0.07770023 0.89683152
43 0.06120001 0.79749970 0.14130029
44 0.09424394 0.41609818 0.48965787
45 0.03743814 0.10370902 0.85885284
46 0.02080902 0.05794930 0.92124168
47 0.02754074 0.07623115 0.89622812
48 0.03505848 0.89030387 0.07463765
49 0.03345270 0.09802371 0.86852359
50 0.09673117 0.50213370 0.40113513
51 0.04430726 0.13772871 0.81796403
52 0.01873041 0.05544900 0.92582059
Closest hard clustering:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
2 2 2 3 2 3 3 3 3 3 3 3 3 3 1 1 3 1 3 3 3 3 2 3 3 3
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
1 2 2 3 3 3 3 3 2 3 3 3 1 1 2 3 2 3 3 3 3 2 3 2 3 3
Available components:
[1] "centers" "size" "cluster" "membership" "iter"
[6] "withinerror" "call"
The Memberships table above gives the degree of membership of each country in each of the 3 clusters.
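The "closest hard clustering" is simply the cluster with the largest membership in each row. A minimal sketch on five rows copied from the Memberships table (countries 1, 3, 4, 15 and 16); the 0.5 threshold used to flag weak assignments is an arbitrary choice made for this illustration.

```r
# A few membership rows copied from the output above (countries 1, 3, 4, 15, 16).
u <- rbind("1"  = c(0.01819712, 0.94687502, 0.03492786),
           "3"  = c(0.09590958, 0.45338490, 0.45070553),
           "4"  = c(0.02249801, 0.06419775, 0.91330423),
           "15" = c(0.41318950, 0.39286515, 0.19394534),
           "16" = c(0.94900275, 0.03093745, 0.02005981))
colnames(u) <- c("1", "2", "3")

hard <- apply(u, 1, which.max)  # cluster with the largest membership
top  <- apply(u, 1, max)        # how strong that membership is

# Assignments whose largest membership is below 0.5 are flagged as ambiguous
# (the threshold is an assumption of this sketch).
print(data.frame(hard, top, ambiguous = top < 0.5))
```

Countries 3 and 15 are examples where the hard label hides an almost even split between two clusters.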
> plot(scurt[,2], scurt[,3], col=result$cluster)
[Figure: scatter plot of scurt[, 2] (horizontal axis, -5 to 20) against scurt[, 3] (vertical axis, 0 to 2000), points colored by cluster.]
> points(result$centers[,c(2,3)], col=1:3, pch=8, cex=2)
The centers of the 3 clusters are added to the plot; the point colors come from the closest hard clustering.
[Figure: the same scatter plot of scurt[, 2] against scurt[, 3], with the 3 cluster centers marked.]
>tab<-table(scurt$RiskClass, result$cluster)
>tab
The confusion matrix is displayed (rows: actual risk classes; columns: clusters, treated here as predicted values):
     1  2  3
  3  0  1  0
  5  2  1  2
  6  3  6 10
  7  1  4 22
Interpretation: cluster 1 contains 2 countries with risk class 5, 3 countries with risk class 6 and 1 country with risk class 7.
Cluster 2 contains 1 country with risk class 3, 1 country with risk class 5, 6 countries with risk class 6 and 4 countries with risk class 7.
> classAgreement(tab)
$diag
[1] 0.2115385
$kappa
[1] -0.07028112
$rand
[1] 0.5701357
$crand
[1] 0.1348571
Interpretation: the model's accuracy rate ($diag) is 21.15%. Cohen's kappa is -0.0703, i.e. agreement slightly below what chance alone would give, so the clustering does not reliably reproduce the risk classes.
SOM
> library(kohonen)
>set.seed(101)
> exemplu <- exem[-1]   # drops the first column, which holds the country names
> train.obs <- sample(nrow(exemplu), 12)   # draws a sample of 12 observations for the training set
>train.obs
> train.set <- scale(exemplu[train.obs,][,-9])   # builds the training set without the class variable (column 9)
> train.set
scale() works only on numeric data; it standardises each column.
> test.set <- scale(exemplu[-train.obs, ][-9], center = attr(train.set, "scaled:center"), scale = attr(train.set, "scaled:scale"))
center = attr(train.set, "scaled:center"): the training-set mean of each column is subtracted from its elements.
scale = attr(train.set, "scaled:scale"): the centered columns are divided by their training-set standard deviation.
The test set consists of the observations that are not in the training set.
>test.set
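The point of reusing the "scaled:center" and "scaled:scale" attributes is that the test data are transformed with the training-set statistics, not with their own. A minimal sketch on synthetic numbers (the matrix `dat` and its columns are made up; only the pattern mirrors the commands above):

```r
set.seed(101)
# Synthetic numeric data standing in for the predictors (illustrative only).
dat <- matrix(rnorm(40 * 3, mean = 50, sd = 10), ncol = 3,
              dimnames = list(NULL, c("a", "b", "c")))
train_idx <- sample(nrow(dat), 12)

# Standardise the training rows, then apply the same shift and scale to the rest.
train <- scale(dat[train_idx, ])
test  <- scale(dat[-train_idx, ],
               center = attr(train, "scaled:center"),
               scale  = attr(train, "scaled:scale"))

# Training columns have mean 0 and standard deviation 1 by construction;
# test columns are only approximately centered.
print(round(colMeans(train), 10))
print(round(colMeans(test), 2))
```

Scaling the test set on its own would leak information about the test distribution into the comparison and make the two sets incompatible.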
Building the Kohonen map:
> somexemplu <- som(train.set, grid = somgrid(3, 2, "hexagonal"))
> somexemplu
NOTE: when building the Kohonen map, the product of the map dimensions (here 3 x 2 = 6) must be smaller than the number of observations in the training set (here 12).
> plot(somexemplu)
[Figure: codes plot of the map; legend: GNI, GDPgrowthrate, Debt.export, Reserves.imports, Inflation, CCbalance, FDI, Trade.]
Each cluster is characterised by one or more dominant variables, which correspond to the largest colored wedges.
Prediction with SOM:
> somprediction <- predict(somexemplu, newdata=test.set, trainX=train.set, trainY=classvec2classmat(exemplu[,9][train.obs]))
classvec2classmat() turns a class vector into a matrix of 0s and 1s with one column per class, where 1 marks membership in the class and 0 non-membership.
This predicts the classes of the test set.
> somprediction
> tab <- table(exemplu[,9][-train.obs], somprediction$prediction)
> tab   # displays the confusion matrix (rows: actual classes; columns: predicted classes)
     3  5  6  7
  5  0  0  2  2
  6  1  2  7  4
  7  2  5 10  5
> classAgreement(tab)
$diag
[1] 0.3
$kappa
[1] -0.04477612
$rand
[1] 0.5051282
$crand
[1] -0.0496444
Interpretation
With the SOM method, the model's accuracy rate is 30%. Cohen's kappa is -0.045, i.e. agreement no better than chance, so the predictions are not reliable.
Other types of dendrograms:
>set.seed(101)
> samplexemplu <- exem[sample(1:52, 15),]
> samplexemplu
>distance <- dist(samplexemplu[,-10], method="euclidean")
>cluster <- hclust(distance, method="average")
> plot(cluster, hang=-1, label=samplexemplu$RiskClass)
[Figure: Cluster Dendrogram, hclust (*, "average") on distance; vertical axis: Height, 0 to 5000; leaves labelled with the risk classes 5 5 7 6 7 6 7 5 7 6 6 7 7 7 7.]
> plot(as.dendrogram(cluster), edgePar=list(col="darkgreen", lwd=2), horiz=T)
edgePar = a list of graphical parameters for the edges of the dendrogram
lwd = line width of the edges; takes only positive values (default lwd=1)
horiz=T draws the dendrogram horizontally
[Figure: horizontal dendrogram with dark green edges; height axis from 5000 to 0.]
> str(as.dendrogram(cluster))   # prints the dendrogram as text
> group.3 <- cutree(cluster, k = 3)   # cuts the dendrogram into 3 clusters
> table(group.3, samplexemplu$RiskClass)   # compares the 3 clusters with the known risk classes:
group.3 5 6 7
      1 1 2 6
      2 2 1 1
      3 0 1 1
Cluster 1 contains 1 country with risk class 5, 2 countries with risk class 6 and 6 countries with risk class 7, and so on.
> plot(cluster); rect.hclust(cluster, k=5, border="red")
[Figure: Cluster Dendrogram, hclust (*, "average") on distance, with 5 clusters outlined in red; vertical axis: Height, 0 to 5000.]
>par(mfrow=c(1,1))
> z <- as.dendrogram(cluster)
> attr(z[[2]][[2]],"edgePar") <- list(col="blue", lwd=4, pch=NA)
> attr(z[[2]][[1]],"edgePar") <- list(col="red", lwd=3, lty=3, pch=NA)
> plot(z, horiz=T)
[Figure: horizontal dendrogram with the recolored branches; height axis from 5000 to 0.]
lty = line type
> z[[2]]
Decision trees
The rpart package must be installed.
> library(rpart)
> fit <- rpart(RiskClass ~ ., data=exemplu)
> fit   # prints the tree and its rules:
n= 52
node), split, n, deviance, yval
* denotes terminal node
1) root 52 34.057690 6.365385
2) GNI>=1563.5 18 16.944440 5.944444 *
3) GNI< 1563.5 34 12.235290 6.588235
6) GDPgrowthrate>=3.855 24 9.833333 6.416667
12) Trade< 73.67603 12 5.666667 6.166667 *
13) Trade>=73.67603 12 2.666667 6.666667 *
7) GDPgrowthrate< 3.855 10 0.000000 7.000000 *
Nodes marked with * are terminal nodes.
> plot(fit, uniform=TRUE, main="Arbore decizional")   # draws the branches of the tree
> text(fit, use.n=TRUE, all=TRUE, cex=.8)   # labels the nodes of the tree
[Figure: "Arbore decizional" (decision tree) with splits GNI>=1564, GDPgrowthrate>=3.855 and Trade< 73.68; node values 6.365 (n=52), 5.944 (n=18), 6.588 (n=34), 6.417 (n=24), 6.167 (n=12), 6.667 (n=12), 7 (n=10).]
The tree above reads as follows:
If GNI >= 1563.5: 18 observations, with mean risk class 5.944.
If GNI < 1563.5 (34 observations):
  If GDPgrowthrate < 3.855: 10 observations, with mean risk class 7.
  If GDPgrowthrate >= 3.855 (24 observations):
    If Trade < 73.67603: 12 observations, with mean risk class 6.166667.
    If Trade >= 73.67603: 12 observations, with mean risk class 6.666667.
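The printed rules can be written out as an ordinary function, which makes the reading of the tree explicit. `predict_risk` is a hypothetical helper for illustration only; the thresholds and leaf means are copied from the rpart output above. The deviance and yval columns of that output indicate a regression tree, so each leaf returns a mean risk class rather than a class label.

```r
# Hand-written version of the printed tree (thresholds and leaf means from the rpart output).
predict_risk <- function(GNI, GDPgrowthrate, Trade) {
  if (GNI >= 1563.5) return(5.944444)          # node 2
  if (GDPgrowthrate < 3.855) return(7)         # node 7
  if (Trade < 73.67603) 6.166667 else 6.666667 # nodes 12 and 13
}

print(predict_risk(GNI = 2000, GDPgrowthrate = 5, Trade = 80))
print(predict_risk(GNI = 600,  GDPgrowthrate = 2, Trade = 80))

# Sanity check: the leaf means weighted by the leaf sizes give back the root mean.
leaf_n    <- c(18, 12, 12, 10)
leaf_mean <- c(5.944444, 6.166667, 6.666667, 7)
root_mean <- sum(leaf_n * leaf_mean) / sum(leaf_n)
print(root_mean)
```

If class labels rather than means are wanted, one option is to convert RiskClass to a factor before calling rpart(), which then fits a classification tree.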
The naive Bayes classifier (NBC)
> library(e1071)
> library(class)
> model <- naiveBayes(exem[,1:9], factor(exem[,10]))   # note: columns 1:9 include the Country column
> model   # prints the prior and conditional probabilities
Prediction with the NBC
We want to determine the risk class of the first 9 countries in the data set using the NBC:
> predict(model, exem[1:9,-10])
[1] 6 7 5 6 7 6 7 6 7
Levels: 3 5 6 7
So the first 9 countries in the data set are assigned the risk classes 6 7 5 6 7 6 7 6 7.
> table(predict(model, exem[,-10]), exem[,10], dnn=list('predicted','actual'))
actual
predicted 3 5 6 7
3 0 0 0 0
5 0 5 0 0
6 0 0 19 0
7 1 0 0 27
> tab<-table(predict(model, exem[,-10]), exem[,10], dnn=list('predicted','actual'))
>classAgreement(tab)
$diag
[1] 0.9807692
$kappa
[1] 0.9667093
$rand
[1] 0.979638
$crand
[1] 0.9579734
Neural networks
> library(neuralnet)
> set.seed(123)
>head(exem)
> size.sample <- 12
> trainset <- exem[sample(1:nrow(exem), size.sample),]
> trainset
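> nnet_train <- cbind(trainset, trainset$RiskClass == '3')   # one indicator column per risk class; the column for class 3 is needed for the four names assigned below
> nnet_train <- cbind(nnet_train, trainset$RiskClass == '5')
> nnet_train <- cbind(nnet_train, trainset$RiskClass == '6')
> nnet_train <- cbind(nnet_train, trainset$RiskClass == '7')
> names(nnet_train)[11] <- 'Clasa3'
> names(nnet_train)[12] <- 'Clasa5'
> names(nnet_train)[13] <- 'Clasa6'
> names(nnet_train)[14] <- 'Clasa7'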
> names(nnet_train)
[1] "Country" "GNI" "GDPgrowthrate" "Debt.export"
[5] "Reserves.imports" "Inflation" "CCbalance" "FDI"
[9] "Trade" "RiskClass" "Clasa3" "Clasa5"
[13] "Clasa6" "Clasa7"
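The indicator columns can also be built in one step instead of one cbind() call per class. A minimal sketch on a short made-up class vector (`risk` is illustrative, not the real training sample); the column names follow the Clasa3 ... Clasa7 convention used above.

```r
# Illustrative class vector (not the real training sample).
risk    <- c(5, 7, 6, 7, 3, 6)
classes <- c(3, 5, 6, 7)

# One logical column per class: TRUE where the observation belongs to that class.
ind <- sapply(classes, function(k) risk == k)
colnames(ind) <- paste0("Clasa", classes)
print(ind)

# Each row has exactly one TRUE.
stopifnot(all(rowSums(ind) == 1))
```

The resulting matrix can be attached to the training data with cbind(), which gives the same four extra columns as the step-by-step version.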
> nn <- neuralnet(Clasa5+Clasa6+Clasa7~GNI+GDPgrowthrate+Debt.export+Reserves.imports+Inflation+CCbalance+Trade+FDI, data=nnet_train, hidden=c(3,3))
> nn
Call: neuralnet(formula = Clasa5 + Clasa6 + Clasa7 ~ GNI + GDPgrowthrate + Debt.export + Reserves.imports + Inflation + CCbalance + Trade + FDI, data = nnet_train, hidden = c(3, 3))
1 repetition was calculated.
Error Reached Threshold Steps
1 2.082674381 0.008577950098 110
> plot(nn)
[Figure: plot of the fitted network. Inputs: GNI, GDPgrowthrate, Debt.export, Reserves.imports, Inflation, CCbalance, Trade, FDI; two hidden layers of 3 neurons each; outputs: Clasa5, Clasa6, Clasa7. Error: 2.082674, Steps: 110.]
> exem1 <- exem[,-1]   # drops the first (non-numeric) column from the data set
> exem1
> predictie<- compute(nn, exem1[-9])$net.result
> maxid<-function(argument){return(which(argument==max(argument)))}
> idx<- apply(predictie, c(1), maxid)
> idx
[1] 2 2 2 3 2 3 3 2 3 2 3 3 2 2 2 2 3 2 3 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 3
[38] 2 2 2 2 3 2 2 2 3 3 2 2 2 2 3
> prediction <- c('Clasa5', 'Clasa6', 'Clasa7')[idx]   # idx indexes the three output units, in the order Clasa5, Clasa6, Clasa7
> prediction
> table(prediction, exem$RiskClass)   # displays the confusion matrix (rows: predicted; columns: actual)
prediction  3  5  6  7
    Clasa6  1  3 11 12
    Clasa7  0  2  8 15
Of the 4 risk classes (3, 5, 6, 7), the network predicts only 2, Clasa6 and Clasa7.
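The index found for each row refers to the position of the winning output unit, and the output units follow the left-hand side of the model formula (Clasa5, Clasa6, Clasa7). The label vector used for the lookup therefore needs exactly those names in that order. A minimal sketch with made-up output values (the matrix `out` is illustrative, not the real net.result):

```r
# Made-up network outputs: one row per observation, one column per output unit.
out <- rbind(c(0.10, 0.70, 0.20),
             c(0.05, 0.30, 0.65),
             c(0.40, 0.40, 0.20))  # tie between the first two outputs
colnames(out) <- c("Clasa5", "Clasa6", "Clasa7")  # order of the formula's left-hand side

# which.max() always returns a single index (the first maximum), so apply()
# yields a plain vector even with ties; which(x == max(x)) would return
# two indices for the third row.
idx  <- apply(out, 1, which.max)
pred <- colnames(out)[idx]
print(pred)
```

Taking the labels from the column names of the output matrix avoids keeping a separate, hand-typed label vector in step with the formula.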
>tab<-table(prediction, exem$RiskClass)
> library(e1071)
> classAgreement(tab)
$diag
[1] 0.05769230769
$kappa
[1] 0.001567398119
$rand
[1] 0.4969834087
$crand
[1] -0.009643901671