Adabag 2.0: an R package for Adaboost.M1 and Bagging
Esteban Alfaro Cortés, Matías Gámez Martínez, Noelia García Rubio
Univ. de Castilla-La Mancha, F. CC. Económicas y Empresariales, Albacete
III Jornadas de Usuarios de R, 17-18 November 2011
Escuela de Organización Industrial, Madrid
Contents

1. Introduction
2. Algorithms
   • Adaboost
   • Bagging
3. Functions
   • Boosting functions
   • Bagging functions
   • Margin function
4. Example
5. Conclusions
1. Introduction

• In recent decades, many ensemble classification methods based on classification trees have been developed. The package presented here implements two of the most popular: Adaboost.M1 (the multiclass version of Adaboost) and Bagging.

• The main difference between them is that, while Adaboost [Freund and Schapire, 1996] builds its base classifiers sequentially, updating the distribution over the training set before creating each of them, Bagging [Breiman, 1996] combines individual classifiers built on bootstrap replicates of the training set.
1. Introduction

• Adabag 2.1 is an update of Adabag that adds a new function, "margins", to compute the margins of the classifiers. The first version was published in June 2006 and has been widely used for classification tasks in very diverse scientific fields:

  – Kriegler, B. & Berk, R. (2007). Estimation of the homeless population in Los Angeles (small-area spatial estimation).
  – Stewart, B. & Zhukov, Y. (2009). Content analysis for the classification of military documents.
  – De Bock, K.W., Coussement, K. & Van den Poel, D. (2010). Ensemble classification based on Generalized Additive Models.
  – De Bock, K.W. & Van den Poel, D. (2011). Prediction of customer churn.
  – And in R packages such as "digeR", a GUI for analysing two-dimensional DIGE (Difference In-Gel Electrophoresis) data, by Fan, Y., Murphy, T.B. & Watson, W.G. (2009): digeR Manual.
2. Algorithms: AdaBoost

AdaBoost (Adaptive Boosting), Freund and Schapire (1996)

• It uses reweighting instead of resampling.
• Under the new weights, the previous weak classifier attains only 50% accuracy on the training set.
• It is especially well suited to weak classifiers.
• The final classification is based on a weighted vote of the individual classifiers.
AdaBoost.M1

AdaBoost.M1 is applied as follows:

1. Make a weighted version of the training set (T_b); initially all weights are w_i = 1/n.
2. Repeat for b = 1, 2, ..., B:
   a) Build on T_b an individual classifier C_b(x_i) ∈ {1, ..., k}.
   b) Compute the weighted error and the constant:
      e_b = Σ_{i=1}^{n} w_i I(C_b(x_i) ≠ y_i),   α_b = 1/2 ln((1 − e_b)/e_b)
   c) Update and normalize the weights: w_i^{b+1} = w_i exp(α_b I(C_b(x_i) ≠ y_i)).
3. Build the final classifier (assuming Y = {1, ..., k}):
   C_f(x_i) = arg max_{j ∈ Y} Σ_{b=1}^{B} α_b I(C_b(x_i) = j)
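The steps above can be sketched in Python (a minimal, illustrative reweighting version with Breiman's alpha, not the adabag implementation; `weak_learner` is a hypothetical function that fits a classifier on weighted data):

```python
import math
from collections import Counter

def adaboost_m1(X, y, weak_learner, B):
    """Minimal AdaBoost.M1 sketch: reweighting version, Breiman's alpha."""
    n = len(X)
    w = [1.0 / n] * n                          # step 1: uniform initial weights
    classifiers, alphas = [], []
    for _ in range(B):                         # step 2
        C = weak_learner(X, y, w)              # a) fit on the weighted set
        miss = [C(X[i]) != y[i] for i in range(n)]
        e = sum(wi for wi, m in zip(w, miss) if m)   # b) weighted error e_b
        if e >= 0.5:                           # no longer a useful weak learner
            break
        e = max(e, 1e-10)                      # guard against a perfect classifier
        alpha = 0.5 * math.log((1 - e) / e)    # Breiman's alpha_b
        classifiers.append(C)
        alphas.append(alpha)
        w = [wi * math.exp(alpha if m else 0.0) for wi, m in zip(w, miss)]
        s = sum(w)
        w = [wi / s for wi in w]               # c) update and normalize weights
    def C_f(x):                                # step 3: weighted vote
        votes = Counter()
        for C, a in zip(classifiers, alphas):
            votes[C(x)] += a
        return votes.most_common(1)[0][0]
    return C_f
```

Misclassified observations get their weight multiplied by exp(α_b), so the next base classifier concentrates on them; the final vote weights each tree by its α_b.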
[Figure: Boosting (AdaBoost) scheme (Freund and Schapire, 1996). Training sets T_1, ..., T_B are drawn from a bootstrap or weighted probability distribution, updated at each step; the classifiers C_1, ..., C_B are combined into C_f by a weighted vote with weights α_1, ..., α_B.]
2. Algorithms: Bagging

Bootstrapping and aggregating (Breiman, 1996)

• For b = 1, 2, ..., B:
  – Draw with replacement n* = n samples from T.
  – Train the classifier C_b.
• The final classifier is obtained by a simple majority vote of C_1, ..., C_B.
• It increases the stability of the classifier / reduces its variance.
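The procedure above can be sketched in Python (an illustrative sketch, not the adabag implementation; `learner` is a hypothetical function that fits a classifier on a sample):

```python
import random
from collections import Counter

def bagging(X, y, learner, B, seed=0):
    """Minimal bagging sketch: B bootstrap replicates, simple majority vote."""
    rng = random.Random(seed)
    n = len(X)
    classifiers = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # n* = n draws with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        classifiers.append(learner(Xb, yb))          # train C_b on the replicate
    def C_f(x):                                      # simple majority vote of C_1..C_B
        return Counter(C(x) for C in classifiers).most_common(1)[0][0]
    return C_f
```

Unlike boosting, the B classifiers are built independently, so averaging their votes mainly reduces variance.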
[Figure: Bagging scheme (Breiman, 1996). Training sets T_1, ..., T_B are obtained from T by random selection with replacement; the classifiers C_1, ..., C_B are combined into C_f by a simple vote.]
3. Functions: Boosting

adaboost.M1    Applies the Adaboost.M1 algorithm to a data set

Description
Fits the Adaboost.M1 algorithm proposed by Freund and Schapire in 1996 using classification trees as single classifiers.

Usage
adaboost.M1(formula, data, boos = TRUE, mfinal = 100, coeflearn = 'Breiman', control)

Arguments
formula     a formula, as in the lm function.
data        a data frame in which to interpret the variables named in formula.
boos        if TRUE (by default), a bootstrap sample of the training set is drawn using the weights for each observation on that iteration. If FALSE, every observation is used with its weights.
mfinal      an integer, the number of iterations for which boosting is run or the number of trees to use. Defaults to mfinal=100 iterations.
coeflearn   if 'Breiman' (by default), alpha = 1/2 ln((1-err)/err) is used. If 'Freund', alpha = ln((1-err)/err) is used, where alpha is the weight updating coefficient.
control     options that control details of the rpart algorithm. See rpart.control for more details.
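The two coeflearn choices differ only by a factor of two in the weight-updating coefficient; a small illustrative helper (assumed arithmetic for the two formulas above, not package code):

```python
import math

def alpha(err, coeflearn='Breiman'):
    """Weight-updating coefficient for a base classifier with weighted error err."""
    if coeflearn == 'Breiman':
        return 0.5 * math.log((1 - err) / err)
    return math.log((1 - err) / err)  # coeflearn = 'Freund'
```

In both cases a more accurate base classifier (smaller err) receives a larger weight in the final vote, and err must be below 0.5 for the coefficient to be positive.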
3. Functions: Boosting

adaboost.M1    Applies the Adaboost.M1 algorithm to a data set

Details
Adaboost.M1 is a simple generalization of Adaboost for more than two classes.

Value
An object of class adaboost.M1, which is a list with the following components:
formula     the formula used.
trees       the trees grown along the iterations.
weights     a vector with the weighting of the trees of all iterations.
votes       a matrix describing, for each observation, the number of trees that assigned it to each class, weighting each tree by its alpha coefficient.
class       the class predicted by the ensemble classifier.
importance  returns the relative importance of each variable in the classification task. This measure is the number of times each variable is selected to split.
3. Functions: Boosting

predict.boosting    Predicts from a fitted adaboost.M1 object

Description
Classifies a data frame using a fitted adaboost.M1 object.

Usage
## S3 method for class 'boosting'
predict(object, newdata, ...)

Arguments
object      fitted model object of class adaboost.M1. This is assumed to be the result of some function that produces an object with the same named components as that returned by the adaboost.M1 function.
newdata     data frame containing the values at which predictions are required. The predictors referred to in the right side of formula(object) must be present by name in newdata.
...         further arguments passed to or from other methods.
3. Functions: Boosting

predict.boosting    Predicts from a fitted adaboost.M1 object

Value
An object of class predict.boosting, which is a list with the following components:
formula     the formula used.
votes       a matrix describing, for each observation, the number of trees that assigned it to each class, weighting each tree by its alpha coefficient.
class       the class predicted by the ensemble classifier.
confusion   the confusion matrix which compares the real class with the predicted one.
error       returns the average error.
3. Functions: Boosting

boosting.cv    Runs v-fold cross validation with adaboost.M1

Description
The data are divided into v non-overlapping subsets of roughly equal size. Then, adaboost.M1 is applied on (v-1) of the subsets. Finally, predictions are made for the left-out subset, and the process is repeated for each of the v subsets.

Usage
boosting.cv(formula, data, v = 10, boos = TRUE, mfinal = 100, coeflearn = "Breiman", control)

Arguments
formula     a formula, as in the lm function.
data        a data frame in which to interpret the variables named in formula.
boos        if TRUE (by default), a bootstrap sample of the training set is drawn using the weights for each observation on that iteration. If FALSE, every observation is used with its weights.
v           an integer, specifying the type of v-fold cross validation. Defaults to 10. If v is set as the number of observations, leave-one-out cross validation is carried out. Besides this, every value between two and the number of observations is valid and means that roughly every v-th observation is left out.
mfinal      an integer, the number of iterations for which boosting is run or the number of trees to use. Defaults to mfinal=100 iterations.
coeflearn   if 'Breiman' (by default), alpha = 1/2 ln((1-err)/err) is used. If 'Freund', alpha = ln((1-err)/err) is used, where alpha is the weight updating coefficient.
control     options that control details of the rpart algorithm. See rpart.control for more details.
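The v-fold scheme described above can be sketched as follows (an illustration of the partitioning only; `fit` and `predict_fn` are hypothetical placeholders, and adabag's internal fold assignment may differ):

```python
def v_fold_indices(n, v):
    """Split indices 0..n-1 into v non-overlapping folds of roughly equal size
    (roughly every v-th observation is left out together)."""
    return [list(range(k, n, v)) for k in range(v)]

def cross_validate(X, y, v, fit, predict_fn):
    """Fit on v-1 folds, predict the held-out fold; repeat for each fold."""
    n = len(X)
    errors = 0
    for test_idx in v_fold_indices(n, v):
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict_fn(model, X[i]) != y[i] for i in test_idx)
    return errors / n   # average cross-validated error
```

Setting v equal to the number of observations makes each fold a single observation, i.e. leave-one-out cross validation.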
3. Functions: Boosting

boosting.cv    Runs v-fold cross validation with adaboost.M1

Value
An object of class boosting.cv, which is a list with the following components:
class       the class predicted by the ensemble classifier.
confusion   the confusion matrix which compares the real class with the predicted one.
error       returns the average error.
3. Functions: Bagging

bagging    Applies the Bagging algorithm to a data set

Description
Fits the Bagging algorithm proposed by Breiman in 1996 using classification trees as single classifiers.

Usage
bagging(formula, data, mfinal = 100, control)

Arguments
formula     a formula, as in the lm function.
data        a data frame in which to interpret the variables named in formula.
mfinal      an integer, the number of iterations for which bagging is run or the number of trees to use. Defaults to mfinal=100 iterations.
control     options that control details of the rpart algorithm. See rpart.control for more details.
3. Functions: Bagging

bagging    Applies the Bagging algorithm to a data set

Details
Unlike boosting, the individual classifiers are independent of each other in bagging.

Value
An object of class bagging, which is a list with the following components:
formula     the formula used.
trees       the trees grown along the iterations.
votes       a matrix describing, for each observation, the number of trees that assigned it to each class.
class       the class predicted by the ensemble classifier.
samples     the bootstrap samples used along the iterations.
importance  returns the relative importance of each variable in the classification task. This measure is the number of times each variable is selected to split.
3. Functions: Bagging

predict.bagging    Predicts from a fitted bagging object

Description
Classifies a data frame using a fitted bagging object.

Usage
## S3 method for class 'bagging'
predict(object, newdata, ...)

Arguments
object      fitted model object of class bagging. This is assumed to be the result of some function that produces an object with the same named components as that returned by the bagging function.
newdata     data frame containing the values at which predictions are required. The predictors referred to in the right side of formula(object) must be present by name in newdata.
...         further arguments passed to or from other methods.
3. Functions: Bagging

predict.bagging    Predicts from a fitted bagging object

Value
An object of class predict.bagging, which is a list with the following components:
formula     the formula used.
votes       a matrix describing, for each observation, the number of trees that assigned it to each class.
class       the class predicted by the ensemble classifier.
confusion   the confusion matrix which compares the real class with the predicted one.
error       returns the average error.
3. Functions: Bagging

bagging.cv    Runs v-fold cross validation with Bagging

Description
The data are divided into v non-overlapping subsets of roughly equal size. Then, bagging is applied on (v-1) of the subsets. Finally, predictions are made for the left-out subset, and the process is repeated for each of the v subsets.

Usage
bagging.cv(formula, data, v = 10, mfinal = 100, control)

Arguments
formula     a formula, as in the lm function.
data        a data frame in which to interpret the variables named in formula.
v           an integer, specifying the type of v-fold cross validation. Defaults to 10. If v is set as the number of observations, leave-one-out cross validation is carried out. Besides this, every value between two and the number of observations is valid and means that roughly every v-th observation is left out.
mfinal      an integer, the number of iterations for which bagging is run or the number of trees to use. Defaults to mfinal=100 iterations.
control     options that control details of the rpart algorithm. See rpart.control for more details.
3. Functions: Bagging

bagging.cv    Runs v-fold cross validation with Bagging

Value
An object of class bagging.cv, which is a list with the following components:
class       the class predicted by the ensemble classifier.
confusion   the confusion matrix which compares the real class with the predicted one.
error       returns the average error.
3. Functions: Margins

margins    Calculates the margins

Description
Calculates the margins of an adaboost.M1 or bagging classifier for a data frame.

Usage
margins(object, newdata, ...)

Arguments
object      this object must be the output of one of the functions bagging, adaboost.M1, predict.bagging or predict.boosting. This is assumed to be the result of some function that produces an object with two components named formula and class, as those returned for instance by the bagging function.
newdata     the same data frame used for building the object.
3. Functions: Margins

margins    Calculates the margins

Details
Intuitively, the margin for an observation is related to the certainty of its classification. It is calculated as the difference between the support of the correct class and the maximum support of a wrong class.

Value
An object of class margins, which is a list with only one component:
margins     a vector with the margins.
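The margin defined above can be computed from the per-class support directly; a hypothetical sketch (here `support` maps each class to its fraction of the ensemble's weighted votes, a name introduced for illustration):

```python
def margin(support, true_class):
    """Margin = support of the correct class minus the maximum support
    of a wrong class; negative for misclassified observations."""
    wrong = max(v for c, v in support.items() if c != true_class)
    return support[true_class] - wrong
```

An observation classified with high certainty has a margin close to 1, while a misclassified observation has a negative margin, which is why the cumulative margin plots later in the deck mark the vertical line at m = 0.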
4. Example: Vehicle (4 classes)

> library(adabag)
> data(Vehicle)
> l <- length(Vehicle[,1])
> sub <- sample(1:l, 2*l/3)
> Vehicle.rpart <- rpart(Class~., data=Vehicle[sub,], maxdepth=5)
> Vehicle.rpart.pred <- predict(Vehicle.rpart, newdata=Vehicle[-sub,], type="class")
> tb <- table(Vehicle.rpart.pred, Vehicle$Class[-sub])
> error.rpart <- 1-(sum(diag(tb))/sum(tb))
> tb
Vehicle.rpart.pred bus opel saab van
              bus   64    7    9   0
              opel   3   26   12   0
              saab   0   37   46   9
              van    2    2    5  60
> error.rpart
[1] 0.3049645
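The test error above is one minus the proportion of the confusion matrix's diagonal (the correctly classified observations); the same arithmetic in Python (illustrative only):

```python
def error_rate(confusion):
    """confusion[i][j] = count of observations of true class j predicted as class i."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 1 - correct / total

# Confusion matrix from the rpart run above (rows: predicted, columns: observed).
tb = [[64,  7,  9,  0],
      [ 3, 26, 12,  0],
      [ 0, 37, 46,  9],
      [ 2,  2,  5, 60]]
```

Here `error_rate(tb)` reproduces the 0.3049645 error shown above (1 - 196/282).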
4. Example: Vehicle (4 classes)

> Vehicle.bagging <- bagging(Class~., data=Vehicle[sub,], mfinal=50, control=rpart.control(maxdepth=3))
> Vehicle.bagging.pred <- predict.bagging(Vehicle.bagging, newdata=Vehicle[-sub,])
> Vehicle.bagging.pred$confusion
                Observed Class
Predicted Class bus opel saab van
           bus   67   10    9   1
           opel   1   32   11   0
           saab   1   25   40   0
           van    0    5   12  68
> Vehicle.bagging.pred$error
[1] 0.2659574
4. Example: Vehicle (4 classes)

> Vehicle.bagging$importance
        Comp      Circ    D.Circ    Rad.Ra  Pr.Axis.Ra   Max.L.Ra   Scat.Ra     Elong
    9.433962  2.744425  4.459691  2.572899    3.087479  14.236707  4.459691  9.777015
Pr.Axis.Rect  Max.L.Rect  Sc.Var.Maxis  Sc.Var.maxis    Ra.Gyr  Skew.Maxis  Skew.maxis  Kurt.maxis
    0.000000   12.006861      4.116638      9.777015  1.372213    2.229846    6.861063    1.715266
  Kurt.Maxis   Holl.Ra
    5.660377  5.488851

> barplot(sort(Vehicle.bagging$importance, dec=T), col="blue", horiz=T, las=1)

[Figure: horizontal bar plot of variable importance for bagging.]
4. Example: Vehicle (4 classes)

> Vehicle.bagging.cv <- bagging.cv(Class~., data=Vehicle)
> names(Vehicle.bagging.cv)
[1] "class" "confusion" "error"
> Vehicle.bagging.cv$confusion
                Observed class
Estimated class bus opel saab van
           bus  206   20   28   1
           opel   3  106   72   0
           saab   0   70   91   1
           van    9   16   26 197
> Vehicle.bagging.cv$error
[1] 0.2907801
> margins(Vehicle.bagging, Vehicle[sub,]) -> Vehicle.bagging.margins # training set
> margins(Vehicle.bagging.pred, Vehicle[-sub,]) -> Vehicle.bagging.predmargins # test set
> margins.test <- Vehicle.bagging.predmargins[[1]]
> margins.train <- Vehicle.bagging.margins[[1]]
> plot(sort(margins.train), (1:length(margins.train))/length(margins.train), type="l", xlim=c(-1,1),
+ main="Bagging, Margin cumulative distribution graph", xlab="m", ylab="% observations", col="blue3", lty=2)
> abline(v=0, col="red", lty=2)
> lines(sort(margins.test), (1:length(margins.test))/length(margins.test), type="l", cex=.5, col="green")

[Figure: cumulative distribution of margins for bagging on the training (blue, dashed) and test (green) sets.]
4. Example: Vehicle (4 classes)

> Vehicle.adaboost <- adaboost.M1(Class~., data=Vehicle[sub,], mfinal=50,
+ control=rpart.control(maxdepth=3))
> Vehicle.adaboost.pred <- predict.boosting(Vehicle.adaboost, newdata=Vehicle[-sub,])
> Vehicle.adaboost.pred$confusion
                Observed Class
Predicted Class bus opel saab van
           bus   67    2    3   0
           opel   2   41   20   1
           saab   0   27   45   4
           van    0    2    4  64
> error.adaboost <- 1-sum(diag(Vehicle.adaboost.pred$confusion))/sum(Vehicle.adaboost.pred$confusion)
> error.adaboost
[1] 0.2304965
4. Example: Vehicle (4 classes)

> Vehicle.adaboost$importance
        Comp      Circ    D.Circ    Rad.Ra  Pr.Axis.Ra   Max.L.Ra   Scat.Ra     Elong
    7.100592  3.550296  6.213018  3.254438    9.171598  11.538462  6.804734  5.917160
Pr.Axis.Rect  Max.L.Rect  Sc.Var.Maxis  Sc.Var.maxis    Ra.Gyr  Skew.Maxis  Skew.maxis  Kurt.maxis
    0.000000   10.946746      5.029586      5.325444  6.213018    4.733728    5.325444    3.254438
  Kurt.Maxis   Holl.Ra
    3.254438  2.366864

> barplot(sort(Vehicle.adaboost$importance, dec=T), col="blue", horiz=T, las=1)

[Figure: horizontal bar plot of variable importance for AdaBoost.]
4. Example: Vehicle (4 classes)

> Vehicle.adaboost.cv <- boosting.cv(Class~., data=Vehicle)
> names(Vehicle.adaboost.cv)
[1] "class" "confusion" "error"
> Vehicle.adaboost.cv$confusion
                Observed Class
Predicted Class bus opel saab van
           bus  213    2    3   1
           opel   3  116   74   2
           saab   1   88  134   4
           van    1    6    6 192
> Vehicle.adaboost.cv$error
[1] 0.2257683
> margins(Vehicle.adaboost, Vehicle[sub,]) -> Vehicle.adaboost.margins # training set
> margins(Vehicle.adaboost.pred, Vehicle[-sub,]) -> Vehicle.adaboost.predmargins # test set
> margins.test <- Vehicle.adaboost.predmargins[[1]]
> margins.train <- Vehicle.adaboost.margins[[1]]
> plot(sort(margins.train), (1:length(margins.train))/length(margins.train), type="l", xlim=c(-1,1),
+ main="AdaBoost, Margin cumulative distribution graph", xlab="m", ylab="% observations", col="blue3", lty=2)
> abline(v=0, col="red", lty=2)
> lines(sort(margins.test), (1:length(margins.test))/length(margins.test), type="l", cex=.5, col="green")

[Figure: cumulative distribution of margins for AdaBoost on the training (blue, dashed) and test (green) sets.]
5. Conclusions

The functions included in this package make it possible to:

1. Solve classification problems with more than two classes.
2. Train and predict for new observations.
3. Measure the relative importance of the variables.
4. Calculate the margins of the ensemble classifiers.
5. Estimate the error by cross validation.
Thank you for your attention!

e-mail: [email protected]
[email protected]