CLOP: A MATLAB® learning object package
http://clopinet.com/CLOP/
[email protected]
What is CLOP?
CLOP stands for Challenge Learning Object Package.
(It was developed for use in machine learning challenges with hundreds of thousands of features and/or examples.)
CLOP is an object-oriented Matlab package built on the “Spider” interface.
DATA OBJECTS
data(X, Y)
% Load the data:
X = load([data_dir 'gisette_train.data']);
Y = load([data_dir 'gisette_train.labels']);
% Create a data object and examine it:
dat = data(X, Y);
browse(dat, 2);
ALGORITHM OBJECTS
algo(hyperparam)
% Create data objects:
trainD = data(X, Y);
testD  = data(Xt, Yt);
% Define some hyperparameters:
hyper = {'degree=3', 'shrinkage=0.1'};
% Create a kernel ridge regression model:
model = kridge(hyper);
% Train it and test it:
[resu, Model] = train(model, trainD);
tresu = test(Model, testD);
% Visualize the results:
roc(tresu);
COMPOUND MODELS
Preprocessing
% For example, create a smoothing kernel:
my_ker = gauss_ker({'dim1=11', 'dim2=11', 'sigma1=2', 'sigma2=2'});
show(my_ker);
% Create a preprocessing object of type convolve:
my_prepro = convolve(my_ker);
% Perform the preprocessing and visualize the results:
d = train(my_prepro, dat);
browse(d, 2);
chain({model1, model2, …})
% Combine preprocessing and kernel ridge regression:
model = chain({my_prepro, kridge(hyper)});

ensemble({model1, model2, …})
% Combine replicas of a base learner:
for k = 1:10
    base_model{k} = chain({my_prepro, naive});
end
my_model = ensemble(base_model);
BASIC METHODS
train(model, trainD)
% After creating your complex model, training takes just one command:
model = ensemble({chain({standardize, kridge(hyper)}), chain({normalize, naive})});
[resu, Model] = train(model, trainD);

test(Model, testD)
% After training your complex model, testing takes just one command:
tresu = test(Model, testD);
% You can wrap a model in a "cv" object to perform cross-validation:
cv_model = cv(my_model);  % just call train and test on it!
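A minimal sketch of running the cross-validated model (the number of folds and the result fields follow CLOP defaults, assumed here rather than shown on the slide):

% Hedged example: cross-validate my_model on the training data
[cv_resu, cv_Model] = train(cv_model, trainD);  % trains and tests on each fold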
BASIC OBJECTS
Some CLOP objects (a sketch with examples of each category follows)
• Basic learning machines
• Feature selection, pre- and post-processing
• Compound models
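The slide does not list the objects themselves; as an illustration, the sketch below collects constructors from each category that appear elsewhere in this deck (hyperparameter values are arbitrary):

% Basic learning machines:
m1 = kridge(hyper);            % kernel ridge regression
m2 = svc({'degree=3'});        % support vector classifier
m3 = naive;                    % naive Bayes
% Feature selection, pre- and post-processing:
f1 = s2n('f_max=1000');        % signal-to-noise feature ranking
p1 = standardize;              % standardize features
p2 = normalize;                % normalize patterns
b1 = bias;                     % post-processing bias adjustment
% Compound models:
c1 = chain({p1, f1, m2, b1});
c2 = ensemble({c1, chain({p2, m3})});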
BENCHMARKS
NIPS 2003 Feature Selection Challenge

MADELON – Best challenge BER = 6.22 ± 0.57% – n0 = 20 features (4%) – BER0 = 7.33%
my_classif = svc({'coef0=1', 'degree=0', 'gamma=1', 'shrinkage=1'});
my_model = chain({probe(relief, {'p_num=2000', 'pval_max=0'}), standardize, my_classif});
DOROTHEA – Best challenge BER = 8.54 ± 0.99% – n0 = 1000 features (1%) – BER0 = 12.37%
my_model = chain({TP('f_max=1000'), naive, bias});
Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark, Isabelle Guyon, Jiwen Li, Theodor Mader, Patrick A. Pletscher, Georg Schneider and Markus Uhr, Pattern Recognition Letters, Volume 28, Issue 12, 1 September 2007, Pages 1438-1444.
Dataset   Size     Type            Features  Training ex.  Validation ex.  Test ex.
Arcene    8.7 MB   Dense           10000     100           100             700
Gisette   22.5 MB  Dense           5000      6000          1000            6500
Dexter    0.9 MB   Sparse integer  20000     300           300             2000
Dorothea  4.7 MB   Sparse binary   100000    800           350             800
Madelon   2.9 MB   Dense           500       2000          600             1800
Class taught at ETH Zurich, winter 2005. Task of the students:
• Baseline method provided, with BER0 performance and n0 features.
• Get BER < BER0, or BER = BER0 with n < n0 features.
• Extra credit for beating the best challenge entry.
[Figure: example patterns from the five NIPS 2003 datasets – GISETTE, DOROTHEA, DEXTER, MADELON, ARCENE. The DEXTER panel shows a sample text: "NEW YORK, October 2, 2001 – Instinet Group Incorporated (Nasdaq: INET), the world's largest electronic agency securities broker, today announced tha…"]
DEXTER – Best challenge BER = 3.30 ± 0.40% – n0 = 300 features (1.5%) – BER0 = 5%
my_classif = svc({'coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.5'});
my_model = chain({s2n('f_max=300'), normalize, my_classif});

GISETTE – Best challenge BER = 1.26 ± 0.14% – n0 = 1000 features (20%) – BER0 = 1.80%
my_classif = svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'});
my_model = chain({normalize, s2n('f_max=1000'), my_classif});

ARCENE – Best challenge BER = 11.9 ± 1.2% – n0 = 1100 features (11%) – BER0 = 14.7%
my_svc = svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=0.1'});
my_model = chain({standardize, s2n('f_max=1100'), normalize, my_svc});
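These benchmark models are ordinary CLOP objects; as a minimal sketch (trainD and testD built with data(X, Y) as on the earlier slides), any of them can be evaluated with the same two commands shown before:

% Hedged example: evaluate one of the benchmark models above
[resu, Model] = train(my_model, trainD);
tresu = test(Model, testD);
roc(tresu);  % visualize the test ROC curve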
NIPS 2006 Model Selection Game
First place: Juha Reunanen, cross-indexing-7

Dataset  CLOP models selected
ADA      2*{sns,std,norm,gentleboost(neural),bias}; 2*{std,norm,gentleboost(kridge),bias}; 1*{rf,bias}
GINA     6*{std,gs,svc(degree=1)}; 3*{std,svc(degree=2)}
HIVA     3*{norm,svc(degree=1),bias}
NOVA     5*{norm,gentleboost(kridge),bias}
SYLVA    4*{std,norm,gentleboost(neural),bias}; 4*{std,neural}; 1*{rf,bias}

sns = shift’n’scale, std = standardize, norm = normalize (some details of hyperparameters not shown)
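The k*{…} notation denotes k replicas of a chain combined in an ensemble. As a hedged sketch (the constructor name shift_n_scale for "sns" is a guess, and hyperparameters are omitted as in the table), the ADA entry could be reconstructed like this:

% Hypothetical reconstruction of the ADA entry above;
% shift_n_scale is an assumed constructor name for sns.
members = {};
for k = 1:2
    members{end+1} = chain({shift_n_scale, standardize, normalize, gentleboost(neural), bias});
end
for k = 1:2
    members{end+1} = chain({standardize, normalize, gentleboost(kridge), bias});
end
members{end+1} = chain({rf, bias});
ada_model = ensemble(members);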
Second place: Hugo Jair Escalante Balderas, BRun2311062

Dataset  CLOP models selected
ADA      {sns, std, norm, neural(units=5), bias}
GINA     {norm, svc(degree=5, shrinkage=0.01), bias}
HIVA     {std, norm, gentleboost(kridge), bias}
NOVA     {norm, gentleboost(neural), bias}
SYLVA    {std, norm, neural(units=1), bias}

sns = shift’n’scale, std = standardize, norm = normalize (some details of hyperparameters not shown)
Note: the entry Boosting_1_001_x900 gave better results, but was older.
[Figure: example patterns from the five NIPS 2006 datasets – NOVA, GINA, HIVA, ADA, SYLVA. The NOVA panel shows a sample newsgroup post: "Subject: Re: Goalie masks / Lines: 21 / Tom Barrasso wore a great mask, one time, last season. It was all black, with Pgh city scenes on it. …"]
Dataset  Domain               Features  Training ex.  Validation ex.  Test ex.
ADA      Marketing            48        4147          415             41471
GINA     Digit recognition    970       3153          315             31532
HIVA     Drug discovery       1617      3845          384             38449
NOVA     Text classification  16969     1754          175             17537
SYLVA    Ecology              216       13086         1309            130857
Proc. IJCNN 2007, Orlando, FL, August 2007:
PSMS for Neural Networks, H. Jair Escalante, Manuel Montes y Gómez, and Luis Enrique Sucar
Model Selection and Assessment Using Cross-indexing, Juha Reunanen
Credits
The Challenge Learning Object Package (CLOP) is based on code to which many people have contributed:
- The developers of CLOP: Isabelle Guyon and Amir Reza Saffari Azar.
- The creators of The Spider: Jason Weston, André Elisseeff, Gökhan Bakır, Fabian Sinz.
- The developers of the packages attached to CLOP: Olivier Chapelle, Hugo Jair Escalante Balderas (PSMS), Gavin Cawley (LSSVM), Chih-Chung Chang and Chih-Jen Lin (LIBSVM), Jun-Cheng Chen, Kuan-Jen Peng, Chih-Yuan Yang, Chih-Huai Cheng, and Rong-En Fan (LIBSVM Matlab interface), Junshui Ma and Yi Zhao (second LIBSVM Matlab interface), Leo Breiman and Adele Cutler (Random Forests), Ting Wang (RF Matlab interface), Ian Nabney and Christopher Bishop (NETLAB).
- The contributors to other Spider functions or packages: Thorsten Joachims (SVMLight), Chih-Chung Chang and Chih-Jen Lin (LIBSVM), Ronan Collobert (SVM Torch II), Jez Hill, Jan Eichhorn, Rodrigo Fernandez, Holger Froehlich, Gorden Jemwa, Kiyoung Yang, Chirag Patel, Sergio Rojas.
- The authors of the Weka package and the R project, whose publicly available code was interfaced to Matlab and made accessible to CLOP.
Book with CLOP and datasets
Feature Extraction: Foundations and Applications, Isabelle Guyon, Steve Gunn, et al., Eds., Springer, 2006. http://clopinet.com/fextract-book/
• CD including CLOP and the data of the NIPS 2003 challenge
• Tutorial chapters
• Invited papers on the best results of the challenge