http://chalearn.org/
Causality and Graph Reconstruction
MLconf 2015
Isabelle Guyon, ChaLearn
Motivation
BIG data makes lots of BIG promises, but…
… will the promises be kept?
What is causal graph reconstruction?
[Figure: analyses placed along DIFFICULTY and VALUE axes, from classical statistics to machine learning:
Exploratory / Descriptive: What happened? How did it happen? (explicative power)
Predictive: What will happen? (forecasting power)
Causal / Interventional: (decisional power)]
Problem setting
[Figure: variables A to J, shown as INPUT and as OUTPUT.]
Causal questions
What affects…
…your health?
…climate changes?
…the economy?
Which actions will have beneficial effects?
Scientific method
1. Observe “correlations” A – B.
2. Hypothesize causal relationships: A -> B, B -> A, A <- C -> B.
3. Perform experiments.
Observe correlations
Thanks to Jonas Peters for this example.
Hypothesize causal relationships
[Figure: four candidate diagrams relating Chocolate and Nobel, with a “?”.]
Perform randomized controlled experiments
“Please test your researchers for ten years: Randomly pick half of them and give them chocolate for dessert and give apples to the other half. Then compare the number of Nobel prizes in the two populations.”
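Why randomization matters can be sketched with a small simulation. Everything in it is an assumption made for illustration: a hidden factor called `wealth` drives both chocolate consumption and the outcome, while chocolate itself has no effect. Observational data then shows a gap between the two groups, and random assignment makes it vanish.

```python
import random
import statistics


def simulate(n=20000, randomize=False, seed=0):
    """Return (mean outcome with chocolate, mean outcome without).

    Hypothetical model: `wealth` is a hidden confounder that raises both the
    chance of eating chocolate and the outcome. Chocolate has NO causal effect.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        wealth = rng.random()                   # hidden confounder in [0, 1)
        if randomize:
            chocolate = rng.random() < 0.5      # coin flip, independent of wealth
        else:
            chocolate = rng.random() < wealth   # wealthier units eat more chocolate
        outcome = wealth + rng.gauss(0.0, 0.1)  # depends on wealth only
        (treated if chocolate else control).append(outcome)
    return statistics.mean(treated), statistics.mean(control)


obs_treated, obs_control = simulate(randomize=False)
rct_treated, rct_control = simulate(randomize=True)
observational_gap = obs_treated - obs_control
randomized_gap = rct_treated - rct_control
print(f"observational gap: {observational_gap:.3f}")
print(f"randomized gap:    {randomized_gap:.3f}")
```

The observational gap reflects the confounder only; the randomized gap estimates the (here absent) causal effect.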
BIG DATA
Smoking – Lung cancer
Pollution – Climate changes
Education – Crime rate
Alcohol consumption – Car accidents
Gender – Wages
Cholesterol – Heart disease
Chocolate – Nobel Prize
How far can we go in improving causal hypotheses …
… to minimize the need for experiments?
Landmark work
• Pioneer work: Glymour, Scheines, Spirtes, Pearl (Turing Award, 2011), Rubin, in the US, since the 80’s.
• New wave: Hyvärinen, Schölkopf, Bühlmann in the EU.
• Nobel prizes in econometrics: Haavelmo (1989), Granger (2003), Sargent and Sims (2011).
• DARPA programs: Big mechanisms (2014), upcoming program (Schwartz, program manager).
Game changing work: Causality challenges
Causation and Prediction (2007)
Pot-luck challenge (2008)
Cause-Effect Pairs (2013)
Neural Connectomics (2014)
To make a long story short…
1. Discovering dependencies: easiest = classical feature selection. Hard to beat!
2. Removing spurious dependencies: harder and “dangerous” because removing good features is more harmful than keeping bad ones.
3. Orienting dependencies: hardest.
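Steps 1 and 2 above can be illustrated with a toy example. This is a minimal sketch under assumptions that are not in the slides: linear Gaussian data, a known third variable C that drives both A and B, and partial correlation as the tool for removing the spurious A – B dependency. Orientation (step 3) is not addressed here.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly removing the influence of z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical data: C is a common cause of A and B; A and B do not
# influence each other.
rng = random.Random(0)
n = 5000
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

marginal = corr(a, b)               # step 1: a dependency is detected
conditional = partial_corr(a, b, c)  # step 2: it disappears given C
print(f"corr(A, B)     = {marginal:.3f}")
print(f"corr(A, B | C) = {conditional:.3f}")
```

The marginal dependency is real but not direct; conditioning on C shows that the A – B edge can be pruned in this toy setting.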
Cause-effect pair challenge (2013)
Initial impulse: Joris Mooij, Dominik Janzing, and Bernhard Schölkopf.
Examples of algorithms and data: Povilas Daniušis, Arthur Gretton, Patrik O. Hoyer, Dominik Janzing, Antti Kerminen, Joris Mooij, Jonas Peters, Bernhard Schölkopf, Shohei Shimizu, Oliver Stegle, Kun Zhang, and Jakob Zscheischler.
Datasets and result analysis: Isabelle Guyon + Mehreen Saeed + {Mikael Henaff, Sisi Ma, and Alexander Statnikov}, from NYU.
Website and sample code: Isabelle Guyon +
Phase 1: Ben Hamner (Kaggle) https://www.kaggle.com/c/cause-effect-pairs
Phase 2: Ivan Judson, Christophe Poulain, Evelyne Viegas, Michael Zyskowski https://www.codalab.org/competitions/1381
Review, testing: Marc Boullé, Hugo Jair Escalante, Frederick Eberhardt, Seth Flaxman, Patrik Hoyer, Dominik Janzing, Richard Kennaway, Vincent Lemaire, Joris Mooij, Jonas Peters, Florin Popescu, Peter Spirtes, Ioannis Tsamardinos, Jianxin Yin, Kun Zhang.
[Photos: Mehreen, Evelyne, Joris, Dominik, Bernhard, Kun, Ben, Alexander, Marc]
Problem setting
[Figure: variables A to J, shown as INPUT and as OUTPUT; for a pair of variables the question is A -> B ? (0 / 1)]
A -> B ?
[Figure: scatter plot with A = log(Altitude) and B = Temperature.]
Best fit: A -> B
[Figure: two panels, one per candidate direction between A and B.]
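One textbook way to compare the two directions is to fit a regression each way and check in which direction the residuals look independent of the input. The sketch below is not the challenge's method; it assumes a linear relationship with non-Gaussian additive noise (with Gaussian noise the two directions cannot be told apart this way), and the dependence score is a deliberately crude stand-in for a proper independence test. The data are synthetic.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(x, y):
    """Residuals of the least-squares line predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


def dependence(x, r):
    """Crude score: |corr| between squared centered input and squared residual."""
    mx = sum(x) / len(x)
    return abs(corr([(a - mx) ** 2 for a in x], [e * e for e in r]))


def guess_direction(a, b):
    """Prefer the direction whose residuals look less dependent on the input."""
    score_ab = dependence(a, residuals(a, b))  # model A -> B
    score_ba = dependence(b, residuals(b, a))  # model B -> A
    label = "A -> B" if score_ab < score_ba else "B -> A"
    return label, score_ab, score_ba


# Synthetic pair generated as A -> B with uniform (non-Gaussian) noise.
rng = random.Random(1)
a = [rng.uniform(-1, 1) for _ in range(4000)]
b = [x + rng.uniform(-1, 1) for x in a]
direction, s_ab, s_ba = guess_direction(a, b)
print(direction, round(s_ab, 3), round(s_ba, 3))
```

In the causal direction the residual is just the noise term, so its score stays near zero; in the reverse direction the residual mixes cause and noise and the score is clearly larger.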
The data:
[Figure: diagrams of the pair types relating A, B, and a third variable Z, including A <- B and A | B; a split labeled 20% / 80%.]
Demographics: Sex – Height; Age – Wages; Country – Education; Latitude – Infant mortality
Ecology: City elevation – Temperature; Water level – Algal frequency; Elevation – Vegetation; Dist. to hydrology – Fire
Econometrics: Mileage – Car resell price; Num. rooms – House price; Trade price last day – Trade price
Medicine: Cancer vol. – Recurrence; Metastasis – Prognosis; Age – Blood pressure
Genomics (mRNA level): transcription factor – protein induced
Engineering: Car model year – Horsepower; Number of cylinders – MPG; Cache memory – Compute power; Roof area – Heating load; Cement used – Compressive strength
The results:
[Figure: chart of success rate (AUC on test data), axis from 0.5 to 0.85, comparing random, correlation, independence tests, baseline methods, and winners; 267 participants.]
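For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, which is why uninformative scores land near 0.5. The sketch below computes it by direct pairwise comparison; the labels and scores are made up for illustration and are not challenge data.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for the question "A -> B ?" (1 = yes, 0 = no).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_auc = auc(scores, labels)
constant_auc = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(example_auc, constant_auc)
```

The quadratic loop is fine for a handful of pairs; a rank-based formula would be the usual choice for large test sets.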
Amazing: an operational causation coefficient!
Neural connectomics challenge (2014)
Coordinator: Isabelle Guyon
Data providers: Demian Battaglia, Javier Orlandi, Jordi Soriano Fradera, Olav Stetter
Advisors: Gavin Cawley, Gideon Dror, Hugo-Jair Escalante, Alice Guyon, Vincent Lemaire, Sisi Ma, Eric Peskin, Florin Popescu, Bisakha Ray, Mehreen Saeed, Alexander Statnikov
[Photos: Demian, Olav, Jordi, Javier, Bisakha, Mehreen]
Problem setting
[Figure: variables A to J, shown as INPUT and as OUTPUT.]
Network deconvolution
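The slide gives only the title, so the following is a sketch under an assumption: that it refers to the general network deconvolution idea in which the observed dependency matrix is modeled as the sum of direct effects and all indirect paths, G_obs = G_dir + G_dir^2 + G_dir^3 + …, so that G_dir = G_obs (I + G_obs)^-1. Published versions typically rescale the matrix and work through an eigendecomposition; here the inverse is simply expanded as an alternating series, which is valid only when the spectral radius of G_obs is below 1. The three-node chain is made up for illustration.

```python
def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def deconvolve(g_obs, terms=50):
    """Approximate G_dir = G_obs (I + G_obs)^-1 by the series
    G_obs - G_obs^2 + G_obs^3 - ... (needs spectral radius of G_obs < 1)."""
    n = len(g_obs)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in g_obs]
    sign = 1.0
    for _ in range(terms):
        for i in range(n):
            for j in range(n):
                total[i][j] += sign * power[i][j]
        power = matmul(power, g_obs)
        sign = -sign
    return total


# Hypothetical chain 0 -> 1 -> 2 with direct weights 0.5. The observed
# matrix also contains the indirect effect 0 -> 2 of strength 0.25.
g_obs = [
    [0.0, 0.5, 0.25],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]
g_dir = deconvolve(g_obs)
for row in g_dir:
    print([round(v, 6) for v in row])
```

In this toy case the indirect 0 -> 2 entry is removed and only the two direct links of the chain remain.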
Conclusion
• Causal models:
– Better explain data.
– Make decisions.
• Challenges:
– Fair evaluations.
– Innovation.
• Machine Learning:
– Novel approaches to causal discovery.
– Operational “causation coefficient”:
• First detect oriented pairs, then prune indirect effects and confounders.
• First build undirected graph, then orient edges.
Thank you!
AutoML Challenge
Fully automatic machine learning without ANY human intervention
automl.chalearn.org
December 2014 – January 2016
$30,000 in prizes