Nate’s playbook for Data Analysis
(a work in progress)
Data Binge Presentation, March 24th, 2017, Nate Powell
* Note, the terms frequentist and Bullshit may be used interchangeably during this talk… This is purely for my own amusement and does not represent the views of Data Binge or any of the labs involved…
** apologies in advance for over-use of the Tai Chi Symbol, it just kept seeming appropriate as I was putting this together…
What is Data Analysis?
Ok… that’s a long and… off topic question…
How about this:
Why do we need a Data Analysis playbook?
You:
Me:
“I was giving you a living Mike. Showing you the playbook I put together off my own beats.”
So, where to begin?
Rule 1:
Convince yourself first.
Then convince your advisor/lab.
Then find a way to convince everyone else (cynical read: reviewers).
Source: David Redish, circa 2009
Convince yourself first.
• In data analysis we’re seeking to answer questions. If you don’t believe the answer, keep working until you do, or accept that you don’t.
OOPS!!!!
Forgot one…
Rule 0:
UNDERSTAND THE QUESTION
Source: My Dad, when I was a kid
“Is it better to have all the answers, or some of the questions?”
Specific Example would be great here…
You have the question you WANT to ask.
You have the question you ARE asking.
And you want to get these two as close as possible.
[Figure: Overall MT-LRA Training Timeline. Phases: pre-training, task training, surgery, recovery/training, switch sequence. Durations shown: ~10 days, ~2-3 weeks, ~2.5-4 weeks, 6 days.]
[Figure: maze diagram. Labels: left reward, right reward, alt. reward, high cost choice point, low cost choice points, feeders.]
Rule 0:
Corollary: Check your assumptions
Source: Adam Johnson, circa 2014
In order for any one thing to be true, there are usually a whole lot of other steps that need to be true as well.
Not all of them are…
And some of them are worth checking.
Convince yourself first.
• This is license to PLAY WITH THE DATA. You might not actually know the right questions to begin with, but start asking some and you can often find the more interesting ones…
Plot the Data! LOOK at it
Humans are really good at finding structure in the noise…
(sometimes it isn’t really there…)
On the other hand, sometimes it is…
[Figure: Errors of various Kalman filters and human subject by phase delay of trajectory. X axis: Phase Delay; Y axis: Error. Series: CA Kalman Filter error, Sin Kalman Filter error, CV Kalman Filter error, Subject 1 (Day 1, Noise Level 3).]
And it can lead you to finding patterns that are critical.
Convince yourself first.
• Convince yourself any way you’re comfortable with… But be honest about how convinced you are.
Random interlude 1: Bayesians
I am a Bayesian.
Am I a Bayesian?
“A Bayesian does not assign probability 1 or probability 0 to any outcome” - Daeyeol Lee
Source: Daeyeol Lee, circa 2011
Convince yourself first.
• In the first stage, you get to decide what you need to see to be convinced. Is that p < .05? Is that 85% probability? Is that “I’ve been looking at every single example I can come up with and I keep seeing the same thing”?
• It’s up to you! … For now.
Random interlude 2: P-values
What does a P-value > .05 mean?
http://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/
• Why is .05 magical?
• Why do we have to choose in advance?
• Why can’t we use p-values to mean anything?
This is *fine*, but it’s overly complicated…
Rule 2:
Data that is tortured long enough will admit to anything…
Source: Ronald Coase? Darrell Huff? Unknown Statistician/Economist? by way of Matt Chafee
Why do we correct for multiple comparisons?? (Why is the dogma on doing so iffy?)
Keep this in mind when convincing yourself!!!!
https://fivethirtyeight.com/features/science-isnt-broken/#part1
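A rough sketch of the "tortured data" problem, not something from the slides: every simulated experiment below is pure noise, yet roughly 5% of them still come out with p < .05. The function names, the sample size of 30, the 1000 tests, and the choice of a Bonferroni correction are all illustrative assumptions.

```python
import math
import random
import statistics


def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_experiment(rng, n=30):
    """Draw n values from N(0, 1), so there is no real effect, and z-test the mean against 0."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = statistics.fmean(sample) * math.sqrt(n)  # standard deviation is known to be 1
    return two_sided_p_from_z(z)


def count_false_positives(n_tests, alpha, seed=0):
    """Run n_tests null experiments; count 'hits' with and without Bonferroni correction."""
    rng = random.Random(seed)
    p_values = [null_experiment(rng) for _ in range(n_tests)]
    uncorrected = sum(p < alpha for p in p_values)
    bonferroni = sum(p < alpha / n_tests for p in p_values)
    return uncorrected, bonferroni


uncorrected, bonferroni = count_false_positives(n_tests=1000, alpha=0.05)
print(f"'significant' with no correction: {uncorrected} of 1000")
print(f"'significant' after Bonferroni:   {bonferroni} of 1000")
```

If you ask enough questions of the same noise, some of them will "work". The correction is one blunt way of charging yourself for the number of questions asked.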
Random interlude 3: Sampling Distributions!
This is a sampling distribution
This approach is VERY useful.
Basically it’s a histogram for the probability that a certain value will be generated…
(for a normally distributed random process centered on zero with standard deviation 1)
BUT, not all sampling distributions look like this.
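To make the histogram idea concrete, here is a minimal sketch that draws values from a normal process centered on zero with standard deviation 1 and bins them. The helper name `text_histogram`, the bin edges, and the 10,000 draws are assumptions made for illustration; the original figure is not reproduced here.

```python
import random


def text_histogram(samples, lo=-4.0, hi=4.0, n_bins=16, width=50):
    """Bin samples into n_bins equal bins on [lo, hi); return (counts, printable lines)."""
    counts = [0] * n_bins
    bin_width = (hi - lo) / n_bins
    for x in samples:
        if lo <= x < hi:
            idx = min(int((x - lo) / bin_width), n_bins - 1)  # guard against rounding at the edge
            counts[idx] += 1
    peak = max(counts) or 1
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + i * bin_width
        lines.append(f"{left_edge:+5.1f} | {'#' * round(width * c / peak)}")
    return counts, lines


rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
counts, lines = text_histogram(samples)
print("\n".join(lines))
```

The bars pile up around zero and thin out in the tails. Swap `rng.gauss` for another generator and the same code shows a differently shaped distribution.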
Random interlude 4: Why Frequentist stats are more common than Bayesian stats
What if your distribution ISN’T normal?
If it’s not normal enough, the calculations don’t apply anymore…
THEN, the math gets really hard.
Frequentist stats generally have analytical solutions… Bayesian stats frequently require numerical methods.
Numerical methods used to be really hard…
but now we have computers.
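As a hedged illustration of what "numerical methods" can mean in practice, the sketch below approximates a Bayesian posterior on a grid by brute force. The data (7 successes in 10 trials), the flat prior, and the function name `grid_posterior` are made up for this example. This toy case happens to have a closed-form answer (a Beta(8, 4) posterior with mean 2/3), which makes it a handy check on the numerics.

```python
def grid_posterior(successes, trials, n_grid=1001):
    """Grid-approximate the posterior over a success probability theta, flat prior."""
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    failures = trials - successes
    # Unnormalized posterior = likelihood x prior; the flat prior is a constant.
    unnorm = [t ** successes * (1 - t) ** failures for t in thetas]
    total = sum(unnorm)
    weights = [u / total for u in unnorm]
    return thetas, weights


thetas, weights = grid_posterior(successes=7, trials=10)
posterior_mean = sum(t * w for t, w in zip(thetas, weights))
prob_above_half = sum(w for t, w in zip(thetas, weights) if t > 0.5)
print(f"posterior mean of theta: {posterior_mean:.3f}")
print(f"P(theta > 0.5 | data):   {prob_above_half:.3f}")
```

The same loop works when the likelihood has no tidy formula, which is the situation where the analytical route gives out. Grids get expensive with many parameters, which is where sampling tools come in.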
Convince yourself first.
Meanwhile, back at the data…
[Figure: maze diagram. Labels: left reward, right reward, alt. reward, high cost choice point, low cost choice points, feeders.]
Then convince your advisor/lab
The first step was important because you DO NOT want to report any answers you aren’t sure of.
This step is important because:
A) you can get a second opinion on the data
B) you have to convince someone else independently
C) you have to convince someone WITHOUT them looking AT THE DATA DIRECTLY
This is a good time to simplify and generalize your analyses.
Individual examples
Group examples
Population average examples
(and scale things for presentation to a group)
Individual examples:
[Figure panels: “Sample Histogram for Session 17, k = 3” (x: Laps; y: Number of transitions identified); “Average Z-scored Transition probability by laps from Switch, MT” (x: Laps from Switch; y: Z-scored transition probability); “K-means fit for Session 17, 3 clusters” (x: Lap Number; y: Cluster).]
Group examples
Population average examples
Automated detection
[Figure: maze diagram. Labels: left reward, right reward, alt. reward, high cost choice point, low cost choice points, feeders.]
Then find a way to convince everyone else… especially the reviewers.
Now you have findings that you AND your lab believe in, it’s time to publish them…
This is where things get a little more polished, and a little less free-form…
And not coincidentally, where Frequentist stats become important again. (Whether I like them or not, they’re ubiquitous.)
Reviewer 1: Reviewer 2: Reviewer 3:
If the reviewer asks you to change your analysis, you can do it, because you KNOW your results are true (because you already convinced yourself and your lab).
Rule 3:
There are many ways to get the same result
Source: All the Physics classes, particular credit to David Hall, circa 2001, “50 ways to calculate the energy of assembly”
Neural Network models
Support Vector Machines
K-Means Analysis
Bayesian Methods
ANOVA
Generalized Linear Models
Random Forest Algorithm
K Nearest Neighbors
Changepoint Analysis
CFD test
ISI shuffle Bootstrap
Linear Regression
[Figure: K-means fit for Session 17, 3 clusters. X axis: Lap Number; Y axis: Cluster. Algorithm labels shown: K-means, EM, SOM.]
I used K-means to cluster my results, but I could have used other algorithms… and would have if reviewers asked.
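For readers who have not met K-means, here is a bare-bones sketch of the idea on a single number per lap. This is not the analysis code behind the figure: the data are synthetic (three made-up regimes across 50 laps), and the function name `kmeans_1d` and the deterministic starting centers are assumptions chosen to keep the example short and repeatable.

```python
import random


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on scalars. Returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = sum(members) / len(members)
    return centers, labels


rng = random.Random(17)
# Hypothetical per-lap measure: 50 laps, three regimes (made-up numbers).
true_means = [0.0] * 20 + [5.0] * 15 + [10.0] * 15
laps = [m + rng.gauss(0.0, 0.5) for m in true_means]
centers, labels = kmeans_1d(laps, k=3)
print("cluster by lap:", labels)
```

Plotting `labels` against lap number gives the same kind of staircase as the K-means fit figure. EM or a SOM would be other routes to a similar picture.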
[Figure: maze diagram. Labels: left reward, right reward, alt. reward, high cost choice point, low cost choice points, feeders.]
Not all algorithms are appropriate for that analysis though…
SVM
RNN
[Figure: maze diagram. Labels: left reward, right reward, alt. reward, high cost choice point, low cost choice points, feeders.]
Suppose you wanted to get from Squamish to UBC (for some reason…)
To start out with, you don’t have a LOT of options (unless you have a boat…)
There are some steps you have to take.
Then eventually, you have a few more options; you could take different routes, but there are certain key points you need to hit (again, unless you have a boat).
On the other hand, not all routes will work in all situations… some won’t take you where you want to go.
(or at least where you have to go…)
Rule 3: hammer corollary
Source: All the Physics classes, particular credit to David Hall, circa 2001, “50 ways to calculate the energy of assembly”
Some techniques are just not appropriate though, and care must be taken not to talk yourself into the wrong analysis because it’s trendy or it’s the only one you know.
UNDERSTAND THE QUESTION!
Sub-corollary: do it quick vs. do it right.
You want to test a number of analyses; you can test more if you implement them QUICKLY with the tools you have!
Unless you take the time and put in the effort to get the analysis RIGHT, you’ll never really know if they work!
Rule 4:
No model or analysis is worthwhile until it’s shown to be ABLE TO FAIL
Source: Duane Nykamp, circa 2012
This was one of the most depressing, but most important plots I’ve ever seen.
[Figure: maze diagram. Labels: left reward, right reward, alt. reward, high cost choice point, low cost choice points, feeders.]
Rule 5:
When in doubt, compare your data to random data*
* or whatever the appropriate comparison is
Source: Redish Lab, Schrater Lab, circa 2010-
This distribution represents a random process generating output from a certain function (here a simple measurement).
We can use randomly generated numbers as input to much more complicated functions to generate sampling distributions… and then look where our data falls in the resulting distribution.
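Here is one way that recipe could look in code, as a sketch under stated assumptions rather than the labs' actual method. The "complicated function" is just the longest streak of correct laps, the random comparison is a shuffle of the same laps, and both sessions are made-up data. The names `longest_run`, `shuffle_null`, and `upper_tail_p` are invented for this example.

```python
import random


def longest_run(seq, value=1):
    """Length of the longest consecutive run of `value` in seq."""
    best = current = 0
    for x in seq:
        current = current + 1 if x == value else 0
        best = max(best, current)
    return best


def shuffle_null(seq, statistic, n_shuffles=2000, seed=0):
    """Sampling distribution of `statistic` when lap order is randomized."""
    rng = random.Random(seed)
    work = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(work)  # keeps the number of correct laps, destroys their order
        null.append(statistic(work))
    return null


def upper_tail_p(observed_stat, null):
    """Fraction of null values at least as large as the observed one (+1 correction)."""
    return (sum(s >= observed_stat for s in null) + 1) / (len(null) + 1)


# Hypothetical sessions: 50 laps each, 1 = correct, 0 = error (made-up data).
streaky = [1, 0] * 10 + [1] * 15 + [0, 1, 0] * 5  # contains a 15-lap streak
control = [1, 0, 1, 1, 0] * 10                    # same 30 correct laps, no streak

p_streaky = upper_tail_p(longest_run(streaky), shuffle_null(streaky, longest_run))
p_control = upper_tail_p(longest_run(control), shuffle_null(control, longest_run))
print(f"streaky session: longest run {longest_run(streaky)}, p = {p_streaky:.4f}")
print(f"control session: longest run {longest_run(control)}, p = {p_control:.4f}")
```

The control session is there in the spirit of Rule 4: the same test that flags the streaky session comes back empty on data with no streak, so it is at least able to fail. Whether a shuffle is the right comparison depends on the question being asked.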
The default common assumption is random, but that’s not always the correct comparison… In order to really know what the ideal comparison is, you have to…
UNDERSTAND THE QUESTION!
Some thoughts on plots:
Plot as much original data as possible…
When in doubt, show more data and more distributions
Why plot this:
when you could plot this:
When analyses get complicated, demonstration figures are helpful
[Figure, panels A-F with maze diagram inset. Panel titles: “All Sessions, Lap-Lap Correlation, aligned to switch” (axes: Laps from switch); “All Cells, Lap-Lap Correlation, Day 17” (axes: Lap Number); “Cumulative Sum Correct Laps, Session 17” (x: Lap Number; y: Cumulative Sum Correct); “K-means fit for Session 17, 3 clusters” (x: Lap Number; y: Cluster); “K-means change points for Session 17, 3 clusters” (x: Lap Number; y: Cumulative Sum Cluster); “Transition Probability for Session 17, 3 clusters, 1 iteration” (x: Lap Number; y: Transition Probability); “Sample Histogram for Session 17, 3 clusters” (x: Lap Number; y: Number of transitions identified).]
LABEL YOUR FRIGGIN FIGURES!
I know this is often a function of journals who have stupid limits, but seriously… Almost anything that makes your paper easier to read is worth doing.
And there is a special hell for people who don’t label figure axes…
Remember that colorblindness exists.
Source: David Redish, Andrew Wikenheiser, circa 2010
Red vs. Green is a great contrast if you have normal color vision
Red vs. Blue is much better if you have color-blindness
Ok, go Eat Pizza… and do good science.
Other things I could talk about in the future if people are interested…
• Bayesian Analysis (using JAGS?)
• Cluster sorting using Mclust
• Decoding Analyses?
Thanks to all the Joey Knishes, past and future.