26
Fairness in Machine Learning: Practicum Privacy & Fairness in Data Science CS848 Fall 2019

Fairness in Machine Learning: Practicumxihe/cs848/slides/04-module2... · 2019-10-01 · Supervised Learning – Brief Aside • Logistic regression is used to predict the probabilities

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

FairnessinMachineLearning:Practicum

Privacy&FairnessinDataScienceCS848Fall2019

HumanDecisionMaking

Data

JanelikesBollywoodmusicals.

DecisionMaker

Bob

Decision

Bob:“YoushouldwatchLesMiserables,it’salsoamusical!”

Jane:“Nicetry,Bob,butyouclearlydon’tunderstandhowtogeneralizefromyourpriorexperience.”

Supposewewanttorecommendamovie.

HumanDecisionMaking

Data

Janeisawoman.

DecisionMaker

Bob

Decision

Orevenworse:

Bob:“Ibetyou’dlikeoneofthesedumbwomen’smovies.”

Jane:“ActuallyBob,that’sasexistrecommendationthatdoesn’treflectwellonyouasapersonoryourunderstandingofcinema.”

Whatifweusemachinelearningalgorithmsinstead?Theywillgeneralizewellandbelessbiased,right?

AlgorithmicDecisionMaking

Data

Netflixdatabase,Jane’swatchhistory

DecisionMaker Decision

“Ablackbox collaborativefilteringalgorithmsuggestsyouwouldlikethismovie.”

Jane:“WowNetflix,thatwasagreatrecommendation,andyoudidn’tnegativelystereotypemeinordertogeneralizefromyourdata!”

Problemsolved!Right?

RecidivismPrediction

• InmanypartsoftheU.S.,whensomeoneisarrestedandaccusedofacrime,ajudgedecideswhethertograntbail.

• Inpractice,thisdecideswhetheradefendantgetstowaitfortheirtrialathomeorinjail.

• Judgesareallowedorevenencouragedtomakethisdecisionbasedonhowlikelyadefendantistore-commitcrimes,i.e.,recidivate.

RecidivismPrediction

Data

Criminalhistoryofdefendant(andothers)

DecisionMaker Decision

Highriskofrecommittingacrime.

Lowriskofrecommittingacrime.

Donotgrantbail.

Grantbail.

Machine BiasThere’ssoftwareusedacrossthecountrytopredictfuturecriminals.Andit’sbiasedagainst blacks.byJuliaAngwin,JeffLarson,SuryaMattu andLaurenKirchner,ProPublica,May23,2016

BernardParker,left,wasratedhighrisk;DylanFugett wasratedlowrisk.(JoshRitchieforProPublica)

PracticumActivity.

• TheProPublicateamstudiedaproprietaryalgorithm(COMPAS)andfoundthatitdiscriminatedagainstAfricanAmericans.

• Inthisactivity,you willtakeontheroleofthereportersanddataanalystslookingfordiscriminationofmorestandardmachinelearningalgorithms(SVMandLogisticRegression).

SupervisedLearning– BriefAside

• Insupervisedlearning,wewanttomakepredictionsofsometargetvalue.• Wearegiventrainingdata,amatrixwhereeveryrowrepresentsadatapointandeverycolumnisafeature,alongwiththetruetargetvalueforeverydatapoint.• Whatwe“learn”isafunctionfromthefeaturespacetothepredictiontarget.E.g.,iftherearemfeatures,thefeaturespacemightbeℝ",inwhichcaseabinaryclassifier isafunction

𝑓:ℝ" → 0, 1 .

SupervisedLearning– BriefAside

• Supportvectormachinesandlogisticregressionaredifferentalgorithmsforgeneratingsuchclassifiers,giventrainingdata.• Asupportvectormachine (withalinearkernel)justlearnsalinearfunctionofthefeaturevariables.• Inotherwords,itdefinesahyperplane inthefeaturespace,mappingpointsononesideto0andtheothersideto1.Itchoosesthehyperplanethatminimizesthehingeloss:max(0,distancetohyperplane).• Visually:

https://en.wikipedia.org/wiki/Support_vector_machine

SupervisedLearning– BriefAside

• Logisticregressionisusedtopredicttheprobabilitiesofbinaryoutcomes.Wecanconvertittoaclassifierbychoosingthemorelikelyoutcome,forexample.• Let�⃗� betheindependentvariablesforanindividualforwhomthetargetvalueis1withprobabilityp(�⃗�).

• Logisticregressionassumeslog . /⃗01.(/⃗)

isalinearfunctionof�⃗�,andthencomputesthebestlinearfunctionusingmaximumlikelihoodestimation.

PracticumActivity.

Breakintogroupsof3.Downloadtheactivityfromthewebsite(it’saJupyter notebook).Thinkcreativelyandhavefun!

DebriefPracticumActivity.

• Whatargumentsdidyoufindforthealgorithm(s)beingraciallybiased/unfair?• Whatargumentsdidyoufindforthealgorithm(s)notbeingraciallybiased/unfair?• Isoneofthealgorithmsmoreunfairthantheother?Why?Howwouldyousummarizethedifferencebetweenthealgorithms?• Cananalgorithmsimultaneouslyachievehighaccuracyandbefairandunbiasedonthisdataset?Whyorwhynot,andwithwhatmeasuresofbiasorfairness?

DebriefPracticumActivity–ConfusionMatrix.• Acommontoolforanalyzingbinarypredictionistheconfusionmatrix.

DebriefPracticumActivity–ConfusionMatrix.

ActualClass

P

N

PredictedClass

P N

TP

FP

FN

TN

DebriefPracticumActivity– FalsePositiveRate• Thefalsepositiverateismeasuredas

𝑭𝑷𝑹 = 𝑭𝑷𝑭𝑷8𝑻𝑵

Inotherwords:What%ofpeopledidwepredictwouldrecommitacrime,althoughinactualitytheywon’t?(perfectclassifiergets0)

Race0 Race 1

SVM 0.137 0.094

LR 0.214 0.136

DebriefPracticumActivity– FalsePositiveRate• Thefalsepositiverateforrace0isroughly1.45timeshigherthanforrace1usingSVM,and1.57usingLR!• Buttheyhadthesameaccuracy;howisthispossible?

• Ourclassifierstendtomakemorefalsepositivemistakesforrace0,andmorefalsenegativemistakesforrace1.

• The“accuracy”ofthemechanismisindifferenttothisdifference,butthedefendantssurelyarenot!

• Giventhatyouwillnotrecidivateandareoftheprotectedrace,thealgorithmlooksunfair.Logisticregression(whichwasslightlymoreaccurateoverall)seemsslightlyworse.

DebriefPracticumActivity–PositivePredictiveValue• Thepositivepredictivevalueismeasuredas

𝑷𝑷𝑽 = 𝑻𝑷𝑻𝑷8𝑭𝑷

Inotherwords:What%ofthepeoplewepredictedwouldrecidivatereallydorecidivate?(perfectclassifiergets1)

Race0 Race1

SVM 0.753 0.686

LR 0.725 0.658

DebriefPracticumActivity–PositivePredictiveValue• Bythismeasure,thealgorithmsdifferonthetworacialgroupsbynomorethanafactorof1.1,andseemroughlyfair.Ifanything,therateisbetterfortheprotectedgroup!• Whydoesn’tthiscontradictthefalsepositivefinding?

• Supposeforrace0wehave100individuals,50ofwhomrecidivate.• Supposeforrace1wehave100individuals,20ofwhomrecidivate.• Supposewemakeexactly5falsepositivesforeachracialgroup,andgeteverythingelsecorrect.Then:

• FPR0 =5/50=0.1whereasFPR1 =5/80=0.0625• ButPPV0 =50/55=0.909whereasPPR1 =20/25=0.8

• Given thatthealgorithmpredictsyouwillrecidivate,itlooksroughlyfair,logisticregressionmaybemoreso.

DebriefPracticumActivity–DisparateImpact.• Let𝑃𝑟𝑜𝑏@ bethefractionofracialgroup0thatwepredictedwouldrecidivate(similarlyfor 𝑃𝑟𝑜𝑏0).• Thedisparateimpactismeasuredas

𝑫𝑰 = 𝑷𝒓𝒐𝒃𝟎𝑷𝒓𝒐𝒃𝟏

Inotherwords:Howmuchmore(orless)likelywerewetopredictthatanindividualofracialgroup0wouldrecidivatevs.racialgroup0?(Notethattheperfectclassifiermaynotget1!)

DebriefPracticumActivity–DisparateImpact.

• SothedisparateimpactofSVMis1.56,andforLRis1.65.• Given thatyouareamemberofracialgroup0,thealgorithmlooksunfair,moresoforLR.• Notethatyoucan’t getasmalldisparateimpactand ahighaccuracy,ingeneral.Thismeasureisparticularlyusefulifyouthinkthedata itselfarebiased.

Race0 Race1

SVM 0.284 0.182

LR 0.400 0.242

DebriefPracticumActivity–Conclusion.• Machinelearningalgorithmsareoftenblackboxoptimizationswithoutobviousinterpretationsexpost.• Thereisnosingleperspectiveonfairness.Whatlooksfairconditionedonsomethingsmaylookdifferentconditionedonotherthings.• Nexttime,wewilldiveintothesetopicsinmoredepth,focusingespeciallyondisparateimpact.

ResearchProject!!!

• Choosingproject(beforenextclass,Oct1noon)• Formteamof1-3• Chooseatopic• Uploadapdf(1shortparagraphoftopicandteammembers)