
Counterfactual Fairness
Matt J. Kusner*,1,2, Joshua R. Loftus*,1,3, Chris Russell*,1,4, Ricardo Silva1,5
*Equal Contribution   1The Alan Turing Institute   2University of Warwick   3New York University   4University of Surrey   5University College London

features X are biased

ML is making Life-Changing Decisions
Causal Inference
Results: US law schools

    Law School Admissions

Counterfactuals
1. Change race attribute

Trump's Proposed Extreme Vetting Software [Biddle, 2017]

    Counterfactual Fairness

structural causal models: deterministic and non-deterministic

    [Pearl et al., 2016]

    how can we make fair predictions?

    Given an individual:

[Figure: the example individual (male, black), with the race attribute changed from black to white]

    2. Compute unobserved variables in causal model

    3. Recompute observed variables in causal model

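These three steps are the standard abduction-action-prediction recipe for counterfactuals in a structural causal model [Pearl et al., 2016]. A minimal sketch in Python, assuming a toy linear-Gaussian model in which a latent variable U and the protected attribute A jointly generate a single observed feature X; the variable names and coefficients are illustrative, not taken from the poster:

```python
import numpy as np

# Toy structural causal model (illustrative, not the poster's fitted model):
#   U ~ N(0, 1)                                          latent background variable
#   X = A_COEF * A + U_COEF * U + e,  e ~ N(0, SIGMA^2)  observed feature, a descendant of A
A_COEF, U_COEF, SIGMA = 2.0, 1.5, 0.5

def abduct_u(x, a, n_samples=10_000, rng=None):
    """Step 2: infer the unobserved U from the evidence (X = x, A = a).
    For this linear-Gaussian model the posterior is Gaussian in closed form."""
    rng = rng or np.random.default_rng(0)
    post_var = 1.0 / (1.0 + U_COEF ** 2 / SIGMA ** 2)          # prior variance of U is 1
    post_mean = post_var * U_COEF * (x - A_COEF * a) / SIGMA ** 2
    return rng.normal(post_mean, np.sqrt(post_var), size=n_samples)

def counterfactual_x(x, a, a_prime, n_samples=10_000):
    """Steps 1 and 3: set A <- a_prime and recompute X with the SAME inferred latents."""
    u = abduct_u(x, a, n_samples)
    e = x - A_COEF * a - U_COEF * u            # structural noise implied by each U sample
    return A_COEF * a_prime + U_COEF * u + e   # recompute the descendant of A

# Example: an individual observed with A = 1, X = 3.2; what would X have been had A = 0?
x_cf = counterfactual_x(x=3.2, a=1, a_prime=0)
print(x_cf.mean())   # equals 3.2 + A_COEF * (0 - 1) in this additive model
```

Because this toy model is additive in its noise, the counterfactual feature collapses to a deterministic shift; with non-additive or genuinely non-deterministic structural equations the abduction step returns a full posterior and the counterfactual remains a distribution.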

Definition. A predictor Ŷ is counterfactually fair if, given observations X = x and A = a,

P(\hat{Y}_{A \leftarrow a} = y \mid X = x, A = a) = P(\hat{Y}_{A \leftarrow a'} = y \mid X = x, A = a)

for all y and all a' ≠ a.
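Operationally, the definition says that once the background variables are inferred from the factual evidence (X = x, A = a), switching A to a' must not change the distribution of Ŷ. A minimal sketch of such a check under the same kind of toy linear-Gaussian model as above; the predictor, coefficients, and the mean-difference summary are illustrative assumptions:

```python
import numpy as np

A_COEF, U_COEF, SIGMA = 2.0, 1.5, 0.5   # toy SCM: X = A_COEF*A + U_COEF*U + e

def counterfactual_fairness_gap(predict, x, a, a_prime, n=10_000, seed=0):
    """Compare the predictor's output for the same individual under A <- a vs. A <- a'.
    Counterfactual fairness requires equality in distribution; the mean difference
    reported here is only a simple summary (a two-sample test could be used instead)."""
    rng = np.random.default_rng(seed)
    post_var = 1.0 / (1.0 + U_COEF ** 2 / SIGMA ** 2)
    post_mean = post_var * U_COEF * (x - A_COEF * a) / SIGMA ** 2
    u = rng.normal(post_mean, np.sqrt(post_var), size=n)   # abduction of U
    e = x - A_COEF * a - U_COEF * u                        # implied structural noise
    x_factual = A_COEF * a + U_COEF * u + e                # reproduces x
    x_counter = A_COEF * a_prime + U_COEF * u + e          # same individual, A <- a'
    y_factual = predict(x_factual, np.full(n, a))
    y_counter = predict(x_counter, np.full(n, a_prime))
    return abs(y_factual.mean() - y_counter.mean())

# An "unaware" predictor that uses only X is not counterfactually fair here,
# because X is a descendant of A:
unaware = lambda x_feat, a_attr: 0.8 * x_feat
print(counterfactual_fairness_gap(unaware, x=3.2, a=1, a_prime=0))   # clearly > 0

# A predictor built only on the inferred U (a non-descendant of A) would give a gap near 0.
```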

    Prior Fairness Definitions Are Insufficient

Goal
Data

Learning Fair Predictors: Algorithm
Counterfactuals alter the descendants of A, so Ŷ may only be built from variables that are not descendants of A. Three levels of assumptions trade predictive error against modelling strength: (1) use only observed features that are not descendants of A; (2) additionally postulate causal models with latent background variables (non-deterministic); (3) postulate full counterfactual, deterministic models. Predictive error decreases as the causal assumptions grow stronger.
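A minimal sketch of level (2) above: posit a causal model with a latent background variable, infer that latent for each individual from (X, A), and train Ŷ only on the inferred latent (plus any observed non-descendants of A). The linear-Gaussian model, the closed-form posterior, and the use of scikit-learn's LinearRegression are all illustrative assumptions, not the exact procedure behind the poster's experiments:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5_000

# Simulate data from an assumed SCM (illustrative coefficients):
a = rng.integers(0, 2, size=n).astype(float)      # protected attribute
u = rng.normal(size=n)                            # latent background variable
x = 2.0 * a + 1.5 * u + rng.normal(0, 0.5, n)     # observed feature, a descendant of A
y = 1.0 * u + rng.normal(0, 0.5, n)               # outcome driven by U

# 1) Infer U for each individual from (X, A).  The posterior mean is closed form for
#    this linear-Gaussian model; in general one would use MCMC or variational inference.
post_var = 1.0 / (1.0 + 1.5 ** 2 / 0.5 ** 2)
u_hat = post_var * 1.5 * (x - 2.0 * a) / 0.5 ** 2

# 2) Train the predictor ONLY on non-descendants of A (here, the inferred latent).
fair_model = LinearRegression().fit(u_hat.reshape(-1, 1), y)

# Contrast: an "unaware" model trained on X, which is a descendant of A.
unaware_model = LinearRegression().fit(x.reshape(-1, 1), y)

# Check: flip A, recompute X with the same latents, re-infer U, compare predictions.
a_cf = 1.0 - a
x_cf = x + 2.0 * (a_cf - a)                                   # counterfactual feature
u_hat_cf = post_var * 1.5 * (x_cf - 2.0 * a_cf) / 0.5 ** 2    # identical to u_hat

print(np.abs(fair_model.predict(u_hat_cf.reshape(-1, 1))
             - fair_model.predict(u_hat.reshape(-1, 1))).max())    # ~0
print(np.abs(unaware_model.predict(x_cf.reshape(-1, 1))
             - unaware_model.predict(x.reshape(-1, 1))).max())     # clearly > 0
```

In this simulation the outcome depends only on U, so little accuracy is lost; on real data, how much of X must be discarded as a descendant of A is exactly where the predictive-error trade-off above appears.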

    Results: NYC stop-and-frisk

[Figure: Arrest rate (data); Arrest if White (counterfactual); Arrest if Black/Hispanic (counterfactual)]

    We Have Big Problems

Our Solution
A fair classifier gives the same prediction it would have given had the person been of a different race or sex.

We present a metric to check if any algorithm is fair, and an algorithm to learn fair classifiers.

Prior Fairness Methods
Fairness through Unawareness [Grgić-Hlača et al., 2016]: Ŷ : X → Y
Equality of Opportunity [Hardt et al., 2016]: Ŷ : X, A → Y
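Concretely, fairness through unawareness simply omits A from the model's inputs, while equality of opportunity allows A but requires equal true-positive rates across groups, i.e. P(Ŷ = 1 | Y = 1, A = a) equal for every a. A small sketch of the latter check on toy data; the function name and the max-minus-min gap summary are illustrative:

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, a):
    """Equality of opportunity [Hardt et al., 2016]: equal true-positive rates across
    groups, i.e. P(Y_hat = 1 | Y = 1, A = a) the same for every a.  Returns the gap
    between the largest and smallest group-wise TPR (0 means the criterion holds)."""
    y_true, y_pred, a = map(np.asarray, (y_true, y_pred, a))
    tprs = [y_pred[(y_true == 1) & (a == g)].mean() for g in np.unique(a)]
    return max(tprs) - min(tprs)

# Toy example: among the true positives, group 0 is recovered at 50% and group 1 at 100%.
print(equal_opportunity_gap(y_true=[1, 1, 1, 1, 0, 0],
                            y_pred=[1, 0, 1, 1, 0, 1],
                            a=[0, 0, 1, 1, 0, 1]))   # -> 0.5
```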

Unfair Influences
lack of minority race teachers affects minority student outcomes [Birdsall et al., 2016]
class placement may be different [Howard, 2003]
teachers may believe students have behavior issues [Rowley et al., 2014]

Fairness through Unawareness [Grgić-Hlača et al., 2016] removes the sensitive attributes, but the remaining features X are still biased (see the influences above).
Equality of Opportunity [Hardt et al., 2016] equalizes true-positive rates, but the label Y itself is biased: 35% of black students vs. 50% of white students.

Related Work
P(\hat{Y} = y \mid do(A = a), X = x) = P(\hat{Y} = y \mid do(A = a'), X = x)   [Kilbertus et al., 2017]

[Kilbertus et al., 2017] compares different individuals with the same observed features.
Counterfactual fairness compares the same individual with a counterfactual version of him/herself.
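The difference is in which background variables each criterion averages over: the do-based criterion looks at the population of individuals who end up with X = x under the intervention, while counterfactual fairness first infers U from this individual's own factual evidence (X = x, A = a). A small sketch under the same toy linear-Gaussian model as in the earlier snippets; the coefficients and the matching tolerance are illustrative:

```python
import numpy as np

# Toy SCM as in the earlier sketches:  U ~ N(0,1),  X = 2*A + 1.5*U + e,  e ~ N(0, 0.5^2)
A_COEF, U_COEF, SIGMA = 2.0, 1.5, 0.5
x_obs, a_obs, a_new = 3.2, 1, 0
rng = np.random.default_rng(0)

# Do-based view [Kilbertus et al., 2017]: other individuals who, under do(A = a_new),
# happen to end up with the same observed feature X close to x_obs.
n = 1_000_000
u_pop = rng.normal(size=n)
x_do = A_COEF * a_new + U_COEF * u_pop + rng.normal(0, SIGMA, n)
u_interventional = u_pop[np.abs(x_do - x_obs) < 0.05]     # different people, same X

# Counterfactual view (this poster): the SAME individual, whose U is inferred from the
# factual evidence (X = x_obs, A = a_obs) before setting A <- a_new.
post_var = 1.0 / (1.0 + U_COEF ** 2 / SIGMA ** 2)
post_mean = post_var * U_COEF * (x_obs - A_COEF * a_obs) / SIGMA ** 2
u_counterfactual = rng.normal(post_mean, np.sqrt(post_var), size=10_000)

# The two criteria condition on different background variables, so they can disagree:
print(u_interventional.mean())    # about 1.9: people whose X = 3.2 arose without A's contribution
print(u_counterfactual.mean())    # about 0.7: this individual, whose X = 3.2 came partly from A = 1
```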

[Figure: Results on US law schools. Densities of the predicted first-year average (pred_zfya) on the original data vs. the counterfactual with the sensitive attribute swapped (female ↔ male, black ↔ white, asian ↔ white, mexican ↔ white), for the Full and Unaware predictors Ŷ.]