Societal consequences
CS 446
A few vignettes
1 / 34
A few vignettes
Machine learning is entering human-facing technologies more and more.
- “Big data” and large-scale machine learning allow for mass surveillance (and mass discrimination).
- The choices suggested by machine learning systems are often inscrutable and unreliable; the systems are hard to debug.
- Machine learning systems have obscure failure modes (adversarial examples!) which well-informed attackers can trigger.
How can anyone argue that a machine learning system actually implements some policy?
How can one argue it doesn’t covertly implement some other policy?
2 / 34
YouTube’s “Content ID”
(From Wikipedia.)
YouTube can label videos as violating copyright and take them down. This can remove a creator’s livelihood (demonetization).
3 / 34
China’s “social credit system”
(From Wikipedia.)
Who oversees this system? Which political agendas does it actually enforce?
4 / 34
Other examples
- What if everyone starts using deep-learning-based websites to find dates?
- What if someone tampers with the algorithms?
- What if all cars are self-driving, with deep-learning-based vision systems?
- What if someone tampers with the weight vectors so that, in rare situations, the car drives itself into oncoming traffic?
  Given the existence of adversarial examples, are such modifications detectable?
5 / 34
The power of nation-states: Juniper routers (part 1)
(From Wikipedia.)
6 / 34
The power of nation-states: Juniper routers (part 2)
(From Wikipedia.)
Once nation-states start tampering with machine learning systems, it’ll get even scarier.
7 / 34
Facial recognition and other biometrics?
(From ZDNet.)
8 / 34
Facial recognition
(From NYTimes.)
9 / 34
10 / 34
Lecture overview
- Today we’ll discuss two fields of study related to societal consequences: fairness and privacy. Here are some references:
- Upcoming fairness textbook: “Fairness and Machine Learning”; Solon Barocas, Moritz Hardt, Arvind Narayanan; https://fairmlbook.org/.
- Privacy tutorial: “Differentially Private Machine Learning: Theory, Algorithms, and Applications”; Kamalika Chaudhuri, Anand D. Sarwate; https://www.ece.rutgers.edu/~asarwate/nips2017/.
- These fields are still young, and most deployed ML is not adjusted to take them into account.
11 / 34
Privacy
12 / 34
Privacy
Suppose each example in a data set D = (z_i)_{i=1}^n corresponds to an individual from the population.
- Can the output of a learning algorithm, run on data set D, reveal information about an individual?
- Example: average salary computation.
  (Recall, the average is the minimizer of the squared-loss risk . . . )
  Suppose an attacker knows the salary of all but one person in a company.
  Claim: by learning the average salary, the attacker learns everyone’s salary:
    average salary = (1/n) · (your salary + sum of everyone else’s salaries).
- Privacy was compromised by the release of the average salary:
  - Before the release of the average salary, the attacker did not know your salary.
  - After the release of the average salary, the attacker knows your salary.
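The attack on this slide can be sketched in a few lines (all salary figures are hypothetical):

```python
# Attacker knows n, the released average, and every salary except yours.
others = [50_000, 60_000, 55_000, 70_000]   # hypothetical known salaries
your_salary = 83_000                         # the secret
n = len(others) + 1

released_average = (sum(others) + your_salary) / n   # the company releases this

# Attack: invert the average to recover the single unknown salary.
recovered = released_average * n - sum(others)
print(recovered)   # exactly your salary
```

The point is that an exact aggregate over n people plus n − 1 known values pins down the remaining value.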
13 / 34
Another example
Privacy compromised by release of genome-wide association study (GWAS) results (Wang, Li, Wang, Tang, & Zhou, 2009).
[Figure: “Cancer group” and “Healthy group,” with published correlation statistics from the GWAS.]
- “Individuals can be identified from even a relatively small set of statistics, as those routinely published in GWAS papers.”
- Based on your genes, one can learn whether you participated in the study.
- One can also learn whether you were in the “disease” group or the “healthy” group.
14 / 34
Why this is different from ordinary inference
Scenario: An attacker knows your genetic information, but does not know if you have a certain disease.
1. Suppose the GWAS results allow the attacker to determine that you were in the “disease” group in the study.
   Is privacy compromised by the release of the GWAS results?
2. Suppose the GWAS results suggest that people (like you) with a certain genetic marker are likely to have the disease.
   Is privacy compromised by the release of the GWAS results?
15 / 34
What we can hope to achieve: differential privacy
What we would like:
Given the output of the learning algorithm, the attacker should not be able to distinguish between:
- Case 1: the training data contains your data (x_i, y_i).
- Case 2: the training data does not contain your data (x_i, y_i).
- Whether or not you are included in the training data does not change what the attacker can learn about you.
- Any secrets you have (that couldn’t have been learned otherwise) are not compromised by the learning algorithm.
Learning algorithms with this property are said to guarantee differential privacy (Dwork, McSherry, Nissim, & Smith, 2006).
16 / 34
Differential privacy
Formal definition: A randomized algorithm A guarantees ε-differential privacy if, for all data sets D and D′ that differ in just one individual’s data point,
  P(A(D) = t) ∈ (1 ± ε) · P(A(D′) = t),  for all possible outputs t.
17 / 34
Example: differentially private average of salaries
Average salary
- Goal: approximately compute the average salary.
- The sample average is not differentially private, as before: it can be used to recover individual salaries.
- Our algorithm: given a data set D = (s_i)_{i=1}^n of salaries, output
    A(D) = (1/n) ∑_{i=1}^n s_i + Z,
  where
    Z ∼ p_Z(z) = (1/(2σ)) exp(−|z|/σ),  σ ≈ (max salary)/(εn).
  (“max salary” is assumed to be publicly known.)
[Figure: the Laplace density p_Z(z) = exp(−|z|)/2.]
18 / 34
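A short sketch of this mechanism (the function name and salary figures are my own; only the noise scale σ = max salary/(εn) comes from the slide):

```python
import random

def dp_average(salaries, epsilon, max_salary):
    """Release the average salary with Laplace noise of scale
    sigma = max_salary / (epsilon * n), as on this slide."""
    n = len(salaries)
    sigma = max_salary / (epsilon * n)
    # A Laplace(0, sigma) draw: a random sign times an Exponential with mean sigma.
    z = random.choice([-1, 1]) * random.expovariate(1 / sigma)
    return sum(salaries) / n + z

salaries = [50_000, 60_000, 55_000, 70_000, 83_000]   # true average: 63,600
print(dp_average(salaries, epsilon=0.5, max_salary=100_000))
```

Each call returns the true average plus fresh noise, so repeated queries give different answers; the attack from the earlier slide no longer pins down any one salary exactly.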
Why this guarantees differential privacy
Consider data sets of salaries D and D′ that differ only in one entry.
- Our algorithm: A(D) = avg(D) + Z, where
    Z ∼ p_Z(z) = (1/(2σ)) exp(−|z|/σ),  σ ≈ (max salary)/(εn).
- Need to show that the output densities of A on input D and on input D′ are within 1 ± ε of each other.
- Since D and D′ differ in one entry, say, s_i and s′_i:
    A(D′) = avg(D) + (s′_i − s_i)/n + Z.
- Ratio of probability densities:
    p_{A(D)}(t) / p_{A(D′)}(t)
      = p_Z(t − avg(D)) / p_Z(t − avg(D) − (s′_i − s_i)/n)
      = exp( −(1/σ) ( |t − avg(D)| − |t − avg(D) − (s′_i − s_i)/n| ) )
      ∈ exp( ±(1/σ) · (max salary)/n )
      ≈ 1 ± ε.
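The density-ratio bound can be checked numerically (the particular n, max salary, ε, and test points below are my own):

```python
import math

def laplace_pdf(z, sigma):
    """The density p_Z(z) = exp(-|z|/sigma) / (2*sigma) from the slide."""
    return math.exp(-abs(z) / sigma) / (2 * sigma)

n, max_salary, eps = 100, 100_000, 0.1
sigma = max_salary / (eps * n)       # noise scale from the slide

# Changing one salary moves the average by at most max_salary / n.
shift = max_salary / n
for t in [-50_000.0, -1_000.0, 0.0, 123.4, 40_000.0]:   # t plays the role of t - avg(D)
    ratio = laplace_pdf(t, sigma) / laplace_pdf(t - shift, sigma)
    # The log-ratio stays within +/- eps (tiny slack for floating point).
    assert abs(math.log(ratio)) <= eps + 1e-9
print("density ratios within exp(±eps)")
```

This mirrors the proof: the absolute values in the exponent differ by at most the shift, so the log-ratio is at most shift/σ = ε.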
19 / 34
How good is this?
For statistical analysis: often the data set itself is just a random sample, and we care more about the broader population.
- The average on the sample differs from the population mean by Θ(n^{−1/2}) anyway!
- Expected magnitude of the noise Z: σ ≈ (max salary)/(εn).
Overall:
  |A(D) − population mean| ≤ O( (salary stddev)/√n + (max salary)/(εn) ).
Accuracy improves with n, even under the constraint of ε-differential privacy.
- In particular, when the max salary is comparable to the salary stddev, ε ≈ 1/√n is almost “free”.
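The two error terms from the bound can be tabulated for a few values of n (all the constants below are hypothetical):

```python
# Two error terms from this slide: sampling error and privacy noise.
salary_stddev, max_salary, eps = 20_000, 200_000, 0.5

for n in [100, 1_000, 10_000, 100_000]:
    sampling_error = salary_stddev / n ** 0.5   # Theta(n^{-1/2}) sampling term
    privacy_error = max_salary / (eps * n)      # expected noise magnitude
    print(n, round(sampling_error), round(privacy_error))
# The privacy term decays like 1/n, so for large n the sampling error dominates.
```

This is why the slide calls the privacy constraint almost “free”: the O(1/n) noise term is eventually swamped by the O(1/√n) statistical error.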
20 / 34
Recap: privacy
- Privacy can be compromised when sensitive data is used in data analysis / machine learning.
- Differential privacy is one formal definition of privacy that can be rigorously guaranteed and reasoned about.
- There are many techniques for creating “differentially private” versions of statistical methods (including machine learning algorithms).
21 / 34
Fairness
22 / 34
Fairness
What is fairness?
- Equal treatment of people.
- Equal treatment of groups of people.
- . . .
What is unfair?
- Unequal treatment (of people or groups of people) based on attributes that society deems inappropriate (e.g., race, sex, age, religion).
- . . .
Highly domain- and application-specific.
Why is this of concern with machine learning?
- Systems built using machine learning are increasingly used to make decisions that affect people’s lives.
- More opportunity to misuse and confuse.
23 / 34
Example: predictive policing
Predictive policing: data-driven systems for allocating police resources based on predictions of where they are needed.
Financial Times article from August 22, 2014 by Gillian Tett:
After all, as the former CPD computer experts point out, the algorithms in themselves are neutral. “This program had absolutely nothing to do with race ... but multi-variable equations,” argues Goldstein. Meanwhile, the potential benefits of predictive policing are profound.
24 / 34
Unfairness with machine learning
Many reasons systems using machine learning can be unfair:
- People deliberately seeking to be unfair.
- Disparity in the amounts of available training data for different individuals/groups.
- Differences in the usefulness of available features for different individuals/groups.
- Differences in the relevance of the prediction problem for different individuals/groups.
- . . .
How we might address these problems:
1. Rigorously define and quantify (un)fairness.
2. Identify sources of confusion regarding fairness.
3. . . .
25 / 34
Quantifying (un)fairness: statistical parity
Setup for classification problem:
- X: features of an individual.
- A: protected attribute (e.g., race, sex, age, religion).
- For simplicity, assume A takes on only two values, 0 and 1.
- Y: output variable to predict (e.g., will repay loan).
- For simplicity, assume Y takes on only two values, 0 and 1.
- Ŷ: prediction of the output variable (as a function of X and A).
Statistical parity is satisfied if
  P(Ŷ = 1 | A = 0) = P(Ŷ = 1 | A = 1)
(i.e., Ŷ is independent of A).
A particular relaxation is the 4/5th rule:
  P(Ŷ = 1 | A = 0) ≥ (4/5) · P(Ŷ = 1 | A = 1).
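These quantities can be estimated directly from predictions and group labels. A minimal sketch (the function name and the data are hypothetical; the slide states the 4/5th rule one-directionally, while the check below is symmetric):

```python
def positive_rates(y_hat, a):
    """Estimate P(Y_hat = 1 | A = g) for g = 0, 1 from paired samples."""
    rates = []
    for group in (0, 1):
        preds = [yh for yh, g in zip(y_hat, a) if g == group]
        rates.append(sum(preds) / len(preds))
    return rates

# Hypothetical predictions and protected-attribute labels.
y_hat = [1, 0, 1, 1, 0, 1, 0, 0]
a     = [0, 0, 0, 0, 1, 1, 1, 1]
r0, r1 = positive_rates(y_hat, a)
print(r0, r1)                              # 0.75 0.25
print(min(r0, r1) >= 0.8 * max(r0, r1))    # 4/5th-rule check: False
```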
26 / 34
Proxies for protected attribute
We cannot simply avoid using A explicitly, since other features can be correlated with it.
Example: the WSJ observed that the online price for an item depends on how far you live from a brick-and-mortar Staples store.
- The pricing system doesn’t explicitly look at your income.
- But where you live is probably correlated with your income.
- Result: charging higher prices to lower-income people.
27 / 34
Problem with statistical parity
The following example satisfies statistical parity:
              A = 0                  A = 1
           Y = 0   Y = 1          Y = 0   Y = 1
  Ŷ = 0     1/2      0             1/4     1/4
  Ŷ = 1      0      1/2            1/4     1/4

Suppose Y = 1{will repay loan if granted one}.
- For group A = 0: loans are correctly given to people who will repay them.
- For group A = 1: loans are given out at random. (This sets people up for failure!)
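The claim can be checked directly from the joint distribution in the table (the dictionary layout and function name are my own):

```python
# Joint distribution from the table: joint[a][(y_hat, y)] = probability.
joint = {
    0: {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5},
    1: {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
}

def rate(a, y):
    """P(Y_hat = 1 | Y = y, A = a) under the joint distribution."""
    return joint[a][(1, y)] / (joint[a][(0, y)] + joint[a][(1, y)])

# Statistical parity holds: P(Y_hat = 1 | A = a) = 1/2 in both groups...
assert joint[0][(1, 0)] + joint[0][(1, 1)] == joint[1][(1, 0)] + joint[1][(1, 1)] == 0.5
# ...but the group-conditional rates given the true label differ sharply:
print(rate(a=0, y=1), rate(a=1, y=1))   # 1.0 0.5
print(rate(a=0, y=0), rate(a=1, y=0))   # 0.0 0.5
```

Group 0 gets loans exactly when repayment will happen; group 1 gets them at a coin flip regardless of Y.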
28 / 34
Equalized odds
Equalized odds is satisfied if
  P(Ŷ = 1 | Y = y, A = 0) = P(Ŷ = 1 | Y = y, A = 1),  y ∈ {0, 1}
(i.e., Ŷ is conditionally independent of A given Y).
- The previous example fails to satisfy equalized odds:

                A = 0                  A = 1
             Y = 0   Y = 1          Y = 0   Y = 1
    Ŷ = 0     1/2      0             1/4     1/4
    Ŷ = 1      0      1/2            1/4     1/4

  P(Ŷ = 1 | Y = 0, A = 0) = 0/(0 + 1/2) = 0 ≠ (1/4)/(1/4 + 1/4) = P(Ŷ = 1 | Y = 0, A = 1),
  P(Ŷ = 1 | Y = 1, A = 0) = (1/2)/(0 + 1/2) = 1 ≠ (1/4)/(1/4 + 1/4) = P(Ŷ = 1 | Y = 1, A = 1).
- It is possible to post-process a predictor to make it satisfy equalized odds (w.r.t. the empirical distribution), provided that the predictor can look at A (Hardt, Price, & Srebro, 2016).
- Problem: it may be illegal to look at A, even if the purpose is to ensure a fairness property.
29 / 34
Within-group (weak) calibration
Within-group (weak) calibration is satisfied if
  P(Ŷ = 1 | A = a) = P(Y = 1 | A = a),  a ∈ {0, 1}.
Note: Suppose Ŷ is a randomized classifier derived from a conditional probability estimator p̂, i.e.,
  Ŷ = 1 with probability p̂,  Ŷ = 0 with probability 1 − p̂.
And further, suppose p̂ satisfies
  P(Y = 1 | p̂ = p, A = 0) = P(Y = 1 | p̂ = p, A = 1) = p,  p ∈ [0, 1]
(i.e., p̂ is a calibrated conditional probability estimator).
Then Ŷ satisfies within-group (weak) calibration.
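The note above can be checked by simulation: draw a calibrated score p̂, a true label Y with probability p̂, and a randomized prediction Ŷ with probability p̂, then compare the two rates within each group (the group-specific score values below are hypothetical):

```python
import random

random.seed(0)
samples = []   # (group, true label, prediction)
for _ in range(200_000):
    a = random.randint(0, 1)
    # Hypothetical calibrated estimator: the score distribution differs by
    # group, but in every case Y = 1 with probability exactly p_hat.
    p_hat = random.choice([0.2, 0.6]) if a == 0 else random.choice([0.4, 0.8])
    y = 1 if random.random() < p_hat else 0
    y_hat = 1 if random.random() < p_hat else 0   # randomized classifier
    samples.append((a, y, y_hat))

for group in (0, 1):
    grp = [(y, yh) for g, y, yh in samples if g == group]
    p_y = sum(y for y, _ in grp) / len(grp)
    p_yhat = sum(yh for _, yh in grp) / len(grp)
    # Weak calibration: P(Y_hat = 1 | A = a) matches P(Y = 1 | A = a).
    print(group, round(p_y, 3), round(p_yhat, 3))
```

Within each group the two printed rates agree up to sampling noise, even though the rates differ across groups.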
30 / 34
Inherent trade-offs
Theorem (Kleinberg, Mullainathan, & Raghavan, 2016): It is impossible to satisfy both within-group (weak) calibration and equalized odds, unless
- P(Y = 1 | A = 0) = P(Y = 1 | A = 1), or
- Ŷ is a perfect predictor of Y.
Note: A similar dichotomy holds for scoring functions / conditional probability estimators, with slightly different definitions of within-group calibration and equalized odds.
31 / 34
COMPAS
COMPAS “risk assessment”: predict whether criminal defendants will commit (another) crime.
- ProPublica study argues COMPAS is unfair:
  e.g., FPR for A = 0 is 0.45, while FPR for A = 1 is 0.23.
- The study also shows COMPAS satisfies approximate within-group (weak) calibration:
  P(Y = 1 | Ŷ = 1, A = 0) = 0.64,  P(Y = 1 | Ŷ = 1, A = 1) = 0.59.
  (This was part of a different argument that COMPAS is fair.)
- The inherent trade-off theorem explains why different definitions of fairness may necessarily lead to different conclusions.
32 / 34
Ways to confuse fairness: Nature of the training data
Sometimes the available training data is not suitable for the task.
E.g., a system to decide loan applications. The goal is to predict whether an applicant will repay a loan if granted one.
Is the following training data appropriate?
1. x = features of past loan applicant, y = 1{loan application was approved}.
2. x = features of past loan applicant who received a loan, y = 1{loan was repaid}.
3. x = features of past loan applicant, y = 1{loan was repaid}.
33 / 34
Recap: fairness
- Automated decision-making based on machine learning should be scrutinized for appropriate fairness considerations (like any other decision-making system).
- We must think hard about the appropriateness of the data for a given application.
34 / 34