Data mining, privacy and (non-)discrimination. Bettina Berendt, KU Leuven. Knowledge and the Web / Privacy and Big Data courses 2015, last updated 9 December 2015.

Page 1

Data mining, privacy and (non-)discrimination

Bettina Berendt, KU Leuven

Knowledge and the Web /

Privacy and Big Data courses 2015, last updated 9 December 2015

Page 2

Agenda

Motivation: concepts and current cases

(Classical) discrimination-aware data mining

Exploratory discrimination-aware data mining; evaluation

(Some) limitations + outlook

Page 3

Privacy and non-discrimination

Two fundamental rights. In ICT and data mining:

• Violations may result from the use of certain information
• Protection may result from changing processing w.r.t. this information (e.g. “features“)
  – “privacy-preserving data mining/publishing“
  – “discrimination-aware data mining“

Page 4

Is this discrimination?

https://www.wonga.com analyses, among other things, your social-media data to determine your creditworthiness. Assume (cf. examples from last week) that it generates patterns that deny a loan to:

1. People who like Converse sneakers
2. People who like Oil of Olay

Assume that this is because people who ... in the past very rarely paid back their loans.

Page 5

(from Martijn Van Otterlo‘s presentation in Privacy and Big Data 2015)

Page 6

PS: China‘s Social Credit Score (1) (from the Los Angeles Times)

in China, government authorities are hard at work devising their own e-database to rate each and every one of the nation's 1.3 billion citizens by 2020 using metrics that include whether they pay their bills on time, plagiarize schoolwork, break traffic laws or adhere to birth-control regulations.

Page 7

PS: China‘s Social Credit Score (2)

China — largely atheist and lacking a strong civil society sector — has struggled for years to find a way to incentivize and reward moral and responsible behavior. It has launched appeals for citizens to uphold "traditional Chinese values" and […]

But the country continues to be shocked by incidents of callous, dishonest and immoral behavior, such as pedestrians refusing to help seniors who have fallen down (because they fear being sued by elderly extortionists), and motorists who accidentally strike pedestrians intentionally hitting them again to ensure they're dead (otherwise, the motorist would have to pay lifelong compensation for injuries).

The Social Credit System, the State Council says, offers hope of addressing this: "Only if there is mutual sincere treatment between members of society, and only if sincerity is fundamental, will it be possible to create harmonious and amicable interpersonal relationships. ... and realize social harmony, stability and a long period of peace and order."

Page 8

Data and discrimination

E.g. a credit scoring & loan-granting system
• uses/shares a person‘s personal data
• makes loan decisions depend on personal data
= differential treatment

Differential treatment is unlawful discrimination if it is based on “unjust grounds“ (e.g., gender).

Attention: this is only a preliminary definition in the legal sense!

Page 9

“Discrimination is forbidden“

In many areas, including labour, loans, and insurance.

The grounds protected by law differ by area, but usually include gender, disability, age, sexual orientation, and cultural, religious and linguistic beliefs/affiliation.

A short intro: (Naudts, 2015) – PaBD lecture #6

Page 10

“You may no longer ...“

European Court of Justice (2011), Case C-236/09, Association Belge des Consommateurs Test-Achats ASBL and Others v Conseil des ministres:

(18) The use of actuarial factors related to sex is widespread in the provision of insurance and other related financial services. In order to ensure equal treatment between men and women, the use of sex as an actuarial factor should not result in differences in individuals’ premiums and benefits. To avoid a sudden readjustment of the market, the implementation of this rule should apply only to new contracts concluded after the date of transposition of this Directive.

Historical examples: only { rich | white | male } people get to vote

Page 11

Data mining (DM) and discrimination (D) (1)

“DM avoids D.“ E.g. in the domain of predictive policing:
• Dave Eggers, The Circle: start-up pitch (warning: satire)
• Chicago police “heat list“
• Relapse prediction and parole decisions

Page 12

From The Economist, 2014

“The data that matter include the prisoner’s age at first arrest, his education, the nature of his crime, his behaviour in prison, his friends’ criminal records, the results of psychometric tests and even the sobriety of his mother while he was in the womb. The software estimates the probability that an inmate will relapse by comparing his profile with many others. The American version of LS/CMI, for example, holds data on 135,000 (and counting) parolees.

It is better to be guided by software than one’s gut, says Olivia Craven, head of the Idaho Commission of Pardons and Parole. Donna Sytek of the New Hampshire Parole Board agrees. Unaided, parole board members rely too much on their personal experiences and make inconsistent decisions, she says.”

Page 13

What‘s right about this? What‘s wrong with this?

Reflection question

Recommended reading: Legal view of predictive policing and Big Data: (Ferguson, 2015). More CS thinking: (Berendt, 2015)

Page 14

DM and D (2)

“DM can lead to D, but ... hm ... maybe there‘s something to it?“

Cf. Laurens Naudts‘ remarks on the rational basis test in law and the assumptions of rationality concerning statistics and data mining.

Cf. “It is better to be guided by software than one’s gut” above

Page 15

What‘s right about this? What‘s wrong with this?

Reflection question

Page 16

DM and D (3)

“DM can lead to D, but modifying the algorithm can fix it.“

Classical discrimination-aware data mining

Page 17

What‘s right about this? What‘s wrong with this?

Part of today‘s lecture

Recommended reading: Sources and critique in (Berendt & Preibusch, 2014)

Page 18

DM and D (4)

“The point of DM is D. (And so is much of human civilization?!) DM can lead to D, but making the workings of the algorithm transparent can help make this more visible and encourage reflection and, ultimately, corrective action.“

Exploratory discrimination-aware data mining

Page 19

What‘s right about this? What‘s wrong with this?

Part of today‘s lecture
Reflection question

Recommended reading: (Berendt & Preibusch, 2014)

Page 20

Agenda

Motivation: concepts and current cases

(Classical) discrimination-aware data mining

Exploratory discrimination-aware data mining; evaluation

(Some) limitations + outlook

Page 21

Pedreschi, Ruggieri, & Turini (2008)

PD and PND items: potentially (not) discriminatory
• Goal: we want to detect & block mined rules such as
  purpose=new_car & gender=female → credit=no
• Measures of the discriminatory power of a rule include
  elift (B&A → C) = conf (B&A → C) / conf (B → C),
  where A is a PD item and B a PND item

Note: two uses/tasks of data mining here:
• Descriptive: “In the past, women who got a loan for a new car often defaulted on it.“
• Prescriptive: (Therefore) “Women who want a new car should not get a loan.“
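The elift measure can be computed directly from rule confidences. A minimal sketch (the dataset rows, item names and numbers below are invented for illustration, not taken from the paper):

```python
def conf(rows, premise, outcome):
    """Confidence of the rule premise -> outcome over a list of item-set rows."""
    covered = [r for r in rows if premise <= r]
    if not covered:
        return 0.0
    return sum(1 for r in covered if outcome <= r) / len(covered)

def elift(rows, pd_items, pnd_items, outcome):
    """elift(B&A -> C) = conf(B&A -> C) / conf(B -> C),
    with A a PD itemset and B a PND itemset."""
    return conf(rows, pd_items | pnd_items, outcome) / conf(rows, pnd_items, outcome)

# Toy data: each row is one credit application, encoded as a set of items.
rows = [
    {"purpose=new_car", "gender=female", "credit=no"},
    {"purpose=new_car", "gender=female", "credit=no"},
    {"purpose=new_car", "gender=male",   "credit=yes"},
    {"purpose=new_car", "gender=male",   "credit=no"},
]

# conf(new_car -> no) = 3/4; conf(new_car & female -> no) = 2/2,
# so elift = 1.0 / 0.75 = 4/3: adding the PD item makes denial
# 1.33x as likely as for new-car applicants in general.
score = elift(rows, {"gender=female"}, {"purpose=new_car"}, {"credit=no"})
```

An elift of 1 would mean the PD item adds nothing; rules are flagged when their elift exceeds a chosen threshold.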

Page 22

Why not just “delete“ PD attributes?

• If the focus is detection: this prevents detection.
• If the focus is prevention: this may reproduce indirect discrimination
  ... and this indirect discrimination will also not be detected!
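The proxy problem can be made concrete in a few lines (all names and numbers below are invented): after “deleting“ the PD attribute gender, a rule mined on a correlated neighbourhood feature reproduces the same disparity.

```python
# Each tuple: (gender, neighbourhood, past_decision). Gender is dropped
# before mining, but neighbourhood is a near-perfect proxy for it here.
applications = (
    [("F", "district_A", "deny")] * 8 + [("F", "district_A", "grant")] * 2 +
    [("M", "district_B", "deny")] * 2 + [("M", "district_B", "grant")] * 8
)

# A rule learnable without ever seeing gender: deny in district_A.
def rule(neighbourhood):
    return "deny" if neighbourhood == "district_A" else "grant"

def denial_rate(gender):
    group = [a for a in applications if a[0] == gender]
    return sum(1 for a in group if rule(a[1]) == "deny") / len(group)

# denial_rate("F") == 1.0 and denial_rate("M") == 0.0: the gender gap
# survives, and without the gender column it can no longer be detected.
```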

Page 23

DADM: Examples and DCUBE output

Page 24

Three points of intervention for DADM – algorithmic / “classical“

• Post-processing: as a filter on the mining results (e.g. DCUBE)
• Pre-processing: similar to the distortion-based techniques for privacy-preserving association-rule mining, e.g. Hajian et al. 2013ff.
• In-processing: e.g. Kamiran et al. 2010: change the tree-learning algorithm so that at each node, the good split is the one that achieves high purity with respect to the class label (e.g. credit good/bad), but low purity with respect to the sensitive attribute (e.g. gender).

Many algorithms also avoid indirect discrimination (as formally defined via correlations / probabilistic implication).

Page 25

Recall: Example weather data

Outlook  | Temp | Humidity | Windy | Play
---------+------+----------+-------+-----
Rainy    | Mild | High     | True  | No
Overcast | Hot  | Normal   | False | Yes
Overcast | Mild | High     | True  | Yes
Sunny    | Mild | Normal   | True  | Yes
Rainy    | Mild | Normal   | False | Yes
Sunny    | Cool | Normal   | False | Yes
Sunny    | Mild | High     | False | No
Overcast | Cool | Normal   | True  | Yes
Rainy    | Cool | Normal   | True  | No
Rainy    | Cool | Normal   | False | Yes
Rainy    | Mild | High     | False | Yes
Overcast | Hot  | High     | False | Yes
Sunny    | Hot  | High     | True  | No
Sunny    | Hot  | High     | False | No

Page 26

Recall: Decision tree learning for classification / prediction

In which weather will someone play (tennis etc.)?

Result: this tree; but how to get there?

(Learned from the WEKA weather data)

Page 27

Recall: Which attribute to select?

Page 28

Recall: Which attribute to select?

Based on the highest purity of the class attribute in the new nodes (measured by entropy / information gain)
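The selection criterion can be sketched as follows; the counts below are the actual class distributions of the WEKA weather data when splitting on Outlook.

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def info_gain(parent_counts, child_counts):
    """parent_counts: class counts before the split;
    child_counts: one class-count tuple per branch."""
    total = sum(parent_counts)
    weighted = sum(sum(c) / total * entropy(c) for c in child_counts)
    return entropy(parent_counts) - weighted

# Weather data: 9 yes / 5 no overall; splitting on Outlook gives
# Sunny (2 yes, 3 no), Overcast (4 yes, 0 no), Rainy (3 yes, 2 no).
gain_outlook = info_gain((9, 5), [(2, 3), (4, 0), (3, 2)])
# gain_outlook ≈ 0.247 bits, the highest of the four attributes,
# which is why Outlook becomes the root of the tree.
```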

Page 29

Extending the weather data
Goal: learn a classifier that does not discriminate by gender

Gender | Outlook  | Temp | Humidity | Windy | Play
-------+----------+------+----------+-------+-----
M      | Rainy    | Mild | High     | True  | No
F      | Overcast | Hot  | Normal   | False | Yes
M      | Overcast | Mild | High     | True  | Yes
M      | Sunny    | Mild | Normal   | True  | Yes
F      | Rainy    | Mild | Normal   | False | Yes
M      | Sunny    | Cool | Normal   | False | Yes
F      | Sunny    | Mild | High     | False | No
M      | Overcast | Cool | Normal   | True  | Yes
F      | Rainy    | Cool | Normal   | True  | No
F      | Rainy    | Cool | Normal   | False | Yes
M      | Rainy    | Mild | High     | False | Yes
M      | Overcast | Hot  | High     | False | Yes
F      | Sunny    | Hot  | High     | True  | No
M      | Sunny    | Hot  | High     | False | No

Page 30

Assume this “pattern“ in the new weather data

(The extended weather table from the previous slide is shown again here, with the assumed gender-related pattern highlighted.)

Page 31

Which attribute to select now?

Based on the highest purity of the class attribute in the new nodes (measured by entropy / information gain) AND each node being low in purity w.r.t. gender (~ half/half)!

(Of course, in general, this does not need to lead to the selection of the same attribute!)
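One way to operationalise this double criterion, in the spirit of Kamiran et al.'s IGC−IGS split score (information gain w.r.t. the class minus information gain w.r.t. the sensitive attribute); the candidate-split counts below are invented for illustration:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def info_gain(parent, children):
    total = sum(parent)
    return entropy(parent) - sum(sum(c) / total * entropy(c) for c in children)

def igc_minus_igs(class_parent, class_children, sens_parent, sens_children):
    """Score a candidate split: reward class purity, penalise gender purity."""
    return (info_gain(class_parent, class_children)
            - info_gain(sens_parent, sens_children))

# Two hypothetical splits of 14 rows (9 yes / 5 no class, 8 M / 6 F gender).
# Split 1 separates the classes but leaves gender mixed in both branches;
# split 2 separates the classes equally well but ALSO separates the genders.
s1 = igc_minus_igs((9, 5), [(7, 1), (2, 4)], (8, 6), [(4, 4), (4, 2)])
s2 = igc_minus_igs((9, 5), [(7, 1), (2, 4)], (8, 6), [(8, 0), (0, 6)])
# s1 > s2: the gender-revealing split is penalised and would not be chosen.
```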

Page 32

Agenda

Motivation: concepts and current cases

(Classical) discrimination-aware data mining

Exploratory discrimination-aware data mining; evaluation

(Some) limitations + outlook

Page 33

Decision making: DM only?

But are (e.g. loan) decisions made fully automatically?

Cf. EU Data Protection Directive (95/46/EC), Article 15(1): “Member States shall grant the right to every person not to be subject to a decision which produces legal effects concerning him or significantly affects him and which is based solely on automated processing of data intended to evaluate certain personal aspects relating to him, such as his performance at work, creditworthiness, reliability, conduct, etc.”

Page 34

Four points of intervention for DADM – algorithmic & beyond

• Pre-processing
• In-processing
• Post-processing: as a filter on the mining results (e.g. DCUBE) – hiding “bad patterns“
• In the interaction of a decision-support system (Berendt & Preibusch): hiding or highlighting “bad patterns“

Page 35

Limitations of classical DADM

„Thou shalt not discriminate on grounds of gender, skin colour, or nationality.“

… oh, or sexual orientation.

… (and so on)


Page 47

Limitations of classical DADM

Detection
• Constraint-oriented DADM: can only detect discrimination by pre-defined features / constraints. Ex.: PD(female), PND(has-children), but discrimination of mothers.
• Exploratory DADM: exploratory data analysis supports feature construction, new feature analyses.

Avoidance of creation
• Constraint-oriented DADM, fully automatic decision making: cannot implement the legal concept of „treat equal things equally and different things differently“ (AI-hard).
• Exploratory DADM, fully automatic decision making: ?
• Constraint-oriented DADM, semi-automated decision support: sanitized rules → sanitized minds?
• Exploratory DADM, semi-automated decision support: salience, awareness, reflection → better decisions?

Page 48

How to do exploratory DADM?

• Patterns that characterize classes
• Patterns that characterize rules
• Items, itemsets
• Interestingness measures
• Visualisation, exploration, interactivity

Page 49

Exploratory DADM: DCUBE-GUI

Left: rule count (size) vs. PD/non-PD (colour)

Page 50

Exploratory DADM: DCUBE-GUI

Left: rule count (size) vs. PD/non-PD (colour)

Right: rule count (size) vs. AD-measure (rainbow-colours scale)

Page 51

DCUBE-GUI: Co-occurrences of items in rule premises

Page 52

Evaluating DADM

• Algorithm-centric, automated measures
• User studies

Page 53

Evaluation: Comparing cDADM & eDADM

Detection
• Constraint-oriented DADM: can only detect discrimination by pre-defined features / constraints. Ex.: PD(female), PND(has-children), but discrimination of mothers.
• Exploratory DADM: exploratory data analysis supports feature construction, new feature analyses.

Avoidance of creation
• Constraint-oriented DADM, fully automatic decision making: cannot implement the legal concept of „treat equal things equally and different things differently“ (AI-hard).
• Exploratory DADM, fully automatic decision making: ?
• Constraint-oriented DADM, semi-automated decision support: sanitized rules → sanitized minds?
• Exploratory DADM, semi-automated decision support: salience, awareness, reflection → better decisions?

Overall approach
• Constraint-oriented DADM: “hiding bad patterns“, black box
• Exploratory DADM: “highlighting bad patterns“, white box

Page 54

A more accurate definition of unlawful discrimination

Equality and discrimination are two sides of the same coin: “The principle of equality requires that equal situations are treated equally and unequal situations differently. Failure to do so will amount to discrimination unless an objective and reasonable justification exists” – Explanatory memorandum, Protocol 12 to the ECHR

Differential/unequal treatment vs. discrimination:
• Differential treatment: neutral – tells us nothing about the legal acceptability of a given measure.
• Discrimination: refers to unacceptable differential treatment (from a legal perspective).
• Whether or not differential treatment is unacceptable and thus amounts to discrimination is determined by the choices of law makers and judicial review.
• However: differential treatment may be perceived as unfair/unjust even if tolerated by law.

Page 55

An important example of EU non-discrimination law

European Convention on Human Rights, Art. 14 – Prohibition of Discrimination:
“The enjoyment of the rights and freedoms set forth in this Convention shall be secured without discrimination on any ground such as sex, race, colour, language, religion, political or other opinion, national or social origin, association with a national minority, property, birth or other status.”

Page 56

Limitations (1): DADM‘s simple view of unlawful discrimination

1. A given differentiation in treatment may or may not be unlawful discrimination
• depending on the agent
• if based on “innocuous“ reasons (indirect discrimination)
• depending on whether situations are comparable (“treat equal things equally and unequal things unequally“) – NOT differentiating by a protected attribute may constitute discrimination!
• depending on aims and proportionality of means, e.g. “genuine occupational requirement“
• depending on the changing social & legal environment

2. A fixed set of attributes makes it impossible to detect new forms of discrimination.

Page 57

Data mining for loan decision support

Data (loan defaults; demographics, loan purposes)
→ Algorithm (DM, cDADM, eDADM; with / without discrimination)
→ Pattern (positive / negative risk factors; graphical presentation)
→ Decision (grant / deny loan, justify; actionability, decision quality)

Page 58

Online experiment with 215 US mTurkers

Framing
• Prevention: bank
• Detection: agency
• $6.00 show-up fee

Tasks
• 3 exercise tasks
• 6 assessed tasks
• $0.25 performance bonus per assessed task

Questionnaire
• Demographics
• Quant/bank job
• Experience with discrimination

Example vignette: “Dabiku is a Kenyan national. She is single and has no children. She has been employed as a manager for the past 10 years. She now asks for a loan of $10,000 for 24 months to set up her own business. She has $100 in her checking account and no other debts. There have been some delays in paying back past loans.”

Page 59

Decision-making scenario

Task structure
• Vignette, describing applicant and application
• Rules: positive/negative risks, flagged
• Decision and motivation, optional comment

Required competencies
• Discard discrimination-indexed rules
• Aggregate rule certainties
• Justify decision by categorising risk factors

Page 60

Rule visualisation by treatment

• Constrained DADM – hide bad features (Prevention scenario): e.g. savings and residence shown, foreigner hidden
• Exploratory DADM – flag bad features (Detection scenario): e.g. residence shown, foreigner flagged
• (not DA)DM – neither flagged nor hidden: e.g. residence and foreigner both shown
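The three treatments amount to a simple filter over the mined rules. A sketch (function and field names are invented; the features loosely echo the experiment's residence/foreigner example):

```python
# Features treated as discrimination-indexed (PD) in this illustration.
PD_FEATURES = {"foreigner", "gender"}

rules = [
    {"features": ["savings", "residence"], "risk": 0.4},
    {"features": ["residence", "foreigner"], "risk": -0.67},
]

def present(rules, treatment):
    """Return the rules as shown to a participant under one treatment."""
    out = []
    for r in rules:
        bad = any(f in PD_FEATURES for f in r["features"])
        if treatment == "cDADM" and bad:
            continue                           # constrained DADM: hide bad rules
        flagged = bad if treatment == "eDADM" else False
        out.append({**r, "flagged": flagged})  # exploratory DADM: flag them
    return out                                 # plain "DM": show all, unmarked

# cDADM shows 1 rule (foreigner rule hidden); eDADM shows both and flags
# the foreigner rule; plain DM shows both with nothing flagged.
```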

Page 61

Actionability and decision quality

Decisions and motivations: DM versus DADM
• More correct decisions in DADM
• More correct motivations in DADM
• No performance impact

Relative merits
• Constrained DADM better for prevention
• Exploratory DADM better for detection

Biases: discrimination persistent in cDADM – ‘‘I dropped the -.67 number a little bit because it included her being a female as a reason.’’

Berendt & Preibusch: Better decision support through exploratory discrimination-aware data mining. Artificial Intelligence and Law, 2014.

Page 62

Agenda

Motivation: concepts and current cases

(Classical) discrimination-aware data mining

Exploratory discrimination-aware data mining; evaluation

(Some) limitations + outlook

Page 63

Limitations (1): DADM‘s simple view of unlawful discrimination

A given differentiation in treatment may or may not be unlawful discrimination
• depending on the agent
• if based on “innocuous“ reasons (indirect discrimination)
• depending on whether situations are comparable (“treat equal things equally and unequal things unequally“)
• depending on aims and proportionality of means, e.g. “genuine occupational requirement“
• depending on the changing social & legal environment

Page 64

Claim: The eDADM white-box approach can accommodate (some of) these complexities:
• provide more flexibility for detecting and avoiding discrimination by positioning itself as a decision-support system
• support awareness and reflection
• increase transparency
• increase accountability

Page 65:

Limitations (2) / Outlook: Social / critical theories of discrimination

- New discrimination grounds (see "mother" example)
- Further patterns related to discrimination: intersectionality; + and – of hiding / showing features
- The hidden assumptions (and effects!) of DM:
  - Ontological status of features?
  - DM creates new features and new forms of discrimination
  - Which notion of social justice underlies allocation?
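The "mother" example of a new, constructed discrimination ground can be made concrete with a small sketch. The toy records, attribute names, and deny rates below are illustrative assumptions, not data from the slides; the point is only that a feature built as a conjunction (female AND has-children) can show far stronger disparate treatment than either base feature alone:

```python
# Intersectionality sketch: a derived feature 'mother' = female AND has_children.
# All records and rates here are invented for illustration.
records = [
    {"female": True,  "has_children": True,  "decision": "deny"},
    {"female": True,  "has_children": True,  "decision": "deny"},
    {"female": True,  "has_children": False, "decision": "grant"},
    {"female": False, "has_children": True,  "decision": "grant"},
    {"female": False, "has_children": False, "decision": "grant"},
    {"female": False, "has_children": False, "decision": "deny"},
]

def deny_rate(recs, pred):
    """Fraction of 'deny' decisions among records selected by pred."""
    group = [r for r in recs if pred(r)]
    return sum(r["decision"] == "deny" for r in group) / len(group)

# Each single feature looks only mildly skewed...
rate_female   = deny_rate(records, lambda r: r["female"])        # 2/3
rate_children = deny_rate(records, lambda r: r["has_children"])  # 2/3
# ...but the constructed intersection 'mother' is denied every time:
rate_mother = deny_rate(records, lambda r: r["female"] and r["has_children"])  # 1.0
```

Under these (made-up) numbers, neither `female` nor `has_children` alone would trip a per-feature check, while the constructed `mother` group is denied without exception; this is the kind of pattern exploratory feature construction is meant to surface.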

Page 66:

Outlook: Evaluating these claims in practice

Detection:
- Constraint-oriented DADM: can only detect discrimination by pre-defined features / constraints. Example: PD(female), PND(has-children), but discrimination of mothers.
- Exploratory DADM: exploratory data analysis supports feature construction and new feature analyses.

Avoidance of creation:
- Constraint-oriented DADM: fully automatic decision making cannot implement the legal concept of "treat equal things equally and different things differently" (AI-hard).
- Exploratory DADM: semi-automated decision support: sanitized rules → sanitized minds? Salience, awareness, reflection → better decisions?
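The detection contrast can be illustrated with the extended lift (elift) measure of Pedreschi, Ruggieri & Turini (2008): elift(A, B → deny) = conf(A, B → deny) / conf(B → deny), i.e. how much adding a potentially discriminatory itemset A raises the confidence of a denial rule. The sketch below uses a made-up toy dataset and an illustrative threshold; only the elift formula itself comes from the literature:

```python
def conf(records, premise, outcome):
    """Confidence of the rule premise -> outcome over a list of dict records."""
    matching = [r for r in records if all(r[k] == v for k, v in premise.items())]
    if not matching:
        return 0.0
    return sum(1 for r in matching if r["decision"] == outcome) / len(matching)

def elift(records, item, context, outcome):
    """Extended lift: conf(item & context -> outcome) / conf(context -> outcome)."""
    base = conf(records, context, outcome)
    extended = conf(records, {**context, **item}, outcome)
    return float("inf") if base == 0.0 else extended / base

# Toy loan records; attribute names and values are invented for illustration.
records = [
    {"female": True,  "has_children": True,  "city": "A", "decision": "deny"},
    {"female": True,  "has_children": True,  "city": "A", "decision": "deny"},
    {"female": True,  "has_children": False, "city": "A", "decision": "grant"},
    {"female": False, "has_children": True,  "city": "A", "decision": "deny"},
    {"female": False, "has_children": False, "city": "A", "decision": "grant"},
    {"female": False, "has_children": False, "city": "A", "decision": "grant"},
]

ALPHA = 1.5  # illustrative alpha-discrimination threshold

# The pre-declared PD itemset {female} stays below the threshold here...
e_female = elift(records, {"female": True}, {"city": "A"}, "deny")
# ...while the PND itemset {has_children}, which constraint-oriented DADM
# never checks, is the one that actually concentrates the denials:
e_children = elift(records, {"has_children": True}, {"city": "A"}, "deny")
```

Under these assumptions, a constraint-oriented check on the PD list {female} reports nothing alarming (elift 4/3 < ALPHA), while an exploratory pass over the remaining features surfaces `has_children` (elift 2.0 ≥ ALPHA), matching the "discrimination of mothers" example in the table.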

Page 67:

Outlook: Developing the automated parts of eDADM further

(This slide repeats the constraint-oriented DADM vs. exploratory DADM comparison table from the previous slide.)

Page 68:

Thank you!

Page 69:

References

Makinen, J. (2015). China prepares to rank its citizens on 'social credit'. Los Angeles Times, 15 November 2015. http://www.latimes.com/world/asia/la-fg-china-credit-system-20151122-story.html

The Economist (2014). Parole and technology: Prison breakthrough. 19 April 2014. http://www.economist.com/news/united-states/21601009-big-data-can-help-states-decide-whom-release-prison-prison-breakthrough

Ferguson, A.G. (2015). Big data and predictive reasonable suspicion. University of Pennsylvania Law Review, 163(2), 327-410. http://scholarship.law.upenn.edu/cgi/viewcontent.cgi?article=9464&context=penn_law_review

Berendt, B. (2015). Big Capta, Bad Science? http://people.cs.kuleuven.be/~bettina.berendt/Reviews/BigData.pdf

Berendt, B. & Preibusch, S. (2014). Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence. Artificial Intelligence and Law, 22(2), 175-209. http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_preibusch_2014.pdf

Pedreschi, D., Ruggieri, S., & Turini, F. (2008). Discrimination-aware data mining. In Proceedings of KDD'08, pp. 560-568. ACM. http://www.di.unipi.it/~ruggieri/Papers/kdd2008.pdf

Ruggieri, S., Pedreschi, D., & Turini, F. (2010). DCUBE: Discrimination discovery in databases. In Proceedings of SIGMOD'10, pp. 1127-1130. http://www.di.unipi.it/~ruggieri/Papers/dcube.pdf
(and further papers by the same team)

Hajian, S. & Domingo-Ferrer, J. (2013). A methodology for direct and indirect discrimination prevention in data mining. IEEE Transactions on Knowledge and Data Engineering, 25(7), 1445-1459. http://crises2-deim.urv.cat/docs/publications/journals/684.pdf

Hajian, S., Domingo-Ferrer, J., & Farràs, O. (2014). Generalization-based privacy preservation and discrimination prevention in data publishing and mining. Data Mining and Knowledge Discovery, 28(5-6), 1158-1188. http://crises2-deim.urv.cat/docs/publications/journals/813.pdf

Kamiran, F., Calders, T., & Pechenizkiy, M. (2010). Discrimination aware decision tree learning. In Proceedings of ICDM 2010, pp. 869-874. http://wwwis.win.tue.nl/~tcalders/pubs/TR10-13.pdf

"EU Privacy Directive": Directive 95/46/EC of the European Parliament and of the Council of 24.10.1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data (O.J. L 281, 23.11.1995).

Bygrave, L.A. (2001). Minding the machine: Article 15 of the EC Data Protection Directive and automated profiling. Computer Law & Security Report, 17, 17-24. http://folk.uio.no/lee/oldpage/articles/Minding_machine.pdf

Gao, B. & Berendt, B. (2011). Visual data mining for higher-level patterns: discrimination-aware data mining and beyond. In Proceedings of BENELEARN 2011. http://www.liacs.nl/~putten/benelearn2011/Benelearn2011_Proceedings.pdf