Ethical Aspects of Data Mining
Information Capability: what can you do?
Information Responsibility: what should you do?
McGraw-Hill/Irwin ©2007 The McGraw-Hill Companies, Inc. All rights reserved
Major Concerns About Huge Amounts of Data
1. Could be used for negative purposes
2. Errors in the data
3. Access to data not well controlled
4. Collected for one purpose, used for another through data mining
Simson Garfinkel
• “Database Nation: The Death of Privacy in the 21st Century” (Sebastopol, CA: O’Reilly & Associates, 2000)
• Has interesting views of the rights of privacy
• The need for governmental control to assure privacy
• This book relates a series of government projects proposing centralized data
1965 National Data Center
• Envisioned to combine records from:
– Bureau of the Census
– Bureau of Labor Statistics
– Internal Revenue Service
– Social Security Administration
• Motivation: cut costs
– Would lead to more accurate statistics
– Princeton Institute: single site may offer better information security
• Canceled: public pressure (56% opposed)
Credit Bureau Database
• 1960s credit bureaus widely used by business
• Loans not repaid
• Overdue credit card payments
• Multiple address changes to escape creditors
• Possibly might contain every phase of life
– Consumers rarely knew of its existence
– Policies forbade consumers seeing their files
• 1971 Stopped by Congress (Fair Credit Reporting Act)
1971 Fair Credit Reporting Act
• Allowed computerization
– But gave consumers rights to
• Review
• Challenge
• Insert their own version
• Industry complained
• 1970s & 1980s consolidation of credit reporting industry to basically 3 firms
– Not only give credit reports
– ALSO WILL COMPUTE CREDIT SCORE
– WILL SELL DEMOGRAPHICS & INFORMATION FOR DATA MINING
Governmental Data Mining
• Early 1980s Federal Government
– Matching programs
• Catch fraud & abuse
• Erroneous data often penalized innocent people
• 1994 Communications Assistance for Law Enforcement Act
– New powers for wiretapping digital communications
• 1996 States required to
– Display social security numbers on driver’s licenses
– Issue medical patients unique identifiers
– Both discontinued due to citizen backlash
Clipper
• 1991 proposal
• Use encryption systems
• Focus to track sexually explicit information to minors
• Might have required Internet providers to deploy far-reaching monitoring & censoring
• Courts: unconstitutional
Lotus & Equifax
• 1990 – CD-ROM product
– Lotus Marketplace: Households
• Names, addresses, demographic data
• Every US household
• Intent: small businesses could target-market like large firms
• 30,000 people wrote to delete their names
• Project canceled
Lexis-Nexis
• 1996 P-TRAK database
– Published SSNs of most US residents
• Thousands called switchboard to complain
• After 11 days, Lexis-Nexis discontinued product
Social Security Administration
• 1997 informed US taxpayers that detailed tax history was available over the Internet
– Security provisions
• Required some personal information
• Tens of thousands complained
• Senate investigated
• Service shut down
American Airlines
• Yield Management
– Identify the probability of last-minute cancellations to allow overbooking
– Develop price schedules that maximize revenue
• Consumers would like to have similar tools
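The overbooking idea on this slide can be sketched as a small calculation. This is only an illustration, not American Airlines' actual method: it assumes each ticket holder cancels independently with the same probability, and the names `prob_overbooked`, `max_tickets`, and the `risk` threshold are hypothetical.

```python
from math import comb


def prob_overbooked(seats: int, sold: int, p_cancel: float) -> float:
    """Probability that more than `seats` passengers show up.

    Assumes (hypothetically) that each of the `sold` ticket holders
    cancels independently with probability `p_cancel`, so the number
    of passengers who show up is binomially distributed.
    """
    p_show = 1.0 - p_cancel
    return sum(
        comb(sold, k) * p_show**k * p_cancel ** (sold - k)
        for k in range(seats + 1, sold + 1)
    )


def max_tickets(seats: int, p_cancel: float, risk: float) -> int:
    """Largest number of tickets to sell while keeping the chance of
    turning passengers away at or below `risk`."""
    if not 0.0 <= p_cancel < 1.0:
        raise ValueError("p_cancel must be in [0, 1)")
    sold = seats
    # Keep selling one more ticket while the overbooking risk stays acceptable.
    while prob_overbooked(seats, sold + 1, p_cancel) <= risk:
        sold += 1
    return sold


if __name__ == "__main__":
    # Illustrative numbers only: 100 seats, 10% cancellations, 5% accepted risk.
    print(max_tickets(seats=100, p_cancel=0.10, risk=0.05))
```

The higher the estimated cancellation probability, the more tickets beyond capacity can be sold for the same accepted risk, which is why accurate cancellation data is valuable to the airline.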
DANGER
• Drug Enforcement Administration
– Demanded access to drug chain frequent-buyer inventories
Threat from inference
• If fewer than three organizations sell a product, published total sales could be used to infer an individual seller’s figure (a seller subtracts its own sales from the total)
• Insurance information about traffic violations and insurance claims
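The sales-total inference on this slide can be sketched in a few lines. This is a minimal illustration with made-up seller names and figures; `infer_competitor_sales` and `safe_to_publish` are hypothetical helpers, and the threshold of three sellers is taken from the slide.

```python
def infer_competitor_sales(published_total: float, own_sales: float) -> float:
    """With only two sellers, one seller recovers the other's figure
    by subtracting its own sales from the published total."""
    return published_total - own_sales


def safe_to_publish(sales_by_seller: dict[str, float], min_sellers: int = 3) -> bool:
    """Simple disclosure rule: publish the total only if at least
    `min_sellers` organizations contribute to it."""
    return len(sales_by_seller) >= min_sellers


if __name__ == "__main__":
    # Hypothetical figures for a product sold by only two firms.
    sales = {"Firm A": 120.0, "Firm B": 80.0}
    total = sum(sales.values())
    # Firm A knows its own sales, so the total reveals Firm B's sales.
    print(infer_competitor_sales(total, sales["Firm A"]))
    print(safe_to_publish(sales))
```

A threshold rule like this is only a first line of defense: sellers that pool their own figures can still narrow down what the remaining seller sold.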
Contention
• IT threats:
– Runaway marketing
– Personal information sold as a commodity
– Intelligent computing threats
• Even if some data intended to be protected, neural networks could include data without explaining why
Scope of Error
• 1991 study:
– 1,500 report sample
– 43% of the files had errors
• Credit database errors
– Fewer than 1% of files had errors
– But that denied credit to over 2 million
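The point of the credit database figures is that a small error rate still affects many people when the database is large. A rough sketch of that arithmetic follows. It assumes, for illustration only, that each denial corresponds to one erroneous file; the slide does not state the database size, so the helper names and the derived figure are not from the source.

```python
def affected_count(total_files: int, error_rate_percent: int) -> int:
    """Number of files with errors at a given whole-percent error rate."""
    return total_files * error_rate_percent // 100


def min_files_implied(affected: int, max_rate_percent: int) -> int:
    """Smallest database size consistent with `affected` erroneous files
    at an error rate of at most `max_rate_percent` percent."""
    return affected * 100 // max_rate_percent


if __name__ == "__main__":
    # Slide figures: error rate under 1%, over 2 million people denied credit.
    print(min_files_implied(affected=2_000_000, max_rate_percent=1))
```

Under that assumption, more than 2 million affected files at under 1% implies a database of more than 200 million files.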
Fingerprints
• 90+ elements
– Odds of duplication low
• Garfinkel calls “absolutely unique”
• 1987 FBI had 23 million cards on file
– Scale too great to use for anything but confirmation (given name)
• Sources of error
– Entering data
– Swapped in police lab
– Modify records to frame the accused
DNA
• Widely accepted as “absolutely unique”
– Except identical twins, by definition (3/1000)
– Determined by heritage
• Communities with high in-breeding share more
• Concern about DNA databank mission creep
– Use of neural network technology could inadvertently induce use of information without realizing
– Government needs protection for spies, defectors, witnesses
Patient Medical Records
• “Please Respect Patient Confidentiality”
• Insurance companies have interest in open knowledge
– They argue “lower premiums”
– More likely “higher profits”
• DANGER: perfect DNA knowledge
– Insurers select clients
– Ultimately, control over birth, allowable marriage
Data Mining
• Polk: buys motor vehicle registrations (http://usa.polk.com/News/LatestNews/2006_0504_hybrids.htm)
– Combines make & model with census data
– Sell to marketers to
• Determine income
• Lifestyle
• Likelihood of purchasing any given product
• 21st-century marketing more one-to-one
– Aggressively seek personalized information
– Segmentation
Microsoft®
• 1997 Internet Explorer 4
– Active desktop
– Cookies: audience of one
• ERP web-portals
– Same principle
– Customize desktop
Web Data Mining Issues
• Privacy
• New forms of discrimination
– Weblining: classifications based on irrelevant profiling data that marketing companies and others collect on the Web
• Spiliopoulou identified three web mining applications
– Data acquisition
– Measurement of cost and quality
– Assessment of user/owner satisfaction
Protection
1. Exercise anonymity
2. Publicize & litigate
3. Track them as they track you
4. Fight for new laws
Tools to support web mining
• Portals
• Site Trackers
• Profilers
• Search bots
• Deep linking, Meta-tagging trick, framing, in-line linking
Web Ethics
• Utilitarian view
– Greatest good for the greatest number
• Rawlsian view
– More individual protection
• Pragmatic view
– Compromise
MovieFone
• Traditional movie tracking company
– Depends on extensive interviews with sample of moviegoers
– Error ± 5%
• MovieFone
– Sells advance tickets
– Predicts with less error
• actually samples the market
– Same as Amazon.com predicting book sales
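The ± 5% error quoted for interview-based tracking can be related to sample size with a standard margin-of-error formula. The sketch below is an illustration under assumptions the slide does not state: a simple random sample, a 95% confidence level (z = 1.96), and the worst-case proportion of 0.5. The function name `margin_of_error` is hypothetical.

```python
from math import sqrt


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for an estimated proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which
    assumes a simple random sample of size n.
    """
    return z * sqrt(proportion * (1.0 - proportion) / sample_size)


if __name__ == "__main__":
    # Roughly 385 interviews give about a 5-point margin at 95% confidence.
    print(round(margin_of_error(385), 4))
```

Advance ticket sales are actual purchases rather than answers to interviewers, which fits the slide's point that MovieFone "actually samples the market".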
Code of Fair Information Practices
Dept. of Health, Education, and Welfare, 1973
• No personal data record-keeping systems whose very existence is secret
• People can find out what information about them is in a record and how it is used
• People can prevent information about them obtained for one purpose from being used or made available for other purposes without that person’s consent
• Any organization creating, maintaining, using, or disseminating records of identifiable personal data must assure the reliability of the data for their intended use and must take precautions to prevent misuses of the data
Canadian Personal Information Protection and Electronic Documents Act - 2000
• C-6 (http://www.parl.gc.ca/36/2/parlbus/chambus/house/bills/summaries/c6-e.htm)
– Data collected from Web, or other
• Applies to federally regulated Canadian businesses
– Banks & insurance
• Extended in 2003 to businesses regulated by Canadian provinces
Code for the Protection of Personal Information
1. Accountability
2. Identify purposes
3. Consent
4. Limiting collection
5. Limited use, disclosure, retention
6. Accuracy
7. Safeguards
8. Openness
9. Individual access
10. Challenging compliance