

DataTags Tools

Motivation

PRIVACY TOOLS FOR SHARING RESEARCH DATA

Salil Vadhan (lead PI), Harvard University

Computational Social Science
The potential: massive new sources of data and ease of sharing will revolutionize social science.

The problem: protecting the privacy of data subjects

[Figure: privacy versus open data, e.g. NYT 5/21/12 “Troves of Personal Data, Forbidden to Researchers”]

[Figure: privacy versus utility for traditional approaches (e.g. “stripping PII”)]

Challenges for Sharing Sensitive Data

Complexity of Law
• Thousands of privacy laws in the US alone, at federal, state, and local levels, usually context-specific: HIPAA, FERPA, CIPSEA, Privacy Act, PPRA, ESRA, …

Difficulty of Deidentification
• Stripping “PII” usually provides weak protections and/or poor utility

Inefficient Process for Obtaining Restricted Data
• Can involve months of negotiation between institutions, original researchers

Sweeney ’97

Vision
An array of computational, legal, and policy tools to make privacy-protective data-sharing easier for researchers without expertise in privacy law/CS/stats.

Approach: Integrated Privacy Tools
Target: Data Repositories

Co-PIs & Senior Personnel
• Kobbi Nissim, co-PI, CRCS & Georgetown
• James Honaker, Sr. Researcher, CRCS
• Micah Altman, co-PI, MIT
• Steve Chong, co-PI, CRCS
• Merce Crosas, co-PI, IQSS
• Urs Gasser, co-PI, Berkman Klein Center

Tools that help generate a policy for your sensitive data that defines how to transfer, store, access, and use those data.

Differential Privacy Tool: PSI – A Private data-Sharing Interface

Goals of PSI
• General-purpose: applicable to most datasets in repository.
• Automated: no differential privacy expert optimizing algorithms for a particular dataset or application
• Tiered access: DP interface for wide access to rough statistical information, helping users decide whether to apply for access to raw data (cf. Census PUMS vs RDCs)

Privacy Budgeting Interface

Integration w/Statistical Tools for Social Science
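The idea of a privacy budget can be illustrated with a minimal sketch. This is not PSI's implementation: the class `PrivacyBudget`, the functions `laplace_noise` and `dp_mean`, and the bounds and epsilon values are all hypothetical names and numbers chosen for illustration. It assumes the textbook Laplace mechanism, basic sequential composition (the epsilons of successive releases add up), a publicly known dataset size, and neighboring datasets that differ by replacing one record. Floating-point Laplace sampling of this kind is known to be unsafe for production use.

```python
import random


class PrivacyBudget:
    """Tracks epsilon spent under basic sequential composition (hypothetical helper)."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total - self.spent

    def spend(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so that float rounding does not reject an exact fit.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, budget, rng):
    """Release a noisy mean of values clipped to [lower, upper].

    Assumes len(values) is public and that neighboring datasets differ by
    replacing one record, so the mean changes by at most (upper - lower) / n.
    """
    budget.spend(epsilon)  # refuse the query if the budget cannot cover it
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    budget = PrivacyBudget(total_epsilon=1.0)
    ages = [23, 35, 41, 52, 67, 29, 38]
    print("noisy mean:", dp_mean(ages, 0, 100, 0.5, budget, rng))
    print("epsilon remaining:", budget.remaining)
```

Splitting a fixed total epsilon across queries is the trade-off a budgeting interface has to expose: a larger share for one statistic means less noise there and less budget for everything else.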


http://privacytools.seas.harvard.edu/

web: mercecrosas.com twitter : @mercecrosas IQSS, Harvard University

DataTags Levels

Tag Type | Description | Security Features | Access Credentials
Blue | Public | Clear storage, Clear transmit | Open
Green | Controlled public | Clear storage, Clear transmit | Email- or OAuth Verified Registration
Yellow | Accountable | Clear storage, Encrypted transmit | Password, Registered, Approval, Click-through DUA
Orange | More accountable | Encrypted storage, Encrypted transmit | Password, Registered, Approval, Signed DUA
Red | Fully accountable | Encrypted storage, Encrypted transmit | Two-factor authentication, Approval, Signed DUA
Crimson | Maximally restricted | Multi-encrypted storage, Encrypted transmit | Two-factor authentication, Approval, Signed DUA

DataTags and their respective policies. Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The Datatags System. Technology Science. 2015.
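The DataTags levels lend themselves to a machine-readable form. The following is a minimal sketch, not the DataTags system's own data model: the names `Tag`, `Policy`, `POLICIES`, and `requires_signed_dua` are hypothetical, and the numeric ordering of the tags from least to most restrictive is an assumption based on the order of the table. The field values are transcribed from the table.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class Tag(IntEnum):
    """DataTags levels; the numeric order (least to most restrictive) is assumed."""
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


@dataclass(frozen=True)
class Policy:
    description: str
    storage: str
    transmit: str
    credentials: Tuple[str, ...]


# Values transcribed from the DataTags Levels table.
POLICIES = {
    Tag.BLUE: Policy("Public", "Clear", "Clear", ("Open",)),
    Tag.GREEN: Policy("Controlled public", "Clear", "Clear",
                      ("Email- or OAuth Verified Registration",)),
    Tag.YELLOW: Policy("Accountable", "Clear", "Encrypted",
                       ("Password", "Registered", "Approval", "Click-through DUA")),
    Tag.ORANGE: Policy("More accountable", "Encrypted", "Encrypted",
                       ("Password", "Registered", "Approval", "Signed DUA")),
    Tag.RED: Policy("Fully accountable", "Encrypted", "Encrypted",
                    ("Two-factor authentication", "Approval", "Signed DUA")),
    Tag.CRIMSON: Policy("Maximally restricted", "Multi-encrypted", "Encrypted",
                        ("Two-factor authentication", "Approval", "Signed DUA")),
}


def requires_signed_dua(tag):
    """True if the tag's access credentials include a signed data use agreement."""
    return "Signed DUA" in POLICIES[tag].credentials


if __name__ == "__main__":
    for tag in Tag:
        policy = POLICIES[tag]
        print(f"{tag.name:8} {policy.description:22} "
              f"storage={policy.storage}, transmit={policy.transmit}")
```

A repository could use such a lookup to decide, for a given tag, which storage and transmission settings and which credential checks to enforce.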


The DataTags automated interview …

[Diagram labels: Data Owner; Data User; Data deposit; Deposit license; Legal formalization; Local practices; CMR formalization; FERPA formalization; License generation; Data use agreement; Data; Logic rules; License text; Recommended data tag; Licenses; Conditions for release and deposit; Questions; Answers; Logic rules]

Robot Lawyers

Bridging Law & CS Definitions of Privacy

Broader Impacts

Other Accomplishments

Co-PIs & Senior Personnel (continued)
• Latanya Sweeney, co-PI, IQSS
• Edoardo Airoldi, co-PI, Harvard Stats Dept
• Gary King, co-PI, IQSS
• Marco Gaboardi, University of Buffalo
• David O’Brien, Sr. Researcher, Berkman Klein Center

Marco Gaboardi, James Honaker, Gary King, Kobbi Nissim, Jonathan Ullman, and Salil Vadhan. “PSI (Ψ): a Private data Sharing Interface.” Poster at Theory and Practice of Differential Privacy (TPDP) and arXiv:1609.04340, 2016.

Automated Interviews

DataTags Levels

Personnel associated with Robot Lawyers:
• Micah Altman
• Stephen Chong
• Alexandra Wood
• Obasi Shaw
• Aaron Bembenek
• Kevin Wang

Argue that Differential Privacy Satisfies FERPA and other privacy laws via two arguments:

1. The FERPA privacy standard is relevant for analyses computed with DP
   A legal argument supported by a technical argument

2. Differential privacy satisfies the FERPA privacy standard
   A technical argument supported by a legal argument

FERPA allows dissemination of de-identified information → sufficient to show that DP analyses result in an outcome that is not identifiable

Extract a mathematical definition of privacy from FERPA and provide a mathematical proof that DP satisfies this definition

K. Nissim, A. Bembenek, A. Wood, M. Bun, M. Gaboardi, U. Gasser, D. O'Brien, T. Steinke, and S. Vadhan. 2016. “Bridging the Gap between Computer Science and Legal Approaches to Privacy.” In Privacy Law Scholars Conference (PLSC), 2016.
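For reference, the property that such a proof would start from is the standard definition of differential privacy, given here in its usual textbook form. This is not the FERPA-derived definition extracted in the cited paper, which the poster does not reproduce. The symbols $M$, $D$, $D'$, $S$, and $\varepsilon$ are the conventional ones and do not appear on the poster.

```latex
% A randomized mechanism M is \varepsilon-differentially private if, for every
% pair of datasets D and D' that differ in one individual's record, and for
% every set S of possible outputs,
\[
  \Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,].
\]
```

Informally, for small $\varepsilon$ the output distribution is nearly the same whether or not any one individual's record is in the data, which is the intuition behind arguing that a DP release is not identifiable.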

[Diagram labels: Data File Deposit; Sensitive Dataset; Direct Access; Privacy Preserving Access; Two-factor Authentication, Signed DUA; Automated Interview; Review Board Approval; Robot Lawyers; PSI: Differential Privacy Tool]

• Infrastructure for research in social science and other human subjects research fields

• Training in multidisciplinary research: ≈ 100 students, postdocs, interns from law, computer science, social science, statistics

• Policy impact: White House Big Data Privacy Study, National Privacy Research Strategy, NIST 800-188 Deidentifying Government Datasets, Federal Trade Commission

• Numerous workshops and symposia organized, including public symposium “Privacy in a Networked World” w/700+ registrants.

• New journal “Technology Science” utilizing DataTags

• Open-access pedagogical materials on data privacy for many audiences

• Many theoretical results illuminating the limits of differential privacy (lower bounds, algorithms, hardness results, attacks).

• Theoretical and empirical work bridging differential privacy & statistical inference (confidence intervals, hypothesis testing, Bayesian posterior sampling).

• Framework for modern privacy analysis: catalogue privacy controls, identify information uses, threats, and vulnerabilities, and design data programs that align these over data lifecycle.
