Privacy Framework for RDF Data Mining Master’s Thesis Project Proposal By: Yotam Aron

Page 1: Privacy Framework for RDF Data Mining

Privacy Framework for RDF Data Mining

Master’s Thesis Project Proposal
By: Yotam Aron

Page 2: Privacy Framework for RDF Data Mining

Overview

• Motivation and Goal
• Background
• Proposed Solution and Design
• Example
• Conclusion

Page 3: Privacy Framework for RDF Data Mining

Motivation

• Data mining continues to become more widespread.
  ◦ Useful for research, public policy, etc.
• Want to maintain privacy of participants in the database.
• Little work has been done on privacy for semantic web data.

Page 4: Privacy Framework for RDF Data Mining

Previous Work

• Anonymization
• K-Anonymity [1]
• Differential privacy systems: PINQ [2], AIRAVAT [3].
• Drawbacks:
  ◦ Do not apply to semantic web data.
  ◦ Do not support SPARQL.

Page 5: Privacy Framework for RDF Data Mining

Goal

• Develop a system that protects dataset participants’ personal data when it is queried through SPARQL.
• Integrate well with existing SPARQL endpoints.
• Keep it relatively easy for both the user and the administrator to use.

Page 6: Privacy Framework for RDF Data Mining

Background

• Rule-based Privacy Policies in AIR
• Differential Privacy

Page 7: Privacy Framework for RDF Data Mining

Rule-based Privacy Policies in AIR [4]

• Rules define patterns in a SPARQL query.
• If a pattern is matched, the rule infers compliance or non-compliance of the incoming SPARQL query.

Page 8: Privacy Framework for RDF Data Mining

AIR Example [5]

AIR Policy (extract):

    air:if {
        :W s:TriplePattern :T .
        :T log:includes { :X type:F :V } .
    };
    air:then [
        air:description ("type:F was selected in " q:QUERY) ;
        air:assert { q:QUERY air:non-compliant-with q:Policy4 . }
    ] .

Query:

    SELECT ?s WHERE { ?s type:F ?p }

AIR will show that the query is non-compliant with Policy4.
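The matching step can be pictured with a small Python sketch. This is not AIR (which is an N3 rule reasoner); it is a toy stand-in that assumes a policy can be reduced to a set of forbidden predicates, and the names `FORBIDDEN_PREDICATES`, `triple_patterns`, and `violated_policies` are hypothetical. The parser only handles very simple WHERE blocks like the one above.

```python
import re

# Hypothetical stand-in for an AIR policy file: each forbidden predicate
# maps to the policy that a matching query would violate.
FORBIDDEN_PREDICATES = {"type:F": "Policy4"}


def triple_patterns(query: str) -> list[tuple[str, str, str]]:
    """Extract (subject, predicate, object) patterns from a simple WHERE block.

    Toy parser: splits the block on '.', so it does not handle full IRIs,
    decimals, or nested graph patterns.
    """
    match = re.search(r"\{(.*)\}", query, flags=re.DOTALL)
    if match is None:
        return []
    patterns = []
    for chunk in match.group(1).split("."):
        terms = chunk.split()
        if len(terms) == 3:
            patterns.append((terms[0], terms[1], terms[2]))
    return patterns


def violated_policies(query: str) -> list[str]:
    """Return the policies whose forbidden predicate appears in the query."""
    hits = {
        FORBIDDEN_PREDICATES[predicate]
        for _, predicate, _ in triple_patterns(query)
        if predicate in FORBIDDEN_PREDICATES
    }
    return sorted(hits)


print(violated_policies("SELECT ?s WHERE {?s type:F ?p}"))
```

A query that never mentions `type:F` yields an empty list, which corresponds to the compliant case.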

Page 9: Privacy Framework for RDF Data Mining

Differential Privacy Overview

• Minimize probability of privacy breach.
• Maximize statistical accuracy.
• Definition requires that, given two similar datasets, a query function on those two datasets gives similar results with high probability.
• Makes no assumptions on the underlying dataset.

Page 10: Privacy Framework for RDF Data Mining

Differential Privacy

Definition: We say a randomized computation M provides ɛ-differential privacy if for any two data sets A and B, and any set of possible outputs S ⊆ Range(M),

Pr[M(A) ∈ S] ≤ Pr[M(B) ∈ S] × exp(ɛ × |A ⊕ B|).

Here |A ⊕ B| is the number of records in which A and B differ.
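As a numerical illustration of the inequality, the sketch below assumes M is a COUNT query with Laplace noise of scale 1/ɛ (a standard choice, not something the slides specify) and compares output densities for two datasets whose true counts are 2 and 1, so |A ⊕ B| = 1. The function names and the grid of outputs are illustrative assumptions.

```python
import math


def laplace_pdf(x: float, mean: float, scale: float) -> float:
    """Density of the Laplace distribution centred at `mean`."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)


def max_density_ratio(count_a: float, count_b: float, epsilon: float, outputs) -> float:
    """Largest ratio of output densities over a grid of possible outputs."""
    scale = 1.0 / epsilon  # COUNT changes by at most 1 per record
    return max(
        laplace_pdf(x, count_a, scale) / laplace_pdf(x, count_b, scale)
        for x in outputs
    )


epsilon = 0.5
outputs = [x / 10.0 for x in range(-200, 201)]  # grid from -20 to 20
ratio = max_density_ratio(2, 1, epsilon, outputs)
bound = math.exp(epsilon * 1)  # exp(epsilon * |A xor B|) with |A xor B| = 1

print(ratio <= bound * (1 + 1e-9))
```

The ratio never exceeds the bound, and it reaches the bound for outputs at or above the larger count.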

Page 11: Privacy Framework for RDF Data Mining

Differential Privacy in Practice

• Each user is given an ɛ value that cannot be exceeded.
• Each query qi has some noise value ɛi. In total, the user’s queries must satisfy the property Σi ɛi ≤ ɛ.
• Noise (usually Laplace), which depends on the aggregate function, is added with a variance that grows as ɛi shrinks.
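Written out, the budget property and the noise variance take the following form under the standard Laplace mechanism. The symbol Δf (the sensitivity of the aggregate function, i.e. the most one record can change its value) and the noise variable η_i are assumptions introduced here, not notation from the slides.

```latex
% Budget: the per-query values must not add up to more than the user's total.
\[
  \sum_{i} \epsilon_i \;\le\; \epsilon
\]
% Laplace mechanism: noise for query q_i is drawn with scale \Delta f / \epsilon_i,
% and a Laplace variable with scale b has variance 2 b^2.
\[
  \eta_i \sim \mathrm{Lap}\!\left(\frac{\Delta f}{\epsilon_i}\right),
  \qquad
  \mathrm{Var}(\eta_i) = 2\left(\frac{\Delta f}{\epsilon_i}\right)^{2}
\]
```

For example, a COUNT query (Δf = 1) answered with ɛi = 1.0 gets noise of variance 2, while the same query with ɛi = 0.1 gets noise of variance 200.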

Page 12: Privacy Framework for RDF Data Mining

Limitations of Differential Privacy

• Only statistical data protected.
• High variance in data yields poor query results.
• Theory not always perfect in practice.
  ◦ Assumes no collusion among users.
  ◦ Covert channel attacks. [6]
  ◦ What value of ɛ to choose?

Page 13: Privacy Framework for RDF Data Mining

Example, No DP

Name     Salary
Alice    31,000
Bob      47,000
Charlie  20,000
David    21,000

SELECT COUNT(Name) WHERE (Age < 25)

Result: 2

Page 14: Privacy Framework for RDF Data Mining

Example, No DP

Name     Salary
Alice    31,000
Bob      47,000
Charlie  20,000

SELECT COUNT(Name) WHERE (Age < 25)

Result: 1

Big difference in answers!

Page 15: Privacy Framework for RDF Data Mining

Example, With DP

Name     Salary
Alice    31,000
Bob      47,000
Charlie  20,000
David    21,000

SELECT COUNT(Name) WHERE (Age < 25)

Result: 2 + noise ≈ 2 (with high probability)

Page 16: Privacy Framework for RDF Data Mining

Example, With DP

Name     Salary
Alice    31,000
Bob      47,000
Charlie  20,000

SELECT COUNT(Name) WHERE (Age < 25)

Result: 1 + noise ≈ 2 (with high probability)

With high probability, records are indistinguishable!
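The two "With DP" slides can be mimicked with a short Python sketch of a Laplace-noised count. Several things here are assumptions: the slides' query filters on Age, which is not a column in the table shown, so the sketch filters on salary below 25,000 purely for illustration; ɛ = 1.0 is arbitrary; and `laplace_noise` and `noisy_count` are hypothetical helper names.

```python
import random

ROWS_WITH_DAVID = [
    ("Alice", 31000),
    ("Bob", 47000),
    ("Charlie", 20000),
    ("David", 21000),
]
ROWS_WITHOUT_DAVID = ROWS_WITH_DAVID[:3]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(rows, predicate, epsilon: float, rng: random.Random) -> float:
    """Count matching rows, then add noise calibrated to sensitivity 1."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def low_salary(row) -> bool:
    # Assumed filter; stands in for the slides' (Age < 25) condition.
    return row[1] < 25000


rng = random.Random(0)
print(noisy_count(ROWS_WITH_DAVID, low_salary, 1.0, rng))
print(noisy_count(ROWS_WITHOUT_DAVID, low_salary, 1.0, rng))
```

Because the noise has the same order of magnitude as the difference between the true counts (2 and 1), a single answer does not reveal which of the two tables was queried.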

Page 17: Privacy Framework for RDF Data Mining

Practical Consequences of DP

• An individual’s inclusion in the dataset is unlikely to be a privacy risk.
• The answers to the queries can still be useful.

Page 18: Privacy Framework for RDF Data Mining

Achieving Differential Privacy in RDF

• Current techniques for differential privacy were developed for relational databases.
• As a first approximation, reduce the triple-store to a relational database.
• Improve the mechanism as the project progresses.

Page 19: Privacy Framework for RDF Data Mining

Example of RDF-RDBS Reduction

    :Person1 foaf:name "Alice";
        foaf:member :DIG;
        foaf:age "21";
        foaf:knows :Person2, :Person3.

    :Person2 foaf:name "Bob";
        foaf:member :DIG;
        foaf:knows :Person3.

    :Person3 foaf:name "Charlie";
        foaf:age "22".

ID       foaf:name  foaf:member  foaf:knows          foaf:age
Person1  "Alice"    DIG          [Person2, Person3]  "21"
Person2  "Bob"      DIG          [Person3]           None
Person3  "Charlie"  None         None                "22"
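The reduction in this example can be sketched in Python as a pivot from triples to one row per subject. This is a minimal illustration under stated assumptions, not the project's mechanism: the triples are typed in by hand rather than parsed from RDF, and `MULTI_VALUED` and `to_rows` are hypothetical names.

```python
from collections import defaultdict

# The example graph as (subject, predicate, object) triples.
TRIPLES = [
    ("Person1", "foaf:name", "Alice"),
    ("Person1", "foaf:member", "DIG"),
    ("Person1", "foaf:age", "21"),
    ("Person1", "foaf:knows", "Person2"),
    ("Person1", "foaf:knows", "Person3"),
    ("Person2", "foaf:name", "Bob"),
    ("Person2", "foaf:member", "DIG"),
    ("Person2", "foaf:knows", "Person3"),
    ("Person3", "foaf:name", "Charlie"),
    ("Person3", "foaf:age", "22"),
]

# Assumption: predicates listed here keep all their values as a list.
MULTI_VALUED = {"foaf:knows"}


def to_rows(triples):
    """Pivot triples into {subject: {predicate: value}} with None for gaps."""
    columns = sorted({predicate for _, predicate, _ in triples})
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)

    rows = {}
    for subject, props in grouped.items():
        row = {}
        for column in columns:
            values = props.get(column, [])
            if column in MULTI_VALUED:
                row[column] = values or None
            else:
                row[column] = values[0] if values else None
        rows[subject] = row
    return rows


rows = to_rows(TRIPLES)
for subject, row in rows.items():
    print(subject, row)
```

Missing properties become `None`, matching the empty cells in the table, and each subject becomes the record that a relational differential privacy technique would treat as one participant.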

Page 20: Privacy Framework for RDF Data Mining

Proposed Solution

• SPARQL Privacy Insurance Module (SPIM)
• Build layer between user and endpoint.
• Integrate both AIR and differential privacy.
• Integrate credential-checking system.
• Modify existing differential privacy framework for use with triple-stores.

Page 21: Privacy Framework for RDF Data Mining

Contributions

• Complete privacy protection for triplestores.
• Differential privacy sensitivity for SPARQL 1.1 aggregate functions, including count, sum, avg, min, and max.
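To make "sensitivity" concrete, the sketch below gives textbook global sensitivities for some of these aggregates. It assumes every value is clipped to a known range [lower, upper], that neighbouring datasets differ by adding or removing one record, and that the dataset stays non-empty; the slides do not say how the project derives its sensitivities, and `global_sensitivity` and `laplace_scale` are hypothetical names. AVG is left out because it is usually handled differently, for instance as a noisy sum divided by a noisy count.

```python
def global_sensitivity(aggregate: str, lower: float, upper: float) -> float:
    """Most that one added or removed record in [lower, upper] can change the result."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if aggregate == "count":
        return 1.0
    if aggregate == "sum":
        return float(max(abs(lower), abs(upper)))
    if aggregate in ("min", "max"):
        # Worst case: the extreme value moves across the whole range.
        return float(upper - lower)
    raise ValueError(f"no sensitivity defined in this sketch for {aggregate!r}")


def laplace_scale(aggregate: str, lower: float, upper: float, epsilon: float) -> float:
    """Scale of the Laplace noise for one query answered with `epsilon`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return global_sensitivity(aggregate, lower, upper) / epsilon


print(laplace_scale("count", 0, 100, 1.0))
print(laplace_scale("sum", 0, 100, 0.5))
```

The min and max bounds equal the full value range, which shows why those aggregates tend to give poor accuracy under this simple approach.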

Page 22: Privacy Framework for RDF Data Mining

System Overview

Page 23: Privacy Framework for RDF Data Mining

[System diagram. Components: SPIM Privacy Module, TAAC Credential Checking, AIR Rule Based Privacy, Differential Privacy Module, SPARQL Endpoint, User Interface, Policy Files, User Data, Service Description.]

Page 25: Privacy Framework for RDF Data Mining

• TAAC will:
  ◦ Verify that the user has permission to access the triple-store.
  ◦ Send the central module data about the user.

Page 26: Privacy Framework for RDF Data Mining

• SPIM:
  ◦ Controls order of privacy operations.
  ◦ Interfaces with the SPARQL endpoint.

Page 27: Privacy Framework for RDF Data Mining

• AIR:
  ◦ Reasoner that uses rule-based policies to check queries for privacy hazards.
  ◦ Extracts information for differential privacy.

Page 28: Privacy Framework for RDF Data Mining

• Policy Files:
  ◦ Contain the rules for AIR.

Page 29: Privacy Framework for RDF Data Mining

• Differential Privacy Module:
  ◦ Checks query limits (based on ɛ use).
  ◦ Applies noise to statistical data.

Page 30: Privacy Framework for RDF Data Mining

• User Data:
  ◦ Contains user ɛ data.

Page 32: Privacy Framework for RDF Data Mining

• Service Description:
  ◦ Contains information to be used for the addition of noise.

Page 33: Privacy Framework for RDF Data Mining

• Miscellaneous:
  ◦ Interface to SPARQL Endpoint
  ◦ Transaction File
  ◦ Improved Differential Privacy Output
  ◦ Service Description Generator

Page 34: Privacy Framework for RDF Data Mining

• Potential Extensions:
  ◦ Robustness against attacks
  ◦ Concurrency
  ◦ Optimization for large systems
  ◦ Customizable UI
  ◦ Accountability

Page 35: Privacy Framework for RDF Data Mining

Sample Scenario

• Triplestore data mining in biotechnological applications.
• Biofirm provides data about hospitals in the US.
• Alice is a PhD student at MIT.
• Alice would like to query Biofirm’s database for research purposes. She just got permissions yesterday and is logging in for the first time.

Page 36: Privacy Framework for RDF Data Mining

Preprocessing

• Biofirm installs SPIM and runs the service description generation code.
  ◦ May need to create the correct interface.
• Makes sure the UI is accessible online.

Page 37: Privacy Framework for RDF Data Mining

Sample Compliant Query

• Alice would like to know the total number of visits that Boston hospitals received.

    SELECT (SUM(?s) AS ?people) WHERE {
        ?h a biofirm:Hospital .
        ?h biofirm:visits ?s .
        ?h biofirm:location geo:Boston .
    }

Epsilon value: 1.0

Page 38: Privacy Framework for RDF Data Mining

• Alice enters the query into the provided user interface.

Page 39: Privacy Framework for RDF Data Mining

• TAAC ensures that Biofirm has given Alice access to its triple-store.

Page 40: Privacy Framework for RDF Data Mining

• The query request arrives at the SPIM central module.

Page 41: Privacy Framework for RDF Data Mining

• Policyrunner is called upon to check the query for triple patterns that are in violation.
• No violations found.
• Since this is Alice’s first time, AIR extracts what type of permissions Alice has.

Page 42: Privacy Framework for RDF Data Mining

• SPIM creates a profile for Alice.
  ◦ Gives her an ɛ value (suppose it is 2.0).
  ◦ Stores it in the triple store.

Page 43: Privacy Framework for RDF Data Mining

• SPIM extracts which variables will yield statistical results and will have differential privacy applied.

Page 44: Privacy Framework for RDF Data Mining

• The Differential Privacy module ensures that the query will not exceed the user’s given epsilon value.

Page 45: Privacy Framework for RDF Data Mining

• This is Alice’s first time, so her epsilon value is 2.0, and the epsilon for this query is 1.0. Everything looks good.

Page 46: Privacy Framework for RDF Data Mining

• Query is sent to the endpoint.
• Results are received.

Page 47: Privacy Framework for RDF Data Mining

• The Differential Privacy module adds noise to the appropriate fields and updates epsilon values.

Page 48: Privacy Framework for RDF Data Mining

• SPIM is ready to return the results.

Page 49: Privacy Framework for RDF Data Mining

• Alice receives results.
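The order of operations in this walkthrough can be summarized in a Python sketch. Everything in it is a simplification made for illustration: the class `SpimSketch` and its method names are hypothetical, the credential and policy checks are reduced to set lookups, the endpoint is a plain callable, and the true answer (5000) and sensitivity (100) in the usage lines are made-up numbers, not values from the slides.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class SpimSketch:
    authorized_users: Set[str]          # stands in for TAAC credential checking
    forbidden_predicates: Set[str]      # stands in for the AIR policy files
    default_epsilon: float = 2.0        # budget given to a first-time user
    budgets: Dict[str, float] = field(default_factory=dict)  # stands in for User Data
    rng: random.Random = field(default_factory=random.Random)

    def handle(
        self,
        user: str,
        predicates: Iterable[str],
        epsilon: float,
        sensitivity: float,
        run_query: Callable[[], float],
    ) -> float:
        # 1. Credential check: has the user been given access?
        if user not in self.authorized_users:
            raise PermissionError(f"{user} has no access to the triple-store")
        # 2. Rule-based check: does the query touch a forbidden pattern?
        if self.forbidden_predicates.intersection(predicates):
            raise PermissionError("query is non-compliant with policy")
        # 3. First-time users get a profile with the default budget.
        remaining = self.budgets.setdefault(user, self.default_epsilon)
        # 4. Budget check before the endpoint is contacted.
        if epsilon <= 0 or sensitivity <= 0 or epsilon > remaining:
            raise ValueError("query would exceed the user's epsilon budget")
        # 5. Send the query to the endpoint and receive the true result.
        true_value = run_query()
        # 6. Add Laplace noise and charge the budget, then return.
        scale = sensitivity / epsilon
        noise = self.rng.expovariate(1.0 / scale) - self.rng.expovariate(1.0 / scale)
        self.budgets[user] = remaining - epsilon
        return true_value + noise


spim = SpimSketch({"alice"}, {"type:F"}, rng=random.Random(0))
result = spim.handle(
    "alice", ["biofirm:visits", "biofirm:location"], 1.0, 100.0, lambda: 5000.0
)
print(result, spim.budgets["alice"])
```

After this first query Alice has 1.0 of her 2.0 budget left, so one more query at ɛ = 1.0 would be accepted and a third would be rejected.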

Page 50: Privacy Framework for RDF Data Mining

Summary

• System will combine rule-based privacy with differential privacy.
• Develop differential privacy techniques for semantic web data.
• Make privacy module client and administrator friendly.