

Information-Flow Control for Location-based Services

Nishkam Ravi
Joint work with Marco Gruteser*, Liviu Iftode
Computer Science, *Winlab, Rutgers University

Motivation

• Personal data is commonly used in Internet-based computing
  – Social security number
  – Credit card information
  – Contact information

• User concerns
  – Where is my data going?
  – How is it being used?

• Identity theft incidents prevalent

• Database community working on countering illegitimate use of private information


Privacy

• Sharing sensitive information while preserving privacy is a challenging task

• Access control is not sufficient
  – No control over data after it is read and shared

• Need to restrain flow of information

[Figure: access control guards the initial read of the credit card and social security numbers, but not where they flow afterwards]

Privacy Solutions

• Prevention
  – Anonymization/pseudonymization
  – Data suppression/cloaking

• Avoidance
  – Information-flow control
  – End-to-end policies

• Cure
  – Tracking illegitimate flow of information
  – Punishing the adversary


Context-aware Computing

• Shift from “internet” to “ubiquitous” computing

• Ubiquitous computing heavily relies on user context
  – Location
  – Activity
  – Environment

• Context is dynamic in nature
  – Changes with time and space

Location-based Services

• Location is deemed the most important context information
• Immense interest in location-based services (LBS)

[Figure: examples of LBS: traffic jams and accidents, gas-station locations, restaurants, 911, preferential billing, asset tracking, personnel tracking]

Location Privacy

• Potential for privacy abuse
  – They know where I am!

• More serious consequences
  – Location information could aid in criminal investigations

• Recognized by the US government
  – "Location Privacy Protection Act, 2001"
  – "Wireless Privacy Protection Act, 2003"

Solutions for Location Privacy

• k-anonymization using spatial/temporal cloaking [Gruteser '03]
• Instead of disclosing the exact location, disclose an interval (sketched below):

  (x, y) → ([x1, x2], [y1, y2]),  where x1 < x < x2 and y1 < y < y2

[Figure: a cloaking rectangle [x1, x2] × [y1, y2] containing k = 3 users]
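To make the cloaking idea concrete, here is a minimal sketch (not the algorithm from [Gruteser '03]; the class and method names are hypothetical): grow a square region around the user's position until it covers at least k users, and report the region instead of the exact point.

// Illustrative sketch of k-anonymous spatial cloaking (hypothetical names, not the cited algorithm).
import java.util.List;

class CloakingBox {
    double x1, x2, y1, y2;                       // interval [x1, x2] x [y1, y2]
    CloakingBox(double x1, double x2, double y1, double y2) {
        this.x1 = x1; this.x2 = x2; this.y1 = y1; this.y2 = y2;
    }
    boolean contains(double x, double y) {
        return x1 <= x && x <= x2 && y1 <= y && y <= y2;
    }
}

class SpatialCloaker {
    /** Grow a square box around (x, y) until it contains at least k users (assumes k users exist nearby). */
    static CloakingBox cloak(double x, double y, int k, List<double[]> otherUsers, double step) {
        double half = step;
        while (true) {
            CloakingBox box = new CloakingBox(x - half, x + half, y - half, y + half);
            int count = 1;                                   // the requesting user herself
            for (double[] u : otherUsers) {
                if (box.contains(u[0], u[1])) count++;
            }
            if (count >= k) return box;                      // k users now share this box
            half += step;                                    // otherwise enlarge and retry
        }
    }
}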

How good is location cloaking?

• Cannot support applications which need precise location information

• Value of k not tailored for services

• Quality of service suffers
  – Inferior accuracy of results

• Can we have a framework + information-flow control model that preserves both location privacy and quality of service?

Framework for service-specific location privacy

[Figure: cars report their Location to a Trusted Server; the Location-Based Service sends a function f and data d to the server and receives back Results: f(x, y, d)]

• Locations of subjects are maintained on a trusted server
• When an LBS needs location information, it migrates a piece of code to the trusted server (sketched below)
• The code executes, reads location information and returns a result
  – Distance
  – Density, average speed
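A rough sketch of this code-migration interface follows (the interface and class names are hypothetical, not the authors' API): the LBS supplies a function, the trusted server runs it over the stored locations, and only the scalar result leaves the server, provided the code has passed the location-safety check discussed next.

// Illustrative sketch of the code-migration framework (hypothetical API).
import java.util.List;

/** Code migrated by the LBS: reads locations and returns an aggregate such as distance or density. */
interface LocationFunction {
    double evaluate(List<double[]> locations, double[] serviceData);
}

class TrustedLocationServer {
    private final List<double[]> locations;      // (x, y) of the subjects, kept on the server

    TrustedLocationServer(List<double[]> locations) { this.locations = locations; }

    /** Run LBS-supplied code only if it has been verified to be location safe. */
    double execute(LocationFunction f, double[] serviceData, boolean verifiedLocationSafe) {
        if (!verifiedLocationSafe) {
            throw new SecurityException("code rejected: it may leak location information");
        }
        return f.evaluate(locations, serviceData);   // only the result f(x, y, d) leaves the server
    }
}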

Example Applications

• Application of density and average speed
  – Traffic information service

• Application of the distance function
  – Geographical routing service

Main Problem

• The trusted server needs to ensure that the code is location safe
  – It should not leak location information

Information-flow Control

• Information-flow control models restrict the flow of sensitive information in a program/system

• State of the art: non-interference
  – Isolates public data from private data

int f(int a, int b) {
    int c = (a + b)/2;
    output(c);
}

Private: a, b    Public: c

Isolation broken: the public output c is computed from the private inputs a and b

Unix-style Password Checker

byte check(byte username, byte password) {
    byte match = 0;
    for (int i = 0; i < database.length; i++) {
        if (hash(username, password) == hash(salts[i], passwords[i])) {
            match = 1;      // record that some database entry matched
            break;
        }
    }
    output(match);
}

Value of match depends on private variables

Violates Non-Interference

Non-inference

• In many real systems, including LBS, data isolation is not possible

• We propose a new model of information-flow control that
  – allows public data to be derived from private data
  – requires that the adversary cannot infer private data from public data from a single execution of the program

• Example:

int f(int a, int b) {
    int c = (a + b)/2;
    output(c);
}

Neither the value of a nor the value of b can be inferred from c

Non-inference satisfied

Theoretically…

• Non-inference is undecidable in general

• Decidable for independent executions/uni-directional information flow

Independent Executions

Example:

int f(int a, int b, int i) {
    int c;
    if (i > 1)
        c = (a + b)/2;
    else
        c = (a * b);
    output(c);
}

If a, b are x-coordinates of two cars, their values would be different for the two executions

Private: a, b Public: i, c

a and b could be derived if both (a + b)/2 and (a * b) were computed from the same values; with independent executions the adversary only observes (a1 + b1)/2 and (a2 * b2), which involve different values

Protection Systems [Ullman 1976]

{(S, O, P), R, Op, C}: a configuration (S, O, P) with subjects S, objects O and access matrix P, together with rights R, primitive operations Op, and commands C

[Figure: access matrix with subjects s1, s2, s3 and objects o1, o2, o3; entries are sets of rights such as {read} and {write}]

Primitive operations: enter r into (s1, o1), delete r from (s1, o1), create subject s1, create object o1

command c(s1, s2, s3, o1, o2, o3) {
    if {read} in (s1, o2) {
        enter {read} into (s1, o2);
        enter {write} into (s1, o3);
    }
}

Executing a command takes a configuration Q to a new configuration Q'

Safety: can c on Q leak right r?
  – Undecidable in general
  – Decidable without the create primitive

Proof Idea: Non-Inference == Safety

• Undecidability: reduce safety to non-inference
  – Given a configuration Q, find an equivalent program M

• Decidability: reduce non-inference to safety without create
  – Given a program M, find an equivalent configuration Q
  – No create primitive == independent executions

[Figure: a 0/1 flow matrix between private inputs p1, p2, p3 and outputs o1, o2, o3, viewed as an access-matrix configuration]

Outline

• Motivation

• Non-Inference

• Decidability

• Enforcement

• Evaluation

Deciding Non-inference: Overview

• Derive information-flow relations for a program
  – Static analysis
  – Abstract interpretation

• Rewrite the information-flow relations as linear equations, and apply the theory of solvability of linear equations
  – We assume all input and output variables are scalars
  – The type length of variables is determined by the minimum number of bits required to store location information (1 byte for now)

Information-flow relations: R1

int f(int a, int b) {
    int c = (a + b)/2;
    output(c);
}

V = {a, b, c}, E = {(a+b)/2}, P = {a, b}, O = {c}

R1(v, e): “the value of variable v may be used in evaluating e”

R1(a, (a+b)/2) = 1, R1(b, (a+b)/2) = 1

Information-flow relations: R2

int f(int a, int b) {
    int c = (a + b)/2;
    output(c);
}

V = {a, b, c}, E = {(a+b)/2}, P = {a, b}, O = {c}

R2(e, v): “value of expression e may be used in evaluating variable v”

R2((a+b)/2, c) = 1

Information-flow relations: R3

int f(int a, int b){

int c = (a + b)/2;

output (c);

}

V = {a, b, c}, E = {(a+b)/2}, P = {a, b}, O = {c}

R3(v1, v2): "value of variable v1 may be used in evaluating variable v2"

R3 = R1R2 ∪ A, where A is the set of assignments

R3(a, c) = 1, R3(b, c) = 1

M = R3(P, O) = [1 1]^T
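As an illustration of how these relations can be represented (a toy sketch only, using boolean matrices for the two-line program above, with R3 = R1R2 ∪ A as on the slide):

// Boolean-matrix sketch of R3 = R1R2 ∪ A for f(a, b) { c = (a + b)/2; output(c); } (illustrative only).
class FlowRelations {
    // Variable indices: 0 = a, 1 = b, 2 = c; single expression e0 = (a + b)/2.
    static boolean[][] R1 = { {true}, {true}, {false} };   // V x E: variable used in expression
    static boolean[][] R2 = { {false, false, true} };      // E x V: expression used in variable
    static boolean[][] A  = new boolean[3][3];             // direct assignments (none here)

    /** R3 = (R1 · R2) ∪ A: boolean matrix product followed by a union. */
    static boolean[][] compose() {
        int vars = R1.length, cols = R2[0].length;
        boolean[][] r3 = new boolean[vars][cols];
        for (int i = 0; i < vars; i++) {
            for (int j = 0; j < cols; j++) {
                boolean flow = A[i][j];
                for (int k = 0; k < R2.length; k++) {
                    flow = flow || (R1[i][k] && R2[k][j]);
                }
                r3[i][j] = flow;
            }
        }
        return r3;   // R3(a, c) and R3(b, c) come out true
    }
}

Restricting R3 to the rows for P = {a, b} and the column for O = {c} gives the matrix M used next.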

Linear Equations

• A set of linear equations can be written as:

Ax = B

• Solvable if Rank(A) = Rank([A|B]) = N, where A is a K×N matrix and N is the number of unknowns

• We can show that:

A program satisfies non-inference if

MTP = O and all its subsystems are not solvable

Linear Equations for the Example

MTP = O:   [1 1] [a b]^T = [c]

Rank(MT) = 1 < (|P| = 2)

Not solvable → satisfies non-inference

Approach Overview

• Perform use-def analysis, def-use analysis

• Take transitive closures of def-use and use-def analysis to obtain R1 and R2

• R3 = R1R2 ∪ A

• Store R3(P,O) in matrix M

• Inspect the solvability of MTP = O (a sketch of this rank check follows)
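Here is a minimal sketch of that last step (a generic Gaussian-elimination rank check, not the authors' implementation): a system Ax = B is reported solvable when Rank(A) = Rank([A|B]) = N, with N the number of unknown private variables.

// Generic rank-based solvability check for MTP = O and its subsystems (illustrative only).
class SolvabilityCheck {

    /** Rank of a matrix, computed by Gaussian elimination with partial pivoting. */
    static int rank(double[][] m) {
        int rows = m.length, cols = m[0].length, rank = 0;
        double[][] a = new double[rows][];
        for (int i = 0; i < rows; i++) a[i] = m[i].clone();       // work on a copy
        for (int col = 0; col < cols && rank < rows; col++) {
            int pivot = -1;
            for (int r = rank; r < rows; r++) {
                if (Math.abs(a[r][col]) > 1e-9) { pivot = r; break; }
            }
            if (pivot < 0) continue;                               // no pivot in this column
            double[] tmp = a[rank]; a[rank] = a[pivot]; a[pivot] = tmp;
            for (int r = rank + 1; r < rows; r++) {
                double factor = a[r][col] / a[rank][col];
                for (int c = col; c < cols; c++) a[r][c] -= factor * a[rank][c];
            }
            rank++;
        }
        return rank;
    }

    /** Ax = B has a unique solution iff Rank(A) = Rank([A|B]) = number of unknowns. */
    static boolean solvable(double[][] A, double[] B) {
        int unknowns = A[0].length;
        double[][] augmented = new double[A.length][unknowns + 1];
        for (int i = 0; i < A.length; i++) {
            System.arraycopy(A[i], 0, augmented[i], 0, unknowns);
            augmented[i][unknowns] = B[i];
        }
        return rank(A) == unknowns && rank(augmented) == unknowns;
    }
}

For the two-line example, A = [[1, 1]] with a single right-hand-side entry for c: Rank(A) = 1 < 2 unknowns, so the system is not solvable and non-inference holds; the analyzer repeats this check for every subsystem of MTP = O.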

Example

int f(int x1, int y1, int x2, int y2, int k) {
    int x, y, dist, avg_x, avg_y;
    x = (x2 - x1)^2;
    y = (y2 - y1)^2;
    dist = sqrt(x + y);
    output(dist);
    if (k > 100) {
        avg_x = (x1 + x2)/2;
        avg_y = (y1 + y2)/2;
        output(avg_x);
        output(avg_y);
    }
}

V = {x1, x2, y1, y2, k, x, y, dist, avg_x, avg_y},  P = {x1, x2, y1, y2}

E = {(x2 - x1)^2, (y2 - y1)^2, sqrt(x + y), k > 100, (x1 + x2)/2, (y1 + y2)/2}

Information-flow relations

[Matrices from the slide: R1 over V × E, R2 over E × V, and R3 = R1R2 ∪ A over V × V, all 0/1 matrices; the resulting M = R3(P, O), over P = {x1, x2, y1, y2} and O = {dist, avg_x, avg_y}, records which private inputs may be used in computing which outputs:]

          dist   avg_x   avg_y
    x1     1       1       0
    x2     1       1       0
    y1     1       0       1
    y2     1       0       1

Linear Equations for the Example

The rows of MT over (x1, x2, y1, y2) are: dist → [1 1 1 1], avg_x → [1 1 0 0], avg_y → [0 0 1 1]. MTP = O and every subsystem of it (one per non-empty subset of the outputs) is checked for solvability; |P| counts the private variables appearing in the subsystem:

    {dist}:                Rank = 1 < (|P| = 4)
    {avg_x}:               Rank = 1 < (|P| = 2)
    {avg_y}:               Rank = 1 < (|P| = 2)
    {dist, avg_x}:         Rank = 2 < (|P| = 4)
    {dist, avg_y}:         Rank = 2 < (|P| = 4)
    {avg_x, avg_y}:        Rank = 2 < (|P| = 4)
    {dist, avg_x, avg_y}:  Rank = 3 < (|P| = 4)

None of the subsystems is solvable → satisfies non-inference

Implementation and Evaluation

• Implemented a static analyzer that decides non-inference for Java programs
  – Does not handle inter-procedural data analysis yet
  – Used Soot (API for Java bytecode analysis)
  – Used Indus (API for dataflow analysis)

• Evaluated by testing it on a benchmark
  – Distance (calculates the distance between 2 cars)
  – Speed (calculates speed in a region)
  – Density (calculates the density of cars in a region)
  – Attacks such as Wallet, WalletAttack, PasswordChecker, AverageAttack, IfAttack

Case Study 1: AverageAttack

int average(int x1, int x2, ..., int xn) {
    average = (x1 + x2 + ... + xn)/n;
    output(average);
}

int average_attack(int x1, int x2, ..., int xn) {
    x1 = x3; x2 = x3; x4 = x3; ...; xn = x3;
    average = (x1 + x2 + ... + xn)/n;
    output(average);
}

MTP = O:   [0 0 1 0 ... 0] [x1 x2 x3 x4 ... xn]^T = [average]

This system is solvable → rejected by our analyzer

Case Study 2: Wallet (can there be false negatives?)

int wallet(int p, int q, int c) {
    if (p > c) {
        p = p - c;
        q = q + c;
    }
    output(q);
}

MTP = O: [1][p] = [q]

System is solvable, rejected by our analyzer:
False Negative! (Implicit information flows)

p is private: amount of money in the wallet
q is public: amount of money spent
c is public: cost of an item

Case Study 3: Wallet Attack (how bad are false negatives?)

int wallet_attack(int p, int q, int c) {
    n = length(p);
    while (n >= 0) {
        c = 2^(n-1);          // leaks the value of p bit by bit
        if (p > c) {
            p = p - c;
            q = q + c;
        }
        n = n - 1;
    }
    output(q);
}

MTP = O: [1][p] = [q]

System is solvable, rejected by our analyzer

Running Time of Analysis

Conclusions

• Non-inference : a novel information-flow control model

• Allows information to flow from private to public but not vice-versa

• Enforceable using static analysis for uni-directional information flow

• Applicable to location based services


Probabilistic Validation of Aggregated Data in VANETs

Nishkam Ravi
Joint work with Fabio Picconi, Marco Gruteser*, Liviu Iftode
Computer Science, *Winlab, Rutgers University

Motivation

• Traffic information systems based on V2V data exchange (e.g., TrafficView)

[Figure: cars a–e exchange records (car id, location, speed); relayed records accumulate as a, then a,b, then a,b,c; a malicious car can inject spoofed/bogus information]

• How can data be validated?

Existing Solutions

• Cross-validation (Golle 2004)
  – Cross-validate data against a set of rules
  – Cross-validate data from different cars
  – Assumes: honest cars > malicious cars
  – Assumes multiple sources of information

• Use PKI and strong identities (Hubaux 2005)
  – A tamper-proof box signs data
  – Keys are changed periodically for privacy
  – Cross-validation used
  – High data overhead

• Desired solution: high security, low data overhead

[Figure: per-record format: Location and Speed data (4 bytes) plus Timestamp, Signature, and Certificate (88 bytes)]

Syntactic Aggregation

[Figure: n individually signed records (Location i, Speed i, each with its own Timestamp, Signature, Certificate) are combined into one record listing (Location i', Speed i', id i) for i = 1..n, or just (Location i', id i), carrying a single Timestamp, Signature, and Certificate]

A malicious aggregator can include bogus information

Semantic Aggregation

[Figure: n individually signed records are summarized as "n cars in segment [(x1,y1), (x2,y2)]" or "n cars (id1, id2, ..., idn) in segment [(x1,y1), (x2,y2)]", carrying a single Timestamp, Signature, and Certificate]
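To make the difference between the two aggregation styles concrete, here is a hedged sketch of the record layouts (field names are illustrative, not the actual TrafficView wire format):

// Illustrative record layouts for the two aggregation styles (hypothetical field names).
class SignedRecord {                  // one per car, individually signed
    int carId;
    double location, speed;
    long timestamp;
    byte[] signature, certificate;    // the bulk of the per-record overhead
}

class SyntacticAggregate {            // entries concatenated, one signature for the whole batch
    int[] carIds;
    double[] locations, speeds;
    long timestamp;
    byte[] signature, certificate;
}

class SemanticAggregate {             // entries summarized, one signature for the summary
    int carCount;                     // "n cars in segment [(x1,y1), (x2,y2)]"
    double x1, y1, x2, y2;
    long timestamp;
    byte[] signature, certificate;
}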

Assumptions

• Tamper-proof service
  – Stores keys
  – Signs, timestamps, generates random numbers
  – Provides a transmit buffer

• Applications are untrusted and implement their own aggregation modules

• Principle of economy of mechanism
  – "the protection system's design should be as simple and small as possible"

Tamper-proof Service

• Trusted Computing
  – Every layer of the software stack is attested using a binary hash
  – Only well-known software/applications are allowed to execute

• BIND (Shi, Perrig, van Doorn 2005)
  – Partial attestation
  – Data isolation
  – Provides flexibility

• Implement the tamper-proof service in software
  – Attest it using BIND

Our solution

[Figure: the application writes the aggregated record (Location i', Speed i', id i for i = 1..n) into the transmit buffer; the tamper-proof service adds a Timestamp and a random number r, includes original signed record number (r mod n) (e.g., Location 2, Speed 2 with its Timestamp, Signature, Certificate) as a proof record, and signs and certifies the outgoing message]

• The receiver validates the aggregated record against the proof record
• Multiple random numbers and proof records improve the probability of success (a sketch of the selection step follows)
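A rough sketch of the selection step inside the tamper-proof service (hypothetical names; it assumes the service keeps the original individually signed records, and the actual signing is elided): draw r, pick original record number (r mod n) as the proof record, and sign the aggregate together with that proof so the receiver can detect a disagreement.

// Sketch of proof-record selection in the tamper-proof service (illustrative only).
import java.security.SecureRandom;
import java.util.List;

class TamperProofService {
    private final SecureRandom random = new SecureRandom();

    /** Attach one randomly chosen original signed record as proof for the aggregate. */
    byte[][] buildOutgoingMessage(byte[] aggregatedRecord, List<byte[]> originalSignedRecords) {
        int n = originalSignedRecords.size();
        int r = random.nextInt(Integer.MAX_VALUE);               // random number r
        byte[] proof = originalSignedRecords.get(r % n);         // proof record number (r mod n)
        byte[] timestamp = Long.toString(System.currentTimeMillis()).getBytes();
        byte[] signature = sign(aggregatedRecord, proof, timestamp, r);
        return new byte[][] { aggregatedRecord, proof, timestamp, signature };
    }

    private byte[] sign(byte[] aggregate, byte[] proof, byte[] timestamp, int r) {
        // Placeholder: the real service would compute a PKI signature over all fields.
        return new byte[0];
    }
}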

Evaluation

• Metric: security/bandwidth

• Base Case 1
  – All records signed and certified
  – High security, high bandwidth usage

• Base Case 2
  – Semantic aggregation, no certificates
  – Minimal bandwidth usage, no security

• Our solution
  – Somewhere in between

Bandwidth usage

[Chart: bandwidth requirement of our solution compared with the two base cases, plotted for n = 1, d = 4 bytes and for n = 4, d = 4 bytes]

Bandwidth requirement of our solution = m*d + n*(d + 90) + 88
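As a rough worked example of the formula above (m, the number of records in the aggregate, is not fixed on the slide; assume m = 30, with n = 1 proof record and d = 4 bytes of data per record): 30*4 + 1*(4 + 90) + 88 = 120 + 94 + 88 = 302 bytes. Signing and certifying every record individually (Base Case 1) would instead cost on the order of 30*(4 + 90) = 2820 bytes.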

Security

[Chart: security of our solution compared with the two base cases, with f/m = 0.5]

Security of our solution: 1 - (1 - f/m)^n
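As a worked example of the security formula, using f/m = 0.5 as in the chart: a single proof record (n = 1) exposes a bogus aggregate with probability 1 - (1 - 0.5)^1 = 0.5, while n = 4 proof records raise this to 1 - (0.5)^4 = 0.9375.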

Security/Bandwidth

[Chart: security/bandwidth of our solution compared with the two base cases]

Conclusions

• Used the idea of random checks to validate data

• PKI based authentication, tamper-proof service

• Evaluated our solution on a new metric: security/bandwidth

Demo: Indoor Localization Using Camera Phones

• The user wears the phone as a pendant
• The camera clicks and sends images to a web server via GPRS
• The web server compares query images with tagged images and sends back location updates
• No infrastructure required
  – Neither custom hardware nor access points are required
  – Physical objects do not have to be "tagged"
  – Users do not have to carry any special device
• User orientation is also determined

[Figure: the phone sends images to a web service; the service matches them against images tagged with location and returns a location update]