INFORMATION RETRIEVAL (IR) (PRIVATE VS. PUBLIC) VENINGSTON. K Ph.D. Student, Department of CSE, Government College of Technology, Coimbatore. [email protected]

Information Retrieval AICTE FDP at GCT Coimbatore


DESCRIPTION

The presentation is about Public and Private Information Retrieval Systems.


Page 1: Information Retrieval AICTE FDP at GCT Coimbatore

INFORMATION RETRIEVAL (IR)

(PRIVATE VS. PUBLIC)

VENINGSTON. K

Ph.D. Student, Department of CSE,

Government College of Technology, Coimbatore.

[email protected]

Page 2: Information Retrieval AICTE FDP at GCT Coimbatore

PRESENTATION OUTLINE

Public IR

What is Web IR?

Overview of Web IR Technologies

Web IR Models

Web Search architecture

Semantic Matching

Personalization in Web IR

Challenges in Web based IR

Challenges in Personalizing Web IR

Summary Note

Private IR

What is Private IR?

How Does It Work?

PIR Model

Approaches to PIR

PIR Properties

Summary Note

11/December/2013, AICTE FDP on Web Application Security

Page 3: Information Retrieval AICTE FDP at GCT Coimbatore

WHY INFORMATION RETRIEVAL?

Page 4: Information Retrieval AICTE FDP at GCT Coimbatore

WEB INFORMATION RETRIEVAL (WEB SEARCH)

Technologies for helping users to accurately, quickly, and easily find information on the web


Page 5: Information Retrieval AICTE FDP at GCT Coimbatore

GOAL OF WEB SEARCH

Accurate | Efficient | Easy to Use

Results are relevant

Response time is short

Good user experience

Results are comprehensive

Results are novel

Fast task completion


Page 6: Information Retrieval AICTE FDP at GCT Coimbatore

WEB USERS HEAVILY RELY ON SEARCH ENGINES


Page 7: Information Retrieval AICTE FDP at GCT Coimbatore

HUGE DATA CENTERS

Page 8: Information Retrieval AICTE FDP at GCT Coimbatore

OVERVIEW OF WEB SEARCH TECHNOLOGIES

General Web Search, Entity Search, Facet Search, Question Answering, Multimedia Search

Ranking, Matching, Retrieval, Document Understanding, Query Understanding, Crawling, Indexing, Result Presentation, Anti-spam

Classification, Clustering, Ranking, Graph Learning, Tagging, Distributed Computing


Page 9: Information Retrieval AICTE FDP at GCT Coimbatore

WEB SEARCH ARCHITECTURE

[Figure: web search architecture. Recoverable labels: Query String, IR System, Ranked Documents (1. Page1, 2. Page2, 3. Page3, ...), Document corpus, Web Spider]


Page 10: Information Retrieval AICTE FDP at GCT Coimbatore

COMPONENT TECHNOLOGIES FOR WEB IR

Relevance Ranking

Importance Ranking

Web Page Understanding

Query Understanding

Crawling

Indexing

Search Result Presentation

Anti-Spam

Search Log Data Mining / Web Mining


Page 11: Information Retrieval AICTE FDP at GCT Coimbatore

THREE IMPORTANT PROCESSES IN WEB IR

Retrieval

Finding documents from the inverted index

Matching

Calculating a relevance score for each query and document pair

Ranking

Ranking documents based on relevance scores, importance scores, etc.
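The three processes can be sketched on a toy corpus. This is a minimal illustration, not how a production search engine works: the documents, the function names and the simple tf-idf score are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Toy corpus; the document ids and texts are made up for illustration.
DOCS = {
    "d1": "web search engines rank web pages",
    "d2": "private information retrieval hides the query",
    "d3": "search engines crawl and index the web",
}

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][doc_id] = tf
    return index

def retrieve(index, query):
    """Retrieval: collect every document containing at least one query term."""
    candidates = set()
    for term in query.split():
        candidates.update(index.get(term, {}))
    return candidates

def match(index, n_docs, query, doc_id):
    """Matching: a simple tf-idf relevance score for one (query, document) pair."""
    score = 0.0
    for term in query.split():
        postings = index.get(term, {})
        if doc_id in postings:
            idf = math.log(1 + n_docs / len(postings))
            score += postings[doc_id] * idf
    return score

def rank(index, docs, query):
    """Ranking: order the retrieved documents by descending relevance score."""
    scored = [(match(index, len(docs), query, d), d) for d in retrieve(index, query)]
    return [d for _, d in sorted(scored, key=lambda p: (-p[0], p[1]))]

INDEX = build_inverted_index(DOCS)
RESULTS = rank(INDEX, DOCS, "web search")
```

For the query "web search", only d1 and d3 are retrieved, and d1 ranks first because it mentions "web" twice. A real ranker would combine this relevance score with importance scores, which the sketch leaves out.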


Page 12: Information Retrieval AICTE FDP at GCT Coimbatore

WEB IR MODELS

Vector Space Model (Salton 1975)

Probabilistic Model

Okapi or BM25 Model (Robertson and Walker 1994)

Language Model (Ponte and Croft 1998)

User Model
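As one concrete instance of these models, the sketch below scores documents with a commonly used form of the BM25 formula. The parameter values `k1=1.2` and `b=0.75`, the smoothed idf variant and the toy documents are assumptions for illustration; the slides do not specify them.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (a list of token lists) against `query`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Smoothed idf variant that stays non-negative (an assumption).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: long documents are penalised via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

DOCS = [
    "web search engines rank web pages".split(),
    "private information retrieval hides the query".split(),
    "search engines crawl and index the web".split(),
]
SCORES = bm25_scores("web search".split(), DOCS)
```

The first document scores highest (it repeats "web" and is shorter than the third), and the second scores zero because it shares no term with the query.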


Page 13: Information Retrieval AICTE FDP at GCT Coimbatore

VECTOR SPACE MODEL

Page 14: Information Retrieval AICTE FDP at GCT Coimbatore

PROBABILISTIC MODEL

Page 15: Information Retrieval AICTE FDP at GCT Coimbatore

OKAPI OR BM25 MODEL

Page 16: Information Retrieval AICTE FDP at GCT Coimbatore

LANGUAGE MODEL

Page 17: Information Retrieval AICTE FDP at GCT Coimbatore

USER MODEL

User models are personal characteristics of the user that the system maintains

A user profile can be thought of as a user model

Types of user models

Depending on the user being modeled: Individual, Canonical (group)

Depending on the acquisition model: Explicit (stated), Implicit (inferred)


Page 18: Information Retrieval AICTE FDP at GCT Coimbatore

SEMANTIC MATCHING

Page 19: Information Retrieval AICTE FDP at GCT Coimbatore

PERSONALIZATION - ENVIRONMENTS WHERE IT IS BEING USED

Databases

Newsgroups

Personal Information Management (desktop files, e-mail, bookmarks, etc.)

News: electronic journals

Search engines

Web sites

Business

e-commerce

e-health

etc.


Page 20: Information Retrieval AICTE FDP at GCT Coimbatore

OBJECTIVES

To enhance personalized Web search and retrieval with the intention of satisfying the user's search context

To customize Web Information Retrieval (IR) for users.

To provide results specific to individual users.

This is particularly important because different users expect different information even for the same query

To predict whether personalization is required or not

To develop a computationally intelligent and efficient algorithm for this personalization task


Page 21: Information Retrieval AICTE FDP at GCT Coimbatore

PERSONALIZATION IN WEB IR [1/2]

Web Personalization is viewed as an application of data mining and machine learning techniques to build models of user behavior that can be applied to the task of predicting user needs and adapting future interactions with the ultimate goal of improved user satisfaction.


Page 22: Information Retrieval AICTE FDP at GCT Coimbatore

PERSONALIZATION IN WEB IR [2/2]

Initially, search engines were concerned with retrieving documents relevant to a query.

With the information overload on the web, it is increasingly difficult for search engines to satisfy individual user needs.

Personalization has long been recognized as an avenue to greatly improve the search experience.

It disambiguates web search by modeling the user's interests and preferences in a user profile.


Page 23: Information Retrieval AICTE FDP at GCT Coimbatore

PROBLEM DESCRIPTION

Personalization in Web IR

Customize search results according to each individual user

Research questions in Personalized Web IR

What to use to Personalize?

How to model and represent past search contexts?

How to Personalize?

How to use it to improve search results?

When not to Personalize?

How to decide whether personalization is required or not?

How to know whether personalization helped?

How to evaluate personalized results?


Page 24: Information Retrieval AICTE FDP at GCT Coimbatore

GENERAL PROBLEM STATEMENT

When a search query is issued, most search engines return the same results irrespective of the user's interests

A lack of semantic structure makes it difficult for the machine to understand the information provided by the user

Failure to identify the intention of the user

Poor handling of inaccurate or ambiguous queries and imprecise keywords


Page 25: Information Retrieval AICTE FDP at GCT Coimbatore

RELATED WORKS

Short term personalization - bookmarks

Long term personalization - browsing history

Result Diversification - Query reformulation

Collaborative personalization - for groups of users

Search interaction personalization - Clicks

Session based personalization

Location based personalization

Task based personalization

and so on…


Page 26: Information Retrieval AICTE FDP at GCT Coimbatore

ARCHITECTURE OF PERSONALIZATION BASED WEB IR

[Figure: personalized Web IR architecture. Recoverable labels: Web, Document corpus, Query String, Query Reformulation, Revised Query, Personalized IR, Rankings, Feedback, Ranked Documents (1. Doc1, 2. Doc2, 3. Doc3, ...), Re-Ranked Documents (1. Doc2, 2. Doc4, 3. Doc5, ...)]


Page 27: Information Retrieval AICTE FDP at GCT Coimbatore

CHALLENGES FOR WEB IR

Distributed Data: Documents spread over millions of different web servers.

Volatile Data: Many documents change or disappear rapidly (e.g. dead links).

Large Volume: Billions of separate documents.

Unstructured and Redundant Data: No uniform structure, HTML errors, up to 30% near-duplicate documents.

Quality of Data: No editorial control, false information, poor quality writing, typos, etc.

Heterogeneous Data: Multiple media types (images, video), languages, character sets, etc.


Page 28: Information Retrieval AICTE FDP at GCT Coimbatore

CHALLENGES FOR PERSONALIZATION IN WEB IR

From the system centered approach to a user centered approach to IR

Modeling the user context in personalized IR

Exploiting the user context to enhance search quality

The privacy issues

The evaluation issues


Focused on in the next part of the presentation

Page 29: Information Retrieval AICTE FDP at GCT Coimbatore

POSSIBLE APPROACHES TO INFORMATION RETRIEVAL

Statistical approaches

◦ Co-occurrence of features between document and query

◦ Rank documents based on similarity

Semantic approaches

◦ “Understand” the query, find matching documents

User profile approaches

◦ User profiles store approximations of user interests


Page 30: Information Retrieval AICTE FDP at GCT Coimbatore

BENEFITS OF PERSONALIZED SEARCH

Resolving ambiguity

The profile provides a context for the query in order to reduce ambiguity.

Example: The interest profile helps determine what a user who asks about “Jaguar” really wants (“Animal” or “Car”)

Revealing hidden treasures

The profile helps surface the most relevant documents, which could otherwise be hidden beyond the first results page

Example: The owner of an iPhone searches for Google Android. Pages referring to both would be the most interesting


Page 31: Information Retrieval AICTE FDP at GCT Coimbatore

WHERE TO APPLY USER PROFILES?

The user profile can be applied in several ways

To modify the query itself (pre-processing)

Query expansion: the user profile is applied to add terms to the query

To process the results of a query (post-processing)

To present document snippets

Adaptation of meta-search
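The pre-processing and post-processing uses of a profile can be sketched as follows. The profile contents, the weights, the function names and the two result snippets are hypothetical; the example only reuses the "Jaguar" ambiguity mentioned earlier in the presentation.

```python
from collections import Counter

# Hypothetical user profile: interest terms with weights (not from the slides).
PROFILE = {"car": 0.9, "engine": 0.6, "wildlife": 0.1}

def expand_query(query_terms, profile, max_extra=1):
    """Pre-processing: append the highest-weighted profile terms to the query."""
    by_weight = sorted(profile.items(), key=lambda p: -p[1])
    extras = [t for t, _ in by_weight if t not in query_terms][:max_extra]
    return list(query_terms) + extras

def rerank(results, profile):
    """Post-processing: re-order results by their overlap with the profile."""
    def profile_score(text):
        counts = Counter(text.lower().split())
        return sum(weight * counts[term] for term, weight in profile.items())
    # sorted() is stable, so ties keep the search engine's original order.
    return sorted(results, key=lambda text: -profile_score(text))

RESULTS = [
    "jaguar wildlife habitat in south america",
    "jaguar car review and engine specs",
]
EXPANDED = expand_query(["jaguar"], PROFILE)
RERANKED = rerank(RESULTS, PROFILE)
```

With this profile the query becomes "jaguar car" and the car review moves ahead of the wildlife page. Both steps are deliberately naive; how much weight a profile should carry is exactly the "when not to personalize" question raised earlier.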


Page 32: Information Retrieval AICTE FDP at GCT Coimbatore

VARIATIONS OF USER PROFILE USAGE

Page 33: Information Retrieval AICTE FDP at GCT Coimbatore

SUMMARY ON IR

Web Information Retrieval is a very challenging yet exciting area!

Solution: learn about the individual user in order to match the query with the document

Personalized Web Information Retrieval

Promises significant quality improvements. However, current approaches are far from optimal

Thus, more research is necessary in the field of IR

“Computational Intelligence” could be adopted by search tools to effectively manage searching, retrieving, filtering and presenting relevant information.


Page 34: Information Retrieval AICTE FDP at GCT Coimbatore

PRIVATE INFORMATION RETRIEVAL (PIR) [1995]

Goal: allow user to query database while hiding the identity of the data-items.

Note: hides identity of data-items; not existence of interaction with the user.

Motivation: patent databases; stock quotes; web access and so on.

Paradox(?): imagine buying in a store without the seller knowing what you buy.

(Encrypting requests is useful against third parties; not against owner of data.)


Page 35: Information Retrieval AICTE FDP at GCT Coimbatore

WHAT IS PRIVATE INFORMATION RETRIEVAL?

Real-World Example:

Suppose there is a movie database and we want to find information on the movie 'Indian'

We do not want anyone to know about our interest in this movie.


Page 36: Information Retrieval AICTE FDP at GCT Coimbatore

THE GOAL OF PIR

Suppose there is a movie database and we want to find information on the movie 'Endiran'

We do not want the database operator to know about our interest in this movie.

Users' intentions are to be kept secret


Page 37: Information Retrieval AICTE FDP at GCT Coimbatore

HOW DOES IT WORK?

Very simple approach

Download the entire database

Improved approach

Suppose there is a database with blocks $D_1, \ldots, D_r$.

A client wants to retrieve block $D_\alpha$ from the database in such a way that the database operator learns nothing about $\alpha$.

Do this without downloading the entire database.


Page 38: Information Retrieval AICTE FDP at GCT Coimbatore

GOLDBERG'S SCHEME

We can represent a database of $r$ blocks as an $r \times s$ matrix $D$ and get the $\alpha$-th block (the $\alpha$-th row) of $D$ using simple linear algebra:

$D_\alpha = e_\alpha \cdot D$

where $e_\alpha = [0\ 0\ \ldots\ 1\ \ldots\ 0]$ is a vector of all zeros, except for a one in coordinate $\alpha$.

There are $\ell$ servers, each with a copy of the database.

We secret-share $e_\alpha$ into $v_1, \ldots, v_\ell$ and send one share to each server.

Each server computes and sends its response $r_i = v_i \cdot D$
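The linear-algebra idea can be tried out with a small sketch. It is not Goldberg's actual construction: for brevity it swaps in plain additive secret sharing modulo a small prime, which needs every response and gives no protection against wrong responses. The modulus `P`, the function names and the toy database are assumptions.

```python
import secrets

P = 257  # small prime modulus, an illustrative choice

def share_unit_vector(alpha, r, n_servers):
    """Split e_alpha (length r) into n_servers additive shares modulo P."""
    e = [1 if j == alpha else 0 for j in range(r)]
    # All shares but the last are uniformly random vectors.
    shares = [[secrets.randbelow(P) for _ in range(r)] for _ in range(n_servers - 1)]
    # The last share is chosen so that all shares sum to e_alpha modulo P.
    last = [(e[j] - sum(s[j] for s in shares)) % P for j in range(r)]
    return shares + [last]

def server_response(v, D):
    """Each server returns r_i = v_i . D (vector-matrix product modulo P)."""
    width = len(D[0])
    return [sum(v[j] * D[j][c] for j in range(len(D))) % P for c in range(width)]

def reconstruct(responses):
    """Summing the responses gives (sum of v_i) . D = e_alpha . D = D_alpha."""
    return [sum(col) % P for col in zip(*responses)]

# Database of r = 4 blocks with s = 3 words each; entries must be below P.
D = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
ALPHA = 2
SHARES = share_unit_vector(ALPHA, r=len(D), n_servers=3)
BLOCK = reconstruct([server_response(v, D) for v in SHARES])
```

`BLOCK` equals row `ALPHA` of `D`, while any two of the three shares on their own are uniformly random and so reveal nothing about `ALPHA`. Privacy here requires at least two servers, and it is lost if all of them collude.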


Page 39: Information Retrieval AICTE FDP at GCT Coimbatore

GOLDBERG'S SCHEME

The responses $r_1, \ldots, r_k$ are secret shares of $D_\alpha$ ($k$ is the number of responses).

What happens if some of the responses are wrong?


Page 40: Information Retrieval AICTE FDP at GCT Coimbatore

AOL SEARCH LOG DATA SCANDAL

#4417749: clothes for age 60

60 single men

best retirement city

jarrett arnold

jack t. arnold

jaylene and jarrett arnold

gwinnett county yellow pages

rescue of older dogs

movies for dogs

sinus infection

Thelma Arnold

62-year-old widow

Lilburn, Georgia


Page 41: Information Retrieval AICTE FDP at GCT Coimbatore

OBSERVATION

The owners of databases know a lot about the users!

This poses a risk to users' privacy.

E.g. consider a database with stock prices

What can we do?

Trust them to protect our secrecy, or

use cryptography


Page 42: Information Retrieval AICTE FDP at GCT Coimbatore

HOW CAN CRYPTO HELP?

Note: This problem has nothing to do with secure communication!

[Figure: user U and database D]


Page 43: Information Retrieval AICTE FDP at GCT Coimbatore

CURRENT SETTING

[Figure: user U and database D, joined by a secure link]

A new primitive: Private Information Retrieval (PIR)


Page 44: Information Retrieval AICTE FDP at GCT Coimbatore

MODELING PIR

Server: holds an n-bit string $x$

$n$ should be thought of as very large

User: wants to retrieve $x_i$ and to keep $i$ private


Page 45: Information Retrieval AICTE FDP at GCT Coimbatore

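PRIVATE PROTOCOL TO INFORMATION RETRIEVAL

[Figure: the SERVER holds $x = x_1, x_2, \ldots, x_n \in \{0,1\}^n$; the USER holds an index $i \in \{1, \ldots, n\}$ and obtains $x_i$. The labels "i" and "j" also appear in the figure.]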


Page 46: Information Retrieval AICTE FDP at GCT Coimbatore

NON-PRIVATE PROTOCOL

[Figure: the USER sends the index $i \in \{1, \ldots, n\}$ to the SERVER, which holds $x = x_1, x_2, \ldots, x_n$ and returns $x_i$]

There is NO privacy preservation.

Communication Cost: log n


Page 47: Information Retrieval AICTE FDP at GCT Coimbatore

TRIVIAL PRIVATE PROTOCOL

[Figure: the SERVER holds $x = x_1, x_2, \ldots, x_n$ and sends $x_1, x_2, \ldots, x_n$ to the USER, who reads off $x_i$]

Server sends entire database x to User.

Information theoretic privacy.

Communication Cost: n

Is this optimal?

“The number of bits communicated between U and S has to be smaller than n.”


Page 48: Information Retrieval AICTE FDP at GCT Coimbatore

PROBLEM

In any 1-server PIR with information theoretic privacy the communication is at least n.


Page 49: Information Retrieval AICTE FDP at GCT Coimbatore

POSSIBLE SOLUTIONS

User asks for additional random indices.

Drawback: reveals a lot of information

Employ general crypto protocols to compute $x_i$ privately.

Drawback: highly inefficient (polynomial in n).

Anonymity.

Note: Hides identity of user; not the fact that $x_i$ is retrieved.


Page 50: Information Retrieval AICTE FDP at GCT Coimbatore

ANONYMITY - EXAMPLE

Original Data vs. Anonymized Data


Page 51: Information Retrieval AICTE FDP at GCT Coimbatore

TWO APPROACHES

Information-Theoretic PIR

Replicate database among k servers.

Unconditional privacy against t servers.

Computational PIR

Computational privacy, based on cryptographic assumptions.


Page 52: Information Retrieval AICTE FDP at GCT Coimbatore

INFORMATION THEORETIC PRIVACY (PERFECT PRIVACY)

The distribution of the queries the user sends to any server is independent of the index he/she wishes to retrieve.

This means that no server can gain any information about the user's interest, regardless of the server's computational power.


Page 53: Information Retrieval AICTE FDP at GCT Coimbatore

COMPUTATIONAL PRIVACY

The distributions of the queries the user sends to any server are computationally indistinguishable as the index varies.

This means that no server can gain any information about the user's interest, provided that the server is computationally bounded.


Page 54: Information Retrieval AICTE FDP at GCT Coimbatore

COMMUNICATION COST

Multiple servers, information-theoretic PIR:

2 servers, comm. $n^{1/2}$

k servers, comm. $n^{1/k}$

log n servers, comm. Poly(log(n))

Single server, computational PIR: comm. Poly(log(n))


Page 55: Information Retrieval AICTE FDP at GCT Coimbatore

K-SERVER PIR

Correctness: User obtains $x_i$

Privacy: No single server gets information about $i$

[Figure: user U, holding index $i$, communicates with servers $S_1, S_2, \ldots, S_k$, each holding $x \in \{0,1\}^n$]
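A minimal sketch of the simplest two-server case shows how both properties can hold at once, assuming the two servers do not collude. The function names and the sample database are illustrative. Note that each query here is n bits long, so the sketch demonstrates correctness and privacy only, not sublinear communication.

```python
import secrets

def make_queries(i, n):
    """User: a uniformly random subset of {0..n-1}, and the same subset with i flipped."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[i] ^= 1
    return q1, q2

def answer(x, q):
    """Server: XOR of the database bits selected by the query vector."""
    bit = 0
    for xj, qj in zip(x, q):
        bit ^= xj & qj
    return bit

def retrieve_bit(x, i):
    """Run the protocol against two servers that both hold x."""
    q1, q2 = make_queries(i, len(x))
    # The two answers differ exactly by the contribution of position i.
    return answer(x, q1) ^ answer(x, q2)

X = [1, 0, 1, 1, 0, 0, 1, 0]
BITS = [retrieve_bit(X, i) for i in range(len(X))]
```

Each server sees a uniformly random bit vector on its own, so its view is independent of `i`; only the pair of queries together reveals the index.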


Page 56: Information Retrieval AICTE FDP at GCT Coimbatore

PIR PROPERTIES

[Figure: the database's input is $B_1\ B_2\ \ldots\ B_w$; the user's input is an index $i = 1, \ldots, w$; both parties are polynomial time randomized interactive algorithms]

correctness: the user learns $B_i$

secrecy (of the user): the database does not learn $i$

non-triviality: the total communication is < $w$

Note: secrecy of the database is not required

These properties need to be defined more formally!


Page 57: Information Retrieval AICTE FDP at GCT Coimbatore

PIR PROPERTIES

Correctness

In every invocation of the protocol the user retrieves the bit he/she is interested in (i.e. $x_i$)

Privacy

In every invocation of the protocol each server does not gain any information about the index of the bit retrieved by the user (i.e. $i$).


Page 58: Information Retrieval AICTE FDP at GCT Coimbatore

PIR DOESN'T EXIST [1/4]

Correctness, Non-triviality and Secrecy CANNOT be satisfied simultaneously.

Def: A transcript T is possible for (i, B) if P(T(i, B) = T) > 0

Take some T', and look where it is possible:

[Figure: grid with indices i as columns and databases B as rows; T' marks the cells (i, B) for which it is possible]


Page 59: Information Retrieval AICTE FDP at GCT Coimbatore

PIR DOESN'T EXIST [2/4]

secrecy → if T' is possible for some B and i, then it is possible for B and all the other i's

[Figure: the same grid of indices i against databases B; each row that contained a T' is now filled with T' for every index i]


Page 60: Information Retrieval AICTE FDP at GCT Coimbatore

PIR DOESN'T EXIST [3/4]

non-triviality → length(transcript) < length(database)

→ # transcripts < # databases

→ there has to exist T' that is possible for two databases B0 and B1

[Figure: the same grid of indices i against databases B; two rows, labelled B0 and B1, are filled with T']
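The counting step can be spelled out, assuming transcripts are binary strings and the database is a $w$-bit string, so that non-triviality means fewer than $w$ bits of total communication:

```latex
\#\{\text{transcripts}\}
  \;\le\; \sum_{\ell=0}^{w-1} 2^{\ell}
  \;=\; 2^{w} - 1
  \;<\; 2^{w}
  \;=\; \#\{\text{databases}\}
```

Every database has at least one possible transcript, so by the pigeonhole principle some transcript T' must be possible for two different databases.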


Page 61: Information Retrieval AICTE FDP at GCT Coimbatore

PIR DOESN'T EXIST [4/4]

B0 and B1 differ on at least one index i'. So, if i' is the input of the user, the same transcript T' would have to yield two different answers: correctness → contradiction

[Figure: the same grid of indices i against databases B; rows B0 and B1 are filled with T' and the column i' is marked]


Page 62: Information Retrieval AICTE FDP at GCT Coimbatore

THUS, IDEAL PIR DOESN'T EXIST!

How to bypass the impossibility result?

Two ideas:

limit the computing power of a cheating database

use a larger number of “independent” databases


Page 63: Information Retrieval AICTE FDP at GCT Coimbatore

SUMMARY

Complexity of PIR

Communication

Computation

Possible Extensions

Symmetric PIR

User may not learn any item other than the one he/she

requested

Searching by key-words

Public-key encryption with key-word search


Page 64: Information Retrieval AICTE FDP at GCT Coimbatore

REFERENCES

Xiaohui Tao, Yuefeng Li, and Ning Zhong, “A Personalized Ontology model for Web information gathering”, IEEE Trans. Knowledge and Data Engg., vol. 23, no. 4, pp. 496-511, April 2011.

Markus Strohmaier and Mark Kröll, “Acquiring Knowledge about human goals from search query logs”, ACM Transactions on Information Systems, March 2011.

K.W.-T. Leung, W. Ng, and D.L. Lee, “Deriving Concept-Based User Profiles from Search Engine Logs”, IEEE Trans. Knowledge and Data Engg., vol. 22, no. 7, pp. 969-982, July 2010.

Zhicheng Dou, Ruihua Song, Ji-Rong Wen, and Xiaojie Yuan, “Evaluating the Effectiveness of Personalized Web Search”, IEEE Trans. Knowledge and Data Engg., vol. 21, no. 8, pp. 1178-1190, Aug. 2009.

Y. Li and N. Zhong, “Mining Ontology for Automatically Acquiring Web User Information Needs”, IEEE Trans. Knowledge and Data Engg., vol. 18, no. 4, pp. 554-568, April 2006.

Fang Liu, Clement Yu, and Weiyi Meng, “Personalized Web Search for Improving Retrieval Effectiveness”, IEEE Trans. Knowledge and Data Engg., vol. 16, no. 1, pp. 28-40, January 2004.

B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private information retrieval”, Journal of the ACM, vol. 45, no. 6, pp. 965-982, 1998 (conference version: FOCS 1995).

Page 65: Information Retrieval AICTE FDP at GCT Coimbatore

THANKING YOU