AlvisP2P : Scalable Peer-to-Peer Text Retrieval in a Structured P2P Network Toan Luu, Gleb Skobeltsyn, Fabius Klemm, Maroje Puh, Ivana Podnar Zarko, Martin Rajman, Karl Aberer VLDB_2008 2012/5/16 1

AlvisP2P : Scalable Peer-to-Peer Text Retrieval in a Structured P2P Network

Toan Luu, Gleb Skobeltsyn, Fabius Klemm, Maroje Puh, Ivana Podnar Zarko, Martin Rajman, Karl Aberer

VLDB_2008

2012/5/16

Outline
◦ Introduction
◦ Distributed indexing/retrieval
◦ AlvisP2P architecture
◦ AlvisP2P software
◦ Conclusion

Introduction

The AlvisP2P IR engine enables efficient retrieval with multi-keyword queries from a global document collection available in a P2P network.
◦ It uses an optimized overlay network.
◦ It relies on novel indexing/retrieval mechanisms.
◦ These mechanisms ensure low bandwidth consumption.
◦ Low bandwidth consumption in turn enables unlimited network growth.

Introduction (cont.)

Two important properties:
◦ The generated distributed index stores posting lists for carefully chosen indexing term combinations (hereafter called keys).
◦ Posting lists containing too many document references are truncated to a bounded number of their top-ranked elements.

Introduction (cont.)

These two properties guarantee acceptable storage and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the transmitted posting lists never exceed a constant size.
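
The truncation property can be illustrated with a minimal sketch. It assumes a posting list is a list of (document id, score) pairs; the function name and the bound MAX_POSTINGS are hypothetical and not taken from AlvisP2P.

```python
import heapq

MAX_POSTINGS = 3  # hypothetical bound; the slides do not give the real value


def truncate_posting_list(postings, limit=MAX_POSTINGS):
    """Keep only the `limit` highest-scored (doc_id, score) pairs.

    Returns the kept entries (best first) and a flag telling
    whether any entry had to be dropped.
    """
    top = heapq.nlargest(limit, postings, key=lambda p: p[1])
    return top, len(postings) > limit


postings = [("d1", 0.2), ("d2", 0.9), ("d3", 0.5), ("d4", 0.7)]
top, truncated = truncate_posting_list(postings)
```

Whatever the size of the full list, at most MAX_POSTINGS entries are ever stored or transmitted, which is why the cost per key stays constant.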

Introduction (cont.)

Two key generation techniques:
◦ Highly Discriminative Keys (HDK)
◦ Query-Driven Indexing (QDI)

Both overcome the scalability problem of single-term retrieval in structured P2P networks while preserving retrieval quality.

Distributed indexing/retrieval

Each peer is responsible for:
◦ generating the index entries for its local documents, which are stored in the global distributed index, and
◦ storing and maintaining the fraction of the global index associated with the keys that have been assigned to the peer by the hashing mechanism used in the underlying DHT.
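
How hashing can assign keys to peers is sketched below. This is a generic consistent-hashing ring, used here only as an assumption for illustration; it is not necessarily the scheme of the AlvisP2P DHT, and ToyDHT, ring_position and key_string are hypothetical names.

```python
import hashlib
from bisect import bisect_left


def ring_position(text, bits=32):
    """Map a string to a position on a 2**bits identifier ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def key_string(terms):
    """Canonical form of a key: its terms, de-duplicated and sorted."""
    return " ".join(sorted(set(terms)))


class ToyDHT:
    """Toy ring: a key belongs to the first peer at or after its position."""

    def __init__(self, peer_names):
        self.ring = sorted((ring_position(p), p) for p in peer_names)
        self.positions = [pos for pos, _ in self.ring]

    def responsible_peer(self, terms):
        pos = ring_position(key_string(terms))
        # Wrap around to the first peer when pos is past the last one.
        i = bisect_left(self.positions, pos) % len(self.ring)
        return self.ring[i][1]


dht = ToyDHT(["peerA", "peerB", "peerC"])
owner = dht.responsible_peer(["retrieval", "p2p"])
```

Because the key is canonicalized before hashing, the term order in a query does not change which peer is responsible for it.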

Distributed indexing/retrieval (cont.)

HDK approach:
◦ New keys are generated during the indexing phase, based on observed document frequencies.
◦ Each time the posting list for some key k exceeds a predefined size, new indexing keys with more terms are generated.

QDI approach:
◦ Decentralized monitoring of query statistics is used to detect and index new popular keys, as well as to remove obsolete keys from the index.
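
The HDK expansion rule can be sketched as follows. This is a simplified, centralized illustration under stated assumptions: documents are sets of terms, a key is a frozenset of terms, and DF_MAX and MAX_KEY_SIZE are hypothetical thresholds. The actual HDK algorithm is distributed and applies further filtering that the slides do not detail.

```python
DF_MAX = 2        # hypothetical posting-list size threshold
MAX_KEY_SIZE = 3  # hypothetical cap on the number of terms per key


def hdk_keys(docs, df_max=DF_MAX, max_key_size=MAX_KEY_SIZE):
    """Return {key: set of doc ids}; a key is a frozenset of terms.

    A key whose posting list exceeds `df_max` is expanded into keys
    with one more term, taken from the documents that contain it.
    Truncation of the oversized lists is left out of this sketch.
    """
    index = {}
    frontier = {}
    for doc_id, terms in docs.items():  # start with single-term keys
        for term in terms:
            frontier.setdefault(frozenset([term]), set()).add(doc_id)
    size = 1
    while frontier:
        index.update(frontier)
        if size == max_key_size:
            break
        next_frontier = {}
        for key, doc_ids in frontier.items():
            if len(doc_ids) <= df_max:
                continue  # discriminative enough, no expansion needed
            for doc_id in doc_ids:
                for term in docs[doc_id] - key:
                    next_frontier.setdefault(key | {term}, set()).add(doc_id)
        frontier = next_frontier
        size += 1
    return index


docs = {
    "d1": {"p2p", "text", "retrieval"},
    "d2": {"p2p", "text", "index"},
    "d3": {"p2p", "overlay"},
}
index = hdk_keys(docs)
```

In this toy collection only "p2p" occurs in more than DF_MAX documents, so only keys containing "p2p" are expanded to two terms.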

Distributed indexing/retrieval (cont.)

Retrieval steps:
◦ A peer receives a new query.
◦ It explores the lattice of query term combinations (the query lattice).
◦ For each explored term combination, the querying peer requests the associated posting list from the peer responsible for it.

Distributed indexing/retrieval (cont.)

If the term combination is indeed present in the global index, the requested posting list is sent back to the querying peer.

If this list is not truncated, the part of the query lattice dominated by the term combination is excluded from further lattice exploration.
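
The lattice exploration with pruning can be sketched as below. Two assumptions are made that the slides do not state explicitly: exploration goes from the largest term combination to the smallest, and the part "dominated by" a combination is the set of its subsets. The names explore_lattice and lookup are hypothetical; lookup stands in for the DHT request.

```python
from itertools import combinations


def explore_lattice(query_terms, lookup):
    """Explore term combinations from the largest to the smallest.

    `lookup(key)` returns None if the key is not indexed, otherwise
    a pair (posting_list, truncated_flag).
    Returns {key: posting_list} for every key that was found.
    """
    terms = sorted(set(query_terms))
    pruned = []  # keys whose subsets are excluded from exploration
    found = {}
    for size in range(len(terms), 0, -1):
        for combo in combinations(terms, size):
            key = frozenset(combo)
            if any(key <= p for p in pruned):
                continue  # dominated by a key with an untruncated list
            answer = lookup(key)
            if answer is None:
                continue  # key not in the global index
            postings, truncated = answer
            found[key] = postings
            if not truncated:
                pruned.append(key)
    return found


toy_index = {
    frozenset({"p2p", "retrieval"}): (["d1", "d2"], False),
    frozenset({"p2p"}): (["d1", "d2", "d3"], True),
    frozenset({"text"}): (["d4"], False),
}
requested = []


def lookup(key):
    requested.append(key)
    return toy_index.get(key)


found = explore_lattice(["p2p", "text", "retrieval"], lookup)
```

Here the untruncated list for {p2p, retrieval} makes the requests for {p2p} and {retrieval} unnecessary, so only five of the seven lattice nodes are requested.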

Distributed indexing/retrieval (cont.)

Additional approximations can be made to improve load balancing with only a marginal loss in retrieval precision.

Distributed indexing/retrieval (cont.)

During the exploration, each contacted peer also updates the usage statistics for the requested term combination.

The querying peer produces the union of the retrieved posting lists, ranks all the documents w.r.t. the original query, and presents the top-ranked results to the user.
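
The final merge step might look like the following sketch. The function merge_and_rank and the score callable are hypothetical; score stands in for whatever ranking of a document against the original query the engine applies.

```python
def merge_and_rank(posting_lists, score, top_k=10):
    """Union the retrieved posting lists and rank each document once."""
    candidates = set()
    for postings in posting_lists:
        candidates.update(postings)
    # Highest score first; ties are broken by document id for stability.
    ranked = sorted(candidates, key=lambda doc: (-score(doc), doc))
    return ranked[:top_k]


lists = [["d1", "d2"], ["d2", "d4"]]
scores = {"d1": 0.4, "d2": 0.9, "d4": 0.6}  # made-up scores
results = merge_and_rank(lists, scores.get, top_k=2)
```

A document that appears in several posting lists, such as d2 here, is scored only once against the original query.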

Distributed indexing/retrieval (cont.)

On-demand indexing mechanism: when a key becomes popular, the peer responsible for this key acquires a new posting list containing a bounded number of top-ranked document references.

The key is then considered as indexed and can thus be used for subsequent query processing.

Distributed indexing/retrieval (cont.)

Obsolete keys can be removed from the index.

The result is an efficient indexing structure that adapts to the current query popularity distribution, which increases the overall retrieval quality.
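
The query-driven life cycle of a key (usage monitoring, on-demand indexing, removal when obsolete) can be sketched as a toy model. The class QdiPeer, its thresholds and the fetch_top_documents callback are all hypothetical; the slides do not specify how popularity is measured or how the posting list is collected.

```python
from collections import Counter


class QdiPeer:
    """Toy model of the peer responsible for a set of keys."""

    def __init__(self, activate_at=3, max_postings=2):
        self.activate_at = activate_at    # hypothetical popularity threshold
        self.max_postings = max_postings  # hypothetical posting-list bound
        self.usage = Counter()            # requests seen per key
        self.index = {}                   # key -> bounded posting list

    def handle_request(self, key, fetch_top_documents):
        """Record the request and index the key once it is popular.

        `fetch_top_documents(key, n)` stands in for the way the peer
        obtains the n top-ranked document references for the key.
        """
        self.usage[key] += 1
        if key not in self.index and self.usage[key] >= self.activate_at:
            self.index[key] = fetch_top_documents(key, self.max_postings)
        return self.index.get(key)

    def end_of_period(self, keep_at=1):
        """Remove keys requested fewer than `keep_at` times, reset counts."""
        for key in list(self.index):
            if self.usage[key] < keep_at:
                del self.index[key]
        self.usage.clear()


def fetch(key, n):
    return ["d1", "d2", "d3"][:n]  # made-up document references


peer = QdiPeer(activate_at=2)
key = frozenset({"p2p", "text"})
first = peer.handle_request(key, fetch)
second = peer.handle_request(key, fetch)
```

The first request only updates the statistics; the second one crosses the threshold and triggers indexing. A key that receives no request during a whole period is dropped by end_of_period.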

AlvisP2P architecture

AlvisP2P architecture (cont.)

The components in higher layers rely exclusively on the functionalities provided by lower layers.

Layers 1 and 2 implement the peer-to-peer overlay infrastructure.

Layer 2 consists of a Distributed Hash Table (DHT) that is able to sustain high traffic loads.

AlvisP2P architecture (cont.)

Layer 3 implements one of the aforementioned key generation techniques (HDK or QDI).

Layer 4 is responsible for ranking:
◦ Ranking model
◦ Global document frequencies, average document length, term frequencies and other statistical information
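
The statistics listed for Layer 4 are the inputs of a BM25-style ranking function. The slides do not name the ranking model, so the sketch below is only an assumption showing how such statistics are typically combined; it uses the standard Okapi BM25 formula with the always-positive idf variant, and k1 and b are the usual default parameters.

```python
import math


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.2, b=0.75):
    """Score one document against a query with Okapi BM25.

    doc_tf: term -> frequency of the term in the document
    df:     term -> global document frequency
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # the term does not occur in this document
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score


df = {"p2p": 10, "retrieval": 50}  # made-up global statistics
high = bm25_score(["p2p", "retrieval"], {"p2p": 3}, 100, 120, df, 1000)
low = bm25_score(["p2p", "retrieval"], {"p2p": 1}, 100, 120, df, 1000)
```

In a P2P setting the document frequencies and the average document length are global quantities, which is why the ranking layer has to maintain them across peers rather than per local collection.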

AlvisP2P architecture (cont.)

Layer 5 implements a possibly sophisticated “local search engine”.

AlvisP2P software

The user can use a Web browser to query the AlvisP2P network.

AlvisP2P software (cont.)

Figure 5 shows the client's “Search” tab with a query result.

AlvisP2P software (cont.)

The corresponding “Manager of shared documents” tab is displayed in Figure 6.

Conclusion

We have presented the AlvisP2P prototype for scalable full-text P2P-IR, which uses carefully selected indexing term combinations associated with possibly truncated posting lists.

AlvisP2P offers retrieval quality comparable to that of a state-of-the-art centralized search engine.
