Realtime search

哈工大信息检索研究中心哈工大信息检索研究中心

Realtime Search

罗磊


Reference

A Billion Queries Per Day Bigtable: A Distributed Sorage System for Str

uctured Data Large-scale Incremental Processing Using Dist

ributed Transactions and Notifications


Realtime Search @ Twitter




Posting list format and early query terminal Only evaluate as few documents as possible be

fore terminating the query Rank documents in reverse time order (newest

documents first)








Index memory model



















Lock-free algorithms and data structures
















Realtime Search @ Twitter

Q & A


Realtime Search @ Google


Arguement about MapReduce

MapReduce: A major step backwards MapReduce and Parallel DBMSs: Friend or Fo

e MapReduce: A Flexible Data Processing Tool


Google's realtime search

Google's indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.


Avoid indexes duplicates

MapReduce can't deal with any update MapReduce must be run again over the entire r

epository


Where to store index at Google

DBMS Can't handle the sheer volume of data: Google's in

dexing System store tens of petabytes across thousands of machines.

Bigtable Can't provide tools to help programmers maintain

data invariants in the face of concurrent updates


Percolator

Incremental processing model it maintains a very large repository of documents a

nd update it efficiently Processing many updates concurrently


Percolator

Percolator provides two main abstractions for performing incremental processing at large scale: ACID transactions over a random-access repositor

y observers, a way to organize an incremental compu

tation


Percolator


Transaction is a must






Transcation on Percolar











Transcation processing is complicated by the possibility of client failure table server failure does not affect the system since

Bigtable guarantees that written locks persist across talbe server failures

A transaction will not clean up a lock unless it suspects that a lock belongs to a dead or stuck worker


Notifications

Transactions let the user mutate the table while maintaining invariants, but users also need a way to trigger and run the transactions

To provide notifications, Percolator maintans a special "observation" column.


Optimization

Reduce RPC batching when reading from the same table ser

ver Hacking on Linux kernel to support high threa

d counts


Evaluation


Evalution


Realtime Search @ Google

Q & A


Realtime Search

Realtime Search @ Twitter Realtime Search @ Google

Thank you

Documents

Realtime search