Bigtable: A Distributed Sorage System for Structured Data
Large-scale Incremental Processing Using Distributed Transactions and Notifications
3. Realtime Search @ Twitter
4.
5.
6. Posting list format and early query terminal
Only evaluate as few documents as possible before terminating the query
Rank documents in reverse time order (newest documents first)
7.
8.
9.
10.
11.
12.
13. Index memory model
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31. Lock-free algorithms and data structures
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46. Realtime Search @ Twitter
Q & A
47. Realtime Search @ Google
48. Arguement about MapReduce
MapReduce: A major step backwards
MapReduce and Parallel DBMSs: Friend or Foe
MapReduce: A Flexible Data Processing Tool
49. Google's realtime search
Google's indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.
50. Avoid indexes duplicates
MapReduce can't deal with any update
MapReduce must be run again over the entire repository
51. Where to store index at Google
DBMS
Can't handle the sheer volume of data: Google's indexing System store tens of petabytes across thousands of machines.
Bigtable
Can't provide tools to help programmers maintain data invariants in the face of concurrent updates
52. Percolator
Incremental processing model
it maintains a very large repository of documents and update it efficiently
Processing many updates concurrently
53. Percolator
Percolator provides two main abstractions for performing incremental processing at large scale: