Realtime search


  • 1. Realtime Search
  • 2. Reference
    • A Billion Queries Per Day
    • Bigtable: A Distributed Storage System for Structured Data
    • Large-scale Incremental Processing Using Distributed Transactions and Notifications
  • 3. Realtime Search @ Twitter
  • 4.
  • 5.
  • 6. Posting list format and early query termination
    • Evaluate as few documents as possible before terminating the query
    • Rank documents in reverse time order (newest documents first)
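The two points above can be illustrated with a minimal sketch. It assumes, purely for illustration, that document IDs are assigned in arrival order and that each posting list is appended to in that same order, so reading a list backwards yields the newest documents first. The class and method names (`RealtimeIndex`, `add`, `search`) are hypothetical and are not taken from the slides or from Twitter's code.

```python
from typing import Dict, List


class RealtimeIndex:
    """Toy in-memory inverted index (illustrative only).

    Assumption: doc IDs grow with arrival time, so each posting list
    is already sorted oldest-to-newest without any extra work.
    """

    def __init__(self) -> None:
        self.postings: Dict[str, List[int]] = {}
        self.next_doc_id = 0

    def add(self, text: str) -> int:
        """Index a document and return its ID."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        # set() so a term repeated in one document is posted only once
        for term in set(text.lower().split()):
            self.postings.setdefault(term, []).append(doc_id)
        return doc_id

    def search(self, term: str, limit: int) -> List[int]:
        """Return up to `limit` matching doc IDs, newest first."""
        hits: List[int] = []
        if limit <= 0:
            return hits
        # Walk the posting list backwards: newest documents come first.
        for doc_id in reversed(self.postings.get(term.lower(), [])):
            hits.append(doc_id)
            if len(hits) >= limit:
                break  # early termination: older postings are never read
        return hits


index = RealtimeIndex()
for tweet in ["hello world", "hello twitter", "search", "hello again"]:
    index.add(tweet)
newest_two = index.search("hello", limit=2)
```

Because the ranking is purely by recency in this sketch, the query can stop as soon as it has enough hits; a ranking that mixes in other signals would need a different stopping rule.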
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Index memory model
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31. Lock-free algorithms and data structures
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46. Realtime Search @ Twitter
    • Q & A
  • 47. Realtime Search @ Google
  • 48. Argument about MapReduce
    • MapReduce: A major step backwards
    • MapReduce and Parallel DBMSs: Friends or Foes?
    • MapReduce: A Flexible Data Processing Tool
  • 49. Google's realtime search
    • Google's indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.
  • 50. Avoiding duplicates in the index
    • MapReduce cannot apply a small update on its own
    • To reflect an update, MapReduce must be rerun over the entire repository
  • 51. Where to store index at Google
    • DBMS
      • Can't handle the sheer volume of data: Google's indexing system stores tens of petabytes across thousands of machines.
    • Bigtable
      • Provides no tools to help programmers maintain data invariants in the face of concurrent updates
  • 52. Percolator
    • Incremental processing model
      • Maintains a very large repository of documents and updates it efficiently
    • Processing many updates concurrently
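The contrast between rerunning a batch job and processing an update incrementally can be sketched as follows. This is not Percolator's API and it leaves out transactions and concurrency entirely; it only shows, under a toy single-process model, that an incremental update touches just the terms that changed and still arrives at the same index a full rebuild would. The names `rebuild` and `apply_update` are hypothetical.

```python
from typing import Dict, Set

Index = Dict[str, Set[str]]  # term -> IDs of documents containing it


def rebuild(repo: Dict[str, str]) -> Index:
    """Batch style: scan the whole repository to produce the index."""
    index: Index = {}
    for doc_id, text in repo.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def apply_update(repo: Dict[str, str], index: Index,
                 doc_id: str, new_text: str) -> None:
    """Incremental style: touch only the terms that changed for one document."""
    old_terms = set(repo.get(doc_id, "").lower().split())
    new_terms = set(new_text.lower().split())
    for term in old_terms - new_terms:
        index[term].discard(doc_id)
        if not index[term]:
            del index[term]  # drop posting sets that became empty
    for term in new_terms - old_terms:
        index.setdefault(term, set()).add(doc_id)
    repo[doc_id] = new_text


repo = {"a": "fast search", "b": "slow batch"}
index = rebuild(repo)
apply_update(repo, index, "b", "fast incremental update")
same_as_full_rebuild = index == rebuild(repo)
```

The work done by `apply_update` is proportional to the size of the changed document, while `rebuild` scales with the whole repository, which is the cost the slides attribute to the batch approach.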
  • 53. Percolator
    • Percolator provides two main abstractions for performing incremental processing at large scale:
      • ACID transactions over a random-access repository
      • observers, a way to organize an incrementa