Large Systems: Design + Implementation:
➢ Google Search
Image (c) Facebook
Case Study: Google Evolution
Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer Scientist Lecture, November 2010
Jeff Dean, “Evolution and future directions of large-scale storage and computation systems at Google”, SoCC '10: Proceedings of the 1st ACM Symposium on Cloud Computing, ACM, New York, NY, USA (2010), pp. 1-1
https://research.google.com/pubs/jeff.html
Leaf servers handle both index & doc requests from in-memory data structures
Coordinates index switching as new shards become available
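The leaf-server role described above can be sketched as a small in-memory server that answers both index lookups and document fetches. This is an illustrative sketch, not Google's implementation; all class and method names are assumptions.

```python
# Minimal sketch of a search leaf server that serves both index and
# document requests from in-memory structures (names are illustrative).

class LeafServer:
    def __init__(self, inverted_index, doc_store):
        # inverted_index: term -> sorted list of doc ids (posting list)
        # doc_store: doc id -> raw document text (for snippet generation)
        self.inverted_index = inverted_index
        self.doc_store = doc_store

    def index_request(self, terms):
        """Return doc ids matching all query terms (AND semantics)."""
        postings = [set(self.inverted_index.get(t, ())) for t in terms]
        if not postings:
            return []
        return sorted(set.intersection(*postings))

    def doc_request(self, doc_id):
        """Return the stored document for this doc id, or None."""
        return self.doc_store.get(doc_id)

leaf = LeafServer(
    inverted_index={"cloud": [1, 3], "storage": [3, 7]},
    doc_store={3: "Colossus: cluster-level storage ..."},
)
hits = leaf.index_request(["cloud", "storage"])  # -> [3]
doc = leaf.doc_request(hits[0])
```

Serving both request types from one process keeps the index shard and its documents co-located, which is why the same leaf can answer a hit list and then render snippets for it.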
New Problems
More collections to search besides Web
More structured: Maps
Need more real-time results
More Real-Time
Creating the index was a batch process via MapReduce:
Store all documents in GFS (== HDFS)
Run several MapReduce jobs to create the index
Upload the index to leaf servers
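The batch pipeline can be illustrated with a toy MapReduce-style index build: a map phase emits (term, doc_id) pairs and a reduce phase groups them into posting lists. This is a single-process sketch under assumed names, not the actual GFS/MapReduce stack.

```python
# Toy batch index build in the spirit of the slide: map over stored
# documents, emit (term, doc_id) pairs, reduce into posting lists.
from collections import defaultdict

def map_phase(doc_id, text):
    # Emit one (term, doc_id) pair per distinct term in the document.
    for term in set(text.lower().split()):
        yield term, doc_id

def reduce_phase(pairs):
    # Group doc ids by term into sorted posting lists.
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Stand-in for documents stored in a GFS/HDFS-like store.
docs = {1: "large systems design", 2: "systems at google"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
index = reduce_phase(pairs)
# index["systems"] -> [1, 2]
```

The key property the slide criticizes is visible here: the whole corpus is reprocessed per run, so a newly crawled page is invisible until the next full batch completes.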
New documents would not show up in search results for 2-3 days [Peng and Dabek, 2010]
Needed lower “time from crawl-to-search-hit”
Solution:
New data storage system: Colossus / BigTable
Event-driven, incremental processing: Caffeine / Percolator
BigTable:
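BigTable is commonly described as a sparse, distributed, persistent, multi-dimensional sorted map keyed by (row key, column, timestamp). A minimal in-memory sketch of that data model, with illustrative names and no distribution or persistence:

```python
# Tiny in-memory sketch of the BigTable data model:
# cell = (row key, column) -> versioned values ordered by timestamp.
class TinyBigTable:
    def __init__(self):
        # (row, column) -> list of (timestamp, value), newest first
        self.cells = {}

    def put(self, row, column, value, timestamp):
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column):
        """Return the most recent value for this cell, or None."""
        versions = self.cells.get((row, column))
        return versions[0][1] if versions else None

t = TinyBigTable()
t.put("com.example/index.html", "contents:", "<html>v1</html>", timestamp=1)
t.put("com.example/index.html", "contents:", "<html>v2</html>", timestamp=2)
t.get("com.example/index.html", "contents:")  # -> "<html>v2</html>"
```

Keeping multiple timestamped versions per cell is what lets a crawler store successive fetches of the same page under one row key.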
Caffeine / Percolator
Crawler uploads new version of a page into BigTable
Updates to BigTable can trigger code
E.g. code to update the index
Push index update to leafs
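The trigger mechanism above can be sketched as a table with registered "observer" callbacks: a write fires the observers, which re-index only the changed page and hand the result to a (stand-in) leaf push. This is a toy single-process model of the Percolator idea; all names are assumptions.

```python
# Sketch of event-driven, incremental processing: writes to a table
# trigger observer code that re-indexes only the changed page.

class ObservedTable:
    def __init__(self):
        self.rows = {}
        self.observers = []  # callables invoked on every write

    def register_observer(self, fn):
        self.observers.append(fn)

    def put(self, row, value):
        self.rows[row] = value
        for fn in self.observers:  # write triggers incremental work
            fn(row, value)

leaf_updates = []  # stand-in for "push index update to leafs"

def index_observer(row, page_text):
    # Incremental indexing: process only the page that changed.
    terms = sorted(set(page_text.lower().split()))
    leaf_updates.append((row, terms))

table = ObservedTable()
table.register_observer(index_observer)
table.put("com.example/a", "fresh crawl content")
# leaf_updates now holds the incremental index entry for the new page
```

Compared with the batch pipeline, the cost per new document is proportional to that document alone, which is what shrinks the crawl-to-search-hit latency from days to something far smaller.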