

Khoj: A Highly Scalable and Available Search

Harneet Singh, Avinaash Gupta and Krishna Gayatri Kuchimanchi

System Overview

Load Balancing

Failure Detection

Architecture

Data Partitioning and Replication

Fault Tolerance

Evaluation


• Each backend server is mapped to multiple virtual nodes

• Even partitioning of the data among servers

• Load redistribution on addition/removal of a backend server

• Replication at N = 3 backend servers for high availability
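The partitioning bullets above describe a consistent hash ring in which each backend owns several virtual nodes, with every key replicated on N = 3 backends. The Python sketch below illustrates this under several assumptions:

- MD5 is the ring hash.
- Each backend gets 100 virtual nodes.
- Replicas follow Dynamo-style preference lists: walk clockwise from the key and skip virtual nodes whose backend was already chosen.

The poster does not detail its two-level ring, so the sketch shows only the virtual-node-to-backend level. The class, method and backend names are hypothetical.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Position of a key on a 2**32 ring (MD5 is an assumed choice of hash)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class ConsistentHashRing:
    """Hypothetical ring in which every backend owns many virtual nodes."""

    def __init__(self, backends, vnodes_per_backend=100, replicas=3):
        self.vnodes_per_backend = vnodes_per_backend
        self.replicas = replicas      # N = 3 copies of every key
        self._positions = []          # sorted virtual-node positions
        self._owner = {}              # virtual-node position -> backend name
        self._backends = set()
        for backend in backends:
            self.add_backend(backend)

    def add_backend(self, backend):
        """Give a new backend its virtual nodes; only keys landing on them move."""
        self._backends.add(backend)
        for i in range(self.vnodes_per_backend):
            pos = ring_hash(f"{backend}#vnode{i}")
            if pos in self._owner:    # skip the rare position collision
                continue
            bisect.insort(self._positions, pos)
            self._owner[pos] = backend

    def remove_backend(self, backend):
        """Drop a backend's virtual nodes; its keys fall to the next ones clockwise."""
        self._backends.discard(backend)
        self._positions = [p for p in self._positions if self._owner[p] != backend]
        self._owner = {p: b for p, b in self._owner.items() if b != backend}

    def preference_list(self, key):
        """First `replicas` distinct backends found walking clockwise from the key."""
        wanted = min(self.replicas, len(self._backends))
        start = bisect.bisect_right(self._positions, ring_hash(key))
        chosen = []
        for step in range(len(self._positions)):
            pos = self._positions[(start + step) % len(self._positions)]
            backend = self._owner[pos]
            if backend not in chosen:
                chosen.append(backend)
                if len(chosen) == wanted:
                    break
        return chosen


ring = ConsistentHashRing(["backend-a", "backend-b", "backend-c", "backend-d"])
print(ring.preference_list("scalability"))  # three distinct backends

terms = [f"term{i}" for i in range(10_000)]
before = {t: ring.preference_list(t)[0] for t in terms}
ring.add_backend("backend-e")
after = {t: ring.preference_list(t)[0] for t in terms}
moved = [t for t in terms if before[t] != after[t]]
# Every moved key now belongs to the new backend; roughly 1/5 of keys is expected.
print(f"{len(moved) / len(terms):.1%} of primaries moved")
```

Because the new backend only claims the arcs just before its own virtual nodes, keys move only onto it and never between existing servers. This is the property that keeps load redistribution cheap when a node joins or leaves.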

Khoj is a distributed search engine that combines well-known techniques to achieve high scalability and availability.

• It works on a locality-aware request distribution infrastructure with multiple front-end servers.

• The front-end server that handles each request is selected by round-robin scheduling.

• The front-end server uses a two-level consistent hash ring to determine which backend server serves the request.

• A coordinator server manages the addition and removal of nodes.

• Inverted indices are sharded across the backend servers.

• Data is replicated across backend servers to achieve fault tolerance and high availability.
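Sharding inverted indices across backend servers can be sketched as below. The sketch assumes term-based partitioning: each term's posting list lives on exactly one shard, which fits routing each request by key. Some details are simplifications:

- The class and function names are hypothetical.
- `shard_for` uses plain modulo hashing only to keep the block short. In Khoj, placement would come from the consistent hash ring instead.
- Replication is omitted.

```python
import hashlib
from collections import defaultdict


def shard_for(term: str, num_shards: int) -> int:
    """Stand-in placement: hash(term) mod num_shards (not consistent hashing)."""
    digest = hashlib.sha1(term.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedInvertedIndex:
    """Hypothetical term-partitioned inverted index."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        # One dict per shard: term -> set of document ids containing the term.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def add_document(self, doc_id: int, text: str) -> None:
        for term in set(text.lower().split()):
            self.shards[shard_for(term, self.num_shards)][term].add(doc_id)

    def postings(self, term: str) -> set:
        # Only the shard that owns the term is contacted; return a copy.
        shard = self.shards[shard_for(term, self.num_shards)]
        return set(shard.get(term, set()))

    def search(self, query: str) -> list:
        """AND query: intersect the posting lists fetched from each owning shard."""
        terms = query.lower().split()
        if not terms:
            return []
        result = self.postings(terms[0])
        for term in terms[1:]:
            result &= self.postings(term)
        return sorted(result)


index = ShardedInvertedIndex(num_shards=4)
index.add_document(1, "consistent hashing spreads load")
index.add_document(2, "replication improves availability")
index.add_document(3, "consistent hashing with replication")
print(index.search("consistent replication"))  # [3]
```

With term partitioning, a multi-term query touches only the shards that own its terms. The cost is that the front end must merge posting lists. Document partitioning would instead send every query to all shards.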
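The request path in the bullets above can be sketched in Python: round-robin choice of a front-end server, then the replicas that hold the key, with replication providing fault tolerance. This is a hypothetical illustration with the following simplifications:

- The replica list is passed in directly rather than computed from the hash ring.
- Failures are simulated with an `alive` flag.
- Falling back to the next replica in order is an assumed policy, not one the poster specifies.

```python
import itertools


class Backend:
    """Hypothetical backend replica that can be marked down to simulate a failure."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, query):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} answered {query!r}"


class FrontEnd:
    """Hypothetical front-end server that forwards a query to a key's replicas."""

    def __init__(self, name):
        self.name = name

    def serve(self, query, replicas):
        """Try the replicas in preference order, falling back on failure."""
        for backend in replicas:
            try:
                return backend.handle(query)
            except ConnectionError:
                continue  # replica unavailable, try the next one
        raise RuntimeError(f"all {len(replicas)} replicas failed for {query!r}")


# Round-robin: each new request is given to the next front-end server in turn.
front_ends = itertools.cycle([FrontEnd("fe-1"), FrontEnd("fe-2")])

replicas = [Backend("backend-a"), Backend("backend-b"), Backend("backend-c")]  # N = 3
replicas[0].alive = False  # simulate a failed primary
fe = next(front_ends)
print(fe.name, fe.serve("khoj", replicas))  # fe-1 backend-b answered 'khoj'
```

A request fails only when all N replicas are unreachable, which is why replicating each key on three backends improves availability.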
