Khoj: A Highly Scalable and Available Search
Harneet Singh, Avinaash Gupta and Krishna Gayatri Kuchimanchi


System Overview

Load Balancing

Failure Detection

Architecture

Data Partitioning and Replication

Fault Tolerance Evaluation

References
[1] Karger, D.; Sherman, A.; Berkheimer, A.; Bogstad, B.; Dhanidina, R.; Iwamoto, K.; Kim, B.; Matkins, L.; Yerushalmi, Y. Web caching with consistent hashing. Computer Networks 31(11): 1203-1213, 1999.
[2] DeCandia, G., et al. Dynamo: Amazon's Highly Available Key-Value Store. Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
[3] Nishtala, R., et al. Scaling Memcache at Facebook. NSDI 2013.
[4] Pai, V.; Banga, G. Locality-Aware Request Distribution. ASPLOS-VIII.

• Mapping of each backend server to multiple virtual nodes

• Even partitioning of the data amongst servers

• Load redistribution on addition/removal of a backend server

• Replication at N = 3 backend servers for high availability
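The partitioning and replication scheme above can be sketched as a consistent hash ring with virtual nodes, in the style of Karger et al. [1] and Dynamo [2]. This is a minimal Python sketch under stated assumptions, not Khoj's code: the MD5-based ring position, the 64 virtual nodes per server, and the names `ConsistentHashRing`, `add_server`, `remove_server` and `preference_list` are all hypothetical. Replicas are placed Dynamo-style on the first N = 3 distinct servers found walking clockwise from a key.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Position of `key` on a ring of size 2**32 (hypothetical choice: first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class ConsistentHashRing:
    """Hypothetical ring in which every backend server owns several virtual nodes."""

    def __init__(self, vnodes_per_server: int = 64, replicas: int = 3):
        self.vnodes_per_server = vnodes_per_server
        self.replicas = replicas          # N = 3 copies of each key
        self._tokens: list[int] = []      # sorted virtual-node positions
        self._owner: dict[int, str] = {}  # virtual-node position -> backend server

    def add_server(self, server: str) -> None:
        # Each virtual node claims the arc of the ring just before its position.
        for i in range(self.vnodes_per_server):
            token = ring_hash(f"{server}#vn{i}")
            if token in self._owner:      # skip the rare collision with an existing token
                continue
            self._owner[token] = server
            bisect.insort(self._tokens, token)

    def remove_server(self, server: str) -> None:
        # The server's arcs fall to the next virtual node clockwise; other keys stay put.
        self._tokens = [t for t in self._tokens if self._owner[t] != server]
        self._owner = {t: s for t, s in self._owner.items() if s != server}

    def preference_list(self, key: str) -> list[str]:
        """First `replicas` distinct servers found walking clockwise from the key."""
        want = min(self.replicas, len(set(self._owner.values())))
        start = bisect.bisect_right(self._tokens, ring_hash(key))
        servers: list[str] = []
        for step in range(len(self._tokens)):
            server = self._owner[self._tokens[(start + step) % len(self._tokens)]]
            if server not in servers:
                servers.append(server)
                if len(servers) == want:
                    break
        return servers

    def primary(self, key: str) -> str:
        return self.preference_list(key)[0]


ring = ConsistentHashRing()
for name in ["backend-1", "backend-2", "backend-3", "backend-4"]:
    ring.add_server(name)

keys = [f"term-{i}" for i in range(5_000)]
before = {k: ring.primary(k) for k in keys}
ring.add_server("backend-5")
moved = [k for k in keys if ring.primary(k) != before[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved")  # only keys now owned by backend-5 move
print(ring.preference_list("search"))                 # three distinct backends
```

Because a key's primary is the owner of the first virtual node clockwise from it, adding a server only moves keys onto the new server, and removing one only moves that server's keys to their successors. Giving each server more virtual nodes narrows the spread in per-server load, which is what makes the partitioning even.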

Khoj is a distributed search engine that combines well-known techniques to achieve high scalability and availability.

• Works on a locality-aware request distribution infrastructure [4] with multiple front-end servers.

• Clients send requests to the front-end servers using round-robin scheduling.

• Each front-end server uses a two-level consistent hash ring [1] to determine the backend server that serves the request.

• A coordinator server manages the addition and removal of nodes.

• Inverted indices are sharded across the backend servers.

• Replication across backend servers for fault tolerance and high availability.
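The request path described in these bullets can be sketched end to end in Python. This is a hypothetical illustration, not Khoj's implementation: the class names, the MD5 ring, 32 virtual nodes per server, AND semantics for multi-term queries, and failover to the next replica are all assumptions. Reading the "two-level" ring as (1) a binary search over sorted virtual-node positions followed by (2) a virtual node to backend table is one plausible interpretation. Front ends are picked with `itertools.cycle`, and because every front end builds the same ring, a given term is always routed to the same backends, which gives the request distribution its locality.

```python
import bisect
import hashlib
import itertools


def ring_hash(key: str) -> int:
    """Position of `key` on a 2**32 ring (hypothetical choice: first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class Backend:
    """Backend server holding the inverted-index shard (term -> doc ids) it is responsible for."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.postings: dict[str, set[int]] = {}

    def get_postings(self, term: str) -> set[int]:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.postings.get(term, set())


class FrontEnd:
    """Routes every term through a two-level lookup: ring position -> virtual node -> backend.

    The ring is built once here; in Khoj the coordinator handles node addition and removal.
    """

    def __init__(self, backends: list[Backend], vnodes: int = 32, replicas: int = 3):
        self.replicas = replicas
        self.by_name = {b.name: b for b in backends}
        # Level 2: virtual-node position -> backend name.
        self.owner = {ring_hash(f"{b.name}#vn{i}"): b.name for b in backends for i in range(vnodes)}
        # Level 1: sorted virtual-node positions, searched with bisect.
        self.tokens = sorted(self.owner)

    def replicas_for(self, term: str) -> list[Backend]:
        """The first `replicas` distinct backends clockwise from the term's position."""
        start = bisect.bisect_right(self.tokens, ring_hash(term))
        names: list[str] = []
        for step in range(len(self.tokens)):
            name = self.owner[self.tokens[(start + step) % len(self.tokens)]]
            if name not in names:
                names.append(name)
                if len(names) == self.replicas:
                    break
        return [self.by_name[n] for n in names]

    def index(self, doc_id: int, text: str) -> None:
        # Write each term's posting to all of its replicas.
        for term in set(text.lower().split()):
            for backend in self.replicas_for(term):
                backend.postings.setdefault(term, set()).add(doc_id)

    def search(self, query: str) -> set[int]:
        """AND query: intersect the posting lists of all terms, failing over between replicas."""
        result = None
        for term in set(query.lower().split()):
            for backend in self.replicas_for(term):
                try:
                    postings = backend.get_postings(term)
                    break
                except ConnectionError:
                    continue  # this replica is down; try the next one on the ring
            else:
                raise RuntimeError(f"all replicas of {term!r} are down")
            result = set(postings) if result is None else result & postings
        return result or set()


backends = [Backend(f"backend-{i}") for i in range(1, 6)]
front_ends = itertools.cycle([FrontEnd(backends) for _ in range(3)])  # round-robin choice
next(front_ends).index(1, "distributed search engine")
next(front_ends).index(2, "distributed hash ring")
print(next(front_ends).search("distributed search"))  # {1}
```

Since each term's posting list is written to all N replicas, a query still succeeds while at least one replica of every query term is reachable; only when all of them are down does the sketch give up on the query.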
