The traditional and typical search use case is a single large search collection distributed among many nodes and shared by all users. However, there is a class of applications that needs a large number of small or medium collections which can be used, managed, and scaled separately. This talk will cover our effort in helping a client set up a large-scale SolrCloud installation with thousands of collections running on hundreds of nodes. I will describe the bottlenecks that we found in SolrCloud when running a large number of collections. I will also take you through the multiple features and optimizations that we contributed to Apache Solr to reduce or remove the choke points in the system. Finally, I will talk about the benchmarking process and the lessons learned from supporting such an installation in production.
Text of Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
1. Scaling SolrCloud to a large number of Collections Shalin Shekhar Mangar, Lucidworks Inc. email@example.com twitter.com/shalinmangar
2. Apache Solr has a huge install base and tremendous momentum. Solr is the most widely used search solution on the planet: 8M+ total downloads, 250,000+ monthly downloads. Solr is both established and growing, with tens of thousands of applications in production. You use Solr every day. Largest community of developers. 2,500+ open Solr jobs.
3. Solr scalability is unmatched.
box.com (Dropbox for business): 10TB+ index size, 10 billion+ documents, 100 million+ daily requests
4. Solr scalability is unmatched.
5. The traditional search use case
One large index distributed across multiple nodes
A large number of users sharing the data
Searches across the entire cluster
6. Example: Product Catalog
Must search across all products
7. What is SolrCloud?
A subset of optional features in Solr that enable and simplify horizontal scaling of a search index using sharding and replication.
Goals: scalability, performance, high availability, simplicity, and elasticity
8. Terminology
ZooKeeper: Distributed coordination service that provides centralised configuration, cluster state management, and leader election
Node: JVM process bound to a specific port on a machine
Collection: Search index distributed across multiple nodes with the same configuration
Shard: Logical slice of a collection; each shard has a name, hash range, leader, and replication factor. Documents are assigned to one and only one shard per collection using a hash-based document routing strategy
Replica: A copy of a shard in a collection
Overseer: A special node that executes cluster administration commands and writes updated state to ZooKeeper. Automatic failover and leader election.
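To make the hash-based routing strategy concrete, here is a minimal, hedged SolrJ sketch (Solr 4.x-era CloudSolrServer API) of indexing with the default compositeId router, where a route-key prefix such as "user42!" in the document ID keeps all of a tenant user's documents on the same shard. The ZooKeeper addresses, collection name, and field names are hypothetical:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RoutingExample {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper so the client can route documents to shard leaders.
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("tenant_acme"); // hypothetical collection name

        SolrInputDocument doc = new SolrInputDocument();
        // With the compositeId router, the "user42!" prefix is hashed to choose
        // the shard, so all of user42's documents land on the same shard.
        doc.addField("id", "user42!order-1001");
        doc.addField("text", "sample document");
        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}
```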
9. Collection with 2 shards across 4 nodes with replication factor 2
[Diagram: a logstash4solr collection with shard1 and shard2, each having a leader and a replica, spread across four Jetty nodes (ports 8983-8986); a three-node ZooKeeper ensemble provides centralised configuration management and leader election; clients reach the cluster through HTTP APIs (XML/JSON/CSV/PDF, Java/Ruby/Python/PHP). Millions of documents, millions of users.]
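A collection like the one in this diagram could be created through the Collections API. The following is a sketch, not the talk's own code, using SolrJ's generic request mechanism; CREATE, numShards, replicationFactor, and collection.configName are real API parameters, while the names and ZooKeeper addresses are hypothetical:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateCollectionExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");

        // CREATE a collection with 2 shards and replication factor 2,
        // matching the 4-node layout in the diagram above.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("action", "CREATE");
        params.set("name", "logstash4solr");
        params.set("numShards", 2);
        params.set("replicationFactor", 2);
        params.set("collection.configName", "logstash4solr-config"); // hypothetical config name

        QueryRequest request = new QueryRequest(params);
        request.setPath("/admin/collections");
        solr.request(request);
        solr.shutdown();
    }
}
```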
10. "The limits of the possible can only be defined by going beyond them into the impossible" - Arthur C. Clarke
11. The curious case of multi-tenant platforms
Multi-tenant platform for storage and search
Thousands of tenant applications
Each tenant application has millions of users
12. One SolrCloud collection per tenant
Searches are specialised to a user's data or the tenant application's dataset
Some tenants create a lot of data, others very little
Some use CPU-intensive geo-spatial queries, some just perform simple full-text searches and sorting
Some are write-heavy, others read-heavy
Some have text in a different natural language
13. Measure and optimise
Analyze and find missing features
Set up a performance testing environment on AWS
Devise tests for stability and performance
Find bugs and bottlenecks and fix them
14. Problem #1: Cluster state and updates
The SolrCloud cluster state has information about the collections, their shards, and replicas
All nodes and (Java) clients watch the cluster state
Every state change is notified to all nodes
Limited to (slightly less than) 1MB by default
One node bounce triggers a few hundred watcher fires and ZK pulls on a 100-node cluster (three state transitions: down, recovering, active)
15. Solution - Split cluster state and scale
Each collection gets its own state node in ZK
Nodes selectively watch only the states of the collections they are a member of
Clients cache state and use smart cache updates instead of watching nodes (see the sketch below)
http://issues.apache.org/jira/browse/SOLR-5473
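As an illustration of the client-side view, here is a hedged SolrJ sketch that reads the client's cached cluster state for a single collection. The collection name is hypothetical, and the exact layout of per-collection state in ZK depends on the Solr version, since SOLR-5473 evolved across releases:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

public class ClusterStateExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.connect(); // establishes the ZK connection and the state cache

        // The client keeps a cached view of the cluster state and refreshes it
        // lazily rather than watching every collection's state node.
        ClusterState state = solr.getZkStateReader().getClusterState();
        DocCollection coll = state.getCollection("tenant_acme"); // hypothetical
        for (Slice slice : coll.getSlices()) {
            Replica leader = slice.getLeader();
            System.out.println(slice.getName() + " leader: "
                + (leader == null ? "none" : leader.getNodeName()));
        }
        solr.shutdown();
    }
}
```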
16. Problem #2: Overseer performance
Thousands of collections create a lot of state updates
The Overseer falls behind, and replicas can't recover or can't elect a leader
Under high indexing/search load, GC pauses can cause the Overseer queue to back up
17. Solution - Improve the overseer
Harden the overseer code against ZooKeeper connection loss (SOLR-5325)
Optimise polling for new items in the overseer queue (SOLR-5436)
Dedicated overseer nodes (SOLR-5476), as sketched below
New Overseer Status API (SOLR-5749)
Asynchronous execution of collection commands (SOLR-5477, SOLR-5681)
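As a hedged sketch of two of these features: ADDROLE can pin the Overseer role to dedicated nodes (SOLR-5476), and collection commands can be run asynchronously with the async parameter and polled via REQUESTSTATUS (SOLR-5477). The node name, collection name, and request id below are hypothetical:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class OverseerExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");

        // Designate a dedicated overseer node (SOLR-5476).
        ModifiableSolrParams addRole = new ModifiableSolrParams();
        addRole.set("action", "ADDROLE");
        addRole.set("role", "overseer");
        addRole.set("node", "host1:8983_solr"); // hypothetical node name
        QueryRequest roleReq = new QueryRequest(addRole);
        roleReq.setPath("/admin/collections");
        solr.request(roleReq);

        // Run a collection command asynchronously (SOLR-5477) ...
        ModifiableSolrParams create = new ModifiableSolrParams();
        create.set("action", "CREATE");
        create.set("name", "tenant_beta"); // hypothetical
        create.set("numShards", 10);
        create.set("replicationFactor", 3);
        create.set("async", "create-tenant-beta"); // caller-chosen request id
        QueryRequest createReq = new QueryRequest(create);
        createReq.setPath("/admin/collections");
        solr.request(createReq);

        // ... and poll for its completion with REQUESTSTATUS.
        ModifiableSolrParams status = new ModifiableSolrParams();
        status.set("action", "REQUESTSTATUS");
        status.set("requestid", "create-tenant-beta");
        QueryRequest statusReq = new QueryRequest(status);
        statusReq.setPath("/admin/collections");
        System.out.println(solr.request(statusReq));

        solr.shutdown();
    }
}
```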
18. Problem #3: Moving data around
Not all users are born equal: a tenant may have a few very large users
We wanted to be able to scale an individual user's data, maybe even as its own collection
SolrCloud can split shards with no downtime, but it only splits in half
No way to extract a user's data to another collection or shard
19. Solution: Improved data management
Shards can be split on arbitrary hash ranges (SOLR-5300)
Shards can be split by a given key (SOLR-5338, SOLR-5353)
A new MIGRATE API to move a user's data to another (new) collection without downtime (SOLR-5308); see the sketch below
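A hedged sketch of these Collections API calls: SPLITSHARD's ranges parameter and MIGRATE's split.key and target.collection parameters are real, while the collection names and the hash range values are hypothetical:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DataManagementExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");

        // Split shard1 on explicit hex hash ranges (SOLR-5300) instead of in half.
        ModifiableSolrParams split = new ModifiableSolrParams();
        split.set("action", "SPLITSHARD");
        split.set("collection", "tenant_acme"); // hypothetical
        split.set("shard", "shard1");
        split.set("ranges", "80000000-b333332f,b3333330-ffffffff"); // hypothetical ranges
        QueryRequest splitReq = new QueryRequest(split);
        splitReq.setPath("/admin/collections");
        solr.request(splitReq);

        // Migrate one user's documents (by route key) to another collection (SOLR-5308).
        ModifiableSolrParams migrate = new ModifiableSolrParams();
        migrate.set("action", "MIGRATE");
        migrate.set("collection", "tenant_acme");
        migrate.set("split.key", "user42!"); // route key prefix of the large user
        migrate.set("target.collection", "user42_dedicated"); // hypothetical
        QueryRequest migrateReq = new QueryRequest(migrate);
        migrateReq.setPath("/admin/collections");
        solr.request(migrateReq);

        solr.shutdown();
    }
}
```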
20. Problem #4: Exporting data
Lucene/Solr are designed for finding the top-N search results
Trying to export a full result set brings down the system due to high memory requirements as you page deeper
21. Solution - Distributed deep paging
New cursorMark feature for deep paging (SOLR-5463), sketched below
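A minimal sketch of cursorMark-based deep paging with SolrJ, assuming a collection whose uniqueKey field is id (the collection name and ZooKeeper addresses are hypothetical). The sort must be deterministic, so it includes the uniqueKey as a tiebreaker, and iteration stops when the returned cursor mark stops changing:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class DeepPagingExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("tenant_acme"); // hypothetical

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(500);
        // A deterministic sort on the uniqueKey makes the cursor stable.
        q.setSort(SolrQuery.SortClause.asc("id"));

        String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = solr.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // export/process each document here
            }
            String next = rsp.getNextCursorMark();
            if (cursorMark.equals(next)) break; // no more results
            cursorMark = next;
        }
        solr.shutdown();
    }
}
```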
22. JVM Bugs!
"The JVM is completely irresponsible and can only be killed with kill -9" - twitter.com/UweSays
23. Testing scale at scale
Performance goals: 6 billion documents, 4,000 queries/sec, 400 updates/sec, 2-second NRT, sustained performance
5% large collections (50 shards), 15% medium (10 shards), 85% small (1 shard), with a replication factor of 3
Target hardware: 24 CPUs, 126G RAM, 7 SSDs (460G) + 1 HDD (200G)
80% of traffic served by 20% of the tenants
24. How to manage large SolrCloud clusters
Developed the Solr Scale Toolkit
A Fabric-based tool to set up and manage SolrCloud clusters in AWS, complete with collectd and SiLK
Backup/restore from S3. Parallel clone commands.
Open source! https://github.com/LucidWorks/solr-scale-tk
25. Gathering metrics and analysing logs
LucidWorks SiLK (Solr + Logstash + Kibana)
collectd daemons on each host
rabbitmq to queue messages before delivering to Logstash
Initially started with Kafka but discarded it, thinking it was overkill
Not happy with rabbitmq