Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & Ken LaPorte, Bloomberg


<ul><li><p>October 11-14, 2016, Boston, MA</p></li>
<li><p>Building a Vibrant Search Ecosystem @ Bloomberg</p><p>Steven Bower &amp; Ken LaPorte</p><p>Copyright 2016 Bloomberg Finance L.P. All rights reserved.</p></li>
<li><p>3</p><p>Bloomberg</p><p>Largest provider of financial news and information</p><p>Our strength is quickly and accurately delivering data, news and analytics</p><p>Creating high performance and accurate information retrieval systems is core to our strength</p></li>
<li><p>4</p><p>Why are we giving this talk?</p></li>
<li><p>5</p><p>What came before</p><p>Search has been around for a long time at Bloomberg</p><p>- Rapid delivery of product to clients - Proprietary, commercial and open-source search technologies</p><p>Fragmented solutions</p><p>- Disparate search technologies - Custom code - Deployment patterns - Lack of standards</p><p>Costly to maintain &amp; evolve</p></li>
<li><p>6</p><p>How We Got Started</p><p>Created a team to specialize in search</p><p>Reviewed existing applications reliant upon search</p><p>Selected a set of representative applications</p><p>- Various scales - Data types - Distinct requirements</p></li>
<li><p>7</p><p>Why Solr?</p><p>Evaluated other open source search engines</p><p>- Already used at Bloomberg</p><p>Large community &amp; widely used</p><p>Established &amp; growing feature set</p><p>Scalable</p><p>Committed to open source</p><p>- Ability to contribute to core engine - Ability to fix bugs ourselves - Contributions in almost every Solr release since 4.5.0 - 3 Solr committers at the company</p></li>
<li><p>8</p><p>Search as a service</p><p>Designed platform with application teams</p><p>Middleware service to wrap Solr</p><p>- Familiar &amp; lightweight interface - Simplified APIs - Insulate clients from changes in Solr</p><p>Pass-through capability</p><p>Basic monitoring/metrics</p></li>
<li><p>9</p><p>Open for business! 
</p><p>Hundreds of search applications</p><p>- Diverse use cases and scale - Displaced other technologies</p><p>&gt;10 billion documents</p><p>&gt;10 million new documents daily</p><p>&gt;4,000 Solr instances</p><p>Hundreds of servers</p><p>&gt;2,000 queries per second</p><p>Mission critical to Bloomberg and the financial markets</p><p>[Figure: Number of Collections over Time. x-axis: Time, 2012 to 2016; y-axis: Number of Collections, 0 to 300]</p></li>
<li><p>10</p><p>What have we done?!</p><p>Human scaling</p><p>Ineffective alarming</p><p>Manual build process</p><p>- Limited automated testing</p><p>Configuration management</p><p>Lots of known unknowns</p></li>
<li><p>11</p><p>Challenge: Ecosystem</p><p>Ownership</p><p>- Where's the line?</p><p>Planning for scale</p><p>Education</p><p>- Search != Database - Data types (text parsing) - Relevance - Features</p></li>
<li><p>12</p><p>Solution: Ecosystem</p><p>Survey</p><p>- Understand business requirements - Identify scale and complexity - Assist with schema and query design - Concerns</p><p>Develop &amp; Test</p><p>- Best practices - Documentation &amp; code samples - Office hours &amp; support chat - Community development</p></li>
<li><p>13</p><p>Solution: Ecosystem</p><p>Validate &amp; Deploy</p><p>- Hardware provisioning - Automated deployments - Hot &amp; cold collections - Load testing</p><p>Maintain and Grow</p><p>- Applications change &amp; grow - Solr &amp; platform upgrades - Monitoring</p></li>
<li><p>14</p><p>Challenge: Monitoring Solr</p><p>Very large monitoring footprint</p><p>What should we monitor?</p><p>- Ping - Cluster state - Process state - Server health</p><p>False alarms</p><p>- Flutter - Solr can lie to you! 
(SOLR-8599)</p><p>Many different ways to view system health</p><p>- Different people care about different things - Active vs Forensic</p></li>
<li><p>15</p><p>Solution: Monitoring Solr</p><p>Monitor via multiple mechanisms</p><p>Aggregate events</p><p>- Alarm on multiple signals - Delay alarms</p><p>Niteowl</p><p>- Solr / ZooKeeper / Generic - Distributed / Scalable - Events indexed into Solr</p><p>Led to massive stability improvements</p></li>
<li><p>16</p><p>What We Found</p><p>Long Garbage Collections</p><p>- Profiler interactions with Mmap - Young generation pressure during ingest - Use G1GC / Keep heap small</p><p>Long Recovery Times</p><p>- Transaction logs don't hold enough - Always doing full replications when under ingest load</p><p>Solr Bugs</p><p>Out of Memory Exceptions</p><p>- One-off OOMs are not uncommon - Use DocValues! - OOM Killer</p><p>SOLR-9310, SOLR-9207, SOLR-9506: Long recovery times</p><p>SOLR-6931: Random connection reset issues</p><p>SOLR-8085: Replicas get out of sync</p><p>SOLR-8599: ZooKeeper client in inconsistent state</p></li>
<li><p>17</p><p>Challenge: Configuration Management</p><p>Deployment process</p><p>Requires versioning / rollback</p><p>- Some changes cannot be rolled back</p><p>Template driven configuration</p><p>- Good for simple things - Doesn't scale for complex collections</p><p>Lack of provenance</p></li>
<li><p>18</p><p>Solution: Configuration Management</p><p>Convert to SDLC process</p><p>- Configurations live in Git repository - Solr extensions linked as dependencies - Built with Maven / Jenkins - Published to artifact repository</p><p>Validation of configurations during build</p><p>- Static analysis: allowed schema changes, access control of Solr configuration - Integration testing</p><p>Deployed to ZooKeeper / Solr</p></li>
<li><p>19</p><p>Challenge: Infrastructure</p><p>Substantial demand</p><p>Large lead times</p><p>Differing requirements</p><p>- Security - Scale - Control</p><p>Too many pets! 
</p></li>
<li><p>20</p><p>Solution: Infrastructure</p><p>Streamlined process</p><p>Shared and dedicated resources</p><p>Built from the ground up</p><p>- Well defined layers of abstraction - Cattle not pets - Infrastructure-as-code - SDLC / provenance</p><p>Better hardware == better experience</p><p>- SSDs - More RAM - Faster network</p><p>[Figure: layer diagram with the labels Hardware / OS, Control Plane, Applications, APIs]</p></li>
<li><p>21</p><p>What's next?</p><p>Containerization</p><p>- Simplify / decentralize operational procedures - Local testing and development - Security / Metrics / QoS</p><p>Delegation of control</p><p>- Mute / Direct alarms to tenants - Tenant managed</p><p>Detect failures before they happen</p><p>- Heuristics / ML models</p><p>Solr</p><p>- More work on streaming - Analytics: distributed analytics, pivot faceting</p></li>
<li><p>Building a Vibrant Search Ecosystem @ Bloomberg</p><p>QUESTIONS?</p><p>Steven Bower</p><p>Ken LaPorte</p></li></ul>
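
The "Solution: Monitoring Solr" slide lists two ways to cut false alarms: alarm on multiple signals, and delay alarms. The slides do not show how this was implemented, so the following is only a minimal sketch of the general idea, not Bloomberg's monitoring code. The class name `AlarmAggregator`, its parameters, and the signal names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmAggregator:
    """Hypothetical helper: alarm only when several checks fail for long enough."""

    min_signals: int = 2          # assumed: number of distinct failing checks required
    delay_seconds: float = 60.0   # assumed: how long a failure must persist
    _failing_since: dict = field(default_factory=dict)

    def report(self, signal: str, healthy: bool, now: float) -> None:
        """Record the latest result of one check (e.g. ping, cluster state)."""
        if healthy:
            # A recovery clears the signal, so a fluttering check restarts its clock.
            self._failing_since.pop(signal, None)
        else:
            # Keep the time of the first failure in the current failing streak.
            self._failing_since.setdefault(signal, now)

    def should_alarm(self, now: float) -> bool:
        """True when enough signals have been failing continuously for the delay."""
        sustained = [
            name
            for name, since in self._failing_since.items()
            if now - since >= self.delay_seconds
        ]
        return len(sustained) >= self.min_signals
```

With `min_signals=2` and `delay_seconds=60`, a ping check that fails, recovers, and fails again does not raise an alarm on its own; an alarm fires only once two different checks have each been failing for at least a minute.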