Realtime Search at Twitter - Michael Busch

DESCRIPTION

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

At Twitter we serve more than 1.5 billion queries per day from Lucene indexes, while appending more than 200 million tweets per day in realtime. Additionally, we recently launched image, video and relevance search on the same engine. This talk will explain the changes we made to Lucene to support this high load, and the improvements we made in the last year.

Text of Realtime Search at Twitter - Michael Busch

  • 1. Tweets per day
  • 2. Queries per day
  • 3. Indexing latency
  • 4. Avg. query response time
  • 5. Earlybird - Realtime Search @twitter. Michael Busch, @michibusch, michael@twitter.com, buschmi@apache.org
  • 6. Earlybird - Realtime Search @twitter. Agenda: Introduction - Search Architecture - Inverted Index 101 - Memory Model & Concurrency - Top Tweets
  • 7. Introduction
  • 8. Introduction: Twitter acquired Summize in 2008. 1st gen search engine based on MySQL.
  • 9. Introduction: Next gen search engine based on Lucene. Improves scalability and performance by orders of magnitude. Open Source.
  • 10. Realtime Search @twitter. Agenda: Introduction - Search Architecture - Inverted Index 101 - Memory Model & Concurrency - Top Tweets
  • 11. Search Architecture
  • 12. Search Architecture: Ingester. Diagram labels: Tweets, Ingester. The Ingester pre-processes Tweets for search: geo-coding, URL expansion, tokenization, etc.
  • 13. Search Architecture: MySQL. Diagram labels: Tweets, Ingester, Thrift, MySQL Master, MySQL Slaves. Tweets are serialized to MySQL in Thrift format.
  • 14. Earlybird. Diagram labels: Tweets, Ingester, Thrift, MySQL Master, MySQL Slaves, Earlybird, Index. Earlybird reads from MySQL slaves and builds an in-memory inverted index in real time.
  • 15. Blender. Diagram labels: Thrift, Blender, Thrift, Earlybird, Index. Blender is our Thrift service aggregator. It queries multiple Earlybirds and merges the results.
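The merge step that slide 15 attributes to Blender can be illustrated with a small scatter-gather sketch. This is not Twitter's code: it assumes, purely for illustration, that each Earlybird returns its hits as tweet IDs already sorted newest first, and that the aggregator wants the overall newest `limit` hits. The function name `merge_results` and the use of plain integers as tweet IDs are hypothetical.

```python
import heapq
from itertools import islice


def merge_results(per_shard_hits, limit):
    """Merge per-shard hit lists into one list of the top `limit` hits.

    Assumption (not from the slides): every input list is already sorted
    in descending order, i.e. newest tweet ID first.
    """
    # heapq.merge lazily merges already-sorted inputs; reverse=True tells
    # it the inputs are sorted from largest to smallest.
    merged = heapq.merge(*per_shard_hits, reverse=True)
    # Stop as soon as `limit` hits have been produced.
    return list(islice(merged, limit))


# Hypothetical responses from three index servers.
shard_a = [9, 5, 1]
shard_b = [8, 7, 2]
shard_c = [6]
top_hits = merge_results([shard_a, shard_b, shard_c], limit=4)
```

Because the merge is lazy, the aggregator only consumes as many hits from each shard as it needs to fill the requested page.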
  • 16. Realtime Search @twitter. Agenda: Introduction - Search Architecture - Inverted Index 101 - Memory Model & Concurrency - Top Tweets
  • 17. Inverted Index 101
  • 18. Inverted Index 101
       Table with 6 documents:
         1  The old night keeper keeps the keep in the town
         2  In the big old house in the big old gown.
         3  The house in the town had the big old keep
         4  Where the old night keeper never did sleep.
         5  The night keeper keeps the keep in the night
         6  And keeps in the dark and sleeps in the light.
       Example from: Justin Zobel, Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR), v.38 n.2, p.6-es, 2006
  • 19. Inverted Index 101
       Table with 6 documents:
         1  The old night keeper keeps the keep in the town
         2  In the big old house in the big old gown.
         3  The house in the town had the big old keep
         4  Where the old night keeper never did sleep.
         5  The night keeper keeps the keep in the night
         6  And keeps in the dark and sleeps in the light.
       Dictionary and posting lists:
         term    freq
         and     1
         big     2
         dark    1
         did     1
         gown    1
         had     1
         house   2
         in      5
         keep    3
         keeper  3
         keeps   3
         light   1
         never   1
         night   3
         old     4
         sleep   1
         sleeps  1
         the     6
         town    2
         where   1
  • 20. Inverted Index 101. Query: keeper (same table with 6 documents, dictionary and posting lists as slide 19)
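The dictionary on the slides can be reproduced with a few lines of Python. The sketch below is a textbook in-memory inverted index, not Earlybird's or Lucene's implementation. It assumes lowercasing and letters-only tokenization, which is what makes the counts match the "freq" column (the number of documents containing each term). The names `DOCS`, `tokenize`, `build_index`, `search` and `doc_freq` are made up for this example.

```python
import re
from collections import defaultdict

# The six example documents from the slides (Zobel and Moffat, 2006).
DOCS = {
    1: "The old night keeper keeps the keep in the town",
    2: "In the big old house in the big old gown.",
    3: "The house in the town had the big old keep",
    4: "Where the old night keeper never did sleep.",
    5: "The night keeper keeps the keep in the night",
    6: "And keeps in the dark and sleeps in the light.",
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Map each term to its posting list: the sorted IDs of documents containing it."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        # set() ensures a document appears once per term, however often
        # the term occurs inside that document.
        for term in set(tokenize(docs[doc_id])):
            postings[term].append(doc_id)
    return dict(postings)


INDEX = build_index(DOCS)


def search(term):
    """Return the posting list for a single-term query."""
    return INDEX.get(term.lower(), [])


def doc_freq(term):
    """Number of documents containing the term (the 'freq' column)."""
    return len(search(term))


keeper_hits = search("keeper")  # documents 1, 4 and 5 contain "keeper"
```

A single-term query such as `keeper` is then a dictionary lookup followed by a walk over one posting list, with no scan over the documents themselves.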
  • 21. Inverted Index 101 Query: keeper1 The old night keeper keeps the keep in the town term freq2 In the big old house in the big old gown. and 1 big 2 3 The house in the town had the big old keep dark 1 4 Where the old night keeper never did sleep. did 1 5 The night keeper keeps the keep in the night gown 1 6 And keeps in the dark and sleeps in the light. had 1 house 2 Table with 6 documents in 5 keep 3 keeper 3 keeps 3 light