lucenerevolution
Presented by James Atherton, Search Team Lead, 7digital. A case study describing our journey as we implemented Lucene/Solr, the lessons we learned along the way, and where we hope to go in the future. It covers how we implemented our instant search/search suggest, and how we handle indexing 400 million tracks and metadata for over 40 countries, comprising over 300GB of data and about 70GB of indexes.
Implementing Search with Solr at 7digital
James Atherton Content Discovery Team Lead
@mr_road
Who is 7digital?
Online digital content provider
Covering over 47 territories
Online music store: www.7digital.com
API: api.7digital.com
We power a number of music services:
Samsung
Blackberry
Turntable.fm
Pure
Where we came from...
SQL Searches
SELECT *
FROM <table>
WHERE name LIKE '<search_term>%';
This was SLOW and BAD!!
Wrapped Solr in an API
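To illustrate the move away from `LIKE` queries, a prefix search can be sent to Solr's standard `/select` handler instead. This is a minimal sketch and not 7digital's actual API: the host, the core name `artists`, and the field name `name` are assumptions made for the example.

```python
from urllib.parse import urlencode


def build_prefix_query_url(base_url, core, field, term, rows=10):
    """Build a Solr /select URL for a simple prefix search on one field.

    `core` and `field` are hypothetical names. `term` is assumed to be a
    single token with no Solr query special characters.
    """
    params = {
        "q": f"{field}:{term}*",  # trailing * makes this a prefix query
        "rows": rows,             # maximum number of results to return
        "wt": "json",             # ask Solr for a JSON response
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


url = build_prefix_query_url("http://localhost:8983/solr", "artists", "name", "beat")
print(url)
```

Keeping URL construction in one function is one way a thin search API could hide Solr details from its callers.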
Old Architecture
(Diagram components: API, DB)
Domain Objects
Artist Documents
Release Documents (e.g. album or single)
Track Documents
First Attempt - 2011
• Artists and Releases
• Solr 1.4
• 17 stores
• ~40GB
• Dropped DIH as it had issues
2011 Architecture
(Diagram components: HTTP, API, Search API, Solr, DB; data labels: Tracks, Artists, Releases)
2012
• Added Tracks Core
• Solr 3.5
• 47 stores
• ~400GB
• More than 430M docs
• Didn't revisit DIH
Current Architecture
(Diagram components: HTTP, API, Search API, two Artist/Release Solrs groups, four Track Solrs groups)
Things Learnt
We should have split by <X>; for us, that is shops.
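Splitting by shop could look something like the routing sketch below. It is only an illustration of the idea under stated assumptions: the shop codes, host names, and the fallback instance are all hypothetical, and the talk does not describe how such a split would actually be built.

```python
# Hypothetical mapping from shop (territory) code to the Solr instance
# that holds that shop's index.
SHARD_BY_SHOP = {
    "GB": "http://solr-gb.example.internal:8983/solr",
    "US": "http://solr-us.example.internal:8983/solr",
}

# Hypothetical shared instance for shops without a dedicated index.
DEFAULT_SHARD = "http://solr-other.example.internal:8983/solr"


def solr_base_for_shop(shop_code):
    """Return the Solr base URL that serves the given shop's index."""
    return SHARD_BY_SHOP.get(shop_code.upper(), DEFAULT_SHARD)


print(solr_base_for_shop("gb"))
print(solr_base_for_shop("FR"))
```

With this shape, a large shop can be moved to its own hardware by changing one mapping entry.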
Beware Inflection Points
Data size: 400GB != 40GB * 10
Throughput: 600 rpm IS NOT 4 * 150 rpm
What do we want in our servers?
RAM?
Fast Disks?
CPUs?
Virtual?
Bare Metal?
Optimize really...?
Cache Warming/First search?
Testing
Test ingestion/data import, then test again
Your data is not as clean as you think
Load test early and often
We need to be better at this still
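One way to act on "your data is not as clean as you think" is to check every document before it is sent for indexing. The sketch below assumes a hypothetical schema with required fields `id`, `title`, and `shop`; the real document fields are not given in the talk.

```python
# Hypothetical required fields; the real schema is not described in the talk.
REQUIRED_FIELDS = ("id", "title", "shop")


def validate_document(doc):
    """Return a list of problems found in one document (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = doc.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing or empty field: {field}")
    for key, value in doc.items():
        if isinstance(value, str) and value != value.strip():
            problems.append(f"leading/trailing whitespace in field: {key}")
    return problems


def partition_documents(docs):
    """Split documents into clean ones and (document, problems) rejects."""
    clean, rejected = [], []
    for doc in docs:
        problems = validate_document(doc)
        if problems:
            rejected.append((doc, problems))
        else:
            clean.append(doc)
    return clean, rejected


docs = [
    {"id": "1", "title": "Blue", "shop": "GB"},
    {"id": "2", "title": " ", "shop": "GB"},
]
clean, rejected = partition_documents(docs)
print(len(clean), len(rejected))
```

Logging the rejected documents with their reasons makes repeated ingestion test runs easier to compare.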
Logs
Logging is worth its weight in gold
But don't get weighed down
Monitoring
We use statsd/Graphite and New Relic.
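For context, statsd accepts plain-text metrics over UDP in the form `name:value|type`, where the type is `c` for counters, `ms` for timers, and `g` for gauges. The sketch below sends such metrics without a client library. The metric names are hypothetical, and the host and port assume a statsd daemon on its default port 8125.

```python
import socket


def format_metric(name, value, metric_type):
    """Format one metric in the statsd line protocol: name:value|type."""
    return f"{name}:{value}|{metric_type}"


def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Send one metric to a statsd daemon over UDP (fire and forget)."""
    payload = format_metric(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric names for a search service.
print(format_metric("search.tracks.requests", 1, "c"))
print(format_metric("search.tracks.query_time", 42, "ms"))
```

Because UDP is fire-and-forget, a slow or missing metrics server does not hold up search requests.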
Visualise Indexing
Which territory's data has been indexed?
Instant Search
Magic Deploys
We recently adopted CFEngine; it is awesome!!
The Future
(Diagram components: HTTP, API, Search API, Artist Solrs, Release Solrs, four Track Solrs groups)
Solr Cloud, in the Cloud??
Questions?
James Atherton
@mr_road
@7digital
We are recruiting, please talk to me afterwards.
Resources
https://github.com/etsy/statsd/
https://github.com/7digital
http://d3js.org/