Solr & Lucene at Etsy
Gregg Donovan, Technical Lead, Search

Gregg Donovan @Lucenerevolution 2011

1.5 years Solr & Lucene at Etsy.com

3 years Solr & Lucene at TheLadders.com

8+ million members

9.3 million items

800k+ active sellers

1+ billion pageviews / month

Maximize Solr out-of-the-box

Hack at a low-level

Know when to do each

Or

Don’t fear trunk

builds.apache.org/job/Solr-trunk/changes

http://localhost:8393/solr/placesuggest/select?q={!lucene}s*&sfield=latlong&pt=37.595804,-122.364521&sort=div(geodist(),sqrt(sum(population,50)))%20asc
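
The sort expression orders suggestions by distance divided by the square root of population plus 50, ascending, so a large city can outrank a nearer small town. A minimal Java sketch of the same arithmetic, run outside Solr; the class name and the sample distances and populations are made up for illustration:

```java
// Illustrative only: mirrors div(geodist(),sqrt(sum(population,50))) in plain Java.
final class PlaceSortKey {
    // Smaller keys sort first because the query sorts ascending.
    static double sortKey(double distanceKm, double population) {
        return distanceKm / Math.sqrt(population + 50);
    }

    public static void main(String[] args) {
        double town = sortKey(10, 50);        // hypothetical town, 10 km away
        double city = sortKey(100, 999_950);  // hypothetical city, 100 km away
        System.out.println("town key = " + town + ", city key = " + city);
        System.out.println(city < town ? "city sorts first" : "town sorts first");
    }
}
```

The constant 50 keeps the denominator away from zero for places with little or no recorded population.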

{!lucene}

{!field}

{!func}

{!dismax}

{!edismax}

{!boost}

{!term}

Cheap ranking awesomeness

ExternalFileField ftw!

schema.xml:
<fieldType name="file" keyField="treasury_id" defVal="0" stored="false" indexed="true" class="solr.ExternalFileField" valType="float"/>
<field name="hotness" type="file"/>

/search/data/treasury/external_hotness.1306390802088:
1=2.3
2=1.7
3=1.1

Solr query:
sort={!func}hotness+desc
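
The external file holds one key=value line per document key. A minimal sketch of rendering such a file from scores computed elsewhere; the class name, the method name, and the sample scores are assumptions for illustration, and writing the result to disk is left out:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: renders "key=value" lines for an ExternalFileField data file.
final class ExternalHotnessFile {
    static String render(Map<Integer, Float> scores) {
        StringBuilder sb = new StringBuilder();
        // Copy into a TreeMap so the lines come out in ascending key order.
        for (Map.Entry<Integer, Float> e : new TreeMap<>(scores).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Float> scores = new TreeMap<>();
        scores.put(1, 2.3f);
        scores.put(2, 1.7f);
        scores.put(3, 1.1f);
        System.out.print(render(scores));
    }
}
```

Documents missing from the file fall back to the defVal declared on the field type, so the file only needs to list keys with a non-default score.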

ExternalFileField caveats

More relevance: boost query

http://localhost:8983/solr/listings/select?q={!boost b=$rel v=$qq}&rel=category:furniture^10+OR+((-material:acrylic)^5)&qq=desk
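
In this request, {!boost} multiplies the score of the query in v by the value of b, and $rel and $qq dereference the rel and qq request parameters. A minimal sketch of assembling such a URL with each value percent-encoded; the class and method names are hypothetical, and the base URL is copied from the slide:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building the boosted request shown above.
final class BoostQueryUrl {
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static String build(String base, String userQuery, String boost) {
        return base
            + "?q=" + encode("{!boost b=$rel v=$qq}")
            + "&rel=" + encode(boost)
            + "&qq=" + encode(userQuery);
    }

    public static void main(String[] args) {
        System.out.println(build(
            "http://localhost:8983/solr/listings/select",
            "desk",
            "category:furniture^10 OR ((-material:acrylic)^5)"));
    }
}
```

Keeping the user's text in its own qq parameter means the boost clause can change without touching how the user query is parsed.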

Impression tracking

etsy.com/search?q=desk&explain=1

Side-by-Side testing

Cheap performance wins

Put off sharding till you must

cat ${indexDir}/* > /dev/null

Return IDs, minimize stored fields

RAM: $10-20 / GB

SSD: 0.1ms vs 10ms seek

Custom?

solr-user

Tools for low-level hacking

Continuous deployment

One button. So easy a dog could do it.

MTTR > MTBF

github.com/etsy/logster

Tracking GC

export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -Xloggc:/var/log/search/gc.log"
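
With -XX:+PrintGCApplicationStoppedTime, HotSpot JVMs of this era write lines containing "Total time for which application threads were stopped: N seconds" to the GC log. A minimal sketch of pulling the longest pause out of such lines; the class name and the sample lines are made up, and the pattern assumes a dot as the decimal separator:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner: finds the longest stop-the-world pause in GC log lines.
final class GcPauseSummary {
    private static final Pattern STOPPED = Pattern.compile(
        "application threads were stopped: ([0-9]+\\.[0-9]+) seconds");

    static double maxPauseSeconds(List<String> lines) {
        double max = 0.0;
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) { // find() tolerates timestamp prefixes and trailing text
                max = Math.max(max, Double.parseDouble(m.group(1)));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Total time for which application threads were stopped: 0.0012000 seconds",
            "Total time for which application threads were stopped: 0.2500000 seconds");
        System.out.println("max pause (s): " + maxPauseSeconds(sample));
    }
}
```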

Alerting

Testing

SaveAsFixture

Profiling

Java Primitive Library

fastutil

trove4j

Know the hooks

QParserPlugin

SolrEventListener

SolrRequestHandler

SearchComponent

SolrCache

ValueSourceParser

SolrIndexSearcher gotchas

reference counting

using it as a cache key:

WeakHashMap<SolrIndexSearcher,MyValue> myCache...
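
A WeakHashMap lets the per-searcher entry become collectable once nothing else holds the searcher. A minimal sketch of that pattern with a generic key standing in for SolrIndexSearcher; the class name and the loader function are assumptions, not code from the talk:

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical per-searcher cache; S stands in for SolrIndexSearcher.
final class PerSearcherCache<S, V> {
    private final Map<S, V> cache = new WeakHashMap<>();
    private final Function<S, V> loader;

    PerSearcherCache(Function<S, V> loader) {
        this.loader = loader;
    }

    // Computes the value once per live key; synchronized because
    // WeakHashMap is not thread-safe.
    synchronized V get(S searcher) {
        V value = cache.get(searcher);
        if (value == null) {
            value = loader.apply(searcher);
            cache.put(searcher, value);
        }
        return value;
    }

    public static void main(String[] args) {
        PerSearcherCache<Object, String> cache =
            new PerSearcherCache<>(s -> "value for " + System.identityHashCode(s));
        Object searcher = new Object();
        System.out.println(cache.get(searcher).equals(cache.get(searcher)));
    }
}
```

The cached value must not hold a strong reference back to its key, otherwise the entry can never be cleared.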

Example: personalized collections

fq={!term f=id}123 OR {!term f=id}456

Need a map of PK to docId

Use custom SolrCache plus SolrEventListener to fill it
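
With a primary-key-to-docId map built once per searcher, a long list of ids can become a bit set with one lookup per id instead of one term query per id. A minimal sketch of that lookup step in plain Java; the class name is hypothetical, boxed java.util collections stand in for a primitive map, and how the map gets filled (the SolrCache and SolrEventListener part) is not shown:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical filter builder: turns primary keys into a docId bit set.
final class PkDocIdFilter {
    private final Map<Long, Integer> pkToDocId;
    private final int maxDoc;

    PkDocIdFilter(Map<Long, Integer> pkToDocId, int maxDoc) {
        this.pkToDocId = pkToDocId;
        this.maxDoc = maxDoc;
    }

    // Unknown primary keys are skipped rather than treated as errors.
    BitSet bitsFor(long[] primaryKeys) {
        BitSet bits = new BitSet(maxDoc);
        for (long pk : primaryKeys) {
            Integer docId = pkToDocId.get(pk);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<Long, Integer> map = new HashMap<>();
        map.put(123L, 0);
        map.put(456L, 7);
        PkDocIdFilter filter = new PkDocIdFilter(map, 10);
        System.out.println(filter.bitsFor(new long[] {123L, 456L, 999L}));
    }
}
```

DocIds change whenever a new searcher opens, so the map has to be rebuilt per searcher rather than shared across them.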

github.com/giokincade/FastTermFilter

i18n currency sorting and filtering

currency.xml:

<currencyConfig version="1.0">
  <currencies>
    <currency name="United States Dollar" symbol="$" code="USD"/>
    <currency name="Australian Dollar" symbol="$" code="AUD"/>
    <currency name="Canadian Dollar" symbol="$" code="CAD"/>
    <currency name="Czech Koruna" symbol="Kč" code="CZK"/>
    ...
  </currencies>
  <rates>
    <rate from="USD" to="AUD" rate="1.168750"/>
    <rate from="USD" to="CAD" rate="1.085000"/>
    <rate from="USD" to="CZK" rate="20.107500"/>
    <rate from="USD" to="DKK" rate="5.323750"/>
    ...
  </rates>
</currencyConfig>
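
Every rate in the file is quoted from USD, so any pair can be converted by going through USD. A minimal sketch of that arithmetic using the four rates shown; the class name is hypothetical, double is used only for brevity, and how the real field type applies the rates is not shown in the slides:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical converter built on USD-based rates like those in currency.xml.
final class CurrencyConverter {
    private final Map<String, Double> usdRates = new HashMap<>();

    CurrencyConverter() {
        usdRates.put("USD", 1.0);
        usdRates.put("AUD", 1.168750);
        usdRates.put("CAD", 1.085000);
        usdRates.put("CZK", 20.107500);
        usdRates.put("DKK", 5.323750);
    }

    // Converts via USD: divide by the source rate, multiply by the target rate.
    double convert(double amount, String from, String to) {
        Double fromRate = usdRates.get(from);
        Double toRate = usdRates.get(to);
        if (fromRate == null || toRate == null) {
            throw new IllegalArgumentException("Unknown currency: " + from + " or " + to);
        }
        return amount / fromRate * toRate;
    }

    public static void main(String[] args) {
        CurrencyConverter converter = new CurrencyConverter();
        System.out.println(converter.convert(10.0, "USD", "AUD"));
        System.out.println(converter.convert(100.0, "CZK", "DKK"));
    }
}
```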

price:[10.00USD TO 50.00USD]

price:20.00EUR

price:[$10.00 TO $50.00]
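
Each bound in these queries carries an amount plus an optional currency marker. A minimal sketch of parsing such a value into an amount and a currency code; MoneyAmount is a stand-in invented here, not the MoneyValue class from the talk, and a bare "$" falls back to the default currency because several currencies share that symbol:

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value type: an amount plus an ISO-style three-letter currency code.
final class MoneyAmount {
    // Optional leading "$", a decimal number, then an optional three-letter code.
    private static final Pattern FORMAT =
        Pattern.compile("^\\$?([0-9]+(?:\\.[0-9]+)?)([A-Z]{3})?$");

    final BigDecimal amount;
    final String currencyCode;

    private MoneyAmount(BigDecimal amount, String currencyCode) {
        this.amount = amount;
        this.currencyCode = currencyCode;
    }

    static MoneyAmount parse(String text, String defaultCurrency) {
        Matcher m = FORMAT.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse money value: " + text);
        }
        String code = m.group(2) != null ? m.group(2) : defaultCurrency;
        return new MoneyAmount(new BigDecimal(m.group(1)), code);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"10.00USD", "20.00EUR", "$10.00"}) {
            MoneyAmount value = parse(s, "USD");
            System.out.println(s + " -> " + value.amount + " " + value.currencyCode);
        }
    }
}
```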

MoneyFieldType.java:

@Override
public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2,
                           final boolean minInclusive, final boolean maxInclusive) {
    final MoneyValue p1 = MoneyValue.parse(part1, defaultCurrency);
    final MoneyValue p2 = MoneyValue.parse(part2, defaultCurrency);

    if (!p1.getCurrencyCode().equals(p2.getCurrencyCode())) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            new ParseException("Cannot parse range query " + part1 + " to " + part2
                + ": range queries only supported when upper and lower bound have same currency."));
    }

    String currencyCode = p1.getCurrencyCode();
    final MoneyValueSource vs = new MoneyValueSource(field, currencyCode, parser);

    return new SolrConstantScoreQuery(
        new ValueSourceRangeFilter(vs, p1.getAmount() + "", p2.getAmount() + "",
                                   minInclusive, maxInclusive));
}

Replication gotcha

SOLR-2202

Related Searches

Autosuggest!

bjewlery dewelry ejewelry ejwelry ewelery ewerly ewlery fewelry fewlery fjewelery fjewelry gewerly gewlery hewelery hewelry hewerly hewlery hjewelry iewelry ijewelry jawelery jawlery jeawlery jeelery jeelry jeewelery jeewelry jeewlery jeewlry jefwelry jejelry jelelry jelery jellery jelwelery jelwelry jelwlery jemelry jemerly jemwelry jeqwelry jerelery jerelry jerely jererly jerlery jerwelery jerwelry jerwely jerwerly jeselery jeselry jevelry jeverly jewalery jewdelry jewedlry jeweelrry jeweelry jeweely jeweer jeweery jeweilry jeweiry jewejery jewejlry jewejrly jewejry jewekey jewekry jewelary jeweldy jewele jewelee jewelelry jewelera jewelerey jewelerly jewelert jewelerty jeweleru jeweleruy jeweleryl jewelerys jeweleryy jewelet jewelety jeweleya jewelfry jewelfy jeweliy jewellryp jewelltry jewelly jewelory jewelra jewelray jewelre jewelree jewelreyy jewelrfy jewelrh jewelri jewelrky jewelrly jewelrr jewelrs jewelrsy jewelrt jewelrty jewelru jewelruy jewelrye jewelryh jewelryl jewelrym jewelryr jewelrys jewelryt jewelryu jewelryuk jewelryy jewelrz jewelsry jewelsy jeweltry jewelty jewelw jewelwery jewelwey jewelwy jewelya jewelyj jewelyr jewelyry jewelyu jewelyy jewelzry jeweory jewerey jeweriy jewerky jewerlary jewerley jewerli jewerlly jewerls jewerlt jewerlu jewerlyh jewerlyr jewerlys jewerlyu jewerry jeweryl jewetry jewewlry jewewly jewewrly jewewry jeweylry jewiery jewilary jewkery jewlary jewledy jewleery jewlelery jewlely

The TermDictionary is not a whitelist
