Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search
Toan Vinh Luu, PhD Senior Search Engineer local.ch AG
In this talk
• Autosuggestion feature
• Autosuggestion architecture
• Evaluation
local.ch
• Local search engine in Switzerland (web, mobile)
• Each month:
– > 4 million unique users
– > 8 million queries on mobile (iOS, Android, …)
• Users search for:
– Services (e.g. “restaurant zurich”)
– Resident information (e.g. “peter meier”)
– Phone numbers (e.g. 0800 86 80 86)
– Addresses, points of interest
– ...
Why is autosuggestion important?
User taps on the phone 8 times instead of 34 times to get to the result list when searching for “Electric installation Wallisellen”
What should we suggest to the user?
Popular data suggestion
Popular queries suggestion
>2000 queries/month for “cablecom”, which has only 1 entry
“mc donalds” has fewer entries than “muller” but is queried more than 10x as often
Query history suggestion
• 9% of mobile queries are historical queries.
• 38% of users search with a query they have used in the past
Spellchecker suggestion >700’000 mistakes per month on mobile (9%)
Detail entry suggestion
Special information suggestion
Autosuggestion Architecture
[Architecture diagram. Labels: Autosuggest API/Search API; SuggestData component; Query history component; Popular query component; Spellchecker component; Index (×5); Query log; Popular query processor; Local.ch Database]
How do we process “popular queries”?
• Popular is not just high frequency!
• User’s language – 4 languages are used in Switzerland. Fail if we suggest “bäckerei” to a French-speaking user
• Location – Fail if we suggest a hospital in Zurich to a user in Geneva
• Misspellings – Fail if we suggest both “zürich” and “züruch”
• Unique users – Fail if we suggest “toan” just because I searched for my name thousands of times
• Blacklist – Fail if we suggest “f**k”, “pe**is”
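To illustrate the unique-users criterion, here is a minimal sketch that counts distinct users per query instead of raw frequency. It works on an in-memory list; the item fields `q` and `user` follow the query log format used in this talk, while the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def unique_users_per_query(log_items):
    """Count distinct users for each query string (exact, in memory)."""
    users = defaultdict(set)
    for item in log_items:
        users[item["q"]].add(item["user"])
    return {q: len(u) for q, u in users.items()}


# Hypothetical log items: one user repeats "toan" a thousand times,
# while three different users search for "restaurant".
log = [{"q": "toan", "user": "u1"}] * 1000 + [
    {"q": "restaurant", "user": "u1"},
    {"q": "restaurant", "user": "u2"},
    {"q": "restaurant", "user": "u3"},
]
counts = unique_users_per_query(log)
```

Ranking by `counts` instead of raw frequency keeps a single heavy user from pushing a private query into the suggestions.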
Popular query processor
• Preprocessing the query log:
– Text normalization, stopword removal, blacklist filtering, keeping only queries that return results…
• A query log item in the Elasticsearch index:
{
  "q": "restaurant",
  "language": "de",
  "lon": 8.50646,
  "lat": 47.4192,
  "datetime": "2014-06-02 11:10:07",
  "user": "eeaad0c09abc41676c1c99530693"
}
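The preprocessing steps listed above could be sketched roughly as follows. This is only an illustration under assumptions: the stopword list, the blacklist, and the `result_count` parameter (the number of results the query returned) are hypothetical, since the talk does not show the actual implementation.

```python
import unicodedata

# Hypothetical word lists; the real ones are not given in the talk.
STOPWORDS = {"in", "the", "de", "la"}
BLACKLIST = {"badword"}


def normalize(text):
    """Normalize Unicode composition and lowercase the text."""
    return unicodedata.normalize("NFC", text).lower()


def preprocess(item, result_count):
    """Return a cleaned copy of a log item, or None if it should be dropped."""
    if result_count == 0:
        return None  # keep only queries that returned results
    # split() without arguments also collapses repeated whitespace
    tokens = [t for t in normalize(item["q"]).split() if t not in STOPWORDS]
    if not tokens or any(t in BLACKLIST for t in tokens):
        return None
    return {**item, "q": " ".join(tokens)}


cleaned = preprocess({"q": "  Restaurant   in Zürich ", "language": "de"}, 5)
```

Items for which `preprocess` returns `None` would simply not be indexed.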
Find candidate popular queries for each language
{
  "query": {
    "query_string": {
      "query": "language:%s AND date:[%s TO %s] AND -q.untouched:/[0].*/" % (language, fromDate, toDate)
    }
  },
  "aggs": {
    "q": { "terms": { "field": "q.untouched", "size": TOP_POPULAR } }
  }
}
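The request above mixes JSON with Python string formatting. A minimal sketch of building the same body as a plain Python dictionary is shown here; the value of `TOP_POPULAR`, the function name, and the sample dates are assumptions made for illustration.

```python
import json

TOP_POPULAR = 1000  # hypothetical value; the talk does not state the real size


def candidate_query(language, from_date, to_date, size=TOP_POPULAR):
    """Build the request body for the terms aggregation on q.untouched."""
    query_string = (
        "language:%s AND date:[%s TO %s] AND -q.untouched:/[0].*/"
        % (language, from_date, to_date)
    )
    return {
        "query": {"query_string": {"query": query_string}},
        "aggs": {"q": {"terms": {"field": "q.untouched", "size": size}}},
    }


# Sample dates are illustrative only.
body = candidate_query("de", "2014-05-01", "2014-05-31")
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with `json.dumps` avoids quoting mistakes that are easy to make when the JSON is assembled by hand.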
Find number of unique users given a query
{
  "query": {
    "query_string": {
      "query": "q.untouched:%s AND date:[%s TO %s]" % (query, fromDate, toDate)
    }
  },
  "aggs": {
    "num_users": { "cardinality": { "field": "user" } }
  }
}
Bounding box to limit popular queries given location
[Figure: histogram of query frequency (y-axis 0 to 300) by longitude (x-axis 5.95 to 10.45)]
90% Popular query: Chuv (Centre Hospitalier Universitaire Vaudois)
[Figure: query distribution by latitude (45.81 to 47.77) and longitude (5.95 to 10.45)]
Histogram of query “chuv” based on freq, longitude and latitude
46.5243,6.6397
46.52,6.63
46.53,6.64
Percentiles aggregation to find the min and max values of the querying location
{
  "query": { "match": { "q": { "query": "chuv" } } },
  "aggs": {
    "lat_outlier": { "percentiles": { "field": "lat", "percents": [5, 95] } },
    "lon_outlier": { "percentiles": { "field": "lon", "percents": [5, 95] } }
  }
}
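The idea behind the percentile-based bounding box can be sketched locally in Python. This is an exact nearest-rank computation on an in-memory list, whereas the Elasticsearch percentiles aggregation returns approximate values, so the numbers may differ slightly. The function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def bounding_box(points, low=5, high=95):
    """Trim outliers by keeping the [low, high] percentile range per axis.

    points is a list of (lat, lon) tuples where the query was issued.
    """
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "min_lat": percentile(lats, low),
        "max_lat": percentile(lats, high),
        "min_lon": percentile(lons, low),
        "max_lon": percentile(lons, high),
    }
```

Cutting at the 5th and 95th percentiles on each axis discards the few users who search for a local query from far away, so the box covers the area where the query is actually popular.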
Popular query stored in Solr index
{
  "q": "chuv",
  "lang": ["de", "fr", "en"],
  "users": 7435,
  "min_lat": 46.2245,
  "max_lat": 46.9909,
  "min_lon": 6.29637,
  "max_lon": 7.3332,
  "freq": 9524
}
Solr request to suggest popular query
"q:ch* lang:en users:[100 TO *] min_lat:[* TO " + user_lat + "] min_lon:[* TO " + user_lon + "] max_lat:[" + user_lat + " TO *] max_lon:[" + user_lon + " TO *]&sort=freq desc"
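A minimal sketch of assembling this request in Python follows, together with a local check that mirrors the four range clauses. Two assumptions are made here: all clauses are required and are therefore joined with AND, and the typed prefix contains no characters that need escaping in Solr query syntax. The function names are hypothetical.

```python
from urllib.parse import urlencode


def suggest_params(prefix, lang, user_lat, user_lon, min_users=100):
    """Build query parameters for the popular-query suggestion request."""
    clauses = [
        "q:%s*" % prefix,
        "lang:%s" % lang,
        "users:[%d TO *]" % min_users,
        "min_lat:[* TO %s]" % user_lat,
        "min_lon:[* TO %s]" % user_lon,
        "max_lat:[%s TO *]" % user_lat,
        "max_lon:[%s TO *]" % user_lon,
    ]
    # Assumption: every clause must match, so they are joined with AND.
    return {"q": " AND ".join(clauses), "sort": "freq desc"}


def box_contains(doc, user_lat, user_lon):
    """Local equivalent of the four range clauses on the bounding box."""
    return (doc["min_lat"] <= user_lat <= doc["max_lat"]
            and doc["min_lon"] <= user_lon <= doc["max_lon"])


# Bounding box of the "chuv" document shown earlier in this talk.
chuv = {"min_lat": 46.2245, "max_lat": 46.9909,
        "min_lon": 6.29637, "max_lon": 7.3332}
params = suggest_params("ch", "en", 46.52, 6.63)
query_string = urlencode(params)
```

The range clauses express "the user's position lies inside the stored box": the box minimum must be at or below the user's coordinate and the box maximum at or above it.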
Evaluation
• Several metrics are used to evaluate the autosuggestion feature
– Number of typed characters to get to the result list
  • Average length of input: 10.0 chars
  • Average length of clicked suggestion: 15.4 chars
– Number of clicks on suggested items
– Average rank of clicked item
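As a rough illustration of how such metrics could be computed, the sketch below summarizes a list of suggestion click events. It assumes each event records the text typed before the click, the clicked suggestion, and its rank in the list; the field names and sample events are hypothetical and are not data from the talk.

```python
def summarize(events):
    """Compute the evaluation metrics from a non-empty list of click events."""
    n = len(events)
    return {
        "clicks": n,
        "avg_input_len": sum(len(e["typed"]) for e in events) / n,
        "avg_suggestion_len": sum(len(e["clicked"]) for e in events) / n,
        "avg_rank": sum(e["rank"] for e in events) / n,
    }


# Hypothetical click events; field names are illustrative only.
events = [
    {"typed": "rest", "clicked": "restaurant zurich", "rank": 1},
    {"typed": "chu", "clicked": "chuv", "rank": 2},
]
stats = summarize(events)
```

The gap between `avg_suggestion_len` and `avg_input_len` approximates how many characters a click on a suggestion saves the user.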
[Figure: Number of clicks on suggested items since new feature release; annotation: release date]
[Figure: Average rank of clicked item; annotation: release of new query suggestion]
Conclusion
• We can combine 2 search frameworks to bring a better search experience to users:
– Solr is efficient for querying, faceting and caching
– Elasticsearch is efficient for big data aggregation and query log storage
Contact information
• Search team at local.ch – [email protected] – [email protected] – [email protected]
• We are hiring a search engineer! – Contact: [email protected]