Developing Scalable User Search for PlayStation 4
Ai Sasho, Sr. Software Engineer, Sony Interactive Entertainment
©2016 Sony Interactive Entertainment
About My Team
§ Developing social features for PS4 to improve social gaming experiences.
§ Worked on User Search and Players You May Know recommendation features.
§ Server side: Isaias, Marlon, Pavan, Chris, Janhavi, Xifan, Venkat
§ Client side: Tomas, Nythya, Max, Yukio, Katsuya, Eric, Tong
Sony Interactive Entertainment = Sony Network Entertainment Int'l + Sony Computer Entertainment = Greatness Awaits!
Outline
§ User Search Feature Overview
§ SolrCloud Setup
§ Personalized Search: Lucene + SolrCloud
§ Challenges
§ Solr 4.8 to 5.4 Upgrade
User Search
§ Fast
• Queries should return in < 100 ms.
§ Reliable / Fault Tolerant
§ Scalable
• The SolrCloud cluster needs to handle:
o Up to 1000 RPS of query requests
o Up to 250 RPS of indexing requests
• Approx. 300 million documents
§ Rank search results by friendship.
• Up to n degrees of separation.
• Friends, 2nd-degree friends (friends of friends), etc.
User Search: Requirements
SolrCloud: System Architecture
[Figure: architecture diagram. Components: Application Servers, Database, ELB, ZooKeeper, and a SolrCloud cluster of four shards, each with a Leader and Replicas.]
§ SolrCloud 5.4
§ Documents
• User data (~1.5 KB per user)
• ID, Online ID, Name (First, Middle, Last), Privacy, User Type, etc.
• ~300 million documents
§ Shards
• 4 shards + many replicas.
• Number of shards determined experimentally.
• Most of the docs on each shard fit in memory.
§ Cache
• Query Result Cache, Document Cache, Filter Cache, etc.
§ Commit
• Soft auto commit: 5 secs
• Auto commit: 15 mins (openSearcher=false)
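As a rough check on the sizing figures above, the raw data volume per shard can be estimated from the document count and per-user size. This is a back-of-the-envelope sketch only: it assumes "1.5 kb" means kilobytes, that documents are spread evenly across the 4 shards, and it ignores index overhead (the actual index size, especially with n-grams, may differ considerably).

```python
# Back-of-the-envelope sizing from the figures on the slide.
DOCS = 300_000_000        # ~300 million user documents
DOC_SIZE_KB = 1.5         # ~1.5 KB per user (assumed to mean kilobytes)
SHARDS = 4                # assumes documents are evenly distributed

total_gb = DOCS * DOC_SIZE_KB / 1_000_000   # KB -> GB (decimal units)
per_shard_gb = total_gb / SHARDS

print(f"total raw data: {total_gb:.1f} GB")
print(f"per shard:      {per_shard_gb:.1f} GB")
```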
SolrCloud: Configurations
§ Tokenizers
• Whitespace Tokenizer
§ Filters
• ASCII Folding Filter
o Stored and queried with the equivalent English alphabet characters.
o Joan Miró -> Joan Miro
• N-Gram Filter
o abc -> a, b, c, ab, bc, abc
o Takes up more space, but is faster than a wildcard (*) query.
• Lower Case Filter
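To make the analysis chain above concrete, here is a minimal Python sketch that imitates it outside of Solr. It is an approximation, not Solr's implementation: the folding step only strips combining accents (Solr's ASCII folding filter covers more characters), and the gram sizes of 1 to 3 are an assumption inferred from the "abc" example.

```python
import unicodedata

def ascii_fold(text: str) -> str:
    """Approximate ASCII folding: strip combining accents (Miró -> Miro)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def ngrams(token: str, min_gram: int = 1, max_gram: int = 3) -> list[str]:
    """All substrings of length min_gram..max_gram, shortest first."""
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams

def analyze(text: str) -> list[str]:
    """Whitespace tokenize -> ASCII fold -> n-gram -> lowercase."""
    terms = []
    for token in text.split():
        for gram in ngrams(ascii_fold(token)):
            terms.append(gram.lower())
    return terms

print(ngrams("abc"))
print(analyze("Joan Miró"))
```

Because every gram is indexed as its own term, a partial input such as "mir" becomes an exact term lookup instead of a wildcard scan, which is the space-for-speed trade-off the slide describes.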
SolrCloud: Configurations
§ People search for users they know, or kind of know...
§ Search results should be ranked by the friendship between the searcher and the users being searched.
Personalized Search: Overview
User A <- Friend (1st degree of separation)
User B <- Friend (1st degree of separation)
User C <- Friend of Friend (2nd degree of separation)
...
User Y <- Not associated.
User Z <- Not associated.
Personalized Search: Ideas
Possible Solution 1: Query SolrCloud with the list of friend IDs.
Give a higher boost to users who are closer to the caller.
q=ps4king&
bq=friends:(ID1 OR ID2 OR ID3 OR …)^500&
bq=friends2nd:(ID4 OR ID5 OR ID6 OR …)^50&
bq=friends3rd:(ID7 OR ID8 OR ID9 OR …)^5&
…
Problems
• The list of friends can be very long (potentially thousands).
• Increases the query latency.
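The following sketch builds such a request in Python to show why this approach scales poorly. It is illustrative only: the field names and boost values are taken from the slide, `build_boosted_query` is a hypothetical helper, and it uses Solr's `bq` (boost query) parameter with the eDisMax parser because the boosts are queries rather than functions.

```python
from urllib.parse import urlencode

def build_boosted_query(term, friends_by_degree, boosts=(500, 50, 5)):
    """Build query params that boost closer friends more strongly."""
    fields = ("friends", "friends2nd", "friends3rd")  # names from the slide
    params = [("q", term), ("defType", "edismax")]
    for field, ids, boost in zip(fields, friends_by_degree, boosts):
        if ids:  # skip degrees with no IDs
            params.append(("bq", f"{field}:({' OR '.join(ids)})^{boost}"))
    return urlencode(params)

small = build_boosted_query("ps4king", [["ID1", "ID2"], ["ID4"], []])
print(small)

# With thousands of friend IDs the query string grows linearly.
large = build_boosted_query("ps4king", [[f"ID{i}" for i in range(3000)]])
print(len(large))
```

Every ID adds another clause for Solr to parse and evaluate, which is where the extra query latency comes from.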
Possible Solution 2: Index the friendship in SolrCloud.
Add “friends” fields; if the caller is in one of a document's “friends” fields, boost the document.
Problems:
o Too many requests to Solr.
o Maintaining friendship in Solr in addition to our database might be overkill.
o Requires a large amount of disk space.
Personalized Search: Ideas
Personalized Search: Our Solution
Personalized Index
Stores people close to the caller (friends, friends of friends, up to n degrees of separation).
§ Also used in the friend recommendation system.
§ Another team already uses a Lucene index for user-owned games.
+
Global Index
Includes all the users.
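One way to collect the people who belong in a caller's personalized index is a breadth-first search over the friendship graph, stopping at n degrees. The sketch below is an assumption about how this could be done, not the team's implementation; `degrees_of_separation` and the sample graph are hypothetical.

```python
from collections import deque

def degrees_of_separation(friends, caller, max_degree):
    """Breadth-first search: map each reachable user to its degree (1..max_degree)."""
    degree = {caller: 0}
    queue = deque([caller])
    while queue:
        user = queue.popleft()
        if degree[user] == max_degree:
            continue  # do not expand beyond n degrees
        for friend in friends.get(user, ()):
            if friend not in degree:
                degree[friend] = degree[user] + 1
                queue.append(friend)
    del degree[caller]  # the caller is not part of their own index
    return degree

# Hypothetical friendship data (adjacency lists keyed by Online ID).
graph = {
    "caller": ["ps4Queen", "ps4King"],
    "ps4Queen": ["caller"],
    "ps4King": ["caller", "ps4awesome"],
    "ps4awesome": ["ps4King", "stranger"],
    "stranger": ["ps4awesome"],
}
print(degrees_of_separation(graph, "caller", max_degree=2))
```

Breadth-first order guarantees that each user is recorded with the smallest degree at which they are reachable.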
Personalized Search: Lucene + SolrCloud
Lucene Index (simplified)
Online ID     First Name    …    Degree of Separation
ps4Queen      Marge         …    1
ps4King       Homer         …    1
ps4awesome    Bart          …    2
…             …             …    …
[Diagram: Application Server, Friendship Data]
§ Lucene index created on-demand for the caller
§ Cached temporarily
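The "created on-demand, cached temporarily" behavior can be sketched as a small time-to-live cache on the application server. This is a minimal illustration under stated assumptions: `PersonalIndexCache` and `build_index` are hypothetical names, the 300-second TTL is an invented value, and a plain dict stands in for the Lucene index.

```python
import time

class PersonalIndexCache:
    """Build a per-caller index on first use and keep it for ttl_seconds."""

    def __init__(self, build_index, ttl_seconds=300.0, clock=time.monotonic):
        self._build_index = build_index   # hypothetical: caller_id -> index
        self._ttl = ttl_seconds           # assumed value, not from the talk
        self._clock = clock
        self._entries = {}                # caller_id -> (expires_at, index)

    def get(self, caller_id):
        now = self._clock()
        entry = self._entries.get(caller_id)
        if entry is not None and entry[0] > now:
            return entry[1]               # still fresh: reuse cached index
        index = self._build_index(caller_id)  # missing or expired: rebuild
        self._entries[caller_id] = (now + self._ttl, index)
        return index

cache = PersonalIndexCache(lambda caller_id: {"owner": caller_id})
print(cache.get("ps4King"))
```

A short TTL keeps memory bounded and lets friendship changes show up after the next rebuild, at the cost of occasionally rebuilding an index for an active user.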
§ Hard to improve performance when using two index systems (Lucene + SolrCloud).
• Tuned SolrCloud a lot (cache sizes, query optimization, soft/auto commit settings, GC settings, etc.)
§ Not a problem anymore, but SolrCloud was unstable for a while.
• The entire cluster would go down a couple of times a month.
Challenges
§ Increased the number of replicas.
• When a leader goes into recovery, there need to be enough replicas to handle all the requests.
§ Reconfigured GC settings to use CMS (Concurrent Mark Sweep).
§ Decreased the size of the document query cache.
• Cache warm-up time was longer than the soft auto commit interval -> the cache was always warming.
Challenges: Instability Solutions
SolrCloud Upgrade
§ Motivations
• Originally Solr 4.8 was used, but due to the instability issues, we upgraded to Solr 5.4.
§ Challenges
• Tried to stream data from a Solr 4.8 node to Solr 5.4 by joining a node to the cluster, but it did not work.
• Some data types have been deprecated.
o IntegerType, LongType -> TrieInteger, TrieLong
o schema.xml needed to be updated with the new data types.
• Decided to do a full index of the 300 million documents in the Solr 5.4 cluster.
§ First, query out the 300M docs, then do a full index.
§ Deep paging (specifying a start index and a limit) is too slow.
• Solr needs to cache documents up to the starting index.
§ The logical cursor, cursorMark, is the solution to the deep paging problem. Each response returns the next cursor (nextCursorMark).
§ cursorMark is not perfect. Sometimes the cursor stops before the end of the documents. In that case, a filter query can be used to fetch a certain range of documents by ID.
SolrCloud Upgrade: Full Indexing
...&rows=10&sort=id+asc&cursorMark=AoEjR0JQ
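A cursor-based export loop along these lines might look like the sketch below. It follows Solr's documented cursor protocol (start with `cursorMark=*`, sort on the unique key, stop when `nextCursorMark` equals the cursor that was sent), but `fetch_page` is a hypothetical callable that sends the parameters to Solr and returns the parsed JSON response, and `id` is assumed to be the unique key field as in the query above.

```python
def iter_all_docs(fetch_page, rows=1000):
    """Yield every document by following Solr's cursorMark protocol."""
    cursor = "*"  # Solr's documented starting value for a new cursor
    while True:
        response = fetch_page({
            "q": "*:*",
            "sort": "id asc",      # the sort must include the uniqueKey field
            "rows": rows,
            "cursorMark": cursor,
        })
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            return
        cursor = next_cursor

# Stand-in for a real Solr call, so the loop can be exercised offline.
def fake_fetch(params):
    ids = list(range(25))
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = ids[start:start + params["rows"]]
    next_mark = str(start + len(page)) if page else params["cursorMark"]
    return {"response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": next_mark}

docs = list(iter_all_docs(fake_fetch, rows=10))
print(len(docs))
```

Unlike start/rows paging, each request only carries the sort position of the last document seen, so the cost per page stays flat no matter how deep the export goes.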
Q & A
Any Questions?