Webinar: Search and Recommenders

Page 1: Webinar: Search and Recommenders
Page 2: Webinar: Search and Recommenders

2016

OCTOBER 11-14, BOSTON, MA

http://lucenerevolution.com

Page 3: Webinar: Search and Recommenders

Search and Recommenders

Grant Ingersoll

@gsingers

CTO, Lucidworks

Jake Mannix

@pbrane

Lead Data Engineer, Lucidworks

Page 4: Webinar: Search and Recommenders

• Vision, motivations and definitions

• Use cases for ecommerce, compliance, fraud and customer support

• Fusion and the evolution of recommenders

• Demo

• Future Directions

Agenda

Page 5: Webinar: Search and Recommenders

Search-Driven Everything

Customer Service

Customer Insights

Fraud Surveillance

Research Portal

Online Retail

Digital Content

Page 6: Webinar: Search and Recommenders

• Many companies treat search, recommendations/discovery and analytics as different beasts, yet:

• The same inputs that make search better can also drive recommendations and better analytics

• Engagement analytics is the key:

• Your users give you engagement signals regarding the content that is relevant to them

• Over time, patterns emerge in similarities of behavior (simplest possible pattern is just “popularity”)

• These signals are often the biggest factor in both search relevance AND recommendations

• In the enterprise, this is still the case, but the types of signals are often different (email, IM)

Three Sides of the Same Coin

Page 7: Webinar: Search and Recommenders

• Content: documents that are textually similar often make good “similar items” to recommend

• Collaborative: documents that the same people have engaged with (and/or that were engaged with in the same search context) are also similar, in a subtler but often more powerful way

• Multi-Modal: why choose one? Try a smooth interpolation between a content-based similarity metric and an engagement-based one!

Defining Moments

Page 8: Webinar: Search and Recommenders

Search-Driven Online Retail

Increase conversions with a personalized shopping experience with best-in-class reliability and performance.

CATALOG

DYNAMIC NAVIGATION AND LANDING PAGES

INSTANT INSIGHTS AND ANALYTICS

PERSONALIZED SHOPPING EXPERIENCE

PROMOTIONS

USER HISTORY

Data Acquisition

Data Processing

Smart Access API

Page 9: Webinar: Search and Recommenders

Search-Driven Compliance and Surveillance

Detect and investigate activity for regulatory compliance, from one unified view.

DATABASE

ACCURATE REAL-TIME INFORMATION

CONTEXTUALLY-ENRICHED INFORMATION

MESSAGES

LOGS

DATA EXPLORATION AND VISUALIZATION

Data Acquisition

Indexing & Streaming

Smart Access API

Page 10: Webinar: Search and Recommenders

Search-Driven Customer Service

Resolve customer issues quickly with immediate access to relevant answers.

CUSTOMER SELF-SERVICE

KNOWLEDGE BASE

PROACTIVE ALERTS AND RECOMMENDATIONS

EXPERT TUNED RELEVANCY DRIVEN BY ANALYTICS AND INSIGHTS

CRM SUPPORT TICKETS & ISSUE TRACKING

Data Acquisition

Data Processing

Smart Access API

Page 11: Webinar: Search and Recommenders

Fusion and Recommenders

Page 12: Webinar: Search and Recommenders

Lucidworks Fusion Is Search-Driven Everything

• Drive next generation relevance via Content, Collaboration and Context

• Harness best in class Open Source: Apache Solr + Spark

• Simplify application development and reduce ongoing maintenance

CATALOG

DYNAMIC NAVIGATION AND LANDING PAGES

INSTANT INSIGHTS AND ANALYTICS

PERSONALIZED SHOPPING EXPERIENCE

PROMOTIONS

USER HISTORY

Data Acquisition

Indexing & Streaming

Smart Access API

Recommendations & Alerts

Analytics & Insights

Extreme Relevancy

Access data from anywhere to build intelligent, data-driven applications.

Page 13: Webinar: Search and Recommenders

Fusion Architecture

REST API

Worker Worker Cluster Mgr.

Apache Spark

Shards Shards

Apache Solr

HDFS (Optional)

Shared Config Mgmt

Leader Election

Load Balancing

ZK 1

Apache ZooKeeper

ZK N

DATABASE, WEB, FILE, LOGS, HADOOP, CLOUD

Connectors

Alerting/Messaging

NLP

Pipelines

Blob Storage

Scheduling

Recommenders/Signals

Core Services

Admin UI

SECURITY BUILT-IN

Lucidworks View

Page 14: Webinar: Search and Recommenders

• Fusion

• Recommenders API

• Machine Learning pipeline stages

• Scheduling

• Solr:

• More Like This + Signals

• Spark:

• MLlib, Mahout, custom

Key Platform Tech

Page 15: Webinar: Search and Recommenders

• Solr ships with a built-in query parser, MoreLikeThis, which takes a given document and:

• Extracts nontrivial terms from specified fields in it

• Builds an “OR” query to search for closest matches (like a cosine similarity computation)

• Has many knobs for “data-cleaning” unhelpful terms out of the query (for example minimum term frequency, minimum document frequency and minimum word length)

• TF-IDF is great, but other metrics are possible: LSI, LDA, word2vec

Content-focused

{!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>
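As an illustration of how the query above could be issued programmatically, here is a minimal sketch that assembles the same local-params string and URL-encodes it. The helper name `build_mlt_params`, the document id `doc-123`, the host and the collection name `my_collection` are all hypothetical; only the `{!mlt ...}` syntax itself comes from the slide.

```python
from urllib.parse import urlencode


def build_mlt_params(doc_id, fields, mintf=2, mindf=5, minwl=3, rows=5):
    """Build request parameters for a MoreLikeThis query-parser request.

    Hypothetical helper: it only formats the local-params string shown
    on the slide and does not contact Solr.
    """
    local_params = "{{!mlt qf={} mintf={} mindf={} minwl={}}}".format(
        ",".join(fields), mintf, mindf, minwl
    )
    return {"q": local_params + str(doc_id), "rows": rows, "fl": "id,score"}


params = build_mlt_params("doc-123", ["body", "suggest", "subject", "title"])
print(params["q"])

# Hypothetical endpoint; substitute your own Solr host and collection.
url = "http://localhost:8983/solr/my_collection/select?" + urlencode(params)
print(url)
```

Keeping the thresholds as keyword arguments makes it easy to experiment with how aggressively rare or very short terms are dropped from the generated query.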

Page 16: Webinar: Search and Recommenders

“People who bought X also bought Y” / “Movies recommended for you”

Collaborative Filtering

Search User/Item Index

Top K users who’ve interacted with this Item

Search and Rollup on User/Item Index

Top Y docs

Current Doc

Filter by context

Profit

User/Item Index

Offline Tasks

User/Item Signals

Math!
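As a rough idea of what the offline step between raw signals and the user/item index might compute, here is a minimal sketch that sums per-type weights into one weight per (user, item) pair. The signal types, their weights and the function name are assumptions made for illustration; they are not taken from Fusion.

```python
from collections import defaultdict

# Hypothetical weights per signal type; tune these for your own data.
TYPE_WEIGHTS = {"click": 1.0, "reply": 3.0}


def aggregate_signals(raw_signals):
    """Collapse raw (user_id, doc_id, signal_type) events into one
    weight per (user_id, doc_id) pair by summing the type weights."""
    totals = defaultdict(float)
    for user_id, doc_id, signal_type in raw_signals:
        # Unknown signal types contribute nothing.
        totals[(user_id, doc_id)] += TYPE_WEIGHTS.get(signal_type, 0.0)
    return dict(totals)


raw = [
    ("alice", "doc1", "click"),
    ("alice", "doc1", "reply"),
    ("bob", "doc1", "click"),
]
user_item_index = aggregate_signals(raw)
print(user_item_index)
```

A production job would typically also apply time decay or normalization, which this sketch leaves out.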

Page 17: Webinar: Search and Recommenders

• Fusion CF-based “documents like this” pipeline stages:

• Sub-query: search aggregated signals index for current doc_id, extracting the top-K pairs of (user_id, weight)

• Sub-query: search the same aggregated signals index again with a weighted OR query over those users: (user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … )

• Roll-up: topN(sum(score_i * weight_i))

• Sub-query: fetch the documents from primary Solr index of these top N doc_ids

Collaborative Filtering: step by step in Fusion
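The steps above can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not Fusion's implementation: the `SIGNALS` dictionary stands in for the aggregated signals index, the function names are hypothetical, and the stored signal weight is used in place of the Solr relevance score that a real weighted OR query would return. The final step, fetching the documents from the primary index, is omitted.

```python
from collections import defaultdict

# Hypothetical aggregated signals: (user_id, doc_id) -> weight.
SIGNALS = {
    ("alice", "doc1"): 3.0,
    ("bob", "doc1"): 1.0,
    ("alice", "doc2"): 2.0,
    ("bob", "doc3"): 4.0,
    ("carol", "doc3"): 5.0,
}


def top_users_for_doc(signals, doc_id, k):
    """First hop: the top-K (user_id, weight) pairs for the current doc."""
    pairs = [(u, w) for (u, d), w in signals.items() if d == doc_id]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]


def similar_docs(signals, doc_id, k=10, n=5):
    """Second hop plus roll-up: score other docs those users engaged with."""
    scores = defaultdict(float)
    for user_id, user_weight in top_users_for_doc(signals, doc_id, k):
        for (u, d), w in signals.items():
            if u == user_id and d != doc_id:
                # Stand-in for score_i * weight_i from the slide.
                scores[d] += w * user_weight
    # Highest score first; ties broken by doc id for determinism.
    ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return ranked[:n]


recs = similar_docs(SIGNALS, "doc1")
print(recs)
```

Here `doc2` is reached through alice and `doc3` through bob; carol never touched `doc1`, so her signal on `doc3` does not contribute.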

Page 18: Webinar: Search and Recommenders

• Both content-based and CF recommenders use features of the documents to generate a similarity metric

• Content uses the tokens in the document

• CF uses user ids who have engaged with it

• Metrics can be weighted-summed, allowing a “slider” between the two

• Fancy similarity techniques that can be applied to a (doc, token) matrix can often be applied to a (doc, userId) matrix as well, or even to a joint (doc, (token or userId)) concatenated matrix

• There is a cost to such techniques: harder to maintain, harder to A/B test variations

Multi-modal
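The weighted-sum “slider” can be sketched as a single blending parameter applied to two cosine similarities, one over tokens and one over user ids. The document layout (`tokens` and `users` dictionaries), the function names and the parameter `alpha` are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def blended_similarity(doc_a, doc_b, alpha):
    """alpha=1.0 is purely content-based, alpha=0.0 purely collaborative."""
    content = cosine(doc_a["tokens"], doc_b["tokens"])
    collab = cosine(doc_a["users"], doc_b["users"])
    return alpha * content + (1.0 - alpha) * collab


# Hypothetical documents: same text, disjoint audiences.
doc_a = {"tokens": {"solr": 1.0, "search": 1.0}, "users": {"alice": 1.0}}
doc_b = {"tokens": {"solr": 1.0, "search": 1.0}, "users": {"bob": 1.0}}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, blended_similarity(doc_a, doc_b, alpha))
```

Because the blend is a single scalar, it is straightforward to expose as a tunable setting, though each additional variation still has to be evaluated separately.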

Page 19: Webinar: Search and Recommenders

• Basics:

• 26 Apache Projects registered so far plus LW web properties

• 93 datasources* including email, GitHub, JIRA*, Website and Wiki

• Fusion 2.4

• Signals everywhere

• UI based on Lucidworks View

• ASF Mail archives mirrored at: http://asfmail.lucidworks.io

Demo

http://searchhub.lucidworks.com

Page 20: Webinar: Search and Recommenders

Implementation Details

http://github.com/lucidworks/searchhub

Branch: GH-28-doc-view

Key Source Code

UI

Angular Directives:

perdocument

recommendations

Offline Tasks

Spark Jobs:

mail_thread_signal_creation_job.json

SimpleTwoHopRecommender.scala

Fusion Pipelines

Query:

lucidfind-recommendations

cf-similar-items-batch-rec

cf-similar-items-rec

Page 21: Webinar: Search and Recommenders

• Ensemble and Click-based approaches

• https://github.com/lucidworks/searchhub/issues/40

• https://github.com/lucidworks/searchhub/issues/28

• https://github.com/lucidworks/searchhub/issues/22

• Deploy live

• User registrations

• https://github.com/lucidworks/searchhub/issues/30

Future Work

Page 22: Webinar: Search and Recommenders

Resources

Fusion: http://www.lucidworks.com/products/fusion

Search Hub: http://searchhub.lucidworks.com

Company: http://www.lucidworks.com

Our blog: http://www.lucidworks.com/blog

Twitter: @gsingers, @pbrane