Webinar: Search and Recommenders

2016

OCTOBER 11-14, BOSTON, MA

http://lucenerevolution.com


Search and Recommenders

Grant Ingersoll

@gsingers

CTO, Lucidworks

Jake Mannix

@pbrane

Lead Data Engineer, Lucidworks

Agenda

• Vision, motivations and definitions
• Use cases for ecommerce, compliance, fraud and customer support
• Fusion and the evolution of recommenders
• Demo
• Future Directions

Search-Driven Everything

Customer Service

Customer Insights

Fraud Surveillance

Research Portal

Online Retail

Digital Content

Three Sides of the Same Coin

• Many companies treat search, recommendations/discovery and analytics as different beasts, yet:
  • The same inputs that make search better can also drive recommendations and better analytics
• Engagement analytics is the key:
  • Your users give you engagement signals about the content that is relevant to them
  • Over time, patterns emerge in the similarities of behavior (the simplest possible pattern is just “popularity”)
  • These signals are often the biggest factor in both search relevance AND recommendations
  • In the enterprise this still holds, but the types of signals are often different (email, IM)
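The simplest engagement pattern, popularity, comes from aggregating raw signals per document. Below is a minimal Python sketch, assuming signals arrive as (doc_id, signal_type) pairs. The signal types and their weights are hypothetical, not Fusion defaults.

```python
from collections import defaultdict

# Hypothetical per-signal weights: a "like" is assumed to say more than a click.
SIGNAL_WEIGHTS = {"click": 1.0, "like": 3.0}

def popularity(signals, weights=SIGNAL_WEIGHTS):
    """Sum weighted engagement signals per document, most popular first.

    signals: iterable of (doc_id, signal_type) pairs.
    Unknown signal types contribute a weight of 0.
    """
    totals = defaultdict(float)
    for doc_id, signal_type in signals:
        totals[doc_id] += weights.get(signal_type, 0.0)
    # Descending score; ties broken by doc_id so the order is deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

events = [("d1", "click"), ("d2", "click"), ("d1", "click"), ("d2", "like")]
print(popularity(events))  # [('d2', 4.0), ('d1', 2.0)]
```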

Defining Moments

• Content: documents that are textually similar often make good “similar items” to recommend
• Collaborative: documents that have been engaged with by the same people (and/or in the same search context) are also similar, in a more subtle but often more powerful way
• Multi-modal: why choose one? Smoothly interpolate between a content-based similarity metric and an engagement-based one!
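A smooth interpolation between the two can be as simple as a weighted sum with one "slider" parameter. Here is a minimal Python sketch. It assumes each recommender returns a doc_id to score map, and it assumes (as one choice among several) that each map is divided by its maximum so the two scales are comparable.

```python
def blend(content_scores, cf_scores, alpha):
    """Interpolate between content-based and collaborative similarity scores.

    alpha = 1.0 -> purely content-based; alpha = 0.0 -> purely collaborative.
    Each map is divided by its own maximum first (an assumed normalization).
    Returns (doc_id, blended_score) pairs, best first.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    def normalize(scores):
        top = max(scores.values(), default=0.0)
        return {d: s / top for d, s in scores.items()} if top > 0 else {}

    c, f = normalize(content_scores), normalize(cf_scores)
    blended = {
        d: alpha * c.get(d, 0.0) + (1.0 - alpha) * f.get(d, 0.0)
        for d in set(c) | set(f)
    }
    # Descending score; ties broken by doc_id for a deterministic order.
    return sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))

content = {"d1": 4.0, "d2": 2.0}
collab = {"d2": 10.0, "d3": 5.0}
print(blend(content, collab, alpha=0.5))  # [('d2', 0.75), ('d1', 0.5), ('d3', 0.25)]
```

A document found by only one recommender still gets a score here: it simply receives 0 from the other side.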

Search-Driven Online Retail

Increase conversions with a personalized shopping experience with best-in-class reliability and performance.

CATALOG

DYNAMIC NAVIGATION AND LANDING PAGES

INSTANT INSIGHTS AND ANALYTICS

PERSONALIZED SHOPPING EXPERIENCE

PROMOTIONS

USER HISTORY

Data Acquisition

Data Processing

Smart Access API

Search-Driven Compliance and Surveillance

Detect and investigate activity for regulatory compliance, from one unified view.

DATABASE

ACCURATE REAL-TIME INFORMATION

CONTEXTUALLY-ENRICHED INFORMATION

MESSAGES

LOGS

DATA EXPLORATION AND VISUALIZATION

Data Acquisition

Indexing & Streaming

Smart Access API

Search-Driven Customer Service

Resolve customer issues quickly with immediate access to relevant answers.

CUSTOMER SELF-SERVICE

KNOWLEDGE BASE

PROACTIVE ALERTS AND RECOMMENDATIONS

EXPERT TUNED RELEVANCY DRIVEN BY ANALYTICS AND INSIGHTS

CRM SUPPORT TICKETS & ISSUE TRACKING

Data Acquisition

Data Processing

Smart Access API

Fusion and Recommenders

Lucidworks Fusion Is Search-Driven Everything

• Drive next-generation relevance via Content, Collaboration and Context

• Harness best-in-class open source: Apache Solr + Apache Spark

• Simplify application development and reduce ongoing maintenance

CATALOG

DYNAMIC NAVIGATION AND LANDING PAGES

INSTANT INSIGHTS AND ANALYTICS

PERSONALIZED SHOPPING EXPERIENCE

PROMOTIONS

USER HISTORY

Data Acquisition

Indexing & Streaming

Smart Access API

Recommendations & Alerts

Analytics & Insights

Extreme Relevancy

Access data from anywhere to build intelligent, data-driven applications.

Fusion Architecture

• Interfaces: Admin UI, Lucidworks View, REST API
• Core Services: Connectors, Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, …
• Apache Spark: Workers, Cluster Mgr.
• Apache Solr: Shards
• HDFS (Optional)
• Apache ZooKeeper (ZK 1 to ZK N): Shared Config Mgmt, Leader Election, Load Balancing
• Data sources: Database, Web, File, Logs, Hadoop, Cloud
• Security built-in

Key Platform Tech

• Fusion:
  • Recommenders API
  • Machine Learning pipeline stages
  • Scheduling
• Solr:
  • More Like This + Signals
• Spark:
  • MLlib, Mahout, custom

Content-focused

• Solr ships with a MoreLikeThis query parser ({!mlt}), which takes a given document and:
  • Extracts nontrivial terms from the specified fields
  • Builds an “OR” query to search for the closest matches (similar in spirit to a cosine similarity computation)
  • Has many knobs for “data-cleaning” non-useful terms out of the query
• TF-IDF is great, but other metrics are possible: LSI, LDA, W2V (word2vec)

{!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>

Here qf lists the fields to mine for terms, mintf is the minimum term frequency in the source document, mindf is the minimum number of documents a term must appear in, and minwl is the minimum word length.
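To make the mechanics concrete, here is a toy Python approximation of what MoreLikeThis does: filter the source document's terms with mintf, mindf and minwl, weight the survivors by TF-IDF, and OR the top terms together. It runs over a hypothetical in-memory corpus with a naive tokenizer and a simple smoothed IDF. It is a sketch, not Solr's implementation, which uses the field's analyzers and Lucene's own scoring.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-alphanumerics (far simpler than Solr analysis)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def mlt_like_query(doc_id, corpus, mintf=2, mindf=5, minwl=3, maxqt=25):
    """Build a MoreLikeThis-style OR query from the terms of corpus[doc_id].

    corpus: dict of doc_id -> text, a hypothetical stand-in for one indexed field.
    mintf/mindf/minwl mirror the {!mlt} parameters of the same name;
    maxqt caps the number of query terms.
    """
    n_docs = len(corpus)
    df = Counter()  # number of documents containing each term
    for text in corpus.values():
        df.update(set(tokenize(text)))

    tf = Counter(tokenize(corpus[doc_id]))
    scored = []
    for term, freq in tf.items():
        # "Data-cleaning" knobs: drop terms too rare in the doc or corpus, or too short.
        if freq < mintf or df[term] < mindf or len(term) < minwl:
            continue
        idf = 1.0 + math.log(n_docs / (df[term] + 1))  # simple smoothed IDF (always > 0)
        scored.append((freq * idf, term))

    # Highest TF-IDF first; ties broken alphabetically.
    scored.sort(key=lambda st: (-st[0], st[1]))
    return " OR ".join(term for _, term in scored[:maxqt])

corpus = {
    "a": "solr search solr index",
    "b": "solr search engine",
    "c": "spark cluster",
}
print(mlt_like_query("a", corpus, mintf=1, mindf=1))  # solr OR index OR search
```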

Collaborative Filtering

“People who bought X also bought Y” / “Movies recommended for you”

• Offline Tasks: User/Item Signals → Math! → User/Item Index
• Online: Current Doc → Search User/Item Index → Top K users who’ve interacted with this Item → Search and Rollup on User/Item Index → Top Y docs → Filter by context → Profit
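The offline half of this flow turns raw user/item signals into a user/item index. Here is a minimal Python sketch of the simplest possible "Math!": summing weighted events per (user_id, doc_id) pair. The event types and weights are hypothetical, and a real job (for example a Spark job) may do considerably more.

```python
from collections import defaultdict

# Hypothetical signal weights; real weighting schemes may be far more involved.
SIGNAL_WEIGHTS = {"view": 1.0, "reply": 5.0}

def build_user_item_rows(raw_signals, weights=SIGNAL_WEIGHTS):
    """Collapse raw (user_id, doc_id, signal_type) events into one row per
    (user_id, doc_id) pair with a summed weight, i.e. the shape of a
    user/item index that the online stages can query.
    Pairs whose total weight is 0 (only unknown signal types) are dropped.
    """
    totals = defaultdict(float)
    for user_id, doc_id, signal_type in raw_signals:
        totals[(user_id, doc_id)] += weights.get(signal_type, 0.0)
    return [
        {"user_id": u, "doc_id": d, "weight": w}
        for (u, d), w in sorted(totals.items())
        if w > 0
    ]

raw = [("u1", "d1", "view"), ("u1", "d1", "reply"), ("u2", "d1", "view"), ("u1", "d2", "view")]
for row in build_user_item_rows(raw):
    print(row)
```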

Collaborative Filtering: step by step in Fusion

• Fusion CF-based “documents like this” pipeline stages:
  • Sub-query: search the aggregated signals index for the current doc_id, extracting the top-K pairs of (user_id, weight)
  • Sub-query: search that table again with a weighted OR query: (user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … )
  • Roll-up: topN(sum(score_i * weight_i))
  • Sub-query: fetch the documents for these top N doc_ids from the primary Solr index
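The four stages can be traced end to end in memory. The following Python sketch is hypothetical: plain lists and dicts stand in for the aggregated signals index and the primary Solr index. It assumes the roll-up groups by doc_id and excludes the current document, and it takes score_i to be the boost of the matching row's user. That is a simplification, since a real Solr score also depends on the similarity function.

```python
from collections import defaultdict

def similar_items(doc_id, rows, docs, k=10, n=5):
    """Two-hop "documents like this" over aggregated (user_id, doc_id, weight) rows.

    rows: list of {"user_id", "doc_id", "weight"} dicts (stand-in for the signals index).
    docs: dict of doc_id -> document (stand-in for the primary index).
    Returns (weighted OR query string, [(document, rolled-up score), ...]).
    """
    # Stage 1: top-K (user_id, weight) pairs for the current document.
    users = sorted(
        ((r["user_id"], r["weight"]) for r in rows if r["doc_id"] == doc_id),
        key=lambda uw: (-uw[1], uw[0]),
    )[:k]
    boosts = dict(users)

    # Stage 2: the weighted OR query, shown in Solr syntax but matched in memory.
    query = " OR ".join(f"user_id:{u}^{w}" for u, w in users)
    matched = [r for r in rows if r["user_id"] in boosts and r["doc_id"] != doc_id]

    # Stage 3: roll up by doc_id: sum(score_i * weight_i), score_i = user's boost.
    rolled = defaultdict(float)
    for r in matched:
        rolled[r["doc_id"]] += boosts[r["user_id"]] * r["weight"]
    top = sorted(rolled.items(), key=lambda dv: (-dv[1], dv[0]))[:n]

    # Stage 4: fetch the top-N documents from the primary index.
    return query, [(docs[d], s) for d, s in top if d in docs]

rows = [
    {"user_id": "u1", "doc_id": "d1", "weight": 3.0},
    {"user_id": "u2", "doc_id": "d1", "weight": 1.0},
    {"user_id": "u1", "doc_id": "d2", "weight": 2.0},
    {"user_id": "u2", "doc_id": "d3", "weight": 4.0},
    {"user_id": "u3", "doc_id": "d4", "weight": 5.0},
]
docs = {"d2": "Doc two", "d3": "Doc three", "d4": "Doc four"}
print(similar_items("d1", rows, docs, k=2, n=2))
```

Note that d4 never appears: none of the users who engaged with d1 touched it, which is exactly the behavior a two-hop CF lookup should have.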

Multi-modal

• Both content-based and CF recommenders use features of the documents to generate a similarity metric:
  • Content uses the tokens in the document
  • CF uses the ids of the users who have engaged with it
• The metrics can be weighted and summed, allowing a “slider” between the two
• Similarity techniques that work on a (doc, token) matrix can often be applied to a (doc, userId) matrix, or even to a joint (doc, (token or userId)) concatenated matrix
• There is a cost to such techniques: they are harder to maintain, and variations are harder to A/B test
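One way to realize the joint (doc, (token or userId)) matrix is to give each document a single sparse vector whose columns are both tokens and user ids, then compare documents with cosine similarity. Here is a minimal Python sketch. The "tok:"/"usr:" prefixes and the user_weight knob are illustrative assumptions.

```python
import math

def joint_vector(tokens, users, user_weight=1.0):
    """Concatenate token counts and user engagement into one sparse vector.

    Prefixed keys keep the two feature spaces disjoint; user_weight scales
    the engagement block relative to the content block (a hypothetical knob).
    """
    vec = {f"tok:{t}": float(c) for t, c in tokens.items()}
    vec.update({f"usr:{u}": user_weight * float(c) for u, c in users.items()})
    return vec

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc_a = joint_vector({"solr": 2, "spark": 1}, {"u1": 1})
doc_b = joint_vector({"solr": 1}, {"u1": 1, "u2": 1})
print(round(cosine(doc_a, doc_b), 4))  # 0.7071
```

Setting user_weight to 0 reduces this to content-only cosine similarity. Raising it lets shared users dominate, which gives a second kind of slider alongside a weighted sum of separate scores.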

Demo

• Basics:
  • 26 Apache projects registered so far, plus Lucidworks web properties
  • 93 datasources* including email, GitHub, JIRA*, Website and Wiki
  • Fusion 2.4
  • Signals everywhere
  • UI based on Lucidworks View
  • ASF mail archives mirrored at: http://asfmail.lucidworks.io

http://searchhub.lucidworks.com


Implementation Details

http://github.com/lucidworks/searchhub

Branch: GH-28-doc-view

Key Source Code

• UI: Angular directives
  • perdocument
  • recommendations
• Offline Tasks: Spark jobs
  • mail_thread_signal_creation_job.json
  • SimpleTwoHopRecommender.scala
• Fusion Pipelines (Query):
  • lucidfind-recommendations
  • cf-similar-items-batch-rec
  • cf-similar-items-rec

Future Work

• Ensemble and click-based approaches
  • https://github.com/lucidworks/searchhub/issues/40
• Deploy live
• User registrations




Resources

Fusion: http://www.lucidworks.com/products/fusion

Search Hub: http://searchhub.lucidworks.com

Company: http://www.lucidworks.com

Our blog: http://www.lucidworks.com/blog

Twitter: @gsingers, @pbrane
