Webinar: Natural Language Search with Solr

Page 1: Webinar: Natural Language Search with Solr
Page 2: Webinar: Natural Language Search with Solr

Ted Sullivan

Natural Language Search with Solr

lucidworks.com

Senior Solutions Architect

Page 3: Webinar: Natural Language Search with Solr

The take-home word for this talk is:

CONTEXT

Page 4: Webinar: Natural Language Search with Solr

What I will talk about …

Why does context matter?

Phrase and contextual ambiguities in search

• Recent advances in Query Autofiltering that attack the context problem by adding “verb/preposition” disambiguation *

Traditional ways of visualizing context in search - forging search “loops”

• Facets

• Typeahead

* https://lucidworks.com/blog/2015/11/19/query-autofiltering-chapter-4-a-novel-approach-to-natural-language-processing/

Page 5: Webinar: Natural Language Search with Solr

What I will talk about …

Adding metadata context to Suggestions using Facets

Using Pivot Facets to create semantically rich suggestions

Facets to bring user-centric context to suggestions

• Entitlements: Security trimming of suggestions

• User session context: Dynamic On-The-Fly Predictive Analytics!

Page 6: Webinar: Natural Language Search with Solr

Why Does Context Matter?

Relevance is contextual - relevant to whom under what circumstances?

Language / User Intent / Social and business factors

Ambiguities in search are often due to a failure or inability to detect context.

So, what can we do about this - or is this talk just some obvious hand-waving BS that we’ve heard a thousand times? Hopefully, not.

But that said - maybe just a little theory first …

Page 7: Webinar: Natural Language Search with Solr

Contextual Relationships

Semantic Context - Language, Lexicon

User Context - Intent, Agendas, Permissions, Demographics, Location

Social Context - Popularity, Common Behaviors => Recommendations

Business Context - Rules, Organization, Domain, Security

Context == Relationships

Within and between metadata “objects”

Search is an exchange of one metadata object - the query - for others - the results.

Page 8: Webinar: Natural Language Search with Solr

Things are related to other Things

Relationships provide context

• Static or known Relationships - defined by a knowledge graph such as an Ontology

• Discovered Relationships - computed by data mining

Knowledge Graphs - connected-ness

Usage Logs (query logs, other captured events or signals) - behavioral contexts

Clustering - unsupervised learning algorithms

Natural Language Processing - semantic contexts - noun phrases - statements

Machine Learning - supervised learning => Feature extraction

Page 9: Webinar: Natural Language Search with Solr

Feature Sets

[Figure: “Apple” and four related things, each with its own feature set]

Tim Cook: iPhone Macintosh Computer Tablet Steve Jobs Lisa iTunes

Times Square: Broadway Wall Street Empire State Building Bronx Zoo

Granny Smith: Pie Fritters Season Sauce Cider Picking Tree McIntosh

White Album: Records Beatles George Martin Capitol White Album
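One way to read the feature sets above is as a lookup for disambiguation: whichever sense of "Apple" shares the most terms with the rest of the query (or session) wins. The sketch below illustrates that idea only. The overlap scoring and the name `rank_senses` are assumptions, not something the slides specify, and the term lists are a subset of the slide's terms as I read them.

```python
# Feature sets taken from the slide (subset; the split into terms is assumed).
FEATURE_SETS = {
    "Tim Cook": ["iPhone", "Macintosh", "Tablet", "Steve Jobs", "iTunes"],
    "Times Square": ["Broadway", "Wall Street", "Empire State Building", "Bronx Zoo"],
    "Granny Smith": ["Pie", "Fritters", "Sauce", "Cider", "Tree"],
    "White Album": ["Records", "Beatles", "George Martin", "Capitol"],
}


def rank_senses(context_terms, feature_sets=FEATURE_SETS):
    """Rank senses by how many context terms appear in each feature set.

    Hypothetical scoring: a plain case-insensitive overlap count.
    Ties are broken alphabetically by sense name.
    """
    terms = {t.lower() for t in context_terms}
    scores = {
        sense: len(terms & {f.lower() for f in features})
        for sense, features in feature_sets.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Context words that accompany "apple" in a query or session.
    for sense, score in rank_senses(["cider", "pie", "recipe"]):
        print(sense, score)
```

A real system would weight terms rather than count them, but even this crude overlap shows why relationships between things carry the context.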

Page 10: Webinar: Natural Language Search with Solr

Resolving Ambiguities

Phrase or syntactic ambiguities - detecting nouns

Autophrasing - unstructured data

Query Autofiltering - structured data

Contextual or semantic ambiguities (subject-verb-object) - detecting intent

Traditional NLP - POS detection, Machine Learning

Query Autofiltering with verb/preposition disambiguation

Page 11: Webinar: Natural Language Search with Solr

Enough abstractions - give me some examples!

Music Ontology

[Diagram nodes: Song, Songwriter, Genre, Performer, Recording, Guitarist, Pianist, Vocalist, Producer, Record Label, Band, Album]

Page 12: Webinar: Natural Language Search with Solr

Discovery and Focus

Enough abstractions - give me some examples!

Medical Ontology

[Diagram nodes: Disease, Condition, Symptom, Drug, Treatment]

Page 13: Webinar: Natural Language Search with Solr

Query Autofiltering “songs Eric Clapton wrote” vs. “songs Eric Clapton performed”

Without verb support we get:

(performer_ss:"Eric Clapton" OR composer_ss:"Eric Clapton") AND composition_type:Song

for either query.

With verb support we now get:

songs Eric Clapton wrote => composer_ss:"Eric Clapton" AND composition_type:Song

songs Eric Clapton performed => performer_ss:"Eric Clapton" AND composition_type:Song

Verb/Preposition context rules

written,wrote,composed => composer_ss

performed,played,sang,recorded:performer_ss
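The effect of the verb rules above can be sketched in a few lines of Python. This is an illustration of the idea only, not the Query Autofiltering implementation: the function names are hypothetical, only single-word verbs are handled, and the entity and its candidate fields are passed in by hand rather than discovered from the index.

```python
import re

# Verb => field rules copied from the slide.
VERB_RULES = {
    ("written", "wrote", "composed"): ["composer_ss"],
    ("performed", "played", "sang", "recorded"): ["performer_ss"],
}


def candidate_fields(query, entity_fields, rules=VERB_RULES):
    """Narrow the fields an entity may match, based on verbs in the query.

    If no rule fires (or the rule's fields do not apply to this entity),
    all of the entity's fields are kept, which is the raw autofilter case.
    """
    tokens = set(re.findall(r"[a-z']+", query.lower()))
    for verbs, fields in rules.items():
        if tokens & set(verbs):
            narrowed = [f for f in entity_fields if f in fields]
            if narrowed:
                return narrowed
    return list(entity_fields)


def entity_clause(query, entity, entity_fields):
    """Build the Solr clause for one recognized entity."""
    clauses = [f'{f}:"{entity}"' for f in candidate_fields(query, entity_fields)]
    return clauses[0] if len(clauses) == 1 else "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    fields = ["performer_ss", "composer_ss"]
    for q in ("songs Eric Clapton wrote", "songs Eric Clapton performed", "Eric Clapton songs"):
        print(q, "=>", entity_clause(q, "Eric Clapton", fields))
```

Multi-word triggers such as "was in" would need phrase matching instead of the token set used here.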

Page 14: Webinar: Natural Language Search with Solr

Query Autofiltering “Bands that Eric Clapton was in”

No context rules (raw autofiltering):

((name_s:Band OR musician_type_ss:Band) AND (name_s:\"Eric Clapton\" OR original_performer_s:\"Eric Clapton\" OR composer_ss:\"Eric Clapton\" OR performer_ss:\"Eric Clapton\" OR groupMembers_ss:\"Eric Clapton\"))

Add context rule

members,member,was in,is in,who's in,who's in the,is in the,was in the => memberOfGroup_ss,groupMembers_ss

((name_s:Band OR musician_type_ss:Band) AND groupMembers_ss:\"Eric Clapton\")

Verb/Preposition context rules

Page 15: Webinar: Natural Language Search with Solr

Query Autofiltering Verb/Preposition context rules

Who’s in The Who

raw autofiltering

((name_s:\"The Who\" OR original_performer_s:\"The Who\" OR performer_ss:\"The Who\" OR memberOfGroup_ss:\"The Who\"))

with context rule

members,member,was in,is in,who's in,who's in the,is in the,was in the => memberOfGroup_ss,groupMembers_ss

query is now:

(memberOfGroup_ss:\"The Who\")

Page 17: Webinar: Natural Language Search with Solr

Query Autofiltering

Drugs that treat abdominal pain

treatment_type:Drug AND has_indication:"abdominal pain"

Drugs that cause abdominal pain

treatment_type:Drug AND has_side_effect:"abdominal pain"

vs.

treatment_type:Drug AND (has_indication:"abdominal pain" OR has_side_effect:"abdominal pain")

Verb/Preposition context rules

treat,for,indicated => has_indication

cause,produce => has_side_effect

Page 18: Webinar: Natural Language Search with Solr

Query Autofiltering

Beatles Songs covered vs Songs Beatles covered

covers by other artists of songs written by the Beatles vs covers by Beatles of songs by other songwriters

Robert Johnson Songs that Eric Clapton covered

works the same as:

Eric Clapton covers of Robert Johnson Songs

Insomnia Drugs - are just indicated drugs

Noun-Noun Phrases

Beatles Songs

Robert Johnson Songs

Insomnia Drugs

covered,covers:performer_ss | version_s:Cover |original_performer_s:_ENTITY_,recording_type_ss:Song=>original_performer_s:_ENTITY_

Page 19: Webinar: Natural Language Search with Solr

Facets provide Context

Visualization and the search “conversation”: Discovery and Focus

• Post-query visualization - facet display - aggregated attributes of found things

• Pre-query visualization - query suggestion or typeahead - can use facets too (stay tuned).

• The Good, The Bad and The Ugly aspects of Facets

New and Improved: Statistics, Analytics and APIs - Oh My!

• Dashboards and Dynamic Business Intelligence

• Heatmap Faceting

• Pivot Facets and Ad-Hoc Object Hierarchies - now with stats!

• JSON Facet API
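For readers who have not used the last two items, here is a minimal sketch of what the request parameters look like. `facet.pivot` and `json.facet` are standard Solr parameters; the helper names, the facet label `by_<field>`, and the choice of fields are assumptions for illustration, and no Solr URL or collection name is implied by the slides.

```python
import json
from urllib.parse import urlencode


def pivot_facet_params(fields, query="*:*"):
    """Parameters for a pivot (nested) facet over the given fields, in order."""
    return {
        "q": query,
        "rows": 0,  # we only want the facet counts, not documents
        "facet": "true",
        "facet.pivot": ",".join(fields),
    }


def json_facet_params(field, query="*:*", limit=10):
    """Parameters for a simple terms facet using the JSON Facet API."""
    facet = {"by_" + field: {"type": "terms", "field": field, "limit": limit}}
    return {"q": query, "rows": 0, "json.facet": json.dumps(facet)}


if __name__ == "__main__":
    # Field names reuse ones that appear elsewhere in this deck.
    print(urlencode(pivot_facet_params(["genre_ss", "performer_ss"])))
    print(urlencode(json_facet_params("genre_ss")))
```

The encoded string would be appended to a collection's select handler URL; the pivot variant returns one nested bucket per genre, each holding its performers.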

Page 20: Webinar: Natural Language Search with Solr

And now for something completely different! *

How can we use facets to improve typeahead?

Put more precision and more context into a suggester.

=> Using metadata - guide the user to more precise queries that we can be really GOOD at!

To do this, we can build a specialized suggester collection - then we can use facet contexts to build semantic and behavioral relationships within and between searches.

* Shameless Monty Python’s Flying Circus reference

Page 21: Webinar: Natural Language Search with Solr

Suggester Buildware

Query Collectors or Fetchers

Gather sets of query suggestions - Interface with multiple implementations possible

• Query Logs

• Terms Component

• Curated Lists

• Pivot Facet Collector

• Databases - SQL or Not

Suggester Builder

• Validates suggestions

• Adds context to suggestions using faceting

• Submits suggestion and metadata to Solr Index

Page 22: Webinar: Natural Language Search with Solr

Pivot Facet Query Collector

Uses “Field Pattern Templates” to generate semantically rich suggestions

Structured data - metadata fields contain object attributes

Can combine these attributes into phrases - semantically (or not)

Machine doesn’t know semantics.

Example

first_name last_name occupation city state

Bob Jones Accountant Cincinnati Ohio

makes sense

Ohio Accountant Jones Cincinnati Bob

doesn’t

Page 23: Webinar: Natural Language Search with Solr

Pivot Facet Query Collector

If we create Pivot Template Patterns like this:

${musician_type} ${recording_type}s

${genre} ${musician_type}s

${performer} ${recording_type}s

${original_performer} ${recording_type}s covered by ${performer} (plus context)

${name}

We get suggestions like this:

Rolling Stones Albums

New Wave Songs

Classical Pianists

Beatles Songs covered by Joe Cocker

Stuck Inside of Mobile With The Memphis Blues Again
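The `${field}` placeholders in these patterns happen to match the syntax of Python's `string.Template`, which makes the expansion step easy to sketch. The code below is an assumption about how such a collector could work, not the actual one: each row stands for one combination of field values, as a pivot facet over those fields would return, and `expand` is a hypothetical name.

```python
from string import Template


def expand(pattern, rows):
    """Fill a field pattern template with each row of field values.

    Rows that lack one of the fields named in the pattern are skipped,
    so a pattern only produces suggestions for combinations that exist.
    """
    template = Template(pattern)
    suggestions = []
    for row in rows:
        try:
            suggestions.append(template.substitute(row))
        except KeyError:
            continue
    return suggestions


if __name__ == "__main__":
    # Hand-written rows standing in for pivot facet value combinations.
    rows = [
        {"performer": "Rolling Stones", "recording_type": "Album"},
        {"genre": "Classical", "musician_type": "Pianist"},
    ]
    print(expand("${performer} ${recording_type}s", rows))
    print(expand("${genre} ${musician_type}s", rows))
```

Because the rows come from values that co-occur in the index, every generated phrase corresponds to at least one document, which is what keeps the suggestions meaningful.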

Page 24: Webinar: Natural Language Search with Solr

Suggester Builder - validate and contextualize

• Validate - make sure that the query works

• Contextualize - use facets to acquire “aboutness” stuff

Tests the query against the content collection

“Stuck Inside of Mobile With The Memphis Blues Again”

composition_type_ss: ["Song"]

composer_ss: ["Bob Dylan"]

genre_ss: ["Blues Rock", "Folk Rock"]
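The two builder steps can be sketched against a Solr JSON response. The code assumes the default response layout, where `response.numFound` holds the hit count and each entry in `facet_counts.facet_fields` is a flat list of alternating values and counts. The function names and the `suggestion` field are hypothetical, and the sample response is hand-written for illustration rather than captured from a real index.

```python
def validate(response):
    """A suggestion is valid if its query returns at least one document."""
    return response.get("response", {}).get("numFound", 0) > 0


def contextualize(response):
    """Collect facet values with non-zero counts as the suggestion's context."""
    facet_fields = response.get("facet_counts", {}).get("facet_fields", {})
    context = {}
    for field, flat in facet_fields.items():
        # Solr returns [value1, count1, value2, count2, ...] by default.
        values = [v for v, count in zip(flat[0::2], flat[1::2]) if count > 0]
        if values:
            context[field] = values
    return context


def build_suggestion_doc(suggestion, response):
    """Return a document for the suggester collection, or None if invalid."""
    if not validate(response):
        return None
    doc = {"suggestion": suggestion}
    doc.update(contextualize(response))
    return doc


if __name__ == "__main__":
    sample = {
        "response": {"numFound": 1},
        "facet_counts": {"facet_fields": {
            "composer_ss": ["Bob Dylan", 1],
            "genre_ss": ["Blues Rock", 1, "Folk Rock", 1, "Jazz", 0],
        }},
    }
    print(build_suggestion_doc("Stuck Inside of Mobile With The Memphis Blues Again", sample))
```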

Page 25: Webinar: Natural Language Search with Solr

Use Cases - User Context sensitive typeahead

User Permissions: Security Trimming of Suggestions

Faceting on the ACLs of the content collection - copy the set of ACL values for the suggestion result set to the suggester collection

=> Don’t suggest queries that return 0 results for a given user

User Behavior: Dynamic On-The-Fly Predictive Analytics

Cache context facets returned by Suggester - use as boost queries for subsequent queries in a user session

=> System learns “what” user is looking for
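Both use cases come down to extra parameters on the suggester query: a filter query (`fq`) for security trimming and boost queries (`bq`) built from the cached context facets. The sketch below assumes the edismax parser and a hypothetical ACL field called `acl_ss`; the function names and the shape of the cached session context are also assumptions.

```python
def _quote(value):
    """Quote a value as a Solr phrase, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def suggester_params(prefix, user_groups, session_context=None, acl_field="acl_ss"):
    """Build query parameters for a security-trimmed, context-boosted suggester.

    user_groups:     ACL values the current user holds (must not be empty).
    session_context: facet values cached from earlier suggestions in this
                     session, e.g. {"genre_ss": ["Blues Rock"]}.
    """
    if not user_groups:
        raise ValueError("user_groups must not be empty")
    params = {"q": prefix, "defType": "edismax"}
    # Security trimming: only suggestions visible to one of the user's groups.
    params["fq"] = acl_field + ":(" + " OR ".join(_quote(g) for g in user_groups) + ")"
    # Session context: boost suggestions that share earlier context values.
    boosts = [
        f"{field}:{_quote(value)}"
        for field, values in (session_context or {}).items()
        for value in values
    ]
    if boosts:
        params["bq"] = boosts
    return params


if __name__ == "__main__":
    print(suggester_params("bea", ["staff", "editors"], {"genre_ss": ["Blues Rock"]}))
```

Boosting rather than filtering on the session context keeps other suggestions reachable while still ranking the ones that fit what the user has been looking at first.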

Page 26: Webinar: Natural Language Search with Solr

Data Quality - Text - Metadata

Data design and curation - solve garbage in - garbage out at the source.

More fields with more precise values - combine for expressiveness

The Ole Structured vs Unstructured bugaboo

Use Machine Learning / Knowledge Base Classification to add metadata

Page 27: Webinar: Natural Language Search with Solr

Detecting and Consuming Context

Model Building

[Diagram labels: Subject Matter Experts; Training Set – “Seed Crystal”; Machine Learning; “MODEL”; QUERY; DOCUMENTS; Yes / No; Feature Sets]

Model: Mapping of Text => Feature Sets

Page 28: Webinar: Natural Language Search with Solr

Query Autofiltering can be used as a “normalization” layer for classification

[Diagram labels: Document Classification Stages (Manual, ML, Ontology, Hybrid); Metadata Enrichment; (more) Structured Document Collection – The Model!; Query; Query Autofiltering; Solr / Lucene; Result Set]

=> Can think of the Solr/Lucene index itself as the “Model”

Page 29: Webinar: Natural Language Search with Solr

Thank you!

lucidworks.com

Ted Sullivan

Senior Solutions Architect