A Static Rank Framework for Lucene/Solr
Mike Schultz [email protected]
Static Rank for Solr/Lucene
• Dynamic Rank
• Why Static Rank
• Combining Scores
• Static Rank Components
Multiple Fields / Multiple Types
• PubDate: Continuous (Date, Int, Float, …)
• IsNews: Boolean (True, False)
• MediaType: Enum (Book, CD, DVD, Cassette)
• TextBody: Text (Natural Language)
Dynamic Rank
• Query Dependent = F(Q,D)
• Huge dynamic range (0.001-1502.3)
• Not comparable across queries
• Not easily normalized
[Diagram: document fields (PubDate, IsNews, MediaType, TextBody) and Query → TF * IDF → Dynamic Score]
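To make the F(Q,D) point concrete, here is a minimal sketch of a textbook TF * IDF score. It is not Lucene's actual scoring formula; the function name `tf_idf_score`, the `log(N/df) + 1` IDF variant, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_terms, corpus):
    """Textbook TF * IDF (an assumption, not Lucene's exact formula).

    query_terms: list of query tokens (Q)
    doc_terms:   list of tokens in the scored document (D)
    corpus:      list of token lists, used only for document frequencies
    """
    counts = Counter(doc_terms)
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for doc in corpus if term in doc)
        if df == 0:
            continue  # term unknown to the corpus contributes nothing
        idf = math.log(n_docs / df) + 1.0
        score += counts[term] * idf
    return score


# Hypothetical toy corpus.
corpus = [
    ["solr", "rank", "rank"],
    ["lucene", "solr"],
    ["static", "rank"],
]

# The same document scores differently depending on the query.
print(tf_idf_score(["rank"], corpus[0], corpus))
print(tf_idf_score(["lucene"], corpus[0], corpus))
```

Because the score depends on which terms the query contains and how rare they are, values from two different queries share no common scale.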
Why Static Rank?
All (dynamic) things equal, I want
– Newer over older
– CD over cassette
– Arbitrary feature A over arbitrary feature B
[Diagram: document fields (PubDate, IsNews, MediaType, TextBody) and Query → Static Rank System → Static Score]
Static Rank
• Query Independent = F(D)
– i.e. static across queries
• More easily bounded
[Diagram: document fields (PubDate, IsNews, MediaType, TextBody) and Query → Static Rank System → Static Score]
Combined Rank
[Diagram: document fields (PubDate, IsNews, MediaType, TextBody) and Query → TF * IDF and Static Rank System → Custom Query → Combined Score]
Framework - Requirements
• Intuitive, hand-tunable, debuggable
• Query-time only, no re-indexing
• Minimal parameters
• Static Rank should boost / demote
– But not too much!
– Docs should stay in their own dynamic rank “neighborhood”.
[Diagram: Custom Query → Combined Score]
Combining Scores - Approaches
• Addition?
– Dynamic(0.0001) + Static(0.3) = 0.3001
– Dynamic(1542.1) + Static(0.3) = 1542.4
– Difficult to get right across queries
[Diagram: Custom Query → Combined Score]
Combining Scores - Approaches
• Multiplication?
– Dynamic(50.0) * Static(0.3) = 15.0
– Dynamic(10.0) * Static(2.0) = 20.0
– Could work, but awkward
[Diagram: Custom Query → Combined Score]
Combining Scores - Approaches
1. Bound StaticScore: -1.0 to 1.0
2. CScore = DScore*(100+S%*SScore)
– At most, staticRank will boost/demote dynamicScore by S%
– CScore = 0.014 * (100+30*0.5)
– CScore = 145.3 * (100+30*-0.5)
[Diagram: Linear Query → Combined Score]
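The three approaches can be compared side by side with a small sketch. The function names and the default `s_percent=30.0` (taken from the worked examples on the slide) are assumptions for illustration; this is plain arithmetic, not the Solr query class itself.

```python
def combine_add(d_score, s_score):
    """Addition: the static score swamps small dynamic scores."""
    return d_score + s_score


def combine_multiply(d_score, s_score):
    """Multiplication: workable, but the static score is an unbounded factor."""
    return d_score * s_score


def combine_linear(d_score, s_score, s_percent=30.0):
    """CScore = DScore * (100 + S% * SScore), with SScore bounded to [-1, 1].

    The static score moves the dynamic score by at most s_percent percent,
    whatever the magnitude of the dynamic score.
    """
    if not -1.0 <= s_score <= 1.0:
        raise ValueError("static score must be bounded to [-1.0, 1.0]")
    return d_score * (100.0 + s_percent * s_score)


# Worked examples from the slide (S% = 30).
low = combine_linear(0.014, 0.5)     # 0.014 * 115, about 1.61
high = combine_linear(145.3, -0.5)   # 145.3 * 85, about 12350.5

# Relative effect is the same at both ends of the dynamic range:
# +15% for SScore = 0.5 and -15% for SScore = -0.5.
print(low / combine_linear(0.014, 0.0))
print(high / combine_linear(145.3, 0.0))
```

The constant factor of 100 scales every document equally, so it does not change the ordering within a query; only the bounded S% * SScore term does.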
LinearQuery
Static Rank
• Extend solr.ValueSource/Parser
• Uses field cache for inputs
• Extremely fast
[Diagram: document fields (PubDate, IsNews, MediaType, TextBody) and Query → Static Rank System → Static Score]
Static Rank
[Diagram, built up in steps. Inputs: PubDate, IsNews, MediaType. PubDate → AgoValueSource → years ago. MuxValueSource (labels: 0, T, F) → years ago.]
MuxValueSource Config
Static Rank
[Diagram: as before, plus EnumValueSource. Inputs: PubDate, IsNews, MediaType. Components: AgoValueSource, MuxValueSource (labels: 0, T, F), EnumValueSource. Outputs: years ago.]
EnumValueSource Config
• Maps Fixed-Vocabulary to YEARS AGO
• A hierarchy and 3 values: MIN,0,MAX
• All things equal (dynamically), DVD = +3.3 years
Static Rank
[Diagram: as before, plus SumValueSource. Components: AgoValueSource, MuxValueSource (labels: 0, T, F), EnumValueSource, SumValueSource. The summed years ago must still be mapped (marked "?") onto the range -1 to 1.]
Mapping YearsAgo to -1.0 – 1.0
• Step Function: if > 10 years-ago = -1, else = +1
– 1 parameter
– Too abrupt
• Linear
– No parameters (fixed)
– Too gradual over 2000+ years
• Sigmoid
– 2 parameters
– Smooth over entire range
– Easy to calculate
Sigmoid
[Figure: sigmoid curve over Years-ago, ranging from 1.0 to -1.0. Parameters: slope and x-intercept (year). Example: x0 = 1.5 years ago.]
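The three candidate mappings from years-ago to the range -1.0 to 1.0 can be sketched as follows. The slides do not give the exact formulas, so these are assumptions: the linear version uses a fixed 2000-year span, and the sigmoid uses a negated `tanh` with a slope and an x-intercept `x0`, which is one common way to get a smooth curve from +1 (new) to -1 (old).

```python
import math


def step_map(years_ago, threshold=10.0):
    """Step function: one parameter, but an abrupt jump at the threshold."""
    return -1.0 if years_ago > threshold else 1.0


def linear_map(years_ago, max_years=2000.0):
    """Linear over a fixed span (assumed 2000 years): +1 at 0, -1 at max_years."""
    clamped = min(max(years_ago, 0.0), max_years)
    return 1.0 - 2.0 * clamped / max_years


def sigmoid_map(years_ago, slope=1.0, x0=1.5):
    """Sigmoid with two parameters: slope and x-intercept x0 (in years).

    Assumed form: -tanh(slope * (years_ago - x0)). It crosses zero at x0,
    approaches +1 for very recent documents and -1 for very old ones.
    """
    return -math.tanh(slope * (years_ago - x0))


for years in (0.0, 1.5, 5.0, 50.0):
    print(years, step_map(years), linear_map(years), sigmoid_map(years))
```

With these assumed shapes, one year of age changes the linear score by only 0.001, while the sigmoid concentrates its resolution around x0, where recency distinctions matter most.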
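Static Rank
[Diagram: complete pipeline. Inputs: PubDate, IsNews, MediaType. Components: AgoValueSource, MuxValueSource (labels: 0, T, F), EnumValueSource, SumValueSource, SigmoidValueSource. Intermediate values are in years ago; the final output ranges from -1 to 1.]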
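The whole pipeline can be mimicked in a few lines of plain Python. This is only a sketch of the data flow, not the Solr ValueSource classes. Several details are assumptions because the slides do not state them: that the mux passes the document age through for news and 0 otherwise, that enum values are added to years-ago with a positive value counting as older, the `tanh` form of the sigmoid, and every table entry except DVD = 3.3.

```python
import math
from datetime import date

DAYS_PER_YEAR = 365.25

# Years-ago equivalents per media type. Only DVD = 3.3 comes from the
# slides; the other values are hypothetical placeholders.
MEDIA_TYPE_YEARS = {"DVD": 3.3, "CD": 1.0, "Book": 0.0, "Cassette": 8.0}


def ago(pub_date, today):
    """Age of the document in years (the AgoValueSource step)."""
    return (today - pub_date).days / DAYS_PER_YEAR


def mux(selector, if_true, if_false):
    """Pick one of two inputs based on a boolean field (the mux step)."""
    return if_true if selector else if_false


def static_score(pub_date, is_news, media_type, today, slope=1.0, x0=1.5):
    """Sum the years-ago contributions, then squash into [-1, 1]."""
    # Assumption: only news items age; other documents contribute 0 years.
    news_years = mux(is_news, ago(pub_date, today), 0.0)
    media_years = MEDIA_TYPE_YEARS.get(media_type, 0.0)
    total_years = news_years + media_years       # the sum step
    return -math.tanh(slope * (total_years - x0))  # the sigmoid step


today = date(2025, 1, 1)
print(static_score(date(2024, 6, 1), True, "Book", today))
print(static_score(date(2010, 6, 1), True, "Book", today))
```

Expressing every feature in years-ago before the final squash is what lets a single sigmoid, with two parameters, bound the combined static score.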
SigmoidValueSource Config
Static Rank Config
Conclusion
• solr.ValueSource/Parser - fast and flexible
• CScore = DScore * (100 + S% * SScore)
• -1.0 < SScore < 1.0
• “Time” as a common currency for static features