common wisdom
• they are everywhere and bloat the index
• removing them improves performance (smaller index, faster queries) and the relevance of search results
• … but sometimes stop words add a little meaning to a sentence
• … and sometimes you need them: "To be or not to be"
• having the best of both worlds? Multiple mappings of the data, one with stop words removed and one with stop words kept, which doubles the data by indexing it two different ways!
• Common Terms Query analyzes the query and identifies which words are “important” based on each term's document frequency
• Common Terms Query leverages the power of stop word removal (faster searches) without eliminating stop words entirely (they can sometimes contribute to the score)
• Common Terms Query adapts to your domain: words with high frequency are automatically recognized as stop words
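A minimal Python sketch of the Common Terms idea, assuming a tiny in-memory corpus of whitespace-tokenized documents. The corpus, the function names and the `cutoff` threshold are hypothetical, and this is not Elasticsearch's implementation. Terms whose document frequency ratio exceeds the cutoff are treated as common (de facto stop words); only the remaining terms decide whether a document matches, while every query term present still adds to the score. In this sketch, a query made only of common terms falls back to requiring all of them.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a list of lowercase tokens.
DOCS = [
    "to be or not to be".split(),
    "the who live at leeds".split(),
    "the pearl".split(),
    "pearl harbor history".split(),
    "the one ring".split(),
]


def doc_freqs(docs):
    """Count how many documents contain each term."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def common_terms_search(query, docs, cutoff=0.5):
    """Sketch of the Common Terms idea; not Elasticsearch's implementation.

    `cutoff` (hypothetical) is the share of documents above which a term
    counts as "common", i.e. is treated like a stop word.
    """
    df = doc_freqs(docs)
    n = len(docs)
    terms = query.lower().split()
    important = [t for t in terms if df[t] / n <= cutoff]
    common = [t for t in terms if df[t] / n > cutoff]

    results = []
    for i, doc in enumerate(docs):
        tokens = set(doc)
        if important:
            # Only important (rare) terms decide whether a document matches.
            if not any(t in tokens for t in important):
                continue
        elif not all(t in tokens for t in common):
            # Query made only of common terms: require every one of them.
            continue
        # All matching query terms, common ones included, add to the score.
        score = sum(1 for t in terms if t in tokens)
        results.append((i, score))
    # Highest score first; ties keep document order (sort is stable).
    results.sort(key=lambda r: -r[1])
    return results, important, common


print(common_terms_search("the pearl", DOCS))
```

Here "the" appears in 3 of the 5 documents, so it is treated as a stop word for "the pearl", yet it still ranks the exact title "the pearl" above "pearl harbor history". On a different corpus the same word could fall on the other side of the cutoff, which is how the approach adapts to the domain.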
restoring stop words
possibility of improving
• searches consisting only of stop words (improved recall)
  • to be or not to be
  • The Who
• short searches that include stop words (improved precision)
  • pearl vs. the pearl
  • the one
  • a zukofsky (author Zukofsky, title "a")
• distinguishing "in" from "and" in some cases
  • archaeology in literature != archaeology and literature
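The recall and precision effects above can be reproduced with a toy analyzer. This sketch assumes lowercase whitespace tokenization and a small, hypothetical stop list (not the default list of any particular search engine).

```python
# Hypothetical stop list for illustration only.
STOP_WORDS = {"a", "and", "be", "in", "not", "of", "on", "or", "the", "to"}


def analyze(text, remove_stop_words=True):
    """Lowercase, split on whitespace, optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


# Recall: a query made only of stop words analyzes to nothing.
print(analyze("to be or not to be"))  # prints []
# Precision: different queries collapse to the same terms...
print(analyze("archaeology in literature") == analyze("archaeology and literature"))  # prints True
# ...unless the stop words are kept.
print(analyze("archaeology in literature", remove_stop_words=False))
```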
possibility of degrading
• long queries (more than 6 terms) with many stop words lose precision
  • Lectures on the Calculus of Variations and Optimal Control Theory
  • BUT: words occurring together as a phrase float to the top
  • AND: you can adjust the minimum match (mm) parameter
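To see why long, stop-word-heavy queries lose precision and how mm helps, here is a sketch that turns a simplified mm spec into a required clause count. It handles only an absolute number, a percentage rounded down, and a single `n<p%` condition, loosely modeled on Solr's dismax/edismax `mm` syntax; it is not Solr's parser and ignores negative values and multiple conditions. The stop list is a hypothetical one chosen for this title.

```python
import math


def min_should_match(num_clauses, spec):
    """Required number of matching clauses for a simplified mm spec.

    Supported forms (a subset loosely modeled on Solr's mm syntax):
      "3"      absolute number
      "75%"    percentage of clauses, rounded down
      "2<75%"  up to 2 clauses: all required; above that: apply "75%"
    """
    if num_clauses == 0:
        return 0
    if "<" in spec:
        threshold, spec = spec.split("<", 1)
        if num_clauses <= int(threshold):
            return num_clauses
    if spec.endswith("%"):
        required = math.floor(num_clauses * int(spec[:-1]) / 100)
    else:
        required = int(spec)
    # Clamp to a sensible range: at least 1, at most every clause.
    return max(1, min(num_clauses, required))


title = "Lectures on the Calculus of Variations and Optimal Control Theory"
stop_words = {"on", "the", "of", "and"}  # hypothetical stop list
all_terms = title.lower().split()
content_terms = [t for t in all_terms if t not in stop_words]
print(len(all_terms), min_should_match(len(all_terms), "75%"))
print(len(content_terms), min_should_match(len(content_terms), "75%"))
```

With stop words kept, the 10-term title at `mm=75%` needs 7 matching terms; since 4 of those terms are stop words, a document can qualify while matching only 3 of the 6 content words. Raising mm, or using a conditional spec, trades some recall back for precision.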
how to decide?
• take a look at your business knowledge domain
• count the percentage of searches containing stop words
• count the number of terms in user queries
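Both counts can be taken from a query log with a few lines of code. This sketch assumes the log is a list of raw query strings and uses a small hypothetical stop list; the sample queries are made up for illustration, not real log data.

```python
from collections import Counter

# Hypothetical stop list; replace with the one your analyzer actually uses.
STOP_WORDS = {"a", "and", "be", "in", "not", "of", "on", "or", "the", "to"}


def query_log_stats(queries, stop_words=STOP_WORDS):
    """Share of queries with stop words, all-stop-word share, and length histogram."""
    with_stop = only_stop = 0
    lengths = Counter()
    for q in queries:
        terms = q.lower().split()
        n_stop = sum(1 for t in terms if t in stop_words)
        if n_stop:
            with_stop += 1
            if n_stop == len(terms):
                only_stop += 1
        lengths[len(terms)] += 1
    total = max(len(queries), 1)  # avoid dividing by zero on an empty log
    return {
        "pct_with_stop_words": 100.0 * with_stop / total,
        "pct_only_stop_words": 100.0 * only_stop / total,
        "term_count_histogram": dict(sorted(lengths.items())),
    }


sample = [
    "to be or not to be",
    "the pearl",
    "pearl",
    "archaeology in literature",
    "optimal control theory",
]
print(query_log_stats(sample))
```

A large share of short or all-stop-word queries suggests that restoring stop words could pay off; a log dominated by long queries points to the precision risks listed above.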