Outline
Search Engines Evaluation/Testing
Our Approach
Data Collection
Examples
Search Engines Evaluation/Testing
Search Engine Evaluation
Prepare a set of queries and the ground truth, then evaluate the results of different search engines using well-defined measurements.
How to prepare queries, i.e., test inputs?
How to get the ground truth, i.e., test oracles?
Test Oracles
Previous Approaches
  Manual labeling
    too costly, hardly reusable
  Clickthrough data
    cannot find relevant pages that are not in the search results
  Automatic labeling based on the search results of multiple search engines at the same time
    biased toward systems with similar characteristics
  Use previous search results as test oracles
    desired search results may change
Mining Test Oracles from Search Results
Basic Idea
Mine implicit rules between inputs and outputs, e.g.,
  tvguide.com, => imdb.com
  basketball-reference.com, => nba.com
  ericsson,sony, => sonyericsson.com
Build the Dataset
Terms (features) of inputs
  Query words
  Query types
Terms (features) of outputs
  Domains of top 10 search results
Terms (features) of multiple search engines
  Search engine + domains of top 10 search results
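As a rough illustration of the feature list above, the sketch below turns one query and its result URLs into a single record of terms. The helper names `domain_of` and `build_transaction`, the example URL paths, and the "strip a leading www." rule are assumptions made for this example; the slides do not say how domains were extracted.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL without a leading 'www.' (a simplification)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def build_transaction(query, query_type, result_urls):
    """Combine query words, query type, and top-10 result domains into one record."""
    words = query.lower().split()
    domains = [domain_of(u) for u in result_urls[:10]]  # keep only the top 10
    return words + [query_type] + domains


# Hypothetical URLs; only the domains appear in the example dataset.
record = build_transaction(
    "buy wine online",
    "Food.csv",
    ["https://www.wine.com/list", "http://www.foodandwine.com/wine"],
)
print(",".join(record))
```

A production version would likely need a public-suffix-aware domain extractor, since hosts such as `shop.example.co.uk` are not handled by this simple rule.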
Example Dataset
pine,furniture,Home.csv,barnfurnituremart.com,americancountryhomestore.com,overstock.com,prairiecountryfurniture.com,etsy.com,unfinishedfurnituregiant.com,cozylogfurniture.com,directfrommexico.com,oakplus.com,sawdustcityllc.com,
buy,wine,online,Food.csv,wine.com,foodandwine.com,marketviewliquor.com,winechateau.com,wines.com,thewinebuyer.com,wineweb.com,alloutwine.com,cellaraiders.com,french-wine-online.com,
piercing,labret,Beauty.csv,wikipedia.org,youtube.com,youtube.com,about.com,ygoy.com,ehow.com,bodyjewelleryshop.com,google.com,bmezine.com,piercingdot.com
Example Dataset
interest,rates,today,Finance.csv,real.csv,google:wellsfargo.com,google:bankrate.com,google:marketwatch.com,google:interest.com,google:interest.com,google:mortgagenewsdaily.com,google:usbank.com,google:mortgage101.com,google:yahoo.com,google:mortgageloan.com,bing:wellsfargo.com,bing:bankrate.com,bing:marketwatch.com,bing:wsj.com,bing:interest.com,bing:interest.com,bing:bankrate.com,bing:usbank.com,bing:yahoo.com,bing:usatoday.com,yahoo:bankrate.com,yahoo:wellsfargo.com,yahoo:bankrate.com,yahoo:interest.com,yahoo:msn.com,yahoo:money-rates.com,yahoo:cnn.com,yahoo:yahoo.com,yahoo:fxstreet.com,yahoo:marketwatch.com,
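The record layout in these examples can be read back with a small parser. This is a sketch under assumptions inferred from the examples only: items ending in `.csv` are query types, items of the form `engine:domain` are results from a named search engine, other items containing a dot are result domains, and everything else is a query word. The function name `parse_record` is hypothetical.

```python
def parse_record(line):
    """Split one comma-separated record into query words, query types, and results."""
    parsed = {"words": [], "types": [], "results": []}
    for item in line.split(","):
        item = item.strip()
        if not item:
            continue  # skip the empty field left by a trailing comma
        if item.endswith(".csv"):
            parsed["types"].append(item)  # query type, e.g. Finance.csv
        elif ":" in item:
            engine, domain = item.split(":", 1)  # e.g. google:bankrate.com
            parsed["results"].append((engine, domain))
        elif "." in item:
            parsed["results"].append((None, item))  # single-engine record
        else:
            parsed["words"].append(item)
    return parsed


line = "interest,rates,today,Finance.csv,google:wellsfargo.com,bing:bankrate.com,"
record = parse_record(line)
print(record["words"])
print(record["results"])
```

The classification would misfire on query words that themselves contain a dot or a colon, so real data might need an explicit field separator between inputs and outputs.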
Association Rule Mining
Rule form: A,B,C => D
confidence(A => D) = support(A,D) / support(A)
Example: bing:mlb.com, => google:mlb.com,
  support(bing:mlb.com, google:mlb.com) = 26
  support(bing:mlb.com) = 27
  confidence(bing:mlb.com, => google:mlb.com,) = 26/27
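The support and confidence definitions above can be sketched in a few lines. Support is taken here as a raw count of records, matching the 26 and 27 in the example; the function names and the three toy records are invented for illustration and are not the real dataset.

```python
def support(transactions, items):
    """Count the transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= set(t))


def confidence(transactions, antecedent, consequent):
    """confidence(A => D) = support(A, D) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, list(antecedent) + list(consequent)) / base


# Toy records, invented for illustration only.
transactions = [
    ["bing:mlb.com", "google:mlb.com"],
    ["bing:mlb.com", "google:mlb.com", "yahoo:mlb.com"],
    ["bing:mlb.com"],
]
conf = confidence(transactions, ["bing:mlb.com"], ["google:mlb.com"])
print(conf)
```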
Association Rule Mining
Mine all frequent itemsets
We are most interested in the single postfix rules, i.e., A => B, where B’s size is 1
Algorithm
  For each itemset S
    For each u in S
      Check the rule S-u => u
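The loop above can be sketched as follows. The slides do not name the frequent-itemset miner that was used, so this example substitutes a brute-force enumeration that is only practical for toy data; `frequent_itemsets`, `single_consequent_rules`, the thresholds, and the three records are all assumptions for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent itemset search (fine for toy data, not for scale)."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    counts = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            n = sum(1 for t in sets if candidate <= t)
            if n >= min_support:
                counts[candidate] = n
    return counts


def single_consequent_rules(counts, min_confidence):
    """For each itemset S and each u in S, check the rule S-u => u."""
    rules = []
    for itemset, n in counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty left-hand side
        for u in itemset:
            antecedent = itemset - {u}
            # Every subset of a frequent itemset is frequent, so this lookup succeeds.
            base = counts[antecedent]
            if n / base >= min_confidence:
                rules.append((tuple(sorted(antecedent)), u, n, base))
    return rules


# Toy records, invented for illustration only.
transactions = [
    ["ericsson", "sony", "sonyericsson.com"],
    ["ericsson", "sony", "sonyericsson.com"],
    ["sony", "sony.com"],
]
counts = frequent_itemsets(transactions, min_support=2)
rules = single_consequent_rules(counts, min_confidence=1.0)
for lhs, rhs, n, base in rules:
    print(",".join(lhs) + ", => " + rhs + ", : " + str(n) + "/" + str(base))
```

On realistic data an Apriori or FP-growth implementation would replace the brute-force step; the rule-checking loop stays the same.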
Data Collection
Search Engines
Google
Bing
Yahoo
Baidu
Sogou
Soso
Queries
Google Trends (hot queries), 1000 queries
Queries in KDDCUP 2005, 800 queries
Google AdWords, 15,000 queries, 22 types
Baidu Tops
Examples
dpreview.com,kenrockwell.com, => amazon.com, : 29/29=1.0, violations: test: 37/40, violations: 3881,4691,4783,
amazon.com,kenrockwell.com, => dpreview.com, : 29/29=1.0, violations: test: 37/39, violations: 2089,8921,
canon,amazon.com, => canon.com, : 22/22=1.0, violations: test: 34/38, violations: 4090,4870,5384,7400,
canon.com,amazon.com, => canon, : 22/22=1.0, violations: test: 34/38, violations: 3560,5409,8983,8988,
canon.com,Hobbies.csv, => canon, : 31/31=1.0, violations: test: 31/34, violations: 3560,5409,8988,
canon.com,dpreview.com, => canon, : 22/22=1.0, violations: test: 24/26, violations: 5409,8983,
gsmarena.com,samsung.com, => samsung, : 26/26=1.0, violations: test: 32/35, violations: 852,1195,1714,
phonenumber.com, => whitepages.com, : 25/25=1.0, violations: test: 11/12, violations: 1077,
Hobbies.csv,nikon, => nikon.com, : 28/28=1.0, violations: test: 26/28, violations: 896,8319,
nikon, => nikon.com, : 28/28=1.0, violations: test: 26/28, violations: 896,8319,
canon.com, => canon, : 37/37=1.0, violations: test: 37/41, violations: 3560,5409,8983,8988,
amazon.com,nikon, => nikon.com, : 25/25=1.0, violations: test: 25/27, violations: 896,8319,
reversephonedirectory.com,Computer.csv, => whitepages.com, : 22/22=1.0, violations: test: 26/30, violations: 1804,4424,5453,8720,
Internet.csv,ericsson, => sonyericsson.com, : 24/24=1.0, violations: test: 23/24, violations: 8776,
reversephonedirectory.com, => whitepages.com, : 22/22=1.0, violations: test: 28/32, violations: 1804,4424,5453,8720,
simplyrecipes.com,about.com, => allrecipes.com, : 25/25=1.0, violations: test: 38/39, violations: 5596,
Finance.csv,oanda.com, => xe.com, : 20/20=1.0, violations: test: 27/30, violations: 3410,5566,5781,
oanda.com, => xe.com, : 20/20=1.0, violations: test: 28/31, violations: 3410,5566,5781,
food.com,foodnetwork.com, => allrecipes.com, : 30/30=1.0, violations: test: 32/34, violations: 7642,8519,
foodnetwork.com,simplyrecipes.com, => allrecipes.com, : 39/39=1.0, violations: test: 40/43, violations: 566,5596,7642,
ericsson,sony, => sonyericsson.com, : 24/24=1.0, violations: test: 23/24, violations: 8776,
myrecipes.com,foodnetwork.com, => allrecipes.com, : 24/24=1.0, violations: test: 28/30, violations: 2748,5252,
myrecipes.com,allrecipes.com, => foodnetwork.com, : 24/24=1.0, violations: test: 28/35, violations: 377,1236,1335,1645,3752,6655,6920,
phonenumber.com,phone, => whitepages.com, : 20/20=1.0, violations: test: 8/9, violations: 1077,
Food.csv,joyofbaking.com, => allrecipes.com, : 27/27=1.0, violations: test: 35/36, violations: 566,
nikonusa.com,nikon, => nikon.com, : 28/28=1.0, violations: test: 26/28, violations: 896,8319,
joyofbaking.com, => allrecipes.com, : 27/27=1.0, violations: test: 35/36, violations: 566,
mortgageloan.com, => bankrate.com, : 20/21=0.9523809523809523, violations: 7719, test: 24/28, violations: 545,1603,5073,7711,
Finance.csv,mortgageloan.com, => bankrate.com, : 20/21=0.9523809523809523, violations: 7719, test: 24/28, violations: 545,1603,5073,7711,
recipes,myrecipes.com, => foodnetwork.com, : 20/21=0.9523809523809523, violations: 7778, test: 20/25, violations: 1236,1335,6655,6920,7770,
recipes,myrecipes.com, => allrecipes.com, : 20/21=0.9523809523809523, violations: 7778, test: 24/25, violations: 7770,
phonearena.com,samsung, => gsmarena.com, : 21/22=0.9545454545454546, violations: 3806, test: 33/34, violations: 3802,
samsung.com,samsungmobile.com, => samsung, : 21/22=0.9545454545454546, violations: 8585, test: 8/10, violations: 1195,4778,
food.com,about.com, => allrecipes.com, : 21/22=0.9545454545454546, violations: 2406, test: 43/46, violations: 5740,7359,8893,
Dining.csv,mcdonalds, => mcdonalds.com, : 21/22=0.9545454545454546, violations: 5326, test: 20/22, violations: 3470,3569,
amazon.com,nikon.com, => nikon, : 25/26=0.9615384615384616, violations: 7295, test: 25/30, violations: 1256,5102,6165,6744,7287,
nikon.com,Hobbies.csv, => nikon, : 28/29=0.9655172413793104, violations: 7295, test: 26/31, violations: 1256,5102,6165,6744,7287,
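The "test: x/y, violations: ..." fields in the listings above appear to report how many test records that match a rule's left-hand side also contain its right-hand side, followed by the IDs of the records that do not. Under that reading, which is an interpretation of the output format and not something the slides state, a mined rule could be checked against held-out records as sketched below. The function name `check_rule` and the record IDs are invented for illustration.

```python
def check_rule(records, antecedent, consequent):
    """Return (satisfied, applicable, violating_ids) for one rule.

    `records` maps a record ID to the list of items in that record.
    """
    lhs = set(antecedent)
    applicable = 0
    violations = []
    for record_id, items in records.items():
        present = set(items)
        if lhs <= present:  # the rule applies to this record
            applicable += 1
            if consequent not in present:
                violations.append(record_id)  # expected output is missing
    return applicable - len(violations), applicable, violations


# Toy test records with made-up IDs.
test_records = {
    101: ["nikon", "Hobbies.csv", "nikon.com", "amazon.com"],
    102: ["nikon", "Hobbies.csv", "amazon.com"],
    103: ["canon", "Hobbies.csv", "canon.com"],
}
ok, total, bad = check_rule(test_records, ["nikon"], "nikon.com")
print("nikon, => nikon.com, test: %d/%d, violations: %s" % (ok, total, bad))
```

Each violating ID points at a query whose results are worth inspecting, which is how the mined rules act as test oracles.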
Examples of Multiple Search Engines
bing:medicinenet.com,google:emedicinehealth.com, => google:medicinenet.com, : 107/107=1.0, violations:
symptoms,bing:medicinenet.com, => google:webmd.com, : 55/55=1.0, violations:
Hobbies.csv,yahoo:allrecipes.com, => google:allrecipes.com, : 53/53=1.0, violations:
bing:medicinenet.com,yahoo:nih.gov, => google:medicinenet.com, : 100/100=1.0, violations:
google:amazon.com,bing:gsmarena.com, => google:gsmarena.com, : 52/52=1.0, violations:
bing:gsmarena.com,google:youtube.com, => google:gsmarena.com, : 73/73=1.0, violations:
google,google:google.com, => bing:google.com, : 56/56=1.0, violations:
google:allrecipes.com,recipe, => bing:allrecipes.com, : 55/55=1.0, violations:
bing:medicinenet.com,yahoo:mayoclinic.com, => google:medicinenet.com, : 90/90=1.0, violations:
bing:dpreview.com,bing:amazon.com, => google:dpreview.com, : 56/56=1.0, violations:
bing:medicinenet.com,yahoo:mayoclinic.com, => google:mayoclinic.com, : 89/90=0.9888888888888889, violations: 7124,
Home.csv,bing:amazon.com, => google:amazon.com, : 90/91=0.989010989010989, violations: 2124,
bing:medicinenet.com,yahoo:wrongdiagnosis.com, => google:medicinenet.com, : 90/91=0.989010989010989, violations: 8556,
bing:webmd.com,yahoo:wrongdiagnosis.com, => google:webmd.com, : 95/96=0.9895833333333334, violations: 6305,
recipes,yahoo:allrecipes.com, => google:allrecipes.com, : 95/96=0.9895833333333334, violations: 6041,
bing:mayoclinic.com,bing:nih.gov, => google:mayoclinic.com, : 102/103=0.9902912621359223, violations: 583,
bing:mayoclinic.com,bing:medicinenet.com, => google:medicinenet.com, : 124/125=0.992, violations: 645,
bing:medicinenet.com,bing:webmd.com, => google:medicinenet.com, : 136/137=0.9927007299270073, violations: 8556,
yahoo:nextag.com,bing:amazon.com, => google:amazon.com, : 172/173=0.9942196531791907, violations: 4773,
bing:medicinenet.com,google:mayoclinic.com, => google:medicinenet.com, : 174/175=0.9942857142857143, violations: 645,
google:walmart.com,bing:amazon.com, => google:amazon.com, : 177/178=0.9943820224719101, violations: 4773,
bing:mayoclinic.com,google:nih.gov, => google:mayoclinic.com, : 143/145=0.9862068965517241, violations: 1255,583,
bing:amazon.com,yahoo:thefind.com, => google:amazon.com, : 72/73=0.9863013698630136, violations: 4773,
symptoms,bing:webmd.com, => google:webmd.com, : 77/78=0.9871794871794872, violations: 6451,
yahoo:medicinenet.com,yahoo:wrongdiagnosis.com, => google:medicinenet.com, : 77/78=0.9871794871794872, violations: 8556,
yahoo:medicinenet.com,yahoo:mayoclinic.com, => google:mayoclinic.com, : 78/79=0.9873417721518988, violations: 7124,
bing:allrecipes.com,yahoo:allrecipes.com, => google:allrecipes.com, : 160/162=0.9876543209876543, violations: 566,5601,
yahoo:bankrate.com,bing:bankrate.com, => google:bankrate.com, : 82/83=0.9879518072289156, violations: 6266,
Internet.csv,bing:gsmarena.com, => google:gsmarena.com, : 83/84=0.9880952380952381, violations: 7617,
bing:gsmarena.com, => google:gsmarena.com, : 86/87=0.9885057471264368, violations: 7617,
bing:nextag.com,bing:amazon.com, => google:amazon.com, : 176/178=0.9887640449438202, violations: 4773,7343,
bing:mayoclinic.com,bing:answers.com, => google:mayoclinic.com, : 89/90=0.9888888888888889, violations: 6328,
Thank you!