Measuring Closeness of Search Engines - Identification of Outliers - Visualization of Closeness
Wang Hua (王化), fourth-year student, Department of Information Science
Motivation
Too many search engines
More than 20 major general-purpose engines
More specific-purpose engines
Simple aggregation of rankings is popular.
We address the need to quantify and visualize the closeness between search engines.
Too Many Search Engines with Different Policies
Major search engines: Yahoo, AltaVista, Google, Lycos, etc.
Distinct ranking policies: directory type, robot type, PageRank type (hyperlink-based)
Outline of Methods
Ranking
List distance measure
Distance between search engines
Ranking
Partial list
Case for WWW web sites: top-100 list
List of results from search engines
Footrule Distance among Ranking Lists
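σ, τ: ranking lists; σ(i) is the position of item i in list σ
F(σ, τ) = Σ_i |σ(i) − τ(i)|
Example: σ = [a,b,c,d,e], τ = [a,d,e,c,b]
Displacements of a, b, c, d, e: 0+3+1+2+2 = 8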
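The footrule computation above can be sketched in a few lines of Python. This is a minimal illustration, assuming both lists rank exactly the same items; the function name `footrule_distance` is hypothetical and not taken from the slides.

```python
def footrule_distance(sigma, tau):
    """Spearman footrule distance between two rankings of the same items."""
    if len(sigma) != len(tau) or set(sigma) != set(tau):
        raise ValueError("both lists must rank exactly the same items")
    # Position (0-based rank) of every item in the second list.
    pos_in_tau = {item: rank for rank, item in enumerate(tau)}
    # Sum of absolute rank displacements, item by item.
    return sum(abs(rank - pos_in_tau[item]) for rank, item in enumerate(sigma))


# The example from the slide.
print(footrule_distance(list("abcde"), list("adecb")))  # 8
```

Using 0-based ranks does not change the result, because only rank differences enter the sum.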
Kendall-tau Distance
Definition [Dwork, WWW10, 2001]: counts the number of pairwise disagreements between two lists
K(σ, τ) = |{ (i, j) : i < j, σ(i) < σ(j) but τ(i) > τ(j) }|
Example: [a,b,c,d] vs. [a,d,c,b]
6 pairs: (a,b) (a,c) (a,d) (b,c) (b,d) (c,d)
0+0+0+1+1+1 = 3
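The pair-counting definition above translates directly into a brute-force check over all pairs. The sketch below assumes both lists contain the same items; `kendall_tau_distance` is an illustrative name, not one used in the slides.

```python
from itertools import combinations


def kendall_tau_distance(sigma, tau):
    """Number of item pairs that the two rankings order differently."""
    if len(sigma) != len(tau) or set(sigma) != set(tau):
        raise ValueError("both lists must rank exactly the same items")
    pos_in_tau = {item: rank for rank, item in enumerate(tau)}
    disagreements = 0
    # combinations() yields (x, y) with x ranked before y in sigma.
    for x, y in combinations(sigma, 2):
        if pos_in_tau[x] > pos_in_tau[y]:  # tau reverses the pair
            disagreements += 1
    return disagreements


# The example from the slide.
print(kendall_tau_distance(list("abcd"), list("adcb")))  # 3
```

This version examines every pair, so it runs in quadratic time; it is meant to mirror the definition rather than to be fast.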
Character of Distance
Kendall-tau has O(n log n)-time complexity
Satisfies the triangle inequality and the other axioms of a distance (metric)
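One standard way to reach the O(n log n) bound is to count inversions with a merge sort. The slides do not say which algorithm was used, so the sketch below is an assumption; it takes two lists over the same items, and the name `kendall_tau_fast` is hypothetical.

```python
def kendall_tau_fast(sigma, tau):
    """Kendall-tau distance via merge-sort inversion counting."""
    pos_in_tau = {item: rank for rank, item in enumerate(tau)}
    # Rewrite sigma as a sequence of tau-ranks; every disagreeing pair
    # shows up as an inversion (a larger value before a smaller one).
    seq = [pos_in_tau[item] for item in sigma]

    def sort_and_count(values):
        if len(values) <= 1:
            return values, 0
        mid = len(values) // 2
        left, inv_left = sort_and_count(values[:mid])
        right, inv_right = sort_and_count(values[mid:])
        merged = []
        inversions = inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] jumps ahead of every remaining left element.
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_and_count(seq)[1]


print(kendall_tau_fast(list("abcd"), list("adcb")))  # 3
```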
Matrix of Distance
Keyword = “university”
Engines   Dmos  Alta  Yahoo  Overture  Excite  Lycos  Aol   Sprinks  Galaxy
Dmos      -     441   100    132       121     190    213   211      42
Alta            -     490    737       574     895    915   100      720
Yahoo                 -      2324      2123    1349   879   1221     1766
Overture                     -         7162    7113   6254  945      312
Excite                                 -       8927   9699  282      192
Lycos                                          -      8712  462      354
Aol                                                   -     461      365
Sprinks                                                     -        123
Galaxy                                                               -
Table 4.2 The Closeness of Search Engines
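A pairwise matrix of this kind could be assembled as sketched below. The slides do not state how partial (top-k) lists with different items are compared, so this sketch assumes a footrule over the union of both lists, with a missing item treated as sitting just past the end of the list that lacks it. The engine names and URLs are made-up placeholders, not data from the study.

```python
from itertools import combinations


def footrule_partial(list_a, list_b):
    """Footrule over the union of two top-k lists.

    Assumption (not from the slides): an item absent from a list is
    given the rank just past that list's last position.
    """
    pos_a = {item: rank for rank, item in enumerate(list_a)}
    pos_b = {item: rank for rank, item in enumerate(list_b)}
    union = set(list_a) | set(list_b)
    return sum(
        abs(pos_a.get(item, len(list_a)) - pos_b.get(item, len(list_b)))
        for item in union
    )


def distance_matrix(results):
    """Symmetric dict-of-dicts of pairwise distances between engines."""
    names = list(results)
    matrix = {name: {name: 0} for name in names}
    for x, y in combinations(names, 2):
        d = footrule_partial(results[x], results[y])
        matrix[x][y] = matrix[y][x] = d
    return matrix


# Hypothetical result lists for one keyword.
results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u1", "u4"],
    "engine_c": ["u5", "u6", "u7"],
}
m = distance_matrix(results)
print(m["engine_a"]["engine_b"])  # 4
print(m["engine_a"]["engine_c"])  # 12
```

Engines whose lists share no items end up with the largest distance under this convention, which is one simple way to make outliers stand out.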
Visualization
Kernighan-Lin Algorithm
Kamada Spring Model
Comparison of the 2 methods
Kernighan-Lin Method
Brief explanation
Kernighan-Lin by Color Coding: Keyword1 = “Totti”, Keyword2 = “Nakata”
Kernighan-Lin by Color Coding: Keyword1 = “Gucci”, Keyword2 = “Hermes”
Kamada Spring Model
Brief explanation
An example
Kamada Spring Model: Keyword1 = “Totti”, Keyword2 = “Nakata”
Comparison of the 2 methods
Results
Distances differ from one pair of search engines to another.
Different fields show different characteristics.
Some search engines, such as Sprinks, are far away from the others.
Excite and Aol are close to each other in most cases.
Conclusion
Address the need to quantify and visualize the closeness between search engines.
Provide users with a GUI to see the closeness of search engines.
Help users select suitable search engines.
Help users see the features of each search engine in various fields.
Future Work
Use more search engines
Use both general-purpose and special-purpose search engines
Use hyperlinks to find the resemblance
Apply this idea to other fields