Geographic Web Information Retrieval
Alexander Markowetz, University of Marburg
Thomas Brinkhoff, FH Oldenburg
Bernhard Seeger, University of Marburg
2
Current Situation In Web-IR
Everybody is online, but never seen
3
Current Situation In Web-IR
Queries are too short
Result sets are too large
You can effectively block your competitors
Good results get buried
Smaller Results
Ways to drill into the iceberg
4
Solutions
Personalized Search
Dynamic/Interactive Search
5
Geographic Web-IR
Location is the most personal property
  "All business is local"
People already use the web geographically
  "Yoga Brooklyn"
  "Linux usergroup Frankfurt"
And get poor results
We are going to make that a lot better
6
How-Not-To
Semantic Web
  "If just everybody included geographic markup in their web pages"
Two problems
  Chicken-and-egg
  Malicious webmasters
    Metatags, anyone?
Bottom line
  The Semantic Web is for "B2B" situations only.
7
How-To
Modify traditional IR techniques to extract geographic markers
  Multigranular approach
  Extending basic Web-IR
Map pages to geographic positions
  Footprint
Aggregate and cluster them
Build applications
  Geographic Search
  Geographic Web-Mining
8
Geocoding
Footprint
  Geographic position of a web page
  Set of points and polygons, associated with some amplitude
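A footprint as described here could be represented roughly as follows. This is only a sketch: the class name `Footprint`, its methods, and the choice of (latitude, longitude) pairs are assumptions made for illustration, not the authors' data structure.

```python
from dataclasses import dataclass, field


def point_in_polygon(lat, lon, vertices):
    """Ray-casting test: is (lat, lon) inside the polygon given by its vertices?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        lat1, lon1 = vertices[i]
        lat2, lon2 = vertices[(i + 1) % n]
        # Does this edge straddle the line of constant longitude through the point?
        if (lon1 > lon) != (lon2 > lon):
            # Latitude at which the edge crosses that line.
            cross_lat = lat1 + (lon - lon1) * (lat2 - lat1) / (lon2 - lon1)
            if lat < cross_lat:
                inside = not inside
    return inside


@dataclass
class Footprint:
    """Hypothetical footprint: weighted points and polygons for one web page."""
    points: list = field(default_factory=list)    # entries: ((lat, lon), amplitude)
    polygons: list = field(default_factory=list)  # entries: (vertices, amplitude)

    def add_point(self, lat, lon, amplitude):
        self.points.append(((lat, lon), amplitude))

    def add_polygon(self, vertices, amplitude):
        self.polygons.append((list(vertices), amplitude))

    def amplitude_at(self, lat, lon):
        """Sum the amplitudes of matching points and of polygons covering (lat, lon)."""
        total = sum(a for (p, a) in self.points if p == (lat, lon))
        total += sum(a for (v, a) in self.polygons if point_in_polygon(lat, lon, v))
        return total


fp = Footprint()
fp.add_point(50.5, 8.5, 0.4)
fp.add_polygon([(50, 8), (50, 9), (51, 9), (51, 8)], 0.6)
```

Querying `fp.amplitude_at(50.5, 8.5)` adds the point's amplitude to that of the enclosing polygon; a location outside both contributes nothing.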
9
Preliminaries
Basic IR assumptions can easily be extended to "geographic IR"
  Radius-1 Hypothesis
  Radius-2 Hypothesis (co-citation)
  Intra-Site Hypothesis
    Intra-subdomain
    Intra-directory
10
Multigranularity
Information extraction on different levels
  Domain
  Subdomain
  Directory
  File
Need to aggregate
[Figure: tree of levels: domain (Dom), subdomains (SDom), directories (Dir), files (File)]
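One way to picture the aggregation step is to sum per-file evidence upward through the directory, subdomain, and domain levels. The sketch below is an assumption-laden illustration: the function names, the place-name counters used as evidence, and the naive rule that the last two host labels form the domain are all hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

LEVELS = ("file", "directory", "subdomain", "domain")


def levels(url):
    """Return the key of a URL at each granularity level."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Naive assumption: the registered domain is the last two labels (e.g. example.de).
    domain = ".".join(host.split(".")[-2:])
    directory = host + parts.path.rsplit("/", 1)[0] + "/"
    return {
        "file": host + parts.path,
        "directory": directory,
        "subdomain": host,
        "domain": domain,
    }


def aggregate(page_markers):
    """Sum the place counts found on each page into every enclosing level."""
    totals = {level: defaultdict(Counter) for level in LEVELS}
    for url, markers in page_markers.items():
        for level, key in levels(url).items():
            totals[level][key].update(markers)
    return totals


pages = {
    "http://www.example.de/shop/a.html": {"Marburg": 2},
    "http://www.example.de/shop/b.html": {"Marburg": 1, "Frankfurt": 1},
    "http://blog.example.de/index.html": {"Oldenburg": 3},
}
totals = aggregate(pages)
```

Here the directory `www.example.de/shop/` collects the evidence of both shop pages, while the domain `example.de` also absorbs what was found on the blog subdomain.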
11
Sources
On all levels
  Names of places
  Zip codes
  Area codes
On site level
  Whois
  Business directories
Links
  Density over a given area
  Radius-1 and Radius-2
K. S. McCurley, "Geospatial Mapping and Navigation of the Web", 10th WWW, 2001
J. Ding, L. Gravano, and N. Shivakumar, "Computing Geographical Scopes of Web Resources", VLDB 2000
[Figure: tree of levels: domain (Dom), subdomains (SDom), directories (Dir), files (File)]
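The textual sources listed on this slide (place names, zip codes, area codes) could be pulled from page text along these lines. The lookup tables are tiny illustrative stand-ins for a real gazetteer, and the function name and regular expressions are assumptions; the only fact relied on is that German zip codes have five digits.

```python
import re
from collections import Counter

# Illustrative lookup tables; a real system would load a full gazetteer.
ZIP_TO_PLACE = {"35037": "Marburg", "26121": "Oldenburg"}
AREA_CODE_TO_PLACE = {"06421": "Marburg", "0441": "Oldenburg"}
PLACE_NAMES = {"Marburg", "Oldenburg", "Frankfurt"}

ZIP_RE = re.compile(r"\b\d{5}\b")         # five-digit German zip codes
AREA_RE = re.compile(r"\b0\d{2,4}\b")     # area codes with a leading zero
WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]+")


def extract_markers(text):
    """Count how often each known place is referenced by name, zip code, or area code."""
    markers = Counter()
    for word in WORD_RE.findall(text):
        if word in PLACE_NAMES:
            markers[word] += 1
    for code in ZIP_RE.findall(text):
        if code in ZIP_TO_PLACE:
            markers[ZIP_TO_PLACE[code]] += 1
    for code in AREA_RE.findall(text):
        if code in AREA_CODE_TO_PLACE:
            markers[AREA_CODE_TO_PLACE[code]] += 1
    return markers


found = extract_markers("Yoga Studio, Bahnhofstr. 1, 35037 Marburg, Tel. 06421 12345")
```

In this sample, name, zip code, and area code all point to the same town, so the three sources reinforce each other; numbers that are not in the tables are simply ignored.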
12
Geographic Search
A simple interface
  Not so exciting, but...
[Figure: search form with fields Key Words, City, Street, State, Area code, and a SEARCH button]
13
Dynamic Geographic-IR
Replacing the "next" button
  Closer / Continue / Wider
  Next / Closer / Wider
  Next / ½ mile / 1 mile / 2 miles / 5 miles / 10 miles / 25 miles / 100 miles
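The Closer/Wider buttons can be read as stepping through the fixed list of radii shown on this slide. The sketch below assumes that reading; the function names, the result format (a dict with a `pos` entry), and the use of the haversine formula for distances are illustrative choices, not taken from the slides.

```python
import math

RADII_MILES = [0.5, 1, 2, 5, 10, 25, 100]  # radii listed on the slide
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def step(radius, direction):
    """Move to the next smaller or larger radius; any other direction keeps it."""
    i = RADII_MILES.index(radius)
    if direction == "closer":
        i = max(i - 1, 0)
    elif direction == "wider":
        i = min(i + 1, len(RADII_MILES) - 1)
    return RADII_MILES[i]


def within(results, center, radius):
    """Keep only the results whose position lies within the radius of the center."""
    return [r for r in results if haversine_miles(center, r["pos"]) <= radius]


center = (50.81, 8.77)
results = [
    {"url": "near.example", "pos": (50.81, 8.77)},
    {"url": "far.example", "pos": (50.11, 8.68)},
]
nearby = within(results, center, 25)
wider = within(results, center, step(25, "wider"))
```

With these sample positions, the 25-mile search keeps only the first result, and pressing "Wider" moves to 100 miles and brings in the second.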
14
Locality
Final ranking is a (linear) combination of importance and geographic distance.
Chances are: Amazon will still rank first, no matter where you are
  Amazon is a "global bully"
Idea: Eliminate global bullies by computing importance differently
  Give less weight to links that span a longer distance
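A rough sketch of both ideas on this slide follows. It is not the authors' ranking function: importance is approximated by a damped in-link count rather than a full link analysis, the exponential damping and its scale are assumptions, and the weights `alpha` and `beta` in the linear combination are hypothetical parameters.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def local_importance(links, positions, scale_km=50.0):
    """In-link count where each link is damped by exp(-distance / scale_km)."""
    scores = {page: 0.0 for page in positions}
    for src, dst in links:
        distance = haversine_km(positions[src], positions[dst])
        scores[dst] += math.exp(-distance / scale_km)
    return scores


def final_score(importance, distance_km, alpha=1.0, beta=0.01):
    """Linear combination: reward importance, penalise distance from the user."""
    return alpha * importance - beta * distance_km


positions = {
    "bully": (47.6, -122.3),   # a far-away global site
    "shop": (50.81, 8.77),     # a local site
    "a": (50.80, 8.76),
    "b": (50.82, 8.78),
}
links = [("a", "bully"), ("b", "bully"), ("a", "shop")]
scores = local_importance(links, positions)
```

A plain in-link count would rank `bully` (two links) above `shop` (one link); with distance damping the two long-distance links contribute almost nothing, so the local shop comes out ahead.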
15
Evaluation
Evaluating Web-IR is hard
Evaluating geo-Search is even harder
Mistakes are hard to find
16
Impact of geo-IR
Next generation Search Engine
Location-based service
  For cellphones under UMTS
Move traffic from A&E
  Local companies will get more traffic
Increase profits from AdWords
  Smallest businesses will advertise online
  Locally focused
The "leaflet industry" will shrink
17
Geographic Web-Mining
The web reflects human society.
  Distorted
  Delayed/Ahead
A lot of interesting social questions can be answered by looking at a large web crawl
You can save time and money compared to door-to-door surveys
  This is widely used
But: most of these questions are geographic in nature
18
Example Queries
Where in Germany are vintage sneakers a trend?
Is there a fashion authority that is accepted in all regions of Germany?
Do Britney and Madonna have the same audience?
Draw a map of Germany with all sites about vintage sneakers.
Find all fashion sites that get a minimum of 1000 equally distributed links.
Map the areas in Germany where there are significantly more sites for Britney than for Madonna.
Precise Semantics?
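As the slide notes, the precise semantics are open. One possible reading of "equally distributed links" is that a site's in-links are spread evenly over regions, which could be measured by normalised entropy. The sketch below takes that reading; the measure, the function names, and the 0.9 threshold are all assumptions.

```python
import math


def evenness(counts):
    """Normalised Shannon entropy of in-link counts per region (0 = one region, 1 = uniform)."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(len(counts))


def qualifies(counts, min_links=1000, min_evenness=0.9):
    """At least min_links in-links, spread nearly uniformly over the regions."""
    return sum(counts) >= min_links and evenness(counts) >= min_evenness


uniform = [250, 250, 250, 250]      # links per region, evenly spread
concentrated = [1000, 0, 0, 0]      # all links from one region
```

Under this reading, `uniform` qualifies while `concentrated` does not, even though both receive 1000 links.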
19
Current Work
Older prototype
  Metasearch on top of lycos.de
  Screen-scrape & re-order
  Whois only
  Did very well
20
Current Work
Current prototype for Geographic Search
  Limited to Germany = .de domains
  50,000,000 pages
  Expected online by late summer
In co-operation with
  Yen-Yu Chen
  Xiaohui Long
  Torsten Suel
  Polytechnic University, Brooklyn
21
Reinventing Web-IR
Nearly no (academic) work in geo-IR
Almost every aspect of Web-IR needs to be looked at again
  Interfaces
  Query processing
  Index distribution
  Link analysis
  User profile analysis
  Spam detection
Even:
  Other aspects of personalized search
  Changes in the web
22
Thank you
Any questions?