Geographic Web Information Retrieval. Alexander Markowetz, University of Marburg; Thomas Brinkhoff, FH Oldenburg; Bernhard Seeger, University of Marburg

Page 1: Geographic Web Information Retrieval

Geographic Web Information Retrieval

Alexander Markowetz, University of Marburg

Thomas Brinkhoff, FH Oldenburg

Bernhard Seeger, University of Marburg

Page 2: Geographic Web Information Retrieval

Current Situation In Web-IR

Everybody is online, but never seen

Page 3: Geographic Web Information Retrieval

Current Situation In Web-IR

Queries are too short

Result sets are too large

You can effectively block your competitors

Good results get buried

Smaller Results

Ways to drill the iceberg

Page 4: Geographic Web Information Retrieval

Solutions

Personalized Search

Dynamic/Interactive Search

Page 5: Geographic Web Information Retrieval

Geographic Web-IR

Location is the most personal property

"All business is local"

People already use the web geographically

"Yoga Brooklyn"

"Linux usergroup Frankfurt"

And get poor results

We are going to make that a lot better

Page 6: Geographic Web Information Retrieval

How-Not-To

Semantic Web

"If just everybody included geographic markup in their web pages"

Two problems

Chicken-and-egg

Malicious webmasters: metatags, anyone?

Bottom line

The Semantic Web is for "B2B" situations only.

Page 7: Geographic Web Information Retrieval

How-To

Modify traditional IR techniques to extract geographic markers: multigranular approach, extending basic Web-IR

Map pages to geographic positions: footprint, aggregate and cluster them

Build applications: geographic search, geographic Web mining

Page 8: Geographic Web Information Retrieval

Geocoding

Footprint: the geographic position of a web page

A set of points and polygons, associated with some amplitude
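
One way to hold such a footprint in memory can be sketched as follows. The slide only says "points and polygons, associated with some amplitude", so the class names, the field names, and the reading of amplitude as a per-region weight are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Region:
    """One component of a footprint (hypothetical layout)."""
    vertices: List[Point]  # one vertex = a point, three or more = a polygon
    amplitude: float       # assumed meaning: how strongly the page relates to the region

    @property
    def is_point(self) -> bool:
        return len(self.vertices) == 1


@dataclass
class Footprint:
    """All regions that a single web page maps to."""
    url: str
    regions: List[Region] = field(default_factory=list)

    def add(self, vertices, amplitude):
        self.regions.append(Region(list(vertices), amplitude))

    def total_amplitude(self) -> float:
        return sum(region.amplitude for region in self.regions)


fp = Footprint("http://example.de/")
fp.add([(50.81, 8.77)], 0.75)                               # a single point
fp.add([(50.0, 8.0), (50.0, 9.0), (51.0, 9.0)], 0.25)       # a triangle
```

Keeping points and polygons in one list makes it easy to aggregate or cluster footprints later, at the cost of checking `is_point` wherever geometry matters.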

Page 9: Geographic Web Information Retrieval

Preliminaries

Basic IR assumptions can easily be extended to "geographic IR":

Radius-1 hypothesis

Radius-2 hypothesis (co-citation)

Intra-site hypothesis: intra-subdomain, intra-directory

Page 10: Geographic Web Information Retrieval

Multigranularity

Information extraction on different levels:

Domain

Subdomain

Directory

File

Need to aggregate
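
The four levels and the aggregation step might look roughly like this. It is a minimal sketch, not the authors' method: the function names are hypothetical, markers are simple place-name strings, and the registered domain is assumed to be the last two host labels (which holds for plain .de names but not for suffixes such as .co.uk).

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

LEVELS = ("domain", "subdomain", "directory", "file")


def levels(url):
    """Return the key of a URL on each of the four levels."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: registered domain = last two labels of the host name.
    domain = ".".join(host.split(".")[-2:])
    path = parts.path or "/"
    directory = host + path.rsplit("/", 1)[0] + "/"
    return {
        "domain": domain,
        "subdomain": host,
        "directory": directory,
        "file": host + path,
    }


def aggregate(page_markers):
    """Sum the markers found on single pages up to every coarser level.

    page_markers maps a URL to the list of place names found on that page.
    """
    totals = {level: defaultdict(Counter) for level in LEVELS}
    for url, markers in page_markers.items():
        for level, key in levels(url).items():
            totals[level][key].update(markers)
    return totals


pages = {
    "http://www.uni-marburg.de/fb12/index.html": ["Marburg"],
    "http://www.uni-marburg.de/fb12/staff.html": ["Marburg", "Frankfurt"],
}
totals = aggregate(pages)
```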

[Figure: hierarchy of Dom, SDom, Dir, and File nodes]

Page 11: Geographic Web Information Retrieval

Sources

On all levels: names of places, zip codes, area codes

On site level: Whois, business directories

Links: density over a given area, Radius-1 and Radius-2
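
Extracting the "all levels" markers from page text can be sketched with a small gazetteer and two regular expressions. Everything here is illustrative: the three-entry gazetteer, the German five-digit postal code pattern, and the area-code heuristic (a leading 0 followed by a separator and more digits) are assumptions, and a real extractor would need far more care with ambiguity.

```python
import re

# Tiny illustrative gazetteer; a real one would list every known place name.
GAZETTEER = {"Marburg", "Frankfurt", "Oldenburg"}

# Five digits that are not the start of a phone number (heuristic).
ZIP_RE = re.compile(r"\b\d{5}\b(?![ /-]\d)")
# Leading 0, a few digits, then a separator and the subscriber number (heuristic).
AREA_RE = re.compile(r"\b0\d{2,5}(?=[ /-]\d)")
WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]+")


def extract_markers(text):
    """Return the geographic markers found in a piece of page text."""
    words = set(WORD_RE.findall(text))
    return {
        "places": sorted(words & GAZETTEER),
        "zip_codes": ZIP_RE.findall(text),
        "area_codes": AREA_RE.findall(text),
    }


markers = extract_markers("Besuchen Sie uns in 35037 Marburg, Tel. 06421/123456")
```

The negative lookahead on the postal code pattern keeps a five-digit area code such as 06421 from being counted as a zip code as well.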

Geospatial Mapping and Navigation of the Web, Kevin S. McCurley, 10th WWW, 2001

Computing Geographical Scopes of Web Resources, J. Ding, L. Gravano, and N. Shivakumar, VLDB 2000

[Figure: hierarchy of Dom, SDom, Dir, and File nodes]

Page 12: Geographic Web Information Retrieval

Geographic Search

A simple interface: not so exciting, but...

[Search form with fields Key Words, City, Street, State, Area code, and a SEARCH button]

Page 13: Geographic Web Information Retrieval

Dynamic Geographic-IR

Replacing the "next" button

Closer Continue Wider

Next Closer Wider

Next ½ mile 1 mile 2 miles 5 miles 10 miles 25 miles 100 miles
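
The "closer" and "wider" buttons can be thought of as moving along the list of radii shown above. A minimal sketch, assuming results arrive as (URL, distance in miles) pairs; the function names are hypothetical.

```python
# Radius steps taken from the slide, in miles.
RADII_MILES = [0.5, 1, 2, 5, 10, 25, 100]


def step(radius, direction):
    """Move one step closer (-1) or wider (+1), stopping at either end.

    Raises ValueError if radius is not one of the predefined steps.
    """
    index = RADII_MILES.index(radius) + direction
    index = min(max(index, 0), len(RADII_MILES) - 1)
    return RADII_MILES[index]


def within(results, radius):
    """Keep the results inside the radius, nearest first."""
    hits = [(url, miles) for url, miles in results if miles <= radius]
    return sorted(hits, key=lambda hit: hit[1])


results = [("a", 0.3), ("b", 4.0), ("c", 1.5)]
nearby = within(results, 2)
wider = within(results, step(2, +1))
```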

Page 14: Geographic Web Information Retrieval

Locality

Final ranking is a (linear) combination of importance and geographic distance.

Chances are Amazon will still rank first, no matter where you are

Amazon is a "global bully"

Idea: eliminate global bullies by computing importance differently

Give less weight to links that span a longer distance
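
Both ideas on this slide can be sketched in a few lines. The slide does not say how link weight should fall off with distance or what the coefficients of the linear combination are, so the exponential decay, the 50 km scale, and the `alpha` and `beta` parameters below are assumptions chosen only to make the example concrete.

```python
import math


def link_weight(distance_km, scale_km=50.0):
    """Weight of one link; longer links count less (assumed exponential decay)."""
    return math.exp(-distance_km / scale_km)


def local_importance(inlink_distances_km, scale_km=50.0):
    """Importance as the sum of distance-damped inlink weights."""
    return sum(link_weight(d, scale_km) for d in inlink_distances_km)


def final_score(importance, distance_km, alpha=1.0, beta=0.1):
    """Linear combination of importance and geographic distance."""
    return alpha * importance - beta * distance_km


# A "global bully": many inlinks, all from far away.
bully = local_importance([1000.0] * 100)
# A small local site: a handful of inlinks from the neighbourhood.
local = local_importance([2.0] * 5)
```

With these assumed numbers, the local site ends up with the higher importance although it has twenty times fewer inlinks, which is the effect the slide is after.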

Page 15: Geographic Web Information Retrieval

Evaluation

Evaluating Web-IR is hard

Evaluating geo-Search is even harder

Mistakes are hard to find

Page 16: Geographic Web Information Retrieval

Impact of geo-IR

Next-generation search engine

Location-based service: for cellphones under UMTS

Move traffic from A&E: local companies will get more traffic

Increase profits from AdWords: smallest businesses will advertise online, locally focused

The "leaflet industry" will shrink

Page 17: Geographic Web Information Retrieval

Geographic Web-Mining

The web reflects human society: distorted, delayed or ahead

A lot of interesting social questions can be answered by looking at a large web crawl

You can save time and money compared to door-to-door surveys

This is widely used

But most of these questions are geographic in nature

Page 18: Geographic Web Information Retrieval

Example Queries

Where in Germany are vintage sneakers a trend?

Is there a fashion authority that is accepted in all regions of Germany?

Do Britney and Madonna have the same audience?

Draw a map of Germany with all sites about vintage sneakers.

Find all fashion sites that get a minimum of 1,000 equally distributed links.

Map the areas in Germany where there are significantly more sites for Britney than for Madonna.

Precise Semantics?
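
As the slide asks, the semantics still have to be pinned down. One possible reading of "equally distributed links" is sketched here: measure how evenly a site's inlinks are spread over a fixed set of regions using normalised Shannon entropy. The choice of entropy, the 0.9 threshold, and the function names are assumptions, not a definition given in the talk.

```python
import math
from collections import Counter


def evenness(link_regions, regions):
    """Normalised Shannon entropy of inlinks over regions (1.0 = perfectly even)."""
    counts = Counter(link_regions)
    total = sum(counts.values())
    if total == 0 or len(regions) < 2:
        return 0.0
    entropy = -sum(c / total * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(regions))


def widely_accepted(link_regions, regions, min_links=1000, min_evenness=0.9):
    """True if a site has enough inlinks and they are spread evenly enough."""
    return (len(link_regions) >= min_links
            and evenness(link_regions, regions) >= min_evenness)


REGIONS = ["north", "south", "east", "west"]   # illustrative regions
even_links = [r for r in REGIONS for _ in range(250)]
skewed_links = ["north"] * 997 + ["south", "east", "west"]
```

Normalising by the number of all regions, rather than by the regions that happen to link, means a site known in only two regions cannot score as evenly distributed.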

Page 19: Geographic Web Information Retrieval

Current Work

Older prototype: metasearch on top of lycos.de

Screen-scrape & re-order

Whois only

Did very well
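
The re-ordering step of such a metasearch prototype might look like the sketch below. It is not the prototype's code: the Whois data is assumed to be already resolved into a dictionary from host name to coordinates, no search engine or Whois server is contacted, and hosts without a known location are simply placed last.

```python
import math
from urllib.parse import urlsplit


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def reorder(results, whois_location, user_location):
    """Re-order engine results by distance between user and site owner.

    results: URLs in the order returned by the underlying engine.
    whois_location: hypothetical dict mapping host name to (lat, lon).
    """
    def key(item):
        rank, url = item
        location = whois_location.get(urlsplit(url).hostname)
        distance = haversine_km(user_location, location) if location else float("inf")
        return (distance, rank)   # ties keep the engine's original order

    return [url for _, url in sorted(enumerate(results), key=key)]


MARBURG = (50.81, 8.77)
OLDENBURG = (53.14, 8.21)
whois = {"a.de": OLDENBURG, "b.de": MARBURG}
ordered = reorder(["http://a.de/", "http://b.de/", "http://c.de/"], whois, MARBURG)
```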

Page 20: Geographic Web Information Retrieval

Current Work

Current prototype for geographic search

Limited to Germany = .de domains

50,000,000 pages

Expected online by late summer

In cooperation with Yen-Yu Chen, Xiaohui Long, and Torsten Suel (Polytechnic University, Brooklyn)

Page 21: Geographic Web Information Retrieval

Reinventing Web-IR

Nearly no (academic) work in geo-IR

Almost every aspect of Web-IR needs to be looked at again:

Interfaces

Query processing

Index distribution

Link analysis

User profile analysis

Spam detection

Even: Other aspects of personalized search

Changes in the web

Page 22: Geographic Web Information Retrieval

Thank you

Any questions?