Deep Web Search - INFORMSnymetro.chapter.informs.org/prac_cor_pubs/GlendorNov15.pdf · Deep Web vs....

November 2006 © 2006 Glenbrook Networks, Inc.

Deep Web Search

Series A Financing

Julia Komissarchik
Co-Founder & VP Products
Glenbrook Networks

Deep Web vs. Surface Web

Surface Web - This is what search engines (e.g., Google) sift through

Deep Web
• Pages revealed only after actions on prior page(s)
• Rich in content
• Pages often are created on the fly, in response to a specific inquiry
• Content is often perishable
• More and more of the desired content is stored in the Deep Web

Challenge - To harvest desired data from the Surface Web and determine intelligently when and how to go to the Deep Web to collect more

• Franchise Locations
• Lawyers/CPAs/Doctors Listings
• Country/State/Federal Government Registrations
• Job Listings
• Office Locations
• Professional Memberships
• Events
• Blogs

Surface Web – 12 Billion Pages

Deep Web – 300+ Billion Pages

Deep Web Examples

● Travel
  • United Airlines
  • Expedia
● Job Search
  • URS
  • Express Script
● Store Locator
  • Wendy’s
  • H&R Block
● Katrina
  • Katrina Survivor – Connector List from “Gulf Coast News”

Problem

● Size: 10-30 times larger than the existing Surface Web covered by Google, Yahoo, MSN and others
● Pages are dynamically generated in response to a question entered into DHTML forms: no passwords, just appropriate questions
● Pages don’t have URLs: non-addressable space, no classic search engine link-based page ranking
● Designed for humans, not machines
● Long tail: millions of web sites with DHTML forms of arbitrary structure

Solution

● Vertical focus
  • Manageable size
  • Restricted semantics
  • Pragmatic knowledge
● Sophisticated AI techniques: penetration through forms
● Fact-based page ranking vs. link-based page ranking
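
To make "penetration through forms" concrete, here is a minimal sketch of only the mechanical first step: reading a page's form and building the request that asks it a question. It is not Glenbrook's technique and involves no AI; the page markup, the URL, and the names `FormFinder` and `build_query` are all invented for illustration. The request is built but never sent.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormFinder(HTMLParser):
    """Collect each <form>'s action, method, and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per <form> encountered
        self._current = None   # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag in ("input", "select") and self._current is not None:
            name = attrs.get("name")
            if name:
                # Keep the default value so hidden fields are sent back unchanged.
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def build_query(page_url, form, answers):
    """Build (but do not send) the request that submits `form` with `answers`."""
    fields = {**form["fields"], **answers}
    target = urljoin(page_url, form["action"])
    encoded = urlencode(fields)
    if form["method"] == "post":
        return Request(target, data=encoded.encode("ascii"), method="POST")
    return Request(f"{target}?{encoded}", method="GET")


# Hypothetical store-locator page; markup and URL are invented for illustration.
PAGE_URL = "https://example.com/locator/"
PAGE_HTML = """
<form action="search" method="post">
  <input type="text" name="zip">
  <input type="hidden" name="radius" value="5">
</form>
"""

finder = FormFinder()
finder.feed(PAGE_HTML)
request = build_query(PAGE_URL, finder.forms[0], {"zip": "94025"})
print(request.get_method(), request.full_url, request.data)
```

With a POST form the question travels in the request body, so the result page has no address of its own. The hard part, deciding which questions are appropriate for a form of arbitrary structure, is outside this sketch.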

Prediction - 3-5 year out impact

● Web
  • Deep Web will become pervasive
  • Most businesses will have an extensive online presence
  • No standardization in view for facts representation
● Factual Search will become prevailing
  • Expectations of answers to a question, not references to a bunch of documents to be read
  • Ability to ask questions like
    • Find a restaurant in a 5 mile radius that serves chicken enchilada under $5 and is open right now
    • Find a local plumber that has been in business for more than 10 years, has experience with sprinkler systems and accepts Visa
    • Find a business that is within 5 minutes walk of a bus stop for a line that is within 10 minutes of a given location
    • Find locations that have the highest increase of job openings in the IT industry
    • Who are the top 10 RBI players in the American League over the last month
    • What was the apartment rentals trend in San Francisco over the last 6 months
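
A factual question of this kind becomes a filter over structured records once the facts have been extracted. The sketch below answers the restaurant question under stated assumptions: the `Restaurant` record layout, the function names, and every data value are invented for illustration, and opening hours that run past midnight are not handled.

```python
import math
from dataclasses import dataclass
from datetime import time


@dataclass
class Restaurant:
    name: str
    lat: float
    lon: float
    menu: dict      # dish name -> price in dollars
    opens: time
    closes: time


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    earth_radius_miles = 3958.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))


def find_restaurants(facts, lat, lon, dish, max_price, radius_miles, now):
    """Return names of restaurants that satisfy every constraint in the question."""
    return [
        r.name
        for r in facts
        if miles_between(lat, lon, r.lat, r.lon) <= radius_miles
        and r.menu.get(dish, math.inf) < max_price   # missing dish never matches
        and r.opens <= now <= r.closes               # same-day hours only
    ]


# Invented records for illustration only.
FACTS = [
    Restaurant("Casa Uno", 37.45, -122.18, {"chicken enchilada": 4.50}, time(11), time(22)),
    Restaurant("Casa Dos", 37.46, -122.17, {"chicken enchilada": 7.25}, time(11), time(22)),
    Restaurant("Casa Tres", 37.45, -122.18, {"chicken enchilada": 4.75}, time(17), time(23)),
    Restaurant("Casa Lejos", 38.58, -121.49, {"chicken enchilada": 3.95}, time(9), time(23)),
]

answer = find_restaurants(FACTS, 37.45, -122.18, "chicken enchilada", 5.00, 5, time(12, 30))
print(answer)
```

The result is a list of answers, not a list of documents to read. Each constraint in the question (distance, dish, price, opening hours) maps to one field of the fact record.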

So – A Lot Of Business Opportunities

● Successful vertical applications
  • Business Info/Local Search
  • Events/Entertainment
  • Travel
  • Sports
  • Health
  • Job Search
● A new Google in the making: synthesis of multiple factual vertical Deep Web search engines into a massive horizontal search

Glenbrook Networks

Series A Financing

Data Collection and Fact Extraction Via Intelligent Web Trawling

Glenbrook Networks

● Glenbrook delivers data using intelligent patented search technology capable of extracting targeted information from the web with high precision, efficiency, and speed
● What makes us unique:
  • Glenbrook trawls the Deep Web, not just crawls the Surface Web
  • Glenbrook’s output is a structured factual data feed rather than a simple list of links to pages of potential interest
  • It is not template based (so it is highly scalable)

Application Examples: Glendor Local Search

Application Examples: Glendor Deep Web Job Search

Trend Analysis of Job Postings: Case Study

● A large Wall Street financial institution approached Glenbrook for help collecting in-depth information about public companies, in particular job postings
● Using the Glenbrook Deep Web trawler and Fact Extractor, data was collected biweekly directly from public companies’ websites
● The data feed was used by analysts to perform trend analysis that influenced their stock market recommendations
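
The case study does not describe how the analysts computed trends. As one plausible sketch, the percent change between consecutive snapshots of a company's posting count could be derived as below. The counts, dates, and the function names `percent_changes` and `overall_trend` are invented for illustration.

```python
from datetime import date

# Invented posting counts for one hypothetical company.
SNAPSHOTS = [
    (date(2006, 1, 4), 400),
    (date(2006, 1, 11), 420),
    (date(2006, 1, 18), 462),
    (date(2006, 1, 25), 440),
]


def percent_changes(snapshots):
    """Return (date, percent change vs. the previous snapshot), skipping the first."""
    ordered = sorted(snapshots)
    changes = []
    for (_, prev), (day, curr) in zip(ordered, ordered[1:]):
        if prev == 0:
            continue  # change from zero is undefined, so skip it
        changes.append((day, round(100 * (curr - prev) / prev, 1)))
    return changes


def overall_trend(snapshots):
    """Percent change from the earliest snapshot to the latest one."""
    ordered = sorted(snapshots)
    first, last = ordered[0][1], ordered[-1][1]
    return round(100 * (last - first) / first, 1)


for day, change in percent_changes(SNAPSHOTS):
    print(day.isoformat(), f"{change:+.1f}%")
print("overall:", f"{overall_trend(SNAPSHOTS):+.1f}%")
```

Sorting by date first means the feed can arrive in any order. A real analysis would likely also smooth the series and compare companies against each other.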

Public Companies Job Postings Trend Analysis (by Company)

[Chart: vertical axis 0 to 1800; horizontal axis weekly dates from 12/28/2005 to 5/31/2006. Series: QUALCOMM Incorporated, Altera Corporation, AmerisourceBergen Corporation, Target Corp., Dell Inc.]

Public Companies Job Postings Trend Analysis (by Company and by City)

[Chart: vertical axis 0 to 120; horizontal axis weekly dates from 12/28/2005 to 5/31/2006. Series: McKesson (Ontario, CA); QUALCOMM (India); QUALCOMM (Germany); URS Corporation (Baghdad, Iraq); Sun Microsystems (Burlington, MA); Dell Inc. (Cincinnati, OH)]

Thank you

Julia Komissarchik
julia@glenbrooknet.com
650 759 3959