November 2006 © 2006 Glenbrook Networks, Inc.
Deep Web Search
Series A Financing
Julia Komissarchik
Co-Founder & VP Products, Glenbrook Networks
Deep Web vs. Surface Web
Surface Web: what search engines (e.g., Google) sift through
Deep Web:
• Pages are revealed only after actions on prior page(s)
• Rich in content
• Pages are often created on the fly, in response to a specific inquiry
• Content is often perishable
• More and more of the desired content is stored in the Deep Web
Challenge: harvest the desired data from the Surface Web and determine intelligently when and how to go to the Deep Web to collect more
[Figure: Surface Web – 12 billion pages; Deep Web – 300+ billion pages. Deep Web content shown: franchise locations; lawyer, CPA, and doctor listings; country/state/federal government registrations; job listings; office locations; professional memberships; events; blogs.]
Deep Web Examples
• Travel
  • United Airlines
  • Expedia
• Job Search
  • URS
  • Express Scripts
• Store Locator
  • Wendy's
  • H&R Block
• Katrina
  • Katrina Survivor – Connector List from "Gulf Coast News"
Problem
• Size: 10-30 times larger than the Surface Web covered by Google, Yahoo, MSN, and others
• Pages are dynamically generated in response to a question entered into DHTML forms; no passwords are needed, just the appropriate questions
• Pages don't have URLs: a non-addressable space, so classic link-based page ranking does not apply
• Designed for humans, not machines
• Long tail: millions of websites with DHTML forms of arbitrary structure
Solution
• Vertical focus:
  • Manageable size
  • Restricted semantics
  • Pragmatic knowledge
• Sophisticated AI techniques for penetrating forms
• Fact-based page ranking instead of link-based page ranking
Prediction: Impact 3-5 Years Out
• Web
  • The Deep Web will become pervasive
  • Most businesses will have an extensive online presence
  • No standardization in view for fact representation
• Factual search will become prevalent
  • Users will expect answers to a question, not references to a bunch of documents to be read
  • Ability to ask questions like:
    • Find a restaurant within a 5-mile radius that serves chicken enchiladas under $5 and is open right now
    • Find a local plumber who has been in business for more than 10 years, has experience with sprinkler systems, and accepts Visa
    • Find a business within a 5-minute walk of a bus stop on a line that is within 10 minutes of a given location
    • Find locations with the highest increase in IT job openings
    • Who are the top 10 players by RBI in the American League over the last month?
    • What was the apartment rental trend in San Francisco over the last 6 months?
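To make the contrast between answers and references concrete, here is a minimal sketch of how the plumber question could be answered once facts have already been extracted into structured records. The `Business` record, its fields, the `find_plumbers` helper, and the sample businesses are all hypothetical illustrations, not Glenbrook's data model; the hard part, extracting these facts from Deep Web pages, is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Business:
    # Hypothetical structured record of facts extracted about one business.
    name: str
    category: str
    years_in_business: int
    services: set[str] = field(default_factory=set)
    payment_methods: set[str] = field(default_factory=set)


def find_plumbers(records, min_years, service, payment):
    """Answer the question by filtering on facts, returning names, not links."""
    return [
        r.name
        for r in records
        if r.category == "plumber"
        and r.years_in_business > min_years
        and service in r.services
        and payment in r.payment_methods
    ]


# Made-up sample data for illustration only.
records = [
    Business("A1 Plumbing", "plumber", 15, {"sprinkler systems", "water heaters"}, {"Visa", "cash"}),
    Business("Quick Pipes", "plumber", 4, {"sprinkler systems"}, {"Visa"}),
    Business("Old Town Plumbing", "plumber", 25, {"drain cleaning"}, {"Visa"}),
    Business("Joe's Diner", "restaurant", 30, set(), {"Visa"}),
]

print(find_plumbers(records, 10, "sprinkler systems", "Visa"))  # ['A1 Plumbing']
```

The query returns the answer directly; a document search engine would instead return pages for the user to read and check by hand.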
So – A Lot of Business Opportunities
• Successful vertical applications:
  • Business info/local search
  • Events/entertainment
  • Travel
  • Sports
  • Health
  • Job search
• A new Google in the making: synthesis of multiple factual vertical Deep Web search engines into a massive horizontal search engine
Glenbrook Networks
Series A Financing
Data Collection and Fact Extraction via Intelligent Web Trawling
Glenbrook Networks
• Glenbrook delivers data using patented, intelligent search technology capable of extracting targeted information from the web with high precision, efficiency, and speed
• What makes us unique:
  • Glenbrook trawls the Deep Web rather than merely crawling the Surface Web
  • Glenbrook's output is a structured factual data feed rather than a simple list of links to pages of potential interest
  • It is not template-based, so it is highly scalable
Application Examples: Glendor Local Search
Application Examples: Glendor Deep Web Job Search
Case Study: Trend Analysis of Job Postings
• A large Wall Street financial institution approached Glenbrook for help collecting in-depth information about public companies, in particular their job postings
• Using the Glenbrook Deep Web trawler and Fact Extractor, data was collected biweekly directly from public companies' websites
• Analysts used the data feed to perform trend analysis that influenced their stock market recommendations
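The case study does not say which trend measures the analysts used. As an illustration only, here is a minimal sketch of one common choice, a least-squares slope of posting counts over time (postings gained or lost per week), alongside simple percent change between the first and last snapshot. The function names, company names, and counts are made up, and the two-week spacing of the sample snapshots is an arbitrary choice.

```python
from datetime import date


def weekly_trend(snapshots):
    """Least-squares slope of posting counts, in postings per week.

    snapshots: (date, count) pairs covering at least two distinct dates.
    """
    snapshots = sorted(snapshots)  # order by date
    t0 = snapshots[0][0]
    xs = [(d - t0).days / 7 for d, _ in snapshots]  # weeks since first snapshot
    ys = [c for _, c in snapshots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def percent_change(snapshots):
    """Percent change from the earliest to the latest snapshot."""
    snapshots = sorted(snapshots)
    first, last = snapshots[0][1], snapshots[-1][1]
    return 100.0 * (last - first) / first


# Hypothetical job-posting counts, one snapshot every two weeks.
dates = [date(2006, 1, 4), date(2006, 1, 18), date(2006, 2, 1), date(2006, 2, 15)]
postings = {
    "Company A": list(zip(dates, [100, 110, 120, 130])),
    "Company B": list(zip(dates, [200, 190, 170, 160])),
}

# Rank companies by hiring momentum, strongest growth first.
for name, snaps in sorted(postings.items(), key=lambda kv: weekly_trend(kv[1]), reverse=True):
    print(f"{name}: {weekly_trend(snaps):+.1f}/week, {percent_change(snaps):+.1f}%")
# Company A: +5.0/week, +30.0%
# Company B: -7.0/week, -20.0%
```

The slope uses every snapshot, so it is less sensitive to one noisy week than percent change, which looks only at the endpoints.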
Public Companies Job Postings Trend Analysis (by Company)
[Chart: job postings per company (y-axis 0 to 1,800), x-axis dates 12/28/2005 to 5/31/2006 at weekly intervals. Series: QUALCOMM Incorporated, Altera Corporation, AmerisourceBergen Corporation, Target Corp., Dell Inc.]
Public Companies Job Postings Trend Analysis (by Company and by City)
[Chart: job postings per company and city (y-axis 0 to 120), x-axis dates 12/28/2005 to 5/31/2006 at weekly intervals. Series: McKesson, Ontario, CA; QUALCOMM, India; QUALCOMM, Germany; URS Corporation, Baghdad, Iraq; Sun Microsystems, Burlington, MA; Dell Inc., Cincinnati, OH.]