Web Search Module 6 INST 734 Doug Oard


Page 1: Web Search

Web Search

Module 6

INST 734

Doug Oard

Page 2: Web Search

Agenda

• The Web

• Crawling

• Web search

Page 3: Web Search

Washington Post, February 10, 2011

Page 4: Web Search

“The Web”

[Diagram labels: HTML (data/display); HTTP (transfer); URL (e.g., http://www.foo.org/snarf.html); Web Server; File System; Internet communication protocols (RTSP, FTP, Email).]
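The example URL on this slide can be pulled apart to show which piece names the transfer protocol, which names the server, and which names the file. This is a minimal sketch using Python's standard `urllib.parse`; the helper name `describe_url` and the dictionary keys are illustrative assumptions, not part of the original slides.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts named on the slide (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # how to transfer the page (HTTP)
        "host": parts.hostname,     # which Web server to contact
        "path": parts.path,         # which file the server should return
    }


info = describe_url("http://www.foo.org/snarf.html")
for name, value in info.items():
    print(f"{name}: {value}")
```

No network request is made here; the sketch only inspects the text of the URL.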

Page 5: Web Search

Internet ≠ Web

• Internet: collection of global networks

• Web: way of managing information exchange

• There are many other uses for the Internet
  – File transfer (FTP)
  – Email (SMTP, POP, IMAP)

Page 6: Web Search

What “Caused” the Web?

• Affordable storage
  – 300,000 (typed) words/$ by 1995

• Adequate network capacity
  – 25,000 simultaneous transfers by 1995
  – 1 second/screen (of text) by 1995

• Display capability
  – 10% of US population could see images by 1995

• Effective search capabilities
  – Lycos and Yahoo! achieved useful scale in 1994-1995

Page 7: Web Search

Internet Hosts

[Chart: number of Internet hosts, Jan-81 to Jan-11; y-axis from 0 to 1,000,000,000.]

Page 8: Web Search

Internet Users

[Chart: Internet users as a portion of the global population, Jan-94 to Jan-12; y-axis from 0% to 35%. Source: http://www.internetworldstats.com/]

Page 9: Web Search

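[Chart: shares by language. Legend: English, Chinese, Spanish, Japanese, Portuguese, German, Arabic, French, Russian, Korean. Slice labels as extracted, first series: 64%, 5%, 4%, 5%, 2%, 8%, 2%, 4%, 5%, 0%; second series: 33%, 28%, 9%, 6%, 5%, 5%, 4%, 4%, 4%, 2%.]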

Page 10: Web Search

Billions of Queries per Month

Google; 114.7

Baidu; 14.5

Bing+Yahoo; 13.1

Yandex; 4.8

Other; 28.7

December, 2012
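The monthly query counts above can be turned into market shares with a little arithmetic. The sketch below assumes the five figures on this slide cover the whole market for that month; the variable names are illustrative.

```python
# Billions of queries per month, December 2012 (figures from the slide).
queries = {
    "Google": 114.7,
    "Baidu": 14.5,
    "Bing+Yahoo": 13.1,
    "Yandex": 4.8,
    "Other": 28.7,
}

# Total across all listed engines, then each engine's fraction of that total.
total = sum(queries.values())
shares = {name: count / total for name, count in queries.items()}

# Print the engines from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<12}{share:6.1%}")
```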

Page 11: Web Search

An Adoption Curve: Blogs

[Chart: weblogs tracked per month, Mar-03 to Oct-05; y-axis from 0 to 20,000,000; successive doublings marked.]

• 18.9 Million Weblogs Tracked
• Doubling in size approx. every 5 months
• Consistent doubling over the last 36 months
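To get a feel for what "doubling every 5 months for 36 months" implies, the growth factor can be worked out directly. This is a rough sketch that assumes a perfectly steady doubling period, which the slide only states approximately; the function name `growth_factor` is hypothetical.

```python
def growth_factor(months, doubling_months):
    """Factor by which a quantity grows if it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


TRACKED = 18_900_000  # weblogs tracked, from the slide

# 36 months of growth with a 5-month doubling period.
factor = growth_factor(36, 5)

# Count that would have been needed 36 months earlier to reach TRACKED.
implied_start = TRACKED / factor

print(f"{factor:.0f}x growth; implied starting count about {implied_start:,.0f}")
```

Under that assumption the population grows by a factor of 2^(36/5), a bit more than seven doublings.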

Page 12: Web Search

November 2013
http://www.searchenginejournal.com/growth-social-media-2-0-infographic/77055/

Page 13: Web Search

Agenda

• The Web

• Crawling

• Web search