Search Engines in eCommerce Web-Based Information Architectures MSEC 20-760 Mini II Jaime Carbonell

General Topic: Applying IR to eCommerce

• High-level review of homework 1 and 2

• The search-engine business

• Getting search engines to work for you

• Some web-site design principles

• Other IR-related eCommerce business ideas

Building a Search Engine (1)

Assemble the Collection

• Acquire a document data base

• Or, spider the Web to collect the DB

• Or, spider your own site/company

Building a Search Engine (2)

Index the Collection (HW1)

• Build a dictionary from collection C

Find all unique words & optionally stem them

Filter out stop words

Optionally generate phrases as words

Σ is the resulting word list

• For each w_i in Σ

Calculate & store IDF(w_i), using log base 2

Find all D_j where w_i occurs

Store ID(D_j) and the positions of w_i in D_j
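A minimal Python sketch of this indexing step, under stated assumptions: the collection is a small in-memory dict, tokens are lower-cased alphanumeric runs, stemming and phrase generation are skipped, and IDF is taken as log2(N / df(w)), the standard form with the slide's base 2. `build_index`, `STOP_WORDS`, and the sample documents are illustrative, not the homework's actual specification.

```python
import math
import re
from collections import defaultdict

# Tiny illustrative stop list (an assumption, not the course's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def tokenize(text):
    """Lower-case and split on non-alphanumerics. Stemming and phrases are omitted."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(collection):
    """Index a collection C given as {doc_id: text}.

    Returns (idf, postings):
      postings[w] = {doc_id: [positions of w in that document]}
      idf[w]      = log2(N / df(w)), df(w) = number of documents containing w
    Positions count every token, stop words included.
    """
    postings = defaultdict(lambda: defaultdict(list))
    for doc_id, text in collection.items():
        for pos, tok in enumerate(tokenize(text)):
            if tok not in STOP_WORDS:  # filter out stop words
                postings[tok][doc_id].append(pos)
    n_docs = len(collection)
    idf = {w: math.log2(n_docs / len(docs)) for w, docs in postings.items()}
    # Plain dicts, so looking up an unknown word does not create an entry.
    return idf, {w: dict(docs) for w, docs in postings.items()}


docs = {
    "d1": "antique cars for auction",
    "d2": "vintage cars and classic cars",
    "d3": "the auction catalog",
}
idf, postings = build_index(docs)
print(postings["cars"])  # {'d1': [1], 'd2': [1, 4]}
```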

Building a Search Engine (3)

Match Queries to Collection (HW2)

• Filter out query words not in Σ

• Compute $\operatorname{argmax}^{k}_{D_j \in C}\left[\mathrm{Sim}(Q, D_j)\right]$, i.e. the k documents in C most similar to Q

Use dot-product or cosine similarity

Use inverted index for computation
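The matching step can be sketched as below, assuming tf·idf term weights (with log2 IDF) and cosine similarity; the function names and the sample collection are hypothetical. Scores are accumulated only over the postings of query terms, which is what the inverted index buys: documents sharing no term with Q are never touched.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_weighted_index(collection):
    """Inverted index term -> {doc_id: tf * idf}, plus each document's vector length."""
    tf = {d: Counter(tokenize(text)) for d, text in collection.items()}
    df = Counter(w for counts in tf.values() for w in counts)
    idf = {w: math.log2(len(collection) / c) for w, c in df.items()}
    index, norms = defaultdict(dict), {}
    for d, counts in tf.items():
        for w, c in counts.items():
            index[w][d] = c * idf[w]
        norms[d] = math.sqrt(sum(index[w][d] ** 2 for w in counts))
    return index, idf, norms


def top_k(query, index, idf, norms, k=3):
    """Return the k documents with the highest cosine(Q, D_j)."""
    # Filter out query words not in the dictionary (Sigma).
    q = Counter(w for w in tokenize(query) if w in idf)
    q_weights = {w: c * idf[w] for w, c in q.items()}
    q_norm = math.sqrt(sum(v * v for v in q_weights.values()))
    dots = defaultdict(float)
    for w, qw in q_weights.items():
        for d, dw in index[w].items():  # walk only this term's postings
            dots[d] += qw * dw
    # dot > 0 guarantees both norms are non-zero, so the division is safe.
    scored = [(dot / (q_norm * norms[d]), d) for d, dot in dots.items() if dot > 0]
    return [d for _, d in heapq.nlargest(k, scored)]


docs = {"d1": "antique cars auction", "d2": "vintage cars", "d3": "auction catalog"}
index, idf, norms = build_weighted_index(docs)
print(top_k("antique auction", index, idf, norms, k=2))  # ['d1', 'd3']
```

Replacing the final division with the raw dot product gives the dot-product variant the slide also mentions.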

The Search Engine Business (1)

Services Provided

• Locating (most) useful web pages

• Two-step process: "Query & Find"

Then click-through based on summary

The Search Engine Business (2)

Revenue Model

• Maximizing traffic => advertisements, etc.

Lycos, Google, AltaVista, Excite, Metacrawler...

• Installing intranet searching for a fee, or providing search technology to others

Inktomi, Verity, Google, Condor...

• Boosting the glory/value of the parent corporation

Infoseek => Disney

The Search Engine Business (3)

Hybrid Models

• Universal locators (people, locations, ...)

Metacrawler/GO2Net, Lycos...

• Hierarchical content-based browser

Yahoo first, later Lycos & others...

• Together with news, stock quotes, chat rooms, ...

Yahoo the clear leader, now many others...

New Technologies (1)

Better Search Technologies

• Metasearch (combine output of multiple engines)

e.g. Metacrawler, Vivisimo

• Marrying IR with hand-built taxonomies

e.g. Yahoo originally, later most others
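One simple way to combine the output of multiple engines is rank fusion. The sketch below uses a Borda count, a standard fusion rule chosen here for illustration; the slide does not say how Metacrawler or Vivisimo actually merge results, and the engine result lists are hypothetical.

```python
from collections import defaultdict


def borda_fuse(result_lists, k=None):
    """Merge ranked result lists from several engines with a Borda count.

    An engine returning n results gives its r-th result (r = 0, 1, ...) n - r
    points; results it did not return get nothing from it. Ties break by URL.
    """
    points = defaultdict(int)
    for results in result_lists:
        n = len(results)
        for rank, url in enumerate(results):
            points[url] += n - rank
    fused = sorted(points, key=lambda url: (-points[url], url))
    return fused if k is None else fused[:k]


engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "x.example", "w.example"]
engine_c = ["y.example", "z.example"]
print(borda_fuse([engine_a, engine_b, engine_c]))
# ['y.example', 'x.example', 'z.example', 'w.example']
```

Borda only needs ranks, which suits metasearch: engines rarely expose comparable raw scores.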

New Technologies (2)

Better Search Technologies

• Ranking web sites by in-link density

e.g. Google

Authorities = high in-link degree

Hubs = high out-link degree

Rank = $\operatorname{argmax}^{k}_{d_j \in D_{rel}}\left[\sum_i \log_i\left(\mathrm{inlink}_i(d_j)\right) a_i\right]$

• Marrying IR with Translation

e.g. AltaVista/Babel Fish, Google, …
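A rough sketch of degree-based link analysis, under stated assumptions. Authorities and hubs are read off as the pages with the highest in-degree and out-degree, as the slide defines them. For the ranking formula, the exact logarithm on the slide is not recoverable from this copy, so the sketch uses log2(1 + inlink_i(d_j)) to avoid log(0); the link classes i (here "external" and "internal") and the weights a_i are hypothetical choices.

```python
import math
from collections import Counter, defaultdict


def link_degrees(links):
    """links: iterable of (source, target, link_class). Returns in/out-degree Counters."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst, _ in links:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


def rank_by_inlinks(relevant, links, weights, k=3):
    """Top-k pages of D_rel by sum_i a_i * log2(1 + inlink_i(d)).

    inlink_i(d) = number of links of class i pointing at d;
    weights maps each class i to its weight a_i.
    """
    inlink = defaultdict(Counter)  # page -> {class: count}
    for _, dst, cls in links:
        inlink[dst][cls] += 1

    def score(page):
        return sum(a * math.log2(1 + inlink[page][i]) for i, a in weights.items())

    return sorted(relevant, key=lambda page: (-score(page), page))[:k]


links = [
    ("h", "p1", "external"), ("h", "p2", "external"), ("h", "p3", "external"),
    ("x", "p1", "external"), ("y", "p1", "internal"), ("z", "p2", "internal"),
]
in_deg, out_deg = link_degrees(links)
print(in_deg.most_common(1), out_deg.most_common(1))  # [('p1', 3)] [('h', 3)]
print(rank_by_inlinks(["p3", "p1", "p2"], links, {"external": 1.0, "internal": 0.5}))
# ['p1', 'p2', 'p3']
```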

New Technologies (3)

Better Mousetraps on the Drawing Board

• True Web-Based Translingual IR

• High-powered, more accurate search for a fee (MMR, probabilistic IR search, quality filters, ...)

• WebSearch + Summarization & Fusion

• Multimedia search for a fee

• Automatically generated Yahoo-like hierarchies

• Search part of the hidden web (distributed IR)

New Technologies (4)

Better Mousetraps on the Drawing Board

• More comprehensive Web crawlers

AltaVista indexes < 30% of the web

Google indexes 2.0 billion URLs, < 50% of the web

All others index much less...

• Generate answers to questions (not just ‘hits’)

[AskJeeves.com does not work well]

FAQs, helpdesks, networking to humans, ...

Optimizing WebSites for Searching (1)

Objectives

• Want your eCommerce site found easily by all potential customers

• Want your site to rank above the competition in web searches

• Want customers to stay within your eCommerce web site, once they find it

Optimizing WebSites for Searching (2)

Content Strategy

1. Build your first-pass web site

2. Generate an alphabetized union of the terms in your web site and in those of your primary competitors.

e.g. "...amazing" "antelope" "antiques" "auction" ... "catalog" "cars" ...
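Step 2 is mechanical once the page texts are in hand. A minimal sketch, assuming each site is available as a list of page-text strings (fetching and HTML stripping are left out) and that terms are lower-cased alphabetic words; the function names and sample texts are illustrative.

```python
import re


def site_terms(pages):
    """Set of lower-cased alphabetic terms across a site's page texts."""
    terms = set()
    for text in pages:
        terms.update(re.findall(r"[a-z]+", text.lower()))
    return terms


def alphabetized_union(own_pages, competitor_sites):
    """Sorted union of the terms on your site and on each competitor's site."""
    union = site_terms(own_pages)
    for pages in competitor_sites:
        union |= site_terms(pages)
    return sorted(union)


mine = ["Antique cars"]
competitors = [["Auction catalog"], ["Amazing antelope"]]
print(alphabetized_union(mine, competitors))
# ['amazing', 'antelope', 'antique', 'auction', 'cars', 'catalog']
```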

Optimizing WebSites for Searching (3)

Content Strategy

3. Filter out all terms not directly relevant to your business, e.g. "auction" "antiques" "catalog"...

4. Expand the filtered list with synonyms or highly related terms (the dual of query expansion)

e.g. "antique" => "antique, vintage, classic"

5. Where to put such terms? Edit your site to include the terms that fit naturally. For others…
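Steps 3 and 4 can be sketched as a filter followed by a thesaurus lookup. The relevance judgment in step 3 is a human decision, so the sketch takes it as an input set; the `synonyms` table stands in for a hand-built thesaurus and is hypothetical.

```python
def filter_and_expand(terms, relevant, synonyms):
    """Keep business-relevant terms (step 3), then add related terms (step 4).

    relevant: terms a person has judged relevant to the business.
    synonyms: term -> list of synonyms or highly related terms.
    Order is preserved and duplicates are dropped.
    """
    expanded = []
    for term in terms:
        if term not in relevant:
            continue
        for t in [term] + synonyms.get(term, []):
            if t not in expanded:
                expanded.append(t)
    return expanded


terms = ["amazing", "antelope", "antique", "auction", "cars", "catalog"]
relevant = {"antique", "auction", "catalog"}
synonyms = {"antique": ["vintage", "classic"]}  # hypothetical thesaurus entry
print(filter_and_expand(terms, relevant, synonyms))
# ['antique', 'vintage', 'classic', 'auction', 'catalog']
```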

Optimizing WebSites for Searching (4)

Content Strategy

6. Include the rest of the terms "invisibly"

– Meta-tags for indexing

– Minuscule font for word lists (illegible text appears as a background pattern)

– Text color = background color

– Minimize all extraneous text on portal page(s) (e.g. move text to other linked pages).

Optimizing WebSites for Searching Part II (1)

Find Key Competition

1. Complete first-pass web site (last slides)

2. Register with all search engines

3. Contract 20-to-50 potential "clients"

Optimizing WebSites for Searching Part II (2)

Find Key Competition

4. Have clients generate multiple queries for your eProduct or eService without knowing what’s in your web site. Try these queries on multiple search engines (except authority- and frequency-biased ones like Google)

5. Find web sites that consistently rank higher than yours (if any) via one or more engines
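Steps 4 and 5 amount to tallying, across every (engine, query) pair the test clients produced, which sites show up above yours. A minimal sketch, assuming the result lists have already been collected (no real search-engine API is called) and reading "consistently" as at least half of the lists, an arbitrary threshold; all site names are hypothetical.

```python
from collections import Counter


def consistent_outrankers(results, my_site, min_fraction=0.5):
    """Sites that rank above my_site in at least min_fraction of result lists.

    results maps (engine, query) -> ranked list of sites. A site "wins" a list
    if it appears before my_site, or anywhere in the list when my_site is absent.
    """
    wins = Counter()
    for ranking in results.values():
        cutoff = ranking.index(my_site) if my_site in ranking else len(ranking)
        wins.update(ranking[:cutoff])
    return sorted(site for site, n in wins.items() if n / len(results) >= min_fraction)


results = {
    ("engine1", "antique auction"): ["rival.example", "mysite.example", "other.example"],
    ("engine1", "vintage catalog"): ["mysite.example", "rival.example"],
    ("engine2", "antique auction"): ["rival.example", "third.example"],
    ("engine2", "vintage catalog"): ["third.example", "rival.example", "mysite.example"],
}
print(consistent_outrankers(results, "mysite.example"))
# ['rival.example', 'third.example']
```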

Optimizing WebSites for Searching Part II (3)

Analyze Key Competition

6. Find terms in competitors' web sites that match spontaneous queries (looking carefully at meta-tags, invisible fonts, etc.)

7. Add such terms to your web pages invisibly

8. Optionally remove more extraneous text from portal page(s)

9. Re-register with search engines, and iterate until your web site is near the top for most of the reasonable queries in most of the engines.

Optimizing WebSites for Searching Part II (4)

OPTIMIZE Your Site for Search engines

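10. Remove as much non-keyword text as possible (e.g. put it in linked pages, or render it as .gif images). Recall the denominator in the cosine-similarity function: every extra non-query term lengthens the page vector and lowers its similarity to the query.

11. Subdivide general entry pages into topically specific ones (to increase information density with respect to the query).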
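A small worked example of the denominator effect behind steps 10 and 11, using raw term-frequency vectors without IDF for readability (an assumption; an IDF-weighted cosine moves in the same direction). Padding a page with terms that are not in the query leaves the dot product unchanged but lengthens the page vector, so the cosine drops.

```python
import math
from collections import Counter


def cosine(query, page):
    """Cosine similarity of raw term-frequency vectors (no IDF, for readability)."""
    q, d = Counter(query.split()), Counter(page.split())
    dot = sum(q[w] * d[w] for w in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm_q * norm_d) if dot else 0.0


query = "antique auction"
focused = "antique auction antique catalog"
# The same page padded with nine extra non-query words.
padded = focused + " welcome to our home page news about us contact"
print(round(cosine(query, focused), 3), round(cosine(query, padded), 3))  # 0.866 0.548
```

Here the dot product is 3 in both cases, while the page norm grows from √6 to √15, which is exactly the effect step 10 exploits and step 11 achieves by splitting general pages.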

Optimizing WebSites for Searching Part III (1)

Connectivity Strategy

• Make your term-laden pages attractive entry portals

• Link these search-engine entry pages strongly to the home/entry page(s), if these are different

• Provide intra-site searching capability if your site has > 30 pages, where only on-site or associated text and pages are searched.

Optimizing WebSites for Searching Part III (2)

Connectivity Strategy

• Possibly hand off to a general search engine when local search fails.

• Maximize the number of in-page links to entry portals from anywhere and everywhere else (internal and external).
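Intra-site search with a hand-off on failure can be sketched as follows. The matching rule (all query words present), the `min_hits` parameter, and the hand-off URL on the reserved example.com domain are placeholders, not any real engine's interface.

```python
from urllib.parse import quote_plus

# Placeholder for whatever general engine the site hands off to (an assumption).
FALLBACK_SEARCH_URL = "https://search.example.com/?q="


def site_search(query, pages, min_hits=1):
    """Search only on-site pages; if too few match, hand off to a general engine.

    pages: dict url -> page text. Returns ("local", [urls]) or ("fallback", url).
    """
    terms = query.lower().split()
    hits = [url for url, text in pages.items()
            if all(t in text.lower().split() for t in terms)]
    if len(hits) >= min_hits:
        return "local", sorted(hits)
    return "fallback", FALLBACK_SEARCH_URL + quote_plus(query)


pages = {
    "/antiques": "Antique furniture and antique cars",
    "/auctions": "Weekly auction catalog",
}
print(site_search("antique cars", pages))  # ('local', ['/antiques'])
print(site_search("zebra rugs", pages))    # ('fallback', 'https://search.example.com/?q=zebra+rugs')
```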

IR-Related eCommerce Business Ideas (1)

eCLIP: Adaptable Electronic Clipping Service

• Goal: Personalized eNewspaper (weekly, daily, hourly)

• User sets interest profile

YES: "finance>eCommerce>technology" "science>astronomy"

NO: "sports" "politics>scandals"

KEY-TERMS: "ecommerce" "search engine" "IPO" "Hubble"

IR-Related eCommerce Business Ideas (2)

• Multiple newsfeeds are categorized on entry ...and filtered by user profiles

• Maximally relevant & novel news is included

Next most relevant or less novel news is summarized

The rest is ignored.

• User feedback automatically adjusts the profile (e.g. thumbs-down on more Amazon.com news, thumbs-up on Google, a new search engine)

• Revenue models: subscription, advertisement, ...
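The eCLIP profile, triage, and feedback loop can be sketched as below. Several pieces are assumptions the slides leave open: a NO category vetoes before YES or key terms are checked; key terms match by a crude substring test; relevance and novelty are already numbers in [0, 1] and are combined as a product with arbitrary thresholds; and profile weights keyed by term, with a fixed step, are a hypothetical representation.

```python
def under(category, path):
    """True if category equals path or lies beneath it in the ">" hierarchy."""
    return category == path or category.startswith(path + ">")


def matches_profile(article, profile):
    """A NO category vetoes; otherwise a YES category or any key term in the text admits."""
    category, text = article["category"], article["text"].lower()
    if any(under(category, p) for p in profile["no"]):
        return False
    if any(under(category, p) for p in profile["yes"]):
        return True
    return any(term.lower() in text for term in profile["key_terms"])


def triage(items, include_at=0.7, summarize_at=0.4):
    """Split items into include / summarize / ignore by relevance * novelty."""
    out = {"include": [], "summarize": [], "ignore": []}
    for item in items:
        score = item["relevance"] * item["novelty"]
        if score >= include_at:
            out["include"].append(item["id"])
        elif score >= summarize_at:
            out["summarize"].append(item["id"])
        else:
            out["ignore"].append(item["id"])
    return out


def apply_feedback(weights, term, thumbs_up, step=0.2):
    """Move a profile term's weight (default 0.5) up or down, clipped to [0, 1]."""
    delta = step if thumbs_up else -step
    weights[term] = min(1.0, max(0.0, weights.get(term, 0.5) + delta))
    return weights


profile = {
    "yes": ["finance>eCommerce>technology", "science>astronomy"],
    "no": ["sports", "politics>scandals"],
    "key_terms": ["ecommerce", "search engine", "IPO", "Hubble"],
}
print(matches_profile({"category": "science>astronomy>telescopes", "text": "New mirror"}, profile))  # True
print(matches_profile({"category": "sports>baseball", "text": "Hubble fan wins"}, profile))  # False
```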

IR-Related eCommerce Business Ideas Part II (1)

ePUB: Customized Publishing

• Goal: Offer customized books (texts, trade, etc.)

• Index all offerings by chapter & section

• Permit user to search & browse

(using MMR, summarization, etc.)
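MMR (Maximal Marginal Relevance) picks results that are relevant to the query but not redundant with what has already been selected: at each step it chooses the unselected D_i maximizing λ·Sim1(D_i, Q) − (1 − λ)·max over selected D_j of Sim2(D_i, D_j). The sketch below uses word-set Jaccard overlap for both similarities and λ = 0.7 by default; both are illustrative assumptions, and the candidate passages are made up.

```python
def jaccard(a, b):
    """Word-set overlap, used here as a stand-in similarity measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mmr(query, candidates, k, lam=0.7, sim=jaccard):
    """Greedy Maximal Marginal Relevance selection of up to k candidates."""
    selected, remaining = [], list(candidates)

    def marginal_relevance(d):
        redundancy = max((sim(d, s) for s in selected), default=0.0)
        return lam * sim(d, query) - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal_relevance)
        selected.append(best)
        remaining.remove(best)
    return selected


c1 = "search engine ranking basics"
c2 = "search engine ranking basics explained"  # near-duplicate of c1
c3 = "ranking web pages by links"
print(mmr("search engine ranking", [c1, c2, c3], k=2, lam=0.5))
# ['search engine ranking basics', 'ranking web pages by links']
```

With lam=1.0 the redundancy term vanishes and the near-duplicate c2 is chosen second; lowering λ trades relevance for novelty.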

IR-Related eCommerce Business Ideas Part II (2)

ePUB: Customized Publishing

• Assemble a customized bundle for the user

(e.g. Ch 3-7 of "Intro to IR" + Ch 5-6 of "Web IR" + Ch 2 of "Applied Linear Algebra")

• Print, bind and ship 50+ copies, or ship a single copy electronically (e.g. via PDF)

IR-Related eCommerce Business Ideas Part II (3)

eFACT: Universal Q/A Database

• Goal: Answer any question over the web

• Create a large FAQ incrementally, categorized by subject areas

• Have humans answer questions over the web. Pay for answers with a free subscription? $$?

• If a new question matches, give the answer; else send it to humans and resort to metasearch for relevant web pages (not an answer, but the best one can do for now), and email the answer later.

• Essentially do AskJeeves the right way
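The match-or-route rule can be sketched as below. Word-set Jaccard similarity and the 0.5 threshold are arbitrary assumptions, and the FAQ entry is made-up sample data; on a miss the sketch only returns a marker, leaving the human queue and the metasearch to the caller.

```python
import re


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_or_route(question, faq, threshold=0.5):
    """Answer from the FAQ if a stored question is similar enough, else route it.

    faq maps stored questions to answers. On a miss the caller should queue the
    question for human experts and run a metasearch in the meantime.
    """
    q = words(question)
    best, best_sim = None, 0.0
    for stored in faq:
        s = words(stored)
        sim = len(q & s) / len(q | s) if q | s else 0.0
        if sim > best_sim:
            best, best_sim = stored, sim
    if best is not None and best_sim >= threshold:
        return "answer", faq[best]
    return "route", question


faq = {"How do I return an item?": "Use the returns form on your order page."}
print(answer_or_route("how do I return an item", faq)[0])   # answer
print(answer_or_route("What does shipping cost?", faq)[0])  # route
```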

IR-Related eCommerce Business Ideas Part III (1)

iSELL: Meta Auction eSite

• Goal: The Metacrawler of Web Auction Sites

• User describes the product she wants to sell

• iSELL finds the best-matching auction site(s) that sell(s) such products (similarity between the description and auction offerings, past and present)

• ...or the auction site that gets the best prices

• iSELL’s metaform automatically connects and lists the product in one or several auction sites, and de-lists it when sold.

• iSELL gets a cut of the selling price + ad revenues
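The matching step, similarity between a seller's description and each site's past and present offerings, can be sketched with term-frequency cosine similarity. Scoring a site by its single best-matching listing (rather than an average) is an assumption, and the site names and listings are hypothetical.

```python
import math
import re
from collections import Counter


def vector(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_auction_sites(description, site_listings, k=1):
    """Rank auction sites by the best cosine match between the description
    and any of the site's past or present listings."""
    d = vector(description)
    scores = {
        site: max((cosine(d, vector(item)) for item in listings), default=0.0)
        for site, listings in site_listings.items()
    }
    return sorted(scores, key=lambda site: (-scores[site], site))[:k]


site_listings = {
    "auction-a": ["vintage guitar amplifier", "used drum kit"],
    "auction-b": ["antique oak desk", "victorian oak chair"],
}
print(best_auction_sites("antique oak writing desk", site_listings))  # ['auction-b']
```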

IR-Related eCommerce Business Ideas Part III (2)

WebRATE: Rating Service for eSites

• Goal: Nielsen's, Consumers Union (CU), or U.S. News & World Report (USN&WR) of the Web

• Find similarity to other sites, ...

• Sites pay to be rated by content, style, traffic, etc.