Toward Next Generation Search: Business, Product, Science, Infrastructure, and Talent

William I. Chang
Chief Scientist, Baidu.com
wchang @ baidu.com

History

Outline

• Synopsis of Web search evolution

• Themes and principles

• Challenges and opportunities

• Possible steps toward the next generation of search

First Generation: ~1996-2000

Three Laws of Search (Infoseek, ~1996-98):
1. Phrases are more basic than single words.
2. Confidence in the source is more important than the content.
3. “Facet” is more powerful than the relational model.

Fundamental Theorem of Search: the search space can be factored into dimensions, each of which is a taxonomy.

• NLP
– Billions of terms, proper names
– Lexical analysis and special cases: capitalization, contractions, acronyms, possessives, etc.
– Word stemming, phrase stemming
– Phrase extraction and query rewriting, e.g., “home run record”

• Leveraging user input and community recommendation
– Query suggestion by log mining
– Selection and ranking using link analysis and anchor-text indexing

• Birth of Adversarial Information Retrieval (anti-spam)
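The first law and the “home run record” example can be illustrated with a minimal phrase-extraction sketch. It assumes a small hypothetical phrase lexicon (`PHRASES`) and uses greedy longest-match segmentation, a common simple technique; the slides do not say how Infoseek actually extracted phrases or rewrote queries. Recognized multi-word units are quoted so the engine can match them as phrases.

```python
# Hypothetical phrase lexicon; a real engine would mine phrases from query logs or corpora.
PHRASES = {("home", "run"), ("world", "series"), ("new", "york", "yankees")}
MAX_PHRASE_LEN = max(len(p) for p in PHRASES)


def extract_phrases(query: str) -> list[str]:
    """Segment a query into known phrases and single words by greedy longest match."""
    words = query.lower().split()
    units: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest window first; a single word always matches as a fallback.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            window = tuple(words[i:i + n])
            if n == 1 or window in PHRASES:
                units.append(" ".join(window))
                i += n
                break
    return units


def rewrite_query(query: str) -> str:
    """Quote multi-word units so they are matched as phrases rather than loose words."""
    return " ".join(f'"{u}"' if " " in u else u for u in extract_phrases(query))


print(rewrite_query("home run record"))  # "home run" record
```

Longest match is tried first so that a longer lexicon entry such as “new york yankees” wins over any shorter entry it contains.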

Second Generation: ~2001-present

“The Internet is a place where one can always find someone to help answer any question or get anything done.”

• Web Oracle model (proposed in 1998)
– Online communities: BBS, mailing lists, eGroups, Usenet News…
– FAQ documents on the Internet
– FAQ Finder & Builder as a community killer app
– Intelligent search

• User-generated content: blogs, MySpace, YouTube…

• Tagging

• Communities around knowledge: Wikipedia, Baidupedia…

• Question-answering communities:
– Naver, Yahoo! Answers, Sina iAsk, Baidu iKnow…

• People search: LinkedIn, Facebook…

Third Generation

“The Internet is a matching network.”

• Personalized search results
– Based on locale, personal profile, search and browse history
– Personal ranking function, source selection, keyword filtering
– Personal search agent: spider, summarizer, Q&A agent

• Integration of search and recommendation (pull and push)
– Subscription through automatic personalization

• Content, media, events, products and services…
– Matching things with people, etc.
– Shopping assistant, information integrator

• Predictive recommendation with feedback
– Always-on and environment-aware
– Do you like this? Make it more (or less) custom, please.
– Is your taste like mine? How to evaluate the evaluator?
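“Is your taste like mine?” is the question behind user-based collaborative filtering. The sketch below is one simple version of that technique, not a method described in the slides: it measures taste similarity as the cosine of two users’ rating vectors and recommends items the user has not rated, weighted by that similarity. The users, items, ratings, and the names `RATINGS`, `cosine`, and `recommend` are hypothetical.

```python
import math

# Hypothetical taste profiles: user -> {item: rating on a 1-5 scale}.
RATINGS: dict[str, dict[str, float]] = {
    "alice": {"jazz": 5, "rock": 1, "blues": 4},
    "bob": {"jazz": 4, "rock": 1, "blues": 5, "soul": 5},
    "carol": {"jazz": 1, "rock": 5, "metal": 5},
}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (unrated items count as 0)."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user: str, ratings: dict[str, dict[str, float]], k: int = 2) -> list[str]:
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    mine = ratings[user]
    scores: dict[str, float] = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)  # "Is your taste like mine?"
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=lambda item: scores[item], reverse=True)[:k]


print(recommend("alice", RATINGS))  # ['soul', 'metal']
```

Both candidate items received the same rating from their only rater, so the order comes entirely from similarity: bob’s ratings track alice’s closely, carol’s do not, and bob’s suggestion ranks first.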

Summary

• First Generation
– Searching for information or content using NLP techniques
– Based on community recommendation of content or keywords
– Little or no personalization

• Second Generation
– Aimed at solving problems and finding people or entertainment
– Centered on community-created content
– Group customization

• Third Generation
– More integrated into people’s daily lives and needs
– Predictive, locale- and environment-aware
– True personalization

Principles

• Phrases are the conceptual units
– Accurate name extraction and matching
– Query rewriting and suggestion: no quotes needed, typos are OK
– Understanding user needs, semantic matching, machine translation

• Confidence in the source
– Leverage community recommendation to filter content
– Tagging, blogging, SMS forwarding
– Community-created content is more interesting

• People helping people
– Answer any question or get anything done

• The Internet is more and more a part of people’s everyday lives
– Ubiquitous, always on, environment-aware
– Universal messaging/delivery of content, better integration
– On-the-spot advice, e.g., personalized shopping
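“Typos are OK” implies some form of spelling correction before matching. A minimal sketch, assuming a hypothetical vocabulary with term frequencies (`VOCAB`) and using Levenshtein edit distance with a frequency tie-break; this is a standard textbook approach, not necessarily what any engine mentioned in the slides used.

```python
# Hypothetical vocabulary: term -> corpus or query-log frequency.
VOCAB = {"baseball": 500, "basketball": 400, "record": 300}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only one previous row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct(word: str, vocab: dict[str, int], max_dist: int = 2) -> str:
    """Return the closest vocabulary term within max_dist edits, preferring frequent terms."""
    if word in vocab:
        return word
    candidates = [(edit_distance(word, term), -freq, term) for term, freq in vocab.items()]
    candidates = [c for c in candidates if c[0] <= max_dist]
    # Smallest distance wins; among equals, the most frequent term wins.
    return min(candidates)[2] if candidates else word


print(correct("basebal", VOCAB))  # baseball
```

Words with no vocabulary term within `max_dist` edits are passed through unchanged, so rare but correct terms are not forced onto common ones.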

Challenges for Search

• The search ranking function poses an extremely complex, possibly non-decomposable, multi-objective optimization problem:
– Recall-precision tradeoff
– Weighting of multiple terms
– Textual quality and specificity, information richness
– Unique and original content
– Popularity vs. authority
– Freshness, timeliness
– User needs, domain-specific queries

• A search engine is a database of massive size that must be continually refreshed and updated in near real time, with strict quality-of-service requirements (response time, uptime) and throughput/efficiency requirements (search itself is free).

• Search has to be built around user-behavior analysis at massive scale in order to respond to the constantly changing WWW environment. This analysis has to be automated, self-adaptive, and near-real-time.
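The first objective listed, the recall-precision tradeoff, can be made concrete with precision and recall at a cutoff k. The ranking and relevance judgments below are hypothetical; the sketch only shows how returning more results tends to raise recall while lowering precision.

```python
def precision_recall_at_k(ranked: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision and recall of the top-k results (k >= 1) against judged-relevant docs."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking and relevance judgments ("d6" is relevant but never retrieved).
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
for k in (1, 3, 5):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
# k=1: precision=1.00 recall=0.33
# k=3: precision=0.67 recall=0.67
# k=5: precision=0.40 recall=0.67
```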

Challenges for a Search Service

• (China) Each year, data size doubles and the user base doubles (2 × 2 = 4), placing financial strain on service providers. Data centers and electricity are scarce resources.

• Many distributed systems are in operation, but they need to be flexible and reconfigurable without sacrificing (much) efficiency.

• How can traffic be directed beneficially between search and other services? What types of advertising will users accept? How can a service be sensitive to both context and the individual user?
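The “2 × 2 = 4” shorthand can be unpacked as follows, under an assumption not stated on the slide: total serving work W grows in proportion to data size D times user base U, for instance because each query must cover an index that has itself doubled. The symbols are introduced here for illustration, with t counting years from a baseline:

```latex
W_t \propto D_t \, U_t, \qquad D_t = 2^t D_0, \qquad U_t = 2^t U_0
\quad\Longrightarrow\quad W_t \propto 4^t \, D_0 U_0
```

Under that assumption, three years of such growth multiply the workload by 4^3 = 64, compared with 2^3 = 8 for either factor alone.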

Challenges for Society and the Internet

• The WWW as a social network has become balkanized. We need new “people” search engines that let people find and help other people while protecting their privacy and reputation.

• (China) The emergence of a nascent commerce infrastructure poses huge challenges. Commerce platforms need to support safe transactions, advertising, and brand marketing, and need to integrate online and offline services seamlessly.

• Education, government, media, and the Internet as agents of social engineering?

Toward Next Generation Search: Business

• Transparency of advertising effectiveness vs. secrecy of the matching algorithm

• Ad targeting and audience segmentation

• Convergence of different forms of online advertising: search, display, contextual, behavioral

• Convergence of online and traditional advertising: brand marketing, local advertising (classifieds, yellow pages), direct marketing

• Integration of online and offline services

• Ubiquitous mobile applications

Toward Next Generation Search: Product

• Ease of use: AND vs. OR, “soft” AND, synonyms, and concept search

• Query term suggestion (does it hurt?)

• Community Q&A: mining FAQs, routing questions to experts, “Wiki-Answers”

• Factoid extraction

• Open platform to accommodate topic-, user-, and task-specific search engines
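A “soft” AND sits between strict AND (every query term must appear) and OR (any term suffices). One simple way to sketch it, assuming a hypothetical minimum-should-match parameter (`min_fraction`) and plain whitespace tokenization, is to rank documents by how many distinct query terms they contain and drop those below the threshold. The documents in `DOCS` are invented for illustration.

```python
import math


def soft_and(query_terms: list[str], docs: dict[str, str], min_fraction: float = 0.5) -> list[str]:
    """Rank docs by the number of distinct query terms matched, keeping only docs that
    match at least ceil(min_fraction * #terms) terms (and always at least one)."""
    terms = {t.lower() for t in query_terms}
    need = max(1, math.ceil(min_fraction * len(terms)))
    scored = []
    for doc_id, text in docs.items():
        matched = len(terms & set(text.lower().split()))
        if matched >= need:
            scored.append((-matched, doc_id))  # more matches first, ties by doc id
    return [doc_id for _, doc_id in sorted(scored)]


DOCS = {
    "a": "home run record set in 1998",
    "b": "home run derby",
    "c": "world record sprint",
    "d": "baseball stadium",
}
print(soft_and(["home", "run", "record"], DOCS))  # ['a', 'b']
```

With `min_fraction=1.0` this reduces to strict AND (only "a"), and with a threshold of one term it behaves like OR ("a", "b", "c"); values in between trade recall against precision.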

Toward Next Generation Search: Science

• Relevance vs. user satisfaction

• Session behavior and modeling: term additions

• Result diversity; avoidance of “abandonment”

• How to evaluate the efficiency of incremental information discovery by a search engine

• TF*IDF revisited
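As a baseline for “TF*IDF revisited”, the classic formulation scores a document by summing, over the query terms, the term’s frequency in the document times its inverse document frequency log(N / df). This minimal sketch assumes whitespace tokenization, raw counts for tf, lowercase query terms, and no length normalization; many production variants change all of these. The toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(query: list[str], docs: dict[str, str]) -> dict[str, float]:
    """Score each doc as the sum over query terms t of tf(t, d) * log(N / df(t))."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))  # count each term at most once per document
    scores = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        # Terms absent from the whole collection (df == 0) are skipped.
        scores[doc_id] = sum(tf[t] * math.log(n_docs / df[t]) for t in query if df[t])
    return scores


DOCS = {"a": "search engine search", "b": "search ranking", "c": "engine design"}
scores = tfidf_scores(["search", "ranking"], DOCS)
print(max(scores, key=scores.get))  # b
```

Because “search” occurs in two of the three documents while “ranking” occurs in only one, the rarer term carries more weight, so document b outranks document a even though a contains “search” twice. A term present in every document gets idf = log(1) = 0 and contributes nothing.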

Toward Next Generation Search: Infrastructure

• Flash memory SSDs: fast reads, slow writes

• Data analysis platform

• Development platform

• Internal and external use of P2P technologies

• Search engine as a platform

Toward Next Generation Search: Talent

• Recruitment

• Talent development

In Conclusion

Leverage user and community collective intelligence to an ever-increasing degree, in a manner that is self-adaptive, scalable, and (near) real-time, in order to support ubiquitous, integrated online and offline services.

Thank you

wchang @ baidu.com