Search and Data Management
Rakesh Agrawal, MSR Search Lab
Current Focus & Direction
• Understand the virtuous cycle between search and data and ways to accelerate it
• New search-centric applications
– Personal data mining (Health)
– Distributed knowledge creation (Education)
Search & Data: Virtuous Cycle
[Diagram: a cycle linking Search, Data, and Insights. Labels: Queries, Clicks; Web Pages, Feeds; Mining; Relevance. Other labels: Intents, Behaviors, Connections, Popularity, Trends.]
Better Search Results ► More Data ► Greater Insights ► Better Search Results
Related Searches (aka Query Suggestions)
• Most popular queries containing the current query
• Analysis of how users reformulated their queries
• Query click graph to find related queries
Examples: Football → Soccer (whole query); Wildflower cafe → Wildflower bakery (piecewise)
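The click-graph idea above can be sketched as follows. This is a minimal illustration, not the method used in production: it assumes a click log of (query, clicked URL) pairs and scores two queries by the Jaccard overlap of their clicked URLs. The function name `related_queries`, the scoring choice, and the sample URLs are all hypothetical.

```python
from collections import defaultdict

def related_queries(click_log, query, top_k=3):
    """Rank other queries by the overlap of their clicked URLs with `query`.

    click_log is an iterable of (query, clicked_url) pairs (an assumed format).
    """
    urls_by_query = defaultdict(set)
    for q, url in click_log:
        urls_by_query[q].add(url)

    target = urls_by_query.get(query, set())
    scores = {}
    for other, urls in urls_by_query.items():
        if other == query:
            continue
        shared = len(target & urls)
        if shared:
            # Jaccard similarity: shared URLs divided by all URLs of both queries.
            scores[other] = shared / len(target | urls)

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda q: (-scores[q], q))[:top_k]

# Made-up click log for illustration only.
click_log = [
    ("football", "espn.example/soccer"),
    ("football", "fifa.example"),
    ("soccer", "fifa.example"),
    ("soccer", "espn.example/soccer"),
    ("wildflower cafe", "wildflower.example"),
    ("wildflower bakery", "wildflower.example"),
    ("wildflower bakery", "bakeries.example"),
]

print(related_queries(click_log, "football"))         # ['soccer']
print(related_queries(click_log, "wildflower cafe"))  # ['wildflower bakery']
```

Queries that share no clicked URL with the input never appear, so an unseen query simply yields an empty list.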
Result Diversification
• Ideas from portfolio theory to allocate space to different result types
• Marginal utility of adding a document decreases if the result set already contains high quality documents of the same type
• Query and document classification using merged click logs
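The diminishing-marginal-utility bullet can be illustrated with a small greedy selector. This is a sketch under stated assumptions, not the allocation scheme from the talk: each candidate carries a result type and a quality score in [0, 1], and the utility of a candidate is discounted by the probability that documents of the same type already in the result set have satisfied the user. The function `diversify` and the sample candidates are hypothetical.

```python
def diversify(candidates, k):
    """Greedily pick k results, discounting types that are already well covered.

    candidates: list of (doc_id, result_type, quality) with quality in [0, 1].
    """
    remaining = list(candidates)
    selected = []
    # Per type: assumed probability that no selected document of that type
    # has satisfied the user yet. Starts at 1.0 for every type.
    unmet = {}

    while remaining and len(selected) < k:
        # Marginal utility = quality * probability the type is still unmet.
        best = max(remaining, key=lambda c: c[2] * unmet.get(c[1], 1.0))
        doc_id, result_type, quality = best
        selected.append(doc_id)
        # A high-quality pick sharply lowers the value of more of the same type.
        unmet[result_type] = unmet.get(result_type, 1.0) * (1.0 - quality)
        remaining.remove(best)

    return selected

# Made-up candidates for illustration only.
candidates = [
    ("news1", "news", 0.9),
    ("news2", "news", 0.8),
    ("img1", "image", 0.5),
    ("vid1", "video", 0.4),
]

print(diversify(candidates, 3))  # ['news1', 'img1', 'vid1']
```

Although news2 has the second-highest raw quality, it is skipped: once news1 is in the set, its discounted utility (0.8 × 0.1) falls below that of the image and video results.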
Classification Using Click Graph
[Figure labels: Seed documents; ANIMALS documents; ANIMALS queries]
Algorithm: Random walk with absorbing states
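A random walk with absorbing states on a query-document click graph can be sketched as below. The details are assumptions for illustration: seed documents known to be in the class absorb with value 1, seed documents known to be outside it absorb with value 0, and every other node's score is the click-weighted average of its neighbors' scores, which equals the probability that a walk from that node is absorbed at an in-class seed. The function `absorption_probabilities`, the negative seeds, and the sample data are hypothetical.

```python
from collections import defaultdict

def absorption_probabilities(clicks, positive_seeds, negative_seeds, iterations=200):
    """Estimate, for every query and document, the probability that a random
    walk on the click graph is absorbed at a positive seed document.

    clicks: iterable of (query, document, click_count) triples (assumed format).
    """
    # Undirected weighted bipartite graph; nodes are tagged "q" or "d".
    graph = defaultdict(dict)
    for query, doc, count in clicks:
        q, d = ("q", query), ("d", doc)
        graph[q][d] = graph[q].get(d, 0) + count
        graph[d][q] = graph[d].get(q, 0) + count

    absorbing = {("d", doc): 1.0 for doc in positive_seeds}
    absorbing.update({("d", doc): 0.0 for doc in negative_seeds})

    prob = {node: absorbing.get(node, 0.0) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in absorbing:
                updated[node] = absorbing[node]  # absorbing states never change
            else:
                total = sum(neighbors.values())
                updated[node] = sum(w * prob[n] for n, w in neighbors.items()) / total
        prob = updated
    return prob

# Made-up click data for illustration only.
clicks = [
    ("zebra", "zoo.example", 3),
    ("zebra", "wild.example", 1),
    ("lion", "wild.example", 2),
    ("jaguar", "wild.example", 1),
    ("jaguar", "cars.example", 1),
    ("used cars", "cars.example", 5),
]

probs = absorption_probabilities(clicks, {"zoo.example"}, {"cars.example"})
animal_queries = sorted(q for (kind, q), p in probs.items() if kind == "q" and p > 0.5)
print(animal_queries)  # ['lion', 'zebra']
```

The 0.5 threshold is arbitrary. The ambiguous query "jaguar" ends up near 0.3 because half of its clicks lead directly to the out-of-class seed.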
Changing Nature of Disease
[Chart: Number of People With Chronic Conditions (millions) by Year. 1995: 118; 2000: 125; 2005: 133; 2010: 141; 2015: 149; 2020: 157; 2025: 164; 2030: 171]
• New challenge: chronic conditions, that is, illnesses and impairments that are expected to last a year or more, limit what one can do, and may require ongoing care.
• In 2005, 133 million Americans lived with a chronic condition (up from 118 million in 1995).
Infectious Diseases
Technology Trends
• Tremendous simplification in the technologies for capturing useful personal information
• Dramatic reduction in the cost and form factor for personal storage
• Cloud Computing
Personal Health Analytics
Personal Data Mining
Charts for appropriate demographics?
Optimum level for Asian Indians: 150 mg/dL (much lower than 200 mg/dL for Westerners)
Due to elevated levels of lipoprotein(a)*
Computation and selection across millions of data sources
Privacy and security
*Enas et al. Coronary Artery Disease in Asian Indians. Internet J. Cardiology. 2001.
Collaborative Knowledge Creation (Educational Material)
• More than 3.5 million articles in 75 languages
• Fashioned by more than 25,000 writers
• 1 million articles in English (80,000 in Encyclopedia Britannica)
• Inspired by Wikipedia
• But multiple viewpoints rather than one consensus version!
• How to personalize search to find the material suitable for one’s own style of teaching?
• Management of trust and authoritativeness?
Summary
• Web search is a “data management and creating value from data” problem
• New search-centric applications can provide rich fodder for future database research.