ICDE 2009 Keynotes Summary
Shanghai, China, March 29 to April 2, 2009
Li Yukun
Outline
Keynotes
Search Computing (Stefano Ceri)
Data Management in the Cloud (Raghu Ramakrishnan)
Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)
Keynote 1: Search Computing
Stefano Ceri
Dipartimento di Elettronica e Informazione, Politecnico di Milano
Piazza L. Da Vinci 32, 20133 Milano, Italy
Stefano.Ceri@polimi.it
Motivation
“Who are the strongest European competitors on software ideas?
Who is the best doctor to cure insomnia in a nearby hospital?
Where can I attend an interesting conference in my field close to a sunny beach?”
This information is available on the Web, but no software system can accept such queries or compute the answers.
Core model for search computing
Conventional services: abstracted as systems producing sets of equal-weight answers.
Service computing: a cross-discipline that covers the science and technology of bridging the gap between business services and IT services. Its goal is to enable IT services and computing technology to perform business services more efficiently and effectively.
Search services: can be abstracted as systems producing ranked lists of answers.
Search computing: a new paradigm in which ranking is the dominant factor for composing services. A multi-domain query is answered by a constellation of cooperating search services, possibly dynamically selected.
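To make the idea of composing ranked lists concrete, here is a minimal sketch, not the SeCo method itself. It assumes two hypothetical search services (conferences and beaches) that each return a ranked list with scores in [0, 1], joins them on city, and ranks the combinations by the product of the two scores. The data, the function name ranked_join and the scoring rule are all illustrative assumptions.

```python
from itertools import product

# Hypothetical ranked outputs of two search services: (name, city, score).
# Each list is already sorted best-first, as a search service would return it.
conferences = [("DB Summit", "Nice", 0.9), ("Web Forum", "Oslo", 0.8), ("Data Days", "Bari", 0.6)]
beaches = [("Lido Blu", "Bari", 0.95), ("Plage Azur", "Nice", 0.7)]


def ranked_join(left, right, top_k=3):
    """Join two ranked lists on city and rank the pairs by combined score.

    The combined score is the product of the two input scores (an assumption;
    a weighted sum would work equally well for this illustration).
    """
    combos = [
        (l_name, r_name, l_city, l_score * r_score)
        for (l_name, l_city, l_score), (r_name, r_city, r_score) in product(left, right)
        if l_city == r_city
    ]
    combos.sort(key=lambda combo: combo[3], reverse=True)
    return combos[:top_k]


results = ranked_join(conferences, beaches)
for conference, beach, city, score in results:
    print(f"{conference} + {beach} in {city}: {score:.2f}")
```

The exhaustive pairing used here is only for clarity; with long result lists a real system would pull results incrementally from each service rather than materialize every combination.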
CHAPTERS OF SEARCH COMPUTING
Theory for search computing: select the best abstractions covering the concepts; design basic operations on services and algorithms; compute time and space complexity.
Statistical models for search services: build statistical estimators of the number and quality of the results.
Optimization methods for search computing
Description abstractions for search services: expose ranking-specific properties of search services.
Language abstractions for search computing: incorporate the ranking aspects and strategies for dealing with rankings.
Human-computer interfaces: expressing ranking preferences; light-weight user interaction.
Semantics: merging the results of heterogeneous search services; semantic “join” of search services.
Higher-order ranking: a “ranking of rankings”, i.e. a multi-level ranking, is essential for selecting and prioritizing search services.
Managing individual and social searching: adapting search strategies to user profiles or to past user interactions; societal recommendation and evaluation.
Thus, individual and societal aspects are key ingredients for search computing.
Search computing engineering: designing, assembling and deploying search computing software applications.
Economy of search computing: suitable business models, based upon advertising schemes, pay-per-query, subscription fees, micro-billing, and so on.
Security and privacy of search computing: control of how data is used. For instance, use of a search service could be granted to a service computing application, provided that the service’s owners can trace all queries involving their data and limit the kind of information that is made visible to the queries.
PROJECT ORGANIZATION
The project is funded by the European Research Council in the framework of the IDEAS Advanced Grants.
It started on Nov. 1, 2008 and will last five years.
The project involves about 30 researchers at Politecnico
Abdan Abid, Edoardo Amaldi, Alessandro Bozzon, Daniele Maria Braga, Marco Brambilla, Tommaso Buganza, Alessandro Campi, Sofia Ceppi, Sara Comai, Emanuele Della Valle, Piero Fraternali, Nicola Gatti, Michael Grossniklaus, Ma’moun Abu Hellu, Pier Luca Lanzi, Davide Martinenghi, Marco Masseroli, Maristella Matera, Davide Mazza, Giuseppe Pozzi, Stefania Ronchi, Roberto Verganti, Marco Tagliasacchi, Massimo Tisi.
SeCo has an advisory board: Edoardo Amaldi (Operations Research), Fabio Casati (Service Computing), Georg Gottlob (Theory), Ioana Manolescu (Systems and Performance), Roberto Verganti (Business Models), Gerhard Weikum (Information Retrieval for the Web), Jennifer Widom (Languages and Paradigms).
The project is organized in seven teams: Concept team; Theory and methods; Service registration and management; Query processing; Interaction design; Tools and prototypes; Business models and technology watch.
More information on SeCo is available on the project’s Web site: http://home.dei.polimi.it/ceri/seco/index.html
Keynote 2: Data Management in the Cloud
Yahoo! Research: Raghu Ramakrishnan, Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Nick Puz, Rodrigo Fonseca
CCDI: Chuck Neerdaels, P.P.S. Narayan, Kevin Athey, Toby Negrin, plus Dev/QA teams
Living in the Clouds
We want to start a new website, FredsList.com
Our site will provide listings of items for sale, jobs, etc.
As time goes on, we’ll add more features, to illustrate how more cloud capabilities are used as needed.
The list of capabilities/components is illustrative, not exhaustive.
Step 1: Listings
FredsList wants to store listings as (key, category, description):
1234323, transportation, For sale: one bicycle, barely used
5523442, childcare, Nanny available in San Jose
215534, wanted, Looking for issue 1 of Superman comic book
Components (behind simple Web service APIs used by the FredsList.com application): Database (Sherpa)
DECLARE DATASET Listings AS ( ID String PRIMARY KEY, Category String, Description Text )
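As a rough illustration of what the Listings dataset holds, the sketch below models the three declared fields and a primary-key constraint with an in-memory dictionary. It is not the Sherpa API; the class and method names (Listing, Listings, insert, get) are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    id: str           # primary key
    category: str
    description: str


class Listings:
    """In-memory stand-in for the Listings dataset, keyed by primary key."""

    def __init__(self):
        self._rows = {}

    def insert(self, listing):
        # Enforce the PRIMARY KEY constraint from the declaration.
        if listing.id in self._rows:
            raise KeyError(f"duplicate primary key: {listing.id}")
        self._rows[listing.id] = listing

    def get(self, listing_id):
        return self._rows[listing_id]


listings = Listings()
listings.insert(Listing("1234323", "transportation", "For sale: one bicycle, barely used"))
listings.insert(Listing("5523442", "childcare", "Nanny available in San Jose"))
listings.insert(Listing("215534", "wanted", "Looking for issue 1 of Superman comic book"))
print(listings.get("5523442").category)
```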
Step 2: Search
FredsList’s customers quickly ask for keyword search, e.g. “bicycle”, “dvd’s”, “nanny”.
Components (behind simple Web service APIs used by the FredsList.com application): Database (Sherpa), Search (Vespa), Messaging (YMB)
ALTER Listings SET Description SEARCHABLE
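Making the Description field searchable amounts to maintaining a keyword index over it. The following is a minimal sketch of such an inverted index in plain Python; it says nothing about how Vespa is actually implemented, and the function names build_index and search are assumptions for illustration.

```python
import re
from collections import defaultdict

# Descriptions keyed by listing ID, taken from the sample listings.
descriptions = {
    "1234323": "For sale: one bicycle, barely used",
    "5523442": "Nanny available in San Jose",
    "215534": "Looking for issue 1 of Superman comic book",
}


def build_index(rows):
    """Map each lower-cased word to the set of listing IDs that contain it."""
    index = defaultdict(set)
    for listing_id, description in rows.items():
        for word in re.findall(r"[a-z0-9']+", description.lower()):
            index[word].add(listing_id)
    return index


def search(index, keyword):
    """Return the sorted IDs of listings whose description contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(descriptions)
print(search(index, "bicycle"))
print(search(index, "nanny"))
```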
Step 3: Photos
FredsList decides to add photos to listings.
Components (behind simple Web service APIs used by the FredsList.com application): Database (Sherpa), Search (Vespa), Messaging (YMB), Storage (MObStor)
Foreign key: photo → listing
ALTER Listings ADD Photo BLOB
Step 4: Data Analysis
FredsList wants to analyze its listings to get statistics about categories, do geocoding, etc.
Components (behind simple Web service APIs used by the FredsList.com application): Database (Sherpa), Search (Vespa), Messaging (YMB), Storage (MObStor), Compute (Grid)
Foreign key: photo → listing
Data reaches the Grid by batch export.
ALTER Listings MAKE ANALYZABLE
Jobs on the Grid:
Pig query to analyze categories
Hadoop program to geocode data
Hadoop program to generate fancy pages for listings
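The category analysis is the kind of job that groups exported records by category and counts them. Below is a small sketch of that logic in plain Python, written in a map/reduce shape; it is not Pig or Hadoop code, and the fourth record is a made-up row added only so that one category occurs twice.

```python
from collections import Counter

# (listing ID, category) pairs as they might appear in a batch export.
# The last row is hypothetical, added for illustration.
exported = [
    ("1234323", "transportation"),
    ("5523442", "childcare"),
    ("215534", "wanted"),
    ("998877", "transportation"),
]


def map_phase(records):
    """Emit (category, 1) for every exported record."""
    for _listing_id, category in records:
        yield category, 1


def reduce_phase(pairs):
    """Sum the emitted counts per category."""
    totals = Counter()
    for category, count in pairs:
        totals[category] += count
    return totals


category_counts = reduce_phase(map_phase(exported))
print(category_counts.most_common())
```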
Step 5: Performance
FredsList wants to reduce its data access latency.
Components (behind simple Web service APIs used by the FredsList.com application): Database (Sherpa), Search (Vespa), Messaging (YMB), Storage (MObStor), Compute (Grid), Caching (memcached)
Foreign key: photo → listing
Data reaches the Grid by batch export.
ALTER Listings MAKE CACHEABLE
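One common way a cache reduces read latency is the cache-aside pattern: look in the cache first, fall back to the database on a miss, and invalidate the cached entry on writes. The sketch below shows that pattern with plain dictionaries standing in for both memcached and the database; whether the hosted service uses exactly this policy is not stated in the slides, and the class name CachedListings is an assumption.

```python
class CachedListings:
    """Cache-aside reads over a backing store; dicts stand in for both tiers."""

    def __init__(self, database):
        self._db = database      # stand-in for the record store
        self._cache = {}         # stand-in for memcached
        self.db_reads = 0        # counts how often the slow path is taken

    def get(self, key):
        if key in self._cache:   # cache hit: no database access
            return self._cache[key]
        self.db_reads += 1       # cache miss: read from the database
        value = self._db[key]
        self._cache[key] = value
        return value

    def put(self, key, value):
        self._db[key] = value
        self._cache.pop(key, None)  # invalidate so the next read is fresh


store = CachedListings({"1234323": "For sale: one bicycle, barely used"})
store.get("1234323")
store.get("1234323")
print(store.db_reads)
```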
Requirements for Cloud Services
Multitenant: A cloud service must support multiple, organizationally distant customers.
Elasticity: Tenants should be able to negotiate and receive resources/QoS on-demand.
Resource sharing: Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient.
Horizontal scaling: It should be possible to add cloud capacity in small increments; this should be transparent to the tenants.
Metering: A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service.
Security: A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud.
Availability: A cloud service should be highly available.
Operability: A cloud service should be easy to operate.
Types of Cloud Services
Two kinds of cloud services:
Horizontal cloud services: functionality enabling tenants to build applications or new services on top of the cloud.
Functional cloud services: functionality that is useful in and of itself to tenants, e.g. various SaaS instances, such as Salesforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g. Flickr, Groups, Mail, News, Shopping.
Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges).
The Sherpa Solution
The next generation global-scale record store
Record-orientation: Routing, data storage optimized for low-latency record access
Scale out: Add machines to scale throughput (while keeping latency low)
Asynchrony: Pub-sub replication to far-flung datacenters to mask propagation delay
Consistency model: Reduce complexity of asynchrony for the application programmer
Cloud deployment model: Hosted, managed service to reduce app time-to-market and enable on demand scale and elasticity
Range Queries in YDOT
Clustered, ordered retrieval of records
Tablets, ordered by key:
Apple, Avocado, Banana, Blueberry
Canteloupe, Grape, Kiwi, Lemon
Lime, Mango, Orange
Strawberry, Tomato, Watermelon
Router interval map: keys before Canteloupe → Storage unit 1; Canteloupe up to Lime → Storage unit 3; Lime up to Strawberry → Storage unit 2; Strawberry onward → Storage unit 1
Example: the range query Grapefruit…Pear? is split into Grapefruit…Lime? and Lime…Pear?
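The routing of a range query to storage units can be sketched with a sorted list of split keys and a binary search. This is a minimal illustration using the split keys and storage-unit numbers from the slide, read as an interval map (an interpretation of the diagram); the function names unit_for and split_range are assumptions and do not come from YDOT.

```python
from bisect import bisect_right

# Split keys from the slide and the storage unit serving each interval:
# keys before "Canteloupe" -> 1, up to "Lime" -> 3, up to "Strawberry" -> 2, rest -> 1.
SPLIT_KEYS = ["Canteloupe", "Lime", "Strawberry"]
UNITS = [1, 3, 2, 1]


def unit_for(key):
    """Return the storage unit responsible for a single key."""
    return UNITS[bisect_right(SPLIT_KEYS, key)]


def split_range(start, end):
    """Split the key range [start, end) into (start, end, unit) sub-ranges."""
    pieces = []
    lo = start
    while lo < end:
        i = bisect_right(SPLIT_KEYS, lo)
        # The sub-range ends at the next split key, or at `end` if that comes first.
        hi = SPLIT_KEYS[i] if i < len(SPLIT_KEYS) and SPLIT_KEYS[i] < end else end
        pieces.append((lo, hi, UNITS[i]))
        lo = hi
    return pieces


for piece in split_range("Grapefruit", "Pear"):
    print(piece)
```

Because the tablets are ordered by key, each sub-range maps to one contiguous scan on one storage unit.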
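Updates
Figure: write path for key k through routers, storage units (SU) and message brokers. Step labels as shown on the slide: 1 Write key k; 2 Write key k; 3 Write key k; 4 (unlabeled); 5 SUCCESS; 6 Write key k; 7 Sequence # for key k; 8 Sequence # for key k.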
Consistency Model
Goal: make it easier for applications to reason about updates and cope with asynchrony.
What happens to a record with primary key “Brian”?
Figure: timeline of the record within Generation 1. The record is inserted, then updated repeatedly (versions v. 1 through v. 8), then deleted. At any time one version is current and the earlier ones are stale.
Operations illustrated against this timeline (v. 8 current, earlier versions stale):
Read: may return a stale version
Read up-to-date: returns the current version
Read ≥ v.6: returns a version no older than v. 6
Write: creates a new current version
Write if = v.7: ERROR, because the current version is no longer v. 7
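The read and write variants on these slides can be mimicked with a toy model of one record's version timeline plus a lagging replica. This is only a sketch under simplifying assumptions (a single process, one replica, versions kept in a list); the class and method names are invented for illustration and are not the Sherpa API.

```python
class Record:
    """One record's timeline: versions[0] is v. 1, the last entry is current."""

    def __init__(self, value):
        self.versions = [value]
        self.replica_version = 1  # a lagging replica has applied updates up to here

    @property
    def current_version(self):
        return len(self.versions)

    def write(self, value):
        """Unconditional write: append a new current version."""
        self.versions.append(value)
        return self.current_version

    def write_if(self, expected_version, value):
        """Conditional write: succeed only if the record is still at expected_version."""
        if expected_version != self.current_version:
            raise ValueError(f"current version is v. {self.current_version}, not v. {expected_version}")
        return self.write(value)

    def read(self):
        """Plain read: served by the replica, so the result may be stale."""
        v = self.replica_version
        return v, self.versions[v - 1]

    def read_up_to_date(self):
        """Read the current version."""
        v = self.current_version
        return v, self.versions[v - 1]

    def read_at_least(self, min_version):
        """Read a version no older than min_version."""
        if self.replica_version >= min_version:
            return self.read()
        return self.read_up_to_date()


record = Record("data v1")
for n in range(2, 9):          # bring the record to v. 8
    record.write(f"data v{n}")
record.replica_version = 5     # the replica lags behind at v. 5
print(record.read(), record.read_at_least(6), record.read_up_to_date())
```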
Index Maintenance
How to have lots of interesting indexes, without killing performance?
Solution: Asynchrony! Indexes are updated asynchronously when the base table is updated.
(Planned functionality)
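A minimal sketch of asynchronous index maintenance follows: writes touch only the base table and enqueue an index update, and a separate step applies the queued updates later. The queue here is an in-process deque drained by an explicit call, which is an assumption made to keep the example deterministic; the slides do not describe the actual mechanism, and the names Table and apply_pending are hypothetical.

```python
from collections import deque


class Table:
    """Base table with a secondary index on category that is updated lazily."""

    def __init__(self):
        self.rows = {}          # primary key -> category
        self.by_category = {}   # secondary index: category -> set of primary keys
        self.pending = deque()  # index updates that have not been applied yet

    def write(self, key, category):
        """Update the base table now; defer the index update."""
        old = self.rows.get(key)
        self.rows[key] = category
        self.pending.append((key, old, category))

    def apply_pending(self):
        """Drain the queue, bringing the index up to date with the base table."""
        while self.pending:
            key, old, new = self.pending.popleft()
            if old is not None:
                self.by_category[old].discard(key)
            self.by_category.setdefault(new, set()).add(key)


table = Table()
table.write("215534", "wanted")
print(table.by_category)   # index still empty: the update is only queued
table.apply_pending()
print(table.by_category)
```

The write path stays cheap because it never waits for the index; the price is that index lookups can briefly lag behind the base table.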
MObStor
Yahoo!’s next-generation globally replicated, virtualized media object storage service
Better provisioning, easy migration, replication, better BCP, and performance
New features (Evergreen URLs, CDN integration, REST API, …)
The object metadata problem is addressed using Sherpa, though MObStor is focused on blob storage.
The World Has Changed
Web applications need: scalability, geographic distribution, high availability, reliable storage.
Web applications are a poor fit for: complicated queries, strong transactions.
Web Data Management
Large data analysis (Hadoop)
• Scan-oriented workloads
• Focus on sequential disk I/O
• $ per CPU cycle
Structured record storage (PNUTS)
• CRUD
• Point lookups and short scans
• Index-organized table and random I/Os
• $ per latency
Blob storage (SAN/NAS)
• Object retrieval and streaming
• Scalable file storage
• $ per GB
Application Design Space
Figure: systems placed along two axes, Records vs. Files and “Get a few things” vs. “Scan everything”. Systems shown: Sherpa, MObStor, Everest, Hadoop, YMDB, MySQL, Filer, Oracle, BigTable.
Further Reading
Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008). Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan
PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008). Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni
Keynote 3: Why Can’t I Find My Data the Way I Find My Dinner?
David Carlson
Director, International Polar Year International Programme Office
Cambridge, UK
ipy.djc@gmail.com
International Polar Year (IPY)
One can find almost every discipline represented in the IPY projects, and funding has come from geophysical, biological and social agencies and programs.
IPY data
Open access data policy; display of and access to IPY data.
We have component systems, within nations, disciplines, or existing data service centers, that provide access examples for portions of the IPY data set.
We have unprecedented bandwidth for real-time data transmission.
But how can these data sets be accessed easily?
Example
To understand and predict the health of migratory bird populations in the polar environment, we need ornithological, toxicological, ecological, meteorological, hydrological, climatological, geomagnetic, and sociological data.
These data will cover a broad range of space and time scales, often in disparate (or at least inconsistent) space and time coordinate systems.
Problems
Data access: For a larger population of curious users, the specialized data services associated with subsets of the IPY data will not provide easy, friendly, or even accessible interfaces.
Interfaces: No familiar interfaces will provide integrated discovery and browse services.
No long-term plan: On longer time scales, and even as data storage capabilities grow rapidly, most of the IPY data sets do not, at present, have acceptable long-term archive plans, even for passive storage without continued discovery services.
Research issues
Smart search engines, pattern recognition, data mining tools, multi-gigabyte personal storage devices, and advanced animation capabilities, coupled with almost unlimited mobile bandwidth, offer many citizens expansive and amazing access to commercial, recreational, financial, and personal data and data services.
What changes in strategy, technology, funding and individual and collective behavior need to occur in the world of scientific data to allow me to browse, view and access IPY data on my iTouch?