Cooperative Database Caching within Cloud Environments

© 2012 UZH, CSG@IFI

Andrei Vancea¹, Guilherme Sperb Machado¹, Laurent d’Orazio², Burkhard Stiller¹

¹ Department of Informatics IFI, Communication Systems Group CSG, University of Zürich UZH, Switzerland

² Blaise Pascal University, LIMOS, France

{vancea,stiller}@ifi.uzh.ch, laurent.dorazio@isima.fr

AIMS, Luxembourg, Luxembourg, June 6, 2012

Background

Databases
– Client: issues a query (SQL)
– Server: returns the result (tuples)

Client-side caching
– Page caching, tuple caching
– Semantic caching

• Clients store the results of old queries

• Old results are used to answer new queries

Background - Semantic Caching

[Figure: query rewriting uses the descriptions of cached queries to split a query into a probe, answered by the semantic cache, and a remainder, sent to the server]

Semantic regions
– Query description
– Result set

Query rewriting
– Probe
– Remainder
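To make semantic regions and probe/remainder rewriting concrete, here is a minimal Python sketch. It assumes a single numeric attribute and half-open interval predicates `[lo, hi)`; the names `Interval`, `SemanticRegion`, and `rewrite` are hypothetical illustrations, not the interface of any actual semantic-cache implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Hypothetical simplification: a query description is a half-open interval
# [lo, hi) over one numeric attribute, e.g. "age >= 10 and age < 50".
@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def intersect(self, other: "Interval") -> Optional["Interval"]:
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        return Interval(lo, hi) if lo < hi else None


@dataclass
class SemanticRegion:
    description: Interval                                          # query description
    tuples: List[Tuple[float, str]] = field(default_factory=list)  # result set as (age, name)


def rewrite(query: Interval, region: SemanticRegion):
    """Split `query` into a probe (tuples answered by `region`) and a remainder
    (sub-intervals that still have to be fetched from the server)."""
    overlap = query.intersect(region.description)
    if overlap is None:
        return [], [query]                 # cache cannot help: everything is remainder
    probe = [t for t in region.tuples if overlap.lo <= t[0] < overlap.hi]
    remainder = []
    if query.lo < overlap.lo:              # part of the query below the cached range
        remainder.append(Interval(query.lo, overlap.lo))
    if overlap.hi < query.hi:              # part of the query above the cached range
        remainder.append(Interval(overlap.hi, query.hi))
    return probe, remainder


region = SemanticRegion(Interval(10, 50), [(12, "ann"), (35, "bob"), (45, "eve")])
probe, remainder = rewrite(Interval(5, 40), region)
print(probe)      # [(12, 'ann'), (35, 'bob')]
print(remainder)  # [Interval(lo=5, hi=10)]
```

Only the remainder `[5, 10)` would be sent to the server; the probe is answered locally.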

Database Caching & Cloud Computing

Most cloud providers charge for data transferred between the cloud environment and the “outside world” on a pay-as-you-go basis

Database caching within the cloud environment
– Improves performance
– Economic benefits

• Amount of data transferred decreases

• Payments for data transfer are reduced
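The economic argument above is simple arithmetic: under pay-as-you-go pricing the bill grows linearly with the bytes that cross the cloud boundary, so every tuple served from a cache inside the cloud is a tuple that is not paid for. The Python sketch below illustrates this; the tuple size, the price per GB, and the cached fraction are hypothetical values, not provider prices or measured results.

```python
def transfer_cost(tuples_sent: int, bytes_per_tuple: int, price_per_gb: float) -> float:
    """Pay-as-you-go bill for moving `tuples_sent` tuples across the cloud boundary."""
    gigabytes = tuples_sent * bytes_per_tuple / 1e9   # assumption: decimal gigabytes
    return gigabytes * price_per_gb


def cost_with_caching(total_tuples: int, cached_fraction: float,
                      bytes_per_tuple: int, price_per_gb: float) -> float:
    """Bill when `cached_fraction` of the tuples is served by caches inside
    the cloud and only the remainder crosses the boundary."""
    sent = round(total_tuples * (1 - cached_fraction))
    return transfer_cost(sent, bytes_per_tuple, price_per_gb)


# Hypothetical numbers: 5 million result tuples of 200 bytes at 0.10 per GB.
print(transfer_cost(5_000_000, 200, 0.10))           # about 0.1
print(cost_with_caching(5_000_000, 0.6, 200, 0.10))  # about 0.04
```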

Approach

Cooperative Semantic Caching

Share local semantic caches between clients

Use cache entries of other clients

Performance improvements

[Figure: three clients, each with its own semantic cache, sharing cache entries]

Cooperative Semantic Caching

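Q1: select * from persons where age > 10
– Its result is cached as semantic region R1: age > 10

Q3: select * from persons where age > 7
– Probe against R1 (cached by another client): select * from R1
– Remainder (sent to the server): select * from persons where age > 7 and age <= 10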
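The Q1/Q3 example above can be reproduced with a small Python sketch. It only handles predicates of the form `attr > value` (which is all the example needs); `LowerBound` and `cooperative_rewrite` are hypothetical names, and the choice of "largest usable region" is an illustrative policy, not necessarily the one CoopSC uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LowerBound:
    """Hypothetical predicate type: `<attr> > value` (no upper bound)."""
    attr: str
    value: int


def cooperative_rewrite(table: str, query: LowerBound,
                        regions: Dict[str, LowerBound]) -> List[str]:
    """Rewrite `query` against cached regions (possibly held by other clients)."""
    usable = {name: r for name, r in regions.items() if r.attr == query.attr}
    # Case 1: a region already contains the whole query -> filter that region.
    for name, r in usable.items():
        if r.value <= query.value:
            return [f"select * from {name} where {query.attr} > {query.value}"]
    if not usable:
        return [f"select * from {table} where {query.attr} > {query.value}"]
    # Case 2: every region lies inside the query -> read the largest one
    # (lowest bound) and fetch only the gap from the server.
    name, r = min(usable.items(), key=lambda item: item[1].value)
    return [
        f"select * from {name}",
        f"select * from {table} where {query.attr} > {query.value} "
        f"and {query.attr} <= {r.value}",
    ]


# Q1 (age > 10) was answered earlier and is cached as R1; now Q3 arrives.
plan = cooperative_rewrite("persons", LowerBound("age", 7), {"R1": LowerBound("age", 10)})
for statement in plan:
    print(statement)
# select * from R1
# select * from persons where age > 7 and age <= 10
```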

Potential Use Cases

GIS (Geographic Information System) storage
– Large amount of data (e.g. seismic events)
– Processing done on the client side
– Two-dimensional range selections (areas)

NetFlow-based architectures
– Routers collect flow records and store them in databases
– Analyzers (intrusion detection, accounting, …) access them
– Range selections (start time, IP)

Query Rewriting

Query rewriting
– Probe
– Remote probes
– Remainder

[Figure: query rewriting uses the descriptions of all cached queries to split a query into a probe (local semantic cache), remote probes (remote semantic caches), and a remainder (server)]

System Design

CoopSC

Cooperative Semantic Caching (CoopSC)

Query types
– Selection (n-dimensional range predicates)
– e.g. select id, name, age from persons where 20 < age and age < 30

Cache organization
– Semantic regions
– Distributed index, built on top of a P2P overlay
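A selection with n-dimensional range predicates such as `20 < age and age < 30` can be viewed as a box in attribute space, which is what makes intersection and containment tests between queries and cached regions cheap. The Python sketch below assumes only `>` and `<` comparisons combined with `and`, and treats attributes as continuous; `to_box`, `intersects`, and `contains` are hypothetical helpers, not part of CoopSC.

```python
import math
from typing import Dict, List, Tuple

Predicate = Tuple[str, str, float]        # (attribute, operator, constant)
Box = Dict[str, Tuple[float, float]]      # attribute -> open interval (low, high)


def to_box(predicates: List[Predicate], attributes: List[str]) -> Box:
    """Turn a conjunction of `attr > c` / `attr < c` terms into an n-dimensional box.
    Attributes without a predicate span the whole (unbounded) axis."""
    box = {a: (-math.inf, math.inf) for a in attributes}
    for attr, op, c in predicates:
        lo, hi = box[attr]
        if op == ">":
            box[attr] = (max(lo, c), hi)
        elif op == "<":
            box[attr] = (lo, min(hi, c))
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return box


def intersects(a: Box, b: Box) -> bool:
    # Open intervals over a continuous domain overlap iff max(lows) < min(highs).
    return all(max(a[k][0], b[k][0]) < min(a[k][1], b[k][1]) for k in a)


def contains(outer: Box, inner: Box) -> bool:
    return all(outer[k][0] <= inner[k][0] and inner[k][1] <= outer[k][1] for k in outer)


# "where 20 < age and age < 30" over the attributes (age, salary)
query = to_box([("age", ">", 20), ("age", "<", 30)], ["age", "salary"])
region = to_box([("age", ">", 10)], ["age", "salary"])
print(query["age"], contains(region, query))   # (20, 30) True
```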

CoopSC - Query Rewriting

Local rewriting
– Probe
– Local remainder
• Portion of the query that is not available in the local cache

Distributed rewriting
– Remote probes
– Remainder

[Figure: local rewriting splits the query into a probe (local cache) and a local remainder; distributed rewriting consults the distributed index to split the local remainder into remote probes and a final remainder]
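The two-stage rewriting above can be sketched in Python for a single attribute with half-open intervals. The local stage yields the probe and the local remainder; the distributed stage then carves remote probes out of that local remainder, and whatever is left goes to the server. The plain dictionary `index` is a hypothetical stand-in for a lookup in the distributed index, and all function names are illustrative.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]   # half-open [lo, hi) over one attribute (simplification)


def split(parts: List[Interval], region: Interval) -> Tuple[List[Interval], List[Interval]]:
    """Return (pieces of `parts` covered by `region`, pieces left uncovered)."""
    covered, left = [], []
    rlo, rhi = region
    for lo, hi in parts:
        olo, ohi = max(lo, rlo), min(hi, rhi)
        if olo >= ohi:                 # no overlap: the whole part stays uncovered
            left.append((lo, hi))
            continue
        covered.append((olo, ohi))
        if lo < olo:
            left.append((lo, olo))
        if ohi < hi:
            left.append((ohi, hi))
    return covered, left


def coop_rewrite(query: Interval, local_regions: List[Interval],
                 index: Dict[str, List[Interval]]):
    # Local rewriting: probe = what the local cache covers, the rest is the local remainder.
    probe: List[Interval] = []
    remaining = [query]
    for region in local_regions:
        covered, remaining = split(remaining, region)
        probe += covered
    local_remainder = list(remaining)

    # Distributed rewriting: `index` is a hypothetical stand-in for a
    # distributed-index lookup mapping peers to the regions they cache.
    remote_probes: Dict[str, List[Interval]] = {}
    for peer, regions in index.items():
        for region in regions:
            covered, remaining = split(remaining, region)
            if covered:
                remote_probes.setdefault(peer, []).extend(covered)
    return probe, local_remainder, remote_probes, remaining   # last item: remainder for the server


probe, local_rem, remote, remainder = coop_rewrite(
    (0, 100), [(20, 40)], {"peerB": [(30, 60)], "peerC": [(80, 90)]})
print(probe, local_rem)   # [(20, 40)] [(0, 20), (40, 100)]
print(remote)             # {'peerB': [(40, 60)], 'peerC': [(80, 90)]}
print(remainder)          # [(0, 20), (60, 80), (90, 100)]
```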

Distributed Index

Built on top of a P2P overlay

Regions and queries represented as rectangular shapes

MX-CIF quad tree
– Efficiently finds intersections between rectangular shapes

Each region is indexed in the smallest quad that totally contains it

Easy to adapt to n-dimensional regions/queries
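The MX-CIF rule above (index each region in the smallest quad that totally contains it) can be sketched in Python for two dimensions. The unit square as the data space, the `max_depth` cut-off, and encoding a quad as a string of child digits are assumptions for illustration; how CoopSC maps quads onto peers of the P2P overlay is not shown here.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]   # (x_lo, y_lo, x_hi, y_hi), closed bounds


def smallest_quad(rect: Rect, space: Rect = (0.0, 0.0, 1.0, 1.0), max_depth: int = 8) -> str:
    """Path of the smallest quad fully containing `rect`: "" is the root and each
    level appends a child digit 0-3 (0 lower-left, 1 lower-right, 2 upper-left,
    3 upper-right)."""
    x0, y0, x1, y1 = space
    path = ""
    for _ in range(max_depth):
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        children = [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
        for digit, (cx0, cy0, cx1, cy1) in enumerate(children):
            if cx0 <= rect[0] and rect[2] <= cx1 and cy0 <= rect[1] and rect[3] <= cy1:
                path += str(digit)
                x0, y0, x1, y1 = cx0, cy0, cx1, cy1
                break
        else:
            break   # rect straddles a split line, so the current quad is the smallest
    return path


print(smallest_quad((0.1, 0.1, 0.2, 0.2)))   # 00
print(smallest_quad((0.6, 0.1, 0.7, 0.2)))   # 10
```

A query only needs to visit the quads along its own paths, which is why intersections are cheap to find; moving to n dimensions replaces the 4 children with 2^n.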

Update Handling

Issues
– Invalidation of old entries
– Combining different snapshots can generate inconsistencies

Quad space division (specified update level)

Virtual timestamps stored in the database

Each modification increments the virtual timestamp of the corresponding quad

Regions store the virtual timestamps of the quads they intersect
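The timestamp scheme above can be sketched in Python. To keep it short, the sketch uses one attribute whose domain is split into 2**update_level equal cells as a stand-in for the quads at the update level, and keeps timestamps in memory rather than in the database; `QuadTimestamps` and its methods are hypothetical names.

```python
import math
from typing import Dict, List


class QuadTimestamps:
    """Virtual timestamps for one attribute domain [lo, hi) split into
    2**update_level equal cells (a 1-D stand-in for the quads at the update level)."""

    def __init__(self, lo: float, hi: float, update_level: int):
        self.lo, self.n = lo, 2 ** update_level
        self.width = (hi - lo) / self.n
        self.ts: List[int] = [0] * self.n

    def cells(self, qlo: float, qhi: float) -> List[int]:
        """Indices of the cells intersected by the half-open range [qlo, qhi)."""
        first = max(0, int((qlo - self.lo) // self.width))
        last = min(self.n - 1, math.ceil((qhi - self.lo) / self.width) - 1)
        return list(range(first, last + 1))

    def modify(self, value: float) -> None:
        """A modification touching `value` increments its cell's timestamp."""
        cell = min(self.n - 1, int((value - self.lo) // self.width))
        self.ts[cell] += 1

    def snapshot(self, qlo: float, qhi: float) -> Dict[int, int]:
        return {c: self.ts[c] for c in self.cells(qlo, qhi)}


clock = QuadTimestamps(0, 100, update_level=2)   # 4 cells of width 25
region_ts = clock.snapshot(10, 40)               # a region over [10, 40) records cells 0 and 1
clock.modify(80)                                 # touches cell 3 only
print(clock.snapshot(10, 40) == region_ts)       # True: region still valid
clock.modify(30)                                 # touches cell 1
print(clock.snapshot(10, 40) == region_ts)       # False: region is stale
```

A coarser update level means fewer timestamps to store, at the price of invalidating regions more often when unrelated tuples in the same quad change.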

Cloud Computing Scenarios

Cloud Scenario A

Database server running outside the cloud

Clients located inside the cloud

Non-operational use cases
– Example: a cloud environment used for running scientific experiments

Cloud Scenario B

Database server running inside the cloud

Clients located outside the cloud

Operational use cases
– Example: a corporation using a cloud environment as an alternative to building a data center

Evaluation

Experiment Design

Measurements
– Response time
– Amount of data transferred
– Payments for data transfer

Experiments
– Cache size
– Update level

Testing sessions
– 5 select testing sessions (50 queries each)
– Update sessions interleaved

Evaluation

Wisconsin benchmark dataset (10,000,000 tuples)

Scenario A
– Database server: Zurich testbed
– 5 clients: Rackspace

Scenario B
– Database server: Amazon EC2
– 5 clients: EmanicsLab

Queries
– About 10,000 tuples
– Semantic locality

Scenario A

Data transferred/Payments

CoopSC significantly reduces the number of tuples sent by the database server

Payments for data transfer are also reduced

Response Time

Rackspace performance is unstable

No response-time improvements observed

Scenario B

Data transferred/Payments

CoopSC significantly reduces the number of tuples sent by the database server

Bandwidth payments are also reduced

Response Time

CoopSC improves response time

Data transferred/Payments (Updates)

Good behavior at low update rates

Economic and performance benefits

Response Times (Updates)

Response time increases as the update rate grows

Summary & Conclusion

Summary
– Cooperative caching approach used to reduce the load on the database server
– Update statements supported
– CoopSC applied in the context of cloud environments

CoopSC reduces the amount of data transferred between the cloud and the outside world, which has economic benefits

Performance benefits as long as cloud providers are stable

Questions?

Update Handling - Algorithm

procedure Execute(query)
    quads = query.getIntersectedQuads(updateLevel);
    before = database.getTimestamps(quads);
    plan = rewrite(query, before);
    result = plan.execute();
    after = database.getTimestamps(quads);
    if (before == after) return result;
    else return database.execute(query);
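The Execute procedure above can be turned into a runnable Python sketch. It models range queries `[lo, hi)` over integer values and a toy in-memory server with one virtual timestamp per quad of fixed width; `ToyDatabase`, `execute`, and the `run_plan` callback (standing in for the probe, remote probes, and remainder) are hypothetical illustrations of the validate-then-fall-back idea, not the CoopSC code.

```python
from typing import Callable, Dict, Iterable, List


class ToyDatabase:
    """Hypothetical in-memory server: integer rows plus one virtual timestamp per quad."""

    def __init__(self, rows: List[int], n_quads: int, quad_width: int):
        self.rows = list(rows)
        self.quad_width = quad_width
        self.timestamps: Dict[int, int] = {q: 0 for q in range(n_quads)}

    def get_timestamps(self, quads: Iterable[int]) -> Dict[int, int]:
        return {q: self.timestamps[q] for q in quads}

    def insert(self, value: int) -> None:
        self.rows.append(value)
        self.timestamps[value // self.quad_width] += 1   # bump the touched quad

    def execute(self, lo: int, hi: int) -> List[int]:
        return sorted(v for v in self.rows if lo <= v < hi)


Plan = Callable[[int, int, Dict[int, int]], List[int]]


def execute(lo: int, hi: int, db: ToyDatabase, run_plan: Plan) -> List[int]:
    """Answer [lo, hi) via the cache-based plan; fall back to the server if any
    intersected quad changed while the plan was running."""
    quads = range(lo // db.quad_width, (hi - 1) // db.quad_width + 1)
    before = db.get_timestamps(quads)
    result = run_plan(lo, hi, before)    # probe + remote probes + remainder
    after = db.get_timestamps(quads)
    if before == after:
        return result
    return db.execute(lo, hi)


db = ToyDatabase([5, 15, 25], n_quads=4, quad_width=10)


def cached_plan(lo, hi, ts):
    return [5, 15]                       # pretend the caches answered [0, 20)


def racing_plan(lo, hi, ts):
    db.insert(12)                        # a concurrent update lands in quad 1
    return [5, 15]


print(execute(0, 20, db, cached_plan))   # [5, 15]
print(execute(0, 20, db, racing_plan))   # [5, 12, 15]
```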
