Seaweed: Scalable Delay Aware Querying
Austin Donnelly, Richard Mortier, Dushyanth Narayanan, Ant Rowstron
Microsoft Research, Cambridge




Sep 14 2006 Seaweed: Scalable Delay Aware Querying 2

Motivation
• Large, highly distributed data sets
• Data stored on endsystems
• Endsystems often unavailable
• Centralization, replication do not scale
• Must query data in-situ
• How can we deal with unavailability?


Delay aware querying
• In-situ
  • Push queries to endsystems
• Incremental results
  • As endsystems become available
• Progress estimation
  • Current and future completeness
• Scalability
• Fault-tolerance


Applications
• Admin, diagnostics, resource management
  • Select-Project-Aggregate queries
  • Small results
  • Low to moderate query rates
• Different network scales
  • Data center (10,000+)
  • Enterprise (100,000+)
  • Internet (1,000,000+)


Enterprise network management
• Endsystem-based monitoring
  • Endsystems log their own traffic
  • Flow and PacketHeader tables
• Queries by admins/operators
  • SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80
• Flow is horizontally partitioned
• 300,000 hosts, 1 month
  • 765 TB total size
  • 2.4 Gbps update rate
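The two figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal terabytes and a 30-day month; neither assumption is stated on the slide.

```python
# Back-of-the-envelope check of the slide's figures.
# Assumptions (not from the slide): 1 TB = 1e12 bytes, 1 month = 30 days.
HOSTS = 300_000
TOTAL_BYTES = 765e12            # 765 TB accumulated over the month
SECONDS = 30 * 24 * 3600        # seconds in a 30-day month

# Aggregate rate at which new data is logged across all endsystems.
update_rate_gbps = TOTAL_BYTES * 8 / SECONDS / 1e9

# The same volume seen from a single endsystem.
per_host_gb = TOTAL_BYTES / HOSTS / 1e9
per_host_kbps = TOTAL_BYTES * 8 / SECONDS / HOSTS / 1e3

print(f"aggregate update rate: {update_rate_gbps:.2f} Gbps")
print(f"data per host per month: {per_host_gb:.2f} GB")
print(f"update rate per host: {per_host_kbps:.1f} kbps")
```

Under these assumptions the aggregate rate comes out at about 2.4 Gbps, which matches the slide, while each endsystem holds only a few gigabytes. That gap is the case for querying in-situ rather than centralizing.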


Roadmap
• Motivation
• Design
  • Overview
  • Delay awareness
  • Distributed query protocols
• Evaluation
• Conclusion


Seaweed overview
• In-situ querying
• One-shot queries
• Incremental results
• Progress estimation
• Meta-data replication
• Exactly-once semantics
• Scalable, failure-resilient protocols
• Built on P2P overlay


Why delay awareness?
• Endsystem unavailability


What is delay awareness?
• User receives partial results
• Needs progress indicator
  • How much data is out there?
  • How much have I seen?
  • How long before I get to 99%?
• Delay/completeness tradeoff
  • Predicted by Seaweed


Completeness
• % of relevant data rows seen so far
  • Relevant = matches query predicates
  • Query-specific
• Completeness predictor:
  • Currently available rows
  • Total rows
  • Expected rows/time
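To make the three predictor ingredients concrete, here is a minimal sketch of how a completeness curve could be read off such a predictor. The representation (a list of `(delay_hours, expected_rows)` pairs, with delay 0 meaning rows on currently available endsystems) and the example numbers are illustrative assumptions, not Seaweed's actual data structure.

```python
def completeness(predictor, t_hours):
    """Predicted fraction of relevant rows seen t_hours after the query starts.

    predictor: list of (delay_hours, expected_rows) pairs (hypothetical format).
    """
    total = sum(rows for _, rows in predictor)
    if total == 0:
        return 1.0  # nothing relevant exists, so the query is trivially complete
    seen = sum(rows for delay, rows in predictor if delay <= t_hours)
    return seen / total


def time_to_reach(predictor, target):
    """Smallest delay at which predicted completeness reaches target (0..1)."""
    total = sum(rows for _, rows in predictor)
    seen = 0
    for delay, rows in sorted(predictor):
        seen += rows
        if seen >= target * total:
            return delay
    return None  # only reached for an empty predictor


# Made-up numbers: 80M rows available now, the rest arriving later.
example = [(0, 80_000_000), (1, 10_000_000), (10, 6_000_000),
           (100, 3_500_000), (1000, 500_000)]

print(completeness(example, 0))      # fraction available immediately
print(completeness(example, 10))     # fraction expected within 10 hours
print(time_to_reach(example, 0.99))  # "How long before I get to 99%?"
```

The last call answers the question from the previous slide: the user can trade a longer wait for a more complete answer before any data has been fetched.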


Completeness predictor


Completeness prediction
• Relevant rows
  • Column histograms
  • Standard row-count estimation
  • Replication → remote estimation
• Uptime
  • Availability models
• Replicated meta-data
  • Highly available
  • Orders of magnitude smaller than data
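The "standard row-count estimation" step can be illustrated with a textbook histogram estimator. This is a sketch under the usual assumption that values are spread uniformly within each bucket; the bucket layout and counts are invented for illustration, and the slides do not say which histogram type Seaweed uses.

```python
def estimate_rows(histogram, lo, hi):
    """Estimate the number of rows with lo <= value < hi.

    histogram: list of (bucket_lo, bucket_hi, row_count) tuples
    (hypothetical format). Rows are assumed uniform within a bucket.
    """
    estimate = 0.0
    for b_lo, b_hi, count in histogram:
        # Width of the part of this bucket that falls inside [lo, hi).
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        estimate += count * overlap / (b_hi - b_lo)
    return estimate


# Invented histogram on the Bytes column of one endsystem's Flow table.
bytes_hist = [
    (0, 10_000, 500),
    (10_000, 20_000, 300),
    (20_000, 30_000, 150),
    (30_000, 40_000, 50),
]

# Approximate row count for a predicate like "Bytes > 20000".
print(estimate_rows(bytes_hist, 20_000, 40_000))
# A range that cuts a bucket in half.
print(estimate_rows(bytes_hist, 25_000, 40_000))
```

Because only the histogram is needed, the same estimate can be computed by any node holding a replica of it, which is what makes remote estimation for an unavailable endsystem possible.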


Predictor generation
• Meta-data replicated periodically
• Query sent to all endsystems
  • Application-level multicast tree
  • Retransmit on failure
  • Aggregate predictors in-tree
• Exactly-once semantics
  • Available → local histogram, time=0
  • Unavailable → replica histogram, avail.
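The in-tree aggregation of predictors can be sketched as follows. The predictor format (a mapping from delay bucket in hours to expected rows) and the return-time probabilities are assumptions made for illustration; the slides only say that available endsystems contribute at time 0 from their local histogram, and unavailable ones are estimated from a replicated histogram plus an availability model.

```python
from collections import Counter


def local_predictor(est_rows):
    """Available endsystem: its estimated rows can be returned immediately."""
    return Counter({0: est_rows})


def replica_predictor(est_rows, return_probs):
    """Unavailable endsystem, estimated by a node holding its meta-data.

    return_probs: {delay_hours: probability the endsystem returns in that
    bucket}, a hypothetical stand-in for the availability model.
    """
    return Counter({delay: est_rows * p for delay, p in return_probs.items()})


def merge(*predictors):
    """Combine child predictors at a tree vertex by adding bucket-wise."""
    total = Counter()
    for p in predictors:
        total.update(p)  # Counter.update adds values for matching keys
    return total


a = local_predictor(100)                        # endsystem A, available
b = local_predictor(50)                         # endsystem B, available
c = replica_predictor(200, {1: 0.5, 24: 0.5})   # endsystem C, currently down

root = merge(merge(a, b), c)
print(sorted(root.items()))
```

Bucket-wise addition is associative and commutative, so predictors can be combined at every level of the multicast tree and the root receives a message whose size does not grow with the number of endsystems.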


Predictor generation

[Figure: histograms (Frequency vs. Thickness) with selection σ1 for endsystems A, B, C and D, and row-count predictor plots (Rows in millions vs. Time in hours, 1 to 10000) combined as A+B, C+D and A+B+C+D.]


Query execution
• Persistent query state
  • New endsystems get active query list
• Incremental convergecast of results
  • Deterministic child → parent mapping
  • Each vertex is a replicated set
  • Parent remembers child result versions
• Exactly-once semantics
• In-network aggregation
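The role of remembered child result versions can be shown with a small sketch. The class and method names are hypothetical, the aggregate is a plain sum, and replication of each vertex is left out; the point is only that a parent replaces a child's contribution instead of adding to it, so retransmissions and updates are never double-counted.

```python
class Vertex:
    """One aggregation vertex in the result tree (illustrative only)."""

    def __init__(self):
        # child_id -> (version, partial_result) for the latest version seen
        self.children = {}

    def receive(self, child_id, version, partial_result):
        """Record a child's result; return True if it changed our state."""
        current = self.children.get(child_id)
        if current is not None and current[0] >= version:
            return False  # duplicate or stale retransmission, ignore it
        self.children[child_id] = (version, partial_result)
        return True

    def aggregate(self):
        """Current partial aggregate to forward to this vertex's parent."""
        return sum(value for _, value in self.children.values())


v = Vertex()
v.receive("child-1", 1, 10)
v.receive("child-2", 1, 20)
v.receive("child-3", 1, 5)
print(v.aggregate())          # all three children counted once

v.receive("child-3", 1, 5)    # retransmission after a suspected failure
print(v.aggregate())          # unchanged

v.receive("child-3", 2, 8)    # child-3 sends an updated result
print(v.aggregate())          # old contribution replaced, not added
```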


Roadmap
• Motivation
• Design
• Evaluation
• Conclusion


Evaluation
• Packet-level simulation
• Farsite availability traces
  • 51,663 hosts, ~4 weeks
• Flow tables from packet traces
  • 456 hosts, ~4 weeks
  • Assigned randomly to simulation hosts
• Two queries
  • SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80
  • SELECT COUNT(*) FROM Flow WHERE Bytes > 20000
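Both evaluation queries are aggregates that can be computed per partition and then combined, which is what makes in-situ execution with in-network aggregation possible. The sketch below uses in-memory SQLite databases as stand-ins for two endsystems' local Flow tables; the rows are invented, and this is not how Seaweed stores or queries data.

```python
import sqlite3


def make_partition(rows):
    """Create one endsystem's horizontal partition of the Flow table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Flow (SrcPort INTEGER, Bytes INTEGER)")
    db.executemany("INSERT INTO Flow VALUES (?, ?)", rows)
    return db


# Two hypothetical endsystems with made-up flow records.
partitions = [
    make_partition([(80, 1500), (443, 30000)]),
    make_partition([(80, 25000), (22, 400)]),
]

Q1 = "SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80"
Q2 = "SELECT COUNT(*) FROM Flow WHERE Bytes > 20000"


def run_everywhere(sql):
    """Run the query on each partition and add up the partial results.

    SUM over zero matching rows yields NULL (None) in SQLite, so it is
    treated as 0 before combining.
    """
    partials = [db.execute(sql).fetchone()[0] or 0 for db in partitions]
    return sum(partials)


print(run_everywhere(Q1))
print(run_everywhere(Q2))
```

Each endsystem returns a single number, so results stay small however large the local tables are.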


Predictor accuracy


Prediction accuracy (2)


Overheads

[Figure: Tx bandwidth (bytes/s/endsystem, log scale from 0.0001 to 1000) vs. Time (hours, 0 to 1000). Series: Seaweed maintenance, O(1); MSPastry, O(log N); Seaweed query, O(log N).]


Scalability


Roadmap
• Motivation
• Design
• Evaluation
• Conclusion


Related work
• P2P querying
  • PIER, Mercury, …
  • Move data across network
• Continuous/streaming queries
  • Astrolabe, SDIMS, Borealis, …
  • Ignore availability


Future work
• Selective centralization
  • “Distributed materialized views”
  • Need bandwidth/availability estimation
  • Large views can melt network
• Beyond histograms
  • Wavelets → approximate results?
• Real-life experience, measurements
  • Deployment within Microsoft


Conclusion
• Querying highly distributed data
  • Challenges are unavailability, scale
• Delay awareness
  • Predict delay/availability tradeoff
  • Exactly-once semantics
• Seaweed: scalable delay aware querying
  • Meta-data replication
  • Fault-tolerant protocols


Questions?


Consistency (membership)
• “Exactly-once” semantics
  • No double-counting
  • Every endsystem’s results counted
    • If available at any point in query lifetime
  • “Precise single-site validity”
• Estimate always generated
  • For all endsystems, available or not
  • Endsystem computes own estimate
    • If available through estimation phase


Consistency (time)
• Avoid tight synchronization
  • Clock-skewed snapshots
• Loosely synchronized clocks
  • With good NTP, milliseconds
• Currently left to application layer
  • Timestamped, append-only tuples
  • Explicit predicates on timestamp


Result aggregation

• Deterministic mapping to parent

• Each parent is a replicated set

• Parents remember child results

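[Figure: result aggregation tree. Results R1 and R2 are combined into R1+R2, which is combined with R3 into R1+R2+R3 at the root. When R3 is replaced by R3’, the replicated parent holding (R1+R2, R3) updates to (R1+R2, R3’) and the root result becomes R1+R2+R3’.]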


Query dissemination in Pastry

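[Figure: Pastry identifier ring (000 to FFF) with hash(query) marked; node labels 836, 0FA, DA0, 37B, E9A; identifier-prefix labels ???, 3??, 8??, E??.]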


Replication in Pastry

• Topology-independent node identifiers
• Each node maintains a virtual neighbor set (vset)

[Figure: Pastry identifier ring (000 to FFF) with node labels 8E2, 8F0, 8F6, 90E, 910.]


Result routing in Pastry

[Figure: Pastry identifier ring with node labels 836, 036, 0F6; results are routed toward 0FA = hash(query).]
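The slide labels the routing target as 0FA = hash(query). In Pastry, a message addressed to a key is delivered to the live node whose identifier is numerically closest to that key, so the query hash picks the root of the result tree. The sketch below illustrates only that final choice; the use of SHA-1 truncated to three hex digits and the exact node list are assumptions for illustration, and real Pastry reaches the node through prefix routing rather than a global search.

```python
import hashlib

ID_DIGITS = 3                  # three hex digits, as in the slide's labels
ID_SPACE = 16 ** ID_DIGITS     # identifiers wrap around at FFF -> 000


def query_key(query_text):
    """Map a query to a key in the identifier space (assumed: SHA-1 prefix)."""
    digest = hashlib.sha1(query_text.encode("utf-8")).hexdigest()
    return int(digest[:ID_DIGITS], 16)


def ring_distance(a, b):
    """Distance between two identifiers on the circular identifier space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def root_for(key, node_ids):
    """Node numerically closest to the key; ties go to the smaller id."""
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))


# Node identifiers loosely based on labels visible in the slides.
nodes = [0x036, 0x0F6, 0x37B, 0x836, 0xDA0, 0xE9A]

key = 0x0FA  # the slide's example value for hash(query)
print(f"root for key {key:03X}: {root_for(key, nodes):03X}")

# Deriving a key from query text under the assumed hash function.
k = query_key("SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80")
print(f"derived key: {k:03X}, root: {root_for(k, nodes):03X}")
```

Because the key depends only on the query, every endsystem can compute the same root independently, with no coordinator needed to assign one.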