Monitoring Large, Distributed, Dynamic Systems
Systor ’11, IBM Haifa
Izchak Sharfman, Guy Sagy, Avishay Livne, Amir Abboud, Daniel Keren, Assaf Schuster

Monitoring Large, Distributed, Dynamic Systems. Aggregate Threshold Queries in Sensor Networks. IPDPS’07. Shape Sensitive Geometrical Monitoring. PODS’08, IEEE


• A Geometric Approach to Monitoring Distributed Data Streams. SIGMOD’06 (Honorable Mention for Best Paper Award), ACM TODS’07
• Aggregate Threshold Queries in Sensor Networks. IPDPS’07
• Shape Sensitive Geometrical Monitoring. PODS’08, IEEE TKDE 2011
• Top-k Vectorial Aggregation Queries in a Distributed Environment. JPDC 2010
• Distributed Threshold Querying of General Functions by a Difference of Monotonic Representation. PVLDB’11
• Optimal Local Constraints for Distributed Stream Monitoring, submitted.
• EU FP7 Project “LIFT” (USING LOCAL INFERENCE IN MASSIVELY DISTRIBUTED SYSTEMS)
• EU FP7 Project “DATA SIM” (DATA SCIENCE FOR SIMULATING THE ERA OF ELECTRIC VEHICLES)

Web Page Frequency Counts

• Mirrored web site
• Mirrors record the frequency of requests for pages
• Detect when the global frequency of requests for a page exceeds a predetermined threshold

[Figure labels: Req #1, Req #2, Req #3]

Running example: monitoring cloud health

• Small computer cloud (two machines)
• Want to submit an alert when the average load is ≥ 80%.
• But we don’t want to keep reporting individual loads.
• Trivial solution: every machine keeps silent as long as its load is < 80%.
• Better: set different thresholds (robust machine reports when load ≥ 90%, weak machine when load ≥ 70%).
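The local-threshold idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the names `LOCAL_THRESHOLDS`, `should_report` and `silent_implies_safe` are hypothetical. The sketch relies on the fact that the two local thresholds (90% and 70%) average to the global threshold (80%), so while both machines are silent the average load must be below 80%.

```python
GLOBAL_THRESHOLD = 80.0
# Hypothetical per-machine thresholds; their mean equals the global threshold.
LOCAL_THRESHOLDS = {"robust": 90.0, "weak": 70.0}


def should_report(machine, load):
    """A machine stays silent while its load is below its own threshold."""
    return load >= LOCAL_THRESHOLDS[machine]


def silent_implies_safe(loads):
    """Return None if some machine reports; otherwise confirm that the
    average load is below the global threshold."""
    if any(should_report(m, load) for m, load in loads.items()):
        return None  # a report was sent; the real average has to be checked
    average = sum(loads.values()) / len(loads)
    return average < GLOBAL_THRESHOLD
```

With loads of 89% and 69% nobody reports and the average (79%) is indeed below 80%. With loads of 95% and 10% the robust machine reports even though the average is far below the threshold, which is the kind of false alarm the rest of the talk tries to reduce.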

We’re scratching the surface of the Distributed Monitoring Problem

• Slightly modifying the problem makes it far harder…
• Want to monitor uniformity
• Three machines, loads are $L_1, L_2, L_3$, with $\bar{L} = \mathrm{average(load)}$
• Want to submit an alert when $(L_1 - \bar{L})^2 + (L_2 - \bar{L})^2 + (L_3 - \bar{L})^2 \ge T$
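A small sketch of the uniformity function may help show why per-machine thresholds no longer work. The function name `spread` and the sample loads are illustrative assumptions; the quantity computed is the sum of squared deviations from the average load given above.

```python
def spread(loads):
    """Sum of squared deviations of the loads from their average."""
    mean = sum(loads) / len(loads)
    return sum((x - mean) ** 2 for x in loads)


# All machines rising together leaves the spread unchanged ...
low_uniform = spread([30, 30, 30])
high_uniform = spread([80, 80, 80])

# ... while moderate individual loads can still be highly non-uniform.
uneven = spread([30, 35, 70])
```

Here `low_uniform` and `high_uniform` are both zero although every load has changed a lot, whereas `uneven` is large although no single load looks alarming. A node looking only at its own load cannot tell these situations apart.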

• Local conditions for submitting an alert are harder to define!
• Loads at all machines may be increasing but in a uniform fashion, hence no alert should be sent, etc.
• A huge range of problems…
• Simplest: average (linear function) of two scalar parameters.
• Most general and difficult: many nodes, each holds a (dynamic) vector of parameters, need to monitor a general function of them all. The value of this function may indicate a problem, an abnormality, or a phase change.

The need has been there for a while!

• “More than Sum: Complex Aggregates. Network aggregation infrastructures can support a large class of functions for merging data... Up to now we have focused on SUM... Standard database languages offer other aggregates including AVERAGE, STDEV, MAX and MIN. Given a constraint on one of these global aggregates (e.g. ‘ensure that the STDEV of latency is < 1 second’), it is not immediately clear what local event should trigger global constraint checks.”

– From “A Wakeup Call for Internet Monitoring Systems: The Case for Distributed Triggers”, Ankur Jain, Joseph M. Hellerstein, Sylvia Ratnasamy, David Wetherall, HOTNETS 04.

• No general solution yet.
• Required: a paradigm for compiling local conditions at the nodes, such that:
  – Every global event is captured (i.e. it results in the violation of at least one local condition).
  – Communication is minimal.
• Why? Reduce communication, avoid false alerts, maintain privacy…

Problem Definition – Streams

• A set of data sources
  – Distributed
  – Dynamic
• A data vector is collected from each stream
• Given:
  – A function over the union of data vectors
  – A threshold T
• Continuous query: alert when the GLOBAL function crosses T
• Goal: minimize communication during query execution

Search Engines

• Distributed datacenter/warehouse
• “Our logs are larger than any other data by orders of magnitude. They are our source of truth.” Sridhar Ramaswamy. SIGMOD’08 keynote on “Extreme Data Mining”
• Mining the logs: Compute pairs of keywords for which the correlation index is high
• “Network bandwidth is a relatively scarce resource in our computing environment”. Dean and Ghemawat. MapReduce paper, OSDI’04


Monitoring Cloud Health

• Amazon’s Elastic Compute Cloud (EC2)
• Amazon’s Simple Storage Service (S3)

Amazon Web Services » Service Health Dashboard
Amazon S3 Availability Event: July 20, 2008

“At 8:40am PDT, error rates in all Amazon S3 datacenters began to quickly climb and our alarms went off. By 8:50am PDT, error rates were significantly elevated and very few requests were completing successfully. By 8:55am PDT, we had multiple engineers engaged and investigating the issue. Our alarms pointed at problems processing customer requests in multiple places within the system and across multiple data centers. While we began investigating several possible causes, we tried to restore system health... At 9:41am PDT, we determined that servers within Amazon S3 were having problems… By 11:05am PDT, all server-to-server communication was stopped, request processing components shut down, and the system's state cleared…. “

Ad-Hoc Mobile P2P Networks

Peer-to-peer network invites drivers to get connected
CarTorrent could smarten up our daily commute, reducing accidents and bringing multimedia journey data to our fingertips

Laura Parker, The Guardian, Thursday January 17 2008

“The name BitTorrent has become part of most people's day-to-day vernacular, synonymous with downloading every kind of content via the internet's peer-to-peer networks. But if a team of US researchers have their way, we may all be talking about CarTorrent in the not too distant future…..

Researchers from the University of California Los Angeles are working on a wireless communication network that will allow cars to talk to each other, simultaneously downloading information in the shape of road safety warnings, entertainment content and navigational tools….”

Problem Model – Monitored Function

• Need to define a problem which is more general than what was done so far (mostly linear/monotonic functions, aggregates).
• But also a problem which is tractable and relevant to practical applications.
• A satisfactory choice is $f\left(\frac{x_1 + \dots + x_n}{n}\right)$, where $f$ is a general function and $x_i$ are the data vectors at the nodes.

Problem Model – Monitored Function

• Broad enough to cover many interesting problems, including maximum, top-k, variance, effective dimension…
• Maximum: augment local vectors by their powers and use the fact that $\max\{x_i\} \cong \left(x_1^n + \dots + x_k^n\right)^{1/n}$ (in this formula $n$ is the power and $k$ the number of values)
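The power trick for the maximum can be tried numerically. The sketch below is an illustration under assumptions: the helper names and the power `p = 50` are chosen here, and non-negative values are assumed. Each node contributes its value raised to the power `p`; the maximum is then approximated from the average of these powers, which is a function of the average of the augmented local data.

```python
def local_power(x, p):
    """Augmented local value held by a node: its value raised to the power p."""
    return x ** p


def approx_max_from_average(avg_power, count, p):
    """Approximate the maximum from the average of the p-th powers."""
    return (count * avg_power) ** (1.0 / p)


values = [1.0, 2.0, 3.0]   # hypothetical non-negative local values
p = 50                     # larger p gives a tighter approximation
avg = sum(local_power(x, p) for x in values) / len(values)
estimate = approx_max_from_average(avg, len(values), p)
```

For these values `estimate` is within a tiny margin of the true maximum 3.0. Very large powers can overflow floating point, so a practical version would need to rescale the data first.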

Other work

• Communication-efficient distributed monitoring of thresholded counts: R. Keralapura, G. Cormode, J. Ramamirtham, SIGMOD 2006.
• Algorithms for distributed functional monitoring: G. Cormode, S. Muthukrishnan, Ke Yi, SODA 2008.
• Handling Non-linear Polynomial Queries over Dynamic Data: S. Shah, K. Ramamritham, ICDE 2008.
• Communication-efficient online detection of network-wide anomalies: L. Huang, X.L. Nguyen, M. Garofalakis, J.M. Hellerstein, INFOCOM 2007.
• Efficient Detection of Distributed Constraint Violations: S. Agrawal, S. Deb, K.V.M. Naidu, R. Rastogi, ICDE 2007.
• Efficient Constraint Monitoring Using Adaptive Thresholds: S. Kashyap, J. Ramamirtham, R. Rastogi, P. Shukla, ICDE 2008.
• Approximate Decision Making in Large-Scale Distributed Systems: L. Huang, M. Garofalakis, A.D. Joseph, N. Taft, MLSys'2007.

New Approach – Based on Geometry

“Let no one ignorant of geometry enter!”

• We argue that for general functions, one must monitor the domain of the function and not its range.

Geometric Approach

• Geometric Interpretation:
  – Each node holds a data vector
  – Coloring the data space
    Grey: function > threshold
    White: function < threshold
• Goal: determine color of global data vector (average).
• General function (not necessarily linear, monotonic, convex…) implies a general subset of Euclidean space in which function > threshold.

Bounding the Convex Hull

• Observation: average is in the convex hull
• If convex hull is monochromatic, we know what happens at the average vector
• Problem: convex hull may become large

[Embedded movie: DataMovie.avi]

The Bounding Theorem

• A reference point is known to all nodes
• Each vertex constructs a sphere
• Theorem: convex hull is bounded by the union of spheres
• This yields local constraints!
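The theorem can be checked numerically on random data. The sketch below assumes the usual reading of the construction: each sphere is the ball whose diameter is the segment from the shared reference point to one node's vector, and the convex hull is taken over the reference point together with all node vectors. Function names, sample sizes and the random data are illustrative choices, not part of the talk.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_ball_with_diameter(p, a, b, tol=1e-9):
    """True if p lies in the closed ball whose diameter is the segment a-b."""
    center = [(x + y) / 2 for x, y in zip(a, b)]
    return dist(p, center) <= dist(a, b) / 2 + tol


def random_convex_combination(points, rng):
    """Random point of the convex hull of the given points."""
    weights = [rng.random() for _ in points]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * pt[k] for w, pt in zip(weights, points)) / total
            for k in range(dim)]


def hull_covered_by_balls(reference, vectors, samples=2000, seed=0):
    """Sample the hull of reference + vectors; check every sample is in some ball."""
    rng = random.Random(seed)
    hull_points = [reference] + vectors
    for _ in range(samples):
        p = random_convex_combination(hull_points, rng)
        if not any(in_ball_with_diameter(p, reference, v) for v in vectors):
            return False
    return True


data_rng = random.Random(1)
reference = [0.0, 0.0, 0.0]
vectors = [[data_rng.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
covered = hull_covered_by_balls(reference, vectors)
```

A sampled check like this is evidence rather than proof, but it illustrates why each node can test only its own ball: if every ball is monochromatic, so is the hull, and therefore so is the average.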


Basic Algorithm

• An initial estimate vector is calculated
• Nodes check color of drift spheres
  – Drift vector is the diameter of the drift sphere
• If any sphere is non-monochromatic: node triggers re-calculation of estimate vector
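A node-side check might look like the sketch below. Several details are assumptions made for illustration, since the slide does not spell them out: the drift vector is taken to be the estimate vector plus the change in the node's local vector since the last synchronization, the drift sphere is the ball whose diameter joins the estimate vector and the drift vector, and the monitored function is the squared Euclidean norm, chosen because its range over a ball has a closed form.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def drift_sphere(estimate, last_sent, current):
    """Center and radius of the ball whose diameter joins the estimate
    vector and the (assumed) drift vector estimate + (current - last_sent)."""
    drift = [e + (c - s) for e, c, s in zip(estimate, current, last_sent)]
    center = [(e + d) / 2 for e, d in zip(estimate, drift)]
    radius = norm([d - e for d, e in zip(drift, estimate)]) / 2
    return center, radius


def is_monochromatic(center, radius, threshold):
    """For f(x) = |x|^2: the ball is one color if the threshold lies
    outside the range of f over the ball."""
    lowest = max(0.0, norm(center) - radius) ** 2
    highest = (norm(center) + radius) ** 2
    return highest < threshold or lowest > threshold


def node_must_sync(estimate, last_sent, current, threshold):
    """A node triggers re-calculation when its drift sphere is not monochromatic."""
    center, radius = drift_sphere(estimate, last_sent, current)
    return not is_monochromatic(center, radius, threshold)
```

With estimate and last-sent vector both (1, 1) and threshold 4, a small move to (1.2, 0.9) keeps the node silent, while a move to (3, 1) produces a sphere that straddles the threshold and forces a synchronization.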

[Embedded movies: DataConvMovie.avi, DriftMovie.avi]

Experiments: Reuters Corpus (RCV1-v2)

• 800,000+ news stories
• Aug 20 1996 to Aug 19 1997
• Corporate/Industrial tagging simulates spam
• Information Gain: $IG(C) = \sum_{i \in \{1,2\}} \sum_{j \in \{1,2\}} c_{i,j} \log \frac{c_{i,j}}{(c_{i,1} + c_{i,2})(c_{1,j} + c_{2,j})}$

[Figure: Information Gain vs. Document Index. x-axis: Document Index (0 to 800,000); y-axis: Information Gain (0 to 0.008); series: bosnia, ipo, febru.]

[Figure: Broadcast Messages vs. Threshold. x-axis: Threshold (0 to 0.006); y-axis: Broadcast Messages (x1000, 0 to 800); series: bosnia, ipo, febru, Naive Alg.; label: n=10.]

Issues

• A general solution, but it lacks…
  – Rigorous optimality measure
  – Adaptation to specific problem instances
  – Reference point determination
  – “A theory!!!”
  – Monochromaticity checking

Improvement: fit bounding volumes to data distribution


Improvement: larger (but better) bounding volumes via “inner” reference vectors

Determining the new Reference Vector

[Figure labels: $r_{\max}$, $e$, and starred variants of $e$]

Inner reference vector

Convexity

• The spherical constraints define regions in which the local vectors can roam freely (no alerts required).
• These regions turn out to be convex.

Proof: for two points, the region is a half-plane. Therefore, any region is an intersection of half-planes, and such an intersection is convex.

[Figure: plot with labels G and r]

And vice-versa: it is easy to see that any convex subset will do the job (since it is closed under averages).
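One way to fill in the two-point step, assuming the sphere in question is the ball $B(e,u)$ whose diameter joins the reference vector $e$ and a local vector $u$, and that $g$ is a point the ball must avoid (the symbols $e$, $u$, $g$ are chosen here for illustration):

```latex
g \in B(e,u) \iff \langle g - e,\; g - u \rangle \le 0,
\qquad\text{hence}\qquad
g \notin B(e,u) \iff \langle g - e,\; u \rangle < \langle g - e,\; g \rangle .
```

The last condition is linear in $u$, so for fixed $e$ and $g$ the admissible vectors $u$ form a half-space. Requiring it for every forbidden point $g$ intersects such half-spaces, which gives a convex region.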

So: why not look for an optimal convex region?

• Very difficult (infinite dimensional, non-linear) optimization problem: find a maximal (in the probabilistic sense) convex subset.

• A more general approach, requiring additional machinery (optimization, probability, algorithms).

“As long as algebra and geometry have been separated, their progress have been slow and their uses limited, but when these two sciences have been united, they have lent each mutual forces, and have marched together towards perfection.”

[Figure panels: Spherical constraints; Optimal convex octagon]

• So far, we assumed all nodes have the same distribution, and therefore they are all assigned the same region!
• General solution: assign each node its own region, such that:
  i. Each node is assigned a region which fits its data distribution.
  ii. The average of vectors from the distinct regions satisfies the threshold constraint (it lies inside the set of points at which the function is smaller than the threshold).

[Figure labels: $S_x$, $p_x$, $G$, $S_y$, $p_y$]

Minkowski sum: $A \oplus B = \{a + b \mid a \in A,\ b \in B\}$

Optimization problem:

Maximize $\int_{S_x} p_x \, dx \int_{S_y} p_y \, dy$ subject to $\frac{S_x \oplus S_y}{2} \subset G$
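For one-dimensional data with interval regions the constraint and the objective are easy to evaluate, which gives a feel for the optimization problem. The sketch below is an illustration under assumptions: regions are closed intervals `(low, high)`, each node's data is uniformly distributed on a hypothetical support interval, and all function names are made up here.

```python
def minkowski_sum(a, b):
    """Minkowski sum of two closed intervals given as (low, high) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def is_feasible(s_x, s_y, g):
    """Check that half of the Minkowski sum of s_x and s_y lies inside g."""
    low, high = minkowski_sum(s_x, s_y)
    return g[0] <= low / 2 and high / 2 <= g[1]


def uniform_mass(region, support):
    """Probability that a value uniform on `support` falls inside `region`."""
    overlap = max(0.0, min(region[1], support[1]) - max(region[0], support[0]))
    return overlap / (support[1] - support[0])


def objective(s_x, s_y, support_x, support_y):
    """Probability that both nodes stay inside their regions (independence assumed)."""
    return uniform_mass(s_x, support_x) * uniform_mass(s_y, support_y)


G = (0, 80)               # admissible average loads, in percent
SUPPORT_X = (0, 100)      # hypothetical load range of the first machine
SUPPORT_Y = (0, 60)       # hypothetical load range of the second machine

fitted = objective((0, 100), (0, 60), SUPPORT_X, SUPPORT_Y)
identical = objective((0, 80), (0, 80), SUPPORT_X, SUPPORT_Y)
```

Both region pairs satisfy the constraint, but the pair fitted to the two distributions keeps both machines silent with probability 1.0, against 0.8 for identical regions. This is the motivation for assigning each node its own region.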

• This problem is NP-hard even for the simplest case: two nodes, one-dimensional data (reducible to maximal biclique).

Making it work:

• Hierarchical clustering of nodes.
• Various computational tricks to quickly test the constraints and compute the target function.

• Violation recovery: find optimal pairs of nodes which “balance” each other