100
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University

Data Mining using Fractals and Power laws

Embed Size (px)

DESCRIPTION

Data Mining using Fractals and Power laws. Christos Faloutsos Carnegie Mellon University. Thank you!. Prof. Hsing-Kuo Kenneth PAO Prof. Yuh-Jye LEE Hsin Yeh. And also thanks to. Lei LI Leman AKOGLU Ian ROLEWICZ. Ching-Hao (Eric) MAO Ming-Kung (Morgan) SUN Yi-Ren (Ian) YEH. Overview. - PowerPoint PPT Presentation

Citation preview

Page 1: Data Mining using Fractals and  Power  laws

School of Computer ScienceCarnegie Mellon

Data Mining using Fractals and Power laws

Christos Faloutsos

Carnegie Mellon University

Page 2: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 2

School of Computer ScienceCarnegie Mellon

Thank you!

• Prof. Hsing-Kuo Kenneth PAO

• Prof. Yuh-Jye LEE

• Hsin Yeh

Page 3: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 3

School of Computer ScienceCarnegie Mellon

And also thanks to

Ching-Hao (Eric) MAO

Ming-Kung (Morgan) SUN

Yi-Ren (Ian) YEH

Lei LI

Leman AKOGLU

Ian ROLEWICZ

Page 4: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 4

School of Computer ScienceCarnegie Mellon

Overview

• Goals/ motivation: find patterns in large datasets:– (A) Sensor data– (B) network intrusion data

• Solutions: self-similarity and power laws

• Discussion

Page 5: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 5

School of Computer ScienceCarnegie Mellon

Applications of sensors/streams

• network monitoring

time

# alerts

Page 6: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 6

School of Computer ScienceCarnegie Mellon

Applications of sensors/streams

• Financial, sales, economic series

• Medical: ECGs +; blood pressure etc monitoring

Page 7: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 7

School of Computer ScienceCarnegie Mellon

Motivation - Applications• Scientific data: seismological;

astronomical; environment / anti-pollution; meteorological

Sunspot Data

0

50

100

150

200

250

300

Page 8: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 8

School of Computer ScienceCarnegie Mellon

Motivation - Applications (cont’d)

• Computer systems

– web servers (buffering, prefetching)

– ...

http://repository.cs.vt.edu/lbl-conn-7.tar.Z

Page 9: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 9

School of Computer ScienceCarnegie Mellon

Web traffic

• [Crovella Bestavros, SIGMETRICS’96]

Page 10: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 10

School of Computer ScienceCarnegie Mellon

...

survivable,self-managing storage

infrastructure

...

a storage brick(0.5–5 TB)~1 PB

“self-*” = self-managing, self-tuning, self-healing, …

Self-* Storage (Ganger+)

Page 11: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 11

School of Computer ScienceCarnegie Mellon

...

survivable,self-managing storage

infrastructure

...

a storage brick(0.5–5 TB)~1 PB

“self-*” = self-managing, self-tuning, self-healing, … Goal: 1 petabyte (PB) www.pdl.cmu.edu/SelfStar

Self-* Storage (Ganger+)

Page 12: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 12

School of Computer ScienceCarnegie Mellon

Problem definition

• Given: one or more sequences x1 , x2 , … , xt , …; (y1, y2, … , yt, …)

• Find – patterns; clusters; outliers; forecasts;

Page 13: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 13

School of Computer ScienceCarnegie Mellon

Problem

• Find patterns, in large datasets

time

# bytes

Page 14: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 14

School of Computer ScienceCarnegie Mellon

Problem

• Find patterns, in large datasets

time

# bytes

Poisson indep., ident. distr

Page 15: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 15

School of Computer ScienceCarnegie Mellon

Problem

• Find patterns, in large datasets

time

# bytes

Poisson indep., ident. distr

Page 16: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 16

School of Computer ScienceCarnegie Mellon

Problem

• Find patterns, in large datasets

time

# bytes

Poisson indep., ident. distr

Q: Then, how to generatesuch bursty traffic?

Page 17: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 17

School of Computer ScienceCarnegie Mellon

Solutions

• New tools: power laws, self-similarity and ‘fractals’ work, where traditional assumptions fail

• Let’s see the details:

Page 18: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 18

School of Computer ScienceCarnegie Mellon

Overview

• Goals/ motivation: find patterns in large datasets:– (A) Sensor data– (B) network data

• Solutions: self-similarity and power laws

• Discussion

Page 19: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 19

School of Computer ScienceCarnegie Mellon

What is a fractal?

= self-similar point set, e.g., Sierpinski triangle:

...zero area: (3/4)^inf

infinite length!

(4/3)^inf

Q: What is its dimensionality??

Page 20: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 20

School of Computer ScienceCarnegie Mellon

What is a fractal?

= self-similar point set, e.g., Sierpinski triangle:

...zero area: (3/4)^inf

infinite length!

(4/3)^inf

Q: What is its dimensionality??A: log3 / log2 = 1.58 (!?!)

Page 21: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 21

School of Computer ScienceCarnegie Mellon

Intrinsic (‘fractal’) dimension

• Q: fractal dimension of a line?

• Q: fd of a plane?

Page 22: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 22

School of Computer ScienceCarnegie Mellon

Intrinsic (‘fractal’) dimension

• Q: fractal dimension of a line?

• A: nn ( <= r ) ~ r^1(‘power law’: y=x^a)

• Q: fd of a plane?• A: nn ( <= r ) ~ r^2fd== slope of (log(nn) vs..

log(r) )

Page 23: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 23

School of Computer ScienceCarnegie Mellon

Sierpinsky triangle

log( r )

log(#pairs within <=r )

1.58

== ‘correlation integral’

= CDF of pairwise distances

Page 24: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 24

School of Computer ScienceCarnegie Mellon

Observations: Fractals <-> power laws

Closely related:

• fractals <=>

• self-similarity <=>

• scale-free <=>

• power laws ( y= xa ; F=K r-2)

• (vs y=e-ax or y=xa+b)log( r )

log(#pairs within <=r )

1.58

Page 25: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 25

School of Computer ScienceCarnegie Mellon

Outline

• Problems

• Self-similarity and power laws

• Solutions to posed problems

• Discussion

Page 26: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 26

School of Computer ScienceCarnegie Mellon

time

#bytes

Solution #1: traffic

• disk traces: self-similar: (also: [Leland+94])• How to generate such traffic?

Page 27: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 27

School of Computer ScienceCarnegie Mellon

Solution #1: traffic

• disk traces (80-20 ‘law’) – ‘multifractals’

time

#bytes

20% 80%

Page 28: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 28

School of Computer ScienceCarnegie Mellon

80-20 / multifractals20 80

Page 29: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 29

School of Computer ScienceCarnegie Mellon

80-20 / multifractals20

• p ; (1-p) in general

• yes, there are dependencies

80

Page 30: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 30

School of Computer ScienceCarnegie Mellon

More on 80/20: PQRS

• Part of ‘self-* storage’ project

time

cylinder#

Page 31: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 31

School of Computer ScienceCarnegie Mellon

More on 80/20: PQRS

• Part of ‘self-* storage’ project

p q

r s

q

r s

Page 32: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 32

School of Computer ScienceCarnegie Mellon

Overview

• Goals/ motivation: find patterns in large datasets:– (A) Sensor data

– (B) network data

• Solutions: self-similarity and power laws– sensor/traffic data

– network data

• Discussion

Page 33: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 33

School of Computer ScienceCarnegie Mellon

Problem dfn

<source-ip, target-ip, timestamp, alert-type>

eg., <192.168.2.5; 128.2.220.159; 3am june 6; ICMP-redirect-host>

goal: find patterns / anomalies

Page 34: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 34

School of Computer ScienceCarnegie Mellon

Power laws in intrusion data

rank

count

Page 35: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 35

School of Computer ScienceCarnegie Mellon

Page 36: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 36

School of Computer ScienceCarnegie Mellon

human-like

robot-like

robot-like (bursty)

Q: Can we visuallysummarize / classifyour sequences?

Page 37: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 37

School of Computer ScienceCarnegie Mellon

Answer: yes!

two features:

• F1: how periodic (24h-cycle) is a sequence

• F2: how bursty it is

Q: how to measure burstiness?

A: Fractal dimension!

Page 38: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 38

School of Computer ScienceCarnegie Mellon

Burstiness & f.d.

uniform: fd = 1

@same time-tick: fd = 0

Page 39: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 39

School of Computer ScienceCarnegie Mellon

Burstiness & f.d.

uniform: fd = 1

@same time-tick: fd = 0

bursts within bursts within bursts:0<fd<1

Page 40: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 40

School of Computer ScienceCarnegie Mellon

6. Proposed Methods: The FDP Plot

• Notice: clustering wrt alert types!

Page 41: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 41

School of Computer ScienceCarnegie Mellon

human-like

robot-like

robot-like (bursty)

can we visuallysummarize / classifyour sequences?

Page 42: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 42

School of Computer ScienceCarnegie Mellon

Exampleshuman-like behavior

Page 43: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 43

School of Computer ScienceCarnegie Mellon

Exampleshuman-like behavior

Page 44: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 44

School of Computer ScienceCarnegie Mellon

Exampleshuman-like behavior

Page 45: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 45

School of Computer ScienceCarnegie Mellon

Exampleshuman-like behavior

Page 46: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 46

School of Computer ScienceCarnegie Mellon

Outline

• problems

• Fractals

• Solutions

• Discussion – what else can they solve? – how frequent are fractals?

Page 47: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 47

School of Computer ScienceCarnegie Mellon

What else can they solve?

• separability [KDD’02]• forecasting [CIKM’02]• dimensionality reduction [SBBD’00]• non-linear axis scaling [KDD’02]• disk trace modeling [PEVA’02]• selectivity of spatial/multimedia queries

[PODS’94, VLDB’95, ICDE’00]• ...

Page 48: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 48

School of Computer ScienceCarnegie Mellon

Problem #3 - spatial d.m.

Galaxies (Sloan Digital Sky Survey w/ B. Nichol) - ‘spiral’ and ‘elliptical’

galaxies

- patterns? (not Gaussian; not uniform)

-attraction/repulsion?

- separability??

Page 49: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 49

School of Computer ScienceCarnegie Mellon

Solution#3: spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

- repulsion!

CORRELATION INTEGRAL!

Page 50: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 50

School of Computer ScienceCarnegie Mellon

Solution#3: spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

- repulsion!

[w/ Seeger, Traina, Traina, SIGMOD00]

Page 51: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 51

School of Computer ScienceCarnegie Mellon

Solution#3: spatial d.m.

r1r2

r1

r2

Heuristic on choosing # of clusters

Page 52: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 52

School of Computer ScienceCarnegie Mellon

Solution#3: spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

- repulsion!

Page 53: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 53

School of Computer ScienceCarnegie Mellon

Outline

• problems

• Fractals

• Solutions

• Discussion – what else can they solve? – how frequent are fractals?

Page 54: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 54

School of Computer ScienceCarnegie Mellon

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

• <and many-many more! see [Mandelbrot]>

Page 55: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 55

School of Computer ScienceCarnegie Mellon

Fractals: Brain scans

• brain-scans

octree levels

Log(#octants)

2.63 = fd

Page 56: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 56

School of Computer ScienceCarnegie Mellon

More fractals

• periphery of malignant tumors: ~1.5

• benign: ~1.3

• [Burdet+]

Page 57: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 57

School of Computer ScienceCarnegie Mellon

More fractals:

• cardiovascular system: 3 (!) lungs: ~2.9

Page 58: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 58

School of Computer ScienceCarnegie Mellon

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

Page 59: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 59

School of Computer ScienceCarnegie Mellon

More fractals:

• Coastlines: 1.2-1.58

1 1.1

1.3

Page 60: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 60

School of Computer ScienceCarnegie Mellon

Page 61: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 61

School of Computer ScienceCarnegie Mellon

More fractals:

• the fractal dimension for the Amazon river is 1.85 (Nile: 1.4)

[ems.gphys.unc.edu/nonlinear/fractals/examples.html]

Page 62: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 62

School of Computer ScienceCarnegie Mellon

More fractals:

• the fractal dimension for the Amazon river is 1.85 (Nile: 1.4)

[ems.gphys.unc.edu/nonlinear/fractals/examples.html]

Page 63: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 63

School of Computer ScienceCarnegie Mellon

Cross-roads of Montgomery county:

•any rules?

GIS points

Page 64: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 64

School of Computer ScienceCarnegie Mellon

GIS

A: self-similarity:• intrinsic dim. = 1.51

log( r )

log(#pairs(within <= r))

1.51

Page 65: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 65

School of Computer ScienceCarnegie Mellon

Examples:LB county

• Long Beach county of CA (road end-points)

1.7

log(r)

log(#pairs)

Page 66: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 66

School of Computer ScienceCarnegie Mellon

More power laws: areas – Korcak’s law

Scandinavian lakes

Any pattern?

Page 67: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 67

School of Computer ScienceCarnegie Mellon

More power laws: areas – Korcak’s law

Scandinavian lakes area vs complementary cumulative count (log-log axes)

log(count( >= area))

log(area)

Page 68: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 68

School of Computer ScienceCarnegie Mellon

More power laws: Korcak

Japan islands;

area vs cumulative count (log-log axes) log(area)

log(count( >= area))

Page 69: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 69

School of Computer ScienceCarnegie Mellon

More power laws

• Energy of earthquakes (Gutenberg-Richter law) [simscience.org]

log(count)

Magnitude = log(energy)day

Energy released

Page 70: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 70

School of Computer ScienceCarnegie Mellon

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

Page 71: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 71

School of Computer ScienceCarnegie Mellon

A famous power law: Zipf’s law

• Bible - rank vs. frequency (log-log)

log(rank)

log(freq)

“a”

“the”

“Rank/frequency plot”

Page 72: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 72

School of Computer ScienceCarnegie Mellon

TELCO data

# of service units

count ofcustomers

‘best customer’

Page 73: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 73

School of Computer ScienceCarnegie Mellon

SALES data – store#96

# units sold

count of products

“aspirin”

Page 74: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 74

School of Computer ScienceCarnegie Mellon

Olympic medals (Sidney’00, Athens’04):

log( rank)

log(#medals)

0

0.5

1

1.5

2

2.5

0 0.5 1 1.5 2

athens

sidney

Page 75: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 75

School of Computer ScienceCarnegie Mellon

Olympic medals (Sidney’00, Athens’04):

log( rank)

log(#medals)

0

0.5

1

1.5

2

2.5

0 0.5 1 1.5 2

athens

sidney

Page 76: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 76

School of Computer ScienceCarnegie Mellon

Even more power laws:

• Income distribution (Pareto’s law)• size of firms

• publication counts (Lotka’s law)

Page 77: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 77

School of Computer ScienceCarnegie Mellon

Even more power laws:

library science (Lotka’s law of publication count); and citation counts: (citeseer.nj.nec.com 6/2001)

log(#citations)

log(count)

Ullman

Page 78: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 78

School of Computer ScienceCarnegie Mellon

Even more power laws:

• web hit counts [w/ A. Montgomery]

Web Site Traffic

log(freq)

log(count)

Zipf“yahoo.com”

Page 79: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 79

School of Computer ScienceCarnegie Mellon

Autonomous systems• Power law in the degree distribution [SIGCOMM99]

internet domains

log(rank)

log(degree)

-0.82

att.com

ibm.com

Page 80: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 80

School of Computer ScienceCarnegie Mellon

The Peer-to-Peer Topology

• Number of immediate peers (= degree), follows a power-law

[Jovanovic+]

degree

count

Page 81: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 81

School of Computer ScienceCarnegie Mellon

epinions.com

• who-trusts-whom [Richardson + Domingos, KDD 2001]

(out) degree

count

Page 82: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 82

School of Computer ScienceCarnegie Mellon

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

Page 83: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 83

School of Computer ScienceCarnegie Mellon

Power laws, cont’d

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

log indegree

- log(freq)

from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]

Page 84: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 84

School of Computer ScienceCarnegie Mellon

Power laws, cont’d

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

log indegree

log(freq)

from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]

Page 85: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 85

School of Computer ScienceCarnegie Mellon

Power laws, cont’d

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

log indegree

log(freq)

Q: ‘how can we usethese power laws?’

Page 86: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 86

School of Computer ScienceCarnegie Mellon

“Foiled by power law”

• [Broder+, WWW’00]

(log) in-degree

(log) count

Page 87: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 87

School of Computer ScienceCarnegie Mellon

“Foiled by power law”

• [Broder+, WWW’00]

“The anomalous bump at 120on the x-axis is due a large clique formed by a single spammer”

(log) in-degree

(log) count

Page 88: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 88

School of Computer ScienceCarnegie Mellon

Power laws, cont’d

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

• length of file transfers [Crovella+Bestavros ‘96]

• duration of UNIX jobs

Page 89: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 89

School of Computer ScienceCarnegie Mellon

Conclusions

• Fascinating problems in Data Mining: find patterns in streams & network data

Page 90: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 90

School of Computer ScienceCarnegie Mellon

Conclusions - cont’d

New tools for Data Mining: self-similarity & power laws: appear in many cases

Bad news:

lead to skewed distributions

(no Gaussian, Poisson,

uniformity, independence,

mean, variance)

Good news:• ‘correlation integral’

for separability• rank/frequency plots• 80-20 (multifractals)• (Hurst exponent, • strange attractors,• renormalization theory, • ++)

Page 91: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 91

School of Computer ScienceCarnegie Mellon

Resources

• Manfred Schroeder “Chaos, Fractals and Power Laws”, 1991

Page 92: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 92

School of Computer ScienceCarnegie Mellon

References

• [vldb95] Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p. 299-310, 1995

• [Broder+’00] Andrei Broder, Ravi Kumar , Farzin Maghoul1, Prabhakar Raghavan , Sridhar Rajagopalan , Raymie Stata, Andrew Tomkins , Janet Wiener, Graph structure in the web , WWW’00

• M. Crovella and A. Bestavros, Self similarity in World wide web traffic: Evidence and possible causes , SIGMETRICS ’96.

Page 93: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 93

School of Computer ScienceCarnegie Mellon

References

• J. Considine, F. Li, G. Kollios and J. Byers, Approximate Aggregation Techniques for Sensor Databases (ICDE’04, best paper award).

• [pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13

Page 94: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 94

School of Computer ScienceCarnegie Mellon

References

• [vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’ Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept. 1996.

• [sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000

Page 95: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 95

School of Computer ScienceCarnegie Mellon

References

• [vldb96] Christos Faloutsos and Volker Gaede Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension VLD, Bombay, India, Sept. 1996

• [sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999

Page 96: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 96

School of Computer ScienceCarnegie Mellon

References

• [Leskovec 05] Jure Leskovec, Jon M. Kleinberg, Christos Faloutsos: Graphs over time: densification laws, shrinking diameters and possible explanations. KDD 2005: 177-187

Page 97: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 97

School of Computer ScienceCarnegie Mellon

References

• [ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.

• [brite] Alberto Medina, Anukool Lakhina, Ibrahim Matta, and John Byers. BRITE: An Approach to Universal Topology Generation. MASCOTS '01

Page 98: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 98

School of Computer ScienceCarnegie Mellon

References

• [icde99] Guido Proietti and Christos Faloutsos, I/O complexity for range queries on region data stored using an R-tree (ICDE’99)

• Stan Sclaroff, Leonid Taycher and Marco La Cascia , "ImageRover: A content-based image browser for the world wide web" Proc. IEEE Workshop on Content-based Access of Image and Video Libraries, pp 2-9, 1997.

Page 99: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 99

School of Computer ScienceCarnegie Mellon

References

• [kdd2001] Agma J. M. Traina, Caetano Traina Jr., Spiros Papadimitriou and Christos Faloutsos: Tri-plots: Scalable Tools for Multidimensional Data Mining, KDD 2001, San Francisco, CA.

Page 100: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 100

School of Computer ScienceCarnegie Mellon

Thank you!

Contact info:christos <at> cs.cmu.edu

www. cs.cmu.edu /~christos

(w/ papers, datasets, code for fractal dimension estimation, etc)