41
CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

Embed Size (px)

Citation preview

Page 1: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826: Multimedia Databases and Data Mining

Lecture #12: Fractals - case studies Part III

(quadtrees, knn queries)

C. Faloutsos

Page 2: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 2

Must-read Material

• Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p. 299-310, 1995

Page 3: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 3

Optional Material

Optional, but very useful: Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991

Page 4: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 4

Outline

Goal: ‘Find similar / interesting things’

• Intro to DB

• Indexing - similarity search

• Data Mining

Optional

Page 5: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 5

Indexing - Detailed outline• primary key indexing• secondary key / multi-key indexing• spatial access methods

– z-ordering– R-trees– misc

• fractals– intro– applications

• text• ...

Optional

Page 6: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 6

Indexing - Detailed outline

• fractals– intro

– applications• disk accesses for R-trees (range queries)

• dimensionality reduction

• dim. curse revisited

• quad-tree analysis [Gaede+]

• nn queries [Belussi+]

Optional

✔✔

Page 7: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 7

Fractals and Quadtrees

• Problem: how many quadtree nodes will we need, to store a region in some level of approximation? [Gaede+96]

Optional

Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-ordering Method Using the Hausdorff Fractal Dimension. VLDB, Bombay, India.

Page 8: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 8

Fractals and Quadtrees

• I.e.:

Optional

Page 9: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 9

Fractals and Quadtrees

• I.e.:

level of quadtree

# of quadtree‘blocks’

(= # gray nodes)

?

Optional

Page 10: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 10

Fractals and Quadtrees

• Datasets:

FranconiaBrain Atlas

Optional

Page 11: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 11

Fractals and Quadtrees

• Hint:– assume that the boundary is self-similar, with a

given fd– how will the quad-tree (oct-tree) look like?

Optional

Page 12: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 12

Fractals and Quadtrees

white

black

gray

Optional

Page 13: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 13

Fractals and QuadtreesLet pg(i) the prob. to find a gray node at level i.

If self-similar, what can we say for pg(i) ?

Optional

Page 14: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 14

Fractals and QuadtreesLet pg(i) the prob. to find a gray node at level i.

If self-similar, what can we say for pg(i) ?

A: pg(i) = pg= constant

Optional

Page 15: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 15

Fractals and QuadtreesAssume only ‘gray’ and ‘white’ nodes (ie., no volume’)

Assume that pg is given - how many gray nodes at level i?

Optional

Page 16: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 16

Fractals and QuadtreesAssume only ‘gray’ and ‘white’ nodes (ie., no volume’)

Assume that pg is given - how many gray nodes at level i?

A: 1 at level 0;

4*pg

(4*pg)* (4*pg)

...

(4*pg ) i

Optional

Page 17: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 17

Fractals and Quadtrees

• I.e.:

level of quadtree (‘i’)

# of quadtree‘blocks’

? (4*pg) i

Optional

Page 18: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 18

Fractals and Quadtrees

• I.e.:

level of quadtree

log(# of quadtree‘blocks’)

? log[(4*pg) i]

Optional

Page 19: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 19

Fractals and Quadtrees

• Conclusion: Self-similarity leads to easy and accurate estimation

level

log2(#blocks)

Optional

Page 20: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 20

Fractals and Quadtrees

• Conclusion: Self-similarity leads to easy and accurate estimation

level

log2(#blocks)

Optional

Page 21: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 21

Fractals and QuadtreesOptional

Page 22: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 22

Fractals and Quadtrees

level

log(#blocks)

Optional

Page 23: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 23

Fractals and Quadtrees

• Final observation: relationship between pg and fractal dimension?

Optional

Page 24: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 24

Fractals and Quadtrees

• Final observation: relationship between pg and fractal dimension?

• A: very close:(4*pg)i = # of gray nodes at level i =

# of Hausdorff grid-cells of side (1/2)i = r

Eventually: DH = 2 + log2( pg )

and, for E-d spaces: DH = E + log2( pg )

Optional

Page 25: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 25

Fractals and Quadtrees

for E-d spaces: DH = E + log2( pg )

Sanity check:

- point in 2-d: DH = 0 pg = ??

- line in 2-d: DH = 1 pg = ??

- plane in 2-d: DH=2 pg = ??

- point in 3-d: DH = 0 pg = ??

Optional

Page 26: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 26

Fractals and Quadtrees

for E-d spaces: DH = E + log2( pg )

Sanity check:

- point in 2-d: DH = 0 pg = 1/4

- line in 2-d: DH = 1 pg = 1/2

- plane in 2-d: DH=2 pg= 1

- point in 3-d: DH = 0 pg = 1/8

Optional

Page 27: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 27

Fractals and Quadtrees

Final conclusions:

• self-similarity leads to estimates for # of z-values = # of quadtree/oct-tree blocks

• close dependence on the Hausdorff fractal dimension of the boundary

Optional

Page 28: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 28

Indexing - Detailed outline

• fractals– intro

– applications• disk accesses for R-trees (range queries)

• dimensionality reduction

• dim. curse revisited

• quad-tree analysis [Gaede+]

• nn queries [Belussi+]

Optional

✔✔

Page 29: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 29

NN queries

• Q: in NN queries, what is the effect of the shape of the query region? [Belussi+95]

rLinf

L2

L1

Page 30: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 30

NN queries• Q: in NN queries, what is the effect of the

shape of the query region?• that is, for L2, and self-similar data:

log(d)

log(#pairs-within(<=d))

r L2

D2

Page 31: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 31

NN queries• Q: What about L1, Linf?

log(d)

log(#pairs-within(<=d))

r L2

D2

Page 32: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 32

NN queries• Q: What about L1, Linf?• A: Same slope, different intercept

log(d)

log(#pairs-within(<=d))

r L2

D2

Page 33: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 33

NN queries• Q: What about L1, Linf?• A: Same slope, different intercept

log(d)

log(#neighbors)

Page 34: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 34

NN queries• Q: what about the intercept? Ie., what can

we say about N2 and Ninf

r L2

N2 neighbors

rLinf

Ninf neighbors

volume: V2 volume: Vinf

SKIPOptional

Page 35: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 35

NN queries• Consider sphere with volume Vinf and r’

radius

r L2

N2 neighbors

rLinf

Ninf neighbors

volume: V2 volume: Vinf

r’

SKIPOptional

Page 36: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 36

NN queries• Consider sphere with volume Vinf and r’

radius• (r/r’)^E = V2 / Vinf

• (r/r’)^D2 = N2 / N2’• N2’ = Ninf (since shape does not matter)• and finally:

SKIPOptional

Page 37: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 37

NN queries ( N2 / Ninf ) ^ 1/D2 = (V2 / Vinf) ^ 1/E

SKIPOptional

Page 38: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 38

NN queriesConclusions: for self-similar datasets• Avg # neighbors: grows like (distance)^D2 ,

regardless of query shape (circle, diamond, square, e.t.c. )

Optional

Page 39: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 39

Indexing - Detailed outline• fractals

– intro– applications

• disk accesses for R-trees (range queries)• dimensionality reduction• dim. curse revisited• quad-tree analysis [Gaede+]• nn queries [Belussi+]

– Conclusions

Optional

Page 40: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 40

Fractals - overall conclusions

• self-similar datasets: appear often• powerful tools: correlation integral, NCDF,

rank-frequency plot• intrinsic/fractal dimension helps in

– estimations (selectivities, quadtrees, etc)– dim. reduction / dim. curse

• (later: can help in image compression...)

Page 41: CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

CMU SCS

15-826 Copyright: C. Faloutsos (2014) 41

References

1. Belussi, A. and C. Faloutsos (Sept. 1995). Estimating the

Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. Proc. of VLDB, Zurich, Switzerland.

2. Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-ordering Method Using the Hausdorff Fractal Dimension. VLDB, Bombay, India.

3. Proietti, G. and C. Faloutsos (March 23-26, 1999). I/O complexity for range queries on region data stored using an R-tree. International Conference on Data Engineering (ICDE), Sydney, Australia.