Topological Data Analysis: visual presentation of multidimensional data sets

DESCRIPTION

Topological data analysis (TDA) is an unsupervised approach that may revolutionise the way data is mined and could drive a new generation of analytical tools. The idea behind TDA is to "measure" the shape of data and to find a compressed combinatorial representation of that shape. As in ordinary topology, the combinatorial representation compresses a high-dimensional data set while retaining information about the geometric relationships between data points. TDA can also be used as a powerful clustering technique. Edward will present a comparison between TDA and other dimension reduction algorithms such as PCA, LLE, Isomap, MDS, and Spectral Embedding.

Page 1: Topological Data Analysis: visual presentation of multidimensional data sets

Topological Data Analysis

Visual presentation of multidimensional data sets

Page 2

Current vs New

SQL

Topological Data Analysis

Page 3

Topology

The Seven Bridges of Königsberg, a problem solved by Leonhard Euler (1736).

The study of qualitative properties of certain objects (topological spaces) that are invariant under a certain kind of transformation (continuous map), especially those properties that are invariant under a certain kind of equivalence (homeomorphism).
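Euler's argument for the bridges can be checked in a few lines. The sketch below is illustrative only: the labels A to D for the four land masses and the function names are assumptions, and the graph is assumed to be connected. It relies on the standard result that a connected multigraph admits a walk crossing every edge exactly once only if it has zero or two vertices of odd degree.

```python
from collections import Counter

# The four land masses (labels A-D are illustrative) and the seven bridges
# between them, written as the edge list of a multigraph.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges from the island A to bank B
    ("A", "C"), ("A", "C"),  # two bridges from the island A to bank C
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def odd_degree_vertices(edges):
    """Return the sorted list of vertices touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """True if a connected multigraph has a walk using every edge exactly once."""
    return len(odd_degree_vertices(edges)) in (0, 2)


print(odd_degree_vertices(BRIDGES))  # all four land masses have odd degree
print(has_euler_walk(BRIDGES))       # so no such walk exists
```

Only the connection pattern matters here, not distances or positions, which is what makes the problem topological.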

Page 4

Topological Data Analysis Pipeline

a. First, approximate the unknown space X by a combinatorial structure K.

b. Then compute topological invariants of K.
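The two steps above can be sketched on a toy point cloud. This is a minimal illustration, not the presenter's implementation: K is taken to be a simple neighbourhood graph (points joined when they are at most `eps` apart), and the invariant is the number of connected components. The names `neighborhood_graph`, `count_components`, and the threshold `eps` are assumptions made for the example.

```python
import math


def neighborhood_graph(points, eps):
    """Step a: approximate the space by a graph K joining points within eps."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) <= eps]


def count_components(n, edges):
    """Step b: compute one invariant of K, its number of connected components."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components


# Two well-separated groups of three points each.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
K = neighborhood_graph(cloud, eps=0.5)
b0 = count_components(len(cloud), K)
print(b0)
```

The result depends on the chosen scale: a very small `eps` leaves every point isolated, and a very large one merges everything into a single component.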

Page 5

Combinatorial Representations

The Čech Complex
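The slide only names the Čech complex, so the following is a sketch under the usual definition: a set of points spans a simplex when the closed balls of radius r around them have a common point. For three points in the plane this is equivalent to the smallest enclosing circle having radius at most r. The function names are hypothetical, and the pairwise (Vietoris-Rips) test is included only for contrast.

```python
import math


def min_enclosing_radius(a, b, c):
    """Radius of the smallest circle containing three points in the plane."""
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    longest = max(ab, bc, ca)
    # Right, obtuse, or degenerate triangle: the circle whose diameter is the
    # longest side already contains the third point.
    if 2 * longest ** 2 >= ab ** 2 + bc ** 2 + ca ** 2:
        return longest / 2
    # Acute triangle: circumradius R = abc / (4 * area), area by Heron's formula.
    s = (ab + bc + ca) / 2
    area = math.sqrt(s * (s - ab) * (s - bc) * (s - ca))
    return ab * bc * ca / (4 * area)


def in_cech(a, b, c, r):
    """Triangle belongs to the Čech complex: the three r-balls share a point."""
    return min_enclosing_radius(a, b, c) <= r


def in_rips(a, b, c, r):
    """Triangle belongs to the Vietoris-Rips complex: the r-balls meet pairwise."""
    return max(math.dist(a, b), math.dist(b, c), math.dist(c, a)) <= 2 * r


# Equilateral triangle with side 1; its circumradius is 1 / sqrt(3), about 0.577.
tri = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))
print(in_rips(*tri, r=0.55), in_cech(*tri, r=0.55))
print(in_rips(*tri, r=0.60), in_cech(*tri, r=0.60))
```

At r = 0.55 the balls intersect pairwise but have no common point, so the triangle is filled in by the pairwise test and left hollow by the Čech test.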

Page 6

Combinatorial Representations

Alpha Complex

Vietoris-Rips Complex

Cubical Complex

Witness Complex
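Of the complexes listed, the Vietoris-Rips complex is the simplest to build directly, because it needs only pairwise distances. A minimal sketch, assuming `eps` is the pairwise distance threshold (twice the ball radius) and that simplices are stored as sorted tuples of point indices; the function name and the `max_dim` cut-off are illustrative choices.

```python
import math
from itertools import combinations


def vietoris_rips(points, eps, max_dim=2):
    """Return {dimension: [simplices]} for all simplices with diameter <= eps."""
    n = len(points)
    close = [[math.dist(points[i], points[j]) <= eps for j in range(n)]
             for i in range(n)]
    complex_ = {}
    for dim in range(max_dim + 1):
        # A simplex is included when every pair of its vertices is close.
        complex_[dim] = [s for s in combinations(range(n), dim + 1)
                         if all(close[i][j] for i, j in combinations(s, 2))]
    return complex_


# Unit square: sides have length 1, diagonals have length sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
small = vietoris_rips(square, eps=1.1)   # sides only: a hollow cycle
large = vietoris_rips(square, eps=1.5)   # diagonals join in, triangles appear
print(len(small[1]), len(small[2]))
print(len(large[1]), len(large[2]))
```

This brute-force enumeration examines every subset of up to three points, so it is only practical for small point clouds.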

Page 7

Topological Invariants

A topological invariant is a map f that assigns the same object to homeomorphic spaces, that is:

$X \cong Y \implies f(X) = f(Y)$

Homology is a machine that converts local data about a space into global algebraic structure.

Reference: Wikipedia, 2010.
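As a concrete instance of homology turning local data (which simplices are present) into a global summary, the sketch below computes Betti numbers of a small simplicial complex. It is a simplified illustration with several assumptions: coefficients are taken in GF(2) so that ranks can be computed with XOR, the input must list every face of every simplex, and the function names are hypothetical.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # position of leading bit -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]
    return len(basis)


def betti_numbers(simplices):
    """Betti numbers [b0, b1, ...] of a complex given as a list of all simplices."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    # ranks[p] is the rank of the boundary map from p-chains to (p-1)-chains.
    ranks = {0: 0, top + 1: 0}
    for p in range(1, top + 1):
        index = {s: k for k, s in enumerate(by_dim[p - 1])}
        rows = []
        for s in by_dim[p]:
            mask = 0
            for face in combinations(s, p):  # the (p-1)-dimensional faces of s
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[p] = rank_gf2(rows)
    # b_p = dim(p-chains) - rank(boundary_p) - rank(boundary_{p+1})
    return [len(by_dim[p]) - ranks[p] - ranks[p + 1] for p in range(top + 1)]


hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
filled = hollow + [(0, 1, 2)]
print(betti_numbers(hollow))  # one component, one loop
print(betti_numbers(filled))  # one component, the loop is filled in
```

Here b0 counts connected components and b1 counts independent loops, so the hollow and filled triangles are distinguished by b1 alone.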

Page 8

Morse Theory and Reeb Graph

Theorem: Suppose $h : X \to \mathbb{R}$ is a discrete Morse function. Then X is homotopy equivalent to a CW-complex with exactly one cell of dimension p for each critical simplex of dimension p.

Reference: Teng Ma, Zhuangzhi Wu, Pei Luo, Lu Feng. Reeb graph computation through spectral clustering, 2011.
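The theorem can be illustrated on a hollow triangle, which is a triangulated circle. A discrete Morse function is represented here by its gradient pairing, where each pair matches a simplex with one of its cofaces of one dimension higher; unpaired simplices are the critical ones. This is a sketch with assumptions: the pairing is supplied by hand, the code does not verify that it is acyclic (which a valid gradient must be), and all names are hypothetical.

```python
def critical_cells(simplices, pairs):
    """Count the unpaired (critical) simplices in each dimension."""
    paired = set()
    for face, coface in pairs:
        paired.add(frozenset(face))
        paired.add(frozenset(coface))
    counts = {}
    for s in simplices:
        if frozenset(s) not in paired:
            dim = len(s) - 1
            counts[dim] = counts.get(dim, 0) + 1
    return counts


def euler_characteristic(counts):
    """Alternating sum of cell counts indexed by dimension."""
    return sum((-1) ** dim * m for dim, m in counts.items())


# Hollow triangle on vertices a, b, c.
circle = [("a",), ("b",), ("c",), ("a", "b"), ("b", "c"), ("a", "c")]
# Hand-picked gradient: b is paired with edge ab, c with edge bc.
gradient = [(("b",), ("a", "b")), (("c",), ("b", "c"))]

cells = critical_cells(circle, gradient)
print(cells)                        # critical vertex a and critical edge ac
print(euler_characteristic(cells))  # matches 3 vertices - 3 edges = 0
```

One critical vertex and one critical edge correspond to a CW-complex with a single 0-cell and a single 1-cell, which is the usual minimal cell structure of a circle.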

Page 9

Case study: Demographics

Data shape: [220:45]

Page 10

Case study: YT channel stats

Data shape: [1500:12]

Page 11

Case study: Netflix dataset

Data shape: [17770:480189], about 8.5 billion elements
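The element count follows directly from the shape. The short calculation below also estimates what a dense table of that size would occupy at one byte per cell. The figure of 100,480,507 known ratings is an assumption: it is the count usually quoted for the Netflix Prize training set, whose dimensions match the shape on the slide, but the slide itself does not say which Netflix data set was used.

```python
rows, cols = 17770, 480189      # shape quoted on the slide

cells = rows * cols             # total number of elements
dense_gib = cells / 2 ** 30     # size of a dense table at one byte per cell

# Assumption: number of ratings in the Netflix Prize training set.
known_ratings = 100_480_507
fill_fraction = known_ratings / cells

print(f"{cells:,} elements")
print(f"{dense_gib:.1f} GiB at one byte per element")
print(f"{fill_fraction:.2%} of cells filled, under the stated assumption")
```

If that assumption holds, only about one cell in a hundred carries a rating, so a sparse representation would be the natural choice.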

Page 12

Case study: Netflix dataset

Cluster labels shown in the figure: Music, Indian, Anime, French, Hong Kong, US Cartoons, Kids Movie, German, US Retro, Horror

Page 13

Case study: Netflix comparison

Panels shown in the figure: PCA, Isomap, LLE, Spectral Embedding, LTSA, Hessian LLE

Page 14

Case study: Netflix (music)

Page 15

Case study: Netflix (kids movie)

Page 16

Case study: Netflix (horror)

Page 17

Please sign up for free beta access: www.datarefiner.com

Page 18

Questions?