TOPOLOGICAL CHARACTERIZATION OF AN IMAGE DATASET WITH BETTI NUMBERS AND A GENERATIVE MODEL. Maxime MAILLOT (Exalead) Michaël AUPETIT (CEA LIST) Gérard GOVAERT (UTC-CNRS) DataSense | 08-07-2014


TOPOLOGICAL CHARACTERIZATION OF AN

IMAGE DATASET WITH BETTI NUMBERS AND A

GENERATIVE MODEL.

Maxime MAILLOT (Exalead)

Michaël AUPETIT (CEA LIST)

Gérard GOVAERT (UTC-CNRS)

DataSense | 08-07-2014

Page 2: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

Context

• Multivariate data exploration
  • Signals, images, …
• Classical ML techniques
  • Clustering: K-Means, Gaussian Mixture Models -> convex clusters
  • Dimension reduction: Self-Organizing Maps, MDS, PCA -> dimension-reduction artefacts imposed by the representation space
• Topological information (from the underlying structure):
  • Number of connected components
  • Intrinsic dimension
  • Topological invariants (Betti numbers)

Page 3: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

WHY TOPOLOGICAL INFORMATION?

Cognition and topology

The neuronal encoding of topological information survived Darwinian natural selection, which shows how important this information is to our cognitive processes.

Retinotopic map of a mouse [Hübener 2003]

Page 4: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

WHY TOPOLOGICAL INFORMATION?

Topology and visual perception

Gestalt psychological theory [1920]
• The whole is more than the sum of its parts
• Laws of continuity, proximity, similarity

• Topological view: underlying structure
• Statistical view: underlying density
• Geometrical view: point locations or underlying shapes

Predictive model: our visual system instantly provides a topological model of the population.

Descriptive model: the sample is enough; no hypothesis is made about the population underlying the data.

Page 5: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

WHY TOPOLOGICAL INFORMATION?

Mental map and topology

Topological invariants as an objective representation

• Objective map M of a building B
• Subjective map M1 of B
• Subjective map M2 of B

However radically the perception process and experience of each person differ, a topological invariant remains common to both persons’ mental models and to the real building’s map: they share the same connectedness.

Page 6: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

WHY TOPOLOGICAL INFORMATION?

Patterns reliability and topology

A large family of transformations:
Isometries ⊂ Similarities ⊂ Homeomorphisms ⊂ Homotopies

[Figure labels: initial space; geometry and probability density functions; Betti numbers and intrinsic dimension]

Reliability
- The processing pipeline from data to decision is more likely to be a homotopy than a more restrictive transformation such as an isometry
- So topological information is more likely to survive the distortions of the pipeline
- Hence topological information is a more reliable basis for decisions under uncertainty

Page 7: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

SOME HINTS ABOUT TOPOLOGY

Topology in a nutshell

What is the difference between a mug and a doughnut?


Taste is significantly different!

Page 9: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

SOME HINTS ABOUT TOPOLOGY

Topological invariants

Two spaces have the same topology iff they are homeomorphic to each other, i.e. they are linked through a continuous bijection H whose inverse H^-1 is also continuous.

Topology classifies spaces based on their topological invariants, such as the Betti numbers.

• 1-cycle which can contract to a point
• 1-cycles which cannot contract to a point
• The blue and brown 1-cycles cannot be deformed into each other. They generate a homology group of rank 2 (b1 = 2).

(b0, b1, b2) = (1, 2, 1)
• b0: # of connected components
• b1: # of independent 1-cycles (tunnels)
• b2: # of independent 2-cycles (cavities)

[Figure: topological inference from measures. Sample of a robot’s trajectory in the sensor space (sensors 1, 2 and 3); image of walls 1 and 2 in the robot-to-sensors distance space.]

Page 10: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

FROM SETS OF POINTS TO BETTI NUMBERS

SIMPLICIAL COMPLEX

Simplex family: 0-simplex (point), 1-simplex (edge), 2-simplex (triangle), 3-simplex (tetrahedron)

Simplex assembly

Page 11: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

FROM SETS OF POINTS TO BETTI NUMBERS

For any smooth manifold V there exists a simplicial complex C that is homeomorphic to V (C(V) is a triangulation of V).

Two triangulations may have the same Betti numbers while their manifolds are not homeomorphic.

Simplicial complex -> computational topology -> Betti numbers
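The step from a simplicial complex to its Betti numbers can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it uses the standard relation b_d = dim ker(∂_d) − rank(∂_{d+1}) with boundary matrices reduced over GF(2), a coefficient choice assumed here for simplicity (it can differ from rational coefficients on non-orientable spaces such as the Klein bottle). The function names `rank_gf2` and `betti_numbers` are hypothetical.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # cancel the leading bit and keep reducing
    return len(basis)


def betti_numbers(top_simplices):
    """Betti numbers (b0, b1, ...) of the complex spanned by the given simplices."""
    # Close the input under taking faces so that it is a simplicial complex.
    simplices = set()
    for simplex in top_simplices:
        vertices = tuple(sorted(simplex))
        for size in range(1, len(vertices) + 1):
            simplices.update(combinations(vertices, size))
    max_dim = max(len(s) for s in simplices) - 1
    by_dim = [sorted(s for s in simplices if len(s) == d + 1)
              for d in range(max_dim + 1)]

    # ranks[d] is the rank of the boundary map from d-simplices to (d-1)-simplices.
    ranks = [0] * (max_dim + 2)
    for d in range(1, max_dim + 1):
        index = {s: i for i, s in enumerate(by_dim[d - 1])}
        rows = []
        for simplex in by_dim[d]:
            row = 0
            for facet in combinations(simplex, d):
                row |= 1 << index[facet]
            rows.append(row)
        ranks[d] = rank_gf2(rows)

    # b_d = dim ker(boundary_d) - rank(boundary_{d+1})
    return tuple(len(by_dim[d]) - ranks[d] - ranks[d + 1]
                 for d in range(max_dim + 1))


# Hollow tetrahedron: a triangulation of the sphere.
sphere = list(combinations(range(4), 3))

# 3 x 3 periodic grid, each square cut into two triangles: a triangulation of the torus.
n = 3
torus = []
for i in range(n):
    for j in range(n):
        a, b = (i, j), ((i + 1) % n, j)
        c, d = (i, (j + 1) % n), ((i + 1) % n, (j + 1) % n)
        torus += [(a, b, d), (a, c, d)]

print(betti_numbers(sphere))  # (1, 0, 1)
print(betti_numbers(torus))   # (1, 2, 1)
```

The two toy complexes reproduce the (1, 0, 1) and (1, 2, 1) signatures quoted in these slides for the sphere and the torus.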

Page 12: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

FROM SETS OF POINTS TO BETTI NUMBERS

Vietoris-Rips complex and Betti numbers

[Figure: four points a, b, c, d with pairwise distances of 8 and 10, drawn for R = 11 and R = 9]

[Figure, after [Chazal]: as R grows, (b0, b1, b2) goes through (N, 0, 0), (37, 6, 0), (1, 2, 0) and (1, 0, 0)]

Topological persistence and multiscale analytics = persistence of topological structure through scale
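How the Betti numbers change with the scale R can be sketched on a four-point toy set. The sketch below is an illustration under stated assumptions, not the construction used in the slides: two points are linked when their distance is at most R (conventions differ on R versus 2R), the complex is built only up to triangles (enough for b0 and b1), ranks are taken over GF(2), and the rhombus coordinates are made up so that the sides measure 8 and one diagonal measures 10. The names `rank_gf2` and `rips_betti` are hypothetical.

```python
import math
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = {}
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]
    return len(basis)


def rips_betti(points, R):
    """(b0, b1) of the Vietoris-Rips complex at scale R, built up to triangles."""
    n = len(points)
    # Edge when two points are at distance <= R (convention assumed here).
    edges = [e for e in combinations(range(n), 2)
             if math.dist(points[e[0]], points[e[1]]) <= R]
    edge_index = {e: k for k, e in enumerate(edges)}
    # Triangle when its three edges are all present (clique rule).
    triangles = [t for t in combinations(range(n), 3)
                 if all(pair in edge_index for pair in combinations(t, 2))]
    rank_d1 = rank_gf2([(1 << i) | (1 << j) for i, j in edges])
    rank_d2 = rank_gf2([sum(1 << edge_index[pair] for pair in combinations(t, 2))
                        for t in triangles])
    b0 = n - rank_d1
    b1 = len(edges) - rank_d1 - rank_d2
    return b0, b1


h = math.sqrt(39.0)  # makes the four sides 8 long and the a-c diagonal 10 long
points = [(-5.0, 0.0), (0.0, h), (5.0, 0.0), (0.0, -h)]  # a, b, c, d
for R in (7, 9, 11):
    print(R, rips_betti(points, R))
```

With these coordinates the four points are isolated at R = 7, form a single 1-cycle at R = 9, and the cycle is filled by two triangles at R = 11, which is the kind of scale dependence that persistence tracks.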

Page 13: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

RESTRICTED DELAUNAY COMPLEX

From manifold to triangulation [Edelsbrunner, Shah 1997]

[Figure: manifolds M1 and M2]


Page 15: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

RESTRICTED DELAUNAY COMPLEX

Alpha-shapes

Molecules topology [Edelsbrunner 1994]

Manifold = union of spheres centered on the atoms’ cores (alpha sets the spheres’ radius)

Page 16: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

TOPOLOGY REPRESENTING NETWORKS

Topology Representing Network [Martinetz, Schulten 1994]

Competitive Hebbian Learning (CHL): for each data point, connect its 1st and 2nd nearest-neighbor prototypes.
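The CHL rule stated above fits in a few lines. The sketch below is a minimal batch version written for illustration: it assumes fixed prototypes, Euclidean distance and at least two prototypes, and the function name `competitive_hebbian_learning` is hypothetical.

```python
import math


def competitive_hebbian_learning(data, prototypes):
    """Return the set of edges (i, j), i < j, linking prototypes that are the
    1st and 2nd nearest prototypes of at least one data point."""
    edges = set()
    for x in data:
        # Prototype indices sorted by increasing distance to the data point.
        order = sorted(range(len(prototypes)),
                       key=lambda j: math.dist(x, prototypes[j]))
        first, second = order[0], order[1]
        edges.add((min(first, second), max(first, second)))
    return edges


prototypes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
data = [(0.4, 0.0), (1.6, 0.0)]
print(sorted(competitive_hebbian_learning(data, prototypes)))  # [(0, 1), (1, 2)]
```

No data point has prototypes 0 and 2 as its two nearest, so that edge is never created: the graph only links prototypes whose neighbourhood is supported by data.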

Page 17: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

[Figure: the 1st and 2nd nearest prototypes of a data point]

ROI = order-2 Voronoi cells

Page 21: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

TOPOLOGY REPRESENTING NETWORKS

Order-2 Voronoi cells

[Figure panels: no noise; sample with Gaussian noise]

Page 22: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

A GENERATIVE MODEL APPROACH

When a Statistician meets a Topologist…

What is the probability of a HEAD if you flip a coin cut out of a Moebius strip?

Page 23: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

HEAD or TAIL?

P( HEAD ) = ?

P( HEADACHE ) = 1

Page 25: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

GENERATIVE GRAPH [GAILLARD 2010]

Statistical generative model – Where do the data come from?

Topological inference: from the sample to the population

• Unknown generative manifolds, with possibly different topologies, different labels, and possibly overlapping…
• …from which samples are drawn with unknown probability density…
• …corrupted with unknown noise…
• …leading to the actual data observations.

Page 26: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

GENERATIVE GRAPH [GAILLARD 2010]

Statistical generative model – General hypotheses

Unknown generative manifolds, with possibly different topologies, different labels, and possibly overlapping… from which samples are drawn with unknown probability density… corrupted with unknown noise… leading to the actual data observations.

Page 27: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

GENERATIVE GRAPH [GAILLARD 2010]

Statistical generative model – Simplified hypotheses

Unknown generative manifolds… from which samples are drawn with unknown probability density… corrupted with unknown noise…

Page 28: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

GENERATIVE GRAPH [GAILLARD 2010]

Generative Gaussian Graph (GGG) – Simplified hypotheses

• Unknown generative manifolds…: Delaunay graph of some prototypes, with class label probabilities (p, 1 − p)
• …from which samples are drawn with unknown probability density…: uniform density over each topological component (vertices and edges)
• …corrupted with unknown noise…: Gaussian noise with identity covariance

$p(x, c) = \sum_{j \in J} p(j)\, p(c \mid j)\, p(x \mid j, \sigma)$

with $\sum_{c} p(c \mid j) = 1$, $0 \le p(c \mid j) \le 1$, $\sum_{j} p(j) = 1$, $0 \le p(j) \le 1$.
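For an edge component, combining a uniform density along the segment with isotropic Gaussian noise gives a closed-form density: a Gaussian in the directions orthogonal to the segment, times a difference of normal cumulative distribution functions along it. The sketch below derives from those two hypotheses only and is not taken from the authors' code; the function name `gaussian_segment_density` and the scalar noise parameter `sigma` are assumptions made for the illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_segment_density(x, a, b, sigma):
    """Density at x of a point drawn uniformly on segment [a, b] and then
    perturbed by isotropic Gaussian noise of standard deviation sigma."""
    dim = len(x)
    ab = [bi - ai for ai, bi in zip(a, b)]
    length = math.sqrt(sum(v * v for v in ab))
    unit = [v / length for v in ab]
    ax = [xi - ai for ai, xi in zip(a, x)]
    # Coordinate of x along the segment, and squared distance to its line.
    t = sum(u * v for u, v in zip(ax, unit))
    perp2 = max(sum(u * u for u in ax) - t * t, 0.0)
    # Gaussian factor in the (dim - 1) directions orthogonal to the segment.
    orthogonal = ((2.0 * math.pi * sigma ** 2) ** (-(dim - 1) / 2.0)
                  * math.exp(-perp2 / (2.0 * sigma ** 2)))
    # Integral of the 1-D Gaussian over the positions s in [0, length].
    along = normal_cdf(t / sigma) - normal_cdf((t - length) / sigma)
    return orthogonal * along / length


# Far from the end points, the density is (1 / length) times a 1-D Gaussian peak.
print(gaussian_segment_density((5.0, 0.0), (0.0, 0.0), (10.0, 0.0), 0.1))
```

A vertex component would simply use the usual isotropic Gaussian density centred on the prototype; the mixture weights p(j) then combine the components.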

Page 29: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

GENERATIVE GRAPH [GAILLARD 2010]

GGG: From data to topological synthesis

[Figure: ten 2-D panels, horizontal axis from -2 to 3, vertical axis from -2 to 2, illustrating the steps below]

• Multivariate data
• GMM; model selection (# vertices): Bayesian Information Criterion
• Delaunay
• Likelihood maximization (EM)
• Topological summary

$p(x, c) = \sum_{j \in J} p(j)\, p(c \mid j)\, p(x \mid j, \sigma)$

with $\sum_{c} p(c \mid j) = 1$, $0 \le p(c \mid j) \le 1$, $\sum_{j} p(j) = 1$, $0 \le p(j) \le 1$.

Page 30: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

GENERATIVE SIMPLICIAL COMPLEX [MAILLOT 2012]

Generative simplices family

[Equations: g0, … (pseudo-Monte Carlo estimation)]

Page 31: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

DATA SAMPLED FROM A GENERATIVE GAUSSIAN SIMPLEX

[Figure: samples for d = 0, 1, 2 and σ = 0.1, 0.2, 0.5]

Page 32: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

GENERATIVE SIMPLICIAL COMPLEX


Expectation-Maximization

π1 < π2 < π3 < ………< πi < …… < πn

BIC max
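The selection step (keep the model at "BIC max") can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes the "larger is better" form BIC = log L − (k / 2) ln N, and it assumes that a list of nested candidate models, obtained for instance by pruning components in order of increasing proportion π, has already been fitted. The names `bic` and `select_model` and the numbers in the usage lines are made up.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """BIC in its 'larger is better' form: log L - (k / 2) * ln N."""
    return log_likelihood - 0.5 * n_params * math.log(n_samples)


def select_model(candidates, n_samples):
    """candidates: list of (log_likelihood, n_params), one per candidate model.
    Returns the index of the candidate with maximal BIC and all the scores."""
    scores = [bic(ll, k, n_samples) for ll, k in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Made-up values: pruning two parameters costs little likelihood,
# pruning four costs too much.
candidates = [(-100.0, 10), (-101.0, 8), (-120.0, 6)]
best, scores = select_model(candidates, n_samples=100)
print(best)  # 1
```

The penalty term is what lets a slightly less likely but simpler complex win over the full one.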

Page 33: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

FROM DATA TO GENERATIVE SIMPLICIAL COMPLEX

Prototype locations initialized with GMM

Delaunay complex built on top of the prototypes

First the edges…

Then the surfaces…

Page 37: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

FROM DATA TO GENERATIVE SIMPLICIAL COMPLEX

Likelihood maximization for dimension 1 components

The proportion π of each edge is estimated with EM.

Edges whose proportion is too low do not contribute significantly to the model (with respect to the Bayesian Information Criterion); they are pruned from the model.
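The EM estimation of the proportions can be sketched for the simplest case, where the component densities are held fixed and only the mixing proportions are updated. This is an assumption made for the illustration: the slides do not say which other parameters are re-estimated. The input `F[i][j]` is the density of data point i under component j, and the function name `em_proportions` is hypothetical.

```python
def em_proportions(F, n_iter=100):
    """EM for the mixing proportions of a mixture with fixed component densities.

    F[i][j] is the density of data point i under component j. Every row is
    assumed to contain at least one strictly positive value.
    """
    n, m = len(F), len(F[0])
    pi = [1.0 / m] * m
    for _ in range(n_iter):
        counts = [0.0] * m
        for row in F:
            # E-step: responsibility of each component for this data point.
            weighted = [p * f for p, f in zip(pi, row)]
            total = sum(weighted)
            for j, w in enumerate(weighted):
                counts[j] += w / total
        # M-step: each proportion is the mean responsibility of its component.
        pi = [c / n for c in counts]
    return pi


# Three points explained only by component 0, one only by component 1.
F = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(em_proportions(F))  # [0.75, 0.25]
```

Components whose estimated proportion stays close to zero are the natural candidates for pruning.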

Page 38: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

FROM DATA TO GENERATIVE SIMPLICIAL COMPLEX

Likelihood maximization for dimension 2 components

The proportions of both the surfaces and the remaining edges are estimated with EM; the components are then pruned with respect to BIC.

Page 39: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

FROM DATA TO GENERATIVE SIMPLICIAL COMPLEX

Topological cleaning

If a simplex survived, all its facets are pruned.


Page 40: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

RESULTS (1/3)

• Sphere: (1,0,1,0,…)
• Torus: (1,2,1,0,…)
• Klein bottle: (1,1,0,…)

Page 41: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

RESULTS (2/3)

Image data: COIL-100

• 100 rotating objects, each represented by 72 images (one every 5°) of 64x64 pixels (projected by PCA onto the first 71 principal components)
• 0 2D simplices
• Delaunay complex computed only for 1D, then 2D, elements in the 71D space
• We recover a cycle structure

Page 42: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

RESULTS (3/3)

Image data: COIL-100

• Expected Betti numbers: (1,1,0,…)
• (1,2,0,…) corresponds to a figure-eight shape
• (1,n,0,…) shows that many faces of the object look similar
• (1,0,0,…) shows a rotationally invariant object

Example for (1,2,0,…) (like an 8)
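When the extracted complex contains only vertices and edges, its first two Betti numbers follow from simple counting: b0 is the number of connected components and b1 = E − V + b0. The sketch below illustrates the signatures listed above on made-up graphs; the 72-vertex ring and the extra link between views 0 and 36 are hypothetical examples, not COIL-100 results, and the function name `graph_betti` is an assumption.

```python
def graph_betti(n_vertices, edges):
    """(b0, b1) of a graph given as a list of distinct edges (u, v)."""
    parent = list(range(n_vertices))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n_vertices
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    b0 = components
    b1 = len(edges) - n_vertices + b0  # number of independent cycles
    return b0, b1


# One vertex per view, consecutive views linked: a single cycle.
ring = [(k, (k + 1) % 72) for k in range(72)]
print(graph_betti(72, ring))              # (1, 1)

# Two opposite views that look alike add a link: figure-eight shape.
print(graph_betti(72, ring + [(0, 36)]))  # (1, 2)

# All views identical: the structure collapses to a single point.
print(graph_betti(1, []))                 # (1, 0)
```

Each additional pair of look-alike views adds one independent cycle, which is one way to read the (1, n, 0, …) case.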

Page 43: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

CONCLUSIONS

• GSC: first generative model to extract Betti numbers from a data set

• No meta-parameter to tune (EM + BIC)


Page 44: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

PERSPECTIVES

• Topological analysis for each connected component separately
• Algorithmic improvements (pseudo-Monte Carlo, pruning…)
• Link between the BIC optimum and the Betti numbers
• Deep networks: how could topological invariants be explicitly encoded within each layer?

Page 45: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

THANK YOU FOR YOUR ATTENTION

- Aupetit M. Learning Topology with the Generative Gaussian Graph and the EM algorithm. NIPS 2005 conference proceedings, pp. 83-90, 2006.
- Gaillard P., Aupetit M., Govaert G. Learning topology of a labeled data set with the supervised generative Gaussian graph. Neurocomputing, 71(7-9): 1283-1299, Elsevier, March 2008.
- Maillot M., Aupetit M., Govaert G. Extraction of Betti numbers based on a generative model. ESANN 2012.
- Maillot M., Aupetit M., Govaert G. The Generative Simplicial Complex to extract Betti numbers from unlabeled data. Workshop at NIPS 2012.

Questions?

Page 46: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

QUESTIONS

• Why an isotropic (equal-variance) noise model? -> So that the complexity of the model is captured by the simplicial complex and the Betti numbers.
• Why Betti numbers? Connectedness seems sufficient for applications? Shape taken by the states of a dynamical system (epilepsy / normal-alert-catastrophe case…): no real case, but development of a model/measurement system.

Page 47: Topological characterization  of an image  dataset with  Betti  numbers  and a  generative  model

• Suggestions:
• - comparison of N-D topology vs 2-D topology to evaluate projection distortions
• - dynamical system that changes shape and whose shape indicates its state (good, alert, bad)
• - topological analysis/characterization of data
• - monitoring of passage through an alert zone (dynamical system whose noisy state is observed): we want to check that one cannot go directly from a good state to a bad state without passing through the alert state: extension of the SGGG to simplicial complexes: holes in the structure = possible leak. TO BE CLARIFIED
• - Speaker analysis on letters (NSI2000 triangle): use one speaker as a vertex of the GSC and position the others relative to it, detect the shape of the spoken letters. NO, DOES NOT WORK: the shape is similar up to a homothety