Anthony Bak, Principal Data Scientist at Ayasdi at MLconf SEA - 5/01/15

Shape as Organizing Principle for Data

MLConf Seattle 2015

Anthony Bak, Principal Data Scientist

The Data Problem: Complexity

Solution: Topological Summaries

Shape as Organizing Principle for Data

Shape as Organizing Principle

Reduce Bias, Discover Models

TDA tells you the data you have, not the data you want to have.

Generating Topological Summaries

Remember/Forget

Use multiple lenses/metrics to get the complete picture

Different lenses provide different summaries

Lenses: where do they come from?

Statistics: Mean/Max/Min, Variance, n-Moment, Density

Machine Learning: PCA/SVD, Autoencoders, Isomap/MDS/t-SNE

Geometry: Centrality, Curvature, Harmonic Cycles
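
To make the idea of a lens concrete, here is a minimal sketch of a few lens functions from the statistics and geometry families. The slides do not give definitions, so the function names, the Gaussian kernel, the bandwidth parameter, and the choice of "distance to the farthest point" as the centrality measure are all assumptions made for illustration. Each lens maps every data point to a single real number.

```python
import math


def mean_lens(rows):
    """Statistical lens: mean of each row's feature values."""
    return [sum(r) / len(r) for r in rows]


def variance_lens(rows):
    """Statistical lens: population variance of each row's feature values."""
    values = []
    for r in rows:
        m = sum(r) / len(r)
        values.append(sum((x - m) ** 2 for x in r) / len(r))
    return values


def density_lens(rows, bandwidth=1.0):
    """Statistical lens: unnormalized Gaussian kernel density at each point.

    The kernel and bandwidth are illustrative assumptions.
    """
    values = []
    for p in rows:
        total = sum(
            math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in rows
        )
        values.append(total / len(rows))
    return values


def centrality_lens(rows):
    """Geometric lens: distance from each point to the point farthest from it."""
    return [max(math.dist(p, q) for q in rows) for p in rows]


# Three nearby points and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]

for name, lens in [
    ("mean", mean_lens),
    ("variance", variance_lens),
    ("density", density_lens),
    ("centrality", centrality_lens),
]:
    print(name, [round(v, 3) for v in lens(points)])
```

Each lens highlights a different aspect of the same four points: the density lens, for example, assigns its lowest value to the isolated point, while the mean lens only reflects the size of the coordinates.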

Why Topology?

Key Properties of TDA

Deformation Invariance

Compressed Representation

Coordinate Freeness

Coordinate Invariance

1. Topology of shape doesn’t depend on the coordinates used to describe the shape

2. Different feature sets can describe the same phenomena

3. While processing data, we frequently alter coordinates: scaling, rotating, whitening

You want to study properties of your data that are invariant under coordinate changes
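
The points above can be checked on a toy example. The sketch below is an illustration, not taken from the talk: it rotates and rescales a small point set and compares the resulting neighborhood graphs. The helper names, the sample points, and the threshold `eps` are hypothetical. Rotation leaves pairwise distances unchanged, and uniform scaling multiplies them all by the same factor, so the same graph reappears once the threshold is scaled to match.

```python
import math


def pairwise_distances(points):
    """Full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def neighbor_edges(points, eps):
    """Edges of the graph joining every pair of points within distance eps."""
    n = len(points)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) <= eps
    }


points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
rotated = rotate(points, math.pi / 3)
scaled = [(2 * x, 2 * y) for x, y in points]

# Rotation: distances agree up to floating-point rounding.
before = pairwise_distances(points)
after = pairwise_distances(rotated)
max_diff = max(
    abs(a - b) for row_a, row_b in zip(before, after) for a, b in zip(row_a, row_b)
)
print("largest distance change under rotation:", max_diff)

# Same neighborhood graph after rotation, and after scaling by 2
# when the threshold is scaled by 2 as well.
print(neighbor_edges(points, 3.5) == neighbor_edges(rotated, 3.5))
print(neighbor_edges(points, 3.5) == neighbor_edges(scaled, 7.0))
```

Whitening is not covered by this sketch: it changes distances non-uniformly, so the argument there rests on topological invariance rather than on preserved distances.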

Coordinate Invariance: Gene Expression

NKI

GSE230

Coordinate Invariance: Disease State

Deformation Invariance

• Topological features don’t change when you stretch and distort the data

Advantage: Makes problems easier

• Noise resistance

• Less pre-processing of data

• Robust (stable) data

Compressed Representation

• Replace the metric space with a combinatorial summary: a simplicial complex.

• Data becomes easier to manage, search, and query while maintaining essential features.

• Leverages many known algorithms from graph theory, computational topology, computational geometry.
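
As a rough illustration of replacing a point cloud with a combinatorial summary, the sketch below follows the general outline of the Mapper construction: cover the range of a lens with overlapping intervals, cluster the points in each interval, and join clusters that share points. The slides do not spell out the algorithm, so this should be read as an assumption about the general technique rather than the speaker's implementation. All names and parameters (`mapper_graph`, `n_intervals`, `overlap`, `eps`) are hypothetical, the clustering is plain single linkage, and only the graph (the 1-dimensional part of the simplicial complex) is built.

```python
import math
from itertools import combinations


def connected_components(indices, points, eps):
    """Single-linkage clusters: points within eps of each other are linked."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens_values, n_intervals=4, overlap=0.3, eps=0.5):
    """Build nodes (clusters of point indices) and edges (shared points)."""
    lo, hi = min(lens_values), max(lens_values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Widen each interval so that neighboring intervals overlap.
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(lens_values) if a <= v <= b]
        nodes.extend(connected_components(members, points, eps))
    # Two nodes are joined when their clusters have a data point in common.
    edges = {
        (u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]
    }
    return nodes, edges


# 40 points on the unit circle, using the x-coordinate as the lens.
circle = [
    (math.cos(2 * math.pi * i / 40), math.sin(2 * math.pi * i / 40))
    for i in range(40)
]
lens = [x for x, _ in circle]
nodes, edges = mapper_graph(circle, lens, n_intervals=4, overlap=0.3, eps=0.3)
print(len(circle), "points ->", len(nodes), "nodes,", len(edges), "edges")
```

With these parameters the 40 points collapse to a handful of nodes arranged in a single loop, which is the essential feature of the circle. Different lenses, interval counts, or overlaps would give different summaries of the same data.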

Baby Steps: PCA

Data Stories

Model Introspection

Predictive Maintenance

Customer Churn

Transaction Fraud

We’re Hiring! http://www.ayasdi.com/company/careers/

Data Has Shape and Shape Has Meaning