Visualization of High dimensional Datasets Jahangheer Shaik

Page 1: Visualization of High dimensional Datasets

Visualization of High dimensional Datasets

Jahangheer Shaik

Page 2: Visualization of High dimensional Datasets

Why do we need Visualization?

Noise? Distribution? Classes? Structure?

Data visualization techniques are often required to obtain meaningful insights by reducing the cognitive load to effectively convert the data into information and knowledge for subsequent applications.

Page 3: Visualization of High dimensional Datasets

Line Graphs

Line graphs display single-valued or piecewise-continuous functions of one variable

Page 4: Visualization of High dimensional Datasets

Problems

Different line styles (colored, dashed) have to be used to distinguish between the labeled classes

Each dimension may have a different scale
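One common way to handle dimensions with different scales is to rescale each one to a common range before plotting. The sketch below uses min-max scaling to [0, 1]; the function name and the toy data are hypothetical, and other normalizations (for example z-scores) may suit a given dataset better.

```python
def min_max_scale(rows):
    """Rescale every column of a list of rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column carries no information; map it to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical data: column 0 is in the thousands, column 1 is below 1.
data = [[1000.0, 0.2], [3000.0, 0.4], [2000.0, 0.8]]
scaled = min_max_scale(data)
for row in scaled:
    print(row)
```

After scaling, both columns span the same range, so lines drawn for them share one vertical axis without one dimension flattening the other.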

Page 5: Visualization of High dimensional Datasets

Bar Charts, Histograms

Histograms visualize the distribution of a variable by counting observations in bins, which gives a discrete approximation of its probability density function
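As a minimal sketch of how a histogram approximates a density, the code below counts samples in equal-width bins and divides by the total count times the bin width, so that the bar areas sum to 1. The function name, bin range, and sample values are assumptions made for illustration.

```python
def histogram_density(values, n_bins, lo, hi):
    """Return (counts, density) for equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # The right edge hi is placed in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return counts, [0.0] * n_bins
    # Dividing by total * width makes the bar areas sum to 1.
    density = [c / (total * width) for c in counts]
    return counts, density


# Hypothetical samples.
samples = [0.1, 0.2, 0.25, 0.6, 0.7, 0.9, 1.4, 1.9]
counts, density = histogram_density(samples, n_bins=4, lo=0.0, hi=2.0)
print(counts)
print(density)
```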

Page 6: Visualization of High dimensional Datasets

Hierarchical Clustering

Page 7: Visualization of High dimensional Datasets

Scatter Plot

Most popular tool

Helps find clusters, outliers, trends, correlations, etc.

Glyphs, icons, colors, etc. may be used for better understanding

Not very intuitive when the number of dimensions increases

Page 8: Visualization of High dimensional Datasets

Scatter Plot Matrix

Page 9: Visualization of High dimensional Datasets

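Eigenvalues and Eigenvectors

$$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 5 \end{pmatrix}$$

$$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 24 \\ 16 \end{pmatrix} = 4 \begin{pmatrix} 6 \\ 4 \end{pmatrix}$$

$$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}$$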

Page 10: Visualization of High dimensional Datasets

Eigenvectors (contd.)

A transformation matrix maps a vector from its original position to another position

If the transformation only scales the vector, so that the result is a multiple of the vector itself, then the vector and all its nonzero multiples are eigenvectors of the transformation matrix, and the scale factor is the corresponding eigenvalue
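The definition can be checked numerically. The sketch below uses the 2 x 2 matrix with rows (2, 3) and (2, 1) from the eigenvector example and tests whether a vector is mapped to a multiple of itself; the helper names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def scale_factor(matrix, vector, tol=1e-12):
    """Return lam if matrix * vector == lam * vector, otherwise None."""
    image = mat_vec(matrix, vector)
    # Use the first nonzero component of the vector to propose lam.
    pivot = next((i for i, x in enumerate(vector) if abs(x) > tol), None)
    if pivot is None:
        return None  # the zero vector is not an eigenvector
    lam = image[pivot] / vector[pivot]
    if all(abs(y - lam * x) <= tol for x, y in zip(vector, image)):
        return lam
    return None


A = [[2, 3], [2, 1]]
print(scale_factor(A, [3, 2]))  # 4.0: an eigenvector with eigenvalue 4
print(scale_factor(A, [6, 4]))  # 4.0: a multiple of it, same eigenvalue
print(scale_factor(A, [1, 3]))  # None: not an eigenvector
```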

Page 11: Visualization of High dimensional Datasets

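Properties of eigenvectors

Eigenvectors are defined only for square matrices

An n x n matrix has at most n linearly independent eigenvectors; a symmetric matrix, such as a covariance matrix, has exactly n

It is the direction that matters, not the scale, so eigenvectors are usually normalized to unit length

Eigenvectors of a symmetric matrix can be chosen orthogonal to each other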
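Orthogonality holds in general for symmetric matrices, which is the case that matters for covariance matrices. A minimal sketch for the 2 x 2 symmetric case, using the closed-form eigenvalues; the function name and the example matrix are assumptions, and a numerical library routine would be the usual choice for larger matrices.

```python
import math


def symmetric_2x2_eig(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], with b != 0."""
    if b == 0:
        raise ValueError("matrix is diagonal; the coordinate axes are its eigenvectors")
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # (b, lam - a) solves (A - lam * I) v = 0 when b != 0.
        v = (b, lam - a)
        norm = math.hypot(v[0], v[1])
        # Only the direction matters, so return a unit-length vector.
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs


# Hypothetical covariance-like matrix [[2, 1], [1, 2]].
(lam1, v1), (lam2, v2) = symmetric_2x2_eig(2.0, 1.0, 2.0)
dot = v1[0] * v2[0] + v1[1] * v2[1]
print(lam1, v1)
print(lam2, v2)
print(dot)
```

The dot product of the two unit eigenvectors is zero up to rounding, which is the orthogonality property stated above.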

Page 12: Visualization of High dimensional Datasets

Linear Discriminant Analysis

Maximizes the ratio of between-class variance to within-class variance
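To make the criterion concrete, the sketch below evaluates the ratio for two classes projected onto a candidate direction w, using the common two-class form (squared distance between projected class means over the summed within-class scatter). The function names and the toy data are hypothetical, and the slide does not state which exact formulation was used.

```python
def project(points, w):
    """Project each point onto direction w (dot product)."""
    return [sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in points]


def fisher_ratio(class_a, class_b, w):
    """Between-class over within-class scatter of two classes projected onto w."""
    za, zb = project(class_a, w), project(class_b, w)
    mean_a, mean_b = sum(za) / len(za), sum(zb) / len(zb)
    within = sum((z - mean_a) ** 2 for z in za) + sum((z - mean_b) ** 2 for z in zb)
    between = (mean_a - mean_b) ** 2
    return between / within


# Hypothetical toy data: the classes differ along x but not along y.
class_a = [(0.0, 0.0), (0.0, 2.0), (0.2, 4.0)]
class_b = [(1.0, 0.0), (1.0, 2.0), (1.2, 4.0)]
ratio_x = fisher_ratio(class_a, class_b, (1.0, 0.0))
ratio_y = fisher_ratio(class_a, class_b, (0.0, 1.0))
print(ratio_x, ratio_y)
```

The direction along x scores far higher than the direction along y, even though y carries most of the overall variance; a variance-maximizing projection such as PCA would favor y here.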

Page 13: Visualization of High dimensional Datasets

PCA-LDA

Page 14: Visualization of High dimensional Datasets

Dimensions: Orthogonality

Dimensions are organized such that they are orthogonal to each other

Inselberg points out that orthogonality uses up the space rapidly

Page 15: Visualization of High dimensional Datasets

Parallel Coordinates

Page 16: Visualization of High dimensional Datasets

Circular Parallel Coordinates

Page 17: Visualization of High dimensional Datasets

Star coordinate projection

Page 18: Visualization of High dimensional Datasets

Star Coordinate Projection

J. Shaik and M. Yeasin, "Visualization of High Dimensional Data using an Automated 3D Star Co-ordinate System," Proceedings of IEEE IJCNN'06, Vancouver, Canada, pp. 1339-1346, 2006.

Page 19: Visualization of High dimensional Datasets

Mathematical Representation

[Equations: expressions in x with cos and sin terms; the original layout is not recoverable]

Page 20: Visualization of High dimensional Datasets

2D vs 3D

Page 21: Visualization of High dimensional Datasets

3D star coordinate system

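$$u_x = |u| \cos\theta \sin\phi$$

$$u_y = |u| \sin\theta \sin\phi$$

$$u_z = |u| \cos\phi$$

where θ and φ are assumed to be the azimuthal and polar angles of the axis vector u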
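A minimal sketch of a 3D star coordinate projection, under stated assumptions: each attribute gets an axis vector whose components follow the spherical-coordinate form above, and a data point is mapped to the sum of its (already scaled) attribute values times those axes. The angle names, the axis placement, and the sample values are hypothetical; this is not the automated axis arrangement of the cited paper.

```python
import math


def axis_vector(theta, phi, length=1.0):
    """Axis from azimuth theta and polar angle phi (spherical coordinates)."""
    return (
        length * math.cos(theta) * math.sin(phi),
        length * math.sin(theta) * math.sin(phi),
        length * math.cos(phi),
    )


def star_project(point, axes):
    """Map an n-dimensional point to 3D: sum of attribute values times axes."""
    return tuple(
        sum(value * axis[k] for value, axis in zip(point, axes))
        for k in range(3)
    )


# Assumed placement of four attribute axes: +x, +y, +z, and -x.
angles = [
    (0.0, math.pi / 2),
    (math.pi / 2, math.pi / 2),
    (0.0, 0.0),
    (math.pi, math.pi / 2),
]
axes = [axis_vector(theta, phi) for theta, phi in angles]

# Hypothetical sample with attributes already scaled to [0, 1].
sample = (0.5, 0.25, 1.0, 0.5)
xyz = star_project(sample, axes)
print(xyz)
```

Because more than three axes share a three-dimensional space, different points can land on the same location (here the first and fourth attributes cancel), which is why the choice of axis angles matters for how well classes separate.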

Page 22: Visualization of High dimensional Datasets

Results

[Figure: 3D scatter plot. Axes: first, second, and third 3DSCP components. Legend: class1, class2, class3.]

[Figure: 3D star coordinate projection of IRIS dataset. Axes: X, Y, Z.]

Page 23: Visualization of High dimensional Datasets

Results

[Figure: PCA projection of Swiss roll data. Axes: first, second, and third principal components.]

Page 24: Visualization of High dimensional Datasets

Results

[Figure: 3D scatter plot using PCA. Axes: first, second, and third PCA components. Legend: class1, class2, class3.]

[Figure: 3D scatter plot using LDA. Axes: first, second, and third LDA components. Legend: class1, class2, class3.]

[Figure: 3D scatter plot. Axes: first, second, and third 3DSCP components. Legend: class1, class2, class3.]

Page 25: Visualization of High dimensional Datasets

Results

[Figure: 3D scatter plot. Axes: first, second, and third PCA components. Legend: class1, class2, class3.]

[Figure: 3D scatter plot. Axes: first, second, and third LDA components. Legend: class1, class2, class3.]

[Figure: 3D star coordinate projection of IRIS dataset. Axes: X, Y, Z.]