DIMENSIONALITY REDUCTION
Computer Graphics Course, June 2013


Slide 1

DIMENSIONALITY REDUCTION
Computer Graphics Course, June 2013

Slide 2
What is high dimensional data?
- Images
- Videos
- Documents
- Most data, actually!

Slide 3
What is high dimensional data?
- Images: dimension 3·X·Y
- Videos: dimension of one image * number of frames
- Documents
- Most data, actually

Slide 4
Images: dimension 3·X·Y
- This is the number of bytes in the image file of an X-by-Y RGB image
- We can treat each byte as a dimension
- Each image is then a point in a high dimensional space
- Which space? The space of images of size X-by-Y
- How many dimensions?

Slide 5
- But we can describe an image using fewer bytes!
- "Blue sky, green grass, yellow road"
- "Drawing of a kung-fu rat"
- How many dimensions?

Slide 6
Why do Dimensionality Reduction?
- Visualization: understanding the structure of the data

Slide 7
Why do Dimensionality Reduction?
- Visualization: understanding the structure of the data
- Fewer dimensions are easier to describe and make it easier to find correlations (rules)
- Compression of the data for efficiency
- Clustering: discovering similarities between elements

Slide 8
Why do Dimensionality Reduction? The curse of dimensionality:
100000000000
010000000000
001000000000
000100000000
- All these vectors are the same Euclidean distance from each other, but some dimensions could be worth more than others
- Can you work with 1,000 images of 1,000,000 dimensions each?

Slide 9
How to reduce dimensions?
- Image features: average colors, histograms, FFT-based features (frequency space), and more
- Video features
- Document features
- Etc.

Slide 10
How to reduce dimensions?
- The feature dimension is still quite high (512, 1024, etc.)
- What now?

Slide 11
Linear Dimensionality Reduction
- Simplest way: project all points onto a plane (2D) or a lower dimensional subspace

Slide 12
Linear Dimensionality Reduction
- Simplest way: project all points onto a plane (2D)
- Only one question: which plane is the best?
- PCA (SVD) (see the first sketch after Slide 20)

Slide 13
Linear Dimensionality Reduction
- Simplest way: project all points onto a plane (2D)
- Only one question: which plane is the best? PCA (SVD)
- For specific applications: CCA (correlation), LDA (data with labels), NMF (non-negative components), ICA (multiple sources)

Slide 14
Non-Linear Dimensionality Reduction
- What if the data is not linear? No plane will work here

Slide 15
Non-Linear Dimensionality Reduction
MDS: Multidimensional Scaling
- Uses only the distances between elements
- Tries to reconstruct element positions from the distances, such that the distances between the reconstructed positions match the given distances as closely as possible
- The reconstruction can happen in 1D, 2D, 3D, ...
- More dimensions = less error

Slide 16
(figure)

Slide 17
Non-Linear Dimensionality Reduction
MDS: Multidimensional Scaling
Classical MDS: an algebraic solution (see the code sketch after Slide 20)
- Construct a squared proximity matrix and normalize it (double centering)
- Extract the d largest eigenvectors / eigenvalues
- Multiply each eigenvector by sqrt(its eigenvalue)
- Each row then holds the coordinates of its corresponding point

Slide 18
Non-Linear Dimensionality Reduction
MDS: Multidimensional Scaling
Classical MDS: an algebraic solution
(figure: the selected eigenvectors e1..e5 form the columns of a matrix whose rows x1..x5 hold the point coordinates; each eigenvector adds a dimension to the mapping)

Slide 19
Non-Linear Dimensionality Reduction
Non-metric MDS: an optimization problem. Example: Sammon's projection (sketch after Slide 20)
- Start from random positions for each element
- Define the stress of the system, Sammon's stress:
  E = (1 / sum_{i<j} D_ij) * sum_{i<j} (D_ij - d_ij)^2 / D_ij
  where D_ij are the input distances and d_ij the current distances between the mapped positions
- In each step, move towards positions that reduce the stress (gradient descent)
- Continue until convergence

Slide 20
Non-Linear Dimensionality Reduction
Spectral embedding (sketch after this slide):
- Create a graph of nearest neighbors
- Compute the graph Laplacian (it relates to the probability of walking along each edge in a random walk)
- Compute eigenvalues. Why? Computing eigenvectors is like multiplying the matrix by itself many, many times (towards infinity), which is like performing random walks over and over until we reach a stable point
- Again, the eigenvectors are the coordinates
- Does not preserve distances like MDS; instead it groups together points that are likely neighbors
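The PCA projection of Slides 11-13 fits in a few lines of numpy. This is a minimal sketch, not code from the course; the function name and defaults are illustrative:

```python
import numpy as np

def pca_project(X, d=2):
    """Project rows of X (n points, D dims) onto the top-d principal axes."""
    Xc = X - X.mean(axis=0)              # center the data
    # SVD of the centered data: the right singular vectors are the axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                 # coordinates in the best-fit subspace
```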
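A sketch of the classical MDS recipe from Slide 17, assuming a dense n x n matrix of pairwise distances is given; names are mine:

```python
import numpy as np

def classical_mds(D, d=2):
    """Classical MDS: D is an n x n matrix of pairwise distances."""
    n = D.shape[0]
    D2 = D ** 2                              # squared proximity matrix
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double centering
    vals, vecs = np.linalg.eigh(B)           # eigendecomposition (ascending)
    idx = np.argsort(vals)[::-1][:d]         # the d largest eigenvalues
    scale = np.sqrt(np.maximum(vals[idx], 0))  # guard against tiny negatives
    return vecs[:, idx] * scale              # row i = coordinates of point i
```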
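Slide 19's procedure, sketched as plain gradient descent on Sammon's stress. Sammon's original algorithm uses a second-order update; this simplified version only follows the slide's description, and the step size and iteration count are arbitrary choices:

```python
import numpy as np

def sammon(D, d=2, iters=500, lr=0.1, eps=1e-9):
    """Gradient descent on Sammon's stress. D: n x n input distances."""
    n = D.shape[0]
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((n, d))          # random starting positions
    c = D[np.triu_indices(n, 1)].sum()       # normalizing constant
    Dn = D + eps
    np.fill_diagonal(Dn, 1.0)                # dummy diagonal, never used
    for _ in range(iters):
        diff = Y[:, None, :] - Y[None, :, :]       # pairwise displacements
        dist = np.sqrt((diff ** 2).sum(-1)) + eps  # current distances d_ij
        np.fill_diagonal(dist, 1.0)
        # gradient of E = (1/c) * sum_{i<j} (D_ij - d_ij)^2 / D_ij
        w = (dist - D) / (Dn * dist)
        np.fill_diagonal(w, 0.0)
        grad = (2.0 / c) * (w[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad                       # move to reduce the stress
    return Y
```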
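And a sketch of the spectral embedding pipeline from Slide 20. For simplicity it uses the unnormalized graph Laplacian; the random-walk interpretation on the slide corresponds to a normalized variant. The brute-force neighbor search and binary edge weights are simplifying assumptions:

```python
import numpy as np

def spectral_embedding(X, d=2, k=10):
    """Laplacian-eigenmap-style embedding from a k-nearest-neighbor graph."""
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.zeros((n, n))
    for i in range(n):                       # connect each point to its k NNs
        nn = np.argsort(D2[i])[1:k + 1]      # skip the point itself
        W[i, nn] = 1.0
    W = np.maximum(W, W.T)                   # symmetrize the graph
    L = np.diag(W.sum(1)) - W                # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)           # ascending eigenvalues
    return vecs[:, 1:d + 1]                  # skip the constant eigenvector
```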
Slide 21
Non-Linear Dimensionality Reduction
Other non-linear methods:
- Locally Linear Embedding (LLE): express each point as a linear combination of its neighbors
- Isomap: takes the adjacency graph as input and calculates MDS on the geodesic distances (distances along the graph)
- Self Organizing Maps (SOM): next part

Slide 22
SELF ORGANIZING MAPS & RECENT APPLICATIONS
Computer Graphics Course, June 2013

Slide 23
Self Organizing Maps (SOM)
- Originated from neural networks
- Created by Kohonen, 1982; also known as Kohonen Maps
- Teuvo Kohonen: a Finnish researcher working on learning and neural networks
- Thanks to SOM, he became the most cited Finnish scientist, with more than 8,000 citations
- So what is it?

Slide 24
What is a SOM?
- A type of neural network
- What is a neuron? A function with several inputs and one output
- In this case, usually a linear combination of the inputs according to weights

Slide 25
What is a SOM?
(figure: a grid of neurons; inputs x_k; weights m_ik; no connections (feedback / feed-forward) between the neurons)

Slide 26
Training a SOM (code sketch after Slide 30)
- Start from random weights
- For each input x(t) at iteration t:
  - Find the Best Matching Cell (BMC, also called Best Matching Unit or BMU) for x(t)
  - Update the weights of each neuron close to the BMC
- Weights are updated according to a decaying learning rate and radius

Slide 27
Training a SOM
(figure: neurons m_i; input x(1) and its BMC(1); input x(2) and its BMC(2))

Slide 28
Training a SOM: The Math
- Best Matching Cell: the m_c for which ||x(t) - m_c(t)|| is minimal
- Another option for the BMC: maximal dot product x(t)^T m_c(t)
- Weight adaptation: m_i(t+1) = m_i(t) + h_ci(t) * (x(t) - m_i(t)),
  where h_ci(t) is a learning rate that depends on both the time and the distance of m_i from the BMC m_c

Slide 29
Training a SOM: The Math
Example (motion map): a Gaussian kernel
  h_ci(t) = alpha(t) * exp(-d(c,i)^2 / (2 * sigma(t)^2))
where d(c,i) is the distance on the map between the BMC and m_i, alpha(t) is the decaying learning rate, sigma(t) is the kernel width, n_L is the maximum number of iterations, and H and W are the height and width of the neuron map

Slide 30
Training a SOM: The Math
Example (motion map): the kernel width decays linearly,
  sigma(t) = 0.25 * (H + W) * (1 - t / n_L)
with n_L the maximum number of iterations and H, W the height and width of the neuron map
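A sketch of the training loop from Slides 26-30, putting the BMC search, the Gaussian neighborhood, and the linearly decaying rate and width together. The random sampling order, the initial learning rate, and the small floor on sigma are my choices, not from the slides:

```python
import numpy as np

def train_som(X, H=10, W=10, n_iters=10000, alpha0=0.5, seed=0):
    """Train an H x W SOM on the rows of X."""
    rng = np.random.default_rng(seed)
    M = rng.random((H, W, X.shape[1]))             # random initial weights
    # grid coordinates of every neuron, for the neighborhood kernel
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    for t in range(n_iters):
        x = X[rng.integers(len(X))]                # pick a training sample
        # Best Matching Cell: the neuron with minimal ||x - m_i||
        err = ((M - x) ** 2).sum(-1)
        c = np.unravel_index(err.argmin(), err.shape)
        # decaying learning rate and kernel width (Slide 30)
        decay = 1.0 - t / n_iters
        alpha = alpha0 * decay
        sigma = 0.25 * (H + W) * decay + 1e-3      # floor avoids sigma = 0
        # Gaussian neighborhood around the BMC on the grid (Slide 29)
        d2 = (rows - c[0]) ** 2 + (cols - c[1]) ** 2
        h = alpha * np.exp(-d2 / (2 * sigma ** 2))
        M += h[:, :, None] * (x - M)               # m_i += h_ci * (x - m_i)
    return M
```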
Slide 31
Presenting a SOM
Option 1: at each node, present the data that corresponds to the weight vector m_i (3D data, colors, continuous spaces)
- For a color map with 3 inputs, a neuron with weights (0.7, 0.2, 0.3) is shown as a reddish color with a 0.7 red component, a 0.2 green component and a 0.3 blue component
- For a map of points on the plane with 2 inputs, we draw a point for each neuron at position (W_x, W_y)

Slide 32
Presenting a SOM
Option 1: at each node, present the data that corresponds to the weight vector m_i (3D data, colors, continuous spaces)
(figure)

Slide 33
Presenting a SOM
Option 2: give each neuron a representative from the training set X: the sample closest to its weight vector m_i

Slide 34
More Examples

Slide 35
(figure)

Slide 36
(figure)

Slide 37
Motion Map
"Motion Map: Image-based Retrieval and Segmentation of Motion Data"
Sakamoto, Kuriyama, Kaneko; SCA (Symposium on Computer Animation), 2004
- Goal: present the user with a grid of postures for selecting a clip of motion data from a large database
- Clustering is performed on the SOM instead of on the abstract data

Slide 38
Motion Map
Example results: 436 posture samples from 55K frames of 51 motion files

Slide 39
Motion Map
Example results: clustering based on the SOM

Slide 40
Motion Map - Details
- A map of posture samples is created from all motion files together
- To reduce computation time, samples are chosen so that each sample's similarity to its closest sample is over a given threshold
- A standard SOM is calculated
- Each posture is then connected to a hash table of the motion files that contain similar postures
- Clustering the SOM enables displaying a simplified map to the user (next slide)

Slide 41
Motion Map - Details
Simplified map after SOM clustering: 17 dance styles

Slide 42
Procedural Texture Preview
Eurographics 2012
- Goal: present the user with a single image which shows all the possibilities of a procedural texture
- Method overview:
  - Select candidate parameter vectors that maximize completeness, variety and smoothness
  - Organize the candidates in a SOM
  - Synthesize a continuous map

Slide 43
Procedural Texture Preview
Results (figure: thumbnails of random parameters; the texture preview in a single image; texture parameters)

Slide 44
Procedural Texture Preview - Details
Candidates for the parameter map are selected using the following optimizations:
- C = a set of dense samples; X = the candidates in the parameter map
- Completeness: minimize the distance from each dense sample in C to its closest candidate in X
- Variety: maximize the dissimilarity among the candidates in X
- Smoothness: minimize the difference between neighboring candidates on the map

Slide 45
Procedural Texture Preview - Details
- A standard SOM will jointly optimize the completeness and the smoothness
- To optimize the variety as well, the SOM implementation switches between minimizing E_c and maximizing E_v
- Instead of a regular learning-rate update, at each step the candidates (the weight vectors) are replaced by new candidates according to the above optimizations

Slide 46
Procedural Texture Preview - Details
- After the candidate selection, an image is synthesized which smoothly combines all selected candidates
- Stitching is done using standard patch-based texture synthesis methods ("Graphcut Textures", Kwatra et al., TOG 2003)

Slide 47
Procedural Texture Preview
Some more results

Slide 48
That's all folks! Questions?