
Knowledge Discovery and Data Mining 1 (VO) (706.701)

Singular Value Decomposition and Latent Semantic Analysis

Denis Helic

ISDS, TU Graz

November 29, 2018

Denis Helic (ISDS, TU Graz) KDDM1 November 29, 2018 1 / 85


Big picture: KDDM

[Figure: the Knowledge Discovery Process: Preprocessing, Transformation, and Data Mining, built on Mathematical Tools (Linear Algebra, Probability Theory, Information Theory, Statistical Inference) and on Infrastructure (Hardware & Programming Model)]


Outline

1 Introduction
2 Singular Value Decomposition
3 SVD Example and Interpretation
4 Mathematics of SVD
5 Dimensionality Reduction with SVD
6 Advanced: SVD minimizes the approximation error
7 Latent Semantic Indexing

Slides
Slides are partially based on "Mining Massive Datasets" Chapter 11, "Introduction to Information Retrieval" by Manning, Raghavan and Schütze, and Melanie Martin's AI Seminar


Introduction

Recap

Review of Principal Component Analysis


Introduction

Recap — PCA: Algorithm

Organize data as an 𝑛 × 𝑚 matrix, with 𝑛 data points and 𝑚 features
Subtract the average for each feature to obtain the centered data matrix 𝐗
Calculate the sample covariance matrix 𝐒 = (1/𝑛)𝐗^𝑇𝐗
Calculate the eigenvalues and the eigenvectors of 𝐒
Select the top 𝑘 eigenvectors
Project the data to the new space spanned by those 𝑘 eigenvectors: 𝐗𝐐_𝑘 ∈ ℝ^(𝑛×𝑘), where 𝐐_𝑘 ∈ ℝ^(𝑚×𝑘)
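The steps above can be sketched in NumPy (a minimal illustration on synthetic data; the array names are ours, not part of the algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))      # n = 100 data points, m = 5 features

X = data - data.mean(axis=0)          # center each feature
S = (X.T @ X) / X.shape[0]            # sample covariance S = (1/n) X^T X

eigvals, eigvecs = np.linalg.eigh(S)  # eigh, since S is symmetric
order = np.argsort(eigvals)[::-1]     # sort eigenvalues in decreasing order
Q = eigvecs[:, order]

k = 2                                 # keep the top-k eigenvectors
Z = X @ Q[:, :k]                      # projected data, shape (n, k)
```

The projected data Z is decorrelated: its sample covariance is (numerically) diagonal.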


Introduction

Recap — PCA: Interpretation

You can interpret the first couple of principal components to learn something about the dataset
Data mining
However, be very careful: you cannot generalize from a single dataset
PCA transforms the set of correlated observations into a set of linearly uncorrelated observations
I.e. the goal of the analysis is to decorrelate the data


Singular Value Decomposition

SVD

We investigate now a second form of matrix analysis called Singular Value Decomposition
It allows an exact representation of any matrix, i.e., the matrix need not be a square matrix
It decomposes a matrix into a product of three matrices
It also provides an elegant way of dimensionality reduction
By eliminating the less important parts of the data representation


Singular Value Decomposition

SVD — Interpretation

SVD is based on the idea that there exist a small number of hidden (latent) "concepts" that connect the rows and columns of the matrix
We can rank the "concepts" from the most to the least important
We use the ranking to remove the less important "concepts" from the matrix
Such that the new reduced matrix closely approximates the original matrix (according to a certain criterion)


Singular Value Decomposition

SVD: Definition

Singular Value Decomposition
Let 𝐌 ∈ ℝ^(𝑛×𝑚) be a matrix and let 𝑟 be the rank of 𝐌 (the rank of a matrix is the largest number of linearly independent rows or columns). Then we can find matrices 𝐔, 𝐕, and 𝚺 with the following properties:

𝐔 ∈ ℝ^(𝑛×𝑟) is a column-orthonormal matrix
𝐕 ∈ ℝ^(𝑚×𝑟) is a column-orthonormal matrix
𝚺 ∈ ℝ^(𝑟×𝑟) is a diagonal matrix

The matrix 𝐌 can then be written as:

𝐌 = 𝐔𝚺𝐕^𝑇
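These properties are easy to check numerically; a minimal sketch using NumPy's np.linalg.svd (which returns the diagonal of 𝚺 as a vector of singular values):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(7, 5))                 # any n x m matrix works

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# U and V are column-orthonormal
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))

# exact representation: M = U Sigma V^T
assert np.allclose(M, U @ np.diag(s) @ Vt)
```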


Singular Value Decomposition

SVD form

Figure: the SVD form (from "Mining Massive Datasets")


SVD Example and Interpretation

SVD: Simple example

We will decompose a utility matrix of a movie recommender system
We have users who rate movies
Assume we have only two "concepts", which underlie the movies and steer the rating process
E.g. the concepts represent two movie genres: science fiction and romance
Users from group A rate only science fiction and users from group B only romance


SVD Example and Interpretation

SVD: Simple example

User   Matrix   Alien   Star Wars   Casablanca   Titanic
𝐴1     1        1       1           0            0
𝐴2     3        3       3           0            0
𝐴3     4        4       4           0            0
𝐴4     5        5       5           0            0
𝐵1     0        0       0           4            4
𝐵2     0        0       0           5            5
𝐵3     0        0       0           2            2


SVD Example and Interpretation

SVD: Simple example

Strict adherence to those two concepts gives the matrix a rank of 𝑟 = 2
We may pick one of the first four rows and one of the last three rows and we cannot find a nonzero linear combination that gives 𝟎
But we cannot pick three independent rows
If we pick rows 1, 2 and 7, then three times the first minus the second plus zero times the seventh gives 𝟎


SVD Example and Interpretation

SVD: Simple example

Similarly for columns
We may pick one of the first three and one of the last two and they will be independent
But we cannot pick three independent columns
If we pick columns 1, 2, and 5, then the first minus the second plus zero times the fifth gives 𝟎
Thus, the rank is indeed 𝑟 = 2 and 𝚺 ∈ ℝ^(2×2)
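The rank argument can be verified with a quick NumPy check (a sketch using the utility matrix above):

```python
import numpy as np

M = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

assert np.linalg.matrix_rank(M) == 2

# rows 1, 2, 7: three times the first minus the second plus zero times
# the seventh gives the zero vector
assert np.allclose(3 * M[0] - M[1] + 0 * M[6], 0)

# columns 1, 2, 5: the first minus the second plus zero times the fifth
assert np.allclose(M[:, 0] - M[:, 1] + 0 * M[:, 4], 0)
```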


SVD Example and Interpretation

SVD: Simple example

We will see later how to calculate the decomposition:

𝐔 =
  [ 0.14  0    ]
  [ 0.42  0    ]
  [ 0.56  0    ]
  [ 0.70  0    ]
  [ 0     0.60 ]
  [ 0     0.75 ]
  [ 0     0.30 ]


SVD Example and Interpretation

SVD: Interpretation

The key to understanding SVD is in viewing the 𝑟 columns of 𝐔, 𝚺, and 𝐕 as representing concepts that are hidden or latent in the original matrix 𝐌
In our example these concepts are clear
One is science fiction
The other one is romance


SVD Example and Interpretation

SVD: Interpretation

The rows of 𝐌 are people
The columns of 𝐌 are movies
Then the rows of 𝐔 are people
The columns of 𝐔 are concepts
𝐔 connects people to concepts


SVD Example and Interpretation

SVD: Interpretation

For example, the user 𝐴1 likes only science fiction:

𝐔 =
  [ 0.14  0    ]
  [ 0.42  0    ]
  [ 0.56  0    ]
  [ 0.70  0    ]
  [ 0     0.60 ]
  [ 0     0.75 ]
  [ 0     0.30 ]


SVD Example and Interpretation

SVD: Interpretation

The value of 0.14 in the first row and first column of 𝐔 indicates this fact
However, this value is smaller than some of the other values in the first column of 𝐔

𝐔 =
  [ 0.14  0    ]
  [ 0.42  0    ]
  [ 0.56  0    ]
  [ 0.70  0    ]
  [ 0     0.60 ]
  [ 0     0.75 ]
  [ 0     0.30 ]


SVD Example and Interpretation

SVD: Interpretation

While 𝐴1 watches only science fiction she does not rate these movies highly
𝐴1 contributes to the concept of science fiction, but not as much as e.g. 𝐴4, who rated these movies highly

𝐔 =
  [ 0.14  0    ]
  [ 0.42  0    ]
  [ 0.56  0    ]
  [ 0.70  0    ]
  [ 0     0.60 ]
  [ 0     0.75 ]
  [ 0     0.30 ]


SVD Example and Interpretation

SVD: Interpretation

On the other hand, the second column of the first row of 𝐔 is zero:

𝐔 =
  [ 0.14  0    ]
  [ 0.42  0    ]
  [ 0.56  0    ]
  [ 0.70  0    ]
  [ 0     0.60 ]
  [ 0     0.75 ]
  [ 0     0.30 ]


SVD Example and Interpretation

SVD: Interpretation

𝐴1 does not rate romance movies at all and does not contribute to that concept:

𝐔 =
  [ 0.14  0    ]
  [ 0.42  0    ]
  [ 0.56  0    ]
  [ 0.70  0    ]
  [ 0     0.60 ]
  [ 0     0.75 ]
  [ 0     0.30 ]


SVD Example and Interpretation

SVD: Simple example

𝐕^𝑇 =
  [ 0.58  0.58  0.58  0     0    ]
  [ 0     0     0     0.71  0.71 ]


SVD Example and Interpretation

SVD: Interpretation

The rows of 𝐌 are people
The columns of 𝐌 are movies
Then the rows of 𝐕^𝑇 are concepts
The columns of 𝐕^𝑇 are movies
𝐕 connects movies to concepts


SVD Example and Interpretation

SVD: Interpretation

For example, the 0.58 in the first three columns of the first row of 𝐕^𝑇 indicates that the first three movies are of the science fiction genre
Matrix, Alien and Star Wars

𝐕^𝑇 =
  [ 0.58  0.58  0.58  0     0    ]
  [ 0     0     0     0.71  0.71 ]


SVD Example and Interpretation

SVD: Interpretation

On the other hand, the last two movies have nothing to do with science fiction
Casablanca and Titanic

𝐕^𝑇 =
  [ 0.58  0.58  0.58  0     0    ]
  [ 0     0     0     0.71  0.71 ]


SVD Example and Interpretation

SVD: Interpretation

Also, Matrix, Alien and Star Wars do not partake of the concept of romance at all
As indicated by the 0's in the first three columns of the second row

𝐕^𝑇 =
  [ 0.58  0.58  0.58  0     0    ]
  [ 0     0     0     0.71  0.71 ]


SVD Example and Interpretation

SVD: Interpretation

Whereas Casablanca and Titanic are romance movies
As indicated by the 0.71 in the last two columns of the second row

𝐕^𝑇 =
  [ 0.58  0.58  0.58  0     0    ]
  [ 0     0     0     0.71  0.71 ]


SVD Example and Interpretation

SVD: Simple example

𝚺 =
  [ 12.4  0   ]
  [ 0     9.5 ]


SVD Example and Interpretation

SVD: Interpretation

Finally, the matrix 𝚺 gives the strength of each concept
In our example the strength of science fiction is 12.4
The strength of romance is 9.5
Intuitively, science fiction is a stronger concept because we have more data on movies of that genre and more people who rate these movies


SVD Example and Interpretation

SVD: Simple example

  [ 1 1 1 0 0 ]   [ 0.14  0    ]
  [ 3 3 3 0 0 ]   [ 0.42  0    ]
  [ 4 4 4 0 0 ]   [ 0.56  0    ]   [ 12.4  0   ]   [ 0.58  0.58  0.58  0     0    ]
  [ 5 5 5 0 0 ] = [ 0.70  0    ] × [ 0     9.5 ] × [ 0     0     0     0.71  0.71 ]
  [ 0 0 0 4 4 ]   [ 0     0.60 ]
  [ 0 0 0 5 5 ]   [ 0     0.75 ]
  [ 0 0 0 2 2 ]   [ 0     0.30 ]
        𝐌               𝐔               𝚺                       𝐕^𝑇
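The decomposition can be reproduced numerically (a sketch; np.linalg.svd may flip the signs of singular vectors, so we compare the singular values and the reconstruction rather than individual entries):

```python
import numpy as np

M = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# only the first r = 2 singular values are non-zero: 12.4 and 9.5
assert np.allclose(np.round(s, 1), [12.4, 9.5, 0.0, 0.0, 0.0])

# the two non-zero concepts reconstruct M exactly
r = 2
assert np.allclose(M, U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :])
```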


SVD Example and Interpretation

SVD: Interpretation

In general, the concepts will not be so clearly composed
There will be fewer 0's in 𝐔 and 𝐕
𝚺 is always diagonal
Typically the entities represented by the rows and columns of 𝐌 will contribute to several different concepts to varying degrees


SVD Example and Interpretation

SVD: Interpretation

In our simple example the rank of the matrix 𝐌 was equal to the number of concepts
In practice that is rarely the case
The rank 𝑟 is often greater than the number of the concepts and some of the columns in 𝐔 and 𝐕 are harder to interpret


SVD Example and Interpretation

SVD: Another example

User   Matrix   Alien   Star Wars   Casablanca   Titanic
𝐴1     1        1       1           0            0
𝐴2     3        3       3           0            0
𝐴3     4        4       4           0            0
𝐴4     5        5       5           0            0
𝐵1     0        2       0           4            4
𝐵2     0        0       0           5            5
𝐵3     0        1       0           2            2


SVD Example and Interpretation

SVD: Another example

In this (more realistic) example 𝐵1 and 𝐵3 also rated "Alien"
Neither liked it much, but nevertheless they rated it
This gives the matrix a rank of 3
E.g. we may pick the first, sixth, and seventh rows and check that they are independent
However, no four rows are independent


SVD Example and Interpretation

SVD: Another example

Now, in our decomposition we have 𝑟 = 3
We will have three columns in 𝐔, 𝐕, and 𝚺
The first column corresponds to science fiction
The second column corresponds to romance
The interpretation of the third column is not easy (it is a linear combination of the users)
Nice property: the third column is the least important one


SVD Example and Interpretation

SVD: Another example

𝐔 =
  [ 0.13   0.02  −0.01 ]
  [ 0.41   0.07  −0.03 ]
  [ 0.55   0.09  −0.04 ]
  [ 0.68   0.11  −0.05 ]
  [ 0.15  −0.59   0.65 ]
  [ 0.07  −0.73  −0.67 ]
  [ 0.07  −0.29   0.32 ]


SVD Example and Interpretation

SVD: Another example

𝐕^𝑇 =
  [ 0.56   0.59  0.56   0.09   0.09 ]
  [ 0.12  −0.02  0.12  −0.69  −0.69 ]
  [ 0.40  −0.80  0.40   0.09   0.09 ]


SVD Example and Interpretation

SVD: Another example

𝚺 =
  [ 12.4  0    0   ]
  [ 0     9.5  0   ]
  [ 0     0    1.3 ]


SVD Example and Interpretation

SVD: Another example

  [ 1 1 1 0 0 ]   [ 0.13   0.02  −0.01 ]
  [ 3 3 3 0 0 ]   [ 0.41   0.07  −0.03 ]   [ 12.4  0    0   ]   [ 0.56   0.59  0.56   0.09   0.09 ]
  [ 4 4 4 0 0 ] = [ 0.55   0.09  −0.04 ] × [ 0     9.5  0   ] × [ 0.12  −0.02  0.12  −0.69  −0.69 ]
  [ 5 5 5 0 0 ]   [ 0.68   0.11  −0.05 ]   [ 0     0    1.3 ]   [ 0.40  −0.80  0.40   0.09   0.09 ]
  [ 0 2 0 4 4 ]   [ 0.15  −0.59   0.65 ]
  [ 0 0 0 5 5 ]   [ 0.07  −0.73  −0.67 ]
  [ 0 1 0 2 2 ]   [ 0.07  −0.29   0.32 ]
        𝐌                   𝐔                     𝚺                           𝐕^𝑇


Mathematics of SVD

Matrix diagonalization theorem

Theorem
Let 𝐒 ∈ ℝ^(𝑛×𝑛) be a square matrix with 𝑛 linearly independent eigenvectors. Then there exists an eigendecomposition

𝐒 = 𝐔𝚲𝐔^(−1)

where the columns of 𝐔 are the eigenvectors of 𝐒 and 𝚲 is a diagonal matrix whose diagonal entries are the eigenvalues of 𝐒 in decreasing order:

𝚲 = diag(𝜆1, 𝜆2, …, 𝜆𝑛),  𝜆𝑖 ≥ 𝜆𝑖+1

If the eigenvalues are distinct, then this decomposition is unique.
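A minimal numerical illustration of the theorem (a sketch; a generic random matrix is almost surely diagonalizable, though its eigenvalues may be complex):

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.normal(size=(4, 4))

eigvals, U = np.linalg.eig(S)    # columns of U are eigenvectors of S
Lam = np.diag(eigvals)

# S = U Lambda U^{-1}
assert np.allclose(S, U @ Lam @ np.linalg.inv(U))
```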


Mathematics of SVD

Matrix diagonalization theorem

How does this theorem work?
𝐔 has the eigenvectors of 𝐒 as its columns

𝐔 = (𝐮1 𝐮2 … 𝐮𝑛)


Mathematics of SVD

Matrix diagonalization theorem

Then we have

𝐒𝐔 = 𝐒 × (𝐮1 𝐮2 … 𝐮𝑛)
    = (𝜆1𝐮1 𝜆2𝐮2 … 𝜆𝑛𝐮𝑛)
    = (𝐮1 𝐮2 … 𝐮𝑛) diag(𝜆1, 𝜆2, …, 𝜆𝑛)
    = 𝐔𝚲


Mathematics of SVD

Matrix diagonalization theorem

Thus we have:

𝐒𝐔 = 𝐔𝚲
𝐒 = 𝐔𝚲𝐔^(−1)


Mathematics of SVD

Symmetric diagonalization theorem

Theorem
Let 𝐒 ∈ ℝ^(𝑛×𝑛) be a square symmetric matrix with 𝑛 linearly independent eigenvectors. Then there exists a symmetric diagonal decomposition

𝐒 = 𝐐𝚲𝐐^(−1)

where the columns of 𝐐 are the orthogonal and normalized eigenvectors of 𝐒 (i.e. 𝐐 is an orthonormal matrix) and 𝚲 is a diagonal matrix whose diagonal entries are the eigenvalues of 𝐒. Further, all entries of 𝚲 and of 𝐐 are real and we have 𝐐^(−1) = 𝐐^𝑇.


Mathematics of SVD

SVD

𝐌 = 𝐔𝚺𝐕^𝑇

Let us calculate 𝐌^𝑇:

𝐌^𝑇 = (𝐔𝚺𝐕^𝑇)^𝑇
    = (𝐕^𝑇)^𝑇 𝚺^𝑇 𝐔^𝑇
    = 𝐕𝚺^𝑇 𝐔^𝑇
    = 𝐕𝚺𝐔^𝑇

The last equality holds since 𝚺 is diagonal and thus 𝚺^𝑇 = 𝚺


Mathematics of SVD

SVD

𝐌 = 𝐔𝚺𝐕^𝑇
𝐌^𝑇 = 𝐕𝚺𝐔^𝑇

Let us calculate 𝐌𝐌^𝑇:

𝐌𝐌^𝑇 = 𝐔𝚺𝐕^𝑇 𝐕𝚺𝐔^𝑇
      = 𝐔𝚺²𝐔^𝑇

(using 𝐕^𝑇𝐕 = 𝐈, since 𝐕 is column-orthonormal)


Mathematics of SVD

SVD

𝐌𝐌^𝑇 = 𝐔𝚺²𝐔^𝑇

Thus, we have 𝐌𝐌^𝑇 = 𝐒 and 𝚺² = 𝚲
That is, 𝐔 is the matrix of eigenvectors of 𝐌𝐌^𝑇
𝚺 = 𝚲^(1/2) is the matrix of square roots of the eigenvalues of 𝐌𝐌^𝑇
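This relationship can be checked numerically (a sketch comparing the singular values of 𝐌 with the square roots of the eigenvalues of 𝐌𝐌^𝑇):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(5, 3))

_, s, _ = np.linalg.svd(M)                    # 3 singular values

# eigenvalues of M M^T (5 x 5): the squared singular values plus zeros
eigvals = np.linalg.eigvalsh(M @ M.T)[::-1]   # decreasing order
assert np.allclose(np.sqrt(np.clip(eigvals[:3], 0, None)), s)
assert np.allclose(eigvals[3:], 0)
```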


Mathematics of SVD

SVD

𝐌 = 𝐔𝚺𝐕^𝑇
𝐌^𝑇 = 𝐕𝚺𝐔^𝑇

Let us calculate 𝐌^𝑇𝐌:

𝐌^𝑇𝐌 = 𝐕𝚺𝐔^𝑇 𝐔𝚺𝐕^𝑇
      = 𝐕𝚺²𝐕^𝑇

(using 𝐔^𝑇𝐔 = 𝐈, since 𝐔 is column-orthonormal)


Mathematics of SVD

SVD

𝐌^𝑇𝐌 = 𝐕𝚺²𝐕^𝑇

Thus, we have 𝐌^𝑇𝐌 = 𝐒 and 𝚺² = 𝚲
That is, 𝐕 is the matrix of eigenvectors of 𝐌^𝑇𝐌
𝚺 = 𝚲^(1/2) is the matrix of square roots of the eigenvalues of 𝐌^𝑇𝐌


Mathematics of SVD

SVD

What is the relationship between the eigenvalues of 𝐌𝐌^𝑇 and 𝐌^𝑇𝐌?
Suppose 𝐮 is an eigenvector of 𝐌𝐌^𝑇:

𝐌𝐌^𝑇𝐮 = 𝜆𝐮


Mathematics of SVD

SVD

We multiply both sides of the equation by 𝐌^𝑇 on the left:

𝐌^𝑇𝐌𝐌^𝑇𝐮 = 𝐌^𝑇𝜆𝐮
𝐌^𝑇𝐌(𝐌^𝑇𝐮) = 𝜆(𝐌^𝑇𝐮)

As long as 𝐌^𝑇𝐮 is not the zero vector 𝟎 it will be an eigenvector of 𝐌^𝑇𝐌, i.e. it is one of the 𝐯 vectors


Mathematics of SVD

SVD

The converse holds as well
Suppose 𝐯 is an eigenvector of 𝐌^𝑇𝐌:

𝐌^𝑇𝐌𝐯 = 𝜆𝐯


Mathematics of SVD

SVD

We multiply both sides of the equation by 𝐌 on the left:

𝐌𝐌^𝑇𝐌𝐯 = 𝐌𝜆𝐯
𝐌𝐌^𝑇(𝐌𝐯) = 𝜆(𝐌𝐯)

As long as 𝐌𝐯 is not the zero vector 𝟎 it will be an eigenvector of 𝐌𝐌^𝑇, i.e. it is one of the 𝐮 vectors


Mathematics of SVD

SVD

What happens when e.g. 𝐌𝐯 = 𝟎?

𝐌^𝑇𝐌𝐯 = 𝟎
𝐌^𝑇𝐌𝐯 = 𝜆𝐯
𝜆𝐯 = 𝟎

Since 𝐯 is not 𝟎 it must be that 𝜆 = 0


Mathematics of SVD

SVD

If the dimension of 𝐌^𝑇𝐌 (𝑚) is less than the dimension of 𝐌𝐌^𝑇 (𝑛): the eigenvalues of 𝐌𝐌^𝑇 are the eigenvalues of 𝐌^𝑇𝐌 plus additional zeros.
If the dimension of 𝐌^𝑇𝐌 (𝑚) is greater than the dimension of 𝐌𝐌^𝑇 (𝑛): the eigenvalues of 𝐌^𝑇𝐌 are the eigenvalues of 𝐌𝐌^𝑇 plus additional zeros.
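A quick numeric check of this relationship; the small matrix is an illustrative assumption:

```python
import numpy as np

# Example matrix with n = 3 rows and m = 2 columns (an illustrative assumption)
M = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

ev_small = np.sort(np.linalg.eigvalsh(M.T @ M))   # m = 2 eigenvalues
ev_big   = np.sort(np.linalg.eigvalsh(M @ M.T))   # n = 3 eigenvalues

# The larger matrix has the same non-zero eigenvalues plus extra zeros
assert np.allclose(ev_big[-2:], ev_small)
assert abs(ev_big[0]) < 1e-9
```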

𝐔 is the matrix of eigenvectors of 𝐌𝐌^𝑇.
𝚺 is the matrix of the square roots of the non-zero eigenvalues of 𝐌𝐌^𝑇 and 𝐌^𝑇𝐌 (their eigenvalues are equal, up to additional zeros).
𝐕 is the matrix of eigenvectors of 𝐌^𝑇𝐌.
This also gives an algorithm for calculating the decomposition: compute the eigenvalues and eigenvectors of 𝐌^𝑇𝐌 and 𝐌𝐌^𝑇.
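This eigenvalue route can be cross-checked against a library SVD on the ratings matrix of the running example; `np.linalg.svd` is used here only for comparison:

```python
import numpy as np

# The user-movie ratings matrix from the running example
M = np.array([[1,1,1,0,0],
              [3,3,3,0,0],
              [4,4,4,0,0],
              [5,5,5,0,0],
              [0,2,0,4,4],
              [0,0,0,5,5],
              [0,1,0,2,2]], dtype=float)

# Squared singular values equal the eigenvalues of M^T M (sorted descending)
sigma = np.linalg.svd(M, compute_uv=False)
lam = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]

# clip guards against tiny negative round-off in the zero eigenvalues
assert np.allclose(sigma**2, np.clip(lam, 0, None), atol=1e-6)
```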

Mathematics of SVD

SVD Interpretation

What does 𝐌𝐌^𝑇 represent?
It is a square matrix with rows and columns corresponding to users.
Each element measures the overlap between two people based on their co-ratings of the movies: it is the sum of the products of their movie ratings.
It is a co-occurrence matrix.

𝐌 =
[ 1 1 1 0 0 ]
[ 3 3 3 0 0 ]
[ 4 4 4 0 0 ]
[ 5 5 5 0 0 ]
[ 0 2 0 4 4 ]
[ 0 0 0 5 5 ]
[ 0 1 0 2 2 ]

𝐌𝐌^𝑇 =
[  3   9  12  15   2   0   1 ]
[  9  27  36  45   6   0   3 ]
[ 12  36  48  60   8   0   4 ]
[ 15  45  60  75  10   0   5 ]
[  2   6   8  10  36  40  18 ]
[  0   0   0   0  40  50  20 ]
[  1   3   4   5  18  20   9 ]
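The co-occurrence entries can be reproduced directly; a minimal pure-Python sketch for the ratings example:

```python
# Computing the user-user co-occurrence matrix M M^T for the ratings example
M = [[1,1,1,0,0],
     [3,3,3,0,0],
     [4,4,4,0,0],
     [5,5,5,0,0],
     [0,2,0,4,4],
     [0,0,0,5,5],
     [0,1,0,2,2]]

def mmT(M):
    # Entry (i, j) is the dot product of user i's and user j's rating rows
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in M] for ri in M]

C = mmT(M)
assert C[0][1] == 9     # users 1 and 2 overlap on the first three movies
assert C[4][5] == 40    # users 5 and 6 overlap on the last two movies
assert C[6][6] == 9     # diagonal: sum of squared ratings of user 7
```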

What does 𝐌^𝑇𝐌 represent?
It is a square matrix with rows and columns corresponding to movies.
Each element measures the overlap between two movies based on their co-ratings by users: it is the sum of the products of the ratings that they got from different people.
It is a co-occurrence matrix.

𝐌 =
[ 1 1 1 0 0 ]
[ 3 3 3 0 0 ]
[ 4 4 4 0 0 ]
[ 5 5 5 0 0 ]
[ 0 2 0 4 4 ]
[ 0 0 0 5 5 ]
[ 0 1 0 2 2 ]

𝐌^𝑇𝐌 =
[ 51  51  51   0   0 ]
[ 51  56  51  10  10 ]
[ 51  51  51   0   0 ]
[  0  10   0  45  45 ]
[  0  10   0  45  45 ]

Dimensionality Reduction with SVD

SVD dimensionality reduction

How do we use SVD for dimensionality reduction?
We want to represent a very large matrix 𝐌 by its SVD components 𝐔, 𝚺, and 𝐕.
We interpreted the entries in 𝚺 as measures of the importance of concepts.
We might set the 𝑟 − 𝑘 smallest entries in 𝚺 to zero.
With this we eliminate the corresponding 𝑟 − 𝑘 columns of 𝐔 and 𝐕.

[ 1 1 1 0 0 ]       [ 0.13   0.02  −0.01 ]
[ 3 3 3 0 0 ]       [ 0.41   0.07  −0.03 ]
[ 4 4 4 0 0 ]       [ 0.55   0.09  −0.04 ]     [ 12.4  0    0   ]     [ 0.56   0.59  0.56   0.09   0.09 ]
[ 5 5 5 0 0 ]   =   [ 0.68   0.11  −0.05 ]  ×  [ 0     9.5  0   ]  ×  [ 0.12  −0.02  0.12  −0.69  −0.69 ]
[ 0 2 0 4 4 ]       [ 0.15  −0.59   0.65 ]     [ 0     0    1.3 ]     [ 0.40  −0.80  0.40   0.09   0.09 ]
[ 0 0 0 5 5 ]       [ 0.07  −0.73  −0.67 ]
[ 0 1 0 2 2 ]       [ 0.07  −0.29   0.32 ]

Now we set the smallest value in 𝚺 to 0 and eliminate the corresponding column of 𝐔 and row of 𝐕^𝑇:

[ 0.13   0.02 ]
[ 0.41   0.07 ]
[ 0.55   0.09 ]     [ 12.4  0   ]     [ 0.56   0.59  0.56   0.09   0.09 ]
[ 0.68   0.11 ]  ×  [ 0     9.5 ]  ×  [ 0.12  −0.02  0.12  −0.69  −0.69 ]
[ 0.15  −0.59 ]
[ 0.07  −0.73 ]
[ 0.07  −0.29 ]

[ 0.13   0.02 ]
[ 0.41   0.07 ]
[ 0.55   0.09 ]     [ 12.4  0   ]     [ 0.56   0.59  0.56   0.09   0.09 ]
[ 0.68   0.11 ]  ×  [ 0     9.5 ]  ×  [ 0.12  −0.02  0.12  −0.69  −0.69 ]  =
[ 0.15  −0.59 ]
[ 0.07  −0.73 ]
[ 0.07  −0.29 ]

[ 0.93  0.95  0.93  0.014  0.014 ]
[ 2.93  2.99  2.93  0.000  0.000 ]
[ 3.92  4.01  3.92  0.026  0.026 ]
[ 4.84  4.96  4.84  0.040  0.040 ]
[ 0.37  1.21  0.37  4.04   4.04  ]
[ 0.35  0.65  0.35  4.87   4.87  ]
[ 0.16  0.57  0.16  1.98   1.98  ]

Advanced: SVD minimizes the approximation error

Advanced: SVD dimensionality reduction

The resulting matrix is quite close to the original matrix.
Thus, this approach to dimensionality reduction seems to work quite well.
However, since we approximate 𝐌, we need to measure the approximation error.
We can pick among several measures for this error.
For the SVD we pick the Frobenius norm.

The Frobenius norm ||𝐌|| of a matrix 𝐌 is the square root of the sum of the squares of the elements of 𝐌:

||𝐌|| = √( ∑_{i=1}^{n} ∑_{j=1}^{m} m_{ij}² )
      = √tr(𝐌^𝑇𝐌)
      = √( ∑_{i=1}^{min(n,m)} 𝜆_i )
      = √( ∑_{i=1}^{min(n,m)} 𝜎_i² )
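These identities are easy to verify numerically on the example matrix:

```python
import numpy as np

M = np.array([[1,1,1,0,0],
              [3,3,3,0,0],
              [4,4,4,0,0],
              [5,5,5,0,0],
              [0,2,0,4,4],
              [0,0,0,5,5],
              [0,1,0,2,2]], dtype=float)

frob_direct = np.sqrt((M ** 2).sum())                               # entry-wise
frob_trace  = np.sqrt(np.trace(M.T @ M))                            # via trace
frob_sigma  = np.sqrt((np.linalg.svd(M, compute_uv=False)**2).sum())  # via sigmas

assert np.isclose(frob_direct, frob_trace)
assert np.isclose(frob_direct, frob_sigma)
assert np.isclose(frob_direct ** 2, 248.0)   # sum of the squared ratings
```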

Now suppose we want to approximate 𝐌 with a matrix 𝐌′ of rank 𝑘 < 𝑟 such that ||𝐌 − 𝐌′|| is minimal.
Thus, we minimize the Frobenius norm of the difference between the original matrix and its approximation.
The Eckart-Young theorem states that the solution to this problem is of the form (the proof is complicated):

𝐌′ = 𝐔𝚺′𝐕^𝑇

𝚺′ is the same matrix as 𝚺 except that it contains only the 𝑘 largest singular values; the 𝑟 − 𝑘 smallest values are replaced by zero.

𝐌 − 𝐌′ = 𝐔(𝚺 − 𝚺′)𝐕^𝑇

||𝐌 − 𝐌′|| = √( ∑_{i=1}^{min(n,m)} (𝜎_i − 𝜎′_i)² )

The singular values of this difference matrix are kept in 𝚺 − 𝚺′.
These differences are zero for the 𝑘 singular values that we choose to keep.
They are non-zero for the 𝑟 − 𝑘 singular values that we set to zero.
Thus, to minimize the Frobenius norm we should set the smallest values to zero.
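This identity is easy to confirm numerically for the running example with k = 2:

```python
import numpy as np

M = np.array([[1,1,1,0,0],
              [3,3,3,0,0],
              [4,4,4,0,0],
              [5,5,5,0,0],
              [0,2,0,4,4],
              [0,0,0,5,5],
              [0,1,0,2,2]], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
M2 = (U[:, :k] * s[:k]) @ Vt[:k]

# ||M - M'||_F equals the root of the sum of squares of the dropped sigmas
err = np.linalg.norm(M - M2)          # Frobenius norm by default for matrices
assert np.isclose(err, np.sqrt((s[k:] ** 2).sum()))
```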

Advanced: SVD minimizes the approximation error

SVD dimensionality reduction

How many singular values should we keep?
A useful rule of thumb is to keep enough singular values to make up 90% of the energy in 𝚺.
We define energy as the sum of the squares of the singular values:

∑_{i=1}^{min(n,m)} 𝜎_i²

In the example the total energy is:

12.4² + 9.5² + 1.3² = 245.7

By removing the smallest singular value (12.4² + 9.5² = 244.01) we keep 99% of the energy.
By also removing the second smallest singular value (12.4² = 153.76) we would keep only 63% of the energy.
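The bookkeeping can be sketched directly from the rounded singular values on the slide:

```python
# Energy retained when truncating the singular values from the example
sigmas = [12.4, 9.5, 1.3]          # values as rounded on the slide
energy = sum(s ** 2 for s in sigmas)

keep_two = (12.4 ** 2 + 9.5 ** 2) / energy   # drop only sigma_3
keep_one = (12.4 ** 2) / energy              # drop sigma_2 and sigma_3

assert round(energy, 1) == 245.7
assert keep_two > 0.99             # dropping sigma_3 keeps ~99% of the energy
assert 0.62 < keep_one < 0.63      # dropping sigma_2 as well keeps only ~63%
```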

Latent Semantic Indexing

SVD of a term-document matrix

The Vector Space Model cannot cope with two classic problems arising in natural languages.
Synonymy: two words having the same meaning, e.g. "car" and "automobile".
Synonymous words get separate dimensions in the VSM.
The model would underestimate the similarity of a document containing both "car" and "automobile" to a query containing only "car".
Polysemy: one word having multiple meanings, e.g. "bank" may mean a financial institution or a river bank.

Another problem is the dimension of the term-document matrix.
In latent semantic analysis (LSA), or latent semantic indexing (LSI), we use SVD to create a low-rank approximation of the term-document matrix.
We select the 𝑘 largest singular values and create an approximation 𝐌_𝑘 to the original matrix.

With SVD we map each term/document to latent topics.
In practice, however, the interpretation is rather difficult.
By computing a low-rank approximation of the original term-document matrix, the SVD brings together terms with similar co-occurrences.
Retrieval quality may actually be improved by the approximation!
Projecting onto a smaller number of dimensions brings similar documents closer (signal-to-noise ratio).
Retrieval works by folding the query into the low-rank space:

𝐪_𝑘 = 𝚺⁻¹𝐔^𝑇𝐪
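A sketch of query fold-in, reading the formula with the rank-k truncated factors; the tiny term-document matrix and the query are illustrative assumptions:

```python
import numpy as np

# A tiny term-document matrix (terms x documents); an illustrative assumption
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk = U[:, :k], s[:k]

# Fold a query (a term-count vector) into the k-dimensional concept space:
# q_k = Sigma_k^-1 U_k^T q
q = np.array([1, 0, 0, 1], dtype=float)   # hypothetical query terms
qk = np.diag(1.0 / sk) @ Uk.T @ q

# The folded query is a k-dimensional vector in the same space as the documents
assert qk.shape == (k,)
```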

As we reduce 𝑘, recall improves.
A value of 𝑘 in the low hundreds tends to increase precision as well (this suggests that a suitable 𝑘 addresses some of the challenges of synonymy).
LSI works best in applications where there is little overlap between documents and the query.
LSI can be viewed as a soft clustering method.
Each concept is a cluster, and the value that a document has at that concept is its fractional membership in that concept.

Latent Semantic Indexing

LSA: Example

Technical memo titles from two different collections: the first about HCI, the second about graph theory.

Example from Melanie Martin's AI Seminar.

c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement

m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

Term       c1  c2  c3  c4  c5  m1  m2  m3  m4
human       1   0   0   1   0   0   0   0   0
interface   1   0   1   0   0   0   0   0   0
computer    1   1   0   0   0   0   0   0   0
user        0   1   1   0   1   0   0   0   0
system      0   1   1   2   0   0   0   0   0
response    0   1   0   0   1   0   0   0   0
time        0   1   0   0   1   0   0   0   0
EPS         0   0   1   1   0   0   0   0   0
survey      0   1   0   0   0   0   0   0   1
trees       0   0   0   0   0   1   1   1   0
graph       0   0   0   0   0   0   1   1   1
minors      0   0   0   0   0   0   0   1   1

We would expect that human is similar to user but not to minors in this context.
Correlation coefficient (covariance normalized to the interval [−1, 1]):

r(human, user) = −0.37796
r(human, minors) = −0.28571
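These correlations can be reproduced from the raw term rows of the term-document matrix; a pure-Python sketch:

```python
# Pearson correlation between term rows of the raw term-document matrix
human  = [1, 0, 0, 1, 0, 0, 0, 0, 0]
user   = [0, 1, 1, 0, 1, 0, 0, 0, 0]
minors = [0, 0, 0, 0, 0, 0, 0, 1, 1]

def pearson(x, y):
    # Covariance of x and y normalized by the two standard deviations
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

assert round(pearson(human, user), 5) == -0.37796
assert round(pearson(human, minors), 5) == -0.28571
```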

               1      2      3      4      5      6      7      8      9
human      -0.22  -0.11   0.29  -0.41  -0.11  -0.34  -0.52   0.06   0.41
interface  -0.20  -0.07   0.14  -0.55   0.28   0.50   0.07   0.01   0.11
computer   -0.24   0.04  -0.16  -0.60  -0.11  -0.26   0.30  -0.06  -0.49
user       -0.40   0.06  -0.34   0.10   0.33   0.38  -0.00   0.00  -0.01
system     -0.64  -0.17   0.36   0.33  -0.16  -0.21   0.17  -0.03  -0.27
response   -0.27   0.11  -0.43   0.07   0.08  -0.17  -0.28   0.02   0.05
time       -0.27   0.11  -0.43   0.07   0.08  -0.17  -0.28   0.02   0.05
EPS        -0.30  -0.14   0.33   0.19   0.11   0.27  -0.03   0.02   0.17
survey     -0.21   0.27  -0.18  -0.03  -0.54   0.08   0.47   0.04   0.58
trees      -0.01   0.49   0.23   0.02   0.59  -0.39   0.29  -0.25   0.23
graph      -0.04   0.62   0.22   0.00  -0.07   0.11  -0.16   0.68  -0.23
minors     -0.03   0.45   0.14  -0.01  -0.30   0.28  -0.34  -0.68  -0.18

Table: 𝐔

𝚺 = diag(3.34, 2.54, 2.35, 1.64, 1.50, 1.31, 0.85, 0.56, 0.36)

Table: 𝚺

        1      2      3      4      5      6      7      8      9
c1  -0.20  -0.61  -0.46  -0.54  -0.28  -0.00  -0.01  -0.02  -0.08
c2  -0.06   0.17  -0.13  -0.23   0.11   0.19   0.44   0.62   0.53
c3   0.11  -0.50   0.21   0.57  -0.51   0.10   0.19   0.25   0.08
c4  -0.95  -0.03   0.04   0.27   0.15   0.02   0.02   0.01  -0.02
c5   0.05  -0.21   0.38  -0.21   0.33   0.39   0.35   0.15  -0.60
m1  -0.08  -0.26   0.72  -0.37   0.03  -0.30  -0.21   0.00   0.36
m2  -0.18   0.43   0.24  -0.26  -0.67   0.34   0.15  -0.25  -0.04
m3   0.01  -0.05  -0.01   0.02   0.06  -0.45   0.76  -0.45   0.07
m4   0.06  -0.24  -0.02   0.08   0.26   0.62  -0.02  -0.52   0.45

Table: 𝐕

Rank-2 approximation of the term-document matrix:

Term         c1    c2     c3     c4    c5     m1     m2     m3     m4
human      0.16  0.40   0.38   0.47  0.18  -0.05  -0.12  -0.16  -0.09
interface  0.14  0.37   0.33   0.40  0.16  -0.03  -0.07  -0.10  -0.04
computer   0.15  0.51   0.36   0.41  0.24   0.02   0.06   0.09   0.12
user       0.26  0.84   0.61   0.70  0.39   0.03   0.08   0.12   0.19
system     0.45  1.23   1.05   1.27  0.56  -0.07  -0.15  -0.21  -0.05
response   0.16  0.58   0.38   0.42  0.28   0.06   0.13   0.19   0.22
time       0.16  0.58   0.38   0.42  0.28   0.06   0.13   0.19   0.22
EPS        0.22  0.55   0.51   0.63  0.24  -0.07  -0.14  -0.20  -0.11
survey     0.10  0.53   0.23   0.21  0.27   0.14   0.31   0.44   0.42
trees     -0.06  0.23  -0.14  -0.27  0.14   0.24   0.55   0.77   0.66
graph     -0.06  0.34  -0.15  -0.30  0.20   0.31   0.69   0.98   0.85
minors    -0.04  0.25  -0.10  -0.21  0.15   0.22   0.50   0.71   0.62

r(human, user) = 0.9385
r(human, minors) = −0.8309

LSA brought together human and user through co-occurrences.
Also, the dissimilarity between human and minors is now stronger.
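The whole pipeline can be sketched with NumPy; the loose bounds in the assertions reflect the values reported above:

```python
import numpy as np

# The 12 x 9 term-document matrix from the example (terms in the order listed)
A = np.array([
    [1,0,0,1,0,0,0,0,0],  # human
    [1,0,1,0,0,0,0,0,0],  # interface
    [1,1,0,0,0,0,0,0,0],  # computer
    [0,1,1,0,1,0,0,0,0],  # user
    [0,1,1,2,0,0,0,0,0],  # system
    [0,1,0,0,1,0,0,0,0],  # response
    [0,1,0,0,1,0,0,0,0],  # time
    [0,0,1,1,0,0,0,0,0],  # EPS
    [0,1,0,0,0,0,0,0,1],  # survey
    [0,0,0,0,0,1,1,1,0],  # trees
    [0,0,0,0,0,0,1,1,1],  # graph
    [0,0,0,0,0,0,0,1,1],  # minors
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A2 = (U[:, :2] * s[:2]) @ Vt[:2]       # rank-2 LSA approximation

r = np.corrcoef(A2)                    # correlations between term rows
human, user, minors = 0, 3, 11
assert r[human, user] > 0.9            # reported above as 0.9385
assert r[human, minors] < -0.8         # reported above as -0.8309
```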
