Classification With Low Rank and Missing Data

Elad Hazan 1,2, Roi Livni 1,3 and Yishay Mansour 1,4

1 Microsoft Research, Herzliya, Israel. 2 Princeton University. 3 The Hebrew University, Israel. 4 Tel Aviv University, Israel.


A DISCRIMINATIVE APPROACH

• We devise a “scalar product” between examples with missing attributes:

  $\phi_\gamma(x_{o_i}) \cdot \phi_\gamma(x_{o_j}) \;=\; \frac{1 - |o_i \cap o_j|^{\gamma}}{1 - |o_i \cap o_j|} \sum_{k \in o_i \cap o_j} x_{o_i,k}\, x_{o_j,k}$

• Return $f_S(x_o) := \sum_i \alpha_i\, \phi_\gamma(x_{o_i}) \cdot \phi_\gamma(x_o)$, where the coefficients $\alpha_i$ are chosen to

  $\text{minimize} \;\; C\,\|f_S\|^2 + \frac{1}{m} \sum_{i=1}^{m} \ell\big(f_S(x_{o_i}),\, y_i\big)$

We devise a simple yet powerful algorithm that can learn a class of linear predictors coupled with linear reconstructions (a code sketch follows below).
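Below is a minimal NumPy/scikit-learn sketch of this construction, not the authors' released code: karma_kernel evaluates the scalar product above between partially observed rows, and a precomputed-kernel SVM stands in for the regularized loss minimization (scikit-learn's hinge-loss SVC with its own C parameterization, rather than the exact objective above). The synthetic data, the mask O, and γ = 3 are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def karma_kernel(X1, O1, X2, O2, gamma=3):
    """phi_gamma(x_{o_i}) . phi_gamma(x_{o_j}) for all pairs of rows.

    X1, X2 hold attribute values (missing entries may hold anything, they are
    masked out); O1, O2 are boolean masks of the observed attributes."""
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for i in range(X1.shape[0]):
        for j in range(X2.shape[0]):
            common = O1[i] & O2[j]                      # o_i intersect o_j
            c = int(common.sum())
            dot = float(X1[i, common] @ X2[j, common])  # sum over k in o_i ∩ o_j
            # geometric factor (1 - c^gamma)/(1 - c) = 1 + c + ... + c^(gamma-1); equals gamma at c = 1
            factor = gamma if c == 1 else (1 - c ** gamma) / (1 - c)
            K[i, j] = factor * dot
    return K

# Toy usage: rank-3 data with attributes missing completely at random.
rng = np.random.default_rng(0)
X_full = rng.standard_normal((80, 3)) @ rng.standard_normal((3, 12))
O = rng.random(X_full.shape) > 0.3            # True where an attribute is observed
X = np.where(O, X_full, 0.0)
y = np.sign(X_full @ rng.standard_normal(12))
K = karma_kernel(X, O, X, O, gamma=3)
clf = SVC(kernel="precomputed", C=1.0).fit(K, y)
print(clf.score(K, y))                        # training accuracy of the kernelized classifier
```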

INTRODUCTION

• We consider classification tasks with missing data and assume the data resides in a low-rank subspace.
• We avoid any explicit matrix completion; instead, we tackle the classification problem directly.
• Our main result is an efficient algorithm for linear and kernel classification that performs as well as the best classifier that has access to all the data.

PREVIOUS WORK

The common approach to missing data is to impute the missing values or to fit a generative model.
• These are hard, often ill-posed problems.
• A wrong completion leads to bad classification.


“One should solve the classification problem directly and never solve a more general problem as an intermediate step.”
Vladimir Vapnik [3]

[Figure: a ±1 data matrix with missing entries marked *, and the entry to be predicted marked ?, shown next to the fully observed matrix.]


PROBLEM AND MODEL ASSUMPTIONS

• A sample $\{(x_i, o_i, y_i)\}_{i=1}^{m}$ is drawn, where $o_i$ is the set of observed attributes.
• The learner is provided with the sample $S = \{(x_{o_i}, y_i)\}_{i=1}^{m}$, where $x_{o_i,j} = x_{i,j}$ if $j \in o_i$, and $x_{o_i,j} = *$ (missing) otherwise.
• Objective: output a target function $f_S(x_o)$ that competes with the best full-knowledge linear classifier:

  $E\big[\ell(f_S(x_o), y)\big] \;\le\; \min_{\|w\| < 1} E\big[\ell(w \cdot x, y)\big] + \epsilon$

Model Assumptions:
• Low rank: $x$ can be linearly reconstructed from $x_o$ (see the sketch below).
• Regularity: the reconstruction matrices have singular values $\lambda_j > \lambda > 0$.
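A small NumPy sketch of what the low-rank assumption buys (the subspace $V$, the observed set, and all values below are illustrative, not taken from the poster): when the data lies in a rank-$r$ subspace and enough attributes are observed, the whole vector, including its unobserved attributes, is a linear function of the observed ones.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 10, 3
V = rng.standard_normal((d, r))            # basis of the (unknown) rank-r subspace
x = V @ rng.standard_normal(r)             # a full data point, x = V z
o = np.array([0, 2, 3, 5, 6, 8])           # indices of the observed attributes

# Recover z from the observed coordinates only, then reconstruct all of x.
# x_o = V_o z, so z = V_o^+ x_o; this is well conditioned exactly when the
# singular values of V_o are bounded away from zero (the regularity assumption).
z_hat, *_ = np.linalg.lstsq(V[o], x[o], rcond=None)
x_hat = V @ z_hat
print(np.max(np.abs(x_hat - x)))           # ~1e-15: the missing attributes are recovered exactly
```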

ALGORITHM-KARMA


KARMA (Kernelized Algorithm for Risk Minimization with Missing Attributes) is the kernelized risk minimization defined by the scalar product above.

GUARANTEES

• The KARMA algorithm runs in time polynomial in the sample size and the dimension.
• Quasi-polynomial sample complexity: choose $\gamma(\epsilon) > \log(1/\epsilon)/\lambda$ and let $\Gamma(\epsilon) := \frac{d^{\gamma(\epsilon)+1} - d}{d - 1}$. If $m \in \Omega\big(\Gamma(\epsilon)^2 \log(1/\delta) / \epsilon^2\big)$, then the output $f_S$ achieves the objective (a numeric illustration follows below).
• Polynomial sample complexity holds under further large-margin assumptions.
• The dependence on $\lambda$ may be improved.
• Online regret bounds may be derived.
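To get a feel for the quasi-polynomial bound, here is the growth rate evaluated at arbitrary illustrative values; the $\Omega(\cdot)$ hides constants, so this is the scaling, not an exact sample count.

```python
import math

d, lam, eps, delta = 10, 0.5, 0.1, 0.05           # illustrative values only
gamma = math.ceil(math.log(1 / eps) / lam)        # gamma(eps) > log(1/eps)/lambda  ->  5
Gamma = (d ** (gamma + 1) - d) // (d - 1)         # Gamma(eps) = d + d^2 + ... + d^gamma  ->  111110
m_bound = Gamma ** 2 * math.log(1 / delta) / eps ** 2
print(gamma, Gamma, f"{m_bound:.2e}")             # 5 111110 3.70e+12
```

Since $\Gamma(\epsilon) = d + d^2 + \cdots + d^{\gamma(\epsilon)}$ grows like $d^{\gamma(\epsilon)}$, this factor is what makes the bound quasi-polynomial rather than polynomial.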

EXPERIMENTS

• We conducted two types of toy experiments, with two noise models: MCAR (missing completely at random) and three-blocks features.
• We also conducted experiments on real data; the results are reported in the tables below.

Multi-Class
name        Karma   0-imp   mcb
cleveland   0.44    0.42    0.48
Dermatol    0.03    0.04    0.04
movielens   0.81    0.87    0.86

Regression
name        Karma   0-imp   mc
jester      0.23    0.24    0.27
books       0.25    0.25    0.25
movielens   0.16    0.22    0.25

Binary Labeling
name        Karma   0-imp   mc
horses      0.35    0.36    0.37
Bands       0.24    0.34    0.40
movielens   0.22    0.26    0.28
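For reference, here is a sketch of the simplest baseline in the tables, assuming “0-imp” denotes zero imputation (the column names are not expanded on the poster): every missing attribute is replaced by 0 and a standard linear classifier is trained. The “mc”/“mcb” columns presumably refer to matrix-completion-based baselines.

```python
import numpy as np
from sklearn.svm import LinearSVC

def zero_imputation_baseline(X, O, y):
    """Replace every unobserved attribute by 0, then fit an ordinary linear SVM."""
    X0 = np.where(O, X, 0.0)           # O is a boolean mask of observed attributes
    return LinearSVC(C=1.0).fit(X0, y)
```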

[Figure: projection to the observable attributes; the unobserved attributes can be linearly reconstructed from the observed ones.]

GOOD KARMA

KARMA is a simple algorithm with expressive power.
• Objective: find the best $(V, w)$ that predicts $f(x_o) = w_o^{\top} \big((V V^{\top})_o\big)^{\dagger} x_o$.
• Approximation: $f(x_o) \approx w_o^{\top} \sum_{k=1}^{\gamma} \big((V V^{\top})_o\big)^{k} x_o + O(\lambda^{\gamma})$ (see the sketch below).
• Improper learning: $\phi_\gamma$ embeds $x_o$ in a Hilbert space that contains all Taylor approximations of order $\gamma$.
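A NumPy sketch of the mechanism behind the approximation bullet, with illustrative values and the shorthand $M = (V V^{\top})_o$ (the subspace, observed set, and truncation degrees below are assumptions, not the authors' code): when the nonzero eigenvalues of $M$ are bounded below, the truncated Neumann series $\sum_{k=0}^{\gamma-1} (I - M)^k x_o$ is one concrete low-degree polynomial in $M$ whose error against the pseudo-inverse $M^{\dagger} x_o$ decays geometrically with $\gamma$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 8, 3
V, _ = np.linalg.qr(rng.standard_normal((d, r)))   # orthonormal columns, so V V^T is a projection
o = np.array([0, 1, 3, 4, 6])                      # observed attributes
M = (V @ V.T)[np.ix_(o, o)]                        # M = (V V^T)_o, PSD with nonzero eigenvalues in (0, 1]
x_o = (V @ rng.standard_normal(r))[o]              # observed part of a point lying in the subspace

exact = np.linalg.pinv(M) @ x_o                    # ((V V^T)_o)^† x_o
for gamma in (5, 25, 100):
    approx, p = np.zeros_like(x_o), x_o.copy()
    for _ in range(gamma):                         # sum_{k=0}^{gamma-1} (I - M)^k x_o
        approx += p
        p -= M @ p
    print(gamma, np.linalg.norm(exact - approx))   # error shrinks geometrically as gamma grows
```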


[Figure: a matrix with rows $A_1, A_2, \ldots, A_d$; $A_o$ denotes its restriction to the rows indexed by the observed set $o$.]

REFERENCES

• [1] Goldberg, Andrew B., Zhu, Xiaojin, Recht, Ben, Xu, Jun-Ming, and Nowak, Robert D. Transduction with matrix completion: Three birds with one stone. (2010)

• [2] Alcalá-Fdez, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., and Herrera, F. KEEL data-mining software tool: Data set repository.

• [3] Vapnik, V. N. Statistical Learning Theory (1998).

