
Understanding Feature Space in Machine Learning


Page 1: Understanding Feature Space in Machine Learning


Understanding Feature Space in Machine Learning

Alice Zheng, Dato
September 9, 2015

Page 2: Understanding Feature Space in Machine Learning

My journey so far

Applied machine learning

(Data science)

Build ML tools

Shortage of experts and good tools.

Page 3: Understanding Feature Space in Machine Learning

Why machine learning?

Model data. Make predictions. Build intelligent applications.

Page 4: Understanding Feature Space in Machine Learning


The machine learning pipeline

I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …

Raw data → Features → Models → Predictions → Deploy in production

Page 5: Understanding Feature Space in Machine Learning

Feature = numeric representation of raw data

Page 6: Understanding Feature Space in Machine Learning


Representing natural text

It is a puppy and it is extremely cute.

What’s important? Phrases? Specific words? Ordering?

Subject, object, verb?

Classify: puppy or not?

Raw Text

{“it”:2, “is”:2, “a”:1, “puppy”:1, “and”:1, “extremely”:1, “cute”:1 }

Bag of Words
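As a rough sketch of how such a count can be computed (not necessarily the exact tokenization used in the talk), in Python:

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, strip basic punctuation, split on whitespace, count tokens."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    return Counter(tokens)

print(bag_of_words("It is a puppy and it is extremely cute."))
# Counter({'it': 2, 'is': 2, 'a': 1, 'puppy': 1, 'and': 1, 'extremely': 1, 'cute': 1})
```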

Page 7: Understanding Feature Space in Machine Learning


Representing natural text

It is a puppy and it is extremely cute.

Classify: puppy or not?

Raw Text → Bag of Words

it 2

they 0

I 1

am 0

how 0

puppy 1

and 1

cat 0

aardvark 0

cute 1

extremely 1

… …

Sparse vector representation
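As a minimal sketch of this idea (the vocabulary below is a made-up illustration, not the talk's actual vocabulary), only the nonzero dimensions need to be stored:

```python
# Bag-of-words counts for "It is a puppy and it is extremely cute."
counts = {"it": 2, "is": 2, "a": 1, "puppy": 1, "and": 1, "extremely": 1, "cute": 1}

# Fixed vocabulary shared by all documents; positions are the feature dimensions.
vocab = {"it": 0, "they": 1, "I": 2, "am": 3, "how": 4, "puppy": 5,
         "and": 6, "cat": 7, "aardvark": 8, "cute": 9, "extremely": 10}

def to_sparse_vector(counts, vocab):
    """Keep only the nonzero dimensions as {feature index: count}."""
    return {vocab[word]: n for word, n in counts.items() if word in vocab}

print(to_sparse_vector(counts, vocab))
# {0: 2, 5: 1, 6: 1, 9: 1, 10: 1} -- every other dimension is implicitly 0
```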

Page 8: Understanding Feature Space in Machine Learning


Representing images

Image source: “Recognizing and learning object categories,” Li Fei-Fei, Rob Fergus, Anthony Torralba, ICCV 2005—2009.

Raw image: millions of RGB triplets, one for each pixel

Classify: person or animal?

Raw Image → Bag of Visual Words

Page 9: Understanding Feature Space in Machine Learning

Representing images

Classify: person or animal?

Raw Image → Deep learning features

(Figure: a vector of real-valued deep learning features)

Dense vector representation
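To contrast with the sparse bag-of-words vector, here is a tiny sketch of a dense feature vector (the numbers are illustrative placeholders, not the actual values from the slide): essentially every dimension holds a nonzero real value.

```python
import numpy as np

# A dense, deep-learning-style feature vector (placeholder values).
deep_features = np.array([3.3, -1.5, 0.24, 4.8, -0.9, 7.1, -1.9, 3.6, 2.8, 9.5])

print(deep_features.shape)              # (10,)
print(np.count_nonzero(deep_features))  # 10 -- no zeros, unlike the sparse BoW vector
```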

Page 10: Understanding Feature Space in Machine Learning

Feature space in machine learning
• Raw data → high-dimensional vectors
• Collection of data points → point cloud in feature space
• Model = geometric summary of point cloud
• Feature engineering = creating features of the appropriate granularity for the task
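As a tiny illustration of "model = geometric summary" (made-up coordinates, and a deliberately crude summary): stack the feature vectors into a matrix, treat the rows as a point cloud, and summarize it geometrically, e.g. by its centroid.

```python
import numpy as np

# A toy point cloud: 4 data points in a 2-dimensional feature space.
X = np.array([[1.0, 2.0],
              [1.2, 1.8],
              [5.0, 5.5],
              [4.8, 6.1]])

# One very crude geometric summary of the cloud: its centroid.
print(X.mean(axis=0))  # [3.   3.85]
```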

Page 11: Understanding Feature Space in Machine Learning

Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes.

-- Masha Gessen, “Perfect Rigor”

Page 12: Understanding Feature Space in Machine Learning

Algebra vs. Geometry

Algebra: the Pythagorean theorem, a² + b² = c²
Geometry: a right triangle with legs a, b and hypotenuse c (Euclidean space)

Page 13: Understanding Feature Space in Machine Learning

Visualizing a sphere in 2D

x² + y² = 1

Pythagorean theorem: a² + b² = c²

(Figure: the unit circle in the x-y plane, radius 1)

Page 14: Understanding Feature Space in Machine Learning

Visualizing a sphere in 3D

x² + y² + z² = 1

(Figure: the unit sphere with axes x, y, z, radius 1)

Page 15: Understanding Feature Space in Machine Learning

Visualizing a sphere in 4D

x² + y² + z² + t² = 1

(Figure: 3D rendering with axes x, y, z, radius 1)
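The same algebraic pattern extends to any number of dimensions, even though we can no longer draw it: the unit sphere in d-dimensional feature space is simply the set of points satisfying

$$x_1^2 + x_2^2 + \cdots + x_d^2 = 1.$$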

Page 16: Understanding Feature Space in Machine Learning

Why are we looking at spheres?

(Figure: several everyday objects, each shown "equal" to a sphere)

Poincaré Conjecture: any physical object without holes is "equivalent" to a sphere.

Page 17: Understanding Feature Space in Machine Learning

The power of higher dimensions
• A sphere in 4D can model the birth and death process of physical objects
• Point clouds = approximate geometric shapes
• High-dimensional features can model many things

Page 18: Understanding Feature Space in Machine Learning

Visualizing Feature Space

Page 19: Understanding Feature Space in Machine Learning

The challenge of high-dimensional geometry
• Feature space can have hundreds to millions of dimensions
• In high dimensions, our geometric imagination is limited
  - Algebra comes to our aid

Page 20: Understanding Feature Space in Machine Learning

Visualizing bag-of-words

(Figure: 2D plot with axes "puppy" and "cute"; the sentence below maps to the point (1, 1))

I have a puppy and it is extremely cute

it 1

they 0

I 1

am 0

how 0

puppy 1

and 1

cat 0

aardvark 0

zebra 0

cute 1

extremely 1

… …

Page 21: Understanding Feature Space in Machine Learning

Visualizing bag-of-words

(Figure: 3D plot with axes "puppy", "cute", and "extremely"; each sentence maps to the point given by its word counts)

I have a puppy and it is extremely cute
I have an extremely cute cat
I have a cute puppy
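A small sketch of how these coordinates arise (tokenization simplified for illustration): each sentence's counts of "puppy", "cute", and "extremely" give its position in this 3-dimensional slice of feature space.

```python
from collections import Counter

axes = ["puppy", "cute", "extremely"]
sentences = [
    "I have a puppy and it is extremely cute",
    "I have an extremely cute cat",
    "I have a cute puppy",
]

for s in sentences:
    counts = Counter(s.lower().split())
    point = tuple(counts[w] for w in axes)  # missing words count as 0
    print(point, "<-", s)
# (1, 1, 1) <- I have a puppy and it is extremely cute
# (0, 1, 1) <- I have an extremely cute cat
# (1, 1, 0) <- I have a cute puppy
```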

Page 22: Understanding Feature Space in Machine Learning

Document point cloud

(Figure: documents plotted as a point cloud over axes "word 1" and "word 2")

Page 23: Understanding Feature Space in Machine Learning

What is a model?
• Model = mathematical "summary" of data
• What's a summary?
  - A geometric shape

Page 24: Understanding Feature Space in Machine Learning

Classification model

(Figure: two groups of points over axes Feature 1 and Feature 2, separated by a decision boundary)

Decide between two classes
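As a hedged sketch (using scikit-learn, which the talk does not reference): a linear classifier summarizes such a point cloud with a line, or in higher dimensions a hyperplane, that separates the two classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 2-D point cloud with two classes (made-up coordinates).
X = np.array([[1.0, 1.2], [1.5, 0.8], [0.9, 1.6],   # class 0
              [4.0, 4.2], [4.5, 3.8], [3.9, 4.6]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
# The learned coefficients define the separating line between the two groups.
print(clf.coef_, clf.intercept_)
print(clf.predict([[1.0, 1.0], [4.0, 4.0]]))  # [0 1]
```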

Page 25: Understanding Feature Space in Machine Learning

Clustering model

(Figure: points over axes Feature 1 and Feature 2, gathered into tight groups)

Group data points tightly
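A similar sketch for clustering (scikit-learn again, purely illustrative): k-means summarizes the point cloud with k cluster centers and assigns each point to its nearest center.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D point cloud with two visible groups (made-up coordinates).
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.3],
              [5.0, 5.2], [5.3, 4.8], [4.9, 5.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one center per tight group
print(km.labels_)           # cluster assignment for each point
```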

Page 26: Understanding Feature Space in Machine Learning

Regression model

(Figure: points plotted as Target vs. Feature, with a curve fitted through them)

Fit the target values
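And a sketch for regression (scikit-learn, illustrative): the model is the line fitted through the target values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: target is roughly 2 * feature + 1 (made-up values).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # roughly [2.0] and 1.0
print(reg.predict([[5.0]]))       # roughly [11.0]
```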

Page 27: Understanding Feature Space in Machine Learning

Visualizing Feature Engineering

Page 28: Understanding Feature Space in Machine Learning

When does bag-of-words fail?

(Figure: 3D plot with axes "puppy", "cat", and "have"; the documents below plotted by their word counts)

I have a puppy
I have a cat
I have a kitten
I have a dog and I have a pen

Task: find a surface that separates documents about dogs vs. cats
Problem: the word "have" adds fluff instead of information

Page 29: Understanding Feature Space in Machine Learning

Improving on bag-of-words
• Idea: "normalize" word counts so that popular words are discounted
• Term frequency (tf) = number of times a term appears in a document
• Inverse document frequency of a word (idf) = log(N / number of documents containing the word)
• N = total number of documents
• Tf-idf count = tf × idf
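A minimal sketch of this computation on the talk's four toy documents (natural log, no smoothing; production implementations such as scikit-learn's TfidfTransformer add smoothing terms):

```python
import math
from collections import Counter

docs = [
    "I have a puppy",
    "I have a cat",
    "I have a kitten",
    "I have a dog and I have a pen",
]
tokenized = [d.lower().split() for d in docs]
N = len(docs)

def idf(word):
    # Document frequency = number of documents containing the word.
    df = sum(word in toks for toks in tokenized)
    return math.log(N / df)

def tfidf(word, toks):
    return Counter(toks)[word] * idf(word)

print(idf("puppy"), idf("cat"), idf("have"))  # log 4, log 4, log 1 = 0
print(tfidf("have", tokenized[3]))            # 2 * 0 = 0: "have" is flattened
```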

Page 30: Understanding Feature Space in Machine Learning

From BOW to tf-idf

(Figure: the same four documents in the puppy-cat-have space)

I have a puppy
I have a cat
I have a kitten
I have a dog and I have a pen

idf(puppy) = log 4
idf(cat) = log 4
idf(have) = log 1 = 0

Page 31: Understanding Feature Space in Machine Learning

From BOW to tf-idf

(Figure: after tf-idf weighting, the "have" dimension collapses to 0; the documents spread along the puppy and cat axes, with a decision surface between them)

tfidf(puppy) = log 4
tfidf(cat) = log 4
tfidf(have) = 0

I have a puppy
I have a cat
I have a dog and I have a pen, I have a kitten

Tf-idf flattens uninformative dimensions in the BOW point cloud

Page 32: Understanding Feature Space in Machine Learning

Entry points of feature engineering
• Start from data and task
  - What's the best text representation for classification?
• Start from modeling method
  - What kind of features does k-means assume?
  - What does linear regression assume about the data?

Page 33: Understanding Feature Space in Machine Learning

That's not all, folks!
• There's a lot more to feature engineering:
  - Feature normalization
  - Feature transformations
  - "Regularizing" models
  - Learning the right features
• Dato is hiring! [email protected]@dato.com @RainyData