MACHINE LEARNING & APPLICATIONS - WordPress.com · Machine Learning & Applications Course Objectives: Understanding Human learning aspects. Understanding primitives and methods in

MACHINE LEARNING & APPLICATIONS

Introductory Videos

What is Machine Learning

Example of Machine Learning

Machine Learning - What Is Machine Learning.mp4

What is Machine Learning.mp4

Career opportunities

Search Engine companies: Google Yahoo Bing (Microsoft) Ask

Social network companies: Facebook Twitter Linkedin Instagram Tumblr

Engineering related companies:

Intel Oil industry IBM HCL Technologies Wipro Technologies Verizon Visa Boeing SAP Oracle

Financial related companies: Amazon Apple eBay EMC Bank of America Capital One Paypal GE Capital

Career opportunities

Data science vendors: Palantir Teradata Pixar SAS Alpine Labs Pivotal Tableau

More than 8000 companies are hiring data scientists.

First Lecture

Examination Scheme

Prerequisites

Course Objectives

Course Outcomes

Syllabus Mapping

Last Three Year Result

Examination Scheme

Machine Learning & Applications

Course Objectives:

Understanding Human learning aspects.

Understanding primitives and methods in learning process by computer.

Understanding the nature of problems solved with Machine Learning

Machine Learning & Applications

Course Outcomes:

Model the learning primitives.

Build the learning model.

Tackle real world problems in the domain of Data Mining and Big Data Analytics, Information Retrieval, Computer vision, Linguistics and Bioinformatics.

Text Books

Text Books

10 to 15


http://www-bcf.usc.edu/~gareth/

faculty.washington.edu/ dwitten/


web.stanford.edu/~hastie/

tatweb.stanford.edu/~tibs/


Syllabus Mapping

UNIT-I

Syllabus Mapping

UNIT-II

Syllabus Mapping

UNIT-III

Syllabus Mapping

UNIT-IV

Syllabus Mapping

UNIT-V

Syllabus Mapping

UNIT-VI

Result of Last Three Years

[Figure: bar chart "Result Analysis". x-axis: Academic Year (2017-18, 2016-17, 2015-16); y-axis: Result in % (0 to 100).]

Academic Year | Highest Marks | Name of the Subject Topper
2017-18 | 79 | Pagare Snehal
2016-17 | 79 | Prerana Bafna
2015-16 | 71 | Kothawade Priyanka

Unit 1

INTRODUCTION TO MACHINE LEARNING

Introduction: What is Machine Learning, Examples of Machine Learning applications, Training versus Testing, Positive and Negative Class, Cross-validation

Types of Learning: Supervised, Unsupervised and Semi-Supervised Learning

Dimensionality Reduction: Introduction to Dimensionality Reduction, Subset Selection, Introduction to Principal Component Analysis

Overview of Machine Learning

Recap

Machine Designing or Machine Learning?

Designing Algorithms and Analysis (DAA)

OR (LAA)


Recap

• Y is calculated

• A is Designed

• One Stroke Process

• Complexity (LB, UB), e.g. Strassen's matrix multiplication and its LB

[Diagram (DAA): input X → algorithm A → output Y]

Design Analysis (Design Effects)

Designing

Techniques/Models/Methods:

Divide and Conquer

Greedy

Dynamic Programming

BackTracking, etc.

A calculates/searches the actual output

• Y is Estimated

• A is Learned

• Two Stroke Process

• Complexity

• Overfitting/ Underfitting

• Bias-Variance Tradeoff

• Learning Curves - (Ein & Eout Vs N )

[Diagram (LAA): input X → learned A → output Y]

Learning Analysis (Learning Effects)

Two Stroke Process :

• Training Data Set and Testing Data Set

• Universal Data Set (OMG -- !! ) --- ??

• Probability for Inferencing and Emulating Universal Set to prove feasibility of learning

• Hoeffding Inequality-- Tells us How Poor the Training Set is !!

( Relates knowledge hidden in Training Set to Universal Set )
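For reference, the Hoeffding inequality in its usual form for a single fixed hypothesis, written with the Ein and Eout notation of these slides. The symbols N (number of training examples) and ε (a tolerance chosen by the analyst) are assumptions introduced here; the slide does not name them.

```latex
P\left[\,\left|E_{in} - E_{out}\right| > \epsilon\,\right] \;\le\; 2\,e^{-2\epsilon^{2}N}
```

The bound shrinks exponentially as N grows, which is the sense in which the training set "relates" to the universal set.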

Techniques/Models/Methods:

Linear/Non-Linear

Parametric/Non-Parametric

Kernel Based Models

Probabilistic Models , etc.

Designing and Learning

A learns/searches a “Hypothesis” that helps to predict/describe the output

Definition of Machine Learning

Arthur Samuel defined machine learning as the "field of study that gives computers the ability to learn without being explicitly programmed".

Herbert Simon: “Learning is any process by which a system improves performance from experience.”

Tom M. Mitchell: "Machine Learning is the study of algorithms that improve their performance P at some task T with experience E.” A well-defined learning task is given by <P, T, E>.

Spam filter Example

Consider a spam filter: an email program that watches incoming email and decides whether each message should be marked as spam.

T: To decide whether an email is spam or not
P: The number of emails which are correctly decided as spam / not spam
E: Observing the labels of emails in the available email data

Let’s do exercise…. Importance of label

• Apple • Banana • Red • Yellow • Orange • Blue

• Cherry

• Apple • Banana • Orange • Cherry

• Red • Yellow • Blue

• Apple • Banana • Cherry

• Red • Yellow • Blue • Orange

Let’s do exercise….

• Apple - Fruit • Banana -Fruit • Red -Color • Yellow-Color • Orange – Fruit • Blue - Color

• Cherry -Fruit

• Apple • Banana • Orange • Cherry

• Red • Yellow • Blue

Supervised and Unsupervised Learning

Suppose you have a basket filled with different kinds of fruit, and your task is to arrange them into groups. The fruits in our basket are apples, bananas, grapes and cherries.

Unsupervised Learning :

No training rules or data are available while grouping the fruits. Suppose you first group by color:

RED COLOR GROUP: apples & cherries.
GREEN COLOR GROUP: bananas & grapes.

Now take another physical characteristic, such as size:

RED COLOR AND BIG SIZE: apple.
RED COLOR AND SMALL SIZE: cherries.
GREEN COLOR AND BIG SIZE: bananas.
GREEN COLOR AND SMALL SIZE: grapes.

Unsupervised Learning

Here you did not learn anything beforehand: the labels are not included in the training data and there is no response variable. This type of learning is known as unsupervised learning.

Apple

Banana

Grapes

Cherry

four types of fruits

Supervised Learning :

The physical characteristics of the fruits are known, so arranging fruits of the same type in one place is now easy. Your previous work is called training data in data mining. You have already learned from the training data because it contains a response variable. A response variable is simply a decision variable. You can observe the response variable (FRUIT NAME) below.

No. | SIZE | COLOR | SHAPE | FRUIT NAME
1 | Big | Red | Rounded shape with a depression at the top | Apple
2 | Small | Red | Heart-shaped to nearly globular | Cherry
3 | Big | Green | Long curving cylinder | Banana
4 | Small | Green | Round to oval, bunch shape, cylindrical | Grape

Supervised Learning

If you first learn from the training data and then apply that knowledge to the test data (a new fruit), this type of learning is called supervised learning.


Learning Associations

100 customers

10 8

6

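P(Milk and Butter) = 6/100 = 0.06: 6 percent of all customers buy both butter and milk. The conditional probability P(Milk | Butter) is different: it is the number of customers who buy both divided by the number of customers who buy butter.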
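The difference between the joint probability of a basket and a conditional probability such as P(Milk | Butter) can be sketched in a few lines. The basket data and the function names `support` and `confidence` are hypothetical, chosen only for illustration; they are not the figures from the slide.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical data: 10 baskets, 5 contain butter, 3 contain butter and milk.
baskets = (
    [["butter", "milk"]] * 3
    + [["butter"]] * 2
    + [["milk"]] * 1
    + [["bread"]] * 4
)

joint = support(baskets, ["butter", "milk"])             # share of all baskets
conditional = confidence(baskets, ["butter"], ["milk"])  # share of butter baskets
print(joint, conditional)
```

Here the joint share is 3 out of 10 baskets, while the conditional share is 3 out of the 5 butter baskets, so the two numbers answer different questions.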

Classification

IF income> θ1 AND savings> θ2 THEN low-risk ELSE high-risk
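A minimal sketch of this rule as a function. The name `credit_risk` and the threshold values used in the example call are assumptions for illustration; in practice θ1 and θ2 would be learned from past customer data.

```python
def credit_risk(income, savings, theta1, theta2):
    """Return "low-risk" only when both thresholds are exceeded."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


# Hypothetical thresholds: theta1 = 30000 (income), theta2 = 10000 (savings).
print(credit_risk(50000, 20000, theta1=30000, theta2=10000))
print(credit_risk(50000, 5000, theta1=30000, theta2=10000))
```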

Pattern recognition

• Recognition or authentication of people using their physiological characteristics

• It is difficult to write a program by hand that recognizes a face

• Given example face images, a machine learning algorithm takes these examples and produces a program

Regression

Car attributes: Brand, Year, Engine Capacity, Mileage, and other information

X denotes the car attributes and Y the price of the car

Y = WX + W0, for suitable values of W and W0
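One common way to find suitable values of W and W0 is ordinary least squares. The sketch below handles a single numerical attribute and uses made-up numbers (car age in years against price); the function name `fit_line` and the data are assumptions, not part of the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + w0 for one numerical attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx              # slope
    w0 = mean_y - w * mean_x   # intercept
    return w, w0


# Hypothetical data: car age in years and price in some currency unit.
ages = [1, 2, 3, 4]
prices = [9, 7, 5, 3]
w, w0 = fit_line(ages, prices)
predicted_price = w * 5 + w0   # estimate for a 5-year-old car
print(w, w0, predicted_price)
```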


Unsupervised Learning

To find clusters or groupings of similar input data.

Reinforcement learning

Examples of Machine Learning Application

•Learning Associations •Classification •Regression •Unsupervised Learning •Reinforcement Learning

Training Vs Testing


Training Phase:

• Input: a training dataset with attributes and class labels, used to prepare the model.
• Goal: to find relationships, detect patterns, understand complex problems and make decisions.
• Training error is the error obtained by applying the model to the same data on which the model was trained.
• Put simply, when the actual output of the training data and the predicted output of the model do not match, the training error Ein is said to have occurred.
• Training error is much easier to compute.

Testing Phase:

• Input: a test dataset, i.e. a dataset whose class labels are unknown to the model.
• Used for assessment of the finally chosen model.
• The training and testing datasets are completely different.
• Testing error is the error obtained by assessing the model on unknown data.
• Put simply, when the actual output of the testing data and the predicted output of the model do not match, the testing error Eout is said to have occurred.
• Eout is generally observed to be larger than Ein.

Cross-validation

• The aim is to minimize the generalization error.
• The generalization error is essentially the average error on data the model has never seen.
• In general, the dataset is divided into two partitions: a training set and a test set.
• The fit method is called on the training set to build the model.
• The fitted model is then applied to the test set to estimate the target value and evaluate the model's performance.
• The data is divided into training and test sets so that the test set can be used to estimate how well the model trained on the training data would perform on unseen data.
• However, cross-validation is a method that goes beyond evaluating a single model using a single train and test split of the data.
• It creates several subsets of the training dataset, each of which is used to train and evaluate a separate model.
• Cross-validation is a method for getting a reliable estimate of model performance using only the available training data.
• There are several ways to cross-validate. The most common is K fold cross-validation.

Cross Validation….

K Fold Cross Validation

Algorithm for K Fold Cross Validation:

1. Split the dataset into K equal partitions (or “folds”).
2. Use fold 1 as the testing set and the union of the other folds as the training set.
3. Calculate testing accuracy.
4. Repeat steps 2 and 3 K times, using a different fold as the testing set each time.
5. Use the average testing accuracy as the estimate of out-of-sample accuracy.

A value of k=10 is very common in the field of applied machine learning.
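The five steps can be sketched in plain Python. This is a minimal illustration, not a library implementation: the helper names (`k_fold_splits`, `cross_validate`, `fit_majority`, `accuracy`) and the toy labels are assumptions, the folds are taken in order without shuffling, and the "model" is just a majority-class predictor.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds, in order."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Average the test score over k folds (steps 2 to 5)."""
    scores = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


def fit_majority(xs, ys):
    """Toy 'model': remember the most frequent training label."""
    return max(set(ys), key=ys.count)


def accuracy(model, xs, ys):
    """Fraction of test labels equal to the remembered majority label."""
    return sum(1 for y in ys if y == model) / len(ys)


# Hypothetical data: the inputs are ignored by the toy model.
inputs = list(range(10))
labels = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
estimate = cross_validate(inputs, labels, k=5, fit=fit_majority, score=accuracy)
print(estimate)
```

In practice the data is usually shuffled before splitting so that each fold is representative of the whole dataset.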

Positive and Negative Class

The ingredients of Machine Learning

right features to build the right models that achieve the right tasks

• Features define a ‘language’ in which the relevant objects of a particular domain are described. E.g. a Car object can have the features Model No, Manufacturing Year, kilometers run, etc.
• A task is an abstract representation of a problem we want to solve regarding those domain objects, e.g. to decide the price of a used car.
• Many tasks can be represented as a mapping from data points to outputs. This mapping is done by the machine learning model.
• There is a wide variety of models to choose from, so it is observed that models lend the machine learning field diversity, but tasks and features give it unity.

Machine Learning Task:

The problems that can be solved with machine learning are generally defined by tasks. Tasks fall into the following broad categories:

Supervised Learning and Unsupervised Learning:

The task of grouping data with prior information is known as Supervised Learning and the task of finding out hidden structure from given data is unsupervised learning.

Predictive Model and Descriptive Model:

The output of a predictive model involves the target variable. The model tries to predict a value using other values in the dataset. For example, it tries to predict whether a loan is approved or not, or whether an e-mail is spam or not.

The output of a descriptive model does not involve the target variable. A descriptive model instead tries to find structure in the data in novel and interesting ways. More specifically, it detects or recognizes a particular pattern.

Categories of Machine Learning Task:

Predictive Task

Binary Classification: The task of classifying the given instances into two groups on the basis of classification rules. It is intuitive and easy to explain. E.g. decide whether an email is Spam or Ham.

Multiclass Classification: The task of classifying the instances into more than two groups. E.g. decide whether an email is Spam, Private mail or Work-related mail.

Regression: Sometimes it is natural to discard the notion of discrete classes and instead predict a real number. E.g. randomly select an email from the inbox and label it with an urgency score (between 0 and 1); work-related emails are labelled with priority 1.1 and so on.

Clustering: The task of grouping data without prior information is known as clustering. A typical clustering algorithm works by measuring the similarities between the given instances, putting similar instances in the same cluster and dissimilar instances into different clusters. In one way of clustering, every cluster has one representative known as an exemplar; this is known as predictive clustering.

Descriptive Task

Subgroup discovery: The dataset is given with instances and some attributes of those instances. The task is to find the subgroups of instances that are statistically most interesting. Subgroup discovery searches for relations between different properties or variables of a set with respect to a target variable. The relations are generally represented through rules, e.g. if LoC > 100 and complexity > 4 then the code is defective.

Association rule discovery: Association analysis is useful for discovering interesting relationships hidden in large datasets. The relationships can be represented in the form of association rules or frequent item sets. In Market Basket Analysis, an association rule between two item sets X and Y is written X→Y, e.g. {bread}→{milk}: a person who has purchased bread has also purchased milk.

Descriptive Clustering: In descriptive clustering exemplars are not used.




Supervised Learning

• The task of grouping data with prior information, in the form of labelled training data, is known as supervised learning.
• In the training data each instance is a pair of an input object and a desired output value.
• Supervised learning has input variables (X), an output variable (Y) and an algorithm that learns the mapping function from the input to the output: Y = f(X)
• A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used to map new examples.

Supervised Learning Model

Supervised Learning Task

Classification: A classification problem is when the output variable is a category, such as “red” or “blue” or “disease” and “no disease”.

Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.


• In machine learning task of unsupervised learning is that of trying to find hidden structure in unlabeled data.

• The training data is unlabeled, so there is no error or reward signal to evaluate a partial solution.

• Unsupervised learning is to have input data (X) and no corresponding output variables.

• The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.

• It is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher.

Unsupervised Learning Model

Unsupervised Learning Task

Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.

Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.

Semisupervised Learning Task

• There is a large amount of input data (X) and only some of the data is labeled (Y).
• It sits between supervised and unsupervised learning.
• A good example is a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.
• Labeling data can be expensive or time-consuming, as it may require access to domain experts, whereas unlabeled data is cheap and easy to collect and store.
• One can use unsupervised learning techniques to discover and learn the structure in the input variables.
• One can also use supervised learning techniques to make best-guess predictions for the unlabeled data, feed that data back into the supervised learning algorithm as training data, and use the model to make predictions on new unseen data.




Features

• The pillars of machine learning.
• Determine much of the success of a machine learning application.
• The goodness of a model depends on the goodness of its features.
• Mathematically, features are functions that map from the instance space to some set of feature values called the domain of the feature.
• Features vary in nature, e.g.
o Set of integers: the number of occurrences of a particular word
o Boolean: true or false for whether an email is spam or ham
o Arbitrary finite set: colors, shapes, etc.

Usages of Features

• Feature as Split

Binary Split: Spam, Ham
Non-Binary Split: Priority mail (Education, Placement etc.)

• Feature as Predictors: the score is the weighted sum $\sum_{i=1}^{n} w_i x_i$, where $x_i$ is a numerical feature

• if $w_i$ is large and positive, a positive $x_i$ increases the score;

• if $w_i$ is large and negative, a positive $x_i$ decreases the score;

• if $w_i \approx 0$, the influence of $x_i$ is negligible.
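The weighted-sum score can be written directly in code. The function name `linear_score` and the weights below are hypothetical, picked to show one positive, one negative and one near-zero weight.

```python
def linear_score(weights, features):
    """Return the weighted sum of numerical features: sum_i w_i * x_i."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical weights: the first feature pushes the score up,
# the second pushes it down, the third has no influence.
weights = [2.0, -3.0, 0.0]
score = linear_score(weights, [1.0, 1.0, 1.0])
print(score)
```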

These two uses of features – ‘features as splits’ and ‘features as predictors’ – are sometimes combined in a single model

Dimensionality Reduction

• The final classification or regression is often based on too many factors

• These factors are basically variables called features

• The higher the number of features, the harder it gets to visualize the training set

• Often many of these features are correlated, and hence redundant

• Dimensionality reduction is a series of techniques in machine learning and statistics to reduce the number of random variables to consider

• It is divided into feature selection and feature extraction

Dimensionality Reduction

Why Dimensionality Reduction?

• Reduces Time Complexity: Less computation

• Reduces Space Complexity: Less parameters

• Saves the cost of observing the feature

• Simpler models are more robust on small datasets

• More interpretable; simpler explanation

• Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions

Feature Selection

• Find a smaller subset of a many-dimensional data set to create a data model

• This means finding the k of the d dimensions that give us the most information and discarding the other (d − k) dimensions

• Subset selection is one of the most widely used methods

Feature Extraction

• transforming high-dimensional data into spaces of fewer dimensions

• finding a new set of k dimensions that are combinations of the original d dimensions.

• supervised or unsupervised depending on whether or not they use the output information.

• Principal Components Analysis (PCA) is most widely used

Subset Selection

• The goal is to find the best subset of the set of features
• The best subset contains the least number of dimensions that most contribute to accuracy
• Used in both regression and classification problems
• There are $2^d$ possible subsets of $d$ variables
• It is not possible to test all of them unless $d$ is small
• Instead, heuristics are used to get a reasonable (but not optimal) solution in reasonable (polynomial) time.
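To see why exhaustive search over all subsets of d features quickly becomes impractical, the subsets can be enumerated for a small d. The feature names are placeholders and `all_subsets` is a hypothetical helper.

```python
from itertools import combinations


def all_subsets(features):
    """Yield every subset of `features`, from the empty set to the full set."""
    for size in range(len(features) + 1):
        for combo in combinations(features, size):
            yield set(combo)


# Three placeholder features give 2**3 = 8 candidate subsets.
subsets = list(all_subsets(["size", "color", "shape"]))
print(len(subsets))

# The count doubles with every extra feature.
print(2 ** 30)
```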

Subset Selection




• Forward Selection and Backward Selection


Forward Selection

o It starts with no variables, i.e. the null model.
o At each step it adds, one at a time, a feature that has not already been considered.
o After each feature is added, the error is checked.
o The process continues until it finds the subset of features that decreases the error the most, or until any further addition does not decrease the error.


Terminologies for Algorithm(FS & BS)

• In either case, checking the error should be done on a validation set which is distinct from the training set.

• With more features, generally training error can be reduced, but validation error may not be reduced.

• Let $F$ denote a feature set of input dimensions $x_i$, $i = 1, \ldots, d$.

• 𝐸(𝐹) denotes the error incurred on the validation sample when only the inputs in 𝐹 are used.

• Depending on the application, the error is either the mean square error or misclassification error.

Algorithm - Forward Selection

1. Start with no features: $F = \emptyset$.
2. At each step, for every candidate $x_i$ not yet in $F$, train the model on the training set and calculate $E(F \cup x_i)$ on the validation set.
3. Choose the input $x_j$ that causes the least error: $j = \arg\min_i E(F \cup x_i)$
4. Add $x_j$ to $F$ if $E(F \cup x_j) < E(F)$.
5. Stop if adding any feature does not decrease $E$. The search may stop earlier if the decrease in error is too small.
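The procedure that starts from an empty F and greedily adds features can be sketched as follows. The callable `error` stands for E(F), the validation error when only the features in F are used; here it is replaced by a made-up `toy_error` so the sketch runs on its own. The names `forward_selection`, `toy_error` and `min_gain` are assumptions for illustration.

```python
def forward_selection(all_features, error, min_gain=0.0):
    """Greedy forward search: add the feature that lowers error(F) the most."""
    selected = set()
    best_error = error(selected)
    while len(selected) < len(all_features):
        candidates = [f for f in all_features if f not in selected]
        # j = argmin_i E(F ∪ x_i)
        j = min(candidates, key=lambda f: error(selected | {f}))
        new_error = error(selected | {j})
        # Stop when no addition decreases the error by more than min_gain.
        if best_error - new_error <= min_gain:
            break
        selected.add(j)
        best_error = new_error
    return selected, best_error


def toy_error(features):
    """Made-up validation error: 'a' and 'b' help, 'c' slightly hurts."""
    err = 1.0
    if "a" in features:
        err -= 0.5
    if "b" in features:
        err -= 0.3
    if "c" in features:
        err += 0.1
    return err


chosen, final_error = forward_selection(["a", "b", "c"], toy_error)
print(sorted(chosen), final_error)
```

With a real model, `error` would retrain on the training set and measure mean square error or misclassification error on a separate validation set.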

Limitation of Forward Selection

• May be costly: to decrease the dimensions from $d$ to $k$, the system must be trained and tested $d + (d-1) + (d-2) + \cdots + (d-k)$ times, which is $O(d^2)$.

• It is a local search procedure and does not guarantee finding the optimal subset, namely the minimal subset causing the smallest error.

• For example, $x_i$ and $x_j$ individually may not give a good effect but together may decrease the error significantly. In this situation forward selection is not a good choice: the algorithm is greedy and adds attributes one by one, so it may not be able to detect the effect of more than one feature.

Algorithm -Backward Selection

1. Start with $F$ containing all features.
2. At each step, find the attribute whose removal causes the least error: $j = \arg\min_i E(F - x_i)$
3. Remove $x_j$ from $F$ if $E(F - x_j) < E(F)$.
4. Stop if removing a feature does not decrease the error.
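Backward selection mirrors the forward search: it starts from the full feature set and greedily removes features. As before, `error` stands for the validation error E(F), and `toy_error` is a made-up stand-in so the sketch is runnable; both names, and `backward_selection` itself, are assumptions for illustration.

```python
def backward_selection(all_features, error):
    """Greedy backward search: drop the feature whose removal lowers error(F) the most."""
    selected = set(all_features)
    best_error = error(selected)
    while selected:
        # j = argmin_i E(F - x_i)
        j = min(selected, key=lambda f: error(selected - {f}))
        new_error = error(selected - {j})
        # Stop when removing a feature no longer decreases the error.
        if new_error >= best_error:
            break
        selected.remove(j)
        best_error = new_error
    return selected, best_error


def toy_error(features):
    """Made-up validation error: 'a' and 'b' help, 'c' slightly hurts."""
    err = 1.0
    if "a" in features:
        err -= 0.5
    if "b" in features:
        err -= 0.3
    if "c" in features:
        err += 0.1
    return err


kept, final_error = backward_selection(["a", "b", "c"], toy_error)
print(sorted(kept), final_error)
```

On this toy error the search removes only the harmful feature 'c' and then stops, because removing 'a' or 'b' would increase the error.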

Comment on -Backward Selection

The complexity of backward search has the same order of complexity as forward search, except that training a system with more features is costlier than training a system with fewer features, and forward search may be preferable especially if we expect many useless features.

Principal Component Analysis:

../../StatQuest- Principal Component Analysis (PCA), Step-by-Step.mp4

Principal Component Analysis:

• PCA is a mapping from the d-dimensional space to a new (k < d)-dimensional space, with minimum loss of information.

• As the number of dimensions of the data increases, the difficulty of visualizing it and performing computations on it also increases.

• There are two main strategies to reduce the dimensions of data:

o Remove the redundant dimensions

o Only keep the most important dimensions

Concepts used in PCA

Variance: A measure of variability; it measures how spread out the data set is.

Covariance: A measure of the extent to which corresponding elements from two sets of ordered data move in the same direction.
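Both quantities can be computed in a few lines. The sketch uses the sample convention (dividing by n − 1), which is an assumption since the slide's formulas did not survive extraction; the data values are made up.

```python
def variance(xs):
    """Sample variance: average squared deviation from the mean (n - 1 divisor)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance: positive when xs and ys tend to move together."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Made-up data in which y rises exactly twice as fast as x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(variance(xs), covariance(xs, ys))
```

The covariance of a variable with itself is its variance, which is why both appear together in the covariance matrix used by PCA.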


• PCA finds a new set of dimensions such that all the dimensions are orthogonal (and hence linearly independent) and ranked according to the variance of data along them.

• This means the more important principal axis occurs first (more important = more variance / more spread-out data).

• The principal direction in which the data varies is shown by the U axis, and the second most important direction is the V axis orthogonal to it.

• If each (X, Y) instance is transformed into its corresponding (U, V) coordinates, the data is de-correlated, meaning that the covariance between the U and V variables is zero.


• The directions U and V are called the principal components.
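For two-dimensional data the U and V axes can be found in closed form from the 2×2 covariance matrix, which makes the de-correlation claim easy to check. This is a minimal sketch for the 2-D case only, with made-up points and the hypothetical name `pca_2d`; for more dimensions one would use an eigendecomposition from a numerical library.

```python
import math


def pca_2d(points):
    """Rotate centered 2-D points onto their principal axes (U, V)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the first principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # U is the coordinate along the principal axis, V along the orthogonal axis.
    projected = [(c * x + s * y, -s * x + c * y) for x, y in centered]
    return theta, projected


# Made-up, strongly correlated data.
points = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8)]
theta, uv = pca_2d(points)

m = len(uv)
var_u = sum(u * u for u, _ in uv) / (m - 1)
var_v = sum(v * v for _, v in uv) / (m - 1)
cov_uv = sum(u * v for u, v in uv) / (m - 1)
print(var_u, var_v, cov_uv)
```

Keeping only the U coordinate reduces the data from two dimensions to one while retaining most of its variance, since nearly all of the spread lies along U.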

PCA for Data Representation

PCA for Dimension Reduction

Working of PCA

1. The first step is to gather reliable raw data from a sample based on a questionnaire

2. The second step is to calculate correlations between the variables.

3. In principal component analysis, principal components are extracted and presented as a table with the components in columns and variables in rows.

4. The principal components analysis table is truncated. Components are reported in order by eigenvalue and by the proportion of total variance. Frequently, these components are easily interpreted.

Objectives of PCA

• PCA helps to extract the most important information from the data table and compress the size of the data set by keeping only the important information.

• PCA also simplifies the description of the data set and helps analyze the structure of the observations and the variables.

Achieved 1st Milestone
