Learning is a general term that denotes the way in which people and machines enrich their knowledge and improve their skills.

Learning is one of the components of intelligence.

Learning Algorithm

A learning algorithm is a procedure for learning systematically.

The algorithm specifies how to learn or acquire knowledge.

Artificial Intelligence (AI)

AI is a branch of Computer Science that deals with the study and creation of computer systems that exhibit some form of intelligence.


and use knowledge in a meaningful way.

The term Artificial Intelligence was coined by John McCarthy in his proposal for the 1956 Dartmouth conference.

Components of  Intelligence 

Learning – It is the process of acquiring knowledge, skills, and experience through study and training.

Reasoning – It refers to the ability to draw conclusions that are appropriate to the

Understanding – It refers to the identification of the significance, interpretation, or explanation for certain data or information. It is the ability to

Creativity – It is the ability to generate new ideas or to conceive new perspectives on existing data.

Domain areas of  AI

Game playing

Search

Speech recognition


Computer Vision

Expert systems

Heuristic classification


Knowledge based systems

Knowledge is defined as the remembering of previously learned material. It is the result 







cells) in the brain, which contains approximately 10^12 neurons.

Knowledge plays an important role in building intelligent knowledge based systems. 


Knowledge needs to be acquired from different sources such as procedures, 

rules and


Knowledge should be stored and organised in such a way that information can be retrieved easily.

Knowledge based systems

In knowledge based systems, decisions and actions are based on the manipulation of knowledge (facts are compared and altered in some manner). Manipulation is the computational equivalent of reasoning.

Knowledge can be represented using various schemes, e.g. FOPL (First Order Predicate Logic) and associative networks.

The knowledge base contains predefined concepts, domain constraints, 


Knowledge representation approaches that can be used in learning systems are

•production rules


•semantic networks

•predicate calculus

•vectors and matrices


•formal grammar


 •procedural encoding

FOPL‐First order predicate logic 

‐is a representation scheme for reasoning.

‐it comprises symbols to represent statements.

Predicate – This symbol returns a value of true or false. Capital letters and words are used to represent predicates.





EQUAL)  –  

(eg. Has‐a , part‐of  ,Father‐of)


Constants

Logical quantifiers (Existential ‐ there exists an x, Universal ‐ for all x)

Connectors: v (disjunction), ~ (negation), ^ (conjunction)
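The predicates, quantifiers and connectors listed above can be tried out on a small finite domain. The sketch below is only an illustration: the people, the FATHER_OF facts and the helper names are hypothetical, and Python's any/all stand in for the existential and universal quantifiers.

```python
# Finite-domain sketch of FOPL ideas; the people and facts are hypothetical.
DOMAIN = ["ann", "bob", "cara"]
FATHER_OF = {("bob", "cara")}  # the single fact FATHER_OF(bob, cara)

def father_of(x, y):
    """Predicate: returns True or False."""
    return (x, y) in FATHER_OF

def is_father(x):
    # Existential quantifier: there exists a y such that FATHER_OF(x, y)
    return any(father_of(x, y) for y in DOMAIN)

# Universal quantifier with negation (~): for all x, ~FATHER_OF(x, x)
nobody_is_own_father = all(not father_of(x, x) for x in DOMAIN)

# Conjunction (^) and disjunction (v) map to `and` and `or`
bob_but_not_ann = is_father("bob") and not is_father("ann")
ann_or_bob = is_father("ann") or is_father("bob")
```

Real FOPL reasoners work symbolically over possibly infinite domains; enumerating a finite domain is just the simplest way to see what each symbol means.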

Structure of  machine learning system




A learning system consists of:

•the environment

•the learning element

•the knowledge base

•the performance element

[Figure: Environment, Learning Element, Knowledge Base, Performance Element]

The environment supplies some piece of information to the learning element.

The learning element uses this information to make improvements in an explicit knowledge base.

The performance element uses the knowledge base to perform its task.

Finally, the information gained while performing the task can serve as feedback to the learning element.

A learning system may acquire rules of behaviour, descriptions of physical objects, problem‐solving heuristics, classification taxonomies over a sample space, and many other types of knowledge useful in the performance of a wide variety of tasks.

Complex tasks require more knowledge than simple ones.

The knowledge base grows more in size, the problems of 


Definition for Learning

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

A handwriting recognition learning problem

Task T: recognizing and classifying handwritten words within images

Performance measure P: Percent of words correctly classified

Training experience E: database of handwritten words with given classifications


A robot driving learning problem

Task T: driving on public four-lane highways using vision sensors.

Performance measure P: average distance travelled before an error

(as judged by human overseer)

Training experience E: a sequence of images and steering

commands recorded while observing a human driver.

Some major issues 

Some major issues which concern the process of learning knowledge in general are:

If the data do not contain enough information, a system cannot "learn" much from them.

What should a system learn? In order to solve a particular problem, a system has to learn the specific features and dependencies relevant to the solution, but not everything (some features are irrelevant).


How to test how well the system has learned appropriate knowledge?

Testing the learning process is usually done by measuring the learning error. The main approaches are:

Partitioning of data. A part of the data, say 70%, is used for training and

the other part for testing.

The leave‐one‐out method means that we train the system n times with (n ‐ 1) examples and check the system's reaction to the left‐out example. After doing this n times we can calculate the accuracy of the system as the ratio between the number of correctly processed examples and the number n of all the examples.
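Both testing approaches can be sketched in a few lines. The example below is a minimal illustration, not a prescribed procedure: the 1‐nearest‐neighbour learner, the function names and the toy data are assumptions made here so that there is something to test.

```python
def nearest_neighbour(train, x):
    """Predict the label of x from the closest training example (1-NN)."""
    closest = min(train, key=lambda example: abs(example[0] - x))
    return closest[1]

def holdout_accuracy(data, train_percent=70):
    """Partitioning: train on the first part of the data, test on the rest."""
    cut = len(data) * train_percent // 100
    train, test = data[:cut], data[cut:]
    correct = sum(nearest_neighbour(train, x) == y for x, y in test)
    return correct / len(test)

def leave_one_out_accuracy(data):
    """Train n times on n - 1 examples and test on the left-out example."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += nearest_neighbour(train, x) == y
    return correct / len(data)

# Hypothetical 1-D data: (feature value, class label)
data = [(1.0, "a"), (8.0, "b"), (2.0, "a"), (9.0, "b"), (1.5, "a"),
        (8.5, "b"), (2.5, "a"), (9.5, "b"), (3.0, "a"), (7.5, "b")]

holdout = holdout_accuracy(data)
loo = leave_one_out_accuracy(data)
```

Leave‐one‐out uses every example for testing exactly once, at the price of n training runs; the 70/30 partition trains only once but tests on fewer examples.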

Machine learning

application areas of artificial intelligence (AI).

Machine learning is the study of making machines acquire new knowledge and reorganize existing knowledge.

Machine learning is the capability of a computer to learn from



Machine learning is programming computers to optimize a performance criterion using example data or past experience.

Given a model defined by some parameters, learning is the execution of a computer program to optimize the parameters of the model.

The model may be predictive, to make predictions in the future, or descriptive, to gain knowledge from data, or both.
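To make "optimizing the parameters of a model" concrete, here is a minimal sketch. It assumes a straight‐line model y = w * x + b, the mean squared error as the performance criterion, and plain gradient descent as the optimizer; none of these choices come from the slides, and the data are made up.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w   # move each parameter against its gradient
        b -= learning_rate * grad_b
    return w, b

# Example data generated from y = 2x + 1 (hypothetical)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
```

After training, w and b are close to 2 and 1, so the fitted model can be used predictively (estimate y for a new x) or descriptively (read off the slope).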

Machine learning ‐ Goal

The goal of machine learning is to design programs that learn and/or discover, i.e. automatically improve their performance on

Successful learner


Makes general conclusions about the data it is trained on.

Acts appropriately in new situations.


There are certain patterns in the data.


construct a good and useful approximation.

That approximation may not explain everything, but may still be able to account for some part of the data.

We believe that though identifying the complete process may not be possible, we can still detect certain patterns or

Machine learning draws on concepts and results from


Artificial intelligence

Information theory

Biology

Cognitive science

Computational complexity and

Control theory


Major types of  Machine Learning

•Supervised learning

•Unsupervised learning

•Reinforcement learning

In supervised learning, there is a “teacher” that provides

the learner with a set of input‐output pairs.

Process of supervised learning

[Figure: process of supervised learning; labels: Training data, Decision Tree Classifier]

In unsupervised learning, there is no teacher providing desired answers, but since the data are not entirely random,

can be applied in new cases.

Eg. Clustering


Reinforcement learning

Reinforcement learning corresponds to something between the supervised and unsupervised approaches.

It differs from supervised learning in the sense that explicit input‐output pairs are not available.

An agent explores an environment and is able to take actions. Depending on the outcome of the series of actions taken, the agent is rewarded or penalized.

Reinforcement learning is called “learning with a critic”, as opposed to learning with a teacher, which is supervised learning.
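A small sketch of an agent learning from rewards alone is given below. It assumes tabular Q‐learning, which is one common reinforcement learning algorithm and is not named in the slides; the five‐cell corridor, the reward of 1 at the goal and all parameter values are hypothetical.

```python
import random

N_STATES = 5                          # corridor cells 0..4; cell 4 is the goal
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Environment: returns (next_state, reward). Only the goal is rewarded."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(list(ACTIONS))       # explore at random
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # The critic's signal: move the estimate toward reward + future value
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Greedy policy: the best-valued action in every non-goal cell
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

No input‐output pairs are ever shown to the agent; it only sees the reward, yet the learned policy moves right in every cell.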


E.g. game playing is an important research area in both artificial intelligence and machine learning.



Learning approaches are also classified as 

Statistical (probabilistic or stochastic) methods

Connectionist methods/neural networks

Symbolic machine learning algorithms

Genetic methods and

Other hybrid approaches

Statistical methods 


Perform some analysis which uses primarily the text characteristics without


Statistical techniques are


N-gram techniques

Unsupervised clustering and

Hidden Markov model

that have been used for corpus‐based language analysis, probabilistic

grammar learning and lexicon‐building.

Statistical methods 

Naïve Bayes classifier

K-Nearest neighbour

Hidden Markov Model

Expectation Maximisation algorithm

Forward-Backward algorithm

Maximum Entropy models

A neural network

A neural network is based on concepts of how the human brain is organised

and how it learns.

The nodes correspond to the neurons in the brain, and the links correspond

to the connections between neurons.

In order to make a prediction, the neural network accepts the values for the predictors on the input nodes. These values are then multiplied by values that are stored in the links, called weights.

These values are then added together at the output node, and a special threshold function is applied to get the prediction.
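The computation just described (multiply by weights, add, apply a threshold) fits in a few lines. This is a sketch of a single output node only; the bias term, the step threshold at 0 and the particular weights are assumptions chosen here so that the node behaves like a logical AND.

```python
def threshold(total, cutoff=0.0):
    """Step function: output 1 when the weighted sum exceeds the cutoff."""
    return 1 if total > cutoff else 0

def predict(inputs, weights, bias):
    """Multiply each input by its link weight, add them up, then threshold."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return threshold(total)

# Hypothetical weights that make the output node compute logical AND
weights, bias = [1.0, 1.0], -1.5
outputs = [predict([a, b], weights, bias) for a in (0, 1) for b in (0, 1)]
```

Learning in a neural network amounts to adjusting the weights; here they are fixed by hand only to show the forward computation.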

Symbolic learning methods 

Do not use probabilities explicitly

Decision trees

Transformation based learning (TBL)

Inductive logic programming (ILP) and


Minimum description length

MDL approaches aim to minimize the description of  the set of  words in the 

input corpus.


defined as reducing the total length of a set of data.

Introducing a theory which can generate certain data and thus serves as an abbreviation of the data set.

The implementation uses a learning mechanism which decreases the total

description length in each step.

Genetic Algorithms (GAs)

Randomized search and optimization techniques

Guided by the principles of evolution and natural genetics

Efficient, adaptive and robust search processes
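A minimal genetic algorithm is sketched below to show selection, crossover and mutation working together. The OneMax objective (count the 1s in a bit string), the elitism step and every parameter value are assumptions made for illustration.

```python
import random

def fitness(bits):
    """OneMax: count the 1s; a toy objective used only for illustration."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]        # selection: fitter half
        children = [population[0][:]]                # elitism: keep the best
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)           # one-point crossover
            child = mum[:cut] + dad[cut:]
            # mutation: flip each bit with a small probability
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history

best, history = evolve()
```

Because the best individual is always copied into the next generation, the best fitness recorded in history never decreases, even though the search itself is randomized.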

Hybrid Methods ‐ Ensemble Learning



the so called base classifier, by changing the training set or the input features

or the parameters of the classifier.

The predictions of all base classifiers are combined into a single final


The idea builds on the assumption that combining the output of multiple experts is better than the output of any single expert.
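Combining base classifiers can be sketched with a simple majority vote. The three threshold classifiers, their cutoffs and the customer fields below are hypothetical; majority voting is only one of several possible combination rules.

```python
from collections import Counter

def make_threshold_classifier(feature, cutoff):
    """Build a base classifier that looks at a single input feature."""
    def classify(sample):
        return "low-risk" if sample[feature] > cutoff else "high-risk"
    return classify

# Base classifiers differ in the input feature they use
base_classifiers = [
    make_threshold_classifier("income", 50),
    make_threshold_classifier("savings", 10),
    make_threshold_classifier("years_employed", 2),
]

def ensemble_predict(sample):
    """Combine the base predictions into one answer by majority vote."""
    votes = Counter(classify(sample) for classify in base_classifiers)
    return votes.most_common(1)[0][0]

customer = {"income": 60, "savings": 5, "years_employed": 4}
prediction = ensemble_predict(customer)
```

With an odd number of base classifiers and two classes the vote can never tie.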


Selection of features

Machine learning requires selection of

Samples and features

Choice of algorithm.

Features are usually pre-processed. Dimensionality reduction is used to identify a subset of features, or mathematical combinations of features, that greatly reduces the size of the machine learning problem.

A distance metric represents how far samples are separated from one another

in `feature space'. Proper representation schemes should be used for better

About Features

The input vector is called by a variety of names, including input vector, pattern vector, and feature vector.

The components of the input vector are variously called features, attributes, input variables, and components.

The values of the components can be of three main types: real valued numbers, discrete valued numbers, or categorical values.


The number of features in the instances determines the search space that needs to be explored by the learning algorithm for a given classification task.

The presence of a large number of irrelevant features unnecessarily increases the size of the search space, thus increasing the time needed for classification.

A large number of features also makes it difficult to extract knowledge such as classification rules in a way that is comprehensible to humans. Conversely, the rules based on a small number of relevant features are often concise and easier to understand and use.

The most important reason behind feature selection is that it can reduce the effects of the curse of dimensionality.

Applications of machine learning




Problems that can be solved by Machine learning are classified as

Association rule learning




Pattern recognition





Association rule learning

One application of machine learning is basket analysis, which is finding associations between products bought by customers:

If people who buy X typically also buy Y, and if there is a customer who buys X and does not buy Y, he or she is a potential Y customer.


Finding an association rule is learning a conditional probability of the form P(Y|X), where Y is the product we would like to condition on X, which is the product or the set of products which we know that the customer has already purchased.

We may want to make a distinction among customers and, toward this, estimate P(Y|X, D), where D is the set of customer attributes, for example, gender, age, marital status, and so on.

If this is a bookseller instead of a supermarket, products can be books or


In the case of a Web portal, items correspond to links to Web pages, and we can estimate the links a user is likely to click and use this information to download such pages in advance for faster access.







credit  scoring,













the information about the customer. 

This is an example of a classification problem where there are two classes: low‐risk and high‐risk customers.

The information about a customer makes up the input to the classifier, whose task is to assign the input to one of the two classes.





Classification & Prediction

After training with the past data, a classification rule learned may be of  the form

IF income>











It is a function that separates the examples of  different classes.


predictions for novel instances , , if  the future is similar to the past. 






Consider a system that can predict the price of a used car. The inputs are the car's attributes and other information that we believe affect a car's worth.


Such problems where the output is a number are REGRESSION problems.






Clustering is the grouping of inputs.

Given data about customers, a clustering model allocates customers with similar attributes to the same group.

Input : demographic information as well as the past transactions with the 

the company may decide strategies,  services and products, specific to 

Such a grouping also allows identifying those who are outliers, namely, those who are different from other customers.
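Customer grouping can be illustrated with k‐means, one widely used clustering algorithm (the slides do not name a specific one). The customer attributes, the two starting centroids and the iteration count below are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # squared Euclidean distance from p to every centroid
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # new centroid = mean of its cluster (unchanged if the cluster is empty)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (age, purchases per year)
customers = [(25, 2), (27, 3), (30, 1), (55, 20), (60, 22), (58, 19)]
centroids, clusters = kmeans(customers, centroids=[(25, 2), (60, 22)])
```

No class labels are supplied; the two groups emerge only from the similarity of the attributes. A customer far from every centroid would be a candidate outlier.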

Applications of machine learning in Pattern Recognition




Character recognition is recognizing characters from their images.

• Printed character recognition (OCR) – Collection of dots

• Handwritten character recognition – Collection of dots and strokes

There are multiple classes - as many as the number of characters

A character image is not just a collection of random dots; it is a collection of strokes and has a regularity that we can capture by a learning program.





Sequence learning

A word is a sequence of characters, and successive characters are not independent but are constrained by the words of the language.

This has the advantage that even if we cannot recognize a character, we can

still read t?e word.

Such contextual dependencies may also occur in higher levels, between

words and sentences, through the syntax and semantics of the language.

There are machine learning algorithms to learn sequences and model such dependencies.
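The "t?e" example can be sketched as a lookup constrained by a vocabulary and by the preceding word. Everything here is a toy assumption: the lexicon, the bigram counts and the function names are invented, and patterns are assumed to contain only letters and '?'.

```python
import re

VOCABULARY = ["the", "toe", "tie", "she", "word"]    # hypothetical lexicon
# Made-up counts of how often each word followed "read" in some corpus
BIGRAMS = {("read", "the"): 5, ("read", "toe"): 0, ("read", "tie"): 1}

def candidates(pattern):
    """Words matching the pattern, where '?' is one unreadable character."""
    regex = re.compile(pattern.replace("?", "."))
    return [word for word in VOCABULARY if regex.fullmatch(word)]

def best_guess(previous_word, pattern):
    """Pick the candidate seen most often after the previous word."""
    return max(candidates(pattern),
               key=lambda word: BIGRAMS.get((previous_word, word), 0))

options = candidates("t?e")
guess = best_guess("read", "t?e")
```

The vocabulary narrows "t?e" to three words, and the context of the previous word settles on "the"; sequence models learn such counts from data instead of having them written by hand.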





Face Recognition

In Face recognition

•the classes are people to be recognized

•the learning program should learn to associate the face images to






The input is acoustic


The classes are words that can be uttered

This time the association to be learned is from an acoustic signal to a word






Medical diagnosis

The inputs are the relevant information about the patient.


The inputs contain the patient's age, gender, past medical history, and 






Outlier  detection

Another use of machine learning is outlier detection.

After learning the rule, we are not interested in the rule but the exceptions 



Learning a rule from data allows knowledge extraction: we have an explanation about the process underlying the data.

For example, once we learn the rule separating low‐risk and high‐risk customers (credit scoring), we have the knowledge of the properties of low‐risk customers.













more efficiently.






Learning also performs compression by fitting a rule to the data.


to store and less computation to process. 

sum of  every possible pair of  numbers.





Networking & Communications

In  networking and telecommunications, call patterns and traffic data are 

analyzed for network optimization and maximizing the quality of  service.

Applications of  machine learning in 

Natural Language Processing

Machine learning also helps us find solutions for many natural language 

processing tasks.

Text categorisation

Text summarisation

Word segmentation

Tagging


Word sense disambiguation

Unknown word recognition

Speech recognition and


Language Identification

From text

From Speech

Speaker Diarization

To identify the speaker changes and the speaker clusters, and to estimate the 

number of  speakers involved in the document













To be able to process speech documents as well as documents containing music, silence, and other sounds.


