Classification : Basic Concepts


  • 8/14/2019 Classification : Basic Concepts


    Lect10-11-08-09 1

    Classification: Basic Concepts

    Lecture 10


    Classification:

    Predicts categorical class labels.

    Most suited for predicting or describing data sets with binary or nominal categories.

    Less effective for ordinal categories, since the implicit order among the categories is not taken into account.

    A form of supervised learning.


    Examples of Classification

    1. Predicting tumor cells as benign or malignant.

    2. Classifying credit card transactions as legitimate or fraudulent.

    3. Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil.

    4. Categorizing news stories as finance, weather, entertainment, sports, etc.


    Examples (contd.)

    5. Determining the characteristics that differentiate individuals who have suffered a heart attack from those who have not.

    6. Developing a profile of a successful man.

    7. Classifying galaxies based on their shapes.

    8. Detecting spam emails based on their message header and content.


    Classification: Definition

    Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class label:

    Find a model for the class attribute as a function of the values of the other attributes.

    Goal: previously unseen records should be assigned a class as accurately as possible.

    A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
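The workflow just described (build a model on the training set, then measure its accuracy on a held-out test set) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names `train_test_split`, `accuracy`, and `fit_majority` are hypothetical (not scikit-learn's function of the same name), the 70/30 split and the seed are arbitrary choices, and the majority-class "model" is a deliberately trivial stand-in for a real learning algorithm. The records are the ten labelled rows of the training set used in this lecture.

```python
import random
from collections import Counter

# The ten labelled training records: ((Attrib1, Attrib2, Attrib3 in K), Class).
RECORDS = [
    (("Yes", "Large", 125), "No"), (("No", "Medium", 100), "No"),
    (("No", "Small", 70), "No"), (("Yes", "Medium", 120), "No"),
    (("No", "Large", 95), "Yes"), (("No", "Medium", 60), "No"),
    (("Yes", "Large", 220), "No"), (("No", "Small", 85), "Yes"),
    (("No", "Medium", 75), "No"), (("No", "Small", 90), "Yes"),
]


def train_test_split(records, test_fraction=0.3, seed=0):
    """Hypothetical helper: shuffle a copy of the records and split it
    into (training, test) lists; the original list is not modified."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted, actual):
    """Fraction of records whose predicted label equals the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def fit_majority(training):
    """Trivial stand-in model: always predict the most common training label."""
    majority = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda attributes: majority


train, test = train_test_split(RECORDS, test_fraction=0.3, seed=42)
model = fit_majority(train)                      # build the model on the training set only
predictions = [model(x) for x, _ in test]        # apply it to the held-out records
acc = accuracy(predictions, [y for _, y in test])
print(f"test accuracy: {acc:.2f}")
```

Replacing `fit_majority` with any real learner leaves the rest of the workflow unchanged, which is the point of keeping the split and the accuracy measure separate from the model.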


    Illustrating Classification Task

    Training Set

    Tid  Attrib1  Attrib2  Attrib3  Class
    1    Yes      Large    125K     No
    2    No       Medium   100K     No
    3    No       Small    70K      No
    4    Yes      Medium   120K     No
    5    No       Large    95K      Yes
    6    No       Medium   60K      No
    7    Yes      Large    220K     No
    8    No       Small    85K      Yes
    9    No       Medium   75K      No
    10   No       Small    90K      Yes

    Test Set

    Tid  Attrib1  Attrib2  Attrib3  Class
    11   No       Small    55K      ?
    12   Yes      Medium   80K      ?
    13   Yes      Large    110K     ?
    14   No       Small    95K      ?

    [Figure: a learning algorithm learns a model from the training set (induction); the model is then applied to the test set to assign the unknown class labels (deduction).]
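The induction and deduction steps in this task can be sketched with a one-rule (1R) learner. 1R is only one of many possible learning algorithms and is not necessarily what the lecture has in mind. To keep the sketch short, it assumes the learner considers only the categorical attributes Attrib1 and Attrib2 and ignores the numeric Attrib3; the function names `learn_one_rule` and `apply_model` are hypothetical.

```python
from collections import Counter, defaultdict

# Training set from the slide: (Attrib1, Attrib2, Attrib3 in K, Class).
TRAINING = [
    ("Yes", "Large", 125, "No"), ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"), ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"), ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"), ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"), ("No", "Small", 90, "Yes"),
]
# Test set from the slide: the class labels are unknown.
TEST = [("No", "Small", 55), ("Yes", "Medium", 80), ("Yes", "Large", 110), ("No", "Small", 95)]

# Assumption: only the categorical attributes are considered; Attrib3 is ignored.
CATEGORICAL = {"Attrib1": 0, "Attrib2": 1}


def learn_one_rule(records):
    """Induction: map each value of an attribute to its majority class and keep
    the attribute whose rule makes the fewest errors on the training records."""
    best = None
    for name, idx in CATEGORICAL.items():
        counts = defaultdict(Counter)            # attribute value -> class counts
        for rec in records:
            counts[rec[idx]][rec[-1]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best["errors"]:
            best = {"attribute": name, "index": idx, "rule": rule, "errors": errors}
    return best


def apply_model(model, attrs):
    """Deduction: predict by looking up the chosen attribute's value in the rule.
    Returns None for a value that never appeared in the training set."""
    return model["rule"].get(attrs[model["index"]])


model = learn_one_rule(TRAINING)
print(model["attribute"], model["rule"], model["errors"])
# Attrib2 {'Large': 'No', 'Medium': 'No', 'Small': 'Yes'} 2
print([apply_model(model, rec) for rec in TEST])  # ['Yes', 'No', 'No', 'Yes']
```

On this small training set the rule built on Attrib1 makes three training errors and the one built on Attrib2 makes two, so 1R keeps Attrib2. A practical classifier would normally also make use of the numeric attribute.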


    Definition: Classification is the task of learning a target function f (the classification model) that maps each attribute set x to one of the predefined class labels y. Each record is characterized by a tuple (x, y), where x is the attribute set and y is a special attribute, the class label.

    Output attributes are also known as dependent variables, and input attributes are termed independent variables.

    Classification can be categorized by whether the output variable is discrete or categorical, or by whether the model is designed to describe a current condition or to predict future outcomes.
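Written symbolically, the definition reads as follows. The symbols D (the training set), n (the number of records), and k (the number of classes) are notation introduced here for illustration only; for the vertebrate data set, for instance, k = 5 (mammal, reptile, fish, amphibian, bird).

```latex
\begin{aligned}
D &= \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\},\\
f &: \mathbf{x} \mapsto y, \qquad y \in \{c_1, c_2, \ldots, c_k\}.
\end{aligned}
```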


    The vertebrate data set

    Name           Body temp  Skin cover  Gives birth  Aquatic creature  Legs  Hibernates  Class label
    human          warm       hair        y            n                 y     n           mammal
    python         cold       scales      n            n                 n     y           reptile
    salmon         cold       scales      n            y                 n     n           fish
    whale          warm       hair        y            y                 n     n           mammal
    frog           cold       none        n            semi              y     y           amphibian
    komodo dragon  cold       scales      n            n                 y     n           reptile
    bat            warm       hair        y            n                 y     y           mammal
    pigeon         warm       feathers    n            n                 y     n           bird
    cat            warm       fur         y            n                 y     n           mammal
    leopard shark  cold       scales      y            y                 n     n           fish
    turtle         cold       scales      n            semi              y     n           reptile
    penguin        warm       feathers    n            semi              y     n           bird


    A classification model is useful for the following purposes:

    Descriptive Modeling

    Predictive Modeling


    Descriptive Modeling: A classification model can be used as an explanatory tool to distinguish between objects of different classes.

    Example:

    (1) A bank loan officer wants to analyze loan application data to determine which applications are safe and which are risky for the bank.

    Here the data analysis task is classification: a model, or classifier, is constructed to predict categorical labels such as "safe" or "risky".
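As a descriptive use of the vertebrate data set, the sketch below checks how well one hand-written rule, "warm-blooded and gives birth implies mammal", separates the mammals from the other classes. The rule and the names `is_mammal_rule` and `rule_coverage` are hypothetical illustrations, not part of the lecture; only the columns the rule needs are encoded.

```python
# Subset of the vertebrate data set: (name, body temperature, gives birth, class label).
VERTEBRATES = [
    ("human", "warm", "y", "mammal"),
    ("python", "cold", "n", "reptile"),
    ("salmon", "cold", "n", "fish"),
    ("whale", "warm", "y", "mammal"),
    ("frog", "cold", "n", "amphibian"),
    ("komodo dragon", "cold", "n", "reptile"),
    ("bat", "warm", "y", "mammal"),
    ("pigeon", "warm", "n", "bird"),
    ("cat", "warm", "y", "mammal"),
    ("leopard shark", "cold", "y", "fish"),
    ("turtle", "cold", "n", "reptile"),
    ("penguin", "warm", "n", "bird"),
]


def is_mammal_rule(body_temp, gives_birth):
    """Hypothetical descriptive rule: warm-blooded animals that give birth are mammals."""
    return body_temp == "warm" and gives_birth == "y"


def rule_coverage(records):
    """Compare the rule with the true labels and return
    (true positives, false positives, false negatives) for the class 'mammal'."""
    tp = fp = fn = 0
    for _name, temp, birth, label in records:
        fired = is_mammal_rule(temp, birth)
        if fired and label == "mammal":
            tp += 1          # rule says mammal, and it is one
        elif fired:
            fp += 1          # rule says mammal, but it is not
        elif label == "mammal":
            fn += 1          # a mammal the rule misses
    return tp, fp, fn


print(rule_coverage(VERTEBRATES))  # (4, 0, 0)
```

On these twelve records the rule covers every mammal and nothing else, which is the kind of summary a descriptive model provides; it says nothing by itself about how well the rule generalizes to animals outside the table.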


    Predictive Modeling: A classification model can be used to predict the class label of unknown records.

    Example

    (1) Suppose a marketing manager wants to estimate how much a customer will spend during an ongoing sale.

    This is an example of numeric prediction: the model (a predictor) constructed here predicts a continuous-valued function rather than a categorical label.
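To contrast numeric prediction with classification, the sketch below fits an ordinary least-squares line, one simple kind of numeric predictor; the lecture does not say which predictor the marketing manager would use. The visit counts and spend amounts are hypothetical, and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns the pair (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs in equal-length lists")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope
    a = mean_y - b * mean_x        # intercept
    return a, b


# Hypothetical data: number of store visits during the sale vs. amount spent.
visits = [1, 2, 3, 4]
spend = [20.0, 35.0, 50.0, 65.0]
a, b = fit_line(visits, spend)

# The predictor outputs a number (estimated spend), not a class label.
print(a + b * 5)  # 80.0
```

A classifier trained on the same customers would instead output a label such as "high spender" or "low spender"; the continuous output is what makes this a numeric prediction task.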


    (2) A medical researcher wants to analyze brain tumour data to predict which of three treatments, say A, B, or C, should be given to a patient. Because the output is one of the categorical labels "Treatment A", "Treatment B", or "Treatment C", this is a classification task.