swapnil-shinde
Automatic Classification
What is classification?
  Classificatory systems
  Output of such a system
  Example of classification: indexing
Classification vs. diagnosis
  Classification = grouping
  Diagnosis = identification
Classification Methods
Why classification methods?
Data objects
  Documents, keywords, characters
Data and objects have a corresponding description
  Attributes
Classification Methods
Uses a set of parameters to characterize each object
Features should be relevant to the task at hand
Supervised classification
  Which classes? Given by a set of sample objects with known classes
Training set
  Set of known objects
  Used by the classification program
Two phases of classification: training phase and application phase
Classification Methods
1. Training phase
  Uses the training set
  Decides:
    How to weight the parameters
    How to combine the objects under different classes
2. Application phase
  The weights determined in phase 1 are applied to a set of objects that do not have known classes
  Determines their possible class
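The two phases can be sketched in a few lines of Python. The slides do not name a particular method, so this sketch assumes a nearest-centroid classifier purely for brevity: the "weights" learned in the training phase are the per-class mean feature vectors. The function names `train` and `classify` and the sample data are illustrative assumptions.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def train(training_set):
    """Training phase: compute one centroid (mean feature vector) per known class."""
    sums = {}
    counts = defaultdict(int)
    for features, label in training_set:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Application phase: assign an object of unknown class to the nearest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training set: (feature vector, known class)
training_set = [
    ((1.0, 1.0), "A"),
    ((1.0, 2.0), "A"),
    ((8.0, 9.0), "B"),
    ((9.0, 8.0), "B"),
]

centroids = train(training_set)
predicted = classify(centroids, (2.0, 1.5))  # object with no known class
```

Any other supervised method fits the same two-phase shape: `train` consumes objects with known classes, and `classify` applies the result to objects without them.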
Classification Methods
With few parameters, the process is easy
  Example:
With many more parameters, the process is tough
  Example:
Depending on their structure, attributes are of different types:
  Multi-state attribute
    Example:
  Binary-state attribute
    Example:
  Numerical attribute
    Example:
Classification Methods
Binary state: bold, underline
Multi-state: color, position, font type
Executing an operation changes an attribute value.
  Example: MOVE, FILL, INSERT, DELETE, CREATE
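The attribute types and the effect of operations on them might be modelled as below. This is a minimal sketch: the class `TextObject`, the enum values, and the `font_size` field (standing in for a numerical attribute, which the slides do not name) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Color(Enum):          # multi-state attribute
    BLACK = "black"
    RED = "red"
    BLUE = "blue"


class Position(Enum):       # multi-state attribute
    LEFT = "left"
    CENTER = "center"
    RIGHT = "right"


@dataclass
class TextObject:
    bold: bool = False                   # binary-state attribute
    underline: bool = False              # binary-state attribute
    color: Color = Color.BLACK           # multi-state attribute
    position: Position = Position.LEFT   # multi-state attribute
    font_size: float = 12.0              # numerical attribute (assumed example)


def fill(obj, color):
    """FILL operation: changes the value of the color attribute."""
    obj.color = color


def move(obj, position):
    """MOVE operation: changes the value of the position attribute."""
    obj.position = position


heading = TextObject(bold=True)
fill(heading, Color.RED)
move(heading, Position.CENTER)
```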
Classification Methods
Relation between classes and properties
1. Monothetic
  To be a member of a class, an object must possess the set of properties that are both necessary and sufficient for membership
  Example:
2. Polythetic
  A large number of members share some number of the properties
  No individual needs to have all the properties
  Example:
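The difference between the two membership rules can be made concrete with sets of properties. In this sketch the polythetic rule is simplified to "shares at least `min_shared` of the class properties"; that threshold, the function names, and the property labels are assumptions for illustration, not definitions taken from the slides.

```python
def is_monothetic_member(obj_props, required_props):
    """Monothetic: the object must possess every required property."""
    return required_props <= obj_props  # subset test


def is_polythetic_member(obj_props, class_props, min_shared):
    """Polythetic (simplified): the object shares enough class properties,
    without having to possess all of them."""
    return len(obj_props & class_props) >= min_shared


required = {"p1", "p2"}
class_props = {"p1", "p2", "p3", "p4"}

obj_a = {"p1", "p2", "p5"}   # has every required property
obj_b = {"p1", "p3", "p4"}   # lacks p2 but shares three class properties

mono_a = is_monothetic_member(obj_a, required)
mono_b = is_monothetic_member(obj_b, required)
poly_b = is_polythetic_member(obj_b, class_props, min_shared=3)
```

Here `obj_b` fails the monothetic test because it lacks `p2`, yet it passes the polythetic test because it shares three of the four class properties.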
Classification Methods
Relation between objects and classes
1. Exclusive
  An object belongs to a single class
  Example:
2. Overlapping
  An object may be a member of several classes
  Example:
Classification Methods
Relation between classes and classes
1. Ordered
  A structure is imposed
  Hierarchical structure
  Example:
2. Unordered
  No structure is imposed
  All classes are at the same level
  Example:
Measures of Association
Some classification methods are based on a binary relationship between objects.
On the basis of this relationship, a classification method can construct a system of clusters.
Relationship types:
1. Similarity
2. Dissimilarity
3. Association
Measures of Association
Similarity: A measure of similarity is designed to quantify the likeness between objects. If one assumes that objects can be grouped so that an object in a group is more like the other members of the group than it is like any object outside the group, then a cluster method enables such a group structure to be discovered.
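As an illustration of how a cluster method can recover a group structure from pairwise similarities, the sketch below links any two objects whose similarity reaches a threshold and takes the connected groups as clusters (a single-link style method). The slides do not specify a cluster method, so this choice, the function name `threshold_clusters`, and the toy similarity function are all assumptions.

```python
def threshold_clusters(objects, similarity, threshold):
    """Group objects by chaining together pairs whose similarity >= threshold."""
    remaining = list(objects)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        cluster = [seed]
        frontier = [seed]
        while frontier:
            current = frontier.pop()
            # Objects not yet clustered that are similar enough to the current one
            linked = [o for o in remaining if similarity(current, o) >= threshold]
            for o in linked:
                remaining.remove(o)
            cluster.extend(linked)
            frontier.extend(linked)
        clusters.append(cluster)
    return clusters


def closeness(a, b):
    """Toy similarity on numbers: 1.0 for identical values, falling toward 0."""
    return 1.0 / (1.0 + abs(a - b))


clusters = threshold_clusters([1, 2, 3, 10, 11], closeness, threshold=0.5)
```

With these values the objects 1, 2, 3 chain into one group and 10, 11 into another. Single-link chaining does not strictly guarantee that every member is closer to all of its own group than to any outsider, so treat it as a simple approximation of the idea.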
Measures of Association
Association
  What does association mean? Dependency, occurrence
  The term is reserved for the similarity between objects characterized by discrete-state attributes.
Measures of Association
Used to measure the strength of a relationship
A measure of association increases as the number or proportion of shared attribute states increases.
Five measures of association:
1. Simple matching coefficient
2. Dice's coefficient
3. Jaccard's coefficient
4. Cosine coefficient
5. Overlap coefficient
Measures of Association
Used in information and data retrieval
| | specifies the size of a set
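The formulas for the five measures did not survive on the slide. The sketch below uses the usual set-based definitions of these coefficients, where X and Y are the sets of attributes (for example, index terms) describing two objects. These definitions and the sample term sets are supplied here as assumptions, and both sets are assumed to be non-empty.

```python
from math import sqrt


def simple_matching(x, y):
    """|X ∩ Y|: number of shared attributes (not normalized)."""
    return len(x & y)


def dice(x, y):
    """2|X ∩ Y| / (|X| + |Y|)"""
    return 2 * len(x & y) / (len(x) + len(y))


def jaccard(x, y):
    """|X ∩ Y| / |X ∪ Y|"""
    return len(x & y) / len(x | y)


def cosine(x, y):
    """|X ∩ Y| / (sqrt(|X|) * sqrt(|Y|))"""
    return len(x & y) / sqrt(len(x) * len(y))


def overlap(x, y):
    """|X ∩ Y| / min(|X|, |Y|)"""
    return len(x & y) / min(len(x), len(y))


# Hypothetical term sets for two documents
doc1 = {"information", "retrieval", "index"}
doc2 = {"information", "retrieval", "cluster", "term"}

scores = {
    "simple": simple_matching(doc1, doc2),
    "dice": dice(doc1, doc2),
    "jaccard": jaccard(doc1, doc2),
    "cosine": cosine(doc1, doc2),
    "overlap": overlap(doc1, doc2),
}
```

All but the simple matching coefficient are normalized by the set sizes, so they stay between 0 and 1 and do not simply grow with longer descriptions.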
Probabilistic Indexing
Probability of relevance
  Based on experiments and observations
  The sample space may consist of relevant as well as non-relevant objects
  Consider a document and find the number of documents relevant with respect to it
  That gives the probability quotient
  Probability is measured according to the terms present in the document
Probabilistic Indexing
Probabilistic indexing model
  Contains a random variable
  Denotes the number of relevant documents
  If this variable is selected by the system, it gives a possible relevant document description
Probabilistic information retrieval models are based on the probabilistic ranking principle, which says that documents should be ranked according to their probability of relevance with respect to the actual request.
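Once a probability of relevance has been estimated for each document, the ranking principle itself reduces to a sort. The sketch below assumes the estimates are already available; the document identifiers and probability values are invented for illustration, and how the estimates are obtained is outside its scope.

```python
def rank_by_relevance(prob_relevance):
    """Return document ids ordered by decreasing estimated probability of relevance."""
    return sorted(prob_relevance, key=prob_relevance.get, reverse=True)


# Hypothetical estimates of P(relevant | document, request)
estimates = {"d1": 0.20, "d2": 0.85, "d3": 0.55}

ranking = rank_by_relevance(estimates)
```

Here `ranking` places `d2` first, then `d3`, then `d1`, which is the order in which the documents would be presented for this request.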