One-class Classification of Text Streams with Concept Drift Yang ZHANG, Xue LI , Maria Orlowska DDDM 2008 The University of Queensland Australia

One-class Classification of Text Streams with Concept Drift

Yang ZHANG, Xue LI, Maria Orlowska

DDDM 2008

The University of Queensland, Australia

Outline

Motivation
Related Work
Framework for One-class Classification of Data Streams
Learning Concept Drift under the One-class Scenario
Experiment Results
Future Work

Motivation

State-of-the-art data stream classification algorithms are based on fully labeled data.

In practice it is impossible to label all the data, labeling is expensive, and user interests change over time.

These algorithms are therefore difficult to apply to real-life applications.

Scenario

Consider the user feedback emails sent to a customer service section, where the task is to find the feedback emails about a certain newly launched product.

The goal is to build a text data stream classification system that retrieves all on-topic feedback.

Section manager behavior:

Patient enough to label only a few on-topic emails.

Not patient enough to label off-topic emails.

One-class Classification of Text Streams

Challenges: concept drift; a small amount of training data; no negative training data; noisy data; limited memory space.

Related Work

Semi-supervised classification of data streams; cannot cope with concept drift. [Wu&Yang, ICDMW06]

Active learning for data stream classification; cannot cope with concept drift caused by a sudden shift of user interests. [Fan&Huang, SDM04] [Fan&Huang, ICDM04] [Huang&Dong, IDA07]

Needs multiple scans of the data. [Klinkenberg&Joachims, ICML00]

Related Work

Static approaches for data stream classification (fully labelled). [Street&Kim, KDD01] [Wang&Fan, KDD03]

Dynamic approaches for data stream classification (fully labelled). [Kolter&Maloof, ICDM03] [Zhang&Jin, SIGMOD Record 06] [Zhu&Wu, ICDM04]

One-class text classification. [Li&Liu, ECML05] [Liu&Dai, ICDM03] [Liu&Li, AAAI04]

Proposed Approach

Base Classifier Selection - phenomena observed

If the reader is very interested in a certain topic today, say sports, then there is a high probability that he is also interested in sports tomorrow.

If the reader is interested in a topic, say sports, and for some reason his interests change to another topic, say politics, then after some time there is a high probability that his interests will change back to sports again.

Base Classifier Selection - strategy

The ensemble should keep some recent base classifiers.

The ensemble should keep some base classifiers which represent the reader's interests in the long run.
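The two rules above can be illustrated with a small sketch. It is not the authors' algorithm: the names `BaseClassifier`, `select_ensemble`, `n_recent`, and `n_long_run` are hypothetical, and using the average of past per-batch scores as a stand-in for "long-run" interest is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BaseClassifier:
    """Hypothetical placeholder for a classifier trained on one batch."""
    batch_id: int  # index of the batch it was trained on
    scores: List[float] = field(default_factory=list)  # past per-batch quality estimates

    def long_run_score(self) -> float:
        # Assumption: the mean historical score approximates long-run relevance.
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


def select_ensemble(pool, n_recent=2, n_long_run=2):
    """Keep the n_recent newest classifiers plus the n_long_run best older ones."""
    by_age = sorted(pool, key=lambda c: c.batch_id, reverse=True)
    recent = by_age[:n_recent]   # rule 1: recent base classifiers
    older = by_age[n_recent:]
    # rule 2: older classifiers that best reflect long-run interests
    long_run = sorted(older, key=BaseClassifier.long_run_score, reverse=True)[:n_long_run]
    return recent + long_run


pool = [
    BaseClassifier(0, [0.9, 0.8, 0.9]),
    BaseClassifier(1, [0.2, 0.3]),
    BaseClassifier(2, [0.7, 0.9]),
    BaseClassifier(3, [0.4]),
    BaseClassifier(4, []),
]
kept_ids = [c.batch_id for c in select_ensemble(pool)]
print(kept_ids)
```

Here the classifiers from batches 4 and 3 are kept for recency, and those from batches 0 and 2 are kept because their historical scores are the highest among the older ones. An old classifier for "sports" would survive in this way and be available if the reader's interest returns to that topic.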

Experiment Results

Dataset: 20 Newsgroups.

We compare the following approaches:

Single Window (SW): The classifier is built on the current batch of data.

Full Memory (FM): The classifier is built on the current batch of data, together with positive samples dated back to batch 0.

Fixed Window (FW): The classifier is built on the samples from a window of fixed size.

Ensemble (EN): The classifier is built by the algorithms proposed in this paper.
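The three baseline strategies differ only in which batches supply training samples. The sketch below makes that concrete under stated assumptions: the function name `training_batches` and the default `window=3` are hypothetical (the slides do not give the window size), and the Ensemble (EN) approach is left out because its details are not described here.

```python
def training_batches(strategy, current, window=3):
    """Return the batch indices whose samples are used when training at batch `current`."""
    if strategy == "SW":
        # Single Window: the current batch only.
        return [current]
    if strategy == "FM":
        # Full Memory: the current batch plus earlier batches back to batch 0.
        # Per the description above, only positive samples are retained from earlier batches.
        return list(range(0, current + 1))
    if strategy == "FW":
        # Fixed Window: the most recent `window` batches (assumed size).
        start = max(0, current - window + 1)
        return list(range(start, current + 1))
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("SW", "FM", "FW"):
    print(name, training_batches(name, current=5))
```

SW adapts fastest but forgets everything, FM never forgets but also never discards outdated concepts, and FW sits between the two.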

Experiment Scenarios

4 groups of experiments:

Experiment with concept drift caused by changes in user interests.

Experiment with heavy vs. gradual concept drift.

Experiment with concept drift caused by changes in both user interests and data distribution.

Experiment with 5 on-topic categories.

Experiment: concept drift caused by changes in user interests.

Experiment: heavy vs. gradual concept drift.

Experiment: changes in user interests and data distribution.

The results are very similar to those observed in the first group of experiments.

Experiment: 5 on-topic categories.

Conclusion & Future Research

We made a first attempt at the problem of one-class classification on streaming data, using an ensemble-based approach.

Future research: a dynamic feature space; one-class classification on general data streams.

Thank you!