
Implications of Ceiling Effects in Defect Predictors



PROMISE 2008


Page 1: Implications of Ceiling Effects in Defect Predictors
Page 2: Implications of Ceiling Effects in Defect Predictors

Outline
• Approach
• Use More Data
• Use Less Data
• Use Even Less Data
• Discussions
• Examples
• Conclusions

Page 3: Implications of Ceiling Effects in Defect Predictors

Approach
• Other research: try changing the data miners
  – Various data miners: no ground-breaking improvements
• This research: try changing the training data
  – Sub-sampling: over/under/micro sampling
• Hypothesis: static code attributes have limited information content
• Predictions:
  – Simple learners can extract the limited information content
  – No need for more complex learners
  – Further progress needs increasing the information content in the data

Page 4: Implications of Ceiling Effects in Defect Predictors

State-of-the-art Defect Predictor
• Naive Bayes with simple log-filtering
• Probability of detection (pd): 75%
• Probability of false alarm (pf): 21%
• Other data miners failed to achieve such performance:
  – Logistic regression
  – J48
  – OneR
  – Complex variants of Bayes
  – Various others available in WEKA...
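(Not from the original slides) A minimal Python sketch of what "Naive Bayes with simple log-filtering" plus pd/pf scoring could look like, assuming numeric static code attributes X, a binary defect label y (1 = defective), and scikit-learn's GaussianNB; the original work used WEKA, and helper names such as log_filter are illustrative only.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def log_filter(X, eps=1e-6):
    """Replace each numeric static-code attribute with its logarithm
    (a small epsilon guards against log(0))."""
    return np.log(np.asarray(X, dtype=float) + eps)

def pd_pf(y_true, y_pred):
    """Probability of detection (pd) and probability of false alarm (pf)
    for a binary target class where 1 = defective."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf

def train_and_score(X_train, y_train, X_test, y_test):
    """Fit log-filtered Naive Bayes and report (pd, pf) on the test set."""
    clf = GaussianNB().fit(log_filter(X_train), y_train)
    return pd_pf(y_test, clf.predict(log_filter(X_test)))
```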

Page 5: Implications of Ceiling Effects in Defect Predictors

How Much Data: Use More...
Experimental rig:
• Stratify; |Test| = 100 samples
• N = {100, 200, 300, ...}; |Training| = N*90% samples
• Randomize and repeat 20 times
• Plots of N vs. balance
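(Not part of the original deck) A rough Python sketch of this rig, assuming numpy arrays X and y and the illustrative train_and_score helper above; "balance" is taken to be the usual PROMISE-style definition, balance = 1 - sqrt((1 - pd)^2 + pf^2) / sqrt(2), and the stratification of the 100-sample test set is omitted for brevity.

```python
import numpy as np

def balance(pd, pf):
    # Distance from the ideal point (pd = 1, pf = 0), rescaled to [0, 1].
    return 1.0 - np.sqrt((1.0 - pd) ** 2 + pf ** 2) / np.sqrt(2.0)

def incremental_rig(X, y, sizes=(100, 200, 300, 400), repeats=20, test_size=100, seed=0):
    """For each N, hold out 100 test samples, train on N*90% samples,
    repeat 20 times with fresh random splits, and record mean balance."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            idx = rng.permutation(len(y))
            test_idx, rest = idx[:test_size], idx[test_size:]
            train_idx = rest[: int(n * 0.9)]   # |Training| = N * 90% samples
            pd, pf = train_and_score(X[train_idx], y[train_idx], X[test_idx], y[test_idx])
            scores.append(balance(pd, pf))
        results[n] = np.mean(scores)
    return results  # plot N (keys) vs. mean balance (values)
```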

Page 6: Implications of Ceiling Effects in Defect Predictors

Over/Under Sampling: Use Less...
• Software datasets are not balanced: ~10% defective
• Target class: defective (modules)
• Under sampling:
  – Use all target class instances, say N
  – Pick N from the other class
  – Learn theories on 2N instances
• Over sampling:
  – Use all from the other class, say M (M > N)
  – Using the N target class instances, populate M instances
  – Learn theories on 2M instances
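(Illustrative, not from the slides) A Python sketch of the two sub-sampling schemes, assuming y == 1 marks the defective target class; oversampling here simply resamples the N defective instances with replacement until they number M.

```python
import numpy as np

def under_sample(X, y, rng):
    """Keep all N defective instances, pick N defect-free ones: 2N instances."""
    defective = np.flatnonzero(y == 1)
    clean = np.flatnonzero(y == 0)
    pick = rng.choice(clean, size=len(defective), replace=False)
    keep = np.concatenate([defective, pick])
    return X[keep], y[keep]

def over_sample(X, y, rng):
    """Keep all M defect-free instances, resample defectives up to M: 2M instances."""
    defective = np.flatnonzero(y == 1)
    clean = np.flatnonzero(y == 0)
    boost = rng.choice(defective, size=len(clean), replace=True)
    keep = np.concatenate([clean, boost])
    return X[keep], y[keep]
```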

Page 7: Implications of Ceiling Effects in Defect Predictors

Over/Under Sampling: Use Less...
• NB/none is still among the best
• With sampling, J48 does not outperform NB
• NB/none is equivalent to NB/under
• Under sampling does not harm classifier performance
• Theories can be learned from a very small sample of the available data

Page 8: Implications of Ceiling Effects in Defect Predictors

Micro Sampling: Use Even Less...
• Given N defective modules:
  – M = {25, 50, 75, ...} <= N
  – Select M defective and M defect-free modules
  – Learn theories on 2M instances
• Under sampling: M = N
• 8/12 datasets -> M = 25
• 1/12 datasets -> M = 75
• 3/12 datasets -> M = {200, 575, 1025}
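(Not in the slides) A hedged Python sketch of the micro-sampling loop: for growing M, train on M defective plus M defect-free modules and look for the smallest M at which performance stops improving; train_and_score and balance are the illustrative helpers above.

```python
import numpy as np

def micro_sampling_curve(X, y, X_test, y_test, step=25, seed=0):
    """Return {M: balance} for M = step, 2*step, ... up to the number of defectives."""
    rng = np.random.default_rng(seed)
    defective = np.flatnonzero(y == 1)
    clean = np.flatnonzero(y == 0)
    curve = {}
    for m in range(step, len(defective) + 1, step):
        pick = np.concatenate([
            rng.choice(defective, size=m, replace=False),
            rng.choice(clean, size=m, replace=False),
        ])
        pd, pf = train_and_score(X[pick], y[pick], X_test, y_test)
        curve[m] = balance(pd, pf)  # find the smallest M where this plateaus
    return curve
```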

Page 9: Implications of Ceiling Effects in Defect Predictors

Discussions
• Incremental Case-Based Reasoning (CBR) vs. Automatic Data Miners (ADM)
• When is CBR preferable to ADM?
  – CBR is impractical with a large number of cases
  – Our results suggest 50 samples are adequate
  – CBR can perform as well as ADM
  – One step further: CBR can perform better than ADM

Page 10: Implications of Ceiling Effects in Defect Predictors

Example 1: Requirement Metrics
• Does not mean "use requirement docs" all the time!
• Combine features from whatever sources are available.
• Explore whatever is not a black-box approach.
• Consistent with prior research: SE should make use of domain-specific knowledge!
• From: Text Mining, To: NLP, Subject: Semantics

Page 11: Implications of Ceiling Effects in Defect Predictors

Example 2: Simple Weighting
• Combine features wisely!
• Black-box feature selection -> NP-hard.
• Information provided by a black-box approach is not necessarily meaningful to humans.
• Information provided by humans is meaningful for black-boxes.
• Check the validity of the NB assumptions!
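(A speculative sketch, not from the slides) One simple way human-supplied weights could enter a Naive Bayes classifier: multiply each attribute's log-likelihood contribution by a weight before summing; the class and weight vector below are purely illustrative stand-ins, not the scheme used in the talk.

```python
import numpy as np

class WeightedGaussianNB:
    """Tiny Gaussian Naive Bayes where each attribute's log-likelihood
    term is multiplied by a human-assigned weight (illustrative only)."""

    def fit(self, X, y, weights):
        X, y = np.asarray(X, float), np.asarray(y)
        self.w = np.asarray(weights, float)          # one weight per attribute
        self.classes = np.unique(y)
        self.priors, self.means, self.vars = {}, {}, {}
        for c in self.classes:
            Xc = X[y == c]
            self.priors[c] = len(Xc) / len(X)
            self.means[c] = Xc.mean(axis=0)
            self.vars[c] = Xc.var(axis=0) + 1e-9     # avoid zero variance
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        scores = []
        for c in self.classes:
            ll = -0.5 * (np.log(2 * np.pi * self.vars[c])
                         + (X - self.means[c]) ** 2 / self.vars[c])
            scores.append(np.log(self.priors[c]) + (self.w * ll).sum(axis=1))
        return self.classes[np.argmax(np.vstack(scores), axis=0)]
```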

Page 12: Implications of Ceiling Effects in Defect Predictors

Example 3: WHICH Rule Learner
• Current practice:
  – Learn predictors with criteria P
  – Assess predictors with criteria Q
  – In general: P ≠ Q
• WHICH supports defining P ≈ Q: learn what you will assess later.
• micro20 means only 20+20 samples.

Page 13: Implications of Ceiling Effects in Defect Predictors


Example 3: WHICH Rule Learner
• WHICH initially creates a sorted stack of all attribute ranges in isolation.
• It then, based on score, randomly selects two rules from the stack, combines them, and places the new rule in the stack in sorted order.
• It continues to do this until a stopping criterion is met.
• WHICH supports both conjunctions and disjunctions:
  – If the two selected rules contain different ranges of the same attribute, those ranges are OR'd together instead of AND'd.
  – Example: combining "outlook=sunny AND rain=true" with "outlook=overcast" yields "outlook = [sunny OR overcast] AND rain=true".
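(A loose sketch, not from the slides) One way the stack-based combination step described above could be coded in Python; the rule representation, the score() function, the score-biased selection, and the fixed-round stopping criterion are all simplified stand-ins for whatever WHICH actually uses.

```python
import random

def combine(rule_a, rule_b):
    """Merge two rules (dicts mapping attribute -> set of allowed ranges).
    Ranges of the same attribute are OR'd (set union); distinct attributes are AND'd."""
    merged = {attr: set(ranges) for attr, ranges in rule_a.items()}
    for attr, ranges in rule_b.items():
        merged.setdefault(attr, set()).update(ranges)
    return merged

def which_search(initial_rules, score, rounds=100, seed=0):
    """Keep a stack of rules sorted by score; repeatedly pick two rules
    (biased toward high scores), combine them, and re-insert the result."""
    random.seed(seed)
    stack = sorted(initial_rules, key=score, reverse=True)  # needs >= 2 seed rules
    for _ in range(rounds):                                  # simple stopping criterion
        a, b = random.sample(stack[: max(2, len(stack) // 2)], 2)
        stack.append(combine(a, b))
        stack.sort(key=score, reverse=True)
    return stack[0]                                          # best rule found
```

For instance, combine({"outlook": {"sunny"}, "rain": {"true"}}, {"outlook": {"overcast"}}) returns {"outlook": {"sunny", "overcast"}, "rain": {"true"}}, matching the slide's example.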

Page 14: Implications of Ceiling Effects in Defect Predictors

Example 4: NN-Sampling
• Within-company (WC) vs. cross-company (CC) data
• CC data: substantial increase in pd...
  – ...at the cost of a substantial increase in pf
• CC data should only be used for mission-critical projects
• Companies should strive for local (WC) data
• Why? CC data covers a larger space of samples...
  – ...but it also includes irrelevancies
• How to decrease pf? Remove irrelevancies by sampling from CC data.
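(Illustrative, not from the slides) A Python sketch of nearest-neighbor filtering of cross-company data: keep only the CC instances that lie close to the local WC instances and train on that subset; the choice of k and the Euclidean distance are assumptions of this sketch.

```python
import numpy as np

def nn_filter(cc_X, cc_y, wc_X, k=10):
    """For each within-company instance, keep its k nearest cross-company
    neighbours (Euclidean distance); return the filtered CC training set."""
    cc_X = np.asarray(cc_X, dtype=float)
    keep = set()
    for row in np.asarray(wc_X, dtype=float):
        dists = np.linalg.norm(cc_X - row, axis=1)
        keep.update(np.argsort(dists)[:k].tolist())
    idx = sorted(keep)
    return cc_X[idx], np.asarray(cc_y)[idx]
```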

Page 15: Implications of Ceiling Effects in Defect Predictors

Example 4: NN-Sampling
• Same patterns in:
  – NASA MDP
  – Turkish washing machines

Page 16: Implications of Ceiling Effects in Defect Predictors

Conclusions
• Defect predictors are practical tools.
• Limited information content hypothesis:
  – Simple learners can extract the limited information content
  – No need for more complex learners
  – Further progress needs increasing the information content in the data
• The current research paradigm has reached its limits.
• Black-box methods lack business knowledge; human-in-the-loop CBR tools should take their place:
  – Practical: small samples to examine
  – Instantaneous: ADM will run fast
  – Direction: increase information content
• Promise data: OK. What about Promise tools?
  – Increase in information content?
  – Building predictors aligned with business goals.

Page 17: Implications of Ceiling Effects in Defect Predictors

Future Work
• Benchmark human-in-the-loop CBR against ADM.
• Instead of asking which learner, ask which data.
• Better sampling strategies?

Page 18: Implications of Ceiling Effects in Defect Predictors

Thanks...

Questions?