Development Emails Content Analyzer: Intention Mining in Developer Discussions
Andrea Di Sorbo
Sebastiano Panichella
Corrado Visaggio
Massimiliano Di Penta
Gerardo Canfora
Harald Gall
Outline
Context: Written Development Discussions
Case Study: Development Mailing Lists of 2 Open Source Projects
Results: Automatic Classification of Relevant Contents in Developers' Communication
Open Source (OS) and Industrial Projects
Development Communication Means
Recommender systems:
- Bug Triaging [1]
- Suggest Mentors [2]
- Code re-documentation [3]
- Etc.
[1] Anvik et al., "Who should fix this bug?"
[2] Canfora et al., "Who is going to mentor newcomers in open source projects?"
[3] Panichella et al., "Mining source code descriptions from developer communications"
Development Communication Means
[1] Bacchelli et al., "Content classification of development emails"
[2] Cerulo et al., "A Hidden Markov Model to detect coded information islands in free text"
Different Kinds of Data
Structured
Semi-Structured
Unstructured
A Considerable Effort for Developers
Many messages
Developers get lost in unnecessary details, missing potentially useful information…
Previous Work
Hana et al.
"…'Lazy' RTC occurs when a core developer posts a change to a mailing list and nobody responds; it is assumed that other developers reviewed the code…"
Previous Work
Approaches for:
- Generating summaries of emails → Lam et al., Rambow et al.
- Generating summaries of bug reports → Rastkar et al.
Different Purposes
Feature requests
Bug disclosures
Project Management
DECA (Development Email Content Analyzer)
An approach to Classify Paragraphs According to Intentions
http://www.ifi.uzh.ch/seal/people/panichella/tools/DECA.html
Why use NLP for Classifying Paragraphs According to Intentions?
Example
i. We could use a leaky bucket algorithm to limit the bandwidth
ii. The leaky bucket algorithm fails in limiting the bandwidth
The two sentences share a high percentage of words and discuss the same topics, yet they have different intentions.
"Techniques based on lexicon analysis, such as VSM [1], LSI [2], or LDA [3], would not be sufficient to classify paragraphs according to intentions."
[1] Baeza-Yates et al., "Modern Information Retrieval"
[2] de Marneffe et al., "The Stanford typed dependencies representation"
[3] Blei et al., "Latent dirichlet allocation"
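The limitation quoted above can be made concrete with a small bag-of-words comparison. This is an illustrative sketch in plain Python, not part of DECA: it shows that a VSM-style lexical similarity between the two example sentences is high even though their intentions differ.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

s1 = "We could use a leaky bucket algorithm to limit the bandwidth"
s2 = "The leaky bucket algorithm fails in limiting the bandwidth"

# Substantial lexical overlap, despite opposite intentions
# (a proposal vs. a problem report).
print(round(cosine_similarity(s1, s2), 2))  # -> 0.55
```

A purely lexical model sees these two sentences as closely related, which is exactly why the talk argues for parsing their predicate-argument structure instead.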
Perspective
Goal: understand to what extent NL parsing can be used to recognize informative text fragments in emails from a software maintenance and evolution perspective.
Quality focus: detection of text paragraphs in development discussions containing helpful information for developers.
Perspective: guide developers in maintaining and evolving their products.
Case Study
Research Questions
RQ1: Can an NLP approach (i.e., DECA) be effective in classifying writers' intentions in development emails?
RQ2: Is DECA more effective than existing Machine Learning techniques in classifying development emails content?
Context
Qt and Ubuntu
Steps:
1) Taxonomy Definition
2) Classification Based on DECA (NLP Analyzer)
Taxonomy Definition
Sampling: we selected 100 emails of the project
Clustering
Clusters: Implementation, Technical Infrastructure, Project Status, Social Interactions, Usage, Discarded
Guzzi et al. – MSR 2013
Clustering
Guzzi et al. – ICSE 2012
The final taxonomy
Differences with Guzzi et al.
Examples
Natural Language Parsing
DECA (Development Email Content Analyzer)
Recurrent Linguistic Patterns
Why NL parsing? Well-defined predicate-argument structures
[Dependency parse trees of the two example sentences: "use" as the main predicate with nsubj ("we"), aux ("could"), dobj ("algorithm"), and xcomp ("limit … the bandwidth") relations; "fails" as the main predicate with nsubj ("algorithm") and prep ("in limiting the bandwidth") relations.]
NL parsing → Natural Language Templates
"use" template: [someone] could use [something] (nsubj, aux, dobj)
"fails" template: [something] fails (nsubj)
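DECA matches its templates on dependency parse trees produced by an NL parser. As a rough illustration only, the two templates above can be approximated with surface regular expressions; the matcher and the labels below ("solution proposal", "problem discovery") are an invented stand-in for this sketch, not DECA's implementation or taxonomy.

```python
import re

# Illustrative surface approximations of the two parse-tree templates:
#   "[someone] could <verb> [something]" -> a solution proposal
#   "[something] fails ..."             -> a problem discovery
TEMPLATES = [
    (re.compile(r"\b(we|i|you|someone)\s+could\s+\w+", re.I), "solution proposal"),
    (re.compile(r"\bfails?\b", re.I), "problem discovery"),
]

def classify_intention(sentence: str) -> str:
    """Return the first matching intention label, or 'unclassified'."""
    for pattern, label in TEMPLATES:
        if pattern.search(sentence):
            return label
    return "unclassified"

print(classify_intention("We could use a leaky bucket algorithm to limit the bandwidth"))
# -> solution proposal
print(classify_intention("The leaky bucket algorithm fails in limiting the bandwidth"))
# -> problem discovery
```

Surface patterns like these break as soon as word order or phrasing changes, which is why DECA works on the parser's predicate-argument relations rather than on raw token sequences.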
NLP Heuristics
NLP Parser
raw text → NLP parser → NLP heuristics
RQ1: Is DECA effective in classifying writers' intentions in development emails?
Experiment I
training: 102, 87; test: 100
(false negatives fed back into training)
Experiment II
training: 100, 169; test: 100
(false negatives fed back into training)
Experiment III
training: 100, 231; test: 100
RQ2: Is the proposed approach more effective than existing ML techniques in classifying development emails content?
ML for Email Classification
An Approach Based on ML for Email Content Classification
1) Text Features
2) Split training and test sets
3) Oracle building
4) Classification (training, then prediction)
→ Antoniol et al., CASCON 2008
→ Zhou et al., ICSME 2014
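The four-step baseline above can be sketched end to end with a minimal multinomial Naive Bayes over bag-of-words features, a classifier family commonly used in such ML baselines. The class name, toy training paragraphs, and labels below are invented for illustration; the actual baselines (Antoniol et al., Zhou et al.) use their own feature sets and learners.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Minimal multinomial Naive Bayes over bag-of-words features."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.label_counts = Counter(labels)       # label -> document count
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label in self.label_counts:
            # log prior + log likelihood with add-one (Laplace) smoothing
            score = math.log(self.label_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

# Steps 1 and 3: text features plus a labeled oracle (invented toy data)
train_texts = [
    "we could use a cache to speed this up",
    "maybe we could add a retry option",
    "the parser fails on empty input",
    "this crashes when the file is missing",
]
train_labels = ["proposal", "proposal", "problem", "problem"]

# Steps 2 and 4: train on the training set, predict on held-out text
clf = NaiveBayesText().fit(train_texts, train_labels)
print(clf.predict("the build fails on windows"))  # -> problem
```

Note that this learner still sees only lexical features, so it inherits the limitation discussed earlier: sentences with high word overlap but different intentions look alike to it.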
Summary
• RQ1: the automatic classification performed by DECA achieves very good results in terms of precision, recall, and F-measure (over all the experiments).
• RQ2: DECA outperforms traditional ML techniques in terms of recall, precision, and F-measure when classifying e-mail content.
"…it took the MSR community more than 10 years to figure out that machine learning is not the best method for analyzing human-written text. Thank you for helping move the field forward…" [One of the ASE Reviewers]
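For reference, the metrics named in the summary (precision, recall, F-measure) can be computed per class as follows; the predicted/expected labels here are toy values, not the study's results.

```python
def precision_recall_f1(predicted, expected, positive):
    """Precision, recall, and F-measure for one class."""
    tp = sum(p == positive == e for p, e in zip(predicted, expected))
    fp = sum(p == positive != e for p, e in zip(predicted, expected))
    fn = sum(e == positive != p for p, e in zip(predicted, expected))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: classifier output vs. oracle labels
predicted = ["bug", "feature", "bug", "bug"]
expected = ["bug", "bug", "bug", "feature"]
p, r, f = precision_recall_f1(predicted, expected, "bug")
print(round(p, 2), round(r, 2), round(f, 2))  # -> 0.67 0.67 0.67
```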
Code re-documentation
→ Panichella et al. – ICPC 2012: extract methods' descriptions from developers' discussions
→ Vector Space Models
→ ad hoc heuristics
"…several are the discourse patterns that characterize false negative method descriptions…"
Code re-documentation
delete
Conclusion
Future work
1) DECA as preprocessing support to discard irrelevant sentences in summarization approaches
2) DECA in combination with topic models for mining contents with the same intentions and the same topics