Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs


These are the presentation slides for our WCRE 2012 paper: Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs (http://webdocs.cs.ualberta.ca/~chenlei1/publication/Zhang-wcre-2012.pdf).

Dan Han, Chenlei Zhang, Xiaochao Fan, Abram Hindle, Kenny Wong and Eleni Stroulia

Department of Computing Science, University of Alberta

Outline

• Introduction
• Previous Work
• Methodology
• Comparing Topic Models
• Fragmentation Topic Analysis
• Fragmentation Discussion
• Conclusion

Introduction

Hardware-Based Fragmentation

http://www.android.com/devices/?country=all

Software-Based Fragmentation

http://www.blackeco.com/petites-speculations-autour-de-la-prochaine-version-dandroid/

Why do we care?

More than 20 Android device manufacturers

Multiple Android versions

Hundreds of different Android devices

Developers

Users

Stakeholders

What do we do in this study?

Goal: search for evidence of Android fragmentation within the Android ecosystem, based on Android bug reports

Approach: apply topic models and topic analysis

Previous Work

Topic Model and Topic Analysis

Topic Model: a statistical model for discovering abstract topics that occur in a collection of documents, e.g. Latent Dirichlet Allocation (LDA)

Topic Analysis: extract and evaluate the topics from a corpus of text documents through topic models

• Traceability recovery: Asuncion et al., Lukins et al., Hindle et al.
• Feature location: Marcus et al., Poshyvanyk et al., Grant et al.
• Software evolution and trend analysis: Thomas et al., Martie et al.

Differences between previous work and our work

Previous work applied unsupervised topic models, e.g. LDA

We applied Labeled-LDA, a supervised topic model, to analyze topic evolution

We compared the performance of LDA and Labeled-LDA on our dataset

LDA and Labeled-LDA

LDA

• Well studied in software engineering
• Unsupervised topic modeling algorithm
• Needs documents and the number of topics N as input
• Predicts the relevance between each document and all N topics

Labeled-LDA

• So far a novel method in software engineering
• Requires manual labeling
• Supervised topic modeling algorithm
• Predicts the relevance only between each document and its labels
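The contrast in what the two models output can be sketched in a few lines of Python. This is only an illustration of the output constraint, not of how either model is trained: the per-topic scores are made-up numbers, and the function names `lda_style` and `labeled_lda_style` are hypothetical.

```python
def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}

def lda_style(scores):
    """LDA-like output: the document gets a relevance for every one of the N topics."""
    return normalize(scores)

def labeled_lda_style(scores, doc_labels):
    """Labeled-LDA-like output: only topics matching the document's labels get relevance."""
    return normalize({t: s for t, s in scores.items() if t in doc_labels})

# Hypothetical raw scores of one bug report against three topics.
scores = {"bluetooth": 2.0, "gps": 1.0, "keyboard": 1.0}

lda_out = lda_style(scores)
llda_out = labeled_lda_style(scores, doc_labels={"bluetooth", "gps"})
print(lda_out)
print(llda_out)
```

In this sketch the Labeled-LDA-style result has no entry for "keyboard" at all, because that label was never attached to the document.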

Difference between a topic and a label

Topic: A word distribution extracted from bug reports by topic models

Label: The annotation of a document
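The distinction can be made concrete with a small Python sketch. The probabilities and the bug report text below are invented for illustration, and `top_words` is a hypothetical helper name.

```python
# A topic is a probability distribution over words (made-up numbers).
topic = {
    "bluetooth": 0.30,
    "headset": 0.25,
    "car": 0.20,
    "connect": 0.15,
    "device": 0.10,
}

def top_words(word_dist, k):
    """Return the k most probable words of a topic, most probable first."""
    ranked = sorted(word_dist.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]

# A label is an annotation attached to a document, here a bug report.
bug_report = {
    "text": "Headset does not connect over bluetooth in the car",
    "labels": {"bluetooth"},
}

print(top_words(topic, 3))
print(bug_report["labels"])
```

Word lists such as those shown for each label later in the slides are the top-ranked words of a topic in this sense.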

Methodology

Create labels for Android bug reports

Feature-oriented labels for Android bug reports

Android labels

• Features in Android versions, e.g. Language, Bluetooth
• Popular applications, e.g. Google Maps, Gmail
• Hardware of Android devices, e.g. Keyboard, GPS

Label Android bug reports

60 person-hours of manual labeling effort

Labeled bug reports are public now

HTC: 72 labels in total

Motorola: 58 labels in total

Apply Labeled-LDA

Apply LDA

Try a range of N to find the most distinct topics

Label each topic using our manual labels for the bug reports of HTC and Motorola

2 hours of labeling effort

Comparing LDA and Labeled-LDA

Each topic model generates a document-topic matrix

Determine if LDA generates similar results to Labeled-LDA

Compute and compare the Jaccard similarity of documents related to each topic generated by LDA and Labeled-LDA
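The comparison step can be sketched in Python as follows. The slides do not say how a document counts as "related" to a topic, so the relevance threshold of 0.5 is an assumption, as are the toy matrices and the helper names `related_docs`, `jaccard`, and `pairwise_jaccard`.

```python
def related_docs(doc_topic, threshold):
    """Map each topic to the set of document ids whose relevance is >= threshold."""
    related = {}
    for doc_id, weights in doc_topic.items():
        for topic, weight in weights.items():
            if weight >= threshold:
                related.setdefault(topic, set()).add(doc_id)
    return related

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_jaccard(lda_docs, llda_docs, labels):
    """Similarity between every LDA topic and every Labeled-LDA topic."""
    return {
        (x, y): jaccard(lda_docs.get(x, set()), llda_docs.get(y, set()))
        for x in labels
        for y in labels
    }

# Toy document-topic matrices (document id -> topic -> relevance), invented data.
lda_matrix = {
    1: {"bluetooth": 0.8, "gps": 0.1},
    2: {"bluetooth": 0.6, "gps": 0.3},
    3: {"bluetooth": 0.1, "gps": 0.9},
}
llda_matrix = {
    1: {"bluetooth": 1.0},
    2: {"gps": 1.0},
    3: {"gps": 1.0},
}

THRESHOLD = 0.5  # assumed cut-off, not taken from the slides
labels = ["bluetooth", "gps"]
lda_docs = related_docs(lda_matrix, THRESHOLD)
llda_docs = related_docs(llda_matrix, THRESHOLD)
sim = pairwise_jaccard(lda_docs, llda_docs, labels)

# The diagonal pairs each label with itself across the two models.
diagonal_mean = sum(sim[(label, label)] for label in labels) / len(labels)
print(sim)
print(diagonal_mean)
```

A high diagonal mean would indicate that both models relate roughly the same bug reports to the same label; a low one indicates that they disagree.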

Comparing Topic Models

[Figure: Comparing Topic Models in HTC. Pairwise Jaccard similarity between each topic in LDA and Labeled-LDA; axes: LDA, Labeled-LDA]

[Figure: Diagonal Entries in HTC; axes: LDA, Labeled-LDA]

[Figure: Comparing Topic Models in Motorola. Pairwise Jaccard similarity between each topic in LDA and Labeled-LDA; axes: LDA, Labeled-LDA]

Conclusion of comparing LDA and Labeled-LDA

Mean Jaccard similarities of the diagonal entries are 0.2 for HTC and 0.08 for Motorola

The numbers of bug reports related to the same labels in LDA and Labeled-LDA differ (χ² tests: p<0.01) for both HTC and Motorola

Labeled-LDA produced more feature-relevant topics than LDA
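The significance test above compares per-label counts of bug reports between the two models. A minimal Python sketch of one way to compute such a statistic follows, assuming a Pearson chi-squared test on a contingency table with one row per model and one column per label. The slides do not spell out the table layout, so that layout, the counts, and the function name `chi_squared` are all assumptions.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the model and the label were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Invented counts: rows are LDA and Labeled-LDA, columns are two labels.
counts = [
    [30, 10],
    [10, 30],
]
stat, dof = chi_squared(counts)
print(stat, dof)
```

The p-value is then read from the chi-squared distribution with `dof` degrees of freedom, for example with SciPy's `scipy.stats.chi2.sf(stat, dof)` if SciPy is available.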

Topic Analysis

Categorized Topics

[Figure: topics of HTC and Motorola, divided into common topics and unique topics for each vendor]

Common Topics

Both vendors share many identical topic words

Label: bluetooth

HTC: bluetooth, headset, car, connect, device, connection, version, data, app, desire, 2.2, work, connects, behavior, 2.1

Motorola: bluetooth, headset, droid, device, connected, connect, devices, calls, car, issue, connection, 2.2, car, pair, time

[Figure: relevance of common topic “bluetooth” in HTC and Motorola; annotations: Android 2.1, Android 2.2]
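The overlap between the two "bluetooth" word lists can be checked directly. The short Python sketch below uses the word lists exactly as they appear on the slide; the variable names are ours.

```python
htc = [
    "bluetooth", "headset", "car", "connect", "device", "connection",
    "version", "data", "app", "desire", "2.2", "work", "connects",
    "behavior", "2.1",
]
motorola = [
    "bluetooth", "headset", "droid", "device", "connected", "connect",
    "devices", "calls", "car", "issue", "connection", "2.2", "car",
    "pair", "time",
]

# Words that appear in both lists, kept in HTC's ranking order.
motorola_words = set(motorola)
shared = [word for word in htc if word in motorola_words]

# Words that appear in only one vendor's list.
htc_only = [word for word in htc if word not in motorola_words]
motorola_only = sorted(motorola_words - set(htc))

print(shared)
print(htc_only)
print(motorola_only)
```

Seven words are shared, while product names such as "desire" and "droid" each appear on only one side.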

Common Topics

Topics of each vendor tend to have vendor-specific terms

Label: display

HTC: screen, version, desire, behavior, app, home, number, code, final, press, sure, user, black, new, power

Motorola: droid, screen, button, correct, home, display, behavior, landscape, 2.1, menu, bar, xoom, device, user, status

http://www.motorola.com/xoom

http://en.wikipedia.org/wiki/Motorola_droid

http://en.wikipedia.org/wiki/HTC_Desire

Unique Topics in HTC

Label: keyboard

HTC: keyboard, input, text, key, version, number, typing, on-screen, mode, field, landscape, virtual, keys, type, message

Motorola: keyboard, droid, keys, text, press, space, box, open, device, key, app, software, 2.0.1, landscape

[Figure: relevance of unique topic “keyboard” in HTC; annotations: Android 2.1, Android 1.5]

Unique Topics in Motorola

Label: GPS

HTC: gps, data, position, location, maps, google, time, lock, wrong, icon, turn, home, latitude, unit, tag, available

Motorola: maps, gps, google, app, droid, location, application, navigation, map, device, traffic, time, upgrade, turn, route

[Figure: relevance of unique topic “GPS” in Motorola; annotation: Android 2.2]

Discussion

Fragmentation Discussion

Software-Based Fragmentation

• New features and changes contribute to the bug reports
• Difficult to test across all vendors and product lines

[Figure: relevance of common topic “bluetooth” in HTC and Motorola; annotations: Android 2.1, Android 2.2]

Fragmentation Discussion

Hardware-Based Fragmentation

• Different product lines were associated with different topics
• Evidenced by differing bug topics and product-specific issues

Label: display

HTC: screen, version, desire, behavior, app, home, number, code, final, press, sure, user, black, new, power

Motorola: droid, screen, button, correct, home, display, behavior, landscape, 2.1, menu, bar, xoom, device, user, status

Conclusion

Found how fragmentation is manifested within Android between HTC and Motorola

• Incompatibility issues
• Portability issues

Compared the performance of Labeled-LDA and LDA

• Labeled-LDA produced more feature-relevant topics than LDA
• Labeled-LDA needs more manual effort

http://softwareprocess.es/static/Fragmentation.html (http://goo.gl/SwGDT)

Could be useful for building project dashboards, process mining, and software process recovery
