What Are the Characteristics of High-rated Apps?
A Case Study on Free Android Applications
David Lo, Yuan Tian, Meiyappan Nagappan, Ahmed E. Hassan
Dramatic growth in the mobile app market and in mobile app usage in recent years.
“In 2012 the app economy was worth $53Bn and is expected to expand at a 28% CAGR up to 2016, reaching $143Bn.”
The mobile app market is attractive; however, building a successful app is hard.
Highly competitive: an app competes with 50,000 other apps in the app store.
Limited chance: smartphone users stick to a limited number of mobile apps.
How to make a successful app? - Let’s contrast high- and low-rated apps!
Prior Studies
• Bavota et al. find that low-rated apps have method calls to APIs that are more change- or fault-prone. (Bavota et al., TSE 2015)
• Taba et al. find that low-rated apps have more complex user interfaces. (Taba et al., ICWE 2014)
• Chia et al. find that the number of required permissions impacts app ratings. (Chia et al., WWW 2012)
• Many other studies.
This Study
• Consider a comprehensive set of factors that may impact app ratings instead of only a few.
‒ Consider existing factors
‒ Plus additional factors not considered before
• Compare the relative importance of each of the factors on app ratings
‒ In predicting high-rated and low-rated apps
‒ On a reasonably large dataset of more than 1000 apps
Agenda
• Motivation
• Factors
• Case study
• Discussion
• Conclusion
We analyze 28 factors in 8 dimensions
• Category
• App Size
• Code Complexity
• Library Dependence
• Library Quality
• UI Complexity
• User Requirements
• Marketing
App Size Dimension
Why App Size?
+ Larger code => richer functionality
- Larger code => higher chance of bugs (Zimmermann, 2007)
Code Complexity Dimension
Why Code Complexity?
+ More complex code => more advanced functionality
- More complex code => more bugs (Subramanyam, 2003)
Library Dependence Dimension
Why Library Dependence?
+ Higher dependence => richer functionality built upon third-party code
- Higher dependence => difficulty keeping up with library evolution, which results in bugs (Syer, 2014)
Library Quality Dimension
Why Library Quality?
- Buggy library code => buggy apps built on top of it
- Frequently changed libraries => bugs if apps are not properly maintained
UI Complexity Dimension
Why UI Complexity?
- More complex UI => app is harder to use
+ More complex UI => more functionality
User Requirements Dimension
Why User Requirements?
+ Larger target SDK version => incorporation of the latest features, active maintenance effort
+ More permission requests => more features
- More permission requests => privacy risk
Marketing Dimension
Why Marketing Effort?
+ More marketing effort => better first impression, more functionality
Category Dimension
Why Category?
Different category => different user expectations
Factors in App Size Dimension
• Install Size: binary size of the APK file (measured in KB).
• Total classes: total number of classes (including library code).
• App classes: total number of app-specific classes.
• # Activities: total number of activities defined in the AndroidManifest.xml file.
• # Services: total number of services defined in the AndroidManifest.xml file.
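The two manifest-based size factors can be illustrated with a minimal Python sketch. It assumes the manifest has already been decoded to plain XML (for example by ApkTool); the sample manifest, the package name, and the function name `count_components` are hypothetical and not taken from the study.

```python
import xml.etree.ElementTree as ET

# Hypothetical decoded manifest used only for illustration.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <application>
        <activity android:name=".MainActivity"/>
        <activity android:name=".SettingsActivity"/>
        <service android:name=".SyncService"/>
    </application>
</manifest>
"""


def count_components(manifest_xml: str) -> dict:
    """Count <activity> and <service> elements declared under <application>."""
    root = ET.fromstring(manifest_xml)
    app = root.find("application")
    if app is None:
        # A manifest without an <application> element declares no components.
        return {"activities": 0, "services": 0}
    return {
        "activities": len(app.findall("activity")),
        "services": len(app.findall("service")),
    }


print(count_components(MANIFEST))  # {'activities': 2, 'services': 1}
```

The class-count factors would need the decompiled JAR instead and are not covered by this sketch.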
Factors in Code Complexity Dimension
• Six CK metrics: Chidamber and Kemerer’s object-oriented complexity metrics, e.g., the number of methods in each class. We compute the mean over all classes in each app.
• Afferent coupling: mean number of other classes that depend upon each class.
• Number of public methods: mean number of public methods in each class.
Factors in Library Dependence Dimension
• Absolute (percentage) dependence on Android: total number (percentage) of calls to libraries whose names start with “android.”.
• Absolute (percentage) dependence on third-party libraries: total number (percentage) of calls to third-party libraries.
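As a rough illustration of the four dependence factors, the following Python sketch classifies fully qualified callee names by prefix. The function name `library_dependence`, the input format, and the rule that every callee outside “android.” and the app's own package counts as third party (which here also includes `java.*`) are assumptions made for this example; the study may classify calls differently.

```python
from typing import Dict, Iterable


def library_dependence(callees: Iterable[str], app_package: str) -> Dict[str, float]:
    """Compute absolute and percentage dependence from fully qualified callee names.

    Assumption: a callee starting with "android." is an Android API call, a callee
    inside app_package is app code, and everything else is third-party code.
    """
    total = android = third_party = 0
    for callee in callees:
        total += 1
        if callee.startswith("android."):
            android += 1
        elif not callee.startswith(app_package + "."):
            third_party += 1

    def percentage(count: int) -> float:
        # Guard against apps with no recorded calls.
        return 100.0 * count / total if total else 0.0

    return {
        "android_abs": android,
        "android_pct": percentage(android),
        "third_party_abs": third_party,
        "third_party_pct": percentage(third_party),
    }


calls = [
    "android.widget.Button.setText",
    "android.util.Log.d",
    "com.example.demo.Main.run",
    "org.lib.Json.parse",
]
print(library_dependence(calls, "com.example.demo"))
```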
Factors in Library Quality Dimension
• Change of used Android APIs: mean number of methods changed in the used Android APIs (Bavota et al. 2015).
• Faultiness of used Android APIs: mean number of bugs in the used Android APIs (Bavota et al. 2015).
Factors in UI Complexity Dimension
• Input elements per layout: mean number of input elements per layout.
• Output elements per layout: mean number of output elements per layout.
Factors in User Requirements Dimension
• Minimum SDK version: the minimum SDK version required for the app to run.
• Target SDK version: the SDK version that the app targets. If not set, it defaults to the minimum SDK version.
• Required device features: number of features required from the user’s device (e.g., camera).
• Required user permissions: number of permissions needed from the user.
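The four user-requirement factors can be read from a decoded manifest along the following lines. This is a sketch under assumptions: the sample manifest and the function name `user_requirement_factors` are hypothetical, and the SDK versions are assumed to be declared in a `<uses-sdk>` element of the manifest. The fallback of the target SDK to the minimum SDK follows the rule stated on the slide.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:minSdkVersion live in the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical decoded manifest used only for illustration.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-sdk android:minSdkVersion="14"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.CAMERA"/>
    <uses-feature android:name="android.hardware.camera"/>
</manifest>
"""


def user_requirement_factors(manifest_xml: str) -> dict:
    """Extract SDK versions and count requested permissions and device features."""
    root = ET.fromstring(manifest_xml)
    sdk = root.find("uses-sdk")
    # Android treats a missing minSdkVersion as 1.
    min_raw = sdk.get(ANDROID_NS + "minSdkVersion") if sdk is not None else None
    min_sdk = int(min_raw) if min_raw is not None else 1
    # If targetSdkVersion is not set, it defaults to the minimum SDK version.
    target_raw = sdk.get(ANDROID_NS + "targetSdkVersion") if sdk is not None else None
    target_sdk = int(target_raw) if target_raw is not None else min_sdk
    return {
        "min_sdk": min_sdk,
        "target_sdk": target_sdk,
        "permissions": len(root.findall("uses-permission")),
        "features": len(root.findall("uses-feature")),
    }


print(user_requirement_factors(MANIFEST))
```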
Factors in Marketing Effort Dimension
• Length of description: number of words appearing in the description of the app on its Play Store page.
• Promotional images: number of images shown on the app’s store page.
All factors can be computed from an app's APK and its app store information, using tools including ApkTool, dex2jar, and BCEL.
Data extraction pipeline (figure):
Step 1: Google Play → Meta Data. Extract: Category, Size, Rating, Rating Count, Marketing.
Step 2: APKs → ApkTool → AndroidManifest files, resources. Extract: Size, Requirements on Users, UI.
Step 3: APKs → dex2jar → JARs → BCEL, together with Android API change and bug logs. Extract: Size, Code Complexity, Library Dependence, Quality of Library Code.
Our case study is done on 1,492 Android apps:
Step 1: Randomly select and crawl 10,000 apps.
Step 2: Filter out apps that:
‒ Have fewer than 10 ratings
‒ Could not be processed by our tools
Step 3: Sort apps in each category by their ratings. Select the top 10% (high-rated) and bottom 10% (low-rated) apps.
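The per-category selection in Step 3 can be sketched in Python as follows. The function name `select_high_and_low`, the tuple layout of the input, and the choice to round the group size down are assumptions for illustration; the slides do not say how ties or small categories were handled.

```python
from collections import defaultdict


def select_high_and_low(apps, percent=10):
    """Pick the top and bottom `percent` of apps by rating within each category.

    apps: iterable of (app_id, category, rating) tuples.
    Returns (high_rated_ids, low_rated_ids).
    """
    by_category = defaultdict(list)
    for app_id, category, rating in apps:
        by_category[category].append((rating, app_id))

    high, low = [], []
    for ranked in by_category.values():
        ranked.sort(reverse=True)           # best-rated first
        k = len(ranked) * percent // 100    # assumption: round down
        if k == 0:
            continue                        # category too small to contribute
        high.extend(app_id for _, app_id in ranked[:k])
        low.extend(app_id for _, app_id in ranked[-k:])
    return high, low


# 20 hypothetical apps in one category, so 10% corresponds to 2 apps per group.
sample = [(f"app{i}", "tools", 1.0 + 0.2 * i) for i in range(20)]
high_rated, low_rated = select_high_and_low(sample)
print(high_rated, low_rated)
```

Selecting within each category rather than globally keeps categories with generally higher or lower ratings from dominating either group.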
Research Questions
RQ1: Is there a relationship between each factor and app rating?
RQ2: What are the important factors that could be used to predict app ratings?
RQ1: Relation between factors and rating
• Compare the values of each factor between high-rated and low-rated apps.
• Analyze the statistical significance and effect size of the difference between the two groups of apps.
‒ Use the Mann-Whitney U test at a significance level of 0.01.
‒ Compute Cliff’s delta (d).
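To make the effect-size step concrete, here is a small Python sketch of Cliff's delta and the Mann-Whitney U statistic, using the standard pairwise definitions. The function names and sample values are illustrative only. The sketch does not compute the p-value; in practice a statistics library routine such as `scipy.stats.mannwhitneyu` would be used for that.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x in xs, y in ys)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def mann_whitney_u(xs, ys):
    """U statistic for xs: count of pairs with x > y, with ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


# Hypothetical factor values for high-rated and low-rated apps.
high = [3, 4, 5]
low = [1, 2, 3]

delta = cliffs_delta(high, low)
u = mann_whitney_u(high, low)
# The two quantities are linked by delta = 2U / (m * n) - 1.
print(delta, u, 2 * u / (len(high) * len(low)) - 1)
```

A delta near +1 means values in the first group are almost always larger, a delta near -1 means they are almost always smaller, and a delta near 0 means the groups overlap heavily.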
RQ1: Relation between factors and rating
(Figure: direction of the relationship between each factor and rating, marked +ve or -ve.)
RQ1: Summary of Findings
High-rated apps are statistically significantly different from low-rated apps in 17 out of the 28 factors.
Generally, high-rated apps are larger with more complex code, more preconditions, more marketing efforts, more dependence on libraries, and they make use of higher quality Android libraries.
RQ2: Important factors for prediction
• Remove highly correlated factors
• Remove redundant factors
• Use random forest
‒ Ten-fold cross-validation
‒ Measure performance in terms of F1 and AUC
‒ Repeat the process with different factors omitted
• Employ the Scott-Knott test
‒ Identify groups of factors that are statistically significantly different from one another
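The first step, removing highly correlated factors, might look like the Python sketch below. The slides do not state the correlation measure, the threshold, or which factor of a correlated pair is kept, so the use of Spearman's rank correlation, the 0.7 cutoff, and the keep-the-first rule are all assumptions, as are the function names and the sample data.

```python
def ranks(values):
    """1-based ranks where tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def drop_correlated(factors, threshold=0.7):
    """Keep a factor only if |rho| <= threshold against every factor kept so far."""
    kept = []
    for name, values in factors.items():
        if all(abs(spearman(values, factors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical factor values for five apps.
data = {
    "install_size": [1, 2, 3, 4, 5],
    "total_classes": [10, 20, 30, 45, 50],   # rises with install_size
    "promo_images": [3, 1, 4, 1, 5],
}
print(drop_correlated(data))
```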
RQ2: Important factors for prediction
The random forest (RF) model achieves an F-measure of 0.74 and an AUC of 0.81.
Top-3 factors:
• Size of the app
• # Promotional images
• Target SDK
RQ2: Summary of Findings
The size of an app, the number of promotional images on its store page, and the target SDK are the three most influential factors in determining the likelihood of an app being a high-rated app.
Discussion: Comparison with Past Findings
• Reinforce findings by Bavota et al. (TSE’15)
‒ API quality influences rating
‒ However, it is not the most important factor (ranked #5)
• Refute findings by Taba et al. (ICWE’14)
‒ UI complexity is not statistically significantly related to app ratings.
• Reinforce findings by Chia et al. (WWW’12)
‒ # required permissions is weakly associated with rating
• Highlight additional factors: app size, promotion effort, target SDK.
Discussion: Power of Multi-Factor Analysis
(Bar chart: precision, recall, and F-measure of models built from All factors versus each single dimension: Size, Code Complexity, Dependence, Library Quality, UI Complexity, Requirement on User, Marketing, Category.)
Single-dimension factors are not enough to successfully differentiate high-rated from low-rated apps.
Future Work
• Explore additional factors
• Do a more fine-grained study (individual category)
• Employ a causality analysis to get a deeper understanding
• Interview Android app developers to get their insight
Questions? Comments? Advice?
{yuan.tian.2012,davidlo}@smu.edu.sg
[email protected], [email protected]
Thank You!