What Are the Characteristics of High-rated Apps?

A Case Study on Free Android Applications

David Lo, Yuan Tian, Meiyappan Nagappan, Ahmed E. Hassan

Dramatic mobile app market & mobile app usage growth in recent years.

“In 2012 the app economy was worth $53Bn and is expected to expand at a 28% CAGR up to 2016, reaching $143Bn.”

The mobile app market is attractive; however, building a successful app is hard.

Highly Competitive: With 50,000 other apps in the app store

Limited Chance: Smartphone users stick to a limited number of mobile apps.

How to make a successful app? - Let’s contrast high- and low-rated apps!

Prior Studies

• Bavota et al. find that low-rated apps have method calls to APIs that are more change- or fault-prone (Bavota et al., TSE 2015).

• Taba et al. find that low-rated apps have more complex user interfaces (Taba et al., ICWE 2014).

• Chia et al. find that the number of required permissions impacts app ratings (Chia et al., WWW 2012).

• Many other studies.

This Study

• Consider a comprehensive set of factors that may impact app ratings instead of only a few.

‒ Consider existing factors
‒ Plus additional factors not considered before

• Compare the relative importance of each of the factors on app ratings.

‒ In predicting high-rated and low-rated apps
‒ On a reasonably large dataset of more than 1000 apps

Agenda

• Motivation
• Factors
• Case study
• Discussion
• Conclusion

We analyze 28 factors in 8 dimensions

• Category
• App Size
• Code Complexity
• Library Dependence
• Library Quality
• UI Complexity
• User Requirements
• Marketing

App Size Dimension

Why App Size?

+ Larger code => richer functionality

- Larger code => higher chance of bugs (Zimmermann, 2007)

Code Complexity Dimension

Why Code Complexity?

+ More complex code => more advanced functionality

- More complex code => more bugs (Subramanyam, 2003)

Library Dependence Dimension

Why Library Dependence?

+ Higher dependence => richer functionality built upon third-party code

- Higher dependence => difficulty keeping up with library evolution, which results in bugs (Syer, 2014)

Library Quality Dimension

Why Library Quality?

- Buggy library code => buggy apps built on top of it

- Frequently changed libraries => bugs if apps are not properly maintained

UI Complexity Dimension

Why UI Complexity?

- More complex UI => app is harder to use.

+ More complex UI => more functionality.

User Requirements Dimension

Why User Requirements?

+ Higher target SDK version => incorporation of the latest features, active maintenance effort.

+ More permission requests => more features.

- More permission requests => privacy risk.

Marketing Dimension

Why Marketing Effort?

+ More marketing effort => better first impression, more functionality.

Category Dimension

Why Category?

Different category => different user expectations.

Factors in App Size Dimension

• Install size: binary size of the APK file (measured in KB).

• Total classes: total number of classes (including library code).

• App classes: total number of app-specific classes.

• # Activities: total number of activities defined in the AndroidManifest.xml file.

• # Services: total number of services defined in the AndroidManifest.xml file.

Factors in Code Complexity Dimension

• Six CK metrics: Chidamber and Kemerer’s object-oriented complexity metrics, e.g., the number of methods in each class. Note that we compute the mean over all classes in each app.

• Afferent coupling: mean of the number of other classes that depend upon each class.

• Number of public methods: mean of the number of public methods in each class.

Factors in Library Dependence Dimension

• Absolute (percentage) dependence on Android: total number of (percentage of) calls to libraries that start with “android.”.

• Absolute (percentage) dependence on third-party libraries: total number of (percentage of) calls to third-party libraries.
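
As an illustration of the two dependence factors, here is a minimal sketch in Python. It assumes the targets of all method calls have already been extracted as fully qualified class names and that the app's own package prefix is known. The function name `dependence_counts`, the rule that every call outside "android." and the app's own package counts as third party, and the choice of all calls as the percentage base are assumptions made for this example, not details taken from the study.

```python
def dependence_counts(call_targets, app_package):
    """Return absolute and percentage dependence on Android and third-party code.

    call_targets: fully qualified class names of the methods an app calls
                  (assumed to be extracted beforehand).
    app_package:  package prefix of the app's own classes (assumed known).
    """
    total = len(call_targets)
    android = sum(1 for t in call_targets if t.startswith("android."))
    own = sum(1 for t in call_targets if t.startswith(app_package + "."))
    # Assumption: every call that is neither Android nor app-specific
    # counts as a third-party call.
    third_party = total - android - own

    def pct(n):
        # Assumption: percentages are taken over all calls in the app.
        return 100.0 * n / total if total else 0.0

    return {
        "android_abs": android,
        "android_pct": pct(android),
        "third_party_abs": third_party,
        "third_party_pct": pct(third_party),
    }
```

A real analysis would likely need a finer rule for platform packages such as "java.", which this sketch would count as third party.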

Factors in Library Quality Dimension

• Change of used Android APIs: mean number of methods changed in the used Android APIs (Bavota et al., 2015).

• Faultiness of used Android APIs: mean number of bugs in the used Android APIs (Bavota et al., 2015).

Factors in UI Complexity Dimension

• Input elements per layout: mean number of input elements per layout.

• Output elements per layout: mean number of output elements per layout.

Factors in User Requirements Dimension

• Minimum SDK version: the minimum SDK version required for the app to run.

• Target SDK version: the SDK version that the app targets. If not set, it defaults to the minimum SDK version.

• Required device features: number of features required from the user’s device (e.g., camera).

• Required user permissions: number of permissions needed from the user.
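
These four factors can be read from a decoded AndroidManifest.xml. The sketch below assumes a plain-text manifest (for example, as decoded by ApkTool) that carries a `<uses-sdk>` element with numeric version values. The function name `user_requirement_factors` is hypothetical, and counting `<uses-feature>` and `<uses-permission>` elements is one plausible reading of the factor definitions rather than the study's confirmed procedure.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in manifest files.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def user_requirement_factors(manifest_xml):
    """Extract the four user-requirement factors from a decoded manifest string."""
    root = ET.fromstring(manifest_xml)
    uses_sdk = root.find("uses-sdk")
    min_sdk = target_sdk = None
    if uses_sdk is not None:
        min_sdk = uses_sdk.get(ANDROID_NS + "minSdkVersion")
        target_sdk = uses_sdk.get(ANDROID_NS + "targetSdkVersion")
    # Android treats a missing minSdkVersion as 1.
    min_sdk = int(min_sdk) if min_sdk is not None else 1
    # If the target SDK is not set, it defaults to the minimum SDK version.
    target_sdk = int(target_sdk) if target_sdk is not None else min_sdk
    return {
        "min_sdk": min_sdk,
        "target_sdk": target_sdk,
        "required_features": len(root.findall("uses-feature")),
        "required_permissions": len(root.findall("uses-permission")),
    }
```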

Factors in Marketing Effort Dimension

• Length of description: number of words appearing in the description of the app on its Play Store page.

• Promotional images: number of images shown on the app’s store page.

All the factors can be calculated from the app’s APK and its app store information, using tools including ApkTool, dex2jar, and BCEL.

[Figure: three-step factor extraction pipeline]

Step 1 (Google Play → meta data): extract category, size, rating, and rating count; extract marketing factors.

Step 2 (APKs → ApkTool → AndroidManifest files, resources): extract size; extract requirements on users and UI factors.

Step 3 (APKs → dex2jar → JARs → BCEL, together with Android API change and bug logs): extract size; extract code complexity, library dependence, and quality of library code.

Our case study is done on 1,492 Android apps:

Step 1: Randomly select and crawl 10,000 apps.

Step 2: Filter out apps that:
‒ Have fewer than 10 ratings
‒ Could not be processed by our tools

Step 3: Sort apps in each category by their ratings. Select the top 10% (high-rated) and bottom 10% (low-rated) apps.
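
The filtering and selection steps above can be sketched as follows. This is an illustrative sketch only: the function name `select_high_low`, the dictionary keys, and rounding the 10% cut-off down to a whole number of apps are assumptions, and the filter for apps that the tools could not process is not modelled.

```python
from collections import defaultdict


def select_high_low(apps, min_ratings=10, fraction=0.10):
    """Split apps into high-rated and low-rated groups, per category.

    apps: list of dicts with 'category', 'rating' and 'rating_count' keys
          (hypothetical record layout).
    """
    by_category = defaultdict(list)
    for app in apps:
        # Step 2: drop apps with fewer than `min_ratings` ratings.
        if app["rating_count"] >= min_ratings:
            by_category[app["category"]].append(app)

    high, low = [], []
    for group in by_category.values():
        # Step 3: rank within the category, best rating first.
        ranked = sorted(group, key=lambda a: a["rating"], reverse=True)
        k = int(len(ranked) * fraction)  # assumption: round down
        if k == 0:
            continue  # category too small to contribute any app
        high.extend(ranked[:k])
        low.extend(ranked[-k:])
    return high, low
```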

Research Questions

RQ1: Is there a relationship between each factor and app rating?

RQ2: What are the important factors that could be used to predict app ratings?

RQ1: Relation between factors and rating

• Compare the values of each factor between high-rated and low-rated apps.

• Analyze the statistical significance and effect size of the difference between the two groups of apps.

‒ Use the Mann-Whitney U test at a p-value threshold of 0.01.
‒ Compute Cliff’s Delta (or d).
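
For one factor, the two statistics named above can be computed as in the following sketch. It uses only the Python standard library and obtains the two-sided p-value from the normal approximation with tie correction and no continuity correction. That is a simplification chosen for this example, so a statistics package may report slightly different p-values. The function names are hypothetical.

```python
import math
from collections import Counter


def cliffs_delta_and_u(high, low):
    """Return Cliff's delta and the Mann-Whitney U statistic for `high`."""
    n, m = len(high), len(low)
    greater = sum(1 for x in high for y in low if x > y)
    less = sum(1 for x in high for y in low if x < y)
    ties = n * m - greater - less
    delta = (greater - less) / (n * m)
    u = greater + 0.5 * ties
    return delta, u


def mann_whitney_p(high, low):
    """Two-sided p-value via the normal approximation (assumed method)."""
    n, m = len(high), len(low)
    _, u = cliffs_delta_and_u(high, low)
    total = n + m
    # Tie correction: sum of t^3 - t over groups of equal values.
    tie_term = sum(t**3 - t for t in Counter(list(high) + list(low)).values())
    var = n * m / 12.0 * ((total + 1) - tie_term / (total * (total - 1)))
    if var == 0:
        return 1.0  # all values identical, no evidence of a difference
    z = (u - n * m / 2.0) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))
```

A factor would then be reported as significantly different when `mann_whitney_p(high, low) < 0.01`, with the sign of the delta giving the direction of the difference.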

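RQ1: Relation between factors and rating

[Figure: comparison of factor values between high-rated and low-rated apps, with relationships annotated as negative (-ve) or positive (+ve).]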

RQ1: Summary of Findings

High-rated apps are statistically significantly different from low-rated apps in 17 out of the 28 factors.

Generally, high-rated apps are larger with more complex code, more preconditions, more marketing efforts, more dependence on libraries, and they make use of higher quality Android libraries.

RQ2: Important factors for prediction

• Remove highly correlated factors
• Remove redundant factors
• Use random forest

‒ Ten-fold cross validation
‒ Measure performance in terms of F1 and AUC
‒ Repeat the process with different factors omitted

• Employ Scott-Knott test

‒ Identify groups of factors that are statistically significantly different from one another
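
The two performance measures named above can be computed from a classifier's output as in this sketch. It assumes high-rated apps are encoded as 1 and low-rated apps as 0, and that the model provides both a predicted label and a score per app. The random forest itself and the cross-validation loop are not shown, and the function names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = high-rated, an assumed encoding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a ten-fold setting, both functions would be applied to each held-out fold and the results averaged.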

RQ2: Important factors for prediction

The RF model could achieve an F-measure of 0.74 and an AUC of 0.81.

Top-3 factors:

• Size of the app
• # Promotional Images
• Target SDK

RQ2: Summary of Findings

The size of an app, the number of promotional images on its store page, and the target SDK are the three most influential factors in determining the likelihood of an app being a high-rated app.

Discussion: Comparison with Past Findings

• Reinforce findings by Bavota et al. (TSE’15)

‒ API quality influences rating
‒ However, it is not the most important factor (#5)

• Refute findings by Taba et al. (ICWE’14)

‒ UI complexity is not statistically significantly related to app ratings.

• Reinforce findings by Chia et al. (WWW’12)

‒ The number of required permissions is weakly associated with rating

• Highlight additional factors: app size, promotion effort, target SDK.

Discussion: Power of Multi-Factor Analysis

[Figure: bar chart of precision, recall, and F-measure (axis from 0 to 0.8) for models built on all factors and on each single dimension: Size, Code Complexity, Dependence, Library Quality, UI Complexity, Requirement on User, Marketing, Category.]

Single dimension factors are not enough to successfully differentiate high-rated from low-rated apps.

Future Work

• Explore additional factors
• Do a more fine-grained study (individual category)
• Employ a causality analysis to get a deeper understanding
• Interview Android app developers to get their insight

Questions? Comments? Advice?

{yuan.tian.2012,davidlo}@smu.edu.sg

Thank You!
