Improving the Performance of Database-Centric Applications Through Program Analysis

Tse-Hsun (Peter) Chen
Supervisor: Dr. Ahmed E. Hassan



Databases are the backbone of large-scale software applications

Database


Thesis statement

The key to improving the performance of database-centric applications is not only improving the backend database management system, but also improving the database access code, which is rarely considered in prior studies.


Focus of the thesis

Diagram labels: application source code (database access code), database abstraction framework, SQL queries, database

This thesis focuses on the database access code at the application level.

Most prior work in the database community focuses on the database and SQL queries.


Related publications

• Chapter 1: Improving the Quality of Large-Scale Database-Centric Software Systems by Analyzing Database Access Code, International Conference on Data Engineering, PhD Symposium (ICDE-PhD), 2015

• Chapter 4: An Empirical Study on the Practice of Maintaining Object-Relational Mapping Code in Java Systems, International Conference on Mining Software Repositories (MSR), 2016

• Chapter 5: Detecting Performance Anti-patterns for Applications Developed Using Object-Relational Mapping, International Conference on Software Engineering (ICSE), 2014

• Chapter 6: Detecting Problems in Database Access Code of Large Scale Systems – An Industrial Experience Report, International Conference on Software Engineering, Software Engineering in Practice Track (ICSE-SEIP), 2016

• Chapter 7: Finding and Evaluating the Performance Impact of Redundant Data Access for Applications that are Developed Using Object-Relational Mapping Frameworks, IEEE Transactions on Software Engineering (TSE), 2016

• Chapter 8: CacheOptimizer: Helping Developers Configure Caching Frameworks for Hibernate-based Database-centric Web Applications, International Symposium on the Foundations of Software Engineering (FSE), 2016


Goals of the literature review

What is missing in prior studies that use program analysis to improve database access code?

Can we further improve the performance of database-centric applications from the point of view of improving application source code?


Literature review of using program analysis to improve database access code

• SQL synthesis and transformation
• Domain-specific languages and APIs
• Anti-pattern detection

Literature review of using program analysis to improve database access code: anti-pattern detection

• Static anti-pattern detection tools:
  • Static analysis framework for database access code: [Dasgupta et al.]
  • State-of-the-art static anti-pattern detection frameworks: [FindBugs, Coverity, PMD]

• Detecting anti-patterns in database access code:
  • Anti-pattern in database schema: [Nijjar and Bultan]
  • Static SQL validation by string analysis: [Gould et al.]
  • Linking code and SQL: [Tamayo et al.]
  • Deadlock detection: [Grechanik et al.]

F1: Limited tooling support for detecting anti-patterns in database access code

F2: Limited research on detecting performance problems or pinpointing the root cause for database access code

Literature review of using program analysis to improve database access code: SQL synthesis and transformation

• Batching SQL queries: [Cheung et al.]
• Pre-fetching SQL queries: [Ramachandra et al.]
• Asynchronous query execution: [Chavan et al.]
• Translate application logic to stored procedure calls: [Cheung et al.]
• Synthesize SQL queries from Java code: [Cheung et al.]

F3: Prior studies focus on optimizing SQL queries but do not consider how the queried data would be used in the application

Literature review of using program analysis to improve database access code: domain-specific languages and APIs

• Domain-specific languages for parallel query execution: [Ackermann et al.]
• Domain-specific language that compiles list iterations into SQL queries: [Grust et al.]
• Java APIs that can be translated to SQL queries: [Iu et al.]

F4: Most prior approaches only consider providing APIs to improve SQL queries, but not the code for interacting with database abstraction frameworks


Object-Relational Mapping (ORM) eliminates the gap between objects and SQL

Diagram labels: Java classes, ORM, database

Benefits over raw SQL:
• Less boilerplate code
• Object-DB translations are done automatically
• More than 67% of developers use ORM

Much less code and shorter development time



Part I: Empirical Study
• How is ORM code maintained?

Part II: Approaches for Improving the Performance of Database Access Code
• Automatically tuning cache configurations
• Statically detecting ORM anti-patterns
• Dynamically detecting redundant data anti-patterns


An example class with Java ORM code

User.java

    @Entity
    @Table(name = "user")   // User class is mapped to the "user" table in the DB
    @Cacheable              // performance-related config
    public class User {

        @Column(name = "id")   // id is mapped to the column "id" in the user table
        private int id;

        @Column(name = "name")
        String userName;

        // A user can belong to multiple teams. Eagerly retrieve the associated
        // teams when retrieving a user object (performance-related config).
        @OneToMany(fetch = FetchType.EAGER)
        List<Team> teams;

        public void setName(String n) {
            userName = n;
        }

        // … other getter and setter methods
    }


Accessing the database using ORM

The ORM translates operations on objects into SQL that is sent to the database:

    Objects: User u = findUserByID(1);
    SQL:     select * from user u where u.id = 1;

    Objects: u.setName("Peter");
    SQL:     update user set name='Peter' where user.id = 1;


ORM is not a silver bullet

Using ORM comes with some hidden costs…

We have been working with developers to understand how to improve ORM code maintenance

[MSR 2016]


Study ORM code changes over time

[MSR 2016]

We automatically identify and classify ORM code into different types, and study their changes over time.

Code in files with ORM code:
• DB mapping code
• Regular code
• Perf config code


Maintenance activities of each type of ORM code

• Database mapping code: @Table(name = "user")
• Performance configuration: query.cache()
• ORM query code: User u = q.getSingleResult();

Shares of ORM code changes across the types: 35%, 55%, and < 10%

ORM configurations are rarely tuned!

[MSR 2016]



Caching helps performance

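[FSE 2016]

Diagram: the application server issues the same call twice, User u = findUserByID(1);. With an ORM cache between the ORM and the database, the ORM can answer the repeated call from the cache instead of querying the database.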
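The caching idea on this slide can be illustrated with a minimal read-through cache. This is only a sketch under stated assumptions: `ReadThroughCacheSketch`, its `loader` function, and the `databaseHits` counter are hypothetical names introduced here, and the class is not Hibernate's caching API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical read-through cache used only for illustration;
// it does not model any real ORM caching framework.
class ReadThroughCacheSketch<V> {
    private final Map<Integer, V> cache = new HashMap<>();
    private final IntFunction<V> loader; // stands in for the database query
    private int databaseHits = 0;

    ReadThroughCacheSketch(IntFunction<V> loader) {
        this.loader = loader;
    }

    V findById(int id) {
        V cached = cache.get(id);
        if (cached != null) {
            return cached;      // served from the cache, no query issued
        }
        databaseHits++;         // cache miss: go to the "database"
        V loaded = loader.apply(id);
        cache.put(id, loaded);
        return loaded;
    }

    int databaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        ReadThroughCacheSketch<String> users =
                new ReadThroughCacheSketch<>(id -> "user-" + id);
        users.findById(1);
        users.findById(1); // the repeated call is answered by the cache
        System.out.println("database hits: " + users.databaseHits());
    }
}
```

Whether a given location benefits from such a cache depends on the workload, which is why the following slides treat cache configuration as something to be tuned rather than fixed once.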


Finding optimal ORM cache configuration is difficult

[FSE 2016]

• Optimal cache configurations may change when the workload changes significantly

• There can be hundreds of potential cache locations in the code


Approach: Link REST calls to database accesses

[FSE 2016]

Diagram labels: user, Tomcat HTTP server, log files, User table

Example log entry:

    10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/user/1" …
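As a rough illustration of the first step, the request method and path can be pulled out of an access-log entry like the Tomcat one on this slide. The class name `AccessLogSketch` and the regular expression are assumptions made for this sketch; real log formats vary with the server configuration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the HTTP method and path from a log line.
class AccessLogSketch {
    // Matches the quoted request part, e.g. "GET /app/user/1 "
    private static final Pattern REQUEST =
            Pattern.compile("\"([A-Z]+)\\s+(\\S+)\\s*\"");

    /** Returns {method, path}, or null if the line has no quoted request. */
    static String[] parseRequest(String logLine) {
        Matcher m = REQUEST.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String line = "10.10.10.1 - - [11/Apr/2015:12:19:30] 200 \"GET /app/user/1 \"";
        String[] request = parseRequest(line);
        System.out.println(request[0] + " " + request[1]);
    }
}
```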


Approach: Apply static analysis to extract database-access information

    @GET
    @Path("/user/{id}")
    public User getUser(@PathParam("id") int id) {
        // … select from User u where u.id = id …
    }

• Map annotations to corresponding REST calls, for example the log entry:

    10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/user/1" …

• Apply inter-procedural data flow analysis to see if REST inputs are used as criteria for database queries

[FSE 2016]
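Mapping an annotation such as `/user/{id}` to a logged request such as `/app/user/1` can be sketched as template matching. The class `RestTemplateSketch`, the `contextPrefix` parameter (here `/app`), and the matching rules are assumptions for illustration, not the approach's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical matcher from a path template to a concrete request path.
class RestTemplateSketch {
    /** Turns "/user/{id}" into a regex with one capture group per {param}. */
    static String toRegex(String template) {
        StringBuilder regex = new StringBuilder();
        for (String segment : template.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty piece before the leading slash
            }
            regex.append('/');
            if (segment.startsWith("{") && segment.endsWith("}")) {
                regex.append("([^/]+)");              // path parameter
            } else {
                regex.append(Pattern.quote(segment)); // literal segment
            }
        }
        return regex.toString();
    }

    /** Returns the path-parameter values, or null if the path does not match. */
    static List<String> match(String contextPrefix, String template, String path) {
        Pattern p = Pattern.compile(Pattern.quote(contextPrefix) + toRegex(template));
        Matcher m = p.matcher(path);
        if (!m.matches()) {
            return null;
        }
        List<String> values = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            values.add(m.group(i));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(match("/app", "/user/{id}", "/app/user/1"));
    }
}
```

The extracted parameter values are the REST inputs whose flow into query criteria the data flow analysis would then have to check.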


Running performance tests for evaluation

Diagram labels: application server, database

• We use JMeter tests to generate the load
• Database is populated with hundreds of MB of data

[FSE 2016]


Performance improved significantly after using optimized cache

[Bar chart: % of improvement in throughput (axis 0% to 100%) for PetClinic, CloudStore, and OpenMRS, with bars labeled NoCache and DeveloperCache]

[FSE 2016]



Developers are often not aware of database access

Developer: "Wow! I don't need to worry about DB code!"

ORM code with performance anti-patterns leads to bad application performance.

The performance impact can be SIGNIFICANT!

[ICSE 2014 & 2016]

Existing tools do not look for performance problems

• Coverity
• PMD
• Google error-prone
• Facebook Infer
• FindBugs

These tools only support detecting language-related and functional problems.

[ICSE 2014 & 2016]


Statically detecting one-by-one processing

First, find all the methods that read from or write to the DBMS:

    class User {
        getUserById() …
        getUserAddress() …
    }

Then identify the positions of all loops:

    for each userId {
        foo(userId)
    }

Finally, check whether the call graph of each loop body calls any database-accessing method:

    foo(userId) {
        getUserById(userId)
    }

[ICSE 2014 & 2016]
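The three steps above can be sketched as a reachability check over a call graph. This is a simplified illustration, not the detector described in the papers: the call graph is assumed to be given as a map from method name to callees, and `OneByOneSketch` and its parameter names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: flags a loop whose body (transitively) calls a
// database-accessing method.
class OneByOneSketch {
    static boolean loopReachesDatabase(Map<String, List<String>> callGraph,
                                       Set<String> dbMethods,
                                       List<String> calledInLoop) {
        Deque<String> work = new ArrayDeque<>(calledInLoop);
        Set<String> seen = new HashSet<>();
        while (!work.isEmpty()) {
            String method = work.pop();
            if (!seen.add(method)) {
                continue; // already visited, avoids cycles in the call graph
            }
            if (dbMethods.contains(method)) {
                return true; // a database access happens on every iteration
            }
            work.addAll(callGraph.getOrDefault(method, Collections.emptyList()));
        }
        return false;
    }
}
```

For the slide's example, the loop calls `foo`, the call graph maps `foo` to `getUserById`, and `getUserById` is in the set of database-accessing methods, so the loop would be flagged.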


Assessing anti-pattern impact by fixing the anti-patterns

Code with anti-patterns:

    For u in users
        u.getName

Code without anti-patterns:

    Get users in a batch
    For u in users:
        u.getName

For each version, execute the test suite 30 times and record the response time. The impact is the avg. % improvement in response time after fixing the anti-patterns.

[ICSE 2014]
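One plausible reading of "Avg. % improvement" is the relative reduction of the mean response time over the repeated runs. The sketch below assumes that reading; the class `ImprovementSketch` and the exact formula are assumptions, and the slide does not state how the average was computed.

```java
// Hypothetical calculation of the average percentage improvement
// between two sets of repeated response-time measurements.
class ImprovementSketch {
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Percentage reduction of the mean response time after the fix. */
    static double percentImprovement(double[] before, double[] after) {
        double meanBefore = mean(before);
        return (meanBefore - mean(after)) / meanBefore * 100.0;
    }
}
```

In the study's setup, each array would hold the 30 response times measured for one version of the code.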


Performance anti-patterns have medium to large impacts

[Bar chart: % of improvement in response time (axis 0% to 100%) for one-by-one processing]

[ICSE 2014]


Extending static anti-pattern detection framework


• DBChecker looks for both functional and performance anti-patterns

• DBChecker is able to detect 5 more anti-patterns that we see in practice

[ICSE 2016]


DBChecker is adopted in practice

• DBChecker is adopted by our industrial partner

• DBChecker is executed on a daily basis for application quality assurance

• We documented the lessons we learned during the adoption process

[ICSE 2016]


Lessons learned: Handling a large number of detection results


• Developers have limited time to fix detected problems

• Most existing static anti-pattern detection tools do not prioritize the detected instances for the same anti-pattern

[ICSE 2016]


Solution: Prioritizing based on DB tables

Example tables: User, Time zone

• Problems related to large or frequently accessed tables are ranked higher (more likely to be performance bottlenecks)

• Problems related to tables that many other tables depend on are ranked higher

[ICSE 2016]
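A table-based ranking of detected problems could look like the sketch below. Everything in it is hypothetical: the `TableStats` fields, the scoring formula, and its weights are invented for illustration, since the slide names only the criteria (table size, access frequency, dependencies) and not how they are combined.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical prioritization of detected problems by the table they touch.
class PrioritySketch {
    static final class TableStats {
        final long rows;            // table size
        final long accessesPerDay;  // how often the table is accessed
        final int dependentTables;  // how many other tables depend on it

        TableStats(long rows, long accessesPerDay, int dependentTables) {
            this.rows = rows;
            this.accessesPerDay = accessesPerDay;
            this.dependentTables = dependentTables;
        }
    }

    // Assumed scoring: larger, hotter, more depended-upon tables score higher.
    static double score(TableStats t) {
        return Math.log10(1 + t.rows) + Math.log10(1 + t.accessesPerDay) + t.dependentTables;
    }

    /** Returns problem ids ordered from highest to lowest priority. */
    static List<String> rank(Map<String, TableStats> problemToTable) {
        List<String> ids = new ArrayList<>(problemToTable.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> score(problemToTable.get(id))).reversed());
        return ids;
    }
}
```

With such a ranking, a problem on a large, central table like User would be reported before the same anti-pattern on a small lookup table like Time zone.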



ORM often requests redundant data

Our goal is to identify redundant data anti-patterns: mismatches between the data needed in the application and the data requested by the ORM.

Diagram labels: needed data in the application, requested data (ORM)

[TSE 2016]


Finding redundant data anti-patterns

Workflow (diagram):
• Source code, static analysis: database-accessing methods and the needed data in the code
• Source code, compile, executable, exercising the application: executed methods and the requested data from the DB

[TSE 2016]
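Once the requested data and the needed data are known, the redundant part is their difference. The sketch below assumes both are available as sets of column names such as `user.name`; the class `RedundantDataSketch` and this column-level representation are assumptions for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical comparison of ORM-requested data against data needed in code.
class RedundantDataSketch {
    /** Columns requested from the DB that the code never uses. */
    static Set<String> redundant(Set<String> requested, Set<String> needed) {
        Set<String> diff = new TreeSet<>(requested); // sorted copy; inputs stay untouched
        diff.removeAll(needed);
        return diff;
    }
}
```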

Discovered three types of redundant data anti-patterns

• Update all: Data is not updated in the code, but is updated by ORM-generated SQL
• Select all: Some data in the same table is not needed, but is retrieved
• Excessive data: Data in some tables is not used in the code, but the data is retrieved

[TSE 2016]


Assessing anti-pattern impact by fixing the anti-patterns

Code with anti-patterns:

    Select * from User…

Code without anti-patterns:

    Select user name from User…

For each version, execute the test suite 30 times and record the response time. The impact is the avg. % improvement in response time after fixing the anti-patterns.

[TSE 2016]


Performance improvement after removing anti-patterns

[Bar chart: % of improvement in response time (axis 0% to 100%) for the Select all, Update all, and Exc. data anti-patterns in BL, EA, and PC; data labels: 4.00%, 5.40%, 4.00%, 30%, 0%, 0%, 31%, 0%, 92%]

Anti-patterns have different impacts on different workloads, but removing them can give significant performance improvements in general.

[TSE 2016]


Summary of our literature review

• Most tools and prior studies do not focus on database performance anti-patterns

• Most prior studies focus on improving SQL queries, but developers nowadays work with database abstraction frameworks