UNCOVERING HIDDEN RELATIONSHIPS FROM PAST CHANGES: LOGICAL DEPENDENCIES AND ITS APPLICATIONS. MSR Asia Summit 2013, October 28th, 2013. Marco Aurélio Gerosa / Gustavo Ansaldi Oliva, Computer Science Dept. - University of São Paulo, {gerosa,goliva}@ime.usp.br. Kyoto Research Park, Kyoto, Japan

Uncovering hidden relationships from past changes: evolutionary dependencies and its applications


Page 1: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

UNCOVERING HIDDEN RELATIONSHIPS FROM PAST CHANGES:

LOGICAL DEPENDENCIES AND ITS APPLICATIONS

MSR Asia Summit 2013 October 28th, 2013

Marco Aurélio Gerosa / Gustavo Ansaldi Oliva

Computer Science Dept. - University of São Paulo

{gerosa,goliva}@ime.usp.br

Kyoto Research Park

Kyoto, Japan

Page 2: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Dependencies

Page 3: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Marco A. Gerosa ([email protected])

Structural dependencies

[Diagram: ClassA calls ClassB.doThisForMe()]

B, I need your help to do my work. Could you do this task for me?

Hummm, someone is asking me to do something. Let’s do it!

Page 4: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Software dependencies in general

A uses B if correct execution of B may be necessary for A to complete the task described in its specification [Parnas, 1979].

A depends on B if the latter is needed to compile or link A [Lakos, 1996].

A dependency relation means that the semantics of the depending elements is semantically or structurally dependent on the definition of the supplier element [UML Formal Specification].

A dependency means that a client element has knowledge of the supplier element and a change in the supplier may affect the client [Larman, 2004].

[Diagram: A depends on B]

Page 5: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Dependencies can be harmful

[Diagram: dependency graph connecting modules A through H]

• More dependencies, more maintenance effort [Banker et al., 1998].

• More dependencies, more defects [Cataldo et al., 2009].

• Dependencies can lead to negative ripple effects [Pressman, 2001].

[Diagram: ClassA, ClassB.doThisForMe(), and ClassC.doStuff()]

Page 6: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Dependencies may be hard to identify

Publisher/subscriber, polymorphism, clones, crosscutting concerns, semantic relations etc.

Unrecognized dependencies result in a higher number of defects [Herbsleb et al. 2006].

A dependency means that a client element has knowledge of the supplier element and a change in the supplier may affect the client [Larman, 2004].

[Diagram labels: Dependency, Change]

Page 7: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Logical dependencies

Files frequently changed together share some sort of dependency [Gall et al. 1998]

Logical dependencies better predict failures and quality compared to syntactic dependencies [Cataldo et al. 2009]

Developers should focus on identifying less-explicit relationships rather than obvious and explicit syntactic dependencies [Cataldo, 2010]

[Diagram: A and B]

Page 8: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Collaborative filtering

Page 9: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Social aspects matter

Page 10: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

How to identify logical dependencies

A logical dependency from A (client) to B (provider) occurs when changes to B are done together with changes to A.

[Diagram: A and B connected by co-change rates of 80% and 40%]

Strong logical dependency from B to A (the opposite is a much weaker dependency).

A logical dependency denotes an implicit and evolutionary relationship between software artifacts.
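The asymmetry in the 80%/40% figure can be illustrated with a small sketch. It assumes one plausible reading of the diagram: the percentage attached to a direction is the fraction of commits touching the first artifact that also touch the second. The function name `cochange_confidence` and the toy history are hypothetical, not taken from the talk.

```python
from collections import Counter
from itertools import permutations

def cochange_confidence(commits):
    """Map (x, y) to the fraction of commits touching x that also touch y."""
    changes = Counter()   # how many commits touched each artifact
    together = Counter()  # how many commits touched both x and y (ordered pairs)
    for files in commits:
        files = set(files)
        changes.update(files)
        together.update(permutations(files, 2))
    return {pair: n / changes[pair[0]] for pair, n in together.items()}

# Hypothetical history: B changes in 5 commits, A in 10, and 4 commits contain both.
history = [{"A", "B"}] * 4 + [{"B"}] + [{"A"}] * 6
conf = cochange_confidence(history)
print(conf[("B", "A")])  # 0.8: when B changes, A usually changes too
print(conf[("A", "B")])  # 0.4: the opposite direction is much weaker
```

The same four shared commits yield two different strengths because each direction is normalized by a different change count.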

Page 11: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Clustering classes

Clustering classes based on modification records [Ball et al., 1997]

Logical dependencies characterized as the probability of two classes changing together

The clusters identified semantically related classes

Thomas Ball, Yung-Min Kim, Adam Porter, Harvey Siy. If Your Version Control System Could Talk... Presented at the Workshop on Process Modelling and Empirical Studies of Software Engineering, ICSE 97, May 1997, Boston, MA

Page 12: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Logical coupling based on release periods

Gall et al. [1998] proposed an approach for logical dependency identification

“Our technique reveals hidden dependencies not evident in the source code, identifies modules that should undergo restructuring, and is based on minimal amount of data”

First use of the term

Coupling among subsystems [Gall et al., 1998]

A.ab.144 <1,2,4,7>

C.bc.201 <1,2,4,7>

[subsystem.module.program]

Harald Gall, Karin Hajek, and Mehdi Jazayeri. 1998. Detection of Logical Coupling Based on Product Release History. In Proceedings of the International Conference on Software Maintenance (ICSM '98). IEEE Computer Society, Washington, DC, USA, 190-198.
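The example above pairs two programs from different subsystems that changed in exactly the same releases. A minimal sketch of that idea follows, assuming each program is named `subsystem.module.program` and mapped to the list of releases in which it changed. It only matches identical change sequences; the published approach may be more elaborate. The function name and the third entry in the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def coupled_by_release(change_sequences):
    """Pair programs from different subsystems that share a change sequence."""
    by_sequence = defaultdict(list)
    for program, releases in change_sequences.items():
        by_sequence[tuple(releases)].append(program)
    pairs = []
    for releases, programs in by_sequence.items():
        for p, q in combinations(sorted(programs), 2):
            # The subsystem is the first component of the dotted name.
            if p.split(".")[0] != q.split(".")[0]:
                pairs.append((p, q, releases))
    return pairs

sequences = {
    "A.ab.144": [1, 2, 4, 7],
    "C.bc.201": [1, 2, 4, 7],
    "A.ab.150": [3, 5],  # hypothetical program with an unrelated history
}
pairs = coupled_by_release(sequences)
print(pairs)
```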

Page 13: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Grouping consecutive changes in CVS repositories

Gall et al. [2003] defined an approach with a fixed time window to capture logical dependencies in CVS repositories

Design flaws could be discovered without any analysis of source code

Class 13.c.18.A was checked in 21 times together with Class 13.c.18.B and Class 13.c.18.C
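CVS records check-ins per file, so change transactions have to be reconstructed. The sketch below shows one way a fixed time window could work, under the assumption that check-ins belong together when they share author and log message and fall within a fixed interval of the first check-in of the group. The `Revision` type, the function name, and the 200-second default are illustrative choices, not details from the talk.

```python
from dataclasses import dataclass

@dataclass
class Revision:
    file: str
    author: str
    message: str
    timestamp: int  # seconds since some epoch

def group_fixed_window(revisions, window=200):
    """Group per-file check-ins into transactions with a fixed time window."""
    transactions = []
    open_tx = {}  # (author, message) -> (start time, list of files)
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        current = open_tx.get(key)
        # The window is anchored at the first check-in of the transaction.
        if current is None or rev.timestamp - current[0] > window:
            current = (rev.timestamp, [])
            open_tx[key] = current
            transactions.append(current[1])
        current[1].append(rev.file)
    return transactions

revs = [
    Revision("A.java", "ann", "fix", 0),
    Revision("README", "bob", "doc", 50),
    Revision("B.java", "ann", "fix", 100),
    Revision("C.java", "ann", "fix", 250),  # more than 200 s after the first
]
fixed = group_fixed_window(revs)
print(fixed)
```

Files that end up in the same transaction are the candidates for being counted as "checked in together".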

Page 14: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Association rules and sliding-time window

Zimmermann et al. [2005] formalized logical dependencies as association rules

• Frequency

• Support

• Confidence

They used a sliding-time window to recover change transactions from CVS

Correct prediction of more than 70% of following changes

Thomas Zimmermann, Peter Weissgerber, Stephan Diehl, and Andreas Zeller. 2005. Mining Version Histories to Guide Software Changes. IEEE Trans. Software Eng. 31, 6 (June 2005), 429-445. DOI=10.1109/TSE.2005.72 http://dx.doi.org/10.1109/TSE.2005.72
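With a sliding window, the interval is measured from the most recent check-in of a transaction rather than from its first one, so a long series of closely spaced check-ins stays in one transaction. The sketch below assumes check-ins are grouped by author and log message; the tuple layout, the function name, and the 200-second default are assumptions made for illustration.

```python
def group_sliding_window(revisions, window=200):
    """revisions: iterable of (timestamp, author, message, file) tuples."""
    transactions = []
    last_seen = {}  # (author, message) -> (last timestamp, list of files)
    for ts, author, message, file in sorted(revisions):
        key = (author, message)
        entry = last_seen.get(key)
        if entry is None or ts - entry[0] > window:
            files = []  # gap too large: start a new transaction
            transactions.append(files)
        else:
            files = entry[1]
        files.append(file)
        # The window slides: the next gap is measured from this check-in.
        last_seen[key] = (ts, files)
    return transactions

checkins = [
    (0, "ann", "fix", "A.java"),
    (100, "ann", "fix", "B.java"),
    (250, "ann", "fix", "C.java"),  # 150 s after the previous check-in
    (600, "ann", "fix", "D.java"),  # 350 s after the previous check-in
]
sliding = group_sliding_window(checkins)
print(sliding)
```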

Page 15: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

After the programmer has made some changes to the source (above), ROSE suggests locations (below) where, in similar transactions in the past, further changes were made [Zimmermann et al., 2005]
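The suggestion step can be sketched with the usual association-rule measures. This is not ROSE's implementation; it assumes that support is the number of past transactions containing both the changed items and the candidate, and that confidence is that support divided by the number of transactions containing the changed items. The function name, thresholds, and sample data are hypothetical.

```python
from collections import Counter

def suggest(transactions, changed, min_support=2, min_confidence=0.5):
    """Rank items that were changed together with `changed` in the past."""
    changed = frozenset(changed)
    antecedent_count = 0
    consequent_counts = Counter()
    for tx in transactions:
        tx = set(tx)
        if changed <= tx:  # transaction contains everything changed so far
            antecedent_count += 1
            consequent_counts.update(tx - changed)
    suggestions = []
    for item, support in consequent_counts.items():
        confidence = support / antecedent_count
        if support >= min_support and confidence >= min_confidence:
            suggestions.append((item, support, confidence))
    # Highest confidence first, then highest support, then name.
    return sorted(suggestions, key=lambda s: (-s[2], -s[1], s[0]))

past = [
    {"a.c", "a.h"},
    {"a.c", "a.h", "doc.txt"},
    {"a.c", "a.h"},
    {"a.c", "b.c"},
    {"b.c"},
]
ranked = suggest(past, {"a.c"})
print(ranked)  # [('a.h', 3, 0.75)]
```

Raising `min_support` trades recall for fewer spurious suggestions, which matters in young repositories with little history.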

Page 16: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Where are we?

• What are Logical Dependencies?

• Identification of Logical Dependencies

• Applications of Logical Dependencies

• Our Research on Logical Dependencies

• The Road Ahead

Page 17: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Evolution Radar

Detect design issues and opportunities for refactoring

Angle θ = alphabetical sorting of files inside M

Marco D'Ambros, Michele Lanza, and Mircea Lungu. 2009. Visualizing Co-Change Information with the Evolution Radar. IEEE Trans. Softw. Eng. 35, 5 (September 2009), 720-735. DOI=10.1109/TSE.2009.17

Page 18: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Coordination requirements

Organizations often cope with complex tasks by dividing them into smaller interdependent work units and then assigning such units to teams

Coordination among teams arises as a response to such interdependent work units

Logical dependencies were applied to determine coordination requirements among developers

Marcelo Cataldo, Patrick A. Wagstrom, James D. Herbsleb, and Kathleen M. Carley. 2006. Identification of coordination requirements: implications for the Design of collaboration and awareness tools. In Proceedings of the 2006 Conference on Computer supported cooperative work (CSCW '06). ACM, New York, NY, USA.
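A sketch of how logical dependencies could feed a coordination-requirements computation. It assumes the matrix formulation commonly attributed to Cataldo et al. (2006), in which a developer-by-task assignment matrix is multiplied by a task-by-task dependency matrix and then by the transpose of the assignment matrix. Treating files as tasks and using co-change as the dependency matrix are assumptions of this example, as are all names and data.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def coordination_requirements(assignment, dependency):
    """Compute assignment x dependency x transpose(assignment)."""
    transposed = [list(col) for col in zip(*assignment)]
    return matmul(matmul(assignment, dependency), transposed)

# Rows: developers (ann, bob); columns: files (f1, f2, f3).
assignment = [[1, 0, 0],
              [0, 1, 0]]
# Symmetric logical dependencies between files: f1 and f2 co-change.
dependency = [[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]]
cr = coordination_requirements(assignment, dependency)
print(cr)  # [[0, 1], [1, 0]]
```

A non-zero off-diagonal cell means the two developers work on logically dependent files and may need to coordinate, even if the files share no structural dependency.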

Page 19: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Applications

Pinzger et al. [2005] showed that logical dependencies facilitate the detection of potential refactoring candidates.

D’Ambros et al. [2006] showed that logical dependencies improved bug prediction models.

Breu and Zimmermann [2006] used them to identify and rank crosscutting concerns in software systems.

Page 20: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

And more…

Logical dependencies have also been employed to:

• Assess the impact on failures [Cataldo et al., 2009]

• Predict changes and analyze change impact [Kagdi et al., 2007]

• Uncover cross-cutting concerns [Canfora et al., 2006]

• Uncover design flaws and opportunities for refactoring, restructuring, and reengineering [Beyer & Hassan, 2006]

• Understand and evaluate software architecture [Zimmermann et al., 2003]

• Maintain documentation (internationalization, etc.) [Kagdi et al., 2006]

• …

Page 21: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Where are we?

• What are Logical Dependencies?

• Foundation of Logical Dependencies

• Applications of Logical Dependencies

• Our Research on Logical Dependencies

• The Road Ahead

Page 22: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Structural vs logical dependencies

How do structural and logical dependencies relate?

Analysis of commits of the ASF showed that:

• 93% of the logical dependencies did not involve structural dependencies

• 95% of the structural dependencies did not imply a logical dependency

Structural dependencies do not frequently lead to logical dependencies (!)

[Diagram labels: Dependency, Change]

Gustavo Ansaldi Oliva and Marco Aurelio Gerosa. 2011. On the Interplay between Structural and Logical Dependencies in Open-Source Software. In Proceedings of the 2011 25th Brazilian Symposium on Software Engineering (SBES '11). IEEE Computer Society, Washington, DC, USA, 144-153.
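Percentages of this kind can be computed by comparing two sets of dependency pairs. The sketch below is not the procedure of the cited study; it assumes dependencies are treated as undirected pairs and that both sets are non-empty. The function name and the toy data are hypothetical.

```python
def overlap_report(structural, logical):
    """Share of each dependency kind that has no counterpart in the other."""
    structural = {frozenset(pair) for pair in structural}
    logical = {frozenset(pair) for pair in logical}
    both = structural & logical
    # Assumes both sets are non-empty; otherwise the ratios are undefined.
    return {
        "logical_without_structural": 1 - len(both) / len(logical),
        "structural_without_logical": 1 - len(both) / len(structural),
    }

structural_deps = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
logical_deps = [("B", "A"), ("E", "F")]
report = overlap_report(structural_deps, logical_deps)
print(report)
```

In this toy data, half of the logical dependencies and three quarters of the structural ones have no counterpart of the other kind.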

Page 23: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

The origins of Logical Dependencies

Gustavo A. Oliva, Francisco W.S. Santana, Marco A. Gerosa, and Cleidson R.B. de Souza. 2011. Towards a classification of logical dependencies origins: a case study. In Proceedings of the 12th International Workshop on Principles of Software Evolution and the 7th annual ERCIM Workshop on Software Evolution (IWPSE-EVOL '11). ACM, New York, NY, USA, 31-40.

Page 24: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Preprocessing change-sets to improve logical dependencies identification

Commit ≠ change

The implementation of a single change may span consecutive and closely related commits

Application of the CVS sliding time window approach [Zimmermann et al., 2004] to group temporally close and semantically related change-sets, maintaining repository consistency

We were able to group ~10% of all commits in the ASF

Oliva, G. A., Santana, F., Gerosa, M. A., Souza, C. (2012), “Preprocessing Change-Sets to Improve Logical Dependencies Identification”, Sixth International Workshop on Software Quality and Maintainability (SQM 2012)

Page 25: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Next steps

• Develop a framework for the identification of logical dependencies

• Expand previous studies

• Survey about logical dependencies

Page 26: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Where are we?

• What are Logical Dependencies?

• Foundation of Logical Dependencies

• Applying Logical Dependencies

• Our Research on Logical Dependencies

• The Road Ahead

Page 27: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Terminology

Different terms have often been used interchangeably:

• Logical dependencies or coupling

• Change dependencies or coupling

• Evolutionary dependencies or coupling

• Historical dependencies or coupling

• Co-changes

Page 28: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

And what happens between commits?

[Robbes et al., 2008]

Page 29: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Challenges: Understanding Influencing Factors

Improve the identification of logical dependencies

• Commit habits

• The influence of the chosen technology (VCS)

• The nature of changes

• Period of analysis

• Filtering options

Page 30: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Summary

Logical dependencies:

• Have been applied for a variety of purposes

• Are useful to better understand software and organizational aspects

• Complement existing approaches, techniques, and tools

• Need more investigation on their calculation

Page 31: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

Marco Aurélio Gerosa

Gustavo Ansaldi Oliva

Thank you for your attention

Twitter: {@gerosa_marco,@golivax}

[email protected]@gmail.com

Software Engineering & Collaborative Systems Research Group

http://lapessc.ime.usp.br/


Page 32: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications

References

D. L. Parnas. 1979. Designing Software for Ease of Extension and Contraction. IEEE Trans. Softw. Eng. 5, 2 (March 1979), 128-138. DOI=10.1109/TSE.1979.234169

John Lakos. 1996. Large-Scale C++ Software Design. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA

[UML Formal Specification] Retrieved from http://www.omg.org/technology/documents/formal/uml.htm

Craig Larman. 2004. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and Iterative Development (3rd Edition). Prentice Hall PTR, Upper Saddle River, NJ, USA

Marco D'Ambros and Michele Lanza. 2006. Reverse Engineering with Logical Coupling. In Proceedings of the 13th Working Conference on Reverse Engineering (WCRE '06). IEEE Computer Society, Washington, DC, USA, 189-198. DOI=10.1109/WCRE.2006.51

Banker, R., et al. 1998. Software Development Practices, Software Complexity, and Software Maintenance Performance: A Field Study. Management Science 40(4): 433–450.

Romain Robbes. Of Change and Software. PhD Thesis. University of Lugano.

Some references may be missing; please contact us if you need them.

Page 33: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications


Thomas Ball, Yung-Min Kim, Adam Porter, Harvey Siy. If Your Version Control System Could Talk... Presented at the Workshop on Process Modelling and Empirical Studies of Software Engineering, ICSE 97, May 1997, Boston, MA

Cataldo, M., & Nambiar, S. (2010). The impact of geographic distribution and the nature of technical coupling on the quality of global software development projects. Quality. doi:10.1002/smr

Harald Gall, Karin Hajek, and Mehdi Jazayeri. 1998. Detection of Logical Coupling Based on Product Release History. In Proceedings of the International Conference on Software Maintenance (ICSM '98). IEEE Computer Society, Washington, DC, USA, 190-198.

Harald Gall, Mehdi Jazayeri, and Jacek Krajewski. 2003. CVS Release History Data for Detecting Logical Couplings. In Proceedings of the 6th International Workshop on Principles of Software Evolution (IWPSE '03). IEEE Computer Society, Washington, DC, USA, 13-.

Thomas Zimmermann, Peter Weissgerber, Stephan Diehl, and Andreas Zeller. 2004. Mining Version Histories to Guide Software Changes. In Proceedings of the 26th International Conference on Software Engineering (ICSE '04). IEEE Computer Society, Washington, DC, USA, 563-572.

Thomas Zimmermann, Peter Weissgerber, Stephan Diehl, and Andreas Zeller. 2005. Mining Version Histories to Guide Software Changes. IEEE Trans. Softw. Eng. 31, 6 (June 2005), 429-445. DOI=10.1109/TSE.2005.72 http://dx.doi.org/10.1109/TSE.2005.72

Page 34: Uncovering hidden relationships from past changes: evolutionary dependencies and its applications


W. P. Stevens, G. J. Myers, and L. L. Constantine. 1974. Structured design. IBM Syst. J. 13, 2 (June 1974), 115-139. DOI=10.1147/sj.132.0115 http://dx.doi.org/10.1147/sj.132.0115

Marco D'Ambros, Michele Lanza, and Romain Robbes. 2009. On the Relationship Between Change Coupling and Software Defects. In Proceedings of the 2009 16th Working Conference on Reverse Engineering (WCRE '09). IEEE Computer Society, Washington, DC, USA, 135-144. DOI=10.1109/WCRE.2009.19 http://dx.doi.org/10.1109/WCRE.2009.19

Thomas Zimmermann, Stephan Diehl, and Andreas Zeller. 2003. How History Justifies System Architecture (or Not). In Proceedings of the 6th International Workshop on Principles of Software Evolution (IWPSE '03). IEEE Computer Society, Washington, DC, USA, 73-.