Research Article The Impact of the PSP on Software …downloads.hindawi.com/archive/2014/861489.pdfResearch Article The Impact of the PSP on Software Quality: Eliminating the Learning

Research ArticleThe Impact of the PSP on Software Quality: Eliminatingthe Learning Effect Threat through a Controlled Experiment

Fernanda Grazioli, Diego Vallespir, Leticia Pérez, and Silvana Moreno

Facultad de Ingenierıa, Universidad de la Republica, Julio Herrera y Reissig 565, 11300 Montevideo, Uruguay

Correspondence should be addressed to Diego Vallespir; [email protected]

Received 30 May 2014; Revised 5 September 2014; Accepted 5 September 2014; Published 30 September 2014

Academic Editor: Robert J. Walker

Copyright © 2014 Fernanda Grazioli et al. This is an open access article distributed under the Creative Commons AttributionLicense, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properlycited.

Data from the Personal Software Process (PSP) courses indicate that the PSP improves the quality of the developed programs.However, since the programs (exercises of the course) are in the same application domain, the improvement could be due toprogramming repetition. In this research we try to eliminate this threat to validity in order to confirm that the quality improvementis due to the PSP. In a previous study we designed and performed a controlled experiment with software engineering undergraduatestudents at the Universidad de la Republica. The students performed the same exercises of the PSP course but without applying thePSP techniques. Here we present a replication of this experiment.The results indicate that the PSP and not programming repetitionis the most plausible cause of the important software quality improvements.

1. Introduction

The Personal Software Process (PSP) is a software develop-ment process for the individual [1]. The PSP helps the engi-neer to control, manage, and improve his or her work and it istaught through a course. The students (many times softwareengineers) perform several programming exercises in whichtechniques and phases of the PSP are added as the exercisesadvance. For each exercise, process data are collected.

Data from the courses indicate that the PSP improves thequality of the products developed [2–5]. One way used todetermine this is through statistical analysis of the evolutionof the results obtained by the students in each programof the course. For example, if the programs developed areof a better quality as the course progresses, then it can bestatistically inferred that the PSP is responsible for the qualityimprovement.

However, since the programs are in the same applicationdomain, the improvement could be due to programmingrepetition (i.e., the learning effect). To explore the reasons forthe improvements, we asked the following research question:Are the quality improvements observed in the PSP coursesdue to the introduction of the phases and techniques of

the PSP or due to programming repetition? To investigatethis we designed and performed a controlled experimentwithsoftware engineering undergraduate students at the Univer-sidad de la Republica. The students performed the exercisesfrom the course without applying the PSP techniques. Thismakes it possible to know if quality improves by the simplefact of programming repetition. In the context of this study,product quality is measured as defect density.

The designed experiment was executed in 2012 [6], andan exact replication was executed in 2013.The subjects of ourexperiment perform 8 program assignments without apply-ing the PSP techniques. Then, the collected data are statisti-cally analyzed and compared with the data collected duringthe regular PSP course. In this paper, we present the analysisof both executions of the experiment. The most importantresult indicates that the PSP and not programming repetitionis the most plausible cause of the important software qualityimprovements.

Section 2 briefly presents the PSP. The related works arepresented in Section 3. Section 4 presents the way in whichthe study was designed. Sections 5, 6, and 7 present the resultsof the research. Threats to validity are presented in Section 8and the conclusion is in Section 9.

Hindawi Publishing CorporationAdvances in Soware EngineeringVolume 2014, Article ID 861489, 10 pageshttp://dx.doi.org/10.1155/2014/861489

2 Advances in Software Engineering

Planning

Design

Design review

Code

Code review

Compile

Unit test

Postmortem

Code reviewsDesign reviews

PSP2 Design templates

PSP2.1

Current processTime recordingDefect recordingDefect type standard

PSP0 Coding standardSize measurement Process improvement proposal

PSP0.1

Size estimatingTest report

PSP1 Task planningSchedule planning

PSP1.1

Scripts

Project summary

Logs

Requirements

Finished product

Guide

Guide

Record timeand defects

Results

Plan

Figure 1: The PSP phases and the PSP process levels.

2. The Personal Software Process

The PSP is a software development process for the individual[1, 7].ThePSP establishes a highly instrumented developmentprocess that includes a rigorous measurement framework foreffort and quality.Theprocess includes phases and techniquesthat the engineer follows anduseswhile building the software.

For each phase, the PSP has scripts that help follow theprocess correctly. The phases are planning, detailed design,detailed design review, code, code review, compile, unit test,and postmortem. In each phase, the engineer collects dataon the time spent in the phase and the defects injected andremoved, as it is shown in Figure 1.

During the PSP course, the engineer builds programswhile he progressively learns the PSP phases and techniques.For the first exercise, the engineer starts with a simple anddefined process. As the course progresses, new process phasesand elements are added, from estimation and planning tocode reviews and to design and design review. As theseelements are added, the process changes.

There are six PSP processes, also called PSP levels: PSP0,PSP0.1, PSP1, PSP1.1, PSP2, and PSP2.1. Each process buildson the prior process by adding engineering or managementactivities, as shown in Figure 1. The PSP is in fact the PSP2.1level. The other PSP levels exist exclusively for the purposesof teaching the PSP.

The course has changed twice. The first version of thecourse, PSP for engineers I/II original, involved 10 programexercises. The second version, PSP for engineers I/II revised,involved 8 programs. The third version, PSP fundamentalsand advanced, involved 7 program assignments. All threecourse versions have been taught in different environments,

for different kinds of people all around the world: undergrad-uate students, graduate students, and professional softwareengineers [8, 9].The SEI partners generally use in their classesthe last course version, while in the academy the 10 programcourse version is commonly used. We know this by internalcommunications with the SEI and by reviewing the publishedarticles related to PSP analysis.

As an example, Table 1 shows which PSP level is appliedon each assignment in the 8-program course.

3. Related Works

The PSP has several published studies showing improvementin developer performance with process insertion. In partic-ular, some studies show that the PSP improves the quality ofthe developed products [2–5].

Normally, in these studies the data are grouped by PSPlevel in order to be able to evaluate the different techniquesand phases that are introduced and used on each PSP level.Rather than analyzing changes in group averages, thesekinds of studies focus on the average changes of individualengineers, that is, analyze the improvement in developerperformance.

Defect counts andmeasures of defect density (i.e., defectsper KLOC) have traditionally served as software qualitymeasures. The PSP uses this method of measuring productquality, as well as several process quality metrics.

Hayes and Over analyzed the data of 298 engineers whoattended the PSP for engineers I/II original course until 1997[2].Their investigation focuses on the changes across the firstthree major PSP levels (0, 1, and 2). They grouped individual

Advances in Software Engineering 3

Table 1: PSP levels applied on each program assignment in the PSP I/II revised course.

1 2 3 4 5 6 7 8PSP0 PSP0.1 PSP1 PSP1.1 PSP2 PSP2.1 PSP2.1 PSP2.1

data according to these levels and then examined the changein individual performance that occurred from level to level.

The analysis of overall defect density revealed a statisti-cally significant difference between PSP levels 0 and 1, as wellas between PSP levels 0 and 2. They found that the medianreduction in total defect density is factor of 1.5 between PSPlevels 0 and 2. On the other hand, they were not able to rejectthe null hypothesis between PSP levels 1 and 2.

Analysis of both defect density in the compile phaseand defect density in the test phase revealed a statisticallysignificant difference in defect density across the three PSPlevels comparisons.They found that the median reduction indefect density for the compile phase is a factor of 3.7, and, forthe test phase, the median reduction is a factor of 2.5, bothbetween PSP levels 0 and 2.

In 2007, Rombach et al. performed a replication andextension of the Hayes study [3]. They analyzed the dataof 3090 engineers who attended the PSP for engineers I/IIoriginal course between 1994 and 2005. The paper does notclarify whether they use all the 3090 engineers’ data for thedefect density analyses or not. We know that authors beganwith that size sample but we do not know if they discardedsome data for quality analysis.

They observed similar and significantly decreasing trendsconcerning defect density for the compile and test phases. Forthose variables, the null hypotheses were rejected for all thePSP level comparisons.

In the case of overall defect density, a general decreasingtrendwas observed in almost all PSP level comparisonswherethe null hypotheses were rejected. It was only between PSPlevels 1 and 2 that they were not able to reject the nullhypothesis as no significant difference was found.

In 2006, Paulk analyzed the data of 1345 engineers whoattended the PSP for engineers I/II original course between1994 and 2001 [4]. He only used defect density in test phaseas the dependent variable to measure software quality.

He analyzed the changes across the four major PSP levels(0, 1, 2, and 3). A decreasing trend could be observed betweenall PSP level comparisons. Null hypotheses were rejected forall cases, using both statistical tests named above. He found adecrease in defect density in testing of more than 75 percentbetween PSP level 0 and 3.

In a later work, Paulk considered a data set of 2435programs developed by engineers who performed the PSP forengineers I/II original course between 1994 and 2001 [5]. Thearticle does not clarify how many engineers were involved inthose developments. The data set is a subset of his previousstudy and in this instance he only considered programswritten in C language. He analyzed the changes across thefour major PSP levels like in his previous study, using thesame two statistical methods. He found a statistically signif-icant decreasing trend between all PSP level comparisons.

He found that quality improved by 79 percent and thatvariability decreased by 81 percent between PSP level 0 and 3.

A retrospective analysis of these related works left somethreats to external validity. One threat is the confoundingof the effect of introducing process phases and techniquesinsertions with the gaining of domain experience as relatedprograms are developed. Therefore, the question is if theimprovements are due to the phases and techniques or due toprogramming repetition during the course (learning effect).

4. Methodology and Study Setup

The main goal of our research is to know if the softwareengineers improve the quality of the developed products dueto the PSP itself or due to programming repetition in thesame application domain. In order to answer this researchquestion, our study is composed of three main steps:

(i) analyzing the data of the PSP for engineers I/II revisedcourse,

(ii) designing, executing, and analyzing the data of acontrolled experiment that allows us to know ifquality improves by the simple fact of programmingrepetition,

(iii) comparing both results in order to know if there aredifferences between them.

In order to analyze the PSP for engineers I/II revisedcourse, the methodology consists of analyzing the availabledata of students that performed the PSP complete courseand looking for changes in the individual performance. Themethod and the hypotheses are presented in detail in thefollowing paragraphs. The results, analysis, and discussion ofthe PSP regular course are presented in Section 5. Regardingthe experiment that allows us to look at the effects of pro-gramming repetition, the methodology and experimentaldesign are presented in Section 6.1, while the experimentresults, analysis, and discussion are presented in Section 6.2.

We use threemeasures to evaluate the quality of the prod-ucts: defect density in compile, defect density in unit test, andtotal defect density. As we show in the related works section,these are generally used for software quality assessment. Thedefect density is measured as the number of defects per everythousand lines of code (KLOC). The consequence of highdefect density in software engineering is typically seen in theform of defect-fixing or rework effort incurred in projects,which results in poor quality products [2].

Because the PSP was developed to improve individualperformance through the gradual introduction of practices,we decided to follow a similar approach to analyze the regularPSP course, examining the change in individual performanceas these practices are introduced by the different PSP process


levels. We group the available data of the program assign-ments according to the major PSP level that was followed onthat assignment. That is, we generate three groups: one withthe programs developed using PSP0 and PSP0.1, another onewith PSP1 and PSP1.1, and the last one with PSP2 and PSP2.1.Table 2 shows the program assignment numbers for eachgroup according to the PSP for engineers I/II revised course.

Our PSP course analysis raises the null hypotheses andtheir respective alternative hypotheses for each of the threementioned quality metrics. The hypotheses aim at knowingif, when comparing a PSP level to another one appliedpreviously, the software engineer improves the quality of thedeveloped products. So, we compare groups by pairs to findif the changes in each dependent variable are statisticallysignificant:

H0 def ut: median (defect density in UT 𝑖) = median(defect density in UT 𝑗),H1 def ut: median (defect density in UT 𝑖) <>median(defect density in UT 𝑗),

where 𝑖, 𝑗 are the numbers of the major PSP levels (0, 1, or 2)and 𝑖 < 𝑗.

The same type of null and alternative hypotheses is raisedfor the other two dependent variables.

In our experiment, the subjects perform the same eightexercises from the regular course without applying the PSPtechniques. In order to get comparable results, to analyzethe data of the experiment, we decide to group the datausing the program assignment grouping done in the PSPI/II revised course, regardless of the PSP level. That is, wegenerate three groups: one with programs 1 and 2, anotherone with assignments 3 and 4, and the last one with exercises5 to 8. Given these groups, for the experiment we raise thesame hypotheses stated above for the three product qualitymeasures under study.

In both cases, in the PSP regular course and in theexperiment, we have a context of repeated measures samples.So, the first statistical analysis method we considered tocompare defect density was the parametric ANOVA forrepeatedmeasures test, which allows investigating changes inmean scores over time points. Repeated measures ANOVAcarries a set of assumptions such as multivariate normality,homogeneity of covariance matrices, and independence. Theresults of the Shapiro-Wilk normality test for our experimentsample indicate that it is not normally distributed. So, it doesnot fulfill the normality assumption for ANOVA. The samehappened when testing the normality of the PSP course data.Based on the normality test results and the small size samples,we discarded ANOVA for repeated measures as the analysismethod.Then we analyzed the data of the PSP course and thedata of the experiment using the nonparametric Wilcoxonsigned-rank test [10]. This test is used to compare two setsof scores that come from the same subjects and it does notrequire any assumptions about the shape of the distribution.The Wilcoxon signed-rank test uses the median in the nulland in the alternative hypotheses instead of the mean.

After analyzing and discussing each study separately,the comparison between the course and the experiment

Table 2: Program assignments grouped by PSP major level.

Group Program assignmentsPSP0 1 and 2PSP1 3 and 4PSP2 5, 6, 7, and 8

is performed. This comparison is presented in Section 7,using a descriptive method and includes a complete resultsdiscussion.

5. PSP I/II Revised Course QualityResults and Discussion

We used data from the eight-program course version, PSPfor engineers I/II revised, taught between June 2006 and June2010. These courses were taught by the Software EngineeringInstitute (SEI) at Carnegie Mellon University or by SEI part-ners, including a number of different instructors in multiplecountries.

We made several cuts and ran data cleaning algorithmsto include only the subjects who had completed all program-ming exercises, in order to clean and remove errors andquestionable data. We determined other cuts on the data set,by performing an analysis and assessment of the data qualitybased on the data quality theory [11]. After this cleaningprocess, the data set was composed of 40 engineers.

Table 3 presents median, interquartile range, andmean ofthe three variables under study for the threemajor PSP levels.

Tables 4, 5, and 6 present the results of applying theWilcoxon test to each pair of PSP levels for the hypothesis ofdefect density in compile (DDComp), defect density in unittest (DDUT), and total defect density (TDD), respectively.The tables present the comparison between pairs of PSP lev-els. Each cell contains the (2-tailed) 𝑃 value of the Wilcoxontest and the effect size of the changes when applicable. Thecells in ∗ or ∗ ∗ ∗ indicate that the null hypothesis hasbeen rejected (𝑃 < 0.05). The ∗ ones also indicate thatthere has been an improvement as the subjects advance inthe PSP levels; the ∗ ∗ ∗ ones indicate deterioration. The ∗∗cells indicate that it has not been possible to reject the nullhypothesis.The effect sizes of the changes are presented basedon Cohen’s 𝑑measure.

The analysis of defects per KLOC in compile phasereveals a statistically significant improvement in defect den-sity between all the PSP level comparisons (𝑃 < 0.05), all withmedium and large effect sizes following Cohen classification[12].

The analysis of defects perKLOC in unit test phase revealsa statistically significant improvement in defect densitybetween all the PSP level comparisons (𝑃 < 0.05). The effectsize of the improvement is large between PSP0 and PSP2.

The analysis of overall defect density reveals that thedifferences between PSP levels 0 and 1 and between PSP levels0 and 2 are statistically significant (𝑃 < 0.05) but that thedifference between PSP levels 1 and 2 is not. The effect size


Table 3: Descriptive statistics for the three variables under study.

PSP0 PSP1 PSP2Defect density in compile

(number of defects found in compile/KLOC)Median 26.67 10.03 0.00IQR 23.37 17.90 4.76Mean 29.67 14.63 3.50

Defect density in unit testing(number of defects found in UT/KLOC)

Median 18.76 9.93 5.41IQR 16.29 15.22 14.03Mean 20.60 12.94 8.04

Total defect density per KLOC(number of defects found/KLOC)

Median 48.32 24.73 31.77IQR 28.67 27.69 37.42Mean 58.05 29.90 35.64

Table 4: DDComp analysis.

Level PSP1 PSP2PSP0 𝑃 = 0.000, 𝑑 = 0.7

∗

𝑃 = 0.000, 𝑑 = 1.4∗

PSP1 𝑃 = 0.000, 𝑑 = 1.0∗

Table 5: DDUT analysis.


∗

𝑃 = 0.000, 𝑑 = 1.0∗

PSP1 𝑃 = 0.021, 𝑑 = 0.4∗

Table 6: TDD analysis.


∗

𝑃 = 0.000, 𝑑 = 0.7∗

PSP1 𝑃 = 0.072∗∗

of the improvement is large between PSP0 and PSP1 andmedium between PSP0 and PSP2.

All these observations support the PSP course benefitsregarding product quality. It is important to note that, becauseof the followed approach, the improvements that wereobserved represent real change in individual performance,not a change in the average performance of the group. Theseanalyses throw very similar software quality results to thoseobtained by Hayes, Rombach, and Paulk.

However, as it was stated earlier that, in this kind of study,we cannot affirm that improvements are achieved exclusivelyby the PSP in itself. There is a possibility that the improve-ments are achieved due to the programming learning effectproduced by the programming repetition during the course.

6. The PSP0.1 Experiment

We designed and performed a controlled experiment at theUniversidad de la Republica. In this experiment, the subjects

performed the exercises from the PSP for engineers I/IIrevised course without applying the PSP techniques.

6.1. Design, Experimental Material, and Subjects. The experi-mental material is made up of the process scripts of PSP0 andPSP0.1, the requirements of the programs 1 to 8 used in thePSP course, and the tool for data collection. All this materialis the same as the one that is used in the PSP for engineers I/IIrevised courses for the PSP0 and PSP0.1 levels. The tool fordata collection is the one distributed by the SEI (PSP supporttool developed in Microsoft Access).

The design of this experiment is a repeated measuresdesign. Students develop 8 software programs following anestablished process. The 8 programs are the same for all thesubjects and are developed in the same order. The studentsuse the PSP0 for the first program and the PSP0.1 for theremaining seven programs. These two levels of the PSP onlyaim at collecting data of the process (time, defects, etc.) butthey do not introduce the practices of the PSP (reviews,design, PROBE, etc.). This design of the experiment makesit possible to know if the students improve the performancedue to programming repetition.

For our experiment we decided to use 8 program assign-ments because we consider that 10 assignments (as it hasthe first PSP course version) could be too much for astudent that is always applying the same baseline process,and as a consequence students could lose the interest orthe motivation at the last part of the experiment. We alsopreferred to use this instead of 7 assignments (as it has thelast PSP course version) because it better fits the context of auniversity subject of one semester. Furthermore, 8 programsare enough to analyze whether there are improvements dueto programming repetition.

Regarding the environment, the students must performthe assignments individually in their houses, but they are per-manentlymonitored by a tutor. It is a controlled environmentbecause there is a constant feedback with the tutor, unlimitedemail exchanges to evacuate doubts (sometimes also phonecalls or meetings), corrections, and redelivery requests whennecessary at the end of each assignment. Students are not ina time-limited classroom, so in this way the time records andthe amount of defects found are not biased by the availabletime of class. That is, the student performs the assignmentat their own peace and records their real time and defectswithout being pressured by the clock or by other studentswhofinish earlier. That allows us to have reliable measures.

A total of 22 subjects performed the PSP0.1 experiment:12 during the first execution that was performed in 2012and the other 10 during the second execution in 2013. Theseexecutions were conducted in the same way and only thesubjects changed.

The subjects are software engineering undergraduatestudents of the Universidad de la Republica of Uruguay; allof them advanced students since they are in their fourthor fifth year. They have completed the course ProgrammingWorkshop in which they learn Java language and theyhave at least completed three more programming coursesand a course on object oriented languages. We consider


therefore that the group that participates in the experimentis homogeneous due to their similar advance in their career.

Some qualitative analyses have shown that undergraduatestudents in PSP courses were concerned more with pro-gramming than with software process issues and that theywere not ready to appreciate the benefits of addressing thoseissues [13–15]. In these works, the students were applyingPSP in introductory programming courses during their firstyear of university studies. Our experiment is different fromthat in several aspects. We only use a small part of thePSP framework which does not require new knowledge bystudents. They must apply their own process just addingdiscipline aspects to record their data, and they do not have tolearn complex techniques applied in PSP such as the PROBEmethod for time and size estimation, performing neitherreviews nor detailed design techniques. Another differentaspect is that our students already know how to program andthey are advanced students who had previously completedseveral programming courses. Our students are focused onmaking the best software development they can and focusedon recording the data correctly. During the execution of ourexperiment we did not identify similar problems to thosedescribed by these authors.

The students participate in the experiment in order toobtain credits for their career and that is their motivation. Itis mandatory for them to attend the theory classes (lectures)where the software development process used (PSP0 andPSP0.1) is presented. It is also mandatory for them to followthe scripts provided and to collect the data using the toolfor that purpose. The students do not know they are takingpart in an experiment and they think they are taking a coursewith an important component of laboratory practices. Theydo know, however, that the data they collect will be used inresearch work and they indeed give their written consent forsuch purpose.

Finally, participation in the course by the students isvoluntary. This course is not mandatory for their SoftwareEngineering Degree; therefore enrolling in it is optional.

6.2. Results and Discussion. In this section, we present thequality analysis results of the 22 subjects that performed theexperiment, which will allow us to know if quality improvesby the simple fact of programming repetition.

For this analysis the labels “progs1-2,” “progs3-4,” and“progs5–8” represent the program assignment grouping doneas in the PSP I/II revised course. That is, progs1-2 is thegrouped data of program assignments 1 and 2, progs3-4 is thegrouped data of program assignments 3 and 4, and progs5–8is the grouped data of program assignments 5 to 8.

Table 7 presents median, interquartile range, andmean ofthe three variables under study for the three groups.

Tables 8, 9, and 10 present the result of applying theWilcoxon test to each pair of groups and the effect size for thethree variables under study. The colors are used in the sameway as in Table 4.

The analyses of defects per KLOC in compile phaseshow a statistically significant improvement when compar-ing the first two programs with the following programs

Table 7: Descriptive statistics for the three variables under study.

Progs1-2 Progs3-4 Progs5–8Defect density in compile

(number of defects found in Compile/KLOC)Median 56.58 39.72 37.83IQR 75.48 45.32 39.15Mean 78.21 59.08 44.12

Defect density in unit testing(number of defects found in UT/KLOC)

Median 39.10 16.45 19.82IQR 17.58 20.01 18.30Mean 54.57 20.13 27.38

Total defect density per KLOC(number of defects found/KLOC)

Median 114.40 64.33 66.50IQR 82.13 51.87 48.87Mean 134.91 79.96 72.07

Table 8: DDComp analysis.

Group Progs3-4 Progs5–8Progs1-2 𝑃 = 0.04, 𝑑 = 0.3

∗

𝑃 = 0.001, 𝑑 = 0.6∗

Progs3-4 𝑃 = 0.296∗∗

Table 9: DDUT analysis.


∗

𝑃 = 0.001, 𝑑 = 0.7∗

Progs3-4 𝑃 = 0.012, 𝑑 = 0.4∗∗∗

Table 10: TDD analysis.


∗

𝑃 = 0.000, 𝑑 = 0.7∗

Progs3-4 𝑃 = 0.961∗∗

(i.e. progs1-2 versus progs2-3 and progs5–8). However, thereis no improvement after the first programs as it was not pos-sible to reject the null hypothesis when comparing progs3-4and progs5–8.

This means that programming repetition and data col-lection of time and defects following the process PSP0.1reduce the defects found in compile phase with statisticalsignificance. We understand that the most plausible reasonfor the improvement in the first programs is that the subjectsrecord their own injected defect types found in the programs.By recording each injected defect and data related to them,the engineer becomes aware of themost common defects andreduces the injection or easily finds them before compilation.Nevertheless, further experiments are needed to confirm ifthis is the reason for the improvement.

Following Cohen classification, the effect size of theimprovement is small between progs1-2 and progs3-4 andmedium between progs1-2 and progs5–8.


The analyses of defects per KLOC in unit test phase reveala statistically significant improvement between progs1-2 andprogs3-4 and between progs1-2 and progs5–8.This could alsobe due to the defect recording of every injected defect.

Between progs3-4 and progs5–8 there is a significantdifference but it refers to a deterioration, which means thatprograms 5–8 show a higher number of detected defects inunit test than programs 3-4. This shows that programmingrepetition (using these programs) does not result in a con-tinuous improvement of defect density in unit testing. Onepossible explanation for this behavior is that defect types thatare not recognizable by the compiler (for example, function,data, or checking defects) are not as easy to learn to avoidtheir injection as the defect types that are detectable at thecompile phase (for example, syntax defects). This combinedwith the increasing complexity of the program assignmentstasks could be the reason for the deterioration.There could bemany other possible reasons for this deterioration, but furtherstudies are necessary.

According to Cohen classification, the effect sizes of theimprovements are large, and the effect size of the deteriora-tion is medium.

The analyses of total defects per KLOC show the samebehavior as the analyses of DDComp. A statistically signif-icant improvement was found when comparing the progs1-2 and progs2-3 and progs5–8, and it was not possible toreject the null hypothesis when comparing progs3-4 againstprogs5–8. This shows that the overall defect density of thedeveloped programs is reduced with statistical significanceby the programming repetition and data collection of timeand defects following the PSP0.1 process. The effect size ofthe improvement is large between progs1-2 and progs3-4 andmedium between progs1-2 and progs5–8.

All these observations reveal that there is an improvementregarding product quality with the use of PSP0.1, but thisimprovement is not continuous as it is in the PSP for engi-neers I/II revised course.

7. Comparative Discussion

In this section we compare the regular PSP course with ourexperiment. We want to know whether the quality improve-ment is because of the PSP practices or because of othercharacteristics.

Figure 2 summarizes the hypotheses tests results andeffect sizes that were presented earlier in the PSP I/II revisedcourse and the PSP0.1 experiment. Only the comparisonsbetween progs1-2 against progs3-4 and progs3-4 againstprogs5–8 are included in the tables. Remember that, in thecase of the PSP course, those comparisons refer to PSP0against PSP1 and PSP1 against PSP2 levels, respectively.

A bar-whisker chart of defect density per group and a bar-whisker chart of individual improvement are also included inthe figure comparing the PSP course and the experiment foreach analyzed variable.These charts are descriptive and allowus to get a clearer idea of the software quality behavior at theindividual level.

The individual improvement is calculated as the percent-age difference for each subject between two groups (i.e., %improvement = − 100 ∗ (defect density in group Y − defectDensity in group X)/defect density in group X). An improve-ment is represented by a positive value, and deteriorationis represented by a negative one. Samples with zero defectsas the divisor are not considered in the analysis, as divisionby zero is not defined. We use a proportional representationof individual improvement instead of an absolute differencebecause we consider it more appropriate in our context (i.e.,it does not seem appropriate to consider a reduction in defectdensity from 455 to 450 defects per KLOC as equal as areduction from 10 to 5 defects per KLOC.)

The left side charts indicate that, from the beginning, thePSP course is better than PSP0.1 experiment.The PSP coursesare generally performed by professional software engineerswith several years of experience in the industry. We believethat this factor is making the initial number of defects lowerwhen comparing with undergraduate students. This factor isknown to affect productivity performance [16]. This is nota problem for our study because we are interested in thechanges in the individual performance. That is, we analyzeif there are improvements when introducing techniques or ifthere are improvements with programming repetition.

For each analyzed variable, we can see an initial improve-ment (from progs1-2 to progs3-4) in the PSP course as wellas in the PSP0.1 experiment. In the right side charts, thisimprovement effect is clearly visible. In both cases, the reasonfor the improvement could be the defect recording activitydone since PSP0, as in PSP1 only size and time estimationtechniques are introduced and these should not have impacton product quality. For DDComp and TDD the effect size inthe PSP course is greater, while for DDUT the effect size isgreater in the experiment.

There is no statistical evidence of improvement whencomparing progs3-4 to progs5–8 in the PSP0.1 experimentin any of the analyzed variables (even statistical evidence ofdeterioration exists in DDUT). However, when comparingPSP1 to PSP2 in the PSP course we found a statisticalimprovement in DDComp and DDUT with an effect sizeof 1.0 and 0.4, respectively. We can see those improvementsin the PSP course graphically represented in the right sidecharts. Design reviews, code reviews, and detailed designtechniques are introduced in PSP2 using four specific designtemplates and individual tailored checklists. This change inthe process is the most plausible reason for the difference inthe quality improvement between the PSP0.1 experiment andthe PSP course. Therefore, the practices introduced by thePSP (and, so, the PSP itself) lead to such a big improvementin product quality, while programming repetition or defectrecording does not.

From the perspective of the practitioner at least twointeresting conclusions emerge. One is that the use of the PSPsupports the development of quality software. We presentedthe fact that quality improvements are due to the use ofthis process and, also, that by using the PSP2 a low defectdensity in unit testing is reached, even when the engineer isincorporating and learning the process (see Table 5).


Defect density in compile

Course

PSP0.1 expe.

PSP I/II rev.

Defect density in unit testing

Course

PSP0.1 expe.

PSP I/II rev.

Total defect density

Course

PSP0.1 expe.

PSP I/II rev.

Def

ect d

ensit

y in

com

pile

Group Groups

Def

ect d

ensit

y in

uni

t tes

tTo

tal d

efec

t den

sity

150.00

CoursePSP0.1 expe.PSP I/II revised

100.00

50.00

Progs1-2 Progs1-2 versusprogs3-4

Progs3-40.00

050

10080.00

60.00

40.00

20.00

0.00

200.00

250.00

150.00

100.00

50.00

0.00

Progs5–8

GroupProgs1-2 Progs3-4 Progs5–8

Progs3-4 versusprogs5–8

Progs1-2 versusprogs3-4







GroupProgs1-2 Progs3-4 Progs5–8



Groups




Groups




−300

−250

−200

−150

−100

−50

050

100

−300

−250

−200

−150

−100

−50

0255075

100

−100

−25

−50

−75

Impr

ovem

ent D

DU

T (%

)Im

prov

emen

t TD

D (%

)Im

prov

emen

t DD

Com

p (%

)

d = d = d =

d =d =

d =

d =

d = d =

0.3∗

0.9∗

0.5∗

0.4∗

0.4∗∗∗

0.6∗

0.9∗

0.7∗

1.0∗

∗∗—

∗∗—∗∗—

Figure 2: PSP0.1 experiment versus PSP I/II revised course.


The second conclusion from the practitioner’s point ofview is that the detailed design and the individual design andcode reviews are excellent techniques, probably unavoidable,to build quality software. These techniques are the onesadded in the PSP2 and they are the ones responsible forthe quality improvements detected in our work. Nowadays,with the fashion of agility, developers sometimes do not builddetailed designs, let alone reviewing their design or codeusing checklists.

Still, unfortunately, themost common approach to softwaredevelopment today is code-and-fix programing [17], some-times modernized by the automating of the unit tests. It istime to change.

8. Threats to Validity

Due to space restrictions, we only mention the threatsto validity that seem to be most important to us. Theseinclude the following: (a) having a small number of subjectsperforming the experiment, which results in the applicationof nonparametric tests that are less powerful than parametricones; (b) all the subjects of the experiment are students withlittle or no programming experience in the industry; (c) theprogram assignments were developed at home, which is athreat but is reduced because of the tutor’s monitoring duringand at the end of each assignment, as it was explained inSection 6.1; and (d) the design method and the design andcode review are embedded in the PSP process, so the secondpractitioner’s conclusion is dependent on the PSP.

9. Conclusions and Future Work

The presented results contribute to the elimination of animportant threat to the validity of different experimentsperformed with the PSP. This agrees with previous researchworks we performed which indicate that the practices intro-duced by the PSP and not programming repetition areresponsible for the improvement of individual performance[6, 18].

In addition, the comparison between our experiment andthe regular PSP course reveals that continuous and transcen-dent product quality improvements cannot be reached simplyby the programming learning effect. The use of adequatepractices is the cause of the important software qualityimprovements.

As future work we intend to isolate the PSP techniques(detailed design, design and code review) using a newcontrolled experiment that will enable us to study the effect ofeach technique in software quality and the synergy producedbetween them.

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper.

References

[1] W. Humphrey, A Discipline for Software Engineering, Addison-Wesley, 1995.

[2] W. Hayes and J. Over, “The personal software process: anempirical study of the impact of PSP on individual engineers,”Tech. Rep. CMU/SEI-97-TR-001, Software Engineering Insti-tute, Carnegie Mellon University, Pittsburgh, Pa, USA, 1997.

[3] D. Rombach, J. Munch, A. Ocampo, W. S. Humphrey, andD. Burton, “Teaching disciplined software development,” TheJournal of Systems and Software, vol. 81, no. 5, pp. 747–763, 2008.

[4] M. C. Paulk, “Factors affecting personal software quality,”CrossTalk, vol. 19, no. 3, pp. 9–13, 2006.

[5] M. C. Paulk, “The impact of process discipline on personalsoftware quality and productivity,” ASQ Software Quality Pro-fessional, vol. 12, no. 2, pp. 15–19, 2010.

[6] D. Vallespir, F. Grazioli, L. Perez, and S. Moreno, “Demon-strating the impact of the PSP on software quality and effort:eliminating the programming learning,” in TSP Symposium,Software Engineering Institute, Carnegie Mellon University,Dallas, Tex, USA, 2013.

[7] W. Humphrey, PSP: A Self-Improvement Process for SoftwareEngineers, Addison-Wesley, 2005.

[8] “Transition guide for the PSP for engineers course,” InternalDocument, Software Engineering Institute, Carnegie MellonUniversity, Pittsburgh, Pa, USA, 2005.

[9] Transition Guide PSP for Engineers to PSP Fundamentals andPSP Advanced, Internal Document, Software Engineering Insti-tute, Carnegie Mellon University, 2008.

[10] F. Wilcoxon, “Individual comparisons by ranking methods,”Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[11] C. Valverde, F. Grazioli, and D. Vallespir, “Un estudio de lacalidad de los datos recolectados durante el uso del personalsoftware process,” in Proceedings of the 9th Jornadas Iberoamer-icanas de Ingenierıa de Software e Ingenierıa del Conocimiento(JIISIC ’12), pp. 37–44, November 2012.

[12] J. Cohen, Statistical Power Analysis for the Behavioral Sciences,Lawrence EarlbaumAssociates, Hillsdale, NJ, USA, 2nd edition,1988.

[13] P. Runeson, “Experiences from teaching PSP for freshmen,”in Proceedings of the 14th Conference on Software EngineeringEducation and Training, pp. 98–107, February 2001.

[14] S. K. Lisack, “The personal software process in the classroom:student reactions (an experience report),” in Proceedings ofthe 13th Conference on Software Engineering Education andTraining, pp. 169–175, 2000.

[15] R. Grove, “Using the personal software process to motivategood programming practices,” in Proceedings of the 6th AnnualConference on the Teaching of Computing and the 3rd AnnualConference on Integrating Technology into Computer ScienceEducation (ITICSE ’98), pp. 98–101, Dublin, Ireland, September1998.

[16] R. Mushtaq, F. Joao, and N. William, “Factors affecting produc-tivity performance in PSP training,” in Proceedings of the TeamSoftware Process Symposium (TSP ’13), Software EngineeringInstitute, Carnegie Mellon, Pittsburgh, Pa, USA, 2013.

[17] S. McConell and L. L. Tripp, “Software engineering as aprofession,” in Software Engineering Essentials, Volume II: The


Supporting Process, R. H.Thayer andM. Dorfman, Eds., chapter11, pp. 159–164, 2013.

[18] F. Grazioli and W. Nichols, “A cross course analysis of productquality improvement with PSP,” Tech. Rep. CMU/SEI-2012-SR-015: 76–89, Software Engineering Institute, Carnegie MellonUniversity, Pittsburgh, Pa, USA, 2012, TSP Symposium.

Submit your manuscripts athttp://www.hindawi.com

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Distributed Sensor Networks


Advances in

FuzzySystems

Hindawi Publishing Corporationhttp://www.hindawi.com

Volume 2014


ReconfigurableComputing

Hindawi Publishing Corporation http://www.hindawi.com Volume 2014


Applied Computational Intelligence and Soft Computing

Advances in

Artificial Intelligence


Advances inSoftware EngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications


Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Advances in

Multimedia


Biomedical Imaging


ArtificialNeural Systems

Advances in


RoboticsJournal of



Computational Intelligence and Neuroscience

Industrial EngineeringJournal of


Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014


Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in


Documents

Research Article The Impact of the PSP on Software …downloads.hindawi.com/archive/2014/861489.pdfResearch Article The Impact of the PSP on Software Quality: Eliminating the Learning