title: Statistical Software Engineering
author:
publisher: National Academies Press
isbn10 | asin: 0309053447
print isbn13: 9780309053440
ebook isbn13: 9780585002101
language: English
subject: Software engineering--Statistical methods.
publication date: 1996
lcc: QA76.758.N38 1996eb
ddc: 005.1
The National Research Council established the Board on Mathematical Sciences in 1984. The objectives of the Board are to maintain awareness and active concern for the health of the mathematical sciences and to serve as the focal point in the National Research Council for issues connected with the mathematical sciences. The Board holds symposia and workshops and prepares reports on emerging issues and areas of research and education, conducts studies for federal agencies, and maintains liaison with the mathematical sciences communities, academia, professional societies, and industry.
The Board gratefully acknowledges ongoing core support from the Air Force Office of Scientific Research, Army Research Office, Department of Energy, National Science Foundation, National Security Agency, and Office of Naval Research.
Statistical Software Engineering
Panel on Statistical Methods in Software Engineering
Committee on Applied and Theoretical Statistics
Board on Mathematical Sciences
Commission on Physical Sciences, Mathematics, and Applications
National Research Council
National Academy Press
Washington, D.C. 1996
NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine.
The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Bruce Alberts is president of the National Academy of Sciences.
The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievement of engineers. Dr. Harold Liebowitz is president of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Kenneth I. Shine is president of the Institute of Medicine.
The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Bruce Alberts and Dr. Harold Liebowitz are chairman and vice-chairman, respectively, of the National Research Council.
This project was supported by the Advanced Research Projects Agency, Army Research Office, National Science Foundation, and Department of the Navy's Office of the Chief of Naval Research. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors. Furthermore, the content of the report does not necessarily reflect the position or the policy of the U.S. government, and no official endorsement should be inferred.
Copyright 1996 by the National Academy of Sciences. All rights reserved.
Library of Congress Catalog Card Number 95-71101
International Standard Book Number 0-309-05344-7
Additional copies of this report are available from:
National Academy Press, Box 285
2101 Constitution Avenue, N.W.
Washington, D.C. 20055
800-624-6242
202-334-3313 (in the Washington metropolitan area)
B-676
Printed in the United States of America
PANEL ON STATISTICAL METHODS IN SOFTWARE ENGINEERING
DARYL PREGIBON, AT&T Bell Laboratories, Chair
HERMAN CHERNOFF, Harvard University
BILL CURTIS, Carnegie Mellon University
SIDDHARTHA R. DALAL, Bellcore
GLORIA J. DAVIS, NASA-Ames Research Center
RICHARD A. DEMILLO, Bellcore
STEPHEN G. EICK, AT&T Bell Laboratories
BEV LITTLEWOOD, City University, London, England
CHITOOR V. RAMAMOORTHY, University of California, Berkeley
Staff
JOHN R. TUCKER, Director
COMMITTEE ON APPLIED AND THEORETICAL STATISTICS
JON R. KETTENRING, Bellcore, Chair
RICHARD A. BERK, University of California, Los Angeles
LAWRENCE D. BROWN, University of Pennsylvania
NICHOLAS P. JEWELL, University of California, Berkeley
JAMES D. KUELBS, University of Wisconsin
JOHN LEHOCZKY, Carnegie Mellon University
DARYL PREGIBON, AT&T Bell Laboratories
FRITZ SCHEUREN, George Washington University
J. LAURIE SNELL, Dartmouth College
ELIZABETH THOMPSON, University of Washington
Staff
JACK ALEXANDER, Program Officer
BOARD ON MATHEMATICAL SCIENCES
AVNER FRIEDMAN, University of Minnesota, Chair
LOUIS AUSLANDER, City University of New York
HYMAN BASS, Columbia University
MARY ELLEN BOCK, Purdue University
PETER E. CASTRO, Eastman Kodak Company
FAN R.K. CHUNG, University of Pennsylvania
R. DUNCAN LUCE, University of California, Irvine
SUSAN MONTGOMERY, University of Southern California
GEORGE NEMHAUSER, Georgia Institute of Technology
ANIL NERODE, Cornell University
INGRAM OLKIN, Stanford University
RONALD F. PEIERLS, Brookhaven National Laboratory
DONALD ST. P. RICHARDS, University of Virginia
MARY F. WHEELER, Rice University
WILLIAM P. ZIEMER, Indiana University
Ex Officio Member
JON R. KETTENRING, Bellcore, Chair, Committee on Applied and Theoretical Statistics
Staff
JOHN R. TUCKER, Director
JACK ALEXANDER, Program Officer
RUTH E. O'BRIEN, Staff Associate
BARBARA W. WRIGHT, Administrative Assistant
COMMISSION ON PHYSICAL SCIENCES, MATHEMATICS, AND APPLICATIONS
ROBERT J. HERMANN, United Technologies Corporation, Chair
STEPHEN L. ADLER, Institute for Advanced Study
PETER M. BANKS, Environmental Research Institute of Michigan
SYLVIA T. CEYER, Massachusetts Institute of Technology
L. LOUIS HEGEDUS, W.R. Grace and Company
JOHN E. HOPCROFT, Cornell University
RHONDA J. HUGHES, Bryn Mawr College
SHIRLEY A. JACKSON, U.S. Nuclear Regulatory Commission
KENNETH I. KELLERMANN, National Radio Astronomy Observatory
KEN KENNEDY, Rice University
THOMAS A. PRINCE, California Institute of Technology
JEROME SACKS, National Institute of Statistical Sciences
L.E. SCRIVEN, University of Minnesota
LEON T. SILVER, California Institute of Technology
CHARLES P. SLICHTER, University of Illinois at Urbana-Champaign
ALVIN W. TRIVELPIECE, Oak Ridge National Laboratory
SHMUEL WINOGRAD, IBM T.J. Watson Research Center
CHARLES A. ZRAKET, Mitre Corporation (retired)
NORMAN METZGER, Executive Director
Preface
The development and the production of high-quality, reliable, complex computer software have become critical issues in the enormous worldwide computer technology market. The capability to efficiently engineer computer software development and production processes is central to the future economic strength, competitiveness, and national security of the United States. However, problems related to software quality, reliability, and safety persist, a prominent example being the failure on several occasions of major local and national telecommunications networks. It is now acknowledged that the costs of producing and maintaining software greatly exceed the costs of developing, producing, and maintaining hardware. Thus the development and application of cost-saving tools, along with techniques for ensuring quality and reliability in software engineering, are primary goals in today's software industry. The enormity of this software production and maintenance activity is such that any tools contributing to serious cost savings will yield a tremendous payoff in absolute terms.
At a meeting of the Committee on Applied and Theoretical Statistics (CATS) of the National Research Council (NRC), participants identified software engineering as an area presenting numerous opportunities for fruitful contributions from statistics and offering excellent potential for beneficial interactions between statisticians and software engineers that might promote improved software engineering practice and cost savings. To delineate these opportunities and focus attention on contexts promising useful interactions, CATS convened a study panel to gather information and produce a report that would (1) exhibit improved methods for assessing software productivity, quality, reliability, associated risk, and safety and for managing software development processes, (2) outline a program of research in the statistical sciences and their applications to software engineering with the aim of motivating and attracting new researchers from the mathematical sciences, statistics, and software engineering fields to tackle these important and pressing problem areas, and (3) emphasize the relevance of using rigorous statistical and probabilistic techniques in software engineering contexts and suggest opportunities for further research in this direction.
To help identify important issues and obtain a broad range of perspectives on them, the panel organized an information-gathering forum on October 11-12, 1993, at which 12 invited speakers addressed how statistical methods impinge on the software development process, software metrics, software dependability and testing, and software visualization. The forum also included consideration of nonstandard methods and select case studies (see the forum program in the appendix). The panel hopes that its report, which is based on the panel's expertise as well as information presented at the forum, will contribute to positive advances in software engineering and, as a subsidiary benefit, be a stimulus for other closely related disciplines, e.g., applied mathematics, operations research, computer science, and systems and industrial engineering. The panel is, in fact, very enthusiastic about the opportunities facing the statistical community and hopes to convey this enthusiasm in this report.
The panel gratefully acknowledges the assistance and information provided by a number of individuals, including the 12 forum speakers (T.W. Keller, D. Card, V.R. Basili, J.C. Munson, J.C. Knight, R. Lipton, T. Yamaura, S. Zweben, M.S. Phadke, E.E. Sumner, Jr., W. Hill, and J. Stasko), four anonymous reviewers, the NRC staff of the Board on Mathematical Sciences who supported the various facets of this project, and Susan Maurizi for her work in editing the manuscript.
Contents
EXECUTIVE SUMMARY
1 INTRODUCTION
2 CASE STUDY: NASA SPACE SHUTTLE FLIGHT CONTROL SOFTWARE
  Overview of Requirements
  The Operational Life Cycle
  A Statistical Approach to Managing the Software Production Process
  Fault Detection
  Safety Certification
3 A SOFTWARE PRODUCTION MODEL
  Problem Formulation and Specification of Requirements
  Design
  Implementation
  Testing
4 CRITIQUE OF SOME CURRENT APPLICATIONS OF STATISTICS IN SOFTWARE ENGINEERING
  Cost Estimation
  Statistical Inadequacies in Estimating
  Process Volatility
  Maturity and Data Granularity
  Reliability of Model Inputs
  Managing to Estimates
  Assessment and Reliability
  Reliability Growth Modeling
  Influence of the Development Process on Software Dependability
  Influence of the Operational Environment on Software Dependability
  Safety-Critical Software and the Problem of Assuring Ultrahigh Dependability
  Design Diversity, Fault Tolerance, and General Issues of Dependence
  Judgment and Decision-making Framework
  Structural Modeling Issues
  Experimentation, Data Collection, and General Statistical Techniques
  Software Measurement and Metrics
5 STATISTICAL CHALLENGES
  Software Engineering Experimental Issues
  Combining Information
  Visualization in Software Engineering
  Configuration Management Data
  Function Call Graphs
  Test Code Coverage
  Code Metrics
  Challenges for Visualization
  Opportunities for Visualization
  Orthogonal Defect Classification
6 SUMMARY AND CONCLUSIONS
  Institutional Model for Research
  Model for Data Collection and Analysis
  Issues in Education
REFERENCES
APPENDIX: FORUM PROGRAM
Executive Summary
Software, a critical core industry that is essential to U.S. interests in science, technology, and defense, is ubiquitous in today's society. Software coexists with hardware in our transportation, communication, financial, and medical systems. As these systems grow in size and complexity and our dependence on them increases, the need to ensure software reliability and safety, fault tolerance, and dependability becomes paramount. Building software is now viewed as an engineering discipline, software engineering, which aims to develop methodologies and procedures to control the whole software development process. Besides the issue of controlling and improving software quality, the issue of improving the productivity of the software development process is also becoming important from the industrial perspective.
PURPOSE AND SCOPE OF THIS STUDY
Although statistical methods have a long history of contributing to improved practices in manufacturing and in traditional areas of science, technology, and medicine, they have up to now had little impact on software development processes. This report attempts to bridge the islands of knowledge and experience between statistics and software engineering by enunciating a new interdisciplinary field: statistical software engineering. It is hoped that the report will help seed the field of statistical software engineering by indicating opportunities for statistical thinking to contribute to increased understanding of software and software production, and thereby enhance the quality and productivity of both.
This report is the result of a study by a panel convened by the Committee on Applied and Theoretical Statistics (CATS), a standing committee of the Board on Mathematical Sciences of the National Research Council, to identify challenges and opportunities in the development and implementation of software involving significant statistical content. In addition to pointing out the relevance of rigorous statistical and probabilistic techniques to pressing software engineering concerns, the panel outlines opportunities for further research in the statistical sciences and their applications to software engineering. The aim is to motivate new researchers from statistics and the mathematical sciences to tackle problems with relevance for software development, as well as to suggest a statistical approach to software engineering concerns that the panel hopes software engineers will find refreshing and stimulating. This report also touches on important issues in training and education for software engineers in the statistical sciences and for statisticians with an interest in software engineering.
Central to this report's theme, and essential to statistical software engineering, is the role of data: wherever data are used or can be generated in the software life cycle, statistical methods can be brought to bear for description, estimation, and prediction. Nevertheless, the major obstacle to applying statistical methods to software engineering is the lack of consistent, high-quality data in the resource-allocation, design, review, implementation, and test stages of software development. Statisticians interested in conducting research in software engineering must play a leadership role in justifying that resources are needed to acquire and maintain high-quality and relevant data.
The panel conjectures that the use of adequate metrics and data of good quality is the primary differentiator between successful, productive software development organizations and those that are struggling. Although the single largest area of overlap between statistics and software engineering currently concerns software development and production, it is the panel's view that the largest contributions of statistics to software engineering will be those affecting the quality and productivity of front-end processes, that is, processes that precede code generation. One of the biggest impacts that the statistical community can make in software engineering is to combine information across software engineering projects as a means of evaluating effects of technology, language, organization, and process.
CONTENTS OF THIS REPORT
Following an introductory opening chapter intended to familiarize readers with basic statistical software engineering concepts and concerns, a case study of the National Aeronautics and Space Administration (NASA) space shuttle flight control software is presented in Chapter 2 to illustrate some of the statistical issues in software engineering. Chapter 3 describes a well-known general software production model and associated statistical issues and approaches. A critique of some current applications of statistics in software engineering is presented in Chapter 4. Chapter 5 discusses a number of statistical challenges arising in software engineering, and the panel's closing summary and conclusions appear in Chapter 6.
STATISTICAL CHALLENGES
In comparison with other engineering disciplines, software engineering is still in the definition stage. Characteristics of established disciplines include having defined, tested, credible methodologies for practice, assessment, and predictability. Software engineering combines application domain knowledge, computer science, statistics, behavioral science, and human factors issues. Statistical challenges in software engineering discussed in this report include the following:
Generalizing particular statistical software engineering experimental results to other settings and projects,
Scaling up results obtained in academic studies to industrial settings,
Combining information across software engineering projects and studies,
Adopting exploratory data analysis and visualization techniques,
Educating the software engineering community regarding statistical approaches and data issues,
Developing methods of analysis to cope with qualitative variables,
Providing models with the appropriate error distributions for software engineering applications, and
Enhancing accelerated life testing.
SUMMARY AND CONCLUSIONS
In the 1990s, complex hardware-based functionality is being replaced by more flexible, software-based functionality, and massive software systems containing millions of lines of code are being created by many programmers with different backgrounds, training, and skills. The challenge is to build huge, high-quality systems in a cost-effective manner. The panel expects this challenge to preoccupy the field of software engineering for the rest of the decade. Any set of methodologies that can help in this task will be invaluable. More importantly, the use of such methodologies will likely determine the competitive positions of organizations and nations involved in software production. What is needed is a detailed understanding by statisticians of the software engineering process, as well as an appreciation by software engineers of what statisticians can and cannot do.
Catalysts essential for this productive interaction between statisticians and software engineers, and some of the interdisciplinary research opportunities for software engineers and statisticians, include the following:
A model for statistical research in software engineering that is collaborative in nature. The ideal collaboration partners statisticians, software engineers, and a real software process or product. Barriers to academic reward and recognition, as well as obstacles to the funding of cross-disciplinary research, can be expected to decrease over time; in the interim, industry can play a leadership role in nurturing collaborations between software engineers and statisticians and can reduce its own set of barriers (for instance, those related to proprietary and intellectual property interests).
A model for data collection and analysis that ensures the availability of high-quality data for statistical approaches to issues in software engineering. Careful attention to data issues ranging from definition of metrics to feed-back/-forward loops, including exploratory data analysis, statistical modeling, defect analysis, and so on, is essential if statistical methods are to have any appreciable impact on a given software project under study. For this reason it is crucial that the software industry take a lead position in research on statistical software engineering.
Attention to relevant issues in education. Enormous opportunities and many potential benefits are possible if the software engineering community learns about relevant statistical methods and if statisticians contribute to and cooperate in the education of future software engineers. Some relevant areas include:
Designed experiments. Software engineering is inherently experimental, yet relatively few designed experiments have been conducted. Software engineering education programs must stress the desirability, where feasible, of validating new techniques using statistically valid designed experiments.
Exploratory data analysis. Exploratory data analysis methods are essentially "model free," whereby the investigator hopes to be surprised by unexpected behavior rather than having thinking constrained to what is expected.
Modeling. Advances in the statistical community over the past decade have effectively relaxed the linearity assumptions of nearly all classical techniques. There should be an emphasis on educational information exchange leading to more and wider use of these recently developed techniques.
Risk analysis. A paradigm for managing risk for the space shuttle program, discussed in Chapter 2 of this report, and the corresponding statistical methods can play a crucial role in identifying risk-prone parts of software systems and of combined hardware and software systems.
Attitude toward assumptions. Software engineers should be aware that violating assumptions is not as important as thoroughly understanding the violation's effects on conclusions. Statistics textbooks, courses, and consulting activities should convey the statistician's level of understanding about and perspective on the importance and implications of assumptions for statistical inference methods.
Visualization. Graphics is important in exploratory stages in helping to ascertain how complex a model the data ought to support; in the analysis stage, in which residuals are displayed to examine what the currently entertained model has failed to account for; and in the presentation stage, in which graphics can provide succinct and convincing summaries of the statistical analysis and the associated uncertainty. Visualization can help software engineers cope with, and understand, the huge quantities of data collected as part of the software development process.
Tools. It is important to identify good statistical computing tools for software engineers. An overview of statistical computing, languages, systems, and packages should be prepared specifically for the benefit of software engineers.
1 Introduction
statistics. The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.¹
software engineering. (1) The application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software. (2) The study of approaches as in (1).²
statistical software engineering. The interdisciplinary field of statistics and software engineering specializing in the use of statistical methods for controlling and improving the quality and productivity of the practices used in creating software.
The above definitions describe the islands of knowledge and experience that this report attempts to bridge. Software is a critical core industry that is essential to U.S. national interests in science, technology, and defense. It is ubiquitous in today's society, coexisting with hardware (micro-electronic circuitry) in our transportation, communication, financial, and medical systems. The software in a modern cardiac pacemaker, for example, consists of approximately one-half megabyte of code that helps control the pulse rate of patients with heart disorders. In this and other applications, issues such as reliability and safety, fault tolerance, and dependability are obviously important. From the industrial perspective, so also are issues concerned with improving the quality and productivity of the software development process. Yet statistical methods, despite the long history of their impact in manufacturing as well as in traditional areas of science, technology, and medicine, have as yet had little impact on either hardware or software development.
This report is the product of a panel convened by the Board on Mathematical Sciences' Committee on Applied and Theoretical Statistics (CATS) to identify challenges and opportunities in software development and implementation that have a significant statistical component. In attempting to identify interrelated aspects of statistics and software engineering, it enunciates a new interdisciplinary field: statistical software engineering. While emphasizing the relevance of applying rigorous statistical and probabilistic techniques to problems in software engineering, the panel also points out opportunities for further research in the statistical sciences and their applications to software engineering. Its hope is that new researchers from statistics and the mathematical sciences will thus be motivated to address relevant and pressing problems of software development and also that software engineers will find the statistical emphasis refreshing and stimulating. This report also addresses the important issues of training and education of software engineers in the statistical sciences and of statisticians with an interest in software engineering.
¹ See The American Heritage Dictionary of the English Language (1981).
² See Institute of Electrical and Electronics Engineers (1990).
At the panel's information-gathering forum in October 1993, 12 invited speakers described their views on topics that are considered in detail in Chapters 2 through 6 of this report. One of the speakers, John Knight, pointed out that the date of the forum coincided nearly to the day with the 25th anniversary of the Garmisch Conference (Randell and Naur, 1968), a NATO-sponsored workshop at which the term "software engineering" is generally accepted to have originated. The particular irony of this coincidence is that it is also generally accepted that although much more ambitious software systems are now being built, little has changed in the relative ability to produce software with predictable quality, costs, and dependability. One of the original Garmisch participants, A.G. Fraser, now associate vice president in the Information Sciences Research Division at AT&T Bell Laboratories, defends the apparent lack of progress by the reminder that prior to Garmisch, there was no "collective realization" that the problems individual organizations were facing were shared across the industry; thus Garmisch was a critical first step toward addressing issues in software production. It is hoped that this report will play a similar role in seeding the field of statistical software engineering by indicating opportunities for statistical thinking to help increase understanding, as well as the productivity and quality, of software and software production.
In preparing this report, the panel struggled with the problem of providing the "big picture" of the software production process, while simultaneously attempting to highlight opportunities for related research on statistical methods. The problems facing the software engineering field are indeed broad, and nonstatistical approaches (e.g., formal methods for verifying program specifications) are at least as relevant as statistical ones. Thus this report tends to emphasize the larger context in which statistical methods must be developed, based on the understanding that recognition of the scope and the boundaries of problems is essential to characterizing the problems and contributing to their solution. It must be noted at the outset, for example, that software engineering is concerned with more than the end product, namely, code. The production process that results in code is a central concern and thus is described in detail in the report. To a large extent, the presentation of material mirrors the steps in the software development process. Although currently the single largest area of overlap between statistics and software engineering concerns software testing (which implies that the code exists), it is the panel's view that the largest contributions to the software engineering field will be those affecting the quality and productivity of the processes that precede code generation.
The panel also emphasizes that the process and methods described in this report pertain to the case of new software projects, as well as to the more ordinary circumstance of evolving software projects or "legacy systems." For instance, the software that controls the space shuttle flight systems or that runs modern telecommunication networks has been evolving for several decades. These two cases are referred to frequently to illustrate software development concepts and current practice, and although the software systems may be uncharacteristically large, they are arguably forerunners of what lies ahead in many applications. For example, laser printer software is witnessing an order-of-magnitude (base-10) increase in size with each new release. Similar increases in size and complexity are expected in all consumer electronic products as increased functionality is introduced.
Central to this report's theme, and essential to statistical software engineering, is the role of data, the realm where opportunities lie and difficulties begin. The opportunities are clear: whenever data are used or can be generated in the software life cycle, statistical methods can be brought to bear for description, estimation, and prediction. This report highlights such areas and gives examples of how statistical methods have been and can be used.
Nevertheless, the major obstacle to applying statistical methods to software engineering is the lack of consistent, high-quality data in the resource-allocation, design, review, implementation, and test stages of software development. Statisticians interested in conducting research in software engineering must acknowledge this fact and play a leadership role in providing adequate grounds for the resources needed to acquire and maintain high-quality, relevant data. A statement by one of the forum participants, David Card, captures the serious problem that statisticians face in demonstrating the value of good data and good data analysis: "It may not be that effective to be able to rigorously demonstrate a 10% or 15% or 20% improvement (in quality or productivity) when with no data and no analysis, you can claim 50% or even 100%."
The cost of collecting and maintaining high-quality information to support software development is unfortunately high, but arguably essential, as the NASA case study presented in Chapter 2 makes clear. The panel conjectures that use of adequate metrics and data of good quality is, in general, the primary differentiator between successful, productive software development organizations and those that are struggling. Traditional manufacturers have learned the value of investing in an information system to support product development; software development organizations must take heed. All too often, as a release date approaches, all available resources are dedicated to moving a software product out the door, with the result that few or no resources are expended on collecting data during these crucial periods. Subsequent attempts at retrospective analysis to help forecast costs for a new product or identify root causes of faults found during product testing are inconclusive when speculation rather than hard data is all that is available to work with. But even software development organizations that realize the importance of historical data can get caught in a downward spiral: effort is expended on collection of data that initially are insufficient to support inferences. When data are not being used, efforts to maintain their quality decrease. But then when the data are needed, their quality is insufficient to allow drawing conclusions. The spiral has begun.
As one means of capturing valuable historical data, efforts are under way to create repositories of data on software development experiments and projects. There is much apprehension in the software engineering community that such data will not be helpful because the relevant metadata (data about the data) are not likely to be included. The panel shares this concern because the exclusion of metadata not only encourages sometimes thoughtless analyses, but also makes it too easy for statisticians to conduct isolated research in software engineering. The panel believes that truly collaborative research must be undertaken and that it must be done with a keen eye to solving the particular problems faced by the software industry. Nevertheless, the panel recognizes benefits to collecting data or experimentation in software development. As is pointed out in more detail in Chapter 5, one of the largest impacts the statistical community can have in software engineering concerns efforts to combine information (NRC, 1992) across software engineering projects as a means of evaluating the effects of technology, language, organization, and the development process itself. Although difficult issues are posed by the need to adjust appropriately for differences in projects, the inconsistency of metrics, and varying degrees of data quality, the availability of a data repository at least allows for such research to begin.
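As a minimal sketch of what combining information across projects can mean in its simplest form, the following Python fragment pools per-project effect estimates using inverse-variance (fixed-effect) weights. The function name, the inputs, and the numbers in the usage note are illustrative assumptions, not data or methods taken from this report. A fixed-effect pool also ignores the between-project differences, inconsistent metrics, and uneven data quality noted above, all of which a real analysis would have to address.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Pool per-project effect estimates with inverse-variance weights.

    estimates  -- one effect estimate per project (hypothetical inputs)
    std_errors -- the standard error of each estimate, all positive
    Returns (pooled_estimate, pooled_standard_error).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one positive standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # More precise projects (smaller standard error) get larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se
```

For example, two hypothetical projects reporting productivity gains of 0.10 and 0.20, each with a standard error of 0.1, pool to 0.15 with a standard error of about 0.071.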
Although this report serves as a review of the software production process and related research to date, it is necessarily incomplete. Limitations on the scope of the panel's efforts precluded a fuller treatment of some material and topics as well as inclusion of case studies from a wider variety of business and commercial sectors. The panel resisted the temptation to draw on analogies between software development and the converging area of computer hardware development (which for the most part is initially represented in software). The one approach it is confident of not reflecting is over-simplification of the problem domain itself.
2 Case Study: NASA Space Shuttle Flight Control Software
The National Aeronautics and Space Administration leads the world in research in aeronautics and space-related activities. The space shuttle program, begun in the late 1970s, was designed to support exploration of Earth's atmosphere and to lead the nation back into human exploration of space.
IBM's Federal Systems Division (now Loral), which was contracted to support NASA's shuttle program by developing and maintaining the safety-critical software that controls flight activities, has gained much experience and insight in the development and safe operation of critical software. Throughout the program, the prevailing management philosophy has been that quality must be built into software by using software reliability engineering methodologies. These methodologies are necessarily dependent on the ability to manage, control, measure, and analyze the software using descriptive data collected specifically for tracking and statistical analysis. Based on a presentation by Keller (1993) at the panel's information-gathering forum, the following case study describes space shuttle flight software functionality as well as the software development process that has evolved for the space shuttle program over the past 15 years.
OVERVIEWOFREQUIREMENTS
The primary avionics software system (PASS) is the mission-critical on-board data processing system for NASA's space shuttle fleet. In flight, all shuttle control activities, including main engine throttling, directing control jets to turn the vehicle in a different orientation, firing the engines, or providing guidance commands for landing, are performed manually or automatically with this software. In the event of a PASS failure, there is a backup system. As indicated in the space shuttle flight log history, the backup system has never been invoked.
To ensure high reliability and safety, IBM has designed the space shuttle computer system to have four redundant, synchronized computers, each of which is loaded with an identical version of the PASS. Every 3 to 4 milliseconds, the four computers check with one another to assure that they are in lockstep and are doing the same thing, seeing the same input, sending the same output, and so forth. The operating system is designed to instantaneously deselect a failed computer.
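The report does not describe how the lockstep comparison is implemented. Purely as an illustration of the idea, the following Python sketch assumes a simple majority vote over the outputs of the four computers in one synchronization cycle; the function name, the computer labels, and the voting rule are all hypothetical and are not taken from the PASS design.

```python
from collections import Counter

def compare_outputs(outputs):
    """Majority-vote one synchronization cycle (illustrative assumption only).

    outputs maps a computer label to the value it produced this cycle.
    Returns the agreed value and the labels of computers to deselect.
    """
    tally = Counter(outputs.values())
    majority_value, votes = tally.most_common(1)[0]
    if votes <= len(outputs) / 2:
        # No strict majority: this sketch cannot decide which computer failed.
        raise RuntimeError("no majority among redundant computers")
    failed = sorted(label for label, value in outputs.items()
                    if value != majority_value)
    return majority_value, failed

# Hypothetical cycle in which one of four computers disagrees.
cycle = {"computer_1": 42, "computer_2": 42, "computer_3": 42, "computer_4": 41}
agreed_value, deselected = compare_outputs(cycle)
```

In this hypothetical cycle the three agreeing computers carry the vote and the fourth is flagged for deselection.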
The PASS is safety-critical software that must be designed for quality and safety at the outset. It consists of approximately 420,000 lines of source code developed in HAL, an engineering language for real-time systems, and is hosted on flight computers with very limited memory. Software is integrated within the flight control system in the form of overlays: only the small amount of code necessary for a particular phase of the flight (e.g., ascent, on-orbit, or entry activities) is loaded in computer memory at any one time. At quiescent points in the mission, the memory contents are "swapped out" for program applications that are needed for the next phase of the mission.
In support of the development of this safety-critical flight code, there are another 1.4 million lines of code. This additional software is used to build, develop, and test the system as well as to provide simulation capability and perform configuration control. This support software must have the same high quality as the on-board software, given that flawed ground software can mask errors, introduce errors into the flight software, or provide an incorrect configuration of software to be loaded aboard the shuttle.
In short, IBM/Loral maintains approximately 2 million lines of code for NASA's space shuttle flight control system. The continually evolving requirements of NASA's space flight program result in an evolving software system: the software for each shuttle mission flown is a composite of code that has been implemented incrementally over 15 years. At any given time, there is a subset of the original code that has never been changed, code that was sequentially added in each update, and new code pertaining to the current release. Approximately 275 people support the space shuttle software development effort.
THE OPERATIONAL LIFE CYCLE
Originally the PASS was developed to provide a basic flight capability of the space shuttle. The first flown version was developed and supported for flights in 1981 through 1982. However, the requirements of the flight missions evolved to include increased operational capability and maintenance flexibility. Among the shuttle program enhancements that changed the flight control system requirements were changes in payload manifest capabilities and main engine control design, crew enhancements, addition of an experimental autopilot for orbiting, system improvements, abort enhancements, provisions for extended landing sites, and hardware platform changes. Following the Challenger accident, which was not related to software, many new safety features were added and the software was changed accordingly.
For each release of flight software (called an operational increment), a nominal 6- to 9-month period elapses between delivery to NASA and actual flight. During this time, NASA performs system verification (to assure that the delivered system correctly performs as required) and validation (to assure that the operation is correct for the intended domain). This phase of the software life cycle is critical to assuring safety before a safety-critical operation occurs. It is a time for a complete integrated system test (flight software with flight hardware in operational domain scenarios). Crew training for mission practices is also performed at this time.
A STATISTICAL APPROACH TO MANAGING THE SOFTWARE PRODUCTION PROCESS
To manage the software production process for space shuttle flight control, descriptive data are systematically collected, maintained, and analyzed. At the beginning of the space shuttle program, global measurements were taken to track schedules and costs. But as software development commenced, it became necessary to retain much more product-specific information, owing to the critical nature of space shuttle flight as well as the need for complete accountability for the shuttle's operation. The detail and granularity of data dictate not only the type but also the level of analysis that can be done. Data related to failures have been specifically accumulated in a database along with all the other corollary information available, and a procedure has been established for reliability modeling, statistical analysis, and process improvement based on this information.
A composite description of all space shuttle software of various ages is maintained through a configuration management (CM) system. The CM data include not only a change itself, but also the lines of code affected, reasons for the change, and the date and time of change. In addition, the CM system includes data detailing scenarios for possible failures and the probability of their occurrence, user response procedures, the severity of the failures, the explicit software version and specific lines of code involved, the reasons for no previous detection, how long the fault had existed, and the repair or resolution. Although these data seem abundant, it is important to acknowledge their time dependence, because the software system they describe is subject to constant "churn."
Over the years, the CM system for the space shuttle program has evolved into a common, minimum set of data that must be retained regarding every fault that is recognized anywhere in the life cycle, including faults found by inspections before software is actually built. This evolutionary development is amenable to evaluation by statistical methods. Trend analysis and predictions regarding testing, allocation of resources, and estimation of probabilities of failure are examples of the many activities that draw on the database. This database also continues to be the basis for defining and developing sophisticated, insightful estimation techniques such as those described by Munson (1993).
Fault Detection
Management philosophy prescribes that process improvement is part of the process. Such proactive process improvement includes inspection at every step of the process, detailed documentation of the process, and analysis of the process itself.
The critical implications of an ill-timed failure in space shuttle flight control software require that remedies be decisive and aggressive. When a fault is identified, a feedback process involving detailed information on the fault enforces a search for similar faults in the existing system and changes the process to guard actively against such faults in flight control software development. The characteristics of a single fault are actively documented in the following four-step reactive process-improvement protocol:
1. Remove the fault,
2. Identify the root cause of the fault,
3. Eliminate the process deficiency that let the fault escape earlier detection, and
4. Analyze the product for other, similar faults.
Further scrutiny of what occurred in the process between introduction and detection of a fault is aimed at determining why downstream process elements failed to detect and remove the fault. Such introspective analysis is designed to improve the process and specific process elements so that if a similar fault is introduced again, these process elements will detect it before it gets too far along in the product life cycle. This four-step process improvement is achievable because of the maturity of the overall IBM/Loral software management process. The complete recording of project events in the CM system (phase of the process, change history of involved line(s) of code, the line of code that included an error, the individuals involved, and so on) allows hindsight so that the development team can approach the occurrence of an error not as a failure but rather as an opportunity to improve the process and to find other, similar errors.
Safety Certification
The dependability of safety-critical software cannot be based merely on testing the software, counting and repairing the faults, and conducting "live tests" on shuttle missions. Testing of software for many, many years, much longer than its life cycle, would be required in order to demonstrate software failure probability levels of 10^-7 or 10^-9 per operational hour. A process must be established, and it must be demonstrated statistically that if that process is followed and maintained under statistical control, then software of known quality will result. One result is the ability to predict a particular level of fault density, in the sense that fault density is proportional to failure intensity, and so provide a confidence level regarding software quality. This approach is designed to ensure that quality is built into the software at a measurable level. IBM's historical data demonstrate a constantly improving process for comfort of space shuttle flight. The use of software engineering methodologies that incorporate statistical analysis methods generally allows the establishment of a benchmark for obtaining a valid measure of how well a product meets a specified level of quality.
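A rough calculation suggests why testing alone cannot demonstrate such failure levels. The Python sketch below assumes a constant failure rate (exponentially distributed time to failure) and failure-free testing. Neither assumption comes from the report, and the function name and the 95% confidence level are illustrative choices.

```python
import math

HOURS_PER_YEAR = 24 * 365.25

def failure_free_hours_needed(target_rate_per_hour, confidence=0.95):
    """Failure-free test hours needed to claim the target rate.

    Assumes a constant failure rate: the chance of surviving t hours at
    rate lam is exp(-lam * t), so requiring exp(-lam * t) <= 1 - confidence
    gives t >= -ln(1 - confidence) / lam.
    """
    return -math.log(1.0 - confidence) / target_rate_per_hour

hours_for_1e7 = failure_free_hours_needed(1e-7)
years_for_1e7 = hours_for_1e7 / HOURS_PER_YEAR
years_for_1e9 = failure_free_hours_needed(1e-9) / HOURS_PER_YEAR
```

Under these assumptions, a level of 10^-7 per hour calls for roughly 3 x 10^7 failure-free test hours, that is, thousands of years of continuous operation, and 10^-9 per hour calls for a hundred times more.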
3 A Software Production Model
The software development process spans the life cycle of a given project, from the first idea, to implementation, through completion. Many process models found in the literature describe what is basically a problem-solving effort. The one discussed in detail below, as a convenient way to organize the presentation, is often described as the waterfall model. It is the basis for nearly all the major software products in use today. But as with all great workhorses, it is beginning to show its age. New models in current use include those with design and implementation occurring in parallel (e.g., rapid prototyping environments) and those adopting a more integrated, less linear, view of a process (e.g., the spiral model referred to in Chapter 6). Although the discussion in this chapter is specific to a particular model, that in subsequent chapters cuts across all models and emphasizes the need to incorporate statistical insight into the measurement, data collection, and analysis aspects of software production.
The first step of the software life cycle (Boehm, 1981) is the generation of system requirements whereby functionality, interactions, and performance of the software product are specified in (usually) numerous documents. In the design step, system requirements are refined into a complete product design, an overall hardware and software architecture, and detailed descriptions of the system control, data, and interfaces. The result of the design step is (usually) a set of documents laying out the system's structure in sufficient detail to ensure that the software will meet system requirements. Most often, both requirements and design documents are formally reviewed prior to coding in order to avoid errors caused by incorrectly stated requirements or poor design. The coding stage commences once these reviews are successfully completed. Sometimes scheduling considerations lead to parallel review and coding activities. Normally individuals or small teams are assigned specific modules to code. Code inspections help ensure that module quality, functionality, and schedule are maintained.
Once modules are coded, the testing step begins. (This topic is discussed in some detail in Chapter 3.) Testing is done incrementally on individual modules (unit testing), on sets of modules (integration testing), and finally on all modules (system testing). Inevitably, faults are uncovered in testing and are formally documented as modification requests (MRs). Once all MRs are resolved, or more usually as schedules dictate, the software is released. Field experience is relayed back to the developer as the software is "burned in" in a production environment. Patches or rereleases follow based on customer response. Backward compatibility tests (regression testing) are conducted to ensure that correct functionality is maintained when new versions of the software are produced.
The above overview is noticeably nonquantitative. Indeed, this nonquantitative characteristic is the most striking difference between software engineering and more traditional (hardware) engineering disciplines. Measurement of software is critical for characterizing both the process and the product, and yet such measurement has proven to be elusive and controversial. As argued in Chapter 1, the application of statistical methods is predicated on the existence of relevant data, and the issue of software measurements and metrics is discussed prominently throughout the report. This is not to imply that measurements have never been made or that data are totally lacking. Unfortunately metrics tend to describe properties and conditions for which it is easy to gather data rather than those that are useful for characterizing software content, complexity, and form.
PROBLEM FORMULATION AND SPECIFICATION OF REQUIREMENTS
Within the context of system development, specifications for required software functions are derived from the larger system requirements, which are the primary source for determining what the delivered software product will do and how it will do it. These requirements are translated by the designer or design team into a finished product that delivers all that is explicitly stated and does not include anything explicitly forbidden. Some common references regarding requirements specification are mentioned in IEEE Standard for Software Productivity Metrics (IEEE, 1993).
Requirements, the first formal tangible product obtained in the development of a system, are subjective statements specifying the system's various desired operational characteristics. Errors in requirements arise for a number of reasons, including ambiguous statements, inconsistent information, unclear user requirements, and incomplete requests. Projects that have ill-defined or unstated requirements are subject to constant iteration, and a lack of precise requirements is a key source of subsequent software faults. In general, the longer a fault resides in a system before it is detected, the greater is the cost of removing it or recovering from related failures. This condition is a primary driver of the review process throughout software development.
The formulation of requirements starts with customers requesting a new functionality. Systems engineers collect information describing the new functionality and develop a customer specification description (CSD) describing the customer's view of the feature. The CSD is used internally by software development organizations to formulate cost estimates for bidding. After the feature is committed (sold), systems engineers write a feature specification description (FSD) describing the internal view of the feature. The FSD is commonly referred to as "requirements." Both the CSD and FSD are carefully reviewed and must meet formal criteria for approval.
DESIGN
The heart of the software development cycle is the translation and refinement of the requirements into code. Software architects transform the requirements for each specified feature into a high-level design. As part of this process, they determine which subsystems (e.g., databases) and modules are required and how they interact or communicate. The broad, high-level design is then refined into a detailed low-level design. This transformation involves much information gathering and detective work. The software architects are often the most experienced and knowledgeable of the software engineers.
The sequence of continual refinements ultimately results in a mapping of high-level functions into modules and code. Part of this design process is selecting an appropriate representation, which in most cases is a specific programming language. Selection of a representation involves factors such as operational domain, system performance, and function, among others. When completed, the high-level design is reviewed by all, including those concerned with the affected subsystems and the organization responsible for development.
The human element is a critical issue in the early stages of a software project. Quantitative data are potentially available following document reviews. Specifically, early in the development cycle of software systems, (paper) documents are prepared describing feature requirements or feature design. Prior to a formal document review, the reviewers individually read the document, noting issues that they believe should be resolved before the document is approved and feature development is begun. At the review meeting, a single list of issues is prepared that includes the issues noted by the reviewers as well as the ones discovered during the meeting itself. This process thus generates data consisting of a tabulation of issues found by each reviewer. The degree of overlap provides information regarding the number of remaining issues, that is, those yet to be identified. If this number is acceptably small, the process can proceed to the next step; if not, further document refinement is necessary in order to avoid costly fixes later in the process. The problem as stated bears a certain resemblance to capture-recapture models in wildlife studies, and so appropriate statistical methods can be devised for analyzing the review data, as illustrated in the following example.
Example. Table 1 contains data on issues identified for a particular feature for the AT&T 5ESS switch (Eick et al., 1992a). Six reviewers found a total of 47 distinct issues. A common capture-recapture model assumes that each issue has the same probability of being captured (detected) and that reviewers work independently with their own chance of capturing an issue, or detection probability. Under such a model, likelihood methods yield an estimate of N = 65, implying that approximately 20 issues remain to be identified in the document. An upper 95% confidence bound for N under this model is 94 issues.
Such a model is natural but simplistic. The software development environment is not conducive to independence among reviewers (so that some degree of collusion is unavoidable), and reviewers also are selected to cover certain areas of specialization. In either case, the cornerstone of capture-recapture models, the binomial distribution, is no longer appropriate for the total number of issues. It is possible to develop a likelihood-based test for pairwise collusion of reviewers and reviewer-specific tests of specialization. In the example above, there is no evidence of collusion among reviewers, but reviewer C exhibits a significantly greater degree of specialization than do the other reviewers. When this reviewer is treated as a specialist, the maximum likelihood estimate (MLE) of the number of issues is reduced to 53, implying that only a half dozen issues remain to be discovered in the document.
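The simple model in the example (equal catchability of issues, independent reviewers, one detection probability per reviewer) can be sketched as a likelihood search. The Python code below is only an illustration under stated assumptions, not a reproduction of the analysis in Eick et al. (1992a). It uses the per-reviewer column sums of Table 1. It also assumes that the four issues with empty rows were found only at the meeting, so that 43 distinct issues were caught before it. The likelihood is profiled over the detection probabilities, and terms that do not depend on N are dropped. The specialist adjustment for reviewer C is not modeled.

```python
import math

# Column sums for reviewers A..F in Table 1.
REVIEWER_COUNTS = [25, 3, 4, 13, 9, 6]
# Assumption: 47 listed issues minus 4 rows with no entries.
DISTINCT_FOUND = 43

def profile_log_likelihood(n_total, counts, distinct):
    """Log-likelihood of n_total issues, up to terms that do not depend on it.

    Each reviewer's detection probability is replaced by its estimate
    count / n_total (the profile likelihood).
    """
    value = math.lgamma(n_total + 1) - math.lgamma(n_total - distinct + 1)
    for count in counts:
        p = count / n_total
        value += count * math.log(p)
        if n_total > count:
            value += (n_total - count) * math.log(1.0 - p)
    return value

def estimate_total_issues(counts, distinct, search_limit=1000):
    """Integer N maximizing the profile likelihood over a bounded search."""
    candidates = range(max(distinct, max(counts)), search_limit + 1)
    return max(candidates,
               key=lambda n: profile_log_likelihood(n, counts, distinct))

n_hat = estimate_total_issues(REVIEWER_COUNTS, DISTINCT_FOUND)
remaining = n_hat - 47  # issues still undiscovered after the meeting
```

With these inputs the search should land in the neighborhood of the estimate quoted in the example, although a different treatment of the meeting-only issues would change the result.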
Other mismatches between the data arising in software review and those in capture-recapture wildlife population studies induce bias in the MLE. Another possible estimator for this problem is the jackknife estimator (Burnham and Overton, 1978). But this estimator seems in fact to be more biased than the MLE (Vander Wiel and Votta, 1993). Both are rescued to a large extent by their categorization of faults into classes (e.g., "easy to find" versus "hard to find"). In any given application, it is necessary to verify that the "easy to find" and "hard to find" classification is meaningful, or to determine that it is merely partitioning the distribution of difficulty in an arbitrary manner. A relevant point in this and other applications of statistical methods in software engineering is that addressing aspects of the problem that induce study bias is important and valued; theoretical work addressing aspects of statistical bias is not likely to be as highly valued.
Table 1. Issue discovery. The rows of the table represent 47 issues noted by six reviewers prior to review meetings. An entry in cell i,j of the table indicates that issue i (i = 1, ..., 47) was noted by reviewer j (j = A, ..., F). Rows with no entries (i.e., row sums of zero) correspond to issues discovered at the meeting.
Issue A B C D E F Sum Issue A B C D E F Sum
1 1 1 25 1 1 2
2 1 1 2 26 1 1 2
3 1 1 27 1 1
4 1 1 28 1 1
5 0 29 1 1 2
6 1 1 30 1 1
7 1 1 31 1 1
8 1 1 32 1 1
9 0 33 1 1
10 1 1 34 1 1 1 3
11 1 1 35 1 1 2
12 1 1 36 1 1
13 0 37 1 1
14 1 1 2 38 1 1
15 1 1 39 1 1
16 0 40 1 1
17 1 1 2 41 1 1
18 1 1 42 1 1 2
19 1 1 2 43 1 1
20 1 1 2 44 1 1
21 1 1 1 1 1 5 45 1 1
22 1 1 2 46 1 1
23 1 1 47 1 1
24 1 1 SUM 25 3 4 13 9 6 60
IMPLEMENTATION
The phase in the software development process that is often referred to interchangeably as coding, development, or implementation is the actual transformation of the requirements into executable form. "Implementation in the small" refers to coding, and "implementation in the large" refers to designing an entire system in a top-down fashion while maintaining a perspective on the final integrated system.
Low-level designs, or coding units, are created from the high-level design for each subsystem and module that needs to be changed. Each coding unit specifies the changes to be made to the existing files, new or modified entry points, and any file that must be added, as well as other changes. After document reviews and approvals, the coding may begin. Using private copies of the code, developers make the changes and add the files specified in the coding unit. Coding is delicate work, and great care is taken so that unwanted side effects do not break any of the existing code. After completion, the code is tested by the developer and carefully reviewed by other experts. The changes are submitted to a public load (code from all programmers that is merged and loaded simultaneously) using an MR number. The MR is tied back to the feature to establish a correspondence between the code and the functionality that it provides.
MRs are associated with the system version management system, which maintains a complete history of every change to the software and can recreate the code as it existed at any point in time. For production software systems, version management systems are required to ensure code integrity, to support multiple simultaneous releases, and to facilitate maintenance. If there is a problem, it may be necessary to back out changes. Besides a record of the affected lines, other information is kept, such as the name of the programmer making the changes, the associated feature number, whether a change fixes a fault or adds new functionality, the date of a change, and so on.
The configuration management database contains the record of code changes, or change history of the code. Eick et al. (1992b) describe a visualization technique for displaying the change history of source code. The graphical technique represents each file as a vertical column and each line of code as a color-coded row within the column. The row indentation and length track the corresponding text, and the row color is tied to a statistic. If the row tracking is literal as with computer source code, the display looks as if the text had been printed in color and then photo-reduced for viewing as a single figure. The spatial pattern of color shows the distribution of the statistic within the text.
Example. Developing large software systems is a problem of scale. In multimillion-line systems there may be hundreds of thousands of files and tens of thousands of modules, worked on by thousands of programmers for multiyear periods. Just discovering what the existing code does is a major technical problem consuming significant amounts of time. A continuing and significant problem is that of code discovery, whereby programmers try to understand how unfamiliar code works. It may take several weeks of detailed study to change a few lines of code without causing unwanted side effects. Indeed, much of the effort in maintenance involves changing code written by another programmer. Because of variation in programmer staff sizes and inevitable turnover, training new programmers is important. Visualization techniques, described further in Chapter 5, can improve productivity dramatically.
Figure 1 displays a module composed of 20 source code files containing 9,365 lines of code. The height of each column indicates the size of the file. Files longer than one column are continued over to the next. The row color indicates the age of each line of code using a rainbow color scale with the newest lines in red and the oldest in blue. On the left is an interactive color scale showing a color for each of the 324 changes by the 126 programmers modifying this code over the last 10 years. The visual impression is that of a miniature picture of all of the source code, with the indentation showing the usual C language control structure.
The perception of colors is blurred, but there are clear patterns. Files in approximately the same hue were written at about the same time and are related. Rainbow files with many different hues are unstable and are likely to be trouble spots because of all the changes. The biggest file has about 1,300 lines of code and takes a column and a half.
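The mapping from line age to a rainbow color can be illustrated with a few lines of Python. This is a sketch of the general idea only, not the SeeSoft implementation. The linear mapping through HSV hue, the function name, and the sample change days are assumptions made for illustration.

```python
import colorsys

def age_to_rgb(change_day, oldest_day, newest_day):
    """Map the day a line last changed to an (R, G, B) triple in 0..255.

    The oldest lines map to blue and the newest to red, passing through
    the rainbow hues in between (assumed linear in HSV hue).
    """
    if newest_day == oldest_day:
        recency = 1.0
    else:
        recency = (change_day - oldest_day) / (newest_day - oldest_day)
    hue = (2.0 / 3.0) * (1.0 - recency)  # hue 2/3 is blue, hue 0 is red
    red, green, blue = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(red * 255), round(green * 255), round(blue * 255))

# Hypothetical last-change days for four lines of one file.
line_change_days = [0, 1200, 2500, 3650]
line_colors = [age_to_rgb(day, min(line_change_days), max(line_change_days))
               for day in line_change_days]
```

A file whose lines were all written in one period then shows a single hue, while a frequently modified file shows many.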
Changes from many coding units are periodically combined together into a so-called common load of the software system. The load is compiled, made available to developers for testing, and installed in the laboratory machines. Bringing the changes together is necessary so that developers working on different coding units of a common feature can ensure that their code works together properly and does not break any other functionality. Developers also use the public load to test their code on laboratory machines.
After all coding units associated with a feature are complete and it has been tested by the developers in the laboratory, the feature is turned over to the integration group for independent testing. The integration group runs tests of the feature according to a feature test plan that was prepared in parallel with the FSD. Eventually the new code is released as part of an upgrade or sent out directly if it fixes a critical fault. At this stage, maintenance on the code begins. If customers have problems, developers will need to submit fault modification requests.
TESTING
Many software systems in use today are very large. For example, the software that supports modern telecommunications networks, or processes banking transactions, or checks individual tax returns for the Internal Revenue Service has millions of lines of code. The development of such large-scale software systems is a complex and expensive process. Because a single simple fault in a system may cripple the whole system and result in a significant loss (e.g., loss of telephone service in an entire city), great care is needed to assure that the system is flawlessly constructed. Because a fault can occur in only a small part of a system, it is necessary to assure that even small programs are working as intended. Such checking for conformance is accomplished by testing the software.
Specifically, the purpose of software testing is to detect errors in a program and, in the absence of errors, gain confidence in the correctness of the program or the system under test. Although testing is no substitute for improving a process, it does play a crucial role in the overall software development process. Testing is important because it is effective, if costly. It is variously estimated that the total cost of testing is approximately 20 to 33% of the total software budget for software development (Humphrey, 1989). This fraction amounts to billions of dollars in the U.S. software industry alone. Further, software testing is very time consuming, because the time for testing is typically greater than that for coding. Thus, efforts to reduce the costs and improve the effectiveness of testing can yield substantial gains in software quality and productivity.
Figure 1. A SeeSoft™ display showing a module with 20 files and 9,365 lines of code. Each file is represented as a column and each line of code as a colored row. The newest rows are in red and the oldest in blue, with a color spectrum in between. This overview highlights the largest files and program control structures, while the color shows relationships between files, as well as unstable, frequently changed code. Eick et al. (1992b).
Much of the difficulty of software testing is in the management of the testing process (producing reports, entering MRs, documenting MRs cleared, and so on), the management of the objects of the testing process (test cases, test drivers, scripts, and so on), and the management of the costs and time of testing.
Typically, software testing refers to the phase of testing carried out after parts of code are written so that individual programs or modules can be compiled. This phase includes unit, integration, system, product, customer, and regression testing. Unit testing occurs when programmers test their own programs, and integration testing is the testing of previously separate parts of the software when they are put together. System testing is the testing of a functional part of the software to determine whether it performs its expected function. Product testing is meant to test the functionality of the final system. Customer testing is often product testing performed by the intended user of the system. Regression testing is meant to assure that a new version of a system faithfully reproduces the desirable behavior of the previous system.
Besides the stages of testing, there are many different testing methods. In white box testing, tests are designed on the basis of detailed architectural knowledge of the software under test. In black box testing, only knowledge of the functionality of the software is used for testing; knowledge of the detailed architectural structure or of the procedures used in coding is not used. White box testing is typically used during unit testing, in which the tester (who is usually the developer who created the code) knows the internal structure and tries to exercise it based on detailed knowledge of the code. Black box testing is used during integration and system testing, which emphasizes the user perspective more than the internal workings of the software. Thus, black box testing tries to test the functionality of software by subjecting the system under test to various user-controlled inputs and by assessing its resulting performance and behavior.
Since the number of possible inputs or test cases is almost limitless, testers need to select a sample, a suite of test cases, based on their effectiveness and adequacy. Herein lie significant opportunities for statistical approaches, especially as applied to black box testing. Ad hoc black box testing can be done when testers, perhaps based on their knowledge of the system under test and its users, decide on specific inputs. Another approach, based on statistical sampling ideas, is to generate test cases randomly. The results of this testing can be analyzed by using various types of reliability growth curve models (see "Assessment and Reliability" in Chapter 4). Random generation requires a statistical distribution. Since the purpose of black box testing is to simulate actual usage, a highly recommended technique is to generate test cases randomly from the statistical distribution needed by users, often referred to as the operational profile of a system.
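As a minimal sketch of generating test cases from an operational profile, the Python code below draws operations at random with probabilities given by the profile. The operations and their probabilities are invented for illustration and do not describe any real system. A realistic profile would also have to cover the input values of each operation.

```python
import random
from collections import Counter

# Hypothetical operational profile: operation -> probability of use.
OPERATIONAL_PROFILE = {
    "place_call": 0.70,
    "forward_call": 0.20,
    "conference_call": 0.08,
    "change_settings": 0.02,
}

def generate_test_suite(profile, n_cases, seed=None):
    """Draw n_cases operations at random according to the profile."""
    rng = random.Random(seed)
    operations = list(profile)
    weights = [profile[operation] for operation in operations]
    return rng.choices(operations, weights=weights, k=n_cases)

suite = generate_test_suite(OPERATIONAL_PROFILE, 1000, seed=1)
mix = Counter(suite)  # how often each operation appears in the suite
```

In a suite drawn this way, routine operations dominate and rarely used operations appear only a handful of times.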
There are several advantages and disadvantages to statistical operational profile testing. A key advantage is that if one takes a large enough sample, then the system under test will be tested in all the ways that a user may need it and thus should experience fewer field faults. Another advantage of this method is the possibility of bringing the full force of statistical techniques to bear on inferential problems; that is, the results obtained during testing can be generalized to make inferences about the field behavior of the system under test, including inferences about the number of faults remaining, the failure rate in the field, and so on.
In spite of all these advantages, statistical operational profile testing in its purest form is rarely used. There are many difficulties; some are operational and others are more basic. For example, one can never be certain about the operational profile in terms of inputs, and especially in terms of their probabilities of occurrence. Also, for large systems, the input space is high-dimensional. Thus, another problem is how to sample from this high-dimensional space. Further, the distribution is not static; it will, in all likelihood, change over time as new users exercise the system in unanticipated ways. Even if this possibility can be discounted, questions remain about the efficiency of statistical operational profile testing, which can be very inefficient, because most often the system under test will be used in routine ways, and thus a randomly drawn sample will be highly weighted by routine operations. This high weighting may be fine if the number of test cases is very large. But then testing would be very expensive, perhaps even prohibitively so. Therefore, testers often adopt some variant of drawing a random sample; for example, testers give more weight to boundary values, that is, those values around which the system is expected to change its behavior and therefore where faults are likely to be found. This and other clever strategies adopted by testers typically result in a testing distribution that is quite different from the operational profile. Of course, in such a case the results of the testing laboratory will not be generalizable unless the relationships between the two distributions are taken into account.
Thus, to take advantage of the attractiveness of operational profile testing, some key problems have to be solved:
1. How to obtain the operational profile,
2. How to sample according to a statistical distribution in high-dimensional space, and
3. How to generalize results obtained in the testing laboratory to the field when the testing distribution is a variant of the operational profile distribution.
All of these questions can be dealt with conceptually using statistical approaches.
For (1), a Bayesian elicitation procedure can be envisioned to derive the operational profile. This elicitation is done routinely in Bayesian applications, but because the space is very high dimensional, techniques are needed for Bayesian elicitation in very high dimensional spaces.
Concerning (2), if the joint distribution corresponding to the operational profile is known, schemes can be used that are more efficient than simple random sampling schemes. Simple random sampling is inefficient because it typically gives higher probability to the middle of a distribution than to its tails, especially in high dimensions. A more efficient scheme would sample the tails quickly. This can be accomplished by stratifying the support of the distribution.
McKay et al. (1979) formalized this idea using Latin hypercube sampling. Suppose we have a K-dimensional random vector X = (X1, ..., XK) and we want to get a sample of size N from the joint distribution of X. If the components of X are independent, then the scheme is simple, namely:
1. Divide the range of each component random variable into N intervals of equal probability,
2. Randomly sample one observation for each component random variable in each of the corresponding N intervals, and finally
3. Randomly combine the components to create X.
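The three steps above can be sketched in Python for independent components. The code is a minimal illustration rather than a reference implementation. Each component is described by its inverse distribution function, and the uniform ranges used in the usage lines (two angles in degrees) are hypothetical values chosen only to have something to sample.

```python
import random

def uniform_inverse_cdf(low, high):
    """Inverse distribution function of a uniform variable on [low, high]."""
    return lambda p: low + p * (high - low)

def latin_hypercube(n_samples, inverse_cdfs, seed=None):
    """Latin hypercube sample of size n_samples for independent components."""
    rng = random.Random(seed)
    columns = []
    for inverse_cdf in inverse_cdfs:
        # Steps 1 and 2: one random probability inside each of the
        # n_samples equal-probability intervals of this component.
        probabilities = [(k + rng.random()) / n_samples
                         for k in range(n_samples)]
        # Step 3: shuffling each component separately couples the
        # components at random.
        rng.shuffle(probabilities)
        columns.append([inverse_cdf(p) for p in probabilities])
    return list(zip(*columns))

# Hypothetical ranges, in degrees, for two input angles.
ANGLE_RANGES = [(-10.0, 30.0), (-60.0, 60.0)]
design = latin_hypercube(
    6, [uniform_inverse_cdf(low, high) for low, high in ANGLE_RANGES], seed=0)
```

Each of the six intervals of each variable then contains exactly one of the six sampled points.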
Stein (1987) showed that this sampling scheme can be substantially better than simple random sampling. Iman and Conover (1982) and Stein (1987) both discussed extensions for nonindependent component variables. Of course, if specifying homogeneous strata is possible, it should be done prior to applying the Latin hypercube sampling method to increase the overall effectiveness of the sampling scheme.
Example. Consider a software system controlling the state of an air-to-ground missile. The key inputs for the software are altitude, attack and bank angles, speed, pitch, roll, and yaw. Typically, these variables are independently controlled. To test this software system, combinations of all these inputs must be provided and the output from the software system checked against the corresponding physics. One would like to generate test cases that include inputs over a broad range of permissible values. To test all the valid possibilities, it would be reasonable to try uniform distributions for each input. Suppose we decide upon a sample of size 6. The corresponding Latin hypercube design is easily constructed by dividing each variable into six equal probability intervals and sampling randomly from each interval. Because we have independent random variables here, the final step consists of randomly coupling these samples. The design is difficult to visualize in more than two dimensions, but one such sample for attack and bank angles is depicted in Figure 2. Note that there is exactly one observation in each column and in each row, thus the name "Latin hypercube."
Figure2.Latinhypercube.N=6andK=2.
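The construction for independent components can be sketched in a few lines of Python. This is a minimal illustration, not the full procedure of McKay et al. (1979): it assumes uniform inputs (so that equal-probability intervals are also equal-width), and the function names and the angle ranges are hypothetical.

```python
import random


def latin_hypercube(n, k, rng=None):
    """Return n points in [0, 1)^k with exactly one point per
    equal-probability stratum in every dimension."""
    rng = rng or random.Random()
    columns = []
    for _ in range(k):
        # One uniform draw inside each of the n equal-probability intervals.
        col = [(i + rng.random()) / n for i in range(n)]
        # Shuffle so that the components are combined at random.
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n)]


def scale(point, ranges):
    """Map a unit-cube point onto the given (low, high) ranges."""
    return tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(point, ranges))


# Hypothetical permissible ranges, in degrees, for attack and bank angles.
ranges = [(-10.0, 30.0), (-60.0, 60.0)]
design = [scale(p, ranges) for p in latin_hypercube(6, 2, random.Random(0))]
```

For non-uniform inputs, the same unit-cube points could be passed through each input's inverse cumulative distribution function, which keeps the strata equal in probability rather than in width.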
Finally, concerning (3), to make inferences about field performance, the issue of the discrepancy between the statistical operational profile and the testing distribution must be addressed. At this point, a distinction can be made between two types of extrapolation to field performance of the system under test. It is clear that even if the true operational profile distribution is not available, to the extent that the testing distribution has the same support as the operational profile distribution, statistical inferences can be made about the number of remaining faults. On the other hand, to extrapolate the failure intensity from the testing laboratory to the field, it is not enough to have the same support; rather, identical distributions are needed. Of course, it is unlikely that after spending much time and money on testing, one would again test with the statistical operational profile. What is needed is a way of reusing the information generated in the testing laboratory, perhaps by a transformation in which some statistical techniques based on reweighting can help. There are two basic ideas, both relying heavily on the assumption that the testing and the field-use distributions have the same support. One idea is to use all the data from the testing laboratory, but with added weights to change the sample to resemble a random sample from the operational profile. The approach is similar to reweighting in importance sampling. Another idea is to accept or reject the inputs used in testing with a probability distribution based on the operational profile. For a description of both of these techniques, see Beckman and McKay (1987).
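Both ideas can be illustrated on a small discrete input space. The sketch below is not taken from Beckman and McKay (1987); it assumes that the testing distribution and the operational profile are known probability tables over the same inputs, and all names and numbers are hypothetical.

```python
import random


def importance_weights(inputs, p_op, p_test):
    """Weight each tested input by operational / testing probability.
    Requires p_test[x] > 0 wherever p_op[x] > 0 (same support)."""
    return [p_op[x] / p_test[x] for x in inputs]


def weighted_failure_rate(inputs, failed, p_op, p_test):
    """Self-normalized weighted estimate of the field failure probability."""
    weights = importance_weights(inputs, p_op, p_test)
    return sum(w for w, f in zip(weights, failed) if f) / sum(weights)


def accept_reject(inputs, p_op, p_test, rng):
    """Keep each tested input with probability proportional to
    p_op / p_test, so the kept inputs resemble the operational profile."""
    bound = max(p_op[x] / p_test[x] for x in p_test)
    return [x for x in inputs if rng.random() < p_op[x] / (bound * p_test[x])]


# Hypothetical example: testing was uniform, field use favors input "a".
p_test = {"a": 0.5, "b": 0.5}
p_op = {"a": 0.9, "b": 0.1}
tested = ["a", "b", "a", "b"]
failed = [False, True, False, True]
field_estimate = weighted_failure_rate(tested, failed, p_op, p_test)
```

In this toy case the raw failure fraction in testing is one half, while the reweighted estimate is much lower because the failing input is rare in the assumed field profile. The acceptance scheme discards data, whereas the weighting scheme keeps all of it at the cost of unequal weights.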
In his presentation at the panel's forum, Phadke (1993) suggested another set of statistical techniques, based on orthogonal arrays, for parsimonious testing of software. The example described above proves useful in an elaboration.
Example. For the software system that determines the state of an attack plane, let us assume that interest centers on testing only two conditions for each input variable. This situation arises, for example, when the primary interest lies in boundary value testing. Let the lower value be input state 0 and the upper value be input state 1 for each of the variables. Then in the language of statistical experimental design, we have seven factors, A, ..., G (altitude, attack angle, bank angle, speed, pitch, roll, and yaw), each at two levels (0, 1). To test all of the possible combinations, one would need a complete factorial experiment, which would have $2^7 = 128$ test cases consisting of all possible sequences of 0's and 1's. For a statistical experiment intended to address only main effects, a highly fractionated factorial design would be sufficient. However, in the case of software testing, there is no statistical variability and little or no interest in estimating various effects. Rather, the interest is in covering the test space as much as possible and checking whether the test cases pass or fail. Even in this case, it is still possible to use statistical design ideas. For example, consider the sequence of test cases given in Table 2a. This design requires 8 test cases instead of 128. In this case, since there is no statistical variation, main effects do not have any practical meaning. However, looking at the pattern in the table, it is clear that all possible combinations of levels for any pair of factors are covered in a balanced way. Thus, testing according to this design will protect against any incorrect implementation of the code involving a pairwise interaction.
     A  B  C  D  E  F  G
1    0  0  0  0  0  0  0
2    0  0  0  1  1  1  1
3    0  1  1  0  0  1  1
4    0  1  1  1  1  0  0
5    1  0  1  0  1  0  1
6    1  0  1  1  0  1  0
7    1  1  0  0  1  1  0
8    1  1  0  1  0  0  1
Table 2a. Orthogonal array. Test cases in rows. Test factors in columns.

     A  B  C  D  E  F  G
1    0  0  0  0  0  0  0
2    1  1  1  1  1  1  0
3    0  0  1  1  1  0  1
4    1  0  0  0  1  1  1
5    1  1  1  0  0  0  1
6    0  1  0  1  0  1  1
Table 2b. Combinatorial design. Test cases in rows. Test factors in columns.
In general, following Taguchi, Phadke (1993) suggests orthogonal array designs of strength two. These designs (a specific instance of which is given in the above example) guarantee that all possible pairwise combinations will be tried out in a balanced way. Another approach based on combinatorial designs was proposed by Cohen et al. (1994). Their designs do not consider balance to be an overriding design criterion, and accordingly they produce designs with smaller numbers of required test cases. For example, Table 2b contains a combinatorial design with complete pairwise coverage in six runs instead of the eight required by orthogonal arrays (Table 2a). This notion has been extended to notions of higher-order coverage as well. The efficacy of these and other types of designs has to be evaluated in the testing context.
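The claim of complete pairwise coverage can be checked mechanically. The following sketch encodes the six runs of Table 2b as strings over factors A through G and lists any pair of factors for which some combination of levels never appears; the function name is hypothetical.

```python
from itertools import combinations, product

# The six test cases of Table 2b, one character per factor A..G.
TABLE_2B = [
    "0000000",
    "1111110",
    "0011101",
    "1000111",
    "1110001",
    "0101011",
]


def uncovered_pairs(rows, levels="01"):
    """Return (factor_i, factor_j, level_pair) triples that no row covers."""
    missing = []
    n_factors = len(rows[0])
    for i, j in combinations(range(n_factors), 2):
        seen = {(row[i], row[j]) for row in rows}
        for combo in product(levels, repeat=2):
            if combo not in seen:
                missing.append((i, j, combo))
    return missing


gaps = uncovered_pairs(TABLE_2B)  # an empty list means every pair is covered
```

The same function reports which combinations are lost if a run is dropped, which is one way to see that coverage, unlike balance, can be had with fewer than eight runs but not with arbitrarily few.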
Besides the types of testing discussed above, there are other statistical strategies that can be used. For example, DeMillo et al. (1988) have suggested the use of fault insertion techniques. The basic idea is akin to capture-recapture sampling in which sampled units of a population (usually wildlife) are released and inverse sampling is done to estimate the unknown population size. The MOTHRA system built by DeMillo and his colleagues implements such a scheme. While there are many possible sampling schemes (Nayak, 1988), fault insertion has several difficulties: the faults inserted ought to be subtle enough that the system can still be compiled and tested; no two inserted faults should interact with each other; and while the technique may be possible at the unit testing level, it is prohibitively expensive for integration testing. It should be pointed out that the use of capture-recapture sampling, outlined in this chapter's subsection titled "Design," for quantifying document reviews does not require fault seeding and, accordingly, is not subject to the above difficulties.
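The capture-recapture analogy can be made concrete with the simplest ratio estimator. This sketch is an assumption introduced for illustration, not the scheme implemented in MOTHRA: it supposes that seeded and indigenous faults are equally likely to be found, and the function name and counts are hypothetical.

```python
def estimate_indigenous_faults(seeded, seeded_found, indigenous_found):
    """Ratio estimate of the total number of indigenous faults, assuming
    seeded and indigenous faults are detected with equal probability."""
    if seeded_found == 0:
        raise ValueError("no seeded faults recovered; the estimate is undefined")
    return indigenous_found * seeded / seeded_found


# Hypothetical test campaign: 50 faults seeded, 40 of them rediscovered,
# and 16 genuine faults found along the way.
total_estimate = estimate_indigenous_faults(50, 40, 16)
remaining_estimate = total_estimate - 16
```

The equal-detectability assumption is exactly what the difficulties listed above put in doubt, so such an estimate is best read as a rough indication.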
Another key problem in testing is determining when there has been enough testing. For unit testing where much of the testing is white box and the modules are small, one can attempt to check whether all the paths have been covered by the test cases, an idea extended substantially by Horgan and London (1992). However, for integration and system testing, this particular approach, coverage testing, is not possible because of the size and the number of possible paths through the system. Here is another opportunity for using statistical approaches to develop a theory of statistical coverage. Coverage testing relates to deriving methods and algorithms for generating test cases so that one can state, with a very high probability, that one has checked most of the important paths of the software. This kind of methodology has been used with probabilistic algorithms in protocol testing, where the structure of the program can be described in great detail. (A protocol is a very precise description of the interface between two diverse systems.) Lee and Yannakakis (1992) have proposed algorithms whereby one is guaranteed, with a high degree of probability, that all the states of the protocols are checked. The difficulty with this approach is that the number of states becomes large very quickly, and except for a small part of the system under test, it is not clear that such a technique would be practical (under current computing technology). These ideas have been mathematically formalized in the vibrant area of theorem checking and proving (Blum et al., 1990). The key idea is to take transforms of programs such that the results are invariant under these transforms if the software is correct. Thus, any variation in the results suggests possible faults in the software. Blum et al. (1989) and Lipton (1989), among others, have developed a number of algorithms to give probabilistic bounds on the correctness of software based on the number of different transformations.
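One well-known instance of the invariance idea is a linearity check: a program that is supposed to compute a linear function modulo p must satisfy f(x + y) = f(x) + f(y), so random pairs (x, y) can expose faults without an independent oracle. The sketch below is illustrative only; the functions, the modulus, and the injected fault are hypothetical, and no particular probabilistic bound from the cited work is implied.

```python
import random

P = 101  # hypothetical prime modulus


def correct_scale(x):
    """Reference behaviour: multiply by 7 modulo P (a linear map)."""
    return (7 * x) % P


def faulty_scale(x):
    """Same routine with an injected fault on part of the input range."""
    return (7 * x) % P if x < 90 else (7 * x + 1) % P


def passes_linearity_check(f, modulus, trials, rng):
    """Return False as soon as f(x + y) != f(x) + f(y) (mod modulus)."""
    for _ in range(trials):
        x = rng.randrange(modulus)
        y = rng.randrange(modulus)
        if f((x + y) % modulus) != (f(x) + f(y)) % modulus:
            return False
    return True


ok = passes_linearity_check(correct_scale, P, 200, random.Random(0))
```

A correct linear routine passes every trial, while a faulty one is caught with a probability that grows with the number of trials and with the fraction of inputs on which the fault shows.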
In all of the several approaches to testing discussed above, the number of test cases can be extraordinarily large. Because of the cost of testing and the need to supply software in a reasonable period of time, it is necessary to formulate rules about when to stop testing. Herein lies another set of interesting problems in sequential analysis and statistical decision theory. As pointed out by Dalal and Mallows (1988, 1990, 1992), Singpurwalla (1991), and others, the key issue is to explicitly incorporate the economic trade-off between the decision to stop testing (and absorb the cost of fixing subsequent field faults) and the decision to continue testing (and incur ongoing costs to find and fix faults before release of a software product). Since the testing process is not deterministic, the fault-finding process is modeled by a stochastic reliability model (see Chapter 4 for further discussion). The opportune moment for release is decided using sequential decision theory. The rules are simple to implement and have been used in a number of projects. This framework has been extended to the problem of buying software with some sort of probabilistic guarantee on the number of faults remaining (Dalal and Mallows, 1992). Another extension with practical importance (Dalal and McIntosh, 1994) deals with the issue of a system under test not having been completely delivered at the start of testing. This situation is a common occurrence for large systems, where in order to meet scheduling milestones, testing begins immediately on modules and sets of modules as they are completed.
4 Critique of Some Current Applications of Statistics in Software Engineering
COST ESTIMATION
One of software engineering's long-standing problems is the considerable inaccuracy of the cost, resource, and schedule estimates developed for projects. These estimates often differ from the final costs by a factor of two or more. Such inaccuracies have a severe impact on process integrity and ultimately on final software quality. Five factors contribute to this continuing problem:
1. Most cost estimates have little statistical basis and have not been validated;
2. The value of historical data in developing predictive models is limited, since no consistent software development process has been adopted by an organization;
3. The maturity of an organization's process changes the granularity of the data that can be used effectively in project cost estimation;
4. The reliability of inputs to cost estimation models varies widely; and
5. Managers attempt to manage to the estimates, reducing the validity of historical data as a basis for validation.
Certain of the above issues center on the so-called maturity of an organization (Humphrey, 1988). From a purely statistical research perspective, (5) may be the most interesting area, but the major challenge facing the software community is finding the right metrics to measure in the first place.
Example. The data plotted in Figure 3 pertain to the productivity of a conventional COBOL development environment (Kitchenham, 1992). For each of 46 different products, size (number of entities and transactions) and effort (in person-hours) were measured. From Figure 3, it is apparent that despite substantial variability, a strong (log-log) linear relationship exists between program size and program effort.
A simple model relating effort to size is
$$\log_{10}(\text{effort}) = \alpha + \beta \log_{10}(\text{size}) + \text{noise}.$$
Figure 3. Data on the relationship between development effort and product size in a COBOL development organization.
A least squares fit to these data yields

               Coeff.   SE       t
Intercept      1.120    0.3024   3.702
log10(size)    1.049    0.1250   8.397
RMS            0.194
These fitted coefficients suggest that development effort is proportional to product size; a formal test of the hypothesis, $H\colon \beta = 1$, gives a t value at the .65 significance level.
The estimated intercept after fixing $\beta = 1$ is 1.24; the resulting fit and a 95% prediction interval are overlaid on the data in Figure 3. This model predicts that it requires approximately 17 hours ($= 10^{1.24}$) to implement each unit of size.
Such models are used for prediction and tool validation. Consider an additional observation made of a product developed using a fourth-generation language and relational databases. Under the experimental development process, it took 710 hours to implement the product of size 183 (this point is denoted by X in Figure 3). The fitted model predicts that this product would have taken approximately 3,000 hours to complete using the conventional development environment. The 95% prediction interval at a size of 183 ranges from approximately 1,000 to 9,000 hours; thus, assuming that other factors are not contributing to the apparent short development cycle of this product, the use of those new fourth-generation tools has demonstrably decreased the development effort (and hence the cost).
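The arithmetic behind these figures is short enough to restate. The sketch below only reuses numbers quoted in the example (the coefficient table, the intercept of 1.24, and the new observation); the variable names are introduced here for illustration.

```python
# Coefficients quoted in the least squares table.
slope, slope_se = 1.049, 0.1250

# t statistic for the hypothesis that the slope equals 1.
t_for_unit_slope = (slope - 1.0) / slope_se

# With the slope fixed at 1, effort = 10**intercept * size.
intercept = 1.24
hours_per_unit = 10 ** intercept

# The fourth-generation-language product: size 183, 710 hours observed.
size, observed_hours = 183, 710
predicted_hours = hours_per_unit * size
ratio = observed_hours / predicted_hours
```

Here `hours_per_unit` comes out a little above 17 and `predicted_hours` a little above 3,000, in line with the rounded values in the text, and the observed effort is under a quarter of the conventional prediction.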
Statistical Inadequacies in Estimating
Most cost estimation methods develop an initial relationship between the estimated size of a system (in lines of code, for instance) and the resources required to develop it. Such equations are often of the form illustrated in the above example: effort is proportional to size raised to the $\beta$ power. This initial estimate is then adjusted by a number of factors that are thought to affect the productivity of the specific project, such as the experience of the assigned staff, the available tools, the requirements for reliability, and the complexity of the interaction with the customer. Thus the estimating equation assumes the log-linear form:
$$\text{effort} \approx \text{size}^{\beta} \times \alpha_i \alpha_j \alpha_k \alpha_l \alpha_m \cdots \alpha_z,$$
where the $\alpha$'s are the coefficients for the adjustment factors. Unfortunately, these adjustment factors are not treated as variables in a regression equation; rather, each has a set of fixed coefficients (termed "weighting factors") associated with each level of the variable. These are independently applied as if the variables were uncorrelated (an assumption known to be incorrect). These weighting schemes have been developed based on intuition about each variable's potential impact rather than on a statistical model fitting using historical data. Thus, although the relationship between effort and size is often recalibrated for different organizations, the weighting factors are not.
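Taking logarithms of the multiplicative form makes the contrast with a regression treatment explicit. The second line is only a sketch of what fitting the adjustment factors statistically might look like; the indicator variables $x_j$, the coefficients $\gamma_j$, the intercept $\beta_0$, and the error term $\varepsilon$ are notation introduced here, not taken from any published cost model.

```latex
\begin{align*}
\log(\text{effort}) &\approx \beta \log(\text{size}) + \sum_{j} \log \alpha_j
  && \text{(fixed weights } \alpha_j \text{, chosen one factor at a time)} \\
\log(\text{effort}) &= \beta_0 + \beta \log(\text{size}) + \sum_{j} \gamma_j x_j + \varepsilon
  && \text{(weights } \gamma_j \text{ estimated jointly from historical data)}
\end{align*}
```

In the second form, correlation among the factors shows up in the joint fit instead of being ignored, which is the point of the criticism above.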
Exacerbating the problems with existing cost estimation models is the lack of rigorous validation of the equations. For instance, Boehm (1981) has acknowledged that his well-known COCOMO estimating model was not developed using statistical methods. Many individuals marketing cost estimation modeling tools denigrate the value of statistical approaches compared to clever intuition. To the extent that analytical methods are used in the development or validation of these models, they are often performed on data sets that contain as many predictor variables (productivity factors) as projects. Thus determination of the separate or individual contributions of the variables almost certainly depends too much on chance and can be distorted by collinear relationships. These models are rarely subjected to independent validation studies. Further, little research has been done that attempts to restrict these models to including only those productivity factors that really matter (i.e., subset selection).
Because of the lack of statistical rigor in most cost estimation models, software development organizations usually handcraft weighting schemes to fit their historical results. Thus, the specific instantiation of most cost estimation models differs across organizations. Under these conditions, cross-validation of the weighting schemes is very difficult, if not impossible. A new approach to developing cost estimation models would be beneficial, one that invokes sound statistical principles in fitting such equations to historical data and to validating their applicability across organizations. If the instantiation of such models is found to be domain-specific, statistically valid methods should be sought for regenerating accurate models in different domains.
Process Volatility
In immature software development organizations, the processes used differ across projects because they are based on the experiences and preferences of the individuals assigned to each project, rather than on common organizational practice. Thus, in such organizations cost estimation models must attempt to predict the results of a process that varies widely across projects. In poorly run projects the signal-to-noise ratio is low, in that there is little consistent practice that can be used as the basis for dependable prediction. In such projects, neither the size nor the productivity factors provide any consistent insight into the resources required, since they are not systematically related to the processes that will be used.
The historical data collected from projects in immature software development organizations are difficult to interpret because they reflect widely divergent practices. Such data sets do not provide an adequate basis for validation, since process variation can mask underlying relationships. In fact, because the relationships among independent variables may change with variations in the process, different projects may require different values of the parameters in the cost estimation models. As organizations mature and stabilize their processes, the accuracy of the estimating models they use usually increases.
Maturity and Data Granularity
In mature organizations the software development process is well defined and is applied consistently across projects. The more carefully defined the process, the finer the granularity of the processes that can be measured. Thus, as software organizations mature, the entire basis for their cost estimation models can change. Immature organizations have data only at the level of overall project size, number of person-years required, and overall cost. With increasing organizational maturity, it becomes possible to obtain data on process details such as how many reviews must be conducted at each life cycle stage based on the size of the system, how many test cases must be run, and how many defects must be fixed based on the defect removal efficiency of each stage of the verification process. Thus, estimation in fully developed organizations can be based on a bottom-up analysis in which the historical data can be more accurate because the objects of estimation, and the effort they require, are more easily characterized.
As organizations mature, the structure of relevant cost estimation models can change. When process models are not defined in detail, models must take the form of regression equations based on variables that describe the total impact of a predictor variable on a project's development cycle. There is little notion in these models of the detailed practices that make up the totality. In mature organizations such practices are defined and can be analyzed individually and built up into a total estimate. Normally the errors in estimating these smaller components are smaller than the corresponding error at the total project level, and it is assumed that the summary effect of aggregating these smaller errors is still smaller than the error in the estimate at the total project level.
Reliability of Model Inputs
Even if a cost estimation model is statistically sound, the data on which it is based can have low validity. Often, managers do not have sufficient knowledge of crucial variables that must be entered into a model, such as the estimated size of various individual components of a system. In such instances, processes exist for increasing the accuracy of these data. For instance, Delphi techniques can be used by software engineers who have previous experience in developing various system components. The less experience an organization has with a particular component of a system, the less reliable is the size estimate for that component. Typically, component sizes are underestimated, with ruinous effects on the resources and schedule estimated for a project. Sometimes historical "fudge factors" are applied to account for underestimation, although a more rigorous data-based approach is recommended. To aid in identifying the potential risks in a software development project, it would also be beneficial to have reliable confidence bounds for different components of the estimated size or effort.
Statistical methods can be applied to develop prior probabilities (e.g., for Bayesian estimation models) from knowledgeable software engineers and to adjust these using historical data. These methods should be used not only to suggest the confidence that can be placed in an estimate, but also to indicate the components within a system that contribute most to inaccuracies in an estimate.
As projects progress during their life cycle from specifications of requirements to design to generation of code, the information on which estimates can be based grows more reliable: there is thus greater certainty in estimating from the architectural design of a system or the detailed design of each module than in estimating from textual statements. In short, the sources from which estimates can be developed change as the project continues through its development cycle. Each succeeding level of input is a more reliable indicator of the ultimate system size than are the inputs available in earlier stages of development. Thus the overall estimate of size, resources, and schedule potentially becomes more accurate in succeeding phases of a project. Yet it is important to determine the most accurate indicators of crucial parameters such as size, effort, and schedule very early in a project, when the least reliable data are available. As such, there is a need for statistically valid ways of developing model inputs from less reliable forms of data (these inputs must reliably estimate later measures that will be more valid inputs) and of estimating how much error is introduced into an estimate based on the reliability of the inputs.
Managing to Estimates
Complicating the ability to validate cost estimation models from historical data is the fact that project managers try to manage their projects to meet received estimates for cost, effort, schedule, and other such variables. Thus, an estimate affects the subsequent process, and historical data are made artificially more accurate by management decisions and other factors that are often masked in project data. For instance, projects whose required level of effort has been underestimated often survive on large amounts of unreported overtime put in by the development staff. Moreover, many managers are quite skilled at cutting functionality from a system in order to meet a delivery date. In the worst cases, engineers short-cut their ordinary engineering processes to meet an unrealistic schedule, usually with disastrous results. Techniques for modeling systems dynamics provide one way to characterize some of the interactions that occur between an estimate and the subsequent process that is generated by the estimate (Abdel-Hamid, 1991).
The validation of cost estimation models must be conducted with an understanding of such interactions between estimates and a project manager's decisions. Some of these dynamics may be usefully described by statistical models or by techniques developed in psychological decision theory (Kahneman et al., 1982). Thus, it may be possible to develop a statistical dynamic model (e.g., a multistage linear model) that characterizes the reliability of inputs to an estimate, the estimate itself, decisions made based on the estimate, the resulting performance of the project, measures that emerge later in the project, subsequent decision making based on these later measures, and the ultimate performance of the project. Such models would be valuable in helping project managers to understand the ramifications of decisions based on an initial estimate and also on subsequent periodic updates.
ASSESSMENT AND RELIABILITY
Reliability Growth Modeling
Many reliability models of varying degrees of plausibility are available to software engineers. These models are applied at either the testing stage or the field-monitoring stage. Most of the models take as input either failure time or failure count data and fit a stochastic process model to reflect reliability growth. The differences among the models lie principally in assumptions made based on the underlying stochastic process generating the data. A brief survey of some of the well-known models and their assumptions and efficacy is given in Abdel-Ghaly et al. (1986).
Although many software reliability growth models are described in the literature, the evidence suggests that they cannot be trusted to give accurate predictions in all cases and also that it is not possible to identify a priori which model (if any) will be trustworthy in a particular context. No doubt work will continue in refining these models and introducing "improved" ones. Although such work is of some interest, the panel does not believe that it merits extensive research by the statistical community, but thinks rather that statistical research could be directed more fruitfully to providing insight to the users of the models that currently exist.
The problem is validation of such models with respect to a particular data source, to allow users to decide which, if any, prediction scheme is producing accurate results for the actual software failure process under examination. Some work has been done on this problem (Abdel-Ghaly et al., 1986; Brocklehurst and Littlewood, 1992), using a combination of probability forecasting and sequential prediction, the so-called prequential approach developed by Dawid (1984), but this work has so far been rather informal. It would be helpful to have more procedures for assessing the accuracy of competing prediction systems that could then be used routinely by industrial software engineers without advanced statistical training.
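The prequential idea can be shown with a toy comparison: each prediction system issues a one-step-ahead predictive density using only past data, the density is evaluated at the failure time that is actually observed next, and the logarithms are accumulated. The two predictors, the exponential predictive form, and the inter-failure times below are all hypothetical and are not any of the published reliability growth models.

```python
import math


def log_prequential_likelihood(times, predict_rate):
    """Sum of log one-step-ahead exponential predictive densities,
    each evaluated at the next observed inter-failure time."""
    total = 0.0
    for i in range(1, len(times)):
        rate = predict_rate(times[:i])  # uses past observations only
        total += math.log(rate) - rate * times[i]
    return total


def rate_from_all_history(past):
    """Predict with the mean of every earlier inter-failure time."""
    return len(past) / sum(past)


def rate_from_recent_history(past, window=2):
    """Predict with the mean of only the most recent observations."""
    recent = past[-window:]
    return len(recent) / sum(recent)


# Hypothetical inter-failure times showing reliability growth.
times = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
score_all = log_prequential_likelihood(times, rate_from_all_history)
score_recent = log_prequential_likelihood(times, rate_from_recent_history)
```

On these growing inter-failure times the predictor that tracks recent history earns the higher score, which is the kind of evidence a user could consult when choosing between competing prediction systems for a given data source.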
Statistical inference in the area of reliability tends almost invariably to be of a classical frequentist kind, even though many of the models originate from a subjective Bayesian probability viewpoint. This unsatisfactory state of affairs arises from the sheer difficulty of performing the computations necessary for a proper Bayesian analysis. It seems likely that there would be profit in trying to overcome these problems, perhaps via the Gibbs sampling approach (see, e.g., Smith and Roberts, 1993).
Another fruitful avenue for research concerns the introduction of explanatory variables, so-called covariates, into software reliability growth models. Most existing models assume that no explanatory variables are available. This assumption is assuredly simplistic concerning testing for all but small systems involving short development and life cycles. For large systems (i.e., those with more than 100,000 lines of code) there are variables, other than time, that are very relevant. For example, it is typically assumed that the number of faults (found and unfound) in a system under test remains stable, i.e., that the code remains frozen during testing. However, this is rarely the case for large systems, since aggressive delivery cycles force the final phases of development to overlap with the initial stages of system testing. Thus, the size of code and, consequently, the number of faults in a large system can vary widely during testing. If these changes in code size are not considered, the result, at best, is likely to be an increase in variability and a loss in predictive performance, and at worst, a poorly fitting model with unstable parameter estimates. Taking this logic one step further suggests the need to distinguish between new lines of code (new faults) and code coming from previous releases (old faults), and possibly the age of different parts of code. Of course, one can carry this logic to an extreme and have unwieldy models with many covariates. In practice, what is required is a compromise between the two extremes of having no covariates and having hundreds of them. This is where opportunities abound for applying state-of-the-art statistical modeling techniques. Described briefly below is a case study reported by Dalal and McIntosh (1994) dealing with reliability modeling when code is changing.
Example. Consider a new release of a large telecommunications system with approximately 7 million noncommentary source lines (NCSLs) and 400,000 lines of noncommentary new or changed source lines (NCNCSLs). For a faster delivery cycle, the source code used for system test was updated every night throughout the test period. At the end of each of 198 calendar days in the test cycle, the number of faults found, NCNCSLs, and the staff time spent on testing were collected. Figure 4 (top) portrays growth of the system as a function of staff time. The data are provided in Table 3.
Figure 4. Plots of module size (NCNCSLs) versus staff time (days) for a large telecommunications software system (top). Observed and fitted cumulative faults versus staff time (bottom). The dotted line (barely visible) represents the fitted model, the solid line represents the observed data, and the dashed line (also difficult to see) is the extrapolation of the fitted model.
Table 3. Data on cumulative size (NCNCSLs), cumulative staff time (days), and cumulative faults for a large telecommunications system on 198 consecutive calendar days (with duplicate lines representing weekends or holidays).
Columns, repeated in three side-by-side groups: Cum. Staff Days, Cum. Faults, Cum. NCNCSLs
     0   0      0     334.8 231 261669     776.5 612 318476
   4.8   0  16012     342.7 243 262889     793.5 621 320125
     6   0  16012     350.5 252 263629     807.2 636 321774
     6   0  16012     356.3 259 264367     811.8 639 321774
  14.3   7  32027     360.6 271 265107     812.5 639 321774
  22.8   7  48042     365.7 277 265845       829 648 323423
  32.1   7  58854     365.7 277 265845     844.4 658 325072
  41.4   7  69669     365.7 277 265845     860.5 666 326179
  51.2  11  80483     374.9 282 266585     876.7 674 327286
  51.2  11  80483     386.5 290 267325       892 679 328393
  51.2  11  80483     396.5 300 268607     895.5 686 328393
  60.6  12  91295       408 310 269891     895.5 686 328393
    70  13 102110     417.3 312 271175     910.8 690 329500
  79.9  15 112925     417.3 312 271175     925.1 701 330608
  91.3  20 120367     417.3 312 271175     938.3 710 330435
    97  21 127812     424.9 321 272457       952 720 330263
    97  21 127812     434.2 326 273741       965 729 330091
    97  21 127812     442.7 339 275025     967.7 729 330091
    97  21 127812     451.4 346 276556     968.6 731 330091
 107.7  22 135257     456.1 347 278087     981.3 740 329919
 119.1  28 142702     456.1 347 278087       997 749 329747
 127.6  40 150147     456.1 347 278087    1013.9 759 330036
 135.1  44 152806     460.8 351 279618    1030.1 776 330326
 135.1  44 152806       466 356 281149      1044 781 330616
 135.1  44 152806     472.3 359 283592      1047 782 330616
 142.8  46 155464     476.4 362 286036      1047 782 330616
 148.9  48 158123     480.9 367 288480    1059.7 783 330906
 156.6  52 160781     480.9 367 288480    1072.6 787 331196
 163.9  52 167704     480.9 367 288480    1085.7 793 331486
 169.7  59 174626     486.8 374 290923    1098.4 796 331577
 170.1  59 174626     495.8 376 293367    1112.4 797 331669
 170.6  59 174626     505.7 380 295811    1113.5 798 331669
 174.7  63 181548       516 392 298254    1114.1 798 331669
 179.6  68 188473     526.2 399 300698      1128 802 331760
 185.5  71 194626     527.3 401 300698    1139.1 805 331852
   194  88 200782     527.3 401 300698    1151.4 811 331944
 200.3  93 206937     535.8 405 303142    1163.2 823 332167
 200.3  93 206937     546.3 415 304063    1174.3 827 332391
 200.3  93 206937     556.1 425 305009    1174.3 827 332391
 207.2  97 213093     568.1 440 305956    1174.3 827 332391
 211.9  98 219248     577.2 457 306902    1184.6 832 332615
   217 105 221355     578.3 457 306902    1198.3 834 332839
 223.5 113 223462     578.3 457 306902    1210.3 836 333053
   227 113 225568     587.2 467 307849    1221.1 839 333267
   227 113 225568     595.5 473 308795    1230.5 842 333481
   227 113 225568     605.6 480 309742    1231.6 842 333481
 234.1 122 227675     613.9 491 310688    1231.6 842 333481
 241.6 129 229784     621.6 496 311635    1240.9 844 333695
 250.7 141 233557     621.6 496 311635    1249.5 845 333909
 259.8 155 237330     621.6 496 311635    1262.2 849 335920
 268.3 166 241103     623.4 496 311635    1271.3 851 337932
 268.3 166 241103     636.3 502 311750    1279.8 854 339943
 268.3 166 241103     649.7 517 311866      1281 854 339943
 277.2 178 244879     663.9 527 312467      1281 854 339943
 285.5 186 247946     675.1 540 313069    1287.4 855 341955
 294.2 190 251016     677.4 543 313069    1295.1 859 341967
 295.7 190 251016     677.9 544 313069    1304.8 860 341979
   298 190 254086     688.4 553 313671    1305.8 865 342073
   298 190 254086     698.1 561 314273    1313.3 867 342168
   298 190 254086     710.5 573 314783    1314.4 867 342168
 305.2 195 257155     720.9 581 315294    1314.4 867 342168
 312.3 201 260225     731.6 584 315805      1320 867 342262
 318.2 209 260705     732.7 585 315805    1325.3 867 342357
 328.9 224 261188     733.6 585 315805    1330.6 870 342357
 334.8 231 261669     746.7 586 316316    1334.2 870 342358
 334.8 231 261669       761 598 316827    1336.7 870 342358
SOURCE: Dalal and McIntosh (1994).
Assume that the testing process is observed at times $t_i$, $i = 0, \ldots, h$, and that at any given time, the amount of time it takes to find a specific "bug" is exponential with rate $\mu$. At time $t_i$, the total number of faults remaining in the system is Poisson with mean $\lambda_{i+1}$, and NCNCSL is increased by an amount $C_i$. This change adds a Poisson number of faults with mean proportional to $C_i$, say $\theta C_i$. These assumptions lead to the mass balance equation, namely, that the expected number of faults in the system at $t_i$ (after possible modification) is the expected number of faults in the system at $t_{i-1}$ adjusted by the expected number found in the interval $(t_{i-1}, t_i)$ plus the faults introduced by the changes made at $t_i$:
$$\lambda_{i+1} = \lambda_i e^{-\mu (t_i - t_{i-1})} + \theta C_i,$$
for $i = 1, \ldots, h$. Note that $\theta$ represents the number of new faults entering the system per additional NCNCSL, and $\lambda_1$ represents the number of faults in the code at the start of system test. Both of these parameters make it possible to differentiate between the new code added in the current release and the older code. For the data at hand, the estimated parameters are $\theta = 0.025$, $\mu = 0.002$, and $\lambda_1 = 41$. The fitted and the observed data are plotted against staff time in Figure 4 (bottom). The fit is evidently very good. Of course, assessing the model on independent or new data is required for proper validation.
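The mass balance recursion is easy to step through numerically. The sketch below is a direct transcription of the recursion, not the estimation procedure of Dalal and McIntosh (1994), and the parameter values and observation times are small hypothetical numbers chosen so that the result can be checked by hand.

```python
import math


def expected_faults(lambda_1, mu, theta, times, code_added):
    """Iterate the mass balance recursion.

    times      -- observation times t_0, ..., t_h
    code_added -- C_1, ..., C_h, the NCNCSLs added at t_1, ..., t_h
    Returns the list of means lambda_1, ..., lambda_{h+1}.
    """
    means = [lambda_1]
    for i in range(1, len(times)):
        elapsed = times[i] - times[i - 1]
        surviving = means[-1] * math.exp(-mu * elapsed)  # faults not yet found
        means.append(surviving + theta * code_added[i - 1])  # plus new faults
    return means


# Hypothetical inputs: three observation times and two code deliveries.
trajectory = expected_faults(
    lambda_1=10.0, mu=0.1, theta=0.5, times=[0.0, 1.0, 2.0], code_added=[4, 6]
)
```

Each step discounts the current mean by the probability that a fault survives the testing interval and then adds the faults expected from the newly delivered code, so the mean can rise during testing whenever code is added faster than faults are found.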
The efficacy of creating a statistical model is now examined. The estimate of $\theta$ is highly significant, both statistically and practically, showing the need for incorporating changes in NCNCSLs as a covariate. Its numerical value implies that for every additional 10,000 NCNCSLs added to the system, 25 faults are being added as well. For these data, the predicted number of faults at the end of the test period is Poisson distributed with mean 145. Dividing this quantity by the total NCNCSLs gives 4.2 per 10,000 NCNCSLs as an estimated field fault density. These estimates of the incoming and outgoing quality are very valuable in judging the efficacy of system testing and for deciding where resources should be allocated to improve the quality. Here, for example, system testing was effective in that it removed 21 of every 25 faults. However, it raises another issue: 25 faults per 10,000 NCNCSLs entering system test may be too high and a plan ought to be considered to improve the incoming quality.
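The quality figures quoted here follow from simple arithmetic on numbers already given: the predicted mean of 145 remaining faults, the final cumulative NCNCSL count in Table 3, and the incoming rate of 25 faults per 10,000 NCNCSLs. The variable names below are introduced for illustration.

```python
remaining_faults = 145     # predicted mean at the end of the test period
total_ncncsl = 342358      # final cumulative NCNCSLs in Table 3
incoming_per_10k = 25      # faults entering system test per 10,000 NCNCSLs

# Estimated field fault density per 10,000 NCNCSLs.
field_density_per_10k = remaining_faults / total_ncncsl * 10_000

# Faults removed by system testing per 10,000 NCNCSLs, and as a fraction.
removed_per_10k = incoming_per_10k - field_density_per_10k
removal_fraction = removed_per_10k / incoming_per_10k
```

The density rounds to 4.2 and the number removed rounds to 21 of every 25, that is, a removal fraction of a little over 80 percent.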
None of the above conclusions could have been made without using a statistical model. These conclusions are valuable for controlling and improving the reliability testing process. Further, for this analysis it was essential to have a covariate other than time.
Influence of the Development Process on Software Dependability
As noted above, surprisingly little use has been made of explanatory variable models, such as proportional hazards regression, in the modeling of software dependability. A major reason, the panel believes, is the difficulty that software engineers have in identifying variables that can play a genuinely explanatory role. Another difficulty is the comparative paucity of data owing to the difficulties of replication. Thus, for example, for purposes of identifying those attributes of the software development process that are drivers of the final product's dependability, it is very difficult to obtain something akin to a "random sample" of "similar" subject programs. Those issues are not unlike the ones faced in other contexts where these techniques are used, for example, in medical trials, but they seem particularly acute for evaluation of software dependability.
A further problem is that the observable in this software development application is a realization of a stochastic process, and not merely of a lifetime random variable. Thus there seems to be an opportunity for research into models that, on the one hand, capture current understanding of the nature of the growth in reliability that takes place as a result of debugging and, on the other hand, allow input about the nature of the development process or the architecture of the product.
Influence of the Operational Environment on Software Dependability
It can be misleading to talk of the reliability of a program: as is the case for the reliability of hardware, the reliability of a program depends on the nature of its use. For software, however, one does not have the simple notions of stress that are sometimes plausible in the hardware context. It is thus not possible to infer the reliability of a program in one environment from evidence of the program's failure behavior in another. This is a serious difficulty for several reasons.
First,onewouldliketobeabletopredicttheoperationalreliabilityofaprogramfromtestdata.Thesimplestapproachatpresentistoensurethatthetestenvironment,thatis,thetypeofusage,isexactlysimilarto,ordiffersinknownproportionsforspecifiedstratafrom,theoperationalenvironment.Realsoftwaretestingregimesareoften
deliberatelymadetobedifferentfromoperationalones,sinceitisclaimedthatinthiswayreliabilitycanbeachievedmoreefficiently:thisargumentissimilartothatforhardwarestresstestingbutismuchlessconvincinginthesoftwarecontext.
Afurtherreasontobeinterestedinthisproblemofinferringprogramreliabilityisthatmostsoftwaregetsbroadlydistributedtodiverselocationsandisusedverydifferentlybydifferentusers:thereisgreatdisparityinthepopulationofuserenvironments.Vendorswouldliketobeabletopredictdifferentusers'perceptionsofaproduct'sreliability,butitisclearlyimpracticaltoreplicateinatesteverydifferentpossibleoperationalenvironment.Vendorswouldalsoliketobeabletopredictthecharacteristicsofapopulationofusers.Thusitmightbeexpectedthatalessdisparatepopulationofuserswouldbepreferabletoamoredisparateone:intheformercase,forexample,problemsreportedatdifferentsitesmightbesimilarandthusbelessexpensivetofix.
Explanatory variable modeling may play a useful role if suitably informative, measurable attributes of operational usage can be identified. There may be other ways of forming stochastic characterizations of operational environments. Markov models of the successive activation of modules, or of functions, have been proposed (Littlewood, 1979; Siegrist, 1988a,b) but have not been widely used. Further work on such approaches, and on the problems of statistical inference associated with them, could be promising.
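To make the idea of a Markov characterization of usage concrete, the sketch below treats module activation as a discrete-time Markov chain and attaches an assumed per-visit failure probability to each module. It is a simplified illustration, not the formulation of Littlewood (1979) or Siegrist (1988a,b); the module names, transition probabilities, and failure probabilities are all hypothetical.

```python
def run_reliability(transitions, fail_prob, start="main", exit_state="exit",
                    sweeps=10_000, tol=1e-12):
    """Probability that one run started in `start` reaches `exit_state` without failing.

    transitions[m] maps module m to {next_state: probability};
    fail_prob[m] is the assumed probability that a single visit to m fails.
    Solves r[m] = (1 - fail_prob[m]) * sum_s P(m -> s) * r[s] by repeated sweeps.
    """
    r = {m: 0.0 for m in transitions}
    r[exit_state] = 1.0
    for _ in range(sweeps):
        delta = 0.0
        for m, nxt in transitions.items():
            new = (1.0 - fail_prob[m]) * sum(p * r[s] for s, p in nxt.items())
            delta = max(delta, abs(new - r[m]))
            r[m] = new
        if delta < tol:
            break
    return r[start]


# Hypothetical per-visit failure probabilities: "report" is the weak module.
fail_prob = {"main": 0.0001, "parse": 0.001, "report": 0.02}

# Two hypothetical operational profiles for the same program.
profile_a = {"main": {"parse": 0.9, "report": 0.1},
             "parse": {"main": 0.5, "exit": 0.5},
             "report": {"exit": 1.0}}
profile_b = {"main": {"parse": 0.1, "report": 0.9},
             "parse": {"main": 0.5, "exit": 0.5},
             "report": {"exit": 1.0}}

rel_a = run_reliability(profile_a, fail_prob)
rel_b = run_reliability(profile_b, fail_prob)
print(rel_a, rel_b)
```

The same code yields a different reliability under each profile, which is the point made in the text: reliability is a property of the program together with its usage, not of the program alone.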
Safety-Critical Software and the Problem of Assuring Ultrahigh Dependability
It seems clear that computers will play increasingly critical roles in systems upon which human lives depend. Already, systems are being built that require extremely high dependability: a figure of $10^{-9}$ probability of failure per hour of flight has been stated as the requirement for recent fly-by-wire systems in civil aircraft. There are clear limitations to the levels of dependability that can be achieved when we are building systems of a complexity that precludes claims that they are free of design faults. More importantly, even if we were able to build a system to meet a requirement for ultrahigh dependability, we could have only low confidence that we had achieved that goal, because the problem of assessing these levels is such that it would be impractical to acquire sufficient supporting evidence (Littlewood and Strigini, 1993).
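A rough calculation indicates why sufficient supporting evidence is impractical to acquire. The sketch assumes a constant failure rate (exponential times to failure) and failure-free operational testing, so that the probability of seeing no failure in $t$ hours is $e^{-\lambda t}$; these modeling assumptions and the function name are illustrative and are not taken from the report.

```python
import math


def failure_free_hours_needed(target_rate_per_hour, confidence):
    """Hours of failure-free operation needed before a failure rate above the
    target can be rejected at the given confidence level, assuming a constant
    failure rate: solve exp(-rate * t) = 1 - confidence for t."""
    return -math.log(1.0 - confidence) / target_rate_per_hour


hours = failure_free_hours_needed(1e-9, 0.99)
years = hours / (24 * 365.25)
print(f"{hours:.3e} hours, roughly {years:,.0f} years of continuous testing")
```

Under these assumptions the required exposure is several billion failure-free hours, far beyond any feasible test campaign.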
Although a complete solution to the problem of assessing ultrahigh dependability is not anticipated, there is certainly room for improving on what can be done currently. Probabilistic and statistical problems abound in this area, and it is necessary to squeeze as much as possible from relatively small amounts of often disparate evidence. The following are some of the areas that could benefit from investigation.
Design Diversity, Fault Tolerance, and General Issues of Dependence
One promising approach to the problem of achieving high dependability (here reliability and/or safety) is design diversity: building two or more versions of the required program and allowing an adjudication mechanism (e.g., a voter) to operate at run-time. Although such systems have been built and are in operation in safety-critical contexts, there is little theoretical understanding of their behavior in operation. In particular, the reliability and safety models are quite poor.
For example, there is ample evidence (Knight and Leveson, 1986) that, in the presence of design faults, one cannot simply assume that different versions will fail independently of one another. Thus the simple hardware reliability models that involve mere redundancy, and assume independence of component failures, cannot be used. It is only quite recently that probability modeling has started to address this problem seriously (Eckhardt and Lee, 1985; Littlewood and Miller, 1989). These models provide a formal conceptual framework within which it is possible to reason about the subtle issues of conditional independence involved in the failure processes of design-diverse systems. However, they provide little quantitative practical assistance to a software designer or evaluator.
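The role of conditional independence can be illustrated with a small calculation in the spirit of Eckhardt and Lee (1985). Suppose each input class $x$ has a "difficulty" $\theta(x)$, the probability that a randomly developed version fails on it, and that versions fail independently given $x$. Then the probability that two versions both fail is $E[\theta^2] = (E[\theta])^2 + \mathrm{Var}(\theta)$, which exceeds the naive independence figure whenever difficulty varies across inputs. The input classes and numbers below are hypothetical.

```python
def pair_failure_probabilities(usage, difficulty):
    """Return (single-version failure prob, prob both of two versions fail,
    naive product assuming unconditional independence).

    usage[x]      : probability that an input falls in class x
    difficulty[x] : assumed probability that a randomly chosen version fails on x
    """
    mean_theta = sum(usage[x] * difficulty[x] for x in usage)
    both_fail = sum(usage[x] * difficulty[x] ** 2 for x in usage)
    return mean_theta, both_fail, mean_theta ** 2


# Hypothetical input classes: most inputs are easy, a few are hard for everyone.
usage = {"routine": 0.98, "boundary": 0.019, "rare_mode": 0.001}
difficulty = {"routine": 1e-5, "boundary": 1e-3, "rare_mode": 0.2}

single, both_fail, naive = pair_failure_probabilities(usage, difficulty)
print(single, both_fail, naive, both_fail / naive)
```

With these illustrative numbers the coincident-failure probability is orders of magnitude above the product of the single-version probabilities, even though the versions are independent conditional on the input.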
Further probabilistic modeling is needed to elucidate some of the complex issues. For example, little attention has been paid to modeling the full fault tolerant system, involving diversity and adjudication. In particular, the properties of the stochastic process of failures of such systems are not understood. If, as seems likely, individual versions of a program in a real-time control system exhibit clusters of failures in time, how does the cluster process of the system relate to the cluster processes of the individual versions? Although such issues seem narrowly technical, they are vitally important in the design of real systems, whose physical integrity may be sufficient to survive one or two failed input cycles, but not many.
Another area that has had little work is probabilistic modeling of different possible adjudication mechanisms and their failure processes.
Judgment and Decision-making Framework
Although probability seems to be the most appropriate mechanism for representing uncertainty about system dependability, other candidates such as Shafer-Dempster and possibility theories might be plausible alternatives in safety-critical contexts where quantitative measures are required in the absence of data, for example, when one is forced to rely on the engineering judgment of an expert. Further work is needed to elucidate the relative advantages and disadvantages of the different approaches applicable in the software engineering domain.
There is evidence that human judgment, even in "hard" sciences such as physics, can be seriously in error (Henrion and Fischhoff, 1986): people seem to make consistent errors and tend to be optimistic in their own judgment regarding their likely error. It is likely that software engineering judgments are similarly fallible, and so this area calls for some statistical experimentation. In addition, it would be beneficial to have formal mechanisms for assessing whether judgments are well calibrated and for recalibrating judgment and prediction schemes (of humans or models) that have been shown to be inaccurate. This problem has some similarity to the problems of validating software reliability models, already mentioned, in which prequential likelihood plays a vital role. It also bears on more general applications of Bayesian modeling where elicitation of a priori probability values is required.
It seems inevitable that reasoning and judgment about the fitness of safety-critical systems will depend on evidence that is disparate in nature. Such evidence could include data on failures, as in reliability growth models; human expert judgment; results regarding the efficacy of development processes; information about the architecture of a system; or evidence from formal verification. If the required judgment depends on a numerical assessment of a system's dependability, there are clearly important issues concerning the composition of very different kinds of evidence from different sources. These issues may, indeed, be overriding when it comes to choosing among the different ways of representing uncertainty. The Bayes theorem, for example, may provide an easier way than does possibility theory to combine information from different sources of uncertainty.
A particularly important problem concerns the way in which deterministic reasoning can be incorporated into the final assessment of a system. Formal methods of achieving dependability are becoming increasingly important. Such methods range from formal notations, which assist in the elicitation and expression of requirements, to full mathematical verification of the correspondence between a formal specification and an implementation. One view is that these approaches incorporating deterministic reasoning to system development remove a particular type of uncertainty, leaving others untouched (uncertainty about the completeness of a formal specification, the possibility of incorrect proof, and so on). One should factor into the final assessment of a system's dependability the contribution from such deterministic, logical evidence, nevertheless keeping in mind that there is an irreducible uncertainty in one's possible knowledge of the failure behavior of a system.
Structural Modeling Issues
Concerns about the safety and reliability of software-based systems necessarily arise from their inherent complexity and novelty. Systems now being built are so complex that they cannot be guaranteed to be free from design faults. The extent to which confidence can be carried over from the building of previous systems is much more limited in software engineering than in "real" engineering, because software-based systems tend to be characterized by a great deal of novelty.
Designers need help in making decisions throughout the design process, especially at the very highest level. Real systems are often difficult to assess because of early decisions regarding how much system control will depend on computers, hardware, and humans. For the Airbus A320, for example, the early decision to place a high level of trust in the computerized fly-by-wire system meant that this system (and thus its software) needed to have a better than $10^{-9}$ probability of failure in a typical flight. Stochastic modeling might aid in such high-level design decisions so that designers can make "what if" calculations at an early stage.
Experimentation, Data Collection, and General Statistical Techniques
A dearth of data has been a problem in much of safety-critical software engineering since its inception. Only a handful of published data sets exists even for the software reliability growth problem, which is by far the most extensively developed aspect of software dependability assessment. When the lack of data arises from the need for confidentiality (industrial companies are often reluctant to allow access to data on software failures because of the possibility that people may think less highly of their products), little can be done beyond making efforts to resolve confidentiality problems. However, in some cases the available data are sparse because there is no statistical expertise on hand to advise on ways in which data can be collected cost-effectively. It may be worthwhile to attempt to produce general guidelines for data collection that address the specific difficulties of the software engineering problem domain.
With notable exceptions (Eckhardt et al., 1991; Knight and Leveson, 1986), experimentation has so far played a low-key role in software engineering research. Somewhat surprisingly, in view of its difficulty and cost, the most extensive experimentation has investigated the efficacy of design diversity. Other areas where experimental approaches seem feasible and should be encouraged include the obvious and general question of which software development methods are most cost-effective in producing software products with desirable attributes such as dependability. Statistical advice on the design of such experiments would be essential; it might also be the case that innovation in the design of experiments could make feasible some investigations that currently seem too expensive to contemplate: the main problem arises from the need for replication over many software products.
On the other hand, areas where experiments can be conducted without the replication problem being overwhelming involve the investigation of quite restricted hypotheses about the effectiveness of specific techniques. For example, experimentation could address whether the techniques that are claimed to be effective for achieving reliability (i.e., effectiveness of debugging) are significantly better than those, such as operational testing, that will allow reliability to be measured.
SOFTWARE MEASUREMENT AND METRICS
Measurement is at the foundation of science and engineering. An important goal shared by software engineers and statisticians is to derive reliable, reproducible, and accurate measures of software products and processes. Measurements are important for assessing the effects of proposed "improvements" in software production, whether they be technological or process oriented. Measurements serve an equally important role in scheduling, planning, resource allocation, and cost estimation (see the first section in this chapter).
Early pioneering work by McCabe (1976) and Halstead (1977) seeded the field of software metrics; an overview is provided by Zuse (1991). Much of the attention in this area has focused on static measurements of code. Less attention has been paid to dynamic measurements of software (e.g., measuring the connectivity of software modules under operating conditions) and aspects of the software production process such as software reuse, especially in systems employing object-oriented languages.
The most widely used code metric, the NCSL (noncommentary source line), is often used as a surrogate for functionality. Surprisingly, since software is now nearly 50 years old, standards for counting NCSLs remain elusive in practice. For example, should a single, two-line statement in C language count as one NCSL or two?
Counts of tokens (operators or operands), delimiters, and branching statements are used as other static metrics. Although some of these are clearly measures of software size, others purport to measure more subtle notions of software complexity and structure. It has been observed that all such metrics are highly correlated with size. At the panel's information-gathering forum, Munson (1993) concluded that current software metrics capture approximately three "independent" features of a software module: program control, program size, and data structure. A statistical (principal-components) analysis of 13 metrics on HAL programs in the space shuttle program was the key to this finding. While one might argue that performing a common statistical decomposition of multivariate data is hardly novel, it most certainly is in software engineering. The important implication of that finding is that there are features of software that are not being captured by the existing battery of software metrics (e.g., cohesion and coupling) and if these are key differentiators of potentially high- and low-fault programs, there is no way that an analysis of the available metrics will highlight this condition. On the other side of the ledger, the statistical costs of including "noisy" versions of the same (latent) variable in models and analysis methods that are based on these metrics, such as cost estimation, seem not to have been appreciated. Subset selection methods (e.g., Mallows, 1973) provide one way to assess variable redundancy and the effect on fitted models, but other approaches that use judgment composites, or composites based on other bodies of data (Tukey, 1991), will often be more effective than discarding metrics.
Metrics typically involve processes or products, are subjective or objective, and involve different types of measurement scales, for example, nominal, ordinal, interval, or ratio. An objective metric is a measurement taken on a product or process, usually on an interval or ratio scale. Some examples include the number of lines of code, development time, number of software faults, or number of changes. A subjective metric may involve a classification or qualification based on experience. Examples include the quality of use of a method or the experience of the programmers in the application or process.
One standard for software measurement is the Basili and Weiss (1984) Goal/Question/Metric paradigm, which has five parameters:
1. An object of the study: a process, product, or any other experience model;
2. A focus: what information is of interest;
3. A point of view: the perspective of the person needing the information;
4. A purpose: how the information will be used; and
5. A determination of what measurements will provide the information that is needed.
The results are studied relative to a particular environment.
5 Statistical Challenges
In comparison with other engineering disciplines, software engineering is still in the definition stage. Characteristics of established disciplines include having defined, time-tested, credible methodologies for disciplinary practice, assessment, and predictability. Software engineering combines application domain knowledge, computer science, statistics, behavioral science, and human factors issues. Statistical research and education challenges in software engineering involve the following:
Generalizing particular experimental results to other settings and projects,
Scaling up results obtained in academic studies to industrial settings,
Combining information across software engineering projects and studies,
Adopting exploratory data analysis and visualization techniques,
Educating the software engineering community as to statistical approaches and data issues,
Developing analysis methods to cope with qualitative variables,
Providing models with the appropriate error distributions for software engineering applications, and
Improving accelerated life testing.
The following sections elaborate on certain of these challenges.
SOFTWARE ENGINEERING EXPERIMENTAL ISSUES
Software engineering is an evolutionary and experimental discipline. As argued forcefully by Basili (1993), it is a laboratory or experimental science. The term "experimental science" has different meanings for engineers and statisticians. For engineers, software is experimental because systems are built, studied, and evaluated based on theory. Each system investigates new ideas and advances the state of the art. For statisticians, the purpose of experiments is to gather statistically valid evidence about the effects of some factor, perhaps involving the process, methodology, or code in a system.
There are three classes of experiments in software engineering:
Case studies,
Academic experiments, and
Industrial experiments.
Case studies are perhaps the most common and involve an "experiment" on a single large-scale project. Academic experiments usually involve a small-scale experiment, often on a program or methodology, typically using students as the experimental subjects. Industrial experiments fall somewhere between case studies and academic experiments. Because of the expense and difficulty of performing extensive controlled experiments on software, case studies are often resorted to. The ideal situation is to be able to take advantage of real-world industrial operations while having as much control as is feasible. Much of the present work in this area is at best anecdotal and would benefit greatly from more rigorous statistical advice and control. The panel foresees an opportunity for innovative work on combining information (see below) from relatively disparate experiences.
Conducting statistically valid software experiments is challenging for several reasons:
The software production process is often chaotic and uncontrolled (i.e., immature);
Human variability is a complicating factor; and
Industrial experiments are very costly and therefore must produce something useful.
Many variables in the software production process are not well understood and are difficult to control for. For software engineering experiments, the factors of interest include the following:
"People" factors: number, level, organization, process experience;
Problem factors: application domain, constraints, susceptibility to change;
Process factors: life cycle model, methods, tools, programming language;
Product factors: deliverables, system size, system reliability, portability; and
Resource factors: target and development machines, calendar time, budget, existing software, and so on.
Each of these characteristics must be modeled or controlled for if the experiment is to be valid.
Human variability is particularly challenging, given that the difference in quality and productivity between the best and worst programmers may be 20 to 1. For example, in an experiment comparing batch versus interactive computing, Sackman (1970) observed differences in ability of up to 28 to 1 in programmers performing the same task. This variation can overwhelm the effects of a change in methodology that may account for a 10% to 15% difference in quality or productivity.
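A back-of-the-envelope sample-size calculation shows how subject variability of this size can swamp a 15% methodology effect. The sketch assumes a two-group comparison of log-productivity with a normal approximation, and it assumes (purely for illustration) that a 20-to-1 best-to-worst ratio spans about four standard deviations on the log scale. Neither assumption comes from the report, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def subjects_per_group(ratio_best_to_worst, effect_ratio, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a multiplicative effect
    with a two-sided two-sample z-test on log-productivity."""
    # Assumption: the best-to-worst ratio spans about 4 standard deviations.
    sigma = math.log(ratio_best_to_worst) / 4.0
    delta = math.log(effect_ratio)  # effect size on the log scale
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (z * sigma / delta) ** 2)


n_needed = subjects_per_group(20, 1.15)       # 20:1 spread, 15% effect
n_homogeneous = subjects_per_group(2, 1.15)   # hypothetical 2:1 spread, same effect
print(n_needed, n_homogeneous)
```

Under these assumptions several hundred programmers per group would be needed, which helps explain the appeal of designs in which each subject serves as his or her own control.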
The human factor is so strongly integrated with every aspect of the subjective discipline of software engineering that it alone is the prime driver of issues to be addressed. The human factor creates issues in the process, the product, and the user environment. Measurements of the objects (the product and the process) are obscured when qualified by the attributes (ambiguous requirements and productivity issues are key examples). Recognizing and characterizing the human attributes within the context of the software process are key to understanding how to include them in system and statistical models.
The capabilities of individuals strongly influence the metrics collected throughout the software production process. Capabilities include experience, intelligence, familiarity with the application domain, ability to communicate with others, ability to envision the problem spatially, and ability to verbally describe that spatial understanding. Although not scientifically founded, anecdotal information supports the incidence of these capabilities (Curtis, 1988).
For software engineering experiments, the key problems involve small sample sizes, high variability, many uncontrolled factors, and extreme difficulty in collecting experimental data. Traditional statistical experimental designs, originally developed for agricultural experiments, are not well suited for software engineering. At the panel's forum, Zweben (1993) discussed an interesting example of an experiment from object-oriented programming, involving a fairly complex design and analysis. Object-oriented programming is an approach that is sweeping the software industry, but for which much of the supporting evidence is anecdotal.
Example. The purpose of the software design and analysis experiment was to gather statistically valid evidence about the effect on effort and quality of using the principles of abstraction, encapsulation, and layering to enhance components of software systems. The experiment was divided into two types of tasks:
1. Enhancing an existing component to provide additional functionality, and
2. Modifying a component to provide different functionality.
The experimental subjects were students in graduate classes on software component design and development. The two approaches for this maintenance problem are "white box," which involves modifying the old code to get the new functionality, and "black box," which involves layering on the new functionality. The experiments were designed to detect, for each task, differences between the two approaches in the time required to make the modification and in the number of associated faults uncovered. Three experiments were conducted. Experiment A involved an unbounded queue component. The subjects were given a basic Ada package implementing enque, deque, and isempty, and the task was to implement the operators add, copy, clear, append, and reverse. The subject was instructed to keep track of the time spent in designing, coding, testing, and debugging each operator, and also the associated number of bugs uncovered in each task. The tasks were completed in two ways: by directly implementing new operations using the representation of the queue, and by layering on the new operators as capabilities. Experiment B involved a partial map component, and experiment C involved an almost constant map component. Given that in experiments involving students, the results may be invalidated by problems with data integrity, for this experiment the student participants were told that the results of the experiment would have no effect on course grades. The code was validated by an instructor to ensure that there were no lingering defects. The experimental plan was conducted using a crossover design. Each subject implemented the enhancements twice, using both the white box and the black box methods. This particular experimental design could test for the treatment (layering or not) effect and treatment by sequence interaction. The subject differences were nested within the sequences, and the sequences were counterbalanced based on experience level. The carryover effect of the first treatment influences the choice regarding the correct way of testing for treatment effects.
The statistical model used to represent the behavior in the number of bugs was sophisticated as well, an overdispersed loglinear model. The use of this model allowed for an analysis of nonnormal response data while also preventing invalid inferences that would have occurred had overdispersion not been taken into account. Indeed, only experiment B displayed a significant treatment effect after adjustment for overdispersion.
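The effect of an overdispersion adjustment can be sketched with a quasi-Poisson calculation: estimate a dispersion factor as the Pearson chi-square divided by its residual degrees of freedom, and inflate the Poisson standard error by its square root. The sketch uses a simple two-group loglinear model rather than the crossover model of the experiment, and the bug counts are hypothetical.

```python
import math


def quasi_poisson_two_group(counts_a, counts_b):
    """Log rate ratio (b versus a) with Poisson and dispersion-adjusted standard errors."""
    mean_a = sum(counts_a) / len(counts_a)
    mean_b = sum(counts_b) / len(counts_b)
    log_ratio = math.log(mean_b / mean_a)
    # Standard error of the log rate ratio under a pure Poisson model.
    se_poisson = math.sqrt(1.0 / sum(counts_a) + 1.0 / sum(counts_b))
    # Pearson chi-square against the fitted group means.
    pearson = (sum((y - mean_a) ** 2 / mean_a for y in counts_a)
               + sum((y - mean_b) ** 2 / mean_b for y in counts_b))
    df = len(counts_a) + len(counts_b) - 2
    phi = pearson / df  # estimated dispersion; 1 would mean no overdispersion
    se_adjusted = se_poisson * math.sqrt(max(phi, 1.0))  # never shrink the Poisson SE
    return log_ratio, se_poisson, se_adjusted, phi


# Hypothetical bug counts per subject under each approach.
white_box = [0, 1, 7, 2, 0, 9, 1, 4]
black_box = [0, 0, 3, 1, 0, 5, 0, 1]

log_ratio, se_poisson, se_adjusted, phi = quasi_poisson_two_group(white_box, black_box)
z_naive = log_ratio / se_poisson
z_adjusted = log_ratio / se_adjusted
print(phi, z_naive, z_adjusted)
```

With these illustrative counts the unadjusted test would declare a treatment effect, while the adjusted one would not, which mirrors the kind of invalid inference the text warns about.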
COMBINING INFORMATION
The results of many diverse software projects and studies tend to lead to more confusion than insight. The software engineering community would benefit if more value were gained from the work that is being done. To the extent that projects and studies focus on the same end point, statistics can help to fuse the independent results into a consistent and analytically justifiable story.
The statistical methodology that addresses the topic of how to fuse such independent results is relatively new and is termed "combining information"; a related set of tools is provided by meta-analysis. An excellent overview of this methodology was produced by a CATS panel and documented in an NRC report (NRC, 1992) that is now available as an American Statistical Association publication (ASA, 1993). The report documents various approaches to the problem of how to combine information and describes numerous specific applications. One of the recommendations made in it (p. 182) is crucial to achieving advances in software engineering:
The panel urges that authors and journal editors attempt to raise the level of quantitative explicitness in the reporting of research findings, by publishing summaries of appropriate quantitative measures on which the research conclusions are based (e.g., at a minimum: sample sizes, means, and standard deviations for all variables, and relevant correlation matrices).
It is not sensible to merely combine p-values from independent studies. It is clearly better to take weighted averages of effects when the weights account for differences in size and sensitivity across the studies to be combined.
Example. Kitchenham (1991) discusses an issue in cost estimation that involves looking across 10 different sources consisting of 17 different software projects. The issue is whether the exponent $\beta$ in the basic cost estimation model, $\text{effort} \propto \text{size}^{\beta}$, is significantly different from 1. The usual interpretation of $\beta$ is the "overhead introduced by product size," so that a value greater than 1 implies that relatively more effort is required to produce large software systems than to produce smaller ones. Many cite such "diseconomies of scale" in software production as evidence in support of their models and tools.
The 17 software projects are listed in Table 4. Fortunately, the cited sources contain both point estimates (b) of the exponent and its estimated standard error. These summary statistics can be used to estimate a common exponent and ultimately test the hypothesis that it is different from 1.
Table 4. Reported and derived data on 17 projects concerned with cost estimation.
Study          b      SE(b)  Var(b)    w
Bai-Bas        0.951  0.068  0.004624  21.240
Bel-Leh        1.062  0.101  0.010200  18.990
Your           0.716  0.230  0.052900  10.490
Wing           1.059  0.294  0.086440   7.758
Kemr           0.856  0.177  0.031330  13.550
Boehm.Org      0.833  0.184  0.033860  13.100
Boehm.semi     0.976  0.133  0.017690  16.630
Boehm.Emb      1.070  0.104  0.010820  18.770
Kit-Tay.ICL    0.472  0.323  0.104300   6.813
Kit-Tay.BTSX   1.202  0.300  0.090000   7.550
Kit-Tay.BTSW   0.495  0.185  0.034220  13.040
DS1.1          1.049  0.125  0.015630  17.220
DS1.2          1.078  0.105  0.011020  18.700
DS1.3          1.086  0.289  0.083520   7.938
DS2.New        0.178  0.134  0.017960  16.550
DS2.Ext        1.025  0.158  0.024960  14.830
DS3            1.141  0.077  0.005929  20.670
SOURCE: Reprinted, with permission, from Kitchenham (1992). (c) 1992 by National Computing Centre, Ltd.
Following the NRC recommendations on combining information across studies (NRC, 1992), the appropriate model (the so-called random effects model in meta-analysis) allows for a systematic difference between projects (e.g., bias in data reporting, management style, and so on) that averages to zero. Under this model, the overall exponent is estimated as a weighted average of the individual exponents, $\bar{b} = \sum_i w_i b_i / \sum_i w_i$, where the weights have the form $w_i = 1/[\mathrm{var}(b_i) + \tau^2]$ and the common between-project component of variance $\tau^2$ is estimated from the statistic $Q = \sum_i w_i (b_i - \bar{b})^2$, evaluated with $\tau^2 = 0$ in the weights. The statistic $Q$ is itself a test of the homogeneity of projects and under a normality assumption is distributed as $\chi^2_{k-1}$, where $k$ is the number of projects. For these data one obtains $Q = 55.19$, which strongly indicates heterogeneity across projects. Although the random effects model anticipates such heterogeneity, other approaches that model the differences between projects (e.g., regression models) may be more informative. Since no explanatory variables are available, this discussion proceeds using the simpler model.
The estimated between-project component of variance is $\tau^2 = 0.0425$, which is surprisingly large and is perhaps highly influenced by two projects with b's less than 0.5. Combining this estimate with the individual within-project variances leads to the weights given in the final column of Table 4. Thus the overall estimated exponent is $\bar{b} = 0.911$ with estimated standard error $s = 0.0640$ ($= \sqrt{1/\sum_i w_i}$). Combining these two estimates leads readily to a 95% confidence interval for $\beta$ of (0.78, 1.04). Thus the data in these studies do not support the diseconomies-of-scale argument.
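The random effects calculation can be sketched from the b and SE(b) columns of Table 4. The sketch assumes the between-project variance is obtained with the standard DerSimonian-Laird moment estimator, $\tau^2 = \max\{0, [Q - (k-1)] / [\sum_i w_i - \sum_i w_i^2 / \sum_i w_i]\}$ with $w_i = 1/\mathrm{var}(b_i)$. The text does not name the estimator, so this is an assumption, although it gives values close to those reported. The function name and the normal-theory interval are likewise illustrative.

```python
import math

# (study, b, SE(b)) copied from Table 4.
TABLE_4 = [
    ("Bai-Bas", 0.951, 0.068), ("Bel-Leh", 1.062, 0.101), ("Your", 0.716, 0.230),
    ("Wing", 1.059, 0.294), ("Kemr", 0.856, 0.177), ("Boehm.Org", 0.833, 0.184),
    ("Boehm.semi", 0.976, 0.133), ("Boehm.Emb", 1.070, 0.104),
    ("Kit-Tay.ICL", 0.472, 0.323), ("Kit-Tay.BTSX", 1.202, 0.300),
    ("Kit-Tay.BTSW", 0.495, 0.185), ("DS1.1", 1.049, 0.125), ("DS1.2", 1.078, 0.105),
    ("DS1.3", 1.086, 0.289), ("DS2.New", 0.178, 0.134), ("DS2.Ext", 1.025, 0.158),
    ("DS3", 1.141, 0.077),
]


def random_effects_summary(rows):
    """Return (Q, tau^2, pooled exponent, standard error) for a random effects model,
    using the DerSimonian-Laird moment estimator of tau^2 (an assumption here)."""
    b = [row[1] for row in rows]
    var = [row[2] ** 2 for row in rows]
    k = len(rows)
    # Step 1: homogeneity statistic Q with fixed-effects weights (tau^2 = 0).
    w0 = [1.0 / v for v in var]
    b_fixed = sum(w * bi for w, bi in zip(w0, b)) / sum(w0)
    q = sum(w * (bi - b_fixed) ** 2 for w, bi in zip(w0, b))
    # Step 2: moment estimate of the between-project variance.
    tau2 = max(0.0, (q - (k - 1)) / (sum(w0) - sum(w * w for w in w0) / sum(w0)))
    # Step 3: random-effects weights, pooled estimate, and its standard error.
    w = [1.0 / (v + tau2) for v in var]
    b_pooled = sum(wi * bi for wi, bi in zip(w, b)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return q, tau2, b_pooled, se


q, tau2, b_pooled, se = random_effects_summary(TABLE_4)
ci = (b_pooled - 1.96 * se, b_pooled + 1.96 * se)  # normal-theory 95% interval
print(q, tau2, b_pooled, se, ci)
```

The interval covers 1, consistent with the conclusion in the text that these data do not support the diseconomies-of-scale argument.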
Even better than published summaries would be a central repository of the data arising from a study. This information would allow assessment of various determinations of similarities between studies, as well as potential biases. The panel is aware of several initiatives to build such data repositories. The proposed National Software Council has as one of its primary responsibilities the construction and maintenance of a national software measurements database. At the panel's forum, a specialized database on software projects in the aeronautics industry was also discussed (Keller, 1993).
An issue related to combining information from diverse sources concerns the translation to industry of small experimental studies and/or published case studies done in an academic environment. Serious doubts exist in industry as to the upward scalability of most of these studies because populations, project sizes, and environments are all different. Expectations differ regarding quality, and it is unclear whether variables measured in a small study are the variables in which industry has an interest. The statistical community should develop stochastic models to propagate uncertainty (including variability assessment) on different control factors so that adjustments and predictions applicable to industry-level environments can be made.
VISUALIZATION IN SOFTWARE ENGINEERING
Scientific visualization is an emerging technology that is driven by ever-decreasing hardware prices and the associated increasing sophistication of visualization software. Visualization involves the interactive pictorial display of data using graphics, animation, and sound. Much of the recent progress in visualization has come from the application of computer graphics to three-dimensional image analysis and rendering. Data visualization, a subset of scientific visualization, focuses on the display and analysis of abstract data. Some of the earliest and best-known examples of data visualization involve statistical data displays.
The motivation for applying visualization to software engineering is to understand the complexity, multidimensionality, and structure embodied in software systems. Much of the original research in software visualization (the use of typography, graphic design, animation, and cinematography to facilitate the understanding and enhancement of software systems) was performed by computer scientists interested in understanding algorithms, particularly in the context of education. Applying the quantitative focus of statistical graphics methods to currently popular scientific visualization techniques is a fertile area for research.
Visualizing software engineering data is challenging because of the diversity of data sets associated with software projects. For data sets involving software faults, times to failure, cost and effort predictions, and so on, there is a clear statistical relationship of interest. Software fault density may be related to code complexity and to other software metrics. Traditional techniques for visualizing statistical data are designed to extract quantitative relationships between variables. Other software engineering data sets such as the execution trace of a program (the sequence of statements executed during a test run) or the change history of a file are not easily visualized using conventional data visualization techniques. The need for relevant techniques has led to the development of specialized domain-specific visualization capabilities peculiar to software systems. Applications include the following:
Configuration management data (Eick et al., 1992b),
Function call graphs (Ganser et al., 1993),
Code coverage,
Code metrics,
Algorithm animation (Brown and Hershberger, 1992; Stasko, 1993),
Sophisticated typesetting of computer programs (Baecker and Marcus, 1988),
Software development process,
Software metrics (Ebert, 1992), and
Software reliability models and data.
Some of these applications are discussed below.
Configuration Management Data
A rich software database suitable for visualization involves the code itself. In production systems, the source code is stored in configuration management databases. These databases contain a complete history of the code with every source code change recorded as a modification request. Along with the affected lines, the source code database usually contains other information such as the identity of the programmer making the changes, date the changes were submitted, reason for the change, and whether the change was meant to add functionality or fix a bug. The variables associated with source code may be continuous, categorical, or binary. For a line in a computer program, when it was written is (essentially) continuous, who wrote it is categorical, and whether or not the line was executed during a regression test is binary.
Example. Figure 1 (see "Implementation" in Chapter 3) shows production code written in C language from a module in AT&T's 5ESS switch (Eick, 1994). In the display, row color is tied to the code's age: the most recently added lines are in red and the oldest in blue, with a color spectrum in between. Dynamic graphics techniques are employed for increasing the effectiveness of the display. There are five interactive views of data in Figure 1:
1. The rows corresponding to the text lines,
2. The values on the color scale,
3. The file names above the columns,
4. The browser windows, and
5. The bar chart beneath the color scale.
Each of the views is linked, united through the use of color, and activated by using a mouse pointer. This mode of manipulating the display, called brushing by Becker and Cleveland (1987) and by Becker et al. (1987), is particularly effective for exploring software development data.
Function Call Graphs
Perhaps the most common visualization of software is a function call graph as shown in Figure 5. Function call graphs are a widely used, visual, tree-like display of the function calls in a piece of code. They show calling relationships between modules in a system and are one representation of software structure. A problem with function call graphs is that they become overloaded with too much information for all but the smallest systems. One approach to improving the usefulness of function call graphs might involve the use of dynamic graphics techniques to focus the display on the visually informative regions.
Test Code Coverage
Another interesting example of source code visualization involves showing test suite code coverage. Figure 6 shows the statement coverage and execution "hot spots" for a program that has been run through its regression test. The row indentation and line length have been turned off so that each line receives the same amount of visual space. The most frequently executed lines are shown in red and the least frequently in blue, with a color spectrum in between. There are two special colors: the black lines correspond to nonexecutable lines of C code such as comments, variable declarations, and functions, and the gray lines correspond to the executable lines of code that were not executed. These are the lines that the regression test missed.
Code Metrics
As discussed in Chapter 4 (in the section "Software Measurement and Metrics"), static code metrics attempt to quantify and measure the complexity of code. These metrics are used to identify portions of programs that are particularly difficult and are likely to be subject to defects. One visualization method for displaying code complexity metrics uses a space-filling representation (Baker and Eick, 1995). Taking advantage of the hierarchical structure of code, each subsystem, module, and file is tiled on the display, which shows them as nested, space-filling rectangles with area, color, and fill encoding software metrics. This technique can display the relative sizes of a system's components, the relative stability of the components, the location of new functionality, the location of error-prone code with many fixes to identified faults, and, using animation, the historical evolution of the code.
Example. Figure 7 displays the AT&T 5ESS switching code using the SeeSys™ system, a dynamic graphics metrics visualization system. Interactive controls enable the user to manipulate the display, reset the colors, and zoom in on particular modules and files, providing an interactive software data analysis environment. The space-filling representation:
Shows modules, files, and subsystems in context;
Provides an overview of a complete software system; and
Applies statistical dynamic graphics techniques to the problem of visualizing metrics.
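The idea of nested, space-filling rectangles whose area encodes a metric can be illustrated with a generic "slice-and-dice" layout. The sketch below is an assumption-laden illustration and not the SeeSys algorithm: the dictionary format, the function names, the alternating split direction, and the sample sizes are all hypothetical.

```python
def total_size(node):
    """Size of a node: its own size for a file, the sum of children otherwise."""
    children = node.get("children")
    if not children:
        return node["size"]
    return sum(total_size(child) for child in children)


def layout(node, x, y, w, h, depth=0, out=None):
    """Tile a hierarchy into the rectangle (x, y, w, h).

    Children share the parent's rectangle in proportion to their size.
    The split direction alternates with depth: left-to-right at even
    depths, top-to-bottom at odd depths.
    Returns a list of (name, x, y, width, height) tuples.
    """
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    total = sum(total_size(child) for child in children)
    offset = 0.0  # fraction of the parent already handed out
    for child in children:
        share = total_size(child) / total if total else 0.0
        if depth % 2 == 0:
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical system: sizes are noncommentary source lines per file.
system = {
    "name": "system",
    "children": [
        {"name": "A", "children": [
            {"name": "a1.c", "size": 300},
            {"name": "a2.c", "size": 100},
        ]},
        {"name": "B", "children": [
            {"name": "b1.c", "size": 600},
        ]},
    ],
}

rects = {name: (rx, ry, rw, rh)
         for name, rx, ry, rw, rh in layout(system, 0.0, 0.0, 100.0, 50.0)}
for name, rect in rects.items():
    print(name, rect)
```

In this layout each file's area is proportional to its size, so a1.c (300 of 1,000 lines) occupies 30 percent of the display. A second metric could then be mapped to the color or fill of each rectangle.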
A major difference in the use of graphics in scientific visualization and statistics is that for the former, graphs are the end, whereas for the latter, they are more often the means to an end. Thus visualizations of software are crucial to statistical software engineering to the extent that they facilitate description and modeling of software engineering data. Discussed below are some possibilities related to the examples described in this chapter.
The rainbow files in Figure 1 suggest that certain code is changed frequently. Frequently changed code is often error-prone, difficult to maintain, and problematic. Software engineers often claim that code, or people's understanding of it, decays with age. Eventually the code becomes unmaintainable and must be rewritten (re-engineered).
Statistical models are needed to characterize the normal rate of change and therefore determine whether the current files are unusual. Such models need to take account of the number of changes, locations of faults, type of functionality, past development patterns, and future trends. For example, a common software design involves having a simple main routine that calls on several other procedures to invoke needed functionality. The main routine may be changed frequently as programmers modify small snippets of code to access large chunks of new code that is put into other files. For this code, many simple, small changes are normal and do not indicate maintenance problems. If models existed, then it would be possible to make quantitative comparisons between files rather than the qualitative comparisons that are currently made.
Figure 6 suggests some natural covariates and models for improving the efficiency of software testing. Current compiler technology can easily analyze code to obtain the functions, lines, and even the paths executed by code in test suites. For certain classes of programming errors such as typographical errors, the incremental code coverage is an ideal covariate for estimating the probability of detecting an error. The execution frequency of blocks of code or functions is clearly related to the probability of error detection. Figure 6 shows clearly that small portions of the program are heavily exercised but that most of the code is not touched. In an indirect way operational profile testing attempts to capture this idea by testing the features, and therefore the code, in relation to how often they will be used. This notion suggests that statistical techniques involving covariates can improve the efficiency of software testing.
Figure 7 suggests novel ways of displaying software metrics. The current practice is to identify overly complex files for special care and management attention. The procedures for identifying complex code are often based on very clever and sophisticated arguments, but not on data. A statistical approach might attempt to correlate the complexity of code with the locations of past faults and investigate their predictive power. Statistical models that relate complexity metrics to actual faults will be of greater practical use for real-life systems. These models should not be developed in the absence of data about the code. Simple ways of presenting such data, such as an ordered list of fault density, file by file, can be very effective in guiding the selection of an appropriate model. In other cases, microanalysis, often driven by graphical browsers, might suggest a richer class of models that the data could support. For example, software fault rates are often quoted in terms of the number of faults per 1,000 noncommentary source lines (NCSL). The lines in Figure 1 can be color-coded to show the historical locations of past faults. In other representations (not shown), clear spatial patterns emerge, with faults concentrated in particular files and in particular regions of the files, suggesting that spatial models of fault density might work very well in helping to identify fault-prone code.
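The ordered, file-by-file list of fault density mentioned above is simple to produce. The following is a minimal sketch; the file names and counts are invented for illustration, and the function names are hypothetical.

```python
def fault_density(faults, ncsl):
    """Faults per 1,000 noncommentary source lines (NCSL)."""
    return 1000.0 * faults / ncsl


def rank_by_fault_density(files):
    """Return (file name, density) pairs, highest density first.

    files: dict mapping file name -> (fault count, NCSL).
    Files with no source lines are skipped to avoid dividing by zero.
    """
    rows = [(name, fault_density(faults, ncsl))
            for name, (faults, ncsl) in files.items() if ncsl > 0]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical data: file name -> (faults found, NCSL).
files = {
    "parse.c": (12, 4000),
    "io.c": (3, 6000),
    "main.c": (1, 500),
}

ranking = rank_by_fault_density(files)
for name, density in ranking:
    print(f"{name:10s} {density:5.2f} faults per 1,000 NCSL")
```

Note that the smallest file, main.c, ranks second even though it has only one fault. Densities for small files are noisy, which is one reason a statistical model is preferable to the raw ranking.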
Challenges for Visualization
The research opportunities and challenges in visualizing software data are similar to those for visualizing other large abstract databases:
1. Software data are abstract; there is no natural two-dimensional or three-dimensional representation of the data. A research challenge is to discover meaningful representations of the data that enable an analyst to understand the data in context.
2. Much software data are nontraditional statistical data such as the change history of source code, duplication in manuals, or the structure of a relational database. New metaphors must be discovered for the harmonious transfer of information.
3. The database associated with large software systems may be huge, potentially containing millions of observations. Effective statistical graphics techniques must be able to cope with the volume of data found in modern software systems.
4. The lack of easy-to-use software tools makes the development of high-quality custom visualizations particularly difficult. Currently, visualizations must be hand-coded in low-level languages such as C or C++. This is a time-consuming task that can be carried out only by the most sophisticated programmers.
Opportunities for Visualization
Visualizations associated with software involve the code itself, data associated with the system, the execution of the program, and the process for creating the system. Opportunities include the following:
1. Objects/Patterns. Object-oriented programming is rapidly becoming standard for development of new systems and is being retrofitted into existing systems. Effective displays need to be developed for understanding the inheritance (or dependency) structure, semantic relationships among objects, and the run-time life cycle of objects.
2. Performance. Software systems inevitably run too slowly, making run-time performance an important consideration. Host systems often collect large volumes of fine-grain (that is, low-level) performance data including function calling patterns, line execution counts, operating system page faults, heap usage, and stack space, as well as disk usage. Novel techniques to understand and digest dynamic program execution data would be immediately useful.
3. Parallelism. Recently, massively parallel computers with tens to thousands of cooperating processors have started to become widely available. Programming these computers involves developing new distributed algorithms that divide important computations among the processors. Most often an essential aspect of the computation involves communicating interim results between processors and synchronizing the computations. Visualization techniques are a crucial tool for enabling programmers to model and debug subtle computations.
4. Three-dimensional. Workstations capable of rendering realistic three-dimensional displays are rapidly becoming widely available at reasonable prices. New visualization techniques leveraging three-dimensional capabilities should be developed to enable software engineers to cope with the ever-increasing complexity of modern software systems.
Figure 5. Function call graphs showing the calling pattern between procedures. The top panel shows an interpretable, easy-to-comprehend display, whereas the bottom panel is overly busy and visually confusing.
Figure 6. A SeeSoft™ display showing code coverage for a program executing its regression test. The color of each line is determined by the number of times that it executed. The colors range from red (the "hot spots") to deep blue (for code executed only once) using a red-green-blue color spectrum. There are two special colors: the black lines are non-executable lines of code such as variable declarations and comments, and the gray lines are the non-executed (not covered) lines. The figure shows that generating regression tests with high coverage is quite difficult. Source: Eick (1994).
Figure 7. A display of software metrics for a million-line system. The rectangle forming the outermost boundary represents the entire system. The rectangles contained within the boundary represent the size (in NCSLs) of individual subsystems (each labeled with a single character A-Z, a-t), and modules within the subsystems. Color is used here to redundantly encode size according to the color scheme in the slider at the bottom of the screen.
ORTHOGONAL DEFECT CLASSIFICATION
The primary focus of software engineering is to monitor a software development process with a view toward improving quality and productivity. For improving quality, there have been two distinct approaches. The first considers each defect as unique and tries to identify a cause. The second considers a defect as a sample from an ensemble to which a formal statistical reliability model is fitted. Chillarege et al. (1992) proposed a new methodology that strikes a balance between these two ends of the spectrum. This method, called orthogonal defect classification, is based on exploratory data analysis techniques and has been found to be quite useful at IBM. It recognizes that the key to improving a process is to quantify various cause-and-effect relationships involving defects.
The basic approach is as follows. First, classify defects into various types. Then, obtain a distribution of the types across different development phases. Finally, having created these reference distributions and the relationships among them, compare them with the distributions observed in a new product or release. If there are discrepancies, take corrective action.
Operationally, the defects are classified according to eight "orthogonal" (mutually exclusive) defect types: functional, assignment, interface, checking, timing, build/package/merge, data structures and algorithms, and documentation. Further, development phases are divided into four basic stages (where defects can be observed): design, unit test, function test, and system test. For each stage and each defect type, a range of acceptable baseline defect rates is defined by experience. This information is used to improve the quality of a new product or release. Toward this end, for a given defect type, defect distributions across development stages are compared with the baseline rates. For each chain of results (say, too high early on, lower later, and high at the end), an implication is derived. For example, the implication may be that function testing should be revamped.
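The comparison of observed defect distributions with baseline ranges can be sketched as follows. This is a hypothetical illustration, not the procedure of Chillarege et al. (1992): the baseline ranges and defect counts are invented, and the "rate" is assumed here to be the share of a stage's defects that belong to the given type.

```python
STAGES = ["design", "unit test", "function test", "system test"]


def stage_shares(counts, defect_type):
    """Share of each stage's defects that are of the given type.

    counts: dict mapping stage -> dict mapping defect type -> count.
    """
    shares = []
    for stage in STAGES:
        stage_counts = counts.get(stage, {})
        total = sum(stage_counts.values())
        shares.append(stage_counts.get(defect_type, 0) / total if total else 0.0)
    return shares


def result_chain(counts, baseline, defect_type):
    """Label each stage "low", "ok", or "high" against the baseline range.

    baseline: dict mapping (defect type, stage) -> (lowest, highest)
              acceptable share, assumed to come from past experience.
    """
    chain = []
    for stage, share in zip(STAGES, stage_shares(counts, defect_type)):
        lowest, highest = baseline[(defect_type, stage)]
        if share > highest:
            chain.append("high")
        elif share < lowest:
            chain.append("low")
        else:
            chain.append("ok")
    return chain


# Hypothetical baseline ranges for the "function" defect type.
baseline = {
    ("function", "design"): (0.30, 0.50),
    ("function", "unit test"): (0.20, 0.40),
    ("function", "function test"): (0.10, 0.30),
    ("function", "system test"): (0.00, 0.20),
}

# Hypothetical defect counts observed in a new release.
observed = {
    "design": {"function": 6, "interface": 4},
    "unit test": {"function": 3, "assignment": 7},
    "function test": {"function": 1, "checking": 19},
    "system test": {"function": 5, "timing": 5},
}

chain = result_chain(observed, baseline, "function")
print(dict(zip(STAGES, chain)))
```

A chain such as high, ok, low, high for function defects is the kind of pattern from which an implication would then be drawn by the development team, for instance that function testing let too many such defects through to system test.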
This methodology has been extended to a study of the distribution of triggers, that is, the conditions that allow a defect to surface. First, it is implicit in this approach that there is no substitute for a good data analysis. Second, assumptions clearly are being made about the stationarity of reference distributions, an approach that may be appropriate for a stable environment with similar projects. Thus, it may be necessary to create classes of reference distributions and classes of similar projects. Perhaps some clustering techniques may be valuable in this context. Third, although the defect types are mutually exclusive, it is possible that a fault may result in many defects, and vice versa. This multiple-spawning may cause serious implementation difficulties. Proper measurement protocols may diminish such multipropagation. Finally, given good-quality data, it may be possible to extend orthogonal defect classification to efforts to identify risks in the production of software, perhaps using data to provide early indicators of product quality and potential problems concerning scheduling. The potential of this line of inquiry should be carefully investigated, since it could open up an exciting new area in software engineering.
6 Summary and Conclusions
In the 1950s, as the production line was becoming the standard for hardware manufacturing, Deming showed that statistical process control techniques, invented originally by Shewhart, were essential to controlling and improving the production process. Deming's crusade has had a lasting impact in Japan and has changed its worldwide competitive position. It has also had a global impact on the use of statistical methods, the training of statisticians, and so forth.
In the 1990s the emphasis is on software, as complex hardware-based functionality is being replaced by more flexible, software-based functionality. Small programs created by a few programmers are being superseded by massive software systems containing millions of lines of code created by many programmers with different backgrounds, training, and skills. This is the world of so-called software factories. These factories at present do not fit the traditional model of (hardware) factories and more closely resemble the development effort that goes into designing new products. However, with the spread of software reuse, the increasing availability of tools for automatically capturing requirements, generating code and test cases, and providing user documentation, and the growing reliance on standardized tuning and installation processes and standardized procedures for analysis, the model is moving closer to that of a traditional factory. The economy of scale that is achievable by considering software development as a manufacturing process, a factory, rather than a handcrafting process, is essential for preserving U.S. competitive leadership. The challenge is to build these huge systems in a cost-effective manner. The panel expects this challenge to concern the field of software engineering for the rest of the decade. Hence, any set of methodologies that can help in meeting this challenge will be invaluable. More importantly, the use of such methodologies will likely determine the competitive positions of organizations and nations involved in software production.
With the amount of variability involved in the software production process and its many subprocesses, as well as the diversity of developers, users, and uses, it is unlikely that a deterministic control system will help improve the software production process. As in statistical physics, only a technology based on statistical modeling, something akin to statistical control, will work. The panel believes that the juncture at hand is not very different from the one reached by Deming in the 1950s when he began to popularize the concept of statistical process control. What is needed now is a detailed understanding by statisticians of the software engineering process, as well as an appreciation by software engineers of what statisticians can and cannot do. If collaborative interactions and the building of this mutual understanding can be cultivated, then the impact is likely to be of the same order of magnitude as that of Deming's introduction of statistical process control techniques in hardware manufacturing.
Of course, this is not to say that all software problems are going to be solved by statistical means, just as not all automobile manufacturing problems can be solved by statistical means. On the contrary, the software industry has been technology driven, and the bulk of future gains in productivity will come from new, creative ideas. For example, much of the gain in productivity between 1950 and 1970 occurred because of the replacement of assembler coding by high-level languages.
Nevertheless, as the panel attempts to point out in this report, increased collaboration between software engineers and statisticians holds much promise for resolving problems in software development. Some of the catalysts that are essential for this interaction to be productive, as well as some of the related research opportunities for software engineers and statisticians, are discussed below.
INSTITUTIONAL MODEL FOR RESEARCH
The panel strongly believes that the right model for statistical research in software development is collaborative in nature. It is essential to avoid solving the "wrong" problems. It is equally important that the problems identified in this report not be "solved" by statisticians in isolation. Statisticians need to attain a degree of credibility in software engineering, and such credibility will not be achieved by developing N new reliability models with high-power asymptotics. The ideal collaboration partners statisticians and software engineers in work aimed at improving a real software process or product.
This conclusion assumes not only that statisticians and software engineers have a mutual desire to work together to solve software engineering problems, but also that funding and reward mechanisms are in place to stimulate the technical collaboration. Up to now, such incentives have not been the norm in academic institutions, given that, for example, coauthored papers have been generally discounted by promotion evaluation committees. Moreover, at funding agencies, proposals for collaborative work have tended to "fall through the cracks" because of a lack of interdisciplinary expertise to evaluate their merits. The panel expects such barriers to be reduced in the coming years, but in the interim, industry can play a leadership role in nurturing collaborations between software engineers and statisticians and can reduce its own set of barriers (for instance, those related to proprietary and intellectual property interests).
MODEL FOR DATA COLLECTION AND ANALYSIS
As discussed above in this report, for statistical approaches to be useful, it is essential that high-quality data be available. Quality includes measuring the right things at the right time; specifically, adopted software metrics must be relevant for each of the important stages of the development life cycle, and the protocol of metrics for collecting data must be well defined and well executed. Without careful preparation that takes account of all of these data issues, it is unlikely that statistical methods will have any impact on a given software project under study. For this reason, it is crucial to have the software industry take a lead position in research on statistical software engineering.
Figure 8, a model for the interaction between researchers and the software development process, displays a high-level spiral view of the software development process offered by Dalal et al. (1994). Figure 9 gives a more detailed view of the statistical software engineering module (SSEM) at the center of Figure 8.
Figure 8. Spiral software development process model. SSEM, statistical software engineering module.
Figure 9. Statistical software engineering module at stage n.
The SSEM has several components. One of its major functions is to act as the central repository for all relevant project data (statistical or nonstatistical). Thus this module serves as a resource for the entire project, interfacing with every stage, typically at its review or conclusion. For example, the SSEM would be used at the requirement review stage, when data on inspection, faults, times, effort, and coverage are available. For testing, information would be gathered at the end of each stage of testing (unit, integration, system, alpha, beta, ...) about the number of open faults, closed faults, types of problems, severity, changes, and effort. Such data would come from test case management systems, change management systems, and configuration management systems.
Additional elements of the SSEM include collection protocols, metrics, exploratory data analysis (EDA), modeling, confirmatory analysis, and conclusions. A critical part of the SSEM would be related to root-cause analysis. Analysis could be as simple as Ishikawa's fishbone diagram (Ishikawa, 1976), or more complex, such as orthogonal defect classification (described in Chapter 5). This capability accords with the belief that a careful analysis of root cause is essential to improving the software development process. Central placement of the SSEM ensures that the results of various analyses will be communicated at all relevant stages. For example, at the code review stage, the SSEM can suggest ways of improving the requirement process as well as point out potentially error-prone parts of the software for testing.
ISSUES IN EDUCATION
Enormous opportunities and many potential benefits will arise if the software engineering community learns about relevant statistical methods and if statisticians contribute to and cooperate in the education of future software engineers. The areas outlined below are those that are relevant today. As the community matures in its statistical sophistication, the areas themselves should evolve to reflect the maturation process.
Designed experiments. Software engineering is inherently experimental, yet relatively few designed experiments have been conducted. Software engineering education programs must stress the desirability, wherever feasible, of validating new techniques through the use of statistically valid, designed experiments. Part of the reason for the lack of experimentation in software engineering may involve the large variability in human programming capabilities. As pointed out in Chapter 5, the most talented programmer may be 20 times more productive than the least talented. This disparity makes it difficult to conduct experiments because the between-subject variability tends to overwhelm the treatment effects. Experimental designs that address broad variability in subjects should be emphasized in the software engineering curriculum. A similar emphasis should be given to random- and fixed-effects models with hierarchical structure and to distinguishing within- and between-experiment variability.
There is also a role for the statistics profession in the development of guidelines for experiments in software engineering akin to those mandated by the Food and Drug Administration for clinical trials. These guidelines will require reformulation in the software engineering context with the possible involvement of various industry and academic forums, including the Institute of Electrical and Electronics Engineers, the American Statistical Association, and the Software Engineering Institute.
Exploratory data analysis. It is important to appreciate the strengths and the limitations of available data by challenging the data with a battery of numerical, tabular, and graphical methods. Exploratory data analysis methods (e.g., Tukey, 1977; Mosteller and Tukey, 1977) are essentially "model free," so that investigators can be surprised by unexpected behavior rather than have their thinking constrained by what is expected. One of the attitudes toward statistical analysis that is important to convey is that of
data = fit + residual.
The iterative nature of improving the model fit by removing structure from the residuals must be stressed in discussions of statistical modeling.
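A small numerical illustration of "data = fit + residual" may help. The sketch below fits a straight line by ordinary least squares to invented data that actually follow a curve; the data, the function names, and the choice of a straight-line first fit are assumptions made only for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def decompose(xs, ys):
    """Split the data into fitted values and residuals."""
    intercept, slope = fit_line(xs, ys)
    fit = [intercept + slope * x for x in xs]
    residual = [y - f for y, f in zip(ys, fit)]
    return fit, residual


# Hypothetical data that follow y = x squared, not a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0]

fit, residual = decompose(xs, ys)
print("fit:     ", fit)
print("residual:", residual)
```

The residuals are positive at both ends and negative in the middle. That leftover curvature is structure the straight line failed to capture, so the next iteration would add a curved term to the fit and examine the new residuals.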
Modeling. The models used by statisticians differ dramatically from those used by nonstatisticians. The differences stem from advances in the statistical community in the past decade that effectively relax assumptions of linearity for nearly all classical techniques. This relaxation is obtained by assuming only local linearity and using smoothing techniques (e.g., splines) to regularize the solutions (Hastie and Tibshirani, 1990). The result is a class of quite flexible but interpretable models that are relatively unknown outside the statistics community. Arguably these more recent methods lack the well-studied inferential properties of classical techniques, but that drawback is expected to be remedied in coming years. Educational information exchanges should be conducted to stimulate more frequent and wider use of such comparatively recent techniques.
Risk analysis. Software systems are often used in conjunction with other software and hardware systems. For example, in telecommunications, an originating call is connected by switching software; however, the actual connection is made by physical cables, transmission cells, and other components. The megasystems thus created run our nation's telephone systems, stock markets, and nuclear power plants. Failures can be very expensive, if not catastrophic. Thus, it is essential to have software and hardware systems built in such a way that they can tolerate faults and provide minimal functionality, while precluding a catastrophic failure. This type of system robustness is related to so-called fault-tolerant design of software (Leveson, 1986).
Risk analysis has played a key role in identifying fault-prone components of hardware systems and has helped in managing the risks associated with very complex hardware-software systems. A paradigm suggested by Dalal et al. (1989) for risk management for the space shuttle program and corresponding statistical methods are important in this context. For software systems, risk analysis typically begins with identifying programming styles, characteristics of the modules responsible for most software faults, and so on. Statistical analysis of root-cause data leads to a risk profile for a system and can be useful in risk reduction. Risk management also involves consideration of the probability of occurrence of various failure scenarios. Such probabilities are obtained either by using the Delphi method (e.g., Dalkey, 1972; Pill, 1971) or by analyzing historical data. One of the key requirements in failure-scenario analysis is to dynamically update information about the scenarios as new data on system behavior become available, such as a changing user profile.
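One simple way to update a scenario probability as new data arrive is a Beta-binomial update. This is offered only as one possible approach, not as the method of the papers cited above: the class name, the prior, and the observed counts are all hypothetical, and the prior is assumed to summarize an elicited (for example, Delphi) judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a failure probability, held as a Beta(alpha, beta) distribution."""
    alpha: float  # prior "pseudo-count" of failures
    beta: float   # prior "pseudo-count" of failure-free demands

    @property
    def mean(self):
        """Expected failure probability under the current belief."""
        return self.alpha / (self.alpha + self.beta)

    def update(self, failures, successes):
        """Return the belief after observing new demands on the system."""
        return BetaBelief(self.alpha + failures, self.beta + successes)


# Hypothetical prior: experts judge the scenario to occur about once
# per 100 demands, with the weight of roughly 100 observations.
prior = BetaBelief(alpha=1.0, beta=99.0)

# Hypothetical new data: 3 failures in 100 demands under a changed user profile.
posterior = prior.update(failures=3, successes=97)

print(f"prior mean:     {prior.mean:.3f}")
print(f"posterior mean: {posterior.mean:.3f}")
```

Because each update returns a new belief, the same step can be repeated whenever another batch of operational data becomes available.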
Attitude toward assumptions. As software engineers are aware, a major difference between statistics and mathematics is that for the latter, it matters only that assumptions be correctly stated, whereas for the former, it is essential that the prevailing assumptions be supported by the data. This distinction is important, but unfortunately it is often taken too literally by many who use statistical techniques. Tukey has long argued that what is important is not so much that assumptions are violated but rather that their effect on conclusions is well understood. Thus for a linear model, where the standard assumptions include normality, homoscedasticity, and independence, their importance to statements of inference is exactly in the opposite order. Statistics textbooks, courses, and consulting activities should convey the statistician's level of understanding of and perspective on the importance of assumptions for statistical inference methods.
Visualization. The importance of plotting data in all aspects of statistical work cannot be overemphasized. Graphics are important in exploratory stages to ascertain how complex a model the data can support; in the analysis stage for display of residuals to examine what a currently entertained model has failed to account for; and in the presentation stage where graphics can provide succinct and convincing summaries of the statistical analysis and associated uncertainty. Visualization can also help software engineers cope with, and understand, the huge quantities of data collected in the software development process.
Tools. Software engineers tend to think of statisticians as people who know how to run a regression software package. Although statisticians prefer to think of themselves more as problem solvers, it is still important that they point out good statistical computing tools (for instance, S, SAS, GLIM, RS1, and so on) to software engineers. A CATS report (NRC, 1991) attempts to provide an overview of statistical computing languages, systems, and packages, but for such material to be useful to software engineers, a more focused overview will be required.
References
Abdel-Ghaly, A.A., P.Y. Chan, and B. Littlewood. 1986. Evaluation of competing software reliability predictions. IEEE Trans. Software Eng. SE-12(9):950-967.
Abdel-Hamid, T. 1991. Software Project Dynamics: An Integrated Approach. Englewood Cliffs, N.J.: Prentice-Hall.
American Heritage Dictionary of the English Language, The. 1981. Boston: Houghton Mifflin.
American Statistical Association (ASA). 1993. Combining Information: Statistical Issues and Opportunities for Research, Contemporary Statistics Series, No. 1. Alexandria, Va.: American Statistical Association.
Baecker, R.M. and A. Marcus. 1988. Human Factors and Typography for More Readable Programs. Reading, Mass.: Addison Wesley.
Baker, M.J. and S.G. Eick. 1995. Space-filling software displays. J. Visual Languages Comput. 6(2). In press.
Basili, V. 1993. Measurement, analysis and modeling, and experimentation in software engineering. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.
Basili, V. and D. Weiss. 1984. A methodology for collecting valid software engineering data. IEEE Trans. Software Eng. SE-10:6.
Becker, R.A. and W.S. Cleveland. 1987. Brushing scatterplots. Technometrics 29:127-142.
Becker, R.A., W.S. Cleveland, and A.R. Wilks. 1987. Dynamic graphics for data analysis. Statistical Science 2:355-383.
Beckman, R.J. and M.D. McKay. 1987. Monte Carlo estimation under different distributions using the same simulation. Technometrics 29:153-160.
Blum, M., M. Luby, and R. Rubinfeld. 1989. Program result checking against adaptive programs and in cryptographic settings. Pp. 107-118 in Distributed Computing and Cryptography, J. Feigenbaum and M. Merritt, eds. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, Vol. 2. Providence, R.I.: American Mathematical Society.
Blum, M., M. Luby, and R. Rubinfeld. 1990. Self-testing/correcting with applications to numerical problems. STOC 22:73-83.
Boehm, B.W. 1981. Software Engineering Economics. Englewood Cliffs, N.J.: Prentice-Hall.
Brocklehurst, S. and B. Littlewood. 1992. New ways to get accurate reliability measures. IEEE Software 9(4):34-42.
Brown, M.H. and J. Hershberger. 1992. Color and sound in algorithm animation. IEEE Computer 25(12):52-63.
Burnham, K.P. and W.S. Overton. 1978. Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika 45:343-359.
Chillarege, R., I. Bhandari, J. Chaar, M. Halliday, D. Moebus, B. Ray, and M. Wong. 1992. Orthogonal defect classification-A concept for in-process measurements. IEEE Trans. Software Eng. SE-18:943-955.
Cohen, D.M., S.R. Dalal, A. Kaija, and G. Patton. 1994. The automatic efficient test generator (AETG) system. Pp. 303-309 in Proceedings of the 5th International Symposium on Software Reliability Engineering. Los Alamitos, Calif.: IEEE Computer Society Press.
Curtis, W. 1988. The impact of individual differences in programmers. Pp. 279-294 in Working with Computers: Theory versus Outcome, G.C. van der Veer et al., eds. San Diego, Calif.: Academic Press.
Dalal, S.R. and C.L. Mallows. 1988. When should one stop software testing? J. Am. Stat. Assoc. 83:872-879.
Dalal, S.R. and C.L. Mallows. 1990. Some graphical aids for deciding when to stop testing software. IEEE J. Selected Areas in Communications 8:169-175. (Special issue on software quality and productivity.)
Dalal, S.R. and C.L. Mallows. 1992. Buying with exact confidence. Ann. Appl. Probab. 2:752-765.
Dalal, S.R. and A.M. McIntosh. 1994. When to stop testing for large software systems with changing code. IEEE Trans. Software Eng. SE-20:318-323.
Dalal, S.R., E.B. Fowlkes, and A.B. Hoadley. 1989. Risk analysis of the space shuttle: Pre-Challenger prediction of failure. J. Am. Stat. Assoc. 84:945-957.
Dalal, S.R., J.R. Horgan, and J.R. Kettenring. 1994. Reliable software and communication II: Controlling the software development process. IEEE J. Selected Areas in Communications 12:33-39.
Dalkey, N.C. 1972. Studies in the Quality of Life-Delphi and Decision-Making. Lexington, Mass.: D.C. Heath & Co.
Dawid, A.P. 1984. Statistical theory: The prequential approach. J. R. Stat. Soc. London A 147:278-292.
DeMillo, R.A., D.S. Guindi, K.S. King, W.M. McCracken, and A.J. Offutt. 1988. An extended overview of the MOTHRA mutation system. Pp. 142-151 in Proceedings of the Second Workshop on Software Testing, Verification and Analysis. Alberta, Canada: Banff.
Ebert, C. 1992. Visualization techniques for analyzing and evaluating software measures. IEEE Trans. Software Eng. 11(18):1029-1034.
Eckhardt, D.E. and L.D. Lee. 1985. A theoretical basis of multiversion software subject to coincident errors. IEEE Trans. Software Eng. SE-11:1511-1517.
Eckhardt, D.E., A.K. Caglayan, J.C. Knight, L.D. Lee, D.F. McAllister, M.A. Vouk, and J.P. Kelly. 1991. An experimental evaluation of software redundancy as a strategy for improving reliability. IEEE Trans. Software Eng. SE-17(7):692-702.
Eick, S.G. 1994. Graphically displaying text. J. Comput. Graphical Stat. 3(2):127-142.
Eick, S.G., C.R. Loader, M.D. Long, S.A. Vander Wiel, and L.G. Votta. 1992a. Estimating software fault content before coding. Pp. 59-65 in Proceedings of the 14th International Conference on Software Engineering (Melbourne, Australia). Los Alamitos, Calif.: IEEE Computer Society Press.
Eick, S.G., J.L. Steffen, and E.E. Sumner. 1992b. SeeSoft-A tool for visualizing line oriented software. IEEE Trans. Software Eng. 11(18):957-968.
Gansner, E.R., E.E. Koutsofios, S.C. North, and K.-P. Vo. 1993. A technique for drawing directed graphs. IEEE Trans. Software Eng. SE-19(3):214-230.
Halstead, M.H. 1977. Elements of Software Science. New York: Elsevier.
Hastie, T.J. and R.J. Tibshirani. 1990. Generalized Additive Models. London: Chapman & Hall.
Henrion,M.andB.Fischhoff.1986.Assessinguncertaintyinphysicalconstants.Am.J.Phys.54(9):791-798.
Horgan,J.R.andS.London.1992.ATAC:AdataflowtestingtoolforC.Pp.2-10inProceedingsoftheSecondSymposiumonAssessmentofQualitySoftwareDevelopmentTools(May27-29,1992,NewOrleans,La.),E.Nahouraii,ed.LosAlamitos,Calif.:IEEEComputerSocietyPress.
Humphrey,W.S.1988.Characterizingthesoftwareprocess:Amaturityframework.IEEESoftware5:73-79.
Humphrey,W.S.1989.ManagingtheSoftwareProcess.Reading,Mass.:AddisonWesley.
Iman,R.L.andW.J.Conover.1982.Adistributionfreeapproachtoinducingrankcorrelationsamonginputvariables.Commun.Stat.,PartB11:311-334.
InstituteofElectricalandElectronicsEngineers(IEEE).1990.IEEEStandardGlossaryofSoftwareEngineeringTerminology.IEEEStd.610.12-1990.NewYork:IEEE,Inc.
InstituteofElectricalandElectronicsEngineers(IEEE).1993.IEEEStandardforSoftwareProductivityMetrics.IEEEComputerSociety,IEEEStd.1045-1992,January11,1993.NewYork:IEEE,Inc.
Ishikawa,K.1976.GuidetoQualityControl.Tokyo,Japan:AsianProductivityOrganization.
Kahneman,D.,P.Slovic,andA.Tversky,eds.1982.JudgmentUnderUncertainty:HeuristicsandBiases.NewYork:CambridgeUniversityPress.
Keller,T.W.1993.Maintenanceprocessmetricsforspaceshuttle
flightsoftware.UnpublishedpaperpresentedatForumonStatisticalMethodsinSoftwareEngineering,October11-12,1993,NationalResearchCouncil,Washington,D.C.
Kitchenham,B.1991.Nevermindthemetrics;whataboutthenumbers!Pp.28-37inFormalAspectsofMeasurement,T.Denvir,R.Herman,andR.W.Whitty,eds.ProceedingsoftheBCS-FACSWorkshop,May5,1991,SouthBankUniversity,London.NewYork:Springer-Verlag.
Kitchenham,B.1992.AnalyzingSoftwareData.MetricsClubReport.Manchester,England:NationalComputingCentre,Ltd.
Knight,J.C.andN.G.Leveson.1986.Experimentalevaluationoftheassumptionofindependenceinmultiversionsoftware.IEEETrans.SoftwareEng.SE-12(1):96-109.
Lee,D.andM.Yanakakis.1992.On-lineminimizationoftransitionsystems.Pp.264-274inProceedingsofthe24thAnnualACMSymposiumonTheoryofComputing.NewYork:AssociationforComputingMachinery.
Leveson,N.G.1986.Softwaresafety:why,what,andhow.ACMComput.Surveys8:125-163.
Lipton, R. 1989. New directions in testing. Pp. 191-202 in Distributed Computing and Cryptography, J. Feigenbaum and M. Merritt, eds. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, Vol. 2. Providence, R.I.: American Mathematical Society.
Littlewood, B. 1979. Software reliability model for modular program structure. IEEE Trans. Reliability R-28(3):241-246.
Littlewood, B. and D.R. Miller. 1989. Conceptual modeling of coincident failures in multiversion software. IEEE Trans. Software Eng. SE-15(12):1596-1614.
Littlewood, B. and L. Strigini. 1993. Validation of ultra-high dependability for software-based systems. Communications of the Association for Computing Machinery 36(11):69-80.
Mallows, C.L. 1973. Some comments on Cp. Technometrics 15:661-667.
McCabe, T.J. 1976. A complexity measure. IEEE Trans. Software Eng. SE-1(3):312-327.
McKay, M.D., W.J. Conover, and R.J. Beckman. 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239-245.
Mosteller, F. and J.W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, Mass.: Addison Wesley.
Munson, J.C. 1993. The relationship between software metrics and quality metrics. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.
National Research Council (NRC). 1991. The Future of Statistical Software. Committee on Applied and Theoretical Statistics, Board on Mathematical Sciences. Washington, D.C.: National Academy Press.
National Research Council (NRC). 1992. Combining Information: Statistical Issues and Opportunities for Research. Committee on Applied and Theoretical Statistics, Board on Mathematical Sciences. Washington, D.C.: National Academy Press. (Reprinted in 1993 by the American Statistical Association as Volume 1 in the ASA Contemporary Statistics series.)
Nayak, T.K. 1988. Estimating population size by recapture sampling. Biometrika 75:113-120.
Phadke, M.S. 1993. Robust design method for software engineering. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.
Pill, J. 1971. The Delphi method: Substance, context, a critique and an annotated bibliography. Socio-Economic Planning Science 5:57-71.
Randell, B. and P. Naur, eds. 1968. Software Engineering Concepts and Techniques. NATO Science Committee, Proceedings of the NATO Conferences, October 7-11, 1968, Garmisch, Germany. New York: Petrocelli/Charter.
Sackman, H. 1970. Man-Computer Problem-Solving: Experimental Evaluation of Time-Sharing and Batch Processing. New York: Auerbach.
Siegrist, K. 1988a. Reliability of systems with Markov transfers of control. IEEE Trans. Software Eng. SE-14(7):1049-1053.
Siegrist, K. 1988b. Reliability of systems with Markov transfers of control, II. IEEE Trans. Software Eng. SE-14(10):1478-1480.
Singpurwalla, N.D. 1991. Determining an optimal time interval for testing and debugging software. IEEE Trans. Software Eng. 17(4):313-319.
Smith, A.F.M. and G.O. Roberts. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J. R. Stat. Soc. London B 55(1):3-23.
Stasko, J. 1993. Software visualization. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.
Stein, M. 1987. Large sample properties of simulations using Latin hypercube sampling. Technometrics 29:143-151.
Tukey, J.W. 1977. Exploratory Data Analysis. Reading, Mass.: Addison Wesley.
Tukey, J.W. 1991. Use of many covariates in clinical trials. Int. Stat. Rev. 59(2):123-128.
Vander Wiel, S.A. and L.G. Votta. 1993. Assessing software designs using capture-recapture methods. IEEE Trans. Software Eng. SE-19(11):1045-1054.
Zuse, H. 1991. Software Complexity: Measures and Methods. Berlin: de Gruyter.
Zweben, S. 1993. Statistical methods in a study of software re-use principles. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.
Appendix: Forum Program
MONDAY, OCTOBER 11, 1993
8:00 AM Welcome and Introductions
8:05 AM Session on Software Process
Session Chair: Gloria J. Davis (NASA-Ames Research Center)
Invited Speakers: Ted W. Keller (IBM Corporation), David Card (Computer Sciences Corporation)
9:45 AM Break
10:15 AM Session on Software Metrics
Session Chair: Bill Curtis (Carnegie Mellon University)
Invited Speakers: Victor R. Basili (University of Maryland), John C. Munson (University of Florida)
NOON Break
1:00 PM Session on Software Dependability and Testing
Session Chair: Richard A. DeMillo (Purdue University)
Invited Speakers: John C. Knight (University of Virginia), Richard Lipton (Princeton University)
2:25 PM Break
3:15 PM Session on Case Studies
Session Chair: Daryl Pregibon (AT&T Bell Laboratories, Murray Hill)
Invited Speakers: Tsuneo Yamaura (Hitachi Computer Products-America, Inc.), Stuart Zweben (Ohio State University)
5:00 PM Adjourn