Statistical software engineering

title: Statistical Software Engineering

author:

publisher: National Academies Press

isbn10 | asin: 0309053447

print isbn13: 9780309053440

ebook isbn13: 9780585002101

language: English

subject: Software engineering--Statistical methods.

publication date: 1996

lcc: QA76.758.N38 1996eb

ddc: 005.1

The National Research Council established the Board on Mathematical Sciences in 1984. The objectives of the Board are to maintain awareness and active concern for the health of the mathematical sciences and to serve as the focal point in the National Research Council for issues connected with the mathematical sciences. The Board holds symposia and workshops and prepares reports on emerging issues and areas of research and education, conducts studies for federal agencies, and maintains liaison with the mathematical sciences communities, academia, professional societies, and industry.

The Board gratefully acknowledges ongoing core support from the Air Force Office of Scientific Research, Army Research Office, Department of Energy, National Science Foundation, National Security Agency, and Office of Naval Research.


Statistical Software Engineering

Panel on Statistical Methods in Software Engineering

Committee on Applied and Theoretical Statistics

Board on Mathematical Sciences

Commission on Physical Sciences, Mathematics, and Applications

National Research Council

National Academy Press

Washington, D.C. 1996


NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine.

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Bruce Alberts is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievement of engineers. Dr. Harold Liebowitz is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Kenneth I. Shine is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Bruce Alberts and Dr. Harold Liebowitz are chairman and vice-chairman, respectively, of the National Research Council.

This project was supported by the Advanced Research Projects Agency, Army Research Office, National Science Foundation, and Department of the Navy's Office of the Chief of Naval Research. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors. Furthermore, the content of the report does not necessarily reflect the position or the policy of the U.S. government, and no official endorsement should be inferred.

Copyright 1996 by the National Academy of Sciences. All rights reserved.

Library of Congress Catalog Card Number 95-71101

International Standard Book Number 0-309-05344-7

Additional copies of this report are available from: National Academy Press, Box 285, 2101 Constitution Avenue, N.W., Washington, D.C. 20055

800-624-6242

202-334-3313 (in the Washington metropolitan area)

B-676

Printed in the United States of America


PANEL ON STATISTICAL METHODS IN SOFTWARE ENGINEERING

DARYL PREGIBON, AT&T Bell Laboratories, Chair

HERMAN CHERNOFF, Harvard University

BILL CURTIS, Carnegie Mellon University

SIDDHARTHA R. DALAL, Bellcore

GLORIA J. DAVIS, NASA-Ames Research Center

RICHARD A. DEMILLO, Bellcore

STEPHEN G. EICK, AT&T Bell Laboratories

BEV LITTLEWOOD, City University, London, England

CHITOOR V. RAMAMOORTHY, University of California, Berkeley

Staff

JOHN R. TUCKER, Director


COMMITTEE ON APPLIED AND THEORETICAL STATISTICS

JON R. KETTENRING, Bellcore, Chair

RICHARD A. BERK, University of California, Los Angeles

LAWRENCE D. BROWN, University of Pennsylvania

NICHOLAS P. JEWELL, University of California, Berkeley

JAMES D. KUELBS, University of Wisconsin

JOHN LEHOCZKY, Carnegie Mellon University

DARYL PREGIBON, AT&T Bell Laboratories

FRITZ SCHEUREN, George Washington University

J. LAURIE SNELL, Dartmouth College

ELIZABETH THOMPSON, University of Washington

Staff

JACK ALEXANDER, Program Officer


BOARD ON MATHEMATICAL SCIENCES

AVNER FRIEDMAN, University of Minnesota, Chair

LOUIS AUSLANDER, City University of New York

HYMAN BASS, Columbia University

MARY ELLEN BOCK, Purdue University

PETER E. CASTRO, Eastman Kodak Company

FAN R.K. CHUNG, University of Pennsylvania

R. DUNCAN LUCE, University of California, Irvine

SUSAN MONTGOMERY, University of Southern California

GEORGE NEMHAUSER, Georgia Institute of Technology

ANIL NERODE, Cornell University

INGRAM OLKIN, Stanford University

RONALD F. PEIERLS, Brookhaven National Laboratory

DONALD ST. P. RICHARDS, University of Virginia

MARY F. WHEELER, Rice University

WILLIAM P. ZIEMER, Indiana University

Ex Officio Member

JON R. KETTENRING, Bellcore, Chair, Committee on Applied and Theoretical Statistics

Staff

JOHN R. TUCKER, Director

JACK ALEXANDER, Program Officer

RUTH E. O'BRIEN, Staff Associate

BARBARA W. WRIGHT, Administrative Assistant


COMMISSION ON PHYSICAL SCIENCES, MATHEMATICS, AND APPLICATIONS

ROBERT J. HERMANN, United Technologies Corporation, Chair

STEPHEN L. ADLER, Institute for Advanced Study

PETER M. BANKS, Environmental Research Institute of Michigan

SYLVIA T. CEYER, Massachusetts Institute of Technology

L. LOUIS HEGEDUS, W.R. Grace and Company

JOHN E. HOPCROFT, Cornell University

RHONDA J. HUGHES, Bryn Mawr College

SHIRLEY A. JACKSON, U.S. Nuclear Regulatory Commission

KENNETH I. KELLERMANN, National Radio Astronomy Observatory

KEN KENNEDY, Rice University

THOMAS A. PRINCE, California Institute of Technology

JEROME SACKS, National Institute of Statistical Sciences

L.E. SCRIVEN, University of Minnesota

LEON T. SILVER, California Institute of Technology

CHARLES P. SLICHTER, University of Illinois at Urbana-Champaign

ALVIN W. TRIVELPIECE, Oak Ridge National Laboratory

SHMUEL WINOGRAD, IBM T.J. Watson Research Center

CHARLES A. ZRAKET, Mitre Corporation (retired)

NORMAN METZGER, Executive Director


Preface

The development and the production of high-quality, reliable, complex computer software have become critical issues in the enormous worldwide computer technology market. The capability to efficiently engineer computer software development and production processes is central to the future economic strength, competitiveness, and national security of the United States. However, problems related to software quality, reliability, and safety persist, a prominent example being the failure on several occasions of major local and national telecommunications networks. It is now acknowledged that the costs of producing and maintaining software greatly exceed the costs of developing, producing, and maintaining hardware. Thus the development and application of cost-saving tools, along with techniques for ensuring quality and reliability in software engineering, are primary goals in today's software industry. The enormity of this software production and maintenance activity is such that any tools contributing to serious cost savings will yield a tremendous payoff in absolute terms.

At a meeting of the Committee on Applied and Theoretical Statistics (CATS) of the National Research Council (NRC), participants identified software engineering as an area presenting numerous opportunities for fruitful contributions from statistics and offering excellent potential for beneficial interactions between statisticians and software engineers that might promote improved software engineering practice and cost savings. To delineate these opportunities and focus attention on contexts promising useful interactions, CATS convened a study panel to gather information and produce a report that would (1) exhibit improved methods for assessing software productivity, quality, reliability, associated risk, and safety and for managing software development processes, (2) outline a program of research in the statistical sciences and their applications to software engineering with the aim of motivating and attracting new researchers from the mathematical sciences, statistics, and software engineering fields to tackle these important and pressing problem areas, and (3) emphasize the relevance of using rigorous statistical and probabilistic techniques in software engineering contexts and suggest opportunities for further research in this direction.

To help identify important issues and obtain a broad range of perspectives on them, the panel organized an information-gathering forum on October 11-12, 1993, at which 12 invited speakers addressed how statistical methods impinge on the software development process, software metrics, software dependability and testing, and software visualization. The forum also included consideration of nonstandard methods and select case studies (see the forum program in the appendix). The panel hopes that its report, which is based on the panel's expertise as well as information presented at the forum, will contribute to positive advances in software engineering and, as a subsidiary benefit, be a stimulus for other closely related disciplines, e.g., applied mathematics, operations research, computer science, and systems and industrial engineering. The panel is, in fact, very enthusiastic about the opportunities facing the statistical community and hopes to convey this enthusiasm in this report.

The panel gratefully acknowledges the assistance and information provided by a number of individuals, including the 12 forum speakers (T.W. Keller, D. Card, V.R. Basili, J.C. Munson, J.C. Knight, R. Lipton, T. Yamaura, S. Zweben, M.S. Phadke, E.E. Sumner, Jr., W. Hill, and J. Stasko), four anonymous reviewers, the NRC staff of the Board on Mathematical Sciences who supported the various facets of this project, and Susan Maurizi for her work in editing the manuscript.


Contents

EXECUTIVE SUMMARY

1 INTRODUCTION

2 CASE STUDY: NASA SPACE SHUTTLE FLIGHT CONTROL SOFTWARE
Overview of Requirements
The Operational Life Cycle
A Statistical Approach to Managing the Software Production Process
Fault Detection
Safety Certification

3 A SOFTWARE PRODUCTION MODEL
Problem Formulation and Specification of Requirements
Design
Implementation
Testing

4 CRITIQUE OF SOME CURRENT APPLICATIONS OF STATISTICS IN SOFTWARE ENGINEERING
Cost Estimation
Statistical Inadequacies in Estimating
Process Volatility
Maturity and Data Granularity
Reliability of Model Inputs
Managing to Estimates
Assessment and Reliability
Reliability Growth Modeling
Influence of the Development Process on Software Dependability
Influence of the Operational Environment on Software Dependability
Safety-Critical Software and the Problem of Assuring Ultrahigh Dependability
Design Diversity, Fault Tolerance, and General Issues of Dependence
Judgment and Decision-making Framework
Structural Modeling Issues
Experimentation, Data Collection, and General Statistical Techniques
Software Measurement and Metrics

5 STATISTICAL CHALLENGES
Software Engineering Experimental Issues
Combining Information
Visualization in Software Engineering
Configuration Management Data
Function Call Graphs
Test Code Coverage
Code Metrics
Challenges for Visualization
Opportunities for Visualization
Orthogonal Defect Classification

6 SUMMARY AND CONCLUSIONS
Institutional Model for Research
Model for Data Collection and Analysis
Issues in Education

REFERENCES

APPENDIX: FORUM PROGRAM


Executive Summary

Software, a critical core industry that is essential to U.S. interests in science, technology, and defense, is ubiquitous in today's society. Software coexists with hardware in our transportation, communication, financial, and medical systems. As these systems grow in size and complexity and our dependence on them increases, the need to ensure software reliability and safety, fault tolerance, and dependability becomes paramount. Building software is now viewed as an engineering discipline, software engineering, which aims to develop methodologies and procedures to control the whole software development process. Besides the issue of controlling and improving software quality, the issue of improving the productivity of the software development process is also becoming important from the industrial perspective.

PURPOSE AND SCOPE OF THIS STUDY

Although statistical methods have a long history of contributing to improved practices in manufacturing and in traditional areas of science, technology, and medicine, they have up to now had little impact on software development processes. This report attempts to bridge the islands of knowledge and experience between statistics and software engineering by enunciating a new interdisciplinary field: statistical software engineering. It is hoped that the report will help seed the field of statistical software engineering by indicating opportunities for statistical thinking to contribute to increased understanding of software and software production, and thereby enhance the quality and productivity of both.

This report is the result of a study by a panel convened by the Committee on Applied and Theoretical Statistics (CATS), a standing committee of the Board on Mathematical Sciences of the National Research Council, to identify challenges and opportunities in the development and implementation of software involving significant statistical content. In addition to pointing out the relevance of rigorous statistical and probabilistic techniques to pressing software engineering concerns, the panel outlines opportunities for further research in the statistical sciences and their applications to software engineering. The aim is to motivate new researchers from statistics and the mathematical sciences to tackle problems with relevance for software development, as well as to suggest a statistical approach to software engineering concerns that the panel hopes software engineers will find refreshing and stimulating. This report also touches on important issues in training and education for software engineers in the statistical sciences and for statisticians with an interest in software engineering.

Central to this report's theme, and essential to statistical software engineering, is the role of data: wherever data are used or can be generated in the software life cycle, statistical methods can be brought to bear for description, estimation, and prediction. Nevertheless, the major obstacle to applying statistical methods to software engineering is the lack of consistent, high-quality data in the resource-allocation, design, review, implementation, and test stages of software development. Statisticians interested in conducting research in software engineering must play a leadership role in justifying that resources are needed to acquire and maintain high-quality and relevant data.

The panel conjectures that the use of adequate metrics and data of good quality is the primary differentiator between successful, productive software development organizations and those that are struggling. Although the single largest area of overlap between statistics and software engineering currently concerns software development and production, it is the panel's view that the largest contributions of statistics to software engineering will be those affecting the quality and productivity of front-end processes, that is, processes that precede code generation. One of the biggest impacts that the statistical community can make in software engineering is to combine information across software engineering projects as a means of evaluating effects of technology, language, organization, and process.

CONTENTS OF THIS REPORT

Following an introductory chapter intended to familiarize readers with basic statistical software engineering concepts and concerns, a case study of the National Aeronautics and Space Administration (NASA) space shuttle flight control software is presented in Chapter 2 to illustrate some of the statistical issues in software engineering. Chapter 3 describes a well-known general software production model and associated statistical issues and approaches. A critique of some current applications of statistics in software engineering is presented in Chapter 4. Chapter 5 discusses a number of statistical challenges arising in software engineering, and the panel's closing summary and conclusions appear in Chapter 6.

STATISTICAL CHALLENGES

In comparison with other engineering disciplines, software engineering is still in the definition stage. Characteristics of established disciplines include having defined, tested, credible methodologies for practice, assessment, and predictability. Software engineering combines application domain knowledge, computer science, statistics, behavioral science, and human factors issues. Statistical challenges in software engineering discussed in this report include the following:

Generalizing particular statistical software engineering experimental results to other settings and projects,

Scaling up results obtained in academic studies to industrial settings,

Combining information across software engineering projects and studies,

Adopting exploratory data analysis and visualization techniques,

Educating the software engineering community regarding statistical approaches and data issues,

Developing methods of analysis to cope with qualitative variables,

Providing models with the appropriate error distributions for software engineering applications, and

Enhancing accelerated life testing.

SUMMARY AND CONCLUSIONS

In the 1990s, complex hardware-based functionality is being replaced by more flexible, software-based functionality, and massive software systems containing millions of lines of code are being created by many programmers with different backgrounds, training, and skills. The challenge is to build huge, high-quality systems in a cost-effective manner. The panel expects this challenge to preoccupy the field of software engineering for the rest of the decade. Any set of methodologies that can help in this task will be invaluable. More importantly, the use of such methodologies will likely determine the competitive positions of organizations and nations involved in software production. What is needed is a detailed understanding by statisticians of the software engineering process, as well as an appreciation by software engineers of what statisticians can and cannot do.

Catalysts essential for this productive interaction between statisticians and software engineers, and some of the interdisciplinary research opportunities for software engineers and statisticians, include the following:

A model for statistical research in software engineering that is collaborative in nature. The ideal collaboration partners statisticians, software engineers, and a real software process or product. Barriers to academic reward and recognition, as well as obstacles to the funding of cross-disciplinary research, can be expected to decrease over time; in the interim, industry can play a leadership role in nurturing collaborations between software engineers and statisticians and can reduce its own set of barriers (for instance, those related to proprietary and intellectual property interests).

A model for data collection and analysis that ensures the availability of high-quality data for statistical approaches to issues in software engineering. Careful attention to data issues ranging from definition of metrics to feedback/feed-forward loops, including exploratory data analysis, statistical modeling, defect analysis, and so on, is essential if statistical methods are to have any appreciable impact on a given software project under study. For this reason it is crucial that the software industry take a lead position in research on statistical software engineering.

Attention to relevant issues in education. Enormous opportunities and many potential benefits are possible if the software engineering community learns about relevant statistical methods and if statisticians contribute to and cooperate in the education of future software engineers. Some relevant areas include:


Designed experiments. Software engineering is inherently experimental, yet relatively few designed experiments have been conducted. Software engineering education programs must stress the desirability, where feasible, of validating new techniques using statistically valid designed experiments.

Exploratory data analysis. Exploratory data analysis methods are essentially "model free," whereby the investigator hopes to be surprised by unexpected behavior rather than having thinking constrained to what is expected.

Modeling. Advances in the statistical community in the past decade have effectively relaxed the linearity assumptions of nearly all classical techniques. There should be an emphasis on educational information exchange leading to more and wider use of these recently developed techniques.

Risk analysis. A paradigm for managing risk for the space shuttle program, discussed in Chapter 2 of this report, and the corresponding statistical methods can play a crucial role in identifying risk-prone parts of software systems and of combined hardware and software systems.

Attitude toward assumptions. Software engineers should be aware that violating assumptions is not as important as thoroughly understanding the violation's effects on conclusions. Statistics textbooks, courses, and consulting activities should convey the statistician's level of understanding about and perspective on the importance and implications of assumptions for statistical inference methods.

Visualization. Graphics is important in exploratory stages in helping to ascertain how complex a model the data ought to support; in the analysis stage, in which residuals are displayed to examine what the currently entertained model has failed to account for; and in the presentation stage, in which graphics can provide succinct and convincing summaries of the statistical analysis and the associated uncertainty. Visualization can help software engineers cope with, and understand, the huge quantities of data collected as part of the software development process.

Tools. It is important to identify good statistical computing tools for software engineers. An overview of statistical computing, languages, systems, and packages should be done that is focused specifically for the benefit of software engineers.


1 Introduction

statistics. The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling. [1]

software engineering. (1) The application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software. (2) The study of approaches as in (1). [2]

statistical software engineering. The interdisciplinary field of statistics and software engineering specializing in the use of statistical methods for controlling and improving the quality and productivity of the practices used in creating software.

The above definitions describe the islands of knowledge and experience that this report attempts to bridge. Software is a critical core industry that is essential to U.S. national interests in science, technology, and defense. It is ubiquitous in today's society, coexisting with hardware (micro-electronic circuitry) in our transportation, communication, financial, and medical systems. The software in a modern cardiac pacemaker, for example, consists of approximately one-half megabyte of code that helps control the pulse rate of patients with heart disorders. In this and other applications, issues such as reliability and safety, fault tolerance, and dependability are obviously important. From the industrial perspective, so also are issues concerned with improving the quality and productivity of the software development process. Yet statistical methods, despite the long history of their impact in manufacturing as well as in traditional areas of science, technology, and medicine, have as yet had little impact on either hardware or software development.

This report is the product of a panel convened by the Board on Mathematical Sciences' Committee on Applied and Theoretical Statistics (CATS) to identify challenges and opportunities in software development and implementation that have a significant statistical component. In attempting to identify interrelated aspects of statistics and software engineering, it enunciates a new interdisciplinary field: statistical software engineering. While emphasizing the relevance of applying rigorous statistical and probabilistic techniques to problems in software engineering, the panel also points out opportunities for further research in the statistical sciences and their applications to software engineering. Its hope is that new researchers from statistics and the mathematical sciences will thus be motivated to address relevant and pressing problems of software development and also that software engineers will find the statistical emphasis refreshing and stimulating. This report also addresses the important issues of training and education of software engineers in the statistical sciences and of statisticians with an interest in software engineering.

[1] See The American Heritage Dictionary of the English Language (1981).

[2] See Institute of Electrical and Electronics Engineers (1990).

At the panel's information-gathering forum in October 1993, 12 invited speakers described their views on topics that are considered in detail in Chapters 2 through 6 of this report. One of the speakers, John Knight, pointed out that the date of the forum coincided nearly to the day with the 25th anniversary of the Garmisch Conference (Randell and Naur, 1968), a NATO-sponsored workshop at which the term "software engineering" is generally accepted to have originated. The particular irony of this coincidence is that it is also generally accepted that although much more ambitious software systems are now being built, little has changed in the relative ability to produce software with predictable quality, costs, and dependability. One of the original Garmisch participants, A.G. Fraser, now associate vice president in the Information Sciences Research Division at AT&T Bell Laboratories, defends the apparent lack of progress by the reminder that prior to Garmisch, there was no "collective realization" that the problems individual organizations were facing were shared across the industry; thus Garmisch was a critical first step toward addressing issues in software production. It is hoped that this report will play a similar role in seeding the field of statistical software engineering by indicating opportunities for statistical thinking to help increase understanding, as well as the productivity and quality, of software and software production.

In preparing this report, the panel struggled with the problem of providing the "big picture" of the software production process, while simultaneously attempting to highlight opportunities for related research on statistical methods. The problems facing the software engineering field are indeed broad, and nonstatistical approaches (e.g., formal methods for verifying program specifications) are at least as relevant as statistical ones. Thus this report tends to emphasize the larger context in which statistical methods must be developed, based on the understanding that recognition of the scope and the boundaries of problems is essential to characterizing the problems and contributing to their solution. It must be noted at the outset, for example, that software engineering is concerned with more than the end product, namely, code. The production process that results in code is a central concern and thus is described in detail in the report. To a large extent, the presentation of material mirrors the steps in the software development process. Although currently the single largest area of overlap between statistics and software engineering concerns software testing (which implies that the code exists), it is the panel's view that the largest contributions to the software engineering field will be those affecting the quality and productivity of the processes that precede code generation.

The panel also emphasizes that the process and methods described in this report pertain to the case of new software projects, as well as to the more ordinary circumstance of evolving software projects or "legacy systems." For instance, the software that controls the space shuttle flight systems or that runs modern telecommunication networks has been evolving for several decades. These two cases are referred to frequently to illustrate software development concepts and current practice, and although the software systems may be uncharacteristically large, they are arguably forerunners of what lies ahead in many applications. For example, laser printer software is witnessing an order-of-magnitude (base-10) increase in size with each new release. Similar increases in size and complexity are expected in all consumer electronic products as increased functionality is introduced.

Central to this report's theme, and essential to statistical software engineering, is the role of data, the realm where opportunities lie and difficulties begin. The opportunities are clear: whenever data are used or can be generated in the software life cycle, statistical methods can be brought to bear for description, estimation, and prediction. This report highlights such areas and gives examples of how statistical methods have been and can be used.

Nevertheless, the major obstacle to applying statistical methods to software engineering is the lack of consistent, high-quality data in the resource-allocation, design, review, implementation, and test stages of software development. Statisticians interested in conducting research in software engineering must acknowledge this fact and play a leadership role in providing adequate grounds for the resources needed to acquire and maintain high-quality, relevant data. A statement by one of the forum participants, David Card, captures the serious problem that statisticians face in demonstrating the value of good data and good data analysis: "It may not be that effective to be able to rigorously demonstrate a 10% or 15% or 20% improvement (in quality or productivity) when with no data and no analysis, you can claim 50% or even 100%."

The cost of collecting and maintaining high-quality information to support software development is unfortunately high, but arguably essential, as the NASA case study presented in Chapter 2 makes clear. The panel conjectures that use of adequate metrics and data of good quality is, in general, the primary differentiator between successful, productive software development organizations and those that are struggling. Traditional manufacturers have learned the value of investing in an information system to support product development; software development organizations must take heed. All too often, as a release date approaches, all available resources are dedicated to moving a software product out the door, with the result that few or no resources are expended on collecting data during these crucial periods. Subsequent attempts at retrospective analysis to help forecast costs for a new product or identify root causes of faults found during product testing are inconclusive when speculation rather than hard data is all that is available to work with. But even software development organizations that realize the importance of historical data can get caught in a downward spiral: effort is expended on collection of data that initially are insufficient to support inferences. When data are not being used, efforts to maintain their quality decrease. But then when the data are needed, their quality is insufficient to allow drawing conclusions. The spiral has begun.

As one means of capturing valuable historical data, efforts are under way to create repositories of data on software development experiments and projects. There is much apprehension in the software engineering community that such data will not be helpful because the relevant metadata (data about the data) are not likely to be included. The panel shares this concern because the exclusion of metadata not only encourages sometimes thoughtless analyses, but also makes it too easy for statisticians to conduct isolated research in software engineering. The panel believes that truly collaborative research must be undertaken and that it must be done with a keen eye to solving the particular problems faced by the software industry. Nevertheless, the panel recognizes benefits to collecting data or experimentation in software development. As is pointed out in more detail in Chapter 5, one of the largest impacts the statistical community can have in software engineering concerns efforts to combine information (NRC, 1992) across software engineering projects as a means of evaluating the effects of technology, language, organization, and the development process itself. Although difficult issues are posed by the need to adjust appropriately for differences in projects, the inconsistency of metrics, and varying degrees of data quality, the availability of a data repository at least allows for such research to begin.
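As a rough illustration of what combining information across projects can involve, the sketch below pools per-project effect estimates by inverse-variance weighting, one standard fixed-effect technique. The report does not prescribe this method; the function name `combine_fixed_effect` and every number in the example are hypothetical, chosen only to make the arithmetic visible.

```python
import math


def combine_fixed_effect(estimates, std_errors):
    """Pool per-project effect estimates by inverse-variance weighting.

    estimates:  one effect estimate per project (e.g., fractional
                improvement in productivity attributed to a technology)
    std_errors: the standard error of each estimate
    Returns the pooled estimate and its standard error.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one positive standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    # More precise projects (smaller standard error) get more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical estimates from three projects (illustrative values only).
effects = [0.10, 0.20, 0.30]
std_errors = [0.10, 0.10, 0.20]
pooled, pooled_se = combine_fixed_effect(effects, std_errors)
print(f"pooled effect: {pooled:.3f}, standard error: {pooled_se:.3f}")
```

This simple pooling assumes that every project measures the same underlying effect with the same metric. The difficulties named above (differences between projects, inconsistent metrics, uneven data quality) are exactly the cases in which that assumption fails and a more careful model would be needed.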

Although this report serves as a review of the software production process and related research to date, it is necessarily incomplete. Limitations on the scope of the panel's efforts precluded a fuller treatment of some material and topics as well as inclusion of case studies from a wider variety of business and commercial sectors. The panel resisted the temptation to draw on analogies between software development and the converging area of computer hardware development (which for the most part is initially represented in software). The one approach it is confident of not reflecting is over-simplification of the problem domain itself.


2 Case Study: NASA Space Shuttle Flight Control Software

The National Aeronautics and Space Administration leads the world in research in aeronautics and space-related activities. The space shuttle program, begun in the late 1970s, was designed to support exploration of Earth's atmosphere and to lead the nation back into human exploration of space.

IBM's Federal Systems Division (now Loral), which was contracted to support NASA's shuttle program by developing and maintaining the safety-critical software that controls flight activities, has gained much experience and insight in the development and safe operation of critical software. Throughout the program, the prevailing management philosophy has been that quality must be built into software by using software reliability engineering methodologies. These methodologies are necessarily dependent on the ability to manage, control, measure, and analyze the software using descriptive data collected specifically for tracking and statistical analysis. Based on a presentation by Keller (1993) at the panel's information-gathering forum, the following case study describes space shuttle flight software functionality as well as the software development process that has evolved for the space shuttle program over the past 15 years.

OVERVIEW OF REQUIREMENTS

The primary avionics software system (PASS) is the mission-critical on-board data processing system for NASA's space shuttle fleet. In flight, all shuttle control activities, including main engine throttling, directing control jets to turn the vehicle in a different orientation, firing the engines, or providing guidance commands for landing, are performed manually or automatically with this software. In the event of a PASS failure, there is a backup system. As indicated in the space shuttle flight log history, the backup system has never been invoked.

To ensure high reliability and safety, IBM has designed the space shuttle computer system to have four redundant, synchronized computers, each of which is loaded with an identical version of the PASS. Every 3 to 4 milliseconds, the four computers check with one another to assure that they are in lockstep and are doing the same thing, seeing the same input, sending the same output, and so forth. The operating system is designed to instantaneously deselect a failed computer.

The PASS is safety-critical software that must be designed for quality and safety at the outset. It consists of approximately 420,000 lines of source code developed in HAL, an engineering language for real-time systems, and is hosted on flight computers with very limited memory. Software is integrated within the flight control system in the form of overlays: only the small amount of code necessary for a particular phase of the flight (e.g., ascent, on-orbit, or entry activities) is loaded in computer memory at any one time. At quiescent points in the mission, the memory contents are "swapped out" for program applications that are needed for the next phase of the mission.

In support of the development of this safety-critical flight code, there are another 1.4 million lines of code. This additional software is used to build, develop, and test the system as well as to provide simulation capability and perform configuration control. This support software must have the same high quality as the on-board software, given that flawed ground software can mask errors, introduce errors into the flight software, or provide an incorrect configuration of software to be loaded aboard the shuttle.

In short, IBM/Loral maintains approximately 2 million lines of code for NASA's space shuttle flight control system. The continually evolving requirements of NASA's space flight program result in an evolving software system: the software for each shuttle mission flown is a composite of code that has been implemented incrementally over 15 years. At any given time, there is a subset of the original code that has never been changed, code that was sequentially added in each update, and new code pertaining to the current release. Approximately 275 people support the space shuttle software development effort.

THE OPERATIONAL LIFE CYCLE

Originally the PASS was developed to provide a basic flight capability of the space shuttle. The first flown version was developed and supported for flights in 1981 through 1982. However, the requirements of the flight missions evolved to include increased operational capability and maintenance flexibility. Among the shuttle program enhancements that changed the flight control system requirements were changes in payload manifest capabilities and main engine control design, crew enhancements, addition of an experimental autopilot for orbiting, system improvements, abort enhancements, provisions for extended landing sites, and hardware platform changes. Following the Challenger accident, which was not related to software, many new safety features were added and the software was changed accordingly.

For each release of flight software (called an operational increment), a nominal 6- to 9-month period elapses between delivery to NASA and actual flight. During this time, NASA performs system verification (to assure that the delivered system correctly performs as required) and validation (to assure that the operation is correct for the intended domain). This phase of the software life cycle is critical to assuring safety before a safety-critical operation occurs. It is a time for a complete integrated system test (flight software with flight hardware in operational domain scenarios). Crew training for mission practices is also performed at this time.

A STATISTICAL APPROACH TO MANAGING THE SOFTWARE PRODUCTION PROCESS

To manage the software production process for space shuttle flight control, descriptive data are systematically collected, maintained, and analyzed. At the beginning of the space shuttle program, global measurements were taken to track schedules and costs. But as software development commenced, it became necessary to retain much more product-specific information, owing to the critical nature of space shuttle flight as well as the need for complete accountability for the shuttle's operation. The detail and granularity of data dictate not only the type but also the level of analysis that can be done. Data related to failures have been specifically accumulated in a database along with all the other corollary information available, and a procedure has been established for reliability modeling, statistical analysis, and process improvement based on this information.

A composite description of all space shuttle software of various ages is maintained through a configuration management (CM) system. The CM data include not only a change itself, but also the lines of code affected, reasons for the change, and the date and time of change. In addition, the CM system includes data detailing scenarios for possible failures and the probability of their occurrence, user response procedures, the severity of the failures, the explicit software version and specific lines of code involved, the reasons for no previous detection, how long the fault had existed, and the repair or resolution. Although these data seem abundant, it is important to acknowledge their time dependence, because the software system they describe is subject to constant "churn."

Over the years, the CM system for the space shuttle program has evolved into a common, minimum set of data that must be retained regarding every fault that is recognized anywhere in the life cycle, including faults found by inspections before software is actually built. This evolutionary development is amenable to evaluation by statistical methods. Trend analysis and predictions regarding testing, allocation of resources, and estimation of probabilities of failure are examples of the many activities that draw on the database. This database also continues to be the basis for defining and developing sophisticated, insightful estimation techniques such as those described by Munson (1993).

Fault Detection

Management philosophy prescribes that process improvement is part of the process. Such proactive process improvement includes inspection at every step of the process, detailed documentation of the process, and analysis of the process itself.

The critical implications of an ill-timed failure in space shuttle flight control software require that remedies be decisive and aggressive. When a fault is identified, a feedback process involving detailed information on the fault enforces a search for similar faults in the existing system and changes the process to guard actively against such faults in flight control software development. The characteristics of a single fault are actively documented in the following four-step reactive process-improvement protocol:

1. Remove the fault,

2. Identify the root cause of the fault,

3. Eliminate the process deficiency that let the fault escape earlier detection, and

4. Analyze the product for other, similar faults.

Further scrutiny of what occurred in the process between introduction and detection of a fault is aimed at determining why downstream process elements failed to detect and remove the fault. Such introspective analysis is designed to improve the process and specific process elements so that if a similar fault is introduced again, these process elements will detect it before it gets too far along in the product life cycle. This four-step process improvement is achievable because of the maturity of the overall IBM/Loral software management process. The complete recording of project events in the CM system (phase of the process, change history of involved line(s) of code, the line of code that included an error, the individuals involved, and so on) allows hindsight so that the development team can approach the occurrence of an error not as a failure but rather as an opportunity to improve the process and to find other, similar errors.

Safety Certification

The dependability of safety-critical software cannot be based merely on testing the software, counting and repairing the faults, and conducting "live tests" on shuttle missions. Testing of software for many, many years, much longer than its life cycle, would be required in order to demonstrate software failure probability levels of 10⁻⁷ or 10⁻⁹ per operational hour. A process must be established, and it must be demonstrated statistically that if that process is followed and maintained under statistical control, then software of known quality will result. One result is the ability to predict a particular level of fault density, in the sense that fault density is proportional to failure intensity, and so provide a confidence level regarding software quality. This approach is designed to ensure that quality is built into the software at a measurable level. IBM's historical data demonstrate a constantly improving process for comfort of space shuttle flight. The use of software engineering methodologies that incorporate statistical analysis methods generally allows the establishment of a benchmark for obtaining a valid measure of how well a product meets a specified level of quality.
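The claim that testing alone would take far longer than the software's life cycle can be checked with a back-of-the-envelope calculation. The sketch below is not taken from the report: it assumes a constant failure rate (exponential failure times) and failure-free testing, and the function name and the 95% confidence level are illustrative choices.

```python
import math

HOURS_PER_YEAR = 24 * 365  # calendar hours; an assumption for illustration


def failure_free_hours_needed(target_rate_per_hour, confidence):
    """Failure-free test hours needed to claim rate <= target at the given confidence.

    Assumes a constant failure rate: if the true rate were as high as the
    target, zero failures in t hours would occur with probability
    exp(-rate * t). Setting that equal to 1 - confidence and solving gives t.
    """
    if not (0.0 < confidence < 1.0) or target_rate_per_hour <= 0.0:
        raise ValueError("need 0 < confidence < 1 and a positive rate")
    return -math.log(1.0 - confidence) / target_rate_per_hour


for rate in (1e-7, 1e-9):
    hours = failure_free_hours_needed(rate, confidence=0.95)
    years = hours / HOURS_PER_YEAR
    print(f"rate {rate:g} per hour: {hours:.3g} hours, roughly {years:,.0f} years")
```

Under these assumptions, even a single test rig running continuously would need thousands of years of failure-free operation for the less stringent target, which is why the argument shifts from testing the product to controlling the process.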

3 A Software Production Model

The software development process spans the life cycle of a given project, from the first idea, to implementation, through completion. Many process models found in the literature describe what is basically a problem-solving effort. The one discussed in detail below, as a convenient way to organize the presentation, is often described as the waterfall model. It is the basis for nearly all the major software products in use today. But as with all great workhorses, it is beginning to show its age. New models in current use include those with design and implementation occurring in parallel (e.g., rapid prototyping environments) and those adopting a more integrated, less linear, view of a process (e.g., the spiral model referred to in Chapter 6). Although the discussion in this chapter is specific to a particular model, that in subsequent chapters cuts across all models and emphasizes the need to incorporate statistical insight into the measurement, data collection, and analysis aspects of software production.

The first step of the software life cycle (Boehm, 1981) is the generation of system requirements whereby functionality, interactions, and performance of the software product are specified in (usually) numerous documents. In the design step, system requirements are refined into a complete product design, an overall hardware and software architecture, and detailed descriptions of the system control, data, and interfaces. The result of the design step is (usually) a set of documents laying out the system's structure in sufficient detail to ensure that the software will meet system requirements. Most often, both requirements and design documents are formally reviewed prior to coding in order to avoid errors caused by incorrectly stated requirements or poor design. The coding stage commences once these reviews are successfully completed. Sometimes scheduling considerations lead to parallel review and coding activities. Normally individuals or small teams are assigned specific modules to code. Code inspections help ensure that module quality, functionality, and schedule are maintained.

Once modules are coded, the testing step begins. (This topic is discussed in some detail later in this chapter.) Testing is done incrementally on individual modules (unit testing), on sets of modules (integration testing), and finally on all modules (system testing). Inevitably, faults are uncovered in testing and are formally documented as modification requests (MRs). Once all MRs are resolved, or more usually as schedules dictate, the software is released. Field experience is relayed back to the developer as the software is "burned in" in a production environment. Patches or rereleases follow based on customer response. Backward compatibility tests (regression testing) are conducted to ensure that correct functionality is maintained when new versions of the software are produced.

The above overview is noticeably nonquantitative. Indeed, this nonquantitative characteristic is the most striking difference between software engineering and more traditional (hardware) engineering disciplines. Measurement of software is critical for characterizing both the process and the product, and yet such measurement has proven to be elusive and controversial. As argued in Chapter 1, the application of statistical methods is predicated on the existence of relevant data, and the issue of software measurements and metrics is discussed prominently throughout the report. This is not to imply that measurements have never been made or that data are totally lacking. Unfortunately metrics tend to describe properties and conditions for which it is easy to gather data rather than those that are useful for characterizing software content, complexity, and form.

PROBLEM FORMULATION AND SPECIFICATION OF REQUIREMENTS

Within the context of system development, specifications for required software functions are derived from the larger system requirements, which are the primary source for determining what the delivered software product will do and how it will do it. These requirements are translated by the designer or design team into a finished product that delivers all that is explicitly stated and does not include anything explicitly forbidden. Some common references regarding requirements specification are mentioned in IEEE Standard for Software Productivity Metrics (IEEE, 1993).

Requirements, the first formal tangible product obtained in the development of a system, are subjective statements specifying the system's various desired operational characteristics. Errors in requirements arise for a number of reasons, including ambiguous statements, inconsistent information, unclear user requirements, and incomplete requests. Projects that have ill-defined or unstated requirements are subject to constant iteration, and a lack of precise requirements is a key source of subsequent software faults. In general, the longer a fault resides in a system before it is detected, the greater is the cost of removing it or recovering from related failures. This condition is a primary driver of the review process throughout software development.

The formulation of requirements starts with customers requesting a new functionality. Systems engineers collect information describing the new functionality and develop a customer specification description (CSD) describing the customer's view of the feature. The CSD is used internally by software development organizations to formulate cost estimates for bidding. After the feature is committed (sold), systems engineers write a feature specification description (FSD) describing the internal view of the feature. The FSD is commonly referred to as "requirements." Both the CSD and FSD are carefully reviewed and must meet formal criteria for approval.

DESIGN

The heart of the software development cycle is the translation and refinement of the requirements into code. Software architects transform the requirements for each specified feature into a high-level design. As part of this process, they determine which subsystems (e.g., databases) and modules are required and how they interact or communicate. The broad, high-level design is then refined into a detailed low-level design. This transformation involves much information gathering and detective work. The software architects are often the most experienced and knowledgeable of the software engineers.

The sequence of continual refinements ultimately results in a mapping of high-level functions into modules and code. Part of this design process is selecting an appropriate representation, which in most cases is a specific programming language. Selection of a representation involves factors such as operational domain, system performance, and function, among others. When completed, the high-level design is reviewed by all, including those concerned with the affected subsystems and the organization responsible for development.

The human element is a critical issue in the early stages of a software project. Quantitative data are potentially available following document reviews. Specifically, early in the development cycle of software systems, (paper) documents are prepared describing feature requirements or feature design. Prior to a formal document review, the reviewers individually read the document, noting issues that they believe should be resolved before the document is approved and feature development is begun. At the review meeting, a single list of issues is prepared that includes the issues noted by the reviewers as well as the ones discovered during the meeting itself. This process thus generates data consisting of a tabulation of issues found by each reviewer. The degree of overlap provides information regarding the number of remaining issues, that is, those yet to be identified. If this number is acceptably small, the process can proceed to the next step; if not, further document refinement is necessary in order to avoid costly fixes later in the process. The problem as stated bears a certain resemblance to capture-recapture models in wildlife studies, and so appropriate statistical methods can be devised for analyzing the review data, as illustrated in the following example.

Example. Table 1 contains data on issues identified for a particular feature for the AT&T 5ESS switch (Eick et al., 1992a). Six reviewers found a total of 47 distinct issues. A common capture-recapture model assumes that each issue has the same probability of being captured (detected) and that reviewers work independently with their own chance of capturing an issue, or detection probability. Under such a model, likelihood methods yield an estimate of N = 65, implying that approximately 20 issues remain to be identified in the document. An upper 95% confidence bound for N under this model is 94 issues.
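As a rough illustration of how such an estimate can be computed, the sketch below solves the textbook likelihood equation for a capture-recapture model with independent reviewers and reviewer-specific detection probabilities: 1 - D/N = prod_j (1 - n_j/N), where n_j is the number of issues noted by reviewer j and D is the number of distinct issues noted. The function name is hypothetical, and the report does not spell out its own likelihood, so this sketch is not expected to reproduce the reported N = 65 exactly.

```python
from math import prod


def estimate_total_issues(found_per_reviewer, distinct_found, tol=1e-9):
    """Solve 1 - D/N = prod_j (1 - n_j/N) for N by bisection.

    found_per_reviewer: n_j, the number of issues noted by each reviewer
    distinct_found: D, the number of distinct issues noted by any reviewer
    """
    n, d = list(found_per_reviewer), distinct_found
    if sum(n) <= d:
        raise ValueError("no overlap between reviewers; N is not estimable")

    def gap(total):
        return (1 - d / total) - prod(1 - nj / total for nj in n)

    lo = float(d)
    if gap(lo) >= 0:
        return lo  # some reviewer noted every issue that was found
    hi = 2.0 * d
    while gap(hi) < 0:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Two reviewers, 10 issues each, 5 in common: the equation reduces to the
# classical two-sample estimate 10 * 10 / 5.
print(estimate_total_issues([10, 10], distinct_found=15))

# Reviewer totals as read from Table 1; treating the 43 issues with a nonzero
# row sum as the pre-meeting discoveries is an assumption made here.
print(estimate_total_issues([25, 3, 4, 13, 9, 6], distinct_found=43))
```

Bisection is used only to keep the sketch dependency-free; any one-dimensional root finder would serve.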

Such a model is natural but simplistic. The software development environment is not conducive to independence among reviewers (so that some degree of collusion is unavoidable), and reviewers also are selected to cover certain areas of specialization. In either case, the cornerstone of capture-recapture models, the binomial distribution, is no longer appropriate for the total number of issues. It is possible to develop a likelihood-based test for pairwise collusion of reviewers and reviewer-specific tests of specialization. In the example above, there is no evidence of collusion among reviewers, but reviewer C exhibits a significantly greater degree of specialization than do the other reviewers. When this reviewer is treated as a specialist, the maximum likelihood estimate (MLE) of the number of issues is reduced to 53, implying that only a half dozen issues remain to be discovered in the document.

Other mismatches between the data arising in software review and those in capture-recapture wildlife population studies induce bias in the MLE. Another possible estimator for this problem is the jackknife estimator (Burnham and Overton, 1978). But this estimator seems in fact to be more biased than the MLE (Vander Wiel and Votta, 1993). Both are rescued to a large extent by their categorization of faults into classes (e.g., "easy to find" versus "hard to find"). In any given application, it is necessary to verify that the "easy to find" and "hard to find" classification is meaningful, or to determine that it is merely partitioning the distribution of difficulty in an arbitrary manner. A relevant point in this and other applications of statistical methods in software engineering is that addressing aspects of the problem that induce study bias is important and valued; theoretical work addressing aspects of statistical bias is not likely to be as highly valued.

Table 1. Issue discovery. The rows of the table represent 47 issues noted by six reviewers (A, ..., F) prior to review meetings. For each issue i (i = 1, ..., 47), Sum is the number of reviewers who noted it; the individual reviewer entries are not reproduced here. Rows with a Sum of zero correspond to issues discovered at the meeting. The last line gives the number of issues noted by each reviewer.

Issue  Sum      Issue  Sum
  1     1        25     2
  2     2        26     2
  3     1        27     1
  4     1        28     1
  5     0        29     2
  6     1        30     1
  7     1        31     1
  8     1        32     1
  9     0        33     1
 10     1        34     3
 11     1        35     2
 12     1        36     1
 13     0        37     1
 14     2        38     1
 15     1        39     1
 16     0        40     1
 17     2        41     1
 18     1        42     2
 19     2        43     1
 20     2        44     1
 21     5        45     1
 22     2        46     1
 23     1        47     1
 24     1

Reviewer totals: A = 25, B = 3, C = 4, D = 13, E = 9, F = 6; overall sum = 60.

IMPLEMENTATION

The phase in the software development process that is often referred to interchangeably as coding, development, or implementation is the actual transformation of the requirements into executable form. "Implementation in the small" refers to coding, and "implementation in the large" refers to designing an entire system in a top-down fashion while maintaining a perspective on the final integrated system.

Low-level designs, or coding units, are created from the high-level design for each subsystem and module that needs to be changed. Each coding unit specifies the changes to be made to the existing files, new or modified entry points, and any file that must be added, as well as other changes. After document reviews and approvals, the coding may begin. Using private copies of the code, developers make the changes and add the files specified in the coding unit. Coding is delicate work, and great care is taken so that unwanted side effects do not break any of the existing code. After completion, the code is tested by the developer and carefully reviewed by other experts. The changes are submitted to a public load (code from all programmers that is merged and loaded simultaneously) using an MR number. The MR is tied back to the feature to establish a correspondence between the code and the functionality that it provides.

MRs are associated with the system version management system, which maintains a complete history of every change to the software and can recreate the code as it existed at any point in time. For production software systems, version management systems are required to ensure code integrity, to support multiple simultaneous releases, and to facilitate maintenance. If there is a problem, it may be necessary to back out changes. Besides a record of the affected lines, other information is kept, such as the name of the programmer making the changes, the associated feature number, whether a change fixes a fault or adds new functionality, the date of a change, and so on.

The configuration management database contains the record of code changes, or change history of the code. Eick et al. (1992b) describe a visualization technique for displaying the change history of source code. The graphical technique represents each file as a vertical column and each line of code as a color-coded row within the column. The row indentation and length track the corresponding text, and the row color is tied to a statistic. If the row tracking is literal as with computer source code, the display looks as if the text had been printed in color and then photo-reduced for viewing as a single figure. The spatial pattern of color shows the distribution of the statistic within the text.

Example. Developing large software systems is a problem of scale. In multimillion-line systems there may be hundreds of thousands of files and tens of thousands of modules, worked on by thousands of programmers for multiyear periods. Just discovering what the existing code does is a major technical problem consuming significant amounts of time. A continuing and significant problem is that of code discovery, whereby programmers try to understand how unfamiliar code works. It may take several weeks of detailed study to change a few lines of code without causing unwanted side effects. Indeed, much of the effort in maintenance involves changing code written by another programmer. Because of variation in programmer staff sizes and inevitable turnover, training new programmers is important. Visualization techniques, described further in Chapter 5, can improve productivity dramatically.

Figure 1 displays a module composed of 20 source code files containing 9,365 lines of code. The height of each column indicates the size of the file. Files longer than one column are continued over to the next. The row color indicates the age of each line of code using a rainbow color scale with the newest lines in red and the oldest in blue. On the left is an interactive color scale showing a color for each of the 324 changes by the 126 programmers modifying this code over the last 10 years. The visual impression is that of a miniature picture of all of the source code, with the indentation showing the usual C language control structure.
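The age-to-color mapping just described can be sketched in a few lines. This is only an illustration of a rainbow scale running from red (newest) to blue (oldest); the function name and the linear mapping of age to hue are assumptions and are not taken from the display tool itself.

```python
import colorsys


def age_to_rgb(age, oldest_age):
    """Map a line's age to an RGB triple: age 0 (newest) is red, the oldest is blue.

    The linear age-to-hue mapping is an illustrative assumption.
    """
    if oldest_age <= 0:
        return colorsys.hsv_to_rgb(0.0, 1.0, 1.0)
    fraction = min(max(age / oldest_age, 0.0), 1.0)
    hue = fraction * (2.0 / 3.0)  # hue 0 is red, hue 2/3 is blue
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


# Ages in years for a few hypothetical lines of one file.
for age in (0, 2.5, 5, 7.5, 10):
    r, g, b = age_to_rgb(age, oldest_age=10)
    print(f"age {age:>4}: r={r:.2f} g={g:.2f} b={b:.2f}")
```

With such a mapping, a file whose rows span many hues is one whose lines were written at many different times.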

The perception of colors is blurred, but there are clear patterns. Files in approximately the same hue were written at about the same time and are related. Rainbow files with many different hues are unstable and are likely to be trouble spots because of all the changes. The biggest file has about 1,300 lines of code and takes a column and a half.

Changes from many coding units are periodically combined together into a so-called common load of the software system. The load is compiled, made available to developers for testing, and installed in the laboratory machines. Bringing the changes together is necessary so that developers working on different coding units of a common feature can ensure that their code works together properly and does not break any other functionality. Developers also use the public load to test their code on laboratory machines.

After all coding units associated with a feature are complete and it has been tested by the developers in the laboratory, the feature is turned over to the integration group for independent testing. The integration group runs tests of the feature according to a feature test plan that was prepared in parallel with the FSD. Eventually the new code is released as part of an upgrade or sent out directly if it fixes a critical fault. At this stage, maintenance on the code begins. If customers have problems, developers will need to submit fault modification requests.

TESTING

Many software systems in use today are very large. For example, the software that supports modern telecommunications networks, or processes banking transactions, or checks individual tax returns for the Internal Revenue Service has millions of lines of code. The development of such large-scale software systems is a complex and expensive process. Because a single simple fault in a system may cripple the whole system and result in a significant loss (e.g., loss of telephone service in an entire city), great care is needed to assure that the system is flawlessly constructed. Because a fault can occur in only a small part of a system, it is necessary to assure that even small programs are working as intended. Such checking for conformance is accomplished by testing the software.

Specifically, the purpose of software testing is to detect errors in a program and, in the absence of errors, gain confidence in the correctness of the program or the system under test. Although testing is no substitute for improving a process, it does play a crucial role in the overall software development process. Testing is important because it is effective, if costly. It is variously estimated that the total cost of testing is approximately 20 to 33% of the total budget for software development (Humphrey, 1989). This fraction amounts to billions of dollars in the U.S. software industry alone. Further, software testing is very time consuming, because the time for testing is typically greater than that for coding. Thus, efforts to reduce the costs and improve the effectiveness of testing can yield substantial gains in software quality and productivity.

Figure 1. A SeeSoft™ display showing a module with 20 files and 9,365 lines of code. Each file is represented as a column and each line of code as a colored row. The newest rows are in red and the oldest in blue, with a color spectrum in between. This overview highlights the largest files and program control structures, while the color shows relationships between files, as well as unstable, frequently changed code. Eick et al. (1992b).

Much of the difficulty of software testing is in the management of the testing process (producing reports, entering MRs, documenting MRs cleared, and so on), the management of the objects of the testing process (test cases, test drivers, scripts, and so on), and the management of the costs and time of testing.

Typically, software testing refers to the phase of testing carried out after parts of code are written so that individual programs or modules can be compiled. This phase includes unit, integration, system, product, customer, and regression testing. Unit testing occurs when programmers test their own programs, and integration testing is the testing of previously separate parts of the software when they are put together. System testing is the testing of a functional part of the software to determine whether it performs its expected function. Product testing is meant to test the functionality of the final system. Customer testing is often product testing performed by the intended user of the system. Regression testing is meant to assure that a new version of a system faithfully reproduces the desirable behavior of the previous system.

Besides the stages of testing, there are many different testing methods. In white box testing, tests are designed on the basis of detailed architectural knowledge of the software under test. In black box testing, only knowledge of the functionality of the software is used for testing; knowledge of the detailed architectural structure or of the procedures used in coding is not used. White box testing is typically used during unit testing, in which the tester (who is usually the developer who created the code) knows the internal structure and tries to exercise it based on detailed knowledge of the code. Black box testing is used during integration and system testing, which emphasizes the user perspective more than the internal workings of the software. Thus, black box testing tries to test the functionality of software by subjecting the system under test to various user-controlled inputs and by assessing its resulting performance and behavior.

Since the number of possible inputs or test cases is almost limitless, testers need to select a sample, a suite of test cases, based on their effectiveness and adequacy. Herein lie significant opportunities for statistical approaches, especially as applied to black box testing. Ad hoc black box testing can be done when testers, perhaps based on their knowledge of the system under test and its users, decide on specific inputs. Another approach, based on statistical sampling ideas, is to generate test cases randomly. The results of this testing can be analyzed by using various types of reliability growth curve models (see "Assessment and Reliability" in Chapter 4). Random generation requires a statistical distribution. Since the purpose of black box testing is to simulate actual usage, a highly recommended technique is to generate test cases randomly from the statistical distribution of inputs expected from users, often referred to as the operational profile of a system.
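A minimal sketch of drawing a test suite from an operational profile follows. The operation names and their probabilities are invented for illustration; in practice the profile would have to be estimated or elicited, and real inputs are far richer than a single categorical label.

```python
import random

# Hypothetical operational profile: operation -> probability of use in the field.
OPERATIONAL_PROFILE = {
    "place_call": 0.70,
    "forward_call": 0.15,
    "conference_call": 0.10,
    "change_settings": 0.05,
}


def draw_test_cases(profile, n_cases, seed=None):
    """Draw n_cases operations at random, weighted by the operational profile."""
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("profile probabilities must sum to 1")
    rng = random.Random(seed)
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=n_cases)


suite = draw_test_cases(OPERATIONAL_PROFILE, n_cases=1000, seed=1)
for op in OPERATIONAL_PROFILE:
    print(op, suite.count(op))
```

The counts make the efficiency concern visible: most of the suite exercises the routine operation, and rarely used operations receive only a handful of test cases.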

There are several advantages and disadvantages to statistical operational profile testing. A key advantage is that if one takes a large enough sample, then the system under test will be tested in all the ways that a user may need it and thus should experience fewer field faults. Another advantage of this method is the possibility of bringing the full force of statistical techniques to bear on inferential problems; that is, the results obtained during testing can be generalized to make inferences about the field behavior of the system under test, including inferences about the number of faults remaining, the failure rate in the field, and so on.

In spite of all these advantages, statistical operational profile testing in its purest form is rarely used. There are many difficulties; some are operational and others are more basic. For example, one can never be certain about the operational profile in terms of inputs, and especially in terms of their probabilities of occurrence. Also, for large systems, the input space is high-dimensional. Thus, another problem is how to sample from this high-dimensional space. Further, the distribution is not static; it will, in all likelihood, change over time as new users exercise the system in unanticipated ways. Even if this possibility can be discounted, questions remain about the efficiency of statistical operational profile testing, which can be very inefficient, because most often the system under test will be used in routine ways, and thus a randomly drawn sample will be highly weighted by routine operations. This high weighting may be fine if the number of test cases is very large. But then testing would be very expensive, perhaps even prohibitively so. Therefore, testers often adopt some variant of drawing a random sample; for example, testers give more weight to boundary values, that is, those values around which the system is expected to change its behavior and therefore where faults are likely to be found. This and other clever strategies adopted by testers typically result in a testing distribution that is quite different from the operational profile. Of course, in such a case the results of the testing laboratory will not be generalizable unless the relationships between the two distributions are taken into account.

Thus, to take advantage of the attractiveness of operational profile testing, some key problems have to be solved:

1. How to obtain the operational profile,

2. How to sample according to a statistical distribution in high-dimensional space, and

3. How to generalize results obtained in the testing laboratory to the field when the testing distribution is a variant of the operational profile distribution.

All of these questions can be dealt with conceptually using statistical approaches.

For (1), a Bayesian elicitation procedure can be envisioned to derive the operational profile. This elicitation is done routinely in Bayesian applications, but because the space is very high dimensional, techniques are needed for Bayesian elicitation in very high dimensional spaces.

Concerning (2), if the joint distribution corresponding to the operational profile is known, schemes can be used that are more efficient than simple random sampling schemes. Simple random sampling is inefficient because it typically gives higher probability to the middle of a distribution than to its tails, especially in high dimensions. A more efficient scheme would sample the tails quickly. This can be accomplished by stratifying the support of the distribution.

McKay et al. (1979) formalized this idea using Latin hypercube sampling. Suppose we have a K-dimensional random vector X = (X_1, ..., X_K) and we want to get a sample of size N from the joint distribution of X. If the components of X are independent, then the scheme is simple, namely:

1. Divide the range of each component random variable into N intervals of equal probability,

2. Randomly sample one observation for each component random variable in each of the corresponding N intervals, and finally

3. Randomly combine the components to create X.
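The three steps can be sketched as follows for independent components that are uniform on [0, 1). The function name is illustrative; other marginal distributions could be obtained by passing each coordinate through the corresponding inverse distribution function, which is not shown here.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=None):
    """Return n_samples points in [0, 1)^n_dims, one point per stratum per dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Steps 1 and 2: one uniform draw inside each of the N equal-probability intervals.
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Step 3: shuffle each coordinate so the components are combined at random.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# N = 6 points in K = 2 dimensions.
for point in latin_hypercube(n_samples=6, n_dims=2, seed=0):
    print(tuple(round(x, 3) for x in point))
```

Each coordinate of the resulting sample has exactly one value in each of the N intervals, which is the stratification property that distinguishes the scheme from simple random sampling.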

Stein (1987) showed that this sampling scheme can be substantially better than simple random sampling. Iman and Conover (1982) and Stein (1987) both discussed extensions for nonindependent component variables. Of course, if specifying homogeneous strata is possible, it should be done prior to applying the Latin hypercube sampling method to increase the overall effectiveness of the sampling scheme.

Example. Consider a software system controlling the state of an air-to-ground missile. The key inputs for the software are altitude, attack and bank angles, speed, pitch, roll, and yaw. Typically, these variables are independently controlled. To test this software system, combinations of all these inputs must be provided and the output from the software system checked against the corresponding physics. One would like to generate test cases that include inputs over a broad range of permissible values. To test all the valid possibilities, it would be reasonable to try uniform distributions for each input. Suppose we decide upon a sample of size 6. The corresponding Latin hypercube design is easily constructed by dividing each variable into six equal probability intervals and sampling randomly from each interval. Because we have independent random variables here, the final step consists of randomly coupling these samples. The design is difficult to visualize in more than two dimensions, but one such sample for attack and bank angles is depicted in Figure 2. Note that there is exactly one observation in each column and in each row, thus the name "Latin hypercube."

Figure 2. Latin hypercube. N = 6 and K = 2.
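The three steps of the scheme for independent components can be sketched in a few lines. This is a minimal illustration rather than the procedure of McKay et al. as published: it assumes uniform inputs on the unit cube, and the function names and the physical ranges for the two angles are hypothetical.

```python
import random

def latin_hypercube(n, k, rng=None):
    """Return n points in [0, 1)^k with one point per stratum in every dimension."""
    rng = rng or random.Random()
    columns = []
    for _ in range(k):
        # Steps 1 and 2: one uniform draw inside each interval [i/n, (i+1)/n).
        column = [(i + rng.random()) / n for i in range(n)]
        # Step 3: shuffling each column independently couples the components at random.
        rng.shuffle(column)
        columns.append(column)
    # Test case j takes the j-th entry of every shuffled column.
    return [tuple(col[j] for col in columns) for j in range(n)]

def scale(point, ranges):
    """Map a unit-cube point onto physical ranges given as (low, high) pairs."""
    return tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(point, ranges))

design = latin_hypercube(6, 2, random.Random(0))
# Hypothetical ranges in degrees for attack angle and bank angle.
cases = [scale(p, [(-10.0, 30.0), (-60.0, 60.0)]) for p in design]
```

Each of the six intervals of each variable is hit exactly once, which is the row-and-column property noted in the example. Nonuniform inputs could be handled by passing each unit-interval value through the inverse distribution function of that input.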


Finally, concerning (3), to make inferences about field performance, the issue of the discrepancy between the statistical operational profile and the testing distribution must be addressed. At this point, a distinction can be made between two types of extrapolation to field performance of the system under test. It is clear that even if the true operational profile distribution is not available, to the extent that the testing distribution has the same support as the operational profile distribution, statistical inferences can be made about the number of remaining faults. On the other hand, to extrapolate the failure intensity from the testing laboratory to the field, it is not enough to have the same support; rather, identical distributions are needed. Of course, it is unlikely that after spending much time and money on testing, one would again test with the statistical operational profile. What is needed is a way of reusing the information generated in the testing laboratory, perhaps by a transformation in which some statistical techniques based on reweighting can help. There are two basic ideas, both relying heavily on the assumption that the testing and the field-use distributions have the same support. One idea is to use all the data from the testing laboratory, but with added weights to change the sample to resemble a random sample from the operational profile. The approach is similar to reweighting in importance sampling. Another idea is to accept or reject the inputs used in testing with a probability distribution based on the operational profile. For a description of both of these techniques, see Beckman and McKay (1987).
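Both reweighting ideas can be illustrated on a toy problem with a small number of input classes. The sketch below is not taken from Beckman and McKay (1987); it assumes that the test and field profiles are known discrete distributions with the same support, and the function names and data are hypothetical.

```python
import random

def reweighted_failure_rate(test_cases, p_field, q_test):
    """Self-normalized importance-sampling estimate of the field failure probability.

    test_cases: list of (input_class, failed) pairs observed in the laboratory.
    p_field, q_test: dicts mapping input_class -> probability under each profile.
    """
    weights = [p_field[c] / q_test[c] for c, _ in test_cases]
    failures = sum(w for w, (_, failed) in zip(weights, test_cases) if failed)
    return failures / sum(weights)

def accept_reject(test_cases, p_field, q_test, rng):
    """Keep each test case with probability proportional to p/q."""
    bound = max(p_field[c] / q_test[c] for c in q_test)
    return [(c, failed) for c, failed in test_cases
            if rng.random() < (p_field[c] / q_test[c]) / bound]

# Made-up laboratory data: boundary inputs were deliberately over-tested.
q_test = {"nominal": 0.5, "boundary": 0.5}
p_field = {"nominal": 0.9, "boundary": 0.1}
test_cases = ([("nominal", i < 1) for i in range(10)]
              + [("boundary", i < 5) for i in range(10)])
estimate = reweighted_failure_rate(test_cases, p_field, q_test)
```

With these made-up numbers the raw laboratory failure fraction is 6/20 = 0.30, while the reweighted estimate is 0.14, because failures on boundary inputs count for less under the field profile. The accept/reject variant discards data, so the weighted estimate typically uses the laboratory information more fully.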

In his presentation at the panel's forum, Phadke (1993) suggested another set of statistical techniques, based on orthogonal arrays, for parsimonious testing of software. The example described above proves useful in an elaboration.

Example. For the software system that determines the state of an attack plane, let us assume that interest centers on testing only two conditions for each input variable. This situation arises, for example, when the primary interest lies in boundary value testing. Let the lower value be input state 0 and the upper value be input state 1 for each of the variables. Then in the language of statistical experimental design, we have seven factors, A, ..., G (altitude, attack angle, bank angle, speed, pitch, roll, and yaw), each at two levels (0, 1). To test all of the possible combinations, one would need a complete factorial experiment, which would have $2^7 = 128$ test cases consisting of all possible sequences of 0's and 1's. For a statistical experiment intended to address only main effects, a highly fractionated factorial design would be sufficient. However, in the case of software testing, there is no statistical variability and little or no interest in estimating various effects. Rather, the interest is in covering the test space as much as possible and checking whether the test cases pass or fail. Even in this case, it is still possible to use statistical design ideas. For example, consider the sequence of test cases given in Table 2a. This design requires 8 test cases instead of 128. In this case, since there is no statistical variation, main effects do not have any practical meaning. However, looking at the pattern in the table, it is clear that all possible combinations of levels for any two factors are covered in a balanced way. Thus, testing according to this design will protect against any incorrect implementation of the code involving a pairwise interaction.


Table 2a. Orthogonal array. Test cases in rows. Test factors in columns.

     A  B  C  D  E  F  G
1    0  0  0  0  0  0  0
2    0  0  0  1  1  1  1
3    0  1  1  0  0  1  1
4    0  1  1  1  1  0  0
5    1  0  1  0  1  0  1
6    1  0  1  1  0  1  0
7    1  1  0  0  1  1  0
8    1  1  0  1  0  0  1

Table 2b. Combinatorial design. Test cases in rows. Test factors in columns.

     A  B  C  D  E  F  G
1    0  0  0  0  0  0  0
2    1  1  1  1  1  1  0
3    0  0  1  1  1  0  1
4    1  0  0  0  1  1  1
5    1  1  1  0  0  0  1
6    0  1  0  1  0  1  1

In general, following Taguchi, Phadke (1993) suggests orthogonal array designs of strength two. These designs (a specific instance of which is given in the above example) guarantee that all possible pairwise combinations will be tried out in a balanced way. Another approach based on combinatorial designs was proposed by Cohen et al. (1994). Their designs do not consider balance to be an overriding design criterion, and accordingly they produce designs with smaller numbers of required test cases. For example, Table 2b contains a combinatorial design with complete pairwise coverage in six runs instead of the eight required by orthogonal arrays (Table 2a). This notion has been extended to notions of higher-order coverage as well. The efficacy of these and other types of designs has to be evaluated in the testing context.
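The coverage and balance properties claimed for the two designs are easy to check mechanically. The sketch below transcribes Tables 2a and 2b as bit strings (row 4 of Table 2a is taken as 0111100, the entry consistent with a balanced array) and counts the level combinations for every pair of factors; the function names are illustrative only.

```python
from collections import Counter
from itertools import combinations

TABLE_2A = ["0000000", "0001111", "0110011", "0111100",
            "1010101", "1011010", "1100110", "1101001"]
TABLE_2B = ["0000000", "1111110", "0011101", "1000111", "1110001", "0101011"]

def pair_counts(design):
    """Count how often each level combination occurs for every pair of factors."""
    counts = {}
    for i, j in combinations(range(len(design[0])), 2):
        counts[(i, j)] = Counter((row[i], row[j]) for row in design)
    return counts

def covers_all_pairs(design):
    """True if 00, 01, 10, and 11 each occur at least once for every factor pair."""
    return all(len(c) == 4 for c in pair_counts(design).values())

def is_balanced(design):
    """True if, in addition, the four combinations occur equally often (strength two)."""
    return all(len(c) == 4 and len(set(c.values())) == 1
               for c in pair_counts(design).values())
```

Both designs cover all 21 factor pairs; only the eight-run array is balanced, with each combination appearing twice. A six-run two-level design cannot be balanced, since six runs cannot be split evenly over four combinations.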

Besides the types of testing discussed above, there are other statistical strategies that can be used. For example, DeMillo et al. (1988) have suggested the use of fault insertion techniques. The basic idea is akin to capture-recapture sampling in which sampled units of a population (usually wildlife) are released and inverse sampling is done to estimate the unknown population size. The MOTHRA system built by DeMillo and his colleagues implements such a scheme. While there are many possible sampling schemes (Nayak, 1988), the difficulty with fault insertion is that the faults inserted ought to be subtle enough so that the system can be compiled and tested; no two inserted faults should interact with each other; and while it may be possible at the unit testing level, it is prohibitively expensive for integration testing. It should be pointed out that the use of capture-recapture sampling, outlined in this chapter's subsection titled "Design," for quantifying document reviews does not require fault seeding and, accordingly, is not subject to the above difficulties.
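The capture-recapture analogy can be made concrete with the simplest ratio estimator. This is a sketch under the strong assumption that seeded and indigenous faults are equally likely to be detected; it is not claimed to be the sampling scheme implemented in MOTHRA, and the function name and counts are hypothetical.

```python
def estimate_indigenous_faults(seeded_total, seeded_found, indigenous_found):
    """Ratio estimate of the total number of indigenous faults.

    If a fraction seeded_found / seeded_total of the inserted faults was
    detected, the same fraction of the indigenous faults is assumed detected.
    """
    if seeded_found == 0:
        raise ValueError("no seeded faults were found; the estimate is undefined")
    return indigenous_found * seeded_total / seeded_found

# Made-up counts: 50 faults inserted, 40 of them found, 24 indigenous faults found.
total = estimate_indigenous_faults(50, 40, 24)
remaining = total - 24
```

With these counts the estimated total is 30 indigenous faults, of which 6 would remain undetected. The difficulties listed above (subtlety, non-interaction, cost) all bear on whether the equal-detectability assumption is credible.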

Another key problem in testing is determining when there has been enough testing. For unit testing where much of the testing is white box and the modules are small, one can attempt to check whether all the paths have been covered by the test cases, an idea extended substantially by Horgan and London (1992). However, for integration and system testing, this particular approach, coverage testing, is not possible because of the size and the number of possible paths through the system. Here is another opportunity for using statistical approaches to develop a theory of statistical coverage. Coverage testing relates to deriving methods and algorithms for generating test cases so that one can state, with a very high probability, that one has checked most of the important paths of the software. This kind of methodology has been used with probabilistic algorithms in protocol testing, where the structure of the program can be described in great detail. (A protocol is a very precise description of the interface between two diverse systems.) Lee and Yannakakis (1992) have proposed algorithms whereby one is guaranteed, with a high degree of probability, that all the states of the protocols are checked. The difficulty with this approach is that the number of states becomes large very quickly, and except for a small part of the system under test, it is not clear that such a technique would be practical (under current computing technology). These ideas have been mathematically formalized in the vibrant area of theorem checking and proving (Blum et al., 1990). The key idea is to take transforms of programs such that the results are invariant under these transforms if the software is correct. Thus, any variation in the results suggests possible faults in the software. Blum et al. (1989) and Lipton (1989), among others, have developed a number of algorithms to give probabilistic bounds on the correctness of software based on the number of different transformations.

In all of the several approaches to testing discussed above, the number of test cases can be extraordinarily large. Because of the cost of testing and the need to supply software in a reasonable period of time, it is necessary to formulate rules about when to stop testing. Herein lies another set of interesting problems in sequential analysis and statistical decision theory. As pointed out by Dalal and Mallows (1988, 1990, 1992), Singpurwalla (1991), and others, the key issue is to explicitly incorporate the economic trade-off between the decision to stop testing (and absorb the cost of fixing subsequent field faults) and the decision to continue testing (and incur ongoing costs to find and fix faults before release of a software product). Since the testing process is not deterministic, the fault-finding process is modeled by a stochastic reliability model (see Chapter 4 for further discussion). The opportune moment for release is decided using sequential decision theory. The rules are simple to implement and have been used in a number of projects. This framework has been extended to the problem of buying software with some sort of probabilistic guarantee on the number of faults remaining (Dalal and Mallows, 1992). Another extension with practical importance (Dalal and McIntosh, 1994) deals with the issue of a system under test not having been completely delivered at the start of testing. This situation is a common occurrence for large systems, where in order to meet scheduling milestones, testing begins immediately on modules and sets of modules as they are completed.


4 Critique of Some Current Applications of Statistics in Software Engineering

COST ESTIMATION

One of software engineering's long-standing problems is the considerable inaccuracy of the cost, resource, and schedule estimates developed for projects. These estimates often differ from the final costs by a factor of two or more. Such inaccuracies have a severe impact on process integrity and ultimately on final software quality. Five factors contribute to this continuing problem:

1. Most cost estimates have little statistical basis and have not been validated;

2. The value of historical data in developing predictive models is limited, since no consistent software development process has been adopted by an organization;

3. The maturity of an organization's process changes the granularity of the data that can be used effectively in project cost estimation;

4. The reliability of inputs to cost estimation models varies widely; and

5. Managers attempt to manage to the estimates, reducing the validity of historical data as a basis for validation.

Certain of the above issues center on the so-called maturity of an organization (Humphrey, 1988). From a purely statistical research perspective, (5) may be the most interesting area, but the major challenge facing the software community is finding the right metrics to measure in the first place.

Example. The data plotted in Figure 3 pertain to the productivity of a conventional COBOL development environment (Kitchenham, 1992). For each of 46 different products, size (number of entities and transactions) and effort (in person-hours) were measured. From Figure 3, it is apparent that despite substantial variability, a strong (log-log) linear relationship exists between program size and program effort.

A simple model relating effort to size is

$$\log_{10}(\text{effort}) = \alpha + \beta \log_{10}(\text{size}) + \text{noise}.$$


Figure 3. Data on the relationship between development effort and product size in a COBOL development organization.

A least squares fit to these data yields

                 Coeff.    SE        t
Intercept        1.120     0.3024    3.702
log10(size)      1.049     0.1250    8.397
RMS              0.194

These fitted coefficients suggest that development effort is proportional to product size; a formal test of the hypothesis, H: $\beta = 1$, gives a t value at the .65 significance level.

The estimated intercept after fixing $\beta = 1$ is 1.24; the resulting fit and a 95% prediction interval are overlaid on the data in Figure 3. This model predicts that it requires approximately 17 hours ($= 10^{1.24}$) to implement each unit of size.

Such models are used for prediction and tool validation. Consider an additional observation made of a product developed using a fourth-generation language and relational databases. Under the experimental development process, it took 710 hours to implement the product of size 183 (this point is denoted by X in Figure 3). The fitted model predicts that this product would have taken approximately 3,000 hours to complete using the conventional development environment. The 95% prediction interval at X = 183 ranges from approximately 1,000 to 9,000 hours; thus, assuming that other factors are not contributing to the apparent short development cycle of this product, the use of those new fourth-generation tools has demonstrably decreased the development effort (and hence the cost).
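The prediction for the fourth-generation product can be retraced from the reported summary statistics alone. The sketch below is an approximation: it assumes that the RMS of 0.194 from the unconstrained fit also applies with the slope fixed at 1, that the prediction variance is RMS^2 (1 + 1/46), and that 2.014 is the 97.5% point of a t distribution with 45 degrees of freedom. None of these choices is stated in the text.

```python
import math

INTERCEPT = 1.24      # fitted intercept with the slope fixed at 1 (from the text)
RMS = 0.194           # residual scale from the unconstrained fit (assumed to carry over)
N_PRODUCTS = 46       # number of products in the data set
T_QUANTILE = 2.014    # assumed 97.5% point of a t distribution with 45 d.f.

def predict_hours(size):
    """Point prediction and rough 95% prediction limits, in hours."""
    center = INTERCEPT + math.log10(size)
    half_width = T_QUANTILE * RMS * math.sqrt(1 + 1 / N_PRODUCTS)
    return tuple(10 ** v for v in (center, center - half_width, center + half_width))

point, low, high = predict_hours(183)
```

Under these assumptions the point prediction is roughly 3,200 hours with limits of about 1,300 and 7,900 hours, the same order as the interval read from Figure 3. The observed 710 hours falls well below the lower limit either way.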

Statistical Inadequacies in Estimating

Most cost estimation methods develop an initial relationship between the estimated size of a system (in lines of code, for instance) and the resources required to develop it. Such equations are often of the form illustrated in the above example: effort is proportional to size raised to the $\beta$ power. This initial estimate is then adjusted by a number of factors that are thought to affect the productivity of the specific project, such as the experience of the assigned staff, the available tools, the requirements for reliability, and the complexity of the interaction with the customer. Thus the estimating equation assumes the log-linear form:

$$\text{effort} \approx \text{size}^{\beta} \times a_i a_j a_k a_l a_m \cdots a_z,$$

where the a's are the coefficients for the adjustment factors. Unfortunately, these adjustment factors are not treated as variables in a regression equation; rather, each has a set of fixed coefficients (termed "weighting factors") associated with each level of the variable. These are independently applied as if the variables were uncorrelated (an assumption known to be incorrect). These weighting schemes have been developed based on intuition about each variable's potential impact rather than on a statistical model fitting using historical data. Thus, although the relationship between effort and size is often recalibrated for different organizations, the weighting factors are not.
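The structure of such an estimating equation, and the reason it is called log-linear, can be shown in a few lines. The sketch is generic: the `scale` constant and the particular weights are hypothetical and do not come from any published cost model.

```python
import math

def adjusted_effort(size, beta, weights, scale=1.0):
    """effort = scale * size**beta * product of the adjustment-factor weights."""
    return scale * size ** beta * math.prod(weights)

def log_effort(size, beta, weights, scale=1.0):
    """The same model on the log scale, where every term enters additively."""
    return math.log(scale) + beta * math.log(size) + sum(math.log(w) for w in weights)

# Hypothetical project: inexperienced staff (1.5) but good tools (0.8).
effort = adjusted_effort(100, 1.0, [1.5, 0.8], scale=2.0)
```

Because the weights enter additively on the log scale, they could in principle be estimated jointly with $\beta$ by regression on historical data, which is the treatment the text notes is usually missing.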

Exacerbating the problems with existing cost estimation models is the lack of rigorous validation of the equations. For instance, Boehm (1981) has acknowledged that his well-known COCOMO estimating model was not developed using statistical methods. Many individuals marketing cost estimation modeling tools denigrate the value of statistical approaches compared to clever intuition. To the extent that analytical methods are used in the development or validation of these models, they are often performed on data sets that contain as many predictor variables (productivity factors) as projects. Thus determination of the separate or individual contributions of the variables almost certainly depends too much on chance and can be distorted by collinear relationships. These models are rarely subjected to independent validation studies. Further, little research has been done that attempts to restrict these models to including only those productivity factors that really matter (i.e., subset selection).

Because of the lack of statistical rigor in most cost estimation models, software development organizations usually handcraft weighting schemes to fit their historical results. Thus, the specific instantiation of most cost estimation models differs across organizations. Under these conditions, cross-validation of the weighting schemes is very difficult, if not impossible. A new approach to developing cost estimation models would be beneficial, one that invokes sound statistical principles in fitting such equations to historical data and to validating their applicability across organizations. If the instantiation of such models is found to be domain-specific, statistically valid methods should be sought for regenerating accurate models in different domains.

Process Volatility

In immature software development organizations, the processes used differ across projects because they are based on the experiences and preferences of the individuals assigned to each project, rather than on common organizational practice. Thus, in such organizations cost estimation models must attempt to predict the results of a process that varies widely across projects. In poorly run projects the signal-to-noise ratio is low, in that there is little consistent practice that can be used as the basis for dependable prediction. In such projects, neither the size nor the productivity factors provide any consistent insight into the resources required, since they are not systematically related to the processes that will be used.

The historical data collected from projects in immature software development organizations are difficult to interpret because they reflect widely divergent practices. Such data sets do not provide an adequate basis for validation, since process variation can mask underlying relationships. In fact, because the relationships among independent variables may change with variations in the process, different projects may require different values of the parameters in the cost estimation models. As organizations mature and stabilize their processes, the accuracy of the estimating models they use usually increases.

Maturity and Data Granularity

In mature organizations the software development process is well defined and is applied consistently across projects. The more carefully defined the process, the finer the granularity of the processes that can be measured. Thus, as software organizations mature, the entire basis for their cost estimation models can change. Immature organizations have data only at the level of overall project size, number of person-years required, and overall cost. With increasing organizational maturity, it becomes possible to obtain data on process details such as how many reviews must be conducted at each life cycle stage based on the size of the system, how many test cases must be run, and how many defects must be fixed based on the defect removal efficiency of each stage of the verification process. Thus, estimation in fully developed organizations can be based on a bottom-up analysis in which the historical data can be more accurate because the objects of estimation, and the effort they require, are more easily characterized.

As organizations mature, the structure of relevant cost estimation models can change. When process models are not defined in detail, models must take the form of regression equations based on variables that describe the total impact of a predictor variable on a project's development cycle. There is little notion in these models of the detailed practices that make up the totality. In mature organizations such practices are defined and can be analyzed individually and built up into a total estimate. Normally the errors in estimating these smaller components are smaller than the corresponding error at the total project level, and it is assumed that the summary effect of aggregating these smaller errors is still smaller than the error in the estimate at the total project level.

Reliability of Model Inputs

Even if a cost estimation model is statistically sound, the data on which it is based can have low validity. Often, managers do not have sufficient knowledge of crucial variables that must be entered into a model, such as the estimated size of various individual components of a system. In such instances, processes exist for increasing the accuracy of these data. For instance, Delphi techniques can be used by software engineers who have previous experience in developing various system components. The less experience an organization has with a particular component of a system, the less reliable is the size estimate for that component. Typically, component sizes are underestimated, with ruinous effects on the resources and schedule estimated for a project. Sometimes historical "fudge factors" are applied to account for underestimation, although a more rigorous data-based approach is recommended. To aid in identifying the potential risks in a software development project, it would also be beneficial to have reliable confidence bounds for different components of the estimated size or effort.

Statistical methods can be applied to develop prior probabilities (e.g., for Bayesian estimation models) from knowledgeable software engineers and to adjust these using historical data. These methods should be used not only to suggest the confidence that can be placed in an estimate, but also to indicate the components within a system that contribute most to inaccuracies in an estimate.

As projects progress during their life cycle from specifications of requirements to design to generation of code, the information on which estimates can be based grows more reliable: there is thus greater certainty in estimating from the architectural design of a system or the detailed design of each module than in estimating from textual statements. In short, the sources from which estimates can be developed change as the project continues through its development cycle. Each succeeding level of input is a more reliable indicator of the ultimate system size than are the inputs available in earlier stages of development. Thus the overall estimate of size, resources, and schedule potentially becomes more accurate in succeeding phases of a project. Yet it is important to determine the most accurate indicators of crucial parameters such as size, effort, and schedule very early in a project, when the least reliable data are available. As such, there is a need for statistically valid ways of developing model inputs from less reliable forms of data (these inputs must reliably estimate later measures that will be more valid inputs) and of estimating how much error is introduced into an estimate based on the reliability of the inputs.


Managing to Estimates

Complicating the ability to validate cost estimation models from historical data is the fact that project managers try to manage their projects to meet received estimates for cost, effort, schedule, and other such variables. Thus, an estimate affects the subsequent process, and historical data are made artificially more accurate by management decisions and other factors that are often masked in project data. For instance, projects whose required level of effort has been underestimated often survive on large amounts of unreported overtime put in by the development staff. Moreover, many managers are quite skilled at cutting functionality from a system in order to meet a delivery date. In the worst cases, engineers short-cut their ordinary engineering processes to meet an unrealistic schedule, usually with disastrous results. Techniques for modeling systems dynamics provide one way to characterize some of the interactions that occur between an estimate and the subsequent process that is generated by the estimate (Abdel-Hamid, 1991).

The validation of cost estimation models must be conducted with an understanding of such interactions between estimates and a project manager's decisions. Some of these dynamics may be usefully described by statistical models or by techniques developed in psychological decision theory (Kahneman et al., 1982). Thus, it may be possible to develop a statistical dynamic model (e.g., a multistage linear model) that characterizes the reliability of inputs to an estimate, the estimate itself, decisions made based on the estimate, the resulting performance of the project, measures that emerge later in the project, subsequent decision making based on these later measures, and the ultimate performance of the project. Such models would be valuable in helping project managers to understand the ramifications of decisions based on an initial estimate and also on subsequent periodic updates.

ASSESSMENT AND RELIABILITY

Reliability Growth Modeling

Many reliability models of varying degrees of plausibility are available to software engineers. These models are applied at either the testing stage or the field-monitoring stage. Most of the models take as input either failure time or failure count data and fit a stochastic process model to reflect reliability growth. The differences among the models lie principally in assumptions made based on the underlying stochastic process generating the data. A brief survey of some of the well-known models and their assumptions and efficacy is given in Abdel-Ghaly et al. (1986).

Although many software reliability growth models are described in the literature, the evidence suggests that they cannot be trusted to give accurate predictions in all cases and also that it is not possible to identify a priori which model (if any) will be trustworthy in a particular context. No doubt work will continue in refining these models and introducing "improved" ones. Although such work is of some interest, the panel does not believe that it merits extensive research by the statistical community, but thinks rather that statistical research could be directed more fruitfully to providing insight to the users of the models that currently exist.

The problem is validation of such models with respect to a particular data source, to allow users to decide which, if any, prediction scheme is producing accurate results for the actual software failure process under examination. Some work has been done on this problem (Abdel-Ghaly et al., 1986; Brocklehurst and Littlewood, 1992), using a combination of probability forecasting and sequential prediction, the so-called prequential approach developed by Dawid (1984), but this work has so far been rather informal. It would be helpful to have more procedures for assessing the accuracy of competing prediction systems that could then be used routinely by industrial software engineers without advanced statistical training.
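The prequential idea of scoring each prediction system on its one-step-ahead forecasts can be sketched briefly. The two predictors below are deliberately crude stand-ins (an exponential model fitted to all past inter-failure times versus one fitted to a recent window); they are not among the published reliability growth models, and the inter-failure times are made up.

```python
import math

def exponential_predictor(history):
    """Predictive density for the next inter-failure time: exponential with the past mean."""
    mean = sum(history) / len(history)
    return lambda t: math.exp(-t / mean) / mean

def recent_window_predictor(history, window=3):
    """Same form, fitted only to the most recent observations."""
    return exponential_predictor(history[-window:])

def prequential_log_likelihood(times, predictor, warmup=3):
    """Sum of log one-step-ahead predictive densities, each evaluated at the
    observation it was meant to predict and fitted only to earlier data."""
    total = 0.0
    for i in range(warmup, len(times)):
        density = predictor(times[:i])
        total += math.log(density(times[i]))
    return total

times = [1, 2, 2, 3, 5, 4, 8, 9, 12, 15, 14, 20]  # made-up inter-failure times
log_ratio = (prequential_log_likelihood(times, recent_window_predictor)
             - prequential_log_likelihood(times, exponential_predictor))
```

A positive `log_ratio` would favor the windowed predictor on this data source, a negative one the all-history predictor. The comparison uses only predictions made before each failure was observed, so it needs no held-out data.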

Statistical inference in the area of reliability tends almost invariably to be of a classical frequentist kind, even though many of the models originate from a subjective Bayesian probability viewpoint. This unsatisfactory state of affairs arises from the sheer difficulty of performing the computations necessary for a proper Bayesian analysis. It seems likely that there would be profit in trying to overcome these problems, perhaps via the Gibbs sampling approach (see, e.g., Smith and Roberts, 1993).

Another fruitful avenue for research concerns the introduction of explanatory variables, so-called covariates, into software reliability growth models. Most existing models assume that no explanatory variables are available. This assumption is assuredly simplistic concerning testing for all but small systems involving short development and life cycles. For large systems (i.e., those with more than 100,000 lines of code) there are variables, other than time, that are very relevant. For example, it is typically assumed that the number of faults (found and unfound) in a system under test remains stable, i.e., that the code remains frozen during testing. However, this is rarely the case for large systems, since aggressive delivery cycles force the final phases of development to overlap with the initial stages of system testing. Thus, the size of code and, consequently, the number of faults in a large system can vary widely during testing. If these changes in code size are not considered, the result, at best, is likely to be an increase in variability and a loss in predictive performance, and at worst, a poorly fitting model with unstable parameter estimates. Taking this logic one step further suggests the need to distinguish between new lines of code (new faults) and code coming from previous releases (old faults), and possibly the age of different parts of code. Of course, one can carry this logic to an extreme and have unwieldy models with many covariates. In practice, what is required is a compromise between the two extremes of having no covariates and having hundreds of them. This is where opportunities abound for applying state-of-the-art statistical modeling techniques. Described briefly below is a case study reported by Dalal and McIntosh (1994) dealing with reliability modeling when code is changing.


Example. Consider a new release of a large telecommunications system with approximately 7 million noncommentary source lines (NCSLs) and 400,000 noncommentary new or changed source lines (NCNCSLs). For a faster delivery cycle, the source code used for system test was updated every night throughout the test period. At the end of each of 198 calendar days in the test cycle, the number of faults found, NCNCSLs, and the staff time spent on testing were collected. Figure 4 (top) portrays growth of the system as a function of staff time. The data are provided in Table 3.

Figure 4. Plots of module size (NCNCSLs) versus staff time (days) for a large telecommunications software system (top). Observed and fitted cumulative faults versus staff time (bottom). The dotted line (barely visible) represents the fitted model, the solid line represents the observed data, and the dashed line (also difficult to see) is the extrapolation of the fitted model.


Table 3. Data on cumulative size (NCNCSLs), cumulative staff time (days), and cumulative faults for a large telecommunications system on 198 consecutive calendar days (with duplicate lines representing weekends or holidays).

Cum.     Cum.    Cum.        Cum.     Cum.    Cum.        Cum.     Cum.    Cum.
Staff    Faults  NCNCSLs     Staff    Faults  NCNCSLs     Staff    Faults  NCNCSLs
Days                         Days                         Days

0        0       0           334.8    231     261669      776.5    612     318476
4.8      0       16012       342.7    243     262889      793.5    621     320125
6        0       16012       350.5    252     263629      807.2    636     321774
6        0       16012       356.3    259     264367      811.8    639     321774
14.3     7       32027       360.6    271     265107      812.5    639     321774
22.8     7       48042       365.7    277     265845      829      648     323423
32.1     7       58854       365.7    277     265845      844.4    658     325072
41.4     7       69669       365.7    277     265845      860.5    666     326179
51.2     11      80483       374.9    282     266585      876.7    674     327286
51.2     11      80483       386.5    290     267325      892      679     328393
51.2     11      80483       396.5    300     268607      895.5    686     328393
60.6     12      91295       408      310     269891      895.5    686     328393
70       13      102110      417.3    312     271175      910.8    690     329500
79.9     15      112925      417.3    312     271175      925.1    701     330608
91.3     20      120367      417.3    312     271175      938.3    710     330435
97       21      127812      424.9    321     272457      952      720     330263
97       21      127812      434.2    326     273741      965      729     330091
97       21      127812      442.7    339     275025      967.7    729     330091
97       21      127812      451.4    346     276556      968.6    731     330091
107.7    22      135257      456.1    347     278087      981.3    740     329919
119.1    28      142702      456.1    347     278087      997      749     329747
127.6    40      150147      456.1    347     278087      1013.9   759     330036
135.1    44      152806      460.8    351     279618      1030.1   776     330326
135.1    44      152806      466      356     281149      1044     781     330616
135.1    44      152806      472.3    359     283592      1047     782     330616
142.8    46      155464      476.4    362     286036      1047     782     330616
148.9    48      158123      480.9    367     288480      1059.7   783     330906
156.6    52      160781      480.9    367     288480      1072.6   787     331196
163.9    52      167704      480.9    367     288480      1085.7   793     331486
169.7    59      174626      486.8    374     290923      1098.4   796     331577
170.1    59      174626      495.8    376     293367      1112.4   797     331669
170.6    59      174626      505.7    380     295811      1113.5   798     331669
174.7    63      181548      516      392     298254      1114.1   798     331669
179.6    68      188473      526.2    399     300698      1128     802     331760
185.5    71      194626      527.3    401     300698      1139.1   805     331852
194      88      200782      527.3    401     300698      1151.4   811     331944
200.3    93      206937      535.8    405     303142      1163.2   823     332167
200.3    93      206937      546.3    415     304063      1174.3   827     332391
200.3    93      206937      556.1    425     305009      1174.3   827     332391
207.2    97      213093      568.1    440     305956      1174.3   827     332391
211.9    98      219248      577.2    457     306902      1184.6   832     332615
217      105     221355      578.3    457     306902      1198.3   834     332839
223.5    113     223462      578.3    457     306902      1210.3   836     333053
227      113     225568      587.2    467     307849      1221.1   839     333267
227      113     225568      595.5    473     308795      1230.5   842     333481
227      113     225568      605.6    480     309742      1231.6   842     333481
234.1    122     227675      613.9    491     310688      1231.6   842     333481
241.6    129     229784      621.6    496     311635      1240.9   844     333695
250.7    141     233557      621.6    496     311635      1249.5   845     333909
259.8    155     237330      621.6    496     311635      1262.2   849     335920
268.3    166     241103      623.4    496     311635      1271.3   851     337932
268.3    166     241103      636.3    502     311750      1279.8   854     339943
268.3    166     241103      649.7    517     311866      1281     854     339943
277.2    178     244879      663.9    527     312467      1281     854     339943
285.5    186     247946      675.1    540     313069      1287.4   855     341955
294.2    190     251016      677.4    543     313069      1295.1   859     341967
295.7    190     251016      677.9    544     313069      1304.8   860     341979
298      190     254086      688.4    553     313671      1305.8   865     342073
298      190     254086      698.1    561     314273      1313.3   867     342168
298      190     254086      710.5    573     314783      1314.4   867     342168
305.2    195     257155      720.9    581     315294      1314.4   867     342168
312.3    201     260225      731.6    584     315805      1320     867     342262
318.2    209     260705      732.7    585     315805      1325.3   867     342357
328.9    224     261188      733.6    585     315805      1330.6   870     342357
334.8    231     261669      746.7    586     316316      1334.2   870     342358
334.8    231     261669      761      598     316827      1336.7   870     342358

SOURCE: Dalal and McIntosh (1994).


Assume that the testing process is observed at times $t_i$, $i = 0, \ldots, h$, and that at any given time, the amount of time it takes to find a specific "bug" is exponential with rate $\mu$. At time $t_i$, the total number of faults remaining in the system is Poisson with mean $\lambda_{i+1}$, and NCNCSL is increased by an amount $C_i$. This change adds a Poisson number of faults with mean proportional to $C_i$, say $\theta C_i$. These assumptions lead to the mass balance equation, namely, that the expected number of faults in the system at $t_i$ (after possible modification) is the expected number of faults in the system at $t_{i-1}$ adjusted by the expected number found in the interval $(t_{i-1}, t_i)$ plus the faults introduced by the changes made at $t_i$:

$$\lambda_{i+1} = \lambda_i e^{-\mu (t_i - t_{i-1})} + \theta C_i,$$

for $i = 1, \ldots, h$. Note that $\theta$ represents the number of new faults entering the system per additional NCNCSL, and $\lambda_1$ represents the number of faults in the code at the start of system test. Both of these parameters make it possible to differentiate between the new code added in the current release and the older code. For the data at hand, the estimated parameters are $\theta = 0.025$, $\mu = 0.002$, and $\lambda_1 = 41$. The fitted and the observed data are plotted against staff time in Figure 4 (bottom). The fit is evidently very good. Of course, assessing the model on independent or new data is required for proper validation.
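The mass balance recursion is straightforward to iterate once the parameters are given. The sketch below only evaluates the recursion; it does not reproduce the estimation procedure of Dalal and McIntosh (1994), and the function name, argument layout, and toy inputs are hypothetical.

```python
import math

def expected_remaining_faults(lambda_1, mu, theta, times, code_added):
    """Iterate lambda_{i+1} = lambda_i * exp(-mu * (t_i - t_{i-1})) + theta * C_i.

    times: [t_0, t_1, ..., t_h], cumulative testing effort at each observation.
    code_added: [C_1, ..., C_h], NCNCSLs added at each t_i.
    Returns [lambda_1, ..., lambda_{h+1}].
    """
    if len(times) != len(code_added) + 1:
        raise ValueError("need one more time point than code increments")
    lambdas = [lambda_1]
    for i, added in enumerate(code_added, start=1):
        # Fraction of the faults present that escape detection during (t_{i-1}, t_i).
        survive = math.exp(-mu * (times[i] - times[i - 1]))
        lambdas.append(lambdas[-1] * survive + theta * added)
    return lambdas

# Toy inputs, not the Table 3 data: three observation times, two code drops.
trajectory = expected_remaining_faults(41.0, 0.002, 0.0025, [0, 10, 20], [1000, 500])
```

Setting `theta` to zero recovers a frozen-code model with pure exponential decay, and setting `mu` to zero gives pure accumulation, which makes the separate roles of the two parameters easy to see.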

The efficacy of creating a statistical model is now examined. The estimate of $\theta$, the number of faults introduced per added NCNCSL, is highly significant, both statistically and practically, showing the need for incorporating changes in NCNCSLs as a covariate. Its numerical value implies that for every additional 10,000 NCNCSLs added to the system, 25 faults are being added as well. For these data, the predicted number of faults at the end of the test period is Poisson distributed with mean 145. Dividing this quantity by the total NCNCSLs gives 4.2 per 10,000 NCNCSLs as an estimated field fault density. These estimates of the incoming and outgoing quality are very valuable in judging the efficacy of system testing and for deciding where resources should be allocated to improve the quality. Here, for example, system testing was effective in that it removed 21 of every 25 faults. However, it raises another issue: 25 faults per 10,000 NCNCSLs entering system test may be too high and a plan ought to be considered to improve the incoming quality.
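The arithmetic behind the outgoing density and the removal fraction can be written out, taking the total NCNCSLs to be the final cumulative value in Table 3 (342,358), which is an assumption about the divisor used:

$$\frac{145}{342{,}358} \times 10{,}000 \approx 4.2, \qquad 25 - 4.2 \approx 21.$$

That is, of roughly 25 faults entering system test per 10,000 NCNCSLs, about 4 are predicted to remain at release and about 21 to have been removed.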

None of the above conclusions could have been made without using a statistical model. These conclusions are valuable for controlling and improving the reliability testing process. Further, for this analysis it was essential to have a covariate other than time.

Influence of the Development Process on Software Dependability

As noted above, surprisingly little use has been made of explanatory variable models, such as proportional hazards regression, in the modeling of software dependability. A major reason, the panel believes, is the difficulty that software engineers have in identifying variables that can play a genuinely explanatory role. Another difficulty is the comparative paucity of data owing to the difficulties of replication. Thus, for example, for purposes of identifying those attributes of the software development process that are drivers of the final product's dependability, it is very difficult to obtain something akin to a "random sample" of "similar" subject programs. Those issues are not unlike the ones faced in other contexts where these techniques are used, for example, in medical trials, but they seem particularly acute for evaluation of software dependability.

A further problem is that the observable in this software development application is a realization of a stochastic process, and not merely of a lifetime random variable. Thus there seems to be an opportunity for research into models that, on the one hand, capture current understanding of the nature of the growth in reliability that takes place as a result of debugging and, on the other hand, allow input about the nature of the development process or the architecture of the product.

Influence of the Operational Environment on Software Dependability

It can be misleading to talk of the reliability of a program: as is the case for the reliability of hardware, the reliability of a program depends on the nature of its use. For software, however, one does not have the simple notions of stress that are sometimes plausible in the hardware context. It is thus not possible to infer the reliability of a program in one environment from evidence of the program's failure behavior in another. This is a serious difficulty for several reasons.

First,onewouldliketobeabletopredicttheoperationalreliabilityofaprogramfromtestdata.Thesimplestapproachatpresentistoensurethatthetestenvironment,thatis,thetypeofusage,isexactlysimilarto,ordiffersinknownproportionsforspecifiedstratafrom,theoperationalenvironment.Realsoftwaretestingregimesareoften

deliberatelymadetobedifferentfromoperationalones,sinceitisclaimedthatinthiswayreliabilitycanbeachievedmoreefficiently:thisargumentissimilartothatforhardwarestresstestingbutismuchlessconvincinginthesoftwarecontext.

Afurtherreasontobeinterestedinthisproblemofinferringprogramreliabilityisthatmostsoftwaregetsbroadlydistributedtodiverselocationsandisusedverydifferentlybydifferentusers:thereisgreatdisparityinthepopulationofuserenvironments.Vendorswouldliketobeabletopredictdifferentusers'perceptionsofaproduct'sreliability,butitisclearlyimpracticaltoreplicateinatesteverydifferentpossibleoperationalenvironment.Vendorswouldalsoliketobeabletopredictthecharacteristicsofapopulationofusers.Thusitmightbeexpectedthatalessdisparatepopulationofuserswouldbepreferabletoamoredisparateone:intheformercase,forexample,problemsreportedatdifferentsitesmightbesimilarandthusbelessexpensivetofix.

Explanatory variable modeling may play a useful role if suitably informative, measurable attributes of operational usage can be identified. There may be other ways of forming stochastic characterizations of operational environments. Markov models of the successive activation of modules, or of functions, have been proposed (Littlewood, 1979; Siegrist, 1988a,b) but have not been widely used. Further work on such approaches, and on the problems of statistical inference associated with them, could be promising.
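As a minimal sketch of how such a Markov characterization might be used (not the formulation of Littlewood or Siegrist), the Python fragment below treats a run as a walk through modules that ends at an exit state. The module names, the per-visit failure probabilities, and the two usage profiles are all hypothetical values chosen for illustration. The point is only that the same program can have a different per-run reliability under different usage.

```python
# Hypothetical per-visit failure probabilities for three modules.
FAILURE_PROB = {"parse": 0.001, "compute": 0.01, "report": 0.0005}

# Two hypothetical usage profiles: transition probabilities between modules.
# "EXIT" marks successful completion of a run.
PROFILE_LIGHT = {
    "parse": {"compute": 0.2, "report": 0.8},
    "compute": {"report": 1.0},
    "report": {"EXIT": 1.0},
}
PROFILE_HEAVY = {
    "parse": {"compute": 0.9, "report": 0.1},
    "compute": {"report": 1.0},
    "report": {"EXIT": 1.0},
}


def run_reliability(profile, failure_prob, start="parse", sweeps=1000):
    """Probability that a run starting at `start` reaches EXIT without failure.

    Solves R[m] = (1 - f[m]) * sum_j P[m][j] * R[j], with R[EXIT] = 1,
    by repeated substitution (which also copes with loops between modules).
    """
    reliability = {module: 0.0 for module in profile}
    reliability["EXIT"] = 1.0
    for _ in range(sweeps):
        for module, transitions in profile.items():
            reach_exit = sum(p * reliability[nxt] for nxt, p in transitions.items())
            reliability[module] = (1.0 - failure_prob[module]) * reach_exit
    return reliability[start]


light = run_reliability(PROFILE_LIGHT, FAILURE_PROB)
heavy = run_reliability(PROFILE_HEAVY, FAILURE_PROB)
print(f"light use of compute: {light:.4f}")
print(f"heavy use of compute: {heavy:.4f}")
```

The profile that routes more runs through the least reliable module yields the lower reliability, so test evidence gathered under one profile does not transfer directly to the other.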

Safety-Critical Software and the Problem of Assuring Ultrahigh Dependability

It seems clear that computers will play increasingly critical roles in systems upon which human lives depend. Already, systems are being built that require extremely high dependability: a figure of $10^{-9}$ probability of failure per hour of flight has been stated as the requirement for recent fly-by-wire systems in civil aircraft. There are clear limitations to the levels of dependability that can be achieved when we are building systems of a complexity that precludes claims that they are free of design faults. More importantly, even if we were able to build a system to meet a requirement for ultrahigh dependability, we could have only low confidence that we had achieved that goal, because the problem of assessing these levels is such that it would be impractical to acquire sufficient supporting evidence (Littlewood and Strigini, 1993).
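A rough calculation illustrates why the evidence is impractical to acquire. The sketch below assumes, purely for illustration, a constant failure rate (exponential times to failure) and failure-free operation throughout the test; it is a classical simplification, not the argument of Littlewood and Strigini (1993). Under these assumptions, observing no failures in $t$ hours supports the claim that the rate is below $\lambda$ with confidence $1 - e^{-\lambda t}$.

```python
import math


def failure_free_hours_needed(target_rate_per_hour, confidence):
    """Hours of failure-free operation needed so that 1 - exp(-rate * t) >= confidence.

    Assumes a constant failure rate and no failures during the test.
    """
    return -math.log(1.0 - confidence) / target_rate_per_hour


hours = failure_free_hours_needed(1e-9, 0.99)
years = hours / (24 * 365.25)
print(f"{hours:.3e} hours, roughly {years:,.0f} years of continuous operation")
```

Even under these favorable assumptions the required exposure is measured in billions of hours, far beyond what any test program could supply.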

Although a complete solution to the problem of assessing ultrahigh dependability is not anticipated, there is certainly room for improving on what can be done currently. Probabilistic and statistical problems abound in this area, and it is necessary to squeeze as much as possible from relatively small amounts of often disparate evidence. The following are some of the areas that could benefit from investigation.

Design Diversity, Fault Tolerance, and General Issues of Dependence

One promising approach to the problem of achieving high dependability (here reliability and/or safety) is design diversity: building two or more versions of the required program and allowing an adjudication mechanism (e.g., a voter) to operate at run-time. Although such systems have been built and are in operation in safety-critical contexts, there is little theoretical understanding of their behavior in operation. In particular, the reliability and safety models are quite poor.

For example, there is ample evidence (Knight and Leveson, 1986) that, in the presence of design faults, one cannot simply assume that different versions will fail independently of one another. Thus the simple hardware reliability models that involve mere redundancy, and assume independence of component failures, cannot be used. It is only quite recently that probability modeling has started to address this problem seriously (Eckhardt and Lee, 1985; Littlewood and Miller, 1989). These models provide a formal conceptual framework within which it is possible to reason about the subtle issues of conditional independence involved in the failure processes of design-diverse systems. However, they provide little quantitative practical assistance to a software designer or evaluator.
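The flavor of these models can be conveyed with a small numerical sketch in the spirit of Eckhardt and Lee (1985). It assumes a "difficulty" value for each class of input, namely the probability that a randomly chosen version fails on that input, and that versions fail independently only once the input is fixed. The input classes, usage probabilities, and difficulty values are hypothetical.

```python
# Hypothetical input classes: (probability of occurrence in use, difficulty),
# where difficulty is the chance that a randomly chosen version fails on it.
INPUT_CLASSES = [
    (0.90, 0.0001),  # routine inputs
    (0.09, 0.001),   # unusual inputs
    (0.01, 0.05),    # rare, hard inputs
]

# Probability that a single version fails on a randomly selected input.
p_single = sum(usage * difficulty for usage, difficulty in INPUT_CLASSES)

# Probability that two independently developed versions both fail on the
# same input: failures are independent only once the input is fixed.
p_both = sum(usage * difficulty ** 2 for usage, difficulty in INPUT_CLASSES)

# What a naive independence assumption would predict.
p_both_if_independent = p_single ** 2

print(f"single version:        {p_single:.2e}")
print(f"pair, model:           {p_both:.2e}")
print(f"pair, if independent:  {p_both_if_independent:.2e}")
```

Because difficulty varies across inputs, the modeled probability of coincident failure exceeds the independence prediction (the mean of a square is at least the square of the mean), which is the qualitative point of the models cited above.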

Further probabilistic modeling is needed to elucidate some of the complex issues. For example, little attention has been paid to modeling the full fault-tolerant system, involving diversity and adjudication. In particular, the properties of the stochastic process of failures of such systems are not understood. If, as seems likely, individual versions of a program in a real-time control system exhibit clusters of failures in time, how does the cluster process of the system relate to the cluster processes of the individual versions? Although such issues seem narrowly technical, they are vitally important in the design of real systems, whose physical integrity may be sufficient to survive one or two failed input cycles, but not many.

Another area that has had little work is probabilistic modeling of different possible adjudication mechanisms and their failure processes.

Judgment and Decision-making Framework

Although probability seems to be the most appropriate mechanism for representing uncertainty about system dependability, other candidates such as Shafer-Dempster and possibility theories might be plausible alternatives in safety-critical contexts where quantitative measures are required in the absence of data, for example, when one is forced to rely on the engineering judgment of an expert. Further work is needed to elucidate the relative advantages and disadvantages of the different approaches applicable in the software engineering domain.

There is evidence that human judgment, even in "hard" sciences such as physics, can be seriously in error (Henrion and Fischhoff, 1986): people seem to make consistent errors and tend to be optimistic in their own judgment regarding their likely error. It is likely that software engineering judgments are similarly fallible, and so this area calls for some statistical experimentation. In addition, it would be beneficial to have formal mechanisms for assessing whether judgments are well calibrated and for recalibrating judgment and prediction schemes (of humans or models) that have been shown to be inaccurate. This problem has some similarity to the problems of validating software reliability models, already mentioned, in which prequential likelihood plays a vital role. It also bears on more general applications of Bayesian modeling where elicitation of a priori probability values is required.
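One simple calibration check, sketched below under assumed data, compares the stated confidence of expert intervals with how often the intervals actually contain the value later observed. The effort estimates and outcomes are hypothetical; the check is a generic one and is not taken from Henrion and Fischhoff (1986).

```python
# Hypothetical expert judgments: 90% intervals for effort (person-months)
# together with the value eventually observed.
JUDGMENTS = [
    ((10, 14), 19),
    ((4, 6), 5),
    ((20, 30), 41),
    ((8, 12), 11),
    ((15, 18), 24),
    ((2, 3), 3),
    ((30, 40), 38),
    ((6, 9), 13),
]
STATED_COVERAGE = 0.90


def observed_coverage(judgments):
    """Fraction of intervals that contain the observed value."""
    hits = sum(1 for (low, high), actual in judgments if low <= actual <= high)
    return hits / len(judgments)


coverage = observed_coverage(JUDGMENTS)
print(f"stated {STATED_COVERAGE:.0%}, observed {coverage:.0%}")
if coverage < STATED_COVERAGE:
    print("intervals look too narrow (overconfident)")
```

With real data, a persistent gap between stated and observed coverage would be the signal that recalibration (for example, widening future intervals) is needed.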

It seems inevitable that reasoning and judgment about the fitness of safety-critical systems will depend on evidence that is disparate in nature. Such evidence could include data on failures, as in reliability growth models; human expert judgment; results regarding the efficacy of development processes; information about the architecture of a system; or evidence from formal verification. If the required judgment depends on a numerical assessment of a system's dependability, there are clearly important issues concerning the composition of very different kinds of evidence from different sources. These issues may, indeed, be overriding when it comes to choosing among the different ways of representing uncertainty. Bayes' theorem, for example, may provide an easier way than does possibility theory to combine information from different sources of uncertainty.
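As a hedged illustration of combining two kinds of evidence with Bayes' theorem, the sketch below encodes an expert's judgment about the probability of failure per demand as a Beta prior and updates it with failure-free test demands. The conjugate Beta-binomial form and all numbers are assumptions made for the example, not a method proposed in the text.

```python
def beta_update(prior_a, prior_b, failures, successes):
    """Posterior Beta parameters after observing independent demands."""
    return prior_a + failures, prior_b + successes


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical expert prior: about 1 failure expected in 100 demands.
prior_a, prior_b = 1.0, 99.0

# Hypothetical test evidence: 900 demands, none of which failed.
post_a, post_b = beta_update(prior_a, prior_b, failures=0, successes=900)

print(f"prior mean:     {beta_mean(prior_a, prior_b):.4f}")
print(f"posterior mean: {beta_mean(post_a, post_b):.4f}")
```

The prior plays the role of expert judgment and the likelihood that of failure data; other kinds of evidence would need their own explicit model before they could be folded in the same way.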

A particularly important problem concerns the way in which deterministic reasoning can be incorporated into the final assessment of a system. Formal methods of achieving dependability are becoming increasingly important. Such methods range from formal notations, which assist in the elicitation and expression of requirements, to full mathematical verification of the correspondence between a formal specification and an implementation. One view is that these approaches incorporating deterministic reasoning to system development remove a particular type of uncertainty, leaving others untouched (uncertainty about the completeness of a formal specification, the possibility of incorrect proof, and so on). One should factor into the final assessment of a system's dependability the contribution from such deterministic, logical evidence, nevertheless keeping in mind that there is an irreducible uncertainty in one's possible knowledge of the failure behavior of a system.

Structural Modeling Issues

Concerns about the safety and reliability of software-based systems necessarily arise from their inherent complexity and novelty. Systems now being built are so complex that they cannot be guaranteed to be free from design faults. The extent to which confidence can be carried over from the building of previous systems is much more limited in software engineering than in "real" engineering, because software-based systems tend to be characterized by a great deal of novelty.

Designers need help in making decisions throughout the design process, especially at the very highest level. Real systems are often difficult to assess because of early decisions regarding how much system control will depend on computers, hardware, and humans. For the Airbus A320, for example, the early decision to place a high level of trust in the computerized fly-by-wire system meant that this system (and thus its software) needed to have a better than probability of failure in a typical flight. Stochastic modeling might aid in such high-level design decisions so that designers can make "what if" calculations at an early stage.

Experimentation, Data Collection, and General Statistical Techniques

A dearth of data has been a problem in much of safety-critical software engineering since its inception. Only a handful of published data sets exists even for the software reliability growth problem, which is by far the most extensively developed aspect of software dependability assessment. When the lack of data arises from the need for confidentiality (industrial companies are often reluctant to allow access to data on software failures because of the possibility that people may think less highly of their products), little can be done beyond making efforts to resolve confidentiality problems. However, in some cases the available data are sparse because there is no statistical expertise on hand to advise on ways in which data can be collected cost-effectively. It may be worthwhile to attempt to produce general guidelines for data collection that address the specific difficulties of the software engineering problem domain.

With notable exceptions (Eckhardt et al., 1991; Knight and Leveson, 1986), experimentation has so far played a low-key role in software engineering research. Somewhat surprisingly, in view of its difficulty and cost, the most extensive experimentation has investigated the efficacy of design diversity. Other areas where experimental approaches seem feasible and should be encouraged include the obvious and general question of which software development methods are most cost-effective in producing software products with desirable attributes such as dependability. Statistical advice on the design of such experiments would be essential; it might also be the case that innovation in the design of experiments could make feasible some investigations that currently seem too expensive to contemplate: the main problem arises from the need for replication over many software products.

On the other hand, areas where experiments can be conducted without the replication problem being overwhelming involve the investigation of quite restricted hypotheses about the effectiveness of specific techniques. For example, experimentation could address whether the techniques that are claimed to be effective for achieving reliability (i.e., effectiveness of debugging) are significantly better than those, such as operational testing, that will allow reliability to be measured.

SOFTWARE MEASUREMENT AND METRICS

Measurement is at the foundation of science and engineering. An important goal shared by software engineers and statisticians is to derive reliable, reproducible, and accurate measures of software products and processes. Measurements are important for assessing the effects of proposed "improvements" in software production, whether they be technological or process oriented. Measurements serve an equally important role in scheduling, planning, resource allocation, and cost estimation (see the first section in this chapter).

Early pioneering work by McCabe (1976) and Halstead (1977) seeded the field of software metrics; an overview is provided by Zuse (1991). Much of the attention in this area has focused on static measurements of code. Less attention has been paid to dynamic measurements of software (e.g., measuring the connectivity of software modules under operating conditions) and aspects of the software production process such as software reuse, especially in systems employing object-oriented languages.

The most widely used code metric, the NCSL (noncommentary source line), is often used as a surrogate for functionality. Surprisingly, since software is now nearly 50 years old, standards for counting NCSLs remain elusive in practice. For example, should a single, two-line statement in C language count as one NCSL or two?
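The ambiguity is easy to demonstrate. The sketch below counts NCSLs in a small C fragment under two conventions that are assumptions of this example rather than established standards: physical lines that remain after comments and blank lines are removed, and statements counted naively by semicolons. The comment stripping is deliberately crude (it ignores string literals, and `for` headers would inflate the semicolon count).

```python
import re

C_SOURCE = """\
/* add two numbers */
int total = first_value +
            second_value;  // statement continued on a second line

return total;
"""


def strip_comments(source):
    """Remove /* ... */ and // comments (naive: ignores string literals)."""
    without_block = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", without_block)


def physical_ncsl(source):
    """Count non-blank lines left after comments are removed."""
    return sum(1 for line in strip_comments(source).splitlines() if line.strip())


def logical_ncsl(source):
    """Count statements, approximated by the number of semicolons."""
    return strip_comments(source).count(";")


print("physical lines:", physical_ncsl(C_SOURCE))
print("statements:    ", logical_ncsl(C_SOURCE))
```

The two conventions disagree on this fragment, so any comparison of NCSL-based figures across projects needs to state which convention was used.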

Counts of tokens (operators or operands), delimiters, and branching statements are used as other static metrics. Although some of these are clearly measures of software size, others purport to measure more subtle notions of software complexity and structure. It has been observed that all such metrics are highly correlated with size. At the panel's information-gathering forum, Munson (1993) concluded that current software metrics capture approximately three "independent" features of a software module: program control, program size, and data structure. A statistical (principal-components) analysis of 13 metrics on HAL programs in the space shuttle program was the key to this finding. While one might argue that performing a common statistical decomposition of multivariate data is hardly novel, it most certainly is in software engineering. The important implication of that finding is that there are features of software that are not being captured by the existing battery of software metrics (e.g., cohesion and coupling) and if these are key differentiators of potentially high- and low-fault programs, there is no way that an analysis of the available metrics will highlight this condition. On the other side of the ledger, the statistical costs of including "noisy" versions of the same (latent) variable in models and analysis methods that are based on these metrics, such as cost estimation, seem not to have been appreciated. Subset selection methods (e.g., Mallows, 1973) provide one way to assess variable redundancy and the effect on fitted models, but other approaches that use judgment composites, or composites based on other bodies of data (Tukey, 1991), will often be more effective than discarding metrics.
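To make the redundancy argument concrete, the sketch below computes how much of the total variance of three standardized metrics is carried by the first principal component. The six modules and their metric values are hypothetical, and power iteration is used only to keep the example free of external libraries; it is not the analysis reported by Munson (1993).

```python
import math

# Hypothetical metrics for six modules.
METRICS = {
    "ncsl": [100, 200, 300, 400, 500, 600],
    "tokens": [520, 980, 1510, 2050, 2480, 3020],
    "branches": [12, 18, 33, 38, 52, 59],
}


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def leading_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix (power iteration)."""
    size = len(matrix)
    vector = [1.0] * size
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size)) for i in range(size)]
        eigenvalue = math.sqrt(sum(value * value for value in product))
        vector = [value / eigenvalue for value in product]
    return eigenvalue


names = list(METRICS)
corr_matrix = [[correlation(METRICS[a], METRICS[b]) for b in names] for a in names]

# For standardized variables the total variance equals the number of metrics.
share = leading_eigenvalue(corr_matrix) / len(names)
print(f"first principal component carries {share:.1%} of the variance")
```

When nearly all of the variance sits in one component, the metrics are close to being noisy copies of a single latent variable (here, size), which is the situation the paragraph above warns about.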

Metrics typically involve processes or products, are subjective or objective, and involve different types of measurement scales, for example, nominal, ordinal, interval, or ratio. An objective metric is a measurement taken on a product or process, usually on an interval or ratio scale. Some examples include the number of lines of code, development time, number of software faults, or number of changes. A subjective metric may involve a classification or qualification based on experience. Examples include the quality of use of a method or the experience of the programmers in the application or process.

One standard for software measurement is the Basili and Weiss (1984) Goal/Question/Metric paradigm, which has five parameters:

1. An object of the study: a process, product, or any other experience model;

2. A focus: what information is of interest;

3. A point of view: the perspective of the person needing the information;

4. A purpose: how the information will be used; and

5. A determination of what measurements will provide the information that is needed.

The results are studied relative to a particular environment.

5 Statistical Challenges

In comparison with other engineering disciplines, software engineering is still in the definition stage. Characteristics of established disciplines include having defined, time-tested, credible methodologies for disciplinary practice, assessment, and predictability. Software engineering combines application domain knowledge, computer science, statistics, behavioral science, and human factors issues. Statistical research and education challenges in software engineering involve the following:

Generalizing particular experimental results to other settings and projects,

Scaling up results obtained in academic studies to industrial settings,

Combining information across software engineering projects and studies,

Adopting exploratory data analysis and visualization techniques,

Educating the software engineering community as to statistical approaches and data issues,

Developing analysis methods to cope with qualitative variables,

Providing models with the appropriate error distributions for software engineering applications, and

Improving accelerated life testing.

The following sections elaborate on certain of these challenges.

SOFTWARE ENGINEERING EXPERIMENTAL ISSUES

Software engineering is an evolutionary and experimental discipline. As argued forcefully by Basili (1993), it is a laboratory or experimental science. The term "experimental science" has different meanings for engineers and statisticians. For engineers, software is experimental because systems are built, studied, and evaluated based on theory. Each system investigates new ideas and advances the state of the art. For statisticians, the purpose of experiments is to gather statistically valid evidence about the effects of some factor, perhaps involving the process, methodology, or code in a system.

There are three classes of experiments in software engineering:

Case studies,

Academic experiments, and

Industrial experiments.

Case studies are perhaps the most common and involve an "experiment" on a single large-scale project. Academic experiments usually involve a small-scale experiment, often on a program or methodology, typically using students as the experimental subjects. Industrial experiments fall somewhere between case studies and academic experiments. Because of the expense and difficulty of performing extensive controlled experiments on software, case studies are often resorted to. The ideal situation is to be able to take advantage of real-world industrial operations while having as much control as is feasible. Much of the present work in this area is at best anecdotal and would benefit greatly from more rigorous statistical advice and control. The panel foresees an opportunity for innovative work on combining information (see below) from relatively disparate experiences.

Conducting statistically valid software experiments is challenging for several reasons:

The software production process is often chaotic and uncontrolled (i.e., immature);

Human variability is a complicating factor; and

Industrial experiments are very costly and therefore must produce something useful.

Many variables in the software production process are not well understood and are difficult to control for. For software engineering experiments, the factors of interest include the following:

"People" factors: number, level, organization, process experience;

Problem factors: application domain, constraints, susceptibility to change;

Process factors: life cycle model, methods, tools, programming language;

Product factors: deliverables, system size, system reliability, portability; and

Resource factors: target and development machines, calendar time, budget, existing software, and so on.

Each of these characteristics must be modeled or controlled for the experiment to be valid.

Human variability is particularly challenging, given that the difference in quality and productivity between the best and worst programmers may be 20 to 1. For example, in an experiment comparing batch versus interactive computing, Sackman (1970) observed differences in ability of up to 28 to 1 in programmers performing the same task. This variation can overwhelm the effects of a change in methodology that may account for a 10% to 15% difference in quality or productivity.
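A back-of-the-envelope calculation shows the scale of the problem. The sketch below makes several assumptions not found in the text: log productivity is normally distributed, the 20-to-1 range spans about four standard deviations, each subject is measured once, and a two-sided test at the 5% level with 80% power is wanted. It then applies the usual normal-approximation sample size formula for comparing two group means.

```python
import math
from statistics import NormalDist


def subjects_per_group(effect_ratio, best_to_worst_ratio, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect `effect_ratio`.

    Works on the log scale; assumes the best-to-worst range covers roughly
    four standard deviations of log productivity.
    """
    sigma = math.log(best_to_worst_ratio) / 4.0
    delta = math.log(effect_ratio)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) * sigma / delta) ** 2)


needed = subjects_per_group(effect_ratio=1.15, best_to_worst_ratio=20.0)
print(f"about {needed} subjects per group")
```

Under these assumptions a between-subject comparison needs hundreds of programmers per group to detect a 15% effect, which is why designs that let each programmer serve as his or her own control are attractive.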

The human factor is so strongly integrated with every aspect of the subjective discipline of software engineering that it alone is the prime driver of issues to be addressed. The human factor creates issues in the process, the product, and the user environment. Measurements of the objects (the product and the process) are obscured when qualified by the attributes (ambiguous requirements and productivity issues are key examples). Recognizing and characterizing the human attributes within the context of the software process are key to understanding how to include them in system and statistical models.

The capabilities of individuals strongly influence the metrics collected throughout the software production process. Capabilities include experience, intelligence, familiarity with the application domain, ability to communicate with others, ability to envision the problem spatially, and ability to verbally describe that spatial understanding. Although not scientifically founded, anecdotal information supports the incidence of these capabilities (Curtis, 1988).

For software engineering experiments, the key problems involve small sample sizes, high variability, many uncontrolled factors, and extreme difficulty in collecting experimental data. Traditional statistical experimental designs, originally developed for agricultural experiments, are not well suited for software engineering. At the panel's forum, Zweben (1993) discussed an interesting example of an experiment from object-oriented programming, involving a fairly complex design and analysis. Object-oriented programming is an approach that is sweeping the software industry, but for which much of the supporting evidence is anecdotal.

Example. The purpose of the software design and analysis experiment was to gather statistically valid evidence about the effect on effort and quality of using the principles of abstraction, encapsulation, and layering to enhance components of software systems. The experiment was divided into two types of tasks:

1. Enhancing an existing component to provide additional functionality, and

2. Modifying a component to provide different functionality.

The experimental subjects were students in graduate classes on software component design and development. The two approaches for this maintenance problem are "white box," which involves modifying the old code to get the new functionality, and "black box," which involves layering on the new functionality. The experiments were designed to detect, for each task, differences between the two approaches in the time required to make the modification and in the number of associated faults uncovered. Three experiments were conducted. Experiment A involved an unbounded queue component. The subjects were given a basic Ada package implementing enque, deque, and isempty, and the task was to implement the operators add, copy, clear, append, and reverse. The subject was instructed to keep track of the time spent in designing, coding, testing, and debugging each operator, and also the associated number of bugs uncovered in each task. The tasks were completed in two ways: by directly implementing new operations using the representation of the queue, and by layering on the new operators as capabilities. Experiment B involved a partial map component, and experiment C involved an almost constant map component. Given that in experiments involving students, the results may be invalidated by problems with data integrity, for this experiment the student participants were told that the results of the experiment would have no effect on course grades. The code was validated by an instructor to ensure that there were no lingering defects.

The experimental plan was conducted using a crossover design. Each subject implemented the enhancements twice, using both the white box and the black box methods. This particular experimental design could test for the treatment (layering or not) effect and treatment by sequence interaction. The subject differences were nested within the sequences, and the sequences were counterbalanced based on experience level. The carryover effect of the first treatment influences the choice regarding the correct way of testing for treatment effects.
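The basic arithmetic of a two-period crossover analysis can be sketched as follows. The completion times are hypothetical, and the calculation is the textbook estimate for a two-treatment, two-period design under the assumption of no carryover; it is not the analysis Zweben presented.

```python
from statistics import mean

# Hypothetical completion times in hours: (period 1, period 2) for each subject.
# Sequence WB: white box first, then black box.
SEQUENCE_WB = [(9.0, 6.5), (12.0, 9.0), (7.5, 6.0), (10.0, 7.0)]
# Sequence BW: black box first, then white box.
SEQUENCE_BW = [(7.0, 8.5), (9.5, 11.0), (6.0, 8.0), (8.0, 10.5)]


def mean_period_difference(sequence):
    """Average of (period 1 - period 2) within subjects."""
    return mean(first - second for first, second in sequence)


diff_wb = mean_period_difference(SEQUENCE_WB)  # white - black, plus period effect
diff_bw = mean_period_difference(SEQUENCE_BW)  # black - white, plus period effect

# Halving the contrast between sequences cancels subject and period effects,
# leaving the white-box minus black-box difference (assuming no carryover).
treatment_effect = (diff_wb - diff_bw) / 2.0
# Halving the sum instead isolates the period 1 minus period 2 difference.
period_effect = (diff_wb + diff_bw) / 2.0

print(f"white box minus black box: {treatment_effect:.2f} hours")
print(f"period 1 minus period 2:   {period_effect:.2f} hours")
```

Because every comparison is made within a subject, the large programmer-to-programmer differences drop out of the treatment estimate; if carryover is present, this simple contrast is no longer appropriate.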

The statistical model used to represent the behavior in the number of bugs was sophisticated as well, an overdispersed log-linear model. The use of this model allowed for an analysis of nonnormal response data while also preventing invalid inferences that would have occurred had overdispersion not been taken into account. Indeed, only experiment B displayed a significant treatment effect after adjustment for overdispersion.
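A common way to adjust for overdispersion, sketched here with hypothetical bug counts, is to estimate a dispersion factor from the Pearson chi-square statistic and inflate the Poisson standard errors by its square root (a quasi-Poisson adjustment). The text does not say which adjustment the experimenters used, so this is an illustration of the idea only.

```python
import math

# Hypothetical bug counts per subject under each approach (equal group sizes).
WHITE_BOX = [0, 1, 7, 2, 0, 9, 1, 4]
BLACK_BOX = [0, 0, 3, 1, 0, 5, 1, 2]
GROUPS = [WHITE_BOX, BLACK_BOX]


def pearson_dispersion(groups):
    """Pearson chi-square divided by residual degrees of freedom.

    Each group is fitted by its own mean, as in a log-linear model with a
    single treatment factor. Values well above 1 signal overdispersion.
    """
    chi_square = 0.0
    n_obs = 0
    for counts in groups:
        fitted = sum(counts) / len(counts)
        chi_square += sum((y - fitted) ** 2 / fitted for y in counts)
        n_obs += len(counts)
    return chi_square / (n_obs - len(groups))


dispersion = pearson_dispersion(GROUPS)

# Log rate ratio (totals suffice because the groups are the same size)
# and its standard error under a plain Poisson model.
log_ratio = math.log(sum(WHITE_BOX) / sum(BLACK_BOX))
se_poisson = math.sqrt(1.0 / sum(WHITE_BOX) + 1.0 / sum(BLACK_BOX))
se_adjusted = se_poisson * math.sqrt(dispersion)

print(f"dispersion factor: {dispersion:.2f}")
print(f"z, Poisson:  {log_ratio / se_poisson:.2f}")
print(f"z, adjusted: {log_ratio / se_adjusted:.2f}")
```

The adjusted test statistic is smaller than the unadjusted one whenever the dispersion factor exceeds 1, which is how an apparent treatment effect can lose significance once overdispersion is acknowledged.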

COMBINING INFORMATION

The results of many diverse software projects and studies tend to lead to more confusion than insight. The software engineering community would benefit if more value were gained from the work that is being done. To the extent that projects and studies focus on the same end point, statistics can help to fuse the independent results into a consistent and analytically justifiable story.

The statistical methodology that addresses the topic of how to fuse such independent results is relatively new and is termed "combining information"; a related set of tools is provided by meta-analysis. An excellent overview of this methodology was produced by a CATS panel and documented in an NRC report (NRC, 1992) that is now available as an American Statistical Association publication (ASA, 1993). The report documents various approaches to the problem of how to combine information and describes numerous specific applications. One of the recommendations made in it (p. 182) is crucial to achieving advances in software engineering:

The panel urges that authors and journal editors attempt to raise the level of quantitative explicitness in the reporting of research findings, by publishing summaries of appropriate quantitative measures on which the research conclusions are based (e.g., at a minimum: sample sizes, means, and standard deviations for all variables, and relevant correlation matrices).

It is not sensible to merely combine p-values from independent studies. It is clearly better to take weighted averages of effects when the weights account for differences in size and sensitivity across the studies to be combined.

Example. Kitchenham (1991) discusses an issue in cost estimation that involves looking across 10 different sources consisting of 17 different software projects. The issue is whether the exponent $\beta$ in the basic cost estimation model, effort $\propto$ size$^{\beta}$, is significantly different from 1. The usual interpretation of $\beta$ is the "overhead introduced by product size," so that a value greater than 1 implies that relatively more effort is required to produce large software systems than to produce smaller ones. Many cite such "diseconomies of scale" in software production as evidence in support of their models and tools.

The 17 software projects are listed in Table 4. Fortunately, the cited sources contain both point estimates (b) of the exponent and its estimated standard error. These summary statistics can be used to estimate a common exponent and ultimately test the hypothesis that it is different from 1.

Table 4. Reported and derived data on 17 projects concerned with cost estimation.

Study         b      SE(b)  Var(b)    w
Bai-Bas       0.951  0.068  0.004624  21.240
Bel-Leh       1.062  0.101  0.010200  18.990
Your          0.716  0.230  0.052900  10.490
Wing          1.059  0.294  0.086440   7.758
Kemr          0.856  0.177  0.031330  13.550
Boehm.Org     0.833  0.184  0.033860  13.100
Boehm.semi    0.976  0.133  0.017690  16.630
Boehm.Emb     1.070  0.104  0.010820  18.770
Kit-Tay.ICL   0.472  0.323  0.104300   6.813
Kit-Tay.BTSX  1.202  0.300  0.090000   7.550
Kit-Tay.BTSW  0.495  0.185  0.034220  13.040
DS1.1         1.049  0.125  0.015630  17.220
DS1.2         1.078  0.105  0.011020  18.700
DS1.3         1.086  0.289  0.083520   7.938
DS2.New       0.178  0.134  0.017960  16.550
DS2.Ext       1.025  0.158  0.024960  14.830
DS3           1.141  0.077  0.005929  20.670

SOURCE: Reprinted, with permission, from Kitchenham (1992). (c) 1992 by National Computing Centre, Ltd.

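Following the NRC recommendations on combining information across studies (NRC, 1992), the appropriate model (the so-called random effects model in meta-analysis) allows for a systematic difference between projects (e.g., bias in data reporting, management style, and so on) that averages to zero. Under this model, the overall exponent is estimated as a weighted average of the individual exponents where the weights have the form $w_i = 1/[\mathrm{var}(b_i) + \tau^2]$ and the common between-project component of variance is estimated by

$$\hat{\tau}^2 = \max\left\{0,\ \frac{Q - (k - 1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}\right\},$$

where $Q = \sum_i w_i (b_i - \bar{b})^2$, $k$ is the number of projects, and, in both $Q$ and $\hat{\tau}^2$, the weights $w_i$ and the weighted average $\bar{b}$ are evaluated with $\tau^2 = 0$. The statistic $Q$ is itself a test of the homogeneity of projects and under a normality assumption is distributed as $\chi^2_{k-1}$. For these data one obtains $Q = 55.19$, which strongly indicates heterogeneity across projects. Although the random effects model anticipates such heterogeneity, other approaches that model the differences between projects (e.g., regression models) may be more informative. Since no explanatory variables are available, this discussion proceeds using the simpler model.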

The estimated between-project component of variance is $\hat{\tau}^2 = 0.0425$, which is surprisingly large and is perhaps highly influenced by two projects with b's less than 0.5. Combining this estimate with the individual within-project variances leads to the weights given in the final column of Table 4. Thus the overall estimated exponent is $\hat{\beta} = 0.911$ with estimated standard error $s = 0.0640$ ($= \sqrt{1/\sum_i w_i}$). Combining these two estimates leads readily to a 95% confidence interval for $\beta$ of (0.78, 1.04). Thus the data in these studies do not support the diseconomies-of-scale argument.
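The calculation can be retraced from the b and SE(b) columns of Table 4. The sketch below assumes that the between-project variance is estimated by the usual method-of-moments (DerSimonian-Laird) formula; that choice is an assumption of this example, although it does reproduce the reported homogeneity statistic, variance component, pooled exponent, and standard error to about the precision quoted in the text.

```python
import math

# (study, b, SE(b)) transcribed from Table 4.
STUDIES = [
    ("Bai-Bas", 0.951, 0.068), ("Bel-Leh", 1.062, 0.101), ("Your", 0.716, 0.230),
    ("Wing", 1.059, 0.294), ("Kemr", 0.856, 0.177), ("Boehm.Org", 0.833, 0.184),
    ("Boehm.semi", 0.976, 0.133), ("Boehm.Emb", 1.070, 0.104),
    ("Kit-Tay.ICL", 0.472, 0.323), ("Kit-Tay.BTSX", 1.202, 0.300),
    ("Kit-Tay.BTSW", 0.495, 0.185), ("DS1.1", 1.049, 0.125), ("DS1.2", 1.078, 0.105),
    ("DS1.3", 1.086, 0.289), ("DS2.New", 0.178, 0.134), ("DS2.Ext", 1.025, 0.158),
    ("DS3", 1.141, 0.077),
]


def random_effects(estimates, std_errors):
    """Return (Q, tau^2, pooled estimate, standard error) for a random effects model."""
    variances = [se ** 2 for se in std_errors]
    k = len(estimates)

    # Fixed-effect (tau^2 = 0) weights and mean, used for the homogeneity statistic.
    fixed_w = [1.0 / v for v in variances]
    fixed_mean = sum(w * b for w, b in zip(fixed_w, estimates)) / sum(fixed_w)
    q = sum(w * (b - fixed_mean) ** 2 for w, b in zip(fixed_w, estimates))

    # Method-of-moments estimate of the between-project variance.
    scale = sum(fixed_w) - sum(w * w for w in fixed_w) / sum(fixed_w)
    tau2 = max(0.0, (q - (k - 1)) / scale)

    # Random effects weights, pooled estimate, and its standard error.
    weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    std_error = math.sqrt(1.0 / sum(weights))
    return q, tau2, pooled, std_error


b_values = [b for _, b, _ in STUDIES]
se_values = [se for _, _, se in STUDIES]
q, tau2, pooled, std_error = random_effects(b_values, se_values)
low, high = pooled - 1.96 * std_error, pooled + 1.96 * std_error

print(f"Q = {q:.2f}, tau^2 = {tau2:.4f}")
print(f"pooled exponent = {pooled:.3f} (SE {std_error:.4f})")
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

The interval contains 1, which is the basis for the conclusion that these studies do not support the diseconomies-of-scale argument.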

Even better than published summaries would be a central repository of the data arising from a study. This information would allow assessment of various determinations of similarities between studies, as well as potential biases. The panel is aware of several initiatives to build such data repositories. The proposed National Software Council has as one of its primary responsibilities the construction and maintenance of a national software measurements database. At the panel's forum, a specialized database on software projects in the aeronautics industry was also discussed (Keller, 1993).

An issue related to combining information from diverse sources concerns the translation to industry of small experimental studies and/or published case studies done in an academic environment. Serious doubts exist in industry as to the upward scalability of most of these studies because populations, project sizes, and environments are all different. Expectations differ regarding quality, and it is unclear whether variables measured in a small study are the variables in which industry has an interest. The statistical community should develop stochastic models to propagate uncertainty (including variability assessment) on different control factors so that adjustments and predictions applicable to industry-level environments can be made.

VISUALIZATION IN SOFTWARE ENGINEERING

Scientific visualization is an emerging technology that is driven by ever-decreasing hardware prices and the associated increasing sophistication of visualization software. Visualization involves the interactive pictorial display of data using graphics, animation, and sound. Much of the recent progress in visualization has come from the application of computer graphics to three-dimensional image analysis and rendering. Data visualization, a subset of scientific visualization, focuses on the display and analysis of abstract data. Some of the earliest and best-known examples of data visualization involve statistical data displays.

The motivation for applying visualization to software engineering is to understand the complexity, multidimensionality, and structure embodied in software systems. Much of the original research in software visualization (the use of typography, graphic design, animation, and cinematography to facilitate the understanding and enhancement of software systems) was performed by computer scientists interested in understanding algorithms, particularly in the context of education. Applying the quantitative focus of statistical graphics methods to currently popular scientific visualization techniques is a fertile area for research.

Visualizing software engineering data is challenging because of the diversity of data sets associated with software projects. For data sets involving software faults, times to failure, cost and effort predictions, and so on, there is a clear statistical relationship of interest. Software fault density may be related to code complexity and to other software metrics. Traditional techniques for visualizing statistical data are designed to extract quantitative relationships between variables. Other software engineering data sets such as the execution trace of a program (the sequence of statements executed during a test run) or the change history of a file are not easily visualized using conventional data visualization techniques. The need for relevant techniques has led to the development of specialized domain-specific visualization capabilities peculiar to software systems. Applications include the following:

Configuration management data (Eick et al., 1992b),

Function call graphs (Ganser et al., 1993),

Code coverage,

Code metrics,

Algorithm animation (Brown and Hershberger, 1992; Stasko, 1993),

Sophisticated typesetting of computer programs (Baecker and Marcus, 1988),

Software development process,

Software metrics (Ebert, 1992), and

Software reliability models and data.

Some of these applications are discussed below.

Configuration Management Data

A rich software database suitable for visualization involves the code itself. In production systems, the source code is stored in configuration management databases. These databases contain a complete history of the code with every source code change recorded as a modification request. Along with the affected lines, the source code database usually contains other information such as the identity of the programmer making the changes, date the changes were submitted, reason for the change, and whether the change was meant to add functionality or fix a bug. The variables associated with source code may be continuous, categorical, or binary. For a line in a computer program, when it was written is (essentially) continuous, who wrote it is categorical, and whether or not the line was executed during a regression test is binary.

Example. Figure 1 (see "Implementation" in Chapter 3) shows production code written in C language from a module in AT&T's 5ESS switch (Eick, 1994). In the display, row color is tied to the code's age: the most recently added lines are in red and the oldest in blue, with a color spectrum in between. Dynamic graphics techniques are employed for increasing the effectiveness of the display. There are five interactive views of data in Figure 1:

1. The rows corresponding to the text lines,

2. The values on the color scale,

3. The file names above the columns,

4. The browser windows, and

5. The bar chart beneath the color scale.

Each of the views is linked, united through the use of color, and activated by using a mouse pointer. This mode of manipulating the display, called brushing by Becker and Cleveland (1987) and by Becker et al. (1987), is particularly effective for exploring software development data.

FunctionCallGraphs

PerhapsthemostcommonvisualizationofsoftwareisafunctioncallgraphasshowninFigure5.Functioncallgraphsareawidelyused,visual,tree-likedisplayofthefunctioncallsinapieceofcode.Theyshowcallingrelationshipsbetweenmodulesinasystemandareonerepresentationofsoftwarestructure.Aproblemwithfunctioncallgraphsisthattheybecomeoverloadedwithtoomuchinformationforallbutthesmallestsystems.Oneapproachtoimprovingtheusefulnessoffunctioncallgraphsmightinvolvetheuseofdynamicgraphicstechniquestofocusthedisplayonthevisuallyinformativeregions.

Test Code Coverage

Another interesting example of source code visualization involves showing test suite code coverage. Figure 6 shows the statement coverage and execution "hot spots" for a program that has been run through its regression test. The row indentation and line length have been turned off so that each line receives the same amount of visual space. The most frequently executed lines are shown in red and the least frequently in blue, with a color spectrum in between. There are two special colors: the black lines correspond to nonexecutable lines of C code such as comments, variable declarations, and functions, and the gray lines correspond to the executable lines of code that were not executed. These are the lines that the regression test missed.
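The color assignment just described can be sketched in a few lines. This is a minimal illustration, not the SeeSoft implementation: the function name `coverage_color`, the use of `None` to mark nonexecutable lines, and the linear scaling of counts onto the spectrum are all assumptions made for the example.

```python
import colorsys
from typing import Optional, Tuple

BLACK = (0, 0, 0)        # nonexecutable lines (comments, declarations)
GRAY = (128, 128, 128)   # executable lines the regression test never ran


def coverage_color(count: Optional[int], max_count: int) -> Tuple[int, int, int]:
    """Map a line's execution count to an RGB color.

    count is None for a nonexecutable line (a hypothetical convention)
    and 0 for an executable line that was never executed.
    """
    if count is None:
        return BLACK
    if count == 0:
        return GRAY
    # Position on the spectrum: 0.0 = executed once, 1.0 = the hottest line.
    t = 0.0 if max_count <= 1 else (count - 1) / (max_count - 1)
    # In HSV, hue 2/3 is blue and hue 0 is red; intermediate hues pass
    # through green, giving a red-green-blue spectrum.
    hue = (2.0 / 3.0) * (1.0 - t)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))
```

Execution counts are usually highly skewed, so a logarithmic scale for `t` may separate the colors better in practice; the linear scale is used here only for simplicity.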

Code Metrics

As discussed in Chapter 4 (in the section "Software Measurement and Metrics"), static code metrics attempt to quantify and measure the complexity of code. These metrics are used to identify portions of programs that are particularly difficult and are likely to be subject to defects. One visualization method for displaying code complexity metrics uses a space-filling representation (Baker and Eick, 1995). Taking advantage of the hierarchical structure of code, each subsystem, module, and file is tiled on the display, which shows them as nested, space-filling rectangles with area, color, and fill encoding software metrics. This technique can display the relative sizes of a system's components, the relative stability of the components, the location of new functionality, the location of error-prone code with many fixes to identified faults, and, using animation, the historical evolution of the code.

Example. Figure 7 displays the AT&T 5ESS switching code using the SeeSys™ system, a dynamic graphics metrics visualization system. Interactive controls enable the user to manipulate the display, reset the colors, and zoom in on particular modules and files, providing an interactive software data analysis environment. The space-filling representation:

Shows modules, files, and subsystems in context;

Provides an overview of a complete software system; and

Applies statistical dynamic graphics techniques to the problem of visualizing metrics.
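To make the idea of nested, space-filling rectangles concrete, the following sketch tiles a small hierarchy so that each component's area is proportional to its size. It uses a generic "slice-and-dice" layout, which is an assumption chosen for brevity and is not necessarily the layout used by SeeSys; the dictionary format and the names `layout` and `size_of` are likewise hypothetical.

```python
def size_of(node):
    """Size of a node: its own size for a leaf, the sum of its children otherwise."""
    if "children" in node:
        return sum(size_of(child) for child in node["children"])
    return node["size"]


def layout(node, x, y, w, h, depth=0, out=None):
    """Tile `node` and its descendants into the rectangle (x, y, w, h).

    A node is {"name": str, "size": number} for a leaf or
    {"name": str, "children": [...]} for a container.
    Returns a list of (name, x, y, w, h) tuples, parents before children.
    """
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    total = sum(size_of(child) for child in children)
    offset = 0.0
    for child in children:
        frac = size_of(child) / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: place children side by side (split the width).
            layout(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd depth: stack children vertically (split the height).
            layout(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out
```

For a system holding a 300-line subsystem A (files of 100 and 200 lines) and a 100-line subsystem B, calling `layout(tree, 0.0, 0.0, 400.0, 100.0)` gives A three quarters of the width and B the remaining quarter, with A's two files stacked inside it in a 1:2 ratio. Color and fill could then encode further metrics for each rectangle.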

A major difference in the use of graphics in scientific visualization and statistics is that for the former, graphs are the end, whereas for the latter, they are more often the means to an end. Thus visualizations of software are crucial to statistical software engineering to the extent that they facilitate description and modeling of software engineering data. Discussed below are some possibilities related to the examples described in this chapter.

The rainbow files in Figure 1 suggest that certain code is changed frequently. Frequently changed code is often error-prone, difficult to maintain, and problematic. Software engineers often claim that code, or people's understanding of it, decays with age. Eventually the code becomes unmaintainable and must be rewritten (re-engineered).

Statistical models are needed to characterize the normal rate of change and therefore determine whether the current files are unusual. Such models need to take account of the number of changes, locations of faults, type of functionality, past development patterns, and future trends. For example, a common software design involves having a simple main routine that calls on several other procedures to invoke needed functionality. The main routine may be changed frequently as programmers modify small snippets of code to access large chunks of new code that is put into other files. For this code, many simple, small changes are normal and do not indicate maintenance problems. If models existed, then it would be possible to make quantitative comparisons between files rather than the qualitative comparisons that are currently made.

Figure 6 suggests some natural covariates and models for improving the efficiency of software testing. Current compiler technology can easily analyze code to obtain the functions, lines, and even the paths executed by code in test suites. For certain classes of programming errors such as typographical errors, the incremental code coverage is an ideal covariate for estimating the probability of detecting an error. The execution frequency of blocks of code or functions is clearly related to the probability of error detection. Figure 6 shows clearly that small portions of the program are heavily exercised but that most of the code is not touched. In an indirect way, operational profile testing attempts to capture this idea by testing the features, and therefore the code, in relation to how often they will be used. This notion suggests that statistical techniques involving covariates can improve the efficiency of software testing.

Figure 7 suggests novel ways of displaying software metrics. The current practice is to identify overly complex files for special care and management attention. The procedures for identifying complex code are often based on very clever and sophisticated arguments, but not on data. A statistical approach might attempt to correlate the complexity of code with the locations of past faults and investigate the predictive power of the complexity metrics. Statistical models that can relate complexity metrics to actual faults will increase the models' practical efficiency for real-life systems. These models should not be developed in the absence of data about the code. Simple ways of presenting such data, such as an ordered list of fault density, file by file, can be very effective in guiding the selection of an appropriate model. In other cases, microanalysis, often driven by graphical browsers, might suggest a richer class of models that the data could support. For example, software fault rates are often quoted in terms of the number of faults per 1,000 NCSL. The lines in Figure 1 can be color-coded to show the historical locations of past faults. In other representations (not shown), clear spatial patterns appear, with faults concentrated in particular files and in particular regions of the files, suggesting that spatial models of fault density might work very well in helping to identify fault-prone code.
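The ordered list of fault density mentioned above is simple to produce once fault counts and file sizes are available. The sketch below assumes each file is described by a hypothetical `(name, ncsl, faults)` tuple; the function name and the sample figures in the usage note are illustrative only.

```python
def fault_density_ranking(files):
    """Rank files by fault density, highest first.

    files: iterable of (name, ncsl, faults) tuples, where ncsl is the
    file's size in NCSL and faults is its count of recorded faults.
    Returns a list of (name, faults per 1,000 NCSL) pairs.
    Files with no countable lines are skipped.
    """
    ranked = [
        (name, 1000.0 * faults / ncsl)
        for name, ncsl, faults in files
        if ncsl > 0
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

With made-up inputs `[("a.c", 2000, 4), ("b.c", 500, 3), ("c.c", 10000, 5)]`, the ranking places b.c first at 6.0 faults per 1,000 NCSL, then a.c at 2.0, then c.c at 0.5, even though c.c has the most faults in absolute terms.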

Challenges for Visualization

The research opportunities and challenges in visualizing software data are similar to those for visualizing other large abstract databases:

1. Software data are abstract; there is no natural two-dimensional or three-dimensional representation of the data. A research challenge is to discover meaningful representations of the data that enable an analyst to understand the data in context.

2. Much software data are nontraditional statistical data such as the change history of source code, duplication in manuals, or the structure of a relational database. New metaphors must be discovered for the harmonious transfer of information.

3. The database associated with large software systems may be huge, potentially containing millions of observations. Effective statistical graphics techniques must be able to cope with the volume of data found in modern software systems.

4. The lack of easy-to-use software tools makes the development of high-quality custom visualizations particularly difficult. Currently, visualizations must be hand-coded in low-level languages such as C or C++. This is a time-consuming task that can be carried out only by the most sophisticated programmers.

Opportunities for Visualization

Visualizations associated with software involve the code itself, data associated with the system, the execution of the program, and the process for creating the system. Opportunities include the following:

1. Objects/Patterns. Object-oriented programming is rapidly becoming standard for development of new systems and is being retrofitted into existing systems. Effective displays need to be developed for understanding the inheritance (or dependency) structure, semantic relationships among objects, and the run-time life cycle of objects.

2. Performance. Software systems inevitably run too slowly, making run-time performance an important consideration. Host systems often collect large volumes of fine-grain (that is, low-level) performance data including function calling patterns, line execution counts, operating system page faults, heap usage, and stack space, as well as disk usage. Novel techniques to understand and digest dynamic program execution data would be immediately useful.

3. Parallelism. Recently, massively parallel computers with tens to thousands of cooperating processors have started to become widely available. Programming these computers involves developing new distributed algorithms that divide important computations among the processors. Most often an essential aspect of the computation involves communicating interim results between processors and synchronizing the computations. Visualization techniques are a crucial tool for enabling programmers to model and debug subtle computations.

4. Three-dimensional. Workstations capable of rendering realistic three-dimensional displays are rapidly becoming widely available at reasonable prices. New visualization techniques leveraging three-dimensional capabilities should be developed to enable software engineers to cope with the ever-increasing complexity of modern software systems.

Figure 5. Function call graphs showing the calling pattern between procedures. The top panel shows an interpretable, easy-to-comprehend display, whereas the bottom panel is overly busy and visually confusing.

Figure 6. A SeeSoft™ display showing code coverage for a program executing its regression test. The color of each line is determined by the number of times that it executed. The colors range from red (the "hot spots") to deep blue (for code executed only once) using a red-green-blue color spectrum. There are two special colors: the black lines are non-executable lines of code such as variable declarations and comments, and the gray lines are the non-executed (not covered) lines. The figure shows that generating regression tests with high coverage is quite difficult. Source: Eick (1994).

Figure 7. A display of software metrics for a million-line system. The rectangle forming the outermost boundary represents the entire system. The rectangles contained within the boundary represent the size (in NCSLs) of individual subsystems (each labeled with a single character A-Z, a-t), and modules within the subsystems. Color is used here to redundantly encode size according to the color scheme in the slider at the bottom of the screen.

ORTHOGONAL DEFECT CLASSIFICATION

The primary focus of software engineering is to monitor a software development process with a view toward improving quality and productivity. For improving quality, there have been two distinct approaches. The first considers each defect as unique and tries to identify a cause. The second considers a defect as a sample from an ensemble to which a formal statistical reliability model is fitted. Chillarege et al. (1992) proposed a new methodology that strikes a balance between these two ends of the spectrum. This method, called orthogonal defect classification, is based on exploratory data analysis techniques and has been found to be quite useful at IBM. It recognizes that the key to improving a process is to quantify various cause-and-effect relationships involving defects.

The basic approach is as follows. First, classify defects into various types. Then, obtain a distribution of the types across different development phases. Finally, having created these reference distributions and the relationships among them, compare them with the distributions observed in a new product or release. If there are discrepancies, take corrective action.

Operationally, the defects are classified according to eight "orthogonal" (mutually exclusive) defect types: functional, assignment, interface, checking, timing, build/package/merge, data structures and algorithms, and documentation. Further, development phases are divided into four basic stages (where defects can be observed): design, unit test, function test, and system test. For each stage and each defect type, a range of acceptable baseline defect rates is defined by experience. This information is used to improve the quality of a new product or release. Toward this end, for a given defect type, defect distributions across development stages are compared with the baseline rates. For each chain of results (say, too high early on, lower later, and high at the end), an implication is derived. For example, the implication may be that function testing should be revamped.
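The comparison against baseline ranges can be sketched as follows. This is only an illustration of the bookkeeping, not the procedure of Chillarege et al. (1992): the names `classify` and `signature`, the dictionary layout, and all numerical ranges in the usage note are hypothetical.

```python
# The four development stages named in the text, in order.
STAGES = ("design", "unit test", "function test", "system test")


def classify(rate, low, high):
    """Label an observed rate relative to its acceptable baseline range."""
    if rate < low:
        return "low"
    if rate > high:
        return "high"
    return "ok"


def signature(defect_type, observed, baseline):
    """Chain of results for one defect type across the four stages.

    observed: {(defect_type, stage): rate}
    baseline: {(defect_type, stage): (low, high)}, set from experience
    Returns a list of "low" / "ok" / "high" labels, one per stage.
    """
    return [
        classify(observed[(defect_type, stage)], *baseline[(defect_type, stage)])
        for stage in STAGES
    ]
```

Suppose, purely for illustration, that the baseline ranges for functional defects are 0.30 to 0.50, 0.20 to 0.40, 0.10 to 0.30, and 0.00 to 0.10 across the four stages, and that a new release shows rates of 0.60, 0.15, 0.20, and 0.25. The resulting chain is high, low, ok, high, and it is this pattern, rather than any single value, from which an implication would be drawn.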

This methodology has been extended to a study of the distribution of triggers, that is, the conditions that allow a defect to surface. First, it is implicit in this approach that there is no substitute for a good data analysis. Second, assumptions clearly are being made about the stationarity of reference distributions, an approach that may be appropriate for a stable environment with similar projects. Thus, it may be necessary to create classes of reference distributions and classes of similar projects. Perhaps some clustering techniques may be valuable in this context. Third, although the defect types are mutually exclusive, it is possible that a fault may result in many defects, and vice versa. This multiple-spawning may cause serious implementation difficulties. Proper measurement protocols may diminish such multipropagation. Finally, given good-quality data, it may be possible to extend orthogonal defect classification to efforts to identify risks in the production of software, perhaps using data to provide early indicators of product quality and potential problems concerning scheduling. The potential of this line of inquiry should be carefully investigated, since it could open up an exciting new area in software engineering.


6 Summary and Conclusions

In the 1950s, as the production line was becoming the standard for hardware manufacturing, Deming showed that statistical process control techniques, invented originally by Shewhart, were essential to controlling and improving the production process. Deming's crusade has had a lasting impact in Japan and has changed its worldwide competitive position. It has also had a global impact on the use of statistical methods, the training of statisticians, and so forth.

In the 1990s the emphasis is on software, as complex hardware-based functionality is being replaced by more flexible, software-based functionality. Small programs created by a few programmers are being superseded by massive software systems containing millions of lines of code created by many programmers with different backgrounds, training, and skills. This is the world of so-called software factories. These factories at present do not fit the traditional model of (hardware) factories and more closely resemble the development effort that goes into designing new products. However, with the spread of software reuse, the increasing availability of tools for automatically capturing requirements, generating code and test cases, and providing user documentation, and the growing reliance on standardized tuning and installation processes and standardized procedures for analysis, the model is moving closer to that of a traditional factory. The economy of scale that is achievable by considering software development as a manufacturing process, a factory, rather than a handcrafting process, is essential for preserving U.S. competitive leadership. The challenge is to build these huge systems in a cost-effective manner. The panel expects this challenge to concern the field of software engineering for the rest of the decade. Hence, any set of methodologies that can help in meeting this challenge will be invaluable. More importantly, the use of such methodologies will likely determine the competitive positions of organizations and nations involved in software production.

With the amount of variability involved in the software production process and its many subprocesses, as well as the diversity of developers, users, and uses, it is unlikely that a deterministic control system will help improve the software production process. As in statistical physics, only a technology based on statistical modeling, something akin to statistical control, will work. The panel believes that the juncture at hand is not very different from the one reached by Deming in the 1950s when he began to popularize the concept of statistical process control. What is needed now is a detailed understanding by statisticians of the software engineering process, as well as an appreciation by software engineers of what statisticians can and cannot do. If collaborative interactions and the building of this mutual understanding can be cultivated, then there likely will occur a major impact of the same order of magnitude as Deming's introduction of statistical process control techniques in hardware manufacturing.

Of course, this is not to say that all software problems are going to be solved by statistical means, just as not all automobile manufacturing problems can be solved by statistical means. On the contrary, the software industry has been technology driven, and the bulk of future gains in productivity will come from new, creative ideas. For example, much of the gain in productivity between 1950 and 1970 occurred because of the replacement of assembler coding by high-level languages.

Nevertheless, as the panel attempts to point out in this report, increased collaboration between software engineers and statisticians holds much promise for resolving problems in software development. Some of the catalysts that are essential for this interaction to be productive, as well as some of the related research opportunities for software engineers and statisticians, are discussed below.

INSTITUTIONAL MODEL FOR RESEARCH

The panel strongly believes that the right model for statistical research in software development is collaborative in nature. It is essential to avoid solving the "wrong" problems. It is equally important that the problems identified in this report not be "solved" by statisticians in isolation. Statisticians need to attain a degree of credibility in software engineering, and such credibility will not be achieved by developing N new reliability models with high-power asymptotics. The ideal collaboration partners statisticians and software engineers in work aimed at improving a real software process or product.

This conclusion assumes not only that statisticians and software engineers have a mutual desire to work together to solve software engineering problems, but also that funding and reward mechanisms are in place to stimulate the technical collaboration. Up to now, such incentives have not been the norm in academic institutions, given that, for example, coauthored papers have been generally discounted by promotion evaluation committees. Moreover, at funding agencies, proposals for collaborative work have tended to "fall through the cracks" because of a lack of interdisciplinary expertise to evaluate their merits. The panel expects such barriers to be reduced in the coming years, but in the interim, industry can play a leadership role in nurturing collaborations between software engineers and statisticians and can reduce its own set of barriers (for instance, those related to proprietary and intellectual property interests).

MODEL FOR DATA COLLECTION AND ANALYSIS

As discussed above in this report, for statistical approaches to be useful, it is essential that high-quality data be available. Quality includes measuring the right things at the right time; specifically, adopted software metrics must be relevant for each of the important stages of the development life cycle, and the protocol of metrics for collecting data must be well defined and well executed. Without careful preparation that takes account of all of these data issues, it is unlikely that statistical methods will have any impact on a given software project under study. For this reason, it is crucial to have the software industry take a lead position in research on statistical software engineering.

Figure 8, a model for the interaction between researchers and the software development process, displays a high-level spiral view of the software development process offered by Dalal et al. (1994). Figure 9 gives a more detailed view of the statistical software engineering module (SSEM) at the center of Figure 8.

Figure 8. Spiral software development process model. SSEM, statistical software engineering module.

Figure 9. Statistical software engineering module at stage n.

The SSEM has several components. One of its major functions is to act as the central repository for all relevant project data (statistical or nonstatistical). Thus this module serves as a resource for the entire project, interfacing with every stage, typically at its review or conclusion. For example, the SSEM would be used at the requirement review stage, when data on inspection, faults, times, effort, and coverage are available. For testing, information would be gathered at the end of each stage of testing (unit, integration, system, alpha, beta, ...) about the number of open faults, closed faults, types of problems, severity, changes, and effort. Such data would come from test case management systems, change management systems, and configuration management systems.


Additional elements of the SSEM include collection protocols, metrics, exploratory data analysis (EDA), modeling, confirmatory analysis, and conclusions. A critical part of the SSEM would be related to root-cause analysis. Analysis could be as simple as Ishikawa's fishbone diagram (Ishikawa, 1976), or more complex, such as orthogonal defect classification (described in Chapter 5). This capability accords with the belief that a careful analysis of root cause is essential to improving the software development process. Central placement of the SSEM ensures that the results of various analyses will be communicated at all relevant stages. For example, at the code review stage, the SSEM can suggest ways of improving the requirement process as well as point out potentially error-prone parts of the software for testing.

ISSUES IN EDUCATION

Enormous opportunities and many potential benefits are possible if the software engineering community learns about relevant statistical methods and if statisticians contribute to and cooperate in the education of future software engineers. The areas outlined below are those that are relevant today. As the community matures in its statistical sophistication, the areas themselves should evolve to reflect the maturation process.

Designed experiments. Software engineering is inherently experimental, yet relatively few designed experiments have been conducted. Software engineering education programs must stress the desirability, wherever feasible, of validating new techniques through the use of statistically valid, designed experiments. Part of the reason for the lack of experimentation in software engineering may involve the large variability in human programming capabilities. As pointed out in Chapter 5, the most talented programmer may be 20 times more productive than the least talented. This disparity makes it difficult to conduct experiments because the between-subject variability tends to overwhelm the treatment effects. Experimental designs that address broad variability in subjects should be emphasized in the software engineering curriculum. A similar emphasis should be given to random- and fixed-effects models with hierarchical structure and to distinguishing within- and between-experiment variability.
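One family of designs that addresses subject variability measures each programmer under both conditions and analyzes the differences, so that each subject serves as his or her own control. The sketch below illustrates why this helps; it is an assumption-laden illustration rather than a recommended analysis, and the function name `paired_summary` and the data in the usage note are hypothetical.

```python
from statistics import mean, stdev


def paired_summary(control, treated):
    """Summarize a within-subject (paired) comparison.

    control[i] and treated[i] are productivity measurements for the
    same programmer without and with the new technique.
    Returns (mean difference, standard deviation of the differences,
    standard deviation of the control measurements).
    """
    if len(control) != len(treated) or len(control) < 2:
        raise ValueError("need matched measurements on at least two subjects")
    diffs = [t - c for c, t in zip(control, treated)]
    return mean(diffs), stdev(diffs), stdev(control)
```

With invented control values of 10, 40, 100, and 200 (a 20-fold spread) and treated values of 16, 44, 105, and 206, the mean improvement is 5.25 with a standard deviation of about 1, whereas the between-subject standard deviation of the control values is above 80. A comparison of two independent groups would bury an effect of this size; the paired differences expose it.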

There is also a role for the statistics profession in the development of guidelines for experiments in software engineering akin to those mandated by the Food and Drug Administration for clinical trials. These guidelines will require reformulation in the software engineering context with the possible involvement of various industry and academic forums, including the Institute of Electrical and Electronics Engineers, the American Statistical Association, and the Software Engineering Institute.

Exploratory data analysis. It is important to appreciate the strengths and the limitations of available data by challenging the data with a battery of numerical, tabular, and graphical methods. Exploratory data analysis methods (e.g., Tukey, 1977; Mosteller and Tukey, 1977) are essentially "model free," so that investigators can be surprised by unexpected behavior rather than have their thinking constrained by what is expected. One of the attitudes toward statistical analysis that is important to convey is that of

data = fit + residual.

The iterative nature of improving the model fit by removing structure from the residuals must be stressed in discussions of statistical modeling.
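A single pass of this decomposition can be shown with an ordinary least-squares straight line. The choice of a straight line as the "fit" and the function names `fit_line` and `decompose` are assumptions made for the example; any fitting method could take its place.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def decompose(xs, ys):
    """Split the data into fitted values and residuals: data = fit + residual."""
    slope, intercept = fit_line(xs, ys)
    fit = [intercept + slope * x for x in xs]
    residual = [y - f for y, f in zip(ys, fit)]
    return fit, residual
```

The residuals are then examined, typically by plotting them against `xs` or other variables. If they still show structure, such as curvature or clusters, the fit is enriched and the decomposition is repeated until the residuals look like unstructured noise.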

Modeling. The models used by statisticians differ dramatically from those used by nonstatisticians. The differences stem from advances in the statistical community in the past decade that effectively relax assumptions of linearity for nearly all classical techniques. This relaxation is obtained by assuming only local linearity and using smoothing techniques (e.g., splines) to regularize the solutions (Hastie and Tibshirani, 1990). The result is quite flexible but interpretable models that are relatively unknown outside the statistics community. Arguably these more recent methods lack the well-studied inferential properties of classical techniques, but that drawback is expected to be remedied in coming years. Educational information exchanges should be conducted to stimulate more frequent and wider use of such comparatively recent techniques.

Risk analysis. Software systems are often used in conjunction with other software and hardware systems. For example, in telecommunications, an originating call is connected by switching software; however, the actual connection is made by physical cables, transmission cells, and other components. The mega systems thus created run our nation's telephone systems, stock markets, and nuclear power plants. Failures can be very expensive, if not catastrophic. Thus, it is essential to have software and hardware systems built in such a way that they can tolerate faults and provide minimal functionality, while precluding a catastrophic failure. This type of system robustness is related to so-called fault-tolerant design of software (Leveson, 1986).

Risk analysis has played a key role in identifying fault-prone components of hardware systems and has helped in managing the risks associated with very complex hardware-software systems. A paradigm suggested by Dalal et al. (1989) for risk management for the space shuttle program and corresponding statistical methods are important in this context. For software systems, risk analysis typically begins with identifying programming styles, characteristics of the modules responsible for most software faults, and so on. Statistical analysis of root-cause data leads to a risk profile for a system and can be useful in risk reduction. Risk management also involves consideration of the probability of occurrence of various failure scenarios. Such probabilities are obtained either by using the Delphi method (e.g., Dalkey, 1972; Pill, 1971) or by analyzing historical data. One of the key requirements in failure-scenario analysis is to dynamically update information about the scenarios as new data on system behavior become available, such as a changing user profile.
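One simple way to carry out such dynamic updating, offered here only as an illustration and not as the method of the cited authors, is a conjugate Bayesian update: an initial estimate of a scenario's failure probability (for instance, one elicited from experts) is encoded as a Beta distribution and revised as operational data arrive. The function names and the numbers in the usage note are hypothetical.

```python
def update_beta(alpha, beta, failures, trials):
    """Posterior Beta(alpha, beta) parameters after new observations.

    alpha, beta: parameters of the current Beta distribution for the
                 probability that the failure scenario occurs
    failures:    number of occurrences observed in the new data
    trials:      number of opportunities observed in the new data
    """
    if not 0 <= failures <= trials:
        raise ValueError("failures must lie between 0 and trials")
    return alpha + failures, beta + (trials - failures)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)
```

Starting from Beta(1, 19), whose mean is 0.05, and then observing 3 occurrences in 80 opportunities gives Beta(4, 96), whose mean is 0.04. Each new batch of data can be folded in the same way, so the estimate tracks changes in system behavior.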


Attitude toward assumptions. As software engineers are aware, a major difference between statistics and mathematics is that for the latter, it matters only that assumptions be correctly stated, whereas for the former, it is essential that the prevailing assumptions be supported by the data. This distinction is important, but unfortunately it is often taken too literally by many who use statistical techniques. Tukey has long argued that what is important is not so much that assumptions are violated but rather that their effect on conclusions is well understood. Thus for a linear model, where the standard assumptions include normality, homoscedasticity, and independence, their importance to statements of inference is exactly in the opposite order. Statistics textbooks, courses, and consulting activities should convey the statistician's level of understanding of and perspective on the importance of assumptions for statistical inference methods.

Visualization. The importance of plotting data in all aspects of statistical work cannot be overemphasized. Graphics is important in exploratory stages to ascertain how complex a model the data can support; in the analysis stage for display of residuals to examine what a currently entertained model has failed to account for; and in the presentation stage where graphics can provide succinct and convincing summaries of the statistical analysis and associated uncertainty. Visualization can also help software engineers cope with, and understand, the huge quantities of data collected in the software development process.

Tools. Software engineers tend to think of statisticians as people who know how to run a regression software package. Although statisticians prefer to think of themselves more as problem solvers, it is still important that they point out good statistical computing tools (for instance, S, SAS, GLIM, RS1, and so on) to software engineers. A CATS report (NRC, 1991) attempts to provide an overview of statistical computing languages, systems, and packages, but for such material to be useful to software engineers, a more focused overview will be required.


References

Abdel-Ghaly, A.A., P.Y. Chan, and B. Littlewood. 1986. Evaluation of competing software reliability predictions. IEEE Trans. Software Eng. SE-12(9):950-967.

Abdel-Hamid, T. 1991. Software Project Dynamics: An Integrated Approach. Englewood Cliffs, N.J.: Prentice-Hall.

American Heritage Dictionary of the English Language, The. 1981. Boston: Houghton Mifflin.

American Statistical Association (ASA). 1993. Combining Information: Statistical Issues and Opportunities for Research, Contemporary Statistics Series, No. 1. Alexandria, Va.: American Statistical Association.

Baecker, R.M. and A. Marcus. 1988. Human Factors and Typography for More Readable Programs. Reading, Mass.: Addison Wesley.

Baker, M.J. and S.G. Eick. 1995. Space-filling software displays. J. Visual Languages Comput. 6(2). In press.

Basili, V. 1993. Measurement, analysis and modeling, and experimentation in software engineering. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.

Basili, V. and D. Weiss. 1984. A methodology for collecting valid software engineering data. IEEE Trans. Software Eng. SE-10:6.

Becker, R.A. and W.S. Cleveland. 1987. Brushing scatterplots. Technometrics 29:127-142.

Becker, R.A., W.S. Cleveland, and A.R. Wilks. 1987. Dynamic graphics for data analysis. Statistical Science 2:355-383.

Beckman, R.J. and M.D. McKay. 1987. Monte Carlo estimation under different distributions using the same simulation. Technometrics 29:153-160.

Blum, M., M. Luby, and R. Rubinfeld. 1989. Program result checking against adaptive programs and in cryptographic settings. Pp. 107-118 in Distributed Computing and Cryptography, J. Feigenbaum and M. Merritt, eds. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, Vol. 2. Providence, R.I.: American Mathematical Society.

Blum, M., M. Luby, and R. Rubinfeld. 1990. Self-testing/correcting with applications to numerical problems. STOC 22:73-83.

Boehm, B.W. 1981. Software Engineering Economics. Englewood Cliffs, N.J.: Prentice Hall.

Brocklehurst, S. and B. Littlewood. 1992. New ways to get accurate reliability measures. IEEE Software 9(4):34-42.

Brown, M.H. and J. Hershberger. 1992. Color and sound in algorithm animation. IEEE Computer 25(12):52-63.

Burnham, K.P. and W.S. Overton. 1978. Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika 45:343-359.

Chillarege, R., I. Bhandari, J. Chaar, M. Halliday, D. Moebus, B. Ray, and M. Wong. 1992. Orthogonal defect classification: A concept for in-process measurements. IEEE Trans. Software Eng. SE-18:943-955.

Cohen, D.M., S.R. Dalal, A. Kaija, and G. Patton. 1994. The automatic efficient test generator (AETG) system. Pp. 303-309 in Proceedings of the 5th International Symposium on Software Reliability Engineering. Los Alamitos, Calif.: IEEE Computer Society Press.


Curtis, W. 1988. The impact of individual differences in programmers. Pp. 279-294 in Working with Computers: Theory versus Outcome, G.C. van der Veer et al., eds. San Diego, Calif.: Academic Press.

Dalal, S.R. and C.L. Mallows. 1988. When should one stop software testing? J. Am. Stat. Assoc. 83:872-879.

Dalal, S.R. and C.L. Mallows. 1990. Some graphical aids for deciding when to stop testing software. IEEE J. Selected Areas in Communications 8:169-175. (Special issue on software quality and productivity.)

Dalal, S.R. and C.L. Mallows. 1992. Buying with exact confidence. Ann. Appl. Probab. 2:752-765.

Dalal, S.R. and A.M. McIntosh. 1994. When to stop testing for large software systems with changing code. IEEE Trans. Software Eng. SE-20:318-323.

Dalal, S.R., E.B. Fowlkes, and A.B. Hoadley. 1989. Risk analysis of the space shuttle: Pre-Challenger prediction of failure. J. Am. Stat. Assoc. 84:945-957.

Dalal, S.R., J.R. Horgan, and J.R. Kettenring. 1994. Reliable software and communication II: Controlling the software development process. IEEE J. Selected Areas in Communications 12:33-39.

Dalkey, N.C. 1972. Studies in the Quality of Life: Delphi and Decision-Making. Lexington, Mass.: D.C. Heath & Co.

Dawid, A.P. 1984. Statistical theory: The prequential approach. J. R. Stat. Soc. London A 147:278-292.

DeMillo, R.A., D.S. Guindi, K.S. King, W.M. McCracken, and A.J. Offutt. 1988. An extended overview of the MOTHRA mutation system. Pp. 142-151 in Proceedings of the Second Workshop on Software Testing, Verification and Analysis. Alberta, Canada: Banff.

Ebert, C. 1992. Visualization techniques for analyzing and evaluating software measures. IEEE Trans. Software Eng. 11(18):1029-1034.

Eckhardt, D.E. and L.D. Lee. 1985. A theoretical basis of multiversion software subject to coincident errors. IEEE Trans. Software Eng. SE-11:1511-1517.

Eckhardt, D.E., A.K. Caglayan, J.C. Knight, L.D. Lee, D.F. McAllister, M.A. Vouk, and J.P. Kelly. 1991. An experimental evaluation of software redundancy as a strategy for improving reliability. IEEE Trans. Software Eng. SE-17(7):692-702.

Eick, S.G. 1994. Graphically displaying text. J. Comput. Graphical Stat. 3(2):127-142.

Eick, S.G., C.R. Loader, M.D. Long, S.A. Vander Wiel, and L.G. Votta. 1992a. Estimating software fault content before coding. Pp. 59-65 in Proceedings of the 14th International Conference on Software Engineering (Melbourne, Australia). Los Alamitos, Calif.: IEEE Computer Society Press.

Eick, S.G., J.L. Steffen, and E.E. Sumner. 1992b. SeeSoft™: A tool for visualizing line oriented software. IEEE Trans. Software Eng. 11(18):957-968.

Ganser, E.R., E.E. Koutsofios, S.C. North, and K.-P. Vo. 1993. A technique for drawing directed graphs. IEEE Trans. Software Eng. SE-19(3):214-230.

Halstead, M.H. 1977. Elements of Software Science. New York: Elsevier.

Hastie, T.J. and R.J. Tibshirani. 1990. Generalized Additive Models. London: Chapman & Hall.


Henrion,M.andB.Fischhoff.1986.Assessinguncertaintyinphysicalconstants.Am.J.Phys.54(9):791-798.

Horgan,J.R.andS.London.1992.ATAC:AdataflowtestingtoolforC.Pp.2-10inProceedingsoftheSecondSymposiumonAssessmentofQualitySoftwareDevelopmentTools(May27-29,1992,NewOrleans,La.),E.Nahouraii,ed.LosAlamitos,Calif.:IEEEComputerSocietyPress.

Humphrey,W.S.1988.Characterizingthesoftwareprocess:Amaturityframework.IEEESoftware5:73-79.

Humphrey,W.S.1989.ManagingtheSoftwareProcess.Reading,Mass.:AddisonWesley.

Iman,R.L.andW.J.Conover.1982.Adistributionfreeapproachtoinducingrankcorrelationsamonginputvariables.Commun.Stat.,PartB11:311-334.

InstituteofElectricalandElectronicsEngineers(IEEE).1990.IEEEStandardGlossaryofSoftwareEngineeringTerminology.IEEEStd.610.12-1990.NewYork:IEEE,Inc.

InstituteofElectricalandElectronicsEngineers(IEEE).1993.IEEEStandardforSoftwareProductivityMetrics.IEEEComputerSociety,IEEEStd.1045-1992,January11,1993.NewYork:IEEE,Inc.

Ishikawa,K.1976.GuidetoQualityControl.Tokyo,Japan:AsianProductivityOrganization.

Kahneman,D.,P.Slovic,andA.Tversky,eds.1982.JudgmentUnderUncertainty:HeuristicsandBiases.NewYork:CambridgeUniversityPress.

Keller,T.W.1993.Maintenanceprocessmetricsforspaceshuttle

flightsoftware.UnpublishedpaperpresentedatForumonStatisticalMethodsinSoftwareEngineering,October11-12,1993,NationalResearchCouncil,Washington,D.C.

Kitchenham, B. 1991. Never mind the metrics; what about the numbers! Pp. 28-37 in Formal Aspects of Measurement, T. Denvir, R. Herman, and R.W. Whitty, eds. Proceedings of the BCS-FACS Workshop, May 5, 1991, South Bank University, London. New York: Springer-Verlag.

Kitchenham, B. 1992. Analyzing Software Data. Metrics Club Report. Manchester, England: National Computing Centre, Ltd.

Knight, J.C. and N.G. Leveson. 1986. Experimental evaluation of the assumption of independence in multiversion software. IEEE Trans. Software Eng. SE-12(1):96-109.

Lee, D. and M. Yannakakis. 1992. On-line minimization of transition systems. Pp. 264-274 in Proceedings of the 24th Annual ACM Symposium on Theory of Computing. New York: Association for Computing Machinery.

Leveson, N.G. 1986. Software safety: why, what, and how. ACM Comput. Surveys 18:125-163.

Lipton, R. 1989. New directions in testing. Pp. 191-202 in Distributed Computing and Cryptography, J. Feigenbaum and M. Merritt, eds. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, Vol. 2. Providence, R.I.: American Mathematical Society.

Littlewood, B. 1979. Software reliability model for modular program structure. IEEE Trans. Reliability R-28(3):241-246.

Littlewood, B. and D.R. Miller. 1989. Conceptual modeling of coincident failures in multiversion software. IEEE Trans. Software Eng. SE-15(12):1596-1614.

Littlewood, B. and L. Strigini. 1993. Validation of ultra-high dependability for software-based systems. Communications of the Association for Computing Machinery 36(11):69-80.

Mallows, C.L. 1973. Some comments on Cp. Technometrics 15:661-667.

McCabe, T.J. 1976. A complexity measure. IEEE Trans. Software Eng. SE-1(3):312-327.

McKay, M.D., W.J. Conover, and R.J. Beckman. 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239-245.

Mosteller, F. and J.W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, Mass.: Addison-Wesley.

Munson, J.C. 1993. The relationship between software metrics and quality metrics. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.

National Research Council (NRC). 1991. The Future of Statistical Software. Committee on Applied and Theoretical Statistics, Board on Mathematical Sciences. Washington, D.C.: National Academy Press.

National Research Council (NRC). 1992. Combining Information: Statistical Issues and Opportunities for Research. Committee on Applied and Theoretical Statistics, Board on Mathematical Sciences. Washington, D.C.: National Academy Press. (Reprinted in 1993 by the American Statistical Association as Volume 1 in the ASA Contemporary Statistics series.)

Nayak, T.K. 1988. Estimating population size by recapture sampling. Biometrika 75:113-120.

Phadke, M.S. 1993. Robust design method for software engineering. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.

Pill, J. 1971. The Delphi method: Substance, context, a critique and an annotated bibliography. Socio-Economic Planning Science 5:57-71.

Randell, B. and P. Naur, eds. 1968. Software Engineering Concepts and Techniques. NATO Science Committee, Proceedings of the NATO Conferences, October 7-11, 1968, Garmisch, Germany. New York: Petrocelli/Charter.

Sackman, H. 1970. Man-Computer Problem-Solving: Experimental Evaluation of Time-Sharing and Batch Processing. New York: Auerbach.

Siegrist, K. 1988a. Reliability of systems with Markov transfers of control. IEEE Trans. Software Eng. SE-14(7):1049-1053.

Siegrist, K. 1988b. Reliability of systems with Markov transfers of control, II. IEEE Trans. Software Eng. SE-14(10):1478-1480.

Singpurwalla, N.D. 1991. Determining an optimal time interval for testing and debugging software. IEEE Trans. Software Eng. 17(4):313-319.

Smith, A.F.M. and G.O. Roberts. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J. R. Stat. Soc. London B 55(1):3-23.

Stasko, J. 1993. Software visualization. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.

Stein, M. 1987. Large sample properties of simulations using Latin hypercube sampling. Technometrics 29:143-151.

Tukey, J.W. 1977. Exploratory Data Analysis. Reading, Mass.: Addison-Wesley.

Tukey, J.W. 1991. Use of many covariates in clinical trials. Int. Stat. Rev. 59(2):123-128.

Vander Wiel, S.A. and L.G. Votta. 1993. Assessing software designs using capture-recapture methods. IEEE Trans. Software Eng. SE-19(11):1045-1054.

Zuse, H. 1991. Software Complexity: Measures and Methods. Berlin: de Gruyter.

Zweben, S. 1993. Statistical methods in a study of software re-use principles. Unpublished paper presented at Forum on Statistical Methods in Software Engineering, October 11-12, 1993, National Research Council, Washington, D.C.

Appendix: Forum Program

MONDAY, OCTOBER 11, 1993

8:00 AM Welcome and Introductions

8:05 AM Session on Software Process

Session Chair: Gloria J. Davis (NASA-Ames Research Center)

Invited Speakers: Ted W. Keller (IBM Corporation), David Card (Computer Sciences Corporation)

9:45 AM Break

10:15 AM Session on Software Metrics

Session Chair: Bill Curtis (Carnegie Mellon University)

Invited Speakers: Victor R. Basili (University of Maryland), John C. Munson (University of Florida)

NOON Break

1:00 PM Session on Software Dependability and Testing

Session Chair: Richard A. DeMillo (Purdue University)

Invited Speakers: John C. Knight (University of Virginia), Richard Lipton (Princeton University)

2:25 PM Break

3:15 PM Session on Case Studies

Session Chair: Daryl Pregibon (AT&T Bell Laboratories, Murray Hill)

Invited Speakers: Tsuneo Yamaura (Hitachi Computer Products-America, Inc.), Stuart Zweben (Ohio State University)

5:00 PM Adjourn
