14
A novel multiple linear regression model for forecasting S-curves Karl Blyth White Building Services Ltd, Newton-le-Willows, UK, and Ammar Kaka School of the Built Environment, Heriot-Watt University, Edinburgh, UK Abstract Purpose – Cash flow forecasting is an indispensable tool for construction companies, and is essential for the survival of any contractor at all stages of the work. A simple and fast technique of forecasting cash flow accurately is required, considering the short time available and the associated cost. Seeks to examine this issue. Design/methodology/approach – The paper argues that instead of producing an S-curve that is based on historical projects combined (state-of-the-art is based on classifying projects into groups and producing a standard curve for each group simply by fitting one curve into the historical data), here the attempt is to produce an individual S-curve for an individual project. A sample of data from 50 projects was collected and 20 criteria were identified to classify these projects. Using the most influential criteria, a multiple linear regression model was created to forecast the programme of works and hence the S-curves. A further six projects were used to validate and test the model. Findings – The results of the model developed in this paper were compared with previous models and evaluated. It is concluded that the model produced more accurate results than existing value and cost models. Originality/value – The paper proposes an alternative and novel approach to the development of standard value and cost commitment S-curves. This approach is based on a multiple linear regression model of the programmes of works. Keywords Cash flow, Financial forecasting, Project management, Construction industry Paper type Research paper Introduction Cash is the most important of a construction company’s resources, because more companies become bankrupt due to lack of liquidity for supporting their day-to-day activities, than because of inadequate management of other resources (Singh and Lakanathan, 1992). Insolvency is more likely to occur in this industry than any other (Kaka and Price, 1993). Cash flow forecasting is essential for the survival of any contractor at all stages of the work. Ideally, cash flow forecasts should be based on the construction programme and a bill of quantities (Allsop, 1980). Cash flow forecasting at the tendering stage needs to be simple and fast however, considering the short time available and the associated cost. Contractors rarely prepare a detailed construction plan at this stage, and usually wait until winning the contract. Therefore a simple and fast technique for forecasting cash flow accurately is required. The majority of cash flow forecasting models developed have been based on standard value S-curves, representing the running cumulative value of work, developed using data from completed construction projects. S-curves are widely used in the industry for controlling projects throughout their execution phases. They are The current issue and full text archive of this journal is available at www.emeraldinsight.com/0969-9988.htm ECAM 13,1 82 Engineering, Construction and Architectural Management Vol. 13 No. 1, 2006 pp. 82-95 q Emerald Group Publishing Limited 0969-9988 DOI 10.1108/09699980610646511

A Novel Multiple Linear Regression Model for Forecasting S-Curves

Embed Size (px)

Citation preview

Page 1: A Novel Multiple Linear Regression Model for Forecasting S-Curves

A novel multiple linear regressionmodel for forecasting S-curves

Karl BlythWhite Building Services Ltd, Newton-le-Willows, UK, and

Ammar KakaSchool of the Built Environment, Heriot-Watt University, Edinburgh, UK

Abstract

Purpose – Cash flow forecasting is an indispensable tool for construction companies, and is essentialfor the survival of any contractor at all stages of the work. A simple and fast technique of forecastingcash flow accurately is required, considering the short time available and the associated cost. Seeks toexamine this issue.

Design/methodology/approach – The paper argues that instead of producing an S-curve that isbased on historical projects combined (state-of-the-art is based on classifying projects into groups andproducing a standard curve for each group simply by fitting one curve into the historical data), herethe attempt is to produce an individual S-curve for an individual project. A sample of data from 50projects was collected and 20 criteria were identified to classify these projects. Using the mostinfluential criteria, a multiple linear regression model was created to forecast the programme of worksand hence the S-curves. A further six projects were used to validate and test the model.

Findings – The results of the model developed in this paper were compared with previous modelsand evaluated. It is concluded that the model produced more accurate results than existing value andcost models.

Originality/value – The paper proposes an alternative and novel approach to the development ofstandard value and cost commitment S-curves. This approach is based on a multiple linear regressionmodel of the programmes of works.

Keywords Cash flow, Financial forecasting, Project management, Construction industry

Paper type Research paper

IntroductionCash is the most important of a construction company’s resources, because morecompanies become bankrupt due to lack of liquidity for supporting their day-to-dayactivities, than because of inadequate management of other resources (Singh andLakanathan, 1992). Insolvency is more likely to occur in this industry than any other(Kaka and Price, 1993).

Cash flow forecasting is essential for the survival of any contractor at all stages ofthe work. Ideally, cash flow forecasts should be based on the construction programmeand a bill of quantities (Allsop, 1980). Cash flow forecasting at the tendering stageneeds to be simple and fast however, considering the short time available and theassociated cost. Contractors rarely prepare a detailed construction plan at this stage,and usually wait until winning the contract. Therefore a simple and fast technique forforecasting cash flow accurately is required.

The majority of cash flow forecasting models developed have been based onstandard value S-curves, representing the running cumulative value of work,developed using data from completed construction projects. S-curves are widely usedin the industry for controlling projects throughout their execution phases. They are

The current issue and full text archive of this journal is available at

www.emeraldinsight.com/0969-9988.htm

ECAM13,1

82

Engineering, Construction andArchitectural ManagementVol. 13 No. 1, 2006pp. 82-95q Emerald Group Publishing Limited0969-9988DOI 10.1108/09699980610646511

Page 2: A Novel Multiple Linear Regression Model for Forecasting S-Curves

valuable to project management in stating current status and predicting the future ofprojects. Although they are used in scheduling and planning, for reporting actual,earned and planned values and for resource loading various activities of a project(Miskawi, 1989), their reliability and accuracy is still in question.

The paper begins with a clear statement of the main objectives of the researchfollowed by a detailed literature review based on previous attempts at developingmodels to forecast S-curves. A section on how and where the data was collected andused in the development of the model is then discussed. The model is developed andthen tested with a set of new projects from the sample that were not used in creation ofthe initial model. The accuracy of the present model is again tested against relevantprevious research models and conclusions are made to determine whether or not thecurrent research has improved on existing research.

Objectives of the researchThe research objective was to produce a model that would standardise activities, andforecast the duration, cost and end dates of these activities based upon determinedproject characteristics. Using this output, the model will then rapidly produce a forecastof cost flow and cash flow for a project at the pre-tender stage. Cost flow and cash flow, ifaccurately forecast, can be used to plan and control the financial recourses of the project

This paper attempts to automatically produce a programme of work based oncharacteristics of the individual projects (so it produces a project schedule and cost of eachactivity using regression models). This is then converted into an S-curve. The authors’argument is that instead of producing an S-curve that is based on historical projectscombined (state of the art is based on classifying projects into groups and producing astandard curve for each group simply by fitting one curve into the historical data), herethe attempt is to produce an individual S-curve for an individual project. It is the belief ofthe authors that each project is unique and has a unique S-curve. In order to generate this,regression models are developed to predict what the programme of work should look like.

The methodology of the work described in this paper therefore, was to produce amultiple linear regression model that predicts S-curves for individual projects, basedon project characteristics (function of building, size, location, construction form etc.). Inorder to do this the model would be required to both forecast the overall cost andduration of a project, and moreover to establish the costs and timings of individualactivities within the construction programme. In order to do the latter, it was necessaryto map the programme of works for each building onto a generic format and then toconsider the time relationships (or lags), between individual activities. This modelwould then be tested in order to confirm the hypothesis that the accuracy of suchpredictions would be greater than for previous models.

To the authors’ knowledge there has not been an attempt at automatically forecastingan S-curve that is unique to project characteristics (in the level of details reported in thispaper at least). The other question the paper addresses is whether more accuracy can beachieved by calculating the S-curve from automatically generated work schedules orrather by directly forecasting the S-curve without reference to the programme of works.

Previous attempts at developing models to forecast S-curvesThere have been many attempts in the past to develop cash flow forecasting models.They were mainly part of more comprehensive models aimed at assisting contractors

Model forforecasting

S-curves

83

Page 3: A Novel Multiple Linear Regression Model for Forecasting S-Curves

or clients forecast their cash flow on an individual project level (Kaka and Price, 1996),or on a company level (Kaka, 1994). The majority of these models were based on theidea of developing standard S-curves to represent the running value or cost of differenttypes of construction projects. Typically this was achieved by collecting data relatingto the monthly valuations and the projects’ general characteristics. These projectswould then be classified and distributed into groups and average S-curves would thenbe fitted on the individual groups (Balkau, 1975; Bromilow and Henderson, 1977;Hudson, 1978; Oliver, 1984; Miskawi, 1989; Khosrowshahi, 1991; Evans and Kaka,1998). Several mathematical models were used to fit the S-curves (e.g. alpha-beta cubicequation, Weibull function, DHSS model etc.). These models could be used, given thatthe total value and duration of the projects to be constructed are known, to forecast thecumulative monthly (or at any other time interval), value/cost of that project. Theaccuracy of these previous models is in question (see Kaka and Price (1993), for detailedanalysis of previous models).

Kenley and Wilson (1986), argued that the underlying principle of the idiographicapproach is that the value curves are generally unique and that they should bemodelled separately, hence a curve should be fitted for each project. This concept actsas a basis for the work involved in this paper.

Petros (1996), investigated the effect of having different works plans on the cost flowcurve of one project. Four different planners attempted to schedule the constructionactivities of the same industrial project. Each of the four plans was analysed and used toestimate a cost flow curve. Results showed the significant variability of the possibleS-curves for the same project arising from planning differences.

Kaka (1999), suggested that unless more accurate standard S-curves are produced(perhaps by the use of a more detailed classification criteria), contractors should resortto detailed calculations using the works plan and cost estimates. He also indicated thatas construction projects are unique, future attempts to standardise the cost/valuerelationship were likely to fail. He instead used a stochastic model based on historicaldata, as it allowed users to incorporate variability and inaccuracy in their forecasts anddecision making.

The two main findings of the literature search applicable to this research revealedthat previous attempts to forecast S-curves have not been accurate. First, that cash flowforecasts are likely to be inaccurate due to the fact that construction projects are uniqueand the progress of work varies greatly from one project to another, and second, that thechoice of project groupings in previous work has been poor. These problems helpedshape the nature of this research to test the first hypothesis that more accurate S-curvescould be modelled if projects had the same base activities in the works programme, andhence could be more easily related, thus improving the accuracy of the S-curvesproduced by the model. The second hypothesis responds from the first in that, groups ofthe same type of project can be more easily modelled as a result.

Data collectionIt was necessary to include a wide range of firms in the construction industry toprovide a variety of construction projects. The information was sought from clients,contractors, and quantity surveyors as it was believed that these would be the mostlikely to yield the detailed cost and programme information required. The contractorswere identified from the New Civil Engineer Contractors File, 1999 and client and

ECAM13,1

84

Page 4: A Novel Multiple Linear Regression Model for Forecasting S-Curves

quantity surveying organisations were identified via general and employmentpublicity material. Information was not sought, for example, from architects orengineers, as it is believed that they will not have the access to the detailed informationon the costs and programmes that were needed.

The sample consisted of over 200 firms. Each firm was sent a structured,multiple-choice questionnaire, consisting of 20 questions based on the characteristicsof the project, along with an accompanying letter explaining the purposes of the datacollection and why they had been asked to participate. If it was easier for a company todiscuss the data collection, interviews were arranged to collect the data in person, andto discuss any queries that may have been apparent. The structure, content and formatof the questionnaire was created from a sample in line with the Nkado (1992), studyplus additional questions and also with the principle that characteristics based aroundthe project size are perceived as being effective in the current research in that thedetermination of such information could be obtained quickly and easily by therespondents in the sample. The questionnaire was split into two sections. Questions 1-8sought non-numerical (non-interval level), data (e.g. project type, location etc.), whereasquestions 9-20 sought numerical (interval level), data (e.g. ground floor area, number offloors etc.). The data required for each project for use in the model were as follows:

. the project’s characteristics (based on the multiple-choice questionnaire);

. the project programme (i.e. the bar chart for the construction plan); and

. the individual activity/element and overall costs (or approximate estimates).

A total of 47 projects meeting the above specifications were received. The data on afurther 14 projects were obtained from a series of cost models published in Buildingmagazine using data provided by the QS practice, Davis, Langdon & Everest(1988-1994). In this, a building project was published each week and a breakdown ofthe programme of works and the costs of each activity were provided. Five projectswere unfinished and therefore rejected. This made a total of 56 building projects for usein the analysis. The sample consisted of building projects only. The projects werefinished in the period 1993-1999. The costs were not adjusted using relevant costindices as it is only the proportions of total cost that are important when used in theS-curve analysis. The cost breakdown of the data can be seen in Table I. It was decidedto use 50 random projects in the development of the model and the remaining six wereput aside to be used for testing. The sample sizes of both the development data and testdata were considered sufficient for this research.

Company typeProjectsreceived

Projectsused

Minimumvalue ofproject(£m)

Maximumvalue ofproject(£m)

Total value ofprojects

(£m)

Contractor 25 24 3.47 29.76 200.68Q.S. 14 12 5.79 15.32 102.89Client 8 6 3.31 23.44 51.94Cost models 14 14 2.20 10.01 119.97Total 61 56 2 2 475.49

Table I.Cost breakdown of the

data received

Model forforecasting

S-curves

85

Page 5: A Novel Multiple Linear Regression Model for Forecasting S-Curves

When analysing the S-curves, the point of 100 per cent cost commitment for thecontractor was taken to be the time of practical completion on site. The origin wastaken to be the commencement of work on site, a convenient and easily identified point.

Developing the modelStandardising activities and amalgamating durations and costsAs the projects in the data set differed greatly, especially in the number of activities, itwas necessary to standardise them in order to be able to compare and combine them. Allof the Gantt charts for 50 projects received were analysed to see if any similarities couldbe found in their activities. The frequency with which specific design options occurredsuggested that many could be regarded as generic. A range of between 20 (minimum,one storey), and 39 (maximum, seven storeys), activities were identified and used as atemplate for the projects in the sample. This was achieved by combining sub-activitiesinto logical groups determined by analysis of all the programmes in conjunction withadvice from professional practitioners. The overall duration of a standardised activitywas calculated from the start date of the first sub-activity to the end date of the lastsub-activity. Costs for each standardised activity were similarly aggregated.

Determination of time lagsThe next step in the analysis was to calculate the time lags relating activities to thosedependent upon them. This would determine when a specific activity would begin.

Initially, based on the practitioners’ advice referred to above and the specificprogrammes collected, a “normal” sequence of activities and activity relationships wasdetermined and the associated lag times calculated. If any lag time determined in thismanner was negative, then the previous activity in the programme was chosen aspredecessor and the lag recalculated until there was a zero or positive lag for allactivities in all of the projects. Table II shows the resulting set of relationships.

Using the above relationships, each activity in each programme was then analysedthus:

Time lag ¼ Activity start 2 Start of predecessor activity ð1Þ

Duration of activity ¼ End date of activity 2 Start date of activity ð2Þ

The time lag expressed as a percentage of the predecessor activity was then calculated:

Percentage Lag ¼ 100 £ Time lag=Duration of predecessor ð3Þ

Regression modellingRegression analysis is a technique that will fit an expression to a set of collected data.In multiple linear regression analysis, as used in this model, several independentvariables are used to model a single response variable (Weisberg, 1980). In this case,since the aims of the project include prediction of overall duration and cost and of costflow, each activity in the programme may require three models; one whose dependentvariable is activity duration, one whose dependent variable is lag time and one whosedependent variable is cost. That is, it was anticipated that, for time prediction alone, upto 78 models would be needed. In fact, as outlined below, in order to deal with

ECAM13,1

86

Page 6: A Novel Multiple Linear Regression Model for Forecasting S-Curves

variability in projects and their characteristics, significantly more models wereproduced. In all types of model the independent variables are project characteristicscollected in the survey.

The regression analysis was performed using a combination of largely automatedanalysis using SPSS and user-directed analysis using Microsoft Excel. Severalhundred models were generated, tested and modified or rejected, using a total of 16different modelling strategies. Three methodological issues are of interest in thisprocess: dealing with variability in the input data; dealing with non-numericalvariables and determining the optimum structure for each model.

Activity no. Activity name Activity dependency

1 Site set-up2 Foundations Site set-up3 Drainage Foundations4 Ground floor Foundations5 Frame Foundations6 External walls ground Frame7 External walls 1st External walls ground8 External walls 2nd External walls ground9 External walls 3rd External walls ground

10 External walls 4th External walls ground11 External walls 5th External walls ground12 External walls 6th External walls ground13 Internal walls ground External walls ground14 Internal walls 1st Internal walls ground15 Internal walls 2nd Internal walls ground16 Internal walls 3rd Internal walls ground17 Internal walls 4th Internal walls ground18 Internal walls 5th Internal walls ground19 Internal walls 6th Internal walls ground20 Internal doors Internal walls ground21 Lift/stairs 1st floor22 1st floor Frame23 2nd floor 1st floor24 3rd floor 1st floor25 4th floor 1st floor26 5th floor 1st floor27 6th floor 1st floor28 Roof Frame29 Watertight (milestone) Internal doors30 Windows and ext. doors Internal walls31 Plumbing and sanitary-ware Internal walls32 Mechanical services Internal walls33 Electrical services Internal walls34 Floor finishes Roof35 Ceiling finishes Internal walls36 Wall finishes Mechanical services37 Fixtures and fittings Mechanical services38 External works Foundations39 Hand over and clean Wall finishes

Table II.The standardised

building activities inorder with their activity

dependencies

Model forforecasting

S-curves

87

Page 7: A Novel Multiple Linear Regression Model for Forecasting S-Curves

As was discussed above, the projects in the data set varied significantly in theirnature, not least in the functions of the buildings and their sizes. It was expectedtherefore, that “outliers”, i.e. data-points distant from the general trend lines, mightprejudice the goodness-of-fit of the regression models. However, decisions on the removalof outliers always represent a compromise between improving accuracy and reducingthe applicability of the data set. Two strategies were used to attack this problem.

First, previous studies have generally indicated that building function is a stronginfluence on times and costs, therefore tests were performed to examine therelationship between other variables (e.g. size in terms of gross floor area), and buildingtype. On the basis of these tests, it was decided that a number of building types shouldbe modelled separately. This action, though increasing the number of models and thecomplexity in their application, was considered worthwhile since it greatly reduced thenumber of outliers across the range of variables.

Second, the remaining apparent outliers were examined in SPSS by assessing theiruniqueness. Projects that were apparent outliers, when considering a given variable,and whose characteristics were commonly found amongst the data set were deemed“true” outliers (i.e. truly unrepresentative of the group of projects that shared thecharacteristics). These data points were removed from the consideration of the specificvariable. Projects that appeared to be outliers, but whose characteristics were notrepresented elsewhere in the data set, were considered to form a logically differentgroup of projects and were retained in the analysis. Any change was not finalised untilits effects on both goodness-of-fit (R squared), and accuracy of prediction for the testprojects were assessed. It was found that the former and latter tests were sometimescontradictory, in which case the latter test took precedence.

Not all of the input variables can be expressed directly in numerical form (e.g.location), and cannot therefore be used in a regression analysis without furtherprocessing. Several strategies were tested to deal with the problem. SPSS provides forthe problem by allowing the use of “boolean” variables. This method was initially usedon the integers designated to the relevant non-interval variable. For example, in dealingwith building function each possible value (e.g. “office”, “school” etc.), is assigned avariable that can be either “on” (value ¼ 1), or “off” (value ¼ 0). Obviously, for anysingle project only one variable for building type can be “on”. As an alternative, thisapproach was compared to a number of patterns of manual allocation of integer values torepresent the different values of a variable (e.g. values in the range 1 to 11 in the case ofbuilding type). This approach was also found to be effective by Ameen et al. (2003). Ahuge number of patterns of allocation were possible, but several random patterns andpatterns based on the frequency of occurrence of values were tried. In each case the bestmethod was chosen by goodness-of-fit to the sample data and the accuracy achieved bythe regression models produced in predicting values for the test projects.

Not all input variables have the same influence on the predictions produced by themodels. Furthermore, the importance of variables may vary between models (i.e. thedifferent models for each activity and each dependent variable). It is therefore usuallypossible to simplify models by removing the least influential variables. As with theelimination of outliers described above, this process represents a compromise, in thiscase between simplicity (and thus speed and ease of use), and accuracy. Again, SPSSoffers automated methods for this process, identifying the relative influence of eachindependent variable and adding or removing them in a stepwise fashion (e.g. working

ECAM13,1

88

Page 8: A Novel Multiple Linear Regression Model for Forecasting S-Curves

from most to least influential), and testing the effects at each change. This method wasalso used by Draper and Smith (1998). As above, this process was compared to numberof manual manipulations of the model, including various random selections ofvariables as starting points for stepwise refinements. In each case the best result,judged by goodness-of-fit to sample data and accuracy of prediction for the testprojects, was selected.

Utilizing the modelling frameworkThe final predictive models were constructed as an Excel template, using theregression equations generated. The template performed any manipulation necessaryfor preparing input data for use in the equations and did the necessary repetition ofcalculation needed to deal with multi-storey buildings of varying sizes. When inputtingthe maximum number of 21 (coded where necessary in the methods discussedpreviously), interval and non-interval level data variables into the template, itimmediately produces predictions for individual activity duration and cost, activitystart and end date, as well as the associated time lag from its chosen predecessor andconsolidates these into a single programme prediction. A maximum of 16 variablescould be chosen and used (out of the 21 input), in the modelling of each dependentvariable in the Excel model. This limit was sufficient for most of the models, sinceSPSS indicated that greatest accuracy could be achieved with fewer than 16 variables.In the few cases where more than 16 variables were identified by SPSS, the final modelincluded only the most influential variables. Tests on the effects on the accuracy ofprediction of this simplification showed its influence to be negligible.

The predictive data produced were then automatically linked from Microsoft Exceland transferred into Microsoft Project where it could then be developed further in linewith other topics of research that are discussed in other papers.

Calculation of S-curvesThe programme and cost data predicted by the regression model was transferred intoMicrosoft Project and this software calculated the actual monthly costs and hence thecumulative running and total costs for each standardised project. An S-curve was thenfitted for each of the fifty projects using the logit transformation and standardised overa set duration of ten time periods, in order to be represented graphically. See Figure 1for an example of the S-curve generated for a typical project. The logit transformationhas been confirmed to be an appropriate and accurate method for fitting S-curves. Formore details on the logit transformation see Kenley and Wilson (1986).

Measuring the accuracy of fitIt is necessary to measure the accuracy of fit for the S-curves of any given project forthree reasons. The first reason is to compare the actual and predicted curves with eachother for any given project. Second, to compare a project’s S-curves with other curvesfor that given project type. And third, to draw comparisons with this and othercalculated models.

The measure chosen, the “standard deviation about the estimate of Y”, or “SDY”,was first put forward as a risk index by Jepson (1969), and was later adopted by Bernyand Howes (1982). It was used by Kenley and Wilson (1986), in their idiographic valuemodel. Kaka and Price (1993), Evans et al. (1996), Petros (1996), and Al-Jifri (1998), also

Model forforecasting

S-curves

89

Page 9: A Novel Multiple Linear Regression Model for Forecasting S-Curves

used this method. “SDY” adopts the common measure of dispersion and is calculatedas follows:

SDY ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi�XY 2 YE 2

s=N

Where: Y ¼ actual value at any accounting period; YE ¼ estimated (or fitted), value;N ¼ number of observations/intervals (accounting periods).

This allows models to be compared with one another. The model with the lowestvalue of SDY has the best fit, and hence is the most accurate.

Kenley and Wilson (1986), devised a systematic process of trial and error to locatean optimum exclusion range in their value curves. It was concluded that anapproximate exclusion range of 10 percent at both extremes would produce the lowestaverage SDY value.

The extreme data points within a cash flow analysis are likely to be the leastsignificant, as they involve small amounts of monies, but can be the most dominant inthe regression analysis. It therefore makes sense to leave out the data points outside the10 per cent range at the extremes.

The exclusion range of 10 per cent was also used by Kaka and Price (1993), butEvans et al. (1996), used an exclusion range of between 10 per cent and 16 per cent inorder to achieve better results with their data.

Analysis and testingThe first stage of analysis and testing was to apply the modelling to the 50 projectsused to create it. The regression model results for the 50 projects were placedindividually into MS Project and their cash flow profiles generated. S-curves werefitted for each of the projects and presented graphically with the actual curves, in orderto make a quick visual comparison. The calculation of SDY values, between modelledand actual curves, was used as a final, objective measure of the accuracy of theindividual fits.

Figure 1.A typical example of thestandardised S-curve overten time periods

ECAM13,1

90

Page 10: A Novel Multiple Linear Regression Model for Forecasting S-Curves

A total of 11 project groupings were identified based upon project function (e.g.offices, industrial etc.). The SDY results for the 50 sample projects, distributedaccording to the 11 project types, can be seen in Table III. The most accuratepredictions (ignoring “airports” which had a sample size of one), were for shoppingcentres (PF9), and produced an average SDY of 1.088 per cent. Furthermore, thestandard deviation of 0.357 per cent suggests that the variation around this mean issmall. The highest mean SDY of 4.735 per cent for any group came from the “hospitals”(PF6). Conversely, this group also produced the lowest standard deviation and whichmight suggest strong similarities between the buildings. “General retail projects”(PF2), contained the best and least accurate fits for any individual project (0.533 percent and 8.852 per cent), and thus, had the highest standard deviation of SDY values.This was possibly indicative of a large variation between buildings in this group.

Although the projects produced good SDY values overall, with a mean error for allprojects of under 2.5 per cent, it is clear that some group results were of lowsignificance due to their small sample sizes. Hence, increasing the sample size in eachcase would give a more realistic measure of accuracy.

Initial results of applying the model were very encouraging, but a more rigorous testwould be to apply it to a new set of projects not used in the initial analysis nor used inthe development of the model. As mentioned previously, six projects were randomlyselected from the initial data set for this purpose. As before, the programme and costdata were generated by entering the projects’ characteristics into the regressiontemplate and Microsoft Project was used to process these results into a cash flowprediction. The goodness-of-fit of the predicted curves to the actual curves wasassessed and Table IV gives the individual SDY results.

As expected, the test results as a whole were less accurate than for the originalprojects, but still deemed reliable. The overall mean SDY was a little under 4 per centand individual projects varied between about 1 per cent and 8 per cent.

Comparisons with previous modelsIn order to evaluate how accurate the present study is, it was necessary to compare theresults with previous value and cost models. Table V illustrates this comparisonbetween the SDY results for individual S-curves.

Project function Mean Maximum Minimum Std dev. Sample size

PF1 (Offices) 2.775 5.707 0.547 1.571 11PF2 (Retail) 3.054 8.852 0.533 2.722 7PF3 (Housing) 2.431 3.512 1.359 1.006 5PF4 (Education) 3.042 3.661 1.619 0.830 5PF5 (Leisure) 1.922 3.516 0.575 1.332 5PF6 (Hospital) 4.735 4.805 4.664 0.100 2PF7 (Commercial) 1.552 3.970 0.581 1.619 4PF8 (Industrial) 2.455 5.000 1.006 2.211 4PF9 (Shopping centre) 1.088 1.430 0.627 0.357 4PF10 (Airport) 0.709 0.709 0.709 2 1PF11 (Museum/Gallery) 1.559 2.099 1.019 0.764 2Overall 2.493 8.852 0.533 1.636 50

Table III.The SDY results for the

11 project types for the 50projects

Model forforecasting

S-curves

91

Page 11: A Novel Multiple Linear Regression Model for Forecasting S-Curves

A small difference in the accuracy when compared to the other models mightdemonstrate that construction projects are unique, and thus contractors must conductdetailed estimates of the cost and schedule of work in order to forecast cash flow (Kakaand Price, 1993). While an improved accuracy would reveal that the causes ofvariability of value and cost commitment curves, mentioned earlier, have significantinfluence.

Despite producing the lowest individual SDY value of 0.250, the Al-Jifri (1998), studywas the least accurate, and produced a high maximum of 17.660 and also produced thehighest mean average of 7.580. This was mainly due to the fact that the SDY wascalculated for cost elements, as well as a limited number of projects, and the absence offeatures that connected the projects together, e.g. building type. Kenley and Wilson(1986), produced the most accurate results overall, with regards to the lowest meanaverage of 1.750, but this was put down to the fact that despite the larger sample sizethan the present study, there were only two building types. This meant that buildingswere more likely to be similar and hence use comparable building methods, techniques,etc. Although slightly less accurate in terms of SDY results, but not sample size, thanthe previously mentioned study, similar reasons can be identified for Kaka and Price(1993), as they only used a total of three building types i.e. commercial, industrial andpublic, but produced specialised categories like size and type of contract etc.

Study

Kenley andWilsona

(1986)

Kaka andPrice(1993)

Petros(1996)

Evans andKaka(1998)

Al-Jifri(1998)

Presentstudy

Projects used 72 150 4 29 39 50Figures based on Value Cost Cost Value Cost CostMinimum 1.030 0.475 4.308 1.413 0.250 0.533Maximum 7.330 8.987 7.618 8.644 17.660 8.852Mean 1.750 2.996 5.581 5.229 7.580 2.493Median 1.652 2.736 5.199 5.699 5.430 2.099Std dev. 0.782 1.572 1.551 1.931 5.312 1.636

Notes: aAs Kenley and Wilson (1986) analysed a total of 72 projects spread into two sample groups,i.e. commercial and industrial, it was decided to combine the findings to produce one set of overallresults. This would be a slightly more accurate indicator in comparison, as the present modelencompasses a variety of building types

Table V.A summary of the SDYresults of previousmodels

Test project number and function Individual

1 (Offices) 2.6232 (Industrial) 7.6743 (Retail) 1.8704 (Commercial) 1.6775 (Commercial) 8.2826 (Offices) 1.181Mean 3.885Std dev. 3.210

Table IV.The SDY results for thesix test projects

ECAM13,1

92

Page 12: A Novel Multiple Linear Regression Model for Forecasting S-Curves

The present cost curve model produced more accurate SDY results overall in four outof the five cases, and queries the conclusion that cost commitment curves, as proposedby Kaka and Price (1993), are the most reliable. The minimum SDY value of 0.533produced was lower than that of 1.030, 1.413, and 4.308, for Kenley and Wilson (1986),Evans and Kaka (1998), and Petros (1996), respectively, and compared very well withthe other two studies. Again, the maximum value of 8.852 compared very well with themore accurate models. Apart from the Kenley and Wilson (1986), study, the presentmodel produced the smallest mean and median, of 2.493 and 2.099 respectively, whencompared to the other models, and confirms the overall accuracy of this study. This isreinforced by the small standard deviation of 1.636 establishing that the SDY all of theprojects do not diverge from the small mean. When analysing the Kenley and Wilson(1986), study, Kaka and Price (1993), concluded that a group of one type of buildingswould have a smaller variability in project S-curves, and hence emphasises theaccuracy of the present study which encompasses 11 building groups.

ConclusionsThe use of cost curves for the production of standard S-curves has been developed andsuccessfully tested. Previous value and cost models have been discussed and theinadequacies of each have been established. A cost commitment curve was producedby Kaka and Price (1993), and was said to eliminate some of these reasons, and henceconcluded that the cost commitment model would be more reliable. Petros (1996),concluded that although slightly more reliable, their accuracy was still in question.With this in mind, it was decided to use cost curves as the basis of the present model, asthey are more accurate in general than value curves, for reasons mentioned earlier.

A total of 50 projects were used in the development of the regression model. Prior tothis, the programmes of work for all of the projects were standardised so they could becompared more accurately and realistically. This involved producing a set of between20 and 39 (depending on number of storeys), standardised activities, to whichsub-activity costs and durations where amalgamated accordingly. These data werethen used in the development of a multiple linear regression model that uses thecharacteristics of each project to predict each actual activity cost, duration, and startand end date. Hence, the whole programme of works could be predicted. From this thecumulative and monthly cash flow can be produced.

The actual and fitted cost curves were then generated for the 50 projects. Theprojects were classified into various criteria and distributed into 11 groups based uponproject function (building type). The cost curves were then subjected to the logittransformation. The alpha and beta values derived were used to regenerate the averagecurves of the eleven groups. To determine the accuracy of the model, the SDY wasmeasured to evaluate the individual fits and against the average within a group. Aminimum and a maximum SDY of 0.533 and 8.852 respectively were produced. Theseresults were complemented with a 2.493 mean average. With regards to the grouping, itwas found that the shopping centres (PF9), were the most accurate buildings as awhole, and produced the lowest mean average of 1.088 compared to the hospitals (PF6),which produced the highest mean SDY average of 4.735.

The model was tested on a further six test projects that were not included in theinitial development of the multiple linear regression model. This would be a moreobjective test of the accuracy of the model. As expected, the results for SDY were

Model forforecasting

S-curves

93

Page 13: A Novel Multiple Linear Regression Model for Forecasting S-Curves

slightly less accurate but very similar, with a 1.181 minimum, an 8.282 maximum, anda mean of 3.885. This reinforced the initial conclusion that the model was reliable.

The results were compared with previous value and cost models to establish theeffectiveness of the present model. A total of five models were identified and used inthe comparison. In general the present model was proven to be more accurate than fourout of the five models. In terms of maximum and minimum values, the most accuratemodel, Kenley and Wilson (1986), only consisted of two building types though, andhence results are expected to be accurate, because of the specific nature of the data.Apart from this study, the present model produced the lowest mean SDY of 2.493, andmedian of 2.099, showing the small spread of results. The results were also moreaccurate than the alternative to value curves, the cost commitment model, as proposedby Kaka and Price (1993). The majority of the SDY values for that model ranged frombetween 3.2 and 8.4, compared to the Kenley and Wilson (1986), study that producedfigures of between 4.8 and 11.6. The bulk of the SDY values for the present modelranged 1.580 to 3.493 with a median of 2.099, and confirms that better overall resultsare achieved with the multiple linear regression cost model. The reasons why theresults are not significantly lower could be the size of the sample data. This limiteddifference in the accuracy when compared to the other models confirm the conclusionthat construction projects in the sample are unique, but can be modelled more easilyvia a standardised works programme that consists of the same base activities for eachproject. The research has therefore proved that, subject to reservations about themoderate sample size used in development, a multiple linear regression model based onthe framework for a standardised works programme can be used to accurately modelS-curves. It provides the first innovative steps as a basis for further research, andwould benefit from an increasing number of projects, that would help determine thetrue accuracy of the proposed model.

References

Al-Jifri, M.A. (1998), “A study of the accuracy of calculating cost flow curves for individual costelement”, unpublished MArch thesis, University of Liverpool, Liverpool.

Allsop, P. (1980), “Cash flow and resource aggregation from estimator’s data (computer programCAFLARR)”, MSc in Construction Management project report, Loughborough Universityof Technology, Loughborough.

Ameen, J.R.M., Neale, R.H. and Abrahamson, M. (2003), “An application of regression analysis toquantify a claim for increased costs”, Construction Management and Economics, Vol. 21,pp. 159-65.

Balkau, B.J. (1975), “A financial model for public works programmes”, paper presented at theNational ASOR Conference, Sydney, 25-27 August.

Berny, J. and Howes, R. (1982), “Project management control using real time budgeting andforecasting models”, Construction Papers, Vol. 2, pp. 19-40.

Bromilow, F.J. and Henderson, J.A. (1977), Procedures for Reckoning the Performance of BuildingContracts, 2nd ed., CSIRO, Division of Building Research, Highett.

Draper, N.R. and Smith, H. (1998), Applied Regression Analysis, 3rd ed., John Wiley & Sons,New York, NY.

Evans, R.C. and Kaka, A.P. (1998), “Analysis of the accuracy of standard/average value curvesusing food retail building projects as case studies”, Engineering, Construction andArchitectural Management, Vol. 5, pp. 42-5.

ECAM13,1

94

Page 14: A Novel Multiple Linear Regression Model for Forecasting S-Curves

Evans, R., Kaka, A.P. and Lewis, J. (1996), “Development of a computer based model to assistcontractors to plan, forecast and control cash flow”, ARCOM: Proceedings of the 12thAnnual Conference and Annual General Meeting, Vol. 2, pp. 556-65.

Hudson, K.W. (1978), “DHSS expenditure forecasting method”, Chartered Surveyor – Buildingand Quantity Surveying Quarterly, Vol. 5, pp. 42-5.

Jepson, W.B. (1969), “Financial control of construction and reducing the element of risk”,Contract Journal, 24 April, pp. 862-4.

Kaka, A.P. (1994), “Contractors’ financial budgeting using computer simulation”, ConstructionManagement and Economics, Vol. 14, pp. 35-44.

Kaka, A.P. (1999), “The development of a benchmark model that uses historical data formonitoring the progress of current construction projects”, Engineering, Construction andArchitectural Management, Vol. 6 No. 3, pp. 256-66.

Kaka, A.P. and Price, A.D.F. (1993), “Modelling standard cost commitment curves forcontractors’ cash flow forecasting”, Construction Management and Economics, Vol. 11,pp. 271-83.

Kaka, A.P. and Price, A.D.F. (1996), “Towards more flexible and accurate cash flow forecasting”,Construction Management and Economics, Vol. 14, pp. 35-44.

Kenley, R. and Wilson, O. (1986), “A construction project cash flow model – an idiographicapproach”, Construction Management and Economics, Vol. 4, pp. 213-32.

Khosrowshahi, F. (1991), “Simulation of expenditure patterns of construction projects”,Construction Management and Economics, Vol. 9, pp. 113-32.

Miskawi, Z. (1989), “An S-curve equation for project control”, Construction Management andEconomics, Vol. 7, pp. 115-25.

Nkado, R.N. (1992), “Construction time information system for the building industry”,Construction Management and Economics, Vol. 10, pp. 489-509.

Oliver, J.C. (1984), “Modelling cash flow projections using a standard micro-computerspreadsheet program”, MSc project in Construction Management, LoughboroughUniversity of Technology, Loughborough.

Petros, H.S. (1996), “An investigation into the effects of construction planning on cost flowcurves: a case study”, PhD thesis, Department of Architecture and Building Engineering,University of Liverpool, Liverpool.

Singh, S. and Lakanathan, G. (1992), “Computer based cash flow model”, Proceedings of the 36thAnnual Trans., American Association of Cost Engineers, Morgantown, VA, pp. R5.1-R5.14.

Weisberg, S. (1980), Applied Linear Regression, John Wiley & Sons, Chichester.

Model forforecasting

S-curves

95

To purchase reprints of this article please e-mail: [email protected] visit our web site for further details: www.emeraldinsight.com/reprints