
IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, VOL. EM-16, NO. 3, AUGUST 1969

An Analytical Approach to Scoring Model Design—Application to Research and Development Project Selection

J. R. MOORE, JR., AND NORMAN R. BAKER, ASSOCIATE MEMBER, IEEE

Abstract—Multiple criteria scoring models have been suggested for use in evaluating competing research and development project proposals. This model form, more than any other, affords the decision maker the opportunity to combine, in exacting fashion, both qualitative and quantitative factors that affect his decisions. To date, however, the project scoring models that have appeared in the R&D literature contain limitations that have caused research management to turn to economic analysis and mathematical programming models of project selection. These other models, while valuable, are severely limited in their ability to consider qualitative criteria important to the evaluation of research. This paper identifies some of the shortcomings present in standard scoring model formulations of the project selection problem and then defines a new form of scoring model, which not only overcomes existing limitations but also exhibits several qualities desirable in a prescriptive model.

Literature study suggests that two difficulties are in large part responsible for an observed lack of managerial interest in the development and use of multiple criteria scoring models. First, the scoring model is often thought of as being considerably less accurate in its ability to process data than rate of return analysis or mathematical programming approaches to project selection. Second, because of lack of explicit model structure and due to the rather arbitrary manner in which previous scoring models have been presented, it is nearly impossible to prescribe how to construct an acceptable model for a specific environment. Described herein is a scoring model defined in terms of the statistical regularities of a given research environment, which compares favorably in accuracy and sensitivity to estimation error with other more widely accepted models of project evaluation. An analytical method for model design and verification is defined and discussed in terms of the following steps: 1) selection of evaluation criteria; 2) development of performance measures; 3) quantification of the research environment; 4) determination of criteria weights; 5) initial model specification; 6) selection of model objectives; 7) initial model verification; and 8) complete model specification and verification.

SCORING MODEL LIMITATIONS AND ADVANTAGES

INHERENT in the multiple criteria scoring approach to project evaluation are two key limitations that go far to explain an observed lack of industrial interest in the development and testing of project scoring techniques. First, due to the absence of a formal and operational structure, the scoring model is often thought of as being considerably less accurate in its ability to process project data than rate of return or mathematical programming approaches to project evaluation.

Manuscript received March 1969; revised June 1969.
J. R. Moore, Jr., is with the Graduate School of Business, Stanford University, Stanford, Calif.
N. R. Baker is with the Georgia Institute of Technology, Atlanta, Ga.

This attitude is reinforced when financial or economic criteria are primary factors in the decision process. (The partial model in Fig. 1 is representative of scoring models that have appeared in the literature and will serve to illustrate model shortcomings.) Scoring model input data are often regarded as inaccurate in contrast to those used in other models. For example, a scoring model uses interval estimates to reflect the uncertainty associated with a given measure of project performance, e.g., income, cost, completion time, etc. Although an interval estimate may appear to be unreliable data, for a given level of project uncertainty it may be more dependable than the point estimate required for economic and programming models. The composite project score resulting from the combination of several criteria performance scores is also regarded with suspicion. Since the operation of forming the project score is intuitive and not tied to a well-defined structure, the resulting number is dimensionless and not endowed with a well-defined meaning such as profit, rate of return, or even personal utility.1 The individual criterion performance scores are often assigned arbitrarily without regard for the effect of such assignments upon the final composite score.

A second limitation of the scoring model is the arbitrary manner in which such models have been described in the literature. Several important methodological issues have gone untreated, thereby making it nearly impossible to prescribe how a model should be designed and verified for use in a specific environment. Fig. 1 illustrates a structure in which the decision criteria have different numbers of scoring intervals. The number of intervals used is an important consideration in model design since this number affects the ultimate usefulness of the model in a given situation. Thus far, no guidelines exist for determining the "proper" number of intervals or for other important decisions relating to the weighting of criteria, the formation of the project score, the positioning of scoring intervals on an absolute scale, and the selection of interval width patterns. Furthermore, problems even more basic to the decision environment, such as selection of criteria, development of performance measures, and the selection of research and model objectives, have not been adequately discussed.

1 Fishburn [6] has suggested a comprehensive theory for scoring models that are directly tied to personal utilities. Although the results of this theory are interesting, the prospects of its being made operational in the near future are not good. In addition, aspects of personal utility are deemphasized in the scoring model discussed in this paper.


Fig. 1. Evaluation criteria and performance scores. Three sample criteria are shown with their performance levels and integer scores: Promise of Success (unforeseeable, fair, high; scores ranging from 1 to 3), Project Development Cost (five dollar ranges from more than $12,000 down to less than $3,000; scores ranging from 1 to 5), and Availability of Engineering Skills (five levels from outside consultant required to engineering skills not required; scores ranging from 1 to 5).


The lack of attention devoted to the use of scoring models in project evaluation is particularly distressing in view of the significant advantages this model form holds over more widely used models. Primary among these advantages is the fact that the scoring model is the only model to permit the explicit inclusion of subjective or qualitative factors that may influence the decision to undertake a project. Corporate prestige, the criticality of need for a product, competitive reactions, and the degree of interface with existing products and markets are often not easily measured in economic terms but are, nonetheless, important to an evaluation. Another advantage of the scoring model is the opportunity to use simple, low-cost methods of data acquisition. In situations where the uncertainty associated with a project does not permit a meaningful point estimate of performance to be made, interval estimates not only suffice but give a true picture of the accuracy of the information being used. Subjective estimates can be obtained through interviews when objective techniques do not apply. Since the model builder is free to include whatever factors he finds relevant to the decision, the scoring model becomes adaptable to the conditions of data availability associated with the problem or decision situation.

The scoring model also allows the decision maker to predetermine the impact of every factor he considers in arriving at an evaluation. Criteria can be assigned weights corresponding to individual or group utilities for a particular type of outcome. In this way several conflicting or even inconsistent objectives can be incorporated to form a project score. Finally, a significant advantage of the scoring model is its ability to serve as an information producing device.

Fig. 2. Environmental scoring model; sample criteria specifications. Three criteria (Probability of Success, Total Income, and Total Cost) are each partitioned into nine scoring intervals defined in terms of the project performance distribution: an open-ended interval below μ - 1.75σ, seven interior intervals one-half standard deviation wide between μ - 1.75σ and μ + 1.75σ, and an open-ended interval above μ + 1.75σ, where μ = project performance distribution mean and σ = project performance distribution standard deviation.

Since management is not likely to use any model to actually decide among projects, the model's true value will most probably be found in the wide range of information it can generate for use in making selection decisions. The scoring model can be used to scan large project lists to reduce the number of possible alternatives and locate specific outcome configurations, to identify areas of project or laboratory weakness, to study the effects of management decision rules on the research program, and to produce information useful in research scheduling and budgeting.

ENVIRONMENTAL MODEL

In a recent paper [9] the authors discussed a new form of scoring model and verification device that overcomes several of the limitations listed in the previous section. This new model, defined in terms of the statistical regularities of a given research environment, compares favorably in terms of accuracy and sensitivity to estimation error with other more widely accepted models of project evaluation. In addition, a methodology has been developed that simplifies constructing and verifying the scoring model for use in other environments.

Fig. 2 illustrates how scoring intervals are defined for three sample criteria in an environmental scoring model. Project performance with respect to each decision criterion is assumed distributed according to probability distribution functions such as the one in Fig. 3.


Fig. 3. Scored project performance distribution. Partitions centered on mean. (Horizontal axis: total project income, in thousands of dollars.)

Fig. 4. Scored project performance distribution. Partitions centered below mean. (Horizontal axis: total project income, in thousands of dollars.)

By partitioning the criterion measurement space, e.g., dollars, hours, unit sales, in terms of the distribution of performance, the resulting criterion scores will be sensitive to statistically outstanding levels of project performance. This is in direct contrast to an arbitrary partitioning (and scoring) of a criterion measurement space that could, as in Fig. 4, fail to discriminate between an average project and a truly outstanding one.
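
To make the contrast between Figs. 3 and 4 concrete, the sketch below (in Python; not part of the original paper) scores a hypothetical Total Project Income criterion two ways: with nine intervals one-half standard deviation wide centered on the estimated mean, and with an arbitrary dollar partition placed low on the scale. The dollar figures are invented for illustration.

MU, SIGMA = 75_000.0, 25_000.0   # assumed income distribution: mean and standard deviation

def score_environmental(x, mu=MU, sigma=SIGMA, n_intervals=9, width=0.5):
    # Integer score 1..n_intervals; interior intervals are `width` standard deviations
    # wide and centered on the mean, the two end intervals are open-ended.
    half_span = (n_intervals - 2) * width / 2.0        # 1.75 sigma for nine intervals
    lower, upper = mu - half_span * sigma, mu + half_span * sigma
    if x < lower:
        return 1
    if x >= upper:
        return n_intervals
    return 2 + int((x - lower) // (width * sigma))     # interior scores 2..n_intervals-1

def score_arbitrary(x):
    # An arbitrary equal-dollar partition placed below the mean (Fig. 4 style).
    cuts = [25_000, 35_000, 45_000, 55_000, 65_000]
    return 1 + sum(x >= c for c in cuts)

average_project, outstanding_project = 75_000, 140_000
print(score_environmental(average_project), score_environmental(outstanding_project))  # 5 9
print(score_arbitrary(average_project), score_arbitrary(outstanding_project))          # 6 6 -- no discrimination

Under the environment-based partition the outstanding project earns the top score, while the arbitrary partition of Fig. 4 cannot tell the two projects apart.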

For a given set of projects, the recommendations of an environmental scoring model were compared to representative economic and linear programming project-selection models. In a detailed computational analysis, the economic and programming models served as benchmarks of comparison, and scoring model accuracy was measured by the degree to which project rankings of the scoring and benchmark models were in agreement.2 As a result of this study, it was determined that the project scores produced by the environmental model were highly rank-order consistent (0.1 percent significance) with those resulting from the benchmark models.

2 For a detailed description of the analytical method, specific models used, and the results of this study the reader should consult [8], [9].

In addition, this consistency was shown to be relatively insensitive even to serious errors made in the estimation of the shape and parameters of the project performance distributions. And, as added support for the importance of considering the research environment in defining a scoring model, the level of scoring model accuracy was found to be a function of both the effective range over which each criterion measurement space was partitioned and the power of the model to discriminate between levels of project performance across the effective range. As a result, the number of scoring intervals used and the width of those intervals emerge as extremely important considerations in the model-building process. Other results of this study will be called upon as needed in the description of an analytical method for scoring model design and verification presented in the following section.
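
As a hedged illustration of the consistency measure used in that study, the sketch below computes a Spearman rank correlation between the project orderings produced by a scoring model and by a benchmark model; the two score lists are invented, and ties are ignored for simplicity.

def ranks(values):
    # Rank positions (1 = best); higher score = better. Ties are not handled,
    # which is adequate for an illustration.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def spearman(scores_a, scores_b):
    # Spearman rank correlation: r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((a - b) ** 2 for a, b in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

# Invented scores for one small project set: scoring model vs. an economic benchmark.
scoring_model = [7.2, 5.1, 8.4, 3.0, 6.6, 4.4, 7.9, 2.5]
benchmark     = [0.21, 0.12, 0.29, 0.08, 0.19, 0.14, 0.24, 0.07]
print(round(spearman(scoring_model, benchmark), 3))   # 0.976 for these invented numbers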

MODEL DESIGN AND VERIFICATION

The insights gained as a result of the construction and testing of the environmental scoring model have made it possible to set forward recommendations that amount to an analytical method for the design and verification of a scoring model for a specific environment.


The subtasks that comprise the model design process are discussed in the following paragraphs, with some sense of chronology being maintained.

Criteria Selection

Each relevant decision maker, with the model builder's assistance, must first specify all criteria that he considers relevant to the evaluation and selection of projects for research. There is no "tried and true" list that can be proposed, since the criteria list will be unique to the particular laboratory and even to the decision makers within the laboratory. The resulting list, which will be used in scoring model construction, should, however, possess the following properties.

1) This list should be complete to guarantee that no important evaluation factors are overlooked in the analysis.

2) At the same time, the true relevance of each criterion should be challenged before adding it to the list. Since data collection and processing requirements increase with the number of criteria used, the decision maker should avoid including factors that are of decidedly minor importance in the decision process.

3) Each criterion should be measurable in the sense that a method and a scale for obtaining a measure of project performance with respect to the criterion either exists or can be constructed. The particular measure may be "natural" or "artificial." The requirement being stated here is simply that some measurement scheme is available.

4) Criteria overlap should be held to a minimum. To avoid multiple counting or overweighting the impact of a particular factor, one criterion should not include or be encompassed by another. Duplication is often the result of too much "soul searching" to drag out more criteria. Similar factors can usually be combined to form a single composite criterion. The removal of overlapping criteria facilitates the understanding of trade-offs that exist within the criterion set.

In the absence of a criteria-selection algorithm, the decision maker should give careful study to management goals and directives as well as any stated objectives of the research program. Lists of factors considered relevant to other prescriptive models may also be helpful. Some criteria identified as important by other authors include probability of success, project cost, income or cost savings, the timing of income and cost streams, level of technical and managerial familiarity with the research area, budget levels, market penetration, time to completion, and strategic need. Although there is no "correct" number of criteria, a number between five and ten should be sufficient to evaluate projects at almost any point in the research-development-engineering spectrum.

The issue of intercriteria independence often enters into discussions of multiple-criteria scoring. Causal independence implies the lack of correlation between project performance levels for two or more criteria.

Overlapping criteria often exhibit causal dependency. A familiar example of causal dependency is the belief that increases in research spending over a limited range will produce associated increases in project probability of success. Although this phenomenon is said to be important, there is little research that reveals the effect of such dependency upon scoring model output. Viewing, as the model does, the project alternatives as a fixed set of discrete performance estimates, causal dependence of criteria is not expected to impeach the results of the scoring model evaluation.3

3 Preliminary studies with correlated project performance distributions indicate that the presence of causal dependency has very little effect upon the degree of accuracy (intermodel agreement) demonstrated by the environmental scoring model.

Performance Measurement

Once a criteria list has been formed, the model builder, with assistance from the decision maker, must select a performance measure for each factor. This again is not a well-defined task, since one criterion may possess several possible measures that arise "naturally" from the criterion definition, whereas another factor may have no associated measure. A popular measure is the dollar. In fact, models often err by trying to express too much in dollar terms where other, perhaps more subjective but also perhaps more reliable, measures are more appropriate. Certainly savings, income, and costs are easily expressed in dollar terms. Other factors, such as market penetration, could be measured in sales dollars, but total unit sales or percent of total market captured could also apply and may better represent management goals than net sales. Whereas time to completion could be stated in years or months, a discrete scale based on "near-term," "average," and "long-term" might be preferred if exact time estimates were subject to considerable estimation error. The decision maker must guard against incorporating a scale of measure that is more accurate than the performance estimates he supplies. Since performance estimates tend to improve, in terms of accuracy, for projects at the engineering development end of the R & D spectrum, one would expect the types of measures to differ between models used for basic research evaluation and those used in engineering development project selection.

In several cases where measurement schemes are not immediately apparent, an "artificial" measure must be constructed. This is not undesirable and constitutes one of the more imaginative facets of research management. However, care must be taken to assure that the proposed measure is a reliable, accurate, and meaningful measure of the desired performance criterion. For example, one measure of the critical need for a product may be the number of competitors who currently have the product. Should the decision maker decide that his need is "very high" since his is the only company without that item in his product line, he could well be guilty of misjudging his market. Competitors in some highly segmented markets are often not (and perhaps should not be) competitors in other markets.


Even where a single competitor in a given market is without product representation, his need may still not be high if the other firms have the market so "saturated" that his new product would not be accepted. A second example is given for the criterion "effect on distribution and inventory costs." A measure might be the average number of units in stock at any time. However, if warehousing is such that the significant costs are in handling rather than storage, then some measure including turnover or shipping frequency may be better suited to the problem.
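
As a bookkeeping aid (not part of the original method), the agreed criteria and the measures chosen for them can be recorded in a simple structure that the later steps of the method (environment estimation, weighting, and interval construction) can build upon. All names and entries below are illustrative.

# Illustrative record of a criteria list and the performance measure chosen for each.
# The "scale" field separates continuous measures from discrete ones with few relevant points.
criteria = [
    {"name": "Total Income",           "measure": "dollars",                  "scale": "continuous"},
    {"name": "Total Cost",             "measure": "dollars",                  "scale": "continuous"},
    {"name": "Probability of Success", "measure": "probability",              "scale": "continuous"},
    {"name": "Time to Completion",     "measure": "near/average/long term",   "scale": "discrete"},
    {"name": "Criticality of Need",    "measure": "ranked market situations", "scale": "discrete"},
]
assert 5 <= len(criteria) <= 10   # the text suggests five to ten criteria usually suffice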

Estimation of Research Environment

The key to the high degree of intermodel consistency demonstrated by the environmental scoring model is that estimates of the research environment play an important part in defining the model. As a requirement for initial model specification, the probability distributions of project performance must be estimated for each criterion. These can often be obtained by forming frequency histograms from the historical project data files. In the process of satisfying the model's data requirements, unusual aspects of the research environment may be uncovered. For example, if subenvironments can be identified for distinguishable project classes, then multimodal performance distributions of the type discussed in [9] are likely to result. It may therefore be wise to consider developing several scoring models—one for each project class—rather than trying to treat the entire research spectrum with a single model. In the course of earlier field research it was noticed that an electronics firm faced widely separated distributions of income, cost, and probability of success for projects in one laboratory depending upon the type of semiconductor material being used. In this case the better strategy appeared to be to construct a scoring model for each semiconductor material class. This is likely to be preferable to accepting the lower consistency (accuracy) levels that accompany multimodal performance distributions, especially if model output is used for project screening or generating information on which subsequent decisions will be based.
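
A minimal sketch of how such an estimate might be formed from historical files: for each criterion, collect the recorded performance values of past projects and compute the sample mean and standard deviation that will later anchor the scoring intervals (the figures below are invented; separate estimates would be formed for each identified project class).

from statistics import mean, stdev

# Invented historical record: total income (thousands of dollars) realized by past projects.
historical_income = [62, 88, 71, 95, 54, 77, 103, 69, 81, 58, 74, 90, 66, 85, 79]

mu_hat = mean(historical_income)      # estimated performance distribution mean
sigma_hat = stdev(historical_income)  # estimated standard deviation (sample)

# A coarse frequency histogram of the same data, in $10,000 bins, to reveal the skewness
# or multimodality that would argue for one model per project class.
histogram = {}
for x in historical_income:
    histogram[10 * (x // 10)] = histogram.get(10 * (x // 10), 0) + 1
print(round(mu_hat, 1), round(sigma_hat, 1), dict(sorted(histogram.items())))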

Where historical information is scarce or nonexistent, subjective probability distributions must be constructed by questioning "knowledgeable" persons in the research organization. Several authors have indicated that this is a feasible and even desirable course of action. Evidence is available to indicate that the necessary subjective information is available and can be translated into distributions [14]. The resulting distributions can be revised in a Bayesian procedure as the supply of project data is enlarged. In most cases there will be sufficient data to form frequency distributions for criteria relating to financial performance data. The use of subjective distributions will probably be necessary mainly for those criteria for which the measures are inherently subjective.

The research environment should not be expected to remain constant over time. As new projects are completed and new areas of research open up, many of the distributions that underlie the environment will shift. This is particularly true for a model that includes market factors that are constantly fluctuating. Older and more competitive markets, for example, will not permit the high returns characteristic of newer markets where only a few firms have begun to market new products. As a result, there is a continual need for review procedures to update the performance distributions and thereby alter the design parameters of the model to insure that the research environment is perceived as accurately as possible.4

Criteria Weights

Once criteria have been determined, weights that indicate the relative importance of each criterion in arriving at a project evaluation and selection decision should be assigned. This is one point in the development of the scoring model where it is necessary to extract personal or group utilities from the decision maker(s). It is unlikely that each criterion will be regarded as equally important by an individual decision maker, and it may be even more unlikely that a group of decision makers will agree to an order of importance. Several methods of weighting criteria for individuals or groups are available. These range from simple rank-ordering schemes to partial and complete paired comparisons (see, e.g., [2], [3]). In two independent testing situations it has been determined that a selected set of these methods produce similar weights but that the simple ranking method is by far the easiest to use [5], [13]. Unfortunately, these two research reports are more suggestive than conclusive. However, they do indicate that the various methods produce similar utility weights and that the model builder should utilize whichever method elicits ease of use and confidence from the decision maker.
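
One simple way to turn such a rank ordering into normalized weights is the rank-sum rule sketched below; the ranking shown is illustrative, and nothing in the paper prescribes this particular formula.

def rank_sum_weights(ranked_criteria):
    # Rank-sum weights: with n criteria, the one ranked r-th (1 = most important)
    # receives weight (n - r + 1) / (n * (n + 1) / 2), so the weights sum to one.
    n = len(ranked_criteria)
    total = n * (n + 1) / 2
    return {c: (n - r) / total for r, c in enumerate(ranked_criteria)}

# Illustrative ranking, from most to least important.
print(rank_sum_weights(["Total Income", "Probability of Success", "Total Cost",
                        "Time to Completion", "Criticality of Need"]))
# Total Income receives 0.333..., down to Criticality of Need at 0.066...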

The importance of criteria weights is fairly obvious since they allow the scoring model to reflect the priorities of the decision maker(s). More formally, the weights define the rates at which performance level trade-offs occur between criteria; if, for example, the income criterion carries twice the weight of the cost criterion, a one-point drop in the income score must be offset by a two-point gain in the cost score to leave the project score unchanged. Once criterion scores are determined, the composite weighted project score, or objective function, can be represented on a multidimensional indifference map as a complex of trade-off functions. But, as in the case of the performance distributions, criteria weights cannot be expected to remain constant. Shifts in research emphasis, management policies, or the corporate financial position all dictate changes in the management priority system. A policing method is required to guarantee that the criteria weights accurately reflect current priorities.

4 Although criteria and performance measures should be more stable than the performance distributions, these factors are also subject to change and should be reviewed periodically [13].


Initial Specification

An initial scoring model specification should be developed to include only those criteria related to measures of economic performance. The noneconomic criteria will be added after a good financial scoring model is fashioned. Although the model builder can perform this task in many ways, the authors feel that the insights they gained through extensive work with scoring model forms are informative. Hence, these are presented in a form that outlines a procedure by which an initial model definition can be obtained.

At the outset, the scoring model should include as many scoring intervals as is feasible for eventual implementation. Psychometric testing indicates that nine intervals is the maximum in most cases where judgmental data are involved. Since intermodel consistency increases with the number of scoring intervals [8], [9], the initial scoring model will have as much discriminatory power and effective range potential as is reasonable to provide. Assuming nine intervals are to be used to score project performance, nine intervals should be used for every criterion. A scoring function is defined by assigning an integer-valued score to each scoring interval. For a nine-interval model these scores come from the closed interval [1, 9]. Performance scores should be drawn from this interval since to do otherwise would subvert the criteria weights determined earlier. For example, assigning the nine intervals of one criterion values of 4, 5, 6, ..., 12 from [4, 12] while all other criterion scores were in [1, 9] would automatically give that one criterion more weight in the final project score than that held by others. In the partial model in Fig. 1, the promise of success criterion carries less weight than the remaining criteria.

The scoring intervals that partition each criterion measurement space should be of equal width except for the "end intervals" that reach to the limits of each performance scale in order to detect extremely high or low values. A suggested interval width is equal to one-half the estimated standard deviation of each project performance distribution function. The intervals should be symmetric (centered) about the estimated mean of each distribution.5 Equal interval widths are suggested so that an approximate upper bound on intermodel consistency can be determined early in the verification process. If this level is satisfactory, further experimentation with interval width configurations will not be necessary. The handling of the income, cost, and probability of success criteria in Fig. 2 can serve as an example of the initial specification of a nine-interval scoring model.
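
A sketch of this initial specification under the stated assumptions (nine intervals, integer scores 1 through 9, interior widths of one-half standard deviation, centered on the estimated mean), combined into a weighted additive project score. The criterion parameters and weights are illustrative only.

def interval_boundaries(mu, sigma, n_intervals=9, width=0.5):
    # Boundaries of the interior intervals; for nine intervals these run from
    # mu - 1.75*sigma up to mu + 1.75*sigma, with open-ended end intervals beyond.
    half_span = (n_intervals - 2) * width / 2.0
    return [mu + (k * width - half_span) * sigma for k in range(n_intervals - 1)]

def criterion_score(x, mu, sigma, higher_is_better=True, n_intervals=9, width=0.5):
    # Integer score in [1, n_intervals] for a performance value x.
    score = 1 + sum(x >= b for b in interval_boundaries(mu, sigma, n_intervals, width))
    return score if higher_is_better else n_intervals + 1 - score   # e.g., cost: lower is better

def project_score(performance, environment, weights):
    # Weighted additive project score; every criterion score is drawn from [1, 9]
    # so that the criteria weights determined earlier are not subverted.
    return sum(weights[c] * criterion_score(x, *environment[c][:2], environment[c][2])
               for c, x in performance.items())

# Illustrative environment estimates (mean, std dev, higher-is-better flag) and weights.
environment = {"Total Income": (75_000, 25_000, True),
               "Total Cost": (40_000, 15_000, False),
               "Probability of Success": (0.5, 0.2, True)}
weights = {"Total Income": 0.5, "Total Cost": 0.3, "Probability of Success": 0.2}
proposal = {"Total Income": 96_000, "Total Cost": 31_000, "Probability of Success": 0.7}
print(project_score(proposal, environment, weights))   # roughly 6.7 for this proposal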

The nature of the performance measure selected for each criterion is important for determining how the scoring function will be specified. For continuous scales such as dollars and probability, the scoring intervals will simply encompass a part of the total scale on either side of the estimated distribution mean.

5 Since these distributions result from an analysis of historical and/or subjective interview data, their parameters should be regarded and referred to as "estimates."

Fig. 5. Possible performance score assignments. Criterion: criticality of need. Pattern 1 assigns outcomes a, b, c, and d scores of 1, 3, 7, and 9, respectively; Pattern 2 assigns them scores of 1, 2, 3, and 9.

An integer-valued score can be assigned to a performance value depending upon the interval in which the value falls. The procedure is the same for many discrete measurement scales such as "number of research personnel required" or "shipments per month." There is a problem, however, when the number of relevant points or measurements on a scale is less than the number of intervals. Consider the criterion "criticality of need for the new product" that has four relevant measurement points (in the decision maker's opinion) which, in order of increasing importance, are as follows.

1) The product is a novel item for which a limited demand exists.

2) The product will enhance the sales of a profitable secondary product line.

3) The product is a new market entry for which significant demand exists.

4) The product is needed to meet competition and insure the survival of a major product line.

Fig. 5 indicates two of many possible ways in which scores can be assigned to performance values. That pattern providing the best level of model accuracy can be determined during the initial model verification process.
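
A hedged sketch of how the two patterns of Fig. 5 might be represented during verification; the outcome labels and score values follow Fig. 5, while everything else is illustrative.

# Two candidate scoring functions for the four-outcome "criticality of need" criterion;
# outcomes a through d are listed in order of increasing importance, as in the text.
pattern_1 = {"a": 1, "b": 3, "c": 7, "d": 9}   # spreads the outcomes across the 1-9 range
pattern_2 = {"a": 1, "b": 2, "c": 3, "d": 9}   # bunches the three lesser outcomes low

def criticality_score(outcome, pattern):
    return pattern[outcome]

# During initial verification each pattern would be run through the simulation, and the
# one whose project rankings correlate best with the benchmark would be retained.
for name, pattern in (("pattern 1", pattern_1), ("pattern 2", pattern_2)):
    print(name, [criticality_score(o, pattern) for o in "abcd"])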

Benchmark Determination

Scoring model accuracy was defined earlier as the degree to which, for a given set of projects, the financial scoring model's project scores are in good agreement with other prescriptive financial evaluation models. The environmental scoring models used in previous analyses [8], [9] exhibited a very high degree of intermodel consistency with selected "economic" rules and mathematical programming models.


The initial model verification process requires the systematic variation of structural parameters of the initial model defined in the step above to produce an acceptable level of scoring model accuracy. To do this, the decision maker must first specify a basis of comparison—one or more benchmark models. This enables the model builder to ensure that the initial scoring model specification is consistent with management preferences for financial results.

Selecting the proper benchmark model(s) is often simplified by the existence of a decision rule or evaluational model currently favored by management. A recent study [8] concluded that many research managers determine the financial desirability of a project by using simple economic analysis and then consider the relevant qualitative factors, but in a way less analytical than the scoring model permits. Several benchmarks can be used where the decision maker wishes to combine several complementary or even contradictory decision rules, e.g., rate of return, payback period, cash flow, etc. The drawback to using more than a single benchmark is that the level of overall consistency with several models will probably not be as high as that possible if only one benchmark were used. In any event, experimentation during verification is a simple process. The final decision can be made in terms of the resulting consistency levels and the factors that are included.

Initial Verification

This step consists of repeated testing of the initial scoring model specification with simulation and correlation methods for the purpose of developing a financial scoring model that is both consistent with other financial evaluation methods and capable of identifying outstanding project performance in a specified research environment. The computational procedures of verification are organized into the following sequence of operations.

1) The scoring and benchmark models must first be programmed for the computer so that estimates of project performance can be input and the associated scores, indices, rates of return, etc., are output.

2) A bank of project data—performance estimates drawn from the environment under study—must be created. This can be done by Monte Carlo "sampling" from the project performance distributions to produce hypothetical projects. This is particularly useful when there is not a great amount of project information on hand. In laboratories where hundreds of engineering development projects are completed every few months, historical or current project data can be used. To provide a stable basis for model verification, it is suggested that at least one hundred projects be used.

3) Project data is then given to the scoring and benchmark models in sets containing at least fifteen projects. Each model's score (or ranking thereof) for every project in a set is used in a correlation analysis to determine the degree to which the scoring model gives recommendations similar to the benchmark(s).6

4) The measure of intermodel consistency should be averaged, for a given scoring model structure, over several sets to avoid the effects of special grouping of projects within sets. Projects should be assigned to sets in a random fashion.

5) If the resulting consistency levels are not satisfactory to the decision maker, structural alterations in the scoring model can be made and the verification repeated.

Since large-scale computer testing is expensive, it is in the decision maker's best interest that the model builder obtain a satisfactory scoring model as quickly as possible. In describing how a verification analysis can be organized, the authors are unable to quote rigid procedures since these would necessarily differ from application to application. The following paragraphs discuss, in a general way, the stages in which an analysis might be conducted. The initial specification is assumed to be a nine-interval model with a weighted additive project score.7

6 A set of 15-25 projects provides an adequate basis for correlation studies. Frequently a model is asked only to rank projects within a set. In this case, rank correlation or measure of concordance statistics can be used to determine intermodel consistency [14].

7 For a detailed description of a simulation procedure for model verification, see [8], [9].

First stage: An interval width analysis using three or four alternatives to the initial width, e.g., one-half standard deviation, will establish an upper bound on consistency for a nine-interval, equal-interval-width model. For these trials the widths of all criterion-scoring intervals should be varied simultaneously to obtain maximum effect upon the resulting correlations. As interval widths are widened, the portion of the criterion measurement space that is effectively partitioned increases, thereby increasing the effective measurement range of the model. Such a range increase acts to produce higher levels of correlation. However, interval widening also decreases the ability of the model to discriminate between levels of performance within any given range. This acts to produce lower levels of consistency. Since a consistency trade-off can be identified as the model's discriminatory power and effective measurement range change with variation in interval widths, a "correct" interval width is defined as the width that results in the highest level of intermodel consistency. As an example, tests with several initial scoring model formulations in unimodal environments produced rank correlations above 0.89. If the decision maker is satisfied with the results at this point, the analysis should proceed to stage four. Otherwise, stage two should be undertaken.
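
The verification sequence and the first-stage width analysis can be sketched as follows, reusing the criterion_score, spearman, environment, and weights helpers from the earlier sketches. The benchmark here is an invented economic index (success probability times income less cost), and the normal sampling, set size, and widths tried are all illustrative assumptions.

import random
random.seed(1)

def sample_project():
    # Monte Carlo "sampling" of a hypothetical project from the estimated environment;
    # normal distributions are an assumption, not a requirement of the method.
    return {"Total Income": random.gauss(75_000, 25_000),
            "Total Cost": random.gauss(40_000, 15_000),
            "Probability of Success": min(max(random.gauss(0.5, 0.2), 0.0), 1.0)}

def benchmark_index(p):
    # Invented economic benchmark: expected income less cost, weighted by success probability.
    return p["Probability of Success"] * (p["Total Income"] - p["Total Cost"])

projects = [sample_project() for _ in range(100)]         # at least one hundred projects
sets = [projects[i:i + 20] for i in range(0, 100, 20)]    # five sets of twenty projects

# First-stage analysis: average rank correlation with the benchmark for each interval width.
for width in (0.25, 0.5, 0.75, 1.0):
    correlations = []
    for project_set in sets:
        scores = [sum(weights[c] * criterion_score(p[c], *environment[c][:2],
                                                   environment[c][2], width=width)
                      for c in weights) for p in project_set]
        correlations.append(spearman(scores, [benchmark_index(p) for p in project_set]))
    print(width, round(sum(correlations) / len(correlations), 3))

The width giving the highest average correlation would be taken as the "correct" width for the equal-interval model.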


Second stage: If the "correct" interval width does not result in an acceptable consistency level, there are several analyses that can be undertaken. However, determining the reason for a low level of consistency involves much trial and error. Thus, each new run should make significant alterations in the scoring model structure, since subtle changes appear to have little or no effect upon consistency. In situations where consistency is not as high as desired, this insensitivity is no longer a blessing.

First consider all economic criteria with discrete measurement scales for which fewer than nine relevant performance measurement points exist. Fig. 5 illustrates that several scoring functions are possible for these criteria. A few simulation trials using different scoring functions should be sufficient to identify that function most consistent with the benchmarks. If acceptable to the decision maker, the "more consistent function" should be used.

It is still possible that an acceptable degree of consistency will not have been reached since one of the more difficult aspects of scoring model development is in recording time preferences for cash flows. Two or three trials with different discounting rates used in the benchmarks will determine if the scoring model is "discounting" cash flows at the desired rate. If time preference criteria are found to contribute to inconsistency, scoring function realignment for discrete measurement scales (as described) or variation of criteria weights should be considered. Due to the great opportunity for endless trial-and-error testing of different criteria weights, the model builder should attempt only a few simulation runs before proceeding to stage three.

Third stage: A low degree of consistency could result from a lack of scoring model power to discriminate between levels of performance, thereby misclassifying projects. This type of behavior was noticed in populations with large distribution variances. In this case, widening intervals to encompass a larger portion of the criterion measurement space reduces discriminatory power near the performance distribution mean, where most performance values actually occur. Therefore, if consistency cannot be increased in stage two, the requirement of equal interval widths should be eliminated. Intervals near the distribution mean—where the greatest opportunity for ranking error exists—should be narrower than those closer to the distribution tails. Skewed or multimodal distributions can be scored more accurately when unequal widths are used, since discriminatory power can be placed where it is needed most. The shape of each distribution should dictate how interval widths are to be determined, since there are too many possible configurations to permit blind trial-and-error procedures. Any inconsistencies present upon entering this stage should be resolved once the intervals are realigned. The model designer is warned against spending too much effort in stage-two analysis before studying distribution shapes and discarding the equal-interval-width structure.

Fourth stage: Once an acceptable financial scoring model has been determined, the model builder should consider the degree to which the estimated project performance distributions could differ from the true population distributions that describe the underlying research environment. In particular, how accurate are the estimated distribution means? If the largest "possible" error in these estimates is felt to be, say, 15 percent, then an analysis using distribution means in error by ±15 percent to partition the criterion measurement spaces will determine the sensitivity of consistency to estimation error. The scoring models used in previous studies were remarkably insensitive to mean estimation errors under ±25 percent. Sensitivity, when present, can often be reduced by adding scoring intervals or by widening existing ones. In certain extreme cases where errors may be on the order of 50 percent, it may be a wiser strategy to build a stable model with intermodel correlations of 0.75 than a sensitive model that, without error in estimation, is 0.95 consistent with benchmarks. It is possible, though untested, that the use of unequal interval widths contributes to model insensitivity to estimation error.
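
Continuing the same illustrative helpers, a fourth-stage sensitivity check might deliberately bias the estimated means and recompute the consistency measure; if the correlation holds up under the assumed error, the model can be considered insensitive to estimation error of that size.

def scores_with_biased_means(project_set, mean_factor):
    # Re-partition every criterion around a deliberately erroneous mean estimate.
    biased = {c: (mu * mean_factor, sd, hib) for c, (mu, sd, hib) in environment.items()}
    return [sum(weights[c] * criterion_score(p[c], *biased[c][:2], biased[c][2])
                for c in weights) for p in project_set]

for factor in (0.85, 1.00, 1.15):     # means in error by -15, 0, and +15 percent
    correlations = [spearman(scores_with_biased_means(s, factor),
                             [benchmark_index(p) for p in s]) for s in sets]
    print(factor, round(sum(correlations) / len(correlations), 3))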

Fifth stage: Going beyond the previous stage may provide data having more informational than model-building value. A reduction in the number of intervals, while supposedly making the model easier to apply, lowers consistency. This trade-off can be analyzed for any specified number of intervals if there is doubt regarding the ability to obtain data sufficiently precise to permit meaningful use of the nine-interval model just developed. The use of an additive project score was assumed from the beginning since this type of function was shown to be more consistent and less sensitive to estimation error than a multiplicative project score [8], [9]. This result can be challenged for the model just developed by computing a multiplicative score within the verification analysis and testing its sensitivity. Such a computation of an alternative score could be carried through the entire development process so that unusual behavior, if present, can be detected and investigated before substantial effort is expended.

The decision maker can use the verification process to learn more about the proposed model. One interesting analysis involves the inclusion of project data that carries considerable opportunity for financial loss. The resulting project scores, when compared with less risky projects, will give the decision maker some estimate of the risk-averting tendencies of the model. In this or any other testing where the decision maker is the benchmark, the opportunity exists for styling the scoring model to the needs of the decision maker charged with decision responsibility.

Complete Specification

The analytical methodology for model design and verification is of less value in determining a complete model specification including noneconomic criteria because of the absence of a subjective benchmark model. The only benchmark capable of combining all evaluation criteria into a decision is the decision maker himself. Obviously, if he were capable of weighing all factors and arriving, within a reasonable time frame, at a decision with which he is satisfied, there would be no need for an evaluation model.


However, the expressed interest of R & D management in an evaluation model that can serve, at a minimum, to screen projects suggests that many decision makers are not completely satisfied with current approaches, including their own best judgment. The model designer should therefore attempt to take advantage of the insights gained through repeated verification analyses to fashion a model that is responsive to both the research environment and the preferences of the decision maker.

A few suggestions to guide the model builder are offered. For noneconomic criteria with continuous measurement scales, the estimated performance distribution should be used as a guide to interval determination. Intervals of an equal width in the 0.35-0.50 standard deviation range should provide good discriminatory power and adequate measurement range for most performance distributions. For highly dispersed or skewed distributions, however, an unequal width configuration is advised. Criteria with discrete scales and few relevant measurement points should be assigned scoring functions derived from knowledge of the decision maker's utility for outcomes wherever possible.

Some limited testing and final verification of the completed model is possible, particularly where historical project records are complete. One type of analysis would be a hindsight approach of remaking previous decisions based on knowledge of project outcomes. For example, the scoring model could be asked to evaluate the ten best projects in corporate history and the ten most dismal failures. Any inconsistencies noted at this point could indicate that too much weight was placed upon an inconsequential criterion or that an important factor had been overlooked. Such testing procedures should insure model sensitivity to the extremes in project performance. There is very little that can be done at present to measure the scoring model's ability to evaluate behavior closer to performance means.
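
A sketch of the hindsight check just described, again using the illustrative project_score, environment, and weights from earlier; the performance records below are invented stand-ins for the best and worst projects in corporate history.

# Hindsight check: the completed model should place known successes above known failures.
past_successes = [{"Total Income": 120_000, "Total Cost": 30_000, "Probability of Success": 0.8},
                  {"Total Income": 105_000, "Total Cost": 35_000, "Probability of Success": 0.7}]
past_failures = [{"Total Income": 45_000, "Total Cost": 60_000, "Probability of Success": 0.3},
                 {"Total Income": 55_000, "Total Cost": 52_000, "Probability of Success": 0.4}]

success_scores = [project_score(p, environment, weights) for p in past_successes]
failure_scores = [project_score(p, environment, weights) for p in past_failures]
print(min(success_scores) > max(failure_scores))   # True here; a False result would prompt
                                                   # a review of the criteria and their weights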

CONCLUSION

There is little evidence to suggest that any one model form is preferable to any other for all the various evaluation and selection decisions that arise within the research-development-engineering-marketing framework. It is, however, widely recognized that, because of data availability and the magnitude of financial risk involved, the more demanding and informative analyses are feasible and should be used at the engineering-marketing phases. Thus, we observe the utilization of economic risk analyses and constrained extremum studies at these phases. Conversely, because of data unavailability and relatively low economic risk, arguments are made that these model forms are not entirely relevant to the research and development phases of the process.

As early as 1964, Baker and Pound [1] recognized the need for constructing a design process whereby scoring models, which apparently can be used at the research and development phases, could be constructed to produce evaluations consistent with the model forms to be applied for engineering and marketing analyses. This paper argues that it is now possible to construct a multiple-criteria scoring model with sufficient accuracy to permit its use at the more applied end of the R & D spectrum. Such a model can be designed and verified for a specific research environment by taking advantage of certain structural relationships that influence model behavior. Thus, an analyst should now be able to design, verify, and implement a scoring model that can be used by his management for the evaluation and selection of research and development projects.

REFERENCES

[1] N. R. Baker and W. H. Pound, "R and D project selection: Where we stand," IEEE Trans. Engineering Management, vol. EM-11, pp. 124-134, December 1964.

[2] W. D. Buel, "A simplification of Hay's method of recording paired comparisons," J. Appl. Psych., vol. 44, pp. 347-348, 1960.

[3] C. W. Churchman, R. L. Ackoff, and E. L. Arnoff, Introduction to Operations Research. New York: Wiley, 1957.

[4] B. V. Dean and M. J. Nishry, "Scoring and profitability models for evaluating and selecting engineering projects," J. Operations Res. Soc. Am., pp. 550-570, July-August 1965.

[5] R. T. Eckenrode, "Weighting multiple criteria," Management Sci., vol. 12, pp. 180-192, November 1965.

[6] P. C. Fishburn, "Independence in utility theory with whole product sets," Operations Res., vol. 13, pp. 28-45, 1965.

[7] J. R. Miller, "The assessment of worth: A systematic procedure and its experimental validation," Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, September 1966.

[8] J. R. Moore, "Research and development project selection: Theoretical and computational analysis of a project scoring model," Ph.D. dissertation, Purdue University, Lafayette, Ind., June 1968.

[9] J. R. Moore and N. R. Baker, "A computational analysis of an R & D project scoring model," Institute for Research in the Behavioral, Economic and Management Sciences, Krannert Graduate School of Industrial Administration, Purdue University, Lafayette, Ind., paper 205, July 1968. Management Sci. (to be published).

[10] C. M. Mottley and R. D. Newton, "The selection of projects for industrial research," Operations Res., vol. 7, pp. 740-751, November-December 1959.

[11] E. A. Pessemier, New Product Decisions: An Analytical Approach. New York: McGraw-Hill, 1966.

[12] A. H. Rubenstein, "Selecting criteria for R & D," Harvard Business Rev., pp. 95-104, January-February 1957.

[13] C. C. Schimpeler, "A decision-theoretic approach to weighting community development criteria and evaluating alternative plans," Ph.D. dissertation, Purdue University, Lafayette, Ind., August 1967.

[14] S. Siegel, Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill, 1956.