
A. Parasuraman, Valarie A. Zeithaml, & Leonard L. Berry

Reassessment of Expectations as a Comparison Standard in Measuring Service Quality: Implications for Further Research

The authors respond to concerns raised by Cronin and Taylor (1992) and Teas (1993) about the SERVQUAL instrument and the perceptions-minus-expectations specification invoked by it to operationalize service quality. After demonstrating that the validity and alleged severity of many of those concerns are questionable, they offer a set of research directions for addressing unresolved issues and adding to the understanding of service quality assessment.

Two recent articles in this journal, Cronin and Taylor (1992) and Teas (1993), have raised concerns about our specification of service quality as the gap between customers' expectations and perceptions (Parasuraman, Zeithaml, and Berry 1985) and about SERVQUAL, a two-part instrument we developed for measuring service quality (Parasuraman, Zeithaml, and Berry 1988) and later refined (Parasuraman, Berry, and Zeithaml 1991). In this article we respond to those concerns by reexamining the underlying arguments and evidence, introducing additional perspectives that we argue are necessary for a balanced assessment of the specification and measurement of service quality, and isolating the concerns that seem appropriate from those that are questionable. Building on the insights emerging from this exchange, we propose an agenda for further research. Because many of the issues raised in the two articles and in our response are still unsettled, we hope that the research agenda will encourage all interested researchers to address those issues and add to the service quality literature in a constructive manner.¹

Though the concerns raised by both C&T and Teas relate to service quality assessment, by and large they focus on different issues. For this reason, we discuss C&T's concerns first, then address the issues raised by Teas. We conclude with a discussion of further research for addressing unresolved issues, including the relationship between service quality and customer satisfaction.

¹Because of the frequent references to Cronin and Taylor in this article, hereafter we refer to them as C&T. For the same reason, we use initials in citing our own work, that is, PZB, PBZ, ZBP, or ZPB. Also, when appropriate, we use the abbreviations CS, SQ, and PI to refer to customer satisfaction, service quality, and purchase intentions, respectively.

A. Parasuraman is Federated Professor of Marketing, Texas A&M University. Valarie A. Zeithaml is a Partner, Schmalensee and Zeithaml. Leonard L. Berry is JCPenney Chair of Retailing Studies and Professor of Marketing, Texas A&M University.

Journal of Marketing, Vol. 58 (January 1994), 111-124


Response to Cronin and Taylor (1992)

C&T surveyed customers in four sectors (banking, pest control, dry cleaning, and fast food) using a questionnaire that contained (1) the battery of expectations and perceptions questions in the SERVQUAL instrument (PBZ 1991; PZB 1988), (2) a separate battery of questions to measure the importance of the SERVQUAL items, and (3) single-item scales to measure overall service quality, customer satisfaction, and purchase intentions. On the basis of their study, C&T conclude that it is unnecessary to measure customer expectations in service quality research. Measuring perceptions is sufficient, they contend. They also conclude that service quality fails to affect purchase intentions. We do not believe that these conclusions are warranted at this stage of research on service quality, as we attempt to demonstrate in this section. Our comments on C&T's study focus on (1) conceptual issues, (2) methodological and analytical issues, and (3) practical issues.

Conceptual Issues

The perceptions-expectations gap conceptualization of SQ. C&T contend (p. 56) that "little if any theoretical or empirical evidence supports the relevance of the expectations-performance gap as the basis for measuring service quality." They imply here and elsewhere that PZB's extensive focus group research (PZB 1985; ZPB 1990) suggests only the attributes of SQ and not its expectations-performance gap formulation. We wish to emphasize that our research provides strong support for defining SQ as the discrepancy between customers' expectations and perceptions, a point we make in PZB (1985) and ZPB (1990). C&T's assertion also seems to discount prior conceptual work in the SQ literature (Gronroos 1982; Lehtinen and Lehtinen 1982; Sasser, Olsen, and Wyckoff 1978) as well as more recent research (Bolton and Drew 1991a, b; ZBP 1991) that supports the disconfirmation of expectations conceptualization of SQ. Bolton and Drew (1991b, p. 383), in an empirical study cited by C&T, conclude:

Consistent with prior exploratory research concerning service quality, a key determinant of overall service quality is the gap between performance and expectations (i.e., disconfirmation). For residential customers, perceived telephone service quality depended on the disconfirmation triggered by perceived changes in existing service or changes in service providers. . . . It is interesting to note that disconfirmation explains a larger proportion of the variance in service quality than performance.

Therefore, C&T's use of the same Bolton and Drew article as support for their claim (p. 56) that "the marketing literature appears to offer considerable support for the superiority of simple performance-based measures of service quality" is surprising and questionable. Moreover, a second citation that C&T offer to support this contention, Mazis, Ahtola, and Klippel (1975), is an article that neither dealt with service quality nor tested performance-based measures against measures incorporating expectations.

There is strong theoretical support for the general notion that customer assessments of stimuli invariably occur relative to some norm. As far back as Helson (1964), adaptation-level theory held that individuals perceive stimuli only in relation to an adapted standard. More recently, Kahneman and Miller (1986) provide additional theoretical and empirical support for norms as standards against which performance is compared and can be measured.

Attitude formation versus attitude measurement. The empirical, longitudinal studies (e.g., Bolton and Drew 1991a, b; Churchill and Surprenant 1982; Oliver 1980) that C&T cite to support their arguments focus on the roles of expectations, performance, and disconfirmation in the formation of attitudes. However, the relevance of those arguments for raising concerns about SERVQUAL is moot because the instrument is designed merely to measure perceived SQ (an attitude level) at a given point in time, regardless of the process by which it was formed. SERVQUAL is a tool to obtain a reading of the attitude level, not a statement about how the level was developed.

Relationship between CS and SQ. An important issue raised by C&T (as well as Teas) is the nature of the link between CS and SQ. This is a complex issue characterized by confusion about the distinction between the two constructs as well as the causal direction of their relationship. SQ researchers (e.g., Carman 1990; PZB 1988) in the past have distinguished between the two according to the level at which they are measured: CS is a transaction-specific assessment (consistent with the CS/D literature) whereas SQ is a global assessment (consistent with the SQ literature). On the basis of this distinction, SQ researchers have posited that an accumulation of transaction-specific assessments leads to a global assessment (i.e., the direction of causality is from CS to SQ). However, on careful reflection, we now believe that this distinction may need to be revised, particularly because some recent studies (Reidenbach and Sandifer-Smallwood 1990; Woodside, Frey, and Daly 1989) have modeled SQ as an antecedent of CS.

In the research implications section, we propose and discuss an integrative framework that attempts to reconcile the differing viewpoints about the distinction, and the direction of causality, between CS and SQ. Here we wish to point out an apparent misinterpretation by C&T of our published work (PZB 1985, 1988). Specifically, in PZB (1985) we do not discuss the relationship between SQ and CS, and in PZB (1988) we argue in favor of the "CS leads to SQ" hypothesis. Therefore, we are surprised by C&T's attributing to us the opposite view by stating (p. 56) "PZB (1985, 1988) proposed that higher levels of perceived SQ result in increased CS" and by referring to (p. 62) "the effects hypothesized by PZB (1985, 1988) [that] SQ is an antecedent of CS." Moreover, as we discuss in the next section, C&T's empirical findings do not seem to warrant their conclusion (p. 64) that "perceived service quality in fact leads to satisfaction as proposed by PZB (1985, 1988)."

Types of comparison norms in assessing CS and SQ. In critiquing the perceptions-minus-expectations conceptualization of SERVQUAL, C&T raise the issue of appropriate comparison standards against which perceptions are to be compared (p. 56):

[PZB] state that in measuring perceived service quality the level of comparison is what a consumer should expect, whereas in measures of satisfaction the appropriate comparison is what a consumer would expect. However, such a differentiation appears to be inconsistent with Woodruff, Cadotte, and Jenkins' (1983) suggestion that expectations should be based on experience norms, that is, what consumers should expect from a given service provider given their experience with that specific type of service organization.

Though the first sentence in this quote is consistent with the distinction we have made between comparison norms in assessing SQ and CS, the next sentence seems to imply that the debate over norms has been resolved, the consensus being that the "experience-based norms" of Woodruff, Cadotte, and Jenkins (1983) are the appropriate frame of reference in CS assessment. We believe that such an inference is unwarranted because conceptualizations more recent than Woodruff, Cadotte, and Jenkins (1983) suggest that CS assessment could involve more than one comparison norm (Forbes, Tse, and Taylor 1986; Oliver 1985; Tse and Wilton 1988; Wilton and Nicosia 1986). Cadotte, Woodruff, and Jenkins (1987) themselves have stated (p. 313) that "additional work is needed to refine and expand the conceptualization of norms as standards." Our own recent research on customers' service expectations (ZBP 1991, a publication that C&T cite) identifies two different comparison norms for SQ assessment: desired service (the level of service a customer believes can and should be delivered) and adequate service (the level of service the customer considers acceptable).

Our point is that the issue of comparison norms and their interpretation has not yet been resolved fully. Though recent conceptual work (e.g., Woodruff et al. 1991) and empirical work (e.g., Boulding et al. 1993) continue to add to our understanding of comparison norms and how they influence customers' assessments, more research is needed.

Methodological and Analytical Issues

C&T's empirical study does not justify their claim (p. 64) that "marketing's current conceptualization and measurement of service quality are based on a flawed paradigm" and that a performance-based measure is superior to the SERVQUAL measure. A number of serious questions about their methodology and their interpretation of the findings challenge the strong inferences they make.

Dimensionality of service quality. One piece of evidence C&T use to argue against the five-component structure of SERVQUAL is that their LISREL-based confirmatory analysis does not support the model shown in their Figure 1. Unfortunately, C&T's Figure 1 is not a totally accurate depiction of our prior findings pertaining to SERVQUAL (PZB 1988) because it does not allow for possible intercorrelations among the five latent constructs. Though we have said in our previous work that SERVQUAL consists of five distinct dimensions, we also have pointed out that the factors representing those dimensions are intercorrelated and hence overlap to some degree. We report average interfactor correlations of .23 to .35 in PZB (1988). In a more recent study in which we refined SERVQUAL, reassessed its dimensionality, and compared our findings with four other replication studies, we acknowledge and discuss potential overlap among the dimensions and arrive at the following conclusion (PZB 1991, p. 442):

Though the SERVQUAL dimensions represent five conceptually distinct facets of service quality, they are also interrelated, as evidenced by the need for oblique rotations of factor solutions in the various studies to obtain the most interpretable factor patterns. One fruitful area for future research is to explore the nature and causes of these interrelationships.

Therefore, we would argue that the fit of C&T's SERVQUAL data to their model in Figure 1 might have been better than that implied by their Table 1 results if the model had allowed the five latent constructs to intercorrelate.
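To make this point concrete, the following sketch shows one way to specify such a confirmatory model with the five latent constructs free to intercorrelate. The sketch is our illustration rather than a reproduction of C&T's analysis; it assumes Python with the semopy package, hypothetical item columns v1 through v22, and a hypothetical data file servqual_scores.csv.

```python
import pandas as pd
import semopy  # assumption: the semopy SEM package is available

# Hypothetical item columns v1..v22 holding the 22 SERVQUAL difference scores
dims = {
    "Tangibles": range(1, 5),
    "Reliability": range(5, 10),
    "Responsiveness": range(10, 14),
    "Assurance": range(14, 18),
    "Empathy": range(18, 23),
}
# Measurement model: each item loads on its a priori dimension
measurement = "\n".join(
    f"{dim} =~ " + " + ".join(f"v{i}" for i in items)
    for dim, items in dims.items()
)
# Free covariances among the five latent constructs instead of forcing
# them to be orthogonal
names = list(dims)
covariances = "\n".join(
    f"{a} ~~ {b}" for k, a in enumerate(names) for b in names[k + 1:]
)

data = pd.read_csv("servqual_scores.csv")  # hypothetical data file
model = semopy.Model(measurement + "\n" + covariances)
model.fit(data)
print(model.inspect())  # loadings plus the interfactor covariances
```

The only change from a strictly orthogonal five-factor model is the block of free interfactor covariances, which is precisely the allowance missing from C&T's Figure 1.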

C&T use results from their oblique factor analyses to reiterate their inference that the 22 SERVQUAL items are unidimensional. They do so by merely referring to their Table 2 and indicating (p. 61) that "all of the items loaded predictably on a single factor with the exception of item 19." But a careful examination of C&T's Table 2 raises questions about the soundness of their inference.

First, except for the factor representing SERVQUAL's perceptions component (i.e., SERVPERF) in the pest control context, the percentage of variance in the 22 items captured by the one factor for which results are reported is less than 50%, the cut-off value recommended by Bagozzi and Yi (1988) for indicators of latent constructs to be considered adequate. Moreover, across the four contexts, the variance captured for the SERVQUAL items is much lower than for the SERVPERF items (mean of 32.4% for SERVQUAL versus 42.6% for SERVPERF). These results strongly suggest that a unidimensional factor is not sufficient to represent fully the information generated by the 22 items, especially in the case of SERVQUAL, for which over two-thirds of the variance in the items is unexplained if just one factor is used. The difference in the variance explained for SERVQUAL and SERVPERF also suggests that the former could be a richer construct that more accurately represents the multifaceted nature of SQ posited in the conceptual literature on the subject (Gronroos 1982; Lehtinen and Lehtinen 1982; PZB 1985; Sasser, Olsen, and Wyckoff 1978).

Second, though C&T say that they conducted oblique factor analysis, they report loadings for just one factor in each setting (Table 2). It is not clear from their article whether the reported loadings are from the rotated or unrotated factor-loading matrix. If they are prerotation loadings, interpreting them to infer unidimensionality is questionable. If they are indeed postrotation loadings, the variance explained uniquely by each factor might be even lower than the percentages shown in Table 2. Because oblique rotation allows factors to correlate with one another, part of the variance reported as being explained by one factor might be shared with the other factors included in the rotation. The possibility that the unique variance represented by the single factors might be lower than the already low percentages in Table 2 further weakens C&T's claim that the SQ construct is unidimensional.
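A small simulation illustrates why a single factor can leave most of the item variance unexplained even when the underlying dimensions are intercorrelated. The sketch below is our illustration (it assumes Python with the factor_analyzer package); the interfactor correlation of .3 and the simulated loadings are arbitrary choices that merely echo the .23 to .35 range noted previously.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumption: factor_analyzer package

rng = np.random.default_rng(1)
n = 500
# Five latent dimensions intercorrelated at .3
phi = np.full((5, 5), .3)
np.fill_diagonal(phi, 1.0)
factors = rng.multivariate_normal(np.zeros(5), phi, size=n)
# 22 items: 4, 5, 4, 4, and 5 per dimension, plus measurement noise
sizes = [4, 5, 4, 4, 5]
items = np.hstack([
    factors[:, [k]] + .8 * rng.standard_normal((n, m))
    for k, m in enumerate(sizes)
])

for n_factors in (1, 5):
    fa = FactorAnalyzer(n_factors=n_factors,
                        rotation="oblimin" if n_factors > 1 else None)
    fa.fit(items)
    total = fa.get_factor_variance()[2][-1]  # cumulative proportion of variance
    print(f"{n_factors} factor(s): proportion of variance explained = {total:.2f}")
# One factor leaves well over half the item variance unexplained even though
# the five dimensions are substantially intercorrelated.
```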

Finally, C&T's inference that SERVQUAL and SERVPERF can be treated as unidimensional on the basis of the high alpha values reported in Table 2 is erroneous because it is at odds with the extensive discussion in the literature about what unidimensionality means and what coefficient alpha does and does not represent (Anderson and Gerbing 1982; Gerbing and Anderson 1988; Green, Lissitz, and Mulaik 1977; Hattie 1985; Howell 1987). As Howell (1987, p. 121) points out, "Coefficient alpha, as an estimate of reliability, is neither necessary nor sufficient for unidimensionality." And, according to Gerbing and Anderson (1988, p. 190), "Coefficient alpha ... sometimes has been misinterpreted as an index of unidimensionality rather than reliability.... Unidimensionality and reliability are distinct concepts."
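The distinction is easy to demonstrate: a few lines of simulation (ours, assuming Python with numpy) show that items generated from two distinct, though correlated, factors still yield a high coefficient alpha.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Two distinct but correlated latent factors (say, two SQ dimensions)
f = rng.multivariate_normal([0, 0], [[1, .5], [.5, 1]], size=n)
# Five items load on the first factor, five on the second
items = np.hstack([
    f[:, [0]] + .6 * rng.standard_normal((n, 5)),
    f[:, [1]] + .6 * rng.standard_normal((n, 5)),
])

def cronbach_alpha(x):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum()
                          / x.sum(axis=1).var(ddof=1))

print(cronbach_alpha(items))  # roughly .9, despite a clear two-factor structure
```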

In short, every argument that C&T make on the basis of their empirical findings to maintain that the SERVQUAL items form a unidimensional scale is questionable. Therefore, summing or averaging the scores across all items to create a single measure of service quality, as C&T have done in evaluating their structural models (Figure 2), is questionable as well.

Validity. Citing evidence from the correlations in their Table 3, C&T conclude that SERVPERF has better validity than SERVQUAL (p. 61): "We suggest that the proposed performance-based measures provide a more construct-valid explication of service quality because of their content validity ... and the evidence of their discriminant validity." This suggestion is unwarranted because, as we demonstrate subsequently, SERVQUAL performs just as well as SERVPERF on each validity criterion that C&T use.


C&T infer convergent validity for SERVPERF by stating (p. 61), "A high correlation between the items SERVPERF, importance-weighted SERVPERF, and service quality indicates some degree of convergent validity." However, they fail to examine SERVQUAL in similar fashion. According to Table 3, the average pairwise correlation among SERVPERF, importance-weighted SERVPERF, and overall service quality is .6892 (average of .9093, .6012, and .5572). The corresponding average correlation for SERVQUAL is .6870 (average of .9787, .5430, and .5394). Clearly, the virtually identical average correlations for SERVPERF and SERVQUAL do not warrant the conclusion that the former has higher convergent validity than the latter.

C&T claim discriminant validity for SERVPERF by stating (p. 61), "An examination of the correlation matrix in Table 3 indicates discriminant validity ... as the three service quality scales all correlate more highly with each other than they do with other research variables (i.e., satisfaction and purchase intentions)." Though this statement is true, it is equally true for SERVQUAL, a finding that C&T fail to acknowledge. In fact, though the average pairwise correlation of SERVPERF with satisfaction and purchase intentions is .4812 (average of .5978 and .3647), the corresponding value for SERVQUAL is only .4569 (average of .5605 and .3534). Because the average within-construct intercorrelations are almost identical for the two scales, as demonstrated previously, these results actually imply somewhat stronger discriminant validity for SERVQUAL.
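The averages cited in the two preceding paragraphs can be verified directly from the Table 3 correlations quoted in the text:

```python
import numpy as np

# Within-construct correlations quoted above from C&T's Table 3
print(np.mean([.9093, .6012, .5572]))  # .6892 for SERVPERF
print(np.mean([.9787, .5430, .5394]))  # .6870 for SERVQUAL

# Correlations with satisfaction and purchase intentions
print(np.mean([.5978, .3647]))  # about .481 for SERVPERF
print(np.mean([.5605, .3534]))  # about .457 for SERVQUAL
```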

Regression analyses. C&T assess the predictive ability of the various multiple-item service quality scales by conducting regression analyses wherein their single-item overall SQ measure is the dependent variable. On the basis of the R² values reported in Table 4, they correctly conclude that SERVPERF outperforms the other three scales. Nevertheless, one should note that the average improvement in variance explained across the four contexts by using SERVPERF instead of SERVQUAL is 6%. (The mean R² values, rounded to two decimal places, are .39 for SERVQUAL and .45 for SERVPERF.) Whether this difference is large enough to claim superiority for SERVPERF is arguable. The dependent variable itself is a performance-based (rather than disconfirmation-based) measure and, as such, is more similar to the SERVPERF than the SERVQUAL formulation. Therefore, at least some of the improvement in explanatory power achieved by using SERVPERF instead of SERVQUAL could be merely an artifact of the "shared method variance" between the dependent and independent variables. As Bagozzi and Yi (1991, p. 426) point out, "When the same method is used to measure different constructs, shared method variance always inflates the observed between-measure correlation."

The overall pattern of significant regression coefficients in C&T's Table 4 offers some insight about the dimensionality of the 22 items as well. Specifically, it can be partitioned into the following five horizontal segments corresponding to the a priori classification of the items into the SERVQUAL dimensions (PZB 1988):

Segment 1: Tangibles (Variables V1-V4)
Segment 2: Reliability (Variables V5-V9)
Segment 3: Responsiveness (Variables V10-V13)
Segment 4: Assurance (Variables V14-V17)
Segment 5: Empathy (Variables V18-V22)

The total numbers of significant regression coefficients in the various segments of Table 4 are as follows:

Segment 1 (Tangibles): 6
Segment 2 (Reliability): 32
Segment 3 (Responsiveness): 9
Segment 4 (Assurance): 14
Segment 5 (Empathy): 11

As this summary implies, the items under the five dimensions do not all contribute in like fashion to explaining the variance in overall service quality. The reliability items are the most critical drivers, and the tangibles items are the least critical drivers. Interestingly, the relative importance of the five dimensions implied by this summary (that is, reliability being most important, tangibles being least important, and the other three being of intermediate importance) almost exactly mirrors our findings obtained through both direct (i.e., customer-expressed) and indirect (i.e., regression-based) measures of importance (PBZ 1991; PZB 1988; ZPB 1990). The consistency of these findings across studies, researchers, and contexts offers strong, albeit indirect, support for the multidimensional nature of service quality. The findings also suggest that techniques other than, or in addition to, factor analyses can and should be used to understand accurately the dimensionality of a complex construct such as SQ. As we have argued in another article comparing several SERVQUAL-replication studies that yielded differing factor patterns (PBZ 1991, pp. 442-43):

Respondents' rating a specific company similarly on SERVQUAL items pertaining to different dimensions (rather than the respondents' inability to distinguish conceptually between the dimensions in general) is a plausible explanation for the diffused factor patterns. To illustrate, consider the following SERVQUAL items:

Employees at XYZ give you prompt service (a responsiveness item)
Employees of XYZ have the knowledge to answer your questions (an assurance item)
XYZ has operating hours convenient to all its customers (an empathy item)

If customers happen to rate XYZ the same or even similarly on these items, it does not necessarily mean that customers consider the items to be part of the same dimension. Yet, because of high intercorrelations among the three sets of ratings, the items are likely to load on the same factor when the ratings are factor analyzed. Therefore, whether an unclear factor pattern obtained through analyzing company-specific ratings necessarily implies poor discriminant validity for the general SERVQUAL dimensions is debatable.

In discussing the rationale for their first proposition, which they later test by using regression analysis, C&T state (p. 59), "The evaluation [of] P1 calls for an assessment of whether the addition of the importance weights suggested by ZPB (1990) improves the ability of the SERVQUAL and SERVPERF scales to measure service quality." This statement is misleading because in ZPB (1990), we recommend the use of importance weights merely to compute a weighted average SERVQUAL score (across the five dimensions) as an indicator of a company's overall SQ gap. Moreover, the importance weights that we use in ZPB (1990) are weights for the dimensions (derived from customer responses to a 100-point allocation question), not for the individual SERVQUAL items. Neither in ZPB (1990) nor in our other work (PBZ 1991; PZB 1988) have we suggested using survey questions to measure the importance of individual items, let alone using weighted individual-item scores in regression analyses as C&T have done. In fact, we would argue that using weighted item scores as independent variables in regression analysis is not meaningful because a primary purpose of regression analysis is to derive the importance weights indirectly (in the form of beta coefficients) by using unweighted or "raw" scores as independent variables. Therefore, using weighted scores as independent variables is a form of "double counting."
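The "double counting" point can be demonstrated directly: pre-multiplying the regressors by importance weights leaves the fit unchanged because the estimated coefficients simply absorb the weights. A minimal sketch (ours, with arbitrary simulated data and hypothetical weights):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))                 # three "raw" item scores
y = X @ np.array([1.0, 2.0, 3.0]) + rng.standard_normal(200)

w = np.array([.5, 1.0, 1.5])                      # hypothetical importance weights
beta_raw, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_weighted, *_ = np.linalg.lstsq(X * w, y, rcond=None)

print(beta_raw)             # regression-derived importance weights
print(beta_weighted * w)    # identical: the betas simply absorb the pre-weighting
```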

Relationships among SQ, CS, and PI. Figure 2 in C&T's article shows the structural model they use to examine the interrelationships among SQ, CS, and PI. Their operationalization and testing of this model suffer from several serious problems. C&T's use of single-item scales to measure SQ, CS, and PI fails to do justice to the richness of these constructs. As already discussed, SQ is a multifaceted construct, even though there is no clear consensus yet on the number of dimensions and their interrelationships. Though a single-item overall SQ measure may be appropriate for examining the convergent validity and predictive power of alternative SQ measures such as SERVQUAL, it is not as appropriate for testing models positing structural relationships between SQ and other constructs such as PI, especially when a multiple-item scale is available. Therefore, in Figure 2, using SERVQUAL or SERVPERF directly to operationalize overall SQ (i.e., η2) would have been far better than using a single-item SQ measure as C&T have done. In other words, because C&T's P2, P3, and P4 relate to just the bottom part of their Figure 2, the procedure they used to operationalize ξ1 should have been used to operationalize η2 instead.

C&T's failure to use multiple-item scales to measure CS and PI, particularly in view of the centrality of these constructs to their discussion and claims, is also questionable. Multiple-item CS scales (e.g., Westbrook and Oliver 1981) are available in our literature and have been used in several customer-satisfaction studies (e.g., Oliver and Swan 1989; Swan and Oliver 1989). Multiple-item scales also have been used to operationalize purchase or behavioral intentions (e.g., Bagozzi 1982; Dodds, Monroe, and Grewal 1991).

Another serious consequence of C&T's use of single-item measures as indicators for their model's four latent constructs is the lack of degrees of freedom for a robust test of their model. The four measures yield only six intermeasure correlations (i.e., 4 × 3/2 = 6) to estimate the model's parameters. Because five parameters are being estimated as per Figure 2, only one degree of freedom is available to test the model, a fact that C&T do not report in their LISREL results summarized in Table 5 or their discussion of the results. Insufficient degrees of freedom to estimate the parameters of a structural model will tend to inflate the fit of the data to the model. For example, a model with zero degrees of freedom (i.e., the number of intermeasure correlations equal to the number of model parameters) will fit the observed data perfectly.
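The degrees-of-freedom arithmetic is worth spelling out. With k single-item measures, only k(k - 1)/2 distinct intermeasure correlations are available to estimate a model's parameters, as the following sketch (ours) shows:

```python
def structural_model_df(n_measures: int, n_parameters: int) -> int:
    # Distinct intermeasure correlations available to fit the model
    n_correlations = n_measures * (n_measures - 1) // 2
    return n_correlations - n_parameters

print(structural_model_df(4, 5))  # 6 - 5 = 1 degree of freedom
```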

Therefore, the meaningfulness of C&T's observation (p. 63) that "Model 1 (SERVQUAL) had a good fit in two of the four industries ... whereas Model 2 (SERVPERF) had an excellent fit in all four industries" should be assessed carefully, keeping in mind two important points: (1) the fit values for both Model 1 and Model 2 could be inflated due to insufficient degrees of freedom and (2) the somewhat better fit values for Model 2 could be an artifact of the "shared method variance" between the SERVPERF and the "Overall Service Quality" measures in the model. Because of the questionable meaningfulness of the structural model tests, C&T's interpreting the test results (p. 63) "as additional support for the superiority of the SERVPERF approach" is debatable as well.

A comparison of the results in C&T's Tables 3 and 5 reveals several serious inconsistencies and interpretational problems that reiterate the inadequacies of their measures and structural model test. For example, the correlation between the single-item measures of SQ and CS is .8175 (Table 3), implying that the measures are highly collinear and exhibit little or no discriminant validity. Yet C&T seem to ignore this problem and rely solely on the Table 5 results (which themselves are questionable because of the problems described previously) to make strong claims pertaining to the direction of causality between the two constructs (i.e., SQ leads to CS and not vice versa) and the relative influence of the two constructs on PI (i.e., CS has a significant effect in all four contexts and SQ has no significant effect in any context).

Regarding the latter claim, it is instructive to note from Table 3 that CS and SQ have virtually identical correlations with PI (.5272 for SQ and .5334 for CS). The most logical inference on the basis of this fact is that SQ and CS have a similar effect on PI, particularly because SQ and CS themselves are highly correlated. Such an inference is much more plausible and justified than C&T's conclusion that only CS has a significant effect on PI. That the findings from C&T's structural-model test led them to this questionable inference is perhaps a manifestation of the problems with their measures (single-item measures, multicollinearity) and their model (questionable causal path, lack of degrees of freedom).

Practical Issues

Arguing in favor of performance-based measures of SQ, C&T state (p. 58), "Practitioners often measure the determinants of overall satisfaction/perceived quality by having customers simply assess the performance of the company's business processes." Though the practice of measuring only perceptions is widespread, such a practice does not necessarily mean performance-based measures are superior to disconfirmation-based measures. As demonstrated in PBZ (1990), SQ measurements that incorporate customer expectations provide richer information than those that focus on perceptions only. Moreover, executives in companies that have switched to a disconfirmation-based measurement approach tell us that the information generated by this approach has greater diagnostic value.

TABLE 1
Mean Perceptions-Only and SERVQUAL Scores*

Service Quality    Tel. Co.      Ins. Co. 1    Ins. Co. 2    Bank 1        Bank 2
Dimensions         P     SQ      P     SQ      P     SQ      P     SQ      P     SQ
Tangibles          5.6    .3     5.3    0      5.6    .6     5.4    .3     5.8    .7
Reliability        5.1  -1.3     4.8  -1.6     5.4  -1.0     5.1  -1.4     5.4  -1.1
Responsiveness     5.1  -1.2     5.1  -1.3     5.6   -.8     4.8  -1.6     5.4   -.9
Assurance          5.4  -1.1     5.4  -1.0     5.8   -.7     5.2  -1.4     5.7   -.8
Empathy            5.2  -1.1     5.1  -1.1     5.6   -.8     4.7  -1.6     5.2  -1.0

*Numbers under "P" and "SQ" represent mean perceptions-only and SERVQUAL scores, respectively.

In a recent study we administered the SERVQUAL scale to independent samples of customers of five nationally known companies (details of the study are in PBZ 1991). Table 1 summarizes perceptions-only and SERVQUAL scores by dimension for each of the five companies. As Table 1 shows, the SERVQUAL scores for each company consistently exhibit greater variation across dimensions than the perceptions-only scores. This pattern of findings suggests that the SERVQUAL scores could be superior in terms of pinpointing areas of deficiency within a company. Moreover, examining only performance ratings can lead to different actions than examining those ratings relative to customer expectations. For example, Ins. Co. 1 might focus more attention on tangibles than on assurance if it relied solely on the perceptions-only scores. This would be a mistake, because the company's SERVQUAL scores show a serious shortfall for assurance and no shortfall for tangibles.
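The pattern is easy to verify from the Table 1 means. The following sketch (our illustration) computes, for each company, the range of scores across the five dimensions:

```python
import numpy as np

# Mean P and SQ scores by dimension from Table 1
# (order: tangibles, reliability, responsiveness, assurance, empathy)
scores = {
    "Tel. Co.":   ([5.6, 5.1, 5.1, 5.4, 5.2], [.3, -1.3, -1.2, -1.1, -1.1]),
    "Ins. Co. 1": ([5.3, 4.8, 5.1, 5.4, 5.1], [0, -1.6, -1.3, -1.0, -1.1]),
    "Ins. Co. 2": ([5.6, 5.4, 5.6, 5.8, 5.6], [.6, -1.0, -.8, -.7, -.8]),
    "Bank 1":     ([5.4, 5.1, 4.8, 5.2, 4.7], [.3, -1.4, -1.6, -1.4, -1.6]),
    "Bank 2":     ([5.8, 5.4, 5.4, 5.7, 5.2], [.7, -1.1, -.9, -.8, -1.0]),
}
for company, (p, sq) in scores.items():
    print(f"{company}: P range = {np.ptp(p):.1f}, SQ range = {np.ptp(sq):.1f}")
# SQ ranges (1.6 to 1.9) consistently exceed P ranges (.4 to .7): the
# expectations-adjusted scores separate the dimensions far more sharply.
```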

An important question to ask in assessing the practical value of SERVQUAL vis-à-vis SERVPERF is: Are managers who use SQ measurements more interested in accurately identifying service shortfalls or explaining variance in an overall measure of perceived service? (Explained variance is the only criterion on which SERVPERF performs better than SERVQUAL, and, as discussed previously, this could be due to shared method variance.) We believe that managers would be more interested in an accurate diagnosis of SQ problems. From a practical standpoint, SERVQUAL is preferable to SERVPERF in our judgment. The superior diagnostic value of SERVQUAL more than offsets the loss in predictive power.

Response to Teas (1993)

The issues raised by Teas (1993) fall under three main topics: (1) interpretation of the expectations standard, (2) operationalization of this standard, and (3) evaluation of alternative models specifying the SQ construct.

Interpretation of Expectations

A key issue addressed by Teas is the impact of the interpretation of the expectations measure (E) on the meaningfulness of the P-E specification invoked by the SERVQUAL framework. Specifically, on the basis of a series of conceptual and mathematical arguments, Teas concludes that increasing P-E scores may not necessarily reflect continuously increasing levels of perceived quality, as the SERVQUAL framework implies. Though his conclusion has merit, several assumptions underlying his arguments must be reexamined to assess accurately the severity of the problem.

The P-E specification is problematic only for certain types of attributes under certain conditions. As Teas's discussion suggests, this specification is meaningful if the service feature being assessed is a vector attribute, that is, one on which a customer's ideal point is at an infinite level. With vector attributes, higher performance is always better (e.g., checkout speed at a grocery store). Thus, as portrayed under Case A in Figure 1, for vector attributes there is a positive monotonic relationship between P and SQ for a given level of the expectation norm E, regardless of how E is interpreted. This relationship is consistent with the SERVQUAL formulation of SQ. Moreover, customers are likely to consider most of the 22 items in the SERVQUAL instrument to be vector attributes, a point we discuss subsequently and one that Teas acknowledges in footnote 16 of his article.

The P-E specification could be problematic when a service attribute is a classic ideal point attribute, that is, one on which a customer's ideal point is at a finite level and, therefore, performance beyond which will displease the customer (e.g., friendliness of a salesperson in a retail store). However, the severity of the potential problem depends on how the expectations norm E is interpreted. Teas offers two interpretations of E that are helpful in assessing the meaningfulness of the P-E specification: a "classic attitudinal model ideal point" interpretation and a "feasible ideal point" interpretation. If E is interpreted as the classic ideal point (i.e., E = I) and the performance level P exceeds E, the relationship between P and SQ for a given level of E becomes negative monotonic, in contrast to the positive monotonic relationship implied by the SERVQUAL formulation. As Case B in Figure 1 shows, under the classic ideal point interpretation of E the P-E specification is meaningful as long as P is less than or equal to E, but becomes a problem if P exceeds E. When this condition occurs, the correct specification for SQ is -(P - E).

FIGURE 1
Functional Relationship Between Perceived Performance (Pj) and Service Quality (SQ)

Case A: Attribute j is a vector attribute and, therefore, the general form of the function (a positive monotonic relationship between Pj and SQ) holds regardless of how the expectations standard (Ej) is defined.

Case B: Attribute j is a classic ideal-point attribute and Ej is interpreted as the classic ideal point (i.e., Ej = Ij).

Case C: Attribute j is a classic ideal-point attribute and Ej is interpreted as the feasible ideal point (i.e., Ej < Ij).

To examine the P-E specification's meaningfulness under the feasible ideal point interpretation, Teas proposes a "modified one-attribute SERVQUAL model" (MQi), represented by equation 3 in his article. Dropping the subscript i from this equation for simplicity (doing so does not materially affect the expression) yields the following equation:

(1) MQ = -1[|P - I| - |E - I|]

where

MQ = "modified" quality,
I = the ideal amount of the attribute (i.e., the classic attitudinal model ideal point),
E = the expectation norm interpreted as the feasible ideal point, and
P = the perceived performance level.

Teas defines the feasible ideal point (p. 20) as "a feasible level of performance under ideal circumstances, that is, the best level of performance by the highest quality provider under perfect circumstances." If one examines this definition from a customer's perspective (an appropriate one given that E and I are specified by customers), it is inconceivable that the feasible ideal point would exceed the classic ideal point. Because the feasible ideal point represents the level of service the customer considers possible for the best service company, exceeding that level still should be viewed favorably by the customer. In contrast, the classic ideal point is the service level beyond which the customer would experience disutility. Therefore, the feasible ideal point logically cannot exceed the classic ideal point. In other words, when customers specify the levels of E and I, the range of feasible ideal points will be restricted as follows: E ≤ I. Yet, implied in Teas's discussion of MQ is the assumption that E can exceed I (Teas makes this assumption explicit in footnote 7). Under the more realistic assumption that E ≤ I, the absolute value of the E - I difference (the second component in equation 1's right side) is equivalent to -(E - I). Making this substitution in equation 1 yields

MQ = -1[|P - I|] - (E - I)

Furthermore, when P ≤ I, |P - I| is equivalent to -(P - I), and the equation for MQ simplifies to

MQ = -1[-(P - I)] - (E - I) = (P - I) - (E - I) = P - E
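Collecting these steps, the algebra under the two assumptions just stated (E ≤ I and P ≤ I) can be summarized as follows:

```latex
\begin{aligned}
MQ &= -\bigl(\lvert P - I\rvert - \lvert E - I\rvert\bigr) \\
   &= -\lvert P - I\rvert - (E - I)
      &&\text{since } E \le I \Rightarrow \lvert E - I\rvert = -(E - I)\\
   &= (P - I) - (E - I)
      &&\text{since } P \le I \Rightarrow \lvert P - I\rvert = -(P - I)\\
   &= P - E
\end{aligned}
```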

Therefore, as long as the perceived performance (P) does not exceed the classic ideal point (I), MQ is equivalent to SERVQUAL's P-E specification, regardless of whether P is below or above the feasible ideal point (E). As such, Teas's conclusion (p. 21) that "the SERVQUAL P-E measurement specification is not compatible with the feasible ideal point interpretation of E when finite classic ideal point attributes are involved" is debatable. As Case C in Figure 1 shows, when P ≤ I, the relationship between P and SQ is positive monotonic and identical to the relationship under the vector-attribute assumption (Case A).

The P-E specification does become a problem when P > I under the classic ideal point attribute assumption, with E being interpreted as the feasible ideal point. As shown by Case C in Figure 1, perceived service quality (SQ) peaks when P = I and starts declining thereafter. An important feature of Case C not obvious from Teas's discussion is that the classic ideal point (I) in effect supersedes the feasible ideal point (E) as the comparison standard when P exceeds I. In other words, when P > I, the shape of the SQ function is the same in Cases B and C except that in Case C the decline in SQ starts from a positive base level (of magnitude I - E), whereas in Case B it starts from a base level of zero. Therefore, when P > I under the feasible ideal point interpretation of E, the expression for service quality is

SQ = (I - E) - (P - I)

Figure 2 summarizes in flowchart form how the type of attribute and interpretation of E influence the specification of SQ. The P-E specification is appropriate under three of the five final branches in the flowchart. Therefore, the inappropriateness of the P-E specification depends on the likelihood of the occurrence of the remaining two branches. This is perhaps an empirical question that also has implications for further research on the nature of the service attributes, as we discuss subsequently.


FIGURE 2
Impact of Attribute Type and Expectation Standard on Expression for Perceived Service Quality

[Flowchart: Is attribute j a vector attribute or a classic ideal point attribute? If a vector attribute, SQ = P - E. If a classic ideal point attribute, is E interpreted as the classic ideal point (E = I) or the feasible ideal point (E ≤ I)? Under the classic ideal point interpretation, SQ = P - E when P ≤ E and SQ = -(P - E) when P > E. Under the feasible ideal point interpretation, SQ = P - E when P ≤ I and SQ = (I - E) - (P - I) when P > I.]

Operationalization of Expectations

Teas raises several concerns about the operationalization of E, including a concern we ourselves have raised (PBZ 1990) about the use of the word "should" in our original expectations measure (PZB 1988). Because of this concern, we developed a revised expectations measure (E*) representing the extent to which customers believe a particular attribute is "essential" for an excellent service company. Teas wonders if E* is really an improvement over E, our original measure (pp. 21-22):

Defining the revised SERVQUAL E* in this way, in conjunction with the P-E measurement specification, ... suggests high performance scores on essential attributes (high E* scores) reflect lower quality than high performance on attributes that are less essential (low E* scores). It is difficult to envision a theoretical argument that supports this measurement specification.

Teas's argument is somewhat misleading because high performance on an "essential" attribute may not be high enough (from the customer's standpoint) and, therefore, logically can reflect lower quality on that attribute (a key phrase missing from Teas's argument) than equally high performance on a less essential attribute. In fact, as we have argued in our response to C&T, this possibility is an important reason why measuring only perceived service performance can lead to inaccurate assessment of perceived service quality.

The operationalization issue addressed and discussed most extensively by Teas is the congruence (or lack thereof) between the conceptual and operational definitions of the expectation measures. The systematic, comprehensive, and unique approach Teas has used to explore this issue is commendable. His approach involves obtaining ratings on three different comparison norms (E, E*, and I), conducting follow-up interviews of respondents providing nonextreme ratings (i.e., less than 7 on the 7-point scales used), using multiple coders to classify the open-ended responses, and computing indexes of congruence on the basis of relative frequencies of responses in the coded categories.

Though the results from Teas's content analysis of the open-ended responses (summarized in his Appendix B and Tables 4 and 5) are sound and insightful, his interpretation of them is open to question. Specifically, "under the assumption that the feasibility concept is congruent with the excellence norm concept" (p. 25), he concludes that the index of congruence is only 39.4% for E and 35.0% for E*, though he later raises these indexes to 52.4% and 49.1%, respectively, by including several more response categories. On the basis of these values, Teas further concludes (p. 25) that "a considerable portion of the variance in the SERVQUAL E and E* measures is the result of measurement error induced by respondents misinterpreting the scales." As discussed subsequently, both conclusions are problematic.

The purpose of the follow-up interviews in Teas's study was to explore why respondents had lower than the maximum expectation levels; as Teas acknowledges (p. 24), "respondents answered qualitative follow-up questions concerning their reasons for non-extreme (non-7) responses." The issue of how respondents interpreted the expectations questions was not the focus of the follow-up interviews. Therefore, inferring incongruence between intended and interpreted meanings of expectations (i.e., measurement error) on the basis of the open-ended responses is questionable.

Pertinent to the present discussion is a study that we conducted to understand the nature and determinants of customers' service expectations (ZBP 1993, a revised version of the monograph ZBP 1991). In this study, we developed a conceptual model explicating the general antecedents of expectations and how they are likely to influence expectation levels. In the first column of Table 2, we list five general antecedents from our study that are especially likely to lower expectations. Shown in the second column are illustrative open-ended responses from Teas's study that relate to the general antecedents. The correspondence between the two columns of Table 2 is additional evidence that the items in Appendix B of Teas's article reflect reasons for expectation levels rather than interpretations of the expectation measures.

A key assumption underlying Teas's indexes of congruence is that the only open-ended responses congruent with the excellence norm standard are the ones included under the "not feasible," and perhaps "sufficient," categories. The rationale for this critical assumption is unclear. Therefore, what the index of congruence really represents is unclear as well. Given that the open-ended responses are reasons for customers' nonextreme expectations, no response listed in Appendix B of Teas's article is necessarily incompatible with the "excellence norm," that is, customers' perceptions of attribute levels essential for service excellence.

TABLE 2
Correspondence Between Antecedent Constructs in ZBP's Expectations Model and Follow-up Responses Obtained by Teas

Antecedent Constructs from ZBP (1993), with Illustrative Follow-up Responses from Teas (1993) Reflecting Each Construct:

Personal Needs (states or conditions essential to the physical or psychological well-being of the customer): "Not necessary to know exact need."

Perceived Service Alternatives (customers' perceptions of the degree to which they can obtain better service through providers other than the focal company): "All of them have pretty much good hours."

Self-Perceived Service Role (customers' perceptions of the degree to which they themselves influence the level of service they receive): "I've gone into the store not knowing exactly myself what I want."

Situational Factors (service-performance contingencies that customers perceive are beyond the control of the service provider): "Can't wait on everybody at once."

Past Experience (customers' previous exposure to the service that is relevant to the focal service): "You don't get perfection in these kind of stores."

Interestingly, the coded responses from Teas's study offer rich insight into another important facet of measuring SQ, namely, the meaningfulness of the P-E specification. As already discussed, whether the P-E specification is appropriate critically hinges on whether a service feature is a vector attribute or a classic ideal point attribute. The P-E specification is appropriate if the feature is a vector attribute or it is a classic ideal point attribute and perceived performance is less than or equal to the ideal level (see Figure 2). Customers' reasons for less-than-maximum expectations ratings in Teas's study shed light on whether a service feature is a vector or classic ideal point attribute. Specifically, reasons suggesting that customers dislike or derive negative utility from performance levels exceeding their expressed expectation levels imply classic ideal point attributes.

Of the open-ended responses in Appendix B of Teas's article, only those classified as "Ideal" (including double-classifications involving "Ideal") clearly imply that the service features in question are classic ideal point attributes. None of the other open-ended responses implies that performance levels exceeding the expressed expectation levels will displease customers. Therefore, service features associated with all open-ended responses not classified as "Ideal" are likely to possess vector-attribute properties.

The response-category frequencies in Teas's Tables 4 and 5 can be used to compute a different type of "index of congruence" that reflects the extent to which reasons for nonextreme expectations ratings are compatible with a vector-attribute assumption. This index is given by

100 - Σ (percentages of responses in the "Ideal" categories)

From Table 4, this index is 97.6% for E and 96.6% for E*, implying strong support for the vector-attribute assumption. From Table 5, which shows the distribution of respondents rather than responses, this index is 91.6% for E and 90.8% for E*. These are conservative estimates because Table 5 allows for multiple responses and, therefore, some respondents could be included in more than one category percentage in the summation term of the previous expression, thereby inflating the term. Therefore, for a vast majority of the respondents, the vector-attribute assumption is tenable for all service attributes included in Teas's study.
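Computationally, the index is straightforward. In the sketch below (ours), the "Ideal"-category percentages are inferred from the reported index values rather than taken directly from Teas's tables:

```python
def vector_attribute_index(ideal_category_percentages):
    # 100 minus the summed percentage of responses in the "Ideal" categories
    return 100 - sum(ideal_category_percentages)

# "Ideal"-category totals implied by the indexes reported in the text
print(vector_attribute_index([2.4]))  # 97.6 for E  (Table 4)
print(vector_attribute_index([3.4]))  # 96.6 for E* (Table 4)
print(vector_attribute_index([8.4]))  # 91.6 for E  (Table 5)
print(vector_attribute_index([9.2]))  # 90.8 for E* (Table 5)
```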

Evaluation of Alternative Service Quality Models

Using the empirical results from his study, Teas examines the criterion and construct validity of eight different SQ models: unweighted and weighted versions of the SERVQUAL (P-E), revised SERVQUAL (P-E*), normed quality (NQ), and evaluated performance (EP) formulations. On the basis of correlation coefficients reflecting criterion and construct validity, he correctly concludes that his EP formulation outperforms the other formulations. However, apart from the statistical significance of the correlation differences reported in his Tables 7 and 8, several additional issues are relevant for assessing the superiority of the EP formulation.

In introducing these issues we limit our discussion to the unweighted versions of just the P-E, P-E*, and EP formulations for simplicity and also because (1) for all formulations, the unweighted version outperformed its weighted counterpart on both criterion and construct validity and (2) the NQ formulation, in addition to having the lowest criterion and construct validity among the four formulations, is not one that we have proposed. From Teas's equation 6 and his discussion of it, the unweighted EP formulation for any service is as follows (the service subscript "i" is suppressed for notational simplicity):

    where " m " is the number of service attributes. This expres-sion assumes that the attributes are classic ideal point attrib-utes as evidenced by the fact that the theoretical maximumvalue for EP is zero, which occurs when Pj = L for all attrib-utes j . As discussed in the preceding section, the evidencefrom Tfeas's follow-up responses strongly suggests that theservice attributes included in the study are much morelikely to be vector attributes than classic ideal point attrib-utes. Therefore, the conceptual soundness of the EP formu-lation is open to question.

Teas's findings pertaining to nonextreme ratings on his Ij scale raise additional questions about the soundness of the classic ideal point attribute assumption invoked by the EP formulation. As his Table 4 reveals, only 151 non-7 ratings (13% of the total 1200 ratings) were obtained for the ideal point (I) scale, prompting him to observe (p. 26), "It is interesting to note, however, that the tendency for nonextreme responses was much smaller for the ideal point measure than it was for the expectations measures." Ironically, this observation, though accurate, appears to go against the classic ideal point attribute assumption because the fact that only 13% of the ratings were less than 7 seems incompatible with that assumption. If the service features in question were indeed classic ideal point attributes, one would expect the ideal-point levels to be dominated by nonextreme ratings.

Even in instances in which nonextreme ratings were obtained for the ideal point (I) measure, a majority of the reasons for those ratings (as reflected by the open-ended responses to the follow-up questions) do not imply the presence of finite ideal-point levels for the attributes investigated. From Teas's Table 4, only 40.5% of the reasons are classified under "Ideal" (including double classifications with "Ideal" as one of the two categories). Teas does acknowledge the presence of this problem, though he labels it as a lack of discriminant validity for the ideal-point measure.

Finally, it is worth noting that a semantic-differential scale was used for the "P" and "I" measures in the EP formulation, whereas a 7-point "strongly disagree"-"strongly agree" scale was used for the three measures (P, E, and E*) in the SERVQUAL and revised SERVQUAL formulations. Therefore, the observed criterion and construct validity advantages of the EP formulation could be an artifact of the scale-format differences, especially because the correlation differences, though statistically significant, are modest in magnitude: For the unweighted formulations, the mean criterion-validity correlation difference between EP and the two SERVQUAL formulations is .077 ([.081 + .073]/2); the corresponding mean construct-validity correlation difference is .132 ([.149 + .114]/2). However, because these correlation differences are consistently in favor of EP, the semantic-differential scale format could be worth exploring in the future for measuring the SERVQUAL perceptions and expectations components.

Implications and Directions for Further Research

C&T and Teas have raised important, but essentially different, concerns about the SERVQUAL approach for measuring SQ. C&T's primary concerns are that SERVQUAL's expectations component is unnecessary and the instrument's dimensionality is problematic. In contrast, Teas questions the meaningfulness of SERVQUAL's P-E specification and wonders whether an alternate specification, the "evaluated performance" (EP) specification, is superior. We have attempted to demonstrate that the validity and alleged severity of many of the issues raised are debatable. Nevertheless, several key unresolved issues emerging from

this exchange offer a challenging agenda for further research on measuring SQ.

Measurement of Expectations

Of the psychometric concerns raised by C&T about SERVQUAL, the only one that has consistent support from their empirical findings is the somewhat lower predictive power of the P-E measure relative to the P-only measure. Our own findings (PBZ 1991) as well as those of other researchers (e.g., Babakus and Boller 1992) also support the superiority of the perceptions-only measure from a purely predictive-validity standpoint. However, as argued previously, the superior predictive power of the P-only measure must be balanced against its inferior diagnostic value. Therefore, formally assessing the practical usefulness of measuring expectations and the trade-offs involved in not doing so is a fruitful avenue for additional research. More generally, a need and an opportunity exist for explicitly incorporating practical criteria (e.g., potential diagnostic value) into the traditional scale-assessment paradigm that is dominated by psychometric criteria. Perreault (1992, p. 371), in a recent commentary on the status of research in our discipline, issues a similar call for a broadened perspective in assessing measures:

Current academic thinking tends to define measures as acceptable or not primarily based on properties of the measure. However, research that focuses on defining what is "acceptable" in terms of a measure's likely impact on interpretation of substantive relationships, or the conclusions that might be drawn from those relationships, might provide more practical guidelines, not only for practitioners but also for academics.... There is probably a reasonable "middle ground" between the "all or nothing" contrast that seems to have evolved.

The most appropriate way to incorporate expectations

into SQ measurement is another area for additional research. Though the SERVQUAL formulation as well as the various SQ models evaluated by Teas use a difference-score formulation, psychometric concerns have been raised about the use of difference scores in multivariate analyses (for a recent review of these concerns, see Peter, Churchill, and Brown 1993). Therefore, there is a need for studies comparing the paired-item, difference-score formulation of SERVQUAL with direct formulations that use single items to measure customers' perceptions relative to their expectations.
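The computational difference between the two formulations is easy to see in a sketch; the item ratings below are hypothetical, and the direct single-item wording is only one of several possibilities:

```python
# Minimal sketch contrasting the two measurement formulations
# discussed above. Item ratings are hypothetical; SERVQUAL
# itself pairs 22 expectation items with 22 perception items,
# both on 7-point scales.

# Paired-item, difference-score formulation: each item's score
# is perception minus expectation (P - E).
expectations = [7, 6, 7, 5]   # E ratings for four items
perceptions  = [5, 6, 6, 6]   # P ratings for the same items
difference_scores = [p - e for p, e in zip(perceptions, expectations)]
print(difference_scores)       # -> [-2, 0, -1, 1]

# Direct formulation: a single item asks the respondent to rate
# performance *relative to* expectations (e.g., 1 = "much worse
# than expected" ... 7 = "much better than expected"), so no
# subtraction is needed.
direct_scores = [2, 4, 3, 5]   # hypothetical single-item ratings
print(sum(direct_scores) / len(direct_scores))  # -> 3.5
```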

Brown, Churchill, and Peter (1993) conducted such a comparative study and concluded that a nondifference score measure had better psychometric properties than SERVQUAL. However, in a comment on their study, we have raised questions about their interpretations and demonstrated that the claimed psychometric superiority of the nondifference score measure is questionable (PBZ 1993). Likewise, as our response to C&T's empirical study shows, the psychometric properties of SERVQUAL's difference score formulation are by and large just as strong as those of its P-only component. Because the cumulative empirical evidence about difference score and nondifference score measures has not established conclusively the superiority of one measure over the other, additional research in this area is


warranted. Such research also should explore alternatives to the Likert-scale format ("strongly agree"-"strongly disagree") used most frequently in past SERVQUAL studies. The semantic-differential scale format used by Teas to operationalize his EP formulation is an especially attractive alternative; as we pointed out previously, this scale format is a plausible explanation for the consistently superior performance of the EP formulation.

Dimensionality of Service Quality

As already discussed, C&T's conclusion that the 22 SERVQUAL items form a unidimensional scale is unwarranted. However, replication studies have shown significant correlations among the five dimensions originally derived for SERVQUAL. Therefore, exploring why and how the five service quality dimensions are interrelated (e.g., could some of the dimensions be antecedents of the others?) is a fertile area for additional research. Pursuing such a research avenue would be more appropriate for advancing our understanding of SQ than hastily discarding the multidimensional nature of the construct.

The potential drawback pointed out previously of using factor analysis as the sole approach for assessing the dimensionality of service quality implies a need for developing other approaches. As we have suggested elsewhere (PBZ 1991), an intriguing approach is to give customers definitions of the five dimensions and ask them to sort the SERVQUAL items into the dimensions only on the basis of each item's content. The proportions of customers "correctly" sorting the items into the five dimensions would reflect the degree to which the dimensions are distinct. The pattern of correct and incorrect classifications also could reveal potentially confusing items and the consequent need to reword the items and/or dimensional definitions.
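A minimal sketch of how such sorting data could be tabulated appears below; the items, dimensions, and responses are hypothetical stand-ins for the 22 SERVQUAL items and five dimensions:

```python
# Minimal sketch of tabulating the item-sorting task. The
# item-to-dimension key and the customer sorts below are
# hypothetical placeholders.
from collections import Counter

intended = {"item1": "reliability", "item2": "empathy",
            "item3": "assurance"}   # intended dimension per item

# Each customer's sort: item -> dimension he or she chose.
sorts = [
    {"item1": "reliability", "item2": "empathy", "item3": "empathy"},
    {"item1": "reliability", "item2": "assurance", "item3": "assurance"},
]

for item, dim in intended.items():
    hits = sum(1 for s in sorts if s[item] == dim)
    misplaced = Counter(s[item] for s in sorts if s[item] != dim)
    print(f"{item}: {hits / len(sorts):.0%} correctly sorted into "
          f"{dim}; confusions: {dict(misplaced) or 'none'}")
```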

Relationship Between Service Quality and Customer Satisfaction

The direction of causality between SQ and CS is an important unresolved issue that C&T's article addresses empirically and Teas's article addresses conceptually. Though C&T conclude that SQ leads to CS, and not vice versa, the empirical basis for their conclusion is questionable because of the previously discussed problems with their measures and analysis. Nevertheless, there is a lack of consensus in the literature and among researchers about the causal link between the two constructs. Specifically, the view held by many service quality researchers that CS leads to SQ is at odds with the causal direction implied in models specified by CS researchers. As Teas's discussion suggests, these conflicting perspectives could be due to the global or overall attitude focus in most SQ research in contrast to the transaction-specific focus in most CS research. Adding to the complexity of this issue, practitioners and the popular press often use the terms service quality and customer satisfaction interchangeably. An integrative framework that reflects and reconciles these differing perspectives is sorely needed to divert our attention from merely debating the causal direction between SQ and CS to enhancing our knowledge of how they interrelate.

FIGURE 3
Components of Transaction-Specific Evaluations

[Figure: a customer's evaluations of service quality (SQ), product quality (PQ), and price (P) feed into transaction satisfaction (TSAT).]

Teas offers a thoughtful suggestion that could be the foundation for such a framework (p. 30):

One way to integrate these two causal perspectives is to specify two perceived quality concepts, transaction-specific quality and relationship quality, and to specify perceived transaction-specific quality as the transaction-specific performance component of contemporary consumer satisfaction models. This implies that transaction-specific satisfaction is a function of perceived transaction-specific performance quality. Furthermore,... transaction-specific satisfaction could be argued to be a predictor of perceived long-term relationship quality.

A key notion embedded in Teas's suggestion is that both SQ and CS can be examined meaningfully from both transaction-specific as well as global perspectives, a viewpoint embraced by other researchers as well (e.g., Dabholkar 1993). Building on this notion and incorporating two other potential antecedents of CS, product quality and price, we propose (1) a transaction-specific conceptualization of the constructs' interrelationships and (2) a global framework reflecting an aggregation of customers' evaluations of multiple transactions.

Figure 3 portrays the proposed transaction-specific conceptual model. This model posits a customer's overall satisfaction with a transaction to be a function of his or her assessment of service quality, product quality, and price.² This conceptualization is consistent with the "quality leads to satisfaction" school of thought that satisfaction researchers often espouse (e.g., Reidenbach and Sandifer-Smallwood 1990; Woodside, Frey, and Daly 1989). The two separate quality-evaluation antecedents capture the fact that virtually all market offerings possess a mix of service and product features and fall along a continuum anchored by "tangible dominant" at one end and "intangible dominant" at the other (Shostack 1977). Therefore, in assessing satisfaction with a personal computer purchase or an airline flight, for example, customers are likely to consider service features (e.g., empathy and knowledge of the computer salesperson, courtesy and responsiveness of the flight crew) and product features (e.g., computer memory and speed, seat comfort and meals during the flight) as well as price. Understanding the roles of service quality, product quality, and price evaluations in determining transaction-specific satisfaction is an important area for further research. For example, do customers use a compensatory process in combining the three types of evaluations? Do individual factors (e.g., past experience) and contextual factors (e.g., purchase/consumption occasion) influence the relative importance of the different evaluations and how they are combined?

²Though the term "product" theoretically can refer to a good (i.e., tangible product) or a service (i.e., intangible product), here we use it only in the former sense. Thus, "product quality" in Figure 3 and elsewhere refers to tangible-product or good quality.

FIGURE 4
Components of Global Evaluations

[Figure: customers' global impressions about the firm, comprising their overall satisfaction and their overall perceptions of service quality, product quality, and price.]

In Figure 4, we present our proposed global framework. It depicts customers' global impressions about a firm stemming from an aggregation of transaction experiences. We posit global impressions to be multifaceted, consisting of customers' overall satisfaction with the firm as well as their overall perceptions of the firm's service quality, product quality, and price. This framework, in addition to capturing the notion that the SQ and CS constructs can be examined at both transaction-specific and global levels, is consistent with the "satisfaction (with specific transactions) leads to overall quality perceptions" school of thought embraced by service quality researchers (e.g., Bitner 1990; Bolton and Drew 1991a, b; Carman 1990). The term "transaction" in this framework can be used to represent an entire service episode (e.g., a visit to a fitness center or barber shop) or discrete components of a lengthy interaction between a customer and firm (e.g., the multiple encounters that a hotel guest could have with hotel staff, facilities, and services).

The SERVQUAL instrument, in its present form, is intended to ascertain customers' global perceptions of a firm's service quality. In light of the proposed framework, modifying SERVQUAL to assess transaction-specific service quality is a useful direction for further research. The proposed framework also raises other issues worth exploring. For example, how do customers integrate transaction-specific evaluations in forming overall impressions? Are some transactions weighed more heavily than others because of "primacy" and "recency" type effects or satisfaction/dissatisfaction levels exceeding critical thresholds? Do transaction-specific service quality evaluations have any direct influence on global service quality perceptions, in addition to the posited indirect influence (mediated through transaction-specific satisfaction)? How do the four facets of global impressions relate to one another? For example, are there any halo effects? Research addressing these and related issues is likely to add significantly to our knowledge in this area.
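To illustrate just one of these possibilities, the sketch below operationalizes a recency-weighted aggregation of transaction satisfaction scores; it is an illustrative device of ours, not a model proposed in the article, and the decay parameter and ratings are hypothetical:

```python
# Illustrative sketch (not a model from the article): one simple
# way to operationalize "some transactions are weighed more
# heavily than others" is a recency-weighted average of
# transaction-specific satisfaction (TSAT) scores.

def global_impression(tsat_scores, decay=0.7):
    """Weight each TSAT score by decay**age, newest last."""
    n = len(tsat_scores)
    weights = [decay ** (n - 1 - t) for t in range(n)]  # older -> smaller
    return sum(w * s for w, s in zip(weights, tsat_scores)) / sum(weights)

history = [6, 6, 3, 7]  # hypothetical TSAT ratings, oldest first
print(round(global_impression(history), 2))  # -> 5.57
```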

Specification of Service Quality

A primary contribution of Teas's article is that it highlights the potential problem with the P-E specification of SQ when the features on which a service is evaluated are not vector attributes. Though we believe that the collective evidence from Teas's study strongly supports the vector-attribute assumption, the concern he raises has implications for correctly specifying SQ and for further research on service attributes. Teas's EP specification, in spite of its apparent empirical superiority over the other models he tested, suffers from several deficiencies, including its being based on the questionable assumption that all service features are classic ideal point attributes. A mixed-model specification that assumes some features to be vector attributes and others to be classic ideal point attributes would be conceptually more appropriate (cf. Green and Srinivasan 1978). The flowchart in Figure 2 implies the following mixed-model specification that takes into account service-attribute type as well as interpretation of the comparison standard E:

SQ = \sum_{j=1}^{m} \left[ D_{1j}(P_j - E_j) + D_{2j}(E_j - P_j) + D_{3j}\{(I_j - |I_j - P_j|) - E_j\} \right]

where

D1j = 1 if j is a vector attribute, or if it is a classic ideal point attribute and Pj ≤ Ij; = 0 otherwise.

D2j = 1 if j is a classic ideal point attribute and Ej is interpreted as the classic ideal point (i.e., Ej = Ij) and Pj > Ij; = 0 otherwise.

D3j = 1 if j is a classic ideal point attribute and Ej is interpreted as the feasible ideal point (i.e., Ej < Ij) and Pj > Ij; = 0 otherwise.

Operationalizing this specification has implications for data collection and analysis in SQ research. Specifically, in addition to obtaining data on customers' perceptions (Pj)


and comparison standards (Ij and/or Ej), one should ascertain whether customers view each service feature as a vector or classic ideal point attribute. Including a separate battery of questions is one option for obtaining the additional information, though this option will lengthen the SQ survey. An alternate approach is to conduct a separate study (e.g., focus group interviews in which customers discuss whether "more is always better" on each service feature) to preclassify the features into the two attribute categories.
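To make the mechanics of the mixed-model specification concrete, the following minimal sketch implements our reading of the preceding equation; the attribute classifications and 7-point ratings are hypothetical:

```python
# Minimal sketch of the mixed-model SQ specification as
# reconstructed above. Each attribute carries its perception (P),
# comparison standard (E), classic ideal point (I), and a flag for
# whether customers view it as a vector attribute. Inputs are
# hypothetical 7-point ratings, with E <= I assumed for classic
# ideal point attributes.

def sq_term(P, E, I, is_vector):
    if is_vector or P <= I:           # D1 case: P - E applies
        return P - E
    if E == I:                        # D2 case: E is the classic ideal
        return E - P
    return (I - abs(I - P)) - E       # D3 case: E is the feasible ideal

attributes = [
    # (P, E, I, is_vector)
    (6, 7, 7, True),    # vector attribute: "more is always better"
    (5, 5, 6, False),   # ideal point attribute, P below ideal
    (7, 6, 6, False),   # ideal point attribute, E = classic ideal
    (7, 5, 6, False),   # ideal point attribute, E = feasible ideal
]

SQ = sum(sq_term(*a) for a in attributes)
print(SQ)  # -> -2
```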

The expression for SQ is implicitly an individual-level specification because a given attribute j can be viewed differently by different customers. In other words, for the same attribute j, the values of D1j, D2j, and D3j can vary across customers. This possibility could pose a challenge in conducting aggregate-level analyses across customers similar to that faced by researchers using conjoint analysis (Green and Srinivasan 1990). An interesting direction for further research is to investigate whether distinct customer segments with homogeneous views about the nature of service attributes exist. Identification of such segments, in addition to offering managerial insights for market segmentation, will reduce the aforementioned analytical challenge in that the expression for SQ can be treated as a segment-level, rather than an individual-level, specification.

Conclusion

The articles by C&T and Teas raise important issues about the specification and measurement of SQ. In this article, we attempt to reexamine and clarify the key issues raised. Though the current approach for assessing SQ can and should be refined, abandoning it altogether in favor of the alternate approaches proffered by C&T and Teas does not seem warranted. The collective conceptual and empirical evidence casts doubt on the alleged severity of the concerns about the current approach and on the claimed superiority of the alternate approaches. Critical issues remain unresolved and we hope that the research agenda we have articulated will be helpful in advancing our knowledge.

REFERENCES

Anderson, James C. and David W. Gerbing (1982), "Some Methods for Respecifying Measurement Models to Obtain Unidimensional Construct Measurement," Journal of Marketing Research, 19 (November), 453-60.

Babakus, Emin and Gregory W. Boller (1992), "An Empirical Assessment of the SERVQUAL Scale," Journal of Business Research, 24, 253-68.

Bagozzi, Richard P. (1982), "A Field Investigation of Causal Relations Among Cognitions, Affect, Intentions, and Behavior," Journal of Marketing Research, 19 (November), 562-84.

Bagozzi, Richard P. and Youjae Yi (1988), "On the Evaluation of Structural Equation Models," Journal of the Academy of Marketing Science, 16 (Spring), 74-94.

Bagozzi, Richard P. and Youjae Yi (1991), "Multitrait-Multimethod Matrices in Consumer Research," Journal of Consumer Research, 17 (March), 426-39.

Bitner, Mary Jo (1990), "Evaluating Service Encounters: The Effects of Physical Surroundings and Employee Responses," Journal of Marketing, 54 (April), 69-82.

Bolton, Ruth N. and James H. Drew (1991a), "A Longitudinal Analysis of the Impact of Service Changes on Customer Attitudes," Journal of Marketing, 55 (January), 1-9.

Bolton, Ruth N. and James H. Drew (1991b), "A Multistage Model of Customers' Assessments of Service Quality and Value," Journal of Consumer Research, 17 (March), 375-84.

Boulding, William, Ajay Kalra, Richard Staelin, and Valarie A. Zeithaml (1993), "A Dynamic Process Model of Service Quality: From Expectations to Behavioral Intentions," Journal of Marketing Research, 30 (February), 7-27.

Brown, Tom J., Gilbert A. Churchill, Jr., and J. Paul Peter (1993), "Improving the Measurement of Service Quality," Journal of Retailing, 69 (Spring), 127-39.

Cadotte, Ernest R., Robert B. Woodruff, and Roger L. Jenkins (1987), "Expectations and Norms in Models of Consumer Satisfaction," Journal of Marketing Research, 24 (August), 305-14.

Carman, James M. (1990), "Consumer Perceptions of Service Quality: An Assessment of the SERVQUAL Dimensions," Journal of Retailing, 66 (1), 33-55.

Churchill, Gilbert A., Jr. and Carol Surprenant (1982), "An Investigation Into the Determinants of Customer Satisfaction," Journal of Marketing Research, 19 (November), 491-504.

Cronin, J. Joseph, Jr. and Steven A. Taylor (1992), "Measuring Service Quality: A Reexamination and Extension," Journal of Marketing, 56 (July), 55-68.

Dabholkar, Pratibha A. (1993), "Customer Satisfaction and Service Quality: Two Constructs or One?" in Enhancing Knowledge Development in Marketing, Vol. 4, David W. Cravens and Peter R. Dickson, eds. Chicago: American Marketing Association, 10-18.

Dodds, William B., Kent B. Monroe, and Dhruv Grewal (1991), "The Effects of Price, Brand and Store Information on Buyers' Product Evaluations," Journal of Marketing Research, 28 (August), 307-19.

Forbes, J.D., David K. Tse, and Shirley Taylor (1986), "Toward a Model of Consumer Post-Choice Response Behavior," in Advances in Consumer Research, Vol. 13, Richard J. Lutz, ed. Ann Arbor, MI: Association for Consumer Research.

Gerbing, David W. and James C. Anderson (1988), "An Updated Paradigm for Scale Development Incorporating Unidimensionality and Its Assessment," Journal of Marketing Research, 25 (May), 186-92.

Green, Paul E. and V. Srinivasan (1978), "Conjoint Analysis in Consumer Research: Issues and Outlook," Journal of Consumer Research, 5 (September), 103-23.

Green, Paul E. and V. Srinivasan (1990), "Conjoint Analysis in Marketing: New Developments With Implications for Research and Practice," Journal of Marketing, 54 (October), 3-19.

Green, S.B., R. Lissitz, and S. Mulaik (1977), "Limitations of Coefficient Alpha as an Index of Test Unidimensionality," Educational and Psychological Measurement, 37 (Winter), 827-38.

Gronroos, Christian (1982), Strategic Management and Marketing in the Service Sector. Helsinki, Finland: Swedish School of Economics and Business Administration.

Hattie, John (1985), "Methodology Review: Assessing Unidimensionality," Applied Psychological Measurement, 9 (June), 139-64.

Helson, Harry (1964), Adaptation Level Theory. New York: Harper and Row.

Howell, Roy D. (1987), "Covariance Structure Modeling and Measurement Issues: A Note on 'Interrelations Among a Channel Entity's Power Sources,'" Journal of Marketing Research, 24 (February), 119-26.

Kahneman, Daniel and Dale T. Miller (1986), "Norm Theory: Comparing Reality to Its Alternatives," Psychological Review, 93 (2), 136-53.

Lehtinen, Uolevi and Jarmo R. Lehtinen (1982), "Service Quality: A Study of Quality Dimensions," unpublished working paper. Helsinki, Finland: Service Management Institute.

Mazis, Michael B., Olli T. Ahtola, and R. Eugene Klippel (1975), "A Comparison of Four Multi-Attribute Models in the Prediction of Consumer Attitudes," Journal of Consumer Research, 2 (June), 38-52.

Oliver, Richard L. (1980), "A Cognitive Model of the Antecedents and Consequences of Satisfaction Decisions," Journal of Marketing Research, 17 (November), 460-69.

Oliver, Richard L. (1985), "An Extended Perspective on Post-Purchase Phenomena: Is Satisfaction a Red Herring?" unpublished paper presented at the 1985 Annual Conference of the Association for Consumer Research, Las Vegas (October).

Oliver, Richard L. and John E. Swan (1989), "Consumer Perceptions of Interpersonal Equity and Satisfaction in Transactions: A Field Survey Approach," Journal of Marketing, 53 (April), 21-35.

Parasuraman, A., Leonard L. Berry, and Valarie A. Zeithaml (1990), "Guidelines for Conducting Service Quality Research," Marketing Research (December), 34-44.

Parasuraman, A., Leonard L. Berry, and Valarie A. Zeithaml (1991), "Refinement and Reassessment of the SERVQUAL Scale," Journal of Retailing, 67 (Winter), 420-50.

Parasuraman, A., Leonard L. Berry, and Valarie A. Zeithaml (1993), "More on Improving Service Quality Measurement," Journal of Retailing, 69 (Spring), 140-47.

Parasuraman, A., Valarie A. Zeithaml, and Leonard L. Berry (1985), "A Conceptual Model of Service Quality and Its Implications for Future Research," Journal of Marketing, 49 (Fall), 41-50.

Parasuraman, A., Valarie A. Zeithaml, and Leonard L. Berry (1988), "SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality," Journal of Retailing, 64 (1), 12-40.

Perreault, William D., Jr. (1992), "The Shifting Paradigm in Marketing Research," Journal of the Academy of Marketing Science, 20 (Fall), 367-75.

Peter, J. Paul, Gilbert A. Churchill, Jr., and Tom J. Brown (1993), "Caution in the Use of Difference Scores in Consumer Research," Journal of Consumer Research, 19 (March), 655-62.

Reidenbach, R. Eric and Beverly Sandifer-Smallwood (1990), "Exploring Perceptions of Hospital Operations by a Modified SERVQUAL Approach," Journal of Health Care Marketing, 10 (December), 47-55.

Sasser, W. Earl, Jr., R. Paul Olsen, and D. Daryl Wyckoff (1978), "Understanding Service Operations," in Management of Service Operations. Boston: Allyn and Bacon.

Shostack, G. Lynn (1977), "Breaking Free from Product Marketing," Journal of Marketing, 41 (April), 73-80.

Swan, John E. and Richard L. Oliver (1989), "Postpurchase Communications by Consumers," Journal of Retailing, 65 (Winter), 516-33.

Teas, R. Kenneth (1993), "Expectations, Performance Evaluation and Consumers' Perceptions of Quality," Journal of Marketing, 57 (October), 18-34.

Tse, David K. and Peter C. Wilton (1988), "Models of Consumer Satisfaction Formation: An Extension," Journal of Marketing Research, 25 (May), 204-12.

Westbrook, Robert A. and Richard L. Oliver (1981), "Developing Better Measures of Consumer Satisfaction: Some Preliminary Results," in Advances in Consumer Research, Vol. 8, Kent B. Monroe, ed. Ann Arbor, MI: Association for Consumer Research, 94-99.

Wilton, Peter and Franco M. Nicosia (1986), "Emerging Paradigms for the Study of Consumer Satisfaction," European Research, 14 (January), 4-11.

Woodruff, Robert B., Ernest R. Cadotte, and Roger L. Jenkins (1983), "Modeling Consumer Satisfaction Processes Using Experience-Based Norms," Journal of Marketing Research, 20 (August), 296-304.

Woodruff, Robert B., D. Scott Clemons, David W. Schumann, Sarah F. Gardial, and Mary Jane Burns (1991), "The Standards Issue in CS/D Research: A Historical Perspective," Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 4, 103-9.

Woodside, Arch G., Lisa L. Frey, and Robert Timothy Daly (1989), "Linking Service Quality, Customer Satisfaction, and Behavioral Intention," Journal of Health Care Marketing, 9 (December), 5-17.

Zeithaml, Valarie A., Leonard L. Berry, and A. Parasuraman (1991), "The Nature and Determinants of Customer Expectations of Service," Marketing Science Institute Research Program Series (May), Report No. 91-113.

Zeithaml, Valarie A., Leonard L. Berry, and A. Parasuraman (1993), "The Nature and Determinants of Customer Expectations of Service," Journal of the Academy of Marketing Science, 21 (1), 1-12.

Zeithaml, Valarie A., A. Parasuraman, and Leonard L. Berry (1990), Delivering Service Quality: Balancing Customer Perceptions and Expectations. New York: The Free Press.
