Intra- and inter-observer reliability of the application of the cellulite severity scale to a Spanish female population

ORIGINAL ARTICLE

Intra- and inter-observer reliability of the application of the cellulite severity scale to a Spanish female population

M. De La Casa Almeida,* C. Suárez Serrano, J.J. Jiménez Rejano, R. Chillón Martínez, E.M. Medrano Sánchez, J. Rebollo Roldán

Department of Physiotherapy, University of Seville, Avicena S ⁄ S, Seville, Spain

*Correspondence: M. De La Casa Almeida. E-mail: [email protected]

Abstract

Background The ‘Hexsel, dal’Forno and Hexsel Cellulite Severity Scale’ (CSS) was developed to evaluate cellulite with an objective and easy-to-apply tool.

Objective To study the intra- and inter-observer reliability of the CSS in a Spanish female population by evaluating patients’ cellulite through photographs of the overall gluteofemoral zone, unlike the scale’s creators, who distinguished between buttocks and thighs.

Methods The Cellulite Severity Scale was applied to 27 women, evaluating gluteofemoral cellulite and differentiating between left and right. Evaluations were made by three expert examiners, each at three times separated by 1 week. Variables were the five CSS dimensions (number of evident depressions; depth of depressions; morphological appearance of skin surface alterations; grade of laxity, flaccidity, or sagging skin; and the Nurnberger and Muller classification scale), and the overall CSS score. Cronbach’s alpha, intra-class correlation and item total correlation were analysed.

Results Cronbach’s alpha values were 0.951 (right) and 0.944 (left). In the intra-observer reliability analysis, the intra-class correlation coefficient ranged from 0.993 to 0.999 (P < 0.001); in the inter-observer analysis, it was 0.937 (right) and 0.947 (left) (P < 0.001). Item total correlation showed all dimensions to be needed except grade of laxity, flaccidity or sagging skin (0.959 right; 0.955 left).

Conclusion The Cellulite Severity Scale has excellent reliability and internal consistency when used to evaluate cellulite on the buttocks and back of the thighs considered together. Nevertheless, the dimension grade of laxity, flaccidity or sagging skin does not contribute positively to the final consistency of the scale. This dimension needs to be analysed in greater depth in future studies.

Received: 18 December 2011; Accepted: 5 March 2012

Conflict of interest

None declared.

Funding sources

None declared.

Introduction

Cellulite is one of the aesthetic phenomena currently of greatest

concern to the female population,1–6 possibly because of both its

high prevalence in postpuberal women5,7–14 and the prevailing

standards of beauty in today’s society. As Terranova15 and

Blanchemaison16 observed, cellulite is a real problem for women

in general, and can cause mental suffering to women who are

affected by it.

Currently there are numerous medical and surgical procedures available for its treatment. Tools for its objective evaluation are, however, difficult to access, costly and, in most cases, strongly

dependent on the skill of both the operator who performs the tests

and the evaluator who analyses them. For example, imaging tests,

such as ultrasound and nuclear magnetic resonance, are frequently

used in research but not in clinical practice, as also is the case for

such invasive techniques as biopsies with their subsequent histo-

logical analysis.17–20

Other evaluation methods which are frequently used in clinical practice are more accessible, easier to apply, and of lower cost, but they have the disadvantage of not being very objective, and of sometimes not providing a true assessment of the cellulite. For example, anthropometric measurements such as weight or body perimeters provide a quantitative, indirect estimate of adipose tissue, but do not shed any real light on cellulite.21–23 The use of simple visual inspection, of conventional photographs, and of classifications and scales that have never been validated has the aforementioned problem of lack of objectivity and is prone to interference from a great many factors.2,20,24

© 2012 The Authors

JEADV 2012 Journal of the European Academy of Dermatology and Venereology © 2012 European Academy of Dermatology and Venereology

DOI: 10.1111/j.1468-3083.2012.04536.x JEADV

For this reason, Drs Hexsel, dal’Forno & Hexsel25 developed and validated an objective, easy-to-apply tool, the ‘Hexsel, dal’Forno and Hexsel Cellulite Severity Scale’ (CSS), to evaluate the cellulite phenomenon and establish a new classification. As

those authors state, advances in the treatment of cellulite, its prev-

alence and the growing demand for care by patients made it neces-

sary to develop an objective method of measuring or evaluating its

severity, and which would also allow the effects of different treat-

ments to be studied.

Although other scales had been developed and used to obtain

more objective information on cellulite than was provided by the

different classifications previously in use,10,18,24,26 CSS’s ease of

application, the rapidity of the evaluation and the absence of any

need for instrumentation all make this scale an invaluable tool

both in clinical practice and in research.25

At this point, we have to emphasize that CSS has been validated

only to address cellulite on the back of the thighs and the buttocks.

It is not valid to evaluate other zones of the body in which cellulite

may be present.25

The CSS includes the classification proposed by Nurnberger &

Muller8 together with four other cellulite-associated morphological

features relating to the macroscopic appearance of the surface of

the skin. It thus involves the evaluation of a total of five dimen-

sions (Fig. 1):

1 Number of evident depressions.

2 Depth of depressions.

3 Morphological appearance of skin surface alterations.

4 Grade of laxity, flaccidity or sagging skin.

5 Classification scale by Nurnberger and Muller.

The objective of the present work was to study the intra- and

inter-observer reliability of the ‘Hexsel, dal’Forno and Hexsel Cel-

lulite Severity Scale’ in a Spanish female population by evaluating

the patients through photographs taken under standardized condi-

tions of the overall gluteofemoral region, unlike the usage of the

authors of the scale who differentiate between buttock and thigh.

The aim is to be able to recommend its use in clinical practice. No

cultural adaptation was necessary because this is a visual scale in which the terminology is scientific, specific to the problem of cellulite, and known to and used daily by any specialist in the field, independently of the language of origin.

Materials and methods

Design

Study of the intra- and inter-observer reliability of the Cellulite

Severity Scale.

Sample

The sample consisted of 27 women who participated in a clinical

trial for the treatment of cellulite. This trial was approved by the

Ethics Committee of the University of Seville in line with the stan-

dards of the Helsinki Declaration of 2008.

The participants presented cellulite in one of the stages of the

scale of Nurnberger and Muller. They all gave their informed con-

sent. Their mean age was 26.41 ± 6.16 (range 20–40) years.

To calculate the sample size we used the program ‘Tamaño de la Muestra’ of Pérez Medina et al. For a type-I error of 0.05, an

estimate of the intra-class correlation coefficient of 0.7, and a

level of accuracy of 0.3, the result was a total of 23 subjects.

Finally, 27 women were included to allow for possible losses.

There were eventually no losses, so that the actual level of accuracy

was 0.27.

Data acquisition process

The study was conducted from July to November 2011 at the

University of Seville, Spain.

Cellulite was evaluated for the overall gluteofemoral zone, dif-

ferentiating between left and right. This was different from the

method described by the creators of the scale which differentiates

four zones: left and right buttock, and left and right thigh.25

The degree of cellulite assessed using the CSS was estimated by

scoring each of the five items of the scale between 0 and 3 points

depending on the severity of involvement (0 being the best possi-

ble score and 3 the worst). This resulted in an overall score on a

scale between 0 and 15 points (0 again being the best possible

score and 15 the worst). Based on this total score, the cellulite was

classified as mild (0–5 points), moderate (6–10 points) or severe

(11–15 points), with the score giving a finer-grained estimate of

the severity within each of these three classifications.
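The scoring rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `css_total` and `css_category` are hypothetical and not part of the CSS itself, and the cut-offs are those stated in this section (0–5 mild, 6–10 moderate, 11–15 severe).

```python
from typing import Sequence

# The five CSS dimensions, in the order listed in the Introduction.
CSS_ITEMS = (
    "number of evident depressions",
    "depth of depressions",
    "morphological appearance of skin surface alterations",
    "grade of laxity, flaccidity or sagging skin",
    "Nurnberger and Muller classification",
)


def css_total(item_scores: Sequence[int]) -> int:
    """Sum the five item scores (each 0-3) into the overall 0-15 score."""
    if len(item_scores) != len(CSS_ITEMS):
        raise ValueError("expected exactly five item scores")
    for score in item_scores:
        if score not in (0, 1, 2, 3):
            raise ValueError("each item must be scored 0, 1, 2 or 3")
    return sum(item_scores)


def css_category(total: int) -> str:
    """Map the overall score to the three severity classes used in this study."""
    if not 0 <= total <= 15:
        raise ValueError("overall score must lie between 0 and 15")
    if total <= 5:
        return "mild"
    if total <= 10:
        return "moderate"
    return "severe"


# Hypothetical example: one side of one patient, scored by one observer.
example_scores = [1, 2, 1, 0, 2]
example_total = css_total(example_scores)
example_class = css_category(example_total)
```

In this study one such total would be obtained per patient, per side (left and right), per observer and per assessment session.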

The evaluations were carried out by three expert observers: one was a member of the research team who was familiar with the CSS, and the other two were from outside the team, were blinded to the study, and were not familiar with this scale. As had been

done by the ‘evaluator author’ in the CSS validation process,25 the

three observers each performed their assessments at three different

times separated by 1 week. These assessments were made on

photographs of the gluteofemoral zone of each patient taken

under standardized conditions, and which were provided to the

observer in a digital form.

The photographs were always taken in the same room and by

the same investigator. We used a Canon EOS 500D camera

(Canon España S.A., Alcobendas, Madrid, Spain) with a resolution

of 15.1 megapixels, mounted on a tripod at 1.60 m from the

patient. For each patient, the lens was placed at the height of the

base of the sacrum, perpendicular to the plane of the skin sur-

face.14,26 The lighting was kept constant throughout the evalua-

tions, using the artificial lighting in the evaluation room with

supplementary illumination from a lamp providing additional

indirect light tangential to the zone being photographed.


Two photographs were taken of each patient standing – one in

a position with relaxed gluteofemoral musculature, and the other

in a position with the musculature in contraction (useful only in

patients presenting Stage 1 in the Nurnberger and Muller classifi-

cation). To be photographed, all the patients wore the same model

of disposable white thong.

The variables studied were those corresponding to each of the

five dimensions of the CSS, and the overall CSS score.

Figure 1 (panels a–e) Hexsel, Dal’Forno & Hexsel CSS. Reproduced from: Hexsel DM, Dal’forno T, Hexsel CL. A validated photonumeric cellulite severity scale. J Eur Acad Dermatol Venereol 2009 May; 23(5): 523–528. Publisher: John Wiley & Sons Ltd.

The statistical analysis of the data was done using SPSS for Windows version 17.0, with the following procedure in accordance with the objectives of the study:

1 First, Cronbach’s alpha for the CSS was calculated to establish the scale’s internal consistency.

2 Second, the intra-class correlation coefficients were calculated to determine the intra-observer reliability (i.e. between each of the three moments of the study for each evaluator) and the inter-observer reliability (i.e. the degree of agreement or concordance between the different evaluators).

3 And third, the item total correlation, i.e. the correlation between the whole CSS and each item of the scale, was analysed to determine whether all the items (dimensions of the scale) were necessary or whether some were redundant, not contributing to the final score of the scale.

In all cases, the significance level was taken to be P < 0.05.
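For readers who wish to reproduce the first step outside SPSS, Cronbach’s alpha can be sketched with the Python standard library alone. The sketch assumes the usual definition, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), with one row per patient and one column per CSS dimension. The function name and the toy data are hypothetical and are not the study data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a subjects-by-items table of scores.

    Uses sample variances (n - 1 in the denominator), which is an
    assumption about the convention; it matches the usual textbook form.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two subjects")
    if any(len(row) != k for row in rows):
        raise ValueError("every subject needs a score for every item")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (not from the study): two items that always agree.
toy_rows = [[0, 0], [1, 1], [2, 2], [3, 3]]
toy_alpha = cronbach_alpha(toy_rows)
```

When all items rank the subjects identically, as in the toy data, alpha reaches its maximum of 1; values above 0.9 are those the text below describes as excellent.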

Results

The values of Cronbach’s alpha were 0.951 (right side) and 0.944 (left side), indicative of the CSS’s excellent internal reliability (α > 0.9).27

The intra-observer reliability analysis of the three assessments

made by each evaluator gave values of the intra-class correlation

coefficient that ranged from 0.993 to 0.999 with P < 0.001

(Table 1), indicative of almost perfect agreement (>0.81) and high

internal consistency.28

The inter-observer reliability analysis gave values of the intra-

class correlation coefficient of 0.937 (P < 0.001) for the right side,

and 0.947 (P < 0.001) for the left side, again indicative of almost

perfect agreement (>0.81) and high internal consistency.28
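The intra-class correlation coefficient can be illustrated as follows. The text does not state which ICC form was requested in SPSS, so the sketch assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1); other forms would give different values. The function names are hypothetical, and the interpretation bands are those of Landis and Koch,28 applied here to the ICC as in the text.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) from a subjects-by-raters (or subjects-by-sessions) table.

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")
    if any(len(row) != k for row in ratings):
        raise ValueError("every subject needs a rating from every rater")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ratings have no variance")
    return (ms_rows - ms_error) / denominator


def landis_koch_label(value: float) -> str:
    """Verbal label for an agreement coefficient (Landis and Koch bands)."""
    if value < 0:
        return "poor"
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy data (not from the study): the second rater scores one point higher.
toy_ratings = [[1, 2], [2, 3], [3, 4]]
toy_icc = icc_2_1(toy_ratings)
```

Because the assumed form measures absolute agreement, the constant one-point offset in the toy data lowers the coefficient even though both raters rank the subjects identically.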

Finally, the item total correlation analysis showed that all the

scale’s dimensions were necessary (Table 2) with the exception of

one, the degree of laxity, flaccidity or sagging skin, on both sides,

with coefficients of 0.959 (right) and 0.955 (left).
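A minimal sketch of this kind of item analysis is given below. It computes, for each item, two statistics commonly reported together in reliability output: the corrected item total correlation (the item against the sum of the remaining items) and Cronbach’s alpha if the item is deleted. Which of these corresponds to each column of Table 2 is not stated in the text, so the sketch should be read as an illustration of the technique rather than a reproduction of the reported figures. All names and the toy data are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a subjects-by-items table (sample variances)."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def item_analysis(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """For each item: (corrected item total correlation, alpha if deleted)."""
    k = len(rows[0])
    results = []
    for j in range(k):
        item = [row[j] for row in rows]
        # Table of scores with item j removed.
        rest = [[v for i, v in enumerate(row) if i != j] for row in rows]
        rest_totals = [sum(row) for row in rest]
        results.append((pearson(item, rest_totals), cronbach_alpha(rest)))
    return results


# Toy data (not from the study): items 1 and 2 agree, item 3 does not.
toy_rows = [[0, 0, 3], [1, 1, 0], [2, 2, 2], [3, 3, 1]]
toy_alpha = cronbach_alpha(toy_rows)
toy_items = item_analysis(toy_rows)
```

In the toy data, deleting the third item raises alpha above the value for the full scale, which is the pattern the study reports for the grade of laxity, flaccidity or sagging skin dimension.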

Discussion

The values obtained for Cronbach’s alpha constitute evidence for

CSS’s excellent reliability and internal consistency when used to

evaluate the back of the thighs and the buttocks conjointly and

not only separately as was done by the scale’s creators.25 In their

validation of the scale, those authors obtained values for this statis-

tic ranging from 0.851 to 0.989.25 The present result may be an

aspect of some clinical relevance as it allows a patient’s cellulite to

be evaluated in the gluteofemoral zone overall without segmenta-

tion of the two body regions being required.

As also had been the case in the study of Drs Hexsel, dal’Forno,

and Hexsel,25 we found high values of inter-observer reliability for

the CSS.

The item total correlation analysis, however, showed that the

grade of laxity, flaccidity or sagging skin dimension actually

reduces the scale’s internal consistency for both the right side and

the left side measurements. In particular, removing this item from

the scale led to higher values of Cronbach’s alpha (Table 2). This

dimension therefore needs to be analysed in greater depth. In their

study, Drs Hexsel, dal’Forno, and Hexsel25 state that: ‘All items

were necessary in order to grade the severity of cellulite with the

exception of it in the right buttock.’ They therefore decided to keep

this dimension of the scale even though they noted that: ‘Flaccidity

may require additional physical examination and it is a considerable

aggravating factor that seems to contribute at least partially to surface

alterations in the affected areas.’ However, we observed in the

present study that for neither the left nor the right side was

there any need for this variable, as it did not contribute positively

to the final consistency of the scale.

As flaccidity is not an inherent factor of cellulite per se, but

rather, as the creators of CSS state, an aggravating factor,25 there is

a need for future studies to consider either the reformulation of

this variable or, if appropriate, its removal from CSS to endow the

scale with even greater reliability and internal consistency.

Finally, once the question of the dimension represented by

grade of laxity, flaccidity or sagging skin has been clarified, it

would be interesting to consider validating the scale for other

zones of the skin where this disease may be present, such as abdo-

men, arms or back. According to the creators of the scale, such

other areas were not included in the validation process, and the

CSS is therefore not useful for them.25

Table 1 Results of the intra-observer reliability analysis

                          Left      Right
Evaluator 1    ICC        0.993     0.993
               P-value    <0.001    <0.001
Evaluator 2    ICC
               P-value    <0.001    <0.001
Evaluator 3    ICC
               P-value    <0.001    <0.001

Table 2 Results of the item total correlation analysis and Cronbach’s alpha values

                                                        Item total correlation    Cronbach’s alpha values
                                                        Left      Right           Left      Right
Number of evident depressions                           0.931     0.943           0.944     0.951
Depth of depressions                                    0.925     0.937
Morphological appearance of skin surface alterations    0.919     0.927
Grade of laxity, flaccidity or sagging skin             0.955     0.959
Classification scale by Nurnberger and Muller           0.925     0.930

Conclusions

This study has confirmed the excellent reliability and internal consistency of the Cellulite Severity Scale when it is used to evaluate cellulite conjointly on the back of the thighs and the buttocks.

Nevertheless, the total item correlation analysis showed that one of

the dimensions of the scale – the grade of laxity, flaccidity or sag-

ging skin – does not contribute positively to the final consistency

of the scale. We therefore propose for future studies the analysis of

this dimension in greater depth, considering its possible reformu-

lation or, if appropriate, elimination to endow the scale with even

greater internal consistency.

Given the reliability demonstrated by the CSS in the present

study, we also propose that, once the question of dimension repre-

sented by the grade of laxity, flaccidity or sagging skin has been

resolved, this scale be validated for use on other body areas where

cellulite is also common.

References

1 Mole B, Blanchemaison P, Elia D et al. High frequency ultrasonography

and celluscore: an improvement in the objective evaluation of cellulite

phenomenon. Ann Chir Plast Esthet 2004; 49: 387–395.

2 Volga B, Turkan A, Yesim B et al. Effects of mechanical massage, man-

ual lymphatic drainage and connective tissue manipulation techniques

on fat mass in women with cellulite. J Eur Acad Dermatol Venereol

2010; 24: 138–142.

3 Altabas K, Altabas V, Berkovic MC, Rotkvic VZ. From cellulite to

smooth skin: is Viagra the new dream cream? Med Hypotheses 2009; 73:

118–119.

4 Nootheti PK, Magpantay A, Yosowitz G et al. A single center, random-

ized, comparative, prospective clinical study to determine the efficacy of

the VelaSmooth system versus the Triactive system for the treatment of

cellulite. Lasers Surg Med 2006; 38: 908–912.

5 Sadick NS, Mulholland RS. A prospective clinical study to evaluate the

efficacy and safety of cellulite treatment using the combination of opti-

cal and RF energies for subcutaneous tissue heating. J Cosmet Laser

Ther 2004; 6: 187–190.

6 Khan MH, Victor F, Rao B, Sadick NS. Treatment of cellulite: Part I.

Pathophysiology. J Am Acad Dermatol 2010; 62: 361–370; quiz 371–2.

7 Scherwitz C, Braun-Falco O. So-called cellulite. J Dermatol Surg Oncol

1978; 4: 230–234.

8 Nurnberger F, Muller G. So-called cellulite: an invented disease.

J Dermatol Surg Oncol 1978; 4: 221–229.

9 Avram MM. Cellulite: a review of its physiology and treatment. J Cosmet

Laser Ther 2004; 6: 181–185.

10 Smalls LK, Lee CY, Whitestone J et al. Quantitative model of cellulite:

three-dimensional skin surface topography, biophysical characteriza-

tion, and relationship to human perception. J Cosmet Sci 2005; 56:

105–120.

11 Smalls LK, Hicks M, Passeretti D et al. Effect of weight loss on cellulite:

gynoid lypodystrophy. Plast Reconstr Surg 2006; 118: 510–516.

12 Pavicic T, Borelli C, Korting HC. Cellulite–the greatest skin problem in

healthy people? An approach. J Dtsch Dermatol Ges 2006; 4: 861–870.

13 Rawlings AV. Cellulite and its treatment. Int J Cosmet Sci 2006; 28:

175–190.

14 Ortonne JP, Zartarian M, Verschoore M et al. Cellulite and skin ageing: is there any interaction? J Eur Acad Dermatol Venereol 2008; 22: 827–834.

15 Terranova F, Berardesca E, Maibach H. Cellulite: nature and aetio-

pathogenesis. Int J Cosmet Sci 2006; 28: 157–167.

16 Blanchemaison P. La cellulite: physiopathologie, diagnostic, evaluation

et traitements. Angeiologie 2004; 56: 77–83.

17 Rosenbaum M, Prieto V, Hellmer J et al. An exploratory investigation

of the morphology and biochemistry of cellulite. Plast Reconstr Surg

1998; 101: 1934–1939.

18 Mirrashed F, Sharp JC, Krause V et al. Pilot study of dermal and sub-

cutaneous fat structures by MRI in individuals who differ in gender,

BMI, and cellulite grading. Skin Res Technol 2004; 10: 161–168.

19 Callaghan T, Wilhelm KP. An examination of non-invasive imaging

techniques in the analysis and review of cellulite. J Cosmet Sci 2005; 56:

379–393.

20 Wanitphakdeedecha R, Manuskiatti W. Treatment of cellulite with a

bipolar radiofrequency, infrared heat, and pulsatile suction device: a

pilot study. J Cosmet Dermatol 2006; 5: 284–288.

21 Rossi AB, Vergnanini AL. Cellulite: a review. J Eur Acad Dermatol Vene-

reol 2000; 14: 251–262.

22 Rao J, Gold MH, Goldman MP. A two-center, double-blinded, random-

ized trial testing the tolerability and efficacy of a novel therapeutic

agent for cellulite reduction. J Cosmet Dermatol 2005; 4: 93–102.

23 Rona C, Carrera M, Berardesca E. Testing anticellulite products. Int J

Cosmet Sci 2006; 28: 169–173.

24 Angehrn F, Kuhn C, Voss A. Can cellulite be treated with low-energy

extracorporeal shock wave therapy? Clin Interv Aging 2007; 2: 623–630.

25 Hexsel DM, Dal’forno T, Hexsel CL. A validated photonumeric cellulite

severity scale. J Eur Acad Dermatol Venereol 2009; 23: 523–528.

26 Perin F, Perrier C, Pittet JC et al. Assessment of skin improvement

treatment efficacy using the photograding of mechanically-accentuated

macrorelief of thigh skin. Int J Cosmet Sci 2000; 22: 147–156.

27 Pardo Merino A, Ruiz Díaz MA. SPSS 11. Guía para el análisis de datos, 1st edn. McGraw-Hill/Interamericana de España, S.A.U, Madrid, 2002.

28 Landis JR, Koch GG. The measurement of observer agreement for

categorical data. Biometrics 1977; 33: 159–174.
