Assessing the Severity of Challenging Behaviour: Psychometric Properties of the Challenging Behaviour Interview

Chris Oliver*, Karen McClintock*, Scott Hall*, Melanie Smith*, Dave Dagnan† and Biza Stenfert-Kroese*

*School of Psychology, University of Birmingham, UK and †West Cumbria NHS Trust and University of Northumbria in Newcastle, UK

Accepted for publication 3 October 2002

Background The Challenging Behaviour Interview (CBI) was developed as an assessment of the severity of challenging behaviour. The CBI is divided into two parts. Part I of the interview identifies the occurrence of five clearly operationalized forms of challenging behaviour that have occurred in the last month. Part II of the interview assesses the severity of the behaviours identified on 14 scales measuring the frequency and duration of episodes, effects on the individual and others and the management strategies used by carers. In this paper we report upon its psychometric properties and discuss potential clinical and research uses of the new scale.

Methods The CBI was administered to 40 adults and 47 children. Test–retest and inter-rater agreement was assessed for 22 participants in the adult sample. Concurrent validity was assessed by correlating total scores for the child sample with the subscale and total scores of the Aberrant Behavior Checklist (ABC). Content validity was assessed by comparing scores for each behaviour on specific items relating to relevant aspects of severity of impact that would be expected to differ based upon the topographies of the behaviour.

Results Mean inter-rater and test–retest reliability kappa indices for the behaviours in Part I of the interview were 0.67 (range: 0.50–0.80) and 0.86 (range: 0.70–0.91), respectively. Mean inter-rater and test–retest reliability Pearson’s correlation indices for the behaviours in Part II of the interview were 0.48 (range: 0.02–0.77) and 0.76 (range: 0.66–0.85), respectively. Correlations with the ABC varied between 0.19 and 0.68. The majority of content validity comparisons were in line with prediction.

Conclusions The potential of the interview for clinical assessment, as an outcome measure for services and individual interventions and research purposes, is discussed.

Keywords: assessment, challenging behaviour

Introduction

Between 4 and 14% of people with intellectual disabilities show challenging behaviour such as aggression and self-injury (Oliver et al. 1987; Kiernan & Kiernan 1994). Challenging behaviour has been defined as:

Culturally abnormal behaviour(s) of such an intensity, frequency or duration that the physical safety of the person or others is likely to be placed in serious jeopardy or behaviour which is likely to seriously limit or deny access to and use of ordinary community facilities. (Emerson 1995)

Emerson (1998) identified three important aspects of this definition. These are that challenging behaviour is defined by its impact, that challenging behaviour is to some extent socially constructed and that challenging behaviour can have a wide range of personal and social consequences. Thus, ‘challenging behaviour’ does not refer to a single topography of behaviour but to behaviours that will have a wide range of impacts upon the quality of life of people with challenging behaviour and those who live and work with them. The social construction of challenging behaviour suggests that the identification of challenging behaviour will vary across settings, with some settings able to manage more severe behaviours such that the behaviours are not perceived to be challenging.

This consideration of the concept and definition of challenging behaviour suggests that assessments that identify only a limited number of dimensions of impact of challenging behaviour may be insufficient to properly identify the significance of such behaviour to services and people with intellectual disabilities themselves. Assessments available for identifying challenging behaviour tend to focus on a single impact or a small selection of possible impacts. For example, scales that are widely used in the measurement of change following intervention or as measures for demonstrating the effectiveness of services, such as Part II of the Adaptive Behavior Scale (ABS; Nihira et al. 1974) and the Aberrant Behavior Checklist (ABC; Aman et al. 1985), use single Likert scales of frequency, severity or ‘degree of problem’ for a predetermined set of behaviours. These scales tend to produce total or factor scores that allocate equal weighting to all behaviours regardless of relative impacts of the different behaviours on the social and physical environment or the quality of life of the individual and those who live and work with them (McDevitt et al. 1977; Clements et al. 1980; Holmes & Batt 1980; Spreat 1982; Felce & Lowe 1995; Havercamp & Reiss 1996).

Journal of Applied Research in Intellectual Disabilities 2003, 16, 53–61
© 2003 BILD Publications

Other scales have assessed a broader range of impacts for behaviours. These include the Checklist for Challenging Behaviour (CCB) (Russell & Harris 1993; Harris et al. 1994), the definitions of challenging behaviour used by Quereshi (1993) and Emerson et al. (1997) and the Maladaptive Behaviour Inventory (Dagnan et al. 1995). The CCB (Harris et al. 1994), which has been used in an epidemiological study of aggressive behaviours in a health district in the South of England, consists of two scales. The first is primarily concerned with aggressive behaviours. Items on this scale are rated in terms of their frequency, severity and management difficulty using 5-point Likert scales. The second scale within the CCB consists of other types of challenging behaviour that are considered likely to be associated with aggressive behaviour; these items are rated in terms of their frequency and severity only. In the pilot version of the scale, inter-rater and test–retest agreement ranged from 0.67 to 0.70 and 0.53 to 0.69, respectively. In the revised CCB, inter-rater agreement ranged from 29 to 100%.

Quereshi (1993) and Emerson et al. (1997) conducted a large-scale, longitudinal epidemiological study within the North West Regional Health Authority, UK. This study used a definition of challenging behaviour that clearly identified the topography of challenging behaviour, the current impact upon the environment and the management strategies. This interview format had a good level of reliability in the identification of people who were appropriately defined as presenting challenging behaviour. Cohen’s kappa for inter-rater identification of people who fitted the definition of challenging behaviour used in the studies varied from 0.62 for people living in hospital settings to 0.71 for people living in Social Services Hostels. This interview process is important in that it emphasizes again that different dimensions of impact should be considered in identifying challenging behaviour. Dagnan et al. (1995) present a brief scale for use in population registers and epidemiological surveys based upon the factor structure of the ABC (Aman et al. 1985). The scale listed topographies of behaviour and asked carers to rate on two 10-point scales the frequency and severity of the behaviours. Psychometric analysis of this scale used with 378 people with intellectual disabilities found that both dimensions produced the same four-factor structure for the scale items (factors labelled ‘impulsive and aggressive behaviour’, ‘passive behaviour and lethargy’, ‘stereotypic and self-injurious behaviour’ and ‘active social avoidance’). Further, the correlations between the severity and frequency scales were high and significant for all items (mean 0.65, SD = 0.22) except those concerned with passive behaviours (for example, ‘stands still’ and ‘withdrawn’, where both extremes of the frequency scale might be seen as indicating a severe behaviour). The assessments reviewed here draw attention to the potential for assessments of the impact of challenging behaviour to include a wide range of dimensions.

It is not only in the definition of challenging behaviour that it is important to consider a broad range of impacts for such behaviour. Non-aversive interventions in challenging behaviour are characterized by attention to quality of life and ecological change (e.g. LaVigna & Donnellan 1985; Horner et al. 1990; Kushlick et al. 1997). Intervention for challenging behaviour may involve strategies for the successful management of challenging behaviour and the improvement of quality of life for people who challenge and those who live and work with them, alongside strategies to reduce the frequency and intensity of such behaviours. Clearly, in order to measure the impact of such interventions, measures of challenging behaviour should assess a wider range of possible impacts of challenging behaviour. The measures reviewed above tend to use limited definitions of severity and limited scoring systems. This may not allow comparative severity within and between individuals to be sufficiently described. This issue is immediately relevant to comprehensive intervention evaluation (Meyer & Janney 1989). The current paper describes the development of a measure of the severity of challenging behaviour that uses a broad range of dimensions of impact of challenging behaviour and an examination of its psychometric properties based upon its use with 87 adults and children with intellectual disabilities and challenging behaviour.

Method

Participants and respondents

Two groups of individuals participated in the study. Participants were selected who were regarded as having challenging behaviour because the focus of assessment was the severity of challenging behaviour, not whether it was present or absent. The adult sample comprised 40 adults aged 17–58 years with moderate to severe intellectual disabilities. They were selected to participate on the basis that they had been identified as showing challenging behaviour from informal observation, clinical referral or information provided by carers. Participants either lived in a hospital or in a community-based service in the West Midlands, all received 24-h care, and had been living in their homes for at least 3 months. There were 26 males and 14 females and the mean age of the participants was 36.0 years (SD = 12.0 years). The child sample comprised 47 children aged 4–12 years with severe intellectual disabilities. Children were selected from 10 schools for children with severe intellectual disabilities in the West Midlands and had been reported to show challenging behaviour by interview with their classroom teacher. There were 32 males and 15 females and the mean age of the participants was 8.71 (SD = 2.23) years.

The challenging behaviour interview1

The Challenging Behaviour Interview is conducted in two parts. In Part I, respondents are asked to determine whether the participant has shown any of the following five types of behaviour within the last month: ‘self-injury’, ‘physical aggression’, ‘verbal aggression’, ‘disruption of the environment’ and ‘inappropriate vocalizations’. The time period of 1 month is used to enhance reliability. For each behaviour type, a fully operationalized description is provided, along with example topographies and other information about the category. For example, ‘self-injury’ is described as ‘non-accidental behaviours which produce temporary marks or reddening of the skin or cause bruising, bleeding or other temporary or permanent tissue damage’. Examples listed under the self-injury category include ‘self-biting, head-banging, head-punching or slapping, removing hair, self-scratching, body-hitting, eye-poking or -pressing’. Other information about the category states ‘Do not include anal-poking but do include poking of other body orifices’.

Part II of the interview consists of 14 questions designed to assess the severity of each topographical class of behaviour identified in Part I. Each question in Part II consists of a clearly anchored, four- or five-point Likert scale (see Table 1). For example, question number 4 measures the response required by the worst instance of the identified behaviour in the past month: a score of one indicates that a ‘verbal discouragement or reminder’ was necessary, a score of two indicates that an ‘informal physical intervention by one member of staff, removal to a safe environment, and/or removal of staff or others from immediate environment’ was necessary, a score of three indicates that ‘informal physical intervention by more than one member of staff, a formal restraint procedure and/or protective devices’ was necessary and a score of four indicates that ‘seclusion, PRN medication, legal involvement or legal advice was sought and/or a section of the Mental Health Act (MHA) being invoked’ was necessary.

Table 1 Questions in Part II of the interview and their corresponding Likert scales

Question                                      Likert scale
1. Frequency of behaviour                     1 (this time next month) to 5 (in the next 15 min)
2. Longest episode of behaviour               1 (less than a minute) to 5 (more than an hour)
3. Average episode of behaviour               1 (less than a minute) to 5 (more than an hour)
4. Response to worst episode                  0 (nothing) to 4 (seclusion)
5. Effect on individual’s physical health     0 (no effect) to 3 (significant injury)
6. Effect on staff physical health            0 (no effect) to 3 (significant injury)
7. Effect on service users physical health    0 (no effect) to 3 (significant injury)
8. Effect on service users well-being         0 (no effect) to 4 (nearly every day)
9. Effect on immediate environment            0 (no damage) to 4 (extreme damage)
10. Restrictive devices applied               0 (never) to 4 (almost continuously)
11. Modifications made to environment         0 (none) to 3 (modifications been made)
12. Verbal response given by staff            0 (never) to 4 (at least once an hour)
13. Physical restraint given by staff         0 (never) to 4 (at least once an hour)
14. More than one staff member needed         0 (never) to 4 (at least once an hour)
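The scale structure in Table 1 can be illustrated with a short sketch. It assumes the item ranges exactly as listed in the table and assumes that item scores are simply summed into a total for each behaviour; the names `PART_II_RANGES` and `total_score` are hypothetical and are not part of the published interview.

```python
# Item score ranges for Part II, copied from Table 1: item number -> (min, max).
PART_II_RANGES = {
    1: (1, 5), 2: (1, 5), 3: (1, 5),
    4: (0, 4), 5: (0, 3), 6: (0, 3), 7: (0, 3),
    8: (0, 4), 9: (0, 4), 10: (0, 4), 11: (0, 3),
    12: (0, 4), 13: (0, 4), 14: (0, 4),
}


def total_score(item_scores):
    """Sum the 14 Part II item scores for one behaviour, checking each range.

    Assumption: the total is an unweighted sum of the item scores.
    """
    if set(item_scores) != set(PART_II_RANGES):
        raise ValueError("scores are needed for all 14 items")
    for item, score in item_scores.items():
        low, high = PART_II_RANGES[item]
        if not low <= score <= high:
            raise ValueError(f"item {item} must be between {low} and {high}")
    return sum(item_scores.values())


# Lowest and highest totals a single behaviour could receive under these ranges.
lowest = total_score({item: low for item, (low, high) in PART_II_RANGES.items()})
highest = total_score({item: high for item, (low, high) in PART_II_RANGES.items()})
print(lowest, highest)
```

Under these assumptions, the total for a single behaviour can range from 3 to 55.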

1 Copies of the interview can be obtained from the first author.



Procedure

For each participant in the adult sample, the interview was conducted with the member of staff who had worked most closely with the participant in the last 3 months (e.g. the keyworker). For each participant in the child sample, the interview was conducted with the child’s teacher at school. Each respondent was given a copy of the interview schedule to refer to during the interview. In Part I of the interview, respondents were asked to identify whether or not the participant had shown any of the listed behaviours within the last month. The interviewer read out each of the behavioural categories in turn, giving definitions and examples of each. Part II of the interview was then administered for each behaviour identified in Part I.

Results

Reliability

Test–retest and inter-rater agreement data for the interview were collected for 22 participants in the adult sample. To assess inter-rater agreement, a second respondent was interviewed within 2 days of the first interview. The second respondent was also required to have worked in the same home or hospital and to have known the participant for at least 3 months. To assess test–retest agreement, the first respondent was re-interviewed after a period of between 2 and 10 days from the first interview.

Table 2 shows the number of participants in the reliability analysis identified as showing each topography of challenging behaviour in Part I of the interview and data for test–retest and inter-rater agreement. For each topography, occurrence agreement was calculated by dividing agreements on occurrence by agreements plus disagreements on occurrence; non-occurrence agreement was calculated by dividing agreements on non-occurrence by agreements plus disagreements on non-occurrence; and total reliability was calculated by dividing all agreements by all agreements plus all disagreements.
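The agreement indices described above, together with Cohen's kappa, can be computed from a 2 × 2 cross-tabulation of two interviews. The sketch below is illustrative only: the function name `agreement_indices` is hypothetical, and the example counts are not the study's raw data (which are not reported) but hypothetical counts chosen to be consistent with the self-injury inter-rater row of Table 2.

```python
def agreement_indices(both_yes, only_first, only_second, both_no):
    """Return (occurrence, non-occurrence, total agreement, Cohen's kappa).

    both_yes    - both interviews report the behaviour
    only_first  - only the first interview reports it
    only_second - only the second interview reports it
    both_no     - neither interview reports it
    """
    disagreements = only_first + only_second
    n = both_yes + disagreements + both_no
    occurrence = both_yes / (both_yes + disagreements)
    non_occurrence = both_no / (both_no + disagreements)
    total = (both_yes + both_no) / n
    # Agreement expected by chance from each interview's marginal rates.
    first_yes = (both_yes + only_first) / n
    second_yes = (both_yes + only_second) / n
    chance = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    kappa = (total - chance) / (1 - chance)
    return occurrence, non_occurrence, total, kappa


# Hypothetical counts for 22 participants (not taken from the paper).
occ, non_occ, total, kappa = agreement_indices(12, 1, 2, 7)
print(round(occ * 100), round(non_occ * 100), round(total * 100), round(kappa, 2))
```

With these hypothetical counts the indices come out at 80%, 70%, 86% and a kappa of 0.71. The sketch does not guard against empty cells, which would cause a division by zero.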

The mean kappa coefficient across behaviours was 0.67 (range: 0.50–0.80) for inter-rater agreement and 0.86 (range: 0.70–0.91) for test–retest agreement, indicating that the reliability of Part I of the interview was good.

For the purposes of reliability assessment only, after Part I of the interview had been administered, respondents were then asked to indicate how concerned they were about each behaviour on a seven-point Likert scale ranging from 0 (not at all concerned) to 6 (extremely concerned). Only those behaviours rated 3 and above on the concern scale by the first respondent were considered for rating on Part II of the interview. This was because numerous behaviours were often identified by informants and it was thought important to avoid interviewee fatigue. To assess item reliability, Pearson’s correlation coefficients were computed on scores for each question for each behaviour, pooled across participants. Table 3 shows the results of this analysis.

The mean item reliability was 0.53 (range: 0.39–1.00) for inter-rater agreement and 0.74 (range: 0.54–1.00) for test–retest agreement. To assess the test–retest and inter-rater reliability of the total score for each behaviour, the scores for each question were summed and Pearson’s correlation coefficients were computed for each behaviour. Table 4 shows the coefficients from this analysis.
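The total-score reliability calculation described above (sum the item scores, then correlate the totals from two interviews) can be sketched as follows. The scores are hypothetical and, for brevity, use four items per participant rather than the 14 items of Part II; `pearson_r` is a plain implementation of Pearson's product-moment correlation written for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical item scores for four participants on one behaviour,
# from a first and a second interview (four items each, for brevity).
first_interview = [[3, 2, 1, 0], [5, 4, 2, 3], [2, 1, 1, 1], [4, 3, 2, 2]]
second_interview = [[3, 2, 2, 0], [5, 3, 2, 3], [2, 2, 1, 0], [4, 3, 3, 2]]

# Sum the item scores to a total per participant, then correlate the totals.
totals_first = [sum(items) for items in first_interview]
totals_second = [sum(items) for items in second_interview]
r_total = pearson_r(totals_first, totals_second)
print(round(r_total, 2))
```

The same function applied to the scores for a single question, pooled across participants, gives an item-level coefficient of the kind shown in Table 3.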

Table 2 Reliability statistics for Part I of the Challenging Behaviour Interview

                                                   Inter-rater agreement               Test–retest agreement
Behaviour                             N (%)        Occ.   Non-occ.  Total   Kappa      Occ.   Non-occ.  Total   Kappa
                                                   (%)    (%)       (%)                (%)    (%)       (%)
Self-injury (SIB)                     13 (59.1)    80     70        86      0.71       92     90        95      0.91
Physical aggression (PAG)             18 (81.8)    90     100       91      0.62       94     80        95      0.86
Verbal aggression (VAG)               14 (63.6)    87     78        91      0.80       81     67        86      0.70
Disruption of the environment (DST)   12 (54.5)    80     70        86      0.72       92     91        95      0.91
Inappropriate vocalizations (IV)      15 (68.2)    71     50        77      0.50       93     88        95      0.90
Mean                                               81.6   73.6      86.2    0.67       90.4   83.2      93.2    0.86
Range                                              71–90  50–100    77–91   0.50–0.80  81–94  67–91     86–95   0.70–0.91

N (%): number identified (% of sample, n = 22). Occ.: occurrence agreement. Non-occ.: non-occurrence agreement.



The mean inter-rater agreement across behaviours was 0.48 (range: 0.02–0.77) and the mean test–retest agreement was 0.76 (range: 0.66–0.85). The reliability of the total overall score was very high (0.90 and 0.96 for inter-rater and test–retest agreement, respectively). Table 5 shows the mean total CBI scores computed for each behaviour for the adult and child samples.

It can be seen that behaviours that are the most likely to have the broadest range of impacts (such as physical aggression) have the highest mean total scores (19.50 for the adult sample and 19.31 for the child sample). Behaviours that have a more specific range of impacts (such as inappropriate vocalizations, which has little impact on the surrounding physical environment) have lower mean total CBI scores (13.73 and 13.43 for the adult and child samples, respectively).

Validity

Concurrent validity of the CBI was assessed by correlating the total score of the CBI for the child sample with the subscale and total scores of the Aberrant Behavior Checklist (ABC). The correlation between the total score of the CBI and the total score of the ABC was 0.56 (P < 0.01). Correlations between the total score of the CBI and the subscales of the ABC were: Irritability (0.68, P < 0.01), Lethargy (0.27, n.s.), Stereotypy (0.19, n.s.), Hyperactivity (0.47, P < 0.01) and Inappropriate Speech (0.33, P < 0.05).
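The significance levels attached to these correlations can be checked with the usual t transformation of Pearson's r, t = r√(n − 2)/√(1 − r²). The sketch below rests on two assumptions that the paper does not state: that all 47 children contributed to every correlation (giving 45 degrees of freedom) and that the tests were two-tailed. The critical values are standard table values for 45 degrees of freedom, and the function names are hypothetical.

```python
from math import sqrt

N_CHILDREN = 47    # size of the child sample (assumed to apply to every correlation)
T_CRIT_05 = 2.014  # two-tailed critical t for 45 d.f. at alpha = 0.05
T_CRIT_01 = 2.690  # two-tailed critical t for 45 d.f. at alpha = 0.01


def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def significance_label(r, n=N_CHILDREN):
    """Label a correlation using the fixed critical values above (valid for n = 47 only)."""
    t = abs(t_from_r(r, n))
    if t > T_CRIT_01:
        return "P < 0.01"
    if t > T_CRIT_05:
        return "P < 0.05"
    return "n.s."


# Correlations between the CBI total score and the ABC scores, as reported above.
reported = {
    "ABC total": 0.56,
    "Irritability": 0.68,
    "Lethargy": 0.27,
    "Stereotypy": 0.19,
    "Hyperactivity": 0.47,
    "Inappropriate Speech": 0.33,
}
labels = {name: significance_label(r) for name, r in reported.items()}
for name, label in labels.items():
    print(f"{name}: r = {reported[name]:.2f}, {label}")
```

Under these assumptions, the labels produced by the sketch agree with the significance levels reported for each correlation.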

To assess the content validity of Part II of the CBI, mean item scores were compared for selected pairs of behaviour. The pairs of behaviours selected for comparison were chosen because differences could be predicted between them (based upon pragmatic judgements from the topography of the behaviour) for particular items if these items were valid. The pairs were also chosen to compare behaviours with similar topography but different directions of action. Thus, self-injurious behaviour was compared to aggression with differences predicted for scores on items concerned with the effects of the behaviours on the individual’s health (higher for SIB), the health of staff and other service users (both higher for physical aggression) and the use of restrictive devices (higher for SIB). Similarly, verbal and physical aggression were compared with differences predicted for scores on items concerned with the effect on carer and other service users’ health and the frequency of physical restraint (all higher for physical aggression). Destruction of the environment and physical aggression were compared, with differences predicted on items concerned with the effect of the behaviours on the staff’s and other service users’ health (both higher for physical aggression) and the effect on and modifications to the environment (both higher for destruction of the environment). Finally, SIB and inappropriate vocalizations were compared with differences predicted on items concerned with the effects of the behaviour on the individual’s health (higher for SIB).

Table 3 Inter-rater and test–retest agreement statistics (Pearson’s correlations) for individual items rated on Part II of the Challenging Behaviour Interview

Item                                          Inter-rater    Test–retest
                                              agreement      agreement
1. Frequency of behaviour                     0.50           0.78
2. Longest episode of behaviour               0.53           0.68
3. Average episode of behaviour               0.28           0.75
4. Response to worst episode                  0.72           0.86
5. Effect on individual’s physical health     0.64           0.82
6. Effect on staff physical health            0.81           0.79
7. Effect on service users physical health    0.40           0.62
8. Effect on service users well-being         0.42           0.72
9. Effect on immediate environment            0.44           0.67
10. Restrictive devices applied               1.00           1.00
11. Modifications made to environment         0.39           0.62
12. Verbal response given by staff            0.40           0.75
13. Physical restraint given by staff         0.54           0.54
14. More than one staff member needed         0.39           0.77

Data were pooled across behaviours and participants (n = 47).

Table 4 Inter-rater and test–retest agreement statistics (Pearson’s correlations) for total scores for each behaviour and for the total overall score on Part II of the Challenging Behaviour Interview

Topography                                Inter-rater    Test–retest
                                          agreement      agreement
Self-injury (n = 10)                      0.63           0.85
Physical aggression (n = 14)              0.54           0.76
Verbal aggression (n = 9)                 0.45           0.75
Disruption of the environment (n = 6)     0.77           0.77
Inappropriate vocalizations (n = 8)       0.02           0.66
Total overall score (n = 21)              0.90           0.96

For each comparison, Part II item scores were included in a between-subjects design. Consequently, if a participant had scores on both behaviours in the comparison, scores for one behaviour were randomly discarded whilst scores for the other were retained. The only constraint on the random selection was the need to have approximately equal numbers in each group for comparison. As this process was repeated for each comparison, different scores were used in each of the four comparisons and the group sizes for the same behaviour differ between comparisons.
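One way to implement the random discarding described above is sketched below. The paper does not give the exact procedure, so the balancing rule used here (each participant with both behaviours is placed in whichever group is currently smaller, with ties broken at random) is an assumption, as are the function name `split_between_subjects` and the example participant numbers.

```python
import random


def split_between_subjects(has_a, has_b, seed=0):
    """Assign each participant to exactly one behaviour group.

    has_a, has_b - sets of participant identifiers showing behaviour A or B.
    Participants showing both behaviours keep their scores for one behaviour
    only, chosen so that the two groups end up approximately equal in size.
    """
    rng = random.Random(seed)
    group_a = sorted(has_a - has_b)   # scored on behaviour A only
    group_b = sorted(has_b - has_a)   # scored on behaviour B only
    both = sorted(has_a & has_b)      # scored on both behaviours
    rng.shuffle(both)                 # random order of assignment
    for participant in both:
        if len(group_a) < len(group_b):
            group_a.append(participant)
        elif len(group_b) < len(group_a):
            group_b.append(participant)
        else:
            rng.choice((group_a, group_b)).append(participant)
    return group_a, group_b


# Hypothetical example: participants 1-10 show behaviour A and 6-20 show B.
group_a, group_b = split_between_subjects(set(range(1, 11)), set(range(6, 21)))
print(len(group_a), len(group_b))
```

Running the split again with a different seed for each comparison would produce different groups, consistent with the group sizes for the same behaviour differing between comparisons.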

To ensure that the behaviours in the comparisons occurred at a similar frequency and duration, the scores on the first three items of Part II, which refer to frequency, the duration of the longest episode in the last month and the duration of the average episode, were compared using Mann–Whitney U-tests. There were no significant differences for any of these items between behaviour pairs; thus, the behaviours in each comparison were of comparable frequency and duration. The mean item scores for the behaviour pairs and the results of the Mann–Whitney comparisons are shown in Table 6.

Table 5 Mean total CBI scores for each behaviour and total overall score on Part II of the interview for child and adult samples

                                Adult sample                       Child sample
Topography                      n    Mean    SD      Min   Max     n    Mean    SD      Min   Max
Self-injury                     18   17.06   6.91    7     28      18   13.22   5.31    3     24
Physical aggression             26   19.50   8.78    7     37      32   19.31   8.02    5     35
Verbal aggression               16   14.31   5.15    7     22      9    15.89   5.73    6     24
Disruption of the environment   16   18.69   7.84    7     34      19   16.16   5.83    5     28
Inappropriate vocalizations     15   13.73   4.65    7     25      21   13.43   5.24    7     25
Total overall score             40   39.74   22.59   7     106     47   33.79   19.70   9     91

Mean: mean total score. Min, Max: minimum and maximum total score.

Table 6 Item scores (mean, with SD in parentheses) for each behaviour from Part II of the interview

Item                                  SIB          PAG          VAG          PAG          DST          PAG          SIB          IV
                                      (n = 16)     (n = 17)     (n = 14)     (n = 19)     (n = 16)     (n = 15)     (n = 11)     (n = 11)
5. Effect on individual’s health      2.31 (1.30)  0.06 (0.25)  0.08 (0.28)  0.22 (0.65)  0.60 (1.18)  0.33 (0.72)  2.64 (1.12)  0.00 (–)
6. Effect on staff’s health           0.0 (–)      1.31 (1.14)  0.00 (–)     1.44 (1.25)  0.13 (0.35)  1.00 (0.93)  0.00 (–)     0.00 (–)
7. Effect on service users’ health    0.0 (–)      0.88 (1.17)  0.00 (–)     1.00 (1.89)  0.00 (–)     0.80 (0.94)  0.00 (–)     0.00 (–)
8. Effect on service users well-being 0.29 (0.61)  1.12 (1.17)  1.43 (1.65)  1.39 (1.19)  1.00 (0.96)  1.20 (1.21)  0.20 (0.63)  1.30 (1.89)
9. Effect on the environment          0.25 (0.68)  0.41 (1.00)  0.00 (–)     0.84 (1.30)  2.31 (1.66)  0.40 (0.63)  0.36 (0.81)  0.00 (–)
10. Restrictive devices applied       0.25 (1.00)  0.00 (–)     0.00 (–)     0.00 (–)     0.00 (–)     0.00 (–)     0.36 (1.21)  0.00 (–)
11. Modifications to environment      0.50 (0.97)  0.35 (0.86)  0.00 (–)     0.63 (1.01)  1.37 (1.20)  0.20 (0.56)  0.45 (1.04)  0.00 (–)
13. Physical restraint given by staff 0.62 (0.81)  1.18 (1.07)  0.21 (0.58)  1.26 (1.05)  0.69 (0.87)  0.93 (0.96)  0.36 (0.67)  0.00 (–)

SIB, self-injury; PAG, physical aggression; VAG, verbal aggression; DST, disruption of environment; IV, inappropriate vocalization. Each adjacent pair of columns forms one comparison.
Numbers in bold indicate significant differences between the scores, P < 0.006.

To avoid type one errors, the Bonferroni correction was applied within each comparison and the alpha level for all comparisons was set at 0.006; all comparisons are two-tailed. Table 6 shows significant differences in line with the predictions made for SIB and physical aggression for the effect of SIB on the person’s health (U = 10, P < 0.001), the effect of aggression on staff’s health (U = 40, P < 0.001) and the effect of aggression on other service users’ health (U = 63, P < 0.01) but not the predicted difference for the use of restrictive devices (U = 127.5, n.s.). Similarly, the comparison of verbal and physical aggression showed differences in line with predictions for the effect of physical aggression on staff’s health (U = 32.5, P < 0.001), other service users’ health (U = 63, P < 0.005) and the frequency of physical restraint (U = 57.5, P < 0.005). The comparison of destruction of the environment and physical aggression showed predicted differences for the effect of physical aggression on other service users’ health (U = 56, P < 0.005), the effect of destruction of the environment on the environment (U = 40.5, P < 0.001) and the necessity for modifications to the environment (U = 55.5, P < 0.003) but not the predicted effect on staff’s health (U = 54, n.s.). Finally, the comparison between SIB and inappropriate vocalizations showed the predicted effect of SIB on the person’s health (U = 66, P < 0.001).
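The corrected alpha level used above can be reproduced with a one-line calculation. The sketch assumes a family-wise alpha of 0.05 and that each comparison involved the eight items listed in Table 6; neither figure is stated explicitly in the text, and the function names are hypothetical.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha level under the Bonferroni correction."""
    return family_alpha / n_tests


def is_significant(p_value, family_alpha, n_tests):
    """True if a single test's P value survives the Bonferroni correction."""
    return p_value < bonferroni_alpha(family_alpha, n_tests)


# Assumed: family-wise alpha of 0.05 spread over eight item comparisons.
alpha_per_item = bonferroni_alpha(0.05, 8)
print(alpha_per_item)
```

Under these assumptions the per-test level is 0.00625, which is consistent with the reported threshold of 0.006.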

A further assessment of the validity of the CBI compared the total mean scores of each of these four behaviours. Predictions were made on the assumption that the behaviours that potentially would have an overall greater range of impacts would have a higher mean total score than the behaviours whose potential impact was limited to one person or the environment. Again, due to repeated testing, the Bonferroni correction was applied and the alpha level for each comparison was set at 0.0125. Controlling for frequency and duration, a t-test for independent samples found significant differences between self-injurious behaviour and inappropriate vocalizations (t = 2.93, d.f. = 20, P < 0.01), and physical aggression and verbal aggression (t = 4.11, d.f. = 27, P < 0.005). As predicted, no significant differences were found between self-injurious behaviour and physical aggression (t = 0.86, d.f. = 31, n.s.), or physical aggression and destruction of property (t = 0.52, d.f. = 29, n.s.).
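An independent-samples t-test of the kind reported above can be sketched as follows. The paper does not say which form of the test was used, so the pooled-variance (Student) version shown here is an assumption, the scores are hypothetical, and the alpha level is derived by dividing an assumed family-wise level of 0.05 across the four comparisons.

```python
from math import sqrt


def pooled_t(x, y):
    """Student's t for two independent samples, assuming equal variances.

    Returns the t statistic and its degrees of freedom (n_x + n_y - 2).
    """
    n_x, n_y = len(x), len(y)
    mean_x, mean_y = sum(x) / n_x, sum(y) / n_y
    ss_x = sum((v - mean_x) ** 2 for v in x)   # sum of squares, group x
    ss_y = sum((v - mean_y) ** 2 for v in y)   # sum of squares, group y
    df = n_x + n_y - 2
    pooled_var = (ss_x + ss_y) / df
    t = (mean_x - mean_y) / sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return t, df


# Hypothetical total scores for two behaviour groups.
t_value, df = pooled_t([1, 2, 3, 4], [2, 3, 4, 5])
alpha = 0.05 / 4   # Bonferroni-corrected level for four comparisons
print(round(t_value, 3), df, alpha)
```

With two groups of 11 participants, as in the comparison of self-injurious behaviour and inappropriate vocalizations in Table 6, this formula gives 20 degrees of freedom, matching the value reported for that comparison.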

Discussion

The CBI was developed to provide a measure with which to assess a broader range of impact of challenging behaviour in line with recent definitions of challenging behaviour (e.g. Emerson 1998). The scale has been piloted on 87 children and adults with intellectual disabilities. Inter-rater and test–retest reliability have been reported for the identification of behaviours of concern and the ratings of impact of the behaviours. In general, these reliabilities are good. The test–retest reliability for the total of all impact scores for each behaviour is very high, and is at a level that suggests that this score can be used to monitor individual change over time. The reliability of the CBI is probably due, at least in part, to the strategy of focusing on behaviours that have occurred in the last month. This does not imply that behaviours of lower rate should not be considered to be challenging. However, less frequent behaviours that have occurred some time ago are likely to be unreliably appraised using objective measures.

The correlation between the CBI total score and Aberrant Behavior Checklist total score was highly significant. Significant correlations were also obtained between the CBI total score and the Irritability, Hyperactivity and Inappropriate Speech subscales of the ABC, indicating that the concurrent validity of the interview is good. That the correlations were highest for the Irritability and Hyperactivity subscales is consistent with the finding of Lowe et al. (1995) that people with severe challenging behaviour referred for specialist intensive support had significantly higher scores on these subscales than people with severe challenging behaviour not so referred. Content validity was established in this study by demonstrating that the CBI discriminates between self-injurious behaviour, physical aggression, disruption of the surrounding environment, and inappropriate vocalizations on specific items relating to relevant aspects of severity of impact. For example, significant differences were found between self-injurious behaviour and physical aggression on items relating to the impact of each topography on the individual’s health (self-injurious behaviour scores were significantly higher), staff health and other service users’ health (physical aggression scores significantly higher). Most of the differences that were predicted from an understanding of the topographies of the behaviours included were significant. Some predicted differences, such as that between destruction of the environment and physical aggression on carer health, were not significant following the adjustment of alpha levels based upon the Bonferroni procedure. Importantly, the frequency of the behaviour was controlled for in all comparisons so higher scores could not be attributed to a higher frequency of behaviour. Thus, the content validity of the CBI is confirmed by finding significant differences in impacts between behaviours that by definition have different impacts on the social and physical environment.

The descriptive analysis of the CBI also contributes to the content validity of the instrument. The range of total scores for each behaviour demonstrates that the CBI detects differences in the variability of severity of different behaviours. Behaviours that have the potential for great variability in terms of severity such as physical aggression have a larger range of total scores. Alternatively, behaviours where the potential impact of the severity is limited (e.g. inappropriate vocalizations) have a smaller range of total scores. These differences also indicate that the CBI is able to recognize the varying degrees of impact the same behaviour can have on the lives of different individuals.

Whilst these preliminary analyses of the psychometric properties of the CBI are encouraging, there are a number of ways in which assessment of validity and reliability might be extended. In this evaluation only behaviours rated as of concern were included, as most participants showed a number of behaviours and it is important to avoid informant fatigue. Further studies should examine a broader range of behaviours without using concern as an inclusion criterion. Additionally, future research might employ children and adults in non-service settings, consider concurrent validity in an adult population and examine the internal consistency and factor structure of the CBI. There is an assumption in the scoring of the CBI that items carry equal weight both within and between behaviours, and this assumption warrants examination.

The CBI has a number of potential uses. It has particular potential as a routine outcome measure in work with people with challenging behaviour, both at an individual and service level, in that it combines a focus on specific behaviours with a psychometrically sound interview method for data collection. Some previous evaluations have used non-standardized records of specific behaviours such as direct observation of engagement or records of individual incidents of behaviour (e.g. Dagnan et al. 1996; Hoefkens & Allen 1990). These methods are likely to be sensitive to change but may be cumbersome or unreliable as routine data collection methods. Other evaluations have used measures such as the ABC (Aman et al. 1985) and the ABS (Nihira et al. 1974), which measure a range of pre-determined behaviours with a limited range of impacts and give equal weighting to all behaviours regardless of whether they are present in a person’s repertoire of behaviours (Holmes & Batt 1980) or whether they have been the focus for intervention. In some evaluation studies these measures have demonstrated change (e.g. Lowe et al. 1996). However, these measures are likely to be relatively insensitive to changes that may be targeted in community-based interventions (LaVigna & Donnellan 1986; Kushlick et al. 1997; Horner et al. 1990). The CBI records a range of impacts for the specific behaviours shown by the person using a standardized interview. This feature of the scale may make it more suited to the evaluation of multi-component, non-aversive strategies (e.g. Horner et al. 1990). The CBI has a range of other potential uses, such as in the initial assessment of challenging behaviour and as part of the routine assessment of risk within services. It could also be used in the examination of relationships between individual topographies and other variables, such as quality of life.

Correspondence

Any correspondence should be directed to Prof. Chris Oliver, School of Psychology, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK (e-mail: c.oliver@bham.ac.uk).

References

Aman M. G., Singh N. N., Stewart A. W. & Field C. J. (1985) The Aberrant Behavior Checklist: a behavior rating scale for the assessment of treatment effects. American Journal of Mental Deficiency 89, 485–491.

Clements P. R., Bost L. W., Dubois Y. G. & Turpin W. B. (1980) Adaptive Behavior Scale Part Two relative severity of maladaptive behavior. American Journal of Mental Deficiency 84, 465–469.

Dagnan D., McEvoy J. & Sturmey P. (1995) The psychometric properties of a brief scale for assessing challenging behaviour. The Irish Journal of Psychology 16, 21–28.

Dagnan D., Trout A., Jones J. & McEvoy J. (1996) Changes in quality of life following a move from hospital to small community unit for people with learning disabilities and challenging behaviour. British Journal of Developmental Disabilities 42, 125–135.

Emerson E. (1995) Challenging Behaviour: Analysis and Intervention in People with Learning Disabilities. Cambridge Press, Cambridge.

Emerson E. (1998) Working with people with challenging behaviour. In: Clinical Psychology and People with Intellectual Disabilities (eds Emerson E., Hatton C., Bromley J. & Caine A.), pp. 127–153. Wiley, Chichester.

Emerson E., Alborz A., Kiernan C., Mason H., Reeves D., Swarbrick R. & Mason L. (1997) The HARC Challenging Behaviour Project Report 5. In: The Treatment and Management of Challenging Behaviour. Hester Adrian Research Centre, University of Manchester, Manchester.

Felce K. & Lowe D. (1995) The definition of challenging behaviour in practice. Mental Handicap 23 (3), 118–123.

Harris P., Humphreys J. & Thomson G. (1994) A checklist of challenging behaviour. The development of a survey instrument. Mental Handicap Research 7, 118–133.

Havercamp S. M. & Reiss S. (1996) Composite versus multiple rating scales in the assessment of psychopathology in people with mental retardation. Journal of Intellectual Disability Research 40 (2), 167–179.

Hoefkens A. & Allen D. (1990) Evaluation of a special behaviour unit for people with mental handicaps and challenging behaviour. Journal of Mental Deficiency Research 34, 213–228.

Holmes C. B. & Batt R. (1980) Is choking others really equivalent to stamping one’s feet? An analysis of adaptive behavior scale items. Psychological Reports 46, 1277–1278.

Horner R. H., Dunlap G., Koegel R. L., Carr E. G., Sailor W., Anderson J., Albin R. W. & O’Neill R. E. (1990) Toward a technology of ‘nonaversive’ behavioral support. Journal of the Association for Persons with Severe Handicaps 15, 125–132.



Kiernan C. & Kiernan D. (1994) Challenging behaviour in schools for pupils with severe learning difficulties. Mental Handicap Research 7, 177–201.

Kushlick A., Trower P. & Dagnan D. (1997) Applying cognitive behavioural approaches to the carers of people with learning disabilities. In: Cognitive Therapy for People with Learning Disabilities (eds B. Kroese, D. Dagnan & K. Loumidis), pp. 95–107. Routledge, London.

LaVigna G. & Donnellan A. (1986) Alternatives to Punishment: Solving Behavior Problems with Non-Aversive Strategies. Irvington, New York.

Lowe K., Felce D. & Blackman D. (1995) People with learning disabilities and challenging behaviour: the characteristics of those referred and not referred to specialist teams. Psychological Medicine 25, 595–603.

Lowe K., Felce D. & Blackman D. (1996) Challenging behaviour: the effectiveness of specialist support teams. Journal of Intellectual Disability Research 40, 336–347.

McDevitt S. C., McDevitt S. C. & Rosen M. (1977) The Aberrant Behavior Checklist. Part II. A cautionary note and suggestions for revisions. American Journal of Mental Deficiency 82, 210–212.

Meyer L. & Janney R. (1989) User-friendly measures of meaningful outcome: evaluating behavioral interventions. Journal of the Association for Persons with Severe Handicaps 14, 263–270.

Nihira K., Foster R., Shellaas M. & Leland H. (1974) AAMD Adaptive Behavior Scale. American Association on Mental Deficiency, Washington, DC.

Oliver C., Murphy G. H. & Corbett J. A. (1987) Self-injurious behaviour in people with mental handicap: a total population study. Journal of Mental Deficiency Research 31, 147–162.

Quereshi H. (1994) The size of the problem. In: Severe Learning Disabilities and Challenging Behaviours: Designing High Quality Services (eds E. Emerson, P. McGill & J. Mansell), pp. 73–102. Chapman & Hall, London.

Russell O. & Harris P. (1993) Assessing the prevalence of aggressive behaviour and the effectiveness of interventions. In: Research to Practice: Implications of Research on the Challenging Behaviour of People with Learning Disability (ed. C. Kiernan), pp. 37–52. BILD Publications, Avon.

Spreat S. (1982) An empirical analysis of item weighting on the Adaptive Behavior Scale. American Journal of Mental Deficiency 87 (2), 159–163.


