
Standardized Scales

Standardization

Use of identical procedures to collect, score, interpret, and report results of a measure

Assures that differences over time or among different people are due to the variable being measured and not to different measurement procedures

What are Standardized Scales?

Set of uniform procedures to collect, score, interpret, and report numerical results

Usually have norms and empirical evidence of reliability and validity

Typically include multiple items aggregated into one or more composite scores

Frequently used to measure constructs

Construct

Complex concept (e.g., intelligence, well-being, depression)

Inferred or derived from a set of interrelated attributes (e.g., behaviors, experiences, subjective states, attitudes) of people, objects, or events

Typically embedded in a theory

Oftentimes not directly observable but measured using multiple indicators

Evaluating and Selecting Standardized Scales

Purpose

Reference populations and normative groups

Reliability

Validity

Practical considerations

Purpose

Identify whether or not a client has a significant problem

Measure and monitor your client’s outcomes to determine if your client is making satisfactory progress

Reference Population

Population of people for which a measure is intended and from which a normative group is sampled and norms are created

Normative Group

Representative sample of a reference population, used to estimate norms for that population and, more generally, used to develop and test standardized measures

Also known as a “standardization group” or “standardization sample”

[Diagram: Population, Sample]

Reliability

Internal consistency reliability (coefficient alpha) (most important)

Interrater reliability (sometimes)

Test-retest reliability
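As a rough illustration of internal consistency, coefficient alpha can be computed from a table of item responses with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below assumes complete data (no missing responses) and at least two items; the function name and the sample responses are hypothetical.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """Coefficient (Cronbach's) alpha for a list of respondents.

    responses: list of rows, one per respondent, each a list of item scores.
    Assumes every row has the same number of items and no missing values.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents (columns of the table).
    item_variances = [pvariance(column) for column in zip(*responses)]
    # Variance of the total (summed) score across respondents.
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 4 respondents, 3 items rated 1 to 5.
example = [
    [1, 2, 1],
    [3, 3, 4],
    [4, 5, 4],
    [2, 2, 3],
]
print(round(coefficient_alpha(example), 2))
```

Using the population variance for both the items and the total keeps the ratio consistent; using the sample variance for both gives the same result.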

Validity

Face

Content

Criterion

Construct

Sensitivity to change especially important

Practical Considerations

Time

Effort

Training

Cost

Availability

Acceptability (e.g., to clients and practitioners)

Decisions, Decisions…

Who, where, when, and how often to collect outcome data

Who

Client

Practitioner

Relevant others

Independent evaluators

Where and When

Private, quiet, physically comfortable location

Complete at about the same time and under the same conditions on a regular basis

How Often

Regular, frequent, pre-designated intervals

Often enough to detect significant changes in the problem, but not so often that it becomes problematic

In general, about once per week

Engage and Prepare Clients

Be certain the client understands and accepts the value and purpose of monitoring progress

Discuss confidentiality

Present measures with confidence

Don’t ask for info the client can’t provide

Engage and Prepare Clients (cont’d)

Be sure the client is prepared

Be careful how you respond to information

Use the information that is collected

Administering, Scoring, and Interpreting Standardized Scales

Score, scoring formula, composite score

Unidimensional and multidimensional scales

Cut scores

Reverse-worded items

Reliable change, reliable improvement, reliable deterioration

Clinically significant improvement

Expected treatment response

Score

Generic term for a number derived from a measure that represents the quantity or amount of an attribute or observation (e.g., number of times a behavior is observed, value obtained from a standardized scale)

Interpret in context of all available quantitative and qualitative information

Scoring

Procedure by which data from a measure are used to produce a score (e.g., number of times a behavior occurs or value on a standardized scale) or category (e.g., diagnostic category)

Scoring Formula

A mathematical rule by which data from a measure are used to produce a score (e.g., sum or average of responses to items on a multi-item standardized scale)

[Diagram: Item 1, Item 2, Item 3, Score]
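The two scoring formulas named above (sum and average of item responses) can be sketched as follows. The function names and the three item responses are hypothetical, and the sketch assumes every item was answered.

```python
def sum_score(item_responses):
    """Scoring formula: total of the item responses."""
    return sum(item_responses)


def mean_score(item_responses):
    """Scoring formula: average of the item responses."""
    return sum(item_responses) / len(item_responses)


# Hypothetical responses to Item 1, Item 2, and Item 3.
items = [2, 3, 1]
print(sum_score(items))   # 6
print(mean_score(items))  # 2.0
```

An average keeps the score on the same range as the individual items, which can make it easier to compare scales with different numbers of items.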

Composite Score

Score that combines results from two or more related items or other measures using a specified formula (e.g., percentage of items answered correctly on a statistics test)

[Diagram: Item 1, Item 2, Item 3, Score]

Unidimensional Scale

Scale that measures a single attribute or construct (e.g., depression). (Contrast with multidimensional scale.)

Multidimensional Scale

Scale that measures two or more distinct but related attributes or constructs; the measures of the different attributes or constructs are referred to as “subscales”

[Diagram: Global Distress, Subjective Well-Being, Problems & Symptoms, Social Functioning]
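Scoring a multidimensional scale usually means applying a scoring formula separately to the items that belong to each subscale. A minimal sketch, assuming sum scoring; the subscale names and the item-to-subscale assignments below are hypothetical and do not describe any particular instrument.

```python
# Hypothetical mapping of subscale name -> positions of its items (0-based).
SUBSCALES = {
    "subscale_a": [0, 1, 2],
    "subscale_b": [3, 4],
}


def subscale_scores(item_responses, subscales):
    """Return a sum score for each subscale."""
    return {
        name: sum(item_responses[i] for i in positions)
        for name, positions in subscales.items()
    }


# Hypothetical responses to a five-item, two-subscale measure.
responses = [1, 2, 3, 4, 0]
print(subscale_scores(responses, SUBSCALES))
# {'subscale_a': 6, 'subscale_b': 4}
```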

Cut Scores

Specific predetermined numerical values along a continuum of scores

Used to separate people into categories with distinct substantive interpretations (e.g., clinically depressed or not)

Used to make decisions (e.g., provide treatment for depression or not)

Only as good as the normative sample(s) from which they are derived

Interpret in context of all available quantitative and qualitative information
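Applying cut scores amounts to finding which interval of the score continuum a score falls in. The sketch below is illustrative only: the cut values and category labels are hypothetical, and it assumes a score exactly equal to a cut value belongs to the lower category.

```python
from bisect import bisect_left


def categorize(score, cut_scores, labels):
    """Place a score into a category defined by ascending cut scores.

    Needs one more label than there are cut scores. A score equal to
    a cut value is placed in the lower category (an assumption).
    """
    if len(labels) != len(cut_scores) + 1:
        raise ValueError("need exactly one more label than cut scores")
    return labels[bisect_left(cut_scores, score)]


# Hypothetical cut scores and labels, not taken from any real measure.
cuts = [10, 20]
labels = ["low", "moderate", "high"]
print(categorize(10, cuts, labels))  # low
print(categorize(11, cuts, labels))  # moderate
print(categorize(21, cuts, labels))  # high
```

The category is a starting point for a decision, not the decision itself; it still has to be read alongside the other information available about the client.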

Reverse-Worded Item

Item for which smaller numbers indicate a higher score on the measured variable because the item is worded to mean the opposite of the measured variable
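Before items are combined, responses to reverse-worded items are typically recoded so that larger numbers mean more of the measured variable on every item. A common way to do this is to subtract the response from the sum of the lowest and highest response options. The sketch below assumes a 1 to 5 response format and sum scoring; the function names and the example items are hypothetical.

```python
def reverse_score(response, scale_min, scale_max):
    """Recode a reverse-worded item: lowest option becomes highest, etc."""
    return scale_min + scale_max - response


def score_with_reversals(item_responses, reversed_positions, scale_min, scale_max):
    """Sum score after recoding the items at the given 0-based positions."""
    total = 0
    for position, response in enumerate(item_responses):
        if position in reversed_positions:
            response = reverse_score(response, scale_min, scale_max)
        total += response
    return total


print(reverse_score(2, 1, 5))  # 4

# Hypothetical three-item measure; the second item is reverse-worded.
print(score_with_reversals([1, 5, 2], {1}, 1, 5))  # 1 + 1 + 2 = 4
```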

Reliable Change

Change in a score from one time to another that is more than expected just from random measurement error

Clinical significance.xls

Reliable Improvement

Improvement in a score from one time to another that is more than expected just from random measurement error

Reliable Deterioration

Deterioration in a score from one time to another that is more than expected just from random measurement error
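The slides do not say which method is used to decide whether a change exceeds measurement error. One widely used option is the Jacobson-Truax reliable change index (RCI), which divides the change score by the standard error of the difference, computed from the scale's standard deviation and reliability. The sketch below assumes that method and the conventional 1.96 criterion; the function names and all numeric values are hypothetical.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """Jacobson-Truax RCI: change divided by the standard error of the difference."""
    sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2) * sem            # standard error of the difference
    return (post - pre) / s_diff


def classify_change(pre, post, sd, reliability,
                    higher_is_better=False, criterion=1.96):
    """Label a pre-to-post change as reliable improvement, deterioration, or neither."""
    rci = reliable_change_index(pre, post, sd, reliability)
    if abs(rci) <= criterion:
        return "no reliable change"
    improved = rci > 0 if higher_is_better else rci < 0
    return "reliable improvement" if improved else "reliable deterioration"


# Hypothetical values: SD of 5 and reliability of .84 from a normative sample,
# on a measure where lower scores mean fewer problems.
print(classify_change(pre=20, post=10, sd=5, reliability=0.84))
print(classify_change(pre=20, post=18, sd=5, reliability=0.84))
print(classify_change(pre=10, post=20, sd=5, reliability=0.84))
```

The `higher_is_better` flag matters because the same numeric drop is an improvement on a symptom measure and a deterioration on a well-being measure.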

Clinically Significant Improvement

Change that occurs when a client’s measured functioning on a standardized scale is:

In the dysfunctional range before intervention (e.g., greater than 5 on the QIDS-SR)

In the functional range after intervention (e.g., 5 or below on the QIDS-SR)

Change is reliable
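The three criteria above can be combined into a single check. This is a minimal sketch that assumes higher scores mean more dysfunction and that scores above the cut score fall in the dysfunctional range, as in the QIDS-SR example (cut score of 5). Whether the change is reliable is passed in as a yes/no value, since the method for establishing that is a separate choice. The function name and the example scores are hypothetical.

```python
def clinically_significant_improvement(pre, post, cut_score, change_is_reliable):
    """True only when all three criteria are met.

    Assumes higher scores indicate more dysfunction, so scores above
    cut_score are dysfunctional and scores at or below it are functional.
    """
    started_dysfunctional = pre > cut_score
    ended_functional = post <= cut_score
    return started_dysfunctional and ended_functional and change_is_reliable


# Hypothetical pre and post scores, using the cut score of 5 from the example.
print(clinically_significant_improvement(12, 4, 5, True))   # True
print(clinically_significant_improvement(12, 6, 5, True))   # False: still above the cut
print(clinically_significant_improvement(4, 2, 5, True))    # False: started in functional range
print(clinically_significant_improvement(12, 4, 5, False))  # False: change not reliable
```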

Clinically Significant Improvement (cont’d)

Interpret in context of all available quantitative and qualitative information

Does not guarantee a meaningful change in a client’s real-world functioning or quality of life

Only as good as the normative sample(s) from which it is derived

Does not speak to the question of whether it was your intervention or something else that caused the change

Expected Treatment Response

Session-by-session progress is determined in comparison to normative data from ongoing responses to treatment of thousands of clients

Feedback used in real time to monitor client progress and modify services as needed to reduce treatment failures and increase overall effectiveness

Global Rating

Single rating based on a rater’s integration of information about numerous factors (e.g., global rating of change, improvement, or social functioning)

Single-Item Global Standardized Scales

Global Assessment of Functioning (GAF)

Children’s Global Assessment Scale (CGAS)

Social and Occupational Functioning Assessment Scale (SOFAS)

Global Assessment of Relational Functioning (GARF)

Potential Advantages of Standardized Scales

Pretested for reliability and validity

Structured, so information less likely to be missed

Can be used to compare individual functioning to normative group functioning

Can be efficient and simple to use

Cautions in the Use of Standardized Scales

May not measure concept suggested by scale name

Different measures of the same concept may not be equivalent

Sometimes limited information about reliability and validity

Concepts as measured may not be completely relevant to individual clients

Resources

Compendiums of measures (see Appendix B)

Web measurement resources (see Appendix B)
