CARMA Internet Research Module: Scale Development

Scale Development for the Web

CARMA Internet Research Module

Jeff Stanton

Scale Development for the Web

• …is just like scale development for paper-and-pencil instruments, except…

• Diminishing response rates make shorter scales (3-5 items) more critical

• Having catchy, interesting, easy-to-read item content also encourages persistence with your study

• Reduced overall instrument length affects choices about the generality and breadth of measures

Scale Development Steps

• Scale/concept development and definition (literature & researcher)

• Item generation (subject matter experts)

• Item review (subject matter experts)

• Pilot test psychometrics (item variance, internal consistency)

• Cull items based on statistical and judgmental criteria (researcher; subject matter experts)

• Secondary pilot test with initial evidence of nomological network (researcher; subject matter experts)

• Preliminary analysis of validation evidence (researcher; subject matter experts)

• Validation with experimental evidence or multi-trait, multi-method matrix (researcher; subject matter experts)

• Publication of psychometric and validity evidence (researcher)

Scale/concept development and definition (literature & researcher)

• The development of any scale should begin with a literature review of related concepts or constructs

• Based on ideas in the literature the researcher should develop a definition of the new construct to be measured

• The new construct should be defined positively (what it is) and negatively (what it isn’t)

• The rationale for creating the new construct and measure should be fleshed out at this time

Item generation (subject matter experts)

• Armed with the construct definition, a panel of experts (faculty, students, industry experts, practitioners, etc.) can generate an initial pool of items

• The pool should contain 5-10 times as many items as one expects to include in the final measure

• One can use a range of brainstorming techniques to generate item ideas

• Web surveys can be useful for collecting item ideas!

• The response format should be considered at this time as well; depending on the construct, a Likert, frequency, intensity, pair-choice, checklist, semantic differential or other scale format may be suitable

Item review (subject matter experts)

• Generally, after an initial item generation activity, one should use sorting techniques to organize the items into factors or banks

• Sorting can also be used for review by new SMEs; reviewed items can be kept, held for editing or discarded

• Final item pool should be presented with appropriate response format to a final set of SMEs prior to pilot testing

Pilot test psychometrics (item variance, internal consistency)

• Without worrying too much about validity concerns at this stage, the items should be fielded for response by a group of appropriate participants

• Generally, a minimum of responses per item fielded should be collected

• After item data are collected, screened, and cleaned, calculate basic item statistics such as mean, variance, skewness, inter-item correlations, and internal consistency
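The basic item statistics listed above can be computed in a few lines. The following is a minimal Python sketch using only the standard library; the response data, the item names, and the helper functions (`skewness`, `pearson`, `cronbach_alpha`) are hypothetical illustrations rather than part of the original module, and Cronbach's alpha is used here as one common index of internal consistency.

```python
from itertools import combinations
from statistics import mean, variance

# Hypothetical pilot data: one list of responses (1-5 scale) per item,
# with respondents in the same order across items.
responses = {
    "item1": [4, 5, 3, 2, 4, 1],
    "item2": [4, 4, 3, 2, 5, 1],
    "item3": [5, 4, 2, 2, 4, 2],
}

def skewness(xs):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def pearson(xs, ys):
    """Pearson correlation between two equally long response lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def cronbach_alpha(columns):
    """Cronbach's alpha from a list of item columns."""
    k = len(columns)
    totals = [sum(row) for row in zip(*columns)]  # scale score per respondent
    item_variance_sum = sum(variance(col) for col in columns)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Per-item mean, variance, and skewness.
for name, col in responses.items():
    print(name, round(mean(col), 2), round(variance(col), 2), round(skewness(col), 2))

# Inter-item correlations for every pair of items.
for a, b in combinations(responses, 2):
    print(a, b, round(pearson(responses[a], responses[b]), 2))

alpha = cronbach_alpha(list(responses.values()))
print("alpha", round(alpha, 2))
```

With real pilot data, each list would hold one item's cleaned responses; six respondents are used here only to keep the sketch short.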

Cull items based on statistical and judgmental criteria (researcher; subject matter experts)

• Use the basic statistics to delete (or hold for editing) those items that performed poorly

• If there are sufficient data, some preliminary work with exploratory factor analysis can be used to assess factor purity and make decisions about whether a unitary or faceted scale is more desirable

• Items with borderline statistical properties should be considered for editing by SMEs before being discarded completely: use a combination of statistical and judgmental criteria to decide
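One statistical criterion often used for this kind of culling is the corrected item-total correlation: each item is correlated with the sum of the remaining items, and items with low values are flagged for SME review. The sketch below assumes that approach; the data, the function names, and the 0.30 cutoff are illustrative assumptions, not values given in this module.

```python
from statistics import mean

# Hypothetical pilot data; item4 is deliberately out of step with the others.
responses = {
    "item1": [4, 5, 3, 2, 4, 1],
    "item2": [4, 4, 3, 2, 5, 1],
    "item3": [5, 4, 2, 2, 4, 2],
    "item4": [3, 1, 4, 3, 2, 4],
}

def pearson(xs, ys):
    """Pearson correlation between two equally long response lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def corrected_item_total(items):
    """Correlate each item with the sum of all the other items."""
    result = {}
    for name, col in items.items():
        others = [c for other, c in items.items() if other != name]
        rest_scores = [sum(row) for row in zip(*others)]
        result[name] = pearson(col, rest_scores)
    return result

def flag_for_review(items, cutoff=0.30):
    """Names of items whose corrected item-total correlation is below the cutoff."""
    return [name for name, r in corrected_item_total(items).items() if r < cutoff]

flagged = flag_for_review(responses)
print("hold for SME review:", flagged)
```

Flagged items are candidates for review, not automatic deletions; the judgmental criteria described above still apply.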

Secondary pilot test with initial evidence of nomological network (researcher; subject matter experts)

• The second pilot test will generally be on a diminished set of items, but not necessarily the final set; there may be rewritten items that have not been fielded before

• Field the items together with a few other related measures, some where a strong correlation is expected and some where no correlation is expected

• The demands on statistical power are greater here because you are looking both for significant correlations with some measures and for nil correlations with others; demonstrating a null result requires more statistical power; consult Cohen’s “A Power Primer” for guidance; use regression models
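To get a feel for the sample sizes involved, the required N for detecting a correlation of a given size can be approximated with the Fisher z transformation. This is a rough sketch under that approximation, not a reproduction of Cohen's tables; the function name `required_n` and the default alpha and power values are assumptions made for illustration.

```python
from math import atanh, ceil
from statistics import NormalDist

def required_n(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation r (two-tailed test)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be nonzero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    # Fisher z approximation: the standard error of atanh(r) is 1 / sqrt(N - 3).
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)

for r in (0.5, 0.3, 0.1):
    print(r, required_n(r))
```

The required N rises steeply as the expected correlation shrinks, which is why showing that a correlation is negligibly small takes far more respondents than detecting a strong one.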

Preliminary analysis of validation evidence (researcher; subject matter experts)

• This is the final adjustment step prior to an actual validity run; items can be discarded at this stage, but any rewriting should be very minimal

• Depending upon the amount of data you have collected and the maturity of earlier processes, it is possible to perform confirmatory factor analysis on these data

• The output of this stage should be a scale that is considered final and basically ready for publication (after the collection of another batch of validity evidence)

Validation with experimental evidence or multi-trait, multi-method matrix (researcher; subject matter experts)

• This is the “official” validation, whose statistical results will be reported for publication: give this study as much care and attention as any substantive study of a research topic

• Experimental validation procedures have several merits: a manipulated independent variable is not subject to the common method variance critique; the choice of a manipulation must be based in theory, hopefully the same theory that was initially used to define the construct; experimental methods (when successful) help allay concerns of spurious correlations with other measures

• Short of experimental evidence, another powerful strategy is the multi-trait, multi-method matrix (MTMM), although it is quite challenging to find measures captured by alternate methods; MTMM, when successful, is good for showing how the new measure is uniquely positioned to avoid capturing variance of unrelated measures while being related to, but distinct from, similar constructs
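The logic of an MTMM analysis can be sketched by sorting the correlations among measures into three groups: same trait measured by different methods (convergent evidence), different traits measured by the same method, and different traits measured by different methods. The example below is a minimal illustration with two traits and two methods; the trait labels, method labels, scores, and function names are all hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical scores keyed by (trait, method) for the same six respondents.
scores = {
    ("A", "self"): [1, 2, 3, 4, 5, 6],
    ("A", "peer"): [2, 1, 3, 5, 4, 6],
    ("B", "self"): [3, 5, 1, 6, 2, 4],
    ("B", "peer"): [4, 5, 1, 6, 3, 2],
}

def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def classify(m1, m2):
    """Label a pair of distinct (trait, method) measures by its MTMM block."""
    if m1[0] == m2[0]:
        return "monotrait-heteromethod"    # same trait, different methods
    if m1[1] == m2[1]:
        return "heterotrait-monomethod"    # different traits, same method
    return "heterotrait-heteromethod"      # different traits and methods

def mtmm_summary(data):
    """Mean correlation within each MTMM block."""
    blocks = {}
    for m1, m2 in combinations(data, 2):
        blocks.setdefault(classify(m1, m2), []).append(pearson(data[m1], data[m2]))
    return {label: mean(rs) for label, rs in blocks.items()}

summary = mtmm_summary(scores)
for label, value in summary.items():
    print(label, round(value, 2))
```

A pattern supporting the new measure would show the monotrait-heteromethod correlations clearly exceeding both heterotrait blocks; a full analysis would inspect the individual correlations, not only the block means.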

Publication of psychometric and validity evidence (researcher)

• Not many new scale developments get this far, and there is generally a dearth of journals that will publish validation studies

• Nonetheless, this is the sine qua non of validation: peer review of the techniques used to support the goodness and usability of the new scale
