
# Nutrition and pregnancy


test developer correlates the scores on one testing with scores on a second testing that uses the same instrument and the same group of people. This correlation results in a test-retest reliability coefficient. One problem with using the identical test for test-retest reliability is that the person taking the test may learn from the test administration and carry the learning over to the next test session. This carryover effect could invalidate the reliability estimate because "all other sources of inconsistency" would not have been held constant.
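In present-day terms, the test-retest coefficient is simply the Pearson correlation between the two administrations. A minimal sketch in Python, with hypothetical scores, illustrates the computation:

```python
# Test-retest reliability: Pearson correlation between two
# administrations of the same instrument to the same people.
# All scores below are invented for illustration.

def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

first_testing  = [12, 15, 9, 20, 17, 11, 14, 18]
second_testing = [13, 14, 10, 19, 18, 10, 15, 17]

r_test_retest = pearson_r(first_testing, second_testing)
```

A coefficient near 1.0 indicates that respondents kept nearly the same rank order across the two sessions; carryover learning, as noted above, can inflate this estimate.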

Reliability derived from equivalence compares the scores on 2 parallel test forms that purport to measure the same content domain or construct. The scores on one form of the test are correlated with the scores on the other form. Equivalent or parallel tests eliminate some of the problems, due to learning from the test, that may arise with test-retest on the same instrument. However, the validity of this approach relies heavily upon the extent to which the 2 forms are, in fact, parallel.

In split-half (e.g., odd-even, matched half) reliability, one constructs parallel forms within a single test and computes the correlation between the 2 parts.

Closely related to the split-half reliability estimate is internal consistency reliability, which refers to the degree to which each item or cluster of items relates to the total test score. The Kuder-Richardson formulae, Hoyt's ANOVA, and Cronbach's alpha are among the approaches commonly used to determine internal consistency reliability. These formulas use data from a single test administration to estimate how highly the test would correlate with a parallel test.
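Of these, Cronbach's alpha is the most general. A minimal sketch on invented rating-scale data (rows are respondents, columns are items) shows the single-administration computation:

```python
# Cronbach's alpha from one administration:
#   alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total scores))
# where k is the number of items.  Data are hypothetical.
from statistics import pvariance

responses = [
    [2, 3, 3, 4],
    [4, 4, 5, 5],
    [1, 2, 2, 2],
    [3, 3, 4, 4],
    [5, 4, 5, 5],
]

k = len(responses[0])                          # number of items
item_vars = [pvariance(col) for col in zip(*responses)]
total_var = pvariance([sum(row) for row in responses])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

With dichotomous (0/1) items this same formula reduces to Kuder-Richardson formula 20.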

Several aspects of an instrument may influence reliability. In general, the greater the number of items in a test, the higher will be the reliability. Longer tests have a greater chance of comprehensively sampling the domain being tested. However, practical considerations such as time and expense will limit test expansion as a means of improving reliability. Subscores of a test usually have a lower reliability coefficient than does the test as a whole. Subscore reliabilities must be determined separately; they cannot be assumed from the reliability of the total test.
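The relation between test length and reliability described above can be projected with the Spearman-Brown prophecy formula; a sketch, with an invented starting reliability:

```python
# Spearman-Brown prophecy formula: if a test is lengthened by a factor n
# with comparable items, the projected reliability is
#   r_new = n * r / (1 + (n - 1) * r)

def projected_reliability(r, n):
    return n * r / (1 + (n - 1) * r)

# Hypothetical 20-item test with reliability 0.70, doubled to 40 items:
r_doubled = projected_reliability(0.70, 2)
```

The same formula with n < 1 also shows why subscores, being shorter tests, usually carry lower reliability than the whole instrument.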

Item analysis procedures often reveal reasons for poor reliability of an instrument. The difficulty index is the percentage of respondents that choose the correct response. An item with a very low difficulty index may be too complicated for the purpose of the test or may be poorly worded. The discrimination index and the item-to-total test correlation are approaches to measurement of the degree to which high-scoring respondents score higher on the item than do low-scoring respondents. High, positive values are desirable. Items that have negative discrimination indices are answered correctly more often by low-scoring than by high-scoring respondents. An item might have a low discrimination index because it is so easy or difficult that everyone scores similarly, because it is ambiguous or confusing, or because the key is incorrect. Revision or deletion of items with low or negative discrimination indices or item-to-total test correlations usually will improve reliability of the instrument.

CONCLUSIONS

In considering the overall worth of an instrument, an adequate reliability is a necessary but not a sufficient condition to justify the instrument. Unless the instrument is valid for its specific purpose, no matter how high the reliability coefficient, the instrument has no value. For example, a food attitude instrument with a reliability of 0.90 cannot properly be used as a predictor of food choice behavior unless the predictive validity of attitude and behavior has been established. Furthermore, we cannot assume that because a test is valid and reliable for one group, it will be equally valid and reliable for all groups. Validity and reliability are dependent upon the characteristics of the test subjects and the purpose of the assessment.

The bottom line on the worth of a research study is the quality of the instruments used to obtain the data. The primary burden of responsibility for quality instruments rests with the test developer. However, this burden is shared by users and interpreters of the data, particularly with respect to understanding the limitations of instruments.

ACKNOWLEDGMENT

The authors would like to express their appreciation for the critical review and suggestions offered by Edward Haertel, Geneva Haertel, Judy Brun, Linda Junker, Peggy Uguroglu, Susan Levy, and Ernest Pascarella during various stages of development of this paper.

LITERATURE CITED

1 American Psychological Association. Joint Committee of the American Psychological Association, American Educational Research Association, and National Council on Measurement in Education. Standards for educational and psychological tests. Washington, D.C.: American Psychological Assn., 1974, 76 pp.

2 Buros, O. K., ed. The eighth mental measurements yearbook. 2 vols. Highland Park, N.J.: Gryphon Press, 1978, 2,182 pp.

3 Anastasi, A. Psychological testing. 4th ed. New York: Macmillan Publishing Co., 1976, 750 pp.

4 Thorndike, R. L., and E. Hagen. Measurement and evaluation in psychology and education. 4th ed. New York: John Wiley & Sons, 1977, 693 pp.

5 Popham, W. J., ed. Criterion-referenced measurement: An introduction. Englewood Cliffs, N.J.: Educational Technology Pubs., 1971, 108 pp.

6 Wolf, R. M. Evaluation in education: Foundations of competency assessment and program review. New York: Praeger Pubs., 1979, 217 pp.

7 Cronbach, L. J. Essentials of psychological testing. 3d ed. New York: Harper & Row Pubs., 1970, 752 pp.

NUTRITION AND PREGNANCY

A supplement to the American Journal of Clinical Nutrition [34(4):655-817, 1981] presents the proceedings of a workshop entitled "Nutrition of the Child: Maternal Nutritional Status and Fetal Outcome." Conference participants addressed 3 main topics: the assessment of the nutritional status of the mother and fetus, the relationship of maternal status to fetal outcome, and the impact of nutritional intervention. Many papers relate more closely to the specific research interests of the authors than to a broad comprehensive review of the various topics. Nonetheless, the proceedings gather together a wealth of information and opinion on nutrition and pregnancy. The workshop participants identified priority topics for further research in this area: examination of the usefulness of developmental standards, the limits and benefits of nutrition intervention, and the relationship of nutritional status to psychological development. S.M.O.

VOLUME 13 NUMBER 3 1981

JOURNAL OF NUTRITION EDUCATION 85