THE DEVELOPMENT AND EVALUATION OF
A DIAGNOSTIC SYSTEM OF REMEDIATION
FOR AN AUTO-TUTORIAL COURSE IN
GENERAL COLLEGE CHEMISTRY
A Dissertation
for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
Gary William VanKempen
1977
This is to certify that the
thesis entitled
THE DEVELOPMENT AND EVALUATION OF A
DIAGNOSTIC SYSTEM OF REMEDIATION
FOR AN AUTO-TUTORIAL COURSE IN
GENERAL COLLEGE CHEMISTRY
presented by
GARY WILLIAM VANKEMPEN
has been accepted towards fulfillment
of the requirements for
Ph.D. degree in Chemistry and Administration & Higher
Education
Major professor
Date December 10, 1976
ABSTRACT
THE DEVELOPMENT AND EVALUATION OF A
DIAGNOSTIC SYSTEM OF REMEDIATION
FOR AN AUTO-TUTORIAL COURSE IN
GENERAL COLLEGE CHEMISTRY
BY
Gary William VanKempen
A unique system of diagnosis and remediation has
been developed for an auto-tutorial computer managed
course in general college chemistry. The system is based
upon an analysis of examination questions used in the
course to identify the kind or kinds of thinking required
by each question. A task analysis was used to identify
six important kinds of thinking which are: memorization,
translation, classification, visualization, reasoning
and reasoning with math.
The validity of these categories was tested by
examining the agreement obtained when several content
experts classified the questions independently. The
highest interclassifier agreement (over 90 percent) was
obtained for the memorization and reasoning categories.
For each of the other categories, the agreement was
approximately 75 percent. A further test of validity
compared the inter-item correlation coefficients between
pairs of questions each of the same kind of thinking and
pairs of questions in which the kinds of thinking were not
matched. The correlations from questions of the same kind
of thinking were significantly higher (α = .05) than
correlations for different kinds of thinking for the
memorization and reasoning with math categories.
An experimental remedial system was designed in
which students who scored below 60 percent on previous
examination questions received remediation based on the
kind of thinking in which they were most deficient. The
two categories on which the remediation was based were
memorization and reasoning with math. A distinction was
made between scores obtained from tests involving content
discussed in remediation (the initial learning score)
and scores obtained when the student was being introduced
to new material (the transfer score). The latter repre-
sents the transfer of training in a particular kind of
thinking to a new topic in the course.
Supplementary instructional materials and classes
were made available to students for the first four weeks
of a ten-week term. For both the memorization and the
reasoning with math categories, the experimental group
scored significantly higher than the control group on the
initial learning score but there was no significant
difference between the groups on the transfer score.
Thus remediation seems to improve performance on material
discussed during the remedial class, but the improvement
is not maintained when new material is introduced.
THE DEVELOPMENT AND EVALUATION OF A
DIAGNOSTIC SYSTEM OF REMEDIATION
FOR AN AUTO-TUTORIAL COURSE IN
GENERAL COLLEGE CHEMISTRY
BY
Gary William VanKempen
A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Departments of Chemistry and Higher Education and Administration
1977
ACKNOWLEDGMENTS
I would like to express my sincere thanks to Dr. Robert
N. Hammer for his guidance throughout my graduate study.
A special thanks is also extended to Dr. Ed Smith for
his friendship and for his assistance with this project.
Appreciation is also extended to Dr. Jack B. Kinsinger
who provided a much needed inspiration.
This work was supported in part by a grant from the
Educational Development Program of Michigan State Uni-
versity and a grant from the Alfred P. Sloan Foundation
administered through the College of Engineering at Michigan
State. Appreciation is extended to the Chemistry Depart-
ment of Michigan State University for supporting me during
the initial stage of my graduate study and providing me
with my first significant opportunity to teach.
Finally, I would like to thank my parents, Peter and
Manual VanKempen, and my wife, Dorinda, for their continued
support and love.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION: AN OVERVIEW OF THE PROJECT
   The Problem
   Goals of the Project
   Research Questions
   Generalizability of the Results
HISTORICAL: A REVIEW OF THE LITERATURE
   Classifying the Outcomes of Education
   Gagne's Classification of Learning
   Science Processes
   Empirical Support of Taxonomies
   Diagnosis and Remediation
   Diagnosis Based on Piagetian Theory
   The Effects of Diagnostics
EXPERIMENTAL METHODS AND PROCEDURES
   The Instructional Setting
   The Computer Management System
   The Classification Scheme
   Developing the Categories: A Task Analysis
   Kinds of Thinking
   Criteria and Procedures for Characterizing Questions
   Validation of the Categories
   Reliability
   Agreement Among Classifiers
   An Analysis of Correlation Coefficients
   The Remedial System: Project CLIC
   Selecting the Sample
   The Design
   The Treatment
   Evaluation of the Remedial System
SUMMARY AND DISCUSSION
   Overview of the Project
   The Validity of the Classification System
   Reliability
   Interclassifier Agreement
   Analysis of Correlation Coefficients
   Evaluation of Project CLIC
IMPLICATIONS FOR FUTURE RESEARCH AND DEVELOPMENT
   A Chemical Education Laboratory
   The Difficulty Index
   Alternative Validity Studies
   Selecting the Sample
   A Piagetian Classification
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D
LIST OF APPENDICES
Letter   Referred to on Page   Title
A        23                    Fortran Programs
B        34                    Characterization of Questions
C        40                    Calculating Grouped Average Correlation Coefficients
D        49                    Outline of CLIC Material
LIST OF TABLES
Table
I. Interclassifier Agreement
II. Averages and Standard Deviations for Grouped Sets of Correlation Coefficients
III. Planned Comparisons of Grouped Correlation Coefficients
IV. Two-way Analysis of Variance of Grouped Correlation Coefficients
V. T-test Evaluation of Project CLIC
VI. A Summary of Interclassifier Agreement
INTRODUCTION: AN OVERVIEW OF THE PROJECT
The Problem
It is obvious to anyone teaching chemistry at the
introductory college level that many students find the
subject difficult to comprehend. Unfortunately, the
sources of this difficulty are much less obvious, as is
the remedy. The problem is compounded by the trend toward
less stringent admission requirements, the increasing
number of special programs for students who are seeking
post-secondary education, but do not meet standard admis-
sion requirements, and the large enrollments in courses
for non-majors which are prerequisite for courses in other
academic areas. College science teachers are being asked
to deal effectively with students who traditionally would
have been considered incapable of doing science, either
because of poor backgrounds in high school science or a
lack of sufficient basic abilities. These students, when
placed into courses with more capable students, present a
difficult problem for instructors. The more heterogeneous
a group of students becomes, the more difficult it is to
present materials and tasks which challenge students, but
represent reasonable expectations. There have been many
attempts to solve this problem through programmed instruc-
tion, personalized systems of instruction, or through
traditional remediation.
While there are advantages and disadvantages accompany-
ing each of these methods, one important advantage of re-
medial instruction is its supplementary nature. Any pro-
cedure for remediation can be developed as an addition
to--rather than a modification of--an existing course.
Remedial instruction, however, can only be effective to the
extent that it focuses accurately on student learning
problems. The identification of these problems is a dif-
ficult task which usually requires a substantial amount of
student-teacher contact. Often high enrollments and an
emphasis on decreased cost prevent much of the contact
between students and instructors that is necessary for
any systematic look at student learning problems.
Goals of the Project
This project begins with the assumption that there
are a significant number of students in introductory fresh-
man chemistry courses who do not achieve at the level they
could because they have not developed some of the skills
and abilities necessary for success. We realize that there
are other factors, such as motivation, attitude, and prior
experience in studying chemistry which also affect a student's
achievement. We have, however, chosen to focus our atten-
tion on a specific set of abilities to be identified through
a study of student performance on course examinations.
This set of abilities will be divided into categories which
are referred to as "kinds of thinking". These kinds of
thinking are identified through a task analysis of the
questions typically asked on freshman chemistry exams.
From this task analysis, generalized task descriptions are
developed which are then converted into descriptions of
specific kinds of thinking. Each exam question can then
be characterized according to the kinds of thinking it
requires.
The specific goals of this project can be considered
in three parts:
a. To develop and validate a series of criteria by
which one can classify questions commonly asked
in general chemistry. This classification will
be based upon the kinds of thinking required to
answer each question.
b. To develop a system of diagnosis which identifies
students who are scoring poorly on a group of
questions which require the same kind of thinking.
c. To develop and evaluate remedial materials which
focus on particular kinds of thinking as they apply
to general chemistry.
The purpose of the criteria described in part "a"
above is to guide the characterization of test items in
terms of the kinds of thinking required to answer them.
Once questions are characterized in this manner, it should
be possible to group them into separate classes depending
upon whether the question does or does not require specific
kinds of thinking. Examination of student scores on a
particular class of questions may reveal a pattern of misses
which could be traced to a lack of ability to perform the
kind of thinking required. Remediation which focuses on
these thought processes as they relate to chemistry, might
then improve scores on this particular class of questions.
Research Questions
In general, the project can be viewed as a study of
student performance on examination questions in general
chemistry as a function of the kinds of thinking required
by the questions. To facilitate the discussion of the
research questions, the following definitions have been
established.
a. Class of questions: All questions which are classified
as having a particular kind of thinking or
series of kinds of thinking in common.
b. Topic: Any specified content which is discussed
sequentially during a course.
c. Subtest score: The percentage correct responses
to a specified group of questions which are all
members of the same class of questions.
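The subtest score defined above, together with the 60 percent cutoff mentioned in the abstract, can be illustrated with a short sketch. It is not taken from the Fortran programs used in the course: the function names, the data layout, the question labels, and the sample responses are all hypothetical, and a question is assumed to belong to every class whose kind of thinking it requires.

```python
from typing import Dict, List, Set

# Hypothetical classification: question id -> kinds of thinking it requires.
QUESTION_KINDS: Dict[str, Set[str]] = {
    "q1": {"memorization"},
    "q2": {"memorization"},
    "q3": {"reasoning with math"},
    "q4": {"reasoning with math"},
    "q5": {"memorization", "translation"},
}


def subtest_score(responses: Dict[str, bool], kind: str) -> float:
    """Percentage of correct responses on the answered questions requiring `kind`."""
    items = [q for q, kinds in QUESTION_KINDS.items()
             if kind in kinds and q in responses]
    if not items:
        raise ValueError(f"no answered questions require {kind!r}")
    correct = sum(1 for q in items if responses[q])
    return 100.0 * correct / len(items)


def deficient_kinds(responses: Dict[str, bool], kinds: List[str],
                    cutoff: float = 60.0) -> List[str]:
    """Kinds of thinking whose subtest score falls below the cutoff (assumed 60)."""
    return [k for k in kinds if subtest_score(responses, k) < cutoff]


# Hypothetical student record: True means the question was answered correctly.
student = {"q1": True, "q2": True, "q3": False, "q4": True, "q5": True}
flagged = deficient_kinds(student, ["memorization", "reasoning with math"])
```

In this invented record the student answers every memorization question correctly but only one of the two reasoning with math questions, so only the latter class would be flagged for remediation.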
The project is designed to answer the following research
questions.
a. Does student performance on examination questions
support the categories proposed in this classifica-
tion scheme?
b. Do the remedial materials which focus on the kind
of thinking required in a specific class of questions
covering a particular topic in chemistry improve
student performance on that class of questions for
that particular topic?
c. Once students receive remediation on a particular
kind of thinking as it applies to general chemistry
in one topic, will there be any improvement in
their performance on questions requiring the same
kind of thinking but covering a different topic?
That is, is there transfer of training of a par-
ticular kind of thinking to a new content within
the course?
Generalizability of the Results
Since the content of general chemistry is fairly stan-
dard throughout the majority of freshman college courses
and many high school courses, we expect that the classi-
fication scheme will be useful to anyone teaching general
chemistry. Since the students involved are representative
college freshmen, the identified learning problems are
likely to be present in varying degrees in any general
chemistry course. The system of diagnosis is not necessarily
restricted to an individualized course, although
frequent examinations and careful record keeping are likely
to be essential. The success of the diagnostic system and
the supplementary instructional materials in improving
students' examination scores in general chemistry at Michigan
State will be a clear indication of their probable success
at other schools.
The classification of a question may be dependent on
the kind of instruction given in a course. That is, the
kind of thinking used by a student to answer a question
may depend upon the instruction received by the student.
HISTORICAL: A REVIEW OF THE LITERATURE
Classifying the Outcomes of Education
The interest in a classification of learning in terms
other than the specific content of a discipline has been
evident for decades. One of the earliest and most widely
used classifications of this type is Bloom's Taxonomy of
Educational Objectives: Cognitive Domain,1 in which Bloom
organizes learning into categories labelled knowledge,
comprehension, application, analysis, synthesis, and evaluation.
One very popular use of this taxonomy has been to describe
the content of achievement tests. Fast2 reported an analysis
of the American Chemical Society-National Science Teacher's
Association High School Chemistry Achievement Test in which
he found 40 percent of the questions to be in the knowledge
category, 25 percent each in comprehension and application
and 10 percent in the analysis category. He also discovered
a trend toward a higher percentage of application and
analysis questions during the period from 1957 to 1971.
Airasian3 classified the objectives of two chapters of
Chemistry: An Experimental Science according to their
level in Bloom's taxonomy. He found that high school
chemistry teachers could classify the objectives with a
90 percent level of agreement. Airasian also determined
that a majority of the objectives fell into the knowledge
class and required only recall of information.
Scott4 used Bloom's taxonomy to analyze the cognitive
levels of activities and exercises in a particular set of
instructional materials. He examined an early edition
of Science--A Process Approach5 and found that many activi-
ties required application behavior and some required anal-
ysis and synthesis. As the use of the taxonomy became
widespread, there developed a consensus that many textbooks
and standardized tests fail to provide enough tasks at the
higher levels of synthesis, evaluation and application.
This has led in some cases to a decrease in emphasis on
tasks which require the student simply to recall information.
This situation illustrates the effect that a classification
of the objectives of education can have on the
direction of curricular change.
One weakness of a thought process approach to ques-
tion classification is that these processes are only
inferential constructs. They cannot be observed directly.
One cannot assume that all students answer the same questions
using the same cognitive processes. To help deal
with this problem, one should keep in mind the instruc-
tional material on which the questions being classified
are based. The chances may be greater that two students
will answer the same questions using the same processes
if they have been exposed to the same instructional material.
The classification of questions which is used in
the present project is not based upon a strict cognitive
process approach. That is, we do not intend to identify
in detail the information processing routines used by
students to answer questions. The procedure used in this
research is similar to a task analysis described by Smith6
in which tasks are described according to the characteris-
tics of the given information and the information which
the student is attempting to find. An assumption must be
made concerning the information which students bring to
a given task. For this research, the assumption will be
based upon a knowledge of the instructional material
presented to the student and the knowledge gained from
many years of experience observing how students perform
the required tasks of this freshman chemistry course.
Gagne's Classification of Learning
Another widely accepted classification of learning is
that described by Gagne. In his article on the domains
of learning,7 he categorizes learning processes into classes
described as motor skills, verbal information, intellectual
skills, cognitive strategies, and attitudes. He
emphasizes that these kinds of learning can be found to
varying extents in all disciplines. He suggests that state-
ments concerning optimum learning conditions and methods
of testing which are apprOpriate for one of the domains of
learning may not be appropriate for another. In particu-
lary the instructional procedures which maximize learning
‘mithin one domain are different from those which best
encourage learning in a different domain. For example, the
nature of instruction designed to teach the learner to
perform an acid-base titration should certainly be dif-
ferent from instruction on the nomenclature of simple
molecules. In the former case one is teaching a motor
skill, while the latter involves intellectual skills and
verbal information.
Gagne also has emphasized that different assessment
techniques are required for objectives in different domains.
One does not test possession of verbal knowledge in the same
manner as he would test the possession of intellectual
skills. If this latter idea is correct, it should then
be possible to describe the domain or domains of learning
of specific test questions and, from scores on these ques-
tions, to evaluate student learning within these specific
domains. Although Gagne's categories are still best des-
cribed as inferential constructs, they have provided sig-
nificant guidelines for the design and evaluation of in-
struction. They have also greatly influenced the nature
of the classifications proposed in this study. Since the
categories that have been developed for the present diagnostic
system can best be described as verbal information
and intellectual skills, these domains will be discussed
in more detail.
The learning of verbal information means that the
learner is able to state in declarative form what he has
learned. Verbal information comprises the facts, principles,
and generalizations which make up a large part of school
learning in any discipline. In introductory chemistry,
the student is asked to learn the chemical symbols of the
elements. This verbal information can be presented by
examining the relationships between the names of elements
and their symbols and giving the appropriate Latin or
German names. To test whether a student has learned these
symbols one usually asks that the name be declared from
the symbol, or vice versa. The learning of verbal information
does not necessarily give the learner the ability to
apply that information to a novel situation.
Gagne describes intellectual skills as "knowing how
as contrasted with knowing that"8. A student learns to
convert fractions to decimals or to represent the electronic
configuration of the elements or the Lewis dot structures
of simple molecules. An intellectual skill is a learned
capability which enables the learner to perform a particular
group of tasks if he possesses the appropriate knowledge.
The lack of an intellectual skill may prevent a student from
performing a class of tasks in spite of his knowledge of
verbal information. Studies have shown that ability to
perform a specific task is greatly increased if the student
is taught the prerequisite skills.9 Gagne and Brown10,11
investigated the learning of a task of constructing formulas
for the sums of number series. They identified the "subordinate
skills" which were necessary to perform the task
and related these skills to one another in a hierarchy.
The hierarchy therefore represents the sequence of abilities
upon which the learner would rely in order to perform the
superordinate task. Seven students who were unable to
construct the formulas were tested on each of the sub-
ordinate skills and given instruction on the skills which
they could not perform. The final task was again presented
with verbal directions about how to do it but no additional
practice. Six out of the seven students were then able
to perform the final task. Thus, through an analysis of
the task and through appropriate instruction on subordinate
tasks, the experimenters were able to significantly increase
students' ability to perform the desired task.
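The procedure described above, testing each subordinate skill and teaching only those the learner lacks in an order that respects the hierarchy, can be sketched as follows. The skill names and the hierarchy are hypothetical placeholders, not the skills identified by Gagne and Brown, and the use of a topological sort is an assumption made here for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical hierarchy: each skill maps to the subordinate skills it relies on.
HIERARCHY: Dict[str, Set[str]] = {
    "final task": {"skill C"},
    "skill C": {"skill A", "skill B"},
    "skill B": {"skill A"},
    "skill A": set(),
}


def remediation_plan(mastered: Set[str], target: str = "final task") -> List[str]:
    """Unmastered subordinate skills, ordered so that prerequisites come first."""
    order = TopologicalSorter(HIERARCHY).static_order()
    return [s for s in order if s != target and s not in mastered]


# A learner who has mastered only the lowest skill in the invented hierarchy.
plan = remediation_plan(mastered={"skill A"})
```

For this invented learner the plan lists "skill B" before "skill C", so instruction on a skill is never scheduled ahead of a skill it depends on.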
Science Processes
An important classification of the objectives of science
education has been in terms of "science processes". A
science process can be described as a class of similar
tasks which scientists perform. These tasks include ob-
serving, comparing, classifying, quantifying, measuring,
experimenting, inferring, and predicting. The idea of
science processes is included in this review because the
philosophy behind the characterization of these processes
has influenced the conception of the present project. A
science process is a class of similar tasks which scientists
perform, while each category developed in this study is a
class of similar tasks which students are required to
perform. Exploring the nature of these classes of tasks
is an important step in determining the skills which are
necessary to perform them.
Guided mostly by the work of Schwab, efforts were made
to characterize the nature of what it is that scientists
do with the information that they obtain and how they go
about obtaining it. Despite Schwab's warnings, the science
teaching establishment accepted the notion that these
processes, once identified, would prove to be common throughout
the various scientific disciplines. Although this did
not prove to be completely true, the notion that there
are certain important abilities which are essential com-
ponents of many science disciplines has remained. Research
in this area has focused on the measurement of student performance
of these process objectives, relating this performance
to overall course achievement, and studying the
nature of the processes themselves.
Empirical Support of Taxonomies
A serious criticism raised against the taxonomies which
have been described is that the abilities suggested in
these categories are not reflected in measurements of
student performance. For example, Tannenbaum12 studied
science processes by means of an empirical test which he
designed. The science processes which he studied include
observing, comparing, classifying, quantifying, measuring,
experimenting, inferring, and predicting. His test included
96 items chosen from several of the natural sciences. The
author suggests (but does not substantiate) that student
performance on the test should not depend upon the distribu-
tion of questions among the various disciplines. Tannen-
baum used textbooks and various research reports to develop
a list of behaviors which science students are expected
to exhibit. These behaviors were then classified according
to the eight categories listed above. The test was admin-
istered and several statistical procedures were applied
to the results. The author reports overall test reliability
as well as subtest reliability, which is the reliability of
a group of questions from only one of the categories men-
tioned. For four of the eight processes studied, the sub-
test reliabilities were not significantly greater than that
expected of a random sample of the corresponding number of
questions from the entire test. A factor analysis identified
only one general factor which accounted for about 50% of
the variance in the scores.
In an empirical study of the hierarchical nature of
Bloom's taxonomy, Stedman13 found no significant difference
in scores between questions identified as knowledge and
comprehension, or between application and analysis. There was, however,
a significant difference between scores on questions from
the comprehension and application categories.
The failure of many of these taxonomic categories
of objectives to be validated in empirical studies may
in part be due to their general nature. In this project,
the categories are based upon an analysis of the ques-
tions typically asked in a general chemistry sequence.
We hope that these categories--which are designed not
only for a particular discipline but for a particular
group of courses within that discipline--may be more easily
validated and consequently may provide more useful information
about student learning than general categories.
Diagnosis and Remediation
There have been a number of projects reported which
are in some ways similar to the present project. This
review is limited to diagnostic and remedial systems
used in high school chemistry and physics courses as
well as in college science courses because the informa-
tion from these areas is the most generalizable to the
freshman college chemistry course used as the laboratory
for the present research.
For the purposes of this study, remediation is defined
as an attempt to supply the thinking skills prerequisite
to a particular learning task when those prerequisites are
not part of the subject matter of the course. Furthermore,
the prerequisites are defined as a set of intellectual skills
which are applied to the tasks of college level general
chemistry. Since a vast literature exists on the effects
of various remedial treatments, it seems appropriate to
consider only the effect of that part of remediation concerned
with thinking skills assumed to be possessed by most
college freshmen and which are applied throughout most
introductory science courses.
Diagnosis Based on Piagetian Theory
Recent attempts to apply Piagetian theory to college
students have produced some unexpected results. According
to Piaget,14 we each pass through four distinct periods
of intellectual development as we mature. These are:
1. Sensori-motor (0-2 years);
2. Preoperational thought (2-7 years);
3. Concrete operations (7-11 years);
4. Formal operations (11-15 years).
Research in this area has focused on (1) the develop-
ment of tests on an individual's stage of intellectual
development and (2) the mechanism by which one advances
to a higher stage and the effect of instructional experiences
on that advancement.15 For example, in a study reported
by Bredderman,16 the effect of training fifth and sixth
grade students to control variables was measured by administering
pretests and posttests involving the control of
several different variables. According to Piagetian
theory, individuals who are not yet in the formal stage
of intellectual development will be unable to perform
this task. Bredderman demonstrated a significant but
small improvement in students' ability to perform these
tasks as the result of training. There were some students
for whom the training had no effect as demonstrated by
their pretest and posttest scores. These results are
typical of the Piagetian training studies reviewed by
Beilin.15
The application of Piagetian tasks to college students
has revealed that many college freshmen do not demonstrate
an ability to think at the formal level in some situations.
McKinnon17 states that in a study at Oklahoma City University
50 percent of 143 college freshmen failed to perform
tasks requiring thinking at the formal level. In a study
by Renner and Lawson,18 only 22 percent of a sample of
college freshmen were judged to be in the formal stage
of development. Studies by Griffith19 and Renner20
have produced similar results.
The question which remains unanswered is whether the
results with college students indicate a general developmental
retardation on the part of a vast majority of
college freshmen or an inability of these students to
think at the formal level in specific situations. An
interpretation of these results which is not in conflict
with Piagetian theory is that many students who have
advanced to the formal stage of intellectual development
do not always demonstrate their ability to think at this
level in all situations. This may be due to the particular
content involved in the Piagetian task, the effect of test
anxiety, the student's habit of reverting to concrete think-
ing in certain situations, or the lack of validity of the
particular test. Support of this interpretation comes
from a study by Danner and Day20 with children from grades
five to twelve. Initially, 50% of the older subjects and
none of the younger subjects were able to perform tasks
which require formal operations. After a few prompts,
nearly all of the older subjects and a few of the younger
subjects were able to perform at the formal level on a
different task.
Renner and Lawson also found that high school chemistry,
biology and physics students scored significantly higher
than the general population of high school students. This
may be the result of a selection of science courses by
formal thinkers, the result of the practice one
can get in thinking at the formal level in a typical
science class, or both. The evidence does indicate,
however, that many freshman college students do not demonstrate
an ability to think at the formal level.
Few would deny that college chemistry and physics are
taught at the formal level. Students are required to deal
with relationships between variables and to have an understanding
of how these relationships are developed and
tested. If students taking these courses are not in the
habit of thinking at the formal level, they will experience
a great deal of difficulty mastering the materials in
these courses. Assuming that these students could be
identified, they could be provided with instruction de-
signed to increase their tendency to use formal operations
in their study of science. The optimum nature of this
instruction is, at this time, unknown and is a question
which deserves some serious attention.
The Effects of Diagnostics
Lawler has investigated the effects of a diagnostic
system which identifies for each student the objectives
which he has failed to master in a health sciences course
at the freshman college level.21 Exams were taken on an
IBM 1500 terminal designed for computer assisted instruc-
tion. Students who received this diagnostic information
showed greater achievement as measured by the course final
exam than students who did not. Apparently the diagnostic
information helped students focus on objectives which they
were unable to perform.
In a freshman chemistry course at the University of
Wisconsin,22 a diagnostic system called CHEM TIPS has
been established in which students take a once-a-week
survey which requires them to demonstrate knowledge of
recent course material. The responses are computer
analyzed according to a set of predetermined criteria.
Those students who miss particular questions or groups of
questions receive computer generated messages indicating
topics they should work on, textbook page numbers where
this material can be found, and times during the week when
help sessions dealing with this material will be held.
Survey results were given to the teaching assistants so
they could work on this material during recitation sections.
In an attempt to evaluate the effectiveness of this system,
one of two lecture sections taught by the same instructor
was given the option of taking the CHEM TIPS survey while
the other was not. Enrollments were 163 in the experimental
group and 167 in the control group. There was in this
case no significant difference between the average scores
on three course examinations for the two groups. There
was, however, a difference in the attitudes of students
toward their teaching assistants. The experimental group
responded more favorably to questions concerning the
teaching assistant's interest in students' progress,
effectiveness as a teacher, and ability to answer questions.
This could be the result of the teaching assistants bene-
fiting from the information provided by the CHEM TIPS
survey.
Riban23 has studied a diagnostic system which identi-
fies deficiencies in mathematical abilities based upon
patterns of correct and incorrect solutions of physics
problems by high school physics students. The mathematical
abilities were established through an analysis of the
skills required for the solution of each problem. One
hundred sixty three separate mathematical abilities were
identified. This list was decreased to 42 by rejection
of those abilities believed to be present in all students
as well as those required by only a few problems. The
decision to enter remediation for a specific ability was
based upon the percentage of missed questions which required
a specific ability. Students diagnosed as deficient in a
particular mathematical ability were randomly divided into
control and experimental groups. The experimental group
received the appropriate programmed remediation. There was
no significant difference between these two groups as
measured by scores on two subsequent Physical Science Study
Commission achievement tests. The author did not present
a breakdown of student scores on groups of questions requir-
ing the remediated ability. Such a breakdown would reveal
whether the remediation actually improved achievement in
some areas while decreasing achievement in others. Also
the effect on the total test score might be so small that
it is masked by other sources of variance. A test given
just before remediation revealed that about half the
students diagnosed to be deficient in a particular mathematical
ability could perform physics tasks which require that
ability. Thus, assuming the validity of this test, many
students apparently were misdiagnosed. With 42 separate
abilities being tested, it seems apparent that the decision
to enter remediation must have been based on a small number
of questions. Frequently, a particular ability occurred
in only one small segment of the course content; it was
therefore impossible to check performance in that ability
throughout the course. In the present study, we hope to
avoid these two problems by defining the kinds of thinking
in a manner which will yield a small number of more general
abilities which occur throughout the topics of the course.
Larkin24 has reported the effect of remediation in a
college physics course. The remedial instruction involved
how to apply relationships in physics. In the first part
of the course, a randomly selected group of students
received instruction in how to identify relationships, the
important characteristics of a relationship, and how to
demonstrate knowledge of a relationship. Her study showed
that students who received this instruction were better
able to acquire an understanding of the relationships
which they encountered throughout the course.
As discussed earlier, Gagne10,11 has demonstrated the
effect of remediation when that remediation has been linked
to a task analysis of the desired learning.
The project also attempts to apply a task analysis,
but the application is made on the objectives of an entire
course rather than on one specific task. For this reason,
the subtasks identified are stated in general terms accord-
ing to the criteria mentioned earlier. We hope that pro-
viding students with some general intellectual skills will
improve their performance on tasks which require those
skills.
EXPERIMENTAL METHODS AND PROCEDURES
The Instructional Setting
Before describing the design and procedures which were
used to answer the research questions raised in the first
chapter, it is necessary to outline the format of the
courses to which the project was applied.
The Chemistry Department at Michigan State University
has transformed the first two courses of one of its
introductory chemistry sequences from the traditional lecture-
recitation format to a modular self-instructional mode.25
Most students who take these two courses (CEM 130 and CEM
131) are not chemistry majors but are required by their
major area to take an introductory chemistry sequence.
The primary instruction in these courses is contained
in a series of audio cassettes and accompanying
workbooks. The workbooks contain diagrams and examples to which
the students refer as they listen to the tape. Although
the students' pace through the course is somewhat flexible,
they are required to finish specified amounts of material
within two week periods. The course can be described as
a modified mastery approach26-28 since students are permitted
(within a designated time frame) to take examinations as
often as once a day until they are satisfied with the grade
that they have earned. A criterion referenced grading
scale is used so that students at any point during the course
can predict their course grade from performance on course
examinations. Thus, a student could make a contract (with
himself) for a particular course grade and repeat each exam-
ination until it is passed at a level which would translate
into the grade desired.
The alternate forms of examinations are generated by
computer from a bank of over 4000 questions. Each question
has a library number which indicates the associated unit
of the course and the concept or idea which it is testing.
Each fifteen item test contains a specified number of ques-
tions from each of the units covered by that exam.
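The generation of alternate forms described above can be sketched as follows. This is a minimal illustration in Python and is not the program used in the course. The names `Question`, `build_exam_form`, the library number format, and the example blueprint are all hypothetical. The sketch assumes only what the text states: each question carries a library number that identifies its unit, and each form draws a specified number of questions from each unit.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    library_number: str  # hypothetical format; identifies the question in the bank
    unit: int            # course unit the question belongs to


def build_exam_form(bank, blueprint, rng):
    """Draw the number of questions given in blueprint {unit: count} from the bank."""
    form = []
    for unit, count in sorted(blueprint.items()):
        pool = [q for q in bank if q.unit == unit]
        if len(pool) < count:
            raise ValueError(f"unit {unit} has only {len(pool)} questions")
        # sample() draws without replacement, so no question repeats on a form
        form.extend(rng.sample(pool, count))
    return form


# Toy bank: 3 units with 20 questions each (the real bank held over 4000).
bank = [Question(f"{u:02d}-{k:03d}", u) for u in (1, 2, 3) for k in range(20)]
form = build_exam_form(bank, {1: 5, 2: 5, 3: 5}, random.Random(0))
```

Passing a different random generator, or a different seed, yields another alternate form with the same distribution of questions across units.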
Supplementary instruction is provided by graduate (or
occasionally undergraduate) student instructors who staff
a "Help Room" which is open to students during daytime and
evening hours. While this system allows students to get
their questions answered at any time, it does not foster
the kind of student instructor contact which is necessary
to identify and remedy any systematic problems that students
may have. Thus, one of the goals of this project was to
produce a system of diagnosis and remediation which would
effectively deal with student learning problems.
The Computer Management System
An important part of this remedial system is the com-
puter management system which identifies students who are
scoring poorly on a particular class of questions. This
section will provide a detailed description of the technology
that has been developed.
A subtest score is defined as the percentage of correct
responses to a specified group of questions which are all members
of the same class of questions. Each fifteen item examina-
tion (an exam form) in Chemistry 130 and 131 has associated
with it an exam composition index (ECI) which is a list of
master file library numbers of questions chosen for that
exam form. In order to create a subtest score, the classi-
fication of each of the questions is first added to the ECI
and stored on a disk file. The EDITOR subroutine (Appendix
A) is then used to pick out the questions of a given class.
For example, to pick out all of the questions requiring
the memorization of a property, one would scan the appropriate
columns of the file for the MP designation and create a new
file with only these questions in it. Additional file modi-
fication can be done by deleting questions having undesirable
characteristics. In this study, we used those questions which
required only memorization for the memorization class and
deleted all questions with additional or alternative classi-
fications. A file of questions for the reasoning with math
group was created in a similar fashion.
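The selection step described above can be sketched in a few lines. This is an illustration only and does not reproduce the EDITOR subroutine of Appendix A. It assumes, hypothetically, that each entry of the exam composition index is a pair of a library number and the set of classification codes assigned to that question; the function name and the sample library numbers are invented for the example.

```python
def select_pure_class(eci, allowed):
    """Keep library numbers whose classifications all fall inside `allowed`.

    Questions with any additional or alternative classification are dropped,
    as was done for the memorization class in this study.
    """
    return [lib for lib, codes in eci if codes and codes <= allowed]


# Hypothetical exam composition index entries: (library number, codes).
eci = [
    ("0412", {"Mp"}),
    ("0413", {"Mp", "R1"}),  # has an additional classification: dropped
    ("0520", {"R1", "M1"}),
    ("0533", {"Mr"}),
]
memorization_items = select_pure_class(eci, {"Mp", "Mr"})
```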
Students mark their answers to exam questions on machine
scorable answer sheets. Fill-in questions are graded by hand
and the graders mark the appropriate boxes for correct and
incorrect answers. A special information sheet, which is
prepared for each exam form, indicates the exam form
number, the course number, and the time and date the exam was
administered. In the scoring process, the information is
transferred to magnetic tape and eventually to a disk
file. Each record on the disk file contains the student
number, the exam form number, the number of questions out
of fifteen which the student has answered correctly, and
the student's performance on each question. If the student
answered a question correctly, a "1" is recorded in the
column representing that question. If the question was
answered incorrectly, the letter corresponding to the
distractor chosen is recorded. Finally, a subtest score
is calculated by determining the percentage of correct
responses for questions of a particular class. The FORTRAN
programs which have been written to perform this analysis
are listed in Appendix A. The ANNOVA and FACTOR programs
were written for use with the Statistical Package for the
Social Sciences (SPSS)27 subroutines and are also listed
in Appendix A.
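The subtest score calculation can be illustrated with a short sketch. It is written in Python rather than FORTRAN and is not one of the Appendix A programs. The function name and the sample data are hypothetical. The record layout is assumed from the description above: one mark per question, "1" for a correct answer and the letter of the chosen distractor otherwise.

```python
def subtest_score(responses, exam_items, class_items):
    """Percent correct on the questions of one class.

    responses   -- per-question marks: "1" if correct, else a distractor letter
    exam_items  -- library numbers in the order they appear on the exam form
    class_items -- library numbers belonging to the class of interest
    """
    marks = [r for r, lib in zip(responses, exam_items) if lib in class_items]
    if not marks:
        return None  # the form contains no question of this class
    return 100.0 * sum(m == "1" for m in marks) / len(marks)


# Hypothetical five-question excerpt of one student's record.
exam_items = ["0412", "0413", "0520", "0533", "0601"]
responses = ["1", "B", "1", "C", "1"]
score = subtest_score(responses, exam_items, {"0412", "0533"})
```

In this example the student answered one of the two questions in the class correctly, so the subtest score is 50 percent.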
The Classification Scheme
The diagnostic system which has been developed is based
upon a classification of examination questions in terms of
the kind of thinking required to answer them. The first
part of this section will describe the process by which
the categories were established. The characteristics
of each of the categories will then be described, and
finally the method of creating classes of questions based
on the classification scheme will be discussed.
Developing the Categories: A Task Analysis
In their book on Learning System Design, Alexander,
Yelon, and Davis29 describe a task analysis as a detailed
description of how a particular task is to be performed.
It takes into account the entry skills of the learner,
the type of learning involved, and the particular condi-
tions or constraints in the instructional environment which
influence the learning process. The process of develop-
ing categories for the present classification scheme began
with this type of task analysis. From each of the questions
used in Chemistry 130-131 a generalized description of the
task to be performed was developed. In each case the task
was described without reference to any of the chemistry
content involved. As the analysis proceeded, it became
apparent that a small number of these generalized task
descriptions were emerging as the most important types of
tasks required of students in these courses. Furthermore,
the general task would appear many times throughout the
various topics of each course. From each of these task
descriptions was developed a statement of a "kind of think-
ing" which is required of students taking introductory
chemistry. Six major kinds of thinking were identified,
some having several subcategories.
Once these kinds of thinking were identified, each ques-
tion was characterized according to whether it does or does
not require a particular kind of thinking. The criteria
for this characterization and a description of these kinds
of thinking are given in the next sections.
Kinds of Thinking
The various kinds of thinking required by questions
typically asked in a general chemistry course are listed
below.
I. M-Recalling Memorized Information
1. Mp - recalling memorized properties
2. Mr - recalling memorized relationships
II. C-Classification, Discrimination, Pattern Recognition.
Identifying an entity as a member or a nonmember
of a class without being given the criteria of
that identification.
III. V-Visualization
Forming an image of an object or set of objects
which are static, do not require the recognition
of color and have been seen by the student. The
following designations are added to the V in the
situations indicated.
d - if the object(s) is dynamic
c - if the color of the object(s) is required
n - if the object has not been seen but has been
described.
IV. T-Translation
1. Twm - between words and math
2. Tcw - between words and chemical symbols
3. Tcm - between math and chemical symbols
V. R-Reasoning
The sequencing or combining of ideas to derive
or evaluate new ideas.
Rs - The sequencing or combining of any of the
processes which have been listed in this
classification.
R1 - A one step reasoning process in which a single
relationship is applied to known properties
to determine an unknown property.
R2 - A more than one step reasoning process in
which a number of relationships are applied
in sequence.
R3 - A process which involves a series of reasoning
steps which have been described to students
in an algorithm.
VI. Mt-Math
Me - Working with numbers expressed in exponential
notation or as logarithms.
M1 - Manipulating linear algebraic equations.
M2 - Manipulating nonlinear equations.
A more detailed explanation of the criteria by which one
can identify the kinds of thinking required by a question
is given in the following section.
Criteria and Procedures for
Characterizing Questions
The following is a description of the criteria which
are used to decide whether or not a question requires a
particular kind of thinking.
I. Memorization: This category includes questions
which require students to recall memorized in-
formation. This information can be classified
into two types.
1. Properties: A property is one specific
characteristic of an entity. This character-
istic is specified through variables (mass,
height, color) and their corresponding values
(3 grams, 3 feet, red).
Sample question: "What is the precision of
an analytical balance?"
The property of the balance which is to be
specified is its precision. The value required
is 0.0001 gram.
2. Relationships: Any statement which defines
the dependence of one property on other
properties is called a relationship.
Sample question: "When a block of solid is
dropped into an insulated
beaker of liquid, the heat
lost by the substance orig-
inally at the higher tempera-
ture is:"
Answer: equal to the heat gained by
the substance at the lower
temperature.
This question requires the student to recall
the relationship between the heat lost by an
object and the heat gained by its surround-
ings. This interpretation of the question
assumes that the entity being considered is the
block of solid and the important properties
of that entity are heat lost and heat gained.
An alternative interpretation assumes the
entity under discussion is the heat lost by
that object and a property of the heat lost
is that it is equal to the heat gained. In
this and similar cases the interpretation
which leads to the classification of the
information as a relationship will take
precedence.
II. Classification, Discrimination, and Pattern Recognition
The process of evaluating information for the
purpose of grouping.
The question must require the student to identify
an entity as a member or nonmember of a class
without being given, in the question, the criteria
of class membership.
Sample question: "Which of the following series
of elements contains only non-
metals?"
Most students in introductory chemistry would use
the position of the element on the periodic chart
to determine whether each element was a member
of the class of nonmetals. At an advanced stage,
one is able to classify by recognizing patterns
of stimuli. For instance, one can usually tell
that an object is a chair without examining its
properties one by one. Classification also re-
quires the consideration of the properties assoc-
iated with a particular class and will usually
require an Mp process. The Mp designation will
be assumed to be a part of the C designation,
and therefore is not listed with it.
III. Visualization
Questions will be characterized as involving
visualization if they require the student to form
an image of a static object which has been seen
and for which the recognition of color is not
important.
If the student must move the object in his mind,
a d will be added to indicate a dynamic object.
If the question requires the student to remember
color, a c will be added.
If the question requires the student to form an
image of an object which has been described but
not seen by the student, an n will be added.
Sample questions: 1. If in a particular course
students are shown samples of one
mole of various substances, the
question "At room temperature
the volume occupied by one mole
of water is about . . ." would
be classified as V since the
sample was static and its color
was unimportant.
2. "A cube has how many four-
fold axes of symmetry?" This
question would be classified
as involving Vd since it re-
quires the student to rotate
the object in his mind.
IV. Translation
Questions which require the students to interpret
from one language to another will be characterized
as involving translation. There are three possible
subcategories.
1. Twm - translation between words and math
For example: "Given a = b + c, produce a
is equal to b plus c."
2. Tcw - translation between chemical symbols and
words
For example: "Given K + O2 = KO2, write one
mole of potassium reacts with one mole of
... etc."
3. Tcm - translation between chemical symbols
and math
For example: "Given N2O3 state the ratio
of oxygen atoms to nitrogen atoms in the
molecule."
V. Reasoning
The combining or sequencing of ideas to derive
or evaluate new ideas.
1. Rs - The combining or sequencing of the kinds
of thinking which have been characterized in
this classification scheme.
Sample question: "A tetrahedron has how
many three-fold axes?" The
complete characterization of
this question would be Rs
(V,Mp,Vd). The letters in
parentheses indicate the
kinds of thinking being
sequenced. In this question
the student must first
visualize a tetrahedron, then re-
call the definition of a three-
fold axis, and finally rotate
the image to determine how many
three-fold axes are present.
2. R1 - A one step reasoning process in which a
single relationship is applied to known prop-
erties to determine an unknown property.
Sample questions: 1. "If the density of a 4
cubic centimeter block of
metal is 4 g/cc, what is
its mass?" This question re-
quires the use of the rela-
tionship between density,
mass, and volume to determine
the mass of an object, given
its density and volume.
2. "Two metal samples each
contain exactly the same
chemical elements. Emission
spectra show lines at exactly
the same wavelengths. One
can conclude:"
Answer: "Each of the two samples contains
exactly the same chemical
elements." This question
requires the use of the fact
that each element has a
unique emission spectrum to
determine that two samples
which produce exactly the
same emission spectra must
be composed of exactly the
same elements.
3. R2 - A multistep process which requires the
application of two or more relationships without
simply applying a learned algorithm.
Sample questions: 1. "What is the density of a
4 g cube of metal with 3 centi-
meter edges?" The question
requires the combination of
two relationships: d = m/v
and v = e^3.
2. "If an atom and an ion
contain the same number of
electrons, then they:"
Answer: must be of different elements.
The question involves realiz-
ing that an atom and an ion
have a different charge and
that the charge is equal to
the number of protons minus
the number of electrons.
Therefore the number of
protons must be different.
Since nuclei with different
numbers of protons are of
different elements, the atom
and ion in question must be
of different elements. In
this sequence, a number of
principles have been applied
to the given information to
determine which of the given
statements is correct.
4. R3 - A process which involves a series of
reasoning steps which have been described to students
in an algorithm.
Specific tasks are common and complex enough
that they are often taught through the presenta-
tion of an algorithm or step by step procedure.
For example, one usually outlines the procedure
for finding the percent composition from a
molecular formula. Students are expected to
solve these problems by applying the proper
procedure to them.
Obviously, any question to which an algorithm
has been applied could be answered in the ab-
sence of the algorithm by combining the necessary
ideas. Thus an appropriate alternative classi-
fication for these types of question is Rs.
It is also important to note the characteristics
of the steps in the algorithm. This will be
done by including in parentheses after the R3
designation the symbols for kinds of thinking
involved.
VI. Math
The mathematics which is required is divided into
three categories.
1. Me - scientific notation, exponents, and
logarithms
This category will include any question which
requires the manipulation of exponents or
logarithms, or numbers written in scientific
notation.
2. M1 - algebra with linear equations
All questions requiring the manipulation of
linear algebraic equations will be classified
as M1.
3. M2 - algebra with quadratic or higher power
equations.
Sample question: "What is the wavelength of
an electromagnetic wave having a frequency of
10^4 cycles/sec?" This question requires the
solving of the equation relating wavelength to
frequency (M1), and also the manipulation of
10^4 and 3 x 10^10 (Me).
Using this set of criteria, one can characterize
examination questions according to the kinds of thinking involved.
In this project, the characterization was performed by
asking, "What method would be used by the majority of stu-
dents in CEM 130 and CEM 131 to answer this question?"
There are two important considerations. First of all, an
estimation of student performance was used rather than an
analysis of how a trained chemist might solve a particular
problem. Second, the instructional setting was carefully
considered. It was felt that the methods students use to
work problems and answer questions will depend upon the
information students bring to the problem. This obviously
will be a function of the instruction which the students
have received.
The characterization of the questions asked in CEM 130-
131 would often produce important combinations or sequences
of kinds of thinking which could then be considered as a
unique class of questions. For example, a very common
sequence is R1 with M1 (a one step application of a mathematical
relationship). This combination appeared often enough that
it was given a special designation (R1(M1)) and was considered
as a distinct kind of thinking. There were also questions for
which there were two alternate methods of solution commonly
used by students. For these questions, a slash (/) was used
to indicate that a question required one kind of thinking or
another kind of thinking depending upon the method a student
chose to work the problem. For example, the designation
Mr/R1 would be used for questions which some students answer
by recalling a memorized relationship and other students
answer by applying a relationship in a one step reasoning
process. To help illustrate the characterization procedure,
some sample questions and their classification are given in
Appendix B.
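The designations used in this section, such as Rs(V,Mp,Vd), R1(M1), and Mr/R1, follow a regular pattern that can be read mechanically. The sketch below is a hypothetical illustration of that notation and was not part of the project's software. It assumes that a slash separates alternative methods of solution and that parentheses enclose the kinds of thinking being combined; the function name `parse_designation` is invented for the example.

```python
import re

# A code (letters and digits) optionally followed by a parenthesized list.
_PATTERN = re.compile(r"^([A-Za-z0-9]+)(?:\(([^()]*)\))?$")


def parse_designation(text):
    """Return a list of alternatives; each is (code, [component codes])."""
    alternatives = []
    for part in text.replace(" ", "").split("/"):
        match = _PATTERN.match(part)
        if match is None:
            raise ValueError(f"unrecognized designation: {part!r}")
        code, inner = match.groups()
        components = inner.split(",") if inner else []
        alternatives.append((code, components))
    return alternatives


tetrahedron = parse_designation("Rs (V,Mp,Vd)")
either_way = parse_designation("Mr/R1")
```

Here `tetrahedron` holds one alternative with three component kinds of thinking, while `either_way` holds two alternatives with no components.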
Validation of the Categories
As discussed earlier, the process of characterizing ques-
tions produces a class of questions which is used as an
instrument to measure a student's ability to perform a
particular kind of thinking. Whenever a new instrument is
developed, it should be accompanied by evidence concerning
its ability to measure performance accurately and precisely.
The usual procedure is to supply data concerning the
reliability and validity of the test.
Reliability
One measure of the reliability of a test is the correla-
tion between scores on two tests which attempt to measure
the same thing. Often the two tests are obtained by ar-
bitrarily dividing a test into two parallel forms of equal
length and measuring scores on each half of the test. A
very popular measure of reliability is the Kuder-Richardson
Formula 21 (K.R.21)30 which essentially creates all possible
parallel forms of an examination and averages the correla-
tions between them.
Since the examinations used in CEM 130-131 are very
carefully designed to include an even distribution of
questions from a rather wide range of topics, the correlation
between arbitrary split halves of the test would not be
expected to be very high. The correlation between scores
on two equivalent forms of an examination was therefore used
as a measure of the reproducibility of the test scores and
hence the reliability of the questions.
In a previous study of CEM 130-131,31 thirty item examina-
tions were prepared by combining two alternate forms of the
usual fifteen item exams. The items were mixed thoroughly
to avoid fatigue or time limit factors. Students were told
that the score on each form would be computed individually
and they would receive the higher of the two scores. The
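Pearson product-moment correlation coefficient between
these two scores was calculated as follows.
$$ r = \sum_{i=1}^{N} \frac{z_{x_i} z_{y_i}}{N} \qquad (1) $$
where
$$ z_{x_i} = \frac{x_i - \bar{x}}{\sigma_x} $$
and
$$ z_{y_i} = \frac{y_i - \bar{y}}{\sigma_y} $$
x and y are the scores obtained on alternate forms, and N is
the number of students.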
When this analysis was performed on approximately thirty
sets of examinations, the average correlation coefficient
obtained was 0.69.
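The z-score form of the Pearson coefficient described above can be computed as in the following sketch. It is an illustration in Python, not the program used in the earlier study. The function name and the two lists of scores are hypothetical. The standard deviations are taken over all N students (population form), which is the assumption needed for the sum of z-score products divided by N to equal the correlation coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation as the mean product of z-scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length, at least 2 long")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population standard deviations (divide by N, matching Equation 1).
    sd_x = sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return sum(
        ((a - mean_x) / sd_x) * ((b - mean_y) / sd_y) for a, b in zip(x, y)
    ) / n


# Hypothetical scores of five students on two alternate fifteen item forms.
form_a = [9, 12, 7, 14, 10]
form_b = [10, 11, 8, 13, 9]
r = pearson_r(form_a, form_b)
```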
In order to obtain some measure of the reliability inde-
pendent of test length, the Spearman Brown formula (Equation
2) was used to estimate the reliability of these tests if
they were composed of more items.
$$ r_n = \frac{n r_s}{(n-1) r_s + 1} \qquad (2) $$
This relation is used to calculate the reliability (rn)
of a test with n times as many items as a shorter test of
known reliability (rs).
Many nationally used tests of educational achievement
containing over 100 items report reliabilities between
0.90 and 0.95. If the tests used in CEM 130-131 were lengthen-
ed to 75 items, the calculated reliability would be 0.92
which compares favorably with standard educational achieve-
ment tests.
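The figure of 0.92 can be checked directly from the Spearman Brown relation, using the average reliability of 0.69 for a fifteen item test and a lengthening factor of 75/15 = 5. The short Python sketch below is offered only as a check of the arithmetic; the function name is hypothetical.

```python
def spearman_brown(r_short, n):
    """Reliability of a test n times as long as one with reliability r_short."""
    return n * r_short / ((n - 1) * r_short + 1)


# Fifteen item tests with r = 0.69, lengthened to 75 items (n = 5).
r_75 = spearman_brown(0.69, 75 / 15)
print(round(r_75, 2))  # 0.92
```

Written out, the calculation is 5(0.69) / (4(0.69) + 1) = 3.45 / 3.76, which is approximately 0.92.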
Agreement Among Classifiers
Another important measure of the validity of the
classification scheme is the agreement obtained when a number of
individuals who are familiar with the content of the course
attempt to classify questions. In one test of this inter-
classifier agreement, three chemistry faculty and the author
classified a group of questions independently. In another
test, an undergraduate teaching assistant and the author
compared their classification of questions. In both cases,
the percentage agreement was calculated as the number of
classifiers who agreed on the classification of a question
divided by the number of classifiers and then multiplied by
100%. For example, if three of four individuals classify
a question as memorization and the other as reasoning,
a tally of 3 out of 4 would be assigned to the memori-
zation class and no tally would be made for the reasoning
class. If two of four classified the question as reason-
ing and the other two as memorization, a tally of 2 out
of 4 would be added to each group. If for a different
question, two classifiers identified the question as
reasoning and math and two identified it as memorization and
math, the math category would receive a tally of 4 out of
4 and the other two categories 2 out of 4. The results
of this analysis are shown in Table I.
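The tallying procedure can be sketched as follows. The text gives the rule only through its three worked examples, so the sketch adopts one reading that reproduces all three: a category receives a tally for a question when at least half of the classifiers assigned it. That threshold is an assumption, as are the function names; the sketch is not the procedure's original implementation.

```python
from collections import Counter


def tally_agreement(classifications):
    """Tallies for one question.

    classifications -- one set of category names per classifier.
    Returns {category: (agreeing, total)} for every category chosen by at
    least half of the classifiers (an assumed reading of the examples).
    """
    total = len(classifications)
    counts = Counter(cat for chosen in classifications for cat in chosen)
    return {cat: (k, total) for cat, k in counts.items() if 2 * k >= total}


def percent_agreement(questions):
    """Accumulate tallies over questions and convert them to percentages."""
    agree, possible = Counter(), Counter()
    for classifications in questions:
        for cat, (k, total) in tally_agreement(classifications).items():
            agree[cat] += k
            possible[cat] += total
    return {cat: 100.0 * agree[cat] / possible[cat] for cat in possible}


# The three worked examples from the text, with four classifiers each.
q1 = [{"memorization"}, {"memorization"}, {"memorization"}, {"reasoning"}]
q2 = [{"reasoning"}, {"reasoning"}, {"memorization"}, {"memorization"}]
q3 = [{"reasoning", "math"}, {"reasoning", "math"},
      {"memorization", "math"}, {"memorization", "math"}]
summary = percent_agreement([q1, q2, q3])
```

Under this reading the first question contributes 3 out of 4 to memorization and nothing to reasoning, the second contributes 2 out of 4 to each class, and the third contributes 4 out of 4 to math and 2 out of 4 to each of the other two.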
An Analysis of Correlation Coefficients
Campbell and Fiske32 have described a set of procedures
for the validation of tests of individual differences. The
validation is based upon an analysis of the correlation
between scores on tests which are supposed to measure the
same trait, compared to the correlations between scores
on tests which measure different traits. A "multitrait-
multimethod" matrix is created which groups correlations
according to the trait being measured and the method used
to measure that trait. If the tests were valid, one would
expect that the correlations between tests measuring the
same trait would be greater than the correlations between
tests measuring different traits.
In the present study, an item which is characterized
as requiring a particular kind of thinking can be thought
of as a one item test of the student's ability to perform
that kind of thinking. One would expect that the correla-
tion between items of the same kind of thinking would be
greater than that for items requiring different kinds of
thinking. Obviously, there are many other factors
controlling student performance on a particular examination question.
Table I. Interclassifier Agreement

Kind of          Number of Identical   Total
Thinking         Classifications       Possible   Percentage
Memorization     176                   190        92.6%
Reasoning        78                    82         95.1%
Translation      50                    66         75.8%
Classification   30                    40         75.0%
Visualization    44                    58         75.9%
Math             42                    54         77.8%

One of the most important of these is the topic from which
the item was chosen. In this study, the Pearson
product-moment correlations from selected fifteen item tests were
separated into four groups as shown below.
Group A: Coefficients between items from a given
class of question which are related to the same
topic.
Group B: Coefficients between items from different
classes taken from the same topic.
Group C: Coefficients between items from the same
class but from different topics.
Group D: Coefficients between items from different
classes and from different topics.
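The sorting of inter-item correlations into these four groups can be sketched as below. The sketch is a simplified illustration in Python and not the procedure of Appendix C: it labels every pair of items on a test by whether the two items share a class and a topic, and it does not reproduce the separate breakdown by kind of thinking used in the study. The function names, the item records, and the correlation values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def group_label(item_a, item_b):
    """Return 'A', 'B', 'C', or 'D' for a pair of items."""
    same_class = item_a["cls"] == item_b["cls"]
    same_topic = item_a["topic"] == item_b["topic"]
    if same_topic:
        return "A" if same_class else "B"
    return "C" if same_class else "D"


def group_averages(items, corr):
    """Average the inter-item correlations corr[i][j] within each group."""
    groups = {"A": [], "B": [], "C": [], "D": []}
    for i, j in combinations(range(len(items)), 2):
        groups[group_label(items[i], items[j])].append(corr[i][j])
    return {g: (mean(v) if v else None) for g, v in groups.items()}


# Hypothetical four-item excerpt of a test and its correlation matrix.
items = [
    {"cls": "M", "topic": 1},
    {"cls": "M", "topic": 1},
    {"cls": "R", "topic": 1},
    {"cls": "M", "topic": 2},
]
corr = [
    [1.00, 0.20, 0.10, 0.12],
    [0.20, 1.00, 0.30, 0.08],
    [0.10, 0.30, 1.00, 0.05],
    [0.12, 0.08, 0.05, 1.00],
]
averages = group_averages(items, corr)
```

Each test then yields four values, one average per group, which is the form of data entered into the analysis of variance.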
The individual fifteen item tests were selected for this
analysis if they produced an approximately equal number of
Pearson product moment correlation coefficients in each of
the groups listed above. For every test, an average of cor-
relation coefficients for each of the four groups was ob-
tained. Thus, every fifteen item test yielded four values,
each being an average of correlation coefficients from
Groups A through D. A description of the procedure used
in calculating the coefficients, and a list of topics
used are given in Appendix C. These values were then used
as the data for an analysis of variance. For each of the
six major classes of questions, the following hypotheses
were tested at the α = .05 level.
Hypothesis 1. The average of correlation coefficients
for group A will be higher than the average for group
B.
Hypothesis 2. The average of correlation coefficients for
group C will be greater than the average for group D.
Hypothesis 3. The average correlation between items from
the same topic will be higher than the average
correlation between items from different topics.
Hypothesis 4. The average correlation between items of
the same class will be greater than the average cor-
relation between items from different classes.
Hypotheses 1 and 2 were tested with a planned comparison
analysis of variance contrasting group A vs. group B and
group C vs. group D using the average correlation co-
efficients as the input data. The analysis was performed
on each of the six kinds of thinking except the category
classification, since there were not enough questions in
this category to produce any useful information concerning
its validity.
The averages, standard deviations, and the number of
tests used for the four groups of correlation coefficients
for each kind of thinking tested are shown in Table II.
The results of the planned comparison analysis are
shown in Table III. Hypothesis 1 was supported for the
reasoning with math category and was not supported for the
other four categories. Hypothesis 2 was supported for the
reasoning with math and the memorization category but was
not supported for the other three categories.

Table II. Averages and Standard Deviations for Grouped
Sets of Correlation Coefficients.

                      Number                        Standard
Kind of Thinking      of Tests   Group   Average    Deviation
Memorization          6          A       .1213      .0255
                                 B       .0970      .0268
                                 C       .0970      .0266
                                 D       .0648      .0149
Visualization         6          A       .1217      .0569
                                 B       .1158      .0503
                                 C       .1163      .0479
                                 D       .0763      .0420
Translation           10         A       .1411      .0281
                                 B       .1472      .0245
                                 C       .1234      .0256
                                 D       .1063      .0263
Reasoning             6          A       .1358      .0281
                                 B       .1151      .0339
                                 C       .1007      .0272
                                 D       .0824      .0254
Reasoning with Math   9          A       .1729      .0366
                                 B       .1040      .0432
                                 C       .1206      .0241
                                 D       .0586      .0160

Table III. Planned Comparisons of Grouped Correlation
Coefficients.

Kind of Thinking      Contrasts   T Value   P Less Than
Memorization          A vs B      2.03      0.084
                      C vs D      2.69      0.012
Visualization         A vs B      .206      0.84
                      C vs D      1.40      0.15
Translation           A vs B      -.521     0.61
                      C vs D      1.46      0.15
Reasoning             A vs B      1.24      0.23
                      C vs D      1.10      0.29
Reasoning with Math   A vs B      4.60      .001
                      C vs D      4.14      .001

To test hypotheses 3 and 4, a two-way analysis of
variance was used. This analysis is illustrated in Figure 1.
                          Topic
                    Same        Not Same
Class   Same        Group A     Group C
        Not Same    Group B     Group D

Figure 1. A Two-way Analysis of Variance.
By combining groups A and B into one group, and groups
C and D into another, one can compare directly the
correlations between items from the same topic and items from
different topics. The results of this analysis are then used
as a test of hypothesis 3. Similarly by combining groups A
and C into one group and groups B and D into another, one
can compare correlations between items from the same class
to correlations between items of different classes and
thus perform a test of hypothesis 4. The results of this
analysis are shown in Table IV.
The values listed in the column labelled "Matched"
are the means of correlation coefficients between items of
either the same topic or the same class. In the "Not Matched"
column are means of correlation coefficients between items
from different topics or classes.
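As a rough check on how the four groups collapse into the "Matched" and "Not Matched" means, the sketch below averages the memorization group means reported in Table II. Equal weighting of the four groups is an assumption made for illustration; the tabulated means may have been computed from the per-test values instead, so small differences in the third decimal place are possible for other categories. The function name is hypothetical.

```python
# Group means for the memorization category, copied from Table II.
memorization = {"A": 0.1213, "B": 0.0970, "C": 0.0970, "D": 0.0648}


def marginal_means(g):
    """Collapse the two-by-two layout (class by topic) into marginal means.

    Assumption: the four group means carry equal weight.
    Topic matched   = Groups A and B; topic not matched = Groups C and D.
    Class matched   = Groups A and C; class not matched = Groups B and D.
    """
    return {
        "topic": {"matched": (g["A"] + g["B"]) / 2,
                  "not_matched": (g["C"] + g["D"]) / 2},
        "class": {"matched": (g["A"] + g["C"]) / 2,
                  "not_matched": (g["B"] + g["D"]) / 2},
    }


means = marginal_means(memorization)
for effect, values in means.items():
    print(effect, round(values["matched"], 3), round(values["not_matched"], 3))
```

Because the Group B and Group C means for memorization happen to be equal, the topic and class margins coincide for that category.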
Table IV. Two-way Analysis of Variance of Grouped Correlation Coefficients.

                                   Mean Correlation
                                     Coefficients
                      Main                    Not                 P
Kind of Thinking      Effects     Matched     Matched     F       Less Than
Memorization          Topic       .109        .081        8.32    .009*
                      Class       .109        .081        8.32    .009*
Visualization         Topic       .119        .096        1.23    .281
                      Class       .119        .096        1.28    .270
Translation           Topic       .144        .119        12.6    .008*
                      Class       .132        .127        .443    .999
Reasoning             Topic       .125        .092        12.5    .002*
                      Class       .118        .098        4.1     .049*
Reasoning with Math   Topic       .138        .090        16.5    .008*
                      Class       .147        .081        29.6    .001*

*Significant at α = .05.
As shown in this table, there is a class main effect
for the categories Memorization, Reasoning and Reasoning
with Math. There were no significant two-way interactions
found.
The Remedial System: Project CLIC
Selecting the Sample
During Spring Term 1976, an experimental remedial class
called CLIC (Comprehensive Learning in Chemistry) was pro-
vided for selected students in Chemistry 131. The students
were chosen on the basis of their performance in Chemistry
130 during the previous term. Students who scored below
60 percent on tests in CEM 130 were placed in the group cor-
responding to the class of questions for which they received
the lowest score. Students who failed Chemistry 130 were
not invited since they would not be taking Chemistry 131.
Of the 120 students invited, 30 indicated that they were
not going to enroll in Chemistry 131. Of the remaining 90,
70 participated in the project to some extent, 45 completed
all segments of the remedial class, and 54 missed less than
one of the three classes and one of the three tapes. This
group of 54 students formed the sample for this study. In
order to keep the sample size as large as possible, there
were three bonus points (out of a possible 90) offered to
students who participated in the project. This created
an additional incentive which most likely would not be
available to students in the ongoing operation of the
remedial system.
The Design
Students in each group were placed into control and
experimental groups. The control group in each case was
given remediation corresponding to the kind of thinking
for which these students were least deficient. That is,
a control group student who was diagnosed more deficient
in memorization skills would be placed in the reasoning
with math class. The dependent variable used to measure
the effect of remediation was the score obtained by the
student on the class of questions for which he was diagnosed
as being most in need of remediation. For example, of the
students who scored lowest on questions requiring memoriza-
tion, half would be placed in the reasoning with math class
(the control group) and half would be placed in the memoriza-
tion class (the experimental group). In the subsequent
analysis, only their scores on memorization questions
would be examined. This design was chosen to control for
the effect of students receiving additional help and
individual attention. If the type of remediation is important,
one would expect the experimental group to perform better
on the remediated kind of thinking than the control group.
The Treatment
In Chemistry 131, the students take five examinations
and a final. As described earlier, students may repeat
examinations within a specified time period until they
are satisfied with the grade they have obtained. The
final exam may not be repeated.
If, for example, a student is satisfied with the grade
he has obtained for exam 1, he will begin studying the topics
covered by exam 2. When he feels prepared, he takes his
first try of exam 2 and can then repeat exam 2 until he has
received a satisfactory grade, or until the deadline for
taking exam 2 has passed.
Participants in Project CLIC were asked to follow the
procedures outlined below in the order given.
1. To study the materials for each exam in their usual
fashion.
2. To take one attempt at an exam.
3. To listen to a CLIC Tape.
4. To attend a CLIC Class.
5. To retake the exam until satisfied with the grade
obtained.
This set of procedures was to be followed for each of
the first three examinations. There were no CLIC tapes or
classes provided for exams four and five.
Each CLIC tape begins with a discussion of the methods
by which a student could improve his skill in performing a
particular kind of thinking. This is followed by a discussion
of how the kind of thinking can be applied to the chemistry
topic being discussed. A detailed outline of the material
presented on each tape is given in Appendix D.
The CLIC classes followed a similar outline, but more
time was spent applying the kind of thinking skill to exam-
ples from the context of the course. Students were permitted
to ask questions and request that the instructor work problems
from the study guide or from previous tests. Whenever pos-
sible, the instructor would attempt to relate the answer to
a student's question to the kind of thinking being remediated.
Often questions were asked which related to the wrong kind
of thinking. That is, a student in the memorization class
would ask a reasoning with math type question. When this
occurred the instructor would simply work the problem with-
out relating it to a particular kind of thinking.
Evaluation of the Remedial System
In the evaluation of the effectiveness of the CLIC
Project, two distinct factors were considered. They can
best be described by restating two of the research ques-
tions posed in the first chapter.
Research Question b. Initial Learning
"Do the remedial materials which focus on the kind
of thinking required in a specific class of questions
covering a particular topic in chemistry improve stu-
dent performance on that class of questions for that
particular topic?"
Research Question c. Transfer of Training
"Once students receive remediation on a particular
kind of thinking as it applies to general chemistry in
one topic, will there be any improvement in their
performance on questions requiring the same kind of
thinking but covering a different topic? That is, is
there transfer of training of a particular kind of
thinking to new content in the course?"
The initial learning score, which is related to
research question b, is defined as the percent correct
responses to questions which require the kind of thinking
for which the student was diagnosed in need of remediation
and which were answered during a student's second and sub-
sequent tries of exams 1, 2, and 3. This dependent variable
measures the effect of remediation which focuses on the kind
of thinking required in a specific class of question
covering a particular topic in chemistry, on student per-
formance on that class of questions for that particular
topic. Therefore, a comparison of values of the initial
learning score for experimental and control groups will
provide the necessary data to answer research question b.
The specific hypotheses developed for research question
b are as follows:
Hypothesis b1. For students diagnosed in need of
memorization remediation, the experimental group will
score higher than the control group on the initial
learning score.
Hypothesis b2. For students diagnosed in need of
reasoning with math remediation, the experimental
group will score higher than the control group on
the initial learning score.
To answer research question c, we define the transfer
score as the percent correct responses to questions which
require the kind of thinking in which the student is
deficient, and which were answered during the student's first
try of exams 2 and 3, and all tries of exams 4 and 5, and
the final. This dependent variable is a measure of a
student's performance on a particular class of questions
when new chemistry content is introduced. Therefore, an
examination of values of the transfer score for experimental
and control groups will provide the necessary data to answer
research question c.
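The two dependent variables can be restated as filters over a student's answer records. The sketch below is illustrative only: the `Response` record and its field names are assumptions, not the layout of the course's data files, and the scores are returned as proportions (the form in which they appear in Table V) rather than as percentages.

```python
from dataclasses import dataclass


@dataclass
class Response:
    exam: str      # "1" through "5", or "final" (hypothetical coding)
    attempt: int   # 1 = first try of that exam
    kind: str      # kind of thinking required by the question
    correct: bool


def _proportion_correct(selected):
    selected = list(selected)
    if not selected:
        return None  # no qualifying questions were answered
    return sum(r.correct for r in selected) / len(selected)


def initial_learning_score(responses, diagnosed_kind):
    """Second and subsequent tries of exams 1, 2, and 3."""
    return _proportion_correct(
        r for r in responses
        if r.kind == diagnosed_kind
        and r.exam in {"1", "2", "3"}
        and r.attempt >= 2
    )


def transfer_score(responses, diagnosed_kind):
    """First try of exams 2 and 3, and all tries of exams 4, 5, and the final."""
    def is_new_content(r):
        first_try = r.exam in {"2", "3"} and r.attempt == 1
        later_exam = r.exam in {"4", "5", "final"}
        return first_try or later_exam

    return _proportion_correct(
        r for r in responses
        if r.kind == diagnosed_kind and is_new_content(r)
    )
```

Note that the first try of exam 1 enters neither score, since it precedes any remediation.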
The specific hypotheses developed from research
question c are as follows:
Hypothesis c1. For students diagnosed in need of
memorization remediation, the experimental group will
score higher than the control group on the transfer
score.
Hypothesis c2. For students diagnosed in need of
reasoning with math remediation, the experimental
group will score higher than the control group on the
transfer score.
Each of the hypotheses was tested using a simple
t-test with α = .05. The results of this analysis are shown
in Table V. For both the memorization and the reasoning
with math classes there was a significant difference
between the experimental and control groups for the initial
learning score but not for the transfer score. Thus, the
remedial classes appear to be effective at the time of
remediation, but the learning does not appear to transfer to any
new content.
Table V. T-test Evaluation of Project CLIC

Memorization Class - Initial Learning Score
                     Mean Initial     Standard               P
               N     Learning Score   Deviation   T-Value    Less Than
Experimental   15    .741             .076        5.39       .001
Control        10    .477             .167

Memorization Class - Transfer Score
                     Mean             Standard               P
               N     Transfer Score   Deviation   T-Value    Less Than
Experimental   15    .350             .269        .04        .969
Control        10    .346             .203

Reasoning With Math Class - Initial Learning Score
                     Mean Initial     Standard               P
               N     Learning Score   Deviation   T-Value    Less Than
Experimental   14    .667             .091        4.52       .001
Control        15    .522             .082

Reasoning With Math Class - Transfer Score
                     Mean             Standard               P
               N     Transfer Score   Deviation   T-Value    Less Than
Experimental   14    .418             .124        .16        .872
Control        15    .426             .144
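The t-values in Table V can be approximately recovered from the tabulated group sizes, means, and standard deviations. The text says only that a "simple t-test" was used; the sketch below assumes the pooled-variance (equal variance) form of the two-sample t statistic, which is an assumption on our part. Under that assumption the recomputed values for the two initial learning comparisons fall close to those reported.

```python
from math import sqrt


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic from summary statistics.

    Assumes equal population variances (pooled-variance form).
    Returns the t value and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# Memorization class, initial learning score (values from Table V).
t_mem, df_mem = pooled_t(15, 0.741, 0.076, 10, 0.477, 0.167)

# Reasoning with math class, initial learning score (values from Table V).
t_rwm, df_rwm = pooled_t(14, 0.667, 0.091, 15, 0.522, 0.082)

print(round(t_mem, 2), df_mem)
print(round(t_rwm, 2), df_rwm)
```

Because the tabulated means and standard deviations are rounded to three decimals, agreement with the reported t-values can only be expected to about the second decimal place.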
SUMMARY AND DISCUSSION
In this chapter results of the study will be summar-
ized and interpreted. The results of a test of each of the
hypotheses will also be presented.
Overview of the Project
This project has produced and tested a unique diag-
nostic remedial system which is based upon a classification
of the tasks which students are asked to perform in a
general chemistry course. The classification scheme is
based on the kind of thinking required by the test questions
used in the course. The scheme has been evaluated by first
examining the extent of agreement obtained when content
experts classify questions and second by calculating inter-
item correlation coefficients for questions grouped according
to the categories proposed.
Two classes of questions (reasoning with math and
memorization) were chosen as the basis of remediation.
Remedial materials and classes were made available to
students during the first half of Chemistry 131, Spring
Term, 1976. The effectiveness of this remediation was
measured by monitoring student performance on a specific
class of questions during the term. A distinction was
made between scores obtained from tests involving content
discussed in remediation and scores obtained when the
student was being introduced to new material. The latter
represents the effect of transfer of learning to think in
a particular way to new material in the course. This
transfer of training represents the ultimate goal of this
type of remediation.
The Validity of the Classification Scheme
Reliability
The reliability of the questions used in this study
compares favorably with the reliability reported for
standard achievement tests when adjusted for length. Since
the same bank of questions is used each term to create the
individual tests, questions that are misinterpreted by stu-
dents and questions which tend to mislead students have
been systematically removed from the file. This process
has created a set of questions which have withstood a great
deal of scrutiny by both students and faculty and are there-
fore considered to be good tests of student achievement.
It is important to remember that this test of reliability
refers to the measurement of overall achievement in the
course, and not to the measurement of a student's ability
to perform a particular kind of thinking. The latter would
be obtained by comparing scores on arbitrary halves of a
group of questions of the same kind of thinking. This type
of analysis was reported by Tannenbaum12 in his evaluation
of science processes. In the present study, the individual
fifteen item tests did not provide a large enough sample
of questions to permit this type of analysis to yield any
useful results.
Interclassifier Agreement
Table VI summarizes the results obtained when content
experts who are familiar with the specific nature of the
course attempt to classify questions according to the
proposed scheme. The agreement for the memorization and
reasoning categories, which is much higher than the other
categories, may well be due to the large number of questions
which require these kinds of thinking. The discrepancies
which did occur were usually caused by a classifier omitting
a kind of thinking because it was trivial compared to another
kind of thinking required by the question. This problem
also accounted for most of the discrepancies which occurred
for the translation, visualization and math categories.
There was considerable discussion among the chemistry
faculty concerning the distinction between reasoning and
memorization. Questions which required a very simple
application of a principle were considered by some to be
memorization and by others to be reasoning. For example,
a question which requires the determination of the density
of an object given its mass and volume, would be classified
according to the scheme as a reasoning question because it
requires the student to apply the relationship between
density, mass and volume. It has been argued that this
question requires only the memorization of the relation-
ship and the reasoning is trivial. To resolve this problem,
one might measure the correlation between these kinds of
questions and questions which are definitely in the memory
category and compare the result with correlations between
these kinds of questions and questions which are definitely
in the reasoning category. That is, let an analysis of
student performance determine the category into which these
types of questions would be placed.

Table VI. A Summary of Interclassifier Agreement.

Kind of Thinking     Percentage Agreement
Memorization         92.6
Reasoning            95.1
Translation          75.8
Classification       75.0
Visualization        75.9
Math                 77.8

The category called "classification" proved to be
extremely difficult to use because it is actually a sub-
set of the reasoning category. The differences between
these two categories are too subtle to yield reliable
classifications.
It has been suggested that the ease and reliability
of question classification might be improved by asking the
classifier to simply choose the one most important kind of
thinking in a particular question. The "most important"
kind of thinking would be defined as the kind of thinking
which would be most likely to cause the student to miss
the question. This kind of analysis would eliminate de-
ciding whether a kind of thinking was too trivial to include
in the characterization of the question but adds a decision
concerning the relative importance of more than one kind
of thinking when several are required by a question.
The data presented in Table II amplify the need
for some type of modification of the classification scheme.
It would seem appropriate to deal specifically with the
translation, visualization and math categories and to
consider the elimination of the category of classification.
Restricting each question to only one category may also
improve the usefulness of the classification scheme.
We recognize, however, that it may be unreasonable
to classify a question into only one category when the
question clearly requires two different kinds of thinking.
A further refinement of the classification scheme may be
to assign weighting factors to indicate the relative
importance of the contributing categories.
Analysis of Correlation Coefficients
A comparison of inter-item correlation coefficients has
been made for questions grouped according to content and
kind of thinking. The first hypotheses tested by this
analysis are as follows:
Hypothesis 1. The average of correlation coefficients
for group A will be higher than the average for group B.
Hypothesis 2. The average of correlation coefficients
for group C will be higher than the average for group D.
A planned comparison between the correlations among
group A questions and the correlations among group B
questions revealed a significant difference for only the
reasoning with math category (see Table III). Thus,
for questions of the same content, the fact that they are
also of the same kind of thinking will significantly raise
the correlation only for the reasoning with math category.
For the memorization category, the average correlation for
group A was substantially greater than that for group B
but the difference was not significant at the α = 0.05 level.
The data indicate that a subsequent analysis using a larger
sample of tests would probably produce a significant dif-
ference between groups A and B for the memorization category.
A test of the differences in correlations for groups C
and D revealed significant differences for only the mem-
orization and reasoning with math categories. That is,
when one compares correlations between pairs of items each
from a different topic, the fact that the items are from
the same kind of thinking seems to increase the correlation
for the reasoning with math and the memorization categories
but not for the visualization, translation, or reasoning
categories.
A comparison which was not tested statistically but can
be made informally is that between the average correlation
coefficients for groups B and C. An examination of Table
II reveals that for most categories the differences
between these two groups are relatively small. That is, the
correlations between questions of the same content but
different kinds of thinking do not differ appreciably from
the correlations between questions of different content but
the same kind of thinking. It was initially thought that the
content which a question is testing would be the most
important factor controlling student performance on that question.
In this study the constraint of using fifteen item tests
as the basis of the analysis forced a rather broad defini-
tion of each content category. If a stricter delimiting of
content categories were used, the content correlation would
probably increase significantly.
The third hypothesis of this part of the study is:
Hypothesis 3. The average correlation between items
from the same topic will be higher than the average
correlation between items from different topics.
This hypothesis was supported for all of the categories
tested except for visualization. The distribution of
visualization questions was such that relatively few cor-
relation coefficients could be obtained for most tests.
Thus the averages tended to fluctuate more than they did
for the other categories. The data obtained for visualiza-
tion are probably a result of this fluctuation.
The hypothesis most directly related to the validation
of the categories is hypothesis 4 which states:
Hypothesis 4. For each of the six major classes of
questions, the average correlation between items of
the same class will be greater than the average cor-
relation between items from different classes.
The data shown in Table IV support this hypothesis
for the categories memorization, reasoning and reasoning
with math. The hypothesis is not supported for visualiza-
tion and translation and was not tested for classification.
Unlike the visualization category, the translation cate-
gory showed a significant topic main effect but did not
show a kind of thinking main effect. Thus the category
translation is not validated by student performance. It
may be that translation, while identifiable, is too closely
related to the more prevalent categories memorization and
reasoning to be distinguishable by an examination of stu-
dent performance. To summarize this part of the study,
we can say that questions requiring memorization tend to
correlate higher with one another than they do with ques-
tions requiring some other kind of thinking. This is also
true for reasoning and reasoning with math questions.
Evaluation of Project CLIC
The categories of memorization and reasoning with math
were chosen to be the basis for the experimental
remediation. The evaluation hypotheses are divided into two
parts: the first comprises those related to student progress
with material being learned during remediation, and the
second those related to achievement when new chemistry
content is introduced. These latter hypotheses have been
identified as the transfer hypotheses because they relate
to the student's ability to transfer what he has learned
about a particular kind of thinking to new content in the
course.
The initial learning score has been defined as the
proportion of correct responses to questions from topics which have
been discussed in the context of the kind of thinking remedia-
tion. The transfer score is the proportion of correct
responses to questions from topics not yet discussed during
remediation. The hypotheses concerning the initial learning
score are as follows.
Hypothesis b1: For students diagnosed in need of
memorization remediation, the experimental group will
score higher than the control group on the initial
learning score.
Hypothesis b2: For students diagnosed in need of
reasoning with math remediation, the experimental group
will score higher than the control group on the initial
learning score.
For the memorization group, the average value of the
initial learning score was 0.741 for the experimental group
and 0.477 for the control group. The probability that these
values are different only by chance is less than 1 in 1000.
For the reasoning with math group, the values were 0.667
for the experimental group and 0.522 for the control group.
This difference is also highly significant. These data
indicate that students who received remediation on a
particular kind of thinking did better on questions requir-
ing that kind of thinking than did students who received
remediation on some other kind of thinking. Unfortunately,
these results do not prove that the kind of thinking
addressed by the remediation was the determining factor.
In attempting to teach the student to think in a particular
way, we used specific examples from the present content of
the course. It may be that the students simply learned
from these examples and from the additional presentation
of the related content. This would not have been as sig-
nificant a problem if the two kinds of thinking were evenly
distributed throughout the various topics and subtopics of
the course. This, however, was not the case. Thus, the
tests of initial learning for the control group probably
contained less of the content discussed in remediation than
the test for the experimental group.
The hypotheses related to the question of
transfer are as follows.
Hypothesis c1: For students diagnosed in need of
memorization remediation, the experimental group will
score higher than the control group on the transfer
score.
Hypothesis c2: For students diagnosed in need of
reasoning with math remediation, the experimental group
will score higher than the control group on the transfer
score.
Neither of these hypotheses has been supported by the
data. For the memorization class, the average value of the
transfer score was 0.350 for the experimental group and 0.346
for the control group. The difference is not significant
at the α = .05 level. For the reasoning with math class,
the value for the experimental group was 0.418 and the
value for the control group was 0.426. The difference
between these two values is also not significant.
With this type of data, there are many alternative
hypotheses which can be put forth. I will mention two and
discuss each briefly.
The data indicate that the remediation had some positive
effect but they do not support the statement that this
positive effect was the result of improving students'
ability to think in a particular way. It may be that the
remediation failed to teach the students very much about
the kind of thinking involved but instead simply presented,
once again, a certain segment of the material. Since
the remedial material suggested specific methods and
strategies for the students to use, it was possible to
determine if these methods were being used successfully.
An informal check of the notebooks of five students
indicated that three of the five were using most of the
techniques and two were not using them at all. During the
reasoning with math remediation, the instructor would often
ask students to work problems using the techniques that
had been discussed. Many of the students did not demonstrate
that they had mastered the technique of dimensional analysis
which was one of the topics discussed in the class. In
general, one could conclude that some of the students
simply did not master the key techniques and did not learn
the key concepts of the remedial material. This suggests
that the remedial classes could be more effective if based
on a mastery model designed so that all students mastered
the basic skills and ideas of the remedial class. There
would, however, be practical problems involved in requiring
mastery of material which would be viewed by students
as supplementary to the material of the course.
Another alternative hypothesis is that students who
mastered the skills presented in the remedial class were
unable to apply those skills when dealing with new material
in the course. Learning to apply one's knowledge to new
situations has always been an important goal of education.
Many remedial programs in science education have been
based on the idea that the mastery of certain basic skills
and ideas which are applied in a discipline will improve
one's learning in that discipline. This is very appealing
because once the student has learned the skill, he will
supposedly be able to use that skill throughout his study.
What is sometimes forgotten is that knowing how to do some-
thing is not exactly the same as knowing when to do some-
thing. A student who has learned the technique of dimen-
sional analysis, for example, may be able to apply it to
a problem if dimensional analysis is fresh in his mind or
if he is instructed to use the technique, but may not
think to apply it to a new problem encountered a week later.
Successful attempts to teach a general learning skill and
to demonstrate the transfer of that skill have been
reported.24 Hopefully more of this type of research will
be forthcoming.
The study presented here represents an initial step
in the application of current ideas in the field of edu-
cational psychology to the teaching of freshman chemistry.
We obviously have a great deal to learn about the teaching
and learning of chemistry at this level.
IMPLICATIONS FOR FUTURE RESEARCH AND DEVELOPMENT
Like many research projects, this study has created
more questions than it has answered. It has also initiated
the development of a unique learning laboratory for the
study of the learning of chemistry at the college level.
This chapter will outline the characteristics and poten-
tial of this learning laboratory and will also present
several proposals for a continuing research program in
chemical education.
A Chemical Education Laboratory
The sequence of courses to which this study was ap-
plied has an average enrollment of approximately 1500
students per term. The instruction which is delivered to
the students through the taped cassettes can be thought
of as a very controllable and well defined experimental
treatment. One knows exactly what information has been
communicated through the tapes to the student. This
information does not vary uncontrollably from term to term as does
the information delivered via a lecture mode. The instru-
ments used to measure achievement are created from a bank
of questions which can also be easily controlled.
Although each question is used only once during a
term, it is in most cases used again in subsequent terms.
By studying student performance on individual questions
or groups of related questions one can gain valuable
information about which concepts or ideas are being
communicated effectively and which parts of the course need
improvement. Also, any time that modifications of instructional
materials are made, they can be easily tested by
establishing an experimental group which studies the new
material and comparing this group's achievement with that of
a control group which studies the old material. In this manner,
objective evidence concerning the effectiveness of
instruction can be routinely obtained.
The computer management system needed to compile and
analyze the data has been established and is presently
working well, although many improvements have been proposed.
As described in Chapter 3, the students mark their answers
on machine scorable answer sheets which are then processed
by an optical scanner linked to a CDC 6500 computer.
Student performance data is stored in disc files which in
this study are analyzed by the Statistical Package for the
Social Sciences subroutines. This computerized record
keeping and data analysis system permits the processing
of thousands of pieces of data with a relatively small
expenditure of resources. Students take an average of two
attempts at each exam. Including the final, there are a
total of 13 exams administered during the two term sequence.
Obviously, this amount of testing creates a large quantity
of data in a relatively short time.
Computerized record keeping also makes it easier
to store from term to term statistical data on each item
in the question bank. Presently, only an index of difficulty,
defined as the proportion of students who get the question
wrong, is being stored along with the number of students
who have answered the question. There is theoretically no
limit to the information concerning each question which
could be stored on the question file. Considering the
number of students who take introductory chemistry and
considering the increasing numbers of students who find
the kinds of thinking required in introductory chemistry
difficult, it would appear that the information to be
gained from this learning laboratory would be an important
contribution to chemistry instruction. Specific sugges-
tions for research and development studies are outlined
in the next section.
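The difficulty index defined above is simple to compute and to carry forward from term to term. The following Python sketch is illustrative only: the function names and the sample data are assumptions made here and are not part of the course's computer management system.

```python
def difficulty_index(responses):
    """Return the proportion of students who answered the question wrong.

    `responses` is a list of booleans, True meaning a correct answer.
    """
    if not responses:
        raise ValueError("no responses recorded for this question")
    wrong = sum(1 for correct in responses if not correct)
    return wrong / len(responses)


def pooled_index(old_index, old_count, new_index, new_count):
    """Combine a stored index with a new term's index, weighted by student count."""
    total = old_count + new_count
    return (old_index * old_count + new_index * new_count) / total


# Hypothetical data: five students, three of whom missed the question.
term_one = [True, False, False, True, False]
print(difficulty_index(term_one))
# Pool with a second (hypothetical) term of 15 students and an index of 0.2.
print(pooled_index(difficulty_index(term_one), len(term_one), 0.2, 15))
```

Storing the number of students alongside the index, as the question file already does, is what makes this kind of pooling across terms possible.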
The Difficulty Index
As mentioned earlier, the routine processing of informa-
tion includes calculating a difficulty index for each ques-
tion. Preliminary studies indicate that these indices
vary dramatically. There is a significant number of ques-
tions which over 90 percent of the students get wrong and
there is also a significant number which less than 10
percent get wrong. To this author's knowledge, most
instructors using the mastery approach, which requires
repeated examinations, assume that their alternate exam
forms are of approximately the same difficulty. The pre-
liminary data obtained in this study indicate that this is
probably not the case. It is therefore recommended that a
careful study of exam form difficulty be undertaken and,
if necessary, a method be established to keep the dif-
ficulty of exams to within an acceptable limit.
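One way such a check on exam form difficulty might be carried out is sketched below in Python. The form numbers, the difficulty indices, and the tolerance of 0.2 are hypothetical values chosen for illustration; the choice of an "acceptable limit" is left open by this study.

```python
from statistics import mean


def forms_outside_limit(forms, tolerance):
    """Return the forms whose mean item difficulty differs from the mean of
    all forms by more than `tolerance`.

    `forms` maps an exam form number to the difficulty indices of its items.
    """
    per_form = {form: mean(indices) for form, indices in forms.items()}
    overall = mean(list(per_form.values()))
    return {form: d for form, d in per_form.items() if abs(d - overall) > tolerance}


# Hypothetical difficulty indices for three alternate forms of one exam.
forms = {
    217: [0.2, 0.4],
    218: [0.3, 0.3],
    219: [0.8, 0.9],
}
flagged = forms_outside_limit(forms, tolerance=0.2)
print(sorted(flagged))
```

A form flagged in this way would be a candidate for having some of its questions exchanged with easier or harder items from the question bank.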
A related area of interest is the relationship between
the difficulty of a question and the kind of thinking (as
defined by this study) which the question requires. An
informal inquiry into this question indicates no difference
in the average difficulty of the various classes of ques-
tions. The memory questions, for example, do not appear to
be any easier or any more difficult than the reasoning with
math questions. This needs to be studied more carefully
on a long term basis.
Throughout the present research project there has been
concern expressed about the effect of not taking into ac-
count the inherent difficulty of a question when attempting
to validate the classification scheme. It seems logical
that a student who answers three difficult memory questions
correctly should receive a higher score in "memory ability"
than a student who answers three easy memory questions.
The details of assigning some type of weighting factor for
the purposes of this analysis need to be considered care-
fully. An alternative to the weighting factor would be
to control the difficulty of the questions used so that
the scores obtained are the result of questions of
approximately the same difficulty.
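A weighting factor of the kind discussed above could take many forms. The Python sketch below assumes one simple possibility, in which each question is weighted by its difficulty index; this scheme and the sample values are assumptions made for illustration, not a procedure used in this study.

```python
def weighted_score(items):
    """Score a subtest with each question weighted by its difficulty index.

    `items` is a list of (difficulty_index, answered_correctly) pairs.
    Returns the weight earned divided by the total weight available.
    """
    total = sum(difficulty for difficulty, _ in items)
    if total == 0:
        return 0.0
    earned = sum(difficulty for difficulty, correct in items if correct)
    return earned / total


# Two hypothetical students, each answering three of six memory questions.
hard_right = [(0.8, True)] * 3 + [(0.2, False)] * 3   # got the difficult ones
easy_right = [(0.8, False)] * 3 + [(0.2, True)] * 3   # got the easy ones
print(weighted_score(hard_right), weighted_score(easy_right))
```

Both students would receive 50 percent on an unweighted basis; under this assumed weighting the student who answered the difficult questions receives the higher score.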
Alternative Validity Studies
There have been many suggestions made pertaining
to the validity test of the proposed classification cate-
gories. This section will discuss two of these plans.
The analysis of correlation coefficients was performed
using fifteen item examinations as the basic instrument.
This produced a relatively small number of correlation co-
efficients in each of the groups A, B, C, and D which in
turn caused the averages to be calculated from as few as
three coefficients. This has in some cases produced an
unstable statistic and may account for the inability to
obtain significant differences. If a substantially longer
exam were given, the number of inter-item correlation co-
efficients would also increase as would the stability of
the average. The number of useful coefficients obtained
from a test of n items is $\frac{1}{2}n(n-1)$.
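To see how quickly the number of coefficients grows with test length, the count can be tabulated directly. The short Python sketch below is illustrative; the test lengths other than fifteen are hypothetical.

```python
def coefficient_count(n_items):
    """Number of distinct inter-item correlation coefficients, n(n-1)/2."""
    return n_items * (n_items - 1) // 2


for n in (15, 30, 45):
    print(n, coefficient_count(n))
```

Doubling the length of a fifteen item exam roughly quadruples the number of coefficients available to be divided among the four groups.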
It has also been pointed out that the present classifica-
tion scheme permits the assignment of a question to more
than one class of questions. For example, a question
characterized as Rs (R1, V) would be included in both the
visualization and the reasoning category even though one of
the two may have no impact on performance on the question.
The alternative is to classify each question into only one
category, that category being the kind of thinking which
is most likely to cause the student to miss the question.
This strategy of classification is more likely to group
similar questions together and hence should improve the
average correlation between questions of the same class,
and may even result in a significant difference for the
visualization and translation categories.
Selecting the Sample
A student's ability to think in a particular way is
obviously only one of the factors that influence perfor-
mance in Chemistry 130-131. Basic interest in chemistry
and motivation to study are also very important factors.
Most of the students enrolled in this sequence of courses
are not chemistry majors but are taking the courses be-
cause they are required to do so by their major area.
In selecting those students who scored below 60 percent
on a particular kind of thinking, we chose students who were
for the most part in the bottom 40 percent of the class
when ranked by gradepoint in CEM 130. It is quite probable
that this group on the average has a lower level of interest
and motivation to study chemistry than a group ranking from
40 percent to 75 percent in class average. It has been
suggested that the CLIC project may have been more success-
ful if applied to this latter group, since these students
are likely to be more motivated and interested in the study
of chemistry. If the CLIC experiment could be run with
both of these groups simultaneously some interesting com-
parisons of achievement, improvement and participation could
be made.
A Piagetian Classification
A theory of intellectual development which has had an
impact on the teaching of chemistry at the college level
is that advanced by Swiss psychologist Jean Piaget. An
important aspect of Piaget's theory is his "stages of
intellectual development" which are listed below.
1. Sensori-motor stage (0-2 years)
2. Preoperational stage (2-7 years)
3. Concrete operations (7-11 years)
4. Formal operations.
Each stage represents a distinct set of abilities which
are usually developed during the ages indicated. Piaget
has designed many different tests of intellectual develop-
ment which are supposed to identify in which of the four
stages a person is operating. As mentioned earlier, recent
applications of the tests to college students indicate that
many of these students do not demonstrate formal thinking
in specific situations. In simple terms, this means that
the student does not deal successfully with problems re-
quiring the formation and testing of hypotheses involving
relationships between variables, or problems requiring
the control of one variable, by systematically testing
all possibilities.
Chemists have begun to look critically at the informa-
tion concerning the intellectual development of college
students and ask what effect this situation might have on
the teaching of college chemistry.35-37 It is generally
agreed that much of the thinking required in introductory
chemistry is at the formal level. We are only beginning
to sort out specifically which of the various tasks re-
quired of an introductory chemistry student would not be
done by someone not demonstrating formal thinking. Some
initial work on this question has been reported by Herron38
who identifies tasks which he believes require formal
thought but does not present any empirical evidence to
support the identification.
The kind of analysis employed in the present study
would be ideally suited to the development and testing of
a classification scheme based on Piaget's concrete and
formal operations levels of intellectual development.
Questions used in Chemistry 130-131 could be classified
according to the Piagetian level required and a correlation
analysis could be run. In addition, the score on a set
of items from a chemistry test could be compared to scores
on a traditional Piagetian test. If a reliable classifica-
tion of general chemistry questions can be made, then
a diagnosis of the Piagetian level demonstrated by students
taking chemistry could be routinely obtained.
Furthermore, procedures for increasing the students'
tendency to think at the level of formal operations could
be developed in the context of a freshman chemistry course.
This last development would be of extreme importance to
instruction since so little presently is known about how
or if a person who is not in the practice of thinking at
the formal level in a particular situation, can be taught
to do so.
REFERENCES
1. B. S. Bloom, Taxonomy of Educational Objectives: Cogni-
   tive Domain, David McKay, New York (1956).
2. K. V. Fast, Dissertation Abstracts, g1, 2194A (1972).
3. P. W. Airasian, Science Education, g5, 91-95 (1970).
4. H. V. Scott, Science Education, £1, 291-296 (1973).
5. American Association for the Advancement of Science,
   Science - A Process Approach, Washington, D.C. (1964).
6. E. L. Smith, American Educational Research Association
   Annual Meetings, Chicago, Illinois (1974).
7. R. M. Gagne, Interchange, 3, 1-8 (1972).
8. R. M. Gagne, The Essentials of Learning for Instruction,
   Dryden, Hinsdale, Illinois (1974).
9. R. M. Gagne, Educational Psychologist, 6, 1-9 (1968).
10. R. M. Gagne, Psychological Review, 69, 355-365 (1962).
11. R. M. Gagne and L. T. Brown, Journal of Experimental
    Psychology, 62, 313-321 (1961).
12. R. S. Tannenbaum, Journal of Research in Science Teaching,
    8, 123-136 (1971).
13. C. H. Stedman, Journal of Research in Science Teaching,
    12, 235-241 (1973).
14. Jean Piaget, Journal of Research in Science Teaching, 3,
    176-186 (1964).
15. H. Beilin, Piagetian Research and Mathematical Education,
    National Council of Teachers of Mathematics, Washington,
    D.C. (1970).
16. T. A. Bredderman, Journal of Research in Science Teach-
    ing, 19, 189-200 (1973).
17. J. W. McKinnon, American Journal of Physics, 32, 1047-52
    (1971).
18. A. E. Lawson and J. W. Renner, Science Education,
    59, 545-559 (1974).
19. D. Griffiths, Unpublished Ed. D. Dissertation, Rutgers
    University, New Brunswick, NJ (1973).
20. F. W. Danner and M. C. Day, American Educational Re-
    search Association Annual Meeting, San Francisco, CA,
    April (1976).
21. M. R. Lawler and M. Riser, The Journal of Experimental
    Education, 55, 45-52 (1974).
22. B. Z. Shakhashiri, Journal of Chemical Education, 52,
    588-592 (1975).
23. D. M. Riban, Journal of Research in Science Teaching,
    3, 72-82 (1969).
24. J. H. Larkin, American Association of Physics Teachers
    Meeting, Chicago, Illinois (1975).
25. R. N. Hammer, 167th National Meeting, American Chemical
    Society, Los Angeles, California, March 1974.
26. Benjamin S. Bloom, Evaluation Comment, 2 (1968).
27. Fred S. Keller, Journal of Applied Behavioral Analysis,
    2 (1968), 79-89.
28. James H. Block (ed.), Mastery Learning: Theory and
    Practice. New York: Holt, Rinehart and Winston, Inc.,
    1971.
29. R. H. Davis, L. T. Alexander, and S. L. Yelon, Learning
    System Design, McGraw-Hill Book Company, New York
    (1974).
30. R. L. Ebel, Essentials of Educational Measurement,
    Prentice-Hall, Inc., Englewood Cliffs, New Jersey (1972).
31. E. Kales, Personal communication, March 1976.
32. D. T. Campbell and D. W. Fiske, Psychological Bulletin,
    55, 82 (1959).
33. G. V. Glass and J. C. Stanley, Statistical Methods in
    Education and Psychology, Prentice-Hall, Inc., Englewood
    Cliffs, New Jersey (1970).
34. D. G. Boyle, Students' Guide to Piaget, Pergamon Press,
    New York (1969).
35. J. Piaget and B. Inhelder, The Early Growth of Logic
    in the Child, W. W. Norton and Company, Inc., New York
    (1969).
36. B. S. Craig, Journal of Chemical Education, 45, 807
    (1972).
37. D. W. Beistel, Journal of Chemical Education, 52, 151
    (1975).
38. J. D. Herron, Journal of Chemical Education, 52, 147 (1975).
APPENDIX A
FORTRAN PROGRAMS
The following are listings of the FORTRAN programs
used to perform the data analysis for this study. Each
listing begins with a brief description of the function of
the program.
Program READ
Program READ requests information from a nine-track
tape prepared at the scoring center. A disc file (student
record file) is created in which each record contains the
student number, exam form number and an indication of the
student's responses to each question.
FTN.
MAP(OFF)
ATTACH,TAPE12,WAITSP76131,PW=OLIVIA.
LGO.
UNLOAD,TAPE12.
REWIND,TAPE14.
REWIND,TAPE15.
SORTMRG.CHEM131
CATALOG,TAPE69,SAVETHIS,RP=999,ID=BOB,TK=OLIVIA.
REWIND,TAPE69.
LISTTY,I=TAPE69,B,NS,1-20.
PROGRAM A(INPUT=64,OUTPUT=512,TAPE12=512,TAPE14=512,
XTAPE6=512,TAPE5=512,TAPE89=64,TAPE50=512,TAPE51=512,TAPE36=512,
XTAPE37=512)
DIMENSION JK(15),IANS(15),LD(3),IQCP(4),IACP(4,12),HOLD(15),
XIMZT(3),LTE(6),LF(3)
IEND=0
LS=9
ACOUNT=0.
DO 232 NMR=1,15
232 HOLD(NMR)=0.
IQZ=0
ICHM=0
ICP=0
DO 2 JL=1,20000
C READS THE 9 TRACK TAPE FOR NEW INFORMATION.
READ(12,1)IT,ID,(JK(NA),NA=1,15)
1 FORMAT(I6,I3,15R1)
IF(EOF(12).NE.0)GO TO 98
IF(IT.NE.1)GO TO 68
C STUDENT NO. 1 READ AS SPECIAL INFORMATION
ILQ=0
DO 78 IXD=1,4
78 IQCP(IXD)=0
DO 66 NJ=1,3
66 LD(NJ)=JK(NJ)
REWIND 89
WRITE(89,160)IT,ID,(JK(NA),NA=1,15)
160 FORMAT(I6,I3,15R1)
REWIND 89
READ(89,164)IT,ID,(JK(NA),NA=1,15)
164 FORMAT(I6,I3,3R1,12I1)
LS=JK(9)
IF(IQZ.EQ.0)GO TO 853
GO TO 100
98 IEND=1
DO 2009 NMR=1,3
2009 LF(NMR)=LD(NMR)
LN=LS
100 IF(LN.EQ.1)GO TO 427
IECIS=0
REWIND 50
530 IF(IECIS.EQ.15)GO TO 952
283 READ(50,291)(IMZT(NMR),NMR=1,3),HMZT,(LTE(NMR),NMR=1,6)
291 FORMAT(R1,R1,R1,I2,I6,A10,A10,A10,A10,A10)
IF(EOF(50).NE.0)GO TO 506
GO TO 730
506 PRINT 544,(LF(NMR),NMR=1,3)
544 FORMAT(* HELP130*,5X,I1,I1,I1)
IF(IEND.EQ.1)GO TO 38
GO TO 854
730 IF(IMZT(1).EQ.LF(1).AND.IMZT(2).EQ.LF(2).AND.IMZT(3).EQ.LF(3))804,
X283
804 IECIS=IECIS+1
C ABZT IS THE VALUE OF THE RATIO OF THE NUMBER WRONG TO THE TOTAL (ACOUNT)
ABZT=HOLD(IECIS)/ACOUNT
IZZ=ACOUNT
WRITE(36,299)(IMZT(NMR),NMR=1,3),HMZT,(LTE(NMR),NMR=1,6),IZZ,ABZT
299 FORMAT(R1,R1,R1,I2,I6,A10,A10,A10,A10,A10,I4,F8.6)
GO TO 530
427 IECIS=0
REWIND 51
430 IF(IECIS.EQ.15)GO TO 952
429 READ(51,391)(IMZT(NMR),NMR=1,3),HMZT,(LTE(NMR),NMR=1,6)
391 FORMAT(R1,R1,R1,I2,I6,A10,A10,A10,A10,A10)
IF(EOF(51).NE.0)GO TO 510
GO TO 731
510 PRINT 509,(LF(NMR),NMR=1,3)
509 FORMAT(* HELP131*,5X,I1,I1,I1)
IF(IEND.EQ.1)GO TO 38
GO TO 854
731 IF(IMZT(1).EQ.LF(1).AND.IMZT(2).EQ.LF(2).AND.IMZT(3).EQ.LF(3))805,
X429
805 IECIS=IECIS+1
C ABZT IS THE VALUE OF THE RATIO OF THE NUMBER WRONG TO THE TOTAL (ACOUNT)
ABZT=HOLD(IECIS)/ACOUNT
IZZ=ACOUNT
WRITE(37,296)(IMZT(NMR),NMR=1,3),HMZT,(LTE(NMR),NMR=1,6),IZZ,ABZT
296 FORMAT(R1,R1,R1,I2,I6,A10,A10,A10,A10,A10,I4,F8.6)
GO TO 430
952 DO 230 NMR=1,15
230 HOLD(NMR)=0.
IF(IEND.EQ.1)GO TO 38
ACOUNT=0.
GO TO 854
68 IF(IT.NE.2)GO TO 81
C STUDENT NO. 2 READ AS KEY
82 DO 84 NJ=1,15
84 IANS(NJ)=JK(NJ)
GO TO 2
C STUDENT NO. 3 READ FOR CORRECTIONS TO THE KEY
81 IF(IT.NE.3)GO TO 67
ILQ=1
REWIND 89
WRITE(89,1201)IT,ID,(JK(NA),NA=1,15)
1201 FORMAT(I6,I3,15R1)
REWIND 89
READ(89,1202)IT,ID,(JK(NA),NA=1,15)
1202 FORMAT(I6,I3,2I1,13R1)
ICP=ICP+1
IQCP(ICP)=JK(1)*10+JK(2)
DO 77 ICD=3,12
77 IACP(ICP,ICD)=JK(ICD)
GO TO 2
C CONVERTS RESPONSES (R FORMAT, A 1 IF CORRECT, A LETTER REPRESENTED THE
C RESPONSE CHOSEN IF INCORRECT)
67 DO 17 NJ=1,15
IF(IANS(NJ)-JK(NJ))14,18,14
18 JK(NJ)=1R1
GO TO 17
14 IF(ILQ.EQ.0)GO TO 71
DO 79 NA=1,4
IF(NJ.EQ.IQCP(NA))GO TO 39
79 CONTINUE
GO TO 71
39 DO 75 III=3,12
IF(IACP(NA,III).EQ.1R9)GO TO 71
75 IF(JK(NJ).EQ.IACP(NA,III))GO TO 18
71 JK(NJ)=JK(NJ)-32B
17 CONTINUE
ACOUNT=ACOUNT+1.
DO 210 NMR=1,15
IF(JK(NMR).EQ.1R1)GO TO 210
C HOLD KEEPS TRACK OF THE NUMBER WRONG PER EXAM FORM NUMBER
HOLD(NMR)=HOLD(NMR)+1.
210 CONTINUE
IF(LS-1)94,27,27
94 WRITE(5,2047)IT,ID
2047 FORMAT(I6,I3.3)
C WRITES ON TAPE 14 FOR CHEM 130 STUDENTS
WRITE(14,90)IT,(LD(NA),NA=1,3),ID,(JK(NA),NA=1,15)
90 FORMAT(I6,3R1,I3,15R1)
GO TO 2
27 WRITE(6,1749)IT,ID
1749 FORMAT(I6,I3.3)
C WRITES ON TAPE 15 FOR CHEM 131 STUDENTS
WRITE(15,91)IT,(LD(NA),NA=1,3),ID,(JK(NA),NA=1,15)
91 FORMAT(I6,3R1,I3,15R1)
GO TO 2
853 IQZ=1
854 DO 609 NMR=1,3
609 LF(NMR)=LD(NMR)
LN=LS
IF(ICHM.NE.0)GO TO 126
C WRITES ON TAPE5 FOR ACCESS BY TELETEST FOR CHEM 130 SCORES.
WRITE(5,70)LD(1)
70 FORMAT(*SCORE*/*SCORE*/R1,*,0*)
C WRITES ON TAPE6 FOR ACCESS BY TELETEST FOR CHEM 131 SCORES.
WRITE(6,72)LD(1)
72 FORMAT(*SCORE*/*SCORE*/R1,*,0*)
ICHM=1
126 IF(LS-1)1524,1523,1523
1524 WRITE(5,85)(JK(NA),NA=4,8),LD(1)
85 FORMAT(*CD*/2I1,*.*,2I1/R1,*.0*/*AUTO*/R1)
GO TO 2
1523 WRITE(6,97)(JK(NA),NA=4,8),LD(1)
97 FORMAT(*CD*/2I1,*.*,2I1/R1,*.0*/*AUTO*/R1)
2 CONTINUE
38 WRITE(5,74)
74 FORMAT(*E*)
WRITE(6,105)
105 FORMAT(*E*)
PRINT 1005,JL
1005 FORMAT(* THE LOOP IS NOW AT*,I5)
98 ENDFILE 14
ENDFILE 15
END
SORT(1,1,90)
FILE(TAPE15,S,D,,O,N)
FILE(TAPE69,O,D,,O,N)
KEY(A,C,7,9)
RECORD(I,U,90)
END
SORT(1,1,90)
FILE(TAPE37,S,D,,O,N)
FILE(TAPE69,M,D,,O,N)
FILE(TAPE47,O,D,,O,N)
KEY(A,C,1,5)
RECORD(I,U,90)
SORT(1,1,90)
FILE(TAPE15,S,D,,O,N)
FILE(TAPE68,M,D,,O,N)
FILE(TAPE60,O,D,,O,N)
KEY(A,C,7,9)
RECORD(I,U,90)
END
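For readers unfamiliar with the FORTRAN dialect used in the listing, the central scoring step of Program READ can be paraphrased in a few lines of Python. This is a simplified sketch and not a translation: the record layout follows FORMAT(I6,I3,15R1), but the sample records, the answer key, and the mapping of wrong choices to letters are assumptions made for illustration.

```python
def parse_record(line):
    """Split one fixed-width record: 6-digit student number, 3-digit exam
    form number, then 15 single-character responses."""
    return int(line[0:6]), int(line[6:9]), list(line[9:24])


def score(responses, key):
    """Mark each response '1' if it matches the key; otherwise keep a letter
    for the choice made (assumed mapping: choice 1 -> 'A', 2 -> 'B', ...)."""
    scored = []
    for given, correct in zip(responses, key):
        if given == correct:
            scored.append("1")
        elif given in "123456789":
            scored.append(chr(ord("A") + int(given) - 1))
        else:
            scored.append("?")  # blank or unreadable mark (assumption)
    return scored


# Hypothetical key and two hypothetical student records for exam form 217.
key = list("123451234512345")
records = ["000123217123451234512345", "000456217123451234512341"]

wrong = [0] * 15  # plays the role of HOLD: number wrong per question
for line in records:
    student, form, responses = parse_record(line)
    for q, mark in enumerate(score(responses, key)):
        if mark != "1":
            wrong[q] += 1

# Ratio of number wrong to number of students, as ABZT in the listing.
difficulty = [w / len(records) for w in wrong]
print(difficulty)
```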
Program PERCENTAGES
Program PERCENTAGES processes the ECI file and the student record
file and calculates the proportion of questions from a specified class
of questions, which the student has answered correctly.
ATTACH,TAPE51,SPMP,PW=OLIVIA.
ATTACH,TAPE52,SPMR,PW=OLIVIA.
ATTACH,TAPE53,SPM1,PW=OLIVIA.
ATTACH,TAPE54,SPM2,PW=OLIVIA.
SORTMRG.
SORTMRG.
REWIND,TAPE2.
REWIND,TAPE8.
ATTACH,TAPE10,FINAL131,PW=OLIVIA.
FTN.
LGO.
CATALOG,TAPE15,SP76FINAL131,ID=BOB,RP=999,TK=OLIVIA.
REWIND,TAPE15.
COPYSBF,TAPE15,OUTPUT.
SORT(2,1,90)
FILE(TAPE51,S,D,,O,N)
FILE(TAPE52,S,D,,O,N)
FILE(TAPE2,O,D,,O,N)
KEY(A,C,1,5)
RECORD(I,U,90)
END
SORT(2,1,90)
FILE(TAPE53,S,D,,O,N)
FILE(TAPE54,S,D,,O,N)
FILE(TAPE8,O,D,,O,N)
KEY(A,C,1,5)
RECORD(I,U,90)
END
PROGRAM B(INPUT=64,OUTPUT=512,TAPE2=512,TAPE8=512,TAPE10=512,TAPE
X14=512,TAPE15=512,TAPE20=512,TAPE80=512)
DIMENSION JK(15),IBMA(15),ICMA(15),IHAD(15),IBT(15),AN(15)
400 ILTC=0
KN=0
KQ=0
KL=0
ILT=0
19 READ(10,3)ISN,K,NS,(JK(NA),NA=1,15)
3 FORMAT(I6,I3,I3,15A1)
IF(EOF(10).NE.0)GO TO 121
4634 BP=0.
BR=0.
BC=0.
IF(ILT.EQ.1)GO TO 139
98 IF(KL-K)62,36,139
36 DO 5 NA=1,15
IF(IBMA(NA).EQ.0)GO TO 139
NL=IBMA(NA)
BP=BP+1.
IF(JK(NA).NE.1H1)GO TO 97
101 BR=BR+1.
97 BC=BR/BP*100.+.5
5 CONTINUE
139 IBP=BP
IBR=BR
IBC=BC
GO TO 39
17 ILT=1
GO TO 139
39 WRITE(14,8)ISN,K,NS,(JK(NA),NA=1,15),IBR,IBP,IBC
8 FORMAT(I6,I3,I2,15A1,I2,I2,I3)
GO TO 19
62 NB=1
DO 72 NA=1,15
72 IBMA(NA)=0
IF(KL.EQ.0)GO TO 78
IBMA(1)=KQ
NB=2
78 DO 52 NZ=NB,16
KL=KN
IF(ILTC.EQ.1)GO TO 17
83 READ(2,1)KN,IBMA(NZ)
1 FORMAT(I3,I2)
IF(EOF(2).NE.0)GO TO 57
IF(KL.EQ.0)GO TO 52
IF(KN-KL)52,52,61
52 CONTINUE
GO TO 61
57 ILTC=1
61 KQ=IBMA(NZ)
IBMA(NZ)=0
GO TO 98
121 REWIND 14
ILTC=0
ILT=0
KN=0
KQ=0
KL=0
16 READ(14,99)ISN,K,NS,(JK(NA),NA=1,15),IBR,IBP,IBC
99 FORMAT(I6,I3,I2,15A1,I2,I2,I3)
IF(EOF(14).NE.0)GO TO 2027
CP=0.
CR=0.
CC=0.
IF(ILT.EQ.1)GO TO 149
88 IF(KL-K)9,46,149
46 DO 26 NA=1,15
IF(ICMA(NA).EQ.0)GO TO 149
NL=ICMA(NA)
CP=CP+1.
IF(JK(NA).NE.1H1)GO TO 87
111 CR=CR+1.
87 CC=CR/CP*100.+.5
26 CONTINUE
149 ICP=CP
ICR=CR
ICC=CC
GO TO 49
27 ILT=1
GO TO 149
49 WRITE(15,18)ISN,K,NS,(JK(NA),NA=1,15),IBR,IBP,IBC,ICR,ICP,ICC
18 FORMAT(I6,I3,I2,15A1,I2,I2,I3,I2,I2,I3)
GO TO 16
9 NB=1
DO 71 NA=1,15
71 ICMA(NA)=0
IF(KL.EQ.0)GO TO 68
ICMA(1)=KQ
NB=2
68 DO 51 NZ=NB,16
KL=KN
IF(ILTC.EQ.1)GO TO 27
84 READ(8,10)KN,ICMA(NZ)
10 FORMAT(I3,I2)
IF(EOF(8).NE.0)GO TO 56
IF(KL.EQ.0)GO TO 51
IF(KN-KL)51,51,41
51 CONTINUE
GO TO 41
56 ILTC=1
41 KQ=ICMA(NZ)
ICMA(NZ)=0
GO TO 88
2027 CONTINUE
END
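The quantity computed by Program PERCENTAGES for each student record, namely the number right, the number possible, and the rounded percentage for one class of questions, can be paraphrased as the following Python sketch. The scored responses and the question numbers are hypothetical; the rounding follows the listing, which adds .5 before truncating to an integer.

```python
def subtest_percent(scored, class_items):
    """Return (number right, number possible, rounded percent) for the
    questions of one class.

    `scored` holds one mark per question ('1' means correct); `class_items`
    lists the 1-based question numbers belonging to the class.
    """
    possible = len(class_items)
    if possible == 0:
        return 0, 0, 0
    right = sum(1 for q in class_items if scored[q - 1] == "1")
    percent = int(right / possible * 100.0 + 0.5)
    return right, possible, percent


# Hypothetical scored record: questions 3, 5 and 10 were missed.
scored = list("11A1B1111C11111")
print(subtest_percent(scored, [1, 3, 5]))
print(subtest_percent(scored, [1, 2, 3]))
```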
Program PRINTOUT
Program PRINTOUT calculates average subtest scores for all tries
of each exam for each student and then a grand average subtest score
for the entire term.
ATTACH,TAPE15,SAVEWAIT76131,PW=OLIVIA.
PNPURGE,PPN=IWAIT131.
ATTACH,TAPE72,IWAIT131,PW=OLIVIA.
SORTMRG.
CATALOG,TAPE2,SP76STUDENTRECORDS,ID=BOB,RP=999,TK=OLIVIA.
REWIND,TAPE2.
MAP(OFF)
FTN.
LGO.
REWIND,TAPE20.
COPYBF,TAPE20,OUTPUT.
SORT(2,1,90)
FILE(TAPE15,S,D,,O,N)
FILE(TAPE72,S,D,,O,N)
FILE(TAPE2,O,D,,O,N)
KEY(A,C,1,9)
RECORD(I,U,90)
END
PROGRAM PR(INPUT=512,OUTPUT=512,TAPE2=512,TAPE20=512,TAPE42=512,
XTAPE43)
DATA J,N,I,IT,IW/5*0/,ZA,ZB,ZC,ZD,AB,AR,BR,DR,ER,BB,DB,EB,SCORE,
XCOUNT,WRMA,WRMB,WRRA,WRRB/18*0./
WRITE(42,1000)
1000 FORMAT(*STUDENT NUMBER*,5X,*M(1)*,5X,*M(1-3)*,5X,*M(4-6)*,5X,*R(1)
X*,5X,*R(1-3)*,5X,*R(4-6)*,5X,*TOTAL PERCENT*)
2 JK=J
IF(JK.EQ.0)GO TO 1002
IF(IW.NE.0)GO TO 99
906 WRMA=WRMA+A
WRMB=WRMB+B
WRRA=WRRA+D
WRRB=WRRB+E
IF(J.NE.1)GO TO 400
IF(WRMB.EQ.0)GO TO 201
MPER=WRMA/WRMB*100.+.5
203 IF(WRRB.EQ.0)GO TO 205
IRPER=WRRA/WRRB*100.+.5
GO TO 400
201 MPER=0
GO TO 203
205 IRPER=0
400 IF(J.GT.3)GO TO 800
IF(WRMB.EQ.0)GO TO 401
MMPER=WRMA/WRMB*100.+.5
403 IF(WRRB.EQ.0)GO TO 405
IRRPER=WRRA/WRRB*100.+.5
GO TO 1002
401 MMPER=0
GO TO 403
405 IRRPER=0
GO TO 1002
800 AR=AR+A
BR=BR+B
DR=DR+D
ER=ER+E
IF(BR.EQ.0)GO TO 901
LAPERM=AR/BR*100.+.5
903 IF(ER.EQ.0)GO TO 905
LAPERR=DR/ER*100.+.5
GO TO 1002
901 LAPERM=0
GO TO 903
905 LAPERR=0
GO TO 1002
99 IF(COUNT.NE.1)GO TO 980
IPER=0
GO TO 1005
980 IPER=((SCORE-K)/((COUNT-1.)*15.))*100.+.5
1005 WRITE(42,98)IA,MPER,MMPER,LAPERM,IRPER,IRRPER,LAPERR,IPER
98 FORMAT(4X,I6,8X,I4,6X,I4,7X,I4,6X,I4,6X,I4,7X,I4,10X,I4)
WRITE(43,67)IA,MPER,MMPER,LAPERM,IRPER,IRRPER,LAPERR,IPER
67 FORMAT(I6,I4,I4,I4,I4,I4,I4,I4)
IF(IT.EQ.1)GO TO 100
IW=0
SCORE=K
COUNT=1.
WRMA=0.
WRMB=0.
WRRA=0.
WRRB=0.
AR=0.
BR=0.
DR=0.
ER=0.
GO TO 906
1002 IA=I
READ(2,1)I,N,K,KA,KB,KC,A,B,C,D,E,F
1 FORMAT(I6,I3,I2,A5,A5,A5,F2.0,F2.0,F3.0,F2.0,F2.0,F3.0)
IF(EOF(2).NE.0)GO TO 200
SCORE=SCORE+K
COUNT=COUNT+1.
IAA=A
IBB=B
ICC=C
IDD=D
IEE=E
IFF=F
J=N/100
GO TO 51
200 IT=1
GO TO 10
51 IF(IA.EQ.0)GO TO 11
IF(IA.NE.I)GO TO 10
IF(JK.NE.J)GO TO 8
11 AB=AB+A
BB=BB+B
DB=DB+D
EB=EB+E
WRITE(20,26)I,N,K,KA,KB,KC,IAA,IBB,ICC,IDD,IEE,IFF
26 FORMAT(1X,I6,1X,I3,1X,I2,1X,A5,A5,A5,I2,I2,I3,I2,I2,I3)
GO TO 2
8 IF(BB.EQ.0.)GO TO 74
APA=AB/BB*100.
GO TO 22
74 APA=0.
22 IF(EB.EQ.0.)GO TO 77
APB=DB/EB*100.
GO TO 33
77 APB=0.
33 IPB=APB+.5
IPA=APA+.5
LA=AB
LB=BB
LD=DB
LE=EB
WRITE(20,27)JK,LA,LB,IPA,LD,LE,IPB
27 FORMAT(1H+,45X,*THE TOTALS FOR EXAM*,I3,* ARE*,I4,I4,I5,I4,I4,I5)
ZA=ZA+AB
ZB=ZB+BB
ZC=ZC+DB
ZD=ZD+EB
WRITE(20,88)I,N,K,KA,KB,KC,IAA,IBB,ICC,IDD,IEE,IFF
88 FORMAT(1X,I6,1X,I3,1X,I2,1X,A5,A5,A5,I2,I2,I3,I2,I2,I3)
AB=0.
BB=0.
DB=0.
EB=0.
AB=AB+A
BB=BB+B
DB=DB+D
EB=EB+E
GO TO 2
10 IF(BB.EQ.0.)GO TO 75
APA=AB/BB*100.
GO TO 23
75 APA=0.
23 IF(EB.EQ.0.)GO TO 76
APB=DB/EB*100.
GO TO 34
76 APB=0.
34 IPA=APA+.5
IPB=APB+.5
LA=AB
LB=BB
LD=DB
LE=EB
WRITE(20,28)JK,LA,LB,IPA,LD,LE,IPB
28 FORMAT(1H+,45X,*THE TOTALS FOR EXAM*,I3,* ARE*,I4,I4,I5,I4,I4,I5)
ZA=ZA+AB
ZB=ZB+BB
ZC=ZC+DB
ZD=ZD+EB
IF(ZB.EQ.0)GO TO 105
ZPA=ZA/ZB*100.
GO TO 106
105 ZPA=0.
106 IF(ZD.EQ.0.)GO TO 107
ZPB=ZC/ZD*100.
GO TO 108
107 ZPB=0.
108 MV=ZA
MX=ZB
MY=ZC
MZ=ZD
IZPA=ZPA
IZPB=ZPB
IW=1
WRITE(20,38)MV,MX,IZPA,MY,MZ,IZPB
38 FORMAT(46X,*THE GRAND TOTALS ARE*,6X,I4,I4,I5,I4,I4,I5,/)
IF(IT.EQ.1)GO TO 2
ZA=0.
ZB=0.
ZC=0.
ZD=0.
AB=0.
BB=0.
DB=0.
EB=0.
GO TO 11
100 CONTINUE
END
Program FACTOR
Program FACTOR performs a factor analysis on specified exams and
produces a matrix of inter-item correlation coefficients.
HAL,SPSS,D=X.
REWIND,BCDOUT.
MAP(OFF)
FTN.
LGO.
REWIND,TAPE6.
COPYSBF,TAPE6,OUTPUT.
RUN NAME FACTOR ANALYSIS FOR CEM 130 EXAM 217
DATA LIST FIXED /1 STUNUM 1-6,EFN 7-9,S 10-11,Q1 TO Q15 12-26 (A)/
SELECT IF (EFN EQ 217)
N OF CASES UNKNOWN
RECODE Q1 TO Q15 ('A','B','C','D','E','F','G','H','I','J'=0)
(CONVERT)
FREQUENCIES INTEGER=Q1 TO Q15 (0,1)
OPTIONS 8
STATISTICS 1,5
READ INPUT DATA
FACTOR VARIABLES=Q1 TO Q15/TYPE=PA2/FACSCORE/NFACTORS=3/
OPTIONS 5
STATISTICS ALL
FINISH
PROGRAM A(INPUT=64,OUTPUT=112,BCDOUT,TAPE5=BCDOUT,TAPE6=112)
DIMENSION B(15,15),C(15,15)
SD=0
XS=0
WRITE(6,3)
3 FORMAT(52X,*X VALUES*,17X,*Z VALUES*,//)
DO 50 I=1,15
50 READ(5,6)(B(I,J),J=1,15)
6 FORMAT(8F10.7)
DO 5 NL=1,15
DO 5 NA=1,15
IF(NA.EQ.NL)GO TO 5
XS=XS+B(NL,NA)
5 CONTINUE
XA=XS/225.
DO 8 NL=1,15
DO 8 NA=1,15
IF(NA.EQ.NL)GO TO 8
C(NL,NA)=.5*ALOG((1+B(NL,NA))/(1-B(NL,NA)))
8 CONTINUE
DO 9 NL=1,15
DO 9 NA=1,15
IF(NA.EQ.NL)GO TO 9
WRITE(6,2)B(NL,NA),C(NL,NA)
2 FORMAT(50X,F10.7,15X,F10.7)
9 CONTINUE
WRITE(6,10)XA
10 FORMAT(//,* THE AVERAGE IS*,F10.7)
END
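The "Z VALUES" printed by the program above are Fisher z transformations of the correlation coefficients, z = .5 ln((1+r)/(1-r)). A minimal Python sketch of the same transformation is given below for reference. The averaging helper is an assumption added here for illustration; it divides by the number of off-diagonal entries, which is a choice made in this sketch and differs from the divisor used in the listing.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient r, |r| < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def mean_off_diagonal(matrix):
    """Average the off-diagonal entries of a square correlation matrix."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


# Hypothetical 3 x 3 matrix of inter-item correlation coefficients.
r = [
    [1.0, 0.2, 0.4],
    [0.2, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
print(mean_off_diagonal(r))
print([round(fisher_z(x), 4) for x in (0.2, 0.3, 0.4)])
```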
Program ANOVA
Program ANOVA performs a one-way and a two-way analysis of variance
as well as posthoc analyses on the average correlation coefficients.
HAL,SPSS.
RUN NAME ANOVA AND ONEWAY VISUALIZATION VS NON-VISUALIZATION
DATA LIST FIXED /1 EXAM 1-4,ZRBAR 6-9,CO 11,CL 13,WAYONE 15/
N OF CASES 24
ANOVA ZRBAR BY CO (1,2) CL (1,2)/
READ INPUT DATA
ONEWAY ZRBAR BY WAYONE (1,4)/
RANGES = TUKEY/
RANGES = SCHEFFE (.05)/
STATISTICS ALL
FINISH
APPENDIX B
SAMPLE QUESTION CLASSIFICATIONS
To help clarify the method of characterizing questions,
a series of sample questions and their classifications are
given below.
1. "An empty aluminum Coke can weighs 50 grams. How
many moles of aluminum does one Coke can contain?
(Atomic weight Al=27)"
Ans. 1.85 moles
Classification: R1(M1) The relationship being
applied is 27 grams Al = 1 mole. The student
must find the number of moles in 50 grams by
setting up and solving a linear equation.
2. "Elements which are most metallic are found in
what general area of the periodic chart?"
Ans. Lower left.
Classification: Mr This question is included
because it can easily be interpreted in two
different ways. We could say that a property
of the most metallic elements is that they
are located in the lower left on the periodic
chart. An alternative interpretation is that
there is a relationship between the metallic
character of an element and its position on
the periodic chart. By convention, the mem-
orized information is interpreted as a rela-
tionship and the question is classified as
Mr.
3. "Fifteen grams of nitric oxide (NO) contain how
many molecules?"
Ans. 3.0 x 10^23
Classification: Rs (Tmc, R2(Mle))
The R2 implies two relationships being applied to the
problem. The student must realize that for NO, 1 mole =
30 grams. This step is Tmc, since the chemical symbol NO
is translated to a mathematical relationship. At this point
the relationship is used to find the number of moles in
15 grams (R1(M1)), and finally the memorized relationship 1
mole = 6.02 x 10^23 particles is used to find the number of
molecules (R1(Mle)). The e designates the use of a
number written in scientific notation. Whenever the results
of an R process are used in a subsequent R process, the
two can be combined into one R2 process. The RS designation
is used because the question requires the sequencing of
the translation and reasoning steps.
4. "What is the percentage by weight of fluorine in
phosphorus (III) fluoride (PF3)?"
Ans. 65
Classification R3 (Tmc, R2(M1))
The question is given the classification R3 because in its
associated instructional setting the step by step procedure
for solving these types of problems is given to the student.
The solution then becomes a matter of following the direc-
tions given. Within the algorithm, the student is in-
structed to translate from the chemical symbol to the
mathematical relationship between the number of constituent
atoms in a molecule. The number of atoms is then converted
to the weight of the atoms which is then expressed as a
percentage (R2(M1)).
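The arithmetic behind sample questions 1, 3 and 4 can be checked with a short calculation. The Python sketch below is illustrative only; the helper names are hypothetical, and the rounded atomic weights for N, O, P and F are assumed values (only Al = 27 and 6.02 x 10^23 are given in the questions and discussion above).

```python
AVOGADRO = 6.02e23  # particles per mole, the value used in the text

# Rounded atomic weights (assumed values for this sketch).
ATOMIC_WEIGHT = {"Al": 27.0, "N": 14.0, "O": 16.0, "P": 31.0, "F": 19.0}


def molar_mass(formula):
    """Molar mass from a dict mapping element symbol to number of atoms."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in formula.items())


def moles(grams, formula):
    """Apply the relationship (molar mass) grams = 1 mole."""
    return grams / molar_mass(formula)


def weight_percent(element, formula):
    """Percentage by weight of one element in a compound."""
    return ATOMIC_WEIGHT[element] * formula[element] / molar_mass(formula) * 100.0


moles_al = moles(50.0, {"Al": 1})                        # question 1
molecules_no = moles(15.0, {"N": 1, "O": 1}) * AVOGADRO  # question 3
percent_f = weight_percent("F", {"P": 1, "F": 3})        # question 4
print(round(moles_al, 2), molecules_no, round(percent_f))
```

Each function corresponds to one applied relationship, so question 3 visibly chains two of them in sequence, which is what the R2 and Rs designations describe.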
APPENDIX C
CALCULATING GROUPED AVERAGE CORRELATION COEFFICIENTS
The following example is included to help clarify the
procedure for calculating the average correlation co-
efficients for the groups of questions A, B, C, D.
In this example, a 15 item test is analyzed for the
reasoning category. The content of the test is divided into
two topics, T1 and T2. The first step is to identify questions
containing a reasoning process. In this case they are:
1. Reasoning Questions: 2, 4, 6, 8, 12, 13, 14, 15
Questions which do not require reasoning are therefore:
Nonreasoning Questions: 1, 3, 5, 7, 9, 10, 11
2. The questions are also classified by topic.
Topic 1 Questions: 1, 2, 4, 5, 10, 11, 14, 15
Topic 2 Questions: 3, 6, 7, 8, 9, 12, 13
3. Groups of correlation coefficients are formed as
follows:
Group A is all coefficients between questions from
the same content which require reasoning.
That is, Group A = {r(K1Tx, K1Tx)}
where K1 = kind of thinking 1, which in
this case is reasoning;
and r = the correlation coefficient operator.
Group A coefficients are:
r2,4    r2,15   r4,15   r8,12   r6,12   r12,13
r2,14   r4,14   r8,6    r8,13   r6,13   r14,15
The values of the above correlation coefficients are
then averaged to obtain an average correlation for
Group A.
Group B is all coefficients between pairs of questions
from the same topic with only one of the pair being a
reasoning question.
That is, Group B = {r(K1Tx, KyTx)}, y ≠ 1
Group B coefficients are:
r1,2    r2,5    r4,10   r10,14  r3,6    r3,13   r7,12
r1,4    r2,10   r4,11   r10,15  r3,8    r6,7    r7,13
r1,14   r2,11   r5,14   r11,14  r3,12   r6,9    r8,9
r1,15   r4,5    r5,15   r11,15  r7,8    r9,12   r9,13
Group C is all coefficients between pairs of questions
both of which require reasoning but each of the pair
being from a different topic.
That is, Group C = {r(K1Tx, K1Ty)}, x ≠ y
Group C coefficients are:
r2,8    r2,6    r2,12   r4,12   r8,14   r6,14   r12,14  r13,14
r2,13   r4,8    r4,6    r4,13   r8,15   r6,15   r12,15  r13,15
Group D is all coefficients between pairs of questions,
only one of which requires reasoning, with each of
the pair being from a different topic.
That is, Group D = {r(K1Tx, KyTz)}, x ≠ z, y ≠ 1
Group D coefficients are:
r2,3    r4,7    r8,10   r6,10   r12,10  r13,10
r2,7    r4,9    r8,11   r6,11   r12,11  r13,11
r2,9            r8,1    r6,1    r12,1   r13,1
r4,3            r8,5    r6,5    r12,5   r13,5
In this manner, one test would yield four average cor-
relation coefficients, one from each of the groups A, B,
C and D. A set of tests analyzed for the reasoning cate-
gory would yield a set of average correlation coefficients
for each group. These sets of coefficients are then used
as the data for the analysis of variance.
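The grouping rules above are mechanical and can be expressed as a short routine. The Python sketch below applies the four definitions to the example test (reasoning questions 2, 4, 6, 8, 12, 13, 14, 15 and the two topic lists given earlier). The function names are hypothetical, and the correlation values themselves would have to be supplied from the inter-item correlation matrix.

```python
from itertools import combinations

reasoning = {2, 4, 6, 8, 12, 13, 14, 15}
topic = {q: 1 for q in (1, 2, 4, 5, 10, 11, 14, 15)}
topic.update({q: 2 for q in (3, 6, 7, 8, 9, 12, 13)})


def group_pairs(reasoning, topic):
    """Assign every pair of questions to group A, B, C or D.

    A: both require reasoning, same topic      B: one requires reasoning, same topic
    C: both require reasoning, different topic D: one requires reasoning, different topic
    Pairs in which neither question requires reasoning are not used.
    """
    groups = {"A": [], "B": [], "C": [], "D": []}
    for i, j in combinations(sorted(topic), 2):
        n_reasoning = (i in reasoning) + (j in reasoning)
        if n_reasoning == 0:
            continue
        same_topic = topic[i] == topic[j]
        if n_reasoning == 2:
            groups["A" if same_topic else "C"].append((i, j))
        else:
            groups["B" if same_topic else "D"].append((i, j))
    return groups


def group_average(corr, pairs):
    """Average the coefficients in one group; `corr` maps (i, j) to r."""
    return sum(corr[p] for p in pairs) / len(pairs)


groups = group_pairs(reasoning, topic)
print({name: len(pairs) for name, pairs in groups.items()})
```

The same routine could be reused for any other kind of thinking by substituting the set of questions classified under that category.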
The specific topics which were chosen for this analysis
are listed below.
1. CEM 130 Exam 2
Topic 1 - Crystal Structure
Topic 2 - Electromagnetic Radiation
Topic 3 - Structure Determination in Crystals.
2. CEM 130 Exam 3
Topic 1 - Particles and Waves
Topic 2 - Emission Spectroscopy
Topic 3 - Quantum Numbers
3. CEM 131 Exam 1
Topic 1 - Ideal Gases
Topic 2 - Phase Transformations
4. CEM 131 Exam 2
Topic 1 - The Equilibrium Constant
Topic 2 - Calculations Based upon the Equilibrium
Law
5. CEM 131 Exam 3
Topic 1 - Solutions
Topic 2 - Concentration and Colligative Properties
Topic 3 - Ionic Equilibria
APPENDIX D
CLIC TAPE OUTLINES
CLIC Tape A-1
I. Memory Skills
   A. The importance of memorization
   B. Note taking
      1. Noting definitions and examples
      2. Previewing study guide questions
      3. The 2-5-1 format
   C. Reviewing your notes
      1. Cueing
      2. Establishing memory traces
      3. Association and understanding
      4. Repression - developing a good attitude
      5. Self confidence
      6. Timing your review
II. Application to Chemistry Concepts
   A. Vapor pressure
      1. The "gas can" example
      2. Factors affecting vapor pressure
   B. Cooling curves
      1. A related experiment
      2. Heat capacity
CLIC Tape A-2
I. Review of Memory Skills
   A. Lecture cueing
   B. Examples
   C. Note taking and studying
II. Application to Concepts of Chemistry
   A. Irreversible Processes
      1. Definitions: reversible, irreversible
      2. Examples
   B. Equilibria
      1. Reaction rates
      2. The equilibrium law
      3. LeChatelier's Principle
         a. Changing concentration
         b. Changing pressure
         c. Changing temperature
CLIC Tape A-3
I. Review of Memory Skills
   A. Using the study guide
   B. Lecture cues - examples
   C. Studying and repression
II. Chemistry Concepts
   A. Solutions and Mixtures
   B. Concentration terms
      1. Normality
      2. Molarity
      3. Molality
      4. Weight percent
      5. Saturated
      6. Supersaturated
   C. Factors which affect solubility
      1. Charge density - charge to radius ratio
      2. Temperature
      3. Pressure
   D. Colligative properties
   E. Electrolytes
CLIC Tape B-1
I. Developing Reasoning With Math Skills
   A. Symbolic equations
      1. Properties represented
      2. Units of the variables - units conversion
      3. Manipulating symbolic equations
      4. Using two symbolic equations in sequence
      5. Checking your mathematics
   B. A general approach to problem solving
      1. Reading the problem, noting givens and unknowns
      2. Applying relationships to the problem
      3. Setting up the solution
      4. Checking the units and the math
II. Applications
   A. Ideal Gas Law calculations
      1. PV = nRT
      2. Boyle's and Charles' Law problems
   B. Dimensional analysis applied to specific heat
      problems
CLIC Tape B-2
I. Problem Solving Principles
   A. Using symbolic equations in problem solving
      1. Symbol-variables
      2. Units-unit conversion
      3. Manipulating the equation
   B. Developing a problem solving approach
   C. Using dimensional analysis
II. Applications
   A. The Equilibrium Law
      1. The symbolic equation
      2. Variables and units
      3. Working with initial concentrations
CLIC Tape B-3
I. Review of Symbolic Equations
   A. Variables and units
   B. Deriving new equations
   C. Unit conversions
   D. Dimensional analysis
II. Applications
   A. Colligative Properties
      1. Freezing point depression
      2. Boiling point elevation
   B. Weight percent problems
   C. Concentration problems and dimensional analysis