
Evaluating Instructional Technology Implementation in a Higher Education Environment

CHERYL BULLOCK AND JOHN ORY

ABSTRACT

For decades colleges and universities have experimented with various technological innovations to deliver instruction. Although not new in higher education, use of learning technology has greatly increased in recent years (Burnaska, 1998). This increased use brings an increased need for understanding the methodologies and approaches best suited to the evaluation of learning technologies in higher education. Here we first review the literature and describe the methods used in a myriad of evaluation studies in this area. Our primary purpose, however, is to describe our experiences evaluating a campus-wide learning technology effort (the SCALE Project) at the University of Illinois at Urbana-Champaign. This evaluation spanned three years and used multiple methods. More importantly, it provided some challenges from which others may learn and benefit.

INTRODUCTION

Colleges and universities have long experimented with various technological innovations to deliver instruction, including radio, television, videotape, and computers. Over the years many associated efforts have been made to study the impact of these technologies on student learning. Although many of these studies have been done under the heading of research, they are in fact evaluations (a term that implies making a value judgment) of the impact of technology-enhanced instruction. An alternative statement of the aim of these studies, which is more consistent with the work presented here, is to understand how educational interventions perform by observing and measuring the teaching and learning process, or some small slice of it.

Cheryl Bullock ● Head, Division of Measurement & Evaluation, Office of Instructional Resources, 247 Armory Building, Champaign, Illinois 61820; Tel: (217) 333-3490; E-mail: [email protected].

American Journal of Evaluation, Vol. 21, No. 3, 2000, pp. 315–328. All rights of reproduction in any form reserved. ISSN: 1098-2140 Copyright © 2001 by American Evaluation Association.

Many of the evaluations in this area are impact studies designed to make comparisons between technology-enhanced instruction and traditional, face-to-face instruction. So, what have these evaluations determined? A recent review of contemporary research on the effectiveness of distance learning in higher education (including synchronous and asynchronous instructional delivery) by the Institute for Higher Education Policy (Gold & Maitland, 1999) concluded:

With few exceptions, the bulk of these writings suggest that the learning outcomes of students using technology at a distance are similar to the learning outcomes of students who participate in conventional classroom instruction (p. 1).

This conclusion is also supported by Thomas Russell's (1999) annotated review of 355 studies that evaluate the impact of technology-enhanced instruction. In a report titled The no significant difference phenomenon, Russell emphasizes the recurrence of the researchers' conclusion of "no significant difference" on a variety of outcome measures between students taught using technology and those who were not.

Other investigators raise concerns about the quality of the evaluations that have examined learning technologies. Their contention is that some studies did not find significant differences because they were inadequately designed (Ester, 1995; Khalili & Shashaani, 1994; Kulik & Kulik, 1991; Martin & Rainey, 1993). Our primary focus here is not the findings of past evaluations. Instead, we focus on gaining an understanding of how past evaluations of technology-enhanced instruction were conducted, and we offer suggestions that may be helpful to future evaluators in this area.

A BRIEF REVIEW OF PAST EVALUATION METHODS

We began our review of past studies of the impact of technology-enhanced instruction hoping to learn what evaluation models or approaches have been followed in the past. We wanted to identify the important evaluation questions, the types of data routinely collected, and the types of problems encountered. Our review supported the following three conclusions: (1) multiple data collection methods have been used to collect information from multiple sources; (2) various evaluation models or approaches have been followed; and (3) there is a common set of problems and concerns about past evaluations of technology-enhanced instruction.

Conclusion 1: Multiple Data Collection Methods Have Been Used to Collect Information from Multiple Sources

A close reading of Russell's (1999) review of 355 studies, which assessed the impact of technology-enhanced instruction, revealed the use of a wide assortment of measures or dependent variables. The many types of data collected in studies conducted from 1928 to the present include classroom achievement exam scores; standardized test scores (on measures of critical thinking, reading comprehension, science reasoning, spatial relations, and specific content areas); course grades; course assignments or products; student behaviors; student retention; student attitudes and ratings of instruction; and, finally, costs.

Common methods used to collect these data include exams and quizzes, audit trails, tracking tools, teacher and student journals, classroom observations, videotaping, questionnaires, interviews, and focus groups (McKenna, 1995). Probably the most common approach to collecting evaluative data involved pre- and postcourse measures of student achievement, often using either classroom exams, standardized exams, or both. Studies of this type have been used throughout the century to assess the impact of "new" technologies, including correspondence courses (Crump, 1928), television (Schramm, 1962; Suchy & Baumann, 1960), audiotapes (Popham, 1961), and computers (Clarke, 1992; Goldberg, 1997; Hiltz, 1997). Many of these same researchers were also interested in learning how and what the students thought about the new technology. Efforts have often been made to measure student attitudes about the instructional innovation, the subject matter, teacher effectiveness, and the course in general. Attitudinal data were collected using paper and on-line surveys, individual and group interviews, and student journals.

Conclusion 2: Various Evaluation Models or Approaches Have Been Followed

No single evaluation approach or model rises above the others when hundreds of technology impact studies are reviewed (Oliver, 1997). Instead, one finds the influences of many different evaluation approaches. This situation was well described by Allan Avner, an evaluation specialist for PLATO (one of the, if not the, first computer instruction delivery systems), when he was asked how they conducted evaluations: "To tell you the truth I doubt if any specific evaluation model was followed. . . [instead] a bit of this and a bit of that would be used to design an evaluation approach" (personal communication, July 2, 1999).

Concepts from several well-known evaluation models were evident in our review of the literature. These models have been written about over the years as objectives-based, goal-free, and responsive or illuminative. A relatively new approach, one that to date may be restricted largely to the evaluation of instructional technology, was also identified in our review. Researchers referred to this model as integrative. Following is a brief discussion of the different evaluation models used to evaluate the impact of technology-enhanced instruction.

Quasi-experimental comparative evaluations compare one instructional innovation approach to another or to a traditional method of instruction. Comparative studies are probably the most often used approach to evaluating technological innovations, and also the most often criticized (Reeves, 1994). Being comparative in nature, they are often difficult to arrange. Further, they pose an ethical dilemma: Should some groups be denied, for the sake of the evaluation, access to resources with potential to enhance their learning environment (Oliver, 1997)? (Of course, if the technology does not in fact usually enhance learning, this criticism may be less potent.) Another criticism has been that although it is not inherently required, comparative studies in the technology literature in practice have often emphasized the use of quantitative over qualitative data. The absence of qualitative data frequently led to difficulty in understanding why a difference did, or did not, occur (Jones, Scanlon, Tosunoglu, Ross, Butcher, Murphy, & Greenberg, 1996).

Objectives-based evaluations (Popham, 1972; Tyler, 1942) focus on student attainment of prespecified goals and objectives. Studies of this type most commonly use pre- and postinstruction measures, such as tests or surveys, developed around the instructional goals and objectives of the learning technology program (McKenna, 1995).

Goal-free evaluations (Scriven, 1972) attempt to assess all program impacts, both intended and unintended. Researchers investigating technological innovations often indicate their interest in learning how the innovation impacted the learning and attitudes of the students in anticipated and unanticipated ways. For example, Draper et al. (1996) wrote:


More important than all of [the structured measures], however, are the less formal open-ended measures (e.g., personal observation, focus groups, and interviews with open-ended questions). This is true firstly because most of what we have learned about our methods and how they should be improved has come from unplanned observations: about what was actually going on as opposed to what we had planned to measure (p. 8).

Avner expressed this same concern for the unexpected in our conversation about early PLATO evaluations. He stated, "We assessed the objectives but we also tried to find the unexpected. PLATO might be a great thing, but we wanted to know what damage we may be doing to the student" (personal communication, 1999).

Illuminative evaluations (Parlett & Hamilton, 1976; Stake, 1978) involve the use of observations, interviews with participants, questionnaires, and document analysis to "illuminate" or reveal problems, issues, and concerns using qualitative data analytic techniques. This approach was most commonly used to describe and interpret the impact of the technology, with less concern for actually measuring impact (Oliver, 1997).

Participatory evaluations (Cousins & Earl, 1992; Greene, 1997) involve a partnership between trained evaluation personnel and the different evaluation audiences. The review identified several studies in which the conduct of the evaluation was a collaborative effort among the evaluators, teachers, students, and developers. This has particular significance for evaluators working with those who design instructional technologies as well as for those who use them to teach (Draper et al., 1996).

Integrative evaluation (Draper et al., 1996) is aimed at improving teaching and learning by integrating CAL (computer assisted learning) material into the overall situation more effectively. Draper and his colleagues explain that:

Integrative evaluation is not primarily either formative or summative of the software, as what is both measured and modified is most often not the software but surrounding materials and activities. It is not merely reporting measurements as summative evaluation is, because it typically leads to immediate action in the form of changes. It could therefore be called formative evaluation of the overall teaching situation, but we call it integrative to suggest the nature of the changes it leads to (p. 12).

Getting the clients involved in the evaluation and sharing results in a timely fashion to improve the program are not unique to integrative evaluation. Patton (1997) and others have stressed these fundamental activities in their writings on utilization-focused evaluation. Doing whatever it takes to make the evaluation more useful to the audiences can, in the context of instructional technology, lead to Draper's notion of evaluation integration.

Conclusion 3: A Common Set of Problems and Concerns

Several reviews of technology impact studies have identified a common set of problems and concerns about the evaluations. First, there are the general concerns that quantitative studies fail to explain why anything happened and that qualitative studies have difficulty establishing general results. Thus, Oliver (1997) suggests that most critics of technology impact studies advocate a "hybrid approach" to evaluation. Others (Clark, 1992; Cooley & Lohnes, 1976) are critical of the over-reliance on comparative, or racehorse, studies, as noted earlier.

More specific problems include Reeves' (1994) belief that the "no significant differences" problem is partially due to evaluators' failure to fully describe and measure the unique characteristics of the innovation under study. Another possible problem is that significant differences cannot be detected because the tests and quizzes used tap only a shallow depth of understanding (Kearsley, 1990). Some reviewers questioned the reliability and validity of assessment instruments. Gold and Maitland (1999) summed up the views of many when they wrote, "the methodology of many research designs is weak, with regard to such factors as the populations being compared or otherwise studied; the treatments being given, the statistical techniques being applied, and the validity, reliability, and generalizability of the data on which the conclusions are based" (p. 22).

A CONCRETE EXAMPLE

Increased use of learning technologies in higher education brings increased needs for their evaluation. Having addressed the general knowledge base regarding technologically enhanced instruction, we now turn to our primary purpose here, which is to provide a very specific and concrete example from which others may learn. Although there is some benefit in general evaluation models and lists of possible problems, discussions of concrete examples are also important in advancing practice. Accordingly, in the remainder of this article, we share our experiences evaluating a campus-wide learning technology effort (the SCALE Project). This evaluation spanned three years, used multiple methods, and provided some interesting challenges.

Our purpose in presenting this example is to provide information that may help others conduct evaluations of the use of learning technologies in higher education classrooms. Accordingly, we do not present results from the SCALE evaluation in this article. Complete results for the three-year evaluation can be found at http://w3.scale.uiuc.edu/scale/.

The SCALE Project at UIUC

In March 1995, the Alfred P. Sloan Foundation awarded a three-year grant to the University of Illinois at Urbana-Champaign (UIUC) to establish SCALE, the Sloan Center for Asynchronous Learning Networks (ALNs). Campus funds were also allocated to support the Center. At its inception, SCALE had the following two goals:

1. To facilitate the restructuring, development, and delivery of new ALN-based courses on the UIUC campus during the three-year grant period; and

2. To promote, disseminate, and diffuse the ALN concept widely on the UIUC campus.

To help accomplish these goals, SCALE sponsored bi-monthly informational seminars and issued grants for classroom development and implementation of ALN to instructors from all UIUC colleges. It also supported training efforts for instructors, teaching assistants, and students in SCALE-sponsored courses. To monitor the effectiveness of these efforts, SCALE commissioned a three-year external evaluation of its activities.

The SCALE Evaluation

In June 1995, SCALE contracted with the UIUC campus Office of Instructional Resources (OIR) to conduct a three-year evaluation effort. The SCALE director believed it was important to have an external evaluation conducted by individuals with a strong understanding of the campus. The resulting evaluation team within OIR consisted of the director of the campus Office of Instructional Resources, a principal evaluator, and a graduate student research assistant. The team spent their first summer researching the general literature and other Sloan Foundation grants. They also interviewed SCALE and campus administrators and the first 17 faculty members funded to develop courses using ALN activities.

Triple Cliency

One of the first considerations in this evaluation, and arguably in most evaluations (Worthen, Sanders, & Fitzpatrick, 1997), was to understand the context of the evaluation, paying particular attention to who would be using the results. For the SCALE project and its subsequent evaluation, the evaluation team identified three primary client1 groups: (1) the campus administration, (2) involved instructors, and (3) the Alfred P. Sloan Foundation (the funding agency). This discussion of clients is important for reasons beyond providing a context for the reader to understand the SCALE evaluation. Learning technologies are viewed differently by different parties. Moreover, they frequently do have very diverse clientele. For example, instructors who implement technologies at the course level may find that their department head has a vested, and differing, interest in evaluation results.

While balancing the three clients and their respective interests was a challenge, there was much overlap in their interests. The campus administration and involved instructors shared an interest in evaluating the impact of ALN on professors and students, while the campus administration shared with the Sloan Foundation a clear interest in understanding the economic implications of ALN. Even evaluators of small-scale instructional technology efforts may find themselves challenged to meet multiple clients' needs. A search for overlapping interests may help to address diverse clients' needs.

Evaluation Approach

Because of the information needs of our three clients, the design itself was eclectic. There were distinct elements, however, from different evaluation approaches. For example, there were elements of an objectives-oriented approach because the Sloan Foundation had clear objectives and a stated interest in understanding whether or not they had been met. Elements of the participatory approach were also clearly present in the discussions leading to the general evaluation design. The SCALE evaluation team collaborated with others affected by the implementation of ALN. Instructors and administrators were consulted about the questions to ask and about the methods for collecting information (Worthen, Sanders, & Fitzpatrick, 1997). Additionally, there were elements of a quasi-experimental approach with regard to our investigation of student achievement.

Methodologies

To answer the evaluation questions, we adopted the mixed-method approach often advocated in the literature. Following a document review, both qualitative data and quantitative data were collected. Quantitative data took the form of survey results, records of use, achievement gain scores, and an extensive cost-benefit analysis (which is reported in a subsequent article). Qualitative data primarily involved interviews. As is common, some data collection efforts had both qualitative and quantitative elements. For example, the evaluation of computer conferencing involved tallying interactions between various course members as well as a qualitative content analysis of the interactions.
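As a concrete illustration of the quantitative side of that conferencing analysis, the sketch below shows one way such tallies could be automated today. It is not the SCALE team's actual procedure; the posting records, field names, and role labels are invented for illustration.

```python
from collections import Counter

# Hypothetical posting records; a real conferencing-system export would look different.
postings = [
    {"author": "student_a", "role": "student", "reply_to": "student_b"},
    {"author": "student_b", "role": "student", "reply_to": "instructor_1"},
    {"author": "instructor_1", "role": "instructor", "reply_to": "student_a"},
    {"author": "student_a", "role": "student", "reply_to": None},  # a new thread
]

# Map each participant to a role so replies can be classified by who is talking to whom.
roles = {p["author"]: p["role"] for p in postings}

posts_by_role = Counter(p["role"] for p in postings)

interaction_types = Counter(
    f'{p["role"]}-to-{roles.get(p["reply_to"], "unknown")}'
    for p in postings
    if p["reply_to"] is not None
)

print(posts_by_role)      # e.g., Counter({'student': 3, 'instructor': 1})
print(interaction_types)  # e.g., student-to-student, student-to-instructor, ...
```

Counts like these would still be paired with a hand-coded content analysis of what the interactions actually said, as the article describes.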

It was not feasible to begin the evaluation by collecting information on the impacts or efficiencies of ALN before instructors and students had much opportunity to become familiar with and make use of ALN activities. Therefore, early evaluation efforts focused on assessing student and instructor use of, perceptions about, and satisfaction with ALN within the SCALE-sponsored courses. (Some impact and efficiency data were also collected during years one and two.) In the third year, emphasis was placed on the long-term impacts of ALN, with particular attention to the possible efficiencies of SCALE-sponsored courses. Table 1 presents the types of data collected during the three-year evaluation. Following is a brief explanation of the various types of data collected. The evaluation team also monitored both individual course computer conferences and the SCALE instructors' computer conference; although this is not represented in Table 1, it is discussed further below.

Data Collection

Student surveys. To assess student attitudes and perceptions about the use of ALN, 7,140 students in 119 courses across the curricula were surveyed. Courses were purposefully stratified by college for these survey efforts. At UIUC, the College of Liberal Arts and Sciences enrolls the most students and, therefore, the largest group of students surveyed came from this college. Likewise, fewer students were surveyed from the College of Applied Life Studies because it is one of the smaller colleges on the UIUC campus.
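The proportional logic behind this stratification is simple arithmetic; the sketch below illustrates it with invented enrollment figures and a hypothetical survey target, since the article does not report the actual UIUC numbers or allocation rule.

```python
# Hypothetical enrollment figures; the article does not report the actual numbers.
enrollments = {
    "Liberal Arts and Sciences": 11000,
    "Engineering": 6000,
    "Education": 1800,
    "Applied Life Studies": 900,
}

surveys_to_allocate = 2000  # hypothetical per-semester survey target
total_enrollment = sum(enrollments.values())

# Allocate survey seats to each college in proportion to its enrollment.
allocation = {
    college: round(surveys_to_allocate * n / total_enrollment)
    for college, n in enrollments.items()
}

for college, n in sorted(allocation.items(), key=lambda kv: -kv[1]):
    print(f"{college}: {n} surveys")
```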

During the first semester of the SCALE evaluation only one student survey was used. However, as the complexity of the learning technologies employed in the SCALE-sponsored courses grew, it became necessary to develop a second survey. The first SCALE-sponsored courses almost exclusively used computer conferencing as their ALN delivery method. Semester two saw an increase in the use of various Web-based technologies. During year two, the number of SCALE-sponsored courses using Web-based applications far exceeded those primarily using computer conferencing as their ALN activity. Therefore, a "Conferencing" survey was administered to students enrolled in SCALE-sponsored courses wherein conferencing software was the primary application, and a "Web" survey was administered to students in courses primarily using the Web.

TABLE 1. Methods Used During the Evaluation

Evaluation    Surveys                               Interviews                        Gain      Course
Year          Student   Instructors   Support       Student*   TA*   Instructors      Scores    Profiles
Year 1          2,151        75          —             24        4        34              4         4
Year 2          3,974        68         96             30        5        28              2         2
Year 3          1,015        24          —             12        5        10              3         9
Totals          7,140       167         96             66       14        72              9        15

Notes: *Student and TA interviews were conducted in groups. The number of group interviews is reported in these columns rather than the individual number of students interviewed.


With practice the evaluators learned how to administer these surveys using only ten minutes of class time, even in courses with large enrollments. An evaluator would come to the classroom early and hand surveys to students as they entered. Once the bell rang, surveys were no longer handed out. In this manner the last student to receive the survey had a full ten minutes for completion, and we used only 10 to 15 minutes of classroom time. Our administration procedures allowed us to collect surveys from classes as large as 450 students while interrupting, at most, 15 minutes of classroom time. By treating instruction time as a highly valued commodity, the evaluation team maintained a good reputation and was given access to almost every class we requested during each semester of the evaluation.

Post-course instructor surveys. One hundred and sixty-seven instructor surveys were included in the SCALE evaluation, representing a return rate of 70%. The first year, the survey was sent to all participating instructors as an e-mail attachment. They were asked to complete the survey and then send back a hard copy or return it as an attachment. There were some word-processing compatibility problems with conducting the survey in this manner, however, and it was decided that future instructor surveys would be administered on-line. Starting with the second semester of the evaluation, instructors were sent e-mails with an embedded URL in the text. Those with fairly recent e-mail packages could simply double-click on the highlighted URL and the web page containing the survey would appear on their screen. Instructors were asked questions about the time commitment of, level of satisfaction with, and support required for teaching courses with ALN. When an instructor responded, the answers were transferred to a spreadsheet and were ready for analysis.
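The response handling described here amounts to appending each submission to a flat, spreadsheet-readable file. The sketch below is a minimal, hypothetical version of that step; the filename and survey fields are assumptions, not the SCALE instrument.

```python
import csv
from pathlib import Path

# Hypothetical file name and survey fields; not the actual SCALE instrument.
RESPONSES_FILE = Path("instructor_survey_responses.csv")
FIELDS = ["instructor_id", "weekly_hours_on_aln", "satisfaction_1to5", "support_needed"]

def record_response(response: dict) -> None:
    """Append one submitted survey response to a spreadsheet-readable CSV file."""
    write_header = not RESPONSES_FILE.exists()
    with RESPONSES_FILE.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(response)

# Example submission (invented values).
record_response({
    "instructor_id": "course-42",
    "weekly_hours_on_aln": 6,
    "satisfaction_1to5": 4,
    "support_needed": "TA training on the conferencing software",
})
```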

Computer support personnel surveys. Ninety-six support personnel from labs across campus responded to similar on-line surveys regarding their training and the type of assistance they provided for students regarding the learning technologies supported by the SCALE project at UIUC. This survey was preapproved by the manager of the computer consultant sites on campus and was administered on-line. Respondents were asked about the training they had received, the types of questions students and instructors asked them, and their satisfaction with various learning technologies used in SCALE-sponsored courses.

Student and TA group interviews. Student and teaching assistant (TA) interviews were conducted during each semester of the SCALE evaluation. Student group interviews were conducted during regular classroom time without the professor being present. Courses were chosen for interviews on a stratified basis, by college. In three years, only one instructor would not allow us into the classroom to interview her students. Teaching assistants were interviewed primarily in the larger SCALE-supported courses. The TA interviews were conducted without students or professors present and focused on issues of computer accessibility, ease of use, student satisfaction, and perceived instructional benefits of using ALN. One frustration of the evaluation team was that the student interviews were purposefully short (because they were replacing valuable instruction time). A further frustration was that focus group or group interview research (Morgan, 1998) indicates that the most effective focus groups contain between six and ten participants who are purposefully selected. Although these groups were purposefully selected, they typically contained 20 to 25 participants.

Instructor interviews. Each semester, each SCALE-sponsored instructor was asked to participate in an individual one-hour interview with a member of the evaluation team. During the first two years of the evaluation, the instructor interviews focused on professors' perceptions of ALN effects on certain quality indicators of education. These indicators included the quality and quantity of student-to-student interaction; the quantity and quality of student-to-instructor interaction; and the sense of community fostered within classrooms.

In addition, instructors were asked to comment on efficiency questions during year three of the evaluation. Indeed, during year three the focus of all data collection (instructor interviews included) was on assessing instructional efficiencies. Interviews were conducted with the instructors and, in most cases, their department's business managers. During these interviews, information was collected on instructors' salary and time allocation for teaching, cost per student for each course (with and without ALN), and the infrastructure cost of running an ALN course. Specific questions from all of the surveys can be found at the above URL, along with the complete evaluation results.

Gains in student achievement. The evaluation team undertook several quasi-experimental studies during the first two years. These looked at achievement score differences between (1) ALN and non-ALN sections of the same course; (2) semesters taught with ALN and those taught without ALN in the same course (i.e., historical comparisons); and (3) similar courses where one of the professors used ALN and the other taught without ALN. Difficulties in conducting these studies are discussed later in this article.
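To make the first kind of comparison concrete, here is a minimal sketch of comparing pre/post gain scores between an ALN and a non-ALN section using Welch's t-test. The scores are invented and the choice of test is an assumption; the article does not specify which statistics the SCALE team used.

```python
from scipy import stats

# Invented pre/post exam scores for two sections; these are not SCALE data.
aln_pre,  aln_post  = [62, 70, 58, 75, 66, 71], [74, 82, 65, 88, 79, 80]
trad_pre, trad_post = [60, 72, 55, 78, 64, 69], [68, 80, 60, 84, 70, 75]

aln_gains  = [post - pre for pre, post in zip(aln_pre, aln_post)]
trad_gains = [post - pre for pre, post in zip(trad_pre, trad_post)]

# Welch's t-test: does the mean gain differ between the ALN and non-ALN sections?
t_stat, p_value = stats.ttest_ind(aln_gains, trad_gains, equal_var=False)

print(f"ALN mean gain:     {sum(aln_gains) / len(aln_gains):.1f}")
print(f"Non-ALN mean gain: {sum(trad_gains) / len(trad_gains):.1f}")
print(f"Welch's t = {t_stat:.2f}, p = {p_value:.3f}")
```

Even with common exams, such a comparison remains quasi-experimental: students are not randomly assigned to sections, so instructor and cohort differences can still confound the result, as the article discusses below.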

Content analysis of the SCALE instructors' conference. As part of their effort to build a sense of ALN community on the UIUC campus, SCALE established a computer conference using FirstClass conferencing software, where instructors could communicate with each other about ALN concerns and questions. A content analysis of the postings to the conference was conducted as part of the evaluation. To accomplish this, the evaluation team tallied the postings made and categorized the content of the postings. The following five predominant themes emerged from the postings made to this conference: (1) announcements and sharing of information; (2) specific technical assistance questions and answers; (3) sharing of best practices; (4) philosophical issues of interest; and (5) general suggestions to the SCALE project staff for improvement.
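The SCALE team categorized these postings by hand; an analyst repeating the exercise today might begin with a rough keyword pass before manual review. The sketch below is such a first pass, with keyword lists and example postings that are purely illustrative and not the SCALE coding scheme.

```python
from collections import Counter

# Illustrative keyword lists only; not the SCALE coding scheme.
THEME_KEYWORDS = {
    "announcements / information sharing": ["announcement", "reminder", "fyi"],
    "technical assistance": ["error", "install", "password", "server"],
    "best practices": ["worked well", "my students", "assignment"],
    "philosophical issues": ["pedagogy", "learning theory", "should we"],
    "suggestions to SCALE staff": ["suggest", "it would help if", "scale staff"],
}

def code_posting(text: str) -> str:
    """Assign a posting to the first theme whose keywords appear in it."""
    lowered = text.lower()
    for theme, keywords in THEME_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return theme
    return "uncoded (needs manual review)"

example_postings = [  # invented postings
    "Reminder: the next SCALE seminar is Friday at noon.",
    "I keep getting a server error when students try to log in.",
    "An online discussion assignment worked well with my students this week.",
]

print(Counter(code_posting(p) for p in example_postings))
```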

Course conferences. The evaluation team monitored the student computer conferences in several courses throughout the evaluation. The purpose of the monitoring was to tally student and instructor use as well as determine the type of interactions that occurred. In all cases the professors informed the students that an evaluator was observing all postings throughout the semester and that names would never be associated with any postings used in final reports. The evaluators did not post to any of the conferences. Some instructors chose to make some of the class conferences off-limits to the evaluation team.

Evaluation of Efficiencies

The focus of the third-year evaluation was to examine how ALN may have saved rather than used campus funds. In the third year of SCALE, nine SCALE-funded courses were selected as "Efficiency Projects." These projects were selected for their potential to show different ways that ALN was being used at UIUC to produce cost savings and to utilize personnel and resources more efficiently. A cost study was completed for each of the nine projects. A cost analysis was conducted to estimate project development and operating costs and, when appropriate, estimate savings, development cost recovery, and additional efficiency benefits. Wherever possible, attempts were also made to evaluate student and instructor attitudes and to collect evidence of student achievement in the projects. Because of the purposive sampling of courses for the efficiency study, it can best be thought of as an examination of the likely potential of ALN, rather than as an estimate of the average efficiency for courses in the SCALE project.
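For readers unfamiliar with this kind of cost analysis, the sketch below walks through the basic arithmetic it typically involves: per-student operating cost with and without ALN, the resulting annual savings, and the time needed to recover a one-time development cost. All figures are invented for illustration; they are not SCALE results.

```python
# All figures are invented for illustration; they are not SCALE results.
development_cost = 40_000             # one-time cost to build the ALN materials
cost_per_student_traditional = 210.0  # operating cost per student, traditional format
cost_per_student_aln = 150.0          # operating cost per student, ALN format
students_per_year = 400

annual_savings = (cost_per_student_traditional - cost_per_student_aln) * students_per_year
years_to_recover = (
    development_cost / annual_savings if annual_savings > 0 else float("inf")
)

print(f"Annual operating savings:          ${annual_savings:,.0f}")
print(f"Years to recover development cost: {years_to_recover:.1f}")
```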

Reporting the Results

Traditional evaluation reports were generated each semester. More interestingly, however, a Best Practices Panel was conducted at the conclusion of the SCALE evaluation. Six faculty users of SCALE-sponsored learning technologies, identified as exemplary by the SCALE evaluation team, participated in a two-hour panel discussion. Faculty from across campus were invited, and over 100 attended. A series of questions was developed in a collaborative effort between the associate director of the SCALE project and the evaluation team. Panelists were sent these questions ahead of time and responded in turn during the discussion. In this manner, the results of the evaluation were reported in an alternative format to a major client, the faculty. This panel was videotaped and can be found at: http://franklin.scale.uiuc.edu/scale/presentations/aln_best_practices/.

LESSONS LEARNED

Throughout the course of any evaluation, there are many occasions when the evaluator may say, "Next time I'll know to do that differently," or, "That worked rather well." The SCALE evaluation was no different. Following are some of the lessons we learned facing the many challenges of the SCALE evaluation.

Meeting Multiple Needs for Information

As previously stated, the evaluation was conducted for three central clients: campus administration, involved instructors, and the Sloan Foundation. Early on, the evaluation team recognized the challenges involved in meeting the needs of the different audiences, and we adopted several strategies to try to meet these challenges. First, we learned to begin sharing early drafts of data collection procedures and instruments with the clients, including key campus administrators and many of the involved instructors, and we allowed them to give feedback and participate in the final products. This was done to avoid frustration on the clients' part with the nature and reporting format of any evaluation findings. Second, we learned to use e-mail effectively to continuously share preliminary results with the clients, informing them of trends, potential problems, or suggestions for change. We chose to share preliminary results so as to help the clients remedy small problems before they became more serious. In this manner our clients obtained ownership of the evaluation as well as timely assistance. Using e-mail whenever possible, rather than meetings and letters, allowed ongoing yet time-efficient formative information to be shared with the many individuals associated with the project.


Making Do with What You Have

The evaluation team would have liked more experiments or quasi-experiments to assess the impact of ALN on student achievement. There were a variety of reasons why we were only able to have the limited number that we did. Several problems were inherent in comparing ALN and non-ALN sections of the same course. In some cases, ALN classes were not large enough to have multiple sections to allow comparisons. In other cases, the instructors of different sections were not interested in coordinating their efforts or using common exams. Even if they did so, having a different instructor in the ALN course and the non-ALN course would create ambiguity about the impact of ALN. Further, in most cases the instructor felt that the ALN approach was superior to the standard approach and, hence, was reluctant to offer a traditional, non-ALN section. These instructors felt it was unfair to their students to deny half of them a learning technology they thought would be instructionally beneficial.

In the absence of parallel sections we began to make comparisons between previous course results and current ALN sections. This type of comparison has special problems. Many professors did not have baseline data or results for previous courses. More commonly, instructors had results from the past but were not interested in using the same old exams just to make evaluative comparisons. Additionally, the instructors were often concerned that changes in the type and nature of their students from semester to semester could confuse the issue of comparability.

Using Multiple Data Collection Methods

Our office has always supported the use of multiple methods for data collection to provide more complete and comprehensive evaluations. The SCALE evaluation suggests that the use of multiple methods also addresses the particular or unique preferences for information held by different audiences. For example, our campus administrators, facing the budgetary demands of computer technology, were most interested in the results of our SCALE efficiency studies. We found our faculty focusing on the results of the student interviews and surveys, wherein they could learn about their students' likes and dislikes. While showing interest in all that we did, our funding agency was most eager to learn of any achievement gains using ALN.

Respecting Rights of Privacy

During the first year of the evaluation, the ALN activity most frequently used by SCALE-sponsored instructors involved computer conferencing systems such as FirstClass or PacerForum. We believed it was important to assess who was conferencing, how often they were conferencing, and what they were conferencing about. Gaining access to these conferences, however, brought valuable lessons.

As one professor said: "My students build a trust within their conferencing. They discuss sensitive issues and I would find it very uncomfortable to allow evaluators into this world." Clearly, this professor had the best interest of her students in mind and should be commended for caring. We learned to balance our need to collect the information with the rights and needs of the instructors and students. One strategy was to always clearly define and explain to all of the course members exactly what we could see and access within a conference. Additionally, we asked instructors to ask the students periodically during the semester how they viewed our presence. The main lesson that we learned, however, is that the evaluator must adopt a position of respecting the right to privacy, and must explain this position to clients and respondents.

Being Sensitive to Evaluation Participants

Although we originally thought gaining access to classrooms to survey students would be a formidable challenge, it did not prove to be so. We credit this result to our system of getting in and getting out of the classrooms quickly, as discussed earlier. We did everything possible to make both faculty and students understand how much we respected their time and effort. In hindsight, we probably should have conducted some in-depth student focus groups outside of the classroom to see if our sensitivity to using too much classroom time short-changed our results.

Staying Flexible

The fast-changing world of computer technology required us to keep our evaluation flexible and readily adaptable. We started the evaluation with what we thought was a "standard" student survey that would permit comparisons over time. However, when the ALN intervention changed from conferencing to conferencing plus web materials, we had to create a second "standard" survey. We again showed some flexibility when we learned that the ALN faculty were steady users of e-mail. We took advantage of their predilection for electronic mail and changed our paper-based instructor survey to an electronic version to increase the response rate.

Being Eclectic

We used our knowledge of and experience with evaluation practices and research methods to build an eclectic approach to our evaluation that best addressed our needs. As described earlier, we found tremendous utility in combining aspects of several evaluation approaches in our final evaluation plan. We used different sampling techniques to address different evaluation purposes. A stratified sampling of classes and respondents was performed to conduct interviews and surveys, but a purposive sampling approach was followed in our selection of efficiency projects. We used conventional reporting strategies such as written summaries and reports as well as the less traditional method of a best-practices panel.

SUMMARY

Here we have described the strategy followed to evaluate a three-year campus-wide implementation of ALN in courses in all eight colleges at a large comprehensive university.

The evaluation team faced three major challenges while conducting the evaluation. First, the evaluation attempted to serve multiple clients by addressing the needs of the two funding agents (Sloan Foundation and campus administration) as well as those of the individual instructors using ALN in their classrooms. Second, the evaluators encountered some instructors' reluctance to allow evaluator monitoring of computer conferences. Third, it was often difficult to assess student gains in achievement that could be directly attributed to ALN use.

The evaluation provided several helpful insights for future efforts to evaluate large-scale implementations of instructional technology. It confirmed our beliefs about the value of multiple data collection methods for addressing different client questions and needs. It demonstrated the utility of an eclectic evaluation approach that combined aspects of various approaches. The evaluation experience supported the need for flexibility as we modified procedures to keep up with changes in the technology intervention. We also learned the benefits of respecting the privacy of evaluation participants and being sensitive to the demands on their time. The experience also encourages evaluators to make do with what they have in the absence of having what they really want (e.g., in the absence of control sections we made comparisons between the same course taught with and without ALN).

Although it was not the intent of this paper to present evaluation results, it is fair to say our efforts revealed some very successful, and not so successful, classroom implementations of ALN technologies. Many of the SCALE-supported courses showed evidence of student and instructor use of and satisfaction with ALN. There was also some evidence that ALN helped to increase student learning and could be used to reduce campus instructional costs. Budget savings occurred by using ALN to support the teaching of larger course sections, or to reduce the number of instructional staff, or both. Complete results for the three-year evaluation of the SCALE project are available at: http://w3.scale.uiuc.edu/scale/.

NOTE

1. One could argue that students constitute a fourth clientele because they will ultimately be impacted the most by new and improved uses of ALN. However, we limited our definition of client to individuals to whom we were required to report findings.

REFERENCES

Burnaska, K. (1998). Using Mallard to teach an undergraduate economics course. Doctoral dissertation, University of Illinois at Urbana-Champaign.
Clark, D. (1992). Getting results with distance education. The Journal of Distance Education, 12(1), 38–51.
Clarke, R. E. (1992). Media use in education. In M. C. Alkin (Ed.), Encyclopedia of educational research (pp. 805–814). New York: Macmillan.
Cooley, W. W., & Lohnes, P. R. (1976). Evaluation research in education. New York: Irvington.
Cousins, J. B., & Earl, L. (1992). The case for participatory evaluation. Educational Evaluation and Policy Analysis, 4, 397–418.
Crump, R. E. (1928). Correspondence and class extension work in Oklahoma. Doctoral dissertation, Teachers College, Columbia University.
Draper, S., Brown, M., Henderson, F., & McAteer, E. (1996). Integrative evaluation: An emerging role for classroom studies of CAL. Computers and Education, 26, 1–3.
Ester, D. P. (1995). CAI, lecture, and learning style: The differential effects of instructional method. Journal of Research on Computing in Education, 27(4), 129–139.
Gold, L., & Maitland, C. (1999). What's the difference? Paper prepared by the Institute for Higher Education Policy.
Goldberg, M. W. (1997). CALOS: First results from an experiment in computer-aided learning. Proceedings, ACM's 28th SIGCSE Technical Symposium on Computer Science Education.
Greene, J. (1997). Participatory evaluation. In R. Stake (Ed.), Evaluation and the post modern dilemma. Advances in Program Evaluation, Vol. 3 (pp. 171–203). Greenwich, CT: JAI Press.
Hiltz, R. S. (1997). Impacts of college-level courses via asynchronous learning networks: Some preliminary results. Journal of Asynchronous Learning Networks, 1(2).
Jones, A., Scanlon, E., Tosunoglu, C., Ross, S., Butcher, P., Murphy, P., & Greenberg, J. (1996). Evaluating CAL at the Open University: 15 years on. Computers and Education, 26(1–3), 5–15.
Kearsley, G. (1990). Designing education software for international use. Journal of Research on Computing in Education, 23(2), 242–250.
Khalili, A., & Shashaani, L. (1994). The effectiveness of computer applications: A meta-analysis. Journal of Research on Computing in Higher Education, 27(1), 48–61.
Kulik, C. C., & Kulik, J. A. (1991). Effectiveness of computer-based instruction: An updated analysis. Computers and Human Behavior, 7(1), 74–75.
Martin, E. D., & Rainey, L. (1993). Student achievement and attitude in a satellite-delivered high school science course. The American Journal of Distance Learning, 17(1), 22–28.
McAteer, E., Neil, D., Barr, N., Brown, M., Draper, S., & Henderson, F. (1996). Simulation software in a life sciences practical course. Computers and Education, 26(1–3), 101–112.
McKenna, S. (1995). Evaluating IMM: Issues for researchers. Occasional Paper #17, Open Learning Institute, Charles Sturt University, New South Wales, Australia.
Morgan, D. (1998). Planning focus groups. Focus Group Kit 2. Thousand Oaks, CA: Sage.
Oliver, M. (1997). A framework for evaluating the use of learning technology. BP ELT Report No. 1. University of North London, London, England.
Parlett, M., & Hamilton, D. (1976). Evaluation as illumination. In G. V. Glass (Ed.), Evaluation studies review annual (pp. 141–157). Beverly Hills, CA: Sage.
Patton, M. Q. (1997). Utilization-focused evaluation. Thousand Oaks, CA: Sage.
Popham, W. J. (1961). Tape recorded lectures in the college classroom. AV Communication Review, 9.
Popham, W. J. (1972). An evaluation guidebook. Los Angeles, CA: Instructional Objectives Exchange.
Reeves, T. C. (1994). Evaluating what really matters in computer-based education. In M. Wild & D. Kirkpatrick (Eds.), Computer education: New perspectives (pp. 219–246). Perth, Australia: MASTEC.
Russell, T. (1999). The no significant difference phenomenon. Raleigh, NC: North Carolina State University.
Schramm, W. (1962). What we know about learning from instructional television: The next ten years. Stanford, CA: Stanford Institute for Communication Research.
Scriven, M. (1972). Pros and cons about goal-free evaluation. Evaluation Comment, 3, 1–8.
Stake, R. E. (1978). Program evaluation, particularly responsive. Kalamazoo, MI: Western Michigan University.
Suchy, R. R., & Baumann, P. C. (1960). The Milwaukee experiment in instructional television: Evaluation report. Milwaukee, WI: Milwaukee Public Schools.
Tyler, R. W. (1942). General statement on evaluation. Journal of Educational Research, 35, 492–501.
Worthen, B., Sanders, J., & Fitzpatrick, J. (1997). Program evaluation: Alternative approaches and practical guidelines. New York: Longman.
