
    The International Journal of Educational and Psychological Assessment

    December 2009, Vol. 3

© 2009 Time Taylor Academic Journals, ISSN 2094-0734

A Metaevaluation Study on the Assessment of Teacher Performance in an Assessment Center in the Philippines

Carlo Magno

    De La Salle University, Manila, Philippines

Abstract

The present study conducted a metaevaluation of the teacher performance evaluation system used in the Performance Assessment Services Unit (PASU) of De La Salle-College of Saint Benilde in Manila, Philippines. To determine whether the evaluation system on teacher performance adheres to quality evaluation, the standards of feasibility, utility, propriety, and accuracy served as criteria. The system of teacher performance evaluation in PASU includes a student rating instrument, the Student Instructional Report (SIR), and a rating scale used by peers, the Peer Evaluation Form (PEF). A series of guided discussions was conducted among the different stakeholders of the evaluation system in the college, such as the deans and program chairs, teaching faculty, and students, to determine their appraisal of the evaluation system in terms of the four standards. A metaevaluation checklist was also used by experts in measurement and evaluation in the Center for Learning and Performance Assessment (CLPA). The results of the guided discussions showed that most of the stakeholders were satisfied with the conduct of teacher performance assessment. However, when judged against the standards of the Joint Committee on evaluation, the ratings were low: utility, propriety, and feasibility were rated fair, and accuracy was rated poor. The areas for improvement are discussed in the paper.

Introduction

It is a primary concern among educational institutions to assess the teaching performance of teachers. Assessing teaching performance enables one to gauge the quality of instruction in an institution and to facilitate better learning among students. The Philippine Accrediting Association of Schools, Colleges and

    Universities (PAASCU) judges a school not by the number of hectares of property

    or buildings it owns but rather by the caliber of classroom teaching and learning it

can maintain (O'Donnell, 1996). Judging the quality of teacher performance

    actually depends on the quality of assessing the components of teaching. When

    PAASCU representatives visit schools, they place a high priority on firsthand

    observation of actual faculty performance in the classroom. This implies the value

    of the teaching happening in an educational institution as a measure of the quality

    of that institution. Different institutions have a variety of ways of assessing teacher

    performance. These commonly include classroom observation by and feedback

    from supervisors, assessment from peers, and students assessment, all of whichshould be firmly anchored on the schools mission and vision statements.

    The De La Salle-College of Saint Benilde (DLS-CSB) uses a variety of

assessment techniques to arrive at a valid evaluation of a teacher's

    performance. As an institution that has adopted the learner-centered psychological

principles, any assessment technique it uses, as mentioned in the school's mission, recognizes diversity by addressing various needs, interests, and cultures: "As a community of students, faculty, staff, and administrators, we strengthen our


    relationships through transformational experiences guided by appreciation of

    individual worth, creativity, professional competence, social responsibility, a sense

    of nationhood, and our faith. We actively anticipate and respond to individual,

    industry, and societal needs by offering innovative and relevant programs that foster

holistic human development." The process of teacher performance evaluation for instructors, professors, and professionals of the college is highly critical since it is used to decide on matters such as hiring, rehiring, and promotion. There should be careful calibration and continuous study of the instruments used to assess teachers.

The process of evaluation in the college has been in place since the institution's founding in 1988. Since that time, different assessment techniques have been used to evaluate instructors, professors, and professionals. The assessment of teachers is handled by the Center for Learning and Performance Assessment (CLPA), which is primarily responsible for instrument development, administration, scoring, and the communication of assessment results to its stakeholders. Currently, the instructors and professors are assessed through the Student Instructional Report (SIR) administered to students, the Peer Evaluation Form (PEF), and academic advising. The current forms of these instruments have been in use for the last three years.

At present, there is a need to evaluate the process of evaluating teacher performance in DLS-CSB. Through a metaevaluation study, it may be determined whether the process meets the Joint Committee Standards for Evaluation. The Joint Committee Standards set a common language to facilitate communication and collaboration in evaluation. They are very helpful in a metaevaluation process since they provide a set of general rules for dealing with a variety of specific evaluation problems. The processes and practices of the CLPA in assessing teaching performance need to be studied to determine whether they meet the standards of utility, feasibility, propriety, and accuracy. The metaevaluation technique involves the process of delineating, obtaining, and applying descriptive and judgmental information about the utility, feasibility, propriety, and accuracy of an evaluation in order to guide the evaluation and to publicly report its strengths and weaknesses (Stufflebeam, 2000).

This study on metaevaluation addresses the issue of whether the process used by the CLPA in evaluating teaching performance in DLS-CSB meets the standards and requirements of a sound evaluation. Specifically, the study provides information on the adequacy of the SIR, peer assessment, and student advising in the following areas: (1) items and instructions for responding; (2) process of administering the instruments; (3) procedures practiced in assessment; (4) utility value for stakeholders; and (5) accuracy and validity of responses.

Models and Methods of Teacher Evaluation

Generally, teacher evaluations may be summative or formative. The instruments used for summative evaluation are typically checklist-type forms that provide little room for narrative and take note of observable traits and methods that serve as criteria for continued employment, promotions, and the like (Isaacs, 2003). On the other hand, formative evaluations are geared toward professional


development. In this form of evaluation, teachers and their administrators meet to try to trace the teacher's further development as a professional (Bradshaw, 1996).

Another model is differentiated supervision, a flexible model of evaluation that works from the premise that teaching is a profession and, as such, teachers should have a certain level of control over their development as professionals (Glatthorn, 1997). This model allows for the clinical model of evaluation, cooperative options that allow teachers to work with peers, and self-directed options guided by the individual teacher (Isaacs, 2003). The model gives professional staff and supervisors/administrators options in the process applied for supervision and evaluation. The supervision program is designed to be developmentally appropriate to meet the needs of each member of the professional team. The three processes in the Differentiated Supervision Model are: (1) Focused Supervision, (2) Clinical Supervision, and (3) Self-Directed Supervision.

The method of collaborative evaluation was developed (Berliner, 1982; Brandt, 1996; Wolf, 1996) with mentor/administrator-teacher collaboration at its core. Whether new or experienced, a teacher is aided by a mentor. It requires a more intensive administrative involvement that may include multiple observations, journal writing, or artifact collections, plus a strong mentoring program (Isaacs, 2003). At the end of a prescribed period, the mentor and mentee sit down to compare notes on the data gathered over the observation period. Together, they identify strengths, weaknesses, areas for improvement, and other such points. In this model, there are no ratings, no evaluative commentaries, and no summative write-ups (Isaacs, 2003).

Another is the multiple evaluation checklist, which uses several instruments other than administrator observations. Here, the peer evaluation, the self-evaluation, and the student evaluation meet in varying combinations to form a teacher's evaluation (Isaacs, 2003).

Self-evaluation also plays an important role in the evaluation process. It causes teachers to think about their methods more deeply and to consider the long term. It is also said to promote a sense of responsibility and the development of higher standards (Lengeling, 1996).

Then there is the most commonly used evaluation, the student evaluation (Bonfadini, 1998; Lengeling, 1996; Strobbe, 1993; Williams & Ceci, 1997). Student evaluations are the easiest to administer, and they provide many insights about rapport-building skills, teacher communication, and effectiveness. However, they have to be viewed with caution. Williams and Ceci (1997), as cited in Isaacs (2003), found that a change in a content-free variable in teaching was enough to cause a large increase in teacher ratings; in their study, the only variable modified was teaching style (teachers were told to be more enthusiastic and attended a seminar on presentation methods). Another reason comes from Bonfadini (1998), who found that when students were asked to rate their teachers according to four determinant areas, (a) personal traits, (b) professional competence, (c) student-teacher relationships, and (d) classroom management, the least used determinant was professional competence. The conclusion is that students may tend to look more at the packaging (content-free variables) than at what empirically makes a good


teacher, so viewing student-based information, says Isaacs (2003), should be done with care.

    In the field of teacher evaluation, the growing use of the portfolio is slowly

softening the otherwise sharp edges of the standardized instrument (Egelson,

    1994; Glatthorn, 1997; Shulman, 1988; Seldin, 1991).

National standards are also used as a method for teacher evaluation. This approach rests on the establishment of a screening board other than the standard licensure committee, something that has no counterpart in the Philippines. The creation of the National Board for Professional Teaching Standards (1998) was prompted by the report A Nation Prepared: Teachers for the 21st Century, generated by the 1986 Carnegie Task Force on Teaching as a Profession, which in turn was prompted by the 1983 report A Nation at Risk (Isaacs, 2003). It is the mission of the NBPTS to:

"establish high and rigorous standards for what experienced teachers should know and be able to do, to develop and operate a national, voluntary system of assessment and certification for teachers, and to advance educational reforms for the purpose of improving student learning in America's schools" (Isaacs, 2003).

The National Board Certification was meant as a complement to, but not a replacement for, state licensure exams. While the latter represent the minimum standards required to teach, the former stands as a test of more advanced standards in teaching as a profession. Unlike the licensure examinations, it may or may not be taken; it is voluntary. As such, some schools offer monetary rewards for the completion of the test, as well as opportunities for better positions (e.g., certified teaching leadership and mentor roles) (Isaacs, 2003).

Metaevaluation: Evaluation of an Evaluation

In 1969, Michael Scriven used the term metaevaluation to describe the evaluation of any evaluation, evaluative tool, device, or measure. Seeing how many decisions are based on evaluation tools (whose main purpose for existence in the first place is typically to help people make informed decisions), it is no wonder that the need to do metaevaluative work on these evaluation tools is as great as it is (Stufflebeam, 2000).

In the teaching profession, student evaluation of teachers stands as one of the main evaluation tools. However, as stated earlier, while it is only fair that students be included in the evaluative process, depending on the evaluation process and content, it may not be fair to teaching professionals to have their careers at the mercy of a potentially flawed tool.

The Process of Metaevaluation

How does one go about performing a metaevaluation? Stufflebeam (2000) identified the following steps:

1. Determine and Arrange to Interact with the Evaluation's Stakeholders. Stakeholders can refer to anyone whose interests might be affected by the


    evaluation under the microscope. These may include teachers, students, and

    administrators.

2. Staff the Metaevaluation with One or More Qualified Metaevaluators. Preferably, these should be people with technical knowledge in psychometrics who are familiar with the Joint Committee Personnel Evaluation Standards. It is sound to have more than one metaevaluator on the job so that more aspects may be covered objectively.

3. Define the Metaevaluation Questions. While these might differ on a case-to-case basis, the four main criteria ought to be present: propriety, utility, feasibility, and accuracy.

4. As Appropriate, Agree on Standards, Principles, and/or Criteria to Judge the Evaluation System or Particular Evaluation.

5. Issue a Memo of Understanding or Negotiate a Formal Metaevaluation Contract. This will serve as a guiding tool. It contains the standards and principles agreed on in the previous step and will help both the metaevaluators and their clients understand the direction the metaevaluation will take.

6. Collect and Review Pertinent Available Information.

7. Collect New Information as Needed, Including, for Example, On-Site Interviews, Observations, and Surveys.

8. Analyze the Findings. Put together all the qualitative and quantitative data in such a way that it will be easy to do the following step.

9. Judge the Evaluation's Adherence to the Selected Evaluation Standards, Principles, and/or Other Criteria. This is the truly metaevaluative step. Here, one takes the analyzed data and judges the evaluation based on the standards that were agreed upon and put down in the formal contract. In another source, this step is lumped with the previous one to form a single step (Stufflebeam, 2000).

10. Prepare and Submit the Needed Reports. This entails the finalization of the data into a coherent report.

11. As Appropriate, Help the Client and Other Stakeholders Interpret and Apply the Findings. This is important for helping the evaluation system under scrutiny improve by ensuring that the clients know how to use the metaevaluative data properly.

The Standards of Metaevaluation

There are four standards of metaevaluation: propriety, utility, feasibility, and accuracy.

Propriety standards were set to ensure that the evaluation in question is done in an ethical and legal manner (P1 Service Orientation, P2 Formal Written Agreements, P3 Rights of Human Subjects, P4 Human Interactions, P5 Complete and Fair Assessment, P6 Disclosure of Findings, P7 Conflict of Interest, P8 Fiscal Responsibility). They also check to see that the welfare of all stakeholders is considered (Widmer, 2004).

    Utility standards stand as a check for how much the evaluation in question

    caters to the information needs of its users (Widmer, 2004). They include: (U1)

    Stakeholder Identification, (U2) Evaluator Credibility, (U3) Information Scope and


    Selection, (U4) Values Identification, (U5) Report Clarity, (U6) Report Timeliness

    and Dissemination, and (U7) Evaluation Impact.

Feasibility standards make sure that the evaluation is conducted in a realistic, well-considered, diplomatic, and cost-conscious manner (Widmer, 2004). They include: (F1) Practical Procedures, (F2) Political Viability, and (F3) Cost Effectiveness.

Finally, accuracy standards make sure that the evaluation in question produces and disseminates information that is both valid and usable (Widmer, 2004). They include: (A1) Program Documentation, (A2) Context Analysis, (A3) Described Purposes and Procedures, (A4) Defensible Information Sources, (A5) Valid Information, (A6) Reliable Information, (A7) Systematic Information, (A8) Analysis of Quantitative Information, (A9) Analysis of Qualitative Information, (A10) Justified Conclusions, (A11) Impartial Reporting, and (A12) Metaevaluation.

It should be noted that the aforementioned standards were developed primarily for the metaevaluation of the evaluation of education, training programs, and educational personnel.
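To make the structure of the four standards and their thirty subvariables concrete, the sketch below arranges them as a simple Python mapping against which checklist scores could be recorded. The structure itself is purely illustrative and not taken from the study's materials; only the codes and names follow the lists above.

# Illustrative only: the Joint Committee standards and their subvariables,
# as listed above, keyed so that a checklist score can be kept per subvariable.
JOINT_COMMITTEE_STANDARDS = {
    "Propriety": ["P1 Service Orientation", "P2 Formal Written Agreements",
                  "P3 Rights of Human Subjects", "P4 Human Interactions",
                  "P5 Complete and Fair Assessment", "P6 Disclosure of Findings",
                  "P7 Conflict of Interest", "P8 Fiscal Responsibility"],
    "Utility": ["U1 Stakeholder Identification", "U2 Evaluator Credibility",
                "U3 Information Scope and Selection", "U4 Values Identification",
                "U5 Report Clarity", "U6 Report Timeliness and Dissemination",
                "U7 Evaluation Impact"],
    "Feasibility": ["F1 Practical Procedures", "F2 Political Viability",
                    "F3 Cost Effectiveness"],
    "Accuracy": ["A1 Program Documentation", "A2 Context Analysis",
                 "A3 Described Purposes and Procedures",
                 "A4 Defensible Information Sources", "A5 Valid Information",
                 "A6 Reliable Information", "A7 Systematic Information",
                 "A8 Analysis of Quantitative Information",
                 "A9 Analysis of Qualitative Information",
                 "A10 Justified Conclusions", "A11 Impartial Reporting",
                 "A12 Metaevaluation"],
}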

Forms of Teacher Evaluation in the Context

The present study conducted a metaevaluation of a teacher performance evaluation system. The system is composed of two major parts: the Student Instructional Report (rated by students) and the Peer Evaluation Form (rated by faculty peers).

The Student Instructional Report. The Student Instructional Report (SIR) currently used by the College of Saint Benilde originated from the SET form used by De La Salle University. It has been revised over the years: instructions have been changed, and certain things were omitted from the manual. The items used today are largely what they were in 2000, and the instructions are more or less the same as those written in 2003. The SIR is administered in the eighth week of every term, the week directly after the midterms week. The evaluands of the form are teachers; the evaluators are their students; and the other stakeholders are the chairs and deans, who use the data generated by the SIR for administrative decisions. The results are presented to the teachers after the course cards are given. By definition, then, it is a form of summative evaluation. There are currently no data that speak of its value as a method of formative evaluation.

Peer Evaluation Form. The Peer Evaluation Form (PEF) is used by faculty members in observing the performance of their colleagues. The PEF is designed to determine the extent to which the CSB faculty have been exhibiting teaching behaviors in the areas of teachers' procedures, teachers' performance, and students' actions as observed by their peers.

The PEF is used by a peer observer when a teacher is new in the college or due for promotion. The peer discusses the observation and the rating given with the faculty member evaluated. The faculty member signs the form after the conference proper.


Method

Guided Discussion

The guided discussion was the primary method of data gathering for all groups concerned. As stated above, the represented groups include the teachers, the chairs and/or deans, the CLPA-PASU staff directly involved in the evaluation process, the evaluation measurement expert team, and the students.

As suggested by Galves (1986), there were five to seven (5-7) participants for every guided discussion (GD) session. The participants for the GD were chosen by the deans of the respective schools involved. The groups included were teachers, chairs and/or deans, the CLPA-PASU staff, a team of evaluation measurement experts from CLPA, and students.

Separate GD sessions for each of the schools of the college were conducted because the schools have different needs. The scope of this study was to assess and evaluate the current practices undertaken in the SIR and PEF system of administration, scoring, and interpretation. In the GD sessions that were conducted, the participants were co-evaluators, considering that they all employ the same PEF and the same SIR items and standards of practice.

Each of the four criteria set by the Joint Committee Standards for Evaluation was used as a guide in the discussion. The teachers' group was set to discuss and evaluate the propriety aspect; the chairs/deans group, the utility aspect; the CLPA-PASU staff group, the feasibility aspect; and the team of experts, the accuracy aspect.

Before any of the GD sessions, the list of guide questions for each group was sent to the chosen participants at least ten days before the scheduled GD session for that group, for a pre-screening of the topics to be discussed. The participants were given the liberty to request that other topics be added to the discussion or that certain topics be scratched out.

The modified guide containing the set of questions to be covered was presented to the participants. Three researchers played specific roles as prescribed by Galves (1986): the guide asks the questions and steers the discussion; the recorder records the points raised per question and any questions the participants may care to ask (using a medium visible to the whole group); and the observer of the process is tasked to keep the discussion on track, regulate the time per topic, and prevent anyone from monopolizing the discussion. The guide initiated the discussion by presenting the new set of questions, at which point the participants were given another opportunity to add or subtract topics for discussion. Once the final set of questions had been decided upon and recorded by the recorder, responses were gathered and discussed by the group.

One key feature of the GD method is that a consensus on the issues under

a topic must be reached. When all the points had been raised, the group was given the chance to look over their responses to validate or invalidate them. Whatever the group decided to keep was kept; what it chose to strike out was stricken out.

The side-along evaluation done by the observer may be done at regular points throughout the discussions as decided by the group (e.g., after each topic)


and/or when he or she deems it fit to interrupt (e.g., at points when the discussion goes astray or the participants spend too much time on one point).

A similar procedure was followed for the student group. The purpose of the students' discussion was to get information on their perspectives of the evaluation process and their perception of their role as evaluators.

At the end of each discussion, the participants were asked to give their opinion about the usefulness and feasibility of having this sort of discussion every year to process their questions, comments, doubts, and suggestions. This provides data for streamlining the metaevaluative process for future use.

Extant Data, Reliability, and Validity Testing

The average ratings of the teachers within the last three school years (AY 2003-2004 to 2005-2006) were used to generate findings on how well the results could discriminate between "good teaching" and "needs improvement" levels of teaching. Cronbach's alpha was used to determine the internal consistency of the old teacher performance instrument.

The average of the scores for the three terms was computed for each school year, generating three average scores. These scores were compared to each other to check the reliability across time.
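As a rough illustration of these two reliability checks, the sketch below computes Cronbach's alpha from item-level SIR responses and averages each teacher's term means per school year. It is a minimal sketch under assumed data shapes (the DataFrame and column names are hypothetical), not the Center's actual analysis code.

import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def yearly_averages(ratings: pd.DataFrame) -> pd.DataFrame:
    """Average each teacher's term means within a school year (one column per year)."""
    return (ratings
            .groupby(["teacher", "school_year"])["mean_score"]
            .mean()
            .unstack("school_year"))

# Hypothetical usage:
# alpha = cronbach_alpha(sir_items)      # sir_items: one row per form, one column per item
# yearly = yearly_averages(sir_ratings)  # sir_ratings: teacher, school_year, term, mean_score
# yearly.corr()                          # agreement of the yearly averages across time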

Metaevaluation Checklist

A checklist was used to determine whether the evaluation meets the standards of utility, feasibility, propriety, and accuracy. Seven experts in measurement and evaluation were invited to evaluate the system used by the CLPA in assessing teachers' performance on both the Student Instructional Report (SIR) and the Peer Evaluation Form (PEF). The metaevaluators first used a 30-item checklist adopted from the Joint Committee Standards for Evaluation. The metaevaluators were guided by information from the GD session notes (as transcribed by the recorder) and other extant data.

Instrumentation

For the GD sessions, a guide list was used. The guide is composed of a set of questions under each standard that are meant to evaluate the evaluation system. The questions in the GD were pre-written. In this data-gathering method, the questions are still subject to change, both in the fielding of the questions prior to the GD sessions and on the day of the GD session itself.

The Metaevaluation Checklist by Stufflebeam (2000) was used to rate the SIR and PEF as an evaluation system. It is composed of ten items for each of the subvariables under each of the four standards (see Appendix B). The task is to check the items in each list that are applicable to the current teacher performance evaluation system used by the Center. Checking nine to ten items (a proportion of 0.9-1.0) generates a rating of excellent for that particular subvariable; 0.7-0.8, very good; 0.5-0.6, good; 0.3-0.4, fair; and 0-0.2, poor.


Data Analysis

The data obtained from the GD were analyzed using a qualitative approach. The important themes from the notes produced in the GD were extracted based on the appraisal components for each area of the metaevaluation standards. For utility, the themes extracted referred to stakeholder identification (persons affected by the evaluation should be identified), evaluator credibility (trustworthiness and competence of the evaluator), information scope and selection (broad selection of information/data for evaluation), values identification (description of procedures and rationale of the evaluation), report clarity (description of the evaluation being evaluated), report timeliness (findings and reports distributed to users), and evaluation impact (the evaluation should encourage follow-through by stakeholders). For propriety, the appraisal themes extracted were on service orientation (designed to assist and effectively address the needs of the organization), formal agreement (obligations of formal parties are agreed to in writing), rights of human subjects (the evaluation is conducted so as to respect and protect the rights of human subjects), and human interaction (respect for human dignity and worth). For feasibility, the themes extracted were on practical procedures, political viability, fiscal viability, and legal viability. The qualitative data were used as the basis for accomplishing the metaevaluation checklist for utility, feasibility, and propriety.

For the standards on accuracy, the existing documents on processes, procedures, programs, policies, documentation, and reports were made available to the metaevaluators in order to accomplish the metaevaluation checklist in this area.

In the checklist, the number of items checked under each metaevaluation standard was divided by 10 and averaged across the metaevaluators who accomplished the checklist. Each component was then interpreted as to whether the system reached the typical standards of evaluation. The scores are interpreted as: 0.9 to 1.0, Excellent; 0.7 to 0.8, Very Good; 0.5 to 0.6, Good; 0.3 to 0.4, Fair; 0.1 to 0.2, Poor.
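The scoring rule above reduces to a small computation; the sketch below is a minimal, hypothetical rendering of it (the function names and the example counts are illustrative, not taken from the study's materials).

def interpret(score: float) -> str:
    """Map a mean proportion of checked items to the verbal scale described above."""
    if score >= 0.9:
        return "Excellent"
    if score >= 0.7:
        return "Very Good"
    if score >= 0.5:
        return "Good"
    if score >= 0.3:
        return "Fair"
    return "Poor"

def subvariable_rating(items_checked: list[int]) -> tuple[float, str]:
    """items_checked: how many of the 10 items each metaevaluator checked."""
    mean_proportion = sum(n / 10 for n in items_checked) / len(items_checked)
    return mean_proportion, interpret(mean_proportion)

# Hypothetical example with seven metaevaluators:
# subvariable_rating([3, 2, 4, 3, 2, 3, 3])  ->  (0.2857..., "Poor")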

Results

Utility

Under utility, the standards evaluated were stakeholder identification, values identification, functional reporting, follow-up and impact, and information scope and selection. Themes and clusters were formed in evaluating the utility of the teacher performance evaluation system.

For the standard on stakeholder identification, the strands were clustered into four themes: mode of feedback, approaches to feedback, sources of feedback, and time of giving feedback. For the deans and chairs, the mode of feedback took the form of one-on-one sessions, an informal approach, post-conferences, meetings, and notes given when urgent. The approaches in giving feedback were both developmental (suggestions to further improve teaching skills) and evaluative


(standing of the faculty). The sources of feedback are the students (through the SIR, student advising, and e-mail from students and parents) and peers (senior faculty, chairs, deans). Feedback is given if the rating is high (3.75 and above), when the results of the SIR are low, if the faculty member is new to the college, and to those who have been teaching for a long time and getting low ratings; sometimes no feedback is given.

For values identification, the strands were clustered into three themes:

needs, actions taken, and value of the instrument. According to the participants, the needs included the following: results that are not too cumbersome for deans to read; a printout of the results should be given; the time taken to access the results turns off some teachers from accessing them; students have difficulty answering the SIR; students do not see how teaching effectiveness is measured; and a particular form should be created for laboratory classes. The actions-taken theme included removing items that

    form for laboratory classes. The action taken theme included removing items that

    are valid and another computation is done and; Other evaluation criteria is done.

    The themes of the instrument value showed that for the instrument to be valuable,

    there should be indicators for each score; there should be factors of teaching

    effectiveness with clear labels; identify what the instrument measures; there needs to

    be a lump score on learner-centeredness and; there are other success indicators that

    are not reflected in the SIR.

For functional reporting, two clusters emerged: decisions and functions. The decisions informed by the teacher evaluation include promotion, course loading, retaining part-time faculty, deloading a faculty member, permanency, and training enhancement. The functions of the teacher evaluation include the improvement of the faculty; the administrators come up with a list of faculty who will be given teaching loads based on SIR reports; and the PEF constricts what needs to be evaluated more.

The follow-up and impact theme included both qualitative and quantitative aspects. The qualitative aspect of the instruments included suggestions to give headings/labels for the different parts; come up with dimensions and subdimensions; devise a way to reach the faculty (Yahoo, e-mail, etc.); let the teachers and students see what aspects to improve on; and provide narrative explanations for the figures. The quantitative aspect of the report included the observation that faculty do not understand the spreading index, and suggestions to conduct a seminar explaining the statistics; come up with a general global score; represent each area with a number; and provide a verbal list of strengths and weaknesses of the faculty.

Two clusters were identified for information scope and selection: perception and action. In terms of perception, the faculty look at evaluation as something negative because the school uses the results. The suggested actions were to come up with a CLPA kit explaining the PEF and SIR, check on the credibility of the answers of students, and simplify the SIR for the non-hearing students.


Table 1
Rating for Utility

Utility                               Mean Rating   Interpretation
Stakeholder Identification            0.59          Good
Evaluator Credibility                 0.65          Good
Information Scope and Selection       0.78          Very Good
Values Identification                 0.66          Good
Report Clarity                        0.52          Good
Report Timeliness and Dissemination   0.29          Poor
Evaluation Impact                     0.35          Fair

Note. Excellent (.9-1), Very Good (.7-.8), Good (.5-.6), Fair (.3-.4), Poor (0-.2).

The ratings for utility using the metaevaluation checklist showed that in most of the areas, the performance of the teacher evaluation process is good. In particular, the area of information scope and selection is very good. However, report timeliness and dissemination is poor and evaluation impact is fair; these should thus be improved.

Propriety

The standards on propriety include service orientation, formal evaluation guidelines, conflict of interest, confidentiality, and helpfulness. Clusters and themes were formed from the GD.

For service orientation, the clusters formed were on the results, the examiner, and responding. According to the participants, they were not satisfied because the results come very late. There is a need to prepare hard copies of the results because most faculty members could not access the results, and the PEF qualitative results are not seen online. The participants' appraisal of the examiners included being friendly but sometimes late; new staff have difficulty administering the form because they could not handle deaf students and are not able to answer the questions of students. On responding to the SIR, it was mentioned that there should be an orientation for students.

For the formal evaluation guidelines, the three areas specified were the students, the frequency of meetings, and observation visits. For the students, it was mentioned that they get tired of answering many SIR forms within the day. In terms of the frequency of meetings, there are no guidelines for modular classes and team teaching, and there is no SIR for on-the-job training classes, so the teacher cannot be promoted. For the observation visits, it needs to be made clear who will call the teacher when the SIR is finished; the observer cannot make other visits; the PEF guidelines do not give instructions on what the observer will do; and it is not practical for the observer to go through the whole process of preobservation, observation, and postobservation.

No clusters were formed for conflict of interest. The themes extracted were: the CLPA does not give in to requests; there are not too many queries about the SIR; because the learner-centered framework is adopted by the college, more value is given to the SIR; and the SIR is not fully explained to the teachers.


For confidentiality, the majority of the participants agreed that the information kept by the Center is very confidential.

For the area of helpfulness, the themes identified were: the comments are read rather than the numbers; it is the comments that the teachers look at; the numerical results need a clearer explanation; and the comments need to be broken down into specific factors.

Table 2
Rating for Propriety

Propriety                      Mean Rating   Interpretation
Service Orientation            0.53          Good
Formal Agreement               0.70          Very Good
Rights of Human Subjects       0.62          Good
Human Interaction              0.57          Good
Complete and Fair Assessment   0.38          Fair
Disclosure of Findings         0.50          Good
Conflict of Interest           0.48          Fair
Fiscal Responsibility          0.40          Fair

Note. Excellent (.9-1), Very Good (.7-.8), Good (.5-.6), Fair (.3-.4), Poor (.1-.2).

Most of the ratings for propriety using the metaevaluation checklist were pegged at good. A very good rating was obtained for formal agreement. A fair rating was obtained in the areas of complete and fair assessment, conflict of interest, and fiscal responsibility.

Feasibility

The standards on feasibility include practical procedures, political viability, fiscal viability, and legal viability. Clusters and themes for the standards of feasibility were formed.

For practical procedures, the clusters formed were on the understandability of the instructions, difficulty with the comments and suggestions part, and difficulty with the instrument as a whole. These clusters show that while there are standardized procedures for every run of the SIR, there is difficulty in following them because, generally, the students do not understand the instructions. The comments and suggestions part of the instrument (part four) appears to be particularly problematic; here, too, the instructions do not seem to be clear to the students: "It is obvious they do not understand the instructions because they do not complete the sentence." Other than this, some students are not sure whether they are required to answer or it is optional. Others do not feel safe answering this part because they are afraid their professors will get back at them for whatever they write. Ultimately, observed the participants, "the instrument (itself) is complicated, not only the instructions."

For political viability, eight issue clusters were formed: time issues in administration, time issues of teachers, rescheduling issues, frequency of evaluation, anticipating name changes, identifying and anticipating teacher-related


issues, anticipating student needs, and concerns about utility. The time issues in administration concern problems with the first-thirty-minutes policy observed by the Center; the time allotment is generally too short for the whole administration procedure, from giving instructions to the actual answering of the instrument. Teachers also have issues regarding the same policy. Some refuse to be rated in the first thirty minutes, preferring to be rated in the last thirty; some faculty members dictate that the last 30 minutes will be used for the evaluation. Others complain about the duration of the SIR administration, even if the guidelines (distributed in the eighth week of the term, the week before the evaluation) indicated "first 30 minutes."

Though discouraged by the Center, rescheduling still happens during the evaluation period. Usually it is because some of the faculty members (or their students) do not show up. Similarly, there are times when some students do come, but their numbers do not meet the fifty percent quota required for each section's evaluation. Another common reason for rescheduling is schedule conflicts with other activities: "(the) Young Hoteliers Exposition and some tours and retreats have the same schedule as the SIR."

The next issue cluster formed concerns the frequency of evaluation; teachers question whether there is a need to evaluate every term. Although there is only one strand, it is important enough to be segregated, as it gives voice to one of the interest groups' major concerns.

The next cluster forms the biggest group, the one that deals with identifying and anticipating the needs of one of the major interest groups/stakeholders of the whole evaluation system: the teachers themselves. Their needs range from the minor ("We need to request the updated list of the faculty names early in the term, a list including the faculty members who changed their surnames, from the computer center.") to the major, and a lot in between. Among the latter are the need to make sure that teachers are aware of their evaluation schedules and the Center's policies, to come up with ways to deal with the teachers during the actual administration, and to equip them with the know-how to access their online results.

Just as the teachers, the evaluatees, have needs, so do their evaluators, the students. By not taking care of the students' needs and/or preferences, the Center risks generating inaccurate results. Thus, the Center should "compile the needs of students and present it (the SIR) to (the) students in an attractive form" and "(CLPA should) drum up the interest of students in the evaluation."

Last under the feasibility area are issues on utilization. There appears to be a need to make the utilization clearer to the stakeholders, especially the teachers.

For the area on cost effectiveness, the clusters formed were human

resources, material resources, and technology. The human resources of the Center are well utilized. Despite special cases when the staff find it difficult to go home because of the late working hours, they feel well compensated, in part because of the meals served. As to material resources, the SIR process is well supported by the College, so everything is generally provided. There are special cases where the evaluation setting makes administration difficult. For


instance, "sometimes it's hard to administer in the far buildings, especially in the food labs located in far places." Finally, under the theme of technology, the Center proved well equipped to handle the processing of the pen-and-paper instruments. However, it may be some time before the process becomes paperless; if the memos were delivered online instead of personally, as is currently done, some of the faculty would not get the memo on time because they do not have their own PCs. An attempt was also made to administer the instrument online. A problem noted in this regard was "kaunting respondents with online evaluation" (very few respondents are gathered with the online evaluation). Other than that, if all classes go online at the same time, the computers hang.

For legal viability, only one theme was developed: standardizing the evaluation setting. There is a common script to keep the instructions standardized and, although "during college break some classes are affected with the noise" (of college break activities), the classroom is generally conducive to answering.

Table 3
Rating for Feasibility

Feasibility            Rating   Interpretation
Political Viability    0.23     Poor
Practical Procedures   0.68     Good
Cost Effectiveness     0.50     Good

Note. Excellent (.9-1), Very Good (.7-.8), Good (.5-.6), Fair (.3-.4), Poor (.1-.2).

For the three areas of feasibility, a good rating was obtained for practical procedures and cost effectiveness, and a poor rating for political viability.

Accuracy

The standards of accuracy were rated based on the reliability report of the instrument from SY 2003-2004 to 2005-2006. The trend in the mean performance of the faculty from 2003 to 2006 was also obtained.

Table 4
Internal Consistency of the Items for the SIR from 2003 to 2006

             School Year
Term         2003-2004   2004-2005   2005-2006
1st Term     0.873       0.875       0.881
2nd Term     0.888       0.892       0.894
3rd Term     0.892       0.885
Summer       0.832       0.866

The reliability of the SIR form was consistently high from 2003 to 2006. The Cronbach's alpha values obtained are all at the same high level across the terms and across the three school years. This indicates that the internal consistency of the SIR measure is stable across time.


Figure 1 shows a line graph of the SIR means for each term across three school years.

Figure 1. Data Trend from the Last Three Years
[Line graph: mean ratings (roughly 3.70 to 4.40) per term (1st to 4th) across the three school years, plotted separately for Parts 1, 2, and 3 of the SIR.]

The trend in the means shows that the SIR results increase markedly during the summer terms (4th). This can be observed in the spikes at the 4th term in the line graph for the three parts of the SIR instrument. The means during the first, second, and third terms are stable and increase rapidly in the summer term.

Table 5
Rating for Accuracy

Accuracy                               Rating   Interpretation
Program Documentation                  0.03     Poor
Context Analysis                       0.00     Poor
Described Purposes and Procedures      0.25     Poor
Defensible Information Sources         0.50     Good
Valid Information                      0.23     Poor
Reliable Information                   0.35     Fair
Systematic Information                 0.85     Very Good
Analysis of Quantitative Information   0.25     Poor
Analysis of Qualitative Information    0.00     Poor
Justified Conclusions                  0.00     Poor
Impartial Reporting                    0.38     Fair
Metaevaluation                         0.08     Poor

Note. Excellent (.9-1), Very Good (.7-.8), Good (.5-.6), Fair (.3-.4), Poor (.1-.2).

The ratings for accuracy using the metaevaluation checklist were generally poor in most areas. Only systematic information was rated as very good, only



defensible information sources was rated as good, and both reliable information and impartial reporting were rated as fair.

Table 6
Summary Ratings for the Standards

Standard      Rating   Percentage   Interpretation
Feasibility   4.5      25%          Fair
Accuracy      8.75     0%           Poor
Propriety     15.17    25%          Fair
Utility       13.5     25%          Fair

Across the four standards as a whole, feasibility (25%), propriety (25%), and utility (25%) are met only fairly, and accuracy (0%) is poor for the entire teacher performance evaluation system of the Center. The poor accuracy is due to zero ratings on context analysis, analysis of qualitative information, and justified conclusions. The three standards rated as fair did not even meet half of the standards in the metaevaluation checklist.

Figure 2. Outcome of the Standards
[Bar chart: percentage of standards met (0% to 100%) for feasibility, accuracy, propriety, and utility.]

Discussion

The overall findings of the metaevaluation of the teacher evaluation system at the Center for Learning and Performance Assessment show that it falls below the standards of the Joint Committee on Evaluation. The ratings for utility, propriety, and feasibility were fair, and the rating for accuracy was poor.

In the standard of utility, report timeliness and dissemination was poor. This is due to the lack of timely exchanges with the full range of right-to-know


audiences. In order to improve the timeliness of these exchanges, the Center needs to communicate consistently with the different offices that it serves.

For propriety, the rating was only fair because low ratings were obtained for complete and fair assessment, conflict of interest, and fiscal responsibility. To improve complete and fair assessment, there is a need to assess and report the strengths and weaknesses of the procedure, use the strengths to overcome weaknesses, and estimate the effects of the evaluation's limitations on the overall judgment of the system. In line with conflict of interest, there is a need to release evaluation procedures, data, and reports for public review. For fiscal responsibility, there is a need to maintain adequate personnel records concerning job allocations and time spent on the job, and to employ comparisons for evaluation materials.

In the standards of accuracy, the majority of the ratings were poor, including program documentation, context analysis, described purposes and procedures, valid information, analysis of qualitative and quantitative information, justified conclusions, and metaevaluation. For program documentation, the only criterion met was the technical report that documents the program's operations; the nine other criteria were not met. For context analysis, none of the criteria were met. In described purposes and procedures, only the record of the client's purpose for the evaluation and the implementation of the actual evaluation procedures were met; the eight other criteria were not met. For valid information, there is a need to focus the evaluation on key ideas, employ multiple measures to address each idea, provide a detailed description of the constructs assessed, report the type of information each employed procedure acquires, report and justify inferences, report the comprehensiveness of the information provided by the procedures in relation to the information needed, and establish meaningful categories of information by identifying regular and recurrent themes through qualitative analysis. In the analysis of qualitative and quantitative information, there is a need to conduct exploratory analyses to assure data correctness, choose procedures appropriate to the system of evaluating teachers, specify the assumptions being met by the evaluation, report the limitations of each analytic procedure, examine outliers and verify correctness, analyze statistical interactions, and use displays to clarify the presentation and interpretation of statistical results. In the areas of justified conclusions and metaevaluation, none of the criteria were met.

In the standards of feasibility, political viability needs to be improved. For political viability, the evaluation needs to consider ways to counteract attempts to bias or misapply the findings, foster cooperation, involve stakeholders throughout the evaluation, issue interim reports, report divergent views, and affirm a public contract.

Given the present condition of the SIR and PEF in evaluating faculty performance based on the qualitative data, there are still gaps that need to be addressed in the evaluation system. The stakeholders are more or less not yet aware of the detailed standards on conducting evaluations among their faculty, and what is verbalized in the qualitative data is based only on their personal experience and the practices required by the evaluation system. By contrast, the standards on evaluation specify more details that need to be met in the evaluation. Some areas in the evaluation are interpreted by the stakeholders as


acceptable based on the themes of the qualitative data, but more criteria need to be met across the broader range of teacher evaluation. It is recommended that the Center for Learning and Performance Assessment consider the specific areas found wanting under utility, propriety, feasibility, and especially accuracy in order to attain quality standards in its conduct of teacher evaluation.

References

Berliner, D. (1982). Recognizing instructional variables. In D. E. Orlosky (Ed.), Introduction to education (pp. 198-222). Columbus, OH: Merrill.

Bonfadini, J. (1998). Should students evaluate their teachers? Rural Living, 52(10), 40-41.

Bradshaw, L. (1996). Alternative teacher performance appraisal in North Carolina: Developing guidelines. (ERIC Document Reproduction Service No. ED 400 255)

Egelson, P. (1994, April). Collaboration at Richland School District Two: Teachers and administrators design and implement a teacher evaluation system that supports professional growth. Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA. (ERIC Document Reproduction Service No. ED 376 159)

Galves, R. E. (1986). Ang ginabayang talakayan: Katutubong pamamaraan ng sama-samang pananaliksik [Guided discussion: An indigenous method of collaborative research]. Unpublished manuscript, Psychology Department, University of the Philippines.

Glatthorn, A. (1997). Differentiated supervision. Alexandria, VA: Association for Supervision and Curriculum Development.

Hummel, B. (2006). Metaevaluation: An online resource. Retrieved September 6, 2006, from http://www.bhummel.com/Metaevaluation/index.html

Isaacs, J. S. (2003). A study of teacher evaluation methods found in select Virginia secondary public schools using the 4x4 model of block scheduling. Unpublished doctoral dissertation, Virginia Polytechnic Institute and State University.

Lengeling, M. (1996). The complexities of evaluating teachers. (ERIC Document Reproduction Service No. ED 399 822)

National Board for Professional Teaching Standards. (1998). The national certification process in-depth. Retrieved from http://www.nbpts.org/nbpts/standards/intro.html

O'Donnell, J. (1990).

Scriven, M. (1969). An introduction to meta-evaluation. Educational Products Report, 2, 36-38.

Seldin, P. (1991). The teacher portfolio. Bolton, MA: Anker.

Shulman, L. (1988). A union of insufficiencies: Strategies for teacher assessment in a period of educational reform. Educational Leadership, 46(3), 36-41.

Strobbe, C. (1993). Professional partnerships. Educational Leadership, 51(42), 40-41.

Stufflebeam, D. L. (2000). The methodology of metaevaluation as reflected by the Western Michigan University Evaluation Center. Journal of Personnel Evaluation in Education, 14(1), 95.

Widmer, T. (2004). The development and status of evaluation standards in Western Europe. New Directions for Evaluation, 104, 31-42.

Williams, W., & Ceci, S. (1997). How am I doing? Change, 29(5), 12-23.

Author Notes

Special thanks to my research assistant, Ms. Nicole Tangco, for helping me gather the data and consolidate the report; to the staff, heads, and coordinators of the Center for Learning and Performance Assessment (CLPA) for participating in the study; and to the Director's Office of the CLPA and the Performance Assessment Services Unit for the funding.

About the Author

Dr. Carlo Magno is presently a faculty member of the Counseling and Educational Psychology Department at De La Salle University, Manila, Philippines. He teaches courses in measurement and evaluation, advanced statistics, and advanced psychometric theory. He has conducted studies and engaged in projects on teacher performance assessment.