
    Measuring instructional practices in mathematics using a daily log

Carrie W. Lee, Temple A. Walkowiak, and Elizabeth L. Greive
North Carolina State University

    Paper for presentation at National Council of Teachers of Mathematics 2014 Research Conference

    April 2014 New Orleans, LA

    This work is funded by the National Science Foundation under Award #1118894. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF. Questions or comments about the work in this paper should be directed to the Principal Investigator and second author of this paper at [email protected].


    Abstract

    In the current era of accountability with a push to use value-added test scores for evaluation of

    programs and teachers, there is a need for valid and reliable measures of mathematics teaching

    practices that can be used in a large number of classrooms. In this paper, we present the

Mathematics Instructional Log, an instrument completed by elementary school teachers that

    aims to measure three domains of standards-based mathematics teaching (tasks, discourse, and

    representations) that are aligned with the Standards for Mathematical Practice in the Common

    Core and with NCTM’s process standards. The purposes of the paper are: (1) to describe the

    theoretical framework underpinning the Mathematics Instructional Log; (2) to discuss the

    development process; (3) to share the results of an exploratory factor analysis of pilot data; and

(4) to outline implications for future work. Initial evidence of validity and score reliability is discussed.


    Introduction

    The mathematics achievement of America’s students has remained a topic of concern and

    discussion for policy makers and educators for several decades. When television, newspapers,

    and magazines publish headlines like “Sluggish Results Seen in Math Scores” (Dillon, 2009) and

    “How to Solve Our Problem with Math” (Ramirez, 2008), even the general public engages in the

    conversation. This attention to mathematics stems from two sources. First, today’s global

    economy has resulted in newly created jobs that require mathematical problem solving and

    technological skills (Glenn, 2000). Second, comparisons on international assessments of

    industrialized countries indicate that American students have consistently performed in the

    bottom third in mathematics (Cooke, Ginsburg, Leinwand, Noell, & Pollock, 2005).

Most recently, the development, adoption, and implementation of the Common Core State Standards in Mathematics (CCSS-M) (National Governors Association Center for Best Practices, Council of Chief State School Officers, 2010) have drawn even more attention from

    researchers, practitioners, and the public as to what is happening inside mathematics classrooms

    in the United States. Alongside the new standards, we live in an age with increasing attention to

    value-added models for teacher accountability. In addition to this scrutiny of K-12 schools and

    teachers, colleges of education have been under fire for their work in preparing the next

    generation of teachers with more attention to evaluating and ranking the effectiveness of teacher

preparation programs (Greenberg, McKee, & Walsh, 2013). With all of these current challenges,

    there is an immediate need for tools that produce data on classroom practices beyond student

    achievement outcomes. A careful examination of students' experiences during school

    mathematics is certainly warranted, considering this explicit focus in the Standards for

    Mathematical Practice in the CCSS-M.


The recent adoption and subsequent implementation of the CCSS-M by many states, coupled with the increased need to carefully evaluate teacher preparation programs in response to scrutiny, result in an urgency to describe the type of mathematics instruction happening in

    schools as it relates to the practices outlined in the new standards. In response to the need to

    catalogue and quantify teachers’ mathematics instruction as a part of a program evaluation, the

    Mathematics Instructional Log was developed. At its core, it describes the frequency and type of

mathematics teaching practices present in elementary mathematics lessons. Although the log is in its early stages of development, the purposes of this paper are to: (1) outline the instrument's theoretical

    framework; (2) describe the development process to date; (3) share the results of the pilot study;

    and (4) discuss implications and next steps for future work.

    Significance and Context of the Work

    The development of the Mathematics Instructional Log occurred in the context of an

    ongoing five-year research study, Project ATOMS (Accomplished Elementary Teachers of

    Mathematics and Science), in which we are evaluating the effectiveness of a STEM-focused

    elementary teacher preparation program. In this study, we are examining the development of our

teacher candidates from their time as undergraduate students into their first two years of teaching,

    particularly in relation to STEM content. While the study has many facets, we are most

    interested in their knowledge, beliefs, and teaching practices and the relationships among these

    constructs.

    Teachers’ instructional practices remain a central focus in this age of accountability,

    particularly among teacher educators and school administrators. The recent shift to a focus on

    accountability has created a need for valid and reliable evaluative measures of teaching practice.

    While Race to the Top pushes for evaluation of program effectiveness using student test score


    gains (U.S. Department of Education, 2009), the applicability of student test score gains for the

    purposes of program evaluation is limited by tested subjects and grade levels, growth model

    selection, and characteristics of teachers and schools (Henry, Kershaw, Zulli, & Smith, 2012).

    The field of education is challenged by a lack of promising evaluative measures that are focused

    on unique program features in order to measure program effectiveness. Due to these limitations,

    evaluators of education programs are often forced to develop instruments and conduct validation

    activities as part of routine evaluation efforts; this is the very focus of the work described in this

    paper about the Mathematics Instructional Log.

In the past, there have been two commonly used tools for gathering data on instructional practices: observations (e.g., Hiebert et al., 2005) and surveys (e.g., the Early Childhood Longitudinal Study). Classroom observations have often been called the "gold standard," but they are expensive to implement. Additionally, Rowan and Correnti (2009) found

    large variability in practices across days, indicating a large number of observations are needed to

    reliably discriminate among teachers. Furthermore, extensive training is often needed to ensure

    inter-rater reliability. Surveys, while cost-effective, have been criticized for the accuracy of data

    collected (e.g., Mayer, 1999). Most surveys are administered as annual surveys and therefore

    require teachers to retrospectively answer questions about their instruction over long periods of

    time. This form of data collection introduces issues of memory error and estimation strategies

    (Rowan, Jacob, & Correnti, 2009).

Another tool used less often is an instructional log (Rowan, Harrison, & Hayes, 2004; Rowan, Camburn, & Correnti, 2004; Camburn & Barnes, 2004), a daily questionnaire in which

    teachers “log” about instruction. While instructional logs have limitations of their own, this

    approach has been explored as a solution to the shortcomings of observational and survey


    measures. Instructional logs allow researchers to collect data across a large number of teachers,

    and in turn, examine patterns across thousands of days. As a result, logs address the limitations

    of observational measures by allowing teaching variation to be documented through the increase

    in data collection points. Additionally, instructional logs are less expensive than the cost of

    conducting observations. In regard to concerns with survey measures, when teachers complete

    the log on the day of instruction, issues of memory lapse are addressed. It is also common for

    annual surveys to ask broader questions due to the retrospective nature of the task. With daily

    instructional logs, more specific questions can be asked, allowing for more information to be

    collected about daily instruction.

    The use of a mathematics instructional log has proven to be a valid measure of

    mathematics teaching practices (Rowan, Harrison, & Hayes, 2004) in work done by researchers

    on the Study of Instructional Improvement (SII). In their work, they collected data on three

    dimensions of teaching: whether or not a teacher used direct teaching, the pacing of content

coverage, and the nature of students' mathematical work. They chose to collect this information for only three focus topics: number concepts; operations; and patterns, functions,

    or algebra. While their work offered compelling results and greatly informed the work outlined

    in this paper, the features of our program along with the goals of our research study prompted us

    to develop the Mathematics Instructional Log to address the theoretical framework driving our

    work.

    Theoretical Framework

    The theoretical framework that guided the development of our instructional log focuses

    on three domains of mathematics instruction: tasks, discourse, and representations (Berry,

Rimm-Kaufman, Ottmar, Walkowiak, & Merritt, 2010; Walkowiak, Berry, Meyer, Rimm-Kaufman, & Ottmar, 2014). These domains are the overarching focus within the process standards

    outlined by NCTM’s Principles and Standards for School Mathematics (2000) and within the

    Standards for Mathematical Practice in the CCSS-M. For example, two of the mathematical

    practices, “model with mathematics” and “use appropriate tools strategically,” align well with

    representations, also one of the process standards. The instructional log captures information

    about the opportunities that are provided within a lesson in regard to tasks, discourse, and

representations. The log cannot address the same level of depth that observational measures capture, but it can capture the frequency and presence of certain activities within these domains.

    Tasks. First, the tasks in which students engage during mathematics instruction matter.

    One characteristic to consider is a task's level of cognitive demand, based upon the type of

    thinking required of students. The cognitive demand framework (Stein, Smith, Henningsen, &

    Silver, 2009) organizes tasks into four categories based on the amount of rigor and use of

    conceptual understanding. The lower level categories, memorization and procedures without

    connections, focus more on completing tasks based on a set of memorized steps without tying the

    process to mathematical meaning. The higher level categories, procedures with connections and

    doing mathematics, involve tasks that require explanation or use of conceptual understanding

    within a mathematical process. While students may engage in tasks at all levels, analysis of the

    frequency of tasks within each category allows conclusions to be made about students’

    opportunities to engage with higher level tasks. Students who authentically engage in higher

    level tasks develop reasoning skills that are emphasized in the Standards for Mathematical

    Practice in the CCSS-M. While the mathematics log cannot provide information about the

    authenticity of the tasks in which students engage, the log does provide information about the


    frequency of tasks that are made available to the students. The log lists 44 items that describe

    different tasks or activities that the daily lesson may include. By indicating which activities

    occurred during the lesson, information can be gleaned about the nature of the cognitive demand

    provided to the students.

    Discourse. Second, researchers in mathematics education (Lampert & Blunk, 1998;

    Hufferd-Ackles, Fuson, & Sherin, 2004) emphasize that discourse is an important component of

school mathematics because it is a central part of what and how students learn. Student-to-student and student-to-teacher discourse, during which students explain and justify their reasoning, is a critical component of standards-based practices that have been shown to result in

    better learning outcomes (Stein, 2007; Truxaw & DeFranco, 2007; Truxaw, Gorgievski &

    DeFranco, 2008). Communication is also a foundational tenet of the mathematical practices

    outlined by both NCTM and the CCSS-M. A teacher's role is a primary support for student

    talk. The mathematics log includes several items that address discourse by indicating if

    communication of mathematical ideas occurred among the students. The log provides a picture

of the frequency of student talk during a lesson. Although the log cannot provide a detailed account of what strategies are used or the depth of implementation, it can show that discourse opportunities were included in the lesson.

    Representations. Third, Lesh, Post, and Behr (1987) outline five representations students

    and teachers can use for mathematical concepts: pictures, written symbols, oral language, real-

    world situations, and manipulative models. Research has shown the importance of using

    multiple representations in mathematics instruction to help students create understanding (e.g.,

    Lehrer & Schauble, 2002), but the use of multiple representations should include making explicit

connections among these representations (Duval, 2006). Within the mathematics log, there are items in which the teacher indicates the representations he/she and the students used

    instruction. The distinction between the teacher’s and students’ use is important because it

    provides information about the type of representations students are using themselves in their

    mathematics learning and the presence of multiple forms. Also, log items ask whether the teacher

    and/or the students made explicit connections among the representations.

    Development of the Log

    The Mathematics Instructional Log was developed to measure mathematics instructional

    practices that occur in elementary classrooms, specifically instructional practices as outlined in

    the above theoretical framework. The development of the Mathematics Instructional Log was

    grounded in methods of educational testing and measurement (Kubiszyn & Borich, 2010).

    Starting in August 2012 and continuing to date, the log development process has included five

    overlapping and iterative stages as outlined in Table 1.

Table 1: Development Process of the Mathematics Instructional Log

Stage 1: Development of purpose and theoretical framework
Stage 2: Development of items and scales
Stage 3: Development of training for teachers and implementation of the first pilot
Stage 4: Revisions of initial log and implementation of the second pilot with revised log
Stage 5: Revisions of second (revised) log; ongoing and continued validation efforts

    The development process is ongoing and focuses on examining the validity and score

    reliability of the instructional log, a critical piece of measurement development. This work uses


    the integrated conception of validity proposed by Messick (1993) to develop and begin to

    validate the instructional log. In the integrated conception, construct validity is not easily

    separated from other types of validity and is a unitary concept.

    In Stage 1, the development of the theoretical framework was heavily grounded in the

    practices or processes specified in the CCSS-M (2010) and NCTM’s process standards (2000),

    focusing specifically on standards-based teaching practices which are emphasized in the

    elementary education program that the instrument was designed to evaluate. Also, questions

    about mathematical content were designed to incorporate the content standards of the CCSS-M

    to align the log to current reform and to allow for collection of information on content topics

    taught across time points. As a part of this stage, existing mathematics instructional logs were

    analyzed, including the log created by the Study of Instructional Improvement (Rowan, Harrison,

    & Hayes, 2004) and the log developed within the Mosaic II Study (Le, Stecher, Lockwood,

    Hamilton, & Robyn, 2006).

    Stage 2 focused on item and scale development. Existing items from the SII and Mosaic

    II logs were analyzed for convergence with specific research goals for the development of the

    Mathematics Instructional Log, and new items were drafted to align with the variety of

    instructional practices used in mathematics. Both content and practice items were then reviewed

    by content experts, and modifications were made accordingly. In addition to expert reviewers,

    five elementary teachers participated in cognitive interviews to isolate items that were

    ambiguous or biased. As teachers reflected on their most recent mathematics lesson, they were

    asked both to identify any items that were confusing and to share how their typical math

    instruction was captured by the log. Specifically, they were asked if there were math practices

    that they used that were not listed within the log.


    An initial draft of the log was piloted with 57 teachers in two local school districts in

    January 2013 during Stage 3 of the development process. All teachers participated in a 90-

    minute face-to-face training. The purpose of the training was to ensure common understanding

    among participants of the terms and scales used in the mathematics log and a science log (not

    detailed in this paper). The 57 teachers collectively logged 585 days of mathematics instruction.

They completed the log electronically via Qualtrics, an online survey platform, as soon as possible after

    instruction occurred across a timeframe of fifteen school days.

    During this first pilot, we also collected qualitative feedback from participants to provide

    further clarification to items. Based on feedback from participants during the first pilot, several

    items were split to provide further clarity. For example, an item initially listed as “Work on

    problem(s) that have multiple answers or solution methods” was separated into “Work on

    problems that have multiple answers” and “Work on problems that have multiple solution

    methods.” Also, several items were further elaborated to include language concerning discourse

    to provide a better picture of what the task included. For example, the item “Prove that a solution

    is valid or that a method works for all similar cases” was transformed into two items, “Prove

    orally that a solution is valid or that a method works for all similar cases” and “Prove through

    written work that a solution is valid or that a method works for all similar cases” to discern if

    students engaged in student talk during the task.

    The response scale for the majority of items was also revised such that each response

    choice was defined clearly. The scale was changed to measure the time that the students

    engaged in behaviors rather than the teacher’s perceived emphasis on each behavior, which had

    been the focus of the first draft of the log. The scale on the second draft was modified to a four-

point scale for the student behaviors listed after the following question stem: "During today's mathematics instruction, how much time did the students…" Teachers were required to rate each of the items with one of the following choices:

    Not today: This behavior was not done during today’s instruction;

    Little: This behavior made up a relatively small part of the instruction;

    Moderate: This behavior made up a large portion, but NOT the majority of instruction;

    Considerable: This behavior made up the majority of today’s mathematics instruction.
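For analysis, these choices map onto the ordinal 1-4 codes shown later in Table 2. The minimal sketch below illustrates that coding; the enum name and labels are our own shorthand, not part of the instrument.

```python
# Ordinal coding of the four-point response scale; the 1-4 values appear in
# Table 2, but this enum is our own illustrative shorthand.
from enum import IntEnum

class TimeSpent(IntEnum):
    NOT_TODAY = 1     # behavior was not done during today's instruction
    LITTLE = 2        # a relatively small part of the instruction
    MODERATE = 3      # a large portion, but not the majority
    CONSIDERABLE = 4  # the majority of today's mathematics instruction

# Example: one teacher's response to one sub-item, coded for analysis
response = TimeSpent.MODERATE
print(int(response))  # 3
```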

    Stage 4 included a second pilot test of the instrument with 54 elementary teachers.

    Because the teachers were not located in a central location and we wanted to pilot virtual

    trainings, the 90-minute training was conducted virtually using Blackboard Collaborate, which

    allowed for teachers to ask questions, respond to polls, and hear the presenter while viewing the

    PowerPoint slides. After the training, teachers logged about their mathematics instruction for 15

    days. The data from this second pilot were used to conduct an exploratory factor analysis;

    results are outlined in the next section of this paper. Currently, as part of Stage 5, 74 second-year

    teachers are logging about their mathematics instruction for a total of 45 logged days across the

    school year. Validation efforts are ongoing and are discussed later in this paper.

    Exploratory Factor Analysis: Results from the Second Pilot Study

Data from the second pilot with 54 teachers were analyzed using an exploratory factor analysis (EFA), with each day of instruction as a data entry (n = 750). The number of mathematics

    instructional days logged ranged from 2 to 16 days per teacher, with a mean of 13 days per

    teacher. The revised Mathematics Instructional Log included 11 items with a total of 53 sub-

    items, all of which were completed by teachers during each logging session. Sub-items were

scored on Likert-type scales. Items were arranged in the four main sections displayed in Table 2:

    content, time spent on mathematics, use of representations, and student activities/behaviors


    during mathematics instruction.

Table 2: Overview of the Mathematics Instructional Log by Section

Section 1: Mathematical content
Number of items: 1 item with 5 sub-items
Examples of sub-items: Number and Operations: Fractions; Measurement and Data
Response scale: 1 = Not today; 2 = Secondary focus; 3 = Primary focus

Section 2: Time in minutes spent teaching mathematics
Number of items: 1 item
Examples of sub-items: N/A
Response scale: Dropdown in 5-minute intervals

Section 3: Use of representations
Number of items: 2 items with 6 sub-items each
Examples of sub-items: Numbers or symbols; Concrete materials; Pictures or diagrams
Response scale: Dichotomously scored with a checkbox for each sub-item

Section 4: Student activities and behaviors
Number of items: 7 items (all with the same stem) with 42 sub-items
Examples of sub-items: Pose questions to the teacher about the mathematics; Work on today's mathematics homework
Response scale: 1 = Not today; 2 = Little; 3 = Moderate; 4 = Considerable

The exploratory factor analysis included the 2 items in Section 3 and the 42 sub-items in Section 4, for a total of 44 items in the analysis. They were chosen because they are designed to measure the three domains of the theoretical framework: tasks, discourse, and representations. The two items in Section 3 on representations included one item which asked, "What did the students use to work on mathematics today?" with the following options to check if present: numbers or symbols; concrete materials; real-life situations or word problems; pictures or diagrams; tables or charts; and "the students made explicit links between two or more of these representations." The second item in Section 3 was a parallel item about the representations the teacher used during the mathematics lesson. These items were rescored on a four-point scale based upon what was selected, with more weight given to the last choice. A four-point scale was used so that these two items would be on the same four-point scale as the sub-items in Section 4. The sub-items from Section 4 include the same item stem mentioned earlier, "During today's mathematics instruction, how much time did the students…" The items in Sections 1 and 2, mathematics content and time spent teaching mathematics, were asked primarily for descriptive purposes and will be used in future analyses.
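To make the rescoring concrete, the sketch below shows one plausible mapping. The paper specifies only that responses were rescored on a four-point scale with more weight given to the last choice (the explicit-links option), so the exact thresholds here are an assumption for illustration.

```python
# Hypothetical rescoring of a Section 3 representations item onto a 1-4 scale.
# Only the general rule ("more weight given to the last choice") comes from
# the paper; the specific thresholds below are assumed for illustration.

EXPLICIT_LINKS = "explicit links between two or more representations"

def rescore_representations(checked: set) -> int:
    """Map the set of checked representation options to a 1-4 score."""
    if not checked:
        return 1                 # no representations selected
    if EXPLICIT_LINKS in checked:
        return 4                 # the last choice carries the most weight
    if len(checked) >= 2:
        return 3                 # multiple representations, no explicit links
    return 2                     # a single representation

# Example: a lesson using manipulatives and diagrams, without explicit links
print(rescore_representations({"concrete materials", "pictures or diagrams"}))  # 3
```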

    Exploratory factor analysis using Principal Axis Factoring in SPSS with oblique rotation

    (i.e., Promax) was conducted; oblique factor solutions of 3-7 factors were considered.

    Cronbach's alphas were calculated for each factor subscale. We decided on the six-factor Promax

    structure, which explained 39.45% of the variance, because the factor loadings suggested clear

    cut-off points, the items loaded in a way that most clearly matched our theoretical framework,

    and the scree plot showed leveling at seven factors. Table 3 outlines the six factors, alphas for

    the subscales, the range of factor loadings, the percent of variance explained, and sample items.

Three items were dropped because they did not clearly load on a factor ("Connect today's math topic to another math topic," "Review or practice math facts that they have memorized," "Perform tasks focused on math procedures").

Table 3: Results of Exploratory Factor Analysis, Pilot #2

Discourse: 9 items; alpha = .864; factor loadings .764–.461; 17.96% of variance explained
Sample items: "Pose questions to other students about the mathematics." "Talk about similarities and differences among various solution strategies."

Lower Level Tasks: 10 items; alpha = .762; factor loadings .909–.331; 7.29% of variance explained
Sample items: "Listen to me explain the steps to a procedure." "Read from a textbook to learn information."

Problem Solving: 7 items; alpha = .782; factor loadings .715–.307; 5.51% of variance explained
Sample items: "Demonstrate different ways to solve a problem." "Write explanations of mathematical ideas, solutions, or methods."

Connections: 5 items; alpha = .764; factor loadings .891–.341; 3.70% of variance explained
Sample items: "Connect today's math topic to a 'real world' idea." "Read from a picture book."

Prior Knowledge: 5 items; alpha = .442; factor loadings .777–.350; 2.68% of variance explained
Sample items: "Review mathematics content previously covered." "Participate in an activity designed to activate prior knowledge."

Representations: 5 items; alpha = .607; factor loadings .818–.164; 2.31% of variance explained
Sample items: "Use hands-on tools to explore mathematical ideas or to solve problems." "Use pictures or diagrams to represent mathematical concepts."
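For readers who want to replicate this kind of analysis, the sketch below is a rough Python analogue of the SPSS procedure described above (principal axis factoring with promax rotation, plus Cronbach's alpha for a subscale). The factor_analyzer package, file name, and column names are our assumptions, not part of the study.

```python
# Approximate Python analogue of the analysis above (the authors used SPSS).
# The file name and column naming convention are invented for illustration.
import pandas as pd
from factor_analyzer import FactorAnalyzer

logs = pd.read_csv("log_days.csv")        # one row per logged day (n = 750)
items = logs.filter(like="item_")         # the 44 task/discourse/representation items

# Principal axis factoring with an oblique (promax) rotation, six factors
fa = FactorAnalyzer(n_factors=6, method="principal", rotation="promax")
fa.fit(items)
loadings = pd.DataFrame(fa.loadings_, index=items.columns)
variance = fa.get_factor_variance()       # SS loadings, proportion, cumulative

def cronbach_alpha(subscale: pd.DataFrame) -> float:
    """Cronbach's alpha for the set of items assigned to one factor."""
    k = subscale.shape[1]
    item_variances = subscale.var(axis=0, ddof=1).sum()
    total_variance = subscale.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# e.g., alpha for a hypothetical nine-item discourse subscale
discourse_items = [c for c in items.columns if c.startswith("item_disc")]
print(round(cronbach_alpha(items[discourse_items]), 3))
```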

Although it is beyond the scope of this paper to detail the analysis of the initial pilot data, an EFA of the same form was conducted and resulted in factor loadings similar to those from the larger second pilot. It is important to note these similar loadings resulted despite the fact that

    revisions were made between the first and second pilot implementations of the log. The initial

    pilot EFA resulted in a five-factor structure with factors closely aligned with the second pilot

    EFA; the five factors were identified as cognitive demand, connections and applications,

    problem solving, representations, and discourse. In the second pilot, the latter three factors

    emerged and were given the same identifying label. The factor of cognitive demand emerged

    again, but it was renamed accordingly as lower level tasks due to the nature of the items that

loaded within the factor. The factor "connections and applications" emerged similarly, but in this second pilot the label "connections" was a better fit for describing the set of items. A sixth factor, "prior knowledge," which was not present in the first EFA, emerged from the second pilot EFA.

    In addition to support from the initial pilot, it is important to note how the six factors

    from the second pilot connect to the theoretical framework. While discourse and representations

    factors match corresponding constructs of the theoretical framework, the “tasks” domain

    emerges as four factors: lower level tasks, problem solving, connections, and prior knowledge.

Items loading on the lower level tasks factor included "Listen to me present the definition for a term," "Listen to me explain the steps to a procedure," and "Orally answer recall questions." These activities are centered on the direct actions of the teacher and likely do not require students to make conceptual connections to their work. When analyzed using the task analysis

guide developed by Stein, Smith, Henningsen, and Silver (2009), the items within this factor are categorized as memorization tasks or procedures without connections tasks. The

    practices and activities fall into these categories because at the surface level, they do not require

    students to engage in complex thinking or attend to the conceptual ideas underlying the

    procedures. Although one cannot be certain how the teacher implements the task, if other items

    from the log are not selected to indicate other practices, tasks in this factor are best characterized

    as lower level tasks.

    Furthermore, items within the named factor problem solving entailed higher-level skills

    such as proving and demonstrating various solution strategies. Examples of items that loaded

    within this factor included “Prove through written work that a solution is valid or that a method

    works for all similar cases” and “Demonstrate different ways to solve a problem.” These items

    describe mathematical activities that require the students to move past using procedures to

engage in tasks that likely demonstrate conceptual understanding of the mathematics. Research shows that students who engage in tasks requiring higher-level cognitive processes perform better on problem solving and reasoning measures (Henningsen & Stein, 1997). Furthermore, the level of cognitive demand is sustained by requiring students to justify or explain their thinking, actions that are captured by log items in this factor grouping.

    Additionally, connections and prior knowledge are distinct components of tasks

    captured by the log. The items that loaded within the factor of connections explicitly stated that

    connections were made to another subject, other math concepts, or real-world situations. Also,

    activities with picture books loaded within this factor, which is understandable due to the real-

    world setting of most picture books designed to integrate mathematical concepts. The emergence

    of the connections factor sheds light on the unique characteristics of activities that are designed

    to help students link mathematical processes to the world around them. Henry Kepner, NCTM

    President speaks to the importance of connections within mathematics in his statement, “When

    students connect mathematical ideas, their understanding becomes deeper and more lasting, and

    learners come to view mathematics as a coherent whole—connected with other subjects and their

    own interests and experiences” (NCTM Summing Up, 2009). Prior knowledge was a weaker

factor with an alpha level of only .442. The items that loaded within this factor included

    activities that review previous learning and activities that activate prior knowledge. These items

    do not directly relate to one another, and therefore, the construct they illustrate within the factor

    does not seem as clear as the previous ones described.

Also, it appears it may be more difficult to capture representations with only five items and some weaker loadings. Some of the items that might be considered a measure of

    representations loaded under a different factor. For example, the item “Talk about similarities

    and differences among representations” loaded within the discourse factor, yet the item still

    provides information concerning the use of representations within the lesson. This speaks to the


    complexity of the constructs and encourages the research team to consider dual loadings.

    Although the total variance explained was 39.45%, the factors show promise for

    measuring features of the intended domains of tasks, discourse, and representations. The results

    are limited because at this stage in the log development, the analysis did not account for the

    nested structure of the data, with logging days situated within individual teachers. Additionally,

the trainings for this second pilot were conducted virtually, which made it challenging to ensure that participating teachers were attentive during the training.

    Attention to evidence of validity and score reliability has been addressed throughout the

    development process (and continued collection of evidence is ongoing). First, the cognitive

    interviews and feedback from pilot participants examined the face validity of the measure by

    asking participants how they interpreted the items, suggesting which items were vulnerable to

    inconsistent interpretation. Second, experts reviewed the log's content and agreed the items

    appear to measure the constructs in the theoretical framework. Third, construct validity has been

    examined through the factor analysis; the three domains of the theoretical framework are

    represented in the factor structure. Finally, there is evidence of score reliability with the

Cronbach's alphas for the factor subscales (we recognize the weaker internal consistency of the "prior knowledge" subscale, indicating that further data collection and analyses are needed). Our

    ongoing and future collection of validity evidence is outlined in the next and final section of the

    paper.

    Implications and Future Directions

    Although in its early stages of development, we believe that describing the development

    of our Mathematics Instructional Log can help other researchers interested in ways to measure

    teaching practices. This work addresses the need for valid quantitative measures that can be used


reliably in order to investigate the teaching practices used in a large number of classrooms. Based

    on feedback from the participating teachers in the pilots, completing the Mathematics

    Instructional Log across a number of instructional days seems to also contribute to a more

    reflective and analytic approach towards mathematics teaching.

Future steps in the development of the Mathematics Instructional Log include conducting in-depth cognitive interviews, nested exploratory (EFA) and confirmatory (CFA) factor analyses, and item response analyses. In-depth cognitive interviews will provide information about how

    teachers are thinking about the items as they respond. Taking into account the nested structure

    of the data, the team is currently conducting an EFA and will follow up with a nested CFA to

    assess the quality of the item measurement scales or factors. Additionally, the team plans to

    apply item response theory to understand the extent to which the factor scales fit the data.
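To make the nesting concrete, a minimal two-level formulation (our notation; the project has not committed to a specific model) treats the score $y_{ij}$ for day $i$ logged by teacher $j$ as

$$y_{ij} = \gamma_0 + u_j + e_{ij}, \qquad u_j \sim N(0, \tau^2), \quad e_{ij} \sim N(0, \sigma^2),$$

so the intraclass correlation $\rho = \tau^2 / (\tau^2 + \sigma^2)$ captures how much of the variance in logged practice lies between teachers rather than between days. The single-level EFA reported above implicitly assumes $\rho = 0$, which is why the nested analyses are planned.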

    We acknowledge several limitations in using an instructional log. Logs may be prone to

measurement error due to teachers' self-reports of their own instruction. In addition to conducting cognitive interviews with a sample of teachers, we are video recording the teachers' lessons in order

    to understand the extent to which the log is accurately representing a teacher’s mathematics

    instruction. Also, all 74 second-year teachers currently completing the log are required to submit

    three video recorded mathematics lessons. These videos will be scored by trained raters and

    compared to the teacher’s log report. The results will be analyzed as a source of criterion

    validity evidence. Another way to address the limitation of measurement error caused by teacher

    self-report is to provide training on how to objectively use the log. Training provides common

    definitions and explanation of logging processes. Continued support and training throughout the

    logging time frame helps promote accuracy and participation. These steps were taken in the pilot

    and current implementations of the Mathematics Instructional Log.
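The paper does not name a statistic for the log-video comparison; one plausible approach, sketched here purely as an assumption, is to correlate teacher-level log subscale means with trained raters' video scores.

```python
# Hypothetical sketch of the planned log-video criterion validity check.
# The comparison statistic, file names, and column names are all assumptions;
# the paper says only that video scores will be compared to log reports.
import pandas as pd
from scipy.stats import spearmanr

logs = pd.read_csv("daily_logs.csv")       # teacher_id, date, discourse, ...
videos = pd.read_csv("video_scores.csv")   # teacher_id, rater_discourse, ...

# Average each teacher's daily log scores on one subscale (e.g., discourse)
log_means = logs.groupby("teacher_id")["discourse"].mean()
merged = videos.set_index("teacher_id").join(log_means)

rho, p = spearmanr(merged["rater_discourse"], merged["discourse"])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```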


    Finally, this instructional log is meant to measure instruction at a gross level (Rowan,

    Jacob, & Correnti, 2009). The log is not designed to capture fine-grained nuances of

    instructional practices and does not account for the quality of those practices. Further

    understanding as to whether this is an issue will be reached through an analysis of the collected

    videos from participants. The instructional log does not differ across grades and was designed

    with the intent to use it in elementary (K-5) classrooms. We are uncertain about the applicability

    at all grade levels at this point. In our ongoing work, we will continue to investigate these and

    other issues related to using an instructional log to measure teaching practices in mathematics.

    The Mathematics Instructional Log was developed to address the need for valid

    quantitative measures that can be used reliably to examine instructional practices in a large

    number of classrooms. With the current need for understanding teaching practices in the age of

the CCSS-M and the need for program evaluation instruments, the work outlined in this paper is

    timely. The Mathematics Instructional Log shows promise as one measure to address these

    needs. Measures like the log can add to the existing body of research on mathematics teaching

    and offer meaningful implications for teachers, teacher educators, and researchers regarding

    mathematics instruction. The work presented in this paper and future analyses will provide

    understanding as to the extent to which the Mathematics Instructional Log is a valid and reliable

    measure of mathematics teaching practices.

References

Berry, R. Q., III, Rimm-Kaufman, S. E., Ottmar, E. M., Walkowiak, T. A., & Merritt, E. (2010). The Mathematics Scan (M-Scan): A measure of mathematics instructional quality. Unpublished measure, University of Virginia.

Cooke, G., Ginsburg, A., Leinwand, S., Noell, J., & Pollock, E. (2005). Reassessing US international mathematics performance: New findings from the 2003 TIMSS and PISA. Washington, DC: American Institutes for Research.

Dillon, S. (2009, October 14). Sluggish results seen in math scores. The New York Times. Retrieved from http://www.nytimes.com

Duval, R. (2006). A cognitive analysis of problems of comprehension in a learning of mathematics. Educational Studies in Mathematics, 61(1-2), 103-131.

Glenn, J. (2000). Before it's too late: A report to the nation from the National Commission on Mathematics and Science Teaching for the 21st Century. Jessup, MD: Education Publications Center.

Greenberg, J., McKee, A., & Walsh, K. (2013). Teacher prep review: A review of the nation's teacher preparation programs. Washington, DC: National Council on Teacher Quality.

Henningsen, M., & Stein, M. K. (1997). Mathematical tasks and student cognition: Classroom-based factors that support and inhibit high-level mathematical thinking and reasoning. Journal for Research in Mathematics Education, 28(5), 524-549.

Henry, G. T., Kershaw, D. C., Zulli, R. A., & Smith, A. A. (2012). Incorporating teacher effectiveness into teacher preparation program evaluation. Journal of Teacher Education, 63(5), 335-355.

Hufferd-Ackles, K., Fuson, K. C., & Sherin, M. G. (2004). Describing levels and components of a math-talk learning community. Journal for Research in Mathematics Education, 35(2), 81-116.

Kubiszyn, T., & Borich, G. (2010). Educational testing & measurement: Classroom application and practice (9th ed.). Hoboken, NJ: John Wiley & Sons.

Lampert, M., & Blunk, M. L. (Eds.). (1998). Talking mathematics in school: Studies of teaching and learning. New York: Cambridge University Press.

Le, V. N., Stecher, B. M., Lockwood, J. R., Hamilton, L. S., Robyn, A., Williams, V. L., Ryan, G. W., Kerr, K. A., Martinez, J. F., & Klein, S. P. (2006). Improving mathematics and science education: A longitudinal investigation of the relationship between reform-oriented instruction and student achievement. Santa Monica, CA: RAND Corporation.

Lehrer, R., & Schauble, L. (2002). Symbolic communication in mathematics and science: Co-constituting inscription and thought. In E. Amsel & J. P. Byrnes (Eds.), Language, literacy, and cognitive development: The consequences of symbolic communication (pp. 167-192). Mahwah, NJ: Erlbaum.

Lesh, R. A., Post, T. R., & Behr, M. J. (1987). Representations and translations among representations in mathematics learning and problem solving. In C. Janvier (Ed.), Problems of representation in the teaching and learning of mathematics (pp. 33-40). Hillsdale, NJ: Erlbaum.

Mayer, D. P. (1999). Measuring instructional practice: Can policymakers trust survey data? Educational Evaluation and Policy Analysis, 21(1), 29-46.

Messick, S. (1993). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.). Phoenix, AZ: Oryx Press.

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.

National Governors Association Center for Best Practices, Council of Chief State School Officers. (2010). Common Core State Standards in Mathematics. Washington, DC: Author.

Ramirez, E. (2008, December 4). How to solve our problems in math. U.S. News and World Report. Retrieved from http://www.usnews.com

Riordan, J. E., & Noyce, P. E. (2001). The impact of two standards-based mathematics curricula on student achievement in Massachusetts. Journal for Research in Mathematics Education, 32(4), 368-398.

Rowan, B., Camburn, E., & Correnti, R. (2004). Using teacher logs to measure the enacted curriculum: A study of literacy teaching in 3rd grade classrooms. The Elementary School Journal, 105(1), 75-102.

Rowan, B., & Correnti, R. (2009). Studying reading instruction with teacher logs: Lessons from a study of instructional improvement. Educational Researcher, 38(2), 120-131.

Rowan, B., Harrison, D. M., & Hayes, A. (2004). Using instructional logs to study elementary school mathematics: A close look at curriculum and teaching in the early grades. The Elementary School Journal, 105(1), 103-127.

Rowan, B., Jacob, R., & Correnti, R. (2009). Using instructional logs to identify quality in educational settings. New Directions for Youth Development, 121, 13-31.

Stein, C. C. (2007). Let's talk: Promoting mathematical discourse in the classroom. Mathematics Teacher, 101(4), 285-289.

Stein, M. K., Smith, M. S., Henningsen, M. A., & Silver, E. A. (2009). Implementing standards-based mathematics instruction (2nd ed.). New York: Teachers College Press.

Truxaw, M. P., & DeFranco, T. C. (2007). Lessons from Mr. Larson: An inductive model of teaching for orchestrating discourse. Mathematics Teacher, 101(4), 268-272.

Truxaw, M. P., Gorgievski, N., & DeFranco, T. C. (2008). Measuring K-8 teachers' perceptions of discourse use in their mathematics classes. School Science and Mathematics, 108(2), 58-68.

U.S. Department of Education. (2009). Race to the Top executive summary. Retrieved from http://www2.ed.gov/programs/racetothetop/executive-summary.pdf

Walkowiak, T. A., Berry, R. Q., Meyer, J. P., Rimm-Kaufman, S. E., & Ottmar, E. R. (2014). Introducing an observational measure of standards-based mathematics teaching practices: Evidence of validity and score reliability. Educational Studies in Mathematics, 85(1), 109-128.