Iowa’s Application of Rubrics to Evaluate Screening and Progress Tools
John L. Hosp, PhD
University of Iowa
Overview of this Webinar
• Share rubrics for evaluating screening and progress tools
• Describe the process the Iowa Department of Education used to apply the rubrics
Purpose of the Review
• Survey of universal screening and progress tools currently being used by LEAs in Iowa
• Review these tools for technical adequacy
• Incorporate one tool into the new state data system
• Provide access to tools for all LEAs in the state
Structure of the Review Process
• Core Group: IDE staff responsible for administration and coordination of the effort
• Vetting Group: Other IDE staff as well as stakeholders from LEAs, AEAs, and IHEs from across the state
• Work Group: IDE and AEA staff who conducted the actual reviews
Overview of the Review Process
• The work group was divided into 3 groups:
  ▫ Group A: key elements of tools (name, what it measures, grades it is used with, how it is administered, cost, time to administer)
  ▫ Group B: technical features (reliability, validity, classification accuracy, relevance of criterion measure)
  ▫ Group C: application features (alignment with the Iowa Core, training time, computer system feasibility, turnaround time for data, sample, disaggregated data)
• Within each group, members worked in pairs
Overview of the Review Process
• Each pair:
  ▫ had a copy of the materials needed to conduct the review
  ▫ reviewed and scored their parts together, then swapped with the other pair in their group
• Pairs within each group met only if there were discrepancies in scoring
  ▫ A lead person from one of the other groups participated to mediate reconciliation
• This allowed each tool to be reviewed by every work group member
Overview of the Review Process
• All reviews will be completed and brought to a full work group meeting
• Results will be compiled and shared
• Final determinations across groups for each tool will be shared with the vetting group two weeks later
• The vetting group will have one month to review the information and provide feedback to the work group
Structure and Rationale of Rubrics
• Separate rubrics for universal screening and progress monitoring
  ▫ Many tools reviewed for both
  ▫ Different considerations
• Common header and descriptive information
• Different criteria for each group (A, B, C)
Universal Screening Rubric
Iowa Department of EducationUniversal Screening Rubric for Reading (Revised 10/24/11)
What is a Universal Screening Tool in Reading: It is a tool that is administered at school with ALL students to identify which students are at risk for reading failure on an outcome measure. It is NOT a placement screener and would not be used with just one group of students (e.g., a language screening test).
Why use a Universal Screening Tool: It tells you which students are at risk for not performing at the proficient level on an end-of-year outcome measure. These students need something more and/or different to increase their chances of becoming proficient readers.
What feature is most critical: Classification Accuracy, because it demonstrates how well a tool predicts who may and may not need something more. It is critical that Universal Screening Tools identify the correct students with the greatest degree of accuracy so that resources are allocated appropriately and students who need additional assistance get it.
Header on cover page
Group A
Information relied on to make determinations: (circle all that apply, minimum of two) Manual from publisher / NCRtI Tool Chart / Buros Mental Measurements Yearbook / On-line publisher info. / Outside resource other than publisher or researcher of tool
Name of Screening Tool:
Skill/Area Assessed with Screener:
Grades: (circle all that apply) K 1 2 3 4 5 6 Above 6
How Screener Administered: (circle one) Group or Individual
Cost (minus administrative fees like printing)
Justification: Tools need to be economically viable, meaning the cost would be considered “reasonable” for the state or a district to use. Funds that are currently available can be used and can be sustained. One-time funding to purchase something would not be considered sustainable.
Score 3: Free
Score 2: $.01 to $1.00 per student
Score 1: $1.01 to $2.00 per student
Score 0: $2.01 to $2.99 per student
Kicked out if: $3.00 or more per student

Student time spent engaged with tool
Justification: The amount of student time required to obtain the data. This does not include set-up and scoring time.
Score 3: ≤ 5 minutes per student
Score 2: 6 to 10 minutes per student
Score 1: 11 to 15 minutes per student
Score 0: > 15 minutes per student
Group B
Criterion Measure used for Classification Accuracy (Sheet for Judging Criterion Measure)
Justification: The measure that is being used as a comparison must be determined to be appropriate as the criterion. In order to make this determination, several features of the criterion measure must be examined.
Score 3: 15-12 points on criterion measure form
Score 2: 11-8 points on criterion measure form
Score 1: 7-4 points on criterion measure form
Score 0: 3-0 points on criterion measure form
Kicked out if: Same test but uses a different subtest or composite, OR same test given at a different time

Classification Accuracy (Sheet for Judging Classification Accuracy for Screening Tool)
Justification: Tools need to demonstrate they can accurately determine which students are in need of assistance based on current performance and predicted performance on a meaningful outcome measure. This is evaluated with Area Under the Curve (AUC), Specificity, and Sensitivity.
Score 3: 9-7 points on classification accuracy form
Score 2: 6-4 points on classification accuracy form
Score 1: 3-1 points on classification accuracy form
Score 0: 0 points on classification accuracy form
Kicked out if: No data provided

Criterion Measure used for Universal Screening Tool (Sheet for Judging Criterion Measure)
Justification: The measure that is being used as a comparison must be determined to be appropriate as the criterion. In order to make this determination, several features of the criterion measure must be examined.
Score 3: 15-12 points on criterion measure form
Score 2: 11-8 points on criterion measure form
Score 1: 7-4 points on criterion measure form
Score 0: 3-0 points on criterion measure form
Kicked out if: Same test but uses a different subtest or composite, OR same test given at a different time
Judging Criterion Measure
Used for: (circle all that apply) Screening: Classification Accuracy / Screening: Criterion Validity / Progress Monitoring: Criterion Validity
Name of Criterion Measure: Gates
How Criterion Administered: (circle one) Group or Individual
Information relied on to make determinations: (circle all that apply) Manual from publisher / NCRtI Tool Chart / Buros Mental Measurements Yearbook / On-line publisher info. / Outside resource other than publisher or researcher of measure
Additional Sheet for Judging the External Criterion Measure (Revised 10/24/11)
1. An appropriate Criterion Measure is:
a) External to the screening or progress monitoring tool
b) A broad skill rather than a specific skill
c) Technically adequate for reliability
d) Technically adequate for validity
e) Validated on a broad sample that would also represent Iowa’s population
Judging Criterion Measure (cont.)

a) External to the Screening or Progress Monitoring Tool
Justification: The criterion measure should be separate from, and not related to, the screening or progress monitoring tool: the outside measure should be by a different author/publisher and use a different sample (e.g., NWF can’t predict ORF by the same publisher).
Higher score: External with no/little overlap (different author/publisher and standardization group)
Lower score: External with some or a lot of overlap (same author/publisher and standardization group)
Kicked out if: Internal (same test using a different subtest or composite, OR same test given at a different time)

b) A broad skill rather than a specific skill
Justification: We are interested in generalizing to a larger domain; therefore, the criterion measure should assess a broad area rather than splinter skills.
Score 3: Broad reading skills are measured (e.g., total reading score on ITBS)
Score 2: Broad reading skills are measured but in one area (e.g., comprehension made up of two subtests)
Score 1: Specific skills measured in two areas (e.g., comprehension and decoding)
Score 0: Specific skill measured in one area (e.g., PA, decoding, vocabulary, spelling)
Judging Criterion Measure (cont.)

c) Technically adequate for Reliability
Justification: Student performance needs to be consistently measured. This is typically demonstrated with reliability across different items (alternate form, split-half, coefficient alpha).
Score 3: Some form of reliability above .80
Score 2: Some form of reliability between .70 and .80
Score 1: Some form of reliability between .60 and .70
Score 0: All forms of reliability below .50

d) Technically adequate for Validity
Justification: The tool measures what it purports to measure. We focused on criterion-related validity to make this determination: the extent to which this criterion measure relates to another external measure that is determined to be good.
Score 3: Criterion ≥ .70
Score 2: Criterion .50-.69
Score 1: Criterion .30-.49
Score 0: Criterion .10-.29

e) A broad sample is used
Justification: The sample used in determining the technical adequacy of a tool should represent a broad audience. While a representative sample by grade is desirable, it is often not reported; therefore, taken as a whole, does the population used represent all students, or is it specific to a region or state?
Score 3: National sample
Score 2: Several states (3 or more) across more than one region
Score 1: States (3, 2, or 1 in one region)
Score 0: Sample of convenience; does not represent a state
Judging Classification Accuracy
Additional Sheet for Judging Classification Accuracy for Screening Tool (Revised 10/24/11)
Assessment: (include name and grade)
Complete the Additional Sheet for Judging the Criterion Measure. If it is not kicked out, complete the review for:
1) Area Under the Curve (AUC)
2) Specificity/Sensitivity
3) Lag time between when the assessments are given
1) Area Under the Curve (AUC)
Technical adequacy is demonstrated for Area Under the Curve.
Justification: Area Under the Curve is one way to gauge how accurately a tool identifies students in need of assistance. It is derived from Receiver Operating Characteristic (ROC) curves and is presented as a number to 2 decimal places. One AUC is reported for each comparison: each grade level, each subgroup, each outcome tool, etc.
Score 3: AUC ≥ .90
Score 2: AUC ≥ .80
Score 1: AUC ≥ .70
Score 0: AUC < .70
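As an informal aside (not part of the rubric itself), the AUC value being scored here can be understood through its rank-sum equivalence: it is the probability that a randomly chosen student who was proficient on the outcome outscored a randomly chosen at-risk student on the screener. A minimal sketch with hypothetical scores:

```python
def auc_from_scores(at_risk, not_at_risk):
    """AUC via the rank-sum equivalence: the fraction of (at-risk,
    not-at-risk) student pairs the screener orders correctly,
    counting ties as half a pair."""
    wins = 0.0
    for hi in not_at_risk:
        for lo in at_risk:
            if hi > lo:
                wins += 1.0
            elif hi == lo:
                wins += 0.5
    return wins / (len(at_risk) * len(not_at_risk))

# Hypothetical screener scores: students later found at risk vs. proficient
print(auc_from_scores([10, 15, 20], [18, 25, 30]))  # 8 of 9 pairs ordered correctly
```

An AUC of .50 means the screener orders pairs no better than chance; 1.0 means every at-risk student scored below every proficient student.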
2) Specificity or Sensitivity
Technical adequacy is demonstrated for Specificity or Sensitivity (see below).
Justification: Specificity/Sensitivity is another way to gauge how accurately a tool identifies students in need of assistance. Specificity and Sensitivity can give the same information depending on how the developer reported the comparisons. Sensitivity is often reported as accuracy of positive prediction (yes on both tools). Therefore, if the developer predicted positive/proficient performance, Sensitivity will express how well the screening tool identifies students who are proficient; if predicting at-risk or non-proficient performance, that is what Sensitivity shows. It is important to verify what the developer is predicting so that consistent comparisons across tools can be made (see below).
Score 3: Sensitivity or Specificity ≥ .90
Score 2: Sensitivity or Specificity ≥ .85
Score 1: Sensitivity or Specificity ≥ .80
Score 0: Sensitivity or Specificity < .80
3) Lag time between when the assessments are given
Lag time: the length of time between when the criterion and screening assessments are given.
Justification: The time between when the assessments are given should be short to minimize effects associated with differential instruction.
Score 3: Under two weeks
Score 2: Between two weeks and 1 month
Score 1: Between 1 month and 6 months
Score 0: Over 6 months
Sensitivity and Specificity Considerations and Explanations
Key:
+ = proficiency/mastery
- = nonproficiency/at-risk
0 = unknown
(shading in the figures below marks the cells used for Sensitivity and for Specificity)
Explanations: “True” means “in agreement between screening and outcome.” So “true” can be negative to negative in terms of student performance (i.e., negative meaning at-risk or nonproficient). This could be considered either positive or negative prediction depending on which the developer intends the tool to predict. As an example, a tool whose primary purpose is identifying students at risk for future failure would probably use “true positives” to mean “those students who were accurately predicted to fail the outcome test.”
Sensitivity = true positives / (true positives + false negatives)
Specificity = true negatives / (true negatives + false positives)
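The two formulas above can be sketched directly in code; the counts below are hypothetical, not taken from any reviewed tool.

```python
def sensitivity(true_pos, false_neg):
    # Of the students who were positive on the outcome,
    # the proportion the screener also classified as positive
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    # Of the students who were negative on the outcome,
    # the proportion the screener also classified as negative
    return true_neg / (true_neg + false_pos)

# Hypothetical counts from comparing a screener to an end-of-year outcome
print(sensitivity(40, 10))   # 0.8
print(specificity(120, 30))  # 0.8
```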
Consideration 1:Determine whether developer is predicting a positive outcome (i.e., proficiency, success, mastery, at or above a criterion or cut score) from a positive performance on the screening tool (i.e., at or above benchmark or a criterion or cut score) or a negative outcome (i.e., failure, nonproficiency, below a criterion or cut score) from negative performance on the screening tool (i.e., below a benchmark, criterion, or cut score). Prediction is almost always positive to positive or negative to negative; however in rare cases it might be positive to negative or negative to positive.
Figure 1a: This is an example of positive to positive prediction. In this case, Sensitivity is positive performance on the screening tool predicting a positive outcome.
Figure 1b: This is the opposite prediction, negative to negative as the main focus. In this case, Sensitivity is negative (or at-risk) performance on the screening tool predicting a negative outcome. Using the same information in these two tables, Sensitivity in the top table will equal Specificity in the second table. Because our purpose is to predict proficiency, in this instance we would use Specificity as the metric for judging.
[Figures 1a and 1b: 2×2 tables of Screening (rows: +, -) by Outcome (columns: + - in Figure 1a; - + in Figure 1b); shaded cells mark Sensitivity and Specificity]
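The symmetry described above (Sensitivity under one labeling equals Specificity under the reversed labeling) can be checked with a small sketch; the student counts are hypothetical.

```python
def sens_spec(pairs, positive):
    """pairs: (screener_label, outcome_label) per student;
    `positive` says which label is treated as the predicted outcome."""
    tp = sum(s == positive and o == positive for s, o in pairs)
    fn = sum(s != positive and o == positive for s, o in pairs)
    tn = sum(s != positive and o != positive for s, o in pairs)
    fp = sum(s == positive and o != positive for s, o in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical results: '+' = proficient, '-' = at-risk
data = ([('+', '+')] * 45 + [('-', '+')] * 5 +
        [('-', '-')] * 100 + [('+', '-')] * 50)
sens_pos, spec_pos = sens_spec(data, positive='+')  # Figure 1a framing
sens_neg, spec_neg = sens_spec(data, positive='-')  # Figure 1b framing
assert sens_pos == spec_neg and spec_pos == sens_neg
```

Flipping which label is "positive" simply swaps the two statistics, which is why it matters to check what the developer is predicting before comparing tools.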
Consideration 2: Some developers may include a third category, unknown prediction. If this is the case, it is still important to determine whether they are predicting a positive or negative outcome, because Sensitivity and Specificity are still calculated the same way.
Figure 2a: This is an example of positive to positive prediction. In this case, Sensitivity is positive performance on the screening tool predicting a positive outcome. It represents a comparison similar to that in Figure 1a.
Figure 2b: This is the opposite prediction, negative to negative as the main focus. In this case, Sensitivity is negative (or at-risk) performance on the screening tool predicting a negative outcome. It represents a comparison similar to that in Figure 1b. Using the same information in these two tables, Sensitivity in the top table will equal Specificity in the second table. Because our purpose is to predict proficiency, in this instance we would use Specificity as the metric for judging.
[Figures 2a and 2b: 3×3 tables of Screening (rows: +, 0, -) by Outcome (columns: + 0 - in Figure 2a; - 0 + in Figure 2b); shaded cells mark Sensitivity and Specificity]
Consideration 3: In (hopefully) rare cases, the developer will set up the tables in opposite directions (reversing screening and outcome or using a different direction for the positive/negative for one or both). This illustrates why it is important to consider which column or row is positive and negative for both the screening and outcome tools.
Notice that the Screening and Outcome tools are transposed. This makes Sensitivity and Specificity align within rows rather than columns.
[Figure: 3×3 table with Screening as columns (-, 0, +) and Outcome as rows (+, 0, -); Sensitivity and Specificity align within rows rather than columns]
Group B (cont.)

Criterion Validity for Universal Screening Tool (from technical manual)
Justification: Tools need to demonstrate that they actually measure what they purport to measure (i.e., validity). We focused on criterion-related validity because it is a determination of the relation between the screening tool and a meaningful outcome measure.
Score 3: Criterion ≥ .70
Score 2: Criterion .50-.69
Score 1: Criterion .30-.49
Score 0: Criterion .10-.29
Kicked out if: Criterion < .10 or no information provided
Reliability for Universal Screening Tool
Justification: Tools need to demonstrate that the test scores are stable across items and/or forms. We focused on: alternate form, split-half, and coefficient alpha.
Score 3: Alternate form, split-half, or coefficient alpha > .80
Score 2: Alternate form, split-half, or coefficient alpha > .70
Score 1: Alternate form, split-half, or coefficient alpha > .60
Score 0: Alternate form, split-half, or coefficient alpha > .50
Kicked out if: There is no evidence of reliability
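Of the indices named above, coefficient alpha is the easiest to illustrate. A minimal sketch (the item scores are hypothetical and this is not part of the rubric itself):

```python
def coefficient_alpha(items):
    """Cronbach's coefficient alpha. `items` holds one list of scores
    per item, with the same student order in every inner list."""
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    k = len(items)                                    # number of items
    n = len(items[0])                                 # number of students
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(totals))

# Three hypothetical items scored 0/1 for four students
print(coefficient_alpha([[1, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 0]]))
```

Higher alpha means the items rank students more consistently; perfectly redundant items yield an alpha of 1.0.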
Reliability across raters for Universal Screening Tool
Justification: How reliable scores are across raters is critical to the utility of the tool. If the tool is complicated to administer and score, it can be difficult to train people to use it, leading to different scores from person to person.
Score 3: Rater ≥ .90
Score 2: Rater .89-.85
Score 1: Rater .84-.80
Score 0: Rater ≤ .75
Group C
Alignment with Iowa CORE / Demonstrated Content Validity
Justification: It is critical that tools assess skills identified in the Iowa Core.
Literature & Informational: Key Ideas & Details; Craft & Structure; Integration of Knowledge & Ideas; Range of Reading & Level of Text Complexity
Foundational (K-1): Print Concepts; Phonological Awareness; Phonics and Word Recognition; Fluency
Foundational (2-5): Phonics and Word Recognition; Fluency
Higher score: Has a direct alignment with the Iowa CORE (provide broad area and specific skill)
Lower score: Has alignment with the Iowa CORE (provide broad area)
Kicked out if: Has no alignment with the Iowa CORE
Group C (cont.)

Training Required
Justification: The amount of time needed for training is one consideration related to the utility of the tool. Tools that can be learned in a matter of hours, not days, would be considered appropriate.
Score 3: Less than 5 hours of training (1 day)
Score 2: 5.5 to 10 hours of training (2 days)
Score 1: 10.5 to 15 hours of training (3 days)
Score 0: Over 15.5 hours of training (4+ days)
Computer Application (tool and data system)
Justification: Many tools are given on a computer, which can be helpful if schools have computers, the computers are compatible with the software, and the data reporting can be separated from the tool itself. It is also a viable option if hard copies of the tools can be used when computers are not available.
Score 3: Computer or hard copy of tool available; data reporting is separate
Score 2: Computer application only; data reporting is separate
Score 1: Computer or hard copy of tool available; data reporting is part of the system
Score 0: Computer application only; data reporting is part of the system
Data Administration and Data Scoring
Justification: The number of people needed to administer and score the data speaks to the efficiency of how data are collected and the reliability of scoring.
Score 3: Student takes assessment on computer, and it is automatically scored by computer at the end of the test
Score 2: Adult administers assessment to student and enters student’s responses (in real time) into a computer, and it is automatically scored by computer at the end of the test
Score 1: Adult administers assessment to student and then calculates a score at the end of the test by conducting multiple steps
Score 0: Adult administers assessment to student and then calculates a score at the end of the test by conducting multiple steps AND referencing additional materials to get a score (having to look up information in additional tables)
Group C (cont.)

Data Retrieval (time for data to be usable)
Justification: The data need to be available in a timely manner in order to use the information to make decisions about students.
Score 3: Data can be used instantly
Score 2: Data can be used the same day
Score 1: Data can be used the next day
Score 0: Data are not available until 2-5 days later
Kicked out if: Takes 5+ days to use data (have to send data out to be scored)

A broad sample is used
Justification: The sample used in determining the technical adequacy of a tool should represent a broad audience. While a representative sample by grade is desirable, it is often not reported; therefore, taken as a whole, does the population used represent all students, or is it specific to a region or state?
Score 3: National sample
Score 2: Several states (3 or more) across more than one region
Score 1: States (3, 2, or 1 in one region)
Score 0: Sample of convenience; does not represent a state
Disaggregated Data
Justification: Viewing disaggregated data by subgroups (e.g., race, English language learners, economic status, special ed. status) helps determine how the tool works with each group. This information is often not reported, but it should be considered if it is available.
Score 3: Race, economic status, and special ed. status are reported separately
Score 2: At least two disaggregated groups are listed
Score 1: One disaggregated group is listed
Score 0: No information on disaggregated groups
Progress Monitoring Rubric
Header on cover page
Iowa Department of Education Progress Monitoring Rubric (Revised 10/24/11)
Why use Progress Monitoring Tools: They quickly and efficiently provide an indication of a student’s response to instruction. Progress monitoring tools are sensitive to student growth (i.e., skills) over time, allowing for more frequent changes in instruction. They allow teachers to better meet the needs of their students and determine how best to allocate resources.
What feature is most critical: A sufficient number of equivalent forms so that student skills can be measured over time. In order to determine if students are responding positively to instruction, they need to be assessed frequently to evaluate their performance and the rate at which they are learning.
Information relied on to make determinations: (circle all that apply, minimum of two) Manual from publisher / NCRtI Tool Chart / Buros Mental Measurements Yearbook / On-line publisher info. / Outside resource other than publisher or researcher of tool
Name of Progress Monitoring Tool:
Skill/Area Assessed with Progress Monitoring Tool:
Grades: (circle all that apply) K 1 2 3 4 5 6 Above 6
How Progress Monitoring Administered: (circle one) Group or Individual
Name of Criterion Measure:
How Criterion Administered: (circle one) Group or Individual
Descriptive info on each work group’s section
Group A

Number of equivalent forms
Justification: Progress monitoring requires frequently assessing a student’s performance and making determinations based on their growth (i.e., rate of progress). In order to assess students’ learning frequently, progress monitoring is typically conducted once a week. Therefore, most progress monitoring tools have 20 to 30 alternate forms.
Score 3: 20 or more alternate forms
Score 2: 15-19 alternate forms
Score 1: 10-14 alternate forms
Score 0: 9 alternate forms
Kicked out if: Fewer than 9 alternate forms

Cost (minus administrative fees like printing)
Justification: Tools need to be economically viable, meaning the cost would be considered “reasonable” for the state or a district to use. Funds that are currently available can be used and can be sustained. One-time funding to purchase something would not be considered sustainable.
Score 3: Free
Score 2: $.01 to $1.00 per student
Score 1: $1.01 to $2.00 per student
Score 0: $2.01 to $2.99 per student
Kicked out if: $3.00 or more per student

Student time spent engaged with tool
Justification: The amount of student time required to obtain the data. This does not include set-up and scoring time. Tools need to be efficient to use. This is especially true of measures that teachers would be using on a more frequent basis.
Score 3: ≤ 5 minutes per student
Score 2: 6 to 10 minutes per student
Score 1: 11 to 15 minutes per student
Score 0: > 15 minutes per student
Group B

Forms are of Equivalent Difficulty (need to provide detail of what these are when the review is published)
Justification: Alternate forms need to be of equivalent difficulty to be useful in a progress monitoring tool. Having many forms of equivalent difficulty allows a teacher to determine how the student is responding to instruction, because a change in score can be attributed to student skill rather than a change in the measure. Approaches include: readability formulae (e.g., Flesch-Kincaid, Spache, Lexile, FORCAST), Euclidean distance, equipercentiles, and stratified item sampling.
Score 3: Addressed equating in multiple ways
Score 2: Addressed equating in 1 way that is reasonable
Score 1: Addressed equating in a way that is NOT reasonable
Score 0: Does not provide any indication of equating forms
Judgment of Criterion Measure (see separate sheet for judging criterion measure)
Justification: The measure that is being used as a comparison must be determined to be appropriate as the criterion. In order to make this determination, several features of the criterion measure must be examined.
Score 3: 15-12 points on criterion measure form
Score 2: 11-8 points on criterion measure form
Score 1: 7-4 points on criterion measure form
Score 0: 3-0 points on criterion measure form

Technical Adequacy is Demonstrated for Validity of Performance Score (sometimes called Level)
Justification: A performance score is a student’s performance at a given point in time rather than a measure of his/her performance over time (i.e., rate of progress). We focused on criterion-related validity to make this determination because it is a determination of the relation between the progress monitoring tool and a meaningful outcome.
Score 3: Criterion ≥ .70
Score 2: Criterion .50-.69
Score 1: Criterion .30-.49
Score 0: Criterion .10-.29
Group B (cont.)

Technical Adequacy is Demonstrated for Reliability of Performance Score
Justification: Tools need to demonstrate that the test scores are stable across item samples/forms, raters, and time. Across item samples/forms: coefficient alpha, split-half, KR-20, alternate forms. Across raters: interrater (i.e., interscorer, interobserver). Across time: test-retest.
Score 3: Item samples/forms ≥ .80; Rater ≥ .90; Time ≥ .80
Score 2: Item samples/forms .79-.70; Rater .89-.85; Time .79-.70
Score 1: Item samples/forms .69-.60; Rater .84-.80; Time .69-.60
Score 0: Item samples/forms ≤ .59; Rater ≤ .75; Time ≤ .59
Kicked out if: Fewer than 2 of the 3 areas are reported, OR a score of 0 in 2 or more areas. (No tool would be kicked out due to lack of any one.)
Technical Adequacy is Demonstrated for Reliability of Slope
Justification: The reliability of the slope tells us how well the slope represents a student’s rate of improvement. Two criteria are used:
1. Number of observations, that is, student data points needed to calculate the slope.
2. Coefficients, that is, reliability for the slope. This should be reported via HLM (also called LMM or MLM) results. If calculated via OLS, the coefficients are likely to be lower.*
Score 3: 10 or more observations/data points; coefficient > .80
Score 2: 9-7 observations/data points; coefficient > .70
Score 1: 6-4 observations/data points; coefficient > .60
Score 0: 3 or fewer observations/data points; coefficient ≤ .59
Group B (cont.)
* HLM = Hierarchical Linear Modeling; LMM = Linear Mixed Modeling; MLM = Multilevel Modeling; OLS = Ordinary Least Squares. HLM, LMM, and MLM are three different names for a similar approach to analysis. Reliability of the slope should be reported as the proportion of variance accounted for by the repeated measurement over time. These methods take into account that the data points are related to one another because they come from the same individual. OLS does not take this into account and, as such, would ascribe the extra variation to measurement error rather than to the relation among data points.
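For reference, the OLS approach the note above cautions against amounts to fitting a separate least-squares line for each student. A minimal sketch with hypothetical weekly scores (an HLM/LMM/MLM analysis would instead pool all students' data points in one multilevel model):

```python
def ols_slope(scores):
    """Ordinary least-squares slope of one student's scores against
    week number 0..n-1: the estimated rate of improvement per week."""
    n = len(scores)
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(scores))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Ten hypothetical weekly progress-monitoring scores (e.g., words read correctly)
print(ols_slope([22, 25, 24, 28, 30, 29, 33, 35, 34, 38]))
```

Because each line is fit to one student's noisy data in isolation, slope estimates bounce around from student to student, which is why the multilevel approaches yield higher reported reliability for the slope.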
Group C

Alignment with Iowa CORE / Demonstrated Content Validity
Justification: It is critical that tools assess skills identified in the Iowa Core.
Literature & Informational: Key Ideas & Details; Craft & Structure; Integration of Knowledge & Ideas; Range of Reading & Level of Text Complexity
Foundational (K-1): Print Concepts; Phonological Awareness; Phonics and Word Recognition; Fluency
Foundational (2-5): Phonics and Word Recognition; Fluency
Higher score: Has a direct alignment with the Iowa CORE (provide broad area and specific skill)
Lower score: Has alignment with the Iowa CORE (provide broad area)
Kicked out if: Has no alignment with the Iowa CORE
Training Required
Justification: The amount of time needed for training is one consideration related to the utility of the tool. Tools that can be learned in a matter of hours, not days, would be considered appropriate.
Score 3: Less than 5 hours of training (1 day)
Score 2: 5.5 to 10 hours of training (2 days)
Score 1: 10.5 to 15 hours of training (3 days)
Score 0: Over 15.5 hours of training (4+ days)
Computer Application (tool and data system)
Justification: Many tools are given on a computer, which can be helpful if schools have computers, the computers are compatible with the software, and the data reporting can be separated from the tool itself. It is also a viable option if hard copies of the tools can be used when computers are not available.
Score 3: Computer or hard copy of tool available; data reporting is separate
Score 2: Computer application only; data reporting is separate
Score 1: Computer or hard copy of tool available; data reporting is part of the system
Score 0: Computer application only; data reporting is part of the system
Group C (cont.)

Data Administration and Data Scoring
Justification: The number of people needed to administer and score the data speaks to the efficiency of how data are collected and the reliability of scoring.
Score 3: Student takes assessment on computer, and it is automatically scored by computer at the end of the test
Score 2: Adult administers assessment to student and enters student’s responses (in real time) into a computer, and it is automatically scored by computer at the end of the test
Score 1: Adult administers assessment to student and then calculates a score at the end of the test by conducting multiple steps (adding together scores across many assessments, subtracting errors to get a total score)
Score 0: Adult administers assessment to student and then calculates a score at the end of the test by conducting multiple steps AND referencing additional materials to get a score (having to look up information in additional tables)
Data Retrieval (time for data to be usable)
Justification: The data need to be available in a timely manner in order to use the information to make decisions about students.
Score 3: Data can be used instantly
Score 2: Data can be used the same day
Score 1: Data can be used the next day
Score 0: Data are not available until 2-5 days later
Kicked out if: Takes 5+ days to use data (have to send data out to be scored)
Group C (cont.)

A broad sample is used
Justification: The sample used in determining the technical adequacy of a tool should represent a broad audience. While a representative sample by grade is desirable, it is often not reported; therefore, taken as a whole, does the population used represent all students, or is it specific to a region or state?
Score 3: National sample
Score 2: Several states (3 or more) across more than one region
Score 1: States (3, 2, or 1 in one region)
Score 0: Sample of convenience; does not represent a state
Disaggregated Data
Justification: Viewing disaggregated data by subgroups (e.g., race, English language learners, economic status, special ed. status) helps determine how the tool works with each group. This information is often not reported, but it should be considered if it is available.
Score 3: Race, economic status, and special ed. status are reported separately
Score 2: At least two disaggregated groups are listed
Score 1: One disaggregated group is listed
Score 0: No information on disaggregated groups
Findings
• Many of the tools reported are not sufficient (or appropriate) for universal screening or progress monitoring
• Some tools are appropriate for both
• No tool (so far) is “perfect”
• There are alternatives from which to choose
Live Chat
• Thursday, April 26, 2012
• 2:00-3:00 EDT
• Go to rti4success.org for more details