
Technology Proposal Sample



© 2014 CTB/McGraw‐Hill LLC (Unpublished) 

6.2 Scope of Work

8.1 Project Management and Planning

8.1.1 Proposer’s Project Team

8.1.1.1 Project Director – The Proposer will appoint a single full-time project director who oversees the management of the project, including work assigned to subcontractor(s), and serves as the primary point of contact with the states’ management team.

8.1.1.2 Project Manager(s) – The Proposer will appoint one or more project manager(s) who serve as the primary point of contact with individual partner states on issues unique to the state (e.g., shipping, identification of schools).

8.1.1.3 Support Staff – The Proposer will indicate the number of full- or part-time support staff specifically assigned to the project.

By forming the New England Assessment Consortium (NEAC), the member states will realize the promised benefits of securing common, required test services while achieving greater economies of scale than are available to any single state agency. This approach promises the best of both worlds, bringing forward the buying power of a cooperative while giving member states the ability to secure distinct services to address specific local mandates. While CTB recognizes the value in this approach, we also acknowledge that it will require thoughtful management so that both goals can be fully realized.

We carefully considered our dedicated team members and management structure to ensure alignment with the needs of both the cooperative and the individual state agency members. We see our assigned team members and team structure as being critical elements in the success of this endeavor.

Dedicated Team Members

CTB has had the great fortune of working closely with each of the NEAC member states during our work with various Smarter Balanced contracts. In particular, we have worked extensively with members of the Connecticut State Department of Education. We are both proud and very protective of this working relationship, and we have carefully taken this work and this relationship into account as we have assembled a highly effective, deeply experienced project management team for this submission. Members of our proposed team were chosen for their expertise and experience on previous Smarter Balanced programs as well as for their ability to work effectively and collaboratively. These highly knowledgeable team members also have a proven track record of technology system delivery, item development, and scoring delivery, and an ability to understand and incorporate each customer's vision and goals into that work.

CTB is committed to superior program management to ensure that the requisite coordination and communication take place across project participants at all times. All aspects of this project will be managed by an experienced and highly qualified project management staff. Mr. David Breen, Project Director, will provide complete oversight of all program activities and contractual obligations. Mr. Breen is very familiar with the Smarter Balanced Assessment System; he is leading a significant portion of the Achievement Level Setting work completed under Smarter Balanced Contract 21. He will be the team leader within our cross-functional program team. He will focus on program oversight and coordination of schedules, activities, and deliverables across our program team and with NEAC’s leadership and stakeholders. Mr. Breen will be NEAC's primary point of contact for all program issues and communications. The brief biography below introduces Mr. Breen and his qualifications.

David Breen

Senior Program Manager
CTB/McGraw-Hill employee since February 2014 and from 2001 to 2006

Job Description: Mr. Breen is responsible for the CTB scope of the Smarter Balanced Standard Setting program (Contract 21), which includes the recruitment of panelists for both the In-Person Panel and Online Panel, the hiring and facilitation of an outside auditor, and co-leadership of the Achievement Level Setting (or Standard Setting). In addition, he is responsible for managing a cross-functional team to develop and execute the program schedule and budget as well as identifying and mitigating risk.

Qualifications and Education: Mr. Breen received his bachelor's degree in Economics and Psychology from the University of California, Davis.

He is experienced and familiar with large-scale assessment programs. In a leadership role, Mr. Breen has managed cross-functional teams and effectively executed programs.

Current projects assigned: Smarter Balanced (Contract 21)

Past contracts assigned: New York City, the Department of Defense, and the states of New York, Missouri, Maryland, Illinois, and North Dakota

As presented in Figure 1, the organizational chart for NEAC, project managers from key functional areas will work under the direction of Mr. Breen. The program and project managers will work with CTB's Smarter Balanced Program Management Portfolio Director, Antonia Deoudes. This organizational structure and its benefits to NEAC are addressed in greater detail below and also in the Project Staffing section of this proposal. Our project management team includes 30 staff members who will work directly with NEAC’s leadership to ensure that any risks are identified, mitigation strategies are defined, and resolution is sought in a timely manner.


Figure 1: Organizational Chart for NEAC Smarter Balanced Assessments


Project Team Structure

We have organized our project management team in a portfolio management structure that is intended to ensure that key management staff members operate in a tightly joined, rigorous management model that enforces accountability; optimizes cross-functional alignment; escalates issues to appropriate decision makers; and aligns calendars, communications, and strategies. This structure will support the management of the NEAC program within a broader context and network of Smarter Balanced membership programs. Managing this program within a portfolio will allow us to streamline and leverage common or like work across the portfolio to maximize value to each individual program. In practice, David Breen and our assigned project management staff will report into our Smarter Balanced Program Portfolio and work as part of an integrated community with the other Smarter Balanced project teams to provide a common governance structure and greater access to information and resources and to leverage the work completed on the common elements of these programs to the benefit of each individual program. We have used this management structure very successfully in large, complex Smarter Balanced Consortium programs, including Contracts 14 and 16/17. Key benefits of this structure include:

- More effective implementation of the NEAC programs through improved access to information
- Removal of redundant and duplicated efforts, allowing us to increase our value proposition to the member states
- Improved engagement and communication between our NEAC program team and senior management
- Greater leveraging of the skills and expertise of team members to the benefit of the NEAC programs
- More effective prioritization of work efforts as well as greater concurrency in work streams to deliver the programs within aggressive timeframes
- A consistent set of tools and metrics across all Smarter Balanced member programs and effective monitoring of and management to key performance indicators

Our proposed management structure provides the depth and breadth of experience needed to ensure completion of tasks, services, and activities and realization of outcomes for this program. Our teams possess the skills and expertise needed to ensure successful planning, implementation, monitoring, and delivery of high-quality products and services to meet the contractual requirements outlined in the Request for Proposal (RFP).

8.1.2 Management Meetings and Activities

Management Meetings – The Proposer will support regular management meetings with the states’ project management team. The Proposer should budget for one full-day meeting per month, to be held at locations that rotate across the member states. Lodging and meals (as appropriate) for the states’ management team will be arranged and paid by the contractor.

8.1.2.1 WebEx Conference Calls – The Proposer will support monthly WebEx conference calls with the states’ project management team.

8.1.2.2 Management Reports – In addition to detailed minutes from management meetings, the Contractor will provide the following reports:

8.1.2.2.1 Annual project plan and schedule (including detailed procedures and specifications)

8.1.2.2.2 Monthly written status reports describing the current status of scheduled tasks and recommending updates and revisions, as needed, to the project schedule

Program Meeting and Meeting Facilitation

CTB plans productive management meetings between state agencies and our program team to provide an avenue for ongoing communication and collaboration. These meetings are an essential part of the successful implementation of the program and will allow us to build a strong, collaborative relationship that is focused on successful contract outcomes and on meeting program needs. Working with NEAC’s leadership, the CTB project management team will be responsible for planning and facilitating a minimum of one full-day meeting per month, to be held within the NEAC states. CTB will maintain responsibility for successful meeting outcomes as well as for documenting and distributing the agendas, meeting notes, action items, and decision logs. CTB will establish and adhere to communication protocols so that each agenda addresses the key issues. Agendas will be provided in advance of these meetings, and meeting notes, including action items and decision logs, will be distributed within two business days following each meeting.

Weekly Project Management Meetings

In addition to the monthly in-person meetings, our project leadership team will conduct regular program implementation meetings via conference call/WebEx. These meetings will occur at least monthly, on an agreed-upon day and time. We recommend that these meetings take place on a weekly basis during the period of critical program start-up. The purpose of these meetings will be to ensure all requirements and timelines are maintained and to address any need for problem solving and time-sensitive discussions. Periodically, we may determine that focused conference calls are required as specific activities are being implemented. Our project management team will work with NEAC to determine the dates and times for these additional calls. CTB will provide Web conferencing capabilities for all program team/program implementation calls. Our project director will be responsible for all planning and facilitation of the meetings as well as for documenting and distributing the agendas, meeting notes, action items, decision logs, and outcomes of each meeting.

Comprehensive Project Meetings

Annual management meetings between NEAC members and CTB are essential for effective planning, implementation, and management of this project. These meetings will occur in NEAC member states, the same as the monthly in-person meetings. The meeting location will rotate and be designated by NEAC leadership. We would like to hold the first meeting within a very short period of time following contract negotiation; we recommend that this meeting occur within two weeks of contract execution. CTB will be responsible for all meeting logistics and meeting costs, including travel and accommodations for key NEAC staff members and their designees.

The initial objective of the program start-up meeting is to build a foundation for the detailed Project Management Plan as well as to finalize the master program schedule. Essential outcomes of this meeting are a consistent understanding of the requirements and specifications for deliverables and solidified agreements related to quality standards and expectations for overarching contract management and communications. At this meeting, we will review the initial Project Management Plan and master schedule so that we can gain NEAC leadership input and feedback, which we will use as we develop the final versions of these program management tools.

After the initial launch meeting, we will incorporate the NEAC leadership’s feedback and input to provide the following program management documentation, which will be submitted for NEAC's review and approval:

- Updated master schedule
- Updated project management plan, including the management plan that addresses communication, meetings, and management reporting
- Meeting notes/decision log/open action list

As work progresses and the designs become more finalized, adherence to the project management work plan and master schedule will be closely monitored and reviewed at each subsequent management meeting. Please see Appendix A for CTB’s proposed schedule for NEAC.


Program Status Reporting and Program Transparency

CTB will provide all standard program status reporting, including what is explicitly required in the RFP. We will provide weekly reports (status reports and issues logs), monthly reports (communication reports and dispensation reports), and quarterly and annual reports. We will support NEAC in reporting program status, key metrics, and outcomes, as required. We understand that these scheduled reports are the minimum required for program reporting, and we will develop each report comprehensively and deliver it on time.

In addition to the reports described above, we offer an option to significantly expand the level of program reporting beyond the RFP’s requirements. CTB has created program dashboard capabilities that incorporate both program in-flight metrics and key performance indicators. The program dashboard provides clear visuals that communicate program status and ensure communications surrounding status are crisp and clearly articulated. We have successfully used various program dashboards internally throughout the Smarter Balanced Contract 16/17 program, and we see significant benefits to providing greater transparency for the state agencies with the option for a comprehensive and customer-facing program dashboard.

8.2 Technical and Policy Issues

The Contractor will plan and host two meetings per year of the NEAC Technical Advisory Committee (TAC). Contractor responsibilities will include each of the following:

8.2.1 Work with management team to identify and recruit TAC members;

8.2.2 Execute any necessary contractual arrangements with TAC members, including payment of a reasonable stipend that is consistent with industry standards;

8.2.3 Identify an appropriate meeting site, and make all logistical and contractual arrangements, rotating the meetings around the three partner states;

8.2.4 Prepare all meeting materials, including an annotated agenda, and arrange for key staff members to attend and report to the TAC when appropriate;

8.2.5 Arrange travel and lodging for TAC members and two representatives per state;

8.2.6 Provide meeting facilitation;

8.2.7 Prepare and disseminate detailed meeting notes;

8.2.8 The Contractor may be required to prepare materials and/or make presentations related to particular TAC agenda items. Meeting schedules and agendas will be determined by the states in consultation with the Contractor;

8.2.9 The Contractor will attend a meeting with individual state education leadership (e.g., commissioner, board of education) upon request, but not to exceed one time per state per year. The Contractor will be represented by the project director, senior management, and/or additional staff with responsibility, expertise or experience relevant to the topics for discussion;

8.2.10 The Contractor may be required to attend two meetings each year held by the Smarter Balanced Consortium.

CTB will support NEAC in assembling the NEAC Technical Advisory Committee as requested in the RFP and as outlined in the Meeting Tables that are in Appendix B of this proposal. We will be responsible for the planning, support, and logistical management of these advisory committee meetings. We will work with NEAC to identify and solicit advisory committee members and to secure contracts with them. CTB will provide program management, content development, and/or Research department staff members to support and facilitate meetings, depending on the meeting agendas and required outcomes.

We understand that each state/consortium program requires a very individualized plan for support of its technical advisory committee and other advisory committee meetings. These plans will be included in the NEAC program plan and program schedule. CTB will be responsible for initiating the required consulting agreements with participants, and we will ensure consulting fees are established based on expectations and current practices within the industry. We will be responsible for all expenses associated with the planned advisory committee meetings, including all travel expenses, lodging, meals, and meeting costs.

As with all meetings, CTB will be responsible for working with NEAC to create comprehensive meeting agendas and meeting resource materials that ensure the required meeting outcomes are met. We will be responsible for documenting and distributing meeting notes and decision logs.

Our proposal includes all costs associated with these meetings and meeting participation, as identified in the RFP and outlined in Appendix B: Meeting Tables in this proposal. We will plan to attend, upon request, key state education leadership meetings, and we will provide relevant staff members from across the project and organization in support of these meetings. Additionally, CTB will plan to attend relevant meetings sponsored by the Smarter Balanced Assessment Consortium. Should additional meetings or participants require support, we will negotiate that support with the requesting state, using the general costs provided in the RFP and current federal GSA rates.

8.3 On-line Assessment and Technical Support

The Contractor will provide the hosting site, test administration application, and server and application management services for Smarter Balanced on-line operational test construction (e.g., the adaptive algorithm), assessment delivery, and records retention for both the summative and interim assessments. The Proposer may propose use of an alternative to the Smarter Balanced test delivery platform, but must demonstrate that it meets the technical specifications of the Smarter Balanced platform, is consistent with the interoperability standards adopted by Smarter Balanced, and provides comparable tests using the same functionalities, accessibility tools, and the same or greater test security protections. SBAC has provided the following documents to assist states and prospective vendors in preparing for on-line assessment and technical support:

SBAC Hosting Requirements: See Appendix 4

Industry Questions and Answers Regarding Smarter Balanced Assessments: http://www.smarterapp.org/spec/2014/04/11/specs-QuestionsAndAnswers.html

Smarter Balanced Applications Deployment and Technology Certification Overview: http://www.smarterapp.org/spec/2014/04/11/specs-AppDeploymentTechCertification.html

Proposers will describe how the services, procedures and technologies described in the Hosting Requirements document will be provided. If an alternate delivery platform is proposed, the proposal will provide the following information:

CTB's Test Delivery Platform (TDP) will provide all test administration, delivery, and scoring applications for the Smarter Balanced administrations for NEAC. This delivery system incorporates current industry-standard requirements for customers implementing high-stakes summative assessments. Our platform will:

- Meet all published Smarter Balanced technical specifications for test delivery, including the secure browser, Web applications to manage registration of students for the tests, delivery of tests to students, scoring of test items, integration of item scores into overall test scores, and delivery of scores
- Provide data import/export capabilities that are consistent with all industry- and Smarter Balanced-adopted interoperability standards, including the Smarter Balanced QTI/APIP requirements
- Provide comparable assessments through the use of our ShadowCAT adaptive algorithm
- Provide automated scoring
- Provide industry-standard security protections for assessment content and data
- Develop and prepare the TDP for Smarter Balanced certification


CTB Test Delivery Platform

CTB brings significant expertise in administering online statewide summative assessments to this project. Our next-generation TDP will be fully compatible with all Smarter Balanced requirements for test administration and scoring when those are published. Our adaptive algorithm is standards-based and provides optimal forms selection for each student, surpassing the Smarter Balanced requirements.

TDP Test Delivery System

CTB's TDP offers NEAC four key benefits that form the foundation for a sustainable, state-of-the-art assessment:

1. The system has a very small footprint within schools and has minimal requirements for hardware, software, bandwidth, and technological expertise.

2. The system currently supports many interactive (technology-enhanced) item types. The architecture supports continued innovation by allowing the introduction of new item types within the same framework, without disrupting the existing code base.

3. The system can be accessed by nearly all students because it supports a broad and growing range of configurable accessibility features and accommodations. CTB continues to invest in extending these features to make the system accessible to all students.

4. The system enables complex test administrations to be conducted over time.

CTB Adaptive Algorithm: ShadowCAT

The computer adaptive assessments will be delivered using CTB's optimal test assembly ShadowCAT engine, which has been in use by our customers since July 2013. CTB has been constructing fixed forms using the same underlying optimal test assembly engine since 2011. Expanded in 2013 for adaptive offerings, this engine provides a unique approach to adaptive testing, in which item-level adaptive, multi-stage on-the-fly, multi-stage linear on-the-fly, and linear tests can be transparently configured by the customer and delivered with ease. By using the Shadow Test approach, our engine guarantees that all test blueprint, shared-stimulus, and psychometric constraints are met while optimizing the selection of items to maximize information at specified ability levels throughout the student’s testing experience. No other adaptive algorithm can guarantee that the test will adhere to all constraints for every test taker.
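To make the shadow-test idea concrete, the following sketch is a greatly simplified, hypothetical illustration rather than CTB's production ShadowCAT engine: at each step it assembles a full-length "shadow test" that contains every item already administered and satisfies a per-area blueprint, then administers the most informative free item at the current ability estimate. A production engine would solve the assembly step as a formal optimization (e.g., mixed-integer programming); a greedy approximation stands in for that solver here, and the item parameters are invented for illustration.

```python
# Simplified illustration of shadow-test adaptive item selection.
# The item pool, blueprint, and greedy assembly are hypothetical stand-ins;
# a production engine solves the shadow test as a mixed-integer program.
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    a: float   # discrimination
    b: float   # difficulty
    area: str  # blueprint content area


def info(item: Item, theta: float) -> float:
    """Fisher information of a 2PL item at ability estimate theta."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)


def assemble_shadow_test(pool, administered, theta, blueprint):
    """Greedy approximation: build a full-length test that contains every item
    already administered and meets the per-area blueprint counts, preferring
    the items that are most informative at the current theta estimate."""
    shadow = list(administered)
    counts = {}
    for it in shadow:
        counts[it.area] = counts.get(it.area, 0) + 1
    for area, needed in blueprint.items():
        candidates = sorted(
            (it for it in pool if it.area == area and it not in shadow),
            key=lambda it: info(it, theta),
            reverse=True,
        )
        shadow.extend(candidates[: max(0, needed - counts.get(area, 0))])
    return shadow


def next_item(pool, administered, theta, blueprint):
    """Administer the most informative not-yet-seen item from the shadow test."""
    shadow = assemble_shadow_test(pool, administered, theta, blueprint)
    free = [it for it in shadow if it not in administered]
    return max(free, key=lambda it: info(it, theta))


# Tiny worked example with an invented pool and a two-area blueprint.
pool = [
    Item("M1", 1.0, -0.5, "math"), Item("M2", 1.2, 0.0, "math"),
    Item("M3", 0.8, 0.7, "math"), Item("R1", 1.1, -0.2, "reading"),
    Item("R2", 0.9, 0.4, "reading"),
]
blueprint = {"math": 2, "reading": 1}
print(next_item(pool, administered=[], theta=0.0, blueprint=blueprint).item_id)
```

Because the shadow test is re-assembled after every response, blueprint coverage is enforced continuously rather than repaired at the end of the test, which is the property the Shadow Test approach is designed to guarantee.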

Our ShadowCAT optimal test assembly engine was designed under the leadership of Chief Research Scientist Dr. Wim van der Linden. Dr. van der Linden is an internationally renowned expert in adaptive testing, with seminal related publications in numerous peer-reviewed international journals and books. He is co-editor of three published volumes: Computerized Adaptive Testing: Theory and Practice (Boston: Kluwer, 2000; with C. A. W. Glas [1]) and its sequel, Elements of Adaptive Testing (New York: Springer, 2010; with C. A. W. Glas [2]), and Handbook of Modern Item Response Theory (New York: Springer, 1997; with R. K. Hambleton [3]). He is also the author of Linear Models for Optimal Test Design, published by Springer (2005).

8.3.1 Requirements for the use of any software (and supporting devices) should be clearly documented and explained. The general requirements for the use of our TDP are documented and explained in Table 1.

Table 1: Additional Requirements for the TDP

- Screen Size: 10-inch class or larger, with 1024 x 768 display resolution.
- Headphones: Available to students for use during the English language arts test and for students who require text-to-speech features on the mathematics test.
- Security: The device must have the administrative tools and capabilities to temporarily disable features, functionalities, and applications that could present a security risk during test administration.
- Keyboards: External keyboards are required in all cases unless specified differently by a student’s Individualized Education Program (IEP) or 504 plan. Any form of external keyboard that disables the on-screen virtual keyboard is acceptable, including mechanical, manual, plug-and-play, and wireless (e.g., Bluetooth, RF, IR) keyboards. The intent of this specification is to ensure the required display area is available to allow students to read multiple sources of complex item text and respond to source evidence for analytical purposes. While wireless keyboards are permissible, districts should be aware that high-density deployments of wireless keyboards and mice might interfere with each other or with the wireless network; therefore, users should test the room configuration before the examination date and consider wired alternatives.
- Pointing Devices: A pointing device must be included. This can be a mouse, touch screen, touchpad, or other pointing device with which the student is familiar.
- Form Factors: No restriction as long as the device meets the other stated requirements. Acceptable form factors include desktops, laptops, netbooks, virtual desktops and thin clients, tablets (iPad, Windows, Chromebook, and Android), and hybrid laptop/tablets.
- Networks: Must connect to the Internet with a minimum of 20 Kbps available for each student who is testing simultaneously. Local Web proxy caching servers are not recommended.
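For planning purposes, the 20 Kbps-per-student minimum above translates into a simple sizing calculation. The sketch below is a hedged illustration; the lab sizes are hypothetical examples, not NEAC requirements, and actual needs also depend on other traffic sharing the connection.

```python
# Rough network sizing from the 20 Kbps-per-student minimum in Table 1.
# The lab sizes below are hypothetical examples for illustration only.
KBPS_PER_STUDENT = 20

def minimum_bandwidth_mbps(concurrent_students: int) -> float:
    """Minimum available bandwidth, in Mbps, for a given number of
    students testing at the same time."""
    return concurrent_students * KBPS_PER_STUDENT / 1000

for students in (30, 150, 600):
    print(f"{students:>3} concurrent students -> "
          f"{minimum_bandwidth_mbps(students):.1f} Mbps minimum")
# 30 -> 0.6 Mbps, 150 -> 3.0 Mbps, 600 -> 12.0 Mbps
```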

 

1. van der Linden, W. J., & Glas, C. A. W. (2000). Computerized Adaptive Testing: Theory and Practice. Boston: Kluwer Academic Publishers.

2. van der Linden, W. J., & Glas, C. A. W. (2010). Elements of Adaptive Testing. New York: Springer.

3. van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of Modern Item Response Theory. New York: Springer.


8.3.2 The minimum and preferred technology infrastructure needed to support online testing should be documented and explained. CTB’s TDP is compatible with many operating systems and devices. Please see Table 2 for minimum requirements for the computers schools already own as well as recommendations for future hardware purchases.

Table 2: TDP Requirements and Recommendations by Operating System

Windows
- Minimum requirements for current computers: Windows XP (Service Pack 3); Pentium 233 MHz processor; 128 MB RAM; 52 MB free hard drive space
- Recommended minimum for future purchases: Windows 7 or later; 1 GHz processor; 1 GB RAM; 80 GB hard drive or at least 1 GB of hard drive space available

Mac OS X
- Minimum requirements for current computers: Mac OS X 10.4.4; Macintosh computer with Intel x86 or PowerPC G3 (300 MHz) processor; 256 MB RAM; 200 MB free hard drive space
- Recommended minimum for future purchases: Mac OS X 10.7 or later; 1 GHz processor; 1 GB RAM; 80 GB hard drive or at least 1 GB of hard drive space available

Linux
- Minimum requirements for current computers: Linux (Ubuntu 9-10, Fedora 6); Pentium II or AMD K6-III 233 MHz processor; 64 MB RAM; 52 MB free hard drive space
- Recommended minimum for future purchases: Linux (Ubuntu 11.10, Fedora 16); 1 GHz processor; 1 GB RAM; 80 GB hard drive or at least 1 GB of hard drive space available

iOS
- Minimum requirements for current devices: iPad 2 running iOS 6
- Recommended minimum for future purchases: iPad 3 or later running iOS 6

Android
- Minimum and recommended: Android-based tablets running Android 4.0 or later

Windows tablets
- Minimum and recommended: Windows-based tablets running Windows 8 or later (excluding Windows RT)

Chrome
- Minimum and recommended: Chromebooks running Chrome OS (rolling release)

8.3.3 The technical support documents should include information about suggested computer lab configurations. The CTB TDP is designed to maximize both scalability and availability for a multitude of test administrations. The platform is cloud-based, allowing for increased capacity and ease of duplication of an environment, as needed. A single environment comprises an administrative system, a test delivery client, and the necessary supporting infrastructure. Each environment is a self-contained assessment system that allows, for example, rostering of students, scheduling of content, test delivery, and reporting.
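To make the idea of a self-contained environment more concrete, the sketch below models an environment's components as a simple configuration object that can be duplicated for a new region. The field names, URLs, and regions are illustrative assumptions, not the platform's actual deployment descriptors.

```python
# Illustrative model of a self-contained TDP environment and its duplication.
# Field names, URLs, and regions are hypothetical, for explanation only.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    name: str
    region: str
    admin_system_url: str     # rostering, scheduling of content, reporting
    delivery_client_url: str  # student-facing test delivery
    app_servers: int          # supporting infrastructure sizing
    cache_nodes: int


def duplicate_environment(env: Environment, name: str, region: str) -> Environment:
    """Clone an existing environment definition into another region, e.g., to
    add capacity quickly or to stand up a disaster-recovery copy."""
    return replace(env, name=name, region=region)


primary = Environment(
    name="neac-primary",
    region="us-east",
    admin_system_url="https://admin.example.org",
    delivery_client_url="https://testing.example.org",
    app_servers=8,
    cache_nodes=2,
)
recovery = duplicate_environment(primary, "neac-dr", "us-west")
print(recovery.region)  # -> us-west
```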

The CTB TDP has been used successfully in statewide, large-scale contracts for both formative and summative assessments, including those in Oklahoma, West Virginia, Hawaii, and the Department of Defense Education Activity (DoDEA). Further, the Oklahoma End-of-Instruction (EOI) assessment was successfully tested for up to 100,000 concurrent users. The platform is currently undergoing a project, which will be completed in June 2014, to formally test the platform to a capacity of 300,000 concurrent users for a single environment.

The platform leverages cloud-based services such as Elastic Load Balancers (ELB) and ElastiCache to maximize scalability and to reduce the response time for increased capacity. This design allows new environments to be created very quickly, as necessary. Figure 2 illustrates how the ELB will work to handle NEAC’s assessment load.

Figure 2: Cloud Load-Balancing for the TDP

Our cloud-based architecture leverages multiple-availability zones and supports regional disaster recovery. Ideally, the primary environment would be placed based on latency to the customer site, with the disaster recovery environment created in a region selected to maximize robustness in the case of a failover situation.

Online Implementation Process for the Computer Lab Configuration

CTB's three-step checkpoint process is managed by our Online Implementation Manager and is designed to ensure that all online testing sites are ready for the test administration before the testing window opens. CTB provides guidance, materials, and technical support directly to districts and their technology coordinators. Working with a district's technical staff, our team will monitor site readiness progress and make sure that test locations are achieving the technology and configuration objectives needed to guarantee a problem-free live test administration. Our Online Implementation team will be supported by field engineers who will conduct site visits and support technology coordinators with solutions to issues related to hardware and network infrastructure.

The Online Implementation checkpoints for each online testing administration are listed below and shown in Figure 3.

Checkpoint 1: Readiness and Technology Survey (RTS) Training
Checkpoint 2: Site Readiness
Checkpoint 3: Administration Setup


Figure 3: Three Checkpoints to Ensure All Sites Are Ready for Online Testing

Checkpoint 1

The first checkpoint focuses on an initial check of and feedback about a site’s readiness to test online using our Readiness and Technology Survey (RTS) system. During Checkpoint 1, District Test Coordinators verify that a District Technology Coordinator has been identified in RTS and that both Coordinators are able to participate in the site readiness training sessions.

Checkpoint 2

The second checkpoint focuses on Site Readiness. Tasks such as installation of the Test Delivery Client software and content for stress testing, as well as completion of a local stress test, are part of Checkpoint 2. Once the test delivery software is installed, schools prepare for their online stress test, a short practice test used to validate a site's readiness and provide final certification that local networks and workstations are ready for testing. The goal of this statewide stress test is to exercise all machines that will be used for online testing across the state, using practice tests to ensure all platforms are ready for the full-scale operational test.

Checkpoint 3

The third checkpoint focuses on beginning the final online testing preparations. Activities scheduled during this phase are Test Administration site training, District Test Coordinator training, the practice tests, and the final workstation tests completed as part of the practice test.

At the conclusion of the final workstation test in Checkpoint 3, sites are asked to access the Readiness and Technology Survey one final time to indicate that the site has completed Checkpoint 3. Once the Online Implementation checkpoints are completed, all of the online testing sites are ready to participate in the online assessment with confidence.

Site Readiness Assessment Tool

CTB will provide assistance to NEAC member agencies to determine each school’s system readiness for online testing, using our RTS application tool. RTS is a Web-based application that allows each district or school to input its system’s specifics (hardware configurations, operating systems, and number of workstations) and network information directly through a Web interface. These data are used to assess the compatibility of the systems in each school with the platform's Test Delivery Client as well as to evaluate the adequacy of the network capacity at each school. The results of this analysis give the member states key information about each school's readiness for computer-based testing.

To minimize the effort required to enter the information, much of the data entry is accomplished by drop-down selections. Additionally, RTS will accept an import file containing an inventory of systems to further minimize the effort required to capture the data.

RTS will evaluate the information provided about the systems against the benchmark configuration and produce a score card report indicating the system’s current level of readiness, along with a gap analysis between the parameters entered and the benchmark parameters. The score card will indicate an overall level of readiness that is color-coded red, yellow, or green, as well as a detailed level of readiness by system area, such as hardware, LAN, WAN, and load capacity. CTB will work with schools to interpret the score cards and plan the next steps for meeting full site readiness for assessment administrations.
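The gap-analysis idea can be sketched as follows. The benchmark values, category names, and color thresholds below are placeholders for illustration, not the actual RTS benchmark configuration.

```python
# Simplified readiness score card: compare reported site parameters against
# a benchmark and roll the gaps up into a red/yellow/green rating.
# Benchmark values, categories, and thresholds are illustrative placeholders.
BENCHMARK = {
    "workstation_ram_mb": 1024,
    "wan_kbps_per_student": 20,
    "workstations_available": 30,
}

def score_card(site: dict) -> dict:
    """Return per-category detail plus an overall color-coded rating."""
    detail = {}
    for category, required in BENCHMARK.items():
        reported = site.get(category, 0)
        detail[category] = {
            "reported": reported,
            "required": required,
            "meets": reported >= required,
        }
    gaps = sum(1 for entry in detail.values() if not entry["meets"])
    overall = "green" if gaps == 0 else "yellow" if gaps == 1 else "red"
    return {"overall": overall, "detail": detail}

# Example site: enough RAM and seats, but short on per-student bandwidth.
report = score_card({
    "workstation_ram_mb": 2048,
    "wan_kbps_per_student": 12,
    "workstations_available": 32,
})
print(report["overall"])  # -> yellow
```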

Shown below are examples of the RTS screens in which district or school staff will input and edit district- and school-level information.

Figure 4: Online RTS Tool District Home Panel


Figure 5: Online RTS Tool School Workstation Panel


Figure 6: Online RTS Tool School Network Panel

CTB also provides an enhanced network utility tool that can be used to verify connectivity between school networks and CTB servers. This tool, which is shown in Figure 7, also offers the ability to actively simulate a specified population of students concurrently taking a test under actual network load conditions. The simulator can be used by network personnel at various times of the day when testing will actually take place to ensure that test scheduling and peak loads can be adjusted to avoid interference with optimal testing conditions.


Figure 7: Network Connectivity Tool and Load Simulator
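A minimal version of the load-simulation idea is sketched below: it issues a configurable number of simultaneous requests against a placeholder endpoint and summarizes the response times. The URL, concurrency figure, and timing loop are assumptions for illustration; the actual utility measures real school network conditions and simulated test traffic.

```python
# Minimal concurrency simulation: issue N simultaneous requests against a
# placeholder endpoint and summarize the response times. The URL and the
# student count are illustrative assumptions, not the real utility's behavior.
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "https://example.org/readiness-check"  # placeholder URL

def timed_request(_: int) -> float:
    """Time a single request; a real tool would also record failures."""
    start = time.perf_counter()
    try:
        urllib.request.urlopen(ENDPOINT, timeout=10).read()
    except OSError:
        pass
    return time.perf_counter() - start

def simulate(concurrent_students: int = 30) -> None:
    """Launch all requests at once, as if a lab of students started together."""
    with ThreadPoolExecutor(max_workers=concurrent_students) as pool:
        timings = list(pool.map(timed_request, range(concurrent_students)))
    print(f"median response: {statistics.median(timings):.2f}s, "
          f"slowest: {max(timings):.2f}s")

if __name__ == "__main__":
    simulate()
```

Running such a check at the times of day when testing will actually occur is what lets schedules and peak loads be adjusted before the operational window, as described above.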

CTB believes that our unique experience performing these readiness evaluations for our current clients, together with the capabilities of our existing RTS tool, positions us to provide quality service that meets each state agency’s needs and gives each school valuable assessment and direction in preparation for its final implementation of online assessments.

8.3.4 Information on computer-based assistive technologies should be provided to the client so that the client can determine which they may allow; data on use of these technologies should be collected. CTB has a long history of providing appropriate support for students with disabilities and for English language learners (ELL). CTB’s commitment to provide fair access for all students through the Test Delivery Platform (TDP) goes beyond 100 percent keyboard accessibility. We have been committed to accessibility for all students ever since we conceptualized and designed our test delivery system and its items. We continue to make accessibility the top priority as we build more innovations around student testing and learning.

The CTB TDP is developed based on standards such as HTML5 and CSS3 to allow the delivery of interactive assessments that are compatible with the broadest range of devices and platforms. The platform supports accommodations for summative assessments along with general support for devices that are designed to leverage the accessibility features built into modern browsers. As an example, all content is authored with alternative text for all non-text elements in support of subpart 1194.22 of Section 508; this allows standard text-to-speech readers and refreshable Braille devices to function without additional modification.

The CTB TDP offers a robust array of accessibility features, as illustrated in Table 3. Many of these features are already in operational use in states where CTB is providing online testing. We offer a common system that provides all accommodations as well as unaccommodated tests. Students who need accommodations are not forced to use a “special” computer outfitted with a separate test delivery system.


Table 3 identifies those accessibility features that are operational in certain states or through the system for the Smarter Balanced tests.

Table 3: Accommodations and Accessibility Supports
(Each entry lists the accessibility feature, its description, and the CTB system capability status.)

- Increased font size: The number of levels (generally 5 levels) and rate of increase (generally 1.25x the previous level) are configurable. Status: Under development for Spring 2015.
- Foreground and background color: Any foreground and background color can be supported. Status: Under development for Spring 2015.
- Calculation devices: Completely configurable calculators ranging from simple four-function calculators to scientific and graphic calculators. Status: Currently available.
- Protractor, ruler: Protractors and rulers are available at the item level so that scaling is always synchronized with the item content. Status: Currently available.
- Additional time: The platform supports timed testing along with the ability to increase the time if needed. Status: Currently available.
- Breaks: The CTB TDC supports the ability to create sub-tests. This feature allows a single test to be divided into units, allowing for a "break" in testing after completing a sub-test. Status: Currently available.
- Text-to-speech (directions, passages, items): Computer reads text and graphics aloud on directions, passages, and items. Status: Currently available.
- Text-to-speech (graphic description): Computer reads graphics and tables aloud. Status: Currently available.
- Color overlay: Any color can be laid on the screen. This persists throughout the test. Status: Currently available.
- Recorded audio: Efficient delivery of recorded audio. While we can store MP3, complicated patent issues have led us to convert MP3 to OGG, an open-source compression. We are able to deliver voice audio using only about 10 Kbps of bandwidth. Status: Currently available.
- Reverse contrast: Background turns to black, while text turns to white. Status: Currently available.
- Line reader: This feature will allow a student to track the line he or she is reading. Status: Under development for Spring 2015.
- Highlighter: Student can select any text to highlight. This persists throughout the test. Status: Currently available.
- Answer eliminator/highlighter: Student can eliminate any multiple-choice option, whether it is in text or a graphic. This persists throughout the test. Status: Under development for Spring 2015.
- Refreshable Braille/tactile with external embosser printer: Items can be rendered to desktop embossers that can integrate Braille and tactile graphics. The items will simultaneously render on a reader-accessible screen, and the student will be able to navigate to response spaces to provide answers. Status: Currently available.
- Magnification: Students can selectively magnify areas of the screen as needed. Status: Currently available.
- Masking: Allows the masking of extraneous information on the screen. Status: Under development for Spring 2015.
- Speech-to-text: Speech will be converted to text and then saved in the database. Status: Under research; we are actively investigating this feature.


- Sign language (directions, passages, items): These consist of recorded videos on sign language. Avatars are not recommended by hearing-impaired experts since they do not translate well to American Sign Language. Status: Currently available.
- Translations: The CTB TDC is localizable for any language and character set. Assessment content can also be developed in any language as necessary. Status: Currently available.
- Glossaries and dictionaries: These enable content developers to associate additional content with words or phrases. The content can be of multiple types, and the content shown to a student can be controlled by his or her personal profile. Status: Under development for Spring 2015.
- Alternate language glossaries and dictionaries: These enable content developers to associate alternate-language content with words or phrases. The content can be of multiple types, and the content shown to a student can be controlled by his or her personal profile. Status: Under development for Spring 2015.
- Test pauses and restarts: An attention accessibility feature; the test can be paused at any time and restarted and taken over many days. So that security is not compromised, visibility of past items is not allowed when the test has been paused more than a specified period of time. Status: Currently available.
- Mark-and-return (can be used for item review): An attention accessibility feature; students can flag an item so they can review it. Status: Currently available.
- Item notes: An attention accessibility feature; item notes allow students to jot down ideas about items or passages. Status: Not currently available.
- Review test: Test can be reviewed before ending it. Status: Currently available.
- Area boundaries: An agility accessibility feature; area boundaries for mouse-clicking multiple-choice options mean students can click anywhere on the selected response text or button. Status: Currently available.
- Language: Any language that is necessary can be supported. Status: Currently available.
- Help section: A reference feature; the Help section explains how the system and its tools work. The Help section is available in the administrative interface, providing easy access to teachers. Status: Currently available.
- Practice tests and tutorials: A reference feature; practice tests and tutorials familiarize students with the online testing system. Status: Currently available.
- Performance report: A reference feature; a performance report is available at the end of the test for the student. Status: Currently available.
- Adaptive test Braille/tactile graphics: An adaptive test can be administered in Braille through embossers or refreshable Braille displays. Status: Currently available.

Universal usability is provided by:

- Applying universal design principles wherever possible in both item development and design of user interfaces.
- Supporting Accessibility by Design (accessible design): CTB provides special features for items and functionality for item delivery to students who otherwise cannot be accommodated through universal design.

Delivery Choices

CTB proposes to offer a broad array of features. Furthermore, as we continue to innovate to extend the accessibility of the test, improve measurement, and make testing more engaging, features will proliferate. This presents a bit of a conundrum: How will students know what features are available? Students only test once or a few times a year; therefore, they will generally not become “expert” users of the software. Making the appropriate tools meaningfully available is best accomplished by offering each student the set of tools he or she needs, without the distraction created by tools that are not needed. Each tool or embedded support provided can be configured to be:

- Available to all students
- Made available to students by one or more adult roles in the system (e.g., state, district, or school test coordinator; test administrator)
- Offered to all students but configured for each student by adult users

The following table lists accessibility features and how they can be configured.

Table 4: Access Configurations for a Sample of Embedded Supports and Accommodations
(Each entry lists the accessibility feature, an example configuration choice, and the state's rationale behind that configuration choice.)

- Color choice. Example configuration: Assigned to students at testing time by the proctor. Rationale: Most clients recognize that the proctor is the best person to determine the appropriate color since the student is with him or her in the room. If this tool is made available to all students, it would provide a distraction.
- Increased font size. Example configuration: Default font size (assigned to students in advance). Rationale: A student’s need for this feature is part of his or her school record and, therefore, can be determined in advance.
- Text-to-speech (directions, passages, items). Example configuration: Assigned to students in advance, or assigned to students at testing time by the proctor. Rationale: In some states, a student’s need for this feature is part of his or her school record and, therefore, can be determined in advance. In others, it is available to all students but requires a proctor to turn it on to ensure that the student has a headset and will not disturb anyone else.
- Magnification. Example configuration: Available to all students. Rationale: This feature does not provide a distraction, and the student may need it on different screens.
- Answer eliminator. Example configuration: Available to all students. Rationale: This feature does not provide a distraction, and the student may need it for answering items.
- Refreshable Braille. Example configuration: Assigned to students in advance. Rationale: A student’s need for this feature is part of his or her school record and, therefore, can be determined in advance.

Some of the accessibility features require special attention and preparation to make them effective and meaningful for the item and the student taking the test. These features were designed to serve their primary purpose in as meaningful a way as possible by pulling in the appropriate resources. In addition, each feature is effective only if it interacts seamlessly with the items with which it is presented as well as other features. Table 5 lists these features and their respective preparation requirements.

Table 5: Accessibility Features and Preparation

- Color choice: Colors that will be available on the system require that they not conflict with the item graphics and text. For example, if blue is one of the background colors, the same blue cannot be used on item graphics. Thus, items need to be reviewed and may need tweaks, if color conflicts arise.
- Text-to-Braille and text-to-speech: Though the computer can automatically speak text rendered on the screen, special tags need to be created on graphics, directions, passages, and items. These tags need to comply with the read-aloud principles determined by the Braille Authority of North America (BANA).


- Large-Print and magnification: Graphics need to be legible in Large-Print and at all available magnification levels. Ideally, graphics would be vector-based so they magnify in a clear and consistent way; however, if vector-based graphics are not possible, some graphics may need to be enhanced depending on their legibility at specific zoom levels.
- Sign language: Recorded sign language translations need to be created in advance and undergo review levels (American Sign Language or Exact English).
- Audio files: Audio files need to be created or recorded in advance.
- Language: Content needs to be properly translated in supported languages other than English.

* System features will be available for Smarter Balanced items that contain appropriate metadata, tagging, and/or files.

Our TDP supports American Sign Language (ASL) by offering recorded videos of translators signing rather than by using avatars. Working with representatives of the deaf community, we learned that ASL is not simply a representation of English; rather, it is a unique language. Avatars are not capable of translating from English to ASL and provide, instead, a word-for-word translation. To speakers of ASL, this seems as unnatural as a word-for-word translation from English to any other language. Furthermore, ASL is more syntactically dense than the manual gestures alone convey: cues such as facial expression and lip movement form part of the language. Avatars neglect this aspect of communication.

8.3.5 A practice and training test should be provided to allow students to become familiar with keyboarding and navigation techniques and tools that will be used during the live assessment. CTB has extensive experience with the Smarter Balanced item bank, having developed all of the Smarter Balanced Pilot Test and Field Test content under previous and current contracts with Smarter Balanced. Our content development team will provide a practice test by accessing an existing test from Smarter Balanced. As an option, we will select items and/or performance tasks for a unique practice test that presents all item types and content representative of the Smarter Balanced blueprint, giving students practice with the structure and format of the operational assessment. Our content team will review all items and suggest changes to pre-existing tests, if necessary. All content activities will be completed so the practice test can be reviewed and deployed in each grade/content area in sufficient time for classroom use prior to testing.

CTB will also provide access to a pre-developed training test using Smarter Balanced training items that will allow students to become familiar with the testing system, tools, and accommodations. Our Production lead will ensure that training test content is loaded into CTB’s TDP and has been routed through standard quality assurance checks prior to its release.

8.3.6 Procedures for uploading student demographic data in the online assessment system, including any necessary accessibility tools and supports, should be provided, as well as instructions and procedures for modification of enrollment data, where permitted by the client. Effective pre-identification, student rostering, and materials ordering systems are essential components of every large-scale assessment program. These systems must provide a secure, efficient, and user-friendly interface for a variety of users at the appropriate security levels. Once configured and deployed, this system will support the efficient administration of each of the assessments. The Online Precode System works with the Student Information Systems (SIS) to provide student demographic data; teacher, school, district, and classroom data; and any other data elements required to fulfill the specific needs of the assessment program. Our Online Precode System provides a secure, efficient, and user-friendly interface that ensures that the collection of student demographic data proceeds smoothly and with the highest possible quality control.

The Online Precode System was built to save state offices and/or district staff time in transferring data back and forth to CTB to create clean, usable data for precode documents. The integration pulls student population data at scheduled times or intervals. These data are available immediately in the precode system for registering students for online test administrations or for precoding printed documents for students who need hardcopy test materials.
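As a simple illustration of the kind of automated quality control applied to incoming demographic data, the sketch below validates a hypothetical precode extract before registration; the field names, grade list, and rules are assumptions for illustration only, not the actual NEAC file layout.

```python
import csv
import io

# Hypothetical field names for illustration; the actual precode layout would be
# defined jointly with NEAC during interface specification.
REQUIRED_FIELDS = ["state_student_id", "last_name", "first_name", "grade",
                   "district_code", "school_code"]

def validate_precode_records(rows):
    """Return (clean_records, errors) for a batch of precode rows."""
    clean, errors, seen_ids = [], [], set()
    for line_no, row in enumerate(rows, start=2):   # header assumed on line 1
        missing = [f for f in REQUIRED_FIELDS if not row.get(f, "").strip()]
        if missing:
            errors.append(f"line {line_no}: missing {', '.join(missing)}")
            continue
        if row["grade"] not in {"03", "04", "05", "06", "07", "08", "11"}:
            errors.append(f"line {line_no}: grade {row['grade']} outside 3-8, 11")
            continue
        if row["state_student_id"] in seen_ids:
            errors.append(f"line {line_no}: duplicate ID {row['state_student_id']}")
            continue
        seen_ids.add(row["state_student_id"])
        clean.append(row)
    return clean, errors

sample = io.StringIO(
    "state_student_id,last_name,first_name,grade,district_code,school_code\n"
    "1000001,Smith,Ana,04,001,0101\n"
    "1000001,Smith,Ana,04,001,0101\n"
)
records, issues = validate_precode_records(csv.DictReader(sample))
print(len(records), issues)
```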

Administrative users will be able to:

• View and edit student demographic information entered as part of the pre-identification process
• Hand-enter student records prior to or at the time of testing
• Maintain both student-specific and test-specific data fields
• Complete an electronic Group Information Sheet to determine how student results will be returned to the school district (i.e., by class, school, and/or district)

The Online Precode System includes the ability to generate precode rosters that list students registered for testing by administration, with the option to sort by school district, building, grade level, classroom, and teacher. As mentioned, as the state loads roster data, the roster data will propagate through the system to the assessment delivery platform. Depending on the administration configuration method (implicit or explicit), the testing platform will receive the precode roster data and register students for upcoming test administrations.

Within the TDP, users with the correct privileges can view and manage students at the appropriate levels. Students can be entered by users, as necessary, prior to or at the time of testing, as shown in Figure 8.

Figure 8: Teacher Manage Students

Assignments are created by a user at a specified level (district, school, or class) that determines both the enrollment population for the assignment and the level of reporting.

The CTB TDP can assign tests to students at all levels. System users can select accommodations appropriate to the assignment and the population for the assignments. The population can be a group of students at the current level, such as all students in a district, school, or class, or it can be a custom group of students selected according to a set of user-defined rules through the creation of a Tag Group. The Tag Group feature lets students be selected based on arbitrary criteria, including student meta-data such as grade or customer-defined attributes.
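To illustrate the kind of rule-driven selection described above, the sketch below filters a small set of student records against user-defined rules; the rule schema and field names are hypothetical and are not the TDP's actual Tag Group format.

```python
# Illustrative sketch of rule-based student selection in the spirit of the
# Tag Group feature; the rule schema shown here is hypothetical.
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "not_equals": lambda actual, expected: actual != expected,
    "in": lambda actual, expected: actual in expected,
}

def matches(student, rules):
    """A student matches a Tag Group when every rule is satisfied."""
    return all(OPERATORS[op](student.get(field), value) for field, op, value in rules)

students = [
    {"id": "S1", "grade": "05", "school": "0101", "ell": "Y"},
    {"id": "S2", "grade": "05", "school": "0102", "ell": "N"},
    {"id": "S3", "grade": "06", "school": "0101", "ell": "Y"},
]
rules = [("grade", "equals", "05"), ("ell", "equals", "Y")]
tag_group = [s["id"] for s in students if matches(s, rules)]
print(tag_group)   # ['S1']
```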

Assignment of spiraled forms is currently available only for summative assessments, but the ability to select a spiraled assignment of a set of test forms will be added to the TDP in the future. Please refer to our implementation plan, presented in Table 6.

Table 6: TDP Implementation Plan for the Smarter Balanced Certification

Category | Feature | Status | Delivery
Accommodations | Highlighter | Currently Available
Accommodations | Flagging Items for Review | Currently Available
Accommodations | Indicators of Answered/Unanswered | Currently Available
Accommodations | Visually eliminate options | Under Development | Dec 2014
Accommodations | Online Text/Graphics Notes | Under Development | Dec 2014
Administration | Scheduling of tests for students at all levels | Currently Available
Administration | Assignment of specific forms and accommodations | Currently Available
Administration | Ability to view aggregate assignment information at all levels | Currently Available
Administration | Ability to create unique login credentials per summative test session | Currently Available
Administration | Ability to create unique login credentials per formative session | Currently Available
Administration | Formative Spiral Form Assignment | Not Available
Import/Export | Customer QTI/APIP Content Import/Export | Under Development | Dec 2014
Item Types | Connection | Currently Available
Item Types | Drop Down | Currently Available
Item Types | Select and Order/Classification/Tiling | Currently Available
Item Types | Multiple Line Then Select | Currently Available
Item Types | Single Circle | Currently Available
Item Types | Single Parabola | Currently Available
Item Types | Placing Points | Currently Available
Item Types | Single Angle | Under Development | Dec 2014
Item Types | Reorder Text | Under Development | Dec 2014
Item Types | Select and Change Text | Under Development | Dec 2014
Item Types | Select Objects | Under Development | Dec 2014
Item Types | Vertex-based Quadrilateral | Under Development | Dec 2014
Item Types | Vertex-based Triangle | Under Development | Dec 2014
Item Types | Single Ray | Under Development | Dec 2014
Item Types | Selecting Points and Ranges on Number Lines | Under Development | Dec 2014
Item Types | Partition Object Then Select | Under Development | Dec 2014
Item Types | Partition Line Then Place Points | Under Development | Dec 2014
Item Types | Select Text | Under Development | Dec 2014
Item Types | Select Defined Partitions | Under Development | Dec 2014
Item Types | Object Transform | Under Development | Dec 2014
Lockdown Browser | Windows | Currently Available
Lockdown Browser | Mac | Under Development | Fall 2014
Lockdown Browser | Linux | Under Development | Dec 2014
Lockdown Browser | iPad | Under Development | Dec 2014
Lockdown Browser | Android | Under Development | Dec 2014
Lockdown Browser | Chromebook | Under Development | Dec 2014
Test Delivery Client | Independent scrolling for Passages and Response Area | Currently Available
Test Delivery Client | Navigation | Currently Available
Test Delivery Client | Device Appropriate Content Display: Windows | Currently Available
Test Delivery Client | Device Appropriate Content Display: Mac | Currently Available
Test Delivery Client | Device Appropriate Content Display: Linux | Currently Available
Test Delivery Client | Device Appropriate Content Display: Chromebook | Currently Available
Test Delivery Client | Keyboard Support | Currently Available
Test Delivery Client | Ability to view text and graphics simultaneously | Currently Available
Test Delivery Client | Online Availability of Manipulatives | Currently Available
Test Delivery Client | Indication of Student Name | Currently Available
Test Delivery Client | Identification of Student in Case of Restart | Currently Available
Test Delivery Client | Device Appropriate Content Display: iPad | Under Development | Fall 2014
Test Delivery Client | Device Appropriate Content Display: Android | Under Development | Fall 2014

8.3.7 Procedures for maintaining the security of the online testing environment should be documented. CTB's TDP provides advanced security protocols and techniques to protect both test content and student data. The test ticket contains a disposable username, password, and test access code for each student. The combination of those three credentials places a student into a test session.

For proctors, there is also a summary test ticket that shows the test access codes. Note that if a test has multiple sections, there is an access code for each sub-test/section; this is a configurable option in the assignment creation process.

Additionally, the test assignment process allows users to create 'windows' during which assessments are available. These windows can be adjusted to the minute; if a student logs in outside the window, he or she receives an error message stating that the assessment is not available at this time. This prevents students and administrators from accessing the content of the assessment outside the school day. For added security, the content of the assessment is not available to administrators within the platform at any time. Users may not print out assessments. All secure documents must be ordered and shipped from CTB.
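The minute-level window check can be pictured with the short sketch below; the field names and message text are illustrative assumptions rather than the platform's actual implementation.

```python
from datetime import datetime

# Minimal sketch of a minute-level availability window check; the wording of
# the error message and the window values are illustrative only.
def window_status(now, window_open, window_close):
    """Return an access decision for a login attempt at time `now`."""
    if window_open <= now <= window_close:
        return "granted"
    return "The assessment is not available at this time."

window_open = datetime(2015, 3, 16, 8, 0)
window_close = datetime(2015, 3, 16, 15, 30)
print(window_status(datetime(2015, 3, 16, 7, 59), window_open, window_close))
print(window_status(datetime(2015, 3, 16, 9, 15), window_open, window_close))
```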

No content meta-data, including but not limited to the correct answers for items, are stored on the client workstation. All scoring is performed on the TDP servers.

Security Plan and Server Robustness

The online testing system and its sub-applications adhere to strict, industry-standard security procedures. Best practices are also followed for physical access, intrusion protection, and virus protection. In addition, each element of the system is redundant, minimizing the risk of failure.

System Security

All CTB systems meet strict standards for security of their data and the systems themselves.

Access to production servers is strictly limited to a small number of network and senior software engineers. Each engineer, and his or her level of access, must be approved by McGraw-Hill Education's (MHE) Corporate Infrastructure Security team and MHE's infrastructure team. Access to the servers requires a password of at least ten characters that incorporates three of the following four character types: uppercase letters, lowercase letters, numerals, and symbols. After access rights have been granted, any changes to the production servers must be approved.
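As an illustration of the stated password policy (at least ten characters drawn from at least three of four character classes), the sketch below shows one possible check; the exact class definitions enforced on the production servers are an assumption here.

```python
import string

# Sketch of the stated server password policy: at least ten characters drawn
# from at least three of four character classes. The class definitions below
# are an assumption about how the policy is enforced.
def meets_policy(password):
    classes = [
        any(c.isupper() for c in password),
        any(c.islower() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    return len(password) >= 10 and sum(classes) >= 3

print(meets_policy("Tr0ub4dor&x"))   # True: length 11, four classes present
print(meets_policy("alllowercase"))  # False: only one character class
```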

Security of test items and student information is maintained at all times, with security procedures acting at three levels:

1. Physical security, preventing access to the machines on which data reside or are processed
2. Network security, including protection of CTB networks from infiltration and secure transmission of data to CTB networks and others
3. Software security, ensuring that only authorized users access information on CTB systems and that their access is limited to only the information that they are authorized to view

Below, we describe key security procedures that will protect NEAC member jurisdictions' items; ensure confidentiality and privacy; and comply with state laws, the Family Educational Rights and Privacy Act (FERPA), and other federal laws.

Physical Security

For the test administration software, data will reside on servers in the cloud. CTB's hosting provider maintains 24-hour surveillance of both the interior and exterior of its facilities. All access is keycard controlled, and sensitive areas require biometric scanning. McGraw-Hill Education hosts all other servers for CTB at its facility in East Windsor, New Jersey, with a secondary data center in Secaucus, New Jersey.

Access credentials are assigned only for authorized data center personnel, and only they have access to the data centers. Visitors’ identities are verified, and visitors are escorted at all times while in the facility.

All data center employees undergo multiple background security checks before they are hired.

Secure data will be processed at CTB facilities and will be accessed from CTB machines. As noted, access to facilities is keycard controlled. Visitors must sign in and be escorted while in all data centers. All servers are in a secure, climate-controlled location with access codes required for entry. Access to servers is limited to network engineers, all of whom, like all employees, have undergone rigorous background checks.

Staff members at both McGraw-Hill Education and our cloud service provider receive formal training in security procedures to ensure that they know and implement the procedures properly.

CTB protects data from accidental loss through redundant storage, backup procedures, and secure off-site storage.

Network Security

Hardware firewalls protect all networks from intrusion. They are installed and configured to block access to all services other than HTTPS for secure sites. The firewalls provide a first line of defense against intrusion, backed by a capable second line: hardware and software intrusion detection and remediation.

Intrusion detection systems constantly monitor network traffic and raise alerts for suspicious or unusual network traffic.

All companies hosting CTB systems maintain security and access logs that are regularly audited for login failures. Such failures may indicate intrusion attempts. Suspicious log entries are investigated and resolved.

All secure data transmitted across the public Internet are encrypted using SSH (AES), FIPS 140-2 compliant encryption, or an IPSec VPN. Secure Web sites encrypt data using 128-bit SSL public-key encryption.

The hosting environment is protected by an Intrusion Prevention System (IPS) appliance at the perimeter. The IPS appliance combines intrusion protection and vulnerability management technology into a single integrated solution that offers both proactive and reactive protection from the latest threats.

Software Security

All secure Web sites and software systems enforce role-based security models that protect individual privacy and confidentiality in a manner consistent with state laws, FERPA, and other federal laws. All systems implement sophisticated, configurable privacy rules that can limit access to data to only appropriately authorized personnel.

Different states interpret FERPA differently, and the system is designed to support these customized interpretations flexibly. Some states limit a school's access to data collected while the student attends the school, restricting access to historical data for students who transfer into the school. Other states provide the full history of data to the school or teacher who has jurisdiction over the student at any point in time. Similarly, while some states provide each teacher with access to information about all students in the teacher's school, other states limit access to those students to whom the teacher provides instruction. CTB systems can be configured to support all of these scenarios and more.
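The configurable privacy rules can be illustrated with the sketch below, in which two hypothetical policy flags stand in for the state-specific FERPA interpretations described above; the flag names and record fields are assumptions for illustration only.

```python
# Hypothetical configuration flags standing in for state-specific FERPA
# interpretations; actual rule names and fields would be set per state.
POLICY = {
    "history_visible_to_current_school": True,   # full history vs. enrollment-period only
    "teacher_limited_to_own_roster": True,       # all students in school vs. own students
}

def can_view(record, viewer, policy=POLICY):
    """Decide whether a viewer may see a single assessment record."""
    if viewer["role"] == "teacher" and policy["teacher_limited_to_own_roster"]:
        if record["student_id"] not in viewer["roster"]:
            return False
    if not policy["history_visible_to_current_school"]:
        return record["school"] == viewer["school"]        # only data collected at this school
    return record["current_school"] == viewer["school"]    # full history follows the student

record = {"student_id": "S1", "school": "0102", "current_school": "0101"}
teacher = {"role": "teacher", "school": "0101", "roster": {"S1", "S4"}}
print(can_view(record, teacher))   # True under the sample policy
```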

Secure transmission and password-protected access are basic features of the current system and ensure authorized data access. All aspects of the system, including item development and review, test delivery, and reporting, are secured by password-protected logins.

CTB systems use permission-based security models that ensure that users access only the data to which they are entitled and that limit their ability to change those data according to their rights.

Security Audits

CTB conducts periodic security audits, including an electronic security audit of every new system deployed. McGraw-Hill's Corporate Information Security (CIS) team conducts these audits, with security personnel posing challenges to the software systems to emulate security threats.

Server Robustness

Fault Tolerance through Redundancy

The components of the TDP are architected for high availability. The system will withstand failure of any component with little or no interruption of service. One way the platform achieves this robustness is through redundancy. Key redundant systems include:

• The test administration subsystem's hosting provider has redundant power generators that can continue to operate for up to 60 hours without refueling. With multiple refueling contracts in place, the generators can operate indefinitely. The provider maintains an n + 1 configuration of 16 diesel generators that, at maximum capacity, can supply up to 2.0 megawatts of power each.
• Hosting providers maintain multiple redundancies in the flow of information to and from data centers by partnering with nine different network providers. Each fiber carrier must enter the data center at separate physical points, protecting the data center from a complete service failure caused by an unlikely network cable cut.
• Every installation is served by multiple Web servers, any one of which can take over for an individual test upon failure of another.
• Active/passive clusters of database servers are configured so that the passive node takes over in the event of failure of the active node.
• Each database server in a cluster has dual connections to the disk arrays containing the system data. Each disk array is internally redundant, with multiple disks containing each data element. Failure of any individual disk is recovered immediately by accessing the redundant data on another disk.

Archiving and Backup

Data are protected by nightly backups. We complete a full backup weekly and incremental backups nightly. The systems run with full transaction logging, enabling us to restore the system to the state it was in immediately prior to a catastrophic event.

The server backup agents send alerts to notify the system administration staff in the event of a backup error; staff then inspect the error to determine whether the backup was successful or needs to be rerun.

All backup media are stored in a secure, fireproof, off-site location. Our hosting provider ensures that all media sent off-site are shipped in locked, water-resistant, and impact-resistant containers. The off-site vendor does not have direct access to the individual media containing customer data at any point during transport. The off-site storage location is audited regularly to ensure that physical security, media management, and location tracking meet the hosting provider's industry-standard guidelines.

Test Administration Data Recovery

Restoration audits are completed to verify the recoverability of data from the backup sets. These audits involve restoring data from a random system in the hosting environment, along with integrity checks to ensure that the data are intact.

In the event of a catastrophic failure, data are restored from the full backup. Incremental backups are then restored to return the system to its state on the previous night. Finally, full transaction logging allows us to repeat any database transactions prior to the catastrophic event, returning the database to its state immediately prior to the event. Specific procedures for recovering data to a database and to a file system are described below.

Recovery Procedure (Database)

The database recovery procedure is used to recover from an unlikely database hard-disk failure that results in the loss of the logical data drive. The procedure involves six steps:

1. Determine the last full and differential backups
2. Perform a tail-log backup
3. Restore the full backup
4. Restore the differential backup
5. Restore the tail-log
6. Verify the integrity of the database
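The ordering of these steps can be illustrated with the toy model below, which uses in-memory dictionaries in place of real backup sets; it is a conceptual sketch of why the full, differential, and tail-log restores are applied in that order, not a depiction of the platform's actual database tooling.

```python
# Toy model of the recovery sequence above, using in-memory dictionaries in
# place of real backup sets. It illustrates why the order
# (full -> differential -> tail-log) matters, not any specific DBMS syntax.
def restore(full_backup, differential, tail_log):
    db = dict(full_backup)        # 3. restore the last full backup
    db.update(differential)       # 4. apply the last differential backup
    for key, value in tail_log:   # 5. replay the tail-log transactions in order
        db[key] = value
    return db

full_backup  = {"row1": "v1", "row2": "v2"}            # weekly full backup
differential = {"row2": "v2-updated", "row3": "v3"}    # changes since the full
tail_log     = [("row3", "v3-final"), ("row4", "v4")]  # transactions up to failure

recovered = restore(full_backup, differential, tail_log)
assert recovered == {"row1": "v1", "row2": "v2-updated",
                     "row3": "v3-final", "row4": "v4"}  # 6. verify integrity
print(recovered)
```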

8.3.8 Descriptions of training protocols to be provided at the local level on the test administration procedures should be provided. CTB's Online Implementation Manager will directly support the NEAC sites and technology staffs in their preparation for online testing. We will deliver guidance, materials, and technical support directly to the school sites and their technical coordinators at each checkpoint. Our technical support team will monitor site readiness progress and make sure that test locations are achieving the technology and configuration objectives needed to ensure a satisfactory live test administration. Local field engineers will be available to conduct site visits and support technology coordinators with solutions to issues related to hardware and network infrastructure.

CTB will provide training for districts and their technology coordinators in both face-to-face and webinar formats; the training is aligned with the checkpoint process. A specific training program will be developed for NEAC members, including the following training components.

Readiness and Technology Survey Training: CTB will provide training for District Test Coordinators and District Technology Coordinators on the use of the Readiness and Technology Survey (RTS) system. RTS is a Web-based application, hosted by CTB, that evaluates data regarding each school's system/network infrastructure to determine overall readiness for online testing. RTS evaluates a school's capabilities by comparing the data provided against a benchmark configuration of minimum system requirements for online testing.
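The benchmark comparison performed by RTS can be pictured with the sketch below; the benchmark values and site fields shown are placeholders, not the actual minimum system requirements.

```python
# Illustrative readiness check in the spirit of the RTS evaluation; the
# benchmark values below are placeholders, not the actual minimum requirements.
BENCHMARK = {"bandwidth_kbps_per_student": 20, "ram_mb": 1024,
             "screen_resolution": (1024, 768)}

def readiness_report(site):
    gaps = []
    if site["bandwidth_kbps"] / site["concurrent_testers"] < BENCHMARK["bandwidth_kbps_per_student"]:
        gaps.append("insufficient bandwidth per concurrent tester")
    if site["workstation_ram_mb"] < BENCHMARK["ram_mb"]:
        gaps.append("workstation RAM below minimum")
    if (site["resolution"][0] < BENCHMARK["screen_resolution"][0]
            or site["resolution"][1] < BENCHMARK["screen_resolution"][1]):
        gaps.append("screen resolution below minimum")
    return "ready" if not gaps else "; ".join(gaps)

site = {"bandwidth_kbps": 10000, "concurrent_testers": 600,
        "workstation_ram_mb": 2048, "resolution": (1280, 800)}
print(readiness_report(site))   # insufficient bandwidth per concurrent tester
```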

Site Readiness Training: CTB will provide detailed site readiness training, covering topics such as preparation for local and/or statewide stress testing to ensure that all schools planning to use computer-based testing are ready for the full-scale operational test.

Administration/Setup Training: CTB will provide training on the final online testing preparations. District staffs will be trained on assigning tests, setup/modification of student accommodations, and other administrative tasks. They will also be trained on completing final workstation checks and administration of practice tests.

The CTB Professional Development department will collaborate with NEAC members to develop a total of fourteen test administration training video simulations for grades 3–8 and grade 11 in Mathematics and English language arts (ELA). Once the final video modules are developed, CTB will provide the needed resources to NEAC for use in annual training events with school and district administrators.

8.3.9 Contractors will be responsible for providing up to 4 one-half day regional trainings on system use and test administration procedures, to be supplemented by an on-line webinar and other online training materials (e.g., slide deck from webinar, FAQ document). CTB will work with NEAC to determine the best method for conducting the variety of trainings that will support a successful program implementation. We will plan for the inclusion of four (4) one-half day regional trainings that address technology readiness, system use and interaction, and test administration procedures and support. We propose an annual training plan that includes a series of interactive presentations on each of the proposed online systems to be delivered through a combination of face-to-face training, webinars, and webcasts, and also through self-guided and self-paced online interactive sessions. Training resources will be based on existing Smarter Balanced training modules and publications and, for new systems, will be developed in collaboration with NEAC. Our proposed training plan will support schools and districts as they transition to the NEAC online testing system.

8.3.10 Technical support should be available via telephone and/or electronically with tools such as help desk and/or email. CTB delivers world-class service using a three-tiered support system to provide consistency in the management of support issues and incidents. The Help Desk acknowledges receipt and begins the resolution process for reported support issues and incidents within 24 hours. Our focus is on building customer trust with agile support to create an excellent customer experience related to case resolution.

Help Desk resources will be experienced support staff members who are trained and certified specifically to support the NEAC assessments. Training for representatives is classroom-based and comprehensively covers key contacts, contract-specific requirements, descriptions and walkthroughs of associated systems and processes, and FAQs and knowledge base items. Tier 1 staff are extensively trained to provide end-to-end support, from assisting educators with basic administration questions to providing advanced troubleshooting for online assessment issues. Once training is completed, representatives must complete and pass a certification before being added to the NEAC support team.

Tier 1: Customer Service/Technical Support Help Desk, staffed by knowledgeable and experienced support professionals who will address and correct the issue as soon as possible.

Tier 2: Second level of support. Issues that cannot be resolved by Tier 1 staff are escalated to this tier, which is staffed by senior support personnel with a higher level of experience with the platform and contract requirements who perform additional troubleshooting to address the issue or incident.

Tier 3: The final level of support is used when the most complicated or urgent issues are escalated. Issues that cannot be resolved by the Tier 2 support staff are escalated to this tier. This level consists of various team members (systems administrators, database administrators, implementation staff, content/publishing staff, software developers, and program management) who work together to resolve the escalated issues.

Inquiries and issues unresolved by the initial support tier (Tier 1) will be escalated to a second-level team (Tier 2), which is composed of senior technical support personnel. If the issue cannot be resolved by the second level team, they will forward the issue to a third level team (Tier 3), composed of system administrators, database administrators, software developers, and contract and product specialists. On resolution, escalated cases will be de-escalated to the initial support tier for communication to the customer and case closure.

The CTB Help Desk uses an online customer management system, Salesforce.com (SFDC), to log customer interactions. The system tracks account, contact, and case information for historical and trending purposes; that information can also be used to pinpoint training opportunities and potential system enhancements. The data contained in SFDC are secure and accessible only by authorized CTB employees. The software uses historical customer information for each account and a case reference number for each technical issue and inquiry. SFDC has the flexibility to generate reports detailing case history and statistics in a variety of formats and can be provided as requested.

8.3.11 Metrics for monitoring and documenting systems performance should be identified. The CTB TDP has been formally load-tested with up to 100,000 concurrent users for our current projects. A load-testing cycle of 300,000 concurrent users is currently underway, with results expected in June 2014. The flexibility afforded by a cloud-based architecture allows for the addition of extra capacity on demand, as necessary, along with the ability to stand up separate environments as dictated by requirements. The system is configured with multi-region disaster recovery (DR) redundancy to safeguard against catastrophic events. The data are replicated across DR zones continuously, with data loss potential limited to the latency of the cross-region transfer. The system is designed to be available to the customer throughout the school year and gives users the ability to administer assessments at arbitrary levels tailored to the needs of the specific group.

Several tools support the real-time monitoring of the system and network performance in every test session. These include Nagios, Oracle Enterprise Manager (OEM) Monitoring, SiteScope, SumoLogic, and Tripwire. For example, OEM provides comprehensive, flexible, easy-to-use monitoring functionality that supports the timely detection and notification of impending IT problems to the intended users. It offers comprehensive monitoring from Oracle Database instances to Oracle Real Application Clusters to Oracle Application Server Farms and Clusters. OEM Grid Control comes with a comprehensive set of performance and health metrics that allow monitoring of key components in our environment such as applications, application servers, and databases as well as the back-end components on which they rely, such as hosts, operating systems and storage.

8.3.12 Documentation should be provided regarding the capacity of the system to support the current and potential future range of Smarter Balanced item types (See Appendix 7 for link to Smarter Balanced Systems architecture and Item Specifications). As described above, CTB has deep knowledge of the Smarter Balanced item pool, item metadata, and response types. CTB worked with Smarter Balanced to define assessment items in terms of student response types and presented the schema for defining items in terms of presentation, response, and scoring type at the Smarter Balanced Collaboration Conference in September 2013. We have leveraged this intimate knowledge of interactive item formats and structure to provide a robust platform for the delivery of these and future interactive item types.

The CTB TDP allows for device-independent delivery of content for assessments using modern HTML5-based browsers, including support for standard items and Smarter Balanced interactive (technology-enhanced) item response types on desktop and modern tablet operating systems (e.g., iPad, Android, Chromebook).

The Test Delivery Client (TDC) is the component of the TDP that students use to access assessment content. Figure 9 shows an example of the TDC interface with an interactive item response type.

Figure 9: Test Delivery Client Interface – Technology Enhanced Items

In the TDC, the student can use the "Go On" or "Go Back" buttons to navigate to the next or previous items. (These controls are modified appropriately for adaptive testing.) Alternatively, the student can go to a specific item by selecting the desired item from the test navigation bar at the bottom of the screen. Navigation in most items can be performed by keyboard or a mouse; for some technology-enhanced item types, interaction currently requires a mouse.

Items are constructed to maximize visibility of all components of the item. When items do not fit on the screen, we make all reasonable attempts to enhance the student experience and to ensure that no mistakes result from parts of the item, such as a subset of the possible responses, being out of view.

The navigation bar, as shown in Figure 9 above, also shows whether an item has been answered through the globe indicator that is above the item number. Items that have not been answered do not have an indicator, showing that the student has not yet responded to the item.

Note that each item also has a "Mark for Later Review" option for non-adaptive testing. This allows a student to flag an item for review, which causes a visual indicator to appear. If the student ends the test without clearing the review flag, the student is prompted to go back and review the flagged items.

Manipulatives

Manipulatives are managed in several ways. When items are constructed, the items are flagged with metadata that determine which manipulatives are appropriate for a specific item. When assignments are created, the user can determine which manipulatives are available for that assignment, giving the user control over the manipulatives made available during the assignment. Figure 10 is an example of a graphing calculator that is available during testing.

Figure 10: Sample Manipulative—Graphing Calculator

The highlighter, shown in Figure 11, will allow a student to highlight significant sections of passages and items.

Figure 11: Sample Manipulative—Highlighter

Figure 12 shows the magnifying glass that allows a student to select and enlarge specific areas of the screen.

Figure 12: Sample Manipulative—Magnifier

The answer eliminator, masking, and scratch pad manipulatives are currently under development. Please see the implementation plan presented in Table 6 above for details.

All planned changes to the CTB TDP will be reviewed and approved by an appropriate group of stakeholders prior to their release.

8.3.13 Describe methods for establishing the comparability of test results in comparison to those that would be delivered via the Smarter Balanced Test Administration engine. For a computer adaptive assessment, comparability across forms will depend on the characteristics and constraints provided in the test design and how those are incorporated into the operational CAT engine. Since the field test was not computer adaptive, we assume that Smarter Balanced will provide all test specifications as part of the certification process for the summative assessment delivery. The summative assessment and any computer adaptive interim assessments will be delivered using CTB's optimal test assembly ShadowCAT engine, which implements a fundamentally different adaptive algorithm. By using the Shadow Test approach, our engine guarantees that all test blueprint, shared stimulus, and psychometric constraints are met while optimizing the selection of items to maximize information at specified ability levels throughout the student's testing experience. No other adaptive algorithm can guarantee that the test will adhere to all constraints for every test taker. We explain this engine in more detail below, before addressing comparability.

Different adaptive testing algorithms using the Smarter Balanced item parameters and adhering strictly to the test blueprint requirements are expected to produce interchangeable scores. Based on the recently released information on Smarter Balanced's adaptive algorithm, CTB will compare the ShadowCAT engine against the Smarter Balanced engine in terms of conformance to the test blueprint requirements and the precision of measurement using extended simulation studies. Also, as part of the Smarter Balanced certification process, CTB will establish the comparability (and superiority) of our adaptive testing engine. Here we describe the ShadowCAT engine in more detail and explain why we decided to implement the Shadow Test approach.

The ShadowCAT engine has been in use by our customers since July 2013. CTB has been constructing fixed forms using the same underlying optimal test assembly engine since 2011. Expanded in 2013 for adaptive offerings, this engine provides a unique approach to adaptive testing, in which item-level adaptive, multi-stage on-the-fly, multi-stage, linear on-the-fly, and linear tests can be transparently configured by the customer and delivered with ease.

Our ShadowCAT optimal test assembly engine was designed under the leadership of Chief Research Scientist, Dr. Wim van der Linden. Dr. van der Linden is an internationally renowned expert in adaptive testing, with seminal related publications in numerous peer reviewed and international journals and books. He is co-editor of three published volumes: Computerized Adaptive Testing: Theory and Applications (Boston: Kluwer, 2000; with C. A. W. Glas), and its sequel, Elements of Adaptive Testing (New York: Springer, 2010; with C. A. W. Glas), and Handbook of Modern Item Response Theory (New York: Springer, 1997; with R. K. Hambleton). He is also the author of Linear Models for Optimal Test Design published by Springer (2005).

Confidence in Superior Performance of the ShadowCAT Engine

CTB is confident that our ShadowCAT engine will meet and exceed Smarter Balanced certification requirements. By the nature of the algorithm, the Shadow Test approach is considered to be the gold standard, surpassing the precision, minimal bias, and constraint-compliance results of other methods, including the weighted deviation method we assume will be represented in the open-source adaptive algorithm (e.g., He, Diao, & Hauser, 2014; Patton, Diao, & Boughton, 2013; van der Linden, 2005). Our ShadowCAT engine has been driving item selection for our TABE Adaptive product since July 2013, when we converted from a previous adaptive testing algorithm that had been in use since 2011. As expected, our simulation studies on item banks and test blueprints similar to the Smarter Balanced item banks find higher precision and less bias when comparing the ShadowCAT with the weighted deviation model, in addition to the guaranteed 100 percent match to blueprint for every test taker. CTB is prepared to pursue certification upon release of the adaptive engine certification requirements, the item pool, and the item pool statistics. Table 7 summarizes the engine's characteristics.

 

4 He, W., Diao, Q., & Hauser, C. (2014). A comparison of four item-selection methods for severely constrained CATs. Educational and Psychological Measurement. First published January 7, 2014. doi:10.1177/0013164413516976

5 Patton, J.M., Diao, Q. & Boughton, K. (2013). From paper-and-pencil to CAT: an application of mixed-integer programming. Paper presented at the National Council on Measurement in Education in San Francisco, CA.

6 van der Linden, W.J. (2005). A comparison of item-selection methods for adaptive tests with content constraints. Journal of Educational Measurement, 42(3), 283-302.

Table 7: Confidence in Superior Performance of the ShadowCAT Engine

Feature | ShadowCAT | Heuristic (e.g., Weighted Deviation)
Blueprint Compliance | Guaranteed blueprint compliance. Changes in the item pool or blueprint may easily be made without impacting blueprint coverage. | There is always a chance the blueprint will not be met. Changes in the item pool or blueprint require adjustment of weights (additional simulation).
Psychometric Constraint Compliance | Guaranteed constraint compliance. | Always a chance a constraint will not be met.
Maximum Information for High Precision | Guaranteed maximum information for each test taker. | Heuristic, not optimal. May be weighted minimally or not at all if the blueprint is complex.
Adapts at the item level within a shared passage or stimulus | Item-level adaptation within guaranteed constraints: minimum and maximum number of passages and minimum and maximum number of items per passage. | Maybe. Many engines using this approach require pre-assembled testlets that are then administered without adaptation once the passage is selected by the engine.
Simple and Transparent Configuration | As easy as writing down the test specifications. | Iterative simulation required to determine the best weights for each constraint.
Supports Multiple Adaptive Delivery Approaches | Can support linear, LOFT, MST, MST on the fly, and item-level adaptive delivery under the same framework. Constraints are represented identically for each mode. | Requires a separate solution for each mode.

The ShadowCAT engine brings the Shadow Test concept from theory into practice for our customers. In contrast to heuristic approaches such as the weighted deviations method, the Shadow Test approach anticipates items needed to meet all constraints required of the test and builds a complete and optimal test form at each stage of adaptation. Optimality is achieved through simultaneously solving a system of equations in which each equation represents a test constraint and identifying the set of items that maximizes information for the test taker out of the multiple possible sets of items that are solutions to the system of constraints. This optimal set of items is called the shadow test. The optimal item is then selected from the shadow test. As the Shadow Test approach selects a complete and optimal test form at every adaptive decision point and then selects the optimal item from that fully compliant test form, it is not allowed to select an item that will result in infeasibility of constraints at a later adaptive step. In this way, the shadow test ensures that all constraints are met in every test administered. In addition, if the shadow test is ‘frozen’ or held fixed by design at predetermined locations or intervals within the test, linear on the fly and multistage on the fly adaptive tests result.
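A highly simplified sketch of this idea appears below: a tiny 2PL item pool, a blueprint requiring two items per content area, and a brute-force search standing in for the mixed-integer solver used in the production engine. At each step it assembles the most informative full-length form that contains everything already administered and meets the blueprint (the shadow test), then administers the most informative remaining item from that form. The pool, blueprint, and fixed ability estimate are illustrative assumptions only.

```python
import itertools
import math

POOL = [  # (item_id, content_area, a, b) for a toy 2PL pool
    ("i1", "algebra", 1.2, -0.5), ("i2", "algebra", 0.9, 0.0),
    ("i3", "algebra", 1.5, 0.8),  ("i4", "geometry", 1.1, -1.0),
    ("i5", "geometry", 1.3, 0.3), ("i6", "geometry", 0.8, 1.2),
]
BLUEPRINT = {"algebra": 2, "geometry": 2}
TEST_LENGTH = sum(BLUEPRINT.values())

def info(item, theta):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    _, _, a, b = item
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def shadow_test(administered, theta):
    """Most informative full-length form that contains every administered item
    and meets the blueprint exactly (brute force over the small pool)."""
    used = {x[0] for x in administered}
    remaining = [i for i in POOL if i[0] not in used]
    best, best_info = None, -1.0
    for extra in itertools.combinations(remaining, TEST_LENGTH - len(administered)):
        form = list(administered) + list(extra)
        counts = {area: 0 for area in BLUEPRINT}
        for item in form:
            counts[item[1]] += 1
        if counts != BLUEPRINT:
            continue                      # infeasible form: blueprint not met
        total = sum(info(i, theta) for i in form)
        if total > best_info:
            best, best_info = form, total
    return best

def next_item(administered, theta):
    """Pick the most informative not-yet-administered item from the shadow test."""
    form = shadow_test(administered, theta)
    used = {x[0] for x in administered}
    candidates = [i for i in form if i[0] not in used]
    return max(candidates, key=lambda i: info(i, theta))

administered, theta = [], 0.0
for _ in range(TEST_LENGTH):
    item = next_item(administered, theta)
    administered.append(item)
    # theta would be re-estimated after each response; held fixed in this toy.
print([i[0] for i in administered])
```

In the production engine, a mixed-integer solver replaces the brute-force search and the ability estimate is updated after every response, but the structure of the decision at each step is the same.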

The Shadow Test algorithm prevents the engine from making a greedy or myopic selection, that is, a selection that may meet a requirement early in the test but will rob the pool of any items available to meet a later requirement, or concluding the test before all constraints are met. These are common issues in heuristic algorithms such as the weighted deviations method. Heuristics may involve rotating item selection through sub-content areas (content balancing) or complicated weighting designs in which each constraint is assigned a weight that is adjusted until simulation results yield a high percentage of tests meeting blueprint constraints. However, heuristic approaches can never guarantee that a blueprint will be met. In addition, the weighting process can be a tedious trial-and-error effort, taking focus away from other important test configuration decisions in computer adaptive testing.

Using the Shadow Test approach, the ShadowCAT engine will undoubtedly produce different tests while guaranteeing blueprint adherence. But, as noted above, any other algorithm adhering to the same blueprint in the same way still delivers interchangeable scores. A comprehensive simulation study based on the recently released details of the Smarter Balanced item selection algorithm will be used to establish comparability and conformance to the Smarter Balanced test blueprint requirements and other guidelines.

8.3.14 Provide documentation regarding the application's capacity to import and export as applicable: items, student item response data, student registration, demographics, and data regarding eligible and utilized accommodations. To complete these import and export procedures, a user must be assigned a role that has the necessary permissions.

For example, a user can import an assessment he or she previously created or export one for use in another IMS QTI-compatible application. To import an assessment, the source file must be an IMS QTI-compatible XML file. For more on the IMS QTI specification, see the IMS Global Learning Consortium.

Note: Because some systems differ in their interpretations of IMS QTI standards, importing IMS QTI-compatible files from other systems may not work reliably.
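As a small illustration of the import path, the sketch below reads identifying attributes from a minimal IMS QTI 2.1 assessmentItem; the sample item is fabricated for illustration, and the exact profile and validation rules would follow the agreed Smarter Balanced QTI/APIP export format.

```python
import xml.etree.ElementTree as ET

# Minimal sketch of pulling identifying metadata out of an IMS QTI 2.1
# assessmentItem before import; the sample item below is fabricated.
QTI_NS = "{http://www.imsglobal.org/xsd/imsqti_v2p1}"

SAMPLE_ITEM = """<assessmentItem xmlns="http://www.imsglobal.org/xsd/imsqti_v2p1"
    identifier="demo-item-001" title="Sample selected-response item"
    adaptive="false" timeDependent="false">
  <responseDeclaration identifier="RESPONSE" cardinality="single" baseType="identifier"/>
</assessmentItem>"""

root = ET.fromstring(SAMPLE_ITEM)
print(root.attrib["identifier"], "-", root.attrib["title"])
for decl in root.findall(f"{QTI_NS}responseDeclaration"):
    print("response:", decl.attrib["identifier"], decl.attrib["cardinality"], decl.attrib["baseType"])
```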

Various tools exist for importing and exporting student item response data, student registration data, demographics, and accommodations information. We will need to agree with NEAC about interface specifications before we can customize these tools to meet the needs of NEAC.

8.4 Test Items and Performance Tasks Smarter Balanced will develop, review, and field test a number of test items and performance tasks sufficient to populate item pools for both the summative and interim assessments. They will also provide ongoing monitoring of item usage, removing items that become overexposed. Smarter Balanced will continue developing items in subsequent operational years in numbers sufficient to maintain the viability of the item pools. The contractor will support this process by providing the following:

8.4.1 Operational Field-Testing Implement operational field-testing in accordance with a plan approved by the Smarter Balanced Governing states that includes, at a minimum, parameters for items to be used in CAT and performance tasks.

CTB will work with NEAC to develop a field testing plan that will take into account (a) the source of the test content, (b) necessary equating designs including anchor items, and (c) needs for newly developed content to refresh the given item/task pool. Should new items/tasks be field tested during the contract period based on Smarter Balanced test specifications, our research team will work with Smarter Balanced to ensure that the items/tasks are on the Smarter Balanced scale.

The CTB content development team will select, if necessary, appropriate items from the Smarter Balanced item pool for field testing. If the item selection is automatic, we will ensure that available field-test items have appropriate metadata for operational testing. CTB is currently developing the initial field-test pool of 11,000 items for Smarter Balanced, and we anticipate that the field-test items for the first field-test administration in 2015 will come from this pool with all required item attributes.

We understand that the Smarter Balanced Assessment Consortium plans to conduct the scaling activities for new items in its bank, and we will be prepared to support this process in several possible ways:

• Provide the data to Smarter Balanced
• Arrange separately to conduct the scaling and equating at CTB
• Arrange to support Smarter Balanced in a review or replication of the scaling and equating

We understand that Smarter Balanced plans to facilitate the analysis of field-test item data, including the generation of IRT parameters or other required statistics, review of item data, and decisions about the inclusion of items in the operational pool. CTB will manage the inclusion of field-test items seamlessly into student tests during the operational administration and will be ready to provide scoring data on field-tested items according to Smarter Balanced procedures, once they have been developed. CTB's ShadowCAT engine can be configured to embed field-test items either by alternating pre-constructed blocks of field-test items at designated item positions within a CAT or by constructing them on the fly, incorporating exposure rates in real time. We have also developed a system that can optimally select field-test items adaptively to minimize item exposure and maximize calibration efficiency (van der Linden and Ren, 2014).
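The block-embedding option can be pictured with the sketch below, which assigns each student one of several field-test blocks while keeping exposure roughly balanced; the block contents, embed position, and selection rule are illustrative assumptions, not the ShadowCAT engine's actual exposure-control method.

```python
import random
from collections import Counter

# Sketch of embedding a field-test block per student while balancing exposure;
# block contents and the designated position are illustrative assumptions.
FT_BLOCKS = {"A": ["ft1", "ft2"], "B": ["ft3", "ft4"], "C": ["ft5", "ft6"]}
EMBED_POSITION = 10          # insert the block after the tenth operational item
exposure = Counter({name: 0 for name in FT_BLOCKS})

def assign_block():
    """Choose randomly among the currently least-exposed field-test blocks."""
    lowest = min(exposure.values())
    choice = random.choice([name for name, n in exposure.items() if n == lowest])
    exposure[choice] += 1
    return choice

def embed(operational_items, block_name):
    items = list(operational_items)
    return items[:EMBED_POSITION] + FT_BLOCKS[block_name] + items[EMBED_POSITION:]

operational = [f"op{i}" for i in range(1, 21)]
for _ in range(9):
    assign_block()
print(dict(exposure))                                         # roughly balanced counts
print(embed(operational, "A")[EMBED_POSITION - 1:EMBED_POSITION + 3])  # ['op10', 'ft1', 'ft2', 'op11']
```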

CTB is also willing to work with NEAC directly (or via Smarter Balanced) to ensure that the item parameters are sensitive to the high-performance profile of the student population in Connecticut, New Hampshire, and Vermont. If desired, CTB can support re-calibrating and re-equating items.

8.4.2 Quality Control and Item Tagging Conduct quality control on the import of items, item metadata and item tags into the test administration platform

CTB will ensure that all required item content and metadata will be imported correctly into the test administration platform. We will develop, for NEAC approval, an Item Import Quality Control Plan that will include the following elements:

1. Pre-import quality analysis. This analysis will include a close comparison of the defined Smarter Balanced data format with that of the CTB test administration platform, with particular emphasis on areas of import that could impact item content and data. Our content teams have documented "lessons learned" during the development and export/import of item content and metadata for Smarter Balanced Contract 16 Field Test item development. We will leverage this experience to ensure that potential areas of error (e.g., item/stimulus associations, machine-scoring rules, MathML equations) are adequately understood prior to the import of items.

2. QC of item metadata. Item metadata will be checked as part of the automated item import process. In addition, CTB content teams will manually check metadata reports to ensure that all metadata fields have imported into the test administration system correctly. We will develop checklists for this QC as part of the overall plan to complement the automated checks that are part of the import process.

3. QC of item format. Items will be checked for correct formatting in the test administration system, including elements such as:
• Item stimulus associations
• Art or graphics formatting
• Equations/expression rendering
• Scoring rules/correct scoring
• Interactive item response space functionality

 

7 van der Linden, W. J., & Ren, H. (2014). Optimal Bayesian adaptive design for item calibration. Psychometrika, 79. In press.


4. QC of embedded support tags. The Item Import Quality Control Plan will also contain procedures for the QC of all embedded support tags, including text-to-speech, Braille, Spanish and ASL translations; foreign-language glossaries; and English glossaries. Procedures for these checks will be developed based on the final Smarter Balanced QTI export format and certification requirements.

Initially, we will check a representative sample of items for each identified element. If any issues are discovered, all items containing the identified element will be manually reviewed. All items will be previewed prior to final approval for administration. The final Item Import Quality Control Plan will be modified, as needed, based on Smarter Balanced QTI/APIP formats and Smarter Balanced certification requirements.
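The sample-then-escalate metadata check described above might look like the sketch below; the required-field list is illustrative and is not the actual Smarter Balanced metadata schema.

```python
import random

# Hedged sketch of the automated portion of the metadata QC described above;
# the required-field list is illustrative, not the actual Smarter Balanced schema.
REQUIRED_METADATA = ["item_id", "grade", "claim", "target", "dok", "scoring_type"]

def metadata_gaps(item):
    return [f for f in REQUIRED_METADATA if not item.get(f)]

def qc_sample(items, sample_size=2, seed=0):
    """Check a random sample first; escalate to a full review if any gaps appear."""
    random.seed(seed)
    sample = random.sample(items, min(sample_size, len(items)))
    flagged = {i["item_id"]: metadata_gaps(i) for i in sample if metadata_gaps(i)}
    if flagged:                       # any issue triggers review of every item
        flagged = {i.get("item_id", "?"): metadata_gaps(i)
                   for i in items if metadata_gaps(i)}
    return flagged

items = [
    {"item_id": "200-1001", "grade": "05", "claim": "1", "target": "A",
     "dok": "2", "scoring_type": "machine"},
    {"item_id": "200-1002", "grade": "05", "claim": "1", "target": "B",
     "dok": "", "scoring_type": "machine"},
]
print(qc_sample(items))   # {'200-1002': ['dok']}
```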

8.5 Manufacture, Delivery, Scanning and Scoring of Paper-based Tests Although the Smarter Balanced assessments are designed for digital delivery, the states will provide a paper-based test form for at least two operational years of the project to schools that lack the technology readiness for delivery of computer-based assessments. Smarter Balanced will provide a set of camera-ready test forms for all grades and content areas. These forms will meet specifications established by the Smarter Balanced Governing States. Smarter Balanced will not field test new items on paper. However, the operational paper tests will necessarily be longer than the computer adaptive tests in order to increase the reliability of the paper forms. Additional hand-scoring, above and beyond what is required for the online tests, will be necessary because some of the online-delivered, machine-scored items will need to be hand scored. Proposers will describe how they will provide the following processes and services relative to the paper-based test option:

8.5.1 A process to help ensure production of necessary quantities of manufactured paper-based test materials based upon enrollment data and overage requirements provided by the states. Preliminary estimates of the numbers and percentages of students needing the paper-based option are summarized in Table 4 of the RFP.

8.5.2 A process to help ensure that all paper-based test materials meet specifications provided by the states prior to final production, including checks during printing.

8.5.3 A process to help ensure accurate collating of paper-based test materials.

8.5.4 A process to identify and protect the security of paper-based test materials.

8.5.5 A process, where required, to pre-code answer documents with student SSID numbers, demographic information, LEA and school/testing site information. To ensure student confidentiality, a unique Smarter Balanced student identifier will be used for data transfer rather than the regular state student identification number.

8.5.6 A process to ensure students who take the paper assessment do not take a computer-based assessment in the same content area unless an exception is approved by the state. This process must also include procedures to identify and resolve any cases where students have two or more paper exams that may occur in cases when students change schools during the testing window.

8.5.7 A process and procedures to ensure the accurate and timely packaging of orders, including additional materials orders. This includes each of the following:

8.5.7.1 A process to ensure that all paper-based test materials are shrink-wrapped, banded, or packaged according to standard industry practice.

8.5.7.2 A process to ensure the accurate labeling of all completed packages.

8.5.7.3 An expedited packaging and shipping system.

8.5.7.4 A process to ensure documentation is created and maintained for all completed orders.

8.5.7.5 A process to ensure accurate receipt, check-in, and processing of materials at the processing center.

8.5.7.6 A process to reconcile and report any missing packages or material.

8.5.8 Methods and quality assurance guidelines for scanning paper-based test forms that include the following:

8.5.8.1 A process that ensures accurate scanning

8.5.8.2 A process that ensures that the integrity of booklets and student response documents is maintained during the scanning process

8.5.8.3 A process that ensures that all relevant documents complete the scanning process

8.5.8.4 An editing process that ensures accurate collection of data from scanned documents

8.5.8.5 A contingency plan or system to ensure that any issues encountered in scanning will not delay scoring

8.5.8.6 A process to integrate the data collected from paper with data from the online administration for scoring and reporting

8.5.8.7 A process to collect, analyze and report any industry standard statistics regarding validity and reliability across the paper and online administrations

8.5.8.8 A process to detect and address any security breaches associated with the paper forms

8.5.9 Methods and quality assurance guidelines for scoring paper-based tests. Please see Section 8.8 for general requirements and procedures for scoring.


Computer adaptive, computer-based test delivery is central to the Smarter Balanced vision, and it is the end state toward which we can assist NEAC in defining and securing technology readiness for all participating districts and schools. However, it is understandable and expected that this transition to census online testing will take several testing cycles to be fully in place. To this end, Smarter Balanced has committed to providing a paper-pencil alternate testing format for the first three administration years. As the primary development vendor for the Smarter Balanced Pilot and Field Test item pools, CTB is best positioned to assist Smarter Balanced in developing the paper-pencil forms. We have successfully negotiated with the Smarter Balanced Assessment Consortium to design, develop, and produce the paper-pencil form for the 2015 operational year, and we are currently developing paper-pencil test form and item selection specifications that will guide this work and inform the field about exactly what will be provided.

The experience we will gain in developing a small comparability study form and in developing the 2015 operational and breach paper-pencil test forms will be used to inform the NEAC paper-pencil testing implementation plan. Because we are at the center of this effort, we can use this vantage point to inform the NEAC plan and to identify methods that build in greater cost efficiency through economies of scale that other vendors cannot match.

CTB has used the estimated paper-pencil test quantities provided in the RFP to inform our planning and cost proposal. However, we fully understand that the test registration process will be used to finalize testing counts for both online and paper-pencil formats. Our proposed test registration process will be completed well before testing begins to allow us to provide test materials and test coordinator kits that will support schools in the distribution of materials; test administration; and collection, packaging, and return of materials to CTB for scoring and reporting. The registration process will also allow those districts participating in paper-pencil testing to register to receive Braille, Large-Print and Spanish-translated versions of the tests.

Per the RFP, CTB has assumed that we will provide paper/pencil assessment services at the quantities indicated in years one and two only. Our printing assumptions have also been captured in Appendix D: Manufacturing Specifications. All testing from that point forward will be conducted online. Should the states need pricing for the continuation of paper/pencil services, including accommodated Large-Print and Braille forms, CTB can readily provide prices for these.


Quality Assurance for Paper-Pencil Test Forms

CTB is working with Smarter Balanced to define the final paper-pencil test specifications and formal test review process. Once fully defined, CTB will conduct the development, production, and quality assurance reviews to ensure that the test materials are of the highest quality and meet specifications. The paper-pencil test materials will be provided in a test book with a corresponding student response book. We will conduct our standard quality assurance checks during the materials production process; those checks are summarized in the high-level list below:

CTB's manufacturing staff completes site visits to ensure consistent quality in scannable and non-scannable hardcopy materials.

We use composition and production practices based on our experience creating the industry’s first usability studies, and we adhere to the elements of universal design for our print materials.

With Smarter Balanced approval of the digital proofs, CTB will send the digital files to the printer, who:

• Checks digital files for compatibility and begins the printing process. A job ticket is created to track the progress and location of the job at all times. Tickets are also signed by operators and inspectors at key stages.

• Prepares production files and generates proofs. Proofs are sent to CTB where, once they are approved, they are sent to the customer for final approval prior to printing.

• Prints on high-quality paper that meets the strict opacity and brightness requirements specified by CTB. Plates are inspected for quality and installed on the printing press. At the start of and periodically throughout the print run, press sheets are pulled for quality control and compared to the final, approved proof.

• Completes binding, during which press sheets are collated, folded, trimmed, and bound. Individual press sheets are stored in a secure location on corresponding separate pallets before binding. Specified quantities of books are counted, shrink-wrapped, labeled, and stored in a secure warehouse prior to shipping.

CTB will implement our system for tracking each book from vendor to districts and back to CTB for scoring. This system includes the use of a unique barcode for each document that is to be inventoried and barcode assignment and verification documents at both the school and district level. CTB will work with NEAC to ensure that the documents and systems used to verify delivery and retrieval of paper-pencil materials meet the testing program’s security requirements.
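For illustration only, the following minimal Python sketch models the kind of barcode-based chain-of-custody log described above; the class names, field names, and location codes are hypothetical and do not represent CTB's production inventory system.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List

@dataclass
class ScanEvent:
    barcode: str       # unique barcode printed on each secure document
    location: str      # e.g., "vendor", "district:NH-017", "school:0042", "ctb_scoring"
    handler: str       # person or team signing for the hand-off
    timestamp: datetime

@dataclass
class CustodyLog:
    events: Dict[str, List[ScanEvent]] = field(default_factory=dict)

    def record(self, event: ScanEvent) -> None:
        # Append a scan event to the history for one barcoded document.
        self.events.setdefault(event.barcode, []).append(event)

    def current_location(self, barcode: str) -> str:
        # Return the last scanned location for a document.
        history = self.events.get(barcode, [])
        return history[-1].location if history else "never scanned"

    def not_returned(self) -> List[str]:
        # List barcodes whose last scan is not at the scoring center.
        return [b for b, h in self.events.items() if h and h[-1].location != "ctb_scoring"]

A reconciliation of barcodes that have not returned to the scoring center (not_returned) is the kind of check that supports the missing-materials reporting described in requirement 8.5.7.6.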

Precoding Services

We addressed our online precode system earlier in proposal Section 8.3.6. Using this system and process, NEAC districts and schools will register for both online and paper-pencil testing. Using the information submitted to the system, CTB will create student precode labels for application to the paper-pencil formatted tests. Our precode programs were designed to be flexible so that they can accept a variety of customer input file formats. NEAC district users will find that the online interface is easy to use. In addition, with our system’s validation checks, NEAC will be assured that the uploaded data are accurate.

CTB will use the student demographic data (including SSID, LEA and school/site information, and unique Smarter Balanced identifier) to generate student labels for the paper-pencil administrations. All precoded labels will be printed and delivered to NEAC districts as part of the Test Coordinator Kits we will supply to facilitate the distribution, collection, and return for scanning, scoring, and inventory processes of paper-pencil test materials.

Production of Student Pre-ID Labels

CTB has extensive experience printing and producing precode labels. Our cold-fusion label printing process avoids the poor quality and high incidence of ink smudging that are typical with less sophisticated techniques. We will follow extensive quality control procedures at each stage of the data management and pre-identification process. Those procedures will include, but not be limited to, precoding of mock data to simulate all possible combinations of student demographic data, label scannability testing, and multiple visual inspections. The labels will be packaged and labeled by school and shipped to districts and/or schools/sites as part of the larger paper-pencil Test Coordinator Kits in accordance with final program schedules.

Using a single test registration/precode system for both online and paper-based testing will help ensure that students test in only one format for each content area unless otherwise specified by the State.
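As a minimal sketch of this cross-format check (illustrative only; the record layout and function name are hypothetical), a conflict report might be generated as follows:

from collections import defaultdict
from typing import Dict, List, Set, Tuple

def find_mode_conflicts(registrations: List[dict],
                        state_exceptions: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # Return (student_id, content_area) pairs registered in more than one
    # test format, excluding state-approved exceptions.
    modes: Dict[Tuple[str, str], set] = defaultdict(set)
    for reg in registrations:
        key = (reg["student_id"], reg["content_area"])
        modes[key].add(reg["format"])  # "online" or "paper"
    return [key for key, formats in modes.items()
            if len(formats) > 1 and key not in state_exceptions]

regs = [
    {"student_id": "S001", "content_area": "ELA", "format": "online"},
    {"student_id": "S001", "content_area": "ELA", "format": "paper"},
    {"student_id": "S002", "content_area": "Math", "format": "paper"},
]
print(find_mode_conflicts(regs, state_exceptions=set()))  # [('S001', 'ELA')]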

Test Material Distribution

To facilitate the distribution, collection, and return of paper-pencil test materials, CTB will provide Test Coordinator Kits that will be delivered each year with all other test materials. These kits are intended to provide district- and school-level staff members with the instructions, header documents, shipping labels, envelopes, and ancillary forms required to package and return documents for scoring.

Test Coordinator Kits will include the following materials:

Test Coordinator Manuals (details the instructions for security, distribution, collection, packaging, and return of test materials for inventory and scoring)
Test Administration Manuals (paper-pencil version)
Pre-slugged headers to facilitate scanning
Paper bands and stack cover cards
Pre-printed, colored shipping return labels
Test security forms
Pre-coded labels and rosters

CTB will provide test materials and Test Coordinator Kits that are clearly labeled to assist in easy identification and distribution. We will work with NEAC to develop test materials printing and material fulfillment specifications that will include all information needed for the accurate printing and fulfillment of test materials that will be distributed to district and school/site coordinators, districts, and schools.

The materials fulfillment will be completed by experienced pick and pack teams, using barcodes and package verification. Specifications for packaging materials will be delivered to NEAC for review and approval. These specifications will detail how the materials will be packed and distributed as well as how they will be received from districts for scoring. Examples of packing and inventory lists and a description of how CTB will inventory materials will be included in the specifications. Each material shipment will contain a packing list that lists all secure materials and that can be used to ensure the shipment is complete. Prior to the commencement of the pick/pack, CTB will provide sample printed materials to NEAC for review and approval.

If desired, CTB will conduct a complete material review process with NEAC before the final production and fulfillment process begins. We will use expedited transportation with select, secure carriers to ship and collect testing materials. We will use our transportation management system to ensure all test materials (to and from NEAC districts) are tracked and monitored. Using only secure, bonded freight carriers for shipping of testing materials to the NEAC districts and schools/sites will ensure that materials can be tracked electronically from origin to destination. All our carriers provide electronic scheduling and tracking to ensure 100 percent accuracy of shipments for this program.

Our Transportation department monitors all shipments through customer receipt and signature at the districts, schools, and sites to ensure that all materials are delivered in a timely and accurate manner. We will pay all costs associated with the distribution of all testing materials. Should it be determined that additional materials are needed at a test site, we will work with our fulfillment vendors to ensure that all short/add orders are expedited, providing uninterrupted fulfillment of materials and schedule compliance for the assessment window.

CTB will provide periodic and end-of-project shipping status information to NEAC for all test material distribution and collection cycles.

Scanning of Student Response Booklets

For the scanning and scoring of all NEAC paper-pencil student response books, CTB will rely on our established data processing and scanning/scoring procedures. We have the experience and depth of resources to complete the processing of the paper-pencil test documents in accordance with program timelines and in strict alignment with program requirements. Document handling and score processing are core competencies at CTB. There is a companywide consciousness that each document that we process represents the work of a student, and with this knowledge comes the responsibility to ensure the accuracy and the reliability of that data.

CTB uses state-of-the-art scanning and scoring technology to capture all multiple-choice, constructed-response, and student demographic data as well as data from all ancillary documents. By using high-quality equipment combined with well-established processes and systems, we can ensure the accurate and efficient capture of student data.

CTB uses state-of-the-art scanning facilities with scanners that are capable of capturing student demographic and response data from the pages of answer documents as electronic images and data, ensuring accuracy and reliability in the final data that we report. We will use scanning systems that are completely scalable and modular in design and that can be operated 24 hours per day, seven days per week. Our scanning systems are designed to accurately capture student response data and biographical information, and they are continually monitored; if standards are not met, the scanning systems will stop, display an error message, and prevent further scanning until the condition is corrected. Those conditions include document page and integrity checks, user designed online edits, and numerous internal quality assurance/quality control checks. Before every scanning shift starts, the operators thoroughly clean the machines and perform a diagnostic routine. This is yet another step to protect data integrity and one that has been done faithfully for the many years that we have been involved in production scanning. As a final safeguard, spot checks of scanned files, bubble by bubble, are routinely made throughout scanning runs. The result of these precautions, from the layout of the form to the daily vigilance of our operators, provides the highest levels of accuracy in the data that we report.

CTB uses Scantron 5000i scanners for their speed, accuracy, and volume capacity capabilities. During scanning, we collect both bubble data (Optical Mark Recognition, or OMR) as well as capture document images to facilitate the handscoring of open-ended responses. All score and demographic data are captured from documents and fed to CTB’s Winscore System for the document editing process that is reliant on key-entry clean-up of the bubble and barcode data. Our document scanning and editing process incorporates the following validation:

Data clean-up and key entry (correction of bubbling errors) is done from images associated in our system with each error and automatically displayed at the key-entry workstation, so there is no chance of a key entry operator key-entering from the wrong book.

We use proprietary and patented OMR software that is able to correct skew, stretch, or shift of a sheet due to paper motion while passing through the scanner or due to inaccurate printing. This software uses multiple (usually four) anchor marks printed on the sheet to establish with complete certainty the location of bubble positions, even if the sheet has been distorted by humidity.

The Scantron 5000i scanners come equipped with software that completes industry-standard checks for various problems that would indicate possible scanner problems. In addition, CTB’s proprietary software adds a series of image reliability checks.


Every 5,000 sheets, we scan a diagnostic sheet to check the correct operation of the scanner. If the bubbles on the diagnostic sheet register at the wrong levels, the scanner will refuse to scan until a sheet passes the test.

A Scantron field engineer is on site at our scanning center for every shift to ensure immediate resolution of any issue that may arise.

We use industry-standard mark resolution logic to determine the intended bubble or mark by a student (from among dark and light marks).

Group and stack information is captured through header sheets.

The scanner software distinguishes between hand-bubbled and machine-printed bubbles and holds the machine-printed ones to a higher standard of darkness.

Possible erasures are captured at the scanner along with the darker, valid, and intended bubbles. This information is captured and passed to our WinScore system to support the Erasure Analysis process that is required for some of our contracts (a sketch of this mark-resolution and erasure-capture logic follows this list).
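The following simplified Python sketch illustrates the general idea of mark resolution and erasure capture; the darkness thresholds and data layout are assumptions for illustration only and are not CTB's patented OMR software.

from typing import Dict, List, Optional, Tuple

DARK_THRESHOLD = 0.60     # minimum darkness treated as an intended mark (assumed value)
ERASURE_THRESHOLD = 0.25  # lighter residue above this level is flagged as a possible erasure (assumed)

def resolve_marks(darkness: Dict[str, float]) -> Tuple[Optional[str], List[str]]:
    # Given darkness readings (0.0-1.0) per answer choice for one item,
    # return the intended choice and any possible erasures.
    if not darkness:
        return None, []
    choice, level = max(darkness.items(), key=lambda kv: kv[1])
    intended = choice if level >= DARK_THRESHOLD else None
    erasures = [c for c, d in darkness.items()
                if c != choice and ERASURE_THRESHOLD <= d < DARK_THRESHOLD]
    return intended, erasures

# A dark "C" with faint residue on "B" yields ('C', ['B']); the residue is
# passed along for downstream erasure analysis.
print(resolve_marks({"A": 0.05, "B": 0.32, "C": 0.88, "D": 0.02}))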

Document Editing

Raw scoring and editing of scanned data, such as answer documents and headers, is performed in CTB’s client/server system, WinScore, where a sophisticated system of edits can be invoked to review the integrity of each batch scanned and produce a list of suspected errors. While Editors can view data from any document online, the “error suspect list” focuses Editors on the most likely problems based on predefined guidelines that will be approved by NEAC. This system reduces editing time and provides a high degree of quality assurance.

CTB has continued to enhance the capability of our editing software to simplify the detection and correction of errors. Online editing screens focus Editors on potential problems and then provide related information. The actual scanned documents are always available to the Editor in the case that a visual verification or hand-check of the document is needed. The software supports the review and correction of any field in the scanned record; student errors affecting the reliability of the data (including double-grids, blank responses, incorrect student identifiers, and damaged documents) are flagged, pulled, and reviewed, and the records are corrected. The operator is guided through each error in a particular job in a sequential manner. Entry and verification of the necessary corrections are also enhanced so that we are sure that each error is actually corrected. As batches are extracted for scoring, final edits are performed so that all requirements for scoring have been met. This automated final edit will flag a batch for further editing; if any error is detected, the batch containing the errors cannot be extracted for reporting. CTB concentrates its intensive editing capabilities into a process that uses a powerful client/server system plus comprehensive software support. The result is a system design that ensures the accuracy of the optical scanning operation.
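For illustration, the sketch below shows the kinds of edit rules that might populate an error suspect list; the field names, SSID format, and messages are hypothetical and would be replaced by the NEAC-approved editing guidelines.

from typing import List

def edit_checks(record: dict) -> List[str]:
    # Return a list of suspected errors for one scanned student record.
    suspects = []
    for item_id, marks in record.get("responses", {}).items():
        if len(marks) > 1:
            suspects.append("double-grid on item " + item_id)
        if len(marks) == 0:
            suspects.append("blank response on item " + item_id)
    ssid = record.get("ssid", "")
    if not (ssid.isdigit() and len(ssid) == 10):  # assumed 10-digit SSID format
        suspects.append("invalid or missing student identifier")
    if record.get("damaged_pages"):
        suspects.append("damaged document pages: " + ", ".join(record["damaged_pages"]))
    return suspects

record = {"ssid": "12345", "responses": {"7": ["B", "C"], "8": []}, "damaged_pages": []}
print(edit_checks(record))
# ['double-grid on item 7', 'blank response on item 8', 'invalid or missing student identifier']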

CTB’s extensive operational scoring experience has allowed us to institute standardized document handling and scoring procedures to ensure that student responses (scanned and imaged) are linked to the correct student, school, and county. Once in WinScore, scanned information can be decoded and scored. The scoring process assigns scores to all the responses to the questions. A completed job is then exported to our mainframe scoring and reporting system for further processing. The export process extracts the raw score data for each case and puts it in a binary file in a proprietary format acceptable by the software that performs further processing on this data. Our Derived Score Processor (DSP) program will read the student’s items, determine which items are correct, and calculate the raw score for each test section. This program uses scoring parameter tables and internal algorithms to assign scores to test sections.

Accuracy and Reliability of CTB's Scoring Systems

CTB conducts quality control checks at all phases of test processing, scoring, analysis, and data reporting. Established procedures require all staff members and subject matter experts to complete their tasks according to strict quality assurance regulations to ensure accuracy and reliability in the scoring of all assessments.

CTB will conduct comprehensive test processing validation procedures to ensure that all contract-specific requirements are clearly understood and have been translated into comprehensive repeatable work instructions. Through our test deck processing, CTB ensures that all scoring software and systems, as well as the processing tasks completed by trained scoring staff members, meet program requirements.

Test Security

A variety of events can take place during test administration that constitute a security breach and that must be discovered. Events ranging from student copying to students receiving inappropriate assistance before or during testing can be identified through systematic as well as human checks. The two primary means to ensure that security has been maintained and the integrity of students' work preserved occur during (1) scanning and (2) handscoring. During scanning, our scanning and WinScore systems can identify and capture student response patterns for closer analysis during an erasure analysis. During erasure analysis, two sets of erasures are analyzed: all erasures, and wrong-to-right erasures, where an incorrect answer choice was erased and replaced with the correct answer choice. In our experience, it is important to note that the results of erasure analysis should be used only to facilitate identification of systematic problems within individual schools; these types of analyses must be supported by additional, collateral information. If NEAC is interested in exploring the use of erasure analysis as a component of its test security procedures, we would be pleased to work with NEAC to implement this analysis as part of the program scope.
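As a simplified illustration of the school-level screening described above (the record layout and names are hypothetical), erasure counts and wrong-to-right erasure counts could be aggregated as follows:

from collections import defaultdict
from typing import Dict, List

def summarize_erasures(erasure_records: List[dict],
                       answer_key: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    # Each record carries school_id, item_id, erased_choice, and final_choice.
    summary: Dict[str, Dict[str, int]] = defaultdict(lambda: {"total": 0, "wrong_to_right": 0})
    for rec in erasure_records:
        school = summary[rec["school_id"]]
        school["total"] += 1
        correct = answer_key[rec["item_id"]]
        if rec["erased_choice"] != correct and rec["final_choice"] == correct:
            school["wrong_to_right"] += 1
    return dict(summary)

key = {"12": "C"}
records = [{"school_id": "0042", "item_id": "12", "erased_choice": "A", "final_choice": "C"}]
print(summarize_erasures(records, key))  # {'0042': {'total': 1, 'wrong_to_right': 1}}

Consistent with the caution above, elevated counts from such a summary would be treated only as a flag for further review alongside collateral information.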

CTB will develop, obtain approval for, and document the overarching process and associated policies for the handling of all forms of testing irregularities. During the handscoring and rubric validation process, our readers are trained to watch for indications of “troubled students” and/or cheating. Such information can require urgent attention prior to the completion of handscoring. We have a well-established escalation process in place to immediately identify and begin the notification process for any student response that may be evidence of a security breach or that is a response of a sensitive nature.

Validity and Reliability across the Paper and Online Administrations

CTB understands that Smarter Balanced intends to conduct substantial research studies in support of the Smarter Balanced validity framework and the use of effective accommodations and supports for students. Given our extensive experience, CTB would be prepared to conduct and report the results of research studies to evaluate evidence for validity claims for scores produced under accommodated circumstances as well as to address validity and reliability claims regarding the use of (non-adaptive) paper-based and (adaptive) online administrations. We are willing to support Smarter Balanced in its efforts in flexible ways:

Provide the data to Smarter Balanced
Arrange separately to conduct the validity and reliability studies at CTB
Arrange to support Smarter Balanced in a review or replication of these studies

In particular, CTB will work with school districts that are not prepared to support computer-based testing.

Items will be evaluated for evidence of comparability between paper and online administrations. A standard methodology to investigate the comparability of paper and online administrations is to make use of intact forms for each mode of administration and to spiral the mode of administration so that the groups of students for each mode are randomly equivalent. This methodology is not appropriate in the current context because it would require the construction of a linear computer-based form for each paper-based form. In addition, it may require the administration of both forms outside the regular assessment administration. As an alternative, CTB proposes a process to investigate the comparability across modes that does not require an additional sample of students.

For each paper-based form, the items will be evaluated through industry-standard psychometric procedures when sufficient sample sizes become available. The items will be calibrated using the same IRT models that will be used to calibrate the computerized items. Next, the paper-based item parameters will be linked to the item parameters of the corresponding computer-based items. The item types supporting both modes of administration without modification (e.g., multiple-choice items) are hypothesized to be most likely to be comparable across modes and hence can serve as the initial anchor items. The anchor set will then be purified iteratively in subsequent stages by reviewing the item characteristic curves. Items with large discrepancies in characteristic curves will be reviewed and unanchored (removed from the anchor set), and the linking method will be carried out once more for the final linking. This method is predicated on the assumption that some items can serve as anchors even though they are rendered in different modes. Despite this assumption, the method has been used in practice to help identify items and item types that are prone to administration mode effects and to provide a basis for potential score adjustments. All efforts will be made to provide comparable and psychometrically defensible scores across the paper and online administrations.
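To make the anchor-purification idea concrete, the following is a minimal Python sketch using a 2PL model, a mean/sigma transformation, and an average absolute difference between item characteristic curves as the discrepancy measure; the model choice, linking method, threshold, and all names are simplifying assumptions for illustration and are not the operational Smarter Balanced procedure.

import numpy as np

def icc(theta, a, b):
    # Two-parameter logistic item characteristic curve.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def mean_sigma_transform(b_paper, b_online):
    # Slope A and intercept B that place the paper scale onto the online scale.
    A = np.std(b_online) / np.std(b_paper)
    B = np.mean(b_online) - A * np.mean(b_paper)
    return A, B

def purify_anchors(a_p, b_p, a_o, b_o, max_area=0.10, max_iter=10):
    # Iteratively drop anchor items whose transformed paper ICC differs from the
    # online ICC by more than max_area (average absolute difference over theta).
    theta = np.linspace(-4, 4, 161)
    anchors = np.arange(len(b_p))
    for _ in range(max_iter):
        A, B = mean_sigma_transform(b_p[anchors], b_o[anchors])
        area = np.array([np.mean(np.abs(icc(theta, a_o[i], b_o[i]) -
                                        icc(theta, a_p[i] / A, A * b_p[i] + B)))
                         for i in anchors])
        keep = anchors[area <= max_area]
        if len(keep) == len(anchors):
            break
        anchors = keep
    return anchors, mean_sigma_transform(b_p[anchors], b_o[anchors])

# Example with simulated parameters for ten candidate anchor items.
rng = np.random.default_rng(0)
b_online = rng.normal(0, 1, 10)
a_online = rng.uniform(0.8, 1.6, 10)
b_paper = 0.9 * b_online - 0.2 + rng.normal(0, 0.05, 10)  # shifted/rescaled paper scale
a_paper = a_online / 0.9
b_paper[3] += 1.0                                          # simulate one mode-affected item
anchors, (A, B) = purify_anchors(a_paper, b_paper, a_online, b_online)
print("retained anchors:", anchors, "slope:", round(A, 3), "intercept:", round(B, 3))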

CTB has extensive and highly successful experience providing states with appropriate evidence related to the Standards for Educational and Psychological Testing (APA, AERA, & NCME, 1999)8 as well as providing direct evidence in support of peer review. CTB designs technical reports to support the peer review process, and our customers commonly pass peer review. We understand that federal law is in transition and that changes to the peer review process may occur over the course of this contract. CTB will work with Smarter Balanced and the NEAC states to provide technical documentation and statistical data for peer reviews and to support accountability reporting.

8.6 Security, Chain of Custody and Data Forensics

8.6.1 Test Security

The following tasks are primarily the responsibility of the contractor, but will also require direct involvement of the Project Management Team. Proposers will describe how they will address the following tasks and responsibilities:

8.6.1.1 Develop and implement a comprehensive plan to ensure the security of test items, materials, and student data.

8.6.1.2 Develop and implement training procedures and materials regarding test security, and confidentiality of student data and personally identifiable information

8.6.1.3 Develop and implement uniform policies and procedures for identifying and dealing with possible security breaches and testing irregularities

8.6.1.4 Develop and implement procedures to account for and protect secure materials at all stages of distribution, receipt, storage, and return. Note: This requirement has general implications, but applies specifically to paper-based test forms.

The integrity of the Smarter Balanced Assessment system is directly dependent on all member states and participating contractors to ensure the security of test items, test materials, and student data. This will require all stakeholders to have a comprehensive security plan and protocols at the core of their program implementation plan and to articulate clearly how that plan relates to all aspects of the program, from system development to data reporting.

8 American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.

System Security

Our Test Delivery Platform provides advanced security protocols and techniques to protect both test content and student data. The test ticket contains a disposable username, password, and test access code for each student. The combination of those three credentials places a student into a test session.

For proctors, there is also a summary test ticket that shows the test access codes. Note that if there are multiple sections to the test, there is an access code per sub-test/section; this is a configurable option in the assignment creation process.

Additionally, the test assignment process lets users create ‘windows’ during which assessments are available. These windows can be adjusted to the minute; if a student logs in outside the window, he or she will receive an error message stating that the assessment is not available at this time. This prevents students and administrators from accessing the content of the assessment outside the school day. For added security, the content of the assessment is not available to administrators within the platform at any time. Users may not print out assessments. All secure documents must be ordered and shipped from CTB.
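A minimal sketch of the minute-level window check (the function name, stored window values, and message text are hypothetical):

from datetime import datetime

def check_window(now: datetime, window_open: datetime, window_close: datetime) -> str:
    # Return 'ok' if the login falls inside the assignment window; otherwise
    # return the error message shown to the student.
    if window_open <= now <= window_close:
        return "ok"
    return "The assessment is not available at this time."

# A login at 6:05 p.m. is rejected for a window that closes at 3:30 p.m.
print(check_window(datetime(2015, 4, 14, 18, 5),
                   datetime(2015, 4, 14, 8, 0),
                   datetime(2015, 4, 14, 15, 30)))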

No content meta-data, including but not limited to the correct answers for items, are stored on the client workstation. All scoring is performed on the TDP servers.

Security Plan and Server Robustness

The online testing system and its sub-applications adhere to strict, industry-standard security procedures. Best practices are also followed for physical access, intrusion protection, and virus protection. In addition, each element of the system is redundant, minimizing the risk of failure.

System Security

All CTB systems meet strict standards for security of the data and the systems themselves.

Access to production servers is strictly limited to a few network and senior software engineers. Each engineer, and his or her level of access, must be approved by McGraw-Hill Education’s (MHE) Corporate Infrastructure Security team and MHE's infrastructure team. Access to the servers requires at least a ten-character password that incorporates three of the following four character types: mixed case, alphabetical, numerical, and symbolic. After access rights have been granted, any changes to the production servers must be approved.
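For illustration, the stated password rule can be expressed as a simple check; here the four character classes are interpreted as uppercase, lowercase, numeric, and symbolic (an assumption), and the function is a sketch rather than MHE's actual security tooling.

import string

def meets_password_policy(password: str) -> bool:
    # At least ten characters drawn from at least three of four character classes.
    classes = [
        any(c.islower() for c in password),             # lowercase
        any(c.isupper() for c in password),             # uppercase
        any(c.isdigit() for c in password),             # numeric
        any(c in string.punctuation for c in password)  # symbolic
    ]
    return len(password) >= 10 and sum(classes) >= 3

print(meets_password_policy("Winscore#2014"))  # True
print(meets_password_policy("shortpass"))      # False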

Security of test items and student information is maintained at all times, with security procedures acting at three levels:

1. Physical security, preventing access to the machines on which data reside or are processed
2. Network security, including protection of CTB networks from infiltration and secure transmission of data to CTB networks and others
3. Software security, ensuring that only authorized users access information on CTB systems and that their access is limited to only the information that they are authorized to view

Below, we describe key security procedures that will protect NEAC member states’ items, ensure confidentiality and privacy, and comply with state laws, the Family Educational Rights and Privacy Act (FERPA), and other federal laws.

Physical Security

For the test administration software, data will reside on servers in the cloud; CTB’s hosting provider maintains 24-hour surveillance of both the interior and exterior of its facilities. All access is keycard controlled, and sensitive areas require biometric scanning. McGraw-Hill Education hosts all other servers for CTB at its location in East Windsor, New Jersey, with a secondary data center in Secaucus, New Jersey.

Access credentials are assigned only for authorized data center personnel, and only they have access to the data centers. Visitors’ identities are verified, and visitors are escorted at all times while in the facility.

All data center employees undergo multiple background security checks before they are hired.

Secure data will be processed at CTB facilities and will be accessed from CTB machines. As noted, access to facilities is keycard controlled. Visitors must sign in and be escorted while in all data centers. All servers are in a secure, climate-controlled location with access codes required for entry. Access to servers is limited to network engineers, all of whom, like all employees, have undergone rigorous background checks.

Staff members at both McGraw-Hill Education and our cloud service provider receive formal training in security procedures to ensure that they know and implement the procedures properly.

CTB protects data from accidental loss through redundant storage, backup procedures, and secure off-site storage.

Network Security

Hardware firewalls protect all networks from intrusion. They are installed and configured to block access to services other than HTTPS for secure sites. The firewalls provide a first level of defense against intrusion, backed up by a capable second line: hardware and software intrusion detection and remediation.

Intrusion detection systems constantly monitor network traffic and raise alerts for suspicious or unusual network traffic.

All companies hosting CTB systems maintain security and access logs that are regularly audited for login failures; such failures may indicate intrusion attempts. Suspicious log entries are investigated and resolved.

All secure data transmitted across the public Internet are encrypted using SSH (AES), FIPS 140-2, or an IPSec VPN. Secure Web sites encrypt data using 128-bit SSL public key encryption.

The hosting environment is protected by an Intrusion Prevention System (IPS) appliance at the perimeter. The IPS appliance combines intrusion protection and vulnerability management technology into a single integrated solution that offers both proactive and reactive protection from the latest threats.

Software Security

All secure Web sites and software systems enforce role-based security models that protect individual privacy and confidentiality in a manner consistent with state laws, FERPA, and other federal laws. All systems implement sophisticated, configurable privacy rules that can limit access to data to only appropriately authorized personnel.

Different states interpret FERPA differently, and CTB supports customized interpretations. The system is designed to support these interpretations flexibly. Some states limit a school’s access to data collected while the student attends the school, limiting access to historical data for students who transfer into the school. Other states provide the full history of data to the school or teacher who has jurisdiction over the student at any point in time. Similarly, while some states provide each teacher with access to information about all students in the teacher’s school, other states limit access to those students to whom the teacher provides instruction. CTB systems can be configured to support all these scenarios and more.


Secure transmission and password-protected access are basic features of the current system and ensure authorized data access. All aspects of the system, including item development and review, test delivery, and reporting, are secured by password-protected logins.

CTB systems use permission-based security models that ensure that users access only the data to which they are entitled and that limit their ability to change those data according to their rights.

Security Audits

CTB conducts periodic security audits, including an electronic security audit of every new system deployed. McGraw-Hill’s Corporate Information Security (CIS) team conducts the security audit. Security personnel pose challenges to the software systems to emulate security threats.

Server Robustness

Fault Tolerance through Redundancy

The components of the TDP are architected for high-availability. The system will withstand failure of any component with little or no interruption of service. One way that our platform achieves this robustness is through redundancy. Key redundant systems include:

The test administration subsystem's hosting provider has redundant power generators that can continue to operate for up to 60 hours without refueling. With the multiple refueling contracts in place, the generators can operate indefinitely. The provider maintains an n + 1 configuration of 16 diesel generators that, at maximum capacity, can supply up to 2.0 megawatts of power each.

Hosting providers have multiple redundancies in the flow of information to and from data centers by partnering with nine different network providers. Each fiber carrier must enter the data center at separate physical points, protecting the data center from a complete service failure caused by an unlikely network cable cut.

Every installation is served by multiple Web servers, any one of which can take over for an individual test upon failure of another.

Active/passive clusters of database servers are configured so that the passive node takes over in the event of failure of the active node.

Each database server in a cluster has dual connections to the disk arrays containing the system data. Each disk array is internally redundant, with multiple disks containing each data element. Failure of any individual disk is recovered from immediately by accessing the redundant data on another disk.

Archiving and Backup

Data are protected by nightly backups. We complete a full weekly backup and incremental backups nightly. The systems are run with full transaction logging, enabling us to restore the system to its state immediately prior to a catastrophic event.

The server backup agents send alerts to notify the system administration staff in the event of a backup error, at which time they will inspect the error to determine whether the backup was successful or they will need to rerun the backup.

All backup media are stored in a secure, fireproof, off-site location. Our hosting provider ensures that all media sent off-site are shipped in locked, water-resistant, and impact-resistant containers. The off-site vendor does not have direct access to any individual medium containing customer data at any point during transport. The off-site storage location is audited on a regular basis to ensure that physical security, media management, and location tracking meet the hosting provider’s industry-standard guidelines.

Test Administration Data Recovery

Restoration audits are completed to verify the recoverability of data from the backup sets. These audits involve the restoration of data from a random system in the hosting environment along with integrity checks of the data to ensure that they are intact.


In the event of a catastrophic failure, data are restored from the full backup. Incremental backups are then restored to return the system to its state on the previous night. Finally, full transaction logging allows us to repeat any database transactions prior to the catastrophic event, returning the database to its state immediately prior to the event. Specific procedures for recovering data to a database and to a file system are described below.

Recovery Procedure (Database)

The database recovery procedure would be used to recover from an unlikely database hard-disk failure that resulted in the loss of the logical data drive. The procedure involves six steps:

1. Determine the last full and differential backups
2. Perform a tail-log backup
3. Restore the full backup
4. Restore the differential backup
5. Restore the tail-log
6. Verify the integrity of the database
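The ordering logic behind these six steps can be sketched as follows; the backup metadata layout is hypothetical, and the actual restores are performed by the database platform's own tooling rather than by this code.

from datetime import datetime
from typing import List

def plan_restore(backups: List[dict]) -> List[dict]:
    # Select the most recent full backup, the most recent differential taken
    # after it, and finish with a tail-log restore before integrity checks.
    fulls = [b for b in backups if b["type"] == "full"]
    last_full = max(fulls, key=lambda b: b["taken_at"])
    diffs = [b for b in backups
             if b["type"] == "differential" and b["taken_at"] > last_full["taken_at"]]
    plan = [last_full]
    if diffs:
        plan.append(max(diffs, key=lambda b: b["taken_at"]))
    plan.append({"type": "tail_log", "taken_at": datetime.now()})
    return plan

backups = [
    {"type": "full", "taken_at": datetime(2015, 3, 1, 1, 0)},
    {"type": "differential", "taken_at": datetime(2015, 3, 3, 1, 0)},
    {"type": "differential", "taken_at": datetime(2015, 3, 4, 1, 0)},
]
print([b["type"] for b in plan_restore(backups)])  # ['full', 'differential', 'tail_log']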

Test Security Training and Resources

To ensure fidelity to security procedures, it is imperative that the processes to keep item, test, and student data secure are clearly articulated for each administration. CTB will work with NEAC to include specific security training in the four regional trainings as a critical area of responsibility for test administrators and educators. We will use the security procedures outlined in the Operational Test Administration Manual and resources provided by the Smarter Balanced Assessment Consortium as the foundation for the training, but we will add NEAC context and local elements to the training and resource materials. It is our vision that the security training included in these regional workshops will provide instructional and testing staff members with the complete process for proctoring the computer-based tests (and paper-based tests) and carrying out other related administrative duties, ensuring that test security and standardized protocols are followed.

8.6.2 Chain of Custody

8.6.2.1 Develop and implement processes and procedures the Contractor will use to ensure the security, integrity, and accuracy of materials shipped, transported, and received while maintaining chain of custody.

8.6.2.2 Develop and implement policies, guidelines and sign-off procedures for State, District, and School officials to establish and document a chain of custody for hand-offs to ensure that documents are received, accounted for, and distributed and returned.

CTB will implement the processes necessary to ensure the integrity of the chain of custody as this applies to all test material reviews and paper-pencil test materials. As discussed above, CTB's management team will monitor all shipments and deliveries through customer receipt and signature at the Department, districts, schools/sites to ensure that all materials are delivered in a timely and accurate manner. All our carriers provide electronic scheduling and tracking to ensure 100 percent accuracy of shipments for this program.

In addition, the Project Director will be responsible for implementing a customer approval process, including a customer sign-off form, to guide the chain-of-custody process through development and final review of all documents and deliverables. This critical project management tool will ensure that all critical decision-makers within NEAC have the opportunity to review and approve critical deliverables and documents, and it creates the means to track, archive, and trigger any subsequent customer approval prior to making any requested changes.


8.6.3 Data Forensics

8.6.3.1 For on-line assessments, describe plans and procedures to provide continuous updates that capture a variety of data including but not limited to:
8.6.3.1.1 time of testing;
8.6.3.1.2 all student answer choices including the final choice used for scoring;
8.6.3.1.3 response latency;
8.6.3.1.4 tracking the movement of the examinee through the test;
8.6.3.1.5 student response times;
8.6.3.1.6 accessibility options used by the student; and
8.6.3.1.7 analysis of student gains over time.

CTB's test delivery system (TDS) captures a significant amount of data during a test administration for the purposes of supporting an adaptive administration of content to the student, conducting post-hoc analyses such as item-specific analyses or checks for test irregularities (see Section 8.8.1.14), and reporting student results. All students are identified through a unique student identifier (SSID), and all data, including student demographic and performance data, are associated with the student record through this identifier.

Each test administration is scheduled according to a student population’s specific needs based on data obtained from student information systems and acquired from previous administrations. An individual student can be required to test within that testing session, or the system can simply identify an assessment as overdue in operational reports. Regardless of the constraints applied to an assessment, all assessments taken by students are tracked using start-time and stop-time for each session, with elapsed time tracked for the overall test session.

Currently, navigation, key usage, or mouse movements that do not result in a response are not tracked. Since the test is administered in adaptive mode, the next item typically requires that an answer be submitted for the previous item and that a related update of the student’s ability estimate be calculated. Under adaptive mode, it is generally not advisable to allow back navigation and changes to answers to previous items. A clever student could use such an option to answer the initial items incorrectly, receive easier items, and later correct the earlier answers to obtain an inflated final score. Within limited sections, such as an item set related to one joined stimulus, this risk might be acceptable, but for the moment CTB does not support back navigation. Thus, items will always be answered in the sequence that they are presented and, once submitted, answers cannot be changed. We track the final answer to which the student commits, which is then used for scoring. All responses submitted by a student are time-stamped and allow for an analysis of the elapsed time for each response (see Section 8.8.1.14 for a detailed outline of our response-time analysis that is used to identify test irregularities).
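As an illustration of the timestamped response data described above (the record layout is hypothetical), per-item latency and overall elapsed time can be derived directly from the logged timestamps:

from datetime import datetime
from typing import List

def response_latencies(events: List[dict]) -> dict:
    # events: ordered list of {'item_id', 'presented_at', 'answered_at', 'final_answer'}.
    latencies = {e["item_id"]: (e["answered_at"] - e["presented_at"]).total_seconds()
                 for e in events}
    session_elapsed = (events[-1]["answered_at"] - events[0]["presented_at"]).total_seconds()
    return {"per_item_seconds": latencies, "session_elapsed_seconds": session_elapsed}

events = [
    {"item_id": "I1", "presented_at": datetime(2015, 4, 14, 9, 0, 0),
     "answered_at": datetime(2015, 4, 14, 9, 1, 15), "final_answer": "B"},
    {"item_id": "I2", "presented_at": datetime(2015, 4, 14, 9, 1, 16),
     "answered_at": datetime(2015, 4, 14, 9, 3, 2), "final_answer": "D"},
]
print(response_latencies(events))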

Requirements for accessibility options that are associated with a student are used to schedule assessments with appropriate accommodations. The TDS will use the student’s accessibility requirements to ensure that an adaptive assessment will not deliver an item that does not meet the student’s specific needs.

Student performance on assessments is stored and used to generate longitudinal reports that form the basis for identifying gains in performance on both tests and skills at the individual, group/class, school, or higher aggregate level. Performance gains are also one of the factors we analyze to identify potential test irregularities.


8.7 Test Administration

The following tasks are primarily the responsibility of the contractor, but will also require direct involvement of the Project Management Team. Proposers will describe how they will address the following tasks and responsibilities:

8.7.1 Utilizing information about the testing window provided by Smarter Balanced, the Contractor will identify and publish an annual calendar of the assessment window well in advance of testing. Each state is responsible for setting any limits or modifications to the testing window as required by legislation or other factors.

8.7.2 The Contractor will develop and publish guidelines on how, when, and what materials, including student-level directions for administration, should be made available prior to the administration window.

8.7.3 The Contractor will develop and publish a protocol for preparing the testing environment, to be included as a part of the procedure manuals and training.

8.7.4 The Contractor will develop and maintain a secure database of District Test Coordinator contact information.

CTB will work with NEAC to develop, publish, and communicate the full NEAC testing calendar, inclusive of all key dates for the preparation and administration of the NEAC assessment. The calendar will be established with NEAC, and communication will begin in August 2014 to allow LEAs to plan for the spring testing season. Any local limits or modifications to the full testing calendar will be identified and clearly communicated as part of the test administration calendar.

CTB will maintain responsibility for publishing all materials and resources necessary to communicate test administration protocols that must be followed for LEAs to complete the test administration, including test administration manuals and other ancillary guidelines.

CTB will provide all Test Administrator Manuals (TAM) required for the operational administration. As a result of the Smarter Balanced Cross-Contractor collaboration group for the Smarter Balanced Field Test, our program teams have deep familiarity with the details of the Field Test administration and will leverage this knowledge in updating and developing the operational manuals. As noted, we will begin with the manuals developed under Smarter Balanced administration contract 19b and revise these documents as appropriate to provide accurate information for the operational administration. Manuals will be developed by grade or grade-band if required by the details of the administration, and they will include administration instructions, time requirements, and scripts (student-level directions), as appropriate.

CTB will develop and maintain the NEAC District Test Coordinator database that will include all Test Coordinator contact information.

8.8 Scoring

8.8.1 General Scoring Requirements

8.8.1.1 The Proposer will describe a process for ensuring the accuracy, reliability, and confidentiality of scoring for open-ended responses, including a process that provides consistent and accurate hand-scoring.

8.8.1.2 The Proposer’s response will include a description of the qualification and experience of the scorers proposed for the NEAC tests and a rationale for the proposal.

8.8.1.3 The Proposer’s response will provide details on the processes and procedures used to train scorers and qualify scorers for participation in scoring.

8.8.1.4 The Proposer’s response will provide details on the quality control processes used to monitor scoring rates and accuracy. The response will also provide details on processes used to identify scorers for retraining or removal and processes used to invalidate scores from particular scorers. This should include rate of double-scoring, selection of responses for double scoring, etc.

8.8.1.5 The Proposer will outline policies for the type and frequency of information provided from the scoring process (within and across scoring sites) to the states.

8.8.1.6 The states’ project management shall have the right to request “on-demand,” within four hours, any regular scoring report and to monitor activities at scoring sites.

8.8.1.7 The Proposer will provide for representatives or agents from the States to be present at the scoring site(s) during scoring qualification, training, and initial scoring. The contractor’s response must discuss the issues involved in making such oversight possible with a very limited staff.

8.8.1.8 The Proposer will produce a document summarizing the scoring process for the current year that includes information described in tasks 1 through 7 above.

8.8.1.9 The Proposer will describe a process for identifying, evaluating and informing the states about “crisis papers” (e.g., student responses that contain disturbing content).

8.8.1.10 The Proposer will provide a plan that delineates the process for rescoring, late batch scoring, and score verification requests

8.8.1.11 The Proposer will describe a plan for resolving requests for rescoring hand-scored open-ended responses

8.8.1.12 The Proposer will describe the procedures and safeguards established for the scoring process that ensure confidentiality is maintained and student identity is securely controlled.

8.8.1.13 The Proposer will describe the processes that will be established to perform hand-scoring verifications of machine-scored items that are included on the test.

8.8.1.14 The Proposer will describe data forensic procedures that will be used to identify cheating and/or other irregularities in test administration and student response.

8.8.1.15 The Proposer will describe a process and procedures, including fees that may be assessed, for rescoring requests from individuals other than the states’ representatives, as well as dispute resolution related to scoring.

Handscoring Experience

Our proposed handscoring team offers NEAC extensive and unparalleled handscoring experience. Since 1985, CTB has excelled in assessing student work to provide valuable information to educators that can enhance their instruction and, thus, student learning. Since then, CTB has continued to improve our handscoring through ongoing refinements in our scoring and score validation processes. Through our work handscoring the Pilot and Field Test administrations as the prime contractor for Smarter Balanced, CTB has moved forward as an industry leader in handscoring to support a computer-adaptive (CAT) item pool and the new Smarter Balanced Performance Tasks in ELA and mathematics.

There are several critical differences in rubric development, training and qualification of readers, handscoring materials development, and handscoring quality monitoring when handscoring a large CAT item pool compared with what has traditionally been done to support more limited fixed-form item banks. Attending to these differences is crucial to ensure that handscoring contractors can support the scoring of the new assessments as NEAC:

1. Administers the Smarter Balanced CAT item pool where the exposure rate of each constructed-response item is much lower than it would be in a fixed-form test

2. Supports districts that administer the test across the full, extended testing window and need to see results as soon after testing as possible

CTB’s experience as a leader in conducting the Smarter Balanced rangefinding, reader training, and handscoring materials development will provide significant benefit to NEAC as member states move to operational testing in 2015 and beyond.


Procedures to Ensure Scoring Quality and Reliability

Scorer Qualifications, Recruitment, and Training

CTB will ensure that all scoring supervisors, team leaders, and readers possess a bachelor’s degree or higher and that all staff members recruited and hired as readers are also degreed. CTB has instituted a comprehensive, standardized training model to direct the training of our Handscoring staff. Our commitment to the training and qualification process stems from the knowledge that the consistency and reliability of the scores assigned is directly aligned with the quality of the training the readers receive. In our model, team leaders are trained prior to readers. Team leaders are extremely critical in the process of scoring accurately, as they must shepherd a team of readers throughout live scoring, ensuring consistency within the team and alignment to the scoring guidelines and protocols.

Training for readers and team leaders is led by Handscoring supervisors and follows the process below:

An overview of the training process, types of materials that will be used, terminology of our industry, and the specific assessment being trained.

An introduction to and review of the items/performance task and any supplementary materials, such as reading passages, etc.

An introduction of the scoring guide for each item in the assigned Rater Item Block (RIB) set with a focus on the rubric and corresponding anchor papers. These will be a key resource for readers throughout scoring. It is our general practice to provide the scoring guides electronically to readers.

An introduction to the training set for an item.
A review of the training set using standardized annotations to explain the scoring rationale, referring readers back to the item’s rubric and anchor papers.

After the training sets have been completed for all items, the qualification process can begin. Training is conducted online in CTB’s scoring system. Our Handscoring supervisors and team leaders will determine whether a reader qualifies upon the reader’s completion of the set. CTB Handscoring will apply the following qualification standards (a brief computational sketch appears below):

For four-point items—non-adjacent agreement rate of less than five percent and perfect agreement rate of 80 percent

For three-point items—non-adjacent agreement rate of less than five percent and perfect agreement rate of 80 percent

For two-point items—perfect agreement rate of 90 percent

For one-point items—perfect agreement rate of 100 percent

Readers must also pass a Condition Code Qualifying set in order to be eligible to score. Only qualified readers are assigned to score student responses and receive project-specific training on such issues as the handling of alert or sensitive papers. CTB's daily supervision of all Handscoring staff members will continue once scoring begins and will follow quality assurance procedures as described below.
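
For illustration, the qualification thresholds above can be expressed as a simple programmatic check. The sketch below uses hypothetical reader and anchor score lists and is not drawn from CTB's scoring system.

```python
# Illustrative check of the qualification thresholds described above.
# Reader and anchor score lists are hypothetical examples.

def meets_qualification(reader_scores, anchor_scores, max_points):
    """Return True if a reader's qualification-set performance meets the thresholds."""
    if not reader_scores or len(reader_scores) != len(anchor_scores):
        raise ValueError("score lists must be non-empty and the same length")
    n = len(reader_scores)
    perfect = sum(r == a for r, a in zip(reader_scores, anchor_scores)) / n
    non_adjacent = sum(abs(r - a) > 1 for r, a in zip(reader_scores, anchor_scores)) / n

    if max_points >= 3:            # three- and four-point items
        return non_adjacent < 0.05 and perfect >= 0.80
    if max_points == 2:            # two-point items
        return perfect >= 0.90
    return perfect >= 1.00         # one-point items require perfect agreement

# Hypothetical qualification set for a four-point item: 90% exact, no non-adjacent scores.
reader = [3, 2, 4, 1, 3, 2, 0, 4, 3, 2]
anchor = [3, 2, 4, 1, 3, 3, 0, 4, 3, 2]
print(meets_qualification(reader, anchor, max_points=4))  # True
```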

Our experienced staff recognizes the importance of clarifying and understanding the intent of each score point assigned to each paper.

Checks and Controls The scoring of all performance items must be highly reliable to ensure that each student's responses receive a fair, consistent, and accurate score. To this end, CTB will employ a variety of ongoing checks and controls, including systematic administration of intra-rater reliability reads (including validity papers/check-sets and read-behinds) as well as inter-rater reliability monitoring. A fifteen percent blind double-read will also be used with ongoing read-behinds to further validate scoring accuracy. To ensure the accuracy and reliability of the handscoring, CTB will institute a series of quality processing steps:


CTB will conduct intra-rater reliability reads through targeted read-behinds as part of our standard procedure. The targeted read-behind system allows Handscoring staff members to provide timely feedback to the reader because the team leaders can discuss incorrectly scored responses with the reader as soon as a problem is detected.

Inter-Rater Reliability. In our scoring system, readers score concurrently and do not know when they are participating in inter-rater reliability monitoring. This allows us to establish inter-rater reliability statistics for all readers and for the project as a whole. Inter-rater reliability statistics can be scrutinized to determine severity or leniency trends, agreement rates, discrepancy rates, the distribution of scores, and the number of condition codes. Furthermore, inter-rater reliability statistics are an excellent source to determine team drift and team leader influence.

Validity Papers. The purpose of validity sets is to ensure consistent, accurate scoring reflective of the scoring guides throughout the entire scoring session. By administering these pre-scored papers throughout scoring, we can ascertain whether the scoring teams/individuals are drifting from the original scoring criteria. Validity papers will be administered at pre-established intervals based on the rate of scoring, and they will appear to readers and team leaders in the same format as actual student responses. The scores assigned to the validity papers are compared to the conventional or approved score, and through this comparison, information is obtained about the accuracy and reliability of the scorer.
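
For illustration, the sketch below shows one way agreement on embedded validity papers could be tracked in rolling batches to surface possible drift; the batch size and the 80 percent threshold are hypothetical, not program business rules.

```python
# Illustrative rolling tracker for validity-paper agreement; values are hypothetical.

from collections import deque

class ValidityMonitor:
    def __init__(self, window=20, threshold=0.80):
        self.window = deque(maxlen=window)   # most recent validity-paper results
        self.threshold = threshold

    def record(self, assigned_score, approved_score):
        self.window.append(assigned_score == approved_score)

    def agreement_rate(self):
        return sum(self.window) / len(self.window) if self.window else None

    def drifting(self):
        # Flag only once a full window of validity papers has been scored.
        rate = self.agreement_rate()
        return rate is not None and len(self.window) == self.window.maxlen and rate < self.threshold

monitor = ValidityMonitor()
for assigned, approved in [(2, 2), (3, 3), (1, 2), (4, 4)]:  # hypothetical reads
    monitor.record(assigned, approved)
print(monitor.agreement_rate())  # 0.75 so far; drift flagging waits for a full window
```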

Due to the importance of the NEAC program, the scoring of the constructed-response items and essays must be highly reliable to ensure that each student's responses receive a fair, consistent, and accurate score. To this end, CTB will employ a variety of scoring activities proposed to ensure fair and accurate scoring of all student responses. These activities include systematic administration of intra-rater reliability reads (including check-sets and read-behinds), as well as inter-rater reliability monitoring. While these activities occur across contracts and across scoring contractors, CTB deepens its commitment to accuracy and reliability in scoring through the provision of an independent Data Audit Team. For the scoring of NEAC, the Data Audit Team will provide an extra layer, or "double check," in the monitoring of the validity of assigned scores.

At CTB, our quality monitoring does not stop with the quality of our staff and reporting tools. We also strive to excel in our oversight and verification of the quality assurance processes as well as any corrective actions that may result through an internal audit program. To this end, CTB has developed an additional Handscoring branch—the Data Monitor team.

Data Monitoring Team Our staff of Data Monitors supports quality assurance for all CTB programs across all sites. The Data Monitor teams support the quality assurance process by accessing the same reports reviewed daily by the scoring teams and by creating summary-level reports. They are responsible for identifying readers who are performing below the required quality standards and for prescribing corrective actions for those readers.

Data monitors also act in an audit capacity to assure that no issues “slip through the cracks.” Reader and team leader performance is monitored regularly by the data monitors. They monitor and audit both the data that are generated as well as the procedures carried out by our staff to ensure the program is implemented according to its particular specifications. Any reader who does not maintain adequate levels of scoring accuracy will have his or her scores voided after collaboration between and determination from the appropriate Handscoring project manager (or his/her designee). Voided scores will be re-scored by qualified readers, and voided readers will be retrained and allowed to return to scoring at the discretion of content leadership.

Handscoring’s item-level qualification and data monitoring processes, detailed above, have been used for the scoring of the Pilot and Field Tests and have become the proposed criteria for use by all vendors for the 2015 operational Smarter Balanced assessment.


Alert Papers CTB will develop, obtain approval for, and document the overarching process and associated policies for the handling of all forms of testing irregularities. During the handscoring and rubric validation process, our readers are trained to watch for indications of “troubled students” and/or cheating. Such information can require urgent attention prior to the completion of handscoring. A well-established escalation process is in place to immediately identify and begin the notification process for any student response that may be of a sensitive nature. The different types of "Alert" responses are listed below. Troubled Student Alerts include, but are not limited to, the following:

Suicide

Criminal activity

Alcohol or drug use

Extreme depression

Violence

Rape or sexual or physical abuse

Self-harm or intent to harm others

Neglect

CTB's handscoring system will support the "Alert" process from the initial identification, through the escalation and verification process completed by a Handscoring supervisor, and into the NEAC notification process. CTB proposes the following process:

Once identified during the handscoring and/or rubric validation process, the responses in question will be reviewed by a supervisor for verification and then immediately routed to the Handscoring Project Manager for notification.

The Handscoring Project Manager will post an electronic response to a secure site as a unique file with the appropriate identifying information.

Additionally, the Project Manager will be responsible to ensure that the appropriate state representatives are notified.

The state of origin Testing Director for the flagged student will be notified the same day via a phone call. He or she will be directed to the posted file and, if requested, a copy of the response and the necessary district/school and student identifying information will be sent via overnight mail. Finally, responses indicating "testing irregularities" will be sorted by state, logged, and sent to the NEAC state at the conclusion of scoring.

Rescoring CTB proposes a standardized rescore and score appeals process. This process is initiated when a district superintendent notifies the NEAC state agency in writing that there is a compelling reason to believe a student's score should be higher (e.g., the score is not consistent with overall classroom performance), or if the administrator determines a student should not have received a Level Not Determined (LND). Once the appeal is received, we will investigate the incident and provide the results to the school district and to the member agency within fifteen (15) business days of receipt of the appeal request.

Maintaining Confidentiality All handscoring and rubric validation is completed without the reader/reviewer having access to student biographical or identifiable information. Further, CTB ensures all readers complete confidentiality agreements and are bound to maintain and protect the confidentiality of McGraw-Hill information, property, and intellectual property and that of our customers. CTB will work with NEAC to review and modify the confidentiality agreements to ensure the agreement meets the needs of this program.


Rubric Validation CTB will conduct rubric validation for all technology-enhanced items. This process, which is analogous to rangefinding for human-scored items, is completed once there are sufficient responses to technology-enhanced items. It is not clear at this time whether NEAC requires that rubric validation occur for the 2015 operational item set. All machine-scored items were previously exercised through the rubric validation process, and scoring rules (and human-readable rubrics) were verified for correct scoring by the Smarter Balanced equation and graphic response engines. Should NEAC be interested in completing rubric validation using NEAC student responses during the operational administration, we would complete this process several weeks into the testing window. We would work with NEAC to identify the timeframe for this work to ensure that a sufficient and representative sample of student testing has been completed. This process is generally conducted at the CTB handscoring site. Any changes to rubrics identified through a review of operational student responses would be brought to a NEAC scoring committee before the changes are made. Additionally, this information should be communicated to the Smarter Balanced Consortium. CTB recommends that we provide any final revisions, along with sample responses, in an end-of-project report to both NEAC and Smarter Balanced.

Data Audit Team At each of our Handscoring Centers, we house a Data Audit Team under the leadership of a Data Supervisor. The Data Supervisor reports directly to the Handscoring Chief Reader and provides an extra layer or “double check” in the monitoring of the validity in assignment of scores.

Throughout the scoring window, a full complement of handscoring monitoring reports are posted daily on a CTB intranet site, centralizing reports across sites and allowing oversight of the quality assurance process by the Chief Reader. In addition to the scoring supervisors, the Chief Reader ensures that all quality processes detailed in the handscoring specifications are being maintained, that there is consistency in processing between the scoring sites, and that appropriate actions are being taken to produce the highest quality scoring.

This quality assurance team acts in an audit capacity supporting the program and is of paramount importance to the reliability and consistency of scoring.

Daily Controls In addition to the studies that we will conduct to control rater drift and the specialized monitoring completed by our Data Monitoring Team, CTB's day-to-day handscoring process incorporates several critical elements to ensure consistency in the scores the readers assign. The daily use of check sets, comparisons of item performance between field test and operational administrations, and use of original training materials all work to ensure consistency in the scoring of constructed-response items.

Handscoring Reports The CTB electronic handscoring system provides multiple reports, at different levels of detail, to ensure consistency in scoring and to make sure that any problems that exist on the scoring floor can be identified and remedied immediately. Monitoring reports are available to supervisory staff and data monitoring staff in real-time to ensure the immediate identification and correction of any issue that could affect the reliability of scores assigned by readers.

Participation by NEAC We welcome NEAC's involvement throughout the daily monitoring processes. To support your involvement, we can provide a sub-set of daily monitoring reports. Additional summary and cumulative-level reports can be provided to NEAC according to a predetermined schedule. During handscoring, NEAC may also request any level of handscoring monitoring report for review. CTB will generate this "on-demand" report and deliver it within four hours of the request being made. By working with NEAC, we hope to establish the type and frequency of quality monitoring reporting so that the scheduled delivery of reports meets the needs of NEAC staff and reports are available prior to being requested. Finally, CTB will utilize our distributed handscoring system. While we certainly do not expect NEAC to have to generate its own quality monitoring reports, this option can be made available.


CTB will support NEAC representatives being on-site for the handscoring process during reader training, qualification, and initial scoring. CTB will complete the handscoring for NEAC at our Indianapolis Scoring Center, so all monitoring by NEAC staff can occur at a single site.

Forensic Procedures to Identify Cheating and Other Irregularities A breach of test security can have serious implications for the psychometric integrity of the reported test scores and for the interpretations and consequences of those scores (Standards for Educational and Psychological Testing, 1999; Standard 8.7). Test security refers to the protocols in place in a testing system that protect the psychometric integrity of the test scores. It is important to state that anomalous findings reported through data forensic techniques do not automatically mean that a testing irregularity occurred. Reviewers should note that results should be used only to facilitate identification of systematic problems within classrooms, schools, or districts. That is, these types of analyses must be supported by additional, collateral information (e.g., a reported test security violation or an interview) before conclusions regarding any improprieties are reached.

The following discussion addresses CTB's procedures for Data Forensics Analysis of test security for all grades and content areas for NEAC. There are multiple indicators of a security breach or an irregularity in test administration practices, and they are typically used within a body of evidence before a breach or irregularity is determined to have occurred. While the following methods are proposed, CTB will work closely with NEAC to select the most valid and appropriate analyses for NEAC's intended uses. In addition, all methods can be aggregated across different group levels. CTB can also support working with third-party vendors, should NEAC determine that different types of forensic analyses are required for specific incidents.

Suspicious Changes in Test Scores in Adjoining Years (Gains Score Analysis) Score changes will be examined between years using a regression model. The scores between past and current years are compared. Test scores from the most recent opportunity (e.g., grade 4) are regressed on the last score from the previous year (e.g., grade 3).

The regression model is Y_t = β0 + β1·Y_(t−1) + β2·D + e, where:

Y_t = most recent score in the current year

Y_(t−1) = most recent score in the past year

D = difference in test end days between Y_t and Y_(t−1)

e = residual

A large score gain or loss between grades is detected by examining the residuals for outliers. The residuals are computed as observed value minus predicted value. To detect unusual residuals, we compute the studentized t residuals. An unusual increase or decrease in student scores between administrations is flagged with studentized |t| residuals greater than 3.

The computation of the studentized t residuals is as follows:

Consider a simple regression model y = Xβ + e.

The residuals can be expressed as e = (I − H)y,

where H = X(X′X)⁻¹X′ is called the hat matrix.

For linear models, the variance of the residual e_i for student i is σ²(1 − h_i), where h_i is the i-th diagonal element of H, and an estimate of the standard deviation of the residual is s·√(1 − h_i).


The residuals can be modified to better detect unusual observations. The ratio of the residual to its standard error, called the standardized residual, is e_i / (s·√(1 − h_i)).

If the residual is standardized with an independent estimate of σ², the result has a Student's t distribution under the normality assumption. If we estimate σ² by s²_(i), the estimate of σ² obtained after deleting the i-th observation, the result is a studentized residual. Studentized t residuals can be computed as t_i = e_i / (s_(i)·√(1 − h_i)), where s_(i) is the estimate of s after deleting the i-th observation for student i.

The number of students with a large score gain or loss is aggregated at the classroom, school, and district levels. Unusual changes in aggregate performance between years are flagged based on the average studentized t residuals in an aggregate unit. For each aggregate unit, a critical value t = t̄ / (s_u / √n_u) is computed, and the unit is flagged when |t| is greater than 3, where s_u is the standard deviation of the studentized t residuals in the aggregate unit, n_u is the number of students in the aggregate unit (e.g., testing session or test administrator), and t̄ = (1/n_u)Σ t_i is the mean studentized t residual in the unit. The QA report includes a list of the flagged aggregate units with the number of flagged students in each aggregate unit.
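
For illustration, the regression and flagging steps above can be sketched with standard statistical tooling. The column names and simulated data below are hypothetical, and the aggregate-unit statistic is the unit mean of the studentized residuals divided by its standard error.

```python
# Sketch of the gain-score analysis: regress current-year scores on prior-year
# scores and the testing-interval covariate, flag students with |studentized
# residual| > 3, and flag aggregate units whose mean residual is extreme.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def gain_score_flags(df):
    X = sm.add_constant(df[["score_t1", "interval_days"]])
    fit = sm.OLS(df["score_t"], X).fit()

    # Externally studentized residuals: s is re-estimated with each case deleted.
    out = df.assign(t_resid=fit.get_influence().resid_studentized_external)
    out["flagged"] = out["t_resid"].abs() > 3

    units = out.groupby("class_id")["t_resid"].agg(
        n="size", mean="mean", sd=lambda s: s.std(ddof=1))
    units["unit_t"] = units["mean"] / (units["sd"] / np.sqrt(units["n"]))
    units["unit_flagged"] = units["unit_t"].abs() > 3
    units["n_flagged_students"] = out.groupby("class_id")["flagged"].sum()
    return out, units

# Hypothetical data: 200 students in 10 classes, with an aberrant gain in class 3.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "score_t1": rng.normal(500, 50, 200),
    "interval_days": rng.integers(330, 400, 200),
    "class_id": rng.integers(1, 11, 200),
})
data["score_t"] = 0.9 * data["score_t1"] + rng.normal(60, 20, 200)
data.loc[data["class_id"] == 3, "score_t"] += 80

students, units = gain_score_flags(data)
print(units[units["unit_flagged"]])
```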

Caveats

Score changes may be explained by factors such as interventions or instructional or curriculum changes. Therefore, the flagging criterion should be taken only as a stimulus for further investigation.

Wrong to Right Answer Change (Erasure) Analysis CTB acknowledges that applying the "erasure" analysis to online testing settings is complicated due to inclusion of various technology-enhanced item types and warrants further research. We will conduct an erasure analysis for paper-pencil administrations and will conduct an analogous analysis for online test administrations. The calculation of the number of wrong-to-right answer changes is formulated with defined school groups as the unit of analysis. The groups can be defined as intact instructional classrooms, homerooms, or any testing group identified using the name or ID of a teacher responsible for leading the group's test administration.

In the description of the procedure below, c = 1, …, C denotes the classes in the state, whereas n_c and x̄_c denote the size and mean number of wrong-to-right (WTR) erasures for class c, respectively. In addition, μ and σ denote the mean and the standard deviation of the distribution of the number of wrong-to-right answer changes for the population of individual students in the state.

The basic idea underlying the procedure is a statistical test of the null hypothesis that the mean number of wrong-to-right erasures for the school class constitutes a random sample from the administration distribution of wrong-to-right erasures. The hypothesis is tested against the (right-sided) alternative that the mean number is too high to be explained by random sampling. Classes for which the null hypothesis has to be rejected are flagged for further scrutiny. A well-known central limit theorem in statistics tells us that the sampling distribution of x̄_c is asymptotically normal with mean μ and standard deviation σ/√n_c.


It is evident in the formula for the administration standard deviation that the classroom flagging criterion for each classroom is adjusted for the number of test takers in a classroom. This adjustment ensures that the flagging criterion is equally stringent for classrooms with considerably different numbers of test takers.

In addition, minimizing the probability of false positive (Type I) errors in this statistical test is crucial in this analysis. Flagging classrooms for further scrutiny is typically perceived as suspicion that students or educators have cheated by erasing incorrect answers and replacing them with correct answers.

The statistical procedure is as follows:

For each class c, c = 1, …, C, calculate μ + 4σ/√n_c. Flag the classes for which x̄_c is larger than the result.
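
For illustration, the flagging rule above can be sketched as follows; the class sizes and simulated erasure counts are hypothetical.

```python
# Illustrative wrong-to-right (WTR) erasure screen: flag a class when its mean
# WTR count exceeds mu + 4*sigma/sqrt(n_c), with mu and sigma taken from the
# statewide student-level distribution.

import numpy as np
import pandas as pd

def flag_wtr_classes(df, k=4.0):
    """df has one row per student with columns 'class_id' and 'wtr' (WTR erasure count)."""
    mu, sigma = df["wtr"].mean(), df["wtr"].std(ddof=0)
    by_class = df.groupby("class_id")["wtr"].agg(n="size", mean="mean")
    by_class["threshold"] = mu + k * sigma / np.sqrt(by_class["n"])
    by_class["flagged"] = by_class["mean"] > by_class["threshold"]
    return by_class

rng = np.random.default_rng(1)
students = pd.DataFrame({
    "class_id": np.repeat(np.arange(1, 21), 25),   # 20 hypothetical classes of 25 students
    "wtr": rng.poisson(1.2, 500),                  # typical low WTR counts
})
students.loc[students["class_id"] == 7, "wtr"] += 4  # one aberrant class
print(flag_wtr_classes(students).query("flagged"))
```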

Caveats

Statistically, the flagging criterion proposed is very conservative. The standard normal table shows that under random sampling, the (asymptotic) probability of a sample mean more than four standard deviations above the population mean is less than .0001. However, rejection of the null hypothesis tells us only that the observed mean number of wrong-to-right erasures is unlikely to be the result of random sampling. Specifically, it does not necessarily prove any form of testing impropriety.

The following caveats are always applicable:

The normal distribution holds only for large classes; for smaller classes the result is approximate.

Rejection of the null hypothesis does not necessarily imply cheating. Alternative explanations are possible.

The flagging criterion should thus be taken only as a stimulus to look for additional evidence and find out what really happened in the classroom.

Response Time Analysis CTB will also use a procedure described in van der Linden and Guo (2008)9 for online testers to identify aberrances in test administration related to response time. This procedure is intended for use in computer-adaptive testing, but it is also relevant to online linear fixed-form administrations.

This procedure assumes that item and person parameters for all items and test takers are available and that the administered test design is clearly identified in the data and/or additional information. CTB will apply the response time model using the time-per-item data received from the online administration system.

Response time analysis is proposed, as pointed out in van der Linden and Guo (2008):

Response times are continuous rather than binary, allowing more information about the size of aberrances than responses alone.

Response time statistical checks maintain their power throughout the test.

The response time model proposed separates the time intensity of an item from the speed of the test taker. It would be very difficult, if not impossible, for a test taker with pre-knowledge or memorization intent to time responses to match the time intensity (item) parameter and the speed (person) parameter for all items administered.

 

9 van der Linden, W.J., and Guo, F. (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73(3), 365-384.


The response time model to be used for this analysis is as follows:

f(t_ij; τ_j, α_i, β_i) = [α_i / (t_ij·√(2π))] exp{ −(1/2)[α_i(ln t_ij − (β_i − τ_j))]² },

where τ_j is the speed at which test taker j takes the test, β_i is the time intensity of item i, and α_i is a discrimination parameter for item i. van der Linden (2006)10 describes estimation procedures for these parameters.

CTB will estimate the response time parameters and then identify aberrant response time patterns. Each logged response time will be standardized using a predicted mean and standard deviation, given the response times for all other items taken by the same test taker. A response time to an item will be flagged as aberrant when its standardized residual is more than 1.96 or less than -1.96. Rates of flagging outside of the significance level of the test, along with patterns of aberrances found, will be examined and reported to NEAC.
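
The sketch below illustrates the flagging step in simplified form, standardizing log response times with point estimates of the item and person parameters rather than the full Bayesian predictive checks of van der Linden and Guo (2008); all parameter values shown are hypothetical.

```python
# Simplified response-time check: under the lognormal model, the standardized
# log-time residual is z = alpha_i * (ln t - (beta_i - tau_j)); flag |z| > 1.96.

import numpy as np

def flag_response_times(log_times, beta, alpha, tau, crit=1.96):
    """log_times: array of ln(t) for one examinee across administered items."""
    z = alpha * (np.asarray(log_times) - (beta - tau))   # standardized log-time residuals
    return z, np.abs(z) > crit

# One hypothetical examinee and five items.
beta = np.array([4.0, 3.5, 4.2, 3.8, 4.5])    # item time intensities (log seconds)
alpha = np.array([2.0, 1.8, 2.2, 2.0, 1.9])   # item discriminations
tau = 0.3                                     # examinee speed
log_times = np.log([35.0, 20.0, 45.0, 30.0, 6.0])  # last item answered implausibly fast

z, flags = flag_response_times(log_times, beta, alpha, tau)
print(np.round(z, 2), flags)   # only the final item is flagged
```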

In future years, NEAC may want to consider the use of ANOVA and CUSUM analyses on item response times (van Krimpen-Stoop and Meijer, 2001; Egberink et al., 2010) to detect possibly compromised items from administration to administration.

Caveats

Explanations for aberrant response time patterns exist other than cheating. One such example is poor time management. Therefore, the flagging criterion should be taken only as a stimulus for further investigation.

Detection of Gaming in AI scoring CTB has used artificial intelligence (AI) scoring of essay items in large-scale assessment settings. As part of the Smarter Balanced Pilot and Field Tests (Smarter Balanced RFP No. 16/17), CTB has conducted a series of research studies on reliable and valid use of AI scoring. With the inclusion of more automated scoring in high-stakes assessments, more students may be tempted to take advantage of automated scoring by including construct-irrelevant materials in their responses, which is aimed at inflating their score without being noticed (‘gaming’). If undetected, gaming can pose a serious threat to score validity and raise public mistrust in automated scoring.

CTB's approach upholds score validity as the top priority while heeding the latest research findings and public acceptance of automated scoring. As such, CTB is proposing 100 percent human scoring with 100 percent AI read behind and human adjudication of non-exact scores for Year 1 (more details are in Section 8.8.2). Using the data collected in Year 1, CTB will closely examine the impact of gaming on scores assigned by automated scoring engines and employ statistical methods to detect gamed responses. We have identified different gaming strategies: repeating responses several times, adding paraphrases from item stimulus material, inserting academic words, and inserting content words. Preliminary study results revealed that gaming generally increased the score of low-quality responses between 0.25 and 0.5 points.

 

10 van der Linden, W.J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31, 181-204.

11 Van Krimpen-Stoop, E.M.L.A., & Meijer, R.R. (2001). CUSUM-based person fit statistics for adaptive testing. Journal of Educational and Behavioral Statistics, 26, 199-218.

12 Egberink, I., Meijer, R., Veldkamp, B., Schakel, L., and Smid, N. (2010). Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM. Personality and Individual Differences, 48, 921-925.


We will leverage the Smarter Balanced research findings on the susceptibility of automated scoring systems to gaming. Outlier detection methods will be used to identify gamed responses automatically, based on atypical feature patterns (e.g., unusual length, repeated phrases or sentences). We will discuss issues in test validity when AI scoring is in use, strategies for phasing in a higher percentage of AI scoring, and innovative gaming detection techniques with the NEAC states.
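
For illustration only, the sketch below shows the general shape of an outlier-based screen using two simple surface features; an operational detector would draw on the engine's full feature set, and the features and cutoff shown are hypothetical.

```python
# Illustrative gaming screen: compute simple text features (length in words,
# share of duplicated sentences) and flag responses with extreme feature z-scores.

import re
import numpy as np

def features(text):
    words = text.split()
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    repeated = 1.0 - (len(set(sentences)) / len(sentences)) if sentences else 0.0
    return len(words), repeated

def flag_gaming(responses, z_cut=3.0):
    feats = np.array([features(r) for r in responses], dtype=float)
    mu, sd = feats.mean(axis=0), feats.std(axis=0)
    sd[sd == 0] = 1.0                        # avoid division by zero for constant features
    z = (feats - mu) / sd
    return np.abs(z).max(axis=1) > z_cut     # flag if any feature is an extreme outlier

sample = ["A short on-topic answer about the passage."] * 50
sample.append("I like tests. " * 200)        # hypothetical gamed response: massive repetition
print(np.where(flag_gaming(sample))[0])      # index of the flagged response
```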

Caveats

Effective detection of gaming may require continuous monitoring of gaming behaviors, adapting to those behaviors as they change, and improving the detection techniques over time. Detection of gaming should be paired with automatic routing of flagged responses for human scoring.

8.8.2 Specific Requirements for Automated Scoring 8.8.2.1 Proposer will describe how it will be demonstrated that the Proposer’s AI engine delivers comparable results to field test scoring

8.8.2.2 Proposer will describe procedures that will be used to establish the quality of the AI engine that includes regularly scheduled performance checks for the scoring of constructed responses using a wide variety of new and previously scored student papers (both AI and hand-scored) using the AI engine.

8.8.2.3 Proposals should provide for input from psychometricians, hand-scoring experts, and technical staff that will help ensure that the software is providing reliable scoring that is as accurate as, or more accurate than, human scoring.

8.8.2.4 Proposer will describe procedures that will be used for recalibration, retraining, and delivery; these procedures must be demonstrated and included as a required resource

8.8.2.5 Proposer will provide evidence that the AI Scoring Engine meets the following additional criteria:

8.8.2.5.1 includes a range of score points, types and styles of writing, and other types of constructed responses;

8.8.2.5.2 includes an automated process to provide a randomly selected predetermined portion of the papers to be hand-scored;

8.8.2.5.3 meets the same standards for accuracy and reliability that exist for human scoring of the same item type;

8.8.2.5.4 provides evidence that the engine meets accuracy and reliability standards, with that evidence documented and included as part of the process.

8.8.2.5.5 includes validation processes that utilize student responses across the entire population, including a range of score points, types and styles of writing;

8.8.2.5.6 and provides evidence that the AI engine performs as well as, or better than, human readers.

8.8.2.6 Proposer's System must give accurate, timely assessment results to the NEAC states with the capability to disseminate scoring results to schools and districts as quickly as possible. Proposers should describe the strategies and procedures they will use to expedite reporting, with estimates of turn-around time.

8.8.2.7 Proposer's System must employ the use of a SSID System to identify each student and to ensure the accurate matching of the student to test results. NEAC states shall supply the SSID and will provide frequent file updates as needed. To ensure student confidentiality, a unique Smarter Balanced student identifier will be used for data transfer rather than the regular state student identification number. Overall, the system must satisfy Federal and state laws that provide for the protection of personally identifiable student information.

8.8.2.8 Proposer's System must perform the scoring of Smarter Mathematics and ELA assessments in accordance with specifications developed by the Smarter Balanced Assessment Consortium. The scoring engine for Smarter Mathematics and ELA assessments must be able to encompass the full range of the Smarter Balanced metric.

8.8.2.9 If full scoring by Artificial Intelligence is proposed for any assessment, the Proposer should describe and implement a phase-in plan in which humans have a decreasing role over time.

CTB believes firmly in providing cost-efficient, reliable, and valid scoring. To this end, we have been highly active in efforts to study and improve the automated scoring of text-based responses, as we are responsible for all automated scoring efforts on the Smarter Balanced Pilot and Field Test (Smarter Balanced RFP No. 16/17). Based on the results of the automated scoring studies on the pilot data and in anticipation of our work on the field test, CTB has crafted an automated scoring solution that will provide NEAC with assurances of reliable and valid scoring while phasing in the cost efficiencies machine scoring can provide over time.

Within the scope of the Smarter Balanced RFP 17, we conducted an aggressive research agenda that was aimed at:

Identifying the performance of automated scoring engines on Smarter Balanced items as compared to human scoring.

Understanding the sample sizes for training and validation papers to achieve the best engine reliability possible with the minimum costs.

Understanding the reliability of various engine and human scoring scenarios (e.g., one human and one machine score vs two machine scores vs two human scores).

Understanding the potential for gaming of the automated scoring engines.

Better identifying single papers that should be routed to humans due to characteristics indicating that the engine may not predict a score with high confidence.

We also conducted research to determine whether we could predict which items, based on their content metadata, were better suited to current automated scoring techniques.

As appropriate, CTB will make use of these relationships in order to provide NEAC with solid hand- and AI-scoring methods and to ensure both CTB and NEAC are at the forefront of AI scoring capabilities.

Coordinating AI- and Hand-scoring CTB’s scoring workflow system, Performance Evaluation and Management System (PEMS), allows for the coordination of human and machine scoring so that NEAC can employ a wide range of machine-scoring implementation options. For example, machines may provide second reads for human scorers, automatically flag aberrant responses or alert papers, provide first and second reads so that a human may need only to resolve discrepancies, or other combinations of human and engine scoring that will improve efficiency of the system. In particular, PEMS can also randomly select pre-determined portions of responses to be hand-scored.

CTB’s strategy to expedite the receipt of scoring results by schools and districts is based on a rolling, “district by district” completion model. Student results for districts testing online will be available for access fifteen business days after the completion of each individual district’s test window. For districts testing in mixed mode, using both online and paper-pencil tests, student results will be available twenty business days after the receipt of the last paper-pencil test documents at CTB’s document processing center. This process will expedite reporting by enabling districts that have completed testing earlier to access results without having to wait until all districts have tested and the overall test window has closed.
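
For illustration, the turnaround rule can be expressed as a simple business-day calculation; the district names and window-close dates below are hypothetical, and state holiday calendars would be added in practice.

```python
# Illustrative availability-date calculation: 15 business days after a district's
# online window closes, or 20 business days after receipt of the last paper documents.

import numpy as np

def results_available(close_date, mode):
    lag = 15 if mode == "online" else 20             # business-day turnaround
    return np.busday_offset(np.datetime64(close_date), lag, roll="forward")

districts = [("District A", "2015-04-10", "online"),
             ("District B", "2015-05-01", "mixed")]   # hypothetical districts
for name, closed, mode in districts:
    print(name, "results available", results_available(closed, mode))
```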

A Scoring Plan to Meet NEAC Goals CTB’s solution for automated scoring is based on the results of our research on the Smarter Balanced item pool. We assume that we will use automated scoring for up to three constructed-response items on the ELA assessments. For these items, we propose a phased-in approach:


In Year 1, 100 percent of the constructed-responses from the items selected for automated scoring will be scored by both a human and the CTB engine trained for the item during the Smarter Balanced Field Test. Non-exact scores will be resolved by a Handscoring supervisor. Our research indicates that with the engines deployed during field testing, this model provides reliability that is superior to other models (including human-human and engine-engine scenarios) and that we can anticipate a 25-30 percent adjudication rate. This solution provides the opportunity for adjustments that may be required to the automated scoring engines following the test administration while not limiting the turn-around time of scores. In addition, this approach mitigates the risk associated with the size of the training sets provided during the Field Test (on average, 1,000 responses for training and 500 responses for validation) to address NEAC’s concern that the training sets be large enough for accurate scoring while leveraging the work done to train automated scoring engines on the Smarter Balanced Field Test items.

For Year 2, we will refresh engines with lower performance during Year 1 with additional training data so that engines are adequate for 100 percent automated scoring and 25 percent human scoring. We plan to route responses that appear to require a human score for human scoring in addition to holding 15 percent of the responses for engine reliability reporting.

For Year 3 and the out years, we will continue to refresh engines with lower performance and score 100 percent of the responses with automated scoring and 10 percent of those responses with a human as well.

Figure 13 summarizes schematically CTB's phase-in AI scoring plan in which humans have a decreasing role over time. The far left box (HHh) is the current (legacy) standard of using 100 percent human scoring with no AI scoring (M) involvement. Note that the little 'h' denotes human adjudication. In the next box to the right is a hybrid approach (HMh), which, according to our study for Smarter Balanced Contract 17, provides agreement rates higher than two human raters. Our study also revealed that MMh (i.e., two independent AI engines in conjunction with human adjudication) in its current state is not as effective and hence requires further research. What is promising is Mh, a single, best-performing AI engine with targeted human scoring. That is, the AI engine flags responses that are prone to scoring errors.

Figure 13: CTB's Phase-In AI Scoring Plan

During our pilot with Smarter Balanced Contract 17, we also conducted extensive research on AI scoring of short-response items. The results were similar to those for the extended-response items. We suggest phasing in AI scoring of short responses on a delayed schedule starting in Year 2 by scoring 100 percent of the responses by hand as well as using CTB's AI engine.


We will then evaluate the results and coordinate with NEAC states and Smarter Balanced to advance this scoring in a way that is acceptable to NEAC states and aligned with Smarter Balanced's research. We propose this approach:

In Year 1, 100 percent of the short responses will be scored by humans. These data will then be used to improve and train CTB’s engine. We anticipate that we can successfully add about 30 items to the pool of machine-scored items each year.

In Year 2 and beyond, 100 percent of the short responses for which engines have been developed will be scored by both a human and the CTB engine trained for the item, using data collected in prior years. Non-exact scores will be resolved by a Handscoring supervisor. We will provide extensive studies regarding the comparability or potential superiority of the engine-scored short responses vs. human-scored short responses.

Additionally, for the out years, CTB will consult with Smarter Balanced and NEAC to determine the level of acceptance for AI-scored short-responses and make adjustments to this plan accordingly.

For mathematics, CTB will score graphics and equation items using Smarter Balanced's engine. Mathematics item types that cannot be machine scored in a straightforward way will be handscored.

We are happy to share the results of our research on the Smarter Balanced Pilot items with NEAC. That research indicates the gain in engine reliability tapers off after about 900 responses in the training set for essay items and after about 500 responses in the training set for short text items for most of the engines and items we analyzed. Further research is required in this area, and CTB is committed to conducting this research to provide our customers with the most cost-effective solutions possible for training reliable automated scoring engines.

Validating the Quality of Automated Scoring CTB employs rigorous procedures to ensure the quality of scoring engines during engine training, engine validation, and live scoring. We believe that NEAC will be happy with the performance of the automated scoring engines; yet we will also provide ample technical documentation to support the sufficiency of the engines for each of the test administrations.

During the Smarter Balanced Field Test, our engines will be trained on nearly all essay items and some ELA short-text items in the Smarter Balanced item pool. Training sets will be drawn as stratified random samples from the population of responses received during the Field Test; on average, the training set size is expected to be about 1,000 responses. Validation sets will also be drawn as stratified random samples from the population of responses received during the Field Test; on average, the validation set size is expected to be about 500 responses.
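
For illustration, the sketch below shows one way training and validation samples could be drawn with stratification on the human score point; the data frame, its columns, and the score distribution are hypothetical.

```python
# Illustrative stratified draw of training and validation sets from a pool of
# scored responses, preserving the score-point proportions of the pool.

import numpy as np
import pandas as pd

def stratified_split(pool, n_train=1000, n_valid=500, seed=0):
    shuffled = pool.sample(frac=1.0, random_state=seed)      # shuffle once
    train_parts, valid_parts = [], []
    for _, grp in shuffled.groupby("human_score"):
        share = len(grp) / len(pool)                          # keep score-point proportions
        k_train = int(round(n_train * share))
        k_valid = int(round(n_valid * share))
        train_parts.append(grp.iloc[:k_train])
        valid_parts.append(grp.iloc[k_train:k_train + k_valid])
    return pd.concat(train_parts), pd.concat(valid_parts)

# Hypothetical Field Test pool of 6,000 scored responses.
pool = pd.DataFrame({
    "response_id": np.arange(6000),
    "human_score": np.random.default_rng(1).choice(
        [0, 1, 2, 3, 4], size=6000, p=[0.10, 0.25, 0.30, 0.25, 0.10]),
})
train_set, valid_set = stratified_split(pool)
print(len(train_set), len(valid_set))
print(train_set["human_score"].value_counts(normalize=True).round(2))
```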

During training, our PEMS workflow system provides the engines with full responses and highly validated human scores on those responses. The engines are then trained to predict the most likely score for the response.

Following training, the PEMS workflow system provides the engines with full responses without scores. The trained engines then score these new responses and return the scores to the PEMS workflow system. CTB's Research staff evaluates the performance of the engines against highly validated human scores. We use several statistics for this evaluation, including quadratic weighted kappa, correlation, exact agreement, adjacent agreement, and standardized mean difference (both magnitude and direction). We will work with NEAC to establish the business rules regarding thresholds of acceptability for each of these statistics, making recommendations in line with industry standards for these evaluation criteria. To the extent possible, we evaluate the engines for both the entire population and for subgroups. All results of this evaluation will be released to NEAC. Based on engine results on the Smarter Balanced Pilot items, we believe that many of the engines trained during the Field Test will perform well against the evaluation criteria. Our solution is designed to provide assurances that, in the event an engine does not perform well, students will receive valid human scores and the engine will be updated and refreshed until sufficient performance is achieved.


In particular, should the engine not generalize to the high-performing student population of the NEAC states, CTB is prepared to draw additional training sets to re-center the engine.

Following operational scoring, CTB will conduct a similar evaluation with the human scores deployed as a read-behind and will provide results to NEAC.
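
For illustration, the evaluation statistics named above can be computed from paired human and engine scores as sketched below; the score vectors are hypothetical, and acceptability thresholds would follow the business rules established with NEAC.

```python
# Illustrative engine-evaluation statistics: quadratic weighted kappa, correlation,
# exact and adjacent agreement, and standardized mean difference (signed, so it
# carries both magnitude and direction).

import numpy as np
from sklearn.metrics import cohen_kappa_score

def engine_evaluation(human, engine):
    human, engine = np.asarray(human), np.asarray(engine)
    diff = engine - human
    pooled_sd = np.sqrt((human.var(ddof=1) + engine.var(ddof=1)) / 2)
    return {
        "qwk": cohen_kappa_score(human, engine, weights="quadratic"),
        "correlation": np.corrcoef(human, engine)[0, 1],
        "exact_agreement": np.mean(diff == 0),
        "adjacent_agreement": np.mean(np.abs(diff) <= 1),
        "smd": diff.mean() / pooled_sd if pooled_sd > 0 else 0.0,
    }

# Hypothetical paired scores for one item.
human_scores = [3, 2, 4, 1, 3, 2, 0, 4, 3, 2, 1, 3]
engine_scores = [3, 2, 3, 1, 3, 3, 0, 4, 3, 2, 2, 3]
print(engine_evaluation(human_scores, engine_scores))
```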

CTB’s Automated Scoring System CTB’s automated scoring system is designed to score student essays submitted online. The system combines natural language processing techniques with state-of-the-art machine-learning methods to model the scores human raters would give to each essay. After initial training using essays scored by human raters, our system’s performance is validated against expert raters on a separate set of responses during the model-building phase. When used for accountability purposes, the scores assigned by the system are continuously monitored. Our automated scoring system is capable of modeling both holistic and trait-level (analytic) ratings so that it scores student responses with a degree of reliability that is comparable with expert human raters.

CTB has been using automated essay scoring in classroom settings since 2005 and in large-scale accountability testing contexts since 2009. We have expertise in building prompt-specific as well as generic scoring models. Prompt-specific scoring models demonstrate high fidelity to human scoring on a prompt-by-prompt basis, but they can be used reliably only with the particular writing prompt for which they have been trained. Generic scoring models, on the other hand, are not quite as reliable as prompt-specific models, but they generalize better to a variety of prompts. This allows models to be used more flexibly; for example, for formative purposes in the classroom. When applied in this setting, our technology provides students with feedback on their writing performance. Based on the scoring rubrics, this feedback is both holistic and at the trait-level and offers suggestions to improve grammar, spelling, and writing conventions at the sentence level.

The automated scoring system analyzes approximately 95 features of student-produced text. These features can be classified as structural, syntactic, semantic, and mechanics-based.

Structural features identify the essay's organization and development.

Syntactic features, which are derived from a linguistic parse tree, can identify, among other things, the variety of simple and complex syntactic structures.

Semantic features identify, for example, the sophistication, appropriateness, and variety of the language.

Mechanics features identify many types of common writing errors, ranging from punctuation and spelling errors to grammatical errors in agreement, verb formation, and pronoun usage. Grammar tools also detect stylistic tendencies such as informal language and overuse of words.
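
For illustration only, the sketch below computes a few crude surface proxies for the feature families described above; the operational engine derives its roughly 95 features from full linguistic parses rather than simple string processing, and the feature names here are hypothetical.

```python
# Illustrative surface proxies for structural, syntactic, semantic, and mechanics features.

import re

def surface_features(essay):
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    return {
        "paragraph_count": len(paragraphs),                           # structural: development
        "mean_sentence_length": len(words) / max(len(sentences), 1),  # syntactic: complexity proxy
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),  # semantic: variety
        "periods_per_sentence": essay.count(".") / max(len(sentences), 1),         # mechanics proxy
    }

essay = ("The author builds her argument with evidence.\n\n"
         "First, she cites data. Then she appeals to shared values.")
print(surface_features(essay))
```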

Most commonly, the features are used to model trait-level scores that can be reported separately and/or combined to produce an overall writing score. The traits and score points used can be aligned with different scoring rubrics. For example, for a state-wide accountability assessment, our system scores student essays along five dimensions of good writing using a six-point scale. For the Smarter Balanced Pilot study, our system was trained to score essays using the three Smarter Balanced traits of effective writing: Statement of Purpose/Focus and Organization (1–4 scale); Language and Elaboration of Evidence (1–4 scale); and Conventions (0–2 scale).

Our automated scoring system can also be configured to identify atypical responses, such as essays that are off-topic or off-purpose. These responses can automatically be routed to human raters for review. For the Smarter Balanced Field Test, the system has been extended with additional features to capture vocabulary and word use and the capability to detect responses likely to require human scoring, based on statistical outlier detection.


When used in a state accountability testing context, CTB's rater reliability studies typically become publicly available in the program's technical report. CTB's essay scoring system and supporting practices attend to validity and reliability considerations, and extensive comparisons between the scores produced by expert raters and those produced by CTB's automated scoring system are analyzed. An array of statistical characteristics is examined to evaluate the scoring quality for each prompt, including score reliability for subgroups. This use of multiple measures provides a more complete and global view of validity, reliability, and the quality of automated scoring. Based on automated scoring research, this framework was also used by CTB to evaluate automated scoring during the Smarter Balanced Pilot and Field Tests.

It should be noted that all students are identified through a unique student identifier (SSID) and all data, including student demographic and performance data, are associated with the student record through this ID. We adhere to federal and state laws such as FERPA that provide for protection of personally identifiable student information.

8.8.3 Specific Requirements for Hand-Scoring 8.8.3.1 Proposals will describe how the Proposer will perform hand-scoring for Smarter Mathematics and English Language Arts elements in accordance with specifications developed by the Smarter Balanced Assessment Consortium for constructed response and performance task items.

8.8.3.2 Proposals will describe the Proposer's experience and capabilities relative to handscoring services. Hand-scored items must be scored and results provided within 30 days from the close of the test window. The states are interested in scoring models that can take advantage of the fact that many schools will complete testing in the early weeks of a three-month testing window, so that scoring might begin before all schools have completed testing.

8.8.3.3 Proposals for hand-scoring should be developed on the assumption that all scoring procedures, rubrics, exemplars, anchor papers, and annotations will be provided by Smarter Balanced. The states are open to cost estimates that are presented in per-student units, unit ranges, or as a flat fee. Proposals should explain how estimates were calculated, listing key variables that may impact estimates and the extent to which estimates may change.

As the primary contractor for Smarter Balanced Contract 17, CTB completed the majority of the Smarter Balanced Pilot and Field Test scoring, and we were responsible for the creation of final rubrics, scoring guides, and handscoring materials. We have the proven experience to reliably and validly score Smarter Balanced computer-adaptive (CAT) items and performance tasks using means proven to be cost-effective and scalable. Our plan, refined during the Pilot and Field Test scoring, relies on an emerging body of knowledge about how to complete the reader training, qualification, materials development, and scoring of items within a CAT pool. This is a significant shift for the industry, as we move away from a state's more limited constructed-response item pool and toward the need for handscoring to train on and score all items within the vast Smarter Balanced CAT item pool, with the reliability and validity required of a summative assessment. We created the Field Test scoring plan that has informed this approach for Smarter Balanced. CTB scored an estimated 70 percent of the CAT item and performance task pool for the Smarter Balanced Pilot and Field Tests, and we developed the majority of the training materials, provided leadership in rangefinding activities, and led the automated scoring plan and research studies. With the experience we gained under the Pilot and Field Test scoring contract, we emerge with demonstrated leadership, expertise, and the handscoring system to complete the scoring for NEAC.

Overview of Assignment of Scores Handscoring refers to the processes necessary to determine the rating of a student's response on a variety of item types. For NEAC, all non-essay extended-response items will be 100 percent human handscored, with 10 percent of these items receiving an independent second rating. In the event of a disagreement between the two scores, an independent "resolution" third read will be performed by an expert handscorer. CTB will work closely with NEAC to determine the specific rules governing final score resolution.


All handscoring tasks will involve training and qualifying readers, who will be monitored for accuracy and production throughout the scoring period.

Scoring of essay extended-response items for online testers All essay extended-response items submitted by online testers will receive a first read from CTB's AI engine. In Year 1, essay extended-response items will receive a 100 percent human second read. In Year 2, 25 percent of the essay extended-response items will receive a human second read. In Years 3 through 5, 10 percent of the essay extended-response items will receive a human second read.

Scoring of essay extended-response items for paper-and-pencil testers AI scoring will not be available for essays from paper-and-pencil testers. All essays from paper-and-pencil testers will be scored only by humans, and ten percent will receive a second read.

Scorer Qualifications, Recruitment and Training CTB will ensure that all scoring supervisors, team leaders, and readers possess a bachelor's degree or higher and that all staff recruited and hired as readers are also degreed. CTB will work closely with our professional staffing vendor, Kelly Services Inc., and/or the minority-owned business Dployit, which will recruit readers for employment. Recruiters carefully screen all new applicants and verify that 100 percent of all potential readers meet the degree requirement.

CTB has instituted a comprehensive, standardized training model to direct the training of our Handscoring staff. Our commitment to the training and qualification process stems from the knowledge that the consistency and reliability of the scores assigned is directly aligned with the quality of the training the readers receive. In our model, team leaders are trained prior to readers. Team leaders are extremely critical in the process of scoring accurately, as they must shepherd a team of readers throughout live scoring, ensuring consistency within the team and alignment to the scoring guidelines and protocols.

CTB has proven expertise in both training and scoring Smarter Balanced items. During the Smarter Balanced Pilot and Field Tests, CTB maintained responsibility for the creation of final rubrics, scoring guides, and handscoring materials. CTB works very closely with Smarter Balanced to implement Smarter Balanced training and scoring philosophies. All training will be based on the scoring procedures, rubrics, exemplars, anchor papers, training papers, qualification papers, and annotations provided by Smarter Balanced. Training for readers and team leaders is led by Handscoring supervisors and follows the process below:

1. An overview of the training process, types of materials that will be used, and terminology of our industry and the specific assessment being trained.

2. An introduction to and review of the individual items and any supplementary materials, such as reading passages, etc.

3. An introduction of the scoring guide for each item in the assigned Rater Item Block (RIB) set, with a focus on the rubric and corresponding anchor papers. These will be a key resource for readers throughout scoring. It is our general practice to provide the scoring guides electronically to readers.

4. An introduction to the training set for an item.

5. A review of the training set using Smarter Balanced-provided annotations to explain the scoring rationale, referring readers back to the item's rubric and anchor papers.

After the training sets have been completed for all items, the qualification process can begin. Training is conducted online in CTB's scoring system. The Handscoring supervisors and team leaders will determine whether a reader qualifies upon the reader's completion of the set. CTB Handscoring will have the following high qualification standards:

For four-point items—non-adjacent agreement rate of less than five percent and perfect agreement rate of 80 percent


For three-point items—non-adjacent agreement rate of less than five percent and perfect agreement rate of 80 percent

For two-point items—perfect agreement rate of 90 percent

For one-point items—perfect agreement rate of 100 percent

Readers must also pass a Condition Code Qualifying set in order to be eligible to score. Only qualified readers are assigned to score student responses and receive project-specific training on such issues as the handling of alert or sensitive papers. CTB's daily supervision of all Handscoring staff members will continue once scoring begins and will follow quality assurance procedures, as described in a later section of this proposal.

Scoring System Aggregates and Manages Real Time Data CTB's Performance Evaluation and Management System (PEMS) is a Web-based system designed for aggregating and managing real-time data across multiple sites. It is a user-friendly system that requires no workstation setup. It simply requires the user to go to a Web site and start using it. PEMS serves as a central scoring management system. It will:

• Receive student responses and item metadata
• Send each response to the designated scoring location
• Receive scored responses back from our scoring team in an interoperable format
• Integrate scores with the rest of the test data
• Deliver the data to NEAC for final scoring and reporting

PEMS supports readers via a secure Internet connection and delivers images of student responses to those scorers for rating. The system presents the image without any information about where the document was collected. There is no information that allows a reader to form any notion of who the test taker was—gender, ethnicity, etc. The reader is presented with only the response for rating. PEMS captures the reader’s score and any exception codes the reader wishes to apply, and the record is removed from the screen, making room for the next response. Double reads are collected based on the program's business rules; third reads are generated when required, and images are routed to expert scorers for resolution. Readers are not aware whether they are performing first or second reads. Once scored, the response record with scores or codes is routed for merging with any multiple-choice results from the same document, based on the UID assigned to the answer document at the beginning of our processing. PEMS merges the various item types into a single record, and that record is then matched with registration data, allowing for easy viewing of score reports.
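To make the merge step concrete, the following is a minimal sketch under assumptions: handscored results are matched to machine-scored results from the same document by UID and then joined with registration data. The function and field names are hypothetical, not the PEMS interfaces.

```python
# Illustrative merge-by-UID sketch; not the PEMS implementation.
def merge_by_uid(mc_results, hs_results, registrations):
    """Each argument is a dict keyed by UID; returns merged records plus unmatched UIDs."""
    merged = {}
    for uid, mc in mc_results.items():
        record = {"uid": uid, "items": dict(mc)}          # machine-scored item results
        record["items"].update(hs_results.get(uid, {}))   # add handscored scores / condition codes
        record["registration"] = registrations.get(uid)   # demographic / enrollment data
        merged[uid] = record
    unmatched = set(hs_results) - set(mc_results)         # would be routed to exception handling
    return merged, unmatched
```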

Quality Assurance Procedures

CTB has expertise in producing very high quality Smarter Balanced scoring, gained through scoring the vast majority of the Smarter Balanced Pilot and Field Tests. This prior work on Smarter Balanced supports accurate scoring for NEAC. The scoring of all performance items must be highly reliable to ensure that each student’s responses receive a fair, consistent, and accurate score. To this end, CTB will employ a variety of ongoing checks and controls, including systematic administration of intra-rater reliability reads (including validity papers/check-sets and read-behinds) as well as inter-rater reliability monitoring. A ten percent blind double-read will also be used with ongoing read-behinds to further validate scoring accuracy. To ensure the accuracy and reliability of the handscoring, we will institute a series of quality processing steps:

CTB will conduct intra-rater reliability reads through targeted read-behinds as part of our standard procedure. The targeted read-behind system allows our Handscoring staff to provide timely feedback to the reader because the team leaders can discuss incorrectly scored responses with the reader as soon as a problem is detected.

Inter-Rater Reliability—In our scoring system, readers score concurrently and do not know when they are participating in inter-rater reliability monitoring. This allows us to establish inter-rater reliability statistics for all readers and for the project as a whole. Inter-rater reliability statistics can be scrutinized to determine severity or leniency trends, agreement rates, discrepancy rates, the distribution of scores, and the number of condition codes. Furthermore, inter-rater reliability statistics are an excellent source to determine team drift and team leader influence.
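As an illustration of the kinds of statistics named above, the sketch below computes agreement, discrepancy, and severity/leniency indicators from blind double-read score pairs. It is a simplified example with assumed inputs and names, not CTB’s monitoring reports.

```python
from collections import Counter

def reader_reliability(pairs):
    """pairs: (reader_score, other_read_score) tuples for one reader's double reads."""
    if not pairs:
        raise ValueError("no double reads available for this reader")
    n = len(pairs)
    exact = sum(a == b for a, b in pairs)
    adjacent = sum(abs(a - b) == 1 for a, b in pairs)
    return {
        "n_double_reads": n,
        "exact_agreement": exact / n,
        "adjacent_agreement": adjacent / n,
        "discrepancy_rate": (n - exact - adjacent) / n,          # non-adjacent disagreements
        "mean_signed_diff": sum(a - b for a, b in pairs) / n,    # > 0 suggests leniency, < 0 severity
        "score_distribution": dict(Counter(a for a, _ in pairs)),
    }
```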

Validity Papers. The purpose of validity sets is to verify that scoring remains consistent, accurate, and reflective of the scoring guides throughout the entire scoring session. By administering these pre-scored papers throughout scoring, we can ascertain whether the scoring teams/individuals are drifting from the original scoring criteria. Validity papers will be administered at pre-established intervals based on the rate of scoring, and they will appear to readers and team leaders in the same format as do actual student responses. The scores assigned to the validity papers are compared to the approved score; through this comparison, information is obtained about the accuracy and reliability of the scorer.
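A drift check against validity papers can be sketched in the same spirit; the agreement threshold and names below are assumptions for illustration, not the program’s business rules.

```python
# Illustrative validity-paper drift check; threshold and labels are assumed.
def validity_drift(assignments, min_agreement=0.80):
    """assignments: (assigned_score, approved_score) pairs from pre-scored validity papers."""
    if not assignments:
        return None
    n = len(assignments)
    agreement = sum(a == k for a, k in assignments) / n
    bias = sum(a - k for a, k in assignments) / n
    return {
        "agreement_with_approved_scores": agreement,
        "drift_direction": "lenient" if bias > 0 else ("severe" if bias < 0 else "none"),
        "flag_for_feedback": agreement < min_agreement,   # threshold is illustrative only
    }
```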

At CTB, our quality monitoring does not stop with the quality of our staff and reporting tools. We also oversee and verify the quality assurance processes themselves, and any corrective actions that result, through an internal audit program. To this end, CTB has developed an additional Handscoring branch—the Data Monitor team.

Our staff of data monitors supports the quality assurance process for all CTB programs across all sites by accessing the same reports reviewed daily by the scoring teams and by creating summary-level reports. The data monitors are responsible for identifying readers who are performing below the required quality standards and for prescribing corrective actions for those readers.

Data monitors also act in an audit capacity to assure that no issues “slip through the cracks.” Reader and team leader performance is monitored regularly by the data monitors. They monitor and audit both the data that are generated and the procedures carried out by our staff to ensure the program is implemented according to its particular specifications. Any reader who does not maintain adequate levels of scoring accuracy will have his or her scores voided after collaboration with, and a determination from, the appropriate Handscoring project manager (or his/her designee). Voided scores will be re-scored by qualified readers, and the affected readers will be retrained and allowed to return to scoring at the discretion of content leadership.

Due to the importance of the NEAC program, the scoring of the constructed-response items and essays must be highly reliable to ensure that each student’s responses receive a fair, consistent, and accurate score. To this end, CTB will employ a variety of scoring activities to ensure fair and accurate scoring of all student responses. These activities include systematic administration of intra-rater reliability reads (including check-sets and read-behinds), as well as inter-rater reliability monitoring. While these activities occur across contracts and across scoring contractors, CTB deepens its commitment to accuracy and reliability in scoring by providing an independent Data Audit Team. For the scoring of NEAC, the Data Audit Team will provide an extra layer, or “double check,” in the monitoring of the validity of assigned scores.

Data Monitoring Audit Team

At each of our Handscoring Centers, we house a Data Audit Team under the leadership of a Data Supervisor. The Data Supervisor reports directly to the Handscoring Chief Reader and provides an extra layer, or “double check,” in the monitoring of the validity of assigned scores.

Throughout the scoring window, a full complement of handscoring monitoring reports is posted daily on a CTB intranet site, centralizing reports across sites and allowing oversight of the quality assurance process by the Chief Reader. In addition to the scoring supervisors, the Chief Reader ensures that all quality processes detailed in the handscoring specifications are being maintained, that there is consistency in processing between the scoring sites, and that appropriate actions are being taken to produce the highest quality scoring.

This quality assurance team acts in an audit capacity to ensure that no issues “slip through the cracks.” We feel that having this second quality assurance team supporting the program is of paramount importance to the reliability and consistency of scoring.

Daily Controls

In addition to the studies that we will conduct to control rater drift and the specialized monitoring completed by our Data Monitoring Team, CTB’s day-to-day handscoring process incorporates several critical elements to ensure consistency in the scores the readers assign. The daily use of check sets, comparisons of item performance between field test and operational administrations, and use of original training materials all work to ensure consistency in the scoring of constructed-response items.

Handscoring Reports

The CTB electronic handscoring system provides multiple reports, at different levels of detail, to ensure consistency in scoring and to make sure that any problems on the scoring floor can be identified and remedied immediately. Monitoring reports are available to supervisory staff and data monitoring staff in real time so that any issue that could affect the reliability of scores assigned by readers is identified and corrected without delay.

Participation by Individual NEAC States

We welcome the NEAC States’ involvement throughout the daily monitoring processes. To support your involvement, we can provide a sub-set of daily monitoring reports. Additional summary- and cumulative-level reports can be provided to NEAC States according to a predetermined schedule. During handscoring, a NEAC State may also request any level of handscoring monitoring report for its review; CTB will generate this “on-demand” report and deliver it within four hours of the request. By working with NEAC States to establish the type and frequency of quality monitoring reporting, we intend for the scheduled delivery of reports to meet the needs of NEAC staff and for reports to be available before they are requested. Finally, although we do not expect NEAC States to generate their own quality monitoring reports, CTB’s distributed handscoring system can make this option available.

CTB will support NEAC State representatives being on-site for the handscoring process during reader training, qualification, and initial scoring. CTB will complete the handscoring for the NEAC States at our Indianapolis Scoring Center, so all monitoring by NEAC or State staff can occur at a single site.

8.9 Web-based Designated Support and Accommodations Data Collection System

The Smarter Balanced assessments provide students with universal tools, designated supports, and documented accommodations as described in the Usability, Accessibility, and Accommodations Guidelines (see Appendix 7). Proposals should include specifications and anticipated costs for the design and operation of a web-based data collection system for monitoring and cataloging designated supports and documented accommodations used by students during the assessment. Proposals should address the following:

8.9.1 Provide a detailed description of a secure web-based designated supports and accommodations data collection system that includes:

8.9.1.1 A submission process that allows for batch uploads of student demographic data from SDE Student Information Systems (SIS)

8.9.1.2 A submission process that allows for manual entries or batch uploads of students’ designated supports and accommodations from LEA special education management systems or Individualized Education Program (IEP) software.

8.9.1.3 A feature that allows for the review of designated supports and documented accommodation data in a roster report format

8.9.1.4 A summary report of designated supports and documented accommodations data by type

8.9.1.5 A feature that allows for a check for possible errors in data submissions

8.9.1.6 A system that will allow for the transfer of designated supports and accommodations to the test administration platform (preferably an integrated system).

8.9.1.7 A system that will allow for future modifications

8.9.1.8 A system that will allow for use with other state assessments that may require different supports and accommodations for students.

8.9.1.9 Proposers are asked to review the CMT/CAPT Accommodations Data Collection Center Help Guide (See Appendix 6), Connecticut’s current accommodations data collection system to gain a better understanding of the options and functionalities states hope to offer LEAs.

Effective pre-identification (PreID) and rostering systems are essential components of every assessment program. CTB's Online Precode System was designed to work with many states’ Student Information Systems. The Online Precode System provides a secure, efficient, and user-friendly interface for district and/or state staff members so that the collection of student demographic and accommodation data proceeds smoothly and with great accuracy. CTB's Online Precode System is fully integrated with our Computer-Based Testing (CBT) System, so data collected through the Precode System can be used either to provide student identification labels or to complete the online testing registration process. For each test administration, CTB will work with NEAC states to collect student information from each state’s information system.

When we built the Online Precode System, our intent was to save state department and district staff the time spent transferring data back and forth to CTB in order to create clean, usable data for precode documents and assignments. As detailed below, CTB will customize our data integration process to pull (and push) data between CTB's systems and the NEAC states’ student information systems. This integration will complete a pull request, at scheduled times or intervals, of the student population to accomplish the precode process, enrollment/registration, and online systems provisioning. Once extracted, student data will be loaded to CTB's Precode System for validation using our precode validation software. Valid values and missing information in any data fields that are critical to the processing of student results will be checked according to edit rules established between CTB and each NEAC state. Should errors be found, we will work with the NEAC states to make the necessary corrections.
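As an illustration of how edit rules might be applied during precode validation, the sketch below checks required fields and valid values for each uploaded record. The field names and rules are hypothetical examples, not an agreed NEAC data layout.

```python
# Hypothetical precode edit-rule check; fields and allowed values are illustrative only.
REQUIRED_FIELDS = ["state_student_id", "last_name", "first_name", "grade", "school_code"]
VALID_VALUES = {
    "grade": {"03", "04", "05", "06", "07", "08", "11"},
    "gender": {"F", "M", ""},
}

def validate_record(record):
    """Return a list of edit-rule violations for one precode record."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            errors.append(f"missing required field: {field}")
    for field, allowed in VALID_VALUES.items():
        if field in record and record[field] not in allowed:
            errors.append(f"invalid value for {field}: {record[field]!r}")
    return errors

def validate_batch(records):
    """Split a batch upload into clean records and an error report for correction."""
    clean, error_report = [], []
    for row_number, rec in enumerate(records, start=1):
        errs = validate_record(rec)
        if errs:
            error_report.append((row_number, errs))
        else:
            clean.append(rec)
    return clean, error_report
```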

For computer-based testing, once the data pass this validation process, they are available for use by the NEAC states in CTB’s computer-based testing (CBT) system, allowing NEAC members and/or individual districts to generate student test rosters that include all student precode data.

The Online Precode system supports the ability to roster students, sorted by school district, grade level, or classroom teacher. The system can accommodate data disaggregated for the following subgroups:

• Gender
• Race/ethnicity
• English Language Learner
• Individual Accommodation Plan (IAP)
• Economically disadvantaged (free or reduced-price meals)
• Students with disabilities/Individualized Education Program (IEP)
• Specific testing accommodations
• Migrant students
• Homeless students

Additional disaggregated data requirements can be added, if needed, allowing for the support of different state and federal requirements.
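The sketch below illustrates one way subgroup flags could be derived from precode records so rosters can be disaggregated as listed above; the field names and coding values are assumptions for illustration only.

```python
# Hypothetical subgroup flagging for roster disaggregation; not the Precode System schema.
SUBGROUP_FLAGS = {
    "ell": lambda r: r.get("english_language_learner") == "Y",
    "iap": lambda r: r.get("individual_accommodation_plan") == "Y",
    "econ_disadvantaged": lambda r: r.get("lunch_status") in {"F", "R"},  # free / reduced-price
    "iep": lambda r: r.get("iep") == "Y",
    "migrant": lambda r: r.get("migrant") == "Y",
    "homeless": lambda r: r.get("homeless") == "Y",
}

def tag_subgroups(record):
    """Return the set of subgroup labels that apply to one student record."""
    return {name for name, test in SUBGROUP_FLAGS.items() if test(record)}

def roster_by(records, key):
    """Group records for rostering, e.g. key='school_code', 'grade', or 'teacher_id'."""
    roster = {}
    for rec in records:
        roster.setdefault(rec.get(key), []).append(rec)
    return roster
```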

8.10 State Led Item Development

All assessment items and performance tasks for the operational assessments will be provided by Smarter Balanced. Item pools will be replenished using a process called “State Led Item Development,” which will involve states, working with their individual contractors, developing an annual quota of items using Smarter Balanced item specifications. Smarter Balanced will pay the contractors for the item development, but proposers must demonstrate the capacity, skill, and experience necessary to provide the service. It should also be noted that Smarter Balanced does not anticipate the need for State Led Item Development during the first operational year of the assessment, and anticipates approximately 200 items per content area in subsequent years. The states are requesting an overview of the Proposer’s qualifications for this task, but no specific bid is required. Each of the following should be addressed: item authoring, graphics development, tagging, and item reviews for bias/sensitivity, accessibility, content, and quality.

Item Development Design and Specifications

CTB leads the industry in item development for the CCSS, having completed not only the Smarter Balanced item development, but also a portion of the PARCC item development as a subcontractor to another vendor. Prior to our work with Smarter Balanced, CTB content teams studied the CCSS and prepared item specifications for CCSS-aligned assessments. As part of our content development work with Smarter Balanced, we revised item specifications to include evidence statements and specific task models to address the Smarter Balanced assessment targets defined in the Smarter Balanced Content Specifications.

Smarter Balanced is likely to recommend additional revisions to the approved Content and/or Item Specifications based on the results of the Spring 2014 Field Test. We will build on our familiarity with these documents to revise item specifications, as needed, for state-managed item development when it is implemented.

We also worked closely with NEAC and Smarter Balanced to define the item pool specifications and distributions of items across claims, targets, DOK, difficulty, and other parameters (e.g., task models, response types) for the initial computer adaptive pool. This work will inform the design of state-managed item development within Smarter Balanced parameters to refresh the overall Smarter Balanced item pool.

Item Writing

CTB has extensive experience writing items to the Common Core State Standards and similar college- and career-readiness standards. We support a variety of item writing models, including items developed by groups of educators as well as vendor-developed items. Our item writing process begins once the item pool specifications and item needs are defined. Qualified item writers or item writing vendors are selected, and initial overview and training sessions are scheduled. We propose to use the Smarter Balanced training materials that CTB revised/developed under Contract 16 and that are now publicly available. In addition, CTB has a number of "deep dive" training modules that were also developed to enhance item writer training for the Smarter Balanced Content Specifications. We have had success with virtual training sessions, but we could also facilitate a traditional face-to-face item writing workshop. Both models were used effectively during the Smarter Balanced Field Test pool item development. Once item specifications and item development requirements are set, CTB editors will create detailed item writing assignments based on the Smarter Balanced Item Specifications. These item writing assignments will identify the claims, targets, and task models to be assessed; the number of items or tasks of each response type or format; and the level of cognitive complexity or rigor for each item.

Selected-Response Items

When the item writers develop selected-response (multiple-choice) items, a single answer choice is clearly and irrefutably correct, and distractors represent common misconceptions and errors demonstrated by students as they acquire mastery of the content knowledge and skills. In addition, the item writers create answer choices so that there are no outliers or obviously incorrect answers to a student who does not have the knowledge, skills, or ability to answer the item correctly. All answer choices are of approximately the same length, or are paired so that an equal number of answer choices are long or short, or are arranged in increasing or decreasing length. Numerical answers are configured so no one answer stands out from the others. As with answer choice length, the format of all answer choices is parallel, or answer choices are paired so an equal number are formatted the same.

A rationale for each multiple-choice distractor is developed at the same time the item is developed. The rationale for each distractor states the misconception or error that the distractor represents.

Constructed-Response Items

The same philosophy is used with constructed-response item rubrics. In addition to response descriptions and criteria for each achievable score point, an exemplary response is given showing how a student will most likely respond. For each score point, common student misconceptions are exemplified. It is vital that a complete scoring guide for each constructed-response item is developed at the same time the item is written; doing so ensures that a fully correct response is achievable and that a range of responses can be written at each score point in order to discriminate among students’ proficiency levels.

At the end of the three-day workshop, CTB assessment specialists will finalize all items, art, and rubrics as needed to conform to Smarter Balanced or other identified style guidelines, and all items will be reviewed using CTB standard quality review processes. The focus of these internal reviews by the editors will be to ensure:

• The alignment of each item to the identified standard
• The relevance of each item to the purpose of the test
• The adherence of each item to the principles of quality item development
• The adherence of each item to the principles of universal design and plain language
• That each item has an appropriate level of item difficulty
• The accuracy of the content presented in the item
• The adherence of each item to the approved project style guide
• The appropriateness of language, graphics/artwork, charts, figures, etc. within each item
• That each item has an accurate and appropriate item stem that:
  • Presents the student with a problem to solve or a task to do
  • Is sufficiently focused and clear so that the task is understood without being dependent on the answer choices for clarification
  • Does not clue the correct answer choice
  • Will elicit the intended response(s) as indicated in the rubric/correct response/correct response rationale
  • Uses concise, precise, and unambiguous language
• That each multiple-choice item has one, and only one, correct answer
• That distractors are plausible and attractive to students who have not mastered the objective or skill
• That distractors are parallel and mutually exclusive, containing no outliers
• That distractors are accompanied by distractor rationales that are appropriate, clear, and precise
• That the content of items is fact-checked to make certain that the correct answer is indisputably true and that distractors are indisputably false; facts are verified with three sources

Interactive (Technology-Enhanced) Items

In addition to developing multiple-choice and constructed-response items, CTB is experienced in developing interactive item types and will apply this expertise to the creation of the interactive response types included in Smarter Balanced. These items, usually with a range of difficulty, allow for questions to be asked in different formats that are engaging and familiar to students.

As a standard business practice, CTB creates item templates and storyboards that serve as consistent, re-usable sets of criteria to identify and define content, narratives, images, depth of knowledge, and other elements across content areas and item types. Based on this practice, once the target content has been identified, the development of the associated technology-enhanced item involves four steps:

1. Determine if the item type can be computer-scored
2. Create or use an item template and a storyboard to guide item authoring
3. Create the rubric
4. Address universal design issues related to legibility, readability, comprehensibility, navigability, and overall accessibility

These procedures will be applied to the development of items for task models that can accommodate interactive item response types. In addition, items will be reviewed for:

• The alignment of each item to the identified standard
• The relevance of each item to the purpose of the test
• The adherence of each item to the principles of quality item development
• The adherence of each item to the principles of universal design and plain language
• That each item has an appropriate level of item difficulty
• The accuracy of the content presented in the item
• The adherence of each item to the approved project style guide
• The appropriateness of language, graphics/artwork, charts, figures, etc. within each item
• That each item has an accurate and appropriate item stem that:
  • Presents the student with a problem to solve or a task to do
  • Is sufficiently focused and clear so that the task is understood without being dependent on the answer choices for clarification
  • Does not clue the correct answer choice
  • Will elicit the intended response(s) as indicated in the rubric/correct response/correct response rationale
  • Uses concise, precise, and unambiguous language
• That each multiple-choice item has one, and only one, correct answer
• That distractors are plausible and attractive to students who have not mastered the objective or skill
• That distractors are parallel and mutually exclusive, containing no outliers
• That distractors are accompanied by distractor rationales that are appropriate, clear, and precise
• That the content of items is fact-checked to make certain that the correct answer is indisputably true and that distractors are indisputably false; facts are verified with three sources

Graphics Development

CTB will facilitate the creation of all graphics during state-managed item development. Item authors will provide rough or sketch art with a detailed description of the required art. CTB assessment editors will facilitate the development of all art by a professional graphics vendor to meet the required Smarter Balanced technical specifications for online delivery.

Content Reviews

Once items are finalized, CTB will facilitate standard content and bias/sensitivity reviews. This review should take place in an online system to allow review of items in a format similar to that seen by students. We will use the Smarter Balanced training materials for Content and Bias/Sensitivity reviews and the Smarter Balanced Quality Criteria for Mathematics as guidelines for the reviews. These materials were designed to be used effectively in a virtual training setting, and CTB successfully facilitated the content and bias/sensitivity review of over 30,000 items using virtual training and reviews. We propose to use a combination of synchronous and asynchronous virtual meetings for the content and bias/sensitivity reviews of newly developed items. This format will provide reviewers with time to review items individually (after training) as well as a live WebEx-facilitated format in which to discuss selected items for review calibration purposes.

All newly developed items will be reviewed by content review committees for alignment to item specifications, alignment to appropriate cognitive and language complexity, fairness, accessibility, reasonableness, and completeness of rubrics and scoring criteria, grade appropriateness, and technology-based presentation. The committees will use the Smarter Balanced Quality Criteria and additional checklists developed specifically for Smarter Balanced item reviews.

CTB will provide all required meeting materials for virtual trainings, including agendas, training materials (e.g., content specifications, blueprints, item pool specifications, and test item specifications), and copies of related materials, as required. We suggest an agenda that includes:

• Opening remarks
• Security and confidentiality procedures
• Overview of the program and its purpose
• Overview of the content standards
• Review of the test design, content and item specifications
• Overview of instructionally valuable, fair, and reliable assessment
• Quality criteria for each item format, such as selected-response item stems, correct responses, distractors, and interactive (TE) items
• The principles of universal design, linguistic complexity, and accessibility for all examinees
• Specifics of the item review procedures

As part of a typical training session for an item content review meeting, participants receive training on the construction of a selected-response item and the quality criteria for reviewing item stems, correct responses, and distractors. Quality criteria for other item formats, including interactive items, are presented and discussed, and a few sample items that illustrate specific review criteria are examined. The session also includes discussion of how the principles of Universal Design and linguistic complexity apply to ensuring that test items are accessible to all examinees.

The facilitator for each content area panel provides additional content-specific instruction, as appropriate. For ELA/Literacy, this may include criteria for reviewing and evaluating informational and literary texts that are shared stimuli for item sets. The facilitator also explains how to interpret item metadata and how reviewers should record their review comments in the electronic authoring system. CTB will provide all materials needed for each content review group.

In the review process, committee members ask: “Does the item:”

• Align with the construct being measured?
• Measure the content standard?
• Measure the intended reasoning skill or level of cognitive rigor?
• Test worthwhile concepts or information?
• Reflect good and current teaching practices?
• Match the test item specifications?
• Have a stem that gives the student a full sense of what the item is asking?
• Avoid unnecessary wordiness?
• Have one and only one clearly correct answer (if a selected-response item)?
• Use response options that relate to the stem in the same way?
• Use response options that are plausible and reasonable misconceptions and errors?
• Avoid having one response option that is markedly different from the others?
• Avoid clues to students, such as absolutes or words repeated in both the stem and options?
• Support a range of appropriate responses that can be scored using the rubric (if a constructed-response item)?
• Reflect content that is free from potential bias toward any person or group?
• Ensure accessibility to differently-abled students?

Bias and Sensitivity Reviews

CTB is committed to producing valid and reliable tests that are inclusive and that acknowledge diverse student populations. Bias can occur if an item or the test is measuring different things for different groups. CTB uses four procedures to reduce bias in items. The first is based on the premise that careful editorial attention to validity is an essential step in keeping potential sources of bias to a minimum. If the test includes construct-irrelevant skills or knowledge (however common), the possibility of bias is increased. Thus, careful attention is paid to content validity.

The second step is to follow the McGraw-Hill guidelines designed to reduce or eliminate bias. Item writers are directed to two McGraw-Hill publications: Guidelines for Bias-Free Publishing¹³ and Reflecting Diversity: Multicultural Guidelines for Educational Publishing Professionals (Macmillan/McGraw-Hill, 1993).¹⁴ Whenever CTB staff review items, these guidelines are kept in mind.

In the third procedure, educational community professionals who represent various ethnic groups review all items. They are asked to consider and comment on the appropriateness of language, subject matter, and representation of people. This review is often conducted by the committee that reviews items for content alignment; at other times a separate bias and sensitivity review committee is empaneled.

We will work with NEAC to conduct formal item Bias and Sensitivity review meetings for all newly developed items. These meetings can occur either simultaneously with or following the Content Review Panel meetings. We propose to hold virtual review meetings using the same format as the content review meetings described above. Bias and Sensitivity reviewers will review all new field-test items. We will work with NEAC to define a set of committee members that might include educators, school board members, parents, community and business leaders, and university or college subject-matter specialists. We recommend representation from special interest groups that shape policy in this area and from ethnic groups in each state, as well as gender balance on the committee.

 

13 McGraw-Hill. (1983). Guidelines for bias-free publishing. Monterey, CA: Author.

14 Macmillan/McGraw-Hill. (1993). Reflecting diversity: Multicultural guidelines for educational publishing professionals. Monterey, CA: Author.

During these formal reviews, all individual test items are evaluated for overall fairness and to identify items that may be biased or that contain material of a sensitive nature. The test questions must avoid gender, ethnic, age, or other stereotyping and avoid language, symbols, words, phrases, or examples that reflect a gender or ethnic bias or that are otherwise potentially offensive or inappropriate for any sample of the student population. We recommend the use of the Smarter Balanced Quality Criteria and the supplemental bias/sensitivity review checklists developed by CTB and used to review the Smarter Balanced Field Test items under Contract 16.

Just as with the Content Review Panel, CTB will work with NEAC to design the optimal virtual or face-to-face settings and to select and approve all potential participants. CTB will prepare and submit all meeting materials for approval prior to the meeting. We typically provide a walkthrough of the training presentation and meeting materials via Webinar prior to the meeting.

Accessibility Reviews

We also recommend a separate accessibility review for all items. Smarter Balanced has provided a robust set of accessibility guidelines, and each item specification document details accessibility considerations for item development. We recommend that items be reviewed by item response type using the accessibility considerations for each item type. This will allow reviewers to ensure that all items developed to a specific task model or evidence statement adhere to the guidelines for that response type.

We will facilitate accessibility reviews using the relevant Smarter Balanced training materials and a virtual set of meetings similar to those described for Content and Bias/Sensitivity reviews. We anticipate that these reviews can be carried out concurrently and the results can be combined.

All committee recommendations and suggestions for item edits are documented in the item authoring system. CTB will summarize the committee review process and results and provide summary analyses of global issues and recommended item revisions.

CTB will assist in the editing of all items, scenarios, art, and rubrics, as needed, once the recommendations are reviewed and final decisions are made by NEAC representatives. Once all committee reviews are complete, CTB will take scenarios, items, art, and rubrics through all final editorial and quality processes.

Final Quality Reviews and Approvals

CTB will be responsible for all final approvals in the Smarter Balanced Authoring and Item Banking system or other authoring/delivery platforms. Our content teams provided all final quality control checks and approvals for delivery of the Smarter Balanced Field Test and have extensive checklists for final formatting and approvals. This check for newly developed items will be similar to the quality control checks for the currently active Smarter Balanced items described in Section 8.4.2.

8.11 Web-based Analysis and Reporting System (Separate Bid Requested)

Smarter Balanced will host an interactive website that will provide a variety of options for analysis and reporting, both static and interactive (currently under development). However, the states have a history of providing high quality, interactive analysis and reporting and therefore, NEAC may choose to develop its own reporting solution. The states are requesting prospective vendors to propose an analysis and reporting system, as described below, and submit a separate bid covering development and deployment of the reporting tools. Proposals should address the following:

8.11.1 Provide a detailed description of a web-based analysis and reporting system that includes:

8.11.1.1 Downloadable student level data files in csv format

8.11.1.2 Downloadable static reports. Vendor should propose a list of reports to be provided

8.11.1.3 Interactive results analysis that includes, at a minimum, disaggregation by gender and key student groups, with a function for cross-tabulation.

8.11.1.4 Longitudinal data reporting for districts, schools and individual students

8.11.1.5 Other recommendations for functions that will provide schools with actionable data that may be used to analyze results in ways that support NEAC’s desire to make the assessments highly relevant to monitoring and improving curriculum, instruction and general classroom practices.

8.11.2 Provide information reflecting the Proposer’s experience developing digital reporting systems, with links to demo sites and/or screen prints of key features of systems the Proposer has developed.

8.11.3 Provide descriptions of security measures embedded in the system, including multi-user password systems, that will allow the system to serve as a public portal, and also an access point for confidential student level data and reports.

8.11.4 Provide information on how results from both on-line and Pencil/Paper administrations will be integrated into the reporting system.

8.11.5 Provide descriptions of administrative tools that will permit local school administrators, as well as education agency personnel, to monitor use of the system, assign new user passwords, and other functions to be recommended in the proposal.

8.11.6 Provide each state with a complete set of student level results. The format of these results will be defined by the States, in conjunction with the contractor, and may include, as appropriate, items such as: student growth factor, student assessment results including sub-scores, item level response where available, testing school, grade and content area, achievement level for each content area claim, and others as proposed by the contractor and defined by NEAC.

8.11.7 Proposers are asked to review the states’ current interactive reporting sites to gain a better understanding of the options and functionalities they hope to offer schools. Access information follows:

8.11.7.1 NECAP Analysis and Reporting System (Demo):

URL: https://reporting.measuredprogress.org/NECAPReportingVT/

User Name: DEMOADEMO1

Password: 11475

8.11.7.2 Connecticut On-line Reports

URL: http://www.ctreports.com/

User Guide: https://solutions1.emetric.net/CTDataAnalyzer/Help/HelpGuide.pdf

CTB is pleased to offer a next-generation reporting system to the NEAC states. Our primary mission is to transform standardized assessment results into meaningful opportunities to improve student learning. Providing intuitive, thorough, and accurate reports for users at all levels is a critical component of this mission. The NEAC reporting system will be built using Jaspersoft-BI, an open-source business intelligence platform and cutting-edge reporting tool aimed at helping administrators, educators, and families build on current and past performance.

This system will use advanced business intelligence to create at-a-glance report dashboards that combine data and graphical indicators of aggregate and comparative results of district and school performance. Graphs and charts can be presented side-by-side with tabular data. The graphs and charts will have drill-down links that open separate tabs in the application so that users can toggle between their summary and detail data.

Disaggregate choices, online sorting, filtering, and column format changes can be saved for re-use on each report, and all reports can, by default, be exported in PDF or Excel (XLS) formats. CTB will work collaboratively with the NEAC states to determine the specific file format for the standard data export that will be supported in the system by default. Exporting to other formats, such as CSV, HTML, ODT, and DOC, will also be configurable in the system.
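As a simple illustration of the configurable export options, the following sketch writes a disaggregated summary to CSV using Python's standard library; the column names are illustrative only and do not reflect the final report layouts.

```python
# Illustrative CSV export sketch; not the reporting platform's export engine.
import csv

def export_csv(rows, path):
    """rows: list of dicts, e.g. {"school": ..., "subgroup": ..., "n": ..., "mean_scale_score": ...}."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```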

The reporting system will also support ad-hoc reporting requirements. A drag-and-drop report design interface will allow power users to build crosstabs, pivot-tables, and chart-based reports using a standard browser or iPad device.

Modern and Mobile

The NEAC reporting system will deliver smart user interfaces in which "mouse-overs" are replaced with touch-friendly controls for selections and with menus that are fully compatible with modern tablet and smartphone browsers. Report layouts dynamically adjust to the screen size, hiding some controls and optimizing the display for viewing on a small screen. The reporting system will be built on HTML5 and CSS3 standards for optimal use with modern Web browsing platforms. These features are shown in the following figures.

Figure 14: Dynamic Floating Menu for both Small Screen and Large Screen

Figure 15: Example of Floating Menu with Touch Controls

Filters and Export Formats

The reporting system will support standard assessment, summary, and disaggregated reports. It will also provide custom research-focused tools such as comparative dashboards, predictive analysis, item analysis, cohort studies, report publishing, and data export capabilities that will empower educators in NEAC states to move the educational improvement process forward.

It will support importing and exporting of data in a variety of formats, facilitating data transfer processes with external vendors and systems. Assessment information, including demographic data, analytics, and scored results, will be easily imported for processing and reporting and will be hosted natively. CTB maintains an experienced resource pool to support ETL processes, using tools such as Informatica to automate data import or export for downstream data processing, as needed.

Security

Securing and protecting personally identifiable information (PII) for every student is imperative. The platform uses Intrusion Detection System (IDS) software at the front end to ensure confidential demographic data are protected, and data are encrypted with 256-bit or stronger encryption both in transport and at rest.
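For illustration only, the sketch below shows 256-bit AES-GCM encryption of a PII field using the open-source cryptography package; it is not the platform's actual implementation, and key management (key store, rotation) is out of scope here.

```python
# Illustrative PII field encryption at rest with 256-bit AES-GCM.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # in practice, retrieved from a managed key store

def encrypt_pii(plaintext: str, key: bytes) -> bytes:
    aes = AESGCM(key)
    nonce = os.urandom(12)                  # unique nonce per encryption
    return nonce + aes.encrypt(nonce, plaintext.encode("utf-8"), None)

def decrypt_pii(blob: bytes, key: bytes) -> str:
    aes = AESGCM(key)
    nonce, ciphertext = blob[:12], blob[12:]
    return aes.decrypt(nonce, ciphertext, None).decode("utf-8")
```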

User permissions and roles will provide control over access to the reporting data. Roles for report access may differ from the test administration roles, so users will only be given access to report data and functionality consistent with the scope of their role and organization. Teachers will be able to view data related to their own students and summary data for their school; principals will be able to view their school's assessment results; administrators will be able to view all assessment results in the district or state.
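A role-scoping check in the spirit of the access rules above could look like the following; the role names follow the description in this section, while the data structures and helper function are hypothetical.

```python
# Illustrative role-based access check for report data; not the production authorization layer.
ROLE_SCOPES = {
    "teacher": "classroom",      # own students, plus school-level summaries
    "principal": "school",
    "district_admin": "district",
    "state_admin": "state",
}

def can_view(user, report):
    """Allow access only when the report's organization falls inside the user's scope."""
    scope = ROLE_SCOPES.get(user["role"])
    if scope == "classroom":
        return report["org_id"] in user["class_ids"] or (
            report["level"] == "summary" and report["org_id"] == user["school_id"]
        )
    if scope == "school":
        return report["school_id"] == user["school_id"]
    if scope == "district":
        return report["district_id"] == user["district_id"]
    if scope == "state":
        return report["state"] == user["state"]
    return False
```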

Customization

The platform used for the reporting system has been accessed worldwide by teachers and administrators to view and analyze assessment results. It supports a robust data model, an import/export process, an ETL tool, and a graphical user interface. The platform can be modified to support a custom "skin" to represent the look and feel most appropriate for the NEAC states’ districts and schools. In addition, data filters, formats, dashboard views, drill downs, labels, headings, font sizes, and navigation paths can all be customized to provide a powerful, user-friendly tool for educators at all levels.

Four samples of dynamic reports and charts are provided in the following figures as examples of the capabilities of the reporting system.

Figure 16: Report Sample 1

Figure 17: Report Sample 2

Figure 18: Report Sample 3

Figure 19: Report Sample 4