Center for Measurement Justice · NCIEA

Assessment Request for Proposal (RFP) Development Toolkit

Jennifer Randall & Thao Vo · Juan D'Brot · Complete Design Guide for State-Level Assessments
166 Pages · 8 Chapters · 10 Compliance Reqs. · 3 Language Tiers
Introduction

Purpose of the Toolkit

A design guide for developing state-level educational assessment RFPs, covering every stage from planning and procurement through validation.

This document is a design guide for developing a Request for Proposal (RFP) for state-level educational assessments. The document details guidance and considerations for each stage of the assessment process, from initial planning and procurement to item development, administration, psychometrics, reporting, and validation procedures.

A key feature is the inclusion of ready-to-implement exemplars: tables, figures, and language blocks that can be directly incorporated into RFPs. Where relevant, the guide provides three language tiers (Foundational, Equity, and Justice) so that states can choose the level of commitment they want vendors to make in their proposals.

The Three Language Tiers

Foundational

Proposals adopting this language ensure that all students have an equal opportunity to participate in and demonstrate their learning on the assessment. This includes ensuring the assessment is accessible to students with disabilities, free from obvious bias, and administered fairly.

Equity

Proposals adopting this language go beyond acknowledging fairness and actively expand the participation of students, schools, and communities directly impacted by the proposal. This level seeks to identify and address potential disparities, with special attention to historically and systematically excluded groups.

Justice

Proposals adopting this language represent the most ambitious language in proposal writing because they explicitly recognize systemic inequities (e.g., racism, sexism, ableism, intersectionality) and their profound impact on educational outcomes. Proposals at this level would explicitly name and address these inequities and actively involve marginalized communities in the assessment development.

Chapter Scope: Required and Optional Sections

Every chapter in the toolkit contains material that some states will adopt as-is, some will customize, and some will omit. The table below identifies the default disposition of each chapter.

Chapter | Disposition | Notes
I. Procurement Process and Timeline | Required | State-specific dates and procurement vehicle. Apply your state's standard procurement boilerplate where applicable.
II. Background Information for Vendors | Required | Includes the Rightsholder Accountability Council provisions and community partnership framing. Tier choice has the largest effect here.
III. Statement of Work | Required | Project management framing. Lightly tier-sensitive.
IV. Assessment Design and Development | Required if in scope | Includes content frameworks, item development, standard setting, and reporting. AI Vendor Accountability Annex is referenced here.
V. Test Administration | Required if in scope | Includes item-adaptive testing, irregularities, remote/virtual testing definitions, scoring, AI scoring, security.
VI. Validation Efforts | Required | Validity evidence is required for any assessment used for high-stakes decisions. Justice tier moves community involvement upstream into construct articulation.
VII. Managing Risk | Recommended | Conflict of interest, issue and risk management, fiscal management. Strongly recommended for any multi-year contract.
VIII. Terms and Conditions | Required | Penalties, bid evaluation process, cost proposal. The Four-Point Scoring Rubric and Deliverables Checklists are referenced from this chapter.
Getting Started

How to Use This Toolkit — A Quick Start

Five decisions will assemble a complete RFP from this toolkit. Each step references the chapter, table, or appendix where the relevant content lives.

Step 1: Decide your tier

Choose Foundational, Equity, or Justice. The toolkit provides ready-to-use exemplar language at each tier across every section. The choice should be made by the state team that will own the procurement, with input from leadership about what your political and legal context can support. See the Tier-Selection Decision Guide for help. Your tier choice will determine which column of every tiered table you draw from. States may also opt to move back and forth between tiers for different components of the RFP.

Step 2: Define your scope

Identify which assessment components are in scope (e.g., summative only; summative plus interim; the full system). Chapter II includes a Components of the Assessment System table — complete it first. Anything you mark out-of-scope can be removed from the resulting RFP.

Step 3: Customize the state-specific placeholders

Throughout the toolkit, you will see placeholders enclosed in angle brackets (for example, <state name>, <List specific state systems>, <Subject Area>) and, for some state-determined values, in square brackets (for example, [number]). These are the only places the language is intended to change. Before releasing the RFP, search the document for the strings <, >, and [ to confirm no placeholder was missed.

Step 4: Adopt the Minimum Compliance Requirements

The toolkit includes a Minimum Compliance Requirements page. These are the commitments that are non-negotiable for any tier — including annual bias review, fair compensation for rightsholders, AI training data disclosure, and current-year accessibility (VPAT) reporting. Vendors who do not accept these commitments are disqualified, regardless of how their proposal otherwise scores.

Step 5: Plan your evaluation

Two structures support evaluation: the Four-Point Scoring Rubric (included in the front matter) and the Required Deliverables Checklists at the end of each chapter. The deliverables checklist is a completeness check — it verifies that the vendor produced every required artifact — and is used before substantive scoring begins. The four-point rubric is then applied to score the substance of each response section.

ℹ️

Pre-release checklist: Before you release the RFP, the state team should agree on the tier(s) (Step 1), the scope (Step 2), the customization choices (Step 3), any state-specific additions to the Minimum Compliance Requirements (Step 4), and how the team will divide evaluation responsibilities (Step 5). A pre-release checklist appears in the Bid Evaluation Process chapter.

Framework

Tier-Selection Decision Guide

The toolkit offers three tiers of exemplar language. Each tier escalates the substantive commitments your RFP will require of vendors.

Tier | What it commits the state to | Typical context
Foundational (Fairness) | Access, non-discrimination, baseline procedural fairness. Standards-aligned bias review. Accessibility compliance. Equal opportunity to demonstrate learning. | Appropriate in any state. A defensible floor regardless of political context. Useful when "equity" framing is unworkable.
Equity | Subgroup outcomes addressed, not just overall. Targeted outreach and engagement of historically underserved communities. Disaggregated validity, reliability, and impact analysis. Bias review committees with community representation. | Appropriate where the state has political and legal latitude to name and address group-level disparities, and where community engagement infrastructure exists or can be funded.
Justice | Community decision-making authority, not just consultation. Rightsholder Accountability Council with sign-off rights. Data sovereignty. Co-creation of constructs, items, and reporting. Material redistribution of resources to community partners. | Appropriate where state leadership is prepared to share decision rights with impacted communities and to dedicate resources to community-led governance over the contract lifetime.

Decision Questions

Question 1
Is your state's political and legal climate one in which the words "equity" and "culturally responsive" can appear in a state procurement document without triggering a disqualifying objection?
If NO → Use Foundational (Fairness) language throughout. The Foundational tier is substantively meaningful and does not foreclose later moves toward Equity or Justice.
If YES → Continue to Question 2.
Question 2
Does your state currently disaggregate assessment results by race/ethnicity, disability status, English learner status, and socioeconomic status, and act on those disaggregations?
If NO → Use Equity language. Your RFP will help build the disaggregation and response infrastructure your state needs.
If YES → Continue to Question 3.
Question 3
Is your state prepared to share decision-making authority — including item approval, reporting design, and validity evidence priorities — with a community-controlled body whose decisions the vendor cannot unilaterally override?
If NO → Use Equity language and consider moving toward Justice in the next procurement cycle. Begin building the community engagement infrastructure now.
If YES → Use Justice language. The Rightsholder Accountability Council provisions become operative.
Question 4
Does your cost proposal include line items for fair compensation of community participants, accessibility accommodations, translation, and ongoing community engagement?
If NO → These are non-negotiable across all tiers. Before releasing the RFP, add them to the cost proposal template as required line items.
If YES → Proceed.
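
The four decision questions can be read as a short decision procedure. The sketch below is one possible encoding, not part of the toolkit: the class, field, and function names are hypothetical, and it assumes Question 4 is checked independently of the tier outcome because the text describes those cost lines as non-negotiable across all tiers.

```python
from dataclasses import dataclass


@dataclass
class StateContext:
    """Answers to the four decision questions (field names are hypothetical)."""
    equity_language_viable: bool       # Question 1
    acts_on_disaggregated_data: bool   # Question 2
    shares_decision_authority: bool    # Question 3
    cost_lines_present: bool           # Question 4


def recommend_tier(ctx: StateContext) -> str:
    """Walk Questions 1 to 3 in order and return the suggested default tier."""
    if not ctx.equity_language_viable:
        return "Foundational"
    if not ctx.acts_on_disaggregated_data:
        return "Equity"
    if not ctx.shares_decision_authority:
        return "Equity"
    return "Justice"


def ready_to_release(ctx: StateContext) -> bool:
    """Question 4: the required cost line items must exist at every tier."""
    return ctx.cost_lines_present


ctx = StateContext(True, True, False, True)
print(recommend_tier(ctx), ready_to_release(ctx))
```

The function returns a single document-wide default. A mixed-tier RFP would apply the same questions section by section rather than once for the whole document.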
💡

Mixed-tier RFPs are common and appropriate. A state might adopt Justice-tier language for community partnership and validity evidence (Chapters II and VI) while using Equity-tier language for procurement procedures (Chapters I and III) where community decision rights are less applicable. The tier you choose should be section-appropriate, not necessarily document-wide.

Reference

Language Strength Legend

Every exemplar language block in this toolkit uses a controlled vocabulary so that vendors and reviewers understand what is binding, what is preferred, and what is optional.

Word | Meaning | Use when…
MUST / SHALL | Binding requirement. Non-compliance disqualifies the proposal. | The state will not negotiate this point: bias review, fair compensation for rightsholders, AI training data disclosure, current-year accessibility reporting.
WILL | Statement of fact about what the awarded vendor will do. | Describing scope of work the vendor performs once the contract is signed.
SHOULD | Strong preference. Deviation requires justification. | Practices the state strongly prefers but is willing to consider alternatives for. Use sparingly.
MAY | Optional. Vendor discretion. | Genuinely optional enhancements that do not affect evaluation.
📌

Practical convention: when a section asserts that something must happen and the vendor's response will be scored on whether they commit to it, use MUST. When the section describes what will already be true under the contract, use WILL. Reserve SHOULD for the small number of genuine preferences.

Customization Symbols Key

Every exemplar language block uses placeholder symbols where state-specific information must be inserted.

Placeholder | Meaning | Example replacement
<state name> | Full name of the state issuing the RFP | Massachusetts
<State> | State name when used as a possessive or adjective | Massachusetts'
<Subject Area> | Content area within scope | Mathematics, English Language Arts, Science
<Specific Degree/Certification> | Minimum credential the vendor's staff must hold | Bachelor's degree in subject area; valid teaching license
<Number> | Years, count, percentage, or threshold the state specifies | 3 (years); 5 (panel members)
<List specific …> | Italicized list the state replaces with state-specific items | Replace with a complete list; do not leave the angle brackets in the final RFP
[number] | Square brackets indicate a value the state should determine | 3 (innovative assessment methods)
⚠️

Before releasing the RFP, search the document for the strings <, >, and [ to confirm no placeholder was missed. A common procurement-disqualifying error is leaving a placeholder in the final document — vendors will either ask clarifying questions that delay the procurement or, worse, submit responses that quote the unfilled placeholder back to the state.
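
The placeholder search can also be scripted. The following is a minimal sketch, assuming the RFP draft is available as plain text; the regular expression and the function name `find_placeholders` are illustrative choices, not something the toolkit prescribes.

```python
import re

# Matches angle-bracket placeholders such as <state name> and square-bracket
# placeholders such as [number]. The pattern is an assumption about how
# placeholders look in a draft; it is not defined by the toolkit.
PLACEHOLDER = re.compile(r"<[^<>\n]+>|\[[^\[\]\n]+\]")


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs for every match in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((line_no, match.group()))
    return hits


draft = (
    "The <state name> Department of Education seeks [number] vendors.\n"
    "Proposals are due at the procurement office."
)
for line_no, token in find_placeholders(draft):
    print(f"line {line_no}: {token}")
```

A pattern like this only finds matched pairs, so a stray unmatched < or [ would still need the manual string search. It will also flag legitimate bracketed text, so each hit should be reviewed by a person.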

Evaluation

Universal Rubric

Apply the same 0–3 scale across every section of the RFP. The descriptors below define what each score level means in general; chapter-specific anchors follow.

0: Does not address

The response is missing, restates the requirement without substance, contradicts the requirement, or refers to a future state that the vendor commits no resources to build.

1: Minimally addresses

The response acknowledges the requirement and describes intent, but lacks specifics on process, timeline, qualifications, or evidence. Reads as boilerplate. Does not provide reviewers with anything to evaluate beyond the assertion itself.

2: Adequately addresses

The response describes a named process, identifies roles and qualifications, names specific deliverables and a timeline, and includes at least one concrete example from prior work that demonstrates the vendor has done this before.

3: Exceeds expectations

The response provides a documented protocol with evidence of past impact, names rightsholder partners with letters of support, specifies decision rights including community sign-off where applicable, and demonstrates a track record of having changed practice in response to community input. The response anticipates implementation challenges and addresses them.

📋

Scoring convention: a section is evaluated holistically. If a response includes one paragraph of substance and three paragraphs of restated requirement, the score is 1, not 2. Reviewers should be able to point to the specific evidence that supports a score of 2 or 3.
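
A scoring sheet can enforce the evidence expectation mechanically. The sketch below is a hypothetical record format, not part of the toolkit: it checks that a score is on the 0 to 3 scale and that any score of 2 or 3 carries at least one cited piece of evidence.

```python
from dataclasses import dataclass, field

# Level labels taken from the rubric descriptors.
LEVELS = {
    0: "Does not address",
    1: "Minimally addresses",
    2: "Adequately addresses",
    3: "Exceeds expectations",
}


@dataclass
class SectionScore:
    """One reviewer's score for one response section (hypothetical record)."""
    section: str
    score: int
    evidence: list[str] = field(default_factory=list)


def check_score(entry: SectionScore) -> list[str]:
    """Return a list of problems; an empty list means the entry is usable."""
    problems = []
    if entry.score not in LEVELS:
        problems.append(f"{entry.section}: score must be 0, 1, 2, or 3")
    elif entry.score >= 2 and not entry.evidence:
        problems.append(
            f"{entry.section}: a score of {entry.score} needs cited evidence"
        )
    return problems


print(check_score(SectionScore("Chapter VI", 3)))
```

The check only confirms that evidence was recorded. Judging whether that evidence actually supports the score remains a reviewer decision.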

Reference

Customization & Symbols Key

The complete Customization Symbols Key, including all placeholders and example replacements, appears in the Language Strength Legend section. Use that key to ensure no placeholder is missed in the final document.

Compliance Floor

Minimum Compliance Requirements

Commitments that the state Department of Education will not negotiate. Vendors who decline any MCR are disqualified before substantive evaluation begins.

🔒

How these requirements operate: The MCRs are scored as a single binary: accept or decline. A vendor who accepts every MCR proceeds to substantive scoring; a vendor who declines any MCR does not. Vendors may propose stricter language — they may not propose weaker language. This section applies at all three tiers.
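
The accept-or-decline gate is simple enough to express directly. This is a minimal sketch under stated assumptions: the function name is hypothetical, MCRs are keyed by their numbers 1 through 10, and a missing answer is treated as a decline.

```python
MCR_COUNT = 10  # the toolkit lists ten Minimum Compliance Requirements


def passes_mcr_gate(accepted: dict[int, bool]) -> bool:
    """True only if the vendor accepted every MCR, numbered 1 through 10.

    A missing entry counts as a decline; that choice is an assumption.
    """
    return all(accepted.get(n, False) for n in range(1, MCR_COUNT + 1))


all_accepted = {n: True for n in range(1, MCR_COUNT + 1)}
one_declined = {**all_accepted, 7: False}

print(passes_mcr_gate(all_accepted))  # True
print(passes_mcr_gate(one_declined))  # False
```

Because the gate is binary, it produces no partial credit: a vendor either proceeds to substantive scoring or does not.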

MCR-1: Annual Bias and Sensitivity Review
The vendor MUST conduct an annual bias and sensitivity review of all operational assessment items. The review MUST be conducted by a committee that includes external members with documented qualifications, including at minimum: one representative of a disability advocacy organization; one representative of a multilingual learner advocacy organization; one community representative drawn from the state's rightsholder map; and one practitioner with documented expertise in culturally responsive assessment practices. The vendor MUST produce a public-facing annual summary report of the bias and sensitivity review's findings, items modified as a result, items removed as a result, and items retained over committee objection (if any). The summary report MUST be delivered to the state within 90 days of the review's conclusion and MUST be written so non-specialist audiences can understand it.
Rationale: Annual bias review is the single most common assessment-quality commitment vendors make in proposals and the single least frequently verified commitment in practice. Requiring an external review committee with named seat categories converts the commitment from a promise to a verifiable obligation.
Verification: (a) The public-facing annual summary report; (b) committee composition records submitted to the state at the start of each review cycle; (c) at the state's discretion, an independent third-party audit of the review process (see MCR-10).
MCR-2: Disaggregated Reporting
The vendor MUST produce disaggregated reports of assessment outcomes, validity evidence, and reliability evidence for every state-defined reporting subgroup. At minimum, disaggregation MUST include race and ethnicity, disability status, English learner status, socioeconomic status, gender, and any additional subgroups identified by the state. Where intersectional disaggregation is statistically defensible, the vendor MUST also report at the intersection of subgroups. The vendor MUST define and disclose the small-cell suppression rules used to protect student privacy in disaggregated reports. The vendor MUST NOT use suppression as a mechanism for hiding disparities.
Rationale: Disaggregated reporting is the prerequisite for identifying and addressing disparate impact. Including this as a Minimum Compliance Requirement ensures disaggregation is not negotiable away during contract negotiations or budget cuts.
Verification: (a) Sample disaggregated reports submitted with the proposal; (b) written disclosure of suppression rules; (c) review of disaggregated outputs from the first operational reporting cycle.
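
To make the idea of a disclosed suppression rule concrete, the sketch below applies a simple minimum-cell-size rule to subgroup counts. The threshold of 10, the subgroup labels, and the function name are hypothetical; the toolkit leaves the actual rule to be defined and disclosed by the vendor.

```python
from typing import Optional

MIN_CELL_SIZE = 10  # hypothetical threshold; the real rule must be disclosed


def suppress_small_cells(
    counts: dict[str, int], min_size: int = MIN_CELL_SIZE
) -> dict[str, Optional[int]]:
    """Replace any subgroup count below min_size with None (suppressed)."""
    return {
        group: (n if n >= min_size else None) for group, n in counts.items()
    }


tested = {"Subgroup A": 120, "Subgroup B": 7, "Subgroup C": 10}
report = suppress_small_cells(tested)
suppressed = [group for group, n in report.items() if n is None]

print(report)
print(f"{len(suppressed)} of {len(report)} cells suppressed")
```

Reporting how many cells were suppressed alongside the results is one way to show that suppression is protecting privacy and not hiding disparities. This sketch does not handle complementary suppression, where additional cells are withheld so that a suppressed value cannot be recovered from totals.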
MCR-3: Fair Compensation for Rightsholders
The vendor MUST fairly compensate all rightsholder participants for their time and expertise. This includes community members, parents of students, students of legal age, community-based advocates, fairness review committee members, item review panelists, standard-setting panelists, cognitive lab participants, and any other rightsholder serving in a contracted role on the work. Compensation rates MUST be competitive with consulting rates for comparable subject-matter expertise and MUST be disclosed in the cost proposal as a line item titled "Rightsholder Compensation Schedule." Honorary or token amounts do not satisfy this requirement. The vendor MUST also budget for accommodations that enable participation, including childcare, transportation, technology access, and qualified human translation.
Rationale: Without fair compensation, community-engagement commitments default to extractive arrangements in which the state and vendor benefit from community expertise without compensating it.
Verification: (a) The Rightsholder Compensation Schedule line item in the cost proposal; (b) the accommodations budget line; (c) audit of payment records during periodic management monitoring meetings.
MCR-4: Documented Community Engagement Across the Assessment Lifecycle
The vendor MUST conduct documented community engagement at each phase of the assessment lifecycle: planning, item development, field testing, scoring, reporting, and validation review. "Documented" means the vendor MUST produce, for each phase: (a) a record of which rightsholders were engaged; (b) the specific input received; (c) the design or operational decisions affected by that input; and (d) decisions made over rightsholder objection, if any. The vendor MUST specify, at the start of the contract, the decision rights held by community engagement bodies at each phase (advisory, approval, or veto authority).
Rationale: Community engagement without traceability is not verifiable. The lifecycle scope (six phases) ensures engagement is not collapsed to a single late-stage focus group.
Verification: (a) The engagement-to-decision crosswalk produced for each phase; (b) periodic management monitoring meetings that include community representation; (c) the Rightsholder Accountability Council's annual report to the state.
MCR-5: Artificial Intelligence Disclosure and State Oversight (Conditional)
This requirement applies if the vendor uses artificial intelligence at any stage of the assessment lifecycle — including item generation, item review, scoring, reporting, anomaly detection, or quality control. The vendor MUST provide an AI Disclosure that includes: (1) each AI model used and its role; (2) training data sources and demographic composition; (3) a justification for the base architecture; (4) bias auditing procedures; (5) the community review process for AI-generated content; and (6) human-in-the-loop procedures. The state retains the right to require the vendor to replace any AI model whose disparate-impact performance exceeds defined thresholds.
Rationale: Stakeholder reviewers identified AI as the highest-risk and least-disclosed component of contemporary assessment contracts. State veto authority over model selection is critical: without it, the vendor's third-party AI vendor effectively controls a substantial portion of the assessment.
Verification: (a) The AI Disclosure document submitted with the proposal; (b) annual bias audits with disclosed methodology; (c) independent algorithmic audits at the state's discretion; (d) sample human-review records during operational monitoring.
MCR-6: Accessibility Compliance and Specific Population Plans
The vendor MUST provide a current-year (issued within twelve months of proposal submission) Voluntary Product Accessibility Template (VPAT) report covering the assessment platform, item types, score reports, family-facing reports, and any administrator-facing systems. A VPAT older than twelve months does not satisfy this requirement. The vendor MUST also provide: (a) a remediation plan, with target dates, for every accessibility gap disclosed in the VPAT; (b) a specific plan for students with severe cognitive disabilities; (c) a specific plan for students who cannot access online testing; and (d) a multilingual learner accommodations plan that goes beyond translation to address linguistic and cultural variation in responses.
Rationale: Stakeholder reviewers identified students with severe cognitive disabilities and students who cannot access online testing as systematically underserved by contemporary state assessment contracts.
Verification: (a) The current-year VPAT report; (b) the remediation plan with named target dates; (c) the specific population plans, each signed by a named advocacy partner; (d) operational verification during the first administration cycle.
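
The twelve-month currency rule for the VPAT is a date comparison that reviewers can apply consistently. The sketch below is one reading of that rule, with hypothetical function names: it treats a VPAT issued exactly twelve months before submission as still current, which is an assumption the state may want to settle explicitly.

```python
from datetime import date


def one_year_before(d: date) -> date:
    """Same calendar day one year earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # d is Feb 29 and the prior year has no Feb 29
        return d.replace(year=d.year - 1, day=28)


def vpat_is_current(issued: date, submitted: date) -> bool:
    """True if the VPAT was issued within the twelve months before submission."""
    return one_year_before(submitted) <= issued <= submitted


print(vpat_is_current(date(2024, 3, 15), date(2025, 3, 1)))  # True
print(vpat_is_current(date(2024, 2, 28), date(2025, 3, 1)))  # False
```

A VPAT dated after the submission date is also rejected here, since it could not have been part of the submitted proposal.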
MCR-7: Conflict of Interest Disclosure
The vendor and every named subcontractor MUST disclose any actual or potential conflict of interest in the proposal. Disclosable conflicts include: (a) existing or recent contracts with the state; (b) employment, consulting, or advisory relationships between vendor staff and current or former state officials; (c) ownership stakes in competing companies; (d) prior involvement in litigation involving the state's assessment system; and (e) any compensation arrangement that would be paid to a state official in connection with this contract. The vendor MUST update the conflict-of-interest disclosure when material new conflicts arise during the contract term, within 30 days.
Rationale: Naming conflict-of-interest disclosure as an MCR aligns it with the rest of the floor: a vendor that cannot disclose its conflicts cannot proceed to substantive evaluation.
Verification: (a) The conflict-of-interest disclosure submitted with the proposal; (b) annual recertification during the contract term; (c) reference to the state's procurement office's standing conflict-of-interest verification procedures.
MCR-8: Data Privacy, Security, and Community Data Access
The vendor MUST comply with all applicable federal and state data privacy laws including FERPA, COPPA where applicable, and any state-specific student data privacy statute. The vendor MUST NOT use student-level data for any purpose outside the contracted work without the state's written authorization for each such use. The vendor MUST provide community partners with access to disaggregated, suppression-protected data in non-technical, accessible formats. Where the state adopts the Justice tier, the vendor MUST honor any community data sovereignty agreement entered into between the state and a specific community.
Rationale: Without this MCR, vendors routinely provide community access only through formal data requests reviewed for vendor convenience, effectively gating data behind administrative friction.
Verification: (a) The vendor's data privacy compliance attestation; (b) sample community-facing data reports submitted with the proposal; (c) review of data access logs during periodic management monitoring; (d) review of any community data sovereignty agreements in force.
MCR-9: Validity with Targeted Populations
The vendor MUST conduct cognitive labs and think-aloud studies with targeted student populations as part of validity evidence collection. "Targeted populations" MUST include, at minimum: students with severe cognitive disabilities; multilingual learners across the languages most commonly spoken in the state; students from communities who have been historically and systematically excluded by state assessment; and any additional populations identified by the Rightsholder Accountability Council or fairness review committee. Cognitive lab participants MUST be fairly compensated under the Rightsholder Compensation Schedule.
Rationale: Validity is frequently asserted but rarely demonstrated specifically across historically and systemically excluded (HSE) populations. Naming specific populations prevents vendors from defaulting to convenience samples.
Verification: (a) The cognitive lab study plan submitted with the proposal; (b) participant recruitment records demonstrating coverage of named populations; (c) cognitive lab reports disaggregated by population; (d) traceability from cognitive lab findings to item revisions.
MCR-10: Independent Third-Party Review Authority
The vendor MUST permit, at the state's discretion and expense, an independent third-party review of any element of the contracted work. Reviewable elements include, without limitation, bias and sensitivity review processes; AI training data, training procedures, and outputs; accessibility audits; validity evidence and supporting analyses; community engagement records; and cost-proposal compliance. The vendor MUST NOT impose contractual restrictions on the scope of any state-commissioned independent review, on the publication of the state's findings from the review, or on the independent reviewer's access to underlying data, methods, or staff.
Rationale: Without independent review authority, the state's ability to verify any other MCR depends on the vendor's self-reporting. Independent review is the backstop that makes every other MCR enforceable.
Verification: (a) The vendor's written acceptance of this requirement; (b) the absence of conflicting language elsewhere in the proposal; (c) operational verification when an independent review is commissioned.

Vendor Attestation Checklist

This checklist mirrors the Vendor Attestation Form. A signed Vendor Attestation is a condition of bid eligibility.

Acceptance — Vendor Attestation Form
Chapter I

Procurement Process and Timeline

This chapter outlines the procurement process for the statewide assessment system and provides a detailed timeline and key dates.

Purpose of the Procurement

This section should outline the state's purpose in seeking proposals from qualified vendors to develop, implement, and maintain a comprehensive statewide assessment system. This system will be a critical tool for measuring student achievement, informing instructional practices, and ensuring accountability across the state's educational landscape.

Process Overview

To facilitate a fair and transparent evaluation of all submissions, the procurement process will follow a competitive sealed proposal method.

# | Stage | Description
01 | Release of RFP | Official release of this document outlining the assessment system's requirements and specifications.
02 | Vendor Q&A Period | A designated period for vendors to submit questions seeking clarification on the RFP.
03 | Proposal Submission Deadline | The deadline for vendors to submit their complete proposals.
04 | Proposal Evaluation | A comprehensive evaluation of all submitted proposals by a designated evaluation committee.
05 | Oral Presentations (optional) | If necessary, vendors may be invited to give oral presentations to clarify aspects of their proposals.
06 | Contract Negotiation and Award | Negotiation and award of the contract to the selected vendor(s).
07 | Implementation and Transition | The start of the contract, including system implementation and transition activities.

Key Dates — Example Timeline

Activity | Date | Notes
Release of RFP | [Date] | [Notes]
Vendor Question Deadline | [Date] |
State Answers to Vendor Questions | [Date] |
Proposal Evaluation Period | [Date] |
Oral Presentations (If Applicable) | [Date] |
Notice of Intent to Award | [Date] |
Contract Negotiation Period | [Date] |
Contract Award | [Date] |
Contract Start Date | [Date] |
Chapter II

Background Information for Vendors

This chapter equips vendors with a thorough understanding of the state's assessment system, its governing policies, and the diverse educational landscape within which it operates.

Glossary of Terms

Term | Definition
ALD | Achievement Level Descriptor
LDS | Longitudinal Data System
SWD | Students with Disabilities
REM | Racially and Ethnically Minoritized
RAC | Rightsholder Accountability Council
HSE | Historically and Systemically Excluded
ToA | Theory of Action
UDL | Universal Design for Learning
VPAT | Voluntary Product Accessibility Template

Rightsholders, Clients, and Partners

Rightsholders are individuals or groups who are directly affected by the assessment's outcomes but hold minimal to no direct decision-making power in its design or implementation. Examples include students, parents, and local community-based advocates.

Partners are individuals or groups who have a vested interest in the assessment and hold some decision-making power in the assessment development processes. Partners include legislators, the Governor's office, the state budget office, the state procurement office, the state auditor's office, and local school boards.

Rightsholder Accountability Council (RAC)

The RAC is a community engagement body composed of individuals and groups representing intersecting marginalized populations from the local community. Input provided by the RAC helps inform the intended context of the assessment, provides meaningful and actionable feedback concerning the intended outcomes, and informs the selection of the most relevant and appropriate measures to evaluate them.

Member | Proportion | Frequency
Students | 30% | Ongoing and iterative participation across all assessment development stages, including construct definition, review and development, administration, planning, reporting, and interpretation of results.
Parents/Guardians | 30% | Ongoing and iterative participation across all assessment development stages. Regular feedback sessions and surveys to capture student perspectives.
Community-Based Advocates | 20% | Regular and consistent engagement throughout the assessment lifecycle. Cadence can be quarterly or as needed during key development stages.
Educators/School Staff | 20% | Regular and consistent engagement throughout the assessment lifecycle, focusing on professional expertise on assessment design and implementation. Cadence can be quarterly or as needed.
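
The membership proportions translate into seat counts once a council size is chosen. The sketch below shows one way to do that arithmetic; the council sizes and the function name are hypothetical. It assumes largest-remainder rounding, with ties broken in table order, because the toolkit does not say how fractional seats should be handled.

```python
# Percentages from the RAC membership table.
RAC_PERCENT = {
    "Students": 30,
    "Parents/Guardians": 30,
    "Community-Based Advocates": 20,
    "Educators/School Staff": 20,
}


def allocate_seats(total_seats: int) -> dict[str, int]:
    """Split total_seats by RAC_PERCENT using largest-remainder rounding."""
    seats = {}
    remainders = {}
    for group, pct in RAC_PERCENT.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        seats[group], remainders[group] = divmod(total_seats * pct, 100)
    leftover = total_seats - sum(seats.values())
    # Give any remaining seats to the groups with the largest remainders.
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for group in by_remainder[:leftover]:
        seats[group] += 1
    return seats


print(allocate_seats(10))
print(allocate_seats(12))
```

For a ten-seat council the split is exact. For sizes that do not divide evenly, the rounding rule decides which groups gain the extra seats, so the state may prefer to set that rule explicitly.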

Assessment System Background — Exemplar Language

Exemplar Language: Assessment System Background
Foundational
"The proposed assessment system will fully comply with the state's governing documents to ensure all students have equal opportunity to demonstrate their knowledge. The state also requires that all assessments offer appropriate accommodations for different subgroups of students. The proposed assessment system will fully comply with these requirements and provide clear reporting of assessment results, including disaggregated data for subgroups, as required by law…"
Equity
"The proposed assessment system recognizes that score differences across student subgroups often indicate systemic inequities. Therefore, the proposal will analyze disaggregated data, paying close attention to historically marginalized groups. When describing differences between subgroups, the proposal will use 'opportunity gap' to highlight systemic factors such as inequitable access to resources and quality instruction…"
Justice
"The proposed assessment system is grounded in the understanding that systemic inequities disproportionately impact marginalized communities. Terms like 'achievement gap' perpetuate harmful narratives and fail to address the systemic factors contributing to these disparities. Thus, proposals should use language that brings attention to these systemic barriers, such as 'education debt,' to reflect the historical and ongoing injustices impacting student outcomes for marginalized learners…"

Major Reference Documents

Code | Document
A | The Standards for Educational and Psychological Testing
B | White paper on common accessibility language for states and assessment vendors
C | Criteria for Procuring and Evaluating High Quality Assessments
D | Operational Best Practices for Statewide Large-Scale Assessment Programs
E | Culturally responsive assessment: Provisional principles
F | Strategies that address culturally responsive evaluation

Chapter III

Statement of Work

Project and program management requirements for the assessment system vendor, including key contacts, documentation, change management, and scheduling.

Program Management

The vendor shall establish and maintain effective program management practices throughout the contract. Key areas include:

  • Key Contacts — Designated contacts on both the state and vendor sides for all communications.
  • Executive Management Meetings — Regular meetings between senior leadership to review program status and resolve escalated issues.
  • Documentation Repository — A centralized, state-accessible repository for all project documents, deliverables, and decision logs.
  • Change Management — A formal process for managing changes to scope, schedule, and budget, with written approval from the state required before implementation.
  • Annual Kickoff Meetings — Formal kickoff at the start of each contract year to align on priorities, schedule, and any program changes.
  • Periodic Management Monitoring Meetings — Regular monitoring meetings (typically monthly or quarterly) that include community representation.

Project Management

The vendor shall maintain a project management approach that includes a detailed project schedule updated monthly, a project management team with named roles and responsibilities, and a plan for finalizing the Theory of Action (ToA) if applicable.

Communication Support

All communications between vendors and rightsholders/partners, including direct meetings and presentations, must be conducted jointly with the client (the state) to maintain transparency and equity of process.

Disaster Planning and Recovery

The vendor shall provide a documented disaster planning and recovery plan that covers system failure scenarios, data recovery timelines, and communication protocols to ensure assessment continuity.

Chapter IV

Assessment Design and Development

Content frameworks, item development, culturally responsive assessment principles, standard setting, and reporting requirements.

Content Frameworks and Standards

This section requires vendors to demonstrate full alignment with state content standards, including annotations that identify the depth, breadth, and complexity the assessment must measure. Vendors must provide Content Standards and Annotations that map each standard to test blueprint specifications.

Achievement Level Descriptors (ALDs)

Vendors must develop ALDs that describe what students at each performance level know and can do. At the Justice tier, ALDs must be developed in collaboration with the Rightsholder Accountability Council (RAC). ALDs must also be written in language accessible to non-specialist audiences, including parents and community members.

Innovation in Assessment Design

Student Choice — Foundational

Proposals MUST incorporate student choice in assessment topics, allowing students to select from various relevant options to increase engagement and demonstration of understanding within areas of personal interest.

Student Choice — Equity

Proposals MUST incorporate student choice in assessment topics, providing culturally relevant and diverse content that reflects students' lived experiences and backgrounds, addressing potential biases in topic selection.

Student Choice — Justice

Proposals MUST incorporate student choice in assessment topics, allowing students to co-create and define assessment topics based on their community- and culturally-identified needs and interests, promoting student agency and empowerment.

Item Development

The foundation of a fair and accurate assessment lies in the expertise and diversity of its item writers. Item writer recruitment must emphasize qualifications that value cultural responsiveness, implement proactive recruitment strategies, and establish inclusive hiring procedures.

Principles of Culturally Responsive Assessment for Item Writing

Principle | Description | Key Red Flags
Validating | Assessment leverages students' cultural knowledge, strengths, and backgrounds to create relevant assessments that bridge the gap between academic concepts and lived experiences. | Marginalized cultural knowledge framed as "alternative" or "supplementary" rather than central.
Comprehensive and Inclusive | Assessment employs cultural resources to maintain students' ethnic identities, community connections, and success ethic across all content areas. | One cultural frame as default, others as "also represented."
Multidimensional | Assessment is connected to curriculum and standards, allowing students to see the connection between curriculum, lived experiences, and the assessment. | Multidimensional construct forced into unidimensional psychometric model.
Empowering | Assessment is asset-based, leveraging and highlighting what students know and do well; encouraging collaborative problem-solving and cultural capital acquisition. | Score reports leading with deficits.
Transformative | Assessment items nurture a sense of obligation to communities and society, empowering students to be social critics and agents of change. | "Transformative" framed as individualistic upward mobility rather than community-level change.
Emancipatory | Assessment items challenge the notion of absolute scholarly truth, empowering students to contest and contextualize multiple perspectives. | Single "correct" answer where multiple are defensible.
Humanistic | Assessment items foster a deeper understanding of self and others and promote empathy and interconnectedness across diverse ethnic, racial, and social groups. | Token diversity: one item or character per identified group.
Normative and Ethical | Assessment items expose cultural biases and challenge Eurocentric norms inherent to mainstream educational policies and practices. | Assessment claims to be culturally neutral.

Standard Setting

Vendors must provide a documented standard-setting plan that includes a qualified external evaluator, a detailed process for incorporating rightsholder feedback into performance level definitions, and a plan for monitoring classification accuracy and consistency.

Reporting

Reporting must consider multiple audiences — students, families, educators, and policymakers — and must be designed with input from the RAC. Reports must be written in asset-based language and must account for measurement error in all score presentations.
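
One common way to account for measurement error in a score presentation is to report a band around the observed score rather than a single point. The sketch below assumes the classical test theory relationship SEM = SD × sqrt(1 − reliability). The function names and all numeric values are hypothetical, and an operational program would more likely use conditional standard errors from its own psychometric model.

```python
import math

def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)

def score_band(observed: float, sem: float, z: float = 1.0) -> tuple:
    """Return the (low, high) band of +/- z SEMs around an observed score."""
    return (observed - z * sem, observed + z * sem)

# Hypothetical scale values, for illustration only.
sem = standard_error_of_measurement(score_sd=30.0, reliability=0.91)
low, high = score_band(observed=512.0, sem=sem)
print(f"Score 512, likely range {low:.0f} to {high:.0f}")
```

A report built this way can show the range alongside the score in asset-based language, for example "your score is most likely between 503 and 521."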

AI in Item Development: If AI is used for item generation, vendors must comply with the AI Vendor Accountability Annex (Annex A). Key requirements include a documented Training Data Disclosure, a bias auditing protocol, and mandatory human-in-the-loop review of every AI-generated item before it reaches students.

Chapter V

Test Administration

Item-adaptive testing, remote testing, scoring procedures, AI scoring requirements, test security, and accessibility.

Item-Adaptive Testing

For assessments using adaptive testing algorithms, vendors must document the adaptive algorithm, including its psychometric basis, selection criteria, and how it handles irregularities such as connectivity failures, student disengagement, or anomalous response patterns.

Remote and Virtual Testing

Vendors must define "remote testing" and "virtual testing" as used in the contract, and must provide a comprehensive plan for students who cannot access online testing environments — including an alternative-format administration option with the same content coverage as the online assessment.

Scoring

  • Non-AI-based automated scoring — Rule-based and pattern-matching systems must be documented with clear evidence of validity and reliability across student subgroups.
  • Hand-scoring — Vendors must document scorer training, reliability monitoring, and the process for resolving scoring discrepancies.
  • AI Scoring — If AI is used for scoring, vendors must comply with Annex A requirements, including bias auditing across all reporting subgroups, human review of a defined percentage of AI-scored responses, and community review of AI scoring rubrics.

Test Security

Vendors must provide comprehensive test security procedures and protocols covering pre-administration, administration, and post-administration periods. Security plans must address both physical and digital security, including procedures for detecting and responding to security incidents.

Online Dynamic Reporting System

The vendor must provide an online reporting system that allows educators and administrators to access disaggregated score reports, filter by reporting subgroup, and download data files in accessible formats for community partners.

Deliverables Checklist (Test Administration): Accessibility and Administration plan · Assistive Technology integration specs · AI Scoring documentation (if applicable) · Test Security procedures and qualifications.

Chapter VI

Validation Efforts

Claims, validity evidence, innovative methodological approaches, psychometrics, and the validation argument.

Claims and Validity Evidence

Vendors must develop a validity argument that identifies the intended inferences from assessment scores (claims) and marshals evidence to support each claim. The validity argument must address five major sources of validity evidence as identified in the Standards for Educational and Psychological Testing:

  • Content evidence — Alignment between assessment content and the constructs being measured.
  • Response process evidence — Evidence that students engage with items in ways aligned with the intended construct, including cognitive lab evidence across targeted populations.
  • Internal structure evidence — Psychometric evidence that the assessment behaves as intended, including factor analyses and fit statistics.
  • Relations to other variables — Convergent and discriminant validity evidence showing the assessment relates to other measures as expected.
  • Consequences evidence — Evidence regarding the impact of assessment use on students and institutions, including disparate impact analyses across all reporting subgroups.

Psychometrics

The vendor must document its psychometric model and calibration procedures, including IRT model selection, field test analyses, operational item analyses, and scaling and equating procedures. All psychometric analyses must be disaggregated by reporting subgroup to support detection of differential item functioning (DIF) and disparate impact.
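
As an illustration of what a disaggregated DIF analysis can involve, the sketch below computes the Mantel-Haenszel common odds ratio and the MH D-DIF statistic (−2.35 × ln of the odds ratio) for a single item. The toolkit does not prescribe this method. It is shown here as one widely used option, and the function names, group labels, and data are hypothetical.

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(records):
    """records: iterable of (group, stratum, correct) tuples, where group is
    "reference" or "focal", stratum is a matched total-score band, and
    correct is 0 or 1. Returns (common odds ratio, MH D-DIF)."""
    # Per stratum: [ref_correct, ref_incorrect, focal_correct, focal_incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in records:
        if group not in ("reference", "focal"):
            raise ValueError(f"unknown group: {group}")
        offset = 0 if group == "reference" else 2
        tables[stratum][offset + (0 if correct else 1)] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these data")
    alpha = num / den
    # Negative D-DIF means the item favors the reference group.
    return alpha, -2.35 * math.log(alpha)

def expand(group, stratum, n_correct, n_incorrect):
    """Build hypothetical response records from counts."""
    return [(group, stratum, 1)] * n_correct + [(group, stratum, 0)] * n_incorrect

records = (expand("reference", "low", 8, 2) + expand("focal", "low", 5, 5)
           + expand("reference", "high", 9, 1) + expand("focal", "high", 6, 4))
alpha, d_dif = mantel_haenszel_dif(records)
print(f"MH odds ratio {alpha:.2f}, MH D-DIF {d_dif:.2f}")
```

An odds ratio above 1 indicates that, among students matched on total score, the reference group had higher odds of answering correctly. Items flagged this way would then go to content and bias review rather than being dropped automatically.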

Validation Argument and Peer Review

The assembled validity evidence must be synthesized into a Validation Report that is reviewed by an independent peer review panel. The peer review panel must include at least one member with documented expertise in measurement equity and one community member representing the RAC's perspective on the validity evidence.

Justice Tier Note: At the Justice tier, community involvement in validation moves upstream into construct articulation — the RAC must be involved in defining what "success" means before the construct is translated into test specifications. Community-defined constructs of success must be documented and traceable to assessment design choices.

Chapter VII

Managing Risk

Conflict of interest, issue and risk management, and fiscal management provisions.

Conflict of Interest

The vendor must disclose all actual or potential conflicts of interest at the time of proposal submission and must update this disclosure throughout the contract term. See MCR-7 for the complete list of disclosable conflicts.

Issue and Risk Management

The vendor must provide an Issue and Risk Management Plan that includes a risk register updated at each periodic management monitoring meeting, a defined escalation path for high-severity risks, and a process for involving the RAC in risk assessments that affect community engagement or data access.

Fiscal Management

The vendor must maintain documented fiscal management practices including budget tracking by deliverable, a process for managing scope change and associated budget adjustments, and annual financial reporting to the state that includes actual vs. budgeted spend on rightsholder compensation and community engagement.

Deliverables Checklist (Managing Risk): Conflict of Interest Disclosure · Risk Register · Fiscal Management Plan · Annual Financial Report with Rightsholder Compensation accounting.

Chapter VIII

Terms & Conditions

Penalties, bid evaluation process, and cost proposal requirements.

Penalties

The contract must specify financial penalties for breach of MCRs and substantive deliverable failures. Penalty structures should be calibrated to the severity and impact of the breach on students and communities.

Bid Evaluation Process

The evaluation process proceeds in two stages:

  • Stage 1: MCR Compliance Check — The evaluation team verifies that the vendor has accepted every MCR using the Vendor Attestation form. Proposals failing this stage are disqualified without substantive review.
  • Stage 2: Substantive Scoring — Qualifying proposals are scored using the Universal Rubric (0–3 scale) applied to each section. Proposals must meet a minimum overall score threshold to qualify for contract negotiation.

The state uses a "best value" criterion rather than a "lowest cost" criterion for selecting from among qualifying proposals.
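
The two-stage flow can be sketched as a simple filter followed by scoring. This is a minimal illustration under stated assumptions: the `Proposal` structure, the use of a mean across sections as the "overall score," and the threshold value of 2.0 are all hypothetical, since the toolkit leaves these choices to the state.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    vendor: str
    mcrs_accepted: dict   # MCR id -> bool, taken from the Vendor Attestation form
    section_scores: dict  # section name -> Universal Rubric score (0-3)

MIN_OVERALL_SCORE = 2.0  # hypothetical threshold; the state sets the real value

def passes_stage_1(p: Proposal) -> bool:
    """Stage 1: every MCR must be accepted."""
    return bool(p.mcrs_accepted) and all(p.mcrs_accepted.values())

def stage_2_score(p: Proposal) -> float:
    """Stage 2: mean rubric score across sections (an assumed aggregation)."""
    for section, score in p.section_scores.items():
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{section}: rubric scores must be 0-3")
    return sum(p.section_scores.values()) / len(p.section_scores)

def evaluate(proposals):
    """Return (vendor, score) pairs that qualify for contract negotiation."""
    qualifying = []
    for p in proposals:
        if not passes_stage_1(p):
            continue  # disqualified without substantive review
        score = stage_2_score(p)
        if score >= MIN_OVERALL_SCORE:
            qualifying.append((p.vendor, score))
    return sorted(qualifying, key=lambda pair: pair[1], reverse=True)

ok = {"MCR-1": True, "MCR-5": True, "MCR-7": True}
proposals = [
    Proposal("Vendor A", ok, {"design": 3, "administration": 2, "validation": 3}),
    Proposal("Vendor B", {**ok, "MCR-5": False}, {"design": 3, "administration": 3, "validation": 3}),
    Proposal("Vendor C", ok, {"design": 1, "administration": 2, "validation": 2}),
    Proposal("Vendor D", ok, {"design": 2, "administration": 2, "validation": 3}),
]
ranked = evaluate(proposals)
print([vendor for vendor, _ in ranked])
```

The sorted list is only the pool of qualifying proposals. Final selection under a "best value" criterion would weigh these scores together with the cost proposal, which this sketch does not model.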

Cost Proposal

The cost proposal template must include separate line items for: core assessment development; rightsholder compensation schedule; accommodations budget (childcare, transportation, technology, translation); accessibility remediation; community engagement activities; and AI compliance activities (if applicable).

Pre-Release Checklist: Tier selection confirmed · Scope defined · All placeholders resolved · State-specific MCRs added · Evaluation responsibilities assigned · Penalty structures reviewed by procurement office.

Annex A

AI Vendor Accountability

Definitions, required disclosures, use-case-specific requirements, state decision rights, and verification mechanisms for all artificial intelligence used in assessment work.

Purpose and Scope

This Annex applies if the vendor uses artificial intelligence at any stage of the assessment lifecycle. It establishes the disclosure requirements, use-case-specific guardrails, state decision rights, and verification mechanisms that govern AI use in assessment. This Annex is referenced in the AI section of MCR-5 and in Chapters IV and V.

Key Definitions

Term | Definition
Artificial Intelligence (AI) | Includes, without limitation, large language models, generative models, classifier models, scoring engines, and adaptive testing algorithms used at any stage of the assessment lifecycle.
Custom vs. Off-the-Shelf AI | Custom AI is trained or fine-tuned by the vendor for assessment purposes. Off-the-shelf AI is a third-party product used without modification to its underlying model.
Training Data | The data used to train, fine-tune, or calibrate any AI model used in the assessment lifecycle, including its demographic composition and cleaning procedures.
Disparate-Impact Threshold | The quantitative threshold beyond which AI-driven disparities in outcomes across reporting subgroups trigger mandatory human review or model replacement.
Human-in-the-Loop | A defined process in which qualified human reviewers review and approve AI outputs before those outputs reach students or inform scoring decisions.
State Decision Rights | The enumerated rights retained by the state to approve, reject, audit, or require replacement of AI models, training data, or AI-generated outputs.
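
To make the Disparate-Impact Threshold definition concrete, the sketch below compares each subgroup's outcome rate with the highest subgroup rate and lists the groups that fall below a threshold. The 0.8 value is an assumption borrowed from the four-fifths rule of thumb used in employment selection, not a value set by this Annex. The function names, group labels, and rates are hypothetical.

```python
def impact_ratios(rates: dict) -> dict:
    """Ratio of each subgroup's outcome rate to the highest subgroup rate."""
    top = max(rates.values())
    if top <= 0:
        raise ValueError("at least one subgroup rate must be positive")
    return {group: rate / top for group, rate in rates.items()}

def groups_breaching(rates: dict, threshold: float = 0.8) -> list:
    """Subgroups whose ratio falls below the (hypothetical) threshold."""
    ratios = impact_ratios(rates)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)

# Hypothetical share of AI-scored responses at or above proficient, by subgroup.
proficient_rates = {"Group A": 0.60, "Group B": 0.45, "Group C": 0.57}
breaches = groups_breaching(proficient_rates)
print(breaches)
```

A non-empty result would be the trigger for the mandatory human review or model replacement that the definition describes. The metric, the comparison group, and the threshold itself are choices the state would need to specify in the contract.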

AI Use Cases

AI for Item Generation (Use Case 2.1)

Vendors using AI to generate assessment items must provide Training Data Disclosure (model, data sources, demographic composition), a bias auditing protocol with named techniques and disparate-impact thresholds, evidence that the AI's base architecture is grounded in learning progressions or a comparable construct-aligned model, and mandatory human-in-the-loop review of every AI-generated item before it reaches operational use. At the Justice tier, community members must be involved in reviewing and approving AI-generated items.

AI for Item Review (Use Case 2.2)

AI used to assist in item review (e.g., automated bias flagging) must be validated against human expert judgments and must not replace the human bias and sensitivity review committee required by MCR-1. The AI's flagging decisions must be disclosed to committee members, who retain final approval authority.

AI for Item Scoring (Use Case 2.3)

AI scoring engines must demonstrate score reliability and validity equal to or greater than human hand-scoring across all reporting subgroups. The vendor must specify the percentage of AI-scored responses reviewed by qualified human scorers, the triggers for human review (e.g., low confidence scores, responses from flagged demographic groups), and the process for adjudicating disagreements between AI and human scores.
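
The review triggers described above could be expressed as a routing rule applied to every AI-scored response. This is a sketch only: the confidence cutoff, the sampling rate, the `flagged` field, and the adjudication rule are hypothetical placeholders for values and processes the vendor would have to specify and the state would have to approve.

```python
import random
from dataclasses import dataclass

@dataclass
class ScoredResponse:
    response_id: str
    ai_score: int
    ai_confidence: float   # 0.0-1.0, as reported by the scoring engine
    flagged: bool = False  # set by any other contract-defined trigger

def needs_human_review(r: ScoredResponse, min_confidence: float = 0.85,
                       sample_rate: float = 0.10, rng=random) -> bool:
    """Route to a human scorer on any trigger, else by random sampling."""
    if r.ai_confidence < min_confidence or r.flagged:
        return True
    # Random sample so a defined percentage of all responses is reviewed.
    return rng.random() < sample_rate

def adjudicate(ai_score: int, human_score: int) -> str:
    """Hypothetical rule: any AI/human disagreement goes to a second human read."""
    return "agree" if ai_score == human_score else "second_human_read"

low_confidence = ScoredResponse("r-001", ai_score=3, ai_confidence=0.50)
print(needs_human_review(low_confidence))
print(adjudicate(ai_score=3, human_score=2))
```

Triggered reviews come on top of the random sample here, so the realized review percentage will exceed `sample_rate`. Whether the contract's stated percentage counts triggered reviews is a design choice to settle explicitly.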

AI for Adaptive Testing (Use Case 2.4)

Adaptive algorithms must be documented with respect to item selection rules, exposure controls, and handling of aberrant response patterns. The algorithm must be validated for measurement equivalence across student subgroups, with particular attention to students with disabilities and multilingual learners.

AI for Anomaly Detection and Test Security (Use Case 2.5)

AI used for security monitoring must have documented false positive and false negative rates disaggregated by student demographic group. States must retain authority over the consequence of security flags — AI may flag but not adjudicate.
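
A disaggregated error-rate report of this kind can be computed from a log of AI flags paired with the outcome of the state's own investigation. The sketch below assumes such a log exists. The tuple layout, function name, and sample data are hypothetical.

```python
from collections import defaultdict

def error_rates_by_group(cases):
    """cases: iterable of (group, ai_flagged, confirmed_incident) tuples,
    where confirmed_incident reflects the human/state determination.
    Returns {group: {"false_positive_rate": ..., "false_negative_rate": ...}}."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, flagged, confirmed in cases:
        if confirmed:
            counts[group]["tp" if flagged else "fn"] += 1
        else:
            counts[group]["fp" if flagged else "tn"] += 1

    report = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # cases with no confirmed incident
        positives = c["tp"] + c["fn"]  # cases with a confirmed incident
        report[group] = {
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return report

# Hypothetical log: one false flag in four clean sessions, one missed incident in two.
cases = ([("Group A", True, False)] + [("Group A", False, False)] * 3
         + [("Group A", True, True), ("Group A", False, True)])
report = error_rates_by_group(cases)
print(report["Group A"])
```

Comparing these rates across groups shows whether the monitoring system flags some students' sessions in error more often than others, which is the evidence a state needs before deciding what consequence, if any, a flag should carry.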

AI for Reporting and Analytics (Use Case 2.6)

AI-generated insights, predictions, or recommendations in reporting systems must be clearly labeled as AI-generated, must include confidence intervals or uncertainty estimates, and must be reviewed by qualified assessment professionals before being included in official score reports.

State Decision Rights

  • 3.1 Pre-Deployment Approval — The state must approve any AI model before it is deployed in the assessment lifecycle.
  • 3.2 Mid-Contract Model Replacement — The state retains the right to require model replacement if disparate-impact thresholds are breached.
  • 3.3 Independent Algorithmic Audit — The state retains the right to commission an independent algorithmic audit at intervals defined by the state, at the state's expense.
  • 3.4 Off-the-Shelf Model Veto — The state retains the right to veto the use of any off-the-shelf model that the state cannot adequately audit or that fails the disparate-impact threshold.
  • 3.5 Community Sign-Off (Justice Tier) — At the Justice tier, the RAC must approve AI models used for item generation, scoring, and reporting before deployment.

Required AI Deliverables: AI Disclosure Document · Training Data Disclosure · Annual AI Compliance Report · Independent Algorithmic Audit results · Public-Facing AI Use Summary · Community Engagement Artifacts (Equity and Justice tiers).