Assessment Request for Proposal (RFP) Development Toolkit
Purpose of the Toolkit
A design guide for developing state-level educational assessment RFPs, covering every stage from planning and procurement through validation.
This document is a design guide for developing a Request for Proposal (RFP) for state-level educational assessments. The document details guidance and considerations for each stage of the assessment process, from initial planning and procurement to item development, administration, psychometrics, reporting, and validation procedures.
A key feature is the inclusion of ready-to-implement exemplars: tables, figures, and language blocks that can be incorporated directly into RFPs. Where relevant, the guide provides exemplar language at three tiers (Foundational, Equity, and Justice) so that states can choose the level of commitment they want vendors to make in their proposals.
The Three Language Tiers
Foundational (Fairness): Proposals adopting this language ensure that all students have an equal opportunity to participate in and demonstrate their learning on the assessment. This includes ensuring the assessment is accessible to students with disabilities, free from obvious bias, and administered fairly.
Equity: Proposals adopting this language go beyond acknowledging fairness and actively expand the participation of students, schools, and communities directly impacted by the proposal. This level seeks to identify and address potential disparities, with special attention to historically and systematically excluded groups.
Justice: Proposals adopting this language make the most ambitious commitments because they explicitly recognize systemic inequities (e.g., racism, sexism, ableism, intersectionality) and their profound impact on educational outcomes. Proposals at this level explicitly name and address these inequities and actively involve marginalized communities in assessment development.
Chapter Scope: Required and Optional Sections
Every chapter in the toolkit contains material that some states will adopt as-is, some will customize, and some will omit. The table below identifies the default disposition of each chapter.
| Chapter | Disposition | Notes |
|---|---|---|
| I — Procurement Process and Timeline | Required | State-specific dates and procurement vehicle. Apply your state's standard procurement boilerplate where applicable. |
| II — Background Information for Vendors | Required | Includes the Rightsholder Accountability Council provisions and community partnership framing. Tier choice has the largest effect here. |
| III — Statement of Work | Required | Project management framing. Lightly tier-sensitive. |
| IV — Assessment Design and Development | Required if in scope | Includes content frameworks, item development, standard setting, and reporting. AI Vendor Accountability Annex is referenced here. |
| V — Test Administration | Required if in scope | Includes item-adaptive testing, irregularities, remote/virtual testing definitions, scoring, AI scoring, security. |
| VI — Validation Efforts | Required | Validity evidence is required for any assessment used for high-stakes decisions. Justice tier moves community involvement upstream into construct articulation. |
| VII — Managing Risk | Recommended | Conflict of interest, issue and risk management, fiscal management. Strongly recommended for any multi-year contract. |
| VIII — Terms and Conditions | Required | Penalties, bid evaluation process, cost proposal. The Four-Point Scoring Rubric and Deliverables Checklists are referenced from this chapter. |
How to Use This Toolkit — A Quick Start
Five decisions will assemble a complete RFP from this toolkit. Each step references the chapter, table, or appendix where the relevant content lives.
Step 1: Decide your tier
Choose Foundational, Equity, or Justice. The toolkit provides ready-to-use exemplar language at each tier across every section. The choice should be made by the state team that will own the procurement, with input from leadership about what your political and legal context can support. See the Tier-Selection Decision Guide for help. Your tier choice will determine which column of every tiered table you draw from. States may also opt to move back and forth between tiers for different components of the RFP.
Step 2: Define your scope
Identify which assessment components are in scope (e.g., summative only; summative plus interim; the full system). Chapter II includes a Components of the Assessment System table; complete it first. Anything you mark out-of-scope can be removed from the resulting RFP.
Step 3: Customize the state-specific placeholders
Throughout the toolkit, you will see placeholders enclosed in angle brackets, for example <state name>, <List specific state systems>, and <Subject Area>. These are the only places the language is intended to change. Before releasing the RFP, search the document for the strings <, >, and [ to confirm no placeholder was missed.
Step 4: Adopt the Minimum Compliance Requirements
The toolkit includes a Minimum Compliance Requirements page. These are the commitments that are non-negotiable at any tier, including annual bias review, fair compensation for rightsholders, AI training data disclosure, and current-year accessibility (VPAT) reporting. Vendors who do not accept these commitments are disqualified, regardless of how their proposal otherwise scores.
Step 5: Plan your evaluation
Two structures support evaluation: the Four-Point Scoring Rubric (included in the front matter) and the Required Deliverables Checklists at the end of each chapter. The deliverables checklist is a completeness check — it verifies that the vendor produced every required artifact — and is used before substantive scoring begins. The four-point rubric is then applied to score the substance of each response section.
Pre-release checklist: Before you release the RFP, the state team should agree on the tier(s) (Step 1), the scope (Step 2), the customization choices (Step 3), any state-specific additions to the Minimum Compliance Requirements (Step 4), and how the team will divide evaluation responsibilities (Step 5). A pre-release checklist appears in the Bid Evaluation Process chapter.
Tier-Selection Decision Guide
The toolkit offers three tiers of exemplar language. Each tier escalates the substantive commitments your RFP will require of vendors.
| Tier | What it commits the state to | Typical context |
|---|---|---|
| Foundational (Fairness) | Access, non-discrimination, baseline procedural fairness. Standards-aligned bias review. Accessibility compliance. Equal opportunity to demonstrate learning. | Appropriate in any state. A defensible floor regardless of political context. Useful when "equity" framing is unworkable. |
| Equity | Subgroup outcomes addressed, not just overall. Targeted outreach and engagement of historically underserved communities. Disaggregated validity, reliability, and impact analysis. Bias review committees with community representation. | Appropriate where the state has political and legal latitude to name and address group-level disparities, and where community engagement infrastructure exists or can be funded. |
| Justice | Community decision-making authority, not just consultation. Rightsholder Accountability Council with sign-off rights. Data sovereignty. Co-creation of constructs, items, and reporting. Material redistribution of resources to community partners. | Appropriate where state leadership is prepared to share decision rights with impacted communities and to dedicate resources to community-led governance over the contract lifetime. |
Decision Questions
Mixed-tier RFPs are common and appropriate. A state might adopt Justice-tier language for community partnership and validity evidence (Chapters II and VI) while using Equity-tier language for procurement procedures (Chapters I and III) where community decision rights are less applicable. The tier you choose should be section-appropriate, not necessarily document-wide.
Language Strength Legend
Every exemplar language block in this toolkit uses a controlled vocabulary so that vendors and reviewers understand what is binding, what is preferred, and what is optional.
| Word | Meaning | Use when… |
|---|---|---|
| MUST / SHALL | Binding requirement. Non-compliance disqualifies the proposal. | The state will not negotiate this point — bias review, fair compensation for rightsholders, AI training data disclosure, current-year accessibility reporting. |
| WILL | Statement of fact about what the awarded vendor will do. | Describing scope of work the vendor performs once the contract is signed. |
| SHOULD | Strong preference. Deviation requires justification. | Practices the state strongly prefers but is willing to consider alternatives for. Use sparingly. |
| MAY | Optional. Vendor discretion. | Genuinely optional enhancements that do not affect evaluation. |
Practical convention: when a section asserts that something must happen and the vendor's response will be scored on whether they commit to it, use MUST. When the section describes what will already be true under the contract, use WILL. Reserve SHOULD for the small number of genuine preferences.
Customization Symbols Key
Every exemplar language block uses placeholder symbols where state-specific information must be inserted.
| Placeholder | Meaning | Example replacement |
|---|---|---|
| <state name> | Full name of the state issuing the RFP | Massachusetts |
| <State> | State name when used as a possessive or adjective | Massachusetts' |
| <Subject Area> | Content area within scope | Mathematics, English Language Arts, Science |
| <Specific Degree/Certification> | Minimum credential the vendor's staff must hold | Bachelor's degree in subject area; valid teaching license |
| <Number> | Years, count, percentage, or threshold the state specifies | 3 (years); 5 (panel members) |
| <List specific …> | Italicized list the state replaces with state-specific items | Replace with a complete list — do not leave the angle brackets in the final RFP |
| [number] | Square brackets indicate a value the state should determine | 3 (innovative assessment methods) |
Before releasing the RFP, search the document for the strings <, >, and [ to confirm no placeholder was missed. A common procurement-disqualifying error is leaving a placeholder in the final document — vendors will either ask clarifying questions that delay the procurement or, worse, submit responses that quote the unfilled placeholder back to the state.
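The placeholder search described above can be automated. The following is a minimal sketch, assuming the draft RFP is available as plain text; the function name `find_placeholders` and the sample draft are illustrative and not part of the toolkit. The pattern may also flag legitimate bracketed text (for example, a citation in square brackets), so each hit still needs a human check.

```python
import re

# Two placeholder styles from the key above: angle brackets such as
# <state name> and square brackets such as [number].
PLACEHOLDER = re.compile(r"<[^<>\n]+>|\[[^\[\]\n]+\]")


def find_placeholders(text):
    """Return (line_number, placeholder) pairs for every unresolved placeholder."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


# Illustrative draft text (hypothetical, for demonstration only).
draft = (
    "The <state name> Department of Education seeks proposals.\n"
    "Vendors must propose [number] innovative assessment methods.\n"
    "Item writers must hold a valid teaching license.\n"
)

for line_number, placeholder in find_placeholders(draft):
    print(f"line {line_number}: {placeholder}")
```

An empty result from `find_placeholders` is a reasonable gate before release, alongside a manual read-through.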
Universal Rubric
Apply the same 0–3 scale across every section of the RFP. The descriptors below define what each score level means in general; chapter-specific anchors follow.
Score 0: The response is missing, restates the requirement without substance, contradicts the requirement, or refers to a future state that the vendor commits no resources to build.
Score 1: The response acknowledges the requirement and describes intent, but lacks specifics on process, timeline, qualifications, or evidence. Reads as boilerplate. Does not provide reviewers with anything to evaluate beyond the assertion itself.
Score 2: The response describes a named process, identifies roles and qualifications, names specific deliverables and a timeline, and includes at least one concrete example from prior work that demonstrates the vendor has done this before.
Score 3: The response provides a documented protocol with evidence of past impact, names rightsholder partners with letters of support, specifies decision rights including community sign-off where applicable, and demonstrates a track record of having changed practice in response to community input. The response anticipates implementation challenges and addresses them.
Scoring convention: a section is evaluated holistically. If a response includes one paragraph of substance and three paragraphs of restated requirement, the score is 1, not 2. Reviewers should be able to point to the specific evidence that supports a score of 2 or 3.
Customization & Symbols Key
Every exemplar language block uses placeholder symbols where state-specific information must be inserted. Use this key to ensure no placeholder is missed in the final document.
See the Language Legend section for the complete Customization Symbols Key, including all placeholders and example replacements.
Minimum Compliance Requirements
Commitments that the state Department of Education will not negotiate. Vendors who decline any MCR are disqualified before substantive evaluation begins.
How these requirements operate: The MCRs are scored as a single binary: accept or decline. A vendor who accepts every MCR proceeds to substantive scoring; a vendor who declines any MCR does not. Vendors may propose stricter language — they may not propose weaker language. This section applies at all three tiers.
Vendor Attestation Checklist
This checklist mirrors the Vendor Attestation Form. A signed Vendor Attestation is a condition of bid eligibility.
Procurement Process and Timeline
This chapter outlines the procurement process for the statewide assessment system and provides a detailed timeline and key dates.
Purpose of the Procurement
This section should outline the state's purpose in seeking proposals from qualified vendors to develop, implement, and maintain a comprehensive statewide assessment system. This system will be a critical tool for measuring student achievement, informing instructional practices, and ensuring accountability across the state's educational landscape.
Process Overview
To facilitate a fair and transparent evaluation of all submissions, the procurement process will follow a competitive sealed proposal method.
| # | Stage | Description |
|---|---|---|
| 01 | Release of RFP | Official release of this document outlining the assessment system's requirements and specifications. |
| 02 | Vendor Q&A Period | A designated period for vendors to submit questions seeking clarification on the RFP. |
| 03 | Proposal Submission Deadline | The deadline for vendors to submit their complete proposals. |
| 04 | Proposal Evaluation | A comprehensive evaluation of all submitted proposals by a designated evaluation committee. |
| 05 | Oral Presentations (optional) | If necessary, vendors may be invited to give oral presentations to clarify aspects of their proposals. |
| 06 | Contract Negotiation and Award | Negotiation and award of the contract to the selected vendor(s). |
| 07 | Implementation and Transition | The start of the contract, including system implementation and transition activities. |
Key Dates — Example Timeline
| Activity | Date | Notes |
|---|---|---|
| Release of RFP | [Date] | [Notes] |
| Vendor Question Deadline | [Date] | |
| State Answers to Vendor Questions | [Date] | |
| Proposal Evaluation Period | [Date] | |
| Oral Presentations (If Applicable) | [Date] | |
| Notice of Intent to Award | [Date] | |
| Contract Negotiation Period | [Date] | |
| Contract Award | [Date] | |
| Contract Start Date | [Date] | |
Background Information for Vendors
Equips vendors with a thorough understanding of the state's assessment system, its governing policies, and the diverse educational landscape within which it operates.
Glossary of Terms
| Term | Definition |
|---|---|
| ALD | Achievement Level Descriptor |
| LDS | Longitudinal Data System |
| SWD | Students with Disabilities |
| REM | Racially and Ethnically Minoritized |
| RAC | Rightsholder Accountability Council |
| HSE | Historically and Systemically Excluded |
| ToA | Theory of Action |
| UDL | Universal Design for Learning |
| VPAT | Voluntary Product Accessibility Template |
Rightsholders, Clients, and Partners
Rightsholders are individuals or groups who are directly affected by the assessment's outcomes but hold minimal to no direct decision-making power in its design or implementation. Examples include students, parents, or local community-based advocates.
Partners are individuals or groups who have a vested interest in the assessment and hold some decision-making power in the assessment development processes. Partners include legislators, the Governor's office, the state budget office, the state procurement office, the state auditor's office, and local school boards.
Rightsholder Accountability Council (RAC)
The RAC is a community engagement body composed of individuals and groups representing intersecting marginalized populations from the local community. Input from the RAC helps inform the intended context of the assessment, provides meaningful and actionable feedback on the intended outcomes, and informs the selection of the most relevant and appropriate measures to evaluate them.
| Member | Proportion | Frequency |
|---|---|---|
| Students | 30% | Ongoing and iterative participation across all assessment development stages, including construct definition, review and development, administration, planning, reporting, and interpretation of results. |
| Parents/Guardians | 30% | Ongoing and iterative participation across all assessment development stages. Regular feedback sessions and surveys to capture student perspectives. |
| Community-Based Advocates | 20% | Regular and consistent engagement throughout the assessment lifecycle. Cadence can be quarterly or as needed during key development stages. |
| Educators/School Staff | 20% | Regular and consistent engagement throughout the assessment lifecycle, focusing on professional expertise on assessment design and implementation. Cadence can be quarterly or as needed. |
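The proportions in the table translate into seat counts once a state fixes the council size. The sketch below is one way to do that arithmetic, assuming a largest-remainder rounding rule; the council size, the function name `allocate_seats`, and the tie-breaking order are assumptions for illustration, since the toolkit specifies only the percentages.

```python
# Membership percentages from the RAC table above.
RAC_PERCENTAGES = {
    "Students": 30,
    "Parents/Guardians": 30,
    "Community-Based Advocates": 20,
    "Educators/School Staff": 20,
}


def allocate_seats(total_seats, percentages=RAC_PERCENTAGES):
    """Split total_seats across groups using the largest-remainder method."""
    # Integer arithmetic avoids floating-point rounding surprises.
    seats = {g: total_seats * p // 100 for g, p in percentages.items()}
    remainders = {g: total_seats * p % 100 for g, p in percentages.items()}
    leftover = total_seats - sum(seats.values())
    # Give any unassigned seats to the groups with the largest remainders.
    # Ties fall to the group listed first, which is an arbitrary choice.
    for group in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        seats[group] += 1
    return seats


print(allocate_seats(20))
```

For a 20-member council this yields 6 students, 6 parents or guardians, 4 community-based advocates, and 4 educators or school staff. Sizes that do not divide evenly need an explicit tie-breaking policy from the state.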
Assessment System Background — Exemplar Language
Major Reference Documents
| Code | Document |
|---|---|
| A | The Standards for Educational and Psychological Testing |
| B | White paper on common accessibility language for states and assessment vendors |
| C | Criteria for Procuring and Evaluating High Quality Assessments |
| D | Operational Best Practices for Statewide Large-Scale Assessment Programs |
| E | Culturally responsive assessment: Provisional principles |
| F | Strategies that address culturally responsive evaluation |
Statement of Work
Project and program management requirements for the assessment system vendor, including key contacts, documentation, change management, and scheduling.
Program Management
The vendor shall establish and maintain effective program management practices throughout the contract. Key areas include:
- Key Contacts — Designated contacts on both the state and vendor sides for all communications.
- Executive Management Meetings — Regular meetings between senior leadership to review program status and resolve escalated issues.
- Documentation Repository — A centralized, state-accessible repository for all project documents, deliverables, and decision logs.
- Change Management — A formal process for managing changes to scope, schedule, and budget, with written approval from the state required before implementation.
- Annual Kickoff Meetings — Formal kickoff at the start of each contract year to align on priorities, schedule, and any program changes.
- Periodic Management Monitoring Meetings — Regular monitoring meetings (typically monthly or quarterly) that include community representation.
Project Management
The vendor shall maintain a project management approach that includes a detailed project schedule updated monthly, a project management team with named roles and responsibilities, and a plan for finalizing the Theory of Action (ToA) if applicable.
Communication Support
All communications between vendors and rightsholders or partners, including direct meetings and presentations, must be conducted jointly with the client (the state) to maintain transparency and equity of process.
Disaster Planning and Recovery
The vendor shall provide a documented disaster planning and recovery plan that covers system failure scenarios, data recovery timelines, and communication protocols to ensure assessment continuity.
Assessment Design and Development
Content frameworks, item development, culturally responsive assessment principles, standard setting, and reporting requirements.
Content Frameworks and Standards
This section requires vendors to demonstrate full alignment with state content standards, including annotations that identify the depth, breadth, and complexity the assessment must measure. Vendors must provide Content Standards and Annotations that map each standard to test blueprint specifications.
Achievement Level Descriptors (ALDs)
Vendors must develop ALDs that describe what students at each performance level know and can do. ALDs must be developed in collaboration with the Rightsholder Accountability Council at the Justice tier, and must be written in accessible language for non-specialist audiences including parents and community members.
Innovation in Assessment Design
Foundational: Proposals MUST incorporate student choice in assessment topics, allowing students to select from various relevant options to increase engagement and demonstration of understanding within areas of personal interest.
Equity: Proposals MUST incorporate student choice in assessment topics, providing culturally relevant and diverse content that reflects students' lived experiences and backgrounds, addressing potential biases in topic selection.
Justice: Proposals MUST incorporate student choice in assessment topics, allowing students to co-create and define assessment topics based on their community- and culturally-identified needs and interests, promoting student agency and empowerment.
Item Development
The foundation of a fair and accurate assessment lies in the expertise and diversity of its item writers. Item writer recruitment must emphasize qualifications that value cultural responsiveness, implement proactive recruitment strategies, and establish inclusive hiring procedures.
Principles of Culturally Responsive Assessment for Item Writing
| Principle | Description | Key Red Flags |
|---|---|---|
| Validating | Assessment leverages students' cultural knowledge, strengths, and backgrounds to create relevant assessments that bridge the gap between academic concepts and lived experiences. | Marginalized cultural knowledge framed as "alternative" or "supplementary" rather than central. |
| Comprehensive and Inclusive | Assessment employs cultural resources to maintain students' ethnic identities, community connections, and success ethic across all content areas. | One cultural frame as default, others as "also represented." |
| Multidimensional | Assessment is connected to curriculum and standards, allowing students to see the connection between curriculum, lived experiences, and the assessment. | Multidimensional construct forced into unidimensional psychometric model. |
| Empowering | Assessment is asset-based, leveraging and highlighting what students know and do well; encouraging collaborative problem-solving and cultural capital acquisition. | Score reports leading with deficits. |
| Transformative | Assessment items nurture a sense of obligation to communities and society, empowering students to be social critics and agents of change. | "Transformative" framed as individualistic upward mobility rather than community-level change. |
| Emancipatory | Assessment items challenge the notion of absolute scholarly truth, empowering students to contest and contextualize multiple perspectives. | Single "correct" answer where multiple are defensible. |
| Humanistic | Assessment items foster a deeper understanding of self and others and promote empathy and interconnectedness across diverse ethnic, racial, and social groups. | Token diversity — one item or character per identified group. |
| Normative and Ethical | Assessment items expose cultural biases and challenge Eurocentric norms inherent to mainstream educational policies and practices. | Assessment claims to be culturally neutral. |
Standard Setting
Vendors must provide a documented standard-setting plan that includes a qualified external evaluator, a detailed process for incorporating rightsholder feedback into performance level definitions, and a plan for monitoring classification accuracy and consistency.
Reporting
Reporting must consider multiple audiences — students, families, educators, and policymakers — and must be designed with input from the RAC. Reports must be written in asset-based language and must account for measurement error in all score presentations.
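One common way to show measurement error on a score report is a band around the reported score. The sketch below uses the classical test theory formula SEM = SD × √(1 − reliability) and a band of plus or minus one SEM (roughly a 68% range). This is a simplification chosen for illustration: operational programs typically report conditional standard errors from their IRT model, and the function name and example values are hypothetical.

```python
import math


def score_band(score, standard_deviation, reliability):
    """Return (low, high) for a band of one standard error of measurement."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    sem = standard_deviation * math.sqrt(1 - reliability)
    return score - sem, score + sem


# Made-up values for demonstration only.
low, high = score_band(score=250, standard_deviation=10, reliability=0.91)
print(f"Reported score 250, likely range {low:.0f} to {high:.0f}")
```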
AI in Item Development: If AI is used for item generation, vendors must comply with the AI Vendor Accountability Annex (Annex A). Key requirements include a documented Training Data Disclosure, a bias auditing protocol, and mandatory human-in-the-loop review of every AI-generated item before it reaches students.
Test Administration
Item-adaptive testing, remote testing, scoring procedures, AI scoring requirements, test security, and accessibility.
Item-Adaptive Testing
For assessments using adaptive testing algorithms, vendors must document the adaptive algorithm, including its psychometric basis, selection criteria, and how it handles irregularities such as connectivity failures, student disengagement, or anomalous response patterns.
Remote and Virtual Testing
Vendors must define "remote testing" and "virtual testing" as used in the contract, and must provide a comprehensive plan for students who cannot access online testing environments — including an alternative-format administration option with the same content coverage as the online assessment.
Scoring
- Non-AI-based automated scoring — Rule-based and pattern-matching systems must be documented with clear evidence of validity and reliability across student subgroups.
- Hand-scoring — Vendors must document scorer training, reliability monitoring, and the process for resolving scoring discrepancies.
- AI Scoring — If AI is used for scoring, vendors must comply with Annex A requirements, including bias auditing across all reporting subgroups, human review of a defined percentage of AI-scored responses, and community review of AI scoring rubrics.
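The requirement for human review of a defined percentage of AI-scored responses can be met with a reproducible sample. The sketch below draws the sample within each reporting subgroup so that every subgroup receives at least the defined rate. The record fields `response_id` and `subgroup`, the function name, and the review rate are assumptions for illustration; the toolkit leaves the percentage for the state to set.

```python
import math
import random
from collections import defaultdict


def select_for_human_review(responses, review_rate, seed=0):
    """Pick at least review_rate of the AI-scored responses in every subgroup."""
    by_subgroup = defaultdict(list)
    for response in responses:
        by_subgroup[response["subgroup"]].append(response["response_id"])

    rng = random.Random(seed)  # fixed seed so an auditor can reproduce the draw
    selected = []
    for subgroup in sorted(by_subgroup):
        ids = by_subgroup[subgroup]
        # Round up so small subgroups are never left without any review.
        sample_size = min(len(ids), math.ceil(review_rate * len(ids)))
        selected.extend(rng.sample(ids, sample_size))
    return selected


# Hypothetical records: 8 responses in subgroup A and 4 in subgroup B.
responses = [
    {"response_id": i, "subgroup": "A" if i < 8 else "B"} for i in range(12)
]
review_ids = select_for_human_review(responses, review_rate=0.25)
print(len(review_ids))
```

Sampling within subgroups rather than across the whole pool keeps small subgroups from being skipped by chance, which matters when the review feeds a bias audit.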
Test Security
Vendors must provide comprehensive test security procedures and protocols covering pre-administration, administration, and post-administration periods. Security plans must address both physical and digital security, including procedures for detecting and responding to security incidents.
Online Dynamic Reporting System
The vendor must provide an online reporting system that allows educators and administrators to access disaggregated score reports, filter by reporting subgroup, and download data files in accessible formats for community partners.
Deliverables Checklist (Test Administration): Accessibility and Administration plan · Assistive Technology integration specs · AI Scoring documentation (if applicable) · Test Security procedures and qualifications.
Validation Efforts
Claims, validity evidence, innovative methodological approaches, psychometrics, and the validation argument.
Claims and Validity Evidence
Vendors must develop a validity argument that identifies the intended inferences from assessment scores (claims) and marshals evidence to support each claim. The validity argument must address five major sources of validity evidence as identified in the Standards for Educational and Psychological Testing:
- Content evidence — Alignment between assessment content and the constructs being measured.
- Response process evidence — Evidence that students engage with items in ways aligned with the intended construct, including cognitive lab evidence across targeted populations.
- Internal structure evidence — Psychometric evidence that the assessment behaves as intended, including factor analyses and fit statistics.
- Relations to other variables — Convergent and discriminant validity evidence showing the assessment relates to other measures as expected.
- Consequences evidence — Evidence regarding the impact of assessment use on students and institutions, including disparate impact analyses across all reporting subgroups.
Psychometrics
The vendor must document its psychometric model and calibration procedures, including IRT model selection, field test analyses, operational item analyses, and scaling and equating procedures. All psychometric analyses must be disaggregated by reporting subgroup to support detection of differential item functioning (DIF) and disparate impact.
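As an illustration of what a DIF analysis computes, the sketch below implements the Mantel-Haenszel common odds ratio for a single item and converts it to the ETS delta scale (MH D-DIF = −2.35 × ln α). This is one widely used DIF statistic and not a method the toolkit prescribes; the function name and the counts are hypothetical. Test takers are first matched into strata by total score, and each stratum supplies a 2×2 table of correct and incorrect counts for the reference and focal groups.

```python
import math


def mantel_haenszel_dif(strata):
    """Return (common odds ratio, MH D-DIF) for one item.

    Each stratum is a tuple of counts:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_correct, ref_incorrect, focal_correct, focal_incorrect in strata:
        n = ref_correct + ref_incorrect + focal_correct + focal_incorrect
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += ref_correct * focal_incorrect / n
        denominator += ref_incorrect * focal_correct / n
    if numerator == 0 or denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    odds_ratio = numerator / denominator
    # Negative D-DIF values indicate the item is harder for the focal group.
    return odds_ratio, -2.35 * math.log(odds_ratio)


# Hypothetical counts for two score strata.
alpha, d_dif = mantel_haenszel_dif([(40, 10, 30, 20), (30, 20, 20, 30)])
print(f"odds ratio {alpha:.2f}, MH D-DIF {d_dif:.2f}")
```

An odds ratio near 1 (D-DIF near 0) suggests matched test takers in both groups perform similarly on the item. Flagging rules and follow-up review remain decisions for the vendor's documented protocol.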
Validation Argument and Peer Review
The assembled validity evidence must be synthesized into a Validation Report that is reviewed by an independent peer review panel. The peer review panel must include at least one member with documented expertise in measurement equity and one community member representing the RAC's perspective on the validity evidence.
Justice Tier Note: At the Justice tier, community involvement in validation moves upstream into construct articulation — the RAC must be involved in defining what "success" means before the construct is translated into test specifications. Community-defined constructs of success must be documented and traceable to assessment design choices.
Managing Risk
Conflict of interest, issue and risk management, and fiscal management provisions.
Conflict of Interest
The vendor must disclose all actual or potential conflicts of interest at the time of proposal submission and must update this disclosure throughout the contract term. See MCR-7 for the complete list of disclosable conflicts.
Issue and Risk Management
The vendor must provide an Issue and Risk Management Plan that includes a risk register updated at each periodic management monitoring meeting, a defined escalation path for high-severity risks, and a process for involving the RAC in risk assessments that affect community engagement or data access.
Fiscal Management
The vendor must maintain documented fiscal management practices including budget tracking by deliverable, a process for managing scope change and associated budget adjustments, and annual financial reporting to the state that includes actual vs. budgeted spend on rightsholder compensation and community engagement.
Deliverables Checklist (Managing Risk): Conflict of Interest Disclosure · Risk Register · Fiscal Management Plan · Annual Financial Report with Rightsholder Compensation accounting.
Terms & Conditions
Penalties, bid evaluation process, and cost proposal requirements.
Penalties
The contract must specify financial penalties for breach of MCRs and substantive deliverable failures. Penalty structures should be calibrated to the severity and impact of the breach on students and communities.
Bid Evaluation Process
The evaluation process proceeds in two stages:
- Stage 1: MCR Compliance Check — The evaluation team verifies that the vendor has accepted every MCR using the Vendor Attestation form. Proposals failing this stage are disqualified without substantive review.
- Stage 2: Substantive Scoring — Qualifying proposals are scored using the Universal Rubric (0–3 scale) applied to each section. Proposals must meet a minimum overall score threshold to qualify for contract negotiation.
The state uses a "best value" criterion rather than a "lowest cost" criterion for selecting from among qualifying proposals.
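The two-stage logic above can be sketched as a gate followed by a threshold check. In this sketch the overall score is the mean of the section scores, which is an assumption: the toolkit requires a minimum overall score but does not say how section scores are combined. The function name, the threshold value, and the example inputs are hypothetical, and the final best-value selection among qualifying proposals is not modeled.

```python
def evaluate_proposal(mcr_attestations, section_scores, minimum_average):
    """Apply the MCR gate (Stage 1), then the 0-3 rubric threshold (Stage 2)."""
    # Stage 1: declining any MCR disqualifies the proposal before scoring.
    declined = [mcr for mcr, accepted in mcr_attestations.items() if not accepted]
    if declined:
        return {"status": "disqualified", "declined_mcrs": declined}

    # Stage 2: every section score must come from the Universal Rubric scale.
    if any(score not in (0, 1, 2, 3) for score in section_scores.values()):
        raise ValueError("rubric scores must be 0, 1, 2, or 3")
    average = sum(section_scores.values()) / len(section_scores)
    status = "qualifies" if average >= minimum_average else "below_threshold"
    return {"status": status, "average_score": average}


# Hypothetical attestations and section scores.
result = evaluate_proposal(
    mcr_attestations={"MCR-1": True, "MCR-5": True, "MCR-7": True},
    section_scores={"II": 2, "IV": 3, "VI": 1},
    minimum_average=2.0,
)
print(result)
```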
Cost Proposal
The cost proposal template must include separate line items for: core assessment development; rightsholder compensation schedule; accommodations budget (childcare, transportation, technology, translation); accessibility remediation; community engagement activities; and AI compliance activities (if applicable).
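A completeness check on the cost proposal can confirm that each required line item appears before reviewers look at amounts. The sketch below assumes the cost proposal has been keyed into a mapping from line-item name to amount; that representation, the function name, and the dollar figures are hypothetical.

```python
# Required line items from the paragraph above.
REQUIRED_LINE_ITEMS = [
    "core assessment development",
    "rightsholder compensation schedule",
    "accommodations budget",
    "accessibility remediation",
    "community engagement activities",
]
AI_LINE_ITEM = "AI compliance activities"  # required only if the vendor uses AI


def missing_line_items(cost_proposal, uses_ai):
    """Return the required line items that the cost proposal does not list."""
    required = REQUIRED_LINE_ITEMS + ([AI_LINE_ITEM] if uses_ai else [])
    return [item for item in required if item not in cost_proposal]


# Made-up amounts for demonstration only.
proposal = {
    "core assessment development": 1_000_000,
    "rightsholder compensation schedule": 50_000,
    "accommodations budget": 20_000,
    "accessibility remediation": 30_000,
}
print(missing_line_items(proposal, uses_ai=True))
```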
Pre-Release Checklist: Tier selection confirmed · Scope defined · All placeholders resolved · State-specific MCRs added · Evaluation responsibilities assigned · Penalty structures reviewed by procurement office.
AI Vendor Accountability
Definitions, required disclosures, use-case-specific requirements, state decision rights, and verification mechanisms for all artificial intelligence used in assessment work.
Purpose and Scope
This Annex applies if the vendor uses artificial intelligence at any stage of the assessment lifecycle. It establishes the disclosure requirements, use-case-specific guardrails, state decision rights, and verification mechanisms that govern AI use in assessment. This Annex is referenced in the AI section of MCR-5 and in Chapters IV and V.
Key Definitions
| Term | Definition |
|---|---|
| Artificial Intelligence (AI) | Includes, without limitation, large language models, generative models, classifier models, scoring engines, and adaptive testing algorithms used at any stage of the assessment lifecycle. |
| Custom vs. Off-the-Shelf AI | Custom AI is trained or fine-tuned by the vendor for assessment purposes. Off-the-shelf AI is a third-party product used without modification to its underlying model. |
| Training Data | The data used to train, fine-tune, or calibrate any AI model used in the assessment lifecycle, including its demographic composition and cleaning procedures. |
| Disparate-Impact Threshold | The quantitative threshold beyond which AI-driven disparities in outcomes across reporting subgroups trigger mandatory human review or model replacement. |
| Human-in-the-Loop | A defined process in which qualified human reviewers examine and approve AI outputs before those outputs reach students or inform scoring decisions. |
| State Decision Rights | The enumerated rights retained by the state to approve, reject, audit, or require replacement of AI models, training data, or AI-generated outputs. |
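To make the Disparate-Impact Threshold definition concrete, the sketch below shows one possible way a threshold check could be computed. It is an illustration under stated assumptions: the metric (each subgroup's favorable-outcome rate divided by the highest subgroup rate), the value 0.80, and the function names `rate_ratios` and `breached_groups` are hypothetical. The toolkit leaves the choice of metric and threshold to the state.

```python
# Hypothetical threshold; each state would define its own metric and value.
MIN_RATE_RATIO = 0.80


def rate_ratios(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each subgroup to its favorable-outcome rate divided by the highest rate.

    `outcomes` maps subgroup name -> (favorable_count, total_count).
    """
    rates = {}
    for group, (favorable, total) in outcomes.items():
        if total <= 0:
            raise ValueError(f"{group}: total must be positive")
        rates[group] = favorable / total
    highest = max(rates.values())
    if highest == 0:
        # No subgroup has any favorable outcomes, so no disparity can be measured.
        return {group: 1.0 for group in rates}
    return {group: rate / highest for group, rate in rates.items()}


def breached_groups(outcomes: dict[str, tuple[int, int]],
                    threshold: float = MIN_RATE_RATIO) -> list[str]:
    """Return the subgroups whose ratio falls below the threshold, sorted by name."""
    return sorted(g for g, r in rate_ratios(outcomes).items() if r < threshold)
```

Under this sketch, any subgroup returned by `breached_groups` would trigger the mandatory human review or model replacement that the definition describes.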
AI Use Cases
AI for Item Generation Use Case 2.1
Vendors using AI to generate assessment items must provide a Training Data Disclosure (model, data sources, demographic composition); a bias auditing protocol with named techniques and disparate-impact thresholds; evidence that the AI's base architecture is grounded in learning progressions or a comparable construct-aligned model; and mandatory human-in-the-loop review of every AI-generated item before it reaches operational use. At the Justice tier, community members must be involved in reviewing and approving AI-generated items.
AI for Item Review Use Case 2.2
AI used to assist in item review (e.g., automated bias flagging) must be validated against human expert judgments and must not replace the human bias and sensitivity review committee required by MCR-1. The AI's flagging decisions must be disclosed to committee members, who retain final approval authority.
AI for Item Scoring Use Case 2.3
AI scoring engines must demonstrate score reliability and validity equal to or greater than that of human hand-scoring across all reporting subgroups. The vendor must specify the percentage of AI-scored responses reviewed by qualified human scorers, the triggers for human review (e.g., low-confidence scores, responses from flagged demographic groups), and the process for adjudicating disagreements between AI and human scores.
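The human-review triggers named above could be expressed as a simple routing rule. The following is a sketch only: the confidence cutoff, the audit-sample percentage, and the names `in_audit_sample` and `review_reasons` are hypothetical, since the vendor's proposal is what would actually specify these values.

```python
import hashlib

# Hypothetical values; the vendor's proposal would specify the actual ones.
MIN_CONFIDENCE = 0.85      # below this, the AI score goes to a human scorer
AUDIT_SAMPLE_PERCENT = 10  # share of remaining responses sampled for human review


def in_audit_sample(response_id: str, percent: int = AUDIT_SAMPLE_PERCENT) -> bool:
    """Deterministically assign roughly `percent`% of response IDs to the audit sample."""
    digest = hashlib.sha256(response_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100 < percent


def review_reasons(response_id: str, confidence: float,
                   group_flagged: bool = False) -> list[str]:
    """Return the reasons a response is routed to a human scorer (empty if none)."""
    reasons = []
    if confidence < MIN_CONFIDENCE:
        reasons.append("low_confidence")
    if group_flagged:
        reasons.append("flagged_group")
    # Responses with no other trigger may still be drawn into the audit sample.
    if not reasons and in_audit_sample(response_id):
        reasons.append("audit_sample")
    return reasons
```

Hashing the response ID keeps the audit sample reproducible, so an auditor can re-derive which responses should have been reviewed. Adjudication of AI and human score disagreements is a separate process and is not modeled here.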
AI for Adaptive Testing Use Case 2.4
Adaptive algorithms must be documented with respect to item selection rules, exposure controls, and handling of aberrant response patterns. The algorithm must be validated for measurement equivalence across student subgroups, with particular attention to students with disabilities and multilingual learners.
AI for Anomaly Detection and Test Security Use Case 2.5
AI used for security monitoring must have documented false-positive and false-negative rates, disaggregated by student demographic group. States must retain authority over the consequences of security flags: AI may flag but not adjudicate.
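As an illustration of what disaggregated error rates involve, the sketch below tabulates false-positive and false-negative rates per demographic group. It assumes each record pairs the AI's flag with a human-confirmed outcome; the record layout and the function name `error_rates_by_group` are hypothetical.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute per-group error rates for AI security flags.

    `records` is an iterable of (group, flagged_by_ai, confirmed_violation) tuples,
    where confirmed_violation is assumed to come from human adjudication.
    A rate is None when its denominator is zero.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, flagged, confirmed in records:
        if flagged:
            key = "tp" if confirmed else "fp"
        else:
            key = "fn" if confirmed else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # cases with no confirmed violation
        positives = c["fn"] + c["tp"]  # cases with a confirmed violation
        rates[group] = {
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return rates
```

Comparing these rates across groups shows whether some students are flagged wrongly more often than others, which is the information a state needs before deciding what consequence, if any, a flag should carry.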
AI for Reporting and Analytics Use Case 2.6
AI-generated insights, predictions, or recommendations in reporting systems must be clearly labeled as AI-generated, must include confidence intervals or uncertainty estimates, and must be reviewed by qualified assessment professionals before being included in official score reports.
State Decision Rights
- 3.1 Pre-Deployment Approval — The state must approve any AI model before it is deployed in the assessment lifecycle.
- 3.2 Mid-Contract Model Replacement — The state retains the right to require model replacement if disparate-impact thresholds are breached.
- 3.3 Independent Algorithmic Audit — The state retains the right to commission an independent algorithmic audit at intervals defined by the state, at the state's expense.
- 3.4 Off-the-Shelf Model Veto — The state retains the right to veto the use of any off-the-shelf model that the state cannot adequately audit or that fails the disparate-impact threshold.
- 3.5 Community Sign-Off (Justice Tier) — At the Justice tier, the RAC must approve AI models used for item generation, scoring, and reporting before deployment.
Required AI Deliverables: AI Disclosure Document · Training Data Disclosure · Annual AI Compliance Report · Independent Algorithmic Audit results · Public-Facing AI Use Summary · Community Engagement Artifacts (Equity and Justice tiers).