Every Major U.S. AI Lab Now Subject to Government Testing Before Model Release
The Commerce Department's Center for AI Standards and Innovation has finalized pre-deployment evaluation agreements with all five major American frontier AI laboratories (OpenAI, Anthropic, Google DeepMind, Microsoft, and xAI), completing a voluntary framework under which every significant new AI model undergoes a government security evaluation before public release. With more than 40 assessments completed since 2024, the program is quietly becoming the de facto regulatory floor for U.S. frontier AI.
Without a formal law, a regulator with enforcement authority, or a single congressional vote, the United States has achieved something that eluded European policymakers for years: evaluation, mandatory in effect if not in law, of every major AI model before it reaches the public. The mechanism is the Commerce Department’s Center for AI Standards and Innovation (CAISI), and as of May 5, 2026, its voluntary framework covers all five of the country’s frontier AI laboratories.
What CAISI Does
The Center for AI Standards and Innovation sits within the National Institute of Standards and Technology, part of the Department of Commerce. Its pre-deployment evaluation program does what its name implies: frontier AI companies submit their models to CAISI before public launch, and the center assesses them for specific risk categories — primarily cybersecurity vulnerabilities, biosecurity risks tied to dangerous pathogen synthesis or enhancement, and chemical weapons development capabilities.
The evaluations are not safety reviews in the broad sense that AI ethicists typically discuss. CAISI is not assessing whether a model might be biased, harmful in ordinary consumer interactions, or likely to generate toxic content. The focus is narrower and more operationally specific: can this model provide meaningful uplift to someone trying to build a cyberweapon, create a biological weapon, or synthesize a chemical weapon?
CAISI reports its findings to the labs, which are then expected to mitigate the risky capabilities it identifies before release. The program has completed more than 40 model evaluations since its first agreements with OpenAI and Anthropic in 2024. In practice, that means every major model release from those two companies in the past two years has passed through some form of government capability assessment before reaching developers and consumers.
The Expansion to Five Labs
The original CAISI agreements, signed with OpenAI and Anthropic in 2024, reflected the political logic of the Biden administration’s AI executive order, which emphasized voluntary commitments from leading labs as a pragmatic alternative to legislation that Congress showed no appetite for passing. Those agreements were renegotiated and renewed when the current administration took office.
On May 5, 2026, NIST announced through CAISI that Google DeepMind, Microsoft, and xAI had signed new agreements, completing the five-lab coverage. The framing from the Trump administration was notably different from its predecessor’s: where the Biden approach emphasized the language of responsible AI development, the current administration emphasizes national security and competitive advantage, a framing that makes the same pre-deployment review process easier to justify politically.
Microsoft’s inclusion is significant because the company’s AI products are heavily powered by OpenAI models rather than independently developed frontier systems. The Microsoft agreement likely covers jointly developed products and any frontier models the company develops through its OpenAI partnership, extending the coverage to Copilot and related enterprise AI products.
The participation of xAI, Elon Musk’s AI lab and the developer of the Grok model family, is the most politically visible addition. xAI is the newest entrant to the frontier model market and has released models with notably fewer content restrictions than its competitors. The CAISI agreement subjects Grok’s next-generation models to the same pre-deployment national security evaluation as models from the more established labs.
Voluntary But Real
The “voluntary” designation in the agreements matters less in practice than it might sound. When all five major labs have signed the same framework, the practical pressure on any new entrant to the frontier model market to also participate is substantial. Launching a powerful AI model without engaging CAISI’s review process would immediately invite questions from regulators, policymakers, and enterprise customers who have grown accustomed to the framework as a baseline assurance.
This is how regulatory floors get established in fast-moving technology markets: not through formal mandates that take years to pass and immediately face legal challenge, but through voluntary commitments that become industry norms, followed eventually by codification once the practice is established. The U.S. approach contrasts sharply with the EU AI Act’s top-down legislative framework, which has faced repeated implementation delays and compliance challenges — most recently in its application to high-risk AI systems.
There are real limitations to the current program. CAISI’s assessment capacity is constrained by the same shortage of specialized AI safety evaluators that affects the private sector. The scope of the evaluations is narrow: passing a CAISI assessment for cybersecurity and biosecurity risk says nothing about a model’s broader harms. And the voluntary structure means a well-funded foreign competitor or a domestic actor who simply declines to engage faces no legal consequence — the program depends entirely on the reputational and commercial incentives that make participation rational for established labs.
The Broader Regulatory Context
The CAISI framework sits within a larger and still-incomplete picture of AI governance in the United States. Congress has held numerous hearings and produced multiple discussion drafts on AI regulation, but no comprehensive AI legislation has passed. The administration has used executive orders and agency guidance to fill the gap, but the legal durability of those mechanisms depends on future administrations continuing to support them.
The EU AI Act, which is in force but experiencing implementation delays, creates a contrasting regulatory environment that affects U.S. companies selling into European markets. As of 2026, the high-risk provisions of the EU AI Act — covering AI in hiring, education, credit scoring, and similar applications — have been delayed until 2027, giving companies additional compliance runway. But the foundational registration and transparency requirements are already taking effect, creating a dual compliance burden for global AI deployments.
The practical result for frontier AI developers is a patchwork: CAISI pre-deployment review for national security risks in the U.S., EU AI Act obligations for European market access, and a range of sector-specific requirements in finance, healthcare, and critical infrastructure that apply on top of both frameworks.
What Comes Next
The CAISI model offers a template for broader AI oversight without requiring landmark legislation. Whether Congress will move to codify the framework, expand its scope to include broader safety evaluations, or create a dedicated AI regulatory body with enforcement authority remains an open question. The current administration’s preference for executive action over legislation makes formal codification unlikely in the near term.
What is clear is that the days when a frontier AI model could move from training run to public deployment without any government awareness are over. The question for the next phase of AI governance is not whether pre-deployment review will persist, but how deep and how broad it will eventually become.
For now, the completion of five-lab coverage represents a meaningful governance milestone, achieved through pragmatic voluntary agreements rather than legislation, and quietly enough that most public attention this week is on what those labs are building, not on how it is evaluated before it reaches the market.