The U.S. Government Is Using ChatGPT to Hunt Up to $200 Billion in Medicaid Waste and Fraud

The Department of Health and Human Services is using ChatGPT and other AI tools to analyze years of Medicaid audit data from all 50 states, targeting an estimated $100–200 billion in annual waste, fraud, and improper payments. The AERO program raises profound questions about AI accuracy, due process, and whether Washington is weaponizing AI against political opponents.

In a government building in Washington, D.C., federal health officials have turned a consumer AI chatbot into one of the most consequential audit tools in American public policy history. The Department of Health and Human Services is using ChatGPT—the same product millions of Americans use to draft emails and summarize documents—to scan more than five years of Medicaid audit data from all fifty states, searching for what officials estimate could be between $100 billion and $200 billion in annual waste, fraud, and improper payments.

The program, formally called the Audit Enforcement and Risk Oversight initiative, was announced on May 21, 2026, by Gustav Chiarello, the HHS Assistant Secretary for Financial Resources. It represents one of the most sweeping deployments of commercial AI in the history of the U.S. federal government—and one of the most controversial.

What AERO Does

The scope of AERO is deliberately broad. The initiative targets every organization receiving more than $1 million annually in federal funding—a threshold that captures not only state Medicaid programs but also research institutions, addiction treatment providers, childcare networks, and hundreds of other federally subsidized programs that collectively channel trillions of dollars through the American healthcare and social services systems.

AI tools, including off-the-shelf ChatGPT and supplementary large language models, ingest “Single Audit” submissions: standardized annual financial reports that states and grant recipients are required to file under federal law. The system looks for persistent patterns of noncompliance, such as chronic repeat deficiencies, material weaknesses, internal control failures, and delinquent filings that grantees have “consistently failed” to address despite prior federal findings.
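
To make the idea of a "chronic repeat deficiency" concrete, here is a minimal sketch of how findings that recur across consecutive years might be screened. It is a hypothetical illustration only and does not describe how HHS or AERO actually works: the record layout (grantee, fiscal year, finding type), the function name `chronic_findings`, the sample data, and the three-year threshold are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical, simplified audit findings: (grantee, fiscal_year, finding_type).
# Real Single Audit data is far richer; this layout is an assumption.
FINDINGS = [
    ("Agency A", 2021, "material_weakness"),
    ("Agency A", 2022, "material_weakness"),
    ("Agency A", 2023, "material_weakness"),
    ("Agency B", 2021, "late_filing"),
    ("Agency B", 2023, "late_filing"),
    ("Agency C", 2022, "internal_control"),
]


def chronic_findings(findings, min_consecutive_years=3):
    """Return (grantee, finding_type) pairs that appear in at least
    `min_consecutive_years` consecutive fiscal years."""
    years_seen = defaultdict(set)
    for grantee, year, finding_type in findings:
        years_seen[(grantee, finding_type)].add(year)

    flagged = []
    for key, years in sorted(years_seen.items()):
        longest = run = 0
        previous = None
        for year in sorted(years):
            # Extend the run if this year directly follows the previous one.
            run = run + 1 if previous is not None and year == previous + 1 else 1
            longest = max(longest, run)
            previous = year
        if longest >= min_consecutive_years:
            flagged.append(key)
    return flagged


# Flagged items are candidates for human review, not conclusions.
review_queue = chronic_findings(FINDINGS)
print(review_queue)
```

In this toy data only Agency A's finding repeats in three consecutive years, so it is the only item placed in the review queue. A rule this simple says nothing about whether a finding reflects fraud, which is part of why the review step matters.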

When the system flags anomalies, human analysts review the outputs before enforcement action is taken. At least in theory.

The Scale of the Problem Being Targeted

To understand why AERO exists, it helps to understand the sheer scale of American healthcare spending. Medicaid alone distributes roughly $900 billion annually across the fifty states, covering more than 80 million low-income Americans. Federal oversight of that spending has historically relied on a patchwork of state-level auditors, annual reports, and periodic federal reviews—a system designed for a pre-digital era that has struggled to keep pace with the complexity of modern healthcare billing.

Independent estimates of waste, fraud, and improper payments in Medicaid range from roughly 10% to 22% of total expenditures annually—hence the $100 to $200 billion figure HHS officials have cited. Even capturing a fraction of that amount would represent one of the largest fiscal recoveries in government history.
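
The arithmetic behind that range can be checked directly. The short sketch below applies the 10% and 22% rates to the roughly $900 billion annual spending figure cited above; the function name `improper_payment_range` is made up for this example, and the inputs are the article's rounded estimates, not official data.

```python
# Figures as cited in the article; both are rough estimates.
MEDICAID_SPENDING_BILLIONS = 900
LOW_RATE, HIGH_RATE = 0.10, 0.22


def improper_payment_range(total_billions, low_rate, high_rate):
    """Return (low, high) estimated improper payments, in billions of dollars."""
    return round(total_billions * low_rate), round(total_billions * high_rate)


low, high = improper_payment_range(MEDICAID_SPENDING_BILLIONS, LOW_RATE, HIGH_RATE)
print(f"${low} billion to ${high} billion")
```

The result, about $90 billion to $198 billion, lines up with the rounded $100 billion to $200 billion figure officials have cited.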

The political logic is equally clear. The Trump administration, which has made government efficiency a central rhetorical priority, sees AI-powered fraud detection as both a substantive policy tool and a high-visibility demonstration that technology can accomplish what armies of traditional auditors could not.

The Controversy: Accuracy, Accountability, and Politics

For all the potential scale of its impact, AERO has attracted pointed criticism from healthcare policy researchers, state officials, and civil liberties advocates—criticism that falls into three distinct categories.

Accuracy. The use of a general-purpose commercial language model to analyze complex government financial documents is unprecedented at this scale. Critics note that ChatGPT and similar models can hallucinate—producing confident-sounding outputs that are factually incorrect—and that the ambiguities inherent in healthcare billing codes, state-specific accounting conventions, and multi-year audit trails create exactly the conditions in which AI errors are most likely and hardest to detect. “These are not simple documents,” one federal healthcare analyst told The Boston Globe. “The idea that an off-the-shelf chatbot can reliably flag fraud in a 400-page Single Audit from a state Medicaid agency with confidence is a claim that should require extraordinary evidence.”

Due process. HHS has enforcement authority that is genuinely severe: it can withhold payments, claw back previously distributed funds, and in extreme cases terminate federal awards entirely. Critics argue that if AI flags a state’s audit incorrectly—and the burden of proving the error falls on the state—the process effectively inverts the presumption of compliance that has historically governed federal-state funding relationships.

Politics. Perhaps the most pointed criticism concerns the geographic distribution of AERO’s most aggressive enforcement actions to date. The Trump administration withheld hundreds of millions of dollars from Minnesota and more than $1 billion from California in Medicaid funding in the months surrounding AERO’s announcement—both Democratic-led states with which the administration has had high-profile public conflicts. Critics contend the timing is not coincidental. Administration officials reject the characterization, pointing to objective noncompliance findings in the audit record.

What HHS Says

Chiarello has been direct about the program’s intent and its limitations. The AERO initiative, he has said, is designed to identify chronic noncompliance that has slipped through traditional audit channels—not to replace human judgment, but to scale it. Grantees that engage constructively when flagged will find HHS a willing partner in remediation, he has stated. Those that do not will face consequences.

On the question of AI accuracy, HHS officials emphasize that AERO outputs are reviewed by human analysts before enforcement decisions are made. The system surfaces candidates for scrutiny; people make the calls.

The program’s launch also follows Vice President JD Vance’s prominent public statements calling for AI-driven modernization of federal oversight, and CMS Administrator Mehmet Oz’s direct warnings to state Medicaid directors about the consequences of noncompliance—signals that suggest AERO enjoys high-level political backing within the executive branch.

A Template for AI in Government

Whatever one thinks of AERO’s merits, the program represents a significant inflection point in how the U.S. federal government uses commercial AI tools. Previous government AI deployments have typically involved custom-built systems, lengthy procurement cycles, and narrow, well-defined use cases. AERO is different: it deploys a consumer product, at the direction of a politically appointed official, across the full complexity of American public health finance, on a timeline measured in weeks rather than years.

That speed is simultaneously AERO’s selling point and its greatest risk. Traditional government procurement would have taken years to accomplish what AERO did in a fraction of the time. But AI systems deployed at scale in high-stakes contexts without extensive domain-specific validation can produce systematic errors, and systematic errors in Medicaid enforcement affect some of the most economically vulnerable Americans.

Whether AERO becomes a model for AI-powered government efficiency or a cautionary tale about the unintended consequences of moving fast in complex systems may well depend on questions that won’t be answerable for years: How often is the AI wrong? What happens to states and programs that fight incorrect flags? And who is accountable when the algorithm makes a mistake that costs a clinic its federal funding?

The government has the tool. The answers are still being written.
