Meta's Chief AI Officer Claims 'Watermelon' Model Has Caught Up to GPT-5.5

Alexandr Wang, Meta's Superintelligence Chief, told employees in an internal briefing that the company's next model, codenamed Watermelon, matches OpenAI's GPT-5.5 on key benchmarks. The unverified claim arrives as Meta pours unprecedented capital into AI infrastructure and seeks to close the gap with OpenAI and Anthropic on frontier model performance.

Meta’s superintelligence chief Alexandr Wang told employees at an internal town hall on July 3, 2026 that the company’s next model, internally codenamed Watermelon, has matched OpenAI’s GPT-5.5 on key benchmarks. The claim, first reported by Business Insider, has not been independently verified, and the conditions under which the reported scores were measured remain undisclosed. But the signal it sends about the trajectory of Meta’s AI ambitions is significant regardless of the precise numbers behind it.

Wang framed the announcement as evidence that Meta’s aggressive infrastructure investment is translating into frontier model performance — a crucial message for a company that has spent the better part of the past 18 months being perceived as a fast follower rather than a frontier setter in the large language model race.

What We Know About Watermelon

Watermelon is Meta’s next large language model after Avocado (previously known internally as Muse Spark), which was released in April 2026. The codename continues Meta’s recent tradition of naming internal model projects after fruits, a convention that has done little to obscure the strategic seriousness of what’s inside.

Wang described Watermelon as currently in training and requiring “significantly more computing power than Avocado.” That framing is consistent with what’s publicly known about Meta’s infrastructure buildout: the company has been commissioning enormous GPU clusters over the past year and has signaled plans to spend between $60 and $70 billion on AI infrastructure in 2026 alone. Watermelon appears to be the intended beneficiary of that investment.

The reported benchmark numbers, while unverified, are specific enough to be notable. On MMLU (Massive Multitask Language Understanding), Watermelon scored 92.4% — identical to GPT-5.5’s reported figure. On HumanEval, the standard for Python coding proficiency, Watermelon solved 96.3% of problems compared to GPT-5.5’s 96.1%. GSM8K, a math reasoning benchmark, showed a similar dead heat at 94.7% versus 94.5%. These are cherry-picked scores from a model still in training, and the lack of independent verification makes their reliability unclear — but the specificity suggests Wang was working from actual internal evaluation data, not aspirational projections.

Caveats Worth Taking Seriously

The announcement comes with several important qualifications that should temper how the news is interpreted.

First, Watermelon is not released. It is still in training. Benchmark scores from a model in active training can shift — sometimes significantly — as training continues and the model’s final configuration is determined. Wang’s internal briefing is effectively a progress report, not a launch announcement.

Second, the benchmarks named — MMLU, HumanEval, GSM8K — are among the most widely used in the industry, but they are also the most saturated. The leading models from OpenAI, Anthropic, and Google have all achieved very high scores on these evaluations, to the point where differences between top-tier models on these specific tests are often within the margin of noise. Parity on MMLU and GSM8K means something; it does not mean the same thing it would have meant two years ago.
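To make the "margin of noise" point concrete, here is a rough back-of-the-envelope sketch. It treats each reported score as a simple pass rate over the benchmark's test set and asks how large the Watermelon versus GPT-5.5 gap is relative to sampling error alone. The test-set sizes used (164 problems for HumanEval, 1,319 for GSM8K, 14,042 for MMLU) are assumptions based on the public versions of those benchmarks, not anything Meta has confirmed about its internal evaluations, and the two models' scores are treated as independent, which is a simplification.

```python
import math

# Reported scores from the article: (test-set size, Watermelon, GPT-5.5).
# Test-set sizes are ASSUMED from the public benchmark releases; Meta has
# not disclosed how its internal evaluations were run.
BENCHMARKS = {
    "MMLU": (14042, 0.924, 0.924),
    "HumanEval": (164, 0.963, 0.961),
    "GSM8K": (1319, 0.947, 0.945),
}


def standard_error(score, n_items):
    """Binomial standard error of a pass rate measured on n_items questions."""
    return math.sqrt(score * (1.0 - score) / n_items)


def gap_in_standard_errors(n_items, score_a, score_b):
    """Score gap divided by the standard error of the difference.

    Treats the two scores as independent samples, a simplifying assumption.
    """
    se_diff = math.sqrt(
        standard_error(score_a, n_items) ** 2
        + standard_error(score_b, n_items) ** 2
    )
    return (score_a - score_b) / se_diff


for name, (n_items, watermelon, gpt) in BENCHMARKS.items():
    ratio = gap_in_standard_errors(n_items, watermelon, gpt)
    print(f"{name}: gap = {watermelon - gpt:+.3f}, "
          f"about {ratio:.2f} standard errors")
```

Under these assumptions, every gap comes out well below one standard error, which is why a 0.2-point lead on HumanEval or GSM8K says very little about which model is actually stronger.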

Third, Meta has a documented pattern of making aggressive internal claims that don’t fully materialize in public evaluations. Wang’s rise at Meta has been rapid (he was brought in from Scale AI in 2025 to lead the company’s superintelligence efforts), and his communication style skews toward confident assertions. That doesn’t make his claims wrong, but it is worth holding the numbers loosely until Watermelon is publicly released and independently evaluated.

The Bigger Picture: Meta’s Catch-Up Play

The context in which this claim lands matters as much as the claim itself. Meta has been in an uncomfortable position in the frontier model race for the past year. Its Llama series of open-weight models has been enormously influential in the open-source community, but on commercial benchmarks and in head-to-head evaluations of the kind that enterprise customers care about, Meta’s best models have consistently trailed OpenAI’s GPT and Anthropic’s Claude by a meaningful margin.

That gap has real business consequences. Meta’s AI features across WhatsApp, Instagram, Facebook, and its hardware products compete directly with products powered by competitors’ models. If Watermelon genuinely closes the performance gap with GPT-5.5, it would represent the first time Meta’s frontier model has been competitive with — rather than simply a credible alternative to — OpenAI’s leading commercial offering.

Notably, the claim comes just a week after Meta CEO Mark Zuckerberg acknowledged at a closed briefing that the trajectory of agentic AI development over the past four months had “not really accelerated” as expected. That admission was widely read as a softening of Zuckerberg’s earlier “superintelligence by end of year” positioning from January 2026. Wang’s Watermelon claim can be read, in part, as a corrective to that narrative: an assertion that Meta is still very much in the frontier race even as timelines on agentic milestones have slipped.

What Comes Next

The practical question is when Watermelon will be released and how it performs on third-party evaluations. Meta has consistently chosen to release its frontier models in some form to the public — partly as a strategic commitment to open development, partly as a distribution advantage that seeds its models into developer ecosystems worldwide. If Watermelon is as good as Wang claims, Meta would likely release at least an open-weight version, with full commercial deployment across its consumer products.

For the broader AI industry, a Meta model that genuinely matches GPT-5.5 performance would be a significant development. It would signal that the resources required to train frontier models — while enormous — are no longer exclusively within reach of the two or three organizations that have been setting the pace. Competition at the frontier is good for users, tends to drive down prices, and typically accelerates the research that benefits everyone.

Whether Wang’s benchmark claims survive contact with independent evaluation remains to be seen. But the direction of travel they describe is consistent with the infrastructure Meta has been building, and consistent with what a $60 billion annual AI infrastructure spend, sustained over two years, should eventually produce.


The Watermelon benchmark claims were first reported by Business Insider on July 3, 2026, based on an internal Meta briefing.
