AI & ML

Google's Gemini 3.5 Pro Slips to July, Taking Its 2M-Token Context Window With It

Google missed its June deadline for Gemini 3.5 Pro, the model it announced at I/O with a 2-million-token context window — the largest of any production frontier model. The delay comes amid a wave of senior researcher departures to Anthropic and competitive pressure from Claude Sonnet 5 and GPT-5.5, both already available. Google is targeting a July release, but no specific date has been announced.

1h ago 5 min read

When Google took the stage at I/O on May 19, it made a specific promise: Gemini 3.5 Pro — a model with a 2-million-token context window and a Deep Think reasoning mode — would reach general availability in June. On June 30, with the deadline passed and the model still in limited enterprise preview on Vertex AI, Google quietly extended that target to July. No new date has been announced.

This is the second significant AI delivery miss for Google in 2026. Gemini Ultra 1.5 slipped by roughly three months earlier in the year. And the June miss for 3.5 Pro arrived in the same week that four senior Gemini researchers announced their departures to Anthropic — a coincidence of timing that concentrated a week’s worth of negative signals into a single news cycle.

What Gemini 3.5 Pro Actually Promises

The model’s primary competitive distinction is its context window: 2 million tokens, double what Claude Opus 4.8 and GPT-5.5 currently offer, and enough to ingest roughly five to eight full-length novels worth of text in a single prompt. For enterprise use cases that depend on processing long documents — legal contracts, codebases, research corpora, financial reports — that advantage is not theoretical. It changes what’s architecturally possible without chunking, retrieval augmentation, or summarization pipelines.

The second headline feature is Deep Think, a reasoning mode that gives the model extended time to work through problems before responding. Deep Think is positioned as Gemini 3.5 Pro’s answer to the chain-of-thought reasoning capabilities that have become a key differentiator for frontier models. Critically, Google has indicated that Deep Think will be exclusive to the Ultra subscription tier — a $250/month product — not the standard Pro plan. For most enterprise API customers, Deep Think access will require either the Ultra subscription or a separate Vertex AI arrangement.

Pricing, when the model launches, is expected to be approximately $15 per million input tokens and $60 per million output tokens — roughly ten times the cost of Gemini 3.5 Flash, and meaningfully more expensive than Claude Opus 4.8 at $5/$25 and GPT-5.5 at $5/$30 for equivalent tiers. Google is pricing to the context window: the 2M-token capability commands a premium, and users who don’t need that window are better served by Flash or competitors.

The June Miss and What Caused It

According to sources familiar with the development timeline, Google used the additional time to refine Gemini 3.5 Pro’s performance on complex tasks, incorporate feedback from early enterprise testers, and address token efficiency issues identified during limited preview. The decision to delay was apparently made after early Vertex AI customers flagged inconsistencies in long-context handling — precisely the capability that differentiates the model.

The irony is pointed. Gemini 3.5 Pro’s key selling point is its ability to reliably process and reason over massive context windows. Delaying to fix problems in that exact domain suggests that the 2M-token window is not yet production-stable across the full range of enterprise use cases Google wants to support.

Polymarket, which had tracked the probability of a June release, closed at 97% probability of no release at the end of June — a market signal that informed observers had priced in the delay well before the official acknowledgment.

The Researcher Exodus

The departure of four senior Gemini researchers to Anthropic during the week of June 21–27 added a layer of narrative weight to the launch delay. These departures came on the heels of several high-profile defections earlier in the year: Noam Shazeer to OpenAI in mid-June, John Jumper and two colleagues to Anthropic just days later.

The cumulative picture — six senior DeepMind researchers departing in five months, concentrated talent loss in the weeks surrounding a missed product deadline — raised questions about the structural health of Google’s AI model development organization. Sergey Brin’s internal memo in late June, in which he urged teams to “urgently bridge the gap in agentic execution,” circulated widely in the AI community as evidence that Google leadership recognizes the competitive pressure from the inside.

None of this means Google is in trouble in any terminal sense. The company has the largest installed base of any AI product in the world via Google Search, the deepest infrastructure footprint, and is spending aggressively on compute. But the pattern of delays and talent departures in the first half of 2026 has complicated Google’s narrative as a consistent frontier AI leader.

Competitive Context

In the six weeks since Google’s I/O announcement, the competitive landscape has shifted materially. Anthropic launched Claude Sonnet 5 as a cheaper, nearly-Opus-grade agent, changing the price-performance calculus for developer deployments. OpenAI’s GPT-5.6 was previewed in late June, though access remains government-gated. Gemini 3.5 Flash — the lighter, already-available member of the family — has become a go-to choice for cost-sensitive workloads.

Gemini 3.5 Pro needs to arrive soon to matter. The 2-million-token context window is a real differentiator, but the longer the delay, the more enterprise customers build around alternative architectures. Context compression techniques, retrieval-augmented generation, and chunking pipelines can approximate some of the benefits of extreme long context at lower cost per token — and the longer users spend building these workarounds, the less urgency they feel to switch when Gemini 3.5 Pro finally ships.

Google’s July target is unspecific by design — it preserves flexibility to ship when the model is ready rather than committing to another hard deadline that might be missed. The approach is pragmatically sensible but commercially costly: every week in preview is a week competitors are capturing deployment decisions.

What to Watch

The specific date of general availability for Gemini 3.5 Pro is the immediate catalyst. A release in the first week of July, accompanied by strong benchmark numbers and API stability, would allow Google to reclaim momentum. A further delay into late July, or another quality-driven slip, would likely intensify both the talent and competitive narratives.

Beyond the date, watch for whether Google separates the context window and Deep Think access in API pricing — making 2M-token context available at a lower price point to compete with standard-context alternatives. The current expected pricing structure favors competitors for the majority of use cases, and Google will need a compelling value proposition at the midrange to justify the deployment decision for most enterprise customers.

Gemini 3.5 Pro has a genuine technological advantage waiting behind the launch gate. The question is whether it can get out fast enough to use it.

Sources

google gemini ai-models llm deepmind context-window