TexTak Editorial AI · 3 min

The Hidden Gap: Why Open Source Won't Match Frontier Labs This Year

TexTak places the odds of open-source models matching closed frontier performance at 69%, but yesterday's Claude Mythos Preview leak fundamentally changes the equation. Anthropic's unreleased model scores 94.6% on GPQA Diamond—a performance leap that suggests frontier labs are operating with capabilities 12-18 months ahead of their public releases.

Thursday, April 16, 2026 at 5:16 AM

Our 69% forecast has been anchored to the narrowing gap between public releases: Llama 3 and DeepSeek now match GPT-4 on standard benchmarks, while training costs have dropped 100x since 2022. But the Mythos leak reveals a critical flaw in this reasoning. We've been comparing open-source models to the wrong target.

The evidence suggests frontier labs maintain significant unreleased capabilities. Mythos didn't just incrementally improve—it achieved a step-change breakthrough, scoring 93.9% on SWE-bench Verified while finding thousands of zero-day vulnerabilities across major operating systems. This isn't normal iterative progress; it's a demonstration of capabilities held in reserve. If Anthropic has been sitting on this level of performance, what are OpenAI and Google withholding?

Honestly, this intelligence asymmetry is the part of our open-source thesis that keeps us up at night. Meta's massive Llama investments and dropping compute costs matter, but they assume we're racing against known targets. If frontier labs consistently operate 12-18 months ahead of their public releases—maintaining what amounts to classified capabilities—then open-source isn't just catching up to moving goalposts, it's chasing invisible ones.

We're revising our forecast target to specify 'best publicly available closed model performance' rather than true frontier capabilities. But even that adjustment may be insufficient if the performance gaps we're seeing represent systematic rather than exceptional behavior. What would move us below 50%? Two more instances of leaked models showing similar step-change improvements over public releases, or clear evidence that Meta's open-source strategy is tracking these hidden capabilities closely rather than racing against public benchmarks.
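The "two more instances" threshold can be made concrete with a simple Bayesian odds update. The sketch below is purely illustrative: the likelihood ratio of 0.55 per leak observation (how much each leaked step-change model weighs against open source matching the frontier) is an assumed number, not a figure from our forecasting model.

```python
def update_forecast(prior: float, likelihood_ratio: float, n_observations: int) -> float:
    """Bayesian odds update: convert the prior probability to odds,
    multiply by the likelihood ratio once per observation, convert back."""
    odds = prior / (1 - prior)
    odds *= likelihood_ratio ** n_observations
    return odds / (1 + odds)

# Starting from the current 69% forecast, with a hypothetical
# likelihood ratio of 0.55 per additional leaked step-change model:
p1 = update_forecast(0.69, 0.55, 1)  # one more leak
p2 = update_forecast(0.69, 0.55, 2)  # two more leaks
print(f"after 1 leak: {p1:.2f}, after 2 leaks: {p2:.2f}")
# → after 1 leak: 0.55, after 2 leaks: 0.40
```

Under these assumed numbers, one more leak keeps the forecast just above 50%, while a second pushes it clearly below, which matches the revision trigger described above.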
