Why Open Source Already Won the AI Performance Race, and Why Frontier Labs Won't Admit It
TexTak's forecast that open-source models would match closed frontier performance now sits at 69%, but April's releases suggest we're being too conservative. GLM-5.1 tops global leaderboards at 58.4 on SWE-Bench Pro, beating GPT-5.4 and Claude Opus. Qwen 3.5 outperforms every closed model except the most expensive options. When MIT reports the US-China AI race is "almost neck and neck," with models separated by "razor-thin margins," the performance parity thesis isn't a prediction anymore; it's documented reality.
Our 69% probability reflects benchmark convergence data and Meta's massive open-source investment, but honestly, we're being too careful. When Z.ai's GLM-5.1, released under an MIT license, beats both GPT-5.4 and Claude Opus on the same benchmark, that's not "approaching parity"; that's superiority. LangChain's data shows open models like GLM-5 and MiniMax M2.7 now match closed models "on core agent tasks" at a fraction of the cost. The performance gap has closed.
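To make "too conservative" concrete, here is a minimal sketch of the odds-form Bayes update we have in mind. The prior is our published 69%; the evidence items come from this write-up, but every likelihood ratio is an illustrative assumption, not a calibrated estimate.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Odds-form Bayes rule: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

p = 0.69  # our published probability of open-source parity

# Evidence from April's releases; the likelihood ratios are
# illustrative assumptions, not calibrated estimates.
evidence = [
    ("GLM-5.1 tops SWE-Bench Pro, beating GPT-5.4 and Claude Opus", 2.0),
    ("Qwen 3.5 outperforms all but the priciest closed models", 1.5),
    ("LangChain: open models match closed on core agent tasks", 1.3),
]

for event, lr in evidence:
    p = bayes_update(p, lr)
    print(f"{p:.1%} after: {event}")
# Under these assumed ratios, 69% climbs to roughly 90%; the direction
# matters more than the exact number.
```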
The strongest counterargument comes from Anthropic's leaked Mythos capabilities, which discovered thousands of zero-day vulnerabilities across major systems, suggesting frontier labs have unreleased step-change improvements. This is compelling because vulnerability discovery requires reasoning depth that benchmarks don't fully capture. But here's the problem with that argument: if Anthropic has breakthrough capabilities, why are their released models still losing to open alternatives on public benchmarks? Either they're sandbagging their commercially available models, or the private capabilities don't translate to general performance.
What we might be underweighting is the definitional problem. Frontier labs are "increasingly competing on capabilities that benchmarks don't capture": reliability, safety, enterprise features. If frontier performance means "whatever closed labs do next," then parity becomes a moving target by design. But that's not intellectually honest. Performance means measurable capability on defined tasks, and on those metrics, open source has achieved parity.
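If performance is measurable capability on defined tasks, the parity check itself should be mechanical. The sketch below is one possible operationalization, not an official resolution rule: the model names match those cited above, but apart from GLM-5.1's published 58.4, the closed-model scores are hypothetical placeholders implied only by the "beats both" claim.

```python
TOLERANCE = 0.02  # within 2% of the closed frontier still counts as parity

def at_parity(open_scores: dict[str, float], closed_scores: dict[str, float]) -> bool:
    """Parity on a defined task: best open score within TOLERANCE of best closed."""
    best_open = max(open_scores.values())
    best_closed = max(closed_scores.values())
    return best_open >= best_closed * (1 - TOLERANCE)

# SWE-Bench Pro: GLM-5.1's 58.4 is the published score; the closed-model
# numbers are placeholders consistent with "beats both", not reported figures.
open_models = {"GLM-5.1": 58.4}
closed_models = {"GPT-5.4": 57.0, "Claude Opus": 56.5}

print(at_parity(open_models, closed_models))  # True: parity (here, superiority)
```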
If major cloud providers start defaulting to closed models despite cost disadvantages, or if benchmark performance diverges significantly in Q3 releases, we'd reconsider. But based on current evidence, this forecast has effectively resolved YES; we're just being too conservative to call it.