Open-Source Has Crossed the Frontier. The Question Now Is Whether That Changes Anything.
textak places the probability of open-source matching closed frontier performance at 75%, up from 72% three months ago. Today's benchmark data is the most direct evidence we've seen that the gap has effectively closed on the dimension we were forecasting: DeepSeek V4-Pro-Max ties Gemini 3.1 Pro exactly at 80.6% on SWE-Bench Verified, with Qwen 3 and MiniMax M3 within 0.2 points. But today's Anthropic news matters for a different reason. Claude Fable 5 launched with a score of 80.3% on SWE-Bench Pro, the public release of what was previously described as a 'Mythos-class' step-change improvement. If that's the new frontier ceiling, open-source is still touching it. If it isn't, and Mythos 5 with full capabilities sits materially above Fable 5, we have a harder question to answer.
Let's be precise about what the forecast actually requires. We defined parity as open-source matching closed frontier performance: not beating it, not matching it on every task, but reaching a credible equivalence on the benchmark dimensions that matter for real workflows. SWE-Bench Verified is the closest thing we have to a contamination-resistant, practitioner-relevant code benchmark. DeepSeek V4-Pro-Max tying the current Gemini flagship at 80.6% is direct evidence, not a proxy. Three open-weight models now sit within rounding error of the closed frontier on this specific dimension. That is, by any honest reading, the thing we said would happen.
The counterargument we take most seriously is that the tiered-access story is breaking at the same time as the benchmark-convergence story. Anthropic today launched a dual-tier release: Fable 5 for broad access, Mythos 5 for vetted organizations with 'full capabilities.' OpenAI is running parallel trusted-access programs for frontier cybersecurity use. This suggests that frontier labs have internalized the open-source parity threat and responded by retreating to a capability moat that benchmarks literally can't reach, because the models being benchmarked are not the most capable versions being deployed. If the actual frontier is Mythos 5 rather than Fable 5, and Mythos 5 scores materially above Fable 5's 80.3% on the same benchmarks, then open-source at 80.6% hasn't reached the frontier. It has reached the publicly benchmarkable frontier, which is a different thing.
This is the gap in our model that we're watching most carefully. The forecast was always vulnerable to frontier labs holding unreleased capabilities above the disclosed ceiling. We flagged Anthropic's leaked Mythos work as a key downward risk in our FOR/AGAINST framing, and today that risk materialized in a specific form: the tiered release strategy explicitly decouples public benchmarkable performance from actual frontier capability. DeepSeek's permanent 75% price cut on V4-Pro API rates is strong supporting evidence that open-source is winning the commoditization war. Whether it's winning the capability war depends on where you draw the ceiling.
We're holding at 75%. The benchmark evidence today is direct and substantial. The tiered access dynamic is a genuine complication but not a refutation: it's more accurate to say the definition of 'frontier' is itself fragmenting under competitive pressure than to say open-source hasn't reached it. What would move us above 80%: an independent technical evaluation confirming that Mythos 5's full-capability scores are not materially above Fable 5's on practitioner benchmarks, or a Mythos-class model being released open-weight within six months. What would drop us below 65%: evidence that the full-capability gap between Mythos 5 and Fable 5 exceeds 10 points on SWE-Bench Pro, which would suggest the public release is a deliberately constrained product, not a true reflection of capability.
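For readers who prefer the update triggers written out as a rule, here is a minimal sketch in Python. It is an illustration of the conditions stated above, not a model we actually run. The function name, the return labels, and the 2-point cutoff for "materially" are assumptions made for the example, since the forecast does not define that word numerically. Only the 10-point gap and the 80% / 75% / 65% levels come from the text.

```python
from typing import Optional

# Assumption for illustration: the forecast never quantifies "materially",
# so this cutoff is hypothetical.
MATERIAL_GAP_POINTS = 2.0
# From the forecast text: a Mythos 5 vs. Fable 5 gap above this on
# SWE-Bench Pro would drop the forecast below 65%.
LARGE_GAP_POINTS = 10.0


def forecast_direction(
    gap_points: Optional[float],
    open_weight_mythos_class: bool = False,
) -> str:
    """Map the stated triggers to a forecast band.

    gap_points: independently measured Mythos 5 score minus Fable 5 score
        on SWE-Bench Pro, or None if no such evaluation exists yet.
    open_weight_mythos_class: True if a Mythos-class model has been
        released open-weight within six months.
    """
    # Assumption: an open-weight Mythos-class release outranks any measured
    # gap, because open-source would then hold the frontier model itself.
    if open_weight_mythos_class:
        return "above 80%"
    if gap_points is None:
        return "hold at 75%"  # no new evidence either way
    if gap_points > LARGE_GAP_POINTS:
        return "below 65%"
    if gap_points <= MATERIAL_GAP_POINTS:
        return "above 80%"
    # A gap that is material but not above 10 points is not covered by
    # either trigger, so the sketch leaves the forecast where it is.
    return "hold at 75%"
```

With no independent evaluation available, `forecast_direction(None)` returns the current hold. The rule leaves a wide middle band, a gap between the assumed 2 points and the stated 10, where neither trigger fires.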