Open-Source Has Closed the Gap — But 'Parity' Depends on Which Gap You're Measuring
TexTak holds open-source frontier parity at 69% — and today's evidence is the strongest single-day signal we've seen on this forecast. DeepSeek-V3.2 and Qwen 3.6 Plus now sit within single digits of closed frontier models on coding benchmarks, MiniMax M2.5 matches Claude Opus 4.6 on SWE-Bench Verified almost exactly (80.2% vs. 80.8%), and the AISI reports a proprietary lead of only 4-8 months. If you define parity as 'a general-purpose open-weight model that a developer can deploy today and get within rounding error of the closed frontier on measurable tasks,' you could argue that threshold has already been crossed. But we're holding at 69% rather than declaring resolution, and the reason matters.
The evidence today is as direct as we get on this forecast. MiniMax M2.5 matching Claude Opus 4.6 within 0.6 percentage points on SWE-Bench Verified is not a 'conditions are forming' data point — that's a head-to-head comparison on a production-relevant coding benchmark where the open-weight model is functionally indistinguishable from the closed one. The AISI's 4-8 month lag estimate, combined with MoE architecture convergence across every major open-source release, suggests the structural dynamics driving the gap have largely played out. We weight this evidence heavily because benchmark parity on coding and reasoning — not abstract capability claims — is the domain where enterprises make deployment decisions.
But here's the counterargument we take seriously, and today's news actually sharpens it rather than dissolving it: Claude Mythos Preview scoring 94.6% on GPQA Diamond is a genuine step-change data point that sits outside what open-source can currently reach. Mythos is not shipping as a generally available product yet — it's a preview — but it represents exactly the kind of unreleased capability that our forecast already flagged as the primary bear case. If the frontier labs are maintaining a 'Mythos-class' tier that open-source tracks with a 4-8 month lag, and those labs are accelerating their release cadence, the lag could be self-renewing rather than self-closing. The 69% reflects that tension: open-source has achieved parity on the last generation of frontier benchmarks precisely as the frontier has moved.
The forecast-defining question is which dimension of parity we're measuring. Benchmark parity on coding? Probably already met. Developer preference in enterprise? Our read is yes for cost-sensitive mid-market applications. Commercial impact — meaning a major enterprise customer switches from a closed API to an open-weight deployment and publicly reports comparable ROI? That's the dimension where we lack direct evidence. We're weighting the 69% primarily on benchmark and developer-adoption signals, discounted because the commercial-impact evidence remains circumstantial. The MoE training cost figure ($5.6M for DeepSeek V3) is the most structurally important number in today's news for this forecast — it means the cost barrier to building competitive open-weight models has effectively collapsed, removing the capital moat that previously sustained the gap.
What would move us above 80%: a second open-source model independently achieving SWE-Bench or MMLU parity with whatever closed model is current at time of measurement, combined with a Fortune 500 public deployment announcement citing open-source as the chosen infrastructure. What would drop us below 55%: Mythos launching publicly and opening a gap of more than 15 percentage points on GPQA Diamond that persists for two quarters — that would suggest the frontier has structurally re-separated rather than merely taken a temporary lead.