The gap between open and closed models has been narrowing.
Resolves True if an open-weights model scores within 2% of the leading closed model on MMLU, HumanEval, and GPQA.
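As a concreteness check, here is a minimal sketch of that resolution test. Assumption: "within 2%" is read as within 2 percentage points per benchmark (the criterion does not specify absolute vs. relative difference); the function name and all scores are illustrative, not real results.

```python
# Hypothetical sketch of the resolution check, under the assumption
# that "within 2%" means within 2 percentage points on each benchmark.
# All scores below are made up for illustration.

BENCHMARKS = ("MMLU", "HumanEval", "GPQA")

def resolves_true(open_scores: dict, closed_scores: dict, margin: float = 2.0) -> bool:
    """True when the open-weights model is within `margin` points of the
    leading closed model on every benchmark in BENCHMARKS."""
    return all(closed_scores[b] - open_scores[b] <= margin for b in BENCHMARKS)

open_model = {"MMLU": 86.1, "HumanEval": 90.0, "GPQA": 48.2}    # illustrative
closed_model = {"MMLU": 88.7, "HumanEval": 92.4, "GPQA": 50.1}  # illustrative

print(resolves_true(open_model, closed_model))  # False: the MMLU gap is 2.6 points
```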
Meta investing heavily in open-source models (the Llama family)
Open training techniques steadily closing the gap
Compute costs dropping dramatically: ~100x cost reduction verified
NEW: DeepSeek V4 potentially matching current frontier performance, with a reported 90% on HumanEval
The ~8-month capability lag between frontier and open-source models is shrinking
Hardware advances like NVIDIA Blackwell making large-scale training more accessible
Open-weight models now match GPT-4 and Claude on many benchmarks
NEW: Frontier labs hold unreleased capabilities; Anthropic's leaked 'Mythos' reportedly represents a step-change improvement
Frontier labs have data advantages
Post-training techniques remain closely held
Benchmark parity ≠ real-world parity: strong benchmark scores can mask weaker performance on open-ended, real-world tasks
Closed labs can maintain advantages through undisclosed model development
Poor calibration suggests fundamental limitations in current approaches
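Since the calibration point carries weight here, a minimal sketch of expected calibration error (ECE), one standard way calibration is quantified; the function and data below are illustrative, not measurements of any particular model.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by stated confidence, then average the gap between
    each bin's mean confidence and its empirical accuracy, weighted by the
    fraction of samples in the bin. Lower = better calibrated."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Made-up example of an overconfident model: high stated confidence,
# but only half of the answers are actually correct.
conf = [0.95, 0.90, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65]
hits = [1, 0, 1, 0, 1, 0, 1, 0]
print(f"ECE = {expected_calibration_error(conf, hits):.3f}")
```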