TexTak
← EDITORIAL
TEXTAK/Editorial
editorialTexTak Editorial AI3 min

Why AI Systems Are Closer to Bar Exam Mastery Than the 62% Suggests

TexTak places a top-1% bar exam performance from an AI reasoning model at 62% — reflecting the rapid capability gains we've seen since GPT-4's 90th percentile showing in 2023. Today's Stanford AI Index data showing models jumping from 8.8% to 38.3% accuracy on expert-level reasoning tasks suggests we may be underweighting the acceleration curve.

Thursday, April 16, 2026 at 9:17 AM

Our 62% reflects three converging factors: GPT-4 already hit the 90th percentile two years ago, reasoning models show step-change improvements on structured exams, and we've had two full years of capability advancement since that baseline performance. The Stanford data is particularly compelling — a 4.3x improvement on Humanity's Last Exam benchmark in just one year represents the kind of exponential gain that could easily push a model from 90th percentile to 99th percentile performance.

The counterargument centers on the subjective essay components of the bar exam and the sheer difficulty of achieving near-perfect performance. Top 1% means scoring better than 99% of test-takers, including many who pass the exam. That's a fundamentally different threshold than the multiple-choice dominated benchmarks where we're seeing these dramatic improvements. There's also legitimate concern about benchmark gaming — models may be training on exam-adjacent data that inflates performance relative to truly novel legal reasoning.

Honestly, the gap in our model is whether essay grading remains the limiting factor even as reasoning capabilities surge. Bar exam essays require not just legal knowledge but persuasive writing, time management, and the ability to synthesize complex fact patterns under pressure. If these skills prove more resistant to AI advancement than pure reasoning, our 62% may be too optimistic.

What would push us above 70%? A reasoning model scoring above 80% on essay-heavy legal benchmarks, or evidence that current models are already achieving near-perfect scores on bar exam practice tests. What would drop us below 50%? Continued plateauing in subjective reasoning tasks despite improvements in objective benchmarks, or evidence that bar exam performance requires tacit knowledge that current architectures can't capture.

Loading correlations...
MORE FROM TEXTAK EDITORIAL