The FDA's AI Diagnostic Forecast Has a Definitional Problem — and Today's News Exposes It
TexTak holds [fda-ai-diagnostic] at 55%, forecasting whether the FDA will approve its first 'fully AI-driven diagnostic tool'. Today brought two pieces of FDA news: the January 2026 Good AI Practice framework for drug development, and the QMSR alignment with ISO 13485 taking effect February 2, 2026. It also brought a Harvard study showing OpenAI o1 outperforming two physicians on ER triage at 67% accuracy. Taken together, the evidence is genuinely mixed in ways that require us to be careful about what we're actually claiming.
Start with the definitional problem our forecast has always carried. IDx-DR has performed autonomous diabetic retinopathy screening without mandatory physician review since FDA clearance in 2018. If that counts, our forecast already resolved YES back in 2018. Our thesis only survives if we define 'fully AI-driven' as something meaningfully beyond IDx-DR's narrow single-condition screening: something like autonomous diagnosis across multiple clinical presentations, without a human backstop, in an acute care setting. The Harvard o1 study is the first evidence we've seen pointing toward that harder target: 67% accuracy on ER triage cases, outperforming two physicians in a multivariate, text-presented emergency diagnostic task. That's not a radiology screen. That's differential diagnosis under time pressure.
But the evidence classification here matters enormously. The Harvard study is proximate evidence, not direct evidence. It proves a capability exists in a research context. It does not prove the FDA is ready to clear that capability for physician-free deployment in emergency settings. The FDA's January 2026 Good AI Practice guidance, built on 800+ comments and 500+ drug development submissions, is institutional scaffolding. It establishes that the FDA is developing the vocabulary and process to evaluate AI in clinical contexts. That's meaningful. But 'developing the framework to evaluate' and 'ready to approve physician-free autonomy in acute care' are separated by a canyon that includes liability assignment, failure mode documentation, and clinical validation requirements that the o1 study explicitly cannot satisfy. The researchers themselves note the AI used text alone while the clinicians had images and nonverbal cues; that gap is not a footnote, it's the core regulatory hurdle.
The QMSR ISO 13485 alignment is the news we're least certain how to weight. On one hand, harmonizing with international standards streamlines the quality management infrastructure that AI device manufacturers need in order to reach clearance. On the other hand, the count of 1,350 cleared AI-enabled devices (double 2022's figure) tells us about administrative throughput, not philosophical readiness to transfer liability away from clinicians. Most of those 1,350 devices are in radiology, where pattern recognition is well-defined. The o1 study points toward the genuinely hard domain: acute, multimodal, high-stakes differential diagnosis. Volume in the former doesn't translate to regulatory appetite for the latter.
Our 55% reflects a real tension we haven't resolved: the technical capability case is strengthening faster than we expected, but the liability and professional-body-resistance case hasn't weakened. The AMA's position on physician oversight hasn't shifted. The 'Good AI Practice' framework is for drug development, not diagnostic autonomy in clinical settings. What would move us to 65%: FDA issuing specific guidance on autonomous diagnostic AI that addresses liability assignment without mandatory physician review. What would drop us to 40%: AMA formally lobbying against pending AI diagnostic applications and finding bipartisan congressional support for a human-in-the-loop requirement. Right now, the capability signal is ahead of the institutional signal, and our 55% is trying to hold that gap honestly.
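One way to make those conditional moves concrete is to express them in log-odds, the scale on which forecast updates are usually compared. The sketch below is a minimal illustration using only the three probabilities named above (55%, 65%, 40%); the scenario labels are shorthand for the triggers we described, not outputs of any formal model.

```python
import math

def log_odds(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

# Current forecast and the two conditional targets named above.
CURRENT = 0.55
SCENARIOS = {
    "FDA guidance on autonomous diagnostic AI (liability assigned)": 0.65,
    "AMA lobbying + bipartisan human-in-the-loop support": 0.40,
}

for trigger, target in SCENARIOS.items():
    # How big an update does each trigger imply, measured in log-odds?
    shift = log_odds(target) - log_odds(CURRENT)
    print(f"{trigger}: 55% -> {target:.0%} (log-odds shift {shift:+.2f})")
```

Run it and the asymmetry is visible: the drop to 40% is a log-odds shift of about -0.61, while the rise to 65% is about +0.42, so the downside trigger represents the larger update under this framing.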