OpenAI says GPT-5.5 Instant now matches its top models on health questions

OpenAI announced Wednesday that GPT-5.5 Instant, the default model for free ChatGPT users, now matches its frontier reasoning models on health-related responses.
A separate study published Thursday in NEJM AI found OpenAI’s o3 model helped Boston Children’s Hospital clinicians diagnose 18 children with previously unsolved rare diseases.
OpenAI says physician evaluations show a 71% drop in flagged health responses over two months, though weaknesses remain in context-seeking and global health scenarios.

OpenAI announced on Wednesday that GPT-5.5 Instant, the default model powering free ChatGPT accounts, has reached a level of health-response quality comparable to the company’s most advanced reasoning models. The update arrives as more than 230 million people use ChatGPT for health and wellness questions each week, according to the company.

Fewer Flagged Responses, Higher Marks Than Physicians

In a blog post published June 18, OpenAI said it uses privacy-preserving monitors on billions of weekly health-related messages and that the rate of responses flagged for at least one factuality issue has dropped 71% over the past two months. A separate physician evaluation found that GPT-5.5 Instant responses scored higher than physician-written answers across accuracy, clarity, and completeness, with physicians rating the model as having fewer instances of missing red flags, failing to seek context, or neglecting to refer users to care.

The improvements build on OpenAI’s HealthBench framework, developed with more than 260 physicians across 60 countries and 26 specialties. To date, that physician network has reviewed more than 700,000 example model responses. GPT-5.5 Instant, which became ChatGPT’s default model in early May, had already shown gains on the HealthBench benchmark, scoring 51.4 compared with 49.6 for its predecessor, GPT-5.3 Instant.

AI-Assisted Rare Disease Diagnoses

The announcement coincided with a study published Thursday in NEJM AI describing how OpenAI’s o3 model helped clinicians at Boston Children’s Hospital diagnose 18 children with rare diseases that had gone unresolved despite years of genetic testing and specialist review. Researchers reanalyzed 376 de-identified pediatric cases, using o3 Deep Research to connect clinical features, inheritance patterns, and scientific literature into hypotheses that specialists then confirmed. The diagnosed conditions spanned neurodevelopmental disorders, neuromuscular diseases, sudden unexpected pediatric death, and early-onset psychosis.

OpenAI said the hospital’s broader use of AI has led to diagnoses in more than 40 previously unsolved rare disease cases, saved 60,000 work hours, and allowed more than $7 million in labor costs to be redeployed.

A Widening Health Push

The health upgrades are part of a broader strategy that includes ChatGPT for Clinicians, a free tool launched in April for verified U.S. physicians, nurse practitioners, and pharmacists. On HealthBench Professional, a benchmark for clinical tasks, OpenAI’s GPT-5.4 model scored 59.0 compared with a physician baseline of 43.7. OpenAI cautioned that weak spots remain across all models, including context-seeking, global health scenarios, and adaptive communication with non-expert patients.