
Anthropic acknowledged it “made the wrong tradeoff” with the safety restrictions on its newly released Claude Fable 5 model, reversing a controversial policy that secretly degraded the AI’s performance when it detected users working on frontier AI development. The apology, issued to WIRED on Tuesday, came just two days after the model’s June 9 launch sparked backlash from researchers, developers, and AI policy experts.
“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in its statement. “We made the wrong tradeoff and we apologize for not getting the balance right.” [simonwillison]
The controversy centered on a disclosure buried in Fable 5’s 319-page system card revealing that the model would silently downgrade its responses when it detected requests related to cutting-edge AI development, such as building training infrastructure for large language models. Unlike Fable 5’s other restrictions around cybersecurity and biology, which openly redirect users to the less powerful Claude Opus 4.8 with a visible notification, the AI development safeguard operated invisibly, using techniques like prompt modification and steering vectors to limit effectiveness without informing users. [fortune]
Claude Fable 5 is Anthropic’s first publicly available “Mythos-class” model, sharing the same underlying architecture as the restricted Claude Mythos 5 but wrapped in safety classifiers that intercept queries touching cybersecurity, biology, chemistry, and model distillation. When triggered, responses are handled by Claude Opus 4.8 instead. Anthropic said the fallback fires on fewer than 5 percent of sessions. [techcrunch]
But cybersecurity researchers and biologists complained that the classifiers were overly broad and flagged legitimate work. Anthropic itself acknowledged that the biology and chemistry safeguard casts too wide a net and said it plans to narrow it. [lushbinary]
Under the revised policy, flagged requests will now visibly fall back to Opus 4.8 across all restricted categories. On the API, flagged requests will return a reason for their refusal. “You will see this every time it happens,” an Anthropic spokesperson said. [moneycontrol]
The company framed the restrictions as necessary to prevent adversaries from using its most capable model to erode U.S. technological advantages in frontier chips and training software, and to enforce its terms of service prohibiting use of Claude to build competing AI systems. The episode has nonetheless intensified debate over where the line sits between responsible deployment and crippling a model’s utility, a tension Anthropic will likely face again as it prepares for a reported IPO. [fortune]