Newsletter Subscribe
Enter your email address below and subscribe to our newsletter
Enter your email address below and subscribe to our newsletter

Anthropic’s most powerful publicly available AI model, Claude Fable 5, has reportedly had its safety guardrails bypassed by a well-known AI red-teamer just one day after its June 9 release, raising fresh questions about the durability of AI safety measures even as models grow more capable.
Prolific AI jailbreaker “Pliny the Liberator” posted on X on June 10 that he and a team of automated agents had successfully circumvented Fable 5’s safety classifiers using what he described as “a pack hunt” — a coordinated multi-agent attack combining several techniques. According to Cybersecurity News, the methods included Unicode and Cyrillic character substitution to evade keyword filters, long-context reference tracking, taxonomy and document-structure framing, fiction and narrative framing, and a decomposition-recomposition approach in which harmful information was extracted in innocuous fragments and reassembled.cybersecuritynews
The last technique proved most effective. “Getting uplift on the process itself, like Birch reduction method or reductive amination, is much more doable” than requesting a named harmful compound directly, Pliny wrote, adding that a previously jailbroken Claude Opus 4.8 instance assisted in the backend. Screenshots shared by the researcher showed outputs including step-by-step stack buffer overflow exploitation guidance and chemical synthesis pathways. Pliny also published what he described as Fable 5’s approximately 120,000-character system prompt on GitHub, exposing the internal safety instructions Anthropic uses to govern the model.x
Anthropic had positioned Fable 5 as hardened against such attacks. A TechCrunch report from launch day noted the company said an external bug bounty “produced no universal jailbreaks in over 1,000 hours of testing,” and that external red-teaming organizations also failed to find universal bypasses. The company’s system card stated the public bug bounty received approximately 100,000 attempts as of June 5. As a precaution, Anthropic imposed a mandatory 30-day data-retention policy on all Fable 5 traffic to defend against novel attacks.b2bnn
The rapid breach follows a pattern established across major AI releases. Pliny has previously claimed jailbreaks of Claude Opus 4.8 within minutes of its launch in late May, Claude Opus 4.7 in April, and OpenAI’s GPT-OSS models on their release day last year. Anthropic has not yet publicly responded to the Fable 5 jailbreak claims or the leaked system prompt.x