Anthropic’s Claude Fable 5 jailbroken days after launch, researcher claims

Researcher “Pliny the Liberator” said he defeated Claude Fable 5’s safety layers using coordinated multi-agent attacks combining Unicode tricks and decomposition tactics.cybersecuritynews
Anthropic had claimed an external bug bounty found no universal jailbreaks across over 1,000 hours of testing before the model’s June 9 launch.techcrunch
The researcher also published what he described as Fable 5’s full system prompt on GitHub; Anthropic has not publicly responded to the claims.cybersecuritynews

Anthropic’s Claude Fable 5 Jailbroken Within Days of Launch

Anthropic’s most powerful publicly available AI model, Claude Fable 5, has reportedly had its safety guardrails bypassed by a well-known AI red-teamer just one day after its June 9 release, raising fresh questions about the durability of AI safety measures even as models grow more capable.

The Bypass

Prolific AI jailbreaker “Pliny the Liberator” posted on X on June 10 that he and a team of automated agents had successfully circumvented Fable 5’s safety classifiers using what he described as “a pack hunt” — a coordinated multi-agent attack combining several techniques. According to Cybersecurity News, the methods included Unicode and Cyrillic character substitution to evade keyword filters, long-context reference tracking, taxonomy and document-structure framing, fiction and narrative framing, and a decomposition-recomposition approach in which harmful information was extracted in innocuous fragments and reassembled.cybersecuritynews

The last technique proved most effective. “Getting uplift on the process itself, like Birch reduction method or reductive amination, is much more doable” than requesting a named harmful compound directly, Pliny wrote, adding that a previously jailbroken Claude Opus 4.8 instance assisted in the backend. Screenshots shared by the researcher showed outputs including step-by-step stack buffer overflow exploitation guidance and chemical synthesis pathways. Pliny also published what he described as Fable 5’s approximately 120,000-character system prompt on GitHub, exposing the internal safety instructions Anthropic uses to govern the model.x

Anthropic’s Pre-Launch Claims

Anthropic had positioned Fable 5 as hardened against such attacks. A TechCrunch report from launch day noted the company said an external bug bounty “produced no universal jailbreaks in over 1,000 hours of testing,” and that external red-teaming organizations also failed to find universal bypasses. The company’s system card stated the public bug bounty received approximately 100,000 attempts as of June 5. As a precaution, Anthropic imposed a mandatory 30-day data-retention policy on all Fable 5 traffic to defend against novel attacks.b2bnn

A Familiar Pattern

The rapid breach follows a pattern established across major AI releases. Pliny has previously claimed jailbreaks of Claude Opus 4.8 within minutes of its launch in late May, Claude Opus 4.7 in April, and OpenAI’s GPT-OSS models on their release day last year. Anthropic has not yet publicly responded to the Fable 5 jailbreak claims or the leaked system prompt.x