ChatGPT generates graphic violence, sexual images from simple prompts

Mindgard, a British AI security firm, showed the BBC that ChatGPT’s GPT-5.4 model can be tricked into generating violent and sexualized images with a simple prompt. [BBC]
OpenAI said it added safeguards, but researchers told the BBC that minor prompt tweaks still bypass the new protections. [BBC]
Mindgard’s technique exploits ChatGPT’s memory and system prompt layers, requiring no backend access or special credentials, according to the firm. [Mindgard]

ChatGPT’s Image Generator Produces Graphic Violence and Sexualized Content From Simple Prompts, Researchers Find

OpenAI’s ChatGPT can be manipulated into generating sexualized and graphically violent images using only minor modifications to a widely circulated prompt, according to findings by British AI security firm Mindgard reported by the BBC on Wednesday. [Koha]

The discovery centers on OpenAI’s GPT-5.4 model, the latest public version of ChatGPT’s image generation capability. Mindgard researchers found that tweaking a prompt originally designed to produce humorous results caused the system to output disturbing content, without any explicit instructions specifying violent or sexual subject matter. [BBC]

“Very Gruesome, Sometimes Sexual”

Peter Garraghan, Mindgard’s founder, told the BBC that the AI “autonomously generated a variety of shocking and sexualized visuals” even though the prompt did not define the content of the images. He described the outputs as “very gruesome, sometimes sexual, and sometimes both”. [Koha]

Among the images generated were depictions of a man with a head wound, a dead woman with a bloody body, and scenes combining sexual violence with nudity. Mindgard’s earlier disclosure, published in February, noted that the technique could also produce sexualized images of real people, raising concerns about non-consensual deepfakes. [BBC]

OpenAI Responds, but Researchers Say Fixes Are Incomplete

After the BBC approached OpenAI with the findings, the company said it had acted. “After investigating this phenomenon, we have put in place additional safeguards against this type of instruction,” OpenAI stated. The company added that it maintains multiple layers of defenses to prevent users from creating content that violates its policies. [BBC]

However, AI safety researchers told the BBC that with only minor variations, the problematic prompt continued to produce disturbing results even after OpenAI’s intervention. [Koha]

A Pattern of Safety Concerns

Mindgard’s technical blog, published in February, detailed how the bypass worked: researchers manipulated ChatGPT’s custom memory and system prompt context to override its image safety guardrails, requiring no backend access or special credentials. The vulnerability was first discovered on January 1 and disclosed to OpenAI on January 28. [Mindgard]

The findings arrive amid broader scrutiny of AI image generation safety. OpenAI has separately faced questions over its planned “Adult Mode” feature for ChatGPT, which the company delayed earlier this year after internal safety advisors warned it could put minors at risk. The BBC did not publish the specific prompts used in the research. [Mashable]