Inaudible sounds hidden in podcasts can hijack AI voice assistants, researchers find

AudioHijack, developed by researchers from Zhejiang University and two Singapore universities, embeds undetectable commands in podcasts or videos that can hijack AI voice systems, achieving 79-96% success rates against open-source models. [futurism]
Attacks designed against open-source models transferred to commercial voice systems from Microsoft and Mistral AI, and existing defenses reduced success by only about 7%. [welcome]
The findings, presented at the IEEE Symposium on Security and Privacy, highlight growing risks as companies deploy voice AI agents capable of browsing, messaging, and accessing files. [windowsforum]

Hidden Audio Signals Can Hijack AI Voice Assistants, Researchers Demonstrate

A team of security researchers has shown that imperceptible audio signals hidden in podcasts, YouTube videos, and voice calls can manipulate AI voice assistants into performing unauthorized actions — a finding presented this week at the IEEE Symposium on Security and Privacy in San Francisco that exposes a new class of vulnerability in the rapidly expanding world of voice-based AI.

How AudioHijack Works

The technique, called AudioHijack, was developed by researchers from Zhejiang University, the National University of Singapore, and Nanyang Technological University. It hides adversarial audio, undetectable to human ears, inside ordinary-sounding media such as music, recordings, or video clips. That audio steers large audio-language models toward attacker-chosen actions, for example conducting web searches, downloading files, or sending emails containing user data. [windowsforum]
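The article does not say how AudioHijack keeps its signal inaudible. As a rough, hypothetical illustration of how "quiet" is often quantified in adversarial-audio research, the Python sketch below caps a perturbation at an L-infinity budget (`epsilon`) and reports its peak loudness relative to a host signal in decibels. The host tone, sample rate, budget, and function names are all assumptions; the perturbation is random noise rather than an optimized attack; and a low dB figure alone does not prove a signal is inaudible, since real attacks may also rely on psychoacoustic effects this sketch ignores.

```python
import math
import random


def peak_db(samples):
    """Peak amplitude in decibels: 20 * log10(max |sample|)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak)


def relative_loudness_db(host, perturbation):
    """Loudness of the perturbation relative to the host, in dB.
    More negative means quieter compared with the host audio."""
    return peak_db(perturbation) - peak_db(host)


def clip_perturbation(perturbation, epsilon):
    """Project a perturbation into the L-infinity ball of radius epsilon."""
    return [max(-epsilon, min(epsilon, d)) for d in perturbation]


# Hypothetical host audio: one second of a 440 Hz tone at 16 kHz, peak 0.5.
RATE = 16_000
host = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]

# Random (not optimized) perturbation, clipped to a hypothetical budget.
rng = random.Random(0)
raw = [rng.uniform(-0.05, 0.05) for _ in host]
delta = clip_perturbation(raw, epsilon=0.005)
adversarial = [h + d for h, d in zip(host, delta)]

print(f"{relative_loudness_db(host, delta):.1f} dB")  # prints: -40.0 dB
```

Clipping to a budget only limits how loud the added signal can get; in an actual attack, the content of `delta` would be optimized against the target model while staying inside that budget.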

“It takes just half an hour to train this signal, and then, because this signal is context-agnostic, you can use it to attack the target model whenever you want, no matter what the user says,” lead author Meng Chen, a PhD candidate at Zhejiang University, told IEEE Spectrum. [futurism]
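Chen's description of a "context-agnostic" signal resembles the idea of a universal adversarial perturbation: one fixed signal optimized across many inputs so that it still works on inputs it never saw. The sketch below shows that idea on a deliberately tiny stand-in, a hypothetical linear "hijack score" over 64-dimensional feature vectors instead of a real audio-language model. It is not the researchers' method; `WEIGHTS`, `EPSILON`, `BIAS`, and every function name are invented for illustration. It trains one perturbation on 20 random contexts with projected sign-gradient ascent, then measures success on 100 held-out contexts.

```python
import math
import random

DIM = 64        # hypothetical size of a toy audio-feature vector
EPSILON = 0.05  # hypothetical L-infinity budget on the perturbation
BIAS = 1.0      # makes the toy model "refuse" (logit < 0) on clean inputs

rng = random.Random(42)
# Fixed weights of a toy linear model standing in for a real audio-language model.
WEIGHTS = [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def hijack_logit(x):
    """Toy logit: > 0 means the model follows the attacker's instruction."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) - BIAS


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_contexts(n):
    """Random stand-ins for 'whatever the user says'."""
    return [[rng.gauss(0.0, 0.01) for _ in range(DIM)] for _ in range(n)]


def add(x, delta):
    return [xi + di for xi, di in zip(x, delta)]


def train_universal_perturbation(contexts, steps=50, lr=0.01):
    """Sign-gradient ascent on the mean of log(sigmoid(logit)) over all
    contexts, projected back into [-EPSILON, EPSILON] after every step."""
    delta = [0.0] * DIM
    for _ in range(steps):
        grad = [0.0] * DIM
        for x in contexts:
            # d/dz log(sigmoid(z)) = 1 - sigmoid(z); d logit / d delta_i = WEIGHTS[i]
            coef = 1.0 - sigmoid(hijack_logit(add(x, delta)))
            for i in range(DIM):
                grad[i] += coef * WEIGHTS[i] / len(contexts)
        delta = [
            max(-EPSILON, min(EPSILON, d + lr * math.copysign(1.0, g)))
            for d, g in zip(delta, grad)
        ]
    return delta


def success_rate(contexts, delta):
    """Fraction of contexts where the perturbed input flips the toy model."""
    return sum(hijack_logit(add(x, delta)) > 0 for x in contexts) / len(contexts)


train_contexts = make_contexts(20)
held_out = make_contexts(100)  # contexts never seen during training
delta = train_universal_perturbation(train_contexts)
clean = [0.0] * DIM

print("clean:", success_rate(held_out, clean))
print("with universal delta:", success_rate(held_out, delta))
```

Because the toy model is linear, its gradient points in the same direction for every context, so the trained perturbation ends up as `EPSILON` times the sign of each weight and works on essentially all held-out contexts. Real audio-language models are nonlinear, so a universal perturbation has to be averaged over many contexts and will not succeed every time.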

The researchers tested AudioHijack against 13 open audio AI models, including Qwen2-Audio, GLM-4-Voice, Phi-4-Multimodal, and Kimi-Audio, achieving success rates between 79 and 96 percent. They also demonstrated that attacks designed for open-source models transferred effectively to commercial voice systems from Microsoft Azure and Mistral AI, since many commercial products are built atop open-source foundations. [welcome]

Limited Defenses

Perhaps most concerning, existing defenses proved largely ineffective. Prompt hardening and intent-verification techniques reduced attack success by only about 7 percent, according to the findings. The researchers noted that models struggle to distinguish between legitimate user intent and adversarial instructions embedded in audio. [welcome]

The vulnerability differs from earlier inaudible-command research that targeted simple speech recognition. AudioHijack exploits the deeper reasoning layers of large audio-language models, systems now capable of not just transcribing speech but taking actions such as browsing the web, accessing files, and sending messages on behalf of users. [windowsforum]

Implications for Everyday Users

The research arrives as companies race to deploy voice AI agents with increasing autonomy. Microsoft reportedly told IEEE Spectrum that real-world deployments typically include additional safeguards beyond the base model. But security experts warn that moving from a voice assistant that summarizes a recording to one that can act on corporate systems fundamentally changes the risk profile. [windowsforum]

“These single-point defenses struggle to resist our attack because we found it’s very hard for these models to distinguish the normal user intent and our adversary attack,” Chen said. [futurism]

The findings suggest that as voice AI systems gain the ability to execute actions — searching files, drafting emails, modifying calendars — the audio channel itself becomes an attack surface that current security architectures are not designed to defend.