Enter your email address below and subscribe to our newsletter

Google rolls out real-time speech translator across 70+ languages

Share your love

  • Gemini 3.5 Live Translate from Google 1.48% began rolling out Tuesday across the Google Translate app, Google Meet, and the Gemini Live API.blog
  • Unlike turn-by-turn systems, the model translates speech continuously across 70+ languages, preserving intonation, pacing, and pitch.blog
  • Google also released DiffusionGemma, an experimental open model that generates text up to 4x faster by producing entire blocks of tokens in parallel.blog

Google Launches Gemini 3.5 Live Translate for Real-Time Speech Translation in 70+ Languages

Google on Tuesday released Gemini 3.5 Live Translate, an AI-powered audio model that translates speech in near real time across more than 70 languages, marking one of the company’s most ambitious deployments of on-device translation technology. The same day, Google DeepMind introduced DiffusionGemma, an experimental open model that generates text up to four times faster than standard autoregressive models by applying diffusion techniques to the Gemma 4 architecture.

Breaking the Language Barrier

Gemini 3.5 Live Translate departs from conventional turn-by-turn translation systems that wait for a speaker to finish before producing output. Instead, the model processes and translates speech continuously, staying just a few seconds behind the speaker while preserving intonation, pacing, and pitch.blog

The model is rolling out across three surfaces: globally on the Google Translate app for Android and iOS, in public preview for developers via the Gemini Live API and Google AI Studio, and in private preview for select Google Workspace business customers on Google Meet starting this month. The Meet integration expands speech translation from five previously supported languages to more than 70, enabling over 2,000 language combinations in a single meeting.9to5google

A new “listening mode” on Android allows users to hear translations through their phone’s earpiece without headphones — simply by holding the device to their ear like a regular call. All generated audio is watermarked with SynthID to help identify AI-generated content.indiatoday

DiffusionGemma: Text Generation Through Diffusion

Separately, Google DeepMind released DiffusionGemma, a 26-billion-parameter mixture-of-experts model that generates text the way image diffusion models create pictures — starting from noise and refining entire blocks of up to 256 tokens in parallel rather than predicting one word at a time.nvidia

Built on the Gemma 4 architecture and activating only 3.8 billion parameters during inference, DiffusionGemma achieves over 1,000 tokens per second on a single Nvidia H100 GPU and roughly 700 tokens per second on a consumer GeForce RTX 5090. The model weights are available on Hugging Face under the Apache 2.0 open-source license.blog

Google CEO Sundar Pichai highlighted DiffusionGemma on social media, calling it “a racehorse achieving up to 4x faster inference” that brings the company’s text diffusion research to the Gemma 4 family.x

Speed Versus Quality Trade-offs

Google cautioned that DiffusionGemma is experimental and trails standard Gemma 4 on output quality benchmarks, recommending it primarily for speed-critical local workflows such as inline editing, rapid iteration, and agentic loops rather than production deployments requiring maximum quality. Nvidia has optimized the model across its hardware lineup, from consumer GPUs to DGX Spark systems, with day-one support in vLLM, Hugging Face Transformers, and Unsloth.thenewstack

Leave a Reply

Your email address will not be published. Required fields are marked *

Stay informed and not overwhelmed, subscribe now!