
Google DeepMind released Gemma 4 12B on Tuesday, an open-source multimodal model that processes text, images, and audio without relying on dedicated encoders — a first for a mid-sized open-weight model. The 12-billion-parameter model fits within 16GB of VRAM or unified memory, putting multimodal AI inference on standard consumer hardware.
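A rough weights-only estimate shows how a 12-billion-parameter model relates to a 16GB budget. The article does not say which numeric precision the 16GB figure assumes, so the formats below are illustrative assumptions, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
PARAMS = 12e9      # 12 billion parameters, as stated in the article
BUDGET_GB = 16     # memory budget quoted in the article

# Bytes per parameter for some common weight formats (assumed options,
# not formats the article attributes to Google).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params, bytes_per_param):
    """Weights-only footprint in decimal gigabytes."""
    return params * bytes_per_param / 1e9


footprint = {name: weight_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
fits = {name: gb < BUDGET_GB for name, gb in footprint.items()}

for name, gb in footprint.items():
    print(f"{name}: {gb:.0f} GB of weights, under {BUDGET_GB} GB: {fits[name]}")
```

Under these assumptions, 16-bit weights alone (24 GB) would exceed the budget, while 8-bit (12 GB) and 4-bit (6 GB) weights would leave headroom for the cache and activations.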
The release fills a gap in the Gemma 4 family, which launched in April with four variants ranging from edge-optimized E2B and E4B models to the larger 26B Mixture of Experts and 31B Dense configurations. While those earlier models relied on vision transformer layers and conformer-based audio encoders, the new 12B variant eliminates both in favor of what Google calls a “Unified” architecture.
In conventional multimodal models, separate encoder modules process images and audio before passing representations to the language model backbone. Gemma 4 12B replaces the entire vision encoder (typically 15 to 27 transformer layers) with a lightweight 35-million-parameter embedding module that projects raw pixel patches directly into the LLM’s token space using a single matrix multiplication with factorized 2D positional embeddings. Audio follows a similar path: raw 16 kHz waveforms in 40-millisecond frames are projected directly into the same dimensional space as text tokens, bypassing any separate speech recognition encoder.
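The encoder-free input path described above can be sketched in a few lines of NumPy. This is a toy illustration, not Google's implementation: the patch size, embedding width, position-table size, and random weights are all assumptions, and summing one row table and one column table is just one common way to factorize 2D positions. Only the 16 kHz sample rate and 40 ms frame length come from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; none of these come from the Gemma 4 release.
PATCH = 16       # patch side length in pixels (assumed)
D_MODEL = 64     # token embedding width (assumed, far smaller than a real LLM)
SAMPLE_RATE = 16_000                        # from the article
FRAME_MS = 40                               # from the article
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 640 samples per frame


def embed_image(image, w_patch, row_pos, col_pos):
    """Project non-overlapping pixel patches into token space with one matmul."""
    h, w, c = image.shape
    gh, gw = h // PATCH, w // PATCH
    # Cut the image into a (gh, gw) grid of flattened patches.
    patches = image[: gh * PATCH, : gw * PATCH].reshape(gh, PATCH, gw, PATCH, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh, gw, PATCH * PATCH * c)
    tokens = patches @ w_patch  # the single projection
    # Factorized 2D position: one table per axis, combined by broadcasting.
    tokens = tokens + row_pos[:gh, None, :] + col_pos[None, :gw, :]
    return tokens.reshape(gh * gw, -1)


def embed_audio(waveform, w_audio):
    """Split a waveform into fixed-length frames and project each frame."""
    n_frames = len(waveform) // FRAME_LEN
    frames = waveform[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
    return frames @ w_audio


# Random stand-ins for learned parameters.
w_patch = rng.normal(size=(PATCH * PATCH * 3, D_MODEL)) * 0.02
row_pos = rng.normal(size=(32, D_MODEL)) * 0.02
col_pos = rng.normal(size=(32, D_MODEL)) * 0.02
w_audio = rng.normal(size=(FRAME_LEN, D_MODEL)) * 0.02

image_tokens = embed_image(rng.random((64, 96, 3)), w_patch, row_pos, col_pos)
audio_tokens = embed_audio(rng.normal(size=SAMPLE_RATE), w_audio)  # one second

print(image_tokens.shape, audio_tokens.shape)
```

In this sketch a 64×96 image becomes 24 tokens and one second of audio becomes 25 tokens, all with the same width as a text embedding, so they could be concatenated into a single input sequence.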
The practical result is reduced latency, since the LLM can begin processing inputs without waiting for encoder pipelines to finish. It also simplifies fine-tuning: a single LoRA pass can update vision, audio, and text weights simultaneously.
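To make the fine-tuning point concrete, here is a minimal sketch of the standard LoRA formulation applied to one shared projection. It is not Google's or any library's training recipe; the layer size, rank, and scaling factor are assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, RANK, ALPHA = 64, 4, 8  # illustrative sizes, not Gemma 4 values

# Frozen pretrained weight of one shared transformer projection.
w_frozen = rng.normal(size=(D_MODEL, D_MODEL)) * 0.02
# Trainable low-rank factors; B starts at zero so the adapter is a no-op at init.
lora_a = rng.normal(size=(D_MODEL, RANK)) * 0.01
lora_b = np.zeros((RANK, D_MODEL))


def lora_forward(tokens):
    """Apply W + (alpha / r) * A @ B to a sequence of token embeddings."""
    return tokens @ w_frozen + (ALPHA / RANK) * (tokens @ lora_a @ lora_b)


# Text, image, and audio tokens share one sequence, so the same adapter
# (and the same gradient step) is exercised by all three modalities.
text = rng.normal(size=(5, D_MODEL))
image = rng.normal(size=(24, D_MODEL))
audio = rng.normal(size=(25, D_MODEL))
sequence = np.concatenate([text, image, audio], axis=0)

out = lora_forward(sequence)
trainable = lora_a.size + lora_b.size  # parameters the adapter would train
frozen = w_frozen.size                 # parameters left untouched

print(out.shape, trainable, frozen)
```

With separate encoders, adapting vision or audio would typically mean attaching adapters to those encoders as well; when every modality passes through the same backbone layers, one set of low-rank factors covers them all.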
Google says the 12B model approaches the performance of the larger 26B MoE variant on standard benchmarks at less than half the memory footprint. Reported scores include 77.2% on MMLU Pro and 78.8% on GPQA Diamond.
The model is released under an Apache 2.0 license, a commercially permissive open-source license that Google first adopted for the Gemma family with the April Gemma 4 release. Day-one support spans llama.cpp, vLLM, MLX, Ollama, LM Studio, and Unsloth.
The release coincides with continued expansion of Google’s local-first tooling for macOS. An open-source Electron application called Gemma Chat, which runs Gemma 4 models locally on Apple Silicon Macs through Apple’s MLX framework, supports the new 12B variant alongside earlier models. The app offers both a coding agent mode and a conversational mode with voice input powered by local speech-to-text, keeping all prompts and generated content on-device.