At the edge, our E2B and E4B models redefine device usability by prioritizing multimodal capabilities, low-latency processing, and seamless ecosystem integration over raw parameter counts.
Powerful, accessible, open
To support the next generation of pioneering research and products, we have purpose-built Gemma 4 models for performance and tuned them for specific hardware – from the billions of Android devices around the world, to laptop GPUs, to developer workstations and accelerators.
Using these highly optimized models, you can fine-tune Gemma 4 to achieve cutting-edge performance on specific tasks. We have already seen incredible success with this approach: for example, the INSAIT project pioneered the first Bulgarian language model (BgGPT), and our collaboration with Yale University on Cell2Sentence-Scale has uncovered, among other results, promising new pathways for cancer therapy.
Here’s what makes Gemma 4 our most powerful open model family ever:
- Advanced reasoning: Capable of multi-step planning and deep logic, Gemma 4 shows significant improvements on math benchmarks and in instruction following.
- Agent workflows: Native support for function calling, structured JSON output, and system instructions enables building autonomous agents that interact with a variety of tools and APIs and execute workflows reliably.
- Code generation: Gemma 4 generates high-quality code offline, turning your workstation into a local AI coding assistant.
- Vision and audio: All models natively process video and images, support variable resolutions, and excel at visual tasks such as OCR and chart understanding. Additionally, the E2B and E4B models natively accept audio input, enabling speech recognition and understanding.
- Longer context: Process long-form content seamlessly. Edge models have a 128K-token context window, while larger models offer up to 256K tokens, allowing you to upload repositories or long documents in a single prompt.
- Over 140 languages: Natively trained on over 140 languages, Gemma 4 helps developers build high-performance, inclusive applications for audiences around the world.
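The agent-workflow bullet above describes models emitting structured JSON function calls. As a minimal sketch of how an application might consume such output, the following dispatches a model's JSON call to a local tool. The tool name `get_weather` and the exact JSON schema are illustrative assumptions, not part of any official Gemma API:

```python
import json

# Hypothetical local tool the agent may call; the name and signature
# are illustrative assumptions, not an official API.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured JSON function call emitted by the model
    and invoke the matching local tool with its arguments."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Example: the model has emitted a function call as plain JSON text.
raw = '{"name": "get_weather", "arguments": {"city": "Sofia"}}'
print(dispatch(raw))  # Sunny in Sofia
```

Because the model's output is constrained to valid JSON, the application side stays a simple parse-and-dispatch loop rather than brittle free-text parsing.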
Versatile models for a variety of hardware
We release Gemma 4 model weights in sizes tailored to your specific hardware and use cases, ensuring you get top-tier reasoning wherever you need it:
26B and 31B models: frontier intelligence, offline on personal computers
Optimized to provide researchers and developers with cutting-edge reasoning on accessible hardware, our unquantized bfloat16 weights fit efficiently on a single 80GB NVIDIA H100 GPU. For on-premises configurations, quantized versions run natively on consumer GPUs to power IDEs, coding assistants, and agentic workflows. Our 26B Mixture of Experts (MoE) model is built for latency, activating only 3.8 billion of its parameters during inference to deliver exceptionally fast tokens-per-second throughput, while our 31B dense model maximizes raw quality and provides a powerful foundation for fine-tuning.
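The memory claim above can be checked with back-of-envelope arithmetic: in bfloat16, each parameter takes 2 bytes, so all 26B MoE weights must be resident, while only the 3.8B active parameters are read per token. A small sketch (figures exclude KV cache and activations, which add further memory):

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 2 bytes

def weight_memory_gb(params_billions: float) -> float:
    """Memory needed to hold the given parameter count in bfloat16, in GB."""
    return params_billions * 1e9 * BYTES_PER_PARAM_BF16 / 1e9

resident = weight_memory_gb(26)   # all MoE experts must stay in memory
active = weight_memory_gb(3.8)    # weights actually read per token
print(f"resident: {resident:.1f} GB, active per token: {active:.1f} GB")
# resident: 52.0 GB, active per token: 7.6 GB
```

The 52 GB of resident weights fit comfortably under an H100's 80 GB, and reading only 7.6 GB of weights per token is what drives the MoE model's latency advantage over the 31B dense model, which must read all of its weights on every token.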
