Just hours after OpenAI updated its flagship GPT-5 model to GPT-5.1, promising lower overall token usage and a more pleasant personality with more preset options, Chinese search giant Baidu unveiled its next-generation foundation model, ERNIE 5.0, along with a suite of AI product upgrades and a strategic international expansion.
The goal: to position Baidu as a global contender in the increasingly competitive enterprise AI market.
Announced at the company’s Baidu World 2025 event, ERNIE 5.0 is a proprietary, natively multimodal model designed to jointly process and generate text, images, audio and video.
Unlike Baidu’s recently released ERNIE-4.5-VL-28B-A3B-Thinking, which is open source under the enterprise-friendly, permissive Apache 2.0 license, ERNIE 5.0 is proprietary and available only through the ERNIE Bot website (I had to select it manually from the model drop-down menu) and through the application programming interface (API) of Baidu’s Qianfan cloud platform for enterprise customers.
Along with the model’s launch, Baidu rolled out major updates to its digital human platform, no-code tools, and general-purpose AI agents – all aimed at expanding AI’s reach beyond China.
The company also introduced ERNIE 5.0 Preview 1022, a variant optimized for text-intensive tasks, along with a general preview model that balances different modalities.
Baidu emphasized that ERNIE 5.0 represents a shift in how intelligence is deployed at scale, with CEO Robin Li stating, “When you internalize AI, it becomes a native capability and transforms intelligence from a cost to a source of productivity.”
Where ERNIE 5.0 outshines GPT-5 and Gemini 2.5 Pro
ERNIE 5.0’s benchmark results suggest Baidu has achieved parity, or near parity, with leading Western foundation models across a broad range of tasks.
In public benchmark slides shown during the Baidu World 2025 event, ERNIE 5.0 Preview outperformed or matched OpenAI’s GPT-5-High and Google’s Gemini 2.5 Pro in multimodal reasoning, document understanding and image-based question answering, while also demonstrating strong language modeling and code execution abilities.
The company emphasized the model’s ability to handle joint inputs and outputs across modalities, rather than relying on post hoc modality fusion, which it described as a technological differentiator.
In visual tasks, ERNIE 5.0 achieved leading results on OCRBench, DocVQA and ChartQA, three benchmarks that test recognition, understanding and reasoning over structured documents and charts.
Baidu says the model outperformed both GPT-5-High and Gemini 2.5 Pro on document- and chart-based benchmarks, areas it describes as critical for enterprise applications such as automated document processing and financial analysis.
According to Baidu’s internal GenEval-based evaluation, ERNIE 5.0 matched or outperformed Google’s Veo3 in categories including semantic matching and image quality. Baidu said the model’s multimodal integration allows it to generate and interpret visual content with greater context awareness than models that rely on modality-specific encoders.
For audio and speech tasks, ERNIE 5.0 showed competitive performance on the MM-AU and TUT2017 audio understanding benchmarks, as well as on spoken-language question answering. Although Baidu emphasized its audio performance less than its image and text results, it suggests broad capabilities intended to support full-spectrum multimodal applications.
In language tasks, the model posted strong results in instruction following, factual question answering and mathematical reasoning, key areas that define the usefulness of large language models for enterprises.
The ERNIE 5.0 Preview 1022 variant, tuned for text performance, showed even stronger language-specific results in early developer access. While Baidu does not claim a significant lead in overall language reasoning, its internal evaluations suggest that ERNIE 5.0 Preview 1022 closes the gap with top English-language models and outperforms them on Chinese-language tasks.
Although Baidu has not publicly released full benchmark details or raw results, its positioning suggests a deliberate attempt to present ERNIE 5.0 not as a niche multimodal system but as a flagship model competing with the largest closed models in general-purpose reasoning.
Baidu says its clearest advantages lie in understanding structured documents, reasoning over charts, and integrating multiple modalities in a single, native modeling architecture. Independent verification of these results is still pending, but the range of claimed capabilities positions ERNIE 5.0 as a serious alternative in the multimodal foundation model landscape.
Pricing strategy for enterprises
ERNIE 5.0 sits at the premium end of Baidu’s model pricing. The company published detailed API pricing for its Qianfan platform, putting costs in line with other high-end offerings from Chinese competitors such as Alibaba.
| Model | Input cost (per 1K tokens) | Output cost (per 1K tokens) |
|---|---|---|
| ERNIE 5.0 | USD 0.00085 (CNY 0.006) | USD 0.0034 (CNY 0.024) |
| ERNIE 4.5 Turbo (e.g.) | USD 0.00011 (CNY 0.0008) | USD 0.00045 (CNY 0.0032) |
| Qwen3 (e.g., Coder) | USD 0.00085 (CNY 0.006) | USD 0.0034 (CNY 0.024) |
The cost gap between ERNIE 5.0 and earlier models such as ERNIE 4.5 Turbo highlights Baidu’s strategy of separating high-volume, low-cost models from high-performance models designed for complex tasks and multimodal reasoning.
Compared with US alternatives, it remains in the mid-price range:
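As a quick sanity check on the Qianfan figures, the sketch below rescales the USD per-1K-token prices to per-1M-token prices (the unit used in the US comparison) and computes how much more ERNIE 5.0 costs than ERNIE 4.5 Turbo. It uses only numbers from the pricing table; the implied yuan-to-dollar rate is derived from the table’s own ERNIE 5.0 row, not from a live exchange rate.

```python
# USD prices per 1,000 tokens, as listed in the Qianfan pricing table: (input, output).
PRICES_PER_1K_USD = {
    "ERNIE 5.0": (0.00085, 0.0034),
    "ERNIE 4.5 Turbo": (0.00011, 0.00045),
}

def per_million(price_per_1k: float) -> float:
    """Rescale a per-1K-token price to a per-1M-token price."""
    return price_per_1k * 1000

premium_in, premium_out = PRICES_PER_1K_USD["ERNIE 5.0"]
budget_in, budget_out = PRICES_PER_1K_USD["ERNIE 4.5 Turbo"]

print(f"ERNIE 5.0 per 1M tokens: ${per_million(premium_in):.2f} in / ${per_million(premium_out):.2f} out")
print(f"Input price ratio (5.0 vs 4.5 Turbo):  {premium_in / budget_in:.1f}x")
print(f"Output price ratio (5.0 vs 4.5 Turbo): {premium_out / budget_out:.1f}x")

# Implied exchange rate from the table's ERNIE 5.0 input row (yuan per USD).
print(f"Implied rate: {0.006 / premium_in:.2f} CNY per USD")
```

The per-1M results ($0.85 input, $3.40 output) match the ERNIE 5.0 row in the US comparison, and the ratios show ERNIE 5.0 costing roughly 7.6 to 7.7 times as much as ERNIE 4.5 Turbo per token.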
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5.1 | $1.25 | $10.00 |
| ERNIE 5.0 | $0.85 | $3.40 |
| ERNIE 4.5 Turbo (e.g.) | $0.11 | $0.45 |
| Claude Opus 4.1 | $15.00 | $75.00 |
| Gemini 2.5 Pro | $1.25 (≤200K) / $2.50 (>200K) | $10.00 (≤200K) / $15.00 (>200K) |
| Grok 4 (grok-4-0709) | $3.00 | $15.00 |
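Per-token list prices only become meaningful against a concrete workload. The sketch below prices a hypothetical monthly workload of 10 million input tokens and 2 million output tokens at the list prices in the comparison table. The workload size is an assumption chosen for illustration, Gemini 2.5 Pro is assumed to stay in its ≤200K-token tier, and batch discounts, prompt caching and regional pricing are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pricing:
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens

# List prices from the comparison table. Assumption: every Gemini 2.5 Pro
# prompt stays under 200K tokens, so the lower tier applies.
PRICES = {
    "GPT-5.1": Pricing(1.25, 10.00),
    "ERNIE 5.0": Pricing(0.85, 3.40),
    "ERNIE 4.5 Turbo": Pricing(0.11, 0.45),
    "Claude Opus 4.1": Pricing(15.00, 75.00),
    "Gemini 2.5 Pro (<=200K)": Pricing(1.25, 10.00),
    "Grok 4": Pricing(3.00, 15.00),
}

def workload_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a workload billed at per-1M-token list prices."""
    return (input_tokens / 1_000_000) * p.input_per_1m + (output_tokens / 1_000_000) * p.output_per_1m

# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
INPUT_TOKENS, OUTPUT_TOKENS = 10_000_000, 2_000_000

# Print models from cheapest to most expensive for this workload.
for name, p in sorted(PRICES.items(), key=lambda kv: workload_cost(kv[1], INPUT_TOKENS, OUTPUT_TOKENS)):
    print(f"{name:<24} ${workload_cost(p, INPUT_TOKENS, OUTPUT_TOKENS):>8.2f}")
```

Under these assumptions ERNIE 5.0 comes to about $15.30 for the month, against $32.50 for GPT-5.1 and $300.00 for Claude Opus 4.1. Output-heavy workloads widen the gap, because ERNIE 5.0’s output price is the component most discounted relative to the US models.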
Global expansion: products and platforms
In parallel with the release of the model, Baidu is expanding internationally:
- GenFlow 3.0, now used by more than 20 million users, is the company’s largest general-purpose AI agent, offering enhanced memory and support for multimodal tasks.
- Famou, a self-evolving agent capable of dynamically solving complex problems, is now commercially available by invitation.
- MeDo, the international version of Baidu’s no-code development tool Miaoda, is available worldwide via medo.dev.
- Oreate, a productivity workspace supporting documents, slides, images, videos and podcasts, has reached more than 1.2 million users worldwide.
Baidu’s digital human platform, already deployed in Brazil, is also part of the global push. According to the company, 83% of livestreamers at this year’s “Double 11” shopping festival in China used Baidu’s digital human technology, contributing to a 91% increase in gross merchandise value (GMV).
Meanwhile, Baidu’s Apollo Go autonomous ride-hailing service has surpassed 17 million rides, operating autonomous fleets in 22 cities and making it the world’s largest robotaxi network.
An open-source vision language model is attracting industry attention
Two days before the ERNIE 5.0 flagship event, Baidu also released an open source multimodal model under the Apache 2.0 license: ERNIE-4.5-VL-28B-A3B-Thinking.
As my colleague Michael Nuñez reported for VentureBeat, the model uses a Mixture-of-Experts (MoE) architecture for efficient inference, activating just 3 billion of its 28 billion total parameters.
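The “A3B” in the model name appears to denote those roughly 3 billion active parameters. To illustrate how an MoE model can hold 28 billion parameters while running only a fraction of them per token, here is a toy top-k gating layer in pure Python. It is a generic sketch of MoE routing, not Baidu’s implementation: the expert count, dimensions, k=2 and the scaled-identity “experts” are all hypothetical stand-ins for real feed-forward networks.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_weights, experts, k=2):
    """Route input vector x to its top-k experts and mix their outputs.

    router_weights: one weight vector per expert; the router logit is a dot product.
    experts: callables mapping a vector to a vector (toy stand-ins for FFNs).
    Only the k selected experts execute, so per-token compute scales with
    the active parameters rather than the total parameter count.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])  # renormalize over the chosen experts only
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * y_j for o, y_j in zip(out, y)]
    return out, top

# Hypothetical toy setup: 8 experts, 4-dimensional inputs, fixed seed.
rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 8
router = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [(lambda v, s=s: [s * vi for vi in v]) for s in range(1, NUM_EXPERTS + 1)]

y, chosen = moe_forward([1.0, 0.5, -0.2, 0.3], router, experts, k=2)
print("experts used:", chosen)
print(f"active share in ERNIE-4.5-VL-28B-A3B: {3 / 28:.1%}")
```

Only 2 of the 8 toy experts run for this input; the same principle lets the real model touch roughly 10.7% of its weights per token. Production MoE layers add details this sketch omits, such as load-balancing losses and, in some designs, modality-specific experts.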
Key technical innovations include:
- “Thinking with images,” which enables dynamic, zoom-based visual analysis
- Support for chart interpretation, document understanding, visual grounding and temporal awareness in video
- The ability to run on a single 80GB GPU, making it accessible to mid-sized organizations
- Full compatibility with Hugging Face Transformers, vLLM and Baidu’s FastDeploy toolkit
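A back-of-the-envelope calculation shows why the single-80GB-GPU claim is plausible, and why MoE saves compute rather than memory: all 28 billion parameters must sit in GPU memory even though only about 3 billion are used per token. The sketch assumes the 28 billion figure covers all weights, counts weight memory only, and ignores KV cache, activations and framework overhead, all of which eat into the remaining headroom. The int8/int4 rows are illustrative; the source does not say whether quantized checkpoints are available.

```python
# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed for model weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

TOTAL_PARAMS = 28e9   # every expert must be resident, so memory scales with this
ACTIVE_PARAMS = 3e9   # parameters used per token, so compute scales with this
GPU_MEMORY_GB = 80

for dtype in ("bf16", "int8", "int4"):
    w = weight_memory_gb(TOTAL_PARAMS, dtype)
    print(f"{dtype}: weights ≈ {w:.0f} GB, headroom on an 80 GB GPU ≈ {GPU_MEMORY_GB - w:.0f} GB")

# Rule of thumb: a forward pass costs about 2 FLOPs per active parameter per token.
print(f"approx. forward FLOPs per token: {2 * ACTIVE_PARAMS:.1e}")
```

In bf16 the weights alone take about 56 GB, leaving roughly 24 GB for KV cache and activations, which is enough for a single-GPU deployment at moderate context lengths. Per-token compute, by contrast, is closer to that of a dense 3B model.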
This release raises the pressure on closed-source competitors. Under the Apache 2.0 license, ERNIE-4.5-VL-28B-A3B-Thinking becomes a viable foundation model for commercial applications without licensing restrictions, something few high-performance models in this class offer.
Community feedback and Baidu response
Following the launch of ERNIE 5.0, AI developer and evaluator Lisan al Gaib (@scaling01) posted a mixed review on X. Although initially impressed by the model’s benchmark performance, they reported a persistent issue: during SVG generation tasks, ERNIE 5.0 repeatedly called tools, even when explicitly instructed not to.
“The ERNIE 5.0 benchmarks looked insane until I tested them…unfortunately RL’s brain was damaged or there was a major problem with the chat platform/system notification,” Lisan wrote.
Within hours, Baidu’s developer support account, @ErnieforDevs, replied:
“Thank you for your feedback! This is a known bug – certain syntax can consistently cause this bug. We are working on a fix. You can try rephrasing or changing the prompt to avoid it for now.”
The quick turnaround reflects Baidu’s growing emphasis on communicating with developers, especially as it attracts international users through both proprietary and open-source offerings.
Prospects for Baidu and its core LLM ERNIE family
Baidu’s ERNIE 5.0 marks a strategic escalation in the global foundation model race. With performance claims that put it on par with the most advanced OpenAI and Google systems, along with a combination of attractive pricing and open-access alternatives, Baidu is signaling its ambition to become not just a domestic AI leader but a credible global infrastructure provider.
At a time when enterprise AI users increasingly demand multimodal performance, flexible licensing and deployment efficiency, Baidu’s two-pronged approach – premium hosted APIs and open source releases – could broaden its appeal among both enterprise and developer communities.
Time will tell whether the company’s performance claims hold up in third-party testing. However, in an environment shaped by rising costs, model complexity and computational bottlenecks, ERNIE 5.0 and its supporting ecosystem position Baidu to be competitive in the next wave of AI adoption.
