China’s Ant Group, an Alibaba affiliate, has released detailed technical information about its new model, Ring-1T, which the company claims is “the first open-source reasoning model with a total of one trillion parameters.”
Ring-1T aims to compete with other reasoning models such as OpenAI’s GPT-5 and o-series, as well as Google’s Gemini 2.5. With the release of its latest model, Ant is adding fuel to the geopolitical debate over who will dominate the AI race: China or the USA.
Ant Group says Ring-1T is optimized for math and logic problems, code generation and scientific problem solving.
“With approximately 50 billion activated parameters per token, Ring-1T achieves state-of-the-art performance on many demanding benchmarks – despite relying solely on natural language reasoning capabilities,” Ant said in the paper.
Ring-1T, which was first released in preview in September, adopts the same architecture as Ling 2.0 and is trained on the Ling-1T base model the company released earlier this month. Ant said this allows the model to handle a context of up to 128,000 tokens.
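The trillion-parameter total with only about 50 billion activated per token is characteristic of a sparse mixture-of-experts design, in which a router activates only a few experts for each token. The following is a minimal illustration of top-k expert routing; the expert count, dimensions, and gating function here are hypothetical, not Ring-1T’s actual configuration:

```python
import numpy as np

def top_k_routing(token_vec, gate_matrix, k=2):
    """Route one token to the top-k experts by gate score.

    Only the selected experts run for this token, so the activated
    parameter count is a small fraction of the model's total.
    """
    logits = gate_matrix @ token_vec          # one gate score per expert
    top_k = np.argsort(logits)[-k:]           # indices of the k best experts
    # Softmax over the selected experts only, to weight their outputs.
    gates = np.exp(logits[top_k] - logits[top_k].max())
    gates /= gates.sum()
    return top_k, gates

rng = np.random.default_rng(0)
num_experts, dim = 8, 16                      # toy sizes for illustration
gate_matrix = rng.normal(size=(num_experts, dim))
token = rng.normal(size=dim)

experts, weights = top_k_routing(token, gate_matrix, k=2)
print(experts, weights)                       # 2 of 8 experts carry this token
```

Because only k experts execute per token, compute cost scales with the activated parameters rather than the full parameter count.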
To train a model as large as Ring-1T, the researchers had to develop new methods for scaling reinforcement learning (RL).
New training methods
Ant Group developed three “interconnected innovations” to support Ring-1T’s RL training, a challenge given the model’s size and the heavy computational requirements it entails. The three are IcePop, C3PO++ and ASystem.
IcePop masks out noisy gradient updates to stabilize training without slowing down inference, helping eliminate catastrophic misalignment between training and inference in RL. The researchers noted that when training models, especially those using a mixture-of-experts (MoE) architecture such as Ring-1T, discrepancies often arise between the probabilities computed during training and those computed during inference.
“This problem is particularly pronounced when training MoE models using RL due to the inherent use of a dynamic routing mechanism. Furthermore, in the case of long CoT setups, these discrepancies may gradually accumulate over subsequent iterations and become further amplified,” the researchers say.
IcePop “bypasses unstable training updates through bilateral masking calibration.”
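Based on that description, the masking idea can be sketched as follows: tokens whose training-engine and inference-engine probabilities diverge too far, in either direction, are excluded from the gradient update. The threshold values and function names below are illustrative, not taken from Ant’s implementation:

```python
import numpy as np

def icepop_mask(p_train, p_infer, lower=0.5, upper=2.0):
    """Bilateral masking sketch: keep only tokens whose
    training/inference probability ratio stays inside [lower, upper].

    Tokens with a large discrepancy between the two engines are
    dropped from the gradient update, so accumulated mismatch in
    long chain-of-thought rollouts cannot destabilize training.
    """
    ratio = p_train / p_infer
    # "Bilateral": both too-high and too-low ratios are masked out.
    return (ratio >= lower) & (ratio <= upper)

# Hypothetical per-token probabilities from the two engines.
p_train = np.array([0.30, 0.05, 0.22, 0.90])
p_infer = np.array([0.28, 0.40, 0.20, 0.10])

mask = icepop_mask(p_train, p_infer)
print(mask)   # the two heavily mismatched tokens are masked out
```

The inference path is untouched; only the training-side update is filtered, which is why the approach stabilizes training “without slowing down inference.”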
The second new method is C3PO++, an improved version of the C3PO system Ant developed previously. It manages how Ring-1T and other high-performance models generate and process training examples, known as rollouts, so that GPUs don’t sit idle.
It works by dividing rollouts into parts that can be processed in parallel. One group is the inference pool, which generates new data, and the other is the training pool, which collects results to update the model. C3PO++ sets a token budget to control the amount of data processed, ensuring efficient use of GPUs.
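One way to picture the token budget is as a scheduler that lets the inference pool generate tokens until the budget is spent, hands completed rollouts to the training pool, and carries unfinished rollouts over to the next iteration instead of waiting on them. This is a toy sketch under those assumptions; the names and carry-over policy are illustrative, not Ant’s actual design:

```python
from collections import deque

def fill_token_budget(pending, budget):
    """Token-budget scheduling sketch.

    'pending' holds (rollout_id, tokens_remaining) pairs. Each step,
    rollouts consume tokens until the budget runs out; finished
    rollouts go to the training pool, unfinished ones are carried
    over so GPUs never idle waiting for the longest rollout.
    """
    to_train, carried = [], deque()
    while pending and budget > 0:
        rid, remaining = pending.popleft()
        step = min(remaining, budget)
        budget -= step
        remaining -= step
        if remaining == 0:
            to_train.append(rid)              # complete: hand to training pool
        else:
            carried.append((rid, remaining))  # partial: resume next iteration
    carried.extend(pending)                   # untouched rollouts carry over too
    return to_train, carried

queue = deque([("r1", 300), ("r2", 700), ("r3", 500)])
done, queue = fill_token_budget(queue, budget=1000)
print(done, list(queue))   # r1 and r2 complete; r3 carries over in full
```

Decoupling rollout generation from the training step this way keeps both pools busy even when individual rollouts vary wildly in length.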
The third method, ASystem, uses a SingleController+SPMD (Single Program, Multiple Data) architecture to enable asynchronous operations.
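In a SingleController+SPMD setup, one controller dispatches the same program to many workers, each operating on its own shard of data, and gathers results asynchronously rather than in lockstep. A toy illustration of the pattern (this is not ASystem’s actual API):

```python
from concurrent.futures import ThreadPoolExecutor

def worker_program(shard):
    """The single program every worker runs (SPMD): same code,
    different data shard."""
    return sum(shard)

def single_controller(shards):
    """One controller asynchronously submits the program to all
    workers and collects results as they complete."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(worker_program, s) for s in shards]
        return [f.result() for f in futures]

shards = [[1, 2], [3, 4], [5, 6, 7]]
print(single_controller(shards))   # [3, 7, 18]
```

The controller never blocks on any single worker while submitting, which is the property that makes asynchronous RL infrastructure possible at this scale.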
Benchmark results
Ant evaluated Ring-1T on benchmarks measuring performance in math, coding, logical reasoning, and general tasks, testing it against models such as DeepSeek-V3.1-Terminus-Thinking, Qwen3-235B-A22B-Thinking-2507, Gemini 2.5 Pro and GPT-5 Thinking.
Ring-1T performed well across the benchmarks, ranking second to OpenAI’s GPT-5 on most tests. Ant said Ring-1T showed the best performance of all the open-weight models tested.
The model achieved a score of 93.4% on the AIME 25 benchmark, second only to GPT-5. In coding, Ring-1T outperformed both DeepSeek and Qwen.
“This indicates that our carefully synthesized dataset shapes the solid performance of Ring-1T in software applications, which provides a solid foundation for future agent application efforts,” the company said.
Ring-1T shows how heavily Chinese companies are investing in models
Ring-1T is the latest Chinese model expected to challenge GPT-5 and Gemini.
Since DeepSeek’s surprise launch in January, Chinese companies have been releasing impressive models at a rapid pace. Ant’s parent company, Alibaba, recently released Qwen3-Omni, a multimodal model that natively unifies text, image, audio and video. DeepSeek also continues to refine its models and earlier this month launched DeepSeek-OCR, a new model that reimagines the way models process information.
As Ant develops new methods for training and scaling very large models like Ring-1T, the battle for AI supremacy between the U.S. and China continues to intensify.
