Friday, March 13, 2026

Holy smokes! A new, 200% faster DeepSeek R1-0528 variant appears from German lab TNG Technology Consulting GmbH





It's been a little more than a month since Chinese AI startup DeepSeek, an offshoot of Hong Kong-based High-Flyer Capital Management, released the latest version of its hit open-source model, DeepSeek-R1-0528.

Like its predecessor, DeepSeek-R1, it shook the AI and global business communities with how cheaply it was trained and how well it performed on reasoning tasks, all available to developers and enterprises for free. R1-0528 is already being adapted and remixed by other AI labs and developers, thanks in large part to its permissive MIT License.

This week, the 24-year-old German firm TNG Technology Consulting GmbH released one such adaptation: DeepSeek-TNG R1T2 Chimera, the latest model in its Chimera large language model (LLM) family. R1T2 delivers a significant boost in efficiency and speed, scoring at upwards of 90% of R1-0528's benchmark results while generating answers with less than 40% of R1-0528's output token count.

That means it produces shorter responses, which translates directly into faster inference and lower compute costs. On the model card TNG released for the new R1T2 on AI code-sharing community Hugging Face, the company states it is "about 20% faster than the regular R1" (the version released in January) "and more than twice as fast as R1-0528" (DeepSeek's official May update).

The response from the AI developer community has already been extremely positive. "DAMN! DeepSeek R1T2: 200% faster than R1-0528 and 20% faster than R1," wrote Vaibhav (VB) Srivastav, a senior leader at Hugging Face, on X. "Significantly better than R1 on GPQA and AIME 24, made via assembly of experts with DS V3, R1 and R1-0528, and it's MIT-licensed, available on Hugging Face."

This gain is made possible by TNG's Assembly-of-Experts (AoE) method, a technique for building LLMs by selectively merging the weight tensors (internal parameters) of multiple pre-trained models, which TNG described in a paper published in May on arXiv, the non-peer-reviewed open-access online journal.

The successor to the original R1T Chimera, R1T2 introduces a new "Tri-Mind" configuration that integrates three parent models: DeepSeek-R1-0528, DeepSeek-R1 and DeepSeek-V3-0324. The result is a model designed to maintain high reasoning capability while significantly reducing inference cost.

R1T2 is constructed without further fine-tuning or retraining. It inherits the reasoning strength of R1-0528, the structured thought patterns of R1 and the concise, instruction-oriented behavior of V3-0324, delivering a more efficient yet capable model for enterprise and research use.

How Assembly-of-Experts (AoE) differs from Mixture-of-Experts (MoE)

Mixture-of-Experts (MoE) is an architectural design in which different components, or "experts," are conditionally activated depending on the input. In MoE LLMs such as DeepSeek-V3 or Mixtral, only a subset of the model's expert layers (e.g., 8 out of 256) is active during any given token's forward pass. This allows very large models to achieve higher parameter counts and specialization while keeping inference costs manageable, because only a fraction of the network is evaluated per token.
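For illustration, here is a minimal, self-contained sketch of that conditional activation in PyTorch: a router scores every expert per token, and only the top-k experts actually run for each token. This is a toy example with made-up dimensions, not DeepSeek's or Mixtral's actual routing code.

```python
# Toy Mixture-of-Experts layer: a router picks the top-k experts per token,
# so only a fraction of the expert parameters run on any given forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)       # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # choose top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                      # run only chosen experts
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out

tokens = torch.randn(4, 512)                             # 4 token embeddings
print(MoELayer()(tokens).shape)                          # torch.Size([4, 512])
```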

Assembly-of-Experts (AoE), by contrast, is a model merging technique, not an architecture. It is used to create a new model from multiple pre-trained models by selectively interpolating their weight tensors.

The "experts" in AoE refer to the model components being merged, typically the routed expert tensors within MoE layers, not experts dynamically activated at runtime.

TNG's implementation of AoE focuses primarily on merging the routed expert tensors, the part of a model most responsible for specialized reasoning, while often retaining the more efficient shared and attention layers from faster models such as V3-0324. This approach allows the resulting Chimera models to inherit reasoning strength without replicating the verbosity or latency of the strongest parent models.
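As a rough illustration of the merging idea, the following snippet interpolates tensors whose names mark them as routed experts while copying everything else from the faster parent. This is a simplified sketch, not TNG's published implementation: the "mlp.experts" name pattern and the interpolation weight are assumptions for demonstration, and real DeepSeek tensor names may differ.

```python
# Illustrative Assembly-of-Experts-style merge (not TNG's actual code).
# Assumes both parent checkpoints share one architecture, so tensors align 1:1.
import torch

def assemble(parent_fast: dict, parent_reasoning: dict, lam: float = 0.7) -> dict:
    """Blend routed-expert tensors toward the reasoning parent; keep all other
    tensors (attention, shared/dense layers) from the fast parent."""
    merged = {}
    for name, fast_w in parent_fast.items():
        reason_w = parent_reasoning[name]
        if "mlp.experts" in name:                  # routed expert tensor: interpolate
            merged[name] = (1 - lam) * fast_w + lam * reason_w
        else:                                      # shared layer: keep fast parent
            merged[name] = fast_w.clone()
    return merged

# Toy state dicts standing in for V3-0324 ("fast") and R1-0528 ("reasoning"):
shapes = {"layers.0.attn.q_proj.weight": (8, 8),
          "layers.0.mlp.experts.3.w1.weight": (32, 8)}
fast = {n: torch.randn(s) for n, s in shapes.items()}
reasoning = {n: torch.randn(s) for n, s in shapes.items()}
chimera = assemble(fast, reasoning, lam=0.7)
print({n: tuple(w.shape) for n, w in chimera.items()})
```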

Performance and speed: what the benchmarks actually show

According to benchmark comparisons presented by TNG, R1T2 achieves between 90% and 92% of the reasoning performance of its most intelligent parent, DeepSeek-R1-0528, as measured on the AIME-24, AIME-25 and GPQA-Diamond test sets.

However, unlike DeepSeek-R1-0528, which tends to produce long, detailed answers because of its extended chain-of-thought reasoning, R1T2 is designed to be much more concise. It delivers similarly intelligent answers while using significantly fewer words.

Rather than focusing on raw processing time or tokens per second, TNG measures "speed" in terms of output token count per answer, a practical proxy for both cost and latency. According to benchmarks shared by TNG, R1T2 generates responses using about 40% of the tokens required by R1-0528.

That translates to a 60% reduction in output length, which directly cuts inference time and compute load, speeding up responses by 2x, or 200%.
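A quick back-of-envelope calculation shows how output-token reduction maps onto latency and cost, assuming decode time and per-token pricing scale roughly linearly with output length. The token counts and price below are illustrative, not TNG's figures:

```python
# Back-of-envelope effect of output-token reduction on latency and cost.
# Assumes decode time and per-token price scale linearly with output length;
# the token counts and price are made up for illustration.
r1_0528_tokens = 1_000                       # hypothetical verbose answer
r1t2_tokens = int(r1_0528_tokens * 0.40)     # ~40% of the parent's output

reduction = 1 - r1t2_tokens / r1_0528_tokens
speedup = r1_0528_tokens / r1t2_tokens

price_per_1k_output = 0.002                  # USD per 1K output tokens, illustrative
saving = (r1_0528_tokens - r1t2_tokens) / 1_000 * price_per_1k_output

print(f"output reduced by {reduction:.0%}")  # 60%
print(f"decode speedup ~{speedup:.1f}x")     # ~2.5x at equal tokens/sec
print(f"saving per answer: ${saving:.4f}")
```

Under these simplified assumptions, emitting 40% of the tokens gives roughly a 2.5x decode speedup; real-world gains also depend on prefill time and serving overheads, which is why quoted figures such as "more than twice as fast" can differ from the pure token arithmetic.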

Compared to the original DeepSeek-R1, R1T2 is also around 20% more concise on average, offering meaningful efficiency gains for high-throughput or cost-sensitive deployments.

This efficiency does not come at the cost of intelligence. As shown in the benchmark chart presented in TNG's technical paper, R1T2 sits in a desirable zone on the intelligence-versus-output-cost curve. It preserves reasoning quality while minimizing verbosity, an outcome critical for enterprise applications where inference speed, throughput and cost all matter.

Deployment considerations and availability

R1T2 is released under a permissive MIT License and is available now on Hugging Face, meaning it is open source and can be used and built into commercial applications.
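For teams that want to try the weights directly, a minimal loading sketch with Hugging Face transformers might look like the following. The repository ID is taken from TNG's model card (check the exact casing on Hugging Face), while the dtype, device mapping and generation settings are illustrative. Note that a model of this size will not fit on a single GPU; production deployments typically rely on a multi-GPU inference server such as vLLM or a hosted endpoint.

```python
# Minimal sketch of loading R1T2 with Hugging Face transformers.
# Repo id per TNG's model card; dtype/device/generation settings illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tngtech/deepseek-tng-r1t2-chimera"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",            # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```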

TNG notes that while the model is well suited for general reasoning tasks, it is not currently recommended for use cases requiring function calling or tool use, due to limitations inherited from its DeepSeek-R1 lineage. These may be addressed in future updates.

The company also advises European users to assess their compliance with the EU AI Act, which enters into force on August 2, 2025.

Enterprises operating in the EU should review the relevant provisions or consider halting use of the model after that date if the requirements cannot be met.

However, U.S. companies operating domestically and serving U.S.-based users, or users in other nations, are not subject to the terms of the EU AI Act, which should give them considerable flexibility in using and deploying this free, fast open-source reasoning model. If they serve users in the EU, some provisions of the Act will still apply.

TNG has already made earlier Chimera variants available through platforms such as OpenRouter and Chutes, where they reportedly processed billions of tokens daily. The release of R1T2 is a further evolution of this public-availability effort.
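For teams that would rather not self-host, those hosted routes offer a lighter-weight alternative. Since OpenRouter exposes an OpenAI-compatible API, a call might look like the sketch below; the model slug is a placeholder guess, so check OpenRouter's catalog for the exact R1T2 identifier.

```python
# Querying a hosted Chimera variant via OpenRouter's OpenAI-compatible API.
# The model slug is hypothetical; verify it against OpenRouter's catalog.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
resp = client.chat.completions.create(
    model="tngtech/deepseek-r1t2-chimera",   # placeholder slug
    messages=[{"role": "user", "content": "Summarize the AoE merge method."}],
    max_tokens=400,
)
print(resp.choices[0].message.content)
```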

About TNG Technology Consulting GmbH

Founded in January 2001, TNG Technology Consulting GmbH is based in Bavaria, Germany, and employs more than 900 people, with a high concentration of PhDs and technical specialists.

The company focuses on software development, artificial intelligence and DevOps/cloud services, serving major enterprise clients across industries such as telecommunications, insurance, automotive, e-commerce and logistics.

TNG operates as a values-based consulting partnership. Its unique structure, grounded in principles of operational research and self-management, supports a culture of technical innovation.

It actively contributes to open-source communities and research, as demonstrated by public releases such as R1T2 and the publication of its Assembly-of-Experts methodology.

What it means for enterprise technical decision-makers

For CTOs, AI platform owners, engineering leads and IT procurement teams, R1T2 introduces tangible benefits and strategic options:

  • Lower inference costs: With fewer output tokens per task, R1T2 reduces GPU time and energy consumption, translating directly into infrastructure savings, which is especially important in high-throughput or real-time environments.
  • High reasoning quality without the overhead: It retains much of the reasoning power of top-tier models such as R1-0528, but without their long-windedness. That makes it well suited to structured tasks (math, programming, logic) where concise answers are preferred.
  • Open and modifiable: The MIT License allows full deployment control and customization, enabling private hosting, model alignment or further training in regulated or air-gapped environments.
  • Emerging modularity: The AoE approach suggests a future in which models are built modularly, allowing enterprises to assemble specialized variants by recombining the strengths of existing models rather than retraining from scratch.
  • Caveats: Enterprises relying on function calling, tool use or advanced agent orchestration should note the current limitations, although future Chimera updates may close these gaps.

TNG encourages researchers, developers and enterprise users to explore the model, test its behavior and share feedback. The R1T2 Chimera is available at huggingface.co/tngtech/deepseek-tng-r1t2-chimera, and technical inquiries can be directed to test@tngtech.com.

For technical background and benchmark methodology, TNG's research paper is available at arXiv:2506.14794.
