Saturday, April 19, 2025

DeepCoder delivers top coding performance in an efficient 14B open model

Researchers at Together AI and Agentica have released DeepCoder-14B, a new coding model that delivers impressive performance comparable to leading proprietary models such as OpenAI's o3-mini.

Built on top of DeepSeek-R1, the model offers greater flexibility for integrating high-performance code generation and reasoning into real-world applications. Importantly, the teams have fully open-sourced the model, along with its training data, code, logs, and system optimizations, which can help researchers improve their work and accelerate progress.

Competitive coding performance in a smaller package

The research team's experiments show that DeepCoder-14B performs strongly across several challenging coding benchmarks, including LiveCodeBench (LCB), Codeforces, and HumanEval+.

"Our model demonstrates strong performance across all coding benchmarks… comparable to the performance of o3-mini (low) and o1," the researchers write in a blog post describing the model.

Interestingly, despite being trained primarily on coding tasks, the model also shows improved mathematical reasoning, scoring 73.8% on AIME 2024, a 4.1% improvement over its base model (DeepSeek-R1-Distill-Qwen-14B). This suggests that reasoning skills developed through RL on code can generalize effectively to other domains.

Credit: Together AI

The most striking aspect is that it achieves this level of performance with only 14 billion parameters. That makes DeepCoder significantly smaller and potentially much more efficient to run than many frontier models.

Innovations driving DeepCoder's performance

While developing the model, the researchers tackled some of the key challenges in training coding models with reinforcement learning (RL).

The first challenge was curating training data. Reinforcement learning requires reliable reward signals indicating that the model's output is correct. As the researchers note: "Unlike math, where abundant high-quality verifiable data is readily available on the Internet, the coding domain suffers from a relative scarcity of such data."

To solve this problem, the DeepCoder team implemented a strict pipeline that gathers examples from different datasets and filters them for validity, complexity, and duplication. This process yielded 24,000 high-quality problems, providing a solid foundation for effective RL training.
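The blog post does not spell out the pipeline's code, but the steps it describes (gather, verify, filter for difficulty, deduplicate) might look roughly like the Python sketch below. The `CodingProblem` fields, the test-count difficulty proxy, and the normalized-text dedup key are illustrative assumptions, not the team's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class CodingProblem:
    prompt: str        # problem statement
    solution: str      # reference solution
    tests: list[str]   # verifiable unit tests
    source: str        # originating dataset

def is_valid(p: CodingProblem) -> bool:
    # A real pipeline would also execute the reference solution against the tests;
    # here we only check that the required fields are present (simplification).
    return bool(p.prompt.strip()) and bool(p.solution.strip()) and len(p.tests) > 0

def is_hard_enough(p: CodingProblem, min_tests: int = 5) -> bool:
    # Difficulty proxy: require a minimum number of tests (hypothetical threshold).
    return len(p.tests) >= min_tests

def deduplicate(problems: list[CodingProblem]) -> list[CodingProblem]:
    # Drop duplicates after whitespace/case normalization of the prompt (simplified).
    seen: set[str] = set()
    unique: list[CodingProblem] = []
    for p in problems:
        key = " ".join(p.prompt.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique

def curate(raw: list[CodingProblem]) -> list[CodingProblem]:
    # Keep only problems that are verifiable, non-trivial, and not duplicated.
    return deduplicate([p for p in raw if is_valid(p) and is_hard_enough(p)])
```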

The team also designed a straightforward reward function that provides a positive signal only if the generated code passes all sampled unit tests for the problem within a specific time limit. Combined with the high-quality training examples, this outcome-focused reward system prevents the model from learning tricks such as printing memorized answers for public tests or optimizing for simple edge cases without solving the core problem.
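A minimal sketch of such an outcome-only reward, assuming the unit tests are plain Python assert statements appended to the generated program (the actual DeepCoder harness samples tests and sandboxes execution differently):

```python
import subprocess
import sys
import tempfile

def outcome_reward(generated_code: str, unit_tests: list[str], timeout_s: float = 6.0) -> float:
    """Sparse, outcome-only reward: 1.0 only if the generated program passes every
    sampled unit test within the time limit, otherwise 0.0. No partial credit."""
    program = generated_code + "\n\n" + "\n".join(unit_tests)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return 0.0  # exceeding the time limit yields no reward
    return 1.0 if result.returncode == 0 else 0.0
```

Because the reward is all-or-nothing, a solution that passes only some tests earns no credit, which is what discourages shortcuts such as hard-coding outputs for public test cases.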

The model's core training algorithm is based on Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that proved very effective in DeepSeek-R1. However, the team made several modifications to the algorithm to make it more stable and allow the model to keep improving as training extends for longer.

GRPO+
GRPO+ enables DeepCoder-14B to keep improving over longer training runs. Credit: Together AI
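For intuition, the heart of GRPO is replacing a learned value baseline with statistics computed over a group of sampled solutions to the same prompt. The sketch below shows only that group-relative advantage step, not the team's GRPO+ modifications:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: for each prompt, several responses are sampled and
    scored, and each response's advantage is its reward normalized against the group's
    mean and standard deviation. `rewards` has shape (num_prompts, group_size).
    Simplified illustration only, not the DeepCoder GRPO+ implementation."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled solutions each, with binary pass/fail rewards
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
```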

Finally, the team expanded the model's context window iteratively, first training it on shorter reasoning sequences and gradually increasing the length. They also developed a filtering method that avoids penalizing the model when it creates reasoning chains that exceed the context limit while solving a hard prompt.

Iterative context extension
DeepCoder was trained on problems with a 32K context but could also solve 64K tasks. Credit: Together AI

The researchers explain the core idea: "To preserve long-context reasoning while enabling efficient training, we incorporated overlong filtering… This technique masks out truncated sequences during training so that models aren't penalized for generating thoughtful but lengthy outputs that exceed the current context limit."

Training was gradually scaled from a 16K to a 32K context window, and the resulting model could also solve problems that required up to 64K tokens.
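In practice, overlong filtering reduces to masking: responses that hit the current context limit before finishing are simply excluded from the loss. A minimal sketch of that idea (not Together AI's released code) follows:

```python
import torch

def overlong_loss_mask(finished: torch.Tensor) -> torch.Tensor:
    """`finished` marks which sampled responses produced an end-of-sequence token
    before hitting the current context limit. Truncated responses get a mask of 0,
    so they contribute nothing to the policy loss: they are neither rewarded nor
    punished for running long. Illustrative sketch only."""
    return finished.float()

# Example: 4 sampled responses; the last two were cut off at the context limit
finished = torch.tensor([True, True, False, False])
per_sequence_loss = torch.tensor([0.8, 0.3, 1.2, 0.9])
masked_mean_loss = (per_sequence_loss * overlong_loss_mask(finished)).sum() / finished.sum()
print(masked_mean_loss)
```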

Optimizing long-context RL training

Training large models with RL, especially on tasks that require long generated sequences such as coding or complex reasoning, is slow and compute-intensive. A major bottleneck is the sampling step, in which the model may generate thousands of tokens per example in a batch. Variation in response length means some responses finish much later than others, leaving GPUs idle and slowing down the entire training loop.

To accelerate this, the team developed verl-pipeline, an optimized extension of the open-source verl library for reinforcement learning from human feedback (RLHF). The key innovation, which they call "One-Off Pipelining," rearranges response sampling and model updates to reduce bottlenecks and accelerator idle time.

One-Off Pipelining

Their experiments showed that One-Off Pipelining delivered up to a 2x speedup for coding RL tasks compared to baseline implementations. This optimization was crucial for training DeepCoder in a reasonable timeframe (2.5 weeks on 32 H100s) and is now open-sourced as part of verl-pipeline for the community to use and build upon.
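Conceptually, One-Off Pipelining overlaps sampling for the next batch with the training update on the current one instead of running them strictly back to back. The toy sketch below illustrates that overlap with a background thread and a queue; `sample_batch` and `train_step` are placeholder callables, and the real verl-pipeline coordinates GPU rollout and trainer workers rather than Python threads:

```python
import queue
import threading

def one_off_pipeline(num_steps: int, sample_batch, train_step) -> None:
    """Overlap sampling and training: a producer thread generates the batch for the
    next step while the main loop runs the policy update on the current batch,
    hiding much of the sampler's idle time. Illustrative sketch only."""
    batches: queue.Queue = queue.Queue(maxsize=1)

    def producer():
        for step in range(num_steps):
            batches.put(sample_batch(step))  # rollout/sampling for upcoming updates
        batches.put(None)                    # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()

    while (batch := batches.get()) is not None:
        train_step(batch)                    # policy update overlaps next sampling

# Toy usage: "sample" a string batch and "train" by printing it
one_off_pipeline(3, sample_batch=lambda s: f"batch-{s}", train_step=print)
```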

Impact for the enterprise

The researchers have made all the artifacts for training and running DeepCoder-14B available on GitHub and Hugging Face under a permissive license.

"By fully sharing our dataset, code, and training recipe, we empower the community to reproduce our work and make RL training accessible to all," the researchers write.

DeepCoder-14B illustrates a broader, accelerating trend in the AI landscape: the rise of highly capable yet efficient and openly accessible models.

For the enterprise world, this shift means more options and greater accessibility for advanced models. Cutting-edge performance is no longer solely the domain of hyperscalers or those willing to pay premium API fees. Models like DeepCoder can empower organizations of all sizes to leverage sophisticated code generation and reasoning, customize solutions to their specific needs, and deploy them securely within their own environments.

This trend can lower the barrier to entry for AI adoption and foster a more competitive and innovative ecosystem, where progress is driven by open-source collaboration.
