Chinese AI startup DeepSeek, known for challenging leading AI vendors with open-source technology, has just dropped another bombshell: a new open reasoning LLM called DeepSeek-R1.
Based on the recently introduced DeepSeek-V3 mixture-of-experts model, DeepSeek-R1 matches the performance of o1, OpenAI's frontier reasoning LLM, on math, coding and reasoning tasks. The best part? It does this at a far more attractive price, coming in 90-95% cheaper than the latter.
This release represents a major step forward for open source. It shows that open models are further closing the gap with closed commercial models in the race to artificial general intelligence (AGI). To demonstrate the strength of its work, DeepSeek also used R1 to distill six Llama and Qwen models, taking their performance to new levels. In one case, the distilled version of Qwen-1.5B outperformed much larger models, GPT-4o and Claude 3.5 Sonnet, on selected math benchmarks.
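For readers curious what such distillation can look like in practice, here is a minimal sketch: supervised fine-tuning of a small open model on teacher-generated reasoning traces. The dataset file, base model choice, and hyperparameters below are illustrative assumptions, not DeepSeek's published recipe.

```python
# Minimal distillation sketch: fine-tune a small student model on
# reasoning traces produced by a stronger teacher (here, hypothetically,
# R1 outputs saved to "r1_traces.jsonl"). Illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "Qwen/Qwen2.5-Math-1.5B"  # one plausible small Qwen base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical JSONL of {"prompt": ..., "completion": ...} pairs, where
# each completion is a teacher-generated chain of thought plus answer.
data = load_dataset("json", data_files="r1_traces.jsonl", split="train")

def tokenize(ex):
    return tokenizer(ex["prompt"] + ex["completion"],
                     truncation=True, max_length=4096)

data = data.map(tokenize, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilled-student",
                           per_device_train_batch_size=1,
                           num_train_epochs=2, learning_rate=1e-5),
    train_dataset=data,
    # mlm=False gives standard next-token (causal LM) loss on the traces
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```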
These distilled models, along with the main R1, are open source and available on Hugging Face under an MIT license.
What does DeepSeek-R1 offer?
The broader goal is artificial general intelligence (AGI), a level of AI that can perform intellectual tasks like humans. Many teams are working on improving the reasoning capabilities of models. OpenAI took the first significant step in this field with its o1 model, which uses a chain-of-thought reasoning process to work through a problem. Through RL (reinforcement learning, or reward-driven optimization), o1 learns to hone its chain of thought and refine the strategies it uses, ultimately learning to recognize and correct its mistakes, or try new approaches when the current ones aren't working.
Now, continuing its work in this direction, DeepSeek has released DeepSeek-R1, which uses a combination of RL and supervised fine-tuning to handle complex reasoning tasks and match the performance of o1.
In testing, DeepSeek-R1 scored 79.8% on the AIME 2024 math benchmark and 97.3% on MATH-500. It also achieved a Codeforces rating of 2029, better than 96.3% of human programmers. By comparison, o1-1217 scored 79.2%, 96.4% and 96.6% on these benchmarks, respectively.
It also demonstrated strong general knowledge, with 90.8% accuracy on MMLU, just behind o1's 91.8%.
Training pipeline
DeepSeek-R1’s reasoning performance marks a major win for the Chinese startup in the US-dominated AI space, especially since the entire work is open, including how the company trained the model.
However, the work was not as straightforward as it may sound.
According to the research paper, DeepSeek-R1 was developed as an improved version of DeepSeek-R1-Zero, a groundbreaking model trained solely on reinforcement learning.
“We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive – truly open, frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely. DeepSeek-R1 not only open-sources a barrage of models but…”

— Jim Fan (@DrJimFan) January 20, 2025
The company first used DeepSeek-V3-Base as the base model, developing its reasoning capabilities without any supervised data, relying solely on self-evolution through a pure RL-based trial-and-error process. Developed intrinsically from that process, this ability ensures the model can solve increasingly complex reasoning tasks, leveraging extended test-time computation to explore and refine its thought processes in greater depth.
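The paper describes rule-based rewards for this stage: an accuracy reward that checks the final answer, and a format reward that checks the reasoning is wrapped in think tags. A minimal sketch of such a reward function might look like the following; the weighting and the answer-extraction pattern are assumptions for illustration.

```python
import re

# Sketch of an R1-Zero-style rule-based reward: a format reward for
# keeping reasoning inside <think>...</think> tags, plus an accuracy
# reward for a correct final answer. Weights and the assumption that
# answers arrive in \boxed{...} form are illustrative, not DeepSeek's
# exact specification.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
ANSWER_RE = re.compile(r"\\boxed\{([^}]*)\}")

def reward(completion: str, gold_answer: str) -> float:
    score = 0.0
    if THINK_RE.search(completion):  # format reward
        score += 0.5
    m = ANSWER_RE.search(completion)
    if m and m.group(1).strip() == gold_answer.strip():  # accuracy reward
        score += 1.0
    return score
```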
“During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors,” the researchers note in the paper. “After thousands of RL steps, DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912.”
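To unpack the two metrics: pass@1 averages correctness over k sampled generations per problem, while majority voting scores only the most frequent answer. A toy illustration (the sampled answers below are made up):

```python
from collections import Counter

def pass_at_1(answers, gold):
    # pass@1 as an average: fraction of the k samples that are correct
    return sum(a == gold for a in answers) / len(answers)

def majority_vote(answers, gold):
    # majority voting: only the most frequent answer is scored
    top, _ = Counter(answers).most_common(1)[0]
    return float(top == gold)

answers = ["42", "41", "42", "42"]   # k = 4 sampled answers, one wrong
print(pass_at_1(answers, "42"))      # 0.75
print(majority_vote(answers, "42"))  # 1.0
```

This is why the voted score (86.7%) can sit well above pass@1 (71.0%): voting filters out minority errors.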
However, despite the performance gains, including behaviors such as reflection and exploration of alternatives, the initial model exhibited some problems, including poor readability and language mixing. To address this, the company built on the work done for R1-Zero, applying a multi-stage approach combining both supervised learning and reinforcement learning, and thus developed the improved R1 model.
“Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model,” the researchers explained. “Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.”
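Condensed into code, the quoted pipeline has four stages. The sketch below is a runnable outline with stub functions; the function names and data volumes are placeholders for illustration, not DeepSeek's actual implementation.

```python
# Runnable outline of the four training stages described in the quote.
# Each stage is a stub standing in for a full training job.
def supervised_finetune(model, data, stage):
    print(f"[{stage}] SFT on {len(data)} examples")
    return model  # placeholder: returns the "fine-tuned" checkpoint

def reinforcement_learning(model, stage):
    print(f"[{stage}] RL until convergence")
    return model

model = "DeepSeek-V3-Base"  # placeholder for the base checkpoint

# Stage 1: cold start — SFT on thousands of curated CoT examples.
model = supervised_finetune(model, ["cold-start CoT"] * 1000, "stage 1")

# Stage 2: reasoning-oriented RL, as in R1-Zero, until convergence.
model = reinforcement_learning(model, "stage 2: reasoning RL")

# Stage 3: rejection-sample new SFT data from the RL checkpoint, mix in
# non-reasoning data (writing, factual QA), and retrain from V3-Base.
new_sft = ["rejection-sampled"] * 600 + ["writing/factual-QA"] * 200
model = supervised_finetune("DeepSeek-V3-Base", new_sft, "stage 3")

# Stage 4: a final RL pass over prompts from all scenarios -> DeepSeek-R1.
model = reinforcement_learning(model, "stage 4: all-scenario RL")
```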
Much cheaper than o1
In addition to performance that nearly rivals OpenAI's o1 on benchmarks, the new DeepSeek-R1 is also very affordable. Specifically, while OpenAI o1 costs $15 per million input tokens and $60 per million output tokens, DeepSeek Reasoner, which is based on the R1 model, costs $0.55 per million input tokens and $2.19 per million output tokens.
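A quick back-of-the-envelope check of that gap, using the listed rates for a workload of one million input and one million output tokens (the exact percentage depends on the input/output mix):

```python
# Compare the two price lists at a 1M-input + 1M-output workload.
o1_cost = 15.00 + 60.00  # $15/M input + $60/M output
r1_cost = 0.55 + 2.19    # $0.55/M input + $2.19/M output
savings = 1 - r1_cost / o1_cost
print(f"o1: ${o1_cost:.2f}, R1: ${r1_cost:.2f}, savings: {savings:.1%}")
# -> o1: $75.00, R1: $2.74, savings: 96.3%
```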
“Sooo @deepseek_ai's reasoner model, which sits somewhere between o1-mini & o1 is about 90-95% cheaper 👀 https://t.co/ohnI6dtPRC”
— Emad (@EMostaque) January 20, 2025
The model can be tried as “DeepThink” on the DeepSeek chat platform, which is similar to ChatGPT. Interested users can access the model weights and code repository via Hugging Face under the MIT license, or use the API for direct integration.
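As a minimal integration sketch: DeepSeek's API is OpenAI-compatible at the time of writing, so a call can go through the standard OpenAI Python SDK. The base URL and model name below follow DeepSeek's documentation, but should be verified against the current docs.

```python
# Minimal API call sketch via the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1-based reasoning model
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
print(resp.choices[0].message.content)
```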