DeepSeek's new AI model sparks shock, awe, and questions from American competitors

The true cost of developing DeepSeek's new models remains unknown, because a single figure cited in one research paper may not capture the full picture of its costs. “I don’t believe it’s $6 million, but even if it’s $60 million, it’s a game changer,” says Umesh Padval, managing director of Thomvest Ventures, a company that has invested in Cohere and other AI firms. “It puts pressure on the profitability of companies that focus on artificial intelligence.”

Shortly after DeepSeek revealed the details of its latest model, Ghodsi of Databricks says, customers began asking whether they could use it, as well as DeepSeek's underlying techniques, to cut costs at their own organizations. He adds that one approach used by DeepSeek's engineers, known as distillation, which involves using the output from one large language model to train another model, is relatively cheap and straightforward.
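To make the idea of distillation concrete, here is a minimal toy sketch in Python. It is not DeepSeek's pipeline and does not involve a language model: the "teacher" is a small fixed softmax classifier, and the "student" is trained only on the teacher's output probabilities. All names (`TEACHER_W`, `distill`, `mean_loss`) and the data are hypothetical and chosen purely for illustration.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    # weights holds one row of coefficients per class (a linear model).
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def cross_entropy(target, pred):
    return -sum(t * math.log(p + 1e-12) for t, p in zip(target, pred))


def mean_loss(weights, inputs, targets):
    losses = [cross_entropy(t, predict(weights, x)) for x, t in zip(inputs, targets)]
    return sum(losses) / len(losses)


def distill(teacher_w, inputs, n_classes, n_features, epochs=200, lr=0.5):
    # The student never sees ground-truth labels, only the teacher's outputs.
    soft_labels = [predict(teacher_w, x) for x in inputs]
    student_w = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        for x, target in zip(inputs, soft_labels):
            pred = predict(student_w, x)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the class-c logit.
                grad = pred[c] - target[c]
                for j in range(n_features):
                    student_w[c][j] -= lr * grad * x[j]
    return student_w


random.seed(0)
# Hypothetical teacher: 3 classes, 2 input features.
TEACHER_W = [[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]]
INPUTS = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(50)]
TARGETS = [predict(TEACHER_W, x) for x in INPUTS]

STUDENT_W = distill(TEACHER_W, INPUTS, n_classes=3, n_features=2)
print("untrained student loss:", mean_loss([[0.0, 0.0]] * 3, INPUTS, TARGETS))
print("distilled student loss:", mean_loss(STUDENT_W, INPUTS, TARGETS))
print("teacher's own loss:    ", mean_loss(TEACHER_W, INPUTS, TARGETS))
```

In this toy setting the student's loss should fall toward the teacher's own loss, which is the sense in which the student inherits the teacher's behavior. Distilling a large language model follows the same pattern at far greater scale, typically by training the smaller model on text or probabilities produced by the larger one.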

Padval says that the existence of models like DeepSeek's will ultimately benefit companies that want to spend less on AI, but he says that many firms may have reservations about relying on a Chinese model for sensitive tasks. So far, at least one prominent AI company, Perplexity, has publicly announced that it is using DeepSeek's R1 model, but it says the model is hosted “completely independent of China.”

Amjad Masad, the CEO of Replit, a startup that provides AI coding tools, told WIRED that he thinks DeepSeek's latest models are impressive. While he still finds Anthropic's Sonnet model better at many computer engineering tasks, he has found that R1 is especially good at turning text commands into code that can be executed on a computer. “We are exploring using it especially for agent reasoning,” he adds.

DeepSeek's two latest offerings, DeepSeek R1 and DeepSeek R1-Zero, are capable of the same kind of simulated reasoning as the most advanced systems from OpenAI and Google. They all work by breaking problems into constituent parts in order to tackle them more effectively, a process that requires a considerable amount of additional training to ensure that the AI reliably reaches the correct answer.

A paper posted by DeepSeek researchers last week outlines the approach the company used to create its R1 models, which it claims perform on some benchmarks about as well as OpenAI's groundbreaking reasoning model known as o1. The tactics DeepSeek used include a more automated method for learning how to solve problems correctly, as well as a strategy for transferring skills from larger models to smaller ones.

In a research paper from August 2024, DeepSeek indicated that it has access to a cluster of 10,000 Nvidia A100 chips, which were placed under US restrictions announced in October 2022. In a separate paper from June of that year, DeepSeek stated that an earlier model it created, called DeepSeek-V2, was developed using clusters of Nvidia H800 chips, a less capable component developed by Nvidia to comply with US export controls.

A source at one AI company that trains large AI models, who asked to remain anonymous to protect their professional relationships, estimates that DeepSeek likely used around 50,000 Nvidia chips to build its technology.

Nvidia declined to comment directly on which of its chips DeepSeek may have relied on. “DeepSeek is an excellent AI advancement,” a spokesperson for Nvidia said in a statement, adding that the startup's reasoning approach “requires significant numbers of Nvidia GPUs and high-performance networking.”

Regardless of how DeepSeek's models were built, they appear to show that a less closed approach to developing AI is gaining momentum. In December, Clem Delangue, the CEO of Hugging Face, a platform that hosts artificial intelligence models, predicted that a Chinese company would take the lead in AI because of the speed of innovation in open source models, which China has largely embraced. “This went faster than I thought,” he says.
