Today, DeepSeek is one of the few leading AI companies in China that is not backed by funding from technology giants such as Baidu, Alibaba, or ByteDance.
A young team eager to prove itself
According to Liang, when he assembled DeepSeek's research team, he was not looking for experienced engineers to build a consumer-facing product. Instead, he focused on PhDs from China's top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Many had published in top journals and won awards at international academic conferences, but had little industry experience, according to Chinese tech publication QbitAI.
"Our core technical positions are mostly filled by people who graduated this year or within the past one or two years," Liang told 36Kr in 2023. The hiring strategy helped create a company culture in which people were free to draw on computing resources to pursue unconventional research projects. This is a markedly different way of operating from established Chinese internet companies, where teams often compete for resources. (A recent example: ByteDance accused a former intern, a prestigious academic-award winner no less, of sabotaging colleagues' work in order to hoard more computing resources for his team.)
Liang said that students can be a better fit for high-investment, low-return research. "Most people, when they are young, can devote themselves completely to a mission without utilitarian considerations," he explained. His pitch to potential hires is that DeepSeek was created to "solve the hardest questions in the world."
Experts say it is significant that these young researchers were almost entirely educated in China. "This younger generation also embodies a sense of patriotism, especially as they navigate U.S. restrictions and chokepoints in critical hardware and software technologies," explains Zhang. "Their determination to overcome these barriers reflects not only personal ambition but also a broader commitment to reestablishing China's position as a global leader in innovation."
Innovation born of crisis
In October 2022, the U.S. government began piling on export controls that severely restricted Chinese AI companies' access to cutting-edge chips such as Nvidia's H100. This move posed a problem for DeepSeek. The company had started with a stockpile of 10,000 H100s, but it needed more to compete with firms such as OpenAI and Meta. "The problem we are facing has never been funding, but the export controls on advanced chips," Liang told 36Kr in a second interview in 2024.
DeepSeek had to come up with more efficient methods for training its models. "They optimized their model architecture using a battery of engineering tricks: custom communication schemes between chips, shrinking field sizes to save memory, and innovative use of the mixture-of-experts approach," says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. "Many of these approaches are not new ideas, but combining them effectively to produce a state-of-the-art model is a remarkable feat."
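One trick Chang mentions, shrinking field sizes to save memory, generally means storing numbers in lower-precision formats. As a toy illustration (not DeepSeek's actual code), halving the width of each stored value halves the memory footprint of a buffer of activations:

```python
from array import array

# A hypothetical buffer of 1,024 activation values.
values = [x * 0.01 for x in range(1024)]

full = array('d', values)  # 64-bit doubles: 8 bytes per field
half = array('f', values)  # 32-bit floats: 4 bytes per field

print(full.itemsize * len(full))  # 8192 bytes
print(half.itemsize * len(half))  # 4096 bytes
```

Real training systems go further, down to 16-bit or even 8-bit formats, trading a little numerical precision for large savings in memory and communication bandwidth.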
DeepSeek has also made significant progress on multi-head latent attention (MLA) and mixture-of-experts, two technical designs that make DeepSeek's models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek's latest model is so efficient that it required one-tenth the computing power of Meta's comparable Llama 3.1 model to train, according to the research institution Epoch AI.
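The mixture-of-experts idea can be sketched in a few lines: a small gating network scores a set of expert sub-networks, and each token is processed by only the few top-scoring experts, so most of the model's parameters sit idle on any given step. Below is a minimal, illustrative sketch; the toy experts and gate weights are made up and are not DeepSeek's architecture:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, gate_weights, top_k=2):
    """Route one token through only the top_k highest-scoring experts."""
    # Gating: a linear score per expert (dot product with the token).
    scores = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Only the chosen experts run; the rest are skipped entirely.
    return sum(probs[i] / norm * experts[i](token) for i in chosen), chosen

# Eight toy "experts"; expert i simply returns the constant i.
experts = [lambda t, i=i: float(i) for i in range(8)]
gate = [[0.1 * i, 0.2 * i] for i in range(8)]  # made-up gate weights

output, chosen = moe_forward([1.0, 0.5], experts, gate, top_k=2)
print(chosen)  # → [7, 6]
```

Here only 2 of the 8 experts are evaluated for the token; at scale, this sparsity is what lets a model carry a very large parameter count while spending compute on only a fraction of it per token.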
DeepSeek's willingness to share these innovations with the public has earned it considerable goodwill in the global AI research community. For many Chinese AI companies, developing open-source models is the only way to catch up with their Western counterparts, because doing so attracts more users and contributors, which in turn help the models improve. "They have now demonstrated that cutting-edge models can be built using less, though still a lot of, money, and that the current norms of model-building leave plenty of room for optimization," says Chang. "We will certainly see many more attempts in this direction."
This development could spell trouble for current U.S. export controls, which focus on creating bottlenecks in computing resources. "Existing estimates of how much AI computing power China has, and what it can achieve with it, could be upended," says Chang.