However, it's still cheaper than its competitors.
The new chatbot from DeepSeek greeted me with an intriguing introduction:
Hi, I was created so you can ask anything and get an answer that might even surprise you.
Today, DeepSeek's AI has emerged as a formidable player in the market, notably contributing to one of NVIDIA's largest stock price declines.
Image: ensigame.com
What distinguishes this model are its innovative architecture and training methods, which include:
Multi-token Prediction (MTP): This technique allows the model to predict multiple words simultaneously by analyzing various parts of a sentence, improving both accuracy and efficiency.
Mixture of Experts (MoE): Utilizing 256 neural networks, with eight activated for each token processing task, this architecture speeds up AI training and enhances performance.
Multi-head Latent Attention (MLA): By focusing on the most significant parts of a sentence repeatedly, MLA reduces the chance of overlooking crucial information, thereby capturing essential nuances in the input data.
DeepSeek, a prominent Chinese startup, claims to have developed a competitive AI model at a minimal cost, stating they spent only $6 million on training DeepSeek V3, using just 2048 graphics processors.
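To make the Mixture of Experts item in the list above more concrete, here is a minimal sketch of top-k expert routing. Only the numbers 256 and 8 come from the article. The softmax gate, the toy scalar "experts", the random router scores, and the names `route_token` and `moe_output` are illustrative assumptions, not DeepSeek's actual implementation.

```python
import math
import random

NUM_EXPERTS = 256  # figure quoted in the article
TOP_K = 8          # experts activated per token, per the article


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected scores only, so they sum to 1.
    This generic gate is an assumption made for illustration.
    """
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_output(token, router_scores, experts, k=TOP_K):
    """Weighted sum of the selected experts' outputs; the rest never run."""
    return sum(w * experts[i](token) for i, w in route_token(router_scores, k))


rng = random.Random(0)
# Hypothetical experts: each just scales a scalar input by a fixed factor.
experts = [lambda x, s=rng.uniform(0.5, 1.5): s * x for _ in range(NUM_EXPERTS)]
# Hypothetical router scores for one token; a real router computes these.
router_scores = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]

chosen = route_token(router_scores)
print(f"{len(chosen)} of {NUM_EXPERTS} experts active")  # prints "8 of 256 experts active"
```

The point of the sketch is that only 8 of the 256 expert functions are evaluated for a given token, which is where the claimed efficiency gain would come from.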
However, analysts at SemiAnalysis have revealed that DeepSeek operates a vast computational infrastructure with around 50,000 NVIDIA Hopper GPUs, including 10,000 H800 units, another 10,000 H100s, and additional H20 GPUs. These resources are spread across multiple data centers and are used for AI training, research, and financial modeling.
The company's total investment in servers stands at approximately $1.6 billion, with operational costs estimated at $944 million.
DeepSeek is a subsidiary of the Chinese hedge fund High-Flyer, which launched the startup as a separate AI-focused division in 2023. Unlike most startups that lease computing power from cloud providers, DeepSeek owns its data centers, allowing for full control over AI model optimization and quicker innovation implementation. The company remains self-funded, which enhances its agility and decision-making speed.
Furthermore, some DeepSeek researchers earn over $1.3 million annually, and the company recruits top talent from leading Chinese universities (it does not hire foreign specialists).
Despite this, DeepSeek's recent claim of training its latest model for just $6 million appears unrealistic. This figure only accounts for GPU usage during pre-training and excludes research expenses, model refinement, data processing, and overall infrastructure costs.
Since its founding, DeepSeek has invested over $500 million in AI development. Yet because it is smaller and less bureaucratic than its larger rivals, it can implement AI innovations more actively and effectively.
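As a rough check on scale, the short sketch below compares the claimed $6 million training cost with the larger totals quoted in this article. All dollar figures come from the text above. The variable and function names are illustrative, and the $500 million figure is a lower bound, since the article says "over".

```python
# Figures quoted in the article, in US dollars.
CLAIMED_V3_TRAINING_COST = 6_000_000

reported_totals = {
    "server investment": 1_600_000_000,
    "operational costs": 944_000_000,
    "AI development since founding": 500_000_000,  # "over $500 million"
}


def multiple_of_claim(total, claim=CLAIMED_V3_TRAINING_COST):
    """How many times larger a reported total is than the claimed cost."""
    return total / claim


multiples = {name: multiple_of_claim(total)
             for name, total in reported_totals.items()}

for name, m in multiples.items():
    print(f"{name}: about {m:.0f}x the claimed training cost")
# server investment: about 267x the claimed training cost
# operational costs: about 157x the claimed training cost
# AI development since founding: about 83x the claimed training cost
```

The comparison is only as good as the quoted estimates, but it shows why the $6 million figure is better read as the cost of one pre-training run than as the cost of building the model.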
DeepSeek's case illustrates that a well-funded independent AI company can challenge industry giants. Nevertheless, experts stress that the company's success is largely due to substantial investments, technical breakthroughs, and a strong team, rather than a "revolutionary budget" for AI model development.
Still, competitors' costs remain significantly higher. For example, DeepSeek spent $5 million on R1, while ChatGPT-4o cost $100 million.