DeepSeek R1 Stuns the AI World
The AI world has been taken by storm. China’s new open-source reasoning model, DeepSeek R1, has sparked concerns that AI advances by Chinese firms could threaten the revenue prospects of Western tech giants and the AI supremacy long held by the U.S.
The model was developed by the Chinese AI startup DeepSeek, a company few of us had heard of until last week. Within days, however, it has sent shockwaves through the tech world.
DeepSeek claims that R1 matches or even surpasses the performance of OpenAI’s GPT-4 and Anthropic’s Claude 3.5 Sonnet, models widely recognized as among the most capable in the industry. According to DeepSeek, R1 beats them on benchmarks including MATH-500, AIME, and SWE-bench Verified.
In a staggering revelation, DeepSeek claims that R1 cost only $5.6 million to train. That figure stands in stark contrast to the hundreds of millions of dollars that leading U.S. tech companies spend to develop their models. A reported cost reduction of roughly 50 times suggests rethinking the “throw more data centers at it” approach used by the major foundation model makers (e.g., Meta Llama, Anthropic Claude, IBM Granite, Microsoft Phi, Mistral AI, Nvidia Nemotron, and OpenAI GPT-4). DeepSeek may have built the AI equivalent of the cotton gin: a way to process the raw data behind foundation models far more quickly and cheaply.
While these claims are disputed within the AI community, the news of R1 has been enough for investors to rethink the outsized returns they expect from AI investments. It may also prompt tech companies to revisit their heavy AI budget allocations amid a growing investor push for returns.
DeepSeek grabbed the attention of the tech world last week when it released a research paper outlining the development process for its two primary models, DeepSeek-R1-Zero and DeepSeek-R1. The paper highlights R1’s strong performance in coding, general knowledge, and open-ended generation tasks.
A major appeal of DeepSeek R1 is its fully open-weight release, which enables users to fine-tune and customize the model for specialized purposes. The distilled versions are also small enough to run on a laptop or mobile device, or in combination with other models. That accessibility has helped catapult DeepSeek to the top of the productivity charts on the Apple App Store.
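Because the weights are openly published, anyone can load the model with standard tooling. As a minimal sketch, assuming the Hugging Face transformers library and one of the distilled checkpoints DeepSeek lists publicly (deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), local inference might look like this:

```python
# Minimal sketch: load a distilled DeepSeek-R1 checkpoint for local inference.
# Assumes the Hugging Face transformers library; the checkpoint name below is
# one of the distilled variants DeepSeek lists publicly.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "What is 17 * 24? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```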
Regardless of what is hype and what is not, the disruption caused by DeepSeek R1 has led to a sharp decline in major tech stocks. Chipmaker NVIDIA, semiconductor equipment specialist ASML, and tech giants Meta, Alphabet, and Microsoft all saw significant stock price drops on Monday. The declines erased hundreds of billions of dollars in market value, with the S&P 500 losing more than 2 percent and the tech-heavy Nasdaq dropping 3.5 percent.
The emergence of R1 has drawn mixed reactions in the tech world. While some have praised it as an outstanding innovation and a step forward for open-source AI development, others have raised concerns about its geopolitical implications.
Meta’s chief AI scientist Yann LeCun offered one counterpoint. “To people who see the performance of DeepSeek and think: ‘China is surpassing the US in AI.’ You are reading this wrong,” LeCun wrote on X. “The correct reading is: ‘Open-source (Open-weight) models are surpassing proprietary ones.’” LeCun praised DeepSeek’s use of open tools like PyTorch and Meta’s open-weight Llama to build its model.
In a statement shared with BigDataWire, an NVIDIA spokesperson said: “DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.”
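Test-time scaling means spending more compute at inference rather than in training, for example by sampling multiple reasoning paths and aggregating their answers (sometimes called self-consistency). A minimal, illustrative sketch, where generate_answer is a hypothetical stand-in for any sampling-based model call:

```python
# Illustrative test-time scaling via self-consistency: sample several
# reasoning paths and take a majority vote over the final answers.
# generate_answer() is a hypothetical stand-in for a real model call;
# here it fakes a noisy solver so the example runs on its own.
import random
from collections import Counter

def generate_answer(prompt: str) -> str:
    return random.choices(["408", "398", "418"], weights=[0.6, 0.2, 0.2])[0]

def self_consistency(prompt: str, n_samples: int = 16) -> str:
    # More samples means more inference compute and, often, higher accuracy.
    answers = [generate_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 17 * 24? Think step by step."))
```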
Bernstein analysts remain skeptical of DeepSeek’s claims. Stacy Rasgon, a senior analyst at Bernstein covering U.S. semiconductors, questions whether DeepSeek was truly built for less than $6M, or whether it is better understood as a mixture-of-experts system that layers several optimizations and clever techniques on top of other large foundation models. According to Rasgon, this could explain how R1 has such low GPU requirements. Even so, Rasgon admits that DeepSeek’s pricing blows away the competition.
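For context, a mixture-of-experts layer routes each token to a small subset of specialized sub-networks, so only a fraction of the model’s parameters are active per token, which can sharply reduce compute. A toy PyTorch sketch of top-k routing (the dimensions and routing scheme are arbitrary illustrations, not DeepSeek’s actual architecture):

```python
# Toy top-k mixture-of-experts layer in PyTorch. Dimensions and routing
# are arbitrary illustrations, NOT DeepSeek's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                           nn.Linear(4 * dim, dim)) for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, dim)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the top-k chosen experts run for each token, so most
        # parameters stay inactive on any given forward pass.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```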
Gary Marcus of Marcus on AI suggests that DeepSeek reportedly got its start in LLMs by retraining Meta’s Llama model. If that is the case, some of the cost reduction could come from fine-tuning an existing model rather than fully training an independent one.
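Fine-tuning is dramatically cheaper than pretraining because it updates only a small fraction of the weights over a comparatively small dataset. As a hedged sketch of why, a parameter-efficient method such as LoRA (shown here via the Hugging Face peft library, with an illustrative base model) typically trains well under 1% of a model’s parameters:

```python
# Sketch: parameter-efficient fine-tuning (LoRA) of an open-weight base model.
# Assumes the Hugging Face transformers and peft libraries; the base model
# name is illustrative, not a claim about what DeepSeek actually used.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% trainable
```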
Marcus further notes that “GPT-5 has yet to arrive,” suggesting that hardware and data scaling alone may not be the answer for continued progress toward AGI (or improved GPT-based systems). The introduction of reasoning models like OpenAI o1 alongside general models like GPT-4o may indicate a scale-out of capabilities rather than a scale-up.
OpenAI CEO Sam Altman has so far remained silent on the matter.
“Time will tell if the DeepSeek threat is real — the race is on as to what technology works and how the big Western players will respond and evolve,” said Michael Block, market strategist at Third Seven Capital. “Markets had gotten too complacent at the beginning of the Trump 2.0 era and may have been looking for an excuse to pull back — and they got a great one here.”
Venture capitalist Marc Andreessen is calling the unveiling of R1 AI’s “Sputnik moment,” referring to the Soviet Union’s 1957 satellite launch that kicked off the space race. Many industry analysts and finance pundits are waiting to see how the development unfolds, and whether DeepSeek’s claims live up to expectations.
“We still don’t know the details and nothing has been 100% confirmed in regards to the claims, but if there truly has been a breakthrough in the cost to train models from $100 million+ to this alleged $6 million number this is actually very positive for productivity and AI end users as cost is obviously much lower meaning lower cost of access,” said Jon Withaar, a senior portfolio manager at Pictet Asset Management.
DeepSeek R1 has arrived at a time when the Trump administration promises to accelerate the production of American AI chips. On his first day in office, President Trump announced that private companies would make a $500B investment in AI infrastructure and signed an executive order to “remove barriers” to the development of AI.
If we are to believe the claims, DeepSeek’s success is even more remarkable given the growing challenges Chinese AI companies face under tightened U.S. export restrictions on advanced semiconductor technology. This could be the moment when U.S. authorities question whether the sanctions are working as intended. Could these restrictions be driving startups like DeepSeek to innovate, ultimately undermining the very goals the sanctions were designed to achieve?
For now, we know that DeepSeek has thrown down the gauntlet, disrupting the industry and setting the stage for a new wave of competition. It will be interesting to see how this new dynamic plays out.
** Nvidia canceled a scheduled press briefing today at which it was going to share news on a set of software, tools, and libraries for Nvidia Blackwell that raise the performance bar for generative and agentic AI workloads at scale.