Aussie AI Blog
DeepSeek Upends Progress in Reasoning and AGI
-
27th January, 2025
-
by David Spuler, Ph.D.
Recent Progress in Large Reasoning Models
Remember all that stuff about AI hitting the wall back in November, 2024? And Sam Altman tweeting, "there is no wall", and a flurry of articles? The idea was that training AI models on lots of data wasn't scaling up model capabilities as well as it used to. We're running out of data in the world. And the training algorithms had lost steam in getting to more advanced reasoning.
Umm, not so much. It only took three months!
Ironically, it's not Sam Altman's OpenAI models that have consigned the wall to the annals of history. The o1 and o3 models are based on multi-step inference, rather than making models smarter through advanced training, so they seemed to be confirming the wall theory. Instead, it's a Chinese startup called DeepSeek that has built a pre-trained Large Reasoning Model (LRM) that beats o1/o3 on a lot of metrics, but does so with only one step of inference. And it did so using:
- Data!
- Training improvements!
- Prompt engineering!
Who would have thought of that? It's like back to the future with GPT-3, but much better at math.
One wonders what happens when you combine a single-shot DeepSeek R1 model with a multi-step inference algorithm (a la o3)? Rumor has it that all the US-based AI labs are scrambling to understand the R1 model, which has helpfully and generously been open-sourced by DeepSeek, so we can expect an answer to that question in the near future.
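As a thought experiment, here is a minimal sketch of what that combination might look like, using self-consistency voting as the multi-step wrapper around a single-shot reasoning model. The query_model() function is a hypothetical stand-in for whatever LLM API you use; nothing here is DeepSeek's or OpenAI's actual method.

    # Minimal sketch: wrapping a single-shot reasoning model in a multi-step
    # inference loop via self-consistency voting. Hypothetical API throughout.
    from collections import Counter

    def query_model(prompt: str, temperature: float = 0.8) -> str:
        """Stand-in for a call to a single-shot reasoning model (e.g., an R1-style LRM)."""
        raise NotImplementedError("Replace with your own LLM API call.")

    def self_consistency(prompt: str, num_samples: int = 5) -> str:
        """Sample several independent long-form answers, then majority-vote on them."""
        answers = [query_model(prompt) for _ in range(num_samples)]
        best_answer, _count = Counter(answers).most_common(1)[0]
        return best_answer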
What's the Impact?
Embarrassment, for starters! All those billions of dollars in funding for US foundation model startups, and a far less well-funded company in China, using much lower-end GPUs, beats them!
Long-term, the impact may not be so great for US AI companies, because the DeepSeek breakthrough has been written up in a research paper and open-sourced, so the US companies will incorporate it and use it to leapfrog the current state-of-the-art systems.
On the other hand, Wall Street took a negative view of this breakthrough, and there was a rotation out of AI and Mag Seven stocks into others. NVidia was the stock hit hardest, declining about 17%, compared to roughly 2% for the broader market.
Wall Street is not always known for its patience. There are reasons to be bearish and bullish, and it's not that clear longer term:
- Lower training costs — there may be a diminished need for all those high-end GPUs in 100,000+ GPU cluster architectures, and this is where NVidia is dominant with very little competition.
- Inference costs — there will be less impact on inference, since R1 works by giving "long answers" in a single step, which may not be much more efficient than the multi-step Chain-of-Thought approach of giving multiple sequential answers across multiple inference steps (both methods talk through the steps in plain English, so they involve more inference cost than non-reasoning LLMs); see the toy cost comparison after this list.
- Smarter LLMs! — some AI applications have been somewhat underwhelming (e.g., Apple Intelligence), so any general advance that makes AI smarter at low cost has a number of beneficiaries. Overall AI demand may soar, which mostly increases inference, but also training and fine-tuning needs to a lesser extent.
- Non-text — the real "killer app" in AI is likely to be a brainier version of Siri, but processing speech, voice, video, and multimodal AI is an order of magnitude more expensive than text, so any efficiency advances bring this "assistant in your pocket" goal closer to reality.
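To make the inference-cost point concrete, here is a toy back-of-the-envelope comparison. The token counts and price below are invented purely for illustration; the point is only that total cost is driven by the total number of generated reasoning tokens, however they are split across steps.

    # Toy cost comparison: one long single-step answer vs. several shorter
    # multi-step answers. Token counts and pricing are illustrative only.
    PRICE_PER_1K_OUTPUT_TOKENS = 0.01  # hypothetical price in dollars

    def cost(tokens_per_step: int, steps: int) -> float:
        """Cost of generating `steps` responses of `tokens_per_step` tokens each."""
        return steps * tokens_per_step / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

    single_step_long_answer = cost(tokens_per_step=4000, steps=1)  # R1-style "long answer"
    multi_step_cot = cost(tokens_per_step=1000, steps=4)           # o1/o3-style multiple steps
    print(single_step_long_answer, multi_step_cot)  # identical totals in this toy example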
Clearly, Wall Street's attitude backs the thesis that this reduces demand for training GPUs, which are mostly NVidia chips, rather than general inference chips, which is where NVidia has a lot of competition from dozens of AI-specific hardware startups (e.g., Cerebras, Groq, SambaNova, Blaize, Graphcore, Etched, etc.). Most of the foundation model startups (e.g., OpenAI, Anthropic, Mistral, etc.) also have inference businesses, so they may benefit in some ways. Having a powerful Chinese competitor may also have less impact on B2B AI sales in the USA, where businesses may be wary of relying on a Chinese provider.
For consumers, this breakthrough probably means better AI at cheaper prices is coming very soon, and that's on top of massive declines in pricing over the last 18 months. DeepSeek's API pricing per token is already much lower than many US competitors.
This extra efficiency is also helpful for the dozens of AI inference platforms that aim to support all of this extra AI usage demand (e.g., the hyperscalers AWS/Azure/GCP and lots of startups like HuggingFace, Groq, and dozens more). Arguably, this efficiency improvement also benefits the hardware platforms that currently struggle to run larger models — all of the on-device AI hardware companies (e.g., Qualcomm, ARM, AMD), including advancements for AI phones (e.g., Apple, Google, Samsung) and AI PC companies (e.g., Microsoft, Dell, Apple, Lenovo, HP, Acer, Asus, etc.). Edge AI use cases may also benefit, such as Internet-of-Things (IoT), autonomous cars, and many other embedded devices.
NVidia is the odd one out? Well, maybe. Even if GPU cluster-based training declines, it'll probably have a long and profitable death, and it may not even decline. After all, one surprising thing that DeepSeek showed was that synthetic data works even for advanced reasoning capabilities. This is mainly just for text, and if synthetic multimodal data also works for training multimodal LRMs, that's another couple of orders of magnitude in token cost. This means that AI trainers can spin up trillions of extra tokens to train their LRMs, and incorporating all that jazz into models will still need 100,000+ GPU clusters to run.
Furthermore, it's not like NVidia knows nothing about the AI industry, and it has been making some forward-looking moves to diversify its AI platform. NVidia has very deep pockets and has already been broadening out its product portfolio with acquisitions of AI inference infrastructure (e.g., Run:AI and Octo AI) and numerous investments in AI startups via its NVentures arm. NVidia R&D has also built out its own full-stack inference software platform with NIM services, using its own GPUs underneath. NVidia is clearly aiming to build its own cloud platform for AI inference support, touting a potential $150B annual business in that space, competing against hyperscalers such as AWS, Azure, and GCP. Hence, the long-term shakeout of AI companies remains to be seen.
Reasoning is not AGI
It's an important point to note that "reasoning" is only part of the quest for Artificial General Intelligence (AGI). In addition to "reasoning" and "generalization" (whatever that means), there are also other aspects to "intelligence" of a model or system, such as:
- Planning — reasoning does require some level of planning, but planning a path to walk through a room, or a long-term project like booking a vacation, is a level beyond that.
- Tool usage — deciding whether or not a question needs a tool to answer in a TALM architecture (e.g., "What's the time?" needs a tool called a "clock"); see the first sketch after this list.
- Retrieval — deciding whether or not to "retrieve" extra input data, such as running an integrated web search "plugin" for up-to-date results, or specialized data via a RAG architecture or a general RALM architecture (and also being smart enough to know whether to use retrieved data versus your innate parametric knowledge from pre-training).
- Retrieval reasoning — most retrieval-based LLM architectures are used to retrieve "data" or "answers," but we also need to retrieve information on how to solve a problem (e.g., when doing your tax return, learning the rules of a new board game, or trying to do a mathematical proof by induction, there are methods and steps and hints on how to go about it.)
- Memory — raw LLMs are "stateless" and remember nothing. Short-term memory about a conversation can be managed via the "context" of a prompt (e.g., in a session), but LLMs need long-term memory that isn't stored in model parameters. Instead, LLM memory extensions involve key-value lookup datastores or "memory layers" amongst other innovations; see the second sketch after this list.
- Learning retention — related to long-term memory, LLMs also need to remember things that they've already learned how to do in an ongoing fashion (i.e., after pre-training), and this is currently difficult.
- Goal direction — it helps to have a reason for doing reasoning, or whatever else you're doing, but LLMs don't have any of that by default.
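To make the tool-usage bullet concrete, here is a minimal sketch of a TALM-style tool decision, using the "What's the time?" example from above. The needs_tool() check here is a naive keyword test; in a real TALM system the decision would itself be made by the model or a trained classifier, and the same routing pattern applies to the retrieve-or-not decision in a RAG/RALM architecture. All helper names are hypothetical.

    # Minimal sketch of a TALM-style tool decision (hypothetical helpers throughout).
    from datetime import datetime

    def clock_tool() -> str:
        """A trivial 'clock' tool, as in the "What's the time?" example."""
        return datetime.now().strftime("%H:%M")

    def needs_tool(query: str) -> bool:
        """Naive stand-in for the real decision step, which would be an LLM call
        or a trained classifier deciding whether the query needs a tool."""
        return "time" in query.lower()

    def llm_answer(prompt: str) -> str:
        """Stand-in for answering from the model's parametric knowledge."""
        raise NotImplementedError("Replace with your own LLM API call.")

    def answer(query: str) -> str:
        if needs_tool(query):
            return f"The time is {clock_tool()}."
        return llm_answer(query)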
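Similarly, here is a minimal sketch of the long-term memory bullet, with a plain dictionary standing in for a key-value datastore kept outside the model's parameters. A real memory extension would use embeddings and fuzzy lookup rather than exact string keys; this is illustration only.

    # Minimal sketch of long-term LLM memory via key-value lookup, stored
    # outside the model's parameters. Exact-match keys for simplicity only.
    memory: dict[str, str] = {}

    def remember(key: str, fact: str) -> None:
        """Store a fact under a key (e.g., remember("user_name", "Alice"))."""
        memory[key] = fact

    def build_prompt(query: str, key: str) -> str:
        """Prepend any remembered fact to the prompt before it goes to the LLM."""
        fact = memory.get(key)
        return f"Known fact: {fact}\n\n{query}" if fact else query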
All of the above areas are hot topics of conversation in AI research labs, and there are plenty of papers on them. Getting to AGI won't be easy, because it will require major progress in all of the above.