Aussie AI Blog
DeepSeek is Good for NVIDIA and AI
-
30th Jan, 2025
-
by David Spuler, Ph.D.
DeepSeek is Good
Two things happening in my world this week: (1) I asked AI about a possible cyclone near my town and it told me "yes, it's already formed" using articles from 2023 (clueless), and (2) about 100 articles in my feeds about DeepSeek.
Here's my thesis: DeepSeek's advances are good for AI and probably also good for NVIDIA in the mid/long-term.
DeepSeek is a China-based AI lab that has been putting out a steady stream of models and research papers, with various innovations. Apparently, DeepSeek achieved training cost reductions using non-H100 NVIDIA chips (amazing) and their inference costs are down too (like everyone's).
Training Breakthroughs
Wall Street has been focused on DeepSeek's training of a model at a much lower cost, with much lower-end GPUs. However, their advance in using training to create a single-step reasoning model should not be underestimated. Beyond the cost focus, let's look deeper at what DeepSeek proved:
(a) they broke "the wall" and showed that more training can still make a smarter model (and even a reasoning model),
(b) the data scaling laws for training are therefore still intact for building smarter LLMs,
(c) test time compute architecture (e.g., o3) is not needed when you can build a smart one-shot model (with reasoning capabilities),
(d) synthetic data still works for training, even for reasoning models, so AI trainers can spin up trillions of extra content tokens as training data,
(e) human-curated data still works for training, so all the current investments in getting advanced humans like doctors and lawyers to label AI data will still be useful, and
(f) LLMs probably can still get a lot lot lot smarter.
All of these points are positive for the need for lots more AI training, where NVIDIA excels.
Furthermore, if you combine these with other existing trends, such as the fact that multimodal/image/video data has many more tokens than text and we're still early in that trend, you can see why I think there's still going to be a lot of AI training for NVIDIA to power.
Even if there are ways to do it cheaper than now, the sheer demand for forward progress will benefit NVIDIA and all those data center buildouts, at least for the next few years.
DeepSeek's Inference Optimizations
The inference cost reductions, meanwhile, aren't really in NVIDIA's sweet spot, but they've been happening for 18 months already, and NVIDIA has been making acquisitions that build out its own inference products, too. This is not revolutionary: there's been a steady drumbeat of new technological advancements that have reduced inference costs and sent providers' per-token prices plummeting.
The main inference optimizations that DeepSeek used included:
- Single-step reasoning — this obviates the need for multiple steps of inference to do reasoning, although it does output "long answers" as it talks through its own reasoning steps. This comes with an increased token cost, but presumably fewer tokens overall compared to multi-step Chain-of-Thought methods, although there are also various CoT token reduction optimizations (more research papers needed!).
- Multi-token decoding — the idea is to spit out two or more tokens per inference iteration, rather than one at a time as in most decoding algorithms. This parallel decoding and multi-token output research has been around for a while, including a major version from Meta Research, but this is the first major industry offering that uses it.
- Multi-Head Latent Attention (MHLA or MLA) — this is a new optimization of the attention module, apparently invented for DeepSeek-V2 in May 2024. There are numerous other attention optimization techniques, including the original Multi-Head Attention (MHA), and two KV-cache reductions: Multi-Query Attention (MQA) and Grouped-Query Attention (GQA). MLA is close to MHA in accuracy but more efficient, and it offers better accuracy than MQA and GQA at comparable efficiency. How it compares to other methods like FlashAttention or PagedAttention is unclear. Some papers needed!
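To make the multi-token decoding idea concrete, here's a toy sketch (my own illustration, not DeepSeek's implementation) of a decoding loop where a stand-in model proposes k tokens per forward pass instead of one. In a real system a verification step may reject some proposals; this sketch accepts all of them, which is why the loop runs in roughly max_new_tokens / k iterations. The function names and logic here are hypothetical.

```python
import random

def fake_multi_head_predict(context, k=2):
    """Stand-in for a model forward pass that proposes k future tokens.

    Seeded on context length so the toy example is deterministic.
    """
    random.seed(len(context))
    return [random.randrange(100) for _ in range(k)]

def decode(prompt, max_new_tokens=8, k=2):
    """Decode max_new_tokens tokens, accepting k proposals per iteration."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        proposed = fake_multi_head_predict(out, k)
        # Accept all k proposals (a real verifier might truncate this list),
        # clipped so we never exceed the requested number of new tokens.
        out.extend(proposed[: max_new_tokens - (len(out) - len(prompt))])
    return out

tokens = decode([1, 2, 3], max_new_tokens=8, k=2)
# 8 new tokens generated in 4 iterations (k=2) rather than 8 single-token steps.
assert len(tokens) == 11
```

The speedup comes from halving (for k=2) the number of sequential forward passes, which is where most per-token latency lives.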
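The KV-cache savings behind MQA, GQA, and MLA can be illustrated with some back-of-the-envelope arithmetic. This is a sketch with made-up example dimensions (32 query heads, head dimension 128, fp16, and an illustrative MLA latent dimension of 512), not DeepSeek's actual configuration:

```python
def kv_cache_bytes_per_token(n_kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache stored per token per layer (K plus V)."""
    return 2 * n_kv_heads * head_dim * bytes_per_elem

def mla_cache_bytes_per_token(latent_dim: int,
                              bytes_per_elem: int = 2) -> int:
    """MLA caches one compressed latent vector per token, from which
    K and V are re-projected at attention time."""
    return latent_dim * bytes_per_elem

# Example config: 32 query heads, head_dim 128, fp16 (2 bytes/element).
mha = kv_cache_bytes_per_token(n_kv_heads=32, head_dim=128)  # 16384 bytes
gqa = kv_cache_bytes_per_token(n_kv_heads=8, head_dim=128)   # 4096 bytes (4 query heads share each KV head)
mqa = kv_cache_bytes_per_token(n_kv_heads=1, head_dim=128)   # 512 bytes (all query heads share one KV head)
mla = mla_cache_bytes_per_token(latent_dim=512)              # 1024 bytes
```

The trade-off in the text falls out of the numbers: MQA caches the least but loses the most accuracy, while MLA's compressed latent sits near MQA/GQA in cache size while retaining full per-head K and V after re-projection.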
This is a good body of work on inference optimization from DeepSeek, but their training advances are probably more significant. Many labs have been outputting new inference techniques, and there's going to be an ongoing sequence of new inference advances over time. After all, there are literally 500 inference optimization techniques in the research papers, and it will take time for all of them to be tested, and the best ones combined.
Cheaper models are good news for consumers, mean that there'll be lots of buzz in the "application layer" (where I'm working), are good for on-device AI (e.g., Microsoft AI PCs, Google Android, and Apple Intelligence devices), and overall will drive demand for smarter and cheaper AI in both B2C and B2B.
And finally, my point about bad cyclone answers shows that all of that inference stuff is only as good as the underlying LLMs, which are still clueless in many ways, and they need more training.
Some More Points
- DeepSeek open-sourced their model and published a nice research paper on their work. Indeed, several papers over the past year. Versions of DeepSeek R1 are already appearing on US-based hosting sites, and you can expect their training methods to be incorporated into the US major models at a clip.
- If you want to prove the bull case for AI, note that the DeepSeek app has now topped the charts for phone apps, with millions of downloads in the last few days. People want free AI!
- Jevons paradox says that a technological advancement making a resource more efficient often causes overall consumption of that resource to rise.
- Data center businesses benefit from both training and inference. In the very early days of generative AI, training was the dominant workload, but lately inference has taken over, mainly due to the sheer magnitude of users using AI for whatever the heck they use it for. Recent statistics before o3 and DeepSeek had inference at 95% of workload versus 5% for training, and the multi-step inference in o3-like models would have increased that.
- NVIDIA already has a robust inference business, although its size is unknown. This is not just from acquisitions, but also aggressive development work whereby NVIDIA R&D has built a full inference software stack (e.g., NIM services and blueprints). They've also announced plans to compete against the hyperscalers with their own AI Cloud offering, touting a potential $150B business therein. This goes against the thesis that inference advances would hurt NVIDIA.
- NVIDIA also has a stake in the AI PC business, with its multi-billion dollar gaming GPU business (now largely overlooked!), and has recently announced its own AI-specific PC for $3,000. Although this new product is targeted at AI developers initially, one can envision it moving on to other high-end users of AI in the near future.
- Networking products are another billion-dollar data center business in NVIDIA's portfolio, and one that also rarely gets a mention. I think it's underestimated as a moat. Building any data center needs networking hardware components and software tools, all available for a fee from NVIDIA. AI training needs to send training data and parameter updates over the network (in bursts), and inference needs to send KV cache data flying all over the place (continually).
- Note that some US AI luminaries are suggesting that maybe DeepSeek did use 50,000 H100s but can't talk about it due to US export restrictions. On the other hand, no evidence of this has been provided, and it may be self-serving rumors and misinformation, which is why I moved this comment to the bottom.
- Meta CEO Mark Zuckerberg has announced that he has no plans to cut back his AI infrastructure spend, regardless of what's been happening with DeepSeek. I tend to agree.
More AI Research Topics
Read more about: