Aussie AI

AI PC Research

  • Last Updated 11 December 2024
  • by David Spuler, Ph.D.

AI models and applications are set to make PCs hot again (see also GenAI market research). The next generation of PCs will likely run some AI models natively, and there will also be hybrid architectures where AI workloads are sent to the cloud. It is early days for this trend, but it's surely going to be a major technology driver for years.

Our main research interest in relation to "AI PCs" is the optimization of inference algorithms, so that models can run fast enough. This includes executing AI inference on CPU-only PCs and on PCs with only low-end GPUs.

Fast LLMs on Your PC or Laptop

A desktop PC or laptop is more capable than a phone, so some of the issues with running AI inference on phones are less problematic on a PC. Most obviously, a PC can have a decent GPU, which AI engines can use. Concerns about CPU usage, overheating, and battery depletion are also less pressing on a PC.

The first generation is likely to be "AI Developer PCs". Software developers typically have high-end PCs, and various AI models can already be run on desktop PCs. However, execution speed is still rather sluggish for large models, even on multi-thousand-dollar PCs with powerful GPUs, so there is much research still to be done on inference optimization. Large models are where the action is in terms of AI functionality, so software developers may well keep using cloud server AI for some time to come. And certainly, training and fine-tuning workloads seem less likely to move down onto desktop PCs.

But "AI PCs" are already in the works for everyday users. For end user applications, the model still has to run fast to give the user a decent response time, so there are still some significant obstacles before AI models will appear widespread on non-developer PCs. However, hybrid architectures where some AI execution is still uploaded to the cloud will likely hide a lot of the limitations of native AI execution.

Fast AI PC Techniques

What optimization techniques will be needed to run an AI model natively on a GPU-less or low-end GPU system? This remains to be seen, since the state of the art is not there yet.

One likely answer: multiple techniques. It's probably going to be a combination of several orthogonal inference optimizations, since models will need to be both smaller and faster.

To make the models smaller, some of the techniques for "model compression" include:

  • Quantization (sketched in code below)
  • Pruning
  • Knowledge distillation
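
As a tiny illustration of the first of these, here is a sketch of symmetric 8-bit weight quantization in Python. The function names are invented for this example; real engines use per-channel scales, calibration data, and integer kernels, but the core idea of mapping float weights onto a small integer range is the same.

    # Minimal sketch of symmetric int8 weight quantization.
    # Function names are illustrative, not from any particular library.
    import numpy as np

    def quantize_int8(weights: np.ndarray):
        """Map float32 weights to int8 plus one per-tensor scale."""
        scale = float(np.abs(weights).max()) / 127.0  # max magnitude -> 127
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
        """Recover approximate float32 weights for computation."""
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    print(np.max(np.abs(w - dequantize_int8(q, scale))))  # small error

An int8 model is roughly a quarter the size of its float32 original, which matters on a GPU-less PC where memory size and bandwidth dominate inference cost.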

To make the inference algorithms run faster, various alternative strategies are vying for attention in the research:

  • KV caching (sketched in code below)
  • Early exit
  • Speculative decoding
  • Attention optimizations
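
To illustrate one of these, below is a minimal sketch of KV caching: the attention keys and values of earlier tokens are stored, so each new token attends over the cache instead of recomputing the whole prefix. The shapes, names, and random tensors are invented for this example; a real decoder derives its keys, values, and queries from trained projection weights.

    # Minimal sketch of KV caching in single-head attention.
    import numpy as np

    def attend(q, k_cache, v_cache):
        """Attention of one new query over all cached tokens."""
        scores = k_cache @ q / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ v_cache

    d = 8
    k_cache = np.empty((0, d), dtype=np.float32)
    v_cache = np.empty((0, d), dtype=np.float32)
    for step in range(5):  # autoregressive decoding loop
        # Placeholder key/value/query for the newest token.
        k_new = np.random.randn(1, d).astype(np.float32)
        v_new = np.random.randn(1, d).astype(np.float32)
        q = np.random.randn(d).astype(np.float32)
        # Append to the cache rather than recomputing the prefix.
        k_cache = np.vstack([k_cache, k_new])
        v_cache = np.vstack([v_cache, v_new])
        out = attend(q, k_cache, v_cache)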

And orthogonal to these higher-level AI software methods, there will need to be underlying capabilities including:

  • Hardware acceleration support (i.e., hardware-aware software optimizations; see the sketch after this list)
  • Deep learning compiler optimizations
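
As a small illustration of why hardware-aware code matters, the snippet below computes the same matrix-vector product twice: once as a naive Python loop, and once as a vectorized call that dispatches to an optimized SIMD/BLAS routine. The multiple-orders-of-magnitude speed gap between the two is the gap that hardware acceleration and deep learning compilers exist to close.

    # Same matrix-vector product two ways: a naive loop versus a
    # vectorized call that dispatches to optimized (SIMD/BLAS) code.
    import time
    import numpy as np

    n = 1024
    A = np.random.randn(n, n).astype(np.float32)
    x = np.random.randn(n).astype(np.float32)

    def matvec_naive(A, x):
        rows, cols = A.shape
        out = np.zeros(rows, dtype=np.float32)
        for i in range(rows):
            s = 0.0
            for j in range(cols):
                s += A[i, j] * x[j]
            out[i] = s
        return out

    t0 = time.perf_counter(); y1 = matvec_naive(A, x); t1 = time.perf_counter()
    y2 = A @ x; t2 = time.perf_counter()
    print(f"naive: {t1 - t0:.4f}s  vectorized: {t2 - t1:.6f}s")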

And floating above all that are some top-level performance considerations:

  • Hybrid multi-AI synchronization methods (e.g., ensemble methods, big-little, swarm/multi-mini-model; see the sketch after this list)
  • AI-aware heuristic methods
  • Use-case-specific optimizations (e.g. document summarization versus search versus chatbot question-and-answer)
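
As one hypothetical sketch of how these top-level ideas combine, the code below routes queries through a non-AI heuristic shortcut first, then a small "little" model, escalating to a "big" model only when the little model's confidence is low. The models, confidence scores, and threshold are all placeholder assumptions for illustration.

    # Minimal sketch of a "big-little" arrangement with an AI-aware
    # heuristic shortcut. Models and thresholds are hypothetical.

    CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff for escalation
    GREETINGS = {"hi", "hello", "hey"}

    def little_model(prompt: str):
        """Placeholder small model returning (answer, confidence)."""
        return f"[little: {prompt[:20]}]", 0.6

    def big_model(prompt: str) -> str:
        """Placeholder large model, used only when needed."""
        return f"[big: {prompt[:20]}]"

    def respond(prompt: str) -> str:
        # Heuristic shortcut: trivial inputs never reach a model.
        if prompt.strip().lower() in GREETINGS:
            return "Hello! How can I help?"
        # Try the cheap model first; escalate if it isn't confident.
        answer, confidence = little_model(prompt)
        if confidence >= CONFIDENCE_THRESHOLD:
            return answer
        return big_model(prompt)

    print(respond("hello"))
    print(respond("Summarize this 20-page document..."))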

Putting all of that together looks like some kind of fun. Nobody's there yet, and it's far from clear which combination of techniques will win out.

Articles and Announcements for AI PCs

Various PR and press articles have started pushing "AI PCs" as a new segment.

Research on PC Execution of LLMs

Desktop PCs are considered to be "edge" platforms in the AI literature (along with phones and IoT devices). Research papers specifically on PC execution of AI models:

On-Device Inference

For more about on-device inference on PCs and phones, see on-device inference research.

More AI Research

Read more about: