Aussie AI
AI PC Research
-
Last Updated 11 December, 2024
-
by David Spuler, Ph.D.
AI models and applications are set to make PCs hot again (see also GenAI market research). The next generation of PCs will likely run some AI models natively, and there will also be hybrid architectures where AI workloads are sent to the cloud. It is early days for this trend, but it's surely going to be a major technology driver for years.
Our main research interest in relation to "AI PCs" is the optimization of inference algorithms, so that models can run fast enough. This includes execution of AI inference on CPU-only PCs and on PCs with low-end GPUs.
Fast LLMs on Your PC or Laptop
A desktop PC or laptop is more capable than a phone, so some of the issues with running AI inference on phones are less problematic on a PC. Most obviously, a PC can have a decent GPU, which AI engines can use. Concerns about CPU usage, over-heating, and battery depletion are also less pressing on a PC.
The first generation is likely to be "AI Developer PCs". Software developers typically have high-end PCs, and various AI models can already run on desktop machines. However, execution speed is still sluggish for large models, even on multi-thousand-dollar PCs with powerful GPUs, so there is much research still to be done on inference optimization. Large models are where the action is in terms of AI functionality, so software developers may well keep using cloud-based AI for some time to come. And certainly, training and fine-tuning workloads seem less likely to move down onto desktop PCs.
But "AI PCs" are already in the works for everyday users. For end-user applications, the model still has to run fast enough to give the user a decent response time, so there are significant obstacles before AI models become widespread on non-developer PCs. However, hybrid architectures, where some AI execution is offloaded to the cloud, will likely hide many of the limitations of native AI execution.
Fast AI PC Techniques
What optimization techniques will be needed to run an AI model natively on a GPU-less or low-end GPU system? This remains to be seen, since the state-of-the-art is not there yet.
One likely answer: multiple techniques. It's probably going to be a combination of multiple orthogonal inference optimization techniques. Models will need to be both smaller and faster.
To make the models smaller, some of the techniques for "model compression" include:
- Quantization (see the sketch after this list)
- Pruning
- Knowledge distillation
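As a concrete illustration of the first of these, here is a minimal sketch of 8-bit symmetric weight quantization in C++. The struct layout and per-tensor scaling scheme are illustrative assumptions, not any particular engine's format:

    // Minimal sketch of 8-bit symmetric weight quantization (one form of model compression):
    // store weights as int8 plus a per-tensor scale, reconstructing w ~= q * scale.
    #include <cstdint>
    #include <cmath>
    #include <vector>
    #include <algorithm>

    struct QuantizedTensor { std::vector<int8_t> q; float scale; };

    QuantizedTensor quantize_int8(const std::vector<float>& w) {
        float maxabs = 0.0f;
        for (float v : w) maxabs = std::max(maxabs, std::fabs(v));
        QuantizedTensor t;
        t.scale = (maxabs > 0.0f) ? maxabs / 127.0f : 1.0f;  // map [-maxabs, maxabs] onto [-127, 127]
        t.q.reserve(w.size());
        for (float v : w)
            t.q.push_back((int8_t)std::lround(v / t.scale)); // round to nearest int8
        return t;
    }

    float dequantize(const QuantizedTensor& t, size_t i) {
        return t.q[i] * t.scale;  // approximate original weight
    }

The memory win is the point: each weight shrinks from 4 bytes to 1, and integer arithmetic on the quantized values is typically faster on CPUs than floating-point.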
To make the inference algorithms run faster, there are various alternative strategies vying for attention in the research:
- Faster Transformer architectures
- Multi-axis dynamic pruning (e.g. combining depth pruning, width pruning, length pruning, etc.)
- Dynamic inference optimizations (e.g. loop optimizations, early-exit)
- Integer-only arithmetic models (e.g. integer-only quantization, approximation methods)
- Zero-multiplication algorithms (e.g. adder models, shift models, log models; see the shift-based sketch after this list)
- Faster attention algorithms (e.g. Flash attention, non-autoregression, and/or head pruning)
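To make the zero-multiplication idea concrete, below is a minimal C++ sketch of a "shift model" dot product, where weights are quantized to signed powers of two so that each multiplication becomes an integer bit-shift. The encoding (including using INT8_MIN to mark a zero weight) is an illustrative assumption, not a specific paper's scheme:

    // Sketch: dot product where weights are stored as signed power-of-2 exponents,
    // so each multiply becomes an integer bit-shift (a "shift model").
    #include <cstdint>
    #include <cmath>
    #include <vector>

    struct Pow2Weight { int8_t sign; int8_t exponent; };

    // Quantize a float weight to the nearest power of two.
    Pow2Weight quantize_pow2(float w) {
        Pow2Weight q;
        q.sign = (w < 0.0f) ? -1 : 1;
        float mag = std::fabs(w);
        q.exponent = (mag > 0.0f) ? (int8_t)std::lround(std::log2(mag))
                                  : INT8_MIN;  // INT8_MIN marks a zero weight
        return q;
    }

    // Integer dot product: shifts replace multiplications.
    int64_t shift_dot(const std::vector<int32_t>& x, const std::vector<Pow2Weight>& w) {
        int64_t acc = 0;
        for (size_t i = 0; i < x.size(); ++i) {
            if (w[i].exponent == INT8_MIN) continue;       // zero weight contributes nothing
            int64_t term = (w[i].exponent >= 0)
                ? ((int64_t)x[i] << w[i].exponent)         // multiply by 2^e
                : ((int64_t)x[i] >> -w[i].exponent);       // divide by 2^-e
            acc += (w[i].sign < 0) ? -term : term;
        }
        return acc;
    }

In a real engine the activations would also be quantized to integers; the point here is simply that the inner loop contains no multiply instructions at all.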
And orthogonal to these higher-level AI software methods, there will need to be underlying capabilities including:
- Hardware acceleration support (i.e. hardware-aware software optimizations; a vectorization sketch appears after this list)
- Deep learning compiler optimizations
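As an example of what "hardware-aware" means at the lowest level, here is a sketch of an AVX2/FMA vectorized dot product, the kind of kernel that underpins CPU-only inference (compare the SLIDE paper's use of AVX-512 below). This is a generic illustration, not any particular engine's kernel; compile with -mavx2 -mfma:

    // Hardware-aware optimization sketch: an AVX2/FMA vectorized dot product,
    // processing 8 floats per instruction instead of one.
    #include <immintrin.h>
    #include <cstddef>

    float dot_avx2(const float* a, const float* b, size_t n) {
        __m256 acc = _mm256_setzero_ps();
        size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            acc = _mm256_fmadd_ps(va, vb, acc);  // acc += va * vb (fused multiply-add)
        }
        float buf[8];
        _mm256_storeu_ps(buf, acc);
        float sum = buf[0] + buf[1] + buf[2] + buf[3]
                  + buf[4] + buf[5] + buf[6] + buf[7];
        for (; i < n; ++i) sum += a[i] * b[i];   // scalar tail for leftover elements
        return sum;
    }

A deep learning compiler would generate code like this automatically, fused across whole layers; writing it by hand is what "hardware-aware software optimization" looks like in practice.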
And floating above all that are some top-level performance considerations:
- Hybrid multi-AI synchronization methods (i.e., ensemble methods, big-little, swarm/multi-mini-model, etc.; a big-little sketch follows this list)
- AI-aware heuristic methods
- Use-case-specific optimizations (e.g. document summarization versus search versus chatbot question-and-answer)
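A hypothetical sketch of the "big-little" pattern mentioned above: try a small on-device model first, and escalate to a larger cloud model only when the small one isn't confident. The Reply struct, the callback types, and the 0.8 threshold are all illustrative assumptions:

    // Big-little hybrid sketch: a fast local model answers when confident;
    // otherwise the query is escalated to a bigger cloud-hosted model.
    #include <string>
    #include <functional>

    struct Reply { std::string text; float confidence; };

    Reply answer_hybrid(const std::string& prompt,
                        const std::function<Reply(const std::string&)>& little_local,
                        const std::function<Reply(const std::string&)>& big_cloud,
                        float threshold = 0.8f) {
        Reply r = little_local(prompt);          // cheap on-device attempt first
        if (r.confidence >= threshold) return r; // good enough: no network round-trip
        return big_cloud(prompt);                // fall back to the larger model
    }

The design choice is latency versus quality: easy queries never leave the PC, while hard ones pay the cloud round-trip for a better answer.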
Putting all of that together looks like some kind of fun. Nobody's there yet. It's far from clear which is the best combination of techniques.
Articles and Announcements for AI PCs
Various PR and press articles have started pushing "AI PCs" as a new segment.
- Michael Kan, July 2023, Intel CEO: Get Ready for the 'AI PC', PCMag UK, https://uk.pcmag.com/laptops/147984/intel-ceo-get-ready-for-the-ai-pc
- Intel, May 23, 2023, AI Coming to the PC at Scale, https://www.intel.com/content/www/us/en/newsroom/news/ai-coming-to-pc-at-scale.html
- Simon Sharwood, May 2023, Intel says AI is overwhelming CPUs, GPUs, even clouds – so all Meteor Lakes get a VPU, The Register, https://www.theregister.com/2023/05/29/vpus_all_meteork_lake_skus/
- David Meyer, August 31, 2023, A.I. and big market shifts are making PCs interesting again, Fortune, https://fortune.com/2023/08/30/ai-pc-idc-demand-growth-windows-10/
- Julie Coleman, May 30, 2023, HP Inc. CEO says A.I. will enable a new kind of PC, which could release in 2024, Mad Money with Jim Cramer, CNBC, https://www.cnbc.com/2023/05/30/hp-inc-ceo-ai-will-enable-a-new-kind-of-pc-could-launch-in-2024.html
- Mark Hachman, Jan 9th, 2023, Intel and AMD are building AI into PCs. It doesn’t matter yet—but it will, PC World, https://www.pcworld.com/article/1447856/ai-pcs-should-be-the-trend-that-begins-in-2023.html
- Mark Hachman, Sep 8th, 2022, Intel’s futuristic Meteor Lake CPUs will focus on ‘core AI capabilities’, PC World, https://www.pcworld.com/article/1076150/intel-confirms-ai-improvements-will-come-in-meteor-lake.html
- Jesse Clayton, May 23, 2023, NVIDIA and Microsoft Drive Innovation for Windows PCs in New Era of Generative AI, NVIDIA Blog, https://blogs.nvidia.com/blog/2023/05/23/microsoft-build-nvidia-ai-windows-rtx/
- Simon Sharwood, Sep 2023, Desktop AI isn’t happening, says AMD, and might not for quite a while, The Register, https://www.theregister.com/2023/09/19/amd_desktop_ai_futures/
- Darren Allan, Sep 27, 2023, If you wanted an Intel Meteor Lake CPU for your next desktop PC, we’ve got some bad news, TechRadar, https://www.msn.com/en-us/news/technology/if-you-wanted-an-intel-meteor-lake-cpu-for-your-next-desktop-pc-we-ve-got-some-bad-news/ar-AA1hkJMQ
- IDC, 28 Aug 2023, Global PC Shipments Expected to Return to Growth in 2024 Albeit Below 2019 Pre-Pandemic Levels, According to IDC, https://www.idc.com/getdoc.jsp?containerId=prUS51184723
- Gartner, July 11, 2023, Gartner Says Worldwide PC Shipments Declined 16.6% in Second Quarter of 2023, https://www.gartner.com/en/newsroom/press-releases/2023-07-11-gartner-says-worldwide--pc-shipments-declined-16-percent-in-second-quarter-of-2023
- Christian Guyton, John Loeffler, October 20, 2022, Intel Core i9-13900K review: the most powerful consumer processor ever, TechRadar, https://www.techradar.com/reviews/intel-core-i9-13900k (Intel Raptor Lake CPUs.)
- Anton Shilov, April 11, 2021, New Algorithm Makes CPUs 15 Times Faster Than GPUs in Some AI Work, Tom's Hardware, https://www.tomshardware.com/news/cpu-vs-gpu-ai-performance-uplift-with-optimizations
- Shabnam Daghaghi, Nicholas Meisburger, Mengnan Zhao, Yong Wu, Sameh Gobriel, Charlie Tai, Anshumali Shrivastava, Mar 2021, Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More, https://arxiv.org/abs/2103.10891, Code: https://github.com/RUSH-LAB/SLIDE (Fast training on CPUs using AVX-512 and locality-sensitive hashing of vectors.)
- PyTorch Edge Team, October 17, 2023, PyTorch Edge: Enabling On-Device Inference Across Mobile and Edge Devices with ExecuTorch, https://pytorch.org/blog/pytorch-edge/
- Andy Patrizio, 12 Apr 2024, The desktop processor market is suddenly hot again, https://www.computerworld.com/article/2086948/desktop-processor-market-suddenly-hot-again.html
- David Linthicum, Jan 16, 2024, Do you need GPUs for generative AI systems? InfoWorld, https://www.infoworld.com/article/3712134/do-you-need-gpus-for-generative-ai-systems.html
Research on PC Execution of LLMs
Desktop PCs are considered to be "edge" platforms in the AI literature (along with phones and IoT devices). Research papers specifically on PC execution of AI models:
- Huma Abidi, Chandan Damannagari, "AI inference acceleration on CPUs", Intel/VentureBeat, December 9, 2021, https://venturebeat.com/ai/ai-inference-acceleration-on-cpus/.
- Simon Willison, "Running LLaMA 7B and 13B on a 64GB M2 MacBook Pro with llama.cpp", TILs, March 2023, https://github.com/simonw/til/blob/main/llms/llama-7b-m2.md
- Umang Sharan, Running Llama on M2 Macbook, March 15, 2023, https://www.umangsh.com/blog/running-llama-on-m2-macbook/
- Katyanna Quach, "Small custom AI models are cheap to train and can keep data private, says startup", The Register, 22 June 2023, https://www.theregister.com/2023/06/22/small_custom_ai_models/
- Julien Simon, "Smaller is better: Q8-Chat, an efficient generative AI experience on Xeon", May 16th 2023, https://huggingface.co/blog/generative-ai-models-on-intel-cpu
- Chellammal Surianarayanan, John Jeyasekaran Lawrence, Pethuru Raj Chelliah, Edmond Prakash, Chaminda Hewage, "A Survey on Optimization Techniques for Edge Artificial Intelligence (AI)", Sensors, vol. 23, no. 3, article 1279, January 2023, https://www.mdpi.com/1424-8220/23/3/1279
- Jarred Walton, "How to Run a ChatGPT Alternative on Your Local PC", March 19th, 2023, Tom's Hardware, https://www.tomshardware.com/news/running-your-own-chatbot-on-a-single-gpu
- V. Vanhoucke, A. Senior, and M. Z. Mao, Improving the speed of neural networks on CPUs, In Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop, volume 1, page 4, 2011, https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.308.2766
- Dave Dice, Alex Kogan, Optimizing Inference Performance of Transformers on CPUs, Feb 2021, https://arxiv.org/abs/2102.06621
- Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim M. Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, and Peizhao Zhang. Machine Learning at Facebook: Understanding Inference at the Edge. In IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 331–344, 2019, https://research.facebook.com/publications/machine-learning-at-facebook-understanding-inference-at-the-edge/
- Morgan Funtowicz, Scaling up BERT-like model Inference on modern CPU - Part 1, April 2021, https://huggingface.co/blog/bert-cpu-scaling-part-1
- Shufan Wu, Tao Lv, Pengxin Yuan, Patric Zhao, Jason Ye, and Haibin Lin, Optimization for BERT Inference Performance on CPU, Sep 2019, https://medium.com/apache-mxnet/optimization-for-bert-inference-performance-on-cpu-3bb2413d376c
- Emma Ning, Nathan Yan, Jeffrey Zhu, and Jason Li. Microsoft open sources breakthrough optimizations for transformer inference on GPU and CPU, Jan 2020, https://cloudblogs.microsoft.com/opensource/2020/01/21/microsoft-onnx-open-source-optimizations-transformer-inference-gpu-cpu/
- Jiarui Fang, Yang Yu, Chengduo Zhao, and Jie Zhou, Turbotransformers: An efficient GPU serving system for transformer models, CoRR, abs/2010.05680, 2020, https://arxiv.org/abs/2010.05680
- Y. Wang, Q. Wang, and X. Chu, Energy-efficient Inference Service of Transformer-based Deep Learning Models on GPUs, In IEEE Conferences on Green Computing and Communications (GreenCom), pages 323–331, 2020, https://ieeexplore.ieee.org/document/9291633
- Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, and Yida Wang, Optimizing CNN Model Inference on CPUs, In Proc. of USENIX Annual Technical Conference (ATC), pages 1025–1040, 2019, https://arxiv.org/abs/1809.02697
- Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu, Sep 2023, TinyLlama, Code: https://github.com/jzhang38/TinyLlama (Apache-licensed 1.1B "tiny" Llama model trained on 3T tokens.)
- Lightning AI, 2023, Lit-GPT, https://github.com/Lightning-AI/lit-gpt (Apache licensed model for low-capacity requirements.)
- Md. Maruf Hossain Shuvo, Syed Kamrul Islam, Jianlin Cheng, Bashir I. Morshed, "Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review", Proceedings of the IEEE, vol.111, no.1, pp.42-91, 2023. https://ieeexplore.ieee.org/document/9985008, PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9985008
- E Kristiani, CT Yang, KLP Nguyen, 2020, Optimization of deep learning inference on edge devices, 2020 International Conference on Pervasive Artificial Intelligence, https://ieeexplore.ieee.org/abstract/document/9302695
- Jonas Geiping, Tom Goldstein, Dec 2022, Cramming: Training a Language Model on a Single GPU in One Day, https://arxiv.org/abs/2212.14034 Code: https://github.com/JonasGeiping/cramming (Note: uses Pytorch nvFuser deep learning compiler, which seems to be deprecated now.)
- Benj Edwards, March 14, 2023, You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi, Ars Technica, https://arstechnica.com/information-technology/2023/03/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi/
- Benj Edwards, Sep 28, 2023, Jony Ive and OpenAI’s Altman reportedly collaborating on mysterious AI device, Ars Technica, https://arstechnica.com/information-technology/2023/09/jony-ive-and-openais-altman-reportedly-collaborating-on-mysterious-ai-device/
- Oleksandr Kuvshynov, Oct 2023, Slow LLama, Code: https://github.com/okuvshynov/slowllama ("Fine-tune Llama2 and CodeLLama models, including 70B/35B on Apple M1/M2 devices")
- Benjamin Marie, Sep 29, 2023, Run Llama 2 70B on Your GPU with ExLlamaV2, Towards Data Science, https://towardsdatascience.com/run-llama-2-70b-on-your-gpu-with-exllamav2-588141a88598
- Computer World, 29 May 2024, In two years, 100% of enterprise PC purchases will be AI computers, https://www.computerworld.com/article/2130275/in-two-years-100-of-enterprise-pc-purchases-will-be-ai-computers.html
- Dell Technologies, May 20, 2024, Dell Technologies Expands Dell AI Factory with NVIDIA to Turbocharge AI Adoption, PR Newswire, https://www.prnewswire.com/news-releases/dell-technologies-expands-dell-ai-factory-with-nvidia-to-turbocharge-ai-adoption-302150245.html
- Djip007, May 2024, llamafile 0.8.6 CPU benchmark #450, https://github.com/Mozilla-Ocho/llamafile/discussions/450 (Running llamafile at 20 tokens per second on a non-GPU commodity CPU.)
- Ken Yeung, May 21, 2024, Microsoft introduces Phi-Silica, a 3.3B parameter model made for Copilot+ PC NPUs, https://venturebeat.com/ai/microsoft-introduces-phi-silica-a-3-3b-parameter-model-made-for-copilot-pc-npus/
- J Cañete, F Bravo-Marquez, 2024, Speedy Gonzales: A Collection of Fast Task-Specific Models for Spanish, https://felipebravom.com/publications/starsem2024.pdf (Optimizing small models on CPU and GPU for the Spanish language, mostly using distillation.)
- Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari, 22 Apr 2024, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, Apple Research, https://arxiv.org/abs/2404.14619 Code: https://huggingface.co/apple/OpenELM
- Martin Thissen, April 20, 2024, Llama 3 on Your Local Computer | Free GPT-4 Alternative, https://medium.com/@martin-thissen/llama-3-on-your-local-computer-free-gpt-4-alternative-1f533e9abff7 (Llama3-70B with 4-bit quantization using vLLM for inference on NVIDIA RTX 6000 Ada GPU.)
- Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Yui Li, Wen-Huang Cheng, 8 Apr 2024, Lightweight Deep Learning for Resource-Constrained Environments: A Survey, https://arxiv.org/abs/2404.07236 (A survey of various optimizations, with a lot of focus on image and vision models, including CNNs, RNNs, and Transformers.)
- Intel, April 2024, Intel® Compiler First to Achieve SYCL* 2020 Conformance, https://www.intel.com/content/www/us/en/developer/articles/technical/compiler-first-full-sycl2020-conformance.html
- Kif Leswing, April 9, 2024, Intel unveils latest AI chip as Nvidia competition heats up, CNBC, https://www.cnbc.com/2024/04/09/intel-unveils-gaudi-3-ai-chip-as-nvidia-competition-heats-up-.html (Intel Gaudi 3 chip for high-end datacenter usage, competing with NVIDIA H100.)
- Siddhant Sahu, May 30, 2024, Beyond the Cloud: Distributed AI and On-Device Intelligence: Transition of AI workflows from cloud to the edge with specialized chip infrastructure & models, multi-modality and ambience across devices, https://sidstage.substack.com/p/beyond-the-cloud-distributed-ai-and
- Qualcomm, May 2023, The future of AI is hybrid, Qualcomm White Paper, https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets/documents/Whitepaper-The-future-of-AI-is-hybrid-Part-1-Unlocking-the-generative-AI-future-with-on-device-and-hybrid-AI.pdf
- David Spuler, Mar 30, 2024, Generative AI in C++: Coding Transformers and LLMs, Yoryck AI, https://www.amazon.com/Generative-AI-Coding-Transformers-LLMs-ebook/dp/B0CXJKCWX9/
- Jaskirat Singh, Bram Adams, Ahmed E. Hassan, 25 Mar 2024, On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance, https://arxiv.org/abs/2403.17154 (MLOps deployment for quantization, partitioning and early-exit across mobile, edge, and cloud platforms, including running early exit on mobile.)
- Sergio De Simone, Apple Extends Core ML, Create ML, and Vision Frameworks for iOS 17, JUL 03, 2023, https://www.infoq.com/news/2023/07/coreml-createml-vision-ios-17/
- Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen, Dec 2023, PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU, https://arxiv.org/abs/2312.12456 Code: https://github.com/SJTU-IPADS/PowerInfer (A GPU-CPU hybrid engine with the most "active" neurons run on the GPU and the less "hot" neurons on the CPU, which is akin to adaptive inference on the width dimension.)
- Haihao Shen, Hanwen Chang, Bo Dong, Yu Luo, Hengyu Meng, Dec 2023, Efficient LLM Inference on CPUs, Intel, NeurIPS 2023, https://arxiv.org/abs/2311.00502 Code: https://github.com/intel/intel-extension-for-transformers
- Tom Warren, April 9, 2024, Microsoft is confident Windows on Arm could finally beat Apple, The Verge, https://www.theverge.com/2024/4/8/24116587/microsoft-macbook-air-surface-arm-qualcomm-snapdragon-x-elite
- Steve Dent, Thu, Mar 28, 2024, Microsoft Copilot AI will soon run locally on PCs, https://www.engadget.com/microsoft-copilot-ai-will-soon-run-locally-on-pcs-130642514.html
- AMD AI Staff, How to run a Large Language Model (LLM) on your AMD Ryzen™ AI PC or Radeon Graphics Card, March 2024, AMD Blog, https://community.amd.com/t5/ai/how-to-run-a-large-language-model-llm-on-your-amd-ryzen-ai-pc-or/ba-p/670709
- Ramine Roane, 6 Dec, 2023, Enabling AI PCs with Ryzen AI Software, AMD Blog, https://community.amd.com/t5/ai/enabling-ai-pcs-with-ryzen-ai-software/ba-p/648665
- Lucas Mearian, 21 Mar 2024, Microsoft integrates its Copilot chatbot on new devices https://www.computerworld.com/article/2071480/microsoft-integrates-its-copilot-chatbot-across-entire-product-line.html (New Surface laptops with support for ChatGPT-based Copilot.)
- Sharon Machlis, March 28, 2024, 5 easy ways to run an LLM locally, InfoWorld, https://www.infoworld.com/article/3705035/5-easy-ways-to-run-an-llm-locally.html
- Venkatraman Iyer, Sungho Lee, Semun Lee, Juitem Joonwoo Kim, Hyunjun Kim, Youngjae Shin, 12 December 2023, Automated Backend Allocation for Multi-Model, On-Device AI Inference, Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 7, Issue 3, Article No.: 62, pp 1–33, https://doi.org/10.1145/3626793 https://dl.acm.org/doi/abs/10.1145/3626793
- Jeff Butts, Feb 16th, 2023, What Is the Apple Neural Engine and What Does It Do? https://www.macobserver.com/tips/deep-dive/what-is-apple-neural-engine/
- Semaphore, Dec 14, 2023, 6 Ways to Run LLMs Locally, https://semaphoreci.medium.com/6-ways-to-run-llms-locally-fa25be0797e5 (The six ways are HF Transformers, LangChain, Llama.cpp, Llamafile, Ollama, and GPT4All.)
- Benj Edwards, 2/22/2024, Google goes “open AI” with Gemma, a free, open-weights chatbot family, Gemma chatbots can run locally, and they reportedly outperform Meta's Llama 2. Ars Technica, https://arstechnica.com/information-technology/2024/02/google-goes-open-ai-with-gemma-a-free-open-weights-chatbot-family/
- Murray Kornelsen, April 2023, Low-Latency BERT Inference for Heterogeneous Multi-Processor Edge Devices, Department of Electrical & Computer Engineering, McGill University, Canada https://escholarship.mcgill.ca/downloads/m326m732p
- Dell is refreshing its popular XPS laptop line with all the AI features (and they still look good). https://www.zdnet.com/article/dell-is-refreshing-its-popular-xps-laptop-line-with-all-the-ai-features-and-they-still-look-good/
- Gavin Li, Nov 19, 2023, Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique, AI Advances https://ai.gopubby.com/unbelievable-run-70b-llm-inference-on-a-single-4gb-gpu-with-this-new-technique-93e2057c7eeb
- Paul Thurrott, October 5, 2023, HP: AI Will Transform the PC Into a Personal Companion, https://www.thurrott.com/hardware/290462/hp-ai-will-transform-the-pc-into-a-personal-companion
- Jesse Clayton, Kedar Potdar and Annamalai Chockalingam, Jun 02, 2024, Streamline Development of AI-Powered Apps with NVIDIA RTX AI Toolkit for Windows RTX PCs, NVIDIA Technical Blog, https://developer.nvidia.com/blog/streamline-ai-powered-app-development-with-nvidia-rtx-ai-toolkit-for-windows-rtx-pcs/
- MWU Rahman, MM Abrar, HG Copening, S Hariri, Oct 2023, Quantized Transformer Language Model Implementations on Edge Devices, https://arxiv.org/pdf/2310.03971.pdf (Uses a "FlatBuffer" format on TensorFlow-Lite.)
- H Dai, X Peng, X Shi, L He, Q Xiong, H Jin, 2022, Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment, Science China Information Sciences volume 65, Article number: 112103 (2022), https://link.springer.com/article/10.1007/s11432-020-3182-1 http://scis.scichina.com/en/2022/112103.pdf
- Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu, 14 Jun 2024, GEB-1.3B: Open Lightweight Large Language Model, https://arxiv.org/abs/2406.09900 Code: https://huggingface.co/GEB-AGI/geb-1.3b
- David Spuler, March 2024, Chapter 4. AI on Your Desktop, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Intel, Apr 25, 2024, Deployment of Llama3 on Your AI PC with OpenVINO™, https://medium.com/openvino-toolkit/deployment-of-llama3-on-your-ai-pc-with-openvino-b58e961501d6
- Matthew Finnegan, 14 Jun 2024, Microsoft delays Recall launch amid privacy concerns, ComputerWorld, https://www.computerworld.com/article/2147736/microsoft-delays-recall-launch-amid-privacy-concerns.html
- Steve Kovach, June 19 2024, Microsoft AI PCs take aim at Apple: CNBC’s Steve Kovach reports on news from Microsoft, CNBC, https://www.cnbc.com/video/2024/06/18/microsoft-ai-pcs-aim-at-apple.html
- Aniket Hingane, May 23, 2024, A New AI Era in PC Begins : AI Agent Computers, https://ai.plainenglish.io/a-new-ai-era-in-pc-begins-ai-agent-computers-d6210a8f1b48
- Esther Shein Jul 9 2024, Anticipating the Year of the AI PC, https://cacm.acm.org/news/anticipating-the-year-of-the-ai-pc/
- Dmitriy Pastushenkov, Ria Cheruvu, Max Domeika, Paula Ramos, Apr 20, 2024, AI is coming to the PC — AI PC Essentials, https://medium.com/openvino-toolkit/ai-is-coming-to-the-pc-ai-pc-essentials-ba2aa8686a59
- Jason Perlow, Aug. 6, 2024, How to run dozens of AI models on your Mac or PC - no third-party cloud needed, https://www.zdnet.com/article/how-to-run-dozens-of-ai-models-on-your-mac-or-pc-no-third-party-cloud-needed/
- Gavin Li, August 3rd, 2024, Crazy Challenge: Run Llama 405B on a 8GB VRAM GPU, https://ai.gopubby.com/crazy-challenge-run-llama-405b-on-a-8gb-vram-gpu-ab5a280a3889 (Run Llama's 405B model on a low-end GPU via 4-bit quantization and layer-by-layer inference, both to save memory.)
- Vince Lam, Mar 12, 2024, 50+ Open-Source Options for Running LLMs Locally, https://medium.com/thedeephub/50-open-source-options-for-running-llms-locally-db1ec6f5a54f
- Sujeet Kumar, May 20, 2024, 14 Best Software for Running local LLM, https://scifilogic.com/interface-for-running-local-llm/
- Sean Hollister, Sep 4, 2024, Intel reveals first Lunar Lake laptop CPUs: everything you need to know, https://www.theverge.com/2024/9/3/24233957/intel-lunar-lake-core-ultra-200v-launch
- Michael Nuñez, September 13, 2024, Microsoft’s Windows Agent Arena: Teaching AI assistants to navigate your PC, https://venturebeat.com/ai/microsofts-windows-agent-arena-teaching-ai-assistants-to-navigate-your-pc/
- Steve Kovach, Sep 5 2024, AI gadgets have been a bust so far. Apple aims to change that, https://www.cnbc.com/2024/09/05/ai-gadgets-have-been-a-bust-so-far-apple-aims-to-change-that.html
- Amos Gyamfi, Aug 28, 2024, The 6 Best LLM Tools To Run Models Locally, https://medium.com/@amosgyamfi/the-6-best-llm-tools-to-run-models-locally-eedd0f7c2bbd
- Michael Nuñez, October 16, 2024, Mistral AI’s new language models bring AI power to your phone and laptop, https://venturebeat.com/business/mistral-ai-new-language-models-bring-ai-power-to-your-phone-and-laptop/
- OpenVINO™ toolkit, Oct 1, 2024, How to run Llama 3.2 locally with OpenVINO™, https://medium.com/openvino-toolkit/how-to-run-llama-3-2-locally-with-openvino-60a0f3674549
- Lucas Mearian, 24 Oct 2024, 2025: The year of the AI PC, Computer World, https://www.computerworld.com/article/3583355/2025-the-year-of-the-ai-pc.html
- Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
- Chris Wellons, November 10, 2024, Everything I've learned so far about running local LLMs, https://nullprogram.com/blog/2024/11/10/
- Justine, Apr 2023, Edge AI Just Got Faster, https://justine.lol/mmap/ (Loading models using mmap.)
- Emilia David, November 14, 2024, OpenAI launches ChatGPT desktop integrations, rivaling Copilot, https://venturebeat.com/ai/openai-launches-chatgpt-desktop-integrations-rivaling-copilot/
- Simon Willison, Dec 2024, I can now run a GPT-4 class model on my laptop. Meta’s new Llama 3.3 70B is a genuinely GPT-4 class Large Language Model that runs on my laptop. https://simonwillison.net/2024/Dec/9/llama-33-70b/
On-Device Inference
For more about on-device inference on PCs and phones, see on-device inference research.
More AI Research
Read more about:
- GenAI market research
- AI on Phones
- Inference Optimizations
- Loop Optimizations
- Code Optimizations
- « Research Home