Aussie AI
Edge Computing
Last Updated 8 December, 2024
by David Spuler, Ph.D.
Edge Computing is the name researchers use for running computations on various low-resource devices. The devices on the "edge" are "close" to the user, but "far away" from the bigger servers in the cloud. For AI, the goal is to run machine learning code on these smaller devices. Examples of such edge devices include:
- Smartphones (see AI Smartphones)
- Desktops and laptops
- Cars (e.g. autonomous self-driving cars)
- Video cameras (e.g. security cameras)
- Internet of Things (IoT) devices (e.g. industrial devices, refrigerators, network stations, etc.)
Running AI models on edge devices usually means inference only, because the small devices usually cannot support the cost of training in terms of processing power and/or storage. However, there is some research into "on-device training."
Many architectures that use edge computing involve multiple machines: at a minimum, the edge device and a main server. Hence, much of the research into ensemble methods such as distributed inference is also relevant.
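As a rough illustration of the edge-plus-server arrangement, here is a minimal sketch of partitioned (split) inference, where the edge device runs the first few layers and the server runs the rest. The toy model, its weights, and the function names `edge_part`, `server_part`, and `split_inference` are all hypothetical, and the network transfer is only simulated by a function call.

```python
# Toy sketch of split inference between an edge device and a server.
# The model, weights, and function names are hypothetical illustrations.

def dense_relu(x, weights, bias):
    """Apply one fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def run_layers(x, layers):
    """Run the input through a sequence of (weights, bias) layers."""
    for weights, bias in layers:
        x = dense_relu(x, weights, bias)
    return x

# A tiny three-layer model: 2 inputs -> 2 -> 2 -> 1 output.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[2.0, 0.0], [0.0, 1.0]], [0.0, -1.0]),
    ([[1.0, 1.0]], [0.5]),
]

def edge_part(x, split):
    """Layers before the split point run on the edge device."""
    return run_layers(x, LAYERS[:split])

def server_part(activation, split):
    """Layers from the split point onward run on the server."""
    return run_layers(activation, LAYERS[split:])

def split_inference(x, split):
    # In a real deployment the intermediate activation would be
    # serialized and sent over the network; here it is passed directly.
    activation = edge_part(x, split)
    return server_part(activation, split)

if __name__ == "__main__":
    for split in range(len(LAYERS) + 1):
        print(split, split_inference([3.0, 1.0], split))
```

The split point trades on-device compute against the size of the activation that must be transmitted; whichever split is chosen, the result should match running the whole model in one place.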
Survey Papers on Edge Computing
- Praveen Joshi, Mohammed Hasanuzzaman, Chandra Thapa, Haithem Afli, Ted Scully, "Enabling All In-Edge Deep Learning: A Literature Review", IEEE Access, vol.11, pp.3431-3460, 2023. https://ieeexplore.ieee.org/document/10007810, https://arxiv.org/abs/2204.03326 (Extensive survey of edge computing, including deployment architectures and optimizations.)
- Kah Phooi Seng, Li-Minn Ang, "Embedded Intelligence: State-of-the-Art and Research Challenges", IEEE Access, vol.10, pp.59236-59258, 2022. https://ieeexplore.ieee.org/document/9775683, PDF: https://research.usc.edu.au/esploro/outputs/99640278002621
- X Wang, J Li, Z Ning, Q Song, L Guo, S Guo, July 2023, Wireless powered mobile edge computing networks: A survey, ACM Computing Surveys, Volume 55, Issue 13s, Article No. 263, pp 1–37, https://dl.acm.org/doi/abs/10.1145/3579992 PDF: http://101.43.59.126/static/53.Wireless_Powered_Mobile_Edge_Vomputing_Networks_A_Survey.pdf
- H Hua, Y Li, T Wang, N Dong, W Li, J Cao, 2023, Edge computing with artificial intelligence: A machine learning perspective, ACM Computing Surveys, https://dl.acm.org/doi/abs/10.1145/3555802 PDF: https://dl.acm.org/doi/pdf/10.1145/3555802
- HJ Damsgaard, A Ometov, J Nurmi, 2023, Approximation Opportunities in Edge Computing Hardware: A Systematic Literature Review, ACM Computing Surveys, https://dl.acm.org/doi/abs/10.1145/3572772, PDF: https://dl.acm.org/doi/pdf/10.1145/3572772
- Tian Wang, Yuzhu Liang, Xuewei Shen, Xi Zheng, Adnan Mahmood, Quan Z. Sheng, 2023, Edge Computing and Sensor-Cloud: Overview, Solutions, and Directions, ACM Computing Surveys, Volume 55, Issue 13s, Article No.: 281, pp 1–37, https://dl.acm.org/doi/abs/10.1145/3582270, PDF: http://web.science.mq.edu.au/~qsheng/papers/CSUR-edge.pdf
- Y Mao, C You, J Zhang, K Huang, 2017, Mobile edge computing: Survey and research outlook, PDF: https://www.researchgate.net/profile/Changsheng-You/publication/312061424_Mobile_Edge_Computing_Survey_and_Research_Outlook/links/5c22f648a6fdccfc70690a30/Mobile-Edge-Computing-Survey-and-Research-Outlook.pdf
- Wazir Zada Khan, Ejaz Ahmed, Saqib Hakak, Ibrar Yaqoob, Arif Ahmed, 2019, Edge computing: A survey, Future Generation Computer Systems, Volume 97, August 2019, Pages 219-235, https://www.sciencedirect.com/science/article/abs/pii/S0167739X18319903, PDF: https://www.researchgate.net/profile/Ibrar_Yaqoob/publication/331362529_Edge_computing_A_survey/links/5ca33dcca6fdcc12ee8c3a2a/Edge-computing-A-survey.pdf
- Nasir Abbas; Yan Zhang; Amir Taherkordi; Tor Skeie, 2018, Mobile edge computing: A survey, IEEE Internet of Things Journal, Volume 5, Issue 1, February 2018, https://ieeexplore.ieee.org/document/8030322, https://www.duo.uio.no/bitstream/handle/10852/65081/Nasir_Abbas_Thesis.pdf?sequence=1
- Yuyi Mao; Changsheng You; Jun Zhang; Kaibin Huang; Khaled B. Letaief, 2017, A survey on mobile edge computing: The communication perspective, IEEE Communications Surveys & Tutorials, Volume 19, Issue 4, Fourthquarter 2017, https://ieeexplore.ieee.org/document/8016573, PDF: https://arxiv.org/pdf/1701.01090
- Fang Liu, Guoming Tang, Youhuizi Li, Zhiping Cai, Xingzhou Zhang, Tongqing Zhou, 2019, A survey on edge computing systems and tools, Proceedings of the IEEE, Volume 107, Issue 8, August 2019, https://ieeexplore.ieee.org/abstract/document/8746691/, https://arxiv.org/pdf/1911.02794 (Includes a survey of open source edge computing projects and edge ML frameworks in 2019.)
- Xiaofei Wang, Yiwen Han, Victor C.M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen, 2020, Convergence of Edge Computing and Deep Learning: A Comprehensive Survey, IEEE Communications Surveys & Tutorials, Volume: 22, Issue: 2, Secondquarter 2020, https://ieeexplore.ieee.org/abstract/document/8976180/, https://arxiv.org/abs/1907.08349
- Keyan Cao; Yefan Liu; Gongjie Meng; Qimeng Sun, 2020, An Overview on Edge Computing Research, IEEE Access, Volume 8, https://ieeexplore.ieee.org/document/9083958, PDF: https://ieeexplore.ieee.org/iel7/6287639/6514899/09083958.pdf (General survey of edge computing, not specific to ML.)
- Blesson Varghese, Nan Wang, Sakil Barbhuiya, Peter Kilpatrick, Dimitrios S. Nikolopoulos, 2016, Challenges and opportunities in edge computing, 2016 IEEE International Conference on Smart Cloud (SmartCloud), https://ieeexplore.ieee.org/abstract/document/7796149/, https://arxiv.org/pdf/1609.01967 (General edge computing theory, not specific to ML.)
- J Chen, X Ran, 2019, Deep learning with edge computing: A review, Proceedings of the IEEE, Volume 107, Issue 8, August 2019, https://ieeexplore.ieee.org/abstract/document/8763885/, PDF: https://ieeexplore.ieee.org/ielaam/5/8789751/8763885-aam.pdf
- A Bourechak, O Zedadra, MN Kouahla, A Guerrieri, 2023, At the Confluence of Artificial Intelligence and Edge Computing in IoT-Based Applications: A Review and New Perspectives, Sensors, https://www.mdpi.com/1424-8220/23/3/1639, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9920982/
- Linghe Kong, Jinlin Tan, Junqin Huang, Guihai Chen, Shuaitian Wang, Xi Jin, Peng Zeng, Muhammad Khan, Sajal K. Das, 2022, Edge-computing-driven internet of things: A survey, ACM Computing Surveys, Volume 55, Issue 8, Article No. 174, pp 1–41, https://dl.acm.org/doi/abs/10.1145/3555308, PDF: https://huangjunqin.com/papers/KongCSUR2022Edge.pdf
- M Lee, S Lee, T Kim, 2023, Performance Evaluation of Efficient Vision Transformers on Embedded Edge Platforms, IEMEK Journal of Embedded Systems and Applications, https://koreascience.kr/article/JAKO202325643250869.page, PDF: https://koreascience.kr/article/JAKO202325643250869.pdf (Abstract in English, paper in Korean.)
- Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
- Othmane Friha, Mohamed Amine Ferrag, Burak Kantarci, Burak Cakmak, Arda Ozgun, Nassira Ghoualmi-Zine, 2024, LLM-based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness, PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10669603
- Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
- Kailai Sun, Xinwei Wang, Xi Miao, Qianchuan Zhao, Oct 2024, A review of AI edge devices and lightweight CNN and LLM deployment, Neurocomputing, Volume 614, 2025, 128791, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2024.128791 https://www.sciencedirect.com/science/article/abs/pii/S0925231224015625
Research on Edge Computing
There are plenty of papers on edge computing to choose from:
- Jonas Geiping, Tom Goldstein, Dec 2022, Cramming: Training a Language Model on a Single GPU in One Day, https://arxiv.org/abs/2212.14034 Code: https://github.com/JonasGeiping/cramming (Note: uses the PyTorch nvFuser deep learning compiler, which seems to be deprecated now.)
- Benj Edwards, March 14, 2023, You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi, Ars Technica, https://arstechnica.com/information-technology/2023/03/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi/
- E. Li, L. Zeng, Z. Zhou, and X. Chen, “Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing,” IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 447–457, Jan. 2020. doi:10.1109/TWC.2019.2946140, https://arxiv.org/abs/1910.05316
- Manuele Rusci, Marco Fariselli, Alessandro Capotondi, and Luca Benini. Leveraging automated mixed-low-precision quantization for tiny edge microcontrollers. In IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning, pages 296–308. Springer, 2020, https://arxiv.org/abs/2008.05124
- Tao Ge, Si-Qing Chen, and Furu Wei. 2022. EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10786–10798, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics, https://arxiv.org/abs/2202.07959
- Chinnadhurai Sankar, Sujith Ravi, and Zornitsa Kozareva. 2021. ProFormer: Towards On-Device LSH Projection Based Transformers. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2823–2828, Online. Association for Computational Linguistics. https://arxiv.org/abs/2004.05801
- F Manca, F Ratto, 2023, ONNX-to-Hardware Design Flow for the Generation of Adaptive Neural-Network Accelerators on FPGAs, arXiv preprint arXiv:2309.13321, https://arxiv.org/pdf/2309.13321.pdf (Approximation techniques applied to edge computing.)
- Pierre-Emmanuel Novac, March 2023, MicroAI: Embedded Artificial Intelligence for Human Activity Recognition on Smart Glasses, Ph.D. Thesis, Artificial Intelligence. Université Côte d’Azur, https://theses.hal.science/tel-04049008/document (Quantization in smart glasses device.)
- R Snytsar, Oct 2023, Accelerating Machine Learning Primitives on Commodity Hardware, arXiv preprint arXiv:2310.05218, https://arxiv.org/pdf/2310.05218.pdf (Uses the "sliding window" technique to optimize general matrix multiplication on edge devices.)
- GY Lee, T Dam, MM Ferdaus, DP Poenar, VN Duong, Oct 2023, Unlocking the capabilities of explainable fewshot learning in remote sensing, https://arxiv.org/pdf/2310.08619.pdf
- PyTorch Edge Team, October 17, 2023, PyTorch Edge: Enabling On-Device Inference Across Mobile and Edge Devices with ExecuTorch, https://pytorch.org/blog/pytorch-edge/
- Junho Wohn, February 2024, Optimizing Deep Learning Model Inference using Efficient Model Partitioning on Edge Devices, Thesis for the Master of Science, Graduate School of Hanyang University, https://repository.hanyang.ac.kr/handle/20.500.11754/188388, PDF: https://hanyang.dcollection.net/public_resource/pdf/200000726139_20240331200233.pdf (Compiles models using the TVM deep learning compiler and then partitions them across multiple edge devices for collaborative edge inference.)
- Zao Zhang, 23 May 2024, Design Efficient Deep Neural Networks with System Optimization, Ph.D. Thesis, School of Electrical and Information Engineering, Faculty of Engineering, The University of Sydney, Australia, PDF: https://ses.library.usyd.edu.au/bitstream/handle/2123/32642/zhang_z_thesis.pdf?sequence=1&isAllowed=y https://ses.library.usyd.edu.au/handle/2123/32642 https://hdl.handle.net/2123/32642
- Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım, 16 May 2024, Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems, https://arxiv.org/abs/2405.10426
- Md Fahim Faysal Khan, May 2024, Constraint Driven Multimodal Edge Intelligence, Ph.D. Thesis, Electrical Engineering and Computer Science, Pennsylvania State University, https://etda.libraries.psu.edu/files/final_submissions/29680 (Layer-specific quantization levels for mixed-precision quantization.)
- Jeffrey Yu, Kartik Prabhu, Yonatan Urman, Robert M. Radway, Eric Han, Priyanka Raina, 27 April 2024, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, April 2024, Pages 5–21, https://doi.org/10.1145/3620666.3651368 https://dl.acm.org/doi/abs/10.1145/3620666.3651368
- Jiwei HUANG, Fangzheng LIU, and Jianbin ZHANG, “Multi-dimensional QoS Evaluation and Optimization of Mobile Edge Computing for IoT: A Survey,” Chinese Journal of Electronics, vol. 33, no. 5, pp. 1–16, 2024, doi: 10.23919/cje.2023.00.264, https://cje.ejournal.org.cn/article/doi/10.23919/cje.2023.00.264 (Theory of benchmarking and evaluation of mobile edge computing.)
- Mikail Yayla, 2024, A Vision for Edge AI: Robust Binarized Neural Networks on Emerging Resource-Constrained Hardware, Ph.D. Dissertation, Technischen Universität Dortmund, Fakultät Informatik, Dortmund 2024, http://129.217.131.68:8080/bitstream/2003/42431/1/Dissertation_Yayla.pdf (Binarized networks with consideration of both software and hardware issues.)
- Benjamin A Newman, Chris Paxton, Kris Kitani, Henny Admoni, 16 Apr 2024, Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration, https://arxiv.org/abs/2404.10733
- Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Yui Li, Wen-Huang Cheng, 8 Apr 2024, Lightweight Deep Learning for Resource-Constrained Environments: A Survey, https://arxiv.org/abs/2404.07236 (A survey of various optimizations, with a lot of focus on image and vision models, including CNNs, RNNs, and Transformers.)
- Seungtae Hong, Gunju Park, Jeong-Si Kim, 9 June 2024, Automated deep-learning model optimization framework for microcontrollers, https://doi.org/10.4218/etrij.2023-0522 https://onlinelibrary.wiley.com/doi/full/10.4218/etrij.2023-0522 (Framework for using quantization and pruning on microcontroller devices.)
- Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen, 27 May 2024, Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference, https://arxiv.org/abs/2405.17245
- Qualcomm, May 2023, The future of AI is hybrid, Qualcomm White Paper, https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets/documents/Whitepaper-The-future-of-AI-is-hybrid-Part-1-Unlocking-the-generative-AI-future-with-on-device-and-hybrid-AI.pdf
- Guozhi Yan; Kai Liu; Chunhui Liu; Jie Zhang, 2024, Edge Intelligence for Internet of Vehicles: A Survey, IEEE Transactions on Consumer Electronics (Early Access), 18 March 2024, https://ieeexplore.ieee.org/abstract/document/10474509
- Daniel Situnayake, 24 January 2023, AI at the Edge: Solving Real-World Problems with Embedded Machine Learning, O'Reilly Media, Inc, USA, https://www.amazon.com/dp/1098120205/
- Jaskirat Singh, Bram Adams, Ahmed E. Hassan, 25 Mar 2024, On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance, https://arxiv.org/abs/2403.17154 (MLOps deployment for quantization, partitioning and early-exit across mobile, edge, and cloud platforms, including running early exit on mobile.)
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey (Broad survey with many optimizations including this topic.)
- P Dong, L Lu, C Wu, C Lyu, G Yuan, H Tang, Y Wang, 2023, PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile, https://openreview.net/pdf?id=N56hAiQvot Code: https://github.com/PeiyanFlying/PackQViT
- Bingkun Lai, Jinbo Wen, Jiawen Kang, Hongyang Du, Jiangtian Nie, Changyan Yi, Dong In Kim, Shengli Xie, 19 Dec 2023, Resource-efficient Generative Mobile Edge Networks in 6G Era: Fundamentals, Framework and Case Study, https://arxiv.org/abs/2312.12063
- Mohammed Ayyat; Tamer Nadeem; Bartosz Krawczyk, Dec 2023, ClassyNet: Class-Aware Early Exit Neural Networks for Edge Devices, IEEE Internet of Things Journal (Early Access), https://ieeexplore.ieee.org/abstract/document/10365527
- Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen, Dec 2023, PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU https://arxiv.org/abs/2312.12456 Code: https://github.com/SJTU-IPADS/PowerInfer
- Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar, Dec 2023, LLM in a flash: Efficient Large Language Model Inference with Limited Memory Apple Research, https://arxiv.org/abs/2312.11514
- X Li, S Chen, S Zhang, L Hou, Y Zhu, Z Xiao, 2023, Human Activity Recognition Using IR-UWB Radar: A Lightweight Transformer Approach, IEEE Geoscience and Remote Sensing Letters (Early Access), https://ieeexplore.ieee.org/document/10247554
- Ali Rahmanian, Doctoral Thesis, April 2024, Edge Orchestration for Latency-Sensitive Applications, Department of Computing Science, Umea University, Sweden, https://www.diva-portal.org/smash/get/diva2:1849510/FULLTEXT02.pdf
- Victor J.B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini, 3 Apr 2024, Optimizing the Deployment of Tiny Transformers on Low-Power MCUs, https://arxiv.org/abs/2404.02945 (Uses an approach called "Fused Weight Self-Attention" that fuses some of the QKV matrices and also tiling in multi-head attention, along with 8-bit integer quantization and integerized Softmax.)
- Minghao Yan, Hongyi Wang, Shivaram Venkataraman, 9 Jan 2024 (v2), PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices, https://arxiv.org/abs/2310.19991 (Faster inference with a focus on pipelining and scheduling of hardware acceleration.)
- Sai Krishna Revanth Vuruma, Ashley Margetts, Jianhai Su, Faez Ahmed, Biplav Srivastava, 26 Feb 2024 (v2), From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges, https://arxiv.org/abs/2402.12702
- Nir Shlezinger; Erez Farhan; Hai Morgenstern; Yonina C. Eldar, 2021, Collaborative Inference via Ensembles on the Edge, ICASSP 2021, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), https://ieeexplore.ieee.org/abstract/document/9414740
- Nir Shlezinger; Ivan V. Bajić, 2022, Collaborative Inference for AI-Empowered IoT Devices, IEEE Internet of Things Magazine (Volume: 5, Issue: 4, December 2022), https://ieeexplore.ieee.org/abstract/document/10012474
- Rohit Sharma, 9 July 2022, Introduction to TinyML, Independently published, https://www.amazon.com/Introduction-TinyML-Rohit-Sharma/dp/B0B5Q281L9/
- Semaphore, Dec 14, 2023, 6 Ways to Run LLMs Locally, https://semaphoreci.medium.com/6-ways-to-run-llms-locally-fa25be0797e5 (The six ways are HF Transformers, LangChain, Llama.cpp, Llamafile, Ollama, and GPT4All.)
- Zhepeng Wang, Isaacshubhanand Putla, Weiwen Jiang, Youzuo Lin, Oct 2023, Edge-InversionNet: Enabling Efficient Inference of InversionNet on Edge Devices, https://arxiv.org/abs/2310.09667 (Using structured pruning via layerwise filter pruning to run a model on a Raspberry Pi.)
- Jianlei Yang, Jiacheng Liao, Fanding Lei, Meichen Liu, Junyi Chen, Lingkun Long, Han Wan, Bei Yu, Weisheng Zhao, Nov 2023, TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices, https://arxiv.org/abs/2311.01759
- Yuyi Mao, Xianghao Yu, Kaibin Huang, Ying-Jun Angela Zhang, Jun Zhang, Dec 2023, Green Edge AI: A Contemporary Survey, https://arxiv.org/abs/2312.00333
- Murray Kornelsen, April 2023, Low-Latency BERT Inference for Heterogeneous Multi-Processor Edge Devices, Department of Electrical & Computer Engineering, McGill University, Canada https://escholarship.mcgill.ca/downloads/m326m732p
- Yifeng Wu; Xu He; Lingfei Mo; Qing Wang, Jan 2024, A Self-Attention-Assisted TinyML With Effective Representation for UWB NLOS Identification, IEEE Internet of Things Journal (Early Access), https://ieeexplore.ieee.org/abstract/document/10380220
- Ning Chen, Zhipeng Cheng, Xuwei Fan, Xiaoyu Xia, Lianfen Huang, 5 Jan 2024, Towards Integrated Fine-tuning and Inference when Generative AI meets Edge Intelligence, https://arxiv.org/abs/2401.02668 (Covers processing on cloud and edge servers in various configurations with communication between nodes for both training/fine-tuning and inference tasks.)
- C Gernigon, SI Filip, O Sentieys, C Coggiola, M Bruno, Oct 2023, Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing, https://hal.science/hal-04252197/document
- Y Liang, Z Wang, X Xu, Y Tang, Z Jie, J Lu, Oct 2023, MCUFormer: Deploying Vision Tranformers on Microcontrollers with Limited Memory, arXiv preprint arXiv:2310.16898, https://arxiv.org/pdf/2310.16898.pdf
- MWU Rahman, MM Abrar, HG Copening, S Hariri, Oct 2023, Quantized Transformer Language Model Implementations on Edge Devices, https://arxiv.org/pdf/2310.03971.pdf (Uses a "FlatBuffer" format on TensorFlow-Lite.)
- H Woisetschläger, A Isenko, S Wang, R Mayer, 2023, Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly, https://arxiv.org/abs/2310.03150
- PE Novac, G Boukli Hacene, A Pegatoquet, 2021, Quantization and deployment of deep neural networks on microcontrollers, Sensors, 2021, https://www.mdpi.com/1424-8220/21/9/2984
- P Cruz, N Achir, AC Viana, 2022, On the edge of the deployment: A survey on multi-access edge computing https://dl.acm.org/doi/abs/10.1145/3529758 https://inria.hal.science/hal-03637105/file/ACM_MEC_Survey___Camera_Ready.pdf
- W Yu, F Liang, X He, WG Hatcher, C Lu, J Lin, 2017, A survey on the edge computing for the Internet of Things, IEEE Access (Volume: 6), https://ieeexplore.ieee.org/abstract/document/8123913/ https://ieeexplore.ieee.org/iel7/6287639/8274985/08123913.pdf
- R. Sanchez-Iborra and A. F. Skarmeta, Tinyml-enabled frugal smart objects: Challenges and opportunities, IEEE Circuits and Systems Magazine, vol. 20, no. 3, pp. 4–18, 2020. https://ieeexplore.ieee.org/document/9166461 PDF: https://sci-hub.se/10.1109/MCAS.2020.3005467
- R. Immonen, T. Hämäläinen et al., Tiny machine learning for resource-constrained microcontrollers, Journal of Sensors, vol. 2022, 2022, https://www.hindawi.com/journals/js/2022/7437023/
- S. Prakash, T. Callahan, J. Bushagour, C. Banbury, A. V. Green, P. Warden, T. Ansell, and V. J. Reddi, 2023, CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs, 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). pp. 157–167. https://ui.adsabs.harvard.edu/abs/2022arXiv220101863P/abstract
- M. Giordano, L. Piccinelli, and M. Magno, Survey and comparison of milliwatts micro controllers for tiny machine learning at the edge, in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS). IEEE, 2022, pp. 94–97. https://ieeexplore.ieee.org/document/9870017
- Md. Maruf Hossain Shuvo; Syed Kamrul Islam; Jianlin Cheng; Bashir I. Morshed, 2023, Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review, Proceedings of the IEEE (Volume 111, Issue 1, January 2023), https://ieeexplore.ieee.org/document/9985008 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9985008 (Extensive 2023 survey of inference optimization in general and specifically on edge platforms.)
- T Tambe, 2023, Architecting High Performance Silicon Systems for Accurate and Efficient On-Chip Deep Learning, https://dash.harvard.edu/bitstream/handle/1/37375806/Final_Draft_PhD_Dissertation_Thierry_Tambe.pdf?sequence=1&isAllowed=y
- Douglas C. Youvan, June 15, 2024, Developing and Deploying AI Applications on NVIDIA Jetson Orin NX: A Comprehensive Guide, https://www.researchgate.net/profile/Douglas-Youvan/publication/381434888_Developing_and_Deploying_AI_Applications_on_NVIDIA_Jetson_Orin_NX_A_Comprehensive_Guide/links/666d7390de777205a32fceb6/Developing-and-Deploying-AI-Applications-on-NVIDIA-Jetson-Orin-NX-A-Comprehensive-Guide.pdf
- Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao, June 2024, Hybrid SLM and LLM for Edge-Cloud Collaborative Inference, EdgeFM ’24, June 3–7, 2024, Minato-ku, Tokyo, Japan, https://dl.acm.org/doi/pdf/10.1145/3662006.3662067 (Small model on edge devices with large model in the cloud, performing collaborative inference.)
- Dan Peng, Zhihui Fu, Jun Wang, 1 Jul 2024, PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs, https://arxiv.org/abs/2407.01031 (Running fine-tuning on a smartphone via a low-memory optimization using a "derivative-free" "zeroth-order" technique called MeZO, with advantages such as privacy.)
- Ying He, Jingcheng Fang, F. Richard Yu, Victor C. Leung, 2024, Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach, PrePrints pp. 1-12, DOI: 10.1109/TMC.2024.3415661, https://www.computer.org/csdl/journal/tm/5555/01/10591707/1YraFlDdKYo
- Adarsh Prasad Behera, Paulius Daubaris, Iñaki Bravo, José Gallego, Roberto Morabito, Joerg Widmer, Jaya Prakash Varma Champati, 10 Jul 2024, Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical, https://arxiv.org/abs/2407.11061
- Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
- Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun, 3 Aug 2024, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800 Code: https://github.com/OpenBMB/MiniCPM-V
- Beom Jin Kang, Hae In Lee, Seok Kyu Yoon, Young Chan Kim, Sang Beom Jeong, Seong Jun O, Hyun Kim, October 2024, A survey of FPGA and ASIC designs for transformer inference acceleration and optimization, Journal of Systems Architecture, Volume 155, 103247, https://www.sciencedirect.com/science/article/abs/pii/S138376212400184X
- R. Narmeen, P. Mach, Z. Becvar and I. Ahmad, 16 August 2024, Joint Exit Selection and Offloading Decision for Applications Based on Deep Neural Networks, IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3444898, https://doi.org/10.1109/JIOT.2024.3444898 https://ieeexplore.ieee.org/abstract/document/10638073
- Mingjin Zhang, 2024, High-performance scheduling of deep learning tasks in collaborative edge computing, Ph.D. Thesis, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, https://theses.lib.polyu.edu.hk/bitstream/200/13080/3/7528.pdf (Scheduling of inference and training tasks on edge devices with techniques such as model splitting/partitioning.)
- Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu, 17 May 2024, Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities, https://arxiv.org/abs/2405.10825
- L. Cheng, Y. Gu, Q. Liu, L. Yang, C. Liu and Y. Wang, 2024, Advancements in Accelerating Deep Neural Network Inference on AIoT Devices: A Survey, in IEEE Transactions on Sustainable Computing, doi: 10.1109/TSUSC.2024.3353176. https://ieeexplore.ieee.org/abstract/document/10398463
- Eric Samikwa, 2024, Resource-Aware Distributed Machine Learning for Artificial Intelligence of Things, Ph.D. thesis, Faculty of Science, University of Bern, Switzerland, https://boristheses.unibe.ch/5378/1/24samikwa_e_1_.pdf https://doi.org/10.48549/5378 (Multi-edge device with early exit, "micro-split" scheduling, split/federated learning, and distributed inference.)
- Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami, 1 Sep 2024, TinyAgent: Function Calling at the Edge, https://arxiv.org/abs/2409.00608 https://github.com/SqueezeAILab/TinyAgent
- Tyler Mullen, August 22, 2024, Unlocking 7B+ language models in your browser: A deep dive with Google AI Edge's MediaPipe, https://research.google/blog/unlocking-7b-language-models-in-your-browser-a-deep-dive-with-google-ai-edges-mediapipe/
- Othmane Friha, Mohamed Amine Ferrag, Burak Kantarci, Burak Cakmak, Arda Ozgun, Nassira Ghoualmi-Zine, 2024, LLM-based Edge Intelligence: A Comprehensive Survey on Architectures, Applications, Security and Trustworthiness, PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10669603
- Dimitrios Kafetzis, Iordanis Koutsopoulos, Oct 2024, Demo: An Experimental Platform for AI Model Partitioning on Resource-constrained Devices, https://dl.acm.org/doi/pdf/10.1145/3641512.3690629
- M. Sponner, L. Servadei, B. Waschneck, R. Wille and A. Kumar, "Harnessing Temporal Information for Efficient Edge AI," 2024 9th International Conference on Fog and Mobile Edge Computing (FMEC), Malmö, Sweden, 2024, pp. 5-13, doi: 10.1109/FMEC62297.2024.10710223. https://ieeexplore.ieee.org/abstract/document/10710223
- Mistral AI, Oct 2024, Un Ministral, des Ministraux: Introducing the world’s best edge models. https://mistral.ai/news/ministraux/
- Michael Nuñez, October 16, 2024, Mistral AI’s new language models bring AI power to your phone and laptop, https://venturebeat.com/business/mistral-ai-new-language-models-bring-ai-power-to-your-phone-and-laptop/
- Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
- Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, Ru Huang, Meng Li, 23 Oct 2024, MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers https://arxiv.org/abs/2410.17957
- Arun Nanda, Sep 7, 2024, Reducing the Size of AI Models. Running large AI models on edge devices, https://towardsdatascience.com/reducing-the-size-of-ai-models-4ab4cfe5887a
- Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
- Justine, Apr 2023, Edge AI Just Got Faster, https://justine.lol/mmap/ (Loading models using mmap.)
- Nan Xue, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Liang Qian, Shuguang Cui, Wenjun Zhang, Ping Zhang, 11 Nov 2024, WDMoE: Wireless Distributed Mixture of Experts for Large Language Models, https://arxiv.org/abs/2411.06681
- Ibrahim Kok, Orhan Demirci, Suat Ozdemir, 20 Nov 2024, When IoT Meet LLMs: Applications and Challenges, https://arxiv.org/abs/2411.17722
- M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, https://dl.acm.org/doi/pdf/10.1145/3706418
- Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris, 5 Dec 2024, MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference, https://arxiv.org/abs/2412.04147
Hybrid Edge-Cloud Architectures
A hybrid architecture splits the work: some processing runs on edge devices (e.g., PCs or security cameras), and the rest is passed up to the cloud for more powerful processing. The "Apple Intelligence" architecture is a prominent example, with some processing done "on-device" on iPhones and Macs, and some passed up to the cloud.
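As a rough illustration of the idea, the sketch below routes each request to a small on-device model first and falls back to a cloud model only when the edge model is not confident enough. This is a minimal sketch under stated assumptions, not how any particular product works: the `HybridRouter` class, the confidence `threshold`, and both toy models are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: a "model" is any callable returning (label, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class HybridRouter:
    """Hypothetical router: try the edge model first, offload if unsure."""
    edge_model: Model
    cloud_model: Model
    threshold: float = 0.8  # hypothetical confidence cutoff for staying on-device

    def infer(self, request: str) -> Tuple[str, str]:
        """Return (label, where_it_ran), where where_it_ran is 'edge' or 'cloud'."""
        label, confidence = self.edge_model(request)
        if confidence >= self.threshold:
            return label, "edge"
        # Edge model is unsure: pass the request up to the larger cloud model.
        label, _ = self.cloud_model(request)
        return label, "cloud"


def toy_edge_model(text: str) -> Tuple[str, float]:
    # Stand-in for a small on-device model: confident only on short inputs.
    if len(text.split()) <= 3:
        return "short-query", 0.95
    return "unknown", 0.40


def toy_cloud_model(text: str) -> Tuple[str, float]:
    # Stand-in for a large cloud-hosted model (a real one would be a network call).
    return "long-query", 0.99


router = HybridRouter(toy_edge_model, toy_cloud_model)
print(router.infer("turn on lights"))
print(router.infer("summarize this long document for me please"))
```

Confidence-based fallback is only one possible policy. Several of the papers listed in this section instead split a single model between device and cloud, or decide based on latency, energy, or network conditions.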
- Hasanul Mahmud, Peng Kang, Kevin Desai, Palden Lama, Sushil Prasad, 11 Mar 2024, A Converting Autoencoder Toward Low-latency and Energy-efficient DNN Inference at the Edge, https://arxiv.org/abs/2403.07036 (Hybrid cloud and on-device inference for image analysis.)
- Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, Lingjia Tang, 2017, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” ACM SIGARCH Comput. Archit. News, vol. 45, no. 1, pp. 615–629, https://dl.acm.org/doi/10.1145/3037697.3037698
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao, June 2024, Hybrid SLM and LLM for Edge-Cloud Collaborative Inference, EdgeFM ’24, June 3–7, 2024, Minato-ku, Tokyo, Japan, https://dl.acm.org/doi/pdf/10.1145/3662006.3662067 (Small model on edge devices with large model in the cloud, performing collaborative inference.)
- Adarsh Prasad Behera, Paulius Daubaris, Iñaki Bravo, José Gallego, Roberto Morabito, Joerg Widmer, Jaya Prakash Varma Champati, 10 Jul 2024, Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical, https://arxiv.org/abs/2407.11061
- Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
- Mingjin Zhang, 2024, High-performance scheduling of deep learning tasks in collaborative edge computing, Ph.D. Thesis, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, https://theses.lib.polyu.edu.hk/bitstream/200/13080/3/7528.pdf (Scheduling of inference and training tasks on edge devices with techniques such as model splitting/partitioning.)
- Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia, 19 Jun 2024, VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework, https://arxiv.org/abs/2406.13399
- Teresa Peng, Kabir Mehta, Liam Liu, Aditya Nair, Priya Sing, 2024, Enhanced Hybrid Inference Techniques for Scalable On-Device LLM Personalization and Cloud Integration, PDF: https://www.researchgate.net/profile/Priya-Singh-103/publication/384311522_Enhanced_Hybrid_Inference_Techniques_for_Scalable_On-Device_LLM_Personalization_and_Cloud_Integration/links/66f3cfb09e6e82486fef9f1c/Enhanced-Hybrid-Inference-Techniques-for-Scalable-On-Device-LLM-Personalization-and-Cloud-Integration.pdf
- Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang, 4 Oct 2024, Mixture of Attentions For Speculative Decoding, https://arxiv.org/abs/2410.03804
- Divya Jyoti Bajpai, Manjesh Kumar Hanawal, 6 Oct 2024, Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach, https://arxiv.org/abs/2410.05338
- Jiaming Qiu, Ruiqi Wang, Brooks Hu, Roch Guerin, Chenyang Lu, 24 Oct 2024, Optimizing Edge Offloading Decisions for Object Detection, https://arxiv.org/abs/2410.18919
- Fan Yang, Zehao Wang, Haoyu Zhang, Zhenhua Zhu, Xinhao Yang, Guohao Dai, Yu Wang, Oct 2024, Efficient Deployment of Large Language Model across Cloud-Device Systems, https://nicsefc.ee.tsinghua.edu.cn/nics_file/pdf/f06a14c1-4d6d-441d-b4e4-82545ac5781b.pdf
More AI Research
Read more about: