Aussie AI

Hardware-Software Co-Design

  • Last Updated 18 November, 2024
  • by David Spuler, Ph.D.

Hardware-software co-design is the joint design of the hardware and software components of the AI tech stack. In practice, every major neural network software framework is already modified to exploit hardware-acceleration features, but further gains are possible when the hardware architecture and the software algorithms are designed together.
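
To make the software half of that equation concrete, the sketch below is a purely illustrative example (not taken from any of the cited papers; the function and its dispatch logic are an assumption for illustration): a dot-product kernel that uses AVX2 fused multiply-add intrinsics when the compiler targets hardware that supports them, with a portable scalar fallback. Co-design goes a step further, shaping the hardware itself (datapaths, memory layout, numeric formats) to suit the algorithm.

    // Illustrative sketch only: a hardware-aware dot product (hypothetical,
    // not drawn from any particular framework or cited paper).
    #include <cstddef>

    #if defined(__AVX2__) && defined(__FMA__)
    #include <immintrin.h>
    #endif

    // Dot product of two float vectors. When the build targets a CPU with
    // AVX2 and FMA, eight multiply-accumulates are issued per instruction;
    // otherwise the portable scalar loop is used.
    float dot_product(const float* a, const float* b, std::size_t n) {
    #if defined(__AVX2__) && defined(__FMA__)
        __m256 acc = _mm256_setzero_ps();           // eight partial sums
        std::size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            acc = _mm256_fmadd_ps(va, vb, acc);     // acc += va * vb (hardware FMA)
        }
        float partial[8];
        _mm256_storeu_ps(partial, acc);
        float sum = partial[0] + partial[1] + partial[2] + partial[3]
                  + partial[4] + partial[5] + partial[6] + partial[7];
        for (; i < n; ++i) {
            sum += a[i] * b[i];                     // scalar tail for leftover elements
        }
        return sum;
    #else
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i) {
            sum += a[i] * b[i];
        }
        return sum;
    #endif
    }

The same pattern recurs throughout inference frameworks: the algorithm is unchanged, but the implementation is reorganized around the instruction set, register width, and memory hierarchy of the target hardware.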

Research Papers on Co-Design

There is no shortage of papers on hardware accelerators or software frameworks for optimizing inference. Many examine hardware designed to accelerate existing algorithms, or software modified to exploit existing hardware features. The papers below are selected specifically for their focus on joint design, where changes to hardware and software are combined. See also the comprehensive list of possible optimizations.

Research papers on further optimizing systems with a dual focus on software and hardware implementation:

  • Haikuo Shao; Jinming Lu; Meiqi Wang; Zhongfeng Wang, 2023, An Efficient Training Accelerator for Transformers With Hardware-Algorithm Co-Optimization, IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Early Access), https://ieeexplore.ieee.org/document/10251161
  • F Mince, D Dinh, J Kgomo, N Thompson, S Hooker, 2023, The Grand Illusion: The Myth of Software Portability and Implications for ML Progress, arXiv preprint arXiv:2309.07181, https://arxiv.org/pdf/2309.07181.pdf (Examines ML software frameworks TensorFlow, Pytorch, and JAX, and their portability across hardware.)
  • Kah Phooi Seng, Li-Minn Ang, "Embedded Intelligence: State-of-the-Art and Research Challenges", IEEE Access, vol.10, pp.59236-59258, 2022. https://ieeexplore.ieee.org/document/9775683, PDF: https://research.usc.edu.au/esploro/outputs/99640278002621
  • Minxuan Zhou; Weihong Xu; Jaeyoung Kang; Tajana Rosing, 2022, TransPIM: A Memory-based Acceleration via Software-Hardware Co-Design for Transformer, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), https://ieeexplore.ieee.org/document/9773212, PDF: https://par.nsf.gov/servlets/purl/10345536
  • Panjie Qi; Edwin Hsing-Mean Sha; Qingfeng Zhuge; Hongwu Peng; Shaoyi Hua, 2021, Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), https://ieeexplore.ieee.org/document/9643586
  • Tae Jun Ham; Yejin Lee; Seong Hoon Seo; Soosung Kim; Hyunji Choi; Sung Jun Jung; Jae W. Lee, 2021, ELSA: Hardware-software co-design for efficient, lightweight self-attention mechanism in neural networks, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), https://ieeexplore.ieee.org/abstract/document/9499860/, PDF: https://taejunham.github.io/data/elsa_isca21.pdf
  • A Roy, K Roy, 2023, HADES: Hardware/Algorithm Co-design in DNN accelerators using Energy-efficient Approximate Alphabet Set Multipliers, arXiv preprint arXiv:2302.01990, https://arxiv.org/abs/2302.01990
  • L Capogrosso, F Cunico, DS Cheng, F Fummi, 2023, A Machine Learning-oriented Survey on Tiny Machine Learning, arXiv preprint arXiv:2309.11932, https://arxiv.org/pdf/2309.11932.pdf
  • C Fu, 2023, Machine Learning Algorithm and System Co-design for Hardware Efficiency, Ph.D. thesis, Computer Science, University of California San Diego, https://escholarship.org/content/qt52q368p3/qt52q368p3.pdf
  • W Chen, Y Wang, Y Xu, C Gao, C Liu, 2022, A framework for neural network architecture and compile co-optimization, https://dl.acm.org/doi/abs/10.1145/3533251, PDF: https://dl.acm.org/doi/pdf/10.1145/3533251
  • Alberto Delmas Lascorz, Mostafa Mahmoud, Ali Hadi Zadeh, Milos Nikolic, Kareem Ibrahim, Christina Giannoula, Ameer Abdelhadi, Andreas Moshovos, 2024, Atalanta: A Bit is Worth a “Thousand” Tensor Values, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, April 2024, Pages 85–102, https://doi.org/10.1145/3620665.3640356 https://dl.acm.org/doi/abs/10.1145/3620665.3640356
  • Mikail Yayla, 2024, A Vision for Edge AI: Robust Binarized Neural Networks on Emerging Resource-Constrained Hardware, Ph.D. Dissertation, Technische Universität Dortmund, Fakultät Informatik, Dortmund 2024, http://129.217.131.68:8080/bitstream/2003/42431/1/Dissertation_Yayla.pdf (Binarized networks with consideration of both software and hardware issues.)
  • Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Yui Li, Wen-Huang Cheng, 8 Apr 2024, Lightweight Deep Learning for Resource-Constrained Environments: A Survey, https://arxiv.org/abs/2404.07236 (A survey of various optimizations, with a lot of focus on image and vision models, including CNNs, RNNs, and Transformers.)
  • Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang, 7 Apr 2024, Allo: A Programming Model for Composable Accelerator Design, https://arxiv.org/abs/2404.04815
  • Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda, 30 Jan 2024 (v2), The Case for Co-Designing Model Architectures with Hardware, https://arxiv.org/abs/2401.14489
  • Lei Xun, Jonathon Hare, Geoff V. Merrett, 17 Jan 2024, Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices, https://arxiv.org/abs/2401.08965
  • Wenjie Li; Aokun Hu; Ningyi Xu; Guanghui He, Jan 2024, Quantization and Hardware Architecture Co-Design for Matrix-Vector Multiplications of Large Language Models, IEEE Transactions on Circuits and Systems I: Regular Papers (Early Access), https://ieeexplore.ieee.org/abstract/document/10400181/ (Quantization algorithms designed in a hardware-aware way to optimize matrix-vector multiplication; a minimal sketch of this style of weight packing appears after this list.)
  • S Tuli, NK Jha, 2023, TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference, IEEE Transactions on Computer-Aided Design, https://ieeexplore.ieee.org/abstract/document/10144614/ https://arxiv.org/pdf/2303.14882
  • Christopher Wolters, Xiaoxuan Yang, Ulf Schlichtmann, Toyotaro Suzumura, 12 Jun 2024, Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference, https://arxiv.org/abs/2406.08413
  • Z Gong, H Ji, Y Yao, CW Fletcher, CJ Hughes, 2022, Graphite: optimizing graph neural networks on CPUs through cooperative software-hardware techniques, https://dl.acm.org/doi/abs/10.1145/3470496.3527403 https://dl.acm.org/doi/pdf/10.1145/3470496.3527403
  • T Tambe, 2023, Architecting High Performance Silicon Systems for Accurate and Efficient On-Chip Deep Learning, https://dash.harvard.edu/bitstream/handle/1/37375806/Final_Draft_PhD_Dissertation_Thierry_Tambe.pdf?sequence=1&isAllowed=y
  • Y. Yang, Q. Huang, B. Wu, T. Zhang, L. Ma, G. Gambardella, M. Blott, L. Lavagno, K. Vissers, J. Wawrzynek, K. Keutzer, 2019, Synetgy: Algorithm-Hardware Co-design for ConvNet Accelerators on Embedded FPGAs, Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '19), Association for Computing Machinery, pp. 23-32, https://doi.org/10.1145/3289602.3293902
  • Yingbing Huang, Lily Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen, 16 Jun 2024, New Solutions on LLM Acceleration, Optimization, and Application, https://arxiv.org/abs/2406.10903 (A survey of inference optimization methods and further analysis of Medusa-type speculative decoding and KV cache compression. Also explores hardware co-design, ML compilers and LLM-assisted code debugging.)
  • Chen, C, 2024, Hardware-software co-exploration and optimization for next-generation learning machines, Doctoral thesis, Nanyang Technological University, Singapore, https://hdl.handle.net/10356/178423 (Extensive coverage of hardware design with multiple contributions to accelerating various neural network types, ranging from acceleration of individual non-linear functions to end-to-end optimization algorithms. Specific topics include data compression, non-maximum suppression, MHA, and MatMul/GEMM optimizations.)
  • Shubha R. Kharel, Prashansa Mukim, Piotr Maj, Grzegorz W. Deptuch, Shinjae Yoo, Yihui Ren, Soumyajit Mandal, 18 Jul 2024, Automated and Holistic Co-design of Neural Networks and ASICs for Enabling In-Pixel Intelligence, https://arxiv.org/abs/2407.14560
  • Junfeng Gong, Cheng Liu, Long Cheng, Huawei Li, Xiaowei Li, 17 Jul 2024, MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs, https://arxiv.org/abs/2407.18267
  • Cyrus Zhou, Pedro Savarese, Vaughn Richard, Zack Hassman, Xin Yuan, Michael Maire, Michael DiBrino, Yanjing Li, 6 May 2024 (v2), SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions, https://arxiv.org/abs/2311.14114
  • Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, Junjie Qiu, Hui Qu, Zehui Ren, Zhangli Sha, Xuecheng Su, Xiaowen Sun, Yixuan Tan, Minghui Tang, Shiyu Wang, Yaohui Wang, Yongji Wang, Ziwei Xie, Yiliang Xiong, Yanhong Xu, Shengfeng Ye, Shuiping Yu, Yukun Zha, Liyue Zhang, Haowei Zhang, Mingchuan Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Yuheng Zou, 31 Aug 2024 (v2), Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning, DeepSeek AI, https://www.arxiv.org/abs/2408.14158
  • Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, Shiyu Li, Ziru Li, Mingyuan Ma, Tergel Molom-Ochir, Benjamin Morris, Haoxuan Shan, Jingwei Sun, Yitu Wang, Chiyue Wei, Xueying Wu, Yuhao Wu, Hao Frank Yang, Jingyang Zhang, Junyao Zhang, Qilin Zheng, Guanglei Zhou, Hai (Helen) Li, Yiran Chen, 8 Oct 2024, A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models, https://arxiv.org/abs/2410.07265
  • Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
  • Akshat Ramachandran, Souvik Kundu, Tushar Krishna, 12 Nov 2024 (v2), MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization, https://arxiv.org/abs/2411.05282
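
Several of the papers above, such as the quantization and hardware architecture co-design work noted earlier, pair a low-precision number format with a memory layout that the accelerator can consume directly. The sketch below is a minimal, purely illustrative example of that idea (hypothetical code, not from any of the cited papers): signed 4-bit weights are packed two per byte, halving weight memory traffic for a matrix-vector multiply, with the software doing the packing and unpacking that a co-designed datapath would otherwise handle in hardware.

    // Illustrative sketch only: hypothetical INT4 weight packing for a
    // matrix-vector multiply (not from any cited paper).
    #include <cstdint>
    #include <cstddef>
    #include <vector>

    // Pack signed 4-bit weights (values in -8..7) two per byte: even indices
    // in the low nibble, odd indices in the high nibble.
    std::vector<uint8_t> pack_int4(const std::vector<int8_t>& w) {
        std::vector<uint8_t> packed((w.size() + 1) / 2, 0);
        for (std::size_t i = 0; i < w.size(); ++i) {
            uint8_t nibble = static_cast<uint8_t>(w[i]) & 0x0F;
            packed[i / 2] |= (i % 2 == 0) ? nibble
                                          : static_cast<uint8_t>(nibble << 4);
        }
        return packed;
    }

    // y = W x for a rows-by-cols weight matrix W stored packed, dequantized
    // by a single scale factor. A co-designed accelerator would unpack and
    // multiply the nibbles directly in hardware; here software does it.
    void matvec_int4(const std::vector<uint8_t>& packed, const float* x,
                     float* y, std::size_t rows, std::size_t cols, float scale) {
        for (std::size_t r = 0; r < rows; ++r) {
            float acc = 0.0f;
            for (std::size_t c = 0; c < cols; ++c) {
                std::size_t idx = r * cols + c;
                uint8_t byte = packed[idx / 2];
                int v = (idx % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
                if (v >= 8) v -= 16;  // sign-extend the 4-bit value
                acc += scale * static_cast<float>(v) * x[c];
            }
            y[r] = acc;
        }
    }

Packing the weights this way only pays off if the hardware (or the unpacking code) is built around the same layout and numeric format, which is exactly the kind of coupled decision the co-design papers above address.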

More AI Research

Read more about: