Aussie AI
Approximate Computing for Faster AI
-
Last Updated 29 September, 2024
-
by David Spuler, Ph.D.
Approximate computing is a longstanding technique for improving speed at the cost of accuracy, used in many areas of Computer Science. The idea has recently garnered much interest in the AI research community, with many papers on approximation both for speeding up low-level arithmetic (i.e., the multiplication bottleneck) and at the higher level of whole model components.
Approximate Multiplication. Multiplication can be sped up using approximate algorithms in software and/or hardware. Some of the areas where approximate arithmetic can improve model inference include:
- Approximate multiplication algorithms for faster arithmetic (see below and also advanced mathematics).
- Approximate matrix multiplication algorithms such as low-rank factorization (see matrix algebra).
- Logarithmic number system (LNS) (replaces multiplication with addition, but is approximate; see the sketch after this list).
- Other number systems: RNS, PNS, Dyadic numbers (see advanced math).
- Other approximate arithmetic: approximate division, approximate addition (see approximate arithmetic).
- Additive inference engines (including "AdderNets"), look-up tables (LUTs), and other multiplication-free inference (see zero-multiplication inference algorithms).
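To make the LNS idea concrete, below is a minimal C++ sketch (illustrative only) of a dot product computed in the log domain, where each multiplication becomes a single addition of logarithms. The struct layout and the exact log2/exp2 library calls are assumptions made for clarity; real LNS designs approximate the antilog step and handle signs and zeros in hardware.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // A number in the log domain: a zero flag, a sign, and log2 of the magnitude.
    struct LnsValue { bool zero; int sign; double log2mag; };

    LnsValue to_lns(double v) {
        if (v == 0.0) return {true, +1, 0.0};
        return {false, v < 0.0 ? -1 : +1, std::log2(std::fabs(v))};
    }

    // Dot product in the log domain: each multiplication is one addition of logs.
    // The antilog (exp2) before accumulation is the step that real LNS hardware
    // approximates; exact exp2 is used here for clarity.
    double lns_dot(const std::vector<LnsValue>& w, const std::vector<LnsValue>& x) {
        double acc = 0.0;
        for (size_t i = 0; i < w.size(); i++) {
            if (w[i].zero || x[i].zero) continue;
            double log_prod = w[i].log2mag + x[i].log2mag;   // multiply becomes add
            acc += (w[i].sign * x[i].sign) * std::exp2(log_prod);
        }
        return acc;
    }

    int main() {
        std::vector<double> wv = {0.5, -1.25, 2.0}, xv = {4.0, 3.0, -0.75};
        std::vector<LnsValue> w, x;
        for (double v : wv) w.push_back(to_lns(v));
        for (double v : xv) x.push_back(to_lns(v));
        std::printf("LNS dot = %g (exact = %g)\n",
                    lns_dot(w, x), 0.5 * 4.0 + -1.25 * 3.0 + 2.0 * -0.75);
    }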
Approximate Components. Some higher-level Transformer components are also being considered for acceleration via approximation:
- Approximating attention heads with simpler versions (or removing them entirely via head pruning)
- Approximating GELU and other activation functions (see the sketch after this list)
- Approximating SoftMax
- Approximate normalization functions
- Approximate top-k algorithms
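As an example of the activation-function case, the widely used tanh-based approximation of GELU (from Hendrycks & Gimpel) replaces the relatively expensive erf call with cheaper operations. A minimal comparison sketch:

    #include <cmath>
    #include <cstdio>

    const double kPi = 3.14159265358979323846;

    // Exact GELU, using the Gaussian CDF via erf.
    double gelu_exact(double x) {
        return 0.5 * x * (1.0 + std::erf(x / std::sqrt(2.0)));
    }

    // Widely used tanh-based approximation; avoids the erf call.
    double gelu_tanh(double x) {
        double inner = std::sqrt(2.0 / kPi) * (x + 0.044715 * x * x * x);
        return 0.5 * x * (1.0 + std::tanh(inner));
    }

    int main() {
        for (double x : {-2.0, -0.5, 0.0, 1.0, 3.0})
            std::printf("x=%5.2f exact=%.6f tanh-approx=%.6f\n",
                        x, gelu_exact(x), gelu_tanh(x));
    }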
Approximate Multipliers for Faster Model Inference
There has been an explosion of papers on approximate multiplication algorithms and their use in model inference and training. For analysis of low-level approximate multiplication algorithms and their theory, including logarithmic approximate multiplication and non-logarithmic approximate multiplication, see advanced AI mathematics. Also related are the Logarithmic Number System (LNS) and other less common number systems such as Dyadic numbers, the Residue Number System (RNS) and the Posit Number System (PNS); see advanced number systems. See also additive neural networks and multiplier-free inference.
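To illustrate the logarithmic approach that many of the papers below build upon, here is a minimal software sketch of Mitchell's classic approximate multiplication for positive integers, which uses the approximation log2(1+f) ≈ f in both directions; this is an illustrative version, not any particular paper's hardware design.

    #include <cstdio>
    #include <cstdint>

    // Position of the leading 1 bit (floor of log2) for a positive integer.
    static int ilog2(uint32_t v) {
        int k = 0;
        while (v >>= 1) k++;
        return k;
    }

    // Mitchell's approximate multiply: add the approximate logs, then apply the
    // matching antilog approximation. Always underestimates, with a worst-case
    // error of roughly 11%, which later papers reduce with correction terms.
    double mitchell_mul(uint32_t a, uint32_t b) {
        int ka = ilog2(a), kb = ilog2(b);
        double fa = (double)a / (1u << ka) - 1.0;   // fractional part, in [0,1)
        double fb = (double)b / (1u << kb) - 1.0;
        double s = fa + fb;
        if (s < 1.0)
            return (double)(1ull << (ka + kb)) * (1.0 + s);
        else
            return (double)(1ull << (ka + kb + 1)) * s;
    }

    int main() {
        std::printf("100*100: exact=10000 approx=%g\n", mitchell_mul(100, 100));
        std::printf("127*93 : exact=%d approx=%g\n", 127 * 93, mitchell_mul(127, 93));
    }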
AI Approximate Multiplication Research: Papers focused on the specific use of approximate multiplication algorithms for neural networks and Transformers include:
- S. S. Sarwar, S. Venkataramani et al., “Energy-efficient neural computing with approximate multipliers,” J. Emerg. Technol. Comput. Syst., vol. 14, no. 2, pp. 16:1–16:23, Jul. 2018, https://dl.acm.org/doi/10.1145/3097264
- Q. Zhang, T. Wang, Y. Tian, F. Yuan, and Q. Xu, “ApproxANN: An approximate computing framework for artificial neural network,” in DATE’15, March 2015, pp. 701–706, https://ieeexplore.ieee.org/document/7092478
- M. A. Hanif, R. Hafiz, and M. Shafique, Error resilience analysis for systematically employing approximate computing in convolutional neural networks, Design, Automation and Test in Europe Conference and Exhibition (DATE), 2018, IEEE (2018), pp. 913–916, https://ieeexplore.ieee.org/document/8342139
- M. A. Hanif, A. Marchisio et al., “X-DNNs: Systematic cross-layer approximations for energy-efficient deep neural networks,” Journal of Low Power Electronics, vol. 14, no. 4, pp. 520–534, Dec. 2018. https://www.semanticscholar.org/paper/X-DNNs:-Systematic-Cross-Layer-Approximations-for-Hanif-Marchisio/5ddaf1aff7d5a4a3484963849828c8d2d1315bc3
- V. Mrazek, S. S. Sarwar, L. Sekanina, Z. Vasicek, and K. Roy, Design of power-efficient approximate multipliers for approximate artificial neural networks, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November (2016), pp. 1–7, https://ieeexplore.ieee.org/document/7827658
- S. Kim, P. Howe, T. Moreau, A. Alaghi, L. Ceze, and V. Sathe, MATIC: Learning Around Errors for Efficient Low-Voltage Neural Network Accelerators, Design, Automation and Test in Europe Conference and Exhibition (DATE), 2018, IEEE (2018), pp. 1–6, https://arxiv.org/abs/1706.04332
- S. De, J. Huisken, and H. Corporaal, “Designing energy efficient approximate multipliers for neural acceleration,” in 2018 21st Euromicro Conference on Digital System Design (DSD). IEEE, 2018, pp. 288–295, https://ieeexplore.ieee.org/document/8491830
- X. He, L. Ke, W. Lu, G. Yan, and X. Zhang, Axtrain: Hardware-oriented neural network training for approximate inference. arXiv preprint arXiv:1805.08309 (2018), https://arxiv.org/abs/1805.08309v1
- P. Gysel, J. Pimentel et al., “Ristretto: A framework for empirical study of resource-efficient inference in convolutional neural networks,” IEEE Trans. Neural Netw. Learn. Syst., 2018, https://ieeexplore.ieee.org/abstract/document/8318896
- Min Soo Kim; Alberto A. Del Barrio; Leonardo Tavares Oliveira; Román Hermida; Nader Bagherzadeh, "Efficient Mitchell’s Approximate Log Multipliers for Convolutional Neural Networks", IEEE Transactions on Computers, Volume 68 Issue 5, p.660-675, November 2018, https://ieeexplore.ieee.org/abstract/document/8532287
- T. Mogami, Deep neural network training without multiplications, In Beyond BackPropagation WS at 34th Conference on Neural Information Processing Systems, 2020, https://arxiv.org/abs/2012.03458 (multiplication of floating-point numbers with integer addition, using Mitchell's approximate multiplication)
- Lingyun Yao, Martin Trapp, Karthekeyan Periasamy, Jelin Leslin, Gaurav Singh, Martin Andraud, June 2023, Logarithm-Approximate Floating-Point Multiplier for Hardware-efficient Inference in Probabilistic Circuits, Proceedings of The 6th Workshop on Tractable Probabilistic Modeling, https://openreview.net/forum?id=WL7YDLOLfK, PDF: https://openreview.net/pdf?id=WL7YDLOLfK (Probabilistic speed improvement; uses Mogami's approximate multiplier.)
- T. Hokchhay, S. Hashemi, R. I. Bahar, and S. Reda, “Hardware-software codesign of accurate, multiplier-free deep neural networks,” in Proc. 54th Annu. Design Autom. Conf. (DAC), 2017, pp. 1–6., https://arxiv.org/abs/1705.04288
- M. S. Ansari, B. F. Cockburn, and J. Han, “An improved logarithmic multiplier for energy-efficient neural computing,” IEEE Transactions on Computers, vol. 70, no. 4, pp. 614–625, 2020, https://ieeexplore.ieee.org/document/9086744
- U. Lotric and P. Bulic, "Applicability of approximate multipliers in hardware neural networks," Neurocomput., vol. 96, pp. 57–65, Nov. 2012, https://dl.acm.org/doi/10.1016/j.neucom.2011.09.039
- Z. Du, K. Palem, A. Lingamneni, O. Temam, Y. Chen, and C. Wu, "Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators," in Proc. 19th Asia South Pacific Des. Autom. Conf., 2014, pp. 201–206, https://pages.saclay.inria.fr/olivier.temam/files/eval/DLCPTW2014.pdf
- S. S. Sarwar, S. Venkataramani, A. Raghunathan, and K. Roy, "Multiplier-less artificial neurons exploiting error resiliency for energy-efficient neural computing," in Proc. Des. Autom. Test Eur. Conf. Exhib., 2016, pp. 145–150, https://arxiv.org/abs/1602.08557
- J. Choi and S. Venkataramani, Approximate Computing Techniques for Deep Neural Networks. Cham: Springer, 2019, pp. 307–329, Chapter 15, https://link.springer.com/chapter/10.1007/978-3-319-99322-5_15
- M. S. Ansari, V. Mrazek, B. F. Cockburn, L. Sekanina, Z. Vasicek, and J. Han, 2019, “Improving the accuracy and hardware efficiency of neural networks using approximate multipliers,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 2, pp. 317–328, Oct 2019, https://ieeexplore.ieee.org/document/8863138
- Biyanu Zerom, Mohammed Tolba, Huruy Tesfai, Hani Saleh, Mahmoud Al-Qutayri, Thanos Stouraitis, Baker Mohammad, Ghada Alsuhli, 2022, Approximate Logarithmic Multiplier For Convolutional Neural Network Inference With Computational Reuse, 2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 24-26 October 2022, https://doi.org/10.1109/ICECS202256217.2022.9970861, https://ieeexplore.ieee.org/abstract/document/9970861/
- Tso-Bing Juang; Cong-Yi Lin; Guan-Zhong Lin, 2018, “Area-delay product efficient design for convolutional neural network circuits using logarithmic number systems,” in International SoC Design Conference (ISOCC). IEEE, 2018, pp. 170–171, https://ieeexplore.ieee.org/abstract/document/8649961
- Ourania Spantidi, Iraklis Anagnostopoulos, "The Perfect Match: Selecting Approximate Multipliers for Energy-Efficient Neural Network Inference", 2023 IEEE 24th International Conference on High Performance Switching and Routing (HPSR), pp.27-32, 2023. https://ieeexplore.ieee.org/document/10147918
- O. Spantidi, G. Zervakis, I. Anagnostopoulos, H. Amrouch and J. Henkel, "Positive/negative approximate multipliers for dnn accelerators", arXiv preprint arXiv:2107.09366, 2021. https://arxiv.org/abs/2107.09366 (Approximate multiplication for DNNs without needing retraining.)
- Vojtech Mrazek, "Approximation of Hardware Accelerators driven by Machine-Learning Models: (Embedded Tutorial)", 2023 26th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), pp.91-92, 2023. https://ieeexplore.ieee.org/document/10139484
- Michal Pinos, Vojtech Mrazek, Filip Vaverka, Zdenek Vasicek, Lukas Sekanina, "Acceleration Techniques for Automated Design of Approximate Convolutional Neural Networks", IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol.13, no.1, pp.212-224, 2023. https://ieeexplore.ieee.org/document/10011413
- Mohammad Hasan Ahmadilivani, Mario Barbareschi, Salvatore Barone, Alberto Bosio, Masoud Daneshtalab, Salvatore Della Torca, Gabriele Gavarini, Maksim Jenihhin, Jaan Raik, Annachiara Ruospo, Ernesto Sanchez, Mahdi Taheri, "Special Session: Approximation and Fault Resiliency of DNN Accelerators", 2023 IEEE 41st VLSI Test Symposium (VTS), pp.1-10, 2023. https://ieeexplore.ieee.org/document/10140043
- Zahra Ebrahimi, Muhammad Zaid, Mark Wijtvliet, Akash Kumar, "RAPID: Approximate Pipelined Soft Multipliers and Dividers for High Throughput and Energy Efficiency", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.42, no.3, pp.712-725, 2023. https://ieeexplore.ieee.org/document/9802734
- U. Anil Kumar, Pavankumar Bikki, Sreehari Veeramachaneni, Syed Ershad Ahmed, "Power Efficient Approximate Multiplier Architectures for Error Resilient Applications", 2022 IEEE 19th India Council International Conference (INDICON), pp.1-5, 2022. https://ieeexplore.ieee.org/document/10039748
- Qiao Shen, Renyuan Zhang, Hao Zhang, Hao Cai, Bo Liu, Jian Xiao, "A CGP-based Efficient Approximate Multiplier with Error Compensation", 2022 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), pp.48-49, 2022. https://ieeexplore.ieee.org/document/9963083
- Siyuan Liang, Ke Chen, Bi Wu, Weiqiang Liu, "A Survey of Approximation based Hardware Acceleration Techniques for Deep Neural Networks (Invited)", 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology (ICSICT), pp.1-4, 2022. https://ieeexplore.ieee.org/document/9963257
- Zhen Li, Su Zheng, Jide Zhang, Yao Lu, Jingbo Gao, Jun Tao, Lingli Wang, "Adaptable Approximate Multiplier Design Based on Input Distribution and Polarity", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.30, no.12, pp.1813-1826, 2022. https://ieeexplore.ieee.org/document/9861394
- Ourania Spantidi, Georgios Zervakis, Iraklis Anagnostopoulos, Jörg Henkel, "Energy-Efficient DNN Inference on Approximate Accelerators Through Formal Property Exploration", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.41, no.11, pp.3838-3849, 2022. https://ieeexplore.ieee.org/document/9852790
- Ourania Spantidi, Iraklis Anagnostopoulos, "How much is too much error? Analyzing the impact of approximate multipliers on DNNs", 2022 23rd International Symposium on Quality Electronic Design (ISQED), pp.1-6, 2022. https://ieeexplore.ieee.org/document/9806282
- Hao Zhang, Seok-Bum Ko, "Variable-Precision Approximate Floating-Point Multiplier for Efficient Deep Learning Computation", IEEE Transactions on Circuits and Systems II: Express Briefs, vol.69, no.5, pp.2503-2507, 2022. https://ieeexplore.ieee.org/document/9739768
- S Raghuram, N Shashank, "Approximate Adders for Deep Neural Network Accelerators", 2022 35th International Conference on VLSI Design and 2022 21st International Conference on Embedded Systems (VLSID), pp.210-215, 2022. https://ieeexplore.ieee.org/document/9885998
- Georgios Zervakis, Iraklis Anagnostopoulos, Sami Salamin, Ourania Spantidi, Isai Roman-Ballesteros, Jörg Henkel, Hussam Amrouch, "Thermal-Aware Design for Approximate DNN Accelerators", IEEE Transactions on Computers, vol.71, no.10, pp.2687-2697, 2022. https://ieeexplore.ieee.org/document/9672753
- Tao Li, Yitao Ma, Ko Yoshikawa, Osamu Nomura, Tetsuo Endoh, "Energy-Efficient Convolution Module With Flexible Bit-Adjustment Method and ADC Multiplier Architecture for Industrial IoT", IEEE Transactions on Industrial Informatics, vol.18, no.5, pp.3055-3065, 2022. https://ieeexplore.ieee.org/document/9519513
- Tong Li, Hong-Lan Jiang, Hai Mo, Jie Han, Lei-Bo Liu, Zhi-Gang Mao, "Approximate Processing Element Design and Analysis for the Implementation of CNN Accelerators", Journal of Computer Science and Technology, vol.38, no.2, pp.309, 2023. https://doi.org/10.1007/s11390-023-2548-8
- M. Esmali Nojehdeh, L. Aksoy, M. Altun, Efficient hardware implementation of artificial neural networks using approximate multiply-accumulate blocks, in 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (2020), pp. 96–101, https://ieeexplore.ieee.org/document/9154973
- Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel, "Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey", ACM Computing Surveys, vol.55, no.4, pp.1, 2023. https://doi.org/10.1145/3527156, https://arxiv.org/abs/2203.08737 (Survey of many approximate techniques in AI.)
- Anjankar, S., Hemant Gillurkar, Joshi, P., & Dwaramwar, P. (2022). Design and Analysis of Multipliers for DNN application using approximate 4:2 Compressors. International Journal of Next-Generation Computing, 13(5). https://doi.org/10.47164/ijngc.v13i5.918, https://ijngc.perpetualinnovation.net/index.php/ijngc/article/view/918
- Hao Zhang, Mohammadreza Asadikouhanjani, Jie Han, Deivalakshmi Subbian, Seok-Bum Ko, "Approximate Computing for Efficient Neural Network Computation: A Survey", In: Approximate Computing, Editors: Weiqiang Liu, Fabrizio Lombardi, pp.397, 2022. https://doi.org/10.1007/978-3-030-98347-5_16, Amazon: https://www.amazon.com/Approximate-Computing-Weiqiang-Liu-ebook/dp/B0BBKR65SB/
- Sudeh Shirkavand Saleh Abad, Mohammad Hossein Moaiyeri, "A Hardware- and Accuracy-Efficient Approximate Multiplier with Error Compensation for Neural Network and Image Processing Applications", Circuits, Systems, and Signal Processing, vol.41, no.12, pp.7057, 2022. https://doi.org/10.1007/s00034-022-02110-7
- Cecilia De la Parra, Andre Guntoro, Akash Kumar, Efficient Accuracy Recovery in Approximate Neural Networks by Systematic Error Modelling, ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference, January 2021, Pages 365–371, https://doi.org/10.1145/3394885.3431533, https://dl.acm.org/doi/10.1145/3394885.3431533
- Issam Hammad; Kamal El-Sankary; Jason Gu, 2019, Deep Learning Training with Simulated Approximate Multipliers. In 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), https://ieeexplore.ieee.org/abstract/document/8961780
- Issam Hammad and Kamal El-Sankary. 2018. Impact of Approximate Multipliers on VGG Deep Learning Network. IEEE Access (2018). https://ieeexplore.ieee.org/document/8488463
- Vojtech Mrazek, Zdenek Vasícek, Lukás Sekanina, Muhammad Abdullah Hanif, and Muhammad Shafique. 2019. ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining. ICCAD '19 (2019) https://arxiv.org/abs/1907.07229
- Michal Pinos, Vojtech Mrazek, and Lukás Sekanina. 2021. Evolutionary Neural Architecture Search Supporting Approximate Multipliers. In Genetic Programming-24th European Conference, EuroGP 2021, Virtual Event, April 7--9, 2021. https://arxiv.org/abs/2101.11883
- Cecilia De la Parra, Andre Guntoro, and Akash Kumar. 2020. ProxSim: GPU-based Simulation Framework for Cross-Layer Approximate DNN Optimization. In 2020 Design, Automation & Test in Europe Conference & Exhibition, DATE 2020, Grenoble, France, March 9--13, 2020. https://ieeexplore.ieee.org/abstract/document/9116476, PDF: https://cfaed.tu-dresden.de/files/Images/people/chair-pd/Papers/date_framework.pdf
- Cecilia De la Parra, Andre Guntoro, and Akash Kumar. 2020. Full Approximation of Deep Neural Networks through Efficient Optimization. In IEEE International Symposium on Circuits and Systems, ISCAS 2020, Sevilla, Spain, October 10--21, 2020 https://ieeexplore.ieee.org/document/9181236 (Evaluates over 400 different approximate multipliers.)
- Min Soo Kim; Alberto A. Del Barrio; Román Hermida; Nader Bagherzadeh, 2018, “Low-power implementation of Mitchell’s approximate logarithmic multiplication for convolutional neural networks,” in Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2018, pp. 617–622. https://ieeexplore.ieee.org/document/8297391
- U. Lotric and P. Bulic, 2011, “Logarithmic multiplier in hardware implementation of neural networks,” in International Conference on Adaptive and Natural Computing Algorithms. Springer, April 2011, pp. 158–168. https://dl.acm.org/doi/10.5555/1997052.1997071
- X Li, B Liu, RH Yang, V Courville, C Xing, VP Nia, 2023, DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization, Proceedings of the IEEE/CVF, https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DenseShift_Towards_Accurate_and_Efficient_Low-Bit_Power-of-Two_Quantization_ICCV_2023_paper.pdf (Shows how multiplication by a power of two, which could be optimized to a bitshift for integers, can also be calculated quickly for floating-point operands using integer addition on the sign and exponent bits of a floating-point number; see the sketch after this list.)
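The last entry above describes multiplying a floating-point number by a power of two using integer addition on its exponent bits. Here is a minimal float32 sketch of that trick (assuming a positive power-of-two factor and a normal, nonzero input with no exponent overflow, all of which a real implementation must check):

    #include <cstdio>
    #include <cstdint>
    #include <cstring>

    // Multiply a float by 2^k by adding k to the 8-bit exponent field.
    // Sketch only: assumes a normal, nonzero input and no exponent overflow.
    float mul_pow2(float x, int k) {
        uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float32 bits
        bits += (uint32_t)k << 23;             // exponent field starts at bit 23
        std::memcpy(&x, &bits, sizeof x);
        return x;
    }

    int main() {
        std::printf("3.5 * 2^4   = %g\n", mul_pow2(3.5f, 4));    // 56
        std::printf("10.0 * 2^-3 = %g\n", mul_pow2(10.0f, -3));  // 1.25
    }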
Approximate Caching
Caching or "memoization" is the optimization of storing the result of a computation so that it can be re-used later. Typically the cached result is reused only for an exactly identical computation, but some newer techniques cache approximate values, so as to return an approximation of the calculated value when it is reused later. An example is the use of Locality-Sensitive Hashing (LSH) to detect "near-identical" vectors, so that an entire vector dot product calculation can be cached and reused, as sketched below. See more about hashing algorithms and caching optimizations in neural networks.
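Below is a minimal sketch of this LSH-caching idea; the class shape and parameter choices are illustrative assumptions, and a practical version would also key the cache by the weight vector and bound the cache size.

    #include <cstdio>
    #include <cstdint>
    #include <random>
    #include <unordered_map>
    #include <vector>

    // Cache dot products keyed by a sign-projection LSH signature of the input.
    class ApproxDotCache {
    public:
        ApproxDotCache(size_t dim, int nbits = 16, unsigned seed = 42) {
            std::mt19937 gen(seed);
            std::normal_distribution<double> nd(0.0, 1.0);
            planes_.assign(nbits, std::vector<double>(dim));
            for (auto& p : planes_) for (auto& v : p) v = nd(gen);
        }
        // One bit per random hyperplane: which side of it the vector lies on.
        uint64_t signature(const std::vector<double>& x) const {
            uint64_t sig = 0;
            for (size_t b = 0; b < planes_.size(); b++) {
                double d = 0.0;
                for (size_t i = 0; i < x.size(); i++) d += planes_[b][i] * x[i];
                if (d > 0.0) sig |= (1ull << b);
            }
            return sig;
        }
        // Reuse a cached result for near-identical inputs, else compute it.
        // Note: keying by input signature alone assumes a fixed weight vector.
        double dot(const std::vector<double>& w, const std::vector<double>& x) {
            uint64_t key = signature(x);
            auto it = cache_.find(key);
            if (it != cache_.end()) return it->second;   // approximate reuse
            double d = 0.0;
            for (size_t i = 0; i < x.size(); i++) d += w[i] * x[i];
            cache_[key] = d;
            return d;
        }
    private:
        std::vector<std::vector<double>> planes_;
        std::unordered_map<uint64_t, double> cache_;
    };

    int main() {
        ApproxDotCache cache(3);
        std::vector<double> w = {1.0, 2.0, 3.0};
        std::printf("%g\n", cache.dot(w, {0.5, 0.5, 0.5}));   // computed: 3
        std::printf("%g\n", cache.dot(w, {0.51, 0.5, 0.5}));  // likely reused
    }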
Papers with approximate caching optimizations:
- Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR) 48, 4 (2016), 1–33. https://dl.acm.org/doi/10.1145/2893356
Advanced Number Systems and Model Inference
There are a variety of alternative mathematical number systems, such as the Residue Number System (RNS) and the Posit Number System (PNS); see advanced number systems. For an addition-based method of approximate multiplication, see the Logarithmic Number System (LNS). Papers on the use of advanced number systems with neural networks are listed below, followed by a small RNS sketch:
- G Alsuhli, V Sakellariou, H Saleh, M Al-Qutayri, Number Systems for Deep Neural Network Architectures: A Survey, 2023, https://arxiv.org/abs/2307.05035 (A very comprehensive survey.)
- Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael Mahoney, Kurt Keutzer, HAWQ-V3: Dyadic Neural Network Quantization, Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11875-11886, 2021, https://arxiv.org/abs/2011.10680 (Dyadic numbers.)
- S. Salamat, M. Imani, S. Gupta, and T. Rosing, RNSnet: In-memory neural network acceleration using residue number system, 2018, In Proceedings of the 2018 IEEE International Conference on Rebooting Computing (ICRC’18), 1–12, https://ieeexplore.ieee.org/document/8638592 (Residue Number System)
- Z. Carmichael, H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D. Kudithipudi, Deep positron: A deep neural network using the posit number system. 2019, In Proceedings of the 2019 Design, Automation, and Test in Europe Conference and Exhibition (DATE’19). 1421–1426, https://arxiv.org/abs/1812.01762 (Posit Number System)
- Zachariah Carmichael, Hamed F. Langroudi, Char Khazanov, Jeffrey Lillie, John L. Gustafson, and Dhireesha Kudithipudi, Performance-efficiency trade-off of low-precision numerical formats in deep neural networks, 2019, In Proceedings of the 2019 Conference for Next Generation Arithmetic (CoNGA’19), ACM, New York, NY, Article 3, 9 pages, https://doi.org/10.1145/3316279.3316282
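As a small illustration of the RNS approach used in papers like RNSnet (though not that paper's design), the sketch below represents integers as residues modulo pairwise-coprime moduli, so addition and multiplication proceed channel-by-channel with small, carry-free numbers; conversion back via the Chinese Remainder Theorem is the expensive step that real designs defer. The moduli chosen here are illustrative.

    #include <cstdio>
    #include <cstdint>
    #include <array>

    // Pairwise-coprime moduli; the system represents integers mod M = 251*255*256.
    constexpr std::array<int64_t, 3> MODS = {251, 255, 256};
    using Rns = std::array<int64_t, 3>;

    Rns to_rns(int64_t x) {
        Rns r;
        for (size_t i = 0; i < MODS.size(); i++)
            r[i] = ((x % MODS[i]) + MODS[i]) % MODS[i];
        return r;
    }

    // Add and multiply work channel-by-channel with small numbers, no carries.
    Rns rns_add(const Rns& a, const Rns& b) {
        Rns r;
        for (size_t i = 0; i < MODS.size(); i++) r[i] = (a[i] + b[i]) % MODS[i];
        return r;
    }
    Rns rns_mul(const Rns& a, const Rns& b) {
        Rns r;
        for (size_t i = 0; i < MODS.size(); i++) r[i] = (a[i] * b[i]) % MODS[i];
        return r;
    }

    // Chinese Remainder Theorem reconstruction (the expensive conversion step).
    int64_t from_rns(const Rns& r) {
        int64_t M = 1;
        for (int64_t m : MODS) M *= m;
        int64_t x = 0;
        for (size_t i = 0; i < MODS.size(); i++) {
            int64_t Mi = M / MODS[i];
            int64_t inv = 1;  // modular inverse of Mi mod MODS[i], by brute force
            while ((Mi % MODS[i]) * inv % MODS[i] != 1) inv++;
            x = (x + r[i] * Mi % M * inv) % M;
        }
        return x;
    }

    int main() {
        Rns a = to_rns(12345), b = to_rns(678);
        std::printf("12345*678 = %lld (exact %lld)\n",
                    (long long)from_rns(rns_mul(a, b)), 12345LL * 678LL);
    }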
Approximate Transformer Components
Research has turned to approximating the larger building-block components inside the Transformer architecture. See also high-level Transformer optimization techniques such as quantization, attention head pruning, and layer pruning. Papers on high-level approximations of Transformer components appear below, in areas such as:
- Activation function approximation
- Attention head approximation
- Softmax approximation
- Normalization approximation
Attention Head Approximation
See the research on approximate attention head architectures and attention optimization in general.
Activation Function Approximation
See approximations of activation functions.
Softmax Approximation
See research on softmax optimization and approximation.
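One example of a hardware-friendly softmax approximation (a generic sketch, not any specific paper's method) replaces e^x with the cheaper 2^x; since 2^z = e^(z ln 2), this is exactly softmax applied to the inputs scaled by ln 2, so it mildly flattens the distribution.

    #include <cmath>
    #include <cstdio>
    #include <vector>
    #include <algorithm>

    // Softmax variant using 2^x instead of e^x; equivalent to softmax(x * ln 2).
    std::vector<double> softmax_base2(std::vector<double> x) {
        double mx = *std::max_element(x.begin(), x.end());
        double sum = 0.0;
        for (double& v : x) { v = std::exp2(v - mx); sum += v; }  // subtract max for stability
        for (double& v : x) v /= sum;
        return x;
    }

    int main() {
        for (double p : softmax_base2({1.0, 2.0, 4.0})) std::printf("%.4f ", p);
        std::printf("\n");
    }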
Approximating Normalization
The normalization layer can be coded as an approximate normalization layer, or alternatively it can be removed entirely via pruned normalization.
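One well-known simplification in this direction is RMSNorm, which drops LayerNorm's mean-centering and scales by the root-mean-square alone; the sketch below treats RMSNorm as the cheaper variant (learned gain and bias parameters omitted for brevity).

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Full LayerNorm: subtract the mean, divide by the standard deviation.
    std::vector<double> layernorm(std::vector<double> x, double eps = 1e-5) {
        double mean = 0.0, var = 0.0;
        for (double v : x) mean += v;
        mean /= x.size();
        for (double v : x) var += (v - mean) * (v - mean);
        var /= x.size();
        double inv = 1.0 / std::sqrt(var + eps);
        for (double& v : x) v = (v - mean) * inv;
        return x;
    }

    // RMSNorm: skip the mean entirely; one fewer pass and subtraction per element.
    std::vector<double> rmsnorm(std::vector<double> x, double eps = 1e-5) {
        double ms = 0.0;
        for (double v : x) ms += v * v;
        double inv = 1.0 / std::sqrt(ms / x.size() + eps);
        for (double& v : x) v *= inv;
        return x;
    }

    int main() {
        std::vector<double> x = {1.0, 2.0, 3.0, 4.0};
        for (double v : layernorm(x)) std::printf("%.4f ", v);
        std::printf("\n");
        for (double v : rmsnorm(x)) std::printf("%.4f ", v);
        std::printf("\n");
    }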
Approximating Other Transformer Components
Other papers on approximations for Transformer architectures (and neural networks more generally):
- Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi, NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference, Dec 2021, https://arxiv.org/pdf/2112.02191 (Approximation using look-up tables; a generic LUT sketch appears after this list.)
- Chen, M. X., Firat, O., Bapna, A., Johnson, M., Macherey, W., Foster, G., Jones, L., Schuster, M., Shazeer, N., Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Chen, Z., Wu, Y., and Hughes, M. The best of both worlds: Combining recent advances in neural machine translation. In ACL, 2018, https://arxiv.org/abs/1804.09849 (Hybrid Transformer architectures.)
- Ma, J. and Yarats, D. On the adequacy of untuned warmup for adaptive optimization. arXiv:1910.04209, 2019. https://arxiv.org/abs/1910.04209
- J Zhong, Z Liu, X Chen, Apr 2023, Transformer-based models and hardware acceleration analysis in autonomous driving: A survey, https://arxiv.org/abs/2304.10891 (Sections on approximating various components of Transformers.)
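As a generic illustration of the look-up-table style of approximation in the NN-LUT entry above (this is not that paper's method), the sketch below tabulates GELU once and replaces each later call with an array index plus one linear interpolation; the table size and range are illustrative choices.

    #include <cmath>
    #include <cstdio>

    // Tabulate GELU once over [-8, 8]; outside that range GELU is ~0 or ~x.
    const int N = 256;
    const double LO = -8.0, HI = 8.0;
    double LUT[N + 1];

    double gelu_exact(double x) { return 0.5 * x * (1.0 + std::erf(x / std::sqrt(2.0))); }

    void build_lut() {
        for (int i = 0; i <= N; i++)
            LUT[i] = gelu_exact(LO + (HI - LO) * i / N);
    }

    // Replace the transcendental call with an index and a linear interpolation.
    double gelu_lut(double x) {
        if (x <= LO) return 0.0;   // saturated region
        if (x >= HI) return x;     // identity region
        double t = (x - LO) / (HI - LO) * N;
        int i = (int)t;
        double frac = t - i;
        return LUT[i] + frac * (LUT[i + 1] - LUT[i]);
    }

    int main() {
        build_lut();
        for (double x : {-3.0, -0.7, 0.2, 1.5})
            std::printf("x=%5.2f exact=%.6f lut=%.6f\n", x, gelu_exact(x), gelu_lut(x));
    }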
Approximate Neural Networks
More research papers on approximation used with neural networks, in general:
- Z. Peng et al. 2018. AXNet: ApproXimate computing using an end-to-end trainable neural network. 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) https://ieeexplore.ieee.org/document/8605388 (Ensemble dual-model method where one model is a fast approximation of the other.)
- Matevž Fabjančič, Octavian Machidon, Hashim Sharif, Yifan Zhao, Saša Misailović, Veljko Pejović, March 2023, Mobiprox: Supporting Dynamic Approximate Computing on Mobiles, https://arxiv.org/abs/2303.11291 (Uses probabilistic approximations, such as loop perforation, for fast neural networks on mobile.)
- Jorge Castro-Godínez, Deykel Hernández-Araya, Muhammad Shafique, Jörg Henkel, 2020, Approximate acceleration for CNN-based applications on IoT edge devices, 2020 IEEE 11th Latin American Symposium on Circuits & Systems (LASCAS), https://ieeexplore.ieee.org/document/9069040
- Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR) 48, 4 (2016), 1–33. https://dl.acm.org/doi/10.1145/2893356 (Examines some early approximate neural networks such as AxNN.)
- W Dong, G Kestor, D Li, 2023, Auto-HPCnet: An Automatic Framework to Build Neural Network-based Surrogate for High-Performance Computing Applications, HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, August 2023, Pages 31–44, https://doi.org/10.1145/3588195.3592985, https://dl.acm.org/doi/abs/10.1145/3588195.3592985
- Seungyeop Han, Haichen Shen, Matthai Philipose, Sharad Agarwal, Alec Wolman, and Arvind Krishnamurthy. MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints, In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, ACM MobiSys, Singapore, 26–30 June 2016, pp. 123–136, https://dl.acm.org/doi/10.1145/2906388.2906396
- F Manca, F Ratto, 2023, ONNX-to-Hardware Design Flow for the Generation of Adaptive Neural-Network Accelerators on FPGAs arXiv preprint arXiv:2309.13321, https://arxiv.org/pdf/2309.13321.pdf (Approximation techniques applied to edge computing.)
- HJ Damsgaard, A Ometov, J Nurmi, 2023, ACM Computing Surveys, Approximation Opportunities in Edge Computing Hardware: A Systematic Literature Review https://dl.acm.org/doi/abs/10.1145/3572772, PDF: https://dl.acm.org/doi/pdf/10.1145/3572772
- M. A. Hanif, F. Khalid, and M. Shafique, CANN: Curable approximations for high-performance deep neural network accelerators, in Proc. 56th Annu. Design Automat. Conf. (DAC). New York, NY, USA: Association for Computing Machinery, 2019, pp. 1–6. https://ieeexplore.ieee.org/document/8806937
Approximation to Avoid Redundant Computations
A simple approximate calculation can sometimes be performed as a preliminary check, so as to avoid an expensive exact calculation in the common case (see also conditional computation and caching optimizations). This method is sometimes called "common case first" or "simple case first". General papers about using this at the low level are below, with a sketch of the pattern after the list; the logical extension to the high level is "big-little models" (see ensemble architectures).
- Duvindu Piyasena, Rukshan Wickramasinghe, Debdeep Paul, Siew Kei Lam, and Meiqing Wu. 2019. Reducing dynamic power in streaming CNN hardware accelerators by exploiting computational redundancies. Proceedings 29th International Conference on Field-Programmable Logic and Applications, FPL 2019 (9 2019), 354–359, https://ieeexplore.ieee.org/document/8891989, PDF: https://siewkeilam.github.io/ei-research-group/Paper/2019H-Duvindu-FPL.pdf (Calculates a cheap approximate result first, to detect cases where the exact computation would be negative and hence reduced to zero by the ReLU activation, allowing the exact computation to be skipped.)
- Yuxiang Huan, Yifan Qin, Yantian You, Lirong Zheng, and Zhuo Zou. Sep 2016. A multiplication reduction technique with near-zero approximation for embedded learning in IoT devices. 2016 29th IEEE International System-on-Chip Conference (SOCC), 102–107. https://ieeexplore.ieee.org/abstract/document/7905445 (Avoids multiplications of near-zero small values whose result would effectively be zero, thereby skipping wasteful multiplications.)
- Maedeh Hemmat, Joshua San Miguel, and Azadeh Davoodi. 2020. AirNN: A Featherweight Framework for Dynamic Input-Dependent Approximation of CNNs. Transactions on Computer-Aided Design of Integrated Circuits and Systems. https://ieeexplore.ieee.org/document/9239327 (Approximates weight computations by pre-computing them into groups offline, and only using some of the weights in calculations during inference, effectively dynamically pruning the other weights to zero.)
- Minkyu Kim and Jae Sun Seo. 2021. An energy-efficient deep convolutional neural network accelerator featuring conditional computing and low external memory access. IEEE Journal of Solid-State Circuits 56, 3 (2021), 803–813, https://ieeexplore.ieee.org/document/9229157 (Approximate convolutions with most-significant bits are done first.)
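Here is a minimal sketch of the "common case first" pattern (the margin and the crude quantization scheme are illustrative assumptions): a cheap low-precision pass estimates the pre-activation, and the exact dot product is skipped whenever ReLU would zero the result anyway.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Build a crude low-precision copy of the weights (rounding to one decimal
    // place here, standing in for a few-bit quantization in real hardware).
    std::vector<float> quantize(const std::vector<double>& w) {
        std::vector<float> q(w.size());
        for (size_t i = 0; i < w.size(); i++)
            q[i] = (float)(std::round(w[i] * 10.0) / 10.0);
        return q;
    }

    // Common case first: a cheap low-precision pass estimates the result, and
    // if ReLU would zero it anyway (within a safety margin), skip the exact pass.
    double relu_dot(const std::vector<float>& w_lo, const std::vector<double>& w,
                    const std::vector<double>& x, double margin) {
        double est = 0.0;
        for (size_t i = 0; i < x.size(); i++) est += w_lo[i] * x[i];  // cheap pass
        if (est + margin < 0.0) return 0.0;    // common case: ReLU outputs zero
        double exact = 0.0;
        for (size_t i = 0; i < x.size(); i++) exact += w[i] * x[i];   // full pass
        return exact > 0.0 ? exact : 0.0;
    }

    int main() {
        std::vector<double> w = {0.42, -1.37, 0.08}, x = {1.0, 2.0, 3.0};
        std::vector<float> w_lo = quantize(w);
        std::printf("output = %g\n", relu_dot(w_lo, w, x, 0.1));  // exact pass skipped: 0
    }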
Approximate Computing General Research
The general idea of using approximations in computing, as a general trade-off between efficiency and accuracy, has a considerable body of research. Here are a few of the theoretical papers:
- D. Palomino, M. Shafique, A. Susin, J. Henkel, “Thermal Optimization using Adaptive Approximate Computing for Video Coding”, IEEE/ACM 19th Design, Automation and Test in Europe Conference (DATE), 2016, https://ieeexplore.ieee.org/document/7459495
- V. Mrazek, M. A. Hanif et al., “autoax: An automatic design space exploration and circuit building methodology utilizing libraries of approximate components,” in DAC’19. ACM, 2019, https://arxiv.org/abs/1902.10807
- R. Nair, “Big data needs approximate computing: technical perspective”, Communications of the ACM, 58(1): 104, 2015. https://dl.acm.org/doi/10.1145/2688072
- A. K. Mishra, R. Barik, S. Paul, “iACT: A Software-Hardware Framework for Understanding the Scope of Approximate Computing”, Workshop on Approximate Computing Across the System Stack (WACAS), 2014. PDF: https://sampa.cs.washington.edu/wacas14/papers/mishra.pdf
- H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, “Architecture support for disciplined approximate programming”, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012. PDF: https://www.cs.cornell.edu/~asampson/media/papers/truffle-asplos2012.pdf
- V. Chippa, S. Chakradhar, K. Roy, and A. Raghunathan, “Analysis and characterization of inherent application resilience for approximate computing”, Design Automation Conference (DAC), 2013. https://ieeexplore.ieee.org/document/6560706
- J. Choi and S. Venkataramani, Approximate Computing Techniques for Deep Neural Networks. Cham: Springer, 2019, pp. 307–329, Chapter 15, https://link.springer.com/chapter/10.1007/978-3-319-99322-5_15
- S. Venkataramani, S. T. Chakradhar, K. Roy, and A. Raghunathan, Approximate computing and the quest for computing efficiency, Proceedings of the 52nd Annual Design Automation Conference, ACM (2015), p. 120, https://ieeexplore.ieee.org/document/7167251
- G. Pekhimenko, D. Koutra, K. Qian, “Approximate computing: Application analysis and hardware design”, May 2013, PDF: www.cs.cmu.edu/~gpekhime/Projects/15740/paper.pdf
- Weiqiang Liu, Fabrizio Lombardi (Book Editors), Approximate Computing, 2022, https://link.springer.com/book/10.1007/978-3-030-98347-5, https://www.amazon.com/Approximate-Computing-Weiqiang-Liu-ebook/dp/B0BBKR65SB/
- Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR) 48, 4 (2016), 1–33. https://dl.acm.org/doi/10.1145/2893356
- VV Kulkarni, 2020, Approximate computing techniques for accelerating compute intensive workloads, https://www.ideals.illinois.edu/items/115960, PDF: https://www.ideals.illinois.edu/items/115960/bitstreams/379143/object?dl=1
- Amir Yazdanbakhsh; Divya Mahajan; Hadi Esmaeilzadeh; Pejman Lotfi-Kamran, 2017, AxBench: A multiplatform benchmark suite for approximate computing, IEEE Design & Test, Volume 34, Issue 2, April 2017, https://ieeexplore.ieee.org/abstract/document/7755728/, PDF: https://ieeexplore.ieee.org/ielaam/6221038/7862860/7755728-aam.pdf, PDF: http://axbench.org/papers/dt.darksilicon16-camera.pdf
- Michael Ringenburg, Adrian Sampson, Isaac Ackerman, Luis Ceze, and Dan Grossman. 2015. Monitoring and debugging the quality of results in approximate programs. In International Conference on Architectural Support for Programming Languages and Operating Systems. 399–411. https://dl.acm.org/doi/10.1145/2775054.2694365, PDF: https://homes.cs.washington.edu/~luisceze/publications/approxdebug-asplos15.pdf
- Thomas Y. Yeh, Petros Faloutsos, Milos Ercegovac, Sanjay J. Patel, and Glenn Reinman. 2007. The art of deception: Adaptive precision reduction for area efficient physics acceleration. 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007). pp. 394–406. https://ieeexplore.ieee.org/document/4408271
- Mehrzad Samadi, Davoud Anoushe Jamshidi, Janghaeng Lee, and Scott Mahlke. 2014. Paraprox: Pattern-based approximation for data parallel applications. In ACM SIGARCH Computer Architecture News, Vol. 42. 35–50, https://dl.acm.org/doi/10.1145/2654822.2541948
- Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel, 2022, Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey, ACM Computing Surveys, Volume 55, Issue 4, No. 83, pp 1–36 https://doi.org/10.1145/3527156, https://dl.acm.org/doi/10.1145/3527156, https://arxiv.org/abs/2203.08737
- Tae Jun Ham, Sung Jun Jung, Seonghak Kim, Young H Oh, Yeonhong Park, Yoonho Song, Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W Lee, et al. A^3: Accelerating attention mechanisms in neural networks with approximation. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 328–341. IEEE, 2020. https://arxiv.org/abs/2002.10941
- Dimitrios Danopoulos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel, 12 Feb 2024, TransAxx: Efficient Transformers with Approximate Computing, https://arxiv.org/abs/2402.07545 (Using approximations in Vision Transformer architectures.)
- Salar Shakibhamedan, Amin Aminifar, Nima TaheriNejad, Axel Jantsch, 2024, EASE: Energy Optimization through Adaptation — A Review of Runtime Energy-Aware Approximate Deep Learning Algorithms, https://eclectx.org/Publications/2024_M13.pdf (Survey paper on techniques for adaptive inference with a focus on approximations of inference, including loop perforation, stochastic algorithms, approximate arithmetic, quantization, pruning, and low-rank factorization.)
- John Fraser Hart, Jul 1, 1978, Computer Approximations, https://www.amazon.com/Computer-Approximations-John-Fraser-Hart/dp/0882756427/
- Teofilo F. Gonzalez, Sep 30, 2020, Handbook of Approximation Algorithms and Metaheuristics, Second Edition: Two-Volume Set (Chapman & Hall/CRC Computer and Information Science Series), https://www.amazon.com/Handbook-Approximation-Algorithms-Metaheuristics-Second/dp/0367570289/
- Ivan Markovsky, Aug 3, 2018, Low-Rank Approximation: Algorithms, Implementation, Applications (Communications and Control Engineering) Part of: Communications and Control Engineering (62 books), https://www.amazon.com/Low-Rank-Approximation-Implementation-Applications-Communications/dp/3319896199/
- Vijay V. Vazirani, Jul 2, 2001, Approximation Algorithms, https://www.amazon.com/Approximation-Algorithms-Vijay-V-Vazirani/dp/3540653678/
- David P. Williamson and David B. Shmoys, Apr 26, 2011, The Design of Approximation Algorithms, https://www.amazon.com/Design-Approximation-Algorithms-David-Williamson-ebook/dp/B009019XCG/
- A. M. Dalloo, A. J. Humaidi, A. K. A. Mhdawi and H. Al-Raweshidy, "Approximate Computing: Concepts, Architectures, Challenges, Applications, and Future Directions," in IEEE Access, doi: 10.1109/ACCESS.2024.3467375. https://ieeexplore.ieee.org/document/10693435
More AI Research
Read more about:
- Advanced AI Mathematics
- Zero-Multiplication Models
- Matrix Algebra
- Logarithmic Models
- Inference Optimizations
- « Research Home