Aussie AI
Approximate Computing for Faster AI
-
Last Updated 29 September, 2024
-
by David Spuler, Ph.D.
Approximate computing is a longstanding technique for improving speed at the cost of accuracy, used in many areas of Computer Science. The idea has recently garnered much interest in the AI research community, with many papers on approximation both for speeding up low-level arithmetic (i.e., the multiplication bottleneck) and at the higher level of whole model components.
Approximate Multiplication. Multiplication can be sped up using approximate algorithms in software and/or hardware. Some of the areas where approximate arithmetic can improve model inference include:
- Approximate multiplication algorithms for faster arithmetic (see below and also advanced mathematics).
- Approximate matrix multiplication algorithms such as low-rank factorization (see matrix algebra).
- Logarithmic number system (LNS) (replaces multiplication with addition, but is approximate; see the sketch after this list).
- Other number systems: RNS, PNS, Dyadic numbers (see advanced math).
- Other approximate arithmetic: approximate division, approximate addition (see approximate arithmetic).
- Additive inference engines (including "AdderNets"), look-up tables (LUTs), and other multiplication-free inference (see zero-multiplication inference algorithms).
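To make the LNS idea concrete, below is a minimal C++ sketch (illustrative only) of a dot product computed in the log domain, where each multiplication becomes a single addition of logarithms. The struct layout and the exact log2/exp2 library calls are assumptions made for clarity; real LNS designs approximate the antilog step and handle signs and zeros in hardware.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // A number in the log domain: a zero flag, a sign, and log2 of the magnitude.
    struct LnsValue { bool zero; int sign; double log2mag; };

    LnsValue to_lns(double v) {
        if (v == 0.0) return {true, +1, 0.0};
        return {false, v < 0.0 ? -1 : +1, std::log2(std::fabs(v))};
    }

    // Dot product in the log domain: each multiplication is one addition of logs.
    // The antilog (exp2) before accumulation is the step that real LNS hardware
    // approximates; exact exp2 is used here for clarity.
    double lns_dot(const std::vector<LnsValue>& w, const std::vector<LnsValue>& x) {
        double acc = 0.0;
        for (size_t i = 0; i < w.size(); i++) {
            if (w[i].zero || x[i].zero) continue;
            double log_prod = w[i].log2mag + x[i].log2mag;   // multiply becomes add
            acc += (w[i].sign * x[i].sign) * std::exp2(log_prod);
        }
        return acc;
    }

    int main() {
        std::vector<double> wv = {0.5, -1.25, 2.0}, xv = {4.0, 3.0, -0.75};
        std::vector<LnsValue> w, x;
        for (double v : wv) w.push_back(to_lns(v));
        for (double v : xv) x.push_back(to_lns(v));
        std::printf("LNS dot = %g (exact = %g)\n",
                    lns_dot(w, x), 0.5 * 4.0 + -1.25 * 3.0 + 2.0 * -0.75);
    }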
Approximate Components. Some higher-level Transformer components are also being considered for acceleration via approximation:
- Approximating attention heads with simpler versions (or removing them entirely via head pruning)
- Approximating GELU and other activation functions (see the sketch after this list)
- Approximating SoftMax
- Approximate normalization functions
- Approximate top-k algorithms
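As an example of the activation-function case, the widely used tanh-based approximation of GELU (from Hendrycks & Gimpel) replaces the relatively expensive erf call with cheaper operations. A minimal comparison sketch:

    #include <cmath>
    #include <cstdio>

    const double kPi = 3.14159265358979323846;

    // Exact GELU, using the Gaussian CDF via erf.
    double gelu_exact(double x) {
        return 0.5 * x * (1.0 + std::erf(x / std::sqrt(2.0)));
    }

    // Widely used tanh-based approximation; avoids the erf call.
    double gelu_tanh(double x) {
        double inner = std::sqrt(2.0 / kPi) * (x + 0.044715 * x * x * x);
        return 0.5 * x * (1.0 + std::tanh(inner));
    }

    int main() {
        for (double x : {-2.0, -0.5, 0.0, 1.0, 3.0})
            std::printf("x=%5.2f exact=%.6f tanh-approx=%.6f\n",
                        x, gelu_exact(x), gelu_tanh(x));
    }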
Approximate Multipliers for Faster Model Inference
There has been an explosion of papers on approximate multiplication algorithms and their use in model inference and training. For analysis of low-level approximate multiplication algorithms and their theory, including logarithmic approximate multiplication and non-logarithmic approximate multiplication, see advanced AI mathematics. Also related are the Logarithmic Number System (LNS) and other less common number systems such as Dyadic numbers, the Residue Number System (RNS) and the Posit Number System (PNS); see advanced number systems. See also additive neural networks and multiplier-free inference.
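To illustrate the logarithmic approach that many of the papers below build upon, here is a minimal software sketch of Mitchell's classic approximate multiplication for positive integers, which uses the approximation log2(1+f) ≈ f in both directions; this is an illustrative version, not any particular paper's hardware design.

    #include <cstdio>
    #include <cstdint>

    // Position of the leading 1 bit (floor of log2) for a positive integer.
    static int ilog2(uint32_t v) {
        int k = 0;
        while (v >>= 1) k++;
        return k;
    }

    // Mitchell's approximate multiply: add the approximate logs, then apply the
    // matching antilog approximation. Always underestimates, with a worst-case
    // error of roughly 11%, which later papers reduce with correction terms.
    double mitchell_mul(uint32_t a, uint32_t b) {
        int ka = ilog2(a), kb = ilog2(b);
        double fa = (double)a / (1u << ka) - 1.0;   // fractional part, in [0,1)
        double fb = (double)b / (1u << kb) - 1.0;
        double s = fa + fb;
        if (s < 1.0)
            return (double)(1ull << (ka + kb)) * (1.0 + s);
        else
            return (double)(1ull << (ka + kb + 1)) * s;
    }

    int main() {
        std::printf("100*100: exact=10000 approx=%g\n", mitchell_mul(100, 100));
        std::printf("127*93 : exact=%d approx=%g\n", 127 * 93, mitchell_mul(127, 93));
    }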
AI Approximate Multiplication Research: Papers focused on the specific use of approximate multiplication algorithms for neural networks and Transformers include:
- S. S. Sarwar, S. Venkataramani et al., “Energy-efficient neural computing with approximate multipliers,” J. Emerg. Technol. Comput. Syst., vol. 14, no. 2, pp. 16:1–16:23, Jul. 2018, https://dl.acm.org/doi/10.1145/3097264
- Q. Zhang, T. Wang, Y. Tian, F. Yuan, and Q. Xu, “ApproxANN: An approximate computing framework for artificial neural network,” in DATE’15, March 2015, pp. 701–706, https://ieeexplore.ieee.org/document/7092478
- M. A. Hanif, R. Hafiz, and M. Shafique, Error resilience analysis for systematically employing approximate computing in convolutional neural networks, Design, Automation and Test in Europe Conference and Exhibition (DATE), 2018, IEEE (2018), pp. 913–916, https://ieeexplore.ieee.org/document/8342139
- M. A. Hanif, A. Marchisio et al., “X-DNNs: Systematic cross-layer approximations for energy-efficient deep neural networks,” Journal of Low Power Electronics, vol. 14, no. 4, pp. 520–534, Dec. 2018. https://www.semanticscholar.org/paper/X-DNNs:-Systematic-Cross-Layer-Approximations-for-Hanif-Marchisio/5ddaf1aff7d5a4a3484963849828c8d2d1315bc3
- V. Mrazek, S. S. Sarwar, L. Sekanina, Z. Vasicek, and K. Roy, Design of power-efficient approximate multipliers for approximate artificial neural networks, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November (2016), pp. 1–7, https://ieeexplore.ieee.org/document/7827658
- S. Kim, P. Howe, T. Moreau, A. Alaghi, L. Ceze, and V. Sathe, MATIC: Learning Around Errors for Efficient Low-Voltage Neural Network Accelerators, Design, Automation and Test in Europe Conference and Exhibition (DATE), 2018, IEEE (2018), pp. 1–6, https://arxiv.org/abs/1706.04332
- S. De, J. Huisken, and H. Corporaal, “Designing energy efficient approximate multipliers for neural acceleration,” in 2018 21st Euromicro Conference on Digital System Design (DSD). IEEE, 2018, pp. 288–295, https://ieeexplore.ieee.org/document/8491830
- X. He, L. Ke, W. Lu, G. Yan, and X. Zhang, Axtrain: Hardware-oriented neural network training for approximate inference. arXiv preprint arXiv:1805.08309 (2018), https://arxiv.org/abs/1805.08309v1
- P. Gysel, J. Pimentel et al., “Ristretto: A framework for empirical study of resource-efficient inference in convolutional neural networks,” IEEE Trans. Neural Netw. Learn. Syst., 2018, https://ieeexplore.ieee.org/abstract/document/8318896
- Min Soo Kim; Alberto A. Del Barrio; Leonardo Tavares Oliveira; Román Hermida; Nader Bagherzadeh, "Efficient Mitchell’s Approximate Log Multipliers for Convolutional Neural Networks", IEEE Transactions on Computers, Volume 68 Issue 5, p.660-675, November 2018, https://ieeexplore.ieee.org/abstract/document/8532287
- T. Mogami, Deep neural network training without multiplications, In Beyond BackPropagation WS at 34th Conference on Neural Information Processing Systems, 2020, https://arxiv.org/abs/2012.03458 (multiplication of floating-point numbers with integer addition, using Mitchell's approximate multiplication)
- Lingyun Yao, Martin Trapp, Karthekeyan Periasamy, Jelin Leslin, Gaurav Singh, Martin Andraud, June 2023, Logarithm-Approximate Floating-Point Multiplier for Hardware-efficient Inference in Probabilistic Circuits, Proceedings of The 6th Workshop on Tractable Probabilistic Modeling, https://openreview.net/forum?id=WL7YDLOLfK, PDF: https://openreview.net/pdf?id=WL7YDLOLfK (Probabilistic speed improvement; uses Mogami's approximate multiplier.)
- T. Hokchhay, S. Hashemi, R. I. Bahar, and S. Reda, “Hardware-software codesign of accurate, multiplier-free deep neural networks,” in Proc. 54th Annu. Design Autom. Conf. (DAC), 2017, pp. 1–6., https://arxiv.org/abs/1705.04288
- M. S. Ansari, B. F. Cockburn, and J. Han, “An improved logarithmic multiplier for energy-efficient neural computing,” IEEE Transactions on Computers, vol. 70, no. 4, pp. 614–625, 2020, https://ieeexplore.ieee.org/document/9086744
- U. Lotric and P. Bulic, "Applicability of approximate multipliers in hardware neural networks," Neurocomput., vol. 96, pp. 57–65, Nov. 2012, https://dl.acm.org/doi/10.1016/j.neucom.2011.09.039
- Z. Du, K. Palem, A. Lingamneni, O. Temam, Y. Chen, and C. Wu, "Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators," in Proc. 19th Asia South Pacific Des. Autom. Conf., 2014, pp. 201–206, https://pages.saclay.inria.fr/olivier.temam/files/eval/DLCPTW2014.pdf
- S. S. Sarwar, S. Venkataramani, A. Raghunathan, and K. Roy, "Multiplier-less artificial neurons exploiting error resiliency for energy-efficient neural computing," in Proc. Des. Autom. Test Eur. Conf. Exhib., 2016, pp. 145–150, https://arxiv.org/abs/1602.08557
- J. Choi and S. Venkataramani, Approximate Computing Techniques for Deep Neural Networks. Cham: Springer, 2019, pp. 307–329, Chapter 15, https://link.springer.com/chapter/10.1007/978-3-319-99322-5_15
- M. S. Ansari, V. Mrazek, B. F. Cockburn, L. Sekanina, Z. Vasicek, and J. Han, 2019, “Improving the accuracy and hardware efficiency of neural networks using approximate multipliers,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 2, pp. 317–328, Oct 2019, https://ieeexplore.ieee.org/document/8863138
- Biyanu Zerom, Mohammed Tolba, Huruy Tesfai, Hani Saleh, Mahmoud Al-Qutayri, Thanos Stouraitis, Baker Mohammad, Ghada Alsuhli, 2022, Approximate Logarithmic Multiplier For Convolutional Neural Network Inference With Computational Reuse, 2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS), 24-26 October 2022, https://doi.org/10.1109/ICECS202256217.2022.9970861, https://ieeexplore.ieee.org/abstract/document/9970861/
- Tso-Bing Juang; Cong-Yi Lin; Guan-Zhong Lin, 2018, “Area-delay product efficient design for convolutional neural network circuits using logarithmic number systems,” in International SoC Design Conference (ISOCC). IEEE, 2018, pp. 170–171, https://ieeexplore.ieee.org/abstract/document/8649961
- Ourania Spantidi, Iraklis Anagnostopoulos, "The Perfect Match: Selecting Approximate Multipliers for Energy-Efficient Neural Network Inference", 2023 IEEE 24th International Conference on High Performance Switching and Routing (HPSR), pp.27-32, 2023. https://ieeexplore.ieee.org/document/10147918
- O. Spantidi, G. Zervakis, I. Anagnostopoulos, H. Amrouch and J. Henkel, "Positive/negative approximate multipliers for dnn accelerators", arXiv preprint arXiv:2107.09366, 2021. https://arxiv.org/abs/2107.09366 (Approximate multiplication for DNNs without needing retraining.)
- Vojtech Mrazek, "Approximation of Hardware Accelerators driven by Machine-Learning Models: (Embedded Tutorial)", 2023 26th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), pp.91-92, 2023. https://ieeexplore.ieee.org/document/10139484
- Michal Pinos, Vojtech Mrazek, Filip Vaverka, Zdenek Vasicek, Lukas Sekanina, "Acceleration Techniques for Automated Design of Approximate Convolutional Neural Networks", IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol.13, no.1, pp.212-224, 2023. https://ieeexplore.ieee.org/document/10011413
- Mohammad Hasan Ahmadilivani, Mario Barbareschi, Salvatore Barone, Alberto Bosio, Masoud Daneshtalab, Salvatore Della Torca, Gabriele Gavarini, Maksim Jenihhin, Jaan Raik, Annachiara Ruospo, Ernesto Sanchez, Mahdi Taheri, "Special Session: Approximation and Fault Resiliency of DNN Accelerators", 2023 IEEE 41st VLSI Test Symposium (VTS), pp.1-10, 2023. https://ieeexplore.ieee.org/document/10140043
- Zahra Ebrahimi, Muhammad Zaid, Mark Wijtvliet, Akash Kumar, "RAPID: Approximate Pipelined Soft Multipliers and Dividers for High Throughput and Energy Efficiency", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.42, no.3, pp.712-725, 2023. https://ieeexplore.ieee.org/document/9802734
- U. Anil Kumar, Pavankumar Bikki, Sreehari Veeramachaneni, Syed Ershad Ahmed, "Power Efficient Approximate Multiplier Architectures for Error Resilient Applications", 2022 IEEE 19th India Council International Conference (INDICON), pp.1-5, 2022. https://ieeexplore.ieee.org/document/10039748
- Qiao Shen, Renyuan Zhang, Hao Zhang, Hao Cai, Bo Liu, Jian Xiao, "A CGP-based Efficient Approximate Multiplier with Error Compensation", 2022 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), pp.48-49, 2022. https://ieeexplore.ieee.org/document/9963083
- Siyuan Liang, Ke Chen, Bi Wu, Weiqiang Liu, "A Survey of Approximation based Hardware Acceleration Techniques for Deep Neural Networks (Invited)", 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology (ICSICT), pp.1-4, 2022. https://ieeexplore.ieee.org/document/9963257
- Zhen Li, Su Zheng, Jide Zhang, Yao Lu, Jingbo Gao, Jun Tao, Lingli Wang, "Adaptable Approximate Multiplier Design Based on Input Distribution and Polarity", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.30, no.12, pp.1813-1826, 2022. https://ieeexplore.ieee.org/document/9861394
- Ourania Spantidi, Georgios Zervakis, Iraklis Anagnostopoulos, Jörg Henkel, "Energy-Efficient DNN Inference on Approximate Accelerators Through Formal Property Exploration", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.41, no.11, pp.3838-3849, 2022. https://ieeexplore.ieee.org/document/9852790
- Ourania Spantidi, Iraklis Anagnostopoulos, "How much is too much error? Analyzing the impact of approximate multipliers on DNNs", 2022 23rd International Symposium on Quality Electronic Design (ISQED), pp.1-6, 2022. https://ieeexplore.ieee.org/document/9806282
- Hao Zhang, Seok-Bum Ko, "Variable-Precision Approximate Floating-Point Multiplier for Efficient Deep Learning Computation", IEEE Transactions on Circuits and Systems II: Express Briefs, vol.69, no.5, pp.2503-2507, 2022. https://ieeexplore.ieee.org/document/9739768
- S Raghuram, N Shashank, "Approximate Adders for Deep Neural Network Accelerators", 2022 35th International Conference on VLSI Design and 2022 21st International Conference on Embedded Systems (VLSID), pp.210-215, 2022. https://ieeexplore.ieee.org/document/9885998
- Georgios Zervakis, Iraklis Anagnostopoulos, Sami Salamin, Ourania Spantidi, Isai Roman-Ballesteros, Jörg Henkel, Hussam Amrouch, "Thermal-Aware Design for Approximate DNN Accelerators", IEEE Transactions on Computers, vol.71, no.10, pp.2687-2697, 2022. https://ieeexplore.ieee.org/document/9672753
- Tao Li, Yitao Ma, Ko Yoshikawa, Osamu Nomura, Tetsuo Endoh, "Energy-Efficient Convolution Module With Flexible Bit-Adjustment Method and ADC Multiplier Architecture for Industrial IoT", IEEE Transactions on Industrial Informatics, vol.18, no.5, pp.3055-3065, 2022. https://ieeexplore.ieee.org/document/9519513
- Tong Li, Hong-Lan Jiang, Hai Mo, Jie Han, Lei-Bo Liu, Zhi-Gang Mao, "Approximate Processing Element Design and Analysis for the Implementation of CNN Accelerators", Journal of Computer Science and Technology, vol.38, no.2, pp.309, 2023. https://doi.org/10.1007/s11390-023-2548-8
- M. Esmali Nojehdeh, L. Aksoy, M. Altun, Efficient hardware implementation of artificial neural networks using approximate multiply-accumulate blocks, in 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (2020), pp. 96–101, https://ieeexplore.ieee.org/document/9154973
- Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel, "Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey", ACM Computing Surveys, vol.55, no.4, pp.1, 2023. https://doi.org/10.1145/3527156, https://arxiv.org/abs/2203.08737 (Survey of many approximate techniques in AI.)
- Anjankar, S., Hemant Gillurkar, Joshi, P., & Dwaramwar, P. (2022). Design and Analysis of Multipliers for DNN application using approximate 4:2 Compressors. International Journal of Next-Generation Computing, 13(5). https://doi.org/10.47164/ijngc.v13i5.918, https://ijngc.perpetualinnovation.net/index.php/ijngc/article/view/918
- Hao Zhang, Mohammadreza Asadikouhanjani, Jie Han, Deivalakshmi Subbian, Seok-Bum Ko, "Approximate Computing for Efficient Neural Network Computation: A Survey", In: Approximate Computing, Editors: Weiqiang Liu, Fabrizio Lombardi, pp.397, 2022. https://doi.org/10.1007/978-3-030-98347-5_16, Amazon: https://www.amazon.com/Approximate-Computing-Weiqiang-Liu-ebook/dp/B0BBKR65SB/
- Sudeh Shirkavand Saleh Abad, Mohammad Hossein Moaiyeri, "A Hardware- and Accuracy-Efficient Approximate Multiplier with Error Compensation for Neural Network and Image Processing Applications", Circuits, Systems, and Signal Processing, vol.41, no.12, pp.7057, 2022. https://doi.org/10.1007/s00034-022-02110-7
- Cecilia De la Parra, Andre Guntoro, Akash Kumar, Efficient Accuracy Recovery in Approximate Neural Networks by Systematic Error Modelling, ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference, January 2021, Pages 365–371, https://doi.org/10.1145/3394885.3431533, https://dl.acm.org/doi/10.1145/3394885.3431533
- Issam Hammad; Kamal El-Sankary; Jason Gu, 2019, Deep Learning Training with Simulated Approximate Multipliers. In 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), https://ieeexplore.ieee.org/abstract/document/8961780
- Issam Hammad and Kamal El-Sankary. 2018. Impact of Approximate Multipliers on VGG Deep Learning Network. IEEE Access (2018). https://ieeexplore.ieee.org/document/8488463
- Vojtech Mrazek, Zdenek Vasícek, Lukás Sekanina, Muhammad Abdullah Hanif, and Muhammad Shafique. 2019. ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining. ICCAD '19 (2019) https://arxiv.org/abs/1907.07229
- Michal Pinos, Vojtech Mrazek, and Lukás Sekanina. 2021. Evolutionary Neural Architecture Search Supporting Approximate Multipliers. In Genetic Programming-24th European Conference, EuroGP 2021, Virtual Event, April 7--9, 2021. https://arxiv.org/abs/2101.11883
- Cecilia De la Parra, Andre Guntoro, and Akash Kumar. 2020. ProxSim: GPU-based Simulation Framework for Cross-Layer Approximate DNN Optimization. In 2020 Design, Automation & Test in Europe Conference & Exhibition, DATE 2020, Grenoble, France, March 9--13, 2020. https://ieeexplore.ieee.org/abstract/document/9116476, PDF: https://cfaed.tu-dresden.de/files/Images/people/chair-pd/Papers/date_framework.pdf
- Cecilia De la Parra, Andre Guntoro, and Akash Kumar. 2020. Full Approximation of Deep Neural Networks through Efficient Optimization. In IEEE International Symposium on Circuits and Systems, ISCAS 2020, Sevilla, Spain, October 10--21, 2020 https://ieeexplore.ieee.org/document/9181236 (Evaluates over 400 different approximate multipliers.)
- Min Soo Kim; Alberto A. Del Barrio; Román Hermida; Nader Bagherzadeh, 2018, “Low-power implementation of Mitchell’s approximate logarithmic multiplication for convolutional neural networks,” in Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2018, pp. 617–622. https://ieeexplore.ieee.org/document/8297391
- U. Lotric and P. Bulic, 2011, “Logarithmic multiplier in hardware implementation of neural networks,” in International Conference on Adaptive and Natural Computing Algorithms. Springer, April 2011, pp. 158–168. https://dl.acm.org/doi/10.5555/1997052.1997071
- X Li, B Liu, RH Yang, V Courville, C Xing, VP Nia, 2023, DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization, Proceedings of the IEEE/CVF, https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DenseShift_Towards_Accurate_and_Efficient_Low-Bit_Power-of-Two_Quantization_ICCV_2023_paper.pdf (Shows how multiplication by a power of two, which could be optimized to a bitshift for integers, can also be calculated quickly for floating-point operands using integer addition on the sign and exponent bits of a floating-point number; see the sketch after this list.)
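The last entry above describes multiplying a floating-point number by a power of two using integer addition on its exponent bits. Here is a minimal float32 sketch of that trick (assuming a positive power-of-two factor and a normal, nonzero input with no exponent overflow, all of which a real implementation must check):

    #include <cstdio>
    #include <cstdint>
    #include <cstring>

    // Multiply a float by 2^k by adding k to the 8-bit exponent field.
    // Sketch only: assumes a normal, nonzero input and no exponent overflow.
    float mul_pow2(float x, int k) {
        uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float32 bits
        bits += (uint32_t)k << 23;             // exponent field starts at bit 23
        std::memcpy(&x, &bits, sizeof x);
        return x;
    }

    int main() {
        std::printf("3.5 * 2^4   = %g\n", mul_pow2(3.5f, 4));    // 56
        std::printf("10.0 * 2^-3 = %g\n", mul_pow2(10.0f, -3));  // 1.25
    }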
Approximate Caching
Caching or "memoization" is the optimization of storing the result of a computation so that it can be re-used later. Typically the cached result is reused only for an exactly identical computation, but some newer techniques cache approximate values, so as to return an approximation of the calculated value when it is reused later. An example is the use of Locality-Sensitive Hashing (LSH) to detect "near-identical" vectors, so that an entire vector dot product calculation can be cached and reused, as sketched below. See more about hashing algorithms and caching optimizations in neural networks.
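Below is a minimal sketch of this LSH-caching idea; the class shape and parameter choices are illustrative assumptions, and a practical version would also key the cache by the weight vector and bound the cache size.

    #include <cstdio>
    #include <cstdint>
    #include <random>
    #include <unordered_map>
    #include <vector>

    // Cache dot products keyed by a sign-projection LSH signature of the input.
    class ApproxDotCache {
    public:
        ApproxDotCache(size_t dim, int nbits = 16, unsigned seed = 42) {
            std::mt19937 gen(seed);
            std::normal_distribution<double> nd(0.0, 1.0);
            planes_.assign(nbits, std::vector<double>(dim));
            for (auto& p : planes_) for (auto& v : p) v = nd(gen);
        }
        // One bit per random hyperplane: which side of it the vector lies on.
        uint64_t signature(const std::vector<double>& x) const {
            uint64_t sig = 0;
            for (size_t b = 0; b < planes_.size(); b++) {
                double d = 0.0;
                for (size_t i = 0; i < x.size(); i++) d += planes_[b][i] * x[i];
                if (d > 0.0) sig |= (1ull << b);
            }
            return sig;
        }
        // Reuse a cached result for near-identical inputs, else compute it.
        // Note: keying by input signature alone assumes a fixed weight vector.
        double dot(const std::vector<double>& w, const std::vector<double>& x) {
            uint64_t key = signature(x);
            auto it = cache_.find(key);
            if (it != cache_.end()) return it->second;   // approximate reuse
            double d = 0.0;
            for (size_t i = 0; i < x.size(); i++) d += w[i] * x[i];
            cache_[key] = d;
            return d;
        }
    private:
        std::vector<std::vector<double>> planes_;
        std::unordered_map<uint64_t, double> cache_;
    };

    int main() {
        ApproxDotCache cache(3);
        std::vector<double> w = {1.0, 2.0, 3.0};
        std::printf("%g\n", cache.dot(w, {0.5, 0.5, 0.5}));   // computed: 3
        std::printf("%g\n", cache.dot(w, {0.51, 0.5, 0.5}));  // likely reused
    }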
Papers with approximate caching optimizations:
- Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR) 48, 4 (2016), 1–33. https://dl.acm.org/doi/10.1145/2893356
Advanced Number Systems and Model Inference
There are a variety of alternative mathematical number systems, such as the Residue Number System (RNS) and the Posit Number System (PNS); see advanced number systems. For an addition-based method of approximate multiplication, see the Logarithmic Number System (LNS). Papers on the use of advanced number systems with neural networks are listed below, followed by a small RNS sketch:
- G Alsuhli, V Sakellariou, H Saleh, M Al-Qutayri, Number Systems for Deep Neural Network Architectures: A Survey, 2023, https://arxiv.org/abs/2307.05035 (A very comprehensive survey.)
- Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael Mahoney, Kurt Keutzer, HAWQ-V3: Dyadic Neural Network Quantization, Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11875-11886, 2021, https://arxiv.org/abs/2011.10680 (Dyadic numbers.)
- S. Salamat, M. Imani, S. Gupta, and T. Rosing, RNSnet: In-memory neural network acceleration using residue number system, 2018, In Proceedings of the 2018 IEEE International Conference on Rebooting Computing (ICRC’18), 1–12, https://ieeexplore.ieee.org/document/8638592 (Residue Number System)
- Z. Carmichael, H. F. Langroudi, C. Khazanov, J. Lillie, J. L. Gustafson, and D. Kudithipudi, Deep positron: A deep neural network using the posit number system. 2019, In Proceedings of the 2019 Design, Automation, and Test in Europe Conference and Exhibition (DATE’19). 1421–1426, https://arxiv.org/abs/1812.01762 (Posit Number System)
- Zachariah Carmichael, Hamed F. Langroudi, Char Khazanov, Jeffrey Lillie, John L. Gustafson, and Dhireesha Kudithipudi, Performance-efficiency trade-off of low-precision numerical formats in deep neural networks, 2019, In Proceedings of the 2019 Conference for Next Generation Arithmetic (CoNGA’19), ACM, New York, NY, Article 3, 9 pages, https://doi.org/10.1145/3316279.3316282
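As a small illustration of the RNS approach used in papers like RNSnet (though not that paper's design), the sketch below represents integers as residues modulo pairwise-coprime moduli, so addition and multiplication proceed channel-by-channel with small, carry-free numbers; conversion back via the Chinese Remainder Theorem is the expensive step that real designs defer. The moduli chosen here are illustrative.

    #include <cstdio>
    #include <cstdint>
    #include <array>

    // Pairwise-coprime moduli; the system represents integers mod M = 251*255*256.
    constexpr std::array<int64_t, 3> MODS = {251, 255, 256};
    using Rns = std::array<int64_t, 3>;

    Rns to_rns(int64_t x) {
        Rns r;
        for (size_t i = 0; i < MODS.size(); i++)
            r[i] = ((x % MODS[i]) + MODS[i]) % MODS[i];
        return r;
    }

    // Add and multiply work channel-by-channel with small numbers, no carries.
    Rns rns_add(const Rns& a, const Rns& b) {
        Rns r;
        for (size_t i = 0; i < MODS.size(); i++) r[i] = (a[i] + b[i]) % MODS[i];
        return r;
    }
    Rns rns_mul(const Rns& a, const Rns& b) {
        Rns r;
        for (size_t i = 0; i < MODS.size(); i++) r[i] = (a[i] * b[i]) % MODS[i];
        return r;
    }

    // Chinese Remainder Theorem reconstruction (the expensive conversion step).
    int64_t from_rns(const Rns& r) {
        int64_t M = 1;
        for (int64_t m : MODS) M *= m;
        int64_t x = 0;
        for (size_t i = 0; i < MODS.size(); i++) {
            int64_t Mi = M / MODS[i];
            int64_t inv = 1;  // modular inverse of Mi mod MODS[i], by brute force
            while ((Mi % MODS[i]) * inv % MODS[i] != 1) inv++;
            x = (x + r[i] * Mi % M * inv) % M;
        }
        return x;
    }

    int main() {
        Rns a = to_rns(12345), b = to_rns(678);
        std::printf("12345*678 = %lld (exact %lld)\n",
                    (long long)from_rns(rns_mul(a, b)), 12345LL * 678LL);
    }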
Approximate Transformer Components
Research has turned to approximating the larger building-block components inside the Transformer architecture. See also high-level Transformer optimization techniques such as quantization, attention head pruning, and layer pruning. Papers on high-level approximations of Transformer components appear below, in areas such as:
- Activation function approximation
- Attention head approximation
- Softmax approximation
- Normalization approximation
Attention Head Approximation
See the research on approximate attention head architectures and attention optimization in general.
Activation Function Approximation
See approximations of activation functions.
Softmax Approximation
See research on softmax optimization and approximation.
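One example of a hardware-friendly softmax approximation (a generic sketch, not any specific paper's method) replaces e^x with the cheaper 2^x; since 2^z = e^(z ln 2), this is exactly softmax applied to the inputs scaled by ln 2, so it mildly flattens the distribution.

    #include <cmath>
    #include <cstdio>
    #include <vector>
    #include <algorithm>

    // Softmax variant using 2^x instead of e^x; equivalent to softmax(x * ln 2).
    std::vector<double> softmax_base2(std::vector<double> x) {
        double mx = *std::max_element(x.begin(), x.end());
        double sum = 0.0;
        for (double& v : x) { v = std::exp2(v - mx); sum += v; }  // subtract max for stability
        for (double& v : x) v /= sum;
        return x;
    }

    int main() {
        for (double p : softmax_base2({1.0, 2.0, 4.0})) std::printf("%.4f ", p);
        std::printf("\n");
    }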
Approximating Normalization
The normalization layer can be coded as an approximate normalization layer, or alternatively it can be removed entirely via pruned normalization.
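One well-known simplification in this direction is RMSNorm, which drops LayerNorm's mean-centering and scales by the root-mean-square alone; the sketch below treats RMSNorm as the cheaper variant (learned gain and bias parameters omitted for brevity).

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Full LayerNorm: subtract the mean, divide by the standard deviation.
    std::vector<double> layernorm(std::vector<double> x, double eps = 1e-5) {
        double mean = 0.0, var = 0.0;
        for (double v : x) mean += v;
        mean /= x.size();
        for (double v : x) var += (v - mean) * (v - mean);
        var /= x.size();
        double inv = 1.0 / std::sqrt(var + eps);
        for (double& v : x) v = (v - mean) * inv;
        return x;
    }

    // RMSNorm: skip the mean entirely; one fewer pass and subtraction per element.
    std::vector<double> rmsnorm(std::vector<double> x, double eps = 1e-5) {
        double ms = 0.0;
        for (double v : x) ms += v * v;
        double inv = 1.0 / std::sqrt(ms / x.size() + eps);
        for (double& v : x) v *= inv;
        return x;
    }

    int main() {
        std::vector<double> x = {1.0, 2.0, 3.0, 4.0};
        for (double v : layernorm(x)) std::printf("%.4f ", v);
        std::printf("\n");
        for (double v : rmsnorm(x)) std::printf("%.4f ", v);
        std::printf("\n");
    }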
Approximating Other Transformer Components
Other papers on approximations for Transformer architectures (and neural networks more generally):
- Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi, NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference, Dec 2021, https://arxiv.org/pdf/2112.02191 (Approximation using look-up tables; a generic LUT sketch appears after this list.)
- Chen, M. X., Firat, O., Bapna, A., Johnson, M., Macherey, W., Foster, G., Jones, L., Schuster, M., Shazeer, N., Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Chen, Z., Wu, Y., and Hughes, M. The best of both worlds: Combining recent advances in neural machine translation. In ACL, 2018, https://arxiv.org/abs/1804.09849 (Hybrid Transformer architectures.)
- Ma, J. and Yarats, D. On the adequacy of untuned warmup for adaptive optimization. arXiv:1910.04209, 2019. https://arxiv.org/abs/1910.04209
- J Zhong, Z Liu, X Chen, Apr 2023, Transformer-based models and hardware acceleration analysis in autonomous driving: A survey, https://arxiv.org/abs/2304.10891 (Sections on approximating various components of Transformers.)
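As a generic illustration of the look-up-table style of approximation in the NN-LUT entry above (this is not that paper's method), the sketch below tabulates GELU once and replaces each later call with an array index plus one linear interpolation; the table size and range are illustrative choices.

    #include <cmath>
    #include <cstdio>

    // Tabulate GELU once over [-8, 8]; outside that range GELU is ~0 or ~x.
    const int N = 256;
    const double LO = -8.0, HI = 8.0;
    double LUT[N + 1];

    double gelu_exact(double x) { return 0.5 * x * (1.0 + std::erf(x / std::sqrt(2.0))); }

    void build_lut() {
        for (int i = 0; i <= N; i++)
            LUT[i] = gelu_exact(LO + (HI - LO) * i / N);
    }

    // Replace the transcendental call with an index and a linear interpolation.
    double gelu_lut(double x) {
        if (x <= LO) return 0.0;   // saturated region
        if (x >= HI) return x;     // identity region
        double t = (x - LO) / (HI - LO) * N;
        int i = (int)t;
        double frac = t - i;
        return LUT[i] + frac * (LUT[i + 1] - LUT[i]);
    }

    int main() {
        build_lut();
        for (double x : {-3.0, -0.7, 0.2, 1.5})
            std::printf("x=%5.2f exact=%.6f lut=%.6f\n", x, gelu_exact(x), gelu_lut(x));
    }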
Approximate Neural Networks
More research papers on approximation used with neural networks, in general:
- Z. Peng et al. 2018. AXNet: ApproXimate computing using an end-to-end trainable neural network. 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) https://ieeexplore.ieee.org/document/8605388 (Ensemble dual-model method where one model is a fast approximation of the other.)
- Matevž Fabjančič, Octavian Machidon, Hashim Sharif, Yifan Zhao, Saša Misailović, Veljko Pejović, March 2023, Mobiprox: Supporting Dynamic Approximate Computing on Mobiles, https://arxiv.org/abs/2303.11291 (Uses probabilistic approximations, such as loop perforation, for fast neural networks on mobile.)
- Jorge Castro-Godínez, Deykel Hernández-Araya, Muhammad Shafique, Jörg Henkel, 2020, Approximate acceleration for CNN-based applications on IoT edge devices, 2020 IEEE 11th Latin American Symposium on Circuits & Systems (LASCAS), https://ieeexplore.ieee.org/document/9069040
- Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR) 48, 4 (2016), 1–33. https://dl.acm.org/doi/10.1145/2893356 (Examines some early approximate neural networks such as AxNN.)
- W Dong, G Kestor, D Li, 2023, Auto-HPCnet: An Automatic Framework to Build Neural Network-based Surrogate for High-Performance Computing Applications, HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, August 2023, Pages 31–44, https://doi.org/10.1145/3588195.3592985, https://dl.acm.org/doi/abs/10.1145/3588195.3592985
- Seungyeop Han, Haichen Shen, Matthai Philipose, Sharad Agarwal, Alec Wolman, and Arvind Krishnamurthy. MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints, In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, ACM MobiSys, Singapore, 26–30 June 2016, pp. 123–136, https://dl.acm.org/doi/10.1145/2906388.2906396
- F Manca, F Ratto, 2023, ONNX-to-Hardware Design Flow for the Generation of Adaptive Neural-Network Accelerators on FPGAs arXiv preprint arXiv:2309.13321, https://arxiv.org/pdf/2309.13321.pdf (Approximation techniques applied to edge computing.)
- HJ Damsgaard, A Ometov, J Nurmi, 2023, ACM Computing Surveys, Approximation Opportunities in Edge Computing Hardware: A Systematic Literature Review https://dl.acm.org/doi/abs/10.1145/3572772, PDF: https://dl.acm.org/doi/pdf/10.1145/3572772
- M. A. Hanif, F. Khalid, and M. Shafique, CANN: Curable approximations for high-performance deep neural network accelerators, in Proc. 56th Annu. Design Automat. Conf. (DAC). New York, NY, USA: Association for Computing Machinery, 2019, pp. 1–6. https://ieeexplore.ieee.org/document/8806937
Approximation to Avoid Redundant Computations
A simple approximate calculation can sometimes be performed as a preliminary check, so as to avoid an expensive exact calculation in the common case (see also conditional computation and caching optimizations). This method is sometimes called "common case first" or "simple case first". General papers about using this at the low level are below, with a sketch of the pattern after the list; the logical extension to the high level is "big-little models" (see ensemble architectures).
- Duvindu Piyasena, Rukshan Wickramasinghe, Debdeep Paul, Siew Kei Lam, and Meiqing Wu. 2019. Reducing dynamic power in streaming CNN hardware accelerators by exploiting computational redundancies. Proceedings 29th International Conference on Field-Programmable Logic and Applications, FPL 2019 (9 2019), 354–359, https://ieeexplore.ieee.org/document/8891989, PDF: https://siewkeilam.github.io/ei-research-group/Paper/2019H-Duvindu-FPL.pdf (Calculates a cheap approximate result first, to detect cases where the exact computation would be negative and hence reduced to zero by the ReLU activation, allowing the exact computation to be skipped.)
- Yuxiang Huan, Yifan Qin, Yantian You, Lirong Zheng, and Zhuo Zou. Sep 2016. A multiplication reduction technique with near-zero approximation for embedded learning in IoT devices. 2016 29th IEEE International System-on-Chip Conference (SOCC), 102–107. https://ieeexplore.ieee.org/abstract/document/7905445 (Avoids multiplications of near-zero small values whose result would effectively be zero, thereby skipping wasteful multiplications.)
- Maedeh Hemmat, Joshua San Miguel, and Azadeh Davoodi. 2020. AirNN: A Featherweight Framework for Dynamic Input-Dependent Approximation of CNNs. Transactions on Computer-Aided Design of Integrated Circuits and Systems. https://ieeexplore.ieee.org/document/9239327 (Approximates weight computations by pre-computing them into groups offline, and only using some of the weights in calculations during inference, effectively dynamically pruning the other weights to zero.)
- Minkyu Kim and Jae Sun Seo. 2021. An energy-efficient deep convolutional neural network accelerator featuring conditional computing and low external memory access. IEEE Journal of Solid-State Circuits 56, 3 (2021), 803–813, https://ieeexplore.ieee.org/document/9229157 (Approximate convolutions with most-significant bits are done first.)
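Here is a minimal sketch of the "common case first" pattern (the margin and the crude quantization scheme are illustrative assumptions): a cheap low-precision pass estimates the pre-activation, and the exact dot product is skipped whenever ReLU would zero the result anyway.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Build a crude low-precision copy of the weights (rounding to one decimal
    // place here, standing in for a few-bit quantization in real hardware).
    std::vector<float> quantize(const std::vector<double>& w) {
        std::vector<float> q(w.size());
        for (size_t i = 0; i < w.size(); i++)
            q[i] = (float)(std::round(w[i] * 10.0) / 10.0);
        return q;
    }

    // Common case first: a cheap low-precision pass estimates the result, and
    // if ReLU would zero it anyway (within a safety margin), skip the exact pass.
    double relu_dot(const std::vector<float>& w_lo, const std::vector<double>& w,
                    const std::vector<double>& x, double margin) {
        double est = 0.0;
        for (size_t i = 0; i < x.size(); i++) est += w_lo[i] * x[i];  // cheap pass
        if (est + margin < 0.0) return 0.0;    // common case: ReLU outputs zero
        double exact = 0.0;
        for (size_t i = 0; i < x.size(); i++) exact += w[i] * x[i];   // full pass
        return exact > 0.0 ? exact : 0.0;
    }

    int main() {
        std::vector<double> w = {0.42, -1.37, 0.08}, x = {1.0, 2.0, 3.0};
        std::vector<float> w_lo = quantize(w);
        std::printf("output = %g\n", relu_dot(w_lo, w, x, 0.1));  // exact pass skipped: 0
    }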
Approximate Computing General Research
The general idea of using approximations in computing, as a general trade-off between efficiency and accuracy, has a considerable body of research. Here are a few of the theoretical papers:
- D. Palomino, M. Shafique, A. Susin, J. Henkel, “Thermal Optimization using Adaptive Approximate Computing for Video Coding”, IEEE/ACM 19th Design, Automation and Test in Europe Conference (DATE), 2016, https://ieeexplore.ieee.org/document/7459495
- V. Mrazek, M. A. Hanif et al., “autoax: An automatic design space exploration and circuit building methodology utilizing libraries of approximate components,” in DAC’19. ACM, 2019, https://arxiv.org/abs/1902.10807
- R. Nair, “Big data needs approximate computing: technical perspective”, Communications of the ACM, 58(1): 104, 2015. https://dl.acm.org/doi/10.1145/2688072
- A. K. Mishra, R. Barik, S. Paul, “iACT: A Software-Hardware Framework for Understanding the Scope of Approximate Computing”, Workshop on Approximate Computing Across the System Stack (WACAS), 2014. PDF: https://sampa.cs.washington.edu/wacas14/papers/mishra.pdf
- H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, “Architecture support for disciplined approximate programming”, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012. PDF: https://www.cs.cornell.edu/~asampson/media/papers/truffle-asplos2012.pdf
- V. Chippa, S. Chakradhar, K. Roy, and A. Raghunathan, “Analysis and characterization of inherent application resilience for approximate computing”, Design Automation Conference (DAC), 2013. https://ieeexplore.ieee.org/document/6560706
- J. Choi and S. Venkataramani, Approximate Computing Techniques for Deep Neural Networks. Cham: Springer, 2019, pp. 307–329, Chapter 15, https://link.springer.com/chapter/10.1007/978-3-319-99322-5_15
- S. Venkataramani, S. T. Chakradhar, K. Roy, and A. Raghunathan, Approximate computing and the quest for computing efficiency, Proceedings of the 52nd Annual Design Automation Conference, ACM (2015), p. 120, https://ieeexplore.ieee.org/document/7167251
- G. Pekhimenko, D. Koutra, K. Qian, “Approximate computing: Application analysis and hardware design”, May 2013, PDF: www.cs.cmu.edu/~gpekhime/Projects/15740/paper.pdf
- Weiqiang Liu, Fabrizio Lombardi (Book Editors), Approximate Computing, 2022, https://link.springer.com/book/10.1007/978-3-030-98347-5, https://www.amazon.com/Approximate-Computing-Weiqiang-Liu-ebook/dp/B0BBKR65SB/
- Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR) 48, 4 (2016), 1–33. https://dl.acm.org/doi/10.1145/2893356
- VV Kulkarni, 2020, Approximate computing techniques for accelerating compute intensive workloads, https://www.ideals.illinois.edu/items/115960, PDF: https://www.ideals.illinois.edu/items/115960/bitstreams/379143/object?dl=1
- Amir Yazdanbakhsh; Divya Mahajan; Hadi Esmaeilzadeh; Pejman Lotfi-Kamran, 2017, AxBench: A multiplatform benchmark suite for approximate computing, IEEE Design & Test, Volume 34, Issue 2, April 2017, https://ieeexplore.ieee.org/abstract/document/7755728/, PDF: https://ieeexplore.ieee.org/ielaam/6221038/7862860/7755728-aam.pdf, PDF: http://axbench.org/papers/dt.darksilicon16-camera.pdf
- Michael Ringenburg, Adrian Sampson, Isaac Ackerman, Luis Ceze, and Dan Grossman. 2015. Monitoring and debugging the quality of results in approximate programs. In International Conference on Architectural Support for Programming Languages and Operating Systems. 399–411. https://dl.acm.org/doi/10.1145/2775054.2694365, PDF: https://homes.cs.washington.edu/~luisceze/publications/approxdebug-asplos15.pdf
- Thomas Y. Yeh, Petros Faloutsos, Milos Ercegovac, Sanjay J. Patel, and Glenn Reinman. 2007. The art of deception: Adaptive precision reduction for area efficient physics acceleration. 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007). pp. 394–406. https://ieeexplore.ieee.org/document/4408271
- Mehrzad Samadi, Davoud Anoushe Jamshidi, Janghaeng Lee, and Scott Mahlke. 2014. Paraprox: Pattern-based approximation for data parallel applications. In ACM SIGARCH Computer Architecture News, Vol. 42. 35–50, https://dl.acm.org/doi/10.1145/2654822.2541948
- Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel, 2022, Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey, ACM Computing Surveys, Volume 55, Issue 4, No. 83, pp 1–36 https://doi.org/10.1145/3527156, https://dl.acm.org/doi/10.1145/3527156, https://arxiv.org/abs/2203.08737
- Tae Jun Ham, Sung Jun Jung, Seonghak Kim, Young H Oh, Yeonhong Park, Yoonho Song, Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W Lee, et al. A^3: Accelerating attention mechanisms in neural networks with approximation. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 328–341. IEEE, 2020. https://arxiv.org/abs/2002.10941
- Dimitrios Danopoulos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel, 12 Feb 2024, TransAxx: Efficient Transformers with Approximate Computing, https://arxiv.org/abs/2402.07545 (Using approximations in Vision Transformer architectures.)
- Salar Shakibhamedan, Amin Aminifar, Nima TaheriNejad, Axel Jantsch, 2024, EASE: Energy Optimization through Adaptation — A Review of Runtime Energy-Aware Approximate Deep Learning Algorithms, https://eclectx.org/Publications/2024_M13.pdf (Survey paper on techniques for adaptive inference with a focus on approximations of inference, including loop perforation, stochastic algorithms, approximate arithmetic, quantization, pruning, and low-rank factorization.)
- John Fraser Hart, Jul 1, 1978, Computer Approximations, https://www.amazon.com/Computer-Approximations-John-Fraser-Hart/dp/0882756427/
- Teofilo F. Gonzalez, Sep 30, 2020, Handbook of Approximation Algorithms and Metaheuristics, Second Edition: Two-Volume Set (Chapman & Hall/CRC Computer and Information Science Series), https://www.amazon.com/Handbook-Approximation-Algorithms-Metaheuristics-Second/dp/0367570289/
- Ivan Markovsky, Aug 3, 2018, Low-Rank Approximation: Algorithms, Implementation, Applications (Communications and Control Engineering) Part of: Communications and Control Engineering (62 books), https://www.amazon.com/Low-Rank-Approximation-Implementation-Applications-Communications/dp/3319896199/
- Vijay V. Vazirani, Jul 2, 2001, Approximation Algorithms, https://www.amazon.com/Approximation-Algorithms-Vijay-V-Vazirani/dp/3540653678/
- David P. Williamson and David B. Shmoys, Apr 26, 2011, The Design of Approximation Algorithms, https://www.amazon.com/Design-Approximation-Algorithms-David-Williamson-ebook/dp/B009019XCG/
- A. M. Dalloo, A. J. Humaidi, A. K. A. Mhdawi and H. Al-Raweshidy, "Approximate Computing: Concepts, Architectures, Challenges, Applications, and Future Directions," in IEEE Access, doi: 10.1109/ACCESS.2024.3467375. https://ieeexplore.ieee.org/document/10693435
More AI Research
Read more about:
- Advanced AI Mathematics
- Zero-Multiplication Models
- Matrix Algebra
- Logarithmic Models
- Inference Optimizations
- « Research Home