End-to-End Logarithmic Models
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
A pure logarithmic model is one that maintains its calculations using the Logarithmic Number System. Alsuhli et al. (2023) refer to this approach as an “end-to-end” LNS model, which means performing all calculations in the “log-domain” (i.e., working on logarithms of values, rather than the original values).
The idea is basically to change every multiplication by a weight into an addition, and every division into a subtraction. Instead of each weight, the logarithm of that weight is stored and used throughout the layers. A full implementation of this end-to-end idea requires not just arithmetic changes, but also changes to the various Transformer components, such as normalization, Softmax, and so on.
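For example, here is a minimal C++ sketch of the basic idea (illustrative code only, not the book's implementation): a weight and an activation are converted to base-2 logarithms, the multiplication becomes an addition, and the result is converted back to the linear domain.

#include <cmath>
#include <cstdio>

int main() {
    float w = 3.0f, x = 7.0f;
    float logw = std::log2(w);               // stored instead of the weight itself
    float logx = std::log2(x);               // activation converted to the log-domain
    float log_product = logw + logx;         // multiplication becomes addition
    float product = std::exp2(log_product);  // convert back to the linear domain
    std::printf("w*x = %.4f, via log-domain = %.4f\n", w * x, product);
    return 0;
}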
Note that there are several other ways to use logarithms in AI engines, and LNS is not the same as:
- Logarithmic bitshift quantization
- Approximate multiplication arithmetic with logarithms
- Advanced number systems: Dyadic numbers, multi-base numbers, etc.
LNS models are not an approximation. The goal of logarithmic numbers is exact computation, not approximation. The calculations occur in the “log-domain” but are intended to represent the full original calculations in the original linear domain. The aim is to convert to logarithms at the start, and to convert back to the original numbers at the end, with the same results, but faster. In practice, the precision may be somewhat lower, because the log-domain is much more contracted than the linear domain, so some low-order fractional digits may be lost. Hence, the method may be somewhat approximate in that sense, although its goal is exactness.
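As an illustration of that precision trade-off, the sketch below stores each logarithm in a low-precision fixed-point format and compares the round-trip result against the exact product; the quantize_log helper and the choice of 6 fractional bits are illustrative assumptions, not taken from the book.

#include <cmath>
#include <cstdio>

// Hypothetical helper: round a log2 value to fixed point with 'frac_bits' fractional bits.
float quantize_log(float logval, int frac_bits) {
    float scale = static_cast<float>(1 << frac_bits);
    return std::round(logval * scale) / scale;
}

int main() {
    float w = 0.37f, x = 5.21f;
    float exact = w * x;
    float qlog = quantize_log(std::log2(w), 6) + quantize_log(std::log2(x), 6);
    float approx = std::exp2(qlog);   // back to the linear domain
    std::printf("exact=%.6f  LNS(6 frac bits)=%.6f  error=%g\n",
                exact, approx, exact - approx);
    return 0;
}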
Intermediate values, such as embeddings or probabilities, should also be stored as logarithms, so that both sides of a MatMul are in the log-domain, allowing addition to be used instead of arithmetic multiplication operations. This requires adjustments to other Transformer architectural components, such as normalization and Softmax.
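As one example of such an adjustment, here is a sketch of a Softmax variant that keeps its outputs in the log-domain (a “log-Softmax”), computed in the numerically stable log-sum-exp form; the function name and the use of plain float inputs are assumptions for illustration, not the book's code.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Log-domain Softmax: returns log(softmax(x)_i) = x_i - max(x) - log(sum_j exp(x_j - max(x)))
std::vector<float> log_softmax(const std::vector<float>& logits) {
    float maxval = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (float v : logits) sum += std::exp(v - maxval);
    float lse = maxval + std::log(sum);
    std::vector<float> out(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i)
        out[i] = logits[i] - lse;   // log-probabilities stay in the log-domain
    return out;
}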
Theoretically, this should be workable once everything is changed to the log-domain. However, practical problems arise because MatMul and vector dot product operations also require additions to accumulate the results (after the multiplications), and LNS addition is slow: adding two log-domain numbers is not a normal addition, and it cannot be easily hardware-accelerated.
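To see why, here is a sketch of log-domain addition for two positive values stored as base-2 logarithms; unlike log-domain multiplication, which is a single addition, each accumulation step needs a transcendental function (or a lookup table approximating it).

#include <algorithm>
#include <cmath>

// Given X = log2(x) and Y = log2(y) with x, y > 0, return log2(x + y).
float lns_add(float X, float Y) {
    float hi = std::max(X, Y);
    float lo = std::min(X, Y);
    // log2(x + y) = hi + log2(1 + 2^(lo - hi))
    return hi + std::log2(1.0f + std::exp2(lo - hi));
}

A log-domain dot product would call a function like this once per accumulated term, which is the main obstacle to accelerating end-to-end LNS models.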
Logarithmic weight arithmetic differs from normal weight multiplication. For weights greater than 1, the logarithm is positive and an addition occurs; for positive fractional weights between 0 and 1, which are effectively a division, the logarithm is negative and a subtraction is used (or, equivalently, a negative value is added). If the weight is exactly 1, the logarithm is exactly 0, and adding 0 is as harmless as multiplying by 1. The technique could potentially use integers or floating-point numbers to represent the logarithm.
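The small sketch below illustrates the three cases: a weight above 1 has a positive logarithm, a fractional weight has a negative logarithm, and a weight of exactly 1 has a logarithm of zero (again, illustrative code only).

#include <cmath>
#include <cstdio>

int main() {
    float x = 10.0f;
    float logx = std::log2(x);
    float weights[] = { 2.0f, 0.5f, 1.0f };   // greater than 1, fractional, exactly 1
    for (float w : weights) {
        float logw = std::log2(w);            // positive, negative, or zero
        float viaLNS = std::exp2(logx + logw);
        std::printf("w=%.2f  log2(w)=%+.2f  w*x=%.2f  via LNS=%.2f\n",
                    w, logw, w * x, viaLNS);
    }
    return 0;
}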
Literature review. Research papers on end-to-end LNS models:
- D. Miyashita, E. H. Lee, and B. Murmann, 2016, Convolutional neural networks using logarithmic data representation, arXiv preprint arXiv:1603.01025, 2016. https://arxiv.org/abs/1603.01025 (A major paper on using log-domain weights and activations, using addition of log-domain values instead of multiplication, which also covers the difficulties with accumulation.)
- G. Alsuhli, V. Sakellariou, H. Saleh, M. Al-Qutayri, 2023, Number Systems for Deep Neural Network Architectures: A Survey, https://arxiv.org/abs/2307.05035 (Extensive survey paper with a deep dive into the theory of LNS and other systems such as Residue Number System and Posit numbers, with application to neural networks. Also covers LNS usage with activation functions and Softmax.)
- Saeedeh Jahanshahi, Amir Sabbagh Molahosseini & Azadeh Alsadat Emrani Zarandi, 2023, uLog: a software-based approximate logarithmic number system for computations on SIMD processors, Journal of Supercomputing 79, pages 1750–1783 (2023), https://link.springer.com/article/10.1007/s11227-022-04713-y (Paper licensed under CC-BY-4.0, unchanged: http://creativecommons.org/licenses/by/4.0/)
- A. Sanyal, P. A. Beerel, and K. M. Chugg, 2020, Neural network training with approximate logarithmic computations, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020, pp. 3122–3126. https://arxiv.org/abs/1910.09876 (End-to-end LNS model for both training and inference. Converts “leaky-ReLU” activation function and Softmax to log-domain.)
- J. Zhao, S. Dai, R. Venkatesan, B. Zimmer, 2022, LNS-Madam: Low-precision training in logarithmic number system using multiplicative weight update, IEEE Transactions on Computers, Vol. 71, No. 12, Dec 2022, https://ieeexplore.ieee.org/abstract/document/9900267/, PDF: https://ieeexplore.ieee.org/iel7/12/4358213/09900267.pdf (LNS in training of models. Uses different logarithm bases, including fractional powers of two, and LNS addition via table lookups.)
- E. H. Lee, D. Miyashita, E. Chai, B. Murmann, and S. S. Wong, 2017, Lognet: Energy-efficient neural networks using logarithmic computation, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, pp. 5900–5904. https://ieeexplore.ieee.org/document/7953288 (Uses LNS multiplication in the log-domain, but still does accumulate/addition in the linear-domain.)
- Maxime Christ, Florent de Dinechin, Frédéric Pétrot, 2022, Low-precision logarithmic arithmetic for neural network accelerators, 33rd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2022), IEEE, Jul 2022, Gothenburg, Sweden, doi: 10.1109/ASAP54787.2022.00021, hal-03684585, https://ieeexplore.ieee.org/abstract/document/9912091/, PDF: https://inria.hal.science/hal-03684585/document (Use of LNS in model inference, with coverage of dropping the sign bit and handling of zeros.)
- J. Johnson, 2018, Rethinking floating-point for deep learning, arXiv preprint arXiv:1811.01721, 2018, https://arxiv.org/abs/1811.01721 (Uses an end-to-end LNS version called “exact log-linear multiply-add (ELMA)” which is a “hybrid log multiply/linear add” method. Uses a Kulisch accumulator for addition.)
For more research papers on end-to-end LNS models, see https://www.aussieai.com/research/logarithmic#end2end.