Aussie AI

Obstacles to Stardom

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.


Several problems need to be overcome to use end-to-end LNS for models, including:

  • Expensive LNS addition
  • Hardware acceleration
  • Zero values
  • Negative numbers
  • Floating-point addition
  • LNS Transformer components

Addition problems. Addition and subtraction are slow and problematic in LNS-based systems, so they must be approximated or accelerated in various ways. It seems ironic to need to accelerate addition, since the whole point of using LNS is to accelerate multiplication by changing it into addition! But these are two different types of addition: the original linear-domain multiplication becomes normal, fast addition, whereas the original addition must become log-domain addition, which is hard.
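
As a rough illustration (this is my sketch, not the book's engine code), here's a minimal C++ log-domain dot product, assuming every value is positive, nonzero, and stored as its base-2 logarithm, and using the hypothetical names lns_mul, lns_add, and lns_dot. Notice how the "addition" falls back to exp2/log2, which is exactly the expensive part that real LNS designs approximate with lookup tables or polynomials.

    #include <cmath>
    #include <utility>

    // Linear-domain multiply becomes a cheap log-domain add.
    inline float lns_mul(float lx, float ly) {
        return lx + ly;  // log2(x*y) = log2(x) + log2(y)
    }

    // Linear-domain add is the hard part: computing log2(x+y) from the logs.
    // This naive version uses exp2/log2, which defeats the speedup.
    inline float lns_add(float lx, float ly) {
        if (lx < ly) std::swap(lx, ly);                    // ensure lx >= ly
        return lx + std::log2(1.0f + std::exp2(ly - lx));  // "Gaussian logarithm"
    }

    // Log-domain dot product (n >= 1): each multiply is a cheap addition,
    // but every accumulation step needs the expensive lns_add.
    float lns_dot(const float* lv1, const float* lv2, int n) {
        float lsum = lns_mul(lv1[0], lv2[0]);
        for (int i = 1; i < n; ++i) {
            lsum = lns_add(lsum, lns_mul(lv1[i], lv2[i]));
        }
        return lsum;  // log2 of the dot product
    }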

Hardware acceleration is problematic. The simple vector dot product can be accelerated with vectorized "fused multiply-add" (FMA) operations. Vectorization support for LNS is far less mature. Obviously, all CPUs and GPUs have accelerated vectorized addition, and several hardware accelerations of LNS addition have been proposed in the research literature. But what we really need is a fused version of normal addition followed by LNS addition, which would be the log-domain equivalent of linear-domain FMA.
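
For contrast, here's the kind of fused operation that linear-domain kernels already get for free. This is a hedged sketch using the standard x86 AVX/FMA intrinsic (the fma_step wrapper name is mine); no analogous single instruction fuses a normal addition with a follow-up LNS addition.

    #include <immintrin.h>  // x86 AVX/FMA intrinsics; compile with -mfma

    // One vectorized accumulation step of a linear-domain dot product:
    // sum = a*b + sum for 8 floats at once, as a single fused instruction.
    __m256 fma_step(__m256 a, __m256 b, __m256 sum) {
        return _mm256_fmadd_ps(a, b, sum);
    }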

Zero problems. Zero weights must be handled separately, since the logarithm of zero is negative infinity. This requires either a test for zero as part of the logic, or an algorithmic method to avoid zero values (e.g. an extra bit flag to represent zero-ness). Alternatively, a hardware version of LNS would need to handle zero gracefully.
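
One software-level option is an explicit zero flag, along these lines (a minimal sketch with hypothetical names; a real design would pack the flag into a spare bit of the number format):

    #include <cmath>

    // Software LNS value with an explicit zero flag, since log2(0.0)
    // is negative infinity and cannot be stored as an ordinary logarithm.
    struct LnsValue {
        float log2mag;  // log2 of the magnitude (assumes a positive value here)
        bool  is_zero;  // true if the original value was exactly zero
    };

    inline LnsValue to_lns(float x) {
        if (x == 0.0f) return { 0.0f, true };  // flag zero rather than taking log2(0)
        return { std::log2(x), false };        // negative values are handled below
    }

    // Log-domain multiply: any zero operand makes the whole product zero.
    inline LnsValue lns_mul(LnsValue a, LnsValue b) {
        if (a.is_zero || b.is_zero) return { 0.0f, true };
        return { a.log2mag + b.log2mag, false };
    }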

Negative number problems. Negative numbers are also problematic in LNS, and models usually have both positive and negative weights. Since the logarithm of a negative number is undefined, the logarithm of the weight's absolute value must be stored, with an alternative method (e.g. a sign bit) used to track negative weights, so that the engine knows when its log-domain accumulation must subtract rather than add. Alternatively, weights might be scaled so that they are all positive, avoiding the log-of-negatives problem entirely.
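
Extending the previous sketch, the sign can be tracked as a separate bit alongside the zero flag (again, hypothetical names; a real format would pack all of this into one fixed-width word):

    #include <cmath>

    // Signed software LNS value: log of the magnitude, plus sign and zero flags.
    struct SignedLns {
        float log2mag;   // log2(|x|)
        bool  negative;  // separate sign bit, since log2 of a negative is undefined
        bool  is_zero;   // explicit zero flag, as in the previous sketch
    };

    inline SignedLns to_signed_lns(float x) {
        if (x == 0.0f) return { 0.0f, false, true };
        return { std::log2(std::fabs(x)), x < 0.0f, false };
    }

    // Multiply: add the log-magnitudes, XOR the sign bits.
    inline SignedLns lns_mul(SignedLns a, SignedLns b) {
        if (a.is_zero || b.is_zero) return { 0.0f, false, true };
        return { a.log2mag + b.log2mag, a.negative != b.negative, false };
    }

Accumulating these signed products is where the sign bit really bites: whenever the operand signs differ, the log-domain addition has to become a log-domain subtraction.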

Does it work? Logarithmic numbers haven't become widely used in AI models, possibly because vector dot product and matrix multiplication require not just multiplications, but also the addition of those products, and addition is difficult in LNS (it usually has to be approximated). Both training and inference need to be performed in LNS. Conversion back and forth between LNS and floating-point weights and probabilities also adds some overhead (in both training and inference), and possibly some extra inaccuracy for inference. These issues might limit the model's accuracy compared to non-logarithmic floating-point.
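
The conversion overhead is easy to see in code (another hedged sketch; to_log2 and from_log2 are assumed names): each direction needs a transcendental call, which costs far more than the single multiplication LNS was meant to save, and the float rounding can shave off a little accuracy.

    #include <cmath>

    inline float to_log2(float x)    { return std::log2(x); }   // linear -> LNS (x > 0)
    inline float from_log2(float lx) { return std::exp2(lx); }  // LNS -> linear

    // Example: a value converted out of LNS for an operation that cannot
    // easily run in the log domain, then converted back into the linear domain.
    float round_trip(float x) {
        return from_log2(to_log2(x));  // approximately x, with rounding error
    }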

Floating-point addition. Furthermore, an LNS model stores the logarithms of weights as floating-point numbers, and thus requires floating-point addition rather than integer addition. The gain from changing floating-point multiplication to floating-point addition is nowhere near as large as changing it to integer arithmetic (e.g. as used in logarithmic quantization or integer-only quantization methods). Indeed, paradoxically, there are even circumstances where floating-point addition is worse than floating-point multiplication, because floating-point addition requires sequential, non-parallelizable sub-operations (the exponents must be aligned before the mantissas can be added), although this depends on the hardware acceleration and the exact floating-point representation used.

No memory benefit. Another concern is that research papers report that AI model inference is usually memory-bound rather than CPU-bound, with the GPU waiting for data because reading it from memory is slower than computing with it. In memory-bound cases, converting the arithmetic from multiplication to addition does not address the main bottleneck, so the LNS may have reduced benefit. Nor does LNS allow the use of smaller data sizes, since it stores the logarithms of weights and internal computations as floating-point, whereas quantization can use integers or smaller bit widths.

Research-only. The use of end-to-end LNS models has not gone mainstream. Some of the problematic issues with additions involving weights and activation functions, and with training using LNS weights, are described in Alsuhli et al. (2023). These concerns limit the use of LNS numbers in an end-to-end method, and suggest the use of alternatives such as approximate logarithmic multiplication or logarithm-antilogarithm multiplications (Alsuhli et al., 2023). Nevertheless, there are several attempts in the literature to use LNS for model training and inference in various ways, starting with Arnold et al. (1991), using theory dating back to the 1980s.

One final thought. Here's the funny thing about doing end-to-end LNS models: an AI model is already doing logarithms, so we're trying to do logarithms-of-logarithms. Remember that the logits output by a model are in the log domain, and Softmax has to exponentiate them to convert them back into the linear domain. Maybe there's a way to back it up a level, and use LNS for the log-domain computations in the model itself? My brain shuts down and screams for ice-cream whenever I try to think about this idea.
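
To make the logits point concrete, here's a plain (non-LNS) Softmax sketch: the incoming logits behave like log-domain values, and exponentiating them moves them into the linear domain before they are normalized into probabilities.

    #include <cmath>
    #include <vector>

    // Basic Softmax (assumes a non-empty logits vector): exponentiate the
    // log-domain logits into the linear domain, then normalize to sum to 1.
    // Subtracting the maximum first is the usual numerical stability trick.
    std::vector<float> softmax(const std::vector<float>& logits) {
        float maxv = logits[0];
        for (float v : logits) if (v > maxv) maxv = v;
        std::vector<float> probs;
        probs.reserve(logits.size());
        float sum = 0.0f;
        for (float v : logits) {
            float e = std::exp(v - maxv);  // log domain -> linear domain
            probs.push_back(e);
            sum += e;
        }
        for (float& p : probs) p /= sum;   // normalize to probabilities
        return probs;
    }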

 
