Aussie AI

Data Structure Double Initialization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

If you have an initialization routine that does a lot of work, it can accidentally become a slug. I'm not talking about initializing a single variable, but about initializing a large program data structure at startup, such as a precomputed lookup table or the tables for a perfect hashing algorithm. In design patterns vocabulary, this is a “singleton” data structure, where only a single object ever exists in the program. It's easy to lose track of whether its initialization routine has already been called, so it ends up being called twice (or more!).

Precomputation methods are a typical example, where a large lookup table is built at program startup. Elsewhere in this book, for instance, a 24-bit lookup table is used to optimize AI activation functions such as GELU. Such a table has 2^24 (about 16.8 million) entries, which is 64 MiB if each entry is a 4-byte float, so building it twice is a real waste.
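
To make the startup cost concrete, here is a minimal sketch of a precomputed GELU table guarded by a simple once-only flag. It is not the book's 24-bit method: it assumes a hypothetical 4096-entry table sampled uniformly over [-8, 8], and the names gelu_table_init, gelu_lookup and g_gelu_init_work are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical small table: 4096 GELU samples over [-8, 8]. (The book's
// version uses a 24-bit table; this size just keeps the sketch light.)
constexpr std::size_t kTableSize = 4096;
constexpr float kMin = -8.0f;
constexpr float kMax = 8.0f;
constexpr float kStep = (kMax - kMin) / (kTableSize - 1);

static float g_gelu_table[kTableSize];
static int g_gelu_init_work = 0;  // how many times the expensive loop ran

// Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
static float gelu_exact(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// Build the table once at startup; repeat calls return immediately.
void gelu_table_init() {
    static bool s_once = false;
    if (s_once) return;
    s_once = true;
    for (std::size_t i = 0; i < kTableSize; ++i) {
        g_gelu_table[i] = gelu_exact(kMin + static_cast<float>(i) * kStep);
    }
    ++g_gelu_init_work;
}

// Nearest-entry lookup. Outside [-8, 8], use GELU's asymptotes:
// about 0 for very negative x, about x for large positive x.
float gelu_lookup(float x) {
    if (x <= kMin) return 0.0f;
    if (x >= kMax) return x;
    auto i = static_cast<std::size_t>((x - kMin) / kStep + 0.5f);
    return g_gelu_table[i];
}
```

With a full 24-bit table, an accidental second call to the init loop would redo millions of erf() evaluations, which is exactly the kind of slug this section is about.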

The way to avoid the slug of double-initialization is simply to track calls to the initialization routine. The idiom that I use is a local static variable of type bool at the start of the initialization function:

    static bool s_once = false;
    if (s_once) {
        yassert(!s_once);  // Should be once only
        return;  // Avoid double initialization!
    }
    s_once = true;
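
Here's a self-contained version of that idiom that you can compile and run. The book's yassert macro isn't defined in this excerpt, so the sketch uses a hypothetical stand-in that reports the failure and keeps running; init_big_table and the counters are also illustrative names.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the book's yassert macro: assumed to report the
// failure and continue (not abort), and to count failures for testing.
static int g_yassert_failures = 0;
#define yassert(cond) \
    do { \
        if (!(cond)) { \
            ++g_yassert_failures; \
            std::fprintf(stderr, "yassert failed: %s (%s:%d)\n", #cond, __FILE__, __LINE__); \
        } \
    } while (0)

static int g_table_builds = 0;  // how many times the expensive work really ran

void init_big_table() {  // hypothetical initialization routine
    static bool s_once = false;
    if (s_once) {
        yassert(!s_once);  // Should be once only: report the extra call
        return;            // Avoid double initialization!
    }
    s_once = true;
    ++g_table_builds;  // ... expensive precomputation would go here ...
}
```

Calling init_big_table() three times builds the table once and reports two assertion failures, one per redundant call. This plain flag assumes single-threaded startup code; it is not safe if two threads race to initialize.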

Another way is to actually count the calls with an integer. This generalizes to more scenarios, because the count tells you how many times the routine was called, not just whether it was called before:

    static int s_calls = 0;
    ++s_calls;
    if (s_calls > 1) {
        yassert(s_calls <= 1);
        return;  // Avoid double initialization!
    }
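
A runnable sketch of the counting version follows, again with a hypothetical yassert stand-in that reports and continues. To illustrate the extra information a counter gives you, it also copies the count into a global (g_init_calls, a hypothetical name) so that, for example, a shutdown report could show how many redundant calls were made.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for yassert: report the failure and continue.
#define yassert(cond) \
    do { if (!(cond)) std::fprintf(stderr, "yassert failed: %s\n", #cond); } while (0)

static int g_init_calls = 0;    // total calls, including rejected repeats
static int g_table_builds = 0;  // times the expensive work actually ran

void init_big_table() {  // hypothetical initialization routine
    static int s_calls = 0;
    ++s_calls;
    g_init_calls = s_calls;  // expose the count for diagnostics
    if (s_calls > 1) {
        yassert(s_calls <= 1);  // Should be once only
        return;                 // Avoid double initialization!
    }
    ++g_table_builds;  // ... expensive precomputation would go here ...
}
```

After three calls, the table has been built once and the counter reads 3, which a bool flag could not tell you.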

If you want a simpler method, I've shown in Chapter 41 how to wrap these lines of code into a single “yassert_once” macro.

Singleton global objects. If you've done the hard yards to declare a big data structure like this as its own class, then you can simply instantiate only one object (i.e., as a global). C++ guarantees that a constructor runs exactly once for each object, so a single global instance is initialized exactly once. However, nothing stops someone from declaring a second object, or from calling a separate initialization method again, so it may still be worthwhile to declare a static data member and use similar logic to ensure that the expensive initialization is never done twice.
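
Here's a minimal sketch of the class version, assuming a hypothetical LookupTable class whose static data member records whether any instance has already done the expensive build. A second, accidental instance reports the problem and skips the work, leaving its own table empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical class wrapping a large precomputed table.
class LookupTable {
public:
    LookupTable() {
        if (s_built) {
            // Extra instance: report it and skip the expensive work.
            std::fprintf(stderr, "LookupTable: extra instance, initialization skipped\n");
            return;
        }
        s_built = true;
        build();
    }
    bool empty() const { return m_values.empty(); }
    float get(std::size_t i) const { return m_values[i]; }
    static int builds() { return s_builds; }

private:
    void build() {
        m_values.resize(1024);
        for (std::size_t i = 0; i < m_values.size(); ++i) {
            m_values[i] = 0.5f * static_cast<float>(i);  // placeholder precomputation
        }
        ++s_builds;
    }

    std::vector<float> m_values;
    static bool s_built;  // shared by all instances
    static int s_builds;  // how many times build() actually ran
};

bool LookupTable::s_built = false;
int LookupTable::s_builds = 0;

LookupTable g_lookup_table;  // the single intended global instance
```

Because s_built and s_builds are initialized with constants, they are set before the global g_lookup_table is constructed, so the guard works even for a global object built before main().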

In any of these situations, it's a worthwhile investment of a couple of CPU instructions, an increment and a test, to avoid accidentally running the whole routine again. Since the code is virtually identical in every case, you could hide these few statements behind a standard C++ preprocessor macro with a name of your choosing, which also avoids copy-paste typos. Alternatively, you could use an inline function with the “return” statement changed to throwing an exception, since a “return” inside the helper function would only exit the helper, not the initialization routine that called it.
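
Both options can be sketched as follows. The macro name RETURN_IF_NOT_FIRST_CALL and the helper throw_if_not_first_call are hypothetical (the book's own macro is “yassert_once”, described in Chapter 41). The macro returns early from the calling function, whereas the helper throws std::logic_error.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical macro: place it first in a void initialization function.
// Each expansion gets its own static counter, and because the macro expands
// inline, its "return" leaves the calling function on any repeat call.
#define RETURN_IF_NOT_FIRST_CALL() \
    do { \
        static int s_calls_ = 0; \
        if (++s_calls_ > 1) { \
            std::fprintf(stderr, "%s: called %d times\n", __func__, s_calls_); \
            return; \
        } \
    } while (0)

// Hypothetical inline-function alternative: a return here would only exit
// this helper, so it throws instead. The caller supplies its own counter.
inline void throw_if_not_first_call(int& calls, const char* name) {
    if (++calls > 1) {
        throw std::logic_error(std::string(name) + ": called more than once");
    }
}

static int g_builds_a = 0;  // times init_table_a() did its expensive work
static int g_builds_b = 0;  // times init_table_b() did its expensive work

void init_table_a() {
    RETURN_IF_NOT_FIRST_CALL();
    ++g_builds_a;  // ... expensive precomputation would go here ...
}

void init_table_b() {
    static int s_calls = 0;
    throw_if_not_first_call(s_calls, __func__);
    ++g_builds_b;  // ... expensive precomputation would go here ...
}
```

The macro version fails quietly apart from its log line, while the exception version forces the caller to notice; which one suits depends on whether a redundant call should be treated as harmless or as a bug.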
