Aussie AI

Random Number Seeds

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Neural network code often uses random numbers to improve accuracy via a stochastic algorithm. For example, top-k decoding chooses randomly among the k most likely tokens, which adds creativity and prevents the repetitive looping that can occur with greedy decoding. You might also use randomness to generate test inputs when you're trying to thrash the model with random prompt strings.

But that's not good for debugging! We don't want randomness when we're trying to reproduce a bug!

Hence, we want it to be random for users, but not when we're debugging. Random numbers need a “seed” to get started, so we can just save and re-use the seed for a debugging session. This idea can be applied to old-style rand/srand functions or to the newer <random> libraries.
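For the newer <random> library, the save-and-reuse idea might look like the following minimal sketch. The helper names “choose_seed” and “draw_values” are hypothetical (they are not from the book), and the convention that a zero seed means “pick a fresh one” is an assumption made for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// Hypothetical helper: returns the seed to use for this run and logs it.
// A nonzero debug_seed replays an earlier run; zero asks for a fresh seed.
std::uint32_t choose_seed(std::uint32_t debug_seed)
{
    std::uint32_t seed = debug_seed;
    if (seed == 0) {
        std::random_device rd;  // Usually non-deterministic (implementation-defined)
        seed = static_cast<std::uint32_t>(rd());
        if (seed == 0) seed = 1;  // Keep 0 reserved for "no fixed seed"
    }
    std::fprintf(stderr, "INFO: Random number seed: %lu\n",
                 static_cast<unsigned long>(seed));
    return seed;
}

// Draws `count` integers in [0, 99] from a generator seeded with `seed`.
std::vector<int> draw_values(std::uint32_t seed, int count)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 99);
    std::vector<int> values;
    for (int i = 0; i < count; i++) {
        values.push_back(dist(gen));
    }
    return values;
}
```

Passing the logged seed back into “draw_values” reproduces the same sequence on a given standard library. The exact values produced by std::uniform_int_distribution are not guaranteed to match across different library implementations, so replay a seed with the same toolchain that logged it.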

Seeding the random number generator in old-style C++ is done via the “srand” function. The longstanding way to initialize the random number generator, so that the sequence differs on every run, is to use the current time:

    srand((unsigned int)time(NULL));  // Needs <stdlib.h> and <time.h>

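Note that seeding with a guessable value such as the time is a security risk, because anyone who can guess the seed can reproduce the whole sequence. Mixing some additional arithmetic into the time return value makes the seed harder to guess, although “rand” is not designed for security-sensitive uses in any case.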

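After seeding, the “rand” function returns a pseudo-random sequence that differs from run to run. It is not truly random, because the same seed always produces the same sequence, which is exactly the property we can exploit for debugging. The generator is efficient and works well enough for these purposes, though its quality is implementation-defined. A more general plan is to have a debugging or regression testing mode where the seed is fixed: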

    if (g_aussie_debug_srand_seed != 0) {
        // Debugging mode
        srand(g_aussie_debug_srand_seed);   // Non-random randomness!
    }
    else {  // Normal run
        srand((unsigned int)time(NULL));
    }

The test harness has to set the global debug variable “g_aussie_debug_srand_seed” whenever it's needed for a regression test. For example, either it's manually hard-coded into a testing function, or it could be set via a command-line argument to your test harness executable, so the program can be scripted to run with a known seed.
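The command-line approach could be sketched as follows. The option name “--seed” and the function “parse_seed_option” are hypothetical, and the global is declared here as an unsigned int only so that the fragment compiles on its own; the book does not show its declaration.

```cpp
#include <cstdlib>
#include <cstring>

// Assumed declaration: 0 means "no fixed seed" (normal, time-seeded run).
unsigned int g_aussie_debug_srand_seed = 0;

// Hypothetical parser: looks for "--seed <number>" and stores the number
// in the global. Returns true only if a valid seed was found.
bool parse_seed_option(int argc, const char* const* argv)
{
    for (int i = 1; i + 1 < argc; i++) {
        if (std::strcmp(argv[i], "--seed") == 0) {
            char* end = nullptr;
            // Base 0 accepts decimal or 0x-prefixed hex, matching a logged seed
            unsigned long value = std::strtoul(argv[i + 1], &end, 0);
            if (end == argv[i + 1] || *end != '\0') {
                return false;  // Not a complete number
            }
            g_aussie_debug_srand_seed = static_cast<unsigned int>(value);
            return true;
        }
    }
    return false;
}
```

Calling parse_seed_option(argc, argv) at the top of main, before the srand logic, would let a script run the harness as, say, “testharness --seed 12345” (the executable name is made up here).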

This is better, but if we have a bug in production, we won't know the seed number. So, the better code also prints out the seed number (or logs it) in case you need to use it later to reproduce a bug that occurred live.

    if (g_aussie_debug_srand_seed != 0) {
        srand(g_aussie_debug_srand_seed);   // Debug mode
    }
    else {  // Normal run
        unsigned int iseed = (unsigned int)time(NULL);  // Log exactly what srand receives
        fprintf(stderr, "INFO: Random number seed: %u 0x%x\n", iseed, iseed);
        srand(iseed);
    }

An extension would be to also print out the seed in error context information on assertion failures or other internal errors.
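One way this extension might look is sketched below. All of the names here (“aussie_record_seed”, “aussie_format_failure”, and the “AUSSIE_ASSERT” macro) are hypothetical illustrations rather than the book's actual assertion code, and the sketch assumes the seed is recorded at the point where srand is called.

```cpp
#include <cstdio>
#include <string>

// Hypothetical record of the seed most recently passed to srand.
static unsigned int g_aussie_current_seed = 0;

void aussie_record_seed(unsigned int seed)
{
    g_aussie_current_seed = seed;
}

// Builds the failure message, including the seed needed to replay the run.
std::string aussie_format_failure(const char* cond, const char* file, int line)
{
    char buf[256];
    std::snprintf(buf, sizeof buf,
                  "ASSERT FAILED: %s (%s:%d) [srand seed: %u 0x%x]",
                  cond, file, line,
                  g_aussie_current_seed, g_aussie_current_seed);
    return std::string(buf);
}

// Reports the failure on stderr and continues (it does not abort).
#define AUSSIE_ASSERT(cond) \
    do { \
        if (!(cond)) { \
            std::fprintf(stderr, "%s\n", \
                aussie_format_failure(#cond, __FILE__, __LINE__).c_str()); \
        } \
    } while (0)
```

With this in place, the seeding code would call aussie_record_seed(iseed) next to srand(iseed), and any later AUSSIE_ASSERT failure report carries the seed with it.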

 
