Aussie AI

Timing C++ Code

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

There are several reasons to time the execution of a program. Timing C++ code can show which statements should be optimized, whereas profilers may only indicate which functions are consuming time. Timing can also reveal the relative efficiency of various operations and give you valuable information about writing code for your machine (e.g. is shifting faster than integer multiplication?).

The time Command. If the full execution time for a program is all that is needed, the Linux time command can be used to measure it. There are typically two versions: a stand-alone utility (commonly /usr/bin/time, or /bin/time on some older systems) and a command built into shells such as csh and bash. The command to run is usually:

    time ./a.out

A different executable name could also be used and command line arguments can also be specified.

Code Instrumentation. If a more detailed speed analysis is needed, it is possible to add C++ self-instrumentation code to your program to monitor its own performance. The basic idea is to use the standard library functions to monitor the time before and after an action.

The most useful function is the “clock” function, which reports the processor time used since the program began executing, measured in clock ticks. The “time” function, which keeps track of real calendar time, could also be used, but it is not a true indication of processor time on a busy multi-user system, because the elapsed time also includes time spent running other programs. The clock function is intended to measure the program's own processor time on both single-user and multi-user systems.

The clock function returns a value of type clock_t (typically long or int) that counts the number of clock ticks. This value can be converted to seconds by dividing by the constant CLOCKS_PER_SEC. Both are declared in <time.h> (or <ctime> in C++).

The basic idea of timing C++ code blocks is to call the clock function before and after an operation and examine the difference between the number of ticks. The code below examines the relative speed of shift and multiplication operations on int operands.

    #include <cstdio>   // printf
    #include <ctime>    // clock, clock_t, CLOCKS_PER_SEC

    void profile_shifts()
    {
        const int MILLION = 1000000;
        const int ITERATIONS = 100 * MILLION;

        // volatile stops an optimizing compiler from removing the loops
        volatile int x = 1, y = 2, z = 3;

        // Time the shift operations
        clock_t before = clock();
        for (int i = 0; i < ITERATIONS; i++)
            x = y << z;
        printf("%d Shifts took %f seconds\n", ITERATIONS,
            (double)(clock() - before) / CLOCKS_PER_SEC);

        // Time the multiplication operations
        before = clock();
        for (int i = 0; i < ITERATIONS; i++)
            x = y * z;
        printf("%d Multiplications took %f seconds\n", ITERATIONS,
            (double)(clock() - before) / CLOCKS_PER_SEC);
    }

clock Portability Pitfall. Note that some implementations on older Unix versions don’t conform to the C++ standard and return the number of clock ticks since the first call to the clock function. This means that a single call to clock at the end of the program would always return zero. Hence, it is more portable to measure the number of clock ticks between two calls to clock, one at the start and one at the end. Obviously, you can also put the first call to “clock” at the start of the “main” function to avoid this rare glitch. Note that on implementations that are correct, a call at the start of “main” may be non-zero due to the overhead of global and static C++ object instantiations (i.e. constructors for global objects), which occurs before entering main.

Clock Tick Integer Division Pitfall. Note that the clock_t type and the CLOCKS_PER_SEC constant are both typically integers. Hence, here's a bug:

    clock_t diff = clock() - before;
    double seconds = diff / CLOCKS_PER_SEC; // Bug!

The problem is that it's integer division, so the result is truncated to a whole number of seconds before it is stored in the double variable. You need a typecast to float or double on either side of the division operator.

    clock_t diff = clock() - before;
    double seconds = diff / (double)CLOCKS_PER_SEC; // Correct

Clock Tick Overflow Pitfall. The clock function also has a problem with wraparound on some implementations. Because of its high resolution, the number of clock ticks can quickly overflow the maximum value that can be stored by the type clock_t. On one system the clock function will wrap around after only 36 minutes, which is consistent with a 32-bit signed clock_t and one million ticks per second (2,147,483,647 / 1,000,000 is about 2,147 seconds). If the program being timed runs for longer than this period, the use of clock can be misleading. One solution is to use the “time” function rather than “clock” when executions are longer, but this usually only has resolution to the nearest second.

 
