Aussie AI
Precalculating C++ Source Files
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
One way to improve on the precomputation of a big array is to skip it entirely during startup by writing a lot of code. It's like using an AI coding copilot, only it's not really. I mean, come on, the day an AI writes better code than me is the day that I retire to the hologram beach with my robot dog companions.
The idea here is to write a program to generate a C++ source file that contains the global precomputed lookup table. Yes, it's a C++ program that creates part of a C++ program, which is almost like your AI has become self-aware, only one step away from Skynet. Well, maybe not, it's just a dumb C++ program written by a dumb human creating some dumb data.
Anyway, this auto-generated C++ code can be compiled and linked into your C++ program, and used like a global array of data in other parts of the program. Zero calculations are required at runtime, and the data can be read-only.
The benefit is that this auto-generated code method does not even require the time cost of startup initialization for any precomputations. There's not even the cost of data file loading. Instead, the data is auto-loaded by the linker-loader during executable file instantiation (i.e. when the user starts the app). The only downsides for the user are that the size of the executable program increases, which means more disk space usage, and that application program startup may take longer and it will use more memory (regardless of whether it ever needs this precomputed data). Also, various offline tasks take longer for the software developers, such as compilation and linking for testing, which is why we bill per hour.
I tried this out for precalculating GELU with a 24-bit table.
The C++ source file was 514k in size for a 24-bit precomputation table with 1&lt;&lt;24 entries.
This is what the auto-generated source code should look like:
// Precomputed table source code: GELU, "gelu_precomp_24bits.cpp"
float g_gelu_table_precompute_24bits[] = {
0f,
1.793662034335765850782373866611092648039e-43f,
3.587324068671531701564747733222185296077e-43f,
5.380986103007297552347121599833277944116e-43f,
7.174648137343063403129495466444370592155e-43f,
...
...
};
Here's the code to generate the code to generate the code to generate the code:
void aussie_generic_setup_table_FP32_24bits_PRINT_SOURCE(  // Print C++ of 24-bit GELU precomputed table
    char* nickname,
    char* outfname,
    float (*fnptr)(float),  // e.g. GELU
    int maxn,               // e.g. 1<<24
    float arrout[]          // array to store (optional, can be NULL)
)
{
    if (!fnptr) {
        yassert(fnptr);
        return;
    }
    // Generate C++ source code so we can pre-compile the precomputed GELU table (24-bits)
    // There are 2^24 = 16.7 million numbers...
    FILE* fp = stdout;
    bool writingfile = false;
    bool add_commented_number = true;
    if (outfname && *outfname) {
        fp = fopen(outfname, "w");
        if (!fp) {
            yassert(fp);  // file write failed
            return;  // fail
        }
        writingfile = true;
        add_commented_number = false;  // No extra comments for file output version
    }
    unsigned int u = 0;
    fprintf(fp, "// Precomputed table source code: %s, \"%s\"\n", nickname, outfname);
    fprintf(fp, "float g_gelu_table_precompute_24bits[] = { \n");
    char numbuf[5000] = "";
    for (; u < maxn /*1<<24*/; u++) {  // For all 2^24 = ~16.7M...
        unsigned int uval = u << 8;  // put zeros in the least significant 8 mantissa bits
        float f = AUSSIE_UINT_TO_FLOAT(uval);
        float g = fnptr(f);  // Call GELU or whatever
        if (arrout) arrout[u] = g;  // Store precomputed data (e.g. GELU)...

        // Format: %g means the smaller of %e or %f
        // ... %e is the exponent format (scientific-like format)
        char* buf = numbuf;
        sprintf(buf, "%40.40gf", g);  // Format %g (number) and suffix "f" (float constant type)
        if (strchr(buf, 'n')) {  // "nan" or "-nan" ...
            strcpy(buf, "0.0 /*nan*/");  // Dummy value for NaN
        }
        // Remove prefix padding spaces...
        while (buf[0] == ' ') buf++;
        // Remove suffix zeros ...
        int len = (int)strlen(buf);
        if (buf[len - 1] == 'f') len--;  // skip suffix f
        if (buf[len - 1] == '0') {
            while (len > 5) {
                if (buf[len - 1] == '0' && isdigit(buf[len - 2])) {
                    if (buf[len] == 'f') {
                        buf[len - 1] = 'f';  // remove it, but leave 'f'...
                        buf[len] = 0;
                    }
                    else {
                        buf[len - 1] = 0;  // remove it...
                        buf[len] = 0;
                    }
                    len--;
                }
                else break;
            }
        }
        if (add_commented_number) {
            fprintf(fp, "%s // (%40.40f) [%u] \n", buf, f, u);
        }
        else {  // No comments...
            fprintf(fp, "%s,\n", buf);
        }
        // Progress update
        if (u % 100000 == 0 && u != 0) {
            if (writingfile) fprintf(stdout, "%u -- %s\n", u, buf);  // Progress to stdout...
            fprintf(fp, "// U= [%u]\n", u);  // Comment occasionally
        }
    }
    fprintf(fp, "}; \n");  // Close initializer...
    if (fp && fp != stdout) fclose(fp);
}
Conclusions on Source Code Generation:
Does it work?
Yes and no.
It builds the output file quite quickly, zipping through 1&lt;&lt;24 computations and writing to disk.
But I can't get this 24-bit version with its 500k CPP source file to actually compile in the Microsoft Visual Studio IDE.
Maybe it works on Windows command-line or Linux GCC, but I haven't tried.
Anyway, this self-generating code idea is certainly quite workable for table lookups of approximations for FP16 numbers (16-bit half-precision floating-point), because the lookup table needs to “only” contain 2^16=65,536 numbers. This is about a 200k C++ source file in plain text, and creates linked data of about 65k times 4 bytes equals about 256k space usage (or half that space if you also store the computation as 16-bit numbers rather than 32-bit floats or integers).
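Here is a self-contained sketch of that 16-bit variant, with hypothetical names throughout. For simplicity it uses bfloat16 (the top 16 bits of a float32) rather than IEEE FP16, because the bit conversion is then just a shift; a true FP16 version would need a half-to-float decoder. The table is built in memory here, but offline the same loop would emit C++ source instead:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Convert a 16-bit bfloat16 pattern to float32 (bf16 = top 16 bits of fp32)
static float bits16_to_float(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

static float g_table16[1u << 16];  // 65,536 floats = 256k of table data

// Offline, this loop would emit C++ source; here it just fills the array.
void build_table16(float (*fn)(float)) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        float g = fn(bits16_to_float((uint16_t)i));
        g_table16[i] = std::isnan(g) ? 0.0f : g;  // dummy value for NaN results
    }
}

// Lookup: truncate the input to bfloat16 and use its bits as the index
float lookup16(float x) {
    uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return g_table16[u >> 16];
}

// Stand-in for GELU, just to exercise the table
static float demo_fn(float x) { return sqrtf(x); }
```

For inputs exactly representable in bfloat16 (e.g. powers of two), the lookup reproduces the function exactly; other inputs get the value of their truncated neighbor.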
One final point, just between you and me: I know what you're thinking, because an idea is forming in your head. Maybe, just maybe, you could try to precompute an entire AI model into C++ source code, and build it into an EXE file. In theory, you could save the cost of the model loading phase of inference. But it'll be a multi-gigabyte executable file, even if you can build it, and you probably can't anyway, because it'll likely be so large as to crash your C++ compiler. I'm not even sure that Windows or Linux could load such a big executable file. But maybe there are some high-end computing architectures where this might work. Good luck with that! (A better plan is to look into using an ML compiler.)