Aussie AI
Bonus Materials for Generative AI in C++
Last Updated 3rd August, 2024
by David Spuler, Ph.D.
Bonus materials for the book Generative AI in C++ by David Spuler include:
- Free AI Book: Full Text Online
- Reference lists (for each chapter)
- Research paper lists and citations
- Source code availability
- Table of Contents
- Code examples
Full text book chapters (free online):
Part I: AI Projects in C++
Chapter 1. Introduction to AI in C++
Chapter 2. Transformers & LLMs
Chapter 3. AI Phones
Chapter 4. AI on Your Desktop
Chapter 5. Design Choices & Architectures
Chapter 6. Training, Fine-Tuning & RAG
Chapter 7. Deployment Architecture
Part II: Basic C++ Optimizations
Chapter 8. Bitwise Operations
Chapter 9. Floating Point Arithmetic
Chapter 10. Arithmetic Optimizations
Chapter 11. Compile-Time Optimizations
Chapter 12. Pointer Arithmetic
Chapter 13. Algorithm Speedups
Chapter 14. Memory Optimizations
Part III: Parallel C++ Optimizations
Chapter 15. Loop Vectorization
Chapter 16. Hardware Acceleration
Chapter 17. AVX Intrinsics
Chapter 18. Parallel Data Structures
Part IV: Transformer Components in C++
Chapter 19. Encoders & Decoders
Chapter 20. Attention
Chapter 21. Activation Functions
Chapter 22. Vector Algorithms
Chapter 23. Tensors
Chapter 24. Normalization
Chapter 25. Softmax
Chapter 26. Decoding Algorithms
Chapter 27. Tokenizer and Vocabulary
Part V: Optimizing Transformers in C++
Chapter 28. Deslugging AI Engines
Chapter 29. Caching Optimizations
Chapter 30. Vectorization
Chapter 31. Kernel Fusion
Chapter 32. Quantization
Chapter 33. Pruning
Chapter 34. MatMul/GEMM
Chapter 35. Lookup Tables & Precomputation
Chapter 36. AI Memory Optimizations
Part VI: Enterprise AI in C++
Chapter 37. Tuning, Profiling & Benchmarking
Chapter 38. Platform Portability
Chapter 39. Quality
Chapter 40. Reliability
Chapter 41. Self-Testing Code
Chapter 42. Debugging
Part VII: Research on AI Optimization
Chapter 43. Overview of AI Research
Chapter 44. Advanced Quantization
Chapter 45. Knowledge Distillation
Chapter 46. Structured Pruning
Chapter 47. Early Exit and Layer Pruning
Chapter 48. Width Pruning
Chapter 49. Length Pruning
Chapter 50. Adaptive Inference
Chapter 51. Zero-Multiplication Models
Chapter 52. Logarithmic Models
Chapter 53. Arithmetic Optimization Research
Chapter 54. Ensemble Multi-Model Architectures
Chapter 55. Advanced Number Systems
Chapter 56. Neural Architecture Search
Appendices
Appendix 1: C++ Slug Catalog
Bonus chapters:
- C++ Bug Catalog (Bonus chapter online)
- C++ Bug Symptom Diagnosis (Bonus chapter online)
- C++ Portability Bug Catalog (Bonus chapter online)
New Hot Research Areas (more research papers):
- On-device inference (native phone and PC AI)
- Generalized speculative decoding
- Consensus decoding
- KV Cache Compression/Quantization
- Prefill optimizations (decoder-only engines)
- KV cache recomputation with early exit
- Deep prefill, shallow decoder architecture
- Fixed-point quantization (integer)
- Fixed-point arithmetic
- Block floating-point arithmetic
- FFN sublayer pruning
Updates to Hot Research Topics: Longstanding research areas with many recent additions:
Part I: AI Projects in C++
- Research papers:
1. Introduction to AI in C++
- Research papers:
- Market Research
- Transformer architectures (overview)
- AI phones
- AI PCs (desktops/laptops)
2. Transformers & LLMs
- Research papers:
- Market Research
- Transformer architectures (overview)
- AI phones
- AI PCs (desktops/laptops)
3. AI Phones
- Research papers:
4. AI on Your Desktop
- Research papers:
- AI PCs (desktops/laptops)
- Market Research
- Edge device inference (mobile/PC)
- GenAI market evolution
5. Design Choices & Architectures
- Research papers:
- Transformer architectures (overview)
- Market Research
- AI phones
- AI PCs (desktops/laptops)
6. Training, Fine-Tuning & RAG
- Research papers:
7. Deployment Architecture
Part II: Basic C++ Optimizations
- Research papers:
8. Bitwise Operations
- Research papers:
9. Floating Point Arithmetic
10. Arithmetic Optimizations
- Research papers:
11. Compile-Time Optimizations
- Research papers:
12. Pointer Arithmetic
- Research papers:
13. Algorithm Speedups
- Research papers:
14. Memory Optimizations
- Research papers:
Part III: Parallel C++ Optimizations
- Research papers:
15. Loop Vectorization
- Research papers:
- Loop optimizations (overview)
- Loop fusion (merging loops)
- Loop unrolling
- Loop perforation
- Loop reordering
- Loop tiling
- Loop reversal
- Loop fission (splitting a loop)
- Loop interchange
- Loop coalescing
- Loop-invariant code motion ("hoisting")
- Loop distribution
- Pointer arithmetic
- Loop peeling (unrolling first iterations)
- Loop splitting
- Loop sentinel
- Loop collapsing
- Loop normalization
- Loop strip mining (loop sectioning)
- Loop skewing
- Loop spreading
- Parallelization
- Vectorization
- Kernel operator fusion (merging two operations)
- Kernel fission (splitting)
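To make two of the loop transformations above concrete (loop fusion and loop unrolling), here is a minimal C++ sketch; the function and array names are illustrative, not taken from the book's code:
    // Before: two separate passes over the same data (candidates for loop fusion).
    // After: one fused loop, manually unrolled by a factor of 4.
    #include <cstddef>

    void scale_and_add(float* v, const float* b, float alpha, size_t n) {
        size_t i = 0;
        // Fused and unrolled main loop (n need not be a multiple of 4).
        for (; i + 4 <= n; i += 4) {
            v[i]     = v[i]     * alpha + b[i];
            v[i + 1] = v[i + 1] * alpha + b[i + 1];
            v[i + 2] = v[i + 2] * alpha + b[i + 2];
            v[i + 3] = v[i + 3] * alpha + b[i + 3];
        }
        // Cleanup loop for the leftover tail elements.
        for (; i < n; ++i)
            v[i] = v[i] * alpha + b[i];
    }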
16. Hardware Acceleration
- Research papers:
17. AVX Intrinsics
- Research papers:
18. Parallel Data Structures
- Research papers:
Part IV: Transformer Components in C++
- Research papers:
19. Encoders & Decoders
20. Attention
- Research papers:
21. Activation Functions
- Research papers:
22. Vector Algorithms
- Research papers:
23. Tensors
- Research papers:
- Tensor decomposition
- Faster matrix multiplication (e.g. Winograd, Strassen)
- Approximate matrix multiplication
24. Normalization
- Research papers:
25. Softmax
- Research papers:
26. Decoding Algorithms
- Research papers:
27. Tokenizer and Vocabulary
- Research papers:
Part V: Optimizing Transformers in C++
- Research papers:
28. Deslugging AI Engines
- Research papers:
29. Caching Optimizations
- Research papers:
30. Vectorization
- Research papers:
- Vectorization
- Parallelization
- Pipelining
- Kernel operator fusion (merging two operations)
- Kernel fission (splitting)
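As a minimal illustration of the vectorization item above, a hand-vectorized AVX loop for element-wise vector addition; it assumes an x86 CPU with AVX, and the names are illustrative:
    #include <immintrin.h>  // AVX intrinsics
    #include <cstddef>

    // Element-wise vector addition, 8 floats per iteration via 256-bit AVX registers.
    void vector_add_avx(float* dst, const float* a, const float* b, size_t n) {
        size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
        }
        for (; i < n; ++i)   // scalar cleanup for the tail
            dst[i] = a[i] + b[i];
    }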
31. Kernel Fusion
- Research papers:
- Kernel operator fusion (merging two operations)
- Kernel fission (splitting)
- Loop fusion (merging loops)
- Loop fission (splitting a loop)
- Fused Multi-Head Attention (MHA)
- Fused activation functions
- Fused RELU
- Fused GELU
- Fused SwiGLU
- Fused normalization (e.g. "fused LayerNorm")
- Fused Softmax
- Fused multiply-add (FMA)
- Fused transpose
- Negative skipping
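As a small concrete case of the fused kernels listed above (a fused add-bias combined with a fused RELU), a minimal sketch with illustrative names:
    #include <cstddef>

    // An unfused version would scan the vector twice: once to add the bias,
    // once to apply RELU. The fused kernel does both in a single pass.
    void fused_add_bias_relu(float* v, const float* bias, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            float x = v[i] + bias[i];      // add-bias
            v[i] = (x > 0.0f) ? x : 0.0f;  // RELU, fused into the same loop
        }
    }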
32. Quantization
- Research papers:
33. Pruning
- Research papers:
34. MatMul/GEMM
- Research papers:
- Faster matrix multiplication (e.g. Winograd, Strassen)
- Approximate matrix multiplication
- Transpose cache
- Fused multiply-add (FMA)
- Fused transpose
- Vector dot product optimization
- FFN pruning
- Fused add-bias
- Bias vector pruning
- Low-rank matrices
- Matrix Algebra (factorization)
- Butterfly matrices
- Monarch matrices
- Sparse matrices (sparsification)
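To illustrate two items from the list above, the transpose cache and vector dot product optimization, here is a minimal matrix multiply sketch that assumes the second matrix has been pre-transposed; the row-major layout and names are assumptions:
    #include <cstddef>
    #include <vector>

    // C = A * B, where Bt holds B transposed (the "transpose cache" idea):
    // both A and Bt are then read sequentially in the inner dot-product loop,
    // which is friendlier to the hardware prefetcher and to vectorization.
    // A is n x k, Bt is m x k, C is n x m, all row-major.
    void matmul_transposed(const std::vector<float>& A,
                           const std::vector<float>& Bt,
                           std::vector<float>& C,
                           size_t n, size_t k, size_t m) {
        for (size_t i = 0; i < n; ++i) {
            for (size_t j = 0; j < m; ++j) {
                float sum = 0.0f;
                for (size_t p = 0; p < k; ++p)
                    sum += A[i * k + p] * Bt[j * k + p];  // sequential in both operands
                C[i * m + j] = sum;
            }
        }
    }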
35. Lookup Tables & Precomputation
- Research papers:
36. AI Memory Optimizations
- Research papers:
Part VI: Enterprise AI in C++
- Research papers:
37. Tuning, Profiling & Benchmarking
- Research papers:
38. Platform Portability
- Research papers:
- C++ Portability Bug Catalog (Bonus chapter online)
39. Quality
- Research papers:
40. Reliability
- Research papers:
41. Self-Testing Code
- Research papers:
42. Debugging
- Research papers:
Part VII: Research on AI Optimization
- Research papers:
43. Overview of AI Research
- Research papers:
44. Advanced Quantization
- Research papers:
- Quantization research
- Model compression research
- Binary quantization
- Ternary quantization
- 2-bit quantization (INT2)
- 3-bit quantization (INT3)
- 4-bit quantization (INT4)
- 5-bit quantization (INT5)
- 6-bit quantization (INT6)
- 7-bit quantization (INT7)
- 8-bit quantization (INT8)
- Integer quantization
- Integer-only arithmetic quantization
- FP8 quantization
- Logarithmic power-of-two quantization (bitshift quantization)
- Double bitshift power-of-two quantization
- Division quantization
- Cluster-based quantization (weight clustering)
- Dyadic quantization
- Fake quantization
- Simulated quantization
- Stochastic quantization (probabilistic)
- Weight clustering
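For orientation on the 8-bit entries above, a minimal sketch of symmetric INT8 quantization and dequantization; the max-absolute-value scale policy shown is just one common choice, not the only one in the research:
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>

    // Symmetric INT8 quantization: the scale maps the largest |weight| to 127.
    float compute_scale(const float* w, size_t n) {
        float maxabs = 0.0f;
        for (size_t i = 0; i < n; ++i)
            maxabs = std::max(maxabs, std::fabs(w[i]));
        return (maxabs > 0.0f) ? (maxabs / 127.0f) : 1.0f;
    }

    int8_t quantize(float x, float scale) {
        int q = static_cast<int>(std::round(x / scale));
        return static_cast<int8_t>(std::clamp(q, -127, 127));
    }

    float dequantize(int8_t q, float scale) {
        return static_cast<float>(q) * scale;
    }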
45. Knowledge Distillation
- Research papers:
46. Structured Pruning
- Research papers:
47. Early Exit and Layer Pruning
- Research papers:
- Early exit (dynamic layer pruning)
- Layer pruning
- Depth pruning (overview)
- Layer skipping
- Shallow decoder architecture (layer pruning)
- Layer fusion
- Layer reordering
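A minimal sketch of the early exit idea above (dynamic layer pruning): stop the layer loop once a confidence estimate clears a threshold. The layer and confidence types here are hypothetical placeholders, not the book's engine interfaces:
    #include <functional>
    #include <vector>

    using Hidden = std::vector<float>;
    using Layer = std::function<void(Hidden&)>;              // hypothetical layer type
    using Confidence = std::function<float(const Hidden&)>;  // hypothetical early-exit classifier

    // Run layers in order, but stop as soon as the confidence estimate
    // clears the threshold (early exit / dynamic layer pruning).
    void run_with_early_exit(const std::vector<Layer>& layers,
                             const Confidence& confidence,
                             Hidden& hidden,
                             float threshold /* e.g. 0.9f */) {
        for (const Layer& layer : layers) {
            layer(hidden);
            if (confidence(hidden) >= threshold)
                break;  // early exit: the remaining layers are skipped
        }
    }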
48. Width Pruning
- Research papers:
49. Length Pruning
- Research papers:
50. Adaptive Inference
- Research papers:
- End-to-End integer inference
- Dynamic inference (adaptive inference)
- Skipping optimizations
51. Zero-Multiplication Models
- Research papers:
- Zero-Multiplication Models (overview)
- Integer-only Transformers
- Binary quantization
- Ternary quantization
- 2-bit quantization (INT2)
- Adder networks
- Bitshift-add networks
- Bitshift power-of-2 quantization
- Double bitshift quantization
- Add-as-integer networks
- Logarithmic Models
- Bitwise neural networks
- Diff-squared networks
- Log-sum-exp (LSE) networks
- Max-Plus networks
- Min-Max-Plus networks
- Morphological networks
- Trigonometric approximate inference
- Weightless Neural Networks (WNNs)
- XNOR networks
- End-to-End integer inference
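As an example of the zero-multiplication theme, a minimal sketch of bitshift power-of-2 quantization, where multiplying by a weight becomes an integer shift; the encoding and clamping range here are simplified assumptions:
    #include <cmath>
    #include <cstdint>

    // Encode a weight as sign * 2^exponent (power-of-two quantization).
    struct Pot2Weight {
        int8_t sign;      // +1 or -1
        int8_t exponent;  // clamped non-positive exponent
    };

    Pot2Weight quantize_pot2(float w) {
        Pot2Weight q;
        q.sign = (w < 0.0f) ? -1 : 1;
        float a = std::fabs(w);
        int e = (a > 0.0f) ? static_cast<int>(std::lround(std::log2(a))) : -7;
        if (e > 0) e = 0;    // assumes weights have magnitude <= 1.0
        if (e < -7) e = -7;  // clamp the dynamic range
        q.exponent = static_cast<int8_t>(e);
        return q;
    }

    // "Multiply" an integer activation by the quantized weight using only a shift.
    int32_t mul_by_pot2(int32_t activation, Pot2Weight w) {
        int32_t shifted = activation >> (-w.exponent);  // divide by 2^|exponent| via shift
        return (w.sign < 0) ? -shifted : shifted;
    }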
52. Logarithmic Models
53. Arithmetic Optimization Research
- Research papers:
- Advanced AI Mathematics
- Integer-only Transformers
- Integer-only arithmetic quantization
- End-to-End integer inference
- Reciprocal multiplication
- Constant folding
- Common subexpression elimination
- Strength reduction
- Floating point bitwise arithmetic
- Addition optimizations
- Approximate addition
- Multiplication algorithms
- Approximate multiplication
- Logarithmic approximate multiplication
- Division optimizations
- Approximate division
- Bitwise operator inference
- Bitserial operations
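To illustrate two list items above, strength reduction and reciprocal multiplication, a minimal sketch that replaces a per-element division with one reciprocal and cheap multiplies (e.g. when normalizing by a Softmax denominator):
    #include <cstddef>

    // Slow version: one floating-point division per element.
    void normalize_divide(float* v, size_t n, float denom) {
        for (size_t i = 0; i < n; ++i)
            v[i] = v[i] / denom;
    }

    // Strength-reduced version: one division total, hoisted out of the loop.
    void normalize_reciprocal(float* v, size_t n, float denom) {
        float recip = 1.0f / denom;
        for (size_t i = 0; i < n; ++i)
            v[i] = v[i] * recip;
    }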
54. Ensemble Multi-Model Architectures
- Research papers:
55. Advanced Number Systems
- Research papers:
- Advanced AI Mathematics
- Integer-only Transformers
- End-to-End integer inference
- Floating point bitwise arithmetic
- Posit number system (PNS)
- Residue number system (RNS)
- Logarithmic number system (LNS)
- Dyadic numbers
- Double-base number system (DBNS)
- Dynamic number systems
- Hybrid number systems
- Tropical algebra (max-plus)
- MiniMax algebra
- Multi-dimensional logarithmic number system (MDLNS)
- Multiple-Base Number System (MBNS)
- Matrix Algebra (factorization)
- Approximate matrix multiplication
- Butterfly matrices
- Monarch matrices
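To illustrate the logarithmic number system (LNS) entry above, where multiplication becomes addition of logarithms, a minimal sketch; LNS addition is the hard part and is omitted here:
    #include <cmath>

    // A value x is represented as sign * 2^log2mag.
    struct LnsNumber {
        int sign;        // +1, -1, or 0
        float log2mag;   // log base 2 of |x| (ignored when sign == 0)
    };

    LnsNumber to_lns(float x) {
        if (x == 0.0f) return {0, 0.0f};
        return { (x < 0.0f) ? -1 : 1, std::log2(std::fabs(x)) };
    }

    float from_lns(LnsNumber a) {
        return (a.sign == 0) ? 0.0f : a.sign * std::exp2(a.log2mag);
    }

    // Multiplication in LNS is just an addition of the log parts.
    LnsNumber lns_multiply(LnsNumber a, LnsNumber b) {
        if (a.sign == 0 || b.sign == 0) return {0, 0.0f};
        return { a.sign * b.sign, a.log2mag + b.log2mag };
    }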
56. Neural Architecture Search
- Research papers:
Appendix 1: C++ Slug Catalog
- Research papers:
More AI Research
Read more about:
- GenAI market research
- AI on Phones
- Inference Optimizations
- Loop Optimizations
- Code Optimizations