Aussie AI
Aussie AI Code Examples Overview
Generative AI in C++ Book Chapters
The chapters of the book "Generative AI in C++" cover the coding of Transformer internals:
Part I: AI Projects in C++
Chapter 1. Introduction to AI in C++
Chapter 2. Transformers & LLMs
Chapter 3. AI Phones
Chapter 4. AI on Your Desktop
Chapter 5. Design Choices & Architectures
Chapter 6. Training, Fine-Tuning & RAG
Chapter 7. Deployment Architecture
Part II: Basic C++ Optimizations
Chapter 8. Bitwise Operations
Chapter 9. Floating Point Arithmetic
Chapter 10. Arithmetic Optimizations
Chapter 11. Compile-Time Optimizations
Chapter 12. Pointer Arithmetic
Chapter 13. Algorithm Speedups
Chapter 14. Memory Optimizations
Part III: Parallel C++ Optimizations
Chapter 15. Loop Vectorization
Chapter 16. Hardware Acceleration
Chapter 17. AVX Intrinsics
Chapter 18. Parallel Data Structures
Part IV: Transformer Components in C++
Chapter 19. Encoders & Decoders
Chapter 20. Attention
Chapter 21. Activation Functions
Chapter 22. Vector Algorithms
Chapter 23. Tensors
Chapter 24. Normalization
Chapter 25. Softmax
Chapter 26. Decoding Algorithms
Chapter 27. Tokenizer and Vocabulary
Part V: Optimizing Transformers in C++
Chapter 28. Deslugging AI Engines
Chapter 29. Caching Optimizations
Chapter 30. Vectorization
Chapter 31. Kernel Fusion
Chapter 32. Quantization
Chapter 33. Pruning
Chapter 34. MatMul/GEMM
Chapter 35. Lookup Tables & Precomputation
Chapter 36. AI Memory Optimizations
Part VI: Enterprise AI in C++
Chapter 37. Tuning, Profiling & Benchmarking
Chapter 38. Platform Portability
Chapter 39. Quality
Chapter 40. Reliability
Chapter 41. Self-Testing Code
Chapter 42. Debugging
Part VII: Research on AI Optimization
Chapter 43. Overview of AI Research
Chapter 44. Advanced Quantization
Chapter 45. Knowledge Distillation
Chapter 46. Structured Pruning
Chapter 47. Early Exit and Layer Pruning
Chapter 48. Width Pruning
Chapter 49. Length Pruning
Chapter 50. Adaptive Inference
Chapter 51. Zero-Multiplication Models
Chapter 52. Logarithmic Models
Chapter 53. Arithmetic Optimization Research
Chapter 54. Ensemble Multi-Model Architectures
Chapter 55. Advanced Number Systems
Chapter 56. Neural Architecture Search
Appendices
Appendix 1: C++ Slug Catalog
Bonus Appendix: C++ Bug Symptom Diagnosis
Bonus Appendix: C++ Portability Bug Catalog
The full text of all chapters, and more, is available online.
Aussie AI Apps
Aussie AI advanced consumer apps:
AI Coding Blog Articles
Blog articles for AI developers:
- 500+ Techniques for LLM Inference Optimization
- State-of-the-Art LLM Backends
- Low Latency Programming
- Debugging OpenAI Node.js API Wrappers
- Reasoning Decoding Algorithms
- Reasoning Inference Optimization
- Reasoning is the New AI Middleware
- Generalizing Prefix KV Caching to RAG Chunks
- RAG Optimization via Caching
- All Aussie AI Blog Articles
Memory-Safe C++ Blog Articles
General C++ coding articles on safety:
- DIY Preventive C++ Memory Safety
- Canary Values & Redzones for Memory-Safe C++
- Use-After-Free Memory Errors in C++
- Array Bounds Violations and Memory Safe C++
- Poisoning Memory Blocks for Safer C++
- Uninitialized Memory Safety in C++
- DIY Memory Safety in C++
- Memory Safe C++ Library Functions
- Smart Stack Buffers for Memory Safe C++
- Safe C++ Text Buffers with snprintf
- Safe C++ Coding Book
CUDA C++ Programming Blog Articles
Blogs for CUDA C++ developers covering optimization and debugging:
- CUDA C++ Floating Point Exceptions
- CUDA Memory Coalescing Optimizations
- CUDA GPU Thread Divergence
- CUDA Basic C++ Programming Mistakes
- CUDA C++ Optimization Book
- CUDA C++ Debugging Book
AI Coding Research Pages
- Chain-of-Thought Efficiency Optimization
- Hot Inference Optimization Techniques
- Sequential Speculative Decoding
- Quantization (floating-point-free zone!)
- Pruning (cut in four dimensions!)
- Parameter Sharing (Weight Sharing)
- Activation Function Optimizations
- Normalization Optimizations
- Softmax Optimizations
- FFN/MLP Optimizations
- MatMul/GEMM/GEMV Optimizations (really hard stuff)
- Positional Encoding Optimizations
- Decoding Algorithm Variants and Optimizations
- Speculative Decoding Types and Optimizations (endless papers...)
- PEFT and LoRA Optimizations
- Attention Module Types and Optimizations (go long!)
- Prompt Caching and KV Cache Optimizations
- Zero-Multiplication Models (ban the * operator!)
- Research Overview Index (many more topics)