Aussie AI
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
What are Ensemble Architectures?
If one AI engine is amazing, imagine what two could do. Or ten. Or a hundred.
The idea of multi-model architectures recently received a large boost from the rumor that OpenAI's GPT-4 has an eight-model architecture. The leak is unofficial and could be false, but it suggests that GPT-4 has a “Mixture-of-Experts” (MoE) architecture with 8 models of about 220 billion parameters each, for a total of 1.76 trillion parameters (8 × 220 billion). An MoE architecture uses some decision method, such as a heuristic or possibly a learned gating function, to route each query to one or more of the expert models, as discussed further below.
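To make the routing idea concrete, here is a minimal C++ sketch of top-1 MoE routing. It is a hypothetical illustration, not GPT-4's (unconfirmed) design: the names `Expert`, `route_top1`, and `moe_forward` are invented here, each expert is a toy function standing in for a full model, and the gating function is assumed to be a simple dot product with one weight row per expert, which a real MoE would learn during training.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical top-1 Mixture-of-Experts (MoE) routing sketch; not GPT-4's design.
using Vec = std::vector<float>;
// Each "expert" is a toy stand-in for a full model: input vector -> output vector.
using Expert = std::function<Vec(const Vec&)>;

// Assumed gating function: score each expert by the dot product of the input
// with that expert's gate weight row, and return the index of the best score.
// (A real MoE learns these weights; applying softmax to the scores would not
// change which expert has the highest score.)
std::size_t route_top1(const std::vector<Vec>& gate_weights, const Vec& x) {
    assert(!gate_weights.empty());
    std::size_t best = 0;
    float best_score = 0.0f;
    for (std::size_t e = 0; e < gate_weights.size(); ++e) {
        assert(gate_weights[e].size() == x.size());
        float score = 0.0f;
        for (std::size_t j = 0; j < x.size(); ++j) {
            score += gate_weights[e][j] * x[j];
        }
        if (e == 0 || score > best_score) {
            best = e;
            best_score = score;
        }
    }
    return best;
}

// Run only the chosen expert, so each query pays for one expert's inference.
Vec moe_forward(const std::vector<Expert>& experts,
                const std::vector<Vec>& gate_weights, const Vec& x) {
    assert(experts.size() == gate_weights.size());
    return experts[route_top1(gate_weights, x)](x);
}
```

Only the selected expert runs for each input, which is the main appeal of MoE: the total parameter count can grow with the number of experts while per-query compute stays close to that of a single expert. Some MoE designs route to the top two experts and blend their outputs instead of picking just one.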
The idea of using two or more AI engines together to complete a task is far from new. There are many research papers on different types of multi-model architectures. Research in this area is known as “ensemble learning,” and the resulting systems are called “multi-model” engines.
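One of the simplest ensemble methods, averaging the output probabilities of several models, can be sketched in C++ as follows. This is a generic illustration rather than a method from this book: `Model`, `ensemble_average`, and `ensemble_predict` are hypothetical names, and each model is assumed to return a probability distribution over the same set of classes or tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical ensemble: each model maps a query to a probability
// distribution over the same set of classes (or tokens).
using Probs = std::vector<float>;
using Model = std::function<Probs(const std::string&)>;

// Average the probability vectors from every model in the ensemble.
Probs ensemble_average(const std::vector<Model>& models, const std::string& query) {
    assert(!models.empty());
    Probs sum = models[0](query);
    for (std::size_t m = 1; m < models.size(); ++m) {
        Probs p = models[m](query);
        assert(p.size() == sum.size());  // all models must share one output space
        for (std::size_t i = 0; i < p.size(); ++i) {
            sum[i] += p[i];
        }
    }
    for (float& s : sum) {
        s /= static_cast<float>(models.size());
    }
    return sum;
}

// Predict the class with the highest averaged probability.
std::size_t ensemble_predict(const std::vector<Model>& models, const std::string& query) {
    Probs avg = ensemble_average(models, query);
    assert(!avg.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < avg.size(); ++i) {
        if (avg[i] > avg[best]) best = i;
    }
    return best;
}
```

Averaging is only one combination rule; majority voting over each model's top prediction, or weighting models by their validation accuracy, are common alternatives. Every model runs on every query, so inference cost grows roughly linearly with the number of models in the ensemble.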
There are many ways that AI engines could cooperate to achieve more than one would alone. This is an area ripe for exploration, where we have only scratched the surface of possibilities. On the other hand, with today's high cost of GPUs limiting what can be done in both AI inference and training, the full realization of ensemble AI algorithms is still in the distant future.
One way that two AI models work together has become common in practice: using the output of one model as text in the training data set for a new model. This has been an effective technique for improving downstream models, but it isn't usually classed as an ensemble algorithm, although Honovich et al. (2022) is one paper that studies it. The idea is similar to Knowledge Distillation, but differs in its goal: rather than creating a smaller, cut-down model, it usually aims to improve the accuracy of a large model.
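The data flow of this approach can be sketched in C++ as below. It is a hypothetical illustration under stated assumptions, not the method of Honovich et al. (2022): `Teacher`, `Example`, and `build_synthetic_dataset` are invented names, the teacher stands in for whatever existing model generates the text, and the empty/duplicate filter is an assumed minimal quality check.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One (prompt, response) training example for the new downstream model.
struct Example {
    std::string prompt;
    std::string response;
};

// Hypothetical "teacher": any callable that produces text for a prompt,
// e.g. a wrapper around an existing model's inference call.
using Teacher = std::function<std::string(const std::string&)>;

// Build a training set for a new model from the teacher's outputs.
// Assumed filtering (illustrative only): skip empty responses and
// duplicate responses, since low-quality synthetic data can hurt training.
std::vector<Example> build_synthetic_dataset(const Teacher& teacher,
                                             const std::vector<std::string>& prompts) {
    assert(teacher);  // the teacher callable must be set
    std::vector<Example> dataset;
    std::set<std::string> seen;
    for (const std::string& prompt : prompts) {
        std::string response = teacher(prompt);
        if (response.empty()) continue;
        if (!seen.insert(response).second) continue;  // duplicate response
        dataset.push_back({prompt, std::move(response)});
    }
    return dataset;
}
```

The resulting dataset would then go through an ordinary training pipeline for the new model. Unlike an ensemble, the first model is no longer needed once training is done, because it never runs at inference time.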
The new AI programming book by Aussie AI co-founders:
Get your copy from Amazon: Generative AI in C++