Training and Fine-tuning
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
What's the difference between training and fine-tuning? At least three zeros.
Training is how you shove all of the brain power from the entire Wikipedia corpus into a bunch of numbers. It takes a long time and the GDP of a small country to train a big model. Training is the big cost in many AI projects.
The bad news about training is that if you mess it up, you have to start all over again. Well, this isn't quite true, because training runs in batches of data. If the evaluation fails, you revert to the prior model candidate, since you can't “un-train” an AI model. However, a review can also suggest areas where a model needs more training, or where it needs to be directed towards new behavior or personality features. In addition to batched training, there is also ongoing research on “incremental learning.”
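Here is a minimal C++ sketch of that batched train-evaluate-revert cycle. The Model type and the train_one_batch and evaluate helpers are hypothetical placeholders standing in for a real training framework, not any particular library's API.

// Minimal sketch of a batched training loop with checkpoint-and-revert.
#include <iostream>
#include <vector>

struct Model {
    std::vector<float> weights;   // stand-in for the real parameters
};

// Hypothetical helpers -- stubs standing in for a real training framework.
void train_one_batch(Model& m, int batch_id) { /* ... gradient updates ... */ }
float evaluate(const Model& m) { return 0.9f; /* ... validation score ... */ }

int main() {
    Model model;              // freshly initialized (or pre-trained) weights
    Model last_good = model;  // checkpoint of the last accepted state
    const float min_score = 0.8f;

    for (int batch = 0; batch < 1000; ++batch) {
        train_one_batch(model, batch);

        // Evaluate every 100 batches; if the model has regressed, revert to
        // the last good checkpoint -- you can't "un-train" the bad batches.
        if (batch % 100 == 99) {
            if (evaluate(model) < min_score) {
                std::cout << "Evaluation failed; reverting to checkpoint\n";
                model = last_good;
            } else {
                last_good = model;   // accept this state as the new checkpoint
            }
        }
    }
    return 0;
}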
What is fine-tuning? Fine-tuning refers to smaller amounts of training that are done to a model that's already been fully trained. If you're training a new model from scratch, even a small one, then that's training, not fine-tuning.
The most common use of fine-tuning is to modify a powerful foundation model to do something more specific. Most foundation models have been broadly trained on general information. You might want to specialize the model for a particular use case or to use a new set of data. This can be done in two ways:
- Fine-tuning
- Retrieval-Augmented Generation (RAG)
Proprietary business data is a common reason to fine-tune a foundation model (but there's also RAG to consider). For example, to create a support chatbot for customers using your products, you can customize a foundation model by fine-tuning it on your company's internal product documents. In this way, a small amount of fine-tuning adds knowledge about the new data, which the model can then incorporate into its answers to users.
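As a rough illustration, here is a minimal C++ sketch of the fine-tuning flow: load already-trained weights, then run a short extra training pass over a small set of internal documents. All of the types, file names, and helper functions (load_pretrained, train_step, save_model) are hypothetical stubs, not a real framework's API.

// Minimal sketch of fine-tuning: start from already-trained weights and run
// a short, cheap training pass over proprietary documents.
#include <string>
#include <vector>

struct Model { std::vector<float> weights; };

// Hypothetical stubs -- a real framework would provide these.
Model load_pretrained(const std::string& path) { return Model{}; }
void train_step(Model& m, const std::string& doc) { /* gradient update on doc */ }
void save_model(const Model& m, const std::string& path) { /* write weights */ }

int main() {
    // The expensive part (full training) was already done elsewhere.
    Model model = load_pretrained("foundation-model.bin");

    // A comparatively tiny corpus of internal product documents.
    std::vector<std::string> internal_docs = {
        "product-manual.txt", "support-faq.txt", "release-notes.txt"
    };

    // A few passes over a small dataset -- "at least three zeros" cheaper
    // than training from scratch.
    for (int epoch = 0; epoch < 3; ++epoch)
        for (const auto& doc : internal_docs)
            train_step(model, doc);

    save_model(model, "support-chatbot.bin");
    return 0;
}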
RAG is not training. Note that Retrieval-Augmented Generation (RAG) is not a type of training or fine-tuning. In fact, it's a way to avoid them like the plague. RAG is an architectural add-on where the Transformer can talk to a component that knows how to “retrieve” extra information or documents, such as proprietary internal business documents about your products. This extra data is used as input context during inference of the model, thereby extending the basic model to answer questions specific to this extra material. The point is that it avoids the expense of training and fine-tuning, while incurring some extra cost in implementing the RAG component.
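Here is a minimal C++ sketch of that RAG flow, assuming a hypothetical retrieve function (e.g. keyword or vector search over your internal documents) and a stubbed run_inference call; notice that no model weights are modified anywhere.

// Minimal sketch of the RAG flow: no training or fine-tuning at all.
#include <iostream>
#include <string>
#include <vector>

// Hypothetical retriever: keyword or vector search over internal documents.
std::vector<std::string> retrieve(const std::string& query) {
    return { "Widget X supports firmware versions 2.1 and later." };
}

// Hypothetical LLM call: ordinary inference on an augmented prompt.
std::string run_inference(const std::string& prompt) {
    return "Widget X needs firmware 2.1 or later.";
}

int main() {
    std::string question = "Which firmware does Widget X need?";

    // 1. Retrieve extra context -- no model weights are changed anywhere.
    std::string context;
    for (const auto& doc : retrieve(question))
        context += doc + "\n";

    // 2. Splice the retrieved text into the prompt as extra input context.
    std::string prompt = "Context:\n" + context + "\nQuestion: " + question;

    // 3. Run standard inference on the augmented prompt.
    std::cout << run_inference(prompt) << "\n";
    return 0;
}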
Data sets. High-quality data is fundamental to training, fine-tuning, and RAG alike. Historically, most training data sets have been painstakingly compiled by humans. A newer technique is to use the output of one LLM as the training dataset for another model. This method and other types of “synthetic data” are being used more and more widely.
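As a rough C++ sketch of the synthetic data idea: a large “teacher” model generates answers, and each (prompt, answer) pair becomes a training example for a smaller “student” model. Both model calls below are hypothetical stubs, not a real API.

// Minimal sketch of synthetic data generation: one model's outputs become
// another model's training set.
#include <string>
#include <vector>

// Hypothetical "teacher": a large, already-trained LLM.
std::string teacher_generate(const std::string& prompt) {
    return "Example answer for: " + prompt;
}

// Hypothetical training step for the smaller "student" model.
void student_train_step(const std::string& prompt, const std::string& answer) {
    /* ... gradient update on the student model ... */
}

int main() {
    std::vector<std::string> prompts = {
        "Explain integer overflow in C++.",
        "What is a dangling pointer?"
    };

    // Each teacher output becomes one synthetic training example.
    for (const auto& p : prompts) {
        std::string answer = teacher_generate(p);
        student_train_step(p, answer);
    }
    return 0;
}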