Aussie AI

Fine-Tuning

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.


How does fine-tuning work? An existing foundation model is further trained on new material using the standard AI training methods. This use of extra specialist text to further train a model is called “fine-tuning.” It is a longstanding method in AI, and fine-tuning can be performed on all of the major AI platforms. With the fine-tuning approach, the result of the re-training is that proprietary information about your products is all “inside” the model.

Training Algorithm

The general training algorithm at a very high level is as follows:

    (a) Split the training data 80/20 (sometimes 90/10) into data to train with (training dataset) and data to evaluate the result of the training (validation dataset). If you have enough training data, use multiple training and validation datasets.

    (b) Feed each input into the network, compare the output with the expected answer using the “loss function” to generate an “error”, and use that error to tweak the weights according to the learning rate.

    (c) After all of the training data (the 80%) has been fed in, use the validation dataset to evaluate the new model's performance. This evaluation uses new data that the model has not yet seen, also in a question-and-expected-response format.

    (d) Based on the evaluation, accept the model or make major changes. For example, if you give it totally unseen data (i.e., the 20%) and it only responds correctly 50% of the time, you need to decide whether to continue with the next training dataset, or whether it's time to redesign the model and try again. If the model performs poorly, you have to allocate blame: whether the training data is good, whether the model's structure is correct, whether the loss function is appropriate, whether the learning rate that controls the incremental changes to the weights on each iteration is aggressive enough (or too aggressive), whether the biases are wrong, and so on. To fix it, tweak the model meta-parameters (e.g., number of layers, number of nodes per layer) or change the training algorithm meta-parameters (e.g., the learning rate), then go back to the first step and start over.

This is only a top-level outline of a training algorithm. There are many improvements and refinements needed to get to a fully advanced fine-tuning algorithm.

 

