Aussie AI

What is Knowledge Distillation?

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Knowledge Distillation (KD) is an inference speedup technique where a larger pre-trained model is used to train a smaller, more efficient model. The “teacher” model trains the “student” model. When used successfully, the result is a smaller model with faster inference that closely matches the accuracy of the larger model. Hence, it is a type of “model compression,” because the large model is effectively “compressed” into a smaller model. The method is basically as follows (a code sketch of the core training loss appears after the list):

  1. Start with a big model.
  2. Repeatedly query the big model.
  3. Transfer results to train a small model.
  4. Use only the small model for inference.
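
The key step is step 3: rather than training only on the teacher’s sampled outputs, the student is trained to match the teacher’s softened probability distributions. Below is a minimal C++ sketch of the classic soft-label distillation loss from Hinton et al. (temperature-scaled softmax plus KL divergence, blended with ordinary hard-label cross-entropy); the function names and the example values of T and alpha are illustrative only, not from a specific library.

    // Minimal sketch of the classic soft-label distillation loss
    // (Hinton et al.): temperature-softened softmax, KL divergence,
    // blended with ordinary hard-label cross-entropy.
    // Function names and the T/alpha values are illustrative only.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Softmax with temperature T: higher T gives "softer" probabilities,
    // exposing more of the teacher's knowledge about non-target classes.
    std::vector<float> softmax_t(const std::vector<float>& logits, float T) {
        std::vector<float> probs(logits.size());
        float maxv = logits[0];
        for (float v : logits) if (v > maxv) maxv = v;  // numeric stability
        float sum = 0.0f;
        for (size_t i = 0; i < logits.size(); ++i) {
            probs[i] = std::exp((logits[i] - maxv) / T);
            sum += probs[i];
        }
        for (float& p : probs) p /= sum;
        return probs;
    }

    // Per-example distillation loss:
    //   alpha * CE(hard label, student)
    //   + (1 - alpha) * T^2 * KL(teacher_soft || student_soft)
    float distillation_loss(const std::vector<float>& teacher_logits,
                            const std::vector<float>& student_logits,
                            int hard_label, float T, float alpha) {
        std::vector<float> p_t = softmax_t(teacher_logits, T);
        std::vector<float> p_s = softmax_t(student_logits, T);
        std::vector<float> p_hard = softmax_t(student_logits, 1.0f);

        float ce = -std::log(p_hard[hard_label] + 1e-9f);  // hard-label CE
        float kl = 0.0f;  // KL(teacher || student) on softened distributions
        for (size_t i = 0; i < p_t.size(); ++i)
            kl += p_t[i] * std::log((p_t[i] + 1e-9f) / (p_s[i] + 1e-9f));

        // The T^2 factor keeps gradient magnitudes comparable as T varies.
        return alpha * ce + (1.0f - alpha) * T * T * kl;
    }

    int main() {
        std::vector<float> teacher = {4.0f, 1.0f, 0.5f};  // big model's logits
        std::vector<float> student = {2.5f, 1.2f, 0.3f};  // small model's logits
        printf("loss = %f\n",
               distillation_loss(teacher, student, /*label*/0,
                                 /*T*/2.0f, /*alpha*/0.5f));
        return 0;
    }

In a real training loop, this per-example loss is backpropagated through the student’s weights only; the teacher’s logits are fixed targets computed by running the teacher in inference mode.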

The use of distillation is widespread in both industry and research settings. It is a well-known, effective technique for generating a more efficient version of a large model.

Distillation is not technically an ensemble method, because the larger model is not used during inference. Hence, it is not the same as “big-small” dual inference architectures.

Distillation differs from “fine-tuning” or “re-training,” which involve extra training of the existing (large) model; knowledge distillation instead trains a new, smaller model from scratch. Distillation is not a training speedup, because it still requires training the larger model first, and then the smaller model. This increases overall training cost in order to reduce future inference cost.

Distillation is more technically involved than the commonly used method of training a new model on the output of another large model (sometimes called “dataset distillation” or “synthetic data”). That technique is not actually distillation in its proper sense. Rather, knowledge distillation algorithms involve a more complex transfer of learning from the internals of the large model into the small model.
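
Concretely, the “internal” knowledge being transferred is typically the teacher’s full probability distribution over outputs, not just its sampled prediction. The standard combined loss, written in the common notation of the literature (not notation from this book), is:

    \mathcal{L}_{\mathrm{KD}} = \alpha \,\mathrm{CE}\!\left(y, \sigma(z_s)\right) + (1 - \alpha)\, T^2 \,\mathrm{KL}\!\left(\sigma(z_t / T) \,\middle\|\, \sigma(z_s / T)\right)

where z_t and z_s are the teacher’s and student’s logit vectors, \sigma is the softmax function, T is the softening temperature, y is the ground-truth label, and \alpha blends the hard-label and soft-label terms.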

Recent advances in Knowledge Distillation include: (a) novel ways to directly transfer the learning, using weighting approaches rather than exact probability transfer; and (b) multi-model distillation approaches, whereby the smaller student model gains information from multiple teachers (a simple sketch of this idea follows).
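
As a simple illustration of point (b), one straightforward multi-teacher variant averages the teachers’ softened distributions and distills the student from that average. This is a hypothetical sketch, not a specific published algorithm; it appends to the earlier code example and reuses softmax_t from there.

    // Hypothetical multi-teacher variant (appends to the earlier sketch;
    // uses softmax_t defined above): average the teachers' softened
    // distributions, then distill from the ensemble average.
    std::vector<float> average_teachers(
            const std::vector<std::vector<float>>& teacher_logit_sets, float T) {
        size_t n_classes = teacher_logit_sets[0].size();
        std::vector<float> avg(n_classes, 0.0f);
        for (const auto& logits : teacher_logit_sets) {
            std::vector<float> p = softmax_t(logits, T);
            for (size_t i = 0; i < n_classes; ++i) avg[i] += p[i];
        }
        for (float& p : avg) p /= (float)teacher_logit_sets.size();
        return avg;  // use in place of the single teacher's p_t
    }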

 
