Multi-Teacher Knowledge Distillation

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Multi-teacher knowledge distillation, also called ensemble knowledge distillation, generalizes the basic distillation algorithm from a single teacher-student pair to two or more teacher models training one student. Research suggests that distillation can be even more effective with multiple teacher models. Ensemble distillation is a type of “ensemble learning.”
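
To make the idea concrete, the following is a minimal C++ sketch of a multi-teacher distillation loss, assuming uniform averaging of the teachers. It is an illustration only, not code from a production training engine: the function names (softmax_t, cross_entropy, multi_teacher_loss) and the temperature and alpha parameters are assumptions for the example. Each teacher's logits are softened with a temperature, the softened distributions are averaged into one soft target, and the student is trained on a blend of that soft target and the usual hard-label loss.

    // Minimal sketch of a multi-teacher distillation loss (illustrative only).
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Softmax with temperature T: larger T gives a softer distribution.
    std::vector<float> softmax_t(const std::vector<float>& logits, float T) {
        std::vector<float> p(logits.size());
        float maxv = logits[0];
        for (float v : logits) if (v > maxv) maxv = v;
        float sum = 0.0f;
        for (std::size_t i = 0; i < logits.size(); ++i) {
            p[i] = std::exp((logits[i] - maxv) / T);
            sum += p[i];
        }
        for (float& v : p) v /= sum;
        return p;
    }

    // Cross-entropy of predicted probabilities against a target distribution.
    float cross_entropy(const std::vector<float>& target, const std::vector<float>& pred) {
        float loss = 0.0f;
        for (std::size_t i = 0; i < target.size(); ++i)
            loss -= target[i] * std::log(pred[i] + 1e-9f);
        return loss;
    }

    // Distillation loss for one example: alpha weights the soft (ensemble-teacher)
    // loss against the hard-label loss; T is the distillation temperature.
    float multi_teacher_loss(const std::vector<std::vector<float>>& teacher_logits,
                             const std::vector<float>& student_logits,
                             std::size_t true_label, float T, float alpha) {
        std::size_t nclasses = student_logits.size();

        // Average the teachers' softened distributions into one soft target.
        std::vector<float> soft_target(nclasses, 0.0f);
        for (const auto& logits : teacher_logits) {
            std::vector<float> p = softmax_t(logits, T);
            for (std::size_t i = 0; i < nclasses; ++i) soft_target[i] += p[i];
        }
        for (float& v : soft_target) v /= (float)teacher_logits.size();

        // Soft loss: student matches the ensemble target (scaled by T*T, as is standard).
        float soft_loss = cross_entropy(soft_target, softmax_t(student_logits, T)) * T * T;

        // Hard loss: ordinary cross-entropy against the ground-truth label.
        float hard_loss = -std::log(softmax_t(student_logits, 1.0f)[true_label] + 1e-9f);

        return alpha * soft_loss + (1.0f - alpha) * hard_loss;
    }

One obvious variation is to weight the teachers, for example by their accuracy, instead of averaging them uniformly.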

Various strategies have been employed, such as sequential versus parallel teaching, and model monitoring, where a teacher checks the student's results for correctness. There are many research papers, but whereas basic knowledge distillation is mainstream, ensemble distillation remains mostly a research technique.
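
As a rough sketch of the scheduling difference, the hypothetical helper below selects which teachers contribute at a given training step: parallel teaching passes every teacher's logits to the distillation loss at every step, while sequential teaching cycles through the teachers one phase at a time. The function name and the phase argument are assumptions for the illustration.

    // Hypothetical sketch: choose which teachers contribute at each training step.
    #include <cstddef>
    #include <vector>

    std::vector<std::vector<float>> select_teachers(
            const std::vector<std::vector<float>>& all_teacher_logits,
            bool parallel, std::size_t phase) {
        if (parallel) {
            return all_teacher_logits;  // parallel: every teacher at every step
        }
        // Sequential: only the teacher assigned to the current phase is used.
        std::size_t idx = phase % all_teacher_logits.size();
        return { all_teacher_logits[idx] };
    }

The selected logits would then feed a combined loss like the multi_teacher_loss sketch above.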

Research papers on ensemble distillation:

  1. Wenxian Shi, Yuxuan Song, Hao Zhou, Bohan Li, and Lei Li. 2021, Learning from deep model via exploring local targets. https://openreview.net/forum?id=5slGDu_bVc6 (Distillation with multiple teachers.)
  2. Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, and Hassan Ghasemzadeh. 2020, Improved knowledge distillation via teacher assistant, In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pp. 5191–5198. AAAI Press, 2020. https://arxiv.org/abs/1902.03393 (Multiple teachers.)
  3. Jangho Kim, Minsung Hyun, Inseop Chung, and Nojun Kwak. 2020, Feature fusion for online mutual knowledge distillation, In 25th International Conference on Pattern Recognition, ICPR 2020, Virtual Event / Milan, Italy, January 10-15, 2021, pp. 4619–4625. IEEE, 2020. https://arxiv.org/abs/1904.09058 (Ensemble methods for distillation.)
  4. Inseop Chung, Seonguk Park, Jangho Kim, and Nojun Kwak. 2020, Feature-map-level online adversarial knowledge distillation, In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pp. 2006–2015. PMLR, 2020. https://arxiv.org/abs/2002.01775 (Multiple teacher models.)
  5. Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng, and Chun Chen. 2020, Online knowledge distillation with diverse peers, In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pp. 3430–3437. AAAI Press, 2020. https://arxiv.org/abs/1912.00350 (Ensemble distillation with multiple “peer” teachers.)
  6. Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormándi, George E. Dahl, and Geoffrey E. Hinton. 2018, Large scale distributed neural network training through online distillation, In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. https://arxiv.org/abs/1804.03235
  7. Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi, 2021, Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher, arXiv preprint arXiv:2110.08532, 2021. https://arxiv.org/abs/2110.08532
  8. Y. Zhang, T. Xiang, T. M. Hospedales and H. Lu, 2018, Deep mutual learning, Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 4320-4328, Jun. 2018. https://arxiv.org/abs/1706.00384
  9. L. Yuan, F. E. Tay, G. Li, T. Wang and J. Feng, 2020, Revisiting knowledge distillation via label smoothing regularization, Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 3903-3911, Jun. 2020. https://arxiv.org/abs/1909.11723 (Improved learning, and also looks at reverse student-to-teacher learning.)

For more research papers on ensemble knowledge distillation, see https://www.aussieai.com/research/knowledge-distillation#ensemble.

 
