Dataset Distillation
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Dataset Distillation
The technique of “dataset distillation” borrows similar terminology, but is a different technique from knowledge distillation. The term refers to methods that reduce a training dataset to a smaller, derived set of training data, which can (theoretically) sidestep privacy or copyright concerns over the original data. The distilled dataset is much smaller, but in theory can still be used to train a similarly capable model.
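As a rough illustration of the idea, the sketch below distills a toy regression dataset in plain C++ using gradient matching: a handful of learnable synthetic points are optimized so that the loss gradient they induce on a simple linear model matches the gradient induced by the full real dataset, after which a fresh model trained only on the synthetic points recovers roughly the same parameters. This is a simplified stand-in for the bi-level optimization used in the research papers below, and every constant and name in it is an illustrative assumption, not a reference implementation.

// Minimal sketch of dataset distillation via gradient matching on a toy
// linear-regression model (y = w*x + b with MSE loss). Illustrative only:
// the synthetic points are optimized so that the loss gradient they produce
// matches the gradient produced by the full real dataset across randomly
// sampled model parameters. All constants here are hypothetical choices.
#include <cstdio>
#include <vector>
#include <random>

struct Grad { double gw, gb; };

// Gradient of the MSE loss for the linear model over a dataset.
static Grad loss_grad(const std::vector<double>& x, const std::vector<double>& y,
                      double w, double b) {
    Grad g{0.0, 0.0};
    int n = (int)x.size();
    for (int i = 0; i < n; i++) {
        double err = w * x[i] + b - y[i];
        g.gw += 2.0 * err * x[i] / n;
        g.gb += 2.0 * err / n;
    }
    return g;
}

int main() {
    std::mt19937 rng(42);
    std::normal_distribution<double> noise(0.0, 0.1);
    std::uniform_real_distribution<double> uni(-1.0, 1.0);

    // Real dataset: 1000 noisy points from y = 3x + 1.
    const int N = 1000;
    std::vector<double> xr(N), yr(N);
    for (int i = 0; i < N; i++) {
        xr[i] = uni(rng);
        yr[i] = 3.0 * xr[i] + 1.0 + noise(rng);
    }

    // Distilled dataset: only 4 learnable synthetic points.
    const int M = 4;
    std::vector<double> xs(M), ys(M);
    for (int j = 0; j < M; j++) { xs[j] = uni(rng); ys[j] = uni(rng); }

    // Distillation: make the synthetic gradient match the real gradient
    // for many randomly sampled model parameters (w, b).
    const double lr = 0.02;
    const int K = 8;  // parameter samples averaged per distillation step
    for (int step = 0; step < 3000; step++) {
        std::vector<double> dx(M, 0.0), dy(M, 0.0);
        for (int k = 0; k < K; k++) {
            double w = 2.0 * uni(rng), b = 2.0 * uni(rng);
            Grad gr = loss_grad(xr, yr, w, b);   // real-data gradient
            Grad gs = loss_grad(xs, ys, w, b);   // synthetic-data gradient
            double dw = gs.gw - gr.gw, db = gs.gb - gr.gb;
            for (int j = 0; j < M; j++) {
                // Analytic gradient of the matching loss dw^2 + db^2
                // with respect to each synthetic point (x_j, y_j).
                dx[j] += (4.0 / M) * (dw * (2.0 * w * xs[j] + b - ys[j]) + db * w);
                dy[j] += (4.0 / M) * (-dw * xs[j] - db);
            }
        }
        for (int j = 0; j < M; j++) {
            xs[j] -= lr * dx[j] / K;
            ys[j] -= lr * dy[j] / K;
        }
    }

    // Train a fresh model on the 4 distilled points only.
    double w = 0.0, b = 0.0;
    for (int step = 0; step < 2000; step++) {
        Grad g = loss_grad(xs, ys, w, b);
        w -= 0.1 * g.gw;
        b -= 0.1 * g.gb;
    }
    printf("Model trained on %d distilled points: w=%.2f b=%.2f (target: w=3, b=1)\n",
           M, w, b);
    return 0;
}

For this linear model the loss gradient depends only on a few moments of the data (the means of x, x squared, y, and xy), so matching gradients amounts to matching those moments, which is why four synthetic points are enough here; real dataset distillation for neural networks optimizes synthetic examples against far richer gradient or training-trajectory signals.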
Research papers on dataset distillation:
- T. Wang, J.-Y. Zhu, A. Torralba and A. A. Efros, 2018, Dataset Distillation, arXiv:1811.10959, https://arxiv.org/abs/1811.10959
- R. Yu, S. Liu, X. Wang, 2023, Dataset Distillation: A Comprehensive Review, https://arxiv.org/abs/2301.07014
- Or Honovich, Thomas Scialom, Omer Levy, Timo Schick, Dec 2022, Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor, https://arxiv.org/abs/2212.09689, https://github.com/orhonovich/unnatural-instructions (Using a model to automatically create a training dataset, including automatically creating both the instructions and the responses.)
For additional research papers on dataset distillation, see https://www.aussieai.com/research/knowledge-distillation#dataset-distillation.