Dataset Distillation

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The technique of “dataset distillation” borrows similar terminology, but is a different technique from knowledge distillation. The term refers to methods that reduce a training dataset to a smaller, derived set of synthetic training data, for example to (theoretically) sidestep privacy or copyright concerns over the original data. The distilled dataset is much smaller, yet can theoretically be used to train a similarly capable model.
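
As a rough illustration of the idea, the sketch below compresses a toy two-class, two-dimensional dataset into a single synthetic sample per class, then checks that a trivial nearest-prototype classifier built from those two samples still labels the original data correctly. Everything in it is illustrative (the data, the one-prototype-per-class "distillation", and the nearest-prototype check); published methods such as Wang et al. (cited below) instead optimize the synthetic samples directly, e.g. by gradient matching, but the goal is the same: a much smaller synthetic set that can train a comparably capable model.

    // Toy sketch of the dataset distillation idea: compress each class
    // to one synthetic prototype (here, simply the class mean). This is a
    // deliberately simplified stand-in for real methods, which optimize
    // the synthetic samples against model gradients or training trajectories.
    #include <cstdio>
    #include <vector>

    struct Sample { float x, y; int label; };

    int main() {
        // Full (original) training set: two 2-D classes.
        std::vector<Sample> full = {
            {0.9f, 1.1f, 0}, {1.2f, 0.8f, 0}, {1.0f, 1.0f, 0}, {0.8f, 1.2f, 0},
            {3.1f, 2.9f, 1}, {2.8f, 3.2f, 1}, {3.0f, 3.0f, 1}, {3.2f, 3.1f, 1},
        };

        // "Distill" to one synthetic sample per class (the class mean).
        const int num_classes = 2;
        std::vector<Sample> distilled(num_classes);
        std::vector<int> counts(num_classes, 0);
        for (const Sample& s : full) {
            distilled[s.label].x += s.x;
            distilled[s.label].y += s.y;
            distilled[s.label].label = s.label;
            counts[s.label]++;
        }
        for (int c = 0; c < num_classes; c++) {
            distilled[c].x /= counts[c];
            distilled[c].y /= counts[c];
        }

        // Check that the tiny distilled set still "trains" a usable model:
        // a nearest-prototype classifier over the two synthetic samples.
        int correct = 0;
        for (const Sample& s : full) {
            int best = 0;
            float best_dist = 1e30f;
            for (int c = 0; c < num_classes; c++) {
                float dx = s.x - distilled[c].x;
                float dy = s.y - distilled[c].y;
                float d = dx * dx + dy * dy;
                if (d < best_dist) { best_dist = d; best = c; }
            }
            if (best == s.label) correct++;
        }
        printf("Distilled set: %d samples (from %d originals)\n",
               (int)distilled.size(), (int)full.size());
        printf("Nearest-prototype accuracy on the full set: %d/%d\n",
               correct, (int)full.size());
        return 0;
    }

On this toy data the two synthetic prototypes classify all eight original points correctly, which captures the spirit of dataset distillation: the distilled set is a fraction of the size, yet retains the information needed for training.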

Research papers on dataset distillation:

  1. T. Wang, J.-Y. Zhu, A. Torralba, A. A. Efros, 2018, Dataset Distillation, arXiv:1811.10959, https://arxiv.org/abs/1811.10959
  2. R. Yu, S. Liu, X. Wang, 2023, Dataset Distillation: A Comprehensive Review, https://arxiv.org/abs/2301.07014
  3. O. Honovich, T. Scialom, O. Levy, T. Schick, Dec 2022, Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor, https://arxiv.org/abs/2212.09689, https://github.com/orhonovich/unnatural-instructions (Using a model to automatically create a training dataset, including automatically generating both the instructions and the responses.)

For additional research papers on dataset distillation, see https://www.aussieai.com/research/knowledge-distillation#dataset-distillation.

