Types of Early Exit
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Early exit is a form of dynamic layer pruning, since it skips (prunes) some of the model's layers. Early exit means avoiding the calculations for all layers after the just-finished one, whereas "layer skipping" can skip a single layer and continue with the following layer, and "layer reordering" is a strange generalization where layers can be executed or skipped in any order.
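To make the distinction concrete, here is a minimal C++ sketch of a layer loop with an early exit test. The Layer, LayerState, run_layers, and should_exit_early names are hypothetical placeholders for illustration, not code from any particular engine.

    #include <cstddef>
    #include <vector>

    struct LayerState { /* activations passed between layers */ };

    struct Layer {
        void forward(LayerState&) const { /* one layer's computations */ }
    };

    // Hypothetical exit decision (e.g. a confidence test, as discussed below).
    bool should_exit_early(const LayerState& state, std::size_t layer_index)
    {
        (void)state; (void)layer_index;
        return false;  // placeholder: real code would test exit criteria here
    }

    // Run the layer stack, optionally exiting before all layers are done.
    void run_layers(const std::vector<Layer>& layers, LayerState& state)
    {
        for (std::size_t i = 0; i < layers.size(); ++i) {
            layers[i].forward(state);  // compute this layer
            if (should_exit_early(state, i)) {
                break;  // early exit: skip ALL remaining layers
            }
            // Layer skipping would instead bypass one layer's forward() call
            // and continue with the next layer; layer reordering would permute
            // which layer runs on each loop iteration.
        }
    }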
There are different ways to do early exiting. Early exiting algorithms have to use a decision method, usually called a "classifier", to choose whether or not to exit at a given layer. Xu and McAuley (2022) categorize three different subtypes of early exits, based on the criteria used to decide when to exit:
- Confidence estimation
- Internal ensemble
- Learning to exit
Confidence estimation uses a metric to predict that confidence is high enough to exit; internal ensemble uses multiple metrics and requires enough of them to agree; learning to exit has the model itself attempting to learn when to exit.
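As a rough sketch of the first approach, a confidence-based exit test might take the logits produced by a small intermediate classifier head at the current layer and exit when the top softmax probability clears a threshold. The function name and the 0.9 default threshold below are illustrative assumptions, not a specific published method.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Confidence estimation: given the logits from a small intermediate
    // classifier head attached at the current layer, decide whether the top
    // prediction is confident enough to exit without running more layers.
    bool confident_enough_to_exit(const std::vector<float>& logits,
                                  float threshold = 0.9f)
    {
        if (logits.empty()) return false;

        // Probability of the top logit, via a numerically stable softmax.
        float max_logit = *std::max_element(logits.begin(), logits.end());
        float sum = 0.0f;
        for (float x : logits) sum += std::exp(x - max_logit);
        float top_prob = 1.0f / sum;  // exp(max - max) / sum == 1 / sum

        return top_prob >= threshold;  // exit early when confidence is high
    }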
A special type of early exit is the "shallow decoder" Transformer architecture. The same idea was also applied to training, many years prior. The terms "dropout" and "early stopping" have also occasionally been used to mean inference early exit, but they usually refer to training optimizations with the similar goal of reducing training time.
Always-exit test: Why not early exit on 100% of inference calculations? For starters, you get a 100% guarantee of lost accuracy, whereas varying the number of layers dynamically helps balance easy queries against hard ones. Furthermore, this strategy is not really early exit! Always exiting with a simplistic decision test, such as always exiting at layer N=5, is effectively the same as static layer pruning of layers N>=6, but without the benefit of reduced model storage space. However, implementing this always-exit test dynamically can still be useful when testing the efficacy of the model in terms of its layer count, such as when deciding how many layers to use: the accuracy of the model for different values of N can be measured dynamically without rebuilding the model file.
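As a minimal sketch of that testing idea, the layer loop can take a configurable cap on the number of layers, so that accuracy for different values of N can be measured against the same unmodified model file. The run_layers_capped function and the surrounding types are hypothetical.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct LayerState { /* activations passed between layers */ };
    struct Layer {
        void forward(LayerState&) const { /* one layer's computations */ }
    };

    // "Always exit" at a fixed layer count: equivalent to static pruning of
    // the later layers, but done dynamically, leaving the model file unchanged.
    void run_layers_capped(const std::vector<Layer>& layers,
                           LayerState& state,
                           std::size_t max_layers)
    {
        std::size_t n = std::min(max_layers, layers.size());
        for (std::size_t i = 0; i < n; ++i) {
            layers[i].forward(state);
        }
    }

    // Usage idea: sweep max_layers from 1 up to layers.size(), measuring
    // accuracy on a validation set at each setting, to help pick a layer count.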
Early exit is also one of multiple strategies for "dynamic inference". Some papers describe models with dynamic inference as "adaptive neural networks", because they change their execution path depending on the inputs. Some types of early exit, such as hierarchical early exit, are similar to research on cascades for DNNs and CNNs.