Aussie AI
Layer Reordering
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Layer Reordering
An interesting technique that generalizes the use of layers is “layer reordering.” The idea is motivated by the observation that Transformer layers are building blocks whose output has the same format as their input. Hence, not only can you remove a layer (early exit or layer pruning), skip a layer (layer skipping), or run the same layer twice (layer fusion), but the idea generalizes much further. You can pick and choose which layers to run, in what order, and how often. You could run every layer twice, run all the layers in reverse, or try any other sequence you like.
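As a concrete illustration, here is a minimal C++ sketch of the idea: because every layer consumes and produces activations of the same shape, the forward pass can be driven by an arbitrary "execution plan" of layer indices. The TransformerLayer type, the run_with_plan function, and the example plans are hypothetical placeholders for illustration only, not code from any particular inference engine.

    // Minimal sketch of layer reordering (hypothetical names and types).
    // A Transformer layer maps activations to activations of the same size,
    // so any sequence of layer indices is a valid execution plan.
    #include <cstddef>
    #include <vector>

    using Activations = std::vector<float>;

    struct TransformerLayer {
        // Placeholder: a real layer would run attention + FFN with its own weights.
        Activations forward(const Activations& x) const { return x; }
    };

    // Run the layers in whatever order the plan specifies: skip, repeat, or reverse.
    Activations run_with_plan(const std::vector<TransformerLayer>& layers,
                              const std::vector<std::size_t>& plan,
                              Activations x)
    {
        for (std::size_t idx : plan) {
            x = layers[idx].forward(x);  // same-shape input/output makes this legal
        }
        return x;
    }

    // Example plans for a 6-layer model:
    //   {0,1,2,3,4,5}  -- normal order
    //   {0,1,2,3}      -- early exit / layer pruning
    //   {0,0,1,1,2,2}  -- run each layer twice
    //   {5,4,3,2,1,0}  -- all layers in reverse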
Layer reordering usually refers to entire Transformer layers. For other types of merging or reordering of separate sub-layer structures within Transformer layers, see kernel operator fusion. For discussion of the order of layer normalization subcomponents, see normalization reordering.
Layer reordering seems like it shouldn't work. After all, didn't we expend all those GPU cycles to carefully work out the correct weights for each layer? Isn't it true that the first layers do the broad analysis and the upper layers do the finessing? So, early exiting makes some kind of sense, because it just skips the finer details at the end, but randomly reordering things seems weird. Nevertheless, there are some research papers that explore layer reordering and its generalizations.
Research papers on layer reordering:
- Ofir Press, Noah A. Smith, Omer Levy, 2019, Improving Transformer Models by Reordering their Sublayers, arXiv preprint arXiv:1911.03864, https://arxiv.org/abs/1911.03864 (Layer reordering! Includes analysis of multiple layers, and also reordering of the self-attention and feed-forward sub-components in a “sandwich” architecture; see the sketch after this list.)
- Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu, Mar 2021, IOT: Instance-wise Layer Reordering for Transformer Structures, https://arxiv.org/abs/2103.03457
- Elicia Ye, March 2023, Greedy Ordering of Layer Weight Matrices in Transformers Improves Translation, https://arxiv.org/abs/2302.02123
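The sublayer reordering explored by Press et al. can be sketched in the same style: treat each self-attention and feed-forward sublayer as an interchangeable block, and drive the forward pass from a pattern string such as "sfsfsf" (standard interleaving) or a "sandwich" arrangement with attention concentrated early and feed-forward late. The Sublayer type, the run_pattern function, and the patterns shown are hypothetical illustrations of the general idea, not the paper's exact architecture.

    // Hypothetical sketch of sublayer reordering ("sandwich" style), assuming
    // self-attention ('s') and feed-forward ('f') sublayers are interchangeable blocks.
    #include <cstddef>
    #include <string>
    #include <vector>

    using Activations = std::vector<float>;

    struct Sublayer {
        // Placeholder for either a self-attention or feed-forward sublayer.
        Activations forward(const Activations& x) const { return x; }
    };

    // Run sublayers in the order given by the pattern string, e.g.:
    //   "sfsfsfsfsfsf"  -- the standard interleaved 6-layer Transformer
    //   "ssssffsfsfff"  -- a reordered "sandwich"-style arrangement
    // Each sublayer keeps its own weights; only the execution order changes.
    Activations run_pattern(const std::string& pattern,
                            const std::vector<Sublayer>& attn,
                            const std::vector<Sublayer>& ffn,
                            Activations x)
    {
        std::size_t ai = 0, fi = 0;
        for (char c : pattern) {
            x = (c == 's') ? attn.at(ai++).forward(x) : ffn.at(fi++).forward(x);
        }
        return x;
    }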
For more research on layer reordering, refer to https://www.aussieai.com/research/layer-pruning#reordering.