Weight Clustering

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Weight clustering is conceptually like pruning and quantization combined, and is sometimes called “cluster-based quantization”. Similar weights are grouped into clusters, and every weight in a cluster is replaced by a single shared value (typically the cluster centroid), so all of the weights in a cluster become exactly the same. Hashing has also been used to group weights into clusters.
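
Below is a minimal C++ sketch of the basic idea, assuming a simple 1-D k-means clustering over a flat array of weights (the function name cluster_weights, the choice of k-means, and the parameters are illustrative only, not taken from any particular paper). Each weight is assigned to its nearest centroid, the centroids are re-estimated, and finally every weight is snapped to its cluster's shared value.

    // Illustrative 1-D k-means weight clustering (a sketch, not production code).
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Cluster the weights into k groups and snap each weight to its
    // cluster centroid, so every weight in a cluster shares one value.
    void cluster_weights(std::vector<float>& weights, int k, int iterations = 20)
    {
        if (weights.empty() || k <= 0) return;
        // Initialize centroids evenly across the range of weight values.
        float mn = *std::min_element(weights.begin(), weights.end());
        float mx = *std::max_element(weights.begin(), weights.end());
        std::vector<float> centroids(k);
        for (int c = 0; c < k; ++c)
            centroids[c] = mn + (mx - mn) * (c + 0.5f) / k;

        std::vector<int> assign(weights.size(), 0);
        for (int iter = 0; iter < iterations; ++iter) {
            // Assignment step: each weight goes to its nearest centroid.
            for (std::size_t i = 0; i < weights.size(); ++i) {
                int best_c = 0;
                float best_d = std::fabs(weights[i] - centroids[0]);
                for (int c = 1; c < k; ++c) {
                    float d = std::fabs(weights[i] - centroids[c]);
                    if (d < best_d) { best_d = d; best_c = c; }
                }
                assign[i] = best_c;
            }
            // Update step: recompute each centroid as the mean of its members.
            std::vector<float> sum(k, 0.0f);
            std::vector<int> count(k, 0);
            for (std::size_t i = 0; i < weights.size(); ++i) {
                sum[assign[i]] += weights[i];
                ++count[assign[i]];
            }
            for (int c = 0; c < k; ++c)
                if (count[c] > 0) centroids[c] = sum[c] / count[c];
        }
        // Replace each weight with its cluster's shared centroid value.
        for (std::size_t i = 0; i < weights.size(); ++i)
            weights[i] = centroids[assign[i]];
    }

    int main()
    {
        // Eight distinct weights collapse to three shared values.
        std::vector<float> w = { 0.11f, 0.09f, 0.52f, 0.48f, -0.31f, -0.29f, 0.10f, 0.50f };
        cluster_weights(w, 3);
        for (float x : w) std::printf("%.3f ", x);
        std::printf("\n");
        return 0;
    }

After clustering, only the k centroid values plus a small per-weight cluster index need to be stored or looked up, which is where the memory compression benefit of weight clustering comes from.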

Research papers on weight clustering:

  1. Shaokai Ye, Tianyun Zhang, Kaiqi Zhang, Jiayu Li, Jiaming Xie, Yun Liang, Sijia Liu, Xue Lin, Yanzhi Wang, 2018, A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM, November 2018, https://arxiv.org/abs/1811.01907
  2. Steven J. Nowlan; Geoffrey E. Hinton, 1992, Simplifying Neural Networks by Soft Weight-Sharing, Neural Computation, 4(4), July 1992, https://ieeexplore.ieee.org/abstract/document/6796174
  3. Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu, RPTQ: Reorder-based Post-training Quantization for Large Language Models, May 2023, https://arxiv.org/abs/2304.01089
  4. TensorFlow, 2023, Weight clustering, https://www.tensorflow.org/model_optimization/guide/clustering
  5. A. Zhou, A. Yao, Y. Guo, L. Xu and Y. Chen, 2017, Incremental network quantization: Towards lossless CNNs with low-precision weights, arXiv:1702.03044, 2017. https://arxiv.org/abs/1702.03044 (Groups large and small weights)
  6. W. Chen, J. T. Wilson, S. Tyree, K. Weinberger and Y. Chen, 2015, Compressing neural networks with the hashing trick, Proc. ICML, pp. 2285-2294, 2015. https://arxiv.org/abs/1504.04788 (Uses hashing to do weight clustering/grouping weights.)
  7. Maedeh Hemmat, Joshua San Miguel, Azadeh Davoodi, 2021, AirNN: A Featherweight Framework for Dynamic Input-Dependent Approximation of CNNs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.40, no.10, pp.2090-2103, 2021. https://ieeexplore.ieee.org/document/9239327 (Input-dependent inference optimization via layer-wise weight clustering and early exit based on a termination condition.)
  8. Maedeh Hemmat; Azadeh Davoodi, March 2019, Dynamic Reconfiguration of CNNs for Input-Dependent Approximation, 20th International Symposium on Quality Electronic Design (ISQED), https://ieeexplore.ieee.org/document/8697843 (Dynamically decides how many clusters of similar weights to use, depending on input.)
  9. B Rokh, A Azarpeyvand, A Khanteymoori, 2023, A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification, ACM Transactions on Intelligent Systems, PDF: https://dl.acm.org/doi/pdf/10.1145/3623402 (Includes a survey of weight clustering.)
  10. W Cheng, W Zhang, H Shen, Y Cai, X He, K Lv, 2023, Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs, arXiv preprint arXiv:2309.05516, PDF: https://arxiv.org/pdf/2309.05516.pdf (Examination of rounding schemes in PTQ and QAT for quantization and weight clustering.)

See more updated research paper citations in the Aussie AI literature review at https://www.aussieai.com/research/quantization#weight-clustering.
