Weight Clustering

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Weight clustering is conceptually like pruning and quantization combined, and is sometimes called “cluster-based quantization”. Similar weights are grouped into clusters, and every weight in a cluster is replaced by a single shared value (typically the cluster centroid), so all of the weights in a cluster become exactly the same. Hashing has also been used to group weights into clusters.
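
Below is a minimal C++ sketch of the basic idea, assuming a simple 1-D k-means clustering over a flat array of weights (the function name cluster_weights, the choice of k-means, and the parameters are illustrative only, not taken from any particular paper). Each weight is assigned to its nearest centroid, the centroids are re-estimated, and finally every weight is snapped to its cluster's shared value.

    // Illustrative 1-D k-means weight clustering (a sketch, not production code).
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Cluster the weights into k groups and snap each weight to its
    // cluster centroid, so every weight in a cluster shares one value.
    void cluster_weights(std::vector<float>& weights, int k, int iterations = 20)
    {
        if (weights.empty() || k <= 0) return;
        // Initialize centroids evenly across the range of weight values.
        float mn = *std::min_element(weights.begin(), weights.end());
        float mx = *std::max_element(weights.begin(), weights.end());
        std::vector<float> centroids(k);
        for (int c = 0; c < k; ++c)
            centroids[c] = mn + (mx - mn) * (c + 0.5f) / k;

        std::vector<int> assign(weights.size(), 0);
        for (int iter = 0; iter < iterations; ++iter) {
            // Assignment step: each weight goes to its nearest centroid.
            for (std::size_t i = 0; i < weights.size(); ++i) {
                int best_c = 0;
                float best_d = std::fabs(weights[i] - centroids[0]);
                for (int c = 1; c < k; ++c) {
                    float d = std::fabs(weights[i] - centroids[c]);
                    if (d < best_d) { best_d = d; best_c = c; }
                }
                assign[i] = best_c;
            }
            // Update step: recompute each centroid as the mean of its members.
            std::vector<float> sum(k, 0.0f);
            std::vector<int> count(k, 0);
            for (std::size_t i = 0; i < weights.size(); ++i) {
                sum[assign[i]] += weights[i];
                ++count[assign[i]];
            }
            for (int c = 0; c < k; ++c)
                if (count[c] > 0) centroids[c] = sum[c] / count[c];
        }
        // Replace each weight with its cluster's shared centroid value.
        for (std::size_t i = 0; i < weights.size(); ++i)
            weights[i] = centroids[assign[i]];
    }

    int main()
    {
        // Eight distinct weights collapse to three shared values.
        std::vector<float> w = { 0.11f, 0.09f, 0.52f, 0.48f, -0.31f, -0.29f, 0.10f, 0.50f };
        cluster_weights(w, 3);
        for (float x : w) std::printf("%.3f ", x);
        std::printf("\n");
        return 0;
    }

After clustering, only the k centroid values plus a small per-weight cluster index need to be stored or looked up, which is where the memory compression benefit of weight clustering comes from.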

Research papers on weight clustering:

  1. Shaokai Ye, Tianyun Zhang, Kaiqi Zhang, Jiayu Li, Jiaming Xie, Yun Liang, Sijia Liu, Xue Lin, Yanzhi Wang, 2018, A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM, November 2018, https://arxiv.org/abs/1811.01907
  2. Steven J. Nowlan; Geoffrey E. Hinton, 1992, Simplifying Neural Networks by Soft Weight-Sharing, Neural Computation, 4(4), July 1992, https://ieeexplore.ieee.org/abstract/document/6796174
  3. Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu, RPTQ: Reorder-based Post-training Quantization for Large Language Models, May 2023, https://arxiv.org/abs/2304.01089
  4. TensorFlow, 2023, Weight clustering, https://www.tensorflow.org/model_optimization/guide/clustering
  5. A. Zhou, A. Yao, Y. Guo, L. Xu and Y. Chen, 2017, Incremental network quantization: Towards lossless CNNs with low-precision weights, arXiv:1702.03044, 2017. https://arxiv.org/abs/1702.03044 (Groups large and small weights)
  6. W. Chen, J. T. Wilson, S. Tyree, K. Weinberger and Y. Chen, 2015, Compressing neural networks with the hashing trick, Proc. ICML, pp. 2285-2294, 2015. https://arxiv.org/abs/1504.04788 (Uses hashing to do weight clustering/grouping weights.)
  7. Maedeh Hemmat, Joshua San Miguel, Azadeh Davoodi, 2021, AirNN: A Featherweight Framework for Dynamic Input-Dependent Approximation of CNNs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.40, no.10, pp.2090-2103, 2021. https://ieeexplore.ieee.org/document/9239327 (Input-dependent inference optimization via layer-wise weight clustering and early exit based on a termination condition.)
  8. Maedeh Hemmat; Azadeh Davoodi, March 2019, Dynamic Reconfiguration of CNNs for Input-Dependent Approximation, 20th International Symposium on Quality Electronic Design (ISQED), https://ieeexplore.ieee.org/document/8697843 (Dynamically decides how many clusters of similar weights to use, depending on input.)
  9. B Rokh, A Azarpeyvand, A Khanteymoori, 2023, A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification, ACM Transactions on Intelligent Systems, PDF: https://dl.acm.org/doi/pdf/10.1145/3623402 (Includes a survey of weight clustering.)
  10. W Cheng, W Zhang, H Shen, Y Cai, X He, K Lv, 2023, Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs, arXiv preprint arXiv:2309.05516, PDF: https://arxiv.org/pdf/2309.05516.pdf (Examination of rounding schemes in PTQ and QAT for quantization and weight clustering.)

See more updated research paper citations in the Aussie AI literature review at https://www.aussieai.com/research/quantization#weight-clustering.
