Aussie AI
Neural Architecture Search
-
Last Updated 7 December, 2024
-
by David Spuler, Ph.D.
Neural Architecture Search (NAS) is the fancy way that AI researchers ask questions like these: how big should I make the model? How many weights? How many layers? What vocabulary size?
Why NAS?
Choosing these numbers is actually a very hard problem. In the early days, these choices were made either randomly or by trial-and-error, which is expensive when you're talking about GPUs. If you go too large, the model is over-parameterized and unnecessarily expensive. Go too small, and the model won't be very accurate, or might not even work at all. Hence, a large body of research on "NAS" has developed around systematic ways to find the optimal size of a model along its various dimensions.
The biggest number is how many billions of weights the model should use, but this is actually dependent on a number of other numeric sizes. The weights are called "parameters" and the various other sizes are called "hyper-parameters" of the model, so NAS is also sometimes called "Hyper-Parameter Optimization" (HPO). The sizes and dimensions of models that NAS aims to determine include (see the simple search sketch after this list):
- Number of layers
- Embedding size
- Vocabulary size
- Number of attention heads
- Context size
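To make this concrete, here is a minimal sketch of the simplest possible NAS strategy: plain random search over a hand-specified search space of these hyperparameters. It is illustrative only; the candidate values, the estimate_params formula, and the proxy_score objective are assumptions made up for this example, and a real NAS run would replace the proxy with (partial) training runs or a zero-shot proxy metric.

    import random

    # Hypothetical search space over the hyperparameters listed above.
    SEARCH_SPACE = {
        "num_layers": [6, 12, 24, 32],
        "embedding_size": [512, 1024, 2048, 4096],
        "vocab_size": [32000, 50000, 100000],
        "num_heads": [8, 16, 32],
        "context_size": [2048, 4096, 8192],
    }

    def estimate_params(cfg):
        # Rough parameter count: embedding matrix plus roughly
        # 12 * d_model^2 weights per Transformer layer.
        d = cfg["embedding_size"]
        return cfg["vocab_size"] * d + cfg["num_layers"] * 12 * d * d

    def proxy_score(cfg):
        # Placeholder objective; a real run would train the candidate or
        # use a zero-shot proxy. Here we simply prefer smaller models.
        return -estimate_params(cfg)

    def random_search(trials=20, seed=0):
        rng = random.Random(seed)
        best_cfg, best_score = None, float("-inf")
        for _ in range(trials):
            cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
            score = proxy_score(cfg)
            if score > best_score:
                best_cfg, best_score = cfg, score
        return best_cfg

    if __name__ == "__main__":
        print(random_search())

Real NAS methods are far more sophisticated (evolutionary search, reinforcement learning, differentiable NAS, Bayesian optimization), but they share this basic shape: define a search space, evaluate candidates as cheaply as possible, and keep the best one.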
NAS versus Model Compression
There are some parallels between neural architecture search and model compression, especially structural pruning. NAS aims to select the model hyperparameters before or during training, whereas model compression comes in afterwards and changes the model. Some types of pruning are very similar to NAS outcomes, such as:
- Depth pruning (e.g. layer pruning)
- Width pruning (e.g. head pruning)
- Length pruning (e.g. token pruning, embedding pruning)
As an example, any type of layer pruning is very similar to NAS choosing the number of layers. If you train your model with a layer count chosen via NAS, and then subsequently prune away some of those layers, the end result is much the same as if NAS had chosen a smaller number of layers, as the sketch below illustrates. Of course, that's only true for static layer pruning, whereas dynamic layer pruning such as early exiting has other runtime effects.
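Here is a minimal PyTorch sketch of that equivalence, assuming for illustration that the "model" is just a stack of identical Transformer encoder blocks (ignoring embeddings, the output head, and the fact that the pruned model keeps its trained weights while the NAS-chosen model would be trained from scratch):

    import torch.nn as nn

    def build_model(num_layers, d_model=512):
        # Stand-in for a Transformer: a stack of identical encoder blocks.
        return nn.Sequential(*[
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(num_layers)
        ])

    def prune_layers(model, keep):
        # Static depth pruning: keep only the first `keep` layers.
        return nn.Sequential(*list(model.children())[:keep])

    # A 12-layer model chosen by NAS, later statically pruned to 9 layers,
    # has the same architecture as if NAS had chosen 9 layers to begin with.
    model_12 = build_model(12)
    pruned_9 = prune_layers(model_12, 9)
    nas_9 = build_model(9)
    assert len(list(pruned_9.children())) == len(list(nas_9.children()))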
Survey Papers on NAS
Some of the review and survey papers on NAS:
- Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Xiaojiang Chen, Xin Wang, 2022, A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions. ACM Computing Surveys 54(4):76:1–76:34, https://arxiv.org/abs/2006.02903
- Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, Smail Niar, Martin Wistuba, Naigang Wang, 2021, Hardware-Aware Neural Architecture Search: Survey and Taxonomy. In: International Joint Conference on Artificial Intelligence (IJCAI), https://arxiv.org/abs/2101.09336
- Dilyara Baymurzina, Eugene Golikov, Mikhail Burtsev, 2022, A review of neural architecture search, Neurocomputing, Volume 474, 14 February 2022, Pages 82-93, https://www.sciencedirect.com/science/article/abs/pii/S0925231221018439
- Thomas Elsken, Jan Hendrik Metzen, Frank Hutter, 2019, Neural architecture search: a survey, The Journal of Machine Learning Research, Volume 20, Issue 1, pp. 1997–2017, https://dl.acm.org/doi/10.5555/3322706.3361996, https://arxiv.org/abs/1808.05377
- Martin Wistuba, Ambrish Rawat, Tejaswini Pedapati, 2019, A Survey on Neural Architecture Search, https://arxiv.org/abs/1905.01392
- Shiqing Liu, Haoyu Zhang, Yaochu Jin, Oct 2022, A Survey on Computationally Efficient Neural Architecture Search, https://arxiv.org/abs/2206.01520
- Colin White, Mahmoud Safari, Rhea Sukthanker, Binxin Ru, Thomas Elsken, Arber Zela, Debadeepta Dey, Frank Hutter, Jan 2023, Neural Architecture Search: Insights from 1000 Papers, https://arxiv.org/abs/2301.08727
- Bernd Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, Marc Becker, Anne-Laure Boulesteix, Difan Deng, Marius Lindauer, Nov 2021, Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges, https://arxiv.org/abs/2107.05847
- Inas Bachiri, September 2024, A Literature Review on Combining Neural Architecture Search and Compiler Optimizations for Neural Network Acceleration, DOI:10.13140/RG.2.2.10612.16009, Thesis for: Master in Computer Science, https://www.researchgate.net/publication/384190836_A_Literature_Review_on_Combining_Neural_Architecture_Search_and_Compiler_Optimizations_for_Neural_Network_Acceleration https://www.researchgate.net/profile/Inas-Bachiri/publication/384190836_A_Literature_Review_on_Combining_Neural_Architecture_Search_and_Compiler_Optimizations_for_Neural_Network_Acceleration/links/66ed912c6b101f6fa4f3d6ce/A-Literature-Review-on-Combining-Neural-Architecture-Search-and-Compiler-Optimizations-for-Neural-Network-Acceleration.pdf
General Research Papers on NAS
Some of the research papers on NAS:
- Odema, M., Rashid, N., Demirel, B. U., and Faruque, M. A. A. (2021). Lens: Layer distribution enabled neural architecture search in edge-cloud hierarchies. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 403–408, https://arxiv.org/abs/2107.09309
- A. Wong, M. Famuori, M. J. Shafiee, F. Li, B. Chwyl, and J. Chung, “YOLO nano: A highly compact you only look once convolutional neural network for object detection,” 2019, arXiv:1910.01271. https://arxiv.org/abs/1910.01271
- David R So, Chen Liang, and Quoc V Le. 2019. The evolved transformer. arXiv preprint arXiv:1901.11117. https://arxiv.org/abs/1901.11117
- Mingxing Tan and Quoc V Le. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946, https://arxiv.org/abs/1905.11946, Code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
- Guihong Li, Duc Hoang, Kartikeya Bhardwaj, Ming Lin, Zhangyang Wang, Radu Marculescu, July 2023, Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities https://arxiv.org/abs/2307.01998
- C Fu, 2023, Machine Learning Algorithm and System Co-design for Hardware Efficiency, Ph.D. thesis, Computer Science, University of California San Diego, https://escholarship.org/content/qt52q368p3/qt52q368p3.pdf
- Aaron Klein, Jacek Golebiowski, Xingchen Ma, Valerio Perrone, Cedric Archambeau, 3 May 2024, Structural Pruning of Pre-trained Language Models via Neural Architecture Search, https://arxiv.org/abs/2405.02267 (Post-training structured pruning of sub-networks based on NAS, also with weight sharing and several different focus areas of pruning including attention heads, FFNs, and layers.)
- Ganesh Jawahar, April 2024, Methods for design of efficient on-device natural language processing architectures, Ph.D. thesis, Computer Science, The University of British Columbia (Vancouver) https://open.library.ubc.ca/media/download/pdf/24/1.0441384/4
- Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Yui Li, Wen-Huang Cheng, 8 Apr 2024, Lightweight Deep Learning for Resource-Constrained Environments: A Survey, https://arxiv.org/abs/2404.07236 (A survey of various optimizations, with a lot of focus on image and vision models, including CNNs, RNNs, and Transformers.)
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey (Broad survey with many optimizations including this topic.)
- You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying WEI, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor)Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
- Chu, X.; Zhang, B.; Xu, R. FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 12219–12228. http://dx.doi.org/10.1109/ICCV48922.2021.01202 https://arxiv.org/abs/1907.01845 (NAS in the context of weight sharing architectures.)
- Krishna Teja Chitty-Venkata, Sparsh Mittal, Murali Emani, Venkatram Vishwanath, Arun K. Somani, 2023, A Survey of Techniques for Optimizing Transformer Inference, https://arxiv.org/abs/2307.07982
- Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q. V. (2019). Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://openaccess.thecvf.com/content_CVPR_2019/html/Tan_MnasNet_Platform_Aware_Neural_Architecture_Search_for_Mobile_CVPR_2019_paper.html.
- David Spuler, March 2024, Chapter 56. Neural Architecture Search, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
- kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
- Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra, 27 Jun 2024 (v2), MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases, Meta Research, https://arxiv.org/abs/2402.14905 Code: https://github.com/facebookresearch/MobileLLM
- Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He, 15 Feb 2024, Model Compression and Efficient Inference for Large Language Models: A Survey, https://arxiv.org/abs/2402.09748
- Matias Martinez, 2 Aug 2024, The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines, https://arxiv.org/abs/2408.01050
- Szabolcs Cséfalvay, James Imber, 31 Jan 2023 (v2), Self-Compressing Neural Networks, https://arxiv.org/abs/2301.13142
- Ye Qiao, Haocheng Xu, Yifan Zhang, Sitao Huang, 26 Aug 2024, MONAS: Efficient Zero-Shot Neural Architecture Search for MCUs, https://arxiv.org/abs/2408.15034
- Fred Hohman, Chaoqun Wang, Jinmook Lee, Jochen Görtler, Dominik Moritz, Jeffrey P Bigham, Zhile Ren, Cecile Foret, Qi Shan, and Xiaoyi Zhang. 2024. Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24). Association for Computing Machinery, New York, NY, USA, Article 648, 1–19. https://doi.org/10.1145/3613904.3642628 https://dl.acm.org/doi/full/10.1145/3613904.3642628
- Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi, 9 Aug 2024 (v2), A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations, IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2024.3447085, https://arxiv.org/abs/2308.06767 https://ieeexplore.ieee.org/abstract/document/10643325
- Jiantong Jiang, Ajmal Mian, 1 Sep 2024, FastBO: Fast HPO and NAS with Adaptive Fidelity Identification, https://arxiv.org/abs/2409.00584
- Hung-Yueh Chiang, Diana Marculescu, 27 Aug 2024, SCAN-Edge: Finding MobileNet-speed Hybrid Networks for Diverse Edge Devices via Hardware-Aware Evolutionary Search, https://arxiv.org/abs/2408.15395
- Junfeng Gong, Cheng Liu, Long Cheng, Huawei Li, Xiaowei Li, 17 Jul 2024, MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs, https://arxiv.org/abs/2407.18267
- Xuan Shen, Pu Zhao, Yifan Gong, Zhenglun Kong, Zheng Zhan, Yushu Wu, Ming Lin, Chao Wu, Xue Lin, Yanzhi Wang, 25 Sep 2024, Search for Efficient Large Language Models, https://arxiv.org/abs/2409.17372 (Looking for subnets inside models as an alternative to NAS.)
- Rhea Sanjay Sukthanker, Benedikt Staffler, Frank Hutter, Aaron Klein, 9 Oct 2024, LLM Compression with Neural Architecture Search, https://arxiv.org/abs/2410.06479 (NAS with width/attention head and layer pruning.)
- Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong, 1 Nov 2024 (v3), Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies, https://arxiv.org/abs/2407.13623 https://github.com/sail-sg/scaling-with-vocab https://hf.co/spaces/sail/scaling-with-vocab-demo
This is not the full list of papers, I can say with reasonable certainty, given that one survey paper stated there have been over 1,000 papers written on NAS since 2021. If this is your chosen dissertation topic, you'd better start writing that lit review section early!
NAS and Dynamic Inference Optimization
Dynamic NAS is not yet a mainstream application of NAS. Traditionally, NAS has been applied to finding static model architectures, without regard to dynamic inference approaches. An emerging area of research is to include the hyperparameters of dynamic inference optimizations (such as early-exit thresholds) as part of searching the problem space for an optimal model.
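The flavor of this is shown in the toy sketch below, where a dynamic inference hyperparameter (an early-exit confidence threshold) is searched jointly with static architecture choices under an inference-cost budget. The search space, cost model, accuracy proxy, and budget are all illustrative assumptions, not taken from any of the papers listed below.

    import random

    # Hypothetical joint search space: static hyperparameters plus a
    # dynamic-inference hyperparameter (early-exit confidence threshold).
    SEARCH_SPACE = {
        "num_layers": [12, 24, 32],
        "embedding_size": [1024, 2048],
        "exit_threshold": [0.7, 0.8, 0.9, 0.95],
    }

    def expected_cost(cfg):
        # Toy cost model: a higher exit threshold means fewer early exits,
        # so more layers are executed on average.
        avg_layers = cfg["num_layers"] * cfg["exit_threshold"]
        return avg_layers * cfg["embedding_size"] ** 2

    def accuracy_proxy(cfg):
        # Placeholder for a real evaluation (partial training or a
        # zero-shot proxy metric).
        return cfg["num_layers"] * cfg["embedding_size"] * cfg["exit_threshold"]

    def search(trials=50, cost_budget=5e7, seed=0):
        rng = random.Random(seed)
        best_cfg, best_acc = None, float("-inf")
        for _ in range(trials):
            cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
            if expected_cost(cfg) > cost_budget:
                continue  # reject candidates over the inference-cost budget
            acc = accuracy_proxy(cfg)
            if acc > best_acc:
                best_cfg, best_acc = cfg, acc
        return best_cfg

    if __name__ == "__main__":
        print(search())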
Research papers on "dynamic NAS" include:
- Matteo Gambella, Manuel Roveri, "EDANAS: Adaptive Neural Architecture Search for Early Exit Neural Networks", 2023 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2023. https://ieeexplore.ieee.org/document/10191876 (NAS applied to early-exit dynamic inference.)
- Chakkrit Termritthikun, Yeshi Jamtsho, Jirarat Ieamsaard, Paisarn Muneesawang, Ivan Lee, 2021, EEEA-Net: An Early Exit Evolutionary Neural Architecture Search, Engineering Applications of Artificial Intelligence Volume 104, September 2021, 104397, https://www.sciencedirect.com/science/article/abs/pii/S0952197621002451, https://arxiv.org/abs/2108.06156, Code: https://github.com/chakkritte/EEEA-Net (A 2021 paper on NAS applied to early-exit.)
- KT Chitty-Venkata, Y Bian, M Emani, V Vishwanath, Jan 2023, Differentiable Neural Architecture, Mixed Precision and Accelerator Co-search, IEEE Access, DOI:10.1109/ACCESS.2023.3320133, PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10266308
- Linnan Wang, Chenhan Yu, Satish Salian, Slawomir Kierat, Szymon Migacz, Alex Fit Florea, 2022, GPUNet: Searching the Deployable Convolution Neural Networks for GPUs, https://arxiv.org/abs/2205.00841 (A general NAS system that could be applied statically or dynamically.)
- Matteo Gambella, Jary Pomponi, Simone Scardapane, Manuel Roveri, 24 Jan 2024, NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks, https://arxiv.org/abs/2401.13330
- David Spuler, March 2024, Chapter 56. Neural Architecture Search, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Angie Boggust, Venkatesh Sivaraman, Yannick Assogba, Donghao Ren, Dominik Moritz, Fred Hohman, 6 Aug 2024, Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments, https://arxiv.org/abs/2408.03274
- Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi, 9 Aug 2024 (v2), A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations, IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2024.3447085, https://arxiv.org/abs/2308.06767 https://ieeexplore.ieee.org/abstract/document/10643325
- Akhiad Bercovich, Tomer Ronen, Talor Abramovich, Nir Ailon, Nave Assaf, Mohammad Dabbah, Ido Galil, Amnon Geifman, Yonatan Geifman, Izhak Golan, Netanel Haber, Ehud Karpas, Itay Levy, Shahar Mor, Zach Moshe, Najeeb Nabwani, Omri Puny, Ran Rubin, Itamar Schen, Ido Shahaf, Oren Tropp, Omer Ullman Argov, Ran Zilberstein, Ran El-Yaniv, 28 Nov 2024, Puzzle: Distillation-Based NAS for Inference-Optimized LLMs, NVIDIA Research, https://arxiv.org/abs/2411.19146 (This is dynamic NAS on a vast scale in a search space of size 10^138, because the optimization is applied with low granularity to each block in attention and FFN subcomponents of each layer.)
- Armin W. Thomas, Rom Parnichkun, Alexander Amini, Stefano Massaroli and Michael Poli, Liquid AI, December 2, 2024, Automated Architecture Synthesis via Targeted Evolution, https://arxiv.org/abs/2411.17800 https://www.liquid.ai/research/automated-architecture-synthesis-via-targeted-evolution