Aussie AI
Easy-vs-Hard Queries
Last Updated 11 October, 2024
by David Spuler, Ph.D.
What is Easy-vs-Hard Query Optimization? This optimization is based on the observation in LLM theory that some queries are "easy" to answer, whereas others are "hard." The obvious optimization is to send the easy queries to a small model, and to reserve the full computation of a large model for the "hard" queries.
This idea is a type of "adaptive inference," where the model performs different computations depending on the input. Some of the ways to do this include:
- Mixture-of-experts
- Cascades and other dynamic routing methods
- Big-little architectures
- Early exiting
- Speculative decoding
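As a concrete illustration of the routing idea, below is a minimal C++ sketch of a confidence-based cascade: the small model answers first, and the query is escalated to the large model only when the small model's confidence is low. The model calls, the ModelResult structure, and the 0.8 threshold are all hypothetical placeholders for illustration, not an implementation from any particular paper; a real system would invoke actual small and large LLM inference engines and use a calibrated difficulty or confidence estimate.

#include <iostream>
#include <string>

// Hypothetical result type: generated text plus a confidence score in [0,1].
struct ModelResult {
    std::string text;
    double confidence;
};

// Placeholder stubs -- a real system would run small and large LLM inference here.
ModelResult run_small_model(const std::string& query) {
    return { "small-model answer to: " + query, 0.6 };
}

ModelResult run_large_model(const std::string& query) {
    return { "large-model answer to: " + query, 0.95 };
}

// Cascade router: try the small model first, and only escalate to the
// large model when the small model's confidence falls below the threshold.
std::string answer_query(const std::string& query, double threshold = 0.8) {
    ModelResult small = run_small_model(query);
    if (small.confidence >= threshold) {
        return small.text;               // "easy" query: small model suffices
    }
    return run_large_model(query).text;  // "hard" query: full large-model computation
}

int main() {
    std::cout << answer_query("What is the capital of France?") << std::endl;
    return 0;
}

The threshold is the main tuning knob in such a cascade: raising it sends more queries to the large model (higher accuracy, higher cost), while lowering it saves more computation at the risk of accepting weaker small-model answers.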
Research on Easy-Hard Architectures. Various papers have examined the easy-versus-hard query distinction and related optimizations:
- Qingyuan Wang, Barry Cardiff, Antoine Frappé, Benoit Larras, Deepu John, 26 Mar 2024, Tiny Models are the Computational Saver for Large Models, https://arxiv.org/abs/2403.17726v1 (Chooses a tiny or small model after an initial layer of the larger model, combining early exit with easy-vs-hard query routing for multi-model inference.)
- Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng, 25 Jan 2024, Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing, https://arxiv.org/abs/2312.14472 (Dynamic routing based on easy vs hard queries to optimize training.)
- M Salehi, S Mehta, A Kusupati, A Farhadi, H Hajishirzi, 2023, SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks, https://arxiv.org/pdf/2310.12126.pdf (Directs queries to subnetworks with different widths.)
- Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe, 12 Jan 2024, The Unreasonable Effectiveness of Easy Training Data for Hard Tasks, https://arxiv.org/abs/2401.06751
- Avi Schwarzschild, October 2021, Easy-To-Hard, https://github.com/aks2203/easy-to-hard
- Y Wang, K Chen, H Tan, K Guo, 2023, Tabi: An Efficient Multi-Level Inference System for Large Language Models, EuroSys '23: Proceedings of the Eighteenth European Conference on Computer Systems, Rome, Italy May 2023, Pages 233–248, https://doi.org/10.1145/3552326.3587438, https://dl.acm.org/doi/10.1145/3552326.3587438, PDF: https://cse.hkust.edu.hk/~kaichen/papers/tabi-eurosys23.pdf (Dynamic routing to small or large LLMs based on the query.)
- David Spuler, March 2024, Chapter 50. Adaptive Inference, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Tianle Li, Wei-Lin Chiang, Lisa Dunlap, May 20, 2024, Introducing Hard Prompts Category in Chatbot Arena, https://lmsys.org/blog/2024-05-17-category-hard/
- Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou, 4 Jun 2024 (v2), Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems, https://arxiv.org/abs/2403.02419
- Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu, 17 May 2024, Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities, https://arxiv.org/abs/2405.10825
- Aakriti Agrawal, Mucong Ding, Zora Che, Chenghao Deng, Anirudh Satheesh, John Langford, Furong Huang, 6 Oct 2024, EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? https://arxiv.org/abs/2410.04571