Aussie AI

Cascades

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.

Cascades are a type of model inference optimization where execution flows down through a “cascade” of sub-structures, with the routing sequence depending on the inputs. This optimization mainly relates to early types of neural networks (i.e. DNNs and CNNs), rather than Transformer model architectures.

Cascade optimization is similar to “dynamic routing”, early exiting (especially “hierarchical early-exit”), and dynamic structural pruning (e.g. filter pruning, channel pruning, width pruning). The general class of algorithms is dynamic inference optimization (also called “adaptive inference”), where the model's execution path is changed dynamically, depending on the inputs.
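One common way to make the execution path depend on the input is a confidence test on intermediate outputs: stop if the result already looks decisive, otherwise keep computing. The following is a minimal sketch of such a test, assuming a softmax-confidence rule of the kind used in several of the early-exit papers listed below. The names `softmax` and `confident_enough` and the threshold parameter are hypothetical, chosen for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: subtract the largest logit before exponentiating.
std::vector<float> softmax(const std::vector<float>& logits) {
    assert(!logits.empty());
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float& p : probs) {
        p /= sum;
    }
    return probs;
}

// Hypothetical exit rule (an assumption, not a method from the text):
// the input counts as "easy" if the top softmax probability reaches the threshold.
bool confident_enough(const std::vector<float>& logits, float threshold) {
    const std::vector<float> probs = softmax(logits);
    return *std::max_element(probs.begin(), probs.end()) >= threshold;
}
```

With a threshold of 0.9, logits such as `{10, 0, 0}` pass the test and would stop early, while uniform logits such as `{1, 1, 1}` fail it and would continue down the execution path. The threshold trades accuracy against compute and would normally be tuned on validation data.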

The basic cascade architecture is not an ensemble architecture, but simply dynamic inference through a single model. However, the idea can be generalized to multiple paths through multiple models, where the routing is either an AI heuristic or simply a matter of job scheduling in a deployment architecture. Cascades for DNNs and CNNs have received less research attention since the rise of Transformers, but the literature is still substantial.
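The multi-model generalization can be sketched as a list of models ordered from cheapest to most expensive, where each input falls through to the next model only if the current one is not confident. This is an illustrative sketch, not an implementation from any of the papers below. The types `Prediction`, `Model`, and `CascadeResult`, the function `run_cascade`, and the assumption that each model reports its own confidence score are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model output: a class label plus a confidence in [0, 1].
struct Prediction {
    int label;
    float confidence;
};

// Any callable mapping an input vector to a Prediction can act as a stage.
using Model = std::function<Prediction(const std::vector<float>&)>;

struct CascadeResult {
    Prediction prediction;   // answer from the stage that stopped the cascade
    std::size_t stage_used;  // index of that stage (0 = cheapest)
};

// Run the stages in order and stop at the first one that is confident enough.
// The last stage's answer is always accepted, whatever its confidence.
CascadeResult run_cascade(const std::vector<Model>& stages,
                          const std::vector<float>& input,
                          float threshold) {
    assert(!stages.empty());
    CascadeResult result{};
    for (std::size_t i = 0; i < stages.size(); ++i) {
        result.prediction = stages[i](input);
        result.stage_used = i;
        if (result.prediction.confidence >= threshold) {
            break;  // easy input: skip the more expensive stages
        }
    }
    return result;
}
```

In use, the `stages` vector might hold a small model followed by a large one, so that only inputs the small model is unsure about pay for the large model. The same loop could route between sub-structures of a single model or between separately deployed models; only the contents of `stages` would change.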

Research papers on cascade optimizations:

P. Panda, A. Sengupta, and K. Roy, 2016, Conditional deep learning for energy-efficient and enhanced pattern recognition, in Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2016. https://arxiv.org/abs/1509.08971
Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris, 2023, MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge, 2023 IEEE Symposium on Computers and Communications (ISCC), pp.411-416, 2023. https://ieeexplore.ieee.org/document/10217872
Oihane Gómez-Carmona, Diego Casado-Mansilla, Diego López-de-Ipiña, Javier García-Zubia, 2022, Optimizing Computational Resources for Edge Intelligence Through Model Cascade Strategies, IEEE Internet of Things Journal, vol.9, no.10, pp.7404-7417, 2022. https://ieeexplore.ieee.org/document/9564246
Sam Leroux, Steven Bohez, Elias De Coninck, Tim Verbelen, Bert Vankeirsbilck, Pieter Simoens, Bart Dhoedt, 2017, The cascading neural network: building the Internet of Smart Things, Knowledge and Information Systems, 2017. https://doi.org/10.1007/s10115-017-1029-1
Wang, X., Luo, Y., Crankshaw, D., Tumanov, A., Yu, F., and Gonzalez, J. E., 2018, IDK cascades: Fast deep learning by learning not to overthink, https://arxiv.org/abs/1706.00885
Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, and Alexander J. Smola. 2020. Transformer on a Diet, arXiv e-prints (2020), arXiv:2002.06170. https://arxiv.org/abs/2002.06170
Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. 2017, Fractalnet: Ultra-deep neural networks without residuals, In ICLR, 2017 https://arxiv.org/abs/1605.07648 (Not cascades, but similar conceptually.)
H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua. 2015, A convolutional neural network cascade for face detection, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5325–5334, 2015. https://ieeexplore.ieee.org/document/7299170
Y. Sun, X. Wang, and X. Tang. 2013, Deep convolutional network cascade for facial point detection, In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3476–3483. IEEE, 2013. https://ieeexplore.ieee.org/document/6619290
Thomas Dean, Mark A Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, and Jay Yagnik. 2013. Fast, accurate detection of 100,000 object classes on a single machine, In Proc. CVPR. https://web.stanford.edu/class/cs231m/references/hashing-dpm.pdf
A. Kouris, S. I. Venieris, C. Bouganis, 2018, Cascade CNN: Pushing the performance limits of quantisation in convolutional neural networks, in: 2018 28th International Conference on Field Programmable Logic and Applications (FPL), 2018, pp. 155–1557. doi:10.1109/FPL.2018.00034. http://dx.doi.org/10.1109/FPL.2018.00034
A. Kouris, S. Venieris, C.-S. Bouganis, 2019, A throughput-latency co-optimised cascade of convolutional neural network classifiers, IEEE, 2019. http://hdl.handle.net/10044/1/75445
E. S. Marquez, J. S. Hare, M. Niranjan, 2018, Deep cascade learning, IEEE Transactions on Neural Networks and Learning Systems 29 (11) (2018) 5475–5485. doi:10.1109/TNNLS.2018.2805098. http://dx.doi.org/10.1109/TNNLS.2018.2805098
Berestizshevsky, K., Even, G., 2019, Dynamically sacrificing accuracy for reduced computation: Cascaded inference based on softmax confidence, In: Lecture Notes in Computer Science, pp. 306–320. Springer International Publishing (2019). https://doi.org/10.1007/978-3-030-30484-3_26 (Early exit; somewhat related to cascades.)
Huang, G., Chen, D., Li, T., Wu, F., van der Maaten, L., Weinberger, K.Q., 2017, Multi-scale dense networks for resource efficient image classification, In: 6th International Conference on Learning Representations, ICLR 2018 (2018). https://doi.org/10.48550/arXiv.1703.09844 https://arxiv.org/abs/1703.09844 (Hierarchical early-exit scheme with multiple models is conceptually similar to cascades.)
Jayakodi, N.K., Chatterjee, A., Choi, W., Doppa, J.R., Pande, P.P., 2018, Trading-off accuracy and energy of deep inference on embedded systems: A co-design approach, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37(11), 2881–2893 (2018). https://doi.org/10.1109/tcad.2018.2857338, https://arxiv.org/abs/1901.10584
Passalis, N., Raitoharju, J., Tefas, A., Gabbouj, M., 2020, Efficient adaptive inference for deep convolutional neural networks using hierarchical early exits, Pattern Recognition 105, 107346 (2020). https://doi.org/10.1016/j.patcog.2020.107346, PDF: https://hal.science/hal-03265174/document (Hierarchical early exit is similar to cascades.)
A Moos, 2023, Efficient Single Object Detection on Image Patches with Early Exit Enhanced High-Precision CNNs, arXiv preprint arXiv:2309.03530, https://arxiv.org/pdf/2309.03530.pdf (Fast inference for a soccer-playing robot with cascade-like hierarchical early exits.)
F. Yang, W. Choi, and Y. Lin. 2016, Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers, CVPR, 2016. https://ieeexplore.ieee.org/document/7780603, PDF: https://www.cvlibs.net/projects/autonomous_vision_survey/literature/Yang2016CVPR.pdf (Cascaded rejection classifiers.)
Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis DeCoste, Wei Di, Yizhou Yu, May 2015, HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition, https://arxiv.org/abs/1410.0736
Y Tang, T Iwaguchi, H Kawasaki, 2023, Underwater Image Enhancement by Transformer-based Diffusion Model with Non-uniform Sampling for Skip Strategy, arXiv preprint arXiv:2309.03445, https://arxiv.org/abs/2309.03445, Code: https://github.com/piggy2009/DM_underwater (Skipping iteratively is somewhat similar to cascading.)
D. Kang, J. Emmons, F. Abuzaid, P. Bailis and M. Zaharia, 2017, NoScope: Optimizing neural network queries over video at scale, Proc. VLDB Endowment, vol. 10, no. 11, pp. 1586-1597, 2017. https://arxiv.org/abs/1703.02529 (Cascades when analyzing images in video in real-time.)
P. Viola and M. Jones. 2001, Rapid object detection using a boosted cascade of simple features, In CVPR, 2001. https://ieeexplore.ieee.org/document/990517
Zhaowei Cai, Mohammad Saberian, Nuno Vasconcelos, 2015, Learning complexity-aware cascades for deep pedestrian detection, In ICCV, 2015. https://ieeexplore.ieee.org/document/8686227
Rodrigo Verschae, Javier Ruiz-del-Solar & Mauricio Correa, 2008, A unified learning framework for object detection and classification using nested cascades of boosted classifiers, Machine Vision and Applications, 19(2), 2008, https://link.springer.com/article/10.1007/s00138-007-0084-0
K. Neshatpour, F. Behnia, H. Homayoun, and A. Sasan. 2018, ICNN: An iterative implementation of convolutional neural networks to enable energy and computational complexity aware dynamic approximation, In Design, Automation, and Test in Europe Conference, pages 551–556, 2018. https://ieeexplore.ieee.org/document/8342068 (Sequences of small feed-forward networks focus on parts of an image.)
Francesco Daghero, Alessio Burrello, Daniele Jahier Pagliari, Luca Benini, Enrico Macii, Massimo Poncino, 2020, Energy-Efficient Adaptive Machine Learning on IoT End-Nodes With Class-Dependent Confidence, 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS), pp.1-4, 2020. https://ieeexplore.ieee.org/document/9294863, https://arxiv.org/abs/2204.03431v1 (An improved stopping policy for early exits on easy-input classification tasks.)
Yiding Wang, Kai Chen, Haisheng Tan, and Kun Guo. 2023, Tabi: An efficient multi-level inference system for large language models, In Proceedings of the Eighteenth European Conference on Computer Systems, pages 233–248, 2023. https://dl.acm.org/doi/10.1145/3552326.3587438, PDF: https://yidingwang.xyz/public/files/tabi_eurosys23.pdf (Has multiple models, some big, some small, with characteristics similar to ensembles, big-little, and cascades.)
P Kavehzadeh, M Valipour, M Tahaei, A Ghodsi, 2023, Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT), arXiv preprint, https://arxiv.org/pdf/2309.08968.pdf (Cascade-like item: SortedNet method unlocks the potential of intermediate layers.)