Aussie AI
Collaborative Inference
Last Updated 27 November, 2024
by David Spuler, Ph.D.
Collaborative inference is a multi-model ensemble AI optimization strategy in which two or more engines combine to perform the inference computation. There are two basic architectures:
- Multi-component partial inference
- Multi-component full inference
In multi-component partial inference, multiple sub-components contribute to a single inference computation. For example, parts of the inference computation can be spread across multiple machines or multiple GPUs, and then combined to complete the inference result. The output is a single prediction for decoding.
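The partial-inference idea can be sketched with a toy example: a single matrix-vector product is split column-wise across two "devices", and the partial results are summed to reconstruct the full computation. This is an illustrative sketch only; the function and variable names are not from any specific framework.

```python
# Multi-component partial inference sketch: split one matrix-vector
# product column-wise across two "devices", then combine the partial
# results by element-wise addition. All names are illustrative.

def matvec_partial(rows, x):
    """Compute a matrix-vector product for a slice of the weight columns."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

# Full weight matrix (2 outputs, 4 inputs) and input vector.
W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
x = [1.0, 0.5, -1.0, 2.0]

# Device A holds the first two columns, device B the last two.
W_a = [row[:2] for row in W]
W_b = [row[2:] for row in W]

y_a = matvec_partial(W_a, x[:2])   # partial result from device A
y_b = matvec_partial(W_b, x[2:])   # partial result from device B

# Combine: element-wise sum reconstructs the full product.
y = [a + b for a, b in zip(y_a, y_b)]
full = matvec_partial(W, x)
assert y == full
```

Real systems apply the same column- or row-splitting idea (tensor parallelism, model partitioning) at the layer level, with network communication where the element-wise sum occurs here.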
The alternative is multi-component full inference, where multiple components (or entire models) each perform a full inference, with results combined at the end. All of the inference computations occur independently. Each model or component generates its own separate prediction of output tokens and their probabilities. A decision mechanism then analyzes the outputs of each model and decides which final token to output.
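A minimal sketch of the full-inference architecture: two independent "models" each produce a probability distribution over the next token, and a simple decision mechanism averages the distributions and emits the highest-probability token. The toy vocabulary and probability values are made up for illustration.

```python
# Multi-component full inference sketch: each model runs a complete,
# independent inference step (stubbed out here as fixed distributions),
# and a decision mechanism combines the per-token probabilities.

VOCAB = ["the", "cat", "sat"]

def decide(distributions):
    """Average per-token probabilities across models; return the best token."""
    n = len(distributions)
    avg = {tok: sum(d[tok] for d in distributions) / n for tok in VOCAB}
    return max(avg, key=avg.get)

# Stubbed outputs of two independently run models.
model_a_probs = {"the": 0.2, "cat": 0.5, "sat": 0.3}
model_b_probs = {"the": 0.1, "cat": 0.6, "sat": 0.3}

token = decide([model_a_probs, model_b_probs])
print(token)  # "cat" wins with an average probability of 0.55
```

Averaging is only one possible decision mechanism; majority voting, weighted combination, or handing the tie-break to a larger model are common variations.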
There are several variations on either of these two approaches. Particular types of collaborative inference include:
- Speculative Decoding
- Consensus-based decoding
- Mutually-guided decoding
- Big-Little Architectures
- Committee-based inference
- Ensemble Decoding
- Swarm inference (swarm decoding)
Research on Collaborative Inference (Generally)
Research papers on collaborative inference include:
- G Xu, Z Hao, Y Luo, H Hu, J An, S Mao, 2023, DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices, arXiv preprint arXiv:2309.05015, https://arxiv.org/abs/2309.05015
- Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu, Dragomir Radev, Yejin Choi, Noah A. Smith, Oct 2022, Twist Decoding: Diverse Generators Guide Each Other, https://arxiv.org/abs/2205.09273, Code: https://github.com/jungokasai/twist_decoding (Twist decoding is a type of collaborative inference.)
- J Kasai, 2023, Towards Efficient, Customizable, and Communal Natural Language Processing, Ph.D. thesis, Computer Science and Engineering, University of Washington, https://www.proquest.com/openview/604084b574dcd05e41eb6e33682a3537/1 (Impressive thesis includes twist decoding amid other topics.)
- Jinduo Song, Zhicheng Liu, Xiaofei Wang, Chao Qiu, Xu Chen, 2021, "Adaptive and Collaborative Edge Inference in Task Stream with Latency Constraint", ICC 2021, IEEE International Conference on Communications, pp.1-6, https://ieeexplore.ieee.org/document/9500892
- C Luo, J Chen, X Feng, J Zhang, J Li, 2023, Sustainable Collaborative Inference in Intelligent Transportation Systems, IEEE Transactions on Intelligent Transportation Systems, https://ieeexplore.ieee.org/document/10239242
- Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, Lingjia Tang, 2017, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” ACM SIGARCH Comput. Archit. News, vol. 52, no. 4, pp. 615–629, https://dl.acm.org/doi/10.1145/3037697.3037698
- Z. Hao, G. Xu, Y. Luo, H. Hu, J. An, and S. Mao, June 2022, “Multi-agent collaborative inference via dnn decoupling: Intermediate feature compression and edge learning,” IEEE Trans. Mob. Comput., 2022, https://arxiv.org/abs/2205.11854
- J. Kim, Y. Park, G. Kim, and S. J. Hwang, “Splitnet: Learning to semantically split deep networks for parameter reduction and model parallelization,” in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70. PMLR, 2017, pp. 1866–1874. http://proceedings.mlr.press/v70/kim17b/kim17b.pdf
- Y. Kim, J. Kim, D. Chae, D. Kim, and J. Kim, “µLayer: Low latency on-device inference using cooperative single-layer acceleration and processor-friendly quantization,” in Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, March 25-28, 2019, G. Candea, R. van Renesse, and C. Fetzer, Eds. ACM, 2019, pp. 45:1–45:15. https://dl.acm.org/doi/10.1145/3302424.3303950
- T. Mohammed, C. Joe-Wong, R. Babbar, and M. D. Francesco, “Distributed inference acceleration with adaptive DNN partitioning and offloading,” in 39th IEEE Conference on Computer Communications, INFOCOM 2020, Toronto, ON, Canada, July 6-9, 2020. IEEE, 2020, pp. 854–863, https://ieeexplore.ieee.org/document/9155237
- S. Yang, Z. Zhang, C. Zhao, X. Song, S. Guo, and H. Li, “CNNPC: end-edge-cloud collaborative CNN inference with joint model partition and compression,” IEEE Trans. Parallel Distributed Syst., vol. 33, no. 10, pp. 4039–4056, 2022. https://ieeexplore.ieee.org/document/9782528
- X Xu, K Yan, S Han, B Wang, X Tao, P Zhang, 2023, Learning-Based Edge-Device Collaborative DNN Inference in IoVT Networks, IEEE Internet of Things Journal, https://ieeexplore.ieee.org/abstract/document/10258387
- Dec 2023, Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation, Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, Ji-Rong Wen, https://arxiv.org/abs/2311.09049 Code: https://github.com/RUCAIBox/LC-Rec/
- Mikolaj Jankowski, Deniz Gunduz, Krystian Mikolajczyk, Nov 2023, Adaptive Early Exiting for Collaborative Inference over Noisy Wireless Channels, https://arxiv.org/abs/2311.18098 (Early exiting combined with collaborative inference.)
- Junho Wohn, February 2024, Optimizing Deep Learning Model Inference using Efficient Model Partitioning on Edge Devices, Thesis for the Master of Science, Graduate School of Hanyang University, https://repository.hanyang.ac.kr/handle/20.500.11754/188388, PDF: https://hanyang.dcollection.net/public_resource/pdf/200000726139_20240331200233.pdf (Compiles models using the TVM deep learning compiler and then partitions them across multiple edge devices for collaborative edge inference.)
- Nir Shlezinger; Erez Farhan; Hai Morgenstern; Yonina C. Eldar, 2021, Collaborative Inference via Ensembles on the Edge, ICASSP 2021, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), https://ieeexplore.ieee.org/abstract/document/9414740
- Nir Shlezinger; Ivan V. Bajić, 2022, Collaborative Inference for AI-Empowered IoT Devices, IEEE Internet of Things Magazine (Volume: 5, Issue: 4, December 2022), https://ieeexplore.ieee.org/abstract/document/10012474
- Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao, 4 Jan 2024, Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models https://arxiv.org/abs/2401.00625 (A general survey paper with coverage of many techniques including this one.)
- Emre Kilcioglu, March 2024, Collaborative On-device CNN Inference: Design and Optimization of Communication and Computation, Ph.D. thesis, Engineering Sciences and Technology, UCLouvain, PDF: https://dial.uclouvain.be/pr/boreal/object/boreal%3A286224/datastream/PDF_01/view
- David Spuler, March 2024, Chapter 54. Ensemble Multi-Model Architectures, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao, June 2024, Hybrid SLM and LLM for Edge-Cloud Collaborative Inference, EdgeFM ’24, June 3–7, 2024, Minato-ku, Tokyo, Japan, https://dl.acm.org/doi/pdf/10.1145/3662006.3662067 (Small model on edge devices with large model in the cloud, performing collaborative inference.)
- Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou, 18 Jun 2024, Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding, https://arxiv.org/abs/2406.12295 Code: https://github.com/TsinghuaC3I/FS-GEN
- Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King, 25 Jun 2024, Entropy-Based Decoding for Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2406.17519 (Enhanced decoding algorithm for multi-document RAG processing.)
- Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
- Mingjin Zhang, 2024, High-performance scheduling of deep learning tasks in collaborative edge computing, Ph.D. Thesis, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, https://theses.lib.polyu.edu.hk/bitstream/200/13080/3/7528.pdf (Scheduling of inference and training tasks on edge devices with techniques such as model splitting/partitioning.)
- Eric Samikwa, 2024, Resource-Aware Distributed Machine Learning for Artificial Intelligence of Things, Ph.D. thesis, Faculty of Science, University of Bern, Switzerland, https://boristheses.unibe.ch/5378/1/24samikwa_e_1_.pdf https://doi.org/10.48549/5378 (Multi-edge device with early exit, "micro-split" scheduling, split/federated learning, and distributed inference.)
- Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou, 4 Jun 2024 (v2), Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems, https://arxiv.org/abs/2403.02419
- J. Niu, W. Zhang, C. J. Xue and N. Guan, 2024, "RTiL: Real-Time Inference of Large Language Models on Memory-Constrained GPU Devices," 2024 IEEE 30th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Sokcho, Korea, Republic of, 2024, pp. 21-30, doi: 10.1109/RTCSA62462.2024.00013. https://ieeexplore.ieee.org/abstract/document/10695719
- Akrit Mudvari, Yuang Jiang, Leandros Tassiulas, 16 Oct 2024 (v2), SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization, https://arxiv.org/abs/2410.10759
- Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen, 1 Nov 2024, Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models, https://arxiv.org/abs/2411.00492
- Nan Xue, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Liang Qian, Shuguang Cui, Wenjun Zhang, Ping Zhang, 11 Nov 2024, WDMoE: Wireless Distributed Mixture of Experts for Large Language Models, https://arxiv.org/abs/2411.06681
- Yingxuan Yang, Qiuying Peng, Jun Wang, Weinan Zhang, 21 Nov 2024, Multi-LLM-Agent Systems: Techniques and Business Perspectives, https://arxiv.org/abs/2411.14033
Consensus Decoding
Consensus decoding is a type of collaborative inference where multiple models must form a "consensus" on the predicted output token. The idea is that two or more models perform inference independently, each predicting token probabilities, and then their results are combined to output a "best" token. Note that this differs from approaches such as speculative decoding (and other more generalized types of collaborative inference), where the models affect each other's inference while it is in progress.
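The consensus idea can be sketched as follows: each model independently predicts a token distribution, a token is accepted when a majority of models agree on the same top token, and otherwise the combined (summed) distribution breaks the tie. This is a hypothetical illustration, not a published algorithm's exact rule; the distributions are toy values.

```python
# Consensus decoding sketch: accept the token that a majority of
# independently run models agree on; otherwise fall back to the
# combined distribution. All names and values are illustrative.

from collections import Counter

def consensus_decode(distributions):
    """Return the consensus token across several independent models."""
    votes = Counter(max(d, key=d.get) for d in distributions)
    token, count = votes.most_common(1)[0]
    if count > len(distributions) // 2:
        return token                     # majority consensus reached
    # No majority: fall back to the summed per-token probabilities.
    tokens = distributions[0].keys()
    combined = {t: sum(d[t] for d in distributions) for t in tokens}
    return max(combined, key=combined.get)

# Three models, each having run a full independent inference (stubbed).
preds = [
    {"dog": 0.6, "cat": 0.4},
    {"dog": 0.3, "cat": 0.7},
    {"dog": 0.55, "cat": 0.45},
]
print(consensus_decode(preds))  # "dog": two of the three models agree
```

Because every model completes its own inference before the vote, this is a multi-component full-inference architecture, unlike speculative decoding where a draft model's in-progress output steers the larger model.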
Research papers on consensus decoding include:
- Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, Ji-Rong Wen, Dec 2023, Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation, https://arxiv.org/abs/2311.09049 Code: https://github.com/RUCAIBox/LC-Rec/
- Mikolaj Jankowski, Deniz Gunduz, Krystian Mikolajczyk, Nov 2023, Adaptive Early Exiting for Collaborative Inference over Noisy Wireless Channels, https://arxiv.org/abs/2311.18098 (Early exiting combined with collaborative inference.)
- Adam Pauls, John DeNero and Dan Klein, 2009, Consensus Training for Consensus Decoding in Machine Translation, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1418–1427, https://aclanthology.org/D09-1147.pdf
- Nir Shlezinger; Erez Farhan; Hai Morgenstern; Yonina C. Eldar, 2021, Collaborative Inference via Ensembles on the Edge, ICASSP 2021, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), https://ieeexplore.ieee.org/abstract/document/9414740
- Nir Shlezinger; Ivan V. Bajić, 2022, Collaborative Inference for AI-Empowered IoT Devices, IEEE Internet of Things Magazine (Volume: 5, Issue: 4, December 2022), https://ieeexplore.ieee.org/abstract/document/10012474
- Caelin Kaplan, Tareq Si Salem, Angelo Rodio, Chuan Xu, Giovanni Neglia, 7 May 2024, Federated Learning for Cooperative Inference Systems: The Case of Early Exit Networks, https://arxiv.org/abs/2405.04249
- David Spuler, March 2024, Chapter 54. Ensemble Multi-Model Architectures, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
More Research on Decoding Algorithms
- Decoding algorithms (overview)
— Non-autoregressive decoding
— Greedy decoding
— Top-k decoding
— Top-p decoding
— Min-P Sampling
— Flash decoding
— Beam search decoding
— Edit decoding
— Contrastive decoding
— Constrained decoding
- Parallel decoding (overview)
— Blockwise parallel decoding
— n-gram parallel decoding
— Lookahead decoding
— Medusa decoding
— Consensus decoding
- Speculative decoding (overview)
— Generalized speculative decoding
— Aggressive decoding
— Lookup decoding
— Retrieval lookup decoding
— Prompt lookup decoding
— Self speculative decoding
— Tree speculative decoding
— Superposed decoding
— Hierarchical speculative decoding
— Heuristic speculative decoding
— Multi-token speculative decoding
— Sequential speculative decoding
More AI Research
Read more about:
- Ensemble Model Architectures
- Speculative Decoding
- Inference Optimizations
- Loop Optimizations
- Code Optimizations