Aussie AI

Training Optimization

  • Last Updated 2 March, 2025
  • by David Spuler, Ph.D.

Training is very expensive, which has led to a growing body of papers on optimizing model training methods. The cost of training a model is typically many multiples of the cost of a single inference, although the total inference cost can overshadow training cost given enough users. Nevertheless, the total cost of training to the industry is likely to remain high, since almost all use cases require not only initial training, but also ongoing fine-tuning and re-training.

Research on training algorithms in general:

  • Unsupervised learning
  • Reinforcement Learning from Human Feedback (RLHF)
  • In-context learning (ICL)
  • Direct Preference Optimization (DPO)
  • Self-supervised learning (automated AI Feedback)
  • Human-In-The-Loop (HITL)

General concepts in LLM reasoning and model capabilities:

Information on improving the accuracy and/or speed of training algorithms:

Improvements in resiliency of the training infrastructure for multi-GPU clusters in data centers:

Research on the data used in pre-training:

Modified types of pre-training for models:

Fine-tuning methods include:

Lesser-known alternatives to fine-tuning, being researched as ways to improve model capabilities, that require only a single inference step but may also need a short training-like phase:

  • Prompt tuning (extended vocabulary PEFT, typically with extra soft tokens prepended to the prompt)
  • Decoding-based reasoning in a single inference step (e.g., tree decoding)

Retrieval-based alternatives to fine-tuning for extra LLM capabilities and intelligence/accuracy (without requiring any extra training):

Non-retrieval methods of giving LLMs additional context information for their queries, using only a single inference step (and without traditional RAG-type data retrieval):

Prompt engineering enhancements to LLM capabilities (single-step):

Advanced topics in prompt engineering (single-shot):

Inference-based reasoning algorithms with multiple steps combining prompt engineering and inference processing of queries:

Addressing limitations of model intelligence:

Other directions for model intelligence:

  • Planning
  • Followup questions
  • Interactive prompting
  • Program execution models (e.g., LLM generates Python code to run)
  • Symbolic reasoning
  • Concept models ("large concept models" or LCMs)

Survey Papers on Training Optimizations

Survey papers on speeding up training:

  • Yarally T, Cruz L, Feitosa D, et al (2023), Uncovering energy-efficient practices in deep learning training: Preliminary steps towards green AI. International Conference on AI Engineering - Software Engineering for AI (CAIN), https://arxiv.org/abs/2303.13972
  • A. Apicella, F. Donnarumma, F. Isgrò, and R. Prevete, A survey on modern trainable activation functions, Neural Networks, vol. 138, pp. 14–32, 2021, https://arxiv.org/abs/2005.00817 (Extensive survey of trainable activation functions, e.g., ReLU, Swish, Maxout, leaky ReLU.)
  • R. Immonen, T. Hämäläinen et al., Tiny machine learning for resource-constrained microcontrollers, Journal of Sensors, vol. 2022, 2022, https://www.hindawi.com/journals/js/2022/7437023/ (Survey of on-device training for TinyML/edge computing.)
  • P Freire, E Manuylovich, JE Prilepsky, SK Turitsyn, 2023, Artificial neural networks for photonic applications—from algorithms to implementation: tutorial, Advances in Optics and Photonics, Sep 2023, https://opg.optica.org/directpdfaccess/f0ae8746-2f89-4ac4-bb598eda29c7977c_539680/aop-15-3-739.pdf?da=1&id=539680&seq=0&mobile=no (Large survey covering many aspects of the future of training optimization.)
  • Marcos Treviso, Tianchu Ji, Ji-Ung Lee, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Pedro H. Martins, André F. T. Martins, Péter Milder, Colin Raffel, Edwin Simpson, Noam Slonim, Niranjan Balasubramanian, Leon Derczynski, Roy Schwartz, Aug 2022, Efficient Methods for Natural Language Processing: A Survey, arXiv:2209.00099 [cs], August 2022, http://arxiv.org/abs/2209.00099
  • MM YAPICI, N Topaloğlu, 2021, Computers and Informatics, Performance comparison of deep learning frameworks https://dergipark.org.tr/en/pub/ci/issue/60236/769457, PDF: https://dergipark.org.tr/en/download/article-file/1201877 (Examines Torch, Theano, Caffe, Caffe2, MXNet, Keras, TensorFlow, and CNTK frameworks in terms of training speed.)
  • H. Jahangir, S. K. Goel and S. Khurana, "Scaling Up the Transformers: A Survey of Training and Inference Optimization Techniques," 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT), Greater Noida, India, 2024, pp. 1-6, doi: 10.1109/ICEECT61758.2024.10739061. https://ieeexplore.ieee.org/abstract/document/10739061
  • Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuanfu Zhang, Zibin Zheng, 5 Jan 2024, Training and Serving System of Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2401.02643
  • Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao, 20 Feb 2024 (v2), Large Language Models: A Survey, https://arxiv.org/abs/2402.06196
  • R Abdulkadirov, P Lyakhov, N Nagornov, 2023, Survey of Optimization Algorithms in Modern Neural Networks https://www.mdpi.com/2227-7390/11/11/2466 https://www.mdpi.com/2227-7390/11/11/2466/pdf
  • Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
  • You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying Wei, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor) Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
  • Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun, 29 Jul 2024, Efficient Training of Large Language Models on Distributed Infrastructures: A Survey, https://arxiv.org/abs/2407.20018
  • Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao, 4 Jan 2024, Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models https://arxiv.org/abs/2401.00625 (A general survey paper with coverage of many techniques including this one.)
  • Zehao Xiao, Cees G. M. Snoek, 6 Nov 2024, Beyond Model Adaptation at Test Time: A Survey. https://arxiv.org/abs/2411.03687
  • Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
  • Dan Zhang, Tao Feng, Lilong Xue, Yuandong Wang, Yuxiao Dong, Jie Tang, 23 Jan 2025, Parameter-Efficient Fine-Tuning for Foundation Models, https://arxiv.org/abs/2501.13787

Training Speed Optimizations

Papers with specific techniques for optimization of training in terms of throughput, latency or processing speed, rather than accuracy or perplexity of results (chosen out of literally thousands):

Fine-Tuning

Papers on fine-tuning optimizations:

Data Sets

Research papers on datasets used for training:

Synthetic Data

Research paper on LLM-generated synthetic data for training:

Unnatural Instructions (Synthetic Data)

Research papers on "unnatural instructions," a type of synthetic data for training:

Distributed Training

Distributed training is the optimization of spreading training computations across multiple GPUs or multiple servers. Trillion parameter models are trained on large clusters of 100,000+ GPUs, with complex multi-server multi-GPU architectures. Distributed training can also be performed on much more spread-out architectures with servers communicating over the internet.
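
As a concrete illustration of the basic data-parallel approach, below is a minimal sketch using PyTorch's DistributedDataParallel (DDP). The model, dataset, and hyperparameters are placeholders chosen for illustration; real LLM training stacks pipeline and tensor parallelism, activation checkpointing, and fault-tolerance mechanisms on top of this pattern.

    # Minimal data-parallel training sketch with PyTorch DistributedDataParallel (DDP).
    # Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

    def main():
        dist.init_process_group(backend="nccl")            # one process per GPU
        rank = dist.get_rank()
        device = rank % torch.cuda.device_count()
        torch.cuda.set_device(device)

        model = torch.nn.Linear(1024, 1024).to(device)     # placeholder for a transformer
        model = DDP(model, device_ids=[device])            # gradients are all-reduced across GPUs
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

        data = TensorDataset(torch.randn(4096, 1024), torch.randn(4096, 1024))
        sampler = DistributedSampler(data)                 # each rank trains on a distinct shard
        loader = DataLoader(data, batch_size=32, sampler=sampler)

        for epoch in range(2):
            sampler.set_epoch(epoch)                       # reshuffle the shards each epoch
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                loss = torch.nn.functional.mse_loss(model(x), y)
                optimizer.zero_grad()
                loss.backward()                            # DDP overlaps gradient sync with backprop
                optimizer.step()
            if rank == 0:
                print(f"epoch {epoch}: loss {loss.item():.4f}")

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()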

Some of the research papers on distributed training:

Training Costs

Research on the total costs of performing LLM training:

Federated Learning

Research on federated learning, a type of distributed training for LLMs:
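
To make the idea concrete, below is a minimal federated-averaging (FedAvg) style sketch in PyTorch. The simulated clients, their random private data, and the tiny model are assumptions for illustration only; a real federated deployment adds client sampling, secure aggregation, and communication over a network.

    # Minimal federated averaging (FedAvg) sketch in PyTorch.
    import copy
    import torch

    def local_update(global_model, data, targets, epochs=1, lr=0.01):
        """Each client fine-tunes a copy of the global model on its private data."""
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            loss = torch.nn.functional.mse_loss(model(data), targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return model.state_dict()

    def federated_average(client_states):
        """Server averages the clients' weights to form the new global model."""
        avg = copy.deepcopy(client_states[0])
        for key in avg:
            for state in client_states[1:]:
                avg[key] += state[key]
            avg[key] /= len(client_states)
        return avg

    global_model = torch.nn.Linear(16, 1)      # placeholder model
    for round_num in range(3):                 # a few communication rounds
        client_states = [
            local_update(global_model,
                         torch.randn(64, 16),  # each client's private data (simulated)
                         torch.randn(64, 1))
            for _ in range(4)                  # four simulated clients
        ]
        global_model.load_state_dict(federated_average(client_states))
        print(f"round {round_num} complete")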

Mixed-Precision Training

Research on using mixtures of numeric precision for the underlying data values and weights when performing LLM training:
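
As background to these papers, below is a minimal mixed-precision training sketch using PyTorch's automatic mixed precision (AMP); the model and data are placeholders. The key elements are FP16 compute inside autocast, model weights kept in FP32, and loss scaling to avoid gradient underflow.

    # Minimal mixed-precision training sketch using PyTorch automatic mixed precision (AMP).
    # Model parameters stay in FP32; autocast runs the forward pass in FP16 where safe,
    # and GradScaler rescales the loss to avoid FP16 gradient underflow.
    import torch

    device = "cuda"
    model = torch.nn.Linear(1024, 1024).to(device)        # placeholder for a transformer
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()

    for step in range(100):
        x = torch.randn(32, 1024, device=device)
        y = torch.randn(32, 1024, device=device)
        with torch.cuda.amp.autocast(dtype=torch.float16):    # FP16 compute in the forward pass
            loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        scaler.scale(loss).backward()       # backprop on the scaled loss
        scaler.step(optimizer)              # unscales gradients, then updates the FP32 weights
        scaler.update()                     # adjusts the loss-scale factor dynamically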

Model Merging

Model merging is a technique whereby two separate LLMs are combined to create a new model with the combined expertise of the two individual models. Surprisingly, the two sets of weights can often simply be combined element-wise, such as by addition or averaging.
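
For illustration, below is a minimal sketch that merges two checkpoints of the same architecture by simple element-wise interpolation; the two models are placeholders. Practical merging methods (e.g., task arithmetic, TIES, SLERP) add weighting, sign-conflict resolution, or spherical interpolation on top of this basic idea.

    # Minimal model-merging sketch: element-wise interpolation of two checkpoints
    # that share the same architecture.
    import torch

    def merge_models(model_a, model_b, alpha=0.5):
        """Linearly interpolate two state dicts: alpha * A + (1 - alpha) * B."""
        state_a, state_b = model_a.state_dict(), model_b.state_dict()
        merged = {}
        for key in state_a:
            merged[key] = alpha * state_a[key] + (1.0 - alpha) * state_b[key]
        return merged

    # Two fine-tuned variants of the same base architecture (placeholders here).
    model_a = torch.nn.Linear(16, 16)
    model_b = torch.nn.Linear(16, 16)

    merged_model = torch.nn.Linear(16, 16)
    merged_model.load_state_dict(merge_models(model_a, model_b, alpha=0.5))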

Research papers on model merging:

  • Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao, 15 Aug 2024 (v2), Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities, https://arxiv.org/abs/2408.07666 Project: https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications (An extensive review of merging two models.)
  • Cameron R. Wolfe, Sep 16, 2024, Model Merging: A Survey: From modern LLM applications to the early days of machine learning research, https://cameronrwolfe.substack.com/p/model-merging
  • Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing Liu, 2 Oct 2024, Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models, https://arxiv.org/abs/2410.01335
  • Yuxuan Zhang, Ruizhe Li, 2 Oct 2024, DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models, https://arxiv.org/abs/2410.01497 https://github.com/MeCuping/DLP-LoRA (Merging multiple LoRA adapters for parallel inference.)
  • Sean Michael Kerner, October 23, 2024, Differentiable Adaptive Merging is accelerating SLMs for enterprises, https://venturebeat.com/ai/differentiable-adaptive-merging-is-accelerating-slms-for-enterprises/
  • Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang, 18 Dec 2024, Channel Merging: Preserving Specialization for Merged Experts, https://arxiv.org/abs/2412.15283
  • Sakana.ai, March 21, 2024, Evolving New Foundation Models: Unleashing the Power of Automating Model Development, https://sakana.ai/evolutionary-model-merge/
  • Sakana.ai, December 03, 2024, Population-based Model Merging via Quality Diversity, https://sakana.ai/cycleqd/
  • Ayoub Ben Chaliah, Hela Dellagi, 31 Dec 2024, Superposition in Transformers: A Novel Way of Building Mixture of Experts, https://arxiv.org/abs/2501.00530 (Effectively model merging to combine a base model and its fine-tuned version, to avoid catastrophic forgetting.)
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
  • Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis, 31 Jan 2025, BTS: Harmonizing Specialized Experts into a Generalist LLM, https://arxiv.org/abs/2502.00075 (Combining multiple fine-tuned expert models via "layer stitching").
  • Yuhang Zhou, Giannis Karamanolakis, Victor Soto, Anna Rumshisky, Mayank Kulkarni, Furong Huang, Wei Ai, Jianhua Lu, 4 Feb 2025 (v2), MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs, https://arxiv.org/abs/2502.00997
  • Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu, 11 Feb 2025 (v2), Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing, https://arxiv.org/abs/2502.04411

Early Dropout

Early dropout is an LLM training optimization whereby training computations are skipped by "dropping out early" during a training cycle. In some LLM training research, it is simply called "dropout." However, it should not be confused with "early exiting," which is an LLM inference optimization that skips layers.
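
As one illustrative interpretation, the sketch below enables standard dropout only during the early portion of training and then disables it. The cutoff step, dropout rate, model, and data are assumptions chosen for illustration, not settings taken from any particular paper.

    # Sketch of early dropout: dropout is active only for the early part of training,
    # then disabled. The cutoff step and dropout rate are assumed values for illustration.
    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(256, 256),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.1),   # randomly zeroes activations during training
        torch.nn.Linear(256, 10),
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    total_steps = 10_000
    early_phase = 2_000            # dropout is active only for these early steps (assumed cutoff)

    for step in range(total_steps):
        if step == early_phase:
            # Disable dropout for the remainder of training.
            for module in model.modules():
                if isinstance(module, torch.nn.Dropout):
                    module.p = 0.0
        x = torch.randn(32, 256)                 # placeholder training batch
        y = torch.randint(0, 10, (32,))
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()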

Research papers on early dropout in LLM training:

More AI Research

Read more about: