Aussie AI

Tool Usage in AI Architectures

Last Updated 30 August, 2025

by David Spuler, Ph.D.

What are Tools?

This section is about tools being used by LLMs, rather than tools being used by humans to create LLMs. There's a big difference!

Why LLM Tools?

LLMs require tools to do more advanced things, just like humans. For example, if someone asks you the time, you look at your watch (or your phone). If you ask an LLM "What is the time?" there is nothing in its training data set that could possibly answer this correctly. The only way is to use a clock that's integrated into the LLM, and executed by the AI engine as part of answering your query.

Types of LLM Tools

There are several types of tools that can be integrated:

Data sources (e.g. real estate listings)
Dynamic calculations
Action tools in agent architectures (e.g. an API to send an email).

Types of dynamic calculation tools include:

Clocks
Calculators (arithmetic)
Converters (e.g. pounds to kilograms)
Calendars (date or day calculations)

And many more...
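
As a rough illustration of the dynamic calculation tools listed above, the following Python sketch implements one small function per tool type and a lookup table the engine could use to run them. All names here (clock_tool, calculator_tool, converter_tool, calendar_tool, TOOLS, run_tool) are hypothetical and chosen for this example only; a real AI engine would define its own tool interface.

```python
from datetime import date, datetime, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def clock_tool() -> str:
    """Clock: current UTC time, read when called (never from training data)."""
    return datetime.now(timezone.utc).strftime("%H:%M UTC")

def calculator_tool(a: float, op: str, b: float) -> float:
    """Calculator: basic arithmetic on two operands."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        return a / b
    raise ValueError(f"unsupported operator: {op}")

def converter_tool(pounds: float) -> float:
    """Converter: pounds to kilograms (1 lb is defined as 0.45359237 kg)."""
    return pounds * 0.45359237

def calendar_tool(year: int, month: int, day: int) -> str:
    """Calendar: day of the week for a given date."""
    return DAY_NAMES[date(year, month, day).weekday()]

# Hypothetical registry: the engine looks up a tool by name and runs it.
TOOLS = {
    "clock": clock_tool,
    "calculator": calculator_tool,
    "converter": converter_tool,
    "calendar": calendar_tool,
}

def run_tool(name: str, *args):
    """Run the named tool with the given arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](*args)

if __name__ == "__main__":
    print(run_tool("clock"))
    print(run_tool("calculator", 6, "*", 7))     # prints 42
    print(run_tool("converter", 10))
    print(run_tool("calendar", 2025, 8, 30))     # prints Saturday
```

The point of the registry is that every result is computed at query time rather than recalled from the model's weights, which is what makes these tools "dynamic".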

How are Dynamic Tools Integrated?

Like humans, an AI needs to learn to look at its watch if someone asks the time. Specific training data sets are required that tell the AI what tool to use, and when.

The AI engine has to recognize in the LLM output that a tool must be executed. There are a variety of ways to do this:

Tool-specific tokens: the LLM emits a "trigger" token to run a tool. (Note that PEFT could be used here to fine-tune new tool capabilities, by adding only a few new tool-triggering tokens to the vocabulary.)
Placeholder patterns: the LLM outputs a special pattern such as "--insert current time here--" and the engine looks for these patterns in the output. This avoids adding tool tokens to the vocabulary, but is less efficient because each pattern takes multiple text tokens in the output.
Code generation: various AI models can generate code, such as Python, which the engine executes to compute the answer. This is a general solution, because Python can call many modules and can thereby serve as many different tools.
Multi-level planning: the AI first generates a plan of how to answer the query, including what tools to use, then runs those tools, and then performs another inference query to collate the results into a final answer.
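
The placeholder pattern approach is the simplest of these to sketch in code. The Python example below assumes a pattern syntax of "--insert NAME here--", modeled on the example above, and a hypothetical table of tool functions keyed by NAME. The function names and the table are illustrative assumptions, not part of any particular engine.

```python
import re
from datetime import datetime

# Assumed placeholder syntax: --insert <tool name> here--
PLACEHOLDER = re.compile(r"--insert ([a-z ]+) here--")

def current_time() -> str:
    """Hypothetical clock tool: local time as HH:MM."""
    return datetime.now().strftime("%H:%M")

def current_date() -> str:
    """Hypothetical calendar tool: local date as YYYY-MM-DD."""
    return datetime.now().strftime("%Y-%m-%d")

# Hypothetical mapping from placeholder names to tool functions.
PLACEHOLDER_TOOLS = {
    "current time": current_time,
    "current date": current_date,
}

def expand_placeholders(llm_output: str, tools=PLACEHOLDER_TOOLS) -> str:
    """Replace each known placeholder in the LLM output with its tool result.

    Unknown placeholders are left unchanged so the caller can detect them.
    """
    def substitute(match: re.Match) -> str:
        tool = tools.get(match.group(1))
        return tool() if tool is not None else match.group(0)

    return PLACEHOLDER.sub(substitute, llm_output)

if __name__ == "__main__":
    raw = "The time is now --insert current time here--."
    print(expand_placeholders(raw))
```

In this sketch the engine post-processes the completed LLM output. The same scan could instead run during decoding, so that a tool starts as soon as its pattern is complete.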

Research on AI Tool Integrations

Tool integration papers:

Junzhi Chen, Juhao Liang, Benyou Wang, 9 May 2024, Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning, https://arxiv.org/abs/2405.05955
Jonas Wallat, Adam Jatowt, Avishek Anand, March 2024, Temporal Blind Spots in Large Language Models, WSDM '24: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Pages 683–692, https://arxiv.org/abs/2401.12078, https://doi.org/10.1145/3616855.3635818, https://dl.acm.org/doi/abs/10.1145/3616855.3635818
Nate Kushman, Yoav Artzi, Luke Zettlemoyer, Regina Barzilay, June 2014, Learning to Automatically Solve Algebra Word Problems, P14-1026 Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), https://aclanthology.org/P14-1026/ PDF: https://aclanthology.org/P14-1026.pdf
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning, https://proceedings.neurips.cc/paper_files/paper/2023/file/4a47dd69242d5af908cdd5d51c971cbf-Paper-Datasets_and_Benchmarks.pdf
Subhro Roy, Dan Roth, 20 Aug 2016 (v2), Solving General Arithmetic Word Problems, https://arxiv.org/abs/1608.01413
Subhro Roy, Shyam Upadhyay, Dan Roth, 28 Sep 2016, Equation Parsing: Mapping Sentences to Grounded Equations, https://arxiv.org/abs/1609.08824
Yan Wang, Xiaojiang Liu, Shuming Shi, September 2017, Deep Neural Solver for Math Word Problems D17-1088, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing Copenhagen, Denmark, https://aclanthology.org/D17-1088/ PDF: https://aclanthology.org/D17-1088.pdf
reiinakano, November 12, 2019, Teaching a neural network to use a calculator, https://reiinakano.com/2019/11/12/solving-probability.html (Integrate SymPy calculator into the results of a neural network, by looking for the '=' sign.)
Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan, 6 May 2024, AlphaMath Almost Zero: process Supervision without process, https://arxiv.org/abs/2405.03553 https://github.com/MARIO-Math-Reasoning/Super_MARIO
Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu, Chenglin Wu, 12 Mar 2024 (v3), Data Interpreter: An LLM Agent For Data Science, https://arxiv.org/abs/2402.18679 Code: https://github.com/geekan/MetaGPT
Zelong Li, Wenyue Hua, Hao Wang, He Zhu, Yongfeng Zhang, 4 Feb 2024 (v2), Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents, https://arxiv.org/abs/2402.00798 Code: https://github.com/agiresearch/Formal-LLM
Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang, 25 Mar 2024 (v2), InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents, https://arxiv.org/abs/2403.02691
Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W Cohen. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. arXiv preprint arXiv:2211.12588, 2022. https://arxiv.org/abs/2211.12588 (Integrate a Python interpreter to execute the code generated by the LLM to answer the query.)
Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. Pal: Program-aided language models. In International Conference on Machine Learning, pages 10764–10799. PMLR, 2023. https://arxiv.org/abs/2211.10435 Code: http://reasonwithpal.com/ (Python interpreter integrated as a tool for LLMs.)
Intel, April 2024, Intel® Compiler First to Achieve SYCL* 2020 Conformance, https://www.intel.com/content/www/us/en/developer/articles/technical/compiler-first-full-sycl2020-conformance.html
Long Hei Matthew Lam, Ehsan Shareghi, 1 Jun 2024, A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters, https://arxiv.org/abs/2406.00284 (Using symbolic solvers with LLMs.)
M Keber, I Grubišic, A Barešic, A Jovic, 2024, A Review on Neuro-symbolic AI Improvements to Natural Language Processing, https://www.researchgate.net/profile/Alan-Jovic/publication/380911364_A_Review_on_Neuro-symbolic_AI_Improvements_to_Natural_Language_Processing/links/6655c0ec22a7f16b4f51fb2f/A-Review-on-Neuro-symbolic-AI-Improvements-to-Natural-Language-Processing.pdf
Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen, 21 Feb 2024 (v2), SciAgent: Tool-augmented Language Models for Scientific Reasoning, https://arxiv.org/abs/2402.11451
Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu, 2023, ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings, Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track, https://proceedings.neurips.cc/paper_files/paper/2023/hash/8fd1a81c882cd45f64958da6284f4a3f-Abstract-Conference.html
Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, and Ping Luo. 2023a. Tree-planner: Efficient close-loop task planning with large language models. arXiv preprint arXiv:2310.08582. https://arxiv.org/abs/2310.08582
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik R Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems. https://arxiv.org/abs/2303.11366
Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. 2023b. ToolLLM: Facilitating large language models to master 16000+ real-world apis. arXiv preprint arXiv:2307.16789. https://arxiv.org/abs/2307.16789
Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. 2023c. A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432. https://arxiv.org/abs/2308.11432
Aaron Parisi, Yao Zhao, and Noah Fiedel. Talm: Tool augmented language models. arXiv preprint arXiv:2205.12255, 2022. https://arxiv.org/abs/2205.12255
Joy He-Yueya, Gabriel Poesia, Rose E. Wang, and Noah D. Goodman. Solving math word problems by combining language models with symbolic solvers. ArXiv, abs/2304.09102, 2023. https://arxiv.org/abs/2304.09102
Shima Imani, Liang Du, and H. Shrivastava. Mathprompter: Mathematical reasoning using large language models. ArXiv, abs/2303.05398, 2023. https://arxiv.org/abs/2303.05398
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao, 5 Feb 2024. A Survey on Transformer Compression. https://arxiv.org/abs/2402.05964 (Model compression survey paper with focus on pruning, quantization, knowledge distillation, and efficient architecture design.)
Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis, 7 May 2024, An LLM-Tool Compiler for Fused Parallel Function Calling, https://arxiv.org/abs/2405.17438
Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo, 4 Jun 2024 (v2), Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution, https://arxiv.org/abs/2406.00059 Code: https://github.com/conveyor-sys/conveyor (Speeding up inference by partially running tools in parallel to the LLM query processing, rather than sequentially after the LLM request, by detecting tool requests deep inside the decoding algorithm and starting them off immediately, before the LLM has finished generating the fully decoded output.)
Pan Lu, 2024, Advancing Mathematical Reasoning with Language Models: A Multimodal and Knowledge-Intensive Perspective, Ph.D. Thesis, Computer Science, University of California, Los Angeles, https://escholarship.org/content/qt678864d8/qt678864d8.pdf
Julian Yip, Apr 2, 2024, Build Autonomous AI Agents with Function Calling: Transform your chatbot into an agent that can interact with external APIs, https://towardsdatascience.com/build-autonomous-ai-agents-with-function-calling-0bb483753975 (Implement agents via models that output a JSON object that describes the API to call and the parameters to send.)
Adva Nakash Peleg, May 30, 2024, An LLM Journey: From POC to Production, https://medium.com/cyberark-engineering/an-llm-journey-from-poc-to-production-6c5ec6a172fb
Yu Gu, Yiheng Shu, Hao Yu, Xiao Liu, Yuxiao Dong, Jie Tang, Jayanth Srinivasa, Hugo Latapie, Yu Su, 22 Feb 2024, Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments, https://arxiv.org/abs/2402.14672
Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan, March 2023, TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs, https://arxiv.org/pdf/2303.16434.pdf
kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman, 1 Jun 2022 (v3), WebGPT: Browser-assisted question-answering with human feedback, https://arxiv.org/abs/2112.09332
Tianlin Shi, Andrej Karpathy, Linxi Fan, Jonathan Hernandez, Percy Liang, 2017, World of Bits: An Open-Domain Platform for Web-Based Agents, Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3135-3144, https://proceedings.mlr.press/v70/shi17a.html
Peter C Humphreys, David Raposo, Toby Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Alex Goldin, Adam Santoro, Timothy Lillicrap, 11 Nov 2022 (v2), A data-driven approach for learning to control computers, https://arxiv.org/abs/2202.08137
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom, 9 Feb 2023, Toolformer: Language Models Can Teach Themselves to Use Tools, https://arxiv.org/abs/2302.04761
OpenAI, 2024, Function calling, https://platform.openai.com/docs/guides/function-calling
Cobus Greyling, June 16, 2023, Practical Examples of OpenAI Function Calling, https://cobusgreyling.medium.com/practical-examples-of-openai-function-calling-a6419dc38775
University of California, Berkeley, 2024, Berkeley Function-Calling Leaderboard, https://gorilla.cs.berkeley.edu/leaderboard.html https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard
Wes Brewer, Ana Gainaru, Frédéric Suter, Feiyi Wang, Murali Emani, Shantenu Jha, 20 Jun 2024, AI-coupled HPC Workflow Applications, Middleware and Performance, (Examines integrations of various workflows into LLMs.) https://arxiv.org/abs/2406.14315
Aarushi Kansal, Chapter 3: Chains, Tools and Agents Building Generative AI-Powered Apps: A Hands-on Guide for Developers, Apress, https://www.amazon.com/Building-Generative-AI-Powered-Apps-Hands-ebook/dp/B0CTXXP1S4/
Vishal Rajput, Apr 11, 2024, What’s next for AI: AI agentic workflows? https://medium.com/aiguys/next-for-llms-and-rag-ai-agentic-workflows-1869ba0a6796
Shishir Patil, May 10, 2024, Teaching Large Language Models to Use Tools at Scale, Ph.D. Thesis, Electrical Engineering and Computer Sciences, University of California, Berkeley, Technical Report No. UCB/EECS-2024-85, http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-85.html https://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-85.pdf
Xi Wang, Procheta Sen, Ruizhe Li, Emine Yilmaz, 31 Jul 2024, Adaptive Retrieval-Augmented Generation for Conversational Systems, https://arxiv.org/abs/2407.21712 (Deciding whether or not to include a RAG external data request in the inference of a chatbot in a multi-turn conversation.)
Michael Nuñez, July 18, 2024, Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling, https://venturebeat.com/ai/groq-open-source-llama-ai-model-tops-leaderboard-outperforming-gpt-4o-and-claude-in-function-calling/
Thomas Reid, Jul 31, 2024, Ollama’s Latest Update: Tool Use: Everything you need to know about function calling in Ollama https://ai.gopubby.com/ollamas-latest-update-tool-use-7b809e15be5c
Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang, 8 Aug 2024, ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities, https://arxiv.org/abs/2408.04682 Code: https://github.com/apple/ToolSandbox
Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, July 2024, InferCept: Efficient Intercept Support for Augmented Large Language Model Inference, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:81-95, 2024, https://proceedings.mlr.press/v235/abhyankar24a.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/abhyankar24a/abhyankar24a.pdf
Yu Du, Fangyun Wei, Hongyang Zhang, July 2024, AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:11812-11829, 2024, https://proceedings.mlr.press/v235/du24h.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/du24h/du24h.pdf
MemGPT, Aug 2024, Adding custom tools to MemGPT, https://memgpt.readme.io/docs/adding-custom-tools-to-memgpt
Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia, 27 Aug 2024, Text2SQL is Not Enough: Unifying AI and Databases with TAG, https://arxiv.org/abs/2408.14717 https://github.com/TAG-Research/TAG-Bench
Yaroslav Zharov, Yury Khudyakov, Evgeniia Fedotova, Evgeny Grigorenko, Egor Bogomolov, 18 Feb 2024, Tool-Augmented LLMs as a Universal Interface for IDEs, https://arxiv.org/abs/2402.11635
Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami, 1 Sep 2024, TinyAgent: Function Calling at the Edge, https://arxiv.org/abs/2409.00608 https://github.com/SqueezeAILab/TinyAgent
Suhong Moon, Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Woosang Lim, Kurt Keutzer, Amir Gholami, 2 Sep 2024, Efficient and Scalable Estimation of Tool Representations in Vector Space, https://arxiv.org/abs/2409.02141 https://github.com/SqueezeAILab/Tool2Vec (Using synthetic data to train tool usage decision models.)
Xiaoxia Liu, Jingyi Wang, Jun Sun, Xiaohan Yuan, Guoliang Dong, Peng Di, Wenhai Wang, Dongxia Wang, 21 Nov 2023, Prompting Frameworks for Large Language Models: A Survey, https://arxiv.org/abs/2311.12785
Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, Aman Chadha, 5 Feb 2024, A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications, https://arxiv.org/abs/2402.07927
Yupu Hao, Pengfei Cao, Zhuoran Jin, Huanxuan Liao, Yubo Chen, Kang Liu, Jun Zhao, 23 Sep 2024 (v2), CITI: Enhancing Tool Utilizing Ability in Large Language Models without Sacrificing General Performance, https://arxiv.org/abs/2409.13202
Carl Franzen, September 27, Cohere updates APIs to make it easier for devs to switch from other models, https://venturebeat.com/ai/cohere-updates-apis-to-make-it-easier-for-devs-to-switch-from-other-models/
Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li, 8 Oct 2024 (v2), ToolGen: Unified Tool Retrieval and Calling via Generation, https://arxiv.org/abs/2410.03439
Ke Wang, Jiahui Zhu, Minjie Ren, Zeming Liu, Shiwei Li, Zongye Zhang, Chenkai Zhang, Xiaoyu Wu, Qiqi Zhan, Qingjie Liu, Yunhong Wang, 16 Oct 2024, A Survey on Data Synthesis and Augmentation for Large Language Models, https://arxiv.org/abs/2410.12896
Yakun Zhu, Shaohang Wei, Xu Wang, Kui Xue, Xiaofan Zhang, Shaoting Zhang, 17 Oct 2024, MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling, https://arxiv.org/abs/2410.13610
Elias Lumer, Vamse Kumar Subbiah, James A. Burke, Pradeep Honaganahalli Basavaraju, Austin Huber, 22 Oct 2024 (v2), Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases, https://arxiv.org/abs/2410.14594
A. Singh, A. Ehtesham, S. Kumar and T. T. Khoei, "Enhancing AI Systems with Agentic Workflows Patterns in Large Language Model," 2024 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2024, pp. 527-532, doi: 10.1109/AIIoT61789.2024.10578990. https://ieeexplore.ieee.org/abstract/document/10578990
Chawla, Chhavi; Chatterjee, Siddharth; Gadadinni, Sanketh Siddanna; Verma, Pulkit; Banerjee, Sourav, 2024, Agentic AI: The building blocks of sophisticated AI business applications, Journal of AI, Robotics & Workplace Automation, Volume 3 / Number 3 / Summer 2024, pp. 1-15(15), Henry Stewart Publications, DOI: https://doi.org/10.69554/XEHZ1946 https://www.ingentaconnect.com/content/hsp/airwa/2024/00000003/00000003/art00001
Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, Liuyi Yao, Hongyi Peng, Zeyu Zhang, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, Jingren Zhou, 20 May 2024 (v2), AgentScope: A Flexible yet Robust Multi-Agent Platform, https://arxiv.org/abs/2402.14034 https://github.com/modelscope/agentscope
Michael Nuñez, November 4, 2024, UC San Diego, Tsinghua University researchers just made AI way better at knowing when to ask for help, https://venturebeat.com/ai/uc-san-diego-tsinghua-university-researchers-just-made-ai-way-better-at-knowing-when-to-ask-for-help/
Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar, 14 Apr 2024, Towards Practical Tool Usage for Continually Learning LLMs, https://arxiv.org/abs/2404.09339
Amy Marks, Jun 11, 2024, Clarifying Function Calling / Tool Use in LLMs, https://medium.com/@aevalone/clarifying-function-calling-tool-use-in-llms-6511af510f99
Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu, 1 Nov 2024, Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation, https://arxiv.org/abs/2411.00412
Anthropic, 26 Nov 2024, Introducing the Model Context Protocol, https://www.anthropic.com/news/model-context-protocol
Varatheepan Paramanayakam, Andreas Karatzas, Iraklis Anagnostopoulos, Dimitrios Stamoulis, 23 Nov 2024, Less is More: Optimizing Function Calling for LLM Execution on Edge Devices, https://arxiv.org/abs/2411.15399
Soh, J., Singh, P. (2024). Semantic Kernel, Plugins, and Function Calling. In: Data Science Solutions on Azure. Apress, Berkeley, CA. https://doi.org/10.1007/979-8-8688-0914-9_7 https://link.springer.com/chapter/10.1007/979-8-8688-0914-9_7
Chris Sypherd, Vaishak Belle, 5 Dec 2024, Practical Considerations for Agentic LLM Systems, https://arxiv.org/abs/2412.04093
Zhi-Yuan Chen, Shiqi Shen, Guangyao Shen, Gong Zhi, Xu Chen, and Yankai Lin. 2024. Towards Tool Use Alignment of Large Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1382–1400, Miami, Florida, USA. Association for Computational Linguistics. https://aclanthology.org/2024.emnlp-main.82/ https://aclanthology.org/2024.emnlp-main.82.pdf
Damien de Mijolla, Wen Yang, Philippa Duckett, Christopher Frye, Mark Worrall, 8 Dec 2024, Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt, https://arxiv.org/abs/2412.05967
In Gim, Seung-seob Lee, Lin Zhong, 9 Dec 2024, Asynchronous LLM Function Calling, https://arxiv.org/abs/2412.07017 (Overlap LLM computations and tool execution.)
Outlore, Dec 14, 2024, Reflections on building with Model Context Protocol (MCP), https://outlore.dev/blog/model-context-protocol/
Andrew Zuo, Dec 13, 2024, AI Assistants Are Going To Get Really Good, https://andrewzuo.com/ai-assistants-are-going-to-get-really-good-d6e6a026e588
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, Wenhui Zhu, Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen, 18 Dec 2024, Deploying Foundation Model Powered Agent Services: A Survey, https://arxiv.org/abs/2412.13437 (A survey of not just deployment, but many inference optimization techniques.)
Qwen: An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tingyu Xia, Xingzhang Ren, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yu Wan, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zihan Qiu (additional authors not shown), 19 Dec 2024, Qwen2.5 Technical Report, https://arxiv.org/abs/2412.15115
Dian Yu, Yuheng Zhang, Jiahao Xu, Tian Liang, Linfeng Song, Zhaopeng Tu, Haitao Mi, Dong Yu, 22 Dec 2024, Teaching LLMs to Refine with Tools, https://arxiv.org/abs/2412.16871
Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
Florian Dietz, Dietrich Klakow, 1 Jan 2025, IGC: Integrating a Gated Calculator into an LLM to Solve Arithmetic Tasks Reliably and Efficiently, https://arxiv.org/abs/2501.00684
Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic, Sep 2024, Agents, Google Whitepaper, https://www.kaggle.com/whitepaper-agents
S. Song et al., 2025, "How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model," in IEEE Transactions on Knowledge and Data Engineering, doi: 10.1109/TKDE.2025.3527978. https://ieeexplore.ieee.org/abstract/document/10841938/
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen, 21 Feb 2024 (v4), ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving, https://arxiv.org/abs/2309.17452
Bohan Lyu, Xin Cong, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun, 28 Dec 2023, GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension, https://arxiv.org/abs/2312.17294
Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
Xinzhe Li, Jan 2025, A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning, Proceedings of the 31st International Conference on Computational Linguistics, pages 9760–9779, January 19–24, 2025. ©2025 Association for Computational Linguistics, https://aclanthology.org/2025.coling-main.652.pdf https://github.com/xinzhel/LLM-Agent-Survey
Connor Shorten, Charles Pierse, Thomas Benjamin Smith, Karel D'Oosterlinck, Tuana Celik, Erika Cardenas, Leonie Monigatti, Mohd Shukri Hasan, Edward Schmuhl, Daniel Williams, Aravind Kesiraju, Bob van Luijt, 23 Jan 2025, Querying Databases with Function Calling, https://arxiv.org/abs/2502.00032
Jiali Cheng, Hadi Amiri, 3 Feb 2025. Tool Unlearning for Tool-Augmented LLMs, https://arxiv.org/abs/2502.01083 (Unlearning theory applied to tool usage.)
Wenjun Li, Dexun Li, Kuicai Dong, Cong Zhang, Hao Zhang, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Liu, 18 Feb 2025, Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger, https://arxiv.org/abs/2502.12961 (Examining the decision whether or not to launch a tool, and the inefficiency of non-needed tool calls.)
C Winston, R Just, Feb 2025, A Taxonomy of Failures in Tool-Augmented LLMs, https://homes.cs.washington.edu/~rjust/publ/tallm_testing_ast_2025.pdf
Xuan Zhang, Yongliang Shen, Zhe Zheng, Linjuan Wu, Wenqi Zhang, Yuchen Yan, Qiuying Peng, Jun Wang, Weiming Lu, 3 Mar 2025, AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification, https://arxiv.org/abs/2503.01940
Hongshen Xu, Zihan Wang, Zichen Zhu, Lei Pan, Xingyu Chen, Lu Chen, Kai Yu, 9 Mar 2025, Alignment for Efficient Tool Calling of Large Language Models, https://arxiv.org/abs/2503.06708
Anthropic, 14 Mar 2025, Token-saving updates on the Anthropic API, https://www.anthropic.com/news/token-saving-updates (Prompt caching, excluding cached responses from rate limits, and token-efficient tool calling.)
Mengsong Wu, Tong Zhu, Han Han, Xiang Zhang, Wenbiao Shao, Wenliang Chen, 21 Mar 2025, Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models, https://arxiv.org/abs/2503.16779 https://github.com/fairyshine/Chain-of-Tools
Ali Forootani, 22 Mar 2025, A Survey on Mathematical Reasoning and Optimization with Large Language Models, https://arxiv.org/abs/2503.17726
Aiyao He, Sijia Cui, Shuai Xu, Yanna Wang, Bo Xu, 13 May 2025, TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers, https://arxiv.org/abs/2505.08402
Xu Huang, Yuefeng Huang, Weiwen Liu, Xingshan Zeng, Yasheng Wang, Ruiming Tang, Hong Xie, Defu Lian, 7 May 2025, Advancing and Benchmarking Personalized Tool Invocation for LLMs, https://arxiv.org/abs/2505.04072 https://github.com/hyfshadow/PTBench
Wang et al., 2025, Function Calling in Large Language Models: Industrial Practices, Challenges, and Future Directions, https://openreview.net/pdf?id=LNxVGPedFW
Cameron R. Wolfe, Ph.D., Jun 09, 2025, AI Agents from First Principles: Understanding AI agents by building upon the most basic concepts of LLMs, https://cameronrwolfe.substack.com/p/ai-agents
Beong-woo Kwak, Minju Kim, Dongha Lim, Hyungjoo Chae, Dongjin Kang, Sunghwan Kim, Dongil Yang, Jinyoung Yeo, 29 May 2025, ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions, https://arxiv.org/abs/2505.23662 https://github.com/bwookwak/ToolHaystack
Dex Horthy, June 2025 (accessed), 12-Factor Agents - Principles for building reliable LLM applications, https://github.com/humanlayer/12-factor-agents?tab=readme-ov-file
Bohan Yao, Vikas Yadav, 25 Jul 2025, A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation, https://arxiv.org/abs/2507.18973 (Launch multiple tools and aggregate the results)
Bin Wu, Edgar Meij, Emine Yilmaz, Aug 2025, A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents, Findings of the Association for Computational Linguistics: ACL 2025, pages 22361–22373 July 27- August 1, 2025, https://aclanthology.org/2025.findings-acl.1149.pdf
Zijing Zhang, Zhanpeng Chen, He Zhu, Ziyang Chen, Nan Du, Xiaolong Li, Aug 2025, ToolExpNet: Optimizing Multi-Tool Selection in LLMs with Similarity and Dependency-Aware Experience Networks, Findings of the Association for Computational Linguistics: ACL 2025, pages 15706–15722 July 27- August 1, 2025, https://aclanthology.org/2025.findings-acl.811.pdf
Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, Chenlin Zhou, Jiayi Mao, Tianze Xia, Jiafeng Guo, Shenghua Liu, 21 Jul 2025 (v2), A Survey of Context Engineering for Large Language Models, https://arxiv.org/abs/2507.13334
Xu, W., Huang, C., Gao, S. et al. LLM-Based Agents for Tool Learning: A Survey. Data Sci. Eng. (2025). https://doi.org/10.1007/s41019-025-00296-9 https://link.springer.com/article/10.1007/s41019-025-00296-9
Yan Jiang, Hao Zhou, LiZhong GU, Ai Han, TianLong Li, 24 Jun 2025, NaviAgent: Bilevel Planning on Tool Dependency Graphs for Function Calling, https://arxiv.org/abs/2506.19500
Xiaoyu Tan, Bin Li, Xihe Qiu, Chao Qu, Wei Chu, Yinghui Xu, and Yuan Qi. 2025. Meta-Agent-Workflow: Streamlining Tool Usage in LLMs through Workflow Construction, Retrieval, and Refinement. In Companion Proceedings of the ACM on Web Conference 2025 (WWW '25). Association for Computing Machinery, New York, NY, USA, 458–467. https://doi.org/10.1145/3701716.3715247 https://dl.acm.org/doi/abs/10.1145/3701716.3715247
Sebastian Nicolas Müller, May 23, 2025, Infinite tool use, https://snimu.github.io/2025/05/23/infinite-tool-use.html
J Vigel, R Cai, ECA Neema, A Liao, K Zhu, S O'Brien, 2025, Self Knowledge-Tracing for Tool Use (SKT-Tool): Helping LLM Agents Understand Their Capabilities in Tool Use, NAACL2025, The 5th Workshop on Insights from Negative Results in NLP, https://aclanthology.org/anthology-files/anthology-files/pdf/insights/2025.insights-1.pdf#page=155
Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji, 16 Apr 2025, ToolRL: Reward is All Tool Learning Needs, https://arxiv.org/abs/2504.13958
Geoffrey Huntley, AGENT.md: The Universal Agent Configuration File, July 2025 Request for Comments, https://ampcode.com/AGENT.md
Ningning Wang, Xavier Hu, Pai Liu, He Zhu, Yue Hou, Heyuan Huang, Shengyu Zhang, Jian Yang, Jiaheng Liu, Ge Zhang, Changwang Zhang, Jun Wang, Yuchen Eleanor Jiang, Wangchunshu Zhou, 24 Jul 2025, Efficient Agents: Building Effective Agents While Reducing Cost, https://arxiv.org/pdf/2508.02694 https://github.com/OPPO-PersonalAI/OAgents
Peter Wildeford, Aug 08, 2025, GPT-5: a small step for intelligence, a giant leap for normal people: GPT-5 focuses on where the money is - everyday users, not AI elites, https://peterwildeford.substack.com/p/gpt-5-a-small-step-for-intelligence
Anoop Kotha, Julian Lee, Eric Zakariasson, OpenAI, Aug 2025, GPT-5 prompting guide, https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide
Will Fein, Ryan J. Horwitz, John E. Brown III, Amit Misra, Felipe Oviedo, Kevin White, Juan M. Lavista Ferres, Samuel K. Wasser, 13 Aug 2025, AI-Driven Detection and Analysis of Handwriting on Seized Ivory: A Tool to Uncover Criminal Networks in the Illicit Wildlife Trade, https://arxiv.org/abs/2508.10219
Muhammad Ahmad, Fida Ullah, Muhammad Usman, Ildar Batyrshin, Grigori Sidorov, 12 Aug 2025, SABIA: An AI-Powered Tool for Detecting Opioid-Related Behaviors on Social Media, https://arxiv.org/abs/2508.10046
Xingshan Zeng, Weiwen Liu, Xu Huang, Zezhong Wang, Lingzhi Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Ruiming Tang, Qun Liu, 14 Aug 2025, ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool Learning, https://arxiv.org/abs/2504.01400
Athanasios Davvetas, Xenia Ziouvelou, Ypatia Dami, Alexis Kaponis, Konstantina Giouvanopoulou, Michael Papademas, 23 Jul 2025, TAI Scan Tool: A RAG-Based Tool With Minimalistic Input for Trustworthy AI Self-Assessment, https://arxiv.org/abs/2507.17514
Zhao Song, Song Yue, Jiahao Zhang, 23 Jul 2025, Thinking Isn't an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations, https://arxiv.org/abs/2507.17699
Arduin Findeis, Floris Weers, Guoli Yin, Ke Ye, Ruoming Pang, Tom Gunter, 22 Jul 2025, Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?, https://arxiv.org/abs/2507.17015
Po-Yen Wu, Cheng-Yu Kuo, Yuki Kadokawa, and Takamitsu Matsubara, 23 Jul 2025, Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning, https://arxiv.org/abs/2507.17275
Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo, Jiahao Li, Bin Li, Houqiang Li, Yan Lu, 23 Jul 2025, Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding, https://arxiv.org/abs/2505.18079
Timothy Tin Long Yu, Mahdi Mostajabdaveh, Jabo Serge Byusa, Rindra Ramamonjison, Giuseppe Carenini, Kun Mao, Zirui Zhou, Yong Zhang, 23 Jul 2025, SMARTAPS: Tool-augmented LLMs for Operations Management, https://arxiv.org/abs/2507.17927
Alex Liu, Lief Esbenshade, Shawon Sarkar, Victor Tian, Zachary Zhang, Kevin He, Min Sun, 23 Jul 2025, Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale, https://arxiv.org/abs/2507.17985
Haozhe Wang, Long Li, Chao Qu, Fengming Zhu, Weidi Xu, Wei Chu, Fangzhen Lin, 18 Jul 2025, To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization, https://arxiv.org/abs/2502.00691
Jiale Liu, Huan Wang, Yue Zhang, Xiaoyu Luo, Jiaxiang Hu, Zhiliang Liu, Min Xie, 20 Jul 2025, InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis, https://arxiv.org/abs/2507.14899
Richard M. Charles, James H. Curry and Richard B. Charles, 15 Jul 2025, Mitigating Trojanized Prompt Chains in Educational LLM Use Cases: Experimental Findings and Detection Tool Design, https://arxiv.org/abs/2507.14207
Qian Xiong, Yuekai Huang, Ziyou Jiang, Zhiyuan Chang, Yujia Zheng, Tianhao Li, Mingyang Li, 21 Jul 2025, Butterfly Effects in Toolchains: A Comprehensive Analysis of Failed Parameter Filling in LLM Tool-Agent Systems, https://arxiv.org/abs/2507.15296
Jubin Abhishek Soni, Amit Anand, Rajesh Kumar Pandey, Aniket Abhishek Soni, 19 Jul 2025, Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation, https://arxiv.org/abs/2506.11092
Shiqing Fan, Xichen Ding, Liang Zhang, Linjian Mo, 11 Aug 2025, MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark, https://arxiv.org/abs/2508.07575
Wenpeng Xing, Zhipeng Chen, Changting Lin, Meng Han, 11 Aug 2025, HGMF: A Hierarchical Gaussian Mixture Framework for Scalable Tool Invocation within the Model Context Protocol, https://arxiv.org/abs/2508.07602
Luyao Zhuang, Qinggang Zhang, Huachi Zhou, Juhua Liu, Qing Li, Xiao Huang, 11 Aug 2025, LoSemB: Logic-Guided Semantic Bridging for Inductive Tool Retrieval, https://arxiv.org/abs/2508.07690
Keyan Ding, Jing Yu, Junjie Huang, Yuchen Yang, Qiang Zhang, Huajun Chen, 27 Jul 2025, SciToolAgent: A Knowledge Graph-Driven Scientific Agent for Multi-Tool Integration, https://arxiv.org/abs/2507.20280
Vicente Ramos, Sundous Hussein, Mohamed Abdel-Hafiz, Arunangshu Sarkar, Weixuan Liu, Katerina J. Kechris, Russell P. Bowler, Leslie Lange, Farnoush Banaei-Kashani, 27 Jul 2025, BioNeuralNet: A Graph Neural Network based Multi-Omics Network Data Analysis Tool, https://arxiv.org/abs/2507.20440
Nicola Croce, Tobin South, 26 Jul 2025, Trivial Trojans: How Minimal MCP Servers Enable Cross-Tool Exfiltration of Sensitive Data, https://arxiv.org/abs/2507.19880
Harsh Purohit, Tomoya Nishida, Kota Dohi, Takashi Endo, and Yohei Kawaguchi, 28 Jul 2025, MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection, https://arxiv.org/abs/2507.20666
Nicholas Botti (Federal Reserve Board), Flora Haberkorn (Federal Reserve Board), Charlotte Hoopes (Federal Reserve Board), Shaun Khan (Federal Reserve Board), 28 Jul 2025, Efficacy of AI RAG Tools for Complex Information Extraction and Data Annotation Tasks: A Case Study Using Banks Public Disclosures, https://arxiv.org/abs/2507.21360
Sergio Rojas-Galeano, 26 Jun 2025, Tool or Trouble? Exploring Student Attitudes Toward AI Coding Assistants, https://arxiv.org/abs/2507.22900
Yiya Diao, Changhe Li, Sanyou Zeng, Xinye Cai, Wenjian Luo, Shengxiang Yang, and Carlos A. Coello Coello, 30 Jul 2025, Nearest-Better Network for Visualizing and Analyzing Combinatorial Optimization Problems: A Unified Tool, https://arxiv.org/abs/2507.22440
Hongjin Qian, Zheng Liu, 1 Aug 2025, MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning, https://arxiv.org/abs/2508.00271
Amur Ghose, Andrew B. Kahng, Sayak Kundu, and Zhiang Wang, 1 Aug 2025, ORFS-agent: Tool-Using Agents for Chip Design Optimization, https://arxiv.org/abs/2506.08332
Eric Hirsch and Christian Friedrich, 31 Jul 2025, Data-driven tool wear prediction in milling, based on a process-integrated single-sensor approach, https://arxiv.org/abs/2412.19950
Michelle S. Lam, Fred Hohman, Dominik Moritz, Jeffrey P. Bigham, Kenneth Holstein, Mary Beth Kery, 1 Aug 2025, Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors, https://arxiv.org/abs/2409.18203
Guozhao Mo, Wenliang Zhong, Jiawei Chen, Xuanang Chen, Yaojie Lu, Hongyu Lin, Ben He, Xianpei Han, Le Sun, 3 Aug 2025, LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?, https://arxiv.org/abs/2508.01780
Kanghua Mo, Li Hu, Yucheng Long, Zhihao Li, 4 Aug 2025, Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools, https://arxiv.org/abs/2508.02110
Nys Tjade Siegel, James H. Cole, Mohamad Habes, Stefan Haufe, Kerstin Ritter, Marc-André Schulz, 4 Aug 2025, Explainable AI Methods for Neuroimaging: Systematic Failures of Common Tools, the Need for Domain-Specific Validation, and a Proposal for Safe Application, https://arxiv.org/abs/2508.02560
Ashutosh Hathidara, Julien Yu, Sebastian Schreiber, 4 Aug 2025, Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky, https://arxiv.org/abs/2507.03336
Sunil Kumar, Bowen Zhao, Leo Dirac, Paulina Varshavskaya, 2 Aug 2025, Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints, https://arxiv.org/abs/2506.14821
Peng Ding, Rick Stevens, 5 Aug 2025, Unified Tool Integration for LLMs: A Protocol-Agnostic Approach to Function Calling, https://arxiv.org/abs/2508.02979
Zikun Cui, Tianyi Huang, Chia-En Chiang, Cuiqianhe Du, 5 Aug 2025, Toward Verifiable Misinformation Detection: A Multi-Tool LLM Agent Framework, https://arxiv.org/abs/2508.03092
Shaofeng Yin, Ting Lei, Yang Liu, 5 Aug 2025, ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools, https://arxiv.org/abs/2508.03284
Khaled Bachir Delassi, Lakhdar Zeggane, Hadda Cherroun, Abdelhamid Haouhat, Kaoutar Bouzouad, 5 Aug 2025, VQA support to Arabic Language Learning Educational Tool, https://arxiv.org/abs/2508.03488
Zexiong Ma, Chao Peng, Qunhong Zeng, Pengfei Gao, Yanzhen Zou, Bing Xie, 5 Aug 2025, Tool-integrated Reinforcement Learning for Repo Deep Search, https://arxiv.org/abs/2508.03012
Peng Ding, 11 Jul 2025, ToolRegistry: A Protocol-Agnostic Tool Management Library for Function-Calling LLMs, https://arxiv.org/abs/2507.10593
Si Chen, Izzy Molnar, Ting Hua, Peiyu Li, Le Huy Khiem, G. Alex Ambrose, Jim Lang, Ronald Metoyer, Nitesh V. Chawla, 6 Aug 2025, SimInstruct: A Responsible Tool for Collecting Scaffolding Dialogues Between Experts and LLM-Simulated Novices, https://arxiv.org/abs/2508.04428
Manuela Schuler, 6 Aug 2025, A Visual Tool for Interactive Model Explanation using Sensitivity Analysis, https://arxiv.org/abs/2508.04269
Zhejun Zhao, Yuehu Dong, Alley Liu, Lixue Zheng, Pingsheng Liu, Dongdong Shen, Long Xia, Jiashu Zhao, Dawei Yin, 6 Aug 2025, TURA: Tool-Augmented Unified Retrieval Agent for AI Search, https://arxiv.org/abs/2508.04604
Natalia Echeverry and Arun Lekshmi Narayanan, 6 Aug 2025, How are CS students using resources and AI tools for coding tasks?, https://arxiv.org/abs/2508.04667
Rafael Salinas-Buestan, Otto Parra, Nelly Condori-Fernandez, Maria Fernanda Granda, 22 Jul 2025, Evaluating Generative AI Tools for Personalized Offline Recommendations: A Comparative Study, https://arxiv.org/abs/2508.03710
Linfeng Gao, Yaoxiang Wang, Minlong Peng, Jialong Tang, Yuzhe Shang, Mingming Sun, Jinsong Su, 7 Aug 2025, Tool Graph Retriever: Exploring Dependency Graph-based Tool Retrieval for Large Language Models, https://arxiv.org/abs/2508.05152
Hannah-Beth Clark, Laura Benton, Emma Searle, Margaux Dowland, Matthew Gregory, Will Gayne and John Roberts, 7 Aug 2025, Building Effective Safety Guardrails in AI Education Tools, https://arxiv.org/abs/2508.05360
Sahil Bansal, Sai Shruthi Sistla, Aarti Arikatala, Sebastian Schreiber, 7 Aug 2025, Planning Agents on an Ego-Trip: Leveraging Hybrid Ego-Graph Ensembles for Improved Tool Retrieval in Enterprise Task Planning, https://arxiv.org/abs/2508.05888
Chandler Campbell, Bernie Boscoe, Tuan Do, 25 Jul 2025, AquiLLM: a RAG Tool for Capturing Tacit Knowledge in Research Groups, https://arxiv.org/abs/2508.05648
Jiaxuan Liang, Shide Zhou, and Kailong Wang, 26 Jul 2025, OmniBench-RAG: A Multi-Domain Evaluation Platform for Retrieval-Augmented Generation Tools, https://arxiv.org/abs/2508.05650
Xianghe Pang, Shuo Tang, Rui Ye, Yuwen Du, Yaxin Du, Siheng Chen, 12 Aug 2025, BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair, https://arxiv.org/abs/2508.09129
Junjie Ye, Changhao Jiang, Zhengyin Du, Yufei Xu, Xuesong Yao, Zhiheng Xi, Xiaoran Fan, Qi Zhang, Xuanjing Huang, Jiecao Chen, 12 Aug 2025, Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments, https://arxiv.org/abs/2508.08791
Jiawei Zhou, Amy Z. Chen, Darshi Shah, Laura M. Schwab Reese, and Munmun De Choudhury, 11 Aug 2025, A Risk Taxonomy and Reflection Tool for Large Language Model Adoption in Public Health, https://arxiv.org/abs/2411.02594
Yanming Liu, Xinyue Peng, Jiannan Cao, Yuwei Zhang, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du, 15 Aug 2025, Tool-Planner: Task Planning with Clusters across Multiple Tools, https://arxiv.org/abs/2406.03807
Wenjie Chen, Wenbin Li, Di Yao, Xuying Meng, Chang Gong, Jingping Bi, 18 Aug 2025, GTool: Graph Enhanced Tool Planning with Large Language Model, https://arxiv.org/abs/2508.12725
Guangfu Hao, Haojie Wen, Liangxuan Guo, Yang Chen, Yanchao Bi, Shan Yu, 18 Aug 2025, Flexible Tool Selection through Low-dimensional Attribute Alignment of Vision and Language, https://arxiv.org/abs/2505.22146
Chao Tang, Anxing Xiao, Yuhong Deng, Tianrun Hu, Wenlong Dong, Hanbo Zhang, David Hsu, Hong Zhang, 19 Aug 2025, MimicFunc: Imitating Tool Manipulation from a Single Human Video via Functional Correspondence, https://arxiv.org/abs/2508.13534
Wenxin Jiang, Mingyu Kim, Chingwo Cheung, Heesoo Kim, George K. Thiruvathukal, James C. Davis, 18 Aug 2025, "I see models being a whole other thing": An Empirical Study of Pre-Trained Model Naming Conventions and A Tool for Enhancing Naming Consistency, https://arxiv.org/abs/2310.01642
Zhongzhou Chen, 20 Aug 2025, Reliable generation of isomorphic physics problems using ChatGPT with prompt-chaining and tool use, https://arxiv.org/abs/2508.14755
Lixiang Yan, 20 Aug 2025, From Passive Tool to Socio-cognitive Teammate: A Conceptual Framework for Agentic AI in Human-AI Collaborative Learning, https://arxiv.org/abs/2508.14825
Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji, 21 Aug 2025, IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents, https://arxiv.org/abs/2508.15310
Yufeng Zhao, Junnan Liu, Hongwei Liu, Dongsheng Zhu, Yuan Shen, Songyang Zhang, Kai Chen, 21 Aug 2025, Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis, https://arxiv.org/abs/2508.15754
Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Haohua Du, Xiangyang Li, 19 Aug 2025, MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers, https://arxiv.org/abs/2508.14925
Vishnou Vinayagame, Gregory Senay, and Luis Martí, 20 Aug 2025, MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications, https://arxiv.org/abs/2411.18915
Fei Lei, Yibo Yang, Wenxiu Sun, Dahua Lin, 22 Aug 2025, MCPVerse: An Expansive, Real-World Benchmark for Agentic Tool Use, https://arxiv.org/abs/2508.16260
Eduardo de Conto, Blaise Genest, Arvind Easwaran, Nicholas Ng, Shweta Menon, 25 Aug 2025, DesCartes Builder: A Tool to Develop Machine-Learning Based Digital Twins, https://arxiv.org/abs/2508.17988
Thao Le, Tim Miller, Ruihan Zhang, Liz Sonenberg, Ronal Singh, 25 Aug 2025, Visual Evaluative AI: A Hypothesis-Driven Tool with Concept-Based Explanations and Weight of Evidence, https://arxiv.org/abs/2407.04710
Bingguang Hao, Maolin Wang, Zengzhuang Xu, Yicheng Chen, Cunyin Peng, Jinjie GU, Chenyi Zhuang, 7 Aug 2025, Exploring Superior Function Calls via Reinforcement Learning, https://arxiv.org/abs/2508.05118

Tool-Augmented Language Models (TALM)

Research papers on TALM:

Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo, 4 Jun 2024 (v2), Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution, https://arxiv.org/abs/2406.00059 Code: https://github.com/conveyor-sys/conveyor (Speeding up inference by partially running tools in parallel with LLM query processing, rather than sequentially after the LLM request. Tool requests are detected deep inside the decoding algorithm and started immediately, before the LLM has finished generating the fully decoded output.)
Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen, 21 Feb 2024 (v2), SciAgent: Tool-augmented Language Models for Scientific Reasoning, https://arxiv.org/abs/2402.11451
Aaron Parisi, Yao Zhao, Noah Fiedel, 2022, TALM: Tool Augmented Language Models, arXiv preprint arXiv:2205.12255, https://arxiv.org/abs/2205.12255
Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis, 7 May 2024, An LLM-Tool Compiler for Fused Parallel Function Calling, https://arxiv.org/abs/2405.17438
Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, 2024, INFERCEPT: Efficient Intercept Support for Augmented Large Language Model Inference, https://openreview.net/pdf?id=wDDGQabYPQ
Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-yan Liu, 6 Jul 2023 (v2), A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond, https://arxiv.org/pdf/2204.09269.pdf
Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, July 2024, InferCept: Efficient Intercept Support for Augmented Large Language Model Inference, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:81-95, 2024, https://proceedings.mlr.press/v235/abhyankar24a.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/abhyankar24a/abhyankar24a.pdf
Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia, 27 Aug 2024, Text2SQL is Not Enough: Unifying AI and Databases with TAG, https://arxiv.org/abs/2408.14717 https://github.com/TAG-Research/TAG-Bench
Yaroslav Zharov, Yury Khudyakov, Evgeniia Fedotova, Evgeny Grigorenko, Egor Bogomolov, 18 Feb 2024, Tool-Augmented LLMs as a Universal Interface for IDEs, https://arxiv.org/abs/2402.11635
Amy Marks, Jun 11, 2024, Clarifying Function Calling / Tool Use in LLMs, https://medium.com/@aevalone/clarifying-function-calling-tool-use-in-llms-6511af510f99
Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu, 1 Nov 2024, Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation, https://arxiv.org/abs/2411.00412
Gaya Mehenni, Amal Zouaq, 23 Nov 2024, Ontology-Constrained Generation of Domain-Specific Clinical Summaries, https://arxiv.org/abs/2411.15666
Damien de Mijolla, Wen Yang, Philippa Duckett, Christopher Frye, Mark Worrall, 8 Dec 2024, Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt, https://arxiv.org/abs/2412.05967
Vincent-Pierre Berges, Barlas Oguz, December 12, 2024, Memory Layers at Scale, Meta, https://ai.meta.com/research/publications/memory-layers-at-scale/ https://github.com/facebookresearch/memory (Augmentation of an LLM with an additional key-value associative memory, by replacing some FFNs with a "memory layer".)
Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
Xinyu Pang, Ruixin Hong, Zhanke Zhou, Fangrui Lv, Xinwei Yang, Zhilong Liang, Bo Han, Changshui Zhang, 18 Dec 2024, Physics Reasoner: Knowledge-Augmented Reasoning for Solving Physics Problems with Large Language Models, https://arxiv.org/abs/2412.13791 (Augmented reasoning by retrieving physics formulas, checklists, and other relevant information.)
Dian Yu, Yuheng Zhang, Jiahao Xu, Tian Liang, Linfeng Song, Zhaopeng Tu, Haitao Mi, Dong Yu, 22 Dec 2024, Teaching LLMs to Refine with Tools, https://arxiv.org/abs/2412.16871
Muhayy Ud Din, Jan Rosell, Waseem Akram, Isiah Zaplana, Maximo A Roa, Lakmal Seneviratne, Irfan Hussain, 10 Dec 2024, Ontology-driven Prompt Tuning for LLM-based Task and Motion Planning, https://arxiv.org/abs/2412.07493 https://muhayyuddin.github.io/llm-tamp/ (Detecting objects in the prompt text and then using a RALM algorithm to query an ontology database.)
Oleksandr Palagin, Vladislav Kaverinskiy, Anna Litvin, Kyrylo Malakhov, 11 Jul 2023, OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning, International Journal of Computing, 22(2), 170-183, https://arxiv.org/abs/2307.05082 https://doi.org/10.47839/ijc.22.2.3086 https://computingonline.net/computing/article/view/3086
Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
Florian Dietz, Dietrich Klakow, 1 Jan 2025, IGC: Integrating a Gated Calculator into an LLM to Solve Arithmetic Tasks Reliably and Efficiently, https://arxiv.org/abs/2501.00684
Alhassan Mumuni, Fuseini Mumuni, 6 Jan 2025, Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches, https://arxiv.org/abs/2501.03151
Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
Julian Perry, Surasakdi Siripong, Thanakorn Phonchai, 15 Jan 2025, Dynamic Knowledge Integration for Enhanced Vision-Language Reasoning, https://arxiv.org/abs/2501.08597 (Augment training data dynamically by retrieving extra information.)
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen, 21 Feb 2024 (v4), ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving, https://arxiv.org/abs/2309.17452
Xinyuan Lu, Liangming Pan, Yubo Ma, Preslav Nakov, Min-Yen Kan, 18 Sep 2024, TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning, https://arxiv.org/abs/2409.11724 https://github.com/XinyuanLu00/TART
Jianfeng Pan, Senyou Deng, Shaomang Huang, 4 Feb 2025, CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning, https://arxiv.org/abs/2502.02390 (Integrating results from an "associative memory" in CoT reasoning paths at inference time.)
Ling Yang, Zhaochen Yu, Bin Cui, Mengdi Wang, 10 Feb 2025, ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates, https://arxiv.org/abs/2502.06772 https://github.com/Gen-Verse/ReasonFlux (RALM-like retrieval of reasoning prompt templates at inference time.)
Sam Lin, Wenyue Hua, Lingyao Li, Zhenting Wang, Yongfeng Zhang, 17 Feb 2025, ADO: Automatic Data Optimization for Inputs in LLM Prompts, https://arxiv.org/pdf/2502.11436 (Reformulating the input context, such as by semantic marking of relevant content or formatting changes.)
Andrew Neeser, Kaylen Latimer, Aadyant Khatri, Chris Latimer, Naren Ramakrishnan, 16 Feb 2025, QuOTE: Question-Oriented Text Embeddings, https://arxiv.org/abs/2502.10976 (Augmenting RAG chunks with additional information, such as questions the chunk might answer.)
C Winston, R Just, Feb 2025, A Taxonomy of Failures in Tool-Augmented LLMs, https://homes.cs.washington.edu/~rjust/publ/tallm_testing_ast_2025.pdf
Wendi Cui, Jiaxin Zhang, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley A. Malin, Sricharan Kumar, 26 Feb 2025, Automatic Prompt Optimization via Heuristic Search: A Survey, https://arxiv.org/abs/2502.18746 (Survey of auto prompting, from basic LLM enhancements to some methods quite similar to RALM and TALM.)
H. Lu, X. Li, X. Ji, Z. Kan and Q. Hu, "ToolFiVe: Enhancing Tool-Augmented LLMs via Tool Filtering and Verification," ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025, pp. 1-5, doi: 10.1109/ICASSP49660.2025.10887544. https://ieeexplore.ieee.org/abstract/document/10887544/
Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang, 11 Jun 2024, Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees, https://arxiv.org/abs/2406.07115
Aiyao He, Sijia Cui, Shuai Xu, Yanna Wang, Bo Xu, 13 May 2025, TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers, https://arxiv.org/abs/2505.08402
Bohan Yao, Vikas Yadav, 25 Jul 2025, A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation, https://arxiv.org/abs/2507.18973 (Launching multiple tools and aggregating the results.)
Sebe Vanbrabant, Gilles Eerlings, Gustavo Alberto Rovelo Ruiz, and Davy Vanacken. 2025. ECHO: Enhancing Conversational Explainable AI through Tool-Augmented Language Models. Proc. ACM Hum.-Comput. Interact. 9, 4, Article EICS014 (June 2025), 33 pages. https://doi.org/10.1145/3734191 https://dl.acm.org/doi/abs/10.1145/3734191
Zhejun Zhao, Yuehu Dong, Alley Liu, Lixue Zheng, Pingsheng Liu, Dongdong Shen, Long Xia, Jiashu Zhao, Dawei Yin, 6 Aug 2025, TURA: Tool-Augmented Unified Retrieval Agent for AI Search, https://arxiv.org/abs/2508.04604

LLM Screen Access

Yicheng Fu, Raviteja Anantha, Prabal Vashisht, Jianpeng Cheng, Etai Littwin, 6 Sep 2024, UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity, https://www.arxiv.org/abs/2409.04081
V Adrakatti, 2024, Exploring screen summarization with large language and multimodal models, Masters Thesis, University of Illinois Urbana-Champaign, Urbana, Illinois, USA, https://www.ideals.illinois.edu/items/131510
Anthropic, 23 Oct 2024, Developing a computer use model, https://www.anthropic.com/news/developing-computer-use
Anthropic, 23 Oct 2024, Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku, https://www.anthropic.com/news/3-5-models-and-computer-use
Anirban Ghoshal, 23 Oct 2024, How Anthropic’s new ‘computer use’ ability could further AI automation, https://www.cio.com/article/3583260/how-anthropics-new-computer-use-ability-could-further-ai-automation.html
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao, 30 Oct 2024, OS-ATLAS: A Foundation Action Model for Generalist GUI Agents, https://arxiv.org/abs/2410.23218 https://github.com/OS-Copilot/OS-Atlas
Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, Zack Hui, 13 Sep 2024 (v2), Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale, https://arxiv.org/abs/2409.08264
Shuai Wang, Weiwen Liu, Jingxuan Chen, Weinan Gan, Xingshan Zeng, Shuai Yu, Xinlong Hao, Kun Shao, Yasheng Wang, Ruiming Tang, 7 Nov 2024, GUI Agents with Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2411.04890

LLM Computer Usage

Anthropic, 23 Oct 2024, Developing a computer use model, https://www.anthropic.com/news/developing-computer-use
Anthropic, 23 Oct 2024, Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku, https://www.anthropic.com/news/3-5-models-and-computer-use
Anirban Ghoshal, 23 Oct 2024, How Anthropic’s new ‘computer use’ ability could further AI automation, https://www.cio.com/article/3583260/how-anthropics-new-computer-use-ability-could-further-ai-automation.html
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao, 30 Oct 2024, OS-ATLAS: A Foundation Action Model for Generalist GUI Agents, https://arxiv.org/abs/2410.23218 https://github.com/OS-Copilot/OS-Atlas
Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, Zack Hui, 13 Sep 2024 (v2), Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale, https://arxiv.org/abs/2409.08264
Xiao Liu, Bo Qin, Dongzhu Liang, Guang Dong, Hanyu Lai, Hanchen Zhang, Hanlin Zhao, Iat Long Iong, Jiadai Sun, Jiaqi Wang, Junjie Gao, Junjun Shan, Kangning Liu, Shudan Zhang, Shuntian Yao, Siyi Cheng, Wentao Yao, Wenyi Zhao, Xinghan Liu, Xinyi Liu, Xinying Chen, Xinyue Yang, Yang Yang, Yifan Xu, Yu Yang, Yujia Wang, Yulin Xu, Zehan Qi, Yuxiao Dong, Jie Tang, 28 Oct 2024, AutoGLM: Autonomous Foundation Agents for GUIs, https://arxiv.org/abs/2411.00820
Shuai Wang, Weiwen Liu, Jingxuan Chen, Weinan Gan, Xingshan Zeng, Shuai Yu, Xinlong Hao, Kun Shao, Yasheng Wang, Ruiming Tang, 7 Nov 2024, GUI Agents with Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2411.04890
Shirin Ghaffary and Rachel Metz, November 14, 2024, OpenAI Nears Launch of AI Agent Tool to Automate Tasks for Users. The new software, codenamed “Operator,” is set to be released in January. https://www.bloomberg.com/news/articles/2024-11-13/openai-nears-launch-of-ai-agents-to-automate-tasks-for-users
Siyuan Hu, Mingyu Ouyang, Difei Gao, Mike Zheng Shou, 15 Nov 2024, The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use, https://arxiv.org/abs/2411.10323 https://github.com/showlab/computer_use_ootb
Mike Elgan, 22 Nov 2024, AI agents are unlike any technology ever, https://www.computerworld.com/article/3608973/ai-agents-are-unlike-any-technology-ever.html
Show Lab, Nov 2024, ShowUI: ShowUI is a lightweight (2B) vision-language-action model designed for GUI agents. https://huggingface.co/showlab/ShowUI-2B
Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang, 27 Nov 2024, Large Language Model-Brained GUI Agents: A Survey, https://arxiv.org/abs/2411.18279
Zhuosheng Zhang, Aston Zhang, 7 Jun 2024 (v4), You Only Look at Screens: Multimodal Chain-of-Action Agents, https://arxiv.org/abs/2309.11436
Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, Yu Su, 7 Oct 2024, Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents, https://arxiv.org/abs/2410.05243
Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu, 23 Feb 2024 (v2), SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents, https://arxiv.org/abs/2401.10935
Michael Nuñez, November 29, 2024, AI that clicks for you: Microsoft’s research points to the future of GUI automation, https://venturebeat.com/ai/ai-that-clicks-for-you-microsoft-research-points-to-the-future-of-gui-automation/
Yiqin Wang, Haoji Zhang, Jingqi Tian, Yansong Tang, 2 Dec 2024, Ponder & Press: Advancing Visual GUI Agent towards General Computer Control, https://arxiv.org/abs/2412.01268 https://invinciblewyq.github.io/ponder-press-page/
Kyle Wiggers, December 5, 2024, Copilot Vision, Microsoft’s AI tool that can read your screen, launches in preview, https://techcrunch.com/2024/12/05/copilot-vision-microsofts-ai-tool-that-can-read-your-screen-launches-in-preview/
Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, Jie Tang, 2024, CogAgent: A Visual Language Model for GUI Agents, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 14281-14290, https://openaccess.thecvf.com/content/CVPR2024/html/Hong_CogAgent_A_Visual_Language_Model_for_GUI_Agents_CVPR_2024_paper.html
Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan, 8 Apr 2024, Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs, https://arxiv.org/abs/2404.05719
Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun, 16 Jun 2024, GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents, https://arxiv.org/abs/2406.10819
Wentong Chen, Junbo Cui, Jinyi Hu, Yujia Qin, Junjie Fang, Yue Zhao, Chongyi Wang, Jun Liu, Guirong Chen, Yupeng Huo, Yuan Yao, Yankai Lin, Zhiyuan Liu, Maosong Sun, 17 Jun 2024, GUICourse: From General Vision Language Models to Versatile GUI Agents, https://arxiv.org/abs/2406.11317 https://github.com/yiye3/GUICourse
Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong, 5 Dec 2024, Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction, https://arxiv.org/abs/2412.04454 https://aguvis-project.github.io/
AskUI, Dec 2024, AskUI Vision Agent: Automate computer tasks in Python, https://github.com/askui/vision-agent
Huawen Shen, Chang Liu, Gengluo Li, Xinlong Wang, Yu Zhou, Can Ma, Xiangyang Ji, 12 Dec 2024, Falcon-UI: Understanding GUI Before Following User Instructions, https://arxiv.org/abs/2412.09362
Google, Dec 2024, Project Mariner: A research prototype exploring the future of human-agent interaction, starting with your browser, https://deepmind.google/technologies/project-mariner/
Lu Wang, Fangkai Yang, Chaoyun Zhang, Junting Lu, Jiaxu Qian, Shilin He, Pu Zhao, Bo Qiao, Ray Huang, Si Qin, Qisheng Su, Jiayi Ye, Yudi Zhang, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang, 13 Dec 2024, Large Action Models: From Inception to Implementation, https://arxiv.org/abs/2412.10047 https://github.com/microsoft/UFO/tree/main/dataflow https://microsoft.github.io/UFO/dataflow/overview/
Zhiqi Ge, Juncheng Li, Xinglei Pang, Minghe Gao, Kaihang Pan, Wang Lin, Hao Fei, Wenqiao Zhang, Siliang Tang, Yueting Zhuang, 13 Dec 2024, Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining, https://arxiv.org/abs/2412.10342
Jiarun Liu, Jia Hao, Chunhong Zhang, Zheng Hu, 14 Dec 2024, WEPO: Web Element Preference Optimization for LLM-based Web Navigation, https://arxiv.org/abs/2412.10742
Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Thien Huu Nguyen, Trung Bui, Tianyi Zhou, Ryan A. Rossi, Franck Dernoncourt, 18 Dec 2024, GUI Agents: A Survey, https://arxiv.org/abs/2412.13501
Diego Rivas, D Shin, 19 Dec 2024, User Interface for Efficient Control of Autonomous Agent Tasks, https://www.tdcommons.org/cgi/viewcontent.cgi?article=8835&context=dpubs_series
Hao Wen, Shizuo Tian, Borislav Pavlov, Wenjie Du, Yixuan Li, Ge Chang, Shanhui Zhao, Jiacheng Liu, Yunxin Liu, Ya-Qin Zhang, Yuanchun Li, 24 Dec 2024, AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation, https://arxiv.org/abs/2412.18116
Kangjia Zhao, Jiahui Song, Leigang Sha, Haozhan Shen, Zhi Chen, Tiancheng Zhao, Xiubo Liang, Jianwei Yin, 24 Dec 2024, GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent, https://arxiv.org/abs/2412.18426 https://github.com/ZJU-ACES-ISE/ChatUITest
X Hu, T Xiong, B Yi, Z Wei, R Xiao, Y Chen, J Ye, M Tao, Dec 2024, OS Agents: A Survey on MLLM-Based Agents for General Computing Devices Use, https://www.preprints.org/frontend/manuscript/3842b6163d82801988adf663ee18b6d5/download_pub
Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao, Yuhuai Li, Shengze Xu, Shawn Wang, Xinchen Xu, Shuofei Qiao, Kun Kuang, Tieyong Zeng, Liang Wang, Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang, Fei Wu, Dec 2024, OS Agents: A Survey on MLLM-Based Agents for General Computing Devices Use, https://www.preprints.org/manuscript/202412.2294/v1
Gautier Dagan, Frank Keller, Alex Lascarides, 30 Dec 2024, Plancraft: an evaluation dataset for planning with LLM agents, https://arxiv.org/abs/2412.21033
Yuxiang Chai, Hanhao Li, Jiayu Zhang, Liang Liu, Guozhi Wang, Shuai Ren, Siyuan Huang, Hongsheng Li, 2 Jan 2025, A3: Android Agent Arena for Mobile GUI Agents, https://arxiv.org/abs/2501.01149 https://yuxiangchai.github.io/Android-Agent-Arena/
Dezhi Ran, Mengzhou Wu, Hao Yu, Yuetong Li, Jun Ren, Yuan Cao, Xia Zeng, Haochuan Lu, Zexin Xu, Mengqian Xu, Ting Su, Liangchao Yao, Ting Xiong, Wei Yang, Yuetang Deng, Assaf Marron, David Harel, Tao Xie, 6 Jan 2025, Beyond Pass or Fail: A Multi-dimensional Benchmark for Mobile UI Navigation, https://arxiv.org/abs/2501.02863
Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, Fei Wu, 8 Jan 2025, InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection, https://arxiv.org/abs/2501.04575
Taryn Plumb, January 22, 2025, ByteDance’s UI-TARS can take over your computer, outperforms GPT-4o and Claude, https://venturebeat.com/ai/bytedances-ui-tars-can-take-over-your-computer-outperforms-gpt-4o-and-claude/
Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, Wanjun Zhong, Kuanye Li, Jiale Yang, Yu Miao, Woyu Lin, Longxiang Liu, Xu Jiang, Qianli Ma, Jingyu Li, Xiaojun Xiao, Kai Cai, Chuang Li, Yaowei Zheng, Chaolin Jin, Chen Li, Xiao Zhou, Minchao Wang, Haoli Chen, Zhaojian Li, Haihua Yang, Haifeng Liu, Feng Lin, Tao Peng, Xin Liu, Guang Shi, 21 Jan 2025, UI-TARS: Pioneering Automated GUI Interaction with Native Agents, https://arxiv.org/abs/2501.12326
Maxwell Zeff, January 23, 2025, OpenAI launches Operator, an AI agent that performs tasks autonomously, https://techcrunch.com/2025/01/23/openai-launches-operator-an-ai-agent-that-performs-tasks-autonomously/
Pascal J. Sager, Benjamin Meyer, Peng Yan, Rebekka von Wartburg-Kottler, Layan Etaiwi, Aref Enayati, Gabriel Nobel, Ahmed Abdulkadir, Benjamin F. Grewe, Thilo Stadelmann, 27 Jan 2025, AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants, https://arxiv.org/abs/2501.16150
Kyle Wiggers, January 27, 2025, Alibaba’s Qwen team releases AI models that can control PCs and phones, https://techcrunch.com/2025/01/27/alibabas-qwen-team-releases-ai-models-that-can-control-pcs-and-phones/
Tian Huang, Chun Yu, Weinan Shi, Zijian Peng, David Yang, Weiqi Sun, and Yuanchun Shi. 2025. Prompt2Task: Automating UI Tasks on Smartphones from Textual Prompts. ACM Trans. Comput.-Hum. Interact. Just Accepted (February 2025). https://doi.org/10.1145/3716132 https://dl.acm.org/doi/abs/10.1145/3716132
Qinzhuo Wu, Wei Liu, Jian Luan, Bin Wang, 5 Feb 2025, ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation, https://arxiv.org/abs/2502.02955
Kunal Singh, Shreyas Singh, Mukund Khanna, 12 Feb 2025, TRISHUL: Towards Region Identification and Screen Hierarchy Understanding for Large VLM based GUI Agents, https://arxiv.org/abs/2502.08226
Matt Marshall, February 22, 2025, The rise of browser-use agents: Why Convergence’s Proxy is beating OpenAI’s Operator, https://venturebeat.com/ai/the-rise-of-browser-use-agents-why-convergences-proxy-is-beating-openais-operator/
Frank Landymore, Jan 25, 2025, OpenAI's Agent Has a Problem: Before It Does Anything Important, You Have to Double-Check It Hasn't Screwed Up: Not as hands-off as you might hope, https://futurism.com/openai-asks-permission-important
Jiani Zheng, Lu Wang, Fangkai Yang, Chaoyun Zhang, Lingrui Mei, Wenjie Yin, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang, 26 Feb 2025, VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model, https://arxiv.org/abs/2502.18906
Wenjia Jiang, Yangyang Zhuang, Chenxi Song, Xu Yang, Chi Zhang, 4 Mar 2025, AppAgentX: Evolving GUI Agents as Proficient Smartphone Users, https://arxiv.org/abs/2503.02268
Zongru Wu, Pengzhou Cheng, Zheng Wu, Tianjie Ju, Zhuosheng Zhang, Gongshen Liu, 4 Mar 2025 (v2), Smoothing Grounding and Reasoning for MLLM-Powered GUI Agents with Query-Oriented Pivot Tasks, https://arxiv.org/abs/2503.00401 https://github.com/ZrW00/GUIPivot
Yuqi Zhou, Shuai Wang, Sunhao Dai, Qinglin Jia, Zhaocheng Du, Zhenhua Dong, Jun Xu, 5 Mar 2025, CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning, https://arxiv.org/abs/2503.03743
Asif Razzaq, March 8, 2025, Meet Manus: A New AI Agent from China with Deep Research + Operator + Computer Use + Lovable + Memory, https://www.marktechpost.com/2025/03/08/meet-manus-a-new-ai-agent-from-china-with-deep-research-operator-computer-use-lovable-memory/
Kyle Wiggers, March 12, 2025, Browser Use, one of the tools powering Manus, is also going viral, https://techcrunch.com/2025/03/12/browser-use-one-of-the-tools-powering-manus-is-also-going-viral/
Muzhi Zhu, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo, Yang Liu, Ming Yang, Chunhua Shen, 11 Mar 2025, SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories, https://arxiv.org/abs/2503.08625
Di Zhao, Longhui Ma, Siwei Wang, Miao Wang, Zhao Lv, 12 Mar 2025, COLA: A Scalable Multi-Agent Framework For Windows UI Task Automation, https://arxiv.org/abs/2503.09263
Chaoyun Zhang, Shilin He, Liqun Li, Si Qin, Yu Kang, Qingwei Lin, Dongmei Zhang, 14 Mar 2025, API Agents vs. GUI Agents: Divergence and Convergence, https://arxiv.org/abs/2503.11069
Yibin Xu, Liang Yang, Hao Chen, Hua Wang, Zhi Chen, Yaohua Tang, 14 Mar 2025, DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents, https://arxiv.org/abs/2503.11170
Zhengxi Lu, Yuxiang Chai, Yaxuan Guo, Xi Yin, Liang Liu, Hao Wang, Guanjing Xiong, Hongsheng Li, 30 Mar 2025 (v2), UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning, https://arxiv.org/abs/2503.21620
Gaole Dai, Shiqi Jiang, Ting Cao, Yuanchun Li, Yuqing Yang, Rui Tan, Mo Li, Lili Qiu, 21 Mar 2025 (v2), Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment, https://arxiv.org/abs/2503.15937
Bin Lei, Weitai Kang, Zijian Zhang, Winson Chen, Xi Xie, Shan Zuo, Mimi Xie, Ali Payani, Mingyi Hong, Yan Yan, Caiwen Ding, 16 May 2025, InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction, https://arxiv.org/abs/2505.10887
Shuning Zhang, Jingruo Chen, Zhiqi Gao, Jiajing Gao, Xin Yi, Hewu Li, 16 May 2025 (v2), Characterizing Unintended Consequences in Human-GUI Agent Collaboration for Web Browsing, https://arxiv.org/abs/2505.09875
Apoorv Agrawal, May 23, 2025, Why Cars Drive Themselves Before Computers Do: Robocars are ready; robot secretaries aren’t… yet, https://apoorv03.com/p/autonomy
Yuheng Lu, Qian Yu, Hongru Wang, Zeming Liu, Wei Su, Yanping Liu, Yuhang Guo, Maocheng Liang, Yunhong Wang, Haifeng Wang, 27 May 2025 (v2), TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments, https://arxiv.org/abs/2505.17629
Shuquan Lian, Yuhang Wu, Jia Ma, Yifan Ding, Zihan Song, Bingqi Chen, Xiawu Zheng, Hui Li, 9 Aug 2025, UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding, https://arxiv.org/abs/2507.22025
Zihan Zheng, Tianle Cui, Chuwen Xie, Jiahui Zhang, Jiahui Pan, Lewei He, Qianglong Chen, 2 Aug 2025, NatureGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset, https://arxiv.org/abs/2508.01330
Zheng Wu, Pengzhou Cheng, Zongru Wu, Lingzhong Dong, Zhuosheng Zhang, 4 Aug 2025, GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents, https://arxiv.org/abs/2505.12842
Chao Hao, Shuai Wang, Kaiwen Zhou, 6 Aug 2025, Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement, https://arxiv.org/abs/2508.04025
Liang Tang, Shuxian Li, Yuhao Cheng, Yukang Huo, Zhepeng Wang, Yiqiang Yan, Kaer Huang, Yanzhe Jing and Tiaonan Duan, 6 Aug 2025, SEA: Self-Evolution Agent with Step-wise Reward for Computer Use, https://arxiv.org/abs/2508.04037
Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, Jiaqi Wang, 6 Aug 2025, SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience, https://arxiv.org/abs/2508.04700
Liujian Tang, Shaokang Dong, Yijia Huang, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhihui Cao, Hailiang Pang, Heng Kong, He Yang, Mingxu Chai, Zhilin Gao, Xingyu Liu, Yingnan Fu, Jiaming Liu, Tao Gui, Xuanjing Huang, Yu-Gang Jiang, Qi Zhang, Kang Wang, Yunke Zhang, Yuran Wang, 19 Jul 2025, MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning, https://arxiv.org/abs/2508.03700
Wenkang Han, Zhixiong Zeng, Jing Huang, Shu Jiang, Liming Zheng, Haibo Qiu, Chang Yao, Jingyuan Chen, Lin Ma, 6 Aug 2025, UITron-Speech: Towards Automated GUI Agents Based on Speech Instructions, https://arxiv.org/abs/2506.11127
Hanyu Lai, Xiao Liu, Yanxiao Zhao, Han Xu, Hanchen Zhang, Bohao Jing, Yanyu Ren, Shuntian Yao, Yuxiao Dong, and Jie Tang, 19 Aug 2025, ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents, https://arxiv.org/abs/2508.14040
Xinyuan Wang, Bowen Wang, Dunjie Lu, Junlin Yang, Tianbao Xie, Junli Wang, Jiaqi Deng, Xiaole Guo, Yiheng Xu, Chen Henry Wu, Zhennan Shen, Zhuokai Li, Ryan Li, Xiaochuan Li, Junda Chen, Boyuan Zheng, Peihang Li, Fangyu Lei, Ruisheng Cao, Yeqiao Fu, Dongchan Shin, Martin Shin, Jiarui Hu, Yuyan Wang, Jixuan Chen, Yuxiao Ye, Danyang Zhang, Dikang Du, Hao Hu, Huarong Chen, Zaida Zhou, Haotian Yao, Ziwei Chen, Qizheng Gu, Yipu Wang, Heng Wang, Diyi Yang, Victor Zhong, Flood Sung, Y.Charles, Zhilin Yang, Tao Yu, 14 Aug 2025, OpenCUA: Open Foundations for Computer-Use Agents, https://arxiv.org/abs/2508.09123
Thong Q. Nguyen, Shubhang Desai, Raja Hasnain Anwar, Firoz Shaik, Vishwas Suryanarayanan, Vishal Chowdhary, 2 Aug 2025, VerificAgent: Domain-Specific Memory Verification for Scalable Oversight of Aligned Computer-Use Agents, https://arxiv.org/abs/2506.02539
Songqin Nong, Jingxuan Xu, Sheng Zhou, Jianfeng Chen, Xiaoxuan Tang, Tao Jiang, Wenhao Xu, 15 Aug 2025, CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks, https://arxiv.org/abs/2508.11360
Jiabo Ye, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Zhaoqing Zhu, Ziwei Zheng, Feiyu Gao, Junjie Cao, Zhengxi Lu, Jitong Liao, Qi Zheng, Fei Huang, Jingren Zhou, and Ming Yan, 21 Aug 2025, Mobile-Agent-v3: Foundamental Agents for GUI Automation, https://arxiv.org/abs/2508.15144