Aussie AI
Tool Usage in AI Architectures
-
Last Updated 12 December, 2024
-
by David Spuler, Ph.D.
This section is about tools being used by LLMs, rather than tools being used by humans to create LLMs. There's a big difference!
Why LLM Tools?
LLMs require tools to do more advanced things, just like humans. For example, if someone asks you the time, you look at your watch (or your phone). If you ask an LLM "What is the time?" there is nothing in its training data set that could possibly answer this correctly. The only way is to use a clock that's integrated iinto the LLM, and executed by the AI Engine as part of answering your query.
Types of LLM Tools
There are several types of tools that can be integrated:
- Data sources (e.g. real estate listings)
- Dynamic calculations
- Action tools in agent architectures (e.g. an API to send an email).
Types of dynamic calculation tools include:
- Clocks
- Calculators (arithmetic)
- Converters (e.g. pounds to kilograms)
- Calendars (date or day calculations)
And many more...
How are Dynamic Tools Integrated?
Like humans, an AI needs to learn to look at its watch if someone asks the time. Specific training data sets are required that tell the AI what tool to use, and when.
The AI engine has to recognize in the LLM output that a tool must be executed. There are a variety of ways to do this:
- Tool-specific tokens — i.e., the LLM can emit a "trigger" token to run a tool. Note that PEFT could be used here to fine-tune new tool capabilities, by only adding a few new tool-triggering tokens to the vocabulary.)
- Placeholder patterns — i.e., output something like an "--insert current time here--" special pattern is another way, and the engine then looks for these patterns, which avoids adding tool tokens to the vocabulary, but is inefficient in that there are multiple text tokens in the output).
- Code generation — there are various AI models that will generate code, such as in Python, that can be executed to generate the answer. This is a general solution, because Python can call various submodules and can thereby generate many tools.
- Multi-level planning — the AI first generates a plan of how to answer the query, including what tools to use, and then runs any tools, and then does another inference query to collate it into a final answer.
Research on AI Tool Integrations
Tool integration papers:
- Junzhi Chen, Juhao Liang, Benyou Wang, 9 May 2024, Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning, https://arxiv.org/abs/2405.05955
- Jonas Wallat, Adam Jatowt, Avishek Anand, March 2024, Temporal Blind Spots in Large Language Models, WSDM '24: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Pages 683–692, https://arxiv.org/abs/2401.12078, https://doi.org/10.1145/3616855.3635818, https://dl.acm.org/doi/abs/10.1145/3616855.3635818
- Nate Kushman, Yoav Artzi, Luke Zettlemoyer, Regina Barzilay, June 2014, Learning to Automatically Solve Algebra Word Problems, P14-1026 Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), https://aclanthology.org/P14-1026/ PDF: https://aclanthology.org/P14-1026.pdf
- Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning, https://proceedings.neurips.cc/paper_files/paper/2023/file/4a47dd69242d5af908cdd5d51c971cbf-Paper-Datasets_and_Benchmarks.pdf
- Subhro Roy, Dan Roth, 20 Aug 2016 (v2), Solving General Arithmetic Word Problems, https://arxiv.org/abs/1608.01413
- Subhro Roy, Shyam Upadhyay, Dan Roth, 28 Sep 2016, Equation Parsing: Mapping Sentences to Grounded Equations, https://arxiv.org/abs/1609.08824
- Yan Wang, Xiaojiang Liu, Shuming Shi, September 2017, Deep Neural Solver for Math Word Problems D17-1088, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing Copenhagen, Denmark, https://aclanthology.org/D17-1088/ PDF: https://aclanthology.org/D17-1088.pdf
- reiinakano, November 12, 2019, Teaching a neural network to use a calculator, https://reiinakano.com/2019/11/12/solving-probability.html (Integrate SymPy calculator into the results of a neural network, by looking for the '=' sign.)
- Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan, 6 May 2024, AlphaMath Almost Zero: process Supervision without process, https://arxiv.org/abs/2405.03553 https://github.com/MARIO-Math-Reasoning/Super_MARIO
- Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu, Chenglin Wu, 12 Mar 2024 (v3), Data Interpreter: An LLM Agent For Data Science, https://arxiv.org/abs/2402.18679 Code: https://github.com/geekan/MetaGPT
- Zelong Li, Wenyue Hua, Hao Wang, He Zhu, Yongfeng Zhang, 4 Feb 2024 (v2), Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents, https://arxiv.org/abs/2402.00798 Code: https://github.com/agiresearch/Formal-LLM
- Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang, 25 Mar 2024 (v2), InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents, https://arxiv.org/abs/2403.02691
- Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W Cohen. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. arXiv preprint arXiv:2211.12588, 2022. https://arxiv.org/abs/2211.12588 (Integrate a Python interpreter to execute the code generated by the LLM to answer the query.)
- Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. Pal: Program-aided language models. In International Conference on Machine Learning, pages 10764–10799. PMLR, 2023. https://arxiv.org/abs/2211.10435 Code: http://reasonwithpal.com/ (Python interpreter integrated as a tool for LLMs.)
- Intel, April 2024, Intel® Compiler First to Achieve SYCL* 2020 Conformance, https://www.intel.com/content/www/us/en/developer/articles/technical/compiler-first-full-sycl2020-conformance.html
- Long Hei Matthew Lam, Ehsan Shareghi, 1 Jun 2024, A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters, https://arxiv.org/abs/2406.00284 (Using symbolic solvers with LLMs.)
- M Keber, I Grubišic, A Barešic, A Jovic, 2024, A Review on Neuro-symbolic AI Improvements to Natural Language Processing, https://www.researchgate.net/profile/Alan-Jovic/publication/380911364_A_Review_on_Neuro-symbolic_AI_Improvements_to_Natural_Language_Processing/links/6655c0ec22a7f16b4f51fb2f/A-Review-on-Neuro-symbolic-AI-Improvements-to-Natural-Language-Processing.pdf
- Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen, 21 Feb 2024 (v2), SciAgent: Tool-augmented Language Models for Scientific Reasoning, https://arxiv.org/abs/2402.11451
- Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu, 2023, ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings, Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track, https://proceedings.neurips.cc/paper_files/paper/2023/hash/8fd1a81c882cd45f64958da6284f4a3f-Abstract-Conference.html
- Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, and Ping Luo. 2023a. Tree-planner: Efficient close-loop task planning with large language models. arXiv preprint arXiv:2310.08582. https://arxiv.org/abs/2310.08582
- Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik R Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems. https://arxiv.org/abs/2303.11366
- Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. 2023b. ToolLLM: Facilitating large language models to master 16000+ real-world apis. arXiv preprint arXiv:2307.16789. https://arxiv.org/abs/2307.16789
- Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, XuChen, Yankai Lin, et al. 2023c. A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432. https://arxiv.org/abs/2308.11432
- Aaron Parisi, Yao Zhao, and Noah Fiedel. Talm: Tool augmented language models. arXiv preprint arXiv:2205.12255, 2022. https://arxiv.org/abs/2205.12255
- Joy He-Yueya, Gabriel Poesia, Rose E. Wang, and Noah D. Goodman. Solving math word problems by combining language models with symbolic solvers. ArXiv, abs/2304.09102, 2023. https://arxiv.org/abs/2304.09102
- Shima Imani, Liang Du, and H. Shrivastava. Mathprompter: Mathematical reasoning using large language models. ArXiv, abs/2303.05398, 2023. https://arxiv.org/abs/2303.05398
- Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao, 5 Feb 2024. A Survey on Transformer Compression. https://arxiv.org/abs/2402.05964 (Model compression survey paper with focus on pruning, quantization, knowledge distillation, and efficient architecture design.)
- Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis, 7 May 2024, An LLM-Tool Compiler for Fused Parallel Function Calling, https://arxiv.org/abs/2405.17438
- Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo, 4 Jun 2024 (v2), Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution, https://arxiv.org/abs/2406.00059 Code: https://github.com/conveyor-sys/conveyor (Speeding up inference by partially running tools in parallel to the LLM query procesisng, rather than sequentially after the LLM request, by detecting tool requests deep inside the decoding algorithm and starting them off immediately, before the LLM has finished generating the fully decoed output.)
- Pan Lu, 2024, Advancing Mathematical Reasoning with Language Models: A Multimodal and Knowledge-Intensive Perspective, Ph.D. Thesis, Computer Science, University of California, Los Angeles, https://escholarship.org/content/qt678864d8/qt678864d8.pdf
- Julian Yip, Apr 2, 2024, Build Autonomous AI Agents with Function Calling: Transform your chatbot into an agent that can interact with external APIs, https://towardsdatascience.com/build-autonomous-ai-agents-with-function-calling-0bb483753975 (Implement agents via models that output a JSON object that describes the API to call and the parmaeters to send.)
- Adva Nakash Peleg, May 30, 2024, An LLM Journey: From POC to Production, https://medium.com/cyberark-engineering/an-llm-journey-from-poc-to-production-6c5ec6a172fb
- Yu Gu, Yiheng Shu, Hao Yu, Xiao Liu, Yuxiao Dong, Jie Tang, Jayanth Srinivasa, Hugo Latapie, Yu Su, 22 Feb 2024, Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments, https://arxiv.org/abs/2402.14672
- Yaobo Liang, Chenfei Wu , Ting Song , Wenshan Wu , Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan, March 2023, TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs, https://arxiv.org/pdf/2303.16434.pdf
- kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
- Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman, 1 Jun 2022 (v3), WebGPT: Browser-assisted question-answering with human feedback, https://arxiv.org/abs/2112.09332
- Tianlin Shi, Andrej Karpathy, Linxi Fan, Jonathan Hernandez, Percy Liang, 2017, World of Bits: An Open-Domain Platform for Web-Based Agents, Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3135-3144, https://proceedings.mlr.press/v70/shi17a.html
- Peter C Humphreys, David Raposo, Toby Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Alex Goldin, Adam Santoro, Timothy Lillicrap, 11 Nov 2022 (v2), A data-driven approach for learning to control computers, https://arxiv.org/abs/2202.08137
- Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom, 9 Feb 2023, Toolformer: Language Models Can Teach Themselves to Use Tools, https://arxiv.org/abs/2302.04761
- OpenAI, 2024, Function calling, https://platform.openai.com/docs/guides/function-calling
- Cobus Greyling, June 16, 2023, Practical Examples of OpenAI Function Calling, https://cobusgreyling.medium.com/practical-examples-of-openai-function-calling-a6419dc38775
- University of California, Berkeley, 2024, Berkeley Function-Calling Leaderboard, https://gorilla.cs.berkeley.edu/leaderboard.html https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard
- Wes Brewer, Ana Gainaru, Frédéric Suter, Feiyi Wang, Murali Emani, Shantenu Jha, 20 Jun 2024, AI-coupled HPC Workflow Applications, Middleware and Performance, (Examines integrations of various workflows into LLMs.) https://arxiv.org/abs/2406.14315
- Aarushi Kansal, Chapter 3: Chains, Tools and Agents Building Generative AI-Powered Apps: A Hands-on Guide for Developers, Apress, https://www.amazon.com/Building-Generative-AI-Powered-Apps-Hands-ebook/dp/B0CTXXP1S4/
- Vishal Rajput, Apr 11, 2024, What’s next for AI: AI agentic workflows? https://medium.com/aiguys/next-for-llms-and-rag-ai-agentic-workflows-1869ba0a6796
- Shishir Patil, May 10, 2024, Teaching Large Language Models to Use Tools at Scale, Ph.D. Thesis, Electrical Engineering and Computer Sciences, University of California, Berkeley, Technical Report No. UCB/EECS-2024-85, http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-85.html https://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-85.pdf
- Xi Wang, Procheta Sen, Ruizhe Li, Emine Yilmaz, 31 Jul 2024, Adaptive Retrieval-Augmented Generation for Conversational Systems, https://arxiv.org/abs/2407.21712 (Deciding whether or not to include a RAG external data request in the inference of a chatbot in a multi-turn conversation.)
- Michael Nuñez, July 18, 2024, Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling, https://venturebeat.com/ai/groq-open-source-llama-ai-model-tops-leaderboard-outperforming-gpt-4o-and-claude-in-function-calling/
- Thomas Reid, Jul 31, 2024, Ollama’s Latest Update: Tool Use: Everything you need to know about function calling in Ollama https://ai.gopubby.com/ollamas-latest-update-tool-use-7b809e15be5c
- Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang, 8 Aug 2024, ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities, https://arxiv.org/abs/2408.04682 Code: https://github.com/apple/ToolSandbox
- Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, July 2024, InferCept: Efficient Intercept Support for Augmented Large Language Model Inference, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:81-95, 2024, https://proceedings.mlr.press/v235/abhyankar24a.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/abhyankar24a/abhyankar24a.pdf
- Yu Du, Fangyun Wei, Hongyang Zhang, July 2024, AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:11812-11829, 2024, https://proceedings.mlr.press/v235/du24h.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/du24h/du24h.pdf
- MemGPT, Aug 2024, Adding custom tools to MemGPT, https://memgpt.readme.io/docs/adding-custom-tools-to-memgpt
- Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia, 27 Aug 2024, Text2SQL is Not Enough: Unifying AI and Databases with TAG, https://arxiv.org/abs/2408.14717 https://github.com/TAG-Research/TAG-Bench
- Yaroslav Zharov, Yury Khudyakov, Evgeniia Fedotova, Evgeny Grigorenko, Egor Bogomolov, 18 Feb 2024, Tool-Augmented LLMs as a Universal Interface for IDEs, https://arxiv.org/abs/2402.11635
- Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami, 1 Sep 2024, TinyAgent: Function Calling at the Edge, https://arxiv.org/abs/2409.00608 https://github.com/SqueezeAILab/TinyAgent
- Suhong Moon, Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Woosang Lim, Kurt Keutzer, Amir Gholami, 2 Sep 2024, Efficient and Scalable Estimation of Tool Representations in Vector Space, https://arxiv.org/abs/2409.02141 https://github.com/SqueezeAILab/Tool2Vec (Using synthetic data to train tool usage decision models.)
- Xiaoxia Liu, Jingyi Wang, Jun Sun, Xiaohan Yuan, Guoliang Dong, Peng Di, Wenhai Wang, Dongxia Wang, 21 Nov 2023, Prompting Frameworks for Large Language Models: A Survey, https://arxiv.org/abs/2311.12785
- Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, Aman Chadha, 5 Feb 2024, A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications, https://arxiv.org/abs/2402.07927
- Yupu Hao, Pengfei Cao, Zhuoran Jin, Huanxuan Liao, Yubo Chen, Kang Liu, Jun Zhao, 23 Sep 2024 (v2), CITI: Enhancing Tool Utilizing Ability in Large Language Models without Sacrificing General Performance, https://arxiv.org/abs/2409.13202
- Carl Franzen, September 27, Cohere updates APIs to make it easier for devs to switch from other models, https://venturebeat.com/ai/cohere-updates-apis-to-make-it-easier-for-devs-to-switch-from-other-models/
- Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li, 8 Oct 2024 (v2), ToolGen: Unified Tool Retrieval and Calling via Generation, https://arxiv.org/abs/2410.03439
- Ke Wang, Jiahui Zhu, Minjie Ren, Zeming Liu, Shiwei Li, Zongye Zhang, Chenkai Zhang, Xiaoyu Wu, Qiqi Zhan, Qingjie Liu, Yunhong Wang, 16 Oct 2024, A Survey on Data Synthesis and Augmentation for Large Language Models, https://arxiv.org/abs/2410.12896
- Yakun Zhu, Shaohang Wei, Xu Wang, Kui Xue, Xiaofan Zhang, Shaoting Zhang, 17 Oct 2024, MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling, https://arxiv.org/abs/2410.13610
- Elias Lumer, Vamse Kumar Subbiah, James A. Burke, Pradeep Honaganahalli Basavaraju, Austin Huber, 22 Oct 2024 (v2), Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases, https://arxiv.org/abs/2410.14594
- A. Singh, A. Ehtesham, S. Kumar and T. T. Khoei, "Enhancing AI Systems with Agentic Workflows Patterns in Large Language Model," 2024 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2024, pp. 527-532, doi: 10.1109/AIIoT61789.2024.10578990. https://ieeexplore.ieee.org/abstract/document/10578990
- Chawla, Chhavi; Chatterjee, Siddharth; Gadadinni, Sanketh Siddanna; Verma, Pulkit; Banerjee, Sourav, 2024, Agentic AI: The building blocks of sophisticated AI business applications, Journal of AI, Robotics & Workplace Automation, Volume 3 / Number 3 / Summer 2024, pp. 1-15(15), Henry Stewart Publications, DOI: https://doi.org/10.69554/XEHZ1946 https://www.ingentaconnect.com/content/hsp/airwa/2024/00000003/00000003/art00001
- Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, Liuyi Yao, Hongyi Peng, Zeyu Zhang, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, Jingren Zhou, 20 May 2024 (v2), AgentScope: A Flexible yet Robust Multi-Agent Platform, https://arxiv.org/abs/2402.14034 https://github.com/modelscope/agentscope
- Michael Nuñez, November 4, 2024, UC San Diego, Tsinghua University researchers just made AI way better at knowing when to ask for help, https://venturebeat.com/ai/uc-san-diego-tsinghua-university-researchers-just-made-ai-way-better-at-knowing-when-to-ask-for-help/
- Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar, 14 Apr 2024, Towards Practical Tool Usage for Continually Learning LLMs, https://arxiv.org/abs/2404.09339
- Amy Marks, Jun 11, 2024, Clarifying Function Calling / Tool Use in LLMs, https://medium.com/@aevalone/clarifying-function-calling-tool-use-in-llms-6511af510f99
- Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu, 1 Nov 2024, Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation, https://arxiv.org/abs/2411.00412
- Anthropic, 26 Nov 2024, Introducing the Model Context Protocol, https://www.anthropic.com/news/model-context-protocol
- Varatheepan Paramanayakam, Andreas Karatzas, Iraklis Anagnostopoulos, Dimitrios Stamoulis, 23 Nov 2024, Less is More: Optimizing Function Calling for LLM Execution on Edge Devices, https://arxiv.org/abs/2411.15399
- Soh, J., Singh, P. (2024). Semantic Kernel, Plugins, and Function Calling. In: Data Science Solutions on Azure. Apress, Berkeley, CA. https://doi.org/10.1007/979-8-8688-0914-9_7 https://link.springer.com/chapter/10.1007/979-8-8688-0914-9_7
- Chris Sypherd, Vaishak Belle, 5 Dec 2024, Practical Considerations for Agentic LLM Systems, https://arxiv.org/abs/2412.04093
- Zhi-Yuan Chen, Shiqi Shen, Guangyao Shen, Gong Zhi, Xu Chen, and Yankai Lin. 2024. Towards Tool Use Alignment of Large Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1382–1400, Miami, Florida, USA. Association for Computational Linguistics. https://aclanthology.org/2024.emnlp-main.82/ https://aclanthology.org/2024.emnlp-main.82.pdf
- Damien de Mijolla, Wen Yang, Philippa Duckett, Christopher Frye, Mark Worrall, 8 Dec 2024, Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt, https://arxiv.org/abs/2412.05967
Tool-Augmented Language Models (TALM)
Reserch papers on TALM:
- Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo, 4 Jun 2024 (v2), Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution, https://arxiv.org/abs/2406.00059 Code: https://github.com/conveyor-sys/conveyor (Speeding up inference by partially running tools in parallel to the LLM query procesisng, rather than sequentially after the LLM request, by detecting tool requests deep inside the decoding algorithm and starting them off immediately, before the LLM has finished generating the fully decoed output.)
- Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen, 21 Feb 2024 (v2), SciAgent: Tool-augmented Language Models for Scientific Reasoning, https://arxiv.org/abs/2402.11451
- Aaron Parisi, Yao Zhao, and Noah Fiedel. Talm: Tool augmented language models. arXiv preprint arXiv:2205.12255, 2022. https://arxiv.org/abs/2205.12255
- Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis, 7 May 2024, An LLM-Tool Compiler for Fused Parallel Function Calling, https://arxiv.org/abs/2405.17438
- Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, 2024, INFERCEPT: Efficient Intercept Support for Augmented Large Language Model Inference, https://openreview.net/pdf?id=wDDGQabYPQ
- Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-yan Liu, 6 Jul 2023 (v2), A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond, https://arxiv.org/pdf/2204.09269.pdf
- Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, July 2024, InferCept: Efficient Intercept Support for Augmented Large Language Model Inference, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:81-95, 2024, https://proceedings.mlr.press/v235/abhyankar24a.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/abhyankar24a/abhyankar24a.pdf
- Asim Biswal, Liana Patel, Siddarth Jha, Amog Kamsetty, Shu Liu, Joseph E. Gonzalez, Carlos Guestrin, Matei Zaharia, 27 Aug 2024, Text2SQL is Not Enough: Unifying AI and Databases with TAG, https://arxiv.org/abs/2408.14717 https://github.com/TAG-Research/TAG-Bench
- Yaroslav Zharov, Yury Khudyakov, Evgeniia Fedotova, Evgeny Grigorenko, Egor Bogomolov, 18 Feb 2024, Tool-Augmented LLMs as a Universal Interface for IDEs, https://arxiv.org/abs/2402.11635
- Amy Marks, Jun 11, 2024, Clarifying Function Calling / Tool Use in LLMs, https://medium.com/@aevalone/clarifying-function-calling-tool-use-in-llms-6511af510f99
- Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu, 1 Nov 2024, Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation, https://arxiv.org/abs/2411.00412
- Gaya Mehenni, Amal Zouaq, 23 Nov 2024, Ontology-Constrained Generation of Domain-Specific Clinical Summaries, https://arxiv.org/abs/2411.15666
- Damien de Mijolla, Wen Yang, Philippa Duckett, Christopher Frye, Mark Worrall, 8 Dec 2024, Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt, https://arxiv.org/abs/2412.05967
LLM Screen Access
- Yicheng Fu, Raviteja Anantha, Prabal Vashisht, Jianpeng Cheng, Etai Littwin, 6 Sep 2024, UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity, https://www.arxiv.org/abs/2409.04081
- V Adrakatti, 2024, Exploring screen summarization with large language and multimodal models, Masters Thesis, University of Illinois Urbana-Champaign, Urbana, Illinois, USA, https://www.ideals.illinois.edu/items/131510
- Anthropic, 23 Oct 2024, Developing a computer use model, https://www.anthropic.com/news/developing-computer-use
- Anthropic, 23 Oct 2024, Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku, https://www.anthropic.com/news/3-5-models-and-computer-use
- Anirban Ghoshal, 23 Oct 2024, How Anthropic’s new ‘computer use’ ability could further AI automation, https://www.cio.com/article/3583260/how-anthropics-new-computer-use-ability-could-further-ai-automation.html
- Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao, 30 Oct 2024, OS-ATLAS: A Foundation Action Model for Generalist GUI Agents, https://arxiv.org/abs/2410.23218 https://github.com/OS-Copilot/OS-Atlas
- Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, Zack Hui, 13 Sep 2024 (v2), Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale, https://arxiv.org/abs/2409.08264
- Shuai Wang, Weiwen Liu, Jingxuan Chen, Weinan Gan, Xingshan Zeng, Shuai Yu, Xinlong Hao, Kun Shao, Yasheng Wang, Ruiming Tang, 7 Nov 2024, GUI Agents with Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2411.04890
LLM Computer Usage
- Anthropic, 23 Oct 2024, Developing a computer use model, https://www.anthropic.com/news/developing-computer-use
- Anthropic, 23 Oct 2024, Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku, https://www.anthropic.com/news/3-5-models-and-computer-use
- Anirban Ghoshal, 23 Oct 2024, How Anthropic’s new ‘computer use’ ability could further AI automation, https://www.cio.com/article/3583260/how-anthropics-new-computer-use-ability-could-further-ai-automation.html
- Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao, 30 Oct 2024, OS-ATLAS: A Foundation Action Model for Generalist GUI Agents, https://arxiv.org/abs/2410.23218 https://github.com/OS-Copilot/OS-Atlas
- Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, Zack Hui, 13 Sep 2024 (v2), Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale, https://arxiv.org/abs/2409.08264
- Xiao Liu, Bo Qin, Dongzhu Liang, Guang Dong, Hanyu Lai, Hanchen Zhang, Hanlin Zhao, Iat Long Iong, Jiadai Sun, Jiaqi Wang, Junjie Gao, Junjun Shan, Kangning Liu, Shudan Zhang, Shuntian Yao, Siyi Cheng, Wentao Yao, Wenyi Zhao, Xinghan Liu, Xinyi Liu, Xinying Chen, Xinyue Yang, Yang Yang, Yifan Xu, Yu Yang, Yujia Wang, Yulin Xu, Zehan Qi, Yuxiao Dong, Jie Tang, 28 Oct 2024, AutoGLM: Autonomous Foundation Agents for GUIs https://arxiv.org/abs/2411.00820
- Shuai Wang, Weiwen Liu, Jingxuan Chen, Weinan Gan, Xingshan Zeng, Shuai Yu, Xinlong Hao, Kun Shao, Yasheng Wang, Ruiming Tang, 7 Nov 2024, GUI Agents with Foundation Models: A Comprehensive Survey, https://arxiv.org/abs/2411.04890
- Shirin Ghaffary and Rachel Metz November 14, 2024, OpenAI Nears Launch of AI Agent Tool to Automate Tasks for Users. The new software, codenamed “Operator,” is set to be released in January. https://www.bloomberg.com/news/articles/2024-11-13/openai-nears-launch-of-ai-agents-to-automate-tasks-for-users
- Siyuan Hu, Mingyu Ouyang, Difei Gao, Mike Zheng Shou, 15 Nov 2024, The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use, https://arxiv.org/abs/2411.10323 https://github.com/showlab/computer_use_ootb
- Mike Elgan, 22 Nov 2024, AI agents are unlike any technology ever, https://www.computerworld.com/article/3608973/ai-agents-are-unlike-any-technology-ever.html
- Show Lab, Nov 2024, ShowUI: ShowUI is a lightweight (2B) vision-language-action model designed for GUI agents. https://huggingface.co/showlab/ShowUI-2B
- Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang, 27 Nov 2024, Large Language Model-Brained GUI Agents: A Survey, https://arxiv.org/abs/2411.18279
- Zhuosheng Zhang, Aston Zhang, 7 Jun 2024 (v4), You Only Look at Screens: Multimodal Chain-of-Action Agents, https://arxiv.org/abs/2309.11436
- Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, Yu Su, 7 Oct 2024, Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents, https://arxiv.org/abs/2410.05243
- Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu, 23 Feb 2024 (v2), SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents, https://arxiv.org/abs/2401.10935
- Michael Nuñez, November 29, 2024, AI that clicks for you: Microsoft’s research points to the future of GUI automation, https://venturebeat.com/ai/ai-that-clicks-for-you-microsoft-research-points-to-the-future-of-gui-automation/
- Yiqin Wang, Haoji Zhang, Jingqi Tian, Yansong Tang, 2 Dec 2024, Ponder & Press: Advancing Visual GUI Agent towards General Computer Control, https://arxiv.org/abs/2412.01268 https://invinciblewyq.github.io/ponder-press-page/
- Kyle Wiggers, December 5, 2024, Copilot Vision, Microsoft’s AI tool that can read your screen, launches in preview, https://techcrunch.com/2024/12/05/copilot-vision-microsofts-ai-tool-that-can-read-your-screen-launches-in-preview/
- Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, Jie Tang; 2024, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 14281-14290, CogAgent: A Visual Language Model for GUI Agents https://openaccess.thecvf.com/content/CVPR2024/html/Hong_CogAgent_A_Visual_Language_Model_for_GUI_Agents_CVPR_2024_paper.html
- Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan, 8 Apr 2024, Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs, https://arxiv.org/abs/2404.05719
- Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun, 16 Jun 2024, GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents, https://arxiv.org/abs/2406.10819
- Wentong Chen, Junbo Cui, Jinyi Hu, Yujia Qin, Junjie Fang, Yue Zhao, Chongyi Wang, Jun Liu, Guirong Chen, Yupeng Huo, Yuan Yao, Yankai Lin, Zhiyuan Liu, Maosong Sun, 17 Jun 2024, GUICourse: From General Vision Language Models to Versatile GUI Agents, https://arxiv.org/abs/2406.11317 https://github.com/yiye3/GUICourse
- Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong, 5 Dec 2024, Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction, https://arxiv.org/abs/2412.04454 https://aguvis-project.github.io/
- AskUI, Dec 2024, AskUI Vision Agent: Automate computer tasks in Python, https://github.com/askui/vision-agent
More AI Research
Read more about:
- Advanced AI Mathematics
- Zero-Multiplication Models
- Matrix Algebra
- Logarithmic Models
- Inference Optimizations
- « Research Home