Aussie AI
Retrieval Augmented Language Models (RALM)
Last Updated 2 March 2025
by David Spuler, Ph.D.
Retrieval Augmented Language Models (RALM) is the general method of using external data sources to make LLMs more powerful. It improves the "smartness" of the LLM rather than being a speed optimization; in fact, RALM is often slower than plain inference, because accessing a secondary data source requires an extra step.
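As a rough illustration of that extra step, here is a minimal Python sketch of the retrieve-then-generate flow. It assumes a toy word-overlap retriever and a stubbed-out llm_generate() function standing in for any real LLM completion API; production retrievers use vector embeddings and a similarity index instead.

```python
# Minimal RALM/RAG sketch: retrieval is an extra step before LLM inference.

def llm_generate(prompt: str) -> str:
    """Stub standing in for a real LLM completion API (hypothetical)."""
    return f"[LLM answer conditioned on {len(prompt)} chars of prompt]"

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    return sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )[:top_k]

def ralm_answer(query: str, documents: list[str]) -> str:
    """The extra (slower) retrieval step, then prompt augmentation."""
    context = "\n".join(retrieve(query, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm_generate(prompt)

docs = [
    "RALM augments an LLM with external data sources.",
    "Speculative decoding accelerates LLM inference.",
]
print(ralm_answer("What is RALM?", docs))
```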
Types of RALM include:
- Retrieval Augmented Generation (RAG)
- Tool Augmented Language Models (TALM)

Also related, in the sense of providing extra context to the LLM without technically being part of RALM, are techniques such as cache-augmented generation, which preloads knowledge directly into the LLM context.
RALM vs RAG
RALM and RAG are almost the same thing, but RALM is a little more general in the types of extra data used as context. RAG is a very specific type of RALM, with a particular architecture, whereas RALM is the general idea.
RALM may also include capabilities such as:
- Data source integrations ("plug-ins")
- Tool Augmented Language Models (TALM); see also tool usage by LLMs.
RALM generally refers to a read-only architecture that simply returns information for the LLM to use as context, whereas more powerful two-way integrations with tools that "do" something are called "agent architectures."
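The difference is easiest to see in code. Below is a hedged sketch of that read-only versus two-way distinction, using hypothetical tool names (search_knowledge_base, send_email) rather than any real plug-in API:

```python
# Read-only RALM: the tool result only augments the LLM's prompt context.
def search_knowledge_base(query: str) -> str:
    """Hypothetical read-only retrieval tool: returns text, changes nothing."""
    return f"Top document matching '{query}'"

# Agent-style tool: a two-way integration that performs a real-world action.
def send_email(to: str, body: str) -> bool:
    """Hypothetical side-effecting tool, as used by an agent architecture."""
    print(f"Sending to {to}: {body}")
    return True

# RALM: retrieved text is simply injected into the prompt as extra context.
context = search_knowledge_base("quarterly revenue figures")
prompt = f"Context: {context}\nSummarize the revenue trend."
print(prompt)

# Agent: the LLM's output is parsed into an action that "does" something.
llm_action = {"tool": "send_email",
              "args": {"to": "cfo@example.com", "body": "Revenue summary..."}}
if llm_action["tool"] == "send_email":
    send_email(**llm_action["args"])
```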
Research Papers on RALM
Papers on the use of RALM techniques in LLMs and Transformer architectures:
- Dakhel, A.M., Nikanjam, A., Khomh, F., Desmarais, M.C., Washizaki, H. (2024). An Overview on Large Language Models. In: Nguyen-Duc, A., Abrahamsson, P., Khomh, F. (eds) Generative AI for Effective Software Development. Springer, Cham. https://doi.org/10.1007/978-3-031-55642-5_1 https://link.springer.com/chapter/10.1007/978-3-031-55642-5_1
- Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin, 29 May 2024, Nearest Neighbor Speculative Decoding for LLM Generation and Attribution, https://arxiv.org/abs/2405.19325 (Merging of RALM and speculative decoding.)
- Lianming Huang, Shangyu Wu, Yufei Cui, Ying Xiong, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 24 May 2024, RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference, https://arxiv.org/abs/2405.15198 (Early exit classifiers built with pre-computation using a retrieval database.)
- Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu, 7 May 2024, FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference, https://arxiv.org/abs/2405.04065 (Optimizes RAG by appending rather than prepending retrieved documents, with modified attention, to improve KV caching.)
- Yucheng Hu, Yuxing Lu, 30 Apr 2024, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543 Project: https://github.com/2471023025/RALM_Survey
- Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen, 21 Feb 2024 (v2), SciAgent: Tool-augmented Language Models for Scientific Reasoning, https://arxiv.org/abs/2402.11451
- Aaron Parisi, Yao Zhao, Noah Fiedel, 2022, TALM: Tool Augmented Language Models, arXiv preprint arXiv:2205.12255, https://arxiv.org/abs/2205.12255
- Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, 2024, INFERCEPT: Efficient Intercept Support for Augmented Large Language Model Inference, https://openreview.net/pdf?id=wDDGQabYPQ
- Zile Qiao, Wei Ye, Yong Jiang, Tong Mo, Pengjun Xie, Weiping Li, Fei Huang, Shikun Zhang, 12 Jun 2024, Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling, https://arxiv.org/abs/2406.08116
- Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li, 17 Jun 2024 (v3), A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2405.06211 Project: https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/
- Vishal Rajput, Apr 16, 2024, RAG 2.0: Retrieval Augmented Language Models, https://medium.com/aiguys/rag-2-0-retrieval-augmented-language-models-3762f3047256
- Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, Zhihao Jia, July 2024, Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation, Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, PMLR 235:60626-60643, 2024, https://proceedings.mlr.press/v235/zhang24cq.html https://openreview.net/pdf?id=CDnv4vg02f
- Gauthier Guinet, Behrooz Omidvar-Tehrani, Anoop Deoras, Laurent Callot, July 2024, Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:16773-16801, 2024, https://proceedings.mlr.press/v235/guinet24a.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/guinet24a/guinet24a.pdf
- Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li, July 2024, C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:22963-23000, 2024, https://proceedings.mlr.press/v235/kang24a.html
- Seong-Il Park, Jay-Yoon Lee, 19 Oct 2024, Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models, https://arxiv.org/abs/2410.15107
- Rana Shahout, Cong Liang, Shiji Xin, Qianru Lao, Yong Cui, Minlan Yu, Michael Mitzenmacher, 23 Oct 2024, Efficient Inference for Augmented Large Language Models, https://arxiv.org/abs/2410.18248
- Qingfei Zhao, Ruobing Wang, Yukuo Cen, Daren Zha, Shicheng Tan, Yuxiao Dong, Jie Tang, 23 Oct 2024, LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering, https://arxiv.org/abs/2410.18050 https://github.com/QingFei1/LongRAG
- Hieu Tran, Zonghai Yao, Junda Wang, Yifan Zhang, Zhichao Yang, Hong Yu, 5 Dec 2024 (v2), RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830
- Vincent-Pierre Berges, Barlas Oguz, December 12, 2024, Memory Layers at Scale, Meta, https://ai.meta.com/research/publications/memory-layers-at-scale/ https://github.com/facebookresearch/memory (Augmentation of an LLM with an additional key-value associative memory, by replacing some FFNs with a "memory layer".)
- Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
- Xinyu Pang, Ruixin Hong, Zhanke Zhou, Fangrui Lv, Xinwei Yang, Zhilong Liang, Bo Han, Changshui Zhang, 18 Dec 2024, Physics Reasoner: Knowledge-Augmented Reasoning for Solving Physics Problems with Large Language Models, https://arxiv.org/abs/2412.13791 (Augmented reasoning by retrieving physics formulas, checklists, and other relevant information.)
- Brian J Chan, Chao-Ting Chen, Jui-Hung Cheng, Hen-Hsen Huang, 20 Dec 2024, Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks, https://arxiv.org/abs/2412.15605 (Mini-RAG architecture preloading the entire knowledge base into the LLM context.)
- Sreedevi Gogusetty, Dec 6, 2024, From RAG to TAG: Leveraging the Power of Table-Augmented Generation (TAG): A Leap Beyond Retrieval-Augmented Generation (RAG), https://ai.plainenglish.io/from-rag-to-tag-leveraging-the-power-of-table-augmented-generation-tag-a-leap-beyond-54d1cfadb994 (TAG for augmenting LLMs with queries from database tables, similar to data source plugins.)
- Shubham Sharma, November 12, 2024, How agentic RAG can be a game-changer for data processing and retrieval, https://venturebeat.com/ai/how-agentic-rag-can-be-a-game-changer-for-data-processing-and-retrieval/
- Muhayy Ud Din, Jan Rosell, Waseem Akram, Isiah Zaplana, Maximo A Roa, Lakmal Seneviratne, Irfan Hussain, 10 Dec 2024, Ontology-driven Prompt Tuning for LLM-based Task and Motion Planning, https://arxiv.org/abs/2412.07493 https://muhayyuddin.github.io/llm-tamp/ (Detecting objects in the prompt text and then using a RALM algorithm to query an ontology database.)
- Lianming Huang, Shangyu Wu, Yufei Cui, Ying Xiong, Xue Liu, Tei-Wei Kuo, Nan Guan, Dec 2024, RAEE: A Robust Retrieval-Augmented Early Exiting Framework for Efficient Inference, 4th NeurIPS Efficient Natural Language and Speech Processing Workshop (ENLSP-IV 2024), https://neurips2024-enlsp.github.io/papers/paper_66.pdf (Early exit combined with RALM, using retrieval to help the classifier decide whether to exit at each layer.)
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Lei Liang, Mengshu Sun, Zhengke Gui, Zhongshu Zhu, Zhouyu Jiang, Ling Zhong, Yuan Qu, Peilong Zhao, Zhongpu Bo, Jin Yang, Huaidong Xiong, Lin Yuan, Jun Xu, Zaoyang Wang, Zhiqiang Zhang, Wen Zhang, Huajun Chen, Wenguang Chen, Jun Zhou, 26 Sep 2024 (v3), KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation, https://arxiv.org/abs/2409.13731
- Wanlong Liu, Junying Chen, Ke Ji, Li Zhou, Wenyu Chen, Benyou Wang, 31 Dec 2024, RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions, https://arxiv.org/abs/2501.00353 https://github.com/FreedomIntelligence/RAG-Instruct
- Alhassan Mumuni, Fuseini Mumuni, 6 Jan 2025, Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches, https://arxiv.org/abs/2501.03151
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Omar Santos, Jun 15, 2024, Comparing RAG, RAG Fusion, with RAPTOR: Different AI Retrieval-Augmented Implementations, https://becomingahacker.org/comparing-rag-rag-fusion-with-raptor-different-ai-retrieval-augmented-implementations-1aa76fce6a5c
- Julian Perry, Surasakdi Siripong, Thanakorn Phonchai, 15 Jan 2025, Dynamic Knowledge Integration for Enhanced Vision-Language Reasoning, https://arxiv.org/abs/2501.08597 (Augment training data dynamically by retrieving extra information.)
- H Liao, S He, Y Xu, Y Zhang, S Liu, K Liu, J Zhao, Jan 2025, Awakening Augmented Generation: Learning to Awaken Internal Knowledge of Large Language Models for Question Answering, Proceedings of the 31st International Conference on Computational Linguistics, pages 1333–1352, January 19–24, 2025, https://aclanthology.org/2025.coling-main.89.pdf https://github.com/Xnhyacinth/IAG (Attempts to perform RALM based only on parametric knowledge, without any external sources, thereby optimizing away RAG steps.)
- Chang Zong, Jian Wan, Lei Zhang, 22 Jan 2025, EvidenceMap: Unleashing the Power of Small Language Models with Evidence Analysis for Biomedical Question Answering, https://arxiv.org/abs/2501.12746 (Using small models to summarize medical evidence, to improve final results from a larger LLM.)
- Jeonghun Cho, Gary Geunbae Lee, 23 Jan 2025, K-COMP: Retrieval-Augmented Medical Domain Question Answering With Knowledge-Injected Compressor, https://arxiv.org/abs/2501.13567
- Peter Baile Chen, Yi Zhang, Michael Cafarella, Dan Roth, 30 Jan 2025, Can we Retrieve Everything All at Once? ARM: An Alignment-Oriented LLM-based Retrieval Method, https://arxiv.org/abs/2501.18539
- Avinash Patil, 5 Feb 2025, Advancing Reasoning in Large Language Models: Promising Methods and Approaches, https://arxiv.org/abs/2502.03671
- Ling Yang, Zhaochen Yu, Bin Cui, Mengdi Wang, 10 Feb 2025, ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates, https://arxiv.org/abs/2502.06772 https://github.com/Gen-Verse/ReasonFlux (RALM-like retrieval of reasoning prompt templates at inference time.)
- Xueguang Ma, Xi Victoria Lin, Barlas Oguz, Jimmy Lin, Wen-tau Yih, Xilun Chen, 25 Feb 2025, DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers, https://arxiv.org/abs/2502.18460
- Minhua Lin, Hui Liu, Xianfeng Tang, Jingying Zeng, Zhenwei Dai, Chen Luo, Zheng Li, Xiang Zhang, Qi He, Suhang Wang, 26 Feb 2025 (v2), How Far are LLMs from Real Search? A Comprehensive Study on Efficiency, Completeness, and Inherent Capabilities, https://arxiv.org/abs/2502.18387
- Wendi Cui, Jiaxin Zhang, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley A. Malin, Sricharan Kumar, 26 Feb 2025, Automatic Prompt Optimization via Heuristic Search: A Survey, https://arxiv.org/abs/2502.18746 (Survey of auto prompting, from basic LLM enhancements to some methods quite similar to RALM and TALM.)