Aussie AI

Retrieval Augmented Language Models (RALM)

  • Last Updated 8 December, 2024
  • by David Spuler, Ph.D.

Retrieval Augmented Language Models (RALM) refers to the general method of using external data sources to make LLMs more powerful. It improves the "smartness" of the LLM, rather than being a speed optimization. In fact, RALM is often slower, because accessing a secondary data source requires an extra retrieval step.
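
To make the extra retrieval step concrete, here is a minimal Python sketch of the retrieve-then-generate flow. The toy keyword-overlap retriever and the llm_generate() placeholder are illustrative assumptions, not any particular library's API.

    # Minimal sketch of the retrieve-then-generate step in a RALM/RAG pipeline.
    # The retriever is a toy keyword-overlap ranker and llm_generate() is a
    # hypothetical placeholder, not any specific library's API.
    from typing import List

    def retrieve(query: str, corpus: List[str], top_k: int = 2) -> List[str]:
        """Toy retriever: rank documents by word overlap with the query."""
        q_words = set(query.lower().split())
        ranked = sorted(corpus,
                        key=lambda doc: len(q_words & set(doc.lower().split())),
                        reverse=True)
        return ranked[:top_k]

    def llm_generate(prompt: str) -> str:
        """Placeholder for a real LLM inference call (hypothetical)."""
        return f"<LLM answer based on a prompt of {len(prompt)} characters>"

    def ralm_answer(query: str, corpus: List[str]) -> str:
        # The retrieval call below is the extra step that adds latency
        # compared with sending the query straight to the LLM.
        context = "\n".join(retrieve(query, corpus))
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        return llm_generate(prompt)

    if __name__ == "__main__":
        docs = [
            "RALM augments an LLM with an external data source at inference time.",
            "Quantization is a speed optimization for LLM inference.",
        ]
        print(ralm_answer("What does RALM do?", docs))

In a real deployment, the toy retriever would typically be replaced by an embedding-based vector search, but the prompt-augmentation step is the same.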

RALM vs RAG

RALM and RAG are almost the same thing, but RALM is a little more general. RAG is a specific type of RALM with a particular architecture, whereas RALM is the broader idea.

RALM may also include capabilities such as:

  • Data source integrations ("plug-ins")
  • Tool Augmented Language Models (TALM); see also tool usage by LLMs.

RALM generally refers to a read-only architecture in which the external source simply returns information, whereas more powerful two-way integrations with tools that "do" something are called "agents."
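
As a rough illustration of this distinction, the Python sketch below contrasts a read-only lookup (RALM-style) with a tool call that changes external state (agent-style). The tool registry and the do_tool_call() dispatcher are illustrative assumptions, not a specific framework's API.

    # Sketch contrasting read-only retrieval (RALM-style) with an agent-style
    # tool call that has side effects. All names here are hypothetical examples.
    from typing import Callable, Dict

    # Read-only augmentation: the external source only returns information.
    def lookup_weather(city: str) -> str:
        return f"Stored forecast for {city}: sunny"  # toy data source

    # Two-way tool: the call changes external state (it "does" something).
    REMINDERS: list = []

    def add_reminder(text: str) -> str:
        REMINDERS.append(text)
        return f"Reminder saved: {text}"

    TOOLS: Dict[str, Callable[[str], str]] = {
        "lookup_weather": lookup_weather,  # RALM-style, read-only
        "add_reminder": add_reminder,      # agent-style, side-effecting
    }

    def do_tool_call(tool_name: str, argument: str) -> str:
        """Dispatch a tool call chosen by the LLM (e.g. parsed from its output)."""
        return TOOLS[tool_name](argument)

    if __name__ == "__main__":
        print(do_tool_call("lookup_weather", "Sydney"))   # read-only lookup
        print(do_tool_call("add_reminder", "Review RALM survey papers"))
        print(REMINDERS)  # external state was modified by the agent-style tool

In practice the LLM chooses the tool and its arguments (for example, by emitting structured output that is parsed and dispatched), but the read-only versus side-effecting distinction is what separates plain RALM from agent architectures.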

Research Papers on RALM

Papers on the use of RALM techniques in LLMs and Transformer architectures:

  • Dakhel, A.M., Nikanjam, A., Khomh, F., Desmarais, M.C., Washizaki, H. (2024). An Overview on Large Language Models. In: Nguyen-Duc, A., Abrahamsson, P., Khomh, F. (eds) Generative AI for Effective Software Development. Springer, Cham. https://doi.org/10.1007/978-3-031-55642-5_1 https://link.springer.com/chapter/10.1007/978-3-031-55642-5_1
  • Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin, 29 May 2024, Nearest Neighbor Speculative Decoding for LLM Generation and Attribution, https://arxiv.org/abs/2405.19325 (Merging of RALM and speculative decoding.)
  • Lianming Huang, Shangyu Wu, Yufei Cui, Ying Xiong, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 24 May 2024, RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference, https://arxiv.org/abs/2405.15198 (Early exit classifiers built with pre-computation using a retrieval database.)
  • Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu, 7 May 2024, FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference, https://arxiv.org/abs/2405.04065 (Optimizes RAG by appending rather than prepending retrieved documents, and modifying the attention mechanism to improve KV cache reuse.)
  • Yucheng Hu, Yuxing Lu, 30 Apr 2024, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543 Project: https://github.com/2471023025/RALM_Survey
  • Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen, 21 Feb 2024 (v2), SciAgent: Tool-augmented Language Models for Scientific Reasoning, https://arxiv.org/abs/2402.11451
  • Aaron Parisi, Yao Zhao, Noah Fiedel, 2022, TALM: Tool Augmented Language Models, arXiv preprint arXiv:2205.12255, https://arxiv.org/abs/2205.12255
  • Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, Zhihao Jia, July 2024, Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation, Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, PMLR 235:60626-60643, 2024, https://proceedings.mlr.press/v235/zhang24cq.html https://openreview.net/pdf?id=CDnv4vg02f
  • Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang, 2024, INFERCEPT: Efficient Intercept Support for Augmented Large Language Model Inference, https://openreview.net/pdf?id=wDDGQabYPQ
  • Zile Qiao, Wei Ye, Yong Jiang, Tong Mo, Pengjun Xie, Weiping Li, Fei Huang, Shikun Zhang, 12 Jun 2024, Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling, https://arxiv.org/abs/2406.08116
  • Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li, 17 Jun 2024 (v3), A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2405.06211 Project: https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/
  • Vishal Rajput, Apr 16, 2024, RAG 2.0: Retrieval Augmented Language Models, https://medium.com/aiguys/rag-2-0-retrieval-augmented-language-models-3762f3047256
  • Gauthier Guinet, Behrooz Omidvar-Tehrani, Anoop Deoras, Laurent Callot, July 2024, Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:16773-16801, 2024, https://proceedings.mlr.press/v235/guinet24a.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/guinet24a/guinet24a.pdf
  • Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li, July 2024, C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:22963-23000, 2024, https://proceedings.mlr.press/v235/kang24a.html
  • Seong-Il Park, Jay-Yoon Lee, 19 Oct 2024, Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models, https://arxiv.org/abs/2410.15107
  • Rana Shahout, Cong Liang, Shiji Xin, Qianru Lao, Yong Cui, Minlan Yu, Michael Mitzenmacher, 23 Oct 2024, Efficient Inference for Augmented Large Language Models, https://arxiv.org/abs/2410.18248
  • Qingfei Zhao, Ruobing Wang, Yukuo Cen, Daren Zha, Shicheng Tan, Yuxiao Dong, Jie Tang, 23 Oct 2024, LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering, https://arxiv.org/abs/2410.18050 https://github.com/QingFei1/LongRAG
  • Hieu Tran, Zonghai Yao, Junda Wang, Yifan Zhang, Zhichao Yang, Hong Yu, 5 Dec 2024 (v2), RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830

More AI Research

Read more about: