Aussie AI

Retrieval Augmented Generation (RAG) Architectures

  • Last Updated 21 March, 2025
  • by David Spuler, Ph.D.

What is RAG?

RAG is a fundamental technique in generative AI that extends the knowledge of an LLM without fine-tuning. Rather than training new knowledge into the LLM's parameters, we instead look up the extra information by searching a database. The LLM receives the user's prompt along with the extra information found by the RAG lookup component (called the "retriever"). The LLM then uses its summarization and natural language capabilities to answer the user's question, using the extra RAG text as input context.
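As a rough illustration, the control flow of a basic RAG pipeline looks something like the sketch below. The retrieve_chunks() and call_llm() functions are hypothetical placeholders standing in for your retriever database and your LLM inference API, not any particular product's interface.

    # Minimal RAG pipeline sketch (illustrative only).
    # retrieve_chunks() and call_llm() are hypothetical placeholders for
    # your retriever database and LLM inference API.

    def retrieve_chunks(query: str, k: int = 3) -> list[str]:
        """Placeholder retriever: return the top-k most relevant text chunks."""
        raise NotImplementedError("Back this with your vector or keyword database.")

    def call_llm(prompt: str) -> str:
        """Placeholder LLM call: send the prompt, return the generated answer."""
        raise NotImplementedError("Back this with your LLM inference API.")

    def rag_answer(user_question: str) -> str:
        # 1. Retriever: look up extra context related to the question.
        chunks = retrieve_chunks(user_question)
        # 2. Packer: prepend the retrieved chunks to the user's question.
        context = "\n\n".join(chunks)
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {user_question}\nAnswer:"
        )
        # 3. LLM: generate the final answer from the augmented prompt.
        return call_llm(prompt)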

RAG is commonly used as the go-to alternative to fine-tuning an LLM on a business's specialist data. For example, to create a chatbot that knows about your products, you could use fine-tuning to create a custom LLM. The more efficient way is to leave your LLM unchanged, put your special documents (e.g., your entire website) into a RAG database, and then have the LLM search those documents using a RAG architecture.

The current AI assistant capabilities of Google and Bing use a RAG-like architecture, although it is more of a mega-RAG architecture, with a rather large database of documents. The way it works is that Google or Bing first searches the entire internet (however they do that), and then the LLM summarizes the handful of retrieved internet documents into the final AI answer.

Beyond RAG

There are many different variations on the RAG architecture, and RAG architectures can also be extended in various ways. Some of the related capabilities that "augment" the LLM's input prompt with extra data include:

  • Retrieval Augmented Language Models (RALM) — the most general category including augmentation by basically anything; see more about RALM.
  • Tool-Augmented Language Models (TALM) — use dynamic tool execution to compute extra input data. See more about tool integrations.
  • Data source integrations ("plugins") — extended ways to search big databases, such as real estate listings or the entire internet, using a RAG-like approach.

Finally, note that RAG is an inherently "read-only" approach that only generates answers; it doesn't change anything for the user. The generalization of that idea is "agents," which are "read-write" and can perform real-world actions. For example, RAG might tell you what could be causing your symptoms, but an LLM agent could also book your doctor's appointment for you.

RAG Optimizations

RAG optimizations are LLM efficiency improvements applied to a RAG architecture. First point: RAG architectures are inherently an optimization themselves. RAG was created because fine-tuning was too expensive and has various other limitations (e.g., attribution, explainability), although Parameter-Efficient Fine-Tuning (PEFT) techniques have also attacked the inefficiencies in fine-tuning, so maybe it's a tie between RAG and FT/PEFT.

But you can also optimize the RAG architecture itself. To begin with, many of the major LLM optimizations also work on the RAG LLM, so there are many ways to do this (e.g., quantization, pruning, and other inference optimizations).

However, there are a few techniques that are specifically applicable to RAG architectures because they optimize either (a) non-LLM RAG components, or (b) the RAG prompt structure.

Some examples of RAG non-LLM optimizations include:

  • RAG database speedups (e.g., indexing, all the usual database stuff)
  • Keyword versus vector lookups in the retriever (e.g., hybrid keyword-vector search, metadata search, etc.); see the sketch after this list
  • Caching — multiple types (e.g. caching in the retriever versus the LLM parts)
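As a minimal sketch of the hybrid keyword-vector lookup mentioned above, the retriever below blends a crude keyword-overlap score with vector cosine similarity. The embed() function is a hypothetical placeholder for an embedding model, and a real system would typically use BM25 or similar rather than raw term overlap.

    # Minimal hybrid keyword + vector retrieval sketch (illustrative only).
    # embed() is a hypothetical placeholder for an embedding model.
    import math

    def embed(text: str) -> list[float]:
        """Placeholder embedding function; back this with your embedding model."""
        raise NotImplementedError

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def keyword_score(query: str, chunk: str) -> float:
        """Crude keyword overlap score (a real system would use BM25)."""
        q_terms = set(query.lower().split())
        c_terms = set(chunk.lower().split())
        return len(q_terms & c_terms) / (len(q_terms) or 1)

    def hybrid_search(query: str, chunks: list[str], k: int = 3, alpha: float = 0.5) -> list[str]:
        """Blend vector similarity and keyword overlap; return the top-k chunks."""
        q_vec = embed(query)
        scored = []
        for chunk in chunks:
            score = alpha * cosine(q_vec, embed(chunk)) + (1 - alpha) * keyword_score(query, chunk)
            scored.append((score, chunk))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]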

Secondly, there are some RAG-specific techniques on the "length" dimension (i.e., input tokens) that are applicable to an input prompt extended with extra prepended "context" tokens. Some examples are covered in the sections below (e.g., caching and long-context handling).

RAG is not the only architecture to use prepended context. For example, chatbots prepend the conversation history, so many of these approaches apply there too.
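As a simple illustration of managing the "length" dimension, here is a sketch of packing ranked chunks (or conversation history turns) into a fixed token budget before they are prepended to the prompt. The word-count approximation of token length is an assumption for illustration; a real system would use the model's tokenizer.

    # Illustrative sketch: pack ranked chunks into a fixed token budget before
    # prepending them to the prompt. Token counting here is a crude word-count
    # approximation; a real system would use the model's tokenizer.

    def pack_context(ranked_chunks: list[str], max_context_tokens: int = 2000) -> str:
        packed, used = [], 0
        for chunk in ranked_chunks:          # highest-relevance chunks first
            cost = len(chunk.split())        # crude stand-in for a real token count
            if used + cost > max_context_tokens:
                break                        # stop once the budget is exhausted
            packed.append(chunk)
            used += cost
        return "\n\n".join(packed)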

RAG Survey Papers

Survey papers on RAG architectures:

  • Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li, 17 Jun 2024 (v3), A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2405.06211 Project: https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/
  • Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue, 18 Jul 2024, Retrieval-Augmented Generation for Natural Language Processing: A Survey, https://arxiv.org/abs/2407.13193
  • Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
  • Huayang Li, Yixuan Su, Deng Cai, Yan Wang, Lemao Liu, 13 Feb 2022 (v2), A Survey on Retrieval-Augmented Text Generation, https://arxiv.org/abs/2202.01110
  • Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, Bin Cui, 21 Jun 2024 (v6), Retrieval-Augmented Generation for AI-Generated Content: A Survey, https://arxiv.org/abs/2402.19473
  • Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, Zhaofeng Liu, 3 Jul 2024 (v2), Evaluation of Retrieval-Augmented Generation: A Survey, https://arxiv.org/abs/2405.07437
  • Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang, 27 Mar 2024 (v5), Retrieval-Augmented Generation for Large Language Models: A Survey, https://arxiv.org/abs/2312.10997
  • Yucheng Hu, Yuxing Lu, 30 Apr 2024, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543
  • Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, 15 Jan 2025, Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG, https://arxiv.org/abs/2501.09136
  • Mingyue Cheng, Yucong Luo, Jie Ouyang, Qi Liu, Huijie Liu, Li Li, Shuo Yu, Bohou Zhang, Jiawei Cao, Jie Ma, Daoyu Wang, Enhong Chen, 17 Mar 2025 (v2), A Survey on Knowledge-Oriented Retrieval-Augmented Generation, https://arxiv.org/abs/2503.10677

RAG Best Practices

RAG best practices are practical guidelines on getting the most out of your RAG architecture. This can include accuracy improvements and efficiency optimizations. Research papers that examine the general state of RAG architectures in terms of their best practices include:

  • Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 1 Jul 2024, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219 Project: https://github.com/FudanDNN-NLP/RAG (Attempts to optimize the entire RAG system, including the various options for different RAG modules in the RAG pipeline, such as optimal methods for chunking, retrieval, embedding models, vector databases, prompt compression, reranking, repacking, summarizers, and other components.)
  • Florian June Aug 2024, The Best Practices of RAG: Typical RAG Process, Best Practices for Each Module, and Comprehensive Evaluation, https://pub.towardsai.net/the-best-practices-of-rag-300e313322e6
  • Siran Li, Linus Stenzel, Carsten Eickhoff, Seyed Ali Bahrainian, 13 Jan 2025, Enhancing Retrieval-Augmented Generation: A Study of Best Practices, https://arxiv.org/abs/2501.07391 https://github.com/ali-bahrainian/RAG_best_practices (Examines RAG best practices such as model size, prompt wording, chunk size, knowledge base size, and more.)
  • Harvey Bower, 2024, Debugging RAG Pipelines: Best Practices for High-Performance LLMs, https://www.amazon.com/dp/B0DNWN5RB1
  • Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, Arsalan Shahid, 30 Oct 2024 (v3), The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities, https://arxiv.org/abs/2408.13296

Chunking

Chunking is the splitting of documents into sections called "chunks" that are used as extra context for the LLM. Retrieving relevant chunks is very important for accurate RAG results, and the speed of a RAG system is also affected by the size of each chunk, as measured in tokens. Chunking is a complex problem that involves deciding where to split a document, such as at paragraph or section boundaries.
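A minimal chunking sketch is shown below, splitting at paragraph boundaries and merging paragraphs up to a maximum chunk size; the word-based size measure is a stand-in for a real token count.

    # Minimal chunking sketch (illustrative only): split a document at paragraph
    # boundaries, then merge paragraphs into chunks below a maximum size.
    # Chunk size is measured in words here; a real system would count tokens.

    def chunk_document(text: str, max_chunk_words: int = 200) -> list[str]:
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        chunks, current, current_len = [], [], 0
        for para in paragraphs:
            words = len(para.split())
            if current and current_len + words > max_chunk_words:
                chunks.append("\n\n".join(current))   # close the current chunk
                current, current_len = [], 0
            current.append(para)
            current_len += words
        if current:
            chunks.append("\n\n".join(current))
        return chunks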

Research papers on chunking:

  • Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 1 Jul 2024, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219 Project: https://github.com/FudanDNN-NLP/RAG (Attempts to optimize the entire RAG system, including the various options for different RAG modules in the RAG pipeline, such as optimal methods for chunking, retrieval, embedding models, vector databases, prompt compression, reranking, repacking, summarizers, and other components.)
  • Thuwarakesh Murallie, Aug 2024, How to Achieve Near Human-Level Performance in Chunking for RAGs: The costly yet powerful splitting technique for superior RAG retrieval, https://towardsdatascience.com/agentic-chunking-for-rags-091beccd94b1
  • Florian June, Sep 2024, Kotaemon Unveiled: Innovations in RAG Framework for Document QA: PDF Parsing, GraphRAG, Agent-Based Reasoning, and Insights, https://ai.gopubby.com/kotaemon-unveiled-innovations-in-rag-framework-for-document-qa-0b6d67e4b9b7
  • Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, Amina Terfai, Anoop Surya, Tracey Mercer, Vinodh Kumar Thanigachalam, Tamar Bar, Sanjana Krishnan, Samy Kilaru, Jasmine Jaksic, Nave Algarici, Jacob Liberman, Joey Conway, Sonu Nayyar, Justin Boitano, 10 Jul 2024, FACTS About Building Retrieval Augmented Generation-based Chatbots, NVIDIA Research, https://arxiv.org/abs/2407.07858
  • Brandon Smith, Anton Troynikov, July 03, 2024, Evaluating Chunking Strategies for Retrieval, Chroma Technical Report, https://research.trychroma.com/evaluating-chunking https://github.com/brandonstarxel/chunking_evaluation
  • Siran Li, Linus Stenzel, Carsten Eickhoff, Seyed Ali Bahrainian, 13 Jan 2025, Enhancing Retrieval-Augmented Generation: A Study of Best Practices, https://arxiv.org/abs/2501.07391 https://github.com/ali-bahrainian/RAG_best_practices (Examines RAG best practices such as model size, prompt wording, chunk size, knowledge base size, and more.)
  • Sergey Filimonov, Jan 15, 2025, Ingesting Millions of PDFs and why Gemini 2.0 Changes Everything, https://www.sergey.fyi/articles/gemini-flash-2
  • Andrew Neeser, Kaylen Latimer, Aadyant Khatri, Chris Latimer, Naren Ramakrishnan, 16 Feb 2025, QuOTE: Question-Oriented Text Embeddings, https://arxiv.org/abs/2502.10976 (Augmenting RAG chunks with additional information, such as questions the chunk might answer.)

Multimodal RAG

Multimodal RAG is the use of images in the datastore for chunk retrieval, and is also sometimes called "visual RAG." A common example of multimodal RAG is ingesting PDF documents in their native format, using image-based analysis, rather than converting them to text. The retriever in multimodal RAG may return images and/or text to be passed to the Multimodal LLM (MLLM) for inference. The final output from the visual RAG system may be text or images or both, as with any other use of a multimodal LLM.

Multimodal RAG is one of the newest areas of AI research, combining the recent advances in multimodal LLMs with the older RAG architectural styles. Research papers on multimodal RAG (visual RAG):

RAG Fusion

RAG fusion is a RAG extension that analyzes multiple versions of the query to return the best context chunks. The model generates several "reformulated" versions of the original text query, each of which is sent to the retriever, and a final "Reciprocal Rank Fusion" step combines all of the returned chunks into a single ranking, similar to a "reranker" component but operating over multiple similar rankings. The main advantage is finding more relevant context for the LLM; the downside is the many additional calls to the retriever database with slightly modified queries.
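The Reciprocal Rank Fusion step itself is simple to sketch: each chunk's fused score is the sum of 1/(k + rank) over the rankings it appears in, where k = 60 is the value commonly used in the RRF literature. The chunk names below are purely illustrative.

    # Reciprocal Rank Fusion (RRF) sketch: combine several ranked lists of chunks
    # (one per reformulated query) into a single fused ranking.
    # The constant k=60 is the value commonly used in the RRF literature.
    from collections import defaultdict

    def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
        scores = defaultdict(float)
        for ranked_chunks in rankings:
            for rank, chunk in enumerate(ranked_chunks, start=1):
                scores[chunk] += 1.0 / (k + rank)   # higher rank => larger contribution
        return sorted(scores, key=scores.get, reverse=True)

    # Example: three reformulated queries each returned a ranked chunk list.
    fused = reciprocal_rank_fusion([
        ["chunk_a", "chunk_b", "chunk_c"],
        ["chunk_b", "chunk_a", "chunk_d"],
        ["chunk_a", "chunk_d", "chunk_b"],
    ])
    # fused[0] is "chunk_a", the chunk ranked consistently highest across queries.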

Research on RAG fusion algorithms:

Super RAG

Super RAG is a generalization of retrieval that accepts more general types of information than naive RAG systems. Hence, a "super RAG" system is an embodiment of a more general type of RALM. Research papers on "super RAG" include:

Agentic RAG

Agentic RAG is the combination of agent and RAG technologies. Traditional RAG is a read-only use of extra context, but adding agent capabilities to the system allows a RAG-based application to perform tasks or actions.
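A minimal sketch of this read-write extension is shown below, reusing the hypothetical retrieve_chunks() and call_llm() placeholders from the earlier sketch, plus a hypothetical book_appointment() tool. Real agent frameworks use more robust tool-calling protocols than simple string matching.

    # Agentic RAG sketch (illustrative only): after retrieval, the LLM decides
    # whether to answer directly ("read-only") or to invoke a tool that performs
    # a real-world action ("read-write"). Reuses the hypothetical retrieve_chunks()
    # and call_llm() placeholders from the earlier sketch.

    def book_appointment(details: str) -> str:
        """Placeholder action tool; back this with a real booking integration."""
        raise NotImplementedError

    def agentic_rag(user_request: str) -> str:
        chunks = retrieve_chunks(user_request)
        prompt = (
            "Context:\n" + "\n\n".join(chunks) + "\n\n"
            f"Request: {user_request}\n"
            "If an appointment should be booked, reply exactly 'ACTION: book'. "
            "Otherwise, answer the request directly."
        )
        decision = call_llm(prompt)
        if decision.strip() == "ACTION: book":
            return book_appointment(user_request)   # the "read-write" agent step
        return decision                             # plain "read-only" RAG answer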

Papers on agentic RAG include:

Reranker Component in RAG

The reranker is a RAG component that aims to select and order the most relevant chunks for the LLM to use. The input is a set of chunks or documents from the retriever in a preliminary ordering, which are then "re-ranked" into a better order. The basic idea is as follows (a minimal sketch appears after the list):

  • Retriever returns several chunks
  • Reranker orders them in priority of relevance
  • Packer merges the chunks with the user's query and other global instructions
  • One final LLM request answers the user's question
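Here is a minimal reranker sketch under the assumption of a relevance_score() placeholder, which in practice would be a cross-encoder or LLM-based scoring model.

    # Reranker sketch (illustrative only): re-order retrieved chunks by a
    # relevance score against the query. relevance_score() is a hypothetical
    # placeholder for a cross-encoder or other scoring model.

    def relevance_score(query: str, chunk: str) -> float:
        """Placeholder: back this with a cross-encoder or LLM-based scorer."""
        raise NotImplementedError

    def rerank(query: str, retrieved_chunks: list[str], top_k: int = 3) -> list[str]:
        scored = [(relevance_score(query, c), c) for c in retrieved_chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)   # most relevant first
        return [chunk for _, chunk in scored[:top_k]]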

Here are some research papers specific to the reranker component:

Long Context RAG

Long context RAG, or simply "long RAG", is the use of LLM long context capabilities to improve RAG architectures. The simplest ideas include using bigger chunks or sending more chunks to the LLM, both of which give more tokens for the LLM to process as context. There is a lot of research on getting LLMs to run fast on long context inputs, and some of it is specifically related to RAG architectures.

Research papers on "long RAG" include:

  • Ziyan Jiang, Xueguang Ma, Wenhu Chen, June 2024, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs, arXiv preprint arXiv:2406.15319, https://arxiv.org/abs/2406.15319 (Improved accuracy performance of RAG methods when using a long context LLM and longer chunk sizes for the retriever.)
  • Qingfei Zhao, Ruobing Wang, Yukuo Cen, Daren Zha, Shicheng Tan, Yuxiao Dong, Jie Tang, 23 Oct 2024, LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering, https://arxiv.org/abs/2410.18050 https://github.com/QingFei1/LongRAG
  • Tan Yu, Anbang Xu, Rama Akkiraju, 3 Sep 2024, In Defense of RAG in the Era of Long-Context Language Models, https://arxiv.org/abs/2409.01666
  • Zixuan Li, Jing Xiong, Fanghua Ye, Chuanyang Zheng, Xun Wu, Jianqiao Lu, Zhongwei Wan, Xiaodan Liang, Chengming Li, Zhenan Sun, Lingpeng Kong, Ngai Wong, 3 Oct 2024, UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation, https://arxiv.org/abs/2410.02719
  • Bowen Jin, Jinsung Yoon, Jiawei Han, Sercan O. Arik, 8 Oct 2024, Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG, https://arxiv.org/abs/2410.05983
  • Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky, 6 Oct 2024, Inference Scaling for Long-Context Retrieval Augmented Generation, https://arxiv.org/abs/2410.04343
  • Contextual AI Team, March 19, 2024 Introducing RAG 2.0, https://contextual.ai/introducing-rag2/
  • Brian J Chan, Chao-Ting Chen, Jui-Hung Cheng, Hen-Hsen Huang, 20 Dec 2024, Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks, https://arxiv.org/abs/2412.15605 (Mini-RAG architecture preloading the entire knowledge into the LLM context and then using KV caching.)
  • Xinze Li, Yixin Cao, Yubo Ma, Aixin Sun, 27 Dec 2024, Long Context vs. RAG for LLMs: An Evaluation and Revisits, https://arxiv.org/abs/2501.01880 (Long context, summarization-based RAG, and classic chunked RAG have different strengths and weaknesses for different types of query.)
  • Kuicai Dong, Yujing Chang, Xin Deik Goh, Dexun Li, Ruiming Tang, Yong Liu, 15 Jan 2025, MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents, https://arxiv.org/abs/2501.08828
  • Salvatore Raieli, Jan 2025, Do Not Flip a Coin: When to Use RAG or Long Context LLMs, Understanding the Trade-offs and Best Practices for Optimizing LLMs with External Knowledge Sources, https://levelup.gitconnected.com/do-not-flip-a-coin-when-to-use-rag-or-long-context-llms-6f51a39de98c (Analysis of several papers that compare LC to RAG)
  • Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu, 16 May 2024 (v3), FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference, https://arxiv.org/abs/2405.04065
  • Isuru Lakshan Ekanayaka, Jan 2025, Retrieval-Augmented Generation (RAG) vs. Cache-Augmented Generation (CAG): A Deep Dive into Faster, Smarter Knowledge Integration, https://pub.towardsai.net/retrieval-augmented-generation-rag-vs-0b4bc63c1653
  • Dr. Ashish Bamania, Jan 10, 2025, Cache-Augmented Generation (CAG) Is Here To Replace RAG: A deep dive into how a novel technique called Cache-Augmented Generation (CAG) works and reduces/eliminates the need for Retrieval-Augmented Generation (RAG), https://levelup.gitconnected.com/cache-augmented-generation-cag-is-here-to-replace-rag-3d25c52360b2
  • Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 12 Apr 2021 (v4), Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, https://arxiv.org/abs/2005.11401
  • Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, Yiqun Liu, 27 Jan 2025, Parametric Retrieval Augmented Generation, https://arxiv.org/abs/2501.15915 https://github.com/oneal2000/prag (Parametric RAG (PRAG) is training the RAG documents into model parameters, rather than prepending documents using long context RAG, and this means a shorter inference token length.)
  • Xubin Ren, Lingrui Xu, Long Xia, Shuaiqiang Wang, Dawei Yin, Chao Huang, 3 Feb 2025, VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos, https://arxiv.org/abs/2502.01549 https://github.com/HKUDS/VideoRAG
  • Cristian Leo, Feb 2025, Don’t Do RAG: Cache is the future: CAG or RAG? Let’s explore Cached Augmented Generation, its math, and trade-offs. Let’s dig into its research paper to see what it excels at, and how you could leverage it. https://levelup.gitconnected.com/dont-do-rag-cache-is-the-future-d1e995f0c76f
  • Manpreet Singh, Feb 2025, Goodbye RAG? Gemini 2.0 Flash Have Just Killed It! https://ai.gopubby.com/goodbye-rag-gemini-2-0-flash-have-just-killed-it-96301113c01f
  • Kun Luo, Zheng Liu, Peitian Zhang, Hongjin Qian, Jun Zhao, Kang Liu, 17 Feb 2025, Does RAG Really Perform Bad For Long-Context Processing? https://arxiv.org/abs/2502.11444 (Long context RAG processing based on the KV cache data is similar to fused/substring KV caching methods.)
  • Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu, 24 Feb 2025, Thus Spake Long-Context Large Language Model, https://arxiv.org/abs/2502.17129 (Impressive survey of many techniques to improve efficiency and accuracy of long context processing in both inference and training, covering text, video and multimodal models.)
  • Guanzheng Chen, Qilong Feng, Jinjie Ni, Xin Li, Michael Qizhe Shieh, 27 Feb 2025, Long-Context Inference with Retrieval-Augmented Speculative Decoding, https://arxiv.org/abs/2502.20330

Mini-RAG

Mini-RAG is single-document RAG that stores the entirety of the knowledge base in the LLM's input context. The advantage of this architecture is that there is no need for a retriever component at all, but the disadvantages include the higher token count for inference and practical limitations on the size of the document being used. These efficiency constraints have been easing lately, via "long RAG" techniques based on LLM efficiency optimizations such as prefix KV caching.
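A mini-RAG request is easy to sketch because there is no retrieval step at all; the call_llm() function below is the same hypothetical placeholder used in the earlier sketches.

    # Mini-RAG sketch (illustrative only): no retriever component at all; the
    # entire knowledge document is prepended to every query. call_llm() is the
    # same hypothetical placeholder for an LLM API used in earlier sketches.

    def mini_rag_answer(document_text: str, user_question: str) -> str:
        prompt = (
            "Use only the document below to answer the question.\n\n"
            f"Document:\n{document_text}\n\n"
            f"Question: {user_question}\nAnswer:"
        )
        # The whole document is sent as input context on every request, so the
        # token count grows with document size unless the document's prefix KV
        # cache is reused across requests.
        return call_llm(prompt)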

Research papers on single-document RAG or "mini-RAG" include:

RAG Knowledge Graph

A RAG Knowledge Graph architecture, or a "RAG Graph," is a combination of RAG with a Knowledge Graph. Instead of returning text chunks, the retriever returns a structured "graph" that represents additional knowledge. The advantage of a graph is that it contains concept relationships such as hierarchies.
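A minimal sketch of graph-based retrieval is shown below; the Triple structure and the tiny example graph are purely illustrative, and a real system would query a graph database and traverse multiple hops.

    # RAG knowledge-graph sketch (illustrative only): instead of text chunks, the
    # retriever returns a small set of (subject, relation, object) triples around
    # the entities mentioned in the query, serialized into text for the LLM.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Triple:
        subject: str
        relation: str
        obj: str

    def retrieve_subgraph(query: str, graph: list[Triple]) -> list[Triple]:
        """Return triples whose subject or object is mentioned in the query."""
        q = query.lower()
        return [t for t in graph if t.subject.lower() in q or t.obj.lower() in q]

    def serialize_subgraph(triples: list[Triple]) -> str:
        """Flatten the subgraph into text that can be prepended as LLM context."""
        return "\n".join(f"{t.subject} -[{t.relation}]-> {t.obj}" for t in triples)

    # Example: a query about "Golden Retriever" pulls in its relationships,
    # including the hierarchy link "Golden Retriever -[is_a]-> Dog".
    graph = [
        Triple("Golden Retriever", "is_a", "Dog"),
        Triple("Golden Retriever", "origin", "Scotland"),
        Triple("Dog", "is_a", "Mammal"),
    ]
    context = serialize_subgraph(retrieve_subgraph("Tell me about the Golden Retriever", graph))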

Research on RAG with Knowledge Graphs:

Ontology RAG

Ontology-based RAG is the use of a special type of Knowledge Graph, known as an "ontology" or "taxonomy" of the concept space. Extra information can be extracted from the taxonomy as a special type of retrieval for RAG-based systems. The advantage is the ability to better capture structured information and hierarchical relationships between concepts in the ontology.
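A minimal sketch of ontology-based query expansion is shown below; the toy taxonomy is purely illustrative, and a real ontology would be far larger and support richer relations than a single parent link.

    # Ontology-based query expansion sketch (illustrative only): before retrieval,
    # expand the query with broader concepts from a small taxonomy so the
    # retriever can also match documents that use related terminology.

    # Hypothetical toy taxonomy: child concept -> parent concept.
    PARENT = {
        "golden retriever": "dog",
        "dog": "animal",
        "siamese": "cat",
        "cat": "animal",
    }

    def ancestors(term: str) -> list[str]:
        """Walk up the taxonomy collecting broader concepts."""
        chain = []
        while term in PARENT:
            term = PARENT[term]
            chain.append(term)
        return chain

    def expand_query(query: str) -> str:
        """Append ancestor concepts of any taxonomy terms found in the query."""
        extra = []
        for term in PARENT:
            if term in query.lower():
                extra.extend(ancestors(term))
        return query if not extra else query + " " + " ".join(dict.fromkeys(extra))

    # expand_query("best food for a golden retriever")
    # -> "best food for a golden retriever dog animal"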

Research papers on LLMs and Ontologies include:

RAG Caching

RAG caching is the use of caching optimizations to improve the latency and speed of a RAG system. Several components in a RAG architecture can be optimized with a cache. The retrieval component can use all of the types of caching applicable to whatever database or datastore architecture it uses, irrespective of whether it's a keyword or vector lookup, and whether the data is stored on disk or cached in memory. All of these different retrieval options can have a cache. At the bottom level of the LLM, there are various KV caching techniques (see further below). At the topmost level, there can be an overall cache via an "inference cache" for exactly identical queries, or a "semantic cache" for similar queries.
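As an illustration of the topmost level, here is a sketch of an exact-match inference cache keyed on a hash of the full prompt; call_llm() is again a hypothetical placeholder, and a semantic cache would instead compare query embeddings against cached queries and reuse an answer when similarity exceeds a threshold.

    # Inference-cache sketch (illustrative only): an exact-match cache keyed on a
    # hash of the full prompt, checked before calling the LLM at all.
    import hashlib

    _answer_cache: dict[str, str] = {}

    def cached_llm_call(prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in _answer_cache:
            return _answer_cache[key]      # cache hit: skip LLM inference entirely
        answer = call_llm(prompt)          # hypothetical LLM call from earlier sketches
        _answer_cache[key] = answer
        return answer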

Research papers on RAG cache architectures:

RAG KV Caching Optimizations

KV caching optimizations are the storing of key-value (KV) attention data from LLM inference for reuse in subsequent inference requests in a RAG system. In addition to RAG caches, such as retrieval caches, there are various LLM cache methods. Several of the many types of KV caching optimizations can optimize RAG architectures (and other LLM use cases). The main KV cache techniques involve precomputed caches for RAG chunks, such as prefix caching or session caching. More information is available:
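A minimal sketch of prefix KV caching for RAG chunks is shown below; prefill_kv() and decode_with_kv() are hypothetical stand-ins for an inference engine's internals, which are not normally exposed at this level in hosted LLM APIs.

    # Prefix KV cache sketch (illustrative only): precompute and store the KV
    # cache for each RAG chunk so that requests prepending the same chunk can
    # skip prefill over those tokens. prefill_kv() and decode_with_kv() are
    # hypothetical stand-ins for your inference engine's internals.
    import hashlib

    _kv_cache: dict[str, object] = {}

    def prefill_kv(text: str) -> object:
        """Placeholder: run prefill over `text` and return its KV cache state."""
        raise NotImplementedError

    def decode_with_kv(kv_state: object, suffix: str) -> str:
        """Placeholder: continue generation from a cached prefix KV state."""
        raise NotImplementedError

    def rag_infer_with_prefix_cache(chunk: str, question: str) -> str:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in _kv_cache:
            _kv_cache[key] = prefill_kv(chunk)     # pay the prefill cost once per chunk
        # Only the question tokens need prefill now; the chunk's KV data is reused.
        return decode_with_kv(_kv_cache[key], "\n\nQuestion: " + question + "\nAnswer:")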

Other general types of caching that apply to any LLM system, and can be used with RAG:

RAG Optimization Research Papers

Research papers on optimization of RAG architectures:

General Research Papers on RAG

There are rather a lot of research papers on RAG, as it's a fundamental underpinning technique of generative AI. Here are a few of them:

Advanced RAG

Research papers on advanced RAG architectures:

More AI Research

Read more about: