Aussie AI

RAG Evaluation

  • Last Updated 30 December, 2024
  • by David Spuler, Ph.D.

RAG evaluation is the analysis of an LLM-based RAG architecture as a whole. Conventional model evaluation, by contrast, examines only the model. A typical RAG system includes not only an LLM but also a vector database of document chunks and an orchestrator component. Advanced RAG architectures often add further components, such as a keyword search datastore, a reranker, a packer, and others.
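A RAG system has components beyond the LLM, so one part of evaluating it is scoring the retriever on its own. The following is a minimal sketch under stated assumptions. It assumes a small hand-labeled test set that maps each query ID to the IDs of the chunks that should be retrieved. The function names and data layout are hypothetical. Recall@k and mean reciprocal rank (MRR) are standard information-retrieval metrics chosen for illustration; this page does not prescribe them.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs found among the top-k retrieved chunks."""
    if not relevant:
        return 0.0
    # Use a set so duplicate chunk IDs in the ranking are not double-counted.
    hits = set(retrieved[:k]) & relevant
    return len(hits) / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retriever(
    results: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Average recall@k and mean reciprocal rank (MRR) over all labeled queries.

    results: query ID -> ranked chunk IDs returned by the retriever (hypothetical format)
    gold:    query ID -> chunk IDs labeled as relevant by a human (hypothetical format)
    """
    if not gold:
        raise ValueError("gold set must contain at least one query")
    n = len(gold)
    # A query with no retriever output is scored as a complete miss.
    recall = sum(recall_at_k(results.get(q, []), rel, k) for q, rel in gold.items()) / n
    mrr = sum(reciprocal_rank(results.get(q, []), rel) for q, rel in gold.items()) / n
    return {f"recall@{k}": recall, "mrr": mrr}


if __name__ == "__main__":
    # Toy, made-up data purely for illustration.
    results = {"q1": ["c3", "c7", "c1"], "q2": ["c9", "c2", "c4"]}
    gold = {"q1": {"c1", "c3"}, "q2": {"c4"}}
    print(evaluate_retriever(results, gold, k=2))
```

With k=2, query q1 finds one of its two relevant chunks in the top two (recall 0.5) and q2 finds none (recall 0.0), so recall@2 is 0.25. The first relevant chunks appear at ranks 1 and 3, giving an MRR of about 0.667. Scoring the retriever separately in this way can help show whether a poor answer came from retrieval or from generation.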


Research on RAG Evaluation

  • Shahul Es, Jithin James, Luis Espinosa-Anke, Steven Schockaert, 26 Sep 2023, RAGAS: Automated Evaluation of Retrieval Augmented Generation, https://arxiv.org/abs/2309.15217
  • Shangeetha Sivasothy, Scott Barnett, Stefanus Kurniawan, Zafaryab Rasool, Rajesh Vasa, 24 Sep 2024, RAGProbe: An Automated Approach for Evaluating RAG Applications, https://arxiv.org/abs/2409.19019
  • Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei Zaharia, 31 Mar 2024 (v2), ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems, https://arxiv.org/abs/2311.09476
  • Kevin Wu, Eric Wu, James Zou, 10 Jun 2024 (v2), ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence, https://arxiv.org/abs/2404.10198
  • Galla, D., Hoda, S., Zhang, M., Quan, W., Yang, T.D., Voyles, J. (2024). CoURAGE: A Framework to Evaluate RAG Systems. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14763. Springer, Cham. https://doi.org/10.1007/978-3-031-70242-6_37 https://link.springer.com/chapter/10.1007/978-3-031-70242-6_37
  • Rafael Teixeira de Lima, Shubham Gupta, Cesar Berrospi, Lokesh Mishra, Michele Dolfi, Peter Staar, Panagiotis Vagenas, 29 Nov 2024, Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems, IBM Research, https://arxiv.org/abs/2411.19710
  • Lilian Weng, July 7, 2024, Extrinsic Hallucinations in LLMs, https://lilianweng.github.io/posts/2024-07-07-hallucination/
  • Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, Amina Terfai, Anoop Surya, Tracey Mercer, Vinodh Kumar Thanigachalam, Tamar Bar, Sanjana Krishnan, Samy Kilaru, Jasmine Jaksic, Nave Algarici, Jacob Liberman, Joey Conway, Sonu Nayyar, Justin Boitano, 10 Jul 2024, FACTS About Building Retrieval Augmented Generation-based Chatbots, NVIDIA Research, https://arxiv.org/abs/2407.07858
  • Contextual AI Team, March 19, 2024, Introducing RAG 2.0, https://contextual.ai/introducing-rag2/
  • Angels Balaguer, Vinamra Benara, Renato Luiz de Freitas Cunha, Roberto de M. Estevão Filho, Todd Hendry, Daniel Holstein, Jennifer Marsman, Nick Mecklenburg, Sara Malvar, Leonardo O. Nunes, Rafael Padilha, Morris Sharp, Bruno Silva, Swati Sharma, Vijay Aski, Ranveer Chandra, 30 Jan 2024 (v3), RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture, https://arxiv.org/abs/2401.08406
