Large Language Models (LLMs) have transformed Natural Language Processing (NLP), especially Question Answering (QA). However, LLMs still produce answers that are inaccurate or unsupported by evidence, a problem known as “hallucination.” Retrieval-Augmented Generation (RAG) is a promising way to address this issue: it supplements the model’s internal knowledge with information retrieved from external sources at query time.

CRAG Benchmark: A Practical Solution

CRAG (Comprehensive RAG Benchmark) is a benchmark designed to evaluate RAG solutions. It contains QA pairs from multiple domains, covering different question types and entities of varying popularity. To keep the data realistic and reliable, the questions are manually verified and paraphrased. CRAG also simulates web retrieval and knowledge graph access, so RAG systems can be tested against both unstructured and structured sources.

The benchmark defines three tasks, which evaluate the web retrieval, structured querying, and summarization capabilities of RAG solutions. Together, these tasks assess how well a system can generate accurate answers from external data sources.
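
To make the idea of evaluating answers against external sources concrete, here is a minimal sketch of a toy evaluation loop. It is not CRAG's actual code or API: the mock web pages, the mock knowledge graph, the names (`retrieve_web`, `toy_rag_answer`, `score`, `evaluate`), and the scoring rule (+1 for a correct answer, 0 for abstaining, -1 for a wrong answer) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str


# Hypothetical stand-ins for external sources (not real benchmark data).
MOCK_WEB_PAGES = [
    "The Eiffel Tower is located in Paris and opened in 1889.",
    "Mount Everest is the highest mountain above sea level.",
]
MOCK_KG = {("Eiffel Tower", "opened"): "1889"}


def retrieve_web(question, pages, k=1):
    """Toy web retrieval: rank pages by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        pages,
        key=lambda page: len(q_words & set(page.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_rag_answer(question):
    """Toy structured query: answer from the mock KG, otherwise abstain."""
    q = question.lower()
    for (entity, relation), value in MOCK_KG.items():
        if entity.lower() in q and relation in q:
            return value
    return "I don't know"


def score(prediction, truth):
    """Assumed rule: +1 correct, 0 abstention, -1 wrong (hallucinated)."""
    pred = prediction.strip().lower()
    if pred == "i don't know":
        return 0
    return 1 if pred == truth.strip().lower() else -1


def evaluate(system, examples):
    """Average score of a system (a question -> answer function)."""
    return sum(score(system(e.question), e.answer) for e in examples) / len(examples)


if __name__ == "__main__":
    examples = [
        Example("When was the Eiffel Tower opened?", "1889"),
        Example("Who wrote Hamlet?", "Shakespeare"),
    ]
    print(retrieve_web(examples[0].question, MOCK_WEB_PAGES))
    print(evaluate(toy_rag_answer, examples))
```

Under this assumed rule, a system that abstains when it lacks evidence scores higher than one that guesses wrongly, which is one way an evaluation can surface hallucination rather than hide it.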

Evaluations on CRAG expose the limitations of existing RAG solutions, which makes the benchmark a valuable tool for driving progress toward trustworthy question-answering systems.

CRAG: Driving AI Research and Development

The researchers behind CRAG plan to keep enhancing and expanding the benchmark to address emerging challenges, including multilingual questions and multimodal inputs. This ongoing development is intended to keep CRAG relevant to RAG research and to new needs in reliable language generation.

