Ensuring the reliability of AI-generated content has become a core concern for businesses and developers alike. This article shows how to build a groundedness verification tool with the Upstage API and LangChain, designed to help AI developers, data scientists, and business managers verify that AI outputs are actually supported by their source material.
Understanding the Target Audience
This tutorial is written for AI developers, data scientists, and business managers who need AI-generated content they can trust. These professionals face a common challenge: AI outputs can sound confident while being unsupported by the underlying sources, which undermines decision-making. The goal is to raise the credibility of AI systems without sacrificing efficiency in content generation, so the tutorial favors clear explanations and practical, runnable examples.
Introduction to Upstage’s Groundedness Check Service
Upstage’s Groundedness Check service offers a robust API that allows users to verify whether AI-generated responses are anchored in reliable source material. By submitting context–answer pairs to the Upstage endpoint, users can determine if the provided context supports a given answer and receive a confidence assessment of that grounding. This tutorial will walk you through utilizing Upstage’s core capabilities, including single-shot verification, batch processing, and multi-domain testing, to ensure that AI systems produce factual and trustworthy content across various subject areas.
Setting Up the Environment
To get started, you need to install the necessary packages:
pip install -qU langchain-core langchain-upstage
Next, set your Upstage API key in the environment to authenticate all subsequent groundedness check requests:
import os
os.environ["UPSTAGE_API_KEY"] = "Use Your API Key Here"
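With the key in place, a quick smoke test confirms the setup before building anything more elaborate. The sketch below makes a single direct call with UpstageGroundednessCheck, the same client the wrapper class uses later. The context and answer are placeholder examples, and the label strings mentioned in the comment (such as "grounded" or "notGrounded") follow Upstage's documented response values; verify them against the current documentation if your output differs:

from langchain_upstage import UpstageGroundednessCheck

# Minimal smoke test: send one context-answer pair straight to the API.
groundedness_check = UpstageGroundednessCheck()
response = groundedness_check.invoke({
    "context": "Paris is the capital of France.",
    "answer": "The capital of France is Paris.",
})
print(response)  # typically a label such as "grounded" or "notGrounded"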
Creating the AdvancedGroundednessChecker Class
The AdvancedGroundednessChecker class wraps Upstage's groundedness API in a reusable interface. It supports both single and batch context–answer checks while accumulating results, and it includes methods to extract a confidence label from each response and to compute overall accuracy statistics across all checks.
from typing import Any, Dict, List

from langchain_upstage import UpstageGroundednessCheck


class AdvancedGroundednessChecker:
    """Reusable wrapper around Upstage's groundedness check that accumulates results."""

    def __init__(self):
        self.checker = UpstageGroundednessCheck()
        self.results: List[Dict[str, Any]] = []

    def check_single(self, context: str, answer: str) -> Dict[str, Any]:
        """Check one context-answer pair and record the outcome."""
        request = {"context": context, "answer": answer}
        response = self.checker.invoke(request)
        result = {
            "context": context,
            "answer": answer,
            "grounded": response,
            "confidence": self._extract_confidence(response),
        }
        self.results.append(result)
        return result

    def batch_check(self, test_cases: List[Dict[str, str]]) -> List[Dict[str, Any]]:
        """Run check_single over a list of {"context": ..., "answer": ...} dicts."""
        return [self.check_single(case["context"], case["answer"]) for case in test_cases]

    def _extract_confidence(self, response) -> str:
        """Map the raw response to a coarse confidence label.

        The negative label is tested first, because "notGrounded"
        (or "not grounded") also contains the substring "grounded".
        """
        if hasattr(response, "lower"):
            text = response.lower().replace(" ", "")
            if "notgrounded" in text:
                return "low"
            if "grounded" in text:
                return "high"
        return "medium"

    def analyze_results(self) -> Dict[str, Any]:
        """Summarize all checks run so far."""
        total = len(self.results)
        # Count via the confidence label so that "notGrounded" responses
        # are not miscounted as grounded by a naive substring match.
        grounded = sum(1 for r in self.results if r["confidence"] == "high")
        return {
            "total_checks": total,
            "grounded_count": grounded,
            "not_grounded_count": total - grounded,
            "accuracy_rate": grounded / total if total > 0 else 0,
        }
Running Groundedness Checks
First create a shared checker instance, then run a few single groundedness checks; the comments note what each case is designed to probe:
# One shared instance; every check below is recorded for later analysis.
checker = AdvancedGroundednessChecker()

# The context never states a height, so this answer should not be grounded.
result1 = checker.check_single(
    context="Mauna Kea is an inactive volcano on the island of Hawai'i.",
    answer="Mauna Kea is 5,207.3 meters tall."
)

# Partially supported: the creator matches, but code readability is not in the context.
result2 = checker.check_single(
    context="Python is a high-level programming language created by Guido van Rossum in 1991.",
    answer="Python was made by Guido van Rossum & focuses on code readability."
)

# A loose paraphrase of the context.
result3 = checker.check_single(
    context="The Great Wall of China is approximately 13,000 miles long.",
    answer="The Great Wall of China is very long."
)

# Directly contradicts the context (90 vs. 100 degrees Celsius).
result4 = checker.check_single(
    context="Water boils at 100 degrees Celsius at sea level atmospheric pressure.",
    answer="Water boils at 90 degrees Celsius at sea level."
)
Batch Processing Example
Batch processing allows for multiple checks at once:
test_cases = [
    {
        "context": "Shakespeare wrote Romeo and Juliet in the late 16th century.",
        "answer": "Romeo and Juliet was written by Shakespeare."
    },
    {
        "context": "The speed of light is approximately 299,792,458 meters per second.",
        "answer": "Light travels at about 300,000 kilometers per second."
    },
    {
        "context": "Earth has one natural satellite called the Moon.",
        "answer": "Earth has two moons."
    }
]
batch_results = checker.batch_check(test_cases)
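batch_check returns one result dictionary per test case, in input order, so a short loop pairs each verdict with the answer it judged:

for case, result in zip(test_cases, batch_results):
    print(f"{result['confidence']:>6}: {case['answer']}")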
Results Analysis
After running the checks, you can analyze the results:
analysis = checker.analyze_results()
print(f"Total checks performed: {analysis['total_checks']}")
print(f"Grounded responses: {analysis['grounded_count']}")
print(f"Not grounded responses: {analysis['not_grounded_count']}")
print(f"Groundedness rate: {analysis['accuracy_rate']:.2%}")
Multi-domain Testing
Conduct multi-domain validations to illustrate how Upstage handles groundedness across different subject areas:
domains = {
    "Science": {
        "context": "Photosynthesis is the process by which plants convert sunlight, carbon dioxide, & water into glucose and oxygen.",
        "answer": "Plants use photosynthesis to make food from sunlight and CO2."
    },
    "History": {
        "context": "World War II ended in 1945 after the surrender of Japan following the atomic bombings.",
        "answer": "WWII ended in 1944 with Germany's surrender."
    },
    "Geography": {
        "context": "Mount Everest is the highest mountain on Earth, located in the Himalayas at 8,848.86 meters.",
        "answer": "Mount Everest is the tallest mountain and is located in the Himalayas."
    }
}
for domain, test_case in domains.items():
    result = checker.check_single(test_case["context"], test_case["answer"])
    print(f"{domain}: {result['grounded']} (confidence: {result['confidence']})")
Creating a Test Report
To generate a detailed test report summarizing the performance:
def create_test_report(checker_instance):
    """Bundle the summary statistics, raw results, and simple recommendations."""
    report = {
        "summary": checker_instance.analyze_results(),
        "detailed_results": checker_instance.results,
        "recommendations": []
    }
    accuracy = report["summary"]["accuracy_rate"]
    # Simple heuristics: flag low accuracy, acknowledge high accuracy;
    # scores in between produce no recommendation. Tune to your quality bar.
    if accuracy < 0.7:
        report["recommendations"].append("Consider reviewing answer generation process")
    if accuracy > 0.9:
        report["recommendations"].append("High accuracy - system performing well")
    return report
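Calling the helper on the shared checker instance bundles everything produced so far; here we simply print the summary and any recommendations, though the report dictionary can just as easily be serialized to JSON for logging:

report = create_test_report(checker)
print(report["summary"])
for recommendation in report["recommendations"]:
    print("-", recommendation)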
Conclusion
This tutorial walked through groundedness checking with Upstage: a reusable wrapper class, single and batch verification, multi-domain testing, and results analysis and reporting. With Upstage's Groundedness Check, users gain a scalable, domain-agnostic solution for real-time fact verification and confidence scoring. By integrating this service into their workflows, organizations can enhance the reliability of AI-generated outputs and maintain rigorous standards of factual integrity across all applications. For further exploration, check out the Upstage website for more resources and documentation.
FAQ
- What is the purpose of the Groundedness Check service? The Groundedness Check service verifies if AI-generated responses are based on reliable sources.
- Who can benefit from this tool? AI developers, data scientists, and business managers looking to ensure the accuracy of AI outputs can benefit from this tool.
- How does batch processing work? Batch processing allows users to check multiple context-answer pairs at once, streamlining the verification process.
- What should I do if the accuracy rate is low? If the accuracy rate is below 70%, it is advisable to review the answer generation process.
- Can this tool be used across different domains? Yes, the tool is designed to handle groundedness checks across various subject areas effectively.