Ensuring the reliability of AI-generated content has become a core concern for businesses and developers alike. This article shows how to build a groundedness verification tool with the Upstage API and LangChain, designed to help AI developers, data scientists, and business managers verify that AI outputs are actually supported by their source material.
Understanding the Target Audience
This tutorial is written for AI developers, data scientists, and business managers who need AI-generated content they can trust. These professionals face a common challenge: AI outputs can sound confident while being unsupported by the underlying sources, which undermines decision-making. The goal is to raise the credibility of AI systems without sacrificing efficiency in content generation, so the tutorial favors clear explanations and practical, runnable examples.
Introduction to Upstage’s Groundedness Check Service
Upstage’s Groundedness Check service offers a robust API that allows users to verify whether AI-generated responses are anchored in reliable source material. By submitting context–answer pairs to the Upstage endpoint, users can determine if the provided context supports a given answer and receive a confidence assessment of that grounding. This tutorial will walk you through utilizing Upstage’s core capabilities, including single-shot verification, batch processing, and multi-domain testing, to ensure that AI systems produce factual and trustworthy content across various subject areas.
Setting Up the Environment
To get started, you need to install the necessary packages:
pip install -qU langchain-core langchain-upstage
Next, set your Upstage API key in the environment to authenticate all subsequent groundedness check requests:
import os
os.environ["UPSTAGE_API_KEY"] = "Use Your API Key Here"
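With the key in place, a quick smoke test confirms the setup before building anything more elaborate. The sketch below makes a single direct call with UpstageGroundednessCheck, the same client the wrapper class uses later. The context and answer are placeholder examples, and the label strings mentioned in the comment (such as "grounded" or "notGrounded") follow Upstage's documented response values; verify them against the current documentation if your output differs:

from langchain_upstage import UpstageGroundednessCheck

# Minimal smoke test: send one context-answer pair straight to the API.
groundedness_check = UpstageGroundednessCheck()
response = groundedness_check.invoke({
    "context": "Paris is the capital of France.",
    "answer": "The capital of France is Paris.",
})
print(response)  # typically a label such as "grounded" or "notGrounded"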
Creating the AdvancedGroundednessChecker Class
The AdvancedGroundednessChecker class wraps Upstage's groundedness API in a reusable interface. It supports both single and batch context–answer checks while accumulating results, and it includes methods to extract a confidence label from each response and to compute overall accuracy statistics across all checks.
from typing import Any, Dict, List

from langchain_upstage import UpstageGroundednessCheck


class AdvancedGroundednessChecker:
    """Reusable wrapper around Upstage's groundedness check that accumulates results."""

    def __init__(self):
        self.checker = UpstageGroundednessCheck()
        self.results: List[Dict[str, Any]] = []

    def check_single(self, context: str, answer: str) -> Dict[str, Any]:
        """Check one context-answer pair and record the outcome."""
        request = {"context": context, "answer": answer}
        response = self.checker.invoke(request)
        result = {
            "context": context,
            "answer": answer,
            "grounded": response,
            "confidence": self._extract_confidence(response),
        }
        self.results.append(result)
        return result

    def batch_check(self, test_cases: List[Dict[str, str]]) -> List[Dict[str, Any]]:
        """Run check_single over a list of {"context": ..., "answer": ...} dicts."""
        return [self.check_single(case["context"], case["answer"]) for case in test_cases]

    def _extract_confidence(self, response) -> str:
        """Map the raw response to a coarse confidence label.

        The negative label is tested first, because "notGrounded"
        (or "not grounded") also contains the substring "grounded".
        """
        if hasattr(response, "lower"):
            text = response.lower().replace(" ", "")
            if "notgrounded" in text:
                return "low"
            if "grounded" in text:
                return "high"
        return "medium"

    def analyze_results(self) -> Dict[str, Any]:
        """Summarize all checks run so far."""
        total = len(self.results)
        # Count via the confidence label so that "notGrounded" responses
        # are not miscounted as grounded by a naive substring match.
        grounded = sum(1 for r in self.results if r["confidence"] == "high")
        return {
            "total_checks": total,
            "grounded_count": grounded,
            "not_grounded_count": total - grounded,
            "accuracy_rate": grounded / total if total > 0 else 0,
        }
Running Groundedness Checks
First create a shared checker instance, then run a few single groundedness checks; the comments note what each case is designed to probe:
# One shared instance; every check below is recorded for later analysis.
checker = AdvancedGroundednessChecker()

# The context never states a height, so this answer should not be grounded.
result1 = checker.check_single(
    context="Mauna Kea is an inactive volcano on the island of Hawai'i.",
    answer="Mauna Kea is 5,207.3 meters tall."
)

# Partially supported: the creator matches, but code readability is not in the context.
result2 = checker.check_single(
    context="Python is a high-level programming language created by Guido van Rossum in 1991.",
    answer="Python was made by Guido van Rossum & focuses on code readability."
)

# A loose paraphrase of the context.
result3 = checker.check_single(
    context="The Great Wall of China is approximately 13,000 miles long.",
    answer="The Great Wall of China is very long."
)

# Directly contradicts the context (90 vs. 100 degrees Celsius).
result4 = checker.check_single(
    context="Water boils at 100 degrees Celsius at sea level atmospheric pressure.",
    answer="Water boils at 90 degrees Celsius at sea level."
)
Batch Processing Example
Batch processing allows for multiple checks at once:
test_cases = [
    {
        "context": "Shakespeare wrote Romeo and Juliet in the late 16th century.",
        "answer": "Romeo and Juliet was written by Shakespeare."
    },
    {
        "context": "The speed of light is approximately 299,792,458 meters per second.",
        "answer": "Light travels at about 300,000 kilometers per second."
    },
    {
        "context": "Earth has one natural satellite called the Moon.",
        "answer": "Earth has two moons."
    }
]
batch_results = checker.batch_check(test_cases)
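batch_check returns one result dictionary per test case, in input order, so a short loop pairs each verdict with the answer it judged:

for case, result in zip(test_cases, batch_results):
    print(f"{result['confidence']:>6}: {case['answer']}")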
Results Analysis
After running the checks, you can analyze the results:
analysis = checker.analyze_results()
print(f"Total checks performed: {analysis['total_checks']}")
print(f"Grounded responses: {analysis['grounded_count']}")
print(f"Not grounded responses: {analysis['not_grounded_count']}")
print(f"Groundedness rate: {analysis['accuracy_rate']:.2%}")
Multi-domain Testing
Conduct multi-domain validations to illustrate how Upstage handles groundedness across different subject areas:
domains = {
    "Science": {
        "context": "Photosynthesis is the process by which plants convert sunlight, carbon dioxide, & water into glucose and oxygen.",
        "answer": "Plants use photosynthesis to make food from sunlight and CO2."
    },
    "History": {
        "context": "World War II ended in 1945 after the surrender of Japan following the atomic bombings.",
        "answer": "WWII ended in 1944 with Germany's surrender."
    },
    "Geography": {
        "context": "Mount Everest is the highest mountain on Earth, located in the Himalayas at 8,848.86 meters.",
        "answer": "Mount Everest is the tallest mountain and is located in the Himalayas."
    }
}
for domain, test_case in domains.items():
    result = checker.check_single(test_case["context"], test_case["answer"])
    print(f"{domain}: {result['grounded']} (confidence: {result['confidence']})")
Creating a Test Report
To generate a detailed test report summarizing the performance:
def create_test_report(checker_instance):
    """Bundle the summary statistics, raw results, and simple recommendations."""
    report = {
        "summary": checker_instance.analyze_results(),
        "detailed_results": checker_instance.results,
        "recommendations": []
    }
    accuracy = report["summary"]["accuracy_rate"]
    # Simple heuristics: flag low accuracy, acknowledge high accuracy;
    # scores in between produce no recommendation. Tune to your quality bar.
    if accuracy < 0.7:
        report["recommendations"].append("Consider reviewing answer generation process")
    if accuracy > 0.9:
        report["recommendations"].append("High accuracy - system performing well")
    return report
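Calling the helper on the shared checker instance bundles everything produced so far; here we simply print the summary and any recommendations, though the report dictionary can just as easily be serialized to JSON for logging:

report = create_test_report(checker)
print(report["summary"])
for recommendation in report["recommendations"]:
    print("-", recommendation)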
Conclusion
This tutorial walked through groundedness checking with Upstage: a reusable wrapper class, single and batch verification, multi-domain testing, and results analysis and reporting. With Upstage's Groundedness Check, users gain a scalable, domain-agnostic solution for real-time fact verification and confidence scoring. By integrating this service into their workflows, organizations can enhance the reliability of AI-generated outputs and maintain rigorous standards of factual integrity across all applications. For further exploration, check out the Upstage website for more resources and documentation.
FAQ
- What is the purpose of the Groundedness Check service? The Groundedness Check service verifies if AI-generated responses are based on reliable sources.
- Who can benefit from this tool? AI developers, data scientists, and business managers looking to ensure the accuracy of AI outputs can benefit from this tool.
- How does batch processing work? Batch processing allows users to check multiple context-answer pairs at once, streamlining the verification process.
- What should I do if the accuracy rate is low? If the accuracy rate is below 70%, it is advisable to review the answer generation process.
- Can this tool be used across different domains? Yes, the tool is designed to handle groundedness checks across various subject areas effectively.