Woodpecker is a new approach that aims to fix hallucinations in Multimodal Large Language Models (MLLMs), such as GPT-4V. Just as a text-based LLM can check its output against internet data, Woodpecker checks an MLLM’s generated description against evidence taken from the image itself. It builds a visual knowledge base from the image and uses it to correct and enhance the MLLM’s initial description. Test results show significant accuracy improvements. If integrated into products like ChatGPT, Woodpecker could improve visual performance and benefit decision-making systems that rely on visual descriptions.
Multimodal Large Language Models (MLLM) and Woodpecker: Fixing Hallucinations
MLLMs like GPT-4V are great at analyzing and describing images, but they sometimes describe things that are not actually in the image, a failure known as hallucination. Woodpecker is a new approach for correcting these hallucinations.
When you ask an MLLM to describe a photo, it can usually identify objects and accurately describe the scene. However, it sometimes makes assumptions based on common associations.
For example, an MLLM might describe a photo of a shopfront and mention people in the scene even though there are none.
Fixing hallucinations in text-based LLMs is an ongoing challenge, but it becomes easier when the model is connected to the internet. By checking the generated text against relevant internet data, the model can self-correct when necessary.
Scientists from Tencent’s YouTu Lab and the University of Science and Technology of China have developed a visual counterpart to this approach called Woodpecker. Instead of checking text against internet data, it checks the description against the image.
How Woodpecker Works
Woodpecker uses an LLM such as GPT-3.5 Turbo to analyze the initial description generated by the MLLM and extract key concepts, such as objects and attributes. For example, in the sentence “The man is wearing a black hat,” the objects “man” and “hat” are extracted.
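To make the extraction step more concrete, here is a minimal sketch of how a prompt could be built and how the reply could be parsed. The prompt wording, the comma-separated reply format, and the function names build_extraction_prompt and parse_concepts are assumptions made for illustration, not the prompt or code used by the Woodpecker authors.

```python
from typing import List

# Hypothetical prompt wording; the article does not give the real prompt.
EXTRACTION_PROMPT = (
    "List the main objects mentioned in the sentence below, "
    "separated by commas.\n"
    "Sentence: {sentence}\n"
    "Objects:"
)


def build_extraction_prompt(sentence: str) -> str:
    """Fill the assumed prompt template with one sentence of the description."""
    return EXTRACTION_PROMPT.format(sentence=sentence)


def parse_concepts(llm_reply: str) -> List[str]:
    """Turn a comma-separated reply such as 'man, hat' into a clean list.

    Entries are trimmed and lowercased; empty entries and duplicates are
    dropped while the original order is kept.
    """
    seen = set()
    concepts = []
    for raw in llm_reply.split(","):
        name = raw.strip().rstrip(".").strip().lower()
        if name and name not in seen:
            seen.add(name)
            concepts.append(name)
    return concepts


if __name__ == "__main__":
    print(build_extraction_prompt("The man is wearing a black hat."))
    # A reply an LLM might plausibly give (not real model output).
    print(parse_concepts("Man, hat, hat."))
```

In practice the prompt string would be sent to whichever LLM you use, and the parser would need to match whatever reply format that prompt asks for.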
An LLM is then prompted to generate questions related to these concepts, such as “Is there a man in the image?” or “What is the man wearing?”
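The question step can be sketched in a similar way. In the sketch below, existence questions come from a fixed template (a simple stand-in for the LLM), while attribute questions are left to an LLM through a prompt. The template, the prompt wording, and the function names are hypothetical.

```python
from typing import List


def article(noun: str) -> str:
    """Pick 'an' before a vowel letter, otherwise 'a' (a rough heuristic)."""
    return "an" if noun[:1].lower() in ("a", "e", "i", "o", "u") else "a"


def existence_questions(objects: List[str]) -> List[str]:
    """Template stand-in for LLM-written questions about object existence."""
    return [f"Is there {article(obj)} {obj} in the image?" for obj in objects]


def build_attribute_prompt(sentence: str, objects: List[str]) -> str:
    """Assemble an assumed prompt that asks an LLM for attribute questions."""
    return (
        "Write one short question per line that checks the attributes of "
        f"these objects: {', '.join(objects)}.\n"
        f"Sentence: {sentence}\n"
        "Questions:"
    )


if __name__ == "__main__":
    for question in existence_questions(["man", "hat"]):
        print(question)
    print(build_attribute_prompt("The man is wearing a black hat.", ["man", "hat"]))
```

A template is enough for yes/no existence checks, but attribute questions depend on the wording of each sentence, which is why an LLM is a natural fit for them.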
These questions are then answered by looking at the image. An object detector handles questions about whether an object is present and how many there are, while a Visual Question Answering (VQA) model answers attribute-related questions.
The answers to these questions are combined into a visual knowledge base for the image.
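One simple way to picture the visual knowledge base is as a small container holding object counts and question-answer pairs, which can be rendered as plain text for a later prompt. The class name VisualKnowledgeBase, its fields, and the text format below are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VisualKnowledgeBase:
    """Verified facts about one image (hypothetical structure)."""

    object_counts: Dict[str, int] = field(default_factory=dict)
    attribute_answers: List[Tuple[str, str]] = field(default_factory=list)

    def add_count(self, obj: str, count: int) -> None:
        """Record how many instances of an object were found."""
        self.object_counts[obj] = count

    def add_answer(self, question: str, answer: str) -> None:
        """Record the answer to one attribute question."""
        self.attribute_answers.append((question, answer))

    def to_text(self) -> str:
        """Render all facts as one line each, counts first, then answers."""
        lines = []
        for obj, count in self.object_counts.items():
            found = str(count) if count > 0 else "none"
            lines.append(f"{obj}: {found} in the image")
        for question, answer in self.attribute_answers:
            lines.append(f"{question} {answer}")
        return "\n".join(lines)


if __name__ == "__main__":
    kb = VisualKnowledgeBase()
    kb.add_count("man", 1)
    kb.add_count("dog", 0)
    kb.add_answer("What color is the hat?", "black")
    print(kb.to_text())
```

Recording a count of zero matters here: it is what lets a later step remove an object that the description mentioned but the image does not contain.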
An LLM then uses this knowledge base as a reference to correct any hallucinations in the MLLM’s original description and to add missing details.
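Put together, the steps above form an extract, verify, and correct loop. The sketch below wires them up with plain callables so that any LLM or vision model can be plugged in. The function correct_description, its parameters, and the prompt wording are hypothetical, and the stubs in the demo stand in for real models.

```python
from typing import Callable, List


def correct_description(
    description: str,
    extract_concepts: Callable[[str], List[str]],
    count_objects: Callable[[str], int],
    rewrite: Callable[[str], str],
) -> str:
    """Verify each extracted object, then ask an LLM to rewrite the text.

    count_objects is expected to already know which image it inspects,
    for example by closing over it.
    """
    facts = []
    for obj in extract_concepts(description):
        facts.append(f"- {obj}: {count_objects(obj)} found")
    # Assumed prompt wording; the verified facts act as the reference.
    prompt = (
        "Rewrite the description so that it agrees with the verified facts.\n"
        "Verified facts:\n" + "\n".join(facts) + "\n"
        f"Description: {description}\n"
        "Corrected description:"
    )
    return rewrite(prompt)


if __name__ == "__main__":
    # Stubs standing in for an LLM and an object detector.
    counts = {"shopfront": 1, "people": 0}
    result = correct_description(
        "A shopfront with people walking past.",
        extract_concepts=lambda text: ["shopfront", "people"],
        count_objects=lambda obj: counts.get(obj, 0),
        rewrite=lambda prompt: "A shopfront with no people in view.",
    )
    print(result)
```

Passing the models in as callables keeps the loop independent of any particular MLLM, detector, or LLM provider.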
In tests, Woodpecker improved accuracy by 30.66% over the baseline MiniGPT-4 and by 24.33% over the baseline mPLUG-Owl.
Because Woodpecker works on the model’s output as a post-processing step and does not require retraining, it could easily be integrated into various MLLMs.
Benefits and Applications
If OpenAI integrates Woodpecker into ChatGPT, we can expect a significant improvement in visual performance. Reducing hallucinations in MLLMs can also enhance automated decision-making systems that rely on visual descriptions.
By using Woodpecker, you can:
- Improve the accuracy of image descriptions generated by MLLMs
- Enhance automated decision-making processes
- Stay competitive in the AI landscape
How AI Can Transform Your Company
If you want to evolve your company with AI and stay competitive, Woodpecker is a valuable solution to consider. It can reduce hallucinations in MLLM output and improve the accuracy of your image descriptions.
Here’s how you can leverage AI effectively:
- Identify Automation Opportunities: Locate key customer interaction points that can benefit from AI.
- Define KPIs: Ensure your AI endeavors have measurable impacts on business outcomes.
- Select an AI Solution: Choose tools that align with your needs and provide customization.
- Implement Gradually: Start with a pilot, gather data, and expand AI usage judiciously.