Researchers at UC San Diego and New York University developed V*, a guided visual search algorithm that outperforms GPT-4V at contextual understanding and at pinpointing specific visual elements in images. Within the SEAL framework, a Visual Question Answering (VQA) LLM steers the search toward the relevant areas of an image, which yields better results than GPT-4V on high-resolution images. Source: DailyAI
V* – Multimodal LLM guided visual search that beats GPT-4V
Researchers from UC San Diego and New York University have developed V*, an algorithm that outperforms GPT-4V in contextual understanding and precise targeting of specific visual elements in images.
Practical Solutions and Value
The V* algorithm uses a Visual Question Answering (VQA) LLM to guide it in identifying which area of the image to focus on to answer a visual query. This approach, called SEAL (Show, sEArch, and telL), enables efficient and accurate visual analysis of images.
When prompted with a textual query about an image, V* first tries to locate the target in the image directly. If it cannot, it asks the multimodal LLM (MLLM) to use common sense to identify the area of the image where the target is most likely to be. It then focuses its search on that area alone, rather than running a “zoomed-in” search over the entire image.
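The locate-then-narrow loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the quadrant split, the `min_size` cutoff, the fallback to less likely regions, and the names `guided_search`, `try_locate`, and `score_region` are all hypothetical. `try_locate` stands in for the direct localization attempt and `score_region` for the MLLM's common-sense estimate of where the target is likely to be.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_into_quadrants(box: Box) -> List[Box]:
    """Split a region into four equal sub-regions (an assumed strategy)."""
    left, top, right, bottom = box
    mid_x, mid_y = (left + right) // 2, (top + bottom) // 2
    return [
        (left, top, mid_x, mid_y),
        (mid_x, top, right, mid_y),
        (left, mid_y, mid_x, bottom),
        (mid_x, mid_y, right, bottom),
    ]


def guided_search(
    box: Box,
    try_locate: Callable[[Box], Optional[Box]],
    score_region: Callable[[Box], float],
    min_size: int = 64,
) -> Optional[Box]:
    """Try to find the target in `box`; if that fails, descend into the
    sub-regions the (hypothetical) model rates as most likely first."""
    found = try_locate(box)
    if found is not None:
        return found
    left, top, right, bottom = box
    if right - left <= min_size or bottom - top <= min_size:
        return None  # region too small to split further
    ranked = sorted(split_into_quadrants(box), key=score_region, reverse=True)
    for sub in ranked:
        result = guided_search(sub, try_locate, score_region, min_size)
        if result is not None:
            return result
    return None


# Toy stand-ins for the models (assumptions for illustration only).
TARGET: Box = (700, 400, 720, 420)
visited: List[Box] = []


def toy_locate(box: Box) -> Optional[Box]:
    """Pretend detector: only sees the target once the region is small."""
    visited.append(box)
    left, top, right, bottom = box
    inside = (left <= TARGET[0] and TARGET[2] <= right
              and top <= TARGET[1] and TARGET[3] <= bottom)
    return TARGET if inside and right - left <= 256 else None


def toy_score(box: Box) -> float:
    """Pretend common-sense prior: high where the target's center lies."""
    cx, cy = (TARGET[0] + TARGET[2]) // 2, (TARGET[1] + TARGET[3]) // 2
    left, top, right, bottom = box
    return 1.0 if left <= cx < right and top <= cy < bottom else 0.0


result = guided_search((0, 0, 1024, 1024), toy_locate, toy_score)
```

In this toy setup the search inspects only three regions (the full image, one quadrant, and one sub-quadrant) before it finds the target, instead of scanning every small patch of the image.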
SEAL with V* performs significantly better than GPT-4V at answering questions about images: in the examples shown, it gives accurate answers where GPT-4V guesses incorrectly.
The V*Bench benchmark tests two tasks, attribute recognition and spatial relationship reasoning. It shows the boost that V* gives to SEAL’s performance, even though SEAL uses a smaller MLLM than GPT-4V.
This intuitive approach to analyzing images appears to work well across a number of examples, which makes it a promising tool for visual question answering and image analysis.
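To illustrate how results on a two-task benchmark like this might be tallied, here is a small sketch that computes accuracy per task. It assumes exact-match scoring, which the article does not specify, and the records are made-up placeholders rather than V*Bench data; the function name `accuracy_by_task` and the task labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_task(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Return the fraction of exact-match answers for each task.

    Each record is (task, predicted_answer, expected_answer).
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for task, predicted, expected in records:
        total[task] += 1
        correct[task] += int(predicted == expected)
    return {task: correct[task] / total[task] for task in total}


# Placeholder records for illustration only; not real benchmark results.
records = [
    ("attribute_recognition", "red", "red"),
    ("attribute_recognition", "blue", "green"),
    ("spatial_relationship", "left", "left"),
    ("spatial_relationship", "right", "right"),
]
scores = accuracy_by_task(records)
```

Reporting the two tasks separately keeps a strong score on one from hiding a weak score on the other.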
AI Solutions for Middle Managers
If you want to evolve your company with AI, stay competitive, and make use of MLLM-guided visual search such as V*, consider the following practical steps:
- Identify Automation Opportunities: Locate key customer interaction points that can benefit from AI.
- Define KPIs: Ensure your AI endeavors have measurable impacts on business outcomes.
- Select an AI Solution: Choose tools that align with your needs and provide customization.
- Implement Gradually: Start with a pilot, gather data, and expand AI usage judiciously.
For AI KPI management advice, connect with us at hello@itinai.com. For continuous insights into leveraging AI, stay tuned on our Telegram or Twitter.
Spotlight on a Practical AI Solution
Consider the AI Sales Bot from itinai.com/aisalesbot, designed to automate customer engagement 24/7 and manage interactions across all customer journey stages.
Discover how AI can redefine your sales processes and customer engagement. Explore solutions at itinai.com.