Salesforce AI Introduces TACO: A New Family of Multimodal Action Models that Combine Reasoning with Real-World Actions to Solve Complex Visual Tasks

Effective Multi-Modal AI Systems

Multi-modal AI systems built for real-world use must handle fine-grained recognition, visual grounding, reasoning, and multi-step problem-solving. Current open-source models struggle with tasks that require external tools such as OCR or mathematical calculation, mainly because the available training datasets are limited and don’t support comprehensive reasoning.

Challenges and Limitations

Most existing models depend on simple instruction tuning with limited datasets. Proprietary systems like GPT-4 are better at logical reasoning, but open-source models lack the necessary datasets and tool integration. Previous attempts, such as LLaVA-Plus, relied on small datasets and oversimplified tasks, which limited their ability to tackle complex multi-modal challenges.

Introducing TACO

Researchers from the University of Washington and Salesforce Research have launched TACO, a new framework designed to train multi-modal action models using advanced synthetic datasets. This framework offers several key improvements:

  • Large Datasets: Over 1.8 million traces were created using GPT-4 and Python, with 293K high-quality examples selected to ensure diverse reasoning and action sequences.
  • Tool Integration: TACO includes 15 versatile tools, such as OCR and mathematical solvers, to effectively manage complex tasks.
  • Enhanced Learning: Advanced filtering and data mixing techniques improve dataset quality, focusing on reasoning-action integration for better learning outcomes.
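
To make the reasoning-action idea concrete, here is a minimal sketch of how a trace of thoughts and tool calls might be executed. It is an illustration under stated assumptions, not TACO's implementation: the tool names (`OCR`, `Calculate`, `Terminate`), the canned OCR output, and the scripted steps are all hypothetical. In a real system the model would generate each step after seeing the previous observations.

```python
from typing import Callable, Dict, List, Tuple


def calculate(expression: str) -> str:
    """Hypothetical math tool: evaluate basic arithmetic only."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError(f"unsupported expression: {expression!r}")
    # Builtins are disabled and the character set is restricted above.
    return str(eval(expression, {"__builtins__": {}}, {}))


def fake_ocr(image_id: str) -> str:
    """Hypothetical stand-in for an OCR tool; returns canned text."""
    canned = {"receipt.png": "2 items at 7.50 each"}
    return canned.get(image_id, "")


# Illustrative registry; the article says TACO uses 15 tools but does not list them.
TOOLS: Dict[str, Callable[[str], str]] = {
    "OCR": fake_ocr,
    "Calculate": calculate,
}

# One step is (thought, tool name, tool argument).
Step = Tuple[str, str, str]


def run_trace(steps: List[Step]) -> Tuple[str, List[str]]:
    """Execute tool calls in order and collect their observations.

    The special action "Terminate" ends the trace; its argument is
    treated as the final answer.
    """
    observations: List[str] = []
    for _thought, tool, argument in steps:
        if tool == "Terminate":
            return argument, observations
        observations.append(TOOLS[tool](argument))
    raise RuntimeError("trace ended without a Terminate action")


if __name__ == "__main__":
    scripted_steps: List[Step] = [
        ("I need to read the text in the image.", "OCR", "receipt.png"),
        ("Two items at 7.50 each, so multiply.", "Calculate", "2 * 7.50"),
        ("The total is 15.0.", "Terminate", "15.0"),
    ]
    answer, observations = run_trace(scripted_steps)
    print(answer, observations)
```

Keeping tools behind a plain name-to-function registry means a new tool can be added without touching the loop, which is one simple way to support a growing tool set.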

Training and Performance

TACO was trained on a chain-of-thought-and-action (CoTA) dataset with 293K instances drawn from 31 sources, including Visual Genome. The data covers a broad range of tasks, from mathematical reasoning to object localization. The model architecture combines LLaMA3 for language with CLIP for vision, and the training strategy relied on fine-tuning to solve intricate multi-modal challenges.

Results and Impact

TACO improved results across eight benchmarks, with an average accuracy gain of 3.6% over other models and gains of up to 15% on tasks involving OCR and math. The curated 293K dataset outperformed larger datasets, highlighting the importance of targeted data selection.
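
The selection step can be pictured as a filter over generated traces. The sketch below is a hedged illustration only: the `Trace` fields and the two criteria (the final answer matches the reference, and at least one tool was called) are assumptions chosen for the example, not the filtering rules the authors used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trace:
    """Hypothetical record for one generated reasoning-action trace."""
    question: str
    predicted: str
    reference: str
    actions: List[str] = field(default_factory=list)  # tool names, in call order


def keep(trace: Trace) -> bool:
    """Assumed criteria: correct final answer and at least one tool call."""
    correct = trace.predicted.strip().lower() == trace.reference.strip().lower()
    uses_tool = len(trace.actions) > 0
    return correct and uses_tool


def filter_traces(traces: List[Trace]) -> List[Trace]:
    """Return only the traces that pass the assumed quality checks."""
    return [t for t in traces if keep(t)]


if __name__ == "__main__":
    candidates = [
        Trace("What is the total?", "15.0", "15.0", ["OCR", "Calculate"]),
        Trace("What is the total?", "14.0", "15.0", ["OCR", "Calculate"]),
        Trace("What color is the car?", "Red", "red", []),
    ]
    kept = filter_traces(candidates)
    print(f"kept {len(kept)} of {len(candidates)} traces")
```

A filter like this trades dataset size for quality, which matches the article's point that a smaller curated set can beat a larger unfiltered one.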

Transforming Real-World Applications

TACO presents a new approach to multi-modal action modeling that addresses previous shortcomings in reasoning and tool usage. This innovation is set to enhance various applications, from visual question answering to complex reasoning tasks.
