ScreenSpot-Pro: The First Benchmark Driving Multi-Modal LLMs into High-Resolution Professional GUI-Agent and Computer-Use Environments

Challenges Faced by GUI Agents in Professional Environments

GUI agents encounter three main challenges in professional settings:

  • Complex Applications: Professional software is more intricate than general-use applications, so an agent must understand dense, complex layouts.
  • High Resolution: Professional tools are typically run at high screen resolutions, so each target occupies a smaller share of the screen and is harder to locate accurately.
  • Additional Tools: Professional workflows often depend on external tools and documents, which complicates the task further.

Together, these challenges make GUI grounding, that is, locating the on-screen element an instruction refers to, considerably harder than in everyday applications.

Limitations of Current GUI Grounding Models

Existing GUI grounding models and benchmarks fall short of what professional environments demand:

  • Benchmarks like ScreenSpot are built around low-resolution tasks and do not reflect real professional use.
  • Models such as OS-Atlas and UGround are computationally inefficient and struggle with small targets or icon-heavy interfaces.
  • A lack of multilingual support limits their use in global contexts.

These gaps highlight the need for more realistic benchmarks in this field.

Introducing ScreenSpot-Pro

A team of university researchers has developed ScreenSpot-Pro, a benchmark built specifically for high-resolution professional environments. Key features include:

  • A dataset with 1,581 tasks across 23 applications in various industries.
  • High-resolution screenshots with expert annotations for accuracy.
  • Bilingual task instructions in English and Chinese.

ScreenSpot-Pro captures real workflows, which makes it useful for both assessing and developing GUI grounding models.

Realistic Dataset Characteristics

ScreenSpot-Pro captures challenging scenarios with:

  • High-resolution images in which the target region covers, on average, only 0.07% of the total screen area.
  • Data collected by professionals who use the applications, with a specialized tool for precise annotations.
  • Instructions in both English and Chinese, covering a variety of workflows.

These properties make the dataset a demanding test of how accurate and flexible a GUI agent really is.
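
To give a sense of scale, the 0.07% figure can be converted into pixels. The sketch below assumes a square target and a few common display resolutions; both are illustrative assumptions, not values taken from the dataset, and the helper name target_side_px is hypothetical.

```python
import math


def target_side_px(width: int, height: int, area_fraction: float) -> float:
    """Side length, in pixels, of a square covering area_fraction of the screen."""
    return math.sqrt(width * height * area_fraction)


# Illustrative resolutions only (assumed, not drawn from ScreenSpot-Pro).
RESOLUTIONS = {
    "1920x1080": (1920, 1080),
    "2560x1440": (2560, 1440),
    "3840x2160": (3840, 2160),
}

for name, (w, h) in RESOLUTIONS.items():
    side = target_side_px(w, h, 0.0007)  # 0.07% written as a fraction
    print(f"{name}: square target of about {side:.0f} px per side")
```

On a 3840x2160 screenshot this works out to a square of roughly 76 pixels per side, which is about the size of a single toolbar icon.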

Performance Analysis of GUI Grounding Models

Analysis using ScreenSpot-Pro shows significant shortcomings in current models:

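  • OS-Atlas-7B, the strongest model tested, achieved only 18.9% accuracy.
  • ReGround, an iterative method that narrows the search region and grounds again, raised accuracy to 40.2%; the gain comes from multi-step refinement rather than from fine-tuning.
  • Small components and bilingual (English and Chinese) instructions remained difficult for these models.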

These results highlight the need for better techniques to enhance contextual understanding in complex GUI environments.
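
The following is a minimal sketch of how such numbers can be produced and why a second pass can help. It assumes the point-in-box scoring convention common to GUI grounding benchmarks (a prediction is correct if the predicted click point lies inside the target's bounding box) and a simple crop-and-ground-again refinement. The ground callable, the window size, and all function names are hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def is_hit(point: Point, box: Box) -> bool:
    """True if the predicted point lies inside the target bounding box."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def accuracy(preds: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of predictions that land inside their target box."""
    if not boxes:
        return 0.0
    return sum(is_hit(p, b) for p, b in zip(preds, boxes)) / len(boxes)


def crop_window(center: Point, screen: Tuple[int, int], size: Tuple[int, int]) -> Box:
    """Window of the given size around center, shifted to stay on screen."""
    (cx, cy), (sw, sh) = center, screen
    cw, ch = min(size[0], sw), min(size[1], sh)
    left = min(max(cx - cw / 2, 0), sw - cw)
    top = min(max(cy - ch / 2, 0), sh - ch)
    return (left, top, left + cw, top + ch)


def reground(ground: Callable[[Box], Point],
             screen: Tuple[int, int],
             size: Tuple[int, int]) -> Point:
    """Ground on the full screen, then again inside a crop around the first guess.

    ground(region) is a hypothetical model call returning a point relative
    to the top-left corner of region.
    """
    first = ground((0, 0, screen[0], screen[1]))
    window = crop_window(first, screen, size)
    local_x, local_y = ground(window)
    # Map the crop-relative prediction back to full-screen coordinates.
    return (window[0] + local_x, window[1] + local_y)


def stub_ground(region: Box) -> Point:
    """Stand-in model: coarse on the full screen, exact on a small crop."""
    left, top, right, _ = region
    tx, ty = 3000.0, 500.0  # made-up target centre
    if right - left > 2000:
        return (tx + 40 - left, ty - 30 - top)
    return (tx - left, ty - top)


target_box = (2990.0, 490.0, 3010.0, 510.0)
coarse = stub_ground((0, 0, 3840, 2160))
refined = reground(stub_ground, (3840, 2160), (1024, 1024))
print(accuracy([coarse], [target_box]), accuracy([refined], [target_box]))
```

The stub is deliberately artificial: it only shows the coordinate bookkeeping, where the second prediction is made in crop coordinates and must be offset by the crop's top-left corner before scoring.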

Transformative Impact of ScreenSpot-Pro

ScreenSpot-Pro sets a new standard for evaluating GUI agents in high-resolution professional settings. It targets the challenges of complex workflows and provides a precisely annotated dataset to drive further work. Progress on this benchmark should lead to more capable and efficient agents for professional software.
