Salesforce AI’s GTA1: Revolutionary GUI Agent Surpassing OpenAI’s CUA

Introduction to GTA1

Salesforce AI Research has unveiled GTA1, a graphical user interface (GUI) agent that operates autonomously in real operating system environments, specifically Linux. GTA1 addresses two major challenges in GUI agent development: ambiguous task planning and inaccurate grounding of actions. With a task success rate of 45.2% on the OSWorld benchmark, GTA1 outperforms OpenAI’s CUA (Computer-Using Agent) and sets a new record among open-source models.

Core Challenges in GUI Agents

GUI agents are designed to convert high-level user instructions into sequences of concrete actions, such as clicks and keystrokes, while adapting to real-time changes in the UI. Two persistent issues complicate this process:

Planning Ambiguity: Different action sequences can achieve the same task, but their efficiency and reliability can vary significantly.
Grounding Precision: Accurately translating abstract action proposals into precise GUI interactions is particularly challenging in high-resolution and dynamic interfaces.

GTA1 tackles each issue with a dedicated technique: test-time scaling for planning and reinforcement learning for grounding.

Smarter Planning via Test-Time Scaling

Traditional planning methods often rely on a single action proposal at each decision point, which can limit robustness. GTA1’s test-time scaling method instead samples multiple candidate actions at each step and uses a multimodal judge model, typically a large language model, to evaluate them and select the most suitable one. This prevents premature commitment to suboptimal plans and lets the agent consider alternative execution paths without future rollouts, which are often impractical in GUI environments because many actions are irreversible. Because the number of sampled candidates is adjustable, the approach can trade additional compute for better decisions as tasks grow more complex.
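
The selection loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GTA1’s implementation: `Action`, `select_action`, `toy_propose`, and `toy_judge` are hypothetical names, and the planner and judge are stand-in callables where a real system would sample an LLM planner and query a multimodal judge with a screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    """A hypothetical GUI action proposal (illustrative, not GTA1's format)."""
    kind: str       # e.g. "click" or "type"
    target: str     # natural-language description of the UI element
    text: str = ""  # payload for "type" actions


def select_action(
    propose: Callable[[str, int], Action],
    judge: Callable[[str, List[Action]], int],
    observation: str,
    num_candidates: int = 4,
) -> Action:
    """Sample several candidate actions, then let a judge pick one.

    `propose` stands in for a planner sampled with different seeds so the
    candidates differ; `judge` stands in for the multimodal judge and returns
    the index of its preferred candidate. Nothing is executed until the judge
    has chosen, so no rollout of future steps is needed.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [propose(observation, seed) for seed in range(num_candidates)]
    best = judge(observation, candidates)
    if not 0 <= best < len(candidates):
        raise IndexError("judge returned an invalid candidate index")
    return candidates[best]


# Toy stand-ins for the planner and judge (hypothetical).
def toy_propose(observation: str, seed: int) -> Action:
    targets = ["File menu", "Close button", "Save button", "File menu"]
    return Action("click", targets[seed % len(targets)])


def toy_judge(observation: str, candidates: List[Action]) -> int:
    # Prefer the first candidate that targets a "Save" element, else the first one.
    for i, candidate in enumerate(candidates):
        if "Save" in candidate.target:
            return i
    return 0


chosen = select_action(toy_propose, toy_judge, "screenshot of a text editor")
print(chosen)  # Action(kind='click', target='Save button', text='')
```

The sketch samples candidates sequentially for clarity; because the candidates are independent, a real agent could generate them in parallel.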

Reinforcement Learning for Grounding Accuracy

Many previous models have relied on supervised fine-tuning to predict the center of UI elements, which can limit their adaptability. GTA1 instead uses a reinforcement learning framework based on Group Relative Policy Optimization (GRPO). Rather than predicting bounding boxes, it learns directly from click-based rewards: the model is rewarded only when its predicted click coordinates fall on the correct UI element. This reward structure improves accuracy without the complexity of chain-of-thought supervision. Notably, ablation studies indicate that removing auxiliary signals can actually improve grounding performance, especially in static environments.
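
A minimal sketch of a click-based reward and of the group-relative normalization that gives GRPO its name. It assumes that a click "aligns" with the target when it lands inside the element’s bounding box; the `Box` type and all function names are hypothetical, and the actual policy update (the clipped objective and KL penalty) is omitted.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of the target UI element, in pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_reward(click: Tuple[float, float], target: Box) -> float:
    """Binary reward: 1.0 if the predicted click lands inside the target element."""
    x, y = click
    return 1.0 if target.contains(x, y) else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: rewards normalized by their group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled clicks for one instruction; the target is a 100x40 button.
button = Box(200, 300, 300, 340)
clicks = [(250, 320), (150, 320), (299, 339), (250, 400)]
rewards = [click_reward(c, button) for c in clicks]
print(rewards)  # [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)  # roughly +1 for hits, -1 for misses
```

When every click in a group earns the same reward, all advantages are zero, so that group contributes no learning signal; `eps` only guards against division by zero.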

Performance Across Benchmarks

GTA1 reports leading results on several evaluations:

OSWorld (Task Success Rate): GTA1-7B achieves 45.2%, surpassing OpenAI CUA’s 42.9% and Claude 3.7’s 28.0%.
ScreenSpot-Pro (Grounding Accuracy): GTA1-7B scores 50.1%, outperforming UGround-72B’s 34.5%.
ScreenSpot-V2 (Cross-platform Grounding): GTA1-72B reaches 94.8%, closely matching top proprietary models.
OSWorld-G (Linux GUI Grounding): GTA1-7B achieves 67.7%, outperforming all previous open-source approaches.

These results support the effectiveness of GTA1’s planning and grounding techniques.

Additional Design Highlights

GTA1’s design incorporates several additional features that enhance its performance:

Data Cleaning: Misaligned annotations from datasets such as Aria-UI and OS-Atlas are filtered out using OmniParser, improving the fidelity of the training signal.
Model Scaling: The architecture scales from 7B to 72B parameters, with the 7B model offering a strong balance of performance and computational efficiency.
Judge Reusability: The multimodal judge used in test-time scaling can double as the planning LLM, reducing overall computational overhead.
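
The source does not say how OmniParser is used to detect misaligned annotations. One plausible criterion, sketched here purely as an assumption, keeps a sample only if its annotated box overlaps some element detected on the same screenshot with an intersection-over-union (IoU) of at least 0.5. Detected boxes are passed in as plain tuples rather than read from OmniParser’s real output format, and every name and the threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]          # (left, top, right, bottom) in pixels
Sample = Tuple[str, Box, Sequence[Box]]          # (instruction, annotated box, detected boxes)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_aligned(annotation: Box, detected: Sequence[Box], threshold: float = 0.5) -> bool:
    """True if some detected UI element overlaps the annotation enough."""
    return any(iou(annotation, d) >= threshold for d in detected)


def filter_samples(samples: List[Sample], threshold: float = 0.5) -> List[Sample]:
    """Drop samples whose annotation matches no detected element (hypothetical filter)."""
    return [s for s in samples if is_aligned(s[1], s[2], threshold)]


detected = [(10, 10, 110, 50), (200, 300, 300, 340)]
samples = [
    ("click Save", (200, 300, 300, 340), detected),  # matches a detected element: kept
    ("click Help", (500, 500, 520, 510), detected),  # overlaps nothing: dropped
]
kept = filter_samples(samples)
print([s[0] for s in kept])  # ['click Save']
```

A stricter or looser threshold shifts the balance between discarding noisy labels and keeping enough training data.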

Conclusion

GTA1 represents a significant advance toward robust and accurate GUI agents, built on a modular two-stage framework that pairs test-time planning diversity with precise, reinforcement learning-based grounding. By avoiding extra machinery such as future rollouts and chain-of-thought supervision, Salesforce AI has produced an effective agent architecture that pushes the boundaries of digital interaction.

FAQ

What is GTA1, and how does it differ from previous models?
GTA1 is a new GUI agent developed by Salesforce AI Research. It improves on previous models by pairing test-time scaling for task planning with reinforcement learning for grounding accuracy.
What challenges do GUI agents typically face?
GUI agents often struggle with planning ambiguity and grounding precision, which can affect their efficiency and reliability.
How does test-time scaling improve planning?
At each step, GTA1 samples several candidate actions and a judge model selects the best one, enabling better decisions without prematurely committing to a suboptimal plan.
What role does reinforcement learning play in GTA1’s performance?
Reinforcement learning helps GTA1 achieve high grounding accuracy by rewarding the agent only when its predicted click coordinates land on the correct UI element.
In what benchmarks has GTA1 excelled?
GTA1 has set new records on several benchmarks, including OSWorld, ScreenSpot-Pro, and OSWorld-G, and closely matches top proprietary models on ScreenSpot-V2.
