This AI Paper Introduces AssistantBench and SeePlanAct: A Benchmark and Agent for Complex Web-Based Tasks

Introducing AssistantBench and SeePlanAct: Enhancing AI for Web-Based Tasks

Addressing Challenges in Web-Based AI

Artificial intelligence (AI) aims to develop systems for tasks requiring human intelligence, such as web-based interactions. However, current models face challenges in managing complex tasks effectively.

Challenges and Solutions

Existing methods like closed-book language models and retrieval-augmented models have limitations in accuracy and reliability. To address this, researchers introduced ASSISTANTBENCH, a benchmark for evaluating web agents, and SEEPLANACT (SPA), a novel web agent designed to enhance task performance.

Enhancements of SPA

SPA incorporates a planning component and a memory buffer to improve web navigation and task execution. These enhancements enable SPA to interact more robustly with web elements and adjust its plan dynamically, resulting in a more effective solution for handling complex web tasks.

Performance Evaluation

Performance evaluations of SPA on the ASSISTANTBENCH benchmark showed significant improvements over previous models, achieving higher accuracy and precision in answering questions. However, the overall accuracy of the best-performing models did not exceed 25%, indicating the ongoing challenges in developing reliable web-based AI solutions.

Conclusion and Future Outlook

The introduction of ASSISTANTBENCH and SPA represents a significant step forward in addressing the challenges of web-based AI. However, there remains a gap in achieving highly reliable AI solutions, emphasizing the need for continued innovation and improvement in this field.

If you want to evolve your company with AI, stay competitive, and use AI for your advantage, connect with us at hello@itinai.com for AI KPI management advice and continuous insights into leveraging AI.

Discover how AI can redefine your sales processes and customer engagement. Explore solutions at itinai.com.

List of Useful Links:

Unleash Your Creative Potential with AI Agents

Competitors are already using AI Agents

Business Problems We Solve

Automation of internal processes.
Optimizing AI costs without huge budgets.
Training staff, developing custom courses for business needs
Integrating AI into client work, automating first lines of contact

Large and Medium Businesses

Startups

Offline Business

Get a plan to reduce routine and improve metrics

100% of clients report increased productivity and reduced operati

AI Agents

Localization Project Manager – Coordinating translation workflows, answering vendor or process-related questions.

Job Title: Localization Project Manager Overview The Localization Project Manager plays a vital role in coordinating translation workflows while addressing vendor and process-related queries. This position is crucial for ensuring that translation projects are executed efficiently…
AI Agents

Environmental Health & Safety Officer – Answering compliance-related questions, retrieving safety protocols or audit histories.

Professional Summary The AI-driven Environmental Health & Safety Officer is a reliable and effective digital team member that performs repetitive and time-consuming tasks with remarkable speed, accuracy, and stability. By automating these tasks, it frees up…
AI Agents

Legal Contract Reviewer – Auto-flagging clause inconsistencies or retrieving precedent cases for review.

Job Title: Legal Contract Reviewer – Auto-flagging Clause Inconsistencies or Retrieving Precedent Cases for Review The AI functions as a reliable and effective digital team member that excels in performing repetitive and time-consuming tasks. With remarkable…
AI Agents

Customer Retention Analyst – Creating customer summaries, identifying churn risk patterns, and suggesting retention steps.

Customer Retention Analyst Professional Summary A highly analytical and detail-oriented Customer Retention Analyst with a proven track record in creating comprehensive customer summaries, identifying churn risk patterns, and suggesting effective retention strategies. Adept at leveraging data-driven…

Itinai.com httpss.mj.runmrqch2uvtvo russian handsome charisma 9fdbb2d5 a55b 425d 8f3b 76d26f86710f 2

AI Business Accelerator

Start Your AI Business in Just a Week with itinai.com

You’re a great fit if you:

Have an audience (even 500+ followers in Instagram, email, etc.)
Have an idea, service, or product you want to scale
Can invest 2–3 hours a day
You’re motivated to earn with AI but don’t want to handle technical setup

AI news and solutions

Unlocking the Brain’s Language Response: How GPT Models Predict and Influence Neural Activity

Recent advancements in machine learning and artificial intelligence have facilitated the development of advanced AI systems, particularly large language models (LLMs). A recent study by MIT and Harvard researchers delves into predicting and influencing human brain…

AI Tech News
Conformal Prediction via Regression-as-Classification

Conformal Prediction for Efficient Regression Addressing Challenges with Practical Solutions Conformal prediction (CP) for regression can be challenging, particularly with complex output distributions. To overcome this, we convert regression to a classification problem and then employ…

AI Tech News
This AI Paper from USC and Google Introduces SELF-DISCOVER: An Efficient Machine Learning Framework for Models to Self-Discover a Reasoning Structure for Any Task

The introduction of Large Language Models in Artificial Intelligence, propelled by the transformer architecture, has greatly enhanced machines’ ability to comprehend and solve problems akin to human cognition. USC and Google’s researchers have introduced SELF-DISCOVER, improving…

AI Tech News
Does AI display racial and gender bias when evaluating images?

Researchers from the National Research Council Canada experimented with four large vision-language models to assess racial and gender bias. They found biases in the models’ evaluation of scenarios in images based on race and gender. Their…

AI Tech News
Fact or Fiction? NOCHA: A New Benchmark for Evaluating Long-Context Reasoning in LLMs

Natural Language Processing (NLP) in Artificial Intelligence Natural Language Processing (NLP) involves developing algorithms and models that enable computers to comprehend, interpret, and generate human language. This technology finds applications in various domains, such as machine…

AI Tech News
Small and Large Language Models: Balancing Precision, Efficiency, and Power in the Evolving Landscape of Natural Language Processing

Small and Large Language Models: Balancing Precision, Efficiency, and Power in the Evolving Landscape of Natural Language Processing Small Language Models: Precision and Efficiency Small language models, with fewer parameters and lower computational requirements, offer practical…

AI Tech News
Meta AI Introduces AdaCache: A Training-Free Method to Accelerate Video Diffusion Transformers (DiTs)

Video Generation in AI Video generation is a key area in artificial intelligence, focusing on creating high-quality, consistent videos. The latest machine learning models, especially diffusion transformers (DiTs), are leading the way, offering better quality than…

AI Tech News
This AI Research Introduces a Novel Vision-Language Model (‘Dolphins’) Architected to Imbibe Human-like Abilities as a Conversational Driving Assistant

Researchers from multiple universities and NVIDIA have developed Dolphins, a vision-language model for autonomous vehicles. Dolphins excel in providing driving instructions by combining language reasoning with visual understanding, exhibiting human-like features such as rapid learning and…

AI Tech News
10 Ways to Use Generative AI for Database

Generative AI for databases is a transformative technology that impacts how humans interact with technology. It has the potential to revolutionize database management for both data scientists and non-data scientists alike.

AI Tech News
DeepSeek-AI Introduces Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

The Fire-Flyer AI-HPC Architecture: Revolutionizing Affordable, High-Performance Computing for AI Addressing Industry Challenges The demand for processing power and bandwidth has surged due to the advancements in Large Language Models (LLMs) and Deep Learning. Challenges such…

AI Tech News
Group Think: Enhancing Collaborative LLM Inference with Token-Level Multi-Agent Reasoning

Enhancing Business Efficiency with Group Think: A New Approach to AI Collaboration Introduction to Group Think In the rapidly evolving field of artificial intelligence, the ability for large language models (LLMs) to work together is gaining…

AI News
Top Chinese Open Agentic/Reasoning Models of 2025: A Comprehensive Review for Developers

Introduction to Chinese Open Agentic Models China has emerged as a leader in the development of open-source large language models, particularly in the realms of agentic structures and profound reasoning capabilities. With advancements that rival other…

AI Tech News
RxEnvironments.jl: A Reactive Programming Approach to Complex Agent-Environment Simulations in the Julia Language

Practical Solutions and Value of RxEnvironments.jl for AI-driven Simulations Introduction to Free Energy Principle and Active Inference The Free Energy Principle (FEP) and Active Inference (AIF) offer insights into self-organization in natural systems. Agents use generative…

AI Tech News
Meet SD4J: An Implementation of Stable Diffusion Inference in Java that can Generate Images with Deep Learning

Stable Diffusion in Java (SD4J) leverages deep learning to transform text into vibrant images, with the ability to handle negative inputs. Its Graphical User Interface simplifies image generation, and integration with ONNXRuntime-Extensions enhances functionality. Users can…

AI Tech News
Anthropic Launches Claude Opus 4 and Sonnet 4: Advances in AI Reasoning and Coding

Anthropic’s Claude Opus 4 and Claude Sonnet 4: Advancements in AI for Business Introduction to Claude Models Anthropic has launched its latest language models, Claude Opus 4 and Claude Sonnet 4. These models represent a significant…

AI News
Top Open-Source Large Language Model (LLM) Evaluation Repositories

Practical Solutions for Large Language Model (LLM) Evaluation DeepEval DeepEval offers a comprehensive set of over 14 metrics for evaluating LLMs, making it easier to assess model performance. It also provides real-time evaluation and the ability…

AI Tech News
Emerging Trends in Machine Translation: Leveraging Large Reasoning Models

Transforming Machine Translation with Large Reasoning Models Machine Translation (MT) is essential for global communication, allowing automatic text translation between languages. Neural Machine Translation (NMT) has advanced this field using deep learning to understand complex language…

AI Tech News
DLAP: A Deep Learning Augmented LLMs Prompting Framework for Software Vulnerability Detection

Practical AI Solutions for Software Vulnerability Detection Enhancing Software Security with Advanced AI Technologies Software vulnerability detection is crucial for safeguarding system security and user privacy against cyber threats. Advanced AI technologies, including large language models…

AI Tech News
Salesforce Research Introduces AgentOhana: A Comprehensive Agent Data Collection and Training Pipeline for Large Language Model

AgentOhana from Salesforce Research addresses the challenges of integrating Large Language Models (LLMs) in autonomous agents by standardizing and unifying data sources, optimizing datasets for training, and showcasing exceptional performance in various benchmarks. It represents a…

AI Tech News
OpenAI CEO Sam Altman jokes that AGI had been “achieved internally”

📢 Exciting update from OpenAI’s CEO, Sam Altman! In a recent statement, Altman teased that artificial general intelligence (AGI) had been “achieved internally.” 🚀 This lighthearted remark stirred up the tech community, sparking debates and discussions…

AI Tech News