
Monolithic LCLM Agents Reach 50.8% on SWE-Bench Verified



Optimizing Software Engineering with Language Models

Introduction to Language Model Agents

Recent advancements in language model (LM) agents have showcased their potential to automate complex tasks in various fields, including software engineering, robotics, and scientific research. Typically, these agents propose and execute actions through APIs. As tasks become more intricate, frameworks for LM agents have adapted to include multiple agents, multi-step retrieval processes, and custom scaffolding to enhance performance.

Challenges and Strategies in Software Engineering

Understanding the Environment

A significant challenge in deploying LM agents is effectively understanding and exploring the environment. Many existing methods operate on the assumption of partial observability, requiring agents to collect observations over time. However, in fully observable environments like SWE-bench, where all relevant information is available from the start, this approach may be unnecessarily complex.

Research Strategies

Research has focused on two primary strategies in software engineering:

  • Agent-Based Frameworks: Systems like SWE-Agent and OpenHands CodeAct allow LMs to autonomously interact with codebases through custom interfaces.
  • Structured Pipelines: Approaches like Agentless and CodeMonkey break down tasks into sequential phases such as localization, repair, and validation.
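The structured-pipeline style can be sketched roughly as below. Every helper here is a hypothetical placeholder standing in for an LM call or a test run; this is not the actual Agentless or CodeMonkey API, just an illustration of the localization → repair → validation phases.

```python
# Sketch of a localization -> repair -> validation pipeline.
# All helpers are illustrative placeholders, not a real framework's API.

def localize(issue: str, codebase: dict[str, str]) -> list[str]:
    """Rank files by naive keyword overlap with the issue text."""
    words = set(issue.lower().split())
    scored = [
        (sum(w in src.lower() for w in words), path)
        for path, src in codebase.items()
    ]
    return [path for score, path in sorted(scored, reverse=True) if score > 0]

def repair(issue: str, source: str) -> str:
    """Placeholder for an LM call that proposes a patched file."""
    return source  # a real system would return an edited version

def validate(patched: str, tests: list) -> bool:
    """Placeholder for running the project's test suite on the patch."""
    return all(test(patched) for test in tests)

def pipeline(issue: str, codebase: dict[str, str], tests: list):
    """Try candidate files in ranked order; return the first validated patch."""
    for path in localize(issue, codebase):
        candidate = repair(issue, codebase[path])
        if validate(candidate, tests):
            return path, candidate
    return None
```

The key design point is that each phase is a separate, inspectable step, in contrast to agent-based frameworks where a single LM loop decides what to do next.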

Research Findings and Case Studies

Researchers from Stanford, IBM, and the University of Toronto examined whether complex scaffolding is actually necessary for LM agents on tasks like SWE-bench. They found that a Long-Context LM (LCLM) such as Gemini-1.5-Pro, used without any scaffolding, achieved competitive performance, solving 38% of SWE-Bench-Verified tasks. Remarkably, Gemini-2.5-Pro reached 50.8% under the same straightforward setup, suggesting that much of the complexity in current agent designs can be removed.

Hybrid Approaches

A hybrid approach utilizing both Gemini-1.5-Pro and Claude-3.7 reached a solve rate of 48.6%, reinforcing the benefits of a simplified architecture.

Innovative Agent Models

Traditional LM agents often depend on interactive exploration. However, many tasks, especially in software debugging, allow for full observability. The study introduces state-in-context agents that directly utilize LCLMs to process complete or compressed states of the environment, reducing the need for complex scaffolding. Two methods were developed:

  • DIRECTSOLVE: The LCLM solves the task end-to-end with the full codebase state in its context.
  • SELECTSOLVE: The LCLM localizes the relevant files, which a stronger Short-Context LM (SCLM) then patches.
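As a rough sketch, the two methods might look like the following, where `lclm` and `sclm` stand in for long- and short-context model calls. The function names, signatures, and prompts are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of the two state-in-context strategies.
# `lclm` / `sclm` are stand-in callables (prompt string -> response string).

def direct_solve(lclm, issue: str, codebase: dict[str, str]) -> str:
    """DIRECTSOLVE: place the entire codebase state in context and ask the
    long-context model for a patch in one shot."""
    context = "\n\n".join(f"### {path}\n{src}" for path, src in codebase.items())
    return lclm(f"Issue:\n{issue}\n\nCodebase:\n{context}\n\nProduce a patch.")

def select_solve(lclm, sclm, issue: str, codebase: dict[str, str], k: int = 3) -> str:
    """SELECTSOLVE: the LCLM localizes the k most relevant files, then a
    stronger short-context model patches using only those files."""
    listing = "\n".join(codebase)
    ranked = lclm(
        f"Issue:\n{issue}\n\nFiles:\n{listing}\n\n"
        f"List the {k} most relevant files, one per line."
    )
    selected = [p for p in ranked.splitlines() if p in codebase][:k]
    subset = "\n\n".join(f"### {p}\n{codebase[p]}" for p in selected)
    return sclm(f"Issue:\n{issue}\n\nRelevant files:\n{subset}\n\nProduce a patch.")
```

The trade-off is direct: DIRECTSOLVE pays for a very long context on every call, while SELECTSOLVE spends one long-context call on localization so that the (often more capable) short-context model only sees a small file subset.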

Experimental Evaluation

Experiments assessed the simplified agent framework on the SWE-bench Verified benchmark, which comprises 500 real-world software engineering tasks. Results indicated that DIRECTSOLVE surpassed complex agentic approaches like Agentless and CodeAct with minimal engineering effort. SELECTSOLVE further improved accuracy by delegating patch generation to a stronger short-context model.

Cost Considerations

Currently, the cost of utilizing LCLM methods is higher than traditional approaches. However, rapid decreases in inference costs and increasing context lengths are making LCLMs more viable. Techniques such as key-value caching can significantly lower costs after initial runs. While slight changes in codebases may limit caching benefits, advancements in this area could enhance practicality.
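A back-of-the-envelope model shows why prefix caching matters for this setup: the repository context dominates the prompt, and it is identical across runs. All token counts, prices, and the discount factor below are illustrative assumptions, not actual vendor rates.

```python
# Illustrative cost model for key-value (prompt) caching.
# Prices, token counts, and the cache discount are assumed, not real rates.

def run_cost(prompt_tokens: int, output_tokens: int,
             price_in: float, price_out: float,
             cached_tokens: int = 0, cache_discount: float = 0.75) -> float:
    """Dollar cost of one request; cached prefix tokens get a discount."""
    fresh = prompt_tokens - cached_tokens
    return (fresh * price_in
            + cached_tokens * price_in * (1 - cache_discount)
            + output_tokens * price_out)

CODEBASE = 900_000            # tokens of repository context (assumed)
ISSUE = 2_000                 # tokens for the issue and instructions
OUT = 4_000                   # tokens of generated patch
P_IN, P_OUT = 1.25e-6, 5e-6   # $/token, illustrative only

first = run_cost(CODEBASE + ISSUE, OUT, P_IN, P_OUT)
later = run_cost(CODEBASE + ISSUE, OUT, P_IN, P_OUT, cached_tokens=CODEBASE)
# Once the codebase prefix is cached, `later` is far below `first`,
# since only the short issue text is billed at the full input rate.
```

This also makes the caveat in the text concrete: any edit to the codebase invalidates the cached prefix from the first changed token onward, so frequently changing repositories recover less of this saving.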

Conclusion

In summary, unscaffolded LCLMs can perform competitively on SWE-bench tasks, pointing to a potential shift toward simpler agent architectures. By embracing these advanced models, businesses can enhance productivity, streamline processes, and achieve better results.

For more insights and guidance on integrating AI into your business operations, feel free to reach out to us at hello@itinai.ru or connect with us on Telegram, X, and LinkedIn.

Explore the full research paper here.



Vladimir Dyachkov, Ph.D
Editor-in-Chief itinai.com

I believe that AI is only as powerful as the human insight guiding it.
