Creating your own code writing agent. How to get results fast and avoid the most common pitfalls

“`html

In this blog post we walk you through our journey creating an LLM-based code writing agent from scratch – fine tuned-for your needs and processes – and we share our experience of how to improve it iteratively.

Introduction

This article is the second part in our series on Coding Agents. The first part provides an in-depth look at existing solutions, exploring their unique attributes and inherent limitations. We recommend starting there for the full picture.

Riding the wave of AutoGPT’s initial popularity surge, we embarked on a mission to uncover its potential for more complex software development projects. Our focus was Data Science, a domain close to our hearts.

Upon realizing the pitfalls of AutoGPT and other Coding Agents, we decided to create an innovative tool of our own. However, it turned out that other powerful solutions had entered the arena in the meantime, nudging us to test all the agents on a common benchmark.

In this article, we invite you on a journey through the evolution of our own AI Agent solutions, from humble beginnings with a basic model to advanced context retrieval. We’ll evaluate both our agents and those currently available to the public, and we’ll also reveal the challenges and limitations we encountered during the development process. Finally, we’ll share the insights and lessons learned from our experience. So join us as we navigate the realm of Coding Agent development, detailing the peaks, the valleys, and all the intricate details in between.

Creating a Data Scientist Agent

All of the agents that we presented in the previous article were mainly evaluated on pure software engineering tasks such as building simple games or writing web servers with Rest API. But when tested on traditional Data Science problems such as image classification, object detection or sentiment analysis, their performance was far from perfect. That’s why we decided to build our own Agent to be better at solving Data Science problems.

From the beginning, we determined the two principal approaches to be explored:

Approach A: The agent will initially generate the complete plan and subsequently execute it.

Approach B: The agent will iteratively create a plan for problem-solving.

We utilized the GPT-3.5 and GPT-4 models to engineer our coding agents. We did explore other models, like Claude 2, but their output paled in comparison to the quality of code produced by the OpenAI models.

We evaluated numerous benchmarks, all of which included only initial objective descriptions. Illustrated in Figure 1 is the description of the simplest benchmark among the seven we tested. This particular benchmark will serve as the example for demonstrating how the agents function in the next sections of this blog post.

…

Approach A: Generating the plan upfront

In this section, we’ll walk you through how our AI Agent evolved over time. We started with a basic implementation and gradually improved it until we reached a point where the agent could successfully write code and tests, refactor previously written code, and much more.

Baseline

We started with the implementation of a baseline agent to quickly assess whether this path made sense. Our baseline represents one of the simplest approaches you can create. It relies heavily on the underlying models and the initial problem description provided by the user. For example, we asked the models to produce a plan in the form of a valid JSON, only to discover that the models failed to do so, returning JSON that couldn’t be parsed. As a result, the entire program would fail (and you can imagine how challenging it is to carry out a Data Science project without a proper plan!).

…

Improving planning and context retrieval

In the next iteration, we paid more attention to reducing the size of the input context, thus optimizing latency and cost. This time, instead of returning the full files in the context, we decided to divide the files into chunks and return only the useful ones to the model.

…

Approach B: Generating a plan on the fly

This approach is different from the others, because it is inspired by the fact that developers are never able to plan every little step that will be taken when starting a new project. Python scripts are not written once from top to bottom, but instead are constantly changed and improved.

…

Evaluation

In order to measure and compare our open-source AI agent and others, we have prepared a set of benchmark tasks. We’ve evaluated coding writing agents over a total of seven small-scale data science projects, which included tasks like:

…

Once the agents completed their tasks, we evaluated the outcomes of the projects they generated. The evaluation was undertaken in two steps:

…

Limitations

While AI Agents show surprising autonomy in the coding task performance, they still face several limitations, with some critical ones being:

…

Lessons learned

We wish to share some of our insights and useful tricks that we learned from our experiments that you may find useful when experimenting with your own AI agents:

…

Creating Coding Agents: final thoughts

In this article, we presented the process of creating our unique Coding Agents and compared them to the currently available solutions. Initially, our solution surpassed the available counterparts. However, the dynamic nature of technology soon brought new contenders like MetaGPT, which at the time of writing exceeds the efficiency of all others and also stands as an equal rival for our agent.

…

Interested in streamlining your projects with AI-based software development? Feel free to reach out to us for comprehensive solutions tailored to your specific needs.

The post Creating your own code writing agent. How to get results fast and avoid the most common pitfalls appeared first on deepsense.ai.

Spotlight on a Practical AI Solution

Consider the AI Sales Bot from itinai.com/aisalesbot designed to automate customer engagement 24/7 and manage interactions across all customer journey stages.

…

Discover how AI can redefine your sales processes and customer engagement. Explore solutions at itinai.com.

“`

AI Agents

Localization Project Manager – Coordinating translation workflows, answering vendor or process-related questions.

Job Title: Localization Project Manager Overview The Localization Project Manager plays a vital role in coordinating translation workflows while addressing vendor and process-related queries. This position is crucial for ensuring that translation projects are executed efficiently…
AI Agents

Environmental Health & Safety Officer – Answering compliance-related questions, retrieving safety protocols or audit histories.

Professional Summary The AI-driven Environmental Health & Safety Officer is a reliable and effective digital team member that performs repetitive and time-consuming tasks with remarkable speed, accuracy, and stability. By automating these tasks, it frees up…
AI Agents

Legal Contract Reviewer – Auto-flagging clause inconsistencies or retrieving precedent cases for review.

Job Title: Legal Contract Reviewer – Auto-flagging Clause Inconsistencies or Retrieving Precedent Cases for Review The AI functions as a reliable and effective digital team member that excels in performing repetitive and time-consuming tasks. With remarkable…
AI Agents

Customer Retention Analyst – Creating customer summaries, identifying churn risk patterns, and suggesting retention steps.

Customer Retention Analyst Professional Summary A highly analytical and detail-oriented Customer Retention Analyst with a proven track record in creating comprehensive customer summaries, identifying churn risk patterns, and suggesting effective retention strategies. Adept at leveraging data-driven…
AI Agents

Billing Specialist – Explaining billing policies, payment processes, or past invoice details using ERP/CRM data.

The role of a Billing Specialist is essential for ensuring effective communication of billing policies, payment processes, and past invoice information using ERP and CRM data. A Billing Specialist acts as a liaison between clients and…
AI Agents

Training Program Manager – Generating course outlines and answering questions about learning paths or certification procedures.

Professional CV Job Title: Training Program Manager The Training Program Manager is responsible for generating course outlines and answering questions about learning paths or certification procedures. This role involves several key steps: Role Description First, the…
AI Agents

Risk Analyst – Generating scenario briefs and referencing historical incident data to support assessments.

Professional CV Risk Analyst – Generating Scenario Briefs and Referencing Historical Incident Data to Support Assessments An AI is a reliable and effective digital team member that performs repetitive and time-consuming tasks, improving speed, accuracy, and…
AI Agents

Facilities Manager – Answering staff queries about office access, safety protocols, or maintenance workflows.

Facilities Manager – Answering Staff Queries About Office Access, Safety Protocols, or Maintenance Workflows Job Responsibilities and AI Integration The Facilities Manager plays a crucial role in addressing staff queries related to office access, safety protocols,…

AI news and solutions

AI News

Snowflake’s ExCoT: Optimizing Open-Source LLMs with CoT Reasoning and DPO for Enhanced Text-to-SQL Accuracy

Snowflake’s ExCoT Framework: Optimizing AI for Business Solutions Snowflake’s ExCoT Framework: Optimizing AI for Business Solutions Introduction to ExCoT Snowflake has introduced a groundbreaking framework known as ExCoT, aimed at enhancing the performance of open-source Large…
AI News

Advancing Vision-Language Reward Models: Challenges and Innovations in Multimodal Learning

Advancing Vision-Language Reward Models: Practical Business Solutions Advancing Vision-Language Reward Models: Practical Business Solutions In the rapidly evolving field of artificial intelligence, process-supervised reward models (PRMs) present new opportunities for enhancing multimodal learning, particularly in vision-language…
AI News

Salesforce AI Launches BingoGuard: Advanced LLM-Based Moderation System for Enhanced Content Safety

Salesforce AI Introduces BingoGuard: A New Era in Content Moderation Salesforce AI Introduces BingoGuard: A New Era in Content Moderation Overview of BingoGuard Salesforce AI has launched BingoGuard, an innovative moderation system that leverages large language…
AI News

Enhancing Gomoku Decision-Making with LLMs and Reinforcement Learning

Enhancing Strategic Decision-Making in Gomoku Using AI Enhancing Strategic Decision-Making in Gomoku Using AI Introduction Large Language Models (LLMs) have revolutionized natural language processing (NLP), showcasing advanced text generation, comprehension, and reasoning abilities. These models have…
Tools

Meta’s Code Llama vs OpenAI Codex: Which AI Fits Your Product Roadmap?

Technical Relevance In an era where the demand for rapid development cycles and cost-effective solutions is at an all-time high, Code Llama Meta’s code generation model emerges as a game-changer. This AI-driven tool democratizes access to…
AI News

OpenAI Launches PaperBench: New Benchmark for Evaluating AI in Machine Learning Research Replication

OpenAI’s PaperBench: A New Benchmark for AI Evaluation OpenAI’s PaperBench: A New Benchmark for AI Evaluation Introduction The rapid advancements in artificial intelligence (AI) and machine learning (ML) highlight the necessity for effective evaluation methods. Understanding…
AI News

Mitigating Hallucinations in Large Vision-Language Models with Latent Space Steering

Mitigating Hallucinations in Large Vision-Language Models Mitigating Hallucinations in Large Vision-Language Models: Practical Business Solutions Understanding the Challenge of Hallucinations in LVLMs Large Vision-Language Models (LVLMs) are powerful tools that combine visual and textual data to…
AI News

Nomic Launches State-of-the-Art Multimodal Embedding Model for Visual Document Retrieval

Nomic Launches Advanced Multimodal Embedding Model Nomic has introduced a revolutionary embedding model that excels in visual document retrieval tasks. This state-of-the-art model efficiently handles interleaved text, images, and screenshots, achieving a remarkable score on the…
AI News

Meta AI Introduces Multi-Token Attention: Revolutionizing LLM Contextual Understanding

Meta AI’s Multi-Token Attention: Revolutionizing Language Models Meta AI’s Multi-Token Attention: Revolutionizing Language Models Introduction to Attention Mechanisms in Language Models Large Language Models (LLMs) rely heavily on attention mechanisms to efficiently retrieve contextual information. However,…
AI News

Amazon Nova Act: The AI Agent Revolutionizing Web Task Automation

Amazon Nova Act: Revolutionizing Web Task Automation Amazon Nova Act: Revolutionizing Web Task Automation Introduction to Amazon Nova Act Amazon has introduced a groundbreaking AI model named Nova Act, designed to streamline various web tasks. This…
Tools

Tabnine vs Code Llama: Real-Time Coding AI for Agile Product Launches

Technical Relevance: Why Tabnine Is Important for Modern Development Workflows In a rapidly evolving tech landscape, developers are under constant pressure to deliver high-quality software at an unprecedented pace. Tabnine, an AI-powered code completion tool, is…
AI News

Beginner’s Guide to Terminal and Command Prompt: Essential Commands and Tips

The Complete Beginner’s Guide to Terminal/Command Prompt The Complete Beginner’s Guide to Terminal/Command Prompt Introduction The terminal (on Mac/Linux) or command prompt (on Windows) is a powerful tool that allows users to interact with their computers…
AI News

ByteDance’s Hybrid Reward System: Enhancing RLHF with RTV and GenRM

Introduction to a Hybrid Reward System in AI The recent research paper from ByteDance introduces a significant advancement in artificial intelligence through a hybrid reward system. This system combines Reasoning Task Verifiers (RTV) and a Generative…
AI News

ReSearch: An AI Framework for LLMs Integrating Reasoning and Search with Reinforcement Learning

Introducing ReSearch: A Groundbreaking AI Framework Overview of ReSearch Large language models (LLMs) have made significant strides in reasoning tasks. However, merging reasoning with external search processes remains a complex challenge, especially for questions that require…
AI News

How to Use Git and Git Bash Locally: A Complete Guide

Using Git and Git Bash: A Business Guide Using Git and Git Bash Locally: A Business Guide Table of Contents Introduction Installation Windows macOS Linux Basic Git Commands Git Configuration Git Workflow Creating a Repository Committing…
Tools

Microsoft Azure AI vs AWS AI: Automate Product Workflows & Boost Customer Engagement

Technical Relevance: Why Microsoft Azure AI is Important for Modern Development Workflows In the rapidly evolving landscape of technology, businesses are increasingly turning to artificial intelligence (AI) to streamline operations, enhance customer experiences, and drive growth.…
AI News

Build an Open Source X-ray Judgment Tool with TorchXRayVision and Gradio

Building an Open Source X-ray Judgment Tool Building a Prototype X-ray Judgment Tool This guide presents a streamlined approach to creating a prototype X-ray judgment tool using open-source libraries. By utilizing TorchXRayVision alongside Gradio and PyTorch,…
AI News

Boosting Creative Writing Diversity with Diversified DPO and ORPO in AI Models

Enhancing Creative Writing with AI: Practical Solutions for Businesses Understanding the Challenge of Creative Writing in AI Creative writing relies heavily on diversity and imagination, presenting a unique challenge for artificial intelligence (AI) systems. Unlike factual…
AI Agents

Localization Project Manager – Coordinating translation workflows, answering vendor or process-related questions.

Job Title: Localization Project Manager Overview The Localization Project Manager plays a vital role in coordinating translation workflows while addressing vendor and process-related queries. This position is crucial for ensuring that translation projects are executed efficiently…
AI Agents

Environmental Health & Safety Officer – Answering compliance-related questions, retrieving safety protocols or audit histories.

Professional Summary The AI-driven Environmental Health & Safety Officer is a reliable and effective digital team member that performs repetitive and time-consuming tasks with remarkable speed, accuracy, and stability. By automating these tasks, it frees up…