
SWERank: A New Approach to Software Issue Localization
Issue localization, finding where in a codebase a bug report or feature request must be addressed, is one of the most challenging tasks in software development. Despite advances in automated tools, pinpointing the exact code that needs to change often takes longer than fixing the issue itself. Existing methods can be slow and costly, especially those that rely on closed-source models. To address these challenges, Salesforce AI has developed SWERank, a framework for localizing software issues more efficiently and precisely.
Understanding SWERank
SWERank is a lightweight framework that improves the process of software issue localization by treating it as a code ranking task. It consists of two main components:
- SWERankEmbed: A bi-encoder model that efficiently retrieves relevant code snippets by encoding GitHub issues and code into a shared embedding space.
- SWERankLLM: A listwise reranker that refines the retrieval results using contextual understanding from large language models (LLMs).
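A listwise reranker looks at the issue and several candidates at once and outputs an ordering of the whole list, rather than scoring each candidate independently. The sketch below shows one common way such rerankers are driven: number the candidates in a prompt, then parse a response such as "[2] > [1] > [3]" back into a permutation. The prompt wording, the response format, and the helper names `build_listwise_prompt` and `parse_ranking` are assumptions for illustration, not SWERankLLM's actual interface.

```python
import re


def build_listwise_prompt(issue: str, candidates: list[str]) -> str:
    """Number each candidate so the model can refer to it by identifier (hypothetical format)."""
    lines = [f"Issue:\n{issue}\n", "Candidate functions:"]
    for i, code in enumerate(candidates, start=1):
        lines.append(f"[{i}] {code}")
    lines.append("Rank the candidates from most to least relevant, e.g. [2] > [1] > [3].")
    return "\n".join(lines)


def parse_ranking(response: str, num_candidates: int) -> list[int]:
    """Turn '[2] > [1] > [3]' into 0-based indices.

    Out-of-range or repeated identifiers are dropped, and any candidates the
    model omitted are appended in their original order, so the result is
    always a full permutation.
    """
    order: list[int] = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        if 0 <= idx < num_candidates and idx not in order:
            order.append(idx)
    order.extend(i for i in range(num_candidates) if i not in order)
    return order


candidates = ["def save(...)", "def load(...)", "def parse(...)"]
prompt = build_listwise_prompt("Saving fails on Windows paths", candidates)
print(prompt)
print(parse_ranking("[3] > [1]", len(candidates)))  # [2, 0, 1]
```

Repairing the parsed output into a full permutation is a defensive choice: a language model may skip or repeat identifiers, and downstream code usually expects every candidate to appear exactly once.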
Data-Driven Approach
To train SWERank, the research team created a dataset called SWELOC, which links real-world issue reports from public GitHub repositories with the code changes that resolved them. These high-quality training examples improve the models' localization accuracy.
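Pairs of issues and the code their fixes changed can be turned into training data for a bi-encoder. The sketch below assumes each record stores the issue text, the functions edited by the fix (positives), and the other functions in the repository, and samples untouched functions as negatives to form (issue, positive, negative) triples, as is common in contrastive retrieval training. The `LocalizationExample` class and its fields are hypothetical, not SWELOC's actual schema, and the team's own negative-selection strategy may differ from random sampling.

```python
import random
from dataclasses import dataclass


@dataclass
class LocalizationExample:
    """Hypothetical record: one issue plus the functions its fix modified."""
    issue: str
    edited_functions: list[str]  # positives: functions changed by the fix
    all_functions: list[str]     # every function in the repository snapshot


def make_triples(example: LocalizationExample, negatives_per_positive: int = 2,
                 seed: int = 0) -> list[tuple[str, str, str]]:
    """Pair each edited function with randomly sampled functions the fix did not touch."""
    rng = random.Random(seed)
    edited = set(example.edited_functions)
    pool = [f for f in example.all_functions if f not in edited]
    triples = []
    for pos in example.edited_functions:
        k = min(negatives_per_positive, len(pool))
        for neg in rng.sample(pool, k):
            triples.append((example.issue, pos, neg))
    return triples


ex = LocalizationExample(
    issue="Crash when config file is empty",
    edited_functions=["config.load"],
    all_functions=["config.load", "config.save", "cli.main", "utils.log"],
)
for triple in make_triples(ex):
    print(triple)
```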
How SWERank Works
SWERank operates in two stages:
- Retrieval: SWERankEmbed converts issue descriptions and candidate functions into dense vector representations, allowing for efficient similarity-based retrieval.
- Reranking: SWERankLLM processes the issue description and the top retrieved code candidates to generate a ranked list, ensuring that the most relevant code is prioritized.
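The two stages can be sketched end to end as retrieve-then-rerank. To keep the example self-contained, a toy bag-of-words vector stands in for SWERankEmbed's learned dense embeddings, and the reranker is any callable that reorders the shortlist, the slot where SWERankLLM would sit in practice. `embed`, `retrieve`, and `localize` are hypothetical helpers; only the overall shape mirrors the description above: score every function cheaply, then spend the expensive model on a small top-k.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy stand-in for a dense encoder: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(issue: str, functions: dict[str, str], k: int) -> list[str]:
    """Stage 1: score every function against the issue and keep the top-k names."""
    q = embed(issue)
    scores = {name: cosine(q, embed(code)) for name, code in functions.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def localize(issue: str, functions: dict[str, str], k: int,
             rerank: Callable[[str, list[str]], list[str]]) -> list[str]:
    """Stage 2: pass only the shortlist to the (expensive) reranker."""
    shortlist = retrieve(issue, functions, k)
    return rerank(issue, shortlist)


functions = {
    "parse_date": "def parse_date(s): return datetime.strptime(s, fmt)",
    "save_file": "def save_file(path, data): open(path, 'w').write(data)",
    "format_date": "def format_date(d): return d.strftime(fmt)",
}
ranked = localize("parse_date fails on ISO date strings", functions, k=2,
                  rerank=lambda issue, names: names)  # identity placeholder for an LLM reranker
print(ranked)  # ['parse_date', 'format_date']
```

Because the reranker only ever sees `k` candidates, its cost stays fixed no matter how many functions the repository contains; the cheap first stage absorbs the repository size.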
Performance Insights
SWERank performs strongly on standard issue-localization benchmarks. For instance, SWERankEmbed-Large on its own achieved a function-level accuracy of 82.12%, surpassing the other models evaluated. Reranking its results with SWERankLLM-Large raised accuracy to 88.69%, a gain of about 6.6 percentage points and a new state-of-the-art result.
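Function-level localization accuracy is usually reported at a cutoff k. The sketch below assumes the common strict definition, in which an issue counts as localized only when every ground-truth function appears in the top k predictions; the cutoff and matching rules behind the figures above are not stated here, so treat this as illustrative. `accuracy_at_k` is a hypothetical helper.

```python
def accuracy_at_k(predictions: list[list[str]], gold: list[set[str]], k: int) -> float:
    """Fraction of issues whose ground-truth functions all appear in the top-k predictions."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must align one-to-one")
    if not gold:
        return 0.0
    # g <= set(...) is a subset test: every gold function must be in the top k.
    hits = sum(g <= set(pred[:k]) for pred, g in zip(predictions, gold))
    return hits / len(gold)


preds = [["a.f", "b.g", "c.h"], ["x.y", "a.f"], ["m.n"]]
gold = [{"b.g"}, {"a.f", "z.z"}, {"m.n"}]
# Issues 1 and 3 are fully covered in the top 2; issue 2 misses z.z.
print(accuracy_at_k(preds, gold, k=2))
```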
Cost Efficiency
SWERank is also considerably more cost-effective than existing approaches. Where competing methods can cost around $0.66 per example, SWERankLLM runs at just $0.011 to $0.015 per example, while providing up to six times better accuracy for the cost.
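The per-example prices quoted above translate directly into a raw cost ratio. This short calculation uses only those prices; it does not reproduce the accuracy-for-cost comparison, which also depends on each system's accuracy. The 1,000-issue batch is an arbitrary illustration, and `cost_ratio` and `batch_cost` are hypothetical helpers.

```python
BASELINE_COST = 0.66                 # USD per example for a competing approach (from the text)
SWERANK_COST_RANGE = (0.011, 0.015)  # USD per example for SWERankLLM (from the text)


def cost_ratio(baseline: float, ours: float) -> float:
    """How many times cheaper `ours` is than `baseline` per example."""
    return baseline / ours


def batch_cost(per_example: float, n_issues: int) -> float:
    """Total cost of localizing `n_issues` issues at a flat per-example price."""
    return per_example * n_issues


for price in SWERANK_COST_RANGE:
    print(f"${price:.3f}/example: {cost_ratio(BASELINE_COST, price):.0f}x cheaper; "
          f"1,000 issues cost ${batch_cost(price, 1000):.2f} "
          f"vs ${batch_cost(BASELINE_COST, 1000):.2f}")
```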
Conclusion
SWERank represents a significant advance in software issue localization by recasting it as a ranking problem. With its efficient architecture and high-quality training data, SWERank achieves state-of-the-art accuracy while also reducing cost and latency. The framework shows that practical, scalable solutions for debugging and code maintenance can be built with open-source tools. By focusing on efficient neural retrieval, Salesforce AI has raised the bar for accuracy and efficiency in automated software engineering.
For more information, check out the SWERank project page.