Anthropic has recently launched Claude Sonnet 4.5, a significant upgrade that sets a new standard in software engineering and real-world computer usage. This update brings several enhancements, including Claude Code checkpoints, a native VS Code extension, API memory/context tools, and an Agent SDK designed to mimic the internal structures used by Anthropic. Notably, the pricing remains the same as its predecessor, Sonnet 4, at $3 input and $15 output per million tokens.
What’s Actually New?
SWE-bench Verified Record
One of the standout features of Claude Sonnet 4.5 is its performance on the SWE-bench Verified dataset. Anthropic reports an impressive accuracy of 77.2% on a 500-problem set using a straightforward two-tool scaffold (bash + file edit). This score is averaged over ten runs without any test-time compute and utilizes a 200K “thinking” budget. In a more resource-intensive setting, the accuracy reaches 78.2%, and with parallel sampling and rejection techniques, it can achieve as high as 82.0%.
Computer-use SOTA
On the OSWorld-Verified dataset, Sonnet 4.5 shows significant improvement, scoring 61.4%, a notable increase from Sonnet 4’s 42.2%. This leap reflects enhanced control over tools and user interface manipulation, which are crucial for executing tasks on browsers and desktop environments.
Long-horizon Autonomy
Another critical advancement is the observed ability of the model to maintain over 30 hours of uninterrupted focus on multi-step coding tasks. This capability is a leap forward from previous limitations and is vital for ensuring agent reliability in complex scenarios.
Reasoning and Math Enhancements
The release notes highlight “substantial gains” in reasoning and mathematical evaluations, coupled with a robust safety posture (ASL-3) that improves defenses against prompt-injection vulnerabilities.
What’s There for Agents?
Sonnet 4.5 also addresses the challenges faced by real agents, such as extended planning, memory management, and reliable tool orchestration. The Claude Agent SDK provides production patterns that go beyond a basic LLM endpoint, offering features such as memory management for long-running tasks, permissioning, and coordination among sub-agents. This architecture allows teams to replicate the same scaffolding used by Claude Code, which now includes checkpoints, a refreshed terminal, and VS Code integration, ensuring coherence and reversibility in multi-hour projects.
For tasks that simulate “using a computer,” the model’s notable 19-point improvement on OSWorld-Verified indicates its enhanced ability to navigate, fill spreadsheets, and execute web flows, as demonstrated in Anthropic’s browser demo. For enterprises considering robotic process automation (RPA) applications, higher OSWorld scores generally correlate with lower intervention rates during execution.
Where You Can Run It?
- Anthropic API & Apps: Model ID claude-sonnet-4-5; pricing remains consistent with Sonnet 4. File creation and code execution are now directly accessible in Claude applications for paid tiers.
- AWS Bedrock: Available through Bedrock, offering integration paths to AgentCore with features for long-horizon agent sessions and memory/context capabilities.
- Google Cloud Vertex AI: Now generally available on Vertex AI, supporting multi-agent orchestration and provisioned throughput for large-scale jobs.
- GitHub Copilot: Public preview across Copilot Chat and CLI, allowing organizations to enable features via policy and support for custom keys in VS Code.
Summary
In summary, Claude Sonnet 4.5 stands out with a documented 77.2% accuracy on the SWE-bench Verified score and a 61.4% lead on OSWorld-Verified tasks. The practical updates, including checkpoints, SDK, and availability across various platforms like Copilot and AWS, position it as a strong contender for long-running, tool-intensive agent workloads. While independent replication will ultimately determine the model’s sustained performance and its claim to be “the best for coding,” its design focuses on autonomy, scaffolding, and enhanced computer control, addressing common production challenges faced by developers today.
FAQ
- What are the primary enhancements in Claude Sonnet 4.5? The main enhancements include improved accuracy on coding tasks, better tool control, and extended autonomy for multi-step tasks.
- How does Claude Sonnet 4.5 compare to its predecessor? Sonnet 4.5 shows significant improvements in accuracy and functionality, particularly in handling complex coding scenarios and user interface tasks.
- Where can I access Claude Sonnet 4.5? It can be accessed through the Anthropic API, AWS Bedrock, Google Cloud Vertex AI, and GitHub Copilot.
- What is the pricing model for Claude Sonnet 4.5? The pricing remains unchanged from Sonnet 4, at $3 input and $15 output per million tokens.
- What industries can benefit from using Claude Sonnet 4.5? It is particularly beneficial for software development, robotic process automation, and any field requiring complex agent-based tasks.


























