
SERA-32B Solves 55% of SWE-Bench Verified Problems, Outperforming Open Models

Open-source coding agent achieves benchmark scores surpassing Qwen3-Coder and Mistral Devstral Small 2.

TechDrop Editorial

SERA-32B, the larger variant of Ai2's open-source coding agent, solves more than 55% of SWE-Bench Verified problems, surpassing prior open-source models.

Benchmark Results

SERA-32B outperforms most open models, such as Qwen3-Coder, under matched inference setups. It also outperforms Mistral's Devstral Small 2 on SWE-Bench Verified, a benchmark that tests real-world software engineering tasks.

SERA-8B Performance

The smaller SERA-8B model solves 29.4% of SWE-Bench Verified problems, compared with 9.4% for reinforcement learning baselines. This demonstrates that even smaller models can achieve meaningful performance on practical coding tasks.
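
To put those percentages in concrete terms, the short Python sketch below converts the reported rates into approximate problem counts. It assumes the standard SWE-Bench Verified set of 500 problems, a figure not stated in this article, and the helper name `solved_count` is purely illustrative. Reported rates may be averaged over several runs, so the counts are estimates rather than official tallies.

```python
# Assumption: SWE-Bench Verified contains 500 problems (not stated in the article).
TOTAL_PROBLEMS = 500


def solved_count(resolve_rate_pct: float, total: int = TOTAL_PROBLEMS) -> int:
    """Convert a reported resolve rate (in percent) to an approximate problem count."""
    return round(resolve_rate_pct / 100 * total)


# Rates as reported in the article.
SERA_8B_RATE = 29.4
RL_BASELINE_RATE = 9.4

sera_8b_solved = solved_count(SERA_8B_RATE)
baseline_solved = solved_count(RL_BASELINE_RATE)

# Absolute gap in percentage points and relative improvement factor.
gap_points = round(SERA_8B_RATE - RL_BASELINE_RATE, 1)
ratio = round(SERA_8B_RATE / RL_BASELINE_RATE, 2)

print(f"SERA-8B: ~{sera_8b_solved} problems, baseline: ~{baseline_solved} problems")
print(f"Gap: {gap_points} percentage points, about {ratio}x the baseline")
```

Under that assumption, the gap between SERA-8B and the baseline is 20 percentage points, or roughly three times as many problems solved.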

Enterprise Applications

Ai2 emphasizes that every component of SERA is open, including the models, the code and the integration with Claude Code. This allows enterprises to customize agents for their own codebases while maintaining control over the software stack.
