Cognition AI and Applied Compute jointly developed SWE-Check, a model trained with reinforcement learning to detect code bugs that is significantly faster and cheaper to run than state-of-the-art models. Although its gap with Claude Opus 4.6 has narrowed in evaluations, further optimization is still needed. Training uses linearized rewards and a two-stage approach, aiming to improve both detection accuracy and operational efficiency. A preview version is now available in Windsurf Next.

MeNews

2026-04-15 12:40:17


ME News Report, April 15 (UTC+8): according to Beating's monitoring, Cognition AI, the parent company of the AI programming tool Windsurf, has partnered with the AI training company Applied Compute to train SWE-Check, a model specialized for code bug detection, using reinforcement learning. The model analyzes the user's current code changes (the diff), automatically flags bugs those changes may have introduced, and suggests fixes.

On an evaluation set drawn from the same distribution as the training data, SWE-Check's F1 score now matches Claude Opus 4.6 (the gap has shrunk from 0.09 to 0). On out-of-distribution evaluations, the gap has narrowed from 0.49 to 0.29: SWE-Check still trails frontier models there, but the progress is significant.

Its main advantages are speed and cost. SWE-Check runs an order of magnitude faster than state-of-the-art models at much lower inference cost, which makes real-time, free bug detection inside the IDE possible; calling a large model such as Opus 4.6 directly cannot deliver that.

Two design choices in the training stand out:

1. Reward linearization: the team wants to optimize the global F-beta metric, but that metric is computed from dataset-wide counts and cannot be decomposed directly into per-sample terms. Using a first-order approximation, they converted it into a reward that can be computed for each sample, so that optimizing the per-sample rewards effectively optimizes the overall metric. Early versions had high false-positive rates, so the team lowered beta from 1 to 0.5 to put more weight on precision.
2. Two-stage post-training: the first stage purely maximizes bug-detection ability with no latency penalty; the second stage adds a latency penalty derived from the statistical distribution of how long real users wait, after triggering a check, before switching away. This staged approach outperformed optimizing both objectives at once, which easily falls into local optima, such as a model that learns to be very fast but analyzes only superficially.
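The report does not publish the actual formulas, so the Python below is only a minimal sketch of the two ideas under stated assumptions. For reward linearization, it assumes the standard F-beta formula over aggregate counts, F = (1+β²)TP / ((1+β²)TP + β²FN + FP), and takes its first-order Taylor expansion around some assumed reference counts (for example, from a recent batch), so each sample's reward is its TP/FP/FN counts weighted by the partial derivatives. For the second stage, it assumes the latency penalty is a weighted empirical CDF of observed user switch-away times. All names (`Counts`, `linearized_reward`, `switch_away_cdf`, `stage2_reward`, `penalty_weight`) are hypothetical and do not come from Cognition or Applied Compute.

```python
"""Hypothetical sketch: linearized F-beta reward plus a stage-2 latency penalty.
Not SWE-Check's actual training code."""
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Bug-detection counts: true positives, false positives, false negatives."""
    tp: float
    fp: float
    fn: float


def f_beta(c: Counts, beta: float) -> float:
    """Global F-beta from aggregate counts: (1+b^2)TP / ((1+b^2)TP + b^2*FN + FP)."""
    b2 = beta * beta
    denom = (1 + b2) * c.tp + b2 * c.fn + c.fp
    return (1 + b2) * c.tp / denom if denom > 0 else 0.0


def f_beta_gradient(c: Counts, beta: float) -> tuple[float, float, float]:
    """Partial derivatives of F-beta with respect to (TP, FP, FN) at counts c."""
    b2 = beta * beta
    a = 1 + b2
    denom = a * c.tp + b2 * c.fn + c.fp
    if denom <= 0:
        raise ValueError("need nonzero reference counts to linearize around")
    d2 = denom * denom
    d_tp = a * (b2 * c.fn + c.fp) / d2  # a true positive raises F
    d_fp = -a * c.tp / d2               # a false positive lowers F
    d_fn = -a * b2 * c.tp / d2          # a missed bug lowers F, scaled by b^2
    return d_tp, d_fp, d_fn


def linearized_reward(sample: Counts, reference: Counts, beta: float) -> float:
    """First-order estimate of one sample's contribution to global F-beta.

    `reference` is an assumed set of aggregate counts around which the
    metric is linearized; summing these rewards approximates the change in F.
    """
    d_tp, d_fp, d_fn = f_beta_gradient(reference, beta)
    return d_tp * sample.tp + d_fp * sample.fp + d_fn * sample.fn


def switch_away_cdf(latency_s: float, observed_switch_times_s: list[float]) -> float:
    """Empirical fraction of users who had already switched away by latency_s."""
    if not observed_switch_times_s:
        raise ValueError("need at least one observed switch-away time")
    ordered = sorted(observed_switch_times_s)
    return bisect_right(ordered, latency_s) / len(ordered)


def stage2_reward(detection_reward: float, latency_s: float,
                  observed_switch_times_s: list[float],
                  penalty_weight: float) -> float:
    """Stage 2: detection reward minus a penalty that grows with the share of
    users who would already have left. Stage 1 uses detection_reward alone."""
    return detection_reward - penalty_weight * switch_away_cdf(
        latency_s, observed_switch_times_s)


if __name__ == "__main__":
    # Illustrative numbers only; not from the SWE-Check evaluation.
    reference = Counts(tp=40, fp=20, fn=10)
    stage1 = linearized_reward(Counts(tp=1, fp=0, fn=0), reference, beta=0.5)
    stage2 = stage2_reward(stage1, latency_s=3.0,
                           observed_switch_times_s=[2.0, 5.0, 8.0, 12.0],
                           penalty_weight=0.001)
    print(f"stage 1 reward: {stage1:.6f}, stage 2 reward: {stage2:.6f}")
```

In this formulation the ratio of the false-positive gradient to the false-negative gradient is 1/β², so at β = 0.5 a false positive costs four times as much as a missed bug, while at β = 1 they cost the same. That is one way to see why lowering beta pushes the model toward precision.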
The SWE-Check preview is now available in Windsurf Next (shortcut: Cmd+U) and will later be integrated into the official Windsurf release. (Source: BlockBeats)

