Gate News, March 25: Chen Cheng, an Ant Group engineer and author of the Umi.js front-end framework, reverse-engineered the source code of Claude Code 2.1.81 and reconstructed the full decision mechanism behind Auto Mode. The key finding: every tool invocation passes through a four-layer decision pipeline, and an independent AI classifier is called for safety review only when the first three layers cannot settle the outcome.
The four layers run in order. First, existing permission rules are checked; if one matches, the call is allowed directly. Second, the call is evaluated as if in acceptEdits mode (the permission level that allows file edits); if it would pass in that mode, it is treated as low risk and the classifier is skipped. Third, the tool is checked against a read-only whitelist (Read, Grep, Glob, LSP, WebSearch, etc.); these tools modify no state and are allowed unconditionally. Only if none of these layers decides the call does it reach the fourth layer, which sends an API request to Claude Sonnet for safety classification.
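The layered order described above can be sketched as follows. This is a minimal illustration, not the decompiled code: the names `ToolCall`, `decide`, `rule_allows`, `accept_edits_allows`, and `classify` are hypothetical, and the whitelist contains only the tools the article names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


# Only the read-only tools named in the article; the real list is longer ("etc.").
READ_ONLY_TOOLS = frozenset({"Read", "Grep", "Glob", "LSP", "WebSearch"})


@dataclass(frozen=True)
class ToolCall:
    name: str            # tool name, e.g. "Bash" or "Read"
    argument: str = ""   # hypothetical free-form argument


def decide(
    call: ToolCall,
    rule_allows: Callable[[ToolCall], bool],          # layer 1 (hypothetical hook)
    accept_edits_allows: Callable[[ToolCall], bool],  # layer 2 (hypothetical hook)
    classify: Callable[[ToolCall], Decision],         # layer 4 (hypothetical hook)
) -> Tuple[Decision, str]:
    """Return the decision and the name of the layer that produced it."""
    # Layer 1: an existing permission rule matches, so allow directly.
    if rule_allows(call):
        return Decision.ALLOW, "rule"
    # Layer 2: the call would pass under acceptEdits, so it is low risk.
    if accept_edits_allows(call):
        return Decision.ALLOW, "accept_edits"
    # Layer 3: read-only tools change no state and are always allowed.
    if call.name in READ_ONLY_TOOLS:
        return Decision.ALLOW, "read_only"
    # Layer 4: only now pay for the classifier round trip.
    return classify(call), "classifier"
```

The point of the ordering is that the classifier, the only layer with API cost and latency, runs last and only for calls the cheap local checks could not clear.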
Key design details of the classifier: it always uses Sonnet rather than Opus, trading some capability for lower cost and latency; temperature is set to 0 to keep output as deterministic as possible; the classifier is defined as a “safety monitor for autonomous AI programming agents” that guards against three types of risk (prompt injection, scope creep, and unintended harm); and the user’s CLAUDE.md configuration file is injected into the classifier context as the basis for judging user intent.
The blocking rules cover more than 22 categories, including force pushes, direct pushes to the main branch, downloading and executing external code, production deployment, data leaks, modifying its own permissions, creating remote code execution surfaces, and credential leaks. Seven kinds of exceptions are allowed: hardcoded test keys, local file operations within the working directory, read-only GET requests, installing declared dependencies, official toolchain installations, sending credentials read from configuration to their target provider, and pushing to the current working branch.
The system also implements a circuit breaker: after three consecutive rejections, or 20 rejections in total, it downgrades to manual confirmation; in headless mode, it aborts the agent outright. When the classifier is unavailable, a feature flag controls whether the system fails closed (rejects immediately) or fails open (downgrades to manual confirmation).
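A minimal sketch of such a circuit breaker, using the thresholds reported above. The class and function names are hypothetical, and the sketch assumes that an allowed call resets the consecutive counter, which the article implies but does not state.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_DENIALS = 3   # threshold reported in the article
MAX_TOTAL_DENIALS = 20        # threshold reported in the article


@dataclass
class DenialTracker:
    headless: bool = False
    consecutive: int = 0
    total: int = 0

    def record(self, denied: bool) -> str:
        """Record one classifier verdict and return the next action."""
        if denied:
            self.consecutive += 1
            self.total += 1
        else:
            # Assumption: an allowed call breaks the consecutive streak.
            self.consecutive = 0
        tripped = (
            self.consecutive >= MAX_CONSECUTIVE_DENIALS
            or self.total >= MAX_TOTAL_DENIALS
        )
        if not tripped:
            return "continue"
        # Nobody can confirm manually in headless mode, so abort instead.
        return "abort" if self.headless else "manual_confirmation"


def on_classifier_unavailable(fail_closed: bool) -> str:
    """Mirror the feature flag: reject outright, or hand control to the user."""
    return "reject" if fail_closed else "manual_confirmation"
```

The total-count threshold catches a pattern the consecutive threshold misses: an agent that alternates rejected and allowed calls never builds a streak of three but still trips the breaker at 20.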
In Auto Mode, the injection of behavioral instructions into the prompt is also finely controlled (this is the system's own reminder text, not the prompt-injection attack the classifier guards against). The instructions are injected once every five dialogue rounds. In each cycle of five injections, the first is the full version (about 800 words, including six instructions such as “execute immediately,” “reduce interruptions,” and “action over plan”) and the remaining four are a concise one-line version, balancing context window usage against behavioral stability.
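One reading of this schedule can be sketched as below. The article leaves details open, so the sketch makes two labeled assumptions: rounds are numbered from 1 with the first injection on round 5, and a cycle consists of five injections (one full, four concise). The function name and return values are hypothetical.

```python
from typing import Optional

INJECT_EVERY_ROUNDS = 5    # "once every five dialogue rounds"
INJECTIONS_PER_CYCLE = 5   # assumption: one full version + four concise


def reminder_for_round(round_number: int) -> Optional[str]:
    """Return "full", "concise", or None for a 1-based dialogue round."""
    # Assumption: injections land on rounds 5, 10, 15, ...
    if round_number < 1 or round_number % INJECT_EVERY_ROUNDS != 0:
        return None
    injection_index = round_number // INJECT_EVERY_ROUNDS - 1  # 0-based
    # The first injection of each cycle carries the full instructions.
    if injection_index % INJECTIONS_PER_CYCLE == 0:
        return "full"
    return "concise"
```

Under these assumptions the full version appears on rounds 5, 30, 55, and so on, with one-line reminders on the multiples of five in between.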