
Blockchain security auditing firm OpenZeppelin has conducted an independent review of EVMbench, the AI smart contract security benchmark launched by OpenAI and Paradigm, and identified two major problems: training-data contamination, and at least four vulnerabilities labeled “high risk” that are in fact invalid.
EVMbench was released in mid-February 2026 to evaluate how well different AI models can identify, fix, and exploit smart contract vulnerabilities. During testing, the AI agents’ internet access was cut off to prevent them from searching for answers online. However, OpenZeppelin’s audit revealed a structural flaw: the benchmark draws its vulnerabilities from 120 audits conducted between 2024 and mid-2025, and the knowledge cutoff dates of most top AI models also fall in mid-2025.
This means the AI agents likely encountered EVMbench’s vulnerability reports during pretraining and may already have memorized the answers, so cutting off internet access during the test does not close the gap. OpenZeppelin stated, “The most important ability for AI security is to discover new vulnerabilities in code that the model has never seen before.” The limited size of the dataset further amplifies the impact of contamination on the overall evaluation.
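The overlap described above comes down to a date comparison. The following is a minimal sketch, not EVMbench’s or OpenZeppelin’s method: the cutoff date, the audit names, and the publication dates are all hypothetical placeholders, since the article gives only “2024 to mid-2025” and “mid-2025.”

```python
from datetime import date

# Hypothetical model knowledge cutoff (the article says only "mid-2025").
MODEL_CUTOFF = date(2025, 6, 30)

# Hypothetical benchmark items: (audit name, publication date of the report).
AUDITS = [
    ("audit-a", date(2024, 3, 1)),
    ("audit-b", date(2025, 1, 15)),
    ("audit-c", date(2025, 9, 1)),
]


def possibly_contaminated(published: date, cutoff: date) -> bool:
    """A report published on or before the cutoff could be in the training data."""
    return published <= cutoff


def contamination_rate(audits, cutoff):
    """Return the flagged audit names and the fraction of items they represent."""
    flagged = [name for name, published in audits
               if possibly_contaminated(published, cutoff)]
    return flagged, len(flagged) / len(audits)


flagged, rate = contamination_rate(AUDITS, MODEL_CUTOFF)
print(flagged, round(rate, 2))  # ['audit-a', 'audit-b'] 0.67
```

A date check of this kind only shows that exposure was possible, not that a model actually memorized a given report.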
Beyond data contamination, OpenZeppelin uncovered specific factual errors. It reviewed at least four vulnerabilities that EVMbench categorizes as high risk and found that they do not exist; more critically, the exploit methods described for them do not work.
OpenZeppelin pointed out, “These are not subjective disagreements over severity; rather, the described exploit methods simply do not work.” If an AI agent is credited for “discovering” these nonexistent vulnerabilities during testing, the scoring system is rewarding incorrect results.
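The effect of an invalid label on scoring can be illustrated with a toy example. This is a sketch under stated assumptions: the article does not describe EVMbench’s scoring formula, so a simple recall-style score is assumed here, and all finding identifiers are hypothetical.

```python
def detection_score(reported: set, ground_truth: set) -> float:
    """Assumed scoring rule: share of ground-truth findings the agent reported."""
    return len(reported & ground_truth) / len(ground_truth)


# Hypothetical labels: three real vulnerabilities and one invalid entry
# (labeled high risk in the benchmark, but the exploit does not work).
valid_findings = {"V1", "V2", "V3"}
invalid_findings = {"X1"}
benchmark_truth = valid_findings | invalid_findings

# An agent that reports only what is actually exploitable.
careful_agent = {"V1", "V2", "V3"}
# An agent that also repeats the invalid finding.
parroting_agent = {"V1", "V2", "V3", "X1"}

print(detection_score(careful_agent, benchmark_truth))    # 0.75
print(detection_score(parroting_agent, benchmark_truth))  # 1.0

# Scored against the corrected labels, both agents find everything real.
print(detection_score(careful_agent, valid_findings))     # 1.0
print(detection_score(parroting_agent, valid_findings))   # 1.0
```

Under this assumed rule, the agent that reproduces the invalid finding outscores the one that correctly omits it, which is the distortion OpenZeppelin describes.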
OpenZeppelin emphasized that this audit does not negate AI’s potential in blockchain security: “The issue is not whether AI will change the security of smart contracts; it certainly will. The problem is whether the data and benchmarks we use to build and evaluate these tools are aligned with the standards of the contracts they aim to protect.”
Q: What issues did OpenZeppelin find in their audit of EVMbench?
A: They identified two core problems. First, data contamination: EVMbench’s test vulnerabilities come from audits conducted between 2024 and mid-2025, which overlaps with the knowledge cutoff dates of most top AI models, so models may have “seen” the answers during pretraining. Second, at least four vulnerabilities labeled high risk are invalid, and the exploits described for them do not work.
Q: Why is data contamination so dangerous for AI security evaluation?
A: If AI models have already encountered the benchmark’s vulnerability reports during pretraining, they might answer from memory rather than through genuine vulnerability discovery. This undermines the premise that the model is seeing the code for the first time, making it impossible to accurately assess how well AI can audit entirely new, unknown smart contracts.
Q: What is OpenZeppelin’s attitude toward AI’s future in blockchain security?
A: They believe AI will significantly change smart contract security but emphasize that this must rest on trustworthy methodologies and accurate evaluation standards. They see the issues with EVMbench not as a rejection of AI’s potential but as an important warning to the industry about the standards it uses.