Can AI Agents Boost Ethereum Security? OpenAI and Paradigm Created a Testing Ground


In brief

  • EVMbench tests AI agents on 120 real-world Ethereum smart contract vulnerabilities.
  • Tool evaluates detection, patching, and exploitation across three distinct modes.
  • GPT-5.3-Codex achieved 72.2% success rate in exploit mode testing.

ChatGPT maker OpenAI and crypto-focused investment firm Paradigm have introduced EVMbench, a benchmark that evaluates AI agents’ ability to detect, patch, and exploit high-severity vulnerabilities in Ethereum Virtual Machine (EVM) smart contracts, with the aim of improving smart contract security. Smart contracts are the heart of the Ethereum network, holding the code that powers everything from decentralized finance protocols to token launches. The weekly number of smart contracts deployed on Ethereum reached an all-time high of 1.7 million in November 2025, with 669,500 deployed last week alone, according to Token Terminal.

EVMbench draws on 120 curated vulnerabilities from 40 audits, most sourced from open audit competitions such as Code4rena, according to an OpenAI blog post. It also includes scenarios from the security auditing process for Tempo, Stripe’s purpose-built layer-1 blockchain focused on high-throughput, low-cost stablecoin payments. Payments giant Stripe launched the public testnet for Tempo in December, saying at the time that it was being built with input from Visa, Shopify, and OpenAI, among others. The goal is to ground testing in economically meaningful, real-world code, particularly as AI-driven stablecoin payments expand, OpenAI added.

Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH

— OpenAI (@OpenAI) February 18, 2026

EVMbench is meant to evaluate AI models across three modes: detect, patch, and exploit. In “detect,” agents audit repositories and are scored on their recall of ground-truth vulnerabilities. In “patch,” agents must eliminate vulnerabilities without breaking intended functionality. Finally, in “exploit,” agents attempt end-to-end fund-draining attacks in a sandboxed blockchain environment, with grading performed via deterministic transaction replay.

In exploit mode, GPT-5.3-Codex running via OpenAI’s Codex CLI achieved a score of 72.2%, compared to 31.9% for GPT-5, which was released six months earlier. Performance was weaker in the detect and patch tasks, where agents sometimes failed to audit exhaustively or struggled to preserve full contract functionality.

OpenAI’s researchers cautioned that EVMbench does not fully capture real-world security complexity. Still, they added that measuring AI performance in economically relevant environments is critical as models become powerful tools for both attackers and defenders.

Sam Altman’s OpenAI and Ethereum co-founder Vitalik Buterin have previously been at odds over the pace of AI development. In January 2025, Altman said that his firm was “confident we know how to build AGI as we have traditionally understood it.” Buterin, by contrast, has advocated that AI systems include a “soft pause” capability that could temporarily restrict industrial-scale AI operations if warning signs emerge.
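To make the detect-mode metric concrete, here is a minimal sketch of recall scoring as described above: the share of known, ground-truth vulnerabilities that an agent's report covers. It is not EVMbench's actual grader. The function name `detect_recall` and the vulnerability labels are hypothetical, and the sketch assumes that each reported finding has already been matched to a label.

```python
def detect_recall(reported: set[str], ground_truth: set[str]) -> float:
    """Return the fraction of ground-truth vulnerabilities the agent reported."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    # Only findings that match a known vulnerability count toward recall.
    return len(reported & ground_truth) / len(ground_truth)


# Hypothetical audit with four known issues (labels invented for illustration).
ground_truth = {
    "reentrancy-withdraw",
    "unchecked-oracle",
    "access-control-mint",
    "rounding-fee",
}
# The agent finds three of the four and adds one finding that is not in the set.
reported = {
    "reentrancy-withdraw",
    "access-control-mint",
    "rounding-fee",
    "gas-griefing",
}

score = detect_recall(reported, ground_truth)
print(f"{score:.2f}")  # prints 0.75
```

Recall alone does not penalize extra findings such as the unmatched one here. It only measures how many known vulnerabilities were missed, which is consistent with the article's note that agents sometimes failed to audit exhaustively.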

