OpenAI launches smart contract security evaluation system
The post OpenAI launches smart contract security evaluation system appeared on BitcoinEthereumNews.com.
OpenAI has introduced a new system called EVMbench, designed to measure how well artificial intelligence agents can find and fix security flaws in crypto smart contracts.

Summary

OpenAI has introduced EVMbench, a new framework designed to measure how well AI agents can detect, fix, and exploit smart contract vulnerabilities.
Developed with Paradigm, the benchmark is built on real audit data and focuses on practical, high-risk security scenarios.
Early results show strong progress in exploit tasks, while detection and patching are still challenging.

The company announced on Feb. 18 that it has developed EVMbench in partnership with Paradigm. The benchmark focuses on contracts built for the Ethereum Virtual Machine and is meant to test how AI systems perform in real financial settings.

OpenAI said smart contracts currently secure more than $100 billion in open-source crypto assets, making security testing increasingly important as AI tools become more capable.

Testing how AI handles real security risks

EVMbench evaluates AI agents across three main tasks: detecting vulnerabilities, fixing flawed code, and carrying out simulated attacks. The system is built using 120 high-risk issues drawn from 40 past security audits, many of them from public auditing competitions.

Additional scenarios were taken from reviews of the Tempo blockchain, a payments-focused network designed for stablecoin use. These cases were added to reflect how smart contracts are used in financial applications.

To build the test environment, OpenAI adapted existing exploit scripts and created new ones where needed. All exploit tests run in isolated systems rather than on live networks, and only previously disclosed vulnerabilities are included.

In detection mode, agents review contract code and try to identify known security flaws. In patch mode, they must fix those flaws without breaking the software. In exploit mode, agents attempt to drain funds from vulnerable contracts in a controlled setting.
Early results…
Filed under: News - @ February 19, 2026 5:20 am