Anthropic Expands AI Model Safety Bug Bounty Program
The post Anthropic Expands AI Model Safety Bug Bounty Program appeared on BitcoinEthereumNews.com.
Darius Baruo Aug 08, 2024 14:47

Anthropic broadens its AI model safety bug bounty program to address universal jailbreak vulnerabilities, offering rewards up to $15,000.

The rapid advancement of artificial intelligence (AI) model capabilities necessitates equally swift progress in safety protocols. According to Anthropic, the company is expanding its bug bounty program to introduce a new initiative aimed at finding flaws in the mitigations designed to prevent misuse of their models.

Bug bounty programs are essential in fortifying the security and safety of technological systems. Anthropic’s new initiative focuses on identifying and mitigating universal jailbreak attacks, which are exploits that could consistently bypass AI safety guardrails across various sectors. This initiative targets high-risk domains such as chemical, biological, radiological, and nuclear (CBRN) safety, as well as cybersecurity.

Our Approach

To date, Anthropic has operated an invite-only bug bounty program in collaboration with HackerOne, rewarding researchers for identifying model safety issues in publicly released AI models. The newly announced bug bounty initiative aims to test Anthropic’s next-generation AI safety mitigation system, which has not yet been publicly deployed. Key features of the program include:

Early Access: Participants will receive early access to test the latest safety mitigation system before its public deployment. They will be challenged to identify potential vulnerabilities or ways to circumvent safety measures in a controlled environment.

Program Scope: Anthropic offers bounty rewards of up to $15,000 for novel, universal jailbreak attacks that could expose vulnerabilities in critical, high-risk domains such as CBRN and cybersecurity. A universal jailbreak is a type of vulnerability allowing consistent bypassing of AI safety measures across a wide range of topics. Detailed instructions and feedback will be provided to program participants.
Get Involved

This model safety bug bounty initiative will initially be invite-only, conducted in partnership with HackerOne. While starting as invite-only, Anthropic…
Filed under: News, August 10, 2024, 8:08 am