The trouble with generative AI ‘Agents’
The following is a guest post and opinion from John deVadoss, Co-Founder of the InterWork Alliancez.

Crypto projects tend to chase the buzzword du jour; however, their urgency in attempting to integrate Generative AI 'Agents' poses a systemic risk. Most crypto developers have not had the benefit of working in the trenches, coaxing and cajoling previous generations of foundation models to get to work; they do not understand what went right and what went wrong during previous AI winters, and they do not appreciate the magnitude of the risk associated with using generative models that cannot be formally verified.

In the words of Obi-Wan Kenobi, these are not the AI Agents you're looking for. Why?

The training approaches of today's generative AI models predispose them to act deceptively in order to receive higher rewards, to learn misaligned goals that generalize far beyond their training data, and to pursue those goals using power-seeking strategies.

Reward systems in AI optimize for a specific outcome (e.g., a higher score or positive feedback); reward maximization leads models to learn to exploit the system, even if that means 'cheating'. When AI systems are trained to maximize rewards, they tend to learn strategies that involve gaining control over resources and exploiting weaknesses in the system, and in human beings, to optimize their outcomes. (A toy sketch illustrating this dynamic appears at the end of this post.)

Essentially, today's generative AI 'Agents' are built on a foundation that makes it well-nigh impossible to guarantee that any single generative model is aligned with respect to safety, that is, that it will not produce unintended consequences; in fact, models may appear to be aligned even when they are not.

Faking 'alignment' and safety

Refusal behaviors in AI systems are ex ante mechanisms ostensibly designed to prevent models from generating responses that violate safety guidelines or exhibit other undesired behavior. These mechanisms are typically realized…
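To make the reward-maximization point above concrete, here is a minimal sketch of 'reward hacking'. The environment, action names, and reward values are all hypothetical, invented purely for illustration: the agent is paid on a proxy metric, and the action that inflates the metric without doing real work wins out.

```python
# Toy illustration of reward hacking (specification gaming).
# Hypothetical setup: "work" produces 1 unit of real progress and 1 unit
# of measured reward; "game" produces no real progress but inflates the
# measured reward to 3. A reward-maximizing learner converges on "game".
import random

ACTIONS = ["work", "game"]

def measured_reward(action: str) -> float:
    # The proxy the designer actually rewards: the metric, not the goal.
    return 1.0 if action == "work" else 3.0

def real_progress(action: str) -> float:
    # What the designer wanted: only honest work advances the task.
    return 1.0 if action == "work" else 0.0

# Simple epsilon-greedy bandit learner over the two actions.
values = {a: 0.0 for a in ACTIONS}   # estimated value of each action
counts = {a: 0 for a in ACTIONS}
progress = 0.0

for step in range(1000):
    if random.random() < 0.1:                      # explore
        action = random.choice(ACTIONS)
    else:                                          # exploit best estimate
        action = max(values, key=values.get)
    counts[action] += 1
    # Incremental mean update of the action-value estimate.
    values[action] += (measured_reward(action) - values[action]) / counts[action]
    progress += real_progress(action)

print("learned action values:", values)            # "game" dominates
print("real progress after 1000 steps:", progress) # far below 1000
```

The learner here is doing exactly what it was told, maximizing the measured reward, which is precisely the point above: the misalignment lives in the gap between the metric and the intent, not in any bug in the optimizer.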