Relax, You’re Still Better at Playing ‘Doom’ Than AI
The post Relax, You’re Still Better at Playing ‘Doom’ Than AI appeared on BitcoinEthereumNews.com.
Despite the buzz surrounding artificial intelligence, even the most advanced vision-language models (GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro) struggle with a decades-old challenge: playing the classic first-person shooter Doom.

On Thursday, a new research project introduced VideoGameBench, an AI benchmark designed to test whether state-of-the-art vision-language models can play, and beat, a suite of 20 popular video games using only what they see on the screen.

"In our experience, current state-of-the-art VLMs substantially struggle to play video games because of high inference latency," the researchers said. "When an agent takes a screenshot and queries the VLM about what action to take, by the time the response comes back, the game state has changed significantly and the action is no longer relevant."

The researchers said they chose classic Game Boy and MS-DOS games because their visuals are simpler and their input styles are diverse, ranging from mouse and keyboard to game controller. According to the researchers, these games test a vision-language model's spatial reasoning better than text-based games do.

VideoGameBench was developed by computer scientist and AI researcher Alex Zhang. The suite includes classics such as Warcraft II, Age of Empires, and Prince of Persia.

"Claude can play Pokemon, but can it play DOOM? With a simple agent, we let VLMs play it, and found Sonnet 3.7 to get the furthest, finding the blue room! Our VideoGameBench (twenty games from the 90s) and agent are open source so you can try it yourself now –> 🧵 pic.twitter.com/vl9NNZPBHY"
Alex Zhang (@a1zhang), April 17, 2025

According to the researchers, delayed responses are most problematic in first-person shooters like Doom. In these fast-paced environments, an enemy visible in a screenshot may already have moved, or even reached the player, by the time the model acts.

For software developers, Doom has long served as a litmus test for technological capability in gaming environments.…
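The latency problem the researchers describe can be made concrete with a toy calculation. The sketch below is not part of VideoGameBench and is not the researchers' agent; it is a minimal Python model, assuming an enemy that moves at a constant, hypothetical speed and a hypothetical hit tolerance. The only real-game figure is Doom's fixed game-logic rate of 35 tics per second. The agent "aims" where the enemy was in the screenshot, and the function names (`tics_during`, `aim_error`, `action_still_valid`) are illustrative only.

```python
"""Toy model of the screenshot -> query -> act loop under inference latency.

Hypothetical throughout: enemy speed, hit tolerance, and the latencies tried.
Only Doom's 35 Hz game-logic rate is a property of the real game.
"""

DOOM_TICS_PER_SECOND = 35  # Doom advances its game state 35 times per second


def tics_during(latency_s: float, tics_per_second: int = DOOM_TICS_PER_SECOND) -> int:
    """Whole game tics that elapse while the agent waits for the model's reply."""
    return int(latency_s * tics_per_second)


def aim_error(enemy_speed: float, latency_s: float) -> float:
    """Distance the enemy moves between the screenshot and the action.

    The agent acts on where the enemy *was* in the screenshot; meanwhile the
    enemy keeps moving at enemy_speed (hypothetical units per tic).
    """
    return enemy_speed * tics_during(latency_s)


def action_still_valid(enemy_speed: float, latency_s: float, tolerance: float) -> bool:
    """True if the stale action would still land within the hit tolerance."""
    return aim_error(enemy_speed, latency_s) <= tolerance


if __name__ == "__main__":
    speed = 2.0       # hypothetical: enemy moves 2 units per tic
    tolerance = 10.0  # hypothetical: an action "hits" within 10 units
    for latency in (0.05, 0.5, 2.0, 5.0):
        print(
            f"latency {latency:>4}s -> {tics_during(latency):>3} tics, "
            f"error {aim_error(speed, latency):>6.1f}, "
            f"hit={action_still_valid(speed, latency, tolerance)}"
        )
```

Because the error grows linearly with the number of tics that pass during inference, a reply that takes even half a second leaves the game 17 tics ahead of the screenshot the model reasoned about, which is the kind of staleness the researchers point to in fast-paced shooters.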
Filed under: News - @ April 22, 2025 10:21 am