New AI Tool Enhances Video Accessibility for Blind and Low-Vision Users
James Ding Aug 13, 2024 09:51

An innovative AI system, SPICA, improves video accessibility for blind and low-vision users with interactive, layered audio descriptions.

New research aims to transform video accessibility for blind or low-vision (BLV) viewers with an AI-powered system that lets users explore content interactively. The system, detailed in a recent paper, addresses significant gaps in conventional audio descriptions (AD), offering a richer and more immersive video viewing experience.

Addressing Gaps in Conventional Audio Descriptions

"Although videos have become an important medium to access information and entertain, BLV people often find them less accessible," said lead author Zheng Ning, a PhD student in Computer Science and Engineering at the University of Notre Dame. "With AI, we can build an interactive system to extract layered information from videos and enable users to take an active role in consuming video content through their limited vision, auditory perception, and tactility."

ADs provide spoken narration of a video's visual elements and are crucial for accessibility. However, conventional static descriptions often omit details and focus primarily on helping users understand the content rather than experience it. In addition, simultaneously processing the original soundtrack and the AD narration can be mentally taxing, reducing user engagement.

Introducing SPICA: An AI-Powered Solution

Researchers from the University of Notre Dame, University of California San Diego, University of Texas at Dallas, and University of Wisconsin-Madison developed a new AI-powered system to address these challenges. Called the System for Providing Interactive Content for Accessibility (SPICA), the tool enables users to interactively explore video content through layered ADs and spatial sound effects.
The machine learning pipeline begins with scene analysis to identify key frames, followed by object detection and segmentation to pinpoint significant objects within each frame. These objects are then described in…
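The pipeline stages described above can be sketched in miniature. The code below is a simplified illustration, not the authors' implementation: frames are modeled as flat lists of grayscale pixel values, key frames are chosen by a naive frame-difference threshold, and `detect_objects` is a stand-in for a real detection and segmentation model. All names, thresholds, and the toy detection rule are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class DetectedObject:
    label: str
    bbox: tuple  # (x, y, w, h) in pixel coordinates

@dataclass
class KeyFrame:
    index: int
    objects: list = field(default_factory=list)
    descriptions: list = field(default_factory=list)

def frame_difference(a, b):
    # Mean absolute pixel difference between two equal-sized grayscale frames.
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def select_key_frames(frames, threshold=30.0):
    # Scene-analysis step: keep the first frame, plus any frame that
    # differs sharply from its predecessor (a likely scene change).
    keys = [KeyFrame(0)]
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            keys.append(KeyFrame(i))
    return keys

def detect_objects(frame):
    # Placeholder for a real object detection/segmentation model.
    # Here we "detect" a single object whenever the frame is mostly bright.
    if sum(frame) / len(frame) > 128:
        return [DetectedObject("bright region", (0, 0, 2, 2))]
    return []

def build_descriptions(frames):
    # Full toy pipeline: key-frame selection, per-frame object detection,
    # then a textual description for each detected object.
    keys = select_key_frames(frames)
    for kf in keys:
        kf.objects = detect_objects(frames[kf.index])
        kf.descriptions = [
            f"A {o.label} at position {o.bbox[:2]}" for o in kf.objects
        ]
    return keys
```

For example, feeding in three tiny 2x2 frames where the third is much brighter than the first two yields two key frames (index 0 and index 2), with a description attached only to the bright one. A production system would replace each stage with trained models, but the data flow (key frames, detected objects, then layered descriptions) is the same.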