Tech giants use YouTube subtitles for AI training without permission
The post Tech giants use YouTube subtitles for AI training without permission appeared on BitcoinEthereumNews.com.
Apple, Nvidia, and Anthropic have been found to be using YouTube subtitles to train AI models, which is against YouTube policies. A report by Proof News and Wired showed that these firms had used a dataset of transcripts from thousands of YouTube videos without acquiring a license to do so.

The investigation found that the companies used the YouTube Subtitles dataset, which consists of transcripts from 173,536 YouTube videos from 48,000 channels. The videos come from educational channels like Khan Academy and MIT, news channels like The Wall Street Journal, and top creators like MrBeast and Marques Brownlee.

Popular YouTubers react to data exploitation

Marques Brownlee, a popular YouTuber, commented on the issue on X. He said, “Apple has gathered data for AI from other firms. One of them collected a lot of data/transcripts from YouTube videos, including mine.” Apple may not have scraped the data directly, but Brownlee pointed out that this problem will persist.

The “YouTube Subtitles” dataset was developed by EleutherAI and published in 2020. It contains 5.7GB of data, including subtitles from YouTube videos that have since been removed from the platform. According to YouTube’s terms and conditions, accessing videos by “automated means” is prohibited. The presence of subtitles from removed videos compounds the issue, raising questions about privacy and copyright infringement.

Salesforce, which was also implicated in the investigation, has admitted to using the dataset. “The Pile dataset referred to in the research paper was trained in 2021 for academic and research purposes. The dataset was publicly available and released under a permissive license,” a Salesforce spokesperson said.

However, the use of YouTube content without permission remains controversial. In April,…
Filed under: News - @ July 17, 2024 1:08 am