DeAI requires more diverse datasets
The post DeAI requires more diverse datasets appeared on BitcoinEthereumNews.com.
Disclosure: The views and opinions expressed here belong solely to the author and do not represent the views and opinions of crypto.news’ editorial.

Artificial intelligence is all the rage. Yet beneath the hype surrounding decentralized AI (DeAI) lies a critical flaw: a dearth of diverse, secure, verifiable data. Onchain datasets are simply too limited to train truly powerful models. This risks ceding the AI future to centralized behemoths, which have unfettered access to the vast data troves of the web. DeAI’s promise of democratized, transparent, and robust AI hinges on bridging this data gap. Clever cryptography offers a route.

The beauty of conventional AI lies in its gluttony. The more data it devours, the smarter it becomes. But this advantage is also its Achilles’ heel. Centralized AI models are trained on data often harvested without explicit consent, raising thorny questions of privacy and control.

DeAI, built on blockchain’s principles of decentralization and transparency, offers an appealing alternative. Yet most onchain data comes from financial transactions or DeFi. Small language models especially require more precise data for fine-tuning. This leaves DeAI models starved of the rich and varied datasets needed to refine them to the competitive levels expected of the latest models.

Such datasets are available outside web3, with The Pile and Common Crawl each containing data from billions of unique sources. The depth of existing verified web2 data sources, as much as the volume of data, is what has enabled centralized AI providers to refine their GPTs as far and as fast as they have. Recreating the same level of data onchain is not feasible on a competitive timescale.

And while some AI firms have run afoul of data creators who accuse them of stealing exactly the type of nuanced data discussed here, there is another way to get more data onchain: make it…
Filed under: News - @ February 9, 2025 6:19 pm