Topic: News - Intellectual Property Law
Source: Business Insider
Business Insider reports on a leaked list of websites allegedly used by the AI company Anthropic to train its large language models, including its prominent Claude model. The list, reportedly sourced from a third-party data provider named Surge AI, contains numerous major news outlets, content platforms, and publishers. The leak offers a rare, concrete look at the vast and diverse sources of data, much of it likely copyrighted, that are scraped from the internet to build powerful generative AI systems.
The revelation is significant because it fuels the ongoing legal and ethical debate over AI training practices. Many publishers and creators have already filed lawsuits against AI companies, alleging that their content was used without permission or compensation. The leaked list could provide potent evidence in these legal battles, potentially identifying specific works that were ingested by Anthropic's models. The disclosure also puts pressure on AI companies, which have largely kept their training data sources a closely guarded secret, to be more transparent about the materials they use to build their commercially lucrative products.