AI Law - International Review of Artificial Intelligence Law
CC BY-NC-SA Commercial Licence ISSN 3035-5451
G. Giappichelli Editore

09/10/2025 - The Case for Ethically Sourced AI Training Data (USA)

Topic: News - Intellectual Property Law

Source: Computerworld

Computerworld presents a critical perspective on the common practice of training generative AI models by scraping vast amounts of data from the internet. The article argues that this method often involves the unauthorized use of copyrighted materials, leading to a surge in lawsuits from creators, artists, and publishers who claim their intellectual property has been effectively "stolen" to build these powerful systems. This approach, while effective for model development, is fraught with significant legal and ethical problems that could undermine the long-term viability of the technology.

The author advocates for a fundamental shift in the industry towards using AI models that are trained on ethically and legally sourced data. This includes data that is in the public domain, licensed explicitly for training purposes, or generated synthetically. While acknowledging that building models this way may be more challenging or costly, the piece contends that it is a necessary step to ensure fairness to creators and to mitigate the substantial legal risks that companies currently face. The article urges businesses and developers to consider these "clean" models as a more sustainable and responsible path forward for artificial intelligence.