Text and Data Mining (TDM) has been thrust into the spotlight with the recent explosion of Generative AI. 

On the one hand, developers of the most advanced Generative AI tools on the market have carried out vast amounts of TDM, relying heavily on repositories of web crawl data to “train” the Large Language Models (LLMs) that underpin those tools.

On the other hand, news and content publishers have raised concerns that this widespread TDM constitutes copyright infringement, arguing that AI companies have unfairly used large amounts of scraped content for training purposes without permission and without paying a licence fee.

The New York Times recently launched a high-profile legal claim against OpenAI and Microsoft in the US for the alleged “copying” of its vast catalogue of online content for training purposes. The outcome is likely to have a huge impact on how the Generative AI industry proceeds from here. The claim also raises interesting questions about how the existing laws around TDM might apply to this practice in the UK and EU, and what rights publishers have to safeguard their copyright-protected content hosted online.
