
Comments to the ICO on Training Generative AI Models Using Web Scraped Data

The Center for Data Innovation submitted comments to the Information Commissioner’s Office (ICO), the UK’s independent body set up to uphold information rights, on its generative AI consultation. The Center provided evidence on the first chapter of the consultation relating to the lawful basis for web scraping to train generative AI models. The Center offered a number of recommendations for how the ICO can better evaluate web scraping in this context, including:

  • The ICO should expand the legal bases for web scraping under UK GDPR to include web scraping for the public sector as a way to encourage public sector AI developments.
  • The ICO should consider a broader purpose under legitimate interest for generative AI model development. In particular, it should take an approach similar to the one taken for training search engines.
  • The ICO should regard large-scale web scraping as the most viable option for training and recognise that alternatives such as synthetically generated data still rely on models trained with Internet data.
  • The ICO should assume a reasonable expectation of data processing where data is intentionally made publicly available on the Internet by the data subject.
  • The ICO should be careful not to penalise distributions of trained models purely because they do not subscribe to the same level of control as models owned and retained by the primary developers.

Read the submission.
