
AI is dressing up greed as progress on creative rights


At this week’s London Book Fair, a lot of people were walking around with one particular title wedged under their arms. Called Don’t Steal This Book, its pages are empty apart from the names of thousands of authors, including Kazuo Ishiguro and Richard Osman. It’s a chilling protest against the rampant theft of creative work by tech firms, which could leave future artists unable to earn a living.

Generative AI models require immense quantities of human-created content and some of their developers have been as cavalier about copyright as they are about privacy. The New York Times is currently suing Microsoft and OpenAI for using its journalism to train ChatGPT. As that battle reaches a head, the UK government is about to issue an update on how it proposes to overhaul the intellectual property framework for AI. The UK has one of the world’s most successful creative industries and the oldest copyright law. It also has a government which fears losing out in the global AI race.

Creatives want the existing law to be enforced; tech companies want it loosened. One option under consideration is the creation of a text and data mining (TDM) exception, which would let companies train large-scale AI models in certain circumstances without asking copyright holders for permission. The tech sector says this would help UK AI companies compete on a level playing field with the US, Japan and China and boost AI investment in Britain.

However, it’s not clear that weakening copyright law would significantly expand the UK AI sector, compared with lowering energy costs, say. The House of Lords communications and digital committee, which has taken substantial evidence on this topic, has opposed a TDM exception and warned ministers not to “sacrifice the UK’s outstanding creative capacity for speculative AI gains”.

It’s not easy for ministers. In a fast-moving situation, each individual nation is grappling with how best to grow its tech sector and not get left behind. But the overwhelming challenge facing society is how to stop the new robber barons seizing our data, our output and our images. Scarlett Johansson’s voice was cloned without permission. Robert Downey Junior has instructed his lawyers to sue any future executives who try to make AI-generated digital replicas of him. The biggest problem is not that copyright law is unfit for the 21st century, but that it’s being flouted.

The AI sector claims that it is exceptional. So has every new industry for the past three hundred years. In the 1770s, Scottish booksellers seeking to flood the English market argued that copyright law did not apply to them because their printing presses were outside the jurisdiction. That sounds strangely familiar.

AI is an extraordinary, complex and increasingly transnational race. The tech sector likes to imply that any nation which tries to regulate its excesses will be relegated to the economic slow lane. But things are changing. Last year, Anthropic paid $1.5bn to settle a class-action lawsuit by book authors who accused it of training its products on their work without permission. A German court ruled that it was illegal to use copyrighted song lyrics to train generative AI models without a licence. And this week Amazon won an injunction against Perplexity AI, whose shopping agent is, it claims, illegally scraping the Amazon website.

Some US courts have upheld the argument that generative AI systems don’t infringe rights because their purpose is to produce new outputs. Experts explain that deep-learning models don’t store copies of their training data; what they retain are statistical patterns derived from it, encoded as numerical parameters. Unfortunately, these claims have been weakened by examples of “memorisation”: models regurgitating portions of journalism and books word for word.

The attempt to dress up greed as progress has collapsed trust, which makes it difficult to fashion any new framework in good faith. In 2022 the founder of Midjourney, an AI image generator, admitted that his tool had scraped up 100mn images without knowing where they came from. Websites can use robots.txt “Disallow” directives to signal which content bots must not scrape. But some companies are accused of obscuring the trail by paying third-party scrapers to do the work. They do not disclose the datasets used to train the models.
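Such a directive lives in a short plain-text file at a site’s root. As a minimal sketch (GPTBot is OpenAI’s published crawler name; other companies’ bot names vary):

```text
# robots.txt at the site root — asks the named crawler not to fetch any pages
User-agent: GPTBot
Disallow: /
```

Compliance is voluntary, however, which is why the protocol alone cannot stop determined scrapers.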

The answer must be unequivocal transparency, not only to ensure artists get paid but also because of increasing concern that models are generating biased or inequitable results.

Ministers must make transparency about AI training data a statutory obligation. They will need to avoid overly burdening UK start-ups, but they have to ensure that rights holders are protected. It’s hard to imagine that their promised economic impact assessment will show anything other than the slow death of our creative industry if they weaken the law. But prolonging uncertainty will only continue to dampen licensing and investment.

Every invention has brought predictions of copyright’s demise: the daguerreotype, the phonograph, radio, cassettes, home video and the internet. All those predictions were premature, and Britain’s soft power flourished as a result.

A belief has taken hold that progress can’t happen without the abolition of some of our oldest rights. That it’s only theft when individuals steal. That corporations don’t steal, they innovate. If you believe all this, you might as well stock your bookshelves with fantasy.

camilla.cavendish@ft.com
