Copyright, AI, and the harm we are not measuring

The political and legal debate over AI and copyright has focused overwhelmingly on training data – text and data mining exemptions, opt-out registers and collective licensing arrangements. But using copyrighted works
as training data does not necessarily, by itself, cause much measurable economic harm. The more direct harm arises where AI models can reproduce copyrighted lyrics, text or images on demand, competing directly with the licensed
works on which they were trained. Recent court rulings show this distinction beginning to emerge, albeit inconsistently.

We propose a capability-based levy, drawing on the precedent of the blank-tape levies introduced for analogue recording in the 1970s, under which AI developers would be charged according to a model’s demonstrated
capacity to reproduce protected material – its “memorisation rate” – rather than according to what was used to train it. Combined with mandatory attribution of sources in AI outputs, this would address aspects of the impact of AI on rights holders that current proposals largely leave unresolved.