Creators urge Ottawa to force disclosure of ‘black box’ AI system training

OTTAWA - Canadian creators and publishers want the government to do something about the unauthorized and usually unreported use of their content to train generative artificial intelligence systems.

But AI companies maintain that using the material to train their systems doesn’t violate copyright, and say limiting its use would stymie the development of AI in Canada.

The two sides are making their cases in recently published submissions to a consultation on copyright and AI being undertaken by the federal government as it considers how Canada’s copyright laws should address the emergence of generative AI systems like OpenAI’s ChatGPT.

Generative AI can create text, images, videos and computer code based on a simple prompt, but to do that, the systems must first study vast amounts of existing content.

In its submission to the government, Access Copyright argued most and potentially all large language models "are currently profiting from unauthorized use and reproduction of copyright protected works."

It’s taking place in a "black box," according to Access Copyright, which represents writers, visual artists and publishers.

"Rightsholders know it is happening, but due to the information asymmetry between themselves and AI platforms, they cannot determine who is conducting the activity, with whose works, and have no mechanism to stop it from happening."

Music Canada, which represents the country's major record labels, said that last year a fake AI-generated song mimicking the voices of Drake and The Weeknd "made one thing abundantly clear: AI models and systems have already ingested massive amounts of proprietary datasets without authorization from the source of the data or rightsholders."

The Writers’ Guild of Canada asked the government to start with implementing basic disclosure and reporting obligations. It said developers have all the knowledge of the work that is being mined and how it’s being used, while creators have none of that information.

Some organizations have signed licensing deals with AI companies. But the Canadian Authors Association said rightsholders face "immense obstacles" in licensing their content "because they are being kept in the dark as to which of their works are being used" by which companies.

It asked Canada to clarify that text and data mining are subject to copyright laws.

Numerous lawsuits are underway in the United States over the use of copyrighted materials by generative AI systems, including one launched this week by the world’s biggest record labels against two AI music generators.

The Canadian Media Producers Association said legal cases illustrate the problem posed by a lack of transparency, citing one case in which the AI company argued the rightsholder couldn’t proceed with the infringement allegation unless they could specify the exact work used for training.

"Rightsholders will also undoubtedly face similar evidentiary issues as many datasets used to train Generative AI systems are purportedly destroyed after the initial training is complete," it said.

The group said it’s an issue that "demands immediate attention" and asked the government to implement transparency requirements.

But AI companies maintain the kind of transparency rightsholders are asking for isn’t realistic.

Microsoft told the government training large-scale AI systems involves "vast volumes" of data, and companies shouldn’t have to keep records of that or disclose the content that is used for training.

"It would not be feasible to record such information and any such requirement would inhibit AI development," it said.

The company argued it is not "copyright infringement to analyze works and learn concepts and facts."

Google said AI training is already exempted under existing copyright law, though the government should adopt an exemption to make that explicit.

Google said requiring permission to use content for training purposes would expose competitively sensitive information and "would effectively block the development and use of large language models and other types of cutting-edge AI."

It also said AI developers don’t have access to accurate information about copyright status.

"In fact, there is no such source of truth anywhere in the world. Thus, complying with disclosure rules may simply prove impossible from the start."

Canadian AI company Cohere said using content for training AI systems works similarly to how an individual reads books to become more informed.

The company said the process doesn’t violate copyright, and argued that needs to be clear in the law. Otherwise, "Canada’s ambitions to be the home of world-leading AI companies and ecosystems" could be undermined.

The Council of Canadian Innovators, which represents the Canadian tech sector, said disclosure requirements would harm smaller companies as opposed to their Big Tech rivals. It warned this would "seriously hamper the potential of Canadian companies to scale significantly."

This report by The Canadian Press was first published June 30, 2024.