Canadian media sues OpenAI for copyright infringement, but will they win?

Last week, five of Canada's most prominent news media companies launched legal proceedings against OpenAI for copyright infringement, seeking damages that could run into the billions. The lawsuit follows similar cases filed earlier this year against the creator of ChatGPT by The New York Times and other media companies in the United States.

At the heart of all these lawsuits is the claim that OpenAI "scraped" a large amount of content from media sites, that this involved copying without permission, and that the company profits from it without paying royalties to the original creators.

In June, the Center for Investigative Reporting, publisher of Reveal and Mother Jones, said it was suing ChatGPT maker OpenAI and its closest business partner, Microsoft, for unauthorized use of its content on AI platforms.
(AP Photo/Matt O’Brien)

OpenAI has yet to formally respond to the Canadian lawsuit, but it insists that using news material to train its chatbot is "fair dealing" under copyright law, not infringement.

Who is right? And why is OpenAI entering into licensing agreements with various media companies if it is so sure it is not breaking the law?

Is the Canadian case just a ploy to land a big licensing deal?

A closer look at how chatbots learn suggests that OpenAI may be right that "scraping" is not copying. But it may not be "fair dealing" either.

Breach of contract?

To be clear, the five media companies (Torstar, Postmedia, The Globe and Mail Inc., The Canadian Press and CBC/Radio-Canada) are also making two further claims.

They allege that OpenAI circumvented the security measures news sites use to block the tools that scrape their websites, and that in doing so it violated the sites' terms of use.

The news companies bringing the suit rely on tools to "prevent unauthorized data collection" from their websites. One example is the Robots Exclusion Protocol, which controls how software such as bots and web crawlers can access a site. These tools, along with paywalls and account restrictions, are designed to protect against unauthorized use of their content.
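
To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult a site's Robots Exclusion Protocol file before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the bot name `ExampleAIBot` and the `news.example` addresses are all hypothetical, invented for illustration, and are not taken from any publisher or from the lawsuit. Note that the protocol only states a site's wishes; honouring them is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (an assumption for
# illustration, not any real publisher's file).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer access queries."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
article = "https://news.example/politics/story.html"
paywalled = "https://news.example/subscribers/story.html"

# The hypothetical AI crawler is asked to stay away from the whole site.
print(parser.can_fetch("ExampleAIBot", article))   # False
# Any other agent may read public pages but not the subscriber area.
print(parser.can_fetch("SomeBrowser", article))    # True
print(parser.can_fetch("SomeBrowser", paywalled))  # False
```

A crawler that skipped this check, or ignored its answer, would still be technically able to request the page, which is why the plaintiffs also point to paywalls and terms of use.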

The plaintiffs claim that by reading their content online, site visitors implicitly accept the sites' terms of use, and that since 2015 those terms have made it clear that the news material is for "personal, non-commercial use by individual users only."

The news sites’ terms of use state that the content is intended for the personal and non-commercial use of individual users.
(Shutterstock)

The fair dealing exception

At the heart of all three claims in the Canadian lawsuit is the assertion that by using their material, that is, by scraping the content, OpenAI is copying their work and using it for profit without authorization.

But is scraping really copying? And if so, does it count as fair dealing?

Copyright law in Canada and the United States allows unauthorized copying or use of a protected work in some cases under the fair dealing or fair use exception. Courts consider a number of factors, including the purpose of the copying (commercial or educational), the amount copied and the effect on the original work.

Soon after The New York Times filed its lawsuit, OpenAI argued that training its chatbot on news stories found online does not involve illegal copying. It falls under fair use, the company said, pointing to various legal scholars and civil society groups that agree.

Legal scholars have argued that scraping data from news sites does involve creating a temporary copy, but only as a first step toward extracting "abstract metadata," or information about the relationships between words and sentences. Combining large amounts of metadata creates a new "artifact" that is "significantly unlike any particular work in training data."

As the authors state: “Generative AI models are generally not designed to copy training data; they are designed to explore data at an abstract level that is not protected by copyright.”

After all, statistical patterns or word frequencies are not copyrightable.
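
A toy sketch can illustrate what "statistical patterns" means here. The code below counts how often one word follows another across two made-up sentences. It is a deliberately simplified assumption for illustration: real chatbots are trained with far more elaborate methods, and nothing here describes OpenAI's actual process. The function names and sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(texts: list[str]) -> Counter:
    """Count adjacent word pairs across all texts."""
    counts: Counter = Counter()
    for text in texts:
        words = tokenize(text)
        counts.update(zip(words, words[1:]))
    return counts


# Made-up sentences standing in for scraped articles.
docs = [
    "The court will hear the case.",
    "The court will decide the case soon.",
]

stats = bigram_counts(docs)
print(stats[("the", "court")])  # 2
print(stats[("the", "case")])   # 2
print(stats[("will", "hear")])  # 1
```

The resulting table records how often word pairs occur, not the sentences themselves, which is the kind of distinction the scholars quoted above are drawing. Whether that distinction holds for large models trained on entire archives is precisely what the courts will have to weigh.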

The non-profit group Creative Commons agrees: OpenAI's use of news material to train a chatbot is similar, it says, to Google's digitization of millions of books to create a searchable database. Both are "transformative" uses of the original material. They result in a product that serves a different purpose without competing with or taking anything away from the original creators.

Licensing and billing

To hedge its bets, soon after The New York Times filed its lawsuit, OpenAI did two things. It said it would respect a news organization's choice to opt out of having its content used as training data. And it began making deals with news organizations to license their content for training purposes.

But the lawsuits remain, and judges in Canada and the U.S. will soon begin hearing them. They will have to decide: Is scraping a form of copying covered by copyright law, and if so, is it fair?

One factor will be the non-competitive nature of chatbots and their inability to access paywalled content from The Globe and Mail or the Toronto Star.

But another factor may relate to licensing. As other commentators have noted, a finding that OpenAI's use of news content to train its AI is fair could undermine the licensing market. The more deals that are made, the stronger this market becomes, and the greater the value at stake for media companies if the use is deemed fair.

This makes a settlement and licensing agreement in the Canadian case likely. But OpenAI may just roll the dice.

And if that happens, the future of AI could hang in the balance.