Nvidia Faces Lawsuit from Authors Over Alleged Copyright Infringement in AI Models

Tech giant Nvidia is facing a lawsuit from a group of authors who said it used their copyrighted works without their permission to train its artificial intelligence (AI) platform NeMo.

Source: Fox Business | Published on March 12, 2024

Brian Keene, Abdi Nazemian and Stewart O’Nan said their works were included in a dataset of 196,640 books that was used to train NeMo to simulate ordinary written language. The dataset was taken down in October “due to reported copyright infringement.”

The authors’ proposed class action lawsuit was filed Friday night in San Francisco federal court and claims that Nvidia “admitted” it trained NeMo on the dataset, thereby infringing their copyrights. The suit is similar to other lawsuits filed regarding AI copyright infringement.

The lawsuit seeks unspecified damages for people in the U.S. whose copyrighted works helped train NeMo’s large language models (LLMs) in the last three years. LLMs are used to power AI tools like NeMo, which Nvidia says is a fast and affordable way to adopt generative AI.

Among the works included in the lawsuit are Keene’s 2008 novel “Ghost Walk,” Nazemian’s 2019 novel “Like a Love Story,” and O’Nan’s 2007 novella “Last Night at the Lobster.”

The suit claims the books were included in a dataset known as “The Pile,” which contained a collection of books called “Books3,” and that Nvidia has admitted to training its NeMo Megatron AI models on The Pile and Books3.

The NeMo Megatron models were hosted on a website called Hugging Face, where a description of the models’ training data stated that they were trained on The Pile. The Pile’s Books3 dataset was listed on Hugging Face until October 2023, when the dataset was removed with a message that it “is defunct and no longer accessible due to reported copyright infringement.”

Nvidia declined to comment on the pending litigation.

The lawsuit drags Nvidia into a growing group of lawsuits against tech companies over the use of copyrighted content in training AI models, including several filed by writers as well as by the New York Times, which sued ChatGPT-maker OpenAI and Microsoft.