The Limited Times

Penguin Random House titles most commonly featured in AI software

2024-02-23T09:22:21.914Z

Highlights: Penguin Random House titles are the most commonly featured in AI software. More than 70,000 e-books were used without permission to supply text to AI language models. The Authors Guild, America's oldest and largest professional organization for writers, recently revised its publishing contract for authors; a new clause prohibits using the texts to train such software. According to AI and Copyright, whether AI companies will comply remains an open question, since they have made use of pirated content in the past.



As of: February 23, 2024, 10:13 a.m.

By: Sven Trautwein

Thousands of books served as training text for the language models behind tools such as ChatGPT.

Titles from Penguin Random House top the list.

Prominent authors, including Margaret Atwood, Stephen King and Sarah Silverman, have sued in recent months over the use of their texts to train the language models behind tools such as ChatGPT.

In the United States, around 8,000 authors joined them.

But which publishers does this affect?

A search of the Books3 data set, which serves as a basis for Meta's LLaMA and OpenAI's ChatGPT, shows that certain publishers top the rankings.

More than 70,000 e-books searched

More than 70,000 e-books were used without permission to feed texts to language models for artificial intelligence.

© Jonathan Raa/Imago

Peter Schoppert, managing director of NUS Press, has taken a closer look at the data sets.

With assistance, he focused on around 72,000 e-books, which were searched by author name, publisher name and ISBN.

According to the online magazine AI and Copyright, English-language e-books served as the basis.

According to Schoppert, the evaluation produced an interesting picture.

Penguin Random House and HarperCollins at Nos. 1 and 2

The publisher with the largest number of e-book titles in this filtered list is Penguin Publishing Group with 6,866 ISBNs, followed by HarperCollins with around 5,800 titles and Random House Publishing with around 3,400 ISBNs.

According to Schoppert, university publishers have not been spared either.

Columbia University Press appears on the list with 899 titles, ahead of Yale University Press with 554 and Princeton UP with 376 titles.

According to Schoppert, this disproves the assumption that the software was trained mainly on Wikipedia and Reddit entries and on millions of words of other text from the Internet.

More than 72,000 illegally copied e-books

The search turned up more than 72,000 illegally copied e-books that were used to train large language models (LLMs).

Copyright fell by the wayside here.

Recently, horror writer Stephen King addressed readers in an article in The Atlantic, saying that he had not given permission for his texts to be used.

The Authors Guild, America's oldest and largest professional organization for writers, recently revised its publishing contract for authors.

A new clause now prohibits using the texts to train AI software.

But whether AI companies will adhere to this remains an open question, according to AI and Copyright.

In the past, they had also made use of pirated content.

Recently, authors including Stephen King have achieved partial success.

A small database called “Prosecraft” has been taken offline.

Source: merkur
