The Limited Times


Revenge of the authors: Stephen King and other writers force book database to shut down

2024-01-09T15:27:58.220Z

Highlights: Thousands of books around the world are used to feed AI programs. Texts by famous authors, including Stephen King, Zadie Smith and Margaret Atwood, were used without permission as training material. Authors Sarah Silverman, Richard Kadrey and Christopher Golden object to companies using their books to train the software. A lawsuit filed in June alleges that the only websites that offer this much material are "shadow libraries" such as Library Genesis and Z-Library, through which books can be obtained in bulk via illegal file sharing.



Updated: 09.01.2024, 16:11

By: Sven Trautwein


Thousands of books around the world are used to feed AI programs. Famous authors, on the other hand, are up in arms.

Artificial intelligence (AI) does not spare authors either. Again and again, there are reports of artificially generated books with no real content flooding online shops. But where did the software companies that offer these AI-powered applications get their text? Works by famous authors, including Stephen King, Zadie Smith and Margaret Atwood, were used without permission as training material. Many authors object to this approach.

Books by Stephen King and thousands of other authors serve as the basis for AI software. The authors fight back. © Jens Kalaene/dpa

The biggest problem with artificial intelligence is that it is being developed in secret. To produce human-like answers to questions, systems like ChatGPT process huge amounts of text. The actual extent of the texts fed into these applications is known only to a few employees of Meta or OpenAI.


Some texts come from Wikipedia, others from the publicly accessible Project Gutenberg, where around 70,000 books can be read and downloaded free of charge. According to The Atlantic magazine, a lawsuit was filed against Meta in California in July. Authors Sarah Silverman ("The Bedwetter"), Richard Kadrey ("Hell's Throne") and Christopher Golden ("Road of Bones") object to companies using their books to train the software. Until now, however, it had not been possible to determine exactly whether passages from their texts had actually been used.

Database of more than 170,000 books feeds AI

But an analysis showed that the software really did use text passages from the authors. Further investigation showed that about one third of the texts in "Books3" are fiction and two thirds are non-fiction. The "Books3" database contains more than 170,000 titles, according to the Guardian. Titles by Stephen King, Jonathan Franzen ("The Corrections") and Haruki Murakami ("Honey Cake") can also be found in the database.


OpenAI, the company behind AI chatbot ChatGPT, has also been accused of training its model on copyrighted works, according to the Guardian. Clues to the sources of OpenAI's training data can be found in a document published by the company in 2020 that mentions two "internet-based book corpora," one of which is called "Books2" and is estimated to contain nearly 300,000 titles. A lawsuit filed in June alleges that the only websites that offer this much material are "shadow libraries" such as Library Genesis (LibGen) and Z-Library, through which books can be obtained in bulk via illegal file sharing.

While a spokesperson for Meta declined to comment to The Atlantic on the company's use of "Books3", a spokesperson for Bloomberg confirmed that the company did indeed use the dataset. "We will not include the Books3 dataset among the data sources used to train future versions of BloombergGPT," they added. For author Anthony McCarten ("Going Zero"), it is clear that we are running to our doom with AI.

Books by Stephen King at Prosecraft

The "Prosecraft" database also held over 25,000 books, including twenty by horror master Stephen King. In 2018, the founder of Prosecraft wrote that they were building a large literature database. Where the data came from and whether usage rights existed was not disclosed, according to Yahoo. More than 8,000 authors recently objected to the use of their works. At the beginning of August, "Prosecraft" was shut down. But that may not be enough for the authors.

Source: merkur
