As of: March 10, 2024, 7:11 a.m.
By: Anika Zuschke
More and more media are blocking AI crawlers - but one media sector is left out.
© Anika Zuschke/DALL-E (AI generated)
Chatbots like ChatGPT use media content to generate answers.
Right-wing media in particular allow this - with serious consequences.
Frankfurt – More than one in two Germans uses voice assistants, one in six German companies plans to use AI applications to generate text, and half of German students have already used ChatGPT.
These figures date from 2023, so the shares have very likely grown since then.
One thing is clear: artificial intelligence is very popular in everyday use.
However, media companies are now increasingly protecting themselves from being used as a source of training data for AI chatbots.
This is especially true for news sites that can be classified as liberal – their right-wing counterparts, however, still allow AI crawlers in most cases.
What impact can this have on the results of commonly used AI chatbots?
For copyright reasons: Media companies prohibit AI crawlers from training with their data
Large language models such as OpenAI's ChatGPT or Google's Gemini are trained using an almost immeasurable amount of data that so-called crawlers extract from websites.
However, websites can choose to block crawlers so that they no longer have access to their data.
Since August 2023, when OpenAI published instructions on how to block its web crawler and Google followed shortly afterwards for its AI model Bard (now "Gemini"), many media companies around the world have taken this step.
The reason for this is simple: News publishers argue that AI chatbots violate copyright when they use articles for their training without permission or financial compensation and may reproduce those articles in their output.
The American trade group News Media Alliance, which represents the New York Times and more than 2,200 other publishers, underscored the problem with a study.
According to that study, AI developers disproportionately use news content, compared to generic online content, to train their chatbots.
For this reason, the New York Times has already filed a lawsuit against OpenAI and Microsoft.
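In practice, the blocking described above is usually declared in a site's robots.txt file, which names the crawlers that should stay away. The following is a minimal sketch of how such a file can be checked, not a description of how any publisher or study actually does it. The user-agent tokens GPTBot (OpenAI), Google-Extended (Google) and CCBot (Common Crawl) are the ones published by the crawler operators and are not named in this article; the robots.txt content and the domain news.example are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens published by the crawler operators (an assumption
# added for illustration; the article itself does not name them).
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot"]

# Hypothetical robots.txt of an imaginary news site. A real check would
# download the file from the site instead of using a hard-coded string.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocked_crawlers(robots_txt: str, url: str = "https://news.example/") -> dict[str, bool]:
    """Return, for each AI crawler, whether robots_txt forbids it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


if __name__ == "__main__":
    for agent, blocked in blocked_crawlers(EXAMPLE_ROBOTS_TXT).items():
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

In this hypothetical file, the crawlers of OpenAI and Google are shut out, while Common Crawl falls under the general rule and remains allowed. Note that robots.txt is only a request: it works as long as the crawler operators honor it.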
New York Times and The Guardian block AI: How does the media influence AI chatbots?
Many media companies have drawn conclusions from this development: As a study by the AI startup “Originality AI” showed earlier this year, over 88 percent of the 44 leading news sites in the USA block web crawlers from AI companies.
These include the New York Times, the Washington Post and the Guardian.
However, one particular sector of the media world is conspicuously absent from this list: the more right-wing media.
According to the computer magazine Wired, none of the nine leading right-wing news sites surveyed in the US, which include Fox News, the Daily Caller and Breitbart, were blocking AI web crawlers at the time of the survey.
Coincidence?
Or do right-wing media consciously want to influence the AI chatbots with their content?
Could AI models trained primarily on content from right-wing media platforms even become a biased or one-sided source of information?
“Yes,” answers Dr. Oliver Eberle, scientist at the Berlin Institute for the Foundations of Learning and Data (BIFOLD) at the TU Berlin, in response to an inquiry from IPPEN.MEDIA.
“What AI models learn is directly related to what data is available as training material.”
AI and media: A question of political orientation?
According to Eberle, there are currently no guarantees that AI tools provide politically neutral answers.
But: “The political character of an AI model can be evaluated using special data sets.
At the same time, it is known that the answers of AI bots, for example, can be greatly influenced by specifically adjusting the prompts used,” explains Dr. Eberle.
Users cannot change the training data, but they can steer a chatbot's answers through the precise wording of their instructions (prompts) to the AI.
Additionally, Jeremy Baum, an AI ethics researcher at UCLA, tells Wired he doubts that right-wing sites that don't block AI crawlers would even have a measurable effect on the results of finished AI systems like chatbots.
One argument against such an effect is the sheer volume of older material that AI companies had already collected from mainstream news outlets before those outlets began blocking AI crawlers.
Which media do AI crawlers block in Germany?
In Germany, according to a study by the Reuters Institute, around 60 percent of the 15 most used news sites had blocked the AI crawlers from OpenAI and Google by the end of 2023.
Data journalist Ben Welsh also keeps a constantly updated list of news websites that block AI crawlers from OpenAI, Google and the so-called Common Crawl.
Of the German news sites, Bild, Spiegel, Stern and Die Zeit are currently blocking all three AI crawlers; the Süddeutsche Zeitung is blocking the crawlers of OpenAI and Google, while Deutsche Welle and MDR Sachsen-Anhalt allow all three crawlers on their sites.
In Germany, blocking AI crawlers seems to correlate not with political orientation but with how media companies are financed: the public media examined do not stop the crawlers, whereas privately financed newspapers and magazines do.