We talk about image generation, yet it all comes down to words. Anyone who has played with AI tools like Midjourney or DALL-E knows that the quality of the output depends directly on the richness, specificity and precision of the instructions given to the artificial intelligence. In short, it is all a matter of the "prompt", that is, how the request to the system is formulated - so much so that the craft of writing instructions has become increasingly articulated and complex (it even has a name, prompt engineering) and is turning into a skill increasingly in demand, almost an art form. A task so difficult that, in some cases, people prefer to entrust it to ChatGPT, the other wildly popular generative AI from the American tech company OpenAI.
The company led by Sam Altman has in fact unveiled DALL-E 3 in recent days, the new iteration of its generative AI which, at least according to its creators, can now produce more complex, detailed and realistic images from much simpler prompts. Of course, we will have to wait until October to test the new system first-hand, but in the meantime the first images released by the American company, accompanied by the prompts that generated them, are already striking and seem to open up new scenarios, for better or for worse. For if it is true that these tools can be put at the service of creativity, it is equally true that they lend themselves to the creation of increasingly realistic fake images: images that malicious actors and fraudsters could use for scams, social engineering or even to manipulate public opinion.
What we know
First of all, we know that DALL-E 3 will be natively integrated with ChatGPT and that from October it will be available through the paid ChatGPT Plus service; that it will be able to create images inspired by the user's conversation with the popular chatbot (as already happens with Microsoft's Bing Chat), which can also be used to help generate more detailed and precise prompts; and that it will finally be able to handle the insertion of text within images (such as labels and signs), overcoming a major limitation of the previous version and beating the competition to the punch.
What OpenAI has not disclosed - as was already the case with GPT-4 - are the details of the new version's training, such as the number of parameters used and the type and, above all, the origin of the data on which the AI was trained. Meanwhile, looking at the first images on the company's official blog, what immediately stands out is the remarkable precision (especially compared to DALL-E 2) with which the AI appears to render - caution is warranted until a hands-on test - previously problematic details such as hands, creating high-quality images from short and very simple prompts.
A tool within everyone's reach - perhaps too many people's
The enhancement of DALL-E 3, and the simultaneous simplification of the instructions needed to create high-quality images, drastically lower the barrier to entry for producing images for the most diverse purposes - images that OpenAI, moreover, makes available to those who generate them without restrictions on use. A great opportunity for creators of all kinds, for artists looking for inspiration, or for companies that need to generate product images for their communication and marketing campaigns. And then, again, for e-commerce site managers, who can try using DALL-E to refresh their product catalogs. It remains to be seen what the actual quality of the images generated by OpenAI's AI will be in practice. And, ultimately, what will become of the professionals and companies that produce and sell stock images.
And then, inevitably, there is the problem of potentially illicit, dangerous or otherwise unethical uses of this generative technology: beyond the fact that images generated with DALL-E 3 could once again be tainted by the biases inherent in the training dataset, they could also be used to spread disinformation, to pull off pranks that risk unleashing chaos in contexts where political and social tension is already high (such as the fake images of the arrest of former US President Donald Trump), or that simply risk compromising the image of public figures (such as the photos portraying Pope Francis in a fashionable down jacket).
In this regard, OpenAI seeks to allay concerns by revealing that it has worked with a "red team", a task force specialized in stress-testing the service in search of flaws and vulnerabilities, precisely to identify and mitigate problems such as harmful bias or the generation of propaganda and disinformation. Whether that will be enough, we will find out very soon.
*Journalist, innovation expert and curator of ANSA.it's Artificial Intelligence Observatory
All rights reserved © Copyright ANSA