Generative Artificial Intelligences and Picture Synthesis

(A Tribute to a Universal Artist?)

Jean-François COLONNA
CMAP (Centre de Mathématiques APpliquées) UMR CNRS 7641, École polytechnique, Institut Polytechnique de Paris, CNRS, France

(CMAP28 WWW site: this page was created on 01/06/2024 and last updated on 05/31/2024)



1 - Introduction:

In a matter of months, Generative Artificial Intelligences (GAIs) have infiltrated our daily lives. I have conducted numerous experiments, particularly with ChatGPT and Bard/Gemini. These experiments revealed that, generally, using them as reliable sources of information (in Mathematics, for example) was not always very prudent, while letting them "run free" could unleash boundless imagination upon us.

However, some of these GAIs are not confined to text production. They can also rapidly [01] generate high-quality pictures. As we will see later, this objectively demonstrates their creativity.

2 - The Generative Artificial Intelligences:

To generate pictures like those presented in this document, a GAI must first be trained on "real" data, in particular the {picture, description} pairs available in large quantities on the internet [02]. Specialized formal neural networks are then used, on the one hand, to transform pictures in "raster" mode [03] into a more concise [04] representation that is closer to their semantic content. On the other hand, a similar process is applied to the descriptions, which are texts written in natural languages. The result of this processing for each {picture, description} pair is a set of numbers (a "point") stored in a massive multidimensional space known as the Semantic Space (S). The processing is such that two neighboring points in S correspond to semantically close notions.
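The idea that neighboring points in S correspond to semantically close notions can be illustrated with a toy sketch. Everything in it is an assumption made for the illustration: the three notions, their 3-dimensional coordinates and the helper names are invented, whereas a real GAI learns coordinates in a space with far more dimensions.

```python
import math

# Hypothetical 3-dimensional "semantic" coordinates, invented for illustration.
# A real GAI would learn such vectors from its training data.
SEMANTIC_SPACE = {
    "library": (0.9, 0.8, 0.1),
    "bookshop": (0.8, 0.9, 0.2),
    "volcano": (0.1, 0.0, 0.9),
}


def distance(a, b):
    """Euclidean distance between two points of the toy space S."""
    return math.dist(a, b)


def nearest(name):
    """Return the other notion whose point is closest to `name` in S."""
    origin = SEMANTIC_SPACE[name]
    others = (n for n in SEMANTIC_SPACE if n != name)
    return min(others, key=lambda n: distance(origin, SEMANTIC_SPACE[n]))


print(nearest("library"))  # prints "bookshop"
```

With these invented coordinates, "library" lies much closer to "bookshop" than to "volcano", which is the property the training is supposed to produce.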

Thus, learning is, in a sense, a form of semantic compression. The exploitation of space S to generate new pictures (or texts...) can be considered naively as semantic decompression. The user-provided prompt [05] is mapped to a position in S, and one of the closest points P defines a picture that only needs to be decompressed. It seems that a random selection is made when multiple neighbors satisfy the prompt. This likely explains why submitting the same prompt twice will yield two different but semantically close pictures.
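This naive picture (locate the prompt in S, then pick one of its closest neighbors at random) can be sketched as follows. It is only a sketch of the naive model described above, not the actual algorithm of any GAI; the stored points, the position of the prompt and the function names are all hypothetical.

```python
import math
import random

# Hypothetical stored points of S: (label, coordinates). Values are invented.
POINTS = [
    ("gothic library", (0.9, 0.8)),
    ("baroque library", (0.8, 0.9)),
    ("endless staircase", (0.7, 0.7)),
    ("tropical beach", (0.1, 0.2)),
]


def k_nearest(prompt_point, points, k):
    """Return the k stored points closest to the prompt's position in S."""
    return sorted(points, key=lambda p: math.dist(prompt_point, p[1]))[:k]


def pick_picture(prompt_point, points, k, rng):
    """Choose one of the k nearest points at random (it would then be 'decompressed')."""
    return rng.choice(k_nearest(prompt_point, points, k))


prompt_point = (0.85, 0.85)  # assumed position of the prompt in S
for seed in range(6):
    label, _ = pick_picture(prompt_point, POINTS, k=3, rng=random.Random(seed))
    print(seed, label)
```

Changing the seed of the random generator may select a different neighbor, but never the distant "tropical beach": the same prompt gives different yet semantically close results.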

However, as always, the devil is in the details, and reality is certainly much more complex. Indeed, as the examples to be presented later will show, in a prompt, it is generally not a single semantic concept that is specified, but multiple ones. Procedures such as "mixing," interpolation, combination, etc., must therefore be implemented.
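One simple way to picture such "mixing" is a weighted average of the points associated with each concept. The sketch below is only an assumption about what mixing could look like in the simplest case: the two concept vectors and the functions `interpolate` and `mix` are invented for the illustration, and real systems certainly combine concepts in more elaborate ways.

```python
# Hypothetical concept vectors, invented for illustration only.
LIBRARY_OF_BABEL = (0.9, 0.1, 0.4)
STYLE_OF_X = (0.2, 0.8, 0.6)


def interpolate(a, b, t):
    """Linear interpolation between two points: t=0 gives a, t=1 gives b."""
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))


def mix(vectors, weights):
    """Weighted average of several concept vectors."""
    total = sum(weights)
    return tuple(
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(len(vectors[0]))
    )


# A point halfway between the two concepts of the prompt.
print(interpolate(LIBRARY_OF_BABEL, STYLE_OF_X, 0.5))
# The same idea with unequal weights: the subject counts twice as much as the style.
print(mix([LIBRARY_OF_BABEL, STYLE_OF_X], [2, 1]))
```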

The experiments reported below showed that two GAIs in fact had to be used: the first one is actually Generative, while the second one is Antagonistic and is intended, on the one hand, to evaluate the quality of the productions of the first one and, on the other hand, to filter the content so as to avoid "inappropriate" pictures [06].

3 - Some examples of the generation of pictures (1450 on Friday May 31 2024):

The GAI accessible on the website '' was used to generate these pictures.

So, 1450 images generated by this GAI will be presented below. In fact, more were computed, but not all are exhibited: some were rejected either out of personal preference or because they were too similar to others already obtained. This number (1450) may seem excessive, making it impossible to view all of these pictures, but this is deliberate and intended to illustrate the incredible "imaginative" power of this GAI...

Nota: French was used for all submitted prompts, and an English translation is provided below.

3.1 - Some examples of the generation of pictures using the prompt "La bibliothèque de Babel à la façon de X" ["The Library of Babel in the style of X"]:

With virtually infinite possibilities, I decided to limit the tests by using only one prompt chosen in such a way that it references concepts with a very low probability of being encountered together on the Internet:

"La bibliothèque de Babel à la façon de X" ["The Library of Babel in the style of X"] [07]

Where X is chosen from an arbitrary list of artists (writers, musicians, painters, sculptors,...), engineers, places,... In most cases, the same prompt was iterated multiple times, resulting in a series of pictures on a given theme (defined by X), all different (illustrating the use of randomness mentioned above, randomness that further explains the a priori impossibility of obtaining each of them again) but referencing the same concepts. Here are 1450 pictures thus obtained:

The pictures obtained in this way are undeniably breathtaking, incredible,... accurately addressing the queries. Indeed, they depict libraries full of books, but also convey the sense of infinity one experiences when reading Jorge Luis Borges's short story, all within an appropriate temporal context.

3.2 - Some examples of the generation of pictures using the prompt "Une image à la façon de X" ["A Picture in the style of X"]:

Let's simplify the prompt by using only:

"Une image à la façon de X" ["A Picture in the style of X"]

thereby giving more freedom to the GAI. Here are the pictures thus obtained:

3.3 - Some "free" examples of the generation of pictures:

And now let's use some "free" prompts...

4 - Best Of:

5 - Some Comments, Remarks and Questions:

These images unequivocally demonstrate that this GAI is capable of transforming a few words (the prompt) into coherent and relevant images of remarkable complexity. Regarding those inspired by known artists, some have argued that they are merely mediocre copies that could fool no one. This might be true, but the achievement does not lie there. It resides in the digital formalization of concepts gleaned from hundreds of millions of documents on the internet. While it is indeed evident that, upon closer inspection, an observant eye cannot be deceived and will immediately recognize that a given image is not an unknown canvas by Rembrandt, one can't help but wonder whether it fits within his style and could be mistaken for it. If I chose to direct my prompts towards art and painting in particular, it was to narrow my experiments, not to play the role of a forger. Thus, what is truly astounding is the performance of the designers of this GAI, and that cannot be contested, unlike the artistic value of these images...

Once the amazement, and dare I say, wonder, has subsided, a number of questions arise:

One will nevertheless note a small number of anomalies (though some are perhaps "deliberate"...), for example:

Finally, one will note an astonishing, unexpected convergence: the Library of Babel is practically infinite, and it is therefore impossible to explore even a small part of it. Is it not the same with this GAI that seems to contain a quasi-infinite number of pictures, of which we can never see more than a minuscule fraction?

Is this GAI the Library of Babel?

6 - About Creativity and Consciousness:

Once again, it seems challenging to dispute the quality and originality of these pictures generated by this GAI (and others). I have no hesitation in asserting that it exhibits creativity! While this statement may be surprising to some, let's reflect on our own creative acts. How do we generate new ideas? Certainly not out of nothing, and I see two possible origins. Firstly, interaction with our environment, particularly through vision where pictures are concerned. Secondly, I am convinced that at the subconscious level, there is a continuous "mixing" of previous ideas stored in our brain, which should be viewed as a dynamic semantic space. These new tools inevitably lead us to question whether our brain is nothing more than a "mere" machine.

With these undeniable successes, do Artificial Intelligences not demonstrate intelligence in its broad sense? And if so, could they become conscious? If so, would we be aware of it? It seems that the emergence of consciousness is linked to complexity (especially in connections), but also to "external" stimulation, ensured in us (and in "higher" animals) by our five senses, and this may be what our Artificial Intelligences lack to reach this higher level of evolution.

Finally, can't these studies on Artificial Intelligences enlighten us about our own memory [10] and the production of our dreams during which, as in the pictures presented above, known or fictional characters appear in real or imaginary settings?

Do these pictures reveal to us the dreams of our GAIs?

7 - Conclusion:

Undoubtedly, in the span of a few months, a threshold has been crossed. The victory of AlphaGo over Lee Sedol in the Google DeepMind Challenge Match in March 2016 already opened a breach, and today, the successes of GAIs demonstrate the enormous potential of this research. What would Alan Turing have thought about it?

However, this emergence is naturally accompanied by sometimes justified fears:

But let's imagine in our living rooms wall screens exhibiting masterpieces of world painting from yesterday, today and tomorrow, that have never existed and are constantly renewed by a GAI...

So, what surprise awaits us tomorrow?

Any sufficiently advanced technology is indistinguishable from magic

Arthur Charles Clarke (1962).

  • [01] - About twenty seconds for the given examples.

  • [02] - It generally involves processing several hundred million {picture, description} pairs, requiring the use of high-performance computing and storage servers. Particularly for formal neural networks, highly parallel NVIDIA processors are used.

  • [03] - A picture in "raster" mode can be defined by three arrays of numerical values (with horizontal and vertical dimensions matching the picture), each corresponding to the luminance of a primary color: Red, Green, and Blue.

  • [04] - It is, in a way, a form of semantic compression.

  • [05] - The prompt corresponds to the natural language query (English, for example) addressed to the GAI to describe what one wishes to obtain (a picture in this case).

  • [06] - This was seen on several occasions with Sandro Botticelli, certainly because naked bodies had been generated.

  • [07] - Jorge Luis Borges is an Argentine man of letters. In 1941, in a fascinating short story, he takes us into the universe of The Library. The narrator, one of its countless servants, reveals what it could be: made up of shelves, corridors, and endless stairs, it would actually contain all possible books printed in a single format: 410 pages, each with 40 lines of 80 characters chosen from 25 possibilities. Although finite (on the order of 10^1834097), the number of works surpasses comprehension, but very few, of course, contain a completely intelligible text in a certain language (and yet, they exist somewhere, but where?). And the only treasure the narrator has ever discovered in his tedious travels is a single readable yet incomprehensible sentence: Ô time your pyramids.

  • [08] - Hans Ruedi Giger is the designer of the monster and sets for the film Alien, directed in 1979 by Ridley Scott.

  • [09] - The style of certain artists of past decades is easy to formalize, as I have shown myself. This is the case with Jean Arp, Jean-Michel Atlan, Robert and Sonia Delaunay, and Victor Vasarely. But until recently, Flemish artists seemed "inaccessible" and "untouchable" to me! This is no longer the case (see for example Jérôme Bosch and Pieter Bruegel the Elder)...

  • [10] - For example, do we really know how faces are stored in our brain?
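The order of magnitude of the number of books quoted in note [07] can be checked with a short computation. The sketch below assumes only the format given in that note (410 pages, 40 lines per page, 80 characters per line, 25 possible characters); the variable names are chosen for the illustration. Since the number itself is far too large to print, the code works with its base-10 logarithm.

```python
import math

PAGES = 410
LINES_PER_PAGE = 40
CHARS_PER_LINE = 80
ALPHABET_SIZE = 25

# Each book is one sequence of this many characters.
chars_per_book = PAGES * LINES_PER_PAGE * CHARS_PER_LINE

# Number of books = ALPHABET_SIZE ** chars_per_book; take its base-10 logarithm.
log10_books = chars_per_book * math.log10(ALPHABET_SIZE)
exponent = math.floor(log10_books)

print(chars_per_book)  # prints 1312000
print(exponent)        # prints 1834097
```

The number of distinct books is therefore 25^1312000, a number of about 1.8 million digits.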

