Generative Artificial Intelligences and Picture Synthesis
(A Tribute to a Universal Artist?)
CMAP (Centre de Mathématiques APpliquées) UMR CNRS 7641, École polytechnique, Institut Polytechnique de Paris, CNRS, France
(CMAP28 WWW site: this page was created on 01/06/2024 and last updated on 04/19/2024 11:18:29 -CEST-)
1 - Introduction:
In a matter of months, Generative Artificial Intelligences (GAIs) have infiltrated our daily lives.
I have conducted numerous experiments, particularly with ChatGPT and Bard/Gemini.
These experiments revealed that, generally, using them as reliable sources of information
(in Mathematics, for example) was not always very prudent, while letting them "run free" could unleash boundless imagination upon us.
However, some of these GAIs are not confined to text production; they can also rapidly [01] generate high-quality pictures.
As we will see later, this objectively demonstrates their creativity.
2 - The Generative Artificial Intelligences:
Before a GAI can generate pictures like those presented in this document, it must be trained on "real" data, in particular {picture, description} pairs
available in large quantities on the internet [02].
Specialized artificial neural networks are then used to transform, on the one hand, pictures in
"raster" mode [03] into a more concise [04] representation closer to their semantic content.
On the other hand, a similar process is applied to the descriptions, which are texts written in natural language.
The result of this processing for each {picture, description} pair is a set of numbers (a "point")
stored in a massive multidimensional space known as the Semantic Space (S).
The processing is such that two neighboring points in S correspond to semantically close notions.
Thus, learning is, in a sense, a form of semantic compression.
Using the space S to generate new pictures (or texts...) can then be viewed, naively, as semantic decompression.
The user-provided prompt [05] is itself mapped to a point in S,
and one of the closest points P defines a picture that just needs to be decompressed.
It seems that a random selection is made when several neighbors satisfy the prompt.
This likely explains why submitting the same prompt twice yields two different but semantically close pictures.
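The naive picture described above (a prompt placed in S, then a random choice among its closest neighbors) can be sketched in a few lines of Python. This is only an illustration of that conjecture, not a description of how any actual GAI works: the three-dimensional space, the point names and all coordinates are invented for the example.

```python
import math
import random

# Hypothetical 3-dimensional "semantic space" S: each point stands for a stored picture.
# Real systems use far more dimensions; these names and coordinates are invented.
SPACE = {
    "library_gothic": (0.9, 0.1, 0.2),
    "library_baroque": (0.8, 0.2, 0.3),
    "library_modern": (0.85, 0.15, 0.1),
    "locomotive": (0.1, 0.9, 0.4),
    "rocket": (0.0, 0.8, 0.9),
}

def nearest_neighbors(prompt_point, k):
    """Return the names of the k points of S closest to the prompt (Euclidean distance)."""
    ranked = sorted(SPACE, key=lambda name: math.dist(SPACE[name], prompt_point))
    return ranked[:k]

def pick_picture(prompt_point, k=3, rng=None):
    """Pick one of the k nearest points at random, as conjectured in the text."""
    rng = rng or random.Random()
    return rng.choice(nearest_neighbors(prompt_point, k))

prompt = (0.88, 0.12, 0.2)  # hypothetical position of the prompt in S
print(nearest_neighbors(prompt, 3))
print(pick_picture(prompt))  # may differ from one run to the next
```

Calling pick_picture twice with the same prompt can return two different but neighboring points, which mirrors the behavior observed when the same prompt is submitted twice.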
However, as always, the devil is in the details, and reality is certainly much more complex.
Indeed, as the examples to be presented later will show, in a prompt,
it is generally not a single semantic concept that is specified, but multiple ones.
Procedures such as "mixing," interpolation, combination, etc., must therefore be implemented.
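As a rough illustration of what "mixing" or interpolation between concepts could mean in S, the sketch below combines two points by linear interpolation and by weighted average. The coordinates are hypothetical, and nothing here is claimed to be the procedure actually used by the GAI.

```python
def interpolate(a, b, t):
    """Linear interpolation between points a and b of S (t=0 gives a, t=1 gives b)."""
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))

def combine(points, weights):
    """Weighted average of several points of S; weights are normalized by their sum."""
    total = sum(weights)
    dims = len(points[0])
    return tuple(
        sum(w * p[i] for p, w in zip(points, weights)) / total
        for i in range(dims)
    )

# Hypothetical coordinates for two concepts named in a single prompt.
library_of_babel = (1.0, 0.0, 0.0)
botticelli_style = (0.0, 1.0, 0.0)

print(interpolate(library_of_babel, botticelli_style, 0.5))   # (0.5, 0.5, 0.0)
print(combine([library_of_babel, botticelli_style], [3, 1]))  # (0.75, 0.25, 0.0)
```

The weights make it possible to pull the result closer to one concept than to the other, which is one simple way to picture how a prompt naming several concepts might be resolved into a single point.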
The experiments reported below suggest that two AIs are in fact at work: the first one Generative proper, and the second one Antagonistic, intended
on the one hand to evaluate the quality of the productions of the first and on the other hand to filter the content so as to avoid
"inappropriate" pictures [06].
3 - Some examples of the generation of pictures (1450 on Friday April 19 2024):
The GAI accessible on the website 'www.bing.com/images/create' was used to generate these pictures.
The 1450 pictures presented below were therefore all generated by this GAI.
In fact, more were computed, but not all are exhibited: some were rejected either out of personal preference or because they were too similar to others already obtained.
This number (1450) may seem excessive, since it makes it impossible to view all of these pictures,
but this is deliberate and intended to illustrate the incredible "imaginative" power of this GAI...
Note: all prompts were submitted in French; an English translation is provided below.
3.1 - Some examples of the generation of pictures using the prompt "La bibliothèque de Babel à la façon de X" ["The Library of Babel in the style of X"]:
With virtually infinite possibilities, I decided to limit the tests by using only one prompt
chosen in such a way that it references concepts with a very low probability of being encountered together on the Internet:
"La bibliothèque de Babel à la façon de X" ["The Library of Babel in the style of X"] [07]
Where X is chosen from an arbitrary list of
artists (writers, musicians, painters, sculptors,...),
engineers,
places,...
In most cases, the same prompt was iterated multiple times, resulting in a series of pictures on a given theme
(defined by X), all different (illustrating the use of randomness mentioned above, randomness that further explains the a priori
impossibility of obtaining each of them again) but referencing the same concepts.
Here are 1450 pictures thus obtained:
- The Library of Babel.
- 01: X = Arc de Triomphe, Paris (1806).
- 02: X = Arcimboldo, Giuseppe (1527-1593).
- 03: X = Asimov, Isaac (1920-1992).
- 04: X = Baudelaire, Charles (1821-1867).
- 05: X = Bosch, Hieronymus (~1450-1516).
- 06: X = Botticelli, Sandro (1445-1510).
- 07: X = Bruegel, Pieter, the Elder (~1525-1569).
- 08: X = Clarke, Arthur Charles (1917-2008).
- 09: X = Corot, Jean-Baptiste Camille (1796-1875).
- 10: X = Dali, Salvador (1904-1989).
- 11: X = Degas, Edgar (1834-1917).
- 12: X = God.
- 13: X = Dürer, Albrecht (1471-1528).
- 14: X = Eiffel Tower (1889).
- 15: X = Escher, Maurits Cornelis (1898-1972).
- 16: X = della Francesca, Piero (~1412-1492).
- 17: X = Fractal Geometry (~1960).
- 18: X = Giger, Hans Ruedi (1940-2014) [08].
- 19: X = Golden Gate Bridge (1933).
- 20: X = Herbert, Frank (1920-1986).
- 21: X = Kandinsky, Vassily (1866-1944).
- 22: X = The Lascaux Cave (~-21000).
- 23: X = Mandelbrot, Benoît (1924-2010).
- 24: X = Michelangelo (1475-1564).
- 25: X = Molière (1622-1673).
- 26: X = Mondrian, Piet (1872-1944).
- 27: X = Monet, Claude (1840-1926).
- 28: X = Newton, Isaac (1643-1727).
- 29: X = Notre-Dame de Paris (1163).
- 30: X = Piranesi (1720-1778).
- 31: X = Praxiteles (~-395-~-326).
- 32: X = Ptolemy, Claudius (~100-~168).
- 33: X = The Pyramids of Egypt (~-2600).
- 34: X = Pythagoras (~-580-~-495).
- 35: X = Rembrandt (~1606-1669).
- 36: X = Rodin, Auguste (1840-1917).
- 37: X = de Ronsard, Pierre (1524-1585).
- 38: X = Tanguy, Yves (1900-1955).
- 39: X = van Eyck, Jan (~1390-1441).
- 40: X = van Gogh, Vincent (1853-1890).
- 41: X = Vermeer, Johannes (1632-1675).
- 42: X = Wagner, Richard (1813-1883).
The pictures obtained in this way are undeniably breathtaking, incredible,... accurately addressing the queries.
Indeed, they depict libraries full of books, but also convey the sense of infinity one experiences when reading
Jorge Luis Borges's short story, all within an appropriate temporal context.
3.2 - Some examples of the generation of pictures using the prompt "Une image à la façon de X" ["A Picture in the style of X"]:
Let's simplify the prompt by using only:
"Une image à la façon de X" ["A Picture in the style of X"]
thereby giving more freedom to the GAI. Here are the pictures thus obtained:
- 01: X = Arcimboldo, Giuseppe (1527-1593).
- 02: X = Asimov, Isaac (1920-1992).
- 03: X = Baudelaire, Charles (1821-1867).
- 04: X = Baxter, Stephen (1957).
- 05: X = Bosch, Hieronymus (~1450-1516).
- 06: X = Botticelli, Sandro (1445-1510).
- 07: X = Bruegel, Pieter, the Elder (~1525-1569).
- 08: X = Canaletto (1697-1768).
- 09: X = Clarke, Arthur Charles (1917-2008).
- 10: X = de Chirico, Giorgio (1888-1978).
- 11: X = Corot, Jean-Baptiste Camille (1796-1875).
- 12: X = Dali, Salvador (1904-1989).
- 13: X = Degas, Edgar (1834-1917).
- 14: X = Delvaux, Paul (1897-1994).
- 15: X = God.
- 16: X = Dürer, Albrecht (1471-1528).
- 17: X = Ernst, Max (1891-1976).
- 18: X = Escher, Maurits Cornelis (1898-1972).
- 19: X = della Francesca, Piero (~1412-1492).
- 20: X = Giger, Hans Ruedi (1940-2014) [08].
- 21: X = Herbert, Frank (1920-1986).
- 22: X = Infinity.
- 23: X = Kandinsky, Vassily (1866-1944).
- 24: X = Mandelbrot, Benoît (1924-2010).
- 25: X = Mondrian, Piet (1872-1944).
- 26: X = Monet, Claude (1840-1926).
- 27: X = Piranesi (1720-1778).
- 28: X = Praxiteles (~-395-~-326).
- 29: X = Rembrandt (~1606-1669).
- 30: X = Rodin, Auguste (1840-1917).
- 31: X = de Ronsard, Pierre (1524-1585).
- 32: X = Tanguy, Yves (1900-1955).
- 33: X = Turing, Alan (1912-1954).
- 34: X = van Eyck, Jan (~1390-1441).
- 35: X = van Gogh, Vincent (1853-1890).
- 36: X = Vermeer, Johannes (1632-1675).
- 37: X = da Vinci, Leonardo (1452-1519).
- 38: X = Wagner, Richard (1813-1883).
- 39: X = A Great Painter.
- 40: X = A Bad Painter.
3.3 - Some "free" examples of the generation of pictures:
And now let's use some "free" prompts...
4 - Best Of:
- The Library of Babel in the style of Jean-Baptiste Camille Corot.
- The Library of Babel in the style of Edgar Degas.
- The Library of Babel in the style of the Eiffel Tower.
- The Library of Babel in the style of Hans Ruedi Giger.
- The Library of Babel in the style of the Lascaux Cave.
- The Library of Babel in the style of Benoît Mandelbrot.
- The Library of Babel in the style of Claude Monet.
- The Library of Babel in the style of Notre-Dame de Paris.
- The Library of Babel in the style of Praxiteles.
- The Library of Babel in the style of Auguste Rodin.
- Some Pictures in the style of Hieronymus Bosch.
- A Picture in the style of Stephen Baxter.
- Some Pictures in the style of Pieter Bruegel the Elder.
- A Picture in the style of Jean-Baptiste Camille Corot.
- A Picture in the style of Salvador Dali.
- A Picture in the style of Edgar Degas.
- Some Pictures in the style of Max Ernst.
- Some Pictures in the style of Piero della Francesca.
- Some Pictures in the style of Hans Ruedi Giger.
- A Picture in the style of Frank Herbert.
- A Picture in the style of Praxiteles.
- Some Pictures in the style of Rembrandt.
- A Picture in the style of Auguste Rodin.
- Some Pictures in the style of Yves Tanguy.
- Some Pictures in the style of Jan van Eyck.
- A Picture in the style of Vincent van Gogh.
- Some Pictures in the style of Johannes Vermeer.
- A Picture in the style of Richard Wagner.
- A Locomotive in the style of Sandro Botticelli.
- Some Airplanes in the style of Sandro Botticelli.
- A Rocket in the style of Sandro Botticelli.
- The Return of the Hunters in the Snow.
5 - Some Comments, Remarks and Questions:
Once the amazement, and dare I say, wonder, has subsided, a number of questions arise:
- How can we transition from the pictures utilized during the learning phase (which are, therefore, two-dimensional)
to concepts that are evidently three-dimensional (as seen through
perspective,
light sources,
shadows,
reflections,
interactions between potential characters, etc.)?
- How are subtle notions such as "style" conceptualized and translated into the AI's understanding [09]?
- How can multiple concepts (such as the "Library of Babel" by Jorge Luis Borges and the style of Sandro Botticelli)
be brought together in such a seamless, coherent and harmonious manner?
- Recall that 1450 pictures are presented here, but that at least twice as many were generated;
those that were rejected were set aside either out of personal taste or because they were too similar to others already archived.
Despite this large number of requests, not a single picture failed to match the submitted prompt.
How is this possible?
- Do the creators of this GAI truly understand it? Do they know how it works in intricate detail? Are they surprised,
astonished,... by the results obtained?
- Could the GAI explain the process from the prompt to the pictures? A "debug" mode would be highly appreciated.
- Is it possible to navigate interactively within the semantic space S?
One will nevertheless note a small number of anomalies (though some are perhaps "deliberate"...), for example:
- Distorted bodies and faces, especially when they are small...
- The female character has three arms and a single wing.
- The saluting officer appears to have three arms.
- The reflections on the surface of the pitcher do not correspond to the scene; in particular, the wine glass is missing.
- The reflections on the surface of the sleigh bell have nothing to do with the scene.
- The birds in the sky appear much larger than the people.
- The young girl is driving backwards...
- Structural paradoxes...
- This plane has the wrong number of engines (2+1).
But can such an isolated defect be corrected? And how is a GAI debugged?
- This locomotive cannot work (and this is true of almost all those that have been generated).
This shows that this GAI has no knowledge of how the systems it is able to depict actually function.
Moreover, it obviously does not know what time is...
To correct this type of anomaly, the learning process would have to focus not on static images,
as was the case for this GAI, but on videos. The required computing resources are perhaps not yet available,
although OpenAI's announcement of SORA at the beginning of 2024 suggests that this may now be possible.
- Anachronisms...
- Woke biases: it is quite obvious that Sandro Botticelli (1445-1510) could not have painted these faces...
- Paradoxically, some pictures are too beautiful, too detailed, too smooth,...
This is the case, for example, with pictures in the style of
Giorgio de Chirico
or Paul Delvaux.
- Just as paradoxically, pictures of the simplest objects are almost impossible to obtain.
- As a result, it is almost impossible to obtain a picture that objectively illustrates a specific subject, even a simple one,
but for how much longer?
- Even if the number of possible pictures defies the imagination, it is finite; and yet
it is almost impossible to regenerate a picture already obtained,
which can have the advantage of guaranteeing uniqueness.
Even so, would it not be possible for the GAI to provide on request, for each image generated,
a sort of key allowing it to be regenerated later if necessary?
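One way to picture such a "key" is as the seed of the pseudo-random generator behind the random selection mentioned earlier. The sketch below is purely hypothetical: generate is a stand-in that returns a few random "pixel" values, and it is not known whether this GAI exposes anything comparable.

```python
import random

def generate(prompt, key):
    """Stand-in for a picture generator: returns 8 pseudo-random 'pixels'.

    Hypothetical: the prompt and the key together seed the random source,
    so the same (prompt, key) pair always reproduces the same output.
    """
    rng = random.Random(f"{prompt}|{key}")
    return [rng.randrange(256) for _ in range(8)]

prompt = "La bibliothèque de Babel à la façon de Botticelli"
first = generate(prompt, key=1234)
again = generate(prompt, key=1234)
other = generate(prompt, key=5678)

print(first == again)  # same key: the picture is regenerated identically
print(first == other)  # different key: almost certainly a different picture
```

Under this assumption, storing the pair (prompt, key) next to each picture would be enough to regenerate it later, provided the generator itself has not changed in the meantime.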
Finally, one notes an astonishing, unexpected convergence: the Library of Babel is practically infinite,
and it is therefore impossible to explore more than a tiny portion of it.
Is it not the same with this GAI that seems to contain a quasi-infinite number of pictures, of which we can never see more than a minuscule fraction?
Is this GAI the Library of Babel?
6 - About Creativity and Consciousness:
Once again, it seems challenging to dispute the quality and originality of these pictures generated by this GAI (and others).
There's no hesitation in asserting that it exhibits creativity! While this statement may be surprising to some, let's reflect on our own creative acts.
How do we generate new ideas? Certainly not out of nothing, and I see two possible origins: firstly, interaction with our environment,
particularly through vision where pictures are concerned.
Secondly, I am convinced that at the subconscious level, there is a continuous "mixing" of previous ideas stored in our brain,
which should be viewed as a dynamic semantic space.
These new tools inevitably lead us to question whether our brain is nothing more than a "mere" machine.
With these undeniable successes, do Artificial Intelligences not demonstrate intelligence in its broad sense? And if so,
could they become conscious? If so, would we be aware of it? It seems that the emergence of consciousness is linked to complexity
(especially in connections), but also to "external" stimulation, ensured in us (and in "higher" animals) by our five senses,
and this may be what our Artificial Intelligences lack to reach this higher level of evolution.
Finally, can't these studies on Artificial Intelligences enlighten us about our own memory [10] and the production of our dreams during which,
as in the pictures presented above, known or fictional characters appear in real or imaginary settings?
Do these pictures reveal to us the dreams of our GAIs?
7 - Conclusion:
Undoubtedly, in the span of a few months, a threshold has been crossed.
The victory of AlphaGo over Lee Sedol in the Google DeepMind Challenge Match in March 2016 already opened a breach, and today,
the successes of GAIs demonstrate the enormous potential of this research.
What would Alan Turing have thought about it?
However, this emergence is naturally accompanied by sometimes justified fears:
- Can Artificial Intelligences escape our control?
- What about the "encounter" of Artificial Intelligences, weapons, blockchains,...?
- Will numerous professions (journalists, graphic designers,...) not disappear?
- Is the use of Artificial Intelligences not addictive?
- Will the capabilities of Artificial Intelligences in certain domains not lead to frustration among those who feel surpassed by their performance
(graphic creation, language translation, diverse text writing,...)?
But let's imagine in our living rooms wall screens exhibiting masterpieces of world painting from yesterday, today and tomorrow,
that have never existed and are constantly renewed by a GAI...
So, what surprise awaits us tomorrow?
Any sufficiently advanced technology is indistinguishable from magic
Arthur Charles Clarke (1962).
[01]
- About twenty seconds for the given examples.
[02]
- It generally involves processing several hundred million {picture, description} pairs, requiring the use of high-performance computing and storage servers.
For artificial neural networks in particular, highly parallel NVIDIA processors (GPUs) are used.
[03]
- A picture in "raster" mode can be defined by three arrays of numerical values (with horizontal and vertical dimensions matching the picture),
each corresponding to the luminance of a primary color: Red, Green, and Blue.
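A minimal sketch of this representation, using plain Python lists for the three arrays. The picture size and the 0-255 value range are assumptions made for the example; the text above does not specify them.

```python
# A tiny raster picture: three arrays (planes) with the same dimensions as the picture.
WIDTH, HEIGHT = 4, 2  # hypothetical picture size

def blank_plane(value=0):
    """Create a HEIGHT x WIDTH array filled with a single luminance value."""
    return [[value for _ in range(WIDTH)] for _ in range(HEIGHT)]

red, green, blue = blank_plane(), blank_plane(), blank_plane()

def set_pixel(x, y, r, g, b):
    """Store the three luminances (assumed to lie in 0..255) of pixel (x, y)."""
    red[y][x], green[y][x], blue[y][x] = r, g, b

def get_pixel(x, y):
    """Read back the (Red, Green, Blue) luminances of pixel (x, y)."""
    return red[y][x], green[y][x], blue[y][x]

set_pixel(1, 0, 255, 128, 0)  # an orange pixel on the first row
print(get_pixel(1, 0))        # (255, 128, 0)
print(WIDTH * HEIGHT * 3)     # 24 numerical values describe this picture
```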
[04]
- It is, in a way, a form of semantic compression.
[05]
- The prompt corresponds to the natural language query (English, for example)
addressed to the GAI to describe what one wishes to obtain (a picture in this case).
[06]
- This was seen on several occasions with Sandro Botticelli, certainly because naked bodies had been generated.
[07]
- Jorge Luis Borges is an Argentine man of letters.
In 1941, in a fascinating short story, he takes us into the universe of The Library.
The narrator, one of its countless servants, reveals what it could be: made up of shelves, corridors,
and endless stairs, it would actually contain all possible books printed in a single format: 410 pages,
each with 40 lines of 80 characters chosen from 25 possibilities.
Although finite (on the order of 10^1834097), the number of works surpasses comprehension, but very few,
of course, contain a completely intelligible text in a certain language (and yet, they exist somewhere, but where?).
And the only treasure the narrator has ever discovered in his tedious travels is a single readable yet incomprehensible sentence: Ô time your pyramids.
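The order of magnitude quoted above can be checked from the format of the books: each one holds 410 x 40 x 80 characters, each chosen among 25 symbols, hence 25^1312000 possible books. The short computation below works with logarithms rather than with the number itself, which is far too large to print conveniently; the variable names are illustrative only.

```python
import math

PAGES, LINES, CHARS, SYMBOLS = 410, 40, 80, 25

chars_per_book = PAGES * LINES * CHARS            # characters in one book
exponent = chars_per_book * math.log10(SYMBOLS)   # log10 of the number of books

print(chars_per_book)        # 1312000
print(math.floor(exponent))  # 1834097, i.e. about 10^1834097 books
```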
[08]
- Hans Ruedi Giger is the designer of the monster and sets for the film Alien, directed in 1979 by Ridley Scott.
[09]
- The style of certain artists of past decades is easy to formalize, as I have shown myself.
This is the case with:
Jean Arp,
Jean-Michel Atlan,
Robert and Sonia Delaunay
or
Victor Vasarely.
But until recently, Flemish artists seemed "inaccessible" and "untouchable" to me!
This is no longer the case (see for example Hieronymus Bosch and Pieter Bruegel the Elder)...
[10]
- For example, do we really know how faces are stored in our brain?
Copyright © Jean-François COLONNA, 2024-2024.
Copyright © CMAP (Centre de Mathématiques APpliquées) UMR CNRS 7641 / École polytechnique, Institut Polytechnique de Paris, 2024-2024.