For pioneering contributions to generative image synthesis

We are fascinated when a machine succeeds in creating something new. But can a computer really be creative? With recent advances in the field of generative artificial intelligence (AI), the vision of using machines as partners in creative processes is within reach. Generative AI methods make it possible to generate texts, images, audio, videos and three-dimensional scenes of astonishing quality, opening up new possibilities for ground-breaking applications. The field is currently experiencing a real boom: massive investments and the founding of start-ups that exploit the potential of generative AI are accelerating this rapid development.

In this innovative and highly competitive field, the award winner and his working group distinguished themselves through pioneering work years before generative AI attracted broad public attention.

With the development and introduction of so-called diffusion models, the prizewinner and his working group have laid the foundation for an entire family of innovative artificial neural networks. Diffusion models are able to create images from natural language descriptions formulated by humans, ranging from realistic-looking representations of subjects such as landscapes, people, objects or everyday scenes to imaginative abstract image compositions. Björn Ommer’s work complements the saying “a picture is worth a thousand words” by recognizing that just a few words are enough for an AI to automatically generate a rich palette of high-resolution images of excellent quality.

What is behind the innovation of Björn Ommer’s group? To illustrate the principle of the diffusion process, let’s imagine a collection of images that all share a common mood or aesthetic – such as landscapes in the warm light of a sunset. The basic idea behind diffusion techniques is to gradually overlay noise on such input images and, in a reverse process, remove the noise to create an image that fits into our collection. In doing so, the model mimics the distribution of image features in the dataset so that it can generate new and unique images that match the collection in terms of relevant image features without simply copying the original images.
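The noising half of this idea can be sketched in a few lines of Python. This is only an illustrative toy, not the group's implementation: the linear noise schedule, its start and end values, the number of steps, and the names `make_alpha_bars` and `add_noise` are all assumptions chosen for the example, and a short list of random numbers stands in for an image.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal kept after each step (assumed linear schedule)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, step, alpha_bars, rng):
    """Blend the pixel values with Gaussian noise in the proportion set for `step`."""
    signal_scale = math.sqrt(alpha_bars[step])
    noise_scale = math.sqrt(1.0 - alpha_bars[step])
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
alpha_bars = make_alpha_bars()
pixels = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # stand-in for an 8x8 image

slightly_noisy = add_noise(pixels, 10, alpha_bars, rng)      # still close to the input
almost_pure_noise = add_noise(pixels, 999, alpha_bars, rng)  # input is nearly erased
```

Early steps leave the picture almost untouched, while the last steps retain practically none of it. The generative model is trained to undo this corruption step by step.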

Automated image generation using diffusion models is based on the idea of tracing noisy images back to suitable instances of the image data set used for training. Originally, this required enormous computing power, as a denoising process extending over several iterations was used at the level of individual pixels.
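To make the iterative character of denoising concrete, the following toy sketch runs a reverse process on single numbers instead of images. It is a simplified illustration under stated assumptions: the "dataset" is a Gaussian with hypothetical mean `MU` and spread `SIGMA`, for which the expected noise can be written down exactly, so the function `predict_noise` takes the place of the trained neural network a real system would need. The schedule values and the update rule (a standard ancestral sampling step) are likewise choices made for this example.

```python
import math
import random

MU, SIGMA = 3.0, 0.5   # toy "dataset": numbers drawn from N(MU, SIGMA^2)
NUM_STEPS = 200        # assumed number of denoising iterations

betas = [1e-4 + (0.05 - 1e-4) * i / (NUM_STEPS - 1) for i in range(NUM_STEPS)]
alpha_bars, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)

def predict_noise(x, step):
    """Exact expected noise for the Gaussian toy data (a network would learn this)."""
    a_bar = alpha_bars[step]
    variance = a_bar * SIGMA ** 2 + (1.0 - a_bar)
    return math.sqrt(1.0 - a_bar) * (x - math.sqrt(a_bar) * MU) / variance

def generate(rng):
    """Start from pure noise and remove a little of it in every iteration."""
    x = rng.gauss(0.0, 1.0)
    for step in reversed(range(NUM_STEPS)):
        beta, a_bar = betas[step], alpha_bars[step]
        noise_estimate = predict_noise(x, step)
        x = (x - beta / math.sqrt(1.0 - a_bar) * noise_estimate) / math.sqrt(1.0 - beta)
        if step > 0:
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)  # re-inject a small amount of noise
    return x

rng = random.Random(0)
samples = [generate(rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
```

The generated numbers cluster around `MU` with a spread close to `SIGMA`, although none of them is copied from a stored example. For an image, every one of these iterations has to touch every pixel, which is why the original pixel-level procedure was so costly.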

This is where the pioneering work of Björn Ommer’s group came in: Their groundbreaking approach shifted the denoising process from image pixels to so-called latent image representations – compact representations of image information that enable efficient processing. This technique, known as latent diffusion, significantly reduced the computational load without compromising the quality of the images created by the AI. Thanks to this innovation, image-generation models can now run on commercially available GPUs. Released as open-source software, the approach makes a significant contribution to the democratization of generative AI and thus clearly stands out from previous proprietary text-to-image models such as DALL·E and Midjourney.
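A short back-of-the-envelope calculation shows why working on latent representations saves so much effort. The sizes below are assumptions picked for illustration and are not stated in this text: a 512 × 512 RGB image and a latent representation that is eight times smaller along each side with four channels. The helper `num_values` is likewise hypothetical.

```python
def num_values(height, width, channels):
    """Count the numbers a denoiser has to process in one iteration."""
    return height * width * channels

IMAGE_SIDE = 512        # assumed image resolution
DOWNSAMPLING = 8        # assumed spatial compression factor of the latent space
LATENT_CHANNELS = 4     # assumed number of latent channels

pixel_values = num_values(IMAGE_SIDE, IMAGE_SIDE, 3)
latent_values = num_values(IMAGE_SIDE // DOWNSAMPLING, IMAGE_SIDE // DOWNSAMPLING, LATENT_CHANNELS)
reduction = pixel_values / latent_values

print(f"pixel space:  {pixel_values} values per step")
print(f"latent space: {latent_values} values per step")
print(f"reduction:    {reduction:.0f}x")
```

Under these assumptions each denoising iteration handles 48 times fewer values than it would in pixel space, and this saving applies to every iteration of the process.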

This year’s prizewinner, Björn Ommer, and the group around him have made a lasting impact on the field of generative AI with their latent diffusion approach and have developed groundbreaking image generators that have attracted a great deal of attention worldwide. Numerous awards at leading international conferences and honors such as the recent German AI Prize of the newspaper WELT and the recent nomination for the German Future Prize of the Federal President confirm the immense importance of their work.

The Eduard Rhein Foundation honors Björn Ommer and his team, a group of pioneers in the field of artificial intelligence. Through open and efficient model architectures, the group has democratized access to generative AI. Their approach demonstrates the potential of generative AI not only for images, but also for other modalities such as audio and text, thus laying the foundation for a wide range of applications – from media production, where realistic or creative content is created for presentations, to prototyping in automotive design, to synthetic data to support diagnostics in medical research.