"Don't paint from nature too much. Art is an abstraction. Derive this abstraction from nature while dreaming before it, and think more of the creation that will result."
↓ Trailer for "Moving Still" (2022):
↓ Still Frames of "Moving Still" (2022):
Moving Still is a 13-minute experimental short film and art installation.
It takes the viewer on an odyssey through constantly morphing and pulsating nature scenes with an eerie, dreamlike atmosphere. The visuals, both interconnected and disintegrating, evoke a haunting liminal space.
An ever-growing stem of memories, past or present? Clutching onto silhouettes and shadows in a world too fast to perceive, running into the unknown abyss.
//Project Info
{
"type": "Personal Project",
"contributor(s)": "Benno Schulze",
"full-length": "13:12 min"
"category": [
"SHORT FILM",
"AVANT GARDE",
"GAUGAN2",
"ART INSTALLATION"
]
}
Moving Still was created as a passion project, stemming from lengthy experiments (more about that in the project insight) with GauGAN Beta and, later on, GauGAN 2. I found the basic concept of being able to produce artificial, photorealistic scenes of nature immensely intriguing.
What I found even more fascinating, however, were the technical aspects—the inner workings of the GAN. To understand how it works, dissect its processes, test its limits.
Supporting and enhancing the visual narrative with AI became my primary focus: using its weaknesses as a stylistic device rather than trying to create a perfect copy of reality.
From what I learned about GANs, I always drew parallels to the human brain: neurons firing, creating artificial imagery right before your very own eyes. You can imagine the shape of a house, the number of windows, the color of the door, and, drawing from images you've seen and environmental influences (essentially the training data), your brain fills in the shapes to produce a somewhat realistic image with ease.
Back to the GAN: the strong divergence between its individual video frames stems directly from the limited capabilities of GauGAN Beta (2019) / GauGAN 2 (2021), developed by Taesung Park et al. at NVIDIA Research AI. Although it is no longer available, it was (to my knowledge) the first image generator made available to the public.
The GAN (Generative Adversarial Network) was trained on 10 million—unconnected—reference images of landscapes and, as such, lacks frame consistency since video synthesis was never part of its training data.
Although I created the first version of the short film back in 2022, I have since made multiple additions to both the visual and auditory layers, and I still have things to work on and experiment with, out of pure joy for the base idea. Some of those changes found their way into the project insight.
data = {
"web-resources": [
"Semantic Image Synthesis with Spatially-Adaptive Normalization",
//[Taesung_Park;Ming-Yu_Liu;Ting-Chun_Wang;Jun-Yan_Zhu]
//[arxiv.org][PDF]
"Understanding GauGAN",
//[Ayoosh_Kathuria]
//[paperspace.com]
//[Part1]:_Unraveling_Nvidia's_Landscape_Painting_GANs
//[Part2]:_Training_on_Custom_Datasets
//[Part3]:_Model_Evaluation_Techniques
//[Part4]:_Debugging_Training_&_Deciding_If_GauGAN_Is_Right_For_You
"GauGAN for conditional image generation",
//[Soumik_Rakshit;Sayak_Paul]
//[keras.io]
]
}
A deliberate lack of frame-to-frame consistency creates a surreal, abstract pulsation of shapes and contours. Abrupt shifts in lighting, and even the complete replacement of objects, introduce a new layer of narrative. The image is held together only by the silhouettes and compositional balance of its visual elements. A sense of unease is intentionally evoked through the dissonance between components within a single frame: while the camera pans and elements like trees or objects move fluidly, others—such as the ground—remain unnaturally static.
Depending on the viewer’s subjective focus, the scenes—despite their linear progression—can evoke entirely different impacts and perceived levels of control. Beyond the segmentation maps that guide image generation (LINK), the visual outcome is left entirely to the GAN. The viewer witnesses a virtual, artificially constructed landscape that never existed—or perhaps did. On a parallel, immersive level, the auditory layer abstracts perception further. Initially, low-frequency textures—barely perceptible, like the distant rattling of memory or the mechanical hum of an old film projector—set the tone. At key moments, calibrated highs and lows allow the viewer to both submerge and resurface. This intra-diegetic soundscape is subtly enriched with experimental music elements composed by Azure Studios.
Segmentation maps function as a type of masking process, using predefined HTML color codes to represent different surface elements—such as water, sand, rock, or grass. These coded maps were used within Cinema 4D to texture a custom-built, rough 3D environment, which was then rendered out frame by frame and processed through GauGAN2. Given the extensive volume of 23,785 individual frames, the processing workflow was automated via a custom-built script.
White image flashes, reminiscent of firing synapses, occurred due to faulty repeated frames. This happened whenever the custom script, which allocated about 8 seconds per frame, saved the image before it was fully processed. In such cases, the output from the previous frame (e.g., 701) was saved as the next frame in the pipeline (e.g., 702). These were replaced in a second pass, directly linking the film to the working process. This highlights that, despite identical segmentation maps, the output image depends on many variables and coincidences. However, if the frames are processed in a single pass, a hidden seed creates similarities that are hard to reproduce.
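Below is a minimal sketch of how such faulty duplicates could be flagged for that second pass, assuming the rendered frames sit in one folder as numbered PNGs; the hashing approach and file layout are my own illustration, not the original repair workflow.

import hashlib
from pathlib import Path

def find_repeated_frames(frame_dir):
    """Flag frames that are byte-identical to their predecessor.

    A frame saved before GauGAN2 finished rendering simply repeats the
    previous output, so consecutive duplicates mark the spots that need
    to be re-rendered in a second pass.
    """
    frames = sorted(Path(frame_dir).glob("*.png"))
    repeated = []
    previous_hash = None
    for frame in frames:
        current_hash = hashlib.md5(frame.read_bytes()).hexdigest()
        if current_hash == previous_hash:
            repeated.append(frame.name)  # e.g. frame 702 repeating frame 701
        previous_hash = current_hash
    return repeated

# print(find_repeated_frames("./output_folder"))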
The 13-minute short film Moving Still premiered as part of the interdisciplinary exhibition „Licht_Raum“, organized by the Zentrum für Kollektivkultur (ZfK) in Bremen and supported by the Bremen Department of Culture. The exhibition took place in a partially abandoned industrial building on the historic Hag grounds – once home to the former coffee manufacturer Kaffee HAG.
The raw, unrestored nature of the venue resonated perfectly with the abstract aesthetic atmosphere of the film. The installation was one of twelve artistic positions exploring the intersection of light, space, and perception.
As darkness fell, the building came alive: projections, light sculptures, and sonic interventions transformed the rooms into a dynamic gallery of passing experiences. The event program featured artist talks and live performances, including ambient and electronic sound sets that opened and closed the exhibition.
Moving Still was exhibited on the upper floor of the former research wing – a secluded, quiet space that allowed visitors to fully surrender to the immersive visuals and sound.
data = {
"web-resources": [
"Lichtkunst auf dem Hag-Gelände",
//[Weser_Kurier_Newspaper;Anke_Velten]
//[www.weser-kurier.de/lichtraum-zfk]
"Zentrum für Kollektivkultur - LichtRaum",
//[www.zfk-hb.de/news/lichtraum/]
]
}
The short film began as an exploration of various techniques using Cinema4D and "GauGAN2." The core idea and workflow centered around creating segmentation maps, where solid colors were used to define shapes and objects.
Each specific hex color code corresponded to a distinct object or material type—such as light blue for the sky, green for a meadow, or gray for a stone. Further direction can be given to "GauGAN2" by uploading a style image as a reference for the rough color palette and mood.
Those segmentation maps were created in Cinema 4D and rendered out as image sequences to be processed by "GauGAN2".
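To make the idea concrete, here is a minimal sketch of what such a color-coded segmentation map boils down to, drawn with Pillow instead of Cinema 4D; the hex values and shapes are placeholders for illustration, not the actual label palette GauGAN2 expects.

from PIL import Image, ImageDraw

# Placeholder hex codes standing in for GauGAN2's label colors
# (the real palette maps each material to one specific color).
SKY = "#87ceeb"    # light blue -> sky
GRASS = "#3c7a2f"  # green      -> meadow
ROCK = "#808080"   # gray       -> stone

def make_segmentation_map(path, size=512):
    """Draw a crude 512x512 segmentation map: sky above, meadow below, one stone."""
    img = Image.new("RGB", (size, size), SKY)
    draw = ImageDraw.Draw(img)
    draw.rectangle([0, size // 2, size, size], fill=GRASS)  # lower half: meadow
    draw.ellipse([200, 300, 280, 360], fill=ROCK)           # a single stone
    img.save(path)

make_segmentation_map("frame_0001.png")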
GauGAN2 output with two different style filters enabled. Notice that the silhouettes are not strictly followed; they rather define the overall composition, which the GAN is free to adjust. In this example, the small patches of clouds connect with each other in the generated image, even though they are disconnected on the segmentation map.
The web interface of GauGAN (Beta) around 2020 (it was first released in 2019).
As you can tell, it looks rather rudimentary compared to today's GANs.
One must keep in mind, though, that it was the first generative adversarial network (GAN) for artificial image generation, at least the first released to the public.
I had played around with GauGAN (Beta) a bit but then kind of forgot about it. In 2022 I came back to it with "GauGAN2", initially for a Luft & Laune event, to be used as a social media story ad and as live visual content on stage.
While creating still images and short video sequences was enjoyable, I found the long, aggressively pulsating video scenes of nature to be the most fascinating. This was due to the GAN's lack of frame consistency—unsurprising, given that it was only trained to generate single images.
As we’ve seen before, there’s always some variability in how the GAN processes input, even when using the exact same segmentation map.
The main issue, though, was the web interface, which at the time was the only way to use the GAN. It allowed just one upload at a time—you had to click "process," wait about seven seconds, and then manually download the generated output. Doing this hundreds or even thousands of times would have been absolutely dreadful and mind-numbing.
So with the help of Paul Schulze, I enhanced a Python script — originally created by gormlabenz — for bulk uploading and downloading of input segmentation maps. Modifications also made it possible to set a style image and execute multiple iterations simultaneously.
import base64
import os
import time
from glob import glob

import imageio
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from tqdm import tqdm


class Gaugan2Renderer:
    def __init__(self, waiting_time=5):
        # Seconds to wait per frame; too short and the previous output
        # gets saved again (the "white flash" duplicates mentioned above).
        self.waiting_time = waiting_time
        self.output_images = []

        chrome_options = Options()
        # chrome_options.add_argument("--headless")
        # chrome_options.add_argument("--remote-debugging-port=9222")
        # chrome_options.binary_location = "/usr/bin/chromedriver"
        self.driver = webdriver.Firefox(
            # ChromeDriverManager().install(),
            # options=chrome_options
        )

    def open(self):
        # Load the GauGAN2 web demo and wait until the viewport is present
        self.driver.get("http://gaugan.org/gaugan2/")
        WebDriverWait(self.driver, 10).until(
            EC.presence_of_element_located((By.ID, "viewport"))
        )
        self.close_popups()

    def close_popups(self):
        # Dismiss the intro popup and tick the terms & conditions checkbox
        close_button = self.driver.find_element(
            By.XPATH, "/html/body/div[2]/div/header/button")
        if close_button:
            close_button.click()
        terms_and_conditions = self.driver.find_element(
            By.XPATH, '//*[@id="myCheck"]')
        if terms_and_conditions:
            terms_and_conditions.click()

    def download_image(self, file_path):
        # Read the output canvas as base64-encoded PNG data and write it to disk
        output_canvas = self.driver.find_element(By.ID, 'output')
        canvas_base64 = self.driver.execute_script(
            "return arguments[0].toDataURL('image/png').substring(21);",
            output_canvas)
        canvas_png = base64.b64decode(canvas_base64)
        with open(file_path, 'wb') as f:
            f.write(canvas_png)

    def create_output_dir(self):
        os.makedirs(self.output_path, exist_ok=True)

    def render_image(self, file_path, style_filter_path):
        # Upload the segmentation map
        self.driver.find_element(
            By.XPATH, '//*[@id="segmapfile"]').send_keys(file_path)
        self.driver.find_element(
            By.XPATH, '//*[@id="btnSegmapLoad"]').click()
        # Upload the custom style filter
        self.driver.find_element(
            By.XPATH, '//*[@id="imgfile"]').send_keys(style_filter_path)
        self.driver.find_element(
            By.XPATH, '//*[@id="btnLoad"]').click()
        # Trigger the render
        self.driver.find_element(
            By.XPATH, '//*[@id="render"]').click()

    def run(self, input_folder, style_filter_path, output_path):
        self.image_paths = sorted(glob(input_folder + "/*.png"))  # keep frames in sequence
        self.output_path = output_path
        self.open()
        self.create_output_dir()
        for file_path in tqdm(self.image_paths):
            file_path = os.path.abspath(file_path)
            basename = os.path.basename(file_path)
            output_image = os.path.join(self.output_path, basename)
            self.render_image(file_path, style_filter_path)
            time.sleep(self.waiting_time)
            self.download_image(output_image)
            self.output_images.append(output_image)
        self.driver.close()

    def create_video(self, output_video):
        images = [imageio.imread(image) for image in self.output_images]
        imageio.mimsave(output_video, images, fps=10)
from gaugan2_renderer import Gaugan2Renderer

renderer = Gaugan2Renderer(waiting_time=10)
renderer.run("./input_folder", "path/to/styleframe/styleframe.png", "./output_folder")
# renderer.create_video("./output.mp4")
A quick test involved using shapes that didn’t align with their designated "colors" (object/material types, such as stone). I noticed that all objects and materials on the segmentation map seemed interconnected. For example, if a small patch of snow was placed in the foreground, trees in the background would also appear snow-covered, even if the segmentation map didn’t explicitly include snow in those areas. Same with fog in the examples below.
Over time, I kind of figured out what works and what doesn’t, discovered a visual aesthetic, and developed a visual narrative and perception I was excited to explore more deeply.
However, a major issue persisted. As mentioned before, every element of the segmentation map (e.g., dirt) is connected to the other elements on it (e.g., snow). But when similar elements appear on two otherwise different segmentation maps—even if those elements differ in size and location—the segmentation map seems to act like a masking process.
This means that if the bottom half is covered in light blue, representing straw, this part — in its output — will almost always have the same look [1]. One could even say it’s the same picture. Even if the pattern is broken up by smaller dots, like stones or bushes (in the segmentation input) [2], it still remains unchanged, as it only seems to include parts if they reach a certain size threshold.
And this isn’t an isolated issue with just this combination of elements — it happens with almost anything. This could be due to several factors: insufficient variation in the training data, issues with the seed (which basically adds a randomness factor to the result), or something in the script used for bulk processing.
Regardless, when attempting to create a moving scenery, it becomes obviously distracting — perhaps even nauseating — when some elements appear to move along while others, like the ground, seem to remain still, at least with this degree of persistence.
The lessons learned from this are that the ground and other large-scale elements need to be:
A:
Small or far away enough that the difference between adjacent frames is large enough for their outputs to look clearly distinguishable from one another.
B:
Not too small either, since below a certain threshold (roughly a minimum of 15x15 px on a 512x512 full-resolution input map) elements are no longer processed at all (see the sketch after this list).
C:
Constantly broken up: big elements, such as the ground, need to be interrupted by various DIFFERENT elements (represented as colors in the segmentation map) in order for the camera movement to be recognized by the viewer.
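As referenced in point B, here is a rough sketch of how the size threshold could be checked automatically. It simply counts the total pixel area per label color (a simplification, since the real limit concerns individual elements rather than total coverage), and the ~15x15 px figure is taken from my own observation above, so treat it as approximate.

from collections import Counter
from PIL import Image

MIN_AREA = 15 * 15  # approx. area on a 512x512 map below which elements were ignored

def undersized_labels(segmap_path):
    """Return label colors whose total pixel area falls below the threshold."""
    img = Image.open(segmap_path).convert("RGB")
    counts = Counter(img.getdata())
    return {color: area for color, area in counts.items() if area < MIN_AREA}

# Any color listed here is likely too small for GauGAN2 to pick up at all.
# print(undersized_labels("frame_0001.png"))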
In the example below, you can tell that fixing the problems mentioned above (regarding segmentation map) substantially improved the output given by GauGAN2.
↓ A small selection: