I'm still finding VQGAN+CLIP with the ImageNet model a really interesting tool for image synthesis. You can define an initial image, which is then iterated on to match a text prompt. For example, here's a picture of Adastral Park, iterated using the prompt "Cyberpunk".
Or how about "fairy-tale castle"?
With a light touch you can shift an image into another style; with a heavier touch you can create entirely new scenes that retain only a reminiscence of the original.
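For a sense of what the iteration loop actually does, here's a minimal sketch of CLIP-guided optimisation starting from an existing photo. It's not the real VQGAN+CLIP notebook code: it optimises raw pixels rather than VQGAN latent codes, so results will be much cruder, and the file name, prompt, and hyperparameters are purely illustrative. It assumes PyTorch, torchvision, and OpenAI's `clip` package are installed.

```python
# Sketch only: CLIP-guided iteration from an initial image, in pixel space.
# The real VQGAN+CLIP pipeline optimises VQGAN latents instead of pixels.
import torch
import clip
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode the steering prompt once.
text_features = model.encode_text(clip.tokenize(["Cyberpunk"]).to(device))
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Start from an existing photo (path is illustrative) and make it trainable.
init = transforms.ToTensor()(
    Image.open("adastral_park.jpg").convert("RGB").resize((224, 224))
)
image = init.unsqueeze(0).to(device).requires_grad_(True)

# CLIP's standard input normalisation constants.
clip_mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
clip_std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

optimizer = torch.optim.Adam([image], lr=0.02)
for step in range(200):
    optimizer.zero_grad()
    img_features = model.encode_image((image.clamp(0, 1) - clip_mean) / clip_std)
    img_features = img_features / img_features.norm(dim=-1, keepdim=True)
    loss = -(img_features @ text_features.T).mean()  # maximise similarity to the prompt
    loss.backward()
    optimizer.step()
```

The "light touch" versus "heavy touch" trade-off corresponds roughly to how many iterations you run and how strongly the prompt loss is weighted against staying close to the initial image.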
I can see this being highly applicable to games, but I also wonder if there's an opportunity in data visualisation. By weighting prompts appropriately, and combining text and image prompts, you could create consistent transformations that change how a user experiences data.
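For the prompt-weighting idea, a rough sketch of how several text prompts and an image prompt could be blended into one steering signal is below. The prompts, weights, and file name are all illustrative assumptions, not anything from a real pipeline; the point is only that each prompt becomes a CLIP embedding with its own weight in the loss.

```python
# Sketch: combining weighted text prompts and an image prompt into one loss.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

weighted_prompts = [("data centre at night", 1.0), ("Cyberpunk", 0.5)]  # illustrative
targets, weights = [], []
with torch.no_grad():
    for text, weight in weighted_prompts:
        feats = model.encode_text(clip.tokenize([text]).to(device))
        targets.append(feats / feats.norm(dim=-1, keepdim=True))
        weights.append(weight)

    # An image can steer the output in the same way as a text prompt.
    style = preprocess(Image.open("style_reference.jpg")).unsqueeze(0).to(device)
    feats = model.encode_image(style)
    targets.append(feats / feats.norm(dim=-1, keepdim=True))
    weights.append(0.25)

def prompt_loss(image_features):
    """Weighted sum of cosine distances to each prompt embedding."""
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    return sum(w * (1 - (image_features @ t.T).mean())
               for t, w in zip(targets, weights))
```

Keeping the prompts and weights fixed across a whole dataset is what would make the transformations consistent from one image to the next.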