Yeah, 48 hours is tight. We ended up having Falcon 7B turn basic descriptions into prompts for Stable Diffusion.
We had the gameplay already set, with specific strings defining the enemies (e.g. "A weak long ranged enemy"), then used that setup to ask the LLM for
"give me the name of an enemy that would be in a world {world description} fighting against {character description} with the description {enemy description}"
That then chained into
"give me the visual description of the character {Enemy Name} in the world {world description}"
Which chained into asking Stable Diffusion for an image, with the background removed, of
"A figure in the center of frame. {Enemy Visual Description}. 4K. colorful. High Quality..."
I'd already been using this kind of prompt chaining in other projects. It does take a decent amount of prompt engineering to get right, because an error in an early prompt compounds: each later prompt in the chain ends up even further off.
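
For anyone who wants to try it, here's a minimal sketch of that chain in Python. It's not the actual jam code: `build_enemy_prompts` and the `generate_text` callable are hypothetical names, and `generate_text` stands in for whatever call you use to get text out of Falcon 7B (or any LLM). The image prompt only includes the tags quoted above, and background removal would be a separate step after image generation.

```python
from typing import Callable, Dict

# Prompt templates taken from the chain described above.
NAME_PROMPT = (
    "give me the name of an enemy that would be in a world {world} "
    "fighting against {character} with the description {enemy}"
)
VISUAL_PROMPT = (
    "give me the visual description of the character {name} "
    "in the world {world}"
)
# Only the tags quoted in the post; the real prompt had more after these.
IMAGE_PROMPT = "A figure in the center of frame. {visual}. 4K. colorful. High Quality"


def build_enemy_prompts(
    world: str,
    character: str,
    enemy: str,
    generate_text: Callable[[str], str],  # hypothetical LLM wrapper: prompt in, text out
) -> Dict[str, str]:
    """Run the two LLM steps in order and build the final image prompt."""
    # Step 1: the enemy's name, from the world, player, and gameplay string.
    name = generate_text(
        NAME_PROMPT.format(world=world, character=character, enemy=enemy)
    ).strip()

    # Step 2: the visual description depends on the name from step 1,
    # so a bad name here makes everything after it worse.
    visual = generate_text(VISUAL_PROMPT.format(name=name, world=world)).strip()

    # Step 3: the image prompt that would be sent to Stable Diffusion.
    image_prompt = IMAGE_PROMPT.format(visual=visual)
    return {"name": name, "visual": visual, "image_prompt": image_prompt}


if __name__ == "__main__":
    # Canned replies so the sketch runs without a model loaded.
    replies = iter(["Frostbite Archer", "a gaunt archer wrapped in icy rags"])
    result = build_enemy_prompts(
        world="a frozen wasteland",
        character="a lone knight",
        enemy="A weak long ranged enemy",
        generate_text=lambda prompt: next(replies),
    )
    print(result["image_prompt"])
```

Passing the model in as a plain function keeps the chain easy to test with canned replies, which helps when you're tuning the early prompts and want to see how a change propagates down the chain.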