It's all just Python and the a1111 SD webui API, but there's a lot more to it than just prompts. Basically, the prompts are generated from the requirements/state of the image (i.e., add "(pregnant)" for pregnant bodies, race descriptions for races, etc.), and a ton of different LoRAs are thrown in as well. Faces are isolated using an implementation of CLIPSeg to create masks; those masks guarantee face consistency across images, are used to make the portraits, and help eliminate as many errors as I can automatically detect (faces too big/body too small, too many faces aka multiple characters, maybe some others I can't remember off the top of my head). ControlNet is used extensively to keep the bodies/poses broadly the same across images, and to track (as loosely as possible) a series of hand-generated templates that are pretty much there to minimize the number of misplaced tails.
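The state-driven prompt assembly is the simplest piece to show. A minimal sketch (the state keys and tag strings here are illustrative stand-ins, not my actual code):

```python
def build_prompt(state):
    """Assemble an a1111 prompt from a character-state dict.

    The keys ("pregnant", "race") and the tags they map to are
    made-up examples of the general idea.
    """
    tags = ["full body", "standing", "simple background"]
    if state.get("pregnant"):
        tags.append("(pregnant)")  # parentheses = weighted tag in a1111
    race = state.get("race")
    if race == "elf":
        tags.append("pointy ears")
    elif race == "beastkin":
        tags.append("tail, animal ears")
    return ", ".join(tags)

print(build_prompt({"race": "elf", "pregnant": True}))
# → full body, standing, simple background, (pregnant), pointy ears
```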
That's the broad strokes at least. I can go into more detail if there is interest.
I am very interested. I used to find and crop images in the Strive discord, and finding full sets of characters nude, clothed, and pregnant was a nightmare; I never managed to get more than 3-4 done. What you're doing seems a lot better, considering you can make the same character in all positions/clothing/non-clothing.
If you could link/help me to the things you use, I would very much appreciate it.
I might make a full writeup in the future, but I'm just going to summarize the manual steps for now. Like I mentioned, I used the API and wrote a fair amount of Python to get the whole thing automated, and actually generating this number of images took a fair amount of computing resources. I should (and probably will at some point) share the source, but honestly it's pretty hacky (it's a script and I wasn't really planning to post it anywhere) and I doubt it would realistically run for anyone else. For now it seems like you might be more interested in generating your own individual images by hand anyway, so I'll give general steps that should point you in the right direction:
General setup: you need the a1111 stable diffusion webui, and you need the ControlNet extension. For ControlNet you will want the canny model and the depth model.
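If you do end up going the API route like I did, a bare-bones call to the webui's txt2img endpoint looks something like this (default local URL, webui launched with `--api`; the payload fields are the standard a1111 names, but treat the specific values as placeholders to tune):

```python
import base64
import json
import urllib.request

WEBUI_URL = "http://127.0.0.1:7860"  # default a1111 local address

def txt2img_payload(prompt, seed=-1, width=512, height=768):
    """Minimal payload for a1111's /sdapi/v1/txt2img endpoint.
    Steps/cfg here are just common defaults, not a recommendation."""
    return {
        "prompt": prompt,
        "negative_prompt": "lowres, bad anatomy",
        "seed": seed,
        "width": width,
        "height": height,
        "steps": 25,
        "cfg_scale": 7,
    }

def generate(prompt, out_path="out.png"):
    """POST the payload and save the first returned image (base64 PNG)."""
    req = urllib.request.Request(
        WEBUI_URL + "/sdapi/v1/txt2img",
        data=json.dumps(txt2img_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result["images"][0]))

# generate("full body, standing, simple background, grey background")
# (uncomment with the webui running and launched with --api)
```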
1. Generate a nude base body. You could experiment with generating the clothed body first; I didn't, because I was worried it would be more likely to place accessories, and non-formfitting clothes might cause weird body shapes later on. In any case, prompts like "full body" and "standing" are probably useful to make sure it looks more like a portrait, as well as things like "simple background"/"grey background" to make it stand out less. Also, you probably want these at 512x768 resolution. If you are generating just a few by hand you could upscale them as well, say to 1024x1536; this should take at least ~4x the time, so it was absolutely prohibitive for me.
2. Mask out the head. You can obviously just do this by hand, or you could probably find an extension that uses CLIPSeg or FastSAM or whatever to do it automatically, which would be closer to what I did. Honestly, you probably wouldn't save time overall doing it automatically unless you do a ton of images, but it's up to you. The mask might be better made in GIMP/Photoshop/Krita or whatever on your computer, so you can make sure you don't lose it; you'll need it for a few steps. Or you could just remake it if you lose it, which wouldn't be a huge deal.
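If you do go automatic, the downstream logic is simple once you have a per-pixel face score from CLIPSeg (or whatever). A sketch of turning a score map into a binary mask and running the kind of sanity checks I mentioned earlier, with a plain list-of-lists standing in for the real model output (face-too-big only; multiple-face detection is more involved and not shown):

```python
def to_mask(scores, threshold=0.5):
    """Threshold a 2D map of face scores (floats in [0, 1], a stand-in
    for CLIPSeg output after sigmoid) into 0/1 pixels."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]

def face_fraction(mask):
    """Fraction of the image covered by the face mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total

def face_too_big(mask, max_fraction=0.15):
    """Reject images where the face dominates the frame (usually means
    the body didn't come out full-length). 0.15 is a made-up cutoff."""
    return face_fraction(mask) > max_fraction
```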
3. Fix the face by inpainting with "inpaint at full resolution" enabled and prompts that are more face-specific. This isn't always necessary, but it makes a lot of the faces a little less weird looking.
4. Make sure you save your face-fixed image, since you will need it in its original form for a few other tasks. Now you can make the portrait. Take your image and crop it down to just the face, trying to keep it somewhat square. Upload that to img2img and set your resolution to 512x512. Use prompts that are specific to the face: if you used "pointy ears" for your elf on the full body, make sure you use it again here, and the same for hairstyles, colors, etc. Try not to include any prompts that apply only to the body, because if you ask for a tail it might give you a tail, and that probably isn't what you wanted. Enable ControlNet; I used canny for the portraits only, but you could try just using the depth model here as well. Either way, it will help keep the face looking similar to how it was on the full body image.
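Cropping "down to just the face, somewhat square" is easy to do in code if you already have a face bounding box from the mask step. A sketch (boxes are (left, top, right, bottom) in pixels, the convention PIL's `Image.crop` uses; the padding value is just a guess to tune):

```python
def square_crop_box(bbox, img_w, img_h, pad=0.15):
    """Expand a face bounding box into a padded square crop,
    clamped to the image bounds.

    bbox = (left, top, right, bottom) in pixels.
    """
    l, t, r, b = bbox
    side = max(r - l, b - t)
    side = int(side * (1 + 2 * pad))     # pad both sides of the face
    cx, cy = (l + r) // 2, (t + b) // 2  # face center
    half = side // 2
    left = max(0, cx - half)
    top = max(0, cy - half)
    right = min(img_w, left + side)
    bottom = min(img_h, top + side)
    # pull left/top back in if we hit the right/bottom edge
    left = max(0, right - side)
    top = max(0, bottom - side)
    return left, top, right, bottom
```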
5. For the clothes, go to inpaint and make sure the face is masked out. Again, I saved and reused the same face mask, but you could remake it if you need to. Now inpaint unmasked (you want the face to stay the same so the character is identifiable), and add clothing tags to the prompt. Also, a little like the portrait, make sure that any characteristics you need are still there, i.e. pointy ears for elves, tails for beastkin/halfkin, etc. Enable ControlNet and use depth; I use the depth_zoe module for creating the maps. You will probably need to mess with the weights or the control end step to leave some room to actually add the clothes and deviate a bit from the base. Set the fill type to "original", and you can set the denoising strength quite high, like 0.9+. Also, make sure you use the same seed for all the body variants when you are doing the inpainting; it will help with consistency.
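For reference, that inpainting setup maps onto the a1111 img2img endpoint roughly like this. The field names are the standard API ones, but double-check them against your webui/ControlNet versions, and the depth model name and ControlNet weights are placeholders, not my actual settings:

```python
def inpaint_clothes_payload(init_image_b64, face_mask_b64, depth_map_b64,
                            prompt, seed):
    """Sketch of an /sdapi/v1/img2img inpaint request: keep the masked
    face fixed, redraw the rest with clothing tags in the prompt,
    guided by a ControlNet depth map. Values are starting points."""
    return {
        "prompt": prompt,             # body prompts + clothing tags
        "init_images": [init_image_b64],
        "mask": face_mask_b64,
        "inpainting_mask_invert": 1,  # inpaint NOT-masked, face stays put
        "inpainting_fill": 1,         # 1 = "original" fill type
        "denoising_strength": 0.9,    # quite high, as described above
        "seed": seed,                 # same seed across all variants
        "width": 512,
        "height": 768,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "module": "depth_zoe",
                    "model": "control_v11f1p_sd15_depth",  # your depth model
                    "image": depth_map_b64,
                    "weight": 0.7,        # loosen so clothes can deviate
                    "guidance_end": 0.8,  # stop control before the end
                }]
            }
        },
    }
```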
6. Do essentially the same as step 5 for pregnancy etc. You might need to play around with the prompts to get the style and such that you want.
Also, I used Counterfeit v4 for all the images, so if you can't choose a model you could start there. I used a bunch of LoRAs as well; they are worth looking into if you want a certain effect/clothing/pose/etc.
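If you haven't used LoRAs before: in a1111 they're activated straight from the prompt with `<lora:name:weight>` tags, so in a script they're just string concatenation (the names and weights here are made up, they're whatever your .safetensors files are called):

```python
def with_loras(prompt, loras):
    """Append a1111-style <lora:name:weight> tags to a prompt.
    `loras` is a list of (name, weight) pairs."""
    tags = " ".join(f"<lora:{name}:{weight}>" for name, weight in loras)
    return f"{prompt}, {tags}" if loras else prompt

print(with_loras("1girl, full body", [("someStyleLora", 0.7)]))
# → 1girl, full body, <lora:someStyleLora:0.7>
```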
Hey! Following your advice I got into Stable Diffusion. I'd like to know which model, extensions, and LoRAs you used for the process.
The issue I'm facing is that I'm not able to get a full body, and when I do, the image comes out distorted.
If you can, please paste an example prompt. It would help a lot.
Thanks.
I'm sorry, I just missed this comment completely; I've given a slightly better answer on another response. If you aren't getting full bodies, make sure you have the resolution set to something that makes sense for a portrait (i.e. 512x768). The LoRAs etc. are a lot, and mostly specific to different bodies and so forth. You can see a prompt below, but I don't know if all the LoRAs I used are good or right, so it's probably better to experiment (some I used just because I needed consistency and couldn't hand-pick the results I wanted; it's better to use fewer when you can).