How much of the size is from the voice lines? If it's a significant share, perhaps a separate download for each language would make sense.
I get that doubling the number of downloads would be a bit messy... If your setup allows separate data and executables (like Godot and, I believe, Unity do... also Ren'Py if you count that), perhaps you could do multiplatform downloads, with each executable in the zip having its platform in the name. That way you'd have 2 downloads (English and German, I'm assuming) instead of the current 4 (which cover both languages but are per-platform).
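To make that layout concrete, here's a rough sketch of what I mean. All the file and folder names (`data/shared/`, `data/voice_en/`, `game_windows.exe`, etc.) are made up for illustration, and it only plans the contents of each zip rather than building anything, so adapt it to however your project is actually organized.

```python
def plan_bundles(languages, platforms):
    """Return {zip name: list of entries}, one zip per language.

    All paths here are hypothetical placeholders, not a real project layout.
    """
    bundles = {}
    for lang in languages:
        entries = [
            "data/shared/",          # language-independent assets
            f"data/voice_{lang}/",   # only this language's voice lines
        ]
        # Every platform's executable ships in the same zip,
        # with the platform spelled out in the file name.
        for platform in platforms:
            suffix = ".exe" if platform == "windows" else ""
            entries.append(f"game_{platform}{suffix}")
        bundles[f"game_{lang}.zip"] = entries
    return bundles


if __name__ == "__main__":
    for name, entries in plan_bundles(["en", "de"], ["windows", "linux"]).items():
        print(name, entries)
```

The point is just that the voice folder is the only thing that differs between the two zips, and the executables are small enough that bundling all platforms together shouldn't cost much.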
Also, I don't know if it applies to your speech files (forgive me if it doesn't), but the Opus codec seems interesting for decent sound (particularly voice, not sure about music) at low bitrates, which means smaller files.
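If you want to try it on a few lines, something like this would do, assuming you have an ffmpeg build with libopus and that your engine can play Opus files. The 32 kbps default is just my guess at a starting point for speech, not a tested recommendation, and the function name is made up.

```python
import subprocess


def opus_voice_command(src, dst, bitrate_kbps=32):
    """Build an ffmpeg command that encodes one voice file to Opus.

    The bitrate default is an assumed starting point; listen and adjust.
    """
    return [
        "ffmpeg",
        "-i", src,                    # input file (e.g. WAV)
        "-c:a", "libopus",            # encode audio with libopus
        "-b:a", f"{bitrate_kbps}k",   # target audio bitrate
        "-application", "voip",       # libopus mode tuned for speech
        dst,                          # output file (e.g. .opus or .ogg)
    ]


def encode_voice(src, dst, bitrate_kbps=32):
    """Run the encode; requires ffmpeg on PATH."""
    subprocess.run(opus_voice_command(src, dst, bitrate_kbps), check=True)
```

Comparing the output size against the original for a handful of lines should tell you quickly whether it's worth converting the rest.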
I'm not sure how much of your art is touched up or redone/new, but some areas have that wobbly look, as if you used an upscale as a base (similar to what the official "remaster" did). Using in-game filtering or going full vector would likely also be more scalable, particularly if you're not fully saturating the detail of 4K anyway (you could still get smoothness and sharpness out of it). Yeah, it's probably too late now even if something like that was an option, but I've always valued scalable/dynamic formats.