It'll be a challenge for sure - but maybe not as much as you might think at first glance.
There's a lot of repetition in these tiles, and running it through Tilificator, it comes out at just 281 tiles.
At 16 bytes per tile, that's around 4.5kB in total, and < 10% of the 64kB PRG limit for the compo. :)
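For anyone who wants to check the numbers, here's the arithmetic as a quick sketch. It assumes uncompressed 2bpp NES tiles (16 bytes each) and takes "64kB" to mean 64 * 1024 bytes; the variable names are just mine.

```python
# Rough size estimate for the tile set, assuming uncompressed 2bpp NES tiles.
BYTES_PER_TILE = 16      # 8x8 pixels, 2 bitplanes of 8 bytes each
TILE_COUNT = 281         # tile count reported by Tilificator above
PRG_LIMIT = 64 * 1024    # assumed meaning of the 64kB compo limit

total_bytes = TILE_COUNT * BYTES_PER_TILE
share = total_bytes / PRG_LIMIT

print(total_bytes, f"{share:.1%}")  # 4496 6.9%
```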
And that's still with quite a few fixable problems left. Some colors will need to be moved around to make sprite overlays work - or you might want to ditch the overlays altogether for less sprite flicker, as the skin / white difference is quite subtle anyway. Some lone pixels can also be trimmed away without it being too noticeable.
But you'll definitely need a dynamic CHR streaming system if you want to keep all these animation frames. I think it's doable to get most of the animation frames down to around 8 tiles each, and for those frames that overflow this budget you might want to fit them statically.
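To make the budget idea concrete, here's a minimal sketch of sorting frames into "streamed" and "static" by tile count. The frame names and tile counts are made up for illustration, and `split_frames` is a hypothetical helper, not part of any tool.

```python
BYTES_PER_TILE = 16   # uncompressed 2bpp NES tile
TILE_BUDGET = 8       # per-frame streaming budget suggested above

def split_frames(frame_tile_counts, budget=TILE_BUDGET):
    """Split frames into those that fit the streaming budget and those that don't."""
    streamed = {name: n for name, n in frame_tile_counts.items() if n <= budget}
    static = {name: n for name, n in frame_tile_counts.items() if n > budget}
    return streamed, static

# Hypothetical frame data, only for illustration.
frames = {"idle_0": 6, "walk_0": 8, "walk_1": 7, "attack_0": 12}

streamed, static = split_frames(frames)

# Worst-case CHR upload for one streamed frame, and the space the
# over-budget frames would occupy if kept resident instead.
upload_bytes = max(streamed.values()) * BYTES_PER_TILE
static_bytes = sum(static.values()) * BYTES_PER_TILE

print(sorted(streamed), sorted(static), upload_bytes, static_bytes)
```

With an 8-tile budget the worst-case upload is 128 bytes per frame change, which is the number to hold against whatever your vblank routine can actually push.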
You might want to consider using Tilificator at least for an initial triage. It's often pretty decent at finding ways to re-use tile data in metasprites - although it can't compete with a dedicated artist's hand-optimizations :)
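As a rough idea of the simplest kind of tile re-use, here's a naive sketch that counts unique 8x8 tiles while treating horizontally and vertically flipped copies as the same tile (NES sprites can flip both ways in hardware). This is not Tilificator's algorithm, which does much more than this; the function names and the sample tiles are my own.

```python
def flips(tile):
    """Return the tile plus its horizontal, vertical, and combined flips.

    A tile is a tuple of 8 rows, each a tuple of 8 pixel values (0-3).
    """
    h = tuple(row[::-1] for row in tile)   # mirror left/right
    v = tile[::-1]                         # mirror top/bottom
    hv = h[::-1]                           # both
    return (tile, h, v, hv)

def canonical(tile):
    """Pick one representative so all flipped variants compare equal."""
    return min(flips(tile))

def count_unique(tiles):
    return len({canonical(t) for t in tiles})

# Sample data: a triangle tile, two flipped copies of it, and a blank tile.
tri = tuple(tuple(1 if x < y else 0 for x in range(8)) for y in range(8))
blank = tuple(tuple(0 for _ in range(8)) for _ in range(8))
tiles = [tri, flips(tri)[1], flips(tri)[2], blank]

naive_count = len(set(tiles))        # exact matches only
flip_count = count_unique(tiles)     # flips count as the same tile

print(naive_count, flip_count)  # 4 2
```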