My guess would be the ControlNet settings. Run it on just the first frame and look at the ControlNet mask it produces; that should show you what it's actually picking up and give you an idea of what may be wrong. From there, either fix the settings on the model you're using or try a different ControlNet model.
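
To show why the settings matter, here is a toy sketch in plain Python. It is not the real ControlNet preprocessor: `edge_map`, `coverage`, and `make_test_frame` are made-up names, the "frame" is a small synthetic grayscale grid, and the simple gradient threshold only stands in for whatever edge or depth setting your actual model exposes. The point is just that a threshold set too low makes the mask pick up background texture instead of the subject, which is the kind of thing you can spot by inspecting the first frame.

```python
def edge_map(frame, threshold):
    """Toy edge detector: mark a pixel 255 when the brightness change
    to its right and lower neighbours is at least `threshold`."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = abs(frame[y][x + 1] - frame[y][x])
            gy = abs(frame[y + 1][x] - frame[y][x])
            if gx + gy >= threshold:
                out[y][x] = 255
    return out


def coverage(mask):
    """Fraction of pixels that are switched on in the mask."""
    total = sum(len(row) for row in mask)
    on = sum(1 for row in mask for value in row if value)
    return on / total


def make_test_frame(size=16):
    """Synthetic 'first frame': faint checkerboard texture in the
    background with one bright square as the subject."""
    frame = [[20 + (5 if (x + y) % 2 else 0) for x in range(size)]
             for y in range(size)]
    for y in range(4, 12):
        for x in range(4, 12):
            frame[y][x] = 200
    return frame


first_frame = make_test_frame()
loose = edge_map(first_frame, threshold=5)     # picks up the texture too
strict = edge_map(first_frame, threshold=100)  # keeps only the square's outline

print(f"loose threshold coverage:  {coverage(loose):.2f}")
print(f"strict threshold coverage: {coverage(strict):.2f}")
```

If the mask from your real first frame looks like the "loose" case (mostly noise, subject barely distinguishable), that points at the settings rather than the model.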