Thank you for your fast response. I tried different ControlNet models (canny, hed, depth, normal), weights (0.3, 0.5, 0.9, or even 1.6), preprocessor on/off, and 3 sets of guide frames, with no luck. Only this happens:
What does the original video look like? It's hard to keep a consistent background unless the original background has enough detail to be picked up by ControlNet. For that reason, I expect many people will just generate with a green screen or something, then superimpose the result onto a background.
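To make the green-screen idea concrete, here is a rough sketch of the superimposing step for a single frame. It isn't tied to any particular tool: `is_green_screen`, `composite`, and the `dominance` threshold of 60 are all made-up names and values for illustration, and frames are assumed to be plain lists of rows of (R, G, B) tuples. For real video you would probably want a vectorized image library and a softer matte around the edges.

```python
# Hypothetical helpers: chroma-key one generated frame onto a background.
# Assumption: a frame is a list of rows, each pixel an (R, G, B) tuple of 0-255 ints.

def is_green_screen(pixel, dominance=60):
    """True if green exceeds both red and blue by at least `dominance` (assumed threshold)."""
    r, g, b = pixel
    return g - max(r, b) >= dominance


def composite(frame, background, dominance=60):
    """Replace green-screen pixels in `frame` with the matching `background` pixels."""
    if len(frame) != len(background) or any(
        len(f_row) != len(b_row) for f_row, b_row in zip(frame, background)
    ):
        raise ValueError("frame and background must have the same dimensions")
    return [
        [bg if is_green_screen(fg, dominance) else fg for fg, bg in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


# Tiny 1x3 example: green pixel, subject pixel, slightly noisy green pixel.
frame = [[(0, 255, 0), (200, 150, 120), (10, 240, 20)]]
background = [[(30, 30, 80), (30, 30, 80), (30, 30, 80)]]
result = composite(frame, background)
```

Running the same function over every generated frame with one fixed background image keeps the background identical across frames, which is the point of generating against a green screen in the first place.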