There are some pretty good AI solutions for foreground/background segmentation of video; perhaps an approach like that could be extended to separate individual objects into their own groups. Something like an Nvidia Jetson carried on the belt might make this feasible, though I'm not sure.
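One simple way to picture the "separate objects into their own groups" step is to assume some upstream model (hypothetical here) already outputs a binary foreground mask per frame. You could then split that mask into groups with classical connected-component labeling. This is a swapped-in, non-AI technique used only for illustration. It treats touching foreground pixels as one object, so overlapping objects would merge, and a real system would more likely use an instance-segmentation model. The function name `label_components` is made up for this sketch.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]  # 1 = foreground pixel, 0 = background (assumed mask format)


def label_components(mask: Mask) -> Tuple[List[List[int]], int]:
    """Give each 4-connected blob of foreground pixels its own integer label.

    Returns (labels, count): labels[r][c] is 0 for background, or 1..count
    for the object group that pixel belongs to.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                # Unlabeled foreground pixel: start a new object group.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                # Breadth-first flood fill over up/down/left/right neighbours.
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    # Toy foreground mask containing two separate objects.
    mask = [
        [1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 1, 1],
    ]
    labels, count = label_components(mask)
    print(count)  # 2
    for row in labels:
        print(row)
```

On something like a Jetson you'd probably swap the pure-Python loop for an optimized routine such as OpenCV's `cv2.connectedComponents`, which performs the same kind of labeling much faster.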