That does sound pretty good. Is there another layer you could add on top to detect the visemes? I know that conventional techniques for doing that have been around since at least 2015, because there is a plugin for Adobe Animator CS6 that can detect the various sounds in audio. You might want to look in the VTuber space for something real-time and lightweight.