I have an idea: why not combine it with simplevoice.js, and execute this once each time the voice playback of the corresponding sentence finishes?
The reason is that the text is always displayed faster than the speech is played, so tying the step to the end of playback would keep the two in sync.
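
A rough sketch of what I mean, in case it helps. I am not assuming anything about the real simplevoice.js API here: `speak(text, onEnd)` is a hypothetical wrapper that starts playback and calls `onEnd` when the audio for that sentence finishes, and `advance` is a hypothetical name for the step that should run once per sentence.

```javascript
// Minimal sketch: run `advance` once per sentence, only after that
// sentence's voice playback has ended.
//
// Assumptions (not taken from simplevoice.js itself):
//   speak(text, onEnd) - hypothetical wrapper around the voice library;
//                        it must call onEnd() when playback finishes.
//   advance(sentence)  - hypothetical callback for the step to execute.
function playInSync(sentences, speak, advance) {
  let index = 0;

  function next() {
    // Stop once every sentence has been spoken.
    if (index >= sentences.length) {
      return;
    }
    const sentence = sentences[index];
    index += 1;

    // Start the voice for this sentence. The step runs only when the
    // playback reports that it has ended, then the next sentence starts.
    speak(sentence, function onEnd() {
      advance(sentence);
      next();
    });
  }

  next();
}
```

With this shape the text can never run ahead of the voice, because the next sentence is not started until the previous playback has reported its end. If the library only exposes an "end" event instead of a callback, the wrapper passed as `speak` would be the place to adapt it.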