Looks good. Any chance you could add face (camera) tracking too so that she looks at the player? Bonus points if you could do so with a model that incorporates glances away instead of just staring the whole time.
On the TTS specifically, I'm sure you're using the Microsoft built-in system for computational load and ease of programming, but have you looked into local TTS models? This game uses one that sounds pretty convincingly human for what it is: https://jetro30087.itch.io/ai-companion-miku If you can get in contact with the dev, maybe he'll tell you what system he used.
"Any chance you could add face (camera) tracking so that she looks at the player? Bonus points if you could do so with a model that incorporates glances away instead of just staring the whole time."
What you're asking for can be achieved with BlendSpace animations; when I work on the animations module I'll see how far I can go.
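For reference, here is a minimal sketch of the idea in C++: an AnimInstance computes yaw/pitch toward the player camera and feeds them into an Aim Offset (BlendSpace) node, with occasional random "glances away" so she doesn't stare. All class, property, and timing values below are my own assumptions for illustration, not the project's actual code.

```cpp
// HeadTrackAnimInstance.h -- illustrative names, not the project's actual classes
#pragma once

#include "CoreMinimal.h"
#include "Animation/AnimInstance.h"
#include "HeadTrackAnimInstance.generated.h"

UCLASS()
class UHeadTrackAnimInstance : public UAnimInstance
{
    GENERATED_BODY()

public:
    // Yaw/pitch read by an Aim Offset (BlendSpace) node in the AnimGraph.
    UPROPERTY(BlueprintReadOnly, Category = "LookAt")
    float LookYaw = 0.f;

    UPROPERTY(BlueprintReadOnly, Category = "LookAt")
    float LookPitch = 0.f;

protected:
    virtual void NativeUpdateAnimation(float DeltaSeconds) override;

private:
    float TimeToNextGlance = 0.f;  // seconds until she glances away again
    float GlanceTimeLeft = 0.f;    // seconds remaining in the current glance
    FRotator GlanceOffset = FRotator::ZeroRotator;
};

// --- HeadTrackAnimInstance.cpp ---
#include "HeadTrackAnimInstance.h"
#include "Kismet/GameplayStatics.h"
#include "Kismet/KismetMathLibrary.h"

void UHeadTrackAnimInstance::NativeUpdateAnimation(float DeltaSeconds)
{
    Super::NativeUpdateAnimation(DeltaSeconds);

    APawn* Pawn = TryGetPawnOwner();
    APlayerCameraManager* Camera = UGameplayStatics::GetPlayerCameraManager(GetWorld(), 0);
    if (!Pawn || !Camera)
    {
        return;
    }

    // Rotation from the character toward the player camera, relative to her facing.
    const FRotator LookAt = UKismetMathLibrary::FindLookAtRotation(
        Pawn->GetActorLocation(), Camera->GetCameraLocation());
    FRotator Delta = (LookAt - Pawn->GetActorRotation()).GetNormalized();

    // Occasional glances away so she doesn't stare: hold a small random
    // offset for a short while, then look back at the camera.
    if (GlanceTimeLeft > 0.f)
    {
        GlanceTimeLeft -= DeltaSeconds;
        Delta += GlanceOffset;
    }
    else
    {
        TimeToNextGlance -= DeltaSeconds;
        if (TimeToNextGlance <= 0.f)
        {
            TimeToNextGlance = FMath::FRandRange(3.f, 7.f);
            GlanceTimeLeft = FMath::FRandRange(0.5f, 1.5f);
            GlanceOffset = FRotator(FMath::FRandRange(-10.f, 10.f),
                                    FMath::FRandRange(-25.f, 25.f), 0.f);
        }
    }

    // Clamp to the Aim Offset's range and smooth so the head doesn't snap.
    LookYaw = FMath::FInterpTo(LookYaw, FMath::Clamp(Delta.Yaw, -90.f, 90.f), DeltaSeconds, 4.f);
    LookPitch = FMath::FInterpTo(LookPitch, FMath::Clamp(Delta.Pitch, -45.f, 45.f), DeltaSeconds, 4.f);
}
```

In the AnimGraph, LookYaw and LookPitch would simply drive the Aim Offset's input axes on top of the base pose.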
"I'm sure it's for computational load and ease of programming that you're using the Microsoft built in system, but have you looked into local TTS models?"
Currently, to generate the static voice I am using this one (Cortana's voice in the new update):
But as you say, the main reason is performance: the Microsoft system generates both the voice and the voice analysis (it gives you the list of phonemes) very efficiently. It is practically no load on the game, which matters because it runs at the same time as Unreal Engine's rendering and the LLMs, and that is why I am still holding on to it. But I'm not closing my eyes to alternatives; we'll see how much local TTS models advance this year and whether a similar performance can be achieved.
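To give an idea of what that built-in pipeline provides, here is a minimal standalone SAPI sketch (my own example, not the project's actual code): the Windows voice speaks a line asynchronously and streams viseme events, which is the "voice plus phoneme list" combination I'm describing. In the game those viseme ids would drive the lipsync morph targets; the spoken text here is just a placeholder.

```cpp
// Minimal Windows SAPI example: speak asynchronously and receive viseme events.
// Link against ole32.lib and sapi.lib. Error handling is trimmed for brevity.
#include <windows.h>
#include <sapi.h>
#include <iostream>

int main()
{
    ::CoInitialize(nullptr);

    ISpVoice* Voice = nullptr;
    if (FAILED(::CoCreateInstance(CLSID_SpVoice, nullptr, CLSCTX_ALL,
                                  IID_ISpVoice, reinterpret_cast<void**>(&Voice))))
        return 1;

    // Ask for viseme and end-of-stream events, delivered through a Win32 event handle.
    const ULONGLONG Interest = SPFEI(SPEI_VISEME) | SPFEI(SPEI_END_INPUT_STREAM);
    Voice->SetInterest(Interest, Interest);
    Voice->SetNotifyWin32Event();

    // Speak asynchronously so viseme events stream in while the audio plays.
    Voice->Speak(L"Hello, nice to meet you.", SPF_ASYNC, nullptr);

    bool Done = false;
    while (!Done && Voice->WaitForNotifyEvent(INFINITE) == S_OK)
    {
        SPEVENT Event = {};
        ULONG Fetched = 0;
        while (Voice->GetEvents(1, &Event, &Fetched) == S_OK && Fetched == 1)
        {
            if (Event.eEventId == SPEI_VISEME)
            {
                // Low word of lParam is the current viseme id (0-21);
                // this is what would be mapped onto the mouth shapes in-game.
                std::wcout << L"viseme " << LOWORD(Event.lParam) << std::endl;
            }
            else if (Event.eEventId == SPEI_END_INPUT_STREAM)
            {
                Done = true;
            }
        }
    }

    Voice->Release();
    ::CoUninitialize();
    return 0;
}
```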