Post by Soul Shell in Speech Recognizer and Video Setup comments



OK, nice, I understand your point.

About TTS: I've been doing some research. The TTS I need must run locally, be written in C++, be multilingual, and provide audio analysis, i.e. the visemes.

The one I could potentially use is this one:

https://github.com/PABannier/bark.cpp

It would be perfect because it is a sibling of llama.cpp and whisper.cpp, and it also uses the SUNO AI technology I used to make the music for the video game.

However, as far as I can see, it doesn't provide the audio analysis for the visemes :'(. I'll keep an eye on this project.

(If I could integrate this technology, MTW could run on Windows, Linux, and Mac.)

XenoCow · 6 days ago

That does sound pretty good. Is there another layer you could add on top to detect the visemes? Conventional techniques for this have been around since at least 2015; there is a plugin for Adobe Animator CS6 that can detect the various mouth sounds in audio. You might want to look in the VTuber space for something real-time and lightweight.
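As a rough illustration of the "extra layer on top" idea, here is a minimal C++ sketch of the simplest real-time fallback: loudness-driven lip flap. It does NOT detect visemes. It only turns the TTS output's volume into one "mouth openness" value per animation frame, which a viseme detector could later replace. Assumptions: the audio is mono float PCM in [-1, 1] (bark.cpp's actual output format is not checked here), and the function name `mouthOpenness` and the two RMS thresholds are hypothetical values you would tune by ear.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: amplitude-based lip flap, not viseme classification.
// Splits mono PCM samples (assumed in [-1, 1]) into frames of
// sampleRate / framesPerSecond samples and returns one openness value
// in [0, 1] per frame (a trailing partial frame is included).
std::vector<float> mouthOpenness(const std::vector<float>& samples,
                                 int sampleRate,
                                 int framesPerSecond,
                                 float silenceRms = 0.02f,   // assumed: below this, mouth closed
                                 float fullOpenRms = 0.2f) { // assumed: above this, mouth fully open
    std::vector<float> out;
    if (sampleRate <= 0 || framesPerSecond <= 0 || fullOpenRms <= silenceRms) {
        return out;
    }
    const std::size_t hop = static_cast<std::size_t>(sampleRate / framesPerSecond);
    if (hop == 0) {
        return out;
    }
    for (std::size_t start = 0; start < samples.size(); start += hop) {
        const std::size_t end = std::min(start + hop, samples.size());
        // Root-mean-square energy of this frame.
        double sumSq = 0.0;
        for (std::size_t i = start; i < end; ++i) {
            sumSq += static_cast<double>(samples[i]) * samples[i];
        }
        const float rms = static_cast<float>(std::sqrt(sumSq / static_cast<double>(end - start)));
        // Linearly map RMS between the two thresholds onto [0, 1].
        const float level = (rms - silenceRms) / (fullOpenRms - silenceRms);
        out.push_back(std::clamp(level, 0.0f, 1.0f));
    }
    return out;
}
```

For example, at 16 kHz and 25 animation frames per second each value covers 640 samples, so one second of audio yields 25 values to drive a single jaw or mouth blend shape. Real viseme detection would instead need phoneme timing (from the TTS itself or a separate aligner) mapped to mouth shapes.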

itch.io
