I have a question about the use of text-to-speech (TTS) programs. I understand that using AI is prohibited, but I see a gray area when it comes to TTS programs. They are similar to sound synthesizers or music composition programs (pardon my lack of knowledge on what to call them), which generally use something called a soundfont: essentially a collection/library of sounds/instruments.
It is true that no human speaks or records the speech generated by TTS programs, but the same can be said for music created with those composition programs. Most people just place the notes in the program, export them to a sound file, and that's it. It's rare to find someone who plays and records the instrument themselves after composing the piece inside the program.
"But the notes used in the program were actually recorded for real by someone." Well, TTS also requires the voice provider to record every important sound (letters, certain word combinations) themselves. It's just that they never record each full word, much as nobody records the full piece for a music program. We can regard a word in TTS as a small composition in a music program.
So, what's the ruling on this matter?
I'm asking because a robotic voice might be hard to record without a lot of work with effects, and someone might want to make a game voiced entirely in monotone.