Is it possible to use just text-to-speech if I then process the recorded audio data?