I kind of like the idea of not knowing exactly what is being said. It lets the player fill in the gaps and read the actions of the character on screen.
Aside from explicitly using text, like Zazuizo suggests, just emphasizing certain actions (character points to groin, female character shakes her head; next scene, reluctantly reaches down to groin; next scene, escalation, etc. Another example would be a female character like sister pointing and laughing at brother's penis, making gestures with her hand on how small it is, brother reacts becoming upset; next scene, sister comforts brother after reaching appropriate closeness score, gently strokes him while holding close; next scene escalation, etc.). You already did this somewhat in the original CCTV.
I do like Zazuizo's idea of just adding microphones of the same quality to each camera option, with dialogue shown periodically as normal text (or text boxes) and lower-tier cameras/microphones having their text cut off or censored. But I feel it isn't necessary for this, honestly.