Post by Esben Kran in 📈 Extra cool sources

This paper gives a super nice overview of which types of antagonistic prompts work well at making the model give bad responses.