Interpreting Catastrophic Failure Modes in OpenAI's Whisper

Results
| Criteria | Rank | Score* | Raw Score |
| --- | --- | --- | --- |
| Novelty | #1 | 3.889 | 3.889 |
| ML Safety | #2 | 3.222 | 3.222 |
| Generality | #4 | 3.222 | 3.222 |
| Interpretability | #5 | 3.556 | 3.556 |
| Reproducibility | #11 | 3.667 | 3.667 |
*Ranked from 9 ratings. The score is adjusted from the raw score by the median number of ratings per game in the jam.
Judge feedback
Judge feedback is anonymous.
- Cool work! I'm pleasantly surprised that the logit lens works here, that you can remove so many encoder and decoder layers, and by the interesting choice of problem. And cool use of PySvelte! My guess is that this failure comes from induction heads, which notice and respond to repeated patterns, so brief hiccups turn into robust repeated sequences. Looking at which heads are most key to this behaviour would feel interesting to me. Misc point: I believe GPT-3 can also get caught repeating the same word (probably downstream of induction heads).
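The judge's hypothesis above implies a simple observable signature: once decoding falls into an induction-style loop, the transcript ends in back-to-back copies of the same short token sequence. A minimal, model-free sketch of a detector for that signature (the function name and thresholds are hypothetical illustrations, not part of the project's code):

```python
def trailing_repeat_period(tokens, max_period=8, min_repeats=3):
    """Return the shortest period p such that the sequence ends with at
    least `min_repeats` consecutive copies of its final p tokens,
    or None if no such degenerate loop is found."""
    n = len(tokens)
    for p in range(1, max_period + 1):
        if n < p * min_repeats:
            break
        unit = tokens[n - p:]  # candidate repeating unit
        # count consecutive copies of `unit` ending the sequence
        repeats = 1
        i = n - 2 * p
        while i >= 0 and tokens[i:i + p] == unit:
            repeats += 1
            i -= p
        if repeats >= min_repeats:
            return p
    return None
```

For example, a hallucinating transcript ending in "sat sat sat" is flagged with period 1, while a normal sentence returns None; this kind of check makes it easy to collect looping transcripts before looking at which attention heads drive them.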
Where are you participating from?
London, UK
What are the names of your team members?
Edward Rees, John Hughes, Ellena Reid
What are the email addresses of all your team members?
edward.r.rees@gmail.com
Comments
Nice! It's great to see steps towards interpretability of multi-modal models.
Link to our attention-score experiments with Whisper and our analysis so far of when the model hallucinates (also provided in the report): https://github.com/erees1/alignment-jam/blob/main/Whisper_Attention.ipynb
Very cool to see someone pulling apart such a new and interesting NN.
Here's the link to reproduce the logit lens experiments with Whisper (we didn't have time to put it in our write-up): https://github.com/McHughes288/alignment-jam/blob/main/logit_lens_whisper.ipynb