One Attention Head Is All You Need for Sorting Fixed-Length Lists

Results
| Criteria | Rank | Score* | Raw Score |
| --- | --- | --- | --- |
| Mechanistic interpretability | #3 | 4.333 | 4.333 |
| Reproducibility | #5 | 4.000 | 4.000 |
| Novelty | #6 | 3.667 | 3.667 |
| Generality | #12 | 2.667 | 2.667 |
| ML Safety | #13 | 2.667 | 2.667 |
Ranked from 3 ratings. *Score is adjusted from the raw score by the median number of ratings per game in the jam.
Judge feedback
Judge feedback is anonymous.
- This is quite an in-depth look into how a specific algorithm works in a toy Transformer model, and it presents great visualizations of the actual topic (though it would have been nice for them to be within the text). You show great insight into the structure and functionality of the different layers, and it's a generally interesting task to understand. I'd be curious to see the extension to large language models, possibly in collaboration with someone from the https://esbenkc.itch.io/tracr team!
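The project's core claim, that a single attention head suffices to sort a fixed-length list, can be illustrated with a hand-built hard-attention pattern. This is a minimal numpy sketch of the mechanism, not the project's trained model; the function name is hypothetical:

```python
import numpy as np

def sort_via_one_attention_head(xs):
    """Sketch: one (hard) attention pattern can sort a fixed-length list.
    Output position i attends to the element of rank i, so the
    attention-weighted sum of values is the sorted list."""
    xs = np.asarray(xs, dtype=float)
    n = len(xs)
    ranks = np.argsort(xs, kind="stable")  # ranks[i] = index of the i-th smallest element
    attn = np.zeros((n, n))                # a single head's attention matrix
    attn[np.arange(n), ranks] = 1.0        # hard attention: row i points at the rank-i element
    return attn @ xs                       # weighted sum of values = sorted output

print(sort_via_one_attention_head([3, 1, 2]))  # -> [1. 2. 3.]
```

A trained head would realize a soft version of this pattern through its query-key dot products, which is what the project's visualizations examine.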
What are the full names of your participants?
Mateusz Bagiński, Gabin Kolly
What is your and your team's career stage?
Beginner
Does anyone from your team want to work towards publishing this work later?
Maybe
Where are you participating from?
Online