itch.io
The Mechanistic Interpretability Hackathon
Hosted by Esben Kran, Neel Nanda, Apart Research, Zaki, fbarez · #alignmentjam
15 Entries · 52 Ratings
Submissions
TraCR-Supported Mechanistic Interpretability (Esben Kran)
Identifying a Preliminary Circuit for Predicting Gendered Pronouns in GPT-2 Small (cmathw)
One Attention Head Is All You Need for Sorting Fixed-Length Lists (MatthewBaggins)
We Discovered An Neuron (clementneo)
Soft Prompts are a Convex Set (mentaleap)
Interactive Layerscope (chris-lons)
Trafo Mech Int on the web! (StefanHex)
Attention Phrenology: A spatial classification of attention heads (Giles)
The Start of Investigating a 1-Layer SoLU Model (jakub151)
$B$ Confident Bro: Discovering Latent Knowledge In Language Models Without Supervision (fbarez)
Iterative summarization interpretability (Yoann Poupart)
Distillation by duplication: The importance of layer selection (roksanagow)
Automated Identification of Potential Feature Neurons (lomichelle42)
In search of linguistic concepts: investigating BERT's context vectors (roksanagow)
Investigating Agent Behavior In different RL methods (Al-Hitawi Mohammed)