The Mechanistic Interpretability Hackathon

Hosted by Esben Kran, Neel Nanda, Apart Research, Zaki, fbarez · #alignmentjam

Results

15 entries were submitted between 2023-01-20 16:00:00 and 2023-01-23 03:15:00. 52 ratings were given to 15 entries (100.0%) between 2023-01-23 03:15:00 and 2023-01-25 14:00:00. The average number of ratings per entry was 3.5 and the median was 3.

Criteria	Rank	Score*	Raw Score
Mechanistic interpretability	#1	4.571	4.571
Judge's choice	#2	n/a	n/a
Generality	#4	3.286	3.286
ML Safety	#4	3.429	3.429
Reproducibility	#4	4.143	4.143
Novelty	#8	3.000	3.000

Criteria	Rank	Score*	Raw Score
Mechanistic interpretability	#3	4.333	4.333
Reproducibility	#5	4.000	4.000
Novelty	#6	3.667	3.667
Generality	#12	2.667	2.667
ML Safety	#13	2.667	2.667

Criteria	Rank	Score*	Raw Score
Novelty	#1	4.500	4.500
Generality	#2	4.000	4.000
ML Safety	#3	3.500	3.500
Mechanistic interpretability	#7	4.000	4.000
Reproducibility	#10	3.500	3.500

Criteria	Rank	Score*	Raw Score
Reproducibility	#1	4.400	4.400
Generality	#3	3.400	3.400
ML Safety	#7	3.200	3.200
Mechanistic interpretability	#8	3.800	3.800
Novelty	#12	2.600	2.600

Criteria	Rank	Score*	Raw Score
Novelty	#5	3.674	4.500
Reproducibility	#8	3.674	4.500
Generality	#9	2.858	3.500
ML Safety	#9	2.858	3.500
Mechanistic interpretability	#10	3.674	4.500
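
The page does not explain the Score* column. In the table above, though, each raw score is scaled by the same factor (4.500 becomes 3.674, 3.500 becomes 2.858), which is consistent with a penalty for entries that received fewer ratings than the median. Below is a minimal sketch of that adjustment. It assumes the scaling factor is sqrt(min(median, n) / median), with a median of 3 ratings per entry and n = 2 ratings for this entry. The formula, the function name adjusted_score, and its parameters are illustrative assumptions, not something stated on this page.

```python
import math


def adjusted_score(raw_score: float, num_ratings: int, median_ratings: float) -> float:
    """Scale a raw average down when an entry has fewer ratings than the median.

    Assumed formula: raw * sqrt(min(median, n) / median). Entries with at
    least the median number of ratings keep their raw score unchanged.
    """
    factor = math.sqrt(min(median_ratings, num_ratings) / median_ratings)
    return raw_score * factor


# Raw scores taken from the table above; 2 ratings and a median of 3 are assumed.
for raw in (4.5, 3.5):
    print(f"raw {raw:.3f} -> adjusted {adjusted_score(raw, 2, 3):.3f}")
```

Under the same assumption, an entry with 7 ratings would keep its raw score, which matches the tables where Score* and Raw Score are identical.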

Results

by cmathw

Ranked 1st in Mechanistic interpretability with 7 ratings (Score: 4.571)

by clementneo

Ranked 2nd in Mechanistic interpretability with 5 ratings (Score: 4.400)

by chris-lons, victorlf4

Ranked 3rd in Mechanistic interpretability with 3 ratings (Score: 4.333)

by MatthewBaggins

Ranked 3rd in Mechanistic interpretability with 3 ratings (Score: 4.333)

by Esben Kran, ElliotJDavies, h6

Ranked 5th in Mechanistic interpretability with 4 ratings (Score: 4.250)

by lomichelle42

Ranked 5th in Mechanistic interpretability with 4 ratings (Score: 4.250)

by Giles, soy.cola

Ranked 7th in Mechanistic interpretability with 4 ratings (Score: 4.000)

by StefanHex

Ranked 8th in Mechanistic interpretability with 5 ratings (Score: 3.800)

by mentaleap

Ranked 9th in Mechanistic interpretability with 4 ratings (Score: 3.750)

by jakub151

Ranked 10th in Mechanistic interpretability with 2 ratings (Score: 3.674)

by roksanagow

Ranked 11th in Mechanistic interpretability with 3 ratings (Score: 3.333)

by roksanagow

Ranked 12th in Mechanistic interpretability with 2 ratings (Score: 2.858)

by Yoann Poupart

Ranked 13th in Mechanistic interpretability with 2 ratings (Score: 2.449)

by fbarez

Ranked 14th in Mechanistic interpretability with 2 ratings (Score: 1.225)

by Al-Hitawi Mohammed

Ranked 15th in Mechanistic interpretability with 2 ratings (Score: 0.816)