The Mechanistic Interpretability Hackathon

Hosted by Esben Kran, Neel Nanda, Apart Research, Zaki, fbarez · #alignmentjam

Results

15 entries were submitted between 2023-01-20 16:00:00 and 2023-01-23 03:15:00. 52 ratings were given to 15 entries (100.0%) between 2023-01-23 03:15:00 and 2023-01-25 14:00:00. The average number of ratings per entry was 3.5 and the median was 3.

Criteria	Rank	Score*	Raw Score
Judge's choice	#1	n/a	n/a
Reproducibility	#1	4.400	4.400
Mechanistic interpretability	#2	4.400	4.400
Novelty	#3	4.200	4.200
Generality	#11	2.800	2.800
ML Safety	#11	2.800	2.800

Criteria	Rank	Score*	Raw Score
Mechanistic interpretability	#3	4.333	4.333
Reproducibility	#5	4.000	4.000
Novelty	#6	3.667	3.667
Generality	#12	2.667	2.667
ML Safety	#13	2.667	2.667

Criteria	Rank	Score*	Raw Score
Novelty	#5	3.674	4.500
Reproducibility	#8	3.674	4.500
Generality	#9	2.858	3.500
ML Safety	#9	2.858	3.500
Mechanistic interpretability	#10	3.674	4.500

Criteria	Rank	Score*	Raw Score
Novelty	#1	4.500	4.500
Generality	#2	4.000	4.000
ML Safety	#3	3.500	3.500
Mechanistic interpretability	#7	4.000	4.000
Reproducibility	#10	3.500	3.500

Criteria	Rank	Score*	Raw Score
ML Safety	#9	2.858	3.500
Reproducibility	#11	2.858	3.500
Generality	#13	2.449	3.000
Mechanistic interpretability	#13	2.449	3.000
Novelty	#13	2.449	3.000
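
In the tables above, Score* equals Raw Score in some tables and is lower in others. The figures are consistent with the raw score being multiplied by sqrt(n / median) when an entry received fewer ratings (n) than the jam median, taking the median as 3. That rule is inferred from the numbers on this page and is not stated here, so the following Python is only a sketch under that assumption; the function name `adjusted_score` is hypothetical.

```python
import math


def adjusted_score(raw_score: float, num_ratings: int, median_ratings: int) -> float:
    """Scale a raw score down when an entry has fewer ratings than the median.

    Assumed rule (inferred from the tables, not confirmed by the source):
    adjusted = raw * sqrt(min(n, median) / median)
    """
    factor = math.sqrt(min(num_ratings, median_ratings) / median_ratings)
    return raw_score * factor


# Entries with 2 ratings, assuming a median of 3 ratings per entry:
print(round(adjusted_score(4.5, 2, 3), 3))  # 3.674
print(round(adjusted_score(3.5, 2, 3), 3))  # 2.858
print(round(adjusted_score(3.0, 2, 3), 3))  # 2.449

# An entry at or above the median keeps its raw score:
print(round(adjusted_score(4.4, 5, 3), 3))  # 4.4
```

Under this assumption, the tables where Score* and Raw Score match would belong to entries with at least 3 ratings, and the tables where they differ to entries with 2 ratings.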

Results

by clementneo

Ranked 1st in Reproducibility with 5 ratings (Score: 4.400)

by StefanHex

Ranked 1st in Reproducibility with 5 ratings (Score: 4.400)

by lomichelle42

Ranked 3rd in Reproducibility with 4 ratings (Score: 4.250)

by cmathw

Ranked 4th in Reproducibility with 7 ratings (Score: 4.143)

by Esben Kran, ElliotJDavies, h6

Ranked 5th in Reproducibility with 4 ratings (Score: 4.000)

by MatthewBaggins

Ranked 5th in Reproducibility with 3 ratings (Score: 4.000)

by mentaleap

Ranked 5th in Reproducibility with 4 ratings (Score: 4.000)

by jakub151

Ranked 8th in Reproducibility with 2 ratings (Score: 3.674)

by chris-lons, victorlf4

Ranked 9th in Reproducibility with 3 ratings (Score: 3.667)

by Giles, soy.cola

Ranked 10th in Reproducibility with 4 ratings (Score: 3.500)

by Yoann Poupart

Ranked 11th in Reproducibility with 2 ratings (Score: 2.858)

by Al-Hitawi Mohammed

Ranked 12th in Reproducibility with 2 ratings (Score: 2.449)

by fbarez

Ranked 13th in Reproducibility with 2 ratings (Score: 2.041)

by roksanagow

Ranked 14th in Reproducibility with 3 ratings (Score: 2.000)

by roksanagow

Ranked 15th in Reproducibility with 2 ratings (Score: 1.633)
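
The summary figures at the top of the page can be checked against the per-entry rating counts in the list above. The short Python sketch below copies those counts in list order; the variable name `ratings_per_entry` is only illustrative.

```python
from statistics import mean, median

# Ratings per entry, copied from the ranked list above, in order.
ratings_per_entry = [5, 5, 4, 7, 4, 3, 4, 2, 3, 4, 2, 2, 2, 3, 2]

print(len(ratings_per_entry))             # 15 entries
print(sum(ratings_per_entry))             # 52 ratings
print(round(mean(ratings_per_entry), 1))  # 3.5
print(median(ratings_per_entry))          # 3
```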