The Mechanistic Interpretability Hackathon

Hosted by Esben Kran, Neel Nanda, Apart Research, Zaki, fbarez · #alignmentjam

Results

15 entries were submitted between 2023-01-20 16:00:00 and 2023-01-23 03:15:00. 52 ratings were given to 15 entries (100.0%) between 2023-01-23 03:15:00 and 2023-01-25 14:00:00. The average number of ratings per entry was 3.5 and the median was 3.
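The summary figures can be cross-checked against the per-entry rating counts that appear in the Novelty ranking further down this page. The snippet below is a minimal sketch, assuming each "with N ratings" figure is the total number of ratings that entry received; the variable names are illustrative only.

```python
from statistics import mean, median

# Ratings per entry, copied in rank order from the
# "Ranked ... in Novelty with N ratings" lines on this page.
# Assumption: N is the total number of ratings the entry received.
ratings_per_entry = [4, 4, 5, 4, 2, 3, 3, 7, 4, 2, 3, 5, 2, 2, 2]

total_ratings = sum(ratings_per_entry)
average_ratings = mean(ratings_per_entry)
median_ratings = median(ratings_per_entry)

print(f"entries: {len(ratings_per_entry)}")
print(f"total ratings: {total_ratings}")
print(f"average per entry: {average_ratings:.1f}")
print(f"median per entry: {median_ratings}")
```

Under that assumption the counts add up to the 52 ratings reported above, and the average rounds to the stated 3.5.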

Criteria	Rank	Score*	Raw Score
Novelty	#1	4.500	4.500
Generality	#2	4.000	4.000
ML Safety	#3	3.500	3.500
Mechanistic interpretability	#7	4.000	4.000
Reproducibility	#10	3.500	3.500

Criteria	Rank	Score*	Raw Score
Judge's choice	#1	n/a	n/a
Reproducibility	#1	4.400	4.400
Mechanistic interpretability	#2	4.400	4.400
Novelty	#3	4.200	4.200
Generality	#11	2.800	2.800
ML Safety	#11	2.800	2.800

Criteria	Rank	Score*	Raw Score
Mechanistic interpretability	#1	4.571	4.571
Judge's choice	#2	n/a	n/a
Generality	#4	3.286	3.286
ML Safety	#4	3.429	3.429
Reproducibility	#4	4.143	4.143
Novelty	#8	3.000	3.000

Criteria	Rank	Score*	Raw Score
Judge's choice	#3	n/a	n/a
Reproducibility	#3	4.250	4.250
Mechanistic interpretability	#5	4.250	4.250
Generality	#5	3.000	3.000
Novelty	#8	3.000	3.000
ML Safety	#12	2.750	2.750

Criteria	Rank	Score*	Raw Score
ML Safety	#5	3.333	3.333
Generality	#5	3.000	3.000
Novelty	#11	2.667	2.667
Mechanistic interpretability	#11	3.333	3.333
Reproducibility	#14	2.000	2.000

Results

by Giles, soy.cola

Ranked 1st in Novelty with 4 ratings (Score: 4.500)

by Esben Kran, ElliotJDavies, h6

Ranked 1st in Novelty with 4 ratings (Score: 4.500)

by clementneo

Ranked 3rd in Novelty with 5 ratings (Score: 4.200)

by mentaleap

Ranked 4th in Novelty with 4 ratings (Score: 3.750)

by jakub151

Ranked 5th in Novelty with 2 ratings (Score: 3.674)

by MatthewBaggins

Ranked 6th in Novelty with 3 ratings (Score: 3.667)

by chris-lons, victorlf4

Ranked 7th in Novelty with 3 ratings (Score: 3.333)

by cmathw

Ranked 8th in Novelty with 7 ratings (Score: 3.000)

by lomichelle42

Ranked 8th in Novelty with 4 ratings (Score: 3.000)

by roksanagow

Ranked 10th in Novelty with 2 ratings (Score: 2.858)

by roksanagow

Ranked 11th in Novelty with 3 ratings (Score: 2.667)

by StefanHex

Ranked 12th in Novelty with 5 ratings (Score: 2.600)

by Yoann Poupart

Ranked 13th in Novelty with 2 ratings (Score: 2.449)

by fbarez

Ranked 14th in Novelty with 2 ratings (Score: 2.041)

by Al-Hitawi Mohammed

Ranked 15th in Novelty with 2 ratings (Score: 0.816)