AI Testing Hackathon

Hosted by Esben Kran, Apart Research, Zaki, fbarez, haydn belfield · #alignmentjam

32 Ratings


A jam submission

Model Hubris: On the Presumptuousness of Large Language Models

Probing the boundary between logical inference, common sense, and nonsense reasoning

Submitted by Giles — 15 minutes, 18 seconds before the deadline


Results

Criteria	Rank	Score*	Raw Score
Reproducibility	#4	2.887	5.000
Generality	#6	2.309	4.000
Novelty	#7	2.309	4.000
Benchmark	#7	2.309	4.000
Safety	#8	2.887	5.000

Ranked from 3 ratings. Score is adjusted from raw score by the median number of ratings per game in the jam.
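
As a rough illustration of the adjustment described above, the sketch below assumes the commonly documented itch.io rule, in which the raw score is multiplied by the square root of the entry's rating count divided by the jam's median rating count (capped at 1). The function name `adjusted_score` and the median of 9 are assumptions for illustration only; this page does not state the median. That value was chosen because, together with the 3 ratings reported here, it happens to reproduce the adjusted scores in the table.

```python
import math


def adjusted_score(raw_score: float, ratings: int, median_ratings: float) -> float:
    """Scale a raw score down when an entry has fewer ratings than the jam median.

    Assumed rule: raw * sqrt(min(median, ratings) / median).
    """
    factor = math.sqrt(min(median_ratings, ratings) / median_ratings)
    return raw_score * factor


# Hypothetical median of 9 ratings per entry (not stated on this page).
# With the 3 ratings reported above, the factor is sqrt(3 / 9).
for raw in (5.0, 4.0):
    print(raw, round(adjusted_score(raw, ratings=3, median_ratings=9), 3))
```

Under these assumptions, an entry with at least the median number of ratings keeps its raw score unchanged, so the adjustment only penalizes entries that were rated less often than is typical for the jam.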

Judge feedback

Judge feedback is anonymous.

This is a very interesting project that investigates some cool failure cases. A natural next step would be to turn the tasks into benchmarks that we can use to test models.

Where did you participate?
Online

What are the full names of the participants?
Giles Edkins, Anna Swanson

What is your team name?
The Presumers
