Model Hubris: On the Presumptuousness of Large Language Models
Results
| Criteria | Rank | Score* | Raw Score |
|---|---|---|---|
| Reproducibility | #4 | 2.887 | 5.000 |
| Generality | #6 | 2.309 | 4.000 |
| Novelty | #7 | 2.309 | 4.000 |
| Benchmark | #7 | 2.309 | 4.000 |
| Safety | #8 | 2.887 | 5.000 |
Ranked from 3 ratings. Score is adjusted from raw score by the median number of ratings per game in the jam.
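The adjustment can be checked against the table with a small sketch. It assumes the formula itch.io describes for jam scoring, adjusted = raw × sqrt(min(median, ratings) / median), and a jam median of 9 ratings per game. The median is an assumption inferred from the numbers above (it reproduces every Score* value); the page does not state it. The function name `adjusted_score` is hypothetical.

```python
import math


def adjusted_score(raw_score: float, num_ratings: int, median_ratings: float) -> float:
    """Scale a raw score down when a game received fewer ratings than the jam median.

    Assumes itch.io's adjustment formula; games at or above the median keep their
    raw score because the ratio is capped at 1.
    """
    factor = math.sqrt(min(median_ratings, num_ratings) / median_ratings)
    return raw_score * factor


# Hypothetical median of 9 ratings, inferred from the table (not stated on the page).
for raw in (5.0, 4.0):
    print(f"{raw:.3f} -> {adjusted_score(raw, 3, 9):.3f}")
```

With 3 ratings against an assumed median of 9, every raw score is multiplied by sqrt(1/3) ≈ 0.577, which turns 5.000 into 2.887 and 4.000 into 2.309.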
Judge feedback
Judge feedback is anonymous.
- This is a very interesting project and investigates some cool failure cases. One way to build further on this would be to turn the tasks into benchmarks that we can use to test models.
Where did you participate?
Online
What are the full names of the participants?
Giles Edkins, Anna Swanson
What is your team name?
The Presumers