Winner of the fourth prize!! Congratulations!
General: Builds on another project, very neat. Proposes a clean solution to a pretty serious problem. I like the next steps.
Fazl: Worth running the same prompts on different datasets from the inverse scaling challenge.
Alignment: Creates an easy solution to a clearly defined problem, one that might generalize well beyond it. Does not “solve” cognition for the AI but increases its alignment drastically. Prompt engineers trained by the model, since there are big shifts based on the prompt.
AI Psychology: “Let’s think step by step” works in larger models. It may be a general solution, perhaps even a general alignment technique for instigating System 2 thinking. Escapes biasing prompts. Very limited actual understanding. Diverges from prompt game.
Novelty: Have not seen this simple prompt before.
Generality: Yes, accepted by the Inverse Scaling Prize team as well.
Reproducibility: There is a code base, but it needs manual annotation afterwards because of code limitations. Four extra things: Rick-rolling YouTube links, ASCII art bias, only larger models can explain jokes, moral uncertainty is person-dependent. Awesome stuff!
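
A rough sketch of the suggestion above to rerun the same prompts on other inverse scaling datasets, pairing each prompt with a “Let’s think step by step” variant. This is not the project's code: the helper names, the row layout, and the placeholder dataset names are all assumptions made for illustration, and the model call and the manual annotation step are left out.

```python
from __future__ import annotations

# The cue quoted in the feedback; everything else here is hypothetical.
STEP_BY_STEP = "Let's think step by step."


def with_step_by_step(prompt: str, suffix: str = STEP_BY_STEP) -> str:
    """Append the step-by-step cue to a prompt on its own line."""
    return prompt.rstrip() + "\n" + suffix


def build_variants(datasets: dict[str, list[str]]) -> list[dict[str, str]]:
    """Pair every prompt with its step-by-step variant, tagged by dataset name."""
    rows = []
    for name, prompts in datasets.items():
        for prompt in prompts:
            rows.append(
                {
                    "dataset": name,
                    "baseline": prompt,
                    "step_by_step": with_step_by_step(prompt),
                }
            )
    return rows


if __name__ == "__main__":
    # Placeholder dataset names and prompts (assumed, not from the project).
    example = {
        "dataset_a": ["Question one?"],
        "dataset_b": ["Question two?", "Question three?"],
    }
    for row in build_variants(example):
        print(row["dataset"], "|", repr(row["step_by_step"]))
```

Keeping the baseline and the variant in the same row should make it easier to compare the two completions side by side during manual annotation.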