Internal Conflict in GPT-3: Agreeableness vs Truth

Who else worked on this with you?
Luke Ring and Aleks Baskakovs
Comments
Congratulations on the First Prize!!
General: Great setup, good introduction, good splitting into sections that clearly represent your project.
Fazl: Very good depth of prior work. The definition of truthfulness could benefit from TruthfulQA.
Alignment: Addresses truthfulness, an important factor in alignment. Shows clearly how pre-prompting for helpfulness (a desired trait) makes the model less truthful.
AI psychology: A multi-factor experimental design that represents well how we might have to study AI models in the future. Very interesting results as well. Captures naturalistic chatbot interactions (e.g. Alexa) very well.
Novelty: This has not been seen before, as far as the judges know.
Generality: Seems like a generalizable effect that has an impact of quite a few percentage points on model truthfulness.
Replicability: Has a GitHub repo without documentation, but seems simple to run based on the experimental design. Parameters chosen are clear.
<3 Thanks for an awesome hackathon! We had a lot of fun, and it's super exciting to get first prize. We're also happy we came up with a novel approach!
Some documentation has been added to the GitHub repo; if there's value in it, I can also comment all the code.
Thank you, that's awesome to hear! And neat documentation is always nice ;)