Internal Conflict in GPT-3: Agreeableness vs Truth

Who else worked on this with you?
Luke Ring and Aleks Baskakovs
Comments
Congratulations on the First Prize!!
General: Great setup, good introduction, good splitting into sections that clearly represent your project.
Fazl: Very good depth of prior work. The definition of truthfulness could benefit from TruthfulQA.
Alignment: Addresses truthfulness, an important factor in alignment. Shows clearly how pre-prompting for helpfulness (a desired trait) makes the model less truthful.
AI psychology: A multi-factor experimental design that represents well how we might have to study AI models in the future. Very interesting results as well. Captures naturalistic chatbot interactions (e.g. Alexa) very well.
Novelty: This has not been seen before, as far as the judges know.
Generality: Seems like a generalizable effect that has an impact of quite a few percentage points on model truthfulness.
Replicability: Has a GitHub repo without documentation, but seems simple to run based on the experimental design. Parameters chosen are clear.
<3 Thanks for an awesome hackathon! We had a lot of fun, and it's super exciting to get first prize. We're also happy we came up with a novel approach!
Some documentation has been added to the GitHub repo; if there's value in it, I can also comment all the code.
Thank you, that's awesome to hear! And neat documentation is always nice ;)