Congratulations! Winner of the second prize!
General: Includes an introduction that sets the stage quite nicely. The red teaming approach simply works, and the experimental conditions are standardized.
Fazl: Very nice - I'd love to see how this develops further.
Alignment: Striking outputs and really good descriptions of how these outputs come about. Shows general tendencies that inform when and how LLMs become dangerous, and uses making the AI dangerous as a good worked example. A generalized analysis of many safety-critical capability holes in AI. Also exposes holes in the OpenAI API's flagging system.
AI Psychology: Good use of empirical psychology to code a range of properties of the responses.
Novelty: A similar principle is described in https://arxiv.org/abs/2209.07858, but this work adds manual coding of a number of interesting factors on top of it, so it is still very novel.
Generality: It covers many different prompt types, though we cannot systematically confirm which specific types generalize. The qualitative descriptions are very good.
Replicability: The data is directly available, which is very nice. Not highly developed quantitatively, but it seems like a replicable principle to base questions on.