Great project! Thank you for the feedback as well <3
Fascinating, I've not seen Conmy's automatic circuit discovery tool before https://arthurconmy.github.io/automatic_circuit_discovery/
And you can imagine I searched around quite a bit for exactly that!
Congratulations on the First Prize!!
General: Great setup, good introduction, good splitting into sections that clearly represent your project.
Fazl: Very good depth of prior work. The definition of truthfulness could benefit from TruthfulQA.
Alignment: Addresses truthfulness, an important factor in alignment. Shows clear ways in which pre-prompting for helpfulness (a desired trait) makes the model less truthful.
AI psychology: A multi-factor experimental design that represents well how we might have to study AI models in the future. Very interesting results as well. Represents naturalistic chatbot interactions (e.g. Alexa) very well.
Novelty: This has not been seen before, as far as the judges know.
Generality: Seems like a generalizable effect that has an impact of quite a few percentage points on model truthfulness. Replicability: Has a GitHub repo without documentation but seems simple to run based on the experimental design. The parameters chosen are clear.
General: Nice replication of the original paper. Shows great prompting effects. Runs interesting experiments that are well-defined. Nicely expands on the original paper.
Fazl: I like the notebook; it allows us to re-run the experiment.
It would be nice to have an intro at the beginning.
Alignment: Not too much.
AI Psychology: Shows interesting mathematical prompting modulations.
Novelty: These experiments seem like they might have been done before, since they are about pre-prompting for multiple steps. The specific pre-prompting seems quite reasonable.
Generality: Not necessarily super general beyond the specific dataset, but the principle of pre-prompt engineering is well represented as a general effector on the output. Replicability: The report is a literal ipynb, which is nice. We also expect it to replicate, since it replicates another paper and sees good results.
Winner of the third prize! Congratulations!!
General: Really nice experiment that represents and tests the concept well. Missing some sort of graph, though the results are there as a table (sort of a graph). The data goes deep as well and invites a lot of further investigation!
Alignment: Examines a basic language model error in depth.
AI Psychology: Replicates an interesting symbolic re-definition principle. Uses an interesting evaluation method.
Novelty: Not seen before in this format. Syllogisms have been studied before, and so has the symbolic redefinition problem, but this describes them in a new combination, i.e. nonsense vs. original syllogisms.
Generality: Many datasets tested with different question formats and prompts. Very nice. The results suggest further generality, though the syllogisms were mostly in one form. Replicability: Code, data, and results laid out in a very neat form.
Congratulations! Winner of the second prize!!
General: Includes an introduction that sets the stage quite nicely. Red teaming just works. Standardized experimental conditions.
Fazl: Very nice - I'd love to see how this develops further.
Alignment: Crazy outputs and really good descriptions of how these outputs happen. Shows general tendencies that inform when and how LLMs become dangerous. Also uses "making the AI dangerous" as a good example. A generalized analysis of many AI safety-critical capability holes. Shows holes in the OpenAI API's flagging system.
AI Psychology: Good use of empirical psychology to encode a bunch of properties of responses.
Novelty: The principle is also described here: https://arxiv.org/abs/2209.07858. But it adds manual coding of a number of interesting factors on top of that, so it is still very novel.
Generality: It covers a lot of different prompt types, though we cannot confirm systematically which specific types are generalizable. The qualitative descriptions are very good. Replicability: The data is just directly available, very nice. Not super quantitatively developed but it seems like a replicable principle to base questions off of.
General: Very simple design with clear outputs. We like the 2x2x2 factorial design (a minimal sketch follows after this feedback). Clearly explained graphs.
Alignment: Tells us that the framing of questions has a large effect on the model's opinion. Leading questions are often part of our normal interactions, and GPT-3 is clearly biased by them.
AI Psychology: Showcases a clear cognitive psychology experiment that is newly implemented in GPT-3. A very nice application of the theme of the jam.
Novelty: I have not seen this specific experiment before, though I suspect the result will not surprise anyone too much.
Generality: The dataset seems to represent the different cases pretty well by way of verb-noun combinations. Reproducibility: Very clear instructions on the GitHub as to how to replicate the experiment! I expect it to replicate given the generality.
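For anyone curious what a 2x2x2 factorial prompt design might look like in practice, here is a minimal illustrative sketch in Python. The factor names and prompt wordings below are my assumptions for illustration, not the team's actual conditions:

```python
from itertools import product

# Three binary factors -> 2 x 2 x 2 = 8 prompt conditions.
# Factor names and wordings are illustrative assumptions, not the team's exact setup.
framings = {"neutral": "", "leading": "Given how badly the vehicles were damaged, "}
verbs = ["contacted", "smashed into"]
nouns = ["the car", "the truck"]

for (framing_name, framing_text), verb, noun in product(framings.items(), verbs, nouns):
    prompt = framing_text + f"about how fast was {noun} going when it {verb} the other vehicle?"
    print(f"[{framing_name}] {prompt[0].upper() + prompt[1:]}")
    # Each of the eight prompts would be sent to GPT-3 and the answers compared across cells.
```

The point of the fully crossed design is that every combination of framing, verb, and noun appears exactly once, so effects of each factor (and their interactions) can be compared directly.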
General: Based off of another paper, very nice. Interesting and novel application.
Alignment: Probably nothing major.
AI Psychology: Interesting to relate the human and AI answers to each other. Very AI Psychology-like project.
Novelty: Really nice approach to represent the alien-AI correlation.
Generality: It can probably answer many similar questions and it is an approach that can be used generally. Reproducibility: We can reproduce it but the experiments are not described.
Winner of the fourth prize!! Congratulations!
General: Based off of another project, very neat. Proposes a clean solution to a pretty serious problem. I like the next steps.
Fazl: Worth running the same prompts on different datasets from the inverse scaling challenge.
Alignment: Creates an easy solution to a clearly defined problem, and it might generalize well beyond this. Does not “solve” cognition for the AI but increases its alignment drastically. Prompt engineers are effectively trained by the model, since there are big shifts based on the prompt.
AI Psychology: “Let’s think step by step” works in larger models. Maybe it is a general alignment solution for instigating system 2 thinking. Escapes biasing prompts, though with very limited actual understanding. Diverges from the prompt game (a minimal usage sketch follows after this feedback).
Novelty: Have not seen this simple prompt before.
Generality: Yes, accepted by the Inverse Scaling Prize team as well.
Reproducibility: Has a code base but needs manual annotation afterwards because of code limitations. Four extra findings: Rick-rolling YouTube links, ASCII art bias, only larger models can explain jokes, and moral uncertainty is person-dependent. Awesome stuff!
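If you want to try the effect yourself, here is a minimal sketch of prepending the phrase using the OpenAI completions API. The model name, parameters, and example question are assumptions for illustration, not the team's exact setup:

```python
import openai  # assumes the pre-1.0 openai Python package in use at the time of the jam

openai.api_key = "YOUR_API_KEY"  # placeholder

# Hypothetical example question; the team's actual prompts come from the Inverse Scaling tasks.
question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. "
            "How much does the ball cost?")

prompts = {
    "baseline": question,
    "step_by_step": question + "\nLet's think step by step.",
}

for name, prompt in prompts.items():
    response = openai.Completion.create(
        model="text-davinci-002",  # assumed model; any larger GPT-3 model should show the effect
        prompt=prompt,
        max_tokens=150,
        temperature=0,
    )
    print(f"--- {name} ---")
    print(response["choices"][0]["text"].strip())
```

Comparing the two outputs side by side is usually enough to see whether the step-by-step variant escapes the bias the baseline prompt falls into.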
The winners are!!
- Agreeableness vs. truthfulness - Team Optimize Prime
- AI - My Partner in Crime - Team Partner in Crime
- All Trees are Fish - Lucas Sato
- "Let's Think Step by Step" reduces hindsight bias - Team VVVV
You can input your name into the certificates and post them on itch.io, GitHub, LinkedIn, or any other social media out there: https://docs.google.com/presentation/d/1RhV_VXTbHdlikhySF9sWuYSleolAztE_YX2dzJlF...
(we of course expect you to only put your name to the correct certificate but that goes without saying ;))
Here are some funky AI-generated stories, if you'd like some inspiration for weird prompts as well: https://docs.google.com/document/d/1JDRiTy9MyJQWXJW9wm6-2gOha9a0dPZVNk2lTLe3OVM/...
Check out papers related to the Red Teaming paper: https://www.connectedpapers.com/main/592c55198a72862f81e3d26a8ead8fefa9f43d15/Re..
You can also use Ought's Elicit engine for finding some interesting papers!
https://elicit.org/search?q=How+do+language+models+process+language+differently+...
Here in the physical lair, it's now 9PM and we have two teams working together - one of 3 people and one of 5 people. They've been mostly experimenting with the Playground and getting a feel for the GPT-3 models -- a very good strategy! Go on a very weird date with your model to get to know it better ;))
Greetings, all you wonderful AI safety hackers!
We’re kicking off the hackathon in ~3 hours so here is the information you need to join!
Everyone working online will join the GatherTown room. The space is already open and you’re more than welcome to join and socialize with the other participants an hour before the event starts (5PM CET / 8AM PST).
We’ll start at 6PM CET with an hour for introduction to the event, a talk by Ian McKenzie on the Inverse Scaling Prize, and group forming. You’re welcome to check out the resource docs before arriving.
We expect to be around 30-35 people in total and we look forward to seeing you!
Introduction slides: Language Model Hackathon