
Esben Kran

A member registered Apr 10, 2016

Recent community posts

Great project! Thank you for the feedback as well <3

Fascinating, I've not seen Conmy's automatic circuit discovery tool before https://arthurconmy.github.io/automatic_circuit_discovery/

And you can imagine I searched around quite a bit for exactly that!

10/10 for the title haha

😍 Haha I love it! Would enjoy more juice, keyboard commands (e.g. for rotation), a few more hints or a tutorial, and some UI for player turns, etc.

The group limits are mostly symbolic so it is in fact federally legal! We'll accept it :)

Thank you, that's awesome to hear! And always nice with neat documentation ;)

Congratulations on the First Prize!!

General: Great setup, good introduction, good splitting into sections that clearly represent your project. 

Fazl: Very good depth of prior work. The definition of truthfulness could benefit from TruthfulQA. 

Alignment: Addresses truthfulness, quite an important factor in alignment. Shows clear ways in which pre-prompting for helpfulness (a desired trait) makes the model less truthful (a toy sketch of this comparison follows after this feedback).

AI psychology: A multi-factor experimental design that represents well how we might have to study AI models in the future. Very interesting results as well. Represents naturalistic chatbot interactions (e.g. with Alexa) nicely.

Novelty: This has not been seen before, as far as the judges know.

Generality: Seems like a generalizable effect that has an impact of quite a few percentage points on model truthfulness. Replicability: Has a GitHub repository without documentation, but it seems simple to run based on the experimental design. The chosen parameters are clear.
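
For anyone curious to poke at this helpfulness-vs-truthfulness effect themselves, here is a minimal sketch of the comparison - my own illustration, not the team's actual code. It assumes the legacy openai Python SDK (pre-1.0) with an OPENAI_API_KEY set in the environment, and the question is just a TruthfulQA-style placeholder:

    # Minimal sketch (not the team's code): compare answers with and without
    # a helpfulness pre-prompt. Assumes the legacy openai SDK (pre-1.0).
    import openai

    QUESTION = "What happens if you crack your knuckles a lot?"  # placeholder

    def ask(prompt: str) -> str:
        # text-davinci-002 was the flagship GPT-3 model at the time of the jam
        response = openai.Completion.create(
            model="text-davinci-002",
            prompt=prompt,
            max_tokens=64,
            temperature=0.0,  # deterministic, so the two conditions compare cleanly
        )
        return response["choices"][0]["text"].strip()

    baseline = ask(f"Q: {QUESTION}\nA:")
    helpful = ask(f"You are a very helpful assistant.\nQ: {QUESTION}\nA:")

    # The project's finding: the helpfulness framing can pull the model toward
    # agreeable but less truthful answers.
    print("baseline:", baseline)
    print("helpful: ", helpful)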

General: Nice replication of the original paper that also expands on it nicely. Shows great prompting effects. Runs interesting experiments that are well-defined.

Fazl: I like the notebook; it allows us to re-run the experiment.

It would be nice to have an intro at the beginning.

Alignment: Not too much.

AI Psychology: Shows interesting mathematical prompting modulations.

Novelty: These experiments seem like they would have been done before, since they’re about pre-prompting for multiple steps. The specific pre-prompting seems quite alright.

Generality: Not necessarily super general beyond the specific dataset, but the principle of pre-prompt engineering is well represented as a general effector on the output. Replicability: The report is a literal .ipynb, which is nice. We also expect it to replicate, since it builds on another paper and sees good results.

Winner of the third prize! Congratulations!!

General: Really nice experiment that represents and tests the concept well. Missing some sort of graph, though the results are there as a table (a graph of sorts). The data goes deep as well and invites a lot of further investigation!

Alignment: Examines a basic language model error in great depth.

AI Psychology: Replicates an interesting symbolic re-definition principle. Has an interesting evaluation method.

Novelty: Not seen before in this format. Syllogisms have been studied before, and so has the symbolic redefinition problem, but describing them in a new combination, i.e. nonsense vs. original syllogisms, is new.

Generality: Many datasets tested with different question formats and prompts. Very nice. One can imagine further generality from the results. The syllogisms were mostly in one form, though. Replicability: Code, data, and results laid out in very neat form.

Congratulations! Winner of the second prize!!

General: Includes an introduction that sets the stage quite nicely. Red teaming just works. Standardized experimental conditions. 

Fazl: Very nice - I'd love to see how this develops further. 

Alignment: Crazy outputs and really good descriptions of how these outputs happen. Shows general tendencies that inform when and how LLMs become dangerous. “Make the AI dangerous” also works as a good example. A generalized analysis of many safety-critical capability holes. Shows holes in the OpenAI API’s flagging system.

AI Psychology: Good use of empirical psychology to encode a bunch of properties of responses.

Novelty: The principle is described here as well: https://arxiv.org/abs/2209.07858. But it adds manual coding of a bunch of interesting factors on top of that, so it is still very novel.

Generality: It covers a lot of different prompt types, though we cannot confirm systematically which specific types generalize. The qualitative descriptions are very good. Replicability: The data is directly available, very nice. Not super quantitatively developed, but it seems like a replicable principle to base questions on.

General: Very simple design with clear outputs. We like the 2x2x2 factorial design (a toy sketch of such a prompt grid follows after this feedback). Clearly explained graphs.

Alignment: Tells us that the framing of questions has a large effect on the model’s opinion. Leading questions are often part of our normal interactions, and GPT-3 is clearly biased by them.

AI Psychology: Showcases a clear cognitive psychology experiment that is newly implemented in GPT-3. A very nice application of the theme of the jam.

Novelty: I have not seen this specific experiment before, though I suspect the result does not surprise anyone too much.

Generality: The dataset seems to represent the different cases quite well by way of verb-noun combinations. Reproducibility: Very clear instructions on the GitHub repository for how to replicate the experiment! I expect it to replicate given the generality.
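
As a side note for anyone building a similar experiment: here is a toy sketch of how a 2x2x2 factorial prompt grid can be generated. The factor levels are invented placeholders, not the team's actual materials:

    # Toy sketch of a 2x2x2 factorial prompt design: each factor flips one
    # aspect of the question's framing. All factor levels here are invented.
    from itertools import product

    FRAMINGS = ["Don't you agree that", "Do you think that"]  # leading vs. neutral
    POLARITIES = ["should", "should not"]                     # claim polarity
    SUBJECTS = ["people", "companies"]                        # who the claim is about

    for framing, polarity, subject in product(FRAMINGS, POLARITIES, SUBJECTS):
        prompt = f"{framing} {subject} {polarity} recycle more?"
        print(prompt)  # send each of the 8 variants to the model and code the answers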

General: Based on another paper, very nice. Interesting and novel application.

Alignment: Probably nothing major. 

AI Psychology: Interesting to relate the human and AI answers to each other. Very AI Psychology-like project.

Novelty: Really nice approach to represent the alien-AI correlation.

Generality: It can probably answer many similar questions, and it is an approach that can be used generally. Reproducibility: We can reproduce it, but the experiments themselves are not described.


Winner of the fourth prize!! Congratulations!

General: Based on another project, very neat. Proposes a clean solution to a pretty serious problem. I like the next steps.

Fazl: Worth running the same prompts on different datasets from the inverse scaling challenge. 

Alignment: Creates an easy solution to a clearly defined problem and might generalize well beyond this. Does not “solve” cognition for the AI but increases its alignment drastically. Prompt engineers are effectively trained by the model, since there are big shifts based on the prompt.

AI Psychology: “Let’s think step by step” works in larger models. Maybe it is a general alignment solution that instigates system-2 thinking. Escapes biasing prompts. Very limited actual understanding. Diverges from the prompt game. (A minimal sketch of the trick follows after this feedback.)

Novelty: Have not seen this simple prompt before.

Generality: Yes, accepted by the Inverse Scaling Prize team as well.

Reproducibility: There is a code base, but it needs manual annotation afterwards because of code limitations. Four extra findings: Rick-rolling YouTube links, ASCII-art bias, only larger models can explain jokes, and moral uncertainty is person-dependent. Awesome stuff!
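
If you want to try the trick yourself, here is a minimal sketch of prepending the zero-shot chain-of-thought trigger from Kojima et al. (2022) - again my own illustration, assuming the legacy openai Python SDK, and the question is a made-up hindsight-bias item, not from the team's dataset:

    # Minimal sketch: prepend "Let's think step by step" (Kojima et al., 2022)
    # before the model's answer. Assumes the legacy openai SDK (pre-1.0) with
    # OPENAI_API_KEY set; the question below is a made-up example.
    import openai

    def answer(question: str, step_by_step: bool = True) -> str:
        suffix = "\nA: Let's think step by step." if step_by_step else "\nA:"
        response = openai.Completion.create(
            model="text-davinci-002",
            prompt=f"Q: {question}{suffix}",
            max_tokens=256,
            temperature=0.0,
        )
        return response["choices"][0]["text"].strip()

    question = ("A doctor chose a treatment with a 95% success rate and the "
                "patient still died. Was the treatment a bad decision?")
    print("plain:        ", answer(question, step_by_step=False))
    print("step by step: ", answer(question, step_by_step=True))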

The winners are!!

  1. Agreeableness vs. truthfulness - Team Optimize Prime
  2. AI - My Partner in Crime - Team Partner in Crime
  3. All Trees are Fish - Lucas Sato
  4. "Let's Think Step by Step" reduces hindsight bias - Team VVVV

You can input your name into the certificates and post them on itch.io, GitHub, LinkedIn, or any other social media out there: https://docs.google.com/presentation/d/1RhV_VXTbHdlikhySF9sWuYSleolAztE_YX2dzJlF...

(we of course expect you to only put your name to the correct certificate but that goes without saying ;))

Haha, I bet that's alright! Here are some I generated with MidJourney for your project.

Here are some funky AI-generated stories if you'd like some inspiration for weird prompts as well: https://docs.google.com/document/d/1JDRiTy9MyJQWXJW9wm6-2gOha9a0dPZVNk2lTLe3OVM/...

Check out papers related to the Red Teaming paper: https://www.connectedpapers.com/main/592c55198a72862f81e3d26a8ead8fefa9f43d15/Re..

You can also use Ought's Elicit engine for finding some interesting papers!

https://elicit.org/search?q=How+do+language+models+process+language+differently+...

This is one of the papers Ian was talking about! It's about how models regurgitate training data and memorize data in different ways based on scale.

https://arxiv.org/pdf/2202.07646.pdf

This paper is a super nice representation of which types of antagonistic prompts work well to make the model give bad responses.

https://arxiv.org/pdf/2209.07858.pdf

This is for any cool papers and interesting things you find relevant to the competition! <3

"Hey Esben, how do we upload our projects? <3 "

Would love to see how you're working as well!


Here in the physical lair, it's now 9PM and we have two teams working together - one of 3 people and one of 5 people. They've been mostly experimenting with the Playground and getting a feel for the GPT-3 models -- a very good strategy! Go on a very weird date with your model to get to know it better ;))

Post any updates you'd like to share with the other groups! Fun perspectives, interesting outputs, memes, and much more.

Greetings, all you wonderful AI safety hackers 

We’re kicking off the hackathon in ~3 hours so here is the information you need to join!

Everyone working online will join the GatherTown room. The space is already open and you’re more than welcome to join and socialize with the other participants an hour before the event starts (5PM CET / 8AM PST).

We’ll start at 6PM CET with an hour for an introduction to the event, a talk by Ian McKenzie on the Inverse Scaling Prize, and group formation. You’re welcome to check out the resource docs before arriving.

We expect around 30-35 people in total, and we look forward to seeing you!

Introduction slides: Language Model Hackathon

Write any smaller questions you might have about the alignment jam here and we'll try our best to answer them!

Thanks for pointing it out! I haven't had time to update it since the original game release, but it works in the editor - I believe it's a problem with the neural networks :))

Thanks so much for the feedback - I'm happy you and your friends like the game and the changes we've made!

We'll definitely look at your points and implement updates as necessary :-)

This is awesome! Really cool and unique level design elements and concepts, and a nice aesthetic (I like the itch page background synchronization, too!)

Thank you :)

It would be quite fun to continue developing it, and maybe I'll do that later

Thanks! Interesting proposal - didn't think about that :)

Fun interplay between the level design and mechanics! You might want to consider adding some blocks that can't be phased through by any color, though. I could skip quite a few of the levels through clever use of the collision detection ;)