
Recommendations


If nothing else, check out the following:

1. General Reading Recommendations πŸ”—

2. General Methodology Skillup πŸ”—

2.1. Thinking out Loud πŸ”—

During the AISC interviews, people were asked to think out loud as they tried to solve alignment (see AISC 2025 Interview Instructions). Mostly I didn't do anything. You might have been surprised at how well you were doing. I expect that in large part this was because you were thinking out loud. Somehow, talking out loud makes me more than 50% better at thinking. I recommend you try to emulate this setup.

  β€’ Talking out loud to yourself is really useful, I think. I recommend doing this even when thinking alone. If it's hard to talk alone (it is for me), record yourself with OBS and explain things to the camera. Upload the video to YouTube if that helps; your brain then understands that you are really talking to an audience.
  β€’ Write things down while thinking. I recommend first saying things out loud and then writing them down. Optionally, first transcribe what you said using Whisper (after having talked it through out loud without transcribing), and then write from scratch what you actually want to write (a minimal Whisper transcription sketch follows this list).
  β€’ Use a physical whiteboard. Buy one if necessary. The larger the better. Use multiple colors.
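
As an illustration of the Whisper step above, here is a minimal sketch. It assumes the open-source openai-whisper Python package; the file names and the model size are placeholders, not part of my actual setup.

  # Minimal sketch: transcribe a recorded thinking-out-loud session with the
  # open-source openai-whisper package (pip install openai-whisper).
  # "thinking-session.mp3" and the "base" model size are placeholder choices.
  import whisper

  model = whisper.load_model("base")  # larger models are slower but more accurate
  result = model.transcribe("thinking-session.mp3")

  # Save the raw transcript; the idea is to rewrite it from scratch afterwards,
  # not to paste it in verbatim.
  with open("thinking-session-transcript.txt", "w") as f:
      f.write(result["text"])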

Of course you can also meet with other people (or me). However, people will likely not always be available. So being able to utilize language by talking out loud, even when working solo, is a good skill to have.

3. Research Directing Methodology πŸ”—

Read the entire set of instructions before starting.

Motivation: Getting good at deciding what to work on is one of the most important skills to train. It doesn't matter how good you are at doing something if it's not important to do.

I recommend first going through the following:

  • Alignment Research Field Guide - An amazing article containing a variety of useful tips on how to do research, from "You are allowed to think" to "How to structure a research session with multiple people" and more.
  β€’ You and your research - Richard Hamming describes how many scientists fail to do good research, and how to actually do good research, partly based on his observations of how John von Neumann and other prominent scientists he worked with went about things.
  • Go through the Research Hamming Questions, setting a 12 minute timer, during which you try to answer each question.

Now you probably have a better model of what constitutes good research directions. Spend more time thinking about it if that seems valuable. Then look at the following post, which gives a lot of hints about what good research directions are:

While reading this post, if you realize that there were significant updates to your model of what is important, stop reading and set a timer for 5 minutes, during which you again think about what is actually important for solving the problem. While engaging in this process you may also read the following material to get some ideas on how to think better about the problem:

Another important thing, once you have generated some research directions, is to ask yourself whether there isn't anything you're missing. It's all too easy to generate some good research directions and terminate the search. But continuing the search often yields significantly better research directions. Coming up with research directions is hard and takes time, so usually once you have one you just want to stop and work on the object level.

4. Technical Skillup Recommendations πŸ”—

5. AISC 2024 Reading List πŸ”—

Some people have asked me what reading I recommend in preparation for my AISC project. The following is a short list of things. I think the materials are useful in general, almost irrespective of what agenda you intend to work on. Therefore I am sending it to everybody via a Python emailer script I made for AISC. Hope this is useful and not annoying.


Almost all (possibly all) of MIRI's research is not published by default. See here. What are your thoughts on this? IIRC they thought long and hard about whether Logical Induction - MIRI was dangerous before they published it. And the effort of doing this evaluation was at least in part responsible for the move to the nondisclosed-by-default policy.


If the above gets to be too much, watch the videos in this Rational Animations playlist.

6. To Robert-Alexandru Stroi πŸ”—

Recommendations specific to the interview for AISC-2025 on [2024-12-18 Wed].

Things to read:

  β€’ Deep deception - How AI can be deceptive without being deceptive in the way we intuitively think about deception.
  β€’ Diamond Maximizer - We don't know what we want, but even if we knew what we want, and even if it was a pretty straightforward thing, alignment isn't solved.
  β€’ Shell Games - It's easy to accidentally hide important parts of a problem.

See Thinking out Loud.

Maybe useful: Preventing self-modification is Hard. These are notes I took during the interview on [2024-12-18 Wed]. Also possibly interesting (I already said it during the meeting): Hiding the Hard Parts.

7. To Jayson Amati πŸ”—

  β€’ Sorting Pebbles into Correct Heaps - Increasing intelligence doesn't increase alignment.
  • The power of Intelligence
  β€’ AI safety mindset - It's important to try to break your own solutions. One general strategy is to actually try to simulate a world, e.g. one in which you have a powerful optimizer, without injecting your human intuitions into the mind of the AI.
  • AGI Ruin is a list of problems in AI alignment. I recommend you also read the links to orthogonality and instrumental convergence.
  β€’ Diamond Maximizer - We don't know what we want, but even if we knew what we want, and even if it was a pretty straightforward thing, alignment isn't solved.

8. To Patrick Liu and Amirmahdi Karami πŸ”—

I think you would benefit a lot from reading up on rationality and some important foundational concepts in alignment.

  β€’ Sorting Pebbles into Correct Heaps - Increasing intelligence doesn't increase alignment.
  • The power of Intelligence
  β€’ Other videos from the Rational Animations channel (which the previous videos are from) might also be good.
  • Mesa Optimization - This is an important concept in AI alignment (there is also a paper if you prefer reading). Other videos from Robert Miles AI Safety channel are also good to watch.
  • The Sequences - A series of books about how to think well. There is also a shortened version here.
  • AI safety mindset describes what kind of thinking patterns you need to solve alignment. Specifically about how to spot flaws in your own approaches.
  β€’ Other recommendations in this document, especially the AISC 2024 Reading List.

9. To Atharva Nihalani πŸ”—

Thinking out Loud - During the interview you thought well. Maybe better than usual? I think this was in part because you talked out loud. Being able to utilize language even when working solo is a good skill to have.

Here is some reading I expect to be helpful:

Also look at

10. To PΓ‘l ViktΓ³ria πŸ”—

I expect there are many concepts in economics that are useful for thinking about AI alignment, such as markets, agents, the Pareto frontier, etc. However, I think it would be very high value for you to learn at least the basics of computer science. I recommend going through "The Structure and Interpretation of Computer Programs". There are lectures and a book. I recommend you start with the lectures. Ideally, actually write code as you do so.

Another thing I expect to be very useful is to read up on some important background material about AI alignment. If videos are easier, definitely check out these YouTube channels:

  β€’ This Rational Animations playlist (other videos from the channel are also good).
  • Mesa Optimization - This is an important concept in AI alignment (there is also a paper if you prefer reading). Other videos from Robert Miles AI Safety channel are also good to watch.

Reading the Sequences also seems very high value:

  • The Sequences - A series of books about how to think well. There is also a shortened version here.

One very important thing for you I think is to read AGI Ruin, a list of important problems in AI alignment. I recommend you also read the links to orthogonality and instrumental convergence.

See also

11. Negar Arj πŸ”—

Typst Document

Negar is awesome because she intrinsically likes algorithms.

Negar:

  • If you know how something works you can change how it works.

To say:

  • Build a rocket with your methodology of understanding why the rocket blew up. How can you make the rocket go to the moon?
  • Don't damage me
  • Don't do anything without my permission
  • Obey me

We tell the AI some input. The first step is to understand the input.

Law

  1. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law;

Negar: "This is the same as":

  1. The third rule: it should try to exist only if it's not in conflict with the first and second rules.
  • AI will destroy world
  • AI will kill all people

Imagine you tell the AI to get a banana into your right hand as fast as possible. It might tell you that you should get yourself a banana as quickly as possible or it will kill you (making a credible death threat). This gets you a banana quicker, because now you move your body to where the banana is.

  • run over people.
  • robbing not allowed.
  β€’ jumping out of the window is not allowed.
  • don't rob people.
  • don't explode.
  • destroying different places is not allowed.
  β€’ the AI doesn't hurt me.