Getting AI Policy to Work

1. Understanding

You can't make the right decisions when you are missing important information. Would you sign a purchase contract for a car before knowing its price? In World War I the French repeatedly charged machine gun nests with bayonets. It didn't work. Neither they nor their commanders wanted them to die. The commanders didn't understand machine gun nests.

Anecdotally, it has happened that I explain X to Bob, until Bob says "Ah yes, that makes total sense. I get how X works and why it's important." The next day I observe Bob explaining X to Alice. Afterward Alice says "It seems you didn't consider Y. I think Y invalidates your argument." I think "Ah, obviously X doesn't apply here, because …" But Bob says "Hmm, right. Then X is probably untrue."

I mostly failed at transmitting my understanding. I said reasonable-sounding things, and Bob agreed, giving me the illusion that he now knew all the things I know. Communication is hard!

You can tell somebody: "Alignment is a hard technical problem. We have only one shot at building an aligned superintelligence. We don't have the technical knowledge required to align an AGI. It's possible to build an AGI without understanding how it works; with deep learning we are nowhere close to that understanding. Building an AGI without understanding how it works almost guarantees that it will be misaligned, …" But reciting these facts is not the same as transmitting the understanding behind them.

Once you have the correct model it's also stable.

Of course, the best way to make somebody believe something is by giving them the correct model of it.

1.1. People who Understand can Transmit their Understanding

So for people to be effective, they need to understand very well.

Really understanding something is the first step toward being able to explain it very well. Consider this simplified model. We make Bob thoroughly understand, and train him in how to transmit his understanding to others (and we assume Bob wants to transmit his understanding).

Let's assume that we did a thorough enough job that Bob can successfully communicate his understanding to 2 people, and that each of these people will convince 2 more people in turn.

:TODO: Maybe generate a table demonstrating exponential growth?

  ;; A binary tree with DEPTH levels, where each person convinces two
  ;; more people, contains 2^DEPTH - 1 people in total.
  (defun number-of-tree-nodes (depth)
    (- (expt 2 depth) 1))
  
  (number-of-tree-nodes 4)
15
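
A rough sketch of how such a table could be generated, reusing the function above:

  ;; For each depth, how many people understand in total if every
  ;; person successfully passes the understanding on to two more.
  (mapcar (lambda (depth)
            (list depth (number-of-tree-nodes depth)))
          (number-sequence 1 6))
((1 1) (2 3) (3 7) (4 15) (5 31) (6 63))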

You want to actively encourage the people who understand to notice when other people are confused, to engage them, and to try to make them understand effectively.

:TODO: Exponential growth. People models. (Have these bullets as a summary at the top.)

  • ☐ Focus on having few people who understand well?
  • ☐ Such people could then reexplain their models to others.
  • ☐ If one person reexplains it to 1.1 people, we would get an exponential explosion of people who understand.
  • ☐ For this to work it is probably necessary to explain this entire setup to every person and make sure that they understand it.
  • ☐ Think about how to support people who want to explain it to others. There should be high-quality resources that they can use, such as videos and articles that explain specific topics, so that they can look at what a given person needs as an explanation and then select the appropriate resource. That seems a much easier task than explaining it themselves.
    • A dependency tree of concepts might be useful to identify which parts people are missing.

1.2. TODO A correct Model is hard to Dethrone

Once somebody has deeply understood something, it's hard to make them stop believing it. Many arguments that might convince somebody with partial understanding can now be recognized as incorrect by a person with deep understanding.

1.3. TODO How to get across the understanding?

I don't have good models of what the most effective way is to get across the right models.

  • Writing an article allows you to iteratively distill it until it's very high quality.
  • Many people can read an article. You don't need to reexplain it from scratch each time.
    • Ideally meet with them to discuss the article.
    • It is probably best to heavily encourage people to ask the questions they have about an article.

1.4. Example interaction on explaining how you can get powerful AI without understanding how it works.

J: Hello.

D: Hello.

1.4.1. Explaining the Goal and Setup

J: My goal here is to transmit my understanding to you. I believe that only when one has a deep understanding of the problem can one make the right decisions. [Put some analogy here. Maybe the French charging the machine gun nests.] The goal is not to just tell you a bunch of facts. To a large extent the goal is to give you the understanding that allows you to arrive at these conclusions yourself.

D: ?

J: I want to use a technique where I ask you to reexplain your current understanding. Some people find this offensive. They interpret it as me questioning their ability to understand. But it's actually about detecting my own failures in transmitting my understanding. And I expect to fail a lot. If I don't detect a failure I can't correct it. Does this seem good to you?

D: ?

J: Ok then let's test it. Could you explain back to me everything I said so far?

D: ?

J: Ok. There are a few more things I'd like you to do. If you notice that you are confused please tell me immediately. And if I am saying something you already know, please also tell me immediately. You can then describe what you already know such that I can skip forward to the right part of the explanations.

1.4.2. Getting a good Antenna without understanding what makes it good

((Explain the thing from this article.))

[Evolved antenna example.] The nice thing about an evolved antenna is that you can see everything: what makes an antenna good or bad is determined by its shape alone. Now, which of these antennas is the best? [Show the three example antennas from the paper below.]

2024-09-28_14-17-30_screenshot.png

(From "Rapid Re-Evolution of an X-Band Antenna for Nasa’s Space Technology 5 Mission")

The way the NASA engineers know which antenna is best is by running a computer simulation of Maxwell's equations, which tells them how good the antenna is at emitting and receiving radio waves. But they couldn't tell you, without much more analysis, why moving the endpiece of the second-to-last antenna in the figure up makes the antenna better. And they don't need to know that in order to figure it out.

Let's consider a very simplified setup. Suppose we have all possible antennas: every way you can lay out the 6 metal wire pieces in 3D space. Now we can just run our simulation to check which one is best. We don't need to know why it's best. Basically, we need the "I know it when I see it" knowledge of what makes a good antenna, but we don't need the knowledge that explains why it is good.

We don't need to build up a good understanding about what makes a good antenna. We just select the right one.
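
A toy illustration of "we just select the right one" (a minimal sketch; antenna-quality is a made-up stand-in for the electromagnetic simulation, and a design here is just a list of joint coordinates):

  ;; Made-up stand-in for the Maxwell's-equations simulation: it maps
  ;; an antenna design (a list of joint coordinates) to a quality
  ;; score.  In reality this is the expensive physics simulation.
  (defun antenna-quality (design)
    (- (apply #'+ (mapcar (lambda (x) (* x x)) design))))
  
  ;; Brute-force selection: score every candidate design and keep the
  ;; best one.  No understanding of *why* it is best is needed.
  (defun best-antenna (designs)
    (car (sort (copy-sequence designs)
               (lambda (a b) (> (antenna-quality a) (antenna-quality b))))))
  
  (best-antenna '((1 2 3) (0 1 0) (4 4 4)))
(0 1 0)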

Of course, in practice we can't compute for all antenna designs which one is best; there are simply too many. In practice there are techniques like evolutionary algorithms and stochastic gradient descent (SGD) which make this efficient. The idea behind stochastic gradient descent is quite simple. In the paper they used an evolutionary algorithm, but they could have used SGD. If we think about how SGD works when designing an antenna: we start with some initial antenna design. We then compute in what direction the antenna becomes better according to the simulation [Visually draw an antenna, point at the joints. Ideally have a 3D model of an antenna in Blender where I can move around the joints as I am explaining things.], and then we move each joint in the direction that is locally best. Then we simply repeat this procedure.

Basically, calculus allows you to do this efficiently. It's all material you would learn in the first semester of a mathematics degree in Germany.

So once we can automatically measure how good an antenna is, we can use some method like SGD to find an antenna that is actually good. Notice how we don't need to understand why it is good, in terms of the structure of the antenna, in order to move a joint in a particular way. We skip the step of understanding what structural properties make a good antenna entirely, and instead only determine how good the antenna is via simulation.
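
A minimal sketch of that improvement step, in the simple "try each direction and keep whichever is best" form rather than with real gradients (antenna-quality is the same made-up stand-in as above):

  ;; One improvement step: for each joint coordinate, try a small step
  ;; down and a small step up, and keep whichever variant the
  ;; simulation stand-in scores highest.  Repeating this walks the
  ;; design uphill without ever explaining *why* the result is good.
  (defun improve-antenna (design step)
    (let ((current (copy-sequence design)))
      (dotimes (i (length current))
        (let ((best-value (nth i current))
              (best-score (antenna-quality current)))
          (dolist (delta (list (- step) step))
            (let ((candidate (copy-sequence current)))
              (setf (nth i candidate) (+ (nth i candidate) delta))
              (when (> (antenna-quality candidate) best-score)
                (setq best-score (antenna-quality candidate)
                      best-value (nth i candidate)))))
          (setf (nth i current) best-value)))
      current))
  
  (improve-antenna '(1 2 3) 0.5)
(0.5 1.5 2.5)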

:TODO:

  • ☐ Better explain the core point of how you don't need to understand.
    • ☐ Clarify the difference between the different kinds of understanding.
      • A model that allows you to design a thing based on knowing how things work (e.g. build a car, rocket, computer, etc.)
      • Model that allows you to evaluate how good e.g. a car is. How fast does it go? How much fuel does it need?
  • ☐ Add better signposting. Why are we talking about antennas? Why is this important?
  • ☐ Simplify the explanation of SGD to "try every possible direction we could move in and see which one is best" and then later explain that with calculus you can efficiently compute which direction to move in.
  • ☐ (Maybe) make a game where you can play around with SGD updating an antenna.
    • ☐ How would this look? What would it explain?

1.4.3. TODO Building the Superintelligent AI you don't understand

Neural networks are trained with SGD. You only need to understand what the system should be able to do. We can create an AI that is very good at generating text without knowing at all how it is able to generate the text. That is the situation we are in.
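
A tiny illustration of this "we only specify what the system should do" point (a rough sketch, far simpler than real neural-network training: a one-parameter model y = w * x fitted with the same try-each-direction idea; model-error and train are made up for this example):

  ;; The only thing we specify is how badly the model does on the
  ;; examples.  We never say how it should compute its answers.
  (defun model-error (w examples)
    (apply #'+ (mapcar (lambda (ex)
                         (let ((x (car ex)) (y (cadr ex)))
                           (expt (- (* w x) y) 2)))
                       examples)))
  
  ;; Repeatedly nudge the parameter in whichever direction reduces the
  ;; error.  SGD does the same thing, just for billions of parameters.
  (defun train (w examples step iterations)
    (dotimes (_ iterations w)
      (dolist (candidate (list (- w step) (+ w step)))
        (when (< (model-error candidate examples)
                 (model-error w examples))
          (setq w candidate)))))
  
  (train 0.0 '((1 2) (2 4) (3 6)) 0.1 40)  ; => close to 2.0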

:TODO:

  • ☐ Explain why this situation is so bad.
    • ☐ We don't understand whether the AI is bad.
    • ☐ There are reasons to believe that an AI will be bad by default.

1.5. Other Topics   notes

What are other topics that are important and could be explained in this way?

  • Inner and outer alignment.
  • Why to expect that with SGD we get a misaligned AGI even with the right training objective.

2. Why don't People get AI?

Why did it take people this long to realize that AI is powerful? Somehow most people never ask themselves the question of whether AI could become smarter than a human. And when they do, they seem to fix the bottom line to "no" and then do motivated stopping; they instinctively dismiss the possibility.

How pretty do you think you are? There are some studies showing that you are likely to overrate yourself (I am not sure about the rigor of these studies, but it seems anecdotally true). And this might really be the wrong question. It's not about how accurate your rating is, but how likely you are to even try generating it. It might not be a topic you are comfortable thinking about if you're not the prettiest.

If you notice blood in your stool, you may be tempted to think, "It's probably hemorrhoids. Surely it's not cancer." TotalBiscuit had serious cancer symptoms for a year before seeing a doctor.

He died the next year.

When you go to an event and you need to take a COVID test, did you ever feel like not taking it? If you test positive you can't go to the event. Bad news. My brain is noticeably averse to performing the test, because it might bring bad news.

Humans don't like to hear bad news. The worse the news, the less we want to hear it. But not hearing the bad news often leads to worse outcomes: insofar as the news contains important information about reality, you are unable to use that important information in your decision making.

Did you ever feel negative towards somebody because they were better than you at something?

In myself I observed a similar aversion to evaluating how intelligent I am. If you value intelligence, which many do, then realizing that somebody is smarter stings: you no longer have the special status of being "the best".

But intelligence can also inspire fear. What makes the Unabomber scary? If he had been a complete idiot he probably wouldn't have been the Unabomber.

Imagine the dumbest person you know proclaiming with deep conviction "I am gonna make your life hell." And now imagine the smartest person you know proclaiming with deep conviction "I am gonna make your life hell." Really try to imagine it. Which one is scarier?

Sometimes, without prompting, my brain starts to imagine scenarios of how some very capable person I deal with could harm me. I don't do this with most people, only with people my brain evaluates as extraordinarily intelligent/capable.

What if it is possible to build an AI much, much smarter than you? What if not solving alignment leads to the catastrophic outcome of everybody dying? What if alignment is a really hard problem? What if it's possible to build an AGI without solving alignment?

If that were true it would be quite bad news, on many different channels. Intelligence is dangerous! Humans are no longer the smartest! You don't want to hear that the AIs are gonna take your job. And you really don't want to hear that we are on a trajectory where literally every man, woman, and child will end up dead.

I believe we live in this terrible world.

How do I know this?

The predicament we find ourselves in primarily stems from the fact that there is no short, easy answer. Only if decision makers have good models of how the world is can they make decisions that steer the world in the right direction.

We need to figure out how to give the correct models to the decision makers, even though some part of their brain pushes towards the comfortable untruth that there is no problem. Otherwise, just like TotalBiscuit, humanity will die.

3. How to make People understand   notes

TODO In order to make other people load the correct model we need to use all brain-synchronization techniques possible.

  • Ask the other person to explain their current understanding.
  • Explain to the person that they should not ignore their confusion, and instead actively tell you what they are confused about.
  • Use techniques that allow you to be direct, while avoiding making them angry.
    • "The computer in your head has this bug." (maybe bad because of CS terminology, they might not know.)
  • Possibly have multiple analogies reading, based on what the person knows in the concept space.
  • Use a whiteboard.
  • Preplan explanations and test them extensively.
  • Make the people aware of the relevant cognitive biases.
  • I think I really need to train myself to not get agitated at all. Getting agitated will make the other person agitated, and that mental state isn't very conducive to loading the right model.

It seems like these are general techniques that might be valuable to communicate on their own, to everybody who is doing policy work. But of course I don't have a very good model of what works in practice. These are techniques for doing research, and I expect many will prove beneficial, but it seems important that I try them in practice. It's also not like I am flawless at using them in a research context. More often than not I fail, often simply by forgetting to use them. Training methods that teach you how to do this well seem feasible.

Somehow, common misconceptions need to be addressed, e.g.:

  • "Smartness makes you nice" β†’ "There are psychopaths that are not nice. Do you expect making them smarter makes them automatically nicer, or just better at exploiting you?"
  • Only get one chance such that we can't see that there is a misaligned AI and then learn from our mistakes.