
Getting AI Policy to Work

Table of Contents

  • 1. Understanding
    • 1.1. People who Understand can Transmit their Understanding
    • 1.2. A correct Model is hard to Dethrone
    • 1.3. How to get the understanding across?
  • 2. Why don't People get AI?
  • 3. How to make People understand

1. Understanding

You can't make the right decisions when you are missing important information. Would you sign a contract to buy a car before knowing its price? In World War I the French repeatedly tried to charge machine-gun nests with bayonets. It didn't work. Neither they nor their commanders wanted them to die. The commanders didn't understand machine-gun nests.

Anecdotally, it has happened that I explain X to Bob, until Bob says "Ah yes, that makes total sense. I get how X works and why it's important." The next day I observe Bob explaining X to Alice. Afterward Alice says, "It seems you didn't consider Y. I think Y invalidates your argument." I think, "Ah, obviously X doesn't apply here, because …" But Bob says, "Hmm, right. Then X is probably untrue."

I mostly failed at transmitting my understanding. I said reasonable-sounding things, and Bob agreed, giving me the illusion that he now knew all the things I know. Communication is hard!

You can tell somebody: "Alignment is a hard technical problem. We have only one shot at building an aligned superintelligence. We don't have the technical knowledge required to align an AGI. It's possible to build an AGI without understanding how it works, and with deep learning we are not close to such understanding. Building an AGI without understanding how it works almost guarantees that it will be misaligned, …"

Saying the words is not the same as transmitting the model. Of course, the best way to make somebody believe something is to give them the correct model of it. Once they have the correct model, it is also stable.

1.1. People who Understand can Transmit their Understanding

Really understanding something is the first step to being able to explain it very well. Consider this simplified model: we make Bob thoroughly understand, and train him in how to transmit his understanding to others (and we assume Bob wants to transmit his understanding).

Let's assume that we did a thorough enough job that Bob can successfully communicate his understanding to 2 people, and that each of these people will convince 2 more people in turn.

:TODO: Maybe generate a table demonstrating exponential growth?

(defun number-of-tree-nodes (depth)
  "Return the number of nodes in a full binary tree with DEPTH levels."
  (- (expt 2 depth) 1))

(number-of-tree-nodes 4)
15

\(1.1^{100} \approx 13780\)
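As a rough sketch of the table the TODO above asks for, here is some Emacs Lisp that tabulates how many people understand after each generation, assuming each person who understands convinces a fixed number of new people per generation. The function name and the example branching factors are just illustrative assumptions, not a worked-out model.

(defun understanding-growth-table (branching generations)
  "Return a list of (GENERATION . PEOPLE) pairs, starting from one
person and multiplying the count by BRANCHING each generation."
  (let ((people 1.0)
        (table '()))
    (dotimes (g (1+ generations))
      (push (cons g (round people)) table)
      (setq people (* people branching)))
    (nreverse table)))

(understanding-growth-table 2 4)
;; => ((0 . 1) (1 . 2) (2 . 4) (3 . 8) (4 . 16))

(understanding-growth-table 1.1 100)
;; last entry is roughly (100 . 13781), matching \(1.1^{100}\) above

With a branching factor of 2 the count saturates any realistic population within a few dozen generations; even a modest factor of 1.1 compounds to roughly the figure above after 100 generations.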

You want to actively encourage the people who understand to notice when other people are confused, to engage them, and to try to make them understand effectively.

:TODO: Exponential growth. People models. (Have these bullets as a summary at the top.)

  • ☐ Focus on having a few people who understand well?
  • ☐ Such people could then re-explain their models to others.
  • ☐ If each person re-explains it to 1.1 people on average, we would get an exponential explosion of people who understand.
  • ☐ For this to work it is probably necessary to explain this entire setup to every person and make sure that they understand it.

1.2. TODO A correct Model is hard to Dethrone

Once somebody has deeply understood something, it's hard to make them stop believing it. Many arguments that might convince somebody with partial understanding can now be recognized as incorrect by a person with deep understanding.

1.3. How to get the understanding across?

I don't have good models of the most effective way to get the right models across.

  • Writing an article allows you to iteratively distill it until it's very high quality.
  • Many people can read an article. You don't need to re-explain it from scratch each time.
    • Ideally, meet with them to discuss the article.
    • It is probably best to heavily encourage people to ask the questions they have about the article.

2. Why don't People get AI?

Why did it take people this long to realize that AI is powerful? Somehow most people never ask themselves whether AI could become smarter than a human. And when they do, they seem to fix the bottom line to "no" and then do motivated stopping, instinctively dismissing the possibility.

How pretty do you think you are? There are some studies showing that you are likely to overrate yourself (I am not sure about the rigor of these studies, but it seems anecdotally true). And this might really be the wrong question. It's not about how accurate your rating is, but how likely you are to try generating it at all. It might not be a topic you are comfortable thinking about if you're not the prettiest.

If you notice blood in your stool, you may want to tell yourself, "It's probably hemorrhoids. Surely it's not cancer." TotalBiscuit had serious cancer symptoms for a year before seeing a doctor.

He died the next year.

When you go to an event and need to take a COVID test, did you ever feel like not taking it? If you test positive, you can't go to the event. Bad news. My brain is noticeably averse to performing the test, because it might bring bad news.

Humans don't like to hear bad news. The worse the news, the less we want to hear it. But not hearing the bad news often leads to worse outcomes: insofar as the news contains important information about reality, you are unable to use that information in your decision making.

Did you ever feel negative towards somebody because they were better than you at something?

In myself I observed a similar aversion to evaluating how intelligent I am. If you value intelligence, which many do, then realizing that somebody is smarter hurts: you no longer have the special status of being "the best".

But intelligence can also inspire fear. What makes the Unabomber scary? If he had been a complete idiot, he probably wouldn't have been the Unabomber.

Imagine the dumbest person you know proclaiming with deep conviction, "I am gonna make your life hell." Now imagine the smartest person you know proclaiming the same, with the same conviction. Really try to imagine it. Which one is scarier?

Sometimes, without prompting, my brain starts to imagine scenarios of how some very capable person I deal with could harm me. I don't do this with most people, only with people my brain evaluates as extraordinarily intelligent/capable.

What if it is possible to build an AI much, much smarter than you? What if not solving alignment leads to the catastrophic outcome of everybody dying? What if alignment is a really hard problem? What if it's possible to build an AGI without solving alignment?

If that were true, it would be quite bad news, on many different channels. Intelligence is dangerous! Humans are no longer the smartest! You don't want to hear that the AIs are gonna take your job. And you really don't want to hear that we are on a trajectory where literally every man, woman, and child will end up dead.

I believe we live in this terrible world.

How do I know this?

The predicament we find ourselves in primarily stems from the fact that there is no short, easy answer. Only if decision makers have good models of how the world actually is can they make decisions that steer the world in the right direction.

We need to figure out how to give the correct models to the decision makers, even though some part of their brain pushes towards the comfortable untruth that there is no problem. If we fail, then, just like TotalBiscuit, humanity will die.

3. How to make People understand   notes

TODO In order to make other people load the correct model, we need to use all the brain-synchronization techniques possible.

  • Ask the other person to explain their current understanding.
  • Explain to the person that they should not ignore their confusion, and instead actively tell you what they are confused about.
  • Use techniques that allow you to be direct, while avoiding making them angry.
    • "The computer in your head has this bug." (maybe bad because of CS terminology, they might not know.)
  • Possibly have multiple analogies reading, based on what the person knows in the concept space.
  • Use a whiteboard.
  • Preplan explanations and test them extensively.
  • Make people aware of the relevant cognitive biases.
  • I think I really need to train myself not to get agitated at all. Getting agitated makes the other person agitated too, and that mental state isn't very conducive to loading the right model.

These seem like general techniques that might be valuable to communicate, on their own, to everybody who is doing policy work. But of course I don't have a very good model of what works in practice. These are techniques for doing research, and I expect many will prove beneficial, but it seems important that I try them in practice. It's also not like I am flawless at using them in a research context. More often than not I fail, often simply by forgetting to use them. Training methods that teach you how to do this well seem feasible.

Somehow, common misconceptions need to be addressed, e.g.:

  • "Smartness makes you nice" → "There are psychopaths that are not nice. Do you expect making them smarter makes them automatically nicer, or just better at exploiting you?"
  • Only get one chance such that we can't see that there is a misaligned AI and then learn from our mistakes.

Author: Johannes C. Mayer

Created: 2024-09-27 Fri 17:39
